Recognition-Quality of Life Technology Center - Carnegie Mellon University


The goal of the project is to develop the techniques necessary for understanding the user’s environment based on sensor data. This includes techniques for localizing the user in the environment, building geometric models of the environment, identifying specific objects in it, and understanding the activities of others around the user.

Techniques for detecting events, such as those relevant to social interactions, from sensor data are important building blocks for many QoLT scenarios. For example, detecting a person moving toward the user or trying to get the user’s attention would provide invaluable input to a variety of systems. Results from an associated project provided the foundation for the development of event detection techniques. Recent experimentation with these algorithms showed that they can succeed even on relatively low-resolution videos, with moving cameras and occlusions between subjects. Most importantly, the algorithms operate on cluttered videos with multiple moving objects (e.g., crowds or street scenes), in which techniques relying on background subtraction or motion tracking would fail. Finally, the algorithms incorporate features to ensure generalization to subjects and motion patterns different from those recorded in the training video(s).
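The report does not spell out the detection algorithm, so the following is only a generic sketch of one family of methods consistent with the description: sliding a spatio-temporal template over the video volume and scoring each position by normalized cross-correlation, which needs neither background subtraction nor motion tracking. The function names and threshold are illustrative assumptions, not the project's actual code.

```python
import numpy as np

def detect_event(video, template, threshold=0.7):
    """Slide a small spatio-temporal template over a video volume and
    report positions whose normalized cross-correlation exceeds a
    threshold.  Illustrative sketch, not the project's algorithm.

    video:    (T, H, W) array of grayscale frames
    template: (t, h, w) array with t <= T, h <= H, w <= W
    Returns a list of (frame, row, col) detections.
    """
    T, H, W = video.shape
    t, h, w = template.shape
    # Zero-mean, unit-variance template (epsilon avoids division by zero).
    tpl = (template - template.mean()) / (template.std() + 1e-8)
    hits = []
    for f in range(T - t + 1):
        for r in range(H - h + 1):
            for c in range(W - w + 1):
                patch = video[f:f+t, r:r+h, c:c+w]
                p = (patch - patch.mean()) / (patch.std() + 1e-8)
                score = (p * tpl).mean()  # normalized cross-correlation
                if score > threshold:
                    hits.append((f, r, c))
    return hits
```

Because each window is normalized independently, the score is insensitive to local brightness and contrast changes, which is one reason such template-style detectors degrade gracefully on low-resolution or cluttered footage; the brute-force triple loop, however, is exactly the computational cost discussed below.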

After identifying the fundamental technology, we focused on addressing its key shortcomings, leading to algorithms that are validated on inside-out perception data and that include fundamental representational and algorithmic changes toward real-time operation. Specifically, we set out to address two major limitations of these techniques. 1) They can be extremely computationally demanding. This may be acceptable in applications where video data is processed off-line, but QoLT requires immediate feedback based on visual observations. This motivated a substantial effort in developing new data structures and algorithms for real-time operation. At this time, we have obtained initial results showing the feasibility of real-time detection of common actions, even in complex scenes. Beyond its relevance to QoLT scenarios, this is a major contribution to research in video analysis, which has not yet addressed these computational issues (and it has generated a submission to the most prestigious computer vision conference this year). 2) Using these techniques with inside-out data requires robustness to far greater levels of motion jitter, occlusion, and other environmental effects. At present, we have validated the detection technology with our prototype sensors.
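The specific data structures developed for real-time operation are not described here. One standard device for this kind of speedup in video analysis, shown below purely as an assumed illustration, is the integral video: a 3-D extension of the integral image that reduces the sum over any box-shaped spatio-temporal region to eight table lookups, regardless of box size.

```python
import numpy as np

def integral_video(video):
    """3-D prefix-sum table: ivid[t, y, x] = sum of video[:t, :y, :x].
    A leading zero plane along each axis removes boundary checks."""
    T, H, W = video.shape
    ivid = np.zeros((T + 1, H + 1, W + 1), dtype=np.float64)
    ivid[1:, 1:, 1:] = video.cumsum(axis=0).cumsum(axis=1).cumsum(axis=2)
    return ivid

def box_sum(ivid, t0, t1, y0, y1, x0, x1):
    """Sum of video[t0:t1, y0:y1, x0:x1] in O(1) via 3-D
    inclusion-exclusion: sign alternates with the number of
    lower-corner indices used."""
    return (ivid[t1, y1, x1]
            - ivid[t0, y1, x1] - ivid[t1, y0, x1] - ivid[t1, y1, x0]
            + ivid[t0, y0, x1] + ivid[t0, y1, x0] + ivid[t1, y0, x0]
            - ivid[t0, y0, x0])
```

After the one-time cumulative-sum pass, every box feature costs constant time, which is why this structure (and relatives of it) turns otherwise prohibitive sliding-window searches into near-real-time operations.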


Our approach to recognizing events in videos is unique. Two other research groups use related approaches, but these have been demonstrated only in limited contexts, such as movie indexing and surveillance, with few results on detection from the user’s perspective. All of the techniques described in related work are computationally expensive; in contrast, we are developing novel representations and techniques aimed at real-time operation, an issue not addressed adequately in existing work.

The project’s research directions are selected so that each of the research topics corresponds to a requirement of one or several of the long-term Families of Engineered Systems (FoES): Localization addresses visual impairment issues; environment modeling is necessary for intelligent mobility; object recognition is the required perception capability for scenarios in which grasping and manipulating objects are needed; and activity recognition and understanding of other agents in the user’s environment is critical in developing systems that operate in social environments.


Project Team

  • Martial Hebert, Lead
  • Barkin Aygun
  • Fernando de la Torre
  • Takeo Kanade
  • Hong-Wen Kang
  • David Changsoo Lee
  • Pyry Matikainen
  • Maladau Mou
  • Michael McCue

Example of event detection in videos

[Figure: recognition example]