Tuesday, May 28, 2013
Robotics Institute Faculty Talk - Kris Kitani
Understanding Human Activity from Video: Action Primitives, Activity Syntax and Activity Forecasting
Kris Kitani, postdoctoral research fellow at the Robotics Institute, Carnegie Mellon University
May 28, 2013, 10:00 a.m.
Carnegie Mellon University, Gates Hillman Complex, Room 6115
Video-based human activity analysis is a multi-faceted area of computer vision research that aims to build intelligent systems able to model and recognize human activities through persistent observation. I will present my work on three aspects of activity analysis: (1) discovering action primitives, (2) inferring activity syntax and (3) forecasting human activity. The task of discovering action primitives from a video stream arises when a model of human activity requires that observed human motion be decomposed into a set of (possibly latent) atomic action units. I will explain how my work has shown the importance of using features that completely describe the action space. The task of inferring activity syntax (grammar) becomes necessary when an intelligent system must model human activities at multiple levels of abstraction. I will show that grammatical induction from a noisy stream of action units can be formulated as a noisy channel communication problem, in which the minimum description length principle can be used to discover an optimal activity grammar. The task of forecasting human activity arises in assistive scenarios where intelligent agents must be able to anticipate and respond to human activities. I will show the importance of modeling human activity in a decision-theoretic framework that takes into account the rich visual context of activity. In the latter portion of my talk I will describe my past and ongoing collaborative efforts at CMU to advance computer vision research in the domains of first-person vision, digital sports and pervasive computing.
Dr. Kris Kitani is currently a postdoctoral research fellow at the Robotics Institute at Carnegie Mellon University, working in the area of computer vision. He graduated with a BS in Electrical Engineering from the University of Southern California in 1999, after which he worked in the area of automated visual defect detection for semiconductor fabrication at KLA-Tencor (Japan headquarters). In 2005 and 2008, respectively, he was awarded an MS and a PhD in Information Science and Technology from the University of Tokyo (supervised by Dr. Yoichi Sato). His PhD work on learning action primitives from video was awarded the Best Student Paper Award at the Meeting on Image Recognition and Understanding (MIRU) 2008, the largest computer vision conference in Japan, and the Best Journal Paper Award from the Institute of Electronics, Information and Communication Engineers (IEICE) in 2010. From 2008 to 2011, he worked as an assistant professor at the University of Electro-Communications (UEC) Tokyo, where he focused on the application of computer vision to interactive systems. He was also a visiting scholar in 2010 at the University of California at San Diego, where he worked with the National Federation of the Blind on image sonification.
During his time at CMU, he has worked primarily in the area of human activity forecasting, integrating optimal control and computer vision techniques to predict human activities before they happen. His work on activity forecasting was awarded the Best Paper Honorable Mention Award at the European Conference on Computer Vision (ECCV) in 2012. Dr. Kitani has also worked closely with the University of Tokyo, UEC Tokyo, NEC, NIKON and SAMSUNG. Recently, the BallCam project (video stabilization for fast-spinning cameras), a collaboration with UEC Tokyo, has been featured in popular press outlets such as the Wall Street Journal, Popular Mechanics and ABC News.
Personal Homepage: Kris Kitani - http://www.cs.cmu.edu/~kkitani/Top.html