Computer Vision Takes Off

By Mark Roth

Computer vision has exploded over the past five years, and it is now able to identify objects with uncanny accuracy, leading to advances in everything from surveillance cameras to self-driving vehicles.

Michael J. Tarr, head of the Psychology Department at Carnegie Mellon University, said there are two principal reasons for the rapid advances in computer vision, which uses artificial intelligence to interpret and process the scenes viewed by cameras and other devices. First, because of the web, millions of images have now been labeled, allowing robotic vision systems to train themselves to identify what's in a scene using a form of artificial intelligence known as deep learning.

Second, a new generation of graphics processing units, or GPUs, originally developed for the video gaming industry, has enabled much faster learning and identification of images. Also, the processing architecture used by deep networks mimics the human visual system, even to the point of apportioning the networks’ layers so they mirror the arrangement of functional brain areas humans use to see.

Tarr and CMU Professors Deva Ramanan and Katerina Fragkiadaki recently analyzed the advances and challenges in robotic vision at a seminar on campus.

Researchers also have made significant progress in computer vision because, as complicated as it is, the relationship between images and their contents is often very clear.

“In contrast, the nuances of the structure of language are very intricate,” Tarr said. “Think about the number of different ways you can construct a sentence with the same meaning, changing the order of the words or which words you use to refer to the same concept within a sentence.”

Fragkiadaki, assistant professor of machine learning, believes that while robotic vision has become very good at identifying objects in still images, doing the same in videos is more of a challenge. In videos, she said, not only is the scene changing over time, but computer vision systems do not currently understand the physics of movement the way even an infant does.

“Infants right in the first few months of life understand that even when objects disappear from view, they are still there, and what it means for some object to be occluded behind another object,” Fragkiadaki said. “What we’re missing right now in AI is this reasoning ability. Currently, our machines are very good at labeling. What we are trying to do now is to teach them to imagine. What is behind that chair? What is about to happen?”

Another goal that she shares with Ramanan, associate professor in the Robotics Institute, is to enable computer vision systems to learn from fewer labeled images.

Computer vision programs today typically require a significant amount of data to learn how to identify objects, but humans “don’t need 10,000 examples of an elephant to know what an elephant is,” said Ramanan.

Ramanan is intrigued by the possibility of using animated simulations to help train robotic vision systems to handle important but rare situations, like when a child runs out into the street in front of a self-driving vehicle.

“It’s hard to collect training data [in the real world] where children play in the street or cars run red lights, but obviously that is super important to understand, so that’s where simulation could really help,” he said.

Simulations might also help robotic vision systems learn how to interpret humans’ intentions.

“At a four-way stop, there are actually complex interactions going on among drivers,” Ramanan said. “There is a kind of subtle protocol of who will go first. If we lived in a world where all cars were self-driving, they could use internal rules and signaling to interact with each other. But if you have some human-operated cars at the intersection, understanding the drivers’ intentions and goals is really important.”

Tarr noted one recent example of how computer vision systems still lack that ability.

“I recently saw a self-driving Uber car stuck at a busy intersection near the CMU campus. It was stuck there because people kept jaywalking in front of it when the light was the wrong color, and it clearly didn’t know how to deal with the human social capacity of how to negotiate at an intersection. For that you need to have a theory of mind and a theory of negotiating. The car could drive beautifully in terms of all the road hazards, but it didn’t know how to deal with the people,” Tarr said.

He also expects two of the next breakthroughs made possible by computer vision to come in in-home assistants and health care devices.

Amazon recently added a camera to its Echo voice-activated home assistant. Eventually, Tarr thinks, robotic vision systems in the homes of older people may help them stay healthy and safe later in life, doing everything from reminding them when to take pills to helping them purchase items they need.

Experimental computer vision systems also are beginning to equal or better the abilities of doctors in some cases. One emerging area is using robotic vision to scan moles and other skin abnormalities to see if they might be cancerous.

“Imagine instead of going to a dermatologist that one day, you have a body scanner and you strip down and stand in it and it does a full dermatological exam much better than anything else could,” Tarr said.

Recent advances in both computer vision and language systems hold out realistic hope for a new future.

“I think the AI revolution has been promised several times before and has failed to materialize, but I think this time it’s real. We have for the first time practical AI that will become embedded in almost everything in our lives. It will be in our refrigerators, our toasters, our house, our car, the crosswalk and the checkout machines at Giant Eagle. You’ll be dealing with a lot of smart machines that will do a lot of things for you in your daily life,” said Tarr.

Computer vision is one of the many brain research breakthroughs at Carnegie Mellon. CMU has created some of the first cognitive tutors, helped to develop the Jeopardy-winning Watson, founded a groundbreaking doctoral program in neural computation, and is the birthplace of artificial intelligence and cognitive psychology. Building on its strengths in biology, computer science, psychology, statistics and engineering, CMU launched BrainHub, an initiative that focuses on how the structure and activity of the brain give rise to complex behaviors.