Researchers at Carnegie Mellon University’s Robotics Institute have developed a computer vision method that understands the body poses and movements of multiple people from video in real time.
The method was developed with the Panoptic Studio, a two-story dome embedded with 500 video cameras. The system can also track the pose of each individual’s fingers, which the researchers claim is a first. Their experiments show that it is possible to detect the poses of a group of people using a single camera and a laptop computer.
Researchers say this could open up new ways for people and machines to interact with each other and for people to use machines to better understand the world around them. For example, recognizing hand poses may make it possible for people to interact with computers in new and natural ways such as simply by pointing at things.
The technology could be used to read the nuances of nonverbal communication between individuals and robots, or to give self-driving cars an early warning that a pedestrian is about to step into the street by monitoring body language. It could also help machines support behavioral diagnosis and rehabilitation for conditions such as autism, dyslexia and depression.
“We communicate almost as much with the movement of our bodies as we do with our voice,” says Yaser Sheikh, associate professor of robotics at Carnegie Mellon University. “But computers are more or less blind to it.”
Tracking multiple people in real time presents a number of challenges, because programs that track a single person do not cope well with groups. Hand detection is even harder: because people use their hands to hold objects and make gestures, a camera is unlikely to see all parts of the hand at the same time.
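One common way around the multi-person problem is a bottom-up approach: instead of tracking each person separately, the detector produces a per-joint confidence map for the whole frame and picks local maxima as candidate joints, which are later grouped into individuals. The CMU system’s actual code is not shown here; the function name, threshold, and toy data below are illustrative assumptions, a minimal sketch of the peak-finding step in NumPy:

```python
import numpy as np

def find_keypoints(heatmap, threshold=0.5):
    """Return (row, col) positions of local maxima above `threshold`.

    `heatmap` is a 2-D confidence map for one joint type; each peak is
    a candidate joint, ideally one per visible person.
    """
    # Pad with -inf so border pixels can be compared against neighbors.
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    center = padded[1:-1, 1:-1]
    # A pixel is a peak if it beats its four direct neighbors
    # and clears the confidence threshold.
    peak = (
        (center > padded[:-2, 1:-1]) & (center > padded[2:, 1:-1]) &
        (center > padded[1:-1, :-2]) & (center > padded[1:-1, 2:]) &
        (center > threshold)
    )
    return [tuple(map(int, p)) for p in zip(*np.nonzero(peak))]

# Toy map with two "people": two confidence blobs for one joint type.
h = np.zeros((8, 8))
h[2, 2] = 0.9   # person A's joint
h[5, 6] = 0.8   # person B's joint
print(find_keypoints(h))  # two candidate keypoints
```

Grouping these candidates into distinct people, and doing the same at finger scale, is where the hard part of the research lies; the sketch only shows why a whole-frame confidence map sidesteps per-person tracking.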
“A single shot gives you 500 views of a person’s hand, plus it automatically annotates the hand position,” says Hanbyul Joo, a Ph.D. student in robotics at Carnegie Mellon. “Hands are too small to be annotated by most of our cameras, however, so for this study we used just 31 high-definition cameras, but still were able to build a massive data set.”