Household robots could soon interpret human actions, and battlefield automatons might understand their comrades’ movements.
Researchers from Cornell University, US, are using Microsoft's Kinect motion-capture gaming technology to create robots that interpret sequences of movements as specific activities. “If a robot is to work with humans, it needs to know what the human is doing,” says Ashutosh Saxena from Cornell. “We want to build robots that can observe people in their daily life so that it can help them.”
To achieve that, the researchers used Kinect, which relies on infrared depth sensors to pick out the geometry of human bodies, to record four people performing 12 household routines, from chopping onions to drinking water. Then, they developed machine-learning algorithms that interpret these activities.
The software takes a tiered approach: small sub-routines, such as picking up a toothbrush or squeezing a tube, fit together logically to form higher-level procedures that the software identifies - in this case, brushing one's teeth.
“During training, we provide the software with right and wrong answers, and it learns mappings from raw data to sub-actions and actions,” explains Saxena. Provided with raw footage, the system picks out sub-routines by itself, which it uses to recognise over-arching activities in the future. “In the testing phase, the system figures out small actions, like moving your hand, and pieces them together to work out what higher-level actions are being performed,” says Saxena.
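The tiered idea described above can be illustrated with a toy sketch. This is not the Cornell team's algorithm: the sub-action names, the two-dimensional feature values, the nearest-prototype classifier and the overlap score are all assumptions made up for illustration. The lower tier labels each frame with a sub-action, and the upper tier picks the activity whose expected sub-actions best match what was observed.

```python
# Hypothetical templates: the sub-actions each activity is assumed to contain.
ACTIVITY_TEMPLATES = {
    "brushing teeth": {"pick up toothbrush", "squeeze tube", "move hand to mouth"},
    "drinking water": {"pick up glass", "move hand to mouth"},
}

# Hypothetical prototypes for each sub-action in a made-up 2-D feature space.
# A real system would learn these mappings from labelled depth-sensor data.
SUB_ACTION_PROTOTYPES = {
    "pick up toothbrush": (0.0, 1.0),
    "squeeze tube": (1.0, 1.0),
    "move hand to mouth": (0.5, 2.0),
    "pick up glass": (2.0, 0.0),
}


def classify_sub_action(features):
    """Lower tier: label one frame with the nearest sub-action prototype."""
    def squared_distance(prototype):
        return sum((a - b) ** 2 for a, b in zip(features, prototype))

    return min(
        SUB_ACTION_PROTOTYPES,
        key=lambda name: squared_distance(SUB_ACTION_PROTOTYPES[name]),
    )


def recognise_activity(frames):
    """Upper tier: choose the activity whose template best overlaps the
    set of sub-actions seen across all frames (Jaccard similarity)."""
    sub_actions = [classify_sub_action(frame) for frame in frames]
    observed = set(sub_actions)

    def overlap(activity):
        template = ACTIVITY_TEMPLATES[activity]
        return len(observed & template) / len(observed | template)

    return max(ACTIVITY_TEMPLATES, key=overlap), sub_actions


if __name__ == "__main__":
    activity, steps = recognise_activity([(0.1, 0.9), (0.9, 1.1), (0.5, 1.9)])
    print(activity, steps)
```

Note that "move hand to mouth" appears in both templates, so a single sub-action is not enough to tell the activities apart; it is the combination that decides.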
The system rapidly adapts to the behaviour of different people. When it has never seen a person, it identifies activities correctly on average 63 per cent of the time. But when the person has been observed before, that jumps to 84 per cent. These results will be presented at the Association for the Advancement of Artificial Intelligence conference in San Francisco, US, in August.
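The article does not say how the two figures were measured, but the "never seen" setting is commonly evaluated by holding out every recording of one person during training. The sketch below shows that kind of split and a plain accuracy calculation; the record layout (`person`, `activity`) and the function names are hypothetical.

```python
def split_new_person(recordings, held_out_person):
    """Train on everyone except one person; test only on that person.

    Each recording is assumed to be a dict with a "person" key.
    """
    train = [r for r in recordings if r["person"] != held_out_person]
    test = [r for r in recordings if r["person"] == held_out_person]
    return train, test


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    recordings = [
        {"person": "A", "activity": "drinking water"},
        {"person": "A", "activity": "brushing teeth"},
        {"person": "B", "activity": "drinking water"},
    ]
    train, test = split_new_person(recordings, held_out_person="B")
    print(len(train), len(test))
```

In the "seen before" setting, some recordings of the test person would instead remain in the training set.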
Other researchers are enthusiastic about the work. “I like the learning methods they’re using,” says Alonso Patron-Perez from the University of Oxford, “and the inclusion of information provided by Kinect.” But he also points out that, as with any new technology, there is room for development. “Knowing about context - where people are or what they’re holding - could improve the performance,” he suggests.
The software could also be used with lidar (light detection and ranging) data, points out Saxena, making it useful outdoors and over long distances - ideal for defence applications. Indeed, the Cornell team is using similar software to identify military gestures, which could help battlefield robots understand soldiers' actions.
Regardless of application, this work signals a new era of more perceptive robotics. “Robots can see what people are doing,” says Saxena. “And that means they can change the environment in the way humans want them to.”
Read more on Human Activity Detection from RGBD Images