Robots can learn motor tasks through trial and error, thanks to algorithms researchers have developed to mimic the way humans learn.
Researchers at the University of California, Berkeley demonstrated their technique by having a robot complete various tasks, such as putting a clothes hanger on a rack, assembling a toy plane and screwing a cap on a water bottle, without pre-set details about its surroundings.
“What we're reporting on here is a new approach to empowering a robot to learn,” said Professor Pieter Abbeel of UC Berkeley's Department of Electrical Engineering and Computer Sciences.
“The key is that when a robot is faced with something new, we won't have to reprogram it. The exact same software which encodes how the robot can learn was used to allow the robot to learn all the different tasks we gave it.”
Traditionally, a robot has to be pre-programmed to handle the various scenarios it might encounter as it makes its way through a 3D world.
But the UC Berkeley researchers turned to a new branch of artificial intelligence known as deep learning, which is loosely inspired by the neural circuitry of the human brain when it perceives and interacts with the world.
“For all our versatility, humans are not born with a repertoire of behaviours that can be deployed like a Swiss army knife, and we do not need to be programmed,” said postdoctoral researcher on the project Sergey Levine.
“Instead, we learn new skills over the course of our life from experience and from other humans. This learning process is so deeply rooted in our nervous system, that we cannot even communicate to another person precisely how the resulting skill should be executed.”
In AI deep learning, programs create ‘neural nets’ in which layers of artificial neurons process overlapping raw sensory data, such as sound waves or image pixels. This helps the robot recognise patterns and categories among the data it is receiving.
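The layered processing described above can be sketched in a few lines of Python. This is a toy illustration only, not the researchers' architecture: the layer sizes, weights and category count are all made up, and the network is untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Simple activation: each artificial neuron passes on only positive signals.
    return np.maximum(0.0, x)

# Hypothetical dimensions: a 16x16 grayscale "image" flattened to 256 inputs,
# one hidden layer of 32 neurons, and 4 output categories.
W1 = rng.normal(scale=0.1, size=(256, 32))
W2 = rng.normal(scale=0.1, size=(32, 4))

def classify(pixels):
    hidden = relu(pixels @ W1)     # first layer transforms raw pixel data
    scores = hidden @ W2           # next layer scores each category
    return int(np.argmax(scores))  # report the highest-scoring category

image = rng.random(256)            # stand-in for raw sensory input
category = classify(image)         # an index in 0..3
```

Each layer re-represents the previous layer's output, which is how stacked layers come to recognise patterns in raw sensory data.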
However, applying deep reinforcement learning to motor tasks is more difficult since the task goes beyond the passive recognition of images and sounds.
“Moving about in an unstructured 3D environment is a whole different ballgame,” said PhD student and researcher Chelsea Finn.
“There are no labelled directions, no examples of how to solve the problem in advance. There are no examples of the correct solution like one would have in speech and vision recognition programs.”
The team worked with a Willow Garage Personal Robot 2 (PR2), now called BRETT. The algorithm controlling the robot’s learning included a reward function that provided a score based upon how well the robot was doing with the task.
Movements that bring the robot closer to completing the task will score higher than those that do not. The score feeds back through the neural net, so the robot can learn which movements are better for the task at hand.
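The feedback loop described above can be illustrated with a deliberately simplified sketch. This is not the team's algorithm: here the robot's "movement" is a single made-up parameter, the reward function scores how close a movement gets to a hypothetical goal position, and higher-scoring random variations are kept.

```python
import numpy as np

rng = np.random.default_rng(1)
goal = 0.8     # hypothetical target position for the task
theta = 0.0    # the movement parameter the robot is learning

def reward(movement):
    # Score the movement: closer to the goal means a higher (less negative) score.
    return -abs(goal - movement)

for _ in range(200):
    candidate = theta + rng.normal(scale=0.05)  # try a slightly varied movement
    if reward(candidate) > reward(theta):       # feedback: keep better movements
        theta = candidate
```

After enough trials, `theta` settles near the goal, showing how a reward score alone, with no labelled examples, can steer learning.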
This type of training allows the robot to learn on its own, as the algorithm calculates good values for the 92,000 parameters of the neural net it needs to learn.
When BRETT received the relevant coordinates for the beginning and end of a task, it could master a typical assignment in 10 minutes with this approach. Without the coordinates, the learning process took about three hours.