A new breed of robot

The real challenges faced by artificial cognitive system design

Designers of artificial cognitive systems (ACS) have, until recently, tended to adopt one of two approaches to thinking robots: classical rule-based artificial intelligence or artificial neural networks. However, a new breed of cognitive, learning robot developed through the European Union-funded project COSPAL (COgnitive Systems using Perception-Action Learning) combines the best of both worlds.

The classical approach to artificial intelligence (AI) relies on a rule-based system, in which the designer supplies the knowledge and scene representations, obliging the robot to follow a set decision-making process.

Biologically inspired artificial neural networks (ANNs), on the other hand, rely on processing continuous signals and a non-linear optimisation process to reach a response, which, due to the lack of preset rules, requires developers to carefully balance the system constraints and its freedom to act autonomously.

The problem is that, used individually, these systems have major shortcomings when it comes to developing advanced ACS architectures. Classical AI cannot solve a task that it has not been pre-programmed to solve, while an ANN on its own is too simple to solve complex tasks.

To combat these shortcomings, researchers in the COSPAL project used ANNs to handle the low-level functions based on the visual input their robots received, and then employed classical AI on top of that in a supervisory function. This fusion of systems enables the robots to explore the world around them through direct interaction, creating ways to act in it and controlling their actions accordingly. It harnesses the strengths of both approaches: classical AI's superiority in functions akin to human rationality, and ANNs' superiority in performing tasks for which humans would use their subconscious - things like basic motor skills and low-level cognitive tasks.

The most important difference between the COSPAL approach and most of what had been the state of the art is that the resulting ACS is scalable. It is able to learn by itself and can solve increasingly complex tasks with no additional programming.

There is a direct mapping from the visual percepts to performing the action. With previous systems, if something in the environment changed that the low-level system was not programmed to recognise, it would give random responses, but the supervising AI process would not realise anything was wrong. With the COSPAL approach, the system realises something is different and, if its actions do not result in success, it tries something else.

The shape-sorter

This trial-and-error learning approach was tested by making the COSPAL robot complete a shape-sorting puzzle but without telling it what it had to do. As it tried to fit pegs into holes it gradually learnt what would fit where, allowing it to complete the puzzle more quickly and accurately each time.

This puzzle was to be solved using an industrial robot (Stäubli RX90), a side-view camera, and a camera mounted on the end-effector, both cameras having a resolution of 1024×768.

Since the puzzle board was lying flat on the ground, the problem was not a full 3D problem but rather a 2+D problem, where the (relevant) height-position of objects consists of three different levels: on the ground; lifted; inserted. Solving this task with an engineered system is believed to be straightforward using standard methods. It is worth mentioning, however, that such a system would probably fail if something outside the model space happened. For COSPAL, the aim of the shape-sorter scenario was to create a system that can learn to solve the puzzle starting from as little prior knowledge as possible.

The system has to learn through imitation - otherwise there is no way to make the system learn at all. As prior knowledge, the system knows how to close and open the gripper. This also includes a change in vertical position above the ground, i.e., closing the gripper means lowering, closing, and lifting it again.


Bootstrapping (in which simple visual percepts are discovered first and then more complex ones built on top) has been applied to accelerate the learning of basic capabilities. Without bootstrapping, the system would start to learn such basic capabilities as visual servoing - a closed-loop control mechanism with visual feedback - and the concept of objects by explorative learning, where positive reward is given whenever consistent actions are performed. This is a lengthy process, comparable to the way infants learn hand-eye coordination (corresponding to visual servoing) and object constancy. The following bootstrapping was performed before the system started its ordinary, incremental learning:

  • Visual servoing;
  • Object-gripper relation;
  • Object-hole relation.

The latter two relations were bootstrapped in combination. Both bootstrapping processes, for visual servoing and for object relations, deserve further explanation.

In the visual servoing mechanism, the controlling system successively reduces some error obtained from visual feedback until the final goal position is reached. This is in contrast to open-loop systems, which compute a goal position and approach it in a single movement. Visual servoing can be sorted into variants, depending on whether it is based on image coordinates or coordinates in 3D space, and whether it uses knowledge about camera calibration and inverse kinematics or direct control within a visual-motor model.
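The contrast between open-loop and closed-loop control can be illustrated with a minimal Python sketch. It is not the COSPAL implementation: the matrices TRUE_MAP and ROUGH_MODEL, the gain, and all function names are assumptions made for illustration, and a 2×2 linear mapping stands in for the real camera geometry. The controller only knows the rough model, so a single open-loop move misses the target, whereas repeatedly correcting the observed image error converges on it.

```python
# Illustrative only: a linear "camera" maps a 2D robot position to image coordinates.
TRUE_MAP = [[1.2, 0.3], [-0.2, 0.9]]     # hypothetical real geometry, unknown to the controller
ROUGH_MODEL = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical imperfect visual-motor model


def mat_vec(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def inverse_2x2(m):
    """Invert a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def observe(position):
    """Visual feedback: where the gripper appears in the image."""
    return mat_vec(TRUE_MAP, position)


def image_error(position, target_image):
    """Difference between the target and the observed image position."""
    return [t - o for t, o in zip(target_image, observe(position))]


def open_loop(start, target_image):
    """Compute the goal once from the rough model and move there in one step."""
    step = mat_vec(inverse_2x2(ROUGH_MODEL), image_error(start, target_image))
    return [s + d for s, d in zip(start, step)]


def closed_loop(start, target_image, gain=0.5, tol=1e-3, max_steps=200):
    """Repeatedly observe the image error and move a fraction of the way."""
    position = list(start)
    inverse_model = inverse_2x2(ROUGH_MODEL)
    for step_count in range(max_steps):
        error = image_error(position, target_image)
        if max(abs(e) for e in error) < tol:
            return position, step_count
        step = mat_vec(inverse_model, error)
        position = [p + gain * d for p, d in zip(position, step)]
    return position, max_steps


target = [5.0, 3.0]
open_result = open_loop([0.0, 0.0], target)
closed_result, steps_used = closed_loop([0.0, 0.0], target)
open_residual = max(abs(e) for e in image_error(open_result, target))
closed_residual = max(abs(e) for e in image_error(closed_result, target))
```

In this toy setting the open-loop move leaves a clearly visible residual error because the rough model is wrong, while the closed-loop controller tolerates the same model error and stops only once the observed error falls below the tolerance.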

Within COSPAL, only image-based, direct visual servoing was applied, since engineered models were to be avoided where possible. The overall robustness of the system, i.e., its capability to deal with unforeseen situations, is improved in two ways by this strategy:

  • No camera calibration - if the (side-view) camera is moved, the visual servoing control adapts to the new geometry;
  • No explicit inverse kinematics - even for robots without joint-feedback or partially modified configuration (e.g. weight), the visual servoing control adapts to the current situation.

A direct servoing method requires continual learning of the robot control and, in particular, an initial estimate of the visual-motor model. To accelerate this initial estimation, the model is bootstrapped by moving the robot to a variety of configurations in order to generate learning data for the visual-motor model. This means that the major part of the model is acquired in a batch-learning process.
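As a rough illustration of this batch bootstrapping, the Python sketch below moves a simulated robot to random configurations, records where the gripper appears in the image, and fits a visual-motor model to the collected pairs by least squares. The linear model, the simulated camera TRUE_MAP, the noise level and the function names are all assumptions for the sake of the example; the article does not specify which model class or estimator COSPAL used.

```python
import random

# Hypothetical camera geometry that the learner must discover from data.
TRUE_MAP = [[1.2, 0.3], [-0.2, 0.9]]


def fit_linear_model(motor_samples, image_samples):
    """Least-squares fit of image = M * motor via the normal equations
    M = (sum y x^T) (sum x x^T)^-1, for 2D inputs and outputs."""
    sxx = [[0.0, 0.0], [0.0, 0.0]]
    syx = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(motor_samples, image_samples):
        for i in range(2):
            for j in range(2):
                sxx[i][j] += x[i] * x[j]
                syx[i][j] += y[i] * x[j]
    det = sxx[0][0] * sxx[1][1] - sxx[0][1] * sxx[1][0]
    inv = [[sxx[1][1] / det, -sxx[0][1] / det],
           [-sxx[1][0] / det, sxx[0][0] / det]]
    return [[sum(syx[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def bootstrap_model(samples=50, noise=0.01, seed=0):
    """Visit random configurations, record noisy image positions, fit in one batch."""
    rng = random.Random(seed)
    motor_samples, image_samples = [], []
    for _ in range(samples):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        y = [sum(TRUE_MAP[i][j] * x[j] for j in range(2)) + rng.gauss(0.0, noise)
             for i in range(2)]
        motor_samples.append(x)
        image_samples.append(y)
    return fit_linear_model(motor_samples, image_samples)


model = bootstrap_model()
```

A model estimated this way would only be a starting point: under direct servoing it would still be refined during operation, which is what lets the control adapt when the camera is moved.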

Object relations are essential for formulating the concept of objects and goals of actions. Objects without any relation to the system itself, i.e. the gripper, are functionless and there is no reason to build a concept for them - they belong to the background.

Turning this argument around allows for learning the appearance of objects: the manipulator moves randomly and moves objects by chance. The resulting change of appearance helps to register the objects as such, along with their location relative to the gripper.

However, as with the visual servoing, fully random exploration would take too much time in practice, so the process is accelerated by placing the objects in the gripper and then moving them around.

It is similar for the holes: The system could learn the proper relation of objects to holes by random exploration, but the likelihood of insertion by chance is extremely low. Instead, it is better to start with the objects in the holes, and perform the opposite action, moving the object away from the hole, to establish the relation between objects and holes.

Hierarchical learning

After these two bootstrapping steps, the system can start to acquire further competences by incremental learning. The system needs some intrinsic drive to learn (an intrinsic 'moment', corresponding to motivation in people), but fortunately this can be built into a technical system as part of the engineered principles. Presumably, it is enough to impose some moment to imitate what the system has observed. This moment leads to learning hierarchical competences according to the following steps in a complexity chain:

1. Reproduce the same action

The system observes a teacher doing the following action sequence: a) approach object 1; b) align gripper to object 1; c) grasp object 1; d) approach hole 1; e) align to hole 1; f) release object 1. Note that these single actions are not programmed into the system, but are shown by the teacher to the system.
What the system observes is roughly the following: a) put the gripper in relation to object 1 according to side-view camera (here the bootstrapped competences are required); b) put the gripper in relation to object 1 according to end-effector camera; c) close gripper (prior knowledge); d) put object in relation to hole 1 according to side-view camera; e) put object in relation to hole 1 according to end-effector camera; f) open gripper. By repeating exactly the same observations, the system starts to imitate the taught sequence.

2. Generalise to different position

In the previous example, 'object 1' and 'hole 1' have been identified to a large extent by their spatial position rather than their visual appearance. If both are now moved to different positions, the system has to find them through random exploration: unsuccessful trials will lead to reduced likelihood for similar actions and successful trials increase the likelihood.
Eventually, the system will repeatedly go to object 1, regardless of where it is located. The system has therefore replaced the identification by spatial position with identification by appearance. A similar transition happens for hole 1: after some false trials using the wrong holes or other objects, the system starts to identify the correct object-to-hole relation. The system has learned to put object 1 into hole 1 independently of the position.

3. Generalise to different objects

Object 1 is now removed (after it has been inserted successfully) such that there is no object 1 to approach. The system randomly selects another object (success) or a hole (failure). Eventually, the empty gripper is most likely to be moved to an object. If an object is in the gripper, the teacher reports success only for the fitting hole, and the system generalises to use the bootstrapped object-to-hole relation for the object currently in the gripper. At the end of this stage, the system has learned to insert an arbitrary object into the appropriate hole. Due to the intrinsic moment to imitate the initial action, the system will continue until all objects are inserted.
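The trial-and-error principle used in steps 2 and 3, where unsuccessful trials reduce the likelihood of similar actions and successful ones increase it, can be sketched in a few lines of Python. This is an assumed, simplified scheme and not the learning rule used in COSPAL: the multiplicative update factors, the hole names and the function names are hypothetical, and the teacher's feedback is simulated by comparing the chosen hole with the correct one.

```python
import random


def choose(weights, rng):
    """Pick an action with probability proportional to its current weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names])[0]


def update(weights, action, success, up=1.5, down=0.7):
    """Raise the weight of a successful action, lower it after a failure."""
    weights[action] *= up if success else down


def learn(correct, candidates, trials=50, seed=0):
    """Repeatedly try a candidate and adjust weights from success/failure feedback."""
    rng = random.Random(seed)
    weights = {name: 1.0 for name in candidates}
    for _ in range(trials):
        action = choose(weights, rng)
        update(weights, action, success=(action == correct))
    return weights


# Hypothetical example: which hole fits the round peg?
hole_weights = learn("round", ["square", "round", "star"])
preferred_hole = max(hole_weights, key=hole_weights.get)
```

Because choices are drawn in proportion to the weights, early exploration is close to random, and the successful action gradually dominates as feedback accumulates.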

Robustness and scalability

Several attempts have been made to disturb the running system. For example, objects have been moved while they were being approached. In the worst case, the system fails in its initial trial and either repeats the same action or picks another object and returns later to the first one.

One might now argue that the chosen scenario was too simple to be useful, but due to the generic bootstrapping, the robot arm could be replaced with any other actuator, the objects can be replaced, and even the relations between objects and holes (goals) can be replaced.

The COSPAL system is generic enough to be used for many other assembly tasks or, as was shown during the COSPAL project, to control a radio-controlled car. In this experiment, the system learned to steer the car towards coloured balls. The actuator was now the car, the objects were the balls, and the sub-goals were to get close to the balls. This was achieved simply by replacing the bootstrapping and learning examples.

The radio-controlled car was the starting point for a new project on cognitive systems, where the setting is changed to a dynamic one with interacting agents and a more serious demonstrator. In the DIPLECS project (Dynamic Interactive Perception-action LEarning in Cognitive Systems), an artificial cognitive system is supposed to learn appropriate and adaptive assistance for drivers; the system warns the driver if required and if the system expects the driver to accept the warning. Furthermore, partly autonomous control of a radio-controlled car, using generic, learned recognition and sub-goals, is planned to be demonstrated based on the COSPAL architecture.

Dr Michael Felsberg is work package leader in the EU project MATRIS and coordinator of the EU projects COSPAL and DIPLECS. This work has been supported by EC Grant IST-2003-004176 COSPAL.
