MirrorBot
IST-2001-35282
Biomimetic multimodal learning in a mirror neuron-based robot
Title: Description of a Navigation Scenario, Including Objects Vocabulary and Syntax (Deliverable D15.1)

Authors: Nicolas Rougier, Frédéric Alexandre, Julien Vitay
Covering period 1.6.2002-1.6.2005

MirrorBot Report 16

Report Version: 1
Report Preparation Date: 30 September 2004
Classification: Public
Contract Start Date: 1st June 2002 Duration: Three Years
Project Co-ordinator: Professor Stefan Wermter
Partners: University of Sunderland, Institut National de Recherche en Informatique et en Automatique at Nancy, Universität Ulm, Medical Research Council at Cambridge, Università degli Studi di Parma
Project funded by the European Community under the “Information Society Technologies Programme”

Table of Contents

0. Overview

1. Introduction

2. Scenario and Grammar

3. Software Algorithm

4. References

0. Overview

Biomimetic multimodal learning and language have been developed in the robot to examine the emergence of representations of actions, perceptions, concepts, and language. MirrorBot has developed and studied emerging embodied representations based on mirror neurons; to achieve this aim, a MirrorBot scenario and grammar were devised together with a behaviour structure. In particular, we now have several robust and working modules that allow the robot to perform the following tasks:

·  Look at objects

·  Recognize an object among others

·  Dock to a table

·  Grasp an object

·  Process and interpret sentences based on proposed grammar

Building on previously completed work packages, this report presents a new and more complex scenario and outlines its main elements: the scenario itself, the grammar and the behaviour structure.

1. Introduction

In this report we describe a combined application of methods and models developed in the MirrorBot project. The mirror neuron insight gained from the partners working in biology and physiology has led to different neuronal models of information processing and learning in the multimodal systems of vision, language and motor control. Extended robotic sensors, such as the sonar sensors and the omni-directional camera added to the PeopleBot robot, have also been included in the application.

The document is structured as follows. Section 2 introduces the scenario and its distinct phases, as well as the grammar and vocabulary used. Section 3 gives an overview of the software algorithm that implements the scenario. Section 4 lists the references.

2. Scenario and Grammar

A physical scenario as well as a grammar have been developed which are simple enough to be tackled using real robots, but which incorporate all the actions required for a realistic application. The scenario is illustrated in Figure 1: the robot is placed at a random position in the environment and the fruits lying on a table are not in sight. An instruction specifying an action and a goal is given through the language medium. The robot needs to detect and approach the table before any docking can take place. Then, after having chosen the target according to the instruction, it can either show the target, dock and point to the target, or dock and grasp the target.

Figure 1: Proposed scenario

To fulfil the requirements of the scenario the robot needs to combine various behaviours such as speech and language processing, avoidance, object recognition and localisation, navigation, self-localisation, gripper action and camera movement. Most of these behaviours, such as speech recognition and avoidance, will be available continuously, while others, such as gripper action, will only be enacted when the robot has reached the correct location. Object recognition requires the combination of various subsystems such as colour identification and edge detection. As with the human brain, the system incorporates feedback mechanisms to ensure that it is fulfilling the aims of the instruction.

The verbal instructions in the scenario are based on the MirrorBot grammar. The grammar was agreed in a common meeting and was specifically designed to reflect neuroscience findings (e.g. the hand, head, body distinction) directly in the scenario grammar. This grammar consists of 31 words and allows the construction of a set of basic instructions to manipulate the behaviour of the MirrorBot robot. The grammar contains two agents, the robot and the human, sets of body-part-related actions, directions, object colours and the actual objects.

Grammar formal description

agent ::= SAM action | BOT action

action ::= body_action | head_action | hand_action | stop

body_action ::= go object | movebody x_direction | turnbody y_direction

head_action ::= turnhead ( y_direction | z_direction ) | show object

hand_action ::= pick object | put object | lift object | drop object | touch object

x_direction ::= forward | backward

y_direction ::= left | right

z_direction ::= up | down

object ::= [colour] natural_object | [colour] artefact_object

colour ::= brown | blue | black | white

natural_object ::= nut | plum | dog | cat

artefact_object ::= desk | wall | ball | cup
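
As an illustration only, the following Python sketch shows how a sentence conforming to this grammar could be checked and interpreted with a small recursive-descent parser. It is not part of the MirrorBot software; all function and variable names are invented for this example.

    # Hypothetical sketch of a recursive-descent parser for the MirrorBot grammar.
    COLOURS = {"brown", "blue", "black", "white"}
    NATURAL_OBJECTS = {"nut", "plum", "dog", "cat"}
    ARTEFACT_OBJECTS = {"desk", "wall", "ball", "cup"}
    X_DIRECTIONS = {"forward", "backward"}
    Y_DIRECTIONS = {"left", "right"}
    Z_DIRECTIONS = {"up", "down"}

    def parse(sentence):
        """Parse a MirrorBot instruction into agent, action and arguments."""
        words = sentence.lower().split()
        if not words or words[0] not in ("sam", "bot"):
            raise ValueError("sentence must start with SAM or BOT")
        return {"agent": words[0], **parse_action(words[1:])}

    def parse_action(words):
        if not words:
            raise ValueError("missing action")
        verb, args = words[0], words[1:]
        if verb == "stop":
            return {"action": "stop"}
        if verb in ("go", "show", "pick", "put", "lift", "drop", "touch"):
            return {"action": verb, "object": parse_object(args)}
        if verb == "movebody":
            return {"action": verb, "direction": expect(args, X_DIRECTIONS)}
        if verb == "turnbody":
            return {"action": verb, "direction": expect(args, Y_DIRECTIONS)}
        if verb == "turnhead":
            return {"action": verb, "direction": expect(args, Y_DIRECTIONS | Z_DIRECTIONS)}
        raise ValueError("unknown action word: %s" % verb)

    def parse_object(words):
        colour = None
        if words and words[0] in COLOURS:
            colour, words = words[0], words[1:]
        if words and words[0] in NATURAL_OBJECTS | ARTEFACT_OBJECTS:
            return {"colour": colour, "name": words[0]}
        raise ValueError("expected an object name")

    def expect(words, allowed):
        if words and words[0] in allowed:
            return words[0]
        raise ValueError("expected one of %s" % sorted(allowed))

    # Example: parse("BOT show blue ball") ->
    #   {'agent': 'bot', 'action': 'show', 'object': {'colour': 'blue', 'name': 'ball'}}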

3. Software Algorithm

The software algorithm is based on both a concurrent and a sequential organization of the various modules, and it is therefore necessary to coordinate them in order to complete a given task; a minimal coordination sketch is given after the phase descriptions below. Let us consider the scenario where the robot is not in front of a table and a sentence is pronounced.

Phase 1:

Using models described in [Elshaw et al. 2004, Weber et al. 2004, Knoblauch et al. 2004, Wermter and Elshaw 2004], the robot first needs to perform the following tasks concurrently:

·  Wandering

·  Identifying table

·  Sentence processing

·  Command sentence understanding

Once a table has been identified, the robot stops the “identifying” and “wandering” tasks.

Phase 2:

Using models described in [Elshaw et al. 2004, Weber et al. 2004], the robot now needs to approach the table and localize the objects specified through language:

·  Approaching

·  Sentence processing

·  Command sentence understanding

The sentence processing has to be completed by the end of this phase because it provides the necessary information about the object to look at, point at or grasp. In the end, the robot needs to know the object, its colour and the action to perform.

This sentence processing and command understanding is carried out by a cell assembly-based model of several visual, language, planning, and motor areas to enable the robot to understand and react to simple spoken commands. The essential idea is that different cortical areas represent different aspects of the same entity, and that the long-range cortico-cortical projections represent hetero-associative memories that translate between these aspects or representations.
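
To make the hetero-associative idea concrete, the sketch below implements a minimal Willshaw-style binary associative memory in Python. This is an assumed, simplified stand-in for illustration; the cell-assembly model actually used is the one described in [Knoblauch et al. 2004], and the patterns here are invented.

    # Minimal Willshaw-style binary hetero-associative memory (illustration only).
    import numpy as np

    class HeteroAssociativeMemory:
        def __init__(self, n_input, n_output):
            # Binary weight matrix standing in for a cortico-cortical projection.
            self.weights = np.zeros((n_output, n_input), dtype=np.uint8)

        def store(self, x, y):
            """Associate a binary input pattern x with a binary output pattern y."""
            self.weights |= np.outer(y, x).astype(np.uint8)

        def recall(self, x):
            """Retrieve the output pattern associated with x (threshold = |x|)."""
            dendritic_sum = self.weights @ x
            return (dendritic_sum >= x.sum()).astype(np.uint8)

    # Example: translate a "word" representation into an "object" representation.
    word = np.array([1, 0, 1, 0, 0, 1], dtype=np.uint8)   # invented pattern for a word
    obj = np.array([0, 1, 0, 1], dtype=np.uint8)          # corresponding visual pattern
    memory = HeteroAssociativeMemory(n_input=6, n_output=4)
    memory.store(word, obj)
    assert (memory.recall(word) == obj).all()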

Phase 3:

Using models described in [Kaufmann et al. 2005, Knoblauch et al. 2004, Rougier and Vitay 2005, Foster et al. 2000], the robot now needs to localize the object more precisely:

·  Sequentially localizing objects of the given colour

·  Identifying the focused object

·  Comparing the focused object with the target object

This phase will not terminate until the target object is actually focused.

During this phase, the distributed representation of the sensory information has to be processed coherently in order to generate relevant actions. For this purpose, we propose a model of visual exploration of a scene by means of localized computations in neural populations, whose architecture allows a coherent behaviour of sequential scanning of salient stimuli to emerge. The identification part relies on an object identification system that identifies objects using a hierarchical class grouping that meets real-time constraints.
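
As a simplified illustration of this sequential scanning behaviour (the actual mechanism is a dynamic neural population model, see [Rougier and Vitay 2005]), the sketch below repeatedly selects the most salient location in a saliency map and then suppresses it, so that attention moves on to the next stimulus (inhibition of return). The saliency map and its values are invented for the example.

    # Illustrative sketch of sequential scanning of salient stimuli with
    # inhibition of return; not the actual neural population model.
    import numpy as np

    def scan_salient_locations(saliency, n_fixations=3, inhibition_radius=1):
        """Return the n most salient locations, suppressing each one after it is visited."""
        saliency = saliency.astype(float).copy()
        fixations = []
        for _ in range(n_fixations):
            # Winner-take-all: pick the currently most salient location.
            y, x = np.unravel_index(np.argmax(saliency), saliency.shape)
            fixations.append((int(y), int(x)))
            # Inhibition of return: suppress a neighbourhood around the winner.
            y0, y1 = max(0, y - inhibition_radius), y + inhibition_radius + 1
            x0, x1 = max(0, x - inhibition_radius), x + inhibition_radius + 1
            saliency[y0:y1, x0:x1] = 0.0
        return fixations

    # Example saliency map with two salient blobs (e.g. two coloured objects).
    saliency_map = np.zeros((5, 5))
    saliency_map[1, 1] = 0.9   # first object
    saliency_map[3, 4] = 0.6   # second object
    print(scan_salient_locations(saliency_map, n_fixations=2))   # [(1, 1), (3, 4)]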

Phase 4:

Using models described in [Elshaw et al. 2004, Weber et al. 2004a, Weber et al. 2004b] and depending on the action, there are now three possible sub-phases:

Phase 4.1: look

Point camera at focused object.

Phase 4.2: point

Dock toward the focused object.

Phase 4.3: grasp

Dock toward the focused object.

Grasp the object.

When either 4.1, 4.2 or 4.3 is completed, the task is considered to be done.
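
The coordination of these phases can be sketched, purely for illustration, as the simple sequential loop below. The robot and command objects and all method names (wander, table_visible, approach_table, and so on) are hypothetical stand-ins for the actual MirrorBot modules, and the concurrency within each phase is reduced to interleaving inside a loop.

    # Hypothetical coordination sketch of the four phases; the methods called
    # stand in for the real MirrorBot modules.
    def run_scenario(robot, command):
        # Phase 1: wander and process the sentence until a table is identified.
        while not robot.table_visible():
            robot.wander()
            command.process_next_word()

        # Phase 2: approach the table; sentence processing must finish here.
        while not robot.at_table():
            robot.approach_table()
            command.process_next_word()
        target_object, target_colour, action = command.result()

        # Phase 3: scan objects of the given colour until the target is focused.
        while robot.focused_object() != target_object:
            robot.focus_next_object(colour=target_colour)

        # Phase 4: look, point, or grasp, depending on the instructed action.
        if action == "look":
            robot.point_camera_at_focus()
        elif action == "point":
            robot.dock_toward_focus()
        elif action == "grasp":
            robot.dock_toward_focus()
            robot.grasp_focus()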

The main difficulty in Phase 4 is to have the robot approach the table so that it can grasp an object. One constraint is that the PeopleBot robot has a short non-extendable gripper and wide “shoulders”. Therefore, it must approach the table at a perpendicular angle so that the gripper can reach over it. The implemented solution for this stage is based solely on neural networks: object recognition and localisation is trained, motivated by insights from the lower visual system. Based on the perceived location obtained in this way, we train a value function unit and four motor units via reinforcement learning. After training, the robot can approach the table at the correct position and at a perpendicular angle.
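
The sketch below illustrates the kind of reinforcement-learning update involved, using a simplified tabular temporal-difference rule: the state is a discretized perceived location, the four actions correspond to coarse motor commands, and reward is given at the docking position. The grid size, goal position and learning parameters are invented for the example; the actual docking system is the neural network described in [Weber et al. 2004b].

    # Simplified tabular stand-in for value-function based docking (the real
    # system is a neural network trained on camera input, see Weber et al. 2004b).
    import random

    ACTIONS = {"forward": (0, 1), "backward": (0, -1), "left": (-1, 0), "right": (1, 0)}
    GRID = 10                      # discretized perceived locations (GRID x GRID)
    GOAL = (5, 9)                  # assumed docking position in front of the table
    ALPHA, GAMMA, EPISODES = 0.1, 0.9, 2000

    value = {(x, y): 0.0 for x in range(GRID) for y in range(GRID)}

    def step(state, action):
        dx, dy = ACTIONS[action]
        x = min(GRID - 1, max(0, state[0] + dx))
        y = min(GRID - 1, max(0, state[1] + dy))
        next_state = (x, y)
        reward = 1.0 if next_state == GOAL else 0.0
        return next_state, reward

    for _ in range(EPISODES):
        state = (random.randrange(GRID), random.randrange(GRID))
        while state != GOAL:
            action = random.choice(list(ACTIONS))        # exploratory policy
            next_state, reward = step(state, action)
            # Temporal-difference update of the value function unit.
            value[state] += ALPHA * (reward + GAMMA * value[next_state] - value[state])
            state = next_state

    # After training, a greedy policy moves toward the docking position:
    def best_action(state):
        return max(ACTIONS, key=lambda a: value[step(state, a)[0]])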

4. References

Elshaw M., Weber C., Zochios A., Wermter S. A Mirror Neuron Inspired Hierarchical Network for Action Selection. Proceedings of the NeuroBotics Workshop, Ulm, Germany, pp. 98-105, September 2004.

Foster D.J., Morris R.G.M. and Dayan P. A Model of Hippocampally Dependent Navigation, Using the Temporal Difference Learning Rule. Hippocampus, Vol. 10, pp. 1-16, 2000.

Kaufmann U., Fay R., Palm G. Neural Networks for Visual Object Recognition Based on Selective Attention. 1st International SenseMaker Workshop on Life-Like Perception Systems, 2005.

Knoblauch A., Markert H., Palm G. An Associative Model of Cortical Language and Action Processing. Proceedings of NCPW9, Plymouth, UK, September 2004.

Rougier N., Vitay J. Emergence of Attention within a Neural Population. Submitted, 2005.

Weber C., Elshaw M., Zochios A., Wermter S. A Multimodal Hierarchical Approach to Robot Learning by Imitation. Fourth International Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems, Genoa, Italy, pp. 131-134, 2004a.

Weber C., Wermter S., Zochios A. Robot Docking with Neural Vision and Reinforcement. Knowledge Based Systems, Vol. 12, No. 2-4, pp. 165-172, 2004b.