Commentary on Joel Norman: Two Visual Systems and Two Theories of Perception: An Attempt to Reconcile The Constructivist and Ecological Approaches

Abstract: 29 words
Main Text: 996 words
References: 177 words
Total Text: 1202 words

Conceptual Space as a Connection between the Constructivist and the Ecological Approaches in a Robot Vision System

Antonio Chella
Dipartimento di Ingegneria Automatica e Informatica,
University of Palermo,
Palermo,
90128
Italy
chella@unipa.it
http://www.csai.unipa.it


Abstract

The conceptual space (Gärdenfors 2000) is discussed as a representation structure that connects the constructivist and the ecological vision subsystems in an operating autonomous robot based on computer vision.


The two vision subsystems discussed by Norman and based on the constructivist and the ecological approaches have an immediate counterpart in the design of robotic architectures based on computer vision. On the one side, the ecological approach is adopted to design robot behaviors that reactively connect the information acquired by cameras and other sensors to robot actions, as in obstacle avoidance, path following, and orienting the robot towards a goal (see Arkin 1998).

On the other side, the constructivist approach is adopted to design the object recognition system of the robot, i.e., the high-level vision algorithms that let the robot identify and recognize the objects on which it needs to act in its working environment. In general, a robot object recognition system generates a 2D/3D observer-independent reconstruction of the objects in the perceived scene. This reconstruction results from information-processing tasks that receive raw, weakly structured information (the data acquired by cameras) as input and give as output highly structured data employed for identification and recognition (see Ullman 1996 and Edelman 1999 for examples). Several proposals for connecting the two subsystems in operating robots are described in the literature (see Kortenkamp et al. 1998 for a review).

The autonomous robot operating at the Robotics Laboratory of the University of Palermo (an RWI B21 equipped with a stereo head) connects the two subsystems by adopting a theoretically motivated approach based on the conceptual space (CS; Gärdenfors 2000). A CS is a metric space whose dimensions are related to the quantities processed by the robot sensors. Examples of dimensions are colour, pitch, volume, and spatial co-ordinates. In any case, dimensions do not depend on any specific linguistic description: a generic conceptual space comes before any symbolic-propositional characterization of cognitive phenomena.

A knoxel is a point in the conceptual space and it represents the epistemologically primitive element at the considered level of analysis. In the implemented vision system (Chella et al. 1997), in the case of static scenes, a knoxel corresponds to a geon-like 3D geometric primitive, i.e., a superquadric (Pentland 1986). It should be noted that the robot itself is a knoxel in its conceptual space. Therefore, the perceived objects, such as the robot itself, other robots, and the surrounding obstacles, are all reconstructed by means of superquadrics and correspond to suitable sets of knoxels in the robot's CS.
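As an illustration of the kind of primitive involved, the following minimal sketch evaluates the usual implicit (inside-outside) form of a superquadric ellipsoid. The source does not state which parameterization the implemented system uses, so the class name `Superquadric`, the field names, and the choice of this particular implicit form are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Superquadric:
    """Hypothetical container for superquadric parameters (names are assumptions)."""
    a1: float  # semi-axis length along x
    a2: float  # semi-axis length along y
    a3: float  # semi-axis length along z
    e1: float  # shape factor controlling the profile along z
    e2: float  # shape factor controlling the cross-section in the x-y plane

    def inside_outside(self, x: float, y: float, z: float) -> float:
        """Return F(x, y, z): below 1 inside, 1 on the surface, above 1 outside.

        Coordinates are expressed in the superquadric's own frame.
        """
        xy = abs(x / self.a1) ** (2.0 / self.e2) + abs(y / self.a2) ** (2.0 / self.e2)
        return xy ** (self.e2 / self.e1) + abs(z / self.a3) ** (2.0 / self.e1)


# With unit axes and both shape factors equal to 1 the primitive is a unit sphere.
unit_sphere = Superquadric(a1=1.0, a2=1.0, a3=1.0, e1=1.0, e2=1.0)
on_surface = unit_sphere.inside_outside(1.0, 0.0, 0.0)
inside = unit_sphere.inside_outside(0.2, 0.1, 0.0)
outside = unit_sphere.inside_outside(2.0, 0.0, 0.0)
```

Shape factors well below 1 give box-like primitives, which is one reason a small number of parameters can cover a geon-like repertoire of shapes.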

Some dimensions of the CS are related to the knoxel's shape (the axis lengths and the shape factors), which come from the robot's constructivist subsystem, while other dimensions are related to the displacement in space of the knoxel (the position of the center and the orientation of the axes), which come from the robot's ecological subsystem, in Norman's terms. The conceptual space therefore results from the connection of the two subsystems, and it contains all the information the robot needs to describe the represented objects in symbolic terms and, at the same time, to act in its environment (Chella et al. 1998).
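The split between shape dimensions and displacement dimensions can be sketched as follows. This is only an illustrative data structure: the class name `Knoxel`, the use of three orientation angles, and the plain Euclidean distance are assumptions, since the source says only that the CS is a metric space and does not specify the metric or the encoding of orientation.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Knoxel:
    """Hypothetical static knoxel: one point in the conceptual space."""
    # Shape dimensions (attributed in the text to the constructivist subsystem).
    axes: Tuple[float, float, float]
    shape_factors: Tuple[float, float]
    # Displacement dimensions (attributed in the text to the ecological subsystem).
    center: Tuple[float, float, float]
    orientation: Tuple[float, float, float]  # assumed: three angles in radians

    def as_vector(self) -> Tuple[float, ...]:
        """Concatenate all dimensions into a single point of the CS."""
        return (*self.axes, *self.shape_factors, *self.center, *self.orientation)


def distance(k1: Knoxel, k2: Knoxel) -> float:
    """Assumed metric: unweighted Euclidean distance between the two points.

    A real system would probably weight the dimensions and treat angles with care.
    """
    return math.dist(k1.as_vector(), k2.as_vector())


# Two knoxels with identical shape that differ only in the position of the center.
ball_here = Knoxel((0.1, 0.1, 0.1), (1.0, 1.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0))
ball_there = Knoxel((0.1, 0.1, 0.1), (1.0, 1.0), (3.0, 4.0, 0.0), (0.0, 0.0, 0.0))
separation = distance(ball_here, ball_there)
```

Keeping both groups of dimensions in one vector is what allows the same point to serve symbolic description (through its shape) and action (through its displacement).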

To account for dynamic scenes, the robot CS is generalized to represent moving and interacting entities (Chella et al. 2000). In this case, an intrinsically dynamic conceptual space is adopted. Simple perceived motions are categorized as wholes, and not as sequences of static frames. In other words, simple motions of superquadrics are the perceptual primitives for motion perception. According to this hypothesis, every knoxel corresponds to a simple motion of a superquadric, expressed by adding suitable dimensions in CS that describe the variation in time of the knoxel. For example, considering the knoxel describing a rolling ball, the robot's dynamic conceptual space takes into account not only the shape and position of the ball, but also its speed and acceleration as added dimensions (Marr and Vaina 1982). So, when the robot chases the rolling ball, it represents this action in its dynamic CS as a set of two knoxels, corresponding to the moving ball and the chasing robot itself. In this case too, the dynamic conceptual space results from the connection of the two subsystems proposed by Norman.
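The rolling-ball example can be sketched by extending a static description with motion dimensions. The class name `DynamicKnoxel`, the use of velocity and acceleration vectors, the `label` field, and all numeric values are assumptions chosen for illustration; orientation is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class DynamicKnoxel:
    """Hypothetical knoxel of a dynamic CS: a simple motion of a superquadric."""
    label: str                          # readability tag only, not a CS dimension
    axes: Vec3                          # shape: axis lengths
    shape_factors: Tuple[float, float]  # shape: superquadric shape factors
    center: Vec3                        # displacement: position of the center
    velocity: Vec3                      # added dimension: variation of position in time
    acceleration: Vec3                  # added dimension: variation of velocity in time

    def as_vector(self) -> Tuple[float, ...]:
        """Concatenate the static and the added motion dimensions."""
        return (*self.axes, *self.shape_factors, *self.center,
                *self.velocity, *self.acceleration)


# Illustrative values only: a decelerating ball and a robot chasing it.
ball = DynamicKnoxel("ball", (0.1, 0.1, 0.1), (1.0, 1.0),
                     (2.0, 0.0, 0.1), (0.5, 0.0, 0.0), (-0.05, 0.0, 0.0))
robot = DynamicKnoxel("robot", (0.25, 0.25, 0.5), (0.1, 1.0),
                      (0.0, 0.0, 0.5), (0.6, 0.0, 0.0), (0.0, 0.0, 0.0))

# The chasing action is represented as a set of two knoxels.
chase_scene = frozenset({ball, robot})
```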

This new conceptual space allows the robot to represent and recognize dynamic scenes, in particular scenes in which the robot itself moves in a dynamic environment. In this case, the behaviors of the ecological subsystem receive feedback and control from the CS during their own operations. The feedback is employed to monitor the operations of the behaviors in order to obtain satisfactory performance. This is another example of the connections between the two subsystems described by Norman.

The dynamic CS representation lets the robot anticipate possible future interactions with the objects in the environment (Gärdenfors 1997). In fact, the interaction between the robot and a generic object (e.g., the ball previously described) is represented as a sequence of sets of knoxels in CS. This sequence can be imagined and simulated in the robot's CS before the interaction actually happens in the real world. In the implemented robot system, the imagined sequence of knoxel sets is recalled by a recurrent neural network (Elman 1990) receiving as input the knoxels describing the robot in the current environment. For example, when the robot perceives the ball at rest, it can imagine bumping it, or, when the robot perceives the rolling ball, it can imagine stopping it. Therefore, the CS may represent simple forms of object affordances.
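The recall of an imagined sequence by a simple recurrent network can be sketched as below. This is not the implemented system: the weights are random and untrained, the layer sizes are arbitrary, the tanh activation is a convenience choice, and feeding each output back as the next input is an assumption about how a sequence might be unrolled. The names `ElmanNetwork`, `step`, and `imagine` are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ElmanNetwork:
    """Minimal Elman-style network: the hidden state is copied into context units."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def rand_matrix(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_in = rand_matrix(n_hidden, n_in)       # input -> hidden
        self.w_ctx = rand_matrix(n_hidden, n_hidden)  # context -> hidden
        self.w_out = rand_matrix(n_out, n_hidden)     # hidden -> output
        self.context: Vector = [0.0] * n_hidden

    def step(self, x: Vector) -> Vector:
        """One time step: combine input with context, then update the context."""
        pre = [a + b for a, b in zip(matvec(self.w_in, x),
                                     matvec(self.w_ctx, self.context))]
        hidden = [math.tanh(p) for p in pre]
        self.context = hidden  # the hidden activations become the next context
        return matvec(self.w_out, hidden)

    def imagine(self, start: Vector, steps: int) -> List[Vector]:
        """Unroll a sequence by feeding each output back as the next input.

        Requires n_out == n_in; this feedback scheme is an assumption.
        """
        sequence: List[Vector] = []
        x = start
        for _ in range(steps):
            x = self.step(x)
            sequence.append(x)
        return sequence


# A toy 4-dimensional "knoxel" stands in for the full CS vector.
net = ElmanNetwork(n_in=4, n_hidden=6, n_out=4)
imagined = net.imagine([0.1, 0.2, 0.3, 0.4], steps=3)
```

With trained weights, each element of `imagined` would be read as a predicted knoxel; here the values are meaningless and only the data flow is shown.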

Moreover, the rolling ball may disappear from the robot's field of view because of an occluding obstacle. In this case, the robot represents the ball's trajectory in its CS by the associative mechanism previously outlined, and it anticipates the ball's future positions. Here too, the CS representation is employed to drive the behaviors of the ecological subsystem of the robot so that it can catch the ball. In this sense, the CS allows for the description of some forms of high-level, conceptual affordances that allow the robot to represent immediate action plans.
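To make the idea of anticipating an occluded ball concrete, the sketch below uses plain constant-acceleration kinematics as a stand-in. This is explicitly not the associative, network-based mechanism described in the text; it only shows how position, speed, and acceleration dimensions suffice in principle to extrapolate a position. The function name `extrapolate_position` and the numeric values are assumptions.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def extrapolate_position(position: Vec3, velocity: Vec3,
                         acceleration: Vec3, dt: float) -> Vec3:
    """Constant-acceleration prediction: p + v*dt + 0.5*a*dt^2, per coordinate.

    Stand-in technique for illustration, not the mechanism of the source system.
    """
    x, y, z = (p + v * dt + 0.5 * a * dt * dt
               for p, v, a in zip(position, velocity, acceleration))
    return (x, y, z)


# Last observed state of the ball before it disappears behind the obstacle.
last_seen_position = (2.0, 0.0, 0.1)
last_seen_velocity = (0.5, 0.0, 0.0)
last_seen_acceleration = (-0.05, 0.0, 0.0)

# Anticipated position two seconds after the ball was last seen.
anticipated = extrapolate_position(last_seen_position, last_seen_velocity,
                                   last_seen_acceleration, dt=2.0)
```

A behavior of the ecological subsystem could then be driven towards `anticipated` instead of the last observed position.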



References

Arkin, R. (1998) Behavior-Based Robotics. MIT Press, Cambridge, MA.
Chella, A., Frixione, M. & Gaglio, S. (1997) A cognitive architecture for artificial vision. Artificial Intelligence 89:73-111.
Chella, A., Frixione, M. & Gaglio, S. (1998) An architecture for autonomous agents exploiting conceptual representations. Robotics and Autonomous Systems 25:231-240.
Chella, A., Frixione, M. & Gaglio, S. (2000) Understanding dynamic scenes. Artificial Intelligence 123:89-132.
Edelman, S. (1999) Representation and Recognition in Vision. MIT Press, Cambridge, MA.
Elman, J.L. (1990) Finding structure in time. Cognitive Science 14:179-211.
Gärdenfors, P. (1997) The role of memory in planning and pretense. Behavioral and Brain Sciences 20:24-25.
Gärdenfors, P. (2000) Conceptual Spaces. MIT Press, Cambridge, MA.
Kortenkamp, D., Bonasso, R.P. & Murphy, R. (1998) Artificial Intelligence and Mobile Robots - Case Studies of Successful Robot Systems. AAAI Press/MIT Press, Menlo Park, CA.
Marr, D. & Vaina, L. (1982) Representation and recognition of the movements of shapes. Proc. R. Soc. Lond. B 214:501-524.
Pentland, A. (1986) Perceptual organization and the representation of natural form. Artificial Intelligence 28:293-331.
Ullman, S. (1996) High-level Vision. MIT Press, Cambridge, MA.


Acknowledgements

I am grateful to Marcello Frixione, Salvatore Gaglio and Peter Gärdenfors.