GRAPHICAL REPRESENTATIONS OF EMOTIONAL AGENTS

by

Srividya Dantuluri


Chapter 1

Introduction

The purpose of this report is to survey the literature on the graphical representation of believable agents and to identify the basic requirements for a tool or a theory in this area. We will identify the key issues that must be addressed in creating a credible virtual being and analyze the current research with respect to these issues.

1.1 Why do we need believable agents?

Extensive research has shown that users interpret the actions of computer systems using the same conventions and social rules used to interpret actions of humans [Oz, 1997]. This is more pronounced in the case of anthropomorphic software interface agents.

Software agents are expected to be believable. Believability here does not mean that the agent always speaks the truth or is perfectly reliable. Rather, the user interacting with a believable agent should feel that he or she is dealing with a life-like character instead of a lifeless computer. If humans sympathize with an agent and accept it as human-like, they will be able to communicate with it better. Agents that are not believable are viewed as lifeless machines [Chandra, 1997].

Research in the implementation of believable agents does not aim to trick the user into believing that he or she is communicating with a human. Rather, it has a more benign purpose: as designers start building agent systems with emotions, they will need techniques for communicating those emotions to the user. This is precisely why research in believable agents is needed.

A more remote, yet cogent, reason for pursuing this research is to build companions like Data from Star Trek, a goal that has been termed the AI dream [Oz, 1997]. In the words of Woody Bledsoe, a former president of AAAI,

“Twenty-five years ago I had a dream, a daydream, if you will. A dream shared with many of you. I dreamed of a special kind of computer, which had eyes and ears and arms and legs, in addition to its "brain" ... my dream was filled with the wild excitement of seeing a machine act like a human being, at least in many ways.”

Research in believable agents deals directly with building complete agents with personality and emotion, and thus provides a new avenue for pursuing the AI dream [Oz, 1997].

The current literature refers to believable agents as “virtual humans”. Animated life-like characters are also called avatars [Leung, 2001]. The word “avatar” originates in Hinduism, where it means “an incarnation of a Hindu deity, in human or animal form, an embodiment of a quality or concept, or a temporary manifestation of a continuing entity” [Dictionary]. For the purpose of this report, an avatar can be taken to be a vivid representation of a user in a virtual environment. The term “avatar” is also used to denote the graphical representation of a software agent in a Collaborative Virtual Environment (CVE), a multi-user virtual space in which each user is represented by a 3D figure [Salem, 2000].

The terms believable agents, animated life-like agents, virtual humans, and avatars are used interchangeably in this report.

1.2 What are the requirements for believable animation?

The Oz group at Carnegie Mellon University’s School of Computer Science identified the following requirements for believability [Oz, 1997]:

·  Personality: Personality is the attribute that distinguishes one character from another. It includes everything unique and specific about the character, from the way they talk to the way they think.

·  Emotions: Emotion can be defined as a mental state that arises involuntarily in response to the current situation [Dictionary]. The range of emotions a character exhibits is personality-specific: given the same situation, characters with different personalities react with different emotions.

Personality and emotion are closely related [Gratch, 2002]. [Moffat, 1997] observes that personality remains stable over an extended period of time, whereas emotions are short-term. Furthermore, while emotions focus on particular situations, events, or objects, the elements determining personality are more extended and indirect.

Apart from personality and emotions, mood is another important attribute that must be considered when working with emotional agents. Mood and emotion differ along two dimensions, duration and intensity: emotions are short-lived and intense, whereas moods last longer and have a lower intensity [Descamps, 2001].
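To make these distinctions concrete, the following minimal Python sketch models personality as a set of fixed traits, while emotions and moods are time-varying values that differ in decay rate and intensity. The class name, trait names, and decay constants are illustrative assumptions, not drawn from any of the cited systems.

```python
class AffectiveState:
    """Toy affect model: personality is fixed; emotion and mood decay over time."""

    def __init__(self, personality):
        self.personality = personality  # stable traits, e.g. {"extraversion": 0.8}
        self.emotion = {}               # short-lived and intense, e.g. {"joy": 0.9}
        self.mood = {}                  # long-lived and low-intensity

    def feel(self, name, intensity):
        """Trigger an intense, short-lived emotion; it also nudges the mood."""
        self.emotion[name] = intensity
        # Emotions feed the mood weakly, so repeated events shift it slowly.
        self.mood[name] = self.mood.get(name, 0.0) + 0.1 * intensity

    def update(self, dt):
        """Decay both states; emotions fade far faster than moods."""
        for state, rate in ((self.emotion, 0.5), (self.mood, 0.01)):
            for name in state:
                state[name] *= max(0.0, 1.0 - rate * dt)


agent = AffectiveState({"extraversion": 0.8})
agent.feel("joy", 0.9)
agent.update(dt=1.0)  # the joy emotion halves; the joyful mood barely changes
```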

Building a virtual human involves joining traditional artificial intelligence with computer graphics and social science [Gratch, 2002]. Synthesizing a human-like body that can be controlled in real time draws on computer graphics and animation. Once the virtual human starts looking like a human, people expect it to behave like one too. To be believable, an intelligent human-like agent needs to possess personality and to display mood and emotion [Chandra, 1997]. Thus, research in building a believable agent or a virtual human must draw heavily on psychology and communication theory to adequately convey nonverbal behavior, emotion, and personality.

The key to realistic animation starts with creating believable avatars [Pina, 2002]. The expressiveness of an avatar is considered crucial to its communication capabilities [Salem, 2000]. The advantage of creating an embodiment for an agent (an avatar) is that it makes the agent anthropomorphic and provides a more natural way of interacting with it.

Gratch [Gratch, 2002] identifies the key issues that must be addressed in the creation of virtual humans as face-to-face conversations, emotions and personality, and human figure animation. The avatar can be animated to create body movements, hand gestures, facial expressions and lip synchronization.

[Noma, 2000] specifies that the animation of a virtual human should possess the following properties:

·  Natural motion: To be plausible, the virtual human’s motion should look as natural as possible, and its body language must be human-like.

·  Speech synchronization: The body motion, in particular the lip movement of the virtual human, should be synchronized with the speech.

·  Proper controls: The user should be able to control the agent and to make changes when needed. In particular, the user should be able to represent the basic emotions and combine them into more complex ones (a sketch of such a control layer follows this list).

·  Widespread system applicability: The tools for developing applications with animated life-like agents should integrate with current animation and interface systems.
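The requirement that users be able to combine basic emotions can be illustrated with a small sketch. The control layer below represents composite expressions as normalized weighted blends over a fixed set of basic emotions (the six listed follow Ekman’s commonly cited set); the function name and the downstream mapping to animation parameters are assumptions for illustration.

```python
# Hypothetical control layer: composite expressions as weighted blends of basics.
BASIC_EMOTIONS = ("joy", "sadness", "anger", "fear", "surprise", "disgust")

def blend(weights):
    """Normalize user-supplied weights over the basic emotions.

    `weights` maps basic emotions to non-negative weights; the result is a
    normalized blend vector that an animation layer could map onto morph
    targets or gesture parameters.
    """
    for name in weights:
        if name not in BASIC_EMOTIONS:
            raise ValueError(f"unknown basic emotion: {name}")
    total = sum(weights.values())
    if total <= 0:
        return {name: 0.0 for name in BASIC_EMOTIONS}
    return {name: weights.get(name, 0.0) / total for name in BASIC_EMOTIONS}

# A "bittersweet" expression as an equal mix of joy and sadness.
print(blend({"joy": 1.0, "sadness": 1.0}))
```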

It has been recognized that the non-verbal aspect of communication plays an important role in the daily lives of humans [Tosa, 1996]. Human face-to-face conversation involves sending and receiving information through both verbal and non-verbal channels. Having a human-like agent makes it easier to understand the aim of a conversation and provides an opportunity to exploit some of the advantages of Non-Verbal Communication (NVC), like facial expressions and gestures [Salem, 2000].

Body animation, gestures, facial expressions, and lip synchronization are all very important for Non-Verbal Communication. The face exhibits emotions while the body demonstrates mood [Salem, 2000]. In some applications, only the head and shoulders of a person fill the screen, but this does not imply that facial animation is more important than body animation. In applications where avatars are relatively far from each other, facial expressions can be too subtle and easily missed; in such situations, gestures are more important. Therefore, an animation technique that provides both facial and body animation is deemed necessary.

Body gestures, facial expressions, and acoustic realization act as efficient vehicles for conveying emotions [Gratch, 2002]. Animation techniques are required to encompass body gestures, locomotion, hand movements, body pose, faces, eyes, speech, and other physiological necessities like breathing, blinking, and perspiring [Gratch, 2002].

Additionally, [Gratch, 2002] specifies the following requirements for the control architecture of believable agents:

·  Conversational support: Initiating a conversation, giving up the floor, and acknowledging the other person are all important features of human face-to-face communication, and the architecture used to build virtual humans should support such actions. For example, repeatedly looking at the other person can signal that the speaker is yielding the floor or waiting for the other person to speak, and a quick nod as the speaker finishes a sentence acts as an acknowledgement (a sketch of such a cue mapping follows this list).

·  Seamless transition: A few behaviors, such as gestures, require the virtual human to reconfigure its limbs from time to time. The architecture should allow the transition from one posture to another to be smooth and seamless.
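One simple way to realize conversational support is a table that maps conversational functions to nonverbal behaviors, which the animation layer then triggers. The following sketch merely mirrors the examples above; all function and behavior names are hypothetical.

```python
# Hypothetical mapping from conversational functions to nonverbal behaviors.
CONVERSATIONAL_CUES = {
    "initiate":    ["orient_toward_listener", "raise_eyebrows"],
    "give_floor":  ["gaze_at_listener", "pause_gesturing"],
    "acknowledge": ["quick_nod"],
}

def cues_for(function):
    """Return the behaviors the animation layer should trigger for a function."""
    return CONVERSATIONAL_CUES.get(function, [])

print(cues_for("give_floor"))  # ['gaze_at_listener', 'pause_gesturing']
```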

In summary, it is generally agreed that techniques for building animated life-like agents are expected to synthesize virtual humans that exhibit plausible body animation, gestures, facial animation, and lip synchronization.

Based on the above requirements, we have compiled several interesting questions in order to analyze the existing research in the graphical representation of emotional agents. They are as follows:

·  How does the technique arrive at a set of useful gestures or expressions?

·  Can gestures and expressions be triggered in a more natural way than selection from a toolbar?

·  How is it ensured that all the gestures and expressions are synchronized?

·  Can the user control the gestures and expressions?

·  Can the technique handle multiple emotions simultaneously?

·  Does the system represent the mood of the agent?

·  Does the technique provide both facial animation and body animation?

·  Does the technique provide conversation support (gestures or expressions that indicate desire to initiate a conversation, to give up the floor and to acknowledge the speaker)?

·  How does the technique decide the mapping between emotion or mood and graphics?

·  Is the technique extensible?

·  How many emotions does the technique represent? Is it possible to form complicated emotions using a combination of the represented emotions?

·  Is the animation seamless?

·  Has the technique been evaluated? If so, was the evaluation performed by professionals or by users, was it limited or extensive, and was it scientific?

·  Is there a working model or demonstration available? If so, does it comply with all the claims made?

As mentioned above, body animation, gestures, facial animation, and lip synchronization are the important aspects in the animation of believable agents. This report treats these aspects in Chapters 2, 3, and 4. The additional requirements, if any, for each aspect will be identified and the current literature will be analyzed.

A few existing tools for the creation of animated agents, such as Microsoft Agent, the NetICE project, Ken Perlin’s responsive face, DI-Guy, and the Jack animation system, will be evaluated based on the above-mentioned criteria along with a few additional criteria such as license agreements and cost (Chapter 5).

We will then consider whether any of the existing techniques can be integrated together to provide a complete and plausible animation. The possible difficulties in such integration will be considered (Chapter 6).

This report seeks to explain key features of graphical representations. We will compare and contrast various graphical representations of agents as reported in the current literature. In addition to categorizing the various agents, we will offer some explanation of why various researchers have chosen to include or omit important features and what we see as future trends.

Chapter 2

Body Animation

Animating the human body demands more than just controlling a skeleton. A plausible body animation needs to incorporate intelligent movement strategies and soft, muscle-based body surfaces that change shape when joints move or when an external force is applied to them [Gratch, 2002]. The movement strategies include solid foot contact, proper reach, grasp, and plausible interactions with the agent’s own body and with objects in the environment. The challenge in body animation is to build a life-like animation of the human body with sufficient detail to make both obvious and subtle movements believable. For realistic body animation, it is also necessary to maintain an accurate geometric surface throughout the simulation [Magnenat, 2003]: the shape of the body should not change when viewed from a different angle or when the agent starts moving.

The existing human body modeling techniques can be classified as creative, reconstructive, and interpolated [Seo, 2003b; Magnenat, 2003; Seo, 2003a].

The creative modeling techniques use multiple layers to mimic the individual muscles, bones, and tissues of the human body. The muscles and bones are modeled as triangle meshes and ellipsoids [Scheepers, 1997]. Muscles are designed in such a way that they change shape when the joints move. The skin is generated by filtering and extracting a polygonal “isosurface” [Wilhelms, 1997]. An isosurface is defined as “a surface in 3D space, along which some function is constant” [Jean]. The isosurface representation takes a set of inputs and draws a 3D surface corresponding to points with a single scalar value [Iso]. Put more simply, the isosurface is a 3D surface whose vertices can be coupled with vertices on another surface, so that when a particular point on the latter moves, the corresponding point on the isosurface is displaced in the same direction by the same amount. In creative modeling techniques, the vertices on the isosurface are coupled with the underlying muscles, which makes the skin motion consistent with the muscle motion and the joint motion.
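The coupling just described can be sketched in a few lines: each skin (isosurface) vertex is bound to a muscle-surface vertex, and any displacement applied to the muscle vertex is propagated unchanged to the skin vertex. This is a minimal illustration of the idea, not the actual mechanism of the cited systems; all names and data layouts are illustrative.

```python
def displace_coupled(skin, muscle, coupling, displacements):
    """Propagate muscle-vertex displacements to coupled skin vertices.

    skin, muscle:   lists of [x, y, z] vertex positions
    coupling:       dict mapping skin vertex index -> muscle vertex index
    displacements:  dict mapping muscle vertex index -> (dx, dy, dz)
    """
    # Move the muscle vertices themselves.
    for m_idx, delta in displacements.items():
        for axis in range(3):
            muscle[m_idx][axis] += delta[axis]
    # Each coupled skin vertex follows its muscle vertex exactly.
    for s_idx, m_idx in coupling.items():
        if m_idx in displacements:
            for axis in range(3):
                skin[s_idx][axis] += displacements[m_idx][axis]

skin = [[0.0, 0.0, 1.0]]
muscle = [[0.0, 0.0, 0.5]]
displace_coupled(skin, muscle, coupling={0: 0}, displacements={0: (0.0, 0.1, 0.0)})
print(skin)  # [[0.0, 0.1, 1.0]] -- the skin vertex moved with the muscle
```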

The creative modeling techniques were popular in the late 1990s. Although the resulting simulations look real, they require substantial manual effort from human modelers, resulting in slow production times. Because of these drawbacks, modern systems prefer reconstructive or interpolated models.

The reconstructive approach builds the animation using motion capture techniques [Gratch, 2002]. The captured motion can then be modified using additional techniques. Research [Gleicher, 2001; Tolani, 2000; Lewis, 2000; Sloan, 2001] has shown that producing plausible animations from motion capture is a strenuous task, since it is difficult to maintain “environmental constraints” like proper foot contact, grasping, and interaction with other objects in the environment [Gratch, 2002].

Since motion capture deals with the modification of existing recorded motion, it is quite a challenge to make the animation look plausible. Chances are that the animation will look as if a separate image had been pasted into the existing environment; in other words, the animation may not look like a part of the environment. It is also difficult to modify the generated motion to produce the different body shapes that the user intends, so the user has little control over the animation [Magnenat, 2003].
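To illustrate one of the environmental constraints mentioned above, the following sketch enforces proper foot contact in an edited motion by pinning the foot to the ground on frames flagged as contact frames. Real systems such as those cited combine this with inverse kinematics to adjust the entire leg; this is only a minimal sketch under assumed names and a flat-floor assumption.

```python
GROUND_HEIGHT = 0.0  # assumed flat floor at height zero

def enforce_foot_contact(foot_heights, contact_flags):
    """Clamp the foot to the ground on contact frames to suppress foot skate.

    foot_heights:  the foot's height on each frame of the (edited) motion
    contact_flags: True on frames where the foot should touch the ground
    """
    return [
        GROUND_HEIGHT if in_contact else height
        for height, in_contact in zip(foot_heights, contact_flags)
    ]

# After editing, the foot drifts slightly above the floor on contact frames.
heights = [0.03, 0.01, 0.02, 0.40, 0.80]
contacts = [True, True, True, False, False]
print(enforce_foot_contact(heights, contacts))  # [0.0, 0.0, 0.0, 0.4, 0.8]
```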