Approach to the ACE Software Architecture

ACE Project

Reid Simmons

Robotics Institute

Carnegie Mellon University

September 20, 2007

Introduction

The ACE project will utilize the Syndicate architecture and, more specifically, the Trestle implementation of Syndicate [Sellner et al., 2006]. Syndicate is an extension of the 3T architecture [Bonasso et al., 1997] that deals with multi-robot coordination. The 3T architecture consists of behavioral, executive, and planning layers (Figure 1). Since ACE will not (initially, at least) use the planning layer, this document will not consider it further. We will first describe the general capabilities of 3T’s behavioral and executive layers, and then discuss how Syndicate extends them to multi-robot coordination. We then describe the overall approach to software design that we intend to use for the ACE project.

3T Skill Manager

The behavioral layer of 3T consists of the Skill Manager. A “skill” is a data-flow behavior that runs either in response to the arrival of new data (i.e., event-driven) or periodically (the period is specified when the skill is created). Skills are parameterized and have user-defined input and output ports. A user-defined function is invoked either when new data arrives on an input port (if the skill is event-driven) or when the input ports are filled at the scheduled run time (if the skill is periodic). Running this function may result in the skill placing data on its output ports. If these are, in turn, connected to other enabled skills, then those skills are run as well, either immediately or when their turn comes around. There are perceptual and action skills – perceptual skills have their input ports connected directly to skills that interface to sensors; action skills have their output ports connected directly to skills that interface with actuators. In Syndicate, the convention is that every actuator and sensor has a corresponding skill in the behavioral layer that exposes the associated interface, or data, to the rest of the skills.

The connection between the output port of one skill and the input port of another can be defined either statically (when the skills are created) or dynamically (at run time). Statically defined connections are somewhat more computationally efficient, but dynamically defined connections are more flexible. For instance, the sensor source of a behavior can be chosen dynamically based on current conditions, such as which sensor is currently available or what sensor resolution is needed for a given task.

Skills may be enabled or disabled; only enabled skills are actually run. Data placed on the ports of disabled skills is simply discarded. Skills may be parameterized with user-defined sets of parameters. Skills may also signal events, which are passed up to the executive layer (described below).

The Skill Manager in the original 3T implementation used a declarative (textual) representation of the skill definitions and connections. A translator parsed the representation and generated C code automatically from the descriptions. Our implementation uses templated C++ classes to define new skills and their connections.
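
To give the flavor of this style, the following is a minimal sketch of what a periodic skill might look like. The class names, the port API, and the emit member are our own illustrative assumptions, not the actual Trestle code; the emit member stands in for the statically or dynamically established port connections described above.

    #include <functional>

    // Illustrative sketch only; the actual Trestle skill classes differ.
    // A skill with one typed input port and one typed output port.
    template <typename In, typename Out>
    class Skill {
    public:
      explicit Skill(double periodSec) : period(periodSec) {}
      virtual ~Skill() = default;

      // Invoked by the Skill Manager, either when new data arrives
      // (event-driven) or at each scheduled period (periodic).
      void run() {
        if (enabled && inputFilled && emit) emit(compute(input));
      }

      void enable()  { enabled = true; }
      void disable() { enabled = false; }  // data sent while disabled is lost

      void setInput(const In& v) { input = v; inputFilled = true; }

      // Output port: the Skill Manager wires this, statically at creation
      // or dynamically at run time, to the input ports of downstream skills.
      std::function<void(const Out&)> emit;

    protected:
      virtual Out compute(const In& in) = 0;  // the user-defined function

    private:
      double period;  // used by the Skill Manager's scheduler (not shown)
      bool enabled = false;
      bool inputFilled = false;
      In input{};
    };

    // A perceptual skill that filters raw pose estimates before passing them on.
    struct Pose { double x = 0, y = 0, theta = 0; };

    class SmoothPoseSkill : public Skill<Pose, Pose> {
    public:
      using Skill<Pose, Pose>::Skill;
    protected:
      Pose compute(const Pose& in) override {
        last = { 0.5 * (last.x + in.x), 0.5 * (last.y + in.y),
                 0.5 * (last.theta + in.theta) };
        return last;
      }
    private:
      Pose last;
    };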

3T Skill Executive

The original 3T implementation used RAPs [Firby, 1987] as the underlying reasoning engine for the executive layer. Syndicate uses TDL [Simmons & Apfelbaum, 1998]. We will first describe the general capabilities that the executive provides in the 3T architecture, and then talk about the specifics of our TDL-based implementation.

The executive layer is responsible for executing high-level tasks. A frequently asked question is “what is the difference between behaviors (skills) and tasks?” One answer is that behaviors tend to have a single focus (e.g., “avoid obstacles”, “move along a corridor”) while tasks are more structured (e.g., “deliver mail”, “grab and insert a plug”). More technically, behaviors are typically stateless: their actions depend only on current sensor or parameter values (one exception is that behaviors may combine sensor values over time, using filters). Tasks, in contrast, are modeled as finite state machines. As such, they need to maintain state to “remember” where they are in the state machine, to determine when the environment has diverged sufficiently from expectations to indicate an exceptional condition, etc. In general, though, tasks do not make predictions about the future – that is the role of the planning layer (one exception is that path- or trajectory-planning is typically considered a task-level function).

The main roles of an executive include hierarchical decomposition of tasks into subtasks, dispatching tasks based on their temporal constraints, execution monitoring, and exception handling. High-level tasks are decomposed into lower-level tasks, which themselves may be further decomposed, forming a tree. This task tree (cf. Figure 2) is created dynamically, at run time, and may differ from run to run based on specifics of the environment (e.g., some subtasks may be executed only conditionally, or some tasks may repeat iteratively). The leaves of the task tree (“command” nodes, in TDL) connect to the behavior layer (see below). The interior nodes (“goal” nodes, in TDL) represent parent/child relationships between nodes. In the original 3T implementation, the RAPs language represented different methods that could be used to decompose tasks. Each method had an associated applicability condition and a limit on how often it could be run for each task instance. In TDL, the decomposition is represented by a user-defined procedure that creates the task tree as it executes. Special syntax in TDL (e.g., “spawn”, “with”) indicates how the task tree should be formed. While the method-based approach of RAPs is typically a cleaner representation of the task decomposition for simple tasks, TDL is much more flexible and expressive in the types of decomposition that can be expressed. For instance, if there are two methods for achieving a task that differ only in the middle step (subtask), in RAPs the common parts of the method would have to be duplicated, once for each method, while in TDL a single procedure would be written that has a conditional statement in the middle.
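
The following sketch conveys that last point in plain C++ using an invented miniature task-tree API (spawnTask, serial); it mimics the effect of TDL’s spawn/with constructs without reproducing TDL’s actual grammar, and all task names are hypothetical.

    #include <string>
    #include <vector>

    // Invented mini-API, for illustration only (not TDL itself).
    struct Task { std::string name; std::vector<Task*> children; };
    Task* spawnTask(const std::string& name) { return new Task{name, {}}; }
    void serial(Task*, Task*) { /* record: second task starts after first ends */ }

    // One procedure covers what would require two separate RAP methods,
    // because the variation is just a conditional in the middle:
    void grabAndInsertPlug(bool plugInDispenser) {
      Task* approach = spawnTask("moveToTaskBoard");
      if (plugInDispenser) {                      // the differing middle step
        Task* grab = spawnTask("grabPlugFromDispenser");
        serial(grab, approach);
      }
      Task* insert = spawnTask("insertPlug");
      serial(approach, insert);
    }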

A key aspect of the executive is to manage the relationships between tasks. Tasks may be related temporally with respect to one another, and the executive must respect those constraints, invoking tasks only when the constraints are met. The basic constraints that all executives support are “serial” and “concurrent.” A serial constraint means that one task must completely finish executing (including all of its subtasks) before another task (including any of its subtasks) is allowed to start execution. A concurrent constraint simply means that there are no temporal constraints between the tasks – the tasks may execute at the same time, but they are not forced to (this latter constraint can be represented in TDL, but it is a bit more complex to specify).

Note that while the conceptual paradigm of TDL is multiple tasks executing with true concurrency, in practice giving each task its own thread of control imposes a large computational burden. Thus, by default TDL simulates multi-tasking by interleaving task execution within the TDL reasoning engine. If true concurrency is desired, tasks can be designated to run concurrently using a special TDL keyword, but this is not the recommended practice.

TDL supports a host of other temporal constraints that broaden the range of what can be expressed. To understand the constraints, it is first helpful to note that each task has a start time and an end time. One may assert constraints between the start/end times of one task and the start/end times of another task.[1] For instance, the serial constraint can be written as “the start of task 2 follows the end of task 1.” One may also assert “the start of task 2 follows the start of task 1,” which forces task 2 to start after task 1 starts, but otherwise allows them to run concurrently. One may also add a metric delay to any constraint, such as “task 2 starts 10 seconds after task 1 ends” or “task 2 starts 1 minute after task 1 starts” or “task 3 starts at 1pm.”
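
Restated as code, these point-to-point constraints look as follows. The notation (TimePoint, follows) is invented for illustration and is not TDL’s actual syntax; absolute-time constraints such as “starts at 1pm” would anchor a time point to a clock time instead of to another task.

    #include <cstdio>
    #include <string>

    // Invented notation, for illustration only (not TDL's actual syntax).
    struct TimePoint { std::string task, point; };  // e.g. {"task1", "end"}
    TimePoint start(const std::string& t) { return {t, "start"}; }
    TimePoint end_(const std::string& t)  { return {t, "end"}; }

    // Assert: 'later' must occur at least 'delaySec' after 'earlier'.
    void follows(TimePoint later, TimePoint earlier, double delaySec = 0) {
      std::printf("%s(%s) >= %s(%s) + %gs\n",
                  later.point.c_str(), later.task.c_str(),
                  earlier.point.c_str(), earlier.task.c_str(), delaySec);
    }

    int main() {
      follows(start("task2"), end_("task1"));       // serial: 2 starts after 1 ends
      follows(start("task2"), start("task1"));      // 2 starts after 1 starts
      follows(start("task2"), end_("task1"), 10);   // ... 10 seconds after
      follows(start("task2"), start("task1"), 60);  // ... 1 minute after
    }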

TDL also supports termination constraints between tasks, where one task is terminated automatically when some event occurs with respect to another task. For instance, one can assert “terminate task 2 when monitor 1 completes” or “terminate task 2 thirty seconds after task 1 starts.” To constrain two tasks to run simultaneously, one would assert “task 1 starts when task 2 starts; task 1 terminates when task 2 ends; task 2 starts when task 1 starts; task 2 terminates when task 1 ends.” In this way, whichever task starts first (due to other constraints) enables the other to begin, and whichever task ends first forces the other to stop as well.
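
Continuing the invented notation from the previous sketch, the lockstep pairing can be written out explicitly (startsWith and terminateAt are again illustrative names, not TDL keywords):

    // Trigger and termination constraints (invented notation).
    void startsWith(TimePoint, TimePoint) { /* record: first point is released by second */ }
    void terminateAt(const std::string&, TimePoint) { /* record termination constraint */ }

    void runInLockstep() {
      startsWith(start("task1"), start("task2"));  // whichever starts first...
      startsWith(start("task2"), start("task1"));  // ...releases the other
      terminateAt("task1", end_("task2"));         // whichever ends first...
      terminateAt("task2", end_("task1"));         // ...stops the other
    }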

Because termination can occur at any time, one cannot guarantee that the task will be terminated in a “safe” state. Often, it is important to ensure that the state (of the robot, or of the world) remains consistent. For this, TDL supports “on termination” constraints that enable a “clean-up” task to be run whenever a task is terminated. For instance, one can assert that the “stow-arm” task should be invoked on termination of any arm movement task, so that one can be assured that the arm is always safely stowed if it is not being used.
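
In the same invented notation, the stow-arm example might be registered as follows (onTermination is our hypothetical stand-in for TDL’s “on termination” construct):

    // Register a clean-up task that runs whenever the arm task is terminated.
    void onTermination(const std::string& task, const std::string& cleanupTask) {
      // record: whenever 'task' is terminated, spawn 'cleanupTask'
    }

    void moveArm() {
      spawnTask("moveArmToSocket");
      onTermination("moveArmToSocket", "stowArm");  // arm is stowed even on abort
    }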

Execution monitoring and exception handling are two very important roles of the executive. Most executives have explicit support for the concept of monitors, either polling (running at a fixed frequency) or interrupt-driven (triggered by the arrival of some data from sensors or from the behavioral layer). Most executives also support the concept of “failure,” although this is implemented in different ways. In RAPs, for instance, tasks have a return status (succeed or fail) and the parent task may invoke different methods if a child task has failed. In TDL, tasks have no return value (in fact, the parent task has usually completed by the time the child tasks start execution). Instead, failures/exceptions are thrown, in a manner similar to the exception handling constructs of C++, Java, and Lisp. Tasks register handlers for named exceptions. When an exception is thrown, TDL searches up the task tree to find the first handler registered for that exception. If the handler can handle the exception, it does so; otherwise, it re-throws the exception up the tree. One important difference between the way TDL handles exceptions and the way other languages handle them is that, in TDL, the task tree underneath the exception handler is not automatically terminated when the exception is thrown (in the literature, this is referred to as continuation semantics; most other languages use termination semantics). This gives the TDL exception handler much greater latitude in determining what to do – it may terminate the subtree (emulating termination semantics) and start again, or it may add a task to take care of the exception and have the rest of the tasks continue on normally, or it may terminate just part of the task tree, etc.
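
The following sketch, again in the invented mini-API (the handler, plugStillVisible, and terminateSubtree are all hypothetical), illustrates why continuation semantics gives the handler this latitude:

    // Exception handling with continuation semantics (not TDL's syntax).
    struct TaskException { std::string name; };

    bool plugStillVisible();             // assumed perception query (hypothetical)
    void terminateSubtree(Task* node);   // tear down everything below 'node'

    void handleGraspFailure(Task* node, const TaskException&) {
      // Unlike C++/Java, the task tree beneath this handler is still
      // running when the handler is entered (continuation semantics).
      // The handler decides how much of the tree to keep:
      if (!plugStillVisible()) {
        terminateSubtree(node);       // emulate termination semantics...
        spawnTask("searchForPlug");   // ...and start over
      } else {
        spawnTask("regraspPlug");     // patch the tree; sibling tasks continue
      }
    }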

Syndicate

Syndicate is an extension of 3T that supports multi-robot coordination and includes specialized implementations of the behavioral and executive languages, as discussed in the preceding sections. In this section, we discuss how Syndicate connects the behavioral and executive layers and what extensions it provides for multi-robot coordination.

The executive layer interacts with the behavioral layer by enabling and disabling skills (also called “blocks” or “behaviors”), setting their parameter values, and specifying dynamic connections between the input and output ports of blocks. The executive also specifies where to route the status signals that skills may emit (e.g., success, failure, sensor data). The Skill Manager provides an interface that packages up all the communication needed between the behavioral and executive layers, making it easy to connect the two.
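
A sketch of what that interface might look like from the executive’s side follows; the class, method names, skill names, and parameter names are all illustrative assumptions, not the actual Skill Manager API.

    #include <functional>
    #include <string>

    // Invented interface sketch; the actual Skill Manager API differs.
    class SkillManagerClient {
    public:
      void enable(const std::string& skill);
      void disable(const std::string& skill);
      void setParam(const std::string& skill, const std::string& key, double value);
      void connect(const std::string& outputPort, const std::string& inputPort);
      void onSignal(const std::string& skill, const std::string& event,
                    std::function<void()> callback);   // route a status signal
    };

    // How an executive-level task might configure and start visual servoing:
    void startVisualServo(SkillManagerClient& sm) {
      sm.setParam("visualServo", "gain", 0.4);
      sm.connect("fiducialTracker.pose", "visualServo.targetPose");
      sm.onSignal("visualServo", "converged", [] { /* notify the task tree */ });
      sm.enable("fiducialTracker");
      sm.enable("visualServo");
    }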

Some behaviors are written to detect their own end condition (e.g., “move to location x, y”) while other behaviors operate indefinitely, with another behavior enabled to detect the end condition of the task (e.g., “insert plug” just runs visual servoing while another behavior watches for the insertion to complete). Although this is not explicitly supported by the Syndicate architecture, we also use the convention that leaf-node tasks (and some internal-node tasks) actually decompose into two separate subtasks – an action task and a monitor task. While this was done initially to support flexible sliding autonomy [Sellner et al., 2006], we have found the separation to be architecturally satisfying, in that it makes apparent what conditions are being monitored. We intend to maintain this practice in the ACE project.
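
In the invented task-tree notation used earlier, the action/monitor convention amounts to a pairing like this (task names hypothetical):

    // Convention sketch: a leaf task splits into an action and a monitor.
    void insertPlugTask() {
      spawnTask("servoPlugIntoSocket");   // action: runs indefinitely
      spawnTask("monitorInsertionDone");  // monitor: watches forces/poses
      // When the monitor detects completion, the action is terminated:
      terminateAt("servoPlugIntoSocket", end_("monitorInsertionDone"));
    }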

In Syndicate, each layer is implemented as a separate process, communicating via a message-passing package called IPC [Simmons & Whelan, 1997]. IPC provides a high-level interface for defining messages, registering handlers for messages, and sending complex data structures in a transparent, machine-independent fashion. The main communication paradigm supported by IPC is publish-subscribe: processes register interest in receiving messages of a given type, and the sender never has to know who the recipients are. Publish-subscribe is asynchronous and non-blocking, which works out very well for embedded systems, since it avoids the problem of accidental deadlock. In theory, the behavioral and executive layers could run on separate processors but, in general, for simplicity we run all processes associated with a given robot on the same processor.
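
A minimal publish-subscribe sketch follows. The calls are written against IPC’s C API as we recall it (consult the IPC manual for authoritative signatures), and the module name, message name, and Pose structure are illustrative assumptions.

    #include <cstdio>
    #include "ipc.h"

    struct Pose { double x, y, theta; };
    #define POSE_MSG "fiducialPose"
    #define POSE_FMT "{double, double, double}"

    // Handler invoked whenever a fiducialPose message is published.
    static void poseHandler(MSG_INSTANCE msg, void* callData, void* /*client*/) {
      Pose* p = static_cast<Pose*>(callData);
      std::printf("pose: %.3f %.3f %.3f\n", p->x, p->y, p->theta);
      IPC_freeData(IPC_msgInstanceFormatter(msg), callData);
    }

    int main() {
      IPC_connect("mobileManipulator");                    // join the network
      IPC_defineMsg(POSE_MSG, IPC_VARIABLE_LENGTH, POSE_FMT);
      IPC_subscribeData(POSE_MSG, poseHandler, nullptr);   // receiver side

      Pose p = {1.0, 2.0, 0.5};
      IPC_publishData(POSE_MSG, &p);                       // sender side

      IPC_dispatch();                                      // process messages
      return 0;
    }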

Syndicate extends the 3T architectural framework by enabling direct connections between robots at each layer of the architecture (Figure 3). Essentially, this enables us to distribute the functionality of each layer amongst the robots. For instance, at the behavioral layer, we can set up a distributed visual servo loop, where the perception is done on one robot (the “roving eye”) and the manipulation is done on another robot (the “mobile manipulator”), which receives periodic pose estimates from the roving eye. While the ACE project is (currently) single-robot, we intend to continue having a separate roving-eye agent, both to facilitate code reuse and also to be prepared in case we want to explore using multiple robots in the future. In addition, this division makes it somewhat easier for the perception and manipulation to operate in parallel.

To facilitate distribution at the behavioral layer, our Skill Manager includes support for making connections between input and output ports on different processes. This is transparent to the behavior itself: from a coding perspective, the function that implements the behavior cannot tell whether a port is connected to a behavior on the same, or a different, process.[2] The Skill Manager uses IPC to send data between ports on different processes. Most types of C-language data structures are supported, so behaviors can transfer quite complex data among themselves.

At the executive layer, the task tree may be distributed. For instance, some child nodes may reside in the executive of one robot, while sibling nodes reside on another robot. When tasks are spawned, TDL syntax indicates which robot will execute each task. Temporal constraints can be added between tasks on different robots in exactly the same way as for tasks on the same robot. Once again, IPC is used to handle the communication between processes – signaling events and sending data, as appropriate. All this is completely transparent to the user, to whom it appears as if the robots were operating with one big, centralized task tree. For ACE, this distribution at the executive level is not likely to be critical, given that we have only one robot. However, it will still be used, to some extent, since we are treating the mobile manipulator and the roving eye as two separate agents.

At the planning layer, we have previously used a market-based approach with Syndicate to do task allocation in a distributed fashion [Goldberg et al., 2003]. In addition, current students are looking at explicit planning and scheduling techniques for distributed, tight coordination of multi-robot systems [Sellner, 2007; Hiatt, 2007].

ACE

While the actual task definitions and constraints will be the subject of a subsequent report, here we discuss our overall strategy for architecting the ACE project.

As mentioned previously, for several reasons we will have two agents: a “roving eye” and a “mobile manipulator.” The roving eye[3] is responsible for finding and tracking fiducials and reporting on their relative poses. It is unclear, at this point, how much interaction there will need to be with the base itself (e.g., turning and moving to search for, and/or get a better view of, the fiducials). We hope to minimize this, perhaps by using a pan-tilt unit to enable the cameras to move somewhat independently of the base.

The overall task structure is a series of insertion tasks – the robot moves to the dispenser, uses visual servoing to grab the cable and plug, moves to the task board, visually servos the arm, performs the insertion, and tests for successful completion (by pulling back on the plug and measuring the forces). Figure 4 shows a typical task tree for this task, which can then be repeated as many times as desired, with different insertion points being specified.
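
For concreteness, one cycle of this sequence can be sketched in the invented task-tree notation introduced earlier (all task names are hypothetical placeholders for the tasks in Figure 4):

    // Sketch of one insertion cycle, in the invented notation from above.
    void insertionCycle() {
      Task* toDispenser = spawnTask("moveToDispenser");
      Task* grasp       = spawnTask("graspPlug");        // visual servoing
      Task* toBoard     = spawnTask("moveToTaskBoard");
      Task* servoArm    = spawnTask("servoArmToSocket");
      Task* insert      = spawnTask("insertPlug");
      Task* test        = spawnTask("testInsertion");    // pull back, check forces
      serial(toDispenser, grasp);
      serial(grasp, toBoard);
      serial(toBoard, servoArm);
      serial(servoArm, insert);
      serial(insert, test);
      // The cycle repeats, with a different insertion point each time.
    }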

We will begin by assuming that the task board is stationary and that the robot knows roughly where it is with respect to both the dispenser and task board.[4] We will assume that the noise in estimating the pose of the fiducials is within the tolerance for achieving the task (this assumption may be relaxed when we have good force control with the WAM arm). Also, until we transition to using the WAM arm and Barrett hand, we will not actually pick up the piece, but will start with it in the end effector (see companion document). Given this, we may decide not to move to the dispenser to start with, since doing so would just add time without really demonstrating much in the way of complex task execution.