Reliable Design of Systems

(Faculty Involved: Henzinger, Aiken, Necula, Pister, Sastry, Wagner, Yelick)

Concerns over IT system reliability increase with society's reliance on these

systems. From a historical standpoint, this corresponds to the maturing of new technologies, when the search for additional features gives way to the need for High Confidence Systems. While functionality, speed, and

availability dominate the headlines

about new IT systems, the success of SISs will ultimately depend on

reliability, including safety, predictability, fault tolerance, and their

ability to meet hard real-time constraints. Hand in hand with these demands is the need to understand the vulnerabilities of SISs to information attack. This is especially critical since we expect SISs to be subjected to both malicious and accidental attack, and society cannot afford to have these societal-scale systems compromised. We believe that there

are three central issues that need to be addressed in order to achieve

acceptable reliability in SISs:

1 Modeling, prediction, and design of complex systems.

The first "weak link" in current design practice stems from the intrinsic

*complexity* of SISs. While "complexity" in science usually refers to the

understanding of complex systems that occur in nature (such as weather

prediction), we submit that a different kind of complexity arises in

systems that are designed by humans, and that if properly understood, this

complexity can be controlled. The complexity of SISs arises from the large

number of distributed but interacting components, the heterogeneous nature

of components (digital computers as well as analog devices), the many

levels of abstraction used in design (such as physical, system/protocol,

and application layers), the many different aspects of system performance

(such as functionality, timing, fault tolerance), and the many, often

unpredictable ways in which the environment (sensors, users, failures,

attackers) can influence the behavior of the system. Since current design

practices are hitting their limits in the design of more homogeneous

complex systems, such as microprocessors and software (see below), they

cannot achieve acceptable reliability in SISs. We propose research in

three key directions towards managing complexity by design:

1A (Sastry, Henzinger) Modeling and analysis of hybrid and embedded systems

1B (Henzinger) Multi-aspect interfaces for component-based design

1C (Sastry, Wagner) Information Assurance for SISs.

2 Software quality.

The second weak link in current design practice concerns the lack of

quality control in the development of large, concurrent software systems.

The "software quality" issue and its central importance to the future of

IT has been highlighted repeatedly by industrial as well as government

bodies, such as the PITAC report. As every SIS has significant software

components that interact with each other as well as with sensors and

actuators in many complex (see above) ways, software quality is an area

of particular vulnerability for SISs. We propose research in two key

directions towards improving software quality:

2A (Aiken) Combining static and dynamic software analysis

2B (Necula) Memory safety

3 Information Assurance for SISs.

While we have all been sensitized to the damage that can be wrought by denial-of-service, viral, and other attacks on networks, we believe that SISs need to be even more resistant to attack. For sensor networks, this translates into the need to understand the vulnerabilities of information gathering to rogue nodes, deliberate misinformation, and other modes of attack. Some aspects of information assurance are covered in item 1 above as well. The field of information assurance for embedded networks is in its infancy. We propose a program of research on information assurance for SISs:

3A (Wagner) Lightweight Authentication Protocols for Networked Embedded Systems.

3B (Sastry) Information Processing for Possibly Compromised SISs.

1A Modeling and analysis of hybrid and embedded systems

Both target applications of CITRIS involve complex digital systems (computers

and networks) interacting with the physical world through sensors and

actuators. Such a hybrid system combines the characteristics of discrete and

continuous behavior. Theories and tools for modeling and analyzing hybrid

systems have emerged over the past few years in control applications. Yet

this research has focused mostly on developing algorithms (and impossibility

results) for solving simple mixed discrete-continuous control problems. SISs

are hybrid systems whose complexity is not amenable to precise

algorithmic analysis. The focus must therefore shift to hybrid modeling

techniques for capturing composition, hierarchy, and heterogeneity in a mixed

discrete-continuous setting, and to approximate and stochastic techniques for

system analysis and simulation.

<MORE DETAILS TO COME>

1B Multi-aspect interfaces for component-based design

Existing formal design methodologies are either optimistic or pessimistic.

The optimistic approach advocates strictly top-down, stepwise refinement

design, with the design team in full control of the complete design. It

does not allow for some parts or aspects of the design to be unknown,

unpredictable, or evolving. The pessimistic approach is strictly bottom-up,

component-based design, where some components may be preexisting, and the

design of each component must consider an environment that behaves in

an arbitrary, possibly adversarial way. It does not allow for individual

components to be designed under assumptions about how the environment, that

is, the other components of the system, behaves. We submit that neither

the fully optimistic ("all parts work together by design") nor the fully

pessimistic ("each part is on its own") paradigm is realistic for complex,

heterogeneous designs such as SISs. We propose to develop the foundations

of a formal approach to design that permits a synthesis of both paradigms.

The centerpiece of our approach is the development of

component interfaces that are much more

expressive than traditional interfaces used in software or hardware designs.

First, the interfaces we envision not only specify, on some abstract level,

what a component does, but also what the component expects the environment to

do. Such ``assume-guarantee interfaces'' allow the designer of a component to

adopt an optimistic view about some aspects of the other components, as if

those aspects were under the designer's control, and at the same time adopt a

pessimistic view about other aspects of the other components and the

environment, which may be unknown at design time or unpredictable. Second,

the interfaces we envision not only specify aspects that are traditionally

specified in interfaces, such as the number and types of the arguments of a

procedure, but permit the specification of a wide variety of aspects,

for example, that the call of the specified procedure must always be preceded

by the call of another (e.g., initialization) procedure. There has been

considerable work on functional interface languages, little on timing and

security, and virtually none on other system aspects such as resource

management, performance, and reliability. This lack of multi-aspect

interface formalisms has forced designers to address timing, security,

performance, and reliability issues at all levels of the implementation in

order to attain the desired properties for the global system.
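As a toy illustration of such an ordering constraint, the requirement that one call precede another can be encoded as a small interface automaton and checked against call traces. The states and call names below are illustrative, not drawn from any existing interface language:

```python
# Hedged sketch: an ordering constraint expressed as a tiny interface
# automaton (state -> {allowed call: next state}) and checked against a
# call trace. All names are illustrative.

INTERFACE = {
    "uninit": {"init": "ready"},
    "ready":  {"call": "ready", "shutdown": "uninit"},
}

def trace_conforms(trace, start="uninit", spec=INTERFACE):
    state = start
    for call in trace:
        if call not in spec[state]:
            return False           # e.g. "call" issued before "init"
        state = spec[state][call]
    return True

ok  = trace_conforms(["init", "call", "call", "shutdown"])  # conforms
bad = trace_conforms(["call", "init"])                      # violates ordering
```

Such an automaton is strictly richer than an argument-type signature: it rejects well-typed call sequences that violate the protocol.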

We propose to develop a theory of composition for multi-aspect interfaces,

which expose resource properties, such as real-time assumptions and

guarantees. We will develop algorithms and tools for checking the

consistency and compatibility of multi-aspect interfaces. The tools will

allow us to evaluate our methodology as applied to the experimental

design projects within CITRIS. Multi-aspect interfaces also benefit the

validation, portability, and evolvability of a system. The validation task

can be decomposed into two largely independent phases: the validation that each

component satisfies its interface, and the validation that the overall system

goals are met, given the component interfaces. For the latter phase, we can

use the interfaces to construct a simulation environment that exhibits

worst-case component behavior, and validate the system with respect to such

an environment. When porting the system we can check that the system

requirements are met by the multi-aspect guarantees available in the new

architecture.

Real-time Behavior. Traditional API component specifications are usually

based on an informal, English-based description of the functionality, the

only formal information being input and output type information for function

calls. A typical example is the documentation of the POSIX libraries used

under the various flavors of Unix operating systems, or the documentation

provided for the Java class libraries. This informal approach to

specification has been helped by the fact that several of the most

fundamental programming concepts, such as virtual memory, speed-insensitive

communication (such as pipes), and virtual I/O have been developed in order

to isolate the developers from changes in the underlying architecture.

Hence, the informal specifications do not need to account for quantitative

aspects of the system, such as processor speed or memory size.

This situation is radically different for real-time systems, where the

quantitative differences of the underlying architectures (in terms of speed,

memory, response times) play an essential role in determining the viability

of the whole design, and are not masked by programming abstractions. The

quantitative timing aspects of the behavior of components, however, are

captured only in minimal part by the current informal approach to component

specification. For example, the documentation of the APIs provided by

current real-time operating systems is usually limited to their functional

behavior. The real-time component of their behavior is either specified

generically in separate documents, or it has to be inferred from a detailed

analysis of the architecture. Where timing information is given about a

process, it is often reduced to a single number, the priority. This has

made compositional techniques for real-time design essentially impossible.

In contrast, multi-aspect interfaces will permit the specification of timing

assumptions and guarantees in an abstract, platform-independent way. For

example, it will be possible to formalize statements such as ``response A will

occur within x milliseconds, provided that the time to service interrupt

request B is at most y milliseconds.''
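A minimal sketch of what such a platform-independent timing interface might look like, with an explicit assumption and guarantee and a compatibility check against an environment; all names and bounds are purely illustrative:

```python
# Hedged sketch of a timing interface with one assumption and one
# guarantee; numbers and field names are invented for illustration.

from dataclasses import dataclass

@dataclass
class TimingInterface:
    assumes_interrupt_b_ms: float    # assumption: interrupt B serviced within this bound
    guarantees_response_a_ms: float  # guarantee: response A occurs within this bound

def compatible(component, environment_interrupt_b_ms):
    """The environment discharges the component's assumption iff its
    worst-case interrupt-service time is within the assumed bound."""
    return environment_interrupt_b_ms <= component.assumes_interrupt_b_ms

iface = TimingInterface(assumes_interrupt_b_ms=2.0, guarantees_response_a_ms=10.0)
ok  = compatible(iface, environment_interrupt_b_ms=1.5)  # assumption met
bad = compatible(iface, environment_interrupt_b_ms=3.0)  # assumption violated
```

The point of the abstraction is that the same interface can be rechecked, unchanged, against a new platform's measured bounds.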

An assume-guarantee interface specifies that the component guarantees certain

behavior, under the assumption that the environment behaves correctly. To

support the analysis of systems derived from composing multi-aspect

interfaces, we need to develop techniques for the compositional,

assume-guarantee reasoning about timing and reliability aspects.

Assume-guarantee compositional reasoning is well understood for safety

properties (for example, if A says ``I do not crash if B does not crash'',

and B says ``I do not crash if A does not crash'', then indeed the

composition of A and B ensures that neither crashes), little understood for

real-time properties (for example, if A says ``I will shut down by noon if B

shuts down by noon'', and B says ``I will shut down by noon if A shuts down

by noon'', then the composition does not ensure that both shut down), and not

at all studied for other types of properties. Indeed, resource properties

are notoriously non-compositional (a resource may suffice for one component,

but not for two).
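The contrast can be made concrete with a small enumeration: the circular safety contracts force the crash-free behavior, while the circular deadline contracts also admit a behavior in which neither component shuts down by noon. The encoding below is a toy model, not part of the proposed formalism:

```python
from itertools import product

# Toy enumeration contrasting circular assume-guarantee reasoning for
# safety properties vs. deadline properties (illustrative model only).

N = 4  # finite time horizon for the safety example

def no_unprovoked_crash(crash_self, crash_other):
    # "I do not crash unless the other component crashed strictly earlier."
    return crash_self is None or (crash_other is not None
                                  and crash_other < crash_self)

times = [None] + list(range(N))          # None = never crashes
safety_models = [(a, b) for a, b in product(times, repeat=2)
                 if no_unprovoked_crash(a, b) and no_unprovoked_crash(b, a)]
# Only the crash-free trace survives: the circular safety argument is sound,
# because a first crash would need an even earlier crash to provoke it.

# Deadline contracts: "I shut down by noon if the other shuts down by noon."
noon_models = [(a, b) for a, b in product([True, False], repeat=2)
               if (b <= a) and (a <= b)]  # implication on booleans
# Both (True, True) and (False, False) satisfy the contracts, so the
# composition does not ensure that either component shuts down by noon.
```

The safety argument succeeds because it carries an implicit induction over time (a "first violation" would contradict the other contract); the deadline contracts have no such well-founded ordering.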

Probabilistic Behavior. Multi-aspect interfaces will be able to capture not

only deterministic guarantees, but also probabilistic properties connected to

performance and reliability, such as the probability of meeting deadlines.

In fact, many timeliness properties of real-time systems can be studied only

in a probabilistic setting: a simple example is the message delivery delay

over a non-deterministic protocol such as Ethernet. We will develop

assume-guarantee techniques for reasoning compositionally about probabilistic

real-time interfaces. In addition, we will develop algorithms for checking

the compatibility of such interfaces. The focus on the interface level,

without regard to implementation details, will ensure compositionality and a

degree of abstraction that makes the application of formal tools feasible.
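As a toy illustration of a probabilistic deadline property, the sketch below Monte Carlo-estimates the probability that a message beats its deadline under a crude collision/backoff model. All parameters are invented for illustration and do not model actual Ethernet timing:

```python
import random

# Toy collision/backoff model of message delivery delay (illustrative
# parameters only; this is not a model of real Ethernet behavior).
random.seed(0)

def message_delay(p_collision=0.3, slot_ms=0.05, max_exp=10):
    """Each send attempt collides with probability p_collision; after the
    k-th collision the sender backs off uniformly in [0, 2**k] slots."""
    delay, k = 0.0, 0
    while random.random() < p_collision:
        k += 1
        delay += random.uniform(0, 2 ** min(k, max_exp)) * slot_ms
    return delay

deadline_ms = 1.0
trials = 100_000
met = sum(message_delay() <= deadline_ms for _ in range(trials))
p_meet = met / trials   # estimated probability of meeting the deadline
```

A probabilistic interface would expose a bound of this form ("the deadline is met with probability at least p") as a guarantee, rather than the unattainable deterministic claim that the deadline is always met.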

1C Information Assurance for SISs.


Research proposed here is in two areas:

Lightweight Authentication Protocols for Networked Embedded Systems and Information Processing for Possibly Compromised SISs.

The issues to be addressed in lightweight authentication protocols for networked embedded systems include new techniques for sender authentication and scalable key distribution for challenge/response authentication protocols, leveraging work on secure multicast, so that sensor nodes scattered in the environment can be tracked for possible compromise.

In addition, we will work on timely propagation of revocation information such as compromised keys, expired certificates and rekeying in the event of compromise of the nodes.
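A minimal sketch of the flavor of lightweight sender authentication intended here, assuming pre-distributed per-node symmetric keys and a monotonic counter for replay protection; the names, message framing, and truncated tag length are illustrative assumptions, not part of the proposed protocols:

```python
import hashlib
import hmac
import secrets

# Hypothetical per-node symmetric keys, pre-distributed by a base station.
node_keys = {"node-17": secrets.token_bytes(16)}
last_counter = {}   # receiver-side replay-protection state

def make_tag(node_id, counter, payload):
    msg = f"{node_id}|{counter}|".encode() + payload
    # Truncated HMAC-SHA256 keeps packets small on constrained nodes.
    return hmac.new(node_keys[node_id], msg, hashlib.sha256).digest()[:8]

def verify(node_id, counter, payload, tag):
    expected = make_tag(node_id, counter, payload)
    if not hmac.compare_digest(tag, expected):
        return False                  # forged or corrupted message
    if counter <= last_counter.get(node_id, -1):
        return False                  # stale counter: replayed message
    last_counter[node_id] = counter
    return True

tag = make_tag("node-17", 1, b"temp=21.4")
first = verify("node-17", 1, b"temp=21.4", tag)   # fresh and authentic
replay = verify("node-17", 1, b"temp=21.4", tag)  # same counter: replay
```

The counter state also illustrates why key revocation and rekeying matter: once a node's key is compromised, no amount of per-message checking helps until the receiver learns to reject that key.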

In terms of information processing for possibly compromised SISs, we will study the use of traditional fault-tolerant networking techniques such as replication and partitioning of network services, redundancy of network resources, and survivable network overlays (including replica creation and management). In the area of self-healing networks, we will bring to bear techniques from statistical signal processing in two ways. First, when a rogue node starts broadcasting information that deviates widely from hypotheses determined to be true, either on the basis of past data or from a number of other (possibly uncorrupted) nodes, we will use decision networks to treat the rogue node's information as an outlier and discard it, while propagating the hypothesis up the hierarchy of the SIS. Second, in the event of a large-scale attack on a group of nodes, we will explore the use of game-theoretic methods (against an intelligent adversary) to isolate the group of nodes under attack and propagate alarms up the SIS hierarchy about the need for intervention.
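One simple instance of the outlier-rejection idea, assuming scalar sensor readings and using a robust median/MAD test in place of a full decision network; the threshold and node names are illustrative:

```python
import statistics

def fuse_readings(readings, k=3.5):
    """Hedged sketch: flag nodes whose reading deviates from the median of
    all nodes by more than k robust standard deviations (MAD-based) as
    suspected rogues, and fuse only the remaining readings."""
    values = list(readings.values())
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values) or 1e-9
    robust_sigma = 1.4826 * mad   # MAD -> sigma under a Gaussian model
    kept = {n: v for n, v in readings.items()
            if abs(v - med) / robust_sigma <= k}
    suspects = sorted(set(readings) - set(kept))
    return statistics.mean(kept.values()), suspects

# Nine consistent temperature readings plus one wildly deviant node:
readings = {f"n{i}": 20.0 + 0.1 * i for i in range(9)}
readings["rogue"] = 95.0
fused, suspects = fuse_readings(readings)
```

In a hierarchical SIS, the fused estimate would be propagated upward while the list of suspected nodes feeds the alarm and intervention machinery described above.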

We emphasize that information assurance for networked SISs is a large and complex body of new computer science research, and we will be guided in its development by the design and testing (chiefly with regard to scaling) of safety policies during the construction of the SISs.

2A Combining static and dynamic software analysis