Reliable Design of Systems
(Faculty Involved: Henzinger, Aiken, Necula, Pister, Sastry, Wagner, Yelick)
Concerns over IT system reliability increase with society's reliance on these
systems. Historically, this corresponds to the maturing of new technologies,
when the search for additional features is replaced by the need for High
Confidence Systems. While functionality, speed, and availability dominate the
headlines about new IT systems, the success of SISs will ultimately depend on
reliability, including safety, predictability, fault tolerance, and the
ability to meet hard real-time constraints. Hand in hand with these demands
comes the need to understand the vulnerabilities of SISs to information
attack. This is especially critical because we expect SISs to be subjected to
both malicious and accidental attacks, and society cannot afford to have
systems of this scale compromised. We believe that three central issues must
be addressed in order to achieve acceptable reliability in SISs:
1 Modeling, prediction, and design of complex systems.
The first "weak link" in current design practice stems from the intrinsic
*complexity* of SISs. While "complexity" in science usually refers to the
understanding of complex systems that occur in nature (such as weather
prediction), we submit that a different kind of complexity arises in
systems that are designed by humans, and that if properly understood, this
complexity can be controlled. The complexity of SISs arises from the large
number of distributed but interacting components, the heterogeneous nature
of components (digital computers as well as analog devices), the many
levels of abstraction used in design (such as physical, system/protocol,
and application layers), the many different aspects of system performance
(such as functionality, timing, fault tolerance), and the many, often
unpredictable ways in which the environment (sensors, users, failures,
attackers) can influence the behavior of the system. Since current design
practices are hitting their limits in the design of more homogeneous
complex systems, such as microprocessors and software (see below), they
cannot achieve acceptable reliability in SISs. We propose research in
three key directions towards managing complexity by design:
1A (Sastry, Henzinger) Modeling and analysis of hybrid and embedded systems
1B (Henzinger) Multi-aspect interfaces for component-based design
1C (Sastry, Wagner) Information Assurance for SISs.
2 Software quality.
The second weak link in current design practice concerns the lack of
quality control in the development of large, concurrent software systems.
The "software quality" issue and its central importance to the future of
IT has been highlighted repeatedly by industrial as well as government
bodies, such as the PITAC report. As every SIS has significant software
components that interact with each other as well as with sensors and
actuators in many complex (see above) ways, software quality is an area
of particular vulnerability for SISs. We propose research in two key
directions towards improving software quality:
2A (Aiken) Combining static and dynamic software analysis
2B (Necula) Memory safety
3 Information Assurance for SISs.
While we have all been sensitized to the damage that can be wrought by denial-of-service, viral, and other attacks on networks, we feel that SISs need to be even more resistant to attack. For sensor networks, this translates into the need to understand the vulnerabilities of information gathering to the capture of nodes by attackers (rogue nodes), attempts at deliberate misinformation, and other modes of attack. Some aspects of information assurance are covered in item 1 above as well. The field of assuring embedded networks against information attack is in its infancy. We propose a program of research on information assurance for SISs:
3A (Wagner) Lightweight Authentication Protocols for Networked Embedded Systems.
3B (Sastry) Information Processing for Possibly Compromised SISs.
1A Modeling and analysis of hybrid and embedded systems
Both target applications of CITRIS involve complex digital systems (computers
and networks) interacting with the physical world through sensors and
actuators. Such a hybrid system combines the characteristics of discrete and
continuous behavior. Theories and tools for modeling and analyzing hybrid
systems have emerged over the past few years in control applications. Yet
this research has focused mostly on developing algorithms (and impossibility
results) for solving simple mixed discrete-continuous control problems. SISs
are hybrid systems whose complexity places them beyond the reach of precise
algorithmic analysis. The focus must therefore shift to hybrid modeling
techniques for capturing composition, hierarchy, and heterogeneity in a mixed
discrete-continuous setting, and to approximate and stochastic techniques for
system analysis and simulation.
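The discrete/continuous interaction described above can be illustrated with a toy hybrid automaton. The following is a minimal sketch, assuming illustrative dynamics and switching thresholds for a two-mode thermostat; it is not a model of any CITRIS application.

```python
# A two-mode hybrid automaton: continuous flow within each mode,
# discrete mode switches at guard temperatures. All constants
# (cooling rate, heater power, thresholds) are illustrative assumptions.

def simulate_thermostat(t_end=100.0, dt=0.01, temp=20.0):
    """Forward-Euler simulation of the discrete/continuous interaction."""
    mode = "off"
    trace = []
    t = 0.0
    while t < t_end:
        # continuous flow in the current mode:
        # dT/dt = -0.1*T (off), dT/dt = -0.1*T + 5 (on)
        dT = -0.1 * temp + (5.0 if mode == "on" else 0.0)
        temp += dT * dt
        # discrete transitions when guards are crossed
        if mode == "off" and temp <= 18.0:
            mode = "on"
        elif mode == "on" and temp >= 22.0:
            mode = "off"
        trace.append((t, mode, temp))
        t += dt
    return trace

trace = simulate_thermostat()
temps = [temp for _, _, temp in trace]
print(min(temps), max(temps))  # temperature oscillates near the [18, 22] band
```

Even this toy example shows why composition matters: the reachable states depend jointly on the continuous dynamics and the discrete switching logic, and neither can be analyzed in isolation.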
<MORE DETAILS TO COME>
1B Multi-aspect interfaces for component-based design
Existing formal design methodologies are either optimistic or pessimistic.
The optimistic approach advocates strictly top-down, stepwise refinement
design, with the design team in full control of the complete design. It
does not allow for some parts or aspects of the design to be unknown,
unpredictable, or evolving. The pessimistic approach is strictly bottom-up,
component-based design, where some components may be preexisting, and the
design of each component must consider an environment that behaves in an
arbitrary, possibly adversarial way. It does not allow individual
components to be designed under assumptions about how the environment, that
is, the other components of the system, behaves. We submit that neither
the fully optimistic ("all parts work together by design") nor the fully
pessimistic ("each part is on its own") paradigm is realistic for complex,
heterogeneous designs such as SISs. We propose to develop the foundations
of a formal approach to design that permits a synthesis of both paradigms.
The centerpiece of our approach is the development of
component interfaces that are much more
expressive than traditional interfaces used in software or hardware designs.
First, the interfaces we envision not only specify, on some abstract level,
what a component does, but also what the component expects the environment to
do. Such "assume-guarantee interfaces" allow the designer of a component to
adopt an optimistic view about some aspects of the other components, as if
those aspects were under the designer's control, and at the same time adopt a
pessimistic view about other aspects of the other components and the
environment, which may be unknown at design time or unpredictable. Second,
the interfaces we envision not only specify aspects that are traditionally
specified in interfaces, such as the number and types of the arguments of a
procedure, but permit the specification of a wide variety of other aspects,
for example, that a call of the specified procedure must always be preceded
by a call of another (e.g., initialization) procedure. There has been
considerable work on functional interface languages, little on timing and
security, and virtually none on other system aspects such as resource
management, performance, and reliability. This lack of multi-aspect
interface formalisms has forced designers to address timing, security,
performance, and reliability issues at all levels of the implementation in
order to attain the desired properties for the global system.
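The call-order aspect mentioned above (a procedure must be preceded by an initialization procedure) can be captured by a small protocol automaton attached to the interface. A minimal sketch, with illustrative operation names (`open`, `read`, `close`) standing in for any component's API:

```python
# A multi-aspect interface fragment: beyond argument types, the interface
# records a temporal protocol as a small automaton. A client call trace is
# compatible with the interface iff the automaton accepts it.

PROTOCOL = {
    # state -> {allowed call -> next state}
    "idle":  {"open": "ready"},
    "ready": {"read": "ready", "close": "idle"},
}

def respects_interface(trace):
    """Return True iff the call sequence is accepted by the protocol automaton."""
    state = "idle"
    for call in trace:
        allowed = PROTOCOL[state]
        if call not in allowed:
            return False
        state = allowed[call]
    return True

print(respects_interface(["open", "read", "read", "close"]))  # True
print(respects_interface(["read"]))  # False: read before open (initialization)
```

Checking compatibility of two such interfaces amounts to checking that the composition of their automata never reaches a state where one side issues a call the other forbids.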
We propose to develop a theory of composition for multi-aspect interfaces,
which expose resource properties, such as real-time assumptions and
guarantees. We will develop algorithms and tools for checking the
consistency and compatibility of multi-aspect interfaces. The tools will
allow us to evaluate our methodology as applied to the experimental
design projects within CITRIS. Multi-aspect interfaces also benefit the
validation, portability, and evolvability of a system. The validation task
can be decomposed into two largely independent phases: the validation that each
component satisfies its interface, and the validation that the overall system
goals are met, given the component interfaces. For the latter phase, we can
use the interfaces to construct a simulation environment that exhibits
worst-case component behavior, and validate the system with respect to such
an environment. When porting the system we can check that the system
requirements are met by the multi-aspect guarantees available in the new
architecture.
Real-time Behavior. Traditional API component specifications are usually
based on an informal, English-based description of the functionality, the
only formal information being input and output type information for function
calls. A typical example is the documentation of the POSIX libraries used
under the various flavors of Unix operating systems, or the documentation
provided for the Java class libraries. This informal approach to
specification has been viable because several of the most fundamental
programming concepts, such as virtual memory, speed-insensitive
communication (such as pipes), and virtual I/O, were developed in order
to isolate developers from changes in the underlying architecture.
Hence, the informal specifications do not need to account for quantitative
aspects of the system, such as processor speed or memory size.
This situation is radically different for real-time systems, where the
quantitative differences of the underlying architectures (in terms of speed,
memory, response times) play an essential role in determining the viability
of the whole design, and are not masked by programming abstractions. The
quantitative timing aspects of the behavior of components, however, are
captured only in minimal part by the current informal approach to component
specification. For example, the documentation of the APIs provided by
current real-time operating systems is usually limited to their functional
behavior. The real-time component of their behavior is either specified
generically in separate documents, or it has to be inferred from a detailed
analysis of the architecture. Where timing information is given about a
process, it is often reduced to a single number, the priority. This has
made compositional techniques for real-time design essentially impossible.
In contrast, multi-aspect interfaces will permit the specification of timing
assumptions and guarantees in an abstract, platform-independent way. For
example, it will be possible to formalize statements such as ``response A will
occur within x milliseconds, provided that the time to service interrupt
request B is at most y milliseconds.''
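The timing statement above is an implication over observed traces, and can be sketched as a predicate. The values x = 10 ms and y = 2 ms, and the trace format, are illustrative assumptions:

```python
# A platform-independent timing contract as a predicate on observed traces:
# the component guarantees "every response A within x ms" under the
# assumption "every interrupt B is serviced within y ms".

def contract_holds(trace, x=10.0, y=2.0):
    """trace: list of ('A', response_latency_ms) and ('B', service_time_ms)."""
    assumption = all(t <= y for kind, t in trace if kind == "B")
    guarantee = all(t <= x for kind, t in trace if kind == "A")
    # An assume-guarantee contract is an implication: it is vacuously
    # satisfied on traces where the environment breaks the assumption.
    return (not assumption) or guarantee

print(contract_holds([("B", 1.5), ("A", 8.0)]))   # True: assumption and guarantee met
print(contract_holds([("B", 1.5), ("A", 12.0)]))  # False: guarantee broken
print(contract_holds([("B", 5.0), ("A", 12.0)]))  # True: assumption violated, vacuous
```

The vacuous case is the crux of compositional reasoning: a component is not blamed for misbehavior of an environment that broke its side of the contract.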
An assume-guarantee interface specifies that the component guarantees certain
behavior, under the assumption that the environment behaves correctly. To
support the analysis of systems derived from composing multi-aspect
interfaces, we need to develop techniques for the compositional,
assume-guarantee reasoning about timing and reliability aspects.
Assume-guarantee compositional reasoning is well understood for safety
properties (for example, if A says ``I do not crash if B does not crash'',
and B says ``I do not crash if A does not crash'', then indeed the
composition of A and B ensures that neither crashes), little understood for
real time properties (for example, if A says ``I will shut down by noon if B
shuts down by noon'', and B says ``I will shut down by noon if A shuts down
by noon'', then the composition does not ensure that both shut down), and not
at all studied for other types of properties. Indeed, resource properties
are notoriously non-compositional (a resource may suffice for one component,
but not for two).
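The failure of circular reasoning for the noon-shutdown example can be checked mechanically. A minimal sketch, with the behavior domain simplified to two outcomes per component:

```python
# Each component either shuts down by noon ("by_noon") or never ("never").
# Each contract is the implication "I shut down by noon if the other does".
# Enumerating joint behaviors exposes a model where both contracts hold
# yet neither component shuts down, so the circular argument is unsound
# for this real-time property.

def satisfies_contract(me, other):
    # "I shut down by noon if you shut down by noon"
    return (other != "by_noon") or (me == "by_noon")

counterexamples = [
    (a, b)
    for a in ("by_noon", "never")
    for b in ("by_noon", "never")
    if satisfies_contract(a, b) and satisfies_contract(b, a)
    and not (a == "by_noon" and b == "by_noon")
]
print(counterexamples)  # [('never', 'never')]: both contracts hold, neither shuts down
```

The analogous circular argument for the crash-safety property does not admit such a model, because a first violation would have to occur at some earliest time, contradicting one of the two contracts; no such induction is available for the liveness-flavored shutdown property.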
Probabilistic Behavior. Multi-aspect interfaces will be able to capture not
only deterministic guarantees, but also probabilistic properties connected to
performance and reliability, such as the probability of meeting deadlines.
In fact, many timeliness properties of real-time systems can be studied only
in a probabilistic setting: a simple example is the message delivery delay
over a non-deterministic protocol such as Ethernet. We will develop
assume-guarantee techniques for reasoning compositionally about probabilistic
real-time interfaces. In addition, we will develop algorithms for checking
the compatibility of such interfaces. The focus on the interface level,
without regard to implementation details, will ensure compositionality and a
degree of abstraction that makes the application of formal tools feasible.
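A probabilistic interface of the kind described above exposes only a latency distribution, not implementation details. A Monte Carlo sketch of composing two such interfaces into an end-to-end deadline-miss probability, with illustrative exponential latency models and an assumed 25 ms deadline:

```python
import random

# Estimate the probability that the sum of two components' latencies
# exceeds a deadline, using only the distributions their interfaces expose.

def sample_latency(mean_ms):
    # exponential latency model; any per-component model could be plugged in
    return random.expovariate(1.0 / mean_ms)

def miss_probability(deadline_ms=25.0, trials=100_000, seed=0):
    random.seed(seed)
    misses = sum(
        sample_latency(5.0) + sample_latency(8.0) > deadline_ms
        for _ in range(trials)
    )
    return misses / trials

p = miss_probability()
print(p)  # the estimated probability mass of the summed latencies beyond 25 ms
```

Because the computation touches only interface-level distributions, it composes: replacing one component requires only its new latency model, not re-analysis of the other.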
1C Information Assurance for SISs.
Research proposed here is in two areas:
Lightweight Authentication Protocols for Networked Embedded Systems and Information Processing for Possibly Compromised SISs.
The issues to be addressed in lightweight authentication protocols for networked embedded systems are new techniques for sender authentication, and scalable key distribution techniques, leveraging work on secure multicast, for challenge-response authentication protocols that keep track of whether sensor nodes scattered in the environment have been compromised.
In addition, we will work on timely propagation of revocation information, such as compromised keys and expired certificates, and on rekeying in the event of node compromise.
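One common lightweight building block for sender authentication, sketched below, is a per-message HMAC over the payload and a monotonically increasing counter for replay protection. This is an illustration, not the proposed protocol; key distribution, revocation, and rekeying, the issues discussed above, are out of scope here, and the pre-shared key is an assumption:

```python
import hmac
import hashlib
import struct

KEY = b"illustrative-shared-key"  # assumed pre-shared; distribution not shown

def make_message(counter, payload):
    # 8-byte big-endian counter, payload, then a truncated HMAC-SHA256 tag
    header = struct.pack(">Q", counter)
    tag = hmac.new(KEY, header + payload, hashlib.sha256).digest()[:8]
    return header + payload + tag

def verify_message(message, last_counter):
    header, payload, tag = message[:8], message[8:-8], message[-8:]
    (counter,) = struct.unpack(">Q", header)
    expected = hmac.new(KEY, header + payload, hashlib.sha256).digest()[:8]
    if not hmac.compare_digest(tag, expected):
        return None  # forged or corrupted message
    if counter <= last_counter:
        return None  # replayed message
    return counter, payload

msg = make_message(1, b"temp=21.3")
print(verify_message(msg, last_counter=0))  # accepted: (1, b'temp=21.3')
print(verify_message(msg, last_counter=1))  # None: replay detected
```

Truncating the tag trades forgery resistance for bandwidth, a typical concession to the energy and packet-size budgets of sensor nodes.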
In terms of information processing for possibly compromised SISs, we will study the use of traditional fault-tolerant networking techniques such as replication and partitioning of network services, redundancy of network resources, and survivable network overlays (including replica creation and management). In the area of self-healing networks, we will bring techniques from statistical signal processing to bear in two different ways. First, when a rogue node starts broadcasting information that deviates widely from hypotheses determined to be true, either on the basis of past data or from a number of other (possibly uncorrupted) nodes, we will use decision networks to treat the rogue node's information as an outlier and discard it, while propagating the hypothesis up the hierarchy of the SIS. Second, in the event of a large-scale attack on a group of nodes, we will explore the use of game-theoretic methods (against an intelligent adversary) to isolate the group of nodes under attack and propagate alarms up the SIS hierarchy about the need for intervention.
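The outlier-rejection idea can be sketched with a simple robust statistic; here a median-absolute-deviation test stands in for the decision networks mentioned above, and the node names, readings, and threshold are illustrative:

```python
import statistics

# Peer readings establish a consensus value; a node whose report deviates
# far from the robust center is treated as an outlier (a possibly
# compromised node) and discarded before the aggregate is propagated
# up the hierarchy.

def filter_rogue_readings(readings, k=5.0):
    """Return (accepted, rejected) splits of a {node: value} dict."""
    values = list(readings.values())
    center = statistics.median(values)
    mad = statistics.median(abs(v - center) for v in values) or 1e-9
    accepted, rejected = {}, {}
    for node, value in readings.items():
        (accepted if abs(value - center) <= k * mad else rejected)[node] = value
    return accepted, rejected

readings = {"n1": 20.1, "n2": 19.8, "n3": 20.3, "n4": 20.0, "n5": 95.0}
accepted, rejected = filter_rogue_readings(readings)
print(sorted(rejected))  # ['n5']: the deviant report is discarded
```

Median-based statistics are attractive here because, unlike the mean, they are not dragged arbitrarily far by a minority of compromised reports.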
We emphasize here that the area of information assurance for networked SISs is a large and complex body of new computer science research, and we will be guided in its development by the design and testing (chiefly with regard to scaling) of safety policies during the construction of the SISs.
2A Combining static and dynamic software analysis