Investigation into Effective SEFI Mitigation for On-Board Data Handling Architectures
Shazia Maqbool and Craig Underwood
Surrey Space Centre, University of Surrey, Guildford, Surrey, GU2 7XH, UK
Abstract:
Radiation-hard integrated-circuit (IC) technology typically lags behind commercial-off-the-shelf (COTS) technology by about two generations; hence, there is pressure to implement space systems using COTS technologies.
COTS technology is constantly changing. Devices are becoming smarter, both in their performance and in how transparently they operate for the user, which means that more control features are being added inside the device. In the past, only control-related devices, e.g. microprocessors, were susceptible to single event functional interrupts (SEFIs). However, the constantly increasing complexity of devices has extended these functional interrupts to other data handling devices, e.g. memories and field programmable gate arrays (FPGAs). SEFI signatures vary with the device under consideration. In general, an unexpectedly high error rate, device “hangs” or changes in device current consumption are likely to be seen. Traditional mitigation techniques are either unable to detect such errors or require a large amount of redundancy.
We describe a fault-tolerant architecture designed to enhance the reliability of space systems based on COTS devices, and to provide automated system recovery, in the presence of SEFIs. Our architecture is based on the concept of a fast data network interlinking all units of the data handling subsystem to an intelligent supervisor node. The supervisor monitors status messages from the units and intervenes when the state of a unit does not match expectations or when messages stop arriving. In such an event, the supervisor attempts to identify the nature of the fault and to recover the unit accordingly.
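The supervisor behaviour described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the class and method names, the state labels, the timeout parameter and the two recovery actions ("reset" and "power_cycle") are all assumptions made for illustration. Timestamps are passed in explicitly so that the logic can be exercised without a real network or clock.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Fault(Enum):
    SILENT = "silent"            # status messages stopped arriving
    STATE_MISMATCH = "mismatch"  # reported state differs from expectation


@dataclass
class UnitRecord:
    expected_state: str
    last_seen: float                      # time of the latest status message
    reported_state: Optional[str] = None  # None until the first message


@dataclass
class Supervisor:
    timeout_s: float  # assumed silence threshold, not a value from the paper
    units: Dict[str, UnitRecord] = field(default_factory=dict)
    recovery_log: List[Tuple[str, Fault, str]] = field(default_factory=list)

    def register(self, unit_id: str, expected_state: str, now: float) -> None:
        self.units[unit_id] = UnitRecord(expected_state, last_seen=now)

    def on_status(self, unit_id: str, state: str, now: float) -> None:
        """Record a status message received from a unit."""
        rec = self.units[unit_id]
        rec.reported_state = state
        rec.last_seen = now

    def check(self, now: float) -> Dict[str, Fault]:
        """Return the units whose behaviour does not match expectations."""
        faults: Dict[str, Fault] = {}
        for unit_id, rec in self.units.items():
            if now - rec.last_seen > self.timeout_s:
                faults[unit_id] = Fault.SILENT
            elif (rec.reported_state is not None
                  and rec.reported_state != rec.expected_state):
                faults[unit_id] = Fault.STATE_MISMATCH
        return faults

    def recover(self, unit_id: str, fault: Fault, now: float) -> str:
        """Choose a hypothetical recovery action based on the fault type."""
        action = "power_cycle" if fault is Fault.SILENT else "reset"
        self.recovery_log.append((unit_id, fault, action))
        rec = self.units[unit_id]
        rec.last_seen = now        # give the unit a fresh window to report
        rec.reported_state = None
        return action
```

In use, the supervisor would call `on_status` for each incoming message and `check` periodically, passing each reported fault to `recover`. Mapping a silent unit to a power cycle and a state mismatch to a reset is only one plausible policy; the appropriate action depends on the SEFI signature of the device concerned.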
We present an example demonstration of the proposed principles. The test bed is shown in Fig. 1. PC1 simulates the on-board computer (OBC) subsystem and sends a packet to the Celoxica RC203 development board (acting as the interface FPGA) every 100 ms over the parallel port. The RC203 board is connected to PC2 and PC3 over Ethernet. PC2 acts as the supervisor, whilst PC3 has been added as another node in the system to simulate a multi-node environment (both for the supervisor and the network).
The purpose of this test bed is to estimate the computational resources required by the supervisor and the size of the interface FPGA, and therefore to put forward recommendations for their implementation. In addition, we want to measure the latency in detection and recovery. A multi-node environment is required to demonstrate that the supervisor can be made immune to problems such as deadlocks.
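As a rough guide to the detection latency to be expected, the following sketch bounds the time between a unit falling silent and the supervisor noticing. It assumes a simple timeout-based detector that is polled at a fixed interval, and it ignores network delay and jitter. Only the 100 ms message period comes from the test bed; the timeout and polling interval are hypothetical values chosen for illustration, and recovery time is not included.

```python
def detection_latency_bounds_ms(period_ms: int, timeout_ms: int,
                                poll_ms: int) -> tuple:
    """Return (best, worst) latency in ms for detecting a silent unit.

    Model (an assumption, not a measured result): the unit sends a status
    message every period_ms; the supervisor declares it silent once more
    than timeout_ms has passed since the last message, and evaluates this
    condition every poll_ms.
    """
    if timeout_ms < period_ms:
        raise ValueError("timeout shorter than period causes false alarms")
    # Best case: the unit fails just before its next message was due.
    best = timeout_ms - period_ms
    # Worst case: the unit fails just after sending, and the timeout
    # expires just after a poll, so one further poll interval is needed.
    worst = timeout_ms + poll_ms
    return best, worst


# 100 ms period from the test bed; 300 ms timeout and 50 ms poll are assumed.
bounds = detection_latency_bounds_ms(period_ms=100, timeout_ms=300, poll_ms=50)
```

With these assumed values the bounds are 200 ms and 350 ms. A shorter timeout reduces latency but leaves less margin for delayed messages.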
Corresponding/Presenting Author
Ms. Shazia Maqbool, Ph.D Student, Surrey Space Centre, BA Building, University of Surrey, Guildford, Surrey, GU2 7XH, UK
Tel: +44 (0) 1483 683411, email