An Architecture for Reconfigurable Computing in Space
Robert F. Hodson1, Kevin Somervill1, John Williams2, Neil Bergman2, Robert Jones III3
1NASA Langley Research Center
2The University of Queensland
3ASRC Aerospace Corp.
Introduction
The availability of reconfigurable radiation tolerant Field Programmable Gate Arrays (FPGAs) for space applications creates an opportunity to explore and potentially exploit new processing paradigms for increased computing performance. Studies have shown that custom FPGA-based implementations or soft core processors coupled with custom co-processors can greatly improve the performance of some applications. Imaging applications have been shown to achieve speedups of eight to 800 over an 800 MHz Pentium III processor [1]. Other embedded benchmarking programs have shown an average performance speedup of 5.8 when soft core processors are combined with custom co-processors. Reduced power consumption (on average 57%) has also been demonstrated [2].
Space-qualified device availability and promising initial performance studies suggest that a space-qualified reconfigurable computing platform is a viable solution for certain space applications. The Reconfigurable Scalable Computing (RSC) project, funded through NASA’s Exploration Systems Mission Directorate (ESMD), is an ongoing effort to develop a reconfigurable computing platform to support the processing requirements of future NASA missions. This paper presents the high-level system architecture along with an overview of the challenges of space-based reconfigurable computing.
Architecture Goals and Objectives
Strategic technical challenges were developed by ESMD to guide the development of sustainable and affordable solutions for future exploration missions to the Moon and Mars. The RSC architecture is directly traceable to the strategic technical challenges of reconfigurability and modularity. Reconfigurability led to the selection of SRAM-based FPGAs as a fundamental computing resource to enable adaptation to new or unanticipated circumstances. Modularity led to design decisions related to physical form factor and levels of granularity in the RSC architecture.
Applications are often implemented on traditional general-purpose processors. The desire to leverage the traditional (sequential) general-purpose computing approach led to a decision to support soft core processor(s) in FPGAs. Custom co-processor support for specialized processing was also desirable due to the performance improvements this approach has been shown to yield. One of the architecture’s objectives is a computing platform that can be configured (or reconfigured) to support the appropriate blend of general and special-purpose processing needed to meet application performance requirements.
In developing a computing platform for a varied application space it is desirable to provide system flexibility. A modular approach with “reasonable” grain size per module allows for a scalable solution that can be tailored to meet the processing requirements for a given application. The ability to scale a processing solution requires an inter-module communications capability to facilitate data and control flow between processing elements. Combining the elements discussed, an architecture as depicted in Figure 1 begins to emerge.
Figure 1. High-level architecture of the RSC platform.
General-purpose Computing Model
One approach to reconfigurable computing is to use traditional general-purpose processors (ASICs or hard-cores) in conjunction with specialized computing co-processors implemented in FPGAs. Although the RSC’s architecture could support this approach through its standard bus interface, this is not the approach used by the RSC. The RSC implements soft processor(s) for general-purpose computing. The soft processor core supported is the 32-bit MicroBlaze RISC processor. A triple modular redundant (TMR) version of this processor is currently under study by the Xilinx Single Event Effect (SEE) consortium and is being tested for radiation effects [3]. Because this processor is optimized for Xilinx FPGAs, it is possible to fit two TMR-MicroBlaze processors in a single Xilinx Virtex-4 FX60 device and still have resources available for additional special-purpose processing.
The MicroBlaze architecture can be customized to meet an application’s processing requirements. For example, a cache, barrel shifter, multiplier, divider, and floating point unit are all customizable options. The RSC architecture uses the MicroBlaze with custom instruction and data caches that will support its Harvard architecture and provide the necessary support for single event upset (SEU) mitigation. Another advantage of the soft core approach is the ability to change SEU mitigation techniques within the FPGA to meet mission requirements. For example, an upset in an image may not require mitigation, but an upset in a control processor could be catastrophic. Different mitigation strategies can be applied in each case by programming the FPGA appropriately.
Special-purpose Computing Model
Much of the performance gain of FPGA-based computing comes from the ability to design special-purpose cores that are optimized to perform computationally expensive tasks. The RSC architecture inherently supports this approach with FPGA resources (logic, memory, DSP slices, etc.) that can be programmed by developers. These cores must communicate with memory, other cores, I/O devices, and soft processors. The RSC architecture supports several forms of communications for special-purpose computing. High-speed (2.5 Gbps) serial I/O is provided for external communications to/from the RSC’s Reconfigurable Processing Module (RPM). Communications with the MicroBlaze processor or I/O devices can be accommodated via the On-Chip Peripheral Bus (OPB). Additionally, the MicroBlaze processor supports Fast Simplex Links (FSL), which provide an instruction-pipeline interface via point-to-point communication links with FIFO queuing. Cores can also directly access internal block RAM within the FPGA or access external memory via direct memory access (DMA). Figure 2 shows how custom logic may be combined with a MicroBlaze (uB) soft core in the RSC’s RPM.
Figure 2. Block Diagram of the RSC’s Reconfigurable Processing Module (RPM).
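The FIFO-queued, point-to-point character of a simplex link can be illustrated with a small software model. This is only a behavioral sketch, not the FSL hardware interface: the class name `SimplexLink`, the FIFO depth, and the squaring "co-processor" are all hypothetical and chosen for illustration.

```python
from collections import deque


class SimplexLink:
    """Toy model of a one-way FIFO link between a producer and a consumer."""

    def __init__(self, depth):
        self.depth = depth      # maximum number of queued words (assumed value)
        self.fifo = deque()

    def try_put(self, word):
        """Non-blocking write: returns False when the FIFO is full."""
        if len(self.fifo) >= self.depth:
            return False
        self.fifo.append(word & 0xFFFFFFFF)  # keep words to 32 bits
        return True

    def try_get(self):
        """Non-blocking read: returns None when the FIFO is empty."""
        return self.fifo.popleft() if self.fifo else None


# A simplex link carries data one way, so a round trip needs one per direction.
to_core = SimplexLink(depth=4)
from_core = SimplexLink(depth=4)

for sample in (1, 2, 3):
    to_core.try_put(sample)

# Hypothetical co-processor: squares each word it receives.
while (word := to_core.try_get()) is not None:
    from_core.try_put(word * word)

results = []
while (word := from_core.try_get()) is not None:
    results.append(word)
```

Because each link is unidirectional, the sketch pairs two of them so that the processor can both hand operands to the custom core and collect its results.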
Network Model
The MicroBlaze processor does not have the shared interrupt and cache coherence hardware support needed for Symmetric Multiprocessing (SMP). These limitations make a loosely-coupled Massively Parallel Processing (MPP) approach attractive. In this computational model each processor supports its own unique memory space and communication between processors is performed through Network Interface Controllers (NICs) that are used to send and receive messages between processing nodes.
The RSC has adopted the MPP approach, but has a multi-level network. There are potentially three classes of communications in an RSC system, and these relate to the physical implementation of the system. The RSC is designed with modules that can be stacked together to form a PCI bus (compatible with the PCI-104 standard). Stacks can also be interconnected via a Network Module (NM) to scale to larger systems. Communication is therefore possible (1) between modules, via the PCI interface, (2) between stacks, via the NM, and (3) between processors within the same FPGA. The RSC networking architecture provides for all three classes of communication seamlessly and abstracts communication details from the user through traditional software abstraction provided by the operating system. Figure 3 shows the network model protocol layers for inter-module communications.
Figure 3. Network Protocol Layers.
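The three classes of communication can be viewed as a simple decision on the source and destination of a message. The sketch below is illustrative only: the `NodeAddress` fields, the `transport_for` function, and the returned labels are hypothetical names, and it assumes one FPGA per module so that processors sharing a module also share an FPGA.

```python
from typing import NamedTuple


class NodeAddress(NamedTuple):
    """Hypothetical address of a soft processor in an RSC system."""
    stack: int    # which stack the processor lives in
    module: int   # which module within that stack
    cpu: int      # which soft processor within that module's FPGA


def transport_for(src, dst):
    """Pick the communication class that connects two processors."""
    if src == dst:
        return "local"            # same processor, no network needed
    if src.stack != dst.stack:
        return "network-module"   # class (2): between stacks, via the NM
    if src.module != dst.module:
        return "pci"              # class (1): between modules, via PCI
    return "on-chip"              # class (3): within the same FPGA


a = NodeAddress(stack=0, module=1, cpu=0)
same_fpga = transport_for(a, NodeAddress(0, 1, 1))
same_stack = transport_for(a, NodeAddress(0, 2, 0))
other_stack = transport_for(a, NodeAddress(1, 0, 0))
```

In a real system this choice would be hidden beneath the operating system's network abstraction, so application code would address a peer without knowing which path is used.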
The details of a message transfer between two processors on different modules are shown in Figure 4. The message is created by the application layer, converted to packets by the operating system and sent via NIC-to-NIC communications across the physical media (in this example, the PCI bus).
Figure 4. Inter-module message communication flow.
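The message-to-packet step in this flow can be sketched as follows. The header layout (source, destination, sequence number, payload length), the 8-byte payload limit, and the function names are assumptions made for the example; they do not describe the RSC packet format.

```python
import struct

# Hypothetical header: source node, destination node, sequence number, length.
HEADER = struct.Struct(">HHHH")
MAX_PAYLOAD = 8  # bytes per packet, kept small so the example fragments


def packetize(src, dst, message):
    """Split an application message into header-prefixed packets."""
    packets = []
    for seq, offset in enumerate(range(0, len(message), MAX_PAYLOAD)):
        chunk = message[offset:offset + MAX_PAYLOAD]
        packets.append(HEADER.pack(src, dst, seq, len(chunk)) + chunk)
    return packets


def reassemble(packets):
    """Rebuild the message on the receiving node, tolerating reordering."""
    parts = {}
    for packet in packets:
        _src, _dst, seq, length = HEADER.unpack_from(packet)
        parts[seq] = packet[HEADER.size:HEADER.size + length]
    return b"".join(parts[seq] for seq in sorted(parts))


message = b"hello from node 1"
packets = packetize(src=1, dst=2, message=message)
received = reassemble(list(reversed(packets)))
```

The sequence number lets the receiver restore the original order even if packets arrive out of order, which is why the usage lines deliberately reverse the packet list before reassembly.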
Software Model
To facilitate application development, the RSC architecture supports software layers to abstract hardware details and provide a rich set of development tools. The uClinux operating system is run on the MicroBlaze processor providing typical OS functionality including process management, file management, and device/network abstraction. uClinux is a derivative of the popular Linux operating system for processors that lack memory management units. The GCC toolchain can be used with uClinux to provide a development environment that includes compilers, assemblers, linkers, debuggers, and other tools. These software elements provide a traditional software programming paradigm for a soft processor in a reconfigurable system.
Additionally, to support the MPP structure of the RSC architecture, the Message Passing Interface (MPI) will be implemented to provide high-level primitives for inter-process communications. MPI functionality includes capabilities to send/receive/broadcast messages as well as synchronize processes. A subset of the common MPICH implementation of MPI, which has approximately 125 library calls, will be ported to the RSC platform. MPICH can be implemented on top of the TCP/IP protocol but can also be optimized to take advantage of underlying hardware support and bypass the operating system for improved performance. This provides a mechanism to optimize the RSC message passing architecture by eliminating layers of the TCP/IP protocol where they are not applicable or are inefficient in the uClinux implementation of the protocol stack.
A development environment is also needed for custom core development. Traditional hardware description languages like VHDL and Verilog can be used, as can newer higher-level tools such as StarBridge System’s Viva graphical environment or Celoxica’s DK Design Suite. Viva support in particular is being developed for the RSC platform, and plans for a Viva system description of the RPM and interfacing primitives are underway. The combination of a common operating system, toolchain, message passing library, and HDL tools provides a rich environment for productive application development.
Fault Tolerance
Radiation-induced errors, typically from particles trapped in radiation belts or from cosmic rays, are a common source of faults in space avionics. A variety of techniques are used to mitigate single event upsets and improve fault tolerance in space electronics. The RSC platform uses several approaches. The reconfigurable logic in the Xilinx FPGAs can have three types of errors that require mitigation: logic errors, memory errors, and configuration errors. Errors in logic (transients or bit flips) are eliminated through TMR. Three identical circuits vote, so the output of the affected circuit is outvoted and the error is corrected. This is done using the Xilinx XTMR tool for logic triplication. One drawback of this approach is the need to triplicate inputs and outputs to the device. The I/Os are tied together on the circuit board, effectively reducing the device’s I/O capacity by two-thirds.
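The voting step can be illustrated with a bitwise majority function. This is a software sketch of the principle only, not the logic generated by the XTMR tool; the function names and the 300-pin figure are hypothetical.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def usable_io(physical_pins):
    """Triplicated I/O leaves one logical signal per three physical pins."""
    return physical_pins // 3


good = 0b10110010
upset = good ^ 0b00001000                  # one copy suffers a single bit flip
voted = majority_vote(good, upset, good)   # the two healthy copies outvote it

logical_signals = usable_io(300)           # assumed pin count, for illustration
```

A single upset in any one of the three copies is masked because the other two still agree. Two copies upset in the same bit position would defeat the voter.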
A TMR design alone is not enough to ensure a fault-free circuit. The FPGA’s configuration memory must also be scrubbed (continually rewritten) to correct any errors due to SEUs. The scrubbing logic is external to the Xilinx FPGA in a rad-tolerant Actel FPGA. The Actel device uses antifuse technology along with TMR, so no scrubbing of configuration memory is needed. Memory errors can also be eliminated with TMR, but error correction codes, like Hamming codes, are also used to detect and correct errors. Unlike configuration memory, the contents of generic RAM cells are not known a priori, and therefore the contents of each cell must be read out, checked for errors, corrected if necessary, and written back. A special case of memory error is a cache error. Because a cache line is always duplicated in main memory (for a write-through cache), a detect-and-invalidate method can be utilized since the memory element can be re-fetched from main memory. The cache should still be periodically scrubbed to prevent the accumulation of multiple bit errors, which may be undetectable. Cache tags must also be checked for errors.
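The read, check, correct, and write-back cycle for RAM can be sketched with a Hamming(7,4) code. This is a minimal illustration under stated assumptions: the 4-bit word size, the function names, and the list standing in for RAM are all hypothetical. Practical EDAC typically protects wider words and adds an extra parity bit so that double-bit errors are detected rather than miscorrected.

```python
def encode(nibble):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                              # index 0 unused
    code[3], code[5], code[6], code[7] = d      # data bit positions
    code[1] = code[3] ^ code[5] ^ code[7]       # parity over 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]       # parity over 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]       # parity over 4, 5, 6, 7
    return sum(bit << (pos - 1) for pos, bit in enumerate(code) if pos)


def syndrome(word):
    """XOR of the positions of all set bits: 0 if clean, else the bad position."""
    s = 0
    for pos in range(1, 8):
        if (word >> (pos - 1)) & 1:
            s ^= pos
    return s


def correct(word):
    """Flip the bit the syndrome points at, if any."""
    s = syndrome(word)
    return word ^ (1 << (s - 1)) if s else word


def decode(word):
    """Extract the 4 data bits from positions 3, 5, 6 and 7."""
    return sum(((word >> (pos - 1)) & 1) << i for i, pos in enumerate((3, 5, 6, 7)))


def scrub(memory):
    """Read every word, correct single-bit errors, and write back."""
    fixed = 0
    for addr, word in enumerate(memory):
        repaired = correct(word)
        if repaired != word:
            memory[addr] = repaired
            fixed += 1
    return fixed


memory = [encode(n) for n in (0x3, 0xA, 0xF)]
memory[1] ^= 1 << 4                  # simulate a single event upset
errors_fixed = scrub(memory)
values = [decode(w) for w in memory]
```

Scrubbing periodically matters because this code can only repair one flipped bit per word. If a second upset lands in the same word before the first is corrected, the syndrome points at the wrong bit.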
Memories external to the reconfigurable logic must also have Error Detection And Correction (EDAC). For the RSC platform, these memories include non-volatile memory and SDRAM. Additional logic to support this is implemented in the Actel device. Memory buffers in the Actel require EDAC and scrubbing where data can become stale and accumulate multi-bit errors.
Physical Model
As mentioned previously, the RSC uses a stackable form factor (see Figure 5). The RSC project is extending the PCI-104 standard for space applications. The project has been calling this standard SPACE-104. SPACE-104 is ruggedized, shielded, and conduction cooled to support launch and space environments. A stackable 33 MHz, 32-bit PCI bus is implemented and is backwards compatible with the PCI-104 standard. The form factor is larger than PCI-104 to support the larger footprints of space-grade FPGAs but still provides for connection of PCI-104 cards for ground support and testing. The larger side of its rectangular form factor also provides a large contact area to remove heat from the stack. A second high-density connector is defined in the SPACE-104 standard to provide additional inter-module I/O and future expansion to a 64-bit PCI implementation.
Figure 5. Two interconnected RSC stacks.
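For a sense of scale, the theoretical peak transfer rate of the stack's PCI bus follows directly from its clock rate and width. The short calculation below uses the 33 MHz, 32-bit figures from the text; the function name is hypothetical, and the result is an upper bound on a shared bus, so sustained throughput would be lower once arbitration and protocol overhead are included.

```python
def peak_bandwidth_mb_s(clock_hz, width_bits):
    """Theoretical peak of a parallel bus: one transfer per clock cycle."""
    return clock_hz * width_bits / 8 / 1e6


pci32 = peak_bandwidth_mb_s(33e6, 32)   # current 32-bit bus
pci64 = peak_bandwidth_mb_s(33e6, 64)   # possible future 64-bit expansion
```

This gives 132 MB/s for the 32-bit bus, and the 64-bit expansion would double it at the same clock rate.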
The fundamental modules that make up an RSC stack are the Reconfigurable Processing Module, which was previously discussed, the Network Module that is used to bridge between multiple stacks, the Command and Control Module which is the command interface for the system, and the Power Module that performs power conversion and distribution.
This stackable approach has proven effective for embedded terrestrial systems, and its modularity allows the system to be customized to meet processing requirements. The stackable approach also has the added advantage of not requiring separate backplane and enclosure designs.
Challenges
In the development of any new computing system there are many challenges to overcome. Developing a computing system for space complicates the design even more. Managing this complexity is a primary challenge in developing the RSC platform. The RSC project is taking an incremental approach to its design, first implementing basic functionality and then adding performance enhancements. The use of reconfigurable logic aligns with this approach. As an example, a simple direct-mapped cache can be developed first, followed by a cache with error detection and scrubbing. Later, more advanced techniques such as lazy write or selective prefetch can be implemented. There are similar incremental steps for the development of other subsystems.
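The second step in the cache example, a direct-mapped cache with error detection, can be sketched in software. This is a behavioral model only and not the RSC cache design: the class name, the four-line size, and the single parity bit covering both tag and data are assumptions. Because the cache is write-through, a line that fails its parity check is invalidated and re-fetched from main memory.

```python
def parity(value):
    """Even/odd parity of an integer: 1 if it has an odd number of set bits."""
    return bin(value).count("1") & 1


class DirectMappedCache:
    """Write-through, direct-mapped cache with parity over tag and data."""

    def __init__(self, memory, num_lines=4):
        self.memory = memory
        self.num_lines = num_lines
        self.lines = [None] * num_lines   # each entry: (tag, data, parity)
        self.errors_detected = 0

    def _fill(self, addr):
        """Fetch a word from main memory and install it with fresh parity."""
        data = self.memory[addr]
        tag = addr // self.num_lines
        check = parity(tag) ^ parity(data)
        self.lines[addr % self.num_lines] = (tag, data, check)
        return data

    def read(self, addr):
        index, tag = addr % self.num_lines, addr // self.num_lines
        line = self.lines[index]
        if line is not None:
            line_tag, data, check = line
            if parity(line_tag) ^ parity(data) != check:
                self.lines[index] = None      # detect and invalidate
                self.errors_detected += 1
            elif line_tag == tag:
                return data                   # clean hit
        return self._fill(addr)               # miss or invalidated line

    def write(self, addr, data):
        self.memory[addr] = data              # write-through keeps memory current
        self._fill(addr)


memory = list(range(100, 116))
cache = DirectMappedCache(memory)
first = cache.read(5)

tag, data, check = cache.lines[1]
cache.lines[1] = (tag, data ^ 0x4, check)     # simulate an upset in the data
second = cache.read(5)                        # parity fails, line is re-fetched
```

A single parity bit detects any odd number of flipped bits, which is sufficient here because correction is delegated to main memory. It would miss an even number of flips in one line, which is one reason periodic scrubbing remains useful.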
In addition to design complexity there are technical challenges. The availability of dense rad-hard non-volatile memory is a problem. Space-grade devices of sufficient density are essentially non-existent. Some companies have screened their own FLASH memory devices and other efforts like the development of Chalcogenide RAM are underway. Fast rad-hard SDRAM is also problematic. The performance of many applications is bounded by memory bandwidth and the availability of high-speed SDRAM is limited.
FPGA power distribution is also an important concern. High power at low core voltages makes a Switching Point of Load (SPOL) power system desirable to reduce losses. Linear regulators can be used, but they waste power and generate excess heat that must be managed by the thermal system. A SPOL converter is a better solution, but again such converters do not appear to be available as space-grade parts.
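The penalty of linear regulation at low core voltages can be estimated with simple arithmetic. The values below (a 3.3 V supply, a 1.2 V core rail, a 5 A load, and a 90% efficient switching converter) are assumed for illustration and are not measurements from the RSC platform. The model ignores regulator quiescent current.

```python
def linear_regulator_loss_w(v_in, v_out, load_a):
    """A linear regulator drops the excess voltage as heat at full load current."""
    return (v_in - v_out) * load_a


def switching_converter_loss_w(v_out, load_a, efficiency):
    """Loss of a switching converter with a given conversion efficiency."""
    p_out = v_out * load_a
    return p_out / efficiency - p_out


V_IN, V_CORE, LOAD_A = 3.3, 1.2, 5.0      # assumed illustrative values

linear_loss = linear_regulator_loss_w(V_IN, V_CORE, LOAD_A)        # about 10.5 W
linear_efficiency = V_CORE / V_IN                                  # about 36%
switching_loss = switching_converter_loss_w(V_CORE, LOAD_A, 0.90)  # about 0.67 W
```

Under these assumptions the linear regulator dissipates more heat than the 6 W it delivers to the load, which is the thermal burden the text refers to.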