Circuit Modeling and Fault Injection Approach to Predict SEU Rate and MTTF in Complex Circuits[1]

Fabian Vargas, Alexandre Amory

Catholic University – PUCRS

Electrical Engineering Dept.

Av. Ipiranga, 6681. 90619-900 Porto Alegre, Brazil

Abstract

This work presents a novel approach to predict the SEU rate and the mean time to failure (MTTF) of complex circuits. Unlike traditional in-flux methods, the approach described herein does not require laboratory experiments to characterize microelectronic devices for operation in radiation environments. Because it is simple to carry out, the proposed approach is inherently low-cost. Moreover, because it is a fully analytical approach based on a set of computer programs, researchers and development engineers need only a workstation to compute the failure (SEU) rate and estimate the MTTF. We also present a computation example to illustrate the proposed approach. This methodology is being automated through the development of a CAD tool that performs circuit modeling, fault injection, and simulation data analysis.

Keywords: Failure Rate; MTTF Estimation; VHDL Language; Transient-Fault Injection; Fault Simulation; SEU; Reliable Complex Circuits.

1. Preliminary Considerations

By 1975, the existence of Single-Event Upsets (SEUs) had just been discovered [1], even though such a phenomenon had been predicted in 1962 [2]. Since 1975, the investigation of single particle phenomena has progressed rapidly [3-5]. Extensive theoretical work has been performed to explain failure mechanisms, and sophisticated test techniques and procedures have been developed to extrapolate laboratory data to failure rates in realistic or worst-case radiation environments such as space, nuclear power plants, or commercial flights operating at high altitudes (33,000 feet) [6-10].

Most single particle phenomena in electronic devices can be characterized by a critical charge, related to the circuit design, and a cross section, related to the geometry of the area sensitive to upset, latchup, or burnout. The critical charge Qc of a memory cell is defined as the greatest charge that can be deposited in its sensitive node area before the cell is corrupted, that is, before its logic state is inverted [8]. The cross section of a device is the total number of errors divided by the incident particle fluence. This parameter is given in cm² and represents, in other words, the sensitive node area of the memory cell. The sensitive node area of a memory element is the interface between the reverse-biased n+ (resp. p+) drain depletion region and the p-substrate (resp. n-well), for the case of an n-well (resp. p-well) technology. This drain depletion region belongs to the off-transistors of memory cells [4,9].
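As a small illustration of the cross-section definition above, the following Python sketch divides an error count by a particle fluence. The function name and the numerical values are hypothetical and are not taken from any measurement reported in this work.

```python
def cross_section_cm2(num_errors: int, fluence_per_cm2: float) -> float:
    """Device cross section (cm^2) = observed errors / incident particle fluence."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return num_errors / fluence_per_cm2


# Hypothetical numbers: 250 upsets observed under a fluence of 1e7 particles/cm^2.
sigma = cross_section_cm2(250, 1e7)
print(f"cross section = {sigma:.2e} cm^2")
```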

The SEU characterization of microelectronic devices has focused on measuring cross section versus LET (Linear Energy Transfer) by means of laboratory experiments. LET represents the amount of energy transferred to silicon when an incident particle strikes the circuit surface and loses energy through ionization of the substrate. The threshold or critical LET (LETc) of a circuit therefore represents the deposited energy that corresponds to the critical charge Qc of a memory cell. In other words, any incident particle that deposits in a sensitive node an energy higher than that represented by LETc provokes an upset on that node. Fig. 1 illustrates results for memory upsets in typical microcircuits [7,23]. Curve A shows data for a simple RAM, which has only one type of upset mechanism. In this case, there is a single value of threshold LET and of device error cross section. Curve B illustrates the results for a microprocessor, which contains several types of bistable circuits, each with its own threshold LET and sensitive area. In this case, the data appear as a staircase, with each step representing the addition of a new failure mode. Characterizing this type of device is very complex and requires several sets of tests at many values of LET. The worst-case parameters for such a device are a limiting cross section, tL, and a minimum threshold or critical LET (i.e., LETc), as shown in fig. 1. The LETc of a device, given in “MeV·cm²/mg”, determines the portion of a particle's energy spectrum to which the device is sensitive. The integrated flux over this range times the cross-section area of the circuit's sensitive volume yields the expected number of state changes in the circuit, or SEU rate, usually reported in “upsets/bit·day”.
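The last step of this reasoning (flux integrated above LETc, multiplied by the cross section) can be sketched numerically. The Python fragment below is only an illustration under stated assumptions: the particle spectrum is assumed to be available as a tabulated differential flux, the integral is approximated with the trapezoidal rule, and the function names and the sample spectrum are hypothetical.

```python
def integrated_flux(let_values, diff_flux, let_c):
    """Trapezoidal integral of a tabulated differential flux for LET >= let_c."""
    points = list(zip(let_values, diff_flux))
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x1 <= let_c:
            continue  # segment lies entirely below the threshold
        if x0 < let_c:
            # Clip the segment at let_c using linear interpolation.
            y0 = y0 + (y1 - y0) * (let_c - x0) / (x1 - x0)
            x0 = let_c
        total += 0.5 * (y0 + y1) * (x1 - x0)
    return total


def seu_rate(cross_section_cm2, let_values, diff_flux, let_c):
    """SEU rate (upsets/bit.day) = per-bit cross section x flux above LETc."""
    return cross_section_cm2 * integrated_flux(let_values, diff_flux, let_c)


# Hypothetical spectrum: LET in MeV.cm2/mg, flux in particles/(cm2.day) per unit LET.
let_grid = [0.0, 10.0, 20.0]
flux = [4.0, 2.0, 0.0]
rate = seu_rate(1e-6, let_grid, flux, let_c=5.0)
print(f"SEU rate = {rate:.3e} upsets/bit.day")
```

For a device with a staircase response such as curve B, the worst-case estimate would use the limiting cross section together with the minimum LETc, as described above.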

Fig. 1. Representative data for device error cross section versus particle LET for two types of microcircuits [7,23].

Laboratory experiments are typically performed using the in-flux test method [23]. The radiation source is usually a high-energy particle accelerator, such as a cyclotron, which can be operated with a variety of ion species. In practice, only one ion species is used during a laboratory experiment, because changing ion sources in the accelerator is expensive. The irradiations are performed with the device under test in an evacuated chamber, with the device package lid removed. The test socket is mounted on a platform that can be rotated so that the angle of incidence between the ion beam and the chip surface can be changed. The circuit is electrically exercised by a tester connected to the test socket through a set of cables and special connectors to the vacuum chamber. Although this experimental approach provides very accurate SEU rate predictions, it has important drawbacks. The most important is its high cost, since just two or three cyclotron hours may cost tens of thousands of dollars. In addition, the use of this type of equipment requires the development of specific hardware (and software) interfaces, which takes money and time during the design process. Finally, time-to-market is drastically affected, because using this type of equipment implies the development of rigorous test sets, which require lengthy validation procedures before the device characterization step itself takes place. For detailed information about SEU test equipment and related procedures, the reader is referred to [9,23]. Fig. 2 compares the proposed approach with the in-flux test method, commonly used so far.

In [11,12] the authors present a tool (FT-PRO) that automatically modifies a VHDL description by appending fault-tolerant functions [13-16]. These functions are based on information redundancy by means of two types of coding techniques: a) parity code (one bit per memory element) and b) Hamming code plus one parity bit, which performs single error correction/double error detection per memory element [17]. At present, FT-PRO is being modified to be incorporated into the design flow shown in fig. 2b (more precisely, in the “Circuit Design” step), where configurable fault-tolerant functions have been added to the “VHDL Design Libraries” [11].
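To make the second coding technique concrete, the Python sketch below implements a textbook Hamming(7,4) code extended with one overall parity bit, which gives single error correction/double error detection for a 4-bit word. It is an illustrative assumption only: the word width, bit layout, and function names are hypothetical and do not describe the VHDL functions that FT-PRO actually generates.

```python
def secded_encode(nibble: int) -> list:
    """Encode 4 data bits as [p0, c1..c7]: Hamming(7,4) plus overall parity p0."""
    c = [0] * 8
    c[3], c[5], c[6], c[7] = [(nibble >> i) & 1 for i in range(4)]
    c[1] = c[3] ^ c[5] ^ c[7]          # covers positions 1, 3, 5, 7
    c[2] = c[3] ^ c[6] ^ c[7]          # covers positions 2, 3, 6, 7
    c[4] = c[5] ^ c[6] ^ c[7]          # covers positions 4, 5, 6, 7
    c[0] = c[1] ^ c[2] ^ c[3] ^ c[4] ^ c[5] ^ c[6] ^ c[7]
    return c


def secded_decode(codeword):
    """Return (nibble, status); nibble is None when a double error is detected."""
    c = list(codeword)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos            # XOR of the positions holding a 1
    overall = 0
    for bit in c:
        overall ^= bit                 # 0 for any valid codeword
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        c[syndrome] ^= 1               # single error (syndrome 0 means p0 itself)
        status = "corrected"
    else:
        return None, "double_error_detected"
    return c[3] | c[5] << 1 | c[6] << 2 | c[7] << 3, status


word = secded_encode(0b1011)
word[5] ^= 1                           # emulate a single upset
print(secded_decode(word))
```

The three outcomes of such a decoder (no error, corrected, detected only) are the kind of per-fault classification a fault simulation report can tally.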

Therefore, determining the SEU rate of complex circuits, such as microprocessors, is a very complex, time-consuming, and costly step at the end of the design process. To minimize this problem, the present work proposes a novel approach to predict the error (SEU) rate and the mean time to failure (MTTF) for this type of circuit. This is an analytical approach. Unlike traditional in-flux methods, it does not require laboratory experiments to characterize microelectronic devices for operation in radiation environments. Because it is simple to execute, the proposed approach is inherently low-cost. Moreover, because it is an analytical approach based on a set of computer programs, researchers and development engineers need only a workstation to obtain the failure rate and estimate the MTTF at their own work site.

It is also important to mention that several significant works, with different degrees of success, have been proposed in the literature to perform fault modeling and fault injection [18,19], and to automate the fault simulation process [20-22]. The most important difference between the proposed approach and those found in the literature is that the work proposed herein is the first to present not only fault injection mechanisms adapted to circuits modeled in VHDL, but also a fault modeling strategy that faithfully represents radiation-induced transient faults (i.e., SEUs) in the memory elements of complex circuits. Additionally, the proposed work is being automated through its coupling with the FT-PRO tool methodology.

Fig. 2. Comparison between the design flows of devices for operation in radiation environments: (a) the traditional in-flux method [11,12] and (b) the proposed approach.

2. Circuit Modeling and Fault Injection Approach

High-level description languages are now widely used to describe hardware parts in the same way as software programs. Consequently, a transient fault that affects the hardware operation can be considered as a fault affecting the software execution. In other words, a bit-flip fault affecting the hardware operation (e.g., an SEU-induced fault in a memory element) can have an equivalent representation at the software implementation level. In this section, we present the circuit modeling & fault injection techniques we have developed to produce transient faults in memory elements during VHDL fault simulation. The fault model assumed is not restricted to single faults; any combination of faults can occur in a memory element or group of memory elements of the circuit.
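The software-level equivalent of a bit-flip fault can be sketched as an XOR between the stored value and a fault mask. The Python fragment below is a minimal illustration, assuming a memory element is modeled as an unsigned integer of fixed width; the function name and the 8-bit default width are hypothetical. Because the mask may have several bits set, it also covers the multiple-fault case.

```python
def inject_bit_flips(value: int, bit_positions, width: int = 8) -> int:
    """Invert the given bit positions of a register modeled as an integer."""
    mask = 0
    for pos in bit_positions:
        if not 0 <= pos < width:
            raise ValueError(f"bit position {pos} outside a {width}-bit register")
        mask |= 1 << pos
    # XOR inverts exactly the masked bits; the final AND keeps the register width.
    return (value ^ mask) & ((1 << width) - 1)


single = inject_bit_flips(0b10100101, [0])       # single upset on bit 0
double = inject_bit_flips(0b10100101, [0, 7])    # two bits upset in the same element
```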

The circuit modeling & fault injection strategy prepares the VHDL code to run in a fault simulation process. As can be seen in fig. 2, the starting point is a synthesizable VHDL description of the circuit whose reliability with respect to transient faults in memory elements is to be estimated.

As the first step, the circuit modeling & fault injection strategy instantiates an “Error Management Unit - EMU” inside the architecture of the circuit VHDL main code. The goal of this unit is to control the whole fault injection process during fault simulation. To do so, this unit:

a) reads data from an external file: randtime.txt (which was generated by the Srand Function, to be detailed later), in order to obtain the time instants to inject faults in the circuit;

b) reads data from the external file: randtime.txt to get the initial seeds for the LFSR processes that generate the register address and the bit position where faults will be injected. These LFSR processes are instantiated as a Component in the architecture of the Error Management Unit, are completely controlled by the EMU, and will be detailed later in this section;

c) generates a simulation report file: result.txt, which contains information about the total number of faults injected, the list of memory elements and bit positions affected by faults, and the number of faults injected in each one of these elements.
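Steps a) to c) can be sketched in software as follows. This Python fragment is an illustration only, not the VHDL implementation of the EMU: the 16-bit LFSR polynomial, the modulo mapping from LFSR state to register address and bit position, and all names are assumptions, and the fault instants and seeds are passed in directly because the layout of randtime.txt is not specified here.

```python
from collections import Counter


def lfsr16_step(state: int) -> int:
    """One step of a 16-bit Fibonacci LFSR (x^16 + x^14 + x^13 + x^11 + 1)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def run_injection(registers, width, fault_times, reg_seed, bit_seed):
    """Flip one LFSR-selected bit per fault instant; return (registers, report)."""
    if reg_seed == 0 or bit_seed == 0:
        raise ValueError("an LFSR seed of 0 never leaves the all-zero state")
    regs = list(registers)
    report = Counter()                  # faults injected per (address, bit)
    reg_state, bit_state = reg_seed, bit_seed
    for _ in fault_times:               # a simulator would wait for each instant
        reg_state = lfsr16_step(reg_state)
        bit_state = lfsr16_step(bit_state)
        addr = reg_state % len(regs)    # assumed mapping: state -> register address
        pos = bit_state % width         # assumed mapping: state -> bit position
        regs[addr] ^= 1 << pos          # the injected bit flip
        report[(addr, pos)] += 1
    return regs, report


regs, report = run_injection([0] * 4, 8, [10, 20, 30, 40, 50], 0xACE1, 0x1234)
print(sum(report.values()), "faults injected")
```

The `report` counter plays the role of the data written to result.txt: the total number of faults and the number of faults per memory element and bit position.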

The EMU has been implemented as a separate piece of code and then instantiated as a Component inside the architecture of the circuit VHDL main code [13,14], as can be seen in fig. 3. It must be kept in mind that these modifications to the circuit VHDL code have the sole purpose of allowing fault simulation. Thus, before synthesis, they are completely removed from the main VHDL code.

The main structures of the skeleton-based VHDL code can be detailed as follows:

- Lines 4 - 6 and 8 - 23 describe the Entity and the Architecture of circuit_example, the circuit that will be modeled to run in a fault simulation process.

- Lines 28 - 30 and 32 - 55 describe the Entity and the Architecture of the Error Management Unit, which will control the fault simulation process and provide the user with a final simulation data report.

- In lines 11 - 13, the Component Error Management Unit is declared and then instantiated inside the architecture of circuit_example (lines 19 and 20).

- In lines 35 - 37, the Component LFSR is declared and then instantiated inside the architecture of the Error Management Unit as an address selector of the memory element that will be upset (lines 47 and 48). The same LFSR is instantiated again in lines 51 and 52 as a selector of the bit position inside the memory element that will be inverted.

- In lines 40 and 41, the files randtime.txt and result.txt, located on the computer hard disk, are declared. The first file is read-only and contains a list of time instants generated by the Srand Function at the operating system level. The EMU uses this list to control the time instants when faults will be injected during the fault simulation process. During this process, the EMU writes data into the result.txt file. These data comprise the addresses of all registers and bit positions selected for fault injection during simulation, and the history of the faults injected, per register address and bit position, that were detected and corrected, only detected, and not detected. This information will be used later (as described in Section 3) to calculate the cross section and the failure rate of the circuit under validation.
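The list of fault injection instants held in randtime.txt can be produced by any seeded pseudo-random generator. The Python sketch below stands in for the Srand Function and is an assumption in every detail: the use of Python's `random` module, the nanosecond time unit, the one-instant-per-line text layout, and the function names are all hypothetical.

```python
import random


def generate_fault_times(n_faults: int, sim_length_ns: int, seed: int) -> list:
    """Draw n_faults distinct injection instants in [1, sim_length_ns), sorted."""
    rng = random.Random(seed)           # seeded, so a run can be reproduced
    return sorted(rng.sample(range(1, sim_length_ns), n_faults))


def format_times(times) -> str:
    """Render the instants one per line (assumed file layout)."""
    return "\n".join(str(t) for t in times) + "\n"


times = generate_fault_times(10, 1000, seed=42)
text = format_times(times)              # content that would be saved to randtime.txt
```

Fixing the seed makes a fault simulation campaign repeatable, which helps when comparing the detected/corrected counts of two versions of the same circuit.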

Line  Code Structure
   1  library IEEE;
   2  use IEEE.std_logic_1164.all;
   3
   4  Entity circuit_example is
   5      port ( ... );
   6  End circuit_example;
   7
   8  Architecture arch of circuit_example is
   9  |
  10  |
  11  Component ErrorManagementUnit
  12      port ( ... );
  13  End Component;
  14  |
  15  |
  16  Begin
  17  |
  18  |
  19  EMU: ErrorManagementUnit
  20      port map ( ... );
  21  |
  22  |
  23  End arch;
  24  ------
  25  library IEEE;
  26  use IEEE.std_logic_1164.all;
  27
  28  Entity ErrorManagementUnit is
  29      port ( ... );
  30  End ErrorManagementUnit;
  31
  32  Architecture arch_EMU of ErrorManagementUnit is
  33  |
  34  |
  35  Component LFSR is
  36      port ( ... );
  37  End Component;
  38  |
  39  |
  40  File RandomTimeFile : TEXT open READ_MODE is "randtime.txt";
  41  File ResultFile : TEXT open WRITE_MODE is "result.txt";
  42  |
  43  |
  44  Begin
  45  |
  46  |
  47  LFSR_Reg_Selector: LFSR
  48      port map ( ... );
  49  |
  50  |
  51  LFSR_Bit_Selector: LFSR
  52      port map ( ... );
  53  |
  54  |
  55  End arch_EMU;

Fig. 3. Skeleton of the VHDL code generated by the “circuit modeling & fault injection” strategy. This skeleton-based VHDL code is meant to run in a fault simulation session. One of the main characteristics of this proposal is the ease with which the generation of the skeleton from a synthesizable VHDL circuit description can be automated.