SoC Self Test Based on a Test-Processor
Authors
Tobias Koal, René Kothe, Heinrich T. Vierhaus, Brandenburg University of Technology Cottbus, Germany
ABSTRACT
Testing complex systems on a chip (SoCs) with up to billions of transistors has been a challenge to IC test technology for more than a decade. Most research in IC test technology has focused on production testing, while self test in the field of application has received much less attention. With SoCs also being used in long-lived systems for safety-critical applications, enhanced self-test capabilities become essential for the dependability of the host system. For example, automotive electronic systems must be capable of performing a fast and effective start-up self test. For future self-repairing systems, fault diagnosis will become necessary, since it is the basis for dedicated system re-configuration. One way to solve this problem is a hierarchical self-test scheme for embedded SoCs, based on hardware and software. The core of the test architecture is then a test processor device, which is optimised to organize and control test functions efficiently and at minimum cost. This device must itself be highly reliable. The chapter introduces the basic concept of hierarchical HW / SW based self test, the test processor concept and architecture, and its role in a hierarchical self test scheme for SoCs.
INTRODUCTION
Test technology for large-scale integrated circuits and systems has entered a new age with new challenges. While the problems of IC production testing have been a focus of research since the late 1970s, test technology for integrated circuits and systems in their field of application has received much less attention. Frequently, system tests “in the field” are run by software functions only, mostly without knowing about or using the extra test circuitry that was implemented to support production tests. As large-scale integrated “embedded” electronic sub-systems are increasingly used in time- and safety-critical applications, high-quality system tests “in the field” are also becoming a must. Start-up tests, on-line tests, and specific diagnostic tests for fault identification are all parts of the total solution. With “embedded” processors becoming an omnipresent asset of such systems, there is the chance that such a processor may become the essential core of a self-test process, thereby replacing the IC tester device that is used in production test. The chapter first explains the architecture of a typical “system on a chip” (SoC) and introduces the essential challenges in test technology. Then the concept of a specific test-supporting processor and its specific properties is introduced. Special attention is given to the problem of “testing the test processor”. Size and performance of a series of test processor designs are compared. If properly designed, such a processor can assume various roles in a hierarchical SoC (self-) test concept and can even perform dynamic tests for global interconnects of an IC. Finally, a concept of using such a processor as a “watchdog” that observes the correct function of another (larger) processor is introduced.
BACKGROUND
Testing large-scale integrated “Systems on a Chip” (SoCs) has been a problem since their arrival in the 1990s (Design and Test Roundtable, 1997 & 1999, Zorian, Y., Marinissen, E. J. & Dey, S., 1998). Not only the complexity of SoCs has created problems, but also their heterogeneous structure. Although an SoC does not incorporate embedded processors by definition, real systems of this kind will usually consist of one or more processor cores, memory blocks, logic blocks, often also analog and mixed-signal functional blocks, and, to some extent, even radio frequency (RF) devices. SoCs containing multiple processor cores, so-called MP-SoCs, are the core of all hand-held communication devices. Typically, such complex systems are operated in a “locally synchronous – globally asynchronous” mode, also including a complex communication scheme based on multiple buses, bridges and bus couplers. A typical structure is shown in figure 1.
Figure 1: Structure of a multiple processor system on a chip (MP-SoC)
Unlike application specific ICs (ASICs), MP-SoCs will always have their functionality mainly defined by “embedded” software. This feature alone limits their functional testability by conventional methods. Even worse, a semiconductor manufacturer who does not have the software that will later run on the system will not be able to perform a comprehensive functional test. Furthermore, embedded hardware blocks will often be imported as pre-designed “components off the shelf” (COTS) or as IP (intellectual property) blocks, whose real structure may even be unknown to the system designer. Testing is then usually based on a set of patterns delivered by the IP-block vendor. How such patterns can be applied to the embedded block has been a matter of research for some time (see chapters on SoC testing elsewhere in this book).
Within a specific IEEE working group, innovative test technology for SoCs has been developed since the 1990s (Zorian, Y., 1997, Zorian, Y., Marinissen, E. J. & Dey, S., 1998, Goel, S. K. & Marinissen, E. J., 2002). The basic concept developed there consists of test-supporting extra circuitry around embedded blocks, so-called wrappers, and additional test access channels. Such means facilitate test access for functional testing of embedded blocks or even for structure-oriented tests.
For production test, however, there is still the need to test embedded logic (processors, ASICs) by scan test technology (Kobayashi, T., Matsue, T. & Shibata, F., 1968, Eichelberger, E.B. & William, T. W., 1977). In scan test, test access is made possible by linking all flip-flops into one or more shift register structures. These scan chains then provide indirect test access to all inputs and outputs of combinational logic blocks. Most SoCs have a full scan design, based on multiple parallel scan chains (Hamzaoglu, L. & Patel, J., 1999, Hsu, F. F., Butler, K. M. & Patel, J., 2001), optionally with an additional test access channel. In state-of-the-art test technology, highly compacted test information is applied to several chips under test in parallel. The test control information is first generated off-line, then transferred to the device under test (DUT) in a highly compressed form (see also chapter 5.2). This test information is then de-compacted by on-chip circuitry and fed into multiple parallel scan chains (Novak, O. & Hlaviczka, J., 2000, Rajski, J. & Tyszer, J., 2002, Rajski, J., Tyszer, J., Kassab, M. & Mukherjee, N., 2001). Multiple parallel scan chains are also used for advanced deterministic self test strategies (Liang, H.-G., Hellebrand, S. & Wunderlich, H.-J., 2001). In the latest developments, on-chip fault diagnosis is also performed (Mrigalski, G., Pogiel, A., Rajski, J., Tyszer, J. & Wang, C., 2004, Leininger, A., Gössel, M. & Muhmenthaler, P., 2004), and, when faults are detected, fault information is stored in on-chip memory blocks for further analysis (Pöhl, F., Beck, M., Arnold, R., Rzeha, J., Rabenalt, T. & Gössel, M., 2007).
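The scan principle described above can be illustrated with a minimal Python sketch. It models a single scan chain only; the three-bit example logic, the stuck-at fault, and all names (`ScanChain`, `scan_test`, `good_logic`, `faulty_logic`) are hypothetical and chosen purely for illustration, not taken from any design discussed in this chapter.

```python
from typing import Callable, List

Bits = List[int]


class ScanChain:
    """Flip-flops linked into one shift register (a single scan chain)."""

    def __init__(self, length: int) -> None:
        self.cells: Bits = [0] * length

    def shift(self, scan_in: int) -> int:
        """Shift by one position and return the bit leaving at scan-out."""
        scan_out = self.cells[-1]
        self.cells = [scan_in] + self.cells[:-1]
        return scan_out

    def load(self, pattern: Bits) -> Bits:
        """Shift in a full pattern; return the previous contents (shifted out)."""
        shifted_out = [self.shift(bit) for bit in reversed(pattern)]
        return shifted_out[::-1]

    def capture(self, logic: Callable[[Bits], Bits]) -> None:
        """One functional clock: the flip-flops capture the logic outputs."""
        self.cells = logic(self.cells)


def good_logic(state: Bits) -> Bits:
    """Hypothetical fault-free combinational block between the flip-flops."""
    a, b, c = state
    return [a & b, b ^ c, a | c]


def faulty_logic(state: Bits) -> Bits:
    """Same block with the first output stuck at 0 (assumed fault)."""
    a, b, c = state
    return [0, b ^ c, a | c]


def scan_test(chain: ScanChain, logic: Callable[[Bits], Bits],
              patterns: List[Bits], expected: List[Bits]) -> List[int]:
    """Return the indices of patterns whose response differs from expected."""
    failing = []
    for index, pattern in enumerate(patterns):
        chain.load(pattern)                         # shift the stimulus in
        chain.capture(logic)                        # apply one functional clock
        response = chain.load([0] * len(pattern))   # shift the response out
        if response != expected[index]:
            failing.append(index)
    return failing


PATTERNS = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
EXPECTED = [good_logic(p) for p in PATTERNS]

print(scan_test(ScanChain(3), good_logic, PATTERNS, EXPECTED))    # []
print(scan_test(ScanChain(3), faulty_logic, PATTERNS, EXPECTED))  # [0]
```

In a real scan test, shifting out one response is normally overlapped with shifting in the next pattern; the sketch keeps the two phases separate for readability.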
Test technology in which on-chip processor devices and embedded software are used to support test procedures has been suggested a few times (Hellebrand, S., Wunderlich, H.-J. & Hertwig, A., 1996), but has not gained real importance for production tests. The reason apparently is that “embedded” processors can be faulty themselves and are not easy to test (Corno, F., Sonza Reorda, M., Squillero, G. & Violante, M., 2001). On the other hand, embedded processors are frequently used for software-based start-up tests in the field of application. Surprisingly, those tests hardly ever use any of the extra test circuitry implemented for production testing. Using an “embedded” processor device for tests that require a high degree of reliability and fault coverage is a non-trivial problem. Above all, there is a chicken-and-egg problem, since it is very difficult to prove or guarantee that such processors work correctly themselves. The reasons for their limited applicability are:
- software faults,
- non-deterministic processor operation (e.g. due to caches),
- transient errors occurring during the testing process,
- undetected hardware faults in such processors,
- their relatively high power consumption.
Essentially, standard types of embedded processors are generally not good candidates for supporting critical hardware test functions, since they are not always highly reliable by nature. On the other hand, at least for safety-critical applications, there is the need for high-quality test procedures in the field of application, which must be performed without an external test machine. Logic built-in self test (BIST) can be part of the solution, but will, for example, not easily cover interconnects on SoCs. For this purpose, we developed a specific test processor, which can be used to perform off-line tests, but which can also be helpful for on-line testing, for example by acting as a watchdog that supervises the correct operation of a larger “normal” processor on-line (Galke, C., Pflanz, M. & Vierhaus, H. T., 2002).
EMBEDDED TEST TECHNOLOGY BASED ON A TEST PROCESSOR
A Hierarchical Test Scheme
For high-quality tests “in the field” there is no golden device such as an external test machine, which performs correctly by definition. Therefore we need a hierarchical (self-) test approach, which uses available hardware and software resources under realistic conditions. The first question to be discussed is whether a core role in testing can be assumed by any available processor device in an SoC.
Embedded processors used for high-quality testing must meet several requirements which are partly contradictory. First, the processor design must be fully deterministic. Speculative execution of instructions and uncertain behaviour of the kind typically associated with cache-based memory systems must be avoided. Second, the processor must be economical in terms of size and power consumption. For this reason, we chose a 16-bit RISC architecture, partly based on the DLX basic architecture (Hennessy, J. & Patterson, D., 1990). Third, the processor must be either very well testable (in case of an external test facility), or it must be self-testing. Fourth, it must have a more-or-less standard programming interface and associated programming tools, such as a C compiler. Fifth, the processor may need extra instructions for special test operations that are time-critical and would be too slow if composed from standard instructions.
The general idea was to use such a device as the cornerstone of a hierarchical SoC self test strategy (Kretzschmar, C., Galke, C. & Vierhaus, H. T., 2004). As the single reliable device on which a hierarchically organised self test scheme can be built, the processor first has to be highly reliable by itself. That means that, within a self test scheme, the processor has to run a self test procedure first, without any support from external devices. The only additional resource provided at this stage is a memory self test process for an embedded memory block. With this processor validated as fault free, further steps of a hierarchical SoC test may follow, including bus tests, scan-based tests of logic and other devices plus, finally, functional self tests of other processors and system-level tests. The hierarchical SoC self test process to be supported is sketched in figure 2.
Figure 2: Hierarchical Self Test Scheme for SoCs
In this scheme, the test processor plays a key role in testing external logic blocks and other processor devices, for example by re-using scan test circuitry already used in production testing. However, the test of global interconnects such as system buses will also play a crucial role, since such structures are not easily covered by logic self tests or functional self tests of individual processors.
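The ordering of the hierarchical scheme can be sketched in a few lines of Python. The stage names follow the text; everything else is an assumption made for illustration: the stage functions are placeholders, the relative order of the processor and memory self tests is a guess, and stopping at the first failing stage is one plausible policy (a higher stage relies on the stages below it), not a rule stated in this chapter.

```python
from typing import Callable, List, Optional, Tuple

# A stage is a name plus a test function that returns True when the stage passes.
Stage = Tuple[str, Callable[[], bool]]

STAGE_NAMES = [
    "test processor self test",
    "embedded memory self test",
    "bus test",
    "scan-based test of logic blocks",
    "functional self test of other processors",
    "system-level test",
]


def run_hierarchical_self_test(
        stages: List[Stage]) -> Tuple[List[str], Optional[str]]:
    """Run the stages bottom-up; return (passed stages, first failing stage)."""
    passed: List[str] = []
    for name, test in stages:
        if not test():
            # Assumed policy: do not trust higher stages once a lower one fails.
            return passed, name
        passed.append(name)
    return passed, None


def always_pass() -> bool:
    """Placeholder for a real stage test (hypothetical)."""
    return True


def always_fail() -> bool:
    """Placeholder for a stage that detects a fault (hypothetical)."""
    return False


stages = [(name, always_pass) for name in STAGE_NAMES]
stages[2] = ("bus test", always_fail)   # inject a failing bus test

passed, failed = run_hierarchical_self_test(stages)
print(passed)   # the two stages below the bus test
print(failed)   # bus test
```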
Unfortunately, such a processor design was not available from any academic or industrial source. It turned out to be a development task for generations of students in bachelor's and master's theses. Following a kind of master development plan, the design of the processor went through the following steps:
1. Design a minimum-sized RISC processor with 16-bit registers and -bus width.
2. Add functional self test features that facilitate a short and effective self test for this device.
3. Add features for interrupt handling.
4. Add extensions for on-line fault recognition.
5. Extend the instruction set and the processor I / O ports to facilitate an effective test of external bus structures.
6. Develop an optional multiplier / divider unit.
7. Build a processor generation system with optional feature selection.
8. Develop a fast version with a pipeline.
9. Fine-tune the design and implement it in a prototype ASIC.
The first design step showed that the minimum complexity would be below 4000 equivalent gates, with, however, only 8 internal universal registers (Hennig, H., 2001). The first improvement added 8 more registers and two specific machine instructions that run two internal registers either as a linear feedback shift register (LFSR) or as a multiple-input signature register (MISR). Such functionality is cheap, and it is essential if the processor is to support external structural tests by acting as a pattern generator (Hellebrand, S., Wunderlich, H.-J. & Hertwig, A., 1996). This version also includes an optimised functional self test procedure that uses all registers, all instructions, all ports and, of course, all functional units. The self test process has to end with a specific signature after a pre-defined number of clock cycles (Schwabe, H., Galke, C. & Vierhaus, H. T., 2004). In this case, however, the extra logic implemented to supervise the functional test routine proved to be difficult to test. The extensions that facilitate sophisticated external bus tests included a few specific instructions that run over an extended number of clock cycles, plus an extended I / O interface. The objective was to have a macro-instruction that can send out a 16-bit word over a bus interface and record the result from the same bus one clock cycle later, in order to detect dynamic fault effects such as large-scale delays caused by line coupling.
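The LFSR and MISR register modes mentioned above can be modelled in Python as follows. This is a sketch under stated assumptions: the feedback taps (16, 14, 13, 11) are a commonly cited maximal-length choice for 16 bits and are not taken from the test processor itself, and the function names and the placeholder device response are hypothetical.

```python
from typing import List

MASK = 0xFFFF  # 16-bit registers, matching the processor word width


def lfsr_step(state: int) -> int:
    """One clock of a 16-bit Fibonacci LFSR (assumed taps 16, 14, 13, 11)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return ((state >> 1) | (bit << 15)) & MASK


def misr_step(state: int, word: int) -> int:
    """One clock of a MISR: an LFSR step with a response word XORed in."""
    return lfsr_step(state) ^ (word & MASK)


def generate_patterns(seed: int, count: int) -> List[int]:
    """Use the LFSR as a pseudo-random test pattern generator."""
    patterns = []
    state = seed & MASK
    for _ in range(count):
        patterns.append(state)
        state = lfsr_step(state)
    return patterns


def signature(responses: List[int], seed: int = 0) -> int:
    """Compact a sequence of response words into one 16-bit signature."""
    state = seed & MASK
    for word in responses:
        state = misr_step(state, word)
    return state


def lfsr_period(seed: int) -> int:
    """Count the clocks until the LFSR returns to its seed state."""
    state = lfsr_step(seed)
    count = 1
    while state != seed:
        state = lfsr_step(state)
        count += 1
    return count


patterns = generate_patterns(0xACE1, 8)
# Placeholder for the block under test: it simply inverts the low byte.
good_responses = [p ^ 0x00FF for p in patterns]
bad_responses = list(good_responses)
bad_responses[3] ^= 0x0004  # a single flipped bit in one response word

print(hex(signature(good_responses)))
print(signature(good_responses) != signature(bad_responses))  # True
```

Because the MISR update is linear and invertible, a single corrupted response word always changes the final signature; multiple errors can cancel each other (aliasing), which is why signature lengths are not chosen too small.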
Since the processor has no instruction pipeline, two separate instructions for “output port write” and successive “input port read” would have a delay of 5 clock cycles, which does not match the concept of driving the external bus under test with the processor clock for dynamic fault detection.
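The effect of the read-back latency on dynamic fault detection can be illustrated with a deliberately simplified Python model. The assumptions are loud ones: each bus line is taken to settle to its new value after a fixed number of clock cycles (one cycle for a healthy line), and the slow line and its three-cycle settling time are hypothetical values chosen for the example. Only the two read latencies (1 cycle for the macro-instruction, 5 cycles for two separate instructions) come from the text.

```python
from typing import Dict

WIDTH = 16  # bus width in bits


def sample_bus(old: int, new: int, cycles_after_write: int,
               settle_cycles: Dict[int, int]) -> int:
    """Value seen on the bus a given number of cycles after writing `new`.

    A line shows its new bit only once it has settled; healthy lines are
    assumed to settle within one cycle, slow lines are listed explicitly.
    """
    value = 0
    for line in range(WIDTH):
        settled = cycles_after_write >= settle_cycles.get(line, 1)
        source = new if settled else old
        value |= source & (1 << line)
    return value


def transition_passes(old: int, new: int, read_latency: int,
                      settle_cycles: Dict[int, int]) -> bool:
    """True if the word read back equals the word written (no fault seen)."""
    return sample_bus(old, new, read_latency, settle_cycles) == new


SLOW_LINE = {3: 3}   # hypothetical: line 3 needs 3 cycles, e.g. due to coupling

# Macro-instruction: read back one clock cycle after the write.
print(transition_passes(0x0000, 0xFFFF, 1, SLOW_LINE))  # False: fault detected
# Two separate instructions: read back 5 clock cycles after the write.
print(transition_passes(0x0000, 0xFFFF, 5, SLOW_LINE))  # True: fault escapes
```

In this model, any delay fault that settles within five cycles is invisible to the two-instruction sequence, which is the motivation for a single write-and-read-back instruction.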
The processor version with internal on-line error detection uses control word supervision for the control logic, based on a partial duplication of the control circuitry, and Berger code analysis for the data path, which mainly consists of an ALU. With these extensions, the processor complexity is still under 10 000 equivalent gates. The inclusion of a multiplier plus a divider circuit and, even more, the pipelined version bring the complexity to a level slightly beyond 20 000 equivalent gates. The final steps of development (6 and 8) only served to explore the limits of the concept, where the test processor might optionally have to replace a normal embedded processor. Finally, we developed a processor synthesis scheme that can generate a processor configuration on demand (Rudolf, D., 2006, Frost, R., Rudolph, D., Galke, C., Kothe, R. & Vierhaus, H. T., 2007). The processor configuration software cannot generate all possible features of the processor in arbitrary combinations, but only a set of versions with compatible features (figure 3).
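As background for the Berger code analysis mentioned above, the following Python sketch shows the basic encoding and checking of a 16-bit data word. It uses the standard definition of a Berger code (the check symbol is the binary count of zeros in the data word); it does not model the check-symbol prediction an ALU would need, and the function names are hypothetical.

```python
WIDTH = 16                        # data word width in bits
MASK = (1 << WIDTH) - 1
CHECK_BITS = WIDTH.bit_length()   # 5 bits suffice to count 0..16 zeros


def berger_check(word: int) -> int:
    """Berger check symbol: the number of 0 bits in the data word."""
    ones = bin(word & MASK).count("1")
    return WIDTH - ones


def is_valid(word: int, check: int) -> bool:
    """A code word is valid if the stored check symbol matches the data."""
    return berger_check(word) == check


data = 0x00F0
check = berger_check(data)
print(check)                       # 12 zeros in 0x00F0
print(is_valid(data, check))       # True

# Unidirectional error: two data bits fall from 1 to 0.
corrupted = data & ~0x00C0 & MASK  # 0x0030
print(is_valid(corrupted, check))  # False: the zero count no longer matches
```

Berger codes detect all unidirectional errors: 1-to-0 errors in the data raise the zero count while 1-to-0 errors in the check symbol lower its value, so the two can never compensate each other.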