Reconfigurable multi-channel system for hard-field tomography
S Garcia Castillo and K B Ozanyan
School of Electrical and Electronic Engineering, The University of Manchester, Manchester M60 1QD, United Kingdom
Abstract. A wide variety of modalities are used in hard-field Computed Tomography (CT), based on THz, optical, x-ray or γ-ray emission. Multi-channel implementations, typical for high-speed industrial imaging, call for specific hardware and software solutions targeting particular requirements for limited access. The work reported here addresses this issue by demonstrating a generic digital architecture for hard-field tomography, implemented as a complete tomography system, capable of imaging from incomplete data and with limited resources. The system is intended to implement the measurements and data processing widely used in hard-field tomography. It is modular, to allow easy expansion, and is based on the arrangement of a minimal number of digital signal processors (DSPs), field programmable gate arrays (FPGAs) and Sigma-Delta Analogue to Digital Converters (S-D ADCs). The system architecture and its reconfigurable features are demonstrated with the implementation of a 32-channel system. The system is programmed to perform digital lock-in detection, both for a typical case of optical signal measurements and for Guided-Path Tomography (GPT). Results from GPT of temperature fields are presented.
1. Introduction
High performance tomography systems with multi-processor architectures have been successfully developed for different applications, including medical imaging [1], [2]. On the basis of results with the Extended Hypercube (EH) architecture for image reconstruction by Convolution Back Projection and Fourier Inversion methods, K. Rajan et al. proposed in 1997 a hierarchical bus-based system (HBBS) [2]. In that design, eight processors shared a common bus and a common memory in a two-level bus-based system. The nodes in a single cluster were connected in an EH structure using the processor link ports. Under this scheme, data can be shared between two processors without using the common data bus. Consequently, the bus congestion problem, inherent in bus-based systems, is alleviated. The tomography system reported here is based on the HBBS approach, with the addition of pre-processing units for each channel.
Our calculations show that, in spite of the excellent processing capabilities of low-cost commercial DSPs, for some multi-channel applications it is still not possible to process and send in real time all the information collected at the front-end. Although some units with Multiple Instruction Multiple Data (MIMD) computational architectures are specially designed for multi-processing applications (e.g. quad-SHARC), they have limitations which make them less suitable for tomographs. For example, it would be a very challenging design task to use these DSPs to control a considerable number of external devices (e.g. 32 or more ADCs) while receiving simultaneous data from them. Additionally, implementing sophisticated processing tasks such as lock-in detection for each channel in parallel would be almost impossible, and any such attempt would consume important resources which could otherwise be used for other computational tasks (e.g. image reconstruction). On the other hand, devices such as FPGAs can be programmed to execute as many simultaneous operations as their internal resources allow. FPGAs are ideally suited for multi-tasking for a number of reasons: they offer significant benefits, including improved computational capabilities and higher bandwidths than most DSPs, and they may be used to enable fast data acquisition technologies, relieving DSPs from the impractical burden of handling external devices and pre-processed acquired data. FPGAs offer the best speed and function performance when arithmetic operations implementing multipliers, decimators, FIR filters, demodulators, etc. are required.
2. System description
2.1. A Hierarchical architecture
The lowest hierarchical level of our system is shown in Figure 1-a. It consists of a minimum number of pre-processing units (P-PUs, squares in Figure 1-a) sharing a common data bus with a Processing Unit (PUn, circles in Figure 1-a). Four PUs are directly linked with each other for data transfers between them, and each is connected to a Master Processing Unit (MAPU1) for data transfers to the next level of the hierarchy.
Figure 1, (a) Lowest hierarchical level, (b) Expanded architecture
The number of P-PUs on a particular data bus depends on the data rate of each channel and the maximum PU data bus throughput. As can be seen in Figure 1-a, a 32-channel tomograph can be implemented with 4 PUs and one MAPU. In order to expand this unit into a larger tomograph, it is necessary to include another, higher level of hierarchy, as can be seen in Figure 1-b. This scheme results in a 128-channel system consisting of 5 MAPUs and 16 PUs with the same characteristics and functionality. With this approach, a number of data buses (streams) are created, between which there is no contention for data. For each stream of the cluster there is a PU that controls data transfers and the address generation of individual P-PUs. The PUs are interconnected with one another for data transfers and interfaced with a MAPU, which manages the data transfer.
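The unit counts quoted above can be reproduced with a short sketch. It assumes 8 P-PUs per PU (the allocation stated in Section 2.3), four PUs per cluster, and that the fifth MAPU of the 128-channel system is a single top-level unit placed above four cluster MAPUs. The names `size_hierarchy` and `Hierarchy` are illustrative only and are not part of the system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hierarchy:
    channels: int  # one channel per P-PU
    pus: int       # processing units (DSPs)
    mapus: int     # master processing units


def size_hierarchy(clusters: int, pus_per_cluster: int = 4,
                   ppus_per_pu: int = 8) -> Hierarchy:
    """Count the units needed for a given number of lowest-level clusters."""
    if clusters < 1:
        raise ValueError("at least one cluster is required")
    pus = clusters * pus_per_cluster
    channels = pus * ppus_per_pu
    # Assumption: one MAPU per cluster, plus one top-level MAPU
    # as soon as several clusters have to be combined.
    mapus = clusters + (1 if clusters > 1 else 0)
    return Hierarchy(channels=channels, pus=pus, mapus=mapus)


single = size_hierarchy(1)    # 32 channels, 4 PUs, 1 MAPU
expanded = size_hierarchy(4)  # 128 channels, 16 PUs, 5 MAPUs
```

Under these assumptions the two calls return the 32-channel and 128-channel configurations described in the text.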
2.2. 32-channel system implementation
A 32-channel system has been implemented using low-cost PUs (DSPs) and P-PUs (FPGAs in conjunction with S-D ADCs). The selected processor is the ADSP-21262 from Analog Devices. This processor is designed for general purpose signal processing and is capable of 1200 MMACS (million multiply-accumulate operations per second). Some of its most important features are the relatively large amount of on-chip memory (2 Mbits SRAM and 4 Mbits ROM) as well as the high-bandwidth I/O. These eliminate the need for external memory blocks and allow the configuration of all 3 types of interface that are needed: i) to link each PU with its pre-processing units; ii) to link each PU with other PUs in the same cluster; and iii) to interface each PU with the MAPU. The ADSP-21262 combines these characteristics with a Super Harvard Architecture (SHARC), a modified Harvard architecture with separate program and data buses for instructions and data, which decreases execution times in DSP applications.
All P-PUs incorporate a versatile 16-bit Sigma-Delta analogue-to-digital converter (AD7725, Analog Devices) with re-configurable filter characteristics. The main component of each P-PU channel, a low-cost FPGA (XC3400 Spartan-3, Xilinx), controls the ADC, serves as a ROM to store the ADC’s user-defined filters and performs digital processing. A single P-PU consists of an FPGA interfaced to an ADC. The ADC is connected to a basic low-noise driver that converts a single ended input signal into a differential one. The FPGA is programmed in VHDL (VHSIC Hardware Description Language) as a state-machine function to control its respective ADC, to store data and to generate the required clock signal. The ADC’s configuration file is loaded from the FPGA. The output sample rate of the ADC is the ratio between the clock and the decimation value entered in a compiler available from the ADC manufacturer. An interrupt signal from the ADC is generated at this rate in order to latch the value at the parallel port output. The interrupt signal is synchronised with the internal clock of the FPGA and fed internally for signal processing.
2.3. Data packing and processing
Each FPGA receives its data through a parallel port from an ADC. This data is pre-processed (e.g. digitally filtered and demodulated) and sent to a PU through a common bus shared with other P-PUs. In this case, the number of P-PUs is determined by the parallel bus data rate of the ADSP-21262 (i.e. the PU) and the output data rate of each P-PU. The output data rate of each P-PU can be configured from 50 ks/s up to 780 ks/s (16 bits per sample), which correspond to the minimum and maximum output data rates of the selected ADC. 8 P-PUs have been allocated per DSP, therefore generating a maximum of about 12.5 Mbytes/s (8 × 780 ks/s × 2 bytes) per stream (i.e. one DSP). The ADSP-21262 has a parallel port specially designed to interface SRAM and peripheral devices. This port uses multiplexed address and data pins (AD15-0) along with three other signals (RD, WR, ALE) to control its transfer operations. In order to acquire the data from its 8 P-PUs, each PU generates an address corresponding to each P-PU; the addressed P-PU places its data onto the bus and the PU then reads the data value at the parallel port. The maximum transfer rate of the DSP's parallel port is around 20 Mbytes/s at a 75 MHz core rate for the current implementation of the system.
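The bus budget described above can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text (16-bit samples, 50 to 780 ks/s per P-PU, roughly 20 Mbytes/s at the parallel port). It ignores any protocol overhead such as address cycles, so the headroom it reports is an upper bound rather than a measured value. The function names are illustrative.

```python
def stream_rate(n_ppus: int, samples_per_s: int, bits_per_sample: int = 16) -> int:
    """Aggregate payload rate on one PU bus, in bytes per second."""
    return n_ppus * samples_per_s * bits_per_sample // 8


def max_ppus(port_bytes_per_s: int, samples_per_s: int,
             bits_per_sample: int = 16) -> int:
    """Largest number of P-PUs one parallel port could serve (overhead ignored)."""
    per_ppu = samples_per_s * bits_per_sample // 8
    return port_bytes_per_s // per_ppu


PORT_RATE = 20_000_000                   # approx. parallel-port limit from the text
worst_case = stream_rate(8, 780_000)     # 12_480_000 bytes/s, i.e. about 12.5 Mbytes/s
best_case = stream_rate(8, 50_000)       # 800_000 bytes/s at the slowest ADC setting
utilisation = worst_case / PORT_RATE     # fraction of the port budget in use
ceiling = max_ppus(PORT_RATE, 780_000)   # 12 P-PUs before the port saturates
```

With 8 P-PUs per PU the stream uses a little under two thirds of the quoted port rate, which leaves margin for the addressing overhead that this estimate leaves out.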
Once the data is stored in each PU, it is packed in 32-bit words and sent simultaneously to the MAPU. The latter has an interface with a dedicated FPGA. This FPGA, with the same characteristics as the ones used in the P-PUs, performs functions such as: i) controlling a USB2 chip for connection to other equipment (Figure 1-a, External Connection), and ii) generating data required to control external devices (e.g. a digitally synthesized carrier sinewave for driving/modulating radiation sources). Processed data in the memory of the MAPU is sent to the FPGA using the ADSP-21262 parallel port. Through the USB2 interface, the data is sent to other equipment (e.g. to a display for the reconstructed image or for additional processing). The USB2 interface is programmed in VHDL and is able to transmit data to the external equipment at up to 200 Mbits/s.
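As an illustration of the packing step, the sketch below places two signed 16-bit samples in each 32-bit word. The text does not specify the word layout, so putting the first sample in the upper half-word is an assumption, and `pack_words` and `unpack_words` are hypothetical helper names.

```python
def pack_words(samples):
    """Pack signed 16-bit samples pairwise into unsigned 32-bit words.

    Assumed layout: first sample of each pair in bits 31..16,
    second sample in bits 15..0.
    """
    if len(samples) % 2:
        raise ValueError("an even number of samples is required")
    words = []
    for high, low in zip(samples[0::2], samples[1::2]):
        for value in (high, low):
            if not -32768 <= value <= 32767:
                raise ValueError("sample does not fit in 16 bits")
        words.append(((high & 0xFFFF) << 16) | (low & 0xFFFF))
    return words


def unpack_words(words):
    """Recover the signed 16-bit samples from the packed 32-bit words."""
    samples = []
    for word in words:
        for half in ((word >> 16) & 0xFFFF, word & 0xFFFF):
            # Undo the two's-complement wrap for negative values.
            samples.append(half - 0x10000 if half & 0x8000 else half)
    return samples


packed = pack_words([1, -1, -32768, 32767])
restored = unpack_words(packed)
```

Packing in pairs halves the number of bus transfers needed to move a block of 16-bit samples over a 32-bit word interface.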
3. System applications
Data acquisition in absorption tomography is similar to a number of optical measurements, where it is common to use amplitude-modulated radiation sources. Lock-in amplifiers are commonly used to detect low-level, slowly varying signals in the presence of interference and noise, which are often many times larger than the signal containing the narrow band-limited information. A lock-in amplifier measures an AC voltage and outputs a voltage proportional to the amplitude of the AC signal. It suppresses all frequencies outside its LPF’s passband and uses phase sensitive detection (PSD) in order to convert the measured data into phase and amplitude information. The response of a single PSD depends on the phase difference between the input signal and the reference, which varies from one measurement situation to another. This dependence can be avoided by using a second PSD in tandem, thus yielding a quadrature demodulator.
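The quadrature principle can be sketched in a few lines. The example below multiplies the input by in-phase and quadrature references and averages over a whole number of carrier periods, so the average plays the role of the low-pass filter. This is a simplification for illustration and not the FIR filter used in the system. The 720 kHz sample rate, the interferer and the DC offset are assumed test values; only the 180 kHz carrier comes from the application described in this paper.

```python
import math


def lock_in(samples, f_ref, f_s):
    """Quadrature lock-in: return (amplitude, phase) of the f_ref component."""
    n = len(samples)
    i_sum = q_sum = 0.0
    for k, x in enumerate(samples):
        angle = 2.0 * math.pi * f_ref * k / f_s
        i_sum += x * math.sin(angle)   # first PSD, in-phase reference
        q_sum += x * math.cos(angle)   # second PSD, quadrature reference
    i_avg, q_avg = i_sum / n, q_sum / n
    # For x = A*sin(wt + phi): i_avg = (A/2)*cos(phi), q_avg = (A/2)*sin(phi),
    # so the amplitude no longer depends on the phase difference.
    return 2.0 * math.hypot(i_avg, q_avg), math.atan2(q_avg, i_avg)


F_S = 720_000.0   # assumed sample rate (4 samples per carrier period)
F_C = 180_000.0   # carrier frequency
N = 720           # 1 ms window, i.e. exactly 180 carrier periods

signal = [
    0.2 * math.sin(2.0 * math.pi * F_C * k / F_S + 0.3)   # wanted component
    + 1.0 * math.sin(2.0 * math.pi * 60_000.0 * k / F_S)  # larger interferer
    + 0.5                                                 # DC offset
    for k in range(N)
]
amplitude, phase = lock_in(signal, F_C, F_S)
```

In this sketch the interferer and the offset complete whole cycles within the averaging window, so they cancel and the recovered amplitude and phase match the wanted component (0.2 and 0.3 rad) to within rounding error.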
In order to demonstrate the processing characteristics and potential of the P-PUs, the FPGA software has been configured for a particular application. It requires the recovery of information (up to 50 kHz bandwidth) from an optical signal with a sinusoidal carrier (e.g. 180 kHz) and amplitude variations due to optical attenuation (between 5% and 20%). These are realistic parameters for optical absorption tomography (e.g. for imaging of gaseous fuel in internal combustion engines [4]), where the absorption signal is only a few percent (typically < 10%) of the total signal on the detectors and is recovered by a lock-in technique. Additionally, as a direct demonstration of the flexibility and re-programmability of the proposed system, the special case of Guided-Path Tomography (GPT) was used as an illustration. Photonic variants of GPT have been pioneered, but true 32-channel tomography sensor heads are still under development. GPT allows low-frequency measurements to be combined with hard-field tomography techniques. Consequently, the system was tested for GPT in the context of temperature mapping [5], for which a specific sensor and its signal conditioning circuitry have been implemented. The temperature distribution is reconstructed off-line from AC measurements of the temperature-induced resistance changes in an array of non-interacting transducers.
A lock-in quadrature demodulator has been implemented [6] with a hybrid design including VHDL code and IP cores from Xilinx. An LPF with 160 coefficients was implemented. The results showed that the average noise floor at 5% modulation index was around –110 dB in the band of interest (i.e. 50 kHz bandwidth) and that the SNR was 55 dB. Other modulation indices (e.g. 10%, 15%) were analysed, yielding the expected results. If a maximum modulation index of 20% is taken, the SNR is 65 dB.
The lock-in amplifier designed for absorption tomography has been reconfigured for Guided-Path Tomography (GPT) without any additional hardware intervention. GPT is a new concept in the field of indirect imaging which allows low-frequency measurements to be combined with hard-field tomography techniques. Further information about GPT can be found in [5]. In this experiment, a circular rotating ring was used to hold 32 nickel wires in parallel. A source of heat was positioned under the ring. Each wire was excited with a 180 kHz sinusoidal current provided by the system to obtain an AC output signal proportional to the resistance of the wire. Each signal was amplified and conditioned to meet the requirements of the input front-end drivers. Then, each signal was demodulated and further filtered in parallel using the P-PUs. The information obtained from the 32 channels was packed and sent to a PC using the USB2 interface available in the system. Software created in LabVIEW with a MATLAB script was used to reconstruct the image from the measurements using the inverse Radon transform. Figure 2-a shows the image reconstruction of a heating source placed under the centre of the ring, and Figure 2-b shows the heating source away from the centre (top-centre of the image).
Figure 2, (a) Heating source under the centre of the ring, (b) Heating source away from the centre
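For readers unfamiliar with the inverse Radon transform, a minimal filtered back projection is sketched below. It is not the LabVIEW/MATLAB software used for Figure 2. It assumes 32 parallel paths per projection (matching the 32 wires), 16 equally spaced ring positions over 180 degrees (the number of positions is not given in the text), unit path spacing, and a synthetic Gaussian hot spot whose projections are known analytically. All function names are illustrative.

```python
import math


def ram_lak_filter(proj, d=1.0):
    """Ramp-filter one projection with the discrete Ram-Lak kernel."""
    n = len(proj)
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(n):
            k = i - j
            if k == 0:
                acc += proj[j] / (4.0 * d * d)
            elif k % 2:  # odd offsets; even non-zero offsets contribute nothing
                acc -= proj[j] / (math.pi ** 2 * k * k * d * d)
        out.append(d * acc)
    return out


def filtered_back_projection(sinogram, angles, size, d=1.0):
    """Reconstruct a size x size image from parallel-path projections."""
    n = len(sinogram[0])
    centre, half = (n - 1) / 2.0, (size - 1) / 2.0
    image = [[0.0] * size for _ in range(size)]
    for proj, theta in zip(sinogram, angles):
        q = ram_lak_filter(proj, d)
        c, s = math.cos(theta), math.sin(theta)
        for row in range(size):
            y = (row - half) * d
            for col in range(size):
                x = (col - half) * d
                u = (x * c + y * s) / d + centre  # fractional path index
                i0 = math.floor(u)
                if 0 <= i0 < n - 1:               # linear interpolation
                    frac = u - i0
                    image[row][col] += (1.0 - frac) * q[i0] + frac * q[i0 + 1]
    scale = math.pi / len(angles)
    return [[v * scale for v in r] for r in image]


def gaussian_sinogram(x0, y0, sigma, angles, n=32, d=1.0):
    """Analytic projections of a unit-height Gaussian hot spot at (x0, y0)."""
    centre = (n - 1) / 2.0
    rows = []
    for theta in angles:
        s0 = x0 * math.cos(theta) + y0 * math.sin(theta)
        rows.append([sigma * math.sqrt(2.0 * math.pi)
                     * math.exp(-(((i - centre) * d - s0) ** 2) / (2.0 * sigma ** 2))
                     for i in range(n)])
    return rows


N_ANGLES = 16  # assumed number of ring positions
angles = [k * math.pi / N_ANGLES for k in range(N_ANGLES)]
sinogram = gaussian_sinogram(0.5, 6.5, 2.0, angles)  # off-centre hot spot
image = filtered_back_projection(sinogram, angles, size=32)
peak_row, peak_col = max(((r, c) for r in range(32) for c in range(32)),
                         key=lambda rc: image[rc[0]][rc[1]])
```

In this sketch the hot spot is placed at x = 0.5, y = 6.5, which corresponds to column 16 and row 22 of the 32 × 32 grid, and the brightest reconstructed pixel is expected at or next to that position.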
4. Conclusions
The system described above is re-configurable for data acquisition by a number of modalities, leaving a varying portion of the hardware resources free for other uses. Previous research [7] has demonstrated some capabilities of multi-processor architectures such as the present one, including the implementation of different reconstruction algorithms. The current reconfigurable system is equipped to address this task, and further research will therefore be carried out to analyse different digital data processing possibilities for online image reconstruction.