The ATLAS Liquid Argon Calorimeters Read Out Driver (ROD)

The TMS320C6414 DSP Mezzanine board

Abstract : This document gives a detailed description of the ATLAS Liquid Argon Calorimeters Read Out Driver prototype mezzanine designed around the TMS320C6414 from Texas Instruments. The document is organized as follows : after an overall presentation of the board, the functionality of the main components is described. The document ends with miscellaneous points such as power supply, JTAG, board technical considerations and planned cost.

Author : Julie PRAST

Web page : http://wwwlapp.in2p3.fr/Electronique/Experiences/ATLAS-ELEC/Rod/index.html

File name : pu6414final.doc

Created : June 2002.

Last updated : 04/09/2002

Version : 1.2

The TMS320C6414 Processing Unit 06/09/2002 6/23

1  Table of contents

1 Table of contents

2 Overall presentation

2.1 Introduction

2.2 The TMS320C6414 mezzanine final architecture

3 MAIN COMPONENTS

3.1 The input FPGA

3.1.1 Parallelisation and control

3.1.2 Data organization in the internal dual port memory

3.1.3 DSP interface

3.2 The DSP

3.2.1 EMIFA

3.2.2 EMIFB

3.2.3 Host Port Interface (HPI)

3.2.4 McBSP

3.2.5 Interrupts and general purpose pins

3.3 The Output FPGA

3.3.1 The TTC interface

3.3.2 The VME interface

3.4 The output FIFO

4 Power supply and JTAG

4.1 Power-supply

4.2 JTAG chain

5 MISC

5.1 Board Technical Considerations

5.2 Estimated cost

5.3 Availability of components and tools

5.4 Coming milestones

6 REFERENCES

7 ABBREVIATION

8 ANNEXES

8.1 Annexe 1 : ADC Read out input event format

8.2 Annexe 2 : Schematics, routing and placement

2  Overall presentation

2.1  Introduction

To assess the feasibility of the project, the ATLAS LAr collaboration decided in 1999 to build a ROD (Read Out Driver) demonstrator. The project consisted of the construction of a motherboard (developed by the University of Geneva, Switzerland), into which up to 4 daughterboard processing units (PU) could be plugged, each PU treating 64 calorimeter channels (half a FEB). The architecture of the PU was based around a Digital Signal Processor (DSP). Three PUs were designed, two based on an integer DSP, the TMS320C6202 from Texas Instruments (developed by CPPM, Marseille, France and Nevis, USA), and the other based on a floating point processor, the ADSP21160 from Analog Devices (developed by LAPP, Annecy, France).

Advances in technology and the arrival of a new, more powerful DSP opened the possibility of doubling the system density by handling 128 channels instead of 64 in a single DSP. For this reason, in the second half of 2001, an evaluation board based around the TMS320C6414 from Texas Instruments was developed jointly by the Nevis and LAPP teams. This DSP was finally adopted by the ROD community for the final board last June.

The final design consists of 4 daughterboards per ROD module, each daughterboard equipped with two DSPs, increasing the ROD density to 8 FEBs per ROD module (one FEB per DSP), instead of two in the demonstrator.

Furthermore, for financial reasons, the ROD system should provide staging capabilities. In the staging scenario, only one daughterboard out of two will be populated on the motherboard at the beginning of the experiment, bringing the number of channels treated by a single DSP to 256, at the expense of a potential loss in performance (reduced trigger rate).

This document aims to describe the final daughter board architecture.

2.2  The TMS320C6414 mezzanine final architecture

Figure 1 shows the TMS320C6414 mezzanine final architecture :

Figure 1 : TMS320C6414 mezzanine block diagram

The mezzanine is a 120 x 85 mm board with two 64-pin connectors and one 84-pin connector that plug into the motherboard. It contains two processing units (PU), each able to treat up to 128 calorimeter channels (1 FEB) in normal mode and 256 channels (2 FEBs) in staging mode. Each PU is composed of an input FPGA (InFPGA), a TMS320C6414 DSP from Texas Instruments and an output FIFO. The mezzanine also contains an output FPGA (OutFPGA) used for the VME and TTC interfaces.

Input FEB data enter the InFPGA, where they are formatted and checked as needed for the DSP algorithm. When an event is ready, an interrupt is sent to the DSP, which launches a DMA to read the data over the 64-bit EMIFA bus. Once the DSP has finished processing an event, it writes the results into the output FIFO through the 16-bit EMIFB bus.

The TTC data are received in the OutFPGA and sent to each DSP via 2 serial ports (McBSP). One serial port carries the Trigger type and the other carries the BCID and EventID.

The OutFPGA allows the board to be controlled by VME, in particular :

-  DSP boot code writing and histogram reading through the 16-bit Host Port Interface (HPI) of the DSP.

-  Full duplex serial port (McBSP2) with each DSP (run number writing, DSP commands, status reading).

-  InFPGA configuration writing (number of samples, number of gains, mode…) and status reading through a serial line.

-  InFPGA boot.

3  MAIN COMPONENTS

3.1  The input FPGA

The InFPGA parallelizes incoming FEB data, verifies their consistency (in particular potential corruption caused by radiation effects such as SEU), and formats the data as needed for the DSP algorithm. The InFPGA is an APEX20k160EBC208-1 from Altera. The choice of this component is described in reference [3].

The InFPGA has three main parts :

1.  Parallelization and control of the incoming data.

2.  Data organization in the dual port memory.

3.  DSP interface.

Figure 2 shows the InFPGA architecture in staging mode, i.e. when treating 2 FEBs:

Figure 2 : Input FPGA architecture (staging mode)

3.1.1  Parallelisation and control

·  The InFPGA receives input data from the motherboard through two 80 MHz 16-bit buses. Each 16-bit bus corresponds to one FEB (128 channels = 16 ADC). In normal mode, the InFPGA treats one FEB, whereas in staging mode, it treats two FEBs. The format of the incoming data is described in annexe 1.

·  For each FEB, timing signals are generated internally (indication of the data type : header, RADD, data, …)

·  For each ADC, the InFPGA detects the start of event and performs the serial-to-parallel conversion (2 bits -> 16 bits).

·  For each word, except the start and end of event, the parity is checked.

·  For each data word, the gain is isolated and compared to the other gains of the same channel. Gains from 8 channels are grouped in 16-bit data words.

·  Several checks are performed :

   -  Parity check.

   -  Start of data alignment, to check for half-FEB (HFEB) desynchronization.

   -  Identical gain for all samples of a given channel.

   -  Identical control words and RADD within an HFEB.

   -  Identical control words and RADD between two HFEBs.

When an error is detected, the InFPGA fills the event status word. This word is interpreted by the DSP in the synchronization task. If the status is non-zero, the DSP takes the appropriate decision, such as requesting a FEB reset. The synchronization task is described in more detail in reference [10].

The content of the event status word is described below :

Bits / 31..26 / 25 / 24 / 23..8 / 7..6 / 5 / 4 / 3 / 2 / 1 / 0
Field / 0 / radds / bcids / gain or BOF / parity / ctrl3 / gain / radd / ctrl2 / ctrl1 / ones

Table 1 : Content of the event status word.

bit 0 ones : The beginning of event is missing in one of the 16 groups of channels.

bit 1 ctrl1 : Ctrl1 mismatch

bit 2 ctrl2 : Ctrl2 mismatch

bit 3 radd : RADD mismatch

bit 4 gain : Gain mismatch (the gain bits are not preserved across the time samples, for some channels)

bit 5 ctrl3 : CTRL3 mismatch

bits [7..6] parity : Parity error count (0 = no error, 1 = 1 error, 2 = 2 errors, 3 = 3 or more errors)

bits [23..8] gain or BOF : In case of an error in the gain or BOF, specifies which group of channels is concerned (bit 8 = ch0)

bit 24 bcids : Comparison between the BCID of channels 0-63 and channels 64-127

bit 25 radds : Comparison between the RADD of channels 0-63 and channels 64-127
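The bit layout of Table 1 can be illustrated with a short decoding sketch. This is not the DSP synchronization code; it is a minimal Python model, and the function name `decode_event_status` and the dictionary keys are hypothetical names chosen for illustration. It assumes the status word is handled as a single 32-bit unsigned integer.

```python
def decode_event_status(status: int) -> dict:
    """Split a 32-bit event status word into the fields of Table 1."""
    if not 0 <= status < (1 << 32):
        raise ValueError("event status must fit in 32 bits")
    return {
        "ones": bool(status & 0x1),            # bit 0
        "ctrl1": bool((status >> 1) & 0x1),    # bit 1
        "ctrl2": bool((status >> 2) & 0x1),    # bit 2
        "radd": bool((status >> 3) & 0x1),     # bit 3
        "gain": bool((status >> 4) & 0x1),     # bit 4
        "ctrl3": bool((status >> 5) & 0x1),    # bit 5
        "parity": (status >> 6) & 0x3,         # bits 7..6, saturating counter
        # bits 23..8: one flag per group of channels, bit 8 = first group
        "groups": [g for g in range(16) if (status >> (8 + g)) & 0x1],
        "bcids": bool((status >> 24) & 0x1),   # bit 24
        "radds": bool((status >> 25) & 0x1),   # bit 25
        "reserved": (status >> 26) & 0x3F,     # bits 31..26, expected to be 0
    }


def event_is_clean(status: int) -> bool:
    """A status word equal to zero means that no error was flagged."""
    return status == 0
```

For example, a word with bit 4 set, a parity count of 2 and bits 8 and 23 set decodes to a gain mismatch affecting the first and last groups of channels.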

3.1.2  Data organization in the internal dual port memory

Data is then organized in a dual port memory as needed to optimize the DSP algorithm. The data format is presented below :

Word / bits [63..48] / bits [47..32] / bits [31..16] / bits [15..0]
0 / Event status / Event status / EventID / BCID
1 / ctrl1 / ctrl2 / ctrl3 / Radd1
2 / Radd2 / Radd3 / Radd4 / Radd5
3 / C0 Gain / C0 S1 / C0 S2 / C0 S3
4 / C0 S4 / C0 S5 / C1 Gain / C1 S1
5 / C1 S2 / C1 S3 / C1 S4 / C1 S5
6 / C2 Gain / C2 S1 / C2 S2 / C2 S3
… / … / … / … / …
194 / C127 S2 / C127 S3 / C127 S4 / C127 S5

Table 2 : Dual port memory organization
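The position of any channel field in Table 2 follows from simple arithmetic, sketched below in Python. The sketch assumes the configuration shown in the table (5 samples per channel, 128 channels, 3 header words); the names `locate` and `event_words` are hypothetical and do not come from the InFPGA or DSP code.

```python
HEADER_WORDS = 3       # words 0..2: status, EventID, BCID, ctrl and RADD words
FIELDS_PER_WORD = 4    # four 16-bit fields per 64-bit word
SAMPLES = 5            # assumption: 5 samples per channel, as in Table 2
N_CHANNELS = 128       # one FEB


def locate(channel: int, field: int) -> tuple:
    """Return (word index, (msb, lsb)) of a channel field.

    field 0 is the gain, fields 1..SAMPLES are the samples S1..S5.
    """
    if not 0 <= channel < N_CHANNELS:
        raise ValueError("channel out of range")
    if not 0 <= field <= SAMPLES:
        raise ValueError("field out of range")
    # Offset counted in 16-bit fields from the start of the event.
    offset = HEADER_WORDS * FIELDS_PER_WORD + channel * (SAMPLES + 1) + field
    word, slot = divmod(offset, FIELDS_PER_WORD)
    msb = 63 - 16 * slot   # slot 0 is the most significant field
    return word, (msb, msb - 15)


def event_words() -> int:
    """Number of 64-bit words occupied by one event."""
    return HEADER_WORDS + N_CHANNELS * (SAMPLES + 1) // FIELDS_PER_WORD
```

With these assumptions, `locate(1, 0)` gives word 4, bits [31..16], and `locate(127, 5)` gives word 194, bits [15..0], matching the rows of Table 2. One event occupies 195 words.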

The dual port memory of the InFPGA is configured as a dual bank memory. While the DSP is reading a complete event from one bank, a new event can be written into the other bank. Therefore, the minimal size of the InFPGA dual port memory is two events. Please refer to note [3] for more details about the dual port memory organization.

3.1.3  DSP interface

When a complete FEB event is stored in the memory, the InFPGA sends an interrupt to the DSP. The DSP then launches a DMA to read the event. The read is clocked by the DSP. The frequency is configurable, but is currently foreseen to be 100 MHz (CPU/6). On the read side, the internal dual port memory of the input FPGA is seen by the DSP as a FIFO, implying that the data are read at consecutive addresses. There is one interrupt and one DSP chip select per FEB.
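As a rough order of magnitude, the time needed to read one event can be estimated from the clock frequency and the event size. The Python sketch below is an estimate only. It assumes a 600 MHz CPU clock (so that CPU/6 = 100 MHz), an event of 195 64-bit words (words 0 to 194 of Table 2), and an ideal transfer of one word per clock cycle with no DMA or bus overhead, which is not guaranteed by this document. All names are hypothetical.

```python
CPU_HZ = 600_000_000     # assumption: 600 MHz CPU clock
EMIF_DIVIDER = 6         # EMIF clock = CPU/6
WORDS_PER_EVENT = 195    # assumption: words 0..194 of Table 2 (5 samples)


def emif_clock_hz(cpu_hz: int = CPU_HZ, divider: int = EMIF_DIVIDER) -> int:
    """Clock frequency of the read interface in Hz."""
    return cpu_hz // divider


def min_read_time_ns(words: int = WORDS_PER_EVENT) -> int:
    """Lower bound on the event read time, assuming one word per clock cycle."""
    return words * 1_000_000_000 // emif_clock_hz()
```

Under these assumptions, the lower bound is 1950 ns per FEB event. The actual figure depends on the DMA setup time and on the interface timing.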

3.2  The DSP

The DSP is the 600 MHz TMS320C6414GLZ, which belongs to the latest generation of Texas Instruments DSPs.

The DSP receives FEB data on its EMIFA memory bus, TTC information on serial ports McBSP0 and McBSP1 and transmits the output data through its EMIFB memory bus. The Host Port Interface (HPI) is used to boot the DSP and read histograms.

Details about the DSP code are given in the following references :

·  Input/output management and code structure : see reference [5]

·  Real time operating system : see reference [11]

·  Physics code : see reference [12]

The paragraphs below summarize the DSP communications.

3.2.1  EMIFA

The External Memory Interface A (EMIFA) is directly connected to the InFPGA. The EMIFA is a 64-bit wide bus. The EMIFA clock AECLKOUT2 is generated internally and is configured to run at CPU/6 = 100 MHz. The EMIFA is used to read input FEB data from the InFPGA. As one dual port RAM is foreseen in the InFPGA for each FEB, one interrupt and one Chip Enable (CE) are foreseen per FEB.

CE / DSP memory address / R/W / Connected to / Purpose / Mode
CE0 / 8000 0000 / R / InFPGA / FEB 1 DP-RAM / Synchronous read interface
CE1 / 9000 0000 / R / InFPGA / FEB 2 DP-RAM (staging) / Synchronous read interface
CE2 / A000 0000 / / InFPGA / Connected but not defined
CE3 / B000 0000 / / InFPGA / Connected but not defined

Table 3 : EMIFA memory map
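The memory map of Table 3 can be expressed as a small lookup, sketched here in Python. The base addresses and purposes are those of the table. The size of each CE space is an assumption taken from the spacing between consecutive base addresses, and the names `EMIFA_MAP` and `emifa_ce_for_address` are hypothetical.

```python
# Base address and purpose of each EMIFA chip enable, from Table 3.
EMIFA_MAP = {
    "CE0": (0x8000_0000, "FEB 1 DP-RAM"),
    "CE1": (0x9000_0000, "FEB 2 DP-RAM (staging)"),
    "CE2": (0xA000_0000, "Connected but not defined"),
    "CE3": (0xB000_0000, "Connected but not defined"),
}

# Assumption: each CE space extends up to the next base address.
CE_SPACE_SIZE = 0x1000_0000


def emifa_ce_for_address(addr: int):
    """Return the name of the chip enable selected by addr, or None."""
    for name, (base, _purpose) in EMIFA_MAP.items():
        if base <= addr < base + CE_SPACE_SIZE:
            return name
    return None
```

For instance, a DMA source address of 0x9000 0010 falls in the CE1 space, i.e. the dual port RAM of the second FEB in staging mode.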

3.2.2  EMIFB

The External Memory Interface B (EMIFB) is directly connected to the output FIFO. The EMIFB is a 16-bit wide bus. The EMIFB clock BECLKOUT2 is generated internally and is configured to run at CPU/6 = 100 MHz. The EMIFB is used to send output events to the FIFO.

CE / DSP memory address / R/W / Connected to / Purpose / Mode
CE0 / 6000 0000 / W / Output FIFO / Output FIFO write / synchronous write interface
CE1 / 6400 0000 / W / Output FPGA / end of output DMA / synchronous write interface
CE2 / 6800 0000 / / / Connected but not defined
CE3 / 6C00 0000 / / / Connected but not defined

Table 4 : EMIFB memory map
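One possible reading of Table 4 is that the DSP writes the event words into the CE0 space and then performs a write into the CE1 space to signal the end of the output DMA to the output FPGA. The Python sketch below models that sequence as a list of (address, value) write transactions. The exact protocol is not described here: the use of the base addresses, the value 0 written to CE1 and the name `output_transactions` are all assumptions made for illustration.

```python
FIFO_WRITE_ADDR = 0x6000_0000   # EMIFB CE0: output FIFO write (Table 4)
END_OF_DMA_ADDR = 0x6400_0000   # EMIFB CE1: end of output DMA (Table 4)


def output_transactions(words):
    """Model the EMIFB writes needed to send one output event.

    words is a sequence of 16-bit values, since the EMIFB is 16 bits wide.
    """
    for w in words:
        if not 0 <= w <= 0xFFFF:
            raise ValueError("EMIFB words must fit in 16 bits")
    txns = [(FIFO_WRITE_ADDR, w) for w in words]
    # Hypothetical end marker: the value written to CE1 is an assumption.
    txns.append((END_OF_DMA_ADDR, 0))
    return txns
```

An event of N 16-bit words therefore results in N + 1 write transactions in this model.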

3.2.3  Host Port Interface (HPI)

The HPI is a parallel port through which the VME (host processor) can directly access the CPU’s memory space. Access is straightforward for the VME because it is the master of the interface. The host and the CPU can exchange information via internal memory. For more details about the HPI, see reference [6], chapter 7.

On the mezzanine board, the HPI is configured as a 16-bit wide bus connected to the output FPGA. The HPI is mainly used for the DSP boot, histogram reading and debugging purposes.

VME R/W / Connected to / Purpose / Mode
W / Output FPGA / Boot / No particular configuration
R / Output FPGA / Histograms, debug variables / No particular configuration

Table 5 : HPI description.

Note : During DSP reset, the OutFPGA pulls down the HD5 signal to configure the HPI as a 16-bit wide interface.

3.2.4  McBSP

The Multichannel Buffered Serial Ports (McBSP) are full duplex DSP serial ports, with independent framing and clocking for receive and transmit.

The TMS320C6414 has 3 McBSPs connected to the output FPGA. McBSP0 and McBSP1 are used for TTC data (the DSP receives only), while McBSP2 is bi-directional and is used to send commands to the DSP or to read the DSP status.

McBSP / R/W / Connected to / Purpose
McBSP0 / R / Output FPGA / BCID + Event ID
McBSP1 / R / Output FPGA / Trigger type transmission
McBSP2 / R/W / Output FPGA / DSP commands write, DSP status read, …

Table 6 : McBSP description

3.2.5  Interrupts and general purpose pins

Table 7 summarizes the configuration of the DSP General Purpose (GP) pins.

GP / Function / Connected to / Purpose
GP0 / Output GP / InFPGA / Input FPGA led1 blanking
GP1/CLKOUT4 / / NC
GP2/CLKOUT6 / / NC
GP3 / Output GP / InFPGA / Input FPGA led1 blanking
GP4/EXT_INT4 / Interrupt / InFPGA / Interrupt dedicated to FEB1 DMA
GP5/EXT_INT5 / Interrupt / InFPGA / Interrupt dedicated to FEB2 DMA
GP6/EXT_INT6 / Interrupt / InFPGA / Not defined
GP7/EXT_INT7 / Interrupt / InFPGA / Not defined
GP8/CLKS2 / / NC
GP9 / Output GP / Connector B / PU_IRQ
GP10 / Output GP / Connector B / BUSY
GP11 / Input GP / OutFPGA / FIFO almost full
GP12 / / OutFPGA / Not defined
GP13 / / OutFPGA / Not defined
GP14 / / OutFPGA / Not defined
GP15 / / NC

Table 7 : Interrupts and general purpose pins