Overcoming LTE PHY Design Challenges Using ESL Design Methodologies
By: Louie Valeña, Field Applications Engineer, CoWare K.K.
The 3rd Generation Partnership Project (3GPP) announced the functional freeze of the LTE specs (Release 8) in Dec 2008 [1], but even before that, Nokia Siemens Networks had already announced the availability of LTE base stations [2] and LG had announced LTE baseband chips for the handset [3]. These and other initial implementations will need to be refined over time to improve cost/performance and upgraded to comply with the latest version of the specification. With NTT docomo [4] and Verizon Wireless [5] announcing LTE service availability in 2010, the design and development time for hardware, software and systems is painfully short. A comprehensive design and verification methodology is required to meet the tight development schedule while meeting or exceeding performance criteria.
Electronic System Level (ESL) design aims to be this comprehensive design and verification methodology. This is achieved by using simulation models with a high level of abstraction to act as an “executable specification” for all the design teams involved. A paper spec is subject to misunderstanding and misinterpretation. An executable spec works around these limitations by embedding the designer’s “intent” into the spec. The executable spec acts as the “golden testbench” for all the design teams involved, thereby removing the need for each team to create their own testbenches.
This article aims to introduce some of the design challenges that may be encountered in designing the physical layer (PHY) for LTE and how ESL tools can help create executable specs to overcome them.
Design Challenge #1: Reading and understanding the specs
The bulk of the LTE PHY specs are in two documents: TS 36.211 [6] which describes the physical channels and their modulation; and TS 36.212 [7] which describes multiplexing and channel coding performed on data from the MAC. Although the specs include some block diagrams to facilitate comprehension, it’s still quite a feat to visualize how data moves and is transformed while moving from one block to the next just from perusing the specs. CoWare provides an LTE library which can be used as a reference guide to facilitate comprehension of the spec. Figure 1 shows the detail view of the hierarchical LTE encoder block. It fills in the details outlined in TS 36.212 v.8.5.0 Sec. 5.3.2. Note that probes can be attached to block outputs to monitor how signals change during processing.
Figure 1: Detail view of the hierarchical LTE encoder block showing the processing performed on the downlink shared channel as specified in TS 36.212 v.8.5.0 Sec. 5.3.2. Note that parameters can depend on other parameters in higher hierarchies and can be passed down to lower hierarchies, as well. Probes can be attached to block outputs to show how the signals change during simulation.
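To make the first step of that encoder chain concrete, here is a minimal Python sketch of transport block CRC attachment using the gCRC24A generator polynomial from TS 36.212 (D^24 + D^23 + D^18 + D^17 + D^14 + D^11 + D^10 + D^7 + D^6 + D^5 + D^4 + D^3 + D + 1). It assumes the transport block is given as a list of 0/1 bits; the function names are hypothetical and this is not the CoWare library block. The remaining steps (code block segmentation and code block CRC attachment, turbo coding, rate matching, code block concatenation) are not shown.

```python
# gCRC24A(D) from TS 36.212 with the implicit D^24 term dropped:
# D^23 + D^18 + D^17 + D^14 + D^11 + D^10 + D^7 + D^6 + D^5 + D^4 + D^3 + D + 1
CRC24A_POLY = 0x864CFB


def crc24a(bits):
    """Return the 24 CRC parity bits, first element = highest-order parity bit.

    Bitwise long division of a(D) * D^24 by gCRC24A(D), starting from a zero register.
    """
    reg = 0
    for b in bits:
        # XOR the incoming bit with the register MSB to decide whether to subtract g(D).
        feedback = ((reg >> 23) & 1) ^ b
        reg = (reg << 1) & 0xFFFFFF
        if feedback:
            reg ^= CRC24A_POLY
    return [(reg >> (23 - i)) & 1 for i in range(24)]


def attach_tb_crc(bits):
    """Transport block CRC attachment: append the 24 parity bits to the block."""
    return list(bits) + crc24a(bits)


if __name__ == "__main__":
    block = [1, 0, 1, 1, 0, 0, 1, 0] * 4   # arbitrary 32-bit test block
    coded = attach_tb_crc(block)
    print(len(coded))        # 56
    print(crc24a(coded))     # all zeros: block + CRC is divisible by gCRC24A(D)
```

Because the parity bits make the whole block divisible by gCRC24A(D), running `crc24a` over a block with its CRC attached returns 24 zeros, which is a convenient check to place on a probe.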
Design Challenge #2: Creating an executable spec to investigate system performance and act as a golden testbench for all design teams
Many companies participating in the standardization process have algorithm development teams dedicated to writing C programs to create and evaluate various proposals. Unfortunately, the simulation programs are seldom usable outside the algorithm development teams due to non-uniform coding styles and lack of suitable documentation. They are seldom used as executable specs because they are difficult to read, maintain and interface to. An executable spec for PHY layer design should have the following characteristics:
- Dataflow model of computation – Simulation programs are usually classified according to the model of computation used. Some commonly used models of computation are: continuous time (e.g. SPICE, Verilog-A), discrete event (e.g. Verilog, VHDL) and dataflow. In continuous time and discrete event models, the order in which blocks/functions are executed needs to be determined during runtime, which incurs significant overhead. For static dataflow, the execution schedule can be determined before runtime, allowing faster simulations. When multiple sampling rates are involved (multi-rate), non-dataflow simulators which use a discrete time fixed-step solver would need to execute all blocks at a single clock that is a common multiple of all the sampling rates. For static dataflow, the runtime schedule would appear as nested “for” loops, thereby allowing each block to execute at its “designated” sampling frequency. In Fig. 2, the AFE can be modeled in complex baseband with a bandwidth wide enough to evaluate the effects of interfering signals on system performance. Assuming an LTE bandwidth of 20MHz, the frequency of the ADC clock would be fADC = 30.72MHz (2048-point DFT with 15kHz subcarrier spacing). The intermodulation response rejection tests described in [9] specify an unmodulated carrier at ±17.5MHz offset and a 5MHz modulated interfering signal located ±35MHz away from the desired channel. With the modulated interferer extending to ±37.5MHz, the AFE portion must be sampled (for simulation purposes) faster than 75MHz to satisfy Nyquist’s sampling criterion; 2·fADC = 61.44MHz is too low, so the smallest integer multiple of fADC that suffices is 3·fADC = 92.16MHz. A discrete time fixed-step simulator would need to process everything at 92.16MHz, while a static dataflow simulator would process each block at its required rate (e.g. 92.16MHz for the AFE and 30.72MHz for the ADC). This is the primary reason why dataflow is considered to be the best model of computation for signal processing applications.
Figure 2: Simplified block diagram of a UE receiver with 2 antennas (AFE: analog front-end; BPF: band pass filter; LNA: low noise amplifier; VGA: variable gain amplifier; LPF: low pass filter; ADC: analog-to-digital converter; DAC: digital-to-analog converter; AGC: automatic gain control; NCO: numerically controlled oscillator; CP: cyclic prefix; DFT: discrete Fourier transform). Baseband processing blocks are in orange. The LTE specs describe how data is to be transmitted but not how it is to be recovered.
- Hierarchical block diagram editor – Viewing a block diagram to trace signal flow is a lot easier than going through several pages of C code. A hierarchical block diagram editor allows the user to quickly grasp the signal flow and manage complexity. (See Fig. 1).
- Source code available for all blocks/models – This allows implementers to examine the details of the “executable spec” and use it as a starting point for their own implementation, or modify it to suit their purposes. The source code for all primitive blocks should be available for viewing and editing.
- Rapid simulation – The execution should finish as quickly as possible to allow sweeping over several parameters and getting results quickly for various channel scenarios and usage profiles. Running more simulations in less time results in fewer surprises during field testing and fewer “back to the drawing board” moments, and thus provides huge cost savings to the project. Compiled simulations built on a C++ infrastructure and run over a distributed network should be supported. Multi-core support further takes advantage of dual-core CPUs by automatically subdividing the design into independent threads which can run on separate cores, resulting in a 1.7x ~ 1.9x speedup compared to a single core.
- Single-process co-simulation framework – This simplifies bottom-up verification. Implementation of digital blocks would be done using Verilog/VHDL. Initial implementation of analog blocks would be done using Verilog-A or Verilog/VHDL-AMS. Note that the commonly used IPC (inter-process communication) method for co-simulation has high overhead and can be very slow. During co-simulation with Verilog/VHDL-AMS, the algorithm portion of the design is directly linked with the RTL/AMS simulator using PLI/VPI, incurring no IPC overhead.
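The static dataflow argument in the first item above rests on the SDF balance equations: for every connection, (firings of the producer) × (tokens it produces per firing) must equal (firings of the consumer) × (tokens it consumes per firing). The Python sketch below solves them for a hypothetical three-block chain loosely based on Fig. 2: an AFE block at 3·fADC, a decimate-by-3 block, and a block that consumes 2048 samples per DFT (the cyclic prefix is ignored). Block names and token rates are illustrative assumptions, not taken from any particular tool.

```python
from collections import deque
from fractions import Fraction
from math import lcm


def repetition_vector(actors, edges):
    """Solve the SDF balance equations q[src] * produced == q[dst] * consumed.

    actors: actor names (the graph is assumed to be connected).
    edges:  (src, dst, tokens produced per src firing, tokens consumed per dst firing).
    Returns the smallest positive integer firing count for each actor.
    """
    adj = {a: [] for a in actors}
    for src, dst, produced, consumed in edges:
        adj[src].append((dst, Fraction(produced, consumed)))
        adj[dst].append((src, Fraction(consumed, produced)))

    # Propagate relative firing rates from the first actor (breadth-first).
    q = {actors[0]: Fraction(1)}
    queue = deque([actors[0]])
    while queue:
        a = queue.popleft()
        for b, ratio in adj[a]:
            if b not in q:
                q[b] = q[a] * ratio
                queue.append(b)
            elif q[b] != q[a] * ratio:
                raise ValueError("inconsistent rates: no static schedule exists")

    # Scale by the LCM of the denominators to get the smallest integer solution.
    scale = lcm(*(f.denominator for f in q.values()))
    return {a: int(q[a] * scale) for a in actors}


if __name__ == "__main__":
    reps = repetition_vector(
        ["afe", "decim3", "dft"],
        [("afe", "decim3", 1, 3), ("decim3", "dft", 1, 2048)],
    )
    print(reps)  # {'afe': 6144, 'decim3': 2048, 'dft': 1}
```

The repetition vector corresponds to the looped schedule `2048 x (3 x afe, decim3), dft`, i.e. 6144 + 2048 + 1 = 8193 block firings per DFT. Clocking all three blocks at the AFE rate, as a fixed-step simulator would, costs 3 × 6144 = 18432 block evaluations over the same interval.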
CoWare’s LTE library includes:
- Downlink reference system with ideal receiver
- Downlink reference system with practical channel estimator
- Uplink reference system with ideal receiver
- Cell Search reference system
- MIMO channel model supporting EPA, EVA, ETU, SCM-A/B/C/D as well as user-defined scenarios with M transmit and N receive antennas (no limit on M or N)
- MIMO receiver supporting spatial multiplexing using zero forcing (ZF), minimum mean squared error (MMSE) or maximum likelihood (ML); and transmit diversity using maximal ratio combining (MRC)
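As a rough illustration of the linear receivers listed above, the following Python sketch implements ZF and MMSE detection for a 2×2 spatial multiplexing channel and receive-side MRC for a single stream. It assumes flat fading on one resource element, a perfectly known channel matrix, unit-power symbols and a known noise variance; the helper names are hypothetical. ML detection and the SFBC decoding used for LTE transmit diversity are not shown.

```python
def hermitian(a):
    """Conjugate transpose of a matrix stored as nested lists."""
    return [[a[r][c].conjugate() for r in range(len(a))] for c in range(len(a[0]))]


def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def inv2(a):
    """Inverse of a 2x2 matrix."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def linear_detect(h, y, noise_var=0.0):
    """x_hat = (H^H H + noise_var * I)^-1 H^H y for a 2x2 channel.

    noise_var = 0 gives zero forcing (ZF); noise_var = sigma^2 gives MMSE,
    assuming unit-power transmitted symbols.
    """
    hh = hermitian(h)
    g = matmul(hh, h)
    g = [[g[r][c] + (noise_var if r == c else 0) for c in range(2)] for r in range(2)]
    x_hat = matmul(inv2(g), matmul(hh, [[v] for v in y]))
    return [row[0] for row in x_hat]


def mrc(h, y):
    """Maximal ratio combining of one symbol received on N antennas."""
    num = sum(hi.conjugate() * yi for hi, yi in zip(h, y))
    return num / sum(abs(hi) ** 2 for hi in h)
```

Setting `noise_var` to zero turns the MMSE solution into ZF, which makes the noise-enhancement trade-off between the two easy to sweep in simulation.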
The library models the test cases and simulation scenarios published by the LTE working group. Users of the LTE library therefore have a higher probability of being compliant with the spec, since the LTE library essentially becomes a shared database. Designers can insert their own implementations into the “LTE executable spec” and check performance against the test cases published by the LTE working group. Figure 3 compares the throughput of the reference system with results from other 3GPP LTE participants. The LTE library may be used as a starting point for algorithm developers to explore particular implementations and evaluate their performance from a known, good reference point.
Figure 3: LTE downlink (PDSCH) reference system throughput simulation results compared with other 3GPP participants’ results for FDD dual-stream MIMO, 10MHz, 50RB, 2 codewords, 2 layers, 2 Tx antennas, MMSE, no feedback, precoding #0, 2 x 16QAM, coding rate = 1/2, EVA5, RVseq = 0,1,2,3.
Design Challenge #3: Exploring and evaluating analog front-end architectures which will meet performance requirements while minimizing power and cost.
The architectures to be considered include:
- Super-heterodyne with analog quadrature modulation/demodulation
- Super-heterodyne with digital quadrature modulation/demodulation
- Direct conversion
Super-heterodyne architectures involve multiple frequency translations. They provide the best sensitivity and selectivity at the expense of a higher parts count, larger BOM and larger area. Analog quadrature modulation/demodulation requires mixers, a phase shifter and a combiner. Perfectly balancing the gains of the I and Q arms and achieving an exact 90° phase shift is impossible in an analog implementation, so analog quadrature modulation/demodulation circuits suffer from gain/phase imbalance and carrier leakage. Typical discrete devices have a minimum sideband suppression of -28dBc at high RF output frequencies (e.g. 1.9GHz). This would correspond to roughly 4° of phase imbalance and 0.1dB of gain imbalance [9]. Such imperfections degrade EVM and adversely affect total system performance [10]. LTE uses 64QAM to achieve higher throughput and requires an EVM of less than 8%. Figure 4 shows the constellation diagram for 64QAM with quadrature modulator imperfections.
Figure 4: Constellation diagram for 64QAM. The top diagram is the ideal constellation. The bottom diagram shows the constellation with 4° of phase imbalance and 0.1dB of gain imbalance resulting in an EVM of 3%. Note that even though this value is less than the required 8%, the constellation is visibly skewed and will reduce the overall performance of the system. The signal points have been enlarged for easier viewing.
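The link between sideband suppression, I/Q imbalance and EVM can be checked numerically. The Python sketch below assumes a simple, hypothetical modulator model in which only the Q branch has gain error g and phase error φ, so that y = I + g·e^(jφ)·jQ. The image rejection ratio is then (1 + 2g·cos φ + g²)/(1 - 2g·cos φ + g²), and after removing the best-fit complex gain the RMS EVM over a full 64QAM constellation equals the inverse square root of that ratio. Function names are hypothetical, and EVM normalization conventions differ, so the result need not match the 3% quoted for Fig. 4 exactly.

```python
import cmath
import math


def qam64():
    """64QAM points with unit average power (levels +-1, +-3, +-5, +-7 scaled by 1/sqrt(42))."""
    levels = [-7, -5, -3, -1, 1, 3, 5, 7]
    return [complex(i, q) / math.sqrt(42) for i in levels for q in levels]


def iq_imbalance(x, gain_db, phase_deg):
    """Hypothetical modulator model: Q branch has gain g and phase error phi, y = I + g*e^(j*phi)*jQ."""
    r = 10 ** (gain_db / 20) * cmath.exp(1j * math.radians(phase_deg))
    return [s.real + r * 1j * s.imag for s in x]


def image_rejection_db(gain_db, phase_deg):
    """(1 + 2g*cos(phi) + g^2) / (1 - 2g*cos(phi) + g^2) in dB; infinite for a perfect modulator."""
    g = 10 ** (gain_db / 20)
    c = math.cos(math.radians(phase_deg))
    return 10 * math.log10((1 + 2 * g * c + g * g) / (1 - 2 * g * c + g * g))


def evm_rms(ref, meas):
    """RMS EVM after removing the least-squares complex gain between meas and ref."""
    gain = sum(r.conjugate() * m for r, m in zip(ref, meas)) / sum(abs(r) ** 2 for r in ref)
    err = sum(abs(m / gain - r) ** 2 for r, m in zip(ref, meas))
    return math.sqrt(err / sum(abs(r) ** 2 for r in ref))


if __name__ == "__main__":
    x = qam64()
    y = iq_imbalance(x, 0.1, 4.0)
    print(f"image rejection {image_rejection_db(0.1, 4.0):.1f} dB, EVM {100 * evm_rms(x, y):.1f} %")
```

For 4° and 0.1dB this model gives an image rejection of about 29dB, in line with the -28dBc sideband suppression quoted above, and an EVM of roughly 3.5%.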
A digital quadrature modulator/demodulator requires 2 multipliers and an adder. In its simplest form, the local oscillator can be generated as the sequence +1, 0, -1, 0, +1, …, that is, a sinusoid sampled at four times its frequency (the quadrature LO is then 0, +1, 0, -1, …). Each multiplier then becomes a simple switch selecting between the original I/Q sample, an inverted version and zero. A digital quadrature modulator/demodulator doesn’t suffer from gain/phase imbalance and carrier leakage. However, it requires upsampling (zero insertion and filtering) to match the sampling frequency of the local oscillator.
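A minimal Python sketch of the fs/4 digital quadrature modulator and demodulator described above. It assumes the I/Q streams have already been upsampled to four times the LO frequency (the zero insertion and interpolation filtering are not shown) and uses a 4-tap moving average as a crude demodulator lowpass; the function names are hypothetical.

```python
# LO samples when the sample rate is 4x the LO frequency (fs = 4 * f_LO).
COS4 = (1, 0, -1, 0)   # cos(pi * n / 2)
SIN4 = (0, 1, 0, -1)   # sin(pi * n / 2)


def switch_mult(lo, v):
    """A multiply by +1, 0 or -1 reduces to a switch: pass, zero, or invert."""
    if lo == 1:
        return v
    if lo == -1:
        return -v
    return 0.0


def dquad_modulate(i_samples, q_samples):
    """Real IF output s[n] = I[n]*cos(pi*n/2) - Q[n]*sin(pi*n/2)."""
    return [switch_mult(COS4[n % 4], i) - switch_mult(SIN4[n % 4], q)
            for n, (i, q) in enumerate(zip(i_samples, q_samples))]


def dquad_demodulate(s):
    """Mix back to baseband, then apply a 4-tap moving average as a crude lowpass.

    The 4-tap average has a null at fs/2, which removes the mixing product at 2*f_LO.
    """
    i_mix = [2 * switch_mult(COS4[n % 4], v) for n, v in enumerate(s)]
    q_mix = [-2 * switch_mult(SIN4[n % 4], v) for n, v in enumerate(s)]
    i_out = [sum(i_mix[k:k + 4]) / 4 for k in range(len(s) - 3)]
    q_out = [sum(q_mix[k:k + 4]) / 4 for k in range(len(s) - 3)]
    return i_out, q_out


if __name__ == "__main__":
    s = dquad_modulate([0.5] * 8, [-0.25] * 8)
    print(s)  # [0.5, 0.25, -0.5, -0.25, 0.5, 0.25, -0.5, -0.25]
    print(dquad_demodulate(s))
```

With constant I = 0.5 and Q = -0.25 the modulated output is the repeating pattern 0.5, 0.25, -0.5, -0.25, and the demodulator returns the original I and Q with no gain or phase error to calibrate.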
A direct conversion architecture promises the lowest parts count, BOM, area and power dissipation. However, the use of an analog quadrature modulator/demodulator is unavoidable. Specifying the parameters too “tightly” in the analog quadrature modulator/demodulator would lead to low yield and low volumes since “champion” samples would have to be selected. It would be better to specify the component “roughly” and compensate digitally. Fig. 5 shows how quadrature modulator compensation and power amplifier linearization in an LTE eNB transmitter may be modeled with the front-end modules in Verilog-AMS.
Fig. 5: Block diagram of quadrature modulator compensation and power amplifier linearization. The front-end modules (green) are modeled in complex baseband with Verilog-AMS to provide an “executable specification” for the analog design team. The digital baseband portion of the design can be exported as a SystemC block for use within analog/RF simulators that co-simulate with SystemC. Note that direct conversion is used in transmission but not for the linearizer feedback. In a highly integrated system, the LO signal of the quadrature modulator would be pulled by the power amplifier, making it unusable for direct downconversion.
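One way to specify the analog part “roughly” and compensate digitally is to invert the widely linear I/Q imbalance model y = μx + νx* with a pre-correction in the digital baseband. The Python sketch below assumes μ and ν are already known (in a system like Fig. 5 they would be estimated from the feedback path, which is not shown) and uses a hypothetical model in which only the Q branch has gain error g and phase error φ: μ = (1 + g·e^(jφ))/2, ν = (1 - g·e^(jφ))/2. Function names are hypothetical.

```python
import cmath
import math


def imbalance_coeffs(gain_db, phase_deg):
    """mu, nu of y = mu*x + nu*conj(x) for a hypothetical Q branch with gain g and phase error phi."""
    r = 10 ** (gain_db / 20) * cmath.exp(1j * math.radians(phase_deg))
    return (1 + r) / 2, (1 - r) / 2


def apply_imbalance(x, mu, nu):
    """Analog quadrature modulator error model (widely linear)."""
    return [mu * s + nu * s.conjugate() for s in x]


def precompensate(x, mu, nu):
    """Digital pre-correction cancelling the model: x' = (conj(mu)*x - nu*conj(x)) / (|mu|^2 - |nu|^2).

    Requires |mu| > |nu|, which holds for any realistic imbalance.
    """
    d = abs(mu) ** 2 - abs(nu) ** 2
    return [(mu.conjugate() * s - nu * s.conjugate()) / d for s in x]


if __name__ == "__main__":
    x = [0.3 + 0.7j, -0.9 + 0.1j, 0.5 - 0.5j]
    mu, nu = imbalance_coeffs(0.5, 5.0)       # deliberately "rough" analog part
    corrected = apply_imbalance(precompensate(x, mu, nu), mu, nu)
    print(max(abs(a - b) for a, b in zip(corrected, x)))   # rounding-level residual only
```

Substituting x' into μx' + νx'* gives back x exactly, so any remaining error in practice comes from how well μ and ν are estimated, which is what dynamic simulation of the feedback loop has to establish.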
LTE supports multiple bandwidths: 1.4 MHz, 3 MHz, 5 MHz, 10 MHz, 15 MHz and 20 MHz. This allows carriers to gradually migrate users from GSM/EDGE to LTE. Device developers would likely not know which part of a carrier’s available bandwidth would be assigned for LTE, so it would be more practical to cope with the multiple bandwidth issue digitally, that is, by filtering in the digital domain. This implies that the baseband I/Q analog filters would need a passband of 20 MHz (-10 MHz to +10 MHz in the complex domain) to cover all possible cases. This places stringent requirements on the analog-to-digital converter’s dynamic range, since there may be strong GSM/EDGE signals right beside the desired, and possibly weak, LTE signal. Dynamic simulations would need to be performed to determine the optimum number of bits for the analog-to-digital converters in the presence of AGC and analog compensation circuits.
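Before running dynamic simulations, a static budget gives a first guess at ADC resolution. The Python sketch below uses the textbook SQNR of an ideal N-bit converter, 6.02·N + 1.76dB for a full-scale sine, plus assumed allowances for blocker headroom and for the gain obtained when the wanted LTE signal occupies only part of the 20MHz analog bandwidth (white quantization noise assumed). All parameter names are hypothetical and the example numbers are placeholders, not LTE requirements; AGC dynamics are exactly what this static estimate cannot capture.

```python
import math


def adc_bits_estimate(required_snr_db, headroom_db, processing_gain_db=0.0):
    """Rough static ADC resolution estimate (not a substitute for dynamic simulation).

    Uses SQNR = 6.02 * N + 1.76 dB for an ideal N-bit ADC driven by a full-scale sine.
    required_snr_db:    SNR the demodulator needs for the wanted signal (assumed input).
    headroom_db:        back-off from full scale for adjacent blockers, crest factor and
                        AGC error (assumed input).
    processing_gain_db: quantization-noise reduction from digital channel filtering.
    """
    needed_sqnr_db = required_snr_db + headroom_db - processing_gain_db
    return math.ceil((needed_sqnr_db - 1.76) / 6.02)


def filtering_gain_db(adc_bandwidth_hz, signal_bandwidth_hz):
    """Gain from filtering white quantization noise down to the wanted signal bandwidth."""
    return 10 * math.log10(adc_bandwidth_hz / signal_bandwidth_hz)


if __name__ == "__main__":
    # Placeholder numbers: 20 dB required SNR, 40 dB headroom.
    print(adc_bits_estimate(20, 40))                                   # 10
    print(adc_bits_estimate(20, 40, filtering_gain_db(20e6, 1.4e6)))   # 8
```

With these placeholder values, the digital channel filtering gained by a 1.4MHz signal inside a 20MHz analog passband saves about two bits.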
The transmit power amplifier consumes most of the power available in a handset. Using a highly efficient but non-linear power amplifier with digital adaptive predistortion allows longer battery life while coping with poor antenna VSWR and exceeding LTE requirements [11]. Exploring and evaluating various predistortion algorithms and architectures requires dynamic simulations to select the optimum solution.
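The principle of predistortion can be illustrated with a deliberately simple model. The Python sketch below assumes a hypothetical memoryless PA with third-order gain compression, y = x·(1 - c·|x|²), and computes the predistorted input by fixed-point iteration so that the cascade of predistorter and PA is linear. Real predistorters must also handle PA memory effects and identify their coefficients adaptively (e.g. from a feedback receiver as in Fig. 5), neither of which is shown.

```python
def pa_cubic(x, c=0.2):
    """Hypothetical memoryless PA: y = x * (1 - c*|x|^2) (gain compression, no memory)."""
    return x * (1 - c * abs(x) ** 2)


def predistort(x, c=0.2, iterations=50):
    """Find z such that pa_cubic(z, c) == x via the fixed point z = x + c*|z|^2*z.

    Converges while the solution satisfies 3*c*|z|^2 < 1, i.e. while |x| stays below
    the model's peak output, which is reached at |z| = 1/sqrt(3c).
    """
    z = x
    for _ in range(iterations):
        z = x + c * abs(z) ** 2 * z
    return z


if __name__ == "__main__":
    for x in (0.1, 0.3 + 0.2j, 0.5):
        z = predistort(x)
        # Error without and with predistortion.
        print(x, abs(pa_cubic(x) - x), abs(pa_cubic(z) - x))
```

The predistorter expands the amplitude exactly as much as the PA compresses it, but only up to the PA's peak output; beyond that no predistortion can help, which is one reason dynamic simulations with realistic signal statistics are needed.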
Using complex baseband representation for RF signals is sufficient when selecting the optimum front-end architecture. Complex baseband involves “moving” the RF carrier to zero Hertz and selecting a bandwidth (sampling frequency) wide enough to cover all signals of interest in blocking, interfering and intermodulation scenarios. Complex baseband representation allows the system designer to determine the characteristics of filters (e.g. passband, stopband, passband ripple) and amplifiers (e.g. gain, saturation) required to meet or exceed the specifications. Complex baseband dataflow modeling for front-end architecture exploration is more efficient (executes faster) than using AMS languages. Models for phase noise, non-linear amplifiers, quadrature modulator/demodulator errors, filters and other impairments should be available to help designers quickly model analog front-end architectures.
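The bandwidth selection described above reduces to a simple check. The Python sketch below (function names hypothetical) computes the minimum complex-baseband sampling rate for a set of signals given as (offset from carrier, bandwidth) pairs, picks the smallest integer multiple of a base rate that exceeds it, and shows that a tone above or below the carrier maps to a positive or negative baseband frequency. The example uses the intermodulation scenario from the dataflow discussion: the 20MHz wanted channel, CW tones at ±17.5MHz and 5MHz modulated interferers at ±35MHz (assumed here to be center offsets).

```python
import cmath
import math


def min_complex_sample_rate(signals):
    """signals: (offset_hz, bandwidth_hz) pairs relative to the carrier.

    Complex (I/Q) sampling covers -fs/2..+fs/2, so fs must exceed twice the
    largest |offset| + bandwidth/2.
    """
    return 2 * max(abs(off) + bw / 2 for off, bw in signals)


def smallest_rate_multiple(base_hz, min_hz):
    """Smallest integer k with k*base_hz > min_hz (keeps the AFE rate a multiple of the ADC rate)."""
    k = math.floor(min_hz / base_hz) + 1
    return k, k * base_hz


def baseband_tone(f_rf_hz, fc_hz, fs_hz, n):
    """n complex-baseband samples of a CW tone at f_rf after moving the carrier fc to 0 Hz."""
    f_off = f_rf_hz - fc_hz
    if abs(f_off) >= fs_hz / 2:
        raise ValueError("tone aliases: |f_rf - fc| must be below fs/2")
    return [cmath.exp(2j * math.pi * f_off * k / fs_hz) for k in range(n)]


if __name__ == "__main__":
    scenario = [(0.0, 20e6), (17.5e6, 0.0), (-17.5e6, 0.0), (35e6, 5e6), (-35e6, 5e6)]
    fs_min = min_complex_sample_rate(scenario)
    print(fs_min / 1e6)                              # 75.0
    print(smallest_rate_multiple(30.72e6, fs_min))   # (3, 92160000.0)
```

The result reproduces the 3·fADC = 92.16MHz AFE rate used earlier. Unlike a real-valued representation, the complex samples keep a tone 1MHz above the carrier distinct from one 1MHz below it.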
Design Challenge #4: Evaluate algorithms which will meet performance requirements while minimizing area, power and cost.
Some key algorithms include:
- Signal acquisition and start of frame detection
- Coarse frequency synchronization
- Channel estimation and equalization
- Fine frequency/phase synchronization
- Symbol timing synchronization
- MIMO receiver
- PMI, CQI, RI calculation and reporting
- FFT/IFFT processing
- Turbo/convolutional decoder
Transmitter processing is explicitly defined in the standard, but receiver processing is not. Designing and evaluating receiver algorithms requires transmitted signals to work with. The LTE library includes uplink/downlink transmitters and receivers to accelerate algorithm development.
There are many algorithms which may be used to realize the above tasks [12]. The “executable spec” should act as a golden reference against which all algorithms may be compared. The golden reference indicates the ideal performance of the system. It is created by providing the receiver with perfect knowledge of all impairments (multipath channel characteristics, carrier frequency/phase error, etc.). Any practical implementation of an algorithm will fall short of the ideal performance; the shortfall constitutes an implementation loss. More complex algorithms typically have a smaller implementation loss at the cost of higher power dissipation, larger area or longer latency.
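One way to quantify implementation loss is the extra SNR a practical receiver needs to reach the same target block error rate (BLER) as the ideal, perfect-knowledge receiver. The Python sketch below interpolates both BLER curves (linearly in log10 BLER) at a target such as 10% and returns the difference. The function names and the curve values in the example are made up for illustration and are not simulation results.

```python
import math


def snr_at_target(snr_db, bler, target=0.1):
    """SNR (dB) where a decreasing BLER curve crosses `target`.

    Interpolates linearly in log10(BLER); all BLER values must be > 0.
    """
    points = list(zip(snr_db, bler))
    for (s0, b0), (s1, b1) in zip(points, points[1:]):
        if b0 >= target >= b1:
            l0, l1, lt = math.log10(b0), math.log10(b1), math.log10(target)
            if l0 == l1:
                return s0
            return s0 + (s1 - s0) * (l0 - lt) / (l0 - l1)
    raise ValueError("BLER curve does not cross the target")


def implementation_loss_db(snr_db, bler_ideal, bler_practical, target=0.1):
    """Extra SNR the practical receiver needs to reach the same target BLER."""
    return snr_at_target(snr_db, bler_practical, target) - snr_at_target(snr_db, bler_ideal, target)


if __name__ == "__main__":
    snr = [0, 2, 4, 6]
    ideal = [0.5, 0.2, 0.05, 0.01]        # made-up curve
    practical = [0.8, 0.4, 0.1, 0.03]     # made-up curve
    print(round(implementation_loss_db(snr, ideal, practical), 2))   # 1.0
```

Running the same comparison across channel models and algorithm variants gives the implementation-loss versus complexity trade-off needed to pick the “best” algorithm.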
Simulations are required to evaluate the performance of various algorithms over different scenarios and select the “best”. The “best” algorithm would have the least complexity (least number of operations and least amount of memory used) and least latency (computation delay) while meeting or exceeding performance requirements.
An LTE library should offer the above algorithms as well as testbenches to check performance as a starting point for developers to evaluate their own algorithms.
Design Challenge #5: Convert floating-point algorithms into fixed-point for optimum performance.
Floating-point allows values to be represented with a large dynamic range and high precision but requires more hardware resources (area and power) compared to a fixed-point representation. Floating-point is used in initial algorithm evaluation to obtain an upper bound on performance but is seldom used in a hardware implementation. Fixed-point incurs a quantization loss and has a limited but “good enough” dynamic range. C programs developed during the standardization process are typically written in floating-point, and creating fixed-point versions of critical functions is not a trivial task [13]. It is important to take advantage of C++ polymorphism to build models whose datatype can be set to floating-point, fixed-point, complex, scalar, vector, matrix or image with a parameter change. Analysis utilities such as statistics (e.g. min, max) and histograms should also be available to facilitate fixed-point conversion. Parameter sweeps (e.g. over bit widths) run as distributed simulations using Grid Engine [14] provide the productivity needed for design space exploration.
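The statistics-driven conversion described above can be sketched in a few lines. The Python example below is a language-neutral illustration, not the C++ polymorphic models the text refers to: it picks the integer bits from the observed peak magnitude (a min/max statistic), quantizes to a signed fixed-point format with rounding and saturation, and measures SQNR over a bit-width sweep. Function names, the test signal and the swept widths are assumptions.

```python
import math


def quantize(x, word_bits, frac_bits):
    """Signed two's-complement fixed point with `frac_bits` fractional bits.

    Rounds to nearest (Python's round(): ties to even) and saturates at the format limits.
    """
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, round(x * scale))) / scale


def integer_bits_needed(samples):
    """Integer bits, including sign, so the observed peak magnitude lies inside the format's range."""
    peak = max(abs(v) for v in samples)
    if peak == 0:
        return 1
    return max(1, math.floor(math.log2(peak)) + 2)


def sqnr_db(samples, word_bits, int_bits):
    """Signal-to-quantization-noise ratio for a given word length and integer-bit split."""
    frac_bits = word_bits - int_bits
    signal = sum(v * v for v in samples)
    noise = sum((v - quantize(v, word_bits, frac_bits)) ** 2 for v in samples)
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


if __name__ == "__main__":
    samples = [0.9 * math.sin(2 * math.pi * 7 * k / 1000) for k in range(1000)]
    ib = integer_bits_needed(samples)          # 1: values stay within [-1, 1)
    for word_bits in range(6, 17, 2):          # hypothetical bit-width sweep
        print(word_bits, round(sqnr_db(samples, word_bits, ib), 1))
```

Each added bit buys roughly 6dB of SQNR until saturation or the algorithm's own noise floor dominates; the sweep shows where extra bits stop paying for their area, which is the decision the distributed parameter sweeps are meant to speed up.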
Design Challenge #6: Partition the baseband design for implementation on dedicated hardware, programmable accelerators, or software.
The criteria for selecting between a dedicated hardware and a pure software implementation of an algorithm are fairly straightforward: if it needs to run really fast with low probability of being changed, it’s a good candidate for a dedicated hardware implementation; if the processing is fairly complex with a lot of parameters but low throughput, it’s a good candidate for implementation in software.