WYV64
Parallel multiplier-accumulator based on radix-2 modified Booth algorithm by using a VLSI architecture
Sathya, A.
Advance Computing Conference (IACC), 2013IEEE3rdInternational
DOI:10.1109/ECS.2014.6892605
Publication Year:2013 Page(s): 1656 – 1663
Project Title: Parallel multiplier-accumulator based on radix-2 modified Booth algorithm by using a VLSI architectureOperations
Domain:VLSI
Reference:IEEE
Publish Year:13-14 Feb. 2014 Page(s):1 – 5
D.O.I:10.1109/ECS.2014.6892605
Software Tool :XILINX
Language : Verilog HDL
Developed By:Wine Yard Technologies, Hyderabad
Parallel multiplier-accumulator based on radix-2 modified Booth algorithm by using a VLSI architecture
Abstract:
In this paper, we proposed a new architecture of multiplier-and-accumulator (MAC) for high-speed arithmetic. By combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA), the performance was improved. Since the accumulator that has the largest delayed in MAC was merged into CSA, the overall performance was elevated. The proposed CSA tree uses 1's-complement-based radix-2 modified Booth's algorithm (MBA) and has the modified array for the sign extension in order to increase the bit density of the operands. The CSA propagates the carries to the least significant bits of the partial products and generates the least significant bits in advance to decrease the number of the input bits of the final adder. Also, the proposed MAC accumulates the intermediate results in the type of sum and carry bits instead of the output of the final adder, which made it possible to optimize the pipeline scheme to improve the performance. The proposed architecture was synthesized with 250, 180 and 130 /xm, and 90 nm standard CMOS library. Based on the theoretical and experimental estimation, we analyzed the results such as the amount of hardware resources, delay, and pipelining scheme. We used Sakurai's alpha power law for the delay modeling. The proposed MAC showed the superior properties to the standard design in many ways and performance twice as much as the previous research in the similar clock frequency. We expect that the proposed MAC can be adapted to various fields requiring high performance such as the signal processing areas.
Existing method:
Abinary multiplieris anelectronic circuitused indigital electronics, such as acomputer, tomultiplytwobinary numbers. It is built usingbinary adders.A variety ofcomputer arithmetictechniques can be used to implement a digital multiplier. Most techniques involve computing a set ofpartial products, and then summing the partial products together. This process is similar to the method taught to primary schoolchildren for conducting long multiplication on base-10 integers, but has been modified here for application to a base-2 (binary)numeral system.
With the advancement in the VLSI technology, there is an ever increasing quench for portable and embedded Digital Signal Processing (DSP) systems. DSP is omnipresent in almost every engineering discipline. Faster additions and multiplications are the order of the day. Multiplication is the most basic and frequently used operations in a CPU. Multiplication is an operation of scaling one number by another. Multiplication operations also form the basis for other complex operations such as convolution, Discrete Fourier Transform, Fast Fourier Transforms, etc. With ever increasing need for faster clock frequency it becomes imperative to have faster arithmetic unit. Therefore, DSP engineers are constantly looking for new algorithms and hardware to implement them.
Existing multiplier technique:
1.Array based multiplier
2.Carry-save array multipliers
3.Wallace tree multiplier…etc.
Proposed method:
In the majority of digital signal processing (DSP) applications the critical operations are the multiplication and accumulation. Real-time signal processing requires high speed and high throughput Multiplier-Accumulator (MAC) unit that consumes low power, which is always a key to achieve a high performance digital signal processing system. The purpose of this work is, design and implementation of a low power MAC unit with block enabling technique to save power. Firstly, a 1-bit MAC unit is designed, with appropriate geometries that give optimized power, area and delay. The delay in the pipeline stages in the MAC unit is estimated based on which a control unit is designed to control the data flow between the MAC blocks for low power. Similarly, the N-bit MAC unit is designed and controlled for low power using a control logic that enables the pipelined stages at appropriate time. The adder In this paper, a new architecture for a high-speed MAC is proposed. In this MAC, the computations of multiplication and accumulation are combined and a hybrid-type CSA structure is proposed to reduce the critical path and improve the output rate. It uses MBA algorithm based on 1’s complement number system. A modified array structure for the sign bits is used to increase the density of the operands. A carry look-ahead adder (CLA) is inserted in the CSA tree to reduce the number of bits in the final adder. In addition, in order to increase the output rate by optimizing the pipeline efficiency, intermediate calculation results are accumulated in the form of sum and carry instead of the final adder outputs.A multiplier can be divided into three operational steps. The first is radix-2 Booth encoding in which a partial product is generated from the multiplicand X and the multiplier Y . The
second is adder array or partial product compression to add all partial products and convert them into the form of sum and carry. The last is the final addition in which the final multiplication result is produced by adding the sum and the carry.
Applications:
- Digital systems designing
- Digital signal processing
- Analog and Digital communications
- Arithmetic and Logic Unit (ALU)
- Microprocessors
Advantages:
- Area Efficient multipliers.
- Low power multipliers.
- High speed multipliers.
Conclusion:
In this paper, a new MAC architecture to execute the multiplication-accumulation operation, which is the key operation, for digital signal processing and multimedia information processing efficiently, was proposed. By removing the independent accumulation process that has the largest delay and merging it to the compression process of the partial products, the overall MAC performance has been improved almost twice as much as in the previous work.
Circuit Diagram:
Screen shots:
| Page