# Design and Implementation of High Speed Modified Russian Peasant Multiplier using 8-2 Adder Compressors

**E. Jagadeeswara Rao<sup>1</sup>, A.Rama Vasantha<sup>2</sup>** 

<sup>1, 2</sup> Dept. of Electronics & Communication Engg., Aditya College of Engg. & Tech., Surampalem, AP, India. emandi.jagadeesh@gmail.com, vasanthaadiraju@gmail.com

Abstract - Multipliers are core components of FIR filters, microprocessors and digital signal processors. The multiplier and accumulator (MAC) unit is the most important building block in a DSP system, and a high-throughput MAC unit is the key to high-performance digital signal processing applications; however, multipliers are among the most time-, area- and power-consuming circuits. In this paper, a Modified Russian Peasant Multiplier (MRPM) using adder compressors is proposed. According to the Russian rules, a divide-and-conquer technique is used in the multiplication process. From a digital design perspective, however, the Russian Peasant Multiplier (RPM) uses only shifters and adders for Partial Product Generation (PPG). We first present an approach to reducing the delay of the RPM by using 8:2 adder compressors (8:2 AC) in the partial product reduction stage. The proposed design is then compared, in terms of propagation delay, with RPMs that use a ripple carry adder (RCA) and a carry select adder (CSA). The proposed design enhances the speed of the system by 70.81% compared to the RPM using RCA and by 92.11% compared to the RPM using CSA. The complete design is coded in Verilog HDL, simulated using ModelSim 6.3c and synthesized using the Xilinx ISE 14.7 design tool.

Keywords: MRPM, RCA, CSA, 8:2 AC, PPG, Verilog HDL.

## I. INTRODUCTION

Multiplication is an important and time-consuming arithmetic operation, so multipliers are major components of arithmetic, signal and image processors. Many functions in signal and image processing are based on multiplication, such as multiply and accumulate, convolution and filtering. The execution time of these functions depends heavily on the speed of the multiplier unit. In many DSP algorithms multiplication takes more time than the other basic operations, so the critical delay path of the complete operation is determined by the delay of the multiplication unit, which in turn determines the performance of the algorithm. Addition and multiplication are widely used operations in computer arithmetic; for addition, full-adder cells have been extensively analysed for approximate computing [1-3].

All DSP algorithms need some form of the multiplication and accumulation operation. A MAC unit consists of a multiplier, an adder and an accumulator. The adders usually implemented in DSPs are the RCA or the CSA. The multiplier multiplies the input values and passes the result to the adder, which adds it to the previously accumulated result. In this paper, an MRPM using 8:2 adder compressors has been designed. The RPM is used because it can reduce the number of partial products generated during multiplication. In the final addition stage, the adder is designed using 8:2 AC. This architecture is intended to reduce area, delay and power.

This paper is organized as follows. Section II reviews existing schemes for the RPM. The 8-2 adder compressor designs are presented in Section III. The existing 8-bit RPM is introduced in Section IV and high speed adders in Section V. The proposed high speed MRPM is described in Section VI. Simulation results for the multipliers with the adder compressors are provided in Section VII, and Section VIII concludes the manuscript.

## II. LITERATURE SURVEY

[Chang, T.Y. and Hsiao, M.J., 1998] described a carry-select adder that requires a single carry-ripple adder with zero carry-in, an add-one circuit, and a multiplexer. The add-one circuit, which has a lower transistor count and 1.5 more units of two-input NAND gate delay, replaces the original carry-ripple adder with carry-in Cin = 1. The transistor count can be reduced by 29.2% with a speed penalty of 5.9% for n = 64.

In [Gunasekaran, K., and Manikandan, M., 2014], a reconfigurable FIR filter was designed using a Russian Peasant Multiplier (RPM). The addition in the MAC unit is performed by a Carry Select Adder (CSLA) with a Sklansky adder, which offers a 30.9% reduction in area compared with the traditional CSLA. To further improve the architecture, some changes are made to the CG block of the CSLA architecture.

[Elguibaly, F., 2000] presented a fast parallel multiplier-accumulator using the modified Booth algorithm. A dependence graph (DG) is used to visualize and describe a merged multiply-accumulate (MAC) hardware based on the modified Booth algorithm. The carry-save technique is used in the Booth encoder and the accumulator sections to ensure the fastest implementation. The DG applies to any MAC data size and allows the design of multiplier structures that are regular and have minimal delay, sign-bit extensions and data path width. Using the DG, a fast pipelined implementation is proposed, in which an accurate delay model for deep submicron CMOS technology is used. The delay model accounts for multi-level gate delays, taking into account input ramp and output loading.

[Saikumar, M., et al., 2014] described the design and performance analysis of the multiply-accumulate (MAC) unit, which is designed for various high performance applications. The MAC unit is a fundamental building block in computing devices, especially Digital Signal Processors (DSPs). It performs multiplication and accumulation and consists of a multiplier, an adder and an accumulator. In the traditional MAC unit model, the multiplier is a modified Booth multiplier. In that work, MAC unit models are designed by incorporating various multipliers (Array Multiplier, Ripple Carry Array Multiplier with Row Bypassing Technique, Wallace Tree Multiplier and DADDA Multiplier) in the multiplier module, and the performance of the MAC unit models is analysed in terms of area, delay and power.

## III. COMPRESSORS

Compressors are widely considered the most efficient building blocks of a high speed multiplier. They accumulate partial products at the expense of very little power dissipation. Rather than summing all partial products with a CSA or ripple adder tree, a structure of compressors completes the same task in much less time while also reducing power consumption and area. When the partial products are added in the conventional way, using full adders and half adders, the delay of the critical path cannot be reduced as much as when counters or compressors are used. Compressors are preferred over counters because of their advantages in power, transistor count and critical path delay (which consists mainly of XORs) [4]. The compressor design implemented in this paper uses both MUXs and XORs.

The internal structure of the 3-2 adder compressor is presented in Fig. 1-a. The maximum delay is given by two XOR gates. The final sum S of the 3-2 adder compressor is given in expression (1). The 3-2 adder compressor can also be used as a full-adder (i.e. mux-based full-adder) when the input C is used as a carry input.

$$S = Sum + 2 * Carry \tag{1}$$
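Expression (1) can be checked exhaustively with a small behavioural model. The following Python sketch is illustrative only: the function name `compressor_3_2` is hypothetical, and the mux-based carry is one common realization of a full adder, assumed here rather than taken from Fig. 1(a).

```python
from itertools import product

def compressor_3_2(a: int, b: int, c: int) -> tuple[int, int]:
    """Behavioural model of a mux-based 3-2 adder compressor on single bits."""
    sel = a ^ b                # first XOR stage
    total = sel ^ c            # second XOR stage produces the Sum bit
    carry = c if sel else a    # 2:1 mux: if a != b the carry follows c, otherwise it equals a
    return total, carry

# Exhaustive check of expression (1): A + B + C == Sum + 2 * Carry
for a, b, c in product((0, 1), repeat=3):
    s, carry = compressor_3_2(a, b, c)
    assert a + b + c == s + 2 * carry
```

The critical path of this model passes through two XOR operations, which agrees with the delay stated for the 3-2 adder compressor.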

The internal structure of the 4-2 adder compressor is presented in Fig. 1-b. It has a reduced critical path compared to conventional adders since the maximum delay is given by three XOR gates. The 4-2 compressor has five inputs (A, B, C, D, C<sub>in</sub>), where $C_{in}$ is the input carry, and three outputs (Sum, Carry and $C_{out}$). In this adder compressor, the carry output $C_{out}$ is independent of the input carry ($C_{in}$), so no carry ripples between neighbouring compressors, which makes a higher-performance implementation possible. The final sum S of the 4-2 adder compressor is given in (2).

$$S = Sum + 2 * (C_{out} + Carry) \tag{2}$$
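As an illustration of expression (2), the sketch below models a 4-2 adder compressor as two cascaded full adders. This is an assumed textbook realization, not the XOR/MUX structure of Fig. 1(b); the names `full_adder` and `compressor_4_2` are hypothetical. It also checks that $C_{out}$ does not depend on $C_{in}$.

```python
from itertools import product

def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Single-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)

def compressor_4_2(a: int, b: int, c: int, d: int, cin: int) -> tuple[int, int, int]:
    """4-2 adder compressor modelled as two full adders; returns (Sum, Carry, Cout)."""
    s1, cout = full_adder(a, b, c)         # Cout is produced from A, B, C only
    total, carry = full_adder(s1, d, cin)  # Cin enters only the second stage
    return total, carry, cout

for a, b, c, d, cin in product((0, 1), repeat=5):
    s, carry, cout = compressor_4_2(a, b, c, d, cin)
    # Expression (2): A + B + C + D + Cin == Sum + 2 * (Cout + Carry)
    assert a + b + c + d + cin == s + 2 * (cout + carry)
    # Cout is unchanged when Cin is flipped
    assert cout == compressor_4_2(a, b, c, d, 1 - cin)[2]
```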

The internal structure of the 5-2 adder compressor is presented in Fig. 1-c. The maximum delay is given by six XOR gates. The final sum S of the 5-2 adder compressor is given in (3).

$$S = Sum + 2 * (C_{out1} + C_{out2} + Carry) \tag{3}$$

The internal structure of the 7-2 adder compressor is presented in Fig. 1-d [5]. The maximum delay is given by ten XOR gates. The final sum S of the 7-2 adder compressor is given in (4).

$$S = Sum + 2 * (C_{out1} + C_{out2} + Carry) \tag{4}$$

In this paper, the 8-2 adder compressor is built from 3-2, 4-2, 5-2 and 7-2 adder compressors. The internal structures of the 8-2 adder compressor are presented in Fig. 2(a,b,c,d) [6]. The final sum S of the 8-2 adder compressor is given in (5).

$$S = Sum + 2 * (C_{out0} + C_{out1} + C_{out2} + C_{out3} + C_{out4} + Carry) \tag{5}$$
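A behavioural sketch of an 8-2 adder compressor built only from 4-2 adder compressors, in the spirit of Fig. 2(a), is given below. The exact wiring, the five carry inputs per bit column and the mapping of internal signals to $C_{out0}$ ... $C_{out4}$ are assumptions made for illustration; all function names are hypothetical.

```python
from itertools import product

def full_adder(a, b, c):
    """Single-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)

def compressor_4_2(a, b, c, d, cin):
    """4-2 adder compressor as two full adders; returns (Sum, Carry, Cout)."""
    s1, cout = full_adder(a, b, c)
    total, carry = full_adder(s1, d, cin)
    return total, carry, cout

def compressor_8_2(x, cin):
    """One bit column of an 8-2 compressor.

    x   : 8 data bits of this column
    cin : 5 carry bits coming from the previous column
    Returns (Sum, Carry, couts), where couts holds 5 bits for the next column.
    """
    s_a, carry_a, cout_a = compressor_4_2(x[0], x[1], x[2], x[3], cin[0])
    s_b, carry_b, cout_b = compressor_4_2(x[4], x[5], x[6], x[7], cin[1])
    # The second level merges both first-level sums with the first-level
    # carries of the previous column (cin[2], cin[3]) and one more carry-in.
    total, carry, cout_c = compressor_4_2(s_a, s_b, cin[2], cin[3], cin[4])
    return total, carry, [cout_a, cout_b, carry_a, carry_b, cout_c]

# Exhaustive check of expression (5), with the carry inputs added on the left
for bits in product((0, 1), repeat=13):
    x, cin = bits[:8], bits[8:]
    s, carry, couts = compressor_8_2(x, cin)
    assert sum(x) + sum(cin) == s + 2 * (sum(couts) + carry)
```

With this ordering, output `couts[i]` of column n connects to input `cin[i]` of column n + 1.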



Fig.1. Adder compressors internal structures: (a) 3-2; (b) 4-2; (c) 5-2; (d) 7-2.

## IV. EXISTING RPM

The existing RPM is designed to improve the hardware utilization of the circuit. The main aims of VLSI system design are to reduce hardware complexity and power consumption and to increase the speed and throughput of the system. Hence, the aim of the proposed work is to reduce the delay and power consumption of multiplication. In general, multiplication has three important steps:

- Partial Product Generation (PPG)
- Wallace Tree Reduction (WTR)
- Partial Product Addition (PPA)

The existing RPM is illustrated in Fig. 3. It generates n rows of partial products using only multiplexers [7].
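The shift-and-add principle behind Russian peasant multiplication can be sketched in Python as follows. This is a software illustration only, not the hardware of Fig. 3; the function names and the default operand width `n = 8` are assumptions.

```python
def russian_peasant_multiply(a: int, b: int) -> int:
    """Halving/doubling form of the algorithm for non-negative integers."""
    result = 0
    while b > 0:
        if b & 1:      # multiplier is odd: accumulate the current multiplicand
            result += a
        a <<= 1        # double the multiplicand
        b >>= 1        # halve the multiplier
    return result

def partial_products(a: int, b: int, n: int = 8) -> list[int]:
    """n partial product rows; row i acts like a 2:1 mux selecting (a << i) or 0."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(n)]

rows = partial_products(13, 11)
assert len(rows) == 8
assert sum(rows) == russian_peasant_multiply(13, 11) == 13 * 11
```

Each row depends on a single multiplier bit, which is why the partial product generation stage can be realized with shifters and multiplexers alone; the remaining cost lies in summing the rows.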



Fig.2. The structure of 8-2 adder compressor using: (a) Only 4-2 adder compressors; (b) Combination of 5-2, 4-2 and 3-2 adder compressors; (c) Combination of 3-2 and 4-2 adder compressors; (d) Combination of 7-2 and 3-2 adder compressors [6].



Fig. 3 Architecture of Existing RPM [7]

## V. HIGH SPEED ADDERS

Every multiplication algorithm contains three steps, of which the summation of partial products is the most important for generating the final result. The performance of the multiplier depends on how fast the partial products are added to obtain the final result, and many researchers have worked on fast adders. The fundamental adder architecture is the Ripple Carry Adder, from which a number of adders have been developed, such as the CLA, the carry select adder, the carry save adder and the carry skip adder. The ripple carry adder is well known for its regular structure and its maximum delay, because each stage waits for the carry from the previous stage. CLAs have minimum delay, but the area associated with these adders is maximum. The carry skip adder performs better than the ripple carry adder, but it requires extra hardware circuitry to skip the generated carry [8]. The carry save adder speeds up addition by reducing three operands to two; its major drawback is that it consumes a larger area [9]. The carry select adder uses two ripple carry adders and does not wait for the previous stage to execute. For larger bit widths, the carry select adder exhibits an excellent area and speed trade-off compared with other adder architectures [10]. Many modifications can be made to the carry select adder to trade speed for area [11].
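The difference between the ripple carry and carry select schemes can be illustrated with the following behavioural sketch. The block size of 4 bits, the 16-bit default width and the function names are assumptions chosen for illustration and do not describe a specific design from the cited works.

```python
def ripple_carry_add(a: int, b: int, width: int, cin: int = 0) -> tuple[int, int]:
    """Add bit by bit; every stage waits for the carry of the previous stage."""
    carry, result = cin, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result, carry

def carry_select_add(a: int, b: int, width: int = 16, block: int = 4) -> tuple[int, int]:
    """Each block is added twice (carry-in 0 and 1); the real carry selects one result."""
    carry, result = 0, 0
    for lo in range(0, width, block):
        w = min(block, width - lo)
        mask = (1 << w) - 1
        xa, xb = (a >> lo) & mask, (b >> lo) & mask
        s0, c0 = ripple_carry_add(xa, xb, w, 0)  # speculative result for carry-in 0
        s1, c1 = ripple_carry_add(xa, xb, w, 1)  # speculative result for carry-in 1
        s, carry = (s1, c1) if carry else (s0, c0)  # multiplexer driven by the incoming carry
        result |= s << lo
    return result, carry

assert carry_select_add(1234, 4321) == ripple_carry_add(1234, 4321, 16)
```

In hardware, both speculative block sums are computed in parallel, so only the multiplexer delay is added per block; the software model reproduces the function, not the timing.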

Fig. 6 presents an addition of eight 8-bit values as an example. It is noted in Fig. 6 that adder circuits are required to recombine the partial sums of previous values (i.e. recombination line), since a *Carry* signal from the compressor n must be added with the *Sum* signal of the compressor n + 1 to generate the final sum (*S*) of bit n + 1.
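The role of the recombination line can be illustrated with a behavioural sketch that adds eight 8-bit values column by column. The 8-2 compressor below is modelled from 4-2 compressors as an assumption, and the 16-bit column count, the carry wiring and all function names are hypothetical; this is not the circuit of Fig. 6.

```python
def full_adder(a, b, c):
    """Single-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)

def compressor_4_2(a, b, c, d, cin):
    """4-2 adder compressor as two full adders; returns (Sum, Carry, Cout)."""
    s1, cout = full_adder(a, b, c)
    total, carry = full_adder(s1, d, cin)
    return total, carry, cout

def compressor_8_2(x, cin):
    """One bit column: 8 data bits and 5 carry-ins give Sum, Carry and 5 carry-outs."""
    s_a, carry_a, cout_a = compressor_4_2(x[0], x[1], x[2], x[3], cin[0])
    s_b, carry_b, cout_b = compressor_4_2(x[4], x[5], x[6], x[7], cin[1])
    total, carry, cout_c = compressor_4_2(s_a, s_b, cin[2], cin[3], cin[4])
    return total, carry, [cout_a, cout_b, carry_a, carry_b, cout_c]

def add_eight_operands(operands: list[int], width: int = 16) -> int:
    """Sum eight non-negative values whose total fits in `width` bits."""
    assert len(operands) == 8
    sum_vec, carry_vec = 0, 0
    cins = [0] * 5                                 # column 0 has no incoming carries
    for n in range(width):
        bits = [(v >> n) & 1 for v in operands]    # bit n of every operand
        s, carry, cins = compressor_8_2(bits, cins)
        sum_vec |= s << n                          # Sum of compressor n has weight 2**n
        carry_vec |= carry << (n + 1)              # Carry of compressor n has weight 2**(n + 1)
    # Recombination: Carry of column n is added to Sum of column n + 1
    return sum_vec + carry_vec

values = [255, 1, 128, 77, 200, 3, 19, 64]
assert add_eight_operands(values) == sum(values)
```

The final `sum_vec + carry_vec` corresponds to the recombination adder; in hardware this is the only place where a carry-propagating adder is needed.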



## VI. PROPOSED MRPM

To increase the speed of a multiplier, either the partial product generation stage or the final summation stage must be reduced. In this paper, an 8-bit MRPM is proposed whose 16-bit adder is built from the different 8-2 adder compressors discussed in Section III. The proposed 8-bit multiplier is shown in Fig. 7. The proposed and existing multipliers were coded in Verilog and simulated using Xilinx ISE 14.7, and the resulting delays were compared with those of the existing RPM with CSA and with RCA.




Fig.7 Architecture of proposed MRPM

## VII. RESULTS AND DISCUSSION

The design was synthesized in Xilinx ISE, and functional verification of the existing RPM and the proposed MRPM was done in Xilinx ISim. The target device belongs to the Spartan-3E family, and the speed grade is set to -5. This section contains the results obtained by synthesizing the design in Xilinx ISE. Table 1 presents the results obtained for the proposed MRPM designs together with the results published by contemporary researchers.

| Design   | Slices | LUTs | Delay (ns) |
|----------|--------|------|------------|
| Method-1 | 93     | 182  | 31.469     |
| Method-2 | 88     | 161  | 18.646     |
| Method-3 | 91     | 161  | 21.355     |
| Method-4 | 79     | 142  | 16.556     |
| Method-5 | 64     | 114  | 12.674     |
| Method-6 | 70     | 123  | 15.551     |

Table 1: Comparison of existing RPM and proposed MRPM designs (8x8)

Table 1 compares the existing RPMs with the four proposed MRPM variants. It covers six methods:

- Method-1: existing RPM with CSA [13].
- Method-2: existing RPM with RCA [7].
- Method-3: proposed MRPM with a 16-bit adder built from the 8-2 adder compressor of Fig. 2(a).
- Method-4: proposed MRPM with a 16-bit adder built from the 8-2 adder compressor of Fig. 2(b).
- Method-5: proposed MRPM with a 16-bit adder built from the 8-2 adder compressor of Fig. 2(c).
- Method-6: proposed MRPM with a 16-bit adder built from the 8-2 adder compressor of Fig. 2(d).

The corresponding graphs are shown in Fig. 8(a), Fig. 8(b) and Fig. 8(c). In Fig. 8(a), Method-5 has the lowest delay (12.674 ns), shown in violet, and Method-1 has the highest delay (31.469 ns), shown in sky blue.

In Fig. 8(b), Method-5 occupies the fewest slices (64), shown in red, while Method-1 occupies the most (93), shown in sky blue. In Fig. 8(c), Method-5 uses the fewest LUTs (114), shown in red, while Method-1 uses the most (182), shown in dark red.
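The relative savings between any two rows of Table 1 can be recomputed with the short sketch below. The values are copied from Table 1; the reduction formula (baseline minus proposed, divided by baseline) and the names `TABLE_1` and `percent_reduction` are assumptions made for illustration.

```python
# (slices, LUTs, delay in ns) for each method, copied from Table 1
TABLE_1 = {
    "Method-1": (93, 182, 31.469),
    "Method-2": (88, 161, 18.646),
    "Method-3": (91, 161, 21.355),
    "Method-4": (79, 142, 16.556),
    "Method-5": (64, 114, 12.674),
    "Method-6": (70, 123, 15.551),
}

def percent_reduction(baseline: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline

def compare(baseline: str, proposed: str) -> dict[str, float]:
    """Percent reduction in slices, LUTs and delay between two methods of Table 1."""
    names = ("slices", "luts", "delay")
    return {
        name: percent_reduction(b, p)
        for name, b, p in zip(names, TABLE_1[baseline], TABLE_1[proposed])
    }

for baseline in ("Method-1", "Method-2"):
    print(baseline, "vs Method-5:", compare(baseline, "Method-5"))
```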



Fig. 8 Comparison of different RPMs: (a) delay; (b) number of slices; (c) number of LUTs


## VIII. CONCLUSION

The results obtained for the proposed design show that the use of 8-2 ACs can enhance the performance of the system significantly. The results and comparisons presented clearly illustrate the advantages of the proposed design. The speed of the system was enhanced by 67.7% compared to the existing RPM with RCA and by 55.7% compared to the existing RPM with CSA.

The proposed MRPM unit offers a 31.182% reduction in slices, a 0.877% reduction in LUTs and a 7.89% reduction in delay compared with the existing RPM. In future, the proposed unit could offer a great advantage in minimizing chip size when implementing communication standards.

## REFERENCES

- [1] V. Gupta, D. Mohapatra, S. P. Park, A. Raghunathan, and K. Roy, "IMPACT: IMPrecise adders for Low-Power Approximate Computing", Proc. of Int. Symp. on Low Power Electronics and Design (ISLPED), 1-3 Aug. 2011
- [2] S. Cheemalavagu, P. Korkmaz, K.V. Palem, B.E.S. Akgul, and L.N. Chakrapani, "A Probabilistic CMOS Switch and its Realization by Exploiting Noise," in Proc. IFIP-VLSI SoC, Perth, Australia, Oct 2005
- [3] H.R. Mahdiani, A. Ahmadi, S.M. Fakhraie, C. Lucas, "Bio-Inspired Imprecise Computational Blocks for Efficient VLSI Implementation of Soft-Computing Applications", IEEE Trans. on Circuits and Systems I: Regular Papers, Vol. 57, No. 4, pp. 850-862, April 2010
- [4] V. G. Oklobdzija, D. Villeger, "Improving Multiplier Design by Using Improved Column Compression Tree and Optimized Final Adder in CMOS Technology", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 3, No. 2, pp. 292-301, 1995
- [5] M. Rouholamini, O. Kavehie, A.-P. Mirbaha, S. J. Jasbi, and K. Navi, "A new design for 7:2 compressors," Proc. IEEE/ACS Int. Conf. Comput. Syst. Appl. (AICCSA), Amman, Jordan, May 2007, pp. 474–478
- [6] J. S. Altermann, E. A. C. da Costa, and S. Bampi, "Fast Forward and Inverse Transforms for the H.264/AVC Standard using Hierarchical Adder Compressors," in Proc. IEEE/IFIP Int. Conf. VLSI Syst. Chip (VLSI-SoC), Madrid, Spain, pp. 310– 315, Sep. 2010
- [7] N. C. Sendhilkumar, "Design and Implementation of Power Efficient Modified Russian Peasant Multiplier using Ripple Carry Adder", International Journal of MC Square Scientific Research, pp. 154-165, 2017
- [8] A. Guyot, B. Hochet, and J. Muller, "A Way to Build Efficient Carry Skip Adders," IEEE Trans. on Computers, Vol. 36, No. 10, pp. 1144-1152, 1987
- [9] M. Ortiz, F. Quiles, J. Hormigo, F. J. Jaime, J. Villalba, and E. L. Zapata, "Efficient Implementation of Carry-Save Adders in FPGAs", Proc. of 20<sup>th</sup> IEEE Int. Conf. on App. Specific Systems, Architectures and Processors (ASAP 2009), IEEE, pp. 207-210, 2009

- [10] R. Uma, V. Vijayan, M. Mohanapriya, and S. Paul, "Area, Delay and Power Comparison of Adder Topologies," Int. J. of VLSI and Communication Systems, vol. 254, 2012
- [11] B. Ramkumar and H. M. Kittur, "Low-power and area-efficient carry select adder," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 20, no. 2, pp. 371-375, 2012
- [12] Bianca Silveira, Guilherme Paim, Cláudio Machado Diniz and Sergio Bampi, "Power-Efficient Sum of Absolute Differences Hardware Architecture Using Adder Compressors for Integer Motion Estimation Design", IEEE Trans. on Circuits and Systems-I, pp. 1-12, 2017
- [13] C. Uthaya Kumar and B. Justus Rabi, "Design and Implementation of Modified Russian Peasant Multiplier using MSQRTCSLA based Fir Filter", International Journal of science and Tech., pp. 1-6, 2016



**Mr. E. Jagadeeswara Rao** received the B.Tech degree in Electronics and Communication from St. Theressa Institute of Engg. & Tech., Garividi, in 2010 and the M.Tech degree in VLSI Design from Sir C R R College of Engg. in 2015. He is currently working as an Asst. Professor at ACET, Surampalem. His research interests include the design of low-power and high-speed digital systems and mixed-signal design.



**Mrs. A. Rama Vasantha** received the B.Tech degree in Electronics and Communication from Kakinada Institute of Engg. & Tech. in 2007 and the M.Tech degree in System & Signal Processing from JNTUH in 2011. She is currently working as an Asst. Professor at ACET, Surampalem.

INTERNATIONAL JOURNAL OF RESEARCH IN ELECTRONICS AND COMPUTER ENGINEERING