# An Improved Combined Architecture of the Four FDCT Algorithms

Atri Sanyal<sup>1</sup>, Saloni Kumari<sup>2</sup>, Amitabha Sinha<sup>3</sup> <sup>1,2</sup>NSHM College of Management & Technology Kolkata, West Bengal, India. <sup>3</sup>Birbhum Institute of Engineering & Technology Birbhum,West Bengal, India (E-mail: atri.sanyal@nshm.com)

Abstract- Four popular and efficient FDCT algorithms are chosen, and a combined architecture proposed earlier is considered. The earlier architecture selected one of the four algorithms by means of a control bus consisting of four input lines. It did not use an input and output bus system, so the number of input and output blocks required was large. The earlier architecture has been redesigned using a bus system. An improved architecture is then proposed which selects one of the four algorithms through a 2-input control line, using clock-enabled subsystems and an I/O bus system. Both architectures were implemented in Matlab Simulink, and all components of both were manually converted to the 16-bit fixed-point data type. VHDL code was then generated automatically using HDL Coder and manually modified to minimize signal loss. Both architectures were synthesized using Xilinx ISE 14.5. A test bench program was written to test the timing behaviour of both architectures using the same set of input data. The synthesis and post-route timing simulation reports show that the new combined architecture is better than the previous one in terms of hardware utilization and timing, as seen in parameters such as the number of occupied slices, the maximum padding delay and the maximum combinational path delay.

*Keywords-* Improved Combined Architecture, FDCT algorithm, Matlab Simulink, VHDL, Control Signals, Xilinx Synthesis, Post Route Simulation.

#### I. INTRODUCTION

JPEG is the most dominant form of image compression, and it centers on the DCT (Discrete Cosine Transform) algorithm [1]. In JPEG the image matrix is broken into 8x8 sub-blocks and then, working from left to right and top to bottom, the DCT is applied to each image block [1]. As the DCT is designed to work on pixel values ranging from -128 to 127, the original block is levelled off by subtracting 128 from each entry. The n rows of an n-point DCT matrix T are defined by [1]:

1. For all i = 1 to n: $t_{1i}=\sqrt{1/n}$
2. For all i = 1 to n and k = 2 to n: $t_{ki}=\sqrt{2/n}\cos\left(\frac{\pi(2i-1)(k-1)}{2n}\right)$

The 8x8 DCT matrix obtained from the above formula is:

$$
T = \begin{bmatrix}
0.3536 & 0.3536 & 0.3536 & 0.3536 & 0.3536 & 0.3536 & 0.3536 & 0.3536 \\
0.4904 & 0.4157 & 0.2778 & 0.0975 & -0.0975 & -0.2778 & -0.4157 & -0.4904 \\
0.4619 & 0.1913 & -0.1913 & -0.4619 & -0.4619 & -0.1913 & 0.1913 & 0.4619 \\
0.4157 & -0.0975 & -0.4904 & -0.2778 & 0.2778 & 0.4904 & 0.0975 & -0.4157 \\
0.3536 & -0.3536 & -0.3536 & 0.3536 & 0.3536 & -0.3536 & -0.3536 & 0.3536 \\
0.2778 & -0.4904 & 0.0975 & 0.4157 & -0.4157 & -0.0975 & 0.4904 & -0.2778 \\
0.1913 & -0.4619 & 0.4619 & -0.1913 & -0.1913 & 0.4619 & -0.4619 & 0.1913 \\
0.0975 & -0.2778 & 0.4157 & -0.4904 & 0.4904 & -0.4157 & 0.2778 & -0.0975
\end{bmatrix}
$$
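As a quick check of the matrix above, the following minimal Python sketch builds the n x n DCT matrix from the row definitions, using the factor (k-1) in the cosine argument, which is what reproduces the listed entries. The function name `dct_matrix` is illustrative and not part of the original work.

```python
import math

def dct_matrix(n):
    """Build the n x n DCT matrix T row by row (1-based k, i as in the text)."""
    rows = []
    for k in range(1, n + 1):
        row = []
        for i in range(1, n + 1):
            if k == 1:
                # First row: constant sqrt(1/n)
                row.append(math.sqrt(1.0 / n))
            else:
                # Remaining rows: sqrt(2/n) * cos(pi (2i-1)(k-1) / 2n)
                angle = math.pi * (2 * i - 1) * (k - 1) / (2 * n)
                row.append(math.sqrt(2.0 / n) * math.cos(angle))
        rows.append(row)
    return rows

T = dct_matrix(8)
for row in T:
    print(" ".join(f"{v:8.4f}" for v in row))
```

The rows of T are orthonormal, which is why the same matrix (transposed) can be used for the inverse transform.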

From the DCT matrix it is clear that symmetries exist in the DCT function, and these can be used to reduce the computational load of the DCT. The basic n-point DCT requires n<sup>2</sup> multiplications and n(n-1) additions to find the value of y=(T\*original), where original is the image pixel matrix. For n = 8 this amounts to 8\*8=64 multiplications and 8(8-1)=56 additions for each 8-point column [1],[2]. There are a number of fast DCT algorithms which aim to reduce the computational load by using the symmetries present in the DCT matrix [2]-[11]. Among them, four popular and efficient FDCT algorithms based on dataflow diagrams are chosen in order to implement the architecture. The dataflow diagrams of the four selected FDCT algorithms, named Chen's, Arai's, Loeffler's and Jeong's, are given in Fig.1(a), 1(b), 1(c) and 1(d) respectively [2]-[5].
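The operation count of the direct method can be confirmed with a small sketch that applies the level shift and then evaluates an 8-point DCT as a plain matrix-vector product while counting multiplications and additions. The pixel values and the name `direct_dct` are hypothetical and used only for illustration.

```python
import math

def direct_dct(samples):
    """Direct n-point DCT; returns (coefficients, multiplications, additions)."""
    n = len(samples)
    mults = 0
    adds = 0
    out = []
    for k in range(n):                      # 0-based row index
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = 0.0
        for i in range(n):                  # 0-based column index
            coeff = scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            term = coeff * samples[i]
            mults += 1
            if i == 0:
                acc = term                  # first term needs no addition
            else:
                acc += term
                adds += 1
        out.append(acc)
    return out, mults, adds

pixels = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical pixel values (0..255)
shifted = [p - 128 for p in pixels]          # level shift into -128..127
coeffs, mults, adds = direct_dct(shifted)
print(mults, adds)
```

The fast algorithms discussed next reach the same coefficients (up to their stated scale factors) with far fewer multiplications by sharing intermediate sums.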





Fig1: The dataflow diagrams taken from Chen's, Arai's, Jeong's and Loeffler's papers [2],[3],[4],[5]

An earlier architecture [13] attempted to combine four FDCT algorithms [2],[3],[4],[5] in a single design in which any one of the four can be selected by a 4-input control signal. This paper attempts to improve on that architecture. The rest of the paper is arranged as follows. Section II describes the four selected FDCT algorithms in terms of their hardware complexity and characteristics. Section III describes the redesign of the earlier architecture of [13], in particular the use of a common bus and common input and output blocks for all four subsystems instead of individual input and output blocks for each. Section IV describes the new improved architecture and its hardware components. Section V briefly describes how both architectures were coded, synthesized and simulated/tested. Section VI compares the previous and improved architectures in terms of hardware utilization and timing. Section VII presents the conclusion and future scope of this work.

#### ISSN: 2393-9028 (PRINT) | ISSN: 2348-2281 (ONLINE)

## II. INTRODUCTION OF THE SELECTED 4 FDCT ALGORITHMS

First reported in 1977, Chen's FDCT algorithm [2] is one of the first and most widely used FDCT algorithms with a fixed complexity. The algorithm can be extended to 4, 8, 16, 32 or more input points, though the 8-point variety is considered here. In the dataflow diagram the input values are f0-f7 and the output values are F0-F7, with a scale factor of 2. The circular nodes are implemented as adders, lines carrying a value of -1 are implemented as unary minus blocks, and lines labelled $C_x$ or $S_x$ are implemented as multipliers by the constant $\cos(x)$ or $\sin(x)$ respectively.

Arai's algorithm was introduced in 1988 [3] and is reported to be one of the fastest. It uses the lowest number of multipliers among the algorithms considered. The dataflow diagram contains input values f(0) to f(7) and output values F(0) to F(7), where the first output F(0) has a scale factor of 8 and the others have a scale factor of 16. The constants are listed in the following table.

Table 1: Values of constants of Arai's[3]

| Constant | a<sub>1</sub> | a<sub>2</sub> | a<sub>3</sub> | a<sub>4</sub> | a<sub>5</sub> |
|----------|---------------|---------------|---------------|---------------|---------------|
| Value    | 0.7071        | 0.5411        | 0.7071        | 1.3065        | 0.3826        |
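The decimal values in Table 1 can be reproduced from cosines of multiples of π/16. The sketch below assumes the closed forms commonly quoted for this algorithm (the table itself gives only the decimals), and truncates to four places to match the table.

```python
import math

def c(m):
    """cos(m * pi / 16), the basic angle of the 8-point DCT."""
    return math.cos(m * math.pi / 16)

# Assumed closed forms for the five multiplier constants
arai_constants = {
    "a1": c(4),
    "a2": c(2) - c(6),
    "a3": c(4),
    "a4": c(2) + c(6),
    "a5": c(6),
}

def truncate4(x):
    """Truncate (not round) to four decimal places, as the table appears to do."""
    return math.floor(x * 10000) / 10000

for name, value in arai_constants.items():
    print(name, truncate4(value))
```

Truncation rather than rounding is inferred from the table entries themselves; rounding would give 0.5412, 1.3066 and 0.3827 for a<sub>2</sub>, a<sub>4</sub> and a<sub>5</sub>.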

Jeong's algorithm [4] was reported in 1998 and contains the same number of multipliers and adders as Arai's. It also has the special property of shifting most of the multiplications to a later stage in order to minimize the propagation of errors due to fixed-point truncation. The 12 constants used as multipliers are listed in the following table.

Table 2: Values of constants of Jeong's[4]

| Constant | Value                      |
|----------|----------------------------|
| C0       | 1/Cos(pi/4)                |
| C1       | 1.414/4                    |
| C2       | Cos(pi/4)/2                |
| C3       | Cos(pi/4)/Cos(pi/8)        |
| C4       | Cos(pi/4)/(4*Cos(pi/8))    |
| C5       | Cos(pi/8)/Cos(pi/8)        |
| C6       | 1/Cos(pi/8)                |
| C7       | Cos((3*pi)/8)/Cos(pi/8)    |
| C8       | Cos(pi/8)/4*Cos(pi/16)     |
| C9       | Cos(pi/8)/4*Cos((7*pi)/16) |
| C10      | Cos(pi/8)/4*Cos((3*pi)/16) |
| C11      | Cos(pi/8)/4*Cos((5*pi)/16) |

## IJRECE Vol. 6 ISSUE 4 (OCTOBER- DECEMBER 2018)

Finally, Loeffler's algorithm was proposed in 1989 [5] and is reported to be the fastest for calculating the DCT and IDCT, though the number of multipliers is higher than in Arai's or Jeong's. The convention followed while converting the dataflow diagram into a Simulink model is the same as that followed for Chen's. The following table compares the Simulink components required for the four algorithms mentioned above.

Table 3: Number of Simulink components required for the 4 FDCT algorithms

| Algorithm  | I/O Block | Add | Unary Minus | Product | Constant |
|------------|-----------|-----|-------------|---------|----------|
| Chen's     | 8         | 27  | 8           | 18      | 18       |
| Arai's     | 8         | 28  | 16          | 13      | 13       |
| Jeong's    | 8         | 28  | 12          | 13      | 13       |
| Loeffler's | 8         | 15  | 11          | 14      | 14       |

### III. A COMBINED ARCHITECTURE FOR 4 FDCT ALGORITHMS

A combined architecture of four FDCT (Fast Discrete Cosine Transform) algorithms was devised in [13]. The four FDCT algorithms selected there were Chen's, Arai's, Vetterli's and Loeffler's. In this paper Vetterli's is replaced with Jeong's, as it is now more popular than Vetterli's. The system proposed in [13] is redesigned, but the control signal remains 4 bits wide. The combined system performs any one of the four FDCT algorithms simply by changing the control signals. Four control signals, C1, C2, C3 and C4, are used. The input consists of eight 16-bit integer values of image pixels. Four sub-systems, one for each FDCT algorithm (Chen's, Arai's, Jeong's and Loeffler's), are used, and they are connected depending on the values of the control signals: only the sub-system whose control signal is 1 is connected, and the rest are not. To create a common 16-bit, 8-line input and output bus system connected to all four sub-systems, the Bus Creator and Bus Selector components from the Simulink toolset are used.

Table 4: Number of control signals used[13]

|          | C1 | C2 | C3 | C4 |
|----------|----|----|----|----|
| Chen     | 1  | 0  | 0  | 0  |
| Arai     | 0  | 1  | 0  | 0  |
| Jeong    | 0  | 0  | 1  | 0  |
| Loeffler | 0  | 0  | 0  | 1  |
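The one-hot selection of Table 4 can be sketched in Python as follows. Each of the 8 outputs of each sub-system is multiplied by that sub-system's control bit, which corresponds to the 4x8=32 product blocks of this architecture. Summing the gated values onto one output line is an assumption made here for illustration, and the sub-system outputs are hypothetical placeholders rather than real DCT results.

```python
def select_output(controls, subsystem_outputs):
    """Gate each sub-system's 8 outputs with its control bit (one-hot select)."""
    if sorted(controls) != [0, 0, 0, 1]:
        raise ValueError("exactly one of C1..C4 must be 1")
    selected = []
    for line in range(8):
        # One product per (sub-system, output line): 4 x 8 = 32 products
        gated = [c * outs[line] for c, outs in zip(controls, subsystem_outputs)]
        # Assumption: the gated values are combined by summation
        selected.append(sum(gated))
    return selected

# Hypothetical placeholder outputs for Chen, Arai, Jeong, Loeffler
outputs = [[(s + 1) * 10 + line for line in range(8)] for s in range(4)]

loeffler = select_output([0, 0, 0, 1], outputs)   # C4 = 1 selects Loeffler
print(loeffler)
```

Because one control line is needed per algorithm, this scheme needs four control inputs for four algorithms.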


Matlab implementation of Combined Architecture has been shown below:



Fig2: Matlab Implementation of a Combined Architecture[13]

In Fig2, four sub-system blocks are used, which contain Chen's, Arai's, Loeffler's and Jeong's FDCT algorithms. The Matlab implementation of one of the four, Loeffler's FDCT algorithm, which is in the 4<sup>th</sup> sub-system, is shown below:



Fig3: Matlab implementation of Loeffler's FDCT algorithm[5]

Fig3 shows the 4<sup>th</sup> sub-system of the combined architecture. As listed in Table 5, the Simulink model of this algorithm uses 14 Product blocks and 15 Add blocks to compute the DCT on an 8x8 pixel matrix. While implementing the dataflow diagram, 8 input blocks of 16-bit signed integer type (source) are used to take the input, "Add" blocks are used for addition, "Unary Minus" blocks are used to negate values, "Product" blocks are used to multiply values by constants, and "Out" blocks of fixed 16-bit data type are used to display the output. The multiplier, adder and unary minus blocks of every stage are manually converted to the fixed 16-bit data type.

Table5: Number of Simulink library blocks used in Loeffler's algorithm

| Input Block | Add | Unary Minus | Product | Constants |
|-------------|-----|-------------|---------|-----------|
| 8           | 15  | 11          | 14      | 14        |
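The effect of converting Product and Add blocks to a 16-bit fixed-point type can be illustrated with a small sketch. The fraction length (`FRAC_BITS = 8`) and the use of saturation on overflow are assumptions chosen for the example, since the text does not state the word format or overflow mode used in the Simulink model.

```python
import math

WORD_BITS = 16
FRAC_BITS = 8   # hypothetical fraction length; not stated in the text

INT_MIN = -(1 << (WORD_BITS - 1))
INT_MAX = (1 << (WORD_BITS - 1)) - 1

def saturate(raw):
    """Clamp a raw integer to the signed 16-bit range (assumed overflow mode)."""
    return max(INT_MIN, min(INT_MAX, raw))

def to_fixed(value):
    """Quantize a real value to the fixed-point grid by truncation (floor)."""
    return saturate(math.floor(value * (1 << FRAC_BITS)))

def from_fixed(raw):
    """Convert a raw fixed-point integer back to a real value."""
    return raw / (1 << FRAC_BITS)

def fixed_mul(a, b):
    """Product block: full-precision multiply, then drop FRAC_BITS low bits."""
    return saturate((a * b) >> FRAC_BITS)

def fixed_add(a, b):
    """Add block with 16-bit saturation."""
    return saturate(a + b)

x = to_fixed(100)        # a sample value
c = to_fixed(0.7071)     # a multiplier constant such as cos(pi/4)
y = fixed_mul(x, c)
print(from_fixed(y))     # close to, but slightly below, 70.71
```

Each truncating multiplication loses a little precision, which is why an algorithm that places its multiplications late in the dataflow accumulates less error.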

This process is repeated for every other subsystem present in the architecture. Four control signals are used to choose one of the 4 FDCT algorithms. With 8 outputs from each of the 4 subsystems, 4x8=32 product blocks are used to connect the control signals and the outputs of each sub-system to the Bus Selectors. Sixteen Bus Selectors and eight Bus Creators are used: the eight 16-bit Bus Creators and eight of the Bus Selectors are used for the selection of input, and the remaining eight Bus Selectors are used for the selection of output. A bus is used instead of a MUX as it makes the architecture more general. This is one of the improvements made to the previous combined architecture. The total number of Simulink blocks is shown in the table below:

Table6: Number of blocks used in [13]

| Sub-system | Control Input | Product | Bus selector | Displays | Input | Bus creator |
|------------|---------------|---------|--------------|----------|-------|-------------|
| 4          | 4             | 32      | 16           | 8        | 8     | 8           |

#### IV. AN IMPROVED COMBINED ARCHITECTURE FOR 4 FDCT ALGORITHMS

An improved combined architecture has been designed which consists of fewer blocks than the previous combined architecture. It uses enabled subsystem blocks (conditionally executed sub-systems that run once at each major time step while the control signal has a positive value), each of which has a clocked enable input and contains the dataflow diagram of one FDCT algorithm. Two control signals along with two NOT gates are used. The sub-systems are connected according to the values of the control signals as shown below:

Table7: Number of Control Signals used

|          | C1 | C2 |
|----------|----|----|
| Chen's   | 0  | 0  |
| Arai's   | 1  | 0  |
| Jeong's  | 0  | 1  |
| Loeffler | 1  | 1  |
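The mapping in Table 7 behaves like a 2-to-4 decoder. A minimal sketch, using two NOT gates and four AND gates to derive one enable signal per sub-system, is given below; the function name `decode_enables` is illustrative only.

```python
def decode_enables(c1, c2):
    """Derive the four sub-system enables from control bits C1 and C2."""
    not_c1 = 1 - c1          # NOT gate 1
    not_c2 = 1 - c2          # NOT gate 2
    return {
        "Chen":     not_c1 & not_c2,   # C1=0, C2=0
        "Arai":     c1 & not_c2,       # C1=1, C2=0
        "Jeong":    not_c1 & c2,       # C1=0, C2=1
        "Loeffler": c1 & c2,           # C1=1, C2=1
    }

for c1 in (0, 1):
    for c2 in (0, 1):
        print(c1, c2, decode_enables(c1, c2))
```

For every combination of C1 and C2 exactly one enable is 1, so exactly one enabled sub-system executes at a time.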

The Matlab implementation of the Improved Combined architecture is shown below:


Fig4: Matlab implementation of enable combined architecture


The improved combined architecture of Fig4 consists of 4 enabled sub-systems named Chen, Arai, Jeong and Loeffler, which are designed using Matlab Simulink blocks. The Matlab implementation of one of the four FDCT algorithms, Jeong's, which is inside the third sub-system, is shown below:


Fig5: Implementation of Jeong's algorithm[4]

Fig5 shows the Matlab implementation of the 3<sup>rd</sup> sub-system of the improved combined architecture. The FDCT requires 12 multiplications and 28 additions to compute the DCT on an 8x8 pixel matrix. The same procedure as before is followed for implementing the adder, unary minus, product, input and output blocks and for the subsequent data type change. The Simulink blocks required for implementing the algorithm are shown in Table 8.

Table8: Number of Simulink library blocks used in Jeong's Algorithm.

| Input Block | Add | Unary Minus | Product | Constants |
|-------------|-----|-------------|---------|-----------|
| 8           | 28  | 12          | 13      | 13        |

The improved combined architecture consists of 16 Bus Selectors and 8 Bus Creators. Eight Bus Selectors and the eight Bus Creators are used for the selection of input from the input blocks, and the remaining eight Bus Selectors are used for the output through the Out blocks. Two control signals along with two NOT gates and four AND gates are used. Inputs are taken through input blocks of 16-bit signed integer type, and the same type is set for the Out blocks. The total number of Matlab Simulink blocks used is shown in the following table:

Table9: Number of Matlab Simulink Blocks used in Improved Combined Architecture

| Enabled Sub-system | Controls | AND | Bus selector | Displays | Input | Bus creator | NOT |
|--------------------|----------|-----|--------------|----------|-------|-------------|-----|
| 4                  | 2        | 4   | 16           | 8        | 8     | 8           | 2   |

## V. VHDL CODE GENERATION, HARDWARE SYNTHESIS AND TIMING SIMULATION

Automated VHDL code is generated by HDL Coder. The code is modified manually in order to minimize signal loss.

The code is then synthesised in Xilinx ISE 14.5, targeting a Virtex-7 device. Hardware synthesis is carried through mapping and then place and route. A test bench program is written which selects the 4<sup>th</sup> subsystem (implementing Loeffler's FDCT algorithm) in both combined architectures and executes them with the same set of data. Of the four simulations (behavioural, post-translate, post-map and post-route), post-route is the closest to the hardware implementation and hence is the one shown. Because of the large number of signals involved, three screenshots of each simulation diagram were taken; they are presented in the following two figures.



Fig6: Post-route simulation of the previous combined architecture






Fig7: Post-route simulation of the improved combined architecture.

## VI. COMPARISON OF IMPROVED COMBINED AND PREVIOUS COMBINED ARCHITECTURE

Table10: Name and number of blocks used in both circuits.

| Blocks used                | Previous combined circuit | Improved combined circuit   |
|----------------------------|---------------------------|-----------------------------|
| Sub-systems                | Sub-system block: 4       | Enabled sub-system block: 4 |
| Input                      | 8                         | 8                           |
| Control signals (constant) | 4                         | 2                           |
| PRODUCT/AND                | 32                        | 4                           |
| BUS selector               | 16                        | 16                          |
| BUS creator                | 8                         | 8                           |
| OUT                        | 8                         | 8                           |
| NOT Gate                   | 0                         | 2                           |
The above table compares the Matlab Simulink blocks of the previous and improved combined architectures. The previous combined architecture uses 4 sub-systems, whereas the improved one uses 4 enabled sub-systems. In the improved combined architecture the number of control signals is reduced from four to two, and AND blocks are used instead of PRODUCT blocks, with their number reduced from 32 to only 4. The numbers of Bus Selector, Bus Creator, input and Out blocks are the same in both architectures. The improved combined architecture uses clocked enables, which make it more streamlined; these were absent in the previous combined architecture. Moreover, the improved combined architecture is simpler to understand and implement. The effect of this reduced hardware requirement can be seen in the synthesis report, where the improved combined architecture occupies fewer slices than the previous architecture, though its total number of slice LUTs is slightly higher. Both use almost the same number of IOBs and exactly the same number of DSP cores for multiplications. Timing performance is also better in the improved architecture, as its maximum padding delay and maximum combinational path delay are lower than those of the previous one.

Table11: Comparison of both combined architectures, taken from the Device Utilization Summary Report and Timing Report after synthesis and post-route timing simulation in Xilinx ISE 14.5

|                                  | Improved Combined Architecture | Previous Combined Architecture |
|----------------------------------|--------------------------------|--------------------------------|
| Number of Slice LUTs             | 3,248                          | 3,027                          |
| Number of occupied Slices        | 1,212                          | 1,229                          |
| Number of bonded IOBs            | 646                            | 644                            |
| Number of DSP48E1s               | 55                             | 55                             |
| Maximum padding delay            | 38.451                         | 40.089                         |
| Maximum Combinational Path delay | 23.799                         | 23.825                         |
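To put the figures of Table 11 in proportion, the relative change of each metric from the previous to the improved architecture can be computed as below. This is plain arithmetic on the reported values; the dictionary layout and function name are illustrative.

```python
# (improved, previous) pairs taken from Table 11
metrics = {
    "Number of Slice LUTs": (3248, 3027),
    "Number of occupied Slices": (1212, 1229),
    "Number of bonded IOBs": (646, 644),
    "Number of DSP48E1s": (55, 55),
    "Maximum padding delay": (38.451, 40.089),
    "Maximum Combinational Path delay": (23.799, 23.825),
}

def relative_change(improved, previous):
    """Percentage change of the improved value relative to the previous one."""
    return 100.0 * (improved - previous) / previous

changes = {name: relative_change(*pair) for name, pair in metrics.items()}
for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

A negative percentage means the improved architecture uses less of that resource or has a shorter delay; a positive one means it uses more.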

### VII. FUTURE SCOPE AND CONCLUSION

An improved combined architecture has been devised which performs any one of the 4 FDCT algorithms using only two control signals. The previous combined architecture has also been carried through to post-route simulation, which was not done in the previous paper []. Moreover, Bus Selector and Bus Creator blocks have been used instead of the MUX to make the architecture more generalised. In [13] the selection procedure using 4 control signals was further utilized for 4 different linear transformations [12]. Likewise, the immediate future work is to apply the selection procedure using 2 (or more, if required) control signals, together with the other design improvements presented here, to other linear transformations and other operations related to image processing applications. Designing an instruction set and a complete image transform processor remains the ultimate goal.

### REFERENCES

[1] Ken Cabeen and Peter Gent, "Image Compression and the Discrete Cosine Transform", Math 45, College of the Redwoods.

[2] W. Chen, C. H. Smith, and S. C. Fralick, "A fast computational algorithm for the discrete cosine transform," IEEE Trans. Commun., vol. COM-25, pp. 1004-1009, Sep. 1977.

[3] Y. Arai, T. Agui, and M. Nakajima, "A fast DCT-SQ scheme for images," Trans. IEICE, vol. 71, pp. 1095-1097, 1988.

[4] Yeonsik Jeong, Imgeun Lee, Hak Soo Kim, and Kyu Tae Park, "Fast DCT algorithm with fewer multiplication stage", Electronics Letters, vol. 34, no. 8, 16<sup>th</sup> April 1998.

[5] C. Loeffler, A. Ligtenberg, and G. Moschytz, "Practical fast 1-D DCT algorithms with 11 multiplications", Proc. IEEE ICASSP, vol. 2, pp. 988-991, Feb. 1989.

[6] B. G. Lee, "FCT - A Fast Cosine Transform," IEEE International Conference on Acoustics, Speech and Signal Processing, San Diego, pp. 28A.3.1-28A.3.4, March 1984.

[7] H. S. Hou, "A Fast Algorithm For Computing the Discrete Cosine Transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, no. 10, pp. 1455-1461, Oct. 1987.

[8] C. W. Kok, "Fast Algorithm for Computing Discrete Cosine Transform," IEEE Trans. Signal Process., vol. 45, no. 3, pp. 757-760, Mar. 1997.

[9] P. Lee and F.-Y. Huang, "Restructured Recursive DCT and DST Algorithms," IEEE Trans. Signal Process., vol. 42, no. 7, pp. 1600-1609, Jul. 1994.

[10] Z. Cvetkovic and M. V. Popovic, "New Fast Recursive Algorithms for the Computation of Discrete Cosine and Sine Transforms," IEEE Trans. Signal Process., vol. 40, no. 8, pp. 2083-2086, Aug. 1992.

[11] M. Vetterli and H. Nussbaumer, "Simple FFT and DCT algorithms with reduced number of operations," Signal Process., vol. 6, pp. 267-278, Aug. 1984.

[12] Swapan Kumar Samaddar, Atri Sanyal, and Amitabha Sinha, "A Generalized Architecture for Linear Transform", Proc. IEEE International Conference CNC 2010, Oct 04-05, 2010, Calicut, Kerala, India.

[13] Atri Sanyal and Swapan K Samaddar, "A Combined Architecture for FDCT Algorithms", Proc. IEEE 3<sup>rd</sup> International Conference on ICCCT 2012, Nov 23-25, 2012, MNNIT Allahabad, India. IEEE Computer Society, pp. 33-37, ISBN: 978-0-7695-4872-2/12.



Atri Sanyal is an Assistant Professor in the Department of Computer Application of NSHM College of Management and Technology, Kolkata (affiliated to MAKAUT, WB). He is currently pursuing his Ph.D. at MAKAUT, WB under Prof. Amitabha Sinha in the field of reconfigurable computing architecture for image processing applications. He holds an M.Sc. degree in Computer Science from Banaras Hindu University and an ME (CSE) degree from West Bengal University of Technology (now renamed MAKAUT, WB). He has written 12 conference and journal papers and co-authored two books published by LAP Lambert Academic Publishing, Germany. His research interests are image processing, reconfigurable computing, computer architecture, data mining etc. He has guided a number of M.Sc. (CS) and BCA students in their projects. He has more than 12 years of teaching and academic administration experience.



Saloni Kumari is currently working at Deloitte Technologies. She passed her BCA from NSHM College of Management and Technology, Kolkata (affiliated to MAKAUT, WB) in 2018. She carried out her final year project under the supervision of Atri Sanyal in the domain of reconfigurable architecture for image processing applications. Her research interests include reconfigurable computer architecture and image and signal processing.



Dr. Amitabha Sinha is the Director of Birbhum Institute of Engineering and Technology, an AICTE approved Govt. aided college under Maulana Abul Kalam Azad University of Technology, West Bengal (MAKAUT, WB). With a graduation in Electronics & Tele-Communication Engineering from Bengal Engineering College (now IIEST), Shibpore and a post-graduation in Electronics from the University of Kent at Canterbury (U.K.), Prof. Sinha holds a Ph.D. degree in Computer Sc. & Engg. from the Indian Institute of Technology (IIT), Delhi, which he obtained in 1984. He is a Fellow of the Institute of Engineers (India). Prof. Amitabha Sinha has worked in industry, premier academic institutes, R&D centers and IT/Telecom organizations in India and abroad for more than thirty-two (32) years, and his areas of research include Embedded System Design, VLSI Design, Digital Signal Processing, Reconfigurable Architecture using FPGAs, Software Defined Radio, Processor Architecture and System-on-Chip Design. He has published more than 85 research papers in international journals and conferences, of which more than 30 are in journals. He has co-authored four books published by LAP Lambert Academic Publishing, Germany. Prof. Sinha has chaired a number of conferences, including IEEE conferences, and delivered invited talks in India, U.S.A., Singapore, China, Germany, Russia, Australia and Hong Kong. He has guided more than 100 M.Tech. students and a number of Ph.D. students.