Optimal Sparseness in Binary Adders ARITH 22 Lyon, France 2015

Outline • Parallel Adders – Structural features – Recurrence algorithms • Weinberger • Ling – Minimum depth structures • Kogge-Stone • Ladner-Fischer • Sparse Adders – Sparse adders in literature • Energy Optimal Sparseness – Limits on sparseness – Effect of increased sparseness on adder energy • Implementation results • Conclusion

Parallel Adder Structure

Structural Features of Parallel Adders • Logic Depth (LD): maximum number of stages from input to output. • Prefix (P): number of signals (maximum fan-in) processed at each node of a stage. – Prefix 2 means two signals are processed in a node. – Logic depth changes with the prefix. • Minimum possible number of stages = log_R N (N-bit adder, prefix R). – For N = 64: LD_min = 6 for prefix 2, LD_min = 3 for prefix 4. • Fan-out (F): the maximum number of logical branches driven by a node in the prefix tree. • Wiring Complexity (WC): the maximum number of wire tracks passing along a bit-pitch of the technology in any stage of the prefix tree.
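The minimum-depth figures quoted above can be checked with a few lines of Python. This is only a sketch: the function name `min_logic_depth` is made up for illustration, and it rounds up when N is not an exact power of the prefix, which the slide does not discuss.

```python
def min_logic_depth(n_bits: int, prefix: int) -> int:
    """Smallest number of prefix stages d such that prefix**d >= n_bits.

    Equivalent to ceil(log_R N), computed with integers to avoid
    floating-point rounding. The name is illustrative, not from the slides.
    """
    if n_bits < 1 or prefix < 2:
        raise ValueError("need n_bits >= 1 and prefix >= 2")
    depth, reach = 0, 1
    while reach < n_bits:
        reach *= prefix   # one more stage multiplies the reachable span by R
        depth += 1
    return depth


for r in (2, 4):
    print(f"N=64, prefix {r}: LD_min = {min_logic_depth(64, r)}")
```

For N = 64 this gives 6 stages at prefix 2 and 3 stages at prefix 4, matching the slide.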

Recurrence Algorithms Weinberger Ling

Minimum Depth Adders
• Kogge-Stone – Minimum depth (log_2 N) – Minimum fanout (2) – Maximum wiring (N/2)
• Ladner-Fischer – Minimum depth (log_2 N) – Maximum fanout (N/2) – Minimum wiring (1)
P.M. Kogge and H.S. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations," IEEE Trans. Computers, Vol. C-22, No. 8, Aug. 1973, pp. 786-793.
R.E. Ladner, M.J. Fischer, "Parallel Prefix Computation," JACM, 27(4):831-838, Oct. 1980.
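The depth and fanout figures for the two structures can be reproduced from a small model of the prefix graphs. The sketch below rests on assumptions not stated on the slide: the Ladner-Fischer tree is modelled in its minimum-depth, maximum-fanout form (the divide-and-conquer layout also associated with Sklansky), fanout is counted as the number of prefix operators in a stage that read a given signal, and all function names are illustrative.

```python
def kogge_stone(n):
    """Stages of (i, j) pairs: node i is combined with node j < i."""
    stages, d = [], 1
    while d < n:
        stages.append([(i, i - d) for i in range(d, n)])
        d *= 2
    return stages


def ladner_fischer_min_depth(n):
    """Minimum-depth form: upper half of each 2d-block reads the top of the lower half."""
    stages, d = [], 1
    while d < n:
        stages.append([(i, (i // d) * d - 1) for i in range(n) if (i // d) % 2 == 1])
        d *= 2
    return stages


def max_fanout(stages):
    """Largest number of operators in any single stage that read the same signal."""
    worst = 0
    for ops in stages:
        reads = {}
        for i, j in ops:
            reads[i] = reads.get(i, 0) + 1
            reads[j] = reads.get(j, 0) + 1
        worst = max(worst, max(reads.values()))
    return worst


def add_with(stages, a, b, n):
    """Add two n-bit integers using the given prefix graph (carry-in = 0)."""
    g = [(a >> k) & (b >> k) & 1 for k in range(n)]      # generate
    p = [((a >> k) ^ (b >> k)) & 1 for k in range(n)]    # propagate
    G, P = g[:], p[:]
    for ops in stages:
        new_g, new_p = G[:], P[:]
        for i, j in ops:
            new_g[i] = G[i] | (P[i] & G[j])
            new_p[i] = P[i] & P[j]
        G, P = new_g, new_p
    carry_in = [0] + G[:-1]                               # carry into each bit
    total = sum((p[k] ^ carry_in[k]) << k for k in range(n))
    return total | (G[-1] << n)                           # append carry-out


for name, build in (("Kogge-Stone", kogge_stone),
                    ("Ladner-Fischer", ladner_fischer_min_depth)):
    graph = build(64)
    print(name, "depth:", len(graph), "max fanout:", max_fanout(graph))
```

Under this counting, the 64-bit Kogge-Stone graph has depth 6 and fanout 2, and the Ladner-Fischer graph has depth 6 and fanout 32 (N/2), in line with the slide.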

SPARSE ADDER

Sparse Adder Structure • Critical path in prefix adder – Sum block: 1 gate – Carry block: 1 + log_2 N gates • The critical path cannot be reduced below log_2 N gates; however, complexity can be moved to the less critical sum block. • Solution: sparse adder – Generate only every M-th carry signal – Pre-compute sum signals for both possible values of each missing carry – Select the true sum signal based on the computed carry signals • Dilutes the carry block, complicates the sum block • Saves area and power without changing the critical path length
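The three steps of the sparse scheme (generate every M-th carry, pre-compute conditional sums, select) can be sketched behaviourally. This is a functional model only, under stated assumptions: the carry into each M-bit group is obtained with ordinary integer addition as a stand-in for the sparse prefix tree, and the function name `sparse_add` and its defaults are hypothetical.

```python
def sparse_add(a: int, b: int, n: int = 32, m: int = 4) -> int:
    """Behavioural model of an n-bit sparse-m adder (carry-in = 0)."""
    if n % m:
        raise ValueError("n must be a multiple of m")
    group_mask = (1 << m) - 1
    result = 0
    for lo in range(0, n, m):
        # Carry block: only the carry into each m-bit group is produced.
        # Integer addition stands in for the sparse prefix tree here.
        below = (1 << lo) - 1
        carry_in = ((a & below) + (b & below)) >> lo      # 0 or 1

        # Sum block: both conditional sums are computed for the group.
        x = (a >> lo) & group_mask
        y = (b >> lo) & group_mask
        sum_if_0 = (x + y) & group_mask
        sum_if_1 = (x + y + 1) & group_mask

        # Select the true sum once the group carry is known.
        result |= (sum_if_1 if carry_in else sum_if_0) << lo

    full = (1 << n) - 1
    carry_out = ((a & full) + (b & full)) >> n
    return result | (carry_out << n)


print(hex(sparse_add(0x0000FFFF, 0x00000001)))
```

In hardware the two conditional sums are available before the group carry arrives, so the final multiplexer is the only sum-path logic after the carry tree; the model above mirrors that ordering but says nothing about timing.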

Prefix Graphs for Sparse Adders

SPARSE ADDERS IN LITERATURE

Conditional Sum (COS) Adder 32-bit prefix 2 COS adder prefix scheme. Sklansky, J.; , "Conditional-Sum Addition Logic," Electronic Computers, IRE Transactions on , vol.EC-9, no.2, pp.226-231, June 1960.

Carry Select (CSL) Adder 64-bit prefix 4 sparse 4 CSL adder prefix scheme. Bedrij, O. J.; , "Carry-Select Adder," Electronic Computers, IRE Transactions on , vol.EC-11, no.0, pp.340-346, June 1962.

Sparse Adder [Mathew, 2003] 32-bit prefix 2 sparse 4 LF prefix scheme Weinberger adder Mathew, S.; Anders, M.; Krishnamurthy, R.K.; Borkar, S.; , "A 4-GHz 130-nm address generation unit with 32-bit sparse-tree adder core," Solid-State Circuits, IEEE Journal of , vol.38, no.5, pp. 689- 695, May 2003.

ENERGY OPTIMAL SPARSENESS

Carry Tree Sparseness • Sparse carry trees reduce energy in parallel adders • Energy improvement is due to the complexity reduction of the carry path by reduced wiring and number of gates. • A certain amount of complexity is moved to the sum path implying a limit on the sparseness of the carry tree.

Carry Tree Sparseness cont. • Making the carry tree sparse does not change the critical path length of the carry block. • However, it increases the critical path length of the sum block. • The critical path length of the carry block for an N-bit Ling adder using prefix 2 computations is log_2 N. • A sparse-M adder uses M-bit parallel adders in the sum block to compute the conditional sum signals. • Hence, the critical path length of the sum block is 2 + log_2 M.

Limit on Sparseness
• Weinberger recurrence – Carry critical path: 1 + log_2 N – Sum critical path: 2 + log_2 M
  2 + log_2 M ≤ 1 + log_2 N ⇒ M ≤ N/2
• Ling recurrence – Carry critical path: log_2 N – Sum critical path: 2 + log_2 M
  2 + log_2 M ≤ log_2 N ⇒ M ≤ N/4
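The two inequalities can be evaluated for concrete adder widths. The sketch below assumes N is a power of two and prefix 2 throughout; the function name `max_sparseness` is illustrative.

```python
def max_sparseness(n_bits: int, recurrence: str) -> int:
    """Largest power-of-two M whose sum path is no longer than the carry path."""
    if n_bits < 4 or n_bits & (n_bits - 1):
        raise ValueError("n_bits must be a power of two, at least 4")
    log2_n = n_bits.bit_length() - 1
    carry_depth = {"weinberger": 1 + log2_n, "ling": log2_n}[recurrence]
    # Sum path is 2 + log2(M); require 2 + log2(M) <= carry_depth.
    return 1 << (carry_depth - 2)


for n in (32, 64, 128, 256):
    print(n, "Weinberger:", max_sparseness(n, "weinberger"),
          "Ling:", max_sparseness(n, "ling"))
```

For a 64-bit adder this yields M ≤ 32 for the Weinberger recurrence and M ≤ 16 for the Ling recurrence, that is N/2 and N/4.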

SUM PATH DESIGN IN A SPARSE ADDER

Sum Path • Weinberger • Ling: c_i = t_{i-1} h_{i-1}
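The Ling relation c_i = t_{i-1} h_{i-1} can be checked exhaustively for small operands. The sketch assumes conventions the slide does not spell out: c_i is the carry into bit i, t_k = a_k OR b_k, g_k = a_k AND b_k, the pseudo-carry follows the recurrence h_k = g_k + t_{k-1} h_{k-1}, and the carry-in is 0. Function names are illustrative.

```python
def ripple_carries(a: int, b: int, n: int) -> list:
    """Reference carries: c[k] is the carry into bit k, c[0] = 0."""
    c = [0]
    for k in range(n):
        g = (a >> k) & (b >> k) & 1
        t = ((a >> k) | (b >> k)) & 1
        c.append(g | (t & c[k]))
    return c


def ling_carries(a: int, b: int, n: int) -> list:
    """Carries recovered from Ling pseudo-carries via c[k+1] = t_k & h_k."""
    c = [0]
    h_prev, t_prev = 0, 0          # no lower bit below bit 0, carry-in = 0
    for k in range(n):
        g = (a >> k) & (b >> k) & 1
        t = ((a >> k) | (b >> k)) & 1
        h = g | (t_prev & h_prev)  # h_k = g_k + t_{k-1} h_{k-1}
        c.append(t & h)            # c_{k+1} = t_k h_k
        h_prev, t_prev = h, t
    return c


n = 5
ok = all(ling_carries(a, b, n) == ripple_carries(a, b, n)
         for a in range(1 << n) for b in range(1 << n))
print("Ling carries match ripple carries:", ok)
```

The pseudo-carry h_k drops the t_k term from the recurrence, which is what shortens the carry path by one gate relative to Weinberger; the missing term is reinserted when the true carry is formed in the sum path.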

RCA vs PPA in Partial Sum Computation • RCA (Ripple Carry Adder): depth = 5 • PPA (Parallel Prefix Adder): depth = 4

RCA vs PPA: Critical path length
Degree of Sparseness (M)   Ripple carry (1 + M)   Parallel prefix (2 + log_2 M)
2                          3                      3
4                          5                      4
8                          9                      5
16                         17                     6
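The table entries follow directly from the two depth formulas, 1 + M for ripple carry and 2 + log_2 M for parallel prefix. A short sketch that regenerates them, assuming M is a power of two (function names are illustrative):

```python
def rca_depth(m: int) -> int:
    """Critical path of an M-bit ripple-carry partial-sum adder: 1 + M."""
    return 1 + m


def ppa_depth(m: int) -> int:
    """Critical path of an M-bit parallel-prefix partial-sum adder: 2 + log2(M)."""
    if m < 1 or m & (m - 1):
        raise ValueError("m must be a power of two")
    return 2 + (m.bit_length() - 1)


rows = [(m, rca_depth(m), ppa_depth(m)) for m in (2, 4, 8, 16)]
for m, rca, ppa in rows:
    print(f"M={m:2d}  ripple={rca:2d}  prefix={ppa}")
```

The two structures tie at M = 2; from M = 4 upward the parallel-prefix form is shorter, and the gap grows linearly against logarithmically.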

8-bit Partial Sum Computation using PPA Structure

Theoretical results EFFECT OF INCREASED SPARSENESS

Total gate count -Gate counts are equal for KS and LF adders.

Total Gate Complexity - The complexity of a gate is defined as its number of inputs (1 for an inverter, 2 for a two-input NAND, etc.). - For KS, sparse 4 gives the least complexity for 32- to 256-bit adders. - For LF, sparse 2 gives the least complexity for 32- and 64-bit adders, and sparse 4 for 128- and 256-bit adders.

Normalized Gate Complexity - Complexities are normalized to their full carry tree (sparseness 1) complexities. - For KS, sparseness achieves a 30% reduction in complexity. - For LF, sparseness achieves a 20% reduction in complexity.

Total Wire Complexity - Wire complexity is defined as the total wire length (e.g. a wire from bit 32 to bit 64 has a length of 32 units). - For KS, wire complexity decreases as sparseness increases. - For LF, the optimum sparseness for wire complexity is 2 for 32- and 64-bit adders, and 4 for 128- and 256-bit adders.
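The wire-length measure can be illustrated on full (sparseness 1) carry trees. This is a rough sketch under assumptions of its own: only the lateral wires between prefix nodes are counted, each of length equal to the bit distance it spans, the Ladner-Fischer tree is modelled in its minimum-depth form, and sum-path wiring is ignored. It is not meant to reproduce the slide's plotted values, and the function names are illustrative.

```python
def kogge_stone(n):
    """Stages of (i, j) pairs: node i reads node j < i."""
    stages, d = [], 1
    while d < n:
        stages.append([(i, i - d) for i in range(d, n)])
        d *= 2
    return stages


def ladner_fischer_min_depth(n):
    """Minimum-depth form: upper half of each 2d-block reads the top of the lower half."""
    stages, d = [], 1
    while d < n:
        stages.append([(i, (i // d) * d - 1) for i in range(n) if (i // d) % 2 == 1])
        d *= 2
    return stages


def total_wire_length(stages):
    """Sum of bit distances spanned by every lateral wire in the graph."""
    return sum(i - j for ops in stages for i, j in ops)


for n in (8, 64):
    print(n, "KS:", total_wire_length(kogge_stone(n)),
          "LF:", total_wire_length(ladner_fischer_min_depth(n)))
```

Even in this simplified count the Kogge-Stone tree carries considerably more wire than the Ladner-Fischer tree, which is consistent with wiring reductions mattering more for KS.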

Normalized Wire Complexity - Complexities are normalized to their full carry tree (sparseness 1) complexities. - For KS, sparseness achieves an 80% reduction in complexity. - For LF, sparseness achieves a 20% reduction in complexity.

Theoretical Results • For 64-bit LF adders, sparse 2 yields both the minimum gate complexity and the minimum total wire length. – Note that the reduction in gate complexity in the LF adder comes from removing buffers, whereas in the KS adder it comes from removing the more complex AND-OR gates. – Hence, the improvement in gate complexity is smaller for the LF adder than for the KS adder. – The increase in gate complexity beyond sparse 8 in the KS adder will offset the energy savings achieved through reduced wiring complexity. • The energy-optimum degree of sparseness is determined by the ratio of gate capacitance to wire capacitance. – In the low-performance design region, gate sizes are small, so wire capacitances dominate and KS sparse 8 is expected to outperform KS sparse 4 in energy at the same performance. – For the LF adder, on the other hand, it is not worth going beyond sparse 4 because complexity increases in both measures. • For 128- and 256-bit adders, sparse 4 yields the most savings for both KS and LF structures.

RESULTS

Technology
• 45nm TSMC • VDD = 1.1V • Temp = 25°C • Typical process corner • Multi-Vth standard cell library (low, standard, high)
STDCELL Library
Gate     Available Strength
AOI21    1x, 2x, 4x, 6x, 8x
AOI22    1x, 2x, 4x, 6x, 8x
INV      1x, 2x, 4x, 6x, 8x, 12x, 16x, 32x
NAND2    1x, 2x, 4x, 6x, 8x
NOR2     1x, 2x, 4x, 6x, 8x
OAI21    1x, 2x, 4x, 6x, 8x
OAI22    1x, 2x, 4x, 6x, 8x

Design Environment • Designed adders – KS adder w/ full, sparse 2, sparse 4, and sparse 8 carry trees – LF adder w/ full, sparse 2, sparse 4, and sparse 8 carry trees • Circuit sizing using Design Compiler • Placement and routing using Encounter • Post layout simulations using Primetime • Input driver: 16x inverter • Output load: 16x inverter • 25% activity at inputs • Adders designed for minimum energy using delay targets between 300ps and 400ps.

Energy-Delay

Leakage Power

Wire Energy

Conclusion • Energy savings of 50% and 22%, and leakage power savings of 70% and 40% are achieved with increased sparseness degree of carry trees for KS and LF adders, respectively. • For 64-bit KS Ling adder, energy optimal sparseness is 4. For 64-bit LF Ling adder, energy optimal sparseness is 2. • Both optimal KS and LF adders reach the same minimum delay target of 300ps. • Experimental results suggest that LF S2 is 7% more energy efficient than KS S4 at minimum delay point. • Theoretical results suggest that sparse 4 carry tree should be used for both KS and LF adders of sizes 128-bit and above.

Questions? THANK YOU …
