Optimal Sparseness in Binary Adders, ARITH 22, Lyon, France, 2015

SLIDE 1

Optimal Sparseness in Binary Adders

ARITH 22 Lyon, France 2015

SLIDE 2

Outline

  • Parallel Adders
    – Structural features
    – Recurrence algorithms: Weinberger, Ling
    – Minimum depth structures: Kogge-Stone, Ladner-Fischer
  • Sparse Adders
    – Sparse adders in literature
  • Energy Optimal Sparseness
    – Limits on sparseness
    – Effect of increased sparseness on adder energy
  • Implementation results
  • Conclusion
SLIDE 3

Parallel Adder Structure

SLIDE 4

Structural Features of Parallel Adders

  • Logic Depth (LD): maximum number of stages from input to output.
  • Prefix (P): number of signals (or maximum fan-in) processed at each stage.
    – Prefix 2 means two signals are processed in a node.
    – Logic depth changes depending on the prefix.
  • Minimum possible number of stages = ⌈log_R N⌉ (N-bit adder, prefix R).
    – For N = 64: LDmin = 6 for prefix 2, LDmin = 3 for prefix 4.
  • Fan-out (F): maximum logical branching of any node in the prefix tree.
  • Wiring Complexity (WC): maximum number of wire tracks passing along a bit-pitch of the technology in any stage of the prefix tree.
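The LDmin figures above are the ceiling of log_R N. A small Python sketch (the function name `min_logic_depth` is a hypothetical choice) computes it with integer arithmetic, so floating-point rounding in a logarithm cannot produce an off-by-one depth:

```python
def min_logic_depth(n_bits: int, prefix: int) -> int:
    """Smallest number of prefix stages that covers n_bits when every node
    combines `prefix` signals, i.e. ceil(log_prefix(n_bits))."""
    if n_bits < 1 or prefix < 2:
        raise ValueError("need n_bits >= 1 and prefix >= 2")
    depth, span = 0, 1
    # Each stage multiplies the number of bits one node can cover by `prefix`.
    while span < n_bits:
        span *= prefix
        depth += 1
    return depth


if __name__ == "__main__":
    for r in (2, 4):
        print(f"N=64, prefix {r}: LDmin = {min_logic_depth(64, r)}")
```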

SLIDE 5

Recurrence Algorithms

Weinberger Ling

SLIDE 6

Minimum Depth Adders

Kogge-Stone Ladner-Fischer

P.M. Kogge and H.S. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations," IEEE Trans. Computers, vol. C-22, no. 8, pp. 786-793, Aug. 1973.

R.E. Ladner and M.J. Fischer, "Parallel prefix computation," JACM, vol. 27, no. 4, pp. 831-838, Oct. 1980.

  • Kogge-Stone: minimum depth (log2N), minimum fan-out (2), maximum wiring (N/2)
  • Ladner-Fischer: minimum depth (log2N), maximum fan-out (N/2), minimum wiring (1)
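To make the two minimum-depth structures concrete, the following Python sketch builds both prefix graphs for an N-bit adder, evaluates them on generate/propagate signals, checks the result against integer addition, and reports depth and fan-out. It assumes the Ladner-Fischer variant on the slide is the minimum-depth, maximum-fan-out member of the family (the same shape as Sklansky's tree); all function names are hypothetical. Fan-out here counts only cross-bit (diagonal) connections, which is an assumption: the slide's counting convention may differ, e.g. it lists 2 for Kogge-Stone.

```python
import random


def kogge_stone(n):
    """Kogge-Stone combine edges per stage: at stage l, bit i >= 2**l
    combines with bit i - 2**l."""
    stages, d = [], 1
    while d < n:
        stages.append([(i - d, i) for i in range(d, n)])
        d *= 2
    return stages


def ladner_fischer(n):
    """Minimum-depth Ladner-Fischer edges per stage: at stage l, every bit in
    the upper half of a 2**(l+1)-bit block combines with the last bit of the
    lower half, so one node can drive up to n/2 others."""
    stages, level = [], 0
    while (1 << level) < n:
        stages.append([(((i >> level) << level) - 1, i)
                       for i in range(n) if (i >> level) & 1])
        level += 1
    return stages


def group_generates(stages, g, p):
    """Evaluate a prefix graph on (generate, propagate) pairs. Afterwards G[i]
    is the group generate of bits 0..i, i.e. the carry out of bit i."""
    G, P = list(g), list(p)
    for edges in stages:
        new_g, new_p = G[:], P[:]
        for src, dst in edges:
            # Prefix operator (G, P)[dst] o (G, P)[src], using last stage's values.
            new_g[dst] = G[dst] | (P[dst] & G[src])
            new_p[dst] = P[dst] & P[src]
        G, P = new_g, new_p
    return G


def prefix_add(a, b, n, topology):
    """Add two n-bit integers through the chosen prefix graph."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    G = group_generates(topology(n), g, p)
    total = 0
    for i in range(n):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total | (G[n - 1] << n)  # carry-out becomes bit n


def max_cross_fanout(stages):
    """Largest number of other bits a single node drives within one stage."""
    worst = 0
    for edges in stages:
        drives = {}
        for src, _ in edges:
            drives[src] = drives.get(src, 0) + 1
        worst = max(worst, max(drives.values()))
    return worst


if __name__ == "__main__":
    n = 64
    for name, topo in (("Kogge-Stone", kogge_stone),
                       ("Ladner-Fischer", ladner_fischer)):
        stages = topo(n)
        print(f"{name}: depth {len(stages)}, "
              f"max cross-bit fan-out {max_cross_fanout(stages)}")
        for _ in range(1000):
            a, b = random.getrandbits(n), random.getrandbits(n)
            assert prefix_add(a, b, n, topo) == a + b
```

Both graphs reach the minimum depth of log2N stages; they differ only in how the combine edges are spread, trading fan-out against wiring.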
SLIDE 7

SPARSE ADDER

SLIDE 8

Sparse Adder Structure

  • Critical path in a prefix adder
    – Sum block: 1 gate
    – Carry block: 1+log2N gates
  • The critical path cannot be shortened beyond log2N; however, complexity can be moved to the less critical sum block.
  • Solution: sparse adder
    – Generate every Mth carry signal
    – Pre-compute sum signals for the missing carry signals
    – Select the true sum signal based on the computed carry signals
  • Dilutes the carry block, complicates the sum block
  • Saves area and power without changing the critical path length
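The three-step recipe above (generate every Mth carry, pre-compute sums, select) can be modelled behaviorally. In this Python sketch, `sparse_add` and `sparse_carry` are hypothetical names, and Python integer addition stands in both for the M-bit conditional-sum adders and for the sparse carry tree. It shows the data flow and checks correctness only; it says nothing about gate-level structure or timing.

```python
import random


def sparse_carry(a: int, b: int, pos: int) -> int:
    """Carry into bit `pos`; stands in for one output of the sparse carry tree."""
    low = (1 << pos) - 1
    return ((a & low) + (b & low)) >> pos


def sparse_add(a: int, b: int, n: int, m: int) -> int:
    """Behavioral sparse-m adder for n-bit operands (m must divide n)."""
    if n % m:
        raise ValueError("m must divide n")
    mask = (1 << m) - 1
    result = 0
    for k in range(n // m):
        shift = k * m
        ga, gb = (a >> shift) & mask, (b >> shift) & mask
        # Sum block: pre-compute both conditional sums for this m-bit group.
        sum_if_c0 = (ga + gb) & mask
        sum_if_c1 = (ga + gb + 1) & mask
        # Carry block: only every m-th carry (into bits 0, m, 2m, ...) is produced.
        c = sparse_carry(a, b, shift)
        # Select the true group sum with the sparse carry.
        result |= (sum_if_c1 if c else sum_if_c0) << shift
    return result | (sparse_carry(a, b, n) << n)  # carry-out as bit n


if __name__ == "__main__":
    n = 64
    for m in (2, 4, 8):
        for _ in range(1000):
            a, b = random.getrandbits(n), random.getrandbits(n)
            assert sparse_add(a, b, n, m) == a + b
    print("sparse adder matches a + b for M = 2, 4, 8")
```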

SLIDE 9

Prefix Graphs for Sparse Adders

SLIDE 10

SPARSE ADDERS IN LITERATURE

SLIDE 11

Conditional Sum (COS) Adder

J. Sklansky, "Conditional-Sum Addition Logic," IRE Trans. Electronic Computers, vol. EC-9, no. 2, pp. 226-231, June 1960.

32-bit prefix 2 COS adder prefix scheme.

SLIDE 12

Carry Select (CSL) Adder

O. J. Bedrij, "Carry-Select Adder," IRE Trans. Electronic Computers, vol. EC-11, no. 0, pp. 340-346, June 1962.

64-bit prefix 4 sparse 4 CSL adder prefix scheme.

SLIDE 13

Sparse Adder [Mathew, 2003]

S. Mathew, M. Anders, R.K. Krishnamurthy, and S. Borkar, "A 4-GHz 130-nm address generation unit with 32-bit sparse-tree adder core," IEEE J. Solid-State Circuits, vol. 38, no. 5, pp. 689-695, May 2003.

32-bit prefix 2 sparse 4 LF prefix scheme, Weinberger adder.

SLIDE 14

ENERGY OPTIMAL SPARSENESS

SLIDE 15

Carry Tree Sparseness

  • Sparse carry trees reduce energy in parallel adders.
  • The energy improvement comes from the reduced complexity of the carry path: less wiring and fewer gates.
  • A certain amount of complexity is moved to the sum path, which implies a limit on the sparseness of the carry tree.

SLIDE 16

Carry Tree Sparseness cont.

  • Making the carry tree sparse does not change the critical path length of the carry block.
  • However, it increases the critical path length of the sum block.
  • The critical path length of the carry block for an N-bit Ling adder using prefix 2 computations is log2N.
  • A sparse-M adder uses M-bit parallel adders in the sum block to compute the conditional sum signals.
  • Hence, the critical path length of the sum block is 2+log2M.

SLIDE 17

Limit on Sparseness

  • Weinberger recurrence
    – Carry critical path: 1+log2N
    – Sum critical path: 2+log2M
    – 2+log2M ≤ 1+log2N ⇒ M ≤ N/2
  • Ling recurrence
    – Carry critical path: log2N
    – Sum critical path: 2+log2M
    – 2+log2M ≤ log2N ⇒ M ≤ N/4
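The two inequalities can be checked mechanically. This Python sketch encodes the path lengths exactly as stated on the slides (carry path 1+log2N for Weinberger and log2N for Ling, sum path 2+log2M) and searches for the largest power-of-two M whose sum path stays within the carry path. N is assumed to be a power of two, and the helper names are hypothetical.

```python
def log2_exact(x: int) -> int:
    """log2 of a power of two, without floating point."""
    if x < 1 or x & (x - 1):
        raise ValueError(f"{x} is not a power of two")
    return x.bit_length() - 1


def carry_path(n_bits: int, recurrence: str) -> int:
    """Carry-block critical path of a prefix-2 tree, as stated on the slide."""
    if recurrence == "weinberger":
        return 1 + log2_exact(n_bits)
    if recurrence == "ling":
        return log2_exact(n_bits)
    raise ValueError("recurrence must be 'weinberger' or 'ling'")


def sum_path(m: int) -> int:
    """Sum-block critical path for a sparse-m adder: 2 + log2(m)."""
    return 2 + log2_exact(m)


def max_sparseness(n_bits: int, recurrence: str) -> int:
    """Largest power-of-two M whose sum path fits within the carry path."""
    m = 1
    while 2 * m <= n_bits and sum_path(2 * m) <= carry_path(n_bits, recurrence):
        m *= 2
    return m


if __name__ == "__main__":
    for n in (32, 64, 128, 256):
        print(n, max_sparseness(n, "weinberger"), max_sparseness(n, "ling"))
```

The results follow N/2 for Weinberger and N/4 for Ling, matching the closed-form limits.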

SLIDE 18

SUM PATH DESIGN IN A SPARSE ADDER

SLIDE 19

Sum Path

Weinberger Ling

$c_i = t_{i-1}\,h_{i-1}$
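An exhaustive comparison on small widths is a quick way to confirm that the Ling form yields the same carries as Weinberger's recurrence. This Python sketch assumes that c_i is the carry into bit i (c_0 = 0), g_i = a_i b_i, t_i = a_i + b_i, and that Ling's pseudo-carry is h_i = g_i + c_i. This h satisfies h_i = g_i + t_{i-1} h_{i-1} and gives c_i = t_{i-1} h_{i-1}, the relation on this slide. Function names are hypothetical; bits are listed LSB first.

```python
from itertools import product


def weinberger_carries(a_bits, b_bits):
    """Carries c[0..n] (c[i] = carry into bit i) from c[i+1] = g[i] + t[i]c[i]."""
    c = [0]
    for a, b in zip(a_bits, b_bits):
        g, t = a & b, a | b
        c.append(g | (t & c[-1]))
    return c


def ling_carries(a_bits, b_bits):
    """Same carries via Ling's pseudo-carry h[i] = g[i] + t[i-1]h[i-1]
    (h[0] = g[0]), then c[i] = t[i-1]h[i-1]."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    t = [a | b for a, b in zip(a_bits, b_bits)]
    h = [g[0]]
    for i in range(1, len(g)):
        h.append(g[i] | (t[i - 1] & h[i - 1]))
    return [0] + [t[i - 1] & h[i - 1] for i in range(1, len(g) + 1)]


if __name__ == "__main__":
    n = 6  # exhaustive over all 2**n x 2**n input pairs
    for a_bits in product((0, 1), repeat=n):
        for b_bits in product((0, 1), repeat=n):
            assert weinberger_carries(a_bits, b_bits) == ling_carries(a_bits, b_bits)
    print(f"Weinberger and Ling carries agree for all {4 ** n} input pairs")
```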

SLIDE 20

RCA vs PPA in Partial Sum Computation

  • RCA (Ripple Carry Adder): depth = 5
  • PPA (Parallel Prefix Adder): depth = 4

SLIDE 21

RCA vs PPA: Critical path length

Degree of sparseness (M)   Ripple carry (1+M)   Parallel prefix (2+log2M)
2                          3                    3
4                          5                    4
8                          9                    5
16                         17                   6
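The table rows follow directly from the two formulas. A few lines of Python (hypothetical helper names, power-of-two M assumed) regenerate them and make the crossover visible: the prefix structure is never slower and is strictly faster from M = 4 upward.

```python
def rca_depth(m: int) -> int:
    """Ripple-carry partial-sum path: 1 + M."""
    return 1 + m


def ppa_depth(m: int) -> int:
    """Parallel-prefix partial-sum path: 2 + log2(M), M a power of two."""
    return 2 + (m.bit_length() - 1)


if __name__ == "__main__":
    print(f"{'M':>3} {'RCA (1+M)':>10} {'PPA (2+log2M)':>14}")
    for m in (2, 4, 8, 16):
        print(f"{m:>3} {rca_depth(m):>10} {ppa_depth(m):>14}")
```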

SLIDE 22

8-bit Partial Sum Computation using PPA Structure

SLIDE 23

EFFECT OF INCREASED SPARSENESS

Theoretical results

SLIDE 24

Total gate count

  • Gate counts are equal for KS and LF adders.
SLIDE 25

Total Gate Complexity

  • The complexity of a gate is defined as its number of inputs (1 for an inverter, 2 for a two-input NAND, etc.).
  • For KS, sparse 4 gives the least complexity for 32- to 256-bit adders.
  • For LF, sparse 2 gives the least complexity for 32- and 64-bit adders, and sparse 4 for 128- and 256-bit adders.
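Under this definition, gate complexity is a weighted sum over the cell inventory. The Python sketch below assigns input counts to the cell types named in the standard-cell library listed on the Technology slide. The function names and the example inventories are hypothetical and are not data from this work. Normalizing against the full (sparseness 1) tree is a single division.

```python
# Number of inputs per cell type, read off the cell names in the library
# (INV: 1, NAND2/NOR2: 2, AOI21/OAI21: 3, AOI22/OAI22: 4).
GATE_INPUTS = {
    "INV": 1, "NAND2": 2, "NOR2": 2,
    "AOI21": 3, "OAI21": 3, "AOI22": 4, "OAI22": 4,
}


def gate_complexity(gate_counts: dict) -> int:
    """Total complexity = sum over cells of (number of inputs x instance count)."""
    return sum(GATE_INPUTS[cell] * count for cell, count in gate_counts.items())


def normalized_complexity(gate_counts: dict, full_tree_counts: dict) -> float:
    """Complexity relative to the full (sparseness 1) carry tree."""
    return gate_complexity(gate_counts) / gate_complexity(full_tree_counts)


if __name__ == "__main__":
    # Hypothetical inventories for illustration only; not data from the slides.
    full_tree = {"INV": 40, "AOI21": 20, "OAI21": 20, "NAND2": 10}
    sparse_tree = {"INV": 30, "AOI21": 12, "OAI21": 12, "NAND2": 10}
    print(gate_complexity(full_tree), gate_complexity(sparse_tree))
    print(f"normalized: {normalized_complexity(sparse_tree, full_tree):.2f}")
```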
SLIDE 26

Normalized Gate Complexity

  • Complexities are normalized to the complexity of the full carry tree (sparseness 1).
  • For KS, sparseness achieves a 30% reduction in gate complexity.
  • For LF, sparseness achieves a 20% reduction in gate complexity.
SLIDE 27

Total Wire Complexity

  • Wire complexity is defined as the total wire length (e.g. a wire from bit 32 to bit 64 has a length of 32 units).
  • For KS, wire complexity decreases as sparseness increases.
  • For LF, the optimum sparseness for wire complexity is 2 for 32- and 64-bit adders, and 4 for 128- and 256-bit adders.
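Total wire length can likewise be computed from a prefix graph. This Python sketch covers only the full (sparseness 1) Kogge-Stone and minimum-depth Ladner-Fischer trees. It assumes that a node driving several receivers in one stage uses a single shared wire whose length is the distance to its farthest receiver, consistent with the bit-32-to-64 example above; this is an assumption about how shared nets are counted. The graph builders and names are hypothetical, and the output is not meant to reproduce the slide's plots.

```python
def ks_edges(n):
    """(stage, driver bit, receiver bit) for a full Kogge-Stone tree."""
    stage, d = 0, 1
    while d < n:
        for i in range(d, n):
            yield stage, i - d, i
        stage, d = stage + 1, d * 2


def lf_edges(n):
    """(stage, driver bit, receiver bit) for a minimum-depth Ladner-Fischer tree."""
    stage = 0
    while (1 << stage) < n:
        for i in range(n):
            if (i >> stage) & 1:
                yield stage, ((i >> stage) << stage) - 1, i
        stage += 1


def total_wire_length(edges):
    """Sum over (stage, driver) nets of the distance to the farthest receiver;
    e.g. a net from bit 32 to bit 64 counts as 32 units."""
    span = {}
    for stage, src, dst in edges:
        key = (stage, src)
        span[key] = max(span.get(key, 0), abs(dst - src))
    return sum(span.values())


if __name__ == "__main__":
    for n in (32, 64, 128, 256):
        print(n, total_wire_length(ks_edges(n)), total_wire_length(lf_edges(n)))
```

Under this counting, a 64-bit Kogge-Stone tree has 2667 units of wire and a 64-bit Ladner-Fischer tree has 192, which illustrates why sparseness has much more wiring to remove in KS.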
SLIDE 28

Normalized Wire Complexity

  • Complexities are normalized to the complexity of the full carry tree (sparseness 1).
  • For KS, sparseness achieves an 80% reduction in wire complexity.
  • For LF, sparseness achieves a 20% reduction in wire complexity.
SLIDE 29

Theoretical Results

  • For 64-bit LF adders, sparse 2 yields both the minimum gate complexity and the minimum total wire length.
    – Note that the reduction in gate complexity in the LF adder comes from removing buffers, as opposed to the more complex AND-OR gates removed in the KS adder.
    – Hence, the improvement in gate complexity is smaller for the LF adder than for the KS adder.
    – In the KS adder, the increase in gate complexity beyond sparse 8 will offset the energy savings achieved through reduced wiring complexity.
  • The energy-optimal degree of sparseness is determined by the ratio of gate capacitance to wire capacitance.
    – In the low-performance design region, gate sizes are small, so wire capacitance dominates and KS sparse 8 is expected to outperform KS sparse 4 in energy at the same performance.
    – For the LF adder, on the other hand, it is not worth going beyond sparse 4, because both complexity measures increase.
  • For 128- and 256-bit adders, sparse 4 yields the most savings for both KS and LF structures.

SLIDE 30

RESULTS

SLIDE 31

Technology

Technology

  • 45nm TSMC
  • VDD = 1.1V
  • Temp = 25°C
  • Typical process corner
  • Multi-Vth standard cell library (low, standard, high)

STDCELL Library

Gate    Available strengths
AOI21   1x, 2x, 4x, 6x, 8x
AOI22   1x, 2x, 4x, 6x, 8x
INV     1x, 2x, 4x, 6x, 8x, 12x, 16x, 32x
NAND2   1x, 2x, 4x, 6x, 8x
NOR2    1x, 2x, 4x, 6x, 8x
OAI21   1x, 2x, 4x, 6x, 8x
OAI22   1x, 2x, 4x, 6x, 8x

SLIDE 32

Design Environment

  • Designed adders
    – KS adder with full, sparse 2, sparse 4, and sparse 8 carry trees
    – LF adder with full, sparse 2, sparse 4, and sparse 8 carry trees
  • Circuit sizing using Design Compiler
  • Placement and routing using Encounter
  • Post-layout simulations using PrimeTime
  • Input driver: 16x inverter
  • Output load: 16x inverter
  • 25% activity at inputs
  • Adders designed for minimum energy at delay targets between 300 ps and 400 ps.

SLIDE 33

Energy-Delay

SLIDE 34

Leakage Power

SLIDE 35

Wire Energy

SLIDE 36

Conclusion

  • Energy savings of 50% and 22%, and leakage power savings of 70% and 40%, are achieved by increasing the sparseness degree of the carry trees for KS and LF adders, respectively.
  • For the 64-bit KS Ling adder, the energy-optimal sparseness is 4; for the 64-bit LF Ling adder, it is 2.
  • Both optimal KS and LF adders reach the same minimum delay target of 300 ps.
  • Experimental results suggest that LF sparse 2 (S2) is 7% more energy efficient than KS sparse 4 (S4) at the minimum-delay point.
  • Theoretical results suggest that a sparse 4 carry tree should be used for both KS and LF adders of 128 bits and above.

SLIDE 37

THANK YOU …

Questions?