Optimal Sparseness in Binary Adders ARITH 22 Lyon, France 2015
Outline
- Parallel Adders
  – Structural features
  – Recurrence algorithms
    - Weinberger
    - Ling
  – Minimum depth structures
    - Kogge-Stone
    - Ladner-Fischer
- Sparse Adders
  – Sparse adders in literature
- Energy Optimal Sparseness
  – Limits on sparseness
  – Effect of increased sparseness on adder energy
- Implementation results
- Conclusion
Parallel Adder Structure
Structural Features of Parallel Adders
- Logic Depth (LD): maximum number of stages from input to output.
- Prefix (P): number of signals (or maximum fan-in) processed at each stage.
  – Prefix 2 means two signals are processed in a node.
  – Logic depth changes depending on the prefix.
    - Minimum possible number of stages = $\log_R N$ (N-bit adder, prefix R).
  – For N = 64: $LD_{min} = 6$ for prefix 2, $LD_{min} = 3$ for prefix 4.
- Fan-out (F): the maximum number of logical branches in the prefix tree.
- Wiring Complexity (WC): the maximum number of wire tracks passing along a bit-pitch of the technology in any stage of the prefix tree.
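The minimum-depth rule can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `min_logic_depth` is an assumption, and it rounds up when N is not an exact power of the prefix.

```python
def min_logic_depth(n_bits: int, prefix: int) -> int:
    """Smallest number of stages d such that prefix**d >= n_bits."""
    if n_bits < 1 or prefix < 2:
        raise ValueError("need n_bits >= 1 and prefix >= 2")
    depth, reach = 0, 1
    while reach < n_bits:
        reach *= prefix  # each stage multiplies the covered span by the prefix
        depth += 1
    return depth


for r in (2, 4):
    print(f"N=64, prefix {r}: LDmin = {min_logic_depth(64, r)}")
# N=64, prefix 2: LDmin = 6
# N=64, prefix 4: LDmin = 3
```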
Recurrence Algorithms
Weinberger Ling
Minimum Depth Adders
Kogge-Stone
- Minimum depth ($\log_2 N$)
- Minimum fanout (2)
- Maximum wiring (N/2)
Ladner-Fischer
- Minimum depth ($\log_2 N$)
- Maximum fanout (N/2)
- Minimum wiring (1)
P.M. Kogge and H.S. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations", IEEE Trans. Computers, Vol. C-22, No. 8, Aug. 1973, pp. 786-793.
R.E. Ladner and M.J. Fischer, "Parallel Prefix Computation", JACM, 27(4):831-838, Oct. 1980.
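The depth and fanout figures above can be reproduced on small prefix-2 graph models. This is a rough sketch under stated assumptions: the function names are hypothetical, the Ladner-Fischer tree is modeled with the divide-and-conquer (Sklansky-style) layout that matches the fanout of N/2 quoted here, fanout is counted as the number of prefix operators reading one signal within a stage, and wiring tracks are not modeled.

```python
def kogge_stone_stage(n, s):
    """Prefix operators (hi, lo) of Kogge-Stone stage s: bit i combines with bit i - 2**s."""
    d = 1 << s
    return [(i, i - d) for i in range(d, n)]


def ladner_fischer_stage(n, s):
    """Operators of the high-fanout tree: the upper half of each block reads the top of its lower half."""
    d = 1 << s
    return [(i, (i // d) * d - 1) for i in range(n) if (i >> s) & 1]


def depth(n):
    return n.bit_length() - 1  # log2(n), assuming n is a power of two


def max_fanout(n, stage_fn):
    """Largest number of operators reading one signal within a single stage."""
    worst = 0
    for s in range(depth(n)):
        reads = [0] * n
        for hi, lo in stage_fn(n, s):
            reads[hi] += 1
            reads[lo] += 1
        worst = max(worst, max(reads))
    return worst


def carries(a, b, n, stage_fn):
    """Carry out of every bit position, obtained by evaluating the prefix graph."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # generate
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]  # transmit
    for s in range(depth(n)):
        new_g, new_p = g[:], p[:]
        for hi, lo in stage_fn(n, s):
            new_g[hi] = g[hi] | (p[hi] & g[lo])
            new_p[hi] = p[hi] & p[lo]
        g, p = new_g, new_p
    return g


n = 64
print(depth(n), max_fanout(n, kogge_stone_stage), max_fanout(n, ladner_fischer_stage))
# 6 2 32
```

Both graphs reach the same depth; they differ only in how the fanout is distributed, which is the trade-off the two lists describe.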
SPARSE ADDER
Sparse Adder Structure
- Critical path in prefix adder
  – Sum block: 1 gate
  – Carry block: $1+\log_2 N$ gates
- The critical path length cannot be reduced below $\log_2 N$; however, complexity can be moved to the less critical sum block.
- Solution: Sparse adder
  – Generate every Mth carry signal
  – Pre-compute sum signals for the missing carry signals
  – Select the true sum signal based on the computed carry signals
- Dilutes the carry block, complicates the sum block
- Saves area and power without changing the critical path length
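The three steps of the sparse adder can be illustrated with a behavioral model. This is a sketch under assumptions rather than the presented circuit: `sparse_add` is a hypothetical name, the sparse carry tree is replaced by a direct arithmetic computation of every Mth carry, and operands are assumed to fit in n bits.

```python
def sparse_add(a: int, b: int, n: int = 32, m: int = 4) -> int:
    """n-bit sum (mod 2**n) built from M-bit conditional sums and every Mth carry."""
    if n % m:
        raise ValueError("n must be a multiple of m")
    group_mask = (1 << m) - 1
    total = 0
    for k in range(0, n, m):
        # Stand-in for the sparse carry tree: the carry into bit k.
        low = (1 << k) - 1
        carry_in = ((a & low) + (b & low)) >> k
        # Pre-computed conditional sums for the M-bit group starting at bit k.
        xa, xb = (a >> k) & group_mask, (b >> k) & group_mask
        sum0 = (xa + xb) & group_mask      # assumes carry-in = 0
        sum1 = (xa + xb + 1) & group_mask  # assumes carry-in = 1
        # Select the true sum once the group carry is known.
        total |= (sum1 if carry_in else sum0) << k
    return total


print(hex(sparse_add(0x0000FFFF, 0x00000001)))  # 0x10000
```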
Prefix Graphs for Sparse Adders
SPARSE ADDERS IN LITERATURE
Conditional Sum (COS) Adder
J. Sklansky, "Conditional-Sum Addition Logic," IRE Transactions on Electronic Computers, vol. EC-9, no. 2, pp. 226-231, June 1960.
Figure: 32-bit prefix 2 COS adder prefix scheme.
Carry Select (CSL) Adder
O. J. Bedrij, "Carry-Select Adder," IRE Transactions on Electronic Computers, vol. EC-11, no. 3, pp. 340-346, June 1962.
Figure: 64-bit prefix 4 sparse 4 CSL adder prefix scheme.
Sparse Adder [Mathew, 2003]
S. Mathew, M. Anders, R.K. Krishnamurthy, and S. Borkar, "A 4-GHz 130-nm address generation unit with 32-bit sparse-tree adder core," IEEE Journal of Solid-State Circuits, vol. 38, no. 5, pp. 689-695, May 2003.
Figure: 32-bit prefix 2 sparse 4 LF prefix scheme, Weinberger adder.
ENERGY OPTIMAL SPARSENESS
Carry Tree Sparseness
- Sparse carry trees reduce energy in parallel adders.
- The energy improvement comes from the reduced complexity of the carry path: less wiring and fewer gates.
- A certain amount of complexity is moved to the sum path, which implies a limit on the sparseness of the carry tree.
Carry Tree Sparseness cont.
- Making the carry tree sparse does not change the critical path length of the carry block.
- However, it increases the critical path length of the sum block.
- The critical path length of the carry block for an N-bit Ling adder using prefix 2 computations is $\log_2 N$.
- A sparse-M adder uses M-bit parallel adders in the sum block to compute the conditional sum signals.
- Hence, the critical path length of the sum block is $2+\log_2 M$.
Limit on Sparseness
- Weinberger recurrence
  – Carry critical path: $1+\log_2 N$
  – Sum critical path: $2+\log_2 M$
  – $2+\log_2 M \le 1+\log_2 N \Rightarrow M \le N/2$
- Ling recurrence
  – Carry critical path: $\log_2 N$
  – Sum critical path: $2+\log_2 M$
  – $2+\log_2 M \le \log_2 N \Rightarrow M \le N/4$
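The two bounds follow from requiring the sum path to be no longer than the carry path. A small sketch that searches for the largest power-of-two M satisfying that inequality is given below; the function name and the string labels for the recurrences are assumptions, and N is assumed to be a power of two.

```python
def max_sparseness(n_bits: int, recurrence: str) -> int:
    """Largest power-of-two M whose sum path (2 + log2 M) fits within the carry path."""
    if recurrence not in ("weinberger", "ling"):
        raise ValueError("recurrence must be 'weinberger' or 'ling'")
    log_n = n_bits.bit_length() - 1  # log2(N) for N a power of two
    carry_path = log_n + (1 if recurrence == "weinberger" else 0)
    m = 1
    # Double M while the sum path of the doubled value still fits.
    while 2 + (2 * m).bit_length() - 1 <= carry_path:
        m *= 2
    return m


print(max_sparseness(64, "weinberger"))  # 32 = N/2
print(max_sparseness(64, "ling"))        # 16 = N/4
```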
SUM PATH DESIGN IN A SPARSE ADDER
Sum Path
Weinberger Ling
$c_i = t_{i-1}\, h_{i-1}$
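The relation between the Ling pseudo-carry and the true carry can be checked with a bit-level model. The definitions used here are the usual ones and are assumptions with respect to the slide: $g_i = a_i b_i$, $t_i = a_i + b_i$, and $h_i = g_i + t_{i-1} h_{i-1}$ with $h_0 = g_0$; the function name is hypothetical.

```python
def ling_carries(a: int, b: int, n: int) -> list:
    """Return c[0..n], where c[i] is the carry into bit i, recovered as t[i-1] & h[i-1]."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # generate
    t = [((a >> i) | (b >> i)) & 1 for i in range(n)]  # transmit
    h = [0] * n                                        # Ling pseudo-carries
    h[0] = g[0]
    for i in range(1, n):
        h[i] = g[i] | (t[i - 1] & h[i - 1])
    # No external carry-in is modeled, so c[0] = 0.
    return [0] + [t[i - 1] & h[i - 1] for i in range(1, n + 1)]


print(ling_carries(0b0111, 0b0001, 4))  # [0, 1, 1, 1, 0]
```

Because the AND with $t_{i-1}$ is deferred to the sum path, the carry tree itself only has to produce $h$, which is where the Ling carry path saves one gate relative to Weinberger.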
RCA vs PPA in Partial Sum Computation
RCA (Ripple Carry Adder): Depth = 5
PPA (Parallel Prefix Adder): Depth = 4
RCA vs PPA: Critical path length
Degree of Sparseness (M)   Ripple carry (1+M)   Parallel prefix ($2+\log_2 M$)
2                          3                    3
4                          5                    4
8                          9                    5
16                         17                   6
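The table entries follow directly from the two depth expressions. A brief sketch that regenerates them is shown below; the function names are assumptions, and M is assumed to be a power of two.

```python
def rca_depth(m: int) -> int:
    """Critical path of an M-bit ripple-carry partial sum: 1 + M."""
    return 1 + m


def ppa_depth(m: int) -> int:
    """Critical path of an M-bit parallel-prefix partial sum: 2 + log2(M)."""
    return 2 + (m.bit_length() - 1)


for m in (2, 4, 8, 16):
    print(m, rca_depth(m), ppa_depth(m))
# 2 3 3
# 4 5 4
# 8 9 5
# 16 17 6
```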
8-bit Partial Sum Computation using PPA Structure
EFFECT OF INCREASED SPARSENESS
Theoretical results
Total gate count
- Gate counts are equal for KS and LF adders.
Total Gate Complexity
- Complexity of a gate is defined as its number of inputs (1 for an inverter, 2 for a two-input NAND, etc.).
- For KS, sparse 4 gives the least complexity for 32- to 256-bit adders.
- For LF, sparse 2 gives the least complexity for 32- and 64-bit adders, and sparse 4 for 128- and 256-bit adders.
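The gate complexity measure defined above can be expressed as a weighted gate count. The sketch below is illustrative only: the gate names mirror common standard-cell naming, and the example netlist counts are hypothetical, not taken from the designs in this talk.

```python
# Number of inputs per gate type, used as the complexity weight.
GATE_INPUTS = {
    "INV": 1,
    "NAND2": 2,
    "NOR2": 2,
    "AOI21": 3,
    "OAI21": 3,
    "AOI22": 4,
    "OAI22": 4,
}


def gate_complexity(gate_counts: dict) -> int:
    """Total complexity = sum over gate types of (count * number of inputs)."""
    return sum(GATE_INPUTS[gate] * count for gate, count in gate_counts.items())


example_netlist = {"INV": 10, "NAND2": 4, "AOI21": 6}  # hypothetical counts
print(gate_complexity(example_netlist))  # 10*1 + 4*2 + 6*3 = 36
```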
Normalized Gate Complexity
- Complexities are normalized to their full carry tree (sparseness 1) complexities.
- For KS, sparseness achieves a 30% reduction in complexity.
- For LF, sparseness achieves a 20% reduction in complexity.
Total Wire Complexity
- Wire complexity is defined as the total wire length (e.g. a wire from bit 32 to bit 64 has a length of 32 units).
- For KS, complexity decreases as sparseness increases.
- For LF, the optimum sparseness for wire complexity is 2 for 32- and 64-bit adders, and 4 for 128- and 256-bit adders.
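The wire complexity measure can be sketched as a sum of bit-position distances. In the example below, the function names are assumptions, only the lateral wires between prefix operators of a full (sparseness 1) Kogge-Stone tree are counted, and the resulting numbers are illustrative rather than the values plotted in the talk.

```python
def wire_length(src_bit: int, dst_bit: int) -> int:
    """Length of one wire in bit-pitch units."""
    return abs(dst_bit - src_bit)


def total_wire_complexity(wires) -> int:
    """Sum of wire lengths over a list of (src_bit, dst_bit) pairs."""
    return sum(wire_length(src, dst) for src, dst in wires)


def kogge_stone_wires(n: int) -> list:
    """Lateral wires of a full Kogge-Stone tree: bit i - d feeds bit i at span d."""
    wires = []
    d = 1
    while d < n:
        wires += [(i - d, i) for i in range(d, n)]
        d *= 2
    return wires


print(wire_length(32, 64))                          # 32
print(total_wire_complexity(kogge_stone_wires(8)))  # 7*1 + 6*2 + 4*4 = 35
```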
Normalized Wire Complexity
- Complexities are normalized to their full carry tree (sparseness 1) complexities.
- For KS, sparseness achieves an 80% reduction in complexity.
- For LF, sparseness achieves a 20% reduction in complexity.
Theoretical Results
- For 64-bit LF adders, sparse 2 yields both the minimum gate complexity and the minimum total wire length.
  – Note that the reduction in gate complexity in the LF adder comes from removing buffers, whereas in the KS adder it comes from removing the more complex AND-OR gates.
  – Hence, the improvement in gate complexity is smaller for the LF adder than for the KS adder.
  – The increase in gate complexity beyond sparse 8 in the KS adder will offset the energy savings achieved through reduced wiring complexity.
- The energy-optimal degree of sparseness will be determined by the ratio of gate capacitance to wire capacitance.
  – In the low-performance design region, gate sizes are small, so wire capacitance dominates and KS sparse 8 is expected to outperform KS sparse 4 in energy at the same performance.
  – For the LF adder, on the other hand, it is not worth going beyond sparse 4 because complexity increases in both measures.
- For 128- and 256-bit adders, sparse 4 yields the most savings for both KS and LF structures.
RESULTS
Technology
- 45nm TSMC
- VDD = 1.1V
- Temp = 25°C
- Typical process corner
- Multi-Vth standard cell library (low, standard, high)
STDCELL Library
Gate     Available Strength
AOI21    1x, 2x, 4x, 6x, 8x
AOI22    1x, 2x, 4x, 6x, 8x
INV      1x, 2x, 4x, 6x, 8x, 12x, 16x, 32x
NAND2    1x, 2x, 4x, 6x, 8x
NOR2     1x, 2x, 4x, 6x, 8x
OAI21    1x, 2x, 4x, 6x, 8x
OAI22    1x, 2x, 4x, 6x, 8x
Design Environment
- Designed adders
  – KS adder with full, sparse 2, sparse 4, and sparse 8 carry trees
  – LF adder with full, sparse 2, sparse 4, and sparse 8 carry trees
- Circuit sizing using Design Compiler
- Placement and routing using Encounter
- Post-layout simulations using PrimeTime
- Input driver: 16x inverter
- Output load: 16x inverter
- 25% activity at inputs
- Adders designed for minimum energy using delay targets between 300ps and 400ps.
Energy-Delay
Leakage Power
Wire Energy
Conclusion
- Energy savings of 50% and 22%, and leakage power savings of 70% and 40%, are achieved with increased sparseness degree of carry trees for KS and LF adders, respectively.
- For the 64-bit KS Ling adder, the energy-optimal sparseness is 4. For the 64-bit LF Ling adder, the energy-optimal sparseness is 2.
- Both optimal KS and LF adders reach the same minimum delay target of 300ps.
- Experimental results suggest that LF S2 is 7% more energy efficient than KS S4 at the minimum delay point.
- Theoretical results suggest that a sparse 4 carry tree should be used for both KS and LF adders of sizes 128-bit and above.
THANK YOU …
Questions?