Parallel prefix adders Kostas Vitoroulis, 2006. Presented to Dr. - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Parallel prefix adders

Kostas Vitoroulis, 2006. Presented to Dr. A. J. Al-Khalili. Concordia University.

slide-2
SLIDE 2

Overview of presentation

  • Parallel prefix operations
  • Binary addition as a parallel prefix operation
  • Prefix graphs
  • Adder topologies
  • Summary

slide-3
SLIDE 3

Parallel Prefix Operation

Terminology background:

Prefix: The outcome of the operation depends on the initial inputs; each output depends on all inputs up to and including its own position.

Parallel: The operation is executed in parallel. This is done by segmentation into smaller pieces that are computed simultaneously.

Operation: Any arbitrary primitive operator “ ° ” that is associative is parallelizable.

  • It is fast because the processing is accomplished in a parallel fashion.
slide-4
SLIDE 4

Example: Associative operations are parallelizable

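Consider the logical OR operation: a + b

The operation is associative:

a + b + c + d = (((a + b) + c) + d) = ((a + b) + (c + d))

Serial implementation: (((a + b) + c) + d), a chain of three dependent OR gates.

Parallel implementation: ((a + b) + (c + d)), where (a + b) and (c + d) are evaluated at the same time.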
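The two groupings above can be sketched in Python. This is only an illustration of the idea, not part of the original slides; the function names `serial_or` and `tree_or` are made up for this example, and the "parallel" version simply evaluates one tree level per loop iteration to show the reduced depth.

```python
from functools import reduce

def serial_or(bits):
    # Left-to-right chain: (((a | b) | c) | d). Depth grows as n - 1.
    return reduce(lambda acc, b: acc | b, bits)

def tree_or(bits):
    # Balanced tree: (a | b) | (c | d). Depth grows as ceil(log2 n).
    # All pairs inside one level are independent, so hardware could
    # evaluate them simultaneously; here they are just listed together.
    level = list(bits)
    while len(level) > 1:
        nxt = [level[i] | level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an unpaired element moves up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

Both functions return the same value for any non-empty input because OR is associative; only the depth of the dependency chain differs.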

slide-5
SLIDE 5

Mathematical Formulation: Prefix Sum

Operator: “ ° ”

Input is a vector:

$A = A_n A_{n-1} \ldots A_1$

Output is another vector:

$B = B_n B_{n-1} \ldots B_1$

where

$B_1 = A_1$
$B_2 = A_1 \circ A_2$
…
$B_n = A_1 \circ A_2 \circ \ldots \circ A_n$

This is the unary operator known as “scan” or “prefix sum”. $B_n$ represents the operator being applied to all terms of the vector.
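A minimal serial sketch of this definition, assuming the vector is stored as a Python list with A1 first. The name `prefix_scan` is chosen for this example only; the standard library's `itertools.accumulate` provides the same behaviour.

```python
import operator

def prefix_scan(a, op):
    """Serial inclusive scan: B1 = A1, Bi = B(i-1) op Ai."""
    out = []
    for x in a:
        out.append(x if not out else op(out[-1], x))
    return out

# Integer addition as the operator gives the classic prefix sum.
sums = prefix_scan([1, 2, 3, 4, 5, 6], operator.add)   # [1, 3, 6, 10, 15, 21]
```

The operator is a parameter, so the same routine works for OR, AND, or any other associative operator.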

slide-6
SLIDE 6

Example of prefix sum

Consider the vector: $A = A_n A_{n-1} \ldots A_1$, where element $A_i$ is an integer.

The “*” unary operator is defined as: *A = B

With $B = B_n B_{n-1} \ldots B_1$

$B_1 = A_1$
$B_2 = A_1 * A_2$
$B_3 = A_1 * A_2 * A_3$
…

and ‘ * ’ here is the integer addition operation.

slide-7
SLIDE 7

Example of prefix sum

Calculation of *A, where A = 6 5 4 3 2 1, yields: B = *A = 21 15 10 6 3 1

Because the summation is associative, the calculation can be done in parallel in the following manner:

$B_1 = A_1 = 1$
$B_2 = A_1 + A_2 = 3$
$B_3 = (A_1 + A_2) + A_3 = 6$
…
$B_6 = A_6 + \ldots + A_1 = (A_6 + A_5) + ((A_4 + A_3) + (A_2 + A_1)) = 21$

(Figure: parallel implementation versus serial implementation of the prefix sum, with inputs 1 … 6 and outputs B1 … B6.)
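One possible parallel schedule for this example is recursive doubling, sketched below. It is not necessarily the grouping drawn on the slide; the function name `parallel_prefix` is an assumption of this sketch, and each pass of the loop stands for one time step in which all updates could run at once.

```python
import operator

def parallel_prefix(a, op):
    """Inclusive scan in ceil(log2 n) steps (A1 is a[0])."""
    b = list(a)
    dist = 1
    while dist < len(b):
        # Every element reads values from the previous step only, so all
        # combinations inside one step are independent of each other.
        b = [b[i] if i < dist else op(b[i - dist], b[i]) for i in range(len(b))]
        dist *= 2
    return b

result = parallel_prefix([1, 2, 3, 4, 5, 6], operator.add)   # [1, 3, 6, 10, 15, 21]
```

Six inputs need three steps here instead of the five dependent additions of the serial chain, at the price of more additions in total.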

slide-8
SLIDE 8

Binary Addition

This is the pen and paper addition of two 4-bit binary numbers x and y. c represents the generated carries. s represents the produced sum bits.

    c3  c2  c1  c0
        x3  x2  x1  x0
  +     y3  y2  y1  y0
    s4  s3  s2  s1  s0

A stage of the addition is the set of x and y bits being used to produce the appropriate sum and carry bits. For example, the bits x2, y2 constitute stage 2, which generates carry c2 and sum s2.

Each stage i adds bits $a_i$, $b_i$, $c_{i-1}$ and produces bits $s_i$, $c_i$ (in this table $a_i$, $b_i$ are the operand bits, written $x_i$, $y_i$ in the definitions below). The following hold:

  ai  bi  ci        Comment
  0   0   0         The stage “kills” an incoming carry.
  0   1   c_{i-1}   The stage “propagates” an incoming carry.
  1   0   c_{i-1}   The stage “propagates” an incoming carry.
  1   1   1         The stage “generates” a carry out.

Formal definition:

“Kill” bit: $k_i = \overline{x_i + y_i}$
“Propagate” bit: $p_i = x_i \oplus y_i$
“Generate” bit: $g_i = x_i \cdot y_i$
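The stage definitions can be exercised with a small ripple-carry model. This is a sketch under the assumption that operands are given as lists of 0/1 bits, least significant bit first; `stage_signals` and `ripple_add` are names invented for the example.

```python
def stage_signals(x, y):
    """Return (propagate, generate, kill) for one stage with input bits x, y."""
    p = x ^ y            # propagate: exactly one input is 1
    g = x & y            # generate: both inputs are 1
    k = 1 - (x | y)      # kill: both inputs are 0
    return p, g, k

def ripple_add(xs, ys, c_in=0):
    """Add two LSB-first bit lists stage by stage; returns (sum_bits, carry_out)."""
    carry, s = c_in, []
    for x, y in zip(xs, ys):
        p, g, _ = stage_signals(x, y)
        s.append(p ^ carry)          # sum bit of this stage
        carry = g | (p & carry)      # carry out of this stage
    return s, carry
```

For every input pair exactly one of p, g, k equals 1, which matches the three cases in the table.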
slide-9
SLIDE 9

Binary Addition

The carry $c_i$ generated by a stage i is given by the equation:

$c_i = g_i + p_i \cdot c_{i-1} = x_i \cdot y_i + (x_i \oplus y_i) \cdot c_{i-1}$

This equation can be simplified to:

$c_i = x_i \cdot y_i + (x_i + y_i) \cdot c_{i-1} = g_i + a_i \cdot c_{i-1}$

The “$a_i$” term in the equation is the “alive” bit. The latter form of the equation uses an OR gate instead of an XOR, which is a more efficient gate when implemented in CMOS technology.

Note that:

$a_i = \overline{k_i}$

where $k_i$ is the “kill” bit defined on the previous slide.
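That the two forms of the carry equation agree can be checked exhaustively, since only eight input combinations exist. The sketch below uses function names invented for this purpose.

```python
from itertools import product

def carry_with_propagate(x, y, c_prev):
    # c_i = g_i + p_i * c_(i-1), with p_i = x_i XOR y_i
    return (x & y) | ((x ^ y) & c_prev)

def carry_with_alive(x, y, c_prev):
    # c_i = g_i + a_i * c_(i-1), with a_i = x_i OR y_i
    return (x & y) | ((x | y) & c_prev)

all_equal = all(
    carry_with_propagate(x, y, c) == carry_with_alive(x, y, c)
    for x, y, c in product((0, 1), repeat=3)
)
```

The propagate and alive bits differ only when x = y = 1, and in that case the generate term already forces the carry to 1, so the difference never reaches the output.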

slide-10
SLIDE 10

Carry Look Ahead adders

The CLA adder has the following 3-stage structure:

  • Pre-calculation of $p_i$, $g_i$ for each stage.
  • Calculation of carry $c_i$ for each stage.
  • Combination of $c_i$ and $p_i$ of each stage to generate the sum bits $s_i$ (the final sum).

slide-11
SLIDE 11

Carry Look Ahead adders

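The pre-calculation stage is implemented using the equations for $p_i$, $g_i$ shown on a previous slide:

$p_i = x_i \oplus y_i$, $g_i = x_i \cdot y_i$

Alternatively, using the “alive” bit:

$a_i = x_i + y_i$, $g_i = x_i \cdot y_i$

  • Note the symmetry when we use the “propagate” or the “alive” bit. We can use them interchangeably in the carry equations!

(Figure: per-bit gates producing (p0, g0), (p1, g1), (p2, g2) or, alternatively, (a0, g0), (a1, g1), (a2, g2) from the inputs (x0, y0), (x1, y1), (x2, y2).)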

slide-12
SLIDE 12

Carry Look Ahead adders

The carry calculation stage is implemented using the equations produced when unfolding the recursive equation:

$c_i = g_i + p_i \cdot c_{i-1} = g_i + a_i \cdot c_{i-1}$

$c_0 = g_0$
$c_1 = g_1 + p_1 \cdot g_0$
$c_2 = g_2 + p_2 \cdot c_1 = g_2 + p_2 \cdot (g_1 + p_1 \cdot g_0) = g_2 + p_2 \cdot g_1 + p_2 \cdot p_1 \cdot g_0$
… etc.

(Figure: carry generator block taking (g0, p0), (g1, p1), (g2, p2) as inputs and producing c0, c1, c2.)

slide-13
SLIDE 13

Carry Look Ahead adders

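The final sum calculation stage is implemented using the carry and propagate bits $c_i$, $p_i$:

$s_i = p_i \oplus c_{i-1}$, with $p_i = x_i \oplus y_i$

Note: $c_i = g_i + a_i \cdot c_{i-1}$, with $a_i = x_i + y_i$

If the ‘alive’ bit $a_i$ is used, the final sum stage becomes more complex, as implied by the equations above: the sum still needs the XOR $p_i$, which then has to be formed separately (for example as $p_i = a_i \cdot \overline{g_i}$).

(Figure: XOR gates producing s0 from (cin, p0), s1 from (c0, p1), s2 from (c1, p2), s3 from (c2, p3).)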
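The three CLA stages can be put together in a short behavioural model. This is a sketch rather than a gate-level design: bits are LSB-first lists, `cla_add` is a name made up here, and an explicit carry-in term `c_in` is included as an assumption (the unfolded equations on the slides correspond to `c_in = 0`).

```python
def cla_add(xs, ys, c_in=0):
    """Carry look-ahead addition of two LSB-first bit lists."""
    n = len(xs)
    # Stage 1: pre-calculation of propagate and generate bits.
    p = [x ^ y for x, y in zip(xs, ys)]
    g = [x & y for x, y in zip(xs, ys)]
    # Stage 2: every carry comes directly from the unfolded equation
    #   c_i = g_i + p_i g_(i-1) + ... + p_i ... p_1 g_0 (+ p_i ... p_0 c_in),
    # so no carry waits for another carry.
    c = []
    for i in range(n):
        terms = [g[j] and all(p[j + 1:i + 1]) for j in range(i + 1)]
        terms.append(c_in and all(p[:i + 1]))
        c.append(int(any(terms)))
    # Stage 3: final sum, s_i = p_i XOR c_(i-1).
    s = [p[i] ^ (c[i - 1] if i else c_in) for i in range(n)]
    return s, c[-1]
```

The number of product terms per carry grows with the bit position, which is why the carry stage is the expensive part of the adder.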

slide-14
SLIDE 14

Binary addition as a prefix sum problem.

We define a new operator: “ ° ”

Input is a vector of pairs of ‘propagate’ and ‘generate’ bits:

$(g_n, p_n)(g_{n-1}, p_{n-1}) \ldots (g_1, p_1)$

Output is a new vector of pairs:

$(G_n, P_n)(G_{n-1}, P_{n-1}) \ldots (G_1, P_1)$

Each pair of the output vector is calculated by the following definition:

$(G_i, P_i) = (g_i, p_i) \circ (G_{i-1}, P_{i-1})$, where $(G_1, P_1) = (g_1, p_1)$

$(g_x, p_x) \circ (g_y, p_y) = (g_x + p_x \cdot g_y,\ p_x \cdot p_y)$, with $+$, $\cdot$ being the OR, AND operations.
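The operator and the recursive definition translate almost literally into code. In this sketch each pair is a Python tuple `(g, p)`, list index 0 holds the least significant stage (stage 1 in the slide's numbering), there is no carry-in, and the names `dot` and `prefix_carries` are chosen for this example only.

```python
def dot(hi, lo):
    """(g_x, p_x) o (g_y, p_y) = (g_x + p_x * g_y, p_x * p_y) on 0/1 bits."""
    g_x, p_x = hi
    g_y, p_y = lo
    return (g_x | (p_x & g_y), p_x & p_y)

def prefix_carries(xs, ys):
    """Serial evaluation of (G_i, P_i) = (g_i, p_i) o (G_(i-1), P_(i-1)).

    xs, ys are LSB-first bit lists; the result lists the carry out of each stage.
    """
    pairs = [(x & y, x ^ y) for x, y in zip(xs, ys)]
    out = [pairs[0]]                       # base case: (G, P) = (g, p)
    for gp in pairs[1:]:
        out.append(dot(gp, out[-1]))
    return [G for G, _ in out]
```

The order of the arguments matters: the more significant pair goes on the left, because the operator is associative but not commutative.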

slide-15
SLIDE 15

Binary addition as a prefix sum problem.

Properties of operator “ ° ”:

Associativity (hence parallelization)

  Easy to prove based on the fact that the logical AND, OR operations are associative (and that AND distributes over OR).

With the definition:

$(G_i, P_i) = (g_i, p_i) \circ (G_{i-1}, P_{i-1})$, where $(G_1, P_1) = (g_1, p_1)$

$G_i$ becomes the carry signal at stage i of an adder. Illustration on next slide.

The operation is idempotent:

$(g_x, p_x) \circ (g_x, p_x) = (g_x + p_x \cdot g_x,\ p_x \cdot p_x) = (g_x, p_x)$

Which implies:

$(G_{i:j}, P_{i:j}) = (G_{i:n}, P_{i:n}) \circ (G_{m:j}, P_{m:j})$, where $i \geq j$ and $m \geq n$

Here $(G_{i:j}, P_{i:j})$ denotes the group generate and propagate over stages j through i, so the two groups on the right may overlap.
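Both properties can be confirmed by brute force, because a pair takes only four possible values. The check below is a sketch with names invented for it; it also covers the pair (g, p) = (1, 1), which cannot arise when p is the XOR propagate bit but does arise with the alive bit.

```python
from itertools import product

def dot(hi, lo):
    """(g_x, p_x) o (g_y, p_y) = (g_x + p_x * g_y, p_x * p_y) on 0/1 bits."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])

PAIRS = list(product((0, 1), repeat=2))      # all possible (g, p) values

associative = all(
    dot(dot(a, b), c) == dot(a, dot(b, c))
    for a, b, c in product(PAIRS, repeat=3)
)
idempotent = all(dot(a, a) == a for a in PAIRS)
commutative = all(dot(a, b) == dot(b, a) for a, b in product(PAIRS, repeat=2))
```

Associativity and idempotency hold for all values, while commutativity does not, so the operand order of each prefix node must be preserved when the computation is regrouped.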

slide-16
SLIDE 16

Binary Addition as a prefix sum problem.

With:

$(G_i, P_i) = (g_i, p_i) \circ (G_{i-1}, P_{i-1})$
$(g_x, p_x) \circ (g_y, p_y) = (g_x + p_x \cdot g_y,\ p_x \cdot p_y)$

We have:

$(G_1, P_1) = (g_1, p_1)$
$(G_2, P_2) = (g_2, p_2) \circ (G_1, P_1) = (g_2 + p_2 \cdot g_1,\ p_2 \cdot p_1)$
$(G_3, P_3) = (g_3, p_3) \circ (G_2, P_2) = (g_3 + p_3 \cdot (g_2 + p_2 \cdot g_1),\ p_3 \cdot p_2 \cdot p_1) = (g_3 + p_3 \cdot g_2 + p_3 \cdot p_2 \cdot g_1,\ p_3 \cdot p_2 \cdot p_1)$
… etc.

These are the familiar carry bit generating equations for stage i in a CLA adder.

      a3  a2  a1  a0
  +   b3  b2  b1  b0

A stage i will generate a carry if $g_i = a_i \cdot b_i = 1$ and propagate a carry if $p_i = \mathrm{XOR}(a_i, b_i) = 1$ (here $a_i$, $b_i$ are the operand bits, not the “alive” bit). Hence for stage i: $c_i = g_i + p_i \cdot c_{i-1}$

slide-17
SLIDE 17

Addition as a prefix sum problem.

Conclusion:

  • The equations of the well-known CLA adder can be formulated as a parallel prefix problem by employing a special operator “ ° ”.
  • This operator is associative, hence it can be implemented in a parallel fashion.
  • A Parallel Prefix Adder (PPA) is equivalent to the CLA adder. The two differ in the way their carry generation block is implemented.
  • In subsequent slides we will see different topologies for the parallel generation of carries. Adders that use these topologies are called Parallel Prefix Adders.

slide-18
SLIDE 18

Parallel Prefix Adders

The parallel prefix adder employs the 3-stage structure of the CLA adder. The improvement is in the carry generation stage, which is the most intensive one:

  • Pre-calculation of Pi, Gi terms: straightforward, as in the CLA adder.
  • Calculation of the carries: this part is parallelizable to reduce time. Prefix graphs can be used to describe the structure that performs this part.
  • Simple adder to generate the sum: straightforward, as in the CLA adder.

slide-19
SLIDE 19

Calculation of carries – Prefix Graphs

The components usually seen in a prefix graph are the following:

processing component: takes two input pairs $(g_{in1}, p_{in1})$ and $(g_{in2}, p_{in2})$ and produces

$(g_{out}, p_{out}) = (g_{in1} + p_{in1} \cdot g_{in2},\ p_{in1} \cdot p_{in2})$

buffer component: takes one input pair $(g_{in}, p_{in})$ and passes it on unchanged

$(g_{out}, p_{out}) = (g_{in}, p_{in})$

(Figure: symbols of the two components, each drawn with two $(g_{out}, p_{out})$ output arrows.)

slide-20
SLIDE 20

Prefix graphs for representation of Prefix addition

  • Example: serial adder carry generation represented by prefix graphs

(Figure: prefix graph of an 8-bit serial carry chain, with inputs (p1, g1) … (p8, g8) and carry outputs c1 … c8.)
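A prefix graph can be modelled as a list of levels, each level being a list of processing nodes. The representation below is an assumption of this sketch, not something defined on the slides: an edge `(i, j)` means "the node in column i combines its own pair with the pair from column j", column 0 is the least significant stage (the slide's (p1, g1)), and columns without a node in a level act as buffers.

```python
def dot(hi, lo):
    """(g_x, p_x) o (g_y, p_y) = (g_x + p_x * g_y, p_x * p_y) on 0/1 bits."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])

def run_network(levels, pairs):
    """Evaluate a prefix graph on a list of (g, p) pairs, one level at a time."""
    v = list(pairs)
    for level in levels:
        prev = list(v)                  # every node reads the previous level
        for i, j in level:
            v[i] = dot(prev[i], prev[j])
    return v

def serial_levels(n):
    """Serial (ripple) carry chain: one processing node per level."""
    return [[(i, i - 1)] for i in range(1, n)]
```

For 8 bits the serial graph has 7 nodes spread over 7 levels, so its depth grows linearly with the word length.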

slide-21
SLIDE 21

Key architectures for carry calculation:

  • 1960: J. Sklansky – conditional-sum adder
  • 1973: Kogge-Stone adder
  • 1980: Ladner-Fischer adder
  • 1982: Brent-Kung adder
  • 1987: Han-Carlson adder
  • 1999: S. Knowles

Other parallel adder architectures:

  • 1981: H. Ling adder
  • 2001: Beaumont-Smith

slide-22
SLIDE 22

1960: J. Sklansky – conditional-sum adder

slide-23
SLIDE 23

1960: J. Sklansky – conditional-sum adder

(Figure: 8-bit Sklansky prefix graph, with inputs (p1, g1) … (p8, g8) and carry outputs c1 … c8.)

The Sklansky adder has:

  • Minimal depth
  • High fan-out nodes
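The Sklansky structure can be generated with the following sketch, which assumes a word length that is a power of two. The edge-list representation, the 0-based column numbering (column 0 is the slide's (p1, g1)) and the function names are all assumptions made for this example.

```python
def dot(hi, lo):
    """(g_x, p_x) o (g_y, p_y) = (g_x + p_x * g_y, p_x * p_y) on 0/1 bits."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])

def run_network(levels, pairs):
    """Evaluate a prefix graph; edge (i, j): column i absorbs column j."""
    v = list(pairs)
    for level in levels:
        prev = list(v)
        for i, j in level:
            v[i] = dot(prev[i], prev[j])
    return v

def sklansky_levels(n):
    """Level t joins each upper half-block with the top of the lower half-block."""
    levels, t = [], 0
    while (1 << t) < n:
        block = 1 << t
        levels.append([(i, (i // block) * block - 1) for i in range(n) if i & block])
        t += 1
    return levels
```

For n = 8 this gives 3 levels and 12 nodes. In the last level column 3 feeds columns 4 to 7, which is the high fan-out mentioned above.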

slide-24
SLIDE 24

1973: Kogge-Stone adder

(Figure: 8-bit Kogge-Stone prefix graph, with inputs (p1, g1) … (p8, g8) and carry outputs c1 … c8.)

The Kogge-Stone adder has:

  • Low depth
  • High node count (implies more area)
  • Minimal fan-out of 1 at each node (implies faster performance)
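A Kogge-Stone graph follows a simple rule: at distance 1, 2, 4, … every column that has a neighbour that far to its right combines with it. The sketch below uses an edge-list model that is an assumption of this example (0-based columns, column 0 being the slide's (p1, g1); function names invented here).

```python
def dot(hi, lo):
    """(g_x, p_x) o (g_y, p_y) = (g_x + p_x * g_y, p_x * p_y) on 0/1 bits."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])

def run_network(levels, pairs):
    """Evaluate a prefix graph; edge (i, j): column i absorbs column j."""
    v = list(pairs)
    for level in levels:
        prev = list(v)
        for i, j in level:
            v[i] = dot(prev[i], prev[j])
    return v

def kogge_stone_levels(n):
    """Each level doubles the span already covered by every column."""
    levels, d = [], 1
    while d < n:
        levels.append([(i, i - d) for i in range(d, n)])
        d *= 2
    return levels
```

For n = 8 this gives 3 levels but 17 nodes, compared with 7 nodes for the serial chain: the low depth is bought with extra nodes and wiring.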

slide-25
SLIDE 25

1980: Ladner-Fischer adder

(Figure: 8-bit Ladner-Fischer prefix graph, with inputs (p1, g1) … (p8, g8) and carry outputs c1 … c8.)

The Ladner-Fischer adder has:

  • Low depth
  • High fan-out nodes

  • This adder topology appears the same as the Sklansky conditional-sum adder. Ladner and Fischer formulated a parallel prefix network design space which included this minimal depth case. The actual adder they included as an application of their work had a structure that was slightly different from the above.

slide-26
SLIDE 26

1982: Brent-Kung adder

(Figure: 8-bit Brent-Kung prefix graph, with inputs (p1, g1) … (p8, g8) and carry outputs c1 … c8.)

The Brent-Kung adder is the extreme boundary case of:

  • Maximum logic depth in PP adders (implies longer calculation time).
  • Minimum number of nodes (implies minimum area).
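A Brent-Kung graph can be built from an up-sweep that forms power-of-two groups and a down-sweep that fills in the remaining columns. The sketch assumes a power-of-two word length and an edge-list model invented for this example (0-based columns, column 0 being the slide's (p1, g1)).

```python
def dot(hi, lo):
    """(g_x, p_x) o (g_y, p_y) = (g_x + p_x * g_y, p_x * p_y) on 0/1 bits."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])

def run_network(levels, pairs):
    """Evaluate a prefix graph; edge (i, j): column i absorbs column j."""
    v = list(pairs)
    for level in levels:
        prev = list(v)
        for i, j in level:
            v[i] = dot(prev[i], prev[j])
    return v

def brent_kung_levels(n):
    levels, d = [], 1
    while d < n:                     # up-sweep: groups of 2, 4, 8, ... bits
        levels.append([(i, i - d) for i in range(2 * d - 1, n, 2 * d)])
        d *= 2
    d //= 4
    while d >= 1:                    # down-sweep: complete the other columns
        levels.append([(i, i - d) for i in range(3 * d - 1, n, 2 * d)])
        d //= 2
    return levels
```

For n = 8 the result has 11 nodes on 5 levels: fewer nodes than the 3-level graphs, but a longer critical path.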

slide-27
SLIDE 27

1987: Han-Carlson adder

The Han-Carlson adder combines the Brent-Kung and Kogge-Stone structures into a hybrid structure.

  • Efficient
  • Suitable for VLSI implementation
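One common way to describe the hybrid is: a Brent-Kung style first level that pairs neighbouring columns, a Kogge-Stone network on the odd columns only, and a final level that fills in the even columns. The sketch below follows that description; it is an illustration under the assumptions of a power-of-two word length and an edge-list model invented here (0-based columns), not a reproduction of the original paper's figures.

```python
def dot(hi, lo):
    """(g_x, p_x) o (g_y, p_y) = (g_x + p_x * g_y, p_x * p_y) on 0/1 bits."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])

def run_network(levels, pairs):
    """Evaluate a prefix graph; edge (i, j): column i absorbs column j."""
    v = list(pairs)
    for level in levels:
        prev = list(v)
        for i, j in level:
            v[i] = dot(prev[i], prev[j])
    return v

def han_carlson_levels(n):
    levels = [[(i, i - 1) for i in range(1, n, 2)]]        # pair up neighbours
    d = 2
    while d < n:                                           # Kogge-Stone on odd columns
        levels.append([(i, i - d) for i in range(d + 1, n, 2)])
        d *= 2
    levels.append([(i, i - 1) for i in range(2, n, 2)])    # fill in even columns
    return levels
```

For n = 8 this yields 12 nodes on 4 levels, one level more than Kogge-Stone but with noticeably fewer nodes.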

slide-28
SLIDE 28

1999: S. Knowles

Knowles proposed adders that trade off:

  • Depth
  • Interconnect
  • Area

These adders are bound by the Ladner-Fischer (minimum depth) and Brent-Kung (minimum fan-out) topologies.

(Figure: Brent-Kung topology (minimum fan-out), Ladner-Fischer topology (minimum depth, high fan-out) and Knowles topologies (varied fan-out at each level).)

slide-29
SLIDE 29

An interesting taxonomy:

Harris [2003] presented an interesting 3-D taxonomy of the adders presented so far. Each axis represents a characteristic of the adders:

  • Fanout
  • Logic depth
  • Wire connections

He also proposed the following structure:

(Figure: prefix structure proposed by Harris.)

slide-30
SLIDE 30

1981: H. Ling adder

Ling adders are a different family of adders. They can still be formulated as prefix adders. Ling adders differ from the “traditional” PP adders in that:

They are based on a different set of equations. The new set of equations introduces the following tradeoffs:

  • Precalculation of Pi, Gi terms is based on more complex equations
  • Calculation of the carries is based on simpler equations
  • Final addition stage is more complex
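The slides do not list Ling's equations. As a hedged illustration, the sketch below uses the commonly cited pseudo-carry $H_i = c_i + c_{i-1}$, which satisfies $H_i = g_i + a_{i-1} \cdot H_{i-1}$ and $c_i = a_i \cdot H_i$ with the alive bit $a_i = x_i + y_i$. Bits are LSB-first, the carry-in is assumed to be 0, and `ling_add` is a name invented for this example; it evaluates the recurrence serially and does not model the prefix tree.

```python
def ling_add(xs, ys):
    """Add two LSB-first bit lists using Ling pseudo-carries (carry-in = 0)."""
    n = len(xs)
    g = [x & y for x, y in zip(xs, ys)]
    a = [x | y for x, y in zip(xs, ys)]     # alive bit
    p = [x ^ y for x, y in zip(xs, ys)]
    h = []                                  # pseudo-carries H_i
    for i in range(n):
        # a_(i-1) * H_(i-1) equals the true carry c_(i-1)
        c_prev = (a[i - 1] & h[i - 1]) if i else 0
        h.append(g[i] | c_prev)             # H_i = g_i + a_(i-1) * H_(i-1)
    # The sum stage must rebuild the true carry from H, hence the extra AND.
    s = [p[i] ^ ((a[i - 1] & h[i - 1]) if i else 0) for i in range(n)]
    return s, a[-1] & h[-1]
```

The recurrence for H drops one factor compared with the recurrence for c, which is where the simpler carry equations come from, while the extra AND in the sum stage reflects the more complex final addition.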

slide-31
SLIDE 31

2001: Beaumont-Smith

(Figure: 8-bit Beaumont-Smith prefix graph, with inputs (p1, g1) … (p8, g8) and carry outputs c1 … c8.)

  • The Beaumont-Smith adders incorporate nodes that can accept more than a pair of inputs and produce the carry calculation.
  • These ‘higher valency’ nodes are optimized circuits for a specific technology (CMOS).
  • The above topology is a Beaumont-Smith tree based on the Kogge-Stone architecture.

slide-32
SLIDE 32

Summary (1/3)

The parallel prefix formulation of binary addition is a very convenient way to formally describe an entire family of parallel binary adders.

slide-33
SLIDE 33

Summary (2/3)

  • A parallel prefix adder can be seen as a 3-stage process:
      Pre-calculation of Pi, Gi terms
      Calculation of the carries
      Simple adder to generate the sum
  • There exist various architectures for the carry calculation part.
  • Trade-offs in these architectures involve:
      the area of the adder
      its depth
      the fan-out of the nodes
      the overall wiring network

slide-34
SLIDE 34

Summary (3/3)

Variations of parallel adders have been proposed. These variations are based on:

  • Modifying the carry generation equations and reformulating the prefix definition (Ling)
  • Restructuring the carry calculation trees by optimizing for a specific technology (Beaumont-Smith)
  • Other optimizations

slide-35
SLIDE 35

References:

  • Beaumont-Smith, Cheng-Chew Lim, “Parallel Prefix Adder Design”, IEEE, 2001
  • Han, Carlson, “Fast Area-Efficient VLSI Adders”, IEEE, 1987
  • Dimitrakopoulos, Nikolos, “High-Speed Parallel-Prefix VLSI Ling Adders”, IEEE, 2005
  • Kogge, Stone, “A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations”, IEEE, 1973
  • Simon Knowles, “A Family of Adders”, IEEE, 2001
  • Ladner, Fischer, “Parallel Prefix Computation”, ACM, 1980
  • Brent, Kung, “A Regular Layout for Parallel Adders”, IEEE, 1982
  • H. Ling, “High-Speed Binary Adder”, IBM J. Res. and Dev., 1981
  • J. Sklansky, “Conditional-Sum Addition Logic”, IRE Transactions on Electronic Computers, 1960
  • D. Harris, “A Taxonomy of Parallel Prefix Networks”, IEEE, 2003