A Novel Modular Adder for One Thousand Bits and More Using Fast Carry Chains of Modern FPGAs



SLIDE 1

A Novel Modular Adder for One Thousand Bits and More Using Fast Carry Chains of Modern FPGAs

Marcin Rogawski, Ekawat Homsirikamol & Kris Gaj
George Mason University, USA

SLIDE 2

Co-Authors

  • Ekawat Homsirikamol, a.k.a. “Ice”: PhD student
  • Marcin Rogawski: PhD @ GMU, Summer 2013; currently @ Cadence Design Systems, San Jose, CA

SLIDE 3

Motivation

  • Adders are used in multiple branches of science & engineering
  • Basic building block of more complex arithmetic computations (multiplication, modular reduction, etc.)
  • Need for long-operand adders (≥1024 bits) in cryptography (RSA, Diffie-Hellman, Elliptic Curve Cryptography, Pairing-Based Cryptography, post-quantum cryptography)
  • FPGAs contain special dedicated resources (fast carry chains) supporting fast addition, but only for operands in the range of 32-64 bits

SLIDE 4

Fast Carry Chains of Modern FPGAs

[Figure: carry-chain cells built from LUTs and full adders (FA), with inputs a0, b0, a1, b1, cin and outputs s, cout; a) Xilinx FPGAs, b) Altera FPGAs]

  • Minimize delays
  • Save reconfigurable resources

SLIDE 5

Parallel Prefix Network (PPN) Adder

SLIDE 6

Parallel Prefix Network – Major Concept (1)

Given: generate-propagate signals for each bit position

$(g_{n-1}, p_{n-1}) \; \ldots \; (g_2, p_2) \; (g_1, p_1) \; (g_0, p_0)$

Calculate (in parallel): generate-propagate signals for each block of bits starting at position 0

$(g_{[0,n-1]}, p_{[0,n-1]}) \; \ldots \; (g_{[0,2]}, p_{[0,2]}) \; (g_{[0,1]}, p_{[0,1]}) \; (g_{[0,0]}, p_{[0,0]})$
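
The block signals for positions 0..i can be built from the per-bit signals with the usual associative generate-propagate operator. The following is a minimal Python sketch, not taken from the slides: the helper names `combine`, `bit_gp` and `block_gp_from_zero` are hypothetical, and the scan is written serially for readability, whereas a PPN evaluates the same operator in parallel.

```python
from typing import List, Tuple

GP = Tuple[int, int]  # (generate, propagate) for a single bit or a block of bits


def combine(high: GP, low: GP) -> GP:
    """Merge the (g, p) of a higher block with the adjacent lower block."""
    g_hi, p_hi = high
    g_lo, p_lo = low
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def bit_gp(a: int, b: int, n: int) -> List[GP]:
    """Per-bit signals: g_i = a_i AND b_i, p_i = a_i XOR b_i."""
    return [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
            for i in range(n)]


def block_gp_from_zero(gp: List[GP]) -> List[GP]:
    """Return (g[0,i], p[0,i]) for every i (serial stand-in for a PPN)."""
    out = [gp[0]]
    for i in range(1, len(gp)):
        out.append(combine(gp[i], out[-1]))
    return out


# 4-bit example: a = 1011b (11), b = 0110b (6)
prefixes = block_gp_from_zero(bit_gp(0b1011, 0b0110, 4))
```

In this example `prefixes[3][0]` is the carry out of the 4-bit addition 11 + 6 = 17.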

SLIDE 7

Parallel Prefix Network – Major Concept (2)

Calculate the projected carry at position i:

$pc_i = g_{[0,i-1]} + c_0 \, p_{[0,i-1]}$, where $i = 1..n$

Assuming $c_0 = 0$ (no need to cascade adders that are already very long):

$pc_i = g_{[0,i-1]}$
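
To illustrate how the projected carries are used, here is a small Python sketch under the slide's assumption c0 = 0. The function name `add_with_projected_carries` is hypothetical, and the carries are built serially here; in hardware they would come out of the prefix network.

```python
def add_with_projected_carries(a: int, b: int, n: int) -> int:
    """n-bit addition with c0 = 0; the carry-out is returned as bit n."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    # pc[i] = g[0,i-1] for i = 1..n, and pc[0] = c0 = 0.
    pc = [0] * (n + 1)
    for i in range(1, n + 1):
        pc[i] = g[i - 1] | (p[i - 1] & pc[i - 1])

    total = 0
    for i in range(n):
        total |= (p[i] ^ pc[i]) << i  # sum bit s_i = p_i XOR pc_i
    return total | (pc[n] << n)       # pc_n is the carry-out
```

Each sum bit depends only on its own propagate signal and the projected carry into that position, which is why computing all pc_i quickly is the critical part of the adder.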

SLIDE 8

Kogge-Stone PPN

  • Minimum Latency ($\log_2 N$)
  • Large Area
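
A behavioral Python sketch of a Kogge-Stone network may help relate the two bullets. It is an illustration only: the names `combine` and `kogge_stone` are hypothetical, and the level and operator counts returned are simple software counters, not FPGA timing or area results.

```python
from typing import List, Tuple

GP = Tuple[int, int]  # (generate, propagate)


def combine(high: GP, low: GP) -> GP:
    """Generate-propagate operator: higher block combined with lower block."""
    return (high[0] | (high[1] & low[0]), high[1] & low[1])


def kogge_stone(gp: List[GP]) -> Tuple[List[GP], int, int]:
    """Return (prefixes, levels, operator_count) for a Kogge-Stone network."""
    n = len(gp)
    cur = list(gp)
    levels = 0
    ops = 0
    dist = 1
    while dist < n:
        nxt = list(cur)
        # Every position i >= dist absorbs the block ending at i - dist.
        for i in range(dist, n):
            nxt[i] = combine(cur[i], cur[i - dist])
            ops += 1
        cur = nxt
        dist *= 2
        levels += 1
    return cur, levels, ops


# 8 inputs need 3 levels but 17 operators.
_, levels8, ops8 = kogge_stone([(0, 1)] * 8)
```

The level count grows as log2 N while the operator count grows roughly as N log2 N, which is the latency and area behavior named on the slide.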

SLIDE 9

Brent-Kung PPN

  • Good trade-off between Latency ($2 \log_2 N - 2$) and Area
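
For the area side of this trade-off, a behavioral Brent-Kung sketch in Python is given below. It assumes a power-of-two number of inputs; the names `combine` and `brent_kung` are hypothetical, and only the operator count is tracked (latency is not modeled here).

```python
from typing import List, Tuple

GP = Tuple[int, int]  # (generate, propagate)


def combine(high: GP, low: GP) -> GP:
    """Generate-propagate operator: higher block combined with lower block."""
    return (high[0] | (high[1] & low[0]), high[1] & low[1])


def brent_kung(gp: List[GP]) -> Tuple[List[GP], int]:
    """Return (prefixes, operator_count); len(gp) must be a power of two."""
    n = len(gp)
    if n == 0 or n & (n - 1):
        raise ValueError("number of inputs must be a power of two")
    x = list(gp)
    ops = 0

    # Up-sweep: build results for aligned blocks of size 2, 4, 8, ...
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            x[i] = combine(x[i], x[i - d])
            ops += 1
        d *= 2

    # Down-sweep: complete the positions skipped by the up-sweep.
    d = n // 4
    while d >= 1:
        for i in range(3 * d - 1, n, 2 * d):
            x[i] = combine(x[i], x[i - d])
            ops += 1
        d //= 2
    return x, ops


# 8 inputs use 11 operators.
_, ops8 = brent_kung([(0, 1)] * 8)
```

The operator count grows roughly linearly in N, at the cost of the extra down-sweep stages.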

SLIDE 10

Parallel Prefix Network (PPN) Adder in FPGA

  • All logic must be implemented using LUTs!
  • Large PPN required (e.g., n=1024)

SLIDE 11

Our High-Radix Parallel Prefix Network Adder

SLIDE 12

GPS: Generate-Propagate-Sum in Xilinx FPGAs

SLIDE 13

S: Sum unit in Xilinx FPGAs

SLIDE 14

Our High-Radix Parallel Prefix Network Adder

  • GPS and S units implemented using Fast Carry Chains
  • The size of PPN reduced from n=1024 to N=1024/w
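
The following Python sketch shows one way to read the two bullets above. It is a behavioral model under stated assumptions, not the authors' VHDL: the name `high_radix_add` is hypothetical, the word-level propagate is assumed to mean "the w-bit partial sum is all ones", and a serial scan stands in for the Kogge-Stone or Brent-Kung network over N = n/w words.

```python
def high_radix_add(a: int, b: int, w: int, n_words: int) -> int:
    """Add two (w * n_words)-bit operands word by word; carry-out is the top bit."""
    mask = (1 << w) - 1

    # GPS units: one short w-bit addition per word (a fast carry chain in an FPGA).
    sums, gp = [], []
    for k in range(n_words):
        t = ((a >> (k * w)) & mask) + ((b >> (k * w)) & mask)
        sums.append(t & mask)
        gp.append((t >> w, int(t == mask)))  # (word generate, word propagate)

    # PPN over N words instead of n bits (serial scan used as a stand-in).
    carry_in = [0] * (n_words + 1)
    g_acc, p_acc = 0, 1
    for k in range(n_words):
        g_k, p_k = gp[k]
        if k == 0:
            g_acc, p_acc = g_k, p_k
        else:
            g_acc, p_acc = g_k | (p_k & g_acc), p_k & p_acc
        carry_in[k + 1] = g_acc

    # S units: add the projected carry to each partial sum (again a short chain).
    result = 0
    for k in range(n_words):
        result |= ((sums[k] + carry_in[k]) & mask) << (k * w)
    return result | (carry_in[n_words] << (n_words * w))
```

With w = 16 the prefix network handles 64 word-level signal pairs for a 1024-bit addition, rather than 1024 bit-level pairs.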

SLIDE 15

General Construction for the Modular Adder

[Figure: adder datapath with inputs A, B and $2^n - P$, n-bit buses, a 0/1 multiplexer, carry-outs cout#1 and cout#2, and output R]

$R = A + B \bmod P$

$R = A + B - P$ when $A + B \ge 2^n > P$ (cout#1) or $A + B - P \ge 0$ (cout#2)

$R = A + B$ otherwise
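
The selection rule can be checked with a short behavioral model. This Python sketch assumes 0 ≤ A, B < P < 2^n and that the second adder adds the constant 2^n − P to the n-bit result of the first one; the function name `mod_add` is hypothetical.

```python
def mod_add(a: int, b: int, p: int, n: int) -> int:
    """Return (a + b) mod p using two n-bit additions and their carry-outs."""
    mask = (1 << n) - 1

    t1 = a + b                       # first adder: A + B
    cout1, s1 = t1 >> n, t1 & mask   # cout#1 = 1 when A + B >= 2^n

    t2 = s1 + ((1 << n) - p)         # second adder: adds 2^n - P, i.e. subtracts P
    cout2, s2 = t2 >> n, t2 & mask   # cout#2 = 1 when A + B - P >= 0

    # Either carry-out means the reduced value A + B - P is the correct result.
    return s2 if (cout1 | cout2) else s1
```

For example, with n = 3 and P = 7, `mod_add(5, 6, 7, 3)` takes the cout#1 path and `mod_add(3, 4, 7, 3)` takes the cout#2 path.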

SLIDE 16

Our Construction for the High-Radix PPN Modular Adder

[Figure: block diagram with w-bit inputs A_i, B_i and outputs R_i (i = 0..N−1), built from GPS, GPSc, IP, S and PPN blocks; signals fg, fp, fpc, sg, sp, spc, sel]

SLIDE 17

GPSc: Generate-Propagate-Sum with carry

SLIDE 18

Our Construction for the High-Radix PPN Modular Adder

[Figure: block diagram with w-bit inputs A_i, B_i and outputs R_i (i = 0..N−1), built from GPS, GPSc, IP, S and PPN blocks; signals fg, fp, fpc, sg, sp, spc, sel]

Two additions:
  • overlapped in time
  • sharing resources (S units)

SLIDE 19

Target FPGA Families

Xilinx FPGAs

Technology   Low-cost     High-performance
65 nm        -            Virtex-5
45 nm        Spartan-6    -

Altera FPGAs

Technology   Low-cost     High-performance
65 nm        -            Stratix III
40 nm        Cyclone IV   -

SLIDE 20

Design Flow

[Figure: design flow diagram with stages Specification, RTL Design, VHDL Code, Functional Verification, FPGA Tools, Netlist, Timing Verification, Post Place & Route Results, and Test Vectors; Option Optimization & Parameter Exploration performed with GMU ATHENa (FPL 2010)]

ATHENa was used to simplify parameter exploration (multiple values of generics) and option optimization for both Xilinx and Altera FPGAs.

SLIDE 21

Choosing the Best PPN & Word Size: Adders – Altera Cyclone IV

SLIDE 22

Choosing the Best PPN & Word Size: Adders – Xilinx Spartan 6

SLIDE 23

Choosing the Best PPN & Word Size: Modular Adders – Altera Cyclone IV

SLIDE 24

Choosing the Best PPN & Word Size: Modular Adders – Xilinx Spartan 6

SLIDE 25

The Best Choices of PPN Type & Word Size

Family        Adders            Modular Adders
              PPN   (w, N)      PPN   (w, N)
Cyclone IV    KS    (16, 64)    KS    (16, 64)
Stratix III   BK    (16, 64)    BK    (16, 64)
Spartan 6     BK    (32, 32)    KS    (128, 8)
Virtex 5      KS    (16, 64)    KS    (64, 16)

KS: Kogge-Stone Parallel Prefix Network
BK: Brent-Kung Parallel Prefix Network
w – word size
N – size of PPN

SLIDE 26

The Best Long-Operand Adders Proposed to Date

H.D. Nguyen, B. Pasca, T.B. Preußer, "FPGA-Specific Arithmetic Optimizations of Short-Latency Adders," FPL 2011, Chania, Greece

  • Adders based on Carry-Select Architecture (rather than PPN Architecture)
  • Three specific architectures proposed:
      • AAM: Add-Add-Multiplex
      • CAI: Compare-Add-Increment
      • CCA: Compare-Compare-Add
  • Limited results (only Virtex 5) included in the original paper
  • AAM architecture re-implemented and results collected for different FPGAs

SLIDE 27

Comparison with Other Adders – Virtex 5

SLIDE 28

Comparison with Other Adders – Virtex 5

SLIDE 29

Comparison with Other Adders – Spartan 6

SLIDE 30

Comparison with Other Adders – Spartan 6

SLIDE 31

Comparison with Other Adders – Cyclone IV

SLIDE 32

Comparison with Other Adders – Stratix III

SLIDE 33

Comparison Between Modular Adders – Xilinx Virtex 5

SLIDE 34

Comparison Between Modular Adders – Xilinx Virtex 5

SLIDE 35

Comparison Between Modular Adders – Xilinx Spartan 6

SLIDE 36

Comparison Between Modular Adders – Xilinx Spartan 6

SLIDE 37

Comparison Between Modular Adders – Altera Stratix III

SLIDE 38

Comparison Between Modular Adders – Altera Stratix III

SLIDE 39

Comparison Between Modular Adders – Altera Cyclone IV

SLIDE 40

Comparison Between Modular Adders – Altera Cyclone IV

SLIDE 41

Modular Adder/Subtractor

[Figure: modular adder/subtractor datapath with inputs A, B, P and $2^n - P$, control signal SUB, n-bit buses, 0/1 multiplexers, carry-outs cout#1 and cout#2, and output R]
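
Since the figure itself did not survive extraction, the following Python sketch is only one plausible behavioral reading of it, not the authors' datapath. It assumes 0 ≤ A, B < P < 2^n, that SUB = 1 makes the first adder compute A − B in two's complement, and that the second adder then corrects with +P instead of +(2^n − P). The name `mod_add_sub` is hypothetical.

```python
def mod_add_sub(a: int, b: int, p: int, n: int, sub: bool) -> int:
    """Return (a - b) mod p when sub is True, otherwise (a + b) mod p."""
    mask = (1 << n) - 1

    if sub:
        t1 = a + (~b & mask) + 1        # first adder: A + NOT(B) + 1 = A - B
        cout1, s1 = t1 >> n, t1 & mask  # cout#1 = 1 means no borrow (A >= B)
        s2 = (s1 + p) & mask            # second adder: correct a negative result by +P
        return s1 if cout1 else s2

    t1 = a + b                          # first adder: A + B
    cout1, s1 = t1 >> n, t1 & mask
    t2 = s1 + ((1 << n) - p)            # second adder: subtract P by adding 2^n - P
    cout2, s2 = t2 >> n, t2 & mask
    return s2 if (cout1 | cout2) else s1
```

Under these assumptions both modes use the same two adders, with multiplexers choosing the operands and the final result.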

SLIDE 42

Overhead of Modular Adder/Subtractor – Altera Cyclone IV

SLIDE 43

Overhead of Modular Adder/Subtractor – Altera Stratix III

SLIDE 44

Overhead of Modular Adder/Subtractor – Xilinx Spartan 6

SLIDE 45

Overhead of Modular Adder/Subtractor – Xilinx Virtex 5

SLIDE 46

Proposed New Dedicated Resources of Modern FPGAs

  • Dedicated (hardwired) PPNs (Kogge-Stone and/or Brent-Kung)
  • Standard sizes (e.g., 32 and/or 64)
  • Support for fast addition and modular addition of large operand sizes (as described in this paper)
  • Support for fast addition and modular addition of medium operand sizes (up to 64), using classical PPN adders
  • Pipeline registers that can be activated or bypassed

SLIDE 47

Conclusions

  • A new family of High-Radix Parallel Prefix Network Adders using fast carry chains of modern FPGAs
  • New family outperforming the best previously known FPGA-specific adders and modular adders for Xilinx FPGAs
  • Very small performance penalty for an extension to adders/subtractors
  • A proposal for embedding medium-size hardwired PPN structures in the new generations of FPGAs

SLIDE 48

Future Work

  • Possible optimizations for Altera FPGAs
  • Better (preferably analytical) method of choosing an optimum word size for:
      • Other FPGA families
      • Other operand sizes
  • Optimal method of pipelining for adders and modular adders
  • Extended and more detailed proposal of new FPGA resources supporting fast addition

SLIDE 49

Thank you!

Questions? Suggestions?

ATHENa: http://cryptography.gmu.edu/athena
CERG: http://cryptography.gmu.edu