SLIDE 1

Preliminaries | FPGA Implementation | Results, Comparisons and Conclusions

High-Speed Elliptic Curve Cryptography Accelerator for Koblitz Curves

Kimmo Järvinen, Jorma Skyttä

Helsinki University of Technology
Department of Signal Processing and Acoustics
Otakaari 5A, FIN-02150, Finland
{Kimmo.Jarvinen,Jorma.Skytta}@tkk.fi

April 14, 2008

FCCM 2008, April 14–15, 2008, Palo Alto, CA, USA

K. Järvinen, J. Skyttä, Helsinki University of Technology

SLIDE 2

Outline

1. Preliminaries
  • Elliptic Curve Cryptography
  • Koblitz Curves
  • Window Method and Multiple Point Multiplication
2. FPGA Implementation
  • Design Specifications
  • Architecture of the Implementation
3. Results, Comparisons and Conclusions
  • Results
  • Comparisons
  • Conclusions and Future Work

SLIDE 4

Introduction to Elliptic Curve Cryptography

  • Public-key cryptography method which uses a group of points on an elliptic curve, E, defined over a finite field, $F_q$
  • Faster operations and shorter keys than, e.g., RSA at a comparable security level

Elliptic Curve Point Multiplication

  • Q = kP, where k is a positive integer and P = (x, y) is a point on E
  • Computed with point additions, $P_1 + P_2$, and point doublings, $2P_1$
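The point multiplication described above can be sketched with the textbook double-and-add loop. This is a minimal illustration, not the accelerator's method: the curve (y^2 = x^3 + 2x + 3 over F_97), the base point (3, 6), and all function names are assumptions chosen for brevity, and a small prime field is used instead of the binary field the deck targets. It needs Python 3.8 or later for `pow(x, -1, m)`.

```python
# Toy parameters (assumed for illustration only): y^2 = x^3 + 2x + 3 over F_97.
P_MOD, A, B = 97, 2, 3
O = None  # point at infinity (group identity)

def on_curve(P):
    if P is O:
        return True
    x, y = P
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0

def point_add(P1, P2):
    """Affine point addition; also handles doubling (P1 == P2)."""
    if P1 is O:
        return P2
    if P2 is O:
        return P1
    x1, y1 = P1
    x2, y2 = P2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O  # P2 is the negative of P1
    if P1 == P2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def point_mul(k, P):
    """Left-to-right double-and-add: one doubling per bit, one addition per 1 bit."""
    Q = O
    for bit in bin(k)[2:]:
        Q = point_add(Q, Q)      # point doubling
        if bit == "1":
            Q = point_add(Q, P)  # point addition
    return Q

P = (3, 6)  # on the toy curve: 6^2 = 36 = 3^3 + 2*3 + 3
Q = point_mul(20, P)
```

The loop makes the cost structure visible: the number of doublings is fixed by the bit length of k, while the number of additions depends on how many bits are set.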

SLIDE 8

Point Multiplication on Koblitz Curves

Koblitz curves
  • Frobenius maps, $\phi(P_1)$, instead of point doublings ⇒ faster computation
  • k must be converted to τ-adic representation

Point multiplication
  • Frobenius map for all bits of k
  • Point addition if the bit is 1, point subtraction if it is 1̄ (the digit −1)

Example (A = point addition, S = point subtraction)

  1001110001001111001
  A  AAA   A  AAAA  A     10 operations

  101001̄0001010001̄001
  A A  S   A A   S  A     7 operations
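The effect of signed digits in the example can be reproduced with the ordinary base-2 non-adjacent form (NAF). This is a sketch of the integer analogue only: the τ-adic NAF used on Koblitz curves divides by τ instead of 2 and is not shown here. The function name `naf` is an assumption. Read as base-2 numbers, the two digit strings of the example denote the same integer, so the integer NAF yields the same counts.

```python
def naf(k):
    """Return the NAF digits of k > 0, most significant digit first."""
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)  # +1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d           # makes k divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits[::-1]

k = int("1001110001001111001", 2)
binary_ops = bin(k).count("1")                   # additions with plain binary digits
naf_digits = naf(k)
naf_ops = sum(1 for d in naf_digits if d != 0)   # additions and subtractions with NAF
print(binary_ops, naf_ops)                       # prints "10 7"
```

Subtractions cost the same as additions on elliptic curves because negating a point is essentially free, which is why trading runs of ones for a few −1 digits pays off.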

SLIDE 11

Window Method

  • Windowing further reduces the number of point additions

Idea of windowing
  • Instead of computing AAA several times: precompute AAA
  • Use the precomputed value every time for the string 111!
  • We precompute values for the strings 101̄, 101, and 1001

Example
  τNAF:          101̄01000100100101001̄01̄    operations: A S A S A A A S S (9)
  Width-4 τNAF:  3010000007̄0000500005̄      operations: A A S A S (5)

Precomputations: 3
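A width-w recoding can be sketched for ordinary integers as follows. This is the base-2 width-w NAF, used here as a stand-in for the width-4 τNAF of the slides (the τ-adic version reduces modulo a power of τ and is more involved). The function names and the 19-bit test integer are assumptions chosen for illustration.

```python
def wnaf(k, w):
    """Width-w NAF of k > 0, least significant digit first."""
    digits = []
    while k > 0:
        if k & 1:
            d = k % (1 << w)          # k mod 2^w
            if d >= (1 << (w - 1)):
                d -= (1 << w)         # map into the range (-2^(w-1), 2^(w-1))
            k -= d                    # k is now divisible by 2^w
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits

def nonzero(digits):
    return sum(1 for d in digits if d != 0)

k = int("1001110001001111001", 2)
naf_count = nonzero(wnaf(k, 2))   # w = 2 is the ordinary NAF: 7 nonzero digits
w4 = wnaf(k, 4)
w4_count = nonzero(w4)            # 4 nonzero digits for this k
```

With w = 4 the nonzero digits are odd and smaller than 8 in absolute value, so the loop needs the multiples P, 3P, 5P and 7P: three precomputed points besides P itself, in line with the "Precomputations: 3" note on the slide.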

SLIDE 14

Multiple Point Multiplication

  • Sum of n point multiplications: $Q = k^{(1)}P^{(1)} + k^{(2)}P^{(2)} + \dots + k^{(n)}P^{(n)}$

Efficient computation with Shamir's trick
  • Precompute all combinations of $P^{(1)}, \dots, P^{(n)}$, e.g. $P^{(1)} + P^{(2)}$ and $P^{(1)} - P^{(2)}$
  • Interpret $k^{(1)}, \dots, k^{(n)}$ as an n-row table, e.g.

      1001001̄01001010
      101̄00100101001̄0

  • Frobenius map for all columns
  • Point addition with precomputed point if column is nonzero

τ-adic joint sparse form (τJSF)
  • τJSF maximizes the number of zero columns in the table
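Shamir's trick for n = 2 can be sketched as below. Several simplifications are assumed: the rows are independent base-2 NAFs rather than a τJSF, doubling stands in for the Frobenius map, and the group is the integers modulo 1009 under addition so that the result is easy to check. The names `shamir2`, `add`, `neg` and `MOD` are hypothetical.

```python
def naf(k):
    """NAF digits of k >= 0, least significant digit first."""
    digits = []
    while k > 0:
        d = 2 - (k % 4) if k & 1 else 0
        k -= d
        digits.append(d)
        k >>= 1
    return digits

def shamir2(k1, k2, P1, P2, add, neg, zero):
    """Compute k1*P1 + k2*P2 with one shared loop over digit columns."""
    # One precomputed point per nonzero column up to sign: (3^2 - 1) / 2 = 4 points.
    table = {
        (1, 0): P1,
        (0, 1): P2,
        (1, 1): add(P1, P2),
        (1, -1): add(P1, neg(P2)),
    }
    a, b = naf(k1), naf(k2)
    n = max(len(a), len(b))
    a += [0] * (n - len(a))
    b += [0] * (n - len(b))
    Q = zero
    for i in reversed(range(n)):
        Q = add(Q, Q)                  # doubling (a Frobenius map on a Koblitz curve)
        col = (a[i], b[i])
        if col == (0, 0):
            continue                   # zero column: no addition needed
        if col in table:
            Q = add(Q, table[col])
        else:                          # negated column: subtract the stored point
            Q = add(Q, neg(table[(-col[0], -col[1])]))
    return Q

# Stand-in group for checking: integers modulo 1009 under addition.
MOD = 1009
add = lambda x, y: (x + y) % MOD
neg = lambda x: (-x) % MOD
result = shamir2(1234, 5678, 5, 7, add, neg, 0)
```

Each nonzero column costs one addition regardless of how many rows are nonzero in it, which is why a joint recoding that maximizes zero columns reduces the work.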

SLIDE 15

Algorithmic Comparison

Window method

  Input: Integer k, point P
  Output: Result point Q = kP
  $k_{\ell-1} \dots k_0$ ← w-τNAF(k)
  $P_1, P_3, \dots, P_{2^{w-1}-1}$ ← PreC(P)
  Q ← O
  for i = ℓ − 1 down to 0 do
    Q ← φ(Q)
    if $k_i \neq 0$ then
      Q ← Q + sign($k_i$) $P_{|k_i|}$
    end if
  end for
  Q ← xy(Q)

Multiple point multiplication

  Input: n integers $k^{(i)}$, n points $P^{(i)}$
  Output: Result point $Q = \sum_{i=1}^{n} k^{(i)} P^{(i)}$
  $k_{\ell-1} \dots k_0$ ← τJSF($k^{(1)}, \dots, k^{(n)}$)
  $P_1, P_2, \dots, P_{(3^n-1)/2}$ ← PreC($P^{(1)}, \dots, P^{(n)}$)
  Q ← O
  for i = ℓ − 1 down to 0 do
    Q ← φ(Q)
    if $k_i \neq 0$ then
      Q ← Q + sign($k_i$) $P_{|k_i|}$
    end if
  end for
  Q ← xy(Q)

SLIDE 22

Objectives of the Implementation

Specifications
  • NIST K-163, Koblitz curve
  • Finite field $F_{2^{163}}$ with polynomial basis
  • (Multiple) point multiplications with n = 1, n = 2, and n = 3
  • FPGAs offer a combination of high speed and flexibility
  • Primary application: proof-of-concept implementation for the Packet-Level Authentication (PLA) communication scheme

Design Principles
Maximize throughput and maintain low computation time by...
  1. Utilizing the common structure of the algorithms
  2. Using specific processing units from our previous works
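The field named in the specifications can be sketched in software as follows. Elements of $F_{2^{163}}$ in polynomial basis are stored as Python integers whose bits are the polynomial coefficients, and the reduction polynomial is the NIST one for this field, x^163 + x^7 + x^6 + x^3 + 1. The function names are assumptions, and the bit-serial loop only illustrates the arithmetic; it says nothing about how the hardware multipliers are built.

```python
M = 163
# Reduction polynomial x^163 + x^7 + x^6 + x^3 + 1
F_POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1

def gf_add(a, b):
    """Addition in F_2^m is a bitwise XOR."""
    return a ^ b

def gf_mul(a, b):
    """Shift-and-add multiplication, reducing modulo F_POLY after each shift."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:          # degree reached 163: subtract the reduction polynomial
            a ^= F_POLY
    return r

def gf_sqr(a):
    return gf_mul(a, a)

def gf_inv(a):
    """Inverse of a != 0 via Fermat's little theorem: a^(2^M - 2)."""
    r = a
    for _ in range(M - 2):
        r = gf_mul(gf_sqr(r), a)   # r = a^(2^j - 1) after j - 1 rounds
    return gf_sqr(r)

def frobenius(point):
    """Frobenius map on an affine point: square both coordinates."""
    x, y = point
    return (gf_sqr(x), gf_sqr(y))

a = 0x1234567890ABCDEF1234567890ABCDEF12345678
b = 0x0FEDCBA9876543210FEDCBA9876543210FEDCBA9
product = gf_mul(a, b)
```

Squaring is linear in characteristic 2, which is what makes the Frobenius map so much cheaper than a point doubling.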

SLIDE 23

Top Level Architecture

[Figure: top-level block diagram. Blocks: τNAF/JSF converter, preprocessor, FIFO buffer, registers, main processor, postprocessor, control. Signals: datain, $P^{(i)}$, $k^{(i)}$, $(k_i, f_i)$, $|k_i|$, sign($k_i$), $f_i$, $P_{1 \dots 13}$, $P_{|k_i|}$, (X, Y, Z), Q = (x, y), dataout. Pipeline labels: 1st stage, 2nd stage, 3rd stage.]

Specialized processing units
  • τNAF/JSF converter ⇒ Width-4 τNAF or 2/3-term τJSF
  • Preprocessor ⇒ Precomputations
  • Main processor ⇒ For loop
  • Postprocessor ⇒ Coordinate conversion, xy(Q)

SLIDE 26

Main Processor: Idea

Background
  • Point additions computed sequentially
  • Data dependencies prevent efficient parallelization in point additions

Idea
  • Most operations do not need $Y_1$
  • Point additions (and Frobenius maps) can be interleaved

Point addition $(X_3, Y_3, Z_3) = (X_1, Y_1, Z_1) + (x_2, y_2)$:

  $A = Y_1 + y_2 Z_1^2$;  $B = X_1 + x_2 Z_1$
  $C = B Z_1$;  $Z_3 = C^2$;  $D = x_2 Z_3$
  $X_3 = A^2 + C(A + B^2 + aC)$
  $Y_3 = (D + X_3)(AC + Z_3) + (y_2 + x_2) Z_3^2$

Z0/1: computation of $Z_3$
X0/1: computation of $X_3$
Y0–3: computation of $Y_3$

[Figure: timing diagram of four interleaved point additions (rows 1: to 4:), each row containing the blocks Z0 Z1, X0 X1, Y0 Y2 and Y1 Y3.]
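The addition formulas above can be checked numerically against plain affine addition. The sketch below assumes the K-163 curve y^2 + xy = x^3 + x^2 + 1 (so a = b = 1) over $F_{2^{163}}$ with the NIST reduction polynomial, and projective coordinates with x = X/Z and y = Y/Z^2. All function names are hypothetical, and the test points are found by solving the curve equation for small x values rather than taken from any standard.

```python
M = 163
F_POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1
A_COEF, B_COEF = 1, 1            # assumed K-163 coefficients

def mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= F_POLY
    return r

def sqr(a):
    return mul(a, a)

def inv(a):
    r = a
    for _ in range(M - 2):
        r = mul(sqr(r), a)
    return sqr(r)                # a^(2^M - 2)

def on_curve(x, y):
    return sqr(y) ^ mul(x, y) == mul(sqr(x), x) ^ mul(A_COEF, sqr(x)) ^ B_COEF

def solve_y(x):
    """Return y with (x, y) on the curve, or None if there is none (x != 0)."""
    # With y = x*z the curve equation becomes z^2 + z = c, c = x + a + b/x^2.
    c = x ^ A_COEF ^ mul(B_COEF, sqr(inv(x)))
    z, t = 0, c
    for _ in range((M + 1) // 2):     # half-trace: sum of c^(4^i)
        z ^= t
        t = sqr(sqr(t))
    return mul(x, z) if sqr(z) ^ z == c else None

def find_points(count):
    pts, x = [], 2
    while len(pts) < count:
        y = solve_y(x)
        if y is not None:
            pts.append((x, y))
        x += 1
    return pts

def affine_add(P, Q):
    """Affine addition for points with different x-coordinates."""
    (x1, y1), (x2, y2) = P, Q
    lam = mul(y1 ^ y2, inv(x1 ^ x2))
    x3 = sqr(lam) ^ lam ^ x1 ^ x2 ^ A_COEF
    return (x3, mul(lam, x1 ^ x3) ^ x3 ^ y1)

def ld_mixed_add(X1, Y1, Z1, x2, y2):
    """Mixed-coordinate addition, following the formulas on the slide."""
    A = Y1 ^ mul(y2, sqr(Z1))
    B = X1 ^ mul(x2, Z1)
    C = mul(B, Z1)
    Z3 = sqr(C)
    D = mul(x2, Z3)
    X3 = sqr(A) ^ mul(C, A ^ sqr(B) ^ mul(A_COEF, C))
    Y3 = mul(D ^ X3, mul(A, C) ^ Z3) ^ mul(y2 ^ x2, sqr(Z3))
    return X3, Y3, Z3

def to_affine(XYZ):
    X, Y, Z = XYZ
    zi = inv(Z)
    return (mul(X, zi), mul(Y, sqr(zi)))

P, Q = find_points(2)
Z1 = 5                                       # arbitrary nonzero scaling
X1, Y1 = mul(P[0], Z1), mul(P[1], sqr(Z1))   # P in projective form
match = to_affine(ld_mixed_add(X1, Y1, Z1, Q[0], Q[1])) == affine_add(P, Q)
```

Only the value A depends on $Y_1$, which is the observation behind the interleaving: the Z and X computations of the next addition can start before the previous $Y_3$ is finished.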

SLIDE 27

Main Processor: Implementation

Implementation strategy
  • Design coordinate-specific processing units built around field multipliers, with latencies: multiplication + 1

[Figure: datapaths of the Z, X and Y units, drawn from blocks labelled MULT, SQR and XOR, with inputs in0 to in4, a select signal s, and outputs out0 to out2.]

Up-left: Z unit, Up-right: X unit, Down: Y unit

SLIDE 29

Optimizations

  • The design includes 6 finite field multipliers:
      Preprocessor    1
      Main processor  4
      Postprocessor   1
  • Multiplier digit size, D, defines both latency and area

[Figure: three plots of latency (vertical axis, 500 to 4000) against digit size D (horizontal axis, ticks at 1–15, 17, 19, 21, 24, 28, 33, 41, 55, 82, 163; the middle plot starts at D = 3 and the right plot covers D = 1–11). Left and middle plots: curves Window w = 4, Multiple n = 2, Multiple n = 3, and TNAF/JSF. Right plot: curves Coordinate conversion xy(Q) and TNAF/JSF.]

Left: preprocessor, Middle: main processor, Right: postprocessor
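The digit sizes marked on the horizontal axes (1–15, 17, 19, 21, 24, 28, 33, 41, 55, 82, 163) are consistent with a simple latency model, sketched below. The model is an assumption, not taken from the slides: a digit-serial multiplier that processes D bits per clock cycle needs ceil(163 / D) iterations per multiplication, so only digit sizes that lower this count are worth their extra area.

```python
M = 163  # field size in bits

def mult_iterations(D):
    """Assumed latency model: ceil(M / D) iterations per field multiplication."""
    return (M + D - 1) // D

# Digit sizes that need strictly fewer iterations than the next smaller digit size.
useful = [D for D in range(1, M + 1)
          if D == 1 or mult_iterations(D) < mult_iterations(D - 1)]
print(useful)
```

Under this model D = 16 is skipped because it needs 11 iterations, the same as D = 15, while costing more area.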

SLIDE 31

Results from Quartus II 6.0 SP1

Area consumption in Stratix II S180C3

  Component        ALUTs    Regs.    ALMs     M4Ks
  Converter        4,906    2,862    2,862    7
  Preprocessor     2,037    1,546    1,332    14
  Main processor   16,642   10,045   10,930
  Postprocessor    2,874    2,336    1,953
  Total            26,616   16,966   16,930   21

Computation time and throughput

  Operation         Time (µs)   Throughput (ops/s)
  Window, w = 4     16.36       161,290
  Multiple, n = 2   24.28       70,773
  Multiple, n = 3   35.06       60,603
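The throughput figures are higher than the reciprocal of the computation times, which is what one would expect from a pipelined design. The short calculation below makes this explicit, assuming that throughput is given in operations per second; the derived quantities (result interval and operations in flight) are an interpretation added here, not values from the slides.

```python
results = {
    "Window, w = 4":   (16.36, 161_290),
    "Multiple, n = 2": (24.28, 70_773),
    "Multiple, n = 3": (35.06, 60_603),
}

def pipeline_figures(time_us, ops_per_s):
    interval_us = 1e6 / ops_per_s        # average time between finished results
    in_flight = time_us / interval_us    # operations overlapping in the pipeline
    return interval_us, in_flight

for name, (time_us, ops_per_s) in results.items():
    interval, overlap = pipeline_figures(time_us, ops_per_s)
    print(f"{name}: one result every {interval:.2f} us, "
          f"about {overlap:.2f} operations in flight")
```

For the window method this gives one result roughly every 6.2 µs even though a single operation takes 16.36 µs from start to finish.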

SLIDE 32

Comparisons

FPGA-based implementations using NIST K-163

  Ref.        n   Device    Area                     Time (µs)   Throughput (ops/s)
  Dimitrov    1   Vir.-II   6,494 slices + memory    35.75       27,972
  Järvinen1   3   Str. II   67,467 ALMs + memory     114.2       166,000
  Järvinen2   1   Str. II   13,472 ALMs + memory     25.81       49,318
  Lutz        1   Vir.-E    10,017 LUTs, 1,930 FFs   75          13,333
  Okada       1   F. 10K    -                        45,600      22
  Ours        1   Str. II   16,930 ALMs, 21 M4Ks     16.36       161,290
  Ours        3   Str. II   16,930 ALMs, 21 M4Ks     35.06       60,603

  • Faster than other published implementations
  • Only 1/4 of area compared to Järvinen1 ⇒ 4 × 60,603 ≈ 242,000 ⇒ Speedup 46 %
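The 46 % figure follows from the table entries for n = 3, assuming that four copies of the design would fit in the area of Järvinen1 and that throughput scales linearly with the number of copies:

```latex
\[
\frac{67{,}467\ \text{ALMs}}{16{,}930\ \text{ALMs}} \approx 3.99 \approx 4,
\qquad
4 \times 60{,}603 = 242{,}412 \approx 242{,}000\ \text{ops/s},
\qquad
\frac{242{,}412}{166{,}000} \approx 1.46 .
\]
```

The comparison ignores the memory blocks listed alongside the ALM counts.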

SLIDE 35

Conclusions and Future Work

Conclusions
We showed that very high throughput and low computation time are achievable with reasonable cost in modern FPGAs by...
  • Selecting the most efficient algorithms (Koblitz curves, window methods, multiple point multiplications)
  • Utilizing the common structure of the algorithms
  • Pipelining carefully optimized dedicated processing units

Future work
We will study at least the following aspects...
  • Other field sizes, faster τNAF/JSF converter, latency-area product optimizations, side-channel resistance, etc.

SLIDE 36

Thank you. Questions?