SLIDE 1

Preliminaries Algorithms and Implementation Results and Discussion

FPGA Design of Self-certified Signature Verification on Koblitz Curves

Kimmo Järvinen, Juha Forsten, Jorma Skyttä

Helsinki University of Technology, Signal Processing Laboratory
Otakaari 5A, FIN-02150, Finland
{Kimmo.Jarvinen,Juha.Forsten,Jorma.Skytta}@tkk.fi

September 12, 2007

K. Järvinen, J. Forsten and J. Skyttä, CHES 2007, September 11-13, 2007, Vienna, Austria

SLIDE 2

Outline

1. Preliminaries
  • Introduction
  • Koblitz curves
  • Signatures
2. Algorithms and Implementation
  • Point multiplication
  • Precomputation
  • Implementation
3. Results and Discussion
  • Results on an FPGA
  • Conclusions and future work


SLIDE 4

Introduction

Packet Level Authentication (PLA) [1]

  • Enormous speed requirements!
  • Elliptic curve cryptography, because short signatures and fast performance are needed
  • Koblitz curve, NIST K-163, used to maximize speed
  • Self-certified ID based signatures, because they are short and computationally less complex

Development in FPGA technology

  • Growth in resources enables massive parallelization
  • Point multiplication times < 100 µs have been reported

We focus on maximizing operations per second instead of minimizing computation time of a single operation

[1] See http://www.tcs.hut.fi/Software/PLA/new/index.shtml

SLIDE 5

Koblitz curves

  • Koblitz curves have the form $E_K: y^2 + xy = x^3 + ax^2 + 1$
  • If $P = (x, y)$ is a point on $E_K$, then its image under the Frobenius endomorphism, $\phi(P) = (x^2, y^2)$, is also on $E_K$
  • Very efficient point multiplication:
      • Integer represented in τ-adic non-adjacent form (NAF) [2]
      • Point doublings replaced by Frobenius maps
      • Only m/3 point additions

[2] Solinas, Des. Codes Cryptogr. 19(2-3), 2000
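The Frobenius property stated on this slide can be checked numerically on a toy field. The sketch below is only an illustration under stated assumptions: it uses GF(2^4) with the reduction polynomial x^4 + x + 1 and a = 1 instead of the 163-bit field of K-163, and all function names are hypothetical.

```python
# Toy field GF(2^4) with reduction polynomial x^4 + x + 1.
# This is an assumption for illustration; K-163 works in GF(2^163).
M = 4
MODULUS = 0b10011
A = 1  # curve coefficient a, assumed to be 1 here


def gf_mul(a, b):
    """Multiply two field elements (carry-less multiply with reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:  # degree reached M, so reduce
            a ^= MODULUS
    return result


def on_curve(x, y):
    """Check the curve equation y^2 + xy = x^3 + a*x^2 + 1."""
    x2 = gf_mul(x, x)
    lhs = gf_mul(y, y) ^ gf_mul(x, y)
    rhs = gf_mul(x2, x) ^ gf_mul(A, x2) ^ 1
    return lhs == rhs


def frobenius(point):
    """Frobenius map: square both coordinates."""
    x, y = point
    return gf_mul(x, x), gf_mul(y, y)


# Enumerate all affine points of the toy curve and map each one.
POINTS = [(x, y) for x in range(1 << M) for y in range(1 << M) if on_curve(x, y)]
ALL_MAPPED_ON_CURVE = all(on_curve(*frobenius(p)) for p in POINTS)
print(len(POINTS), ALL_MAPPED_ON_CURVE)
```

Squaring is the only field operation the map needs, which is why replacing point doublings with Frobenius maps is attractive in hardware.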


SLIDE 9

Self-certified identity based signatures

  • Used in the current version of the PLA
  • Signature verification is the most critical operation

Signature verification

A signature is verified by computing

  $W_A = \mathrm{DECOMPRESS}(r_A - \mathrm{HASH}(ID_A), b_A) - r_A W_D$, and
  $\mathrm{HASH}(M) = c - [dG + cW_A]_x \pmod{r}$,

which simplify into the 3-term point multiplication

  $dG + c(uG) - c r_A W_D = k_1 P_1 + k_2 P_2 + k_3 P_3$

Here $uG$ stands in for the decompressed point, as follows from substituting $W_A$ into $dG + cW_A$.

SLIDE 10

Point multiplication

$Q = k_1 P_1 + k_2 P_2 + k_3 P_3$

  • Shamir's trick ⇒ 3-term double-and-add algorithm
  • 3-term τ-adic joint sparse form [3]

Simplified algorithm

1. Precompute all possible combinations $R_{k_1,k_2,k_3} = k_{1,j} P_1 + k_{2,j} P_2 + k_{3,j} P_3$
2. Apply the Frobenius map φ at every digit position
3. If $(k_{1,j}, k_{2,j}, k_{3,j}) \neq 000$, add $R_{k_1,k_2,k_3}$ to Q using mixed coordinate point addition [a]

[a] Al-Daoud et al., IEEE Trans. Comput. 51(8), 2002
[3] Brumley, ICICS 2006, LNCS 4307
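The structure of the simplified algorithm can be sketched with a generic 3-term Shamir's trick. This is a minimal sketch under loud assumptions, not the authors' design: it uses plain binary digits and a doubling step where the slides use τ-adic joint sparse form digits and the Frobenius map, and it uses integers modulo N as a stand-in group so that the result is easy to check. The names `shamir3`, `add`, `dbl` and `N` are hypothetical.

```python
from itertools import product


def shamir3(scalars, points, add, dbl, zero):
    """Compute k1*P1 + k2*P2 + k3*P3 with one shared double-and-add loop."""
    # Step 1: precompute every combination b1*P1 + b2*P2 + b3*P3, bits in {0, 1}.
    table = {}
    for bits in product((0, 1), repeat=3):
        acc = zero
        for bit, point in zip(bits, points):
            if bit:
                acc = add(acc, point)
        table[bits] = acc

    q = zero
    length = max(k.bit_length() for k in scalars)
    for j in reversed(range(length)):
        # Step 2: one doubling per digit position (Frobenius map in the slides).
        q = dbl(q)
        # Step 3: a single table addition when the digit column is non-zero.
        column = tuple((k >> j) & 1 for k in scalars)
        if column != (0, 0, 0):
            q = add(q, table[column])
    return q


# Stand-in group: integers modulo N under addition (assumption for testing only).
N = 1_000_003
result = shamir3(
    (45, 1234, 999),
    (17, 23, 31),
    add=lambda a, b: (a + b) % N,
    dbl=lambda a: (2 * a) % N,
    zero=0,
)
print(result == (45 * 17 + 1234 * 23 + 999 * 31) % N)
```

Allowing signed digits, as the joint sparse form does, enlarges the table but lowers the number of non-zero columns, which is the trade made on the following slides.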


SLIDE 17

Precomputed points

(1̄ denotes the digit −1)

k3k2k1 | Point         | k3k2k1 | Point          | k3k2k1 | Point | k3k2k1 | Point
000    | R0 = O        | 101̄    | R7 = R3 − R1   | n/a    |       | 1̄01    | −R7
001    | R1 = P1       | 110    | R8 = R3 + R2   | 001̄    | −R1   | 1̄1̄0    | −R8
010    | R2 = P2       | 11̄0    | R9 = R3 − R2   | 01̄0    | −R2   | 1̄10    | −R9
100    | R3 = P3       | 111    | R10 = R8 + R1  | 1̄00    | −R3   | 1̄1̄1̄    | −R10
011    | R4 = R2 + R1  | 111̄    | R11 = R8 − R1  | 01̄1̄    | −R4   | 1̄1̄1    | −R11
011̄    | R5 = R2 − R1  | 11̄1    | R12 = R9 + R1  | 01̄1    | −R5   | 1̄11̄    | −R12
101    | R6 = R3 + R1  | 11̄1̄    | R13 = R9 − R1  | 1̄01̄    | −R6   | 1̄11    | −R13

  • Precomputations require 10 point additions(/subtractions)
  • Pairs $(R_k, R_{k+1})$ are computed so that
      1. $R_k = R_i + R_j$, and
      2. $R_{k+1} = R_i - R_j$
  • Unified point addition and subtraction: $(R_k, R_{k+1}) \leftarrow R_i \pm R_j$
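The pairing of additions and subtractions in the table can be traced with a short sketch. It is an illustration only: integers stand in for curve points (an assumption that makes the table easy to verify), and the names `PAIRS`, `DIGITS`, `precompute` and `full_table` are hypothetical.

```python
# Each entry (k, i, j) means: R[k] = R[i] + R[j] and R[k+1] = R[i] - R[j].
PAIRS = [(4, 2, 1), (6, 3, 1), (8, 3, 2), (10, 8, 1), (12, 9, 1)]

# Signed digit triple (k3, k2, k1) that each R index represents in the table.
DIGITS = {
    0: (0, 0, 0), 1: (0, 0, 1), 2: (0, 1, 0), 3: (1, 0, 0),
    4: (0, 1, 1), 5: (0, 1, -1), 6: (1, 0, 1), 7: (1, 0, -1),
    8: (1, 1, 0), 9: (1, -1, 0), 10: (1, 1, 1), 11: (1, 1, -1),
    12: (1, -1, 1), 13: (1, -1, -1),
}


def precompute(p1, p2, p3):
    """Build R0..R13 with five unified add/subtract steps (10 operations)."""
    r = {0: 0, 1: p1, 2: p2, 3: p3}
    operations = 0
    for k, i, j in PAIRS:
        r[k] = r[i] + r[j]      # sum of the pair
        r[k + 1] = r[i] - r[j]  # difference of the pair
        operations += 2
    return r, operations


def full_table(p1, p2, p3):
    """Map every signed digit triple to its value; negatives cost no additions."""
    r, _ = precompute(p1, p2, p3)
    table = {}
    for index, digits in DIGITS.items():
        table[digits] = r[index]
        table[tuple(-d for d in digits)] = -r[index]
    return table


R, OPS = precompute(5, 11, 23)
print(OPS, len(full_table(5, 11, 23)))
```

On the curve, the negated entries come from point negation, which is far cheaper than an addition, so only the 10 additions and subtractions listed on the slide need to be computed.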


SLIDE 21

Unified point addition and subtraction

Point addition, $(x_3, y_3) = (x_1, y_1) + (x_2, y_2)$:

  $\lambda = \frac{y_1 + y_2}{x_1 + x_2}$
  $x_3 = \lambda^2 + \lambda + x_1 + x_2 + a$
  $y_3 = \lambda(x_1 + x_3) + x_3 + y_1$

Point subtraction, $(x_4, y_4) = (x_1, y_1) - (x_2, y_2)$:

  $\lambda = \frac{y_1 + y_2 + x_2}{x_1 + x_2}$
  $x_4 = \lambda^2 + \lambda + x_1 + x_2 + a$
  $y_4 = \lambda(x_1 + x_4) + x_4 + y_1$

  • The inversion of $x_1 + x_2$ is the same in both [4]
  • Some additions can be saved by rearranging operations
  • Total cost reduces from 2I + 4M + 2S + 17A to I + 4M + 2S + 14A (I, M, S, A: field inversion, multiplication, squaring, addition)

[4] Mentioned by Okeya et al. in ACISP 2005, LNCS 3574
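The two formula sets above differ only in the numerator of λ, so one inversion of x1 + x2 can serve both results. The following sketch shows that sharing on a toy field. It assumes GF(2^4) with reduction polynomial x^4 + x + 1 and a = 1, requires x1 ≠ x2, and does not reproduce the rearrangement that saves additions; all names are hypothetical.

```python
# Toy field GF(2^4), reduction polynomial x^4 + x + 1 (assumption for illustration).
M = 4
MODULUS = 0b10011
A = 1  # curve coefficient a, assumed to be 1


def gf_mul(a, b):
    """Multiply two field elements."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= MODULUS
    return result


def gf_inv(a):
    """Invert a non-zero element as a^(2^M - 2)."""
    result = 1
    for _ in range((1 << M) - 2):
        result = gf_mul(result, a)
    return result


def on_curve(x, y):
    """Check y^2 + xy = x^3 + a*x^2 + 1."""
    x2 = gf_mul(x, x)
    return gf_mul(y, y) ^ gf_mul(x, y) == gf_mul(x2, x) ^ gf_mul(A, x2) ^ 1


def unified_add_sub(p, q):
    """Return (p + q, p - q) using a single field inversion. Requires x1 != x2."""
    (x1, y1), (x2, y2) = p, q
    shared_inverse = gf_inv(x1 ^ x2)  # the only inversion
    results = []
    for numerator in (y1 ^ y2, y1 ^ y2 ^ x2):  # addition, then subtraction
        lam = gf_mul(numerator, shared_inverse)
        x = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
        y = gf_mul(lam, x1 ^ x) ^ x ^ y1
        results.append((x, y))
    return tuple(results)


CURVE_POINTS = [(x, y) for x in range(16) for y in range(16) if on_curve(x, y)]
```

A quick consistency check is that (P + Q) − Q returns P whenever both steps are defined, which the formulas satisfy on this toy curve.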


SLIDE 23

Montgomery's trick

Method               | Cost                                  | I = 9M
Naïve                | 10 (I + 2M + S + 8A) + 5A             | 110M
Unified              | 5 (I + 4M + 2S + 14A)                 | 65M
Unified + Montgomery | I + 17M + 2S + 9A + 5 (4M + 2S + 14A) | 46M

  • Trades inversions for multiplications
  • $1/x_1$ and $1/x_2$ are computed so that $1/x_1 = x_2/(x_1 x_2)$ and $1/x_2 = x_1/(x_1 x_2)$
  • n inversions are computed with 3(n − 1) multiplications and 1 inversion
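Montgomery's trick for simultaneous inversion can be written generically over any field. The sketch below is a standard formulation, not taken from the slides; it assumes the field is supplied through hypothetical `mul` and `inv` callbacks, and the demonstration uses integers modulo the prime 101 as a stand-in for GF(2^163).

```python
def batch_inverse(values, mul, inv):
    """Invert all values with one inversion and 3(n - 1) multiplications."""
    n = len(values)
    # Running products x0, x0*x1, ..., x0*...*x(n-1): n - 1 multiplications.
    prefix = [values[0]]
    for value in values[1:]:
        prefix.append(mul(prefix[-1], value))

    accumulator = inv(prefix[-1])  # the single inversion
    inverses = [None] * n
    for i in range(n - 1, 0, -1):
        # accumulator holds 1 / (x0 * ... * xi) at this point.
        inverses[i] = mul(accumulator, prefix[i - 1])  # n - 1 multiplications
        accumulator = mul(accumulator, values[i])      # n - 1 multiplications
    inverses[0] = accumulator
    return inverses


# Demonstration over the integers modulo a small prime (assumption for testing).
PRIME = 101
demo = batch_inverse(
    [3, 7, 10],
    mul=lambda a, b: (a * b) % PRIME,
    inv=lambda a: pow(a, PRIME - 2, PRIME),
)
print(demo)
```

With I = 9M as on the slide, replacing five inversions by one inversion and extra multiplications is what brings the precomputation cost down.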


SLIDE 28

Architecture

[Figure: block diagram of an FAP with Multiplier, Adder and Squarer; the multiplier is drawn with a growing number of F-blocks. A second graphic relates Ops, Time and Area.]

Massey-Omura multiplier

  • Bit-serial, only one F-block. Latency: $m + c + 1$ clock cycles
  • Digit-serial, ν F-blocks. Latency: $\lceil m/\nu \rceil + c + 1$ clock cycles
  • ν should be from the set F: {1-15, 17, 19, 21, 24, 28, 33, 41, 55, 82, 163}
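One reading of the set F is that it lists, for m = 163, the smallest ν that reaches each possible value of ⌈m/ν⌉, since any larger ν with the same ceiling would add F-blocks without lowering the latency. The sketch below checks that reading; the interpretation is an assumption, the value of c is not given on the slides and is left as a parameter, and the function names are hypothetical.

```python
M = 163  # field degree of NIST K-163


def latency(nu, c=0, m=M):
    """Multiplier latency in clock cycles: ceil(m / nu) + c + 1."""
    return -(-m // nu) + c + 1  # integer ceiling division


def useful_digit_sizes(m=M):
    """Digit sizes that strictly reduce ceil(m / nu) compared with all smaller ones."""
    sizes = []
    best = None
    for nu in range(1, m + 1):
        steps = -(-m // nu)
        if best is None or steps < best:
            sizes.append(nu)
            best = steps
    return sizes


SLIDE_SET = list(range(1, 16)) + [17, 19, 21, 24, 28, 33, 41, 55, 82, 163]
print(useful_digit_sizes() == SLIDE_SET)
```

For example, ν = 16 gives the same ⌈163/ν⌉ = 11 as ν = 15, which is consistent with 16 being absent from the set.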



SLIDE 38

Parameters

[Figure: Throughput and Time plotted over the design points. Tick labels: F-blocks 1-15, 17, 19, 21, 24, 28, 33, 41, 55, 82, 163; FAPs 39, 35, 32, 30, 27, 25, 24, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 8, 7, 5, 4, 2; Throughput 20 to 180; Time 100 to 900.]

Annotation: 162,000 ops, 117 µs


SLIDE 41

Results on an Altera Stratix II S180C3

  • VHDL
  • Altera Quartus II 6.0 SP1

Results from Quartus II

  • 67,467 ALMs (94 %)
  • 240 M512 (26 %), 305 M4K (40 %)
  • Two clocks: 164 MHz and 82 MHz

Performance

  • One verification 114.2 µs (average)
  • Up to 166,000 ops!
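The gap between the single-verification time and the aggregate rate can be made concrete with a little arithmetic. This is a rough sketch, assuming that "ops" means verifications per second and that the design runs in steady state; the variable names are hypothetical and the derived figures are not stated on the slides.

```python
LATENCY_SECONDS = 114.2e-6    # average time of one verification (from the slide)
THROUGHPUT_PER_SECOND = 166_000  # reported peak rate (from the slide)

# Rate a single unit could sustain if verifications ran strictly one after another.
single_unit_rate = 1 / LATENCY_SECONDS

# Average number of verifications that must be in progress at once to reach the
# reported rate (throughput multiplied by latency).
verifications_in_flight = THROUGHPUT_PER_SECOND * LATENCY_SECONDS

print(round(single_unit_rate), round(verifications_in_flight, 1))
```

The reported rate is therefore a product of parallelism rather than of a very short single-operation time, which matches the stated focus on operations per second.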


SLIDE 43

Conclusions and future work

Very high ops achievable with modern FPGAs

  • Development in FPGAs: speed and area
  • Parallelization
  • Time of single operation vs. ops

Future work:

  • Polynomial basis?
  • Counterpart implementations for signature generation
  • Other operations (hash, modular arithmetic)
  • Possible problems (side channel attacks, power, etc.)

SLIDE 44

Thank you. Questions?