The complete cost of cofactor h = 1: Implementing Weierstrass curves with complete formulas



slide-1
SLIDE 1

The complete cost of cofactor h = 1

Implementing Weierstrass curves with complete formulas

Peter Schwabe Daan Sprenkels 18 December 2019

Radboud University, peter@cryptojedi.org, daan@dsprenkels.com

slide-2
SLIDE 2

Introduction

slide-3
SLIDE 3

Some history

◮ Traditionally, we use various Weierstraß curves
◮ Considered unsafe because of incomplete formulas
◮ 2006: Curve25519 [Ber06] proposed as a better alternative

slide-5
SLIDE 5

Cofactor (in)security

Interesting cases of cofactor insecurity in protocols (mis)using Curve25519:
◮ 2017: [lfS17] reported a major vulnerability in Monero
◮ 2019: [CJ19] found three other vulnerabilities caused by cofactor insecurity

slide-10
SLIDE 10

The Monero vulnerability

◮ Transaction involves a ring signature
◮ Trivial case: ring size is 1
◮ Double-spending is prevented by a key image I
  • I binds the transaction to signer’s public key P
  • Binding is in zero-knowledge
  • Key image I should be unique

slide-12
SLIDE 12

Monero transactions (simplified)

◮ Have generators G1, G2; private key x; public key P; key image I.
◮ sign_x(m)
  • Sign m with private key x
  • Choose random u ∈_R hZ_ℓ
  • Compute commitments a1 = [u]G1, a2 = [u]G2; c = H(m, a1, a2); r = u + cx
  • Output signature s = (a1, a2, r)
◮ verify_{P,I}(m, s)
  • [r]G1 ?= a1 + [c]P
  • [r]G2 ?= a2 + [c]I
  • I unique?

slide-19
SLIDE 19

Attacking Monero signatures

◮ Challenge. Find some signature+keypair a2, c, r, and I, s.t. [r]G2 = a2 + [c]I = a2 + [c]I′, where I ≠ I′.
◮ Solution. Choose I′ = I + T_α, where α | c and [α]T_α = O.
◮ Correctness.
    a2 + [c]I′ = a2 + [c](I + T_α)
               = a2 + [c]I + [c/α][α]T_α
               = a2 + [c]I + [c/α]O
               = a2 + [c]I
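The algebra above can be replayed in a few lines. This is a minimal sketch under a loud assumption that is not in the slides: the additive group of integers modulo n = h · ℓ stands in for the curve, so [k]P is simply k · P mod n. Discrete logarithms are trivial in that group, so it is useless for real signatures and shows only why the verifier's second equation accepts two different key images. All names and parameter values are hypothetical, and the challenge c is a fixed number rather than a hash output.

```python
# Stand-in group (an assumption for illustration): integers modulo n = h * l
# under addition. "[k]P" is k * P mod n; the neutral element O is 0.
H, L = 8, 13          # cofactor h and prime subgroup order l (toy values)
N = H * L             # full group order n = h * l


def smul(k: int, point: int) -> int:
    """Scalar multiplication [k]P in the stand-in group."""
    return (k * point) % N


def add(p: int, q: int) -> int:
    """Group addition P + Q in the stand-in group."""
    return (p + q) % N


def second_check(r: int, c: int, a2: int, key_image: int, g2: int) -> bool:
    """The verifier's second equation: [r]G2 == a2 + [c]I."""
    return smul(r, g2) == add(a2, smul(c, key_image))


G2 = H                      # generator of the order-l subgroup
x, u, c = 5, 7, 12          # private key, nonce, challenge (c is divisible by 4)
I = smul(x, G2)             # honest key image, assumed to be I = [x]G2
a2 = smul(u, G2)            # commitment a2 = [u]G2
r = u + c * x               # response r = u + c*x

ALPHA = 4
T_ALPHA = N // ALPHA        # a point of order alpha = 4, so [4]T = O
I_PRIME = add(I, T_ALPHA)   # forged key image I' = I + T_alpha

honest_ok = second_check(r, c, a2, I, G2)
forged_ok = second_check(r, c, a2, I_PRIME, G2)
print(honest_ok, forged_ok, I != I_PRIME)
```

Both checks pass although the two key images differ, which is exactly what breaks the uniqueness test on I.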

slide-23
SLIDE 23

Surely this could have been prevented?

Easy fix:
◮ Protocol assumed that [r]G2 = a2 + [c]I holds for only a single I
◮ Not the case for Curve25519
◮ Fix: check if the order of I is ℓ
  • i.e. check [ℓ]I ?= O
  • Fun fact: this check makes the verification 2× slower
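The order check can be sketched in the same spirit. As an assumption for illustration only, the integers modulo n = h · ℓ under addition again stand in for the curve, and the function name and toy parameters are hypothetical. The sketch also rejects the neutral element, which the check [ℓ]I = O alone would accept.

```python
# Stand-in group (an assumption for illustration): integers modulo n = h * l
# under addition, with neutral element O = 0.
H, L = 8, 13                 # toy cofactor h and prime subgroup order l
N = H * L


def smul(k: int, point: int) -> int:
    """Scalar multiplication [k]P in the stand-in group."""
    return (k * point) % N


def has_order_l(point: int) -> bool:
    """Accept only points P != O with [l]P == O, i.e. points of order exactly l."""
    return point != 0 and smul(L, point) == 0


I = smul(5, H)                       # honest key image inside the order-l subgroup
I_PRIME = (I + N // 4) % N           # same image shifted by a point of order 4

print(has_order_l(I), has_order_l(I_PRIME))
```

The extra scalar multiplication by ℓ is a full-length one, which is consistent with the slide's remark that verification becomes about twice as slow.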

slide-25
SLIDE 25

Why didn’t they validate points?

My guess:

[Screenshot not preserved in this transcript] (highlight added by me)

slide-27
SLIDE 27

Surely this could have been prevented?

Easy fix:
◮ Protocol assumed that [r]G2 = a2 + [c]I holds for only a single I
◮ Fix: check if the order of I is ℓ
  • i.e. check [ℓ]I ?= O
◮ Better fix: use a prime-order curve
◮ Best fix: use Ristretto [Ham15, dVGT+19]

slide-29
SLIDE 29

Research question

◮ Curve25519: nontrivial cofactor
◮ Weierstraß: slow or incomplete formulas
◮ But how much slower exactly?

slide-30
SLIDE 30

Research question

What is the actual performance benefit of Curve25519 over traditional (Weierstrass) curves when using complete formulas?


slide-32
SLIDE 32

Our contribution

Our research:
◮ Implement variable base-point scalar multiplication
  • for a prime-order curve,
  • that looks similar to Curve25519,
  • using complete formulas,
  • on Sandy Bridge, Haswell, and Cortex-M4.
◮ Compare performance with Curve25519

slide-33
SLIDE 33

Selecting a curve

slide-35
SLIDE 35

Selecting a curve

◮ I.e. E : y² = x³ − 3x + 13318, defined over F_p with p = 2^255 − 19.
◮ Prime-order curve; same field as Curve25519
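The curve parameters translate directly into code. The following is a small sketch, not taken from the authors' implementation: the constant and function names are hypothetical, and it only checks the affine curve equation and that the curve is nonsingular.

```python
# Parameters as stated on the slide: y^2 = x^3 - 3x + 13318 over F_p.
P = 2**255 - 19          # field prime, shared with Curve25519
A = -3                   # coefficient a
B = 13318                # coefficient b


def is_on_curve(x: int, y: int) -> bool:
    """Check the affine equation y^2 = x^3 + A*x + B over F_P."""
    return (y * y - (x * x * x + A * x + B)) % P == 0


def discriminant_term() -> int:
    """4a^3 + 27b^2 mod P; a nonzero value means the curve is nonsingular."""
    return (4 * A**3 + 27 * B**2) % P


print(discriminant_term() != 0, is_on_curve(0, 0))
```

A point-validation routine for untrusted input would call something like `is_on_curve` before any scalar multiplication; because the curve has prime order, no additional subgroup check is needed.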

slide-36
SLIDE 36

Implementation

slide-38
SLIDE 38

Scalar multiplication

◮ Use left-to-right fixed-window method (w = 5)
◮ Uses 263 · double + 59 · add operations

slide-39
SLIDE 39

Addition formulas

Use the Renes-Costello-Batina addition formulas [RCB16]
◮ Complete formulas (no exceptions)
◮ No optimized software implementations published

slide-42
SLIDE 42

Field arithmetic

Sandy Bridge
◮ AVX: has 2-way parallel 64-bit integer arithmetic
◮ AVX: has 4-way parallel floating-point arithmetic
◮ → use radix-2^21.25 representation based on [Ber04]

Haswell
◮ AVX2: has 4-way parallel 64-bit integer arithmetic
◮ → use radix-2^25.5 representation based on [BS12]

Cortex-M4
◮ Has powerful umlal and umaal instructions
◮ → use packed representation from [HL19]

slide-44
SLIDE 44

Application of formulas

Sandy Bridge + Haswell
◮ Vectorize all multiplications and some other ops
◮ Shuffles etc. all implemented by hand
◮ Inline all the calls to field arithmetic

Cortex-M4
◮ Size-constrained device
◮ One-to-one implementation of formulas
◮ No function inlining

slide-45
SLIDE 45

Results

slide-46
SLIDE 46

Benchmarks

Figure: cycle counts in kcc (SB = Sandy Bridge, H = Haswell, M4 = Cortex-M4)

Implementation                    SB       H        M4
Chou16 [Cho16]                    159^a    156^b    –
Faz-Hernández-López15 [FL15]      –        156^a    –
OLHF18 [OLH+18]                   –        139^a    –
Fujii-Aranha19 [FA19]             –        –        907^a
Haase-Labrique19 [HL19]           –        –        625^a
Curve13318 (this work)            390^b    205^b    1 797^b
slowdown                          2.45×    1.47×    2.87×

a As reported in the respective publication.
b From own measurements.

slide-47
SLIDE 47

Future work

◮ Use formulas from [SM17]
◮ Benchmark with ristretto255

slide-48
SLIDE 48

Thank you!

The code is at https://github.com/dsprenkels/curve13318-all (public domain)

Extra reading:
◮ Paper: https://dsprenkels.com/files/curve13318.pdf
◮ Monero vulnerability (1): https://nickler.ninja/blog/2017/05/23/exploiting-low-order-generators-in-one-time-ring-signatures/
◮ Monero vulnerability (2): https://moderncrypto.org/mail-archive/curves/2017/000898.html

slide-49
SLIDE 49

References

Paulo S. L. M. Barreto. Tweet, 2017. https://twitter.com/pbarreto/status/869103226276134912.

Daniel J. Bernstein. Floating-point arithmetic and message authentication, 2004. http://cr.yp.to/papers.html#hash127.

Daniel J. Bernstein. Curve25519: new Diffie-Hellman speed records. In Moti Yung, Yevgeniy Dodis, Aggelos Kiayias, and Tal Malkin, editors, Public Key Cryptography – PKC 2006, volume 3958 of LNCS, pages 207–228. Springer, 2006. http://cr.yp.to/papers.html#curve25519.

Daniel J. Bernstein and Tanja Lange. eBACS: ECRYPT Benchmarking of Cryptographic Systems. https://bench.cr.yp.to/results-sign.html (accessed 2019-10-03).

Daniel J. Bernstein and Peter Schwabe. NEON crypto. In Emmanuel Prouff and Patrick Schaumont, editors, Cryptographic Hardware and Embedded Systems – CHES 2012, volume 7428 of LNCS, pages 320–339. Springer, 2012. http://cryptojedi.org/papers/#neoncrypto.

Tung Chou. Sandy2x: New Curve25519 speed records. In Orr Dunkelman and Liam Keliher, editors, Selected Areas in Cryptography – SAC 2015, volume 9566 of LNCS, pages 145–160. Springer, 2016. https://www.win.tue.nl/~tchou/papers/sandy2x.pdf.

Cas Cremers and Dennis Jackson. Prime, order please! Revisiting small subgroup and invalid curve attacks on protocols using Diffie-Hellman. In 2019 IEEE 32nd Computer Security Foundations Symposium (CSF), pages 78–93, 2019. https://eprint.iacr.org/2019/526.

Henry de Valence, Jack Grigg, George Tankersley, Filippo Valsorda, and Isis Lovecruft. The ristretto255 group. IETF CFRG Internet Draft, 2019. https://tools.ietf.org/html/draft-hdevalence-cfrg-ristretto-01 (accessed 2019-07-31).

Hayato Fujii and Diego F. Aranha. Curve25519 for the Cortex-M4 and Beyond. In Tanja Lange and Orr Dunkelman, editors, Progress in Cryptology – LATINCRYPT 2017, volume 11368 of LNCS, pages 109–127. Springer, 2019. http://www.cs.haifa.ac.il/~orrd/LC17/paper39.pdf.

Armando Faz-Hernández and Julio López. Fast implementation of Curve25519 using AVX2. In Kristin Lauter and Francisco Rodríguez-Henríquez, editors, Progress in Cryptology – LATINCRYPT 2015, volume 9230 of LNCS, pages 329–345. Springer, 2015.

Mike Hamburg. Decaf: Eliminating cofactors through point compression. In Rosario Gennaro and Matthew Robshaw, editors, Advances in Cryptology – CRYPTO 2015, volume 9215 of LNCS, pages 705–723. Springer, 2015. https://www.shiftleft.org/papers/decaf/.

Björn Haase and Benoît Labrique. AuCPace: Efficient verifier-based PAKE protocol tailored for the IIoT. IACR Transactions on Cryptographic Hardware and Embedded Systems, pages 1–48, 2019. https://tches.iacr.org/index.php/TCHES/article/view/7384.

luigi1111 and Riccardo “fluffypony” Spagni. Disclosure of a major bug in CryptoNote based currencies. Post on the Monero website, 2017. https://www.getmonero.org/2017/05/17/disclosure-of-a-major-bug-in-cryptonote-based-currencies.html (accessed 2019-07-31).

Thomaz Oliveira, Julio López, Hüseyin Hışıl, Armando Faz-Hernández, and Francisco Rodríguez-Henríquez. How to (Pre-)Compute a Ladder. In Carlisle Adams and Jan Camenisch, editors, Selected Areas in Cryptography – SAC 2017, volume 10719 of LNCS, pages 172–191. Springer, 2018. https://eprint.iacr.org/2017/264.pdf.

Joost Renes, Craig Costello, and Lejla Batina. Complete addition formulas for prime order elliptic curves. In Marc Fischlin and Jean-Sébastien Coron, editors, Advances in Cryptology – EUROCRYPT 2016, volume 9665 of LNCS, pages 403–428. Springer, 2016. http://eprint.iacr.org/2015/1060.

Ruggero Susella and Sofia Montrasio. A compact and exception-free ladder for all short Weierstrass elliptic curves. In Kerstin Lemke-Rust and Michael Tunstall, editors, Smart Card Research and Advanced Applications, volume 10146 of LNCS, pages 156–173. Springer, 2017.

slide-60
SLIDE 60

Preliminaries

slide-62
SLIDE 62

Elliptic curves

E : y² = x³ + ax + b

[Figure: plot of the curve over the reals; x and y axes range from −4 to 4]

slide-63
SLIDE 63

Elliptic curves: addition

E : y² = x³ + ax + b

[Figure: plot of the curve showing the points P, Q, −R, and R = P + Q; x and y axes range from −4 to 4]

slide-64
SLIDE 64

Elliptic curves: doubling

E : y² = x³ + ax + b

[Figure: plot of the curve showing the points P, −R, and R = [2]P; x and y axes range from −4 to 4]

slide-67
SLIDE 67

Elliptic curves

◮ Coordinates include the point at infinity O
  • Define P + O = P
◮ Curve equation: E : y² = x³ + ax + b
◮ Coordinates are defined over a field F_q
  • I.e. integers modulo q
slide-68
SLIDE 68

Elliptic curves: actually

E : y² = x³ − 3x + 1 defined over F_11

[Figure: the points of the curve plotted on a grid; x axis from 1 to 11, y axis from −5 to 5]

slide-69
SLIDE 69

Elliptic curves: actual addition

E : y² = x³ − 3x + 1 defined over F_11

[Figure: the points of the curve plotted on a grid, with P, Q, −R, and R = P + Q marked; x axis from 1 to 11, y axis from −5 to 5]

slide-72
SLIDE 72

Group arithmetic

◮ We can do arithmetic with these rules! :)
◮ Addition: P + Q
◮ Subtraction: P − Q
◮ Neutral element: O, i.e. “zero”
◮ Scalar multiplication: [k]P = P + P + ... + P (k times)
◮ Discrete log problem: given P, Q where [k]P = Q, hard to find k

slide-74
SLIDE 74

Elliptic curves are cyclic

◮ Points form a cycle:
  O → P → [2]P → [3]P → ... → [n − 1]P → O (each arrow adds P; n steps)
◮ The order n should contain a large prime factor
◮ Only one cycle if n is prime

slide-78
SLIDE 78

Cofactors

◮ If n is not prime, then n = h · ℓ
◮ I.e. small loops are possible: e.g. if 4 | n, then there is a point T4:
  O → T4 → [2]T4 → [3]T4 → O (each arrow adds T4; only 4 steps!)
◮ h is called the cofactor
◮ This property is often harmless
  • I.e. sometimes it’s the opposite of harmless
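Small loops are easy to see on a toy curve. The sketch below uses an assumed example that is not from the slides: y² = x³ + x over F_11, which has n = 12 = 4 · 3 points. It enumerates all points with textbook affine formulas (exceptional cases handled explicitly, so these are not complete formulas) and computes the order of each. Names are hypothetical.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]   # None represents the point at infinity O
P_MOD, A, B = 11, 1, 0              # toy curve y^2 = x^3 + x over F_11 (assumed example)


def add(p: Point, q: Point) -> Point:
    """Affine chord-and-tangent addition with the exceptional cases spelled out."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                   # P + (-P) = O
    if p == q:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    x3 = (lam * lam - x1 - x2) % P_MOD
    return x3, (lam * (x1 - x3) - y1) % P_MOD


def order(p: Point) -> int:
    """Number of steps until repeated addition of p returns to O."""
    steps, acc = 1, p
    while acc is not None:
        acc = add(acc, p)
        steps += 1
    return steps


points = [None] + [(x, y) for x in range(P_MOD) for y in range(P_MOD)
                   if (y * y - (x**3 + A * x + B)) % P_MOD == 0]
orders = {pt: order(pt) for pt in points}
n = len(points)
small = [pt for pt, o in orders.items() if o == 4]
print(n, sorted(set(orders.values())), small)
```

On this curve the group order is 12, so points of order 2, 3, 4, 6, and 12 all exist; the two points of order 4 play the role of T4 above.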
slide-79
SLIDE 79

Double-and-add

slide-80
SLIDE 80

Double-and-add algorithm

function DoubleAndAdd(k, P)          ⊲ Compute [k]P
    R ← O
    for i from n − 1 down to 0 do
        R ← [2]R                     ⊲ Doubling
        if k_i = 1 then
            R ← R + P                ⊲ Addition
        else
            R ← R + O                ⊲ Addition
        end if
    end for
    return R
end function
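A runnable sketch of the double-and-add loop, using the toy curve y² = x³ − 3x + 1 over F_11 from the preliminaries. The helper names are hypothetical, and the point addition uses textbook affine formulas with explicit special cases rather than the complete formulas the talk is about, so this version is neither exception-free nor constant-time. Adding O in the else branch mirrors the pseudocode, which always performs one addition per bit.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]   # None represents the point at infinity O
P_MOD, A = 11, -3                   # toy curve y^2 = x^3 - 3x + 1 over F_11


def add(p: Point, q: Point) -> Point:
    """Affine chord-and-tangent addition with the exceptional cases spelled out."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                   # P + (-P) = O
    if p == q:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    x3 = (lam * lam - x1 - x2) % P_MOD
    return x3, (lam * (x1 - x3) - y1) % P_MOD


def double_and_add(k: int, p: Point, bits: int = 8) -> Point:
    """Left-to-right double-and-add: one doubling and one addition per scalar bit."""
    r: Point = None
    for i in range(bits - 1, -1, -1):
        r = add(r, r)                                 # doubling
        addend = p if (k >> i) & 1 else None          # add P or the neutral element O
        r = add(r, addend)                            # addition
    return r


BASE = (0, 1)                                         # on the curve: 1 = 0 - 0 + 1
print(double_and_add(2, BASE), double_and_add(17, BASE))
```

This toy curve has 17 points, a prime, so every point other than O has order 17 and `double_and_add(17, BASE)` returns O (here `None`).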

slide-81
SLIDE 81

Fixed-window double-and-add

function FixedWindow(k, P)           ⊲ Compute [k]P
    k′ ← Windows_w(k)
    Precompute ([2]P, ... , [2^w − 1]P)
    R ← O
    for i from n/w − 1 down to 0 do
        for j from 0 to w − 1 do
            R ← [2]R                 ⊲ w doublings
        end for
        if k′_i ≠ 0 then
            R ← R + [k′_i]P          ⊲ Addition
        else
            R ← R + O                ⊲ Addition
        end if
    end for
    return R
end function

slide-82
SLIDE 82

Signed double-and-add

function SignedFixedWindow(k, P)     ⊲ Compute [k]P
    k′ ← RecodeSigned(Windows_w(k))
    Precompute ([2]P, ... , [2^(w−1)]P)
    R ← O
    for i from n/w − 1 down to 0 do
        for j from 0 to w − 1 do
            R ← [2]R                 ⊲ w doublings
        end for
        if k′_i > 0 then
            R ← R + [k′_i]P          ⊲ Addition
        else if k′_i < 0 then
            R ← R − [−k′_i]P         ⊲ Addition
        else
            R ← R + O                ⊲ Addition
        end if
    end for
    return R
end function

slide-83
SLIDE 83

Implemented signed double-and-add

function ScalarMultiplication(k, P)  ⊲ Compute [k]P
    T ← (O, P, ... , [16]P)          ⊲ Precompute ([2]P, ... , [16]P)
    k′ ← RecodeSigned(Windows_5(k))
    R ← O
    for i from 50 down to 0 do
        for j from 0 to 4 do
            R ← [2]R                 ⊲ 5 doublings
        end for
        if k′_i < 0 then
            R ← R − T_(−k′_i)        ⊲ Addition
        else
            R ← R + T_(k′_i)         ⊲ Addition
        end if
    end for
    return R                         ⊲ R = (X_R : Y_R : Z_R)
end function

slide-84
SLIDE 84

Signed windows

window:   k′_3   k′_2   k′_1   k′_0
k =       1011   0010   0110   1110

slide-85
SLIDE 85

Signed window recoding

window:   k′′_4   k′′_3   k′′_2   k′′_1   k′′_0
k =               1011    0010    0110    1110
recoded:  1       −101    010     111     −010
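The recoding shown above can be sketched as follows. This is an illustration with hypothetical function names, not the authors' RecodeSigned routine: each window larger than 2^(w−1) is replaced by its negative counterpart and a carry of 1 is pushed into the next window. Unlike a production implementation it branches on the scalar, so it is not constant-time.

```python
from typing import List


def recode_signed(k: int, w: int = 5) -> List[int]:
    """Split k into base-2^w digits in (-2^(w-1), 2^(w-1)], least significant first."""
    half, base = 1 << (w - 1), 1 << w
    digits: List[int] = []
    while k > 0:
        d = k % base
        k //= base
        if d > half:          # window too large: use d - 2^w and carry 1 upwards
            d -= base
            k += 1
        digits.append(d)
    return digits


def recompose(digits: List[int], w: int = 5) -> int:
    """Inverse of recode_signed: sum of d_i * 2^(w*i)."""
    return sum(d * (1 << (w * i)) for i, d in enumerate(digits))


# The 4-bit example from the slide: k = 1011 0010 0110 1110.
example = recode_signed(0b1011001001101110, w=4)
print(example)   # least significant window first
```

With w = 5 the digits lie between −15 and 16, so a table holding O, P, ..., [16]P together with point negation covers every window.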

slide-86
SLIDE 86

Sandy Bridge details

slide-87
SLIDE 87

[Figure: layout of a double-precision floating-point number: sign (bit 63), exponent (bits 62–52), mantissa (bits 51–0)]

slide-88
SLIDE 88

Depiction of top(f)

[Figure: bit patterns of f_i, the constant c_i, the intermediate z′ = f_i + c_i, and the result z′ − c_i, in which the low bits of f_i have been rounded away]

slide-93
SLIDE 93

Sandy Bridge: field element representation

◮ Use double-precision floating points
◮ Allows 4× vectorized operations using SIMD instructions
◮ Radix-2^21.25 redundant representation
◮ Use 12 limbs to represent 255-bit numbers
  • I.e. f = f_0 + f_1 + ... + f_11
slide-96
SLIDE 96

Sandy Bridge: field element representation

◮ Carry
  • top(f_i): force loss of precision
  • Then, move “high” bits to next limb
◮ Addition
  • (f + g)_i = f_i + g_i
  • (f − g)_i = f_i − g_i
◮ Multiplication
  • (f · g)_k = Σ_{i+j=k} f_i g_j + Σ_{i+j=k+12} 2^(−255) · 19 · f_i g_j
  • Optimized using Karatsuba’s multiplication
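The multiplication rule can be checked with exact integer arithmetic. The sketch below makes several assumptions that differ from the implementation described in the talk: it uses Python integers instead of double-precision floats, plain schoolbook multiplication instead of Karatsuba, and no carry step. Limb i is assumed to start at bit ⌈21.25 · i⌉ and to carry its own weight, so f is the plain sum of its limbs. Function names are hypothetical.

```python
P = 2**255 - 19
LIMBS = 12


def exponent(i: int) -> int:
    """ceil(21.25 * i): the bit position where limb i starts."""
    return (85 * i + 3) // 4


def to_limbs(x: int) -> list:
    """Split x < 2^255 into 12 limbs; each limb keeps its weight, so x == sum(limbs)."""
    limbs = []
    for i in range(LIMBS):
        lo, hi = exponent(i), exponent(i + 1)
        limbs.append(((x >> lo) & ((1 << (hi - lo)) - 1)) << lo)
    return limbs


def mul(f: list, g: list) -> list:
    """Schoolbook product; terms with i + j >= 12 wrap around scaled by 19 * 2^-255."""
    h = [0] * LIMBS
    for i in range(LIMBS):
        for j in range(LIMBS):
            term = f[i] * g[j]
            if i + j < LIMBS:
                h[i + j] += term
            else:
                # term is a multiple of 2^255 here, so the shift is exact
                h[i + j - LIMBS] += 19 * (term >> 255)
    return h


x, y = P - 1, 2**254 + 12345
product = sum(mul(to_limbs(x), to_limbs(y))) % P
print(product == (x * y) % P)
```

The wrap-around works because 2^255 ≡ 19 (mod 2^255 − 19), so a partial product of weight 2^255 or more can be divided by 2^255 and multiplied by 19 without changing the value modulo p.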
slide-97
SLIDE 97

Addition formulas

◮ Use Renes-Costello-Batina formulas
◮ Rewrite using graphs into vectorized operations
◮ Implement using field arithmetic functions

slide-98
SLIDE 98

Point doubling

[Figure: data-flow graph of the generic doubling formula (dbl_generic) with inputs x, y, z and outputs x3, y3, z3. Legend: add, subtract, triple, multiply by small constant, multiply, square]

slide-99
SLIDE 99

Point doubling

[Figure: data-flow graph of the 4-way vectorized doubling (dbl_4x, 3M + 4c) with inputs x, y, z and outputs x3, y3, z3; extra carry operations are marked. Legend: add, subtract, triple, multiply by small constant, multiply, square]

slide-100
SLIDE 100

Point addition

[Figure: data-flow graph of the generic addition formula (add_generic) with inputs x1, y1, z1, x2, y2, z2 and outputs x3, y3, z3. Legend: add, subtract, triple, multiply by small constant, multiply]

slide-101
SLIDE 101

Point addition

[Figure: data-flow graph of the 4-way vectorized addition (add_4x, 3M and 4c) with inputs x1, y1, z1, x2, y2, z2 and outputs x3, y3, z3; operations followed by an extra carry are marked. Legend: add, subtract, triple, multiply by small constant, multiply]

slide-102
SLIDE 102

Figure: Measured cycle counts

Implementation                    SB           IB           H             M4 (kcc)
Chou16 [Cho16]                    159 128^a    156 995^a    155 823^b
Faz-Hernández-López15 [FL15]      –            –            ≈156 500^c
OLHF18 [OLH+18]                   –            –            138 963^a
Fujii-Aranha19 [FA19]             –            –            –             907
Haase-Labrique19 [HL19]           –            –            –             625
Curve13318 (this work)            389 546^b    382 966^b    204 643^b     1 797
Ed25519 verify                    221 988^d    206 080^d    184 052^d
slowdown                          2.45×        2.44×        1.47×

a As reported in the respective publication.
b From own measurements.
c As reported in [FL15]. This publication expressed their benchmarks in kcc. As such, the value has been padded with zeros.