Modern ECC signatures (PowerPoint presentation)
SLIDE 1

Modern ECC signatures

2011 Bernstein–Duif–Lange–Schwabe–Yang: Ed25519 signature scheme = EdDSA using the conservative Curve25519 elliptic curve. https://ed25519.cr.yp.to

32-byte public keys, 64-byte signatures, ≈2^125.8 security level.

Deployed in SSH, Signal, and many more applications: https://ianix.com/pub/ed25519-deployment.html

SLIDE 2

Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526 cycles for signature, 205741 cycles for verification, 159128 cycles for ECDH.

Compare to, e.g., 2000 Brown–Hankerson–López–Menezes: on Intel Pentium II (1997), 1920000 cycles for ECDH using the NIST P-256 curve.

SLIDE 3
AC: cycles for algorithm A on CPU C. Does AC < BD prove that A is better than B?

No! Beware change in CPU. Maybe AC > BC; AD > BD; C does more work per cycle than D, thanks to the CPU manufacturer.

Sometimes people measure cost in seconds instead of cycles. Then they benefit from more work per cycle and from more cycles per second.

SLIDE 4

Better comparisons (still raising many questions):

ECDH on Intel Pentium II/III (still not exactly the same CPU): 1920000 cycles for NIST P-256, 832457 cycles for Curve25519.

ECDH on Sandy Bridge: 374000 cycles for NIST P-256 (from 2013 Gueron–Krasnov), 159128 cycles for Curve25519.

Verification on Sandy Bridge: 529000 cycles for ECDSA-P-256, 205741 cycles for Ed25519.

SLIDE 5
For each of these operations, on each of these curves, on each of these CPUs: the simplest implementations are much, much, much slower.

Questions in algorithm design and software engineering: how to build the fastest software on, e.g., an ARM Cortex-A8 for Ed25519 signature verification?

Answers feed back into crypto design: e.g., choosing fast curves.

SLIDE 6
Several levels to optimize:

  • ECC ops (e.g., verify SB = R + hA): windowing etc.
  • Point ops (e.g., P, Q → P + Q): faster doubling etc.
  • Field ops (e.g., x1, x2 → x1x2 in Fp): delayed carries etc.
  • Machine insns (e.g., 32-bit multiplication): pipelining etc.
  • Gates: e.g., AND, OR, XOR.

SLIDE 7
Single-scalar multiplication

Fundamental ECC operation: n, P → nP. Input n is an integer in, e.g., {0, 1, …, 2^256 − 1}. Input P is a point on an elliptic curve.

Will build n, P → nP using additions P, Q → P + Q and subtractions P, Q → P − Q.

Later will also look at double-scalar multiplication m, P, n, Q → mP + nQ.

SLIDE 8
Left-to-right binary method

def scalarmult(n,P):
  if n == 0: return 0
  if n == 1: return P
  R = scalarmult(n//2,P)
  R = R + R
  if n % 2: R = R + P
  return R

Two Python notes:

  • n//2 in Python means ⌊n/2⌋.
  • Recursion depth is limited. See sys.setrecursionlimit.
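Since recursion depth matters for 256-bit scalars, the same left-to-right binary method can be written iteratively. A minimal sketch (scalarmult_iter is an illustrative name, not from the slides), using plain integers in place of curve points so that n, P → nP becomes ordinary multiplication and is easy to check:

```python
# Iterative left-to-right binary method: scan the bits of n from the
# most significant bit down, doubling at every bit and adding P at
# every 1 bit. With integers standing in for curve points, "+" is
# ordinary addition, so the result must equal n * P.
def scalarmult_iter(n, P):
    R = 0  # neutral element (the point 0 on the curve)
    for i in range(n.bit_length() - 1, -1, -1):
        R = R + R          # doubling
        if (n >> i) & 1:
            R = R + P      # addition of P
    return R

assert scalarmult_iter(31, 7) == 31 * 7
assert scalarmult_iter(2**256 - 1, 5) == (2**256 - 1) * 5  # no recursion limit
```

On a real curve, R = R + R and R = R + P would be the point doubling and point addition operations.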

SLIDE 9
This recursion computes nP as

  • 2((n/2)P) if n ∈ 2Z. e.g. 20P = 2 · 10P.
  • 2(((n − 1)/2)P) + P if n ∈ 1 + 2Z. e.g. 21P = 2 · 10P + P.

Base cases in recursion:

  • 0P = 0. For Edwards: 0 = (0, 1).
  • 1P = P. Could omit this case.

Assuming n ≥ 0 for simplicity. Otherwise use nP = −(−n)P.

SLIDE 10
If 0 ≤ n < 2^b then this algorithm uses ≤ 2b − 2 additions: specifically ≤ b − 1 doublings and ≤ b − 1 additions of P.

Example of worst case: 31P = 2(2(2(2P + P) + P) + P) + P. 31 = (11111)₂; b = 5; 4 doublings; 4 more additions.

Average case is better: e.g. 35P = 2(2(2(2(2P))) + P) + P. 35 = (100011)₂; b = 6; 5 doublings; 2 additions.
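The counts follow directly from the binary expansion of n: one doubling per bit after the leading 1, and one addition of P per 1 bit after the leading 1. A small sketch (binary_cost is an illustrative helper, not from the slides) that checks the two examples:

```python
# Operation counts of the left-to-right binary method for a scalar n >= 1:
# doublings = (number of bits of n) - 1,
# additions = (number of 1 bits of n) - 1.
def binary_cost(n):
    bits = bin(n)[2:]                 # binary digits, most significant first
    doublings = len(bits) - 1
    additions = bits.count("1") - 1   # each 1 bit after the first costs R + P
    return doublings, additions

assert binary_cost(31) == (4, 4)   # 31 = (11111)2: worst case for 5 bits
assert binary_cost(35) == (5, 2)   # 35 = (100011)2
```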

SLIDE 11
Non-adjacent form (NAF)

def scalarmult(n,P):
  if n == 0: return 0
  if n == 1: return P
  if n % 4 == 1:
    R = scalarmult((n-1)//4,P)
    R = R + R
    return (R + R) + P
  if n % 4 == 3:
    R = scalarmult((n+1)//4,P)
    R = R + R
    return (R + R) - P
  R = scalarmult(n//2,P)
  return R + R

SLIDE 12
Subtraction on the curve is as cheap as addition. NAF takes advantage of this.

31P = 2(2(2(2(2P)))) − P. 31 = (10000 1̄)₂; 1̄ denotes −1.

35P = 2(2(2(2(2P)) + P)) − P. 35 = (10010 1̄)₂.

"Non-adjacent": ±P ops are separated by ≥ 2 doublings.

Worst case: ≈ b doublings plus ≈ b/2 additions of ±P. On average ≈ b/3 additions.
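The signed digits can be generated by mirroring the branch structure of the NAF routine above. A minimal sketch (naf is an illustrative name, not from the slides), returning digits least-significant first:

```python
# Non-adjacent form of n >= 0: signed digits in {-1, 0, 1}, least
# significant first. No two consecutive digits are both nonzero, so
# +-P additions are separated by at least two doublings.
def naf(n):
    digits = []
    while n:
        if n % 2:
            d = 2 - (n % 4)   # 1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            digits.append(d)
            n -= d            # now n = 0 (mod 4): next digit is forced to 0
        else:
            digits.append(0)
        n //= 2
    return digits

# Most significant digit first, matching the slide's notation:
assert naf(31)[::-1] == [1, 0, 0, 0, 0, -1]   # 31 = (10000 1̄)₂
assert naf(35)[::-1] == [1, 0, 0, 1, 0, -1]   # 35 = (10010 1̄)₂
assert sum(d * 2**i for i, d in enumerate(naf(35))) == 35
```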

slide-43
SLIDE 43

11

Non-adjacent form (NAF)

scalarmult(n,P): == 0: return 0 == 1: return P % 4 == 1: scalarmult((n-1)/4,P) R + R return (R + R) + P % 4 == 3: scalarmult((n+1)/4,P) R + R return (R + R) - P scalarmult(n/2,P) return R + R

12

Subtraction on the curve is as cheap as addition. NAF takes advantage of this. 31P = 2(2(2(2(2P)))) − P. 31 = (10000¯ 1)2; ¯ 1 denotes −1. 35P = 2(2(2(2(2P)) + P)) − P. 35 = (10010¯ 1)2. “Non-adjacent”: ±P ops are separated by ≥2 doublings. Worst case: ≈b doublings plus ≈b=2 additions of ±P. On average ≈b=3 additions. Width-2

13

Width-2 signed sliding windows

def window2(n,P,P3):
  if n == 0: return 0
  if n == 1: return P
  if n == 3: return P3
  if n % 8 == 1:
    R = window2((n-1)//8,P,P3)
    R = R + R
    R = R + R
    return (R + R) + P
  if n % 8 == 3:
    R = window2((n-3)//8,P,P3)
    R = R + R
    R = R + R
    return (R + R) + P3

14

  if n % 8 == 5:
    R = window2((n+3)//8,P,P3)
    R = R + R
    R = R + R
    return (R + R) - P3
  if n % 8 == 7:
    R = window2((n+1)//8,P,P3)
    R = R + R
    R = R + R
    return (R + R) - P
  R = window2(n//2,P,P3)
  return R + R

def scalarmult(n,P):
  return window2(n,P,P+P+P)
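The digits consumed by this recursion can be read off by a small recoder; a sanity check with an illustrative helper (not from the slides):

```python
# Width-2 signed sliding-window recoding matching the recursion above:
# an odd n takes a digit from {1, 3, -3, -1} according to n mod 8,
# which leaves n divisible by 8 and so forces at least two zero digits
# before the next nonzero one.
def window2_digits(n):
    table = {1: 1, 3: 3, 5: -3, 7: -1}
    digits = []
    while n:
        if n % 2:
            d = table[n % 8]
            digits.append(d)
            n -= d
        else:
            digits.append(0)
        n //= 2
    return digits

n = 12345
assert sum(d * 2**i for i, d in enumerate(window2_digits(n))) == n
```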

15

Worst case: ≈b doublings plus ≈b/3 additions of ±P or ±3P. On average ≈b/4 additions. Width-3 signed sliding windows: precompute P, 3P, 5P, 7P. On average ≈b/5 additions. Width 4: precompute P, 3P, 5P, 7P, 9P, 11P, 13P, 15P. On average ≈b/6 additions. Cost of precomputation eventually outweighs savings. Optimal: ≈b doublings plus roughly b/lg b additions.
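The "on average ≈b/4 additions" claim for width 2 is easy to check empirically by counting nonzero digits in the recoding of random 256-bit scalars (a rough experiment, not from the slides):

```python
import random

random.seed(0)
table = {1: 1, 3: 3, 5: -3, 7: -1}   # width-2 signed digits, by n mod 8

def additions(n):
    """Number of +-P / +-3P additions used by width-2 signed windows."""
    count = 0
    while n:
        if n % 2:
            n -= table[n % 8]
            count += 1
        n //= 2
    return count

b = 256
avg = sum(additions(random.getrandbits(b)) for _ in range(1000)) / 1000
print(avg)   # close to b/4 = 64 (plus one addition to precompute 3P)
```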

16

Double-scalar multiplication. Want to quickly compute m, P, n, Q → mP + nQ. e.g. verify signature (R, S) by computing h = H(R, M), computing SB − hA, checking whether R = SB − hA. Obvious approach: compute mP; compute nQ; add. e.g. b = 256: ≈256 doublings for mP, ≈256 doublings for nQ, ≈50 additions for mP, ≈50 additions for nQ.

17

Joint doublings Do much better by merging 2X + 2Y into 2(X + Y ).

def scalarmult2(m,P,n,Q):
  if m == 0: return scalarmult(n,Q)
  if n == 0: return scalarmult(m,P)
  R = scalarmult2(m//2,P,n//2,Q)
  R = R + R
  if m % 2: R = R + P
  if n % 2: R = R + Q
  return R
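With ordinary integers standing in for curve points (point addition becomes integer addition, and single-scalar scalarmult degenerates to plain multiplication), the recursion can be sanity-checked:

```python
def scalarmult(n, P):
    return n * P     # integer stand-in for single-scalar multiplication

def scalarmult2(m, P, n, Q):
    # joint double-and-add: one shared doubling per bit of max(m, n)
    if m == 0: return scalarmult(n, Q)
    if n == 0: return scalarmult(m, P)
    R = scalarmult2(m//2, P, n//2, Q)
    R = R + R
    if m % 2: R = R + P
    if n % 2: R = R + Q
    return R

assert scalarmult2(35, 1000, 31, 7) == 35*1000 + 31*7
```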

18

For example: merge 35P = 2(2(2(2(2P))) + P) + P and 31Q = 2(2(2(2Q+Q)+Q)+Q)+Q into 35P + 31Q = 2(2(2(2(2P+Q)+Q)+Q)+P+Q) + P + Q. ≈b doublings (merged!), ≈b/2 additions of P, ≈b/2 additions of Q. Combine idea with windows: e.g., ≈256 doublings for b = 256, ≈50 additions using P, ≈50 additions using Q.
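The merged chain can be checked step by step with integers standing in for the points (the concrete values of P and Q are arbitrary):

```python
P, Q = 10**6, 10**3          # arbitrary stand-in values for the points

R = 2*P + Q                  #  2P +  Q
R = 2*R + Q                  #  4P +  3Q
R = 2*R + Q                  #  8P +  7Q
R = 2*R + (P + Q)            # 17P + 15Q
R = 2*R + (P + Q)            # 35P + 31Q

assert R == 35*P + 31*Q
```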

19

Batch verification. Verifying many signatures: need to be confident that S1B = R1 + h1A1, S2B = R2 + h2A2, S3B = R3 + h3A3, etc. Obvious approach: check each equation separately. Much faster approach: check a random linear combination of the equations.

20

Pick independent uniform random 128-bit z1, z2, z3, .... Check whether (z1S1 + z2S2 + z3S3 + · · ·)B = z1R1 + (z1h1)A1 + z2R2 + (z2h2)A2 + z3R3 + (z3h3)A3 + · · ·. (If ≠: see 2012 Bernstein–Doumen–Lange–Oosterwijk.) Easy to prove: forgeries have probability ≤2^−128 of fooling this check.
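A toy model makes the check concrete: take the "curve" to be the additive group of integers mod a prime. The parameters here (p, B, the 5-signature batch) are hypothetical, chosen only for illustration:

```python
import random

random.seed(1)
p = 2**61 - 1                     # toy group order (a Mersenne prime)
B = 3                             # stand-in for the base point

def keygen():
    a = random.randrange(1, p)    # secret key
    return a, a * B % p           # public key A = aB

def sign(a, h):                   # h stands in for H(R, M)
    r = random.randrange(1, p)
    return r * B % p, (r + h * a) % p     # (R, S) with SB = R + hA

batch = []
for _ in range(5):
    a, A = keygen()
    h = random.randrange(p)
    R, S = sign(a, h)
    batch.append((A, h, R, S))

def batch_check(batch):
    z = [random.getrandbits(128) for _ in batch]
    lhs = sum(zi * S for zi, (A, h, R, S) in zip(z, batch)) * B % p
    rhs = sum(zi * (R + h * A) for zi, (A, h, R, S) in zip(z, batch)) % p
    return lhs == rhs

print(batch_check(batch))                 # True: an honest batch always passes
A, h, R, S = batch[2]
batch[2] = (A, h, R, (S + 1) % p)         # tamper with one signature
print(batch_check(batch))                 # False, with overwhelming probability
```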
21

Multi-scalar multiplication. Review of asymptotic speeds: 1939 Brauer (windows): ≈ (1 + 1/lg b)·b additions to compute P → nP if n < 2^b. 1964 Straus (joint doublings): ≈ (1 + k/lg b)·b additions to compute P1, ..., Pk → n1P1 + · · · + nkPk if n1, ..., nk < 2^b.

22

1976 Yao: ≈ (1 + k/lg b)·b additions to compute P → n1P, ..., nkP if n1, ..., nk < 2^b. 1976 Pippenger: similar asymptotics, but replace lg b with lg(kb). Faster than Straus and Yao if k is large. (Knuth says “generalization” as if speed were the same.)

23

More generally, Pippenger's algorithm computes ℓ sums of multiples of k inputs, using ≈ (min{k, ℓ} + kℓ/lg(kℓb))·b adds if all coefficients are below 2^b. Within a factor 1 + ε of optimal. Various special cases of Pippenger's algorithm were reinvented and patented by 1993 Brickell–Gordon–McCurley–Wilson, 1995 Lim–Lee, etc. Is that the end of the story?

24

No! 1989 Bos–Coster: if n1 ≥ n2 ≥ · · · then n1P1 + n2P2 + n3P3 + · · · = (n1 − qn2)P1 + n2(qP1 + P2) + n3P3 + · · · where q = ⌊n1/n2⌋. Remarkably simple; competitive with Pippenger for random choices of the ni; much better memory usage.

SLIDE 99


25

Example of Bos–Coster:
000100000 =  32
000010000 =  16
100101100 = 300
010010010 = 146
001001101 =  77
000000010 =   2
000000001 =   1
Goal: Compute 32P, 16P, 300P, 146P, 77P, 2P, 1P.

slide-103
SLIDE 103


26

Reduce largest row:
000100000 =  32
000010000 =  16
010011010 = 154 ←
010010010 = 146
001001101 =  77
000000010 =   2
000000001 =   1
Goal: Compute 32P, 16P, 154P, 146P, 77P, 2P, 1P.
Plus one extra addition: add 146P into 154P, obtaining 300P.
slide-104
SLIDE 104


Reduce largest row:
000100000 =  32
000010000 =  16
000001000 =   8 ←
010010010 = 146
001001101 =  77
000000010 =   2
000000001 =   1
plus 2 additions.

slide-105
SLIDE 105


Reduce largest row:
000100000 =  32
000010000 =  16
000001000 =   8
001000101 =  69 ←
001001101 =  77
000000010 =   2
000000001 =   1
plus 3 additions.

slide-106
SLIDE 106


Reduce largest row:
000100000 =  32
000010000 =  16
000001000 =   8
001000101 =  69
000001000 =   8 ←
000000010 =   2
000000001 =   1
plus 4 additions.

slide-107
SLIDE 107


Reduce largest row:
000100000 =  32
000010000 =  16
000001000 =   8
000100101 =  37 ←
000001000 =   8
000000010 =   2
000000001 =   1
plus 5 additions.

slide-108
SLIDE 108


Reduce largest row:
000100000 =  32
000010000 =  16
000001000 =   8
000000101 =   5 ←
000001000 =   8
000000010 =   2
000000001 =   1
plus 6 additions.

slide-109
SLIDE 109


Reduce largest row:
000010000 =  16 ←
000010000 =  16
000001000 =   8
000000101 =   5
000001000 =   8
000000010 =   2
000000001 =   1
plus 7 additions.

slide-110
SLIDE 110


Reduce largest row:
000000000 =   0
000010000 =  16
000001000 =   8
000000101 =   5
000001000 =   8
000000010 =   2
000000001 =   1
plus 7 additions.

slide-111
SLIDE 111


Reduce largest row:
000000000 =   0
000001000 =   8 ←
000001000 =   8
000000101 =   5
000001000 =   8
000000010 =   2
000000001 =   1
plus 8 additions.

slide-112
SLIDE 112


Reduce largest row:
000000000 =   0
000000000 =   0 ←
000001000 =   8
000000101 =   5
000001000 =   8
000000010 =   2
000000001 =   1
plus 8 additions.

slide-113
SLIDE 113


Reduce largest row:
000000000 =   0
000000000 =   0
000000000 =   0 ←
000000101 =   5
000001000 =   8
000000010 =   2
000000001 =   1
plus 8 additions.

slide-114
SLIDE 114


Reduce largest row:
000000000 =   0
000000000 =   0
000000000 =   0
000000101 =   5
000000011 =   3 ←
000000010 =   2
000000001 =   1
plus 9 additions.

slide-115
SLIDE 115


Reduce largest row:
000000000 =   0
000000000 =   0
000000000 =   0
000000010 =   2 ←
000000011 =   3
000000010 =   2
000000001 =   1
plus 10 additions.

slide-116
SLIDE 116


Reduce largest row:
000000000 =   0
000000000 =   0
000000000 =   0
000000010 =   2
000000001 =   1 ←
000000010 =   2
000000001 =   1
plus 11 additions.

slide-117
SLIDE 117


Reduce largest row:
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0 ←
000000001 =   1
000000010 =   2
000000001 =   1
plus 11 additions.

slide-118
SLIDE 118


Reduce largest row:
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0
000000001 =   1
000000001 =   1 ←
000000001 =   1
plus 12 additions.

slide-119
SLIDE 119


Reduce largest row:
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0 ←
000000001 =   1
000000001 =   1
plus 12 additions.

slide-120
SLIDE 120


Reduce largest row:
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0 ←
000000001 =   1
plus 12 additions.

slide-121
SLIDE 121


Reduce largest row:
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0 ←
plus 12 additions.
Final addition chain:
1, 2, 3, 5, 8, 16, 32, 37, 69, 77, 146, 154, 300.
Short, no temporary storage, low two-operand complexity.
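The resulting chain can be verified mechanically: after the leading 1, each entry must be the sum of two earlier entries (a doubling when the two coincide), so 13 entries cost 12 additions. A small Python checker (helper name is mine, not from the slides):

```python
def is_addition_chain(chain):
    """True if chain starts at 1 and every later element is the sum
    of two earlier elements (possibly the same one, i.e. a doubling)."""
    if not chain or chain[0] != 1:
        return False
    seen = {1}
    for n in chain[1:]:
        # n must split as a + b with both a and b already computed
        if not any(n - a in seen for a in seen):
            return False
        seen.add(n)
    return True
```

Applied to [1, 2, 3, 5, 8, 16, 32, 37, 69, 77, 146, 154, 300] it returns True.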


slide-125
SLIDE 125


27

Revised goal: Compute
32P1 + 16P2 + 300P3 + 146P4 + 77P5 + 2P6 + 1P7.
First compute P′4 = P4 + P3
and then recursively compute
32P1 + 16P2 + 154P3 + 146P′4 + 77P5 + 2P6 + 1P7.
Same scalars show up as before.
Ed25519 batch verification: verify batch of 64 signatures about twice as fast as verifying each separately.
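The same reduction carries the points along: n1P1 + n2P2 is rewritten as (n1 − n2)P1 + n2(P1 + P2), the shape used for multi-scalar sums such as batch verification. A sketch under the same caveats as before (function name is mine, the group operation is an abstract `point_add` callback, and integers with ordinary addition serve as stand-in points for testing):

```python
import heapq

def bos_coster_msm(pairs, point_add):
    """Compute n1*P1 + ... + nk*Pk by the Bos-Coster rewrite
    n1*P1 + n2*P2 -> (n1 - n2)*P1 + n2*(P1 + P2), always applied to
    the two largest scalars.  (Illustrative sketch, not constant-time.)"""
    points = [P for _, P in pairs]
    heap = [(-n, i) for i, (n, _) in enumerate(pairs) if n > 0]
    heapq.heapify(heap)
    if not heap:
        return None
    while len(heap) > 1:
        neg_n1, i = heapq.heappop(heap)
        n1 = -neg_n1
        n2, j = -heap[0][0], heap[0][1]           # second largest
        points[j] = point_add(points[i], points[j])   # P2 <- P1 + P2
        if n1 - n2 > 0:
            heapq.heappush(heap, (-(n1 - n2), i))     # n1 <- n1 - n2
    n, i = heap[0]
    n = -n
    result, P = None, points[i]
    while n:                  # final n*P by double-and-add (usually n == 1)
        if n & 1:
            result = P if result is None else point_add(result, P)
        P = point_add(P, P)
        n >>= 1
    return result
```

Each rewrite preserves the running sum, since (n1 − n2)P1 + n2(P1 + P2) = n1P1 + n2P2; real batch-verification code would pass the curve's addition law as `point_add`.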