SLIDE 1
Modern ECC signatures
2011 Bernstein–Duif–Lange–Schwabe–Yang: Ed25519 signature scheme = EdDSA using conservative Curve25519 elliptic curve. https://ed25519.cr.yp.to
32-byte public keys, 64-byte signatures, ≈2^125.8 security level.
Deployed in SSH, Signal, many more applications: https://ianix.com/pub/ed25519-deployment.html
SLIDE 2
Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software on Intel Sandy Bridge (2011):
57164 cycles for keygen, 63526 cycles for signature, 205741 cycles for verification, 159128 cycles for ECDH.
Compare to, e.g., 2000 Brown–Hankerson–López–Menezes on Intel Pentium II (1997):
1920000 cycles for ECDH using NIST P-256 curve.
SLIDE 3
AC: cycles for algorithm A on CPU C. Does AC < BD prove that A is better than B? No! Beware change in CPU. Maybe AC > BC and AD > BD: C does more work per cycle than D, thanks to the CPU manufacturer. Sometimes people measure cost in seconds instead of cycles. Then they benefit from more work per cycle and from more cycles per second.
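The pitfall can be made numeric. The cycle counts and clock rates below are purely illustrative (not from the slides): an algorithm that costs more cycles can still win a wall-clock comparison simply because it was measured on a faster CPU.

```python
# Illustrative numbers only: algorithm A on CPU C vs. algorithm B on CPU D.
AC_cycles = 200_000   # A on C costs more cycles...
BD_cycles = 150_000   # ...than B on D.

C_hz = 3_400_000_000  # CPU C runs at 3.4 GHz
D_hz = 1_000_000_000  # CPU D runs at 1.0 GHz

A_seconds = AC_cycles / C_hz  # about 59 microseconds
B_seconds = BD_cycles / D_hz  # 150 microseconds

# Measured in seconds, A looks better, even though AC > BD in cycles:
assert AC_cycles > BD_cycles
assert A_seconds < B_seconds
```

This is exactly the "more work per cycle and more cycles per second" double benefit the slide warns about.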
SLIDE 4
Better comparisons (still raising many questions):
ECDH on Intel Pentium II/III (still not exactly the same): 1920000 cycles for NIST P-256, 832457 cycles for Curve25519.
ECDH on Sandy Bridge: 374000 cycles for NIST P-256 (from 2013 Gueron–Krasnov), 159128 cycles for Curve25519.
Verification on Sandy Bridge: 529000 cycles for ECDSA-P-256, 205741 cycles for Ed25519.
SLIDE 5
For each of these operations, on each of these curves, on each of these CPUs: simplest implementations are much, much, much slower. Questions in algorithm design and software engineering: how to build the fastest software on, e.g., an ARM Cortex-A8 for Ed25519 signature verification? Answers feed back into crypto design: e.g., choosing fast curves.
SLIDE 6
Several levels to optimize:
ECC ops, e.g., verify SB = R + hA: windowing etc.
Point ops, P, Q → P + Q: faster doubling etc.
Field ops, x1, x2 → x1x2 in Fp: delayed carries etc.
Machine insns, e.g., 32-bit multiplication: pipelining etc.
Gates: AND, OR, XOR.
SLIDE 7
Single-scalar multiplication
Fundamental ECC operation: n, P → nP. Input n is an integer in, e.g., {0, 1, …, 2^256 − 1}. Input P is a point on an elliptic curve. Will build n, P → nP using additions P, Q → P + Q and subtractions P, Q → P − Q. Later will also look at double-scalar multiplication m, P, n, Q → mP + nQ.
SLIDE 8
Left-to-right binary method

def scalarmult(n,P):
  if n == 0: return 0
  if n == 1: return P
  R = scalarmult(n//2,P)
  R = R + R
  if n % 2: R = R + P
  return R
Two Python notes:
- n//2 in Python means ⌊n/2⌋.
- Recursion depth is limited.
See sys.setrecursionlimit.
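A quick sanity check (not on the slides): since the recursion uses nothing but additions, ordinary integer addition can stand in for the group law, so scalarmult(n, P) must equal n·P. The slide's function is restated here so the block runs on its own.

```python
def scalarmult(n, P):
    # Left-to-right binary method, as on the slide.
    if n == 0: return 0
    if n == 1: return P
    R = scalarmult(n // 2, P)   # compute (n//2)P recursively
    R = R + R                   # double
    if n % 2: R = R + P         # add P for an odd low bit
    return R

# With integers, "point addition" is just addition, so nP = n*P.
assert scalarmult(35, 7) == 245
assert all(scalarmult(n, 7) == 7 * n for n in range(200))
```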
SLIDE 9
This recursion computes nP as
2((n/2)P) if n ∈ 2Z; e.g. 20P = 2 · 10P.
2(((n − 1)/2)P) + P if n ∈ 1 + 2Z; e.g. 21P = 2 · 10P + P.
Base cases in recursion:
0P = 0. For Edwards: 0 = (0, 1).
1P = P. Could omit this case.
Assuming n ≥ 0 for simplicity. Otherwise use nP = −(−n)P.
SLIDE 10
If 0 ≤ n < 2^b then this algorithm uses ≤ 2b − 2 additions: specifically ≤ b − 1 doublings and ≤ b − 1 additions of P.
Example of worst case: 31P = 2(2(2(2P+P)+P)+P)+P. 31 = (11111)2; b = 5; 4 doublings; 4 more additions.
Average case is better: e.g. 35P = 2(2(2(2(2P))) + P) + P. 35 = (100011)2; b = 6; 5 doublings; 2 additions.
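The doubling and addition counts can be read off the binary expansion. This small counter (a hypothetical helper, not on the slides) reproduces the two examples:

```python
def binary_ops(n):
    """Count (doublings, additions of P) used by the
    left-to-right binary method for n >= 1."""
    doublings = additions = 0
    bits = bin(n)[2:]        # e.g. 31 -> '11111'
    for bit in bits[1:]:     # the leading 1 is the initial P, costing nothing
        doublings += 1       # one doubling per remaining bit
        if bit == '1':
            additions += 1   # one extra addition per set bit
    return doublings, additions

# Worst case 31 = (11111)2: 4 doublings, 4 additions.
assert binary_ops(31) == (4, 4)
# Average case 35 = (100011)2: 5 doublings, 2 additions.
assert binary_ops(35) == (5, 2)
```

For a b-bit n this gives at most b − 1 doublings and b − 1 additions, matching the slide's ≤ 2b − 2 bound.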
SLIDE 11
Non-adjacent form (NAF)

def scalarmult(n,P):
  if n == 0: return 0
  if n == 1: return P
  if n % 4 == 1:
    R = scalarmult((n-1)//4,P)
    R = R + R
    return (R + R) + P
  if n % 4 == 3:
    R = scalarmult((n+1)//4,P)
    R = R + R
    return (R + R) - P
  R = scalarmult(n//2,P)
  return R + R
SLIDE 12
Subtraction on the curve is as cheap as addition. NAF takes advantage of this.
31P = 2(2(2(2(2P)))) − P. 31 = (10000 1̄)2, where 1̄ denotes −1.
35P = 2(2(2(2(2P)) + P)) − P. 35 = (10010 1̄)2.
“Non-adjacent”: ±P ops are separated by ≥2 doublings.
Worst case: ≈b doublings plus ≈b/2 additions of ±P. On average ≈b/3 additions.
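The signed digits chosen by the NAF recursion can be written out explicitly. This helper (a hypothetical restatement, not on the slides; digits least significant first) reproduces the two expansions above:

```python
def naf(n):
    """Non-adjacent form of n >= 0, least significant digit first."""
    digits = []
    while n:
        if n % 2:
            d = 2 - (n % 4)   # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            n -= d            # make n divisible by 4 (or finish)
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits

# 31 = (1 0 0 0 0 -1) reading most significant digit first:
assert naf(31)[::-1] == [1, 0, 0, 0, 0, -1]
# 35 = (1 0 0 1 0 -1):
assert naf(35)[::-1] == [1, 0, 0, 1, 0, -1]
# The expansion really evaluates back to n:
assert sum(d * 2**i for i, d in enumerate(naf(35))) == 35
```

Note that no two adjacent digits are nonzero, which is exactly why ±P operations end up separated by at least two doublings.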
SLIDE 13
Width-2 signed sliding windows

def window2(n,P,P3):
  if n == 0: return 0
  if n == 1: return P
  if n == 3: return P3
  if n % 8 == 1:
    R = window2((n-1)//8,P,P3)
    R = R + R
    R = R + R
    return (R + R) + P
  if n % 8 == 3:
    R = window2((n-3)//8,P,P3)
    R = R + R
    R = R + R
    return (R + R) + P3
SLIDE 47
12
Subtraction on the curve cheap as addition. takes advantage of this. 2(2(2(2(2P)))) − P. (10000¯ 1)2; ¯ 1 denotes −1. 2(2(2(2(2P)) + P)) − P. (10010¯ 1)2. “Non-adjacent”: ±P ops are rated by ≥2 doublings. case: ≈b doublings b=2 additions of ±P. average ≈b=3 additions.
13
Width-2 signed sliding windows
def window2(n,P,P3): if n == 0: return 0 if n == 1: return P if n == 3: return P3 if n % 8 == 1: R = window2((n-1)/8,P,P3) R = R + R R = R + R return (R + R) + P if n % 8 == 3: R = window2((n-3)/8,P,P3) R = R + R R = R + R return (R + R) + P3 if n % R = R = R = return if n % R = R = R = return R = window2(n/2,P,P3) return def scalarmult(n,P): return
SLIDE 48
12
the curve addition. advantage of this. 2(2(2(2(2P)))) − P. ¯ 1 denotes −1. 2(2(2(2(2P)) + P)) − P. ±P ops are doublings. doublings additions of ±P. 3 additions.
13
Width-2 signed sliding windows
def window2(n,P,P3): if n == 0: return 0 if n == 1: return P if n == 3: return P3 if n % 8 == 1: R = window2((n-1)/8,P,P3) R = R + R R = R + R return (R + R) + P if n % 8 == 3: R = window2((n-3)/8,P,P3) R = R + R R = R + R return (R + R) + P3 if n % 8 == 5: R = window2((n+3)/8,P,P3) R = R + R R = R + R return (R + R) if n % 8 == 7: R = window2((n+1)/8,P,P3) R = R + R R = R + R return (R + R) R = window2(n/2,P,P3) return R + R def scalarmult(n,P): return window2(n,P,P+P+P)
SLIDE 49
12
this. . denotes −1. )) − P. are doublings. . additions.
13
Width-2 signed sliding windows
def window2(n,P,P3): if n == 0: return 0 if n == 1: return P if n == 3: return P3 if n % 8 == 1: R = window2((n-1)/8,P,P3) R = R + R R = R + R return (R + R) + P if n % 8 == 3: R = window2((n-3)/8,P,P3) R = R + R R = R + R return (R + R) + P3 if n % 8 == 5: R = window2((n+3)/8,P,P3) R = R + R R = R + R return (R + R) - P3 if n % 8 == 7: R = window2((n+1)/8,P,P3) R = R + R R = R + R return (R + R) - P R = window2(n/2,P,P3) return R + R def scalarmult(n,P): return window2(n,P,P+P+P)
SLIDE 57
Worst case: ≈b doublings plus ≈b/3 additions of ±P or ±3P. On average ≈b/4 additions.

Width-3 signed sliding windows: Precompute P, 3P, 5P, 7P. On average ≈b/5 additions.

Width 4: Precompute P, 3P, 5P, 7P, 9P, 11P, 13P, 15P. On average ≈b/6 additions.

Cost of precomputation eventually outweighs savings. Optimal: ≈b doublings plus roughly b/lg b additions.
SLIDE 61
Double-scalar multiplication

Want to quickly compute m, P, n, Q → mP + nQ.

e.g. verify signature (R, S) by computing h = H(R, M), computing SB − hA, checking whether R = SB − hA.

Obvious approach: Compute mP; compute nQ; add.

e.g. b = 256: ≈256 doublings for mP, ≈256 doublings for nQ, ≈50 additions for mP, ≈50 additions for nQ.
SLIDE 65
Joint doublings

Do much better by merging 2X + 2Y into 2(X + Y).

def scalarmult2(m,P,n,Q):
    if m == 0: return scalarmult(n,Q)
    if n == 0: return scalarmult(m,P)
    R = scalarmult2(m//2,P,n//2,Q)
    R = R + R
    if m % 2: R = R + P
    if n % 2: R = R + Q
    return R
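A sketch of the recursion in the same toy integer model; the scalarmult base case here is a plain double-and-add stand-in for the windowed version on the earlier slide:

```python
def scalarmult(n, P):
    # plain double-and-add; stand-in base case for the windowed version
    if n == 0: return 0
    R = scalarmult(n // 2, P)
    R = R + R
    if n % 2: R = R + P
    return R

def scalarmult2(m, P, n, Q):
    # joint doublings: one shared doubling per bit of max(m, n)
    if m == 0: return scalarmult(n, Q)
    if n == 0: return scalarmult(m, P)
    R = scalarmult2(m // 2, P, n // 2, Q)
    R = R + R
    if m % 2: R = R + P
    if n % 2: R = R + Q
    return R

# Toy model: "points" are integers, the group law is +.
assert scalarmult2(35, 1, 31, 10**6) == 35 + 31 * 10**6
```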
SLIDE 69
For example: merge
35P = 2(2(2(2(2P))) + P) + P,
31Q = 2(2(2(2Q+Q)+Q)+Q)+Q
into
35P + 31Q = 2(2(2(2(2P+Q)+Q)+Q)+P+Q) + P + Q.

≈b doublings (merged!), ≈b/2 additions of P, ≈b/2 additions of Q.

Combine idea with windows: e.g., ≈256 doublings for b = 256, ≈50 additions using P, ≈50 additions using Q.
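The merged chain can be checked in the toy integer model, where point addition is + and doubling is multiplication by 2:

```python
# Toy model: P and Q are integers; point addition is +, doubling is 2*.
P, Q = 1, 10**6
merged = 2*(2*(2*(2*(2*P + Q) + Q) + Q) + P + Q) + P + Q
assert merged == 35*P + 31*Q
```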
SLIDE 74

Batch verification

Verifying many signatures: need to be confident that S1B = R1 + h1A1, S2B = R2 + h2A2, S3B = R3 + h3A3, etc.

Obvious approach: Check each equation separately.

Much faster approach: Check a random linear combination of the equations.
SLIDE 78

Pick independent uniform random 128-bit z1, z2, z3, ….

Check whether (z1S1 + z2S2 + z3S3 + · · ·)B = z1R1 + (z1h1)A1 + z2R2 + (z2h2)A2 + z3R3 + (z3h3)A3 + · · ·.

(If ≠: see 2012 Bernstein–Doumen–Lange–Oosterwijk.)

Easy to prove: forgeries have probability ≤2^−128 of passing the check.
SLIDE 82

Multi-scalar multiplication

Review of asymptotic speeds:

1939 Brauer (windows): ≈ (1 + 1/lg b)·b additions to compute P → nP if n < 2^b.

1964 Straus (joint doublings): ≈ (1 + k/lg b)·b additions to compute P1, …, Pk → n1P1 + · · · + nkPk if n1, …, nk < 2^b.
SLIDE 86
1976 Yao: ≈ (1 + k/lg b)·b additions to compute P → n1P, …, nkP if n1, …, nk < 2^b.

1976 Pippenger: Similar asymptotics, but replace lg b with lg(kb). Faster than Straus and Yao if k is large. (Knuth says "generalization" as if speed were the same.)
SLIDE 91
More generally, Pippenger's algorithm computes ℓ sums of multiples of k inputs, using ≈ (min{k, ℓ} + kℓ/lg(kℓb))·b adds if all coefficients are below 2^b. Within 1 + ε of optimal.

Various special cases of Pippenger's algorithm were reinvented and patented by 1993 Brickell–Gordon–McCurley–Wilson, 1995 Lim–Lee, etc.

Is that the end of the story?
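A quick numeric sketch of these cost formulas (my translation of the slide's expressions into code; this evaluates the estimates only, no group operations):

```python
from math import log2

def straus_adds(k, b):
    # 1964 Straus: about (1 + k/lg b) * b additions
    return (1 + k / log2(b)) * b

def pippenger_adds(k, b, l=1):
    # Pippenger: about (min(k, l) + k*l/lg(k*l*b)) * b additions
    return (min(k, l) + k * l / log2(k * l * b)) * b

# For one output sum (l = 1), lg(kb) replaces lg b,
# so Pippenger pulls ahead as k grows:
k, b = 1024, 256
assert pippenger_adds(k, b) < straus_adds(k, b)
```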
SLIDE 95
No! 1989 Bos–Coster: If n1 ≥ n2 ≥ · · · then n1P1 + n2P2 + n3P3 + · · · = (n1 − qn2)P1 + n2(qP1 + P2) + n3P3 + · · · where q = ⌊n1/n2⌋.

Remarkably simple; competitive with Pippenger for random choices of the ni; much better memory usage.
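A sketch of Bos–Coster as a loop over a max-heap of coefficients, again with toy integer points. The occasional qP1 for q > 1 is done here with a plain double-and-add helper; for random scalars q is almost always small:

```python
import heapq

def scalarmult(n, P):
    # plain double-and-add helper for the occasional q*P1 step
    if n == 0: return 0
    R = scalarmult(n // 2, P)
    R = R + R
    if n % 2: R = R + P
    return R

def bos_coster(pairs):
    """Compute n1*P1 + n2*P2 + ... by repeatedly rewriting the largest
    term: n1*P1 + n2*P2 = (n1 - q*n2)*P1 + n2*(q*P1 + P2), q = n1//n2."""
    heap = [(-n, P) for n, P in pairs if n > 0]   # max-heap via negation
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, P1 = heapq.heappop(heap)
        n1 = -n1
        n2, P2 = heap[0]                  # second-largest coefficient
        n2 = -n2
        q = n1 // n2
        heapq.heapreplace(heap, (-n2, scalarmult(q, P1) + P2))
        if n1 - q * n2 > 0:
            heapq.heappush(heap, (-(n1 - q * n2), P1))
    return scalarmult(-heap[0][0], heap[0][1]) if heap else 0

# The coefficients from the slides' example, with integer stand-ins
# for seven distinct points:
ns = [32, 16, 300, 146, 77, 2, 1]
Ps = [10**(3 * i) for i in range(7)]
assert bos_coster(list(zip(ns, Ps))) == sum(n * P for n, P in zip(ns, Ps))
```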
SLIDE 99
Example of Bos–Coster:

000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1

Goal: Compute 32P, 16P, 300P, 146P, 77P, 2P, 1P.
SLIDE 103

Reduce largest row:

000100000 = 32
000010000 = 16
010011010 = 154 ←
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1

Goal: Compute 32P, 16P, 154P, 146P, 77P, 2P, 1P. Plus one extra addition: add 146P into 154P.
SLIDE 104
Reduce largest row:

000100000 = 32
000010000 = 16
000001000 = 8 ←
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1

plus 2 additions.
SLIDE 105
Reduce largest row:

000100000 = 32
000010000 = 16
000001000 = 8
001000101 = 69 ←
001001101 = 77
000000010 = 2
000000001 = 1

plus 3 additions.
SLIDE 106
Reduce largest row:

000100000 = 32
000010000 = 16
000001000 = 8
001000101 = 69
000001000 = 8 ←
000000010 = 2
000000001 = 1

plus 4 additions.
SLIDE 107
Reduce largest row:

000100000 = 32
000010000 = 16
000001000 = 8
000100101 = 37 ←
000001000 = 8
000000010 = 2
000000001 = 1

plus 5 additions.
SLIDE 108
Reduce largest row:
000100000 =  32
000010000 =  16
000001000 =   8
000000101 =   5 ←
000001000 =   8
000000010 =   2
000000001 =   1
plus 6 additions.
SLIDE 109
Reduce largest row:
000010000 =  16 ←
000010000 =  16
000001000 =   8
000000101 =   5
000001000 =   8
000000010 =   2
000000001 =   1
plus 7 additions.
SLIDE 110
Reduce largest row:
000000000 =   0 ←
000010000 =  16
000001000 =   8
000000101 =   5
000001000 =   8
000000010 =   2
000000001 =   1
plus 7 additions.
SLIDE 111
Reduce largest row:
000000000 =   0
000001000 =   8 ←
000001000 =   8
000000101 =   5
000001000 =   8
000000010 =   2
000000001 =   1
plus 8 additions.
SLIDE 112
Reduce largest row:
000000000 =   0
000000000 =   0 ←
000001000 =   8
000000101 =   5
000001000 =   8
000000010 =   2
000000001 =   1
plus 8 additions.
SLIDE 113
Reduce largest row:
000000000 =   0
000000000 =   0
000000000 =   0 ←
000000101 =   5
000001000 =   8
000000010 =   2
000000001 =   1
plus 8 additions.
SLIDE 114
Reduce largest row:
000000000 =   0
000000000 =   0
000000000 =   0
000000101 =   5
000000011 =   3 ←
000000010 =   2
000000001 =   1
plus 9 additions.
SLIDE 115
Reduce largest row:
000000000 =   0
000000000 =   0
000000000 =   0
000000010 =   2 ←
000000011 =   3
000000010 =   2
000000001 =   1
plus 10 additions.
SLIDE 116
Reduce largest row:
000000000 =   0
000000000 =   0
000000000 =   0
000000010 =   2
000000001 =   1 ←
000000010 =   2
000000001 =   1
plus 11 additions.
SLIDE 117
Reduce largest row:
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0 ←
000000001 =   1
000000010 =   2
000000001 =   1
plus 11 additions.
SLIDE 118
Reduce largest row:
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0
000000001 =   1
000000001 =   1 ←
000000001 =   1
plus 12 additions.
SLIDE 119
Reduce largest row:
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0 ←
000000001 =   1
000000001 =   1
plus 12 additions.
SLIDE 120
Reduce largest row:
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0 ←
000000001 =   1
plus 12 additions.
SLIDE 121
Reduce largest row:
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0
000000000 =   0 ←
plus 12 additions.
Final addition chain: 1, 2, 3, 5, 8, 16, 32, 37, 69, 77, 146, 154, 300.
Short, no temporary storage, low two-operand complexity.
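The reduction walked through above can be sketched in a few lines of Python (a toy illustration, not code from the slides; the function name is mine): keep the scalars in a max-heap, and repeatedly replace the largest scalar a by a − b, where b is the second largest.

```python
import heapq

def bos_coster_chain(scalars):
    """Bos-Coster reduction on the scalars alone: repeatedly replace the
    largest scalar a by a - b, where b is the second largest, until at
    most one nonzero scalar remains.  Returns every multiple of P that
    appears, i.e. the addition chain a point computation would walk."""
    heap = [-s for s in scalars if s > 0]  # max-heap via negation
    heapq.heapify(heap)
    chain = set(-s for s in heap)
    while len(heap) > 1:
        a = -heapq.heappop(heap)  # largest scalar
        b = -heap[0]              # second largest
        if a - b > 0:
            chain.add(a - b)
            heapq.heappush(heap, -(a - b))
        # a - b == 0: the row duplicates another multiple; drop it
    return sorted(chain)

# The slides' example: scalars 32, 16, 300, 146, 77, 2, 1
print(bos_coster_chain([32, 16, 300, 146, 77, 2, 1]))
# [1, 2, 3, 5, 8, 16, 32, 37, 69, 77, 146, 154, 300]
```

The returned chain has 13 entries, so starting from 1P it costs 12 additions, matching the count on the slide.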
SLIDE 125
Revised goal: Compute
32P1 + 16P2 + 300P3 + 146P4 + 77P5 + 2P6 + 1P7.
First compute P′4 = P4 + P3
and then recursively compute
32P1 + 16P2 + 154P3 + 146P′4 + 77P5 + 2P6 + 1P7.
Same scalars show up as before.
Ed25519 batch verification: verify batch of 64 signatures about twice as fast as verifying each separately.
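The multi-scalar version can be sketched the same way (a toy model of my own, not code from the slides: "points" are plain integers and the group operation is integer addition, so n·P is just n*p and the result must equal the plain sum). Each step pops the largest pair (n, P), peeks at the second largest (m, Q), and replaces them with (n − m, P) and (m, Q + P), exactly the P′4 = P4 + P3 step above.

```python
import heapq

def bos_coster_multiscalar(pairs):
    """Compute sum(n_i * P_i) by Bos-Coster reduction.  Toy model:
    points are integers and point addition is integer addition, so
    the result equals sum(n_i * p_i) for input pairs (n_i, p_i)."""
    heap = [(-n, p) for n, p in pairs if n > 0]
    heapq.heapify(heap)
    while len(heap) > 1:
        n, p = heapq.heappop(heap)  # largest scalar and its point
        n = -n
        m, q = heap[0]              # second largest
        m = -m
        # Invariant: n*p + m*q == (n - m)*p + m*(q + p)
        heapq.heapreplace(heap, (-m, q + p))
        if n - m > 0:
            heapq.heappush(heap, (-(n - m), p))
    if not heap:
        return 0
    n, p = heap[0]
    return -n * p  # finish the last row (double-and-add in a real group)

# Hypothetical points 11, 13, ... standing in for P1, P2, ...
print(bos_coster_multiscalar(
    [(32, 11), (16, 13), (300, 17), (146, 19), (77, 23), (2, 29), (1, 31)]))
# 10294
```

In real Ed25519 batch verification the points are curve points and Q + P is a curve addition; the integer stand-ins only make the invariant easy to check.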