Selecting Elliptic Curves for Cryptography: an Efficiency and Security Analysis
Craig Costello ECC2014 – Chennai, India Joint work with Joppe Bos (NXP), Patrick Longa (MSR), Michael Naehrig (MSR)
http://eprint.iacr.org/2014/130.pdf
June 2013 – the Snowden leaks
“… the NSA had written the [crypto] standard and could break it.”
Post-Snowden responses
manipulated them…”
undermining of cryptographic solutions and standards.”
formal request to CFRG for new elliptic curves for usage in TLS!!!
http://crypto.2014.rump.cr.yp.to/487f98c1a1a031283925d7affdbdef1c.pdf
Pre-Snowden suspicions re: NIST (and their curves)
method] to generate the NIST curves … or so he says…”
resources and expertise to dominate NIST, and NIST has rarely played a significant independent role.”
so we don’t know if the algorithm designer [NIST] knows [the backdoor] 𝑒.”
be distrusted?”
NIST’s CurveP256: one-in-a-million?
Prime characteristic: $p = 2^{256} - 2^{224} + 2^{192} + 2^{96} - 1$
Elliptic curve: $E/\mathbb{F}_p : y^2 = x^3 - 3x + b$
Curve constant: $b = \sqrt{-27/\mathrm{SHA1}(s)}$
Seed: $s$ = c49d360886e704936a6678e1139d26b7819f7e90
Scott ‘99: “Consider now the possibility that one in a million of all curves have an exploitable structure that "they" know about, but we don't.. Then "they" simply generate a million random seeds until they find one that generates one of "their" curves…”
Rigidity
allow room for manipulation
and choice of hash function themselves introduce more wiggle room?
arguments, e.g. choose a fast prime field and take the smallest curve constant that gives “optimal” group order/s [Bernstein ‘06]
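The rigid recipe above (fix the field, then take the smallest curve constant whose group orders are acceptable) can be sketched at toy scale. The block below is only an illustration under stated assumptions: the prime 1019, the brute-force point count, and the function names are all hypothetical choices, and "acceptable" is reduced to "curve order and twist order are both prime". A real search uses a point-counting algorithm such as SEA on a 256-bit (or larger) field and applies further criteria.

```python
def is_prime(n):
    """Trial division; adequate for the toy sizes used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def curve_order(p, a, b):
    """Count points on y^2 = x^3 + a*x + b over F_p by brute force."""
    squares = {(y * y) % p for y in range(1, p)}  # nonzero squares mod p
    count = 1  # the point at infinity
    for x in range(p):
        rhs = (x * x * x + a * x + b) % p
        if rhs == 0:
            count += 1      # one point with y = 0
        elif rhs in squares:
            count += 2      # two points (x, y) and (x, -y)
    return count


def smallest_rigid_constant(p, a=-3):
    """Return (b, order, twist_order) for the smallest b >= 1 such that
    the curve and its quadratic twist both have prime order, else None."""
    for b in range(1, p):
        if (4 * a**3 + 27 * b * b) % p == 0:
            continue  # singular curve, skip
        n = curve_order(p, a, b)
        n_twist = 2 * p + 2 - n  # orders of a curve and its twist sum to 2p + 2
        if is_prime(n) and is_prime(n_twist):
            return b, n, n_twist
    return None


if __name__ == "__main__":
    print(smallest_rigid_constant(1019))
```

Because the search is deterministic and has a single obvious stopping rule, anyone can rerun it and should arrive at the same constant, which is the point of the rigidity argument.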
So then, what about these?
Replacement curve    Prime $p$                                        Constant $b$
(NEW) Curve P-256    $2^{256} - 2^{224} + 2^{192} + 2^{96} - 1$       2627
(NEW) Curve P-384    $2^{384} - 2^{128} - 2^{96} + 2^{32} - 1$        14060
(NEW) Curve P-521    $2^{521} - 1$                                    167884
(Our) Motivations
(and this is true regardless of NIST-curve paranoia!)
The players
Umpire Paterson (CFRG co-chair)
Contents
PART I: CHOOSING CURVES
Speed-records and security hunches
Prime fields and modular reduction
Curve models and killing cofactors
Montgomery ladder and twist-security
Our chosen curves: the NUMS curves
PART II: IMPLEMENTING THEM
Constant-time implementations and recoding scalars
Exception-free algorithms and Weierstrass “completeness”
Performance numbers and practical considerations
Conclusions and recommendations
The last 2 years of “state-of-the-art” speeds
4-GLV/GLS using CM curve over quad. ext. field
Laddering on genus 2 Kummer surface
2-dimensional Montgomery ladder using Q-curve over quad. ext. field
GLS on a composite-degree binary extension field
All of the above offer ≈128-bit security against the best known attack
BUT none of the above have been considered in the search for new curves!!!
Security hunches killing all the fun
fields are safest hedge for real world deployment
MOV degree… no special structure!
group orders (SEA), then assert above are huge (they will be)
WARNING: 𝜚
< 100,000 cyc
Two prime forms analyzed
(1) Pseudo-Mersenne primes: $p = 2^{\alpha} - \gamma$
(2) Montgomery-friendly primes: $p = 2^{\alpha}(2^{\beta} - \gamma) - 1$
(a) one “full bitlength” prime (b) one “relaxed bitlength” prime
Some premature performance ratios
Target security level   Pseudo-Mers. full   Pseudo-Mers. relaxed   Mont.-friendly full   Mont.-friendly relaxed
128                     1.00x               0.97x                  1.00x                 0.84x
192                     0.94y               0.90y                  1.00y                 0.90y
256                     0.89z               0.85z                  1.00z                 0.92z
Cost ratios of variable-base scalar multiplications on twisted Edwards curves at three target security levels
Full length pseudo-Mersenne primes
& avoids temptation to keep going lower
Security level   Prime
128              $2^{256} - 189$
192              $2^{384} - 317$
256              $2^{512} - 569$
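The three primes in this table can be reproduced by walking downward from a power of two until a prime is hit. The sketch below assumes a Miller-Rabin test with a fixed set of small bases (a probabilistic check, not a primality proof), and the function names are hypothetical; it is meant only to show that the choice leaves no freedom once the bit length is fixed.

```python
def is_probable_prime(n, bases=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)):
    """Miller-Rabin with fixed bases; probabilistic for large n."""
    if n < 2:
        return False
    for q in bases:
        if n % q == 0:
            return n == q
    # write n - 1 = d * 2^s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def largest_prime_offset(bits):
    """Smallest odd offset c such that 2^bits - c passes the primality test."""
    c = 1
    while not is_probable_prime(2**bits - c):
        c += 2
    return c


if __name__ == "__main__":
    for bits in (256, 384, 512):
        c = largest_prime_offset(bits)
        print(bits, c, (2**bits - c) % 4)
```

The last printed column is the residue of each prime modulo 4, which matters later for the square-root formula used with these fields.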
Arithmetic for the pseudo-Mersenne primes
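input: $0 \le x, y < 2^{\alpha} - \gamma$
$x \cdot y \in \mathbb{Z} = h \cdot 2^{\alpha} + l \equiv h \cdot 2^{\alpha} + l - h(2^{\alpha} - \gamma) \pmod{2^{\alpha} - \gamma} = l + \gamma \cdot h$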
(after fixed=worst-case number of reduction rounds)
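$a^{-1} \equiv a^{p-2} \bmod p$
$\sqrt{a} \equiv a^{(p+1)/4} \bmod p$
[Diagram: the double-length product $x \cdot y$ is split into a low half $l$ and a high half $h$, which are folded together as $l + \gamma \cdot h$.]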
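The folding reduction, the Fermat inversion and the square-root exponentiation on this slide can be sketched with Python integers. The block fixes the 128-bit-security prime $2^{256} - 189$; the names `BITS`, `C`, `reduce_pm`, `inv` and `sqrt` are hypothetical, and Python's big integers are not constant-time, so this only illustrates the arithmetic, not a protected implementation.

```python
BITS, C = 256, 189            # prime P = 2^256 - 189 from the table above
P = (1 << BITS) - C
MASK = (1 << BITS) - 1


def reduce_pm(z):
    """Reduce 0 <= z < 2^(2*BITS) modulo P by folding the high half.

    Since 2^BITS = C (mod P), hi * 2^BITS + lo = lo + C * hi (mod P).
    Two folds bring a double-length product below 2 * P; a fixed number
    of rounds is used regardless of the input value.
    """
    for _ in range(2):
        hi, lo = z >> BITS, z & MASK
        z = lo + C * hi
    if z >= P:                # one final correction suffices after two folds
        z -= P
    return z


def mul(x, y):
    return reduce_pm(x * y)


def inv(a):
    """Inverse via Fermat's little theorem: a^(P-2) mod P."""
    return pow(a, P - 2, P)


def sqrt(a):
    """A square root of a quadratic residue a; valid because P = 3 (mod 4)."""
    return pow(a, (P + 1) // 4, P)


if __name__ == "__main__":
    x, y = 3**150 % P, 7**120 % P
    print(mul(x, y) == x * y % P, mul(x, inv(x)) == 1)
```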
What primes do others like?
$p = 2^{255} - 19$, $p = 2^{414} - 17$, $p = 2^{521} - 1$
Ed448-Goldilocks, Ed480-Ridinghood: $p = 2^{448} - 2^{224} - 1$, $p = 2^{480} - 2^{240} - 1$
$p = 2^{221} - 3$, $p = 2^{383} - 187$, $p = 2^{511} - 187$, $p = 2^{382} - 105$
$p$ = 76884956397045344220809746629001649093037950200943055203735601445031516197751
A world of curve models
$y^2 = x^3 + ax^2 + 16ax$    Doubling-oriented DIK curves
$ax^2 + y^2 = 1 + dx^2y^2$    (twisted) Edwards curves
$By^2 = x^3 + Ax^2 + x$    Montgomery curves
$ax^3 + y^3 + 1 = dxy$    (twisted) Hessian curves
$y^2 = x^3 + ax + b$    short Weierstrass curves
$s^2 + c^2 = 1 \cap as^2 + d^2 = 1$    Jacobi intersections
$y^2 = x^4 + 2ax^2 + 1$    Jacobi quartics
See Bernstein and Lange’s Explicit-Formulas Database (EFD) and/or Hisil’s PhD thesis
Montgomery curves
$By^2 = x^3 + Ax^2 + x$
ladder
free
(twisted) Edwards curves
$ax^2 + y^2 = 1 + dx^2y^2$
have complete group law
Weierstrass curves
$y^2 = x^3 + ax + b$
Brainpool curves
The chosen ones
Complete addition on Edwards curves
Let $d \ne \square$ in $K$ and consider the Edwards curve $E/K : x^2 + y^2 = 1 + dx^2y^2$
For all (!!!) $P_1 = (x_1, y_1)$, $P_2 = (x_2, y_2) \in E(K)$:
$P_1 + P_2 =: P_3 = \left( \frac{x_1y_2 + y_1x_2}{1 + dx_1x_2y_1y_2}, \frac{y_1y_2 - x_1x_2}{1 - dx_1x_2y_1y_2} \right)$
Denominators never zero, neutral element rational $= (0, 1)$, etc.
(Bernstein-Lange, AsiaCrypt 2007)
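A toy check of the completeness claim: over a small field one can enumerate every point and confirm that the single addition formula works for every pair, including doublings and additions with the neutral element. The field size 13 and the non-square constant 2 are hypothetical choices made only so the exhaustive loop is tiny.

```python
P = 13   # toy prime (assumption, for illustration only)
D = 2    # a non-square modulo 13, as completeness requires


def inv(a):
    return pow(a, P - 2, P)


def on_curve(pt):
    x, y = pt
    return (x * x + y * y - 1 - D * x * x * y * y) % P == 0


def add(p1, p2):
    """Edwards addition; the same formula handles every input pair."""
    x1, y1 = p1
    x2, y2 = p2
    t = D * x1 * x2 * y1 * y2 % P
    x3 = (x1 * y2 + y1 * x2) * inv(1 + t) % P
    y3 = (y1 * y2 - x1 * x2) * inv(1 - t) % P
    return (x3, y3)


POINTS = [(x, y) for x in range(P) for y in range(P) if on_curve((x, y))]

if __name__ == "__main__":
    neutral = (0, 1)
    ok = True
    for a in POINTS:
        ok &= add(a, neutral) == a
        for b in POINTS:
            x1, y1 = a
            x2, y2 = b
            t = D * x1 * x2 * y1 * y2 % P
            # neither denominator vanishes, and the sum lands on the curve
            ok &= (1 + t) % P != 0 and (1 - t) % P != 0
            ok &= on_curve(add(a, b))
    print(len(POINTS), ok)
```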
Edwards vs twisted Edwards
General twisted Edwards: $E_{a,d} : ax^2 + y^2 = 1 + dx^2y^2$
When $a = 1$ (Edwards!): $E_{1,d} : x^2 + y^2 = 1 + dx^2y^2$
Fastest complete addition (for $d \ne \square$): 9M+1d (Bernstein-Lange, AsiaCrypt 2007 and Hisil et al., AsiaCrypt 2008)
When $a = -1$: $E_{-1,d} : -x^2 + y^2 = 1 + dx^2y^2$
Fastest addition 8M, also (technically) incomplete when $p \equiv 3 \bmod 4$ (Hisil et al., AsiaCrypt 2008)
Killing cofactors and the fastest formulas
so assume $\#E = 4r$ where $r$ is a large prime
$r$, $2r$, or $4r$
$p)[4]$ to reveal
$k \bmod 4$ (i.e. the last two bits of $k$)
Killing cofactors and the fastest formulas
Our approach
$E_{-1,d} : -x^2 + y^2 = 1 + dx^2y^2$
$k \in \{1, 2, \ldots, r - 1\} \leftrightarrow k \in \{4, 8, \ldots, 4r - 4\}$
$P \in E \mapsto Q := [4]P \in E[r]$
$[k]P = [k]Q$
“specified curve” incomplete, but uses fastest formulas and stays on one curve
Killing cofactors and the fastest formulas
Hamburg’s approach (http://eprint.iacr.org/2014/027)
$E_{1,d} : x^2 + y^2 = 1 + dx^2y^2$
$\phi : E_{1,d} \to E_{-1,d-1}$
$[k]P$ on $E_{-1,d-1}$ (since $\mathrm{im}\,\phi = E_{-1,d-1}[r]$)
$\hat{\phi} : E_{-1,d-1} \to E_{1,d}$
“specified curve” complete and uses fastest formulas, but isogeny needed
Killing cofactors and the fastest formulas
Bernstein-Chuengsatiansup-Lange approach (Curve41417)
$E_{1,d} : x^2 + y^2 = 1 + dx^2y^2$
$k \in \{8, 16, \ldots\}$
but compare ≈3727M to ≈3645M (+ $\phi$ + $\hat{\phi}$)
“specified curve” is complete, stay on it (simple), but slightly slower additions
Textbook arithmetic on $y^2 = x^3 + ax + b$
$(x_{[2]T}, y_{[2]T}) = \mathrm{DBL}(x_T, y_T)$
$(x_{T+P}, y_{T+P}) = \mathrm{ADD}(x_T, y_T, x_P, y_P)$
Montgomery’s arithmetic on $By^2 = x^3 + Ax^2 + x$
$x_{[2]T} = \mathrm{DBL}(x_T)$
$x_{T+P} = \mathrm{DIFFADD}(x_T, x_P, x_{T-P})$
Differential additions …
vs.
so we always add them and double one (depends on binary rep. of k) to preserve the invariant
… and the Montgomery ladder
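A minimal sketch of the x-only ladder built from a doubling and a differential addition is given below. It assumes the prime $2^{256} - 189$ and the Montgomery constant $-61370$ listed for the 128-bit NUMS curve; the function names and the projective (X : Z) representation are choices made for this illustration. The Python `if` on the scalar bit is not constant-time; a protected implementation would replace it with a masked swap.

```python
P = 2**256 - 189
A = -61370                      # Montgomery constant from the NUMS table
A24 = ((A + 2) // 4) % P        # (A + 2)/4, an exact integer for this A


def xdbl(X, Z):
    """x-only doubling in projective (X : Z) coordinates."""
    t0 = (X + Z) ** 2 % P
    t1 = (X - Z) ** 2 % P
    t2 = (t0 - t1) % P          # equals 4*X*Z
    return t0 * t1 % P, t2 * (t1 + A24 * t2) % P


def xadd(XP, ZP, XQ, ZQ, x_diff):
    """Differential addition: x(P+Q) from x(P), x(Q) and affine x(P-Q)."""
    u = (XP - ZP) * (XQ + ZQ) % P
    v = (XP + ZP) * (XQ - ZQ) % P
    return (u + v) ** 2 % P, x_diff * (u - v) ** 2 % P


def ladder(x, k):
    """Return the affine x-coordinate of [k]P given x = x(P), with x != 0.

    Invariant: R1 - R0 = P at every step, so xadd always has the
    difference it needs. Returns 0 if the result is the point at infinity.
    """
    R0, R1 = (1, 0), (x % P, 1)
    for bit in bin(k)[2:]:
        if bit == "1":
            R0, R1 = xadd(*R0, *R1, x), xdbl(*R1)
        else:
            R0, R1 = xdbl(*R0), xadd(*R0, *R1, x)
    X, Z = R0
    return X * pow(Z, P - 2, P) % P


if __name__ == "__main__":
    a, b, x = 0x1234567, 0x89ABCDE, 9
    # Diffie-Hellman style check: both orders of multiplication agree
    print(ladder(ladder(x, a), b) == ladder(ladder(x, b), a))
```

The ladder never asks whether the input x-coordinate lies on the curve or on its quadratic twist, which is why the security of the twist matters for this style of implementation.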
Twist-security
$x_{[k]P} = \mathrm{LADDER}(x_P, k, A)$
$E$ and its quadratic twist $E'$
as $\mathrm{LADDER}(x, k, A)$ gives discrete log on $E$ or $E'$ for all $x \in K$
why not have it anyway?
The NUMS curves
Security $s$   Prime $p$          Weierstrass $b$   Twisted Edwards $d$   Montgomery $A$
128            $2^{256} - 189$    152961            15342                 −61370
192            $2^{384} - 317$    −34568            333194                −1332778
256            $2^{512} - 569$    121243            637608                −2550434
(fun fact: in these cases, largest primes full stop)
$d > 0$ corresponds to $t > 0$.
Small constants all round for $p \equiv 3 \bmod 4$
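$M_A : y^2 = x^3 + Ax^2 + x$ and $E_{a,d} : ax^2 + y^2 = 1 + dx^2y^2$
[Diagram: $M_A$ and its twist $M_{-A}$ are isomorphic (≅) to $E_{-1,d_1}$ and $E_{-1,1/d_1}$; isogenies and twists connect these to $E_{-1,d_0}$, $E_{-1,-(d_0+1)}$, $E_{1,d_0+1}$ and $E_{1,-d_0}$. Annotation: both non-squares.]
$d_1 = -\frac{A-2}{A+2}$ (big)
$d_0 = -\frac{A+2}{4}$ (small)
Searches minimize $|A|$ with $A \equiv 2 \bmod 4$
Upshot: a search that minimizes the size of the Montgomery constant also minimizes the size of both the twisted Edwards and Edwards constants (see Lemmas 1-3)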
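The relation between the Montgomery constant and the small twisted Edwards constant can be checked directly against the three rows of the NUMS table. The sketch below hard-codes those table values; the dictionary layout and function name are hypothetical.

```python
# (twisted Edwards constant, Montgomery constant) per security level,
# copied from the NUMS table above
NUMS = {
    128: (15342, -61370),
    192: (333194, -1332778),
    256: (637608, -2550434),
}


def small_edwards_constant(montgomery_const):
    """Small twisted Edwards constant: -(A + 2)/4 for Montgomery constant A.

    Requires A = 2 (mod 4) so that the division is exact.
    """
    assert montgomery_const % 4 == 2
    return -(montgomery_const + 2) // 4


if __name__ == "__main__":
    for level, (edwards_const, montgomery_const) in NUMS.items():
        print(level, small_edwards_constant(montgomery_const) == edwards_const)
```

Since the Edwards constant is a fixed affine function of the Montgomery constant, minimizing the absolute value of one minimizes the other, which is the upshot stated on the slide.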
Constant time implementations
execution to provide protection against timing and cache attacks
$k = [-, 0, 0, 1, 0, 1, \ldots]$
double-and-always-add:
[−,   initialize $Q \leftarrow P$
0,    compute $[2]Q$, $[2]Q + P$    $Q \leftarrow [2]Q$
0,    compute $[2]Q$, $[2]Q + P$    $Q \leftarrow [2]Q$
1,    compute $[2]Q$, $[2]Q + P$    $Q \leftarrow [2]Q + P$
0,    compute $[2]Q$, $[2]Q + P$    $Q \leftarrow [2]Q$
1,    compute $[2]Q$, $[2]Q + P$    $Q \leftarrow [2]Q + P$
..
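The trace above can be written as a short routine. This sketch assumes a scalar whose leading bit is 1 (the "−" entry) and demonstrates the routine on a hypothetical toy Edwards curve over the field of 13 elements, chosen only so the result can be compared with naive repeated addition. Indexing a tuple by the bit is a stand-in for the masked selection a real constant-time implementation would use.

```python
P_MOD = 13   # toy prime (assumption, for illustration only)
D = 2        # non-square mod 13, so the Edwards addition below is complete


def edwards_add(p1, p2):
    """Complete Edwards addition on x^2 + y^2 = 1 + D*x^2*y^2 over F_13."""
    x1, y1 = p1
    x2, y2 = p2
    t = D * x1 * x2 * y1 * y2 % P_MOD
    x3 = (x1 * y2 + y1 * x2) * pow(1 + t, P_MOD - 2, P_MOD) % P_MOD
    y3 = (y1 * y2 - x1 * x2) * pow(1 - t, P_MOD - 2, P_MOD) % P_MOD
    return (x3, y3)


def double_and_always_add(k, point, add=edwards_add):
    """Compute [k]point for k >= 1, doing one double and one add per bit."""
    bits = bin(k)[2:]            # the leading 1 is consumed by the init step
    acc = point
    for bit in bits[1:]:
        doubled = add(acc, acc)
        summed = add(doubled, point)
        # both candidates are always computed; the bit only selects one
        acc = (doubled, summed)[int(bit)]
    return acc


if __name__ == "__main__":
    base = (4, 4)                # a point on the toy curve
    print(double_and_always_add(5, base))
```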
Fixed-window recoding for variable-base
$w = 1$: [ …, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, … ]
much as usual… BUT not when using bigger/optimal windows!!!
$w = 5$: [ …, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, … ] becomes [ …, 26, 21, 2, … ]
… 5 DBL’s → ADD($[26]P$) → 5 DBL’s → ADD($[21]P$) → 5 DBL’s → ADD($[2]P$) …
Protected “odd-only” fixed-window recoding algorithm
values, i.e. $k = (k_t, \ldots, k_0)$, where $t = \lceil \log_2(r)/w \rceil$
(only half the precomputed values needed, and there are no zeros)
Total: 52 × 5 + 1 = 261 DBL’s, 52 + 16 = 68 ADD’s.
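One common way to obtain such a signed, odd-only recoding is sketched below. It is an assumption that this matches the general shape of the recoding used in the talk; the paper's own algorithm (which must also run in constant time) may differ in detail, and the function name and parameters are hypothetical. Every digit comes out odd, so no zero digits occur and only the odd multiples need to be stored in the lookup table.

```python
def recode_odd_fixed_window(k, w, t):
    """Recode an odd scalar k > 0 into t + 1 odd signed digits.

    Returns digits d_0..d_t (least significant first) such that
    k = sum(d_i * 2^(w*i)) and every d_i is odd with |d_i| <= 2^w - 1.
    The loop length depends only on t, never on the value of k.
    """
    assert k > 0 and k % 2 == 1
    digits = []
    for _ in range(t):
        d = (k % (1 << (w + 1))) - (1 << w)   # odd, in [-(2^w - 1), 2^w - 1]
        digits.append(d)
        k = (k - d) >> w                      # exact division; k stays odd
    digits.append(k)
    return digits


if __name__ == "__main__":
    w, bits = 5, 256
    t = -(-bits // w)                         # ceil(256 / 5) = 52
    k = (1 << 255) | 0x1234567
    digits = recode_odd_fixed_window(k, w, t)
    print(len(digits), all(d % 2 == 1 for d in digits))
```

With a window of 5 and a 256-bit scalar this gives 52 windows, which lines up with the 52 × 5 doublings and 52 main-loop additions in the totals on the slide.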
Much more to constant-time implementations
e.g: recoding was for odd scalars only: negate every scalar, mask in the odd one, negate every “final” point, mask correct result… e.g: recoding the scalars themselves must be constant time e.g: must access/load every lookup element, every time, and mask
see http://eprint.iacr.org/2014/130.pdf and http://research.microsoft.com/en-us/projects/nums/ for solutions to these problems and more…
implementations, BUT only assuming the ECC formulas do their job!
Guaranteeing exception-free routines
$P, [3]P, \ldots, [2^w - 1]P$ in the lookup table, or their inverse
(i) complete Edwards additions are not the fastest (ii) typical Weierstrass additions far from complete
scenario where $P$ is known (precomps mean larger lookup table, so more potential trouble)
$[k]P$ without exception
Guaranteeing exception-free routines
Edwards variable-base scalar multiplications will compute without exception if: fastest dedicated addition formulas are used throughout, except the final addition, which needs to be unified (for our proof to go through)
Edwards fixed-base scalar multiplications will compute without exception if: complete additions are used throughout (for our proof to go through)
Fine with me… Unified? Complete?
Weierstrass completeness
to compute at least two sets of explicit formulae to guarantee every sum is computed: i.e. no $f_X, f_Y, f_Z$ such that
$X_3 = f_X(X_1, Y_1, Z_1, X_2, Y_2, Z_2)$
$Y_3 = f_Y(X_1, Y_1, Z_1, X_2, Y_2, Z_2)$
$Z_3 = f_Z(X_1, Y_1, Z_1, X_2, Y_2, Z_2)$
computes the correct sum $(X_3 : Y_3 : Z_3) = (X_1 : Y_1 : Z_1) + (X_2 : Y_2 : Z_2)$ for all points on a general curve
$(f_X, f_Y, f_Z)$ and $(f_X', f_Y', f_Z')$, where at least one set will always do the job…
Weierstrass completeness
$(X_1 : Y_1 : Z_1) + (X_2 : Y_2 : Z_2)$ will be at least one of $(X_3 : Y_3 : Z_3)$ or $(X_3' : Y_3' : Z_3')$:
gave $22\mathbf{M} + 4\mathbf{M}_b$ (compared to ≈ $14\mathbf{M}$ for dedicated projective additions)
… there’s got to be a better way…
Weierstrass “pseudo-completeness”
they give separate routines – we merge into one with masking
Compare to $\left( \frac{x_1y_2 + y_1x_2}{1 + dx_1x_2y_1y_2}, \frac{y_1y_2 - x_1x_2}{1 - dx_1x_2y_1y_2} \right)$
TLS handshake with PFS: ECDH(E)-ECDSA
$k, P \mapsto [k]P$ ($P$ not known in advance)
$k, P \mapsto [k]P$ ($P$ known in advance)
$a, b, P, Q \mapsto [a]P + [b]Q$ ($P$ known in advance, $Q$ not)
Three scenarios
not necessary?? [Hamburg, a few days ago, private communication]
Clock cycles ($\times 10^3$) for various scalar multiplications
Intel Core i7-2600 Sandy Bridge, compiled with Linux / Visual Studio
Security level   Prime                  Curve             Variable   Fixed   Double
128              $p = 2^{256} - 189$    Weierstrass       270        107     289
                                        twisted Edwards   216        82      231
192              $p = 2^{384} - 317$    Weierstrass       714        252     758
                                        twisted Edwards   588        201     614
256              $p = 2^{512} - 569$    Weierstrass       1,504      488     1,596
                                        twisted Edwards   1,242      391     1,308
Our work (in a nutshell)
Consider different families of primes for fast arithmetic
Weierstrass curves and twisted Edwards curves
128-bit security, 192-bit security, 256-bit security
Constant-time, exception-free algorithms to do crypto
Demonstrate potential of new curves inside the Transport Layer Security (TLS) protocol
The sell: what did we do differently?
What could we do differently?
“implementations [should] readily expose both a scalar point multiplication operation and a point addition operation”
(and optionally use Hamburg’s isogeny trick?)
“If these requirements become final, then surely the complete curves mod the Microsoft primes with a=1 and no restriction on the sign of d (choose the one with q<p) should be in the running”.
… see also …
http://eprint.iacr.org/2014/130.pdf
http://research.microsoft.com/en-us/projects/nums/
http://research.microsoft.com/apps/pubs/default.aspx?id=219966
http://tools.ietf.org/html/draft-black-numscurves-02