SLIDE 1
Curve25519: new Diffie-Hellman speed records

Thanks to: University of Illinois at Chicago; Danmarks Tekniske Universitet; Alfred P. Sloan Foundation.

Which public-key systems are smallest? Fastest?

Real-world cost measures: Pentium cycles, Athlon cycles, etc. for generating keys, signing, verifying, encrypting, decrypting; key bytes, signed-message bytes, ciphertext bytes, etc.

More useful than simplified cost measures, although harder to analyze.
SLIDE 3
eBATS (ECRYPT Benchmarking of Asymmetric Systems): new project to measure time and space consumed by public-key signature systems, public-key encryption systems, public-key secret-sharing systems.

http://ebats.cr.yp.to
SLIDE 5
This talk’s scope

Focus on private communications: ssh, email, purchasing, etc.

Typical setup: Each communicating party has a long-term secret key and a long-term public key.

Alice authenticates and encrypts messages to Bob using Alice’s secret key and Bob’s public key.

Bob verifies and decrypts using Alice’s public key and Bob’s secret key.
SLIDE 7
This talk’s recommendations

The “asymmetric” part: Alice, Bob use Curve25519 to compute long-term shared secret from secret keys, public keys. Note: minimal asymmetric usage!

The “symmetric” part: Alice, Bob use shared secret as key for Poly1305+Salsa20 to authenticate+encrypt packets.

Curve25519 is the bottleneck if there aren’t many packets. This talk focuses on Curve25519.
SLIDE 9
Curve25519 secret key: 32 bytes.
Curve25519 public key: 32 bytes.

Time to compute shared secret: 957904 Pentium 4 cycles or 624786 Athlon cycles or ... plus negligible hashing time.

No data-dependent branches.
No data-dependent indexing.

No known patent problems.
Software is in public domain.
http://cr.yp.to/ecdh.html

Best attack known is more expensive than typical 128-bit brute-force search.
SLIDE 11
Alice’s secret key is integer $a$; minor restrictions.

Alice’s public key is power $9^a$ in Curve25519 group.

If Bob’s secret key is $b$: Curve25519 uses hash of $9^{ab}$ as {Alice, Bob}’s shared secret.

Bob computes shared secret with just one exponentiation and one short hash.
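
The key exchange described here can be sketched in a few lines of Python. This is a minimal illustration, not the talk's optimized software: it follows the x-coordinate ladder as published in RFC 7748, and it assumes SHA-256 as the "short hash" (the slides do not name one). The helper names `clamp`, `cswap`, `keypair` and `shared_secret` are made up for this sketch, and Python big integers are not constant-time.

```python
import hashlib
import secrets

P = 2**255 - 19                      # field prime
A24 = 121665                         # (486662 - 2) / 4
BASE = (9).to_bytes(32, "little")    # public base point x = 9


def clamp(secret: bytes) -> int:
    """Apply the 'minor restrictions' to a 32-byte secret key."""
    n = int.from_bytes(secret, "little")
    n &= ~7                 # force a multiple of 8
    n &= (1 << 254) - 1     # clear bits 254 and 255 ...
    n |= 1 << 254           # ... then set bit 254
    return n


def cswap(bit: int, a: int, b: int):
    """Swap a and b when bit == 1, using arithmetic instead of a branch."""
    delta = bit * (a - b)
    return a - delta, b + delta


def x25519(secret: bytes, u: bytes) -> bytes:
    """Compute the x-coordinate of n * (point with x-coordinate u)."""
    n = clamp(secret)
    x1 = int.from_bytes(u, "little") & ((1 << 255) - 1)
    x2, z2, x3, z3 = 1, 0, x1, 1
    swap = 0
    for t in reversed(range(255)):
        bit = (n >> t) & 1
        swap ^= bit
        x2, x3 = cswap(swap, x2, x3)
        z2, z3 = cswap(swap, z2, z3)
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    x2, x3 = cswap(swap, x2, x3)
    z2, z3 = cswap(swap, z2, z3)
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def keypair():
    """Return (secret key, public key), 32 bytes each."""
    sk = secrets.token_bytes(32)
    return sk, x25519(sk, BASE)


def shared_secret(my_sk: bytes, peer_pk: bytes) -> bytes:
    """One exponentiation plus one short hash (SHA-256 is an assumption)."""
    return hashlib.sha256(x25519(my_sk, peer_pk)).digest()
```

Alice calls `shared_secret(a_sk, b_pk)` and Bob calls `shared_secret(b_sk, a_pk)`; both obtain the same 32 bytes.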
SLIDE 13
Exponentiation methods in the previous literature take more than twice as long at the Curve25519 security level. (Other secret-sharing methods: even slower.)

Many interacting parameters in design and implementation. Hard to find optimal parameters.

Remainder of this talk discusses some of the choices made in designing and implementing Curve25519.
SLIDE 15
Curve25519 uses an elliptic-curve group.

“Why not unit group $T_1$ or torus group $T_2$ or torus group $T_6$? Why not XTR, using only 5.2 mults for each exponent bit?”

Answer: Compared to XTR, elliptic curves use more mults in a smaller field. Overall slightly less expensive. XTR needs larger field to protect against NFS.
SLIDE 17
Curve25519 compresses an elliptic-curve point $(x, y)$ to a public key $x$. (Not patented. 1986 Miller.)

“But then you need an expensive computation of $y$! Why not also transmit $y$?”

Answer: Transmitting $y$ is often unacceptably expensive. A square-root computation isn’t terribly expensive, and is avoided entirely in the Curve25519 computation.
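
For a sense of what recovering $y$ from a compressed key would cost, here is a hedged Python sketch of the square root that Curve25519 itself never needs. It assumes the curve $y^2 = x^3 + 486662x^2 + x$ over the prime $2^{255} - 19$ and uses the standard square-root method for primes congruent to 5 mod 8; `curve_rhs` and `recover_y` are illustrative names, not part of any library.

```python
P = 2**255 - 19
A = 486662
SQRT_M1 = pow(2, (P - 1) // 4, P)   # a square root of -1 mod P (2 is a non-residue)


def curve_rhs(x: int) -> int:
    """Right-hand side x^3 + A x^2 + x of the curve equation, mod P."""
    return (x * x * x + A * x * x + x) % P


def recover_y(x: int):
    """Return some y with y^2 = curve_rhs(x) mod P, or None if there is none."""
    rhs = curve_rhs(x)
    y = pow(rhs, (P + 3) // 8, P)    # one field exponentiation
    if y * y % P == rhs:
        return y
    y = y * SQRT_M1 % P              # fix up the sign of y^2 if needed
    if y * y % P == rhs:
        return y
    return None
```

The cost is about one extra field exponentiation per received key, which is the expense the slide calls not terrible but avoidable.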
SLIDE 19
Curve25519 uses a curve over a large-char field.

“Why not char 2? Squaring is almost for free! Can exploit Frobenius on curve.”

Answer: Current CPUs include fast floating-point multipliers for physics simulation etc. Can reuse these multipliers for arithmetic in large-char fields. Outweighs the char-2 advantages.
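
The idea of reusing floating-point multipliers for integer arithmetic rests on one fact: a floating-point product of two integers is exact whenever it fits in the mantissa. The sketch below checks this with Python floats, which are IEEE doubles with 53-bit mantissas, not the 64-bit x86 mantissas used in the talk; the 26-bit limb size and the function names are chosen only for illustration.

```python
import random

MANTISSA_BITS = 53   # IEEE double, the format behind Python floats
LIMB_BITS = 26       # illustrative: 2 * 26 = 52 bits fits in the mantissa


def float_mul_is_exact(a: int, b: int) -> bool:
    """Multiply two integers in floating point and compare with the exact product."""
    return int(float(a) * float(b)) == a * b


def all_exact(trials: int = 1000, bits: int = LIMB_BITS) -> bool:
    """Check exactness for many random pairs of `bits`-bit integers."""
    rng = random.Random(0)
    return all(
        float_mul_is_exact(rng.getrandbits(bits), rng.getrandbits(bits))
        for _ in range(trials)
    )
```

Limbs that are too wide lose low bits: `float_mul_is_exact(2**30 + 1, 2**30 + 1)` is false, because the 61-bit product no longer fits in 53 bits.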
SLIDE 21
Curve25519 uses curve shape $y^2 = x^3 + Ax^2 + x$, tiny $A \in 2 + 4\mathbf{Z}$.

“Why not $y^2 = x^3 - 3x + a_6$? Double $(x, y)$ in Jacobian coords using only 5 field squarings and 3 extra field mults!”

Answer: With $y^2 = x^3 + Ax^2 + x$, can do projective $x$-coord doubling and addition together using 1 field mult by $(A - 2)/4$, 4 field squarings, 5 extra field mults. Never need $y$.
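
The operation count can be read off the standard $x$-only formulas for this curve shape (Montgomery's formulas, written here as a reminder rather than as the talk's exact schedule). Write $x = X/Z$, and let $x_1$ be the known $x$-coordinate of the difference $P - Q$:

$$
\begin{aligned}
x(2P) &= \frac{(x^2-1)^2}{4x\,(x^2+Ax+1)},\\
X_{2P} &= (X+Z)^2\,(X-Z)^2,\\
Z_{2P} &= E\,\Bigl((X+Z)^2 + \tfrac{A-2}{4}\,E\Bigr), \qquad E = (X+Z)^2-(X-Z)^2 = 4XZ,\\
X_{P+Q} &= \bigl((X_P-Z_P)(X_Q+Z_Q) + (X_P+Z_P)(X_Q-Z_Q)\bigr)^2,\\
Z_{P+Q} &= x_1\,\bigl((X_P-Z_P)(X_Q+Z_Q) - (X_P+Z_P)(X_Q-Z_Q)\bigr)^2 .
\end{aligned}
$$

Counting: four squarings ($(X+Z)^2$, $(X-Z)^2$, and the two outer squares), five general multiplications (the products forming $X_{2P}$ and $Z_{2P}$, the two cross products, and the multiplication by $x_1$), and one multiplication by the small constant $(A-2)/4$. A tiny $A$ keeps that last multiplication cheap.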
SLIDE 23
Curve25519 uses a prime field.

“Why not an extension field? Adapt extension degree to CPU’s multiplier size. Avoid carries in arithmetic!”

Answer: Extension field punishes CPUs with another multiplier size. Maybe tolerable as CPUs converge, but carries are a very small cost.
SLIDE 25
Curve25519 uses prime extremely close to a power of 2: specifically, $2^{255} - 19$.

“Why not a word-aligned prime, $2^{256} - 2^{224} + 2^{192} + 2^{96} - 1$? Reduce by simple word additions and subtractions!”

Answer: Repeated additions are more expensive than a multiplication by 19. Also, analogous problem to extension fields.
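
The "multiplication by 19" refers to the identity $2^{255} \equiv 19 \pmod{2^{255} - 19}$: the bits above position 255 are folded back in after multiplying them by 19. A minimal Python sketch of that folding, with `reduce_mod_p` as an illustrative name:

```python
P = 2**255 - 19
MASK255 = (1 << 255) - 1


def reduce_mod_p(x: int) -> int:
    """Reduce a non-negative integer mod 2^255 - 19 by folding the high bits."""
    while x >> 255:
        # x = lo + 2^255 * hi  is congruent to  lo + 19 * hi
        x = (x & MASK255) + 19 * (x >> 255)
    if x >= P:           # x < 2^255 here, so one subtraction suffices
        x -= P
    return x
```

For a 510-bit product the loop runs only a few times, since each fold shrinks the value to roughly the size of `19 * hi`.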
SLIDE 27
Curve25519 computation uses largest convenient radix $2^{255/w}$ with integer $w$.

Example: With 64-bit x86 floating-point mantissas, Curve25519 uses radix $2^{25.5}$, i.e., $\sum_i (\text{small multiple of } 2^{\lceil 25.5 i \rceil})$.

“Why not use radix $2^{25}$, or radix $2^{26}$? Doesn’t the exponent have to be an integer?”

Answer: No, exponent doesn’t have to be an integer. Radix $2^{25.5}$ saves time in reduction mod $2^{255} - 19$.
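
A sketch of what radix $2^{25.5}$ means in practice, assuming $w = 10$ limbs at bit positions $\lceil 25.5 i \rceil$ = 0, 26, 51, 77, ..., 230. Because the eleventh position is exactly 255, a product term that overflows the top limb wraps around with a plain factor of 19. The code uses Python integers rather than floating point, and all names are illustrative.

```python
P = 2**255 - 19
N_LIMBS = 10
# Bit position of limb i: ceil(25.5 * i) = 0, 26, 51, 77, ..., 230, and 255 for i = 10.
EXP = [(51 * i + 1) // 2 for i in range(N_LIMBS + 1)]


def to_limbs(x: int):
    """Split 0 <= x < 2^255 into 10 limbs of alternating 26 and 25 bits."""
    return [
        (x >> EXP[i]) & ((1 << (EXP[i + 1] - EXP[i])) - 1)
        for i in range(N_LIMBS)
    ]


def from_limbs(f) -> int:
    """Recombine limbs: sum of f[i] * 2^ceil(25.5 i)."""
    return sum(fi << EXP[i] for i, fi in enumerate(f))


def mul_limbs(f, g):
    """Schoolbook product in radix 2^25.5 with reduction folded in."""
    h = [0] * N_LIMBS
    for i in range(N_LIMBS):
        for j in range(N_LIMBS):
            term = f[i] * g[j]
            if i % 2 == 1 and j % 2 == 1:
                term *= 2     # ceil(25.5 i) + ceil(25.5 j) = ceil(25.5 (i + j)) + 1
            if i + j >= N_LIMBS:
                term *= 19    # 2^255 is congruent to 19 mod P
            h[(i + j) % N_LIMBS] += term
    return h
```

For comparison, ten limbs in radix $2^{26}$ would top out at $2^{260}$, which is congruent to $19 \cdot 2^5$ rather than 19; this is one plausible reading of "saves time in reduction".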
SLIDE 29
Curve25519 computation allows coefficients slightly larger than the radix.

“Why not use canonical form, with minimal coefficients? Smaller coefficients allow faster arithmetic!”

Answer: Conversion to canonical form is expensive. Making coefficients small is much less expensive than making them smallest. Has most of the same benefit.
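
The difference between "small" and "smallest" can be illustrated with a carry pass. The sketch below assumes ten non-negative integer limbs of alternating 26 and 25 bits; the actual software works with floating-point coefficients and its own carry schedule, so this only shows the principle. `carry_pass` and `canonical` are illustrative names.

```python
P = 2**255 - 19
WIDTH = [26, 25] * 5      # bits per limb in radix 2^25.5


def from_limbs(f) -> int:
    """Recombine limbs into the integer they represent."""
    total, shift = 0, 0
    for fi, w in zip(f, WIDTH):
        total += fi << shift
        shift += w
    return total


def carry_pass(f):
    """One cheap pass: move each limb's overflow to the next limb.

    Overflow out of the top limb sits at 2^255, so it re-enters limb 0 times 19.
    """
    f = list(f)
    for i in range(10):
        carry = f[i] >> WIDTH[i]
        f[i] -= carry << WIDTH[i]
        if i < 9:
            f[i + 1] += carry
        else:
            f[0] += 19 * carry
    return f


def canonical(f) -> int:
    """Stand-in for the costly full reduction to the unique value in [0, P)."""
    return from_limbs(f) % P
```

After two passes on 60-bit inputs, limbs 1 to 9 fit their widths and limb 0 is below $2^{26} + 19$: small enough for the next multiplication, but not canonical.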
SLIDE 31
Curve25519 converts variable indexing into arithmetic: e.g., given $P[0]$, $P[1]$, bit $b$, compute $P[b]$ as $bP[1] + (1 - b)P[0]$.

“Why not simply use $b$ as an array index? Skip the multiplications by $b$, $1 - b$ and the addition!”

Answer: This arithmetic is 6% of the Curve25519 computation. Protects against timing attacks, such as hyperthreading attacks. Less expensive than protecting variable array indexing.
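
The selection formula can be written out directly. This sketch treats each of $P[0]$ and $P[1]$ as a list of limbs and applies the formula limb by limb; it shows the data flow only, since Python itself makes no constant-time guarantees. `select` is an illustrative name.

```python
def select(b: int, p0, p1):
    """Return p1 if b == 1 and p0 if b == 0.

    Computed as b * p1 + (1 - b) * p0 per limb, so no branch and no
    memory address depends on the secret bit b.
    """
    return [b * y + (1 - b) * x for x, y in zip(p0, p1)]
```

Both inputs are read in full on every call, whichever value `b` has; that uniform access pattern is what a plain `table[b]` lookup lacks.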
SLIDE 33
Curve25519 uses a secure curve with a secure twist: $y^2 = x^3 + 486662x^2 + x$.

Group order $8 \cdot$ prime. Twist group order $4 \cdot$ prime.

“Why worry about twist order? Why not simply prohibit keys on the twist?”

Answer: Prohibiting keys on the twist means checking for them (“validating keys”). Eliminate this cost by choosing curve carefully.
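
To make the avoided cost concrete, here is a hedged sketch of what "validating keys" would involve: deciding whether a received $x$ belongs to the curve or to its twist takes a Legendre-symbol computation, roughly one more field exponentiation per key. `classify_x` is an illustrative name; Curve25519 software performs no such check.

```python
P = 2**255 - 19
A = 486662


def classify_x(x: int) -> str:
    """Say whether x is an x-coordinate on the curve, on its twist, or on both."""
    rhs = (x * x * x + A * x * x + x) % P
    if rhs == 0:
        return "both"                       # y = 0 satisfies either equation
    legendre = pow(rhs, (P - 1) // 2, P)    # 1 for squares, P - 1 otherwise
    return "curve" if legendre == 1 else "twist"
```

For example, the base point $x = 9$ is classified as `"curve"`, while $x = -1$ lands on the twist because $A - 2$ is not a square mod $2^{255} - 19$.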
SLIDE 35
What’s next?

Culmination of extensive work on eliminating field mults for genus-2 hyperelliptic curves: 25 mults per bit. Gaudry, eprint.iacr.org/2005/314

Half-size prime: e.g., $2^{127} - 1$.

Select curve to make some mults easier, like taking tiny $A$; this needs faster point counting!

Should analyze cycles instead of field mults. Prediction: this will beat genus 1.