SLIDE 1 1
How cryptographic benchmarking goes wrong
Daniel J. Bernstein

Thanks to NIST 60NANB12D261 for funding this work, and for not reviewing these slides in advance.

PRESERVE, ending 2015.06.30, was a European project “Preparing Secure Vehicle-to-X Communication Systems”. Project cost: 5383431 EUR, including 3850000 EUR from the European Commission.

“About PRESERVE”: “The mission of PRESERVE is, to design, implement, and test a secure and scalable V2X Security Subsystem for realistic deployment scenarios. ... [Expected Results:] 1. Harmonized V2X Security Architecture. 2. Implementation of V2X Security Subsystem. 3. Cheap and scalable security ASIC for V2X. 4. Testing results VSS under realistic conditions. 5. Research results for deployment challenges.”

Cars already include many CPUs. Why build an ASIC?

PRESERVE deliverable 1.1, “Security Requirements of Vehicle Security Architecture”, 2011: “Processing 1,000 packets per second and processing each in 1 ms can hardly be met by current hardware. As discussed in [32], a Pentium D 3.4 GHz processor needs about 5 times as long for a verification ... a dedicated cryptographic co-processor is likely to be necessary.”

PRESERVE deliverable 5.4, “Deployment Issues Report V4”, 2016: “the number of ECC signature verifications per second is the key performance factor for ASICs in a C2C environment ... [On a 4mm×4mm chip] the 180nm technology may only yield enough space for one ECC core, whereas 90nm will allow for up to ten ECC cores and 55nm will allow for even more.”

For the 180nm core, the report says: max 100MHz, 100 verif/second.

Compare to, e.g., IAIK NIST P-256 ECC Module: 858 scalarmult/second in 111620 GE at 192 MHz at 180nm (“UMC L180GII technology using Faraday f180 standard cell library (FSA0A_C), 9.3744 µm²/GE; worst case conditions (temperature 125°C, core voltage 1.62V)”).

Signature verification will be somewhat slower than scalarmult. Still close to 100× more efficient than the PRESERVE estimates.
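
One way to see where a factor of roughly 100 could come from is to ask how many IAIK-sized cores would fit on PRESERVE's 4mm×4mm chip. The sketch below uses only the figures quoted above. The tiling model (the whole die filled with cores, no routing, memory, or I/O overhead) and the function names are assumptions made for illustration; the slides do not say how the estimate was derived.

```python
# Figures quoted on the slides (IAIK module and PRESERVE deliverable 5.4).
UM2_PER_GE = 9.3744             # µm² per gate equivalent
IAIK_GATES = 111_620            # size of the IAIK P-256 module in GE
IAIK_SCALARMULT_PER_SEC = 858   # at 192 MHz, 180nm
CHIP_SIDE_MM = 4.0              # PRESERVE's 4mm x 4mm chip
PRESERVE_VERIF_PER_SEC = 100    # PRESERVE's figure for its single 180nm core


def core_area_mm2(gates: int, um2_per_ge: float) -> float:
    """Silicon area of the logic alone, in mm² (1 mm² = 1e6 µm²)."""
    return gates * um2_per_ge / 1e6


def cores_that_fit(chip_side_mm: float, core_mm2: float) -> int:
    """ASSUMPTION: the whole die is tiled with cores, with no other overhead."""
    return int((chip_side_mm ** 2) // core_mm2)


area = core_area_mm2(IAIK_GATES, UM2_PER_GE)
cores = cores_that_fit(CHIP_SIDE_MM, area)
scalarmults = cores * IAIK_SCALARMULT_PER_SEC
ratio = scalarmults / PRESERVE_VERIF_PER_SEC

print(f"one IAIK core: {area:.2f} mm²")
print(f"cores on a 16 mm² die (idealized): {cores}")
print(f"scalarmult/second across the die: {scalarmults}")
print(f"ratio to PRESERVE's 100 verif/second: {ratio:.0f}x")
```

Under these idealized assumptions the die holds about 15 cores. Since a verification costs somewhat more than one scalarmult, the resulting ratio would shrink from there, which is in line with "close to 100×".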

Let’s go back to PRESERVE’s core argument for an ASIC.

Central claim: “As discussed in [32], a Pentium D 3.4 GHz processor needs about” 5ms (i.e., 17 million CPU cycles) for signature verification.

[32] is “Petit, J., Mammeri, Z., ‘Analysis of authentication overhead in vehicular networks’, Third Joint IFIP Wireless and Mobile Networking Conference (WMNC), 2010.”
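
The conversion from 5ms to 17 million cycles is just time multiplied by clock frequency. A minimal sketch, with helper names chosen here for illustration:

```python
def ms_to_cycles(ms: float, clock_hz: float) -> float:
    """Number of clock cycles that elapse in `ms` milliseconds."""
    return ms * 1e-3 * clock_hz


def cycles_to_ms(cycles: float, clock_hz: float) -> float:
    """Wall-clock time in milliseconds for `cycles` clock cycles."""
    return cycles / clock_hz * 1e3


PENTIUM_D_HZ = 3.4e9  # 3.4 GHz, as quoted by PRESERVE

claimed_cycles = ms_to_cycles(5, PENTIUM_D_HZ)
print(f"{claimed_cycles / 1e6:.0f} million cycles")  # prints "17 million cycles"
```

Working in cycles rather than milliseconds makes it easier to compare the claim against measurements taken on CPUs with different clock speeds.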

[32] says “1. Introduction. Due to the huge life losses and the economic impacts resulting from vehicular collisions, many governments, automotive companies, and industry consortia have made the reduction of vehicular fatalities a top priority [1]. On average, vehicular collisions cause 102 deaths and 7900 injuries daily in the United States, leaving an economic impact of $230 billion [2]. ... [Similar story for EU:] costing €160 billion annually [3].”

Vehicles will communicate safety information. “All implementations of IEEE1609.2 standard [7] shall support the Elliptic Curve Digital Signature Algorithm (ECDSA) [8] over the two NIST curves P-224 and P-256. ... In this paper, we assess the processing and communication overhead of the authentication mechanism provided by ECDSA. ... Table II. Signature generation and verification times on a Pentium D 3.4Ghz workstation [10]”

[10] (in [32]) is “Petit J., ‘Analysis of ECDSA Authentication Processing in VANETs’, 3rd IFIP International Conference on New Technologies, Mobility and Security (NTMS), Cairo, December 2009.”

[10] says “ECDSA was implemented using MIRACL and following the Fig.1.”

For NIST P-224/P-256 on “Pentium D 3.4GHz workstation”: 2.50ms/3.33ms to sign, 4.97ms/6.63ms to verify.

Compare to, e.g., Ed25519 speeds reported for single core of 14nm 3.31GHz Skylake (“2015 Intel Core i5-6600”) on https://bench.cr.yp.to: 0.015ms to sign (49840 cycles), 0.049ms to verify (163206 cycles).

This chip didn’t exist in 2009. Compare instead to single core of 65nm 2.4GHz Core 2 (“2007 Intel Core 2 Quad Q6600”): 0.065ms to sign (156843 cycles), 0.232ms to verify (557082 cycles).
11
2012 Bernstein–Schwabe on 720MHz ARM Cortex-A8: 0.9ms to verify (650102 cycles).
ARM Cortex-A8 cores were in 1000MHz Apple A4 in iPad 1, iPhone 4 (2010); 1000MHz Samsung Exynos 3110 in Samsung Galaxy S (2010); 1000MHz TI OMAP3630 in Motorola Droid X (2010); 800MHz Freescale i.MX50 in Amazon Kindle 4 (2011); ...
Today: in CPUs costing ≈2 EUR. Cortex-A7 is even more popular.
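The millisecond figures on these two pages follow directly from the cycle counts and the quoted clock rates. A minimal sketch of that arithmetic, assuming the clock rates given on the slides (3.31GHz Skylake, 2.4GHz Core 2, 720MHz Cortex-A8); the function name and the labels are illustrative only:

```python
def cycles_to_ms(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count to milliseconds at the given clock frequency."""
    return cycles / clock_hz * 1e3


# (label, cycle count from the slides, clock rate in Hz from the slides)
measurements = [
    ("Skylake i5-6600, Ed25519 sign", 49840, 3.31e9),
    ("Skylake i5-6600, Ed25519 verify", 163206, 3.31e9),
    ("Core 2 Quad Q6600, Ed25519 sign", 156843, 2.4e9),
    ("Core 2 Quad Q6600, Ed25519 verify", 557082, 2.4e9),
    ("Cortex-A8, Ed25519 verify", 650102, 720e6),
]

for label, cycles, clock_hz in measurements:
    print(f"{label}: {cycles_to_ms(cycles, clock_hz):.3f} ms")
```

Quoting cycles alongside milliseconds is what makes the numbers comparable across chips with different clock rates.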
SLIDE 42 12
180nm 32-bit 2GHz Willamette (“2001 Intel Pentium 4”): 0.46ms (0.9 million cycles) for Curve25519 scalarmult using floating-point multiplier. Integer multiplier is much slower!
Nobody has ever bothered adapting this to signatures. Would be ≈0.6ms for verify.
3.4GHz Pentium D (dual core): same basic microarchitecture, more instructions, faster clock. Ed25519 would be >10× faster on one core than Petit’s software.
SLIDE 46 13
Bad ECDSA-NIST-P-256 design certainly has some impact:
- can’t use fastest mulmods;
- can’t use fastest curve formulas;
- need an annoying inversion;
- etc.
Typical estimate: 2× slower.
2000 Brown–Hankerson–López–Menezes on 400MHz Pentium II: 4.0ms/6.4ms (1.6/2.6 million cycles) for double scalarmult inside NIST P-224/P-256 verif.
2001 Bernstein, ≈1.6× faster: 0.7 million cycles on Pentium II for NIST P-224 scalarmult.
SLIDE 50 14
2000 Brown–Hankerson–López–Menezes software uses many more cycles on P4 than on PII. e.g., P-224 scalarmult: 1.2 million cycles on Pentium II. 2.7 million cycles on Pentium 4.
2001 Bernstein P-224 scalarmult: 0.7 million cycles on Pentium II. 0.8 million cycles on Pentium 4. 0.9 million cycles on Pentium 4 using compressed keys.
OpenSSL 1.0.1, P-224 verif: 2.0 million cycles on Pentium D.
SLIDE 54 15
How did Petit manage to use 17 million cycles for P-224 verif, 22 million cycles for P-256 verif? Presumably some combination of bad mulmod and bad curve ops. Why did Petit reimplement ECDSA, using MIRACL for the underlying arithmetic? Why did Petit not simply cite previous speed literature? Why did Petit choose Pentium D? Why did BHLM choose PII?
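To put the gap in numbers: a small sketch comparing Petit's cycle counts with the OpenSSL 1.0.1 P-224 figure quoted for the Pentium D. The cycle counts are the rounded "million cycles" values from the slides, the 3.4GHz clock is the Pentium D rate mentioned earlier, and the helper names are illustrative only.

```python
PENTIUM_D_HZ = 3.4e9  # clock rate quoted on the slides for the Pentium D

# Rounded verification cycle counts as quoted on the slides.
PETIT_P224_VERIFY = 17_000_000
PETIT_P256_VERIFY = 22_000_000
OPENSSL_P224_VERIFY = 2_000_000


def slowdown(slow_cycles: int, fast_cycles: int) -> float:
    """How many times more cycles the slower measurement takes."""
    return slow_cycles / fast_cycles


def cycles_to_ms(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count to milliseconds at the given clock frequency."""
    return cycles / clock_hz * 1e3


print(f"Petit P-224 vs OpenSSL P-224: "
      f"{slowdown(PETIT_P224_VERIFY, OPENSSL_P224_VERIFY):.1f}x more cycles")
print(f"Petit P-224 verify at 3.4GHz: "
      f"{cycles_to_ms(PETIT_P224_VERIFY, PENTIUM_D_HZ):.2f} ms")
print(f"Petit P-256 verify at 3.4GHz: "
      f"{cycles_to_ms(PETIT_P256_VERIFY, PENTIUM_D_HZ):.2f} ms")
```

The P-224 ratio is like for like (same curve, same CPU family); no OpenSSL P-256 figure is quoted here, so none is computed.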
SLIDE 59 16
Petit: “There are three main cryptographic libraries: MIRACL, OpenSSL and Crypto++. Authors in [21] proposed a comparison and concluded that MIRACL has the best performance for operations on elliptic curves over binary fields.”
But NIST P-224 and NIST P-256 are defined over prime fields!
[21] says “For elliptic curves over prime fields, OpenSSL has the best performance under all platforms.”
SLIDE 63 17
More general situation: Paper analyzes impact of crypto upon an application. If the crypto sounds fast: Why is the paper interesting? Why should it be published? If the crypto sounds slower: Paper is more interesting. Look, here’s a speed problem! More likely to be published. More likely to motivate funding to fix the problem.
SLIDE 67
18
Obvious question whenever an application considers crypto deployment: “Is it fast enough?” Many random methodologies for answering this question. Which CPU to test? What to take from literature and libraries? Reuse mulmod, or curve ops, or more? Slowest, least competent answers are most likely to be published. Situation is fully explainable by randomness + natural selection. There’s no evidence that Petit deliberately slowed down crypto.
SLIDE 71
19
Paper introducing new crypto software or hardware has same incentive to report older crypto as slow, and analogous incentive to report its own crypto as fast. Paper will naturally select functions, parameters, input lengths, platforms, I/O format, timing mechanism, etc. that maximize reported improvement from old to new. This is not the same as selecting what matters most for the users.
SLIDE 75
20
Bit operations per bit of plaintext (assuming precomputed subkeys), as listed in recent Skinny paper:
key  ops/bit  cipher
128  88       Simon: 60 ops broken
128  100      NOEKEON
128  117      Skinny
256  144      Simon: 106 ops broken
128  147.2    PRESENT
256  156      Skinny
128  162.75   Piccolo
128  202.5    AES
256  283.5    AES
SLIDE 76
20
Bit operations per bit of plaintext (assuming precomputed subkeys), not entirely listed in Skinny paper:
key  ops/bit  cipher
256  54       Salsa20/8
256  78       Salsa20/12
128  88       Simon: 60 ops broken
128  100      NOEKEON
128  117      Skinny
256  126      Salsa20
256  144      Simon: 106 ops broken
128  147.2    PRESENT
256  156      Skinny
128  162.75   Piccolo
128  202.5    AES
256  283.5    AES
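How much the choice of comparison set matters can be checked mechanically. A minimal sketch, using only the ops/bit figures from the two tables above, that ranks a cipher with and without the Salsa20 rows; the row layout and the `rank` helper are illustrative, not taken from any paper:

```python
# (key bits, bit operations per plaintext bit, cipher), from the first table.
skinny_paper_rows = [
    (128, 88.0, "Simon"),
    (128, 100.0, "NOEKEON"),
    (128, 117.0, "Skinny"),
    (256, 144.0, "Simon"),
    (128, 147.2, "PRESENT"),
    (256, 156.0, "Skinny"),
    (128, 162.75, "Piccolo"),
    (128, 202.5, "AES"),
    (256, 283.5, "AES"),
]

# Rows that appear only in the second table.
salsa20_rows = [
    (256, 54.0, "Salsa20/8"),
    (256, 78.0, "Salsa20/12"),
    (256, 126.0, "Salsa20"),
]


def rank(rows, key_bits, cipher):
    """1-based position of (key_bits, cipher) when rows are sorted by ops/bit."""
    ordered = sorted(rows, key=lambda row: row[1])
    for position, (bits, _ops, name) in enumerate(ordered, start=1):
        if bits == key_bits and name == cipher:
            return position
    raise ValueError(f"{cipher} with {key_bits}-bit key not in table")


print(rank(skinny_paper_rows, 128, "Skinny"))
print(rank(skinny_paper_rows + salsa20_rows, 128, "Skinny"))
```

The same cipher with the same ops/bit figure moves from third place to fifth once the omitted rows are included.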
SLIDE 84
21
Many bad examples to imitate, backed by tons of misinformation.
e.g. Do we bother searching for optimized implementations of the older crypto? Take any code! Rely on “optimizing” compiler!
“We come so close to optimal on most architectures that we can’t do much more without using NP complete algorithms instead of heuristics. We can only try to get little niggles here and there where the heuristics get slightly wrong answers.”
22
Reality is more complicated:
SLIDE 92
23
SUPERCOP benchmarking toolkit includes 2155 implementations of 595 cryptographic primitives. >20 implementations of Salsa20.
Haswell: Reasonably simple ref implementation compiled with gcc -O3 -fomit-frame-pointer is 6.15× slower than fastest Salsa20 implementation. merged implementation with “machine-independent” optimizations and best of 121 compiler options: 4.52× slower.
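The idea of timing every available implementation and reporting the fastest correct one can be sketched as below. This is not SUPERCOP code (SUPERCOP builds C and assembly implementations under many compiler options); the three byte-XOR functions, `median_ns`, and `benchmark` are hypothetical stand-ins chosen so the example is self-contained.

```python
import statistics
import time
from functools import reduce
from operator import xor


def xor_ref(data):
    """Reference implementation: fold XOR over the bytes one at a time."""
    acc = 0
    for byte in data:
        acc ^= byte
    return acc


def xor_reduce(data):
    """Same computation expressed with functools.reduce."""
    return reduce(xor, data, 0)


def xor_wide(data):
    """XOR 8 bytes at a time as integers, then fold the word down to one byte."""
    buf = bytes(data) + b"\0" * ((-len(data)) % 8)  # zero padding leaves XOR unchanged
    acc = 0
    for i in range(0, len(buf), 8):
        acc ^= int.from_bytes(buf[i:i + 8], "little")
    out = 0
    while acc:
        out ^= acc & 0xFF
        acc >>= 8
    return out


def median_ns(fn, data, runs=15):
    """Median wall-clock time of fn(data) in nanoseconds over several runs."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        fn(data)
        samples.append(time.perf_counter_ns() - start)
    return statistics.median(samples)


def benchmark(candidates, data):
    """Time every candidate that matches the reference output; skip wrong ones."""
    expected = xor_ref(data)
    return {name: median_ns(fn, data)
            for name, fn in candidates.items() if fn(data) == expected}


CANDIDATES = {"ref": xor_ref, "reduce": xor_reduce, "wide": xor_wide}
DATA = bytes(range(256)) * 64
TIMINGS = benchmark(CANDIDATES, DATA)
FASTEST = min(TIMINGS, key=TIMINGS.get)

if __name__ == "__main__":
    for name, ns in sorted(TIMINGS.items(), key=lambda item: item[1]):
        print(f"{name:8s} {ns / TIMINGS[FASTEST]:.2f}x the fastest")
```

Reporting each candidate relative to the fastest one, as in the slowdown factors on the slide, makes it visible how much is lost by benchmarking only a reference implementation.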
24
Another interesting example: lattice-based signing typically means generating a huge number of random Gaussian samples.
2017.03 Brannigan–Smyth–Oder–Valencia–O’Sullivan–Güneysu–Regazzoni “An investigation of sources of randomness within discrete Gaussian sampling”: benchmarks for RNGs, samplers.
Qualitatively large impacts: choice of RNG ⇒ cost of sampling ⇒ cost of signing.
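To make the chain "choice of RNG ⇒ cost of sampling" concrete, here is a rough sketch that counts how many random bytes a simple rejection sampler consumes per discrete Gaussian sample. It is not one of the samplers benchmarked in the cited paper: the sampler, the `CountingRNG` wrapper, `SIGMA = 4.0`, and the tail cut of 6 are all assumptions, the byte source is Python's non-cryptographic `random` module, and the floating-point acceptance test is neither constant-time nor suitable for real signing code.

```python
import math
import random


class CountingRNG:
    """Byte source that records how many random bytes were requested."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)  # NOT cryptographic; illustration only
        self.bytes_used = 0

    def read(self, n):
        self.bytes_used += n
        return self._rng.getrandbits(8 * n).to_bytes(n, "little")


def sample_gaussian(rng, sigma, tail=6):
    """Rejection-sample an integer z in [-tail*sigma, tail*sigma] with
    probability proportional to exp(-z^2 / (2 sigma^2))."""
    bound = int(math.ceil(tail * sigma))
    span = 2 * bound + 1
    limit = (1 << 32) - ((1 << 32) % span)  # largest multiple of span below 2^32
    while True:
        r = int.from_bytes(rng.read(4), "little")
        if r >= limit:
            continue  # discard to avoid modulo bias in the uniform candidate
        z = r % span - bound
        u = int.from_bytes(rng.read(8), "little") >> 11  # uniform 53-bit integer
        if u < math.exp(-z * z / (2.0 * sigma * sigma)) * (1 << 53):
            return z


SIGMA = 4.0          # hypothetical standard deviation
NUM_SAMPLES = 2000   # hypothetical number of samples
RNG = CountingRNG(seed=1)
SAMPLES = [sample_gaussian(RNG, SIGMA) for _ in range(NUM_SAMPLES)]
BYTES_PER_SAMPLE = RNG.bytes_used / NUM_SAMPLES

if __name__ == "__main__":
    print(f"random bytes consumed per sample: {BYTES_PER_SAMPLE:.1f}")
    print("each cycle/byte of RNG cost adds about that many cycles per sample")
```

Under this model an RNG costing c cycles/byte contributes roughly c times `BYTES_PER_SAMPLE` cycles to every sample, which is why a slow RNG measurement propagates directly into the reported cost of signing.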
SLIDE 101
25
Two examples of speed reported in this 2017 paper for a 3.4GHz Skylake (Intel Core i7-6700): 383.69 MByte/sec (8.86 cycles/byte) for AES CTR-DRBG using AES-NI; 106.07 MByte/sec (32 cycles/byte) for ChaCha20. But wait. eBACS reports 0.92 cycles/byte for AES-256-CTR, 1.18 cycles/byte for ChaCha20. Author non-response: “essential for us to examine standard open implementations”. Slow ones?
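The unit conversion behind these figures can be checked with a few lines. The sketch assumes 1 MByte = 10^6 bytes, which is the interpretation that reproduces the 8.86 cycles/byte on the slide; the dictionary names are made up for this example. The eBACS AES figure is for AES-256-CTR rather than CTR-DRBG, so the AES ratio is an approximate comparison, as on the slide.

```python
CLOCK_HZ = 3.4e9  # 3.4GHz Skylake, as stated on the slide


def cycles_per_byte(mbyte_per_sec, clock_hz=CLOCK_HZ, bytes_per_mbyte=1e6):
    """Convert a throughput in MByte/sec into cycles per byte at clock_hz.

    bytes_per_mbyte=1e6 is an assumption (decimal megabytes).
    """
    return clock_hz / (mbyte_per_sec * bytes_per_mbyte)


# Throughput reported in the 2017 paper (MByte/sec).
PAPER_MBYTE_PER_SEC = {"AES": 383.69, "ChaCha20": 106.07}
# Cycles/byte reported by eBACS (AES-256-CTR and ChaCha20).
EBACS_CYCLES_PER_BYTE = {"AES": 0.92, "ChaCha20": 1.18}

PAPER_CYCLES_PER_BYTE = {
    name: cycles_per_byte(speed) for name, speed in PAPER_MBYTE_PER_SEC.items()
}
SLOWDOWN = {
    name: PAPER_CYCLES_PER_BYTE[name] / EBACS_CYCLES_PER_BYTE[name]
    for name in PAPER_CYCLES_PER_BYTE
}

if __name__ == "__main__":
    for name in PAPER_CYCLES_PER_BYTE:
        print(f"{name}: {PAPER_CYCLES_PER_BYTE[name]:.2f} cycles/byte in the paper, "
              f"{SLOWDOWN[name]:.1f}x the eBACS figure")
```

The gap is roughly 10× for AES and roughly 27× for ChaCha20, which also reverses the ordering: the paper's numbers make ChaCha20 look far slower than AES, while the eBACS numbers put the two close together.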
26