SLIDE 1 Efficient implementation of code-based cryptography
Daniel J. Bernstein, University of Illinois at Chicago & Technische Universiteit Eindhoven
Joint work with: Tung Chou, Technische Universiteit Eindhoven; Peter Schwabe, Radboud University Nijmegen
Objectives
Set new speed records for public-key cryptography
... at a high security level
... including protection against quantum computers
... including full protection against cache-timing attacks, branch-prediction attacks, etc.
... using code-based crypto with a solid track record
... all of the above at once.
SLIDE 14 The track record
1978: McEliece proposed public-key code-based crypto.
Has held up well after extensive optimization of attack algorithms:
1962 Prange. 1981 Omura. 1988 Lee–Brickell. 1988 Leon. 1989 Krouk. 1989 Stern. 1989 Dumer. 1990 Coffey–Goodman. 1990 van Tilburg. 1991 Dumer. 1991 Coffey–Goodman–Farrell. 1993 Chabanne–Courteau. 1993 Chabaud.
1994 van Tilburg. 1994 Canteaut–Chabanne. 1998 Canteaut–Chabaud. 1998 Canteaut–Sendrier. 2008 Bernstein–Lange–Peters. 2009 Bernstein–Lange–Peters–van Tilborg. 2009 Bernstein (post-quantum). 2009 Finiasz–Sendrier. 2010 Bernstein–Lange–Peters. 2011 May–Meurer–Thomae. 2011 Becker–Coron–Joux. 2012 Becker–Joux–May–Meurer. 2013 Bernstein–Jeffery–Lange–Meurer (post-quantum).
SLIDE 25 Examples of the competition
Some cycle counts on h9ivy (Intel Core i5-3210M, Ivy Bridge) from bench.cr.yp.to:
mceliece encrypt      73092  (2008 Biswas–Sendrier, $\approx 2^{80}$)
gls254 DH             76212  (binary elliptic curve; CHES 2013)
kummer DH             88448  (hyperelliptic; Asiacrypt 2014)
curve25519 DH        182708  (conservative elliptic curve)
mceliece decrypt    1130908
ronald1024 decrypt  1313324
New decoding speeds
$\approx 2^{128}$ security, $(n, t) = (4096, 41)$: 60493 Ivy Bridge cycles. Talk will focus on this case. (Decryption is slightly slower: includes hash, cipher, MAC.)
$\approx 2^{80}$ security, $(n, t) = (2048, 32)$: 26544 Ivy Bridge cycles.
All load/store addresses and all branch conditions are public. Eliminates cache-timing attacks etc.
Similar improvements for CFS.
SLIDE 30 Constant-time fanaticism
The extremist’s approach to eliminate timing attacks:
Handle all secret data using only bit operations, i.e. XOR (^), AND (&), etc.
We take this approach.
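To make the bit-operations-only idea concrete, here is a minimal sketch of a branch-free select and a table read that touches every entry, so neither a branch condition nor a load address depends on the secret. The names `ct_select`, `ct_is_zero` and `ct_lookup` are hypothetical, and the sketch uses one arithmetic negation to build masks. Python integers only model 32-bit machine words here; Python itself gives no timing guarantees, so this illustrates the logic rather than the talk's implementation.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit machine word (assumption for illustration)

def ct_select(bit, a, b):
    """Return a if bit == 1 else b, with no branch on bit."""
    mask = (-bit) & MASK32          # bit=1 -> 0xFFFFFFFF, bit=0 -> 0
    return (a & mask) | (b & (mask ^ MASK32))

def ct_is_zero(x):
    """Return 1 if the 32-bit word x is 0, else 0, with no branch on x."""
    folded = x | ((-x) & MASK32)    # top bit is set exactly when x != 0
    return ((folded >> 31) & 1) ^ 1

def ct_lookup(table, secret_index):
    """Read table[secret_index] while visiting every entry in a fixed order."""
    result = 0
    for i, entry in enumerate(table):
        hit = ct_is_zero(i ^ secret_index)   # 1 only at the wanted position
        result |= ct_select(hit, entry, 0)
    return result
```

The loop bound and the sequence of indices `i` are public; only the mask values depend on `secret_index`.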
SLIDE 31
“How can this be competitive in speed? Are you really simulating field multiplication with hundreds of bit operations instead of simple log tables?”
SLIDE 41 Yes, we are.
Not as slow as it sounds!
On a typical 32-bit CPU, the XOR instruction is actually 32-bit XOR, operating in parallel on vectors of 32 bits.
Low-end smartphone CPU: 128-bit XOR every cycle.
Ivy Bridge: 256-bit XOR every cycle, or three 128-bit XORs.
Not immediately obvious that this “bitslicing” saves time for, e.g., multiplication in $\mathbb{F}_{2^{12}}$.
But quite obvious that it saves time for addition in $\mathbb{F}_{2^{12}}$.
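The addition case can be sketched as follows: 32 field elements are transposed into 12 words, one word per bit position, and 12 word-wide XORs then add all 32 pairs at once. The helper names `bitslice`, `unbitslice` and `add_bitsliced` are hypothetical, and the 32-lane width is an assumption matching the 32-bit CPU example on the slide; wider registers would simply carry more lanes.

```python
M = 12       # bits per element of F_{2^12}
LANES = 32   # lanes per word, modelling a 32-bit XOR (assumption)

def bitslice(elements):
    """Transpose up to 32 12-bit elements; word j holds bit j of every element."""
    planes = [0] * M
    for lane, e in enumerate(elements):
        for j in range(M):
            planes[j] |= ((e >> j) & 1) << lane
    return planes

def unbitslice(planes, count):
    """Inverse of bitslice for the first `count` lanes."""
    return [sum(((planes[j] >> lane) & 1) << j for j in range(M))
            for lane in range(count)]

def add_bitsliced(x, y):
    """Add 32 pairs of field elements using 12 word-wide XORs."""
    return [xj ^ yj for xj, yj in zip(x, y)]

a = list(range(100, 100 + LANES))
b = [(7 * i + 3) & 0xFFF for i in range(LANES)]
sums = unbitslice(add_bitsliced(bitslice(a), bitslice(b)), LANES)
```

Addition in a field of characteristic 2 is coefficient-wise XOR, so `sums[i]` equals `a[i] ^ b[i]` for every lane: 12 XOR instructions for 32 additions instead of 32.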
SLIDE 42
Typical decoding algorithms have add, mult roughly balanced.
Coming next: how to save many adds and most mults.
Nice synergy with bitslicing.
SLIDE 52 The additive FFT
Fix $n = 4096 = 2^{12}$, $t = 41$.
Big final decoding step is to find all roots in $\mathbb{F}_{2^{12}}$ of $f = c_{41}x^{41} + \cdots + c_0x^0$.
For each $\alpha \in \mathbb{F}_{2^{12}}$, compute $f(\alpha)$ by Horner’s rule: 41 adds, 41 mults.
Or use Chien search: compute $c_i g^i$, $c_i g^{2i}$, $c_i g^{3i}$, etc. Cost per point: again 41 adds, 41 mults.
Our cost: 6.01 adds, 2.09 mults.
Asymptotics: normally $t \in \Theta(n/\lg n)$, so Horner’s rule costs $\Theta(nt) = \Theta(n^2/\lg n)$.
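The Horner baseline above can be sketched as a plain reference implementation: one multiplication and one addition per coefficient, repeated at every point of the field. The field polynomial $x^{12} + x^3 + 1$ is an assumption (the slides do not name one), and `gf_mul`, `horner`, `find_roots` and `poly_from_roots` are hypothetical names. This readable version branches on data, so it is neither constant-time nor bitsliced; it only shows what is being counted.

```python
M = 12
MODULUS = (1 << 12) | (1 << 3) | 1   # x^12 + x^3 + 1 (assumed field polynomial)

def gf_mul(a, b):
    """Multiply in F_2[x]/(MODULUS) by shift-and-add, reducing as we go."""
    result = 0
    for _ in range(M):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= MODULUS
    return result

def horner(coeffs, alpha):
    """Evaluate f(alpha); coeffs[i] is the coefficient of x^i."""
    acc = coeffs[-1]
    for c in reversed(coeffs[:-1]):
        acc = gf_mul(acc, alpha) ^ c   # one mult and one add per coefficient
    return acc

def find_roots(coeffs):
    """Try every field element: about n * deg(f) mults and as many adds."""
    return [alpha for alpha in range(1 << M) if horner(coeffs, alpha) == 0]

def poly_from_roots(roots):
    """Coefficients of prod (x + r); in characteristic 2, x - r equals x + r."""
    coeffs = [1]
    for r in roots:
        shifted = [0] + coeffs                          # x * p(x)
        scaled = [gf_mul(c, r) for c in coeffs] + [0]   # r * p(x)
        coeffs = [s ^ t for s, t in zip(shifted, scaled)]
    return coeffs

f = poly_from_roots([1, 2, 3, 1000, 4095])
recovered = find_roots(f)
```

For a degree-41 polynomial the inner loop of `horner` runs 41 times, which is the 41 adds and 41 mults per point quoted on the slide.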
SLIDE 53 Wait a minute.
Didn’t we learn in school that FFT evaluates an $n$-coeff polynomial at $n$ points using $n^{1+o(1)}$ operations?
Isn’t this better than $n^2/\lg n$?
SLIDE 57
Asymptotics: normally $t \in \Theta(n/\lg n)$, so Horner’s rule costs $\Theta(nt) = \Theta(n^2/\lg n)$.
Wait a minute. Didn’t we learn in school that FFT evaluates an $n$-coeff polynomial at $n$ points using $n^{1+o(1)}$ operations? Isn’t this better than $n^2/\lg n$?
Standard radix-2 FFT: Want to evaluate $f = c_0 + c_1 x + \cdots + c_{n-1}x^{n-1}$ at all the $n$th roots of 1.
Write $f$ as $f_0(x^2) + x f_1(x^2)$.
Observe big overlap between $f(\alpha) = f_0(\alpha^2) + \alpha f_1(\alpha^2)$, $f(-\alpha) = f_0(\alpha^2) - \alpha f_1(\alpha^2)$.
$f_0$ has $n/2$ coeffs; evaluate at $(n/2)$nd roots of 1 by same idea recursively. Similarly $f_1$.
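The radix-2 recursion can be sketched in a few lines. To keep the arithmetic exact, this sketch works over a prime field rather than the complex numbers; the choice of $\mathbb{F}_{17}$ with $\omega = 2$ (a primitive 8th root of 1 mod 17) in the usage note is an illustrative assumption, and `fft_eval` is a hypothetical name.

```python
def fft_eval(coeffs: list[int], omega: int, p: int) -> list[int]:
    """Evaluate f at omega^0, ..., omega^(n-1) mod p, with n = len(coeffs).

    Requires n to be a power of 2 and omega to be a primitive nth root of 1
    modulo the prime p.
    """
    n = len(coeffs)
    if n == 1:
        return [coeffs[0] % p]
    omega_sq = omega * omega % p
    # f(x) = f0(x^2) + x * f1(x^2): f0 takes the even-index coefficients,
    # f1 the odd-index ones. Both are evaluated at the (n/2)nd roots of 1.
    even = fft_eval(coeffs[0::2], omega_sq, p)
    odd = fft_eval(coeffs[1::2], omega_sq, p)
    out = [0] * n
    alpha = 1
    for k in range(n // 2):
        # alpha = omega^k and -alpha = omega^(k + n/2) have the same square,
        # so the two outputs share even[k] and odd[k].
        out[k] = (even[k] + alpha * odd[k]) % p
        out[k + n // 2] = (even[k] - alpha * odd[k]) % p
        alpha = alpha * omega % p
    return out
```

Calling `fft_eval([3, 1, 4, 1, 5, 9, 2, 6], 2, 17)` evaluates an 8-coefficient polynomial at all eight 8th roots of 1 modulo 17. Each level of the recursion does a linear amount of work, which is where the $n^{1+o(1)}$ operation count comes from.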
SLIDE 61
Standard radix-2 FFT: Want to evaluate $f = c_0 + c_1 x + \cdots + c_{n-1}x^{n-1}$ at all the $n$th roots of 1.
Write $f$ as $f_0(x^2) + x f_1(x^2)$.
Observe big overlap between $f(\alpha) = f_0(\alpha^2) + \alpha f_1(\alpha^2)$, $f(-\alpha) = f_0(\alpha^2) - \alpha f_1(\alpha^2)$.
$f_0$ has $n/2$ coeffs; evaluate at $(n/2)$nd roots of 1 by same idea recursively. Similarly $f_1$.
Useless in char 2: $\alpha = -\alpha$.
Standard workarounds are painful.
FFT considered impractical.
1988 Wang–Zhu, independently 1989 Cantor: “additive FFT” in char 2. Still quite expensive.
1996 von zur Gathen–Gerhard: some improvements.
2010 Gao–Mateer: much better additive FFT.
We use Gao–Mateer, plus some new improvements.
SLIDE 65
Useless in char 2: $\alpha = -\alpha$.
Standard workarounds are painful.
FFT considered impractical.
1988 Wang–Zhu, independently 1989 Cantor: “additive FFT” in char 2. Still quite expensive.
1996 von zur Gathen–Gerhard: some improvements.
2010 Gao–Mateer: much better additive FFT.
We use Gao–Mateer, plus some new improvements.
Gao and Mateer evaluate $f = c_0 + c_1 x + \cdots + c_{n-1}x^{n-1}$ on a size-$n$ $\mathbb{F}_2$-linear space.
Their main idea: Write $f$ as $f_0(x^2 + x) + x f_1(x^2 + x)$.
Big overlap between $f(\alpha) = f_0(\alpha^2 + \alpha) + \alpha f_1(\alpha^2 + \alpha)$ and $f(\alpha + 1) = f_0(\alpha^2 + \alpha) + (\alpha + 1) f_1(\alpha^2 + \alpha)$.
“Twist” to ensure $1 \in$ space.
Then $\{\alpha^2 + \alpha\}$ is a size-$(n/2)$ $\mathbb{F}_2$-linear space.
Apply same idea recursively.
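One level of this decomposition can be sketched as below. It is a toy illustration, not the Gao–Mateer algorithm as published or as optimized in the talk: the field is assumed to be $\mathbb{F}_{16} = \mathbb{F}_2[z]/(z^4 + z + 1)$ for brevity, the split is done by plain repeated division by $x^2 + x$, and all function names are hypothetical.

```python
# Assumed toy field: F_16 = F_2[z] / (z^4 + z + 1), elements as 4-bit integers.
M = 4
MODULUS = 0b10011


def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements (shift-and-add with reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= MODULUS
    return result


def evaluate(coeffs: list[int], alpha: int) -> int:
    """Horner evaluation; coeffs[i] is the coefficient of x^i."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, alpha) ^ c
    return acc


def divmod_x2_plus_x(f: list[int]) -> tuple[list[int], list[int]]:
    """Divide f by x^2 + x in characteristic 2; return (quotient, remainder)."""
    f = list(f) + [0] * max(0, 2 - len(f))  # pad so the remainder has 2 entries
    for d in range(len(f) - 1, 1, -1):
        # Cancel f[d] * x^d by adding f[d] * x^(d-2) * (x^2 + x);
        # f[d] itself is left in place as the quotient coefficient.
        f[d - 1] ^= f[d]
    return f[2:], f[:2]


def split(coeffs: list[int]) -> tuple[list[int], list[int]]:
    """Return (f0, f1) with f(x) = f0(x^2 + x) + x * f1(x^2 + x)."""
    f0, f1 = [], []
    f = list(coeffs)
    while f:
        f, (r0, r1) = divmod_x2_plus_x(f)
        f0.append(r0)   # constant part of the remainder feeds f0
        f1.append(r1)   # linear part of the remainder feeds f1
    return f0, f1


def eval_pair(coeffs: list[int], alpha: int) -> tuple[int, int]:
    """Compute f(alpha) and f(alpha + 1) from one evaluation of f0 and f1."""
    f0, f1 = split(coeffs)
    beta = gf_mul(alpha, alpha) ^ alpha   # same value for alpha and alpha + 1
    v0, v1 = evaluate(f0, beta), evaluate(f1, beta)
    return v0 ^ gf_mul(alpha, v1), v0 ^ gf_mul(alpha ^ 1, v1)
```

In this sketch `eval_pair(f, alpha)` returns the pair $(f(\alpha), f(\alpha + 1))$ while evaluating $f_0$ and $f_1$ only once, at $\alpha^2 + \alpha$. A full additive FFT would split once per level and recurse on the half-size space instead of calling `evaluate`.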
SLIDE 69
Gao and Mateer evaluate $f = c_0 + c_1 x + \cdots + c_{n-1}x^{n-1}$ on a size-$n$ $\mathbb{F}_2$-linear space.
Their main idea: Write $f$ as $f_0(x^2 + x) + x f_1(x^2 + x)$.
Big overlap between $f(\alpha) = f_0(\alpha^2 + \alpha) + \alpha f_1(\alpha^2 + \alpha)$ and $f(\alpha + 1) = f_0(\alpha^2 + \alpha) + (\alpha + 1) f_1(\alpha^2 + \alpha)$.
“Twist” to ensure $1 \in$ space.
Then $\{\alpha^2 + \alpha\}$ is a size-$(n/2)$ $\mathbb{F}_2$-linear space.
Apply same idea recursively.
Results
60493 Ivy Bridge cycles:
8622 for permutation.
20846 for syndrome.
7714 for BM.
14794 for roots.
8520 for permutation.
Code will be public domain.
We’re still speeding it up.
Also 10× speedup for CFS.
More information: cr.yp.to/papers.html#mcbits
SLIDE 73
Results
60493 Ivy Bridge cycles:
8622 for permutation.
20846 for syndrome.
7714 for BM.
14794 for roots.
8520 for permutation.
Code will be public domain.
We’re still speeding it up.
Also 10× speedup for CFS.
More information: cr.yp.to/papers.html#mcbits
What you find in paper:
Cryptosystem specification.
Our speedups to additive FFT. (We now have more speedups: cr.yp.to/papers.html#auth256.)
Fast syndrome computation without big precomputed matrix. Important for lightweight!
Fast secret permutation using bit operations: sorting networks, permutation networks.