SLIDE 6 McBits: fast constant-time code-based cryptography
Bernstein, University of Illinois at Chicago & Technische Universiteit Eindhoven
Joint work with:
Tung Chou, Technische Universiteit Eindhoven (original speaker, still waiting for U.S. visa)
Peter Schwabe, Radboud University Nijmegen

Objectives
Set new speed records for public-key cryptography.
... at a high security level.
... including protection against quantum computers.
... including full protection against cache-timing attacks, branch-prediction attacks, etc.
... using code-based crypto with a solid track record.
... all of the above at once.
SLIDE 14 The track record
1978 McEliece proposed public-key code-based crypto.
Has held up well after extensive optimization of attack algorithms:
1962 Prange.
1981 Omura.
1988 Lee–Brickell.
1988 Leon.
1989 Krouk.
1989 Stern.
1989 Dumer.
1990 Coffey–Goodman.
1990 van Tilburg.
1991 Dumer.
1991 Coffey–Goodman–Farrell.
1993 Chabanne–Courteau.
1993 Chabaud.
1994 van Tilburg.
1994 Canteaut–Chabanne.
1998 Canteaut–Chabaud.
1998 Canteaut–Sendrier.
2008 Bernstein–Lange–Peters.
2009 Bernstein–Lange–Peters–van Tilborg.
2009 Bernstein (post-quantum).
2009 Finiasz–Sendrier.
2010 Bernstein–Lange–Peters.
2011 May–Meurer–Thomae.
2011 Becker–Coron–Joux.
2012 Becker–Joux–May–Meurer.
2013 Bernstein–Jeffery–Lange–Meurer (post-quantum).
SLIDE 25
Examples of the competition
Some cycle counts on h9ivy (Intel Core i5-3210M, Ivy Bridge) from bench.cr.yp.to:
mceliece encrypt    61440    (2008 Biswas–Sendrier, $\approx 2^{80}$)
gls254 DH           77468    (binary elliptic curve; CHES 2013)
kumfp127g DH        116944   (hyperelliptic; Eurocrypt 2013)
curve25519 DH       182632   (conservative elliptic curve)
mceliece decrypt    1219344
ronald1024 decrypt  1340040

New decoding speeds
$\approx 2^{128}$ security, $(n, t) = (4096, 41)$: 60493 Ivy Bridge cycles. Talk will focus on this case. (Decryption is slightly slower: includes hash, cipher, MAC.)
$\approx 2^{80}$ security, $(n, t) = (2048, 32)$: 26544 Ivy Bridge cycles.
All load/store addresses and all branch conditions are public. Eliminates cache-timing attacks etc.
Similar improvements for CFS.
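To make "all load/store addresses and all branch conditions are public" concrete, here is a minimal Python sketch, not taken from the talk. The function names and the 64-entry table are hypothetical. The first lookup reads an address that depends on the secret index; the second reads every entry in a fixed order and selects the wanted one with a mask instead of a branch. Python itself gives no timing guarantees, so this only illustrates the access pattern.

```python
def lookup_secret_address(table, secret_index):
    # The memory address that is read depends on the secret:
    # this is the pattern that cache-timing attacks exploit.
    return table[secret_index]


def lookup_public_addresses(table, secret_index):
    # Reads every entry in the same order regardless of the secret.
    # Assumes non-negative indices smaller than 2**63.
    result = 0
    for i, entry in enumerate(table):
        diff = i ^ secret_index
        # diff - 1 is negative exactly when diff == 0, so the arithmetic
        # shift yields -1 (all ones) for the wanted entry and 0 otherwise.
        mask = (diff - 1) >> 63
        result |= entry & mask
    return result
```

Both functions return the same value; only the second keeps the sequence of addresses and branch outcomes independent of `secret_index`.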
SLIDE 31
Constant-time fanaticism
The extremist's approach to eliminate timing attacks: Handle all secret data using only bit operations: XOR (^), AND (&), etc.
We take this approach.
“How can this be competitive in speed? Are you really simulating field multiplication with hundreds of bit operations instead of simple log tables?”
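As an illustration of field multiplication built only from XOR, AND, and shifts, here is a minimal Python sketch of multiplication in $\mathbb{F}_{2^{12}}$. The slides do not state the field polynomial; $x^{12} + x^3 + 1$ is an assumption made for this example, and `gf_mul` is a hypothetical name. The loop bounds are fixed and each conditional XOR is done with a mask, so the control flow does not depend on the operands.

```python
M = 12
POLY = (1 << 12) | (1 << 3) | 1  # x^12 + x^3 + 1 (assumed modulus)


def gf_mul(a, b):
    """Multiply two 12-bit field elements using only shifts, AND, XOR."""
    acc = 0
    # Carry-less schoolbook product: add (a << i) whenever bit i of b is set.
    for i in range(M):
        bit = (b >> i) & 1
        acc ^= (a << i) & -bit  # -bit is 0 or an all-ones mask
    # Reduce degrees 22 down to 12 using x^12 = x^3 + 1.
    for i in range(2 * M - 2, M - 1, -1):
        bit = (acc >> i) & 1
        acc ^= (POLY << (i - M)) & -bit
    return acc
```

A log-table implementation would instead index a table by the secret operands, which makes the load addresses secret-dependent.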
SLIDE 42
Yes, we are.
Not as slow as it sounds!
On a typical 32-bit CPU, the XOR instruction is actually 32-bit XOR, operating in parallel on vectors of 32 bits.
Low-end smartphone CPU: 128-bit XOR every cycle.
Ivy Bridge: 256-bit XOR every cycle, or three 128-bit XORs.

Not immediately obvious that this “bitslicing” saves time for, e.g., multiplication in $\mathbb{F}_{2^{12}}$.
But quite obvious that it saves time for addition in $\mathbb{F}_{2^{12}}$.
Typical decoding algorithms have add, mult roughly balanced.
Coming next: how to save many adds and most mults. Nice synergy with bitslicing.
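The bitslicing idea can be sketched in Python as follows. This is an illustration, not the authors' code: Python integers stand in for wide machine words, the helper names are hypothetical, and the modulus $x^{12} + x^3 + 1$ is an assumption. A batch of field elements is stored as 12 words, where word i holds bit i of every element. One batch addition is then 12 XORs, and one batch multiplication is 144 ANDs plus XORs, shared by every element in the batch.

```python
M = 12  # bits per element of F_{2^12}


def bitslice(elems):
    """Transpose: bit k of planes[i] is bit i of elems[k]."""
    planes = [0] * M
    for k, e in enumerate(elems):
        for i in range(M):
            planes[i] |= ((e >> i) & 1) << k
    return planes


def unbitslice(planes, count):
    """Inverse of bitslice for the first `count` lanes."""
    return [sum(((planes[i] >> k) & 1) << i for i in range(M))
            for k in range(count)]


def add_sliced(A, B):
    # Field addition is XOR: 12 word XORs add every lane at once.
    return [a ^ b for a, b in zip(A, B)]


def mul_sliced(A, B):
    # Schoolbook product on bit planes: 12 * 12 = 144 word ANDs.
    prod = [0] * (2 * M - 1)
    for i in range(M):
        for j in range(M):
            prod[i + j] ^= A[i] & B[j]
    # Reduce with x^12 = x^3 + 1 (assumed modulus), highest degree first.
    for k in range(2 * M - 2, M - 1, -1):
        prod[k - M + 3] ^= prod[k]
        prod[k - M] ^= prod[k]
    return prod[:M]
```

With 256-bit words, the few hundred word operations of `mul_sliced` are amortized over 256 multiplications, which is why the per-element cost is lower than the operation count suggests.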
SLIDE 53 The additive FFT
Fix $n = 4096 = 2^{12}$, $t = 41$.
Big final decoding step is to find all roots in $\mathbb{F}_{2^{12}}$ of $f = c_{41}x^{41} + \cdots + c_0 x^0$.
For each $\alpha \in \mathbb{F}_{2^{12}}$, compute $f(\alpha)$ by Horner's rule: 41 adds, 41 mults.
Or use Chien search: compute $c_i g^i$, $c_i g^{2i}$, $c_i g^{3i}$, etc. Cost per point: again 41 adds, 41 mults.
Our cost: 6.01 adds, 2.09 mults.

Asymptotics: normally $t \in \Theta(n/\lg n)$, so Horner's rule costs $\Theta(nt) = \Theta(n^2/\lg n)$.
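The Horner's-rule baseline above can be sketched in Python to check the count of 41 adds and 41 mults per point. This is only the baseline being compared against, not the additive FFT. The field polynomial $x^{12} + x^3 + 1$ and all function names are assumptions made for the example.

```python
M = 12
POLY = (1 << 12) | (1 << 3) | 1  # x^12 + x^3 + 1 (assumed modulus)


def gf_mul(a, b):
    """Multiplication in F_{2^12} with shifts, AND, XOR only."""
    acc = 0
    for i in range(M):
        acc ^= (a << i) & -((b >> i) & 1)
    for i in range(2 * M - 2, M - 1, -1):
        acc ^= (POLY << (i - M)) & -((acc >> i) & 1)
    return acc


def poly_from_roots(roots):
    """Coefficients (lowest degree first) of the product of (x + r)."""
    coeffs = [1]
    for r in roots:
        shifted = [0] + coeffs                          # x * f
        scaled = [gf_mul(c, r) for c in coeffs] + [0]   # r * f
        coeffs = [s ^ t for s, t in zip(shifted, scaled)]
    return coeffs


def horner(coeffs, alpha, counter):
    """Evaluate f(alpha); one mult and one add per coefficient below the top."""
    acc = coeffs[-1]
    for c in reversed(coeffs[:-1]):
        acc = gf_mul(acc, alpha) ^ c
        counter["mults"] += 1
        counter["adds"] += 1
    return acc


def find_roots(coeffs):
    """Try every alpha in F_{2^12}; return the roots and the operation counts."""
    counter = {"adds": 0, "mults": 0}
    roots = [a for a in range(1 << M) if horner(coeffs, a, counter) == 0]
    return roots, counter
```

For a degree-41 polynomial, `find_roots` performs 41 × 4096 = 167936 multiplications in total, which is the $\Theta(nt)$ cost that the slide's figures of 6.01 adds and 2.09 mults per point improve on.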
SLIDE 57
Standard radix-2 FFT: Want to evaluate $f = c_0 + c_1x + \cdots + c_{n-1}x^{n-1}$ at all the $n$th roots of 1.
Write $f$ as $f_0(x^2) + xf_1(x^2)$. Observe big overlap between
$f(\alpha) = f_0(\alpha^2) + \alpha f_1(\alpha^2)$,
$f(-\alpha) = f_0(\alpha^2) - \alpha f_1(\alpha^2)$.
$f_0$ has $n/2$ coeffs; evaluate at $(n/2)$nd roots of 1 by same idea recursively. Similarly $f_1$.
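The radix-2 recursion above can be sketched in Python. To keep the arithmetic exact, this sketch works in the prime field $\mathbb{F}_{17}$, where 9 is a primitive 8th root of 1. The field, the root, and the function names are assumptions chosen for illustration; the slide does not fix them.

```python
P = 17      # assumed prime modulus, chosen only for illustration
OMEGA = 9   # primitive 8th root of 1 modulo 17 (9^4 = -1 mod 17)


def fft(coeffs, omega, p=P):
    """Evaluate the polynomial at omega^0, ..., omega^(n-1); n is a power of 2."""
    n = len(coeffs)
    if n == 1:
        return [coeffs[0] % p]
    omega_sq = omega * omega % p
    even = fft(coeffs[0::2], omega_sq, p)  # f0 at the (n/2)nd roots of 1
    odd = fft(coeffs[1::2], omega_sq, p)   # f1 at the same points
    out = [0] * n
    alpha = 1
    for k in range(n // 2):
        t = alpha * odd[k] % p
        out[k] = (even[k] + t) % p            # f(alpha)
        out[k + n // 2] = (even[k] - t) % p   # f(-alpha) reuses both values
        alpha = alpha * omega % p
    return out


def naive_eval(coeffs, x, p=P):
    """Direct evaluation, used only to check the recursion."""
    return sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p
```

Each pair of outputs `out[k]` and `out[k + n // 2]` shares one evaluation of $f_0$ and one of $f_1$, which is the overlap the slide points out.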
SLIDE 61
Useless in char 2: $\alpha = -\alpha$. Standard workarounds are painful. FFT considered impractical.
1988 Wang–Zhu, independently 1989 Cantor: “additive FFT” in char 2. Still quite expensive.
1996 von zur Gathen–Gerhard: some improvements.
2010 Gao–Mateer: much better additive FFT.
We use Gao–Mateer, plus some new improvements.
SLIDE 65
Gao and Mateer evaluate $f = c_0 + c_1x + \cdots + c_{n-1}x^{n-1}$ on a size-$n$ $\mathbb{F}_2$-linear space.
Their main idea: Write $f$ as $f_0(x^2 + x) + xf_1(x^2 + x)$. Big overlap between
$f(\alpha) = f_0(\alpha^2 + \alpha) + \alpha f_1(\alpha^2 + \alpha)$ and
$f(\alpha + 1) = f_0(\alpha^2 + \alpha) + (\alpha + 1)f_1(\alpha^2 + \alpha)$.
“Twist” to ensure $1 \in$ space. Then $\{\alpha^2 + \alpha\}$ is a size-$(n/2)$ $\mathbb{F}_2$-linear space. Apply same idea recursively.
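A minimal Python sketch of the decomposition and the overlap it creates is given below. It covers only a single splitting step, not the full Gao–Mateer recursion or the “twist”, and it is not constant-time. The field representation $\mathbb{F}_2[z]/(z^{12} + z^3 + 1)$ and the names `split` and `shared_evaluations` are assumptions made for the example.

```python
# Illustrative sketch of one Gao-Mateer-style splitting step.
# Assumption: F_{2^12} is represented as F_2[z]/(z^12 + z^3 + 1).
GF_BITS = 12
GF_MODULUS = 0x1009


def gf_mul(a, b):
    """Multiply two field elements by shift-and-add, reducing as we go."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> GF_BITS:
            a ^= GF_MODULUS
    return result


def gf_eval(coeffs, x):
    """Horner evaluation, coefficients low degree first."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc


def split(coeffs):
    """Return (f0, f1) with f(x) = f0(x^2 + x) + x * f1(x^2 + x)."""
    work = list(coeffs)
    f0, f1 = [], []
    while work:
        # Divide by x^2 + x from the top down. The divisor's nonzero
        # coefficients are both 1, so each step is a single XOR.
        quotient = [0] * max(len(work) - 2, 0)
        for i in range(len(work) - 1, 1, -1):
            quotient[i - 2] = work[i]
            work[i - 1] ^= work[i]
        f0.append(work[0])                          # constant term of remainder
        f1.append(work[1] if len(work) > 1 else 0)  # x term of remainder
        work = quotient
    return f0, f1


def shared_evaluations(coeffs, alpha):
    """Compute f(alpha) and f(alpha + 1) from one evaluation of f0 and f1."""
    f0, f1 = split(coeffs)
    beta = gf_mul(alpha, alpha) ^ alpha   # alpha^2 + alpha
    e0, e1 = gf_eval(f0, beta), gf_eval(f1, beta)
    return e0 ^ gf_mul(alpha, e1), e0 ^ gf_mul(alpha ^ 1, e1)
```

The two returned values differ only in the multiplier applied to the shared $f_1$ value, since $(\alpha + 1)^2 + (\alpha + 1) = \alpha^2 + \alpha$ in characteristic 2.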
SLIDE 69
Results
60493 Ivy Bridge cycles:
8622 for permutation.
20846 for syndrome.
7714 for BM.
14794 for roots.
8520 for permutation.
Code will be public domain. We’re still speeding it up. Also 10× speedup for CFS.
More information: cr.yp.to/papers.html#mcbits
SLIDE 73
What you find in paper:
Cryptosystem specification.
Our speedups to additive FFT. (We now have more speedups; ongoing joint work with Lange.)
Fast syndrome computation without big precomputed matrix. Important for lightweight!
Fast secret permutation using bit operations: sorting networks, permutation networks.
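As a rough illustration of how a sorting network can apply a secret permutation with a data-independent sequence of operations, consider the sketch below. It uses an odd-even transposition network purely for brevity; the slides do not say which networks the paper uses, and the function names are hypothetical. Python itself gives no constant-time guarantees, so this shows only the structure: the same compare-exchange positions are visited regardless of the permutation.

```python
def compare_exchange(keys, vals, i, j):
    """Put positions i and j in key order using a mask rather than a branch."""
    mask = -int(keys[i] > keys[j])      # 0 if in order, -1 (all ones) if not
    dk = (keys[i] ^ keys[j]) & mask
    dv = (vals[i] ^ vals[j]) & mask
    keys[i] ^= dk
    keys[j] ^= dk
    vals[i] ^= dv
    vals[j] ^= dv


def apply_permutation(perm, bits):
    """Move bits[i] to position perm[i] by sorting the pairs (perm[i], bits[i])."""
    keys, vals = list(perm), list(bits)
    n = len(keys)
    for rnd in range(n):                      # n rounds sort any n inputs
        for i in range(rnd % 2, n - 1, 2):    # fixed, input-independent schedule
            compare_exchange(keys, vals, i, i + 1)
    return vals
```

For example, `apply_permutation([2, 0, 3, 1], [1, 0, 0, 1])` sorts the pairs by their first component and returns the bits in the permuted order.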