McBits: fast constant-time code-based cryptography - PowerPoint PPT Presentation


Sep 16, 2022


SLIDE 6

McBits: fast constant-time code-based cryptography

D. J. Bernstein

University of Illinois at Chicago & Technische Universiteit Eindhoven

Joint work with:
Tung Chou, Technische Universiteit Eindhoven (original speaker, still waiting for U.S. visa)
Peter Schwabe, Radboud University Nijmegen

Objectives

Set new speed records for public-key cryptography.
... at a high security level.
... including protection against quantum computers.
... including full protection against cache-timing attacks, branch-prediction attacks, etc.
... using code-based crypto with a solid track record.
... all of the above at once.

SLIDE 14

The track record

1978 McEliece proposed public-key code-based crypto. Has held up well after extensive optimization of attack algorithms:

1962 Prange.
1981 Omura.
1988 Lee–Brickell.
1988 Leon.
1989 Krouk.
1989 Stern.
1989 Dumer.
1990 Coffey–Goodman.
1990 van Tilburg.
1991 Dumer.
1991 Coffey–Goodman–Farrell.
1993 Chabanne–Courteau.
1993 Chabaud.
1994 van Tilburg.
1994 Canteaut–Chabanne.
1998 Canteaut–Chabaud.
1998 Canteaut–Sendrier.
2008 Bernstein–Lange–Peters.
2009 Bernstein–Lange–Peters–van Tilborg.
2009 Bernstein (post-quantum).
2009 Finiasz–Sendrier.
2010 Bernstein–Lange–Peters.
2011 May–Meurer–Thomae.
2011 Becker–Coron–Joux.
2012 Becker–Joux–May–Meurer.
2013 Bernstein–Jeffery–Lange–Meurer (post-quantum).

SLIDE 25

Examples of the competition

Some cycle counts on h9ivy (Intel Core i5-3210M, Ivy Bridge) from bench.cr.yp.to:

mceliece encrypt       61440  (2008 Biswas–Sendrier, $\approx 2^{80}$)
gls254 DH              77468  (binary elliptic curve; CHES 2013)
kumfp127g DH          116944  (hyperelliptic; Eurocrypt 2013)
curve25519 DH         182632  (conservative elliptic curve)
mceliece decrypt     1219344
ronald1024 decrypt   1340040

New decoding speeds

$\approx 2^{128}$ security, $(n, t) = (4096, 41)$: 60493 Ivy Bridge cycles. Talk will focus on this case. (Decryption is slightly slower: includes hash, cipher, MAC.)

$\approx 2^{80}$ security, $(n, t) = (2048, 32)$: 26544 Ivy Bridge cycles.

All load/store addresses and all branch conditions are public. Eliminates cache-timing attacks etc.

Similar improvements for CFS.

SLIDE 36

Constant-time fanaticism

The extremist’s approach to eliminate timing attacks: handle all secret data using only bit operations: XOR (^), AND (&), etc.

We take this approach.

“How can this be competitive in speed? Are you really simulating field multiplication with hundreds of bit operations instead of simple log tables?”

Yes, we are. Not as slow as it sounds!

On a typical 32-bit CPU, the XOR instruction is actually 32-bit XOR, operating in parallel on vectors of 32 bits.

Low-end smartphone CPU: 128-bit XOR every cycle.

Ivy Bridge: 256-bit XOR every cycle, or three 128-bit XORs.
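A minimal sketch of the “bit operations only” rule, assuming 32-bit words: instead of branching on a secret bit, expand it into an all-zeros or all-ones mask and combine values with AND and XOR. The helper names `ct_mask`, `ct_select` and `ct_cswap` are hypothetical and are not taken from the McBits software.

```c
#include <stdint.h>

/* Expand a secret bit b (0 or 1) into a mask without branching:
   0 -> 0x00000000, 1 -> 0xFFFFFFFF. */
static uint32_t ct_mask(uint32_t b) {
    return (uint32_t)0 - (b & 1u);
}

/* Return x if b == 1, else y, using only AND and XOR on the data,
   so no branch and no memory address depends on b. */
static uint32_t ct_select(uint32_t b, uint32_t x, uint32_t y) {
    uint32_t m = ct_mask(b);
    return y ^ (m & (x ^ y));
}

/* Swap *x and *y if b == 1; leave them unchanged if b == 0.
   Both words are read and written in either case. */
static void ct_cswap(uint32_t b, uint32_t *x, uint32_t *y) {
    uint32_t t = ct_mask(b) & (*x ^ *y);
    *x ^= t;
    *y ^= t;
}
```

Because the instruction sequence is the same for `b = 0` and `b = 1`, the running time should not reveal `b`. Compilers are in principle free to turn such code back into branches, so checking the generated assembly is a sensible precaution.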

SLIDE 42

Not immediately obvious that this “bitslicing” saves time for, e.g., multiplication in $\mathbb{F}_{2^{12}}$. But quite obvious that it saves time for addition in $\mathbb{F}_{2^{12}}$.

Typical decoding algorithms have add, mult roughly balanced. Coming next: how to save many adds and most mults. Nice synergy with bitslicing.
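To make the bitslicing idea concrete, here is a sketch of 64-way bitsliced addition and multiplication in $\mathbb{F}_{2^{12}}$, assuming 64-bit words and the field modulus $z^{12} + z^3 + 1$ (both are assumptions of this example). Word `w[i]` holds bit $i$ of 64 different field elements, so 64 additions cost 12 XORs, while 64 multiplications cost 144 ANDs and 166 XORs in this naive schoolbook form. All names (`bs_gf`, `bs_pack`, `bs_unpack`, `bs_add`, `bs_mul`) are hypothetical; `gf_mul_ref` is a plain scalar multiply used only as a cross-check.

```c
#include <stdint.h>

#define GFBITS 12

/* Bitsliced layout: w[i] holds bit i of 64 field elements;
   element j occupies bit position j of every word. */
typedef struct { uint64_t w[GFBITS]; } bs_gf;

/* Scalar reference multiply in F_2[z]/(z^12 + z^3 + 1) (modulus assumed). */
static uint16_t gf_mul_ref(uint16_t a, uint16_t b) {
    uint32_t r = 0;
    for (int i = 0; i < GFBITS; i++)
        r ^= ((uint32_t)b << i) & (0u - (((uint32_t)a >> i) & 1u));
    for (int k = 2 * GFBITS - 2; k >= GFBITS; k--) {
        uint32_t bit = (r >> k) & 1u;   /* z^k = z^(k-9) + z^(k-12) */
        r ^= (bit << k) | (bit << (k - 9)) | (bit << (k - 12));
    }
    return (uint16_t)r;
}

/* Transpose 64 ordinary 12-bit elements into bitsliced form. */
static void bs_pack(bs_gf *out, const uint16_t in[64]) {
    for (int i = 0; i < GFBITS; i++) {
        out->w[i] = 0;
        for (int j = 0; j < 64; j++)
            out->w[i] |= (uint64_t)((in[j] >> i) & 1u) << j;
    }
}

/* Inverse transpose: bitsliced form back to 64 ordinary elements. */
static void bs_unpack(uint16_t out[64], const bs_gf *in) {
    for (int j = 0; j < 64; j++) {
        out[j] = 0;
        for (int i = 0; i < GFBITS; i++)
            out[j] |= (uint16_t)(((in->w[i] >> j) & 1u) << i);
    }
}

/* 64 additions at once: one XOR per bit position, 12 XORs total. */
static void bs_add(bs_gf *r, const bs_gf *a, const bs_gf *b) {
    for (int i = 0; i < GFBITS; i++)
        r->w[i] = a->w[i] ^ b->w[i];
}

/* 64 multiplications at once: schoolbook product using AND/XOR,
   then reduction using z^12 = z^3 + 1, from the top degree down. */
static void bs_mul(bs_gf *r, const bs_gf *a, const bs_gf *b) {
    uint64_t t[2 * GFBITS - 1] = {0};
    for (int i = 0; i < GFBITS; i++)
        for (int j = 0; j < GFBITS; j++)
            t[i + j] ^= a->w[i] & b->w[j];
    for (int k = 2 * GFBITS - 2; k >= GFBITS; k--) {
        t[k - 9] ^= t[k];
        t[k - 12] ^= t[k];
    }
    for (int i = 0; i < GFBITS; i++)
        r->w[i] = t[i];
}
```

Every operation touches all 64 lanes identically, so no secret-dependent branch or table lookup appears anywhere; the cost of the transposition is amortized when many field operations are done in bitsliced form.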

SLIDE 53

The additive FFT

Fix $n = 4096 = 2^{12}$, $t = 41$. Big final decoding step is to find all roots in $\mathbb{F}_{2^{12}}$ of $f = c_{41}x^{41} + \cdots + c_0x^0$.

For each $\alpha \in \mathbb{F}_{2^{12}}$, compute $f(\alpha)$ by Horner’s rule: 41 adds, 41 mults. Or use Chien search: compute $c_i g^i$, $c_i g^{2i}$, $c_i g^{3i}$, etc. Cost per point: again 41 adds, 41 mults.

Our cost: 6.01 adds, 2.09 mults.

Asymptotics: normally $t \in \Theta(n/\lg n)$, so Horner’s rule costs $\Theta(nt) = \Theta(n^2/\lg n)$.

Wait a minute. Didn’t we learn in school that FFT evaluates an $n$-coeff polynomial at $n$ points using $n^{1+o(1)}$ operations? Isn’t this better than $n^2/\lg n$?
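The two baseline methods can be made concrete with a small plain-Python sketch. It assumes $\mathbb{F}_{2^{12}}$ is represented modulo $z^{12} + z^3 + 1$ (an assumption for illustration; the slide does not fix a modulus), and all function names are hypothetical. `horner` counts its own operations to confirm the 41 adds and 41 mults per point for $t = 41$; `chien_values` evaluates at successive powers of some $g$ by updating the terms $c_i g^{ik}$; `find_roots` is the brute-force root search over all 4096 field elements:

```python
MOD = 0x1009  # assumed modulus z^12 + z^3 + 1 defining F_{2^12}

def gf_mul(a, b):
    """Multiply two F_{2^12} elements (ints below 4096) by shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a          # field addition is XOR
        b >>= 1
        a <<= 1
        if a & 0x1000:      # degree reached 12: reduce modulo MOD
            a ^= MOD
    return r

def gf_pow(a, e):
    """Square-and-multiply exponentiation in F_{2^12}."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def horner(f, a):
    """Evaluate f[0] + f[1] x + ... + f[t] x^t at a; return (value, adds, mults)."""
    v, adds, mults = f[-1], 0, 0
    for c in reversed(f[:-1]):
        v = gf_mul(v, a) ^ c   # one mult and one add per lower coefficient
        mults += 1
        adds += 1
    return v, adds, mults

def chien_values(f, g, count):
    """f(g^0), ..., f(g^(count-1)): keep terms c_i g^(ik); step k by multiplying term i by g^i."""
    steps = [gf_pow(g, i) for i in range(len(f))]
    terms = list(f)
    out = []
    for _ in range(count):
        v = 0
        for term in terms:
            v ^= term
        out.append(v)
        terms = [gf_mul(term, s) for term, s in zip(terms, steps)]
    return out

def from_roots(roots):
    """Coefficients (lowest first) of the product of (x + r); x - r = x + r in char 2."""
    f = [1]
    for r in roots:
        g = [0] * (len(f) + 1)
        for i, c in enumerate(f):
            g[i + 1] ^= c
            g[i] ^= gf_mul(c, r)
        f = g
    return f

def find_roots(f):
    """Brute-force root finding: Horner's rule at all 4096 field elements."""
    return [a for a in range(4096) if horner(f, a)[0] == 0]

print(horner(list(range(1, 43)), 7)[1:])              # → (41, 41)
print(find_roots(from_roots([0, 1, 5, 1234, 4095])))  # → [0, 1, 5, 1234, 4095]
```

Both methods spend work proportional to $t$ at every one of the $n$ points, which is exactly the $\Theta(nt)$ cost the FFT is meant to beat.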

SLIDE 57

Standard radix-2 FFT: Want to evaluate $f = c_0 + c_1x + \cdots + c_{n-1}x^{n-1}$ at all the $n$th roots of 1.

Write $f$ as $f_0(x^2) + x f_1(x^2)$. Observe big overlap between $f(\alpha) = f_0(\alpha^2) + \alpha f_1(\alpha^2)$ and $f(-\alpha) = f_0(\alpha^2) - \alpha f_1(\alpha^2)$.

$f_0$ has $n/2$ coeffs; evaluate at the $(n/2)$nd roots of 1 by same idea recursively. Similarly $f_1$.
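The recursion above fits in a short function. As an illustrative substitute for complex roots of unity, this sketch works in the prime field $\mathbb{F}_{257}$ so the arithmetic is exact: $256 = 2^8$, so $n$th roots of 1 exist for every power of two $n \le 256$, and 3 generates $\mathbb{F}_{257}^*$. The names `fft` and `naive_eval` are hypothetical:

```python
P = 257  # prime field F_257: 256 = 2^8, so n-th roots of 1 exist for n | 256

def fft(f, omega):
    """Return [f(omega^0), ..., f(omega^(n-1))] for n = len(f) a power of 2,
    where omega is a primitive n-th root of 1 mod P."""
    n = len(f)
    if n == 1:
        return [f[0] % P]
    w2 = omega * omega % P            # primitive (n/2)-th root of 1
    even = fft(f[0::2], w2)           # f0 at the (n/2)-th roots, i.e. f0(alpha^2)
    odd = fft(f[1::2], w2)            # f1 at the same points
    out = [0] * n
    w = 1                             # w = omega^k = alpha
    for k in range(n // 2):
        t = w * odd[k] % P
        out[k] = (even[k] + t) % P            # f(alpha)  = f0(alpha^2) + alpha f1(alpha^2)
        out[k + n // 2] = (even[k] - t) % P   # f(-alpha) = f0(alpha^2) - alpha f1(alpha^2)
        w = w * omega % P
    return out

def naive_eval(f, omega):
    """Quadratic-time reference: evaluate f at each omega^k directly."""
    n = len(f)
    return [sum(c * pow(omega, k * i, P) for i, c in enumerate(f)) % P for k in range(n)]

n = 8
omega = pow(3, 256 // n, P)   # has order exactly n because 3 generates F_257^*
f = [1, 2, 3, 4, 5, 6, 7, 8]
assert fft(f, omega) == naive_eval(f, omega)
```

The second half of the outputs reuses `even[k]` and `odd[k]` because $\omega^{k+n/2} = -\omega^k$; that sharing is the "big overlap".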

SLIDE 61

Useless in char 2: $\alpha = -\alpha$. Standard workarounds are painful. FFT considered impractical.

1988 Wang–Zhu, independently 1989 Cantor: “additive FFT” in char 2. Still quite expensive.

1996 von zur Gathen–Gerhard: some improvements.

2010 Gao–Mateer: much better additive FFT.

We use Gao–Mateer, plus some new improvements.
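A quick numerical check shows why the pairing trick breaks down. This sketch assumes the same illustrative modulus $z^{12} + z^3 + 1$ for $\mathbb{F}_{2^{12}}$. Since $\alpha + \alpha = 0$, each $\alpha$ is its own negative, and squaring is a bijection (the Frobenius map), so $x \mapsto x^2$ never halves the evaluation set. By contrast, $x \mapsto x^2 + x$ is $\mathbb{F}_2$-linear with kernel $\{0, 1\}$ and is exactly 2-to-1, which is the map Gao and Mateer exploit:

```python
MOD = 0x1009  # assumed modulus z^12 + z^3 + 1 defining F_{2^12}

def gf_mul(a, b):
    """Multiply two F_{2^12} elements (ints below 4096) by shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x1000:
            a ^= MOD
    return r

field = range(4096)
# In char 2, alpha + alpha = 0, so -alpha is alpha itself: the pairs (alpha, -alpha)
# that the radix-2 FFT relies on collapse to single points.
assert all(a ^ a == 0 for a in field)
# Squaring is injective (Frobenius), so x -> x^2 does not halve the point set.
squares = {gf_mul(a, a) for a in field}
# x -> x^2 + x is F_2-linear with kernel {0, 1}, so it is exactly 2-to-1.
halved = {gf_mul(a, a) ^ a for a in field}
print(len(squares), len(halved))  # → 4096 2048
```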

SLIDE 65

Gao and Mateer evaluate $f = c_0 + c_1x + \cdots + c_{n-1}x^{n-1}$ on a size-$n$ $\mathbb{F}_2$-linear space.

Their main idea: Write $f$ as $f_0(x^2 + x) + x f_1(x^2 + x)$. Big overlap between $f(\alpha) = f_0(\alpha^2 + \alpha) + \alpha f_1(\alpha^2 + \alpha)$ and $f(\alpha + 1) = f_0(\alpha^2 + \alpha) + (\alpha + 1) f_1(\alpha^2 + \alpha)$.

“Twist” to ensure $1 \in$ the space. Then $\{\alpha^2 + \alpha\}$ is a size-$(n/2)$ $\mathbb{F}_2$-linear space. Apply same idea recursively.
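The recursion can be sketched in plain Python under stated simplifications. This is not Gao and Mateer’s optimized algorithm or the McBits code: it assumes the illustrative modulus $z^{12} + z^3 + 1$, splits $f$ into $f_0, f_1$ by naive repeated division by $x^2 + x$ (quadratic time), and recomputes the twisted basis at every level. All names (`taylor`, `span_point`, `additive_fft`) are hypothetical. The twist replaces $f(x)$ by $g(x) = f(bx)$, where $b$ is the last basis element, so the scaled space contains 1:

```python
MOD = 0x1009  # assumed modulus z^12 + z^3 + 1 defining F_{2^12}

def gf_mul(a, b):
    """Multiply two F_{2^12} elements (ints below 4096) by shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x1000:
            a ^= MOD
    return r

def gf_inv(a):
    """a^(2^12 - 2) = a^(-1) for nonzero a."""
    r, e = 1, 4094
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def taylor(g):
    """Split g into (g0, g1) with g(x) = g0(x^2 + x) + x * g1(x^2 + x)."""
    g0, g1, w = [], [], list(g)
    while w:
        w += [0] * (2 - len(w))            # pad a short final remainder
        for i in range(len(w) - 1, 1, -1):
            w[i - 1] ^= w[i]               # subtract w[i] * x^(i-2) * (x^2 + x)
        g0.append(w[0])                    # remainder w[0] + w[1] x
        g1.append(w[1])
        w = w[2:]                          # quotient was left in w[2:]
    return g0, g1

def span_point(basis, j):
    """Sum (XOR) of basis[i] over the set bits i of j."""
    p = 0
    for i, b in enumerate(basis):
        if (j >> i) & 1:
            p ^= b
    return p

def additive_fft(f, basis):
    """Evaluate f (len <= 2^k) on the F_2-span of k independent basis elements;
    out[j] = f(span_point(basis, j))."""
    k = len(basis)
    assert len(f) <= 1 << k
    if k == 0:
        return [f[0] if f else 0]          # the span is {0}
    b = basis[-1]
    g, p = [], 1                           # twist: g(x) = f(b x)
    for c in f:
        g.append(gf_mul(c, p))
        p = gf_mul(p, b)
    binv = gf_inv(b)
    gammas = [gf_mul(x, binv) for x in basis[:-1]]   # span(gammas) + {0, 1} = basis / b
    g0, g1 = taylor(g)
    deltas = [gf_mul(y, y) ^ y for y in gammas]      # x -> x^2 + x is F_2-linear
    evens = additive_fft(g0, deltas)
    odds = additive_fft(g1, deltas)
    half = 1 << (k - 1)
    out = [0] * (2 * half)
    for j in range(half):
        sigma = span_point(gammas, j)
        out[j] = evens[j] ^ gf_mul(sigma, odds[j])   # g(sigma) = f(b * sigma)
        out[j + half] = out[j] ^ odds[j]             # g(sigma + 1) reuses both halves
    return out
```

With the standard basis $1, z, \ldots, z^7$, entry `j` of `additive_fft(f, [1 << i for i in range(8)])` is simply $f(j)$ for the 256 integers `j` below 256, which makes the output easy to check against Horner’s rule.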

SLIDE 69

Results

60493 Ivy Bridge cycles:
8622 for permutation.
20846 for syndrome.
7714 for BM (Berlekamp–Massey).
14794 for roots.
8520 for permutation.

Code will be public domain. We’re still speeding it up.

Also 10× speedup for CFS.

More information: cr.yp.to/papers.html#mcbits

SLIDE 73

What you find in paper: Cryptosystem specification. Our speedups to additive FFT. (We now have more speedups; ongoing joint work with Lange.)

Fast syndrome computation without big precomputed matrix. Important for lightweight!

Fast secret permutation using bit operations: sorting networks, permutation networks.
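To show how a sorting network can apply a secret permutation, here is a minimal sketch built on a bitonic sorter, chosen here only for simplicity. The slide also mentions permutation networks, which this sketch does not cover. The names `minmax`, `bitonic_sort` and `apply_permutation` are hypothetical. Which positions get compared depends only on the length, and each compare-exchange uses a mask instead of a data-dependent branch. Python itself gives no real constant-time guarantee; the point is only the fixed, data-independent access pattern. Sorting the packed pairs $(\pi_i, x_i)$ by $\pi_i$ moves bit $x_i$ to position $\pi_i$:

```python
def minmax(a, b):
    """Branch-free compare-exchange for small non-negative ints (below 2^62)."""
    mask = (b - a) >> 63          # -1 (all ones) if b < a, else 0
    swap = (a ^ b) & mask
    return a ^ swap, b ^ swap

def bitonic_sort(a):
    """Sort a list whose length is a power of 2 with a bitonic sorting network."""
    n = len(a)
    assert n & (n - 1) == 0
    a = list(a)
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:                        # depends on indices only
                    lo, hi = minmax(a[i], a[partner])
                    if (i & k) == 0:                   # ascending block (index-only test)
                        a[i], a[partner] = lo, hi
                    else:                              # descending block
                        a[i], a[partner] = hi, lo
            j //= 2
        k *= 2
    return a

def apply_permutation(bits, pi):
    """Return y with y[pi[i]] = bits[i], by sorting packed (pi[i], bits[i]) pairs."""
    packed = [(p << 1) | b for p, b in zip(pi, bits)]
    return [v & 1 for v in bitonic_sort(packed)]

print(apply_permutation([1, 0, 1, 1, 0, 0, 1, 0], [3, 0, 7, 5, 1, 2, 6, 4]))
# → [0, 0, 0, 1, 0, 1, 1, 1]
```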