Efficient implementation of code-based cryptography

Oct 09, 2023

SLIDE 1

Efficient implementation of code-based cryptography

D. J. Bernstein

University of Illinois at Chicago & Technische Universiteit Eindhoven

Joint work with:
Tung Chou, Technische Universiteit Eindhoven
Peter Schwabe, Radboud University Nijmegen

Objectives

Set new speed records for public-key cryptography.

SLIDE 6

Objectives

Set new speed records for public-key cryptography.
... at a high security level.
... including protection against quantum computers.
... including full protection against cache-timing attacks, branch-prediction attacks, etc.
... using code-based crypto with a solid track record.
... all of the above at once.

SLIDE 13

The track record

1978 McEliece proposed public-key code-based crypto.

Has held up well after extensive optimization of attack algorithms:

1962 Prange.
1981 Omura.
1988 Lee–Brickell.
1988 Leon.
1989 Krouk.
1989 Stern.
1989 Dumer.
1990 Coffey–Goodman.
1990 van Tilburg.
1991 Dumer.
1991 Coffey–Goodman–Farrell.
1993 Chabanne–Courteau.
1993 Chabaud.
1994 van Tilburg.
1994 Canteaut–Chabanne.
1998 Canteaut–Chabaud.
1998 Canteaut–Sendrier.
2008 Bernstein–Lange–Peters.
2009 Bernstein–Lange–Peters–van Tilborg.
2009 Bernstein (post-quantum).
2009 Finiasz–Sendrier.
2010 Bernstein–Lange–Peters.
2011 May–Meurer–Thomae.
2011 Becker–Coron–Joux.
2012 Becker–Joux–May–Meurer.
2013 Bernstein–Jeffery–Lange–Meurer (post-quantum).

SLIDE 18

Examples of the competition

Some cycle counts on h9ivy (Intel Core i5-3210M, Ivy Bridge) from bench.cr.yp.to:

mceliece encrypt      73092  (2008 Biswas–Sendrier, ≈$2^{80}$)
gls254 DH             76212  (binary elliptic curve; CHES 2013)
kummer DH             88448  (hyperelliptic; Asiacrypt 2014)
curve25519 DH        182708  (conservative elliptic curve)
mceliece decrypt    1130908
ronald1024 decrypt  1313324

SLIDE 25

New decoding speeds

≈$2^{128}$ security, $(n, t) = (4096, 41)$: 60493 Ivy Bridge cycles.
Talk will focus on this case.
(Decryption is slightly slower: includes hash, cipher, MAC.)

≈$2^{80}$ security, $(n, t) = (2048, 32)$: 26544 Ivy Bridge cycles.

All load/store addresses and all branch conditions are public.
Eliminates cache-timing attacks etc.

Similar improvements for CFS.

SLIDE 31

Constant-time fanaticism

The extremist’s approach to eliminate timing attacks:
Handle all secret data using only bit operations: XOR (^), AND (&), etc.

We take this approach.

“How can this be competitive in speed? Are you really simulating field multiplication with hundreds of bit operations instead of simple log tables?”
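As an illustration of handling secret data with only bit operations, here is a minimal sketch of a branch-free select. The function name `ct_select` and the 32-bit word size are assumptions made for this example, not something taken from the talk. Python integers are not constant-time, so this shows only the logic, which would be written in C or assembly in practice.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit word size, for illustration only

def ct_select(bit, a, b):
    """Return a if bit == 1 else b, using only negation, AND and XOR.

    No branch and no table lookup depends on `bit`, so in a language with
    fixed-width words neither branch conditions nor load addresses would
    depend on the secret.
    """
    mask = (-bit) & MASK32          # 0xFFFFFFFF if bit == 1, 0x00000000 if bit == 0
    return b ^ (mask & (a ^ b))     # b ^ (a ^ b) == a when mask is all ones

# Usage: pick between two words based on a secret bit.
chosen = ct_select(1, 0x12345678, 0x9ABCDEF0)
```

The same masking pattern replaces an `if` on secret data wherever one would otherwise appear.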

SLIDE 36

Yes, we are.

Not as slow as it sounds! On a typical 32-bit CPU, the XOR instruction is actually 32-bit XOR, operating in parallel on vectors of 32 bits.

Low-end smartphone CPU: 128-bit XOR every cycle.

Ivy Bridge: 256-bit XOR every cycle, or three 128-bit XORs.

SLIDE 42

Not immediately obvious that this “bitslicing” saves time for, e.g., multiplication in $\mathbb{F}_{2^{12}}$.

But quite obvious that it saves time for addition in $\mathbb{F}_{2^{12}}$.

Typical decoding algorithms have add, mult roughly balanced.

Coming next: how to save many adds and most mults. Nice synergy with bitslicing.
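To make the addition claim concrete, the following sketch bitslices 32 elements of $\mathbb{F}_{2^{12}}$ into 12 words, so that 12 word-wide XORs add all 32 pairs at once. The helper names (`bitslice`, `unbitslice`, `add_bitsliced`) and the 32-lane width are assumptions for illustration. A real implementation would use 128-bit or 256-bit vector registers rather than Python integers.

```python
M = 12   # bits per field element, as in F_{2^12}
W = 32   # lanes per word (assumed 32-bit words for this sketch)

def bitslice(elems):
    """Pack up to W field elements into M words.

    Word j holds bit j of every element; lane i of each word belongs to elems[i].
    """
    planes = [0] * M
    for lane, e in enumerate(elems):
        for j in range(M):
            planes[j] |= ((e >> j) & 1) << lane
    return planes

def unbitslice(planes, count):
    """Inverse of bitslice: recover `count` field elements from M words."""
    return [sum(((planes[j] >> lane) & 1) << j for j in range(M))
            for lane in range(count)]

def add_bitsliced(x, y):
    """Add W pairs of field elements with M word-wide XORs."""
    return [xj ^ yj for xj, yj in zip(x, y)]

# Usage: 32 additions in F_{2^12} cost 12 XORs on words.
a = list(range(100, 132))
b = [(37 * i + 5) % 4096 for i in range(32)]
sums = unbitslice(add_bitsliced(bitslice(a), bitslice(b)), 32)
```

Addition in a binary field is coefficient-wise XOR, which is why the bitsliced form needs no extra work beyond the XORs themselves.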

SLIDE 48

The additive FFT

Fix $n = 4096 = 2^{12}$, $t = 41$.

Big final decoding step is to find all roots in $\mathbb{F}_{2^{12}}$ of $f = c_{41}x^{41} + \cdots + c_0x^0$.

For each $\alpha \in \mathbb{F}_{2^{12}}$, compute $f(\alpha)$ by Horner’s rule: 41 adds, 41 mults.

Or use Chien search: compute $c_i g^i$, $c_i g^{2i}$, $c_i g^{3i}$, etc. Cost per point: again 41 adds, 41 mults.

Our cost: 6.01 adds, 2.09 mults.
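The Horner baseline can be sketched as follows. This is a plain reference version, neither bitsliced nor constant-time. The field representation $\mathbb{F}_2[x]/(x^{12}+x^3+1)$ and the names `gf_mul`, `horner` and `roots` are assumptions for this example, since the slides do not specify a representation.

```python
M = 12
MODULUS = 0x1009   # x^12 + x^3 + 1: an assumed representation of F_{2^12}

def gf_mul(a, b):
    """Multiply two field elements: shift-and-XOR with reduction mod MODULUS."""
    r = 0
    for _ in range(M):
        if b & 1:          # data-dependent branch: fine for a reference sketch only
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:         # degree reached 12, so reduce
            a ^= MODULUS
    return r

def horner(coeffs, alpha):
    """Evaluate c_0 + c_1 x + ... + c_t x^t at alpha, with coeffs[i] = c_i.

    Uses t additions (XOR) and t multiplications per point.
    """
    acc = coeffs[-1]
    for c in reversed(coeffs[:-1]):
        acc = gf_mul(acc, alpha) ^ c
    return acc

def roots(coeffs):
    """Brute-force root finding: evaluate at every one of the 2^12 field elements."""
    return [a for a in range(1 << M) if horner(coeffs, a) == 0]
```

With $t = 41$ this spends 41 adds and 41 mults at each of the 4096 points, which is the per-point cost the talk sets out to beat.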

SLIDE 53

Asymptotics: normally $t \in \Theta(n/\lg n)$, so Horner’s rule costs $\Theta(nt) = \Theta(n^2/\lg n)$.

Wait a minute. Didn’t we learn in school that FFT evaluates an $n$-coeff polynomial at $n$ points using $n^{1+o(1)}$ operations? Isn’t this better than $n^2/\lg n$?
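As a rough worked comparison, the per-point costs quoted in the talk for $n = 4096$, $t = 41$ can be multiplied out. This assumes the per-point figures apply uniformly to all $n$ evaluation points.

$$
\begin{aligned}
\text{Horner or Chien:}\quad & 41 \cdot 4096 = 167936 \text{ adds}, & 41 \cdot 4096 &= 167936 \text{ mults},\\
\text{quoted cost:}\quad & 6.01 \cdot 4096 \approx 24617 \text{ adds}, & 2.09 \cdot 4096 &\approx 8561 \text{ mults}.
\end{aligned}
$$

That is roughly 6.8 times fewer additions and 19.6 times fewer multiplications.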

SLIDE 57

Standard radix-2 FFT:

Want to evaluate $f = c_0 + c_1x + \cdots + c_{n-1}x^{n-1}$ at all the $n$th roots of 1.

Write $f$ as $f_0(x^2) + xf_1(x^2)$. Observe big overlap between $f(\alpha) = f_0(\alpha^2) + \alpha f_1(\alpha^2)$ and $f(-\alpha) = f_0(\alpha^2) - \alpha f_1(\alpha^2)$.

$f_0$ has $n/2$ coeffs; evaluate at $(n/2)$nd roots of 1 by same idea recursively. Similarly $f_1$.
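The recursion above can be written out as a short sketch over the complex numbers. It assumes $n$ is a power of 2 and evaluates at $w^k$ with $w = e^{2\pi i/n}$; the function name `fft` and this choice of root are illustrative, not taken from the slides.

```python
import cmath

def fft(coeffs):
    """Evaluate f at all n-th roots of 1, where n = len(coeffs) is a power of 2.
    Returns [f(w^0), f(w^1), ..., f(w^(n-1))] with w = exp(2*pi*i/n)."""
    n = len(coeffs)
    if n == 1:
        return [complex(coeffs[0])]
    even = fft(coeffs[0::2])   # f0 evaluated at the (n/2)-th roots of 1
    odd = fft(coeffs[1::2])    # f1 evaluated at the (n/2)-th roots of 1
    out = [0j] * n
    for k in range(n // 2):
        alpha = cmath.exp(2j * cmath.pi * k / n)
        out[k] = even[k] + alpha * odd[k]            # f(alpha)
        out[k + n // 2] = even[k] - alpha * odd[k]   # f(-alpha), reusing both halves
    return out
```

Each pair $f(\alpha)$, $f(-\alpha)$ shares the two half-size evaluations, which is where the $n^{1+o(1)}$ operation count comes from.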

SLIDE 61

Useless in char 2: $\alpha = -\alpha$. Standard workarounds are painful. FFT considered impractical.

1988 Wang–Zhu, independently 1989 Cantor: “additive FFT” in char 2. Still quite expensive.

1996 von zur Gathen–Gerhard: some improvements.

2010 Gao–Mateer: much better additive FFT.

We use Gao–Mateer, plus some new improvements.

SLIDE 65

Gao and Mateer evaluate $f = c_0 + c_1x + \cdots + c_{n-1}x^{n-1}$ on a size-$n$ $\mathbf{F}_2$-linear space.

Their main idea: Write $f$ as $f_0(x^2 + x) + xf_1(x^2 + x)$.

Big overlap between $f(\alpha) = f_0(\alpha^2 + \alpha) + \alpha f_1(\alpha^2 + \alpha)$ and $f(\alpha + 1) = f_0(\alpha^2 + \alpha) + (\alpha + 1)f_1(\alpha^2 + \alpha)$.

“Twist” to ensure $1 \in$ space. Then $\{\alpha^2 + \alpha\}$ is a size-$(n/2)$ $\mathbf{F}_2$-linear space. Apply same idea recursively.
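The overlap between $f(\alpha)$ and $f(\alpha + 1)$ can be checked numerically with a small sketch. It computes $f_0$ and $f_1$ by repeated division by $x^2 + x$, which is a simple quadratic-time stand-in for the faster radix conversion used in a real additive FFT, and it is not the authors' implementation. The modulus $z^{12} + z^3 + 1$ and the names `gf_mul`, `horner` and `split` are assumptions made for illustration.

```python
# Sketch: one level of the f = f0(x^2 + x) + x*f1(x^2 + x) decomposition.
# Assumption: the field is GF(2)[z]/(z^12 + z^3 + 1); the slides do not give the modulus.
M = 12
MODULUS = (1 << 12) | (1 << 3) | 1

def gf_mul(a, b):
    """Multiply two field elements: shift-and-add with reduction after each shift."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= MODULUS
    return r

def horner(coeffs, x):
    """Evaluate a polynomial (coeffs[i] is the coefficient of x^i) at x."""
    acc = coeffs[-1]
    for c in reversed(coeffs[:-1]):
        acc = gf_mul(acc, x) ^ c
    return acc

def split(coeffs):
    """Write f = sum_i (a_i + b_i*x) * (x^2 + x)^i and return (f0, f1),
    where f0 = [a_0, a_1, ...] and f1 = [b_0, b_1, ...]."""
    f = list(coeffs)
    f0, f1 = [], []
    while f:
        # Long division by the monic polynomial x^2 + x; no inversions needed.
        q = [0] * max(len(f) - 2, 0)
        r = f[:]
        for d in range(len(r) - 1, 1, -1):
            c = r[d]
            q[d - 2] = c
            r[d] = 0
            r[d - 1] ^= c   # subtract c * x^(d-2) * (x^2 + x)
        f0.append(r[0])
        f1.append(r[1] if len(r) > 1 else 0)
        f = q
    return f0, f1

def eval_pair(coeffs, alpha):
    """Return (f(alpha), f(alpha + 1)) from one shared evaluation of f0 and f1."""
    f0, f1 = split(coeffs)
    beta = gf_mul(alpha, alpha) ^ alpha   # alpha^2 + alpha, same for alpha and alpha + 1
    u, v = horner(f0, beta), horner(f1, beta)
    return u ^ gf_mul(alpha, v), u ^ gf_mul(alpha ^ 1, v)
```

In this representation, adding 1 to a field element is an XOR with 1, so `eval_pair(coeffs, alpha)` returns the values at `alpha` and `alpha ^ 1` while evaluating `f0` and `f1` only once.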

SLIDE 69

Results

60493 Ivy Bridge cycles:
8622 for permutation.
20846 for syndrome.
7714 for BM.
14794 for roots.
8520 for permutation.

Code will be public domain. We’re still speeding it up. Also 10× speedup for CFS.

More information: cr.yp.to/papers.html#mcbits

SLIDE 73

What you find in paper:

Cryptosystem specification.

Our speedups to additive FFT. (We now have more speedups: cr.yp.to/papers.html#auth256.)

Fast syndrome computation without big precomputed matrix. Important for lightweight!

Fast secret permutation using bit operations: sorting networks, permutation networks.