SLIDE 1
McBits: fast constant-time code-based cryptography (to appear at CHES 2013)
D. J. Bernstein, University of Illinois at Chicago & Technische Universiteit Eindhoven
Joint work with:
Tung Chou, Technische Universiteit Eindhoven
Peter Schwabe, Radboud University Nijmegen

Objectives
Set new speed records for public-key cryptography ...
... at a high security level.
... including protection against quantum computers.
... including full protection against cache-timing attacks, branch-prediction attacks, etc.
... using code-based crypto with a solid track record.
... all of the above at once.
SLIDE 10
Examples of the competition
Some cycle counts on h9ivy (Intel Core i5-3210M, Ivy Bridge) from bench.cr.yp.to:
mceliece encrypt        61440   (2008 Biswas–Sendrier, 2^80 security)
gls254 DH               77468   (binary elliptic curve; CHES 2013)
kumfp127g DH           116944   (hyperelliptic; Eurocrypt 2013)
curve25519 DH          182632   (conservative elliptic curve)
mceliece decrypt      1219344
ronald1024 decrypt    1340040
SLIDE 17
New decoding speeds
(n, t) = (4096, 41); 2^128 security: 60493 Ivy Bridge cycles.
Talk will focus on this case. (Decryption is slightly slower: includes hash, cipher, MAC.)
(n, t) = (2048, 32); 2^80 security: 26544 Ivy Bridge cycles.
All load/store addresses and all branch conditions are public. Eliminates cache-timing attacks etc.
Similar improvements for CFS.
SLIDE 23
Constant-time fanaticism
The extremist's approach to eliminate timing attacks: handle all secret data using only bit operations: XOR (^), AND (&), etc.
We take this approach.
"How can this be competitive in speed? Are you really simulating field multiplication with hundreds of bit operations instead of simple log tables?"
SLIDE 28
Yes, we are. Not as slow as it sounds!
On a typical 32-bit CPU, the XOR instruction is actually a 32-bit XOR, operating in parallel on vectors of 32 bits.
Low-end smartphone CPU: 128-bit XOR every cycle.
Ivy Bridge: 256-bit XOR every cycle, or three 128-bit XORs.
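To make the bitslicing picture concrete, here is a minimal sketch (not the McBits code; the type and function names such as bitslice and gf_add_bitsliced are made up for illustration) of how one XOR per bit position adds 64 elements of F_{2^12} at once when elements are stored as bit-planes in 64-bit words:

```c
#include <stdint.h>

#define GFBITS 12                    /* m = 12, i.e. F_{2^12} */

/* 64 field elements stored "sliced": bit-plane j of all 64 elements
   lives in the 64-bit word a[j]. */
typedef uint64_t bitslice[GFBITS];

/* r = a + b in all 64 lanes at once: addition in characteristic 2 is XOR,
   so this is GFBITS word-XORs for 64 field additions. */
static void gf_add_bitsliced(bitslice r, const bitslice a, const bitslice b)
{
    for (int j = 0; j < GFBITS; j++)
        r[j] = a[j] ^ b[j];
}
```

With 256-bit vector registers the same loop adds 256 field elements per pass.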
SLIDE 34
Not immediately obvious that this "bitslicing" saves time for, e.g., multiplication in F_{2^12}.
But quite obvious that it saves time for addition in F_{2^12}.
Typical decoding algorithms have add, mult roughly balanced.
Coming next: how to save many adds and most mults. Nice synergy with bitslicing.
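For multiplication the picture is indeed less obvious: a bitsliced product of two F_{2^12} elements really does take a few hundred bit operations. A rough sketch (again not the McBits code; the reduction polynomial x^12 + x^3 + 1 is one common irreducible choice, assumed here) shows why the amortized cost is still small: every word operation acts on 64 independent multiplications.

```c
#include <stdint.h>

#define GFBITS 12
typedef uint64_t bitslice[GFBITS];   /* 64 elements of F_{2^12}, one bit-plane per word */

/* r = a * b in all 64 lanes.  Schoolbook product (GFBITS^2 ANDs plus roughly
   as many XORs), then reduction modulo x^12 + x^3 + 1 via
   x^k = x^(k-9) + x^(k-12) for k >= 12. */
static void gf_mul_bitsliced(bitslice r, const bitslice a, const bitslice b)
{
    uint64_t p[2 * GFBITS - 1] = {0};

    for (int i = 0; i < GFBITS; i++)
        for (int j = 0; j < GFBITS; j++)
            p[i + j] ^= a[i] & b[j];

    for (int k = 2 * GFBITS - 2; k >= GFBITS; k--) {
        p[k - GFBITS + 3] ^= p[k];   /* fold x^k onto x^(k-9)  */
        p[k - GFBITS]     ^= p[k];   /* fold x^k onto x^(k-12) */
    }

    for (int j = 0; j < GFBITS; j++)
        r[j] = p[j];
}
```

In this naive sketch, 64 products cost roughly 300 word operations while 64 sums cost 12, which is exactly why the rest of the talk works so hard to save multiplications.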
SLIDE 40
The additive FFT
Fix n = 4096 = 2^12, t = 41. Big final decoding step is to find all roots in F_{2^12} of f = c_41 x^41 + ... + c_0.
For each α ∈ F_{2^12}, compute f(α) by Horner's rule: 41 adds, 41 mults.
Or use Chien search: compute c_i g^i, c_i g^(2i), c_i g^(3i), etc. Cost per point: again 41 adds, 41 mults.
Our cost: 6.01 adds, 2.09 mults.
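As a reference point, here is what the Horner baseline on this slide looks like in code (a sketch with made-up helper names, not the McBits implementation; the modulus x^12 + x^3 + 1 is again just an assumed choice): 41 additions and 41 multiplications per point, times 4096 points.

```c
#include <stdint.h>

typedef uint16_t gf_t;               /* element of F_{2^12} in the low 12 bits */
#define T 41

static gf_t gf_add(gf_t a, gf_t b) { return a ^ b; }

/* branch-free schoolbook multiplication modulo x^12 + x^3 + 1 (assumed modulus) */
static gf_t gf_mul(gf_t a, gf_t b)
{
    uint32_t acc = 0;
    for (int i = 0; i < 12; i++)
        acc ^= ((uint32_t)a << i) & (0u - (uint32_t)((b >> i) & 1));
    for (int k = 22; k >= 12; k--) {
        uint32_t bit = (acc >> k) & 1;
        acc ^= (bit << (k - 9)) | (bit << (k - 12));
    }
    return (gf_t)(acc & 0xFFF);
}

/* f(alpha) for f = c[T] x^T + ... + c[0]: T adds and T mults */
static gf_t horner_eval(const gf_t c[T + 1], gf_t alpha)
{
    gf_t acc = c[T];
    for (int i = T - 1; i >= 0; i--)
        acc = gf_add(gf_mul(acc, alpha), c[i]);
    return acc;
}

/* err[alpha] = 1 exactly when f(alpha) = 0: 4096 * (41 adds + 41 mults) */
static void find_roots_horner(const gf_t c[T + 1], uint8_t err[4096])
{
    for (unsigned alpha = 0; alpha < 4096; alpha++)
        err[alpha] = (horner_eval(c, (gf_t)alpha) == 0);
}
```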
SLIDE 45
Asymptotics: normally t ∈ Θ(n / lg n), so Horner's rule costs Θ(nt) = Θ(n^2 / lg n).
Wait a minute. Didn't we learn in school that the FFT evaluates an n-coeff polynomial at n points using n^(1+o(1)) operations? Isn't this better than n^2 / lg n?
SLIDE 49
Standard radix-2 FFT:
Want to evaluate f = c_0 + c_1 x + ... + c_{n-1} x^(n-1) at all the nth roots of 1.
Write f as f0(x^2) + x f1(x^2). Observe big overlap between
f(α) = f0(α^2) + α f1(α^2) and
f(−α) = f0(α^2) − α f1(α^2).
f0 has n/2 coeffs; evaluate at the (n/2)nd roots of 1 by the same idea recursively. Similarly f1.
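For comparison with the additive case that follows, here is a textbook version of this radix-2 split over the complex numbers (illustration only; this is exactly the step that breaks in characteristic 2, and none of these names come from the McBits code):

```c
#include <complex.h>
#include <math.h>
#include <stddef.h>

/* Evaluate c[0..n-1] (n a power of 2) at all nth roots of 1, in place. */
static void fft(double complex *c, size_t n)
{
    const double pi = 3.14159265358979323846;
    if (n == 1) return;

    double complex f0[n / 2], f1[n / 2];        /* even / odd coefficient halves */
    for (size_t i = 0; i < n / 2; i++) {
        f0[i] = c[2 * i];                       /* f = f0(x^2) + x f1(x^2) */
        f1[i] = c[2 * i + 1];
    }
    fft(f0, n / 2);                             /* half-size subproblems */
    fft(f1, n / 2);

    for (size_t k = 0; k < n / 2; k++) {
        double complex w = cexp(2.0 * I * pi * (double)k / (double)n);
        c[k]         = f0[k] + w * f1[k];       /* f(w)                            */
        c[k + n / 2] = f0[k] - w * f1[k];       /* f(-w): almost free, the overlap */
    }
}
```

The recurrence C(n) = 2 C(n/2) + O(n) gives the n^(1+o(1)) operation count mentioned above; the additive FFT below reaches the same shape with α ↦ α^2 + α playing the role of squaring.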
SLIDE 53
Useless in char 2: α = −α. Standard workarounds are painful. FFT considered impractical.
1988 Wang–Zhu, independently 1989 Cantor: "additive FFT" in char 2. Still quite expensive.
1996 von zur Gathen–Gerhard: some improvements.
2010 Gao–Mateer: much better additive FFT.
We use Gao–Mateer, plus some new improvements.
SLIDE 57
Gao and Mateer evaluate f = c_0 + c_1 x + ... + c_{n-1} x^(n-1) on a size-n F_2-linear space.
Main idea: Write f as f0(x^2 + x) + x f1(x^2 + x). Big overlap between
f(α) = f0(α^2 + α) + α f1(α^2 + α) and
f(α + 1) = f0(α^2 + α) + (α + 1) f1(α^2 + α).
"Twist" to ensure 1 ∈ space. Then {α^2 + α} is a size-(n/2) F_2-linear space. Apply same idea recursively.
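The overlap on this slide is a single two-point butterfly. A sketch of just that step (helper names assumed, not the full twisted recursion from the paper): given f0 and f1 already evaluated at β = α^2 + α, the second output costs only one extra addition.

```c
#include <stdint.h>

typedef uint16_t gf_t;
gf_t gf_mul(gf_t a, gf_t b);         /* F_{2^12} multiplication, assumed given */

/* With beta = alpha^2 + alpha:
     f(alpha)     = f0(beta) + alpha * f1(beta)
     f(alpha + 1) = f(alpha) + f1(beta)            */
static void gm_butterfly(gf_t f0_beta, gf_t f1_beta, gf_t alpha,
                         gf_t *f_alpha, gf_t *f_alpha1)
{
    gf_t v = f0_beta ^ gf_mul(alpha, f1_beta);   /* 1 mult + 1 add */
    *f_alpha  = v;
    *f_alpha1 = v ^ f1_beta;                     /* 1 extra add */
}
```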
SLIDE 61
We generalize to f = c_0 + c_1 x + ... + c_t x^t for any t < n.
⇒ several optimizations, not all of which are automated by simply tracking zeros.
For t = 0: copy c_0.
For t ∈ {1, 2}: f1 is a constant. Instead of multiplying this constant by each α, multiply only by generators and compute subset sums.
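The subset-sum trick for a constant f1 can be sketched as follows (helper names assumed, not the McBits code): every point α in the F_2-span of generators b_0, ..., b_{m-1} is a subset sum of the generators, so scaling all 2^m points by a constant c needs only m multiplications plus one XOR per point.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint16_t gf_t;
gf_t gf_mul(gf_t a, gf_t b);                 /* F_{2^12} multiplication, assumed given */

/* out[s] = c * (sum of b[i] over the bits i set in s), for all s < 2^m */
static void scale_all_points(gf_t *out, gf_t c, const gf_t *b, unsigned m)
{
    out[0] = 0;
    for (unsigned i = 0; i < m; i++) {
        gf_t cbi = gf_mul(c, b[i]);          /* the only multiplications: m of them */
        size_t half = (size_t)1 << i;
        for (size_t s = 0; s < half; s++)
            out[half + s] = out[s] ^ cbi;    /* doubling step: one XOR per new point */
    }
}
```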
SLIDE 65
Syndrome computation
Initial decoding step: compute
s_0 = r_1 + r_2 + ... + r_n,
s_1 = r_1 α_1 + r_2 α_2 + ... + r_n α_n,
s_2 = r_1 α_1^2 + r_2 α_2^2 + ... + r_n α_n^2,
...,
s_t = r_1 α_1^t + r_2 α_2^t + ... + r_n α_n^t.
r_1, r_2, ..., r_n are received bits scaled by Goppa constants.
Typically precompute matrix mapping bits to syndrome. Not as slow as Chien search but still n^(2+o(1)) and a huge secret key.
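Written out naively (a sketch with assumed helper names, not the McBits code), the syndrome is a (t+1) × n matrix applied to the scaled received bits:

```c
#include <stdint.h>
#include <stddef.h>

typedef uint16_t gf_t;
gf_t gf_mul(gf_t a, gf_t b);            /* F_{2^12} multiplication, assumed given */

/* s[j] = sum over i of r[i] * alpha[i]^j, for j = 0..t */
static void syndrome_naive(gf_t *s, size_t t,
                           const gf_t *r, const gf_t *alpha, size_t n)
{
    for (size_t j = 0; j <= t; j++) s[j] = 0;

    for (size_t i = 0; i < n; i++) {
        gf_t power = 1;                 /* alpha[i]^0 */
        for (size_t j = 0; j <= t; j++) {
            s[j] ^= gf_mul(r[i], power);
            power = gf_mul(power, alpha[i]);
        }
    }
}
```

The next slides exploit the structure of this matrix instead of storing it.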
SLIDE 69
Compare to multipoint evaluation:
f(α_1) = c_0 + c_1 α_1 + ... + c_t α_1^t,
f(α_2) = c_0 + c_1 α_2 + ... + c_t α_2^t,
...,
f(α_n) = c_0 + c_1 α_n + ... + c_t α_n^t.
SLIDE 71
Matrix for syndrome computation is the transpose of the matrix for multipoint evaluation.
Amazing consequence: syndrome computation is as few ops as multipoint evaluation.
Eliminate the precomputed matrix.
SLIDE 75
Transposition principle: If a linear algorithm computes a matrix M, then reversing edges and exchanging inputs/outputs computes the transpose of M.
1956 Bordewijk; independently 1957 Lupanov for Boolean matrices.
1973 Fiduccia analysis: preserves number of mults; preserves number of adds plus number of nontrivial outputs.
SLIDE 82
We built transposing compiler producing C code. Too many variables for m = 13; gcc ran out of memory.
Used qhasm register allocator to optimize the variables. Worked, but not very quickly.
Wrote faster register allocator. Still excessive code size.
Built new interpreter, allowing some code compression. Still big; still some overhead.
SLIDE 86
Better solution: stared at additive FFT, wrote down transposition with same loops etc. Small code, no overhead.
Speedups of additive FFT translate easily to transposed algorithm.
Further savings: merged first stage with scaling by Goppa constants.
SLIDE 90
Secret permutation
Additive FFT ⇒ f values at field elements in a standard order. This is not the order needed in code-based crypto!
Must apply a secret permutation, part of the secret key. Same issue for syndrome.
Solution: Batcher sorting. Almost done with faster solution: Beneš network.
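The building block behind the Batcher-sorting solution can be sketched as a branch-free compare-exchange (illustrative names, not the McBits code): sorting (secret target position, value) pairs with a fixed sorting network applies the secret permutation while every load, store, and branch condition stays public.

```c
#include <stdint.h>

typedef struct { uint32_t key; uint32_t val; } item;   /* key = secret target position */

/* Swap *a and *b exactly when a->key > b->key, with no secret-dependent
   branch and no secret-dependent address. */
static void ct_compare_exchange(item *a, item *b)
{
    uint64_t diff = (uint64_t)b->key - (uint64_t)a->key;
    uint32_t mask = (uint32_t)0 - (uint32_t)(diff >> 63); /* all-ones iff a->key > b->key */

    uint32_t dk = mask & (a->key ^ b->key);
    uint32_t dv = mask & (a->val ^ b->val);
    a->key ^= dk;  b->key ^= dk;                          /* masked XOR swap */
    a->val ^= dv;  b->val ^= dv;
}
```

Plugging this into Batcher's odd-even mergesort on n = 4096 items gives a data-independent access pattern; a Beneš network instead applies the permutation as a fixed sequence of conditional swaps whose control bits are precomputed from the secret key.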
SLIDE 94
Results
60493 Ivy Bridge cycles:
8622 for permutation.
20846 for syndrome.
7714 for BM (Berlekamp–Massey).
14794 for roots.
8520 for permutation.
Code will be public domain. We're still speeding it up.
More information: cr.yp.to/papers.html#mcbits