Fast Exhaustive Search for Quadratic Systems in $\mathbb{F}_2$ on FPGAs
Charles Bouillaguet, Chen-Mou Cheng, Tung Chou, Ruben Niederhagen, Bo-Yin Yang
August 15, 2013
Introduction
The MP Problem
Solving a system of $m$ multivariate polynomial equations in $n$ variables over $\mathbb{F}_q$ is called the MP problem. The MP problem is NP-hard even for multivariate quadratic systems and $q = 2$.
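For the quadratic case over $\mathbb{F}_2$, the problem can be written out explicitly. The coefficient names $a$, $b$, $c$ below are notation introduced here for illustration and do not appear in the slides:

```latex
f_k(x_0, \dots, x_{n-1})
  = \sum_{0 \le i < j < n} a^{(k)}_{ij}\, x_i x_j
  + \sum_{0 \le i < n} b^{(k)}_i\, x_i
  + c^{(k)},
  \qquad k = 0, \dots, m-1,
\qquad
\text{find } x \in \mathbb{F}_2^n \text{ such that } f_k(x) = 0 \text{ for all } k.
```

Square terms need no separate coefficients because $x_i^2 = x_i$ over $\mathbb{F}_2$, so they fold into the linear part.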
Introduction
Multivariate Public-Key Cryptography:
e.g. HFE, SFLASH, and QUARTZ
Provably-Secure Stream Ciphers:
e.g. QUAD
Algebraic Cryptanalysis:
Obtain a system of multivariate polynomial equations with the secret among the variables.
- Naturally breaks the above,
- does not break AES as first advertised,
- but does break, e.g., KeeLoq.
Complexity?
Introduction
Most Efficient Algorithm for $\mathbb{F}_2$:
Brute-force search, testing all $2^n$ possible inputs.
Previous Work:
On GPUs we can solve a quadratic system of 48+ equations in 48 variables in 21 min.
Research Question:
How would specifically designed hardware perform on this task? We approach the answer by solving multivariate quadratic systems on reconfigurable hardware (FPGAs).
Gray-Code Approach
Full-Evaluation Approach
- Evaluate the whole equation for each possible input.
- Time Complexity: $O(2^n n^2)$
- Memory Complexity: $O(n)$
Gray-Code Approach
- Only re-compute those parts of the equation that have changed.
- Enumerate input vector in Gray-code order.
- Update solution using the derivatives of the involved variables.
- Time Complexity: $O(2^n m)$
- Memory Complexity: $O(n^2 m)$
Trade computation for memory.
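As a point of reference for the comparison, the full-evaluation approach can be sketched in a few lines of Python. The representation of an equation as a tuple `(quad, lin, const)` and the function names are assumptions made for this sketch; the polynomial is the example used elsewhere in these slides. The helper `gray` shows why Gray-code order makes incremental updates possible: consecutive inputs differ in exactly one bit.

```python
def evaluate(eq, x):
    """Evaluate one quadratic polynomial over F_2 at the bit vector x (an int)."""
    quad, lin, const = eq                 # hypothetical layout, for illustration
    y = const
    for a, b in quad:                     # quadratic terms x_a * x_b
        y ^= (x >> a) & (x >> b) & 1
    for a in lin:                         # linear terms x_a
        y ^= (x >> a) & 1
    return y


def full_evaluation_search(n, system):
    """Test all 2^n inputs, re-evaluating every equation from scratch."""
    return [x for x in range(1 << n)
            if all(evaluate(eq, x) == 0 for eq in system)]


def gray(i):
    """i-th word of the binary-reflected Gray code."""
    return i ^ (i >> 1)


# f = x4*x2 + x3*x0 + x2*x1 + x3 + x1 + x0 + 1
f = ([(4, 2), (3, 0), (2, 1)], [3, 1, 0], 1)
solutions = full_evaluation_search(5, [f])
print([format(x, "05b") for x in solutions])
```

The Gray-code approach keeps this enumeration but replaces the inner call to `evaluate` with a constant number of XORs per equation and step.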
Gray-Code Approach
$k = 01010_b$: $x_4 = 0,\ x_3 = 1,\ x_2 = 0,\ x_1 = 1,\ x_0 = 0$
$f = x_4x_2 + x_3x_0 + x_2x_1 + x_3 + x_1 + x_0 + 1$
$f = 0 \cdot 0 + 1 \cdot 0 + 0 \cdot 1 + 1 + 1 + 0 + 1 = 1$

$k = 01011_b$: $x_4 = 0,\ x_3 = 1,\ x_2 = 0,\ x_1 = 1,\ x_0 = 1$
$f = x_4x_2 + x_3x_0 + x_2x_1 + x_3 + x_1 + x_0 + 1$
$f = 0 \cdot 0 + 1 \cdot 1 + 0 \cdot 1 + 1 + 1 + 1 + 1 = 1$

$k = 01001_b$ in Gray-code order:
$f = x_4x_2 + x_3x_0 + x_2x_1 + x_3 + x_1 + x_0 + 1$
$f = 0 \cdot 0 + 1 \cdot 1 + 0 \cdot 0 + 1 + 0 + 1 + 1 = 0$
$f = f(01011_b) - 0 \cdot 1 - 1 + 0 \cdot 0 + 0$
$f = f(01011_b) + \frac{\partial f}{\partial x_1}(01001_b)$
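The last step of the example can be checked mechanically. The sketch below hard-codes the example polynomial and its derivative with respect to $x_1$ (the terms containing $x_1$ are $x_2x_1$ and $x_1$, so the derivative is $x_2 + 1$); the function names are chosen here for illustration.

```python
def bit(x, k):
    return (x >> k) & 1


def f(x):
    # f = x4*x2 + x3*x0 + x2*x1 + x3 + x1 + x0 + 1 over F_2
    x0, x1, x2, x3, x4 = (bit(x, k) for k in range(5))
    return (x4 & x2) ^ (x3 & x0) ^ (x2 & x1) ^ x3 ^ x1 ^ x0 ^ 1


def df_dx1(x):
    # Derivative with respect to x1: x2 + 1 (independent of x1 itself).
    return bit(x, 2) ^ 1


old, new = 0b01011, 0b01001          # Gray-code neighbours: only x1 flips
print(f(0b01010), f(old), f(new))    # 1 1 0
print(f(old) ^ df_dx1(new))          # 0, equal to f(new)
```

Over $\mathbb{F}_2$ addition and subtraction coincide, which is why the update is a single XOR.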
Xilinx Spartan6 FPGA
Lookup Table (LUT) – LUT-6
Can be seen as
- logic: compute any logical expression in 6 variables,
- ROM: store 64 bits, addressed by 6 address ports.
Can be used as two LUT-5 with identical input wires and independent output wires.
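Both views of a LUT-6 can be modelled as a lookup into a 64-bit truth table. This is a software sketch only: the function names are invented here, and the bit layout of the dual LUT-5 mode (lower 32 bits for one output, upper 32 bits for the other) is an assumption for illustration.

```python
def lut6(init, inputs):
    """LUT-6 as ROM: `init` is a 64-bit truth table, `inputs` six bits (I0 first)."""
    addr = sum(b << k for k, b in enumerate(inputs))
    return (init >> addr) & 1


def lut6_as_two_lut5(init, inputs5):
    """Two LUT-5 sharing five input wires, with independent outputs.

    Assumed layout: lower half of `init` drives the first output,
    upper half drives the second.
    """
    addr = sum(b << k for k, b in enumerate(inputs5))
    return (init >> addr) & 1, (init >> (32 + addr)) & 1


# LUT-6 as logic: the 6-input parity function as a truth table.
parity6 = sum((bin(a).count("1") & 1) << a for a in range(64))
print(lut6(parity6, [1, 0, 1, 1, 0, 0]))   # 1 (three inputs set)

# Dual LUT-5: AND of five inputs in the lower half, OR in the upper half.
and5 = 1 << 31
or5 = (1 << 32) - 2
print(lut6_as_two_lut5(and5 | (or5 << 32), [1, 0, 0, 0, 0]))   # (0, 1)
```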
Resources
- 50% SLICEX
  - 4 LUT-6
  - 8 Flip-Flops
- 25% SLICEL
  - + wide multiplexers
  - + carry logic for large adders
- 25% SLICEM
  - + LUT can be used as shift registers
  - + LUT can be used as RAM sharing the same write address
- Block RAM, DSPs, IO, ...
Gray-Code Algorithm
function EVAL(s)
    while s.i < 2^n do
        s.i ← s.i + 1;
        k1 ← BIT1(s.i);
        k2 ← BIT2(s.i);
        if k2 valid then
            s.d'[k1] ← s.d'[k1] ⊕ s.d''[k1, k2];
        end if
        s.y ← s.y ⊕ s.d'[k1];
        if s.y = 0 then
            return shr(s.i, 1) ⊕ s.i;
        end if
    end while
end function
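A runnable sketch of this loop in Python follows. Several details are assumptions made here rather than taken from the slide: BIT1 and BIT2 are read as the positions of the lowest and second-lowest set bit of the counter; the state is initialised at the all-zero input, with d'[k] set to the derivative at the point where bit k flips for the first time; the m equations are packed into the bits of Python integers so that one XOR updates all of them; and the function collects every solution instead of returning at the first one.

```python
def gray_code_search(n, quad, lin, const):
    """Find all common zeros of m quadratic polynomials over F_2.

    quad[(a, b)] with a < b, lin[a] and const are m-bit masks: bit e holds
    the coefficient of x_a*x_b, x_a and 1 in equation e (hypothetical layout).
    """
    # d'[k]: derivative w.r.t. x_k at the input where bit k is first flipped,
    # which is the unit vector e_(k-1) for k > 0 and the zero vector for k = 0.
    d1 = [lin[k] ^ quad.get((k - 1, k), 0) for k in range(n)]
    y = const                                   # all equations at x = 0
    solutions = [0] if y == 0 else []
    for i in range(1, 1 << n):
        k1 = (i & -i).bit_length() - 1          # BIT1: lowest set bit of i
        rest = i & (i - 1)                      # i without its lowest set bit
        if rest:                                # "k2 valid"
            k2 = (rest & -rest).bit_length() - 1    # BIT2: second-lowest set bit
            d1[k1] ^= quad.get((k1, k2), 0)     # d'' is constant for quadratics
        y ^= d1[k1]
        if y == 0:
            solutions.append(i ^ (i >> 1))      # Gray code of i is the input x
    return solutions


# f = x4*x2 + x3*x0 + x2*x1 + x3 + x1 + x0 + 1, a system with m = 1
quad = {(2, 4): 1, (0, 3): 1, (1, 2): 1}
lin = [1, 1, 0, 1, 0]
print([format(x, "05b") for x in sorted(gray_code_search(5, quad, lin, 1))])
```

Each step costs one or two XORs regardless of n, which reflects the $O(2^n m)$ time bound; the table of second derivatives accounts for the $O(n^2 m)$ memory.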
Parallelization
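Fix $i$ Variables for $2^i$ Parallel Instances:
$f = x_4x_2 + x_3x_0 + x_2x_1 + x_3 + x_1 + x_0 + 1$
e.g. $i = 2$:
$f_{00_b} = 0 \cdot x_2 + 0 \cdot x_0 + x_2x_1 + x_1 + x_0 + 1$
$f_{01_b} = 0 \cdot x_2 + 1 \cdot x_0 + x_2x_1 + 1 + x_1 + x_0 + 1$
$f_{10_b} = 1 \cdot x_2 + 0 \cdot x_0 + x_2x_1 + x_1 + x_0 + 1$
$f_{11_b} = 1 \cdot x_2 + 1 \cdot x_0 + x_2x_1 + 1 + x_1 + x_0 + 1$
$2^i$ independent equations (systems) sharing the same quadratic terms!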
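The effect of fixing the top variables can be sketched symbolically. The representation below (sets of index pairs and indices) and the function name are assumptions for illustration. A quadratic term with one fixed variable degrades to a linear term, and a term with only fixed variables degrades to a constant, so only the linear and constant parts differ between instances.

```python
def fix_top_variables(n, quad, lin, const, i):
    """Substitute all 2^i values for x_(n-1) .. x_(n-i).

    quad: set of pairs (a, b) with a > b; lin: set of indices; const: 0 or 1.
    Returns {assignment: (quad', lin', const')} over the remaining variables.
    """
    low = n - i
    out = {}
    for v in range(1 << i):
        val = {low + k: (v >> k) & 1 for k in range(i)}
        q, l, c = set(), set(), const
        for a, b in quad:
            if b >= low:                  # both variables fixed: constant
                c ^= val[a] & val[b]
            elif a >= low:                # one fixed: linear term in x_b
                if val[a]:
                    l ^= {b}
            else:                         # both free: unchanged quadratic term
                q.add((a, b))
        for a in lin:
            if a >= low:
                c ^= val[a]
            else:
                l ^= {a}
        out[v] = (q, l, c)
    return out


# f = x4*x2 + x3*x0 + x2*x1 + x3 + x1 + x0 + 1, fixing x4 and x3
instances = fix_top_variables(5, {(4, 2), (3, 0), (2, 1)}, {3, 1, 0}, 1, 2)
for v, (q, l, c) in sorted(instances.items()):
    print(format(v, "02b"), sorted(q), sorted(l), c)
```

In all four instances the quadratic part is the single term $x_2x_1$, which is what allows the second-derivative tables to be shared.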
Instance
[Figure: circuit of a single instance (inst) with signals y, d', d'', k1, r, and sol. Four instances $\text{inst}_{j,k}$, $\text{inst}_{j,k+1}$, $\text{inst}_{j,k+2}$, $\text{inst}_{j,k+3}$ share one $\text{buffer}_{j,k \dots k+3}$. Legend: flip-flop, LUT-6, RAM.]
new_d' = d' ⊕ d''; new_y = d' ⊕ d'' ⊕ y
Program a LUT-6 directly as two LUT-5.
Explicitly place a group of 4 instances within 4 nearby slices.
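To make "program a LUT-6 directly as two LUT-5" concrete, the two update equations can be turned into one 64-bit truth table. The pin assignment (y, d', d'' on the three lowest inputs, the other two unused) and the half-and-half bit layout are assumptions made for this sketch; the actual design may wire further signals into the same LUT.

```python
def build_instance_init():
    """64-bit LUT contents holding both update functions of one instance.

    Assumed layout: lower 32 bits give new_d', upper 32 bits give new_y,
    both addressed by the same five inputs (bit 0 = y, bit 1 = d', bit 2 = d'').
    """
    init = 0
    for addr in range(32):
        y, d1, d2 = addr & 1, (addr >> 1) & 1, (addr >> 2) & 1
        new_d1 = d1 ^ d2            # new_d' = d' xor d''
        new_y = d1 ^ d2 ^ y         # new_y  = d' xor d'' xor y
        init |= new_d1 << addr
        init |= new_y << (32 + addr)
    return init


def step(init, y, d1, d2):
    """Look up (new_d', new_y) for one clock cycle."""
    addr = y | (d1 << 1) | (d2 << 2)
    return (init >> addr) & 1, (init >> (32 + addr)) & 1


INIT = build_instance_init()
print(hex(INIT))                  # 0x969696963c3c3c3c
print(step(INIT, y=1, d1=1, d2=0))   # (1, 0)
```

Because the table is plain data, a generator script can emit it per instance, which fits the approach of exchanging LUT contents without resynthesizing.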
Overall Architecture
[Figure: block diagram of the overall architecture. Labels recoverable from the diagram: a counter (ctr) with gray_code and gray_tree units producing $k_1$ and $k_2$; the table address $\text{addr} = k_2(k_2 - 1)/2 + k_1$; one table $d''$ per equation $\text{eq}_0, \text{eq}_1, \dots, \text{eq}_{m_g - 1}$; for each equation $e$ the instance groups $\text{inst}_{e,0 \dots 3}, \text{inst}_{e,4 \dots 7}, \dots, \text{inst}_{e,\dots 2^i - 1}$ with solution signals $\text{sol}_{e,0 \dots 3}, \dots$ and group ids; two pillars (pillar0, pillar1); a fifo and bus; a second stage with counter2 and merge for the remaining equations $\text{eq}_{m_g}, \text{eq}_{m_g + 1}, \dots, \text{eq}_{m-1}$; solver outputs x, sol, and warn.]
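The address formula in the diagram reads as a triangular index over pairs $k_1 < k_2$, assuming a `+` was lost between the two terms in extraction. A short sketch (function name chosen here) shows that it packs all $n(n-1)/2$ second-derivative entries into consecutive addresses without gaps:

```python
def d2_address(k1, k2):
    """Triangular index of the pair (k1, k2) with 0 <= k1 < k2."""
    assert 0 <= k1 < k2
    return k2 * (k2 - 1) // 2 + k1


n = 5
addresses = [d2_address(k1, k2) for k2 in range(n) for k1 in range(k2)]
print(addresses)   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The ordering $k_1 < k_2$ always holds in the Gray-code loop, since $k_1$ is the lowest and $k_2$ the second-lowest set bit of the counter.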
Synthesis Tools
Xilinx Tools tend to fail for those parts of the design with the highest need for efficiency!
Development Strategy
We require some kind of low-level, assembly-like HDL:
- Use Python scripts for code generation.
- Assign most of the logic explicitly to LUT-6.
- Fully pipeline the design.
- Explicitly place components to achieve 200 MHz.
- Exchange LUT data without resynthesizing.
Missing in our tool chain:
- Explicit routing.
- Totally avoid Verilog/VHDL; program low levels directly.
Results
Parameters:
- $n = 54$
- $m = 54$
- $2^{10}$ instances, two pillars
- $m_g = 12$
Performance:
- 200 MHz
- 8.8 W (GPU: 235 W)
- runtime: $2^{54-10} / 200\,\text{MHz} = 24.43$ h
- runtime for $n = 48$: $2^{48-10} / 200\,\text{MHz} = 22.91$ min (GPU: 21 min)
- total energy for $n = 48$: 3.4 Wh (GPU: 82.3 Wh)
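These figures follow from simple arithmetic, assuming each of the $2^{10}$ instances tests one input per clock cycle and that power draw is constant over the run. The function and variable names below are chosen for this sketch:

```python
def runtime_seconds(n, log2_instances=10, clock_hz=200e6):
    """Time to enumerate 2^n inputs with 2^log2_instances parallel instances."""
    return 2 ** (n - log2_instances) / clock_hz


hours_54 = runtime_seconds(54) / 3600
minutes_48 = runtime_seconds(48) / 60
fpga_wh = 8.8 * runtime_seconds(48) / 3600     # 8.8 W for the whole run
gpu_wh = 235 * 21 / 60                         # 235 W for 21 min

print(f"n = 54: {hours_54:.2f} h")             # 24.43 h
print(f"n = 48: {minutes_48:.2f} min")         # 22.91 min
print(f"energy: FPGA {fpga_wh:.2f} Wh, GPU {gpu_wh:.2f} Wh")
```

At nearly the same wall-clock time for $n = 48$, the energy difference comes almost entirely from the power ratio of about 27 to 1.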
Large FPGA Clusters:
Same routing and placement for any equation system by simply exchanging LUT data.
80-bit Security:
Solving a system of 80 variables requires 1042 days on 65,536 Spartan-6 FPGAs at a total cost of about US$40 million.
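The 1042-day figure can be reproduced under the assumption that every FPGA in the cluster runs $2^{10}$ instances at 200 MHz, as in the single-device results, and that the search space splits evenly. The per-device cost is derived here from the stated total and is not given on the slide.

```python
fpgas = 65_536                    # 2^16 devices
instances_per_fpga = 2 ** 10      # assumed, as in the single-FPGA design
clock_hz = 200e6

tests_per_second = fpgas * instances_per_fpga * clock_hz
days = 2 ** 80 / tests_per_second / 86_400
cost_per_fpga = 40e6 / fpgas

print(f"{days:.1f} days")                     # 1042.5 days
print(f"about US${cost_per_fpga:.0f} per FPGA")
```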