SLIDE 1

Fast Exhaustive Search for Quadratic Systems in $\mathbb{F}_2$ on FPGAs

Charles Bouillaguet, Chen-Mou Cheng, Tung Chou, Ruben Niederhagen, Bo-Yin Yang

August 15, 2013

SLIDE 2

Introduction

The MP Problem

Solving a system of $m$ multivariate polynomial equations in $n$ variables over $\mathbb{F}_q$ is called the MP problem. The MP problem is NP-hard even for multivariate quadratic systems with $q = 2$.
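For the quadratic case over $\mathbb{F}_2$ treated in the rest of the talk, the system can be written out as below. The coefficient names $a^{(k)}_{ij}$, $b^{(k)}_i$ and $c^{(k)}$ are notation assumed here for illustration; they do not appear on the slides.

```latex
\[
  f_k(x_0, \dots, x_{n-1})
    = \sum_{0 \le i < j < n} a^{(k)}_{ij}\, x_i x_j
    + \sum_{0 \le i < n} b^{(k)}_i\, x_i
    + c^{(k)} = 0,
  \qquad k = 0, \dots, m-1,
\]
\[
  a^{(k)}_{ij},\; b^{(k)}_i,\; c^{(k)} \in \mathbb{F}_2 .
\]
```

Since $x_i^2 = x_i$ for every $x_i \in \mathbb{F}_2$, square terms can be folded into the linear part, which is why only products with $i < j$ are listed.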

Introduction August 15, 2013 1 / 13

SLIDE 4

Introduction

Multivariate Public-Key Cryptography:

e.g. HFE, SFLASH, and QUARTZ

Provably-Secure Stream Ciphers:

e.g. QUAD

Algebraic Cryptanalysis:

Obtain a system of multivariate polynomial equations with the secret among the variables.

  • Naturally breaks the above,
  • does not break AES as first advertised,
  • but does break, e.g., KeeLoq.

Complexity?

SLIDE 6

Introduction

Most Efficient Algorithm for $\mathbb{F}_2$:

Brute-force search, testing all $2^n$ possible inputs.

Previous Work:

On GPUs we can solve a quadratic system of 48+ equations in 48 variables in 21 min.

Research Question:

How would specifically designed hardware perform on this task? We approach the answer by solving multivariate quadratic systems on reconfigurable hardware (FPGAs).

SLIDE 8

Gray-Code Approach

Full-Evaluation Approach

  • Evaluate the whole equation for each possible input.
  • Time Complexity: $O(2^n n^2)$
  • Memory Complexity: $O(n)$

Gray-Code Approach

  • Only re-compute those parts of the equation that have changed.
  • Enumerate input vector in Gray-code order.
  • Update solution using the derivatives of the involved variables.
  • Time Complexity: $O(2^n m)$
  • Memory Complexity: $O(n^2 m)$

Trade computation for memory.
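As a rough illustration of the full-evaluation approach, the sketch below tests every input of a small system by evaluating each polynomial from scratch. The representation (a polynomial as a list of monomials, a monomial as a tuple of variable indices) and the names `evaluate` and `full_search` are assumptions made for this example, not taken from the slides.

```python
from itertools import product

# A polynomial over F_2 is a list of monomials; a monomial is a tuple of
# variable indices, and the empty tuple () stands for the constant 1.
# (This representation is an assumption made for illustration.)


def evaluate(poly, x):
    """Evaluate one polynomial at the bit vector x (x[i] is the value of x_i)."""
    value = 0
    for monomial in poly:
        term = 1
        for var in monomial:
            term &= x[var]      # multiplication in F_2 is AND
        value ^= term           # addition in F_2 is XOR
    return value


def full_search(system, n):
    """Return every x in {0,1}^n on which all polynomials evaluate to 0."""
    solutions = []
    for x in product((0, 1), repeat=n):
        if all(evaluate(poly, x) == 0 for poly in system):
            solutions.append(x)
    return solutions


# f = x4x2 + x3x0 + x2x1 + x3 + x1 + x0 + 1, the polynomial used in the
# example slides of this talk.
f = [(4, 2), (3, 0), (2, 1), (3,), (1,), (0,), ()]
```

Every candidate costs a pass over all monomials, which is the per-input work that the Gray-code approach avoids by keeping derivative tables in memory.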

SLIDE 11

Gray-Code Approach

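$k = 01010_b$; $x_4 = 0, x_3 = 1, x_2 = 0, x_1 = 1, x_0 = 0$

$f = x_4x_2 + x_3x_0 + x_2x_1 + x_3 + x_1 + x_0 + 1$
$f = 0 \cdot 0 + 1 \cdot 0 + 0 \cdot 1 + 1 + 1 + 0 + 1$

$k = 01011_b$; $x_4 = 0, x_3 = 1, x_2 = 0, x_1 = 1, x_0 = 1$

$f = x_4x_2 + x_3x_0 + x_2x_1 + x_3 + x_1 + x_0 + 1$
$f = 0 \cdot 0 + 1 \cdot 1 + 0 \cdot 1 + 1 + 1 + 1 + 1$

$k = 01100_b$; $x_4 = 0, x_3 = 1, x_2 = 1, x_1 = 0, x_0 = 0$

$f = x_4x_2 + x_3x_0 + x_2x_1 + x_3 + x_1 + x_0 + 1$
$f = 0 \cdot 1 + 1 \cdot 0 + 1 \cdot 0 + 1 + 0 + 0 + 1$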

SLIDE 14

Gray-Code Approach

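$k = 01010_b$; $x_4 = 0, x_3 = 1, x_2 = 0, x_1 = 1, x_0 = 0$

$f = x_4x_2 + x_3x_0 + x_2x_1 + x_3 + x_1 + x_0 + 1$
$f = 0 \cdot 0 + 1 \cdot 0 + 0 \cdot 1 + 1 + 1 + 0 + 1$

$k = 01011_b$; $x_4 = 0, x_3 = 1, x_2 = 0, x_1 = 1, x_0 = 1$

$f = x_4x_2 + x_3x_0 + x_2x_1 + x_3 + x_1 + x_0 + 1$
$f = 0 \cdot 0 + 1 \cdot 1 + 0 \cdot 1 + 1 + 1 + 1 + 1$

$k = 01001_b$ in Gray-code order

$f = x_4x_2 + x_3x_0 + x_2x_1 + x_3 + x_1 + x_0 + 1$
$f = 0 \cdot 0 + 1 \cdot 1 + 0 \cdot 0 + 1 + 0 + 1 + 1$

$f = f(01011_b) - 0 \cdot 1 - 1 + 0 \cdot 0 + 0$

$f = f(01011_b) + \frac{\partial f}{\partial x_1}(01001_b)$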
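The last step can be checked by hand. The derivation below is a worked sketch for the example polynomial $f = x_4x_2 + x_3x_0 + x_2x_1 + x_3 + x_1 + x_0 + 1$; it uses the fact that over $\mathbb{F}_2$ the derivative with respect to $x_1$ is the sum of the two restrictions $x_1 = 1$ and $x_1 = 0$, and that subtraction equals addition.

```latex
\begin{align*}
  \frac{\partial f}{\partial x_1}
    &= f\big|_{x_1 = 1} + f\big|_{x_1 = 0} = x_2 + 1, \\
  \frac{\partial f}{\partial x_1}(01001_b)
    &= 0 + 1 = 1 \qquad (x_2 = 0), \\
  f(01011_b)
    &= 0 + 1 + 0 + 1 + 1 + 1 + 1 = 1, \\
  f(01001_b)
    &= f(01011_b) + \frac{\partial f}{\partial x_1}(01001_b) = 1 + 1 = 0 .
\end{align*}
```

Direct evaluation gives the same value, $0 + 1 + 0 + 1 + 0 + 1 + 1 = 0$, so $01001_b$ is a zero of $f$. Because the derivative of a quadratic polynomial is linear, updating it between steps is cheap as well.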

SLIDE 16

Xilinx Spartan6 FPGA

Lookup Table (LUT) – LUT-6

Can be seen as

  • logic: compute any logical expression in 6 variables,
  • ROM: store 64 bits, addressed by 6 address ports.

Can be used as two LUT-5 with identical input wires and independent output wires.
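Both views of a LUT-6 can be modelled in a few lines. The sketch below is a simplified software model, not a description of the actual Spartan-6 circuitry: it treats the LUT as a 64-bit integer, and it assumes that the two LUT-5 functions occupy the lower and upper 32 bits. All function names are made up for this example, and the two XOR functions are chosen only as sample contents.

```python
def make_lut6(func):
    """Build the 64-bit table of a 6-input Boolean function.

    Bit number `addr` of the table holds func(addr), where addr packs the
    six inputs into a 6-bit integer.
    """
    table = 0
    for addr in range(64):
        table |= (func(addr) & 1) << addr
    return table


def lut_read(table, addr):
    """Read one bit of the table: the LUT acting as a 64 x 1 ROM."""
    return (table >> addr) & 1


def pack_two_lut5(func_a, func_b):
    """Pack two 5-input functions into one 64-bit table.

    func_a fills the lower 32 bits and func_b the upper 32 bits (an assumed
    layout), so both outputs are addressed by the same five input bits.
    """
    table = 0
    for addr in range(32):
        table |= (func_a(addr) & 1) << addr
        table |= (func_b(addr) & 1) << (32 + addr)
    return table


def read_two_lut5(table, addr5):
    """Return both outputs for the shared 5-bit input addr5."""
    return lut_read(table, addr5), lut_read(table, 32 + addr5)


# Sample contents, with inputs packed as bit 0 = y, bit 1 = d1, bit 2 = d2.
def xor2(addr):
    return ((addr >> 1) ^ (addr >> 2)) & 1          # d1 XOR d2


def xor3(addr):
    return (addr ^ (addr >> 1) ^ (addr >> 2)) & 1   # y XOR d1 XOR d2


packed = pack_two_lut5(xor2, xor3)
```

Changing the function only changes the 64-bit constant, not the wiring around it.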

SLIDE 17

Xilinx Spartan6 FPGA

Resources

  • 50% SLICEX
      • 4 LUT-6
      • 8 Flip-Flops
  • 25% SLICEL
      + wide multiplexers
      + carry logic for large adders
  • 25% SLICEM
      + LUT can be used as shift registers
      + LUT can be used as RAM sharing the same write address
  • Block RAM, DSPs, IO, ...

SLIDE 18

Gray-Code Algorithm

function EVAL(s)
    while s.i < 2^n do
        s.i ← s.i + 1;
        k1 ← BIT1(s.i);
        k2 ← BIT2(s.i);
        if k2 valid then
            s.d′[k1] ← s.d′[k1] ⊕ s.d″[k1, k2];
        end if
        s.y ← s.y ⊕ s.d′[k1];
        if s.y = 0 then
            return shr(s.i, 1) ⊕ s.i;
        end if
    end while
end function
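The pseudocode can be turned into a small runnable sketch. Several details below are assumptions, because the slide does not show them: BIT1 and BIT2 are read as the positions of the lowest and second-lowest set bits of the counter, the initialisation in `init` is derived here rather than copied from the talk, all m equations are packed into the bits of one integer, and the loop stops at i = 2^n - 1 so that the counter never indexes past the last variable.

```python
def bit1(i):
    """Index of the lowest set bit of i (requires i > 0)."""
    return (i & -i).bit_length() - 1


def bit2(i):
    """Index of the second-lowest set bit of i, or None if there is none."""
    rest = i & (i - 1)              # clear the lowest set bit
    return bit1(rest) if rest else None


def init(n, quad, lin, const):
    """Set up the state at x = 0 (assumed initialisation).

    Coefficients are m-bit integers, one bit per equation:
    quad[a][b] (symmetric, a != b) for x_a*x_b, lin[a] for x_a, const for 1.
    """
    # When x_k is flipped for the first time, only x_{k-1} is set.
    d1 = [lin[k] ^ (quad[k][k - 1] if k > 0 else 0) for k in range(n)]
    return {"i": 0, "y": const, "d1": d1, "d2": quad, "n": n}


def eval_next(s):
    """Advance to the next common zero; return it as an integer, or None."""
    while s["i"] < (1 << s["n"]) - 1:
        s["i"] += 1
        k1 = bit1(s["i"])           # the variable flipped in this step
        k2 = bit2(s["i"])
        if k2 is not None:          # x_k2 changed since the last flip of x_k1
            s["d1"][k1] ^= s["d2"][k1][k2]
        s["y"] ^= s["d1"][k1]
        if s["y"] == 0:
            return (s["i"] >> 1) ^ s["i"]   # Gray code of i = current input
    return None


def solve(n, quad, lin, const):
    """Collect all common zeros, including x = 0, which the loop skips."""
    s = init(n, quad, lin, const)
    solutions = [0] if s["y"] == 0 else []
    x = eval_next(s)
    while x is not None:
        solutions.append(x)
        x = eval_next(s)
    return solutions
```

Each step touches one first-derivative entry and at most one second-derivative entry, regardless of n, which matches the $O(2^n m)$ time bound when the m equations are handled bit by bit.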

SLIDE 20

Parallelization

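Fix $i$ Variables for $2^i$ Parallel Instances:

$f = x_4x_2 + x_3x_0 + x_2x_1 + x_3 + x_1 + x_0 + 1$

e.g. $i = 2$:

$f_{00_b} = 0 \cdot x_2 + 0 \cdot x_0 + x_2x_1 + 0 + x_1 + x_0 + 1$
$f_{01_b} = 0 \cdot x_2 + 1 \cdot x_0 + x_2x_1 + 1 + x_1 + x_0 + 1$
$f_{10_b} = 1 \cdot x_2 + 0 \cdot x_0 + x_2x_1 + 0 + x_1 + x_0 + 1$
$f_{11_b} = 1 \cdot x_2 + 1 \cdot x_0 + x_2x_1 + 1 + x_1 + x_0 + 1$

$2^i$ independent equations (systems) sharing the same quadratic terms!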
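The specialisation can be reproduced mechanically. The sketch below fixes $x_4$ and $x_3$ in the example polynomial and confirms that the four resulting polynomials differ only in their linear and constant parts. The set-of-monomials representation and the names `specialize` and `quadratic_part` are assumptions made for this illustration.

```python
from itertools import product


def specialize(poly, fixed):
    """Substitute constants for some variables of a polynomial over F_2.

    poly  : set of monomials, each a frozenset of variable indices
            (the empty frozenset is the constant 1)
    fixed : dict mapping variable index -> 0 or 1
    """
    result = set()
    for monomial in poly:
        if any(fixed.get(v) == 0 for v in monomial):
            continue                        # the monomial vanishes
        reduced = frozenset(v for v in monomial if v not in fixed)
        result ^= {reduced}                 # equal monomials cancel mod 2
    return result


def quadratic_part(poly):
    """Monomials of degree two."""
    return {m for m in poly if len(m) == 2}


# f = x4x2 + x3x0 + x2x1 + x3 + x1 + x0 + 1
f = {frozenset(m) for m in [(4, 2), (3, 0), (2, 1), (3,), (1,), (0,), ()]}

# Fix (x4, x3) to all four combinations, as in the i = 2 example.
instances = {
    (x4, x3): specialize(f, {4: x4, 3: x3})
    for x4, x3 in product((0, 1), repeat=2)
}
```

For instance, `instances[(0, 1)]` reduces to $x_2x_1 + x_1$, because $1 \cdot x_0$ cancels against $x_0$ and the constant from $x_3 = 1$ cancels against the trailing 1. Only the products of two free variables survive as quadratic terms, so the second-derivative data can be shared by all instances.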

SLIDE 22

Instance

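[Figure: datapath of one instance. Recoverable labels: y, new_y, d', new_d', d'', k1, or, sol, buffer; update logic new_d' = d' ⊕ d'' and new_y = d' ⊕ d'' ⊕ y; a group of four instances inst_{j,k}, inst_{j,k+1}, inst_{j,k+2}, inst_{j,k+3} attached to one buffer. Legend: flip flop, LUT-6, RAM.]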

Program a LUT-6 directly as two LUT-5.

SLIDE 23

Instance

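[Figure: datapath of one instance. Recoverable labels: y, new_y, d', new_d', d'', k1, or, sol, buffer; update logic new_d' = d' ⊕ d'' and new_y = d' ⊕ d'' ⊕ y; a group of four instances inst_{j,k}, inst_{j,k+1}, inst_{j,k+2}, inst_{j,k+3} attached to one buffer. Legend: flip flop, LUT-6, RAM.]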

Explicitly place a group of 4 instances within 4 nearby slices.

SLIDE 24

Overall Architecture

[Figure: block diagram of the overall architecture. Recoverable labels: counter, gray_code, gray_tree, with signals ctr, k1, k2 and the table address addr = k2(k2 − 1)/2 + k1; tables d''_0, d''_1, ..., d''_{mg−1}; equations eq_0, eq_1, ..., eq_{mg−1} arranged in pillar0 and pillar1, each with instance groups inst_{e,0...3}, inst_{e,4...7}, ..., inst_{e,...2^i−1} and solution signals sol_{e,0...3}, sol_{e,4...7}, ..., sol_{e,...2^i−1} tagged with group ids; fifo and bus; counter2 and merge; the remaining equations eq_{mg}, eq_{mg+1}, ..., eq_{m−1} with signals x, sol and warn; solver.]
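The address label in the architecture figure appears to be the usual triangular index for pairs $k_1 < k_2$. The plus sign between the two terms is lost in the extracted text, so reading the label as $k_2(k_2 - 1)/2 + k_1$ is an assumption; the function names below are likewise made up for this sketch.

```python
def pair_address(k1, k2):
    """Map a pair k1 < k2 to the dense table index k2*(k2-1)/2 + k1."""
    if not 0 <= k1 < k2:
        raise ValueError("expected 0 <= k1 < k2")
    return k2 * (k2 - 1) // 2 + k1


def pairs_in_address_order(n):
    """List all pairs (k1, k2) with k1 < k2 < n in increasing address order."""
    return [(k1, k2) for k2 in range(n) for k1 in range(k2)]
```

With this layout a table for n variables needs n(n − 1)/2 entries and has no unused slots, which fits the $O(n^2 m)$ memory bound quoted for the Gray-code approach.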

SLIDE 27

Synthesis Tools

Xilinx Tools tend to fail for those parts of the design with the highest need for efficiency!

Development Strategy

We require some kind of low-level, assembly-like HDL:

  • Use Python scripts for code generation.
  • Assign most of the logic explicitly to LUT-6.
  • Fully pipeline the design.
  • Explicitly place components to achieve 200 MHz.
  • Exchange LUT data without resynthesizing.

Missing in our tool chain:

  • Explicit routing.
  • Totally avoid Verilog/VHDL; program low levels directly.

SLIDE 28

Results

Parameters:

  • $n = 54$
  • $m = 54$
  • $2^{10}$ instances, two pillars
  • $m_g = 12$

SLIDE 29

Results

Performance:

  • 200 MHz
  • 8.8 W (GPU: 235 W)
  • runtime: $2^{54-10} / 200\,\text{MHz} = 24.43\,\text{h}$
  • runtime for $n = 48$: $2^{48-10} / 200\,\text{MHz} = 22.91\,\text{min}$ (GPU: 21 min)
  • total energy for $n = 48$: 3.4 Wh (GPU: 82.3 Wh)
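These figures follow from simple arithmetic, reproduced in the sketch below. It assumes that each of the $2^{10}$ instances tests one input per clock cycle, which is how the runtime formula on the slide reads; the function and variable names are chosen for this example.

```python
def runtime_seconds(n, log2_instances=10, clock_hz=200e6):
    """Seconds to enumerate 2^n inputs with 2^log2_instances instances,
    assuming every instance tests one input per clock cycle."""
    return 2 ** (n - log2_instances) / clock_hz


hours_54 = runtime_seconds(54) / 3600      # about 24.43 h
minutes_48 = runtime_seconds(48) / 60      # about 22.91 min

# Energy = power * time, converted to watt-hours.
fpga_wh = 8.8 * minutes_48 / 60            # about 3.4 Wh on the FPGA
gpu_wh = 235 * 21 / 60                     # about 82.3 Wh on the GPU
```

For n = 48 the FPGA run takes slightly longer than the GPU run, but at 8.8 W instead of 235 W it uses roughly a factor of 24 less energy.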

SLIDE 30

Results

Large FPGA Clusters:

Same routing and placement for any equation system by simply exchanging LUT data.

SLIDE 31

Results

80-bit Security:

Solving a system of 80 variables requires 1042 days on 65,536 Spartan-6 FPGAs at a total cost of about US$40 million.
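The estimate can be retraced as follows, assuming every FPGA runs $2^{10}$ instances at 200 MHz as in the measured design and that the search space is split evenly over $65{,}536 = 2^{16}$ devices.

```latex
\[
  T = \frac{2^{80-10}}{2^{16} \cdot 2 \cdot 10^{8}\,\mathrm{s}^{-1}}
    = \frac{2^{54}}{2 \cdot 10^{8}}\,\mathrm{s}
    \approx 9.007 \cdot 10^{7}\,\mathrm{s}
    \approx 1042.5\ \text{days}
\]
```

Dividing the quoted total by the number of devices gives about US\$610 per FPGA; the slide does not say what that amount covers.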

SLIDE 32

Questions?