SLIDE 1

Combinatorial methods for solving LWE¹

Thomas Johansson
Dept. of Electrical and Information Technology, Lund University

¹Last part to appear at Asiacrypt 2018. London, September 2017.
SLIDE 2

Outline

1 Introduction to LWE
    The LWE Problem
    Motivation
2 Background and reformulating LWE
3 The BKW Algorithm
4 Coded-BKW
    Lattice Codes
    Coded-BKW
    Results - Complexity
5 A New Algorithm - Coded-BKW with sieving
6 Conclusions

Thomas Johansson , 2 / 58

SLIDE 4-6

Learning with Errors (LWE)

There is a secret vector s in Z_q^n. We may query an oracle (who knows s):

The LWE oracle with parameters (n, q, X):
  1. Uniformly picks r from Z_q^n.
  2. Picks a 'noise' e ← X.
  3. Outputs the pair (r, v = ⟨r, s⟩ + e) as a sample.

The search problem (informal):
Find s after collecting enough samples.

Error distribution X_αq:
Discrete Gaussian over Z_q with mean 0 and standard deviation σ = αq.
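The oracle above can be sketched in a few lines of Python. The rounded continuous Gaussian here is only a stand-in for the discrete Gaussian X, and all names are illustrative:

```python
import random

def lwe_oracle(s, q, sigma):
    """One LWE sample for secret s in Z_q^n: r uniform in Z_q^n,
    noise e drawn from a rounded Gaussian with standard deviation sigma."""
    n = len(s)
    r = [random.randrange(q) for _ in range(n)]
    e = round(random.gauss(0, sigma))
    v = (sum(ri * si for ri, si in zip(r, s)) + e) % q
    return r, v
```

With sigma = 0 the oracle leaks ⟨r, s⟩ exactly, and s could be recovered from n linearly independent samples by Gaussian elimination; the noise is what makes the problem hard.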

SLIDE 7

Example:

Z13 = {−6, −5, . . . , 0, . . . , 5, 6}, n = 5, ei small:

 3s1 + 5s2 + 2s3 − 4s4 −  s5 + e1 =  6
 4s1 −  s2 + 3s3 − 4s4 + 3s5 + e2 =  9
−2s1 + 2s2 + 2s3 + 3s4 − 3s5 + e3 =
  s1 + 0s2 − 4s3 − 4s4 +  s5 + e4 = −1
 0s1 − 5s2 + 2s3 − 2s4 +  s5 + e5 = −5
−3s1 +  s2 + 2s3 −  s4 − 4s5 + e6 =  2
 2s1 −  s2 + 3s3 −  s4 + 3s5 + e7 =  5
SLIDE 8

Different LWE problems

The search problem (informal):
Find s after collecting enough samples.

The distinguishing problem (informal):
After collecting enough samples, determine whether they come from an LWE oracle or whether they are purely random samples. The two problems have similar complexity.

General strategy: guess some small part of s, rewrite, and use a distinguisher to decide whether the guess is correct.
SLIDE 9

Error distribution?

Zq = {−(q − 1)/2, . . . , −1, 0, 1, . . . , (q − 1)/2}

Error distribution X_αq:
Discrete Gaussian over Zq with mean 0 and standard deviation σ = αq. Other distributions are possible...
SLIDE 10

Secret vector distribution?

Zq = {−(q − 1)/2, . . . , −1, 0, 1, . . . , (q − 1)/2}

Standard case:
s in Z_q^n, uniformly distributed. A simple transformation allows us to assume s in X_αq^n.

Binary LWE:
s in {0, 1}^n, or similar choices like s in {0, −1, 1}^n.

Other distributions are possible...
SLIDE 11

Related problems

Ring-LWE:
s(x) in Zq[x]/φ(x), uniformly distributed, where φ(x) is a degree-n (cyclotomic) polynomial. The oracle selects a random r(x) and outputs (r(x), v(x) = r(x)s(x) + e(x)) as a sample, where e(x) is a degree-(n − 1) polynomial with coefficients from X_αq. A solver/distinguisher for LWE also solves Ring-LWE.

LPN:
LWE when q = 2 (Hamming metric).
SLIDE 12

Motivation

◮ LWE and its greatness:
  ◮ Known to be as hard as worst-case hard lattice problems.
  ◮ Efficient cryptographic primitives.
  ◮ Extremely versatile, e.g., Fully Homomorphic Encryption (FHE) schemes.
  ◮ Post-quantum cryptography.
◮ Complexity of solving LWE?
  ◮ Especially for practical security: say, how to choose the smallest parameters for a given security level (e.g., 80-bit security)?
SLIDE 13

Solving Algorithms

Mainly three types:
  1. Reduce to some lattice problem.
     ◮ Short Integer Solution (SIS) problem
     ◮ Bounded Distance Decoding (BDD) problem
  2. Arora-Ge [AroraGe11]
     ◮ Performs asymptotically well for very small noise.
  3. BKW² (combinatorial)

²An unbounded number of samples is provided.
SLIDE 14

Outline

1 Introduction to LWE
    The LWE Problem
    Motivation
2 Background and reformulating LWE
3 The BKW Algorithm
4 Coded-BKW
    Lattice Codes
    Coded-BKW
    Results - Complexity
5 A New Algorithm - Coded-BKW with sieving
6 Conclusions
SLIDE 15

Background

In n-dimensional Euclidean space R^n, the intuitive notion of the length of a vector x = (x1, x2, . . . , xn) is captured by the L2-norm

‖x‖ = (x1² + · · · + xn²)^(1/2).

The Euclidean distance between two vectors x and y in R^n is defined as ‖x − y‖.
SLIDE 16

Discrete Gaussian

The discrete Gaussian distribution over Z with mean 0 and variance σ², denoted D_{Z,σ}, is the probability distribution obtained by assigning a probability proportional to exp(−x²/2σ²) to each x ∈ Z. The X distribution³ with standard deviation σ is the distribution on Zq obtained by folding D_{Z,σ} mod q, i.e., accumulating the value of the probability mass function over all integers in each residue class mod q. Similarly, we can define the discrete Gaussian over Z^n with variance σ², denoted D_{Z^n,σ}, as the product distribution of n independent copies of D_{Z,σ}.

³It is also denoted Xσ, and we omit σ if there is no ambiguity.
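The folding construction can be checked numerically. A minimal sketch (truncating the sum over Z at a 10σ tail, an assumption made only for illustration):

```python
import math

def folded_gaussian(q, sigma):
    """P.m.f. of the X distribution on Z_q: weight exp(-x^2 / (2 sigma^2))
    for each integer x, accumulated over residue classes mod q."""
    acc = [0.0] * q
    bound = int(10 * sigma) + 1          # truncate the negligible tail
    for x in range(-bound, bound + 1):
        acc[x % q] += math.exp(-x * x / (2 * sigma * sigma))
    total = sum(acc)
    return [p / total for p in acc]
```

For σ much smaller than q the mass concentrates near 0 (and near q − 1, i.e., −1), which is exactly the bias the distinguishers later in the talk exploit.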

SLIDE 17

Distinguishing between two distributions

The number of samples required to distinguish between the uniform distribution on Zq and Xσ. A good approximation using the divergence⁴ is

∆(Xσ‖U) ≈ e^(−π(σ√(2π)/q)²) = e^(−2π²σ²/q²).

In particular, we need about

M ≈ ε · e^(2π²σ²/q²)

samples for a distinguishing advantage of ε.

⁴Divergence, relative entropy, Kullback-Leibler divergence.
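Plugging in the approximation: a small sketch (illustrative helper names) that evaluates ∆(Xσ‖U) ≈ e^(−2π²σ²/q²) and the corresponding sample count:

```python
import math

def divergence_approx(q, sigma):
    # Delta(X_sigma || U) ~ exp(-2 pi^2 sigma^2 / q^2)
    return math.exp(-2 * math.pi ** 2 * sigma ** 2 / q ** 2)

def samples_needed(q, sigma, eps):
    # M ~ eps * exp(2 pi^2 sigma^2 / q^2) samples for advantage eps
    return eps / divergence_approx(q, sigma)
```

The count grows exponentially in (σ/q)², which is why every reduction step must keep the accumulated noise under control.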

SLIDE 18

Number of samples

For secret vector recovery, we assume that for a right guess the observed symbol is Xσ distributed; otherwise, it is uniformly random. We distinguish the secret from Q candidates. Following the theory from linear cryptanalysis, the number M of required samples to test is about

M = O(ln(Q) / ∆(Xσ‖U)),

where ∆(Xσ‖U) is the divergence between Xσ and the uniform distribution U on Zq.
SLIDE 19

The LWE problem again

We ask for m samples from the LWE distribution L_{s,X} and the response is (r1, z1), (r2, z2), . . . , (rm, zm), where ri ∈ Z_q^n, zi ∈ Zq. Introduce z = (z1, z2, . . . , zm) and y = (y1, y2, . . . , ym) = sR. We can then write

R = (r1^T r2^T · · · rm^T)

and z = sR + e, where zi = yi + ei = ⟨s, ri⟩ + ei and ei ← X is the noise.

A decoding problem: the matrix R serves as the generator matrix for a linear code over Zq, and z is the received word. Finding the codeword y = sR such that the Euclidean distance ‖y − z‖ is minimum will give the secret vector s.
SLIDE 20

Transform s to be Gaussian

If s is drawn from the uniform distribution, there is a simple transformation that can be applied: through Gaussian elimination we transform R into systematic form. Assume that the first n columns are linearly independent and form the matrix R0. Define D = R0^(−1). With a change of variables ŝ = sD^(−1) − (z1, z2, . . . , zn) we get an equivalent problem described by

R̂ = (I, r̂_{n+1}^T, r̂_{n+2}^T, · · · , r̂_m^T), where R̂ = DR.

We compute

ẑ = z − (z1, z2, . . . , zn)R̂ = (0, . . . , 0, ẑ_{n+1}, ẑ_{n+2}, . . . , ẑ_m).

After this initial step, each entry in the secret vector ŝ is now distributed according to X.
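The first part of this transformation is easy to check numerically: since sD^(−1) = sR0, the new secret is ŝ = sR0 − (z1, . . . , zn) = −(e1, . . . , en) mod q, i.e., noise-distributed. A sketch (columns r_j given as lists; prime q assumed so that R0 is invertible):

```python
def transformed_secret(s, first_cols, z, q):
    """s_hat = s * R0 - (z_1, ..., z_n) mod q, where R0 has the first n
    sample vectors as its columns; this equals -(e_1, ..., e_n) mod q."""
    n = len(s)
    return [(sum(s[i] * first_cols[j][i] for i in range(n)) - z[j]) % q
            for j in range(n)]
```

Each entry equals the negated noise of the corresponding sample, so it follows the (symmetric) distribution X, as claimed.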

SLIDE 21

Rewriting

Recall that we have the LWE samples in the form z = sR + e. We write this as

(s, e) · [R; I] = z.  (1)

The unknown (s, e) on the left-hand side has all iid entries of the same size. The matrix above is denoted H0 = [R; I]; it is a known quantity, as well as z.
SLIDE 22

Lattice-based algorithms for solving LWE

SLIDE 23

Outline

1 Introduction to LWE
    The LWE Problem
    Motivation
2 Background and reformulating LWE
3 The BKW Algorithm
4 Coded-BKW
    Lattice Codes
    Coded-BKW
    Results - Complexity
5 A New Algorithm - Coded-BKW with sieving
6 Conclusions
SLIDE 24

The BKW Algorithm

The BKW (Blum, Kalai, and Wasserman) algorithm:

◮ Originally proposed for solving LPN.
◮ The best asymptotic algorithm, with sub-exponential complexity 2^(O(n/log(n))) for LPN (exponential for LWE).
◮ Main idea:
  ◮ Divide the length-n vector r into a parts, each of size b = ⌈n/a⌉.
  ◮ Merge and sort (called one BKW step) - a trade-off:
    ◮ Store all the samples.
    ◮ Sort according to the bottom b entries of the vector r.
    ◮ Subtract samples in the same partition:

      v1 = ⟨[r1, r0], s⟩ + e1
      v2 = ⟨[r2, r0], s⟩ + e2
      v1 − v2 = ⟨[r1 − r2, 0], s⟩ + e1 − e2

  ◮ Do a − 1 BKW steps iteratively to zero out the bottom a − 1 blocks.
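One merge-and-sort step can be sketched directly on (r, v) samples. This toy version keeps one representative per partition and subtracts it from the rest (it sorts on exact bottom-b values only, ignoring the usual sign trick):

```python
from collections import defaultdict

def bkw_step(samples, q, b):
    """One BKW step: partition samples by the bottom b entries of r,
    then subtract the partition representative, zeroing those entries."""
    buckets = defaultdict(list)
    for r, v in samples:
        buckets[tuple(r[-b:])].append((r, v))
    out = []
    for group in buckets.values():
        r0, v0 = group[0]
        for r, v in group[1:]:
            out.append(([(a - c) % q for a, c in zip(r, r0)],
                        (v - v0) % q))
    return out
```

Each output sample has its bottom b positions equal to zero, at the price of doubled noise variance and a reduced sample count, which is exactly the trade-off described above.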

SLIDE 25

Related Works

[Blum Kalai Wasserman 03]
  ◮ LPN.
[Levieil Fouque 06]
  ◮ Add the fast Walsh-Hadamard transform (FWHT).
[Bernstein Lange 13 / Kirchner 11]
  ◮ Secret-error transformation for LPN.
[Guo Johansson Löndahl 14]
  ◮ Subspace hypothesis testing using covering codes.
SLIDE 26

Related Works

[Blum Kalai Wasserman 03]
  ◮ LWE.
[Albrecht Cid Faugère Fitzpatrick Perret 13]
  ◮ Apply BKW for solving LWE.
[Applebaum Cash Peikert Sahai 09]
  ◮ Secret-error transformation for LWE.
[Albrecht Faugère Fitzpatrick Perret 14]
  ◮ Introduce the lazy modulus switching technique.
  ◮ The best known BKW-type binary-LWE solver.
[Duc Tramèr Vaudenay 15]
  ◮ Add the fast Fourier transform (FFT).
  ◮ The best known BKW-type LWE solver.
[Guo Johansson Stankovski 15; Kirchner Fouque 15]
  ◮ Coded-BKW.
  ◮ Improved asymptotic performance.
SLIDE 27

Outline

1 Introduction to LWE
    The LWE Problem
    Motivation
2 Background and reformulating LWE
3 The BKW Algorithm
4 Coded-BKW
    Lattice Codes
    Coded-BKW
    Results - Complexity
5 A New Algorithm - Coded-BKW with sieving
6 Conclusions
SLIDE 28

Lattice Codes

1. Lattices are the Euclidean-space counterpart of binary linear codes in Hamming space.
2. A narrow class: lattices associated with a code, especially constructed based on Construction A.
   ◮ Let C be a q-ary linear code.
   ◮ Construct a lattice over this code:

     Λ(C) = {λ ∈ R^n : λ ≡ c mod q, c ∈ C}.

Why lattice codes?
1. Better shaping⁵.
2. Theory for estimating the noise variance when using q-ary linear codes (e.g., the subspace hypothesis testing technique).

⁵Compared with the work [AlbrechtFaugèreFitzpatrickPerret14], in which they use n-cube quantization.
SLIDE 29

Lattice Codes

Second moment:
The second moment of Λ is defined as the second moment per dimension of a uniform distribution over its fundamental region V, i.e.,

σ² = E[‖e‖²] / N = (1/N) · ∫_V ‖x‖² · (1 / Vol(V)) dx.  (2)
SLIDE 30

The Coded-BKW Algorithm

Main Steps:
1. Gaussian elimination.
   ◮ Make the secret ŝ follow the noise distribution.
2. t1 standard BKW reductions.
   ◮ Zero out the bottom t1·b entries of r.
3. t2 coded-BKW reductions.
   ◮ Make the next bottom ncod entries of r small.
4. Partial guessing.
   ◮ Exhaust the top ntop entries of ŝ with absolute value less than d.
5. Subspace hypothesis testing using a q-ary [ntest, l] linear code.

[Figure: the length-n vector r split into a guessing part (ntop), a coded part (ncod + ntest), and a BKW part of length t1·b.]
SLIDE 31

The Coded-BKW Algorithm

Main Steps:
1. Gaussian elimination.
2. Standard BKW reductions.
3. Coded-BKW reductions.
4. Partial guessing.
5. Subspace hypothesis testing.

[Figure: the length-n vector r split into a guessing part (ntop), a coded part (ncod + ntest), and a BKW part of length t1·b.]
SLIDE 32

Coded-BKW

◮ Recall standard BKW: use (q^b − 1)/2 partitions to zero out b positions.
◮ Coded-BKW idea: use a q-ary linear code with parameters [Ni, b] for the i-th reduction step⁵.
◮ rI is the part of r reduced in step i, containing Ni positions.
◮ Rewrite rI = cI + eI. Thus,

  ⟨ŝI, rI⟩ = ⟨ŝI, cI⟩ + ⟨ŝI, eI⟩.

◮ Summing or subtracting two vectors mapped to the same codeword will cancel out the first part.

Advantage: use (q^b − 1)/2 partitions to make Ni entries small (Ni > b).

⁵Standard BKW can be viewed as coded-BKW using a trivial [b, b] code.
SLIDE 33

Coded-BKW

Noise Formula:

e = Σ_{j=1}^{2^t} e_{ij} + Σ_{i=1}^{n} ŝ_i · (δ_i^{I1} E_i^{(1)} + δ_i^{I2} E_i^{(2)} + · · · + δ_i^{It2} E_i^{(t2)}),  (3)

where E_i^{(h)} = Σ_{j=1}^{2^{t2−h+1}} ê_{ij}^{(h)} and ê_{ij}^{(h)} is the coding noise introduced in the h-th coded-BKW reduction.

◮ A noise tower.
◮ Preset a variance value σ_set²:
  1. Make the contribution of each E_i^{(h)} the same.
  2. σ_set² = 2^{t2−i+1} · σ²_{Λ(Ni,b)}.
  3. Bound the noise.
SLIDE 34

Variance Estimation

Theorem:
Assume that good⁶ lattice codes are employed. Let the noise level introduced by coding be σ_set. Then the variance of the total coding noise is ‖ŝ_tot‖² · σ_set², where the vector⁷ ŝ_tot is the sub-vector of ŝ whose corresponding entries in r are reduced by using lattice codes.

⁶This means that the fundamental regions are spherical.
⁷Its length is ntot = ncod + ntest.
SLIDE 35

Determine the Code Length

σ_set² = 2^{t2−i+1} · σ²_{Λ(Ni,b)}

Compute the second moment of Λ:
◮ σ² = G(Λ) · Vol(V)^{2/N}, where G(Λ) is called the normalized second moment.
◮ 1/(2πe) < G(Λ_{N,k}) ≤ 1/12.
◮ For a lattice built from an [N, k] linear code by Construction A, the volume of V is q^{N−k}. To determine Ni:

  σ_set² = 2^{t2−i+1} · G(Λ_{Ni,b}) · q^{2(1 − b/Ni)}.

◮ ncod = Σ_{i=1}^{t2} Ni.
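Solving the last equation for Ni gives a quick way to tabulate the step lengths. A sketch with illustrative parameters, fixing G at its upper bound 1/12:

```python
import math

def code_lengths(q, b, t2, sigma_set, G=1.0 / 12):
    """Solve sigma_set^2 = 2^(t2-i+1) * G * q^(2(1 - b/N_i)) for N_i."""
    lengths = []
    for i in range(1, t2 + 1):
        frac = math.log(sigma_set ** 2 / (2 ** (t2 - i + 1) * G)) \
               / (2 * math.log(q))          # equals 1 - b/N_i
        assert 0 < frac < 1, "sigma_set incompatible with q, b, t2"
        lengths.append(b / (1 - frac))
    return lengths
```

The allowed length Ni grows with i: later steps tolerate more coding noise because it is doubled fewer times before the final distinguishing.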

SLIDE 36

Results - assumptions

Assumption One: The noise variable is (approximately) discrete Gaussian distributed.
1. Follows the previous research line.
   ◮ Intuition from the central limit theorem (CLT).
2. Our experiments verify this assumption.

SLIDE 38

Assumptions

Assumption Two: The theory of lattice codes is accurate.
1. In the fundamental region: uniform over integer points versus uniform continuously.
2. We numerically verify it: the computed G behaves as expected.

q       631                         2053                        16411
code    [2,1]    [3,1]     [4,1]    [2,1]    [3,1]    [4,1]     [2,1]     [3,1]
E[e²]   101.26†  1277.29†  4951.53  329.24†  6185.67  29107.73  2631.99†  99166.25
1/G     12.46    12.71     12.80    12.47    12.65    12.78     12.47     12.62

A value with a † sign is optimal.
SLIDE 39

Complexity

The complexity consists of two parts:⁸
1. Inner complexity C_one-iteration.
   ◮ The accumulated complexity of all the steps.
2. The success probability of one iteration.
   ◮ Guessing probability Fg: the probability that all the top ntop entries of ŝ have an absolute value less than d.
   ◮ Testing probability Ft: the probability that the Euclidean length of the vector ŝ_tot is bounded correctly.

[Figure: the length-n vector r split into a guessing part (ntop), a coded part (ncod + ntest), and a BKW part of length t1·b.]

⁸For any γ ≥ 1, Pr[‖v‖ > γσ√n; v ← D_{Z^n,σ}] < (γ · e^{(1−γ²)/2})^n. [Lyu12]
SLIDE 40

Complexity Formula

Theorem (Informal):
The complexity of the new algorithm is

C = C_one-iteration / (Fg · Ft).  (4)

The required number of samples M for testing is set to be⁹

M = 4 · ln((2d + 1)^{ntop} · q^l) / ∆(X_{σ_final}‖U),

where U is the uniform distribution on Zq and σ_final² = 2^{t1+t2} · σ² + γ² · σ² · σ_set² · ntot. Thus, the number of calls to the LWE oracle is

m = (t1 + t2) · (q^b − 1)/2 + M.

⁹The constant factor in the formula is chosen as 4. The divergence ∆(X_{σ_final}‖U) is computed numerically.
SLIDE 41

Results

Table: Time complexity comparison for solving various LWE instances.
(Complexity in log2 #Zq operations; the BKZ columns use the NTL-BKZ LP model and the BKZ 2.0 simulator model, respectively.)

                    n    q        σ      Coded-BKW  [DTV15]  NTL-BKZ (LP)  BKZ 2.0 (sim.)
[Regev05]           128  16,411   11.81  84.5       95.0     61.6          61.9
                    256  65,537   25.53  145.1      178.7    175.5         174.5
                    512  262,147  57.06  287.6      357.5    386.8         518.6
[LindnerPeikert11]  128  2,053    2.70   69.7       83.7     54.5          57.1
                    256  4,099    3.34   123.8      154.2    156.2         151.2
                    512  4,099    2.90   209.2      271.8    341.9         424.5

◮ Works well for both LWE and binary-LWE.
◮ The table shows results for solving various classic LWE parameters.
◮ The improvement is significant when n is large.
◮ For example, we gain a factor of almost 2^70 when solving the Regev instance with n = 512.
SLIDE 42

Results

(See the table on the previous slide.)

◮ For recently proposed ring-LWE based cryptosystems, some should increase their security parameters.
◮ For example, the ones ([GFSBH12] [RVMCV14] [DRVV15]) employing ring-LWE (256, 7681, 4.51) (resp. ring-LWE (512, 12289, 4.86)) for 128-bit (resp. 256-bit) security.
SLIDE 43

Results

(See the table two slides back.)

Pessimistic results: an upper bound on the worst-case complexity.
◮ We set G = 1/12 and it is of LF1 type.
◮ Actual performance will be better.
◮ Many heuristics apply, e.g., the hybrid approach, LF2, unnatural selection (pruning), etc.
◮ Adopting the hybrid and LF2 heuristics, we solve the Regev instance with n = 512 in 2^271 Zq operations.
SLIDE 44

Simulations

[Figure: number of eliminated rows vs. log2 of error variance; curves: variance roof, standard BKW, coded-BKW theory, coded-BKW simulation w/ unnatural selection.]

◮ A toy example to show the improved trade-off using lattice codes.
◮ (q, σ, #samples) = (2053, 2.70, 2^25).
◮ Four standard 2-row BKW steps were used initially, followed by three iterations each of [3,2]-, [4,2]-, [5,2]- and [6,2]-coding steps.
SLIDE 45

Outline

1 Introduction to LWE
    The LWE Problem
    Motivation
2 Background and reformulating LWE
3 The BKW Algorithm
4 Coded-BKW
    Lattice Codes
    Coded-BKW
    Results - Complexity
5 A New Algorithm - Coded-BKW with sieving
6 Conclusions
SLIDE 46

A New Algorithm - Ideas

(s, e) · [R; I] = z.  (5)

Let H0 = [R; I]. We multiply (5) with special matrices Pi: starting with (s, e)H0 = z, we find a matrix P0 and form H1 = H0P0, z1 = zP0, resulting in (s, e)H1 = z1. In t steps we have formed

Ht = H0P0 · · · P_{t−1},  zt = zP0 · · · P_{t−1}.
SLIDE 47

Ideas

Plain BKW: Pi has columns with only two nonzero entries, in {−1, 1}. It cancels rows in the Hi matrices in such a way that Ht = [0; H′t], where the columns of H′t have 2^t nonzero entries.

The goal is to end up with columns in Ht that are as small as possible; the smaller the size, the larger the advantage in the corresponding samples. Improved techniques like lazy modulus reduction and coded-BKW reduce Ht similarly to BKW, but improve by using the fact that the top rows of H′t do not have to be cancelled to 0. Instead, entries are allowed to be of the same size as in the H′t matrix.
SLIDE 48

Ideas

[Figure: picture of BKW reduction - a part of length t1·b is reduced to noise = 0.]
SLIDE 49

Ideas

[Figure: picture of coded-BKW reduction - noise = 0 on a part of length t1·b, then noise σ1 and σ2 on a part of length > t1·b.]
SLIDE 50

A BKW-Sieving Algorithm

Let Nj = Σ_{i=1}^{j} ni, j = 1, 2, . . . , t.

CodeMap(h, i): We assume, following the idea of coded-BKW, that we have fixed a lattice code Ci of length ni. The vector h fed as input to CodeMap is first considered only restricted to the positions N_{i−1} + 1 to N_i, i.e., as a vector of length ni. This vector, denoted h_{[N_{i−1}+1, N_i]}, is then mapped to the closest codeword in the code Ci. This closest codeword is denoted CodeMap(h, i). The code Ci needs an associated procedure for quickly finding the closest codeword for any given vector. Select the parameters in such a way that the distance to the closest codeword is expected to be no more than √ni · B.
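The slides leave the concrete code Ci open. As a toy stand-in, here is a brute-force CodeMap for the q-ary [n, 1] repetition code, with distances taken on centered representatives:

```python
def center(x, q):
    """Representative of x mod q in (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x

def codemap(h_part, q):
    """Nearest codeword to h_part in the [n, 1] q-ary repetition code
    {(c, ..., c) : c in Z_q}, under centered Euclidean distance."""
    best, best_d = None, None
    for c in range(q):
        d = sum(center(x - c, q) ** 2 for x in h_part)
        if best_d is None or d < best_d:
            best, best_d = tuple([c] * len(h_part)), d
    return best
```

A real instantiation needs a code with a fast decoding procedure; the brute-force search over all q codewords here is only to make the mapping concrete.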

SLIDE 51

A BKW-Sieving Algorithm

Sieve(L∆, i, √Ni · B): The input L∆ contains a list of vectors. We consider them restricted to the first Ni positions. This procedure finds all differences between any two vectors such that the size of the difference, restricted to the first Ni positions, is less than √Ni · B. All such differences are put in a list S∆, which is the output of the procedure.

The vectors in the list L∆, restricted to the first Ni positions, all have norm at most √Ni · B. The problem is then solved by algorithms for sieving in lattices, for example using locality-sensitive hashing.
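Locality-sensitive hashing is what makes this step fast; for illustration, a naive quadratic sieve that checks all pairs works on small lists:

```python
import math

def center(x, q):
    x %= q
    return x - q if x > q // 2 else x

def sieve(L, q, N, bound):
    """Naive sieving: all pairwise differences whose centered norm,
    restricted to the first N positions, is below the bound."""
    out = []
    for i in range(len(L)):
        for j in range(i + 1, len(L)):
            d = [(a - b) % q for a, b in zip(L[i], L[j])]
            if math.sqrt(sum(center(x, q) ** 2 for x in d[:N])) < bound:
                out.append(d)
    return out
```

Replacing the double loop with lattice-sieving techniques (e.g., LSH) is what brings the cost down toward the 2^{0.292·Ni} figure quoted later in the talk.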

SLIDE 52

A BKW-Sieving Algorithm

Recall (s, e)H0 = z, where H0 = [R; I]. We are going to perform t steps to transform H0 into Ht such that the columns of Ht are "small". Again, we look at the first n positions in a column, corresponding to the R matrix. Since we are only adding or subtracting columns using coefficients in {−1, 1}, the remaining positions in the column are assumed to contain 2^i nonzero positions, each either −1 or 1, after i steps.
SLIDE 53

Ideas

The first n positions of a column are considered as a concatenation of smaller vectors. We assume that these vectors have lengths n1, n2, n3, . . . , nt, in such a way that Σ_{i=1}^{t} ni = n.

We fix an average level of "smallness" for a position, a constant denoted B. The idea of the algorithm is to keep considered vectors of some length n′ at norm smaller than √n′ · B. A column h ∈ H0 is processed by first computing ∆ = CodeMap(h, 1). Then we place h in the list L∆. After running through all columns h ∈ H0, they have been sorted into K lists L∆.
SLIDE 54

Ideas

We then run through each such list, containing roughly m/K columns. We perform a sieving step, according to S∆ = Sieve(L∆, 1, √N1 · B). The result is a list of vectors where the norm of the vector, restricted to the first N1 positions, is less than √N1 · B. The indices of the two columns, ij and ik, are kept in such a way that we can compute a new received symbol z = z_{ij} − z_{ik}. All vectors in all lists S∆ are now put as columns in H1. We now have a matrix H1 where the norm of each column, restricted to the first N1 positions, is less than √N1 · B. This is the end of a single step.
SLIDE 55

Ideas

Repeat roughly the same t − 1 times. A column h ∈ H_{i−1} is processed by first computing ∆ = CodeMap(h, i). We place h in the list L∆. After running through all columns h ∈ H_{i−1}, they have been sorted into K lists L∆. Run through each list, containing roughly m/K columns. We perform a sieving step, according to S∆ = Sieve(L∆, i, √Ni · B). The result is a list of vectors where the norm of the vector, restricted to the first Ni positions, is less than √Ni · B. A new received symbol is computed. All vectors in all lists S∆ are now put as columns in Hi. We get a matrix Hi where the norm of each column, restricted to the first Ni positions, is less than √Ni · B. This is repeated for i = 2, . . . , t. We assume that the parameters have been chosen in such a way that each matrix Hi can have m columns.
SLIDE 56

Ideas

After t steps we end up with a matrix Ht such that the norm of the columns, restricted to the first n positions, is bounded by √n · B, and the size of the last m positions is roughly 2^(t/2). Altogether, this results in samples generated as z = (s, e)Ht. The values in the z vector are then roughly Gaussian distributed, with variance σ² · (nB² + 2^t). A distinguisher on z can verify whether our initial guess is correct or not. After restoring some secret value, the whole procedure can be repeated, but for a smaller dimension.
SLIDE 57

Algorithm

Algorithm 1: New LWE solving algorithm (main steps)
Input: matrix R with n rows and m columns, received vector z of length m, and algorithm parameters ni (1 ≤ i ≤ t) and B.

  Step 0: Use Gaussian elimination to change the distribution of the secret vector;
  For i from 1 to t do Steps 1-2;
    Step 1: For all columns h, compute ∆ = CodeMap(h, i) and put h in list L∆;
    Step 2: For all lists L∆, compute S∆ = Sieve(L∆, i, √Ni · B) and put all S∆ in Hi;
  Step 3: Exhaustively guess some entries of s;
  Step 4: Hypothesis testing using a distinguisher;
SLIDE 58

Parameter Selection

After each step, positions that have been treated should have an "average norm" of B, and they should stay within this norm after each additional step; this is provided by the sieving part of each step. After t steps we should have vectors of norm less than √n · B. Assigning m = 2^k, where k is a parameter that decides the total complexity of the algorithm, we end up with roughly m = 2^k samples after t steps. These received samples are roughly Gaussian with variance σ² · (nB² + 2^t). The best strategy is to choose nB² = 2^t.
SLIDE 59

Parameter Selection

Furthermore, in order to be able to distinguish using m samples, we need

ln 2 · k = 2π² · (σ² · (nB² + 2^t)) / q²,

or

B = √(ln 2 / (4π²) · k/n) · q/σ.

The expression for t is then

2^t = ln 2 / (4π²) · k · q²/σ².

Each of the t steps should deliver m = 2^k vectors of the form described before.
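These two formulas are easy to evaluate; a sketch (illustrative names) that computes B and t and lets us re-check the distinguishing condition they were derived from:

```python
import math

def select_parameters(k, n, q, sigma):
    """B from ln2 * k = 2 pi^2 sigma^2 (n B^2 + 2^t) / q^2 together with
    the balancing choice n B^2 = 2^t; returns (B, t)."""
    B = math.sqrt(math.log(2) / (4 * math.pi ** 2) * k / n) * q / sigma
    t = math.log2(n * B * B)
    return B, t
```

Both conditions hold simultaneously by construction, which is the point of the balancing choice nB² = 2^t.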

SLIDE 60

Parameter Selection

By coded-BKW we sort the 2^k vectors into 2^{di} different lists. Here we know that all vectors in a list, restricted to the ni considered positions, have size less than √ni · B if the codeword is subtracted from the vector. So the number of lists (2^{di}) has to be chosen such that this is true. Then the sieving part should leave all the Ni positions intact in norm, i.e., less than √Ni · B. Because all vectors in a list can be considered to have norm √Ni · B in these Ni positions, the sieving step needs to find every pair whose difference has norm at most √Ni · B. From the theory of sieving in lattices, heuristics give that if a single list should produce the same number of vectors as it contains, then it should contain 2^{0.208·Ni} vectors. The complexity and space is 2^{0.292·Ni}.
SLIDE 61

Asymptotic analysis

Write k = c0·n for some c0. We adopt q = n^cq and σ = n^cs. Then B = C · n^(cq−cs) and t = log2 D + (2(cq − cs) + 1) log2 n. We assume exponential overall complexity and write it as 2^cn for some coefficient c to be determined. Each step is additive with respect to complexity, so we assume that we can use 2^cn operations in each step.
SLIDE 62

Asymptotic analysis

In the t steps we choose n1, n2, . . . positions for each step. Since B ≈ n^(cq−cs), in the first step we can use n1 positions in such a way that

(n^cs)^n1 ≈ 2^cn · 2^(−0.292·n1).

Taking the log, cs·log n · n1 = cn − 0.292·n1. Since we are only interested in asymptotics, we remove constants to simplify and instead write

n1 = cn / (cs·log n + 0.292).

To simplify expressions, we use the notation W = cs·log n + 0.292. For the next step, we get W·n2 = cn − 0.292·n1, simplifying to n2 = (cn/W)·(1 − 0.292/W). Continuing in this way, we have W·ni = cn − 0.292 · Σ_{j=1}^{i−1} nj and

ni = (cn/W) · Σ_{j=0}^{i−1} C(i−1, j) · (−0.292)^j / W^j.
SLIDE 63

Asymptotic analysis

After t steps, we have Σ_{i=1}^{t} ni = n, so we observe that

Σ_{i=1}^{t} ni = (cn/W) · Σ_{i=1}^{t} Σ_{j=0}^{i−1} C(i−1, j) · (−0.292)^j / W^j,

n = Σ_{i=1}^{t} ni = (cn/W) · Σ_{i=0}^{t−1} C(t, i+1) · (−0.292)^i / W^i.

Now

1 = c · Σ_{i=0}^{t−1} t^(i+1) / (i+1)! · (−0.292)^i / W^(i+1).

Since t/W ≈ (1 + 2(cq − cs))/cs, this finally gives us

c = ( Σ_{i=1}^{t} (−0.292)^(i−1) / i! · (1 + 2(cq − cs))^i / cs^i )^(−1).
SLIDE 64

Asymptotic analysis

Since t is of order log n, it tends to infinity as n tends to infinity. Thus, if Y = t/W, then

Σ_{i=1}^{t} (−0.292)^(i−1)/i! · Y^i
  = 1/(−0.292) · Σ_{i=1}^{t} (−0.292·Y)^i / i!
  = 1/(−0.292) · ( Σ_{i=0}^{t} (−0.292·Y)^i / i! − 1 )
  = 1/0.292 · (1 − e^(−0.292·Y)),

so

c = 0.292 / (1 − e^(−0.292·Y)).

Theorem:
The time and space complexity is 2^((c+o(1))·n), where

c = 0.292 / (1 − e^(−0.292·(1+2(cq−cs))/cs)).
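The theorem's constant is a one-liner to evaluate; checking it against the example on the next slide (cq = 2, cs = 1.5):

```python
import math

def bkw_sieving_exponent(cq, cs, lam=0.292):
    """c = lam / (1 - exp(-lam * Y)), with Y = (1 + 2(cq - cs)) / cs."""
    Y = (1 + 2 * (cq - cs)) / cs
    return lam / (1 - math.exp(-lam * Y))
```

For small relative noise (cs small compared to cq) Y is large and c approaches the sieving constant 0.292.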

SLIDE 65

Asymptotic analysis

Example:
(LP parameters) For cq = 2 and cs = 1.5 we get c = 0.9054. The previously best known value was 0.93. (A further improvement indicates c = 0.8951.)
SLIDE 66

Overview

[Figure: picture of BKW-sieve reduction - noise = σ1 is maintained over a part of length >> t1·b.]
SLIDE 67

Figure: A high-level explanation of how the different versions of the BKW algorithm work (panels: BKW, coded-BKW, coded-BKW with sieving).
SLIDE 68

Asymptotics

q = n^cq and σ = n^cs
SLIDE 69

Outline

1 Introduction to LWE
    The LWE Problem
    Motivation
2 Background and reformulating LWE
3 The BKW Algorithm
4 Coded-BKW
    Lattice Codes
    Coded-BKW
    Results - Complexity
5 A New Algorithm - Coded-BKW with sieving
6 Conclusions
SLIDE 70

Conclusions

1. An overview of BKW-type solvers for LWE.
2. A new idea for an LWE solver with improved asymptotic performance.
SLIDE 71

Thank you for your attention! Questions?
