SLIDE 1

Learning Strikes Again: the Case of the DRS Signature Scheme

Yang Yu¹   Léo Ducas²

¹Tsinghua University   ²Centrum Wiskunde & Informatica

Asiacrypt 2018, Brisbane, Australia

SLIDES 2–6

This is a cryptanalysis work...

Target: DRS — a NIST lattice-based signature proposal
Techniques: learning & lattices

Statistical learning ⇒ secret key information leaks
Lattice techniques ⇒ better use of the leaks

The designers claim that Parameter Set-I offers at least 128 bits of security. We show that it actually offers at most 80 bits!

SLIDE 7

Outline

1. Background
2. DRS signature
3. Learning secret key coefficients
4. Exploiting the leaks



SLIDES 9–11

Lattice

[Figure: a 2D lattice with a good basis (g1, g2) and a bad basis (b1, b2)]

Definition

A lattice L is a discrete subgroup of R^m.

A lattice is generated by its basis G = (g1, · · · , gn) ∈ R^(n×m), e.g. L = {xG | x ∈ Z^n}.
L has infinitely many bases: G is good, B is bad.


SLIDES 12–13

Finding Close Vectors

Each basis defines a parallelepiped P.

[Figure: the plane tiled by copies of P; the target m is rounded to the lattice point v of its tile]

Babai's round-off algorithm outputs v ∈ L such that v − m ∈ P.
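As a concrete illustration, here is a minimal numpy sketch of Babai's round-off; the basis and target below are toy values, not DRS parameters:

import numpy as np

def babai_round_off(B, m):
    # v = round(m * B^{-1}) * B: a lattice point such that v - m lies in
    # the parallelepiped P spanned by the rows of B.
    x = np.rint(m @ np.linalg.inv(B))
    return x @ B

B = np.array([[3.0, 1.0],
              [1.0, 2.0]])      # rows = basis vectors
m = np.array([7.3, 4.9])        # target
v = babai_round_off(B, m)       # here v = [7., 4.], and v - m lies in P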

SLIDE 14

GGH & NTRUSign Schemes

Public key: P, secret key: S

Sign
1. Hash the message to a random vector m
2. Round m (using S) to v ∈ L

Verify
1. Check v ∈ L (using P)
2. Check v is close to m
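A hedged sketch of this GGH-style flow, reusing babai_round_off from above; hashing to a vector and the closeness bound are left abstract, and this is not the schemes' exact specification:

import numpy as np

def sign(S, m):
    # round the hashed message m to a nearby lattice point using the secret basis
    return babai_round_off(S, m)

def verify(P, m, v, bound):
    # v is in L iff its coordinates w.r.t. the public basis P are integers
    coords = v @ np.linalg.inv(P)
    return np.allclose(coords, np.rint(coords)) and np.linalg.norm(v - m) <= bound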


SLIDES 15–16

GGH & NTRUSign are insecure!

v − m ∈ P(S) ⇒ (v, m) leaks some information about S.

GGH and NTRUSign were broken by “learning the parallelepiped” [NR06]. Some countermeasures were also broken by a similar attack [DN12].


SLIDES 17–18

Countermeasures

Let us focus on the hash-then-sign approach!

Provably secure method [GPV08]: rounding based on Gaussian sampling; v − m is independent of S.

Heuristic method [PSW08]: rounding based on CVP w.r.t. the ℓ∞-norm; the support of v − m is independent of S.

DRS [PSDS17] is an instantiation, submitted to NIST.



SLIDES 20–22

DRS

DRS = Diagonal-dominant Reduction Signature

Parameters: (n, D, b, Nb, N1)

n : the dimension
D : the diagonal coefficient
b : the magnitude of the large coefficients (i.e. {±b})
Nb : the number of large coefficients per row vector
N1 : the number of small coefficients (i.e. {±1}) per row vector

S = D·I + E, where the noise matrix E is “absolute circulant”: each row holds Nb entries of magnitude b and N1 entries of magnitude 1, and row i of |E| is the cyclic shift by i of its first row.
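To fix ideas, a toy key-generation sketch in Python. The parameter values and the per-row sign sampling are illustrative assumptions, not the submission's exact procedure; only the shape S = D·I + E with |E| circulant is taken from the slides:

import numpy as np

def drs_keygen_sketch(n, D, b, Nb, N1, rng=np.random.default_rng()):
    # One fixed magnitude pattern (pattern[0] = 0 keeps the diagonal at D):
    # Nb entries of size b and N1 entries of size 1 at random off-diagonal spots.
    pattern = np.zeros(n, dtype=np.int64)
    pos = rng.choice(np.arange(1, n), size=Nb + N1, replace=False)
    pattern[pos[:Nb]] = b
    pattern[pos[Nb:]] = 1
    S = (D * np.eye(n)).astype(np.int64)
    for i in range(n):
        signs = rng.choice([-1, 1], size=n)       # fresh random signs per row
        S[i] += signs * np.roll(pattern, i)       # |S - D*I| is circulant
    return S

# Diagonal dominance needs D > b*Nb + N1, e.g.:
S = drs_keygen_sketch(n=16, D=64, b=4, Nb=3, N1=8)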


SLIDES 23–24

Message reduction algorithm

Input: a message m ∈ Z^n, the secret matrix S
Output: a reduced message w such that w − m ∈ L and ‖w‖∞ < D

1: w ← m, i ← 0
2: repeat
3:     w ← w − ⌊wi / D⌋ · si
4:     i ← (i + 1) mod n
5: until ‖w‖∞ < D
6: return w

Intuition: use si to reduce wi.
wi decreases a lot; for j ≠ i, wj increases only a bit.
‖w‖₁ is reduced ⇒ the reduction always terminates!
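A runnable Python version of the reduction, assuming the quotient ⌊wi/D⌋ is taken by truncation toward zero (a plausible reading; the submission fixes the exact convention). It reuses the toy key S from the key-generation sketch above:

import numpy as np

def reduce_message(m, S, D):
    # Returns w with w - m in the row lattice of S and |w|_inf < D.
    # Diagonal dominance (D > b*Nb + N1) makes ||w||_1 shrink, so this terminates.
    w = np.array(m, dtype=np.int64)
    n = len(w)
    i = 0
    while np.max(np.abs(w)) >= D:
        q = (abs(int(w[i])) // D) * (1 if w[i] >= 0 else -1)  # truncate toward zero
        w -= q * S[i]
        i = (i + 1) % n
    return w

# Example: reduce_message(np.arange(16) * 1000, S, D=64) -> all entries in (-64, 64)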


SLIDES 25–27

Resistance to NR attack

The support of w: (−D, D)^n

[Figure: the DRS domain (−D, D)^n containing the parallelepiped P(S)]

The support is “zero-knowledge”, but maybe the distribution is not!


SLIDE 29

Intuition

[Figure: joint distribution of (wi, wj) over (−D, D)², shown for Si,j = −b, Si,j = 0 and Si,j = b: the shape of the distribution depends on Si,j]


SLIDES 30–34

Figure out the model

Can we devise a formula Si,j ≈ f(Wi,j)?

Seems complicated!
• cascading phenomenon: a reduction triggers another one
• other parasite correlations

⇒ Search for the best linear fit f? The search space for all linear f is too large!
⇒ Choose some features {fℓ} and search in span({fℓ}), i.e. f = Σℓ xℓ fℓ


SLIDES 35–36

Training — feature selection

Lower degree moments:

f1(W) = E(wi · wj)
f2(W) = E(wi · |wi|^(1/2) · wj)
f3(W) = E(wi · |wi| · wj)

[Figure: profiles of f1, f2, f3 over the normalized domain (x, y) ∈ (−1, 1)²]

Not enough!



SLIDES 38–39

Training — feature selection

Pay more attention to the central region (i.e. |wi| small).

f4 = E(wi(wi − 1)(wi + 1) · wj)
f5 = E(2wi(2wi − 1)(2wi + 1) · wj | |2wi| ≤ 1)
f6 = E(4wi(4wi − 1)(4wi + 1) · wj | |4wi| ≤ 1)
f7 = E(8wi(8wi − 1)(8wi + 1) · wj | |8wi| ≤ 1)

[Figure: profiles of f4, f5, f6, f7 over the normalized domain (x, y) ∈ (−1, 1)²]

Together with their transposes (i.e. f^t(wi, wj) = f(wj, wi)), we finally selected 7 × 2 − 1 = 13 features in our experiments (f1 is symmetric, so its transpose adds nothing).
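In Python, the seven base features can be estimated from samples as below; this sketch assumes the coordinates have been normalized to (−1, 1) (i.e. divided by D), and reads the conditional expectations as means over the selected samples:

import numpy as np

def base_features(wi, wj):
    # wi, wj: paired arrays of normalized coordinates collected over many signatures
    feats = [np.mean(wi * wj),                      # f1
             np.mean(wi * np.abs(wi) ** 0.5 * wj),  # f2
             np.mean(wi * np.abs(wi) * wj)]         # f3
    for k in (1, 2, 4, 8):                          # f4, f5, f6, f7
        t = k * wi
        sel = np.abs(t) <= 1                        # condition |k*wi| <= 1
        feats.append(np.mean(t[sel] * (t[sel] - 1) * (t[sel] + 1) * wj[sel]))
    return np.array(feats)

# The 13 features: append base_features(wj, wi) and drop the duplicate f1
# (f1 is symmetric in wi, wj).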

SLIDE 40

The model

f = Σℓ xℓ fℓ

[Figure: profile of the fitted model f over the normalized domain (x, y) ∈ (−1, 1)²]
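Fitting the weights xℓ is then a least-squares problem. The training data below is a synthetic placeholder: in the real attack it would come from reduced messages under self-generated keys, where the coefficients Si,j are known; base_features is the helper sketched earlier:

import numpy as np

rng = np.random.default_rng(0)
b = 4  # toy magnitude, matching the key-generation sketch
# Placeholder training data (random, for illustration only):
train_pairs = [(rng.uniform(-1, 1, 10000), rng.uniform(-1, 1, 10000))
               for _ in range(200)]
train_labels = rng.choice([-b, -1, 0, 1, b], size=200)

F = np.vstack([base_features(wi, wj) for (wi, wj) in train_pairs])
s = np.array(train_labels, dtype=float)
x, *_ = np.linalg.lstsq(F, s, rcond=None)   # weights: S_ij ≈ base_features @ x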


SLIDES 41–42

Learning

Let's learn a new S as S′ = f(W)!

[Figure: probability density of f, shown for Si,j = b, −b, 1, −1 and 0]


SLIDES 43–46

Learning

S = D·I + E is “absolute circulant” ⇒ more confidence via diagonal amplification:
focus on absolute values and pool the guesses lying on the same diagonal.

We locate all large coefficients successfully, but we are still missing the signs!
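A sketch of the pooling step, assuming S′ = f(W) has been computed entrywise: since |S| is circulant, every cyclic diagonal carries a single magnitude, so the |S′| guesses can be averaged along each diagonal:

import numpy as np

def amplify_diagonals(S_prime):
    # Cyclic diagonal d holds the entries S[i, (i + d) % n]; averaging |S'|
    # over it turns n noisy per-entry guesses into one confident magnitude.
    n = S_prime.shape[0]
    idx = np.arange(n)
    return np.array([np.mean(np.abs(S_prime[idx, (idx + d) % n]))
                     for d in range(n)])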


SLIDES 47–50

Learning

Now restrict to the large coefficients: Si,j ∈ {±b}.

[Figure: probability density of f for Si,j = b and for Si,j = −b: the two classes are well separated]

We can determine all large coefficients in one row! However, it is still hard to learn the small coefficients...



SLIDES 52–54

Leaks help a lot!

Attack without leaks:
dim = n + 1, short vector of length √(b²·Nb + N1 + 1), cost > 2^128

Naive attack with leaks:
dim = n + 1, short vector of length √(N1 + 1), cost ≈ 2^78

Improved attack with leaks:
dim = n − Nb, short vector of length √(N1 + 1), cost ≈ 2^73

SLIDE 55

Conclusion

We present a statistical attack against DRS: given 100,000 signatures, the security drops below 80 bits, and even lower given the current progress of lattice algorithms.

SLIDE 56

Thank you!
