SLIDE 1

Memory-hard functions and tradeoff cryptanalysis with applications to password hashing, cryptocurrencies, and white-box cryptography

Alex Biryukov, Dmitry Khovratovich, Johann Groschaedl

University of Luxembourg

SLIDE 2

1 Introduction: Passwords; Related applications
2 Memory-hard functions: Toy example; Pebble game; Scrypt; Catena; Argon; Lyra2
3 ASIC implementations: Context; Catena; Argon; Lyra2
4 White-box cryptography: Definitions; White-boxed AES; Memory-hard ciphers

SLIDE 3

Passwords

SLIDE 4

Password evolution

Since the 1960s, multi-user environments have employed password-based authentication:

  • User selects name l and password p;
  • Hash digest H(p) is stored;
  • User sends (l, p) during login;
  • Server matches (l, H(p)) against its database.

Passwords should not be stored in cleartext, and encryption is inconvenient because it requires key material. The server should not know the passwords between sessions, which calls for a preimage-resistant hash function. Weakness?

SLIDE 5

Password evolution

Since the 1960s, multi-user environments have employed password-based authentication:

  • User selects name l and password p;
  • Hash digest H(p) is stored;
  • User sends (l, p) during login;
  • Server matches (l, H(p)) against its database.

Identical passwords have identical hashes. Solution: random salt.

SLIDE 6

Password evolution

Since the 1960s, multi-user environments have employed password-based authentication:

  • User selects name l and password p;
  • Salt s and H(s, p) are stored;
  • User sends (l, p) during login;
  • Server matches (l, H(s, p)) against its database.

Weakness?

SLIDE 7

Password evolution

Since the 1960s, multi-user environments have employed password-based authentication:

  • User selects name l and password p;
  • Salt s and H(s, p) are stored;
  • User sends (l, p) during login;
  • Server matches (l, H(s, p)) against its database.

If the database is stolen, an adversary can test the most popular passwords ("123456", "aaa", etc.). Solution: iterate the hash function multiple times.

SLIDE 8

Password evolution

Since the 1960s, multi-user environments have employed password-based authentication:

  • User selects name l and password p;
  • Salt s and the iterated hash H(H(· · · H(s, p) · · · )) (100000 iterations) are stored;
  • User sends (l, p) during login;
  • Server recomputes the iterated hash and matches it against its database.

Weakness?

SLIDE 9

Password evolution

Since the 1960s, multi-user environments have employed password-based authentication:

  • User selects name l and password p;
  • Salt s and the iterated hash H(H(· · · H(s, p) · · · )) (100000 iterations) are stored;
  • User sends (l, p) during login;
  • Server recomputes the iterated hash and matches it against its database.

An adversary may employ graphics cards (GPUs) and dedicated hardware (FPGAs or even custom ASICs), where password cracking is much cheaper. Solution: use memory-intensive operations, as memory access is expensive on every platform.
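The salted, iterated construction above can be sketched in a few lines of Python. This is a minimal illustration of the slide's H(H(· · · H(s, p) · · · )) scheme; the function names are mine, and real deployments should use a vetted KDF (PBKDF2, bcrypt, scrypt, Argon2) rather than a bare SHA-256 loop:

```python
import hashlib
import os

def hash_password(password: bytes, salt: bytes, iterations: int = 100_000) -> bytes:
    # First application binds in the salt; the rest iterate H,
    # as in the slide's H(H(... H(s, p) ...)) with 100000 iterations.
    digest = hashlib.sha256(salt + password).digest()
    for _ in range(iterations - 1):
        digest = hashlib.sha256(digest).digest()
    return digest

def verify(password: bytes, salt: bytes, stored: bytes, iterations: int = 100_000) -> bool:
    # The server recomputes the iterated hash and compares.
    return hash_password(password, salt, iterations) == stored

salt = os.urandom(16)                      # fresh random salt per user
record = hash_password(b"123456", salt)
assert verify(b"123456", salt, record)
assert not verify(b"letmein", salt, record)
```

The salt defeats precomputed tables across users; the iteration count multiplies the attacker's per-guess cost by 100000, but, as the slide notes, does nothing against hardware that computes SHA-256 very cheaply.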

SLIDE 10

Cryptocurrencies and other applications

SLIDE 11

Bitcoin

Bitcoin chain:

[figure: each block stores the previous block hash (SHA-256), a transaction-data hash (SHA-256), a timestamp, the difficulty, the miner's name, and the 25 BTC reward]

  • Blocks are generated every 10 minutes, each block containing 25 new bitcoins (about 7000 USD).
  • To generate a new block, miners take the hash h (double SHA-256) of the previous block, add a 32-bit nonce n and a timestamp t, and hash it all.
  • A miner wins if H(h, n, t) has 66 leading zeros.
SLIDE 13

Cryptocurrencies

Mining:

  • There is a chain of blocks, generated every 10 minutes, each block containing 25 new bitcoins (about 7000 USD).
  • To generate a new block, miners take the hash h (double SHA-256) of the previous block, add a 32-bit nonce n and a timestamp t, and hash it all.
  • A miner wins if H(h, n, t) has 66 leading zeros.

The same problem: massive computation of SHA-256 is much more efficient on dedicated hardware (2^32 hashes per joule vs. 2^17 hashes per joule). The same proposal: use hash functions that perform a number of memory operations. However, such a function should be much faster than a password hash, so it would process less memory.
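The mining loop described above amounts to a leading-zero search over nonces. A toy Python sketch (the function names, the little-endian packing, and the small nonce range are my simplifications of the real Bitcoin header format):

```python
import hashlib

def leading_zero_bits(digest: bytes) -> int:
    # Count leading zero bits of a digest.
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
        else:
            bits += 8 - byte.bit_length()
            break
    return bits

def mine(prev_hash: bytes, timestamp: int, difficulty_bits: int, max_nonce: int = 1 << 22):
    # Try 32-bit nonces until the double SHA-256 of (h, n, t) meets the target.
    for nonce in range(max_nonce):
        data = prev_hash + nonce.to_bytes(4, "little") + timestamp.to_bytes(4, "little")
        digest = hashlib.sha256(hashlib.sha256(data).digest()).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce, digest
    return None  # no nonce in range; real miners then vary the timestamp or transactions
```

At difficulty d, a miner expects about 2^d attempts per block; since the loop is nothing but SHA-256, hardware that computes SHA-256 cheaply wins, which is exactly the imbalance the slide describes.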

SLIDE 14

Other applications

Other applications of memory-intensive functions:

  • Password-based key derivation: identical to password hashing, but allows a larger running time;
  • Fighting spam¹: a mail sender must perform many memory accesses with provably many cache misses;
  • Proofs of space²: a user regularly proves that he possesses a certain amount of memory.

¹Cynthia Dwork, Andrew Goldberg, and Moni Naor. “On Memory-Bound Functions for Fighting Spam”. In: CRYPTO 2003.
²Stefan Dziembowski et al. “Proofs of Space”. In: IACR Cryptology ePrint Archive, Report 2013/796.

SLIDE 15

Memory-intensive functions

Clearly, memory-intensive functions should not be simulated by functions with modest memory requirements.

SLIDE 16

Memory-hard functions

SLIDE 17

Memory-hard functions

Memory-hard function (informal) — a function f that needs time T and space S for ordinary computation, but time T′ ≫ T whenever space S′ < S. The hardness is captured by a tradeoff curve S·T = f(N).

[figure: time-space tradeoff curve, with the normal computation point marked]

  • There could be other tradeoffs: computation-memory, energy-memory (for implementations), time-area, etc.

SLIDE 18

Toy example

SLIDE 19

Toy example

Hash function with two passes over memory of size N:

  • V_i = F(V_{i−1});
  • V′_N = V_N;
  • V′_i = F(V′_{i+1} || V_i).

[figure: a forward chain of F calls starting from X, then a backward chain producing Y]
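The two-pass recurrence above can be written out directly. A small Python sketch, with SHA-256 standing in for the abstract F (my choice; the slide leaves F unspecified):

```python
import hashlib

def F(*parts: bytes) -> bytes:
    # Stand-in compression function (the slide leaves F abstract).
    h = hashlib.sha256()
    for p in parts:
        h.update(p)
    return h.digest()

def toy_hash(x: bytes, n: int) -> bytes:
    # Forward pass: V_i = F(V_{i-1}); all n blocks are kept in memory.
    v = [x]
    for i in range(1, n + 1):
        v.append(F(v[i - 1]))
    # Backward pass: V'_n = V_n, then V'_i = F(V'_{i+1} || V_i).
    vp = v[n]
    for i in range(n - 1, -1, -1):
        vp = F(vp, v[i])
    return vp
```

The backward pass reads the forward blocks in reverse order, which is what forces the honest evaluator to keep all N blocks around.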

SLIDE 20

Trivial tradeoff

Compute the hash using N/m + m memory units and 3N calls to F (instead of 2N):

  • Store every m-th block;
  • When entering a new interval, precompute its m inputs.

The optimal point is m = √N.

[figure: the chain of F calls with every m-th block stored]
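The checkpointing behind this tradeoff is easy to demonstrate on the first pass. A Python sketch (helper names are mine): store only every m-th block, then recompute any V_i from the nearest stored checkpoint at a cost of at most m − 1 extra F calls.

```python
import hashlib

def F(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def forward_full(x: bytes, n: int) -> list:
    # Honest first pass: keep all n + 1 blocks V_0..V_n.
    v = [x]
    for _ in range(n):
        v.append(F(v[-1]))
    return v

def checkpoints(x: bytes, n: int, m: int) -> dict:
    # Tradeoff first pass: keep only every m-th block (about n/m blocks).
    cp, v = {0: x}, x
    for i in range(1, n + 1):
        v = F(v)
        if i % m == 0:
            cp[i] = v
    return cp

def recover(cp: dict, i: int, m: int) -> bytes:
    # Recompute V_i from the nearest earlier checkpoint: up to m - 1 extra F calls.
    j = (i // m) * m
    v = cp[j]
    for _ in range(i - j):
        v = F(v)
    return v
```

Precomputing a whole interval of m blocks when the second pass enters it, as the slide suggests, keeps the total at 3N calls with N/m + m memory; m = √N balances the two memory terms.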

SLIDE 21

Functions as graphs

The composition of hash functions can be represented as a directed acyclic graph.

[figure: the chain of F calls drawn as a DAG from In to Out]

SLIDE 23

Bounded in-degree

The resulting graphs must have bounded in-degree, otherwise the tradeoff is trivially broken:

[figure: a graph in which one F call takes many inputs at once]

So we limit the number of inputs taken by each subfunction to some k.

SLIDE 24

Pebble game

SLIDE 25

Pebble game

Computation with space complexity S can be modelled as a pebble game with S pebbles:

  • A free pebble can be placed on an input vertex at any time;
  • A pebble can be removed at any time;
  • A free pebble can be placed on any vertex whose predecessors are all pebbled;
  • We win if we pebble all output vertices.

[figure: the game played on the DAG from In to Out, one pebble placed]
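The rules above are easy to encode as a move checker. A minimal Python sketch (the `play` helper and the path-graph example are mine, not from the slides):

```python
def play(moves, graph, inputs):
    # Replay a sequence of pebbling moves and return the peak pebble count.
    # graph maps a vertex to the set of its predecessors.
    pebbled, peak = set(), 0
    for op, v in moves:
        if op == "remove":
            pebbled.discard(v)                     # rule 2: remove at any time
        else:
            preds = graph.get(v, set())
            # rules 1 and 3: inputs are always free; others need all predecessors pebbled
            assert v in inputs or preds <= pebbled, f"illegal move on {v}"
            pebbled.add(v)
            peak = max(peak, len(pebbled))
    return peak, pebbled

# A path 0 -> 1 -> 2 -> 3 can be pebbled with only 2 pebbles by "sliding":
graph = {1: {0}, 2: {1}, 3: {2}}
moves = [("put", 0), ("put", 1), ("remove", 0),
         ("put", 2), ("remove", 1), ("put", 3)]
peak, final = play(moves, graph, inputs={0})
assert peak == 2 and 3 in final
```

Minimizing the peak over all winning move sequences is exactly the space complexity the game models; the following slides play this game frame by frame on the example DAG.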

SLIDE 47

Earlier results

Early results on pebble games on graphs with in-degree k:

  • Every graph with N vertices can be pebbled with c_k · N/log N pebbles³;
  • There exist graphs for which this bound is tight⁴ and the time complexity is superpolynomial in N⁵. The time complexity for pebble numbers between N/log N and N remains unclear.

³John E. Hopcroft, Wolfgang J. Paul, and Leslie G. Valiant. “On Time Versus Space”. In: J. ACM 24.2 (1977), pp. 332–337.
⁴Wolfgang J. Paul, Robert Endre Tarjan, and James R. Celoni. “Space Bounds for a Game on Graphs”. In: Mathematical Systems Theory 10 (1977), pp. 239–251.
⁵Thomas Lengauer and Robert Endre Tarjan. “Asymptotically tight bounds on time-space trade-offs in a pebble game”. In: J. ACM 29.4 (1982), pp. 1087–1130.
SLIDE 48

Parallel pebbling

Nowadays multiple cores are available, hence we can precompute:

[figure: the chain of F calls with several cores recomputing blocks ahead of time]

The scheme latency is still 2N hash calls, since the input addresses are known in advance.

SLIDE 49

Memory-hardness from superconcentrators

Superconcentrators: several layers; every set of l input and l output vertices is connected by l vertex-disjoint paths. Stacks of superconcentrators exhibit nice tradeoffs⁶: T = α^{O(α)} · N. They are interesting candidates for a memory-hard function, but the overhead is too large (40+ layers for 1 GB of RAM).

⁶Thomas Lengauer and Robert Endre Tarjan. “Asymptotically tight bounds on time-space trade-offs in a pebble game”. In: J. ACM 29.4 (1982), pp. 1087–1130.
SLIDE 51

Resilience to parallel hashing

Another problem of superconcentrators: if N cores are available, the time complexity is only log N. There are graphs of size N that cannot be efficiently parallelized⁷. However, they consist of log⁵ N layers, which is prohibitively slow at a memory size of 100 MB.

⁷Joël Alwen and Vladimir Serbinenko. “High Parallel Complexity Graphs and Memory-Hard Functions”. In: IACR Cryptology ePrint Archive, Report 2014/238.

SLIDE 52

Data-dependent addressing: Scrypt

SLIDE 53

Scrypt

Scrypt⁸ — hashing with data-dependent addressing:

  • Sequential initialization: X[i] ← H(X[i − 1]);
  • Pseudo-random walk on X[] (previously suggested by Dwork et al.⁹): for 1 ≤ i ≤ N, A ← H(A ⊕ X[A]);
  • Used in the Litecoin cryptocurrency with moderate N.

[figure: the walk reading X[A] and feeding it into H]

⁸Colin Percival. “Stronger key derivation via sequential memory-hard functions”. In: (2009). http://www.tarsnap.com/scrypt/scrypt.pdf.
⁹Cynthia Dwork, Moni Naor, and Hoeteck Wee. “Pebbling and Proofs of Work”. In: CRYPTO 2005.
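The data-dependent walk can be illustrated with a toy Python sketch. This is a drastic simplification of Percival's design (real scrypt uses Salsa20/8-based BlockMix and ROMix over large blocks); the address derivation from the first 4 bytes of A is my choice:

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def scrypt_like(password: bytes, n: int) -> bytes:
    # Sequential initialization: X[i] = H(X[i-1]).
    x = [H(password)]
    for _ in range(1, n):
        x.append(H(x[-1]))
    # Pseudo-random walk: the address read next depends on the running state A,
    # so the access pattern cannot be predicted without doing the work.
    a = H(x[-1])
    for _ in range(n):
        j = int.from_bytes(a[:4], "big") % n                # "X[A]": address derived from A
        a = H(bytes(p ^ q for p, q in zip(a, x[j])))        # A <- H(A xor X[A])
    return a
```

Because each address is only known once the previous hash is computed, the walk defeats the precomputation trick that worked for data-independent graphs; the price, discussed later, is vulnerability to cache-timing attacks.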

SLIDE 54

Scrypt

Problems:

  • Too many parameters and subfunctions;
  • Allows a trivial tradeoff: ST = O(N²);
  • ASIC implementations demonstrate a 1000x efficiency improvement;
  • Might be subject to cache-based timing attacks¹⁰.

¹⁰Daniel J. Bernstein. Cache-timing attacks on AES. Tech. rep. http://cr.yp.to/antiforgery/cachetiming-20050414.pdf. 2005.

SLIDE 55

Open problems:

1 What are the most efficient memory-hard functions?
2 Do they have to be data-dependent?
3 What are the best tradeoffs we can get?

SLIDE 57

PHC

Password Hashing Competition (2014-2015): the struggle to find faster, more secure, more universal schemes.

  • 22 schemes in the competition;
  • The vast majority claim resilience to GPU/ASIC cracking;
  • Only a few designers really tried to attack their own schemes (standard practice in cryptographic design);
  • We show how to improve such attacks;
  • And we will see how ASIC-equipped adversaries can exploit them.

We considered three schemes, which come out of the academic crypto community and have clear documentation.

SLIDE 58

Catena

SLIDE 60

Bit-reversal permutation

Bit-reversal permutation¹¹:

  • Two layers: vertex i₁i₂ · · · i_n is connected with i_n · · · i₂i₁.

[figure: the 8-vertex bit-reversal graph from In to Out]

Tradeoff: ST = O(N²); equivalently T = αN, where α = N/S is the memory reduction factor.

Can be computed with √N memory and time 2N on multiple cores¹².

  • Stack of such permutations?

¹¹Thomas Lengauer and Robert Endre Tarjan. “Asymptotically tight bounds on time-space trade-offs in a pebble game”. In: J. ACM 29.4 (1982), pp. 1087–1130.
¹²Joël Alwen and Vladimir Serbinenko. “High Parallel Complexity Graphs and Memory-Hard Functions”. In: IACR Cryptology ePrint Archive, Report 2014/238.
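The connection pattern of the permutation layer is just index bit-reversal, which is a few lines of Python (helper name is mine):

```python
def bit_reverse(i: int, nbits: int) -> int:
    # Reverse the nbits-bit binary representation of i: i1 i2 ... in -> in ... i2 i1.
    r = 0
    for _ in range(nbits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

# Output vertex j of the layer reads input vertex bit_reverse(j, n):
assert bit_reverse(0b001, 3) == 0b100
assert bit_reverse(0b110, 3) == 0b011
# Bit reversal is an involution, so applying it twice is the identity:
assert all(bit_reverse(bit_reverse(i, 8), 8) == i for i in range(256))
```

The point of this permutation is that any contiguous run of output vertices reads inputs scattered over the whole input layer, which is what makes small-memory recomputation expensive.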

SLIDE 62

Catena-λ

Catena¹³:

  • Stack of λ bit-reversal permutations (λ = 3, 4): V^L[ABC] = H(V^L[ABC − 1], V^{L−1}[C̄B̄Ā]), where X̄ denotes the bit-reversal of X.

[figure: three stacked bit-reversal layers from In to Out]

  • Full-round hash function (Blake2);
  • Proof of tradeoff resilience (extension of the Lengauer-Tarjan proof for λ = 1): S^λ · T = Ω(N^{λ+1}). A memory fraction of 1/q should imply a penalty of q^λ.

¹³Christian Forler, Stefan Lucks, and Jakob Wenzel. “Catena: A Memory-Consuming Password Scrambler”. In: IACR Cryptology ePrint Archive, Report 2013/525.
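A stack of bit-reversal layers can be sketched as a reference evaluator. This is a toy model, not real Catena: SHA-256 stands in for Blake2b, the slide's tweaks are omitted, and the boundary choice for the first element of each level is my own simplification:

```python
import hashlib

def _H(*parts: bytes) -> bytes:
    h = hashlib.sha256()          # stand-in for Blake2b
    for p in parts:
        h.update(p)
    return h.digest()

def _rev(i: int, nbits: int) -> int:
    # Bit-reverse an nbits-bit index.
    r = 0
    for _ in range(nbits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def catena_like(x: bytes, nbits: int, lam: int) -> bytes:
    n = 1 << nbits
    # Level 0: a plain sequential chain.
    v = [_H(x)]
    for _ in range(n - 1):
        v.append(_H(v[-1]))
    # Levels 1..lam: V^L[i] = H(V^L[i-1], V^{L-1}[rev(i)]).
    for _ in range(lam):
        w = [_H(v[-1], v[_rev(0, nbits)])]   # boundary choice for i = 0 is a simplification
        for i in range(1, n):
            w.append(_H(w[-1], v[_rev(i, nbits)]))
        v = w
    return v[-1]
```

Each level chains sequentially while reading the previous level at bit-reversed positions, which is the structure the attack on the next slides exploits.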

SLIDE 65

Observation

Apparently, the proof has a flaw.

[figure: a stack of five 16-vertex bit-reversal layers]

  • Consider vertices [AB0], [AB1], [AB2], . . ., where B has n − 2k bits and the other parts are k-bit;
  • To compute [ABC] at level T, we need [C̄B̄Ā] at level T − 1;
  • [C̄B̄Ā] refers back to [ABC] at level T − 2;
  • Note that the middle part is always either B or B̄.
SLIDE 66

Tradeoff cryptanalysis

[figure: the five-layer bit-reversal stack]

Efficient computation of [AB∗] at level 4:

  • Suppose that we have stored all vertices [∗∗0] at all levels (2^{n−k} vertices per level);
  • Compute [∗B∗] at level 0 (2^{2k} steps);
  • Use these values to compute [∗B∗] at level 1 (2^{2k} steps);
  • Use these values to compute [∗B∗] at level 2 (2^{2k} steps);
  • Use these values to compute [∗B̄Ā] at level 3 (2^{2k} steps);
  • Use these values to compute [AB∗] at level 4 (2^k steps).

In total 3.5 · 2^{2k} hashes for 2^k vertices.

SLIDE 68

Cryptanalysis-II

Eventually we have the following penalties for l < n/3 − 2:

Memory fraction | Catena-3 penalty | Catena-4 penalty
1/2             | 7.4              | 13.1
1/4             | 15.5             | 26.1
1/8             | 30.1             | 51.5
1/2^l           | 2^{l+1.9}        | 2^{l+2.7}

So the penalty is 4q for memory fraction 1/q. Tradeoff for Catena-3: ST ≤ 16N².

SLIDE 69

Argon

SLIDE 70

Argon

Argon¹⁴:

[figure: grid of Mix operations over the memory matrix]

Blockcipher-based design:

  • An n × 32 matrix of 16-byte blocks;
  • Row-wise nonlinear transformation (48 reduced AES cores and a linear layer) with guaranteed branch number (at least 8 inputs for 1 output);
  • Column-wise permutation (n data-dependent swaps based on the RC4 permutation).

¹⁴Alex Biryukov and Dmitry Khovratovich. Argon: password hashing scheme. Tech. rep. https://www.cryptolux.org/images/0/0c/Argon-v1.pdf. 2014.
SLIDE 73

Tradeoff

[figure: grid of Mix operations]

When trying to attack, apply the following strategy:

  • Store the permutations, not the blocks (about 1/2 of total memory);
  • When an element is needed, recompute it;
  • Parallelize the RC4 permutation: ≈ 250 elements can be read in parallel without bank collisions.

[figure: recomputation tree: 1 lookup costs 2 AES calls, 8 lookups cost 16, 64 lookups cost 128, then 512 AES calls]

Last level: one memory access is replaced with a depth-7 tree of 5-round AES calls, which increases latency a few times.

SLIDE 74

Further tradeoff

[figure: the recomputation tree over the Mix operations]

If the last permutation cannot be stored, it has to be recomputed each time we need an element: a 2^18 increase in latency.

slide-75
SLIDE 75

Computational penalties

Penalties slightly depend on the memory size: Fraction \ Memory 16 MB 128 MB 1 GB

1 2

139 160 180

1 4

218 226 234

1 8

231 236 247 Tradeoff: T = N4 (cN)3N/M

SLIDE 76

Lyra2

SLIDE 77

Lyra2

Lyra2 [Simplicio-Almeida-Andrade-dos Santos-Barreto'13]:

[figure: an R × 64 matrix of 64-byte blocks]

Two phases:

  • Setup phase: deterministic generation and update of rows;
  • Wandering phase (T ≥ 1 iterations): sequential and pseudorandom updates in parallel.

Claims high speed (1.2 GB/sec for T = 1).

SLIDE 78

Setup phase

F — stateful function with a 128-byte state (a sponge construction based on the hash function Blake2b):

M[i] ← F(M[i − 1], M[2k − i]);  M[2k − i] ⊕= M[i].

[figure: F reading and updating rows M[5]-M[14]]

SLIDE 79

Setup phase

Overall picture:

[figure: the R rows filled during the setup phase]

SLIDE 80

Tradeoff analysis: Setup phase

Strategy:

  • Store the first 2^l rows;
  • Store every q-th row.

Then q consecutive rows are determined from q(r − l) previous rows, which are precomputed.

[figure: indices i, 2^r − i, 2^{r−1}, q, 2^l marked on the row array]

The setup phase can be computed with little penalty and little memory.

SLIDE 81

Wandering phase

Wandering phase: M[i] ← F(M[i − 1], M[r_i]);  M[r_i] ⊕= M[i]. Here r_i is a pseudorandom function of M[i − 1] (i.e., determined only at the time of computation).

[figure: block i depends on block i − 1 and the pseudorandom block r[i]]

SLIDE 82

Tradeoff analysis: Wandering phase

The pseudo-random dependencies seem to impose prohibitive penalties:

[figure: recomputation trees growing from block i]

The trees may cover the entire matrix.

SLIDE 83

Tradeoff analysis: Wandering phase

First idea: split the computation into levels and store all links within a level.

[figure: the computation split at n/4, n/2, 3n/4, n]

SLIDE 84

Tradeoff analysis: Wandering phase

Second idea: store everything that refers to the most expensive rows (keep a top-10% list).

[figure: rows 1..R with the top 10% marked]

SLIDE 85

Tradeoff analysis: Wandering phase

Third idea: note that rows are updated column-wise. This is good for the CPU cache, but even better for ASIC-equipped adversaries:

  • Store the initial state of each row;
  • Compute the new row column-wise;
  • Then the extra latency is introduced before the first column only.

[figure: a recomputation tree of depth d incurs delay d before the first column, delay 1 afterwards]

SLIDE 86

Penalties

Setup phase:

Memory fraction | Penalty
1/2             | 1.5
1/4             | 2
1/8             | 3
1/16            | 4

Wandering phase (T = 1):

Memory fraction | Penalty
1/2             | 2
1/4             | 6.6
1/8             | 111.7
1/16            | 2^16

When we combine the two phases, we count how many intervals of length q are accessed during the Wandering phase. Total:

Memory fraction | Penalty
1/2             | 118
1/3             | 602
1/4             | 2241
1/6             | 14801

SLIDE 87

Overall

Catena, Argon, and Lyra2 tradeoffs for 1 GB:

Memory fraction | Catena-3 | Argon  | Lyra2 (T = 1)
1/2             | 7.4      | 180    | 118
1/3             | 11.2     | 2^29.5 | 602
1/4             | 15.5     | 2^34   | 2241
1/8             | 30.1     | 2^47   | 2^18

SLIDE 88

Optimal ASIC implementations

SLIDE 90

Password crackers

History of password crackers:

  • 70s-90s: regular desktops;
  • 00s: GPUs and FPGAs;
  • 10s: dedicated hardware?

Let us figure out how a rich adversary would build his password cracker.

SLIDE 92

ASIC

ASIC (application-specific integrated circuit) — dedicated hardware:

  • Large design costs (millions of USD);
  • High production costs in small quantities;
  • The most energy-efficient systems.

When passwords are of high value, an adversary may want to design a password-cracking scheme:

  • Parallelism in computations;
  • Parallelism in memory access (very difficult on all other architectures);
  • In the long term, electricity will dominate the costs.

So let us minimize the energy needed to test a single password.

SLIDE 93

Straightforward implementation

A straightforward implementation of a password hashing scheme typically has a huge memory block and a small computational core.

[figure: chip layout with a large Memory block and a small Core]

SLIDE 95

Tradeoff implementation

Less memory, more computation:

[figure: a smaller Memory block, the Core, and extra recomputation cores g]

The time may not grow:

  • If the transformations are data-independent, they can be precomputed. Protection against cache-based timing attacks thus makes a scheme more vulnerable to tradeoff attacks.
  • Data-dependent transformations introduce some latency. However, at the other tree levels all data dependencies are known.

SLIDE 96

Tradeoff evaluation

What determines the cracking cost? The following metrics can be used:

  • Computational complexity (total number of operations). Rather easy to compute, but inaccurate for memory-hard functions.
  • Time × area. A good approximation of energy consumption if all elements consume the same energy. Requires the latencies and area requirements of all operations.
  • Energy cost. More relevant when idle memory, active memory, and logic consume different power (the case for static RAM). Requires the energy requirements of all elements.

slide-97
SLIDE 97

Our assumptions

So far no one has placed that much memory on a single ASIC, so the exact behaviour of such a chip is unknown. We make the following assumptions:

  • Static RAM is more energy-efficient;
  • The memory can be partitioned into 2^16 banks (two levels of hierarchy);
  • All banks can be read and written in parallel with an average latency of 3 cycles;
  • We ignore the area of the communication wires between memory and computational cores. This is fine for our 2^16 memory banks and not so many cores, but can be a problem for much denser structures.

slide-98
SLIDE 98

Energy model

Energy model:

E = L·T + N_M·E_M + N_C·E_C

where E is the total energy, L the static RAM leakage power, T the scheme running time, N_M the number of memory accesses, E_M the energy of one memory access, N_C the number of hash calls, and E_C the energy of one hash call.

Three main contributors to the energy cost:

  • Leakage power of static RAM;
  • Memory access energy;
  • Hash computation energy.
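The model is easy to mechanize. The function below is a direct transcription of the formula; the parameter names and the round example numbers (1 pJ per byte read, 0.8 nJ per hash call, taken from the reference-platform figures later in the deck) are ours:

```python
def total_energy(leak_w, time_s, mem_ops, e_mem_j, hash_calls, e_hash_j):
    """Energy model E = L*T + N_M*E_M + N_C*E_C: leakage power times running
    time, plus memory accesses times per-access energy, plus hash calls
    times per-call energy."""
    return leak_w * time_s + mem_ops * e_mem_j + hash_calls * e_hash_j

# Example: 1 W of leakage for 240 s, 6e9 byte reads at 1 pJ each,
# 6e7 hash calls at 0.8 nJ each. Leakage dominates here.
print(total_energy(1.0, 240, 6e9, 1e-12, 6e7, 0.8e-9))
```

Plugging in real per-scheme numbers reproduces the per-password energies in the tables that follow.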
slide-99
SLIDE 99

Reference platform

We take the best published implementations and scale them to the reference platform: 65 nm CMOS technology, 1.2 V supply voltage, 400 MHz frequency.

  • AES: scaling down a 22 nm, 1 GHz implementation; 1 cycle per round;
  • Blake2b: scaling up and doubling a 90 nm, 286 MHz implementation of Blake-32; 2 cycles per round;
  • Static RAM: a 65 nm, 850 MHz implementation.
slide-100
SLIDE 100

Reference platform

Primitive                    Power    Area      Latency (cycles)
AES (full)                   32 mW    17.5 kGE  10
Blake2b (full)               13.3 mW  19 kGE    20
16 KB (32-bit) memory bank   12.6 µW  192 kGE   3

Operation                    Energy
1 Gcall (2^30) of AES        800 mJ
1 Gcall of Blake2b           867 mJ
1 GB of memory reads/writes  1 mJ

Therefore, an AES core is equivalent to 700 bytes in area. One run of AES costs as much as reading 800 bytes.
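The last equivalence follows directly from the table (a back-of-envelope check, taking "G" as 2^30 throughout):

```python
aes_per_call = 800e-3 / 2**30   # J per AES call: 1 Gcall costs 800 mJ
read_per_byte = 1e-3 / 2**30    # J per byte: 1 GB of reads costs 1 mJ

# One full AES run costs as much energy as reading this many bytes:
print(round(aes_per_call / read_per_byte))  # 800
```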
slide-101
SLIDE 101

ASIC implementations of tradeoffs

slide-102
SLIDE 102

Catena

slide-103
SLIDE 103

Catena

Catena uses data-independent addressing.

  • We precompute the hash tree by the time it is needed;
  • If the memory is reduced by a factor of q, we add λq Blake2 cores on the chip.

1 GB Catena-3:

Total energy      Time     Memory fraction  Read   Memory energy  Blake Gcalls  Blake energy
192 J             240 sec  1                6 GB   192 J          0.06          54 mJ
(192/q + 0.2q) J  240 sec  1/q              12 GB  (192/q) J      0.25q         0.2q J

Optimal tradeoff point: q = 32, 12 J per password.
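The optimal point can be recovered by minimizing the per-password energy from the table over power-of-two reduction factors (a sketch of the slide's 192/q + 0.2q model):

```python
def catena_energy(q):
    """Per-password energy (J) for 1 GB Catena-3 with memory reduced q times:
    192/q J of memory energy plus 0.2*q J of extra Blake2 computation."""
    return 192 / q + 0.2 * q

# Restrict to power-of-two reduction factors q = 1, 2, ..., 128.
best = min((2**i for i in range(8)), key=catena_energy)
print(best, round(catena_energy(best), 1))  # 32 12.4
```

The memory term and the recomputation term balance near q = 32, giving roughly 12 J per password, as stated above.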

slide-104
SLIDE 104

Argon

slide-105
SLIDE 105

ASIC implementation

We use the following strategy:

  • Always use 2^10 AES cores for the Mix operation; this makes the latency of the AES part very low;
  • When 1/2 of the memory is used, the latency grows by a factor of 6;
  • When 1/3 of the memory is used, the latency further grows by a factor of 2^23.

slide-106
SLIDE 106

ASIC implementation

We use the following strategy:

  • Always use 2^10 AES cores for the Mix operation; this makes the latency of the AES part very low;
  • When 1/2 of the memory is used, the latency grows by a factor of 6;
  • When 1/3 of the memory is used, the latency further grows by a factor of 2^23.

1 GB Argon:

Total energy  Time      Memory fraction  Read     Read energy  AES (5 rounds) Gcalls  AES energy
209 mJ        0.02 sec  1                21 GB    34 mJ        0.43                   175 mJ
33 J          0.1 sec   1/2              10.3 GB  52 mJ        83                     33 J
139 MJ        8 hrs     1/3              2 PB     10 kJ        2^28.5                 139 MJ

Efficiency drops very quickly.
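The AES column is roughly reproducible from the reference-platform figures (our own sketch: 1 Gcall of full 10-round AES costs 800 mJ, so we assume 5 rounds cost about half that per Gcall):

```python
# Assumed per-Gcall cost of 5-round AES: half the full 10-round 800 mJ.
aes5_j_per_gcall = 0.8 / 2

# Gcalls from the three rows of the Argon table above.
for gcalls in (0.43, 83, 2**28.5):
    print(gcalls * aes5_j_per_gcall)
```

The results land in the same ballpark as the slide's 175 mJ, 33 J, and 139 MJ figures.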

slide-107
SLIDE 107

Lyra2

slide-108
SLIDE 108

ASIC implementation

Lyra2:

  • Store the initial values of the state (sponge);
  • Compute the dependency tree columnwise;
  • Quite a large penalty for the setup phase, subject to improvement.

Energy  Time      Memory fraction  Read     Read energy  Blake (1 round) Gcalls  Blake energy
71 mJ   0.08 sec  1                6 GB     68 mJ        0.03                    3 mJ
318 mJ  0.10 sec  1/2              7.4 GB   50 mJ        3.7                     269 mJ
1.4 J   0.15 sec  1/3              37.6 GB  77 mJ        18.8                    1.4 J
5.2 J   0.17 sec  1/4              140 GB   173 mJ       70                      5.1 J

The memory-full implementation is energy-efficient. The growth is not that high, though.

slide-109
SLIDE 109

Lyra2

Is Lyra2 secure?

slide-110
SLIDE 110

Lyra2

Is Lyra2 secure? Depends on the metric.

Time×area  Memory fraction  Memory size (MGE)  Cores  Cores size (MGE)
120        1                1536               1      0.02
80         1/2              768                179    3.2
76         1/3              512                642    11.6
71         1/4              384                2079   37.6
91         1/5              307                2992   54.2

Optimal point at 1/4 of memory, as below that point the scheme execution time grows too fast.

slide-111
SLIDE 111

Memory-hard ciphers for white-box cryptography

slide-112
SLIDE 112

White-box implementation

WBC centers around the white-box implementation:

  1. Obfuscated software implementation of a cipher (encryption or decryption routine) with an embedded key;
  2. The implementation is assumed to be available to an adversary;
  3. Goal: prevent the adversary from getting the key (key-recovery security) or from inverting/decrypting the cipher (non-invertibility).

slide-113
SLIDE 113

White-box implementation

WBC centers around the white-box implementation:

  1. Obfuscated software implementation of a cipher (encryption or decryption routine) with an embedded key;
  2. The implementation is assumed to be available to an adversary;
  3. Goal: prevent the adversary from getting the key (key-recovery security) or from inverting/decrypting the cipher (non-invertibility).

Similar to public-key cryptography (RSA). Why not use it?

slide-114
SLIDE 114

White-box implementation

WBC centers around the white-box implementation:

  1. Obfuscated software implementation of a cipher (encryption or decryption routine) with an embedded key;
  2. The implementation is assumed to be available to an adversary;
  3. Goal: prevent the adversary from getting the key (key-recovery security) or from inverting/decrypting the cipher (non-invertibility).

Similar to public-key cryptography (RSA). Why not use it?

  • RSA-2048 encryption speed — 1000 CPU cycles per byte.
  • AES-128 encryption speed — 0.7 CPU cycles per byte.
slide-115
SLIDE 115

White-box implementation

WBC centers around the white-box implementation:

  1. Obfuscated software implementation of a cipher (encryption or decryption routine) with an embedded key;
  2. The implementation is assumed to be available to an adversary;
  3. Goal: prevent the adversary from getting the key (key-recovery security) or from inverting/decrypting the cipher (non-invertibility).

Similar to public-key cryptography (RSA). Why not use it?

  • RSA-2048 encryption speed — 1000 CPU cycles per byte.
  • AES-128 encryption speed — 0.7 CPU cycles per byte.

Impractical for large amounts of data. So one more requirement:

  4. Performance loss should be minimal.
slide-116
SLIDE 116

Key-recovery security

Traditionally, obfuscation aims for non-invertibility, and the resulting constructions are immensely impractical.

slide-117
SLIDE 117

Key-recovery security

Traditionally, obfuscation aims for non-invertibility, and the resulting constructions are immensely impractical. The weaker notion of key-recovery security: an adversary cannot extract the key from the code.

  • Chronologically the first definition;
  • Apparently easier to achieve;
  • Practically relevant when the code cannot be extracted and isolated easily (code lifting).

Trivial solution: hash K with H() before using it in AES. Then recovery of K requires a preimage attack on H.
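The trivial solution is a one-liner (a sketch; SHA-256 stands in for H, and the derived value is what would be embedded as the AES key):

```python
import hashlib

def derived_key(master_key: bytes) -> bytes:
    """Use H(K) in place of K: recovering K from the embedded value
    now requires a preimage attack on H (SHA-256 here)."""
    return hashlib.sha256(master_key).digest()[:16]  # 128-bit AES key

print(derived_key(b"secret").hex())
```

Note that this only protects K itself; the derived key is still compact, which is why this trick fails against the stronger weak white-box notion discussed later.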

slide-118
SLIDE 118

White-box implementations of AES

slide-119
SLIDE 119

AES

AES-128 (designed in 1997, adopted in 2001): a 10-round cipher with a 16-byte state. One round of AES acts on four 32-bit columns:

  • AddRoundKey (simple XOR);
  • SubBytes (bytewise nonlinear);
  • ShiftRows (byte permutation);
  • MixColumns (linear).

[Figure: one AES round — subkey injection K, nonlinear S-box layer, linear MixColumn — viewed as four 32-bit S-boxes S_32.]

slide-120
SLIDE 120

White-boxing AES round

[Figure: the original AES round (S-boxes, MixColumn, subkey K) wrapped between secret nonlinear layers S_1, ..., S_4, S'_1, ..., S'_4 and secret linear layers A, A_1, ..., A_4; the secret layers collapse to an SAS structure, exposed as four key-dependent 32-bit lookup tables (Tables 1–4).]

  • Wrap the key addition and S-boxes with redundant linear and nonlinear transformations;
  • The secret layers collapse to the SAS structure;
  • Replace every 32-bit block with a lookup table and claim that the key cannot be extracted from it;
  • Store everything in memory.
slide-121
SLIDE 121

White-boxing AES round

[Figure: the original AES round (S-boxes, MixColumn, subkey K) wrapped between secret nonlinear layers S_1, ..., S_4, S'_1, ..., S'_4 and secret linear layers A, A_1, ..., A_4; the secret layers collapse to an SAS structure, exposed as four key-dependent 32-bit lookup tables (Tables 1–4).]

  • Wrap the key addition and S-boxes with redundant linear and nonlinear transformations;
  • The secret layers collapse to the SAS structure;
  • Replace every 32-bit block with a lookup table and claim that the key cannot be extracted from it;
  • Store everything in memory.

The actual proposal used smaller and weaker tables.

slide-122
SLIDE 122

Attacks

All such constructions were broken:

  • Attack on the first variant [Billet’04];
  • Improved variants [Bringer’06, Karroumi’11];
  • Attacks on improved variants [DeMulder’10,’12,’13].
slide-123
SLIDE 123

White-boxing AES round

Because the SAS structure is insecure:

[Figure: the same white-boxed AES round, reduced to its SAS structure of key-dependent 32-bit lookup tables.]

Only three secret layers are insufficient to hide the internal components.

slide-124
SLIDE 124

SASAS is insecure

A generic attack exists on a 5-layer scheme where all components are key-dependent and unknown to the adversary [15]:

  • S — a layer of nonlinear S-boxes;
  • A — an affine layer.

[Figure: the SASAS scheme — nonlinear, affine, nonlinear, affine, nonlinear layers.]

Attack:

  • Assume we can query the black-box SASAS in both directions;
  • Remove the outer nonlinear layers with the multiset attack;
  • Remove the linear layers with a low-rank detection technique.

[15] Alex Biryukov and Adi Shamir. "Structural Cryptanalysis of SASAS". In: EUROCRYPT 2001.
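To make the object of the attack concrete, here is a toy 8-bit SASAS instance (an illustration of the structure only, not of the attack; all layer choices are ours):

```python
import random

random.seed(1)

def rand_sbox():
    """Secret nonlinear layer: a random permutation of the 256 byte values."""
    p = list(range(256))
    random.shuffle(p)
    return p

def full_rank(rows):
    """Gaussian elimination over GF(2): are the 8 row vectors independent?"""
    m = list(rows)
    for col in range(8):
        piv = next((i for i in range(col, 8) if (m[i] >> col) & 1), None)
        if piv is None:
            return False
        m[col], m[piv] = m[piv], m[col]
        for i in range(8):
            if i != col and (m[i] >> col) & 1:
                m[i] ^= m[col]
    return True

def rand_affine():
    """Secret affine layer: an invertible 8x8 bit matrix plus a constant."""
    while True:
        rows = [random.randrange(256) for _ in range(8)]
        if full_rank(rows):
            break
    c = random.randrange(256)
    def apply(x, rows=rows, c=c):
        y = 0
        for i, r in enumerate(rows):
            y |= (bin(r & x).count("1") & 1) << i  # GF(2) dot product
        return y ^ c
    return apply

S1, S2, S3 = rand_sbox(), rand_sbox(), rand_sbox()
A1, A2 = rand_affine(), rand_affine()

def sasas(x):
    """S, then A, then S, then A, then S — all five layers secret."""
    return S3[A2(S2[A1(S1[x])])]

# The composition is a bijection on bytes, queryable in both directions,
# exactly the black box the structural attack assumes.
assert sorted(sasas(x) for x in range(256)) == list(range(256))
print("SASAS is a permutation of 256 values")
```

The attack's point is that even though every individual layer here is secret, oracle access to `sasas` (and its inverse) suffices to peel the layers off.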

slide-125
SLIDE 125

ASASA

However, the ASASA structure is still unbroken:

[Figure: the ASASA scheme — affine, nonlinear, affine, nonlinear, affine layers.]

Hints for new designs...

slide-126
SLIDE 126

Memory-hard ciphers

slide-127
SLIDE 127

White-box implementation from scratch

Problems with existing ciphers:

  • Not designed with white-box implementations in mind;
  • Even key-recovery security is difficult to achieve.

What if we make a white-box suitable cipher from scratch?

slide-128
SLIDE 128

Weak white-box security

It should be infeasible to derive a key or any other compact secret information from the WB implementation. Using Hash(K) instead of K clearly does not help.

slide-129
SLIDE 129

Memory-hard implementation

The implementation is memory-hard if it requires a pre-specified and large enough amount of memory (like our password hashing schemes).

slide-130
SLIDE 130

Memory-hard cipher

[Figure: an R×R array of subciphers E_{1,1}, ..., E_{R,R} (R subciphers per iteration, R iterations), interleaved with public linear layers L; each subcipher has an A-S-A-S-A structure.]

Our concept:

  • The cipher is composed of smaller d-bit subciphers (8 ≤ d ≤ 28);
  • The parameter d determines the implementation size;
  • Subcipher invocations alternate with public permutations (L).
slide-131
SLIDE 131

Memory-hard cipher

[Figure: the same R×R array of subciphers E_{i,j} interleaved with public linear layers L.]

Details:

  • Each subcipher has the ASASA or ASASASA structure;
  • Subciphers are exposed as lookup tables of size from 2 MB to 20 GB;
  • The black-box implementation is within 100 KB;
  • The A- and S-layers cannot be extracted out of the lookup table faster than in 2^64 (ASASA) or 2^128 (ASASASA) operations.
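For intuition on the table sizes, consider storing a single d-bit subcipher naively as 2^d entries of ⌈d/8⌉ bytes (our back-of-envelope sketch; the actual designs reach the 2 MB–20 GB range by tuning d and the number of tables):

```python
def lut_bytes(d: int) -> int:
    """Naive size of one d-bit bijection as a table: 2**d entries,
    each d bits, rounded up to whole bytes."""
    return 2**d * ((d + 7) // 8)

for d in (8, 16, 20, 24, 28):
    print(d, lut_bytes(d))  # d = 28 already gives a 1 GiB table
```

This is why d directly controls the white-box implementation size while the black-box description (the layers themselves) stays tiny.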

slide-132
SLIDE 132

Questions?