SLIDE 1
Fitness Comparison by Statistical Testing in Construction of SAT-Based Guess-and-Determine Cryptographic Attacks
Artem Pavlenko, Maxim Buzdalov, Vladimir Ulyantsev
GECCO 2019, July 16
SLIDE 2 Symmetric cryptography
Plaintext:  0 0 1 0 1 1 1 0
Keystream:  1 0 0 1 1 0 1 1
Ciphertext: 1 0 1 1 0 1 0 1
Keystream:  1 0 0 1 1 0 1 1
Plaintext:  0 0 1 0 1 1 1 0
Alice wants to send a secret message to Bob. To do that, she generates a random sequence. Then she applies bitwise XOR. Bob also generates the same random sequence... and also applies bitwise XOR to recover the message. The keystreams should be identical and have no regularity.
Initial state: 1 1 1 0 0 → generator
1 / 13
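The XOR scheme on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the bit values are the ones shown on the slide, while the helper name `xor_bits` is an assumption made for the example.

```python
def xor_bits(a, b):
    """Bitwise XOR of two equal-length bit lists."""
    if len(a) != len(b):
        raise ValueError("bit sequences must have the same length")
    return [x ^ y for x, y in zip(a, b)]

# Values taken from the slide.
plaintext = [0, 0, 1, 0, 1, 1, 1, 0]
keystream = [1, 0, 0, 1, 1, 0, 1, 1]

ciphertext = xor_bits(plaintext, keystream)  # Alice encrypts
recovered = xor_bits(ciphertext, keystream)  # Bob decrypts with the same keystream

print(ciphertext)
print(recovered == plaintext)
```

Because XOR is its own inverse, applying the same keystream twice returns the original message, which is why both sides must produce identical keystreams.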
SLIDE 9 Attack on the keystream generator
Part of plaintext:  0 0 1 0 1
Part of ciphertext: 1 0 1 1 0
Part of keystream:  1 0 0 1 1
Eve has eavesdropped matching parts of the plaintext and the ciphertext. She applies bitwise XOR to reveal a part of the keystream.
Initial state: ? ? ? ? ? → generator
The generator is known. Eve needs to restore the initial state, so that the rest of the transmission is cracked.
2 / 13
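Eve's first step can be reproduced directly from the numbers on the slide. The variable names below are chosen for this sketch only.

```python
# Matching fragments that Eve has intercepted (values from the slide).
known_plaintext = [0, 0, 1, 0, 1]
known_ciphertext = [1, 0, 1, 1, 0]

# XOR of plaintext and ciphertext cancels the message and leaves the keystream.
keystream_part = [p ^ c for p, c in zip(known_plaintext, known_ciphertext)]
print(keystream_part)
```

The recovered keystream fragment is the only input Eve has for restoring the initial state of the generator.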
SLIDE 13
Example of a keystream generator: Trivium-64
3 / 13
SLIDE 14
Algebraic cryptanalysis
Initial state: x0 x1 x2 x3 x4 → Keystream generator → Produced keystream: z0 z1 z2 z3 z4
SAT formula generator (yi are auxiliary variables):
f(x0, ..., xn, y0, ..., ym, z0, ..., zk) = true
Actual keystream: 1 1 1 → SAT solver → Cracked state: 1 1 1 0 0
4 / 13
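To make the role of the auxiliary variables yi concrete, here is a small sketch of how a single XOR gate y = a XOR b could be written as CNF clauses. The DIMACS-style signed-integer literals and the function names are assumptions for this example; the slides do not specify the encoding actually used.

```python
from itertools import product

def xor_gate_cnf(a, b, y):
    """CNF clauses stating y <-> (a XOR b); literals are signed variable ids."""
    return [[-a, -b, -y], [a, b, -y], [a, -b, y], [-a, b, y]]

def satisfied(clauses, assignment):
    """True if every clause contains at least one literal that holds."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

# Variables 1 and 2 play the role of state bits, variable 3 is auxiliary.
clauses = xor_gate_cnf(1, 2, 3)
models = [
    bits
    for bits in product([False, True], repeat=3)
    if satisfied(clauses, dict(zip((1, 2, 3), bits)))
]
print(models)
```

Exactly the assignments where the third variable equals the XOR of the first two survive, so chaining such gates describes the whole generator as one formula.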
SLIDE 19
Guess-and-determine attacks
Standard way to solve SAT problems
◮ Take the formula
◮ Pass it to the SAT solver
A possible alternative when solving hard SAT problems
◮ Choose a subset B of the formula’s variables – the guessed bit set
◮ Iterate over all 2^|B| combinations of their values
◮ For each combination:
  ◮ Take the formula, substitute these variables with their values
  ◮ Pass it to the SAT solver
  ◮ If solution found, terminate
◮ Sometimes this is faster. In cryptanalysis, it happens quite often
5 / 13
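The guess-and-determine loop on this slide can be sketched as follows. The tiny recursive solver stands in for a real SAT solver, and the formula, the guessed set and all function names are hypothetical, chosen only to keep the example self-contained.

```python
from itertools import product

def simplify(clauses, assignment):
    """Substitute assigned variables; return None if a clause becomes empty."""
    reduced = []
    for clause in clauses:
        remaining, is_true = [], False
        for lit in clause:
            var = abs(lit)
            if var in assignment:
                if assignment[var] == (lit > 0):
                    is_true = True
                    break
            else:
                remaining.append(lit)
        if is_true:
            continue
        if not remaining:
            return None  # conflict: every literal of this clause is false
        reduced.append(remaining)
    return reduced

def solve(clauses):
    """Toy DPLL-style search (a stand-in for a real SAT solver)."""
    if not clauses:
        return {}
    # Prefer a unit clause if there is one, otherwise branch on any literal.
    lit = next((c[0] for c in clauses if len(c) == 1), clauses[0][0])
    for value in (lit > 0, lit < 0):
        reduced = simplify(clauses, {abs(lit): value})
        if reduced is not None:
            model = solve(reduced)
            if model is not None:
                model[abs(lit)] = value
                return model
    return None

def guess_and_determine(clauses, guessed):
    """Try all 2^|B| assignments of the guessed variables, solving each rest."""
    for values in product([False, True], repeat=len(guessed)):
        guess = dict(zip(guessed, values))
        reduced = simplify(clauses, guess)
        if reduced is None:
            continue  # this guess contradicts the formula outright
        model = solve(reduced)
        if model is not None:
            model.update(guess)
            return model  # solution found, terminate
    return None

# Hypothetical formula over variables 1..3, with B = {1, 3}.
clauses = [[1, 2], [-1, 3], [-2, -3], [1, -3]]
model = guess_and_determine(clauses, guessed=[1, 3])
print(model)
```

Each substituted formula is smaller than the original, which is where the potential speed-up comes from; the price is up to 2^|B| solver calls.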
SLIDE 21
Attack time of a guess-and-determine attack
Several definitions are possible. We use the following:
◮ Assume the keystream is infinite
◮ Set a time limit T for an attempt to solve one piece
  ◮ Found a solution within T → congratulations!
  ◮ Did not manage to find one → continue with the next piece
◮ Let p be the (very small) probability that we find a solution:
  ◮ Expected time of an attack: T/p
  ◮ Time with 95% confidence: ≈ 3T/p
What is a good time of an attack?
◮ Any non-trivial result is important
◮ Example: “SHA-1 collisions now 2^52”
◮ A hint of a weakness → move to non-compromised ciphers before it is too late!
6 / 13
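The two attack-time figures on this slide follow from treating the attempts as independent trials with success probability p. The sketch below assumes exactly that model; the function names and the sample values of T and p are illustrative only.

```python
import math

def expected_attack_time(T, p):
    """Mean of a geometric number of attempts, each costing at most T."""
    return T / p

def attack_time_with_confidence(T, p, confidence=0.95):
    """Time after which at least one attempt succeeded with given confidence.

    Solves (1 - p)^k = 1 - confidence for the number of attempts k.
    """
    attempts = math.log(1 - confidence) / math.log1p(-p)
    return attempts * T

T, p = 10.0, 1e-6  # hypothetical values
ratio = attack_time_with_confidence(T, p) / expected_attack_time(T, p)
print(ratio)
```

For small p the ratio approaches ln 20, which is just under 3, matching the ≈ 3T/p figure on the slide.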
SLIDE 29
How to measure the attack time
Direct measurement?
◮ Well, possible, but it will take way too long
Clever indirect measurement
◮ A Monte-Carlo technique
  ◮ Generate a random initial state
  ◮ Compute the keystream of the needed length
  ◮ Submit it as an input to the solver. We know that the solution exists!
  ◮ Set the time limit equal to T/2^|B|
  ◮ The solver may find the solution, or may fail to meet the time limit
◮ N measurements, N+ successes → the attack efficiency is ≈ (N / N+) · T
◮ Estimation of the attack time just got 2^|B| times faster!
7 / 13
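The Monte-Carlo estimate above can be sketched as a short function. The callable `sample_succeeds` is a hypothetical placeholder for "generate a random state, compute the keystream, substitute the true values of the guessed bits and run the solver with the given limit"; the stub used here simply succeeds on every fourth call so that the result is easy to follow.

```python
def estimate_attack_time(sample_succeeds, n_samples, total_limit, guessed_bits):
    """Estimate (N / N+) * T from n_samples solver runs.

    Each run gets the per-guess limit T / 2^|B|, since a full attack would
    have to spread T over all 2^|B| guesses.
    """
    per_run_limit = total_limit / 2 ** guessed_bits
    successes = sum(1 for _ in range(n_samples) if sample_succeeds(per_run_limit))
    if successes == 0:
        return float("inf")  # no success observed, estimate is unbounded
    return n_samples / successes * total_limit

# Hypothetical stand-in for a real solver run: every fourth sample succeeds.
calls = {"count": 0}

def stub_solver(time_limit):
    calls["count"] += 1
    return calls["count"] % 4 == 0

estimate = estimate_attack_time(stub_solver, n_samples=100,
                                total_limit=1000.0, guessed_bits=10)
print(estimate)
```

With 25 successes out of 100 samples the estimated success probability is 1/4, so the estimate is four times the total limit T.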
SLIDE 32
Evolutionary algorithms for attack construction
Problem definition
◮ Find the guessed bit set B with the smallest estimated attack time
Settings
◮ Individual = a bit mask, where one-bits define the guessed bit set
◮ Fitness = the attack time as above
Existing techniques
◮ Local search, simulated annealing, tabu search, (µ + λ)-style EAs
◮ Features: stochastic fitness, non-instant evaluation
8 / 13
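The representation described above (a bit mask whose one-bits form B) can be illustrated with a minimal (1+1)-style EA. This is a deliberately simplified stand-in, not the algorithm used in the experiments, and `toy_fitness` is a made-up placeholder for the Monte-Carlo attack-time estimate.

```python
import random

def guessed_bit_set(mask):
    """Indices of the one-bits, i.e. the variables that will be guessed."""
    return [i for i, bit in enumerate(mask) if bit]

def mutate(mask, rng):
    """Flip each bit independently with probability 1/len(mask)."""
    rate = 1 / len(mask)
    return [bit ^ (rng.random() < rate) for bit in mask]

def one_plus_one_ea(fitness, n_bits, iterations, rng):
    """Keep the child whenever it is not worse than the parent (minimization)."""
    parent = [rng.randint(0, 1) for _ in range(n_bits)]
    parent_fitness = fitness(parent)
    for _ in range(iterations):
        child = mutate(parent, rng)
        child_fitness = fitness(child)
        if child_fitness <= parent_fitness:
            parent, parent_fitness = child, child_fitness
    return parent, parent_fitness

def toy_fitness(mask):
    # Placeholder: pretends that guessing exactly three bits is optimal.
    return abs(sum(mask) - 3)

best, best_fitness = one_plus_one_ea(toy_fitness, n_bits=10,
                                     iterations=200, rng=random.Random(0))
print(guessed_bit_set(best), best_fitness)
```

In the real setting the fitness is stochastic and slow to evaluate, so the simple comparison `child_fitness <= parent_fitness` is exactly the step that needs more care.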
SLIDE 36
Why not use Gray-Box Optimization?
Darrell in da house!
Quick answer: the cryptographic nature of the problem implies an enormous number of non-zero Walsh coefficients!
9 / 13
SLIDE 40
Fitness comparison by statistical testing
Monte-Carlo fitness → multiple measurements
◮ Existing approaches in the domain: fixed number of evaluations (≈ 500)
◮ Proposed: check whether more measurements are needed using statistical testing
Domain-related special features
◮ Fitness is evaluated on a distributed cluster
◮ Inefficient to do measurements one by one → use batches of ≈ 100 measurements
Statistical tests employed: significant difference → save computations
◮ Wilcoxon rank sum test
◮ Barnard’s test (a simple test to compare two variables with two outcomes)
◮ p-values, multiple comparisons etc. – Statistics is a lie, but it is the lesser evil
10 / 13
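One way the batched comparison could look is sketched below. It is an assumption-laden illustration rather than the authors' procedure: the rank-sum p-value uses a plain normal approximation without tie correction, the stopping rule and the 0.05 threshold are chosen for the example, and Barnard's test is omitted. A real implementation would more likely call a statistics library.

```python
import math

def rank_sum_p_value(xs, ys):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation.

    Ties receive average ranks; the variance is NOT tie-corrected, which is
    a simplification made for this sketch.
    """
    n, m = len(xs), len(ys)
    values = sorted(list(xs) + list(ys))
    rank_of = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    w = sum(rank_of[x] for x in xs)
    mean = n * (n + m + 1) / 2
    sd = math.sqrt(n * m * (n + m + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))

def compare_in_batches(sample_a, sample_b, batch=100, max_batches=5, alpha=0.05):
    """Add one batch per individual at a time; stop once they differ clearly."""
    a, b = [], []
    for _ in range(max_batches):
        a += [sample_a() for _ in range(batch)]
        b += [sample_b() for _ in range(batch)]
        if rank_sum_p_value(a, b) < alpha:
            break  # significant difference: no further measurements needed
    return a, b

# Hypothetical samplers returning solver times for two individuals.
fast, slow = compare_in_batches(lambda: 1.0, lambda: 2.0)
same_a, same_b = compare_in_batches(lambda: 1.0, lambda: 1.0)
print(len(fast), len(same_a))
```

Clearly different individuals are separated after a single batch, while indistinguishable ones consume the full budget of 500 measurements, which mirrors the fixed-size baseline on the slide.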
SLIDE 41
Experiments
◮ Simple GA, population size N = 10, five experiments, 12 wall-clock hours each
◮ ROKK SAT solver, time limit for each run is 10 seconds
◮ Time limit for the final attack time is further refined

Cipher      |B|  Time limit  Attack time    #individuals  #individuals
                                            w/ stats      w/o stats
A5/1        35   0.278       2.19 · 10^12   1471          341
Bivium      28   2.715       1.15 · 10^12   3616          2439
Trivium-64  21   2.373       3.23 · 10^7    3398          1323
Trivium-96  35   2.485       1.24 · 10^12   2494          1299
11 / 13
SLIDE 42
Assessment of statistical tests

Cipher      Wilcoxon only  Barnard only  Both  None
A5/1        215            146           1182  5812
Bivium      3786           946           9381  3974
Trivium-64  1943           560           5951  8476
Trivium-96  738            318           3322  8092
12 / 13
SLIDE 44
Conclusion
◮ An interesting application of evolutionary algorithms to serious cryptanalysis
◮ A few world records have been broken (for simplified ciphers, however)
◮ Using statistical testing when comparing Monte-Carlo fitnesses was helpful
  ◮ The number of tested individuals increased by a factor from 1.5× to 4.3×
◮ The methodology is still young → more to come!
Thanks for listening!
13 / 13