SLIDE 1 Learning a SAT Solver from Single-Bit Supervision
Daniel Selsam¹, Matthew Lamm¹, Benedikt Bünz¹, Percy Liang¹, Leonardo de Moura², David L. Dill¹
¹Stanford University, ²Microsoft Research
March 22nd, 2018
SLIDE 2 Setup
Train: SAT problem P → output: 1{P is satisfiable}
P → NeuroSAT → unsat
P → NeuroSAT → sat → solution
SLIDE 5 Background: SAT
◮ A propositional formula can be represented as an AND of ORs.
– example: c1 ∧ c2, where c1 = (x1 ∨ ¬x2) and c2 = (x2 ∨ ¬x1)
◮ Jargon:
– x1, ¬x1, x2, ¬x2 are all literals
– c1, c2 are both clauses
◮ A formula is satisfiable if there exists a satisfying assignment.
– example: the formula above is satisfiable (e.g. x1 = 1, x2 = 1)
◮ The SAT problem: given a formula,
– determine if it is satisfiable
– if it is, find a satisfying assignment
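The definitions above can be made concrete with a tiny brute-force checker (an illustrative sketch, not how any real solver works): encode literal x_i as the integer +i and ¬x_i as -i, and try all 2^n assignments.

```python
from itertools import product

def is_satisfiable(clauses, n_vars):
    """Brute-force SAT check. Each clause is a list of literals,
    where literal +i means x_i and -i means ¬x_i."""
    for bits in product([False, True], repeat=n_vars):
        # bits[i-1] is the value assigned to x_i
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True, bits
    return False, None

# A formula like (x1 ∨ ¬x2) ∧ (x2 ∨ ¬x1):
sat, assignment = is_satisfiable([[1, -2], [2, -1]], n_vars=2)
```

This is exponential in the number of variables, which is exactly why SAT is interesting: no known algorithm avoids worst-case exponential time.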
SLIDE 10 Machine learning for SAT
◮ Goal: train a neural network to predict satisfiability.
– (we'll discuss decoding solutions later)
◮ Two design challenges:
– what problems do we train on?
– with what kind of architecture?
SLIDE 14 Training data
◮ Issue: some problem distributions might be easy to classify.
– (based on superficial properties)
– want: difficult problems, to force the network to learn something general
◮ We define a distribution SR(n) over problems such that:
– problems come in pairs, one unsat and one sat
– the two differ only by the negation of a single literal in a single clause
◮ To sample a pair from SR(n):
– keep sampling random clauses until the problem becomes unsat
– flip a single literal in the final clause to make it sat
– return the pair
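The sampling procedure above can be sketched as follows. This is a simplified sketch: uniform clause sizes 1-3 and a brute-force satisfiability check stand in for the paper's actual clause-length distribution and incremental SAT solver.

```python
import random
from itertools import product

def brute_force_sat(clauses, n):
    # toy satisfiability check (exponential in n; fine for tiny problems)
    return any(
        all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
        for bits in product([False, True], repeat=n)
    )

def sample_pair(n, seed=0):
    """Sketch of SR(n)'s pair construction. Literal +i means x_i,
    -i means ¬x_i."""
    rng = random.Random(seed)
    clauses = []
    while True:
        vs = rng.sample(range(1, n + 1), rng.randint(1, 3))
        clauses.append([v if rng.random() < 0.5 else -v for v in vs])
        if not brute_force_sat(clauses, n):   # keep adding clauses until unsat
            break
    unsat = [list(c) for c in clauses]
    sat = [list(c) for c in clauses]
    i = rng.randrange(len(sat[-1]))
    sat[-1][i] = -sat[-1][i]                  # negate one literal -> satisfiable
    return unsat, sat
```

Why flipping one literal works: before the final clause was added the formula had a satisfying assignment A, and the final clause is false under A, so every one of its literals is false under A; negating any one of them makes that clause (and hence the whole formula) true under A.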
SLIDE 21 Network architecture
Example formula: (x1 ∨ ¬x2) ∧ (¬x1 ∨ x2)
Graph nodes: x1, ¬x1, x2, ¬x2, c1, c2 (edges connect each clause to the literals it contains)
NeuroSAT:
◮ maintain an embedding at every node
◮ iteratively pass messages along the edges of the graph
◮ after T time steps, map literal embeddings to scalar "votes"
◮ average the votes to compute the logit
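One round of this message passing can be sketched as below. This is a heavily simplified sketch with random untrained weights: the actual model uses learned MLPs with LayerNorm-LSTM updates, while plain linear maps with tanh stand in here.

```python
import numpy as np

def neurosat_step(L, C, M, W_lc, W_cl, W_flip):
    """One simplified message-passing round.

    L: (2n, d) literal embeddings, rows ordered [x1..xn, ¬x1..¬xn]
    C: (m, d) clause embeddings
    M: (m, 2n) incidence matrix, M[j, i] = 1 iff literal i occurs in clause j
    """
    C = np.tanh(M @ L @ W_lc)                    # clauses aggregate their literals
    n = L.shape[0] // 2
    flip = np.concatenate([L[n:], L[:n]])        # pair each literal with its negation
    L = np.tanh(M.T @ C @ W_cl + flip @ W_flip)  # literals hear their clauses + complement
    return L, C

# Incidence matrix for (x1 ∨ ¬x2) ∧ (¬x1 ∨ x2), literal order [x1, x2, ¬x1, ¬x2]:
M = np.array([[1., 0., 0., 1.],
              [0., 1., 1., 0.]])
```

After T such rounds, a final linear map sends each literal embedding to a scalar vote, and the mean vote is the satisfiability logit.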
SLIDE 29 Experiment
◮ Datasets:
– Train: SR(U(10, 40))
– Test: SR(40)
◮ SR(40):
– 40 variables (over a trillion possible assignments)
– ≈ 200 clauses
– ≈ 1,000 literal occurrences
– uniformly random: just a tangled, structureless mess
– every problem a single bit away from flipping its label
– (caveat: easy for SOTA solvers)
◮ Results: NeuroSAT predicts satisfiability with 85% accuracy.
SLIDE 33 NeuroSAT in action
[animated figure, repeated across slides 33-57; no textual content to recover]
SLIDE 58 Decoding satisfying assignments
[animated figure, repeated across slides 58-67]
Percent of satisfiable problems in SR(40) solved: 70%
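Decoding can be sketched as a 2-clustering of the final literal embeddings, in the spirit of the paper's approach: cluster the literals into two groups, then try mapping each group to "true" in turn. The crude 2-means below is a sketch only, not the exact procedure.

```python
import numpy as np

def decode_assignment(L_emb, clauses, n, iters=10):
    """2-cluster literal embeddings, then try both cluster-to-truth
    mappings. Rows of L_emb are ordered [x1..xn, ¬x1..¬xn]; clauses
    use literal +i for x_i and -i for ¬x_i."""
    rng = np.random.default_rng(0)
    centers = L_emb[rng.choice(len(L_emb), 2, replace=False)]
    for _ in range(iters):                       # crude 2-means
        dist = np.linalg.norm(L_emb[:, None] - centers[None], axis=2)
        labels = dist.argmin(axis=1)
        for k in (0, 1):
            if (labels == k).any():
                centers[k] = L_emb[labels == k].mean(axis=0)
    for true_cluster in (0, 1):                  # try both truth mappings
        assign = [labels[i] == true_cluster for i in range(n)]
        if all(any(assign[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return assign
    return None
```

If neither mapping satisfies the formula, decoding fails for that problem; the 70% figure above counts the problems where it succeeds.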
SLIDE 68
Running for more rounds
SLIDE 69
Scaling to bigger problems
SLIDE 70 Generalizing to other domains
◮ NeuroSAT generalizes to problems from other domains.
◮ First we generate random graphs.
◮ For each graph, we generate:
– k-coloring problems (3 ≤ k ≤ 5)
– k-dominating-set problems (2 ≤ k ≤ 4)
– k-clique problems (3 ≤ k ≤ 5)
– k-cover problems (4 ≤ k ≤ 6)
SLIDE 73 Generalization results
◮ Number of satisfiable problems in total: 4,888
◮ Average number of clauses: 532
◮ Percent of satisfiable problems solved by NeuroSAT: 85%
◮ Percent solved by Survey Propagation: 0.00%
SLIDE 78 Detecting unsatisfiability
◮ NeuroSAT (trained on SR(U(10, 40))) is never confident in unsat.
– it searches indefinitely
– it reports low-confidence unsat unless it finds a solution
◮ Hypothesis: if the unsatisfiable problems in the training set have short proofs, NeuroSAT will learn to find them.
◮ To test this, we train on a dataset where:
– data comes in pairs, one unsat and one sat
– the two differ by the negation of a single literal in a single clause
– by construction, the unsat problems have small unsat cores
◮ Results (abridged):
– 100% accuracy on the test set
– the unsat core can be decoded 98% of the time
SLIDE 82 Summary of results
◮ We train NeuroSAT as a classifier on random problems.
– a few dozen message-passing iterations per problem
– it works well
◮ It learns to solve the problems in order to make its predictions.
– we can (almost always) decode a solution when it guesses sat
◮ It extrapolates well.
– it can run for more rounds
– it can solve much bigger problems
– it can solve problems from a wide range of domains
◮ It learns different algorithms depending on the training data.
– when we train with small unsat cores, it learns to detect them
◮ Caveat: it remains vastly less reliable than SOTA solvers.
SLIDE 88 Thank you
Code: www.github.com/dselsam/neurosat
Paper: https://arxiv.org/abs/1802.03685
Any questions?