Discrete Sampling and Integration for the AI Practitioner
Supratik Chakraborty, IIT Bombay Kuldeep S. Meel, Rice University Moshe Y. Vardi, Rice University
Agenda
Part 1: Boolean Satisfiability Solving (Vardi)
Part 2(a): Applications (Chakraborty)
Coffee Break
Part 2(b): Prior Work (Chakraborty)
Part 3: Hashing-based Approach (Meel)
Discrete Sampling and Integration for the AI Practitioner Part I: Boolean Satisfiability Solving
Boolean Satisfiability
Boolean Satisfiability (SAT): Given a Boolean expression ϕ, built using “and” (∧), “or” (∨), and “not” (¬), is there a satisfying solution (an assignment of 0’s and 1’s to the variables that makes the expression equal 1)? Equivalently: is Sol(ϕ) nonempty?
Example: (¬x1 ∨ x2 ∨ x3) ∧ (¬x2 ∨ ¬x3 ∨ x4) ∧ (x3 ∨ x1 ∨ x4)
Solution: x1 = 0, x2 = 0, x3 = 1, x4 = 1
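For readers who want to try this, a minimal sketch (not part of the slides) that checks the example assignment against the example CNF; the encoding convention is ours:

```python
# Clauses are lists of signed variable indices (DIMACS style):
# +i means xi, -i means ¬xi.

def satisfies(clauses, assignment):
    """Return True iff `assignment` (dict var -> 0/1) satisfies every clause."""
    return all(
        any((assignment[abs(l)] == 1) if l > 0 else (assignment[abs(l)] == 0)
            for l in clause)
        for clause in clauses)

# (¬x1 ∨ x2 ∨ x3) ∧ (¬x2 ∨ ¬x3 ∨ x4) ∧ (x3 ∨ x1 ∨ x4)
clauses = [[-1, 2, 3], [-2, -3, 4], [3, 1, 4]]
assignment = {1: 0, 2: 0, 3: 1, 4: 1}
print(satisfies(clauses, assignment))  # True: the slide's solution works
```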
Discrete Sampling and Integration
Discrete Sampling: Given a Boolean formula ϕ, sample from Sol(ϕ) uniformly at random.
Discrete Integration: Given a Boolean formula ϕ, compute |Sol(ϕ)|.
Weighted Sampling and Integration: As above, but subject to a weight function w : Sol(ϕ) → R+.
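For small formulas both problems are solvable by brute-force enumeration, which makes a useful baseline; a toy sketch (our own code, clauses as lists of signed variable indices):

```python
import itertools, random

def solutions(clauses, n):
    """Enumerate Sol(ϕ) by brute force over all 2^n assignments."""
    sols = []
    for bits in itertools.product([0, 1], repeat=n):
        a = {i + 1: bits[i] for i in range(n)}
        if all(any((a[abs(l)] == 1) == (l > 0) for l in c) for c in clauses):
            sols.append(bits)
    return sols

clauses = [[-1, 2, 3], [-2, -3, 4], [3, 1, 4]]
sols = solutions(clauses, 4)
print(len(sols))            # 10 -- discrete integration: |Sol(ϕ)|
print(random.choice(sols))  # discrete sampling: uniform over Sol(ϕ)
```

Enumeration is exponential in n, which is exactly why the rest of the tutorial is needed.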
Basic Theoretical Background
Discrete Integration: #SAT.
Known: exact counting is #P-complete; approximate counting and almost-uniform sampling are polynomially inter-reducible.
Desideratum: Solve discrete sampling and integration using a SAT solver.
Is This Time Different? The Opportunities and Challenges of Artificial Intelligence
Jason Furman, Chair, Council of Economic Advisers, July 2016: “Even though we have not made as much progress recently on other areas of AI, such as logical reasoning, the advancements in deep learning techniques may ultimately act as at least a partial substitute for these other areas.”
P vs. NP: An Outstanding Open Problem
Does P = NP?
– A Clay Institute Millennium Problem – Million dollar prize! What is this about? It is about computational complexity – how hard it is to solve computational problems.
Rally To Restore Sanity, Washington, DC, October 2010
Computational Problems
Example: Graph – G = (V, E)
Two notions of cycle: a Eulerian cycle traverses every edge exactly once; a Hamiltonian cycle visits every vertex exactly once.
Question: How hard is it to find a Hamiltonian cycle? A Eulerian cycle?
Figure 1: The Bridges of Königsberg
Figure 2: The Graph of the Bridges of Königsberg
Figure 3: Hamiltonian Cycle
Computational Complexity
Measuring complexity: How many (Turing machine) operations does it take to solve a problem of size n?
Complexity Class P: problems that can be solved in polynomial time – n^c for a fixed c. Examples: Eulerian Cycle, Shortest Path, …
What about the Hamiltonian Cycle Problem?
Hamiltonian Cycle
Naive approach: try all orderings of the vertices – exponentially many possibilities. Note: The universe is much younger than 2^200 Planck time units! Fundamental Question: Can we do better?
Checking Is Easy!
Observation: Checking if a given cycle is a Hamiltonian cycle of a graph G = (V, E) is easy! Complexity Class NP: problems where solutions can be checked in polynomial time. Examples:
Significance: Tens of thousands of optimization problems are in NP!!!
P vs. NP
The Big Question: Is P = NP or P ≠ NP?
Intuitive Answer: Of course, checking is easier than discovering, so P ≠ NP!!!
Alas: We do not know how to prove that P ≠ NP.
P ≠ NP
Consequences:
Question: Why is it so important to prove P ≠ NP, if that is what is commonly believed? Answer:
P = NP
Scott Aaronson: “If P = NP, then the world would be a profoundly different place than we usually assume it to be. There would be no special value in ‘creative leaps,’ no fundamental gap between solving a problem and recognizing the solution once it’s found. Everyone who could appreciate a symphony would be Mozart; everyone who could follow a step-by-step argument would be Gauss.” Consequences:
Question: Is it really possible that P = NP? Answer: Yes! It’d require discovering a very clever algorithm, but it took 40 years to prove that LinearProgramming is in P.
Sharpening The Problem
NP-Complete Problems: the hardest problems in NP – every problem in NP reduces to them.
Corollary: P = NP if and only if HamiltonianCycle is in P.
There are thousands of NP-complete problems. To resolve the P vs. NP question, it would suffice to prove that one of them is or is not in P.
History
Cook 1971, Levin 1973: Boolean Satisfiability is NP-complete.
Karp 1972: 21 NP-complete problems – Integer Programming, Clique, Set Packing, Vertex Cover, Set Covering, Hamiltonian Cycle, Graph Coloring, Exact Cover, Hitting Set, Steiner Tree, Knapsack, Job Scheduling, …
– All NP-complete problems are polynomially equivalent!
Garey and Johnson 1979: “Computers and Intractability: A Guide to the Theory of NP-Completeness” – hundreds of NP-complete problems!
Boole’s Symbolic Logic
Boole’s insight: Aristotle’s syllogisms are about classes of objects, which can be treated algebraically. “If an adjective, as ‘good’, is employed as a term of description, let us represent by a letter, as y, all things to which the description ‘good’ is applicable, i.e., ‘all good things’, or the class of ‘good things’. Let it further be agreed that by the combination xy shall be represented that class of things to which the name or description represented by x and y are simultaneously applicable. Thus, if x alone stands for ‘white’ things and y for ‘sheep’, let xy stand for ‘white sheep’.
Complexity of Boolean Reasoning
History:
William Stanley Jevons, 1870: “I have given much attention, therefore, to lessening both the manual and mental labour of the process, and I shall describe several devices which may be adopted for saving trouble and risk of mistake.”
Ernst Schröder, c. 1890: “Getting a handle on the consequences of any premises, or at least the fastest method for obtaining these consequences, seems to me to be one of the noblest, if not the ultimate goal of mathematics and logic.”
Algorithmic Boolean Reasoning: Early History
Davis and Putnam, 1958: “Computational Methods in the Propositional Calculus”, unpublished report to the NSA
Davis and Putnam, 1960: “A Computing Procedure for Quantification Theory”
Davis, Logemann and Loveland, 1962: “A Machine Program for Theorem Proving”
DPLL Method: Propositional Satisfiability Test
Modern SAT Solving
CDCL = conflict-driven clause learning
Key Tools: GRASP, 1996; Chaff, 2001 Current capacity: millions of variables
Figure 4: SAT Solvers Performance – speed-up of the 2012 solver over earlier solvers, on a log scale (source: Sanjit A. Seshia)
Knuth Gets His Satisfaction
SIAM News, July 26, 2016: “Knuth Gives Satisfaction in SIAM von Neumann Lecture” Donald Knuth gave the 2016 John von Neumann lecture at the SIAM Annual Meeting. The von Neumann lecture is SIAM’s most prestigious prize. Knuth based the lecture, titled ”Satisfiability and Combinatorics”, on the latest part (Volume 4, Fascicle 6) of his The Art of Computer Programming book series. He showed us the first page of the fascicle, aptly illustrated with the quote ”I can’t get no satisfaction,” from the Rolling Stones. In the preface of the fascicle Knuth says ”The story of satisfiability is the tale of a triumph of software engineering, blended with rich doses of beautiful mathematics”.
SAT Heuristic – Backjumping
Backtracking: go up one level in the search tree when both Boolean values for a variable have been tested.
Backjumping [Stallman-Sussman, 1977]: jump back in the search tree, if the jump is safe – use the highest node it is safe to jump to.
Key: Distinguish between decision variables, whose values are chosen freely (if value c fails, try 1 − c), and implication variables, whose values are forced.
Implication Graph: directed acyclic graph describing the relationships between decision variables and implication variables.
Smart Unit-Clause Preference
Boolean Constraint Propagation (BCP): propagating values forced by unit clauses. BCP can dominate the solver’s running time! Requirement: identifying unit clauses and updating data structures appropriately, upon assigning and unassigning variables.
Watched literals [Chaff, 2001]: “watch” two un-false literals in each unsatisfied clause – no overhead for backjumping.
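A sketch of plain BCP, without the watched-literal data structure that makes real solvers fast (our own toy code; clauses as signed-integer lists, the formula is the slides’ running CNF):

```python
def unit_propagate(clauses, assignment):
    """Repeatedly find unit clauses under the partial assignment and
    propagate their forced values; returns None on a conflict."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned, satisfied = [], False
            for l in clause:
                v = assignment.get(abs(l))
                if v is None:
                    unassigned.append(l)
                elif (v == 1) == (l > 0):
                    satisfied = True      # some literal is already true
                    break
            if satisfied:
                continue
            if not unassigned:
                return None               # conflict: clause falsified
            if len(unassigned) == 1:      # unit clause: the value is forced
                l = unassigned[0]
                assignment[abs(l)] = 1 if l > 0 else 0
                changed = True
    return assignment

# Deciding x2 = 1 and x3 = 1 forces x4 = 1 via the clause (¬x2 ∨ ¬x3 ∨ x4)
print(unit_propagate([[-1, 2, 3], [-2, -3, 4], [3, 1, 4]], {2: 1, 3: 1}))
```

A real solver does the same propagation, but visits only the clauses whose watched literals are touched.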
SAT Heuristic – Clause Learning
Conflict-Driven Clause Learning: If assignment l1, . . . , ln is bad, then add clause ¬l1 ∨ . . . ∨ ¬ln to block it.
Marques-Silva & Sakallah, 1996: This would add very long clauses! Instead: analyze the implication graph and learn a short clause containing only the literals responsible for the conflict, blocking that part of the assignment.
Consequence: learned clauses prune the subsequent search and enable non-chronological backjumping.
Smart Decision Heuristic
Crucial: Choosing decision variables wisely!
Dilemma: brainiac (work hard to pick the best variable – fewer decisions required!) vs. speed demon (decide fast and often).
VSIDS [Moskewicz-Madigan-Zhao-Zhang-Malik, 2001]: Variable State Independent Decaying Sum – prioritize variables according to recent participation in conflicts – a compromise between brainiac and speed demon.
Randomized Restarts
Randomized Restarts [Gomes-Selman-Kautz, 1998]: restart the search periodically, keeping learned clauses, to escape unlucky branching choices.
Aggressive Restarting: restart every ∼50 backtracks.
SMT: Satisfiability Modulo Theory
SMT Solving: Solve Boolean combinations of constraints in an underlying theory, e.g., linear constraints, combining SAT techniques and domain-specific techniques.
Example (linear arithmetic): (x > 10) ∧ ((x > 5) ∨ (x < 8))
Sample Application: Bounded Model Checking of Verilog programs – SMT(BV).
SMT Solving
General Approach: combine SAT-solving techniques with theory-solving techniques.
Crux: Interaction between the SAT solver and the theory solver, e.g., conflict-clause learning – convert an unsatisfiable conjunction of theory atoms into a new Boolean clause.
Applications of SAT/SMT Solving in SW Engineering
Leonardo de Moura + Nikolaj Bjørner: the Z3 SMT solver, widely used in software engineering.
Reflection on P vs. NP
Old Cliché: “What is the difference between theory and practice? In theory, they are not that different, but in practice, they are quite different.”
P vs. NP in practice: a problem may be solvable in polynomial time, but if the polynomial is n^1000 – impractical!
Conclusion: No guarantee that solving P vs. NP would yield practical benefits.
Are NP-Complete Problems Really Hard?
In theory, NP-complete problems should not be touched with a 10-foot pole.
In practice: there are small, crafted SAT instances that cannot be solved by any extant SAT solver, yet modern tools solve real-life SAT instances with millions of variables!
Conclusion: We need a richer and broader complexity theory, a theory that would explain both the difficulty and the easiness of problems like SAT.
Question: Now that SAT is “easy” in practice, how can we leverage that?
X1, …, Xn : variables with finite discrete domains D1, …, Dn
Constraint (logical formula) ϕ over X1, …, Xn
Weight function W : D1 × … × Dn → Q≥0
Sol(ϕ) : set of assignments of X1, …, Xn satisfying ϕ
Discrete Integration: Determine W(ϕ) = Σ y ∈ Sol(ϕ) W(y)
– If W(y) = 1 for all y, then W(ϕ) = |Sol(ϕ)|
Discrete Sampling: Randomly sample from Sol(ϕ) such that Pr[y is sampled] ∝ W(y)
– If W(y) = 1 for all y, then uniformly sample from Sol(ϕ)
For this tutorial: Initially, the Di’s are {0,1} – Boolean variables. Later, we’ll consider Di’s that are {0,1}^n – bit-vector variables.
Discrete Integration (Model Counting): Probabilistic Inference, Network (viz. electrical grid) reliability, Quantitative Information Flow, and many more …
Discrete Sampling: Constrained random verification, Automatic problem generation, and many more …
An alarm rings if it’s in a working state when an earthquake or a burglary happens. The alarm can malfunction and ring without an earthquake or burglary happening.
Given that the alarm rang, what is the likelihood that an earthquake happened? Given conditional dependencies (and conditional probabilities), calculate Pr[event | evidence]. What is Pr[Earthquake | Alarm]?
How do we represent conditional dependencies efficiently, and calculate these probabilities?
Pr[event | evidence] = Pr[event ∩ evidence] / Pr[evidence]
= Pr[evidence | event_i] × Pr[event_i] / Σ_j ( Pr[evidence | event_j] × Pr[event_j] )
Bayesian network: nodes B (burglary), E (earthquake), A (alarm), with edges B → A and E → A, annotated with Conditional Probability Tables (CPT) such as Pr(A | E, B).
Pr[A ∩ E] = Pr[E] × Pr[¬B] × Pr[A | E, ¬B] + Pr[E] × Pr[B] × Pr[A | E, B]
V = {vA, v~A, vB, v~B, vE, v~E} – propositional vars corresponding to events
T = {tA|B,E, t~A|B,E, tA|B,~E, …} – propositional vars corresponding to CPT entries
Formula encoding the probabilistic graphical model (ϕPGM):
(vA ⊕ v~A) ∧ (vB ⊕ v~B) ∧ (vE ⊕ v~E) – exactly one of vA and v~A is true (similarly for B, E)
(tA|B,E ⇔ vA ∧ vB ∧ vE) ∧ (t~A|B,E ⇔ v~A ∧ vB ∧ vE) ∧ … – if vA, vB, vE are true, so must be tA|B,E, and vice versa
V = {vA, v~A, vB, v~B, vE, v~E}, T = {tA|B,E, t~A|B,E, tA|B,~E, …}
W(v~B) = 0.2, W(vB) = 0.8 – probabilities of independent events are weights of +ve literals
W(v~E) = 0.1, W(vE) = 0.9
W(tA|B,E) = 0.3, W(t~A|B,E) = 0.7, … – CPT entries are weights of +ve literals
W(vA) = W(v~A) = 1 – weights of vars corresponding to dependent events
W(¬v~B) = W(¬vB) = W(¬tA|B,E) = … = 1 – weights of -ve literals are all 1
Weight of assignment (vA = 1, v~A = 0, tA|B,E = 1, …) = W(vA) × W(¬v~A) × W(tA|B,E) × … – product of weights of literals in the assignment
V = {vA, v~A, vB, v~B, vE, v~E} T = {tA|B,E , t~A|B,E , tA|B,~E …} Formula encoding combination of events in probabilistic model (Alarm and Earthquake) F = ϕPGM ∧ vA ∧ vE Set of satisfying assignments of F:
RF = { (vA= 1, vE = 1, vB = 1, tA|B,E = 1, all else 0), (vA = 1, vE = 1, v~B = 1, tA|~B,E = 1, all else 0) }
Weight of satisfying assignments of F:
W(RF) = W(vA) * W(vE) * W(vB) * W(tA|B,E ) + W(vA) * W(vE) * W(v~B) * W(tA|~B,E ) = 1* Pr[E] * Pr[B] * Pr[A | B,E] + 1* Pr[E] * Pr[~B] * Pr[A | ~B,E] = Pr[ A ∩ E]
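This weighted-model-counting view can be checked by brute force. A minimal sketch, using the slides’ weights where given (W(vB)=0.8, W(vE)=0.9, W(tA|B,E)=0.3) and hypothetical values for the CPT entries the slides leave out:

```python
import itertools

pB, pE = 0.8, 0.9                      # W(vB), W(vE) from the slides
# Pr[A | B, E]: the (1,1) entry is the slides' W(tA|B,E); the rest are made up
pA = {(1, 1): 0.3, (0, 1): 0.2, (1, 0): 0.25, (0, 0): 0.01}

def weighted_count(event_A, event_E):
    """W(F) for F = ϕ_PGM ∧ v_A ∧ v_E: sum, over worlds consistent with the
    evidence, of the product of literal weights."""
    total = 0.0
    for b, e, a in itertools.product([0, 1], repeat=3):
        if a != event_A or e != event_E:
            continue  # F forces v_A and v_E
        w = (pB if b else 1 - pB) * (pE if e else 1 - pE)
        w *= pA[(b, e)] if a else 1 - pA[(b, e)]
        total += w
    return total

# Pr[A ∩ E] = W(F), matching the slide's expansion:
# Pr[E]*Pr[B]*Pr[A|B,E] + Pr[E]*Pr[~B]*Pr[A|~B,E]
print(weighted_count(1, 1))  # 0.9*0.8*0.3 + 0.9*0.2*0.2 = 0.252
```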
Graph G = (V, E) represents a (power-grid) network; each edge e fails independently with probability g(e).
What is the probability that s and t get disconnected?
π : E → {0, 1} … configuration of network (π(e) = 1 iff edge e is up)
Probability of the network being in configuration π: Pr[π] = Π_{e: π(e)=0} g(e) × Π_{e: π(e)=1} (1 − g(e))
Probability of s and t being disconnected: Pd_{s,t} = Σ_{π : s, t disconnected in π} Pr[π]
May need to sum over numerous (> 2^100) configurations
Construct a Boolean formula ϕ_{s,t} such that the satisfying assignments σ of ϕ_{s,t} have a 1-1 correspondence with the configurations π that disconnect s and t, with W(σ) = Pr[π]. Then
Pd_{s,t} = Σ_{π : s, t disconnected in π} Pr[π] = Σ_{σ ∈ Sol(ϕ_{s,t})} W(σ) = W(ϕ_{s,t})
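On small networks Pd_{s,t} can be computed by direct enumeration of configurations, which makes the definition concrete; a sketch with a hypothetical 4-edge network (two parallel s–t paths) and made-up failure probabilities g(e) = 0.1:

```python
import itertools

edges = [('s', 'a'), ('a', 't'), ('s', 'b'), ('b', 't')]
g = {e: 0.1 for e in edges}  # failure probability of each edge (assumed)

def disconnected(up_edges):
    """BFS from s over surviving edges; True iff t is unreachable."""
    reach, frontier = {'s'}, ['s']
    while frontier:
        u = frontier.pop()
        for (x, y) in up_edges:
            for (p, q) in ((x, y), (y, x)):
                if p == u and q not in reach:
                    reach.add(q)
                    frontier.append(q)
    return 't' not in reach

pd = 0.0
for pi in itertools.product([0, 1], repeat=len(edges)):
    up = [e for e, bit in zip(edges, pi) if bit == 1]
    pr = 1.0
    for e, bit in zip(edges, pi):
        pr *= (1 - g[e]) if bit else g[e]   # Pr[π] as defined above
    if disconnected(up):
        pd += pr
print(round(pd, 6))  # 0.0361: both 0.81-reliable paths must break
```

The hashing-based methods later in the tutorial replace this 2^|E| enumeration.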
A password checker takes a secret password (SP) and a user input (UI) and returns “Yes” iff SP = UI [Bang et al 2016].
Suppose passwords are 4 characters (‘0’ through ‘9’) long.
PC1 (char[] SP, char[] UI) {
  for (int i=0; i<SP.length(); i++) {
    if (SP[i] != UI[i]) return “No”;
  }
  return “Yes”;
}

PC2 (char[] SP, char[] UI) {
  match = true;
  for (int i=0; i<SP.length(); i++) {
    if (SP[i] != UI[i]) match = false;
    else match = match;
  }
  if (match) return “Yes”; else return “No”;
}
Which of PC1 and PC2 is more likely to leak information about the secret key through side-channel observations?
Secret (“high”, H) input vs. observable (“low”, L) output.
Password checking: H is SP, L is the time taken to answer “Is SP = UI?”. Side-channel observations: memory, time, …
E.g. in password checking, infer: 1st char of the password is not 9.
“initial uncertainty in H” = “info leaked” + “remaining uncertainty in H” [Smith 2009]
information theoretic measures, e.g. Shannon entropy
Depends on # instructions executed
UI = N234, N ≠ 0: PC1 executes the for loop once. UI = 0234: PC1 executes the for loop at least twice. Observing the time to “No” gives away whether the 1st char of SP is N. In 10 attempts, the 1st char of SP can be uniquely determined. In max 40 attempts, SP can be cracked.
PC1 (char[] SP, char[] UI) {
  for (int i=0; i<SP.length(); i++) {
    if (SP[i] != UI[i]) return “No”;
  }
  return “Yes”;
}
Depends on # instructions executed
UI = N234, N ≠ 0: PC2 executes the for loop 4 times. UI = 0234: PC2 executes the for loop 4 times. Cracking SP requires max 10^4 attempts!!! (“less leakage”)
PC2 (char[] SP, char[] UI) {
  match = true;
  for (int i=0; i<SP.length(); i++) {
    if (SP[i] != UI[i]) match = false;
    else match = match;
  }
  if (match) return “Yes”; else return “No”;
}
Figure: Execution tree of PC1 – each leaf (“No”/“Yes”) is reached via branch conditions SP[i] != UI[i] / SP[i] = UI[i], with observable times t = 3, 5, 7, 9, 11 depending on how long a prefix of SP matches UI.
χ_{t=5} : (SP[1] ≠ UI[1]) ∧ (SP[0] = UI[0])
Pr[t = 5] = |Sol(χ_{t=5})| / 10^4 – Model Counting, if UI is uniformly chosen
χ_{t=5} : (SP[1] ≠ UI[1]) ∧ (SP[0] = UI[0])
Pr[t = 5] = W(χ_{t=5}) – Discrete Integration, if UI is chosen according to a weight function
Expected information leakage = Shannon entropy of observable times = Σ_{k ∈ {3,5,7,9,11}} Pr[t = k] · log(1 / Pr[t = k])
Information leakage in the password checker example: PC1: 0.52 (more “leaky”), PC2: 0.0014 (less “leaky”)
Discrete integration is crucial in obtaining Pr[t = k]
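These leakage numbers can be reproduced by brute-force model counting (a sketch, not the tool of [Bang et al 2016]): fix a 4-digit SP, count uniformly chosen UIs per observable, and take the Shannon entropy of the observable’s distribution.

```python
import itertools, math

SP = "0234"  # any fixed secret gives the same counts
UIs = [''.join(d) for d in itertools.product("0123456789", repeat=4)]

def entropy(counts):
    n = sum(counts)
    return sum(c / n * math.log2(n / c) for c in counts if c)

# PC1: the observable time reveals the first mismatch position (or full match)
pc1_obs = {}
for ui in UIs:
    t = next((i for i in range(4) if SP[i] != ui[i]), 4)
    pc1_obs[t] = pc1_obs.get(t, 0) + 1

# PC2: the loop always runs 4 times; the observable is only "Yes" vs "No"
pc2_obs = {}
for ui in UIs:
    pc2_obs[ui == SP] = pc2_obs.get(ui == SP, 0) + 1

print(round(entropy(pc1_obs.values()), 2))  # 0.52 (more leaky)
print(entropy(pc2_obs.values()))            # ~0.0015: the slides' 0.0014, up to rounding
```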
Weighted Model Counting → Unweighted Model Counting: reduction polynomial in #bits representing weights [IJCAI 2015]
Applications: Probabilistic Inference [DMPV 2017], Network Reliability [KML 1989, Karger 2000], Quantified Information Flow
Functional Verification
Challenges: formal requirements, scalability ~10-15% of verification effort
Design is simulated with test vectors; results from simulation are compared to intended results. How do we generate test vectors?
Challenge: Exceedingly large test input space! Can’t try all input combinations: 2^128 combinations for a 64-bit binary operator!!!
Test vectors: solutions of constraints. Proposed by Lichtenstein, Malka, Aharon (IAAI 94).
Example: a 64-bit binary operator with inputs a, b and output c = f(a, b); constraints restrict the test vectors applied.
Modern SAT/SMT solvers are complex systems Efficiency stems from the solver automatically “biasing” search Fails to give unbiased or user-biased distribution of test vectors
Set of Constraints → SAT Formula → sample satisfying assignments uniformly at random
Scalable Uniform Generation of SAT Witnesses
Constrained Random Verification
Goal: generation of related but randomly different problems, so that assessment is based on concepts rather than on individual problem instances – useful, e.g., in large-scale tutoring systems.
“Holes” in a problem template must be filled with elements from specified sets. Constraints on the elements chosen to fill various “holes” restrict problem instances so that undesired instances are eliminated.
Non-deterministic finite automata to be generated for complementation Holes: States, alphabet size, transitions for (state, letter) pairs, final states, initial states Constraints: Alphabet size = 2 Min/max transitions for a (state, letter) pair = 0/4 Min/max states = 3/5 Min/max number of final states = 1/3 Min/max initial states = 1/2
Non-det finite automaton encoded as a formula on the following variables:
s1, s2, s3, s4, s5 : States
f1, f2, f3, f4, f5 : Final states
n1, n2, n3, n4, n5 : Initial states
s1a1s2, s1a2s2, … : Transitions
χ_init = ⋀_i (n_i → s_i) ∧ (1 ≤ Σ_i n_i ≤ 2)
χ_trans = ⋀_{i,k,j} (s_i a_k s_j → s_i ∧ s_j) ∧ ⋀_{i,k} (0 ≤ Σ_j s_i a_k s_j ≤ 4)
χ_states = (3 ≤ Σ_i s_i ≤ 5)
χ_final = ⋀_i (f_i → s_i) ∧ (1 ≤ Σ_i f_i ≤ 3)
Every solution of χ_init ∧ χ_trans ∧ χ_states ∧ χ_final gives an automaton satisfying the specified constraints.
Non-det finite automaton encoded as a formula on following variables s1 = 1, s2 = 0, s3 = 1, s4 = 1, s5 = 1: States f1 = 0, f2 = 0, f3 = 1, f4 = 1, f5 = 0: Final states n1 = 1, n2 = 0, n3 = 0, n4 = 0, n5 = 0: Initial states s1a1s3 = 1, s1a1s4 = 1, s4a2s4 = 1, s4a1s5 = 1, … : Transitions
Uniform sampling of solutions generates automata satisfying the constraints uniformly at random; weighted sampling generates automata satisfying the constraints with different priorities/weightages.
Examples: Weighing final-state variables more gives automata with more final states. Weighing transitions on letter a1 more gives automata with more transitions labeled a1.
Exact unweighted counting: #P-complete [Valiant 1978]
Approximate unweighted counting:
– Deterministic: polynomial-time deterministic Turing machine with a Σ2^p oracle [Stockmeyer 1983]
– Randomized: poly-time probabilistic Turing machine with an NP oracle [Stockmeyer 1983; Jerrum, Valiant, Vazirani 1986] – a Probably Approximately Correct (PAC) algorithm
Weighted versions of counting: Exact: #P-complete [Roth 1996]; Approximate: same class as the unweighted version [follows from Roth 1996]
|RF| / (1 + ε) ≤ DetEstimate(F, ε) ≤ |RF| × (1 + ε), for ε > 0
Pr[ |RF| / (1 + ε) ≤ RandEstimate(F, ε, δ) ≤ |RF| × (1 + ε) ] ≥ 1 − δ, for ε > 0 and 0 < δ ≤ 1
Uniform sampling: poly-time probabilistic Turing machine with an NP oracle [Bellare, Goldreich, Petrank 2000]
Pr[y = UniformGenerator(F)] = c, where c > 0 if y ∈ RF and c = 0 if y ∉ RF (c independent of y)
Almost-uniform sampling: poly-time probabilistic Turing machine with an NP oracle
c / (1 + ε) ≤ Pr[y = AUGenerator(F, ε)] ≤ c × (1 + ε), where c > 0 if y ∈ RF (c independent of y) and the probability is 0 if y ∉ RF
Pr[Algorithm outputs some y] ≥ ½, if F is satisfiable
[Jerrum,Sinclair 1996]
Metropolis-Hastings [Metropolis et al 1953, Hastings 1970], Simulated Annealing [Kirkpatrick et al 1982]
Start from a “state” (assignment of variables) Randomly choose next state using “local” biasing functions (depends on target distribution & algorithm parameters) Repeat for an appropriately large number (N) of steps After N steps, samples follow target distribution with high confidence
In practice, the chain is run for far fewer steps than the theory requires – nullifies/weakens the theoretical guarantees [Kitchen, Kuehlmann 2007]
DPLL branching search procedure, with partial truth assignments. Once a branch is found satisfiable, if t out of n variables are assigned, add 2^(n−t) to the model count, backtrack to the last decision point, flip the decision and continue. Requires a data structure to check if all clauses are satisfied by the partial assignment – usually not implemented in modern DPLL SAT solvers. Can output a lower bound at any time.
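The 2^(n−t) trick can be sketched in a few lines (our own toy counter, without the component decomposition and caching optimizations discussed next; clauses as signed-integer lists):

```python
def count_models(clauses, n, assignment=None):
    """DPLL-style model counter: when all clauses are satisfied with only
    t of the n variables assigned, the branch contributes 2^(n-t) models."""
    assignment = assignment or {}
    open_clauses = []
    for clause in clauses:
        vals = [(assignment.get(abs(l)), l) for l in clause]
        if any(v is not None and (v == 1) == (l > 0) for v, l in vals):
            continue                      # clause already satisfied
        if all(v is not None for v, l in vals):
            return 0                      # clause falsified: dead branch
        open_clauses.append(clause)
    if not open_clauses:                  # all clauses satisfied
        return 2 ** (n - len(assignment))
    var = next(abs(l) for c in open_clauses for l in c
               if abs(l) not in assignment)
    return (count_models(clauses, n, {**assignment, var: 0}) +
            count_models(clauses, n, {**assignment, var: 1}))

print(count_models([[-1, 2, 3], [-2, -3, 4], [3, 1, 4]], 4))  # 10
```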
Constraint graph G: Variables of F are vertices An edge connects two vertices if corresponding variables appear in some clause of F Disjoint components of G lazily identified during DPLL search F1, F2, … Fn : subformulas of F corresponding to components |RF| = |RF1| * |RF2| * |RF3| * … Heuristic optimizations: Solve most constrained sub-problems first Solving sub-problems in interleaved manner
[sharpSAT: Thurley 2006] If the same sub-formula is revisited multiple times during DPLL search, cache the result and re-use it. A “signature” of the satisfiable sub-formula/component must be stored. Different forms of caching used: simple sub-formula caching, component caching, linear-space caching. Component caching can also be combined with clause learning [Cachet: Sang et al 2004].
WeightedCachet: DPLL + Caching for weighted assignments
Compile given formula to another form which allows counting models in time polynomial in representation size Reduced Ordered Binary Decision Diagrams (ROBDD) [Bryant 1986]: Construction can blow up exponentially Deterministic Decomposable Negation Normal Form (d-DNNF) [c2d: Darwiche 2004] Generalizes ROBDDs; can be significantly more succinct Negation normal form with following restrictions: Decomposability: All AND operators have arguments with disjoint support Determinizability: All OR operators have arguments with disjoint solution sets Sentential Decision Diagrams (SDD) [Darwiche 2011]
Knowledge compilation works well on large problem instances with special structure, but #P-completeness hits back eventually – scalability suffers!
[MBound: Gomes et al 2006; SampleCount: Gomes et al 2007; BPCount: Kroc et al 2008]
Provide lower and/or upper bounds of model count Usually more efficient than exact counters No approximation guarantees on bounds Useful only for limited applications
Pr[y = BGP(F)] = c if y ∈ RF, and 0 if y ∉ RF, where c > 0 is independent of y
Much more on this coming in Part 3
Almost-Uniform Generator ⇄ PAC Counter: polynomial reduction
Comparison of prior techniques – MCMC, SAT-based, BGP, BDD/exact – along the axes of performance and guarantees: no prior technique offered both strong guarantees and scalable performance.
[MBound: Gomes et al 2006; SampleCount: Gomes et al 2007] used random XORs
Algorithms geared towards finding bounds without approximation guarantees Power of 2-universal hashing not exploited
[2015: ICML, UAI; 2016: AAAI, ICML, AISTATS, …] Ermon et al used XOR hash functions for discrete counting/sampling
Random XORs, also XOR constraints with specific structures 2-universality exploited to provide improved guarantees Relaxed constraints (like short XORs) and their effects studied
Use random XORs to partition solutions into cells After partitioning into 2, 4, 8, 16, … cells Use Max Aposteriori Probability (MAP) optimizer to find solution with max weight in a cell (say, a2, a4, a8, a16, …) Estimated W(RF) = W(a2)*1 + W(a4)*2 + W(a8)* 4 + …
MAP is NP-complete. Being an optimization (not a decision) problem, MAP is harder to solve in practice than SAT.
Deeper dive into XOR hash-based counting and sampling Discuss theoretical aspects and experimental observations Based on work published in [2013: CP, CAV; 2014: DAC, AAAI; 2015: IJCAI, TACAS; 2016: AAAI, IJCAI, 2017: AAAI]
Discrete Sampling and Integration for the AI Practitioner Part III: Hashing-based Approach to Sampling and Integration
Discrete Integration and Sampling
– Variables X1, X2, · · · Xn over finite discrete domains D1, D2, · · · Dn – Formula ϕ over X1, X2, · · · Xn – Weight Function W : D1 × D2 · · · × Dn → [0, 1]
– If W (y) = 1 for all y, then W (ϕ) = |Sol(ϕ)|
Pr[y is sampled] ∝ W (y)
– If W (y) = 1 for all y, then uniformly sample from Sol(ϕ)
Part I: Discrete Integration
From Weighted to Unweighted Integration
A Boolean formula ϕ and weight function W : {0,1}^n → Q≥0 are reduced to a Boolean formula F′ such that W(ϕ) = c(W) × |Sol(F′)| (CFMV, IJCAI15)
How do we estimate |Sol(F′)|?
As Simple as Counting Dots
Partition the solution space into cells. Pick a random cell. Estimate = Number of solutions in the cell × Number of cells.
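A toy version of this estimator (our own sketch: brute-force enumeration stands in for the SAT solver, and the random cells are defined by random XOR constraints, as developed later in the slides):

```python
import itertools, random

def solutions(clauses, n):
    """Brute-force Sol(ϕ); clauses are lists of signed variable indices."""
    sols = []
    for bits in itertools.product([0, 1], repeat=n):
        a = {i + 1: bits[i] for i in range(n)}
        if all(any((a[abs(l)] == 1) == (l > 0) for l in c) for c in clauses):
            sols.append(bits)
    return sols

def estimate(clauses, n, m):
    """Partition {0,1}^n into 2^m cells with m random XORs; scale one cell."""
    rows = [[random.randint(0, 1) for _ in range(n)] for _ in range(m)]
    alpha = [random.randint(0, 1) for _ in range(m)]
    in_cell = [y for y in solutions(clauses, n)
               if all(sum(r[i] * y[i] for i in range(n)) % 2 == alpha[j]
                      for j, r in enumerate(rows))]
    return len(in_cell) * 2 ** m

clauses = [[-1, 2, 3], [-2, -3, 4], [3, 1, 4]]  # |Sol(ϕ)| = 10
ests = [estimate(clauses, 4, 2) for _ in range(2000)]
print(sum(ests) / len(ests))  # ≈ 10 on average: E[estimate] = |Sol(ϕ)|
```

Each solution lands in the chosen cell with probability 2^−m, so the estimator is unbiased; the three challenges below are about controlling its variance.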
Challenges
Challenge 1: How to partition into roughly equal small cells of solutions without knowing the distribution of solutions?
Challenge 2: How large is a “small” cell?
Challenge 3: How many cells?
Solution to Challenge 1: universal hash functions – Universal Hashing (Carter and Wegman 1977)
r-Universal Hashing
Let H be a family of hash functions h : {0,1}^n → {0,1}^m, with h chosen uniformly at random from H. H is r-universal if ∀ distinct y1, y2, ··· yr ∈ {0,1}^n and ∀ α1, α2, ··· αr ∈ {0,1}^m:
Pr[h(y1) = α1] = ··· = Pr[h(yr) = αr] = 1/2^m
Pr[h(y1) = α1 ∧ ··· ∧ h(yr) = αr] = (1/2^m)^r
Desired Properties
Let Z be the number of solutions in a randomly chosen cell α.
– What is E[Z] and how much does Z deviate from E[Z]?
– For each y ∈ Sol(ϕ), let Iy be the indicator of h(y) = α (y is in the cell); then Z = Σ_{y ∈ Sol(ϕ)} Iy
– Desired: E[Z] = |Sol(ϕ)| / 2^m and σ²[Z] ≤ E[Z]
– It suffices for H to be 2-universal
– Pr[ E[Z]/(1+ε) ≤ Z ≤ E[Z](1+ε) ] ≥ 1 − σ²[Z] / ( (ε/(1+ε))² (E[Z])² ) ≥ 1 − 1 / ( (ε/(1+ε))² E[Z] )
2-Universal Hash Functions
– Pick every variable with probability 1/2 and XOR them: e.g., X1 ⊕ X3 ⊕ X6 ⊕ ··· ⊕ Xn−2 ⊕ 1
– Expected size of each XOR: n/2
A cell is the solution set of m random XOR constraints:
X1 ⊕ X3 ⊕ X6 ⊕ ··· ⊕ Xn−2 ⊕ 1 = 0 (Q1)
X2 ⊕ X5 ⊕ X6 ⊕ ··· ⊕ Xn−1 ⊕ 1 = 1 (Q2)
··· (···)
X1 ⊕ X2 ⊕ X5 ⊕ ··· ⊕ Xn−2 = 1 (Qm)
“Modern SAT solvers are able to deal routinely with practical problems that involve many thousands of variables, although such problems were regarded as hopeless just a few years ago.” (Knuth, 2016)
But: solver performance degrades with increase in the size of XORs (SAT solvers != SAT oracles)
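This XOR family is h(y) = Ay ⊕ b over GF(2), with the entries of A and b chosen uniformly; a sketch that empirically checks the first 2-universality property, Pr[h(y) = α] = 1/2^m:

```python
import random

def random_hash(n, m):
    """Draw h(y) = A·y ⊕ b with uniform A ∈ {0,1}^{m×n}, b ∈ {0,1}^m."""
    A = [[random.randint(0, 1) for _ in range(n)] for _ in range(m)]
    b = [random.randint(0, 1) for _ in range(m)]
    def h(y):
        return tuple((sum(A[j][i] & y[i] for i in range(n)) + b[j]) % 2
                     for j in range(m))
    return h

n, m, trials = 5, 2, 4000
y, alpha = (1, 0, 1, 1, 0), (0, 1)
hits = sum(random_hash(n, m)(y) == alpha for _ in range(trials))
print(hits / trials)  # ≈ 1/2^m = 0.25
```

The uniform offset b alone makes h(y) uniform for any fixed y; the random matrix A is what gives pairwise (2-universal) independence.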
Improved Universal Hash Functions
Not all variables are required to specify a solution:
– F := X3 ⇐⇒ (X1 ∨ X2)
– X1 and X2 uniquely determine the rest of the variables (i.e., X3)
Independent Support I: a subset of variables such that if σ1, σ2 ∈ Sol(F) agree on I then σ1 = σ2 (CMV DAC14)
– {X1, X2} is an independent support but {X1, X3} is not
– It suffices to hash only on the independent support; the remaining variables are dependent (e.g., auxiliary variables introduced by CNF encoding, Tseitin 1968)
Algorithmic procedure to determine I?
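For tiny formulas the definition can be checked directly; a brute-force sketch for the example F := X3 ⇔ (X1 ∨ X2):

```python
import itertools

def F(x1, x2, x3):
    return x3 == (x1 or x2)

sols = [s for s in itertools.product([0, 1], repeat=3) if F(*s)]

def is_independent_support(indices):
    """I is an independent support iff no two distinct solutions agree on I."""
    for s1, s2 in itertools.combinations(sols, 2):
        if all(s1[i] == s2[i] for i in indices) and s1 != s2:
            return False
    return True

print(is_independent_support([0, 1]))  # {X1, X2}: True
print(is_independent_support([0, 2]))  # {X1, X3}: False
```

The SAT-based formulation that follows replaces this quadratic check over Sol(F) with UNSAT queries, so no enumeration is needed.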
Independent Support
I is an independent support: ∀σ1, σ2 ∈ Sol(F), if σ1 and σ2 agree on I then σ1 = σ2
Equivalently: F(x1, ··· xn) ∧ F(y1, ··· yn) ∧ ⋀_{i | xi ∈ I} (xi = yi) ⟹ ⋀_i (xi = yi), where F(y1, ··· yn) := F with each xi replaced by yi
So I is an independent support iff F(x1, ··· xn) ∧ F(y1, ··· yn) ∧ ⋀_{i | xi ∈ I} (xi = yi) ∧ ¬( ⋀_i (xi = yi) ) is UNSAT
Let H1 := {x1 = y1}, H2 := {x2 = y2}, ··· Hn := {xn = yn} and Ω = F(x1, ··· xn) ∧ F(y1, ··· yn) ∧ ¬( ⋀_i (xi = yi) )
Lemma: I = {xi} is an independent support iff HI ∧ Ω is UNSAT, where HI = {Hi | xi ∈ I}
Minimal Unsatisfiable Subset
Given Ψ = H1 ∧ H2 ∧ ··· ∧ Hm ∧ Ω:
Unsatisfiable Subset: find a subset {Hi1, Hi2, ··· Hik} of {H1, H2, ··· Hm} such that Hi1 ∧ Hi2 ∧ ··· ∧ Hik ∧ Ω is UNSAT
Minimal Unsatisfiable Subset: find a minimal such subset {Hi1, Hi2, ··· Hik}
Minimal Independent Support

H1 := (x1 = y1), H2 := (x2 = y2), ··· , Hn := (xn = yn)
Ω := F(x1, ··· , xn) ∧ F(y1, ··· , yn) ∧ ¬( ∧_i (xi = yi) )

Lemma: I = {xi | Hi ∈ HI} is a minimal independent support iff HI is a minimal unsatisfiable subset, where HI = {Hi | xi ∈ I}.

MIS ⇐⇒ MUS

Two orders of magnitude improvement in runtime.
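Following the MIS-as-MUS reduction, a minimal independent support can be extracted with a standard deletion-based MUS loop: start from the full support and try to drop one Hi at a time, keeping the drop whenever the UNSAT check still holds. A self-contained sketch, again with brute-force enumeration in place of the solver calls (helper names are illustrative):

```python
from itertools import product

def solutions(F, n):
    return [s for s in product([False, True], repeat=n) if F(*s)]

def is_support(sols, I):
    # HI & Omega is UNSAT  <=>  any two solutions agreeing on I are equal
    return not any(x != y and all(x[i] == y[i] for i in I)
                   for x in sols for y in sols)

def minimal_independent_support(F, n):
    """Deletion-based MUS loop over H1..Hn: try to drop each Hi
    (i.e., each variable) and keep the drop if the check still holds."""
    sols = solutions(F, n)
    I = set(range(n))              # the full support is trivially independent
    for i in range(n):
        if is_support(sols, I - {i}):
            I.discard(i)           # Hi is redundant; remove it for good
    return sorted(I)

F = lambda x1, x2, x3: x3 == (x1 or x2)
print(minimal_independent_support(F, 3))   # [0, 1], i.e. {X1, X2}
```

Each deletion attempt costs one UNSAT check, so the loop makes n solver calls; production MUS extractors (as used in the cited work) are considerably more refined.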
Challenges

Challenge 1: How to partition into roughly equal small cells of solutions, without knowing the distribution of solutions? → Universal Hash Functions
Challenge 2: How large is a “small” cell?
Challenge 3: How many cells?
Challenge 2: How large is a “small” cell?

Let Z be the number of solutions in a randomly chosen cell. With a 2-universal (pairwise independent) hash family, Chebyshev’s inequality gives

Pr[ E[Z]/(1+ε) ≤ Z ≤ E[Z](1+ε) ] ≥ 1 − 1 / ( (ε/(1+ε))² E[Z] )

We want a “small” cell to have roughly thresh solutions, where thresh = 5(1 + 1/ε)²; then the right-hand side above is at least 1 − 1/5 = 4/5.
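The role of pairwise independence can be checked empirically: under a random XOR hash h(x) = Ax + b over GF(2), the number of solutions Z in any fixed cell has E[Z] = |S|/2^m and concentrates around it. A small simulation on a synthetic solution set (all parameters here are illustrative, not from the papers):

```python
import random

random.seed(0)
n, m = 12, 4                                  # 12 variables, 2^m = 16 cells
S = random.sample(range(2 ** n), 2048)        # stand-in for Sol(phi)

def random_xor_hash(n, m):
    """h(x) = Ax + b over GF(2): m random XOR constraints."""
    A = [random.getrandbits(n) for _ in range(m)]
    b = random.getrandbits(m)
    def h(x):
        cell = 0
        for i, row in enumerate(A):
            cell |= (bin(row & x).count("1") & 1) << i   # parity of selected bits
        return cell ^ b
    return h

def cell_size():
    h = random_xor_hash(n, m)
    return sum(1 for x in S if h(x) == 0)     # Z: # of solutions in one fixed cell

trials = 30
avg = sum(cell_size() for _ in range(trials)) / trials
print(avg, len(S) / 2 ** m)                   # empirical mean of Z vs. E[Z] = 128
```

Averaged over a few independent hashes, the observed cell size sits close to |S|/2^m = 128, as the Chebyshev bound predicts.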
Challenge 3: How many cells?

A cell is “small” if it has at most thresh = 5(1 + 1/ε)² solutions.

– Check for every m = 0, 1, ··· , n if the number of solutions in a cell is ≤ thresh
– XORs for each m must be independently chosen:
◮ Query 1: Is #(F ∧ Q^1_1) ≤ thresh?
◮ Query 2: Is #(F ∧ Q^2_1 ∧ Q^2_2) ≤ thresh?
◮ ···
◮ Query n: Is #(F ∧ Q^n_1 ∧ ··· ∧ Q^n_n) ≤ thresh?
– Stop at the first m where Query m returns YES and return the estimate #(F ∧ Q^m_1 ∧ ··· ∧ Q^m_m) × 2^m

(CMV, CP13) (CFMSV, AAAI14)
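The linear-search procedure can be prototyped end to end on a toy formula, with exhaustive enumeration standing in for the SAT-oracle counting calls; the formula, the thresh value, and the averaging over ten runs are illustrative choices, not part of the algorithm's specification:

```python
import random
from itertools import product

random.seed(1)
n = 10
phi = lambda x: sum(x) % 3 == 0                  # toy formula: popcount divisible by 3
sols = [x for x in product((0, 1), repeat=n) if phi(x)]

def xor_constraint(n):
    coeffs = [random.randrange(2) for _ in range(n)]
    rhs = random.randrange(2)
    return lambda x: sum(c & v for c, v in zip(coeffs, x)) % 2 == rhs

def approxmc_core(sols, n, thresh):
    """For each m, pick fresh independent XORs Q^m_1..Q^m_m; stop at the
    first m whose cell is small and scale up by the number of cells 2^m."""
    for m in range(n + 1):
        qs = [xor_constraint(n) for _ in range(m)]
        cell = [x for x in sols if all(q(x) for q in qs)]
        if len(cell) <= thresh:
            return len(cell) * 2 ** m
    return len(sols)

est = sum(approxmc_core(sols, n, thresh=24) for _ in range(10)) / 10
print(len(sols), est)        # exact count is 341; est should be in its ballpark
```

Note that every value of m draws fresh XORs, which is exactly the inefficiency the next slides attack.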
ApproxMC(F, ε, δ)

[Diagram: add one random XOR at a time, asking the SAT solver “# of sols ≤ thresh?”; on No, add another XOR; on the first Yes, return Estimate = # of sols in the cell × # of cells]
ApproxMC(F, ε, δ): Theoretical Guarantees

Theorem (Correctness): Pr[ |Sol(ϕ)|/(1+ε) ≤ ApproxMC(F, ε, δ) ≤ |Sol(ϕ)|(1+ε) ] ≥ 1 − δ

Theorem (Complexity): ApproxMC(F, ε, δ) makes O( n log(1/δ) / ε² ) calls to SAT oracle; the prior hashing-based algorithm makes O( n log n log(1/δ) / ε ) calls to SAT oracle (Stockmeyer 1983).

Runtime performance:
– Handles thousands of variables in a few hours, but that is insufficient for practical applications
– How to scale to hundreds of thousands of variables and beyond?
– Efficient SAT oracle calls?
Beyond ApproxMC

Classical View
– Query 1: Is #(F ∧ Q^1_1) ≤ thresh?
– Query 2: Is #(F ∧ Q^2_1 ∧ Q^2_2) ≤ thresh?
– ···
– Query n: Is #(F ∧ Q^n_1 ∧ ··· ∧ Q^n_n) ≤ thresh?

Practitioner’s View
– Solving (F ∧ Q^1_1) followed by (F ∧ Q^2_1 ∧ Q^2_2) requires larger runtime than solving (F ∧ Q^1_1) followed by (F ∧ Q^1_1 ∧ Q^2_2)
– If (F ∧ Q^1_1) =⇒ L, then (F ∧ Q^1_1 ∧ Q^2_2) =⇒ L: a clause L learned on the first query can be reused on the second
– But if (F ∧ Q^1_1) =⇒ L, it is not always the case that (F ∧ Q^2_1 ∧ Q^2_2) =⇒ L
Beyond ApproxMC

Reuse a single sequence of XORs Q1, Q2, ··· across all queries:
– Query 1: Is #(F ∧ Q1) ≤ thresh?
– Query 2: Is #(F ∧ Q1 ∧ Q2) ≤ thresh?
– ···
– Query n: Is #(F ∧ Q1 ∧ Q2 ∧ ··· ∧ Qn) ≤ thresh?
– Stop at the first m where Query m returns YES and return the estimate #(F ∧ Q1 ∧ Q2 ∧ ··· ∧ Qm) × 2^m

– If Query i returns YES, then Query i + 1 must return YES
– Galloping search (# of SAT calls: O(log n))
– Incremental solving

– Independence of the queries is crucial to the classical analysis (Stockmeyer 1983, ···)
– But Pr[Query i returns YES] is small for i ≪ m∗
– Dependence of Query j upon Query i (i < j) does not hurt

(CMV, IJCAI16)
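With a single XOR sequence Q1, Q2, ···, the YES answers form a monotone suffix, so the first YES index m can be found by galloping (exponential-then-binary) search in O(log n) queries instead of n. A generic sketch; the `query` callback stands in for the SAT call, and the oracle below is synthetic:

```python
def first_yes(query, n):
    """Smallest m in 1..n with query(m) True, assuming monotonicity:
    query(i) True implies query(i+1) True."""
    # Gallop: double the upper bound until a YES is seen.
    hi = 1
    while hi < n and not query(hi):
        hi = min(2 * hi, n)
    if not query(hi):
        return None                 # no YES anywhere in 1..n
    lo = hi // 2 + 1 if hi > 1 else 1
    # Binary search on [lo, hi]; query(hi) is known YES.
    while lo < hi:
        mid = (lo + hi) // 2
        if query(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# Synthetic monotone oracle: YES exactly for m >= 13
calls = []
ans = first_yes(lambda m: (calls.append(m), m >= 13)[1], n=64)
print(ans, len(calls))              # finds 13 with far fewer than 64 queries
```

Galloping rather than plain binary search pays off because the stopping point m∗ is typically far smaller than n, so the early, cheap queries dominate.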
Taming the Curse of Dependence

Let 2^{m∗} = |Sol(ϕ)| / thresh.

Lemma (1): ApproxMC(F, ε, δ) terminates with m ∈ {m∗ − 1, m∗} with probability ≥ 0.8
Lemma (2): For m ∈ {m∗ − 1, m∗}, the estimate obtained from a randomly picked cell lies within a tolerance of ε of |Sol(ϕ)| with probability ≥ 0.8
Optimized ApproxMC(F, ε, δ)

Theorem (Correctness): Pr[ |Sol(ϕ)|/(1+ε) ≤ ApproxMC(F, ε, δ) ≤ |Sol(ϕ)|(1+ε) ] ≥ 1 − δ

Theorem (Complexity): ApproxMC(F, ε, δ) makes O( log n log(1/δ) / ε² ) calls to SAT oracle.

Theorem (FPRAS for DNF): If ϕ is a DNF formula, then ApproxMC is an FPRAS, fundamentally different from the only other known FPRAS for DNF (Karp, Luby 1983).
Beyond Boolean: Handling bit-vectors

– Bit-vector constraints can be translated into a Boolean formula
– Few cells =⇒ too many solutions in a cell
– Too many cells =⇒ no solutions in most of the cells
HSMT: Efficient Word-level Hash Function

Choose the number of cells N as a product of prime powers, i.e., N = p1^c1 · p2^c2 · p3^c3 ··· pn^cn, and hash with word-level modular constraints:
– c1 (mod p1) constraints
– c2 (mod p2) constraints
– ···

From timeouts to under 40 seconds. [Plot: performance of RDA vs. performance of ApproxMC]

(DMPV, AAAI17)
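The word-level family can be illustrated with a tiny prototype: with N = 3² · 5 = 45 cells, hash a vector of words through linear congruences, two modulo 3 and one modulo 5, with no bit-blasting. Everything below (primes, coefficients, the synthetic solution set) is illustrative and not the DMPV construction verbatim; in particular, the coefficients are fixed here for determinism where the real construction draws them at random:

```python
import random

random.seed(2)
# N = 3^2 * 5 = 45 cells: c1 = 2 constraints mod p1 = 3, c2 = 1 mod p2 = 5.
cons = [(3, (1, 0, 1), 0),   # (a . X + b) mod 3
        (3, (0, 1, 2), 1),   # second, linearly independent, mod-3 constraint
        (5, (1, 2, 3), 4)]   # one mod-5 constraint

def h(X):
    """Word-level hash of a vector of words X: a tuple of modular residues."""
    return tuple((sum(a * x for a, x in zip(A, X)) + b) % p for p, A, b in cons)

# Stand-in for a word-level Sol(phi): 4500 random 3-word vectors of 16-bit words
S = [tuple(random.randrange(1 << 16) for _ in range(3)) for _ in range(4500)]

cells = {}
for X in S:
    cells.setdefault(h(X), []).append(X)
sizes = sorted(len(v) for v in cells.values())
print(len(cells), sizes[0], sizes[-1])   # 45 cells, each with roughly 100 vectors
```

The two mod-3 rows must be linearly independent over F3, otherwise they carve out only 3, not 9, residue classes; that is why the sketch pins them explicitly.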
Highly Accurate Estimates

[Figure: observed relative error on benchmark G5: relative error (≈ 0.02 to 0.14) over source node × terminal node (1 to 60); ε = 0.8, δ = 0.1]
Beyond Network Reliability

ApproxMC applies to:
– Network Reliability (DMPV, AAAI17)
– Probabilistic Inference (CFMSV, AAAI14), (IMMV, CP15), (CFMV, IJCAI15), (CMMV, AAAI16), (CMV, IJCAI16)
– Decision Making Under Uncertainty (CMV, IJCAI16)
– Quantified Information Flow (Fremont, Rabe and Seshia 2017)
– Program Synthesis (CFMSV, AAAI14), Fremont et al 2017, Ellis et al 2017

Part II: Discrete Sampling
Discrete Sampling

Given:
– Boolean variables X1, X2, ··· Xn
– Formula ϕ over X1, X2, ··· Xn

Uniform sampling: for every y ∈ Sol(ϕ), Pr[y is output] = 1 / |Sol(ϕ)|
Almost-uniform sampling: for every y ∈ Sol(ϕ), 1 / ((1 + ε)|Sol(ϕ)|) ≤ Pr[y is output] ≤ (1 + ε) / |Sol(ϕ)|
As simple as sampling dots

Pick a random cell; enumerate all the solutions in it and pick a random solution.

Challenge: How many cells?
How many cells?

– Ideally m∗ = log( |Sol(ϕ)| / thresh ), but determining |Sol(ϕ)| is expensive
– ApproxMC(F, ε, δ) returns C such that Pr[ |Sol(ϕ)|/(1+ε) ≤ C ≤ |Sol(ϕ)|(1+ε) ] ≥ 1 − δ
– Take m̃ = log( C / thresh ) (whereas m∗ = log( |Sol(ϕ)| / thresh ))
– Check for m = m̃ − 1, m̃, m̃ + 1 whether a randomly chosen cell is small
– Not just a practical hack: it required a non-trivial proof

(CMV, CAV13) (CMV, DAC14) (CFMSV, TACAS15)
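The pipeline above, one counting call to fix m̃ followed by random cells at m ∈ {m̃ − 1, m̃, m̃ + 1}, can be sketched with exhaustive enumeration in place of the SAT calls. The toy formula and thresh are illustrative, and the exact count stands in for the ApproxMC estimate:

```python
import math
import random
from itertools import product

random.seed(3)
n = 10
phi = lambda x: x[0] or x[1]                    # toy formula
sols = [x for x in product((0, 1), repeat=n) if phi(x)]

def random_cell(sols, n, m):
    """Solutions of phi that also satisfy m random XOR constraints."""
    rows = [(random.getrandbits(n), random.randrange(2)) for _ in range(m)]
    in_cell = lambda x: all(
        sum(v for i, v in enumerate(x) if (a >> i) & 1) % 2 == b
        for a, b in rows)
    return [x for x in sols if in_cell(x)]

def unigen_sample(sols, n, thresh=40):
    C = len(sols)               # stand-in for the ApproxMC estimate of |Sol(phi)|
    m_tilde = max(round(math.log2(C / thresh)), 0)
    for m in (m_tilde - 1, m_tilde, m_tilde + 1):
        cell = random_cell(sols, n, max(m, 0))
        if 1 <= len(cell) <= thresh:
            return random.choice(cell)          # uniform within a small random cell
    return random.choice(sols)                  # fallback; rare in practice
```

Because the counting call happens once, its cost is amortized across all subsequent samples, which is what makes the sampler cheap per sample.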
Theoretical Guarantees

Theorem (Almost-Uniformity): ∀y ∈ Sol(ϕ), 1 / ((1+ε)|Sol(ϕ)|) ≤ Pr[y is output] ≤ (1+ε) / |Sol(ϕ)|

Theorem (Query): For a formula ϕ over n variables, to generate m samples, UniGen makes ··· calls to a SAT oracle.

Universality: random XORs are 3-universal (3-wise independence, stronger than the pairwise independence used for counting).
Three Orders of Improvement

Generator                              Relative Runtime
SAT Solver                             1
Desired Uniform Generator              10
UniGen                                 20
UniGen (two cores)                     10
XORSample (2012 state of the art)      50000

Experiments over 200+ benchmarks.

UniGen is highly parallelizable: it achieves linear speedup, i.e., runtime decreases linearly with the number of processors. Closer to technical transfer.
Uniformity

[Plot: distribution of sample frequencies from UniGen vs. an ideal uniform sampler: statistically indistinguishable]

Beyond Verification

UniGen applies to: Hardware Validation, Music Improvisation, Probabilistic Reasoning, Program Analysis, Problem Generation
Towards Discrete Sampling and Integration Revolution

[Plot: phase transition of random formulas over r, the density of 3-clauses (1 to 6), and s, the density of XOR-clauses (0.0 to 1.2); shading gives the satisfiable fraction from 0.00 to 1.00] (DMV, IJCAI16)
Summary

Science
– Applications ranging from network reliability, probabilistic inference, and side-channel attacks to hardware verification
– Rigorous hashing-based algorithms demonstrate scalability: from problems with tens of variables to hundreds of thousands of variables

Generator                              Relative Runtime
SAT Solver                             1
Desired Uniform Generator              10
UniGen                                 20
UniGen (two cores)                     10
XORSample                              50000