Discrete Sampling and Integration for the AI Practitioner


Discrete Sampling and Integration for the AI Practitioner Supratik Chakraborty, IIT Bombay Kuldeep S. Meel, Rice University Moshe Y. Vardi, Rice University Agenda Part 1: Boolean Satisfiability Solving (Vardi) Part 2(a): Applications


  1. Knuth Gets His Satisfaction SIAM News, July 26, 2016: “Knuth Gives Satisfaction in SIAM von Neumann Lecture” Donald Knuth gave the 2016 John von Neumann lecture at the SIAM Annual Meeting. The von Neumann lecture is SIAM’s most prestigious prize. Knuth based the lecture, titled ”Satisfiability and Combinatorics”, on the latest part (Volume 4, Fascicle 6) of his The Art of Computer Programming book series. He showed us the first page of the fascicle, aptly illustrated with the quote ”I can’t get no satisfaction,” from the Rolling Stones. In the preface of the fascicle Knuth says ”The story of satisfiability is the tale of a triumph of software engineering, blended with rich doses of beautiful mathematics”. 25

  2. SAT Heuristic – Backjumping Backtracking : go up one level in the search tree when both Boolean values for a variable have been tested. Backjumping [Stallman-Sussman, 1977]: jump back in the search tree, if the jump is safe – use the highest node to jump to. Key : Distinguish between • Decision variable : variable that is chosen and then assigned, first c and then 1 − c . • Implication variable : assignment to variable is forced by a unit clause. Implication Graph : directed acyclic graph describing the relationships between decision variables and implication variables. 26

  3. Smart Unit-Clause Preference Boolean Constraint Propagation (BCP) : propagating values forced by unit clauses. • Empirical Observation : BCP can consume up to 80% of SAT solving time! Requirement : identifying unit clauses • Naive Method : associate a counter with each clause and update counter appropriately, upon assigning and unassigning variables. • Two-Literal Watching [Moskewicz-Madigan-Zhao-Zhang-Malik, 2001]: “watch” two un-false literals in each unsatisfied clause – no overhead for backjumping. 27
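
Since BCP dominates solver runtime, it helps to see what the naive clause-scanning method actually does. Below is a minimal Python sketch of unit propagation without counters or watched literals, intended only to illustrate the mechanics that two-literal watching optimizes; the clause encoding and helper names are illustrative, not from the slides.

```python
# A minimal sketch of naive Boolean Constraint Propagation (BCP): repeatedly
# satisfy unit clauses until fixpoint or conflict. Clauses are lists of
# non-zero ints (DIMACS-style literals); assignment maps var -> bool.
def bcp(clauses, assignment):
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                var, want = abs(lit), lit > 0
                if var in assignment:
                    if assignment[var] == want:
                        satisfied = True
                        break
                else:
                    unassigned.append(lit)
            if satisfied:
                continue
            if not unassigned:
                return None           # conflict: clause falsified
            if len(unassigned) == 1:  # unit clause: force the literal
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment

# Example: (x1 ∨ x2) ∧ (¬x2) forces x2 = False, then x1 = True.
print(bcp([[1, 2], [-2]], {}))
```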

  4. SAT Heuristic – Clause Learning Conflict-Driven Clause Learning : If assignment ⟨ l 1 , . . . , l n ⟩ is bad, then add clause ¬ l 1 ∨ . . . ∨ ¬ l n to block it. Marques-Silva & Sakallah, 1996: This would add very long clauses! Instead: • Analyze implication graph for chain of reasoning that led to bad assignment. • Add a short clause to block said chain. • The “learned” clause is a resolvent of prior clauses. Consequence : • Combine search with inference ( resolution ). • Algorithm uses exponential space; “forgetting” heuristics required. 28

  5. Smart Decision Heuristic Crucial : Choosing decision variables wisely! Dilemma : brainiac vs. speed demon • Brainiac : chooses very wisely, to maximize BCP – decision-time overhead! • Speed Demon : chooses very fast, to minimize decision time – many decisions required! VSIDS [Moskewicz-Madigan-Zhao-Zhang-Malik, 2001]: Variable State Independent Decaying Sum – prioritize variables according to recent participation in conflicts – a compromise between Brainiac and Speed Demon. 29

  6. Randomized Restarts Randomized Restart [Gomes-Selman-Kautz, 1998] • Stop search • Reset all variables • Restart search • Keep learned clauses Aggressive Restarting : restart every ∼ 50 backtracks. 30

  7. SMT: Satisfiability Modulo Theory SMT Solving : Solve Boolean combinations of constraints in an underlying theory, e.g., linear constraints, combining SAT techniques and domain-specific techniques. • Tremendous progress since 2000! Example, SMT(LA): ( x > 10) ∧ (( x > 5) ∨ ( x < 8)) Sample Application : Bounded Model Checking of Verilog programs – SMT(BV). 31
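
The example formula can be handed directly to an off-the-shelf SMT solver. The sketch below uses the Z3 Python bindings (Z3 is mentioned on a later slide; this assumes the z3-solver package is installed) to check satisfiability of the linear-arithmetic formula above.

```python
# Check (x > 10) ∧ ((x > 5) ∨ (x < 8)) over the integers with Z3's Python API.
from z3 import Int, Solver, Or, sat

x = Int('x')
s = Solver()
s.add(x > 10, Or(x > 5, x < 8))   # Boolean combination of theory atoms
if s.check() == sat:
    print(s.model())              # e.g. x = 11
```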

  8. SMT Solving General Approach : combine SAT-solving techniques with theory-solving techniques • Consider formula as Boolean formula over theory atoms. • Solve Boolean formula; obtain conjunction of theory atoms. • Use theory solver to check if conjunction is satisfiable. Crux : Interaction between SAT solver and theory solver, e.g., conflict-clause learning – convert unsatisfiable theory-atom conjunction to a new Boolean clause. 32

  9. Applications of SAT/SMT Solving in SW Engineering Leonardo de Moura + Nikolaj Bjørner, 2012: Applications of Z3 at Microsoft • Symbolic execution • Model checking • Static analysis • Model-based design • . . . 33

  10. Reflection on P vs. NP Old Cliché : “What is the difference between theory and practice? In theory, they are not that different, but in practice, they are quite different.” P vs. NP in practice : • P = NP: Conceivably, NP-complete problems can be solved in polynomial time, but the polynomial is n^1,000 – impractical ! • P ≠ NP: Conceivably, NP-complete problems can be solved by n^(log log log n) operations – practical ! Conclusion : No guarantee that solving P vs. NP would yield practical benefits. 34

  11. Are NP-Complete Problems Really Hard? • When I was a graduate student, SAT was a “scary” problem, not to be touched with a 10-foot pole. • Indeed, there are SAT instances with a few hundred variables that cannot be solved by any extant SAT solver. • But today’s SAT solvers, which enjoy wide industrial usage, routinely solve real-life SAT instances with millions of variables! Conclusion : We need a richer and broader complexity theory, a theory that would explain both the difficulty and the easiness of problems like SAT. Question : Now that SAT is “easy” in practice, how can we leverage that? • Is BPP^NP the “new” PTIME ? 35

  12. Notation • Given – X 1 , … X n : variables with finite discrete domains D 1 , … D n – Constraint (logical formula) ϕ over X 1 , … X n – Weight function W: D 1 × … × D n → Q ≥ 0 – Sol( ϕ ) : set of assignments of X 1 , … X n satisfying ϕ • Discrete Integration : Determine W( ϕ ) = ∑ y ∈ Sol( ϕ ) W(y) – If W(y) = 1 for all y, then W( ϕ ) = | Sol( ϕ ) | (Model Counting) • Discrete Sampling : Randomly sample from Sol( ϕ ) such that Pr[y is sampled] ∝ W(y) – If W(y) = 1 for all y, then uniformly sample from Sol( ϕ ) • For this tutorial: Initially, D i ’s are {0,1} – Boolean variables; later, we’ll consider D i ’s as {0, 1} n – Bit-vector variables 1
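
To make the two definitions concrete, here is a brute-force Python sketch of discrete integration over Boolean variables: it enumerates all assignments, keeps the satisfying ones, and sums the product of literal weights. The predicate and weights in the example are illustrative assumptions; real instances are far too large for enumeration, which is what motivates the rest of the tutorial.

```python
# Brute-force W(phi) = sum over satisfying assignments of the product of
# literal weights; feasible only for tiny n, shown purely for intuition.
from itertools import product

def weighted_count(variables, phi, w_pos, w_neg):
    total = 0.0
    for values in product([False, True], repeat=len(variables)):
        y = dict(zip(variables, values))
        if phi(y):
            w = 1.0
            for v in variables:
                w *= w_pos[v] if y[v] else w_neg[v]
            total += w
    return total

# Example: phi = b OR e, with W(b)=0.8, W(e)=0.9 and negative literals weighted 1.
print(weighted_count(['b', 'e'],
                     lambda y: y['b'] or y['e'],
                     {'b': 0.8, 'e': 0.9},
                     {'b': 1.0, 'e': 1.0}))
```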

  13. Closer Look At Some Applications • Discrete Integration – Probabilistic Inference – Network (viz. electrical grid) reliability – Quantitative Information flow – And many more … • Discrete Sampling – Constrained random verification – Automatic problem generation – And many more … 2

  14. Application 1: Probabilistic Inference – An alarm rings if it’s in a working state when an earthquake happens or a burglary happens – The alarm can malfunction and ring without earthquake or burglary happening – Given that the alarm rang, what is the likelihood that an earthquake happened? – Given conditional dependencies (and conditional probabilities) calculate Pr[event | evidence] – What is Pr [Earthquake | Alarm] ? 3

  15. Probabilistic Inference: Bayes’ Rule Pr[ event i | evidence ] = Pr[ event i ∩ evidence ] / Pr[ evidence ] = Pr[ event i ∩ evidence ] / ∑ j Pr[ event j ∩ evidence ], where Pr[ event j ∩ evidence ] = Pr[ evidence | event j ] × Pr[ event j ]. How do we represent conditional dependencies efficiently, and calculate these probabilities? 4

  16. Probabilistic Inference: Graphical Models [Figure: Bayesian network over nodes B, E, A with conditional probability table Pr(A|E,B)] Conditional Probability Tables (CPT) 5

  17. Probabilistic Inference: First Principle Calculation [Figure: Bayesian network over nodes B, E, A with conditional probability table Pr(A|E,B)] Pr[ E ∩ A ] = Pr[ E ] ∗ Pr[ ¬B ] ∗ Pr[ A | E, ¬B ] + Pr[ E ] ∗ Pr[ B ] ∗ Pr[ A | E, B ] 6

  18. Probabilistic Inference: Logical Formulation V = {v A , v ~A , v B , v ~B , v E , v ~E } Prop vars corresponding to events T = {t A|B,E , t ~A|B,E , t A|B,~E …} Prop vars corresponding to CPT entries Formula encoding probabilistic graphical model ( ϕ PGM ): (v A ⊕ v ~A ) ∧ (v B ⊕ v ~B ) ∧ (v E ⊕ v ~E ) Exactly one of v A and v ~A is true ∧ (t A|B,E ⇔ v A ∧ v B ∧ v E ) ∧ (t ~A|B,E ⇔ v ~A ∧ v B ∧ v E ) ∧ … If v A , v B , v E are true, so must t A|B,E be, and vice versa 7

  19. Probabilistic Inference: Logic and Weights V = {v A , v ~A , v B , v ~B , v E , v ~E } T = {t A|B,E , t ~A|B,E , t A|B,~E …} W(v ~B ) = 0.2, W(v B ) = 0.8 Probabilities of indep events are weights of +ve literals W(v ~E ) = 0.1, W(v E ) = 0.9 W(t A|B,E ) = 0.3, W(t ~A|B,E ) = 0.7, … CPT entries are weights of +ve literals W(v A ) = W(v ~A ) = 1 Weights of vars corresponding to dependent events W( ¬ v ~B ) = W( ¬ v B ) = W( ¬ t A|B,E ) … = 1 Weights of -ve literals are all 1 Weight of assignment (v A = 1, v ~A = 0, t A|B,E = 1, …) = W(v A ) * W( ¬ v ~A )* W( t A|B,E )* … 8 Product of weights of literals in assignment

  20. Probabilistic Inference: Discrete Integration V = {v A , v ~A , v B , v ~B , v E , v ~E } T = {t A|B,E , t ~A|B,E , t A|B,~E …} Formula encoding combination of events in probabilistic model (Alarm and Earthquake) F = ϕ PGM ∧ v A ∧ v E Set of satisfying assignments of F: R F = { (v A = 1, v E = 1, v B = 1, t A|B,E = 1, all else 0), (v A = 1, v E = 1, v ~B = 1, t A|~B,E = 1, all else 0) } Weight of satisfying assignments of F: W(R F ) = W(v A ) * W(v E ) * W(v B ) * W(t A|B,E ) + W(v A ) * W(v E ) * W(v ~B ) * W(t A|~B,E ) = 1* Pr[E] * Pr[B] * Pr[A | B,E] + 1* Pr[E] * Pr[~B] * Pr[A | ~B,E] = Pr[ A ∩ E] 9
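
A quick numeric check of the expression above, using the weights from slide 19 (Pr[B] = 0.8, Pr[E] = 0.9, Pr[A|B,E] = 0.3); the value Pr[A|~B,E] = 0.1 is not given on the slides and is assumed here purely for illustration.

```python
# A small worked instance of W(R_F) = Pr[A ∩ E] from the two satisfying
# assignments above, with one assumed CPT entry.
pr_B, pr_E = 0.8, 0.9
pr_A_given_B_E, pr_A_given_notB_E = 0.3, 0.1   # second value assumed

W_RF = pr_E * pr_B * pr_A_given_B_E + pr_E * (1 - pr_B) * pr_A_given_notB_E
print(W_RF)   # Pr[A ∩ E] = 0.234 under these numbers
```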

  21. Application 2: Network Reliability Graph G = (V, E) represents a (power-grid) network • Nodes (V) are towns, villages, power stations • Edges (E) are power lines • Assume each edge e fails with prob g(e) ∈ [0,1] • Assume failure of edges statistically independent • What is the probability that s and t become disconnected? [Figure: example network with designated source s and target t] 10

  22. Network Reliability: First Principles Modeling π : E → {0, 1} … configuration of network – π (e) = 0 if edge e has failed, 1 otherwise. Probability of network being in configuration π : Pr[ π ] = Π e: π (e)=0 g(e) × Π e: π (e)=1 (1 - g(e)). Probability of s and t being disconnected: P ds,t = Σ π : s, t disconnected in π Pr[ π ] – may need to sum over numerous (> 2^100) configurations 11
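
The summation over configurations can be written down directly for a toy network. The sketch below enumerates every edge configuration of a tiny graph, computes Pr[π], and adds it up whenever s and t are disconnected; the graph and failure probabilities are illustrative assumptions, and the exponential enumeration is exactly what the discrete-integration formulation is meant to avoid.

```python
# First-principles P_ds,t: sum Pr[pi] over configurations where s, t disconnected.
from itertools import product

def disconnection_prob(nodes, edges, g, s, t):
    total = 0.0
    for alive in product([0, 1], repeat=len(edges)):
        # probability of this configuration pi
        p = 1.0
        for bit, e in zip(alive, edges):
            p *= (1 - g[e]) if bit else g[e]
        # check s-t connectivity over surviving edges (simple DFS)
        adj = {v: set() for v in nodes}
        for bit, (u, v) in zip(alive, edges):
            if bit:
                adj[u].add(v); adj[v].add(u)
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w); stack.append(w)
        if t not in seen:
            total += p
    return total

# Example: a path s - m - t where each edge fails with probability 0.1.
edges = [('s', 'm'), ('m', 't')]
g = {e: 0.1 for e in edges}
print(disconnection_prob(['s', 'm', 't'], edges, g, 's', 't'))  # 1 - 0.9*0.9 = 0.19
```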

  23. Network Reliability: Discrete Integration • p v : Boolean variable for each v in V • q e : Boolean variable for each e in E • ϕ s,t (p v1 , … p vn , q e1 , … q em ) : Boolean formula such that satisfying assignments σ of ϕ s,t have 1-1 correspondence with configurations π that disconnect s and t, and W( σ ) = Pr[ π ]. Then P ds,t = Σ π : s, t disconnected in π Pr[ π ] = Σ σ ⊨ ϕ s,t W( σ ) = W( ϕ s,t ) 12

  24. Application 3: Quantitative Information Flow • A password-checker PC takes a secret password (SP) and a user input (UI) and returns “Yes” iff SP = UI [Bang et al 2016] – Suppose passwords are 4 characters (‘0’ through ‘9’) long
PC1 (char[] SP, char[] UI) {
  for (int i=0; i<SP.length(); i++) {
    if (SP[i] != UI[i]) return “No”;
  }
  return “Yes”;
}
PC2 (char[] H, char[] L) {
  match = true;
  for (int i=0; i<SP.length(); i++) {
    if (SP[i] != UI[i]) match = false;
    else match = match;
  }
  if (match) return “Yes”; else return “No”;
}
Which of PC1 and PC2 is more likely to leak information about the secret key through side-channel observations? 13

  25. QIF: Some Basics • Program P receives some “high” input (H) and produces a “low” (L) output – Password checking: H is SP , L is time taken to answer “Is SP = UI?” – Side-channel observations: memory, time … • Adversary may infer partial information about H on seeing L – E.g. in password checking, infer: 1st char of password is not 9 . • Can we quantify “leakage of information”? “initial uncertainty in H” = “info leaked” + “remaining uncertainty in H” [Smith 2009] • Uncertainty and information leakage usually quantified using information theoretic measures, e.g. Shannon entropy 14

  26. QIF: First Principles Approach • Password checking: Observed time to answer “Yes”/“No” – Depends on # instructions executed • E.g. SP = 00700700: for UI = N2345678 with N ≠ 0, PC1 executes the for loop once; for UI = 02345678, PC1 executes the for loop at least twice. Observing the time to “No” gives away whether the 1st char is N or not (N ≠ 0). In 10 attempts, the 1st char of SP can be uniquely determined. In max 40 attempts, SP can be cracked. (PC1 code as on slide 24) 15

  27. QIF: First Principles Approach • Password checking: Observed time to answer “Yes”/“No” – Depends on # instructions executed • E.g. SP = 00700700: for UI = N2345678 with N ≠ 0, PC2 executes the for loop 4 times; for UI = 02345678, PC2 also executes the for loop 4 times. Cracking SP requires max 10^4 attempts !!! (“less leakage”) (PC2 code as on slide 24) 16

  28. QIF: Partitioning Space of Secret Password • Observable time effectively partitions values of SP [Bultan 2016] [Figure: execution tree of PC1 – SP[0] != UI[0] gives “No” at t = 3; SP[0] = UI[0] and SP[1] != UI[1] gives “No” at t = 5; mismatches first occurring at positions 2 and 3 give “No” at t = 7 and t = 9; all four positions matching gives “Yes” at t = 11] 17

  29. QIF: Probabilities of Observed Times [Figure: same execution-tree partition as on the previous slide] φ t=5 : SP[1] ≠ UI[1] ∧ SP[0] = UI[0]. Pr[ t = 5 ] = | Sol( φ t=5 ) | / (total # of UI values) – Model Counting, if UI uniformly chosen 18
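
For the 4-digit password setting, the model count behind Pr[t = 5] can be checked by brute force: fix a secret SP (the value below is an arbitrary assumption) and count the user inputs satisfying φ_{t=5}.

```python
# Count the UI values with SP[0] = UI[0] and SP[1] != UI[1], out of all 10^4
# four-digit inputs; SP is an assumed example secret.
from itertools import product

SP = "0070"
count = sum(1 for ui in product("0123456789", repeat=4)
            if ui[0] == SP[0] and ui[1] != SP[1])
print(count, count / 10**4)   # 900 models, so Pr[t = 5] = 0.09 under uniform UI
```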

  30. QIF: Probabilities of Observed Times [Figure: same execution-tree partition as on slide 28] φ t=5 : SP[1] ≠ UI[1] ∧ SP[0] = UI[0]. Pr[ t = 5 ] = W( φ t=5 ) – Discrete Integration, if UI chosen according to a weight function 19

  31. QIF: Quantifying Leakage via Integration – Expected information leakage = Shannon entropy of observed times = Σ k ∈ {3,5,7,9,11} Pr[ t = k ] · log(1/Pr[ t = k ]) – Information leakage in password checker example: PC1: 0.52 (more “leaky”), PC2: 0.0014 (less “leaky”). Discrete integration crucial in obtaining Pr[t = k] 20
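
The PC1 figure can be reproduced from the observation-time probabilities implied by the model counts (under uniformly chosen UI and a fixed SP, the first mismatch at position i has probability 0.9·0.1^i, and “Yes” has probability 10^-4); a small check:

```python
# Shannon entropy of PC1's observable times, derived from the model counts above.
from math import log2

probs = {3: 0.9, 5: 0.09, 7: 0.009, 9: 0.0009, 11: 0.0001}
entropy = sum(p * log2(1 / p) for p in probs.values())
print(round(entropy, 2))   # ~0.52 bits, matching the slide's number for PC1
```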

  32. Unweighted Counting Suffices in Principle [Figure: Probabilistic Inference, Network Reliability, and Quantified Information Flow all reduce to Weighted Model Counting (KML 1989, Karger 2000, DMPV 2017), which in turn reduces to Unweighted Model Counting (IJCAI 2015); the reduction is polynomial in #bits representing weights] 21

  33. Application 4: Constrained Random Verification Functional Verification • Formal verification – Challenges: formal requirements, scalability – ~10-15% of verification effort • Dynamic verification: dominant approach 22

  34. CRV: Dynamic Verification § Design is simulated with test vectors • Test vectors represent different verification scenarios § Results from simulation compared to intended results § How do we generate test vectors? Challenge : Exceedingly large test input space! Can’t try all input combinations – 2^128 combinations for a 64-bit binary operator!!! 23

  35. CRV: Sources of Constraints • Designers: 1. a +_64 11 *_32 b = 12  2. a <_64 (b >> 4) • Past Experience: 1. 40 <_64 34 + a <_64 5050  2. 120 <_64 b <_64 230 • Users: 1. 232 *_32 a + b != 1100  2. 1020 <_64 (b /_64 2) +_64 a <_64 2200 [Figure: circuit with 64-bit inputs a, b and 64-bit output c = f(a,b)] § Test vectors: solutions of constraints § Proposed by Lichtenstein, Malka, Aharon (IAAI 94) 24

  36. CRV: Why Existing Solvers Don’t Suffice Constraints: [same designer / past-experience / user constraints and circuit as on the previous slide] Modern SAT/SMT solvers are complex systems. Efficiency stems from the solver automatically “biasing” search. Fails to give unbiased or user-biased distribution of test vectors 25

  37. CRV: Need To Go Beyond SAT Solvers Constrained Random Verification [Figure: flow from Set of Constraints → SAT Formula → Sample satisfying assignments uniformly at random, for the circuit with 64-bit inputs a, b and output c = f(a,b)] Scalable Uniform Generation of SAT Witnesses 26
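
As a baseline, uniform sampling from Sol(φ) can be done by enumerate-then-pick, which is exactly what does not scale to industrial constraint sets; the sketch below only pins down what “sample satisfying assignments uniformly at random” asks for. The 8-bit constraint is an illustrative stand-in for real testbench constraints.

```python
# Uniform sampling from Sol(phi) by full enumeration (intuition only, tiny n).
import random
from itertools import product

def uniform_sample(n, phi, k=3):
    solutions = [y for y in product([0, 1], repeat=n) if phi(y)]
    return [random.choice(solutions) for _ in range(k)]

# Example constraint over an 8-bit value a (bits y[0..7], LSB first): a + 3 != 100.
def phi(y):
    a = sum(b << i for i, b in enumerate(y))
    return (a + 3) % 256 != 100

print(uniform_sample(8, phi))
```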

  38. Application 5: Automated Problem Generation • Large class sizes, MOOC offerings require automated generation of related but randomly different problems • Discourages plagiarism between students • Randomness makes it hard for students to guess what the solution would be • Allows instructors to focus on broad parameters of problems, rather than on individual problem instances • Enables development of automated intelligent tutoring systems 27

  39. Auto Prob Gen: Using Problem Templates • A problem template is a partial specification of a problem – “Holes” in the template must be filled with elements from specified sets – Constraints on the elements chosen to fill various “holes” restrict problem instances so that undesired instances are eliminated • Example: – Non-deterministic finite automata to be generated for complementation Holes: States, alphabet size, transitions for (state, letter) pairs, final states, initial states Constraints: Alphabet size = 2 Min/max transitions for a (state, letter) pair = 0/4 Min/max states = 3/5 Min/max number of final states = 1/3 Min/max initial states = 1/2 28

  40. Auto Prob Gen: An Illustration – Non-det finite automaton encoded as a formula on the following variables: s 1 , s 2 , s 3 , s 4 , s 5 : States; f 1 , f 2 , f 3 , f 4 , f 5 : Final states; n 1 , n 2 , n 3 , n 4 , n 5 : Initial states; s 1 a 1 s 2 , s 1 a 2 s 2 , … : Transitions. Every solution of ϕ init ∧ ϕ trans ∧ ϕ states ∧ ϕ final gives an automaton satisfying the specified constraints, where ϕ init = ⋀ i (n i → s i ) ∧ (1 ≤ ∑ i n i ≤ 2), ϕ trans = ⋀ i,k,j (s i a k s j → s i ∧ s j ) ∧ ⋀ i,k (0 ≤ ∑ j s i a k s j ≤ 4), ϕ states = (3 ≤ ∑ i s i ≤ 5), ϕ final = ⋀ i (f i → s i ) ∧ (1 ≤ ∑ i f i ≤ 3) 29

  41. Auto Prob Gen: An Illustration – Non-det finite automaton encoded as a formula on following variables s 1 = 1, s 2 = 0, s 3 = 1, s 4 = 1, s 5 = 1: States f 1 = 0, f 2 = 0, f 3 = 1, f 4 = 1, f 5 = 0: Final states n 1 = 1, n 2 = 0, n 3 = 0, n 4 = 0, n 5 = 0: Initial states s 1 a 1 s 3 = 1, s 1 a 1 s 4 = 1, s 4 a 2 s 4 = 1, s 4 a 1 s 5 = 1, … : Transitions [Figure: the corresponding automaton over states s 1 , s 3 , s 4 , s 5 with the listed a 1 / a 2 transitions] 30

  42. Auto Prob Gen: Discrete Sampling • Uniform random generation of solutions of constraints gives automata satisfying constraints randomly • Weighted random generation of solutions gives automata satisfying constraints with different priorities/weightages. Examples: Weighing final state variables more gives automata with more final states Weighing transitions on letter a 1 more gives automata with more transitions labeled a 1 31

  43. Discrete Sampling and Integration for the AI Practitioner Part 2b: Survey of Prior Work Supratik Chakraborty, IIT Bombay Kuldeep S. Meel, Rice University Moshe Y. Vardi, Rice University

  44. How Hard is it to Count/Sample? • Trivial if we could enumerate R F : Almost always impractical • Computational complexity of counting (discrete integration): Exact unweighted counting: #P-complete [Valiant 1978] Approximate unweighted counting: Deterministic: Polynomial time det. Turing Machine with Σ_2^p oracle [Stockmeyer 1983]: |R F | / (1 + ε) ≤ DetEstimate(F, ε) ≤ |R F | · (1 + ε), for ε > 0 Randomized: Poly-time probabilistic Turing Machine with NP oracle [Stockmeyer 1983; Jerrum,Valiant,Vazirani 1986]: Pr[ |R F | / (1 + ε) ≤ RandEstimate(F, ε, δ) ≤ |R F | · (1 + ε) ] ≥ 1 − δ, for ε > 0, 0 < δ ≤ 1 – Probably Approximately Correct (PAC) algorithm Weighted versions of counting: Exact: #P-complete [Roth 1996], Approximate: same class as unweighted version [follows from Roth 1996] 33

  45. How Hard is it to Count/Sample? • Computational complexity of sampling: Uniform sampling: Poly-time prob. Turing Machine with NP oracle [Bellare,Goldreich,Petrank 2000]: Pr[ y = UniformGenerator(F) ] = c, where c = 0 if y ∉ R F , and c > 0 and independent of y if y ∈ R F Almost uniform sampling: Poly-time prob. Turing Machine with NP oracle [Jerrum,Valiant,Vazirani 1986; also from Bellare,Goldreich,Petrank 2000]: Pr[ y = AUGenerator(F, ε) ] = 0 if y ∉ R F ; otherwise c / (1 + ε) ≤ Pr[ y = AUGenerator(F, ε) ] ≤ c · (1 + ε), where c > 0 is independent of y; Pr[Algorithm outputs some y] ≥ ½, if F is satisfiable 34

  46. Markov Chain Monte Carlo Techniques • Rich body of theoretical work with applications to sampling and counting [Jerrum,Sinclair 1996] • Some popular (and intensively studied) algorithms: – Metropolis-Hastings [Metropolis et al 1953, Hastings 1970], Simulated Annealing [Kirkpatrick et al 1982] • High-level idea: – Start from a “state” (assignment of variables) – Randomly choose next state using “local” biasing functions (depends on target distribution & algorithm parameters) – Repeat for an appropriately large number (N) of steps – After N steps, samples follow target distribution with high confidence • Convergence to desired distribution guaranteed only after N (large) steps • In practice, steps truncated early heuristically Nullifies/weakens theoretical guarantees [Kitchen,Keuhlman 2007] 35
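
For concreteness, here is a minimal Metropolis-Hastings sketch over {0,1}^n with single-bit-flip proposals, targeting a distribution proportional to W on Sol(φ); the constraint, weight function, starting state, and step count are all illustrative assumptions, and, as the slide notes, truncating the chain early is what weakens the guarantees in practice.

```python
# Metropolis-Hastings with a symmetric single-bit-flip proposal. Assumes W > 0
# on Sol(phi) and that bit-flip moves keep Sol(phi) connected (not true in general).
import random

def metropolis_hastings(n, phi, W, steps=10000):
    y = [1] * n                            # assumed satisfying starting state
    assert phi(y), "need a satisfying starting state"
    for _ in range(steps):
        i = random.randrange(n)
        y_new = y[:]
        y_new[i] ^= 1                      # propose flipping one bit
        if phi(y_new):
            accept = min(1.0, W(y_new) / W(y))
            if random.random() < accept:
                y = y_new
    return y

# Example: phi = "at least one bit set", W(y) = number of set bits.
print(metropolis_hastings(4, lambda y: any(y), lambda y: float(sum(y))))
```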

  47. Exact Counters • DPLL based counters [CDP: Birnbaum,Lozinski 1999] – DPLL branching search procedure, with partial truth assignments – Once a branch is found satisfiable, if t out of n variables assigned, add 2 n-t to model count, backtrack to last decision point, flip decision and continue – Requires data structure to check if all clauses are satisfied by partial assignment Usually not implemented in modern DPLL SAT solvers – Can output a lower bound at any time 36
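
A toy version of the CDP idea in Python: branch on variables and add 2^(n−t) whenever a partial assignment with t of n variables set already satisfies every clause. There is no unit propagation, component analysis, or caching here, so it is only a sketch of the counting rule itself.

```python
# DPLL-style exact model counting in the spirit of CDP (tiny formulas only).
def count_models(clauses, n, assignment=None):
    assignment = assignment or {}

    def status(clause):
        # 'sat', 'unsat', or 'open' under the current partial assignment
        unknown = False
        for lit in clause:
            var, want = abs(lit), lit > 0
            if var in assignment:
                if assignment[var] == want:
                    return 'sat'
            else:
                unknown = True
        return 'open' if unknown else 'unsat'

    states = [status(c) for c in clauses]
    if 'unsat' in states:
        return 0
    if all(s == 'sat' for s in states):
        return 2 ** (n - len(assignment))      # every extension is a model
    var = next(v for v in range(1, n + 1) if v not in assignment)
    total = 0
    for value in (False, True):
        assignment[var] = value
        total += count_models(clauses, n, assignment)
        del assignment[var]
    return total

# (x1 ∨ x2) over variables {x1, x2, x3} has 3 * 2 = 6 models.
print(count_models([[1, 2]], 3))
```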

  48. Exact Counters • DPLL + component analysis [RelSat: Bayardo, Pehoushek 2000] – Constraint graph G: Variables of F are vertices An edge connects two vertices if corresponding variables appear in some clause of F – Disjoint components of G lazily identified during DPLL search – F1, F2, … Fn : subformulas of F corresponding to components |R F | = |R F1 | * |R F2 | * |R F3 | * … – Heuristic optimizations: Solve most constrained sub-problems first Solving sub-problems in interleaved manner 37

  49. Exact Counters • DPLL + Caching [Bacchus et al 2003, Cachet: Sang et al 2004, sharpSAT: Thurley 2006] If same sub-formula revisited multiple times during DPLL search, cache result and re-use it “Signature” of the satisfiable sub-formula/component must be stored Different forms of caching used: Simple sub-formula caching Component caching Linear-space caching Component caching can also be combined with clause learning and other reasoning techniques at each node of DPLL search tree 38 WeightedCachet: DPLL + Caching for weighted assignments

  50. Exact Counters • Knowledge Compilation based – Compile given formula to another form which allows counting models in time polynomial in representation size – Reduced Ordered Binary Decision Diagrams (ROBDD) [Bryant 1986]: Construction can blow up exponentially – Deterministic Decomposable Negation Normal Form (d-DNNF) [c2d: Darwiche 2004] Generalizes ROBDDs; can be significantly more succinct Negation normal form with following restrictions: Decomposability: All AND operators have arguments with disjoint support Determinizability: All OR operators have arguments with disjoint solution sets – Sentential Decision Diagrams (SDD) [Darwiche 2011] 39

  51. Exact Counters: How far do they go? • Work reasonably well in small-medium sized problems, and in large problem instances with special structure • Use them whenever possible – #P-completeness hits back eventually – scalability suffers! 40

  52. Bounding Counters [MBound: Gomes et al 2006; SampleCount: Gomes et al 2007; BPCount: Kroc et al 2008] – Provide lower and/or upper bounds of model count – Usually more efficient than exact counters – No approximation guarantees on bounds Useful only for limited applications 41

  53. Hashing-based Sampling • Bellare, Goldreich, Petrank (BGP 2000) • Uniform generator for SAT witnesses: polynomial time randomized algorithm with access to an NP oracle – Pr[ y = BGP(F) ] = 0 if y ∉ R F ; = c (> 0) if y ∈ R F , where c is independent of y • Employs n-universal hash functions • Works well for small values of n • For high dimensions (large n), significant computational overheads (Much more on this coming in Part 3) 42

  54. Approximate Integration and Sampling: Close Cousins • Seminal paper by Jerrum, Valiant, Vazirani 1986: polynomial inter-reduction between almost-uniform generation and PAC counting • Yet, no practical algorithms that scale to large problem instances were derived from this work • No scalable PAC counter or almost-uniform generator existed until a few years back • The inter-reductions are practically computation intensive • Think of O(n) calls to the counter when n = 100000 43

  55. Prior Work [Figure: prior techniques – BGP, BDD/other exact techniques, MCMC, SAT-based heuristics – plotted along axes of Guarantees vs. Performance] 44

  56. Techniques using XOR hash functions • Bounding counters MBound, SampleCount [Gomes et al. 2006, Gomes et al 2007] used random XORs – Algorithms geared towards finding bounds without approximation guarantees – Power of 2-universal hashing not exploited • In a series of papers [2013: ICML, UAI, NIPS; 2014: ICML; 2015: ICML, UAI; 2016: AAAI, ICML, AISTATS, …] Ermon et al used XOR hash functions for discrete counting/sampling – Random XORs, also XOR constraints with specific structures – 2-universality exploited to provide improved guarantees – Relaxed constraints (like short XORs) and their effects studied 45

  57. An Interesting Combination: XOR + MAP Optimization • WISH: Ermon et al 2013 • Given a weight function W: {0,1} n → ℜ ≥ 0 – Use random XORs to partition solutions into cells – After partitioning into 2, 4, 8, 16, … cells: use a Maximum A Posteriori (MAP) optimizer to find the solution with max weight in a cell (say, a 2 , a 4 , a 8 , a 16 , …) – Estimated W(R F ) = W(a 2 )*1 + W(a 4 )*2 + W(a 8 )*4 + … • Constant factor approximation of W(R F ) with high confidence • MAP oracle needs repeated invocation, O(n · log 2 n) calls – MAP is NP-complete – Being an optimization (not decision) problem, MAP is harder to solve in practice than SAT 46
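
A brute-force caricature of the WISH estimate: the “MAP oracle” below is plain enumeration, each level i adds one more random XOR constraint, and the terms W(a_{2^i})·2^(i−1) are summed (plus the unconstrained maximum; the medians over repeated trials used by the real algorithm are omitted). The example formula and weight function are illustrative assumptions sized for tiny n.

```python
# WISH-style estimate of W(R_F) with random parity constraints and a brute-force
# max-weight ("MAP") search; intuition only, exponential in n.
import random
from itertools import product

def wish_estimate(n, phi, W):
    def max_in_cell(parities):
        best = 0.0
        for y in product([0, 1], repeat=n):
            if not phi(y):
                continue
            if all(sum(y[j] for j in s) % 2 == b for s, b in parities):
                best = max(best, W(y))
        return best

    estimate = max_in_cell([])           # zero-constraint term: max weight overall
    parities = []
    for i in range(1, n + 1):
        # one more random XOR constraint halves the cell in expectation
        subset = [j for j in range(n) if random.random() < 0.5]
        parities.append((subset, random.randint(0, 1)))
        estimate += max_in_cell(parities) * 2 ** (i - 1)
    return estimate

# Example: phi = True everywhere, W(y) = 1, so W(R_F) = 2^n; n = 4 gives ~16.
print(wish_estimate(4, lambda y: True, lambda y: 1.0))
```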

  58. XOR-based Counting and Sampling • Remainder of tutorial – Deeper dive into XOR hash-based counting and sampling – Discuss theoretical aspects and experimental observations – Based on work published in [2013: CP, CAV; 2014: DAC, AAAI; 2015: IJCAI, TACAS; 2016: AAAI, IJCAI, 2017: AAAI] 47

  59. Discrete Sampling and Integration for the AI Practitioner Part III: Hashing-based Approach to Sampling and Integration Supratik Chakraborty, IIT Bombay Kuldeep S. Meel, Rice University Moshe Y. Vardi, Rice University 1 / 41

  60. Discrete Integration and Sampling • Given – Variables X 1 , X 2 , · · · X n over finite discrete domains D 1 , D 2 , · · · D n – Formula ϕ over X 1 , X 2 , · · · X n – Weight Function W : D 1 × D 2 · · · × D n �→ [0 , 1] • Sol( ϕ ) = { solutions of F } • Discrete Integration: Determine W ( ϕ ) = Σ y ∈ Sol( ϕ ) W ( y ) – If W ( y ) = 1 for all y , then W ( ϕ ) = | Sol( ϕ ) | • Discrete Sampling: Randomly sample from Sol( ϕ ) such that Pr[y is sampled] ∝ W ( y ) – If W ( y ) = 1 for all y , then uniformly sample from Sol( ϕ ) 2 / 41

  61. Part I Discrete Integration 3 / 41

  62. From Weighted to Unweighted Integration Boolean Formula ϕ and weight function W : { 0 , 1 } n → Q ≥ 0 4 / 41

  63. From Weighted to Unweighted Integration Boolean Formula ϕ and weight Boolean Formula F ′ function W : { 0 , 1 } n → Q ≥ 0 W ( ϕ ) = c ( W ) × | Sol( F ′ ) | 4 / 41

  64. From Weighted to Unweighted Integration Boolean Formula ϕ and weight Boolean Formula F ′ function W : { 0 , 1 } n → Q ≥ 0 W ( ϕ ) = c ( W ) × | Sol( F ′ ) | • Key Idea: Encode weight function as a set of constraints (CFMV, IJCAI15) 4 / 41

  65. From Weighted to Unweighted Integration Boolean Formula ϕ and weight Boolean Formula F ′ function W : { 0 , 1 } n → Q ≥ 0 W ( ϕ ) = c ( W ) × | Sol( F ′ ) | • Key Idea: Encode weight function as a set of constraints (CFMV, IJCAI15) How do we estimate | Sol( F ′ ) | ? 4 / 41

  66. As Simple as Counting Dots

  67. As Simple as Counting Dots

  68. As Simple as Counting Dots Pick a random cell Estimate = Number of solutions in a cell × Number of cells 5 / 41
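
The picture above translates into the following sketch: add random XOR (parity) constraints one at a time, each roughly halving the solution space, until the surviving cell is small enough to count directly, then scale up by the number of cells. The brute-force enumeration stands in for the SAT-solver (NP-oracle) calls, and the pivot value and example formula are illustrative assumptions; real algorithms repeat this and take medians to obtain guarantees, as the following slides discuss.

```python
# Hash-based counting sketch: estimate = (solutions in a random cell) * (# cells).
import random
from itertools import product

def hash_count(n, phi, pivot=8):
    def cell_solutions(parities):
        return [y for y in product([0, 1], repeat=n)
                if phi(y) and all(sum(y[j] for j in s) % 2 == b for s, b in parities)]

    parities = []
    cell = cell_solutions(parities)
    while len(cell) > pivot and len(parities) < n:
        subset = [j for j in range(n) if random.random() < 0.5]
        parities.append((subset, random.randint(0, 1)))
        cell = cell_solutions(parities)
    return len(cell) * 2 ** len(parities)

# Example: a formula with exactly 50 solutions out of 2^8; the estimate is
# typically close to 50 on a single run.
print(hash_count(8, lambda y: sum(b << i for i, b in enumerate(y)) < 50))
```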

  69. Challenges Challenge 1 How to partition into roughly equal small cells of solutions without knowing the distribution of solutions? 6 / 41

  70. Challenges Challenge 1 How to partition into roughly equal small cells of solutions without knowing the distribution of solutions? Challenge 2 How large is a “small” cell? 6 / 41

  71. Challenges Challenge 1 How to partition into roughly equal small cells of solutions without knowing the distribution of solutions? Challenge 2 How large is a “small” cell? Challenge 3 How many cells? 6 / 41

  72. Challenges Challenge 1 How to partition into roughly equal small cells of solutions without knowing the distribution of solutions? • Designing function h : assignments → cells (hashing) • Solutions in a cell α : Sol( ϕ ) ∩ { y | h ( y ) = α } 6 / 41

  73. Challenges Challenge 1 How to partition into roughly equal small cells of solutions without knowing the distribution of solutions? • Designing function h : assignments → cells (hashing) • Solutions in a cell α : Sol( ϕ ) ∩ { y | h ( y ) = α } • Deterministic h unlikely to work 6 / 41
