Understanding Random SAT: Beyond the Clauses-to-Variables Ratio


  1. Understanding Random SAT: Beyond the Clauses-to-Variables Ratio
  Eugene Nudelman, Stanford University
  Joint work with Kevin Leyton-Brown and Holger Hoos (University of British Columbia) and Alex Devkar and Yoav Shoham (Stanford University)

  2. Introduction
  • SAT is one of the most studied problems in CS
  • A lot is known about its worst-case complexity
    – But particular instances of NP-hard problems like SAT are often easy in practice
  • SAT is a “Drosophila” for average-case and empirical (typical-case) complexity studies
  • (Uniformly) random SAT provides a way to bridge analytical and empirical work

  3. Previously…
  • Easy-hard-less-hard transitions discovered in the behaviour of DPLL-type solvers [Selman, Mitchell, Levesque]
    – Strongly correlated with the phase transition in solvability
    – Spawned a new enthusiasm for using empirical methods to study algorithm performance
  [Plot: 4*Pr(SAT) - 2 and log(Kcnfs runtime) vs. c/v, for c/v in 3.3–5.3]
  • Follow-up work included the study of:
    – Islands of tractability [Kolaitis et al.]
    – SLS search space topologies [Frank et al., Hoos]
    – Backbones [Monasson et al., Walsh and Slaney]
    – Backdoors [Williams et al.]
    – Random restarts [Gomes et al.]
    – Restart policies [Horvitz et al., Ruan et al.]
    – …

  4. Empirical Hardness Models
  • We proposed building regression models as a disciplined way of predicting and studying algorithms’ behaviour [Leyton-Brown, Nudelman, Shoham, CP-02]
  • Applications of this machine learning approach:
    1) Predict running time: useful to know how long an algorithm will run
    2) Gain theoretical understanding: which variables are important to the hardness model?
    3) Build algorithm portfolios: can select the right algorithm on a per-instance basis (see the sketch below)
    4) Tune distributions for hardness: can generate harder benchmarks by rejecting easy instances
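A minimal sketch of the portfolio idea in application 3, assuming one trained runtime model per solver. The names `models`, `features`, and `select_solver` are illustrative, not the authors' code; any regressor with a scikit-learn-style `.predict()` would fit.

```python
# Per-instance algorithm selection: predict each solver's (log) runtime
# from the instance's features and pick the solver predicted to be fastest.
# `models` and `features` are hypothetical stand-ins.
import numpy as np

def select_solver(models, features):
    """models: dict solver_name -> fitted regressor; features: 1-D vector."""
    x = np.asarray(features).reshape(1, -1)
    predicted = {name: float(m.predict(x)[0]) for name, m in models.items()}
    return min(predicted, key=predicted.get)  # solver with lowest prediction
```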

  5. Outline
  • Features
  • Experimental Results
    – Variable Size Data
    – Fixed Size Data

  6. Features: Local Search Probing
  [Plot: best number of unsatisfied clauses vs. step number, annotated with a short plateau and a long plateau]

  7. Features: Local Search Probing
  [Same plot, highlighting the feature: Best Solution (mean, CV)]

  8. Features: Local Search Probing
  [Same plot, highlighting the feature: Number of Steps to Optimal (mean, median, CV, 10%, 90%)]

  9. Features: Local Search Probing
  [Same plot, highlighting the feature: Average Improvement to Best per Step (mean, CV)]

  10. Features: Local Search Probing
  [Same plot, highlighting the feature: First Local Minimum Ratio (mean, CV)]

  11. Features: Local Search Probing
  [Same plot, highlighting the feature: BestCV, the CV of local minima]
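Slides 6–11 all read features off the same probing trajectory. As a rough illustration of how such features can be computed, here is a simplified greedy-descent probe; it is not the SAPS/GSAT probing the authors actually used, and the function names, step counts, and probe counts are illustrative.

```python
import random
import statistics

def num_unsat(clauses, assignment):
    """Count clauses with no satisfied literal (literals are signed 1-based vars)."""
    return sum(
        not any((lit > 0) == assignment[abs(lit)] for lit in clause)
        for clause in clauses
    )

def probe(clauses, n_vars, steps=1000, rng=random):
    """One probing run: greedy flips with sideways moves; returns the
    best-so-far number of unsatisfied clauses after each step."""
    assignment = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    best = num_unsat(clauses, assignment)
    trajectory = []
    for _ in range(steps):
        v = rng.randint(1, n_vars)             # candidate variable to flip
        assignment[v] = not assignment[v]
        cur = num_unsat(clauses, assignment)
        if cur > best:
            assignment[v] = not assignment[v]  # reject worsening flips
        else:
            best = cur
        trajectory.append(best)
    return trajectory

def probing_features(clauses, n_vars, n_probes=10):
    """Best-solution quality across probes, summarized as mean and CV
    (in the spirit of the 'Best Solution (mean, CV)' feature of slide 7)."""
    bests = [probe(clauses, n_vars)[-1] for _ in range(n_probes)]
    mean = statistics.mean(bests)
    cv = statistics.pstdev(bests) / mean if mean else 0.0
    return {"BestSolMean": mean, "BestSolCV": cv}
```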

  12. Features: DPLL, LP
  • DPLL search space size estimate (a sketch of the probe follows this slide)
    – Random probing with unit propagation
    – Compute mean depth till contradiction
    – Estimate log(#nodes)
  • Cumulative number of unit propagations at different depths (DPLL with Satz heuristic)
  • LP relaxation
    – Objective value
    – Stats of integer slacks
    – #vars set to an integer
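A hedged reconstruction of the DPLL search-space estimate: probe with random branching plus unit propagation, record the depth at which a contradiction appears, and treat the mean depth d as the depth of a complete binary tree, giving log2(#nodes) roughly d + 1. This is a sketch of the idea on the slide, not the authors' implementation.

```python
import random

def unit_propagate(clauses, assignment):
    """Simplify clauses under `assignment`; return None on a contradiction."""
    changed = True
    while changed:
        changed = False
        new_clauses = []
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                val = assignment.get(abs(lit))
                if val is None:
                    unassigned.append(lit)
                elif (lit > 0) == val:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return None                        # empty clause: contradiction
            if len(unassigned) == 1:               # unit clause: force the literal
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
            else:
                new_clauses.append(unassigned)
        clauses = new_clauses
    return clauses

def mean_probe_depth(clauses, n_vars, n_probes=20, rng=random):
    """Mean number of random branching decisions before a contradiction."""
    depths = []
    for _ in range(n_probes):
        assignment, remaining, depth = {}, clauses, 0
        while remaining:                           # stop on contradiction or no clauses
            free = [v for v in range(1, n_vars + 1) if v not in assignment]
            if not free:
                break
            assignment[rng.choice(free)] = rng.random() < 0.5
            depth += 1
            remaining = unit_propagate(remaining, assignment)
            if remaining is None:
                break
        depths.append(depth)
    return sum(depths) / len(depths)

def log2_tree_size_estimate(clauses, n_vars):
    """A complete binary tree of depth d has about 2**(d+1) nodes."""
    return mean_probe_depth(clauses, n_vars) + 1.0
```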

  13. Other Features
  • Problem size:
    – v (#vars) and c (#clauses), used for normalizing many other features
    – Powers of c/v, v/c, and |c/v - 4.26|
  • Graphs (a degree-statistics sketch follows this slide):
    – Variable-Clause (VCG, bipartite)
    – Variable (VG, edge whenever two variables occur in the same clause)
    – Clause (CG, edge iff two clauses share a variable with opposite sign)
  • Balance:
    – #pos vs. #neg literals
    – Unary, binary, ternary clauses
  • Proximity to Horn formula
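To make the graph features concrete, here is a sketch computing degree statistics of the variable graph VG. The returned feature names are illustrative, though statistics of this kind are indeed among the paper's features (slide 31 lists VCGClauseMean, a clause-node degree mean of the VCG).

```python
from itertools import combinations
import statistics

def vg_degree_stats(clauses):
    """Mean and CV of node degrees in the variable graph VG, where two
    variables are adjacent iff they occur together in some clause."""
    neighbors = {}
    for clause in clauses:
        for a, b in combinations({abs(lit) for lit in clause}, 2):
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    degrees = [len(ns) for ns in neighbors.values()]
    mean = statistics.mean(degrees)
    cv = statistics.pstdev(degrees) / mean if mean else 0.0
    return {"VGDegreeMean": mean, "VGDegreeCV": cv}
```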

  14. Outline
  • Features
  • Experimental Results
    – Variable Size Data
    – Fixed Size Data

  15. Experimental Setup
  • Uniform random 3-SAT, 400 vars
  • Datasets (20,000 instances each):
    – Variable-ratio dataset (1 CPU-month): c/v uniform in [3.26, 5.26] (so c ∈ [1304, 2104])
    – Fixed-ratio dataset (4 CPU-months): c/v = 4.26 (so v = 400, c = 1704)
  • Solvers:
    – Kcnfs [Dubois and Dequen]
    – OKsolver [Kullmann]
    – Satz [Chu Min Li]
  • Quadratic regression with a logistic response function (see the sketch below)
  • Training : test : validation split of 70 : 15 : 15
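A minimal sketch of the modeling step, assuming scikit-learn is available: expand the features to a quadratic basis and fit a linear model to log runtime. The slide's model additionally applies a logistic response function, omitted here for brevity, and `Ridge` is a stand-in for whatever linear fit the authors used.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def fit_hardness_model(X, runtimes):
    """X: (n_instances, n_features) array; runtimes in CPU seconds.
    Quadratic basis expansion + linear regression on log10 runtime."""
    model = make_pipeline(PolynomialFeatures(degree=2), Ridge())
    model.fit(X, np.log10(runtimes))
    return model
```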

  16. Kcnfs Data
  [Plot: 4*Pr(SAT) - 2 and log(Kcnfs runtime) vs. c/v, for c/v in 3.3–5.3]

  17.–20. Kcnfs Data (four progressive builds of the same slide)
  [Scatter plot: runtime in seconds, log scale from 0.01 to 1000, vs. clauses-to-variables ratio from 3.26 to 5.26]

  21. Variable Ratio Prediction (Kcnfs)
  [Scatter plot: predicted runtime vs. actual runtime, both in CPU seconds on log scales from 0.01 to 1000]

  22. Variable Ratio - UNSAT
  [Same predicted vs. actual runtime plot, restricted to unsatisfiable instances]

  23. Variable Ratio - SAT
  [Same predicted vs. actual runtime plot, restricted to satisfiable instances]

  24. Kcnfs vs. Satz (UNSAT)
  [Scatter plot: Satz time vs. Kcnfs time, both in CPU seconds on log scales from 0.01 to 1000, unsatisfiable instances]

  25. Kcnfs vs. Satz (SAT)
  [Scatter plot: Satz time vs. Kcnfs time, both in CPU seconds on log scales from 0.01 to 1000, satisfiable instances]

  26.–28. Feature Importance – Variable Ratio (three progressive builds of the same slide)
  • Subset selection can be used to identify features sufficient for approximating full model performance
  • Other (correlated) sets could potentially achieve similar performance

  Variable                         Cost of Omission
  |c/v - 4.26|                     100
  |c/v - 4.26|^2                    69
  (v/c)^2 × SapsBestCVMean          53
  |c/v - 4.26| × SapsBestCVMean     33

  (A cost-of-omission sketch follows this slide.)
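One way to read the "cost of omission" numbers: within the selected subset, measure how much validation error grows when one feature is dropped and the model refit, scaled so the largest cost is 100. The sketch below follows that reading; the exact selection and scaling procedure is an assumption, and `fit` and `rmse` are caller-supplied stand-ins.

```python
def costs_of_omission(fit, rmse, X_tr, y_tr, X_val, y_val):
    """fit(X, y) -> model; rmse(model, X, y) -> float; X_* are numpy arrays.
    Returns one cost per column of X, scaled so the maximum is 100."""
    base = rmse(fit(X_tr, y_tr), X_val, y_val)   # error of the full subset
    raw = []
    for j in range(X_tr.shape[1]):
        keep = [k for k in range(X_tr.shape[1]) if k != j]
        model = fit(X_tr[:, keep], y_tr)         # refit without feature j
        raw.append(rmse(model, X_val[:, keep], y_val) - base)
    top = max(raw)
    scale = top if top > 0 else 1.0
    return [100.0 * r / scale for r in raw]
```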

  29. Fixed Ratio Data
  [Scatter plot: runtime in seconds, log scale from 0.01 to 1000]

  30. Fixed Ratio Prediction (Kcnfs)
  [Scatter plot: predicted runtime vs. actual runtime, both in CPU seconds on log scales from 0.01 to 1000]

  31. Feature Importance – Fixed Ratio

  Variable                                 Cost of Omission
  SapsBestSolMean^2                        100
  SapsBestSolMean × MeanDPLLDepth           74
  GsatBestSolCV × MeanDPLLDepth             21
  VCGClauseMean × GsatFirstLMRatioMean       9
