Trimming while Checking Clausal Proofs

Marijn J.H. Heule, Warren A. Hunt, Jr., and Nathan Wetzler
The University of Texas at Austin

Formal Methods in Computer-Aided Design (FMCAD)
Portland, Oregon, October 23, 2013


SLIDE 2

Outline

  • Motivation and Contributions
  • Resolution versus Clausal Proofs
  • Checking Clausal Proofs Efficiently
  • Experimental Evaluation
  • Conclusion

SLIDE 6

Motivation

SAT solvers are used in many tools and applications.

  • Counter-examples (satisfiable) using symbolic simulation;
  • Equivalence-checking (unsatisfiable) using miters;
  • Small explanations (unsatisfiable core) for diagnosis;
  • Small (trimmed) proofs to validate with a verified checker.

However,

  • Documented bugs in SAT, SMT, and QBF solvers [Brummayer and Biere, 2009; Brummayer et al., 2010];
  • Solvers that emit additional information use lots of memory.

We developed a tool that can efficiently validate the results of SAT solvers and produce trimmed formulas and trimmed proofs.

SLIDE 9

Contributions and Related Work

Approaches compared on three criteria (easy to emit, compact, checked efficiently):

  • Resolution proofs: Zhang and Malik, 2003; Van Gelder, 2008; Biere, 2008
  • Clausal proofs: Goldberg and Novikov, 2003; Van Gelder, 2008
  • Clausal proofs + clause deletion: Heule, Hunt, Jr., and Wetzler [STVR 201X]
  • A fast clausal proof checker, called DRUP-trim: Heule, Hunt, Jr., and Wetzler [FMCAD 2013]

All approaches can be used for applications such as minimal unsatisfiable core extraction, computing interpolants, and reducing proofs.

SLIDE 13

Satisfiability and Resolution

Given a Boolean formula F, is there an assignment to variables in F such that the formula evaluates to TRUE?

(¬b ∨ c) ∧ (a ∨ c) ∧ (¬a ∨ b) ∧ (¬a ∨ ¬b) ∧ (a ∨ ¬b)

Checking a solution, such as assignment ¬a ¬b c, is easy.

Unsatisfiability proofs use lemmas (resolvents): (¬b ∨ c) and (¬a ∨ b) resolve to (¬a ∨ c), which resolves with (a ∨ c) to the lemma (c).

SLIDE 18

Resolution Graph / Proof and Core

[Figure: three panels (resolution graph, resolution proof, core) for the formula (¬b ∨ c) ∧ (a ∨ c) ∧ (¬a ∨ b) ∧ (¬a ∨ ¬b) ∧ (a ∨ ¬b) ∧ (b ∨ ¬c) with lemmas (¬b), (¬a), and (c).]

Resolution proofs are HUGE.

SLIDE 24

Checking Lemmas by Unit Propagation

A clause is unit with respect to an assignment if all literals in the clause are falsified except for one literal, which is unassigned.

Unit propagation:

  • If a unit clause is found, extend the assignment and repeat.
  • Else, return the assignment.

Reverse Unit Propagation (RUP) of a lemma:

  • Assign all literals in the lemma to false and apply unit propagation.
  • If another clause / lemma becomes falsified, then the lemma is valid.

Example: to check the lemma (c), start from the assignment ¬c. Unit propagation on (¬b ∨ c) and (a ∨ c) extends it to a ¬b ¬c, which falsifies (¬a ∨ b), so (c) is valid.
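The two procedures on this slide can be written down directly. The following Python sketch is illustrative only: the names `unit_propagate` and `has_rup` and the integer encoding (a = 1, b = 2, c = 3, negative for negated) are assumptions, and the loop rescans all clauses instead of using the watched-literal schemes a real checker would need.

```python
def unit_propagate(clauses, assignment):
    """Extend `assignment` (a set of true literals) by unit propagation.

    Returns (assignment, conflict), where conflict is a falsified clause
    or None if propagation reached a fixpoint without one.
    """
    assignment = set(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue  # clause is already satisfied
            unassigned = [lit for lit in clause if -lit not in assignment]
            if not unassigned:
                return assignment, clause  # every literal is falsified
            if len(unassigned) == 1:
                assignment.add(unassigned[0])  # unit clause: extend and repeat
                changed = True
    return assignment, None


def has_rup(clauses, lemma):
    """A lemma is valid by RUP if falsifying all its literals yields a conflict."""
    _, conflict = unit_propagate(clauses, {-lit for lit in lemma})
    return conflict is not None


FORMULA = [(-2, 3), (1, 3), (-1, 2), (-1, -2), (1, -2)]
print(has_rup(FORMULA, (3,)))    # lemma (c)
print(has_rup(FORMULA, (-3,)))   # lemma (not c)
```

For the lemma (c) the propagation reproduces the assignment a ¬b ¬c and finds (¬a ∨ b) falsified; for (¬c) no clause becomes unit, so the check fails.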

SLIDE 30

Clausal Proof: Check using Unit Propagation

[Figure: the formula (¬b ∨ c) ∧ (a ∨ c) ∧ (¬a ∨ b) ∧ (¬a ∨ ¬b) ∧ (a ∨ ¬b) ∧ (b ∨ ¬c) and the clausal proof (¬b), (¬a), (c); the lemmas are checked one by one using unit propagation.]

Clausal proofs are expensive to validate.
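A forward check of a clausal proof can be sketched as follows. This is a simplified illustration, not the authors' implementation: the names `propagate` and `check_forward`, the integer encoding (a = 1, b = 2, c = 3), and the convention that the proof ends with the empty clause `()` are assumptions.

```python
def propagate(clauses, assignment):
    """Unit propagation; returns True if some clause becomes falsified."""
    assignment = set(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue
            free = [lit for lit in clause if -lit not in assignment]
            if not free:
                return True
            if len(free) == 1:
                assignment.add(free[0])
                changed = True
    return False


def check_forward(formula, proof):
    """Validate every lemma in order; a validated lemma joins the clause set."""
    clauses = list(formula)
    for lemma in proof:
        if not propagate(clauses, {-lit for lit in lemma}):
            return False          # lemma is not implied by unit propagation
        if not lemma:
            return True           # the empty clause has been validated
        clauses.append(lemma)
    return False                  # the proof never reached the empty clause


FORMULA = [(-2, 3), (1, 3), (-1, 2), (-1, -2), (1, -2), (2, -3)]
PROOF = [(-2,), (-1,), (3,), ()]
print(check_forward(FORMULA, PROOF))
```

Every lemma triggers a full propagation over the formula plus all earlier lemmas, which is where the validation cost comes from.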

SLIDE 31

Improvement I: Backwards Checking

Goldberg and Novikov proposed checking the refutation backwards [DATE 2003]:

  • start by validating the empty clause;
  • mark all lemmas using conflict analysis;
  • only validate marked lemmas.

Advantage: validate fewer lemmas.
Disadvantage: more complex.

We provide a fast open source implementation of this procedure.
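The three steps above can be sketched in Python. This is a simplified illustration under stated assumptions, not the DRUP-trim code: `propagate` and `check_backward` are hypothetical names, literals use the encoding a = 1, b = 2, c = 3, the proof is assumed to end with the empty clause `()`, and conflict analysis is reduced to following the reason clause of each falsified literal.

```python
def propagate(clauses, assumptions):
    """Unit propagation that remembers which clause implied each literal.

    Returns the set of clause indices involved in the conflict, or None
    if no clause becomes falsified.
    """
    reason = {lit: None for lit in assumptions}  # true literal -> clause index
    changed = True
    while changed:
        changed = False
        for idx, clause in enumerate(clauses):
            if any(lit in reason for lit in clause):
                continue
            free = [lit for lit in clause if -lit not in reason]
            if len(free) > 1:
                continue
            if free:
                reason[free[0]] = idx
                changed = True
                continue
            # Conflict: walk back through the reasons of the falsified literals.
            used, stack = set(), [idx]
            while stack:
                i = stack.pop()
                if i in used:
                    continue
                used.add(i)
                for lit in clauses[i]:
                    r = reason.get(-lit)
                    if r is not None:
                        stack.append(r)
            return used
    return None


def check_backward(formula, proof):
    """Return (core, trimmed_proof), or None if a marked lemma fails."""
    clauses = list(formula) + list(proof)
    n = len(formula)
    if not proof or len(proof[-1]) != 0:
        return None
    marked = {len(clauses) - 1}                # start with the empty clause
    for idx in range(len(clauses) - 1, n - 1, -1):
        if idx not in marked:
            continue                           # unmarked lemmas are skipped
        used = propagate(clauses[:idx], [-lit for lit in clauses[idx]])
        if used is None:
            return None
        marked |= used
    core = [clauses[i] for i in sorted(marked) if i < n]
    trimmed = [clauses[i] for i in sorted(marked) if i >= n]
    return core, trimmed


FORMULA = [(-2, 3), (1, 3), (-1, 2), (-1, -2), (1, -2), (2, -3)]
print(check_backward(FORMULA, [(-2,), (-1,), (3,), ()]))
```

With this particular propagation order the lemma (¬a) is never marked, so it is neither validated nor kept in the trimmed proof; a different order could mark a different set.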

SLIDE 32

Improvement II: Clause Deletion

We proposed to extend clausal proofs with deletion information [STVR 201X]:

  • clause deletion is crucial for efficient solving;
  • emit learning and deletion information;
  • proof size might double;
  • checking time can be reduced significantly.

Clause deletion can be combined with backwards checking [FMCAD 2013]:

  • ignore deleted clauses earlier in the proof;
  • optimize clause deletion for trimmed proofs.

[Figure: the lemmas (¬b), (¬a), (c) shown together with the clauses (¬b ∨ c) and (¬a ∨ b).]
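In the DRUP format each proof line is a clause written as integer literals terminated by 0, and a line starting with `d` deletes a clause. The Python sketch below parses such text and tracks how many clauses stay active. The sample proof, the order of its deletions, the integer encoding (a = 1, b = 2, c = 3), and the names `parse_drup` and `active_sizes` are assumptions made for illustration.

```python
def parse_drup(text):
    """Parse DRUP proof text into ('add' | 'delete', clause) steps."""
    steps = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        kind = "add"
        if tokens[0] == "d":
            kind, tokens = "delete", tokens[1:]
        literals = [int(tok) for tok in tokens]
        if not literals or literals[-1] != 0:
            raise ValueError("each clause must end with 0: " + line)
        steps.append((kind, frozenset(literals[:-1])))
    return steps


def active_sizes(formula, steps):
    """Number of active clauses after each proof step."""
    active = list(formula)
    sizes = []
    for kind, clause in steps:
        if kind == "add":
            active.append(clause)
        else:
            active.remove(clause)  # frozensets compare regardless of literal order
        sizes.append(len(active))
    return sizes


FORMULA = [frozenset(c) for c in
           [(-2, 3), (1, 3), (-1, 2), (-1, -2), (1, -2), (2, -3)]]
PROOF_TEXT = """\
-2 0
d -2 3 0
-1 0
d -1 2 0
3 0
0
"""
steps = parse_drup(PROOF_TEXT)
print(active_sizes(FORMULA, steps))
```

The deletion lines make the proof longer, but each later lemma is propagated against a smaller clause set.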

SLIDE 33

Improvement III: Core-first Unit Propagation

We propose a new unit propagation variant:

  1) propagate using clauses already in the core;
  2) examine non-core clauses only at fixpoint;
  3) if a non-core unit clause is found, goto 1);
  4) otherwise terminate.

[Figure: the clauses (¬a ∨ ¬b), (a ∨ ¬b), (b ∨ ¬c), and (¬b).]

Our variant, called Core-first Unit Propagation, can reduce checking costs considerably. Also, the resulting core and proof are smaller.

Fast propagation in a checker is different than fast propagation in a SAT solver.
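The four steps can be sketched as a small Python function. This is an illustrative reading of the slide, not the DRUP-trim implementation: the name `core_first_propagate`, the representation of the core as a set of clause indices, and the integer encoding (a = 1, b = 2, c = 3) are assumptions.

```python
def core_first_propagate(clauses, core, assumptions):
    """Unit propagation that prefers clauses whose index is in `core`.

    Returns (conflict_index or None, indices of the clauses used as units).
    """
    assignment = set(assumptions)
    units_used = []

    def scan(indices):
        """Find the first falsified or unit clause among `indices`."""
        for idx in indices:
            clause = clauses[idx]
            if any(lit in assignment for lit in clause):
                continue
            free = [lit for lit in clause if -lit not in assignment]
            if not free:
                return idx, None          # falsified clause
            if len(free) == 1:
                return idx, free[0]       # unit clause
        return None

    core_ids = [i for i in range(len(clauses)) if i in core]
    other_ids = [i for i in range(len(clauses)) if i not in core]
    while True:
        hit = scan(core_ids)              # 1) propagate with core clauses
        if hit is None:
            hit = scan(other_ids)         # 2) non-core clauses only at fixpoint
        if hit is None:
            return None, units_used       # 4) nothing left to propagate
        idx, lit = hit
        if lit is None:
            return idx, units_used        # conflict found
        assignment.add(lit)               # 3) extend and go back to the core
        units_used.append(idx)


CLAUSES = [(-2, 3), (1, 3), (-1, 2), (-1, -2), (1, -2), (2, -3), (-2,), (-1,)]
CORE = {5, 6}   # assumed marked earlier: (b or not c) and the lemma (not b)
print(core_first_propagate(CLAUSES, CORE, [-3]))
```

When checking the lemma (c) here, the unit ¬b comes from the core lemma at index 6 rather than from the non-core clause (¬b ∨ c) at index 0, so that clause does not have to be added to the core.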

SLIDE 42

Checking: Backwards + Core-first + Deletion

[Figure: step-by-step check of the formula (¬b ∨ c) ∧ (a ∨ c) ∧ (¬a ∨ b) ∧ (¬a ∨ ¬b) ∧ (a ∨ ¬b) ∧ (b ∨ ¬c) with lemmas (¬b), (¬a), (c) and the clauses (¬b ∨ c), (¬a ∨ b).]

Core-first unit propagation results in smaller cores and proofs.

SLIDE 43

Experimental Evaluation

We implemented DRUP logging into Glucose using only 40 LoC.

Glucose (DRUP) solves about twice as many benchmarks as Picosat (resolution).

Resolution proof logging increased memory usage by up to a factor of 100.

DRUP-trim validated clausal proofs in a time similar to the solving time.

[Figure: time (s, logscale) versus benchmarks (sorted) for Picosat, DRUP-trim, and Glucose.]

SLIDE 44

DRUP-trim in SAT Competition 2013

Unsatisfiable tracks required certificates. Allowed formats:

  • TraceCheck (resolution);
  • DRUP, Delete Reverse Unit Propagation.

Timeout: 5,000 seconds for solving and 20,000 seconds for checking.

Submissions with proof logging:

  • 11 application solvers (9 DRUP, 2 RUP);
  • 9 hard-combinatorial solvers (7 DRUP, 2 RUP);
  • Most submissions were certified unsatisfiable versions of top-tier solvers.

Statistics:

  • 98% of DRUP proofs of top-tier solvers were checked within the time limit;
  • Checking most RUP proofs (i.e., no clause deletion) results in a timeout.

SLIDE 45

Conclusion

Our DRUP-trim tool:

  • makes it feasible to check the results of state-of-the-art solvers efficiently (demonstrated at SAT Competition 2013);
  • validates learning, preprocessing, and inprocessing techniques;
  • and produces trimmed proofs and trimmed formulas.

Our next goal is to increase confidence in all SAT solvers by efficiently checking proofs with a mechanically-verified proof checker.

Discussion: should UNSAT proof logging be mandatory for tools participating in competitive events (e.g., SAT Competition, HWMCC)?

SLIDE 46

Recent Work

Bridging the Gap Between Easy Generation and Efficient Verification of Unsatisfiability Proofs
Marijn J.H. Heule, Warren A. Hunt, Jr., and Nathan Wetzler
Accepted: Software Testing, Verification, and Reliability (STVR 201X)

Verifying Refutations with Extended Resolution
Marijn J.H. Heule, Warren A. Hunt, Jr., and Nathan Wetzler
Published: Conference on Automated Deduction (CADE 2013)

Mechanical Verification of SAT Refutations with Extended Resolution
Nathan Wetzler, Marijn J.H. Heule, and Warren A. Hunt, Jr.
Published: Interactive Theorem Proving (ITP 2013)

Trimming while Checking Clausal Proofs
Marijn J.H. Heule, Warren A. Hunt, Jr., and Nathan Wetzler
Accepted: Formal Methods in Computer-Aided Design (FMCAD 2013)

Thank you for your attention! Questions?

SLIDE 47

Resolution Graphs: Arcs vs Vertices vs Literals

Resolution graphs are huge

  • plot obtained using picosat

Lemmas require ~400 resolution steps (arcs in graph)

  • due to clause minimization

Lemmas have ~40 literals

Resolution proofs are at least 10x larger than clausal proofs, with up to 100x memory footprint!

[Figure: log-log plot (10^2 to 10^6 on both axes) of the number of core arcs against the number of core lemmas (green) and the number of literals in core lemmas (red), with the diagonal for reference.]

SLIDE 48

Results: Trimming

The number of core clauses using

  • Picosat (resolution + no preprocessing)
  • Glucose (backwards + preprocessing)
  • Glucose (core-first + preprocessing)

Checking clausal proofs results in smaller trimmed formulas.

The core-first unit propagation technique further trims the formula.

[Figure: number of clauses (logscale) versus number of (trimmed) benchmarks; series: size of input formula, picosat, glucose backwards, glucose core-first.]

SLIDE 49

Proofs Plots for SAT Competition 2013

[Figure: two panels (Application benchmarks, Hard-combinatorial benchmarks); series in each panel: Lingeling, Glucose, Glucose DRUP, Riss, Riss DRUP, Lingeling RUP.]

NB: a solved benchmark was only counted if the output was verified.

SLIDE 50

UNSAT Results of SAT Competition 2013

DRUP proofs can be checked in a time similar to the solving time.

Lack of deletion information made checking much more costly.

Enabling DRUP support has a small effect on the solving time.

Big performance differences were due to bugs:

  • some "features" were turned off by enabling DRUP support
  • two buggy solvers were not disqualified, due to "correct results"

Our DRUP-trim tool made it feasible to validate all DRUP results.