

slide-1
SLIDE 1

Machine learning for instance selection in SMT solving (Work in Progress)

Jasmin Christian Blanchette (1, 2), Daniel El Ouraoui (2), Pascal Fontaine (2), Cezary Kaliszyk (3)

(1) Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
(2) University of Lorraine, CNRS, Inria, and LORIA, Nancy, France
(3) University of Innsbruck, Innsbruck, Austria

9th April 2019

slide-2
SLIDE 2

Contents

1 Introduction
2 CDCL(T)
3 Instantiation techniques
4 Machine learning for instance selection
5 Evaluation
6 Conclusion

2 / 32


slide-4
SLIDE 4

Motivations

Satisfiability modulo theories (SMT)

Automation

  Proof assistants
  Verification conditions
  Model checking

Solvers

  Z3, CVC4, veriT, ...

Instantiation

  Hard for SMT solvers
  Heuristically solved

Challenge

  Improve instantiation techniques
  Solve more problems
  Be more efficient

4 / 32

slide-5
SLIDE 5

Our tool

veriT, developed at Université de Lorraine and UFRN (http://www.verit-solver.org)

5 / 32


slide-7
SLIDE 7

Context

b ≠ a ∧ f(a) = f(b) ∧ ∀x y. f(x) = f(y) ⇒ x = y

Ground part: b ≠ a ∧ f(a) = f(b)
Handled by instantiation: ∀x y. f(x) = f(y) ⇒ x = y

7 / 32

slide-8
SLIDE 8

Ground problem

How can we efficiently check the satisfiability of a ground formula?

(f(a, b) = g(a) ∨ d ≠ b) ∧ d = g(b) ∧ d = f(a, b) ∧ b = a ∧ d = g(a)

Each atom is named by a Boolean variable:

  l1: f(a, b) = g(a)
  l2: d = b
  l3: d = g(b)
  l4: d = f(a, b)
  l5: b = a
  l6: d = g(a)

Boolean abstraction:

(l1 ∨ ¬l2) ∧ l3 ∧ l4 ∧ l5 ∧ l6
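The abstraction step on this slide can be sketched in a few lines of Python. This is only an illustration: the clause encoding and the helper name `boolean_models` are assumptions made for the example, not veriT code, and brute-force enumeration stands in for a real SAT solver.

```python
from itertools import product

# Atom names follow the slide. A clause is a list of (atom, polarity) pairs,
# so [("l1", True), ("l2", False)] encodes (l1 ∨ ¬l2).
ATOMS = ["l1", "l2", "l3", "l4", "l5", "l6"]
CLAUSES = [
    [("l1", True), ("l2", False)],
    [("l3", True)],
    [("l4", True)],
    [("l5", True)],
    [("l6", True)],
]

def boolean_models(clauses, atoms):
    """Yield every assignment that satisfies all clauses (brute force)."""
    for values in product([False, True], repeat=len(atoms)):
        assignment = dict(zip(atoms, values))
        if all(any(assignment[a] == pol for a, pol in clause) for clause in clauses):
            yield assignment

models = list(boolean_models(CLAUSES, ATOMS))
```

Each Boolean model found this way is only a candidate: the theory solvers still have to check that the corresponding equalities and disequalities are consistent.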

8 / 32


slide-12
SLIDE 12

CDCL(T)

Diagram: in the ground solver, the SAT solver passes a Boolean model to the theory solvers, which return conflict clauses.

Formulas are embedded in SAT
The SAT solver produces a Boolean model
Theory solvers produce conflict clauses
Conflict clauses guide the SAT solver
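The loop described above can be sketched as follows. This is a toy under strong assumptions: the SAT solver is replaced by brute-force enumeration, the only theory is equality between constants (no function symbols), and the conflict clauses are not minimized. The names `sat_models`, `theory_conflict` and `check` are hypothetical.

```python
from itertools import product

def sat_models(clauses, names):
    """Brute-force stand-in for the SAT solver: yield Boolean models."""
    for values in product([False, True], repeat=len(names)):
        model = dict(zip(names, values))
        if all(any(model[v] == pol for v, pol in clause) for clause in clauses):
            yield model

def theory_conflict(model, atoms):
    """Check a Boolean model against equality over constants.

    Returns None if the model is consistent, else a conflict clause.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    true_atoms = [name for name in atoms if model[name]]
    for name in true_atoms:
        s, t = atoms[name]
        parent[find(s)] = find(t)
    for name, (s, t) in atoms.items():
        if not model[name] and find(s) == find(t):
            # The true equalities imply s = t, so they cannot all hold
            # together with the disequality s ≠ t.
            return [(n, False) for n in true_atoms] + [(name, True)]
    return None

def check(clauses, atoms):
    """Alternate between the SAT solver and the theory solver."""
    names = sorted(atoms)
    clauses = list(clauses)
    while True:
        model = next(sat_models(clauses, names), None)
        if model is None:
            return "unsat"
        conflict = theory_conflict(model, atoms)
        if conflict is None:
            return "sat"
        clauses.append(conflict)  # the conflict clause guides the next search

# p: a = b, q: b = c, r: a = c
atoms = {"p": ("a", "b"), "q": ("b", "c"), "r": ("a", "c")}
result_unsat = check([[("p", True)], [("q", True)], [("r", False)]], atoms)
result_sat = check([[("p", True)], [("r", False)]], atoms)
```

In the first call the Boolean abstraction p ∧ q ∧ ¬r is satisfiable on its own; it is the conflict clause ¬p ∨ ¬q ∨ r returned by the theory check that makes the SAT search fail.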

9 / 32

slide-13
SLIDE 13

First-Order problem

b ≠ a ∧ f(a) = f(b) ∧ ∀x y. f(x) = f(y) ⇒ x = y

The quantified part, ∀x y. f(x) = f(y) ⇒ x = y, is handled by instantiation.

10 / 32

slide-14
SLIDE 14

First-Order problem

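How do we find an instance such that the problem becomes UNSAT?

b ≠ a ∧ f(a) = f(b) ∧ ∀x y. f(x) = f(y) ⇒ x = y

The ground part alone, b ≠ a ∧ f(a) = f(b), is SAT.

Instantiating ∀x y. f(x) = f(y) ⇒ x = y with x → a, y → b gives the instance f(a) ≠ f(b) ∨ a = b.

Since b ≠ a, the instance reduces to f(a) ≠ f(b), and b ≠ a ∧ f(a) = f(b) ∧ f(a) ≠ f(b) is UNSAT.

11 / 32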
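The example can be replayed with a small propositional check. The sketch reads the ground part as b ≠ a ∧ f(a) = f(b), which is what the UNSAT verdict requires, and treats each equality as an opaque Boolean atom; that is enough here, although a real solver would also reason about congruence. The helpers `eq`, `instantiate` and `prop_sat` are hypothetical names.

```python
from itertools import product

def eq(s, t, positive=True):
    """Literal for s = t (or s ≠ t); the atom is the unordered pair {s, t}."""
    return (frozenset((s, t)), positive)

def instantiate(x, y):
    """Instance of ∀x y. f(x) = f(y) ⇒ x = y, as the clause f(x) ≠ f(y) ∨ x = y."""
    return [eq(f"f({x})", f"f({y})", positive=False), eq(x, y)]

def prop_sat(clauses):
    """Brute-force propositional satisfiability over the atoms that occur."""
    atoms = list({atom for clause in clauses for atom, _ in clause})
    for values in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if all(any(model[a] == pol for a, pol in clause) for clause in clauses):
            return True
    return False

ground = [[eq("b", "a", positive=False)], [eq("f(a)", "f(b)")]]
before = prop_sat(ground)                             # ground part only
after = prop_sat(ground + [instantiate("a", "b")])    # with the instance x → a, y → b
```

Other instances, such as x → a, y → a, leave the problem satisfiable, which is why choosing the right substitution matters.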


slide-20
SLIDE 20

First-Order CDCL(T)

Diagram: in the SMT solver, the ground solver passes an FO model to the instantiation module, which returns instances to the ground solver.

12 / 32


slide-22
SLIDE 22

State of the art

Conflict-based instantiation

Introduced by Reynolds, this technique produces relevant sets of instances. The idea is that, given a ground model M and a quantified formula ∀(x1 : τ1, ..., xn : τn). ϕ, we find a substitution σ such that M ⊨ ¬ϕσ.

Congruence Closure with Free Variables (CCFV)

Introduced by Barbosa et al., CCFV generalizes the idea of conflict-based instantiation by reasoning over equivalence classes.
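A minimal sketch of the search for a conflicting substitution is given below. It makes a strong simplification that is not part of the technique itself: M ⊨ ¬ϕσ is tested by looking up literals syntactically in M, whereas actual implementations reason modulo equality. The model, the clause encoding and the function name are assumptions chosen for illustration.

```python
from itertools import product

def lit(s, t, positive):
    """Literal s = t or s ≠ t over an unordered pair of ground terms."""
    return (frozenset((s, t)), positive)

# Ground model M: a ≠ b and f(a) = f(b).
MODEL = {lit("a", "b", False), lit("f(a)", "f(b)", True)}

# Body ϕ of ∀x y. f(x) ≠ f(y) ∨ x = y, as (lhs template, rhs template, polarity).
CLAUSE = [("f({x})", "f({y})", False), ("{x}", "{y}", True)]

def conflicting_substitutions(model, clause, variables, terms):
    """Yield σ such that the complement of every literal of ϕσ is in the model."""
    for values in product(terms, repeat=len(variables)):
        sigma = dict(zip(variables, values))
        if all(
            lit(lhs.format(**sigma), rhs.format(**sigma), not pol) in model
            for lhs, rhs, pol in clause
        ):
            yield sigma

found = list(conflicting_substitutions(MODEL, CLAUSE, ["x", "y"], ["a", "b"]))
```

Every substitution returned this way yields an instance that contradicts the current model, which is why such instances are always useful to the ground solver.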

14 / 32

slide-23
SLIDE 23

State of the art

Enumerative instantiation

∀(x : τ). ψ[x] ≡ ⋀_{t ∈ D_τ} ψ[t]

Enumerate all ground terms over the domain of x (a.k.a. the Herbrand universe)

Trigger-based instantiation

Triggers

A trigger T for a quantified formula ∀x1 ... xn. ψ is a set of non-ground terms u1, ..., un ∈ T(ψ) such that {x1, ..., xn} ⊆ FV(u1) ∪ ... ∪ FV(un).

Example:

  E = {f(a) ≃ g(b), a ≃ g(b)}
  Q = ∀x. f(g(x)) ≃ g(x)
  T = {f(g(x))}

f(a) E-matches f(g(x)) under x → b
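The E-matching example can be reproduced with a small, unoptimized sketch. Terms are tuples `(head, arg1, ..., argn)`, variables are plain strings, and the `EGraph` class with its naive congruence closure is a hypothetical helper written for this illustration; production solvers use far more efficient data structures.

```python
def subterms(t):
    """Yield t and all of its subterms."""
    yield t
    for arg in t[1:]:
        yield from subterms(arg)

class EGraph:
    def __init__(self, equalities):
        self.parent = {}
        for s, t in equalities:
            for u in list(subterms(s)) + list(subterms(t)):
                self.parent.setdefault(u, u)
        for s, t in equalities:
            self.merge(s, t)
        self._close()

    def find(self, t):
        while self.parent[t] != t:
            t = self.parent[t]
        return t

    def merge(self, s, t):
        self.parent[self.find(s)] = self.find(t)

    def _close(self):
        """Naive congruence closure: merge applications with equal arguments."""
        changed = True
        while changed:
            changed = False
            terms = list(self.parent)
            for s in terms:
                for t in terms:
                    if (s[0] == t[0] and len(s) == len(t)
                            and self.find(s) != self.find(t)
                            and all(self.find(a) == self.find(b)
                                    for a, b in zip(s[1:], t[1:]))):
                        self.merge(s, t)
                        changed = True

    def ematch(self, pattern, term, subst):
        """Yield substitutions σ extending subst with pattern·σ ≃ term modulo E."""
        if isinstance(pattern, str):  # a variable
            bound = subst.get(pattern)
            if bound is None:
                yield {**subst, pattern: term}
            elif self.find(bound) == self.find(term):
                yield subst
            return
        for cand in list(self.parent):
            if (cand[0] == pattern[0] and len(cand) == len(pattern)
                    and self.find(cand) == self.find(term)):
                yield from self._match_args(pattern[1:], cand[1:], subst)

    def _match_args(self, patterns, args, subst):
        if not patterns:
            yield subst
            return
        for s in self.ematch(patterns[0], args[0], subst):
            yield from self._match_args(patterns[1:], args[1:], s)

a, b = ("a",), ("b",)
egraph = EGraph([(("f", a), ("g", b)), (a, ("g", b))])
matches = list(egraph.ematch(("f", ("g", "x")), ("f", a), {}))
```

The match succeeds because a and g(b) are in the same congruence class, so the subpattern g(x) can be matched against g(b) even though the term f(a) does not syntactically contain g.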

15 / 32

slide-24
SLIDE 24

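Strategy

Diagram: CCFV is tried first. If it works, its instances go to the ground solver; if it fails, trigger-based and enumerative instantiation are used.

Figure: Instantiation strategy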

16 / 32


slide-26
SLIDE 26

Summary

Conflict-based instantiation and CCFV:

  Pro: efficient; when a substitution is found, it kills the model
  Pro: all generated instances are useful
  Con: only finds contradictions involving a single instance

Enumerative and trigger-based instantiation:

  Pro: useful when CCFV fails
  Con: many heuristics
  Con: generates many instances, a lot of them junk

Indeed, this is what we want to improve!

17 / 32



slide-29
SLIDE 29

Problem

How many lemmas are generated to solve a problem?

  Around 300 for the UF category of SMT-LIB
  Some problems generate more than 100 000 instances

How many lemmas are needed to solve a problem?

Only 10% of this number, and sometimes much less

Question

Could we select the good ones?

19 / 32

slide-30
SLIDE 30

Our approach

Instances in a priority queue
Encode instances
Call predictor
Several strategies for selection

Diagram: in the ML-Solver, the instantiation module produces instances Inst1, ..., Instn. Instance selection passes them through a filter and a predictor, which ranks them (Inst1 rank, ..., Instn rank). Selected instances go to the ground solver; the others are delayed.

20 / 32
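The priority-queue part of this approach might look like the sketch below. The class `InstanceQueue`, its methods and the toy predictor are hypothetical; in particular, the real predictor is a trained model, and "take the k best" is only one of several possible selection strategies.

```python
import heapq
from itertools import count

class InstanceQueue:
    """Keep instances ordered by predicted usefulness (highest score first)."""

    def __init__(self, predictor):
        self._predictor = predictor
        self._heap = []
        self._tie = count()  # insertion order breaks ties between equal scores

    def push(self, instance):
        score = self._predictor(instance)
        # heapq is a min-heap, so the score is negated.
        heapq.heappush(self._heap, (-score, next(self._tie), instance))

    def select(self, k):
        """Pop the k best-ranked instances; the rest stay delayed in the queue."""
        n = min(k, len(self._heap))
        return [heapq.heappop(self._heap)[2] for _ in range(n)]

    def __len__(self):
        return len(self._heap)

# Toy predictor: a fixed score per instance name.
scores = {"inst1": 0.2, "inst2": 0.9, "inst3": 0.5}
queue = InstanceQueue(lambda inst: scores[inst])
for inst in ["inst1", "inst2", "inst3"]:
    queue.push(inst)
selected = queue.select(2)
```

Delayed instances are not lost: they remain in the queue and can be selected in a later round if the higher-ranked ones were not enough.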

slide-31
SLIDE 31

State description

( l1, ..., ln , ∀xn. ψ[xn] , x1 → t1, ..., xn → tn )

where l1, ..., ln is the model, ∀xn. ψ[xn] the formula, and x1 → t1, ..., xn → tn the instances.

The states collected over the rounds,

  (model_1, Qformula^1_1, Inst^1_{1,1}, ..., Inst^1_{1,m})
  (model_1, Qformula^2_1, Inst^2_{1,1}, ..., Inst^2_{1,m})
  ...
  (model_k, Qformula^i_k, Inst^i_{k,1}, ..., Inst^i_{k,m})

are encoded (↪) as a matrix:

  (    x12  x13  ...  x1n )
  ( 1  x22  x23  ...  x2n )
  ( ...                   )
  (    xd2  xd3  ...  xdn )

21 / 32
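The slide does not spell out how states become matrix rows. One plausible reading, assumed here purely for illustration, is that each state is first summarized as a sparse dictionary of feature counts and then laid out densely with a shared column order. The function `to_matrix` and the sample features are hypothetical.

```python
def to_matrix(samples):
    """Turn sparse feature counts into dense rows sharing one column order."""
    columns = sorted({feature for sample in samples for feature in sample})
    rows = [[sample.get(feature, 0) for feature in columns] for sample in samples]
    return columns, rows

# Two toy states, each described by counts of symbol triples.
samples = [
    {("=", "f", "*"): 2, ("g", "sk", "*"): 1},
    {("=", "f", "*"): 1},
]
columns, rows = to_matrix(samples)
```

Fixing the column order once and reusing it at prediction time keeps training rows and query rows comparable.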

slide-32
SLIDE 32

Experiments

Pipeline: veriT → small proofs → data set → over sampling / under sampling → train XGBoost → model → C code, feature importance, XGBoost predictions

Stages: preprocessing, balancing data, classification
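The balancing stage can be illustrated without any external library. The sketch below shows random under sampling and over sampling for a binary data set in which useful instances (label 1) are rare; the function names are hypothetical, both classes are assumed to be non-empty, and the XGBoost training step itself is not shown.

```python
import random

def split_by_label(rows, labels):
    """Separate rows into positives (label 1) and negatives (label 0)."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    return pos, neg

def undersample(rows, labels, rng):
    """Randomly drop rows of the larger class until both classes match."""
    pos, neg = split_by_label(rows, labels)
    n = min(len(pos), len(neg))
    pos, neg = rng.sample(pos, n), rng.sample(neg, n)
    return pos + neg, [1] * n + [0] * n

def oversample(rows, labels, rng):
    """Randomly duplicate rows of the smaller class until both classes match."""
    pos, neg = split_by_label(rows, labels)
    n = max(len(pos), len(neg))
    pos = pos + [rng.choice(pos) for _ in range(n - len(pos))]
    neg = neg + [rng.choice(neg) for _ in range(n - len(neg))]
    return pos + neg, [1] * n + [0] * n

rng = random.Random(0)
rows = list(range(10))
labels = [1] + [0] * 9          # one useful instance out of ten
under_rows, under_labels = undersample(rows, labels, rng)
over_rows, over_labels = oversample(rows, labels, rng)
```

Under sampling discards data but keeps training fast, while over sampling keeps every row at the cost of repeating the rare class.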

22 / 32


slide-34
SLIDE 34

Time evaluation

Experiments run on UF SMT-LIB benchmarks with a 120s timeout

  veriT without learning solves 2923
  veriT with learning solves 2939

24 / 32

slide-35
SLIDE 35

Evaluation on test + training set

Figure: comparison of veriT configurations on UF SMT-LIB benchmarks.

25 / 32

slide-36
SLIDE 36

Evaluation on test set only

Figure: comparison of veriT configurations on UF SMT-LIB benchmarks.

26 / 32


slide-38
SLIDE 38

Conclusions and future directions

Could be a significant improvement
Reduces the number of instances by a factor of two on average
Reinforcement learning
Feature embedding can be improved

28 / 32

slide-39
SLIDE 39

Thank you for your attention

Questions or suggestions?

29 / 32

slide-40
SLIDE 40

Evaluation

                    |         All          |      Test only
                    | unsat   avg   less   | unsat   avg   less
  with learning     | 1443    113   1317   | 423     130   363
  without learning  | 1443    318   128    | 423     264   62

Table: veriT configurations on UF SMT-LIB benchmarks with 30s timeout.

30 / 32

slide-41
SLIDE 41

Features encoding

Terms abstraction

  Variables
  Skolem constants
  Polarity

Features

  FEATURE: Literal → Σ³
  FEATURES: Σ³ → ℕ
  Occurrences of term walks

Example

FEATURES(f(x, y) = g(sk1, sk2(x))) = { (⊕, =, f) → 1, (⊕, =, g) → 1, (=, f, ∗) → 2, (=, g, ⊙) → 2, (g, ⊙, ∗) → 1 }
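The example can be reproduced by counting downward paths of three symbols in the abstracted term tree. In the sketch below, terms are tuples `(head, arg1, ..., argn)`; recognizing Skolem symbols by the prefix `sk` and writing negative polarity as ⊖ are assumptions made for the illustration, as are all function names.

```python
from collections import Counter

VAR, SKOLEM, POS, NEG = "∗", "⊙", "⊕", "⊖"

def abstract(term, variables):
    """Replace variables by ∗ and Skolem symbols by ⊙, keeping the tree shape."""
    head, args = term[0], term[1:]
    if head in variables:
        head = VAR
    elif head.startswith("sk"):  # naming convention assumed for this sketch
        head = SKOLEM
    return (head, *(abstract(a, variables) for a in args))

def walks(tree, length=3):
    """Yield every downward path of `length` symbols in the tree."""
    def paths_from(node, n):
        if n == 1:
            yield (node[0],)
            return
        for child in node[1:]:
            for rest in paths_from(child, n - 1):
                yield (node[0], *rest)

    def nodes(node):
        yield node
        for child in node[1:]:
            yield from nodes(child)

    for node in nodes(tree):
        yield from paths_from(node, length)

def features(literal, positive, variables):
    """Count term walks of length 3, with the polarity as the root symbol."""
    tree = (POS if positive else NEG, abstract(literal, variables))
    return Counter(walks(tree))

# f(x, y) = g(sk1, sk2(x)), occurring positively
literal = ("=", ("f", ("x",), ("y",)), ("g", ("sk1",), ("sk2", ("x",))))
feats = features(literal, positive=True, variables={"x", "y"})
```

Because variables and Skolem symbols are abstracted before counting, two literals that differ only in the names of these symbols receive the same features.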

31 / 32

slide-42
SLIDE 42

State description version 2

( E_t1, D_t1, ..., E_tn, D_tn , T1, ..., Tn , x1 → t1, ..., xn → tn )

where E_t1, D_t1, ..., E_tn, D_tn describe the model, T1, ..., Tn the triggers, and x1 → t1, ..., xn → tn the instances.

  E_ti is the congruence class of ti
  D_ti is the set of all terms explicitly disequal to ti
  Ti is the set of triggers of xi

This description drastically reduces the size of the problem

32 / 32