Machine learning for automated theorem proving: the story so far

Sean Holden
University of Cambridge Computer Laboratory
William Gates Building, 15 JJ Thomson Avenue
Cambridge CB3 0FD, UK
sbh11@cl.cam.ac.uk
www.cl.cam.ac.uk/~sbh11

Machine learning: what is it?

EVIL ROBOT... hates kittens!!!

Machine learning: what is it?

I have $d$ features allowing me to make vectors $\mathbf{x} = (x_1, \ldots, x_d)$ describing instances. I have a set of $m$ labelled examples $s = ((\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_m, y_m))$, where usually $y$ is either real (regression) or one of a finite number of categories (classification).

[Figure: the examples $s$ are passed to a learning algorithm, which produces a function $h$ mapping $\mathbf{x}$ to $y$.]

I want to infer a function $h$ that can predict the values for $y$ given $\mathbf{x}$ on all instances, not just the ones in $s$.

Machine learning: what is it?

There are a couple of things missing:

[Figure: the learning algorithm that maps $s$ to $h$ also needs parameter optimization, plus BLOOD, SWEAT AND TEARS!!!]

Generally we need to optimize some parameters associated with the learning algorithm. Also, the process is far from automatic...

Machine learning: what is it?

So with respect to theorem proving, the key questions have been:

1. What specific problem do you want to solve?
2. What are the features?
3. How do you get the training data?
4. What machine learning method do you use?

As far as the last question is concerned:

1. It’s been known for a long time that you don’t necessarily need a complicated method. (Reference: Robert C. Holte, “Very simple classification rules perform well on most commonly used datasets”, Machine Learning, 1993.)
2. The chances are that a support vector machine (SVM) is a good bet. (Reference: Fernández-Delgado et al., “Do we need hundreds of classifiers to solve real world classification problems?”, Journal of Machine Learning Research, 2014.)
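
Holte's point can be made concrete with a decision stump: a rule that thresholds a single feature. The sketch below is written in the spirit of his 1R rules rather than as his exact procedure (1R works on discretized attributes); the names `train_stump` and `predict_stump` are hypothetical, and labels are assumed to be in {-1, +1}.

```python
# A decision stump: a one-feature threshold rule in the spirit of Holte's 1R.
# Assumptions: numeric features, labels in {-1, +1}; illustrative only.

def train_stump(xs, ys):
    """Return (feature, threshold, sign) minimizing training errors.

    The rule predicts `sign` if x[feature] > threshold, else -sign.
    """
    best = None  # (errors, feature, threshold, sign)
    d = len(xs[0])
    for j in range(d):
        values = sorted({x[j] for x in xs})
        # Candidate thresholds: below the smallest value, then midpoints.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for sign in (1, -1):
                errors = sum(
                    1 for x, y in zip(xs, ys)
                    if (sign if x[j] > t else -sign) != y
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, sign)
    _, j, t, sign = best
    return j, t, sign


def predict_stump(stump, x):
    j, t, sign = stump
    return sign if x[j] > t else -sign
```

For example, with examples `[(0.1, 5.0), (0.4, 1.0), (0.6, 4.0), (0.9, 2.0)]` labelled `[-1, -1, 1, 1]`, the stump found thresholds the first feature at 0.5 and classifies all four correctly.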

Three examples of machine learning for theorem proving

In this talk we look at three representative examples of how machine learning has been applied to automated theorem proving (ATP):

1. Machine learning for solving Boolean satisfiability (SAT) problems by selecting an algorithm from a portfolio.
2. Machine learning for proving theorems in first-order logic (FOL) by selecting a good heuristic.
3. Machine learning for selecting good axioms in the context of an interactive proof assistant.

In each case I present the underlying problem, and a brief description of the machine learning method used.

Machine learning for SAT

Given a Boolean formula, decide whether it is satisfiable. There is no single “best” SAT solver. Basic machine learning approach:

1. Derive a standard set of features that can be used to describe any formula.
2. Apply a collection of solvers (the portfolio) to some training set of formulas.
3. The running time of a solver provides the label $y$.
4. For each solver, train a regression model to predict the running time of that solver on a particular instance. This is known as an empirical hardness model.

Reference: Lin Xu et al., “SATzilla: Portfolio-based algorithm selection for SAT”, Journal of Artificial Intelligence Research, 2008. (The actual system is more complex and uses a hierarchical model.)

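Machine learning for SAT

[Figure: the training SAT problems $p_1, p_2, \ldots, p_n$ are converted to feature vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$. Running each solver $s_j$ on them gives a training set from which a model $h_j$ is learned, for $j = 1, \ldots, k$. A new instance is converted to a feature vector $\mathbf{x}$, and the models $h_1, \ldots, h_k$ are used to predict the best solver to try.]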
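
The selection step in the figure can be sketched as follows, assuming each empirical hardness model is simply a callable from a feature vector to a predicted running time. The solver names and the linear models are hypothetical placeholders, not SATzilla's; the design is just to evaluate every model and take the solver with the smallest prediction.

```python
# Portfolio-based algorithm selection, sketched: one runtime model per solver,
# then pick the solver with the smallest predicted runtime.
# Assumption: models are arbitrary callables; names below are hypothetical.
from typing import Callable, Dict, Sequence

RuntimeModel = Callable[[Sequence[float]], float]


def select_solver(models: Dict[str, RuntimeModel], x: Sequence[float]) -> str:
    """Return the name of the solver whose model predicts the lowest runtime on x."""
    predictions = {name: h(x) for name, h in models.items()}
    return min(predictions, key=predictions.get)


# Hypothetical empirical hardness models h_1, h_2, h_3, linear in two features
# (think of x[0] as the number of clauses and x[1] as the number of variables).
models = {
    "solver_1": lambda x: 1.0 + 0.02 * x[0],
    "solver_2": lambda x: 5.0 + 0.001 * x[0] + 0.01 * x[1],
    "solver_3": lambda x: 0.5 + 0.05 * x[1],
}
```

With these made-up models, `select_solver(models, [1000, 50])` picks `"solver_3"`, while `[10, 400]` picks `"solver_1"`.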

Machine learning for SAT

The approach employed 48 features, including for example:

1. The number of clauses.
2. The number of variables.
3. The mean ratio of positive and negative literals in a clause.
4. The mean, minimum, maximum and entropy of the ratio of positive and negative occurrences of a variable.
5. The number of DPLL unit propagations computed at various depths.
6. And so on...
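
A few of the purely syntactic features above are easy to compute. This sketch assumes DIMACS-style clauses (lists of non-zero integers, negative meaning negated), reads “ratio” as the fraction of positive literals or occurrences, and takes the entropy to be the Shannon entropy in bits of the empirical distribution of per-variable ratios. SATzilla's exact definitions may differ, and the feature names are made up for illustration.

```python
# Some syntactic SAT features from a CNF formula.
# Assumptions: DIMACS-style integer literals; "ratio" = fraction positive;
# entropy = Shannon entropy (bits) of the per-variable ratio distribution.
import math
from collections import Counter


def cnf_features(clauses):
    variables = {abs(lit) for clause in clauses for lit in clause}
    # Fraction of positive literals in each clause.
    clause_ratios = [sum(lit > 0 for lit in c) / len(c) for c in clauses]
    # Positive occurrences and total occurrences of each variable.
    pos = Counter(lit for c in clauses for lit in c if lit > 0)
    total = Counter(abs(lit) for c in clauses for lit in c)
    var_ratios = [pos[v] / total[v] for v in variables]
    n = len(var_ratios)
    counts = Counter(var_ratios)
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
    return {
        "num_clauses": len(clauses),
        "num_variables": len(variables),
        "mean_clause_pos_ratio": sum(clause_ratios) / len(clause_ratios),
        "var_pos_ratio_mean": sum(var_ratios) / n,
        "var_pos_ratio_min": min(var_ratios),
        "var_pos_ratio_max": max(var_ratios),
        "var_pos_ratio_entropy": entropy,
    }
```

For the formula $(x_1 \lor \neg x_2) \land (x_2 \lor x_3) \land (\neg x_1 \lor \neg x_3) \land (x_1 \lor x_2)$, written `[[1, -2], [2, 3], [-1, -3], [1, 2]]`, this reports 4 clauses, 3 variables and a mean clause ratio of 0.625.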

Linear regression

I have $d$ features allowing me to make vectors $\mathbf{x} = (x_1, \ldots, x_d)$. I have a set of $m$ labelled examples $s = ((\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_m, y_m))$. I want a function $h$ that can predict the values for $y$ given $\mathbf{x}$. In the simplest scenario I use

$$h(\mathbf{x}; \mathbf{w}) = w_0 + \sum_{i=1}^{d} w_i x_i$$

and choose the weights $w_i$ to minimize

$$E(\mathbf{w}) = \sum_{i=1}^{m} \left( h(\mathbf{x}_i; \mathbf{w}) - y_i \right)^2.$$

This is linear regression.
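
For $d = 1$, writing $x_i$ for the single feature of example $i$, the weights minimizing $E(\mathbf{w})$ have a closed form: $w_1 = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and $w_0 = \bar{y} - w_1 \bar{x}$. A minimal sketch of that special case (the function names are illustrative):

```python
# Least-squares linear regression for the special case d = 1.


def fit_line(xs, ys):
    """Return (w0, w1) minimizing E(w) = sum_i (w0 + w1 * x_i - y_i)^2."""
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = y_bar - w1 * x_bar
    return w0, w1


def squared_error(w0, w1, xs, ys):
    """E(w) for h(x; w) = w0 + w1 * x."""
    return sum((w0 + w1 * x - y) ** 2 for x, y in zip(xs, ys))
```

On the points $(0, 1), (1, 3), (2, 5), (3, 7)$ it returns $w_0 = 1$, $w_1 = 2$ with zero error.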

Ridge regression

This can be problematic: the function $h$ is linear, and computing $\mathbf{w}$ can be numerically unstable. Instead, introduce basis functions $\phi_i$ and use

$$h(\mathbf{x}; \mathbf{w}) = \sum_{i=1}^{d} w_i \phi_i(\mathbf{x}),$$

minimizing

$$E(\mathbf{w}) = \sum_{i=1}^{m} \left( h(\mathbf{x}_i; \mathbf{w}) - y_i \right)^2 + \lambda \|\mathbf{w}\|^2.$$

This is ridge regression. The optimum $\mathbf{w}$ is

$$\mathbf{w}_{\text{opt}} = \left( \Phi^T \Phi + \lambda I \right)^{-1} \Phi^T \mathbf{y},$$

where $\Phi_{i,j} = \phi_j(\mathbf{x}_i)$. Example: in SATzilla, we have linear basis functions $\phi_i(\mathbf{x}) = x_i$ and quadratic basis functions $\phi_{i,j}(\mathbf{x}) = x_i x_j$.
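
The closed-form ridge solution can be computed without forming the inverse, by solving $(\Phi^T \Phi + \lambda I)\mathbf{w} = \Phi^T \mathbf{y}$. The sketch below does this in plain Python with Gaussian elimination (a real implementation would use a linear-algebra library) and builds SATzilla-style linear and quadratic basis functions; `solve`, `ridge_fit` and `sat_basis` are hypothetical helper names.

```python
# Ridge regression: solve (Phi^T Phi + lam I) w = Phi^T y in pure Python.
from itertools import combinations_with_replacement


def solve(a, b):
    """Solve a w = b (a square, lists of lists) by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def ridge_fit(xs, ys, basis, lam):
    phi = [[f(x) for f in basis] for x in xs]  # Phi[i][j] = phi_j(x_i)
    k = len(basis)
    a = [[sum(row[i] * row[j] for row in phi) + (lam if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(k)]
    return solve(a, b)


def sat_basis(d):
    """Linear phi_i(x) = x_i, then quadratic phi_{i,j}(x) = x_i * x_j for i <= j."""
    linear = [lambda x, i=i: x[i] for i in range(d)]
    quadratic = [lambda x, i=i, j=j: x[i] * x[j]
                 for i, j in combinations_with_replacement(range(d), 2)]
    return linear + quadratic
```

With noise-free data generated by $y = 2x_1 - x_2 + 0.5 x_1 x_2$ and $\lambda = 0$ it recovers those weights; increasing $\lambda$ shrinks $\|\mathbf{w}\|$.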

Mapping to a bigger space

Mapping to a different space to introduce nonlinearity is a common trick:

[Figure: the map $\Phi$ takes a point $(x_1, x_2)$ to $(\phi_1(\mathbf{x}), \phi_2(\mathbf{x}), \phi_3(\mathbf{x})) = (x_1, x_2, x_1 x_2)$. A plane dividing the groups in this space corresponds to a nonlinear division of the original space.]

We will see this again later...
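
The classic illustration is XOR. Coding inputs as $\pm 1$ (an assumption made for convenience), no line in the $(x_1, x_2)$ plane separates the two classes, but after mapping to $(x_1, x_2, x_1 x_2)$ the plane with normal $(0, 0, -1)$ through the origin does:

```python
# XOR becomes linearly separable after Phi(x) = (x1, x2, x1 * x2).
# Assumption: inputs coded as +/-1, label +1 exactly when the inputs differ.


def phi(x):
    x1, x2 = x
    return (x1, x2, x1 * x2)


def linear_classify(w, b, z):
    s = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1 if s > 0 else -1


points = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
labels = [-1, 1, 1, -1]  # XOR

w, b = (0.0, 0.0, -1.0), 0.0  # the plane -x1*x2 = 0 in the mapped space
predictions = [linear_classify(w, b, phi(x)) for x in points]
```

Every point is classified correctly, yet in the original plane the resulting boundary $x_1 x_2 = 0$ is the pair of coordinate axes, not a line.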

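Machine learning for first-order logic

Am I AN UNDESIRABLE?

$$\forall x .\, \text{Pierced}(x) \land \text{Male}(x) \longrightarrow \text{Undesirable}(x)$$
$$\text{Pierced}(sean) \qquad \text{Male}(sean)$$

Does $\text{Undesirable}(sean)$ follow? Writing $P$, $M$ and $U$ for the three predicates, convert to clauses and add the negated conjecture:

$$\{\neg P(x), \neg M(x), U(x)\} \quad \{P(sean)\} \quad \{M(sean)\} \quad \{\neg U(sean)\}$$

Resolving the first two clauses with $x = sean$ gives $\{\neg M(sean), U(sean)\}$; resolving this with $\{M(sean)\}$ gives $\{U(sean)\}$; and resolving that with $\{\neg U(sean)\}$ gives the empty clause $\{\}$, so the answer is yes.

There is a choice of which pair of clauses to resolve, and the set of clauses grows. Oh dear...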
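
The refutation above can be replayed mechanically. This sketch only handles ground clauses, i.e. it starts after the substitution $x = sean$ has been applied, so it needs no unification; real provers instead use unification and a given-clause loop in which heuristics decide what to resolve next. Literals are strings with `~` for negation, an encoding chosen here for illustration.

```python
# Naive saturation by ground resolution: resolve every pair, add the results,
# and stop on the empty clause or when nothing new appears.
from itertools import combinations


def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolvents(c1, c2):
    """All clauses obtained by resolving c1 and c2 on one complementary pair."""
    return [(c1 - {lit}) | (c2 - {negate(lit)}) for lit in c1 if negate(lit) in c2]


def refutes(clauses):
    """True if the empty clause is derivable, i.e. the clause set is unsatisfiable."""
    known = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolvents(c1, c2):
                if not r:
                    return True  # derived {}
                new.add(frozenset(r))
        if new <= known:
            return False  # saturated without finding {}
        known |= new


clauses = [
    {"~P(sean)", "~M(sean)", "U(sean)"},  # first clause, instantiated with x = sean
    {"P(sean)"},
    {"M(sean)"},
    {"~U(sean)"},                         # negated conjecture
]
```

`refutes(clauses)` returns `True`; without the negated conjecture, `refutes(clauses[:3])` saturates and returns `False`. Note that every pair is retried on each pass, which is exactly the growth problem the slide points to.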

Machine learning for first-order logic

The procedure has some similarities with the portfolio SAT solvers; however, this time we have a single theorem prover and learn to choose a heuristic:

1. Convert any set of axioms along with a conjecture into (up to) 53 features.
2. Train using a library of problems.
3. For each problem in the library, run the prover with each available heuristic.
4. This produces a training set for each heuristic. Labels are whether or not the relevant heuristic is the best (fastest).

We then train a classifier per heuristic. New problems are solved using the predicted best heuristic.

Reference: James P Bridge, Sean B Holden and Lawrence C Paulson, “Machine learning for first-order theorem proving: learning to select a good heuristic”, Journal of Automated Reasoning, 2014.
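
Step 4 can be sketched as follows. Assumptions, none taken from the paper: for each problem, a dict gives the time each heuristic took, or `None` on timeout; ties go to the heuristic listed first; and an extra label `"none"` is positive when every heuristic timed out, giving a training set for the option of declining a proof. Each resulting list of ±1 labels, paired with the problems' feature vectors, would train one classifier.

```python
# Per-heuristic training labels from prover runs (illustrative conventions).


def heuristic_labels(runtimes, heuristics):
    """runtimes: one dict per problem mapping heuristic -> seconds or None (timeout)."""
    labels = {h: [] for h in ["none"] + heuristics}
    for times in runtimes:
        solved = [(times[h], h) for h in heuristics if times[h] is not None]
        # Fastest heuristic; min with a key keeps the first one listed on ties.
        best = min(solved, key=lambda pair: pair[0])[1] if solved else "none"
        for h in labels:
            labels[h].append(1 if h == best else -1)
    return labels


heuristics = ["h1", "h2", "h3"]
runtimes = [
    {"h1": 2.0, "h2": 0.5, "h3": None},   # h2 is fastest
    {"h1": None, "h2": None, "h3": None},  # nothing succeeds
    {"h1": 1.0, "h2": 3.0, "h3": 1.0},    # tie between h1 and h3
]
labels = heuristic_labels(runtimes, heuristics)
```

Here `labels["h2"]` is `[1, -1, -1]` and `labels["none"]` is `[-1, 1, -1]`; exactly one label is positive per problem.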

Machine learning for first-order logic

To select a heuristic for a new problem:

[Figure: the conjecture and axioms are converted to clauses and then to a feature vector $\mathbf{x} = (x_1, \ldots, x_{53})$, where for example $x_1$ is the fraction of unit clauses, $x_2$ the fraction of Horn clauses and $x_{53}$ the ratio of paramodulations to the size of the processed set. Classifiers $h_0, h_1, \ldots, h_5$ (an SVM or a Gaussian process) predict $h_0$ “no heuristic” and $h_i$ “heuristic $i$ is best”, and the best heuristic is selected.]

We can also decline to attempt a proof.
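
Selection can then be sketched as picking the classifier with the largest real-valued output (for an SVM, $h(\mathbf{x})$) and declining when the “no heuristic” classifier wins. This is an assumed decision rule, and the classifiers below are hypothetical stand-ins for trained SVMs or Gaussian processes.

```python
# Choose a heuristic from per-classifier scores, or decline (return None).


def choose_heuristic(classifiers, x):
    scores = {name: h(x) for name, h in classifiers.items()}
    best = max(scores, key=scores.get)
    return None if best == "none" else best


# Hypothetical classifiers over x = (fraction of unit clauses, fraction of Horn clauses).
classifiers = {
    "none": lambda x: -1.0 + 2.0 * x[0],
    "h1": lambda x: 0.5 - x[1],
    "h5": lambda x: x[1] - 0.2,
}
```

With these stand-ins, `choose_heuristic(classifiers, [0.1, 0.9])` returns `"h5"`, while `[0.9, 0.1]` returns `None`, i.e. no proof is attempted.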

The support vector machine (SVM)

An SVM is essentially a linear classifier in a new space produced by $\Phi$, as we saw before.

[Figure: for a linear classifier there are many ways of dividing the classes; the SVM chooses the possibility that is as far as possible from both classes.]

BUT the decision line is chosen in a specific way: we maximize the margin.

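The support vector machine (SVM)

How do we train an SVM?

1. As previously, the basic function of interest is $h(\mathbf{x}) = \mathbf{w}^T \Phi(\mathbf{x}) + b$, and we classify new examples as $y = \operatorname{sgn}(h(\mathbf{x}))$.
2. The margin for the $i$th example $(\mathbf{x}_i, y_i)$ is $M(\mathbf{x}_i) = y_i h(\mathbf{x}_i)$.
3. We therefore want to solve
$$\operatorname*{argmax}_{\mathbf{w}, b} \left( \min_i y_i h(\mathbf{x}_i) \right),$$
with $\|\mathbf{w}\|$ held fixed (say $\|\mathbf{w}\| = 1$), since otherwise scaling $\mathbf{w}$ and $b$ up would make the margin arbitrarily large.

That doesn’t look straightforward...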
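
To see the effect of that normalization, this sketch computes the geometric margin $\min_i y_i h(\mathbf{x}_i) / \|\mathbf{w}\|$ for a linear $h$, taking $\Phi$ to be the identity for simplicity. Scaling $\mathbf{w}$ and $b$ leaves it unchanged, and of two separating lines the one further from both classes scores higher. The data are made up.

```python
# Geometric margin of a linear classifier h(x) = w . x + b (Phi = identity).
import math


def h(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def geometric_margin(w, b, xs, ys):
    """min_i y_i h(x_i) / ||w||: signed distance from the line to the closest example."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * h(w, b, x) for x, y in zip(xs, ys)) / norm


xs = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -1.0)]
ys = [1, 1, -1, -1]
```

Both $\mathbf{w} = (1, 1)$ and $\mathbf{w} = (1, 0)$ (with $b = 0$) separate these points, but the first has geometric margin $\sqrt{2}$ against $1$ for the second, and $\mathbf{w} = (10, 10)$ gives the same $\sqrt{2}$ as $(1, 1)$.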

The support vector machine (SVM)

Equivalently, however:

1. Formulate as a constrained optimization:
$$\operatorname*{argmin}_{\mathbf{w}, b} \|\mathbf{w}\|^2 \quad \text{such that} \quad y_i h(\mathbf{x}_i) \geq 1 \text{ for } i = 1, \ldots, m.$$
2. We have a quadratic optimization with linear constraints, so standard methods apply.
3. It turns out that the solution has the form
$$\mathbf{w}_{\text{opt}} = \sum_{i=1}^{m} y_i \alpha_i \Phi(\mathbf{x}_i),$$
where the $\alpha_i$ are Lagrange multipliers.
4. So we end up with
$$y = \operatorname{sgn}\left( \sum_{i=1}^{m} y_i \alpha_i \Phi^T(\mathbf{x}_i) \Phi(\mathbf{x}) + b \right).$$
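
The dual form in item 4 can be evaluated directly once the $\alpha_i$ are known. The sketch uses a linear kernel, so $\Phi$ is the identity, and a two-point toy problem, $\mathbf{x}_1 = (1, 0)$ with $y_1 = +1$ and $\mathbf{x}_2 = (-1, 0)$ with $y_2 = -1$, whose solution $\mathbf{w} = (1, 0)$, $b = 0$, $\alpha_1 = \alpha_2 = 1/2$ was worked out by hand rather than by a quadratic programming solver. The function names are illustrative.

```python
# SVM decision function in dual form: y = sgn( sum_i y_i alpha_i K(x_i, x) + b ).
# Linear kernel, so Phi is the identity; alphas were derived by hand for this toy set.


def linear_kernel(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def svm_predict(xs, ys, alphas, b, x, kernel=linear_kernel):
    s = sum(y * a * kernel(xi, x) for xi, y, a in zip(xs, ys, alphas)) + b
    return 1 if s >= 0 else -1  # sgn, treating 0 as +1


def primal_weights(xs, ys, alphas):
    """w_opt = sum_i y_i alpha_i x_i (valid because Phi is the identity here)."""
    d = len(xs[0])
    return [sum(y * a * xi[j] for xi, y, a in zip(xs, ys, alphas)) for j in range(d)]


xs = [(1.0, 0.0), (-1.0, 0.0)]
ys = [1, -1]
alphas = [0.5, 0.5]  # satisfy sum_i y_i alpha_i = 0
b = 0.0
```

`primal_weights(xs, ys, alphas)` gives `[1.0, 0.0]`, both training points sit exactly on the margin ($y_i h(\mathbf{x}_i) = 1$), and a new point such as $(2, 3)$ is classified as $+1$.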

The support vector machine (SVM)

It turns out that the inner product $\Phi^T(\mathbf{x}_1) \Phi(\mathbf{x}_2)$ is fundamental to SVMs:

1. A kernel $K$ is a function that directly computes the inner product $K(\mathbf{x}_1, \mathbf{x}_2) = \Phi^T(\mathbf{x}_1) \Phi(\mathbf{x}_2)$.
2. A kernel may do this without explicitly computing the sum implied.
3. Mercer’s theorem characterises the $K$ for which there exists a corresponding function $\Phi$.
4. We generally deal with $K$ directly. For example, the radial basis function kernel:
$$K(\mathbf{x}_1, \mathbf{x}_2) = \exp\left( -\frac{1}{2\sigma^2} \|\mathbf{x}_1 - \mathbf{x}_2\|^2 \right).$$

Various other refinements let us handle, for example, problems that are not linearly separable.
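
Item 2 is easiest to see with the quadratic kernel $K(\mathbf{u}, \mathbf{v}) = (\mathbf{u}^T \mathbf{v})^2$ in two dimensions (an added example, not one from the slides), for which one explicit map is $\Phi(\mathbf{x}) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$: the kernel gives the same inner product without building $\Phi$. The sketch also implements the radial basis function kernel, which corresponds to an infinite-dimensional $\Phi$ and so can only be used through $K$.

```python
# Kernels compute Phi(u) . Phi(v) without constructing Phi.
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def quadratic_kernel(u, v):
    return dot(u, v) ** 2


def quadratic_phi(x):
    """Explicit feature map matching quadratic_kernel in two dimensions."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def rbf_kernel(u, v, sigma=1.0):
    """K(u, v) = exp(-||u - v||^2 / (2 sigma^2))."""
    sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-sq_dist / (2 * sigma ** 2))
```

For $\mathbf{u} = (2, 1)$ and $\mathbf{v} = (1, 3)$, `quadratic_kernel` returns $25$ directly, matching $\Phi(\mathbf{u})^T \Phi(\mathbf{v}) = 4 + 12 + 9$ up to rounding.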
