SLIDE 1 Reinforcement Learning for Interactive Theorem Proving in HOL4
Minchao Wu¹  Michael Norrish¹,²  Christian Walder¹,²  Amir Dezfouli²
¹Research School of Computer Science, Australian National University
²Data61, CSIRO
September 14, 2020
SLIDES 2–5
Overview
◮ Interface: HOL4 as an RL environment
  ◮ Enables interaction with HOL4.
  ◮ Monitors proof states on the Python side.
◮ Reinforcement learning settings
  ◮ Policies for choosing proof states, tactics, and theorems or terms as arguments.
  ◮ Learning: policy gradient
SLIDE 6
Environment
◮ An environment can be created by specifying an initial goal: e = HolEnv(GOAL)
◮ An environment can be reset by providing a new goal: e.reset(GOAL2)
◮ The basic function is querying HOL4 about tactic applications: e.query("∀l. NULL l ⇒ l = []", "strip_tac")
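A minimal usage sketch of this interface follows; the module name hol_env and the contents of the second goal are assumptions, while the three calls themselves are the ones shown above.

    # Sketch only: the module name `hol_env` is hypothetical; the calls mirror the slide.
    from hol_env import HolEnv

    GOAL = "∀l. NULL l ⇒ l = []"
    GOAL2 = "∀l. l ≠ [] ⇒ ¬NULL l"   # any other HOL4 proposition

    e = HolEnv(GOAL)                  # create an environment for an initial goal
    e.reset(GOAL2)                    # restart the proof search from a new goal

    # Ask HOL4 what a tactic does to a goal; the result describes the outcome.
    result = e.query("∀l. NULL l ⇒ l = []", "strip_tac")
    print(result)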
SLIDE 7
Environment
The e.step(action) function applies the action to the current state and generates the new state. It returns the immediate reward received and a Boolean value indicating whether the proof attempt has finished.
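Assuming step returns the (reward, done) pair just described, a proof attempt can be driven by a simple loop; policy and MAX_STEPS below are placeholders for components introduced later.

    # Hypothetical search loop; `policy` stands for the learned policy described later.
    MAX_STEPS = 50
    e.reset(GOAL)
    total_reward = 0.0
    for _ in range(MAX_STEPS):
        action = policy(e)              # choose (fringe, goal, tactic) for the current state
        reward, done = e.step(action)   # apply the tactic inside HOL4
        total_reward += reward
        if done:                        # the proof attempt has finished
            break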
SLIDE 8
Demo
◮ A quick demo.
SLIDES 9–15
RL Formalization
◮ A goal g ∈ G is a HOL4 proposition.
◮ A fringe is a finite set of goals.
  ◮ A fringe consists of all the remaining goals.
  ◮ The main goal is proved once every goal in any one fringe is discharged.
◮ A state s is a finite sequence of fringes.
  ◮ A fringe can be referred to by its index i, i.e., s(i).
◮ A reward is a real number r ∈ ℝ.
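A literal way to mirror these definitions in Python (illustrative only, not the actual interface):

    # Illustrative data structures mirroring the formalization.
    from typing import List

    Goal = str            # a HOL4 proposition, kept in printed form

    Fringe = List[Goal]   # a fringe: the goals that remain to be discharged

    State = List[Fringe]  # a state: a finite sequence of fringes; s[i] is the i-th fringe

    def proved(state: State) -> bool:
        # The main goal is proved if some fringe has no remaining goals.
        return any(len(fringe) == 0 for fringe in state)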
SLIDE 16
Examples
Fringe 0
  0: p ∧ q ⇒ p ∧ q
Fringe 1
  0: p ⇒ q ⇒ p
  1: p ⇒ q ⇒ q
Figure: Example fringes and states
SLIDES 17–22
RL Formalization
◮ An action is a triple (i, j, t) : ℕ × ℕ × tactic.
  ◮ i selects the ith fringe in a state s.
  ◮ j selects the jth goal within fringe s(i).
  ◮ t is a HOL4 tactic.
◮ Example: (0, 0, fs[listTheory.MEM])
◮ Rewards
  ◮ Successful application: 0.1
  ◮ Discharges the current goal completely: 0.2
  ◮ Main goal proved: 5
  ◮ Otherwise: -0.1
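The reward scheme can be written down directly; the outcome labels below are illustrative names, not part of the actual interface.

    # Reward scheme from the slide; outcome labels are illustrative.
    def reward(outcome: str) -> float:
        return {
            "applied":     0.1,   # tactic applied successfully
            "goal_closed": 0.2,   # the selected goal is discharged completely
            "qed":         5.0,   # the main goal is proved
        }.get(outcome, -0.1)      # anything else, e.g. a failing or timed-out tactic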
SLIDES 23–27
Example
Fringe 0 = {0: p ∧ q ⇒ p ∧ q}
(0,0,strip_tac):     Fringe 0 → Fringe 1 = {0: p ⇒ q ⇒ p, 1: p ⇒ q ⇒ q}
(1,0,simp[]):        Fringe 1 → Fringe 2 = {0: p ⇒ q ⇒ q}
(1,0,Induct_on `p`): Fringe 1 → Fringe 3 = {0: p ⇒ q ⇒ q, 1: F ⇒ q ⇒ F, 2: T ⇒ q ⇒ T}
(2,0,simp[]):        Fringe 2 → Fringe 4 = QED
Figure: Example proof search
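Reading the figure as a sequence of actions, the successful branch can be replayed through the environment sketched earlier; the step interface and the exact branch order are assumptions based on that reading.

    # Replaying the successful branch of the proof search above.
    e.reset("p ∧ q ⇒ p ∧ q")
    for action in [(0, 0, "strip_tac"),   # Fringe 0 -> Fringe 1
                   (1, 0, "simp[]"),      # Fringe 1 -> Fringe 2
                   (2, 0, "simp[]")]:     # Fringe 2 -> Fringe 4 (QED)
        reward, done = e.step(action)
    assert done                           # the main goal is proved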
SLIDES 28–31
Choosing fringes
An action is a triple (i, j, t). Given a state s:
◮ A value network Vgoal : G → ℝ.
◮ The value vi of fringe s(i) is defined by vi = Σ_{g ∈ s(i)} Vgoal(g).
◮ Sample the fringe index i from the distribution πfringe(s) = Softmax(v1, ..., v|s|).
◮ By default, j is fixed to 0; that is, we always work on the first goal in the chosen fringe.
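A sketch of the fringe policy under these definitions; Vgoal and encode are placeholders for the learned goal-value network and the goal encoder, and PyTorch is used purely for illustration.

    import torch

    # Score each fringe by the sum of its goals' values, then sample an index
    # from a softmax over the scores: pi_fringe(s) = Softmax(v_1, ..., v_|s|).
    def sample_fringe(state, V_goal, encode):
        fringe_values = torch.stack([
            sum(V_goal(encode(g)) for g in fringe)   # v_i = sum of V_goal over s(i)
            for fringe in state
        ])
        dist = torch.distributions.Categorical(logits=fringe_values)
        i = dist.sample()
        return i.item(), dist.log_prob(i)            # keep the log-prob for the update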
SLIDES 32–36
Generating tactics
Suppose we are dealing with goal g.
◮ A tactic is either
  ◮ a tactic name followed by a list of theorem names, or
  ◮ a tactic name followed by a list of terms.
◮ A value network Vtactic : G → ℝ^D, where D is the total number of tactic names allowed.
◮ Sample a tactic name from the distribution πtactic(g) = Softmax(Vtactic(g)).
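The tactic policy is a categorical distribution over the D allowed tactic names; a minimal sketch along the same lines, where Vtactic, encode, and tactic_names are placeholders:

    import torch

    # pi_tactic(g) = Softmax(V_tactic(g)): one score per allowed tactic name.
    def sample_tactic(goal, V_tactic, encode, tactic_names):
        scores = V_tactic(encode(goal))                    # shape (D,)
        dist = torch.distributions.Categorical(logits=scores)
        k = dist.sample()
        return tactic_names[k.item()], dist.log_prob(k)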
SLIDE 37 Argument policy
Figure: Generation of arguments. xi denotes the candidate theorems, hi a hidden variable, ai a chosen argument, and vi the values computed by the policy. Each theorem is represented by an N-dimensional tensor based on its tokenized expression in Polish notation. If we have M candidate theorems, then the shape of xi is M × N. The representations are computed by a separately trained transformer.
SLIDE 38
Generating arguments
Generation of arguments, given a chosen goal g:
◮ Each theorem is represented by an N-dimensional tensor based on its tokenized expression.
◮ Suppose we have M candidate theorems.
◮ Input: the chosen tactic or theorem t ∈ ℝ^N, the candidate theorems X ∈ ℝ^(M×N), and a hidden variable h ∈ ℝ^N.
◮ Policy: Varg : ℝ^N × ℝ^(M×N) × ℝ^N → ℝ^N × ℝ^M
◮ Initialize the hidden variable h to t and set l ← [t].
◮ Loop for the allowed length of arguments (e.g., 5):
    h, v ← Varg(t, X, h)
    t ← sample from πarg(g) = Softmax(v)
    l ← l.append(t)
◮ Return l and the associated (log) probabilities.
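The loop above, written out with V_arg as an assumed callable that returns the next hidden state and one score per candidate theorem, matching Varg : ℝ^N × ℝ^(M×N) × ℝ^N → ℝ^N × ℝ^M; all names are placeholders.

    import torch

    # Sketch of the argument-generation loop; V_arg is the learned argument policy.
    def sample_arguments(t, X, V_arg, max_args=5):
        h = t.clone()                  # initialise the hidden variable h to t
        chosen, log_probs = [t], []
        for _ in range(max_args):      # allowed length of the argument list
            h, v = V_arg(t, X, h)      # v has shape (M,): one score per candidate theorem
            dist = torch.distributions.Categorical(logits=v)   # pi_arg = Softmax(v)
            j = dist.sample()
            log_probs.append(dist.log_prob(j))
            t = X[j]                   # the sampled theorem becomes the next input
            chosen.append(t)
        return chosen, log_probs       # the argument list and its (log) probabilities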
SLIDES 39–42
Generating actions
Given state s, we now have the following (log) probabilities:
◮ p(f|s) given by πfringe.
◮ p(t|s, f) given by πtactic.
◮ p0(c0|s, f, t), ..., pl−1(cl−1|s, f, t, cl−2) given by πarg, where l is the length of the argument list and cl = (c0, ..., cl−1).
◮ Let a be the chosen action. Then
  πθ(a|s) = p(f|s) · p(t|s, f) · p0(c0|s, f, t) · Π_{i=1..l−1} pi(ci|s, f, t, ci−1),
  where θ denotes the parameters of {Vgoal, Vtactic, Varg}.
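Because πθ(a|s) factorises as above, its log-probability is simply the sum of the component log-probabilities collected while sampling, which is what the policy-gradient update on the next slide consumes.

    # log pi_theta(a|s) = log p(f|s) + log p(t|s,f) + sum_i log p_i(c_i|...)
    def action_log_prob(logp_fringe, logp_tactic, logp_args):
        return logp_fringe + logp_tactic + sum(logp_args)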
SLIDE 43
Baseline
REINFORCE (Williams, 1988, 1992): we jointly train the policies using the update
  θ ← θ + α γ^t G_t ∇θ ln πθ(At|St)
given a trajectory S1, A1, R1, S2, A2, ..., ST.
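A sketch of this update for one trajectory, assuming each step stored the action's log-probability and the reward received; optimizer and gamma are assumptions, and the gradient ascent step is realised by minimising the negated objective.

    # REINFORCE sketch: theta <- theta + alpha * gamma^t * G_t * grad_theta log pi_theta(A_t|S_t)
    def reinforce_update(log_probs, rewards, optimizer, gamma=0.99):
        loss = 0.0
        for t, log_prob in enumerate(log_probs):
            # G_t: discounted return from step t to the end of the trajectory
            G_t = sum(gamma ** (k - t) * rewards[k] for k in range(t, len(rewards)))
            loss = loss - (gamma ** t) * G_t * log_prob
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()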
SLIDES 44–47
Experiment with list
◮ 444 basic theorems from HOL4's list theory.
◮ A small set of tactics:
  ◮ simp, fs, metis_tac, rw
  ◮ irule, drule
  ◮ Induct_on
  ◮ strip_tac, EQ_TAC
◮ Only theorems that come before the target g in the library are allowed to be used to prove g.
◮ A limited number of theorems are provable using this set of tactics (∼190/443).
SLIDE 48 Preliminary results
                  success/iter   success rate w.r.t. total provable   success rate
Random rollouts        42                     21.2%                      38.3%
Trained agent         149                     75.3%                      87.5%
Figure: An agent trained for 1000 iterations performs significantly better than random guessing. In each iteration, only one attempt at each theorem is allowed. There are 444 theorems in total and 198 of them are provable using the specified set of tactics. The validation set consists of equivalent forms of 20 easy theorems in the training set.
SLIDE 49
Preliminary results
Figure: A typical training curve. In this experiment, the training set contains 87 theorems that are all provable. The performance of the agent keeps improving as training continues.