Large Scale Deep Learning for Theorem Proving in HOList: First Results and Future Directions
Sarah Loos

HOList: An Environment for Machine Learning of Higher-Order Theorem Proving
● HOList provides a simple API for ML researchers and theorem prover developers to experiment with using machine learning for mathematics.
● We use deep networks trained on an existing corpus of human proofs to guide the prover.
● We can improve our results by adding synthetic proofs (generated from supervised models and verified correct by the prover) to the training corpus.

Dataset Stats
                    Training 60%    Validation 20%   Testing 20%
Core                1.5K Theorems   500 Theorems     500 Theorems
Complex             10K Theorems    3.2K Theorems    3.2K Theorems
Human Proof Steps   375K            100K             100K
Flyspeck: no training set (None); 10.5K Theorems

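Model Architecture
[Figure: the goal (g) is passed through the Goal Encoder (G) to produce the Goal Embedding (G(g)); each premise (t_i) is passed through the Premise Encoder (P) to produce the Theorem Embedding (P(t_i)). The Tactic Classifier (S) reads the goal embedding; the Combiner Network (C) joins the goal and theorem embeddings and feeds the Theorem Scorer (R).]
Figure courtesy of Viktor Toman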
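The data flow in the figure can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the encoders below are stand-ins that average pseudo-random token vectors (the talk uses WaveNet-style networks), the weights are random rather than trained, and the combiner's choice of concatenation plus elementwise product is assumed here, not taken from the slides. All function names are hypothetical.

```python
import math
import random

DIM = 8  # embedding width, chosen arbitrarily for this sketch
TACTICS = ["REWRITE_TAC", "MP_TAC", "SIMP_TAC", "ASM_MESON_TAC"]


def toy_encoder(seed):
    """Return a stand-in encoder that averages fixed pseudo-random token vectors."""
    def encode(text):
        tokens = text.split()
        vec = [0.0] * DIM
        for tok in tokens:
            rng = random.Random(f"{seed}:{tok}")  # same token -> same vector
            for i in range(DIM):
                vec[i] += rng.uniform(-1.0, 1.0) / len(tokens)
        return vec
    return encode


goal_encoder = toy_encoder("G")     # G in the figure
premise_encoder = toy_encoder("P")  # P in the figure


def make_weights(rows, cols, seed):
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


S_WEIGHTS = make_weights(len(TACTICS), DIM, 1)  # untrained tactic classifier S
R_WEIGHTS = make_weights(1, 3 * DIM, 2)[0]      # untrained theorem scorer R


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def tactic_classifier(goal_emb):
    """S: distribution over tactics, computed from the goal embedding alone."""
    logits = [sum(w * x for w, x in zip(row, goal_emb)) for row in S_WEIGHTS]
    return dict(zip(TACTICS, softmax(logits)))


def combiner(goal_emb, premise_emb):
    """C: concatenation plus elementwise product (an assumed design)."""
    return goal_emb + premise_emb + [a * b for a, b in zip(goal_emb, premise_emb)]


def theorem_scorer(combined):
    """R: logistic score in (0, 1) for one goal/premise pair."""
    z = sum(w * x for w, x in zip(R_WEIGHTS, combined))
    return 1.0 / (1.0 + math.exp(-z))


def rank_premises(goal, premises):
    """Score every premise against the goal and sort best-first."""
    g = goal_encoder(goal)
    scored = [(theorem_scorer(combiner(g, premise_encoder(t))), t) for t in premises]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    goal = "!x y. exp (x - y) = exp x / exp y"
    print(tactic_classifier(goal_encoder(goal)))
    print(rank_premises(goal, ["!x. exp(--x) = inv(exp(x))", "!x y. x + y = y + x"]))
```

The point of the two-tower layout is that premise embeddings P(t_i) do not depend on the goal, so they can be computed once and reused across goals; only the small combiner and scorer run per goal/premise pair.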

Results - Imitation Learning on Human Proofs
Model                                        Percent of Validation Theorems Closed
Baselines
  ASM_MESON_TAC                              6.1%
  ASM_MESON_TAC + WaveNet premise selection  9.2%
Imitation Learning
  WaveNet                                    24.0%
  With Hard Negative Mining                  37.2%
Imitation Learning + Reinforcement Loop
  WaveNet                                    36.3%
  - trained alongside output                 36.8%
  Tactic Dependent                           38.9%

Reinforcement Loop: Setup
● In the reinforcement loop we train on a single GPU.
● We simultaneously run search on multiple machines, each using the most recent checkpoint for proof search predictions.
● We run the neural prover in rounds, in each round trying to prove a random sample of theorems in the training set.
● Training examples are extracted from successful synthesized proofs and are mixed in with training examples from the original human proofs.
● Hard negatives: We omit arguments that do not change the outcome of the tactic application and store them as “hard negatives” for a specific goal to use during training.
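The last bullet, pruning tactic arguments and keeping the dropped ones as hard negatives, can be sketched as follows. This is a minimal illustration, not the HOList implementation: `apply_tactic` is a hypothetical callback standing in for a call to the prover, and `toy_apply` is a made-up rule used only so the example runs.

```python
def prune_arguments(apply_tactic, goal, tactic, premises):
    """Drop premises one at a time; keep a drop if the tactic outcome is unchanged.

    Returns (kept_premises, hard_negatives). `apply_tactic` is an assumed
    callback (goal, tactic, premises) -> comparable outcome.
    """
    baseline = apply_tactic(goal, tactic, premises)
    kept = list(premises)
    hard_negatives = []
    for premise in premises:
        trial = [p for p in kept if p != premise]
        if apply_tactic(goal, tactic, trial) == baseline:
            # The premise was selected by the model but did not matter here.
            kept = trial
            hard_negatives.append(premise)
    return kept, hard_negatives


def toy_apply(goal, tactic, premises):
    """Toy stand-in for the prover: the goal closes only if both needed premises are present."""
    needed = {"REAL_EXP_NEG", "real_div"}
    return "closed" if needed.issubset(premises) else "open"


if __name__ == "__main__":
    kept, negatives = prune_arguments(
        toy_apply,
        "|- !x y. exp (x - y) = exp x / exp y",
        "ASM_MESON_TAC",
        ["REAL_EXP_NEG", "FORALL_PAIR_THM", "real_div"],
    )
    print("kept:", kept)
    print("hard negatives:", negatives)
```

Each pruning attempt costs one extra tactic application, so a real system would likely cap the number of arguments it tries to remove.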

Results - Reinforcement Loop
Training                     Validation Percent Closed
Thin WaveNet Loop            36.30%
- Trained on loop output     36.80%
Tactic Dependent Loop        38.90%

Dataset Stats
                    Training 60%    Validation 20%   Testing 20%
Core                1.5K Theorems   500 Theorems     500 Theorems
Complex             10K Theorems    3.2K Theorems    3.2K Theorems
Human Proof Steps   375K            100K             100K
830K Synthesized Proof Steps
Flyspeck: no training set (None); 10.5K Theorems

Results - Reinforcement Loop
Model                                        Percent of Validation Theorems Closed
Baselines
  ASM_MESON_TAC                              6.1%
  ASM_MESON_TAC + WaveNet premise selection  9.2%
Imitation Learning
  WaveNet                                    24.0%
Imitation Learning + Reinforcement Loop
  WaveNet                                    36.3%
  - trained alongside output                 36.8%
  Tactic Dependent                           38.9%
Flyspeck: 37.6% on a sample of 2000 proofs from the Flyspeck dataset

Tactics Distribution - Human Proofs
Most commonly used human tactics:
- REWRITE_TAC
- RAW_POP_TAC
- LABEL_TAC
- MP_TAC
- X_GEN_TAC

Tactics Distribution - Reinforcement Loop
Tactics used in Reinforcement Loop:
- ASM_MESON_TAC
- REWRITE_TAC
- ONCE_REWRITE_TAC
- MP_TAC
- SIMP_TAC

Tactics Comparison
Most increased (overused compared to human proofs):
- ASM_MESON_TAC
- ONCE_REWRITE_TAC
Most decreased (underused compared to human proofs):
- LABEL_TAC
- RAW_POP_TAC
- MP_TAC
- X_GEN_TAC

Soundness is Critical
ITPs are motivated by concerns about the correctness of natural mathematics.
● HOL Light relies on only ~400 trusted lines of code.
You should not need to trust more than that:
● Environment optimizations: startup cheats-ins and proof search code are now in the critical core (!) -- we must have a proof checker.
● Reinforcement learning reinforces soundness problems.

Proof Checker
We provide a proof checker that compiles proof logs into OCaml code.
● Human-readable format
● Can be checked with HOL Light’s core
To be sure that the proofs work, the proof checker replaces HOL Light’s built-in proofs with the imported synthetic proofs.
● Same soundness guarantees as HOL Light.

Proof Checker - Example
Goal: |- !x y. exp (x - y) = exp x / exp y
Parse_tactic.parse "ONCE_REWRITE_TAC [ THM 1821089176251131959 ]" THEN
Parse_tactic.parse "ONCE_REWRITE_TAC [ THM 4045159953109002127 ]" THEN
Parse_tactic.parse "REWRITE_TAC [ ]" THEN
Parse_tactic.parse "SIMP_TAC [ THM 3715151876396972661 ; THM 1821089176251131959 ; THM 2738361808698163958 ]" THEN
Parse_tactic.parse "ASM_MESON_TAC [ THM 4334187771985600363 ; THM 1672658611913439754 ; THM 4290630536384220426 ; THM 3714350038189073359 ]"

Proof Checker - Example
Goal: |- !x y. exp (x - y) = exp x / exp y
ONCE_REWRITE_TAC [ FORALL_UNPAIR_THM ] THEN
ONCE_REWRITE_TAC [ FORALL_PAIR_THM ] THEN
REWRITE_TAC [ ] THEN
SIMP_TAC [ MESON[] `(!t. p t) <=> ~(?t. ~p t)` ; FORALL_UNPAIR_THM ; real_div ] THEN
ASM_MESON_TAC [
  REAL_EXP_NEG (* |- !x. exp(--x) = inv(exp(x)) *) ;
  REAL_POLY_CLAUSES (* includes induction on exp *) ;
  REAL_EXP_ADD_MUL (* |- !x y. exp(x + y) * exp(--x) = exp(y) *) ;
  REAL_EQ_SUB_LADD (* |- !x y z. (x = y - z) <=> (x + z = y) *) ]
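The first form of the example above (Parse_tactic.parse calls joined by THEN) suggests how a proof log might be turned into checkable OCaml text. The sketch below is a guess at that step, not the released checker: it assumes a proof log is a flat list of (tactic, theorem fingerprints) pairs, whereas real proof logs may branch into subgoals, and `render_step` and `compile_proof` are hypothetical names.

```python
def render_step(tactic, fingerprints):
    """Render one tactic application as a Parse_tactic.parse call."""
    args = " ; ".join(f"THM {fp}" for fp in fingerprints)
    body = f"{tactic} [ {args} ]" if fingerprints else f"{tactic} [ ]"
    return f'Parse_tactic.parse "{body}"'


def compile_proof(steps):
    """Join the rendered steps with THEN, one step per line.

    Assumes a linear proof with no branching into multiple subgoals.
    """
    return " THEN\n".join(render_step(tactic, fps) for tactic, fps in steps)


if __name__ == "__main__":
    proof_log = [
        ("ONCE_REWRITE_TAC", [1821089176251131959]),
        ("ONCE_REWRITE_TAC", [4045159953109002127]),
        ("REWRITE_TAC", []),
    ]
    print(compile_proof(proof_log))
```

Referring to theorems by fingerprint rather than by name keeps the generated script independent of how theorems are bound in the OCaml session; the second form of the example shows the same proof with the names resolved for readability.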

Hard Negative Mining
● During training, we can simultaneously mine hard negatives by ranking all theorems and adding extra training on negative examples ranked just above positives.
● This is an early result, but it seems to help a lot for imitation learning.
● Next step: Try it in the reinforcement loop.
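The ranking step in the first bullet can be illustrated with a small sketch. It assumes the model's scores are available as a dictionary from theorem name to relevance score and that "just above" means the nearest few negatives ranked higher than each positive; the cutoff `max_per_positive` and the function name are choices made for this example, not details from the talk.

```python
def mine_hard_negatives(scores, positives, max_per_positive=2):
    """Return negatives that the model ranks just above each positive.

    scores: dict mapping theorem name -> model score (higher is better).
    positives: names of the theorems actually used in the proof step.
    """
    positive_set = set(positives)
    ranked = sorted(scores, key=scores.get, reverse=True)
    mined = []
    for idx, name in enumerate(ranked):
        if name not in positive_set:
            continue
        # Walk upward from the positive, nearest-ranked negatives first.
        above = [t for t in reversed(ranked[:idx]) if t not in positive_set]
        for negative in above[:max_per_positive]:
            if negative not in mined:
                mined.append(negative)
    return mined


if __name__ == "__main__":
    scores = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.6, "E": 0.1}
    # "D" is the only premise the proof really used; A, B, C outrank it.
    print(mine_hard_negatives(scores, ["D"]))
```

Negatives ranked far below every positive are already handled well by the model, so training on them adds little; the ones ranked above a positive are exactly the mistakes the scorer is currently making.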

Results - Hard Negative Mining
Model                                        Percent of Validation Theorems Closed
Baselines
  ASM_MESON_TAC                              6.1%
  ASM_MESON_TAC + WaveNet premise selection  9.2%
Imitation Learning
  WaveNet                                    24.0%
  With Hard Negative Mining                  37.2%
Imitation Learning + Reinforcement Loop
  WaveNet                                    36.3%
  - trained alongside output                 36.8%
  Tactic Dependent                           38.9%

Challenges: Learning for Theorem Proving
● Infinite, very heterogeneous action space
● Extremely sparse reward
● Unbounded, growing knowledge base
● Self-play is infeasible, or at least not obviously applicable, in the way it is known from chess or Go
● Slow evaluation

Discussion
● RL Loop - Zero-shot learning.
● Suggestions from other work (e.g. imitation learning, from AlphaStar).
● Opportunities for the community.
● http://deephol.org (Code is on GitHub. Training data, checkpoints, docker images also being made available.)
● Arxiv preprint: https://arxiv.org/abs/1904.03241, "HOList: An Environment for Machine Learning of Higher-Order Theorem Proving"
