SLIDE 1

Reinforcement Learning for leanCoP

Cezary Kaliszyk Josef Urban Henryk Michalewski Mirek Olšák

AITP 2018

March 28, 2018

SLIDE 2

Automated Theorem Proving

Historical dispute: Gentzen and Hilbert

Today two communities: Resolution(-style) and Tableaux

Possible answer: What is better in practice?

Say, the CASC competition or ITP libraries?
Since the late 90s: resolution (superposition)

But still so far from humans?

We can do learning much better for tableaux
And with ML beating brute-force search in games, maybe?


SLIDE 3

leanCoP: Lean Connection Prover

[Otten 2010]

Connected tableaux calculus

Goal oriented, good for large theories

Regularly beats Metis and Prover9 in CASC (the CADE ATP System Competition)

despite their much larger implementation

Compact Prolog implementation, easy to modify

Variants for other foundations: iLeanCoP, mLeanCoP
First experiments with machine learning: MaLeCoP

Easy to imitate

leanCoP tactic in HOL Light


SLIDE 4

Lean connection Tableaux

Very simple rules:

Extension unifies the current literal with a copy of a clause
Reduction unifies the current literal with a literal on the path

$$\frac{}{\{\},\, M,\, Path}\ \text{Axiom}$$

$$\frac{C,\, M,\, Path \cup \{L_2\}}{C \cup \{L_1\},\, M,\, Path \cup \{L_2\}}\ \text{Reduction}$$

$$\frac{C_2 \setminus \{L_2\},\, M,\, Path \cup \{L_1\} \qquad C,\, M,\, Path}{C \cup \{L_1\},\, M,\, Path}\ \text{Extension}$$

In both rules $L_1$ and $L_2$ unify to complementary literals; in Extension, $C_2$ is a fresh copy of a clause of the matrix $M$.
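For intuition, a minimal Python sketch of how the two rules generate candidate steps; it assumes ground literals as (sign, symbol, args) triples so that unification reduces to an equality test, and all names are hypothetical, not the actual Prolog code.

```python
# Minimal sketch of the two connection rules; assumes ground literals,
# so "unification" is just an equality test. Names are hypothetical.

def negated(lit):
    sign, symbol, args = lit
    return (not sign, symbol, args)

def reduction_steps(literal, path):
    """Reduction: close the current literal against a complementary
    literal already on the active path."""
    return [p for p in path if p == negated(literal)]

def extension_steps(literal, matrix):
    """Extension: pick a clause (copy) containing a literal complementary
    to the current one; its remaining literals become new open goals."""
    steps = []
    for clause in matrix:
        for lit in clause:
            if lit == negated(literal):
                steps.append([l for l in clause if l != lit])
    return steps
```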


SLIDE 5

First experiment: MaLeCoP in Prolog

[Tableaux 2011]

Select extension steps

Using external advice

Slow implementation

1000× fewer inferences per second

Can avoid 90% of inferences!

[Diagram: leanCoP connected through a cache to an external advisor (the SNoW learning system; other provers can also act as advisors).]


SLIDE 6

What about efficiency: FEMaLeCoP

[LPAR 2015]

Very simple but very fast classifier

Naive Bayes (with optimizations)

Approximate features and multi-level indexing

Offline indexing
Indexing for the current problem
Discrimination tree stores NB data

Consistent clausification and skolemization
Performance is about 40% of non-learning leanCoP speed

A few more theorems proved (3–15%)
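For illustration, a minimal sketch of log-space naive Bayes relevance scoring of the kind such a classifier could use, computed from co-occurrence counts between a clause and the features of the current proof state; the data layout and names are assumptions, not FEMaLeCoP's actual code.

```python
# Minimal sketch of naive Bayes clause scoring from co-occurrence
# counts; ClauseStats and score_clause are hypothetical.
import math

class ClauseStats:
    def __init__(self):
        self.uses = 0              # how often the clause was useful
        self.feature_counts = {}   # feature -> co-occurrence count

def score_clause(stats, state_features, weight=1.0, penalty=-15.0):
    """Higher score = clause judged more likely useful in this state."""
    score = math.log(max(stats.uses, 1))
    for f in state_features:
        count = stats.feature_counts.get(f, 0)
        if count > 0:
            score += weight * math.log(count / stats.uses)
        else:
            score += penalty   # feature never co-occurred: fixed penalty
    return score
```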


SLIDE 7

What about stronger learning?

Yes, but...

If used directly, the times needed are huge
The improvement is still small
Naive Bayes vs XGBoost on a 2h timeout

Preliminary experiments with deep learning

So far quite slow


SLIDE 8

Is theorem proving just a maze search?



SLIDE 10

Is theorem proving just a maze search?

Yes and NO!

The proof search tree is not the same as the tableau tree!
Unification can cause other branches to disappear.

Provide an external interface to proof search

Usable in OCaml, C++, and Python
Two functions suffice:
start : problem → state
action : action → state
where state = ⟨action list × goal × path × remaining⟩
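A minimal sketch of how a guidance loop could drive the prover through this two-function interface; `State`, `start`, and `apply_action` are hypothetical Python bindings mirroring the signatures above, not the published API.

```python
# Minimal sketch of driving the prover through the two-function
# interface; State, start, and apply_action are hypothetical bindings.
from dataclasses import dataclass

@dataclass
class State:
    actions: list     # applicable extension/reduction steps
    goal: object      # current literal
    path: list        # active path
    remaining: list   # remaining open goals

def prove(problem, policy, start, apply_action, max_steps=1000):
    """Run the prover, letting `policy` pick one action per state."""
    state = start(problem)
    for _ in range(max_steps):
        if not state.actions:
            # No applicable action: a proof if nothing remains open,
            # otherwise a dead end (no backtracking in this sketch).
            return not state.remaining
        state = apply_action(policy(state))
    return False      # step budget exhausted
```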


SLIDE 11

Is it ok to change the tree?

Most learning for games sticks to game dynamics

Only tell it how to do the moves

Why is it OK to skip other branches?

Theoretically, ATP calculi are complete
Practically, most ATP strategies are incomplete

In usual 30s – 300s runs

Depth of proofs with backtracking: 5–7 (complete)
Depth with restricted backtracking: 7–10 (more proofs found!)

But with random playouts: depth hundreds of thousands!

They are just unlikely to find a proof → hence learning


SLIDE 12

Monte Carlo First Try: MonteCoP

Use Monte Carlo playouts to guide restricted backtracking

Improves on leanCoP, but not by a big margin
Potential still limited by depth

Can we do better?

Arbitrarily long playouts
Learn from the playouts (see the sketch below)
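A minimal sketch of such a playout through the interface of the previous slide, returning a reward plus the visited states so the trace can be learned from; all names are hypothetical.

```python
# Minimal sketch of a random playout through the same hypothetical
# prover interface; returns a reward and the visited states.
import random

def playout(state, apply_action, max_depth=200, rng=random):
    visited = [state]
    for _ in range(max_depth):
        if not state.actions:
            reward = 1.0 if not state.remaining else 0.0
            return reward, visited
        state = apply_action(rng.choice(state.actions))
        visited.append(state)
    return 0.0, visited   # depth limit reached: treat as failure
```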



SLIDE 14

Monte Carlo Tree Search + Upper Confidence Bounds for Trees

[Szepesvari 2006]

How to search a tree?

Given some prior probabilities
Given success (fail) rates so far

UCT: select the node i maximizing

$$\frac{w_i}{n_i} + c \cdot p_i \cdot \sqrt{\frac{\ln N}{n_i}}$$

where $w_i$ is the win count of node $i$, $n_i$ its visit count, $p_i$ its prior probability, $N$ the visit count of the parent, and $c$ the exploration constant.
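A minimal sketch of this selection rule in Python; the child representation (dicts with keys w, n, p) is an assumption.

```python
# Minimal sketch of UCT selection with priors, matching the formula above.
import math

def uct_select(children, c=2.0):
    """Pick the child maximizing w/n + c * p * sqrt(ln N / n)."""
    N = sum(ch["n"] for ch in children) + 1
    def score(ch):
        if ch["n"] == 0:
            return float("inf")   # always explore unvisited children first
        exploit = ch["w"] / ch["n"]
        explore = c * ch["p"] * math.sqrt(math.log(N) / ch["n"])
        return exploit + explore
    return max(children, key=score)
```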

Intuition

Initially proportional to the prior
Later the win ratio dominates
We will learn the win ratio

[Figure: MCTS tree for the Mizar problem t9_zfmisc_1; each node is labeled with its prior p_i, win ratio w_i/n_i, and visit count n_i. The root has prior 1.00, win ratio 0.799, and 10000 visits.]


SLIDE 15

Learn Policy: Which actions to take?

Even for a single problem

If we know that some branches failed
We can avoid such branches in other parts of the “maze”

Playouts following UCT

After a number of playouts, select the most visited branch
Only continue inside that branch (called a big step)

A sequence of big steps ends in a proof, dead end, or is too long

Either way, we can learn which actions were chosen
With some initial win heuristic (remaining goals, size, constant)
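A minimal sketch of the big-step loop just described: run a batch of UCT playouts from the current root, then commit to the most visited child; the `Node` fields and `run_playout` are hypothetical.

```python
# Minimal sketch of the big-step loop: a batch of UCT playouts, then
# commit to the most visited child. Node fields and run_playout are
# hypothetical (see the earlier sketches).

def bigstep_loop(root, run_playout, playouts_per_step=1500, max_bigsteps=100):
    """Return the sequence of bigstep nodes; policy targets (which child
    was chosen) and value targets (proof reached or not) come from it."""
    node, chosen = root, []
    for _ in range(max_bigsteps):
        for _ in range(playouts_per_step):
            run_playout(node)          # updates win/visit statistics
        if not node.children:
            break                      # proof found or dead end
        node = max(node.children, key=lambda ch: ch.visits)
        chosen.append(node)            # one big step
    return chosen
```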



SLIDE 17

Learn Value: How likely is a proof state to be provable?

Learn from all bigstep states

Label one if the state led to a proof, zero otherwise

With 150K good value training samples and 250K good policy training samples

XGBoost policy train time: 4 min, value train time: 8 min

2000 problems run with 100K inferences, no bigsteps:

Setting              Time (min)   Theorems
No learning              1.5         440
Only learn values        5.0         535
Only learn policy       10.5         790
Learn both              11.5         871
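For illustration, a minimal sketch of how the value model could be trained with XGBoost from labeled bigstep states; the feature encoding and hyperparameters are assumptions, not the exact setup behind the table.

```python
# Minimal sketch of value training with XGBoost on labeled bigstep states.
import numpy as np
import xgboost as xgb

def train_value(features, labels):
    """features: (num_states, num_features); labels: 1.0 if the state
    belonged to a successful proof attempt, else 0.0."""
    dtrain = xgb.DMatrix(np.asarray(features), label=np.asarray(labels))
    params = {"objective": "binary:logistic", "max_depth": 9, "eta": 0.3}
    return xgb.train(params, dtrain, num_boost_round=200)

def state_value(model, feature_vector):
    """Predicted probability that the state leads to a proof."""
    return float(model.predict(xgb.DMatrix(np.asarray([feature_vector])))[0])
```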


SLIDE 18

Reinforcement from scratch

Starting with no data, and with 1500 playouts per bigstep

round   thms
MC       665
1        654
2        718
3        727
4        754
5        748
6        769
7        760
8        776
9        776
10       782
11       797
12       796
13       800
14       795
15       794
16       792
17       804
...
29       815
30       820
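A minimal sketch of the reinforcement loop behind this table: each round attempts all problems with the current models, collects policy and value samples, and retrains; `attempt` and `train_models` are hypothetical stand-ins for the MCTS prover and the XGBoost training step.

```python
# Minimal sketch of the reinforcement loop: attempt all problems with
# the current models, retrain on everything collected so far, repeat.

def reinforce(problems, attempt, train_models, rounds=30):
    policy, value = None, None          # round "MC": no learned guidance
    policy_data, value_data, proved_per_round = [], [], []
    for _ in range(rounds):
        proved = 0
        for problem in problems:
            solved, pol_samples, val_samples = attempt(problem, policy, value)
            proved += int(solved)
            policy_data.extend(pol_samples)
            value_data.extend(val_samples)
        proved_per_round.append(proved)
        policy, value = train_models(policy_data, value_data)
    return proved_per_round
```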


SLIDE 19

Conclusion

Reinforcement learning on a small Mizar dataset

UCT, action, and value learning work in a connection-based setup
Learning from scratch can work, even for a single problem

Lots of things to try

Other cost functions
Other learning frameworks
Larger experiments
