
  1. Learning theorem proving through self-play Stanisław Purgał

  2. Overview • AlphaZero • Proving game • adjusting MCTS for proving game • some results 2019-10 1

  3. Neural black box: game state S → move policy π ∈ ℝⁿ, expected outcome v ∈ ℝ

  4. Neural black box: (S₁, π₁, v₁) … (Sₙ, πₙ, vₙ)

  5. Monte-Carlo Tree Search: game state S → move policy π ∈ ℝⁿ, expected outcome v ∈ ℝ

  6. Monte-Carlo Tree Search: choose a child Sᵢ according to the formula: maximize vᵢ + c · πᵢ · √n / (nᵢ + 1), where c = log((n + c_base + 1) / c_base) + c_init, with c_base = 19652 and c_init = 1.25; vᵢ is a weighted average
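The selection rule above can be sketched in Python. This is a minimal sketch, not the author's implementation; the node statistics (visit counts nᵢ, priors πᵢ, value estimates vᵢ) are assumed to be available as plain tuples:

```python
import math

C_BASE = 19652   # c_base from the slide
C_INIT = 1.25    # c_init from the slide

def puct_score(n_parent, n_child, prior, value):
    # Exploration weight c grows slowly with the parent's visit count n.
    c = math.log((n_parent + C_BASE + 1) / C_BASE) + C_INIT
    return value + c * prior * math.sqrt(n_parent) / (n_child + 1)

def select_child(children):
    # children: list of (n_i, pi_i, v_i) statistics under one parent node.
    n_parent = sum(n for n, _, _ in children)
    scores = [puct_score(n_parent, n, p, v) for n, p, v in children]
    return scores.index(max(scores))
```

Note that an unvisited child with a high prior can outscore a well-visited sibling; that is what drives exploration.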

  7. Monte-Carlo Tree Search

  8. Monte-Carlo Tree Search

  9. Why not maximum? The network output v ∈ ℝ is noisy: v = t + error (true value plus estimation error)

  10. Why not maximum? v₁ = t₁ + error, v₂ = t₂ + error, v₃ = t₃ + ERROR; taking min/max yields v = t + ERROR (the largest error dominates)

  11. Why not maximum? v₁ = t₁ + error, v₂ = t₂ + error, v₃ = t₃ + ERROR; taking the average yields v = t + (Σᵢ errorᵢ) / n (the errors partly cancel)
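A toy numeric check (not from the slides' experiments) of why averaging beats taking the extreme of noisy estimates:

```python
import random

# Toy illustration: estimate a true value t = 0 from n noisy samples
# v_i = t + noise. The max keeps the single largest noise term, while
# the average's error shrinks roughly like 1/sqrt(n).
random.seed(0)
t, n = 0.0, 1000
samples = [t + random.gauss(0, 1) for _ in range(n)]
max_error = abs(max(samples) - t)      # error if we trusted the extreme
avg_error = abs(sum(samples) / n - t)  # error of the averaged estimate
```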

  12. Closing the loop • play lots of games • choose moves randomly, according to MCTS policy • use finished games for training: • desired value is the result of the game • desired policy is the MCTS policy • also add noise to neural network output to increase exploration
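The exploration-noise bullet can be sketched as follows. Mixing Dirichlet noise into the root priors is AlphaZero's published recipe; the parameter values here (alpha = 0.3, mixing weight eps = 0.25) are taken from that recipe as assumptions, not from these slides:

```python
import random

def add_root_noise(priors, alpha=0.3, eps=0.25, rng=random):
    # Sample a Dirichlet vector by normalizing independent Gamma draws,
    # then mix it into the prior policy: p' = (1 - eps)*p + eps*noise.
    noise = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(noise)
    noise = [x / total for x in noise]
    return [(1 - eps) * p + eps * x for p, x in zip(priors, noise)]
```

The mixed vector still sums to one, so it remains a valid policy, but occasionally boosts moves the network would otherwise never try.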

  13. Proving game: given a theorem, try to prove it: win if proved, lose otherwise

  14. Proving game: the Adversary constructs a theorem, then the Prover tries to prove it; the Prover wins if the proof succeeds, the Adversary wins otherwise

  15. Prolog-like proving: the inference rule A ⊢ X, A ⊢ Y ⟹ A ⊢ X ∧ Y (1) becomes the clause holds(A, and(X, Y)) :- holds(A, X), holds(A, Y) (2)

  16. Prolog-like proving
  goals: [ X:A ⊢ X ∧ ¬¬X, ... ]
  rule: A ⊢ X ∧ Y :- A ⊢ X, A ⊢ Y
  instantiated: X:A ⊢ X ∧ ¬¬X :- X:A ⊢ X, X:A ⊢ ¬¬X
  new goals: [ X:A ⊢ X, X:A ⊢ ¬¬X, ... ]

  17. Prolog-like proving
  goals: [ holds(X:A, and(X, not(not(X)))), ... ]
  rule: holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)
  instantiated: holds(X:A, and(X, not(not(X)))) :- holds(X:A, X), holds(X:A, not(not(X)))
  new goals: [ holds(X:A, X), holds(X:A, not(not(X))), ... ]

  18. Prolog-like theorem constructing
  goals: [ holds(X:A, and(X, not(not(X)))), ... ]
  instantiated: holds(X:A, and(X, not(not(X)))) :- holds(X:A, X), holds(X:A, not(not(X)))
  rule: holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)
  new goals: [ holds(X:A, X), holds(X:A, not(not(X))), ... ]
  bad idea

  19. Prolog-like theorem constructing
  goals: [ holds(A, ♣), ... ]
  instantiated: holds(A, ♣) :- holds(A, or(♦, ♥)), holds(A, implies(♦, ♣)), holds(A, implies(♥, ♣))
  rule: holds(A, Z) :- holds(A, or(X, Y)), holds(A, implies(X, Z)), holds(A, implies(Y, Z))
  new goals: [ holds(A, or(♦, ♥)), holds(A, implies(♦, ♣)), holds(A, implies(♥, ♣)), ... ]
  bad idea

  20. Prolog-like theorem constructing
  goals: [ T ]
  rule: holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)
  instantiated: holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)
  new goals: [ holds(A, X), holds(A, Y) ]

  21. Prolog-like theorem constructing: the goal T is instantiated to holds(X:A, and(X, not(not(X)))), then grounded to holds(x:a, and(x, not(not(x))))

  22. Forcing termination of the game Step limit: • ugly extension of game state • strategy may depend on number of steps left • even if we hide it, there is a correlation: large term constructed ∼ few steps left ∼ will likely lose

  23. Forcing termination of the game Sudden death chance: • game states nicely equal • no hard limit for length of a theorem During a training playout, randomly terminate the game with chance p_d. In MCTS, adjust the value: v′ = (−1) · p_d + v · (1 − p_d)
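The adjustment above is a one-liner; a minimal sketch:

```python
def sudden_death_value(v, p_d):
    # With probability p_d the game ends immediately as a loss (value -1);
    # otherwise the ordinary estimate v applies.
    return (-1.0) * p_d + v * (1.0 - p_d)
```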

  24. Disadvantages of this game • two different players - if one player starts winning every game, we can't learn much • proofs use single inference steps - inefficient • players don't take turns - MCTS is not designed for that situation

  25. Not using maximum

  26. Not using maximum

  27. Not using maximum

  28. Not using maximum

  29. Certainty propagation

  30. Certainty propagation

  31. Certainty propagation

  32. Certainty propagation. Each node keeps a value v, a weighted average a, and bounds l, u.
  Recursively: a = (v̂ + Σᵢ vᵢ · nᵢ) / (n + 1), l = maxᵢ lᵢ, u = maxᵢ uᵢ, v = min(u, max(l, a))
  For uncertain leaves: v = a = v̂ (the network estimate), l = −1, u = 1
  For certain leaves: v = a = l = u = result
  When the player changes: • values and bounds flip sign • lower and upper bound switch places
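The backup rules above can be sketched in Python. This is a hedged sketch, not the author's code; each child carries a (v, n, l, u) tuple and v_net stands for the node's own network estimate (the names are assumptions):

```python
def backup(v_net, children):
    # children: list of (v_i, n_i, l_i, u_i) tuples for one node.
    n = sum(c[1] for c in children)
    # Weighted average of child values, with the network estimate v_net
    # counted as one extra visit.
    a = (v_net + sum(c[0] * c[1] for c in children)) / (n + 1)
    l = max(c[2] for c in children)  # the player picks the best child,
    u = max(c[3] for c in children)  # so both bounds propagate as maxima
    v = min(u, max(l, a))            # clamp the average into [l, u]
    return v, n + 1, l, u

def flip(v, n, l, u):
    # When the player to move changes, values negate and the bounds
    # swap roles (the old upper bound becomes the new lower bound).
    return -v, n, -u, -l
```

If some child is already a certain win (l = u = 1), the clamp forces the parent's value to 1 regardless of the weighted average.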

  33. Toy problem
  ablist([]).
  ablist([a|L]) :- ablist(L).
  ablist([b|L]) :- ablist(L).
  ablist([c|L]) :- ablist(L).
  ablist([d|L]) :- ablist(L).
  rev3([], L, L).
  rev3([H|T], L, Acc) :- rev3(T, L, [H|Acc]).
  revablist(L) :- ablist(T), rev3(L, T, []).

  34. Toy problem evaluation ablist([a,b,a,b,a,b,b]), revablist([]), revablist([a]), revablist([b]), revablist([c,d]), revablist([c,a,b]), revablist([a,d,c,b]), revablist([a,d,c,a,a]), revablist([a,b,c,d,b,d]), revablist([d,b,c,a,d,a,b]), revablist([a,c,b,a,c,a,d,d])

  35. Certainty propagation effect

  36. Learning the proving game Like AlphaZero, with a few differences: • using a Graph Attention Network for the neural black box • for theorems that the prover failed to prove, show the proper path with additional policy training samples • during evaluation, greedy policy and a step limit instead of sudden death

  37. Proving game evaluation: instead of a theorem constructed by the Adversary, the Prover is given a theorem from an evaluation set; the Prover wins if it proves the theorem, the Adversary wins otherwise

  38. Learning toy problem

  39. Intuitionistic propositional logic
  holds([A|T], A).
  holds(T, A) :- holds([B|T], A), holds(T, B).
  holds([H|T], A) :- holds(T, A).
  holds(T, impl(A, B)) :- holds([A|T], B).
  holds(T, B) :- holds(T, A), holds(T, impl(A, B)).
  holds(T, or(A, B)) :- holds(T, A).
  holds(T, or(A, B)) :- holds(T, B).
  holds(T, C) :- holds(T, or(A, B)), holds([A|T], C), holds([B|T], C).
  holds(T, and(A, B)) :- holds(T, A), holds(T, B).
  holds(T, A) :- holds(T, and(A, B)).
  holds(T, B) :- holds(T, and(A, B)).
  holds([false|T], A).

  40. Classical propositional logic
  holds([A|T], A).
  holds(T, A) :- holds([B|T], A), holds(T, B).
  holds([H|T], A) :- holds(T, A).
  holds(T, impl(A, B)) :- holds([A|T], B).
  holds(T, B) :- holds(T, A), holds(T, impl(A, B)).
  holds(T, or(A, B)) :- holds(T, A).
  holds(T, or(A, B)) :- holds(T, B).
  holds(T, C) :- holds(T, or(A, B)), holds([A|T], C), holds([B|T], C).
  holds(T, and(A, B)) :- holds(T, A), holds(T, B).
  holds(T, A) :- holds(T, and(A, B)).
  holds(T, B) :- holds(T, and(A, B)).
  holds([false|T], A).
  holds(T, A) :- holds([impl(A, false)|T], false).

  41. Learning classical propositional logic

  42. Constructed theorem examples (shown in the slide as term graphs of holds/2, and/2, or/2, impl/2 nodes; figure omitted):
  ⊢ (((d ∧ b ∧ c) ∨ (b ∧ c ∧ d)) ⟹ b) ∨ e
  ⊥ ⊢ a ∨ b ∨ c
  ⊢ ((((a ∧ ⊥ ∧ b) ⟹ c) ⟹ d) ⟹ d)
  ((a ∧ b) ⟹ a) ⟹ (⊥ ∧ c) ⊢ d
  ((a ⟹ ⊥) ⟹ b), c, (a ⟹ b) ⊢ b

  43. First-order logic
  % some classical logic
  neq(var([a|_]), var([b|_])).
  neq(var([b|_]), var([a|_])).
  neq(var([_|A]), var([_|B])) :- neq(var(A), var(B)).
  repl(var(A), R, var(A), R).
  repl(var(A), R, var(B), var(B)) :- neq(var(A), var(B)).
  repl(var(A), R, op(O, X1, Y1), op(O, X2, Y2)) :- repl(var(A), R, X1, X2), repl(var(A), R, Y1, Y2).
  repl(var(A), R, q(O, var(A), P), q(O, var(A), P)).
  repl(var(A), R, q(O, var(B), P1), q(O, var(B), P2)) :- neq(var(A), var(B)), repl(var(A), R, P1, P2).
  repl(var(A), R, false, false).
  repl(var(A), R, [], []).
  repl(var(A), R, [H1|T1], [H2|T2]) :- repl(var(A), R, H1, H2), repl(var(A), R, T1, T2).
  holds(T, q(forall, var(A), Phi)) :- repl(var(A), var(B), Phi, PhiBA), repl(var(B), false, [Phi|T], [Phi|T]), holds(T, PhiBA).
  holds(T, Phi) :- holds(T, q(forall, var(A), PhiA)), repl(var(A), B, PhiA, Phi).
  holds(T, q(exists, var(A), Phi)) :- repl(var(A), R, Phi, PhiR), holds(T, PhiR).
  holds(T, P) :- holds(T, q(exists, var(A), Phi)), repl(var(B), false, Phi, Phi), repl(var(A), var(B), Phi, PhiB), holds([PhiB|T], P).

  44. Future work • better rule representation? • proper prover with a different construction mechanism? • different use cases? • more computational power?

  45. Thank you for your attention! Stanisław Purgał
