SLIDE 1

Learning theorem proving through self-play

Stanisław Purgał

SLIDE 2

Overview

  • AlphaZero
  • Proving game
  • adjusting MCTS for proving game
  • some results

2019-10 1

SLIDE 3

Neural black box

game state S  →  ( move policy π ∈ R^n , expected outcome v ∈ R )
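
Concretely, the black box is just a function from a game state to a (policy, value) pair. A minimal Python sketch; the state type and the fixed move count are illustrative assumptions, not the talk's actual interface:

```python
from typing import List, Sequence, Tuple

def evaluate(state: Sequence[int]) -> Tuple[List[float], float]:
    """Stand-in for the neural network: maps a game state S to a
    move policy pi (a distribution over n moves) and an expected
    outcome v in [-1, 1]. The uniform output is a placeholder."""
    n = 3  # hypothetical: assume every state has three legal moves
    pi = [1.0 / n] * n
    v = 0.0
    return pi, v
```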

SLIDE 4

Neural black box

(S1, π1, v1), …, (Sn, πn, vn)

SLIDE 5

Monte-Carlo Tree Search

game state S  →  ( move policy π ∈ R^n , expected outcome v ∈ R )

SLIDE 6

Monte-Carlo Tree Search

S → (π, v);  children S1, S2, S3;  v is a weighted average over the children

choose a child according to the formula:

    c · πi · √n / (1 + ni) + vi

where

    c = log((n + cbase + 1) / cbase) + cinit

with cbase = 19652 and cinit = 1.25.
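
The selection rule can be sketched in Python. The score below assumes the standard AlphaZero PUCT form c · πi · √n / (1 + ni) + vi, which matches the cbase/cinit constants on the slide:

```python
import math

def exploration_coeff(n, c_base=19652, c_init=1.25):
    # the coefficient c grows slowly with the total visit count n
    return math.log((n + c_base + 1) / c_base) + c_init

def puct_score(pi_i, v_i, n_i, n):
    # prior-weighted exploration bonus plus the child's value estimate
    return exploration_coeff(n) * pi_i * math.sqrt(n) / (1 + n_i) + v_i

def select_child(children, n):
    # children: list of (prior pi_i, value v_i, visit count n_i) tuples
    return max(range(len(children)),
               key=lambda i: puct_score(*children[i], n))
```

Unvisited children get a large exploration bonus, so the search spreads out before committing to the highest-value move.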

SLIDE 7

Monte-Carlo Tree Search

SLIDE 8

Monte-Carlo Tree Search

SLIDE 9

Why not maximum?

game state S  →  ( move policy π ∈ R^n , expected outcome v ∈ R )

v = t + error

SLIDE 10

Why not maximum?

v1 = t1 + error,  v2 = t2 + error,  v3 = t3 + ERROR

min / max:  v = t + ERROR

SLIDE 11

Why not maximum?

v1 = t1 + error,  v2 = t2 + error,  v3 = t3 + ERROR

average:  v = t + (Σ error) / n
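
A quick numerical illustration (not from the talk): averaging many noisy estimates of the same true value t lets the errors cancel, while taking the max keeps the largest one:

```python
import random
import statistics

random.seed(0)
t = 0.0  # true value
# noisy estimates v_i = t + error_i
vs = [t + random.uniform(-1, 1) for _ in range(1000)]

avg_error = abs(statistics.mean(vs) - t)  # errors mostly cancel
max_error = abs(max(vs) - t)              # dominated by the largest error
```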

SLIDE 12

Closing the loop

  • play lots of games
  • choose moves randomly, according to the MCTS policy
  • use finished games for training:
      • desired value is the result of the game
      • desired policy is the MCTS policy
  • also add noise to the neural network output to increase exploration
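
The steps above can be sketched as a single training game. The game interface (finished, result, play, mcts_policy) is a hypothetical stand-in, not the talk's actual code:

```python
import random

def play_training_game(state, finished, result, play, mcts_policy):
    """Play one game, sampling moves from the MCTS policy, and label
    every recorded (state, policy) pair with the final game result."""
    records = []
    while not finished(state):
        pi = mcts_policy(state)  # improved policy from the tree search
        records.append((state, pi))
        move = random.choices(range(len(pi)), weights=pi)[0]  # sample, don't argmax
        state = play(state, move)
    z = result(state)
    # desired value is the result of the game; desired policy is the MCTS policy
    return [(s, pi, z) for s, pi in records]
```

Every state of a finished game becomes one training sample (S, π, v), feeding the network that the next round of MCTS will query.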

SLIDE 13

Proving game

theorem  →  Prove the theorem  →  win / lose

SLIDE 14

Proving game

Construct a theorem  →  Prove the theorem  →  Adversary wins / Prover wins

SLIDE 15

Prolog-like proving

A ⊢ X    A ⊢ Y
--------------   (1)
  A ⊢ X ∧ Y

holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)   (2)

SLIDE 16

Prolog-like proving

[ X:A ⊢ X ∧ ¬¬X , ... ]

A ⊢ X ∧ Y :- A ⊢ X, A ⊢ Y
X:A ⊢ X ∧ ¬¬X :- X:A ⊢ X, X:A ⊢ ¬¬X

[ X:A ⊢ X , X:A ⊢ ¬¬X , ... ]

SLIDE 17

Prolog-like proving

[ holds(X:A, and(X, not(not(X)))) , ... ]

holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)
holds(X:A, and(X, not(not(X)))) :- holds(X:A, X), holds(X:A, not(not(X)))

[ holds(X:A, X), holds(X:A, not(not(X))) , ... ]

SLIDE 18

Prolog-like theorem constructing

[ holds(X:A, and(X, not(not(X)))) , ... ]

holds(X:A, and(X, not(not(X)))) :- holds(X:A, X), holds(X:A, not(not(X)))
holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)

[ holds(X:A, X), holds(X:A, not(not(X))) , ... ]

bad idea

SLIDE 19

Prolog-like theorem constructing

[ holds(A, ♣) , ... ]

holds(A, ♣) :- holds(A, or(♦, ♥)), holds(A, implies(♦, ♣)), holds(A, implies(♥, ♣))
holds(A, Z) :- holds(A, or(X, Y)), holds(A, implies(X, Z)), holds(A, implies(Y, Z))

[ holds(A, or(♦, ♥)) , holds(A, implies(♦, ♣)) , holds(A, implies(♥, ♣)) , ... ]

bad idea

SLIDE 20

Prolog-like theorem constructing

[ T ]

holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)
holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)

[ holds(A, X), holds(A, Y) ]

SLIDE 21

Prolog-like theorem constructing

T

holds(X:A, and(X, not(not(X))))

holds(x:a, and(x, not(not(x))))

SLIDE 22

Forcing termination of the game

Step limit:

  • ugly extension of the game state
  • strategy may depend on the number of steps left
  • even if we hide it, there is a correlation:
    large term constructed ∼ few steps left ∼ will likely lose

SLIDE 23

Forcing termination of the game

Sudden death chance:

  • game states stay nicely uniform
  • no hard limit on the length of a theorem

During training playout, randomly terminate game with chance pd. In MCTS, adjust value v′ = (−1) · pd + v · (1 − pd).
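
The adjustment is a straightforward mixture of the two outcomes; a minimal sketch:

```python
def sudden_death_value(v, p_d):
    """Adjusted value v' = (-1) * p_d + v * (1 - p_d): with probability p_d
    the game terminates now as a loss, otherwise the estimate v applies."""
    return (-1.0) * p_d + v * (1.0 - p_d)
```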

SLIDE 24

Disadvantages of this game

  • two different players - if one player starts winning every game, we can't learn much
  • proofs use single inference steps - inefficient
  • players don't take turns - MCTS is not designed for that situation

SLIDE 25

Not using maximum

SLIDE 26

Not using maximum

SLIDE 27

Not using maximum

SLIDE 28

Not using maximum

SLIDE 29

Certainty propagation

SLIDE 30

Certainty propagation

SLIDE 31

Certainty propagation

SLIDE 32

Certainty propagation

for uncertain leaves:  v = a = l = −1,  u = 1
for certain leaves:    v = a = l = u = result

recursively:
  v = min(u, max(l, a))
  a = (v_leaf + Σ vi·ni) / (n + 1)    (v_leaf: the node's own leaf estimate)
  l = maxi li
  u = maxi ui

when the player changes:
  • values and bounds flip (change sign)
  • lower and upper bound switch places
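
These rules can be sketched as follows; the field names and the v_leaf term for the node's own estimate are assumptions, not the talk's actual code:

```python
def combine(v_leaf, children):
    """Back up value, average, and certainty bounds from children that are
    already expressed from the current player's perspective."""
    n = sum(c["n"] for c in children)
    a = (v_leaf + sum(c["v"] * c["n"] for c in children)) / (n + 1)
    l = max(c["l"] for c in children)   # best proven lower bound
    u = max(c["u"] for c in children)   # best achievable upper bound
    v = min(u, max(l, a))               # clamp the average to the bounds
    return {"v": v, "a": a, "l": l, "u": u, "n": n + 1}

def flip(c):
    """When the player changes: values and bounds flip sign,
    and the lower and upper bounds switch places."""
    return {"v": -c["v"], "a": -c["a"], "l": -c["u"], "u": -c["l"], "n": c["n"]}
```

The clamp is what propagates certainty: once some child is proven won, the parent's lower bound reaches 1 and its value is pinned there regardless of the noisy average.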

SLIDE 33

Toy problem

ablist([]).
ablist([a|L]) :- ablist(L).
ablist([b|L]) :- ablist(L).
ablist([c|L]) :- ablist(L).
ablist([d|L]) :- ablist(L).

rev3([], L, L).
rev3([H|T], L, Acc) :- rev3(T, L, [H|Acc]).

revablist(L) :- ablist(T), rev3(L, T, []).

SLIDE 34

Toy problem evaluation

ablist([a,b,a,b,a,b,b]),
revablist([]),
revablist([a]),
revablist([b]),
revablist([c,d]),
revablist([c,a,b]),
revablist([a,d,c,b]),
revablist([a,d,c,a,a]),
revablist([a,b,c,d,b,d]),
revablist([d,b,c,a,d,a,b]),
revablist([a,c,b,a,c,a,d,d])

SLIDE 35

Certainty propagation effect

SLIDE 36

Learning the proving game

Like AlphaZero, with a few differences:

  • using a Graph Attention Network
  • for theorems that the prover failed to prove, show the proper path with additional policy training samples
  • during evaluation, use a greedy policy and a step limit instead of sudden death

SLIDE 37

Proving game evaluation

Construct a theorem / evaluation theorem  →  Prove the theorem  →  Adversary wins / Prover wins

SLIDE 38

Learning toy problem

SLIDE 39

Intuitionistic propositional logic

holds([A|T], A).
holds(T, A) :- holds([B|T], A), holds(T, B).
holds([H|T], A) :- holds(T, A).
holds(T, impl(A, B)) :- holds([A|T], B).
holds(T, B) :- holds(T, A), holds(T, impl(A, B)).
holds(T, or(A, B)) :- holds(T, A).
holds(T, or(A, B)) :- holds(T, B).
holds(T, C) :- holds(T, or(A, B)), holds([A|T], C), holds([B|T], C).
holds(T, and(A, B)) :- holds(T, A), holds(T, B).
holds(T, A) :- holds(T, and(A, B)).
holds(T, B) :- holds(T, and(A, B)).
holds([false|T], A).

SLIDE 40

Classical propositional logic

holds([A|T], A).
holds(T, A) :- holds([B|T], A), holds(T, B).
holds([H|T], A) :- holds(T, A).
holds(T, impl(A, B)) :- holds([A|T], B).
holds(T, B) :- holds(T, A), holds(T, impl(A, B)).
holds(T, or(A, B)) :- holds(T, A).
holds(T, or(A, B)) :- holds(T, B).
holds(T, C) :- holds(T, or(A, B)), holds([A|T], C), holds([B|T], C).
holds(T, and(A, B)) :- holds(T, A), holds(T, B).
holds(T, A) :- holds(T, and(A, B)).
holds(T, B) :- holds(T, and(A, B)).
holds([false|T], A).
holds(T, A) :- holds([impl(A, false)|T], false).

SLIDE 41

Learning classical propositional logic

SLIDE 42

Constructed theorem example

(term graph of a constructed theorem: RootNode, holds/2, and/2, impl/2, r/2, constant nodes)

⊢ (((d ∧ b ∧ c) ∨ (b ∧ c ∧ d)) ⇒ b) ∨ e
⊥ ⊢ a ∨ b ∨ c
⊢ ((((a ∧ ⊥ ∧ b) ⇒ c) ⇒ d) ⇒ d)
((a ∧ b) ⇒ a) ⇒ (⊥ ∧ c) ⊢ d
((a ⇒ ⊥) ⇒ b) , c , (a ⇒ b) ⊢ b

SLIDE 43

First-order logic

% some classical logic

neq(var([a|_]), var([b|_])).
neq(var([b|_]), var([a|_])).
neq(var([_|A]), var([_|B])) :- neq(var(A), var(B)).

repl(var(A), R, var(A), R).
repl(var(A), R, var(B), var(B)) :- neq(var(A), var(B)).
repl(var(A), R, op(O, X1, Y1), op(O, X2, Y2)) :- repl(var(A), R, X1, X2), repl(var(A), R, Y1, Y2).
repl(var(A), R, q(O, var(A), P), q(O, var(A), P)).
repl(var(A), R, q(O, var(B), P1), q(O, var(B), P2)) :- neq(var(A), var(B)), repl(var(A), R, P1, P2).
repl(var(A), R, false, false).
repl(var(A), R, [], []).
repl(var(A), R, [H1|T1], [H2|T2]) :- repl(var(A), R, H1, H2), repl(var(A), R, T1, T2).

holds(T, q(forall, var(A), Phi)) :- repl(var(A), var(B), Phi, PhiBA), repl(var(B), false, [Phi|T], [Phi|T]), holds(T, PhiBA).
holds(T, Phi) :- holds(T, q(forall, var(A), PhiA)), repl(var(A), B, PhiA, Phi).
holds(T, q(exists, var(A), Phi)) :- repl(var(A), R, Phi, PhiR), holds(T, PhiR).
holds(T, P) :- holds(T, q(exists, var(A), Phi)), repl(var(B), false, Phi, Phi), repl(var(A), var(B), Phi, PhiB), holds([PhiB|T], P).

SLIDE 44

Future work

  • better rule representation?
  • proper prover with a different construction mechanism?
  • different use cases?
  • more computational power?

SLIDE 45

Thank you for your attention!

Stanisław Purgał