Learning theorem proving through self-play
Stanisław Purga
Overview
- AlphaZero
- The proving game
- Adjusting MCTS for the proving game
- Some results
2019-10 1
Neural black box

game state S ↦ (move policy π ∈ R^n, expected outcome v ∈ R)
Neural black box

Training samples: (S1, π1, v1), …, (Sn, πn, vn)
Monte-Carlo Tree Search

The same network guides the search: game state S ↦ (move policy π ∈ R^n, expected outcome v ∈ R)
Monte-Carlo Tree Search

At a node S with children S1, S2, S3: π gives the priors, and v is a weighted average of the child values. Choose a child i according to the formula:

    v_i + c · π_i · √n / (n_i + 1)

    c = log((n + c_base + 1) / c_base) + c_init

    c_base = 19652,  c_init = 1.25
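A minimal sketch of this child-selection rule in Python. The function and variable names are mine, and the (n_i + 1) denominator follows AlphaZero's published PUCT formula:

```python
import math

C_BASE = 19652
C_INIT = 1.25

def exploration_constant(n):
    """Exploration weight c: grows slowly with the parent visit count n."""
    return math.log((n + C_BASE + 1) / C_BASE) + C_INIT

def select_child(n, children):
    """Index of the child maximising v_i + c * pi_i * sqrt(n) / (n_i + 1).

    `children` is a list of (pi_i, n_i, v_i) tuples: prior probability,
    visit count, and mean value estimate of each child.
    """
    c = exploration_constant(n)
    scores = [v_i + c * pi_i * math.sqrt(n) / (n_i + 1)
              for (pi_i, n_i, v_i) in children]
    return max(range(len(children)), key=scores.__getitem__)
```

Rarely visited children with a high prior get an exploration bonus that shrinks as their visit count n_i grows.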
Monte-Carlo Tree Search (diagram slides)
Why not maximum?

The network's value estimate is noisy: v = t + error, where t is the true value.

Taking the min/max over children

    v1 = t1 + error    v2 = t2 + error    v3 = t3 + ERROR
    min / max  →  v = t + ERROR

selects whichever child has the largest error, so the parent inherits it.

Averaging instead lets the errors cancel:

    v1 = t1 + error    v2 = t2 + error    v3 = t3 + ERROR
    average  →  v = t + (Σ error) / n
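A small simulation (my own illustration, not from the slides) makes the bias concrete: with three children of equal true value 0 and independent Gaussian noise on the estimates, the max is systematically optimistic while the average stays near the truth.

```python
import random

random.seed(0)

# Three children with identical true values t_i = 0; the network's
# estimates add independent Gaussian noise. Taking the max picks the
# most over-estimated child; averaging lets the noise cancel.
TRIALS = 10_000
max_total = 0.0
avg_total = 0.0
for _ in range(TRIALS):
    estimates = [random.gauss(0.0, 0.1) for _ in range(3)]
    max_total += max(estimates)
    avg_total += sum(estimates) / len(estimates)

max_bias = max_total / TRIALS   # systematically above the true value 0
avg_bias = avg_total / TRIALS   # close to 0
```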
Closing the loop
- play lots of games
- choose moves randomly, according to the MCTS policy
- use finished games for training:
  - desired value is the result of the game
  - desired policy is the MCTS policy
- also add noise to the neural network output to increase exploration
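The slides do not say which noise is used; AlphaZero mixes Dirichlet noise into the root priors, which can be sketched as follows (alpha and eps are AlphaZero's published chess values, used here only as placeholders):

```python
import random

def dirichlet(alpha, k, rng=random):
    """Sample a symmetric Dirichlet(alpha) over k moves via gamma draws."""
    gs = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(gs)
    return [g / total for g in gs]

def noisy_policy(pi, alpha=0.3, eps=0.25):
    """Mix the network policy with Dirichlet noise, AlphaZero-style.

    The result is still a probability distribution, but every move keeps
    a nonzero chance of being explored.
    """
    noise = dirichlet(alpha, len(pi))
    return [(1 - eps) * p + eps * n for p, n in zip(pi, noise)]
```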
Proving game

theorem → Prove the theorem → win / lose
Proving game

Construct a theorem → Prove the theorem → Prover wins / Adversary wins
Prolog-like proving

    A ⊢ X    A ⊢ Y
    ──────────────   (1)
      A ⊢ X ∧ Y

    holds(A, and(X, Y)) :- holds(A, X), holds(A, Y).   (2)
Prolog-like proving

[ X:A ⊢ X ∧ ¬¬X , ... ]

apply  A ⊢ X ∧ Y :- A ⊢ X, A ⊢ Y
as     X:A ⊢ X ∧ ¬¬X :- X:A ⊢ X, X:A ⊢ ¬¬X

[ X:A ⊢ X, X:A ⊢ ¬¬X , ... ]
Prolog-like proving

[ holds(X:A, and(X, not(not(X)))) , ... ]

apply  holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)
as     holds(X:A, and(X, not(not(X)))) :- holds(X:A, X), holds(X:A, not(not(X)))

[ holds(X:A, X), holds(X:A, not(not(X))) , ... ]
Prolog-like theorem constructing

[ holds(X:A, and(X, not(not(X)))) , ... ]

holds(X:A, and(X, not(not(X)))) :- holds(X:A, X), holds(X:A, not(not(X)))
holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)

[ holds(X:A, X), holds(X:A, not(not(X))) , ... ]

bad idea
Prolog-like theorem constructing

[ holds(A, ♣) , ... ]

holds(A, ♣) :- holds(A, or(♦, ♥)), holds(A, implies(♦, ♣)), holds(A, implies(♥, ♣))
holds(A, Z) :- holds(A, or(X, Y)), holds(A, implies(X, Z)), holds(A, implies(Y, Z))

[ holds(A, or(♦, ♥)), holds(A, implies(♦, ♣)), holds(A, implies(♥, ♣)) , ... ]

bad idea
Prolog-like theorem constructing

[ T ]

holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)
holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)

[ holds(A, X), holds(A, Y) ]
Prolog-like theorem constructing
T
holds(X:A, and(X, not(not(X))))
holds(x:a, and(x, not(not(x))))
Forcing termination of the game
Step limit:
- ugly extension of the game state
- the strategy may depend on the number of steps left
- even if we hide it, there is a correlation:
  large term constructed ∼ few steps left ∼ likely to lose
Forcing termination of the game
Sudden death chance:
- game states stay uniform
- no hard limit on the length of a theorem

During a training playout, randomly terminate the game (as a loss) with chance p_d. In MCTS, adjust the value accordingly: v′ = (−1) · p_d + v · (1 − p_d).
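The adjustment is a one-liner; sketched here for concreteness (function name is mine):

```python
def sudden_death_value(v, p_d):
    """Value adjusted for sudden death: with chance p_d the game ends
    immediately as a loss (-1), otherwise the estimate v stands."""
    return (-1) * p_d + v * (1 - p_d)
```

With p_d = 0 nothing changes; as p_d grows, every position is dragged toward a loss, which discourages dragging the game out.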
Disadvantages of this game
- two different players: if one player starts winning every game, we can't learn much
- proofs use single inference steps, which is inefficient
- players don't take turns, and MCTS is not designed for that situation
Not using maximum (diagram slides)
Certainty propagation (diagram slides)
Certainty propagation

For uncertain leaves:
    v = a = l = −1,  u = 1
For certain leaves:
    v = a = l = u = result
Recursively:
    v = min(u, max(l, a))
    a = (v̂ + Σᵢ vᵢ·nᵢ) / (n + 1)    (v̂: the node's own network estimate)
    l = maxᵢ lᵢ
    u = maxᵢ uᵢ
When the player changes:
- values and bounds flip sign
- the lower and upper bounds switch places
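A minimal sketch of this bookkeeping. The node layout and names are mine, and I assume the node's own network estimate enters the average alongside the children, as in AlphaZero's mean-value backup:

```python
# Each node carries (v, a, l, u): backed-up value, running average, and
# lower/upper certainty bounds, as on the slide.

UNCERTAIN_LEAF = (-1.0, -1.0, -1.0, 1.0)   # v = a = l = -1, u = 1

def certain_leaf(result):
    """A solved leaf: every statistic equals the known result."""
    return (result, result, result, result)

def backup(node_v, children, counts, flip=False):
    """Combine child statistics into the parent's (v, a, l, u).

    `children` is a list of (v, a, l, u) tuples, `counts` the matching
    visit counts n_i, and `node_v` the node's own network estimate.
    flip=True models a player change: values and bounds change sign and
    the lower/upper bounds swap roles.
    """
    kids = list(children)
    if flip:
        kids = [(-v, -a, -u, -l) for (v, a, l, u) in kids]
    n = sum(counts)
    a = (node_v + sum(v * k for (v, _, _, _), k in zip(kids, counts))) / (n + 1)
    l = max(lo for (_, _, lo, _) in kids)
    u = max(hi for (_, _, _, hi) in kids)
    v = min(u, max(l, a))          # average, clamped into [l, u]
    return (v, a, l, u)
```

One proven child (l = 1) is enough to clamp the parent's value to a certain win, regardless of how the other children are estimated.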
Toy problem
ablist([]).
ablist([a|L]) :- ablist(L).
ablist([b|L]) :- ablist(L).
ablist([c|L]) :- ablist(L).
ablist([d|L]) :- ablist(L).

rev3([], L, L).
rev3([H|T], L, Acc) :- rev3(T, L, [H|Acc]).

revablist(L) :- ablist(T), rev3(L, T, []).
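Restated in Python for readability (my own illustration of what the relations compute, not of how Prolog searches for proofs): ablist/1 accepts lists over {a, b, c, d}, rev3/3 is reverse with an accumulator, and revablist/1 accepts exactly the lists whose reverse is an ablist.

```python
def ablist(xs):
    """True iff every element is one of the four allowed atoms."""
    return all(x in ('a', 'b', 'c', 'd') for x in xs)

def rev3(xs, acc=()):
    """Accumulator reverse, mirroring rev3(Xs, Result, Acc)."""
    return acc if not xs else rev3(xs[1:], (xs[0],) + acc)

def revablist(xs):
    """Mirrors revablist(L): the reverse of L must be an ablist."""
    return ablist(rev3(tuple(xs)))
```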
Toy problem evaluation
ablist([a,b,a,b,a,b,b]),
revablist([]),
revablist([a]),
revablist([b]),
revablist([c,d]),
revablist([c,a,b]),
revablist([a,d,c,b]),
revablist([a,d,c,a,a]),
revablist([a,b,c,d,b,d]),
revablist([d,b,c,a,d,a,b]),
revablist([a,c,b,a,c,a,d,d])
Certainty propagation effect (plot)
Learning the proving game
Like AlphaZero, with a few differences:
- using a Graph Attention Network for the neural black box
- for theorems that the prover failed to prove, show the proper path with additional policy training samples
- during evaluation, a greedy policy and a step limit instead of sudden death
Proving game evaluation

Construct a theorem / evaluation theorem → Prove the theorem → Prover wins / Adversary wins
Learning the toy problem (plot)
Intuitionistic propositional logic
holds([A|T], A).
holds(T, A) :- holds([B|T], A), holds(T, B).
holds([H|T], A) :- holds(T, A).
holds(T, impl(A, B)) :- holds([A|T], B).
holds(T, B) :- holds(T, A), holds(T, impl(A, B)).
holds(T, or(A, B)) :- holds(T, A).
holds(T, or(A, B)) :- holds(T, B).
holds(T, C) :- holds(T, or(A, B)), holds([A|T], C), holds([B|T], C).
holds(T, and(A, B)) :- holds(T, A), holds(T, B).
holds(T, A) :- holds(T, and(A, B)).
holds(T, B) :- holds(T, and(A, B)).
holds([false|T], A).
Classical propositional logic
holds([A|T], A).
holds(T, A) :- holds([B|T], A), holds(T, B).
holds([H|T], A) :- holds(T, A).
holds(T, impl(A, B)) :- holds([A|T], B).
holds(T, B) :- holds(T, A), holds(T, impl(A, B)).
holds(T, or(A, B)) :- holds(T, A).
holds(T, or(A, B)) :- holds(T, B).
holds(T, C) :- holds(T, or(A, B)), holds([A|T], C), holds([B|T], C).
holds(T, and(A, B)) :- holds(T, A), holds(T, B).
holds(T, A) :- holds(T, and(A, B)).
holds(T, B) :- holds(T, and(A, B)).
holds([false|T], A).
holds(T, A) :- holds([impl(A, false)|T], false).
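Classical sequents can be sanity-checked against ordinary truth tables. The following brute-force validity checker is my own illustration (formulas as nested tuples mirroring the Prolog terms: ('and'/'or'/'impl', x, y), 'false', or a variable name):

```python
from itertools import product

def eval_formula(f, env):
    """Classical truth value of a formula under an assignment env."""
    if f == 'false':
        return False
    if isinstance(f, str):          # a propositional variable
        return env[f]
    op, x, y = f
    a, b = eval_formula(x, env), eval_formula(y, env)
    return {'and': a and b, 'or': a or b, 'impl': (not a) or b}[op]

def variables(f, acc):
    """Collect the propositional variables of f into the set acc."""
    if isinstance(f, str):
        if f != 'false':
            acc.add(f)
    else:
        _, x, y = f
        variables(x, acc)
        variables(y, acc)
    return acc

def valid(gamma, phi):
    """True iff the sequent gamma |- phi holds classically: every
    assignment satisfying all of gamma also satisfies phi."""
    vs = set()
    for f in list(gamma) + [phi]:
        variables(f, vs)
    names = sorted(vs)
    for bits in product([False, True], repeat=len(names)):
        env = dict(zip(names, bits))
        if all(eval_formula(g, env) for g in gamma) and not eval_formula(phi, env):
            return False
    return True
```

For example, excluded middle a ∨ (a ⇒ ⊥) comes out valid, while ⊢ a alone does not.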
Learning classical propositional logic (plot)
Constructed theorem example

(term-graph rendering of a constructed theorem: RootNode, ConstantNode, holds/2, and/2, impl/2, r/2 nodes)

⊢ (((d ∧ b ∧ c) ∨ (b ∧ c ∧ d)) ⇒ b) ∨ e
⊥ ⊢ a ∨ b ∨ c
⊢ ((((a ∧ ⊥ ∧ b) ⇒ c) ⇒ d) ⇒ d)
((a ∧ b) ⇒ a) ⇒ (⊥ ∧ c) ⊢ d
((a ⇒ ⊥) ⇒ b), c, (a ⇒ b) ⊢ b
First-order logic
% some classical logic
neq(var([a|_]), var([b|_])).
neq(var([b|_]), var([a|_])).
neq(var([_|A]), var([_|B])) :- neq(var(A), var(B)).
repl(var(A), R, var(A), R).
repl(var(A), R, var(B), var(B)) :- neq(var(A), var(B)).
repl(var(A), R, op(O, X1, Y1), op(O, X2, Y2)) :- repl(var(A), R, X1, X2), repl(var(A), R, Y1, Y2).
repl(var(A), R, q(O, var(A), P), q(O, var(A), P)).
repl(var(A), R, q(O, var(B), P1), q(O, var(B), P2)) :- neq(var(A), var(B)), repl(var(A), R, P1, P2).
repl(var(A), R, false, false).
repl(var(A), R, [], []).
repl(var(A), R, [H1|T1], [H2|T2]) :- repl(var(A), R, H1, H2), repl(var(A), R, T1, T2).
holds(T, q(forall, var(A), Phi)) :- repl(var(A), var(B), Phi, PhiBA), repl(var(B), false, [Phi|T], [Phi|T]), holds(T, PhiBA).
holds(T, Phi) :- holds(T, q(forall, var(A), PhiA)), repl(var(A), B, PhiA, Phi).
holds(T, q(exists, var(A), Phi)) :- repl(var(A), R, Phi, PhiR), holds(T, PhiR).
holds(T, P) :- holds(T, q(exists, var(A), Phi)), repl(var(B), false, Phi, Phi), repl(var(A), var(B), Phi, PhiB), holds([PhiB|T], P).
Future work
- better rule representation?
- proper prover with a different construction mechanism?
- different use cases?
- more computational power?