Learning theorem proving through self-play - Stanisław Purgał

SLIDE 1

Learning theorem proving through self-play

Stanisław Purgał

SLIDE 2

The goal

Learn to prove theorems without:

  • any proofs
  • any theorems

What we get:

  • a list of axioms defining the logic

SLIDE 3

Overview

  • AlphaZero (briefly)
  • Proving game
  • adjusting MCTS for the proving game
  • some results

SLIDE 4

Neural black box

  • input: game state $S$
  • output: expected outcome $v \in \mathbb{R}$
  • output: move policy $\pi \in \mathbb{R}^n$

SLIDE 5

Neural black box

$(S_1, \pi_1, v_1), \ldots, (S_n, \pi_n, v_n)$

SLIDE 6

Monte-Carlo Tree Search

  • input: game state $S$
  • output: expected outcome $v \in \mathbb{R}$
  • output: move policy $\pi \in \mathbb{R}^n$

SLIDE 7

Monte-Carlo Tree Search

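Node S holds a policy π and a value v (a weighted average over its children S1, S2, S3).

Choose a child according to the formula:

$$c \cdot \frac{\sqrt{n}}{n_i} \cdot \pi_i + v_i$$

$$c = \log\frac{n + c_{\text{base}} + 1}{c_{\text{base}}} + c_{\text{init}}$$

  • $c_{\text{base}} = 19652$
  • $c_{\text{init}} = 1.25$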
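
A minimal Python sketch of the child-selection rule on this slide. The exploration term is an assumption: it follows the form in AlphaZero's published pseudocode, sqrt(n) / (n_i + 1), where the `+ 1` keeps unvisited children well defined. The function names are hypothetical.

```python
import math

C_BASE = 19652
C_INIT = 1.25


def exploration_constant(n, c_base=C_BASE, c_init=C_INIT):
    """c = log((n + c_base + 1) / c_base) + c_init, with n the parent's visit count."""
    return math.log((n + c_base + 1) / c_base) + c_init


def selection_score(n, n_i, pi_i, v_i):
    """Score of child i: exploration bonus weighted by the prior pi_i, plus its value v_i."""
    c = exploration_constant(n)
    # Assumption: sqrt(n) / (n_i + 1), as in AlphaZero's pseudocode.
    return c * math.sqrt(n) / (n_i + 1) * pi_i + v_i


def choose_child(n, children):
    """children is a list of (n_i, pi_i, v_i); returns the index with the highest score."""
    scores = [selection_score(n, n_i, pi_i, v_i) for n_i, pi_i, v_i in children]
    return max(range(len(children)), key=scores.__getitem__)
```

With equal priors and values, the less-visited child gets the higher score; with equal visit counts, the child with the higher value wins.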

SLIDE 10

Closing the loop

  • play lots of games
  • choose moves randomly, according to the MCTS policy
  • use finished games for training:
      • target value is the result of the game
      • target policy is the MCTS policy
  • also add noise to the neural network output to increase exploration
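
The loop above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the result is assumed to be expressed from one fixed perspective, and the exploration noise is left out.

```python
import random


def sample_move(mcts_policy, rng):
    """Pick a move index at random, with probabilities given by the MCTS policy."""
    moves = range(len(mcts_policy))
    return rng.choices(moves, weights=mcts_policy, k=1)[0]


def training_samples(states, mcts_policies, result):
    """Turn one finished game into (state, target policy, target value) triples.

    The target policy is the MCTS policy at that state; the target value is
    the final result of the game.
    """
    return [(s, pi, result) for s, pi in zip(states, mcts_policies)]
```

Each finished game thus contributes one training triple per visited state.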

SLIDE 11

Proving game

theorem → Prove the theorem → lose / win

SLIDE 12

Proving game

Construct a theorem → Prove the theorem → Adversary wins / Prover wins

SLIDE 13

Prolog-like proving

$$\frac{A \vdash X \qquad A \vdash Y}{A \vdash X \land Y} \tag{1}$$

holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)    (2)

SLIDE 14

Prolog-like proving

[ X:A ⊢ X ∧ ¬¬X , ... ]

A ⊢ X ∧ Y :- A ⊢ X, A ⊢ Y

X:A ⊢ X ∧ ¬¬X :- X:A ⊢ X, X:A ⊢ ¬¬X

[ X:A ⊢ X, X:A ⊢ ¬¬X , ... ]

SLIDE 15

Prolog-like proving

[ holds(X:A, and(X, not(not(X)))) , ... ]

holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)

holds(X:A, and(X, not(not(X)))) :- holds(X:A, X), holds(X:A, not(not(X)))

[ holds(X:A, X), holds(X:A, not(not(X))) , ... ]
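
The proving step on this slide (unify the first goal with a rule head, then replace it with the rule body) can be sketched in Python. Everything here is an assumption made for illustration: terms are nested tuples, variables are strings starting with an uppercase letter, `X:A` is written as `("cons", "X", "A")`, and there is no occurs check.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def walk(t, s):
    """Follow variable bindings in substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    """Return a substitution extending s that makes a and b equal, or None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None


def substitute(t, s):
    t = walk(t, s)
    return tuple(substitute(x, s) for x in t) if isinstance(t, tuple) else t


def rename(t, suffix):
    """Rename rule variables apart from goal variables."""
    if is_var(t):
        return t + suffix
    return tuple(rename(x, suffix) for x in t) if isinstance(t, tuple) else t


def prove_step(goals, rule, suffix):
    """Replace the first goal with the rule body if the rule head unifies with it."""
    head = rename(rule[0], suffix)
    body = [rename(b, suffix) for b in rule[1]]
    s = unify(head, goals[0], {})
    if s is None:
        return None
    return [substitute(g, s) for g in body + goals[1:]]


AND_RULE = (("holds", "A", ("and", "X", "Y")),
            [("holds", "A", "X"), ("holds", "A", "Y")])
```

A caller would pass a fresh `suffix` at every step so that renamed rule variables never clash with variables introduced earlier.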

SLIDE 16

Prolog-like theorem constructing

[ holds(X:A, and(X, not(not(X)))) , ... ]

holds(X:A, and(X, not(not(X)))) :- holds(X:A, X), holds(X:A, not(not(X)))

holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)

[ holds(X:A, X), holds(X:A, not(not(X))) , ... ]

bad idea

SLIDE 17

Prolog-like theorem constructing

[ holds(A, ♣) , ... ]

holds(A, ♣) :- holds(A, or(♦, ♥)), holds(A, implies(♦, ♣)), holds(A, implies(♥, ♣))

holds(A, Z) :- holds(A, or(X, Y)), holds(A, implies(X, Z)), holds(A, implies(Y, Z))

[ , , , ... ]

bad idea

SLIDE 18

Prolog-like theorem constructing

[ T ]

holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)

holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)

[ holds(A, X), holds(A, Y) ]

SLIDE 19

Prolog-like theorem constructing

T

holds(X:A, and(X, not(not(X))))

holds(x:a, and(x, not(not(x))))

SLIDE 20

Forcing termination of the game

Step limit:

  • ugly extension of game state
  • strategy may depend on number of steps left
  • even if we hide it, there is a correlation:

large term constructed ∼ few steps left ∼ will likely lose

SLIDE 21

Forcing termination of the game

Sudden death chance:

  • game states nicely equal
  • no hard limit on the length of a theorem

During a training playout, randomly terminate the game with probability $p_d$. In MCTS, adjust the value:

$$v' = (-1) \cdot p_d + v \cdot (1 - p_d)$$
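
A small Python sketch of the sudden-death mechanism, using the value adjustment stated on this slide. The function names are hypothetical.

```python
import random


def adjusted_value(v, p_d):
    """v' = (-1) * p_d + v * (1 - p_d): blend the value with a loss of probability p_d."""
    return -1.0 * p_d + v * (1.0 - p_d)


def sudden_death(p_d, rng):
    """Return True if the training playout is terminated at this step."""
    return rng.random() < p_d
```

A position that is already lost keeps the value −1, while a sure win with `p_d = 0.1` drops from 1 to 0.8.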

SLIDE 22

Disadvantages of this game

  • two different players - if one player starts winning every game, we can’t learn much
  • proofs use single inference steps - inefficient
  • players don’t take turns - MCTS is not designed for that situation

SLIDE 23

Not using maximum

SLIDE 30

Certainty propagation

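for uncertain leaves:

$$v = a = \hat{v}, \quad l = -1, \quad u = 1$$

for certain leaves:

$$v = a = l = u = \text{result}$$

recursively:

$$v = \min(u, \max(l, a)), \quad a = \frac{\hat{v} + \sum_i v_i \cdot n_i}{n + 1}, \quad l = \max_i l_i, \quad u = \max_i u_i$$

($\hat{v}$ is the neural network's value estimate for the node)

when player changes:

  • values and bounds flip sign
  • lower and upper bound switch places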
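
A hedged Python sketch of these update rules. The term in front of the sum is unreadable in the source; the sketch assumes it is the node's own network value estimate (`net_value`). The class and function names are hypothetical, and `children` is assumed to cover every legal move.

```python
from dataclasses import dataclass


@dataclass
class Node:
    v: float  # value used by the search
    l: float  # lower bound on the true outcome
    u: float  # upper bound on the true outcome
    n: int    # visit count


def uncertain_leaf(net_value):
    # Nothing is proven yet: the bounds span the whole range [-1, 1].
    return Node(v=net_value, l=-1.0, u=1.0, n=1)


def certain_leaf(result):
    # The game is decided: value and both bounds equal the result.
    return Node(v=result, l=result, u=result, n=1)


def flip(node):
    """View a node from the other player's side: negate, and swap the bounds."""
    return Node(v=-node.v, l=-node.u, u=-node.l, n=node.n)


def backup(net_value, children):
    """Recompute a node from its children (same player to move in all of them)."""
    n = sum(c.n for c in children)
    # Assumption: the average includes the node's own network estimate.
    a = (net_value + sum(c.v * c.n for c in children)) / (n + 1)
    l = max(c.l for c in children)
    u = max(c.u for c in children)
    return Node(v=min(u, max(l, a)), l=l, u=u, n=n + 1)
```

One proven winning child is enough to lift the lower bound, and with it the value, to 1, whatever the average says.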

SLIDE 31

Learning the proving game

Like AlphaZero, with a few differences:

  • using Transformer (encoder) for
  • for theorems that the prover failed to prove, show the proper path with additional training samples
  • during evaluation, greedy policy and step limit instead of sudden death
  • balance training batches to have an even split of won and lost games
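
The batch-balancing point can be illustrated with a short Python sketch. It assumes training samples are kept in two pools, one from won games and one from lost games, and draws with replacement so that a small pool still fills its half. The function name and the pool layout are hypothetical.

```python
import random


def balanced_batch(won, lost, batch_size, rng):
    """Draw a batch with an even split between samples from won and lost games."""
    half = batch_size // 2
    # Sampling with replacement, so a pool smaller than `half` still works.
    batch = rng.choices(won, k=half) + rng.choices(lost, k=batch_size - half)
    rng.shuffle(batch)
    return batch
```

Without such balancing, a side that wins most games would dominate the batches and the value targets would be nearly constant.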

SLIDE 32

Proving game evaluation

Construct a theorem / evaluation theorem → Prove the theorem → Adversary wins / Prover wins

SLIDE 33

Potential problems

Players are not symmetric:

  • Prover could be winning everything
  • Adversary could be winning everything

to some extent this is handled by additional training samples

can be solved by more exploration

SLIDE 34

Uninteresting space of hard theorems

∃x f(x) = y (where f is a one-way function)

  • easy to prove if you can choose what y is
  • hard to prove if y is fixed

so hard that we can’t expect the prover to learn it

this is stable - more learning and/or exploration won’t help

SLIDE 35

Results

(intuitionistic first-order - sequent calculus)

[Plot: solved theorems over time (hours)]

SLIDE 36

Results

Solved:

⊢ (∀a ∀b pc(fc(a, b)) → ∃d ∃e pc(fc(d, e)))

⊢ (¬(pa(∅) → pb(∅)) → (pb(∅) → pa(∅)))

Unsolved:

⊢ (∃a pb(a) → ∃c pb(c))

SLIDE 37

Results

(intuitionistic first-order - sequent calculus)

[Plot: percentage (0% to 100%) over time; series: construction failed, proven, not proven]

SLIDE 38

Results

(intuitionistic first-order - sequent calculus)

unproven theorems - first hour:

A, ⊥ ⊢ C

⊢ (⊥ → B)

(A → B), A ⊢ B

A, B, C, D, E, F, G, H ⊢ H

A, B, C, D, E, F, G, H, I, J, K, L, M ⊢ M

A, B, C, D, E, F, G, H, I ⊢ I

SLIDE 39

Results

(intuitionistic first-order - sequent calculus)

unproven theorems - second hour:

∀aΩaC ⊢ ΩaC

⊢ (B ∨ (¬⊥ ∨ C))

(A ∧ ΩcΩeF) ⊢ ∃eΩcΩeF

(A ∧ B) ⊢ (D → B)

(A ∧ B) ⊢ (D ∨ A)

⊢ ((B ∧ (C ∧ D)) → C)

SLIDE 40

Results

(intuitionistic first-order - sequent calculus)

unproven theorems - third hour:

∀a(ΩcΩaE ∧ Ωg(ΩaJ ⋆ ΩaL)) ⊢ Ωg(ΩaJ ⋆ ΩaL)

A, B, C, D, E, F, G, ((H ∧ ⊥) ∧ I) ⊢ ¬K

A, B, C, D, E, F, G, H, ⊥ ⊢ (J ∨ K)

A, ¬B, C, (D ∧ B) ⊢ (F ∨ G)

∀a(pb(fc(fd(a, ∅), ∅)) ∧ ⊥), ¬¬E ⊢ ∃gΩgI

A, B, ¬C, D, E, (C ∧ F) ⊢ (H ↔ ¬⊥)

SLIDE 41

Results

(intuitionistic first-order - sequent calculus)

unproven theorems - twelfth hour:

A, B, (∀cΩe(ΩcH ⋆ ¬¬¬¬ΩjΩl¬(¬⊥ ⋆ (¬¬(⊥ ⋆ ΩcQ) ⋆ ¬¬ΩcS))) ↔ A) ⊢ Ωe(ΩcH ⋆ ¬¬¬¬ΩjΩl¬(¬⊥ ⋆ (¬¬(⊥ ⋆ ΩcQ) ⋆ ¬¬ΩcS)))

A, B, (∀cX ↔ A) ⊢ X

SLIDE 42

How to do better

  • train longer and/or harder
    (costly)
  • delegate low-level reasoning to some more efficient solver
    (need to invent some other mechanism for generating theorems)
  • allow use of theorems, not only axioms
    (action space becomes large and changes over time)

all of the above still face the uninteresting theorem space

  • use some other objective
    (it would be nice to find theorems that are useful in proving other theorems, but how exactly would that work?)

SLIDE 43

Thank you for your attention!

Stanisław Purgał