

SLIDE 1

mædmax at School
Learning Selection in Equational Reasoning

Sarah Winkler, University of Innsbruck

4th Conference on Artificial Intelligence and Theorem Proving, 10 April 2019


SLIDE 5

Equational Theorem Proving

  0 + x ≈ x
  (−x) + x ≈ 0
  x + (y + z) ≈ (x + y) + z
  x + y ≈ y + x
  x · (y · z) ≈ (x · y) · z
  (x + y) · z ≈ (x · z) + (y · z)

  a · 0 ≈ 0    [figure: the prover takes the equations and this example goal and answers YES, NO, or times out]

• input: a set of equations E0 and a goal s ≈ t
• output: YES if E0 ⊨ s ≈ t, NO otherwise

mædmax

• equational theorem proving tool based on maximal ordered completion (maedmax: equational deduction maximized)

• S. Winkler and G. Moser. Maedmax: A Maximal Ordered Completion Tool. In Proc. 9th IJCAR, LNCS 10900, pp. 472–480, 2018.

SLIDE 6

Content

• Maximal Ordered Completion
• Learning Experiments
  ◮ Equation Selection
  ◮ Proof Progress
• Conclusion


SLIDE 14

Maximal Ordered Completion

[figure: loop (E, G) → R ⊆ E± → success? → yes: done / no: E := E ∪ SE, G := G ∪ SG]

  0  initialize equations and goals (E, G) to (E0, {s ≈ t})
  1  get a terminating rewrite system R ⊆ E± from E (found by a maxSMT solver)
  2  check for a joinable goal or saturation; on success, stop
  3  add some critical pairs SE ⊆ CP>(R ∪ E) and SG ⊆ CP>(R ∪ E, G),
     i.e. E := E ∪ SE and G := G ∪ SG, and repeat from 1
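The loop above can be sketched on a toy instance: ordered completion for string rewriting (a special case of equational reasoning), with the shortlex order standing in for the reduction order that mædmax obtains from its maxSMT solver, and a naive "few smallest first" rule for step 3's selection. This is an illustrative sketch under those assumptions, not the tool's term-level implementation; note that it even mirrors the three possible outcomes YES, NO, and timeout.

```python
# steps 0-3 of the completion loop, for string rewriting
def orient(E):
    # step 1: orient each equation into a terminating rule (larger side left)
    bigger = lambda s, t: (len(s), s) > (len(t), t)        # shortlex order
    return [(l, r) if bigger(l, r) else (r, l) for (l, r) in E if l != r]

def nf(t, R):
    # rewrite to normal form; every step decreases in the shortlex order
    changed = True
    while changed:
        changed = False
        for (l, r) in R:
            if l in t:
                t = t.replace(l, r)
                changed = True
    return t

def critical_pairs(R):
    # overlapping left-hand sides give two different rewrites of one string
    cps = set()
    for (l1, r1) in R:
        for (l2, r2) in R:
            for k in range(1, min(len(l1), len(l2))):
                if l1[-k:] == l2[:k]:
                    cps.add((r1 + l2[k:], l1[:-k] + r2))
    return cps

def complete(E0, goal, max_iters=50):
    E, (s, t) = set(E0), goal                              # step 0
    for _ in range(max_iters):
        R = orient(E)                                      # step 1
        if nf(s, R) == nf(t, R):                           # step 2: joinable
            return "YES"
        new = {cp for cp in critical_pairs(R)
               if nf(cp[0], R) != nf(cp[1], R)} - E
        if not new:                                        # step 2: saturated
            return "NO"
        E |= set(sorted(new)[:4])                          # step 3: select S_E
    return "timeout"
```

For example, `complete({("ab", ""), ("ba", "")}, ("abba", ""))` answers YES, while `complete({("ab", "ba")}, ("a", "b"))` saturates and answers NO.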

SLIDE 17

Critical Choice Points

  1. finding R
  2. selection (of SE and SG)
  3. proof progress estimate

Selecting things

• selection of SE and SG is highly critical
• on average only 15% of selected equations and goals are used in the proof
⇒ learn better choice criteria

Estimating proof progress

• heuristic proof progress estimate: if stuck,
  ◮ add additional (old) equations
  ◮ or ultimately restart
⇒ learn a better estimate

SLIDE 18

it’s all about the next (small) thing



SLIDE 32

Learning Equation Selection

Features

• state: number of iterations, equations, and goals
• equation:
  ◮ hand-crafted: polarity, size, size difference, age, orientability, linearity, duplicatingness, # of matches and critical pairs on E
  ◮ term structure: ∼200 pq-gram counts

• N. Augsten, M. Böhlen, and J. Gamper. The pq-gram distance between ordered labeled trees. ACM Transactions on Database Systems, 35(1):1–36, 2010.

Example (pq-Grams)

  f(i(f(i(x), y)), i(x)) ≈ i(y)

  [figure: extended tree of the abstracted term, with ∗ dummy nodes]

• abstract symbol names to arities, variables to X
• add dummy nodes (∗)
• pq-grams: “go p − 1 down, q right”
• for p = 2, q = 1 obtain 2.*.1, 2.1.1, 2.1.*, 1.*.2, 1.2.*, 2.*.1, 2.1.X, 2.X.*, 1.*.X, 1.X.*, 1.*.X, 1.X.* and 1.*.X, 1.X.*
• count up to arity 3

Resulting feature vector (excerpt; blank entries are 0 on the slide):

    #    feature     value
    1    polarity
    2    size        10
    3    size diff    6
    4    age         42
   ...
   16    2.2.1
   17    2.*.1        2
   18    2.1.1        1
   19    2.1.0
   20    1.*.X        2
   21    1.X.*        2
   ...
  121    2.2.1
  122    2.*.1
  123    2.1.1
  124    2.1.0
  125    1.*.X        1
  126    1.X.*        1
   ...
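To make the feature concrete, here is a small sketch of pq-gram extraction in the style of Augsten et al.: each gram is a stem of p ancestor labels ending at a node, followed by a window of q child labels, with ∗ padding for the root's missing ancestors and for leaves. The slide's variant may place its dummy nodes slightly differently, so the profile below need not reproduce the slide's list symbol for symbol. The input is the already-abstracted left-hand side 2(1(2(1(X), X)), 1(X)).

```python
from collections import Counter

def pq_grams(tree, p=2, q=1):
    """pq-gram profile of an ordered labeled tree ("label", [children]).

    Per Augsten et al.: conceptually extend the tree with p-1 '*' ancestors
    above the root, q-1 '*' siblings padding each child list, and q '*'
    children below every leaf.
    """
    profile = Counter()

    def walk(node, stem):
        label, children = node
        stem = (stem + (label,))[-p:]            # p ancestors ending here
        stem = ("*",) * (p - len(stem)) + stem   # pad above the root
        if children:
            labels = ["*"] * (q - 1) + [c[0] for c in children] + ["*"] * (q - 1)
            windows = [tuple(labels[i:i + q]) for i in range(len(labels) - q + 1)]
        else:
            windows = [("*",) * q]               # leaves get q '*' children
        for w in windows:
            profile[stem + w] += 1
        for c in children:
            walk(c, stem)

    walk(tree, ())
    return profile

# abstracted lhs of f(i(f(i(x), y)), i(x)): symbols -> arities, variables -> X
term = ("2", [("1", [("2", [("1", [("X", [])]), ("X", [])])]),
              ("1", [("X", [])])])
profile = pq_grams(term)
```

The resulting counts (e.g. two occurrences each of 2.1.X and 1.X.*) then fill the pq-gram slots of the feature vector.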


SLIDE 36

Binary classification

• classify an equation as positive (useful) if it occurs in a proof, negative otherwise

Data collection

• run mædmax with different strategies, recording selections
• upon a successful proof, check the classification of all selected literals
• only 15% positives: duplicate positives for balancing

Classifier

• random forest of 100 trees with maximal depth 14
• fast evaluation; in tests slightly better recall than SVC and extra trees

Setup

• train and evaluate with scikit-learn (Python)
• export the random forest to JSON, load it into mædmax (OCaml)
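The export/evaluate step can be sketched as follows: a trained forest is serialized to JSON and re-evaluated by averaging the trees' leaf probabilities, as scikit-learn does. The tree encoding here (feat/thr/left/right at inner nodes, prob at leaves) and the feature names are assumptions for illustration, not mædmax's actual exchange format.

```python
import json

# a tiny two-tree forest, serialized to JSON as the Setup slide describes
forest_json = json.dumps([
    {"feat": "size", "thr": 12.0,
     "left": {"prob": 0.9}, "right": {"prob": 0.2}},
    {"feat": "age", "thr": 30.0,
     "left": {"prob": 0.7}, "right": {"prob": 0.3}},
])

def tree_prob(node, x):
    # descend inner nodes until a leaf probability is reached
    while "prob" not in node:
        node = node["left"] if x[node["feat"]] <= node["thr"] else node["right"]
    return node["prob"]

def p_positive(forest, x):
    # forest probability = average of the trees' leaf probabilities
    return sum(tree_prob(t, x) for t in forest) / len(forest)

forest = json.loads(forest_json)
x = {"size": 10.0, "age": 42.0}       # two features of a candidate literal
select = p_positive(forest, x) > 0.4  # the selection criterion used by M1
```

The same evaluator is easy to reimplement on the OCaml side, which is the point of exporting to a plain JSON format.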

SLIDE 46

Reinforcement loop experiment

• start with random selection M0
• collect selections S0 by running M0 with 3 different strategies
• build a classifier: M1 selects literals e with P_positive(e) > 0.4
• selections S1: run M1 with 3 different strategies, add to S0
• ...

Results (897 UEQ problems in TPTP, 60 s timeout):

        solved  useful  |Si|  precision  recall  f1
   M0   206      8%     114K    0.86      0.94   0.90
   M1   409     14%     358K    0.77      0.83   0.80
   M2   423     15%     528K    0.76      0.83   0.79
   M3   433     14%     704K    0.78      0.80   0.79

• time spent on selection rises from 0% (random) to 8-10%
• other baselines: size solves 575, fifo 366, best 609
• combination with the old strategy solves 625
• feature importance: 60% pq-grams, 40% hand-crafted; most useful: literal size, # active literals, # matches and CPs
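As a quick consistency check on the table: the f1 column is the harmonic mean of the precision and recall columns, and all four rows round to the reported values.

```python
def f1(precision, recall):
    # f1 is the harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported f1) rows of the results table
rows = [(0.86, 0.94, 0.90), (0.77, 0.83, 0.80),
        (0.76, 0.83, 0.79), (0.78, 0.80, 0.79)]
assert all(round(f1(p, r), 2) == f for p, r, f in rows)
```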


SLIDE 52

Learning Proof Progress

State features

  ◮ size of E
  ◮ # SMT checks
  ◮ cost of last maxSMT check
  ◮ memory used
  ◮ # CPs with R
  ◮ # facts in E reducible by last R
  ◮ ...

Data collection

• collect TSTP proofs from mædmax, E, and Vampire
• add a proof tracking mode to mædmax: given an input proof P, check progress with respect to P in every iteration (progress: unseen literals enter the passive set, passive literals enter the active set)
• store the difference of the proof state vectors between two iterations, along with the progress classification
• this yields data for about 20K iterations
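The tracking mode can be sketched as follows, using the slide's notion of progress: an iteration counts as progress when a so-far-unseen proof literal enters the passive set, or a passive proof literal becomes active. The function name and the per-iteration (active, passive) snapshots are hypothetical, not mædmax's API.

```python
def track_progress(proof_lits, iterations):
    """Label each iteration True (progress) or False (no progress).

    proof_lits: the set of literals occurring in the known input proof P;
    iterations: one (active, passive) pair of literal sets per iteration.
    """
    seen_passive, seen_active = set(), set()
    labels = []
    for active, passive in iterations:
        new_passive = (proof_lits & passive) - seen_passive  # fresh in passive
        new_active = (proof_lits & active) - seen_active     # promoted to active
        labels.append(bool(new_passive or new_active))
        seen_passive |= new_passive
        seen_active |= new_active
    return labels
```

Pairing these labels with the differences of consecutive state-feature vectors gives the training data described above.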


SLIDE 55

Classifier

• random forest of 100 trees with maximal depth 10
• binary classification: progress or no progress

Evaluation

• cross-validated precision and recall of 0.72
• a new decision tree designed manually from the most influential features gains 1.5% new problems with the best strategy

Overall gain

• +4.5% solved problems when combining the new selection classifier with the old selection and adding the new progress estimate


SLIDE 58

What worked

• a bit more

What did not work

• taking symbol names into account: TPTP problems too diverse?

What’s next

• more data
• more/other features
  ◮ more state features?
  ◮ longer pq-grams, or vertical ENIGMA features?
• more experiments
  ◮ how to combine with the previous selection strategy?
  ◮ use the proof progress estimate also for restarts?
• ...