SLIDE 1
Machine Learning on Knowledge Bases
Marcus Pfeiffer
19th June 2018

Outline:
1. Introduction
2. Data Preprocessing
3. Machine Learning Algorithms for Premise Selection
4. Premise Selectors in Detail
5. Comparison and Outlook
SLIDE 3
Introduction
SLIDE 10
Knowledge Bases and Provers

• Knowledge Bases
  • Mizar Mathematical Library (MML)
  • Library of Isabelle/HOL
  • SUMO
  • Cyc
• Automatic Theorem Provers
  • E
  • Vampire
  • CVC4
  • SPASS
  • Z3
• Integration of automatic provers in interactive provers
SLIDE 12
Premise Selection Problem

Definition (Premise Selection Problem)
Given an ATP A, a large set of premises P, and a conjecture c, predict those premises from P that are likely to be useful to A for constructing a proof of c.
SLIDE 14
Introduction Example

map f xs = ys ⇒ zip (rev xs) (rev ys) = rev (zip xs ys)

A proof can be found using the following two lemmas:

length xs = length ys ⇒ zip (rev xs) (rev ys) = rev (zip xs ys)   (1)

length (map f xs) = length xs   (2)
SLIDE 16
Introduction Example

used [] = used evs

A straightforward proof uses the following lemmas:

used [] = ⋃_B parts (initState B)   (3)

X ∈ parts (initState B) ⇒ X ∈ used evs   (4)

(⋀x. x ∈ A ⇒ x ∈ B) ⇒ A ⊆ B   (5)

b ∈ ⋃_{x∈A} B x ⟷ (∃x ∈ A. b ∈ B x)   (6)
SLIDE 17
Data Preprocessing
SLIDE 18
Dependency
Definition (Dependency)
A definition or theorem T depends on a definition, lemma, or theorem T′ if T′ is needed for the proof of T.
SLIDE 19
Theory Dependency Graph of Free Groups

[Figure: theory dependency graph of the Free Groups formalization; theories C2, Cancelation, FreeGroups, Generators, Isomorphisms, PingPongLemma, and UnitGroup, together with imported sessions such as HOL, HOL-Algebra, HOL-Library, and Pure]
SLIDE 22
Example of Feature Extraction

For a given depth of 2, the term g (h x a) with x ∶∶ τ yields the structural features

x, a, g, g(h), h, h(x), h(a), h(x, a)

which are simplified (variables replaced by their types) to

τ, a, g, g(h), h, h(τ), h(a), h(τ, a)
SLIDE 23
Another Example of Feature Extraction

transpose (map (map f) xss) = map (map f) (transpose xss) has the features:

map, map(list list), fun, map(fun), map(map, list list), list, map(map), transpose, list list, map(transpose), transpose(map), List, map(map, transpose), transpose(list list)
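A minimal sketch of this kind of depth-bounded feature extraction, assuming terms are given as nested (symbol, arguments) pairs and variables carry their type as in "x:tau"; the term representation and helper names are illustrative, not the actual extractor's API:

```python
from itertools import chain, combinations

def subsets(xs):
    """All subsets of xs, including the empty one."""
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def pat(term, depth):
    """Single pattern of a term, truncated at the given depth;
    variables ("x:tau") are replaced by their type ("tau")."""
    sym, args = term
    head = sym.split(":")[-1]
    if depth <= 1 or not args:
        return head
    return f"{head}({','.join(pat(a, depth - 1) for a in args)})"

def features(term, depth=2):
    """For every subterm: the head symbol applied to each subset of its
    (depth-limited) argument patterns, plus the features of the arguments."""
    sym, args = term
    head = sym.split(":")[-1]
    feats = set()
    for sub in subsets([pat(a, depth - 1) for a in args]):
        feats.add(head if not sub else f"{head}({','.join(sub)})")
    for a in args:
        feats |= features(a, depth)
    return feats

# g (h x a) with x :: tau
term = ("g", [("h", [("x:tau", []), ("a", [])])])
print(sorted(features(term)))
# ['a', 'g', 'g(h)', 'h', 'h(a)', 'h(tau)', 'h(tau,a)', 'tau']
```

On the example above this reproduces exactly the simplified feature set τ, a, g, g(h), h, h(τ), h(a), h(τ, a).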
SLIDE 24
Stored Information
For every proved fact c, the triple (c, usedPremises(c), F(c)) is stored: the fact itself, its proof dependencies, and its extracted features.
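As a data structure, one entry of the learning corpus could look like this (a sketch; the field names mirror the notation above, and the example fact names are made up for illustration):

```python
from typing import NamedTuple, FrozenSet

class Fact(NamedTuple):
    name: str                      # the fact c
    used_premises: FrozenSet[str]  # usedPremises(c): names of proof dependencies
    features: FrozenSet[str]       # F(c): extracted features

example = Fact("rev_rev_ident",
               frozenset({"rev.simps", "append_Nil2"}),
               frozenset({"rev", "rev(rev)", "list"}))
```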
SLIDE 25
Machine Learning Algorithms for Premise Selection
SLIDE 29
k-Nearest Neighbors

"Nearness" of two formulas a and b in terms of shared features:

n(a, b) = ∑_{t ∈ F(a) ∩ F(b)} w(t)^τ1   (7)

N := the k nearest neighbors of c

Relevance_c(p) = ( τ2 ∑_{q ∈ N, p ∈ usedPremises(q)} n(q, c) / |usedPremises(q)| ) + { n(p, c) if p ∈ N; 0 otherwise }   (8)
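A sketch of equations (7) and (8) in code, assuming the corpus is given as plain dictionaries; the feature weighting w and the parameters k, τ1, τ2 are left as inputs, and their defaults here are arbitrary rather than taken from the source:

```python
def nearness(fa, fb, w, tau1):
    """Eq. (7): weighted overlap of the feature sets of two formulas."""
    return sum(w(t) ** tau1 for t in fa & fb)

def knn_relevance(conj_feats, facts, deps, w, k=32, tau1=6.0, tau2=2.7):
    """Eq. (8): score candidate premises for a conjecture with features
    conj_feats. facts: {name: feature set}; deps: {name: usedPremises}."""
    n = {q: nearness(conj_feats, fq, w, tau1) for q, fq in facts.items()}
    neighbours = sorted(n, key=n.get, reverse=True)[:k]
    scores = {}
    for q in neighbours:
        # every dependency of a near neighbour inherits part of its nearness
        share = tau2 * n[q] / max(len(deps[q]), 1)
        for p in deps[q]:
            scores[p] = scores.get(p, 0.0) + share
        # a premise that is itself a neighbour additionally scores n(p, c)
        scores[q] = scores.get(q, 0.0) + n[q]
    return sorted(scores.items(), key=lambda kv: -kv[1])
```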
SLIDE 32
Naive Bayes

P(p is used in the proof of c)   (9)

≈ P(p is used to prove c′ | c′ has features F(c))   (10)

∝ P(p is used in the proof of c′)
  · ∏_{t ∈ F(c) ∩ F̄(p)} P(c′ has feature t | p is used in the proof of c′)
  · ∏_{t ∈ F(c) − F̄(p)} P(c′ has feature t | p is not used in the proof of c′)
  · ∏_{t ∈ F̄(p) − F(c)} P(c′ does not have feature t | p is used in the proof of c′)   (11)
SLIDE 38
Naive Bayes

• r(q) = number of times a fact q occurs as a dependency
• s(q, t) = number of times a fact q occurs as a dependency of a fact described by feature t
• K = total number of known proofs

P(p is used in the proof of c′) = r(p) / K   (12)

P(c′ has feature t | p is used in the proof of c′) = s(p, t) / r(p)   (13)
SLIDE 39
Naive Bayes

With weights σ1, ..., σ4 and F̄(p) := {t : s(p, t) > 0}, the relevance is computed in log space:

Relevance_c(p) = σ1 ln r(p)
  + ∑_{t ∈ F(c) ∩ F̄(p)} w(t) ln( σ2 s(p, t) / r(p) )
  + σ3 ∑_{t ∈ F̄(p) − F(c)} w(t) ln( 1 − s(p, t) / r(p) )
  + σ4 ∑_{t ∈ F(c) − F̄(p)} w(t)
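The same relevance score as a sketch in code; the smoothing against log(0) and the default weights σ1, ..., σ4 are assumptions for illustration, not values from the source:

```python
from math import log

def nb_relevance(conj_feats, p, r, s, w,
                 sigma1=4.0, sigma2=0.1, sigma3=2.0, sigma4=-10.0):
    """Naive Bayes relevance of premise p for a conjecture with features
    conj_feats. r: {fact: dependency count}; s: {(fact, feature): count}.
    F-bar(p) is taken to be the features t with s(p, t) > 0."""
    ext = {t for (q, t) in s if q == p and s[(q, t)] > 0}
    score = sigma1 * log(r[p])
    for t in conj_feats & ext:
        score += w(t) * log(sigma2 * s[(p, t)] / r[p])
    for t in ext - conj_feats:
        # smoothed so that s(p, t) == r(p) does not yield log(0)
        score += sigma3 * w(t) * log(max(1 - s[(p, t)] / r[p], 1e-9))
    for t in conj_feats - ext:
        score += sigma4 * w(t)  # penalty term; sigma4 < 0 is an assumption
    return score
```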
SLIDE 40
Premise Selectors in Detail
SLIDE 42
MePo

For a set K of known symbols (initially the symbols of the conjecture), let
  k = number of known symbols occurring in the fact
  u = number of unknown symbols occurring in the fact

1. For each fact, compute its score k/(k + u).
2. Select the facts with the highest score and add their symbols to the set of known symbols K.
Repeat with the enlarged K until enough facts are selected; see the sketch below.
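A simplified sketch of this loop: the real MePo relaxes a score threshold from round to round, whereas this version just takes a fixed number of top-scoring facts per round (rounds and take are illustrative parameters):

```python
def mepo_select(conj_symbols, facts, rounds=5, take=16):
    """Iterative symbol-based selection. facts: {name: set of symbols}."""
    known = set(conj_symbols)
    selected, remaining = [], dict(facts)
    for _ in range(rounds):
        if not remaining:
            break
        def score(name):
            syms = remaining[name]
            k = len(syms & known)                  # known symbols in the fact
            return k / len(syms) if syms else 0.0  # k / (k + u)
        best = sorted(remaining, key=score, reverse=True)[:take]
        for name in best:
            known |= remaining.pop(name)           # selected facts extend K
        selected.extend(best)
    return selected
```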
SLIDE 46
MaSh, MeSh

• MaSh: Machine learning for Sledgehammer, based on k-NN and naive Bayes
• MeSh: combination of MaSh and a MePo-like selector
• In practice, both are combined with a proximity selector
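One plausible way to realise such a combination is a weighted merge of the two rankings (a sketch, not Sledgehammer's actual implementation; the weight is an assumed parameter):

```python
def mesh(mash_ranking, mepo_ranking, weight=0.5):
    """Merge two ranked premise lists by weighted average rank;
    a premise missing from one list is treated as ranked at its end."""
    pos_a = {p: i for i, p in enumerate(mash_ranking)}
    pos_b = {p: i for i, p in enumerate(mepo_ranking)}
    def combined(p):
        return (weight * pos_a.get(p, len(mash_ranking))
                + (1 - weight) * pos_b.get(p, len(mepo_ranking)))
    return sorted(set(mash_ranking) | set(mepo_ranking), key=combined)
```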
SLIDE 47
Comparison and Outlook
SLIDE 49
Metrics

For premises p1, ..., pn selected by a selection algorithm:

Full recall: min { k ∈ ℕ : usedPremises(c) ⊆ {p1, ..., pk} }

k-Coverage: |{p1, ..., pk} ∩ usedPremises(c)| / min{ k, |usedPremises(c)| }
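Both metrics are straightforward to compute from a ranked premise list (a sketch; ranked is the selector's output, used is usedPremises(c)):

```python
def full_recall(ranked, used):
    """Smallest k such that {p1, ..., pk} covers usedPremises(c)."""
    covered = set()
    for k, p in enumerate(ranked, start=1):
        covered.add(p)
        if used <= covered:
            return k
    return None  # some needed premise was never selected

def k_coverage(ranked, used, k):
    """|{p1, ..., pk} ∩ usedPremises(c)| / min(k, |usedPremises(c)|)."""
    return len(set(ranked[:k]) & used) / min(k, len(used))
```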
SLIDE 50
Comparison in Metric
Formalization   MePo   MaSh NB   MaSh kNN   MeSh NB   MeSh kNN
Auth             647       104        143        96        112
IsaFoR          1332       513        604       517        570
Jinja            839       244        306       229        256
List            1083       234        263       259        271
Nominal2        1045       220        276       229        264
Probability     1324       424        422       393        395

Figure 1: Average full recall
SLIDE 51
Comparison in Metric
Figure 2: k-Coverage for IsaFoR
SLIDE 52
Comparison in Performance
Figure 3: Success rates
SLIDE 57
Summary and Outlook

• Fact selection is an important problem in automatic theorem proving
• MePo performs well on problems with rare symbols
• Fine-grained dependency analysis and good feature extraction enable the successful application of machine learning algorithms
• MaSh and MeSh outperform MePo in terms of the chosen metrics
• Similar results for SNoW and MOR compared to Vampire-SInE on the MML