SLIDE 1
Machine Learning on Knowledge Bases
Marcus Pfeiffer
19th June 2018

Outline:
1. Introduction
2. Data Preprocessing
3. Machine Learning Algorithms for Premise Selection
4. Premise Selectors in Detail
5. Comparison and Outlook
SLIDE 3
Introduction
SLIDE 10
Knowledge Bases and Provers

• Knowledge Bases
  • Mizar Mathematical Library (MML)
  • Library of Isabelle/HOL
  • SUMO
  • Cyc
• Automatic Theorem Provers
  • E
  • Vampire
  • CVC4
  • SPASS
  • Z3
• Integration of automatic provers in interactive provers
SLIDE 12
Premise Selection Problem

Definition (Premise Selection Problem)
Given an ATP A, a large set of premises P, and a conjecture c, predict those premises from P that are likely to be useful to A for constructing a proof of c.
SLIDE 14
Introduction Example

map f xs = ys ⇒ zip (rev xs) (rev ys) = rev (zip xs ys)

A proof can be found using the following two lemmas:

length xs = length ys ⇒ zip (rev xs) (rev ys) = rev (zip xs ys)   (1)

length (map f xs) = length xs   (2)
SLIDE 16
Introduction Example

used [] = used evs

A straightforward proof uses the following lemmas:

used [] = ⋃_B parts (initState B)   (3)

X ∈ parts (initState B) ⇒ X ∈ used evs   (4)

(⋀x. x ∈ A ⇒ x ∈ B) ⇒ A ⊆ B   (5)

b ∈ ⋃_{x∈A} B x ⟷ (∃x ∈ A. b ∈ B x)   (6)
SLIDE 17
Data Preprocessing
SLIDE 18
Dependency
Definition (Dependency)
A definition or theorem T depends on a definition, lemma, or theorem T′ if T′ is needed for the proof of T.
SLIDE 19
Theory Dependency Graph of Free Groups

[Figure: theory dependency graph of the Free Groups formalization; theories C2, Cancelation, FreeGroups, Generators, Isomorphisms, PingPongLemma, and UnitGroup, together with imported sessions such as HOL, HOL-Algebra, HOL-Library, and Pure]
SLIDE 22
Example of Feature Extraction

For a given depth of 2, the term g (h x a) with x ∶∶ τ yields the structural features

x, a, g, g(h), h, h(x), h(a), h(x, a)

which are simplified (variables replaced by their types) to

τ, a, g, g(h), h, h(τ), h(a), h(τ, a)
SLIDE 23
Another Example of Feature Extraction

transpose (map (map f) xss) = map (map f) (transpose xss) has the features:

map, map(list list), fun, map(fun), map(map, list list), list, map(map), transpose, list list, map(transpose), transpose(map), List, map(map, transpose), transpose(list list)
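A minimal sketch of this kind of depth-bounded feature extraction, assuming terms are given as nested (symbol, arguments) pairs and variables carry their type as in "x:tau"; the term representation and helper names are illustrative, not the actual extractor's API:

```python
from itertools import chain, combinations

def subsets(xs):
    """All subsets of xs, including the empty one."""
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def pat(term, depth):
    """Single pattern of a term, truncated at the given depth;
    variables ("x:tau") are replaced by their type ("tau")."""
    sym, args = term
    head = sym.split(":")[-1]
    if depth <= 1 or not args:
        return head
    return f"{head}({','.join(pat(a, depth - 1) for a in args)})"

def features(term, depth=2):
    """For every subterm: the head symbol applied to each subset of its
    (depth-limited) argument patterns, plus the features of the arguments."""
    sym, args = term
    head = sym.split(":")[-1]
    feats = set()
    for sub in subsets([pat(a, depth - 1) for a in args]):
        feats.add(head if not sub else f"{head}({','.join(sub)})")
    for a in args:
        feats |= features(a, depth)
    return feats

# g (h x a) with x :: tau
term = ("g", [("h", [("x:tau", []), ("a", [])])])
print(sorted(features(term)))
# ['a', 'g', 'g(h)', 'h', 'h(a)', 'h(tau)', 'h(tau,a)', 'tau']
```

On the example above this reproduces exactly the simplified feature set τ, a, g, g(h), h, h(τ), h(a), h(τ, a).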
SLIDE 24
Stored Information
For every proved fact c, the triple (c, usedPremises(c), F(c)) is stored: the fact itself, its proof dependencies, and its extracted features.
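As a data structure, one entry of the learning corpus could look like this (a sketch; the field names mirror the notation above, and the example fact names are made up for illustration):

```python
from typing import NamedTuple, FrozenSet

class Fact(NamedTuple):
    name: str                      # the fact c
    used_premises: FrozenSet[str]  # usedPremises(c): names of proof dependencies
    features: FrozenSet[str]       # F(c): extracted features

example = Fact("rev_rev_ident",
               frozenset({"rev.simps", "append_Nil2"}),
               frozenset({"rev", "rev(rev)", "list"}))
```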
SLIDE 25
Machine Learning Algorithms for Premise Selection
SLIDE 29
k-Nearest Neighbors

"Nearness" of two formulas a and b in terms of shared features:

n(a, b) = ∑_{t ∈ F(a) ∩ F(b)} w(t)^τ1   (7)

N := the k nearest neighbors of c

Relevance_c(p) = ( τ2 ∑_{q ∈ N, p ∈ usedPremises(q)} n(q, c) / |usedPremises(q)| ) + { n(p, c) if p ∈ N; 0 otherwise }   (8)
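A sketch of equations (7) and (8) in code, assuming the corpus is given as plain dictionaries; the feature weighting w and the parameters k, τ1, τ2 are left as inputs, and their defaults here are arbitrary rather than taken from the source:

```python
def nearness(fa, fb, w, tau1):
    """Eq. (7): weighted overlap of the feature sets of two formulas."""
    return sum(w(t) ** tau1 for t in fa & fb)

def knn_relevance(conj_feats, facts, deps, w, k=32, tau1=6.0, tau2=2.7):
    """Eq. (8): score candidate premises for a conjecture with features
    conj_feats. facts: {name: feature set}; deps: {name: usedPremises}."""
    n = {q: nearness(conj_feats, fq, w, tau1) for q, fq in facts.items()}
    neighbours = sorted(n, key=n.get, reverse=True)[:k]
    scores = {}
    for q in neighbours:
        # every dependency of a near neighbour inherits part of its nearness
        share = tau2 * n[q] / max(len(deps[q]), 1)
        for p in deps[q]:
            scores[p] = scores.get(p, 0.0) + share
        # a premise that is itself a neighbour additionally scores n(p, c)
        scores[q] = scores.get(q, 0.0) + n[q]
    return sorted(scores.items(), key=lambda kv: -kv[1])
```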
SLIDE 32
Naive Bayes

P(p is used in the proof of c)   (9)

≈ P(p is used to prove c′ | c′ has features F(c))   (10)

∝ P(p is used in the proof of c′)
  · ∏_{t ∈ F(c) ∩ F̄(p)} P(c′ has feature t | p is used in the proof of c′)
  · ∏_{t ∈ F(c) − F̄(p)} P(c′ has feature t | p is not used in the proof of c′)
  · ∏_{t ∈ F̄(p) − F(c)} P(c′ does not have feature t | p is used in the proof of c′)   (11)
SLIDE 38
Naive Bayes

• r(q) = number of times a fact q occurs as a dependency
• s(q, t) = number of times a fact q occurs as a dependency of a fact described by feature t
• K = total number of known proofs

P(p is used in the proof of c′) = r(p) / K   (12)

P(c′ has feature t | p is used in the proof of c′) = s(p, t) / r(p)   (13)
SLIDE 39
Naive Bayes

With weights σ1, ..., σ4 and F̄(p) := {t : s(p, t) > 0}, the relevance is computed in log space:

Relevance_c(p) = σ1 ln r(p)
  + ∑_{t ∈ F(c) ∩ F̄(p)} w(t) ln( σ2 s(p, t) / r(p) )
  + σ3 ∑_{t ∈ F̄(p) − F(c)} w(t) ln( 1 − s(p, t) / r(p) )
  + σ4 ∑_{t ∈ F(c) − F̄(p)} w(t)
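The same relevance score as a sketch in code; the smoothing against log(0) and the default weights σ1, ..., σ4 are assumptions for illustration, not values from the source:

```python
from math import log

def nb_relevance(conj_feats, p, r, s, w,
                 sigma1=4.0, sigma2=0.1, sigma3=2.0, sigma4=-10.0):
    """Naive Bayes relevance of premise p for a conjecture with features
    conj_feats. r: {fact: dependency count}; s: {(fact, feature): count}.
    F-bar(p) is taken to be the features t with s(p, t) > 0."""
    ext = {t for (q, t) in s if q == p and s[(q, t)] > 0}
    score = sigma1 * log(r[p])
    for t in conj_feats & ext:
        score += w(t) * log(sigma2 * s[(p, t)] / r[p])
    for t in ext - conj_feats:
        # smoothed so that s(p, t) == r(p) does not yield log(0)
        score += sigma3 * w(t) * log(max(1 - s[(p, t)] / r[p], 1e-9))
    for t in conj_feats - ext:
        score += sigma4 * w(t)  # penalty term; sigma4 < 0 is an assumption
    return score
```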
SLIDE 40
Premise Selectors in Detail
SLIDE 42
MePo

For a set K of known symbols (initially the symbols of the conjecture), let
  k = number of known symbols occurring in the fact
  u = number of unknown symbols occurring in the fact

1. For each fact, compute its score k/(k + u).
2. Select the facts with the highest score and add their symbols to the set of known symbols K.
Repeat with the enlarged K until enough facts are selected; see the sketch below.
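A simplified sketch of this loop: the real MePo relaxes a score threshold from round to round, whereas this version just takes a fixed number of top-scoring facts per round (rounds and take are illustrative parameters):

```python
def mepo_select(conj_symbols, facts, rounds=5, take=16):
    """Iterative symbol-based selection. facts: {name: set of symbols}."""
    known = set(conj_symbols)
    selected, remaining = [], dict(facts)
    for _ in range(rounds):
        if not remaining:
            break
        def score(name):
            syms = remaining[name]
            k = len(syms & known)                  # known symbols in the fact
            return k / len(syms) if syms else 0.0  # k / (k + u)
        best = sorted(remaining, key=score, reverse=True)[:take]
        for name in best:
            known |= remaining.pop(name)           # selected facts extend K
        selected.extend(best)
    return selected
```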
SLIDE 46
MaSh, MeSh

• MaSh: Machine learning for Sledgehammer, based on k-NN and naive Bayes
• MeSh: combination of MaSh and a MePo-like selector
• In practice, both are combined with a proximity selector
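One plausible way to realise such a combination is a weighted merge of the two rankings (a sketch, not Sledgehammer's actual implementation; the weight is an assumed parameter):

```python
def mesh(mash_ranking, mepo_ranking, weight=0.5):
    """Merge two ranked premise lists by weighted average rank;
    a premise missing from one list is treated as ranked at its end."""
    pos_a = {p: i for i, p in enumerate(mash_ranking)}
    pos_b = {p: i for i, p in enumerate(mepo_ranking)}
    def combined(p):
        return (weight * pos_a.get(p, len(mash_ranking))
                + (1 - weight) * pos_b.get(p, len(mepo_ranking)))
    return sorted(set(mash_ranking) | set(mepo_ranking), key=combined)
```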
SLIDE 47
Comparison and Outlook
SLIDE 49
Metrics

For premises p1, ..., pn selected by a selection algorithm:

Full recall: min { k ∈ ℕ : usedPremises(c) ⊆ {p1, ..., pk} }

k-Coverage: |{p1, ..., pk} ∩ usedPremises(c)| / min{ k, |usedPremises(c)| }
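Both metrics are straightforward to compute from a ranked premise list (a sketch; ranked is the selector's output, used is usedPremises(c)):

```python
def full_recall(ranked, used):
    """Smallest k such that {p1, ..., pk} covers usedPremises(c)."""
    covered = set()
    for k, p in enumerate(ranked, start=1):
        covered.add(p)
        if used <= covered:
            return k
    return None  # some needed premise was never selected

def k_coverage(ranked, used, k):
    """|{p1, ..., pk} ∩ usedPremises(c)| / min(k, |usedPremises(c)|)."""
    return len(set(ranked[:k]) & used) / min(k, len(used))
```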
SLIDE 50
Comparison in Metric
Formalization   MePo   MaSh NB   MaSh kNN   MeSh NB   MeSh kNN
Auth             647       104        143        96        112
IsaFoR          1332       513        604       517        570
Jinja            839       244        306       229        256
List            1083       234        263       259        271
Nominal2        1045       220        276       229        264
Probability     1324       424        422       393        395

Figure 1: Average full recall
SLIDE 51
Comparison in Metric
Figure 2: k-Coverage for IsaFoR
SLIDE 52
Comparison in Performance
Figure 3: Success rates
SLIDE 57
Summary and Outlook

• Fact selection is an important problem in automatic theorem proving
• MePo performs well on problems with rare symbols
• Fine-grained dependency analysis and good feature extraction enable the successful application of machine learning algorithms
• MaSh and MeSh outperform MePo in terms of the chosen metrics
• Similar results for SNoW and MOR compared to Vampire-SInE on the MML