SLIDE 1

Data Science Research

@

Introduction Ontological Pathfinding Experiments

Ontological Pathfinding: Mining First-Order Knowledge from Large Knowledge Bases

Yang Chen, Sean Goldberg, Daisy Zhe Wang, Soumitra Siddharth Johri

{yang,sean,daisyw}@cise.ufl.edu, soumitra.johri@ufl.edu

Computer and Information Science and Engineering University of Florida

SIGMOD’16, San Francisco, CA Jun 29, 2016

Ontological Pathfinding Jun 29, 2016 1/25

SLIDE 2

Outline

1. Introduction
   Knowledge Bases
2. Ontological Pathfinding
   Partitioning
   Parallel Rule Mining
3. Experiments
   Overall Result
   Partitioning

SLIDE 4

Knowledge Bases

A knowledge base organizes human information in a structured format.

Predicate     Subject            Object
isLocatedIn   Washington, D.C.   United States
hasCapital    Canada             Ottawa
wasBornIn     Donald Knuth       Milwaukee, Wisconsin
isCitizenOf   Donald Knuth       United States
dealsWith     United States      Canada

SLIDE 6

Knowledge Bases

H(x, y)       b1(x, z)      b2(y, z)
dealsWith     isLocatedIn   isLocatedIn
dealsWith     imports       exports
isCitizenOf   wasBornIn     hasCapital
worksAt       wasBornIn     isLocatedIn
isLocatedIn   hasCapital    isLocatedIn

Figure: Knowledge base examples.

SLIDE 7

Knowledge Bases

ProbKB

SLIDE 11

First-Order Knowledge

Kale is rich in Calcium ∧ Calcium helps prevent Osteoporosis → Kale helps prevent Osteoporosis.

Question answering;
Data cleaning;
Incremental knowledge construction.
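The inference on this slide can be sketched in a few lines of Python. This is only an illustration: the predicate names `richIn` and `helpsPrevent` and the helper `apply_rule` are assumed here and do not come from the talk. The rule is a chain of the form head(x, y) ← body1(x, z), body2(z, y), which matches the Kale example.

```python
# Facts are (predicate, subject, object) triples; the predicate names are hypothetical.
facts = {
    ("richIn", "Kale", "Calcium"),
    ("helpsPrevent", "Calcium", "Osteoporosis"),
}

def apply_rule(facts, head, body1, body2):
    """Apply head(x, y) <- body1(x, z), body2(z, y) and return the derived facts."""
    derived = set()
    for p1, x, z in facts:
        if p1 != body1:
            continue
        for p2, z2, y in facts:
            # The two body atoms must agree on the shared variable z.
            if p2 == body2 and z2 == z:
                derived.add((head, x, y))
    return derived

new_facts = apply_rule(facts, "helpsPrevent", "richIn", "helpsPrevent")
print(new_facts)  # {('helpsPrevent', 'Kale', 'Osteoporosis')}
```

The nested loop is quadratic in the number of facts, which is workable for a toy example but is exactly the join cost that becomes the bottleneck on a large knowledge base.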

SLIDE 16

State-of-the-Art

AMIE
    YAGO2: 834K entities, 1M facts; Runtime: 3.59 minutes.
AMIE+
    YAGO2s: 2.1M entities, 4.5M facts; Runtime: 1 hour.
Sherlock
    TextRunner: 250K facts; Runtime: 50 minutes.

Freebase: 112M entities, 388M facts.
Is it possible to mine first-order rules from Freebase?

SLIDE 25

Contributions

Goal: Mining first-order knowledge from web-scale knowledge bases.

Result: Design the Ontological Pathfinding algorithm to mine 36,625 inference rules from Freebase (388M facts) in 34 hours; publish the first Freebase rule set.

Contributions:

1. Partition KB into independent subsets to reduce join sizes. (Improve runtime from 2.55 days to 5.06 hours for a single task.)
2. Design a parallel rule mining algorithm for each partition. (Achieve a 3-6x speedup.)
3. Prune inefficient and erroneous candidate rules. (Make joins possible.)

SLIDE 30

Partitioning

[Figure labels: Independent Overlapping Partitions; Partition 1, Partition 2; Output 1, Output 2, Output 3; Output.]

SLIDE 34

The Partitioning Problem

Given size constraints s and m, find a partition {M_1, ..., M_k} of the rules M that satisfies the following constraints:

(C1) $|\Gamma_i| \le s$, $1 \le i \le k$
(C2) $|M_i| \le m$, $1 \le i \le k$
(C3) $\bigcup_{i=1}^{k} M_i = M$
(C4) $M_i \cap M_j = \emptyset$, $1 \le i < j \le k$

where $\sigma(\Gamma, M_i) = |\Gamma_i| = \sum_{p \in \Gamma(M_i)} H_0(p)$.
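The four constraints can be checked mechanically for any candidate partition. The sketch below is a minimal illustration, not the authors' partitioning algorithm. It assumes that Γ(M_i) is the set of predicates appearing in the rules of M_i and that H_0(p) is the number of facts with predicate p; the function names `gamma_size` and `is_valid_partition` are hypothetical.

```python
from collections import Counter

def gamma_size(facts, rules):
    """|Gamma_i|: number of facts whose predicate appears in some rule of the subset.

    Assumes H0(p) is the fact count of predicate p (an interpretation, not from the slides).
    """
    h0 = Counter(p for p, _, _ in facts)
    preds = {p for rule in rules for p in rule}
    return sum(h0[p] for p in preds)

def is_valid_partition(facts, all_rules, parts, s, m):
    """Check constraints (C1)-(C4) for a candidate partition of the rules."""
    union = set().union(*parts)
    c1 = all(gamma_size(facts, part) <= s for part in parts)  # fact budget per part
    c2 = all(len(part) <= m for part in parts)                # rule budget per part
    c3 = union == set(all_rules)                              # parts cover all rules
    c4 = sum(len(part) for part in parts) == len(union)       # parts are pairwise disjoint
    return c1 and c2 and c3 and c4

# Rules are (head, body1, body2) predicate triples; facts are (predicate, subject, object).
facts = [
    ("imports", "United States", "Aluminum"),
    ("exports", "Canada", "Aluminum"),
    ("dealsWith", "Canada", "United States"),
    ("isLocatedIn", "Ottawa", "Canada"),
]
r1 = ("dealsWith", "exports", "imports")
r2 = ("dealsWith", "isLocatedIn", "isLocatedIn")
parts = [{r1}, {r2}]

print(is_valid_partition(facts, [r1, r2], parts, s=3, m=1))  # True
print(is_valid_partition(facts, [r1, r2], parts, s=2, m=1))  # False: {r1} needs 3 facts
```

Note that the rule sets are disjoint while the induced fact sets may overlap: the `dealsWith` fact is counted in both parts here.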

SLIDE 41

Binary Partitioning Example

(a) Γ

p                    x                    y
exports              United States        Computer
exports              Canada               Aluminum
imports              United States        Aluminum
imports              United States        Clothing
dealsWith            Canada               United States
isLocatedIn          Washington, D.C.     United States
isLocatedIn          Ottawa               Canada
isLocatedIn          Stanford University  Stanford, California
hasCapital           Canada               Ottawa
hasCapital           United States        Washington, D.C.
wasBornIn            Donald Knuth         Milwaukee, Wisconsin
isCitizenOf          Donald Knuth         United States
worksAt              Donald Knuth         Stanford University
hasAcademicAdvisor   Donald Knuth         Marshall Hall, Jr.

(b) M

H(x,y)        b1(x,z)       b2(y,z)
dealsWith     isLocatedIn   isLocatedIn
dealsWith     exports       imports
isCitizenOf   wasBornIn     hasCapital
worksAt       wasBornIn     isLocatedIn
isLocatedIn   hasCapital    isLocatedIn

[Figure labels: Partition 1, Partition 2; M1, M2; Γ1, Γ2.]
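To make the example concrete, the following Python sketch computes the fact sets induced by one possible split of the rules in (b). The split itself (the first two rules versus the last three) is an assumption chosen for illustration, since the slide does not state which rules fall in M1 and M2, and `induced_facts` is a hypothetical helper.

```python
facts = [
    ("exports", "United States", "Computer"),
    ("exports", "Canada", "Aluminum"),
    ("imports", "United States", "Aluminum"),
    ("imports", "United States", "Clothing"),
    ("dealsWith", "Canada", "United States"),
    ("isLocatedIn", "Washington, D.C.", "United States"),
    ("isLocatedIn", "Ottawa", "Canada"),
    ("isLocatedIn", "Stanford University", "Stanford, California"),
    ("hasCapital", "Canada", "Ottawa"),
    ("hasCapital", "United States", "Washington, D.C."),
    ("wasBornIn", "Donald Knuth", "Milwaukee, Wisconsin"),
    ("isCitizenOf", "Donald Knuth", "United States"),
    ("worksAt", "Donald Knuth", "Stanford University"),
    ("hasAcademicAdvisor", "Donald Knuth", "Marshall Hall, Jr."),
]

# Each rule is (head, body1, body2), as in table (b).
rules = [
    ("dealsWith", "isLocatedIn", "isLocatedIn"),
    ("dealsWith", "exports", "imports"),
    ("isCitizenOf", "wasBornIn", "hasCapital"),
    ("worksAt", "wasBornIn", "isLocatedIn"),
    ("isLocatedIn", "hasCapital", "isLocatedIn"),
]

def induced_facts(facts, rule_subset):
    """Keep only the facts whose predicate is used by some rule in the subset."""
    preds = {p for rule in rule_subset for p in rule}
    return [f for f in facts if f[0] in preds]

# Hypothetical split: trade-related rules versus the rest.
m1, m2 = rules[:2], rules[2:]
gamma1 = induced_facts(facts, m1)
gamma2 = induced_facts(facts, m2)
shared = set(gamma1) & set(gamma2)

print(len(gamma1), len(gamma2), len(shared))  # 8 8 3
```

Under this split the three `isLocatedIn` facts land in both fact sets, while the `hasAcademicAdvisor` fact lands in neither because no rule mentions that predicate.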

SLIDE 42

Partitioning

Joining partitioned RDDs requires:

O(t^(l-1) |S| |M|) → O(t^(l-1) sm |M|),

bounded above by the largest partition size sm.

SLIDE 63

Parallel Rule Mining

Rule form: p(x, y) ← q(x, z), r(y, z). Rules = {R1, R2, R3}.

1. Group facts by the join variable z.
2. For each group, apply inference rules by an in-memory hash join; each fact is annotated with the inferring rule, or "0" for base facts.
3. Map each fact to a (fact, {r}) pair, where {r} contains a list of inferring rules.
4. Check (fact, {r}) and generate (r, c, 1) tuples, where c = (0 ∈ {r}) indicates correctness.
5. Reduce by r, aggregating the counts.
6. Map each rule to its confidence score.

[Figure: pipeline stages "Group by join variables", "Group joins", "Group by facts", "Check", and "Count", applied to facts F1-F5 and rules R1-R3.]
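The mining steps for rules of the form p(x, y) ← q(x, z), r(y, z) can be sketched on a single machine in plain Python. This is a minimal sketch and not the authors' Spark implementation: the function name `mine`, the use of dictionaries in place of RDDs, and the definition of confidence as correct inferences divided by all inferences are assumptions made for illustration. Rule identifiers must differ from 0, which is reserved for base facts.

```python
from collections import defaultdict

def mine(facts, rules):
    """facts: set of (pred, subj, obj); rules: {rule_id: (head, q, r)},
    each meaning head(x, y) <- q(x, z), r(y, z). Returns {rule_id: confidence}."""
    # Group facts by the join variable z (the object position).
    groups = defaultdict(list)
    for pred, subj, obj in facts:
        groups[obj].append((pred, subj))

    # Tag every base fact with 0, then tag inferred facts with the inferring rule.
    tags = defaultdict(set)
    for fact in facts:
        tags[fact].add(0)
    for members in groups.values():
        by_pred = defaultdict(list)  # in-memory hash table: predicate -> subjects
        for pred, subj in members:
            by_pred[pred].append(subj)
        for rule_id, (head, q, r) in rules.items():
            for x in by_pred.get(q, []):
                for y in by_pred.get(r, []):
                    tags[(head, x, y)].add(rule_id)

    # Check each (fact, {r}) pair and aggregate counts per rule.
    correct, total = defaultdict(int), defaultdict(int)
    for rule_ids in tags.values():
        is_base_fact = 0 in rule_ids
        for rule_id in rule_ids - {0}:
            total[rule_id] += 1
            correct[rule_id] += int(is_base_fact)

    # Map each rule to its confidence score (assumed: correct / total).
    return {rule_id: correct[rule_id] / total[rule_id] for rule_id in total}

facts = {
    ("imports", "United States", "Aluminum"),
    ("exports", "Canada", "Aluminum"),
    ("exports", "Mexico", "Aluminum"),
    ("dealsWith", "United States", "Canada"),
}
rules = {"R1": ("dealsWith", "imports", "exports")}
print(mine(facts, rules))  # {'R1': 0.5}
```

In the toy run, R1 infers dealsWith(United States, Canada), which is already a base fact, and dealsWith(United States, Mexico), which is not, giving a confidence of 0.5. Because all rules are evaluated inside each group, the facts are grouped once rather than once per rule.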

SLIDE 70

Rule Pruning

The Non-Functionality Problem

Example: diedIn(x, z), wasBornIn(y, z) → hasAcademicAdvisor(x, y).

"diedIn" and "wasBornIn" are N:1 predicates, so joining them on z produces large intermediate results.

Histogram-based detection:

Predicate-Subject Histogram H1 = {(p, x, |{p(x, ·)}|)};
Predicate-Object Histogram H2 = {(p, y, |{p(·, y)}|)};
Functional constraint t requires H2(diedIn, z) ≤ t and H2(wasBornIn, z) ≤ t for all z;
t is chosen experimentally.
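The histogram check for this example can be sketched as follows. The helper names `object_histogram` and `violates_functional_constraint` are hypothetical, and the sketch only detects a violation; what happens to a violating rule or to the offending facts is not stated on the slide and is left open here.

```python
from collections import Counter

def object_histogram(facts):
    """H2: maps (predicate, object) to the number of facts p(., object)."""
    return Counter((p, y) for p, _, y in facts)

def violates_functional_constraint(facts, body_preds, t):
    """True if any join value z has more than t facts for a body predicate."""
    h2 = object_histogram(facts)
    return any(count > t for (p, _), count in h2.items() if p in body_preds)

# Toy data: three people born in the same city, one who died there.
facts = [
    ("wasBornIn", "A", "Paris"),
    ("wasBornIn", "B", "Paris"),
    ("wasBornIn", "C", "Paris"),
    ("diedIn", "D", "Paris"),
]
body = {"diedIn", "wasBornIn"}

print(violates_functional_constraint(facts, body, t=2))  # True: H2(wasBornIn, Paris) = 3
print(violates_functional_constraint(facts, body, t=3))  # False
```

The histogram is built in one pass over the facts, so the check avoids materializing the join whose size it is meant to bound.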

SLIDE 72

Datasets

KB             YAGO2      YAGO2s      Freebase
# Predicates   130        126         67,415
# Entities     834,554    2,137,468   111,781,246
# Facts        948,047    4,484,907   388,474,630

Table: Dataset statistics.

SLIDE 73

Experiments

Overall Results

KB         Algorithm   # Rules   Precision   Runtime
YAGO2      OP          218       0.35        3.59 min
YAGO2      AMIE        1090      0.46        4.56 min
YAGO2s     OP          312       0.35        19.40 min
YAGO2s     AMIE        278+      N/A         5+ d
Freebase   OP          36,625    0.60        33.22 h
Freebase   AMIE        0+        N/A         5+ d

Table: Overall mining result.

slide-76
SLIDE 76

Data Science Research

@

Introduction Ontological Pathfinding Experiments

Experiments

Quality

We detect trivial extensions and composite rules, which provide little knowledge beyond the rules of lengths 2 and 3.

Trivial extensions add valid rules to the body of another rule.

book/book/first_edition(x, u), book/book_edition/book(u, v), book/book/first_edition(v, y) → book/book/editions(x, y)

Composite rules chain multiple shorter rules.

film/film/sequel(x, u), film/film/country(u, v) (→ film/film/country(x, v)), location/country/official_language(v, y) → film/film/language(x, y)
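One way to picture composite-rule detection is the sketch below. It is an assumption-laden illustration, not the authors' procedure: rules are modeled as (body, head) pairs of predicate names along a chain x → ... → y, the helper name is_composite is hypothetical, and the second short rule in the example (country plus official language implies film language) is assumed to be among the mined length-3 rules.

```python
def is_composite(rule, short_rules):
    """Check whether a 3-atom chain rule decomposes into two 2-atom rules.

    rule: ((p, q, r), head) meaning p(x,u), q(u,v), r(v,y) -> head(x,y)
    short_rules: iterable of ((a, b), h) meaning a(x,u), b(u,y) -> h(x,y)
    """
    body, head = rule
    if len(body) != 3:
        return False
    p, q, r = body

    # Index the heads of the short rules by their two-atom body.
    heads_by_body = {}
    for short_body, short_head in short_rules:
        heads_by_body.setdefault(tuple(short_body), set()).add(short_head)

    # Split 1: (p, q) -> s, then (s, r) -> head.
    for s in heads_by_body.get((p, q), ()):
        if head in heads_by_body.get((s, r), ()):
            return True
    # Split 2: (q, r) -> s, then (p, s) -> head.
    for s in heads_by_body.get((q, r), ()):
        if head in heads_by_body.get((p, s), ()):
            return True
    return False


# Assumed set of shorter rules, for illustration only.
short_rules = [
    (("film/film/sequel", "film/film/country"), "film/film/country"),
    (("film/film/country", "location/country/official_language"),
     "film/film/language"),
]

film_rule = (
    ("film/film/sequel", "film/film/country",
     "location/country/official_language"),
    "film/film/language",
)
book_rule = (
    ("book/book/first_edition", "book/book_edition/book",
     "book/book/first_edition"),
    "book/book/editions",
)

film_is_composite = is_composite(film_rule, short_rules)
book_is_composite = is_composite(book_rule, short_rules)
```

Under these assumptions the film rule is reported as composite because its first two atoms imply film/film/country(x, v), while the book rule is not, since no pair of the assumed short rules chains into it.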

Figure: Quality of long rules. (c) Freebase Length 4 Rules

Correct: 3.4%

Incorrect: 6.3%

Composite: 9.0%

Trivial extensions: 81.3%

Figure: OP performance for mining lengths 4 (YAGO and Freebase) and 5 (YAGO) rules. Panels: (a) YAGO2s: Rule Lengths; (b) Freebase: Rule Lengths. Each panel plots # mined rules, precision, and runtime (h) against rule length.

Effect of Partitioning

Figure: Effect of partitioning. Panels: Freebase Partitions (s = 20M, m = 2K); Freebase Partitions (s = 200M, m = 10K). Each panel plots partition size × rule size (/10^9) and runtime (min) per partition.

Figure panels:

(a) Freebase: Runtime vs Partitioning. Runtime (h) against max partition size (M) for m = 10K, 5K, 2K, 1K.

(b) Freebase: Max Runtime vs Partitioning. Runtime (h) against max partition size (M) for m = 10K, 5K, 2K, 1K.

(c) Freebase: DOV vs Partitioning. DOV against max partition size (M) for m = 10K, 5K, 2K, 1K.

(d) YAGO2s: Runtime vs Partitioning. Runtime and max runtime (min) against max partition size (M) for m = 1000 and m = 500.

(e) Runtime vs Functional Constraint. YAGO2s runtime (min) and Freebase runtime (h) against the functional constraint.

(f) Pruned Rules Quality. Pruning precision and # pruned rules against the functional constraint, for YAGO2s and Freebase.

Runtime: 2.55 days → 5.06 hours. Slowest partition: 1.27 days → 38.14 minutes.
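As a quick sanity check on the figures quoted above, the following Python snippet converts them to common units and computes the implied speedups. The variable names are hypothetical; only the four durations come from the slide.

```python
HOURS_PER_DAY = 24
MINUTES_PER_HOUR = 60

# Total runtime: 2.55 days -> 5.06 hours (compare in hours).
total_before_h = 2.55 * HOURS_PER_DAY
total_after_h = 5.06
total_speedup = total_before_h / total_after_h

# Slowest partition: 1.27 days -> 38.14 minutes (compare in minutes).
slowest_before_min = 1.27 * HOURS_PER_DAY * MINUTES_PER_HOUR
slowest_after_min = 38.14
slowest_speedup = slowest_before_min / slowest_after_min

print(f"total runtime speedup: {total_speedup:.1f}x")
print(f"slowest partition speedup: {slowest_speedup:.1f}x")
```

The total runtime improves by roughly a factor of 12 and the slowest partition by roughly a factor of 48.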

Conclusion

Design the Ontological Pathfinding algorithm, which scales rule mining to Freebase (the largest KB, with 388M facts, mined in 34 hours).

Partition the KB into independent subsets to reduce join sizes.

Divide joins into smaller joins that run in parallel.

Prototype with Spark.

Publish the first Freebase rule set (36,625 inference rules).

Open-source at http://dsr.cise.ufl.edu/projects/probkb-web-scale-probabilistic-knowledge-base.

Thank you!

Yang Chen: http://cise.ufl.edu/~yang

Data Science Research at UF: http://dsr.cise.ufl.edu

Questions?
