SLIDE 1

ProbKB

Data Science Research @ UF

Introduction The ProbKB System Conclusion

Knowledge Expansion over Probabilistic Knowledge Bases

Yang Chen, Daisy Zhe Wang

{yang,daisyw}@cise.ufl.edu

Computer and Information Science and Engineering, University of Florida

SIGMOD, Snowbird, UT, Jun 25, 2014

Knowledge Expansion over Probabilistic Knowledge Bases Jun 25, 2014 1/28

SLIDE 2

Outline

1 Introduction
  - Knowledge Bases
  - Knowledge Expansion

2 The ProbKB System
  - Probabilistic Knowledge Bases
  - ProbKB Architecture
  - Grounding
  - Quality Control

3 Conclusion

SLIDE 5

Knowledge Bases

A knowledge base is a collection of entities, facts, and relationships that conforms to a certain data model. It allows machines to interpret human information in a principled manner. However, knowledge bases are often incomplete.

Figure: Google Knowledge Graph

SLIDE 6

Knowledge Base Construction Review

1 Human collaboration:

DBpedia, Freebase, Google Knowledge Graph, YAGO.

2 Automatic construction:

DeepDive, Knowledge Vault, NELL, OpenIE, Probase, YAGO.

3 Knowledge integration:

Knowledge Fusion, Knowledge Vault, PIDGIN.

SLIDE 7

Inferring Implicit Information

Kale is rich in Calcium ∧ Calcium helps prevent Osteoporosis → Kale helps prevent Osteoporosis.
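The inference above is one application of a chain-shaped rule, head(x, y) ← body1(x, z) ∧ body2(z, y). A minimal Python sketch, assuming facts are stored as (relation, subject, object) triples; the relation names `rich_in` and `helps_prevent` and the helper `apply_chain_rule` are illustrative, not taken from ProbKB:

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]  # (relation, subject, object)


def apply_chain_rule(facts: Set[Fact], head: str, body1: str, body2: str) -> Set[Fact]:
    """Derive head(x, y) whenever body1(x, z) and body2(z, y) both hold."""
    derived = set()
    for r1, x, z in facts:
        if r1 != body1:
            continue
        for r2, z2, y in facts:
            # Join the two body atoms on the shared middle entity z.
            if r2 == body2 and z2 == z:
                derived.add((head, x, y))
    return derived - facts  # keep only facts that were not already known


facts = {
    ("rich_in", "Kale", "Calcium"),
    ("helps_prevent", "Calcium", "Osteoporosis"),
}
new_facts = apply_chain_rule(facts, "helps_prevent", "rich_in", "helps_prevent")
print(new_facts)  # {('helps_prevent', 'Kale', 'Osteoporosis')}
```

The nested loop is a naive join over all facts; expressing the same join in SQL lets the database choose an efficient plan.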

SLIDE 9

Inferring Implicit Information

IsHeadquarteredIn(Company, State) :- IsBasedIn(Company, City) ∧ IsLocatedIn(City, State);
Contains(Food, Chemical) :- IsMadeFrom(Food, Ingredient) ∧ Contains(Ingredient, Chemical);
Reduce(Medication, Factor) :- KnownGenericallyAs(Medication, Drug) ∧ Reduce(Drug, Factor);
ReturnTo(Writer, Place) :- BornIn(Writer, City) ∧ CapitalOf(City, Place);
Make(Company1, Device) :- Buy(Company1, Company2) ∧ Make(Company2, Device);

Figure: Horn clauses learned by Sherlock.

SLIDE 10

Contributions

Knowledge Expansion Problem: inferring implicit knowledge in KBs.

Efficiency.
- We use DBMSes to model knowledge bases.
- We design a SQL-based algorithm that applies inference rules in batches.
- We use MPP databases to parallelize the inference process.

Quality.
- We identify the major error sources and combine state-of-the-art methods to detect and recover from errors.
- We use semantic constraints to identify errors and ambiguities.
- We clean the rule set based on the rules' statistical properties.

SLIDE 20

Probabilistic Knowledge Bases

Example (Probabilistic Knowledge Bases)

We define a probabilistic knowledge base to be a 5-tuple Γ = (E, C, R, Π, L):

- Entities E: Ruth Gruber, New York City, Brooklyn
- Classes C: W (Writer) = {Ruth Gruber}, C (City) = {New York City}, P (Place) = {Brooklyn}
- Relations R: born in(W, P), born in(W, C), live in(W, P), live in(W, C), locate in(P, C)
- Facts Π: 0.93 born in(Ruth Gruber, Brooklyn); 0.96 born in(Ruth Gruber, New York City)
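One way to picture Γ as a data structure is sketched below. It is a minimal Python sketch under our own assumptions, not ProbKB's code: classes map to sets of entities, relations are (name, class of x, class of y) signatures, facts carry their weight, and the rules L are kept as opaque (weight, clause) pairs. The hypothetical helper `typed_relations` shows why born in(Ruth Gruber, Brooklyn) and born in(Ruth Gruber, New York City) fall under different typed relations, born in(W, P) and born in(W, C).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ProbKB:
    entities: Set[str]                        # E
    classes: Dict[str, Set[str]]              # C: class name -> member entities
    relations: Set[Tuple[str, str, str]]      # R: (relation, class of x, class of y)
    facts: List[Tuple[float, str, str, str]]  # Π: (weight, relation, x, y)
    rules: List[Tuple[float, str]] = field(default_factory=list)  # L: (weight, clause)

    def typed_relations(self, relation: str, x: str, y: str) -> List[Tuple[str, str, str]]:
        """Return the typed relations in R that the fact relation(x, y) fits."""
        return [
            (r, c1, c2)
            for (r, c1, c2) in sorted(self.relations)
            if r == relation and x in self.classes[c1] and y in self.classes[c2]
        ]


kb = ProbKB(
    entities={"Ruth Gruber", "New York City", "Brooklyn"},
    classes={"W": {"Ruth Gruber"}, "C": {"New York City"}, "P": {"Brooklyn"}},
    relations={("born in", "W", "P"), ("born in", "W", "C"),
               ("live in", "W", "P"), ("live in", "W", "C"),
               ("locate in", "P", "C")},
    facts=[(0.93, "born in", "Ruth Gruber", "Brooklyn"),
           (0.96, "born in", "Ruth Gruber", "New York City")],
)

for w, r, x, y in kb.facts:
    print(w, r, x, y, kb.typed_relations(r, x, y))
```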

SLIDE 21

Probabilistic Knowledge Bases

Example (ReVerb-Sherlock KB Cont.)

Rules L:

    1.40  ∀x ∈ W ∀y ∈ P (live in(x, y) ← born in(x, y))
    1.53  ∀x ∈ W ∀y ∈ C (live in(x, y) ← born in(x, y))
    2.68  ∀x ∈ W ∀y ∈ P (grow up in(x, y) ← born in(x, y))
    0.74  ∀x ∈ W ∀y ∈ C (grow up in(x, y) ← born in(x, y))
    0.32  ∀x ∈ P ∀y ∈ C ∀z ∈ W (locate in(x, y) ← live in(z, x) ∧ live in(z, y))
    0.52  ∀x ∈ P ∀y ∈ C ∀z ∈ W (locate in(x, y) ← born in(z, x) ∧ born in(z, y))
    ∞     ∀x ∈ C ∀y ∈ C ∀z ∈ W (born in(z, x) ∧ born in(z, y) → x = y)

Table: Probabilistic KB from ReVerb-Sherlock.

SLIDE 22

MLN: The State-of-the-Art

Figure: MLN (weighted rules), DB, and Program.
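For reference, the standard Markov logic network (MLN) semantics attaches a weight w_i to each first-order formula F_i and, after grounding the formulas over the constants in the database, defines a probability distribution over possible worlds x. In the formula below, n_i(x) is the number of true groundings of F_i in x and Z is the normalizing constant. A weight of ∞, as on the last rule of the ReVerb-Sherlock example, makes the formula a hard constraint: worlds that violate it get probability zero.

```latex
P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big),
\qquad
Z = \sum_{x'} \exp\Big( \sum_i w_i \, n_i(x') \Big)
```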

SLIDE 23

ProbKB In-Database Architecture

Figure components: RDBMS (query optimizer & execution engine; MLN, entities, and facts; SQL, UDF/UDA), factor graph, and inference engine (e.g., GraphLab).

SLIDE 26

Relational ProbKB

    born in(Ruth Gruber, Brooklyn)       0.93
    born in(Ruth Gruber, New York City)  0.96

⇓ Table T (RG = Ruth Gruber, Br = Brooklyn, NYC = New York City):

    I  R        x   C1  y    C2  w
    1  born in  RG  W   Br   P   0.93
    2  born in  RG  W   NYC  C   0.96

SLIDE 27

Relational ProbKB

Definition

Two first-order clauses are structurally equivalent if they differ only in their entity, class, and relation symbols.

    ∀x ∈ W ∀y ∈ P (live in(x, y) ← born in(x, y))      1.40
    ∀x ∈ W ∀y ∈ C (live in(x, y) ← born in(x, y))      1.53
    ∀x ∈ W ∀y ∈ P (grow up in(x, y) ← born in(x, y))   2.68
    ∀x ∈ W ∀y ∈ C (grow up in(x, y) ← born in(x, y))   0.74

⇓ Table M1:

    R1          R2       C1  C2  w
    live in     born in  W   P   1.40
    live in     born in  W   C   1.53
    grow up in  born in  W   P   2.68
    grow up in  born in  W   C   0.74

SLIDE 29

Relational ProbKB

    ∀x ∈ P ∀y ∈ C ∀z ∈ W (locate in(x, y) ← live in(z, x) ∧ live in(z, y))   0.32
    ∀x ∈ P ∀y ∈ C ∀z ∈ W (locate in(x, y) ← born in(z, x) ∧ born in(z, y))   0.52

⇓ Table M3:

    R1          R2       R3       C1  C2  C3  w
    located in  live in  live in  P   C   W   0.32
    located in  born in  born in  P   C   W   0.52
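Structural equivalence amounts to grouping clauses by their shape once relation and class symbols are abstracted away; each group then becomes one table such as M1 or M3. A minimal Python sketch, assuming each clause is given as a weight, a head atom, a list of body atoms whose arguments are variable names, and a map from each variable to its class. The `partition` helper and its shape key (variables renamed in order of first appearance) are hypothetical, not ProbKB's implementation; the head relation is written "located in" to match table M3.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Atom = Tuple[str, str, str]  # (relation, arg1, arg2); arguments are variable names
Clause = Tuple[float, Atom, List[Atom], Dict[str, str]]  # (weight, head, body, var classes)


def partition(clauses: List[Clause]) -> Dict[tuple, List[tuple]]:
    """Group clauses by shape; each group becomes one table of (R..., C..., w) rows."""
    tables: Dict[tuple, List[tuple]] = defaultdict(list)
    for weight, head, body, var_class in clauses:
        atoms = [head] + body
        order: List[str] = []  # variables in order of first appearance
        for _, a, b in atoms:
            for v in (a, b):
                if v not in order:
                    order.append(v)
        rename = {v: f"v{i}" for i, v in enumerate(order)}
        # The shape keeps only variable positions, dropping relation and class symbols.
        shape = tuple((rename[a], rename[b]) for _, a, b in atoms)
        row = tuple(r for r, _, _ in atoms) + tuple(var_class[v] for v in order) + (weight,)
        tables[shape].append(row)
    return dict(tables)


clauses = [
    (1.40, ("live in", "x", "y"), [("born in", "x", "y")], {"x": "W", "y": "P"}),
    (1.53, ("live in", "x", "y"), [("born in", "x", "y")], {"x": "W", "y": "C"}),
    (2.68, ("grow up in", "x", "y"), [("born in", "x", "y")], {"x": "W", "y": "P"}),
    (0.74, ("grow up in", "x", "y"), [("born in", "x", "y")], {"x": "W", "y": "C"}),
    (0.32, ("located in", "x", "y"), [("live in", "z", "x"), ("live in", "z", "y")],
     {"x": "P", "y": "C", "z": "W"}),
    (0.52, ("located in", "x", "y"), [("born in", "z", "x"), ("born in", "z", "y")],
     {"x": "P", "y": "C", "z": "W"}),
]
for shape, rows in partition(clauses).items():
    print(shape, rows)
```

The two shapes found here correspond to M1 (four rows) and M3 (two rows); because every rule in a table shares one shape, a single join query can apply all of them at once.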

SLIDE 31

Grounding

    SELECT M1.R1 AS R, T.x AS x, T.C1 AS C1, T.y AS y, T.C2 AS C2
    FROM M1 JOIN T
      ON M1.R2 = T.R AND M1.C1 = T.C1 AND M1.C2 = T.C2;

Table T:

    I  R        x   C1  y    C2  w
    1  born in  RG  W   Br   P   0.93
    2  born in  RG  W   NYC  C   0.96

Table M1:

    R1          R2       C1  C2  w
    live in     born in  W   P   1.40
    live in     born in  W   C   1.53
    grow up in  born in  W   P   2.68
    grow up in  born in  W   C   0.74

New rows of T produced by the query:

    I  R           x   C1  y    C2
    3  live in     RG  W   NYC  C
    4  grow up in  RG  W   NYC  C
    5  live in     RG  W   Br   P
    6  grow up in  RG  W   Br   P
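On the example tables, the query can be run end to end with Python's built-in sqlite3 module. This is a sketch under stated assumptions: SQLite stands in for the RDBMS or MPP database used by ProbKB, entity names are abbreviated as in the tables, and inferred facts are appended to T with NULL weights to mirror rows 3 to 6; how ProbKB stores weights for inferred facts is not modeled here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# T stores facts; M1 stores rules of the form R1(x, y) <- R2(x, y) with x in C1, y in C2.
conn.execute("CREATE TABLE T (I INTEGER PRIMARY KEY, R TEXT, x TEXT, C1 TEXT, "
             "y TEXT, C2 TEXT, w REAL)")
conn.execute("CREATE TABLE M1 (R1 TEXT, R2 TEXT, C1 TEXT, C2 TEXT, w REAL)")
conn.executemany("INSERT INTO T (R, x, C1, y, C2, w) VALUES (?, ?, ?, ?, ?, ?)", [
    ("born in", "RG", "W", "Br", "P", 0.93),
    ("born in", "RG", "W", "NYC", "C", 0.96),
])
conn.executemany("INSERT INTO M1 VALUES (?, ?, ?, ?, ?)", [
    ("live in", "born in", "W", "P", 1.40),
    ("live in", "born in", "W", "C", 1.53),
    ("grow up in", "born in", "W", "P", 2.68),
    ("grow up in", "born in", "W", "C", 0.74),
])

# The grounding query from the slide: one join applies every rule in M1 at once.
GROUND_M1 = """
SELECT M1.R1 AS R, T.x AS x, T.C1 AS C1, T.y AS y, T.C2 AS C2
FROM M1 JOIN T ON M1.R2 = T.R AND M1.C1 = T.C1 AND M1.C2 = T.C2
"""
derived = conn.execute(GROUND_M1).fetchall()

# Append the inferred facts to T; their weights are left NULL in this sketch.
conn.executemany("INSERT INTO T (R, x, C1, y, C2) VALUES (?, ?, ?, ?, ?)", derived)
for row in conn.execute("SELECT I, R, x, C1, y, C2, w FROM T ORDER BY I"):
    print(row)
```

The row order returned by the join is not guaranteed, so the new rows may be numbered in a different order than on the slide.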

SLIDE 36

Grounding

    SELECT M3.R1 AS R, T2.y AS x, T2.C2 AS C1, T3.y AS y, T3.C2 AS C2
    FROM M3
      JOIN T T2 ON M3.R2 = T2.R AND M3.C3 = T2.C1 AND M3.C1 = T2.C2
      JOIN T T3 ON M3.R3 = T3.R AND M3.C3 = T3.C1 AND M3.C2 = T3.C2
    WHERE T2.x = T3.x;

Table T:

    I  R           x   C1  y    C2  w
    1  born in     RG  W   Br   P   0.93
    2  born in     RG  W   NYC  C   0.96
    3  live in     RG  W   NYC  C
    4  grow up in  RG  W   NYC  C
    5  live in     RG  W   Br   P
    6  grow up in  RG  W   Br   P

Table M3:

    R1          R2       R3       C1  C2  C3  w
    located in  live in  live in  P   C   W   0.32
    located in  born in  born in  P   C   W   0.52

New row of T produced by the query:

    I  R           x   C1  y    C2
    7  located in  Br  P   NYC  C
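Applying the M1 and M3 queries repeatedly until a round adds no new facts turns the example into an iterative grounding loop. The sketch below is an illustration under our own assumptions, not ProbKB's code: SQLite stands in for the database, a UNIQUE constraint with INSERT OR IGNORE removes duplicates (so located in(Br, NYC), which both M3 rules derive, is stored once, like row 7), and inferred facts get NULL weights. `GROUND_M1`, `GROUND_M3`, and `count_facts` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T (R TEXT, x TEXT, C1 TEXT, y TEXT, C2 TEXT, w REAL,
                UNIQUE (R, x, C1, y, C2));  -- each fact is stored once
CREATE TABLE M1 (R1 TEXT, R2 TEXT, C1 TEXT, C2 TEXT, w REAL);
CREATE TABLE M3 (R1 TEXT, R2 TEXT, R3 TEXT, C1 TEXT, C2 TEXT, C3 TEXT, w REAL);
INSERT INTO T VALUES ('born in', 'RG', 'W', 'Br', 'P', 0.93),
                     ('born in', 'RG', 'W', 'NYC', 'C', 0.96);
INSERT INTO M1 VALUES ('live in', 'born in', 'W', 'P', 1.40),
                      ('live in', 'born in', 'W', 'C', 1.53),
                      ('grow up in', 'born in', 'W', 'P', 2.68),
                      ('grow up in', 'born in', 'W', 'C', 0.74);
INSERT INTO M3 VALUES ('located in', 'live in', 'live in', 'P', 'C', 'W', 0.32),
                      ('located in', 'born in', 'born in', 'P', 'C', 'W', 0.52);
""")

GROUND_M1 = """
SELECT M1.R1, T.x, T.C1, T.y, T.C2
FROM M1 JOIN T ON M1.R2 = T.R AND M1.C1 = T.C1 AND M1.C2 = T.C2
"""
GROUND_M3 = """
SELECT M3.R1, T2.y, T2.C2, T3.y, T3.C2
FROM M3
  JOIN T T2 ON M3.R2 = T2.R AND M3.C3 = T2.C1 AND M3.C1 = T2.C2
  JOIN T T3 ON M3.R3 = T3.R AND M3.C3 = T3.C1 AND M3.C2 = T3.C2
WHERE T2.x = T3.x
"""


def count_facts() -> int:
    return conn.execute("SELECT COUNT(*) FROM T").fetchone()[0]


rounds = 0
while True:
    before = count_facts()
    # Evaluate both queries against the current T, then insert; duplicates are ignored.
    new = [row for q in (GROUND_M1, GROUND_M3) for row in conn.execute(q).fetchall()]
    conn.executemany("INSERT OR IGNORE INTO T (R, x, C1, y, C2) VALUES (?, ?, ?, ?, ?)", new)
    rounds += 1
    if count_facts() == before:  # fixpoint: this round added nothing
        break

print(rounds, count_facts())
```

On this data the loop stops after two rounds with seven facts: the first round derives the four M1 facts and located in(Br, NYC), and the second round only re-derives them.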

SLIDE 40

Grounding Efficiency

Sherlock-ReVerb KB

- Tuffy: state-of-the-art MLN inference engine;
- ReVerb: 400K facts extracted from a web text corpus;
- Sherlock: 31K inference rules learned from ReVerb.

    # relations   82,768
    # rules       30,912
    # entities   277,216
    # facts      407,247

Table: Sherlock-ReVerb KB statistics

SLIDE 41

Grounding Efficiency

Sherlock-ReVerb KB

    System     Load   Round 1  Round 2  Round 3  Round 4
    ProbKB-p   0.25   0.07     0.07     0.15     0.48
    ProbKB     0.03   0.05     0.12     0.23     1.28
    Tuffy-T    18.22  1.92     9.40     22.40    44.77
    # records  396K   420K     456K     580K     1.5M

Table: ReVerb-Sherlock case study

SLIDE 42

Grounding Efficiency

Synthetic KBs

Figure (a): varying # rules (x-axis: # rules / 10^6; y-axes: execution time / 10^3 s and # inferred facts / 10^6; series: Tuffy-T, ProbKB, ProbKB-p, # inferred): 311x speedup.

Figure (b): varying # facts (x-axis: # facts / 10^6; y-axes: execution time / 10^3 s and # inferred facts / 10^6; series: Tuffy-T, ProbKB, ProbKB-p, # inferred): 237x speedup.

Figure (c): MPP improvements (x-axis: # facts / 10^6; y-axes: execution time / 10^3 s and # inferred facts / 10^6; series: ProbKB, ProbKB-pn, ProbKB-p, # inferred): 6.3x speedup.

SLIDE 43

Quality Control

Inference Errors

Figure: inference graph over the facts born in(Mandel, Berlin), born in(Mandel, Baltimore), located in(Baltimore, Berlin), capital of(Berlin, Germany), hub of(Berlin, Germany), born in(Freud, Berlin), born in(Freud, Germany), born in(Freud, Baltimore), capital of(Baltimore, Germany), live in(Rothman, Baltimore), live in(Rothman, Germany), and born in(Rothman, Baltimore).

- Incorrect extractions;
- Incorrect rules;
- Ambiguous entities;
- Propagated errors.

SLIDE 44

Quality Control

Semantic constraints and ambiguity detection:

- Functional relation: born in
  Violating facts: born in(Mandel, Berlin); born in(Mandel, New York City); born in(Mandel, Chicago)
  Ambiguous entities: Leonard Mandel; Johnny Mandel; Tom Mandel (futurist)
- Functional relation: grow up in
  Violating facts: grow up in(Miller, Placentia); grow up in(Miller, New York City); grow up in(Miller, New Orleans)
  Ambiguous entities: Dustin Miller; Alan Gifford Miller; Taylor Miller
- Functional relation: located in
  Violating facts: located in(Regional office, Glasgow); located in(Regional office, Panama City); located in(Regional office, South Bend)
  Ambiguous entities: McCarthy & Stone regional offices; OCHA regional offices; Indiana Landmarks regional offices
- Functional relation: capital of
  Violating facts: capital of(Delhi, India); capital of(Calcutta, India)
  Ambiguous entities: none (incorrect extraction)
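Finding candidate violations of a functional relation is a grouping query: flag every key entity that appears with more than one distinct partner. A minimal sketch with Python's sqlite3, assuming a simplified fact table facts(R, x, y) and a hand-supplied direction for each functional relation (both are assumptions for this sketch): born in is functional from x to y (one birthplace per person), while capital of is functional from y to x (one capital per country). The helper name `violations` is hypothetical. Flagged keys are candidates for ambiguity or extraction errors, not automatically wrong facts.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (R TEXT, x TEXT, y TEXT)")
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("born in", "Mandel", "Berlin"),
    ("born in", "Mandel", "New York City"),
    ("born in", "Mandel", "Chicago"),
    ("born in", "Ruth Gruber", "New York City"),
    ("capital of", "Delhi", "India"),
    ("capital of", "Calcutta", "India"),
])


def violations(relation: str, key: str) -> List[Tuple[str, int]]:
    """Return (key value, number of distinct partners) for keys that break functionality.

    key="x": each x should have one y; key="y": each y should have one x.
    """
    if key not in ("x", "y"):
        raise ValueError("key must be 'x' or 'y'")
    other = "y" if key == "x" else "x"
    sql = (f"SELECT {key}, COUNT(DISTINCT {other}) FROM facts "
           f"WHERE R = ? GROUP BY {key} HAVING COUNT(DISTINCT {other}) > 1")
    return conn.execute(sql, (relation,)).fetchall()


print(violations("born in", "x"))     # [('Mandel', 3)]
print(violations("capital of", "y"))  # [('India', 2)]
```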

Statistical rule cleaning: keep a rule only if P(Head(...) | Body(...)) ≫ P(Head(...)).
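The criterion compares how often a rule's head holds when its body holds with how often the head holds at all. A minimal sketch, assuming both probabilities are estimated by counting over a finite universe of candidate (x, y) pairs, where `body` is the set of pairs for which the rule body is satisfied, and assuming "≫" is approximated by a ratio threshold. The threshold of 2.0, the toy pairs, and the helper names `rule_lift` and `keep_rule` are illustrative assumptions, not the statistics used by Sherlock or ProbKB.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # an (x, y) argument pair


def rule_lift(head: Set[Pair], body: Set[Pair], universe: Set[Pair]) -> float:
    """Return P(Head | Body) / P(Head), estimated by counting pairs in `universe`."""
    body_pairs = body & universe
    p_head = len(head & universe) / len(universe)
    if not body_pairs or p_head == 0:
        return 0.0  # no evidence either way: treat the rule as uninformative
    p_head_given_body = len(head & body_pairs) / len(body_pairs)
    return p_head_given_body / p_head


def keep_rule(head: Set[Pair], body: Set[Pair], universe: Set[Pair],
              threshold: float = 2.0) -> bool:
    """Keep a rule only if its body raises the head's probability by `threshold` times."""
    return rule_lift(head, body, universe) >= threshold


universe = {(f"p{i}", "q") for i in range(10)}
head = {("p0", "q"), ("p1", "q"), ("p2", "q")}                   # P(Head) = 0.3
good_body = {("p0", "q"), ("p1", "q")}                           # P(Head | Body) = 1.0
bad_body = {("p2", "q"), ("p5", "q"), ("p6", "q"), ("p7", "q")}  # P(Head | Body) = 0.25
print(keep_rule(head, good_body, universe), keep_rule(head, bad_body, universe))  # True False
```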

SLIDE 45

Results

Figure (a): precision of inferred facts (y-axis) against the estimated number of correct facts (x-axis) for: no SC/RC, RC top 20%, RC top 10%, SC only, SC + RC top 50%, and SC + RC top 20% (SC = semantic constraints, RC = rule cleaning). Result: 0.6 higher precision.

Figure (b): error sources: ambiguities (detected) 34%, ambiguous join keys 24%, incorrect rules 33%, incorrect extractions 6%, general types 2%, synonyms 1%.

SLIDE 47

Conclusion

- We present ProbKB, a probabilistic knowledge base system.
- We design a novel relational model and an efficient SQL-based inference algorithm that applies inference rules in batches.
- We use MPP databases to parallelize the inference process.
- We combine state-of-the-art data cleaning techniques to improve knowledge quality.
- Future work will focus on learning rules and constraints.

SLIDE 48

Related Work

Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS

http://hazy.cs.wisc.edu/hazy/tuffy

OpenIE: Open Information Extraction

http://openie.cs.washington.edu

Sherlock: Learning First-Order Horn Clauses from Web Text

http://www.cs.washington.edu/research/sherlock-hornclauses

Leibniz: Identifying Functional Relations in Web Text

http://knowitall.cs.washington.edu/leibniz

SLIDE 49

Thank you!

Yang Chen: http://cise.ufl.edu/~yang

Data Science Research at UF: http://dsr.cise.ufl.edu

Questions?
