Mining Knowledge Graphs from Text
WSDM 2018
Jay Pujara, Sameer Singh

Tutorial Overview
https://kgtutorial.github.io
Part 1: Knowledge Graphs
Part 2: Knowledge Extraction
Part 3: Graph Construction
Part 4: Critical Analysis

Tutorial Outline
1. Knowledge Graph Primer [Jay]
2. Knowledge Extraction Primer [Jay]
3. Knowledge Graph Construction
   a. Probabilistic Models [Jay]
   Coffee Break
   b. Embedding Techniques [Sameer]
4. Critical Overview and Conclusion [Sameer]

Knowledge Graph Construction
Topics: Problem Setting, Probabilistic Models, Embedding Techniques

Knowledge Graph Construction: Problem Setting

Reminder: Basic problems
• Who are the entities (nodes) in the graph?
• What are their attributes and types (labels)?
• How are they related (edges)?

Graph Construction Issues
Extracted knowledge is:
• ambiguous
  ◦ Ex: Beetles, beetles, Beatles
  ◦ Ex: citizenOf, livedIn, bornIn
• incomplete
  ◦ Ex: missing relationships
  ◦ Ex: missing labels
  ◦ Ex: missing entities
• inconsistent
  ◦ Ex: conflicting spouse extractions (Cynthia Lennon, Yoko Ono)
  ◦ Ex: exclusive labels (alive, dead)
  ◦ Ex: domain-range constraints

Graph Construction approach
• Graph construction cleans and completes the extraction graph
• Incorporate ontological constraints and relational patterns
• Discover statistical relationships within the knowledge graph

Graph Construction: Probabilistic Models
Topics: Overview, Graphical Models, Random Walk Methods

Beyond Pure Reasoning
• Classical AI approach to knowledge: reasoning
  Lbl(Socrates, Man) & Sub(Man, Mortal) -> Lbl(Socrates, Mortal)
• Reasoning is difficult when extracted knowledge has errors
• Solution: probabilistic models
  P(Lbl(Socrates, Mortal) | Lbl(Socrates, Man)=0.9)
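The classical reasoning step on this slide can be sketched as forward chaining over a small fact set. This is only an illustration: the tuple encoding of facts and the `forward_chain` helper are assumptions made here, not part of the tutorial. Note that every input fact is treated as certainly true, which is why a single wrong extraction would propagate into wrong conclusions.

```python
# Hypothetical encoding (an assumption for this sketch):
#   ("Lbl", entity, label)        -> entity has this label
#   ("Sub", label, parent_label)  -> label is subsumed by parent_label

def forward_chain(facts):
    """Apply Lbl(E, L1) & Sub(L1, L2) -> Lbl(E, L2) until nothing new is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        # Snapshot the current facts so the set can be extended safely.
        labels = [f for f in derived if f[0] == "Lbl"]
        subs = [f for f in derived if f[0] == "Sub"]
        for _, entity, label in labels:
            for _, child, parent in subs:
                new_fact = ("Lbl", entity, parent)
                if child == label and new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived


facts = {("Lbl", "Socrates", "Man"), ("Sub", "Man", "Mortal")}
closure = forward_chain(facts)
print(sorted(closure))
```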

Graphical Models: Overview
• Define a joint probability distribution on knowledge graphs
• Each candidate fact in the knowledge graph is a variable
• Statistical signals, ontological knowledge, and rules parameterize the dependencies between variables
• Find the most likely knowledge graph by optimization / sampling

Knowledge Graph Identification
Define a graphical model to perform all three of these tasks simultaneously!
• Who are the entities (nodes) in the graph?
• What are their attributes and types (labels)?
• How are they related (edges)?
PUJARA+ISWC13

Knowledge Graph Identification
P(Who, What, How | Extractions)
PUJARA+ISWC13

Probabilistic Models
• Use dependencies between facts in the KG
• Probability is defined jointly over facts
[Figure: example graphs labeled P=0, P=0.25, P=0.75]

What determines probability?
• Statistical signals from text extractors and classifiers
  ◦ P(R(John, Spouse, Yoko)) = 0.75; P(R(John, Spouse, Cynthia)) = 0.25
  ◦ LevenshteinSimilarity(Beatles, Beetles) = 0.9
• Ontological knowledge about domain
  ◦ Functional(Spouse) & R(A, Spouse, B) -> !R(A, Spouse, C)
  ◦ Range(Spouse, Person) & R(A, Spouse, B) -> Type(B, Person)
• Rules and patterns mined from data
  ◦ R(A, Spouse, B) & R(A, Lives, L) -> R(B, Lives, L)
  ◦ R(A, Spouse, B) & R(A, Child, C) -> R(B, Child, C)
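As an illustration of a string-similarity signal like the one above, here is a minimal sketch of a normalized Levenshtein similarity. The slide does not say how its similarity is normalized, so dividing the edit distance by the longer string's length is an assumption; under that assumption "Beatles" and "Beetles" score 6/7 (about 0.86), close to the slide's illustrative 0.9.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(similarity("Beatles", "Beetles"))
```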

Example: The Fab Four

Illustration of KG Identification
Uncertain Extractions:
  .5: Lbl(Fab Four, novel)
  .7: Lbl(Fab Four, musician)
  .9: Lbl(Beatles, musician)
  .8: Rel(Beatles, AlbumArtist, Abbey Road)
Ontology:
  Dom(albumArtist, musician)
  Mut(novel, musician)
Entity Resolution:
  SameEnt(Fab Four, Beatles)
[Figure: (Annotated) Extraction Graph with nodes Fab Four, Beatles, musician, novel, Abbey Road, and a SameEnt link between Fab Four and Beatles]
[Figure: After Knowledge Graph Identification, with Beatles and Fab Four, Rel(AlbumArtist) to Abbey Road, and Lbl musician]
PUJARA+ISWC13; PUJARA+AIMAG15

Probabilistic graphical model for KG
[Figure: graphical model whose variables are the candidate facts]
  Rel(Beatles, AlbumArtist, Abbey Road)
  Lbl(Beatles, novel)
  Lbl(Beatles, musician)
  Lbl(Fab Four, musician)
  Lbl(Fab Four, novel)
  Rel(Fab Four, AlbumArtist, Abbey Road)

Defining graphical models
• Many options for defining a graphical model
• We focus on two approaches, MLNs and PSL, that use rules
  ◦ MLNs treat facts as Boolean, use sampling for satisfaction
  ◦ PSL infers a “truth value” for each fact via optimization

Rules for KG Model
100: Subsumes(L1,L2) & Label(E,L1) -> Label(E,L2)
100: Exclusive(L1,L2) & Label(E,L1) -> !Label(E,L2)
100: Inverse(R1,R2) & Relation(R1,E,O) -> Relation(R2,O,E)
100: Subsumes(R1,R2) & Relation(R1,E,O) -> Relation(R2,E,O)
100: Exclusive(R1,R2) & Relation(R1,E,O) -> !Relation(R2,E,O)
100: Domain(R,L) & Relation(R,E,O) -> Label(E,L)
100: Range(R,L) & Relation(R,E,O) -> Label(O,L)
10: SameEntity(E1,E2) & Label(E1,L) -> Label(E2,L)
10: SameEntity(E1,E2) & Relation(R,E1,O) -> Relation(R,E2,O)
1: Label_OBIE(E,L) -> Label(E,L)
1: Label_OpenIE(E,L) -> Label(E,L)
1: Relation_Pattern(R,E,O) -> Relation(R,E,O)
1: !Relation(R,E,O)
1: !Label(E,L)
JIANG+ICDM12; PUJARA+ISWC13; PUJARA+AIMAG15
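To make one of these rule templates concrete, the sketch below grounds the domain rule `Domain(R,L) & Relation(R,E,O) -> Label(E,L)` against a handful of constants from the Fab Four example. The tuple encoding and the `ground_domain_rule` helper are assumptions for illustration only; MLN and PSL implementations handle grounding internally and far more generally.

```python
from itertools import product


def ground_domain_rule(domains, relations):
    """Ground Domain(R,L) & Relation(R,E,O) -> Label(E,L).

    domains:   iterable of (relation, label) pairs from the ontology
    relations: iterable of (relation, subject, object) candidate facts
    Returns a list of (body_atoms, head_atom) ground rules.
    """
    groundings = []
    for (dom_rel, label), (rel, subj, obj) in product(domains, relations):
        if dom_rel == rel:  # the shared variable R must bind consistently
            body = [("Domain", dom_rel, label), ("Relation", rel, subj, obj)]
            head = ("Label", subj, label)
            groundings.append((body, head))
    return groundings


# Hypothetical inputs drawn from the Fab Four example.
domains = [("AlbumArtist", "musician")]
relations = [
    ("AlbumArtist", "Beatles", "Abbey Road"),
    ("AlbumArtist", "Fab Four", "Abbey Road"),
]
groundings = ground_domain_rule(domains, relations)
for body, head in groundings:
    print(body, "->", head)
```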

Rules to Distributions
• Rules are grounded by substituting literals into formulas
  w_r : SameEnt(Fab Four, Beatles) ∧ Lbl(Beatles, musician) ⇒ Lbl(Fab Four, musician)
• Each ground rule has a weighted satisfaction derived from the formula’s truth value
  $P(G \mid E) = \frac{1}{Z} \exp\left[ \sum_{r \in R} w_r \, \phi_r(G, E) \right]$
• Together, the ground rules provide a joint probability distribution over knowledge graph facts, conditioned on the extractions
JIANG+ICDM12; PUJARA+ISWC13
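The weighted-satisfaction form can be sketched in the Boolean (MLN-style) reading by brute-force enumeration over two candidate facts. All weights and the choice of ground rules below are hypothetical, picked only to illustrate the formula; real systems use sampling or optimization, since enumeration is exponential in the number of facts.

```python
import math
from itertools import product

# Candidate facts; a "world" is a tuple of Booleans in this order.
FACTS = ["Lbl(Beatles, musician)", "Lbl(Fab Four, musician)"]

# Hypothetical ground rules as (weight, satisfied(world)).
# SameEnt(Fab Four, Beatles) and the extractor evidence are taken as observed true.
GROUND_RULES = [
    (10.0, lambda w: (not w[0]) or w[1]),  # SameEnt & Lbl(Beatles, m) => Lbl(Fab Four, m)
    (3.0, lambda w: w[0]),                 # extractor evidence => Lbl(Beatles, m)
    (1.0, lambda w: not w[0]),             # prior: !Lbl(Beatles, m)
    (1.0, lambda w: not w[1]),             # prior: !Lbl(Fab Four, m)
]


def score(world):
    """Sum of the weights of satisfied ground rules, i.e. sum_r w_r * phi_r."""
    return sum(weight for weight, satisfied in GROUND_RULES if satisfied(world))


def distribution():
    """P(G | E) = exp(score(G)) / Z over all Boolean worlds."""
    worlds = list(product([False, True], repeat=len(FACTS)))
    unnormalized = {w: math.exp(score(w)) for w in worlds}
    z = sum(unnormalized.values())
    return {w: v / z for w, v in unnormalized.items()}


probs = distribution()
most_likely = max(probs, key=probs.get)
print(dict(zip(FACTS, most_likely)))
```

With these assumed weights, the world in which both labels hold scores highest, because the strongly weighted entity-resolution rule carries the Beatles label over to Fab Four.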

Probability Distribution over KGs
$P(G \mid E) = \frac{1}{Z} \exp\left[ -\sum_{r \in R} w_r \, \phi_r(G) \right]$
With the negative sign, $\phi_r(G)$ acts as a penalty: the distance of ground rule r from satisfaction, as in PSL.
Example ground rules:
  CandLbl_T(FabFour, novel) ⇒ Lbl(FabFour, novel)
  Mut(novel, musician) ∧ Lbl(Beatles, novel) ⇒ ¬Lbl(Beatles, musician)
  SameEnt(Beatles, FabFour) ∧ Lbl(Beatles, musician) ⇒ Lbl(FabFour, musician)
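The three ground rules above can be scored in the soft-truth (PSL-style) reading with the Łukasiewicz relaxation, where a rule `body ⇒ head` has distance to satisfaction `max(0, body - head)`. In this sketch the weights, the evidence values, and the use of a linear (unsquared) penalty are all assumptions for illustration, and the function names are hypothetical.

```python
import math


def l_and(a, b):
    """Lukasiewicz conjunction of two truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def l_not(a):
    """Lukasiewicz negation."""
    return 1.0 - a


def distance(body, head):
    """Distance to satisfaction of body => head."""
    return max(0.0, body - head)


def total_penalty(x, cand_novel=0.5, same_ent=1.0, mut=1.0):
    """Weighted sum of distances, sum_r w_r * phi_r(G); weights are hypothetical."""
    rules = [
        # CandLbl_T(FabFour, novel) => Lbl(FabFour, novel)
        (1.0, distance(cand_novel, x["Lbl(FabFour, novel)"])),
        # Mut(novel, musician) & Lbl(Beatles, novel) => !Lbl(Beatles, musician)
        (100.0, distance(l_and(mut, x["Lbl(Beatles, novel)"]),
                         l_not(x["Lbl(Beatles, musician)"]))),
        # SameEnt(Beatles, FabFour) & Lbl(Beatles, musician) => Lbl(FabFour, musician)
        (10.0, distance(l_and(same_ent, x["Lbl(Beatles, musician)"]),
                        x["Lbl(FabFour, musician)"])),
    ]
    return sum(weight * dist for weight, dist in rules)


def unnormalized_density(x):
    """exp(-sum_r w_r * phi_r(G)); the normalizer Z is omitted."""
    return math.exp(-total_penalty(x))


consistent = {
    "Lbl(FabFour, novel)": 0.0,
    "Lbl(Beatles, novel)": 0.0,
    "Lbl(Beatles, musician)": 0.9,
    "Lbl(FabFour, musician)": 0.9,
}
# Same assignment, but the musician label is not carried over to Fab Four.
inconsistent = dict(consistent, **{"Lbl(FabFour, musician)": 0.0})
print(total_penalty(consistent), total_penalty(inconsistent))
```

The assignment that propagates the musician label across the resolved entity incurs the smaller penalty and therefore the higher density, which is the behavior PSL inference exploits when it optimizes over truth values.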
