Knowledge Vault: a web-scale approach to probabilistic knowledge fusion
Luna Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, Wei Zhang
Google (Machine Intelligence group)
KV @ KDD 2014

Outline of the talk
1. Knowledge Graph
2. Knowledge Vault
3. Fact mining from the web
4. Fact mining from graphs
5. Knowledge Fusion

A Knowledge Graph is a multi-graph where nodes = entities, edges = relations
[Figure: example graph with entity nodes NY Knicks, LA Lakers, Kobe Bryant and Pau Gasol, connected by relation edges opponent, teamInLeague, playFor, playInLeague and teammate]
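The multi-graph view can be sketched as a collection of (subject, relation, object) triples. This is only an illustration: the entity and relation names come from the slide, but which pairs each edge connects, and the "NBA" league node, are assumptions because the figure itself did not survive.

```python
from collections import defaultdict

# Each edge is a (subject, relation, object) triple.
# The pairings below are illustrative assumptions, not read off the figure.
TRIPLES = [
    ("Kobe Bryant", "playFor", "LA Lakers"),
    ("Pau Gasol", "playFor", "LA Lakers"),
    ("Kobe Bryant", "teammate", "Pau Gasol"),
    ("LA Lakers", "opponent", "NY Knicks"),
    ("LA Lakers", "teamInLeague", "NBA"),
    ("NY Knicks", "teamInLeague", "NBA"),
]


def build_graph(triples):
    """Index triples as graph[subject][relation] -> set of objects."""
    graph = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in triples:
        graph[subject][relation].add(obj)
    return graph


graph = build_graph(TRIPLES)
# A multi-graph allows several differently labelled edges out of one node.
print(sorted(graph["Kobe Bryant"]))             # relation labels on Kobe's edges
print(sorted(graph["Kobe Bryant"]["playFor"]))  # objects of the playFor edge
```

Keying on the relation label is what lets two nodes be linked by more than one edge, which a plain adjacency set would collapse.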

Example Knowledge Graphs
• Google’s KG
• Facebook’s Entity Graph
• Microsoft’s Satori
• Walmart’s Kosmix

Freebase (FB) is created by fusing structured data sources and human contributions
[Figure: sources including MusicBrainz, Wikipedia, TVDB and Geo feed FB, which covers companies, products, people, movies and places]
KV Talk at KDD, New York, August 25, 2014

The long tail of knowledge
• FB is very large (40M nodes, 637M edges)
• But it is still very incomplete:
• We are missing many edges (facts); this is the focus of this talk

Relation          % unknown in Freebase
Profession        68%
Place of birth    71%
Nationality       75%
Education         91%
Spouse            92%
Parents           94%

• We are also missing many nodes (entities)
• We are also missing many edge types (schema)

From Knowledge Graph to Knowledge Vault
• There are many groups at Google working on enlarging KG while maintaining high precision.
• KV is an exploratory research project to investigate other points along the precision-recall curve.
• KV automatically extracts facts from public web sources.
• KV embraces the inherent uncertainty associated with this process (every fact has associated confidence and provenance info).

Previous projects on automatically building KBs (e.g., NELL, YAGO) predict facts based on text
Candidate fact: <Kobe Bryant, playFor, LA Lakers>? Scored as Pr(<s, r, o>=1|D), where D is text such as:
“Kobe Bryant, the franchise player of the Lakers”
“Kobe once again saved his team”
“Kobe Bryant man of the match for Los Angeles”

KV: Predict new facts based on text AND existing edges in FB
Candidate fact: <Kobe Bryant, playFor, LA Lakers>? Scored as Pr(<s, r, o>=1|D), where D combines:
• Text evidence:
“Kobe Bryant, the franchise player of the Lakers”
“Kobe once again saved his team”
“Kobe Bryant man of the match for Los Angeles”
• Existing graph edges
[Figure: graph over NY Knicks, LA Lakers, Pau Gasol and Kobe Bryant with relations opponent, teamInLeague, playFor, playInLeague and teammate]

[Figure: KV components: Web, Extractors, Priors, Fusion]

KV is 50x bigger than comparable KBs
• Total # facts in KV > 2.5B
• 381M with Prob > 0.7
• 302M with Prob > 0.9
• Open IE (e.g., Mausam et al., 2012): 5B assertions (Mausam, Michael Schmitz, personal communication, October 2013)

Uses for KV's uncertain triples
• Probably false: triples removed from KG
• Possibly false: triples used for error analysis
• Possibly true: triples used as weak signals
• Probably true: triples uploaded to KG

Fact extraction from the web
[Figure: four web content types (Webmaster annotations, Tables, NL text, Page structure) feed the Extractors, whose outputs go to Fusion]

Fact extraction from text (TXT)
• First identify named entities (entity linkage).
• Then classify verb phrase as one of 2000 relations.
Example: “Patrick Newport, who has been working at IHS Global Insight, noted...”
The mentions are tagged PER and ORG and linked to entity ids, giving /m/201 /people/person/employment /m/101.
The result is a probabilistic triple: Pr(<subject, reln, object>=1 | text)
Classifier trained using distant supervision.*
Details: see e.g. the tutorial by Ralph Grishman (NYU): “Information Extraction: Capabilities and Challenges”, 2012
* Mintz et al, RANLP 2009
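Distant supervision, as named on this slide, can be sketched as labelling each sentence by looking up its linked entity pair in an existing KB. The toy KB, the second sentence, the "Acme Corp" entity and the NO_RELATION label are all hypothetical; treating unknown pairs as negatives is a simplifying assumption, not necessarily what KV does.

```python
# Toy KB of known facts: (subject, object) -> relation. Hypothetical data.
KB = {
    ("Patrick Newport", "IHS Global Insight"): "/people/person/employment",
}

# Sentences whose entity mentions have already been linked (hypothetical).
SENTENCES = [
    ("Patrick Newport", "IHS Global Insight",
     "Patrick Newport, who has been working at IHS Global Insight, noted..."),
    ("Patrick Newport", "Acme Corp",
     "Patrick Newport criticized the Acme Corp forecast."),
]


def label_by_distant_supervision(sentences, kb):
    """Attach a training label to each sentence from the KB, not from humans."""
    examples = []
    for subject, obj, text in sentences:
        # Assumption: pairs absent from the KB are treated as negatives.
        relation = kb.get((subject, obj), "NO_RELATION")
        examples.append((text, subject, obj, relation))
    return examples


examples = label_by_distant_supervision(SENTENCES, KB)
for text, subject, obj, relation in examples:
    print(relation, "<-", text)
```

The labelled sentences would then serve as noisy training data for the relation classifier.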

Fact extraction from DOM trees*
• First identify named entities on page
• Then classify the XPath connecting each entity pair as one of 2000 relations
* Cafarella et al, CACM’11

Fact extraction from tables (TBL)*
Squares are CVT nodes
* Cafarella et al, VLDB’08

Fact extraction from schema.org annotation (ANO)
<script type="application/ld+json"> {"@context" : "http://www.schema.org", "@type" : "Event", "startDate" : "2014-07-26", ...} </script>
• About 20% of webpages have machine-readable annotations of commercial events, products, etc.
• Automatically map to KG schema.
• We still need to do entity linking.
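A minimal sketch of reading such an annotation with the Python standard library. The elided fields ("...") are left out so the JSON parses, and the `subject_id` argument is a hypothetical placeholder for the entity that linking would still have to supply.

```python
import json
import re

# The slide's annotation, without its elided fields.
HTML = '''<script type="application/ld+json">
{"@context": "http://www.schema.org", "@type": "Event", "startDate": "2014-07-26"}
</script>'''

SCRIPT_RE = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def extract_annotations(html):
    """Return every JSON-LD object embedded in the page as a dict."""
    return [json.loads(body) for body in SCRIPT_RE.findall(html)]


def to_triples(subject_id, annotation):
    """Turn non-keyword properties into (subject, property, value) triples."""
    return [
        (subject_id, key, value)
        for key, value in annotation.items()
        if not key.startswith("@")  # skip JSON-LD keywords such as @type
    ]


annotations = extract_annotations(HTML)
triples = to_triples("event_1", annotations[0])  # "event_1" is hypothetical
print(annotations[0]["@type"], triples)
```

A regular expression is enough for this toy page; a real crawler would more likely use an HTML parser.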

Combine outputs from all extractors
• Train binary classifier on f(t) = [score-txt(t), #txt(t), … ] using distant supervision.
• Platt scaling to get calibrated probabilities.
[Figure: Webmaster annotations, Tables, NL text and Page structure feed the Extractors, whose outputs go to Fusion]
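Platt scaling fits a sigmoid that maps a classifier's raw score to a probability. The sketch below fits p = sigmoid(a * score + b) by plain gradient descent on the log loss; the scores and labels are hypothetical, and it omits the regularized targets of Platt's original formulation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores, labels, lr=0.1, steps=5000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for score, label in zip(scores, labels):
            err = sigmoid(a * score + b) - label
            grad_a += err * score
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Hypothetical raw classifier scores with distant-supervision labels.
scores = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]
labels = [0, 0, 1, 0, 1, 1, 1, 1]

a, b = fit_platt(scores, labels)
calibrated = [sigmoid(a * s + b) for s in scores]
print([round(p, 2) for p in calibrated])
```

Because the mapping is monotone, calibration changes the confidence values but not the ranking of triples.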

ROC for each extraction system

Confidence of true facts rises given more evidence

Mining facts from graphs
[Figure: KV components: Web, Extractors, Priors, Fusion]

Link prediction using tensor factorization
• Many methods have been used to fill in missing values in binary matrices; e.g., tensor factorization associates a low-dimensional vector with every row and column.
[Figure: the example graph over NY Knicks, LA Lakers, Kobe Bryant and Pau Gasol, with an edge's value written as a three-way inner product < , , > of vectors]
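The three-way inner product on this slide can be sketched as a sum over latent dimensions of the product of subject, relation and object vector entries. The 3-dimensional vectors are made-up values, and squashing the score through a sigmoid is an assumption about how a probability would be obtained.

```python
import math


def trilinear_score(e_subject, w_relation, e_object):
    """<e_s, w_r, e_o>: sum over k of e_s[k] * w_r[k] * e_o[k]."""
    return sum(s * r * o for s, r, o in zip(e_subject, w_relation, e_object))


def edge_probability(e_subject, w_relation, e_object):
    """Assumed link function: squash the score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-trilinear_score(e_subject, w_relation, e_object)))


# Hypothetical 3-d embeddings; a real model learns these from observed edges.
kobe = [1.0, 0.5, -0.2]
lakers = [0.9, 0.4, 0.1]
knicks = [-0.8, 0.3, 0.6]
play_for = [2.0, 1.0, 0.5]

p_lakers = edge_probability(kobe, play_for, lakers)
p_knicks = edge_probability(kobe, play_for, knicks)
print(round(p_lakers, 3), round(p_knicks, 3))
```

With these values the playFor edge to LA Lakers scores higher than the one to NY Knicks, which is the behaviour a trained factorization would aim for on a true versus a false edge.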

(Deep) neural network for link prediction
• Represent each entity and relation by its own low-dimensional (100D) embedding vector.
• Stack together, feed into neural net.
• Train model to maximize log-likelihood of observed positive and negative triples.
• Outperforms neural tensor model (Socher et al).
[Figure: embeddings for a triple such as (Kobe Bryant, playFor, LA Lakers) feed a network with 2 hidden layers, shown beside the example graph over NY Knicks, LA Lakers, Pau Gasol, Kobe Bryant and NBA]
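A forward-pass sketch of the architecture described here: concatenate the three embeddings, pass them through two hidden layers, and read off a probability. The weights are random and untrained, and the tiny dimensions, tanh activations and layer widths are assumptions made to keep the example small.

```python
import math
import random

random.seed(0)
DIM = 4     # the slide uses 100-d embeddings; 4 keeps the toy small
HIDDEN = 5  # hypothetical hidden-layer width


def rand_vec(n):
    return [random.uniform(-0.5, 0.5) for _ in range(n)]


def rand_mat(rows, cols):
    return [rand_vec(cols) for _ in range(rows)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(weights, bias, x, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


embeddings = {name: rand_vec(DIM)
              for name in ["Kobe Bryant", "playFor", "LA Lakers", "NY Knicks"]}
W1, b1 = rand_mat(HIDDEN, 3 * DIM), [0.0] * HIDDEN
W2, b2 = rand_mat(HIDDEN, HIDDEN), [0.0] * HIDDEN
W_out, b_out = rand_mat(1, HIDDEN), [0.0]


def predict(subject, relation, obj):
    """Probability that the triple holds, under the current weights."""
    x = embeddings[subject] + embeddings[relation] + embeddings[obj]  # stack
    h1 = dense(W1, b1, x, math.tanh)
    h2 = dense(W2, b2, h1, math.tanh)
    return dense(W_out, b_out, h2, sigmoid)[0]


def log_likelihood(labelled_triples):
    """Training objective: log-likelihood of positive and negative triples."""
    total = 0.0
    for subject, relation, obj, label in labelled_triples:
        p = predict(subject, relation, obj)
        total += math.log(p) if label == 1 else math.log(1.0 - p)
    return total


data = [("Kobe Bryant", "playFor", "LA Lakers", 1),
        ("Kobe Bryant", "playFor", "NY Knicks", 0)]
print(predict("Kobe Bryant", "playFor", "LA Lakers"), log_likelihood(data))
```

Training would adjust both the embeddings and the layer weights to increase `log_likelihood`; that step is left out here.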

Path Ranking Algorithm [Lao et al., EMNLP11]
Query: CityLocatedInCountry(Pittsburgh) = ?
[Figure: graph with countries U.S. and Japan, state Pennsylvania, cities Pittsburgh, Philadelphia, Harrisburg, Atlanta, Dallas and Tokyo, and companies PPG and Delta, linked by CityLocatedInCountry, AtLocation and related edges; path features feed a logistic regression]

Logistic Regression
Feature = Typed Path                                    Feature Value   Weight
CityInState, CityInState^-1, CityLocatedInCountry       0.8             0.32
AtLocation^-1, AtLocation, CityLocatedInCountry         0.6             0.20
…                                                       …               …

CityLocatedInCountry(Pittsburgh) = U.S., p = 0.58
Figure courtesy of Tom Mitchell and Partha Talukdar
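A path feature in PRA can be sketched as the probability that a random walk from the source, constrained to follow the path's relation sequence, ends at the target. The toy graph below is an assumption loosely modelled on the figure, and the final logistic-regression step uses a single feature and no bias, so it is not expected to reproduce the slide's p = 0.58.

```python
import math
from collections import defaultdict

# Toy graph; the edges are assumptions loosely modelled on the figure.
EDGES = [
    ("Pittsburgh", "CityInState", "Pennsylvania"),
    ("Philadelphia", "CityInState", "Pennsylvania"),
    ("Harrisburg", "CityInState", "Pennsylvania"),
    ("Philadelphia", "CityLocatedInCountry", "U.S."),
    ("Harrisburg", "CityLocatedInCountry", "U.S."),
]


def build_index(edges):
    """Map (node, relation) -> neighbours, adding inverse edges as 'rel^-1'."""
    index = defaultdict(set)
    for subject, relation, obj in edges:
        index[(subject, relation)].add(obj)
        index[(obj, relation + "^-1")].add(subject)
    return index


def path_probability(index, source, path, target):
    """Probability that a uniform random walk along `path` ends at `target`."""
    dist = {source: 1.0}
    for relation in path:
        nxt = defaultdict(float)
        for node, prob in dist.items():
            neighbours = index.get((node, relation), ())
            for neighbour in neighbours:
                nxt[neighbour] += prob / len(neighbours)
        dist = nxt  # mass at nodes with no matching edge is dropped
    return dist.get(target, 0.0)


def pra_score(feature_values, weights, bias=0.0):
    """Logistic regression over path features."""
    z = bias + sum(w * v for w, v in zip(weights, feature_values))
    return 1.0 / (1.0 + math.exp(-z))


index = build_index(EDGES)
path = ["CityInState", "CityInState^-1", "CityLocatedInCountry"]
feature = path_probability(index, "Pittsburgh", path, "U.S.")
score = pra_score([feature], [0.32])  # 0.32 is the slide's weight for this path
print(feature, score)
```

Here the walk goes Pittsburgh, Pennsylvania, then one of the three Pennsylvania cities; two of them have a known country, so the feature value is 2/3.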

Example of paths / rules learned by PRA
CityLocatedInCountry(city, country): 7 of the 2985 learned paths

Weight   Path
8.04     cityliesonriver, cityliesonriver^-1, citylocatedincountry
5.42     hasofficeincity^-1, hasofficeincity, citylocatedincountry
4.98     cityalsoknownas, cityalsoknownas, citylocatedincountry
2.85     citycapitalofcountry, citylocatedincountry^-1, citylocatedincountry
2.29     agentactsinlocation^-1, agentactsinlocation, citylocatedincountry
1.22     statehascapital^-1, statelocatedincountry
0.66     citycapitalofcountry

Figure courtesy of Tom Mitchell and Partha Talukdar

PRA similar in performance to neural network

[Figure: KV components: Web, Extractors, Priors, Fusion]

Fusing web extractions with graph priors

Example: (Barry Richter, studiedAt, UW-Madison)
“In the fall of 1989, Richter accepted a scholarship to the University of Wisconsin, where he played for four years and earned numerous individual accolades ...”
“The Polar Caps' cause has been helped by the impact of knowledgeable coaches such as Andringa, Byce and former UW teammates Chris Tancill and Barry Richter.”
→ Web extraction confidence: 0.14
<Barry Richter, born in, Madison>
<Barry Richter, lived in, Madison>
→ Final belief (fused with prior): 0.61

Summary and future work
• KV has 2.5B triples automatically extracted from the web.
• Combining web mining and graph mining can improve precision.
• Work in progress
  - Discovering new entities
    • Clustering open IE extractions, CIKM 2014
    • Robust wrapper induction for long-tail verticals (work in progress)
  - Discovering new relations
    • Clustering open IE extractions, CIKM 2014
    • “Biperpedia”, VLDB 2014
  - Assessing trustworthiness of web sites: VLDB 2014
  - Common sense fact mining, e.g. “apples” (work in progress)

EXTRA SLIDES

Application 1: Knowledge Panels
Augmenting the presentation with relevant facts

Application 2: Related Entities
