tutorial overview
play

Tutorial Overview https://kgtutorial.github.io Part 1: Knowledge - PowerPoint PPT Presentation

Tutorial Overview https://kgtutorial.github.io Part 1: Knowledge Graphs Part 2: Part 3: Knowledge Graph Extraction Construction Part 4: Critical Analysis 1 Tutorial Outline 1. Knowledge Graph Primer [Jay] 2. Knowledge Extraction Primer


  1. Tutorial Overview https://kgtutorial.github.io Part 1: Knowledge Graphs Part 2: Part 3: Knowledge Graph Extraction Construction Part 4: Critical Analysis 1

  2. Tutorial Outline 1. Knowledge Graph Primer [Jay] 2. Knowledge Extraction Primer [Jay] Coffee Break 3. Knowledge Graph Construction a. Probabilistic Models [Jay] b. Embedding Techniques [Sameer] 4. Critical Overview and Conclusion [Sameer] 2

  3. Critical Overview SUMMARY SUCCESS STORIES DATASETS, TASKS, SOFTWARES EXCITING RESEARCH DIRECTIONS 3

  4. Critical Overview SUMMARY SUCCESS STORIES DATASETS, TASKS, SOFTWARES EXCITING RESEARCH DIRECTIONS 4

  5. Why do we need Knowledge graphs? •Humans can explore large database in intuitive ways •AI agents get access to human common sense knowledge 5

  6. Knowledge graph construction A 1 E 1 • Who are the entities A 2 (nodes) in the graph? • What are their attributes E 2 and types (labels)? A 1 A 2 • How are they related E 3 (edges)? A 1 A 2 6

  7. Knowledge Graph Construction Extraction Knowledge Knowledge Graph graph Text Extraction Construction graph 7

  8. Two perspectives Extraction graph Knowledge graph • • Who are the entities? Named Entity Entity Linking • (nodes) Recognition Entity Resolution • Entity Coreference • • What are their Entity Typing Collective attributes? (labels) classification • • How are they related? Semantic role Link prediction (edges) labeling • Relation Extraction 8

  9. Knowledge Extraction John was born in Liverpool, to Julia and Alfred Lennon. Text NLP Lennon.. Mrs. Lennon.. his father the Pool John Lennon... .. his mother .. Alfred he Location Person Person Person John was born in Liverpool, to Julia and Alfred Lennon. Annotated text NNP VBD VBD IN NNP TO NNP CC NNP NNP Extraction graph Information Alfred Extraction Lennon childOf birthplace John Liverpool Lennon Julia childOf Lennon 9

  10. Information Extraction Single extractor Supervised Defining domain Learning extractors Semi-supervised Scoring candidate facts Unsupervised Fusing multiple extractors 10

  11. Knowledge Graph Construction Part 2: Extraction Part 3: Knowledge Knowledge Graph graph Text graph Extraction Construction 11

  12. Issues with Extraction Graph Extracted knowledge could be: • ambiguous • incomplete • inconsistent 12

  13. Two approaches for KG construction PROBABILISTIC MODELS EMBEDDING BASED MODELS 13

  14. Two approaches for KG construction PROBABILISTIC MODELS EMBEDDING BASED MODELS 14

  15. Two classes of Probabilistic Models GRAPHICAL MODEL BASED RANDOM WALK BASED ◦ Possible facts in KG are ◦ Possible facts posed as variables queries ◦ Logical rules relate facts ◦ Random walks of the KG constitute “proofs” ◦ Probability path ◦ Probability satisfied lengths/transitions rules ◦ Local grounding ◦ Universal-quantification 15

  16. Illustration of KG Identification (Annotated) Extraction Graph Uncertain Extractions: SameEnt .5: Lbl(Fab Four, novel) Fab Four Beatles .7: Lbl(Fab Four, musician) .9: Lbl(Beatles, musician) .8: Rel(Beatles,AlbumArtist, Abbey Road) Ontology: musician Dom(albumArtist, musician) Mut(novel, musician) novel Entity Resolution: Abbey Road SameEnt(Fab Four, Beatles) After Knowledge Graph Identification Beatles Rel(AlbumArtist ) Lbl Abbey Road musician Fab Four PUJARA+ISWC13; PUJARA+AIMAG15

  17. Random Walk Illustration Query: R(Lennon, PlaysInstrument, ?) 17

  18. Two approaches for KG construction PROBABILISTIC MODELS EMBEDDING BASED MODELS 18

  19. Why embeddings? Limitations of probabilistic models Embedding based models Limitation to Logical Relations • Representation restricted by manual design • Everything as dense vectors • Clustering? Asymmetric implications? • Captures many relations • Information flows through these relations • Learned from data • Difficult to generalize to unseen entities/relations • Can generalize to unseen entities and relations Computational Complexity of Algorithms • Efficient inference at large scale • • Learning is NP-Hard, difficult to approximate Learning using stochastic • Query-time inference is also NP-Hard gradient, back-propagation • • Not easy to parallelize, or use GPUs Querying is often cheap • • Scalability is badly affected by representation GPU-parallelism friendly

  20. Relation Embeddings 20

  21. Part 1: Knowledge Graphs Part 2: Part 3: Knowledge Graph Extraction Construction 21

  22. Critical Overview SUMMARY SUCCESS STORIES DATASETS, TASKS, SOFTWARES EXCITING RESEARCH DIRECTIONS 22

  23. Success stories YAGO 23

  24. Success story: OpenIE (ReVerb) openie.allenai.org 24

  25. Success story: NELL 25

  26. Success story: YAGO • Input: Wikipedia infoboxes, WordNet and GeoNames • Output: KG with 350K entity types, 10M entities, 120M facts • Temporal and spatial information 26

  27. Link 27

  28. Success story • DBPedia is automatically extracted structured data from Wikipedia • 17M canonical entities • 88M type statements • 72M infobox statements 28

  29. DeepDive • Machine learning based extraction system • Best Precision/recall/F1 in KBP-slot filling task 2014 evaluations (31 teams participated) 29

  30. IE systems in practice Defining Learning Scoring Fusing domain extractors candidate extractors facts ConceptNet NELL Heuristic rules Knowledge Classifier Vault OpenIE 30

  31. Critical Overview SUMMARY SUCCESS STORIES DATASETS, TASKS, SOFTWARES EXCITING RESEARCH DIRECTIONS 31

  32. Datasets • KG as datasets • FB15K-237 Knowledge base completion dataset based on Freebase 1 • DBPedia Structured data extracted from Wikipedia • NELL Read the web datasets • AristoKB Tuple knowledge base for Science domain • Text datasets • Clueweb09: 1 billion webpages (sample of Web) • FACC1: Freebase Annotations of the Clueweb09 Corpora • Gigaword: automatically-generated syntactic and discourse structure • NYTimes: The New York Times Annotated Corpus • Datasets related to Semi-supervised learning for information extraction Link: entity typing, concept discovery, aligning glosses to KB, multi-view learning 1 see Dettmers et al, 2017 for details (https://arxiv.org/pdf/1707.01476.pdf) 32

  33. Shared tasks • Text Analysis Conference on Knowledge Base Population (TAC KBP) • Slot filling task • Cold Start KBP Track • Tri-Lingual Entity Discovery and Linking Track (EDL) • Event Track • Validation/Ensembling Track 33

  34. Software: NLP • Stanford CoreNLP: a suite of core NLP tools [link] (Java code) • FIGER: fine-grained entity recognizer assigns over 100 semantic types link (Java code) • FACTORIE: out-of-the-box tools for NLP and information integration link (Scala code) • EasySRL: Semantic role labeling link (Java code) 34

  35. Software: Extracting and Reasoning • Open IE (University of Washington) Open IE 4.2 link (Scala code) Stanford Open IE link (Java code) • Interactive Knowledge Extraction (IKE) (Allen Institute for Artificial Intelligence) link (Scala code) • PSL: Probabilistic soft logic link (Java code) • ProPPR: Programming with Personalized PageRank link (Java code) 35

  36. Critical Overview SUMMARY SUCCESS STORIES DATASETS, TASKS, SOFTWARES EXCITING RESEARCH DIRECTIONS 36

  37. Exciting Active Research • INTERESTING APPLICATIONS OF KG • MULTI-MODAL INFORMATION EXTRACTION • KNOWLEDGE AS SUPERVISION • COMMON KNOWLEDGE 37

  38. Exciting Active Research • INTERESTING APPLICATIONS OF KG • MULTI-MODAL INFORMATION EXTRACTION • KNOWLEDGE AS SUPERVISION • COMMON KNOWLEDGE 38

  39. Interesting application of Knowledge Graphs The Literome Project [link] • Automatic system for extracting genomic knowledge from PubMed articles • Web-accessible knowledge base Literome: PubMed-Scale Genomic Knowledge Base in the Cloud , Hoifung Poon et al., Bioinformatics 2014 39

  40. Interesting application of Knowledge Graphs Chronic disease management: develop AI technology for predictive and preventive personalized medicine to reduce the national healthcare expenditure on chronic diseases (90% of total cost) 40

  41. Exciting Active Research • INTERESTING APPLICATIONS OF KG • MULTI-MODAL INFORMATION EXTRACTION • KNOWLEDGE AS SUPERVISION • COMMON KNOWLEDGE 41

  42. Knowledge Base Completion Scoring Function Table from Dettmers, et al. (2017)

  43. Multimodal KB Embeddings Scoring Function Encoder Object Entity Lookup Images CNN Text LSTM Numbers, etc. FeedFwd

  44. Exciting Active Research • INTERESTING APPLICATIONS OF KG • MULTI-MODAL INFORMATION EXTRACTION • KNOWLEDGE AS SUPERVISION • COMMON KNOWLEDGE 44

  45. Knowledge as Supervision ✔ spouseOf(Barack, Michelle) Learned Model Update Model Learning User Algorithm Problem 1: Each annotation takes time Problem 2: Each annotation is a drop in the ocean ✔ X husband of Y => spouseOf(X,Y) Learned Model Update Model Learning User Algorithm Many different options - Generalized Expectation - Posterior Regularization - Labeling functions in SNORKEL 45

  46. Exciting Active Research • INTERESTING APPLICATIONS OF KG • MULTI-MODAL INFORMATION EXTRACTION • KNOWLEDGE AS SUPERVISION • COMMON KNOWLEDGE 46

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend