Epistemological Databases Andrew McCallum Department of Computer - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Joint work with Sameer Singh, Michael Wick, Limin Yao, Sebastian Riedel, Karl Schultz, Aron Culotta.

Andrew McCallum

Department of Computer Science University of Massachusetts Amherst

Knowledge Base Construction with Epistemological Databases

slide-2
SLIDE 2

institutions, conferences, journals, grants, advisors,...

slide-3
SLIDE 3

Goal Application

  • Better tools → Accelerate progress of science.
  • Help...
  • find papers to read, to cite
  • find reviewers, collaborators, people to hire
  • understand trends and landscape of science
  • Platform for a “New Model of Publishing” [LeCun]
  • post to archive; public comments and ratings.

A KB of all scientists in the world

from papers, reports, web pages, newswire, press releases, blogs, patents,..

slide-4
SLIDE 4

Attributes of our Task

  • Open universe of entities (strong entity resolution essential)
  • not coref into a pre-known finite set, e.g. as in Wikipedia
  • Closed list of relation types (*later made “open” through “universal schema”)
  • not OpenIE
  • Low tolerance for error
  • users willing to edit
  • Changing world
  • e.g. new papers, people moving institutions,...

A KB of all scientists in the world

from papers, reports, web pages, newswire, press releases, blogs, patents,..

slide-5
SLIDE 5

Wei Li studies at Xinghua U. Her 2008 publications include

  • W. Li. “Scalable NLP” ACL, 2008.

Knowledge Base Construction

[Pipeline diagram: Text docs + Structured Data → Entity Extraction → Entity Mentions → Relation Extraction → Relation Mentions → Resolution (Coref) → Entities, Relations → KB (“truth”) → query answer]

Example: mentions “Wei Li,” “W. Li,” and “Xinghua U.” resolve to entities Wei Li and Xinghua U., yielding the relation Attends(Wei Li, Xinghua U.).

Information Extraction components aren’t perfect. Errors snowball: three ML stages at 90% accuracy each compose to roughly 72% end-to-end.
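The snowballing claim is just compounding probabilities: three independent stages at 90% accuracy each give about 0.9³ ≈ 0.73 end-to-end (the slide rounds to 72%). A minimal sketch:

```python
# Error snowballing in a staged pipeline: if each stage is right 90%
# of the time and stage errors are independent, end-to-end accuracy
# is the product of the per-stage accuracies.
def pipeline_accuracy(stage_accuracies):
    acc = 1.0
    for a in stage_accuracies:
        acc *= a
    return acc

# Entity extraction, relation extraction, coreference at 90% each:
print(round(pipeline_accuracy([0.9, 0.9, 0.9]), 3))  # 0.729
```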

slide-6
SLIDE 6

Knowledge Base Construction

[Pipeline diagram, now probabilistic: Text docs + Structured Data → p(Entity Mentions) → p(Relation Mentions) → p(Entities, Relations) → KB p(“truth”) → query answer]

  • 1. How to represent & inject uncertainty from IE into DB?
  • 2. Want to use DB contents to aid IE.
  • 3. IE isn’t “one-shot.” Add new data later; redo inference.

Want DB infrastructure to manage IE.

Joint Inference: a fundamental issue in all of Artificial Intelligence.

[POS & shallow parsing, ICML 2004] [Entity & Relation Extraction, ACL 2011] ...

slide-7
SLIDE 7

Knowledge Base Construction

[Same probabilistic pipeline diagram, now with Human Edits feeding in as an additional evidence stream.]

Epistemological Philosophy: “Truth is inferred, not observed.”

Human Edits as evidence [Wick, Schultz, McCallum 2012]
✘ Traditional: change the DB’s record of truth directly
✔ Instead, store a mini-document: “Nov 15: Scott said this was true”

  • Sometimes humans are wrong, disagree, or are out-of-date.
  • Jointly reason about truth & editors’ reliability/reputation.
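One way to picture “edits as evidence” rather than overwrites: keep every edit as a record and infer the fact’s truth jointly with editor reliability. The sketch below is illustrative only (the `Edit` class, the independent noisy-vote model, and all numbers are invented assumptions, not the model of [Wick, Schultz, McCallum 2012]); it shows two reliable editors outvoting one unreliable one:

```python
import math

# Sketch of "edits as evidence": every edit is stored as a record
# ("editor E asserted fact F"), and the fact's truth is inferred by
# weighing edits by editor reliability.
class Edit:
    def __init__(self, editor, asserts_true, reliability):
        self.editor = editor
        self.asserts_true = asserts_true  # did the editor say F holds?
        self.reliability = reliability    # P(this editor's edit is correct)

def posterior_true(edits, prior=0.5):
    # Treat each edit as an independent noisy vote on the fact.
    log_odds = math.log(prior / (1.0 - prior))
    for e in edits:
        # Likelihood ratio of this edit under F=true vs. F=false.
        p_true = e.reliability if e.asserts_true else 1.0 - e.reliability
        log_odds += math.log(p_true / (1.0 - p_true))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Two reliable editors agree; a barely-better-than-chance editor disagrees:
edits = [Edit("alice", True, 0.9), Edit("bob", True, 0.9),
         Edit("mallory", False, 0.55)]
print(posterior_true(edits) > 0.9)  # True: the fact is inferred true
```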

“Epistemological Database”

[2010, 2012]

slide-8
SLIDE 8

“Epistemological Database”

[Same pipeline diagram with Human Edits as evidence.]

Never Ending Inference [Riedel, Wick, McCallum 2012]
✘ KB entries locked in
✔ KB entries always reconsidered with more evidence, time, ...

inference constantly bubbling in background...

slide-9
SLIDE 9

“Epistemological Database”

[Same pipeline diagram with Human Edits as evidence.]

Resolution is foundational [KDD 2008; ACL 2012]
✘ Not just for coref of entity-mentions...
✔ Align values, ontologies, schemas, relations, events, ...

Especially in an Epistemological DB: entities/relations are never input directly, only “mentions.”

inference constantly bubbling in background...

slide-10
SLIDE 10

“Epistemological Database”

[Same pipeline diagram with Human Edits as evidence.]

Resource-bounded Information Gathering [WSDM 2012]
✘ Full processing on the whole web
✔ Focus queries and processing where needed & fruitful

inference constantly bubbling in background...

slide-11
SLIDE 11

“Epistemological Database”

[Same pipeline diagram with Human Edits as evidence.]

Smart Parallelism [ACL 2011; NIPS 2011]
✘ MapReduce, black-box
✔ Reason about inference & parallelism together

inference constantly bubbling in background, across many inference workers...

slide-12
SLIDE 12

“Epistemological Database”

[Same pipeline diagram with Human Edits as evidence.]

inference constantly bubbling in background, across many inference workers...

MCMC, parallel, distributed [ACL 2011; submitted 2012]
✘ Unroll the whole factor graph; limited model structures
✔ Focused sampling, conflict resolution, particle filtering

slide-17
SLIDE 17

Research Ingredients

  • 1. Learning: SampleRank
  • 2. Entity Resolution
  • 3. Human Edits
  • 4. Relations with “Universal Schema”
  • 5. Probabilistic Programming
slide-18
SLIDE 18

Entity Resolution

Parallel / Distributed

Interplay between modeling & efficiency

#2

slide-19
SLIDE 19

Entity Resolution

Entity resolution by CRF with pairwise factors

  • M. Smith

Michael Smith

slide-25
SLIDE 25

Entity Resolution

Entity resolution by CRF with pairwise factors

These two proposals can be evaluated (and accepted) in parallel, e.g. one on Machine 1 and one on Machine 2.
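The reason the two proposals are safe to run concurrently is that they touch disjoint entities, so they read and write disjoint sets of pairwise factors. A toy sketch (the affinity function, mention strings, and acceptance rule are invented for illustration, not the paper’s system):

```python
from concurrent.futures import ThreadPoolExecutor

def affinity(m1, m2):
    # Toy pairwise factor: +1 if the two mention strings share a token.
    return 1.0 if set(m1.split()) & set(m2.split()) else -1.0

def merge_delta(entity_a, entity_b):
    # Score change for merging two entities under pairwise factors:
    # only the newly created cross-entity pairs contribute.
    return sum(affinity(a, b) for a in entity_a for b in entity_b)

entities = [["M. Smith", "Michael Smith"], ["M Jones"],
            ["Wei Li"], ["W. Li"]]
proposals = [(0, 1), (2, 3)]   # disjoint: safe to evaluate in parallel

with ThreadPoolExecutor(max_workers=2) as pool:
    deltas = list(pool.map(lambda p: merge_delta(entities[p[0]],
                                                 entities[p[1]]), proposals))
accepted = [p for p, d in zip(proposals, deltas) if d > 0]
print(accepted)  # [(2, 3)]: only the (Wei Li, W. Li) merge improves the score
```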

slide-26
SLIDE 26

Entity Resolution in Parallel

[Diagram: a Distributor hands work to several Inference workers (the “map step”), then regathers results (the “reduce step”), implemented with Map-Reduce.]

[Singh, Subramanian, Pereira, McCallum, ACL, 2011]

slide-27
SLIDE 27

Parallelism = faster

slide-28
SLIDE 28

Distributed Entity Resolution

Entity resolution by CRF with pairwise factors, extended with a hierarchical structure:
Mention → Sub-Entity → Entity → Super-Entity

Super-entities infer good “data distribution.” Sub-entities infer good “block moves.”

Inference is used not only for “truth discovery,” but also simultaneously for “strategizing about data distribution.”

slide-29
SLIDE 29

Smart Parallelism = much faster

[Singh, Subramanian, Pereira, McCallum, ACL, 2011]

slide-30
SLIDE 30

Pair-based Coref

Mention Sub-Entity Entity Super-Entity

slide-32
SLIDE 32

Entity-based Coref

Mention Sub-Entity Entity Super-Entity


slide-34
SLIDE 34

Entity-based Coref

Mention Sub-Entity Entity Super-Entity

★ More efficient: fewer factors; avoids N².
★ Joint inference on all attributes of an entity (pairwise couldn’t).
★ 50k “Bill Clinton” mentions hidden under one sub-entity.
★ Avoids CRF problems with “changes in network cardinality.”
★ Better supports human edits.

[Wick, Schultz, McCallum, ACL, 2012]
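The “fewer factors; avoid N²” point is easy to quantify. Assuming (for illustration only) one factor per mention pair in the pairwise model versus one factor per parent-child link in the hierarchical model:

```python
# Back-of-envelope for "fewer factors; avoid N^2".
def pairwise_factors(n_mentions):
    # one factor per unordered mention pair
    return n_mentions * (n_mentions - 1) // 2

def hierarchical_factors(n_mentions, n_subentities):
    # each mention links to a sub-entity; each sub-entity to its entity
    return n_mentions + n_subentities

n = 50_000  # e.g. 50k "Bill Clinton" mentions under one entity
print(pairwise_factors(n))            # 1249975000
print(hierarchical_factors(n, 100))   # 50100
```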

slide-35
SLIDE 35

Hierarchical vs Pairwise Evaluation

[Plots: F1 accuracy versus running time (s), hierarchical vs. pairwise, on 145k mentions and on 1.3m mentions (single threaded).]

[Wick, Schultz, McCallum, ACL, 2012]

Currently: 80m mentions

papers, authors, institutions, venues

slide-36
SLIDE 36
Entity-based Coref for Wikipedia & Newswire

  • Combine structured data (Freebase & Wikipedia infoboxes)...
  • ...with unstructured text (NYTimes articles).

slide-37
SLIDE 37

Mention strings for one entity: “Robert G. Mugabe,” “Mugabe Robert,” “President Mugabe,” “Mr. Mugabe,” “Robert Mugabe,” “Bob Mugabe,” ...

#1 Pre-create an entity for each Wikipedia entity.
#2 Create sub-entities for different string forms from links & redirects.

Mr. [Moyo|PER] had shut down most of the nation’s private newspapers and amassed wide influence within the government before being implicated last month in a scheme to prevent [Joyce Mujuru|PER], a regional politician, from taking a vacant post as [Zimbabwe|LOC]’s vice president. Ms. [Mujuru|PER] was the choice of President [Robert G. Mugabe|PER], and she is currently running the country while he is on a vacation in [Malaysia|LOC].

#3 Extract entity mentions from NYTimes.
#4 Put the mentions into the model and perform inference in the hierarchical coref.

Currently: 100k Wikipedia entities, 20 years NYTimes

4m anchor texts, 300k unique mention strings

slide-38
SLIDE 38

Entity Resolution

Parallel / Distributed

Interplay between modeling & efficiency

#2

Open Questions (lots of juicy research at the ML+systems intersection)

  • Formalize asynchronous distributed MCMC.
  • How to select subset of variables for worker.
  • Get coref working for 10 billion mentions...
slide-39
SLIDE 39

Probabilistic Reasoning about Human Edits

Humans will want to correct DB, add to DB

#3

slide-40
SLIDE 40

Entity-based Coref

Mention Sub-Entity Entity Super-Entity


Pereira SRI Pereira Google

slide-42
SLIDE 42

Entity-based Coref

Mention Sub-Entity Entity Super-Entity


[Wick, Schultz, McCallum, AKBC, 2012]

Pereira SRI Pereira Google

slide-43
SLIDE 43

Benefits of Probabilistic Reasoning about Human Edits

Database quality versus the number of correct human edits

[Plot: F1 accuracy vs. number of human edits, comparing edit-incorporation strategies: traditional overwrite, local transitive closure, and our probabilistic (epistemological) reasoning.]

slide-44
SLIDE 44

Robustness to Errorful Human Edits

[Plot: precision vs. number of errorful human edits, comparing complete trust in users (traditional overwrite) against our probabilistic (epistemological) reasoning.]

slide-45
SLIDE 45

Benefits of Probabilistic Reasoning about Streaming Evidence

[Plot: F1 accuracy of the original database mentions as new structured evidence arrives (additional BibTeX mentions), comparing a traditional KB (overwrite) against the epistemological database.]

slide-46
SLIDE 46

Probabilistic Reasoning about Human Edits

Humans will want to correct DB, add to DB

#3

Open Questions

  • Edits: efficient forward chaining; robust to noise
  • Streaming inputs: what to keep, toss, summarize
slide-47
SLIDE 47

Relations with “Universal Schema”

Relation extraction

#4

without labeled data; without a pre-fixed schema

slide-48
SLIDE 48

Styles of Relation Extraction

  • Supervised

Labeled data: “Jane Smith attends MIT.” → affiliated
Schema: { advised, affiliated, authored, … }
Test data: “Ted Jones studies at Harvard.” → affiliated
Prediction: affiliated(Ted Jones, Harvard)

slide-49
SLIDE 49

Styles of Relation Extraction

  • Supervised
  • Distantly Supervised

Distant supervision from a KB: affiliated(Jane Smith, MIT), advised(Dan Klein, Slav Petrov), ...(..., ...)

Matched text: “Jane Smith attends MIT,” “Jane Smith began studying math at MIT,” ... → trained model of entities & relations

Test: “Ted Jones studied at Harvard” → affiliated(Ted Jones, Harvard)

Schema: { advised, affiliated, authored, … }

slide-50
SLIDE 50

Styles of Relation Extraction

  • Supervised
  • Distantly Supervised
  • Unsupervised (no schema) OpenIE

From a dependency parse (or an approximation of one): “Ted Jones attends Harvard.” → attends(Ted Jones, Harvard), where attends ≠ affiliated.

slide-51
SLIDE 51

Styles of Relation Extraction

  • Supervised
  • Distantly Supervised
  • Unsupervised (no schema) OpenIE
  • Unsupervised (schema discovery) clustering

Relation #1: affiliated, attends, studies at, professor at, employed by
Relation #2: advised, is the advisor of, supervised, chaired thesis of, is the mentor of
Relation #3: authored, wrote, published, was co-author of, ’s paper

Problems: arbitrary; hard to evaluate; incomplete; many boundary cases.

slide-52
SLIDE 52

Styles of Relation Extraction

  • Supervised
  • Distantly Supervised
  • Unsupervised (no schema) OpenIE
  • Unsupervised (schema discovery) clustering

Freebase: No relation for “criticized”

Vanderwende to Hovy: Where do the relation types come from?

The problem with ANY fixed SCHEMA: incomplete; many boundary cases.

slide-53
SLIDE 53

Styles of Relation Extraction

  • Supervised
  • Distantly Supervised
  • Unsupervised (no schema) OpenIE
  • Unsupervised (schema discovery)
  • Unsupervised (“universal schema”)

[Yao, Riedel, McCallum, AKBC 2012]

slide-54
SLIDE 54

Prob DB of “Universal Schema”

  • Schema = union of all inputs: NL & DBs
  • embrace diversity and ambiguity of original inputs
  • don’t try to force it into pre-defined boxes
  • Learn implicature among entity-relations
  • “fill in” unobserved relations

[Yao, Riedel, McCallum, AKBC 2012]

slide-55
SLIDE 55

Prob DB of “Universal Schema”

[Yao, Riedel, McCallum, AKBC 2012]

Columns (relation surface forms & DB relations): president of, prime minister of, chancellor of, chief executive, leader of, head of state, headOf, topMember
Rows (entity pairs): (Obama, U.S.), (Merkel, Germany), (S. Harper, Canada), (V. Putin, Russia), (Larry Page, Google), (V. Rometty, IBM), (Tim Cook, Apple), (E. Grimson, MIT)

23k+ columns, 350k+ rows

Text documents: relations from dependency parses

slide-56
SLIDE 56

Prob DB of “Universal Schema”

[Yao, Riedel, McCallum, AKBC 2012]

[Matrix: the same rows (entity pairs) and columns (relation surface forms & DB relations) as on the previous slide, with observed (pair, relation) cells marked Y.]

23k+ columns, 350k+ rows

Combination of structured data and OpenIE

Text documents: relations from dependency parses

Model & fill in the matrix with Generalized Principal Components Analysis (à la Netflix)
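A minimal sketch of the matrix-completion idea: logistic low-rank factorization over (entity-pair, relation) cells. The data, latent size K, and learning rate are toy values, and the simplifying assumption that every unobserved cell is a negative is NOT what the actual model does (it needs plausible unobserved cells to score high, which is where the fill-in predictions come from):

```python
import math
import random

# Toy logistic low-rank factorization over (entity-pair, relation)
# cells, in the spirit of generalized PCA / collaborative filtering.
random.seed(0)

pairs = ["Obama,U.S.", "Merkel,Germany", "Page,Google"]
relations = ["president of", "leader of", "head of state", "chief executive"]
observed = {(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 1), (2, 3)}

K = 4  # latent dimension
P = [[random.gauss(0, 0.1) for _ in range(K)] for _ in pairs]
R = [[random.gauss(0, 0.1) for _ in range(K)] for _ in relations]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(i, j):
    # Predicted probability that entity-pair i holds relation j.
    return sigmoid(sum(P[i][k] * R[j][k] for k in range(K)))

for _ in range(2000):  # plain SGD on logistic loss + small L2
    for i in range(len(pairs)):
        for j in range(len(relations)):
            y = 1.0 if (i, j) in observed else 0.0
            g = score(i, j) - y
            for k in range(K):
                pk, rk = P[i][k], R[j][k]
                P[i][k] -= 0.1 * (g * rk + 0.01 * pk)
                R[j][k] -= 0.1 * (g * pk + 0.01 * rk)

# Observed cells now score high; unobserved ones low.
print(score(0, 0) > 0.6, score(2, 0) < 0.5)
```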


slide-58
SLIDE 58

Prob DB of “Universal Schema”

[Yao, Riedel, McCallum, AKBC 2012]

Rows × columns (<subj<criticize>obj>, <subj<denounce>obj>): (Bill Clinton, Bush Administration) Y Y; (Stephen Forbes, George Bush) Y; (David Dinkins, Rudy Giuliani); (Bill Clinton, Hillary Clinton)

Successfully predicts “Forbes criticized George Bush.”

slide-59
SLIDE 59

Prob DB of “Universal Schema”

[Yao, Riedel, McCallum, AKBC 2012]

Rows × columns (<subj<own>obj>percentage>prep>of>obj, <subj<buy>obj>stake>prep>in>obj): (Time, Inc., Amer. Tel. and Comm.) Y Y; (Volvo, Scania A.B.) Y; (Campeau, Federated Dept Stores); (Apple, HP)

Successfully predicts “Volvo owns percentage of Scania A.B.” from “Volvo bought a stake in Scania A.B.”

slide-60
SLIDE 60

Prob DB of “Universal Schema”

[Yao, Riedel, McCallum, AKBC 2012]

Rows × columns (<subj<professor>prep>at>, <subj<historian>prep>at>): (Kevin Boyle, Ohio State) Y; (R. Freeman, Harvard) Y

Learns asymmetric entailment: PER historian at UNIV → PER professor at UNIV, but not PER professor at UNIV → PER historian at UNIV.

slide-61
SLIDE 61

Prob DB of “Universal Schema”

[Yao, Riedel, McCallum, AKBC 2012]

Experimental Results

  • 20 years NYTimes
  • extract entity mentions, perform entity resolution
  • 350k entity pairs, 23k unique relation surface forms
  • Freebase
  • 6k entity pairs resolved with NYTimes pairs
  • 116 relations

Relation Prediction:

            w/out Freebase   with Freebase
Precision       0.687            0.666
Recall          0.491            0.520

slide-62
SLIDE 62

Prob DB of “Universal Schema”

[Yao, Riedel, McCallum, AKBC 2012]

  • Summary
  • Embrace the diversity and ambiguity of the original inputs; don’t try to force them into pre-defined boxes.
  • Reason about entities & relations together; not as an abstract relation-relation mapping.
  • Users can query without understanding a limited schema; ask, and we probably have a column for that.
  • Model predicts the original expressions (a well-defined task); do not try to model semantic equivalence (elusive).

slide-63
SLIDE 63

Prob DB of “Universal Schema”

[Yao, Riedel, McCallum, AKBC 2012]

  • Related Work
  • OpenIE [Etzioni…], but we also “fill in” unobserved relations
  • Clustering [Pantel; Yates; Yao], but we learn asymmetric entailment
  • Rules between textual patterns [Schoenmackers et al. 2008]: similar goals, but we avoid limited tree-width & batch-mode learning

slide-64
SLIDE 64

Relations with “Universal Schema”

Relation extraction

without labeled data; without pre-fixed schema

#4

Future Work

  • Incorporate relations with different arities
  • Integrate background knowledge
  • Scale up further in both pairs and relations
slide-65
SLIDE 65

Prob-Programming & its Integration with Prob-DB

Need way to easily specify models.

#5

slide-66
SLIDE 66

[Factor graph: observed mention variables x1…x8, decision variables y (coreference and canonicalization on the left, schema matching on the right), connected by factors f1, f2, … f67.]

P(Y | X) = (1/Z_X) · ∏_{y_i ∈ Y} ψ_w(y_i, x_i) · ∏_{y_i, y_j ∈ Y} ψ_b(y_ij, x_ij)

ψ(y_i, x_i) = exp( Σ_k λ_k f_k(y_i, x_i) )
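Read as log-scores, this model is just sums of weighted features inside each factor. A toy computation (the feature names and weights are invented for illustration):

```python
import math

# Toy computation of the slide's unnormalized score:
# P(Y|X) ∝ ∏ ψ_w(y_i, x_i) · ∏ ψ_b(y_ij, x_ij),  ψ = exp(Σ_k λ_k f_k).
def log_psi(features, weights):
    # log ψ = Σ_k λ_k f_k
    return sum(weights[k] * v for k, v in features.items())

def log_score(within_feats, between_feats, w_weights, b_weights):
    s = sum(log_psi(f, w_weights) for f in within_feats)
    s += sum(log_psi(f, b_weights) for f in between_feats)
    return s  # log of unnormalized P(Y|X); the partition Z_X is omitted

w_weights = {"string-match": 2.0}
b_weights = {"same-cluster-and-match": 1.5, "same-cluster-no-match": -1.0}

within = [{"string-match": 1.0}, {"string-match": 0.0}]
between = [{"same-cluster-and-match": 1.0}]
print(math.exp(log_score(within, between, w_weights, b_weights)))  # e^3.5
```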

slide-68
SLIDE 68


Really Hairy Models!

How to do

  • parameter estimation
  • inference
  • software engineering
slide-69
SLIDE 69

Probabilistic Programming Languages

  • Make it easy to specify rich, complex models,

using the full power of programming languages

  • data structures
  • control mechanisms
  • abstraction
  • Inference implementation comes for free

Provides language to easily create new models

toolkits, DSLs...

slide-70
SLIDE 70

Our Approach to Probabilistic Programming

  • Object-oriented: variables, factors, inference & learning methods are objects; inheritance, …

  • Imperative definition of construction & operation
  • Embedded in a general-purpose prog. language.
  • Scalable to billions of variables and factors.

Tightly integrates into DB back-end, providing PrDB.

[NIPS 2008]

FACTORIE

http://factorie.cs.umass.edu

Replacement for MALLET. Implemented in Scala.
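FACTORIE itself is written in Scala; the Python toy below only illustrates the object-oriented idea on this slide (variables, factors, and models as ordinary objects, constructed imperatively). The class names here are invented, not FACTORIE’s API:

```python
# Variables, factors, and models as ordinary objects, wired imperatively.
class Variable:
    def __init__(self, name, value):
        self.name, self.value = name, value

class Factor:
    def __init__(self, variables, score_fn):
        self.variables, self.score_fn = variables, score_fn
    def score(self):
        return self.score_fn(*[v.value for v in self.variables])

class Model:
    def __init__(self):
        self.factors = []
    def add(self, factor):
        self.factors.append(factor)
    def log_score(self):
        return sum(f.score() for f in self.factors)

# Imperative construction: make variables, then wire factors onto them.
a = Variable("label-a", "PER")
b = Variable("label-b", "PER")
m = Model()
m.add(Factor([a], lambda v: 0.5 if v == "PER" else 0.0))     # unary
m.add(Factor([a, b], lambda u, v: 1.0 if u == v else -1.0))  # pairwise
print(m.log_score())  # 0.5 + 1.0 = 1.5
```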

slide-71
SLIDE 71

Prob-Programming & its Integration with Prob-DB

Need way to easily specify models. Tight coupling ➞ efficiency, scalability.

#5

Open Questions

  • Tools for prob programming, e.g. debuggers, profilers
  • Automatically pick good inference for the model/query, e.g. like DB query planners.

  • Storing uncertainty. Samples? Particles? Marginals?
slide-72
SLIDE 72

“Epistemological Database”

[Recap diagram: the full epistemological database, combining the probabilistic pipeline, structured data and human edits as evidence, KB p(“truth”), and inference workers constantly bubbling in the background.]

slide-73
SLIDE 73

Summary

  • Epistemological DBs
  • “entities & relations inferred from evidence”
  • Research ingredients
  • SampleRank
  • Hierarchical coref, parallel/distributed
  • Human edits
  • PrDB of “universal schema”
  • Probabilistic programming

BTW: I’m currently looking for a post-doc.

slide-74
SLIDE 74

END

slide-75
SLIDE 75

Ingredients of our Approach

  • 1. Epistemological Database
  • evidence from outside; truth discovery inside
  • 2. Human Edits as Evidence
  • joint interpretation of edits with text & tables
  • 3. Never Ending Inference
  • effects of new evidence propagate always
  • 4. Coreference as the Foundation
  • all semantics as similarity including to ontologies; no fixed ontology
  • 5. Resource-bounded Information Gathering
  • decision-theoretic approach to focused KB filling
  • 6. Smart parallelism
  • integrated with inference, asynchronous