Structured Databases of Named Entities from Bayesian Nonparametrics - PowerPoint PPT Presentation



SLIDE 1

Structured Databases of Named Entities from Bayesian Nonparametrics

Dr. Jacob Eisenstein, Machine Learning Department, Carnegie Mellon University
Ms. Tae Yano, Language Technologies Institute, Carnegie Mellon University
Prof. William W. Cohen, Machine Learning Department, Carnegie Mellon University
Prof. Noah A. Smith, Language Technologies Institute, Carnegie Mellon University
Prof. Eric P. Xing, Computer Science Department, Carnegie Mellon University

SLIDE 2

In a Nutshell

  • A joint model over

– a collection of named entity mentions from text, and
– a structured database table (entities ⨉ name-fields) with data-defined dimensions

  • Model aims to solve three problems:
  • 1. canonicalize the entities
  • 2. infer a schema for the names
  • 3. match mentions to entities (i.e., coreference resolution)

  • Preliminary experiments on political blog data; only task 1 in this paper.

SLIDE 3

An Imagined Information Extraction Scenario

initial table:

John | McCain | Sen. | Mr.
George | Bush | W. | Mr.
Hillary | Clinton | Rodham | Mrs.
Barack | Obama | Sen.
Sarah | Palin

[figure: NER-tagged text with bracketed mention spans; systematic variation in mentions]

inference

We want a database of all blogworthy U.S. political figures.

John | McCain | Sen. | Mr.
George | Bush | Pres. | W. | Mr.
Hillary | Clinton | Sen. | Rodham | Mrs.
Barack | Obama | Sen. | H. | Mr.
Sarah | Palin | Gov. | Mrs.
Joe | Biden | Sen. | Mr.
Ron | Paul | Rep. | Mr.

SLIDE 4

Caveat

  • Sen. Tom Coburn, M.D. (Rep., Oklahoma), a.k.a. “Dr. No,” does not approve of this research.

SLIDE 5

Prior Work

Research problem | Related papers | Difference
Information extraction | Haghighi and Klein, 2010 | Predefined schema (columns/fields).
Name structure models | Charniak, 2001; Elsner et al., 2009 | No resolution to entities.
Record linkage | Fellegi and Sunter, 1969; Cohen et al., 2000; Pasula et al., 2002; Bhattacharya and Getoor, 2007 | Often on bibliographies (not raw text); predefined schema.
Multi-document coreference resolution | Li et al., 2004; Haghighi and Klein, 2007; Poon and Domingos, 2008; Singh et al., 2011 | No canonicalization of entity names.
Morphological paradigm learning | Dreyer and Eisner, 2011 | Fixed schema; linguistic analysis problem.

SLIDE 6

Goal

We want a model that solves three problems:

  • 1. canonicalize mentioned entities
  • 2. infer a schema for their names
  • 3. match mentions to entities (i.e., coreference resolution)

SLIDE 7


Generative Story: Types

First, generate the table.

  • Let μ and σ2 be hyperparameters.
  • For each column j:

– Sample αj from LogNormal(μ, σ2)
– Sample multinomial φj from DP(G0, αj), where G0 is uniform up to a fixed string length.
– For each row i, draw cell value xi,j from φj

[plate diagram: xi,j, φj, αj, μ, σ2; plates over rows/entities and columns/fields]
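The table-generation step above can be sketched via the Chinese restaurant process view of the Dirichlet process; the value strings and function names below are illustrative, not from the paper.

```python
import math
import random

def sample_column(n_rows, alpha, rng):
    """Draw one column of the table from DP(G0, alpha) via the Chinese
    restaurant process: a new cell repeats an existing value with
    probability proportional to its count, and takes a fresh value from
    the base measure G0 with probability proportional to alpha."""
    counts = {}   # value -> number of rows currently using it
    fresh = 0     # stand-in for draws from G0 (hypothetical strings)
    column = []
    for _ in range(n_rows):
        total = sum(counts.values())
        if not counts or rng.random() < alpha / (alpha + total):
            fresh += 1
            value = "name-%d" % fresh
        else:
            r = rng.uniform(0, total)
            acc = 0.0
            for v, c in counts.items():
                acc += c
                if r <= acc:
                    value = v
                    break
        counts[value] = counts.get(value, 0) + 1
        column.append(value)
    return column

def sample_alpha(mu, sigma2, rng):
    """Each column's concentration alpha_j ~ LogNormal(mu, sigma2)."""
    return math.exp(rng.gauss(mu, math.sqrt(sigma2)))
```

Low alpha yields a highly repetitive column (e.g., a title field with few distinct values); high alpha yields a diverse column (e.g., last names).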


SLIDE 8

Field-wise Dirichlet Process Priors

very high repetition (low αj) vs. very high diversity (high αj)

[plate diagram and example table as before]


SLIDE 9


Generative Story: Tokens

Next, generate the mention tokens.

  • Draw the distribution over rows/entities to be mentioned, θr, from Stick(ηr).

  • Draw the distribution over columns/fields to be used in mentions, θc, from Stick(ηc).

  • For each mention m, sample its row rm from θr.

– For each word in the mention, sample its column cm,n from θc.
– Fill in the word to be xrm,cm,n.

[plate diagram: adds ηr, ηc, θr, θc, rm, cm,n, w; plate over mentions]
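A minimal sketch of the token-generation step above, using a truncated stick-breaking construction for the Stick(η) draws; all names here are illustrative.

```python
import random

def stick_breaking(eta, n_atoms, rng):
    """Truncated stick-breaking construction of DP weights Stick(eta)."""
    weights, remaining = [], 1.0
    for _ in range(n_atoms - 1):
        b = rng.betavariate(1.0, eta)
        weights.append(remaining * b)
        remaining *= 1.0 - b
    weights.append(remaining)  # leftover mass on the final atom
    return weights

def generate_mention(table, theta_r, theta_c, n_words, rng):
    """Sample a row r_m from theta_r, then a column c_{m,n} per word
    from theta_c, and emit the cell value x[r_m][c_{m,n}]."""
    r = rng.choices(range(len(table)), weights=theta_r)[0]
    return [table[r][rng.choices(range(len(table[r])),
                                 weights=theta_c)[0]]
            for _ in range(n_words)]
```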


SLIDE 10

Entity-wise Dirichlet Process Priors

[plate diagram as before]

entities receive different amounts of attention (fictitious)

[example table as before]


SLIDE 11

Entity-wise Dirichlet Process Priors (same content as the previous slide)


SLIDE 12

Field-wise Dirichlet Process Priors

[plate diagram as before]

fields are used with different frequencies (fictitious)

[example table as before]


SLIDE 13

Inference

At a high level, we are doing Monte Carlo EM.

[plate diagram as before]

E step: MCMC inference over hidden variables
M step: update hyperparameters to improve likelihood

SLIDE 14

Gibbs Sampling

  • Collapse out θr, θc, and φj (standard collapsed Gibbs sampler for the Dirichlet process).

  • Given rows, columns, and words, some of x is determined, and we marginalize the rest.

  • I’ll describe how we sample columns, rows, and concentrations αj.

[plate diagram as before]

SLIDE 15

Sampling cm,n

Hinges on p(w | …) factors:

[plate diagram as before]

p(cm,n = j | …) ∝ p(wm,n | rm, cm,n, xobs, …) × (1 / (N(c−(m,n)) + ηc)) × { N(c−(m,n) = j)  if N(c−(m,n) = j) > 0;  ηc  otherwise }
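The Gibbs scores for a token's column can be sketched as below; `likelihoods` and `counts` are hypothetical inputs standing in for the p(w | …) factors and the counts N(c−(m,n) = j).

```python
def column_gibbs_scores(likelihoods, counts, eta_c):
    """Unnormalized Gibbs probabilities for c_{m,n} over candidate
    columns j: the word likelihood p(w | r_m, c=j, x_obs, ...) times
    the CRP prior, counts[j] for a used column or eta_c for an unused
    one, normalized by N(c_-(m,n)) + eta_c."""
    total = sum(counts)
    return [lik * (counts[j] if counts[j] > 0 else eta_c) / (total + eta_c)
            for j, lik in enumerate(likelihoods)]
```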


SLIDE 16

Sampling rm

  • Need to multiply together p(w | …) quantities (see paper) for all words in the mention.

  • We speed things up by marginalizing out cm,*.
  • This calculation exploits conditional independence of tokens given the row.

[plate diagram as before]
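The row score can be sketched as a sum of per-token log-marginals over columns, exploiting the conditional independence noted above; `word_lik` is a hypothetical per-cell likelihood standing in for the p(w | …) factors.

```python
import math

def row_log_score(mention_words, row_cells, theta_c, word_lik):
    """Log p(words of mention m | r_m = i), marginalizing out each
    token's column; tokens are conditionally independent given the row.
    word_lik(w, cell) is a hypothetical per-cell word likelihood."""
    return sum(math.log(sum(t * word_lik(w, cell)
                            for t, cell in zip(theta_c, row_cells)))
               for w in mention_words)
```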


SLIDE 17

Sampling αj


  • Given the number of specified entries in x*,j (nj) and the number of unique entries in x*,j (kj):

[plate diagram as before]

p(αj | …) ∝ exp(−(log αj − μ)^2 / (2σ^2)) × αj^kj × Γ(αj) / Γ(nj + αj)
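The log of this unnormalized density can be evaluated directly (e.g., inside a grid or slice sampler); a minimal sketch:

```python
import math

def alpha_log_density(alpha, mu, sigma2, n_j, k_j):
    """Log of the unnormalized posterior for the concentration alpha_j:
    the log-normal prior term plus log of
    alpha^{k_j} * Gamma(alpha) / Gamma(n_j + alpha),
    where n_j counts specified and k_j counts unique entries."""
    return (-(math.log(alpha) - mu) ** 2 / (2.0 * sigma2)
            + k_j * math.log(alpha)
            + math.lgamma(alpha)
            - math.lgamma(n_j + alpha))
```

As expected, a column with many unique entries pushes the posterior toward larger alpha.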


SLIDE 18

Column Swaps

  • One additional move: in a single row, swap entries in two columns of x.

  • The swap also implies changing some c variables.

  • See the paper for details on this Metropolis-Hastings step.


[plate diagram as before]
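A generic Metropolis-Hastings acceptance test, as such a swap move would use; this is a sketch of the generic accept step, not the paper's exact proposal distribution.

```python
import math
import random

def mh_accept(log_p_new, log_p_old, log_q_back, log_q_fwd, rng):
    """Metropolis-Hastings acceptance: accept with probability
    min(1, p(new) q(back) / (p(old) q(fwd))), computed in log space
    to avoid underflow."""
    log_ratio = (log_p_new - log_p_old) + (log_q_back - log_q_fwd)
    return math.log(rng.random()) < min(0.0, log_ratio)
```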

SLIDE 19

Temporal Dynamics

entities receive different amounts of attention at different times

[example table as before, shown at epochs June, July, August]

SLIDE 20

Recurrent Chinese Restaurant Process (Ahmed and Xing, 2008)

  • Data are divided into discrete epochs.
  • Row Dirichlet process includes pseudocounts from the previous epoch.

  • Entities come and go; reappearing after disappearance is vanishingly improbable.
  • In the Chinese restaurant view, this affects updates to ηr and the sampling of r:

p(r(t)m = i | r(t)1,…,m−1, r(t−1), ηr) ∝ { N(r(t)1,…,m−1 = i) + N(r(t−1) = i)  if positive;  ηr  otherwise }
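The recurrent-CRP prior weight above can be sketched directly; `counts_t` and `counts_prev` are hypothetical count tables for the current and previous epochs.

```python
def rcrp_prior(row, counts_t, counts_prev, eta_r):
    """Recurrent CRP prior weight for assigning mention m at epoch t to
    row i: counts from the current epoch plus pseudocounts carried over
    from the previous epoch, or eta_r for a brand-new row."""
    n = counts_t.get(row, 0) + counts_prev.get(row, 0)
    return n if n > 0 else eta_r
```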


SLIDE 21

Data for Evaluation

  • Data: blogs on U.S. politics from 2008 (Eisenstein and Xing, 2008)

– Stanford NER → 25,000 mentions
– Eliminate mentions with frequency less than 4 or with more than 7 tokens
– 19,247 mentions (45,466 tokens), 813 unique

  • Annotation: 100 reference entities

– Constructed by merging sets of most frequent mentions, discarding errors
– Example: { Barack, Obama, Mr., Sen. }


SLIDE 22

Evaluation

  • Bipartite matching between reference entities and rows of x.

  • Measure precision and recall.

– Precision is very harsh (only 100 entities in the reference set, and finding anything else incurs a penalty!)
– The same problem is present in earlier work.

  • Baseline: agglomerative clustering based on string edit distance (Elmacioglu et al., 2007); different stopping points define a P-R curve.

– No database!
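A sketch of the matching-based scoring, using greedy one-to-one matching as a simplification of the optimal bipartite matching described above; entities are represented as sets of mention strings, and all names are illustrative.

```python
def greedy_match_pr(reference, predicted):
    """Precision/recall via a greedy one-to-one matching of reference
    entities to predicted rows by mention overlap. Each reference
    entity and each predicted row is matched at most once; matched
    overlap counts as correct."""
    pairs = sorted(((len(r & p), i, j)
                    for i, r in enumerate(reference)
                    for j, p in enumerate(predicted)),
                   reverse=True)
    used_ref, used_pred, correct = set(), set(), 0
    for overlap, i, j in pairs:
        if overlap == 0 or i in used_ref or j in used_pred:
            continue
        used_ref.add(i)
        used_pred.add(j)
        correct += overlap
    precision = correct / sum(len(p) for p in predicted)
    recall = correct / sum(len(r) for r in reference)
    return precision, recall
```

Predicting extra rows beyond the reference set lowers precision, mirroring the harshness noted above.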


SLIDE 23

Results

[figure: precision-recall results for the baseline, basic model, and temporal model]

SLIDE 24

Examples

☺ Bill Clinton is not Bill Nelson

Bill | Clinton | Benazir | Bhutto
Nancy | Pelosi | Speaker
John | Kerry | Sen. | Roberts
Martin | King | Dr. | Jr. | Luther
Bill | Nelson

SLIDE 25

Examples

☺ Bill Clinton is not Bill Nelson
☹ Bill Clinton is Benazir Bhutto
☹ John Kerry is John Roberts

  • Hard to create a new row once we’re “stuck”
  • Common names are garbage collectors

Bill | Clinton | Benazir | Bhutto
Nancy | Pelosi | Speaker
John | Kerry | Sen. | Roberts
Martin | King | Dr. | Jr. | Luther
Bill | Nelson

SLIDE 26

Examples

☺ Bill Clinton is not Bill Nelson
☹ Bill Clinton is Benazir Bhutto
☹ John Kerry is John Roberts
☺ Rare “Speaker” title for Pelosi; fields generally good

Bill | Clinton | Benazir | Bhutto
Nancy | Pelosi | Speaker
John | Kerry | Sen. | Roberts
Martin | King | Dr. | Jr. | Luther
Bill | Nelson

SLIDE 27

Future Extensions

  • Structured model over name structure
  • Optionality within a cell?
  • Changes in the database over time
  • Joint inference with named entity recognition
  • “Topics” (some entities are likely to co-occur)
  • Lexical context of mentions to aid disambiguation
  • Burstiness within a document
  • Events (cf., Chambers and Jurafsky, 2011)
  • Information used in coreference resolution: linguistic cues (Bengtson and Roth, 2008) and external knowledge (Haghighi and Klein, 2010)

SLIDE 28

Conclusions

  • A joint model over

– a collection of named entity mentions from text, and
– a structured database table (entities ⨉ name-fields) with data-defined dimensions

  • Model aims to solve three problems:
  • 1. canonicalize the entities
  • 2. infer a schema for the names
  • 3. match mentions to entities (i.e., coreference resolution)

SLIDE 29

Thanks!
