Structured Generative Models for Unsupervised Named Entity Clustering
Micha Elsner, Prof. Eugene Charniak, Prof. Mark E. Johnson
Brown Lab for Linguistic and Information Processing
Brown University
Providence, RI

Named Entities
People: Micha Elsner; Prof. Eugene Charniak; Prof. Mark E. Johnson
Organizations: Brown Lab for Linguistic and Information Processing; Brown University
Places: Providence, RI

Named Entity Structure
People: Micha Elsner; Prof. Eugene Charniak; Prof. Mark E. Johnson
Organizations: Brown Lab for Linguistic and Information Processing; Brown University
Places: Providence RI

Motivation
Isn’t this old news?
◮ Cotraining: (Collins+Singer ’99, Riloff+Jones ’99)
Generative models
New direction in coreference resolution: (Haghighi+Klein ’07), (Ng ’08) and others
Integrated models for subtasks (including Named Entity)
◮ (H+K) cluster named entities using...
  ◮ Head word
  ◮ Coreferent pronouns
◮ Results are promising.
◮ Can we make them state-of-the-art?

Goal
◮ Unsupervised, generative model
◮ Cluster named entities by type
  People: Micha Elsner; Prof. Eugene Charniak
◮ Discover word classes
  Micha Elsner; Prof. Eugene Charniak
◮ Cluster possibly-coreferent phrases?
  People: Micha Elsner; Prof. Eugene Charniak; Charniak

Overview
◮ Introduction
◮ Clustering as parsing
◮ Consistency: finding possible entities
◮ Experiments: pronouns are key!
◮ Future directions

Clustering as parsing
Grammar:
  NE → pers
  NE → org
  NE → loc
  org → org_term+
  org_term → Brown
  org_term → University
  pers → pers_term+
  pers_term → Moses
  pers_term → Brown
Example parses:
  NE → org → org_term org_term → Brown University
  NE → pers → pers_term pers_term → Moses Brown
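The toy grammar on this slide can be read as a membership test: a phrase is parseable as an entity type when every word is generated by that type's term class. The following is a minimal sketch in Python, not the authors' implementation. The `TERMS` table and the `parses` helper are hypothetical names introduced here, and the sketch ignores rule probabilities entirely.

```python
# Toy grammar from the slide: each entity type expands to one or more terms,
# and each term class lists the words it can emit.
# TERMS and parses() are illustrative names, not part of the talk.
TERMS = {
    "org": {"Brown", "University"},
    "pers": {"Moses", "Brown"},
    "loc": set(),  # the slide gives no term rules for loc
}


def parses(phrase):
    """Return every entity type whose term class covers all words in the phrase."""
    words = phrase.split()
    return [
        etype
        for etype, vocab in TERMS.items()
        if words and all(word in vocab for word in words)
    ]


print(parses("Brown University"))  # ['org']
print(parses("Moses Brown"))       # ['pers']
print(parses("Brown"))             # ['org', 'pers']
```

The last call shows why clustering is cast as parsing: "Brown" alone is ambiguous, and only the surrounding words (or rule probabilities, which this sketch omits) can pick a type.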

Internal structure
Grammar:
  NE → org
  org → org1 org2
  org → (org1)(org2)(org3)(org4)(org5)
  org1 → Brown
  org2 → University
Example parse:
  NE → org → org1 org2 → Brown University

Multiword expansions
Grammar:
  NE → loc
  loc → loc1 loc2
  loc1 → Providence
  loc2 → Rhode Island
Example parse:
  NE → loc → loc1 loc2 → Providence | Rhode Island

Gathering features
◮ Nominal modifiers (Collins+Singer ’99)
  ◮ Appositive: “Hillary Clinton, the Secretary of State”
  ◮ Prenominal: “candidate Hillary Clinton”
◮ Prepositional governor (C+S ’99)
  ◮ “a spokesman for Hillary Clinton”
◮ Personal pronouns
  ◮ “... Hillary Clinton. She said ...”
  ◮ Unsupervised model of (Charniak+Elsner ’09)
◮ Relative pronouns
  ◮ “Hillary Clinton, who said...”
Add features to input strings:
  Hillary Clinton # Secretary candidate # spokesman-for # she who
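The feature-augmented input string on this slide appears to have four fields separated by "#": the name, nominal modifiers, prepositional governors, and pronouns. A minimal sketch of building such a string in Python follows. The `encode` function and its parameter names are assumptions made for illustration, and the feature extraction itself (parsing, pronoun resolution) is not shown.

```python
def encode(name, nominal=(), prep=(), pronouns=()):
    """Join a name and its feature fields into one '#'-separated input string.

    Field order (name, nominal modifiers, prepositional governors, pronouns)
    is inferred from the slide's example; the function name is hypothetical.
    """
    fields = [name, " ".join(nominal), " ".join(prep), " ".join(pronouns)]
    # Collapse the doubled spaces that empty fields would otherwise leave.
    return " ".join(" # ".join(fields).split())


print(encode("Hillary Clinton",
             nominal=["Secretary", "candidate"],
             prep=["spokesman-for"],
             pronouns=["she", "who"]))
# Hillary Clinton # Secretary candidate # spokesman-for # she who

print(encode("Hillary Clinton", pronouns=["who"]))
# Hillary Clinton # # # who
```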

Adding features
Grammar:
  NE → org
  org → org1 org2 # pronouns_org
  pronouns_org → pronoun_org*
  pronoun_org → which
  pronoun_org → they
  ...
  pronoun_org → he
  ...
Example parse:
  NE → org → org1 org2 # pronouns_org → Brown University # which

Learning the grammar
How to learn rule probabilities?
◮ Many, many rules:
  ◮ With multiword strings, infinite!
  ◮ Most of them useless.
Bayesian model
Sparse prior over rules. Only useful rules get non-zero probability.

Adaptor grammars (Johnson+al ’07)
◮ Prior over grammars
◮ Form of hierarchical Dirichlet process
◮ Black-box inference, downloadable software
  ◮ Development is just writing the grammar
  ◮ But standard inference isn’t always good enough
Tuesday, 11:30: “Improving nonparameteric Bayesian inference: experiments on unsupervised word segmentation with adaptor grammars”, Mark Johnson and Sharon Goldwater.
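To give a feel for how a Dirichlet-process-style prior keeps most rules at zero, here is a small illustrative simulation of the Blackwell-MacQueen urn (the sequential view of a Dirichlet process). It is a sketch only: it is not the adaptor grammar sampler, and the rule inventory of 1000 numbered "rules", the concentration `alpha`, and the function names are all assumptions chosen for the example.

```python
import random
from collections import Counter


def urn_draws(n, alpha, base, rng):
    """Draw n items from a Dirichlet process via the Blackwell-MacQueen urn.

    Draw i (0-based) comes fresh from the base distribution with probability
    alpha / (i + alpha); otherwise it copies a uniformly chosen earlier draw,
    which reuses each value in proportion to how often it has been used.
    """
    draws = []
    for i in range(n):
        if rng.random() < alpha / (i + alpha):
            draws.append(base(rng))
        else:
            draws.append(rng.choice(draws))
    return draws


def uniform_rule(rng):
    """Hypothetical base distribution: uniform over 1000 candidate rules."""
    return rng.randrange(1000)


rng = random.Random(0)
draws = urn_draws(1000, alpha=1.0, base=uniform_rule, rng=rng)
counts = Counter(draws)
print(f"{len(counts)} of 1000 candidate rules were ever used")
print("most frequent:", counts.most_common(3))
```

With a small `alpha`, only a handful of the candidate rules receive any mass, which is the "rich get richer" behavior that makes the prior sparse.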

Consistent phrases
Definition: Consistent
Phrases that could refer to the same entity. Weaker than coreference. Non-trivial for named entities.
Inconsistent, same heads:
◮ Ford Motor Co.
◮ Lockheed Martin Co.
Consistent, different heads:
◮ Professor Johnson
◮ Mark

Modeling consistency
Model’s concept of consistency follows (Charniak ’01): phrases are consistent if none of their internal subparts clash.
Ordered template:
  pers1   pers2   pers3   pers4
  Prof.   Mark    E.      Johnson
Realizations:
◮ Mark Johnson
◮ Prof. Johnson
◮ Mark
Inconsistent:
◮ Mark Steedman
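The "no clashing subparts" rule can be sketched as a comparison of slot assignments. The following Python example assumes each phrase has already been mapped to template slots (in the talk that mapping comes from the learned grammar; here it is written by hand), and it treats a clash as plain string inequality. The `consistent` function and the slot dictionaries are hypothetical.

```python
def consistent(a, b):
    """Two phrases are consistent if every slot they both fill holds the same word.

    a and b map a slot index (1..4 in the slide's pers template) to a word.
    Slots filled by only one phrase never cause a clash.
    """
    return all(a[slot] == b[slot] for slot in a.keys() & b.keys())


# Hand-written slot assignments for the slide's examples (an assumption).
template = {1: "Prof.", 2: "Mark", 3: "E.", 4: "Johnson"}
mark_johnson = {2: "Mark", 4: "Johnson"}
prof_johnson = {1: "Prof.", 4: "Johnson"}
mark = {2: "Mark"}
mark_steedman = {2: "Mark", 4: "Steedman"}

print(consistent(template, mark_johnson))   # True
print(consistent(prof_johnson, mark))       # True  (no shared slot, so no clash)
print(consistent(template, mark_steedman))  # False (slot 4: Johnson vs Steedman)
```

Note that "Prof. Johnson" and "Mark" come out consistent even though they share no words, which matches the "consistent, different heads" case from the definition.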

Experimental setup
Datasets:
◮ Labeled data: MUC-7
  ◮ Three entity classes: PERS, ORG, LOC
◮ Unlabeled data: NANC
Combine features for multiple examples:
  Hillary Clinton # # # who
  Hillary Clinton # Secretary # # she
  Hillary Clinton # # spokesman-for # her
combine into:
  Hillary Clinton # Secretary # spokesman-for # she her who
More data in equal time... but no per-document features.
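Combining examples amounts to taking, for each feature field, the union of the values seen across all occurrences of the same name string. A minimal Python sketch follows; `merge_examples` is a hypothetical helper, and sorting values within a field is an assumption made only to keep the output deterministic (the slide lists the pronouns in a different order).

```python
def merge_examples(lines):
    """Union the '#'-separated feature fields of examples sharing one name."""
    name = None
    fields = []  # one set of feature words per field
    for line in lines:
        head, *rest = [part.strip() for part in line.split("#")]
        if name is None:
            name, fields = head, [set() for _ in rest]
        if head != name or len(rest) != len(fields):
            raise ValueError(f"cannot merge {line!r} into {name!r}")
        for bucket, part in zip(fields, rest):
            bucket.update(part.split())
    merged = [name] + [" ".join(sorted(bucket)) for bucket in fields]
    # Collapse the doubled spaces that empty fields would otherwise leave.
    return " ".join(" # ".join(merged).split())


examples = [
    "Hillary Clinton # # # who",
    "Hillary Clinton # Secretary # # she",
    "Hillary Clinton # # spokesman-for # her",
]
print(merge_examples(examples))
# Hillary Clinton # Secretary # spokesman-for # her she who
```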

Basic results
Baseline (all ORG): 46%
Our best model: 86%
Confusion matrix:
         loc    org    per
  LOC   1187     97     37
  ORG    223   1517    122
  PER     36     20    820
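The two headline numbers can be recomputed from the confusion matrix. The sketch below assumes that rows are gold labels and columns are the induced clusters after mapping each to a class; the slide does not state this explicitly, but the ORG row total matches the all-ORG baseline under that reading.

```python
# Confusion matrix from the slide; rows assumed gold, columns assumed predicted.
confusion = {
    "LOC": {"loc": 1187, "org": 97, "per": 37},
    "ORG": {"loc": 223, "org": 1517, "per": 122},
    "PER": {"loc": 36, "org": 20, "per": 820},
}

total = sum(sum(row.values()) for row in confusion.values())
correct = sum(confusion[gold][gold.lower()] for gold in confusion)

accuracy = correct / total                          # diagonal over everything
baseline = sum(confusion["ORG"].values()) / total   # label every entity ORG

print(f"accuracy: {accuracy:.1%}")  # accuracy: 86.8%
print(f"baseline: {baseline:.1%}")  # baseline: 45.9%
```

Both values are close to the 86% and 46% reported on the slide.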

Essentially unjustified comparisons
(Haghighi+Klein ’07)
◮ ACE corpus: 61%
(Collins+Singer ’99)
◮ Easier dataset
  ◮ Only examples with features
  ◮ Proportionally more people
◮ Generative baseline: 83%
◮ Cotraining: 91%
Supervised MUC-7:
◮ Best system (LTG): 94%
◮ Human: 97%

Breakdown by features
  Model                                 Dev accuracy
  Baseline (All ORG)                    42.5
  Core NPs (no consistency)             45.5
  Core NPs (consistency)                48.5
  Context features (nominal/prep)       83.3
  All features (context + pronouns)     87.1

Named entity structure
  pers0        pers1      pers2           pers3     pers4
  rep.         john       minister        brown     jr.
  sen.         robert     j.              smith     a
  washington   david      john            b         smith
  dr.          michael    l.              johnson   iii

  loc0         loc1       loc2            loc3      loc4
  washington   the        texas           county    monday
  los angeles  st.        new york        city      thursday
  south        new        washington      beach     river
  north        national   united states   valley    tuesday

Judging consistency
Sometimes right:
◮ Dr. Seuss
◮ Dr. Quinn
... correctly judged inconsistent.
Sometimes wrong:
◮ Dr. William F. Gibson
◮ Dr. William Gibson
... judged inconsistent.
◮ Bruce Jarvis
◮ Bruce Ellen Jarvis
... judged consistent.

Inference is a problem
Gibbs sampling
◮ Converges in the limit...
◮ Not in real life!
◮ Clustering problems are often NP-hard:
  ◮ There’s no guaranteed method.
For this model:
◮ Used heuristic inference
◮ Still only partial convergence!

Conclusion
◮ Introduction
◮ Clustering as parsing
◮ Consistency: finding possible entities
◮ Experiments: pronouns are key!
◮ Future directions

What’s next
◮ Add named-entity to unsupervised coreference
  ◮ Document-level features might help NE...
  ◮ If the combined model could scale.
◮ Improve inference for Bayesian models
  ◮ Gibbs sampling isn’t good enough...
  ◮ Better sampling?
  ◮ Or something completely different?
◮ Adaptor grammars: what else are they good for?

Thanks!
◮ Three reviewers
◮ NSF
◮ All of you!

Overview
◮ Adaptor grammars: framework for Bayesian grammar learning
◮ Implementing Consistency
◮ Inference: a general problem for this approach
