 
Structured Generative Models for Unsupervised Named Entity Clustering
Micha Elsner, Prof. Eugene Charniak, Prof. Mark E. Johnson
Brown Lab for Linguistic and Information Processing
Brown University
Providence, RI

Named Entities
People: Micha Elsner; Prof. Eugene Charniak; Prof. Mark E. Johnson
Organizations: Brown Lab for Linguistic and Information Processing; Brown University
Places: Providence, RI

Named Entity Structure
People: Micha Elsner; Prof. Eugene Charniak; Prof. Mark E. Johnson
Organizations: Brown Lab for Linguistic and Information Processing; Brown University
Places: Providence RI

Motivation
Isn’t this old news?
◮ Cotraining: (Collins+Singer ‘99, Riloff+Jones ‘99)
Generative models
New direction in coreference resolution: (Haghighi+Klein ‘07) (Ng ‘08) and others
Integrated models for subtasks (including Named Entity)
◮ (H+K) cluster named entities using...
  ◮ Head word
  ◮ Coreferent pronouns
◮ Results are promising.
◮ Can we make them state-of-the-art?

Goal
◮ Unsupervised, generative model
◮ Cluster named entities by type
  People: Micha Elsner; Prof. Eugene Charniak
◮ Discover word classes
  Micha Elsner; Prof. Eugene Charniak
◮ Cluster possibly-coreferent phrases?
  People: Micha Elsner; Prof. Eugene Charniak, Charniak

Overview
◮ Introduction
◮ Clustering as parsing
◮ Consistency: finding possible entities
◮ Experiments: pronouns are key!
◮ Future directions

Clustering as parsing
Grammar:
  NE → pers
  NE → org
  NE → loc
  org → org_term+
  org_term → Brown
  org_term → University
  pers → pers_term+
  pers_term → Moses
  pers_term → Brown
Example parses:
  [NE [org [org_term Brown] [org_term University]]]
  [NE [pers [pers_term Moses] [pers_term Brown]]]

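The typing behaviour of this toy grammar can be sketched in a few lines, assuming the only rules are the ones listed on the slide. `TERMS` and `possible_types` are hypothetical names for illustration, not part of the authors' software, and the actual model learns weighted rules rather than consulting a fixed lexicon.

```python
# Toy version of the slide's grammar: a type X derives a phrase
# if every word is licensed by a rule X_term -> word (X -> X_term+).
# TERMS and possible_types are hypothetical names for illustration.
TERMS = {
    "org": {"Brown", "University"},
    "pers": {"Moses", "Brown"},
    "loc": set(),  # the slide lists no loc_term rules
}


def possible_types(phrase):
    """Return the entity types whose term rules cover every word."""
    words = phrase.split()
    return sorted(
        t for t, terms in TERMS.items()
        if words and all(w in terms for w in words)
    )


print(possible_types("Brown University"))  # ['org']
print(possible_types("Moses Brown"))       # ['pers']
print(possible_types("Brown"))             # ['org', 'pers']
```

The last call shows why word identity alone is not enough: "Brown" is ambiguous until its neighbours or context features pick a type.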
Internal structure
Grammar:
  NE → org
  org → org^1 org^2
  org → (org^1)(org^2)(org^3)(org^4)(org^5)
  org^1 → Brown
  org^2 → University
Example parse:
  [NE [org [org^1 Brown] [org^2 University]]]

Multiword expansions
Grammar:
  NE → loc
  loc → loc^1 loc^2
  loc^1 → Providence
  loc^2 → Rhode Island
Example parse:
  [NE [loc [loc^1 Providence] [loc^2 Rhode Island]]]

Gathering features
◮ Nominal modifiers (Collins+Singer ‘99)
  ◮ Appositive: “Hillary Clinton, the Secretary of State”
  ◮ Prenominal: “candidate Hillary Clinton”
◮ Prepositional governor (C+S ‘99)
  ◮ “a spokesman for Hillary Clinton”
◮ Personal pronouns
  ◮ “... Hillary Clinton. She said ...”
  ◮ Unsupervised model of (Charniak+Elsner ‘09)
◮ Relative pronouns
  ◮ “Hillary Clinton, who said...”
Add features to input strings:
  Hillary Clinton # Secretary candidate # spokesman-for # she who

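The input-string format can be illustrated with a small helper. This is a sketch that assumes four `#`-separated fields (name, nominal modifiers, prepositional governors, pronouns), as the example string suggests; `feature_string` and its parameter names are hypothetical.

```python
def feature_string(name, nominal=(), prep=(), pronouns=()):
    """Join a name and its context features into one '#'-separated string.

    Assumed field order (read off the slide's example):
    name # nominal modifiers # prepositional governors # pronouns
    """
    fields = [name, " ".join(nominal), " ".join(prep), " ".join(pronouns)]
    # Collapse the double spaces that empty fields would otherwise leave.
    return " ".join(" # ".join(fields).split())


print(feature_string(
    "Hillary Clinton",
    nominal=["Secretary", "candidate"],
    prep=["spokesman-for"],
    pronouns=["she", "who"],
))
# Hillary Clinton # Secretary candidate # spokesman-for # she who
```

Empty fields are kept as bare separators, so a mention seen only with a relative pronoun becomes `Hillary Clinton # # # who`.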
Adding features
Grammar:
  NE → org pronouns_org
  org → org^1 org^2
  pronouns_org → # pronoun_org*
  pronoun_org → which
  pronoun_org → they
  ...
  pronoun_org → he
  ...
Example parse:
  [NE [org [org^1 Brown] [org^2 University]] [pronouns_org # [pronoun_org which]]]

Learning the grammar
How to learn rule probabilities?
◮ Many, many rules:
  ◮ With multiword strings, infinite!
  ◮ Most of them useless.
Bayesian model
Sparse prior over rules.
Only useful rules get non-zero probability.

Adaptor grammars (Johnson+al ‘07)
◮ Prior over grammars
◮ Form of hierarchical Dirichlet process
◮ Black-box inference, downloadable software
  ◮ Development is just writing the grammar
◮ But standard inference isn’t always good enough
Tuesday, 11:30: “Improving nonparameteric Bayesian inference experiments on unsupervised word segmentation with adaptor grammars”, Mark Johnson and Sharon Goldwater.

Consistent phrases
Definition: Consistent
Phrases that could refer to the same entity.
Weaker than coreference. Non-trivial for named entities.
Inconsistent, same heads:
◮ Ford Motor Co.
◮ Lockheed Martin Co.
Consistent, different heads:
◮ Professor Johnson
◮ Mark

Modeling consistency
Model’s concept of consistency follows (Charniak ‘01):
Phrases are consistent if none of their internal subparts clash.
Ordered template (pers^1 pers^2 pers^3 pers^4): Prof. Mark E. Johnson
Realizations:
◮ Mark Johnson
◮ Prof. Johnson
◮ Mark
Inconsistent:
◮ Mark Steedman

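The "no subparts clash" test can be sketched as follows, assuming each phrase has already been aligned to numbered template slots. In the model that alignment comes from the parse; here the slot assignments are written by hand, and `consistent` is a hypothetical helper name.

```python
def consistent(a, b):
    """Two slot assignments are consistent if no shared slot holds different words."""
    return all(a[slot] == b[slot] for slot in a.keys() & b.keys())


# Hand-written slot assignments (assumed, not inferred by a model).
template = {1: "Prof.", 2: "Mark", 3: "E.", 4: "Johnson"}
mark_johnson = {2: "Mark", 4: "Johnson"}
prof_johnson = {1: "Prof.", 4: "Johnson"}
mark = {2: "Mark"}
mark_steedman = {2: "Mark", 4: "Steedman"}

print(consistent(template, mark_johnson))   # True
print(consistent(prof_johnson, mark))       # True (no shared slots, so no clash)
print(consistent(template, mark_steedman))  # False (slot 4 clashes)
```

Phrases that share no slots are trivially consistent, which is what lets "Professor Johnson" and "Mark" count as consistent despite having different heads.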
Experimental setup
Datasets:
◮ Labeled data: MUC-7
  ◮ Three entity classes: PERS, ORG, LOC
◮ Unlabeled data: NANC
Combine features for multiple examples:
  Hillary Clinton # # # who
  Hillary Clinton # Secretary # # she
  Hillary Clinton # # spokesman-for # her
  → Hillary Clinton # Secretary # spokesman-for # she her who
More data in equal time... but no per-document features.

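Combining examples amounts to a field-by-field union over all mentions of the same name. The sketch below assumes every input line has the same name and the same number of `#`-separated fields; `merge_examples` is a hypothetical name, and sorting the tokens inside each field is an arbitrary choice made here for determinism.

```python
def merge_examples(lines):
    """Union the features of several mentions of one name, field by field."""
    rows = [[field.split() for field in line.split("#")] for line in lines]
    name = " ".join(rows[0][0])  # assumes all lines share this name
    merged = []
    # zip(*...) walks the feature fields column by column across mentions.
    for column in zip(*(row[1:] for row in rows)):
        tokens = {tok for field in column for tok in field}
        merged.append(" ".join(sorted(tokens)))
    # Collapse the double spaces that empty fields would otherwise leave.
    return " ".join(" # ".join([name] + merged).split())


examples = [
    "Hillary Clinton # # # who",
    "Hillary Clinton # Secretary # # she",
    "Hillary Clinton # # spokesman-for # her",
]
print(merge_examples(examples))
# Hillary Clinton # Secretary # spokesman-for # her she who
```

The result matches the combined string on the slide up to the order of the pronouns.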
Basic results
Our model:
◮ Baseline (all ORG): 46%
◮ Our best model: 86%
Confusion matrix:
         loc    org    per
  LOC   1187     97     37
  ORG    223   1517    122
  PER     36     20    820

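The two headline numbers can be recomputed from the confusion matrix. This sketch assumes rows are gold labels and columns are the model's clusters, an assumption supported by the ORG row total reproducing the all-ORG baseline.

```python
# Confusion matrix from the slide; rows assumed gold, columns assumed predicted.
confusion = {
    "LOC": {"loc": 1187, "org": 97, "per": 37},
    "ORG": {"loc": 223, "org": 1517, "per": 122},
    "PER": {"loc": 36, "org": 20, "per": 820},
}

total = sum(sum(row.values()) for row in confusion.values())
correct = sum(confusion[gold][gold.lower()] for gold in confusion)

accuracy = correct / total                        # diagonal over everything
baseline = sum(confusion["ORG"].values()) / total  # label everything ORG

print(total, correct)        # 4059 3524
print(f"{accuracy:.1%}")     # 86.8%
print(f"{baseline:.1%}")     # 45.9%
```

Both values agree with the slide's rounded figures of 86% and 46%.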
Essentially unjustified comparisons
(Haghighi+Klein ‘07)
◮ ACE corpus: 61%
(Collins+Singer ‘99)
◮ Easier dataset
  ◮ Only examples with features
  ◮ Proportionally more people
◮ Generative baseline: 83%
◮ Cotraining: 91%
Supervised MUC-7:
◮ Best system (LTG): 94%
◮ Human: 97%

Breakdown by features
  Model                                Dev accuracy
  Baseline (All ORG)                   42.5
  Core NPs (no consistency)            45.5
  Core NPs (consistency)               48.5
  Context features (nominal/prep)      83.3
  All features (context + pronouns)    87.1

Named entity structure
  pers^0       pers^1     pers^2           pers^3     pers^4
  rep.         john       minister         brown      jr.
  sen.         robert     j.               smith      a
  washington   david      john             b          smith
  dr.          michael    l.               johnson    iii

  loc^0        loc^1      loc^2            loc^3      loc^4
  washington   the        texas            county     monday
  los angeles  st.        new york         city       thursday
  south        new        washington       beach      river
  north        national   united states    valley     tuesday

Judging consistency
Sometimes right:
◮ Dr. Seuss
◮ Dr. Quinn
... correctly judged inconsistent.
Sometimes wrong:
◮ Dr. William F. Gibson
◮ Dr. William Gibson
... judged inconsistent.
◮ Bruce Jarvis
◮ Bruce Ellen Jarvis
... judged consistent.

Inference is a problem
Gibbs sampling
◮ Converges in the limit....
  ◮ Not in real life!
◮ Clustering problems are often NP-hard:
  ◮ There’s no guaranteed method.
For this model:
◮ Used heuristic inference
◮ Still only partial convergence!

Conclusion Introduction Clustering as parsing Consistency: finding possible entities Experiments: pronouns are key! Future directions 26
What’s next
◮ Add named-entity to unsupervised coreference
  ◮ Document-level features might help NE...
  ◮ If the combined model could scale.
◮ Improve inference for Bayesian models
  ◮ Gibbs sampling isn’t good enough...
  ◮ Better sampling?
  ◮ Or something completely different?
◮ Adaptor grammars: what else are they good for?

Thanks!
◮ Three reviewers
◮ NSF
◮ All of you!

Overview
◮ Adaptor grammars: framework for Bayesian grammar learning
◮ Implementing Consistency
◮ Inference: a general problem for this approach