A Unified Local and Global Model for Discourse Coherence
Micha Elsner, Joseph Austerweil, Eugene Charniak
Brown Laboratory for Linguistic Information Processing (BLLIP)

Coherence Ranking [Figure: several proposed orderings of the same four sentences are ranked by coherence and graded A+, B and C.]

Sentence Ordering [Figure: a data source supplies a bag of unordered sentences, which is arranged into an ordered document (Sentence 1 to Sentence 4).]

Overview ● Previous Work: Entity Grids ● Previous Work: Hidden Markov Model ● Relaxed Entity Grid ● Unified Hidden Markov Model ● Corpus and Experiments ● Conclusions and Future Work

An Entity Grid (Barzilay and Lapata '05, Lapata and Barzilay '05)

0. The commercial pilot, sole occupant of the airplane, was not injured.
1. The airplane was owned and operated by a private owner.
2. Visual meteorological conditions prevailed for the personal cross country flight for which a VFR flight plan was filed.
3. The flight originated at Nuevo Laredo, Mexico, at approximately 1300.

Syntactic role of each head noun in each sentence (S = subject, O = object, X = other, - = absent):

Sentence  Plan  Airplane  Condition  Flight  Pilot  Laredo  Owner  Occupant
0         -     X         -          -       O      -       -      X
1         -     O         -          -       -      -       X      -
2         O     -         S          X       -      -       -      -
3         -     -         -          S       -      X       -      -

Local Coherence: Entity Grids ● Loosely based on Centering Theory. – Coherent texts repeat important nouns. ● Grid shows most prominent role of each head noun in each sentence. ● Example: the Airplane column makes a transition from X to O between sentences 0 and 1. (Here the history size is 1, but 2 works better.)

Sentence  Plan  Airplane  Condition
0         -     X         -
1         -     O         -
2         O     -         S
3         -     -         -
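The transitions described on this slide can be counted directly from a grid. The following is a minimal sketch, not the authors' code: the row strings come from the example grid, while the noun labels are a reading of the slide's column headers and the function names are hypothetical.

```python
from collections import Counter

# Example grid: one row per sentence, one column per head noun.
# S = subject, O = object, X = other role, - = noun absent from the sentence.
# NOUNS is an assumed reading of the column labels on the slide.
NOUNS = ["Plan", "Airplane", "Condition", "Flight",
         "Pilot", "Laredo", "Owner", "Occupant"]
GRID = [
    "-X--O--X",
    "-O----X-",
    "O-SX----",
    "---S-X--",
]

def column(grid, j):
    """Return the role sequence of the j-th noun, top to bottom."""
    return [row[j] for row in grid]

def transitions(grid, history=1):
    """Count (previous roles, next role) pairs over every column."""
    counts = Counter()
    for j in range(len(grid[0])):
        roles = column(grid, j)
        for i in range(history, len(roles)):
            counts[(tuple(roles[i - history:i]), roles[i])] += 1
    return counts

counts = transitions(GRID, history=1)
```

Here `counts[(("X",), "O")]` is the X-to-O transition in the Airplane column. Passing `history=2` conditions each role on the two preceding sentences instead of one.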

Computing with Entity Grids ● Generatively: Lapata and Barzilay. – Assume independence between columns. ● This independence assumption can cause problems for the generative approach. – Barzilay and Lapata get better results with SVMs. [Figure: the grid excerpt for Plan, Airplane and Condition, scored as a product (∏) of per-column terms.]
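To make the column-independence assumption concrete, here is a rough sketch of a generative grid scorer. It is an illustration under stated assumptions rather than the Lapata and Barzilay implementation: the add-alpha smoothing, the toy grids and all function names are hypothetical.

```python
import math
from collections import Counter

ROLES = "SOX-"  # subject, object, other, absent

def train(grids, history=1, alpha=1.0):
    """Estimate P(role | previous roles) with add-alpha smoothing (an assumption)."""
    pair, ctx = Counter(), Counter()
    for grid in grids:
        for col in zip(*grid):                 # one column per noun
            for i in range(history, len(col)):
                h = col[i - history:i]
                pair[(h, col[i])] += 1
                ctx[h] += 1

    def prob(role, h):
        return (pair[(h, role)] + alpha) / (ctx[h] + alpha * len(ROLES))

    return prob

def log_prob(grid, prob, history=1):
    """Sum log-probabilities over columns, treating columns as independent."""
    total = 0.0
    for col in zip(*grid):
        for i in range(history, len(col)):
            total += math.log(prob(col[i], tuple(col[i - history:i])))
    return total

# Toy data: rows are sentences, columns are two nouns.
toy = ["S-", "S-", "S-", "-X"]
permuted = ["S-", "-X", "S-", "S-"]
prob = train([toy])
```

With these toy grids, `log_prob(toy, prob)` is higher than `log_prob(permuted, prob)`, because the permuted order breaks up the run of subject mentions in the first column.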

Entity Grids Model Local Coherence A coherent entity grid at very low zoom: entities occur in long contiguous columns. A grid for a randomly permuted document tends to look like this. But what if we flip it? Or move around paragraphs?

Markov Model ● Barzilay and Lee 2004, “Catching the Drift” ● Hidden Markov Model for document structure. ● Each state generates sentences from another HMM. [Figure: state q_{i-1} transitions to state q_i, which emits the sentence "the pilot received minor injuries".]

Global Coherence ● The HMM is good at learning overall document structure: – Finding the start, end and boundaries. ● But all local information has to be stored in the state variable. – Creates problems with sparsity. A wombat escaped from the cargo bay. Finally the wombat was captured. The last major wombat incident was in 1987. ● Is there a state q-wombat?

Creating a Unified Model ● What we want: an HMM with entity-grid features. – We need a quick estimator for transition probabilities in the entity grid. – In the past, entity grids have worked better as conditional models...

Relaxing the Entity Grid ● The most common transition is from – to –. – The maximum likelihood document has no entities at all! ● Entities don't occur independently. – There may not be room for them all. – They 'compete' with one another.

Relaxed Entity Grid ● Assume we have already generated the set of roles we need to fill with known entities. – New entities come from somewhere else. ● Example: after "The commercial pilot, sole occupant of the airplane, was not injured.", the next sentence is "The ? was owned and operated by a private ?", with new noun: owner.

Filling Roles with Known Entities ● P(entity e fills role j | j, histories of known entities) – history: roles in previous sentences – known entity: has occurred before in document ● Still hard to estimate because of sparsity. – Too many combinations of histories. ● Normalize: P(entity e fills role j | j, history of entity e ) ● Much easier to estimate!
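One way to read the "normalize" step above is sketched below: score each known entity by P(role | that entity's own history), then renormalize across the entities competing for the role. This is an interpretation, not necessarily the authors' exact estimator; the smoothing, the toy counts and the function names are assumptions.

```python
from collections import Counter

ROLES = "SOX-"  # subject, object, other, absent

def role_given_history(counts, role, history, alpha=1.0):
    """Smoothed P(role | history of one entity); add-alpha is an assumption."""
    context_total = sum(counts[(history, r)] for r in ROLES)
    return (counts[(history, role)] + alpha) / (context_total + alpha * len(ROLES))

def fill_probabilities(role, histories, counts):
    """Share one role among the known entities, each judged by its own history."""
    scores = {entity: role_given_history(counts, role, h)
              for entity, h in histories.items()}
    z = sum(scores.values())
    return {entity: s / z for entity, s in scores.items()}

# Hypothetical training counts of (history, role) pairs with history size 1.
counts = Counter({
    (("S",), "S"): 6, (("S",), "-"): 2,
    (("-",), "S"): 1, (("-",), "-"): 7,
})
# Roles of each known entity in the previous sentence.
histories = {"pilot": ("S",), "wombat": ("-",)}
probs = fill_probabilities("S", histories, counts)
```

Because each entity is scored only against its own history, the estimator needs counts per single history rather than per combination of histories, which is the sparsity saving the slide points to.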

Graphical Model [Figure: a chain of states q_{i-1} → q_i, with per-sentence nodes E_{i-1}, E_i, W_i and N_i. Node labels: State, Known Entities, New entities, Non-entities.]

Hidden Markov Model ● Need to lexicalize the entity grid. – States describe common words, not simply transitions. ● Back off to the unlexicalized version. ● Also generate the other words of the sentence (unigram language models): – Words that aren't entities. – First occurrences of entities.

Learning the HMM ● We used Gibbs sampling to fit: – Transition probabilities. – Number of states. ● Number of states heavily dependent on the backoff constants. ● We aimed for about 40-50 states. – As in Barzilay and Lee.

Has This Been Done Before? ● Soricut and Marcu '06: – Mixture model with HMM, entity grid and word-to-word (IBM) components. – Results are as good as ours. ● Didn't do joint learning, just fit mixture weights. – Less explanatory power. ● Uses more information (ngrams and IBM). – Might be improved by adding our model.

Airplane (NTSB) Corpus ● Traditional for this task. – 100 test, 100 train. ● Short (avg. 11.5 sents) press releases on airplane emergencies. ● A bit artificial: – 40% begin: “This is preliminary information, subject to change, and may contain errors. Any errors in this report will be corrected when the final report has been completed.”

Discriminative Task ● 20 random permutations per document: 2000 tests. ● Binary judgement between random permutation and original document. ● Local models do well. [Figure: a permuted ordering of Sentences 1 to 4 vs. the original ordering.]
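The evaluation loop for this task might look like the sketch below. The `score` argument stands in for any coherence model. Requiring each permutation to differ from the original, and counting ties as failures, are assumptions of this sketch rather than details given on the slide.

```python
import random

def discrimination_accuracy(documents, score, n_perms=20, seed=0):
    """Fraction of trials where the original order outscores a random permutation."""
    rng = random.Random(seed)
    wins = trials = 0
    for doc in documents:
        if len(set(doc)) < 2:
            continue                      # no distinct reordering exists
        original_score = score(doc)
        for _ in range(n_perms):
            perm = doc[:]
            while perm == doc:            # assumption: skip identity permutations
                rng.shuffle(perm)
            trials += 1
            wins += original_score > score(perm)   # ties count as failures
    return wins / trials if trials else 0.0

def ascending_pairs(doc):
    """Toy scorer: number of adjacent pairs that are in increasing order."""
    return sum(a < b for a, b in zip(doc, doc[1:]))

docs = [[1, 2, 3, 4], [1, 2, 3]]
accuracy = discrimination_accuracy(docs, ascending_pairs)
```

With 100 test documents and `n_perms=20`, the loop runs the 2000 trials mentioned above.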

Results: Airplane Test

Model                              Discriminative (%)
Barzilay and Lapata (SVM EGrid)    90
Barzilay and Lee (HMM)             74
Soricut and Marcu (Mixture)        -
Unified (Relaxed EGrid/HMM)        94

Ordering Task ● Used simulated annealing to find optimal orderings. ● Score: similarity to original ordering. ● Kendall's τ metric: -1 (worst) to 1 (best). ~ # of pairwise swaps. [Figure: a bag of unordered sentences is arranged into Sentence 1 to Sentence 4, giving τ = 1.]
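For reference, Kendall's τ between a proposed and an original ordering can be computed from the number of inverted sentence pairs, as τ = 1 - 2 · inversions / (n choose 2). The sketch below assumes every sentence is unique and hashable; the function name is illustrative.

```python
from itertools import combinations

def kendall_tau(proposed, reference):
    """Kendall's tau between two orderings of the same unique items."""
    position = {item: i for i, item in enumerate(reference)}
    ranks = [position[item] for item in proposed]
    n = len(ranks)
    if n < 2:
        raise ValueError("need at least two sentences")
    # A pair is inverted when the proposed order disagrees with the reference.
    inversions = sum(1 for a, b in combinations(ranks, 2) if a > b)
    return 1 - 2 * inversions / (n * (n - 1) / 2)
```

The identity ordering has no inversions and scores 1, a fully reversed ordering scores -1, and each adjacent swap changes the inversion count by exactly one.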

Results: Airplane Test

Model                              Kendall's τ
Barzilay and Lapata (SVM EGrid)    -
Barzilay and Lee (HMM)             0.44
Soricut and Marcu (Mixture)        0.50
Unified (Relaxed EGrid/HMM)        0.50

Relaxed Entity Grid: Airplane Development

Model                              τ       Discr. (%)
Generative EGrid                   0.17    81
Relaxed EGrid                      0.02    87
Unified (Generative EGrid/HMM)     0.39    85
Unified (Relaxed EGrid/HMM)        0.54    96

What We Did ● Explained strengths of local and global models. ● Proposed a new generative entity grid model. ● Built a unified model with joint local and global features. – Improves on purely local or global approaches. – Comparable to state-of-the-art.

What To Do Next ● Escape from the airplane corpus! – Too constrained and artificial. – Real documents have more complex syntax and lexical choices. ● Longer documents pose challenges: – Current algorithms aren't scalable. – Neither are evaluation metrics.

Acknowledgements Couldn't have done it without: ● Regina Barzilay (code, data, advice & support) ● Mirella Lapata (code, advice) ● BLLIP (comments & criticism) ● Tom Griffiths & Sharon Goldwater (Bayes) ● DARPA GALE ($$) ● Karen T. Romer Foundation ($$)
