

  1. A Unified Local and Global Model for Discourse Coherence Micha Elsner, Joseph Austerweil, Eugene Charniak Brown Laboratory for Linguistic Information Processing (BLLIP)

  2. Coherence Ranking [diagram: several proposed orderings of Sentences 1-4 are graded against the original document; the proposed orderings receive ranks A+, B, and C]

  3. Sentence Ordering [diagram: a bag of unordered sentences ("Sentence ?") from a data source is arranged into an ordered document (Sentences 1-4)]

  4. Overview ● Previous Work: Entity Grids ● Previous Work: Hidden Markov Model ● Relaxed Entity Grid ● Unified Hidden Markov Model ● Corpus and Experiments ● Conclusions and Future Work

  5. An Entity Grid Barzilay and Lapata '05, Lapata and Barzilay '05. The commercial pilot, sole occupant of the airplane, was not injured. The airplane was owned and operated by a private owner. Visual meteorological conditions prevailed for the personal cross country flight for which a VFR flight plan was filed. The flight originated at Nuevo Laredo, Mexico, at approximately 1300.

     Each cell records the most prominent syntactic role of an entity in a sentence (S = subject, O = object, X = other, '-' = absent; note that subjects of passives are marked O here):

                 Plan  Airplane  Conditions  Flight  Pilot  Laredo  Owner  Occupant
     Sentence 0    -      X          -          -      O      -       -       X
     Sentence 1    -      O          -          -      -      -       X       -
     Sentence 2    O      -          S          X      -      -       -       -
     Sentence 3    -      -          -          S      -      X       -       -

  6. Local Coherence: Entity Grids ● Loosely based on Centering Theory. – Coherent texts repeat important nouns. ● Grid shows the most prominent role of each head noun in each sentence. [diagram: the first columns of the grid above (Plan, Airplane, Conditions); the Airplane column (X, O, -, -) illustrates a transition from X to O. Here the history size is 1, but 2 works better.]
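The construction is mechanical once a parser and coreference system have supplied (head noun, syntactic role) pairs for each sentence. A minimal Python sketch assuming that input format; the function and the prominence ordering are illustrative, not the authors' code:

    from collections import defaultdict

    # Syntactic roles ordered by prominence: subject > object > other.
    PROMINENCE = {"S": 3, "O": 2, "X": 1}

    def build_entity_grid(sentences):
        """Each sentence is a list of (head_noun, role) pairs, role in {S, O, X}.
        Returns {entity: column}, where a column lists the entity's most
        prominent role in each sentence, or '-' if it is absent."""
        grid = defaultdict(lambda: ["-"] * len(sentences))
        for i, sentence in enumerate(sentences):
            for entity, role in sentence:
                current = grid[entity][i]
                if current == "-" or PROMINENCE[role] > PROMINENCE[current]:
                    grid[entity][i] = role
        return dict(grid)

    # The NTSB example from slide 5.
    doc = [
        [("pilot", "O"), ("occupant", "X"), ("airplane", "X")],
        [("airplane", "O"), ("owner", "X")],
        [("conditions", "S"), ("flight", "X"), ("plan", "O")],
        [("flight", "S"), ("Laredo", "X")],
    ]
    for entity, column in build_entity_grid(doc).items():
        print(f"{entity:>10}: {' '.join(column)}")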

  7. Computing with Entity Grids ● Generatively: Lapata and Barzilay. – Assume independence between columns: the probability of the grid is a product over per-entity columns, P(grid) = ∏_e P(column_e). ● This independence assumption can cause problems for the generative approach. – Barzilay and Lapata get better results with SVMs.

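Under the independence assumption the whole computation reduces to a loop over columns. A hedged sketch, assuming role-transition log-probabilities (trans_logp) have already been estimated from training grids; the smoothing floor for unseen transitions is a placeholder, not the paper's smoothing:

    import math

    def column_log_prob(column, trans_logp, history=2):
        """Markov model over one entity's roles:
        log P(column) = sum_i log P(r_i | r_{i-history} .. r_{i-1})."""
        logp = 0.0
        for i, role in enumerate(column):
            context = tuple(column[max(0, i - history):i])
            logp += trans_logp.get((context, role), math.log(1e-6))
        return logp

    def grid_log_prob(grid, trans_logp):
        # Independence between columns: the grid factors into a product,
        # i.e. a sum of per-column log-probabilities.
        return sum(column_log_prob(col, trans_logp) for col in grid.values())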

  10. Entity Grids Model Local Coherence [image: a coherent entity grid at very low zoom; entities occur in long contiguous columns] [image: the grid of a randomly permuted document; occurrences are scattered] But what if we flip the coherent grid upside down? Or move paragraphs around? The local transitions barely change, so a purely local model can't tell the difference.

  11. Overview ● Previous Work: Entity Grids ● Previous Work: Hidden Markov Model ● Relaxed Entity Grid ● Unified Hidden Markov Model ● Corpus and Experiments ● Conclusions and Future Work

  12. Markov Model ● Barzilay and Lee 2004, "Catching the Drift". ● Hidden Markov Model for document structure. ● Each state generates sentences from another HMM. [diagram: hidden states q_{i-1} → q_i; the current state emits the sentence "the pilot received minor injuries"]
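As a concrete (if simplified) picture of how such a model scores a document, here is a forward-algorithm sketch; the emission models are opaque callables here, whereas Barzilay and Lee use state-specific bigram language models:

    import math

    def logsumexp(xs):
        m = max(xs)
        return m + math.log(sum(math.exp(x - m) for x in xs))

    def doc_log_prob(sentences, start_logp, trans_logp, emit_logp):
        """Forward algorithm for a content HMM. Each hidden state q has
        a sentence-level emission model emit_logp[q](tokens)."""
        states = list(start_logp)
        # alpha[q] = log P(sentences[0..i], q_i = q)
        alpha = {q: start_logp[q] + emit_logp[q](sentences[0]) for q in states}
        for sent in sentences[1:]:
            alpha = {q: logsumexp([alpha[p] + trans_logp[p][q] for p in states])
                        + emit_logp[q](sent)
                     for q in states}
        return logsumexp(list(alpha.values()))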

  13. Global Coherence ● The HMM is good at learning overall document structure: – Finding the start, end and boundaries. ● But all local information has to be stored in the state variable. – Creates problems with sparsity. A wombat escaped from the cargo bay. Finally the wombat was captured. The last major wombat incident was in 1987. ● Is there a state q-wombat?

  14. Creating a Unified Model ● What we want: an HMM with entity-grid features. – We need a quick estimator for transition probabilities in the entity grid. – In the past, entity grids have worked better as conditional models...

  15. Overview ● Previous Work: Entity Grids ● Previous Work: Hidden Markov Model ● Relaxed Entity Grid ● Unified Hidden Markov Model ● Corpus and Experiments ● Conclusions and Future Work

  16. Relaxing the Entity Grid ● The most common transition is from – to –. – The maximum likelihood document has no entities at all! ● Entities don't occur independently. – There may not be room for them all. – They 'compete' with one another.

  17. Relaxed Entity Grid ● Assume we have already generated the set of roles we need to fill with known entities. – New entities come from somewhere else. [example: "The commercial pilot, sole occupant of the airplane, was not injured. The ? was owned and operated by a private ?"; the first slot is filled by the known entity "airplane", the second by the new noun "owner"]

  18. Filling Roles with Known Entities ● P(entity e fills role j | j, histories of all known entities) – history: roles in previous sentences – known entity: has occurred before in the document ● Still hard to estimate because of sparsity. – Too many combinations of histories. ● Normalize over candidate entities, conditioning each only on its own history: P(entity e fills role j | j, history of entity e) ● Much easier to estimate!
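A rough count-based sketch of that simplified estimator, pooling every entity's history over the training grids; the actual model also conditions on the role j being filled and handles new entities separately, which this toy version omits:

    from collections import Counter

    def train_role_model(grids, history=2):
        """Relative-frequency estimate of P(role now | entity's own recent
        role history). grids: list of {entity: column} entity grids."""
        counts, context_totals = Counter(), Counter()
        for grid in grids:
            for column in grid.values():
                for i, role in enumerate(column):
                    ctx = tuple(column[max(0, i - history):i])
                    counts[(ctx, role)] += 1
                    context_totals[ctx] += 1
        return {(ctx, role): n / context_totals[ctx]
                for (ctx, role), n in counts.items()}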

  19. Overview ● Previous Work: Entity Grids ● Previous Work: Hidden Markov Model ● Relaxed Entity Grid ● Unified Hidden Markov Model ● Corpus and Experiments ● Conclusions and Future Work

  20. Graphical Model [diagram: hidden state chain q_{i-1} → q_i; at each sentence, the state emits the known entities E_{i-1}, E_i, the new entities N_i, and the non-entity words W_i]

  21. Hidden Markov Model ● Need to lexicalize the entity grid. – States describe common words, not simply transitions. ● Back off to the unlexicalized version. ● Also generate the other words of the sentence (unigram language models): – Words that aren't entities. – First occurrences of entities.
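The slides don't spell out the backoff scheme, only that the backoff constants matter, so the interpolation below is an assumption for illustration (a Witten-Bell-style mix of lexicalized and unlexicalized estimates):

    def interpolated_prob(lex_count, lex_total, unlex_count, unlex_total, beta=10.0):
        """Back off from the lexicalized estimate to the unlexicalized one.
        beta is the (tuned) backoff constant: a larger beta trusts the
        sparse lexicalized counts less."""
        lam = lex_total / (lex_total + beta)
        p_lex = lex_count / lex_total if lex_total else 0.0
        p_unlex = unlex_count / unlex_total
        return lam * p_lex + (1 - lam) * p_unlex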

  22. Learning the HMM ● We used Gibbs sampling to fit: – Transition probabilities. – Number of states. ● Number of states heavily dependent on the backoff constants. ● We aimed for about 40-50 states. – As in Barzilay and Lee.
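For intuition, a toy Gibbs sweep with a fixed number of states; the real sampler is collapsed (it integrates out the probabilities and tracks counts) and also resamples the number of states, which this sketch omits:

    import random

    def gibbs_sweep(states, sentences, n_states, trans_p, emit_p, rng=random):
        """Resample each sentence's hidden state given its neighbours'
        current states and the sentence's emission probability."""
        for i, sent in enumerate(sentences):
            weights = []
            for q in range(n_states):
                w = emit_p(q, sent)
                if i > 0:
                    w *= trans_p(states[i - 1], q)
                if i + 1 < len(sentences):
                    w *= trans_p(q, states[i + 1])
                weights.append(w)
            states[i] = rng.choices(range(n_states), weights=weights)[0]
        return states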

  23. Has This Been Done Before? ● Soricut and Marcu '06: – Mixture model with HMM, entity grid and word-to-word (IBM) components. – Results are as good as ours. ● Didn't do joint learning, just fit mixture weights. – Less explanatory power. ● Uses more information (ngrams and IBM). – Might be improved by adding our model.

  24. Overview ● Previous Work: Entity Grids ● Previous Work: Hidden Markov Model ● Relaxed Entity Grid ● Unified Hidden Markov Model ● Corpus and Experiments ● Conclusions and Future Work

  25. Airplane (NTSB) Corpus ● Traditional for this task. – 100 test, 100 train. ● Short (avg. 11.5 sents) press releases on airplane emergencies. ● A bit artificial: – 40% begin: “This is preliminary information, subject to change, and may contain errors. Any errors in this report will be corrected when the final report has been completed.”

  26. Discriminative Task ● 20 random permutations per document: 2000 tests. ● Binary judgement between a random permutation and the original document. ● Local models do well. [diagram: a permuted ordering (Sentences 2, 1, 4, 3) vs. the original ordering (Sentences 1, 2, 3, 4)]
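The protocol is easy to restate as code. In this hypothetical harness, score is any coherence model's document-scoring function (higher = more coherent), not the authors' evaluation code:

    import random

    def discriminative_accuracy(docs, score, n_perms=20, seed=0):
        """Fraction of (original, permutation) pairs on which the model
        prefers the original document."""
        rng = random.Random(seed)
        wins = trials = 0
        for doc in docs:
            for _ in range(n_perms):
                perm = doc[:]
                rng.shuffle(perm)
                if perm == doc:
                    continue  # an identity shuffle would be a free win
                wins += score(doc) > score(perm)
                trials += 1
        return wins / trials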

  27. Results

     Airplane Test                       Discriminative (%)
     Barzilay and Lapata (SVM EGrid)             90
     Barzilay and Lee (HMM)                      74
     Soricut and Marcu (Mixture)                  -
     Unified (Relaxed EGrid/HMM)                 94

  28. Ordering Task ● Used simulated annealing to find the best-scoring orderings. ● Score: similarity to the original ordering, measured by Kendall's τ: ranges from -1 (worst) to 1 (best); proportional to the number of pairwise swaps. [diagram: unordered sentences ("Sentence ?") matched against the original order (Sentences 1-4), with τ = 1 for a perfect match]
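Both the metric and the search are compact enough to sketch. Here score stands for any coherence model, and the linear cooling schedule is an arbitrary choice, not the paper's:

    import itertools
    import math
    import random

    def kendalls_tau(order):
        """order[i] = original index of the sentence at position i.
        tau = 1 - 2 * (discordant pairs) / (total pairs), from -1 to 1."""
        n = len(order)
        swaps = sum(1 for i, j in itertools.combinations(range(n), 2)
                    if order[i] > order[j])
        return 1 - 4 * swaps / (n * (n - 1))

    def anneal_order(n, score, steps=5000, t0=1.0, seed=0):
        """Search over orderings by randomly swapping two sentences,
        keeping worse orderings with Boltzmann probability."""
        rng = random.Random(seed)
        order = list(range(n))
        rng.shuffle(order)
        current = score(order)
        best, best_score = order[:], current
        for step in range(steps):
            t = t0 * (1 - step / steps) + 1e-9  # linear cooling
            i, j = rng.sample(range(n), 2)
            order[i], order[j] = order[j], order[i]
            new = score(order)
            if new >= current or rng.random() < math.exp((new - current) / t):
                current = new
                if new > best_score:
                    best, best_score = order[:], new
            else:
                order[i], order[j] = order[j], order[i]  # undo the swap
        return best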

  29. Results

     Airplane Test                       Kendall's τ
     Barzilay and Lapata (SVM EGrid)          -
     Barzilay and Lee (HMM)                  0.44
     Soricut and Marcu (Mixture)             0.50
     Unified (Relaxed EGrid/HMM)             0.50

  30. Relaxed Entity Grid

     Airplane Development                  τ      Discr. (%)
     Generative EGrid                    0.17         81
     Relaxed EGrid                       0.02         87
     Unified (Generative EGrid/HMM)      0.39         85
     Unified (Relaxed EGrid/HMM)         0.54         96

  31. Overview ● Previous Work: Entity Grids ● Previous Work: Hidden Markov Model ● Relaxed Entity Grid ● Unified Hidden Markov Model ● Corpus and Experiments ● Conclusions and Future Work

  32. What We Did ● Explained strengths of local and global models. ● Proposed a new generative entity grid model. ● Built a unified model with joint local and global features. – Improves on purely local or global approaches. – Comparable to state-of-the-art.

  33. What To Do Next ● Escape from the airplane corpus! – Too constrained and artificial. – Real documents have more complex syntax and lexical choices. ● Longer documents pose challenges: – Current algorithms aren't scalable. – Neither are evaluation metrics.

  34. Acknowledgements Couldn't have done it without: ● Regina Barzilay (code, data, advice & support) ● Mirella Lapata (code, advice) ● BLLIP (comments & criticism) ● Tom Griffiths & Sharon Goldwater (Bayes) ● DARPA GALE ($$) ● Karen T. Romer Foundation ($$)
