A Unified Local and Global Model for Discourse Coherence, by Micha Elsner, Joseph Austerweil, and Eugene Charniak (PowerPoint presentation)



SLIDE 1

A Unified Local and Global Model for Discourse Coherence

Micha Elsner, Joseph Austerweil, Eugene Charniak Brown Laboratory for Linguistic Information Processing (BLLIP)

SLIDE 2

Coherence Ranking

[Diagram: several proposed orderings of Sentences 1-4 on the left; on the right, the same orderings ranked from best (A+!) through B to worst (C).]

Ranked Orderings

SLIDE 3

Sentence Ordering

Data Source

[Diagram: a Bag of Sentences (four unordered "Sentence ?" items) mapped to an Ordered Document (Sentence 1, Sentence 2, Sentence 3, Sentence 4).]

SLIDE 4

Overview

  • Previous Work: Entity Grids
  • Previous Work: Hidden Markov Model
  • Relaxed Entity Grid
  • Unified Hidden Markov Model
  • Corpus and Experiments
  • Conclusions and Future Work
SLIDE 5

An Entity Grid

Barzilay and Lapata '05, Lapata and Barzilay '05.

The commercial pilot, sole occupant of the airplane, was not injured. The airplane was owned and operated by a private owner. Visual meteorological conditions prevailed for the personal cross country flight for which a VFR flight plan was filed. The flight originated at Nuevo Laredo , Mexico , at approximately 1300.

[Entity grid: one column per entity (Plan, Airplane, Conditions, Flight, Pilot, Laredo, Owner, Occupant); each cell records the entity's most prominent syntactic role in that sentence: S (subject), O (object), X (other), or blank if absent. Axis: Syntactic Role in Sentence.]

SLIDE 6

Local Coherence: Entity Grids

  • Loosely based on Centering Theory.

– Coherent texts repeat important nouns.

  • Grid shows most prominent role of each head noun in each sentence.

[Entity-grid excerpt: columns for Plan, Airplane, Conditions, Flight, with one highlighted column's role sequence.]

A transition from X to O. (Here the history size is 1, but 2 works better.)
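The grid-and-transition bookkeeping above can be sketched in a few lines of Python. This is a toy illustration: the grid, entities, and roles below are invented for the example, not taken from any corpus.

```python
from collections import Counter

# Toy entity grid: one row per entity, one column per sentence.
# Roles follow the slides: "S" subject, "O" object, "X" other, "-" absent.
grid = {
    "pilot":    ["S", "-", "-", "-"],
    "airplane": ["X", "S", "-", "-"],
    "owner":    ["-", "X", "-", "-"],
    "flight":   ["-", "-", "X", "S"],
}

def transitions(grid, history=1):
    """Count role transitions down each entity's row (history size 1 here;
    the slides note that a history of 2 works better)."""
    counts = Counter()
    for roles in grid.values():
        for i in range(history, len(roles)):
            counts[tuple(roles[i - history:i + 1])] += 1
    return counts

counts = transitions(grid)
```

With a larger `history`, the keys simply become longer role tuples; everything else is unchanged.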

SLIDE 7

Computing with Entity Grids

  • Generatively: Lapata and Barzilay.

– Assume independence between columns:

P(grid) = ∏_entities ∏_sentences P(role | previous roles)

  • This independence assumption can cause problems for the generative approach.

– Barzilay and Lapata get better results with SVMs.
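Under the column-independence assumption, the grid probability factors into a product of per-transition probabilities. A minimal sketch (the transition probabilities below are invented for illustration, not estimated from data):

```python
import math

# Hypothetical transition probabilities P(role_i | role_{i-1}), history = 1.
P = {
    ("S", "-"): 0.4, ("-", "-"): 0.7, ("X", "S"): 0.2, ("X", "-"): 0.5,
}

def grid_logprob(grid):
    """log P(grid) = sum over entity columns of log transition probabilities,
    treating the columns as independent of one another."""
    lp = 0.0
    for roles in grid.values():
        for prev, cur in zip(roles, roles[1:]):
            lp += math.log(P[(prev, cur)])
    return lp

lp = grid_logprob({"pilot": ["S", "-", "-"], "airplane": ["X", "S", "-"]})
```

Logs are used so that long documents don't underflow; the factorization itself is exactly the double product above.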


SLIDE 10

Entity Grids Model Local Coherence

A coherent entity grid at very low zoom: entities occur in long contiguous columns. A grid for a randomly permuted document shows short, scattered occurrences instead. But what if we flip the document? Or move around paragraphs?

SLIDE 11

Overview

  • Previous Work: Entity Grids
  • Previous Work: Hidden Markov Model
  • Relaxed Entity Grid
  • Unified Hidden Markov Model
  • Corpus and Experiments
  • Conclusions and Future Work
SLIDE 12

Markov Model

  • Barzilay and Lee 2004, “Catching the Drift”.
  • Hidden Markov Model for document structure.
  • Each state generates sentences from another HMM.

[Diagram: topic state q_{i-1} → q_i; state q_i emits the sentence “the pilot received minor injuries”.]
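A toy version of a content HMM in this style, where each topic state carries its own transition and unigram emission probabilities. All states, words, and numbers here are invented for illustration; the real model generates sentences from a per-state HMM rather than a unigram model.

```python
import math

# Invented topic states with transition and unigram emission probabilities.
trans = {("START", "CRASH"): 0.9, ("CRASH", "INJURY"): 0.8}
emit = {
    "CRASH":  {"the": 0.3, "airplane": 0.4, "crashed": 0.3},
    "INJURY": {"the": 0.3, "pilot": 0.3, "received": 0.2,
               "minor": 0.1, "injuries": 0.1},
}

def doc_logprob(states, sentences):
    """log P(state path, sentences): chain the state transitions and
    score each sentence with its state's emission model."""
    lp, prev = 0.0, "START"
    for q, sent in zip(states, sentences):
        lp += math.log(trans[(prev, q)])
        for w in sent:
            lp += math.log(emit[q][w])
        prev = q
    return lp

lp = doc_logprob(
    ["CRASH", "INJURY"],
    [["the", "airplane", "crashed"],
     ["the", "pilot", "received", "minor", "injuries"]],
)
```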

SLIDE 13

Global Coherence

  • The HMM is good at learning overall document structure:

– Finding the start, end and boundaries.

  • But all local information has to be stored in the state variable.

– Creates problems with sparsity.

A wombat escaped from the cargo bay. Finally the wombat was captured. The last major wombat incident was in 1987.

  • Is there a state q-wombat?
SLIDE 14

Creating a Unified Model

  • What we want: an HMM with entity-grid features.

– We need a quick estimator for transition probabilities in the entity grid.

– In the past, entity grids have worked better as conditional models...

SLIDE 15

Overview

  • Previous Work: Entity Grids
  • Previous Work: Hidden Markov Model
  • Relaxed Entity Grid
  • Unified Hidden Markov Model
  • Corpus and Experiments
  • Conclusions and Future Work
SLIDE 16

Relaxing the Entity Grid

  • The most common transition is from – to –.

– The maximum-likelihood document has no entities at all!

  • Entities don't occur independently.

– There may not be room for them all.
– They 'compete' with one another.

SLIDE 17

Relaxed Entity Grid

  • Assume we have already generated the set of roles we need to fill with known entities.

– New entities come from somewhere else.

The commercial pilot, sole occupant of the airplane, was not injured. The ? was owned and operated by a private ? (new noun: owner)

SLIDE 18

Filling Roles with Known Entities

  • P(entity e fills role j | j, histories of known entities)

– history: roles in previous sentences
– known entity: has occurred before in the document

  • Still hard to estimate because of sparsity.

– Too many combinations of histories.

  • Normalize:

P(entity e fills role j | j, history of entity e)

  • Much easier to estimate!
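The normalization step can be sketched as follows: score each known entity by a probability conditioned only on its own history, then renormalize so that the candidates compete for the slot. The scores below are invented for illustration.

```python
# Invented scores P(fill role "S" now | entity's role in previous sentence).
score = {("S", "S"): 0.30, ("X", "S"): 0.10, ("-", "S"): 0.05}

def fill_probs(role, histories):
    """P(entity e fills `role` | role, history of e), renormalized so the
    known entities compete with one another for the slot."""
    raw = {e: score[(h, role)] for e, h in histories.items()}
    z = sum(raw.values())
    return {e: p / z for e, p in raw.items()}

probs = fill_probs("S", {"pilot": "S", "airplane": "X", "owner": "-"})
```

The per-entity conditional needs only one entity's history, so it can be estimated from far less data than the joint over all histories.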
SLIDE 19

Overview

  • Previous Work: Entity Grids
  • Previous Work: Hidden Markov Model
  • Relaxed Entity Grid
  • Unified Hidden Markov Model
  • Corpus and Experiments
  • Conclusions and Future Work
SLIDE 20

Graphical Model

[Graphical model: topic state q_{i-1} → q_i; state q_i generates the known entities E_i, the new entities N_i, and the non-entity words W_i of sentence i.]

SLIDE 21

Hidden Markov Model

  • Need to lexicalize the entity grid.

– States describe common words, not simply transitions.

  • Back off to the unlexicalized version.
  • Also generate the other words of the sentence (unigram language models):

– Words that aren't entities.
– First occurrences of entities.

SLIDE 22

Learning the HMM

  • We used Gibbs sampling to fit:

– Transition probabilities.
– Number of states.

  • Number of states is heavily dependent on the backoff constants.
  • We aimed for about 40-50 states.

– As in Barzilay and Lee.

SLIDE 23

Has This Been Done Before?

  • Soricut and Marcu '06:

– Mixture model with HMM, entity grid and word-to-word (IBM) components.
– Results are as good as ours.

  • Didn't do joint learning, just fit mixture weights.

– Less explanatory power.

  • Uses more information (ngrams and IBM).

– Might be improved by adding our model.

SLIDE 24

Overview

  • Previous Work: Entity Grids
  • Previous Work: Hidden Markov Model
  • Relaxed Entity Grid
  • Unified Hidden Markov Model
  • Corpus and Experiments
  • Conclusions and Future Work
SLIDE 25

Airplane (NTSB) Corpus

  • Traditional for this task.

– 100 test, 100 train.

  • Short (avg. 11.5 sentences) press releases on airplane emergencies.
  • A bit artificial:

– 40% begin: “This is preliminary information, subject to change, and may contain errors. Any errors in this report will be corrected when the final report has been completed.”

SLIDE 26

Discriminative Task

  • 20 random permutations per document: 2000 tests.

[Diagram: a permuted ordering (Sentences 2, 1, 4, 3) vs. the original (Sentences 1, 2, 3, 4).]

  • Binary judgement between a random permutation and the original document.
  • Local models do well.
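The discrimination test itself is easy to sketch: for each document, compare the model's score on the original ordering against its score on random permutations. The coherence scorer below is a stand-in for illustration, not the paper's model.

```python
import random

def discriminate(score, documents, n_perms=20, seed=0):
    """Fraction of (original, permutation) pairs where the scorer
    prefers the original ordering (higher score = more coherent)."""
    rng = random.Random(seed)
    wins = trials = 0
    for doc in documents:
        for _ in range(n_perms):
            perm = doc[:]
            while perm == doc:          # insist on a genuine permutation
                rng.shuffle(perm)
            trials += 1
            wins += score(doc) > score(perm)
    return wins / trials

# Stand-in scorer: count adjacent sentence pairs that are in order.
ascents = lambda d: sum(a < b for a, b in zip(d, d[1:]))
acc = discriminate(ascents, [[1, 2, 3, 4]])
```

With 100 test documents and 20 permutations each, this yields the 2000 binary judgements mentioned above.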
SLIDE 27

Results

Airplane Test, Discriminative (%):

– Barzilay and Lapata (SVM EGrid): 90
– Barzilay and Lee (HMM): 74
– Soricut and Marcu (Mixture):
– Unified (Relaxed EGrid/HMM): 94

SLIDE 28

Ordering Task

  • Used simulated annealing to find optimal orderings.
  • Score: similarity to the original ordering.

Kendall's τ metric:

  • Ranges from −1 (worst) to 1 (best).
  • Computed from the number of pairwise swaps: τ = 1 − 4·(swaps) / (n(n−1)).

[Diagram: a bag of unordered sentences mapped to the ordered document (Sentence 1, Sentence 2, Sentence 3, Sentence 4).]
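Kendall's τ over sentence orderings can be computed directly by counting discordant pairs; a small sketch:

```python
from itertools import combinations

def kendall_tau(proposed, original):
    """tau = 1 - 4 * swaps / (n * (n - 1)), where `swaps` counts sentence
    pairs whose relative order differs between the two orderings.
    Ranges from -1 (fully reversed) to 1 (identical)."""
    pos = {s: i for i, s in enumerate(original)}
    n = len(proposed)
    swaps = sum(pos[a] > pos[b] for a, b in combinations(proposed, 2))
    return 1 - 4 * swaps / (n * (n - 1))
```

This O(n²) pair count is fine for documents of a dozen sentences; for long documents a merge-sort-based O(n log n) count would be preferable.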

SLIDE 29

Results

Airplane Test, Kendall's τ:

– Barzilay and Lapata (SVM EGrid):
– Barzilay and Lee (HMM): 0.44
– Soricut and Marcu (Mixture): 0.50
– Unified (Relaxed EGrid/HMM): 0.50

SLIDE 30

Relaxed Entity Grid

Airplane Development, τ / Discr. (%):

– Generative EGrid: 0.17 / 81
– Relaxed EGrid: 0.02 / 87
– Unified (Generative EGrid/HMM): 0.39 / 85
– Unified (Relaxed EGrid/HMM): 0.54 / 96

SLIDE 31

Overview

  • Previous Work: Entity Grids
  • Previous Work: Hidden Markov Model
  • Relaxed Entity Grid
  • Unified Hidden Markov Model
  • Corpus and Experiments
  • Conclusions and Future Work
SLIDE 32

What We Did

  • Explained strengths of local and global models.
  • Proposed a new generative entity grid model.
  • Built a unified model with joint local and global features.

– Improves on purely local or global approaches.
– Comparable to the state of the art.

SLIDE 33

What To Do Next

  • Escape from the airplane corpus!

– Too constrained and artificial.
– Real documents have more complex syntax and lexical choices.

  • Longer documents pose challenges:

– Current algorithms aren't scalable.
– Neither are evaluation metrics.

SLIDE 34

Acknowledgements

Couldn't have done it without:

  • Regina Barzilay (code, data, advice & support)
  • Mirella Lapata (code, advice)
  • BLLIP (comments & criticism)
  • Tom Griffiths & Sharon Goldwater (Bayes)
  • DARPA GALE ($$)
  • Karen T. Romer Foundation ($$)