Entity-based Coherence: Going Off the Grid Micha Elsner Elsner, - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Entity-based Coherence: Going Off the Grid

Micha Elsner

Elsner, Austerweil, Charniak: NAACL '07 (Unified Model of Local and Global Coherence) Elsner, Charniak: ACL '08 (Coreference-inspired Coherence Modeling)

slide-2
SLIDE 2

2

Text!

  • Sir Walter Elliot, of Kellynch Hall, in Somersetshire, was a man who never took up any book but the Baronetage.
  • Sir Walter had improved it by adding the day he had lost his wife.
  • There followed the history of the ancient family.
  • Vanity was the beginning and end of Sir Walter Elliot's character.
  • He had been remarkably handsome in his youth.

slide-3
SLIDE 3

3

Coherence

  • Consistent topic.
  • Earlier sentences provide context for later ones.

Sir Walter Elliot, of Kellynch Hall, in Somersetshire, was a man who never took up any book but the Baronetage. Sir Walter had improved it by adding the day he had lost his wife.

slide-4
SLIDE 4

4

Coherence

  • Consistent topic.
  • Earlier sentences provide context for later ones.

Sir Walter Elliot, of Kellynch Hall, in Somersetshire, was a man who never took up any book but the Baronetage. Sir Walter had improved it by adding the day he had lost his wife.

  • Not:

Sir Walter had improved it by adding the day he had lost his wife. He had been remarkably handsome in his youth.

slide-5
SLIDE 5

5

Applications

  • Create coherent text:
    – Summarize
    – Add new facts
  • Evaluate texts:
    – Essay scoring
  • Understand text pragmatics:
    – Coreference
    – Topicality

slide-6
SLIDE 6

6

Our Approach: Entities

  • Objects in the world:
  • Referred to in language:

Sir Walter Elliot / Sir Walter / He

  • Coherence: what gets mentioned, and how.
    – Other approaches: lexical, rhetorical.

slide-7
SLIDE 7

7

Overview

  • Evaluation Tasks
  • Previous: Entity Grids
  • Topics
  • Referring Expressions
  • Open Problems...
slide-8
SLIDE 8

8

Overview

  • Evaluation Tasks
  • Previous: Entity Grids
  • Topics
  • Referring Expressions
  • Open Problems...
slide-9
SLIDE 9

9

Corpora

  • Airplane
    – Reports of plane crashes
    – Short (11 sentences)
    – Stereotyped: 40% begin “This is preliminary information”
  • WSJ
    – Standard news corpus
    – Longer (25 sentences)
    – More natural syntax

slide-10
SLIDE 10

10

Discriminative task

  • Binary judgement between a random permutation and the original document.

Sentence 2, Sentence 1, Sentence 4, Sentence 3 (permuted) vs. Sentence 1, Sentence 2, Sentence 3, Sentence 4 (original)

  • Fast, convenient test.
  • Longer documents are much easier!
  • F-score (classifier can abstain).

Barzilay+Lapata '05
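The discriminative test described on this slide can be sketched roughly as follows. This is only an illustration: `coherence_score` is a hypothetical callable (higher means more coherent), a tie is treated as an abstention, and computing precision over attempted documents and recall over all documents is an assumption about how the F-score is formed.

```python
import random

def discrimination_f_score(documents, coherence_score, seed=0):
    """Compare each original document against one random permutation.

    `coherence_score` is a hypothetical scoring function; equal scores for
    the original and the permutation count as an abstention (an assumption).
    """
    rng = random.Random(seed)
    correct = attempted = total = 0
    for sentences in documents:
        if len(set(sentences)) < 2:
            continue  # no distinct permutation exists for this document
        permuted = sentences[:]
        while permuted == sentences:
            rng.shuffle(permuted)
        total += 1
        original, shuffled = coherence_score(sentences), coherence_score(permuted)
        if original == shuffled:
            continue  # the classifier abstains
        attempted += 1
        correct += original > shuffled
    precision = correct / attempted if attempted else 0.0
    recall = correct / total if total else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A scorer that never separates the two versions abstains everywhere and gets an F-score of 0 under these definitions.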

slide-11
SLIDE 11

11

Insertion task

  • Remove and re-insert one sentence at a time.
  • Examines permutations closer to the original ordering.
    – Hard even for long documents.
    – Report percent exactly correct.

[Figure: a new sentence to be inserted somewhere among four ordered sentences]

Chen+Snyder+Barzilay '07 Elsner+Charniak '07
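A minimal sketch of the insertion test, assuming a hypothetical `coherence_score` function that returns a higher value for more coherent sentence lists. Each sentence is removed in turn, tried in every slot, and counted as correct only if the best-scoring slot is its original position; breaking ties in favour of the earliest slot is an assumption.

```python
def insertion_accuracy(documents, coherence_score):
    """Fraction of sentences re-inserted at exactly their original position."""
    correct = total = 0
    for sentences in documents:
        for i, removed in enumerate(sentences):
            rest = sentences[:i] + sentences[i + 1:]
            # Try every slot and keep the best-scoring one (earliest wins ties).
            best = max(
                range(len(rest) + 1),
                key=lambda j: coherence_score(rest[:j] + [removed] + rest[j:]),
            )
            correct += best == i
            total += 1
    return correct / total if total else 0.0
```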

slide-12
SLIDE 12

12

Sentence Ordering

[Figure: a data source yields a bag of unordered sentences, which the model arranges into an ordered document (Sentence 1 to Sentence 4)]

slide-13
SLIDE 13

13

Ordering metric

  • Kendall's Tau (rank ordering distance)
  • Counts pairwise swaps
  • No concept of structure

    – Moving a paragraph vs. moving sentences
    – Good for short documents (Lapata 2006)
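As a rough illustration of the metric, Kendall's tau can be computed from the number of inverted sentence pairs as tau = 1 - 2 * inversions / (n choose 2). The function name and input convention below are assumptions made for this sketch.

```python
from itertools import combinations

def kendall_tau(proposed_order):
    """Kendall's tau between a proposed ordering and the original one.

    `proposed_order` lists the original sentence indices in the order the
    model placed them, e.g. [1, 0, 2] swaps the first two sentences.
    """
    n = len(proposed_order)
    if n < 2:
        return 1.0
    # A pair is inverted when a later original sentence is placed first.
    inversions = sum(1 for a, b in combinations(proposed_order, 2) if a > b)
    return 1 - 2 * inversions / (n * (n - 1) / 2)
```

Swapping two whole halves of a four-sentence document (`[2, 3, 0, 1]`) scores lower than swapping two adjacent sentences (`[1, 0, 2, 3]`), even though each half stays internally intact, which is the "no concept of structure" point above.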

slide-14
SLIDE 14

14

Overview

  • Evaluation Tasks
  • Previous: Entity Grids
  • Topics
  • Referring Expressions
  • Conclusion
  • Open Problems...
slide-15
SLIDE 15

15

Entity Grid

Lapata+Barzilay '05

[Grid figure: column labels are the entities Walter, Hall, Somerset, Bt.age, day, wife, history, family]

Entities in text (NPs)

slide-16
SLIDE 16

16

Entity Grid

Lapata+Barzilay '05

Entities in text (NPs)

Sentence #1: Sir Walter Elliot, of Kellynch Hall, in Somersetshire, was a man who never took up any book but the Baronetage.

[Grid figure: columns Walter, Hall, Somerset, Bt.age, day, wife, history, family; row 1 cells: S X X O]

slide-17
SLIDE 17

17

Entity Grid

Lapata+Barzilay '05

Entities in text (NPs)

Sentence #1: Sir Walter Elliot, of Kellynch Hall, in Somersetshire, was a man who never took up any book but the Baronetage.
Sentence #2: Sir Walter had improved it by adding the day he had lost his wife.
Sentence #3: There followed the history of the ancient family.

[Grid figure: columns Walter, Hall, Somerset, Bt.age, day, wife, history, family; row 1 cells: S X X O; row 2 cells: S O X O; row 3 cells: X X]
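A small sketch of how such a grid might be built and turned into transition features. The role assignments in `SENTENCE_ROLES` are an illustrative reading of the slide (the exact cell placement did not survive extraction), and the function names are made up for this example.

```python
from collections import Counter

# Illustrative role labels (S = subject, O = object, X = other); the exact
# column placement is an assumption, not taken from the original grid.
SENTENCE_ROLES = [
    {"Walter": "S", "Hall": "X", "Somerset": "X", "Baronetage": "O"},
    {"Walter": "S", "Baronetage": "O", "day": "X", "wife": "O"},
    {"history": "X", "family": "X"},
]

def build_grid(sentence_roles):
    """Rows are sentences, columns are entities; '-' marks an absent entity."""
    entities = []
    for roles in sentence_roles:
        for entity in roles:
            if entity not in entities:
                entities.append(entity)
    grid = [[roles.get(e, "-") for e in entities] for roles in sentence_roles]
    return entities, grid

def transition_distribution(grid):
    """Relative frequency of each length-2 column transition, e.g. ('S', 'S')."""
    counts = Counter()
    for upper, lower in zip(grid, grid[1:]):
        counts.update(zip(upper, lower))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}
```

The resulting distribution over transitions is the kind of feature vector a grid-based coherence model can score.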

slide-18
SLIDE 18

18

Local coherence

Very low zoom: entities in long contiguous columns. A randomly permuted document: Backwards? Move the paragraphs?

slide-19
SLIDE 19

19

Independence assumptions

  • Grid entities: independent!

[Figure: grid columns for Walter, wife, family, with cells S, S, O, X]

  • Real entities: topically related.

slide-20
SLIDE 20

20

Referring expressions

  • NPs treated as transparent:
    – "Sir Walter Elliot" and "Elliot" are both handled the same.
  • 'Same head' heuristic to fake coreference.
    – About 2/3 accurate (Poesio+Vieira).
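The 'same head' heuristic can be sketched as grouping NPs by their head word. The head finder below is a deliberately crude assumption (last word before any comma or "of"), not the method used in the talk; a real system would read the head from a parse.

```python
def head_word(noun_phrase):
    """Crude head finder (an assumption): last word before a comma or ' of '."""
    core = noun_phrase.split(",")[0].split(" of ")[0]
    return core.split()[-1].lower()

def cluster_by_head(noun_phrases):
    """Group NPs that share a head word into pseudo-coreference chains."""
    chains = {}
    for phrase in noun_phrases:
        chains.setdefault(head_word(phrase), []).append(phrase)
    return chains
```

With this sketch, "Sir Walter Elliot, of Kellynch Hall, in Somersetshire" and "Walter Elliot" share the head "elliot", while "Sir Walter" lands in a separate "walter" chain, which shows how the heuristic can split one real entity in two.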

slide-21
SLIDE 21

21

Results

Airplane                   Disc (%)   Ordering (τ)
Barzilay+Lapata (EGrid)    90         -
Generative EGrid           81         0.17

WSJ                        Disc (F)   Ins (prec)
Generative EGrid           73         18.1

  • % vs. F: roughly equivalent here
  • Good discrimination, poor ordering.
slide-22
SLIDE 22

22

Overview

  • Evaluation Tasks
  • Previous: Entity Grids
  • Topics
  • Referring Expressions
  • Open Problems...
slide-23
SLIDE 23

23

Markov Model

  • Hidden Markov Model for document structure.
  • Language model for each state.

[Figure: a chain of hidden states q_i; one state emits the sentence "the pilot received minor injuries"]

Barzilay and Lee 2004
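A toy sketch of scoring a document with such a model: each hidden state has its own language model, and the forward algorithm sums over all state sequences. Unigram language models with a fixed floor for unseen words are a simplifying assumption made here for brevity, and all parameter names are hypothetical.

```python
import math

def sentence_likelihood(sentence, unigram, unseen=1e-6):
    """P(sentence | state) under a unigram model with a floor for unseen words."""
    return math.prod(unigram.get(word, unseen) for word in sentence.split())

def document_likelihood(sentences, start, transitions, language_models):
    """Forward algorithm: sum over state sequences of P(states, sentences).

    `sentences` must be non-empty; `start[q]` and `transitions[p][q]` are
    the initial and transition probabilities of the hidden states.
    """
    states = list(start)
    alpha = {
        q: start[q] * sentence_likelihood(sentences[0], language_models[q])
        for q in states
    }
    for sentence in sentences[1:]:
        alpha = {
            q: sum(alpha[p] * transitions[p][q] for p in states)
            * sentence_likelihood(sentence, language_models[q])
            for q in states
        }
    return sum(alpha.values())
```

Because the transition probabilities encode typical topic order, a document in its natural order should score higher than the same sentences reversed.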

slide-24
SLIDE 24

24

Global Coherence

  • HMM learns overall document structure:

– Start, end, topic shift.

  • All local information stored in the state.

– Sparsity issues.

A wombat escaped from the cargo bay. Finally the wombat was captured. The last major wombat incident was in 1987.

  • Is there a state q-wombat?
slide-25
SLIDE 25

25

Unified Model

  • HMM structure:

    – States generate entities.
    – Back off to Entity Grid.
    – Also generate other words.
  • Entity Grid prior:
    – Repeat entities regardless of state.
    – (New estimator for the entity grid; mistake in original results.)

slide-26
SLIDE 26

26

Graphical Model

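[Figure: graphical model. For each sentence i, the state q_i generates known entities E_i, new entities N_i and non-entity words W_i; the states and entity sets of adjacent sentences are linked.]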

slide-27
SLIDE 27

27

Soricut + Marcu '06

  • Mixture model:
    – HMM, entity grid and word-to-word (IBM) components.
    – Results are as good as ours.
  • No joint learning.
    – No relationship between topic and grid.
  • Uses more information (ngrams and IBM).
    – Might be improved by adding our model.

slide-28
SLIDE 28

28

Results

Airplane Test              Disc (%)   Ordering (τ)
Barzilay+Lapata (EGrid)    90         -
Generative EGrid           81         0.17
Barzilay+Lee (HMM)         74         0.44
Soricut+Marcu (Mixture)    -          0.50
Unified (Egrid/HMM)        94         0.50

Airplane Corpus: short documents

slide-29
SLIDE 29

29

Overview

  • Evaluation Tasks
  • Previous: Entity Grids
  • Topics
  • Referring Expressions
  • Open Problems...
slide-30
SLIDE 30

30

Anatomy of an unfamiliar NP

Sir Walter Elliot, of Kellynch Hall, in Somersetshire, was a man who...

full name and title

  • Lots of linguistic markers to introduce this guy...
    – because you don't know who he is.

slide-31
SLIDE 31

31

Anatomy of an unfamiliar NP

Sir Walter Elliot, of Kellynch Hall, in Somersetshire, was a man who...

full name and title; long phrasal modifier

  • Lots of linguistic markers to introduce this guy...
    – because you don't know who he is.

slide-32
SLIDE 32

32

Anatomy of an unfamiliar NP

Sir Walter Elliot, of Kellynch Hall, in Somersetshire, was a man who...

full name and title; long phrasal modifier; copular verb

  • Lots of linguistic markers to introduce this guy...
    – because you don't know who he is.

slide-33
SLIDE 33

33

Terminology

  • First mention of entity: discourse-new

    – Usually unfamiliar: hearer-new
    – Hearer-new NPs typically marked.
  • Subsequent: given, discourse-old
  • Discourse-new isn't always hearer-new.
    – Unique entities (the FBI)

Prince '81

slide-34
SLIDE 34

34

Lots of features!

  • Appositives: Mr. Shepherd, a civil, cautious lawyer...
  • Restrictive relative clauses: the first man to...
  • Syntactic position: subject, object &c
  • Determiner / quantifier: a (new), the (complicated!)
  • Titles and abbreviated titles:
    – Sir, Professor (usually new); Prof., Inc. (usually old)

  • How many modifiers?: More implies newer.
  • Most important feature: same head occurred before?

Vieira+Poesio '00 Ng+Cardie '02 Uryupina '03 ...

slide-35
SLIDE 35

35

Previous work (classifiers)

  • Used for coreference resolution:
    – Don't resolve the new NPs.
    – Do resolve the old ones.
  • Almost any machine learning algorithm available...
  • All score about 85%.
  • (Relies on document being in order.)

Joint decisions: Denis+Baldridge '07
Sequential: Poesio+al '05, Ng+Cardie '02

slide-36
SLIDE 36

36

Modeling coherence

Sir Walter Elliot, of Kellynch Hall, in Somersetshire Walter Elliot Sir Walter Sir Walter Elliot Sir Walter he his himself

vs

Sir Walter Elliot, of Kellynch Hall, in Somersetshire Walter Elliot Sir Walter Sir Walter Elliot Sir Walter he his himself

slide-37
SLIDE 37

37

Now some computation...

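$$P(\text{doc}) = \prod_{\text{chain} \in \text{doc}} P(\text{chain})$$

$$P(\text{chain}) = \prod_{\text{np} \in \text{chain}} P(\text{np})$$

Where do the labels come from? Full coreference!

Factors for the Sir Walter chain:

  • P(new | Sir Walter Elliot, of Kellynch Hall, in Somersetshire)
  • P(old | he)
  • P(old | his)
  • P(old | Walter Elliot)
  • P(old | Sir Walter)
  • P(old | himself)
  • P(old | Sir Walter)
  • P(old | Sir Walter Elliot)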
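The product over chains and NPs on this slide can be sketched in log space as follows. The first NP of each chain is labelled new and the rest old; `toy_label_probability` is a made-up stand-in for a trained discourse-new classifier, with invented numbers.

```python
import math

def toy_label_probability(label, noun_phrase):
    """Made-up classifier: long or comma-modified NPs look discourse-new."""
    looks_new = "," in noun_phrase or len(noun_phrase.split()) > 2
    p_new = 0.9 if looks_new else 0.2
    return p_new if label == "new" else 1.0 - p_new

def document_log_probability(chains, label_probability=toy_label_probability):
    """log P(doc): sum over chains, then over NPs, of log P(label | NP)."""
    total = 0.0
    for chain in chains:
        for position, noun_phrase in enumerate(chain):
            label = "new" if position == 0 else "old"
            total += math.log(label_probability(label, noun_phrase))
    return total
```

Under these toy numbers, a chain that opens with the long introductory NP scores higher than the same mentions with a bare pronoun first.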

slide-38
SLIDE 38

38

More realistic computation...

One coreferential chain turns into two. (Bad, but survivable.) And what about the pronouns? We'll come back to them later.

Chain 1: P(new | Sir Walter Elliot, of Kellynch Hall, in Somersetshire) × P(old | Walter Elliot) × P(old | Sir Walter Elliot)

Chain 2: P(new | Sir Walter) × P(old | Sir Walter)

Pronouns: P(old | he), P(old | his), P(old | himself)

slide-39
SLIDE 39

39

What else can go wrong?

  • Not all new NPs are unfamiliar.
    – Unique referents: The FBI, the Golden Gate Bridge, Thursday
    – Our technique will mislabel these.
  • We can reduce error by distinguishing three classes: new, old, singleton
    – singleton: no subsequent coreferent NPs
    – often look more like old than new

corpus study: Fraurud '90
classifiers: Bean+Riloff '91, Uryupina '03

slide-40
SLIDE 40

40

Results

WSJ corpus: longer documents

WSJ                disc (F)   ins (prec)
Entity Grid        73.2       18.1
NP syntax          72.7       16.7
Grid, NP syntax    77.6       21.5

slide-41
SLIDE 41

41

Pronoun coreference

  • Pronouns occur close after their antecedent nouns.

Marlow sat cross-legged right aft, leaning against the mizzen-mast. He had sunken cheeks, a yellow complexion, a straight back, an ascetic aspect, and... resembled an idol. The director, satisfied the anchor had good hold, made his way aft and sat down amongst us. We exchanged a few words lazily. Afterwards there was silence on board the yacht. For some reason or other we did not begin that game of dominoes. We felt meditative, and fit for nothing but placid staring. The day was ending in a serenity of still and exquisite brilliance.

No possible antecedents here!

1 2 3 4

slide-42
SLIDE 42

42

Violations cause incoherence

Marlow sat cross-legged right aft, leaning against the mizzen-mast. The director, satisfied the anchor had good hold, made his way aft and sat down amongst us. We exchanged a few words lazily. Afterwards there was silence on board the yacht. For some reason or other we did not begin that game of dominoes. We felt meditative, and fit for nothing but placid staring. The day was ending in a serenity of still and exquisite brilliance. He had sunken cheeks, a yellow complexion, a straight back, an ascetic aspect, and... resembled an idol.

No possible antecedents here!

1 3 4 2

slide-43
SLIDE 43

43

What sort of a model?

  • Typical coreference models are conditional: P(antecedent | text)

Marlow sat ... He had sunken cheeks...

P(Marlow | he) = .99

  • Probability of linking the pronoun to each available referent.
  • High for unambiguous texts...
slide-44
SLIDE 44

44

What sort of a model?

  • Typical coreference models are conditional: P(antecedent | text)

Marlow sat ... He had sunken cheeks...

P(Marlow | he) = .99 (still!)

We exchanged a few words lazily. There was silence on board the yacht.

P(words | he) ≈ 0
P(yacht | he) ≈ 0
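One way to see why a conditional model stays confident is that its probabilities are normalized over the available candidates. The sketch below uses a softmax over hypothetical compatibility scores (all numbers invented for illustration): even when the score for Marlow drops because he is far away, the posterior remains near 1 as long as the competitors are implausible.

```python
import math

def antecedent_posterior(scores):
    """Softmax-normalize compatibility scores into P(antecedent | pronoun, text)."""
    peak = max(scores.values())
    # Subtract the peak before exponentiating for numerical stability.
    weights = {cand: math.exp(s - peak) for cand, s in scores.items()}
    normalizer = sum(weights.values())
    return {cand: w / normalizer for cand, w in weights.items()}

# Hypothetical scores: "he" right after "Marlow" vs. several sentences later.
NEARBY = {"Marlow": 5.0, "words": -5.0, "yacht": -5.0}
DISTANT = {"Marlow": 2.0, "words": -5.0, "yacht": -5.0}
```

Both `antecedent_posterior(NEARBY)["Marlow"]` and `antecedent_posterior(DISTANT)["Marlow"]` come out above 0.99, so the conditional probability alone cannot tell the coherent ordering from the incoherent one.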

slide-45
SLIDE 45

45

Generative coreference

  • Not only tell good coreference assignments from bad ones...
  • But good texts from bad ones.
    – So we need P(text | antecedent)
  • Luckily we can do that (sort of)...
    – Ge+Hale+Charniak '98
    – Charniak+Elsner '09 (talk Thursday!)

slide-46
SLIDE 46

46

Results

  • Improvements continue...
    – On its own, this model is not as strong as the syntactic one.

WSJ            disc (F)   ins (prec)
Entity Grid    73.2       18.1
Pronoun        63.1       13.9
EG, NP         77.6       21.5
EG, NP, Prn    78.2       22.7

slide-47
SLIDE 47

47

Overview

  • Evaluation Tasks
  • Previous: Entity Grids
  • Topics
  • Referring Expressions
  • Open Problems...
slide-48
SLIDE 48

48

Topic model revisited

  • Longer documents, larger domains:
    – Still use one state per topic?
    – How fine-grained are topics?
    – How predictable are transitions?
  • Ordering task:
    – Doesn't scale!
    – Hierarchical ordering?
    – How do we score?

slide-49
SLIDE 49

49

Referring expressions

  • Full coreference?
    – Generative models now exist (Haghighi+Klein '07)
  • Reference rewriting:
    – Related task.
    – Nenkova+McKeown '03; now a shared task: Belz+Gatt, Generation Challenge '08, '09
  • Inferrables:
    – Mention of one entity (Sir Walter) allows definite description for another (the family)

slide-50
SLIDE 50

50

Ordering paradigm

  • Coherence: not just ordering!
  • Relationship to general readability?

    – Sentence structure
    – Word choice
  • Realistic source of incoherence...
    – Better than permuted documents.

slide-51
SLIDE 51

51

Conclusion

  • Started with Entity Grid (prev. work)
  • Added:

    – Topic (HMM)
    – Disc-new NP detection
    – Pronoun coreference

  • Much still to do...
slide-52
SLIDE 52

52

Other research: chat

Does anyone here shave their head?
I shave part of my head.
A tonsure?
Nope, I only shave the chin.
How do I limit the speed of my internet connection?
Use dialup!

How do you cluster the different conversations?

Elsner+Charniak ACL '08

slide-53
SLIDE 53

53

Other research: named entities

President Bill Clinton / President Clinton / Bill / Mr. Clinton

Secretary Hillary Clinton / Hillary Rodham Clinton / Hillary Clinton / Ms. Clinton

Clinton

Unsupervised learning: Which references go together? What is their structure?

Elsner+Charniak+Johnson, NAACL '09

slide-54
SLIDE 54

54

Other research: copyediting

  • How does editing change a document?
  • Do our models have the same preferences as editors?

California voters {will get a chance in a November vote | have the chance to vote in November} on whether to end gay marriage...
slide-55
SLIDE 55

55

Thanks!

  • Regina Barzilay, Erdong Chen
  • Mirella Lapata
  • Olga Uryupina
  • all of BLLIP (and Tom Griffiths)
  • DARPA GALE, Karen T. Romer Foundation
  • Everyone here!

Code is available: http://www.cs.brown.edu/people/melsner