Natural Language Understanding Lecture 17: Entity-based Coherence

SLIDE 1

Introduction The Entity Grid Evaluation

Natural Language Understanding
Lecture 17: Entity-based Coherence

Mirella Lapata
School of Informatics
University of Edinburgh
mlap@inf.ed.ac.uk

March 24, 2015

Mirella Lapata Natural Language Understanding 1

SLIDE 2

1 Introduction
2 The Entity Grid
  • Discourse Representation
  • Entity Transitions
  • Ranking Model
3 Evaluation
  • Text Ordering
  • Summarization

Reading: Barzilay and Lapata (2008).

SLIDE 3

Coherence in Text

Coherence:
  • is a property of well-written texts;
  • makes them easier to read and understand;
  • ensures that sentences are meaningfully related and that the reader can work out what expressions mean;
  • means the text is thematically and temporally organized, rather than a random concatenation of sentences.

In this lecture, we will discuss Barzilay and Lapata’s (2008) entity-based model of coherence.

SLIDE 4

Coherence in Text

Summary A
Britain said he did not have diplomatic immunity. The Spanish authorities contend that Pinochet may have committed crimes against Spanish citizens in Chile. Baltasar Garzon filed a request on Wednesday. Chile said, President Fidel Castro said Sunday he disagreed with the arrest in London.

Summary B
Former Chilean dictator Augusto Pinochet, was arrested in London on 14 October 1998. Pinochet, 82, was recovering from surgery. The arrest was in response to an extradition warrant served by a Spanish judge. Pinochet was charged with murdering thousands, including many Spaniards. Pinochet is awaiting a hearing, his fate in the balance. American scholars applauded the arrest.

SLIDE 5

Entity-based Coherence

  • The way entities are introduced and discussed influences coherence (Grosz et al., 1995).
  • Entities in an utterance are ranked according to salience:
      • Is an entity pronominalized or not?
      • Is an entity in a prominent syntactic position?
  • Each utterance has one center (≈ topic or focus).
  • Coherent discourses have utterances with common centers.
  • Entity transitions capture degrees of coherence (e.g., in Centering theory continue > shift).
  • Notions of salience, utterance, ranking are left unspecified.

SLIDE 6

Entity-based Local Coherence

John went to his favorite music store to buy a piano. He had frequented the store for many years. He was excited that he could finally buy a piano. He arrived just as the store was closing for the day.

John went to his favorite music store to buy a piano. It was a store John had frequented for many years. He was excited that he could finally buy a piano. It was closing just as John arrived.

SLIDE 8

The Entity Grid

  • Can we compute a discourse representation automatically?
  • Does it capture coherence characteristics?
  • What linguistic information matters for coherence?
  • Is it robust across domains and genres?
  • What is an appropriate coherence model?
  • View coherence rating as a machine learning problem.
  • Learn a ranking function without manual involvement.
  • Apply to text-to-text generation tasks.

Inspired by entity-based theories, not a direct implementation of any theory in particular.

SLIDE 9

The Entity Grid

1 Former Chilean dictator Augusto Pinochet, was arrested in London on 14 October 1998.
2 Pinochet, 82, was recovering from surgery.
3 The arrest was in response to an extradition warrant served by a Spanish judge.
4 Pinochet was charged with murdering thousands, including many Spaniards.
5 He is awaiting a hearing, his fate in the balance.
6 American scholars applauded the arrest.

SLIDE 10

The Entity Grid

1 [Former Chilean dictator Augusto Pinochet], was arrested in [London] on [14 October] 1998.
2 [Pinochet], 82, was recovering from [surgery].
3 [The arrest] was in [response] to [an extradition warrant] served by [a Spanish judge].
4 [Pinochet] was charged with murdering [thousands], including [many Spaniards].
5 [He] is awaiting [a hearing], [Pinochet’s fate] in [the balance].
6 [American scholars] applauded the [arrest].

SLIDE 11

The Entity Grid

1 [Former Chilean dictator Augusto Pinochet]S, was arrested in [London]X on [14 October]X 1998.
2 [Pinochet]S, 82, was recovering from [surgery]X.
3 [The arrest]S was in [response]X to [an extradition warrant]X served by [a Spanish judge]S.
4 [Pinochet]O was charged with murdering [thousands]O, including [many Spaniards]O.
5 [Pinochet]S is awaiting [a hearing]O, [his fate]X in [the balance]X.
6 [American scholars]S applauded the [arrest]O.

SLIDE 12

The Entity Grid

1 PinochetS LondonX OctoberX
2 PinochetS surgeryX
3 arrestS responseX warrantX judgeO
4 PinochetO thousandsO SpaniardsO
5 PinochetS hearingO PinochetX fateX balanceX
6 scholarsS arrestO

SLIDE 15

The Entity Grid

   Pinochet  London  October  Surgery  Arrest  Extradition  Warrant  Judge  Thousands  Spaniards  Hearing  Fate  Balance  Scholars
1  S         X       X        –        –       –            –        –      –          –          –        –     –        –
2  S         –       –        X        –       –            –        –      –          –          –        –     –        –
3  –         –       –        –        S       X            X        O      –          –          –        –     –        –
4  O         –       –        –        –       –            –        –      O          O          –        –     –        –
5  S         –       –        –        –       –            –        –      –          –          O        X     X        –
6  –         –       –        –        O       –            –        –      –          –          –        –     –        S
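As a rough illustration of how such a grid can be assembled, the sketch below builds one from per-sentence lists of (entity, role) pairs. The function name build_entity_grid, the ASCII "-" standing in for "–", and the rule that a repeated entity keeps its highest-ranked role (S over O over X) are assumptions made for this example. The input covers only sentences 1, 2 and 5 of the Pinochet text.

```python
# Assumed priority when one entity occurs several times in a sentence.
ROLE_RANK = {"S": 3, "O": 2, "X": 1}


def build_entity_grid(sentences):
    """Return (entities, grid) for a list of sentences.

    Each sentence is a list of (entity, role) pairs, role in {"S", "O", "X"}.
    grid[i][j] is the role of entities[j] in sentence i, or "-" if absent.
    """
    entities = []
    for sentence in sentences:
        for entity, _role in sentence:
            if entity not in entities:
                entities.append(entity)  # columns follow first-mention order

    grid = []
    for sentence in sentences:
        row = {entity: "-" for entity in entities}
        for entity, role in sentence:
            current = row[entity]
            if current == "-" or ROLE_RANK[role] > ROLE_RANK[current]:
                row[entity] = role
        grid.append([row[entity] for entity in entities])
    return entities, grid


# Sentences 1, 2 and 5, written as (entity, role) pairs.
example = [
    [("Pinochet", "S"), ("London", "X"), ("October", "X")],
    [("Pinochet", "S"), ("surgery", "X")],
    [("Pinochet", "S"), ("hearing", "O"), ("Pinochet", "X"),
     ("fate", "X"), ("balance", "X")],
]

entities, grid = build_entity_grid(example)
print(" ".join(entities))
for row in grid:
    print(" ".join(row))
```

In the last sentence Pinochet occurs both as S and as X; under the assumed priority the cell keeps S.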

SLIDE 16

The Entity Grid

S X X – – – – – – – – – – –
S – – X – – – – – – – – – –
– – – – S X X O – – – – – –
O – – – – – – – O O – – – –
S – – – – – – – – – O X X –
– – – – O – – – – – – – – S

SLIDE 17

The Entity Grid

[Figure: the entity grid from the previous slide shown next to a second matrix of S/O/X/– cells; the layout of the second matrix is not recoverable.]

SLIDE 19

Entity Transitions

Definition
A local entity transition is a sequence $\{S, O, X, \text{–}\}^n$ that represents entity occurrences and their syntactic roles in $n$ adjacent sentences.

Feature Vector Notation
Each grid $x_{ij}$ for document $d_i$ is represented by a feature vector:

$\Phi(x_{ij}) = (p_1(x_{ij}), p_2(x_{ij}), \ldots, p_m(x_{ij}))$

  • $m$ is the number of predefined entity transitions
  • $p_t(x_{ij})$ is the probability of transition $t$ in grid $x_{ij}$
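The feature vector can be illustrated with a short sketch. It assumes that the probability of a transition is its count divided by the total number of length-n transitions in the grid, read column by column. The name transition_features and the ASCII "-" standing in for "–" are choices made for this example.

```python
from collections import Counter
from itertools import product

ROLES = ("S", "O", "X", "-")  # "-" stands in for the grid's absent symbol


def transition_features(grid, n=2):
    """Map every length-n transition to its relative frequency in the grid.

    grid is a list of rows (one per sentence); each row lists one role per entity.
    """
    counts = Counter()
    for column in zip(*grid):  # one column per entity
        for start in range(len(column) - n + 1):
            counts[column[start:start + n]] += 1
    total = sum(counts.values())
    return {t: (counts[t] / total if total else 0.0)
            for t in product(ROLES, repeat=n)}


# The six-sentence, fourteen-entity grid of the Pinochet text.
PINOCHET_GRID = [row.split() for row in (
    "S X X - - - - - - - - - - -",
    "S - - X - - - - - - - - - -",
    "- - - - S X X O - - - - - -",
    "O - - - - - - - O O - - - -",
    "S - - - - - - - - - O X X -",
    "- - - - O - - - - - - - - S",
)]

features = transition_features(PINOCHET_GRID)
print(features[("S", "S")])  # share of [S S] among all length-2 transitions
```

With 14 entity columns and 6 sentences there are 14 · 5 = 70 transitions of length 2, and [S S] occurs once (Pinochet in sentences 1 and 2), so its feature value is 1/70.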

SLIDE 20

Entity Transitions

Example (transitions of length 2)
Transitions: S S, S O, S X, S –, O S, O O, O X, O –, X S, X O, X X, X –, – S, – O, – X, – –
d1: 0 0 .03 0 0 .02 .07 0 .12 .02 .02 .05 .25
d2: 0 0 .02 0 .07 0 .02 0 .06 .04 .36
d3: .02 0 0 .03 0 0 .06 .05 .03 .07 .07 .29

SLIDE 22

Linguistic Dimensions

Salience: Are some entities more important than others?
  • Discriminate between salient (frequent) entities and the rest.
  • Collect statistics separately for each group.

Coreference: What is its contribution?
  • Entities are coreferent if they have the same surface form.
  • Coreference resolution systems (see last lecture).

Syntax: Does syntactic knowledge matter?
  • Use four categories {S, O, X, –}.
  • Reduce categories to {X, –}.
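Two of these dimensions can be sketched directly on a grid. The helper names reduce_syntax and split_by_salience, the min_mentions threshold of 2, and the ASCII "-" standing in for "–" are assumptions made for illustration.

```python
def reduce_syntax(grid):
    """Collapse S, O and X into one 'present' symbol X; keep '-' for absent."""
    return [["-" if cell == "-" else "X" for cell in row] for row in grid]


def split_by_salience(grid, min_mentions=2):
    """Split the grid's columns into (salient, other) sub-grids.

    An entity counts as salient when it is mentioned in at least
    min_mentions sentences (an assumed threshold).
    """
    salient_cols, other_cols = [], []
    for index, column in enumerate(zip(*grid)):
        mentions = sum(cell != "-" for cell in column)
        target = salient_cols if mentions >= min_mentions else other_cols
        target.append(index)

    def pick(cols):
        return [[row[c] for c in cols] for row in grid]

    return pick(salient_cols), pick(other_cols)


# Toy grid: three sentences (rows), three entities (columns).
toy_grid = [
    ["S", "X", "-"],
    ["S", "-", "O"],
    ["O", "-", "-"],
]

reduced = reduce_syntax(toy_grid)
salient, other = split_by_salience(toy_grid)
print(reduced)
print(salient, other)
```

Transition statistics would then be collected separately on the salient and the remaining sub-grid.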

SLIDE 23

Learning a Ranking Function

Training Set
Ordered pairs $(x_{ij}, x_{ik})$, where $x_{ij}$ and $x_{ik}$ represent the same document $d_i$, and $x_{ij}$ is more coherent than $x_{ik}$ (assume $j > k$).

Goal
Find a parameter vector $w$ such that:

$w \cdot (\Phi(x_{ij}) - \Phi(x_{ik})) > 0 \quad \forall j, i, k \text{ such that } j > k$

Support Vector Machines
The constrained optimization problem can be solved using the search technique described in Joachims (2002).
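To make the ranking constraint concrete, the sketch below swaps in a simple perceptron-style update on feature differences instead of the SVM optimization named above. It is an illustrative stand-in, not the method used in the lecture, and the function names and toy feature vectors are invented for the example.

```python
def dot(w, features):
    """Inner product w · Φ(x) for plain Python lists."""
    return sum(wi * fi for wi, fi in zip(w, features))


def train_pairwise_ranker(pairs, epochs=100, learning_rate=1.0):
    """Find w with w · (Φ(better) − Φ(worse)) > 0 for each training pair.

    pairs is a list of (better_features, worse_features). Whenever a pair
    violates the constraint, w is moved towards the difference vector.
    """
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        updated = False
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            if dot(w, diff) <= 0:
                w = [wi + learning_rate * di for wi, di in zip(w, diff)]
                updated = True
        if not updated:  # every constraint is satisfied
            break
    return w


# Toy two-dimensional feature vectors (hypothetical transition probabilities).
toy_pairs = [
    ([0.3, 0.1], [0.1, 0.3]),
    ([0.4, 0.0], [0.2, 0.2]),
]
w = train_pairwise_ranker(toy_pairs)
print(all(dot(w, better) > dot(w, worse) for better, worse in toy_pairs))
```

The learned w scores any new grid through dot(w, Φ(x)); the higher-scoring member of a pair is predicted to be the more coherent one.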

SLIDE 24

Text Ordering

Motivation
  • Determine a sequence in which to present a set of items.
  • Information ordering used to evaluate text structuring algorithms.
  • Essential step in generation applications.

Data
  • Source document and permutations of its sentences.
  • Original order assumed coherent.
  • Given k documents, with n permutations, obtain k · n pairwise rankings for training and testing.
  • Two corpora, Earthquakes and Accidents, 100 texts each.
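The construction of the pairwise data can be sketched as follows. The name make_ordering_pairs, the fixed random seed, and the max_tries guard for very short documents are assumptions of this example.

```python
import random


def make_ordering_pairs(documents, n_permutations, seed=0, max_tries=1000):
    """Pair each document's original order with distinct random permutations.

    documents is a list of sentence lists. Each returned pair is
    (original_order, permuted_order), the original being assumed more coherent.
    """
    rng = random.Random(seed)
    pairs = []
    for sentences in documents:
        original = tuple(sentences)
        seen = set()
        tries = 0
        while len(seen) < n_permutations and tries < max_tries:
            tries += 1
            shuffled = list(original)
            rng.shuffle(shuffled)
            candidate = tuple(shuffled)
            if candidate != original:  # skip the identity permutation
                seen.add(candidate)
        pairs.extend((original, permutation) for permutation in sorted(seen))
    return pairs


docs = [
    ["Sentence 1", "Sentence 2", "Sentence 3", "Sentence 4"],
    ["Sentence A", "Sentence B", "Sentence C", "Sentence D"],
]
pairs = make_ordering_pairs(docs, n_permutations=3)
print(len(pairs))  # k documents times n permutations
```

With k = 2 documents and n = 3 permutations each, the sketch yields k · n = 6 pairs.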

SLIDE 26

Text Ordering

Sentence 1  Sentence 2  Sentence 3  Sentence 4
Sentence 2  Sentence 3  Sentence 4  Sentence 1
Sentence 4  Sentence 3  Sentence 2  Sentence 1
Sentence 2  Sentence 1  Sentence 4  Sentence 3

SLIDE 27

Comparison with State of the Art

Vector-based Model (LSA, Foltz et al., 1998):
  • Meaning of individual words is represented in vector space.
  • Sentence meaning is the mean of the vectors of its words.
  • Coherence is the average distance between adjacent sentences.
  • Unsupervised, local, unlexicalized, domain independent.
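A minimal sketch of this baseline, assuming cosine similarity as the measure between adjacent sentence vectors and using hand-made two-dimensional word vectors in place of real LSA vectors. All names and values here are hypothetical.

```python
import math


def sentence_vector(words, word_vectors):
    """Mean of the vectors of the words that have a vector; None if none do."""
    vectors = [word_vectors[w] for w in words if w in word_vectors]
    if not vectors:
        return None
    return [sum(v[d] for v in vectors) / len(vectors)
            for d in range(len(vectors[0]))]


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector has zero length."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def local_coherence(sentences, word_vectors):
    """Average cosine similarity between adjacent sentence vectors."""
    vectors = [sentence_vector(s, word_vectors) for s in sentences]
    sims = [cosine(u, v) for u, v in zip(vectors, vectors[1:])
            if u is not None and v is not None]
    return sum(sims) / len(sims) if sims else 0.0


# Hand-made toy vectors, not real LSA output.
toy_vectors = {"quake": [1.0, 0.0], "magnitude": [1.0, 0.0], "piano": [0.0, 1.0]}

related = local_coherence([["quake"], ["magnitude"]], toy_vectors)
unrelated = local_coherence([["quake"], ["piano"]], toy_vectors)
print(related > unrelated)
```

A text whose adjacent sentences share vocabulary directions scores higher than one that jumps between unrelated topics.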

SLIDE 28

Comparison with State of the Art

[Figure: two plots with axes x and y, each showing sentence vectors S1 to S5.]

SLIDE 29

Comparison with State of the Art

HMM-based Content Models (Barzilay and Lee, 2004):
  • Model topics and their order in texts.
  • Model is an HMM: states correspond to topics (≈ sentences).
  • Model selects sentence order with highest probability.
  • Supervised, global, lexicalized, domain dependent.

SLIDE 30

Comparison with State of the Art

[Figure: content-model HMM with topic states Location, History, Strength, Casualties, Rescue; example sentences: "the quake was near San Jose", "its magnitude was ..."]

SLIDE 31

The Entity Grid

SLIDE 32

Results

Model                           Earthquakes  Accidents
Coreference+Syntax+Salience+    87.2         90.4
Coreference+Syntax+Salience−    88.3         90.1
Coreference+Syntax−Salience+    86.6         88.4∗∗
Coreference−Syntax+Salience+    83.0∗∗       89.9
Coreference+Syntax−Salience−    86.1         89.2
Coreference−Syntax+Salience−    82.3∗∗       88.6∗
Coreference−Syntax−Salience+    83.0∗∗       86.5∗∗
Coreference−Syntax−Salience−    81.4∗∗       86.0∗∗
HMM-based Content Models        88.0         75.8∗∗
Latent Semantic Analysis        81.0∗∗       87.3∗∗

Evaluation metric: % correct ranks in test set.
∗∗: sig. different from Coreference+Syntax+Salience+
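The evaluation metric can be read as pairwise accuracy: the percentage of test pairs in which the model scores the more coherent member higher. A small sketch under that reading follows; the function name and the toy scores are hypothetical and unrelated to the figures in the table.

```python
def pairwise_accuracy(score, test_pairs):
    """Percentage of (better, worse) pairs that the scorer ranks correctly."""
    if not test_pairs:
        raise ValueError("test_pairs must not be empty")
    correct = sum(1 for better, worse in test_pairs
                  if score(better) > score(worse))
    return 100.0 * correct / len(test_pairs)


# Toy data: each item is already a model score, so the scorer is the identity.
toy_test_pairs = [(0.9, 0.2), (0.4, 0.6), (0.7, 0.1), (0.8, 0.3)]
accuracy = pairwise_accuracy(lambda item: item, toy_test_pairs)
print(accuracy)  # three of the four pairs are ranked correctly
```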

SLIDE 37

Results

  • Omission of coreference causes performance drop.
  • Syntax and Salience have more effect on Accidents corpus.
  • Linguistically poor model generally worse.
  • Entity model is better than LSA.
  • HMM-based content models exhibit high variability.
  • Models seem to be complementary.

SLIDE 38

Summarization

Motivation
  • Summaries naturally exhibit coherence violations.
  • Compare model against rankings elicited by human judges.
  • Useful for automatic evaluation of machine-generated text.

Data
  • Outputs of 5 multi-document summarization systems and corresponding human-authored summaries (DUC 2003).
  • Participants assign readability score on a seven-point scale.
  • 144 summaries, 177 participants (23 per summary).

SLIDE 39

Results

Model                           Accuracy
Coreference+Syntax+Salience+    80.0
Coreference+Syntax+Salience−    75.0
Coreference+Syntax−Salience+    78.8
Coreference−Syntax+Salience+    83.8
Coreference+Syntax−Salience−    71.3∗
Coreference−Syntax+Salience−    78.8
Coreference−Syntax−Salience+    77.5
Coreference−Syntax−Salience−    73.8∗
Latent Semantic Analysis        52.5∗∗

Evaluation metric: % correct ranks in test set.
∗∗: sig. different from Coreference+Syntax+Salience+

SLIDE 44

Results

  • Coreference decreases accuracy (machine generated texts).
  • Salience seems to have more of an impact here.
  • Linguistically poor model is generally worse.
  • Entity model performs better than LSA.
  • LSA is unsupervised and exposed only to human texts.
  • Training corpus is unsuitable for HMM-based content models.

SLIDE 45

Summary

Strengths:
  • Novel framework for representing and measuring coherence.
  • Entity grid and cross-sentential transitions.
  • Suited for learning an appropriate ranking function.
  • Fully automatic and robust, useful for system development.

Weaknesses:
  • Entity grid doesn’t contain lexical information.
  • Doesn’t contain a notion of global coherence.
  • Can’t model multi-paragraph text.
