Modelling Compression with Discourse Constraints: slides by James Clarke and Mirella Lapata

SLIDE 1

Introduction

Modelling Compression with Discourse Constraints

James Clarke and Mirella Lapata

School of Informatics University of Edinburgh

EMNLP 2007, Prague

James Clarke and Mirella Lapata 1

SLIDE 2

Introduction

Outline

1. Sentence Compression
     • Definition and Overview
     • Compression beyond Sentences

2. Compression Model
     • ILP framework
     • Constraints

3. Experiments
     • Evaluation
     • Results

SLIDE 5

Sentence Compression Definition and Overview

What is Sentence Compression?

The task

To produce a summary of a single sentence by:
  • using fewer words than the original
  • preserving the most important information
  • remaining grammatical

Simplification: given an input sentence of words W = w1, w2, ..., wn, a compression is formed by dropping any subset of these words (Knight and Marcu 2002).
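The simplification treats a compression as any subsequence of the input words. As a minimal sketch (the function name `all_compressions` and the toy sentence are illustrative, not taken from the slides), the candidates can be enumerated with one binary keep/drop mask per candidate:

```python
from itertools import product

def all_compressions(words):
    """Yield every compression of `words`, preserving word order.

    Each candidate corresponds to a binary mask: 1 keeps a word, 0 drops it.
    """
    for mask in product([0, 1], repeat=len(words)):
        yield [w for w, keep in zip(words, mask) if keep]

# Toy input (shortened from the example discourse for illustration).
sentence = "The Italian Army detonated dynamite".split()
candidates = list(all_compressions(sentence))
print(len(candidates))  # 2**5 = 32 candidates, including the empty one
```

The candidate set grows as 2^n, which is one reason to score and constrain candidates rather than list them.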

SLIDE 7

Sentence Compression Definition and Overview

Why Sentence Compression?

Applications

  • concise summary generation (Jing 2000, Lin 2003)
  • subtitle generation for TV programmes (Vandeghinste et al. 2004)
  • document display on small screens (Corston-Oliver 2001)
  • audio scanning devices for the blind (Grefenstette 1998)

Paradox: applications act on whole documents, but compression by definition operates on isolated sentences.

SLIDE 9

Sentence Compression Definition and Overview

Previous Work

Sentence-based models

Most use a parallel corpus with features defined over:
  • words (Hori and Furui 2004)
  • parse trees (Knight and Marcu 2000, Jing 2000, Riezler et al. 2003, McDonald 2006, Galley and McKeown 2007)
  • semantic concepts (Jing 2000)

Caveat: context influences what information is important; the resulting compressed document should be coherent.

SLIDE 10

Sentence Compression Definition and Overview

This Work

We aim to:
  • build a compression model that is contextually aware
  • apply this model to entire documents

We need to:
  • represent the flow of discourse in text
  • process documents automatically and robustly

We focus on:
  • representations of local coherence
  • prerequisite for global coherence
  • amenable to shallow processing

SLIDE 13

Sentence Compression Compression beyond Sentences

Discourse Representation

Centering Theory (Grosz et al. 1995)

  • Entity-orientated theory of local coherence
  • Entities in an utterance are ranked according to salience
  • Each utterance has one center (≈ topic or focus)
  • Coherent discourses have utterances with common centers

Lexical Chains (Halliday and Hasan 1976)

  • Representation of lexical cohesion
  • Degree of semantic relatedness among words in a document
  • Dense and long chains signal the main topic of the document
  • Coherent texts have more related words than incoherent ones

SLIDE 14

Sentence Compression Compression beyond Sentences

Example Discourse

  • 1. Bad weather dashed hopes of attempts to halt the flow during what was seen as a lull in the lava’s momentum.
  • 2. Some experts say that even if the eruption stopped today, the pressure of lava piled up behind for six miles would bring debris cascading down on to the town anyway.
  • 3. Some estimate the volcano is pouring out one million tons of debris a day, at a rate of 15ft per second, from a fissure that opened in mid-December.
  • 4. The Italian Army yesterday detonated 400lb of dynamite 3,500 feet up Mount Etna’s slopes.

SLIDE 19

Sentence Compression Compression beyond Sentences

Centering Algorithm

  • 1. Bad weather dashed hopes of attempts to halt the flow during what was seen as a lull in the lava’s momentum.
  • 2. Some experts say that even if the eruption stopped today, the pressure of lava piled up behind for six miles would bring debris cascading down on to the town anyway.

1. Extract entities from U2.
2. Rank the entities in U2 according to their grammatical role (subject > objects > others).
3. Find highest ranked entity in U1 which occurs in U2. Set entity to be center of U2.
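The three steps can be sketched in a few lines of Python. This is only an illustration: the entity lists and grammatical roles below are hand-assigned assumptions (the slides do not give them), and the names `ROLE_RANK`, `rank_entities` and `find_center` are hypothetical.

```python
# Salience ranking from the slide: subject > objects > others.
ROLE_RANK = {"subject": 0, "object": 1, "other": 2}

def rank_entities(entities):
    """Sort (entity, role) pairs by grammatical role, most salient first."""
    return sorted(entities, key=lambda pair: ROLE_RANK[pair[1]])

def find_center(prev_entities, cur_entities):
    """Return the highest-ranked entity of the previous utterance that also
    occurs in the current one, or None if the utterances share no entity."""
    current = {entity for entity, _ in cur_entities}
    for entity, _ in rank_entities(prev_entities):
        if entity in current:
            return entity
    return None

# Hand-assigned entities and roles (assumptions for illustration only).
u1 = [("weather", "subject"), ("hopes", "object"), ("flow", "other"),
      ("lava", "other")]
u2 = [("experts", "subject"), ("eruption", "other"), ("pressure", "other"),
      ("lava", "other"), ("debris", "object"), ("town", "other")]

print(find_center(u1, u2))  # lava
```

Under these assumed annotations, "lava" is the only entity the two utterances share, so it becomes the center of U2.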

SLIDE 20

Sentence Compression Compression beyond Sentences

Annotated Discourse

  • 1. Bad weather dashed hopes of attempts to halt the flow during what was seen as a lull in the lava’s momentum.
  • 2. Some experts say that even if the eruption stopped today, the pressure of lava piled up behind for six miles would bring debris cascading down on to the town anyway.
  • 3. Some estimate the volcano is pouring out one million tons of debris a day, at a rate of 15ft per second, from a fissure that opened in mid-December.
  • 4. The Italian Army yesterday detonated 400lb of dynamite 3,500 feet up Mount Etna’s slopes.

SLIDE 23

Sentence Compression Compression beyond Sentences

Lexical Chain Algorithm

Sentence  Lava  Weight  Time
1         X     –       X
2         X     –       –
3         –     –       X
4         X     X       X
5         X     X       –
6         –     –       X
7         X     –       –
8         –     –       –

1. Compute chains for document (Galley and McKeown 2003).
     Lava: {lava, lava, lava, magma, lava}
     Weight: {tons, lbs}
     Time: {day, today, yesterday, second}

SLIDE 25

Sentence Compression Compression beyond Sentences

Lexical Chain Algorithm

Sentence  Lava  Weight  Time
1         X     –       X
2         X     –       –
3         –     –       X
4         X     X       X
5         X     X       –
6         –     –       X
7         X     –       –
8         –     –       –
Score     5     2       4

1. Compute chains for document (Galley and McKeown 2003).
2. Score(Chain) = Sent(Chain), the number of sentences the chain spans.
3. Discard chains with Score(Chain) < Avg(Score).

SLIDE 26

Sentence Compression Compression beyond Sentences

Lexical Chain Algorithm

Sentence  Lava  Time
1         X     X
2         X     –
3         –     X
4         X     X
5         X     –
6         –     X
7         X     –
8         –     –
Score     5     4

1. Compute chains for document (Galley and McKeown 2003).
2. Score(Chain) = Sent(Chain), the number of sentences the chain spans.
3. Discard chains with Score(Chain) < Avg(Score).
4. Mark terms in the remaining chains as topic.
     Lava: {lava, lava, lava, magma, lava}
     Time: {day, today, yesterday, second}
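The scoring and filtering steps can be checked with a short sketch. It assumes the chains have already been computed (step 1 is not implemented here) and reads each chain's sentence membership off the chain table in the slides; the function name `topical_chains` is hypothetical.

```python
# Sentences in which each chain has a member, read off the slides' table.
chain_sentences = {
    "Lava": {1, 2, 4, 5, 7},
    "Weight": {4, 5},
    "Time": {1, 3, 4, 6},
}

def topical_chains(chain_sentences):
    """Score each chain by the number of sentences it spans and drop
    chains whose score is below the average score."""
    scores = {name: len(sents) for name, sents in chain_sentences.items()}
    average = sum(scores.values()) / len(scores)
    kept = {name for name, score in scores.items() if score >= average}
    return scores, kept

scores, kept = topical_chains(chain_sentences)
print(scores)        # {'Lava': 5, 'Weight': 2, 'Time': 4}
print(sorted(kept))  # ['Lava', 'Time']
```

The average score is 11/3 ≈ 3.67, so the Weight chain (score 2) is discarded while Lava and Time are kept as topical.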

SLIDE 27

Sentence Compression Compression beyond Sentences

Annotated Discourse

  • 1. Bad weather dashed hopes of attempts to halt the flow during what was seen as a lull in the lava’s momentum.
  • 2. Some experts say that even if the eruption stopped today, the pressure of lava piled up behind for six miles would bring debris cascading down on to the town anyway.
  • 3. Some estimate the volcano is pouring out one million tons of debris a day, at a rate of 15ft per second, from a fissure that opened in mid-December.
  • 4. The Italian Army yesterday detonated 400lb of dynamite 3,500 feet up Mount Etna’s slopes.

SLIDE 29

Compression Model ILP framework

Integer Linear Programming

Properties:
  • linear objective function
  • decision variables (variables under our control)
  • constraints over decision variables

Advantages:
  • find the global minimum or maximum value of the objective function (Germann et al. 2001, McDonald 2007)
  • incorporate global constraints over the output space (Roth and Yih 2004, Riedel and Clarke 2006)
  • ensure compressions are structurally and semantically valid

SLIDE 31

Compression Model ILP framework

Compression Model

Integer Linear Programming Formulation

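Trigram language model and significance score:

\[ c^{*} = \arg\max_{c} \sum_{i=1}^{n} P(w_i \mid w_{i-1}, w_{i-2}) + \sum_{i=1}^{n} I(w_i) \]

  • requires no parallel corpus
  • compresses sentences sequentially

Decision Variables

\[ y_i = \begin{cases} 1 & \text{if } w_i \text{ is in the compression} \\ 0 & \text{otherwise} \end{cases} \qquad (1 \le i \le n) \]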
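To make the role of the decision variables concrete, here is a rough sketch that maximises an objective of this shape by exhaustive search over all y in {0,1}^n. It is not the ILP solver used in the work: the function `best_compression`, the toy scorers and their values are assumptions for illustration, and the toy trigram scorer deliberately ignores its context arguments.

```python
from itertools import product

def best_compression(words, trigram_score, significance):
    """Exhaustively maximise the objective over binary vectors y.

    A candidate's value is the sum of trigram scores over the kept words
    (each conditioned on the two previously kept words) plus the sum of
    the significance scores of the kept words.
    """
    best_y, best_value = None, float("-inf")
    for y in product([0, 1], repeat=len(words)):
        kept = [w for w, y_i in zip(words, y) if y_i]
        padded = ["<s>", "<s>"] + kept  # sentence-start padding
        value = sum(trigram_score(padded[i], padded[i - 1], padded[i - 2])
                    for i in range(2, len(padded)))
        value += sum(significance(w) for w in kept)
        if value > best_value:
            best_y, best_value = y, value
    return best_y, best_value

# Toy scorers (hypothetical values, not estimated from any corpus).
SIGNIFICANCE = {"Army": 2.0, "detonated": 2.5, "dynamite": 3.0}
toy_trigram = lambda w, prev1, prev2: -1.0      # flat cost per kept word
toy_significance = lambda w: SIGNIFICANCE.get(w, 0.0)

words = "The Italian Army detonated dynamite".split()
y, value = best_compression(words, toy_trigram, toy_significance)
print(y, value)  # (0, 0, 1, 1, 1) 4.5
```

With these toy scores a word is worth keeping only when its significance outweighs the flat per-word cost, so "The" and "Italian" are dropped. Exhaustive search is exponential in n; an ILP solver explores the same space far more efficiently.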

SLIDE 33

Compression Model Constraints

Modifier Constraints

Ensure the relationships between head words and their modifiers remain grammatical.

1. If a modifier is in the compression, its head word must be included:
     \( y_{head} - y_{modifier} \ge 0 \)

2. Do not drop "not" if the head word is in the compression (same for words like "his", "our" and genitives):
     \( y_{head} - y_{not} = 0 \)
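Both modifier constraints are linear in the decision variables, so they can be checked directly on a candidate vector y. The sketch below uses hypothetical helper names and hand-picked word positions; it only tests candidates and does not solve the ILP.

```python
def modifier_ok(y, head, modifier):
    """y[head] - y[modifier] >= 0: a kept modifier requires its head."""
    return y[head] - y[modifier] >= 0

def negation_ok(y, head, negation):
    """y[head] - y[negation] == 0: the head and 'not' are kept or
    dropped together."""
    return y[head] - y[negation] == 0

# "The Italian Army": "Italian" (position 1) modifies "Army" (position 2).
print(modifier_ok([0, 1, 0], head=2, modifier=1))  # False: modifier without head
print(modifier_ok([0, 0, 1], head=2, modifier=1))  # True: head without modifier

# "did not stop": "not" (position 1) attaches to "stop" (position 2).
print(negation_ok([1, 0, 1], head=2, negation=1))  # False: "not" was dropped
print(negation_ok([1, 1, 1], head=2, negation=1))  # True
```

The inequality lets a head appear without its modifier, while the equality ties the negation to its head in both directions.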

SLIDE 34

Compression Model Constraints

Sentential Constraints

Take overall sentence structure into account.

1. If a verb is in the compression then so are its arguments, and vice versa:
     \( y_{subject/object} - y_{verb} = 0 \)

2. The compression must contain at least one verb:
     \( \sum_{i \in verbs} y_i \ge 1 \)
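As a small illustration of how these two constraints prune the search space, the sketch below enumerates every candidate for a three-word toy clause and keeps those satisfying both. The helper names and the word positions are assumptions made for the example.

```python
from itertools import product

def arguments_ok(y, verb, arguments):
    """y[arg] - y[verb] == 0 for every subject/object of the verb."""
    return all(y[arg] - y[verb] == 0 for arg in arguments)

def has_verb(y, verbs):
    """The sum of y[i] over verb positions is at least 1."""
    return sum(y[i] for i in verbs) >= 1

# "Army detonated dynamite": verb at position 1, arguments at 0 and 2.
valid = [y for y in product([0, 1], repeat=3)
         if arguments_ok(y, verb=1, arguments=[0, 2]) and has_verb(y, verbs=[1])]
print(valid)  # [(1, 1, 1)]
```

The argument constraint alone would also admit the empty candidate (0, 0, 0); the verb constraint rules it out.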

SLIDE 35

Compression Model Constraints

Discourse Constraints

Take overall document into account and preserve its coherence.

1. Do not drop centers and their references:
     \( y_{center} = 1 \)

2. Do not drop words in topical lexical chains:
     \( y_{topical} = 1 \)

3. Do not drop personal pronouns:
     \( y_{personal\ pronoun} = 1 \)
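The discourse constraints simply fix some decision variables to 1. A minimal sketch, assuming "lava" is the center of the sentence and using the Lava and Time chain members from the slides ("lava", "today"); the helper name `discourse_ok` and the truncated token list are illustrative only.

```python
def discourse_ok(y, centers, topical, pronouns):
    """True if every center, topical-chain word and personal pronoun is
    kept, i.e. y[i] == 1 at each of those positions."""
    required = set(centers) | set(topical) | set(pronouns)
    return all(y[i] == 1 for i in required)

# Opening of sentence 2 of the example discourse (punctuation removed).
words = ("Some experts say that even if the eruption stopped today "
         "the pressure of lava piled up").split()
lava, today = words.index("lava"), words.index("today")

keep_all = [1] * len(words)
drop_today = list(keep_all)
drop_today[today] = 0

print(discourse_ok(keep_all, [lava], [lava, today], []))    # True
print(discourse_ok(drop_today, [lava], [lava, today], []))  # False
```

Any candidate that drops "today" or "lava" is rejected, whatever its language-model score.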

SLIDE 36

Compression Model Constraints

Compressed Document

  • 1. Weather dashed hopes to halt the flow.
  • 2. Experts say that, the pressure bring cascading down to the town.
  • 3. Some estimate at a rate of 15ft from a fissure opened in mid-December.
  • 4. The Italian Army detonated 400lb of dynamite.

SLIDE 37

Compression Model Constraints

Compressed Document

  • 1. Weather dashed hopes to halt the flow in the lava’s momentum.
  • 2. Some experts say that, the pressure of lava would bring debris cascading down.
  • 3. The volcano is pouring out million tons of debris a day.
  • 4. The Italian Army yesterday detonated 400lb of dynamite.

SLIDE 39

Experiments Evaluation

Evaluation

Motivation

Assume the compressed document is a replacement for the original:

1. Is the compressed document readable?
2. Is the key information from the original preserved in the compression?

Question-answering paradigm

  • How many questions can we answer accurately by reading the compressed document?
  • Questions derived from the source document.
  • Two annotators created Q&A pairs.
  • Fact-based questions requiring unambiguous answers.

SLIDE 41

Experiments Evaluation

Experimental Setup

  • Created document-based compression corpus (available from http://homepages.inf.ed.ac.uk/s0460084/data/).
  • Six documents with five to eight questions per document.
  • Three conditions: gold standard, McDonald (2006), Discourse ILP.
  • Sixty participants over the web.
  • Rate readability on a seven-point scale.
  • Answer questions one at a time using the compressed document.

McDonald (2006): discriminative, state-of-the-art model, with large sentence-based feature space.

SLIDE 42

Experiments Evaluation

Example Questions and Answers

  • 1. Weather dashed hopes to halt the flow in the lava’s momentum.
  • 2. Some experts say that, the pressure of lava would bring debris cascading down.
  • 3. The volcano is pouring out million tons of debris a day.
  • 4. The Italian Army yesterday detonated 400lb of dynamite.

Q: What is posing a threat to the town?
A: lava

Q: What hindered attempts to stop the lava flow?
A: bad weather

Q: What did the Army do to stop the lava flow?
A: detonate explosives

SLIDE 48

Experiments Results

Results

Model          CompR   Readability   Q&A
McDonald 2006  60.1%   2.65          54.4%
Discourse ILP  65.4%   3.00          67.8%
Gold Standard  70.3%   5.27          82.2%

  • On readability, Discourse ILP and McDonald are not significantly different.
  • Both models are significantly worse than the gold standard on readability.
  • On the Q&A task, Discourse ILP is significantly better than McDonald.
  • Both models are significantly worse than the gold standard on the Q&A task.

SLIDE 49

Experiments Results

Conclusions

Contributions:
  • discourse-based sentence compression model
  • formulated within the ILP framework using global constraints
  • unsupervised, relatively simple and intuitive model
  • document-based evaluation using a Q&A task-based paradigm
  • performance gains over supervised discourse-agnostic system

Future work:
  • interface compression model with sentence extraction
  • study the effect of global discourse structure (Daumé III and Marcu 2002)
  • explore the effect of discourse for other models

SLIDE 50

Q&A Task

  • Each question presented in turn.
  • No corrections allowed.
  • Answers marked consistently across all three systems.

Q: What is posing a threat to the town?
A: Lava / Volcano / Lava from Mount Etna

Q: What hindered attempts to stop the lava flow?
A: Bad weather / Snow and winds / The weather - snow

Q: What did the Army do to stop the lava flow?
A: Detonate explosives / Used explosives / Detonate dynamite
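One way to picture consistent marking is a fixed list of answers counted as correct for each question, applied identically to every system. The sketch below assumes the answer variants listed on this slide are the ones counted as correct and uses simple normalised string matching; the slides do not describe the actual marking procedure, and the participant responses are invented for illustration.

```python
def normalise(text):
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(text.lower().split())

def qa_accuracy(responses, accepted):
    """Fraction of questions whose response matches an accepted answer."""
    correct = sum(
        normalise(responses.get(question, "")) in {normalise(a) for a in answers}
        for question, answers in accepted.items()
    )
    return correct / len(accepted)

# Answer variants copied from the slide.
accepted = {
    "threat": ["Lava", "Volcano", "Lava from Mount Etna"],
    "hindered": ["Bad weather", "Snow and winds", "The weather - snow"],
    "army": ["Detonate explosives", "Used explosives", "Detonate dynamite"],
}

# Hypothetical participant responses.
responses = {"threat": "lava", "hindered": "bad  weather", "army": "built a wall"}
print(qa_accuracy(responses, accepted))  # 2 of 3 questions correct
```

Using the same accepted list for every condition keeps scores comparable across the gold standard and both models.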
