SLIDE 1

Models for Sentence Compression

A Comparison across Domains, Training Requirements and Evaluation Measures

James Clarke and Mirella Lapata
School of Informatics, University of Edinburgh

ACL 2006, July 2006

SLIDE 2

Introduction

What is Sentence Compression?

Sentence Compression

Can be viewed as producing a summary of a single sentence.


More formally

A compressed sentence should:
- Use fewer words than the original sentence.
- Preserve the most important information.
- Remain grammatical.

SLIDE 4

Introduction

Simplification

Sentence compression can involve...
- word reordering
- word deletion
- word substitution
- word insertion

Ideally we want to exploit all of these operations, but let's start simple:

Knight and Marcu (2002)

Given an input sentence of words $W = w_1, w_2, \ldots, w_n$, a compression is formed by dropping any subset of these words.
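To make the size of this deletion-only search space concrete, here is a minimal sketch (illustrative only, not the authors' code) that enumerates every candidate compression of a short sentence:

```python
from itertools import combinations

def candidate_compressions(words):
    """All compressions under the deletion-only model: every non-empty
    subsequence of the input words. The space has 2**n - 1 candidates,
    which is why compression systems rank or search it with a scoring
    function rather than enumerate it."""
    n = len(words)
    for k in range(n, 0, -1):                    # longest candidates first
        for kept in combinations(range(n), k):
            yield [words[i] for i in kept]

words = "Tony Blair insisted the case was compelling".split()
print(sum(1 for _ in candidate_compressions(words)))   # 2**7 - 1 = 127
```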

SLIDE 6

Introduction

Example Compression

Original

Prime Minister Tony Blair today insisted the case for holding terrorism suspects without trial was “absolutely compelling” as the government published new legislation allowing detention for 90 days without charge.


Compression

Tony Blair insisted the case for holding terrorism suspects without trial was “compelling”.

SLIDE 9

Introduction

Outline

1. Sentence Compression
   - Motivation
   - Previous Work
2. Our Work
   - How do humans compress sentences?
   - Do existing methods port well across domains?
   - What about automatic evaluation measures?
3. Discussion

SLIDE 11

Sentence Compression: Motivation

Applications

Within summarisation:
- Current systems contain manually written rules for sentence compression.

Other applications include:
- Subtitle generation.
- Text compression for display on small screens.
- Audio scanning devices for the blind.

SLIDE 13

Sentence Compression: Previous Work

Previous Work

Methods:
- Supervised
  - Generative: Knight & Marcu (2002); Turner & Charniak (2005)
  - Discriminative: Knight & Marcu (2002); Riezler et al. (2003); Nguyen et al. (2004); McDonald (2006)
- Unsupervised: Hori & Furui (2004); Charniak & Turner (2005)

SLIDE 14

Sentence Compression: Previous Work

Data Requirements

Parallel Corpora

Most approaches rely on a parallel corpus.
- Automatically produced: Ziff-Davis (Knight and Marcu, 2002).
- Domain: newspaper articles.
- There is no 'natural' resource of original-compressed sentences.


Abstract

Blah blah blah. The documentation is excellent. Blah blah blah . . .

Document

. . . blah blah blah. The documentation is excellent – it is clearly written with numerous drawings, cautions and tips, and includes an entire section on troubleshooting. Blah . . .
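A hedged sketch of how such pairs can be mined automatically, in the spirit of the Ziff-Davis construction: keep abstract sentences whose words all appear, in order, within a longer document sentence (the function names are mine, not Knight and Marcu's):

```python
def is_subsequence(short, long):
    """True if the words of `short` appear, in order, inside `long`."""
    it = iter(long)
    return all(word in it for word in short)

def mine_pairs(abstract_sents, document_sents):
    """Pair each abstract sentence with a longer document sentence that
    contains all its words in order, i.e. the abstract sentence is a
    deletion-based compression of the document sentence."""
    for a in abstract_sents:
        for d in document_sents:
            if len(a) < len(d) and is_subsequence(a, d):
                yield (d, a)    # (original, compression)

abstract = [["The", "documentation", "is", "excellent", "."]]
document = [["The", "documentation", "is", "excellent", "--", "it", "is",
             "clearly", "written", "and", "includes", "an", "entire",
             "section", "on", "troubleshooting", "."]]
for original, compression in mine_pairs(abstract, document):
    print(len(compression), "/", len(original), "words kept")
```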

SLIDE 17

Sentence Compression: Previous Work

Evaluation

Methodology

- Algorithms are evaluated on a small sample (32 sentences).
- Humans are asked to assess grammaticality and information content.
- Typically four participants are used.
- Unlike machine translation, there is no established automatic measure.
- How can we compare across systems and system configurations?

SLIDE 19

Our Work: How do humans compress sentences?

Human-authored Compression Corpus

Spoken Text

- Natural domain for compression applications.
- Speech is challenging (ungrammatical, incomplete).
- No naturally occurring compression corpora.


Methodology

- 50 broadcast news documents.
- 3 annotators remove tokens from the original transcript:
  - preserve the most important information in the original sentence;
  - preserve the grammaticality of the compressed sentence.
- Annotators could also leave a sentence uncompressed.

SLIDE 21

Our Work: How do humans compress sentences?

Example Human Compressions

Original

President Boris Yeltsin has won the most votes in Russia's hotly contested presidential election, one watched around the world.

Compressions

1. Boris Yeltsin has the most votes in Russia's presidential election.
2. Boris Yeltsin has won the most votes in Russia's presidential election, watched around the world.
3. Boris Yeltsin has won the most votes in Russia's presidential election.

SLIDE 22

Our Work: How do humans compress sentences?

Analysis: Compression Rate

               A1     A2     A3     Av      Ziff-Davis
% compressed   88     79     87     84.4    97
CompRate       73.1   79.0   70.0   73.03   47

- Similar compression rates for the annotators.
- The Ziff-Davis corpus is compressed much more aggressively.
- The Ziff-Davis corpus may not be comparable with human performance.
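For clarity, a small sketch of how these two statistics might be computed from (original, compression) pairs; the slides do not specify whether CompRate averages over all sentences or only the compressed ones, so this version (an assumption) averages over all:

```python
def compression_stats(pairs):
    """Two corpus statistics from (original, compression) word-list pairs:
    - percent_compressed: share of sentences that were actually shortened
    - comp_rate: mean ratio of compressed length to original length,
      as a percentage (higher = less aggressive compression)."""
    shortened = [(o, c) for o, c in pairs if len(c) < len(o)]
    percent_compressed = 100.0 * len(shortened) / len(pairs)
    comp_rate = 100.0 * sum(len(c) / len(o) for o, c in pairs) / len(pairs)
    return percent_compressed, comp_rate

pairs = [("a b c d e f g h i j".split(), "a b c d e f g".split()),
         ("k l m n".split(), "k l m n".split())]      # left uncompressed
print(compression_stats(pairs))   # (50.0, 85.0)
```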

SLIDE 26

Our Work: How do humans compress sentences?

Analysis: Spans

[Figure: distribution of the lengths of word spans dropped (1 to 10+ words; y-axis: relative number of drops) for annotators 1-3 and the Ziff-Davis corpus.]

SLIDE 28

Our Work: Do existing methods port well across domains?

Decision-based Sentence Compression

Compression as a rewriting problem

Decompose the rewriting process into a sequence of shift-reduce-drop actions (Knight and Marcu, 2002), following an extended shift-reduce parsing paradigm.

Operations

- SHIFT transfers the first word from the input list to the stack.
- ASSIGNTYPE changes the label of trees at the top of the stack.
- REDUCE combines syntactic trees from the stack to form a new tree.
- DROP deletes from the input list subsequences of words that correspond to a syntactic constituent.
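A toy sketch of how these four actions manipulate the stack and input list, with trees as (label, children) tuples; this illustrates the transition system, it is not Knight and Marcu's implementation:

```python
# Toy interpreter for the four rewriting actions; both the stack and the
# input list hold (label, children) subtrees of the parse.

def shift(stack, inputs):
    stack.append(inputs.pop(0))          # move the first input tree to the stack

def assign_type(stack, label):
    _, children = stack.pop()
    stack.append((label, children))      # relabel the topmost tree

def reduce_trees(stack, k, label):
    children = stack[-k:]
    del stack[-k:]
    stack.append((label, children))      # combine the top k trees under label

def drop(inputs, label):
    for i, (lab, _) in enumerate(inputs):
        if lab == label:                 # delete the first input constituent
            del inputs[i]                #   carrying the given label
            return

stack, inputs = [], [("H", ["a"]), ("B", ["c"]), ("D", ["e"])]
shift(stack, inputs)          # stack: [("H", ["a"])]
drop(inputs, "B")             # inputs: [("D", ["e"])]
reduce_trees(stack, 1, "F")   # stack: [("F", [("H", ["a"])])]
```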

SLIDE 29

Our Work: Do existing methods port well across domains?

Decision-based Example

[Figure (slides 29-36): a worked derivation shown as parse trees and stack/input-list snapshots. Starting from the input tree (a), the action sequence SHIFT + ASSIGNTYPE H; SHIFT + ASSIGNTYPE K; REDUCE 2 F; DROP B; SHIFT + ASSIGNTYPE D; REDUCE 2 G produces the compressed tree (b).]

SLIDE 37

Our Work: Do existing methods port well across domains?

Decision-based Compression

- Learning cases are automatically generated from a parallel corpus.
- 99 features are extracted from each learning case.
- A decision-tree model is learnt from the data.
- The model determines which operation to perform given a set of features.
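As a sketch of the learning setup (using scikit-learn purely for illustration; Knight and Marcu used a C4.5-style decision-tree learner, and the feature names below are invented placeholders):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

# Each learning case pairs a feature dictionary (the real model uses 99
# features describing the stack, input list and tree context) with the
# next operation to perform.
cases = [({"top_label": "H", "next_label": "A", "stack_depth": 1}, "SHIFT"),
         ({"top_label": "K", "next_label": "B", "stack_depth": 2}, "DROP"),
         ({"top_label": "F", "next_label": "D", "stack_depth": 2}, "REDUCE")]

vec = DictVectorizer()
X = vec.fit_transform([features for features, _ in cases])
y = [operation for _, operation in cases]

model = DecisionTreeClassifier().fit(X, y)
print(model.predict(vec.transform({"top_label": "H",
                                   "next_label": "A",
                                   "stack_depth": 1})))   # ['SHIFT']
```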

SLIDE 38

Our Work: Do existing methods port well across domains?

Word-based Model

Original Model (Hori, 2002)

- Word-based score maximisation model.
- Score based on corpus knowledge.
- Maximised for a fixed compression length using dynamic programming.
- Does not require a parallel corpus.

Modifications

- Removed the length parameter.
- Added more linguistic knowledge to the scoring function.

SLIDE 40

Our Work: Do existing methods port well across domains?

Score

$$\hat{V} = \arg\max_{V} S(V), \qquad S(V) = \sum_{m=1}^{M} \Big[ \lambda_I\, I(V_m) + \lambda_L\, L(V_m \mid V_{m-1}, V_{m-2}) + \lambda_{sov}\, SOV(V_m) \Big]$$

$$SOV(w_i) = \begin{cases} f_{w_i} & \text{if } w_i \text{ is in a subject, object or verb role} \\ \lambda_{default} & \text{otherwise} \end{cases}$$

- The significance score I is designed to include important nouns and verbs.
- The language model's task is to preserve grammaticality.
- Subjects, objects and verbs should not be dropped; words in other syntactic roles can be considered for removal.

SLIDE 46

Our Work: Do existing methods port well across domains?

Comparison

Experimental Setup

- Compare the decision-tree and word-based models on the Ziff-Davis and Broadcast News corpora.
- Evaluate against human judgements:
  - Sixty unpaid volunteers.
  - Instructions and examples define the compression task.
  - Rate each sentence on a five-point scale.
  - Take into account information retained and grammaticality.

SLIDE 47

Our Work: Do existing methods port well across domains?

Results

Broadcast News   CompR   Ratings
Decision-tree    0.55    2.04
Word-based       0.72    2.78
Gold standard    0.71    3.87

Ziff-Davis       CompR   Ratings
Decision-tree    0.58    2.34
Word-based       0.60    2.43
Gold standard    0.54    3.53

- The decision-tree model is sensitive to its training data; it rebuilds the original sentence 75% of the time.
- The word-based model produces a compression rate similar to the gold standard.
- Broadcast News: the word-based model is significantly better than the decision-tree model; both are significantly worse than humans.
- Ziff-Davis: no significant difference between the models; both are significantly worse than humans.

SLIDE 54

Our Work: What about automatic evaluation measures?

Simple String Accuracy

Based on the edit distance between two strings (Bangalore, Rambow, and Whittaker, 2000).

$$\text{SSA} = 1 - \frac{I + D + S}{R}$$

where I = insertions, D = deletions, S = substitutions, and R = length of the gold standard.
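A minimal implementation of SSA from the word-level edit distance (my sketch, not Bangalore et al.'s code):

```python
def simple_string_accuracy(candidate, gold):
    """SSA = 1 - (I + D + S) / R, where the edit counts come from the
    word-level edit distance between candidate and gold compression,
    and R is the length of the gold standard."""
    n, m = len(candidate), len(gold)
    # dp[i][j] = minimum edits turning candidate[:i] into gold[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i                      # deletions
    for j in range(m + 1):
        dp[0][j] = j                      # insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if candidate[i - 1] == gold[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # delete
                           dp[i][j - 1] + 1,        # insert
                           dp[i - 1][j - 1] + sub)  # substitute or match
    return 1 - dp[n][m] / len(gold)

print(simple_string_accuracy("Fergie wants a career".split(),
                             "Fergie wants a career in television".split()))
# 1 - 2/6 = 0.667
```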

SLIDE 55

Our Work: What about automatic evaluation measures?

Relation-based Evaluation

- Proposed by Riezler et al. (2003).
- Compares the grammatical relations between compression and gold standard.
- This allows us "to measure the semantic aspects of summarisation quality in terms of grammatical-functional information".
- Uses the standard IR measure of F-score.
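A small sketch of the relation-based F-score over (relation, head, dependent) triples; the triples below are invented for illustration and would normally come from a parser:

```python
def relation_f_score(candidate_rels, gold_rels):
    """F-score over sets of grammatical relations, each represented
    as a (relation, head, dependent) triple."""
    cand, gold = set(candidate_rels), set(gold_rels)
    if not cand or not gold:
        return 0.0
    overlap = len(cand & gold)
    precision = overlap / len(cand)
    recall = overlap / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = {("subj", "wants", "Fergie"), ("obj", "wants", "career"),
        ("mod", "career", "television")}
cand = {("subj", "wants", "Fergie"), ("obj", "wants", "career")}
print(round(relation_f_score(cand, gold), 3))   # P=1.0, R=0.667, F=0.8
```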

SLIDE 56

Our Work: What about automatic evaluation measures?

Correlation Analysis

Measure    Ziff-Davis   Broadcast News
SSA        0.171        0.348*
F-score    0.575**      0.532**
IntSubj    0.679        0.746

*p < 0.05; **p < 0.01

- SSA does not correlate with human judgements on both corpora (no significant correlation on Ziff-Davis).
- The relation F-score correlates significantly with human ratings on both corpora.

SLIDE 58

Our Work: What about automatic evaluation measures?

Example System Compressions

(o = original, d = decision-tree, w = word-based, g = gold standard)

o: Apparently Fergie very much wants to have a career in television.
d: A career in television.
w: Fergie wants to have a career in television.
g: Fergie wants a career in television.

SLIDE 59

o: Many debugging features, including user-defined break points and variable-watching and message-watching windows, have been added.
d: Many debugging features.
w: Debugging features, and windows, have been added.
g: Many debugging features have been added.

SLIDE 60

o: As you said, the president has just left for a busy three days of speeches and fundraising in Nevada, California and New Mexico.
d: As you said, the president has just left for a busy three days.
w: You said, the president has left for three days of speeches and fundraising in Nevada, California and New Mexico.
g: The president left for three days of speeches and fundraising in Nevada, California and New Mexico.

SLIDE 61

Discussion

Findings

- The decision-tree model is sensitive to the style of its training data and does not generalise to our new corpus.
- The word-based model performs significantly better than the decision-tree model on broadcast news.
- Both systems are comparable on written text.
- F-score correlates with human judgements.

SLIDE 62

Discussion

Future Work

Sentence Compression as Optimisation

- Underlying model: trigram language model.
- Decoding: Integer Programming.
- Advantage: can include linguistically motivated constraints, so compressions are structurally and semantically valid.

See my poster today!
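For a flavour of what IP decoding looks like, here is a toy sketch using the PuLP library: one binary variable per word, an invented significance objective, and two illustrative constraints. The actual model also includes trigram context variables and many more linguistically motivated constraints; nothing below is the authors' formulation.

```python
import pulp

# One binary variable per word; a word is kept iff its variable is 1.
words = ["Tony", "Blair", "insisted", "the", "case", "was", "compelling"]
scores = [0.9, 0.9, 0.8, 0.2, 0.7, 0.5, 0.8]      # invented significance scores

prob = pulp.LpProblem("compression", pulp.LpMaximize)
x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(len(words))]

prob += pulp.lpSum(s * xi for s, xi in zip(scores, x))   # objective
prob += x[2] <= x[1]         # toy constraint: keeping the verb requires its subject
prob += pulp.lpSum(x) <= 5   # toy length bound

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([w for w, xi in zip(words, x) if xi.value() == 1])
# ['Tony', 'Blair', 'insisted', 'case', 'compelling']
```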
