Multiple Alternative Sentence Compressions (MASC)
A Framework for Automatic Summarization
Nitin Madnani, David Zajic, Bonnie Dorr, Necip Fazil Ayan, Jimmy Lin
University of Maryland, College Park
Outline: Problem Description, MASC Architecture, MASC Results
MASC architecture: Documents → Sentence Filtering → Sentences → Sentence Compression → Candidates → Candidate Selection → Summary
Candidate Selection also takes Task-Specific Features (e.g. query).
Sentence compression systems: HMM Hedge, Trimmer, Topiary
(Zajic et al., 2005) (Zajic et al., 2006)
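The three stages above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the prefix-based `compress` stand-in (which is not HMM Hedge, Trimmer, or Topiary), the query-overlap score, and the rule of keeping at most one candidate per source sentence. It only shows how generating several compressions per sentence gives the selector shorter alternatives to choose from.

```python
from typing import List, Tuple


def filter_sentences(documents: List[List[str]], per_doc: int = 5) -> List[str]:
    """Toy filter (assumption): keep the first few sentences of each document."""
    return [s for doc in documents for s in doc[:per_doc]]


def compress(sentence: str) -> List[str]:
    """Toy stand-in for a compressor: the sentence plus its prefixes of 3+ words."""
    words = sentence.split()
    return [sentence] + [" ".join(words[:n]) for n in range(3, len(words))]


def select(candidates: List[Tuple[int, str]], query: str, max_words: int) -> List[str]:
    """Toy selector: rank by query-term density, respect a word budget,
    and keep at most one candidate per source sentence."""
    terms = set(query.lower().split())

    def score(item: Tuple[int, str]) -> float:
        tokens = item[1].lower().split()
        if not tokens:
            return 0.0
        return sum(t in terms for t in tokens) / len(tokens)

    summary, used_sources, length = [], set(), 0
    for source, text in sorted(candidates, key=score, reverse=True):
        n = len(text.split())
        if source not in used_sources and length + n <= max_words:
            summary.append(text)
            used_sources.add(source)
            length += n
    return summary


def summarize(documents: List[List[str]], query: str, max_words: int) -> List[str]:
    sentences = filter_sentences(documents)
    candidates = [(i, c) for i, s in enumerate(sentences) for c in compress(s)]
    return select(candidates, query, max_words)
```

With an eight-word budget, a short compression of a relevant sentence can outrank the full sentence and leave room for material from another sentence.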
HMM Hedge: Sentence → Part-of-Speech Tagger [1] → Sentence with Verb Tags (VERB) → HMM Hedge → Compressions
HMM Hedge uses a Headline Language Model and a Story Language Model.
Language models based on 242,918 AP headlines and stories from Tipster Corpus
[1] TreeTagger (Schmid, 1994)
Trimmer: Sentence → Entity Tagger [1] → Sentence with Entity Tags (PERSON, TIME EXPR) → Parser [2] → Parse → Trimmer → Compressions
[1] BBN IdentiFinder (Bikel et al., 1999)
[2] Charniak Parser (Charniak, 2000)
The latest flood crest passed Chongqing in southwest China, and waters were rising in Yichang on the middle reaches of the Yangtze, state television reported Sunday.
[Parse tree figure; node labels: S1, S, S2, CC, S3, NP, VP]
Illegal fireworks injured hundreds and started six fires.
[Parse tree figure; node labels: S, NP, VP, CC, VP, VP]
Topiary: Sentence → Trimmer → Compressions; Document + Document Corpus → Topic Assignment [1] → Topic Terms; Compressions + Topic Terms → Topiary → Candidates
[1] BBN Unsupervised Topic Detection
Candidate Selection: Candidates + Features, together with the Query, Document, and Document Set → Relevance & Centrality Scorer [1] → Candidates + More Features → Sentence Selector (using Feature Weights) → Summary, with a Cull & Rescore step
[1] Uniform Retrieval Architecture (URA), UMD's software infrastructure for IR tasks.
DUC2004 Test Data, ROUGE recall with unigrams
[Bar chart: ROUGE-1 Recall (axis 0.16 to 0.30) for First 75, UTD Topics, and for HMM Hedge, Trimmer, and Topiary, each shown with No MASC and with MASC]
DUC2006 Test Data
[Bar chart: ROUGE-2 Recall (axis 0.05 to 0.075) for No Compression, HMM Hedge, and Trimmer]
Initialize: S = {}, H = {}
1. C ← current k-best candidates (c1, c2, ..., ck)
2. for c ∈ C: ΔROUGE(c) = R2R(S ∪ {c}) - R2R(S); add hypothesis to H (giving Δ1, Δ2, ..., Δk)
3. S ← S ∪ {c1}
4. Update remaining candidates
5. Repeat unless |S| > L
wopt ← powellROUGE(H, w0)
Outputs: Summary (S), Hypotheses (H)
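The hypothesis-collection loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: R2R is read as ROUGE-2 recall and approximated by distinct-bigram coverage against a single reference, candidates are ranked by a linear model over fixed feature vectors, |S| is measured in words, and "update remaining candidates" is reduced to removing the chosen candidate. The `powellROUGE` weight-optimization step is left out because the slide does not specify it.

```python
from typing import Dict, List, Set, Tuple

Bigram = Tuple[str, str]


def bigrams(text: str) -> Set[Bigram]:
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def r2r(summary: List[str], reference: str) -> float:
    """Simplified ROUGE-2 recall: share of distinct reference bigrams covered."""
    ref = bigrams(reference)
    if not ref:
        return 0.0
    covered = set().union(*(bigrams(s) for s in summary))
    return len(covered & ref) / len(ref)


def collect_hypotheses(
    candidates: List[str],
    features: Dict[str, List[float]],
    weights: List[float],
    reference: str,
    k: int,
    max_words: int,
) -> Tuple[List[str], List[Tuple[List[float], float]]]:
    """Greedily build summary S while logging (features, delta-ROUGE) pairs in H."""
    S: List[str] = []
    H: List[Tuple[List[float], float]] = []
    remaining = list(candidates)

    def model_score(c: str) -> float:
        return sum(w * f for w, f in zip(weights, features[c]))

    while remaining:
        C = sorted(remaining, key=model_score, reverse=True)[:k]  # k-best list
        base = r2r(S, reference)
        for c in C:
            H.append((features[c], r2r(S + [c], reference) - base))
        S.append(C[0])          # the model's top candidate joins the summary
        remaining.remove(C[0])  # simplified "update remaining candidates"
        if sum(len(s.split()) for s in S) > max_words:
            break               # stop once |S| > L
    return S, H
```

Each entry in `H` pairs a candidate's features with the ROUGE gain it would have produced, so a weight optimizer can later be asked to make the model's ranking agree with those gains.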
ROUGE    Manual    ΔROUGE (k=10)
1        0.363     0.403
2        0.081     0.104
SU-4     0.126     0.154
DUC2007 data, all differences significant at p < 0.05
Manual: feature weights optimized manually to maximize ROUGE-2 Recall on the final system output.
Key Insights for ΔROUGE optimization:
S = summary, L = general English language
Other documents in the same cluster are used to represent the general language.
[Figure labels: REDUNDANT, NON-REDUNDANT]
$c(e_1, e_2) = \sum_{f} c(e_1, f)\, c(f, e_2)$
上升 ||| increased ||| 2.0
上升 ||| uplifted ||| 1.0
. . .
increased ||| climbed ||| 2.0
climbed ||| uplifted ||| 1.0
. . .
uplifted ||| increased ||| 2.0
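The pivot computation behind these entries can be sketched in Python. This is an illustrative sketch with assumed names (`parse_rules`, `pivot`); it assumes counts are symmetric, so c(e, f) = c(f, e), and that each rule line has exactly three fields separated by `|||`. It sums the product of counts over every foreign phrase f shared by two English phrases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Rule = Tuple[str, str]


def parse_rules(lines: Iterable[str]) -> Dict[Rule, float]:
    """Parse 'source ||| target ||| count' lines into a {(source, target): count} map."""
    table: Dict[Rule, float] = {}
    for line in lines:
        source, target, count = (field.strip() for field in line.split("|||"))
        table[(source, target)] = float(count)
    return table


def pivot(foreign_to_english: Dict[Rule, float]) -> Dict[Rule, float]:
    """Compute c(e1, e2) as the sum over f of c(e1, f) * c(f, e2)."""
    by_foreign: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (f, e), count in foreign_to_english.items():
        by_foreign[f][e] = count

    paraphrases: Dict[Rule, float] = defaultdict(float)
    for translations in by_foreign.values():
        for e1, c1 in translations.items():
            for e2, c2 in translations.items():
                if e1 != e2:  # skip identity paraphrases
                    paraphrases[(e1, e2)] += c1 * c2
    return dict(paraphrases)


rules = parse_rules([
    "上升 ||| increased ||| 2.0",
    "上升 ||| uplifted ||| 1.0",
])
paraphrase_counts = pivot(rules)
```

For the two translation rules shown, the only shared pivot is 上升, so the count for the pair (uplifted, increased) is 1.0 × 2.0 = 2.0, matching the last entry in the listing.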