A Class of Submodular Functions for Document Summarization
Hui Lin, Jeff Bilmes
University of Washington, Seattle
Dept. of Electrical Engineering
June 20, 2011
Lin and Bilmes Submodular Summarization June 20, 2011 1 / 29
Background on Submodularity
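Example (R ⊆ S, where R + k denotes R ∪ {k}): f(R) = 3, f(S) = 4, f(R + k) = 4, f(S + k) = 4. Adding k to the smaller set R gains 1, while adding it to the larger set S gains 0. This is diminishing returns: $f(R + k) - f(R) \ge f(S + k) - f(S)$.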
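The numbers above can be reproduced with a set-coverage function, a standard example of a submodular function. The sketch below is illustrative only: the sets `a`, `b`, `c`, `k` and the helpers `coverage`, `all_subsets` and `is_submodular` are hypothetical names chosen here, and the brute-force check is exponential, so it only suits tiny ground sets.

```python
from itertools import combinations

def coverage(sets, chosen):
    """f(X): number of distinct items covered by the sets named in X."""
    covered = set()
    for name in chosen:
        covered |= sets[name]
    return len(covered)

def all_subsets(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for n in range(len(items) + 1):
        for combo in combinations(items, n):
            yield frozenset(combo)

def is_submodular(f, ground):
    """Brute-force check: f(A + k) - f(A) >= f(B + k) - f(B) for all A ⊆ B, k ∉ B."""
    for B in all_subsets(ground):
        for A in all_subsets(B):
            for k in ground - B:
                if f(A | {k}) - f(A) < f(B | {k}) - f(B):
                    return False
    return True

# Hypothetical sets chosen to reproduce the slide's numbers.
sets = {"a": {1, 2}, "b": {3}, "c": {4}, "k": {4}}
R = frozenset({"a", "b"})   # covers {1, 2, 3}: f(R) = 3
S = R | {"c"}               # covers {1, 2, 3, 4}: f(S) = 4
gain_R = coverage(sets, R | {"k"}) - coverage(sets, R)  # 4 - 3 = 1
gain_S = coverage(sets, S | {"k"}) - coverage(sets, S)  # 4 - 4 = 0
print(gain_R, gain_S)  # 1 0
print(is_submodular(lambda X: coverage(sets, X), frozenset(sets)))  # True
```

By contrast, a function such as f(X) = |X|² fails the same check, since its marginal gains grow with the set.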
Problem Setup and Algorithm
$c_i$: cost (e.g., the number of words in sentence $i$); $b$: the budget (e.g., the maximum summary length allowed); knapsack constraint: $\sum_{i \in S} c_i \le b$.
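Problem (1): $\max_{S \subseteq V} f(S)$ subject to $\sum_{i \in S} c_i \le b$, where $V$ is the ground set of all sentences.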
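To make problem (1) concrete, the sketch below solves it exactly by enumerating every feasible subset. This is only a reference point for tiny inputs (it is exponential in |V|); the toy "sentences", the word-count costs and the word-coverage objective `f` are hypothetical stand-ins, not the objective used in the talk.

```python
from itertools import combinations

def exact_budgeted_max(f, costs, budget):
    """Solve max_{S ⊆ V} f(S) s.t. sum_{i in S} costs[i] <= budget by enumeration.
    Exponential in |V|: only usable on tiny instances."""
    ground = list(costs)
    best_set, best_val = frozenset(), f(frozenset())
    for n in range(1, len(ground) + 1):
        for combo in combinations(ground, n):
            if sum(costs[i] for i in combo) <= budget:
                val = f(frozenset(combo))
                if val > best_val:
                    best_set, best_val = frozenset(combo), val
    return best_set, best_val

# Hypothetical toy data: each "sentence" is a set of words, cost = its word count,
# and f(S) = number of distinct words covered (a stand-in objective).
sentences = {
    "s1": {"submodular", "functions", "summarization"},
    "s2": {"greedy", "algorithm"},
    "s3": {"submodular", "greedy"},
    "s4": {"budget"},
}
costs = {name: len(words) for name, words in sentences.items()}

def f(S):
    return len(set().union(*(sentences[i] for i in S)))

best, val = exact_budgeted_max(f, costs, budget=5)
print(sorted(best), val)  # ['s1', 's2'] 5
```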
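We choose the next element with the largest ratio of gain over scaled cost:
$$k \leftarrow \arg\max_{i \in U} \frac{f(G \cup \{i\}) - f(G)}{(c_i)^r} \qquad (2)$$
where $G$ is the current summary, $U$ is the set of remaining candidate sentences, and $r$ scales the cost.
Scalability: thanks to submodularity, the argmax above can be solved with O(log n) calls of f. Integer linear programming (ILP) takes 17 hours, whereas greedy takes less than 1 second!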
[Figure: objective function value vs. number of sentences in the summary, for r = 0, 0.5, 1, 1.5, compared with the exact solution.]
Figure: The plots show the achieved objective function value as the number of selected sentences grows. The plots stop when, in each case, adding more sentences would violate the budget.
Submodularity in Summarization
[Figure: ROUGE-2 recall (%) vs. r (scale on cost) for the greedy algorithm, with the best human summary shown as a reference line.]
Figure: Oracle experiments on DUC-05. The red dashed line indicates the best ROUGE-2 recall score of human summaries (summary with ID C).
New Class of Submodular Functions for Document Summarization
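$\mathcal{R}(S) = \sum_{i=1}^{K} \sqrt{\sum_{j \in P_i \cap S} r_j}$, where $P_1, \dots, P_K$ is a partition of the ground set $V$ and $r_j \ge 0$ is the singleton reward of sentence $j$.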
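The diversity reward can be sketched in a few lines. The partition and unit rewards below are hypothetical (in practice the partition might come from clustering the sentences). The example shows why the square root rewards diversity: a second sentence from an already-used block adds only $\sqrt{2} - 1 \approx 0.414$, while a sentence from an unused block adds 1.

```python
import math

def diversity_reward(S, partition, rewards):
    """R(S) = sum_i sqrt(sum_{j in P_i ∩ S} r_j) for a partition P_1..P_K of V."""
    return sum(
        math.sqrt(sum(rewards[j] for j in block if j in S))
        for block in partition
    )

# Hypothetical partition and unit rewards.
partition = [{"s1", "s2"}, {"s3", "s4"}]
rewards = {"s1": 1.0, "s2": 1.0, "s3": 1.0, "s4": 1.0}

same_block = diversity_reward({"s1", "s2"}, partition, rewards)   # sqrt(2) + 0
two_blocks = diversity_reward({"s1", "s3"}, partition, rewards)   # 1 + 1
print(round(same_block, 3), two_blocks)  # 1.414 2.0
```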
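Query-independent (generic) case: $r_j = \frac{1}{N} \sum_{i \in V} w_{i,j}$, where $N = |V|$ and $w_{i,j}$ is the similarity between sentences $i$ and $j$.
Query-dependent case, given a query $Q$: $r_j = \beta \frac{1}{N} \sum_{i \in V} w_{i,j} + (1 - \beta)\, r_{j,Q}$, where $r_{j,Q}$ measures the relevance between $j$ and query $Q$.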
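Both reward formulas translate directly into code. In this sketch the similarity matrix `w` (as a list of lists, `w[i][j]` standing for $w_{i,j}$), the relevance scores `query_rel` (standing for $r_{j,Q}$) and the value of `beta` are hypothetical inputs; how $r_{j,Q}$ is computed is left open here, as in the slide.

```python
def singleton_rewards(w, beta=None, query_rel=None):
    """Generic: r_j = (1/N) * sum_i w[i][j].
    Query-dependent: r_j = beta * (1/N) * sum_i w[i][j] + (1 - beta) * query_rel[j]."""
    n = len(w)
    generic = [sum(w[i][j] for i in range(n)) / n for j in range(n)]
    if query_rel is None:
        return generic
    return [beta * g + (1 - beta) * q for g, q in zip(generic, query_rel)]

# Hypothetical symmetric similarity matrix for three sentences.
w = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.2],
     [0.0, 0.2, 1.0]]
print([round(x, 3) for x in singleton_rewards(w)])  # [0.5, 0.567, 0.4]
print([round(x, 3) for x in singleton_rewards(w, beta=0.5, query_rel=[0.0, 1.0, 0.0])])
# [0.25, 0.783, 0.2]
```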
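With two partitions, $P^{(1)}$ with $K_1$ blocks and $P^{(2)}$ with $K_2$ blocks: $\mathcal{R}(S) = \sum_{i=1}^{K_1} \sqrt{\sum_{j \in P^{(1)}_i \cap S} r_j} + \sum_{i=1}^{K_2} \sqrt{\sum_{j \in P^{(2)}_i \cap S} r_j}$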
Experimental Results
Table: ROUGE-1 recall (R) and F-measure (F) results (%) on DUC-04. DUC-03 was used as the development set.
DUC-04 | R | F
$\mathcal{L}_1(S)$ | 39.03 | 38.65
$\mathcal{R}_1(S)$ | 38.23 | 37.81
$\mathcal{L}_1(S) + \lambda \mathcal{R}_1(S)$ | 39.35 | 38.90
Takamura and Okumura (2009) | 38.50 | -
… | 39.07 | -
Best system in DUC-04 (peer 65) | 38.28 | 37.94
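The combined objective in the table adds a coverage term to the diversity reward. The coverage term $\mathcal{L}_1$ is not defined in this section; the sketch below assumes the saturated-coverage form from Lin and Bilmes's paper, $\mathcal{L}(S) = \sum_{i \in V} \min\{C_i(S), \alpha C_i(V)\}$ with $C_i(S) = \sum_{j \in S} w_{i,j}$. The values of `alpha`, `lam`, the partition and the rewards are hypothetical, and the function names are illustrative.

```python
import math

def coverage(S, w, alpha):
    """Assumed saturated coverage: sum_i min(C_i(S), alpha * C_i(V)),
    with C_i(X) = sum_{j in X} w[i][j]."""
    total = 0.0
    for row in w:
        c_S = sum(row[j] for j in S)
        c_V = sum(row)
        total += min(c_S, alpha * c_V)
    return total

def diversity(S, partition, rewards):
    """R(S) = sum over blocks of sqrt(sum of rewards of selected members)."""
    return sum(math.sqrt(sum(rewards[j] for j in block if j in S)) for block in partition)

def objective(S, w, alpha, partition, rewards, lam):
    """F(S) = L(S) + lam * R(S)."""
    return coverage(S, w, alpha) + lam * diversity(S, partition, rewards)

# Hypothetical similarity matrix, partition and rewards for three sentences.
w = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.2],
     [0.0, 0.2, 1.0]]
partition = [{0, 1}, {2}]
rewards = [0.5, 0.5, 0.4]

print(coverage({0}, w, alpha=0.5))  # 1.25
print(objective({0, 2}, w, 0.5, partition, rewards, lam=1.0))
```

Because each term of $\mathcal{L}$ saturates at $\alpha C_i(V)$, once a sentence $i$ is "covered enough", further similar sentences add nothing for it, which leaves room for the diversity term to steer selection.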
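Table: ROUGE-2 recall (R) and F-measure (F) results (%)
System | R | F
$\mathcal{L}_1(S) + \lambda \mathcal{R}_Q(S)$ | 7.82 | 7.72
$\mathcal{L}_1(S) + \sum_{\kappa=1}^{3} \lambda_\kappa \mathcal{R}_{Q,\kappa}(S)$ | 8.19 | 8.13
Daumé III and Marcu (2006) | 6.98 | -
… | 8.02 | -
… | 7.44 | 7.43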
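Table: ROUGE-2 recall (R) and F-measure (F) results (%)
System | R | F
$\mathcal{L}_1(S) + \lambda \mathcal{R}_Q(S)$ | 9.75 | 9.77
$\mathcal{L}_1(S) + \sum_{\kappa=1}^{3} \lambda_\kappa \mathcal{R}_{Q,\kappa}(S)$ | 9.81 | 9.82
Celikyilmaz and Hakkani-Tür (2010) | 9.10 | -
… | 9.30 | -
… | 9.51 | 9.51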
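Table: ROUGE-2 recall (R) and F-measure (F) results (%)
System | R | F
$\mathcal{L}_1(S) + \lambda \mathcal{R}_Q(S)$ | 12.18 | 12.13
$\mathcal{L}_1(S) + \sum_{\kappa=1}^{3} \lambda_\kappa \mathcal{R}_{Q,\kappa}(S)$ | 12.38 | 12.33
Toutanova et al. (2007) | 11.89 | 11.89
Haghighi and Vanderwende (2009) | 11.80 | -
…ur (2010) | 11.40 | -
… | 12.45 | 12.29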
Summary