Summarization
Chan Young Park – CMU
Slides adapted from: Dan Jurafsky – Stanford; Piji Li – Tencent AI Lab
Text Summarization
Goal: produce an abridged version of a text that contains information that is important
IBM Journal of Research and Development. 2:2, 159-165.
Conroy, Schlesinger, and O’Leary 2006 (could learn more complex weights)
Rada Mihalcea, ACL 2004
Given: a labeled training set of good summaries for each document
Align: the sentences in the document with the sentences in the summary
Extract features:
  position (first sentence?)
  length of sentence
  word informativeness, cue phrases
  cohesion
Train: a binary classifier (put sentence in summary? yes or no)
Problems:
  hard to get labeled training data
  alignment is difficult
  performance is not better than unsupervised algorithms
In practice, unsupervised content selection is more common
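The supervised recipe above can be sketched in a few lines. This is only an illustration under stated assumptions: the toy document, its labels, the `CUE_PHRASES` list, and the helper names are all hypothetical, logistic regression is just one possible binary classifier, and only three of the listed features (position, length, cue phrases) are implemented.

```python
import math

# Hypothetical cue phrases; a real system would use a curated list.
CUE_PHRASES = ("in summary", "in conclusion")

def sentence_features(sentences, index):
    """Map one sentence to a small feature vector (an illustrative subset)."""
    text = sentences[index].lower()
    return [
        1.0,                                          # bias term
        1.0 if index == 0 else 0.0,                   # position: first sentence?
        len(text.split()) / 20.0,                     # length, roughly scaled
        1.0 if any(c in text for c in CUE_PHRASES) else 0.0,  # cue phrase present?
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=500, lr=0.5):
    """Fit logistic-regression weights with plain stochastic gradient ascent."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            for j, xi in enumerate(x):
                weights[j] += lr * (y - p) * xi
    return weights

def in_summary(weights, x):
    """Binary decision: put this sentence in the summary?"""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x))) >= 0.5

# Toy training document; label 1 means the sentence was aligned to the summary.
doc = [
    "Storms hit the coast on Monday.",
    "Many people stayed home.",
    "Shops were closed.",
    "In summary, the storm caused major damage.",
]
labels = [1, 0, 0, 1]

examples = [(sentence_features(doc, i), labels[i]) for i in range(len(doc))]
weights = train(examples)
predictions = [in_summary(weights, x) for x, _ in examples]
```

Evaluating on the training sentences, as done here, only checks that the sketch runs end to end; it says nothing about how such a classifier compares with unsupervised selection.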
Have N humans produce a set of reference summaries of a document D
Run system, giving automatic summary X
What percentage of the bigrams from the reference summaries appear in X?
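The bigram question above is a recall computation, and a minimal sketch makes it concrete. The sketch assumes lowercased whitespace tokenization and clipped bigram counts pooled over all references; the function names are illustrative and do not come from any official ROUGE toolkit.

```python
from collections import Counter

def bigrams(text):
    """Count the bigrams of a text using naive whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))

def rouge_2_recall(references, candidate):
    """Fraction of reference bigrams that also appear in the candidate.

    Matches are clipped: a bigram counts at most as often as it occurs
    in the candidate summary.
    """
    cand_counts = bigrams(candidate)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = bigrams(ref)
        total += sum(ref_counts.values())
        matched += sum(min(n, cand_counts[bg]) for bg, n in ref_counts.items())
    return matched / total if total else 0.0

references = ["the cat sat on the mat", "a cat sat on the mat"]
system_summary = "the cat sat on a mat"
score = rouge_2_recall(references, system_summary)  # 5 of 10 reference bigrams match
```

Because the measure is recall-oriented, a longer system summary can only raise the score, which is why evaluations typically fix the summary length.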
Lin and Hovy 2003
Rush et al., EMNLP 2015
Nallapati et al., CoNLL 2016