Summarization
Chan Young Park – CMU Slides adapted from: Dan Jurafsky – Stanford Piji Li – Tencent AI Lab
Algorithms for NLP
Text Summarization
Goal: produce an abridged version of a text that contains information that is important
How to detect salient words/sentences?
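One simple way to approach this question is to score each sentence by how frequent its content words are in the document. The sketch below is only an illustration under that assumption, not the method from the slides: the stopword list, the averaging scheme, and the function names `score_sentences` and `top_sentences` are all hypothetical choices.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "on", "it", "was"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_sentences(sentences):
    """Score each sentence by the average document frequency of its content words."""
    doc_counts = Counter(
        w for s in sentences for w in tokenize(s) if w not in STOPWORDS
    )
    scores = []
    for s in sentences:
        words = [w for w in tokenize(s) if w not in STOPWORDS]
        # Averaging keeps long sentences from winning on length alone.
        score = sum(doc_counts[w] for w in words) / len(words) if words else 0.0
        scores.append(score)
    return scores


def top_sentences(sentences, k=1):
    """Return the k highest-scoring sentences, in their original document order."""
    scores = score_sentences(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

Raw frequency favors words that are common everywhere, which is why weighting schemes that discount such words are usually preferred over plain counts.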
Luhn 1958, IBM Journal of Research and Development 2:2, 159-165.
Conroy, Schlesinger, and O’Leary 2006 (could learn more complex weights)
Rada Mihalcea, ACL 2004
Example of a CNN News Article With Highlights from cnn.com
Supervised content selection
Given: a labeled training set of good summaries for each document
Align: the sentences in the document with sentences in the summary
Extract features: position (first sentence?), length of sentence, word informativeness, cue phrases, cohesion
Train: a binary classifier (put sentence in summary? yes or no)
Problems: hard to get labeled training data; alignment is difficult; performance is not better than unsupervised algorithms
In practice, unsupervised content selection is more common
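The alignment and feature-extraction steps of the supervised approach can be sketched as follows. This is a minimal illustration under stated assumptions: alignment is approximated with Jaccard word overlap and a hypothetical threshold of 0.5, the cue-phrase list is invented for the example, and only three of the listed features (position, length, cue phrases) are computed. The resulting labels and feature dictionaries would then be passed to any binary classifier, which is not shown here.

```python
import re

# Hypothetical cue-phrase list (an assumption for illustration).
CUE_PHRASES = ("in conclusion", "in summary", "overall")


def tokens(text):
    """Return the set of lowercase alphabetic tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def align_labels(doc_sentences, summary_sentences, threshold=0.5):
    """Label a document sentence 1 if its Jaccard word overlap with any
    summary sentence reaches the threshold, else 0."""
    labels = []
    for d in doc_sentences:
        d_tok = tokens(d)
        best = 0.0
        for s in summary_sentences:
            s_tok = tokens(s)
            union = d_tok | s_tok
            if union:
                best = max(best, len(d_tok & s_tok) / len(union))
        labels.append(1 if best >= threshold else 0)
    return labels


def features(doc_sentences):
    """Compute position, length, and cue-phrase features for each sentence."""
    feats = []
    for i, sent in enumerate(doc_sentences):
        lower = sent.lower()
        feats.append({
            "is_first": 1 if i == 0 else 0,
            "length": len(sent.split()),
            "has_cue": 1 if any(c in lower for c in CUE_PHRASES) else 0,
        })
    return feats
```

The overlap threshold illustrates why alignment is difficult: human-written summaries often paraphrase, so a sentence that carries the same content may share few words with the summary and receive the wrong label.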
Given a document D, have N humans produce a set of reference summaries of D
Run system, giving automatic summary X
What percentage of the bigrams from the reference summaries appear in X?
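The bigram question above is a recall computation, commonly referred to as ROUGE-2. A minimal sketch follows, assuming whitespace tokenization, lowercasing, and clipped counts (a reference bigram is matched at most as many times as it occurs in X). The function name `rouge_2_recall` is a hypothetical choice, and this is not a substitute for an official scoring tool.

```python
from collections import Counter


def bigrams(text):
    """Count adjacent word pairs, using lowercase whitespace tokenization."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))


def rouge_2_recall(references, candidate):
    """Fraction of reference bigrams (summed over all references) that also
    appear in the candidate summary, with clipped counts."""
    cand = bigrams(candidate)
    matched = 0
    total = 0
    for ref in references:
        ref_bigrams = bigrams(ref)
        total += sum(ref_bigrams.values())
        # Clip each match at the number of times the bigram occurs in the candidate.
        matched += sum(min(count, cand[bg]) for bg, count in ref_bigrams.items())
    return matched / total if total else 0.0
```

For example, with the references "the cat sat on the mat" and "a cat sat on a mat" and the system summary "the cat sat on a mat", 7 of the 10 reference bigrams are matched, giving a score of 0.7.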
Lin and Hovy 2003
Rush et al., EMNLP 2015
Nallapati et al., CoNLL 2016
See et al., ACL 2017