Discourse and Summarization
- Prof. Sameer Singh
CS 295: STATISTICAL NLP WINTER 2017
March 16, 2017
Based on slides from Dan Jurafsky, Jacob Eisenstein, and everyone else they copied from.
Upcoming: Final report due in a week: March 20, 2017
Project
Coreference: Resolving entities and events.
Coherence: What makes the text coherent?
Relations: Rhetorical and narrative links between units.
A meaningless sentence can be grammatical:
Colorless green ideas sleep furiously.
The discourse equivalent of grammaticality is coherence.
Can a coherent text be without meaning?
The second reason for the five-paragraph theme is that it makes you focus on a single topic. Some people start writing on the usual topic, like TV commercials, and they wind up all over the place, talking about where TV came from or capitalism or health foods or whatever. But with [...] get beyond your original idea, like commercials are a good source of information about products. You give your three examples, and zap! you’re done. This is another way the five-paragraph theme keeps you from thinking too much.
Segmentation Zoning/Ordering Centering/Salience
world’s learning would be egregious.
the world is how to reduce college costs.
due to the luxuries students now expect.
result of athletics.
Sentence Ordering: When generating summaries, reorder until the sentences are coherent.
Readability Assessment: Is a piece of text easily readable?
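The sentence-ordering application can be illustrated with a toy search over orderings. This is only a sketch under strong assumptions: coherence is approximated by Jaccard word overlap between adjacent sentences (a crude lexical proxy, not a method from the lecture), and the helper names `words`, `coherence`, and `best_ordering` are hypothetical.

```python
from itertools import permutations

def words(sentence):
    # Lowercased word set with surrounding punctuation stripped.
    return {w.strip(".,!?;:\"'").lower() for w in sentence.split()} - {""}

def coherence(ordering):
    # Assumed proxy: sum of Jaccard overlaps between adjacent sentences.
    total = 0.0
    for a, b in zip(ordering, ordering[1:]):
        wa, wb = words(a), words(b)
        union = wa | wb
        total += len(wa & wb) / len(union) if union else 0.0
    return total

def best_ordering(sentences):
    # Exhaustive search over permutations; only feasible for a few sentences.
    return list(max(permutations(sentences), key=coherence))
```

Because the overlap score is symmetric, an ordering and its reverse score the same, so a real system would need directional cues (for example entity salience or discourse relations) to choose between them.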
Goal: produce an abridged version of a text that contains information that is important or relevant to a user.
Summarization Applications
construct the answer
1. content selection: choose sentences to extract from the document
2. information ordering: choose an order to place them in the summary
3. sentence realization: clean up the sentences
[Figure: summarization pipeline. Document -> Content Selection (Sentence Segmentation -> all sentences from documents -> Sentence Extraction -> extracted sentences) -> Information Ordering -> Sentence Realization (Sentence Simplification) -> Summary]
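The three stages above can be sketched as a toy extractive summarizer. This is a minimal sketch, not the method from the lecture: it assumes a regex sentence splitter, uses average word frequency as a stand-in for content selection, keeps document order for information ordering, and only normalizes whitespace for sentence realization. All function names are hypothetical.

```python
import re
from collections import Counter

def split_sentences(text):
    # Naive segmentation: split after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def tokenize(sentence):
    return re.findall(r"[a-z0-9']+", sentence.lower())

def select_content(sentences, k):
    # Content selection (assumed scoring): average document frequency of the words.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(s):
        toks = tokenize(s)
        return sum(freq[w] for w in toks) / len(toks) if toks else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return ranked[:k]

def order_information(indices):
    # Information ordering (assumed): keep the original document order.
    return sorted(indices)

def realize(sentences):
    # Sentence realization (assumed): only collapse runs of whitespace.
    return " ".join(re.sub(r"\s+", " ", s) for s in sentences)

def summarize(text, k=2):
    sentences = split_sentences(text)
    chosen = order_information(select_content(sentences, k))
    return realize([sentences[i] for i in chosen])
```

For instance, `summarize("Cats sleep. Cats eat fish. Dogs bark loudly at night.", k=2)` keeps the two sentences that share the frequent word "cats".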
Rajam, 28, an artist who was living at the time in Philadelphia, found the inspiration in the back of city magazines.
Rebels agreed to talks with government officials, international observers said Tuesday.
The commercial fishing restrictions in Washington will not be lifted unless the salmon population increases [PP to a sustainable number]
“For example”, “On the other hand”, “As a matter
Zajic et al. (2007), Conroy et al. (2006), Vanderwende et al. (2007)
Simplest method: parse sentences, use rules to decide which modifiers to prune (more recently a wide variety of machine-learning methods)
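A rough illustration of rule-based pruning follows. It is a sketch under stated assumptions: instead of parsing, it uses surface patterns, the list of initial adverbials is a hypothetical hand-written one drawn from the complete phrases in the examples above, and the attribution rule only recognizes a trailing clause containing "said". The names `drop_initial_adverbial`, `drop_attribution`, and `simplify` are made up for this example.

```python
import re

# Hypothetical hand-written list of initial adverbials to strip.
INITIAL_ADVERBIALS = ("For example", "On the other hand")

# Trailing attribution clause such as ", international observers said Tuesday."
ATTRIBUTION = re.compile(r",\s+[^,]*\bsaid\b[^,]*\.$")

def drop_initial_adverbial(sentence):
    for phrase in INITIAL_ADVERBIALS:
        prefix = phrase + ","
        if sentence.startswith(prefix):
            rest = sentence[len(prefix):].lstrip()
            # Re-capitalize the new first word.
            return rest[:1].upper() + rest[1:]
    return sentence

def drop_attribution(sentence):
    # Replace the trailing attribution clause with a plain full stop.
    return ATTRIBUTION.sub(".", sentence)

def simplify(sentence):
    return drop_attribution(drop_initial_adverbial(sentence))
```

A parser-based version would instead decide per constituent (appositive, PP, attribution clause) whether to prune it, which avoids the brittleness of string patterns.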
Intrinsic metric for automatically evaluating summaries
Given a document D, and an automatic summary X:
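\[
\text{ROUGE-2} = \frac{\sum_{S \in \{\text{RefSummaries}\}} \sum_{\text{bigrams } i \in S} \min\big(\text{count}(i, X), \text{count}(i, S)\big)}{\sum_{S \in \{\text{RefSummaries}\}} \sum_{\text{bigrams } i \in S} \text{count}(i, S)}
\]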
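The ROUGE-2 definition above can be computed directly from bigram counts. The following is a minimal sketch, assuming lowercased whitespace tokenization (real ROUGE implementations add options such as stemming and stopword removal); `bigrams` and `rouge_2` are hypothetical helper names.

```python
from collections import Counter

def bigrams(text):
    # Assumed tokenization: lowercase and split on whitespace.
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))

def rouge_2(candidate, references):
    # candidate is the automatic summary X; references are the reference summaries S.
    cand = bigrams(candidate)
    matched = 0
    total = 0
    for ref in references:
        for bigram, count in bigrams(ref).items():
            matched += min(cand[bigram], count)  # clipped match count
            total += count                       # all reference bigrams
    return matched / total if total else 0.0
```

The denominator counts reference bigrams only, so the score is recall-oriented: a longer automatic summary is never penalized for extra bigrams.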
[Figure: question answering pipeline. Question -> Question Processing (Query Formulation, Answer Type Detection) -> Document Retrieval over indexed documents -> relevant docs -> Passage Retrieval -> passages -> Answer Processing -> Answer]
[Figure: multi-document summarization pipeline. Documents -> Content Selection (Sentence Segmentation -> all sentences from documents -> Sentence Extraction -> extracted sentences) -> Information Ordering -> Sentence Realization (Sentence Simplification) -> Summary]