Summarizing Contrastive Viewpoints in Opinionated Text
MICHAEL PAUL* CHENGXIANG ZHAI ROXANA GIRJU
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
* NOW AT JOHNS HOPKINS UNIVERSITY
Saturday, October 9, 2010
948 verbatim responses from a Gallup opinion phone survey (March 2010)
45% for, 48% against
Editorials about the Israel-Palestine conflict
Introduced by Lin et al. (2006)
312 articles by Israeli authors, 282 articles by Palestinian authors
No alignment of sentences in “macro” summary
Micro-contrastive summarization
Pairs of contradictory sentences, e.g., “the battery life is pretty good” vs. “battery life sucks”
Macro-contrastive summarization
e.g. product reviews for two different products; summarize
Unsupervised modeling of viewpoints
Summarize in a way that highlights contrast (we’ll describe this stage first)
Comparative LexRank; graph-based approach
Healthcare corpus
Unsupervised viewpoint clustering
Bitterlemons corpus
If z = 0, jump to the same viewpoint
If z = 1, jump to the opposite viewpoint
z controls which set of nodes can be transitioned to: multiply the similarity by 0 between a node and any node it can’t jump to
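Reading the jump rule above as a transition probability, one way to write the step from sentence i to sentence j is sketched below. The notation is an assumption, not taken from the slides: v_i is sentence i's viewpoint, sim(i, j) is the sentence similarity, and λ = P(z = 0) is the probability of a same-viewpoint jump (consistent with λ = 1 meaning "LexRank on each viewpoint independently").

```latex
P(i \to j) = \lambda \,
  \frac{\mathrm{sim}(i,j)\,\mathbb{1}[v_j = v_i]}{\sum_{k} \mathrm{sim}(i,k)\,\mathbb{1}[v_k = v_i]}
  + (1-\lambda)\,
  \frac{\mathrm{sim}(i,j)\,\mathbb{1}[v_j \neq v_i]}{\sum_{k} \mathrm{sim}(i,k)\,\mathbb{1}[v_k \neq v_i]}
```

The indicator factors are the "multiply sim by 0" step: each fraction spreads probability only over the nodes that the chosen value of z allows. Standard LexRank additionally mixes in a uniform teleport (damping) term, which is omitted here.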
λ = 1
Equivalent to applying LexRank to each viewpoint independently
λ = 0.5
Even tradeoff between representation of viewpoint and contrast
λ = 0
A viewpoint’s summary will contain sentences that look like the
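A minimal NumPy sketch of a contrast-aware LexRank random walk, assuming the reading of z and λ given above (same-viewpoint jump with probability λ, opposite-viewpoint jump otherwise). The function name `comparative_lexrank`, the zeroed diagonal, the uniform fallback for rows with no reachable neighbor, and the PageRank-style `damping` term are all assumptions added for illustration; this is not presented as the authors' implementation.

```python
import numpy as np


def comparative_lexrank(sim, labels, lam=0.5, damping=0.85, tol=1e-12, max_iter=1000):
    """Score sentences with a contrast-aware LexRank-style random walk (sketch).

    sim:     (n, n) nonnegative sentence-similarity matrix (e.g. cosine)
    labels:  length-n viewpoint label for each sentence
    lam:     probability of a same-viewpoint jump (z = 0); 1 - lam is the
             probability of an opposite-viewpoint jump (z = 1)
    damping: PageRank-style damping factor (an assumption, not from the slides)
    """
    sim = np.array(sim, dtype=float)  # copy, so the caller's matrix is untouched
    labels = np.asarray(labels)
    n = sim.shape[0]
    np.fill_diagonal(sim, 0.0)  # assumption: no self-loops

    same = labels[:, None] == labels[None, :]
    # "Multiply sim by 0" for nodes the chosen jump type cannot reach.
    s_same = sim * same
    s_opp = sim * ~same

    def row_normalize(m):
        totals = m.sum(axis=1, keepdims=True)
        safe = np.where(totals > 0, totals, 1.0)
        # Rows with no reachable neighbor fall back to a uniform jump.
        return np.where(totals > 0, m / safe, 1.0 / n)

    # Mix the two jump types: z = 0 with probability lam, z = 1 otherwise.
    trans = lam * row_normalize(s_same) + (1.0 - lam) * row_normalize(s_opp)
    # Damping keeps the chain ergodic, e.g. when lam = 0 makes it bipartite.
    trans = damping * trans + (1.0 - damping) / n

    # Power iteration for the stationary distribution (row vector p = p @ trans).
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        p_next = p @ trans
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p
```

Under these assumptions, with `lam=1.0` the walk never crosses viewpoints except through the uniform teleport, so within each viewpoint the scores are proportional to running the same walk on that viewpoint alone, which mirrors the λ = 1 case above.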
Prominent reasons found in data as analyzed by humans
Source: http://www.gallup.com/poll/126521/Favor-Oppose-Obama-Healthcare-Plan.aspx
Recall-based evaluation metric compares the system summary against the gold summary
Modification: scale term counts by their prominence in the data
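A sketch of one possible prominence-weighted unigram recall, assuming each gold-summary sentence carries a prominence weight (for example, the share of respondents giving that reason) and that its term counts are scaled by that weight. The function names, the whitespace tokenizer, and the ROUGE-1-style clipping are illustrative assumptions, not the exact metric used in the talk.

```python
from collections import Counter
from typing import List, Tuple


def tokens(text: str) -> List[str]:
    # Naive lowercase whitespace tokenizer (an assumption; real evaluations
    # typically also stem and strip punctuation).
    return text.lower().split()


def prominence_weighted_recall(system: List[str], gold: List[Tuple[str, float]]) -> float:
    """Unigram recall of a system summary against a weighted gold summary.

    gold: (sentence, prominence) pairs. Each gold sentence's term counts are
    scaled by its prominence, so missing a common reason costs more.
    """
    sys_counts = Counter(t for sent in system for t in tokens(sent))
    matched = 0.0
    total = 0.0
    for sentence, weight in gold:
        gold_counts = Counter(tokens(sentence))
        # Clipped overlap: a gold term matches at most as often as it
        # occurs in the system summary.
        overlap = sum(min(c, sys_counts[t]) for t, c in gold_counts.items())
        matched += weight * overlap
        total += weight * sum(gold_counts.values())
    return matched / total if total > 0 else 0.0
```

For example, with gold sentences "cost too high" (weight 0.5) and "coverage for everyone" (weight 0.25), a system summary "The cost is too high" recovers all terms of the heavier reason and none of the lighter one, giving 1.5 / 2.25 ≈ 0.67.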
Always jump to the same viewpoint (λ = 1)
Latent Dirichlet Allocation (LDA)
Each word might depend on the viewpoint/sentiment as well as
Words may depend on both, one or the other, or neither
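A toy generative sketch of the four cases above (a word depends on both factors, on one, or on neither). It assumes the second factor is a topic, that two independent Bernoulli switches decide the dependence, and that word distributions are keyed by (topic, viewpoint) with None meaning "does not depend on this factor". All names here are hypothetical; this illustrates the idea, not the model from the talk.

```python
import random
from typing import Dict, Hashable, Optional, Tuple

# Word distributions are keyed by (topic, viewpoint); None marks a factor
# the word does not depend on, so four key shapes cover all four cases.
Key = Tuple[Optional[Hashable], Optional[Hashable]]


def distribution_key(topic, viewpoint, uses_topic: bool, uses_viewpoint: bool) -> Key:
    return (topic if uses_topic else None, viewpoint if uses_viewpoint else None)


def sample_word(rng: random.Random, dists: Dict[Key, Dict[str, float]],
                topic, viewpoint, p_topic: float, p_viewpoint: float) -> Tuple[str, Key]:
    # Two independent switches (an assumption) decide which factors apply.
    uses_topic = rng.random() < p_topic
    uses_viewpoint = rng.random() < p_viewpoint
    key = distribution_key(topic, viewpoint, uses_topic, uses_viewpoint)
    words = list(dists[key])
    weights = [dists[key][w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0], key
```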
Dependency features make a big difference!
Median clustering accuracy (200 trials): Bag of words:
Best feature set: 70.7%
Median clustering accuracy (50 trials): Bag of words:
Best feature set: 88.1%
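Because the clusters come out unlabeled, clustering accuracy is usually computed under the best one-to-one mapping from cluster ids to gold viewpoints. A sketch of that computation and of taking the median over repeated trials follows; the brute-force search over mappings and the function names are assumptions, suitable for the two-viewpoint setting (it assumes no more clusters than gold labels).

```python
from itertools import permutations
from statistics import median
from typing import Sequence


def clustering_accuracy(pred: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of items whose cluster matches the gold label under the best
    one-to-one mapping of cluster ids to labels (brute force)."""
    clusters = sorted(set(pred))
    labels = sorted(set(gold))
    best = 0
    for perm in permutations(labels, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == g for p, g in zip(pred, gold))
        best = max(best, hits)
    return best / len(gold)


def median_accuracy(runs: Sequence[Sequence[int]], gold: Sequence[int]) -> float:
    # One clustering per trial (e.g. per random initialization).
    return median(clustering_accuracy(p, gold) for p in runs)
```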
Use dependency features
Repeat 10 times, take the model with the best data likelihood
λ = 0.5
Summary length = 6 sentences
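The "repeat 10 times, keep the best likelihood" step can be sketched as a generic restart loop. `fit` is a hypothetical callback that trains one model from a given random seed and returns the model together with its training-data log-likelihood; nothing here is tied to a particular topic-model library.

```python
from typing import Callable, Optional, Tuple, TypeVar

Model = TypeVar("Model")


def best_of_restarts(fit: Callable[[int], Tuple[Model, float]],
                     n_restarts: int = 10) -> Tuple[Optional[Model], float]:
    """Train `n_restarts` models with different seeds and keep the one whose
    data log-likelihood is highest."""
    best_model: Optional[Model] = None
    best_ll = float("-inf")
    for seed in range(n_restarts):
        model, ll = fit(seed)
        if ll > best_ll:
            best_model, best_ll = model, ll
    return best_model, best_ll
```

Keeping the highest-likelihood run is a common way to reduce sensitivity to the random initialization of sampling- or EM-based topic models.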
Measures clustering accuracy and summarization salience
Randomly partition each summary in half for each judge
11 of 12 sentences clustered correctly by TAM
correctly labeled 78% of the summaries
More contrast (smaller λ) worsens this
Achieved large gains in clustering accuracy by using simple but
Showed that rich feature sets can be used with topic models
Introduced the Comparative LexRank algorithm
The same algorithm can be used for macro-level and micro-level summarization
Our random walk formulation based on class membership
We don’t care about the order of the sentences
Simple approach: at each step, add the sentence with the highest score as long as
Repeat until S exceeds user-specified length limit
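A sketch of the greedy selection loop, assuming the length limit is counted in sentences (as in the 6-sentence setting). The condition after "as long as" is cut off in the slides, so it is represented by a hypothetical `is_acceptable(i, S)` callback, for example a redundancy check against sentences already chosen; the function name is also an assumption.

```python
from typing import Callable, List, Sequence


def greedy_select(scores: Sequence[float], max_sentences: int,
                  is_acceptable: Callable[[int, List[int]], bool]) -> List[int]:
    """Repeatedly add the highest-scoring remaining sentence to S.

    is_acceptable(i, S) stands in for the slide's (truncated) condition.
    Sentence order does not matter, so S is returned in selection order.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected: List[int] = []
    for i in ranked:
        if len(selected) >= max_sentences:
            break
        if is_acceptable(i, selected):
            selected.append(i)
    return selected
```

Iterating once over the pre-sorted candidates is equivalent to re-picking the top-scoring remaining sentence at each step, since the scores do not change between steps.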
Created the gold summary by having annotators identify