Opinion Integration Through Semi-supervised Topic Modeling - PowerPoint PPT Presentation




SLIDE 1

Opinion Integration Through Semi-supervised Topic Modeling

Yue Lu and Chengxiang Zhai
University of Illinois at Urbana-Champaign

SLIDE 2

Why Opinion Integration?

  • Web 2.0: a huge amount of opinions
  • What has been said about Hillary Clinton?

190,451 posts; 4,773,658 results

How to digest it all?

SLIDE 3

Two Kinds of Opinions

Expert opinions

  • CNET editor's review
  • Wikipedia article
  • Well-structured
  • Easy to access
  • May be biased
  • Soon outdated

Ordinary opinions

  • Forum discussions
  • Blog articles
  • Fragmented
  • Hard to access
  • Represent the majority
  • Up to date

[Figure labels: 4,773,658 results; 190,451 posts]

How to benefit from both?

SLIDE 4

Research Questions

  • How do we formalize the problem of opinion integration?
  • How do we solve the problem in a general way?
  • How do we evaluate it?
SLIDE 5

Problem Definition

Topic: iPod

Input

  • Expert review with aspects: Design, Battery, Price, ..
  • Weblogs

Output: Integrated Summary

Review Aspects   Similar opinions    Supplementary opinions
Design           cute… tiny…         ..thicker..
Battery          last many hrs       die out soon
Price            could afford it     still expensive

Extra Aspects    Supplementary opinions
iTunes           … easy to use…
warranty         …better to extend..

SLIDE 6

Challenges

[Figure: integrated summary, with similar and supplementary opinions for the review aspects (Design, Battery, Price) and for the extra aspects (iTunes, warranty)]

  • 1. How to align opinions to expert aspects or extra aspects?
  • 2. How to distinguish similar opinions from supplementary ones?
  • 3. How to extract representative opinions with support?
SLIDE 7

Two Major Steps

  • Step 1: opinion sentence retrieval
  • Step 2: opinion integration using probabilistic topic models (3 subtasks)

[Figure: General Weblogs, filtered with Query = “iPod”, give Weblogs on iPod]

SLIDE 8

Subtask 1: Aspect Alignment

Align opinion sentences to aspects

[Figure: opinion sentences from Weblogs are placed under the Review Aspects (Design, Battery, Price) or the Extra Aspects (iTunes, warranty)]

SLIDE 9

Subtask 2: Opinion Separation

Separate similar opinions from supplementary ones

[Figure: within each aspect of the integrated summary (Review Aspects: Design, Battery, Price; Extra Aspects: iTunes, warranty), opinions are split into similar and supplementary]

SLIDE 10

Subtask 3: Opinion Summary

Summarize each block with representative sentences and support

Representative Opinion (RO)

  • Representative sentence:
  • Support = 3

[Figure: integrated summary with Review Aspects (Design, Battery, Price) and Extra Aspects (iTunes, warranty); each block is summarized by ROs]

SLIDE 11

Basic PLSA: Generation Process

Generate a word in a document

[Figure: a word w in a Document is drawn either from the Collection background B, chosen with probability λB (is 0.05, the 0.04, a 0.03, ..), or from one of the Topics θ1, θ2, …, θk, chosen with document-specific weights πd1, πd2, …, πdk. Example topics: battery 0.3, life 0.2, ..; design 0.1, screen 0.05; price 0.2, purchase 0.15]

Topic model = unigram language model = multinomial distribution

[Hofmann 99], [Zhai et al. 04]
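The generation process on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the word probabilities are the partial values shown in the figure (renormalized over the listed words when sampling), while `lambda_b`, `pi_d` and the function name are hypothetical choices, not values from the talk.

```python
import random

# Partial distributions copied from the figure; sampling renormalizes them
# over the listed words only, which is an assumption made for this sketch.
background = {"is": 0.05, "the": 0.04, "a": 0.03}
topics = [
    {"battery": 0.3, "life": 0.2},
    {"design": 0.1, "screen": 0.05},
    {"price": 0.2, "purchase": 0.15},
]

def generate_word(pi_d, lambda_b, rng):
    """Draw one word for a document whose topic weights are pi_d."""
    if rng.random() < lambda_b:
        # With probability lambda_b the word comes from the background model.
        model = background
    else:
        # Otherwise pick topic j with probability pi_d[j], then draw from it.
        j = rng.choices(range(len(topics)), weights=pi_d)[0]
        model = topics[j]
    words = list(model)
    return rng.choices(words, weights=[model[w] for w in words])[0]

rng = random.Random(0)
# Hypothetical document that is mostly about the first (battery) topic.
sample = [generate_word([0.7, 0.2, 0.1], 0.5, rng) for _ in range(10)]
```

Raising `lambda_b` makes the background absorb more of the common words, which leaves the topics freer to model content words.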

SLIDE 12

Basic PLSA: Estimation

Generate a word in a document

[Figure: the same generation diagram (Collection background B with probability λB; Topics θ1, θ2, …, θk with weights πd1, πd2, …, πdk), but every word probability is now unknown: is ?, the ?, a ?; battery ?, life ?, design ?, screen ?, price ?, purchase ?]

[Hofmann 99], [Zhai et al. 04]

Log-likelihood of the collection

Estimated with the Maximum Likelihood Estimator (MLE) through an EM algorithm
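The slide's formula did not survive extraction, so the following is a minimal sketch of EM for a PLSA mixture with a fixed background model, written from the standard formulation rather than from the authors' code. The function names, the toy documents and the choice of estimating the background from collection counts are all assumptions.

```python
import math
import random

def plsa_em(docs, background, k, lambda_b=0.5, iters=30, seed=0):
    """docs: list of {word: count}; background: {word: prob} covering all words."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    theta = []
    for _ in range(k):  # random positive initialization of each topic
        raw = {w: rng.random() + 0.1 for w in vocab}
        s = sum(raw.values())
        theta.append({w: v / s for w, v in raw.items()})
    pi = [[1.0 / k] * k for _ in docs]
    for _ in range(iters):
        counts = [{w: 0.0 for w in vocab} for _ in range(k)]
        new_pi = [[0.0] * k for _ in docs]
        for d, doc in enumerate(docs):
            for w, c in doc.items():
                mix = [pi[d][j] * theta[j][w] for j in range(k)]
                mix_sum = sum(mix)
                if mix_sum == 0.0:
                    continue  # no topic can explain w; leave it to the background
                p_topic = (1 - lambda_b) * mix_sum
                # E-step: probability that w was NOT generated by the background
                not_bg = p_topic / (lambda_b * background[w] + p_topic)
                for j in range(k):
                    resp = c * not_bg * mix[j] / mix_sum
                    counts[j][w] += resp
                    new_pi[d][j] += resp
        # M-step: renormalize the expected counts
        for j in range(k):
            total = sum(counts[j].values()) or 1.0
            theta[j] = {w: v / total for w, v in counts[j].items()}
        pi = [[v / (sum(row) or 1.0) for v in row] for row in new_pi]
    return theta, pi

def log_likelihood(docs, background, theta, pi, lambda_b):
    """Log-likelihood of the collection under the background + topic mixture."""
    ll = 0.0
    for d, doc in enumerate(docs):
        for w, c in doc.items():
            mix = sum(pi[d][j] * theta[j][w] for j in range(len(theta)))
            ll += c * math.log(lambda_b * background[w] + (1 - lambda_b) * mix)
    return ll

# Hypothetical toy collection; the background is estimated from its counts.
docs = [
    {"battery": 3, "life": 2, "the": 4},
    {"battery": 2, "hours": 2, "the": 3},
    {"price": 3, "expensive": 2, "the": 4},
    {"price": 2, "afford": 1, "the": 2},
]
total = sum(c for d in docs for c in d.values())
background = {}
for d in docs:
    for w, c in d.items():
        background[w] = background.get(w, 0.0) + c / total

theta1, pi1 = plsa_em(docs, background, k=2, iters=1)
theta30, pi30 = plsa_em(docs, background, k=2, iters=30)
ll1 = log_likelihood(docs, background, theta1, pi1, 0.5)
ll30 = log_likelihood(docs, background, theta30, pi30, 0.5)
```

Because both runs start from the same seed, comparing `ll1` with `ll30` gives a quick sanity check: EM should never decrease the log-likelihood.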

SLIDE 13

Basic PLSA: Problem?

  • Expert review with aspects: Design.., Battery.., Price.., …
  • Weblogs on iPod
  • Topics extracted by basic PLSA: iPod nano, iPod shuffle, iPod touch, …

Extracted topics may not align with expert review aspects

Solution: conjugate priors (semi-supervised PLSA)

SLIDE 14

Semi-supervised PLSA

Maximum Likelihood Estimation (MLE) + Dirichlet priors = Maximum A Posteriori (MAP) Estimation

[Figure: a word w in a Document is drawn from the Collection background B with probability λB (is 0.05, the 0.04, a 0.03, ..) or, with probability 1 - λB, from the Topics θ1, θ2, …, θk with weights πd1, πd2, …, πdk. Priors r1 and r2, built from words such as battery, life, design, screen, are attached to topics]

  • Confidence in priors
  • Regularization
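One common way to turn the MLE update into a MAP update with a conjugate Dirichlet prior is to add prior pseudo-counts to the expected word counts in the M-step. The sketch below assumes that form; `sigma` (the confidence in the prior) and the function name are hypothetical labels, and the numbers are made up for illustration.

```python
def map_topic_update(expected_counts, prior, sigma):
    """MAP estimate of p(w | topic) with a Dirichlet prior.

    expected_counts: {word: expected count from the E-step}
    prior:           {word: p(w | r_j)}, assumed to sum to 1
    sigma:           confidence in the prior (sigma = 0 gives back the MLE)
    """
    total = sum(expected_counts.values()) + sigma
    vocab = set(expected_counts) | set(prior)
    return {
        w: (expected_counts.get(w, 0.0) + sigma * prior.get(w, 0.0)) / total
        for w in vocab
    }

# Hypothetical numbers: 4 observed pseudo-tokens, prior worth 4 more.
post = map_topic_update(
    {"battery": 3.0, "screen": 1.0},
    {"battery": 0.5, "life": 0.5},
    sigma=4.0,
)
```

With a large `sigma` the topic stays close to the prior; with a small one the data dominate, which is the regularization trade-off the slide mentions.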
SLIDE 15

Subtask 1: Aspect Alignment

[Figure: opinion sentences from Weblogs are placed under the Review Aspects (Design, Battery, Price) or the Extra Aspects (iTunes, warranty)]

SLIDE 16

Subtask 1: Aspect Alignment

  • Align to review aspects
  • Discover extra aspects

[Figure: the semi-supervised PLSA diagram (Collection background B with probability λB; Topics θ1, θ2, …, θk with probability 1 - λB and weights πd1, πd2, …, πdk). Priors r1, r2 are built from “aspect words” (nouns) such as battery, life, design, screen]

Opinion sentence aligned to the most relevant aspect
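A minimal sketch of the last step, assuming that "most relevant" means the aspect with the largest estimated weight πdj for the sentence. The function name, labels and weights are hypothetical.

```python
def align_sentence(pi_d, aspect_labels):
    """Return the label of the aspect with the largest weight in pi_d."""
    best = max(range(len(pi_d)), key=pi_d.__getitem__)
    return aspect_labels[best]

# Hypothetical weights for one opinion sentence over three aspects.
aligned = align_sentence([0.1, 0.7, 0.2], ["Design", "Battery", "iTunes"])
```

Here the sentence would be filed under the second aspect, whether that topic carries a review prior or is an extra aspect.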

SLIDE 17

Subtask 2: Opinion Separation

[Figure: within each aspect of the integrated summary (Review Aspects: Design, Battery, Price; Extra Aspects: iTunes, warranty), opinions are split into similar and supplementary]

SLIDE 18

Subtask 2: Opinion Separation

Sub-Collection: Battery

[Figure: a word w in a Document is drawn from the Collection background B with probability λB (is 0.05, the 0.04, a 0.03, ..) or, with probability 1 - λB, from one of two topics, θsim (similar opinions) and θsupp (supplementary opinions), with weights πd1, πd2. A prior r1 is built from “opinion words” about Battery, such as long, many]

SLIDE 19

Subtask 3: Opinion Summary

Representative Opinion (RO)

  • Representative sentence:
  • Support = 3

[Figure: integrated summary with Review Aspects (Design, Battery, Price) and Extra Aspects (iTunes, warranty); each block is summarized by ROs]

SLIDE 20

Subtask 3: Opinion Summary

Sub-Collection: A Block

[Figure: a word w in a Document is drawn from the Collection background B with probability λB (is 0.05, the 0.04, a 0.03, ..) or, with probability 1 - λB, from the RO clusters θ1, θ2, …, θk with weights πd1, πd2, …, πdk. Each cluster yields a centroid sentence: Centroid Sentence 1, Centroid Sentence 2]

Support = cluster size
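The summary step can be sketched as follows, under two assumptions that the slide does not spell out: each sentence joins the cluster with its largest weight, and the "centroid" sentence is approximated by the member with the highest weight for that cluster. The sentences and weights are hypothetical.

```python
from collections import defaultdict

def summarize_block(sentences, pi):
    """Map each cluster id to (representative sentence, support)."""
    clusters = defaultdict(list)
    for sentence, weights in zip(sentences, pi):
        j = max(range(len(weights)), key=weights.__getitem__)
        clusters[j].append((weights[j], sentence))
    # Support is the cluster size; the representative is the strongest member.
    return {j: (max(members)[1], len(members)) for j, members in clusters.items()}

# Hypothetical block of four sentences and two RO clusters.
sentences = ["lasts many hours", "battery lasts long", "great battery life", "dies out soon"]
pi = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.2, 0.8]]
summary = summarize_block(sentences, pi)
```

In this toy block the first cluster has support 3 and the second has support 1.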

SLIDE 21

Experiment Setup

  • Expert review data:

Topic          Source      # words   # aspects
iPhone         CNET        4434      19
Barack Obama   Wikipedia   312       14

  • Ordinary opinion data:

Topic          Query Terms    # articles   # sentences
iPhone         iPhone         552          3000
Barack Obama   Barack+Obama   639          1000

SLIDE 22

Results: Product (iPhone)

  • Opinion Integration with review aspects

Aspect: Activation
  – Review article: You can make emergency calls, but you can't use any other functions…
  – Similar opinions: N/A
  – Supplementary opinions: … methods for unlocking the iPhone have emerged on the Internet in the past few weeks, although they involve tinkering with the iPhone hardware…
  – Note: Unlock/hack iPhone

Aspect: Battery
  – Review article: rated battery life of 8 hours talk time, 24 hours of music playback, 7 hours of video playback, and 6 hours on Internet use.
  – Similar opinions: iPhone will Feature Up to 8 Hours of Talk Time, 6 Hours of Internet Use, 7 Hours of Video Playback or 24 Hours of Audio Playback
  – Supplementary opinions: Playing relatively high bitrate VGA H.264 videos, our iPhone lasted almost exactly 9 freaking hours of continuous playback with cell and WiFi on (but Bluetooth off).
  – Notes: similar opinions confirm the opinions from the review; supplementary opinions give additional info under real usage

SLIDE 23

Results: Product (iPhone)

  • Opinions on extra aspects

Support 15: You may have heard of iASign … an iPhone Dev Wiki tool that allows you to activate your phone without going through the iTunes rigamarole.
  – Note: Another way to activate iPhone

Support 13: Cisco has owned the trademark on the name "iPhone" since 2000, when it acquired InfoGear Technology Corp., which originally registered the name.
  – Note: iPhone trademark originally owned by Cisco

Support 13: With the imminent availability of Apple's uber cool iPhone, a look at 10 things current smartphones like the Nokia N95 have been able to do for a while and that the iPhone can't currently match...
  – Note: A better choice for smart phones?

SLIDE 24

Results: Product (iPhone)

  • Support statistics for review aspects

[Figure: support per review aspect, annotated with the observations below]

  – People care about price
  – People comment a lot about the unique wi-fi feature
  – Controversy: activation requires contract with AT&T

SLIDE 25

Quantitative Evaluation

  • Goal
    – Evaluate human agreement (how hard is opinion integration?)
    – Evaluate how well our approach reproduces human choices (how well is our method doing?)

  • Method
    – Ask 3 users to perform 3 tasks
    – Tasks designed from the Obama example

SLIDE 26

Task 1: Distinguish Extra Aspects

  • Setup: 34 opinions, 7 extra aspects
  • Result
    – Low human agreement (1/7)
    – Our method recovers 52.4% of user choices on avg.

[Figure: integrated summary with Review Aspects (Design, Battery, Price) and Extra Aspects (iTunes, warranty)]

SLIDE 27

Task 2: Aspect Alignment

  • Mix 27 opinions
  • Label each with one of 14 aspects

Results:

  – Users agree on 13/27 = 48% sentences
  – Our method recovers 10.67/27 = 40% sentences on avg.

[Figure: Review Aspects (Design, Battery, Price) with their opinions]
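The two percentages on this slide follow directly from the reported counts. As a quick check (the helper name is a hypothetical convenience, and rounding to a whole percent is an assumption about how the slide reports the figures):

```python
def percent(part, whole):
    """Share of `whole` covered by `part`, rounded to a whole percent."""
    return round(100 * part / whole)

agreement = percent(13, 27)     # sentences on which the users agree
recovered = percent(10.67, 27)  # sentences recovered by the method, on average
```

The fractional 10.67 is consistent with an average over the 3 users mentioned in the evaluation setup.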
SLIDE 28

Quantitative Evaluation: Task 3

  • Mix one similar opinion with many supplementary opinions
  • Select the one opinion most similar to the review opinion
  • Result: recovers 60% of human choices

[Figure: Review Aspects (Design, Battery, Price) with their opinions]

SLIDE 29

Summary

  • Novel problem: opinion integration
  • Unified approach: semi-supervised probabilistic topic modeling
  • Many potentially interesting applications
  • Future Work
    – More rigorous evaluation
    – More general setup: many expert reviews instead of one
SLIDE 30

Thank you!