Comparing Opinions on the Web. Authors: Bing Liu, Minqing Hu, Junsheng Cheng. (PowerPoint PPT Presentation)



SLIDE 1

Opinion Observer: Analyzing and Comparing Opinions on the Web

Authors Bing Liu, Minqing Hu, Junsheng Cheng Paper Presentation: Asif Salekin

SLIDE 2

Introduction

  • Web: an excellent source of consumer opinions
  • Useful information for customers and product manufacturers
  • Opinion Observer: the proposed system

Outline: Introduction, Technical Tasks, Problem Statement, Prepare a Training Dataset, Association Rule Mining, Post-processing, Extraction of Product Features, Feature Refinement, Mapping to Implicit Features, Grouping Synonyms, Experiments

SLIDE 3

Technical Tasks

  • Identify product features
  • For each feature, identify whether the opinion is positive or negative
  • Review format:
    – Pros
    – Cons
    – Detailed review
  • The paper proposes a technique to identify product features from Pros and Cons in this format


SLIDE 4

Problem Statement

  • Set of products P = {P1, P2, …, Pn}
  • Set of reviews Ri for Pi = {r1, r2, …, rk}
  • rj = {sj1, sj2, …, sjm}: a sequence of sentences
  • A product feature f in rj is an attribute of the product that has been commented on in rj
  • If f appears in rj: explicit feature
    – "The battery life of this camera is too short"
  • If f does not appear in rj but is implied: implicit feature
    – "This camera is too large" (size)


SLIDE 5

Problem Statement

  • Opinion segment of a feature f
    – A set of consecutive sentences that expresses a positive or negative opinion on f
    – "The picture quality is good, but the battery life is short"
  • Positive opinion set of a feature (Pset)
    – The set of opinion segments of f that express positive opinions about f across all reviews of the product
    – Nset (the negative opinion set) is defined similarly
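The Pset/Nset grouping above can be sketched in a few lines. The (feature, text, polarity) triple is a hypothetical representation of an opinion segment, not a format from the paper:

```python
def build_opinion_sets(segments):
    """Group opinion segments by feature and polarity.

    `segments` is a list of (feature, text, polarity) triples, a
    hypothetical input format chosen for illustration."""
    Pset, Nset = {}, {}
    for feature, text, polarity in segments:
        target = Pset if polarity == "positive" else Nset
        target.setdefault(feature, []).append(text)
    return Pset, Nset

# The slide's example sentence split into its two opinion segments
segments = [
    ("picture quality", "The picture quality is good", "positive"),
    ("battery life", "the battery life is short", "negative"),
]
```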


SLIDE 6

Problem Statement

  • Observation: each sentence segment contains at most one product feature. Sentence segments are separated by ':', ',', '.', ';' and 'but'

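Splitting a sentence into segments on these separators can be sketched with a regular expression; this is an illustrative implementation, not the paper's code:

```python
import re

def split_segments(sentence):
    # Segment separators from the slide: ':', ',', '.', ';' and the word "but"
    parts = re.split(r"[:,.;]|\bbut\b", sentence)
    return [p.strip() for p in parts if p.strip()]
```

Applied to the earlier example, "The picture quality is good, but the battery life is short" yields two segments, one per feature.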


SLIDE 7

Prepare a Training Dataset

“Battery usage; included 16MB is stingy”

  • Perform Part-Of-Speech (POS) tagging and remove digits
    – "<N> Battery <N> usage"
    – "<V> included <N> MB <V> is <Adj> stingy"
  • Replace feature words with [feature]
    – "<N> [feature] <N> usage"
    – "<V> included <N> [feature] <V> is <Adj> stingy"
  • Use 3-grams to produce shorter segments
    – "<V> included <N> [feature] <V> is <Adj> stingy" →
      "<V> included <N> [feature] <V> is" and "<N> [feature] <V> is <Adj> stingy"
  • Distinguish duplicate tags
    – "<N1> [feature] <N2> usage"
  • Perform word stemming
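The 3-gram step above can be sketched as a sliding window over the tagged tokens. Keeping segments of at most three tokens unchanged is an assumption made for illustration:

```python
def three_grams(tagged):
    """Slide longer POS-tagged segments into overlapping 3-token pieces.

    Segments of at most three tokens are kept as-is (an assumption)."""
    if len(tagged) <= 3:
        return [tagged]
    return [tagged[i:i + 3] for i in range(len(tagged) - 2)]

# The slide's example: 4 tokens produce the 2 shorter segments shown
seg = [("V", "included"), ("N", "[feature]"), ("V", "is"), ("Adj", "stingy")]
```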


SLIDE 8

Association Rule Mining

  • Association rule mining model
    – I = {i1, …, in}: a set of items, e.g., I = {milk, bread, butter, beer}
    – D: a set of transactions; each transaction consists of a subset of items in I
  • Association rule: X → Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅
    – e.g., {butter, bread} → {milk}
  • The rule has support s in D if s% of transactions in D contain X ∪ Y
    – Support: 1/5 = 0.2, since X ∪ Y occurs in only 1 of the 5 transactions
  • The rule X → Y holds in D with confidence c if c% of the transactions in D that contain X also contain Y
    – Confidence: 0.2/0.2 = 1.0, since 100% of the transactions containing butter and bread also contain milk
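Support and confidence as defined above are straightforward to compute. The transaction database below is a toy set chosen to reproduce the slide's numbers (the slide's actual transactions are not shown):

```python
def support(D, itemset):
    """Fraction of transactions in D that contain every item in `itemset`."""
    return sum(itemset <= t for t in D) / len(D)

def confidence(D, X, Y):
    """Of the transactions containing X, the fraction that also contain Y."""
    return support(D, X | Y) / support(D, X)

# Toy 5-transaction database consistent with support 0.2, confidence 1.0
D = [
    {"milk", "bread"},
    {"butter"},
    {"beer"},
    {"milk", "bread", "butter"},
    {"bread"},
]
```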


SLIDE 9

Association Rule Mining

  • The resulting 3-gram sentence segments, labeled by human annotators, are saved in a transaction file D.
  • Association rule mining finds all rules in the database that satisfy the minimum support and minimum confidence constraints.
  • The association mining system CBA (Liu, B., Hsu, W., Ma, Y. 1998) is used to mine rules.
  • Minimum support: 1%
  • No minimum confidence is used.
  • Some example rules:
    – <N1>, <N2> → [feature]
    – <V>, <N> → [feature]
    – <N1> → [feature], <N2>
    – <N1>, [feature] → <N2>


SLIDE 10

Post-processing

Rules:
  <N1>, <N2> → [feature]
  <V>, <N> → [feature]
  <N1> → [feature], <N2>
  <N1>, [feature] → <N2>

  • Step 1: keep only rules that have [feature] on the RHS
    – Only rules 1 and 2 are needed
  • Step 2: consider the sequence of items on the LHS
    – e.g., "<V>, <N> → [feature]" can have variations such as "<N>, <V> → [feature]"
    – Check each rule against the transaction file to find the possible sequences
    – Remove derived rules with confidence < 50%
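Steps 1 and 2 amount to a filter over the mined rules. The (lhs, rhs, confidence) triple is a hypothetical rule representation chosen for this sketch:

```python
def postprocess(rules, min_conf=0.5):
    """Keep only rules whose RHS is exactly [feature] (step 1) and whose
    confidence is at least 50% (the filter applied in step 2).

    Each rule is a hypothetical (lhs, rhs, confidence) triple."""
    return [(lhs, rhs, conf) for lhs, rhs, conf in rules
            if rhs == ("[feature]",) and conf >= min_conf]

rules = [
    (("<N1>", "<N2>"), ("[feature]",), 0.80),   # kept
    (("<V>", "<N>"), ("[feature]",), 0.40),     # dropped: confidence < 50%
    (("<N1>",), ("[feature]", "<N2>"), 0.90),   # dropped: extra item on RHS
]
```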


SLIDE 11

Post-processing

Rules:
  <N1>, <N2> → [feature]
  <N>, <V> → [feature]

  • Step 3: generate language patterns
    – The rules are turned into language patterns according to the ordering of the items found in step 2 and the feature location:
      <N1> [feature] <N2>
      <N> <V> [feature]


SLIDE 12

Extraction of Product Features


  • Do POS tagging on the new reviews
  • The resulting patterns are used to match and identify candidate features
  • Gaps are allowed in pattern matching
    – e.g., <N1> [feature] <N2> can match "Animals like kind people" with the gap word "like": N1 = Animals, [feature] = kind, N2 = people
  • If a sentence segment satisfies multiple patterns, choose the pattern with the highest confidence
  • If no pattern applies, use nouns or noun phrases as features
  • If a sentence segment has only a single word, e.g., "heavy" or "big", use that word as the feature
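In-order matching with gap words can be sketched as below. Restricting the [feature] slot to nouns is an illustrative assumption, not a rule stated in the paper:

```python
def match_pattern(pattern, tagged):
    """Match pattern items (POS tags or "[feature]") against (tag, word)
    pairs in order, skipping unmatched gap words; return the word that
    fills the [feature] slot, or None if the pattern does not match.

    Assumes the [feature] slot matches a noun (an illustrative choice)."""
    feature, i = None, 0
    for tag, word in tagged:
        if i == len(pattern):
            break
        want = pattern[i]
        if want == "[feature]":
            if tag == "N":
                feature, i = word, i + 1
        elif tag == want:
            i += 1
    return feature if i == len(pattern) else None

# Using the <N> <V> [feature] pattern from the previous slide;
# "slight" is skipped as a gap word
segment = [("N", "speaker"), ("V", "produces"), ("Adj", "slight"), ("N", "noise")]
```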
SLIDE 13

Feature Refinement

Two main mistakes made during extraction:
  – Feature conflict: two or more features in one sentence segment
  – There is a more likely feature in the sentence segment that is not extracted by any pattern
    • e.g., in "slight noise from speaker when not in use", "noise" is found as the feature but not "speaker"
    • How to find this? "speaker" was found as a candidate feature in other reviews, but "noise" never was


SLIDE 14

Feature Refinement

Frequent-noun
  • The generated product features, together with their frequency counts, are saved in a candidate feature list.
  • For each sentence segment, if there are two or more nouns, choose the most frequent noun in the candidate feature list.

Frequent-term
  • For each sentence segment, simply choose the word/phrase (it does not need to be a noun) with the highest frequency in the candidate feature list.
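The frequent-noun strategy reduces to an argmax over the candidate feature counts. The counts below are made-up numbers matching the speaker/noise example:

```python
from collections import Counter

def frequent_noun(segment_nouns, candidate_counts):
    """Frequent-noun strategy: among the nouns in a segment, pick the one
    with the highest count in the candidate feature list."""
    return max(segment_nouns, key=lambda n: candidate_counts.get(n, 0))

# "speaker" appears as a candidate feature in other reviews; "noise" never does
counts = Counter({"speaker": 7, "battery": 5})
```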


SLIDE 15

Mapping to Implicit Features

  • In tagging the training data for mining rules, the mapping of candidate features to their actual features is also tagged.
  • In "<V> included <N> MB <V> is <Adj> stingy", "MB" was tagged as the feature; it is now mapped to "Memory".
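Once learned from the tagged data, the mapping is a simple lookup. The dictionary below is hypothetical, built from the examples on these slides (MB → Memory, "large" → size):

```python
# Hypothetical candidate-to-actual feature mapping learned during tagging
implicit_map = {"MB": "Memory", "large": "Size"}

def map_feature(candidate):
    """Map a candidate feature to its actual feature, if a mapping exists;
    otherwise keep the candidate as-is."""
    return implicit_map.get(candidate, candidate)
```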


SLIDE 16

Grouping Synonyms

  • Group features with similar meanings
    – e.g., "photo", "picture" and "image" all refer to the same feature in digital camera reviews
  • Employ WordNet to check whether any synonym groups/sets exist among the features
  • Use only the top two frequent senses of a word when finding its synonyms
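Once synonym sets are obtained, grouping is a canonicalization step. The hard-coded set below is a stand-in for the WordNet lookup the paper performs:

```python
# Stand-in synonym set; the paper queries WordNet (top two senses) instead
synonym_sets = [{"photo", "picture", "image"}]

canonical = {}
for syn_set in synonym_sets:
    rep = sorted(syn_set)[0]          # pick one member as the group name
    for word in syn_set:
        canonical[word] = rep

def group_feature(feature):
    """Map a feature to its synonym group's representative, if any."""
    return canonical.get(feature, feature)
```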


SLIDE 17

Experiments

  • Training and test review data
    – Manually tagged a large collection of reviews of 15 electronic products from epinions.com
    – 10 products are used as training data to mine patterns; the remaining 5 are used for testing
  • Evaluation measures: recall (r) and precision (p), averaged over reviews:
    r = (1/n) Σi ECi / Ci        p = (1/n) Σi ECi / Ei
    where n is the total number of reviews of a particular product, ECi is the number of extracted features from review i that are correct, Ci is the number of actual features in review i, and Ei is the number of extracted features from review i.
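A per-review averaged recall and precision, reconstructed from the definitions of n, ECi, Ci and Ei on this slide (the paper's exact averaging may differ):

```python
def recall_precision(EC, C, E):
    """Per-review averaged recall and precision.

    EC[i]: correct features extracted from review i
    C[i]:  actual features in review i
    E[i]:  features extracted from review i"""
    n = len(EC)
    r = sum(ec / c for ec, c in zip(EC, C)) / n
    p = sum(ec / e for ec, e in zip(EC, E)) / n
    return r, p
```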


SLIDE 18

Experiments

[Results tables shown as figures on this slide.]

SLIDE 19

Experiments

Observations:

  • The frequent-term strategy gives better results than the frequent-noun strategy
    – Some features are not expressed as nouns, and the POS tagger makes mistakes
  • The results for Pros are better than those for Cons
    – People tend to use similar words such as "excellent", "great" and "good" in Pros; in contrast, the words people use to complain in Cons differ a lot


SLIDE 20

Thank you & Questions?