Opinion Observer: Analyzing and Comparing Opinions on the Web
Authors: Bing Liu, Minqing Hu, Junsheng Cheng
Paper Presentation: Asif Salekin
Introduction
- Web: excellent source of consumer opinions
- Useful information to customers and product manufacturers
- Opinion Observer
Outline
- Introduction
- Technical Tasks
- Problem Statement
- Prepare a Training Dataset
- Association Rule Mining
- Post-processing
- Extraction of Product Features
- Feature Refinement
- Mapping to Implicit Features
- Grouping Synonyms
- Experiments
Technical Tasks
- Identify product features
- For each feature, identify whether the opinion is positive or negative
- Review format:
  - Pros
  - Cons
  - Detailed review
- The paper proposes a technique to identify product features from Pros and Cons in this format
Problem Statement
- Set of products P = {P1, P2, ..., Pn}
- Set of reviews for Pi: Ri = {r1, r2, ..., rk}
- rj = {sj1, sj2, ..., sjm}: a sequence of sentences
- A product feature f in rj is an attribute of the product that has been commented on in rj
- If f appears in rj: explicit feature
  - "The battery life of this camera is too short"
- If f does not appear in rj but is implied: implicit feature
  - "This camera is too large" (size)
Problem Statement
- Opinion segment of a feature f
  - A set of consecutive sentence segments that expresses a positive or negative opinion on f
  - "The picture quality is good, but the battery life is short"
- Positive opinion set of a feature (Pset)
  - The set of opinion segments of f that express positive opinions about f across all reviews of the product
  - The negative opinion set (Nset) is defined similarly
Problem Statement
- Observation: each sentence segment contains at most one product feature
- Sentence segments are separated by ',', '.', ';', ':', 'and', and 'but'
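This segmentation rule can be sketched in Python (a minimal illustration; the paper does not prescribe an implementation):

```python
import re

def split_segments(sentence):
    """Split a review sentence into segments at ',', '.', ';', ':'
    and the conjunctions 'and'/'but' -- each resulting segment is
    assumed to mention at most one product feature."""
    parts = re.split(r"[,.;:]|\band\b|\bbut\b", sentence)
    return [p.strip() for p in parts if p.strip()]

split_segments("The picture quality is good, but the battery life is short")
# → ['The picture quality is good', 'the battery life is short']
```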
Prepare a Training Dataset
"Battery usage; included 16MB is stingy"
- Perform Part-Of-Speech (POS) tagging and remove digits
  - "<N> Battery <N> usage"
  - "<V> included <N> MB <V> is <Adj> stingy"
- Replace feature words with [feature]
  - "<N> [feature] <N> usage"
  - "<V> included <N> [feature] <V> is <Adj> stingy"
- Use 3-grams to produce shorter segments
  - "<V> included <N> [feature] <V> is <Adj> stingy" →
    "<V> included <N> [feature] <V> is" and "<N> [feature] <V> is <Adj> stingy"
- Distinguish duplicate tags
  - "<N1> [feature] <N2> usage"
- Perform word stemming
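The 3-gram step can be sketched as a sliding window over the tagged segment (an illustrative sketch; POS tagging itself is assumed to have been done by an off-the-shelf tagger):

```python
def three_grams(tagged_segment):
    """Slide a window of three (tag, word) pairs over a tagged
    segment; segments of three or fewer pairs are kept as-is."""
    if len(tagged_segment) <= 3:
        return [tagged_segment]
    return [tagged_segment[i:i + 3] for i in range(len(tagged_segment) - 2)]

seg = [("V", "included"), ("N", "[feature]"), ("V", "is"), ("Adj", "stingy")]
three_grams(seg)
# → [[('V', 'included'), ('N', '[feature]'), ('V', 'is')],
#    [('N', '[feature]'), ('V', 'is'), ('Adj', 'stingy')]]
```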
Association Rule Mining
- Association rule mining model:
  - I = {i1, ..., in}: a set of items
    - e.g., I = {milk, bread, butter, beer}
  - D: a set of transactions; each transaction is a subset of the items in I
- Association rule: X → Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅
  - e.g., {butter, bread} → {milk}
- The rule has support s in D if s% of the transactions in D contain X ∪ Y
  - Support: 1/5 = 0.2, since X ∪ Y occurs in only 1 of the 5 transactions
- The rule X → Y holds in D with confidence c if c% of the transactions in D that contain X also contain Y
  - Confidence: 0.2/0.2 = 1.0, since 100% of the transactions containing butter and bread also contain milk
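Both measures can be computed directly. The five transactions below are an invented dataset chosen to reproduce the numbers on the slide (the original example transactions are not shown):

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Of the transactions containing `lhs`, the fraction also containing `rhs`."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

# illustrative transactions: only the first contains both butter and bread
D = [{"milk", "bread", "butter"}, {"beer"}, {"bread"}, {"milk"}, {"butter", "beer"}]
support({"butter", "bread", "milk"}, D)        # → 0.2
confidence({"butter", "bread"}, {"milk"}, D)   # → 1.0
```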
Association Rule Mining
- The labeled sentence (3-gram) segments are saved in a transaction file D
- Association rule mining finds all rules in the database that satisfy minimum support and minimum confidence constraints
- The association mining system CBA (Liu, Hsu and Ma, 1998) is used to mine the rules
- Minimum support: 1%
- No minimum confidence is used
- Some example rules:
  - <N1>, <N2> → [feature]
  - <V>, <N> → [feature]
  - <N1> → [feature], <N2>
  - <N1>, [feature] → <N2>
Post-processing
Rules:
- <N1>, <N2> → [feature]
- <V>, <N> → [feature]
- <N1> → [feature], <N2>
- <N1>, [feature] → <N2>
- Step 1: Keep only rules that have [feature] on the RHS
  - Only rules 1 and 2 are needed
- Step 2: Consider the sequence of items on the LHS
  - e.g., "<V>, <N> → [feature]" can have variations such as "<N>, <V> → [feature]"
  - Check each rule against the transaction file to find the possible sequences
  - Remove derived rules with confidence < 50%
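Step 2 can be sketched as follows: enumerate orderings of the rule's LHS and measure each ordering's confidence against the transaction file (a simplified sketch that treats each transaction as an ordered tag sequence; the `is_subsequence` helper is introduced here for illustration):

```python
from itertools import permutations

def is_subsequence(seq, transaction):
    """True if the tags in `seq` appear in `transaction` in order."""
    it = iter(transaction)
    return all(tag in it for tag in seq)

def ordered_variants(lhs, rhs, transactions, min_conf=0.5):
    """Keep each LHS ordering whose confidence (fraction of matching
    transactions that also contain the RHS) is at least `min_conf`."""
    kept = []
    for perm in permutations(lhs):
        matched = [t for t in transactions if is_subsequence(perm, t)]
        if matched and sum(rhs in t for t in matched) / len(matched) >= min_conf:
            kept.append(perm)
    return kept

# toy transaction file of tagged 3-gram segments
D = [("V", "N", "[feature]"), ("N", "V", "[feature]"), ("N", "V", "Adj")]
ordered_variants(("V", "N"), "[feature]", D)
# → [('V', 'N'), ('N', 'V')]
```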
Post-processing
Rules:
- <N1>, <N2> → [feature]
- <N>, <V> → [feature]
- Step 3: Generate language patterns
  - The rules from step 2 are turned into language patterns according to the ordering of the items and the feature location:
    - <N1> [feature] <N2>
    - <N> <V> [feature]
Extraction of Product Features
- Perform POS tagging on new reviews
- The resulting patterns are used to match and identify candidate features
- Gaps are allowed during pattern matching
  - e.g., "<N1> [feature] <N2>" can match "Animals like kind people", with "like" as the gap and "kind" as the candidate feature
- If a sentence segment satisfies multiple patterns, choose the pattern with the highest confidence
- If no pattern applies, use nouns or noun phrases as features
- If a sentence segment has only a single word, e.g., "heavy" or "big", use that word as the feature
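Pattern matching with gaps can be sketched as below. Letting the [feature] slot accept nouns or adjectives is an assumption made for this illustration, not a rule stated in the paper:

```python
def match_with_gaps(pattern, tagged, max_gap=1):
    """Align a tag pattern (e.g. ["N", "FEATURE", "N"]) with a POS-tagged
    segment, allowing up to `max_gap` skipped words between consecutive
    pattern elements.  Returns the word filling the FEATURE slot, or
    None if the pattern does not match."""
    i, gap, feature = 0, 0, None
    for tag, word in tagged:
        if i >= len(pattern):
            break
        want = pattern[i]
        if want == "FEATURE" and tag in ("N", "Adj"):  # assumption: feature word is a noun/adjective
            feature, i, gap = word, i + 1, 0
        elif want == tag:
            i, gap = i + 1, 0
        elif i > 0:
            gap += 1
            if gap > max_gap:
                return None
    return feature if i == len(pattern) else None

tagged = [("N", "Animals"), ("V", "like"), ("Adj", "kind"), ("N", "people")]
match_with_gaps(["N", "FEATURE", "N"], tagged)  # → "kind" ("like" is the gap)
```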
Feature Refinement
Two main mistakes are made during extraction:
- Feature conflict: two or more features in one sentence segment
- A more likely feature exists in the sentence segment but is not extracted by any pattern
  - e.g., in "slight noise from speaker when not in use", "noise" is found to be the feature, but not "speaker"
  - How to detect this? "speaker" was found as a candidate feature in other reviews, but "noise" never was
Feature Refinement
Frequent-noun strategy
- The generated product features, together with their frequency counts, are saved in a candidate feature list
- For each sentence segment, if there are two or more nouns, choose the noun that is most frequent in the candidate feature list

Frequent-term strategy
- For each sentence segment, simply choose the word/phrase (it need not be a noun) with the highest frequency in the candidate feature list
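The frequent-term strategy reduces to a lookup in the candidate feature list; the counts below are hypothetical, chosen to mirror the "noise"/"speaker" example above:

```python
def frequent_term(segment_words, candidate_freq):
    """Choose the segment word with the highest count in the candidate
    feature list; return None if no word appears in the list."""
    best = max(segment_words, key=lambda w: candidate_freq.get(w, 0))
    return best if candidate_freq.get(best, 0) > 0 else None

# hypothetical counts: "speaker" is a frequent candidate, "noise" is not
freq = {"speaker": 12, "noise": 0}
frequent_term(["slight", "noise", "speaker"], freq)  # → "speaker"
```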
Mapping to Implicit Features
- When tagging the training data for mining rules, we also tag the mapping of candidate features to their actual features
- "<V> included <N> MB <V> is <Adj> stingy"
  - Here, "MB" was tagged as the feature; it is then mapped to the actual feature "Memory"
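The learned mappings amount to a lookup table from candidate features to actual features; only the MB → Memory entry below comes from the slide:

```python
# candidate feature → actual feature, learned from the tagged training data
implicit_map = {"MB": "Memory"}

def map_feature(candidate):
    """Replace a candidate feature with its actual feature when a
    mapping was recorded; otherwise keep the candidate unchanged."""
    return implicit_map.get(candidate, candidate)

map_feature("MB")       # → "Memory"
map_feature("battery")  # → "battery" (no mapping recorded)
```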
Grouping Synonyms
- Group features with similar meanings
  - e.g., "photo", "picture", and "image" all refer to the same feature in digital camera reviews
- Employ WordNet to check whether any synonym groups/sets exist among the features
- Consider only the top two most frequent senses of a word when finding its synonyms
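Grouping can be sketched as merging features that share a synonym set; the toy `synsets` below stand in for the WordNet top-two-sense lookup (an assumption for illustration):

```python
# toy synonym sets standing in for WordNet's top-two-sense lookup
synsets = [{"photo", "picture", "image"}, {"display", "screen"}]

def group_synonyms(features):
    """Put each feature into an existing group when they share a
    synonym set; otherwise start a new singleton group."""
    groups = []
    for f in features:
        for g in groups:
            if any(f in s and g & s for s in synsets):
                g.add(f)
                break
        else:
            groups.append({f})
    return groups

group_synonyms(["photo", "picture", "battery", "screen"])
# → [{'photo', 'picture'}, {'battery'}, {'screen'}]
```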
Experiments
- Training and test review data
  - Manually tagged a large collection of reviews of 15 electronic products from epinions.com
  - 10 products are used as training data to mine the patterns; the rest are used for testing
- Evaluation measures: recall (r) and precision (p)
  - n: the total number of reviews of a particular product
  - ECi: the number of extracted features from review i that are correct
  - Ci: the number of actual features in review i
  - Ei: the number of extracted features from review i
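Given these symbol definitions, the per-review averaged measures can be written as follows (a reconstruction: the equations themselves did not survive extraction, so this is one standard reading consistent with the definitions above):

```latex
r = \frac{1}{n}\sum_{i=1}^{n}\frac{EC_i}{C_i}, \qquad
p = \frac{1}{n}\sum_{i=1}^{n}\frac{EC_i}{E_i}
```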
Experiments
Observations:
- The frequent-term strategy gives better results than the frequent-noun strategy
  - Some features are not expressed as nouns, and the POS tagger makes mistakes
- The results for Pros are better than those for Cons
  - People tend to use similar words such as "excellent", "great", and "good" in Pros; in contrast, the words people use to complain in Cons vary a great deal