

  1. Opinion Observer: Analyzing and Comparing Opinions on the Web
     Authors: Bing Liu, Minqing Hu, Junsheng Cheng
     Paper Presentation: Asif Salekin

  2. Introduction
     • Web: an excellent source of consumer opinions
     • Useful information for customers and product manufacturers

  3. Technical Tasks
     • Identify product features
     • For each feature, identify whether the opinion is positive or negative
     • Review format:
       – Pros
       – Cons
       – Detailed review
     • The paper proposes a technique to identify product features from Pros and Cons in this format

  4. Problem Statement
     • Set of products P = {P1, P2, ..., Pn}
     • Set of reviews for Pi: Ri = {r1, r2, ..., rk}
     • rj = {sj1, sj2, ..., sjm}: a sequence of sentences
     • A product feature f in rj is an attribute of the product that has been commented on in rj
     • If f appears in rj, it is an explicit feature
       – "The battery life of this camera is too short"
     • If f does not appear in rj but is implied, it is an implicit feature
       – "This camera is too large" (size)

  5. Problem Statement
     • Opinion segment of a feature f
       – A set of consecutive sentences that expresses a positive or negative opinion on f
       – "The picture quality is good, but the battery life is short"
     • Positive opinion set of a feature (Pset)
       – The set of opinion segments of f that express positive opinions about f, drawn from all the reviews of the product
       – Nset (the negative opinion set) is defined similarly

  6. Problem Statement
     • Observation: each sentence segment contains at most one product feature
     • Sentence segments are separated by ",", ".", ";", "and", and "but"
     • (Example review with Pros and Cons segments shown on the slide)
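The segmentation rule above can be sketched as a small splitter; the regex and function name below are illustrative, not from the paper:

```python
import re

# The paper's segmentation rule: sentence segments are separated by
# ",", ".", ";" and the words "and"/"but".
SEPARATORS = re.compile(r"[,.;]|\band\b|\bbut\b")

def split_segments(sentence):
    """Split a review sentence into segments, each holding at most one feature."""
    return [s.strip() for s in SEPARATORS.split(sentence) if s.strip()]

print(split_segments("The picture quality is good, but the battery life is short"))
# → ['The picture quality is good', 'the battery life is short']
```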

  7. Prepare a Training Dataset
     "Battery usage; included 16MB is stingy"
     • Perform Part-Of-Speech (POS) tagging and remove digits
       – "<N> Battery <N> usage"
       – "<V> included <N> MB <V> is <Adj> stingy"
     • Replace feature words with [feature]
       – "<N> [feature] <N> usage"
       – "<V> included <N> [feature] <V> is <Adj> stingy"
     • Use 3-grams to produce shorter segments
       – "<V> included <N> [feature] <V> is <Adj> stingy" →
         "<V> included <N> [feature] <V> is"
         "<N> [feature] <V> is <Adj> stingy"
     • Distinguish duplicate tags
       – "<N1> [feature] <N2> usage"
     • Perform word stemming
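A minimal sketch of the preparation steps above, using a hand-written tag lookup in place of a real POS tagger (the lookup table and feature-word set are illustrative; the paper uses an off-the-shelf tagger and labeled feature words):

```python
# Toy stand-ins for a POS tagger and the labeled feature words.
TOY_TAGS = {"battery": "N", "usage": "N", "included": "V",
            "mb": "N", "is": "V", "stingy": "Adj"}
FEATURE_WORDS = {"battery", "mb"}

def prepare(segment):
    """Tag each word and replace feature words with the [feature] marker."""
    tagged = []
    for word in segment.lower().split():
        tag = TOY_TAGS.get(word, "N")
        token = "[feature]" if word in FEATURE_WORDS else word
        tagged.append((tag, token))
    return tagged

def three_grams(tagged):
    """Slide 7: segments longer than 3 words are split into overlapping 3-grams."""
    if len(tagged) <= 3:
        return [tagged]
    return [tagged[i:i + 3] for i in range(len(tagged) - 2)]

print(three_grams(prepare("included MB is stingy")))
# → the two 3-grams "<V> included <N> [feature] <V> is" and
#   "<N> [feature] <V> is <Adj> stingy"
```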

  8. Association Rule Mining
     • Association rule mining model:
     • I = {i1, ..., in}: a set of items
       – e.g., I = {milk, bread, butter, beer}
     • D: a set of transactions; each transaction is a subset of items in I
     • Association rule: X → Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅
       – e.g., rule: {butter, bread} → {milk}
     • The rule has support s in D if s% of transactions in D contain X ∪ Y
       – Support: 1/5 = 0.2, since X ∪ Y occurs in only one of five transactions
     • The rule X → Y holds in D with confidence c if c% of transactions in D that contain X also contain Y
       – Confidence: 0.2/0.2 = 1.0, since 100% of the transactions containing butter and bread also contain milk
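The support and confidence definitions can be reproduced directly; the five transactions below are a hypothetical database chosen to match the slide's 1/5 support figure:

```python
# A toy transaction database matching the slide's running example.
transactions = [
    {"milk", "bread", "butter"},
    {"bread", "beer"},
    {"milk", "beer"},
    {"bread"},
    {"butter", "beer"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """Of the transactions containing lhs, the fraction also containing rhs."""
    return support(lhs | rhs) / support(lhs)

print(support({"butter", "bread", "milk"}))       # → 0.2
print(confidence({"butter", "bread"}, {"milk"}))  # → 1.0
```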

  9. Association Rule Mining
     • The resulting (3-gram) sentence segments, labeled by humans, are saved in a transaction file D
     • Association rule mining finds all rules in the database that satisfy minimum support and minimum confidence constraints
     • The association mining system CBA (Liu, Hsu, Ma 1998) is used to mine rules
     • Minimum support: 1%
     • No minimum confidence is used
     • Some example rules:
       – <N1>, <N2> → [feature]
       – <V>, <N> → [feature]
       – <N1> → [feature], <N2>
       – <N1>, [feature] → <N2>

  10. Post-processing
      Rules:
        <N1>, <N2> → [feature]
        <V>, <N> → [feature]
        <N1> → [feature], <N2>
        <N1>, [feature] → <N2>
      • Step 1: keep only rules that have [feature] on the RHS
        – Only rules 1 and 2 are needed
      • Step 2: consider the sequence of items on the LHS
        – e.g., "<V>, <N> → [feature]" can have variations such as "<N>, <V> → [feature]"
        – Check each derived rule against the transaction file to find the possible sequences
        – Remove derived rules with confidence < 50%
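Step 2 can be sketched as follows; the toy transaction file and the choice to test tag orderings as in-order subsequences are assumptions for illustration:

```python
# Toy transaction file: each transaction is a tag sequence from a 3-gram.
transactions = [
    ["V", "N", "feature"],
    ["N", "V", "feature"],
    ["N", "V", "feature"],
    ["V", "N", "Adj"],
    ["V", "N", "Adj"],
]

def has_subseq(trans, seq):
    """True if seq appears in trans in this order (not necessarily contiguous)."""
    it = iter(trans)
    return all(tag in it for tag in seq)

def ordered_confidence(lhs):
    """Of the transactions containing lhs in order, the fraction that also contain [feature]."""
    matches = [t for t in transactions if has_subseq(t, lhs)]
    if not matches:
        return 0.0
    return sum("feature" in t for t in matches) / len(matches)

print(ordered_confidence(["N", "V"]))            # → 1.0 (ordering kept)
print(round(ordered_confidence(["V", "N"]), 2))  # → 0.33 (dropped: below 50%)
```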

  11. Post-processing
      Rules:
        <N1>, <N2> → [feature]
        <N>, <V> → [feature]
      • Step 3: generate language patterns
        – Rules are turned into language patterns according to the ordering of the items found in step 2 and the feature location:
          <N1> [feature] <N2>
          <N> <V> [feature]

  12. Extraction of Product Features
      • Perform POS tagging on new reviews
      • The resulting patterns are used to match and identify candidate features
      • Gaps are allowed during pattern matching
        – e.g., <N1> [feature] <N2> can match "Animals like kind people", with "like" as a gap and F: kind
      • If a sentence segment satisfies multiple patterns, choose the pattern with the highest confidence
      • If no pattern applies, use nouns or noun phrases as features
      • If a sentence segment has only a single word (e.g., "heavy", "big"), use that word as the feature
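A sketch of gap-tolerant pattern matching for the slide's example. The restriction that the [feature] slot binds only nouns and adjectives is an assumption added here (so "like" is treated as a gap rather than as the feature); the slide does not spell out this detail:

```python
def match_with_gaps(pattern, tagged):
    """Match a language pattern against a tagged segment, allowing gaps.

    pattern: list of tags, one of which is the literal 'feature'.
    tagged:  list of (tag, word) pairs.
    Returns the word bound to [feature], or None if the pattern fails.
    """
    def rec(pi, ti):
        if pi == len(pattern):
            return []           # whole pattern matched
        if ti == len(tagged):
            return None         # ran out of words
        tag, word = tagged[ti]
        want = pattern[pi]
        # Assumed restriction: [feature] binds only N/Adj words.
        ok = (want == "feature" and tag in ("N", "Adj")) or want == tag
        if ok:
            rest = rec(pi + 1, ti + 1)
            if rest is not None:
                return ([word] if want == "feature" else []) + rest
        return rec(pi, ti + 1)  # skip this word as a gap
    result = rec(0, 0)
    return result[0] if result else None

tagged = [("N", "Animals"), ("V", "like"), ("Adj", "kind"), ("N", "people")]
print(match_with_gaps(["N", "feature", "N"], tagged))  # → 'kind'
```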

  13. Feature Refinement
      Two main mistakes are made during extraction:
      • Feature conflict: two or more features in one sentence segment
      • A more likely feature exists in the sentence segment but is not extracted by any pattern
        – e.g., "slight noise from speaker when not in use": "noise" is found as the feature, but not "speaker"
        – How to detect this? "speaker" was found as a candidate feature in other reviews, but "noise" never was

  14. Feature Refinement
      Frequent-noun strategy
      • The generated product features, together with their frequency counts, are saved in a candidate feature list
      • For each sentence segment, if there are two or more nouns, choose the noun that is most frequent in the candidate feature list
      Frequent-term strategy
      • For each sentence segment, simply choose the word/phrase (it does not need to be a noun) with the highest frequency in the candidate feature list
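The frequent-noun rule reduces to a lookup in the candidate feature list; the counts below are illustrative, picked to reproduce the speaker/noise example from the previous slide:

```python
# Illustrative candidate feature list with frequency counts.
candidate_counts = {"speaker": 12, "noise": 1, "battery": 30}

def refine(segment_nouns):
    """Frequent-noun strategy: keep the segment noun with the highest
    count in the candidate feature list."""
    return max(segment_nouns, key=lambda n: candidate_counts.get(n, 0))

print(refine(["noise", "speaker"]))  # → 'speaker'
```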

  15. Mapping to Implicit Features
      • When tagging the training data for rule mining, the mappings from candidate features to their actual features are also tagged
      • e.g., in "<V> included <N> MB <V> is <Adj> stingy", MB was tagged as the feature; it is now mapped to Memory
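Once tagged, this mapping is just a dictionary lookup at extraction time; the entries below are illustrative (MB → memory is from the slide, the others follow the "heavy"/"big" examples on slide 12):

```python
# Illustrative candidate-to-actual feature mapping learned during tagging.
implicit_map = {"mb": "memory", "heavy": "weight", "big": "size"}

def map_feature(candidate):
    """Map a candidate feature word to its actual (implicit) feature,
    or return it unchanged if no mapping was tagged."""
    return implicit_map.get(candidate.lower(), candidate)

print(map_feature("MB"))  # → 'memory'
```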
