Review Mining Automatically Assessing Review Helpfulness Sanae Sato - PowerPoint PPT Presentation

Overview | Introduction | Methodology | Results
Review Mining
Automatically Assessing Review Helpfulness
Sanae Sato, Haotian He
April 22, 2014

Overview
◮ Introduction: The Issue; Goals for This Issue
◮ Methodology: Define Helpfulness; Ranking; Features; Evaluation
◮ Results: Results; Summary

The Issue
◮ Online reviews vary in quality
◮ Reviews are currently ranked only by recency or product rating, rather than by the relevance of their text
◮ "Helpfulness" is highly relevant information that directly affects customers' decision making, but it is hard to define and to measure

Goals for This Issue
◮ A system for automatically ranking reviews according to helpfulness
◮ An analysis of which classes of features (structural, lexical, syntactic, semantic, and meta-data) are most important for capturing review helpfulness

Define Helpfulness
Formally, given a set of reviews R for a particular product, the task is to rank the reviews according to their helpfulness. They define a review helpfulness function h as:
$$h(r \in R) = \frac{\mathit{rating}_{+}(r)}{\mathit{rating}_{+}(r) + \mathit{rating}_{-}(r)}$$
where rating+(r) is the number of people who found review r helpful and rating−(r) is the number who did not.
Data: Amazon.com reviews for particular electronics products, obtained through the Amazon Web Services API.
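The helpfulness function above can be sketched in a few lines of Python. This is only an illustration: the function name, the vote counts, and the decision to reject reviews with no votes are assumptions, not details taken from the slides.

```python
def helpfulness(helpful_votes: int, unhelpful_votes: int) -> float:
    """h(r) = rating+(r) / (rating+(r) + rating-(r))."""
    total = helpful_votes + unhelpful_votes
    if total == 0:
        # h is undefined without votes; how such reviews are handled
        # is an assumption of this sketch.
        raise ValueError("review has no votes; h is undefined")
    return helpful_votes / total


# Hypothetical vote counts: review id -> (helpful, unhelpful)
reviews = {"r1": (8, 2), "r2": (1, 3), "r3": (5, 5)}

# Rank reviews from most to least helpful.
ranked = sorted(reviews, key=lambda r: helpfulness(*reviews[r]), reverse=True)
print(ranked)
```

Sorting by h in descending order gives the gold-standard ranking that a learned model is later compared against.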

Approach: Ranking System
An SVM regression model with an RBF kernel is used to estimate the function h.
Why choose SVM regression rather than SVM ranking?
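For reference, the RBF kernel named on this slide measures similarity between two feature vectors as exp(-gamma * ||x - z||^2). The sketch below computes only the kernel value, not the SVM regression training itself; the function name and the default gamma are assumptions chosen for illustration.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    if len(x) != len(z):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Identical vectors have similarity 1; similarity decays with distance.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])
apart = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)
print(same, apart)
```

In practice an off-the-shelf SVM regression implementation would evaluate this kernel internally; gamma is a hyperparameter that has to be tuned.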

Approach: Choose Features
What features may affect the assessment of review helpfulness?

Approach: Features
Feature Class: Structural Features
◮ Length (LEN)
◮ Sentential (SEN)
◮ HTML (HTM)
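A minimal sketch of how structural features like these might be computed. The slides do not define the features precisely, so the interpretations here (LEN as token count, SEN as sentence count, HTM as HTML tag count) and the regex-based tokenization are assumptions; the slides mention a parser-based tokenizer and sentence breaker instead.

```python
import re


def structural_features(review_html: str) -> dict:
    """Compute rough LEN / SEN / HTM counts for one review."""
    html_tags = re.findall(r"<[^>]+>", review_html)
    # Strip tags before counting words and sentences.
    text = re.sub(r"<[^>]+>", " ", review_html)
    tokens = re.findall(r"\w+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {"LEN": len(tokens), "SEN": len(sentences), "HTM": len(html_tags)}


features = structural_features(
    "Great player.<br>Battery lasts <b>long</b>! Would buy again."
)
print(features)
```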

Approach: Features
Feature Class: Lexical Features
◮ Unigram (UGR)
◮ Bigram (BGR)
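Unigram and bigram features can be illustrated with a simple counter. This sketch uses raw counts and a lowercase regex tokenizer, both of which are assumptions; the slides do not say how the n-grams were tokenized or weighted.

```python
import re
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count word n-grams (as tuples) in a review."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


review = "Good sound, good price"
unigrams = ngram_counts(review, 1)
bigrams = ngram_counts(review, 2)
print(unigrams)
print(bigrams)
```

Note that this simple tokenizer lets bigrams span punctuation, so ("sound", "good") is counted here even though a comma separates the two words.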

Approach: Features
Feature Class: Syntactic Features
◮ Syntax (SYN)

Approach: Features
Feature Class: Semantic Features
◮ Product-Feature (PRF)
◮ General-Inquirer (GIW)

Approach: Features
Feature Class: Meta-data Features
◮ Stars (STR/STR1/STR2)

Approach: Feature Extraction
For LEN/SEN/UGR/BGR/SYN:
◮ Minipar dependency parser (Lin 1994)
◮ Parser tokenization
◮ Sentence breaker
◮ Syntactic categorization

Approach: Feature Extraction
For PRF:
◮ Developed an automatic way of mining references to product features
◮ Basic approach: turn the user-generated pros/cons lists found on Epinions.com into a feature list, on the assumption that pros/cons lists tend to contain references to the product features that are important
◮ Number of unique product features

Approach: Feature Extraction
For GIW:
◮ Extract sentiment words using the General Inquirer dictionary
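A rough sketch of dictionary-based sentiment word extraction. The two word sets below are tiny hypothetical stand-ins, not entries copied from the actual General Inquirer dictionary, and counting positive and negative words separately is an assumption about how the feature is represented.

```python
import re

# Hypothetical stand-in lexicons; a real system would load the
# General Inquirer word lists instead.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}


def sentiment_counts(text: str) -> dict:
    """Count occurrences of positive and negative lexicon words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "positive": sum(t in POSITIVE for t in tokens),
        "negative": sum(t in NEGATIVE for t in tokens),
    }


counts = sentiment_counts("Great sound but poor battery, really poor.")
print(counts)
```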

Approach: Feature Extraction
For STR:
◮ Created directly from the star rating

Approach: Evaluation
◮ Gold standard: labeled dataset {review, h(review)} for supervised machine learning
◮ Spearman correlation coefficient
◮ Pearson correlation coefficient
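Both evaluation measures can be sketched with the standard library alone. Spearman correlation is computed here as the Pearson correlation of the ranks, with tied values receiving their average rank. The helper names and the sample scores are assumptions made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation; undefined if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks in ascending order; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical gold-standard h values and model predictions.
gold = [0.9, 0.5, 0.7, 0.2]
predicted = [0.8, 0.6, 0.65, 0.1]
print(spearman(gold, predicted), pearson(gold, predicted))
```

Spearman rewards getting the order of reviews right regardless of the predicted values, which fits a ranking task; Pearson also reflects how linearly the predictions track the gold-standard scores.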

Results

Summary
◮ A system for automatically ranking reviews according to helpfulness
They successfully assessed helpfulness and ranked reviews according to it. SVM regression is well suited to learning the helpfulness function for this purpose. The results match the gold standard well, with Spearman correlation coefficients of 0.656 (MP3) and 0.604 (digital cameras).

Summary
◮ An analysis of which classes of features (structural, lexical, syntactic, semantic, and meta-data) are most important for capturing review helpfulness
The three most significant features:
◮ Length of the review
◮ Unigram (UGR)
◮ Product rating
Semantic/sentiment features were subsumed by the simple unigram features. Structural features other than length, and syntactic features, had no significant impact.
