
MRR: an Unsupervised Algorithm to Rank Reviews by Relevance
Vinicius Woloszyn, Henrique D. P. dos Santos, et al.
Department of Computer Science, Federal University of Rio Grande do Sul (UFRGS) and Pontifical Catholic University of Rio Grande do Sul
2017 IEEE/WIC/ACM International Conference on Web Intelligence
Leipzig, August 24, 2017

Introduction
Many works address the problem of ranking documents by their relevance. Most of them rely on supervised algorithms such as classification and regression, e.g., neural networks and SVMs trained on annotated data, using statistical features such as TF-IDF, readability, and POS tags.

Introduction
The quality of the results produced by supervised algorithms depends on a large, domain-dependent training dataset (e.g., review data from Amazon, Yelp, Netflix, or IMDB). Unsupervised methods are an attractive alternative because they avoid the labor-intensive and error-prone task of manually annotating training datasets.

MRR - Ranking documents by their relevance: Graph-based
Vertices are the documents (reviews), and edges are weighted by the similarity between pairs of documents, combining their textual similarity and their rating-score similarity:
$$f(u, v) = \alpha \cdot \mathrm{sim}_{txt}(u, v) + (1 - \alpha) \cdot \mathrm{sim}_{star}(u, v) \quad (1)$$
where α tunes the similarity function, i.e. the relative weight of the textual and star-rating components.
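A minimal Python sketch of Eq. (1), building the review graph as a dense weight matrix. The similarity functions are passed in as callables; `edge_weight`, `build_similarity_graph`, and the review layout (`"t"` for text, `"rs"` for rating score, mirroring the u.t / u.rs notation) are hypothetical names chosen for illustration. Leaving self-pairs (u = v) out of the graph is also an assumption the slides do not settle.

```python
from typing import Any, Callable, Dict, List

Review = Dict[str, Any]  # hypothetical layout: {"t": text, "rs": star rating}
Sim = Callable[[Review, Review], float]


def edge_weight(u: Review, v: Review, alpha: float, sim_txt: Sim, sim_star: Sim) -> float:
    """Eq. (1): alpha-weighted mix of textual and star-rating similarity."""
    return alpha * sim_txt(u, v) + (1 - alpha) * sim_star(u, v)


def build_similarity_graph(reviews: List[Review], alpha: float,
                           sim_txt: Sim, sim_star: Sim) -> List[List[float]]:
    """Dense matrix W with W[i][j] = f(reviews[i], reviews[j]) for i != j.

    The diagonal is left at 0.0, i.e. no self-loops (an assumption).
    """
    n = len(reviews)
    return [[edge_weight(reviews[i], reviews[j], alpha, sim_txt, sim_star) if i != j else 0.0
             for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Toy similarity functions, for illustration only.
    def same_text(u: Review, v: Review) -> float:
        return 1.0 if u["t"] == v["t"] else 0.0

    def star_sim(u: Review, v: Review) -> float:
        return 1.0 - abs(u["rs"] - v["rs"]) / 4.0  # assumes a 1-5 star scale

    reviews = [{"t": "great", "rs": 5}, {"t": "great", "rs": 3}]
    W = build_similarity_graph(reviews, alpha=0.5, sim_txt=same_text, sim_star=star_sim)
    print(W)  # prints [[0.0, 0.75], [0.75, 0.0]]
```

With α = 0.5, the two toy reviews share their text (similarity 1.0) but differ by two stars (similarity 0.5), so their edge weight is 0.5 · 1.0 + 0.5 · 0.5 = 0.75.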

MRR - Ranking documents by their relevance: Similarity Functions
Textual: cosine similarity of the TF-IDF vectors of the two review texts, where u.t is the text of review u:
$$\mathrm{sim}_{txt}(u, v) = \cos\big(\mathrm{tfidf}(u.t), \mathrm{tfidf}(v.t)\big) \quad (2)$$
Stars: Euclidean distance between the rating scores, normalized by min-max scaling, where u.rs is the rating score of review u:
$$\mathrm{sim}_{star}(u, v) = 1 - \frac{|u.rs - v.rs| - \min(rs)}{\max(rs) - \min(rs)} \quad (3)$$
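A minimal pure-Python sketch of the two similarity functions. The whitespace tokenizer, the tf × log(N / df) weighting, and the 1 to 5 star range are assumptions, since the slides name TF-IDF and min-max scaling without fixing a variant. For a single rating score, the Euclidean distance reduces to the absolute difference |u.rs − v.rs|; this sketch reads the min-max scaling as mapping that distance into [0, 1], i.e. 1 − |u.rs − v.rs| / (max(rs) − min(rs)). That reading is an interpretation: Eq. (3) as given also subtracts min(rs) in the numerator.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(texts: List[str]) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors with raw term counts and idf = log(N / df).

    The weighting variant is an assumption; the slides do not specify one.
    """
    docs = [Counter(text.lower().split()) for text in texts]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)  # document frequency
    return [{term: tf * math.log(n / df[term]) for term, tf in doc.items()}
            for doc in docs]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sim_star(u_rs: float, v_rs: float, rs_min: float = 1.0, rs_max: float = 5.0) -> float:
    """1 minus the rating distance, min-max scaled into [0, 1].

    Assumed reading of Eq. (3); the 1-5 default range is also an assumption.
    """
    return 1.0 - abs(u_rs - v_rs) / (rs_max - rs_min)


if __name__ == "__main__":
    vecs = tfidf_vectors(["good battery life", "good battery", "bad screen"])
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
    print(sim_star(4, 5))  # prints 0.75
```

Reviews 0 and 2 share no terms, so their textual similarity is 0.0; reviews 0 and 1 share "good" and "battery", so theirs lies strictly between 0 and 1.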

MRR - Ranking documents by their relevance: Graph Centrality
Hypothesis: a relevant document has a high centrality index, since it is similar to many other documents. The centrality index ranks the vertices by importance, so the most central review is ranked as the most relevant one.
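The centrality step can be sketched with a plain power-iteration PageRank over the graph's adjacency matrix, followed by sorting the reviews by descending score. This is a minimal illustration, not the MRR implementation: the damping factor 0.85, the convergence tolerance, and the uniform redistribution of score from nodes without outgoing edges are assumptions.

```python
from typing import List


def pagerank(W: List[List[float]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """PageRank scores for the graph with (weighted) adjacency matrix W.

    W[i][j] is the weight of edge i -> j. Nodes with no outgoing weight
    spread their score uniformly over all nodes (a common dangling-node fix).
    """
    n = len(W)
    out = [sum(row) for row in W]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        dangling = sum(scores[i] for i in range(n) if out[i] == 0)
        new = []
        for j in range(n):
            incoming = sum(scores[i] * W[i][j] / out[i] for i in range(n) if out[i] > 0)
            new.append((1 - d) / n + d * (incoming + dangling / n))
        if sum(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


if __name__ == "__main__":
    # Review 0 is similar to every other review; reviews 1-3 only to review 0.
    W = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
    s = pagerank(W)
    ranking = sorted(range(len(s)), key=lambda i: -s[i])
    print(ranking[0])  # prints 0: the hub review is ranked most relevant
```

In this star-shaped toy graph, the review connected to all others receives the highest centrality, which is exactly the behavior the hypothesis relies on.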

MRR - Graph-Specific Similarity Threshold: Graph Pruning
Centrality depends on which edges exist between nodes, so the graph is pruned using a minimum similarity between reviews. Let E be the mean similarity in the graph; then
$$W'(u, v) = \begin{cases} 1, & f(u, v) \ge E \cdot \beta \\ 0, & \text{otherwise} \end{cases} \quad (4)$$
where β tunes the pruning threshold.
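A minimal sketch of the pruning rule in Eq. (4). Because E is computed from each graph's own edge weights, the threshold E · β adapts to each product's reviews instead of being one global constant. The function name `prune` is hypothetical, and excluding self-pairs (the diagonal) both from the mean and from the pruned graph is an assumption the slides do not settle.

```python
from statistics import mean
from typing import List


def prune(W: List[List[float]], beta: float) -> List[List[int]]:
    """Eq. (4): keep edge (u, v) with weight 1 only if f(u, v) >= E * beta.

    E is the mean of the off-diagonal entries of W (assumption: no self-loops).
    """
    n = len(W)
    E = mean([W[i][j] for i in range(n) for j in range(n) if i != j])
    return [[1 if i != j and W[i][j] >= E * beta else 0 for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    W = [[0.0, 0.2, 0.8],
         [0.2, 0.0, 0.5],
         [0.8, 0.5, 0.0]]
    # E = 0.5, so beta = 0.9 gives a threshold of 0.45 and drops the 0.2 edge.
    print(prune(W, beta=0.9))  # prints [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
```

Lowering β keeps more edges (β = 0.3 keeps all three here), while raising it leaves only the strongest similarities, which is why β dominates the sensitivity results reported later.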

Main steps of the MRR algorithm
[Figure panels: (A) Similarity Function; (B) Graph-Specific Threshold; (C) PageRank Scores.]
(A) Build a similarity graph G between pairs of documents;
(B) Prune it by removing all edges below the similarity threshold;
(C) Employ PageRank to obtain the centrality scores.

MRR Algorithm
Algorithm 1: MRR(R, α, β): S
  for each u, v ∈ R do
      W[u, v] ← α ∗ sim_txt(u, v) + (1 − α) ∗ sim_star(u, v)
  end for
  E ← mean(W)
  for each u, v ∈ R do
      if W[u, v] ≥ E ∗ β then
          W′[u, v] ← 1
      else
          W′[u, v] ← 0
      end if
  end for
  S ← PageRank(W′)
  return S
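A runnable Python sketch of Algorithm 1 under stated assumptions, not the authors' implementation (their code is at the repository linked at the end). The similarity functions are passed in as callables; self-pairs are excluded from the graph and from E; PageRank uses damping 0.85 and a fixed 200 iterations. The toy reviews and the Jaccard word-overlap similarity are hypothetical stand-ins, the latter used instead of TF-IDF cosine for brevity.

```python
from statistics import mean
from typing import Callable, List, Sequence, TypeVar

R = TypeVar("R")


def pagerank(A: List[List[float]], d: float = 0.85, iters: int = 200) -> List[float]:
    """Power-iteration PageRank; nodes without out-edges spread score uniformly."""
    n = len(A)
    out = [sum(row) for row in A]
    s = [1.0 / n] * n
    for _ in range(iters):
        dangling = sum(s[i] for i in range(n) if out[i] == 0)
        s = [(1 - d) / n
             + d * (dangling / n
                    + sum(s[i] * A[i][j] / out[i] for i in range(n) if out[i] > 0))
             for j in range(n)]
    return s


def mrr(reviews: Sequence[R], alpha: float, beta: float,
        sim_txt: Callable[[R, R], float],
        sim_star: Callable[[R, R], float]) -> List[float]:
    """Sketch of Algorithm 1: one centrality score per review (needs >= 2 reviews)."""
    n = len(reviews)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]  # no self-loops
    # (A) Weighted similarity graph, Eq. (1).
    W = [[0.0] * n for _ in range(n)]
    for i, j in pairs:
        u, v = reviews[i], reviews[j]
        W[i][j] = alpha * sim_txt(u, v) + (1 - alpha) * sim_star(u, v)
    # (B) Graph-specific threshold E * beta, Eq. (4); E is the mean edge weight.
    E = mean(W[i][j] for i, j in pairs)
    W_pruned = [[0.0] * n for _ in range(n)]
    for i, j in pairs:
        W_pruned[i][j] = 1.0 if W[i][j] >= E * beta else 0.0
    # (C) PageRank centrality on the pruned, unweighted graph.
    return pagerank(W_pruned)


if __name__ == "__main__":
    # Hypothetical toy reviews: (text, star rating on a 1-5 scale).
    reviews = [("good battery long life", 5), ("good battery", 5),
               ("long life", 3), ("awful screen", 1)]

    def jaccard(u, v):  # stand-in for TF-IDF cosine
        a, b = set(u[0].split()), set(v[0].split())
        return len(a & b) / len(a | b)

    def stars(u, v):
        return 1 - abs(u[1] - v[1]) / 4

    scores = mrr(reviews, alpha=0.5, beta=0.9, sim_txt=jaccard, sim_star=stars)
    ranking = sorted(range(len(reviews)), key=lambda i: -scores[i])
    print(ranking)  # most relevant review index first
```

In this toy run, only the edges from review 0 to reviews 1 and 2 survive pruning, so review 0 ranks first and the dissimilar review 3 ranks last.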

Experiment Design
Dataset: reviews (rating score and text) of electronics and books from the Amazon website.
Gold standard: human perception of helpfulness, i.e. the fraction of positive votes a review received:
$$h(r \in R) = \frac{\mathrm{vote}^{+}(r)}{\mathrm{vote}^{+}(r) + \mathrm{vote}^{-}(r)} \quad (5)$$
Metric: Normalized Discounted Cumulative Gain at cutoff n (NDCG@n).
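A minimal sketch of the gold standard in Eq. (5) and of NDCG@n. The slides do not give the exact NDCG formulation, so this assumes the common linear-gain form DCG@n = Σ rel_i / log2(i + 1), with the helpfulness score h(r) used as the graded relevance; exponential-gain variants (2^rel − 1) are also widely used and would give different numbers. The function names are hypothetical.

```python
import math
from typing import Sequence


def helpfulness(pos_votes: int, neg_votes: int) -> float:
    """Eq. (5): fraction of positive votes (reviews with no votes must be skipped)."""
    return pos_votes / (pos_votes + neg_votes)


def dcg_at_n(relevances: Sequence[float], n: int) -> float:
    """DCG with linear gain rel_i / log2(i + 1), positions i starting at 1."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances[:n], start=1))


def ndcg_at_n(ranked_relevances: Sequence[float], n: int) -> float:
    """DCG of the predicted order divided by DCG of the ideal (sorted) order."""
    ideal = dcg_at_n(sorted(ranked_relevances, reverse=True), n)
    return dcg_at_n(ranked_relevances, n) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    votes = [(9, 1), (1, 3), (5, 5)]           # hypothetical (positive, negative) votes
    h = [helpfulness(p, q) for p, q in votes]  # [0.9, 0.25, 0.5]
    predicted_order = [2, 0, 1]                # a ranking produced by some method
    rels = [h[i] for i in predicted_order]
    print(ndcg_at_n(rels, 1), ndcg_at_n(rels, 3))
```

Here NDCG@1 is 0.5 / 0.9 ≈ 0.556, because the method placed a review with helpfulness 0.5 first while the most helpful review scores 0.9; a perfect ordering gives 1.0.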

Amazon Dataset
              Electronics          Books
Votes         48.20 (±302.84)      29.71 (±73.58)
Positive      40.12 (±291.99)      20.60 (±64.18)
Negative      8.08 (±22.27)        9.11 (±21.44)
Rating        3.73 (±1.50)         3.41 (±1.54)
Words         350.32 (±402.02)     287.44 (±273.75)
Products      383                  461
Total         19,756               24,234
Table: Profiling of the Amazon dataset.

MRR Evaluation
Experiments: baselines comparison; graph-specific threshold assessment; parameter sensitivity; and run-time performance.

Experiment Design
Baselines:
Tsur et al. (2009), REVRANK: builds a core virtual review from the 200 most frequent words and ranks reviews by their similarity distance to this core;
Wu et al. (2011), PR HS LEN: sentence similarity based on POS tags, PageRank, HITS, and length;
SVM regression: (a) SVM TFIDF, using TF-IDF textual features and the star score; (b) SVM WU, using the same features as Wu et al. (2011).

Relevance Ranking Assessment
Method        NDCG@1     NDCG@5
SVM WU        0.80770    0.91817
SVM TFIDF     0.85539    0.93119
REVRANK       0.66052    0.68172
PR HS LEN     0.72689    0.77131
MRR           0.79877    0.81876
Table: Mean performance on book reviews.
MRR outperformed all unsupervised baselines with statistical significance.

Relevance Ranking Assessment
Method        NDCG@1     NDCG@5
SVM WU        0.76416    0.91535
SVM TFIDF     0.88986    0.94621
REVRANK       0.67903    0.72133
PR HS LEN     0.87434    0.87184
MRR           0.89403    0.89246
Table: Mean performance on electronics reviews.
MRR outperformed all unsupervised baselines with statistical significance.
MRR is comparable to the supervised methods.

Graph-Specific Threshold Assessment
MRR performed better in every evaluated setting when using the graph-specific threshold.

Parameter Sensitivity: α and β
α had a low influence in all settings (4% variation). β produced the highest variation (17%); nevertheless, for 0.8 ≤ β ≤ 0.9, MRR's performance varied by only 6%.

Run-time Assessment
[Figure: time required to produce a ranking for 383 products (log scale).]
MRR has a significantly lower running time.

Final Remarks
Contributions:
an unsupervised method that does not depend on an annotated training set;
faster than other graph-centrality methods;
performs well in different domains (e.g., closed vs. open-ended);
significantly superior to the unsupervised baselines, and comparable to a supervised approach in a specific setting.

Further Work
Next steps: other clustering techniques for the graph; methods to select the most relevant reviews; Segmented Bushy Path, widely explored in text summarization.

Thanks
Thank you! Questions?
Source: https://github.com/vwoloszyn/MRR
Contact: henrique.santos.003@acad.pucrs.br
