

  1. ClustKNN: A Highly Scalable Hybrid Model- & Memory-Based CF Algorithm
     Al Mamunur Rashid, Shyong K. Lam, George Karypis, and John Riedl
     University of Minnesota

  2. Problem Domain
     • Collaborative filtering (CF)-based recommender systems (RS)
     • Issue:
       − Scalability
     Al Mamunur Rashid, WebKDD 2006

  3. Background: Why Recommender Systems?
     • Information overload:
       − More than 1.3 million articles!
       − About 50 million blogs!
       − About 130 million photos!

  4. Background: Why Recommender Systems?
     • One solution:
       − Recommender systems
         ◦ Tools that suggest items of interest based on
           • Users’ expressed preferences
           • Observed behaviors
           • Information about the items
         ◦ Collaborative filtering
           • Recommendations based on like-minded users

  5. Many CF Algorithms So Far…
     • Most of the early ones were kNN-based:
       − GroupLens (1994), Ringo (1995)
     • CF can be viewed as a special regression problem
       − Nearly all statistical and ML approaches can be applied!
     • Classification by Breese et al. (1998): memory-based vs. model-based CF
       − Compared on simplicity, training cost, online prediction cost, and ease of adding new information

  6. Many CF Algorithms So Far…
     • Accuracy:
       − The main focus so far
         ◦ However, how much of a difference in accuracy do users actually perceive?
     • But does it scale?

  7. User-based kNN CF Algorithm
     • Classic memory-based CF
     • Assumption:
       − Linear relationship between two users’ preferences
         ◦ User similarities measured by the Pearson correlation coefficient
     • Works very well
       − Very good accuracy, and explainable to general users
     • Problem: doesn’t scale!
       − O(mn) online cost (n users, m items)
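The user-based kNN prediction on slide 7 can be sketched as below. This is a minimal illustration, not the authors' code: it assumes the common GroupLens-style mean-centred weighted average, keeps only positively correlated neighbours, and uses hypothetical toy data and hypothetical function names (`pearson`, `predict_user_knn`). The loop over every user is where the O(mn) online cost comes from.

```python
import math

# Hypothetical toy data: user -> {item: rating} on a 1-5 scale.
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 3, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2, "m4": 2},
}


def mean(r):
    """Average of one user's ratings."""
    return sum(r.values()) / len(r)


def pearson(ra, rb):
    """Pearson correlation between two users over their co-rated items."""
    common = ra.keys() & rb.keys()
    if len(common) < 2:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mb = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = math.sqrt(sum((ra[i] - ma) ** 2 for i in common)) * math.sqrt(
        sum((rb[i] - mb) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict_user_knn(ratings, active, item, k=20):
    """Predict `active`'s rating of `item` from its k most similar users."""
    ra = ratings[active]
    # Scanning all n users, each over up to m items, is the O(mn) online cost.
    candidates = []
    for user, ru in ratings.items():
        if user != active and item in ru:
            w = pearson(ra, ru)
            if w > 0:  # keep positively correlated neighbours only (an assumption)
                candidates.append((w, user))
    neighbours = sorted(candidates, reverse=True)[:k]
    norm = sum(w for w, _ in neighbours)
    if norm == 0:
        return mean(ra)  # no usable neighbours: fall back to the user's mean
    # Mean-centred, similarity-weighted average of the neighbours' ratings
    dev = sum(w * (ratings[u][item] - mean(ratings[u])) for w, u in neighbours)
    return mean(ra) + dev / norm


print(predict_user_knn(ratings, "alice", "m4"))
```

Here carol's taste is negatively correlated with alice's, so only bob contributes to alice's prediction for `m4`.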

  8. ClustKNN: Proposed Approach
     • Retain the good properties of user-based kNN
     • Make it scale:
       − n users → bisecting k-means clustering → k clusters → take the k centroids → k surrogate users
     • Online cost: O(km) ≅ O(m)
       − (k ≪ m, k ≪ n)
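The online step of slide 8 can be sketched by running the same kNN prediction against the k centroids instead of all n users. This is a hedged sketch: the slides do not say how a centroid handles items that only some cluster members rated, so the per-item average used in the hypothetical `centroid` helper is an assumption, as are the toy clusters and the `n_neighbours` parameter.

```python
import math


def pearson(ra, rb):
    """Pearson correlation between two rating dicts over their co-rated items."""
    common = ra.keys() & rb.keys()
    if len(common) < 2:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mb = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = math.sqrt(sum((ra[i] - ma) ** 2 for i in common)) * math.sqrt(
        sum((rb[i] - mb) ** 2 for i in common)
    )
    return num / den if den else 0.0


def centroid(members):
    """Surrogate user: per-item average over the members who rated that item.
    ASSUMPTION: the slides do not specify how missing ratings are treated."""
    sums, counts = {}, {}
    for r in members:
        for item, v in r.items():
            sums[item] = sums.get(item, 0.0) + v
            counts[item] = counts.get(item, 0) + 1
    return {item: sums[item] / counts[item] for item in sums}


def predict_clustknn(active, surrogates, item, n_neighbours=20):
    """User-based kNN prediction whose neighbour pool is the k surrogate users."""
    base = sum(active.values()) / len(active)
    # Only the k surrogates are scanned, so the online cost is O(km), not O(mn).
    scored = [(pearson(active, s), s) for s in surrogates if item in s]
    scored = [(w, s) for w, s in scored if w > 0]
    scored.sort(key=lambda t: t[0], reverse=True)
    neighbours = scored[:n_neighbours]
    norm = sum(w for w, _ in neighbours)
    if norm == 0:
        return base
    dev = sum(w * (s[item] - sum(s.values()) / len(s)) for w, s in neighbours)
    return base + dev / norm


# Hypothetical clusters, e.g. as produced offline by bisecting k-means.
clusters = [
    [{"m1": 4, "m2": 2, "m3": 3, "m4": 4}, {"m1": 4, "m2": 2, "m3": 3, "m4": 5}],
    [{"m1": 1, "m2": 5, "m3": 2, "m4": 2}, {"m1": 2, "m2": 4, "m3": 1, "m4": 1}],
]
surrogates = [centroid(c) for c in clusters]
alice = {"m1": 4, "m2": 2, "m3": 3}
print(predict_clustknn(alice, surrogates, "m4"))
```

Only the centroids need to be kept for prediction, which is what makes the online cost independent of n.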

  9. ClustKNN: Proposed Approach
     • Bisecting k-means clustering
       − A better variant of k-means
         ◦ Cluster sizes are more uniform
         ◦ Better results found in document clustering (Steinbach 2000)
     • Similarity function:
       − The same one is used in both cluster building and CF
       − The two steps complement each other nicely
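Bisecting k-means, as named on slide 9, starts with one cluster and repeatedly splits a cluster in two with 2-means until k clusters exist. The sketch below is illustrative only: it always splits the largest cluster, keeps the best of several trial splits by sum of squared errors, and uses squared Euclidean distance on dense vectors for simplicity, whereas the slide states that ClustKNN uses the same similarity function as in CF. All function names and the toy points are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vec(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sse(points):
    """Sum of squared distances of the points to their centroid."""
    c = mean_vec(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points, rng, iters=10):
    """Split one cluster in two with plain 2-means (Lloyd iterations)."""
    c1, c2 = rng.sample(points, 2)
    for _ in range(iters):
        left, right = [], []
        for p in points:
            (left if sq_dist(p, c1) <= sq_dist(p, c2) else right).append(p)
        if not left or not right:
            break  # degenerate split; the caller discards it
        c1, c2 = mean_vec(left), mean_vec(right)
    return left, right


def bisecting_kmeans(points, k, trials=5, seed=0):
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len)
        target = clusters.pop()  # split the largest cluster (one common rule)
        if len(target) < 2:
            clusters.append(target)
            break
        best = None
        for _ in range(trials):  # keep the trial split with the lowest SSE
            left, right = two_means(target, rng)
            if left and right:
                cost = sse(left) + sse(right)
                if best is None or cost < best[0]:
                    best = (cost, left, right)
        if best is None:  # e.g. all points identical: split arbitrarily
            mid = len(target) // 2
            best = (0.0, target[:mid], target[mid:])
        clusters.extend([best[1], best[2]])
    return clusters


points = [[1, 1], [1.2, 0.8], [0.9, 1.1], [8, 8], [8.2, 7.9],
          [7.8, 8.1], [1, 8], [1.1, 8.2]]
clusters = bisecting_kmeans(points, k=3)
print(sorted(len(c) for c in clusters))
```

Always splitting the largest cluster is one reason bisecting k-means tends to produce more uniform cluster sizes than plain k-means.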

  10. Other Algorithms Considered

  11. Time Complexities

  12. Experiments: Datasets
      • Movie recommendation data from

  13. Experiments: Evaluation Metrics
      • Prediction evaluation metrics
        − NMAE
          ◦ MAE divided by the expected MAE
          ◦ Limitation: errors of the same magnitude are treated the same
            • No difference between the (prediction, actual) pairs (5, 2) and (2, 5)
        − Expected Utility (EU)
      • Recommendation-list evaluation metrics
        − Precision, recall, F1
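The metrics on slide 13 can be computed as in this sketch. The slide only says that MAE is divided by the expected MAE; here the expected MAE is assumed to be that of independent, uniformly distributed integer ratings and predictions on a 1 to 5 scale (which works out to 1.6). The helper names and the choice of recommended and relevant lists are hypothetical.

```python
from itertools import product


def expected_mae(lo=1, hi=5):
    """MAE expected if actual ratings and predictions are independent and
    uniform over the integer scale lo..hi (ASSUMPTION about 'expected MAE')."""
    scale = range(lo, hi + 1)
    pairs = list(product(scale, scale))
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


def nmae(preds, actuals, lo=1, hi=5):
    """Mean absolute error normalised by the expected MAE."""
    mae = sum(abs(p - a) for p, a in zip(preds, actuals)) / len(preds)
    return mae / expected_mae(lo, hi)


def precision_recall_f1(recommended, relevant):
    """Set-based precision, recall and F1 for one recommendation list."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


print(expected_mae())
print(nmae([4.0, 3.0, 5.0, 2.0], [5, 3, 4, 4]))
# The limitation from the slide: (5, 2) and (2, 5) get the same NMAE.
print(nmae([5], [2]) == nmae([2], [5]))
print(precision_recall_f1(["a", "b", "c", "d"], ["b", "d", "e"]))
```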

  14. Evaluation Metric: EU
      • Two tables:
        − A contingency table
          ◦ Rows: predictions; columns: actual ratings
        − A utility table
          ◦ Filled with a linear utility function
          ◦ Penalizes false positives more than false negatives
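Expected Utility combines the two tables on slide 14: each contingency-table cell count is weighted by the matching utility-table entry, and the result is averaged over all predictions. The slide's actual linear utility function did not survive extraction, so `hypothetical_utility` below is purely illustrative: it is piecewise linear in the error and charges over-prediction (the false-positive-like case) twice as much as under-prediction. Rounding and clamping predictions onto the rating scale is also an assumption.

```python
SCALE = (1, 2, 3, 4, 5)


def contingency(preds, actuals, scale=SCALE):
    """Rows: rounded predicted rating; columns: actual rating."""
    idx = {v: i for i, v in enumerate(scale)}
    table = [[0] * len(scale) for _ in scale]
    for p, a in zip(preds, actuals):
        p = min(max(round(p), scale[0]), scale[-1])  # snap prediction onto the scale
        table[idx[p]][idx[a]] += 1
    return table


def hypothetical_utility(p, a):
    # HYPOTHETICAL stand-in for the slide's (unrecoverable) linear utility:
    # over-prediction costs 2 per rating step, under-prediction costs 1.
    return 5 - 2 * max(p - a, 0) - max(a - p, 0)


def expected_utility(table, utility, scale=SCALE):
    """Average utility: sum of count * utility over all (prediction, actual) cells."""
    total = sum(map(sum, table))
    score = sum(table[i][j] * utility(p, a)
                for i, p in enumerate(scale) for j, a in enumerate(scale))
    return score / total


# Unlike NMAE, EU distinguishes the pairs (5, 2) and (2, 5).
print(expected_utility(contingency([5], [2]), hypothetical_utility))
print(expected_utility(contingency([2], [5]), hypothetical_utility))
```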

  15. Results
      [Figure: ClustKNN vs. user-based kNN as a function of the number of clusters in the model (20 to 500). First panel: Expected Utility (y-axis 6 to 7). Second panel: NMAE (y-axis 0.425 to 0.47).]

  16. Results: Prediction Accuracy

  17. Results: Recommendation List

  18. ClustKNN: Discussion
      • Scalable!
      • Simple and explainable
      • Hybrid of model- and memory-based approaches
      • Great for occasionally connected, low-storage devices!
        − Memory requirement: only O(km + m)!

  19. Thanks for listening! Questions?
