DS504/CS586: Big Data Analytics, Recommender System (Prof. Yanhua Li)

  1. Welcome to DS504/CS586: Big Data Analytics
     Recommender System
     Prof. Yanhua Li
     Time: 6:00pm–8:50pm Thu.
     Location: KH116
     Fall 2017

  2. Example: Recommender Systems
     - Customer X
       - Star Wars I
       - Star Wars II
     - Customer Y
       - Does search on Star Wars I
       - Recommender system suggests Star Wars II from data collected about customer X
     J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  3. Recommendations
     - Items: products, web sites, blogs, news items, …
     - [Figure: items reached through search and through recommendations; example sites shown as images]

  4. From Scarcity to Abundance
     - Shelf space is a scarce commodity for traditional retailers
       - Also: TV networks, movie theaters, …
     - The Web enables near-zero-cost dissemination of information about products
       - From scarcity to abundance, e.g., Amazon, Target online, eBay
     - More choice necessitates better filters
       - Recommendation engines

  5. Types of Recommendations
     - Editorial and hand-curated
       - Lists of favorites
       - Lists of “essential” items
     - Simple aggregates
       - Top 10, Most Popular, Recent Uploads
     - Tailored to individual users
       - Amazon, Netflix, …

  6. Formal Model
     - X = set of customers
     - S = set of items
     - Utility function u: X × S → R
       - R = set of ratings
       - R is a totally ordered set
       - e.g., 0-5 stars, or a real number in [0, 1]

  7. Utility Matrix ("-" marks an unknown rating)

             Avatar  LOTR  Matrix  Pirates
     Alice   1       -     0.2     -
     Bob     -       0.5   -       0.3
     Carol   0.2     -     1       -
     David   -       -     -       0.4
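Because most entries of a utility matrix are unknown, one common way to hold it in memory is a nested dictionary that stores only the known ratings. The following is a minimal Python sketch of that idea, not something taken from the slides: the names `utility` and `known_rating` are illustrative, and the assignment of each value to a particular movie is an assumption made for the example.

```python
# Sparse utility matrix: only known (user, item) ratings are stored.
# Which movie each value belongs to is assumed here for illustration.
utility = {
    "Alice": {"Avatar": 1.0, "Matrix": 0.2},
    "Bob": {"LOTR": 0.5, "Pirates": 0.3},
    "Carol": {"Avatar": 0.2, "Matrix": 1.0},
    "David": {"Pirates": 0.4},
}

def known_rating(user, item):
    """Return the stored rating u(user, item), or None if it is unknown."""
    return utility.get(user, {}).get(item)

items = ["Avatar", "LOTR", "Matrix", "Pirates"]

# Fraction of (user, item) pairs with no rating.
n_known = sum(len(row) for row in utility.values())
sparsity = 1 - n_known / (len(utility) * len(items))
```

Even in this toy case more than half of the cells are empty, which is the situation the later estimation methods have to deal with.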

  8. Key Problems
     - (1) Gathering “known” ratings for the matrix
       - How to collect the data in the utility matrix
     - (2) Estimating unknown ratings from the known ones
       - Mainly interested in high unknown ratings
         - We are not interested in knowing what you don’t like but what you like
     - (3) Evaluating estimation methods
       - How to measure success/performance of recommendation methods

  9. (1) Gathering Ratings
     - Explicit
       - Ask people to rate items
       - Doesn’t work well in practice: people can’t be bothered
     - Implicit
       - Learn ratings from user actions
         - E.g., purchase implies high rating
       - What about low ratings?

  10. (2) Estimating Utilities
      - Key problem: Utility matrix U is sparse
        - Most people have not rated most items
        - Cold start:
          - New items have no ratings
          - New users have no history
      - Approaches to recommender systems:
        - 1) Content-based
        - 2) Collaborative filtering

  11. Content-based Recommender Systems

  12. Content-based Recommendations
      - Main idea: Recommend items to customer x similar to previous items rated highly by x
        - Look at x’s items vs. all items
      Example:
      - Movie recommendations
        - Recommend movies with the same actor(s), director, genre, …
      - Websites, blogs, news
        - Recommend other sites with “similar” content

  13. Plan of Action
      [Figure: pipeline diagram. Labels: likes → item profiles (build) → user profile (build; example features: red, circles, triangles) → match → recommend]

  14. Item Profiles
      - For each item, create an item profile
      - Profile is a set (vector) of features
        - Movies: author, title, actor, director, …
        - Text: set of “important” words in the document

  15. User Profiles and Prediction
      - User profile possibilities:
        - Weighted average of rated item profiles
        - Variation: weight by difference from average rating for item
          $w_x = \sum_{j=1}^{N_x} w_j \, (r_{xj} - \bar{r}_x)$
      - Prediction heuristic:
        - Given user profile $w_x$ and item profile $w_j$, estimate
          $r_{xj} = \cos(w_x, w_j) = \dfrac{w_x \cdot w_j}{\|w_x\| \, \|w_j\|}$
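The profile and prediction formulas on slide 15 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the three binary features, the item names `m1` to `m4`, and the ratings are hypothetical, and the profile is left as an unnormalized sum because cosine similarity does not depend on the length of the vectors.

```python
import math

def user_profile(item_profiles, ratings):
    """Sum of rated item profiles, each weighted by (rating - user's mean rating)."""
    mean = sum(ratings.values()) / len(ratings)
    dim = len(next(iter(item_profiles.values())))
    profile = [0.0] * dim
    for item, r in ratings.items():
        for k, feature in enumerate(item_profiles[item]):
            profile[k] += (r - mean) * feature
    return profile

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0

# Hypothetical binary item profiles: [action, romance, sci-fi]
item_profiles = {
    "m1": [1, 0, 1],
    "m2": [0, 1, 0],
    "m3": [1, 0, 0],
    "m4": [0, 1, 1],
}
ratings = {"m1": 5, "m2": 1, "m3": 4}  # user x has not rated m4

w_x = user_profile(item_profiles, ratings)
score_m4 = cosine(w_x, item_profiles["m4"])  # estimated affinity of x for m4
```

In this toy data the user rates action titles above their own average and the romance title below it, so the profile has a negative romance component and the score for `m4` comes out negative.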

  16. Pros: Content-based Approach
      - +: No need for data on other users
      - +: Able to recommend to users with unique tastes
      - +: Able to recommend new & unpopular items
        - No item cold-start
      - +: Able to provide explanations
        - Can explain a recommendation by listing the content features that caused the item to be recommended

  17. Cons: Content-based Approach
      - –: Finding the appropriate features is hard
        - E.g., images, movies, music
      - –: Recommendations for new users
        - How to build a user profile?
        - User cold-start problem
      - –: Overspecialization
        - Never recommends items outside user’s content profile
        - People might have multiple interests
        - Unable to exploit quality judgments of other users

  18. Collaborative Filtering
      Harnessing quality judgments of other users

  19. Collaborative Filtering
      - Consider user x
      - Find set N of other users whose ratings are “similar” to x’s ratings
      - Estimate x’s ratings based on ratings of users in N

  20. Finding “Similar” Users
      - Let $r_x$ be the vector of user x’s ratings
        - Example: r_x = [*, _, _, *, ***], r_y = [*, _, **, **, _]
      - Jaccard similarity measure
        - r_x, r_y as sets: r_x = {1, 4, 5}, r_y = {1, 3, 4}
        - Problem: ignores the value of the ratings
      - Cosine similarity measure
        - $\mathrm{sim}(x, y) = \cos(r_x, r_y) = \dfrac{r_x \cdot r_y}{\|r_x\| \, \|r_y\|}$
        - r_x, r_y as points: r_x = (1, 0, 0, 1, 3), r_y = (1, 0, 2, 2, 0)
        - Problem: treats missing ratings as negatives
      - Pearson correlation coefficient
        - $\mathrm{sim}(x, y) = \cos(r_x - \bar{r}_x, \, r_y - \bar{r}_y) = \dfrac{(r_x - \bar{r}_x) \cdot (r_y - \bar{r}_y)}{\|r_x - \bar{r}_x\| \, \|r_y - \bar{r}_y\|}$
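The three measures on slide 20 can be compared directly on the example vectors r_x and r_y. The sketch below is one possible reading of the slide, with two assumptions stated up front: a missing rating is stored as `None`, and for the centered (Pearson-style) variant the mean is taken over each user's rated items and a missing rating contributes 0 after centering. The function names are illustrative.

```python
import math

# Ratings of users x and y on five items; None marks a missing rating.
r_x = [1, None, None, 1, 3]
r_y = [1, None, 2, 2, None]

def jaccard(a, b):
    """Overlap of the sets of rated items; rating values are ignored."""
    sa = {i for i, v in enumerate(a) if v is not None}
    sb = {i for i, v in enumerate(b) if v is not None}
    return len(sa & sb) / len(sa | sb)

def cosine(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0

def zero_filled(r):
    """Plain cosine convention: a missing rating becomes 0."""
    return [0.0 if v is None else float(v) for v in r]

def centered(r):
    """Subtract the user's mean rating; missing ratings become 0 (assumed convention)."""
    rated = [v for v in r if v is not None]
    mean = sum(rated) / len(rated)
    return [0.0 if v is None else v - mean for v in r]

sim_jaccard = jaccard(r_x, r_y)
sim_cosine = cosine(zero_filled(r_x), zero_filled(r_y))
sim_centered = cosine(centered(r_x), centered(r_y))
```

With these vectors the three measures come out to 0.5, about 0.30, and about 0.17 respectively.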

  21. Similarity Metric
      - Intuitively we want: sim(A, B) > sim(A, C)
      - Jaccard similarity: 1/5 < 2/4 (the opposite of what we want)
      - Cosine similarity: 0.386 > 0.322
        - Considers missing ratings as “negative”
        - Solution: subtract the (row) mean
      - Notice: cosine similarity is correlation when data is centered at 0

  22. User-User Collaborative Filtering
      - For user u, find other similar users
      - Estimate rating for item i based on ratings from similar users:
        $\mathrm{pred}(u, i) = \dfrac{\sum_{n \in \mathrm{neighbors}(u)} \mathrm{sim}(u, n) \cdot r_{ni}}{\sum_{n \in \mathrm{neighbors}(u)} \mathrm{sim}(u, n)}$
      - sim(u, n): similarity of users u and n
      - $r_{ui}$: rating of user u on item i
      - neighbors(u): set of users similar to user u
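The prediction rule on slide 22 is a similarity-weighted average, which can be sketched as follows. Several details are assumptions rather than part of the slide: the neighbor set is taken to be the `k` most similar users who have rated the item, users with non-positive similarity are skipped, and the ratings and similarity values are hypothetical.

```python
def predict_user_user(u, i, ratings, sim, k=2):
    """Similarity-weighted average of item i's ratings from u's top-k neighbors."""
    # Candidates: other users who rated item i and are positively similar to u.
    candidates = [
        (sim(u, n), r[i])
        for n, r in ratings.items()
        if n != u and i in r and sim(u, n) > 0
    ]
    candidates.sort(reverse=True)  # most similar first
    top = candidates[:k]
    if not top:
        return None  # no usable neighbor, so no prediction
    return sum(s * r for s, r in top) / sum(s for s, _ in top)

# Hypothetical ratings and precomputed similarities.
ratings = {
    "u": {"a": 4},
    "n1": {"a": 5, "b": 4},
    "n2": {"b": 2},
    "n3": {"b": 5},
}
similarity = {("u", "n1"): 0.8, ("u", "n2"): 0.2, ("u", "n3"): -0.5}

def sim(a, b):
    """Symmetric lookup into the hypothetical similarity table."""
    return similarity.get((a, b), similarity.get((b, a), 0.0))

pred = predict_user_user("u", "b", ratings, sim)
```

Here the prediction for item `b` is (0.8·4 + 0.2·2) / (0.8 + 0.2) = 3.6; user `n3` is ignored because its similarity to `u` is negative.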

  23. Item-Item Collaborative Filtering
      - So far: user-user collaborative filtering
      - Another view: item-item
        - For item i, find other similar items
        - Estimate rating for item i based on ratings for similar items
        - Can use the same similarity metrics and prediction functions as in the user-user model:
          $r_{xi} = \dfrac{\sum_{j \in N(i;x)} s_{ij} \cdot r_{xj}}{\sum_{j \in N(i;x)} s_{ij}}$
      - $s_{ij}$: similarity of items i and j
      - $r_{xj}$: rating of user x on item j
      - $N(i;x)$: set of items rated by x that are similar to i

  24. Item-Item CF (|N|=2)
      Rows are movies, columns are users. Ratings are between 1 and 5; "-" marks an unknown rating.

      movie \ user  1  2  3  4  5  6  7  8  9 10 11 12
      1             1  -  3  -  -  5  -  -  5  -  4  -
      2             -  -  5  4  -  -  4  -  -  2  1  3
      3             2  4  -  1  2  -  3  -  4  3  5  -
      4             -  2  4  -  5  -  -  4  -  -  2  -
      5             -  -  4  3  4  2  -  -  -  -  2  5
      6             1  -  3  -  3  -  -  2  -  -  4  -

  25. Item-Item CF (|N|=2)
      - Goal: estimate the rating of movie 1 by user 5 (row 1, column 5 of the matrix on slide 24)

  26. Item-Item CF (|N|=2)
      - Neighbor selection: identify movies similar to movie 1 that were rated by user 5
      - Here we use Pearson correlation as similarity:
        - 1) Subtract mean rating $m_i$ from each movie i
          - $m_1 = (1+3+5+5+4)/5 = 3.6$
          - row 1: [-2.6, 0, -0.6, 0, 0, 1.4, 0, 0, 1.4, 0, 0.4, 0]
        - 2) Compute cosine similarities between rows
      - Resulting sim(1,m): movie 1: 1.00, movie 2: -0.18, movie 3: 0.41, movie 4: -0.10, movie 5: -0.31, movie 6: 0.59

  27. Item-Item CF (|N|=2)
      - Compute similarity weights: $s_{1,3} = 0.41$, $s_{1,6} = 0.59$
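The neighbor selection on slides 26 and 27 can be reproduced with a short script. This is a sketch under one notable assumption: the positions of the unknown entries in the matrix are chosen so that they reproduce the centered row 1 and the sim(1,m) values quoted on slide 26. The final step applies the item-item formula from slide 23 to the two selected neighbors; names such as `NA`, `center`, and `pred` are illustrative.

```python
import math

NA = None  # unknown rating

# Rows are movies 1..6, columns are users 1..12 (blank positions assumed, see above).
R = [
    [1, NA, 3, NA, NA, 5, NA, NA, 5, NA, 4, NA],
    [NA, NA, 5, 4, NA, NA, 4, NA, NA, 2, 1, 3],
    [2, 4, NA, 1, 2, NA, 3, NA, 4, 3, 5, NA],
    [NA, 2, 4, NA, 5, NA, NA, 4, NA, NA, 2, NA],
    [NA, NA, 4, 3, 4, 2, NA, NA, NA, NA, 2, 5],
    [1, NA, 3, NA, 3, NA, NA, 2, NA, NA, 4, NA],
]

def center(row):
    """Step 1: subtract the movie's mean rating; unknown entries become 0."""
    rated = [v for v in row if v is not None]
    mean = sum(rated) / len(rated)
    return [0.0 if v is None else v - mean for v in row]

def cosine(a, b):
    """Step 2: cosine similarity between two centered rows."""
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))

centered = [center(row) for row in R]
sims = [cosine(centered[0], c) for c in centered]  # sim(1, m) for m = 1..6

user = 4  # user 5, zero-based column index
# Movies other than movie 1 that user 5 has rated, ranked by similarity to movie 1.
rated_by_user = [m for m in range(1, len(R)) if R[m][user] is not None]
neighbors = sorted(rated_by_user, key=lambda m: sims[m], reverse=True)[:2]

# Weighted average from slide 23, restricted to the |N| = 2 neighbors.
pred = sum(sims[m] * R[m][user] for m in neighbors) / sum(sims[m] for m in neighbors)
```

The two neighbors selected are movies 6 and 3 (zero-based rows 5 and 2), matching the weights on slide 27, and the weighted average of user 5's ratings for them (3 and 2) comes out at roughly 2.6.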
