

  1. CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu

  2.  Training data
       100 million ratings, 480,000 users, 17,770 movies
       6 years of data: 2000-2005
      Test data
       Last few ratings of each user (2.8 million)
      Evaluation criterion: Root Mean Square Error (RMSE)
       Netflix's system RMSE: 0.9514
      Competition
       2,700+ teams
       $1 million prize for 10% improvement on Netflix
      2/10/2013 Jure Leskovec, Stanford C246: Mining Massive Datasets

  3. [Figure: the utility matrix R, 480,000 users by 17,770 movies, sparsely filled with example ratings 1-5.]

  4. [Figure: the matrix R (480,000 users by 17,770 movies) split into a Training Data Set of known ratings and a Test Data Set of held-out entries, shown as "?".]
      $r_{xi}$ … true rating of user x on item i
      $\hat{r}_{xi}$ … predicted rating
      $\mathrm{SSE} = \sum_{(i,x) \in S} (\hat{r}_{xi} - r_{xi})^2$

  5.  The winner of the Netflix Challenge
       Multi-scale modeling of the data: combine a top-level, "regional" modeling of the data with a refined, local view:
        Global: overall deviations of users/movies
        Factorization: addressing "regional" effects
        Collaborative filtering: extract local patterns
      [Diagram: Global effects, then Factorization, then Collaborative filtering.]

  6.  Global:
       Mean movie rating: 3.7 stars
       The Sixth Sense is 0.5 stars above avg.
       Joe rates 0.2 stars below avg.
       Baseline estimation: Joe will rate The Sixth Sense 4 stars
      Local neighborhood (CF/NN):
       Joe didn't like the related movie Signs
       Final estimate: Joe will rate The Sixth Sense 3.8 stars
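The arithmetic on this slide can be written out directly. A minimal sketch; the size of the local CF correction (-0.2) is assumed here to make the slide's numbers come out, the slide itself only states the final 3.8:

```python
mu = 3.7               # global mean movie rating
b_movie = 0.5          # The Sixth Sense is 0.5 stars above average
b_user = -0.2          # Joe rates 0.2 stars below average

baseline = mu + b_user + b_movie       # baseline estimate: 4.0 stars
cf_adjustment = -0.2                   # assumed local CF correction (Joe disliked Signs)
estimate = baseline + cf_adjustment    # final estimate: 3.8 stars
```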

  7.  Earliest and most popular collaborative filtering method
       Derive unknown ratings from those of "similar" movies (item-item variant)
       Define a similarity measure $s_{ij}$ of items i and j
       Select k nearest neighbors, compute the rating:
        $\hat{r}_{xi} = \dfrac{\sum_{j \in N(i;x)} s_{ij} \, r_{xj}}{\sum_{j \in N(i;x)} s_{ij}}$
       $s_{ij}$ … similarity of items i and j
       $r_{xj}$ … rating of user x on item j
       $N(i;x)$ … set of items similar to item i that were rated by x
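The weighted average above can be sketched in a few lines. The function name, the dict-based interfaces, and the toy numbers are illustrative, not from the slides:

```python
def predict_item_item(ratings_x, sims_i, k=2):
    """Predict user x's rating of item i as a similarity-weighted average
    over N(i; x): the k items most similar to i that x has rated.

    ratings_x: {item j: rating r_xj given by user x}
    sims_i:    {item j: similarity s_ij between item i and item j}
    """
    neighbors = sorted((j for j in ratings_x if j in sims_i),
                       key=lambda j: sims_i[j], reverse=True)[:k]
    num = sum(sims_i[j] * ratings_x[j] for j in neighbors)
    den = sum(sims_i[j] for j in neighbors)
    return num / den if den else None

# Toy example: user x rated items a, b, c; similarities to item i are made up.
pred = predict_item_item({"a": 4, "b": 2, "c": 5},
                         {"a": 0.8, "b": 0.1, "c": 0.4}, k=2)
```

With k = 2 the neighbors are a and c, giving (0.8·4 + 0.4·5) / 1.2 ≈ 4.33.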

  8.  In practice we get better estimates if we model deviations:
        $\hat{r}_{xi} = b_{xi} + \dfrac{\sum_{j \in N(i;x)} s_{ij} \, (r_{xj} - b_{xj})}{\sum_{j \in N(i;x)} s_{ij}}$
      where $b_{xi}$ is the baseline estimate for $r_{xi}$:
        $b_{xi} = \mu + b_x + b_i$
       μ … overall mean rating
       $b_x$ … rating deviation of user x = (avg. rating of user x) − μ
       $b_i$ … rating deviation of movie i = (avg. rating of movie i) − μ
      Problems/Issues:
      1) Similarity measures are "arbitrary"
      2) Pairwise similarities neglect interdependencies among users
      3) Taking a weighted average can be restricting
      Solution: instead of $s_{ij}$ use weights $w_{ij}$ that we estimate directly from data
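The baselines $\mu$, $b_x$, $b_i$ are just means and deviations, which a short sketch can make concrete. The function, dict layout, and toy ratings below are illustrative:

```python
def baselines(ratings):
    """Compute mu, b_x, b_i from a {(user, item): rating} dict."""
    mu = sum(ratings.values()) / len(ratings)       # overall mean rating
    by_user, by_item = {}, {}
    for (x, i), r in ratings.items():
        by_user.setdefault(x, []).append(r)
        by_item.setdefault(i, []).append(r)
    b_user = {x: sum(rs) / len(rs) - mu for x, rs in by_user.items()}   # b_x
    b_item = {i: sum(rs) / len(rs) - mu for i, rs in by_item.items()}   # b_i
    return mu, b_user, b_item

# Toy ratings: mu = 3.5, b_joe = -0.5, b_sense = +1.0,
# so the baseline b_xi for (joe, sense) is 3.5 - 0.5 + 1.0 = 4.0.
R = {("joe", "sense"): 4, ("joe", "signs"): 2,
     ("ann", "sense"): 5, ("ann", "signs"): 3}
mu, b_user, b_item = baselines(R)
b_joe_sense = mu + b_user["joe"] + b_item["sense"]
```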

  9.  Use a weighted sum rather than a weighted average:
        $\hat{r}_{xi} = b_{xi} + \sum_{j \in N(i;x)} w_{ij} \, (r_{xj} - b_{xj})$
       A few notes:
       We sum over all movies j that are similar to i and were rated by x
       $w_{ij}$ is the interpolation weight (some real number)
       We allow $\sum_{j \in N(i;x)} w_{ij} \neq 1$
       $w_{ij}$ models the interaction between pairs of movies (it does not depend on user x)
       $N(i;x)$ … set of movies rated by user x that are similar to movie i
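Evaluating this weighted sum for one (user, item) pair is a one-liner once the pieces are in hand. All the numbers below (baseline, neighbor ratings, weights) are made up for illustration:

```python
b_xi = 3.5                                          # baseline estimate for (x, i)
neighbors = {"j1": (4.0, 3.8), "j2": (2.0, 3.1)}    # j -> (r_xj, b_xj)
w = {"j1": 0.3, "j2": -0.1}                         # interpolation weights (need not sum to 1)

# r̂_xi = b_xi + Σ_j w_ij (r_xj − b_xj)
r_hat = b_xi + sum(w[j] * (r - bj) for j, (r, bj) in neighbors.items())
```

Here r̂_xi = 3.5 + 0.3·0.2 + (−0.1)·(−1.1) = 3.67; note the weights are real numbers and one is negative, which a weighted average would not allow.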

  10.  $\hat{r}_{xi} = b_{xi} + \sum_{j \in N(i;x)} w_{ij} \, (r_{xj} - b_{xj})$
       How to set $w_{ij}$?
       Remember, the error metric is SSE: $\sum_{(i,x)} (\hat{r}_{xi} - r_{xi})^2$
       Find the $w_{ij}$ that minimize SSE on training data!
        Models relationships between item i and its neighbors j
        $w_{ij}$ can be learned/estimated based on x and all other users that rated i
      Why is this a good idea?

  11.  Here is what we just did:
       Goal: make good recommendations
        Quantify goodness using SSE: lower SSE means better recommendations
        We want to make good recommendations on items that some user has not yet seen. We can't really do this directly. Why? (We never observe the ratings we are trying to predict.)
        So, let's set the values w such that they work well on known (user, item) ratings, and hope these w will also predict the unknown ratings well
       This is the first time in the class that we see optimization methods

  12.  Idea: let's set the values w such that they work well on known (user, item) ratings
       How to find such values w?
       Idea: define an objective function and solve the optimization problem
       Find $w_{ij}$ that minimize SSE on training data:
        $\min_{w} \; \sum_{x} \Big( b_{xi} + \sum_{j \in N(i;x)} w_{ij} (r_{xj} - b_{xj}) - r_{xi} \Big)^2$
       Think of w as a vector of numbers

  13.  We have the optimization problem, now what?
        $\min_{w} \; J(w) = \sum_{x} \Big( b_{xi} + \sum_{k \in N(i;x)} w_{ik} (r_{xk} - b_{xk}) - r_{xi} \Big)^2$
       Gradient descent:
        Iterate until convergence: $w \leftarrow w - \eta \, \nabla_w J$   (η … learning rate)
        where $\nabla_w J$ is the gradient (derivative evaluated on data):
         $\dfrac{\partial J}{\partial w_{ij}} = 2 \sum_{x} \Big( b_{xi} + \sum_{k \in N(i;x)} w_{ik} (r_{xk} - b_{xk}) - r_{xi} \Big) (r_{xj} - b_{xj})$  for $j \in N(i;x)$
         $\dfrac{\partial J}{\partial w_{ij}} = 0$ otherwise
       Note: we fix movie i, go over all ratings $r_{xi}$, and for every movie $j \in N(i;x)$ compute $\partial J / \partial w_{ij}$
      In pseudocode:
        while |w_new − w_old| > ε:
          w_old = w_new
          w_new = w_old − η · ∇_w J(w_old)
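The gradient-descent loop above can be sketched for a single fixed item i. This is a minimal, un-vectorized illustration; the function name, data layout, and toy data are assumptions, not from the slides (a fixed iteration count stands in for the convergence test):

```python
from collections import defaultdict

def learn_weights(R, b, i, neighbors, eta=0.01, iters=500):
    """Minimize SSE over the interpolation weights w_ij of one item i.

    R: {user x: {item: rating}};  b: {(user, item): baseline b_xi}
    neighbors: candidate neighbor items of i
    """
    w = {j: 0.0 for j in neighbors}
    for _ in range(iters):
        grad = {j: 0.0 for j in neighbors}
        for x, row in R.items():                    # all users x with a rating r_xi
            if i not in row:
                continue
            rated = [j for j in neighbors if j in row]   # N(i; x)
            err = b[(x, i)] + sum(w[j] * (row[j] - b[(x, j)])
                                  for j in rated) - row[i]
            for j in rated:                         # dJ/dw_ij = 2 * err * (r_xj - b_xj)
                grad[j] += 2 * err * (row[j] - b[(x, j)])
        for j in neighbors:                         # w <- w - eta * grad
            w[j] -= eta * grad[j]
    return w

# Toy data with all baselines zero, so r_xi = w * r_xj has exact solution w = 0.5.
R = {"u1": {"i": 1.0, "j": 2.0}, "u2": {"i": 2.0, "j": 4.0}}
b = defaultdict(float)
w = learn_weights(R, b, "i", ["j"])
```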

  14.  So far: $\hat{r}_{xi} = b_{xi} + \sum_{j \in N(i;x)} w_{ij} \, (r_{xj} - b_{xj})$
        Weights $w_{ij}$ derived based on their role; no use of an arbitrary similarity measure ($w_{ij} \neq s_{ij}$)
        Explicitly account for interrelationships among the neighboring movies
       Next: latent factor models, which extract "regional" correlations
      [Diagram: Global effects, then Factorization, then CF/NN.]

  15. RMSE results:
      Global average: 1.1296
      User average: 1.0651
      Movie average: 1.0533
      Netflix: 0.9514
      Basic collaborative filtering: 0.94
      CF + biases + learnt weights: 0.91
      Grand Prize: 0.8563

  16. [Figure: movies placed in a two-dimensional latent space. Horizontal axis: geared towards females vs. geared towards males; vertical axis: serious vs. funny. Example placements: Braveheart, The Color Purple, Amadeus, Lethal Weapon, Sense and Sensibility, Ocean's 11, The Lion King, The Princess Diaries, Independence Day, Dumb and Dumber.]

  17.  SVD: $A = U \Sigma V^T$
       "SVD" on Netflix data: $R \approx Q \cdot P^T$
      [Figure: the (items × users) rating matrix R approximated by the product of a thin (items × f) factor matrix Q and a thin (f × users) matrix P^T.]
       For now let's assume we can approximate the rating matrix R as a product of "thin" $Q \cdot P^T$
       R has missing entries, but let's ignore that for now!
       Basically, we will want the reconstruction error to be small on known ratings, and we don't care about the values on the missing ones
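As a sketch of the idea, here is a rank-f SVD of a small, fully observed toy matrix (the slides ignore missing entries at this stage; the matrix values are illustrative):

```python
import numpy as np

R = np.array([[1., 3., 5., 5., 4.],
              [5., 4., 4., 2., 1.],
              [4., 1., 2., 4., 3.]])      # items x users, toy ratings

f = 2                                     # number of latent factors
U, s, Vt = np.linalg.svd(R, full_matrices=False)
Q = U[:, :f] * s[:f]                      # thin item-factor matrix (singular values absorbed)
Pt = Vt[:f, :]                            # thin factor-user matrix P^T
R_hat = Q @ Pt                            # rank-f reconstruction of R

sse = float(np.sum((R - R_hat) ** 2))     # reconstruction error (all entries known here)
```

By the Eckart-Young theorem, truncating the SVD this way gives the best rank-f approximation of R in the SSE sense.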

  18.  How to estimate the missing rating of user x for item i?
        $\hat{r}_{xi} = q_i \cdot p_x = \sum_{f} q_{if} \cdot p_{xf}$
       $q_i$ = row i of Q
       $p_x$ = column x of $P^T$
      [Figure: R ≈ Q · P^T, highlighting row $q_i$ of Q and column $p_x$ of $P^T$ used to fill in a missing entry of R.]
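The estimate is just a dot product over the f latent factors. The factor vectors below are illustrative, not taken from any real model:

```python
import numpy as np

q_i = np.array([1.1, -0.2, 0.3])   # row i of Q (f = 3 latent factors)
p_x = np.array([0.1, -0.4, 0.2])   # column x of P^T
r_hat = float(q_i @ p_x)           # sum over factors f of q_if * p_xf
```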
