The Netflix Prize: Collaborative Filtering and Latent Factor Models
Tim Althoff, UW CS547: Machine Learning for Big Data (4/20/2020)
http://www.cs.washington.edu/cse547

SLIDE 1–2

 Training data
▪ 100 million ratings, 480,000 users, 17,770 movies
▪ 6 years of data: 2000–2005

 Test data
▪ Last few ratings of each user (2.8 million)
▪ Evaluation criterion: Root Mean Square Error, $\text{RMSE} = \sqrt{\tfrac{1}{|R|}\sum_{(i,x)\in R}(\hat{r}_{xi} - r_{xi})^2}$
▪ Netflix's system RMSE: 0.9514

 Competition
▪ 2,700+ teams
▪ $1 million prize for 10% improvement on Netflix's RMSE
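A minimal NumPy sketch of this evaluation metric (not from the original slides; the function name is illustrative):

```python
import numpy as np

def rmse(predicted, actual):
    """Root Mean Square Error over the rated (user, item) pairs."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.sqrt(np.mean((predicted - actual) ** 2))

# Example: three predictions vs. true ratings
print(rmse([3.5, 4.0, 2.0], [4, 5, 1]))  # ≈ 0.866
```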

SLIDE 3

 Training Data: 100 million ratings (labels known publicly)
 Held-Out Data: 3 million ratings (labels only known to Netflix), split into:
▪ Quiz Set (1.5m ratings): scores posted on the public leaderboard
▪ Test Set (1.5m ratings): scores known only to Netflix; these scores were used in determining the final winner

SLIDE 4

[Figure: the utility matrix R (480,000 users × 17,770 movies), sparsely filled with example ratings 1–5]

SLIDE 5

[Figure: the same matrix R (480,000 users × 17,770 movies), with some entries replaced by "?" — these form the Test Data Set; the known entries form the Training Data Set]

$$\text{RMSE} = \sqrt{\tfrac{1}{|R|}\sum_{(i,x)\in R}(\hat{r}_{xi} - r_{xi})^2}$$

where $\hat{r}_{xi}$ is the predicted rating and $r_{xi}$ the true rating of user x on item i.

slide-6
SLIDE 6

 The winner of the Netflix Challenge  Multi-scale modeling of the data:

Combine top level, “regional” modeling of the data, with a refined, local view:

▪ Global:

▪ Overall deviations of users/movies

▪ Factorization:

▪ Addressing “regional” effects

▪ Collaborative filtering:

▪ Extract local patterns

4/20/2020 Tim Althoff, UW CS547: Machine Learning for Big Data, http://www.cs.washington.edu/cse547 6

Global effects Factorization Collaborative filtering

SLIDE 7

 Global:
▪ Mean movie rating: 3.7 stars
▪ The Sixth Sense is 0.5 stars above average
▪ Joe rates 0.2 stars below average
 Baseline estimate: Joe will rate The Sixth Sense 4 stars
▪ That is, 4 = 3.7 + 0.5 − 0.2
 Local neighborhood (CF/NN):
▪ Joe didn't like the related movie Signs
▪  Final estimate: Joe will rate The Sixth Sense 3.8 stars

SLIDE 8

 The earliest and most popular collaborative filtering method
 Derive unknown ratings from those of "similar" movies (item-item variant)
 Define a similarity metric s_ij of items i and j
 Select the k nearest neighbors and compute the rating:

$$\hat{r}_{xi} = \frac{\sum_{j \in N(i;x)} s_{ij}\, r_{xj}}{\sum_{j \in N(i;x)} s_{ij}}$$

▪ s_ij … similarity of items i and j
▪ r_xj … rating of user x on item j
▪ N(i;x) … set of items similar to item i that were rated by x
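A sketch of this k-nearest-neighbor predictor, assuming a dense NumPy ratings matrix `R` (users × items, `np.nan` for missing entries) and a precomputed item-item similarity matrix `sim` — both names are illustrative:

```python
import numpy as np

def predict_item_item(R, sim, x, i, k=5):
    """r̂_xi = Σ_{j∈N(i;x)} s_ij r_xj / Σ_{j∈N(i;x)} s_ij."""
    rated = np.flatnonzero(~np.isnan(R[x]))       # items rated by user x
    rated = rated[rated != i]
    # N(i; x): the k items most similar to i among those rated by x
    neighbors = rated[np.argsort(sim[i, rated])[-k:]]
    s = sim[i, neighbors]
    return np.dot(s, R[x, neighbors]) / np.sum(s)
```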

SLIDE 9

 In practice we get better estimates if we model deviations:

$$\hat{r}_{xi} = b_{xi} + \frac{\sum_{j \in N(i;x)} s_{ij}\,(r_{xj} - b_{xj})}{\sum_{j \in N(i;x)} s_{ij}}$$

where $b_{xi} = \mu + b_x + b_i$ is the baseline estimate for r_xi, with:
▪ μ = overall mean rating
▪ b_x = rating deviation of user x = (avg. rating of user x) − μ
▪ b_i = (avg. rating of movie i) − μ

Problems/Issues:
1) Similarity metrics are "arbitrary"
2) Pairwise similarities neglect interdependencies among users
3) Taking a weighted average can be restricting
Solution: Instead of s_ij use weights w_ij that we estimate directly from data
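The deviation-based variant as a sketch, reusing the conventions from the snippet above; `mu`, `b_user`, and `b_item` hold μ, b_x, and b_i (illustrative names):

```python
def predict_with_baseline(R, sim, mu, b_user, b_item, x, i, k=5):
    """r̂_xi = b_xi + Σ s_ij (r_xj − b_xj) / Σ s_ij, with b_xi = μ + b_x + b_i."""
    rated = np.flatnonzero(~np.isnan(R[x]))
    rated = rated[rated != i]
    neighbors = rated[np.argsort(sim[i, rated])[-k:]]
    s = sim[i, neighbors]
    deviations = R[x, neighbors] - (mu + b_user[x] + b_item[neighbors])
    return mu + b_user[x] + b_item[i] + np.dot(s, deviations) / np.sum(s)
```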

SLIDE 10

 Use a weighted sum rather than a weighted average:

$$\hat{r}_{xi} = b_{xi} + \sum_{j \in N(i;x)} w_{ij}\,(r_{xj} - b_{xj})$$

 A few notes:
▪ N(i;x) … set of movies rated by user x that are similar to movie i
▪ w_ij is the interpolation weight (some real number)
▪ Note, we allow: $\sum_{j \in N(i;x)} w_{ij} \neq 1$
▪ w_ij models the interaction between pairs of movies (it does not depend on user x)

SLIDE 11

 Recall: $\hat{r}_{xi} = b_{xi} + \sum_{j \in N(i;x)} w_{ij}\,(r_{xj} - b_{xj})$
 How to set w_ij?
▪ Remember, the error metric is RMSE, or equivalently SSE: $\sum_{(i,x)\in R} (\hat{r}_{xi} - r_{xi})^2$
▪ Find the w_ij that minimize SSE on training data!
▪ Models relationships between item i and its neighbors j
▪ w_ij can be learned/estimated based on x and all other users that rated i

Why is this a good idea?

SLIDE 12

 Goal: Make good recommendations
▪ Quantify goodness using RMSE: lower RMSE  better recommendations
▪ We really want to make good recommendations on items that the user has not yet seen. Can't really do this!
▪ Let's build a system such that it works well on known (user, item) ratings, and hope the system will also predict the unknown ratings well

SLIDE 13

 Idea: Let's set the values w such that they work well on known (user, item) ratings
 How to find such values w?
 Idea: Define an objective function and solve the optimization problem
 Find the w_ij that minimize SSE on training data:

$$J(w) = \sum_{(i,x)\in R} \Big( \big[\, b_{xi} + \textstyle\sum_{j \in N(i;x)} w_{ij}(r_{xj} - b_{xj}) \,\big] - r_{xi} \Big)^2$$

(the bracketed term is the predicted rating; r_xi is the true rating)

 Think of w as a vector of numbers

SLIDE 14

 A simple way to minimize a function f(x):
▪ Compute the derivative ∇f
▪ Start at some point y and evaluate ∇f(y)
▪ Make a step in the reverse direction of the gradient: y = y − ∇f(y)
▪ Repeat until convergence

[Figure: the curve f(y) with the gradient at the current point y; stepping against the gradient moves y toward the minimum]
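A tiny illustration of this recipe on a one-dimensional function (illustrative; a fixed learning rate η is added to keep the steps small):

```python
def gradient_descent(grad, y, eta=0.1, tol=1e-8, max_iter=10_000):
    """Minimize f by repeatedly stepping against its gradient: y ← y − η ∇f(y)."""
    for _ in range(max_iter):
        step = eta * grad(y)
        y -= step
        if abs(step) < tol:          # converged: steps have become negligible
            break
    return y

# f(y) = (y − 3)^2 has gradient 2(y − 3) and its minimum at y = 3
print(gradient_descent(lambda y: 2 * (y - 3), y=0.0))  # ≈ 3.0
```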

SLIDE 15

 We have the optimization problem, now what?
 Gradient descent:
▪ Iterate until convergence: $w \leftarrow w - \eta \nabla_w J$, where $\nabla_w J$ is the gradient (the derivative evaluated on the data):

$$\nabla_w J = \left[ \frac{\partial J(w)}{\partial w_{ij}} \right], \qquad \frac{\partial J(w)}{\partial w_{ij}} = 2 \sum_{(x,i)\in R} \Big( \big[\, b_{xi} + \textstyle\sum_{k \in N(i;x)} w_{ik} (r_{xk} - b_{xk}) \,\big] - r_{xi} \Big)(r_{xj} - b_{xj})$$

for $j \in \{ N(i;x),\ \forall i,\ \forall x \}$; else $\frac{\partial J(w)}{\partial w_{ij}} = 0$

▪ Note: we fix movie i, go over all r_xi, and for every movie j ∈ N(i;x) we compute ∂J(w)/∂w_ij
▪ η … learning rate

while |w_new − w_old| > ε:
    w_old = w_new
    w_new = w_old − η · ∇J(w_old)

with the objective

$$J(w) = \sum_{(x,i)\in R} \Big( b_{xi} + \textstyle\sum_{k \in N(i;x)} w_{ik} (r_{xk} - b_{xk}) - r_{xi} \Big)^2$$
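A direct (and deliberately unoptimized) sketch of this gradient computation: `R` is users × items with `np.nan` for missing entries, `B` holds the baseline estimates b_xi, and `N` maps each item to its candidate neighbor list (all names illustrative):

```python
import numpy as np

def train_interpolation_weights(R, B, N, eta=1e-4, n_iters=100):
    """Gradient descent on J(w) = Σ (b_xi + Σ_j w_ij (r_xj − b_xj) − r_xi)^2."""
    n_items = R.shape[1]
    w = np.zeros((n_items, n_items))
    rated_pairs = list(zip(*np.where(~np.isnan(R))))   # known (x, i) pairs
    for _ in range(n_iters):
        grad = np.zeros_like(w)
        for x, i in rated_pairs:
            Nix = [j for j in N[i] if j != i and not np.isnan(R[x, j])]
            if not Nix:
                continue
            err = B[x, i] - R[x, i] + sum(
                w[i, j] * (R[x, j] - B[x, j]) for j in Nix)
            for j in Nix:                  # ∂J/∂w_ij += 2·err·(r_xj − b_xj)
                grad[i, j] += 2 * err * (R[x, j] - B[x, j])
        w -= eta * grad                    # w ← w − η ∇J(w)
    return w
```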

SLIDE 16

 So far: $\hat{r}_{xi} = b_{xi} + \sum_{j \in N(i;x)} w_{ij}(r_{xj} - b_{xj})$
▪ Weights w_ij derived based on their role; no use of an arbitrary similarity metric (w_ij ≠ s_ij)
▪ Explicitly account for interrelationships among the neighboring movies
 Next: Latent factor model
▪ Extract "regional" correlations

[Diagram: global effects → factorization → CF/NN]

SLIDE 17

RMSE progress so far:
▪ Global average: 1.1296
▪ User average: 1.0651
▪ Movie average: 1.0533
▪ Netflix: 0.9514
▪ Basic collaborative filtering: 0.94
▪ CF + biases + learned weights: 0.91
▪ Grand Prize target: 0.8563

SLIDE 18

[Figure: movies plotted on two latent dimensions — "geared towards females" ↔ "geared towards males" and "serious" ↔ "funny" — e.g., The Princess Diaries, The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, The Color Purple, Dumb and Dumber, Ocean's 11, Sense and Sensibility. Slide from the BellKor team.]

SLIDE 19

 "SVD" on Netflix data: R ≈ Q · P^T
 For now let's assume we can approximate the rating matrix R as a product of "thin" matrices Q · P^T
▪ R has missing entries, but let's ignore that for now!
▪ Basically, we want the reconstruction error to be small on known ratings, and we don't care about the values of the missing ones

[Figure: R (items × users) factored into thin matrices Q (items × factors) and P^T (factors × users), with example entries; compare SVD: A = U Σ V^T]

SLIDE 20–22

 How to estimate the missing rating of user x for item i?

$$\hat{r}_{xi} = q_i \cdot p_x = \sum_f q_{if} \cdot p_{xf}$$

▪ q_i = row i of Q
▪ p_x = column x of P^T
▪ f indexes the latent factors

[Figure, built up over three animation steps: the factored matrices Q (items × factors) and P^T (factors × users); a missing entry "?" of R is estimated as the dot product of row q_i with column p_x, giving 2.4 in the example]
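As a sketch: once Q and P are available as NumPy arrays (items × factors and users × factors respectively; all names and toy sizes illustrative), the prediction is a single dot product:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, f = 100, 50, 10     # small toy sizes
Q = rng.normal(size=(n_items, f))     # item factors q_i
P = rng.normal(size=(n_users, f))     # user factors p_x

x, i = 42, 7
r_hat_xi = Q[i] @ P[x]                # r̂_xi = q_i · p_x = Σ_f q_if p_xf
R_hat = P @ Q.T                       # full n_users × n_items prediction matrix
```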

SLIDE 23–24

[Figure: the two-factor movie map from Slide 18 (geared towards females ↔ males, serious ↔ funny), with the axes now labeled Factor 1 and Factor 2 — the first two latent factors. Movies shown: The Princess Diaries, The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, The Color Purple, Dumb and Dumber, Ocean's 11, Sense and Sensibility.]

SLIDE 25

 Remember SVD:
▪ A: input data matrix (m × n)
▪ U: left singular vectors
▪ V: right singular vectors
▪ Σ: singular values
 So in our case: "SVD" on Netflix data: R ≈ Q · P^T, with A = R, Q = U, P^T = ΣV^T

[Figure: A (m × n) = U Σ V^T, alongside $\hat{r}_{xi} = q_i \cdot p_x$]

SLIDE 26

 We already know that SVD gives the minimum reconstruction error (Sum of Squared Errors)
 Note two things:
▪ SSE and RMSE are monotonically related: $\text{RMSE} = \frac{1}{c}\sqrt{\text{SSE}}$ (for a constant c, the square root of the number of entries). Great news: SVD is minimizing RMSE!
▪ Complication: the sum in the SVD error term is over all entries (no-rating is interpreted as zero-rating). But our R has missing entries!

SLIDE 27

 SVD isn't defined when entries are missing!
 Use specialized methods to find P, Q:

$$\min_{P,Q} \sum_{(i,x)\in R} (r_{xi} - q_i \cdot p_x)^2$$

 Note:
▪ We don't require the columns of P, Q to be orthogonal or of unit length
▪ P, Q map users/movies to a latent space
▪ This was the most popular model among the Netflix contestants

[Figure: R ≈ Q · P^T as before, with $\hat{r}_{xi} = q_i \cdot p_x$]
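A sketch of this objective restricted to known entries, with `R` stored as users × items and `np.nan` marking missing ratings (illustrative names; note this transposes the slide's items × users orientation):

```python
import numpy as np

def sse_known(R, P, Q):
    """Σ_{(i,x)∈R} (r_xi − q_i · p_x)^2, summed over known entries only."""
    known = ~np.isnan(R)
    errors = np.where(known, R - P @ Q.T, 0.0)   # zero out missing entries
    return np.sum(errors ** 2)
```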

SLIDE 28–29

 Our goal is to find P and Q such that:

$$\min_{P,Q} \sum_{(i,x)\in R} (r_{xi} - q_i \cdot p_x)^2$$

[Figure: R ≈ Q · P^T, as before]

SLIDE 30

 Want to minimize SSE for unseen test data
 Idea: Minimize SSE on training data
▪ We want a large k (number of factors) to capture all the signals
▪ But SSE on test data begins to rise for k > 2
 This is a classical example of overfitting:
▪ With too much freedom (too many free parameters) the model starts fitting noise
▪ That is, the model fits the training data too well and thus does not generalize well to unseen test data

[Figure: the rating matrix with "?" test entries]

SLIDE 31

 To solve overfitting we introduce regularization:
▪ Allow a rich model where there is sufficient data
▪ Shrink aggressively where data is scarce

$$\min_{P,Q} \underbrace{\sum_{(i,x)\in \text{training}} (r_{xi} - q_i p_x)^2}_{\text{"error"}} + \underbrace{\lambda_1 \sum_x \lVert p_x \rVert^2 + \lambda_2 \sum_i \lVert q_i \rVert^2}_{\text{"length"}}$$

▪ λ1, λ2 … user-set regularization parameters

Note: We do not care about the "raw" value of the objective function; we care about the P, Q that achieve the minimum of the objective.
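The regularized objective as a sketch, extending `sse_known` from the earlier snippet (assumes that definition and the NumPy import are in scope):

```python
def regularized_objective(R, P, Q, lam1, lam2):
    """'error' + 'length': Σ_known (r_xi − q_i·p_x)^2 + λ1 Σ‖p_x‖^2 + λ2 Σ‖q_i‖^2."""
    return (sse_known(R, P, Q)
            + lam1 * np.sum(P ** 2)
            + lam2 * np.sum(Q ** 2))
```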

SLIDE 32–35

$$\min_{P,Q} \underbrace{\sum_{(i,x)\in \text{training}} (r_{xi} - q_i p_x)^2}_{\text{"error"}} + \lambda \underbrace{\Big[ \sum_x \lVert p_x \rVert^2 + \sum_i \lVert q_i \rVert^2 \Big]}_{\text{"length"}}$$

[Figure, repeated over four animation steps: the two-factor movie map (geared towards females ↔ males, serious ↔ funny; Factor 1 vs. Factor 2), showing how min "error" + λ·"length" places the movies; The Princess Diaries is highlighted as the regularization takes effect]

SLIDE 36

 Want to find matrices P and Q:

$$\min_{P,Q} \sum_{(i,x)\in \text{training}} (r_{xi} - q_i p_x)^2 + \lambda_1 \sum_x \lVert p_x \rVert^2 + \lambda_2 \sum_i \lVert q_i \rVert^2$$

 Gradient descent:
▪ Initialize P and Q (using SVD, pretending missing ratings are 0)
▪ Do gradient descent:
▪ P ← P − η · ∇P
▪ Q ← Q − η · ∇Q
▪ where ∇Q is the gradient/derivative of matrix Q: $\nabla Q = [\nabla q_{if}]$ and $\nabla q_{if} = \sum_{(x,i)} -2\,(r_{xi} - q_i \cdot p_x)\, p_{xf} + 2 \lambda_2 q_{if}$
▪ Here $q_{if}$ is entry f of row $q_i$ of matrix Q
▪ Observation: Computing gradients is slow!

How to compute the gradient of a matrix? Compute the gradient of every element independently!
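A vectorized sketch of the full (batch) gradient for Q under the same users × items conventions as above; the gradient for P is symmetric:

```python
def grad_Q(R, P, Q, lam2):
    """∇q_if = Σ_{(x,i) known} −2 (r_xi − q_i·p_x) p_xf + 2 λ2 q_if."""
    known = ~np.isnan(R)
    errors = np.where(known, R - P @ Q.T, 0.0)    # users × items
    return -2.0 * errors.T @ P + 2.0 * lam2 * Q   # items × factors
```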

SLIDE 37

 Gradient Descent (GD) vs. Stochastic Gradient Descent (SGD)
▪ Observation: $\nabla Q = [\nabla q_{if}]$ where

$$\nabla q_{if} = \sum_{(x,i)} -2\,(r_{xi} - q_i \cdot p_x)\, p_{xf} + 2 \lambda q_{if} = \sum_{(x,i)} \nabla Q(r_{xi})$$

▪ Here $q_{if}$ is entry f of row $q_i$ of matrix Q
▪ $Q \leftarrow Q - \eta \nabla Q = Q - \eta \sum_{(x,i)} \nabla Q(r_{xi})$
▪ Idea: Instead of evaluating the gradient over all ratings, evaluate it for each individual rating and make a step

 GD: $Q \leftarrow Q - \eta \sum_{r_{xi}} \nabla Q(r_{xi})$
 SGD: $Q \leftarrow Q - \mu\, \nabla Q(r_{xi})$
▪ Faster convergence!
▪ Needs more steps, but each step is computed much faster
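A minimal SGD training loop for the regularized factorization (a sketch; the constant 2 from the gradient is folded into the learning rate, a common convention, and all names are illustrative):

```python
import numpy as np

def sgd_factorize(R, f=10, eta=0.01, lam=0.1, n_epochs=20, seed=0):
    """SGD for min Σ (r_xi − q_i·p_x)^2 + λ(Σ‖p_x‖^2 + Σ‖q_i‖^2)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = 0.1 * rng.standard_normal((n_users, f))
    Q = 0.1 * rng.standard_normal((n_items, f))
    xs, its = np.where(~np.isnan(R))              # known ratings
    for _ in range(n_epochs):
        for x, i in zip(xs, its):                 # one step per rating r_xi
            err = R[x, i] - Q[i] @ P[x]
            px_old = P[x].copy()                  # update both from old values
            P[x] += eta * (err * Q[i] - lam * P[x])
            Q[i] += eta * (err * px_old - lam * Q[i])
    return P, Q
```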

SLIDE 38

 Convergence of GD vs. SGD

[Figure: value of the objective function vs. iteration/step. GD improves the value of the objective function at every step. SGD improves the value, but in a "noisy" way. GD takes fewer steps to converge, but each step takes much longer to compute. In practice, SGD is much faster!]

SLIDE 39

[Figure from Koren, Bell, Volinsky, IEEE Computer, 2009]

SLIDE 40–41

The model: $\hat{r}_{xi} = \mu + b_x + b_i + q_i \cdot p_x$

 μ = overall mean rating
 b_x = bias of user x
 b_i = bias of movie i
 q_i · p_x = user-movie interaction

Global: baseline predictor (μ + b_x + b_i)
▪ Separates users and movies
▪ Benefits from insights into users' behavior
▪ Among the main practical contributions of the competition

Local: user-movie interaction (q_i · p_x)
▪ Characterizes the matching between users and movies
▪ Attracts most research in the field
▪ Benefits from algorithmic and mathematical innovations

SLIDE 42

 We have expectations on the rating by user x of movie i, even without estimating x's attitude towards movies like i:
▪ Rating scale of user x
▪ Values of other ratings the user gave recently (day-specific mood, anchoring, multi-user accounts)
▪ (Recent) popularity of movie i
▪ Selection bias; related to the number of ratings the user gave on the same day ("frequency")

SLIDE 43

$$\hat{r}_{xi} = \underbrace{\mu}_{\text{overall mean rating}} + \underbrace{b_x}_{\text{bias for user } x} + \underbrace{b_i}_{\text{bias for movie } i} + \underbrace{q_i \cdot p_x}_{\text{user-movie interaction}}$$

 Example:
▪ Mean rating: μ = 3.7
▪ You are a critical reviewer: your mean rating is 1 star lower than the mean: b_x = −1
▪ Star Wars gets a mean rating 0.5 higher than the average movie: b_i = +0.5
▪ Predicted rating for you on Star Wars: 3.7 − 1 + 0.5 = 3.2 (before the user-movie interaction term)

SLIDE 44

 Solve:

$$\min_{Q,P} \underbrace{\sum_{(x,i)\in R} \big( r_{xi} - (\mu + b_x + b_i + q_i \cdot p_x) \big)^2}_{\text{goodness of fit}} + \underbrace{\lambda_1 \sum_i \lVert q_i \rVert^2 + \lambda_2 \sum_x \lVert p_x \rVert^2 + \lambda_3 \sum_x b_x^2 + \lambda_4 \sum_i b_i^2}_{\text{regularization}}$$

 Use stochastic gradient descent to find the parameters
▪ Note: Both the biases b_x, b_i and the interactions q_i, p_x are treated as parameters (and we learn them)
▪ λ is selected via grid search on a validation set
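A sketch of one SGD update under the biased model (a single shared λ instead of λ1…λ4, for brevity; the constant 2 is again folded into η, and all names are illustrative):

```python
def sgd_step_biased(R, mu, b_user, b_item, P, Q, x, i, eta=0.005, lam=0.02):
    """One SGD update for rating r_xi under r̂_xi = μ + b_x + b_i + q_i·p_x."""
    err = R[x, i] - (mu + b_user[x] + b_item[i] + Q[i] @ P[x])
    b_user[x] += eta * (err - lam * b_user[x])   # biases are learned too
    b_item[i] += eta * (err - lam * b_item[i])
    px_old = P[x].copy()                         # update both from old values
    P[x] += eta * (err * Q[i] - lam * P[x])
    Q[i] += eta * (err * px_old - lam * Q[i])
```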

SLIDE 45

RMSE progress:
▪ Global average: 1.1296
▪ User average: 1.0651
▪ Movie average: 1.0533
▪ Netflix: 0.9514
▪ Basic collaborative filtering: 0.94
▪ CF with learned weights: 0.91
▪ Latent factors: 0.90
▪ Latent factors + biases: 0.89
▪ Grand Prize target: 0.8563

SLIDE 46–47

 Sudden rise in the average movie rating (early 2004)
▪ Improvements in Netflix
▪ GUI improvements
▪ Meaning of rating changed
 Movie age
▪ Users prefer new movies without any reason
▪ Older movies are just inherently better than newer ones

[Y. Koren, Collaborative filtering with temporal dynamics, KDD '09]

SLIDE 48

 Original model: r̂_xi = μ + b_x + b_i + q_i · p_x
 Add time dependence to the biases: r̂_xi = μ + b_x(t) + b_i(t) + q_i · p_x
▪ Make the parameters b_x and b_i depend on time
▪ (1) Parameterize the time-dependence by linear trends; (2) bin the timeline, with each bin corresponding to 10 consecutive weeks
 Add temporal dependence to the factors
▪ p_x(t) … user preference vector on day t

[Y. Koren, Collaborative filtering with temporal dynamics, KDD '09]
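A sketch of the binned variant of the time-dependent movie bias b_i(t), with a bin width of 10 weeks as on the slide (array names and the static + per-bin decomposition are illustrative assumptions):

```python
def b_item_t(b_item_static, b_item_bin, i, t_days, weeks_per_bin=10):
    """b_i(t) = b_i + b_{i,Bin(t)}: a static bias plus a per-time-bin offset."""
    bin_idx = (t_days // 7) // weeks_per_bin   # t_days since dataset start
    return b_item_static[i] + b_item_bin[i, bin_idx]
```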
SLIDE 49

RMSE progress:
▪ Global average: 1.1296
▪ User average: 1.0651
▪ Movie average: 1.0533
▪ Netflix: 0.9514
▪ Basic collaborative filtering: 0.94
▪ Collaborative filtering++: 0.91
▪ Latent factors: 0.90
▪ Latent factors + biases: 0.89
▪ Latent factors + biases + time: 0.876
▪ Grand Prize target: 0.8563

Still no prize!  Getting desperate. Try a "kitchen sink" approach!

SLIDE 50–51

June 26th submission triggers the 30-day "last call"

SLIDE 52

 Ensemble team formed
▪ A group of other teams on the leaderboard forms a new team
▪ Relies on combining their models
▪ Quickly also gets a qualifying score over 10%
 BellKor
▪ Continues to get small improvements in their scores
▪ Realizes they are in direct competition with team Ensemble
 Strategy
▪ Both teams carefully monitor the leaderboard
▪ The only sure way to check for improvement is to submit a set of predictions
▪ This alerts the other team to your latest score

SLIDE 53

 Submissions limited to 1 a day
▪ Only 1 final submission could be made in the last 24h
 24 hours before the deadline…
▪ A BellKor team member in Austria notices (by chance) that Ensemble has posted a score slightly better than BellKor's
 Frantic last 24 hours for both teams
▪ Much computer time spent on final optimization
▪ Carefully calibrated to end about an hour before the deadline
 Final submissions
▪ BellKor submits a little early (on purpose), 40 minutes before the deadline
▪ Ensemble submits their final entry 20 minutes later
▪ …and everyone waits…

SLIDE 54–56

What's the moral of the story?

Submit early! ☺

SLIDE 57

 Some slides and plots borrowed from Yehuda Koren, Robert Bell, and Padhraic Smyth
 Further reading:
▪ Y. Koren, Collaborative filtering with temporal dynamics, KDD '09
▪ https://web.archive.org/web/20141130213501/http://www2.research.att.com/~volinsky/netflix/bpc.html
▪ https://web.archive.org/web/20141227110702/http://www.the-ensemble.com/