SLIDE 1
NETFLIX Movie Recommendations
Virgil Pavlu, Shahzad Rajput, Keshi Dai
Movie ratings: 1 (bad) - 5 (good)
SLIDE 2
SLIDE 3
Movie ratings
[Figure: user-by-movie rating matrix; known ratings range from 1 to 5, unknown entries are marked "?"]
SLIDE 4
SLIDE 5
COLLABORATIVE FILTERING; PEARSON FORMULA
Compute the mean and variance for each user u. Let $N_u$ be the number of movies rated by user u and $R_{um}$ the rating of user u for movie m:

$$\mu_u = \frac{\sum_m R_{um}}{N_u}, \qquad \sigma_u = \frac{\sum_m R_{um}^2}{N_u} - \mu_u^2$$

Normalize each rating by subtracting the user mean and dividing by the user variance:

$$\bar{r}_{um} = \frac{R_{um} - \mu_u}{\sigma_u}$$

Compute the similarity between any two users u and v, summing over the movies m they have in common:

$$\rho_{uv} = \frac{1}{\#\,\text{movies in common}} \sum_m \bar{r}_{um} \cdot \bar{r}_{vm}$$

Predict the rating for a new movie by accounting for all other users' v ratings on the movie:

$$\text{predict}(u, m) = \mu_u + \frac{\sum_v \rho_{uv} \cdot \bar{r}_{vm}}{\sum_v |\rho_{uv}|} \cdot \sigma_u$$
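The four steps above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation. Two assumptions are made: ratings are stored as a dictionary of dictionaries (a hypothetical layout), and σ_u is taken to be the standard deviation, i.e. the square root of the variance expression, which is what a Pearson-style correlation needs. Users who did not rate movie m are skipped in the sum over v.

```python
from math import sqrt

def user_stats(ratings):
    """Return (mean, std) for one user's ratings, given as {movie: rating}."""
    n = len(ratings)
    mean = sum(ratings.values()) / n
    var = sum(r * r for r in ratings.values()) / n - mean * mean
    # Assumption: sigma_u is the standard deviation (sqrt of the variance).
    return mean, sqrt(max(var, 0.0))

def normalize(ratings):
    """Subtract the user mean and divide by sigma_u."""
    mean, std = user_stats(ratings)
    if std < 1e-12:  # user gave the same rating everywhere
        return {m: 0.0 for m in ratings}
    return {m: (r - mean) / std for m, r in ratings.items()}

def similarity(norm_u, norm_v):
    """Average product of normalized ratings over the movies in common."""
    common = norm_u.keys() & norm_v.keys()
    if not common:
        return 0.0
    return sum(norm_u[m] * norm_v[m] for m in common) / len(common)

def predict(data, u, m):
    """Predict user u's rating for movie m from the other users' ratings."""
    mean_u, std_u = user_stats(data[u])
    norm = {user: normalize(r) for user, r in data.items()}
    num = den = 0.0
    for v in data:
        if v == u or m not in data[v]:
            continue  # only users who actually rated m can contribute
        rho = similarity(norm[u], norm[v])
        num += rho * norm[v][m]
        den += abs(rho)
    if den == 0.0:
        return mean_u  # no usable neighbours: fall back to the user mean
    return mean_u + num / den * std_u

# Toy data (made up for illustration): bob agrees with alice, carol disagrees.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 1},
    "bob":   {"A": 5, "B": 3, "C": 1, "D": 5},
    "carol": {"A": 1, "B": 3, "C": 5, "D": 1},
}
print(round(predict(ratings, "alice", "D"), 2))  # about 4.48
```

Because carol's similarity to alice is negative, her low rating for D pushes the prediction up rather than down, which is the effect of weighting by ρ_uv and normalizing by the sum of |ρ_uv|.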
SLIDE 6
Users-item-ratings problem
Usually very sparse
Many applications
article recommendation
Amazon, Netflix, iTunes and many others
pretty much all online stores/services
“automatic” reviews
some items (movies, books) easier than others
Content vs Collaborative approach
SLIDE 7
NETFLIX dataset
Rent movies via postal service
recently also online
18000 movies
0.5 million users
Training: 100 million ratings
Testing: 1 million ratings
measure performance: RMSE
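RMSE (root mean squared error) is the square root of the average squared difference between predicted and true ratings. A minimal sketch, assuming predictions and true ratings are given as two equal-length lists (the function name and argument layout are illustrative):

```python
from math import sqrt

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared_errors) / len(actual))

# Each prediction is off by exactly one star, so the RMSE is 1.0.
print(rmse([4, 4], [5, 3]))
```

Since the errors are squared before averaging, a few large misses cost more than many small ones.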
SLIDE 8
37918 teams / 180 countries
SLIDE 9
Collaborative Filtering
Use similarity between users/items
Many solutions, old and new
Simple : Pearson’s formula
measure statistical correlation between users/items
Simple : Rule-based
k-Nearest Neighbor/k-Means + regression
Model effects due to user/movie/time etc
Star Wars may not be as likeable now as 30 years ago
Matrix factorization
SLIDE 10
Content-based training
Identify movies by content features
Actors, genre, director, writer etc
6000 features to cover 90% of NETFLIX dataset
We use content data from IMDB
Learn a profile for each user
SLIDE 11
User profile
movie r=4
4 4 4 4
movie r=1
1 1 1
movie r=5
5 5 5
profile
2.5 4 5 3 3.3 4
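The numbers on this slide are consistent with a profile that stores, for each content feature, the user's mean rating over the rated movies that have that feature. The sketch below works under that assumption. The feature names f1 to f6 are hypothetical, and the assignment of features to movies was chosen so that the resulting profile matches the slide's values.

```python
from collections import defaultdict

def build_profile(rated_movies):
    """Per-feature mean rating; rated_movies is a list of (rating, features)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rating, features in rated_movies:
        for feature in features:
            totals[feature] += rating
            counts[feature] += 1
    return {feature: totals[feature] / counts[feature] for feature in totals}

# Three rated movies, as on the slide (feature names are placeholders).
movies = [
    (4, {"f1", "f2", "f5", "f6"}),
    (1, {"f1", "f4", "f5"}),
    (5, {"f3", "f4", "f5"}),
]
profile = build_profile(movies)
for feature in sorted(profile):
    print(feature, round(profile[feature], 1))
```

For example, f1 appears in the movies rated 4 and 1, so its profile value is 2.5, and f5 appears in all three movies, giving (4 + 1 + 5) / 3, roughly 3.3.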
SLIDE 12
Content + Collaborative
Fix a movie m
Build a training set with content+collab features
Run decision tree + regression
[Figure labels: profile, collaborative, training, testing]
SLIDE 13
Content + Collaborative
On some movies, content features are dominant
On others, collab features are dominant
[Figure labels: profile, collaborative, training, testing]
SLIDE 14