Recommendation Systems
Prof. Mike Hughes, Tufts COMP 135: Introduction to Machine Learning


  1. Tufts COMP 135: Introduction to Machine Learning (https://www.cs.tufts.edu/comp/135/2019s/). Recommendation Systems. Prof. Mike Hughes. Many ideas/slides attributable to: Liping Liu (Tufts), Emily Fox (UW), Matt Gormley (CMU).

  2. Recommendation Task: Which users will like which items?

  3. • Recommendations are needed everywhere.

  4. Utility matrix
     • The "value" or "utility" of items to users
     • Only known when ratings happen
     • In practice, very sparse; many entries are unknown
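Because most entries of the utility matrix are unknown, it is usually stored sparsely. Below is a minimal sketch of one such sparse layout; the dict-of-pairs representation and the example values are illustrative, not from the slides.

```python
# Sparse utility matrix sketch: only observed (user, item) ratings are stored.
# A scipy.sparse matrix would serve the same purpose; this dict layout is illustrative.
ratings = {
    ("alice", "item_1"): 2,
    ("alice", "item_4"): 4,
    ("bob",   "item_2"): 5,
}

# Missing (user, item) pairs are unknown, not zero.
print(ratings.get(("bob", "item_1"), "unknown"))   # -> unknown
```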

  5. Rec Sys Unit Objectives
     • Explain the recommendation task: predict which users will like which items
     • Explain two major types of recommendation:
       • Content-based (have features for items/users)
       • Collaborative filtering (only have scores for user-item pairs)
     • Detailed approach: Matrix Factorization + SGD
     • Evaluation: precision/recall for binary recommendations

  6. Task: Recommendation
     • Supervised Learning: content-based filtering
     • Unsupervised Learning: collaborative filtering
     • Reinforcement Learning

  7. Content-based recommendation

  8. Content-based
     Key aspect: have common features for each item.
     Example item features:
       FEATURE         VALUE
       is_round        1
       is_juicy        1
       average_price   $1.99/lb

  9. Content-Based Recommendation
     • Reduce per-user prediction to a supervised prediction problem
     • What features are necessary? What are the pitfalls?
     Fig. credit: Emily Fox (UW)
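A minimal sketch of the reduction above: fit one regressor per user on (item features, rating) pairs, then score unseen items. The function names, the Ridge regressor, and the hyperparameters are illustrative choices, not prescribed by the slides.

```python
# Content-based recommendation as per-user supervised learning (minimal sketch).
import numpy as np
from sklearn.linear_model import Ridge

def fit_user_model(item_features, user_ratings):
    """Fit one regressor for a single user from (item feature vector, rating) pairs."""
    model = Ridge(alpha=1.0)           # regularized linear regression; choice is illustrative
    model.fit(item_features, user_ratings)
    return model

def recommend_top_k(model, candidate_features, k=5):
    """Score unseen items with this user's model and return indices of the top k."""
    scores = model.predict(candidate_features)
    return np.argsort(-scores)[:k]
```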

  10. Possible Per-Item Features
      • Movie: set of actors, director, genre, year
      • Document: bag of words, author, genre, citations
      • Product: tags, reviews

  11. Content-Based Recommender. Fig. credit: Emily Fox (UW)

  12. Collaborative filtering

  13. External Slides
      • Matt Gormley at CMU: https://www.cs.cmu.edu/~mgormley/courses/10601-s17/slides/lecture25-mf.pdf
      • We'll use pages 4-34
      • Start: "Recommender Systems" slide
      • Stop at: comparison of optimization algorithms

  14. Matrix Factorization (MF)
      • User a represented by vector u_a in R^K
      • Item i represented by vector v_i in R^K
      • Inner product u_a^T v_i approximates the utility y_ai
      • Intuition:
        • Two items with similar vectors get similar utility scores from the same user
        • Two users with similar vectors give similar utility scores to the same item
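A minimal sketch of the MF prediction rule, assuming the user and item vectors are rows of two NumPy arrays; the array names, sizes, and initialization are illustrative.

```python
# MF prediction sketch: each user and item has a K-dimensional vector,
# and the predicted utility is their inner product.
import numpy as np

K = 10                                           # number of latent factors (illustrative)
n_users, n_items = 500, 200
rng = np.random.default_rng(0)
U = 0.1 * rng.standard_normal((n_users, K))      # user vectors u_a
V = 0.1 * rng.standard_normal((n_items, K))      # item vectors v_i

def predict(a, i):
    """Predicted utility y_hat_ai = u_a . v_i"""
    return U[a] @ V[i]
```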

  15. Training an MF model
      • Variables to optimize: U = {u_a : a = 1, ..., A} and V = {v_i : i = 1, ..., I}
      • Training objective:
        min over U, V of   sum over observed pairs (a, i) of ( y_ai - u_a^T v_i )^2
                           + lambda * sum_a ||u_a||^2 + lambda * sum_i ||v_i||^2
      • How to optimize? Stochastic gradient descent: visit each observed user-item entry at random!
      • Key practical aspect: regularization terms to prevent overfitting
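A minimal SGD sketch for the objective above, assuming observed ratings come as (user, item, rating) triples; the learning rate, regularization strength, and helper name are illustrative.

```python
# One SGD epoch over the MF objective: for each observed (a, i, y) triple,
# take a gradient step on (y - u_a . v_i)^2 + lambda * (||u_a||^2 + ||v_i||^2).
import numpy as np

def sgd_epoch(U, V, ratings, lr=0.01, lam=0.1, seed=0):
    """ratings: list of (user index a, item index i, observed rating y) triples."""
    rng = np.random.default_rng(seed)
    for idx in rng.permutation(len(ratings)):    # visit entries in random order
        a, i, y = ratings[idx]
        err = y - U[a] @ V[i]                    # residual on this observed entry
        u_old = U[a].copy()                      # keep the pre-update user vector
        U[a] += lr * (err * V[i] - lam * U[a])   # gradient step for the user vector
        V[i] += lr * (err * u_old - lam * V[i])  # gradient step for the item vector
    return U, V
```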

  16. Include intercept/bias terms!
      • Per-user scalar b_a
      • Per-item scalar c_i
      • Objective: min over U, V, b, c of   sum over observed (a, i) of ( y_ai - u_a^T v_i - b_a - c_i )^2   (plus regularization terms)
      • Why include these?

  17. Include intercept/bias terms!
      • Per-user scalar b_a
      • Per-item scalar c_i
      • Objective: min over U, V, b, c of   sum over observed (a, i) of ( y_ai - u_a^T v_i - b_a - c_i )^2   (plus regularization terms)
      • Why include these? They improve accuracy:
        • Some items are just more popular
        • Some users are just more positive
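A minimal sketch of the bias-augmented prediction rule; b and c are assumed to be 1-D NumPy arrays of per-user and per-item offsets, and the function name is illustrative.

```python
# Bias-augmented MF prediction: latent factors plus per-user and per-item offsets.
def predict_with_bias(U, V, b, c, a, i):
    """y_hat_ai = u_a . v_i + b_a + c_i"""
    return U[a] @ V[i] + b[a] + c[i]
```

The SGD updates extend in the same way: the residual now subtracts b_a and c_i, and each scalar offset gets its own small gradient step.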

  18. Summary of Methods

  19. Task: Recommendation
      • Supervised Learning: content-based filtering
      • Unsupervised Learning: collaborative filtering
      • Reinforcement Learning

  20. Recall: Supervised Method
      • Training data: label pairs {x_n, y_n}, n = 1, ..., N (data x, label y)
      • A learned predictor maps data x to a predicted label y
      • Evaluation: a performance measure scores the predictions

  21. Example: Per-User Predictor (content-based filtering)
      • For each item n: x_n = user-item feature vector, y_n = rating score
      • Training data: pairs {x_n, y_n}, n = 1, ..., N
      • A regressor or classifier maps a user-item feature vector x to a predicted rating y

  22. Recall: Unsupervised Method
      • Training data: examples {x_n}, n = 1, ..., N (no labels)
      • Output: a summary or model of the data x, scored by a performance measure

  23. Example: Matrix Factorization (collaborative filtering)
      • Data: a matrix M of observed entries
      • Output: low-rank factors that reconstruct M
      • Prediction: the reconstructed value of a specific entry M_ij at indices (i, j)

  24. Evaluation

  25. Evaluation Assumptions
      • For a given user, we can rate each item with a score
      • We care most about our top-score predictions
      • Setup:
        • Algorithm rates each item with a score
        • Sort items from high to low score
        • Have "true" relevant/not-relevant usage labels (unused by the algorithm)
      • Example:
        Item ranking:  1 2 3 4 5 6 7 8
        Actual usage:  1 0 1 0 0 0 1 1


  28. External Slides
      • Emily Fox's slides: https://courses.cs.washington.edu/courses/cse416/18sp/slides/L13_matrix-factorization.pdf#page=19
      • Start: slide 19 ("world of all baby products")
      • Stop: end of that section

  29. Precision-Recall Curve (plot of precision vs. recall, where recall = TPR)
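A minimal sketch of tracing a precision-recall curve from per-item scores and binary relevance labels. scikit-learn is an assumed choice here; the relevance labels reuse the "actual usage" example from these slides, while the scores are made up for illustration.

```python
# Precision-recall curve from ranking scores and binary relevance labels.
from sklearn.metrics import precision_recall_curve

relevance = [1, 0, 1, 0, 0, 0, 1, 1]                    # "actual usage" labels from the slides
scores    = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]    # algorithm scores, best item first (illustrative)

precision, recall, _ = precision_recall_curve(relevance, scores)
for p, r in zip(precision, recall):
    print(f"precision={p:.2f}  recall={r:.2f}")
```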

  30. Precision @ k
      • Assume only the top k results are predicted positive
      • E.g., Netflix can only show you ~8 results on screen at a time; we want most of these to be relevant
      • Example:
        Item ranking:  1 2 3 4 5 6 7 8
        Actual usage:  1 0 1 0 0 0 1 1
      • Prec @ 1: 100%
      • Prec @ 2: 50%
      • Prec @ 8: 50%
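A minimal sketch of precision@k for the ranked-list example above: the fraction of the top-k ranked items that were actually used. The helper name is illustrative.

```python
# Precision@k: fraction of the top-k ranked items that are relevant.
def precision_at_k(ranked_relevance, k):
    """ranked_relevance: 0/1 relevance labels in ranked order (best first)."""
    top_k = ranked_relevance[:k]
    return sum(top_k) / k

usage = [1, 0, 1, 0, 0, 0, 1, 1]       # actual usage, in ranked order
print(precision_at_k(usage, 1))        # 1.0  (100%)
print(precision_at_k(usage, 2))        # 0.5  (50%)
print(precision_at_k(usage, 8))        # 0.5  (50%)
```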

  31. Cold Start Issue
      • New user entering the system:
        • Hard for both content-based and matrix-factorization methods
        • Matching similar users
        • Trial-and-error
      • New item entering the system:
        • Easy with per-user content-based recommendation, IF it is easy to get the item's feature vector
        • Hard with matrix factorization
        • Trial-and-error

  32. Practical Issues in Real Systems
      • The recommendation system (RS) and its users form a feedback loop:
        • The RS changes users' behavior
        • Users generate the data the RS learns from
      • User groups can become more homogeneous
        • E.g., YouTube recommendation of political videos: recommending videos from the same camp

  33. Rec Sys Unit Objectives
      • Explain the recommendation task: predict which users will like which items
      • Explain two major types of recommendation:
        • Content-based (have features for items/users)
        • Collaborative filtering (only have scores for user-item pairs)
      • Detailed approach: Matrix Factorization + SGD
      • Evaluation: precision/recall for binary recommendations
