Data Mining Techniques CS 6220 - Section 3 - Fall 2016 - Lecture 14

SLIDE 1

Data Mining Techniques

CS 6220 - Section 3 - Fall 2016

Lecture 14

Jan-Willem van de Meent (credit: Andrew Ng, Alex Smola, Yehuda Koren, Stanford CS246)

SLIDE 2

Recommender Systems

SLIDE 3

The Long Tail

(from: https://www.wired.com/2004/10/tail/)

SLIDE 4

The Long Tail

(from: https://www.wired.com/2004/10/tail/)

SLIDE 5

The Long Tail

(from: https://www.wired.com/2004/10/tail/)

SLIDE 6

Problem Setting

SLIDE 7

Problem Setting

SLIDE 8

Problem Setting

SLIDE 9

Problem Setting

  • Task: Predict user preferences for unseen items
SLIDE 10

Content-based Filtering

[Figure: movies placed on two axes, "geared towards females" vs. "geared towards males" and "serious" vs. "escapist": The Princess Diaries, The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, The Color Purple, Dumb and Dumber, Ocean's 11, Sense and Sensibility; users Gus and Dave are placed on the same axes]

SLIDE 11

Content-based Filtering

[Figure: movies placed on two axes, "geared towards females" vs. "geared towards males" and "serious" vs. "escapist": The Princess Diaries, The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, The Color Purple, Dumb and Dumber, Ocean's 11, Sense and Sensibility; users Gus and Dave are placed on the same axes]

Idea: Predict rating using item features on a per-user basis

SLIDE 12

Content-based Filtering

[Figure: movies placed on two axes, "geared towards females" vs. "geared towards males" and "serious" vs. "escapist": The Princess Diaries, The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, The Color Purple, Dumb and Dumber, Ocean's 11, Sense and Sensibility; users Gus and Dave are placed on the same axes]

Idea: Predict rating using user features on a per-item basis

SLIDE 13

Collaborative Filtering

[Figure: user Joe connected to movies #1-#4 through other users who rated the same movies]

Idea: Predict rating based on similarity to other users

SLIDE 14

Problem Setting

  • Task: Predict user preferences for unseen items
  • Content-based filtering: Model user/item features
  • Collaborative filtering: Implicit similarity of users and items
SLIDE 15

Recommender Systems

  • Movie recommendation (Netflix)
  • Related product recommendation (Amazon)
  • Web page ranking (Google)
  • Social recommendation (Facebook)
  • News content recommendation (Yahoo)
  • Priority inbox & spam filtering (Google)
  • Online dating (OK Cupid)
  • Computational Advertising (Everyone)
SLIDE 16

Challenges

  • Scalability
      • Millions of objects
      • 100s of millions of users
  • Cold start
      • Changing user base
      • Changing inventory
  • Imbalanced dataset
      • User activity / item reviews are power-law distributed
  • Ratings are not missing at random
SLIDE 17

Running Example: Netflix Data

Training data:

  score  date      movie  user
  1      5/7/02    21     1
  5      8/2/04    213    1
  4      3/6/01    345    2
  4      5/1/05    123    2
  3      7/15/02   768    2
  5      1/22/01   76     3
  4      8/3/00    45     4
  1      9/10/05   568    5
  2      3/5/03    342    5
  2      12/28/00  234    5
  5      8/11/02   76     6
  4      6/15/03   56     6

Test data (scores withheld):

  score  date      movie  user
  ?      1/6/05    62     1
  ?      9/13/04   96     1
  ?      8/18/05   7      2
  ?      11/22/05  3      2
  ?      6/13/02   47     3
  ?      8/12/01   15     3
  ?      9/1/00    41     4
  ?      8/27/05   28     4
  ?      4/4/05    93     5
  ?      7/16/03   74     5
  ?      2/14/04   69     6
  ?      10/3/03   83     6

  • Released as part of $1M competition by Netflix in 2006
  • Prize awarded to BellKor in 2009
SLIDE 18

Running Yardstick: RMSE

$\mathrm{rmse}(S) = \sqrt{|S|^{-1} \sum_{(i,u)\in S} (\hat{r}_{ui} - r_{ui})^2}$

SLIDE 19

Running Yardstick: RMSE

$\mathrm{rmse}(S) = \sqrt{|S|^{-1} \sum_{(i,u)\in S} (\hat{r}_{ui} - r_{ui})^2}$

(doesn’t tell you how to actually do recommendation)
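As a quick sketch, the RMSE above can be computed directly from a set of (item, user) pairs; storing predicted and observed ratings in dictionaries keyed by (item, user) is an assumption for illustration, not the Netflix data format:

```python
import math

def rmse(pairs, r_hat, r):
    """Root mean square error over a set S of (item, user) pairs.

    r_hat and r map (item, user) -> predicted / observed rating.
    """
    return math.sqrt(sum((r_hat[iu] - r[iu]) ** 2 for iu in pairs) / len(pairs))

# Toy example with two test pairs
r = {(1, 1): 4.0, (2, 1): 3.0}
r_hat = {(1, 1): 3.5, (2, 1): 3.5}
print(rmse([(1, 1), (2, 1)], r_hat, r))  # sqrt((0.25 + 0.25) / 2) = 0.5
```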

SLIDE 20

Ratings aren’t everything

Netflix then Netflix now

SLIDE 21

Content-based Filtering

SLIDE 22

Item-based Features

SLIDE 23

Item-based Features

SLIDE 24

Item-based Features

SLIDE 25

Per-user Regression

$w_u = \operatorname*{argmin}_{w} \lVert r_u - X w \rVert^2$

Learn a set of regression coefficients for each user
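A minimal NumPy sketch of this per-user least-squares fit; the item-feature matrix and ratings below are made up for illustration (first feature column is an intercept):

```python
import numpy as np

# Hypothetical item features X (rows: items, cols: intercept + two genre scores)
# and one user's ratings r_u of those items.
X = np.array([[1.0, 0.9, 0.1],
              [1.0, 0.2, 0.8],
              [1.0, 0.8, 0.3],
              [1.0, 0.1, 0.9]])
r_u = np.array([5.0, 2.0, 4.0, 1.0])

# Per-user least squares: w_u = argmin_w ||r_u - X w||^2
w_u, *_ = np.linalg.lstsq(X, r_u, rcond=None)

# Predict this user's rating of an unseen item from its features
x_new = np.array([1.0, 0.7, 0.4])
print(x_new @ w_u)
```

Each user gets their own coefficient vector `w_u`, so this step is repeated once per user.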

SLIDE 26

Bias

SLIDE 27

Bias

SLIDE 28

Bias

[Figure: Moonrise Kingdom rated 4, 5, 4, 4 by four users; item features 0.3, 0.2]

SLIDE 29

Bias

[Figure: Moonrise Kingdom rated 4, 5, 4, 4 by four users; item features 0.3, 0.2]

Problem: Some movies are universally loved / hated

SLIDE 30

Bias

[Figure: Moonrise Kingdom rated 4, 5, 4, 4 by four users; item features 0.3, 0.2; a picky user rates 3, 3, 3]

Problem: Some movies are universally loved / hated;
 some users are more picky than others

SLIDE 31

Bias

Problem: Some movies are universally loved / hated;
 some users are more picky than others

Solution: Introduce a per-movie and per-user bias

[Figure: Moonrise Kingdom rated 4, 5, 4, 4 by four users; item features 0.3, 0.2]
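One simple way to estimate the per-user and per-item biases is to average the residuals against the global mean; the toy ratings and dictionary layout below are illustrative assumptions:

```python
import numpy as np

# Toy ratings: (user, item, rating) triples, made up for illustration
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 1, 2), (2, 1, 1)]

mu = np.mean([r for _, _, r in ratings])          # global mean rating

def bias(keyed, mu):
    """Average residual (r - mu) per key (a user id or an item id)."""
    sums, counts = {}, {}
    for k, r in keyed:
        sums[k] = sums.get(k, 0.0) + (r - mu)
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

b_u = bias([(u, r) for u, i, r in ratings], mu)   # per-user bias
b_i = bias([(i, r) for u, i, r in ratings], mu)   # per-item bias

# Baseline prediction b_ui = mu + b_u + b_i
print(mu + b_u[0] + b_i[0])
```

Picky users get a negative `b_u`; universally loved movies get a positive `b_i`.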

SLIDE 32

Temporal Effects

SLIDE 33

Changes in user behavior

[Figure: mean rating over time, with a jump in 2004, when Netflix changed the rating labels]

SLIDE 34

Are movies getting better with time?

SLIDE 35

Temporal Effects

Solution: Model temporal effects in the biases, not the weights

SLIDE 36

Neighborhood Methods

SLIDE 37

Neighborhood Based Methods

[Figure: user Joe connected to movies #1-#4 through other users who rated the same movies]

Users and items form a bipartite graph (edges are ratings)

SLIDE 38

Neighborhood Based Methods

(user, user) similarity

  • predict rating based on average from k-nearest users
  • good if item base is smaller than user base
  • good if item base changes rapidly

(item, item) similarity

  • predict rating based on average from k-nearest items
  • good if the user base is small
  • good if user base changes rapidly
SLIDE 39

Parzen-Window Style CF

$\hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in S^k(i,u)} s_{ij}\,(r_{uj} - b_{uj})}{\sum_{j \in S^k(i,u)} s_{ij}}$, where $b_{ui} = \mu + b_u + b_i$

  • Define a similarity s_ij between items
  • Find the set S^k(i,u) of k-nearest neighbors to i that were rated by user u
  • Predict the rating using a weighted average over this set
  • How should we define s_ij?
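The prediction step can be sketched in a few lines; the dictionary shapes (ratings, similarities, and baselines keyed by ids) and the example values are assumptions for illustration:

```python
def predict(i, u, ratings_u, sim, b, k=2):
    """Neighborhood prediction: baseline plus a similarity-weighted average
    of the user's residuals on the k items most similar to i.

    ratings_u: {item j: r_uj} for user u
    sim:       {(i, j): s_ij} item-item similarities
    b:         {(u, i): b_ui} baseline predictions
    """
    # k items rated by u that are most similar to i
    neighbors = sorted(ratings_u, key=lambda j: sim.get((i, j), 0.0),
                       reverse=True)[:k]
    num = sum(sim.get((i, j), 0.0) * (ratings_u[j] - b[(u, j)]) for j in neighbors)
    den = sum(sim.get((i, j), 0.0) for j in neighbors)
    return b[(u, i)] + num / den if den > 0 else b[(u, i)]

sim = {(0, 1): 0.9, (0, 2): 0.5}              # made-up similarities
ratings_u = {1: 4.0, 2: 2.0}                  # user 7's ratings
b = {(7, 0): 3.0, (7, 1): 3.5, (7, 2): 3.0}   # made-up baselines b_ui
print(predict(0, 7, ratings_u, sim, b))       # nudges the baseline 3.0 slightly
```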
SLIDE 40

Pearson Correlation Coefficient

$s_{ij} = \frac{\mathrm{Cov}[r_{ui}, r_{uj}]}{\mathrm{Std}[r_{ui}]\,\mathrm{Std}[r_{uj}]}$

[Figure: a sparse rating matrix; each item is rated by a distinct set of users, so the correlation must be estimated from the users who rated both item i and item j]

SLIDE 41

(item,item) similarity

Empirical estimate of the Pearson correlation coefficient:

$\hat{\rho}_{ij} = \frac{\sum_{u \in U(i,j)} (r_{ui} - b_{ui})(r_{uj} - b_{uj})}{\sqrt{\sum_{u \in U(i,j)} (r_{ui} - b_{ui})^2 \sum_{u \in U(i,j)} (r_{uj} - b_{uj})^2}}$

Regularize towards 0 for small support:

$s_{ij} = \frac{|U(i,j)| - 1}{|U(i,j)| - 1 + \lambda}\,\hat{\rho}_{ij}$

Regularize towards the baseline for small neighborhoods:

$\hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in S^k(i,u)} s_{ij}\,(r_{uj} - b_{uj})}{\lambda + \sum_{j \in S^k(i,u)} s_{ij}}$
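The shrunk correlation estimate can be sketched as follows; the per-item rating and baseline dictionaries are assumed shapes, and λ here is just a demonstration value:

```python
import math

def shrunk_pearson(ri, rj, bi, bj, lam=50.0):
    """Empirical Pearson correlation between items i and j over their common
    raters U(i,j), shrunk towards 0 when the support is small.

    ri, rj: {user: rating} for items i and j
    bi, bj: {user: baseline b_ui} for items i and j
    """
    common = ri.keys() & rj.keys()          # U(i, j)
    n = len(common)
    if n < 2:
        return 0.0
    num = sum((ri[u] - bi[u]) * (rj[u] - bj[u]) for u in common)
    den = math.sqrt(sum((ri[u] - bi[u]) ** 2 for u in common)
                    * sum((rj[u] - bj[u]) ** 2 for u in common))
    rho = num / den if den > 0 else 0.0
    return (n - 1) / (n - 1 + lam) * rho    # shrinkage factor

# Three common raters with baseline 3.0 everywhere; perfectly correlated
ri = {1: 5.0, 2: 1.0, 3: 3.0}
rj = {1: 4.0, 2: 2.0, 3: 3.0}
base = {1: 3.0, 2: 3.0, 3: 3.0}
print(shrunk_pearson(ri, rj, base, base, lam=2.0))  # 1.0 shrunk to 0.5
```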

SLIDE 42

Similarity for binary labels

Pearson correlation is not meaningful for binary labels (e.g. views, purchases, clicks).

  • m_i: users acting on i; m_j: users acting on j; m_ij: users acting on both i and j; m: total number of users
  • Jaccard similarity: $s_{ij} = \frac{m_{ij}}{\alpha + m_i + m_j - m_{ij}}$
  • Observed / expected ratio: $s_{ij} = \frac{\text{observed}}{\text{expected}} \approx \frac{m_{ij}}{\alpha + m_i m_j / m}$
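Both binary-label similarities are one-liners; the counts below (100 and 50 viewers, 30 in common, out of 1000 users) are made up, and α is a small smoothing constant:

```python
def jaccard(m_i, m_j, m_ij, alpha=1.0):
    """Smoothed Jaccard similarity: m_ij / (alpha + m_i + m_j - m_ij)."""
    return m_ij / (alpha + m_i + m_j - m_ij)

def obs_over_exp(m_i, m_j, m_ij, m, alpha=1.0):
    """Observed co-occurrences over the count expected under independence."""
    return m_ij / (alpha + m_i * m_j / m)

print(jaccard(100, 50, 30))             # 30 / 121
print(obs_over_exp(100, 50, 30, 1000))  # 30 / (1 + 5) = 5.0
```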

SLIDE 43

Matrix Factorization Methods

SLIDE 44

Matrix Factorization

[Figure: Moonrise Kingdom rated 4, 5, 4, 4 by four users; item features 0.3, 0.2]

SLIDE 45

Matrix Factorization

[Figure: Moonrise Kingdom rated 4, 5, 4, 4 by four users; item features 0.3, 0.2]

Idea: pose as (biased) matrix factorization problem

SLIDE 46

Matrix Factorization

[Figure: a users x items rating matrix (entries 1-5) approximated by the product of two small real-valued factor matrices: a rank-3 SVD approximation]

SLIDE 47

Prediction

[Figure: the same rank-3 SVD approximation, with one unknown rating marked "?"]

SLIDE 48

Prediction

[Figure: the unknown rating is predicted as 2.4 from the inner product of the corresponding user and item factors in the rank-3 SVD approximation]

SLIDE 49

SVD with missing values

[Figure: the rating matrix, with missing entries, and its rank-3 factorization]

  • SVD isn't defined when entries are unknown
  • Pose as a regression problem
  • Regularize using the Frobenius norm

SLIDE 50

Alternating Least Squares

[Figure: the rating matrix, with missing entries, and its rank-3 factorization]

  • SVD isn't defined when entries are unknown
  • Fix the item factors X and regress w_u given X

SLIDE 51

Alternating Least Squares

[Figure: the rating matrix, with missing entries, and its rank-3 factorization]

  • SVD isn't defined when entries are unknown
  • Fix the item factors X and regress w_u given X
  • Remember ridge regression? With an L2 penalty there is a closed-form solution: $w = (X^\top X + \lambda I)^{-1} X^\top y$
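A compact NumPy sketch of alternating least squares, with the ridge closed form used for both half-steps; the toy rating matrix, rank k=2, and λ=0.1 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy rating matrix with missing entries marked as nan (made up for illustration)
R = np.array([[5.0, 4.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 2.0, 1.0]])
M = ~np.isnan(R)                       # mask of observed entries
n_users, n_items = R.shape
k, lam = 2, 0.1                        # rank and regularization strength

W = rng.normal(size=(n_users, k))      # user factors
X = rng.normal(size=(n_items, k))      # item factors

def ridge(A, y, lam):
    """Closed-form ridge solution w = (A^T A + lam I)^{-1} A^T y."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

for _ in range(50):                    # alternate the two least-squares solves
    for u in range(n_users):           # regress w_u given X (observed items only)
        obs = M[u]
        W[u] = ridge(X[obs], R[u, obs], lam)
    for i in range(n_items):           # regress x_i given W (observed users only)
        obs = M[:, i]
        X[i] = ridge(W[obs], R[obs, i], lam)

print(W @ X.T)                         # completed rating matrix
```

Each half-step only uses the entries that were actually observed, which is what makes this work where plain SVD does not.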

SLIDE 52

Alternating Least Squares

[Figure: the rating matrix, with missing entries, and its rank-3 factorization]

  • SVD isn't defined when entries are unknown
  • Alternate: regress w_u given X, then regress x_i given W

SLIDE 53

Stochastic Gradient Descent

[Figure: the rating matrix, with missing entries, and its rank-3 factorization]

  • SVD isn't defined when entries are unknown
  • No need for locking
  • Multicore updates asynchronously (Recht, Re, Wright, 2012 - Hogwild)
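A sequential sketch of the SGD updates for regularized matrix factorization (Hogwild runs exactly these updates from multiple threads without locks; threading is omitted here). The toy ratings, learning rate, and regularization strength are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy observed ratings as (user, item, rating) triples, made up for illustration
ratings = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 1.0)]
n_users, n_items, k = 3, 3, 2
eta, lam = 0.05, 0.01                  # step size and regularization

W = 0.1 * rng.normal(size=(n_users, k))   # user factors
X = 0.1 * rng.normal(size=(n_items, k))   # item factors

for epoch in range(200):
    for u, i, r in ratings:            # one gradient step per observed rating
        e = r - W[u] @ X[i]            # prediction error on this entry
        W[u] += eta * (e * X[i] - lam * W[u])
        X[i] += eta * (e * W[u] - lam * X[i])
```

Each update touches only one row of W and one row of X, which is why lock-free multicore updates rarely collide on sparse data.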

SLIDE 54

Netflix Prize

SLIDE 55

Netflix Prize

Training data

  • 100 million ratings, 480,000 users, 17,770 movies
  • 6 years of data: 2000-2005

Test data

  • Last few ratings of each user (2.8 million)
  • Evaluation criterion: Root Mean Square Error (RMSE)

Competition

  • 2,700+ teams
  • Netflix’s system RMSE: 0.9514
  • $1 million prize for 10% improvement on Netflix
SLIDE 56

Improvements

[Figure: factor models, error vs. number of parameters. RMSE (0.875 to 0.91) against millions of parameters (10 to 100,000) for NMF, BiasSVD, SVD++, and SVD v.2-v.4; point labels (40 to 1500) give the number of factors per model]

Add biases

Do SGD, but also learn the biases μ, b_u, and b_i
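The bias-aware SGD loop adds μ, b_u, and b_i to the prediction and updates the biases alongside the factors; the toy data and hyperparameters below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy observed ratings as (user, item, rating) triples, made up for illustration
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0)]
n_users, n_items, k = 3, 3, 2
eta, lam = 0.05, 0.02

mu = np.mean([r for _, _, r in ratings])   # global mean, fixed here
b_u = np.zeros(n_users)                    # learned user biases
b_i = np.zeros(n_items)                    # learned item biases
W = 0.1 * rng.normal(size=(n_users, k))
X = 0.1 * rng.normal(size=(n_items, k))

for _ in range(200):
    for u, i, r in ratings:
        e = r - (mu + b_u[u] + b_i[i] + W[u] @ X[i])   # error incl. biases
        b_u[u] += eta * (e - lam * b_u[u])             # bias updates
        b_i[i] += eta * (e - lam * b_i[i])
        W[u] += eta * (e * X[i] - lam * W[u])          # factor updates
        X[i] += eta * (e * W[u] - lam * X[i])
```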

SLIDE 57

Improvements

Account for the fact that ratings are not missing at random.

[Figure: the same error vs. number-of-parameters plot, adding a model that accounts for "who rated what"]

SLIDE 58

Improvements

Account for drift in user and item biases

[Figure: the same error vs. number-of-parameters plot, adding temporal effects]

SLIDE 59

Improvements

Still pretty far from the 0.8563 grand prize target

[Figure: the same error vs. number-of-parameters plot, adding temporal effects]

SLIDE 60

Winning Solution from BellKor

SLIDE 61

Last 30 days

June 26th submission triggers 30-day “last call”

SLIDE 62

Last 30 days

June 26th submission triggers 30-day “last call”

SLIDE 63

BellKor fends off competitors by a hair

SLIDE 64

BellKor fends off competitors by a hair