Collaborative Filtering: Presentation by Alex Hugger



slide-1
SLIDE 1

Collaborative Filtering

Presentation by Alex Hugger

slide-2
SLIDE 2

Filtering Documents

Wednesday, 28 April 2010

slide-3
SLIDE 3

Content-Based Methods

  • Find other popular items by the same author or with similar keywords
  • Recommendation quality is relatively poor

slide-4
SLIDE 4

Filtering Music


slide-5
SLIDE 5

Filtering Jokes


www.xkcd.org

slide-6
SLIDE 6

Filtering Jokes

  • Let the users rate the jokes
  • Sort by average rating
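A minimal sketch of this rate-and-sort approach, assuming ratings are stored per joke. The joke names and scores are made up for illustration.

```python
from statistics import mean

# Hypothetical ratings: joke id -> list of user ratings (made-up numbers).
ratings = {
    "joke_a": [4, 5, 3],
    "joke_b": [2, 1],
    "joke_c": [5, 5, 4, 4],
}

def rank_by_average(ratings):
    """Return joke ids sorted from highest to lowest average rating."""
    return sorted(ratings, key=lambda joke: mean(ratings[joke]), reverse=True)

ranking = rank_by_average(ratings)
print(ranking)
```

Every user gets the same ranking here, because nothing in the computation depends on who is asking.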


slide-7
SLIDE 7

Collaborative Filtering

People who have agreed in the past tend to agree in the future


slide-8
SLIDE 8

Good or Bad?

Die Hard (1988)
Dirty Dancing (1987)

slide-9
SLIDE 9

Good or Bad?


slide-10
SLIDE 10


slide-11
SLIDE 11

Jester 4.0 (http://eigentaste.berkeley.edu)


slide-12
SLIDE 12

MovieLens

(http://movielens.org)


slide-13
SLIDE 13

Netflix

  • www.netflix.org
  • DVD/Blu-ray rental and video streaming
  • $1,000,000 prize for the first team to beat the current recommendation algorithm by 10%
  • Competition started in October 2006 and ended in July 2009

slide-14
SLIDE 14

GroupLens: An Open Architecture for Collaborative Filtering of NetNews

Research paper from 1994 by:

  • Paul Resnick, MIT Center for Coordination Science
  • Neophytos Iacovou, University of Minnesota
  • Mitesh Suchak, MIT Center for Coordination Science
  • Peter Bergstrom, University of Minnesota
  • John Riedl, University of Minnesota

slide-15
SLIDE 15

NetNews


slide-16
SLIDE 16

Problems of NetNews

  • Signal-to-noise ratio is too low
  • Splitting the bulletin board into newsgroups
  • Moderated newsgroups
  • News clients:
      • Summary of the author and subject line
      • Display discussion threads together
      • String search facilities
      • Kill files

slide-17
SLIDE 17

Modification to NetNews

[Figure: modified NetNews reader; the values 3.72 and 2.61 appear in the screenshot]

slide-18
SLIDE 18

Predicting Scores

The score prediction system is robust to certain differences in how users interpret the rating scale:

  • One user rates between 3 and 5, another between 1 and 3
  • One user treats 1 as the best score, another treats 5 as the best
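A small sketch of why this can hold, assuming the prediction works with each user's deviations from their own average rather than with raw scores (as the correlation-based scheme in this deck does). The three rating lists are hypothetical.

```python
def centered(ratings):
    """Deviation of each rating from the user's own average."""
    mean = sum(ratings) / len(ratings)
    return [r - mean for r in ratings]

# Hypothetical ratings of the same four articles on a 1-5 scale.
base = [3, 5, 4, 4]                    # uses the upper part of the scale
shifted = [r - 2 for r in base]        # same taste, but rates 1-3
flipped = [6 - r for r in base]        # same taste, but treats 1 as best

print(centered(base))     # deviations of the first user
print(centered(shifted))  # identical deviations: the offset disappears
print(centered(flipped))  # mirrored deviations: the sign is reversed
```

A shifted scale vanishes after centering. A reversed scale shows up as mirrored deviations, which a negative correlation weight can turn back into useful evidence.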


slide-19
SLIDE 19

Predicting Scores

Predictions can be modeled as matrix filling:

Item #   Ken   Lee   Meg   Nan
1        1     4     2     2
2        5     2     4     4
3        -     -     3     -
4        2     5     -     5
5        4     1     -     1
6        ?     2     5     -

(- = not rated, ? = score to predict)
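One possible way to hold such a matrix in code, as a sketch only. The placement of the empty cells is an assumption reconstructed from the flattened slide table, and `None` is used here for every cell without a score, including the one to predict.

```python
ARTICLES = [1, 2, 3, 4, 5, 6]

# One row per user, one entry per article; None = no score yet.
# Empty-cell positions are an assumption (see lead-in).
ratings = {
    "Ken": [1, 5, None, 2, 4, None],
    "Lee": [4, 2, None, 5, 1, 2],
    "Meg": [2, 4, 3, None, None, 5],
    "Nan": [2, 4, None, 5, 1, None],
}

def missing_cells(ratings):
    """List the (user, article) cells that a predictor would have to fill."""
    return [
        (user, article)
        for user, row in ratings.items()
        for article, score in zip(ARTICLES, row)
        if score is None
    ]

cells = missing_cells(ratings)
print(cells)
```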

slide-20
SLIDE 20

Predicting Scores

  • Assign a similarity to each of the other people
  • Compute it over the articles rated by both
  • Pearson correlation coefficient, between -1 and 1

$$ r_{Ken,Lee} = \frac{\operatorname{cov}(K, L)}{\sigma_{Ken} \cdot \sigma_{Lee}} = \frac{E[(K - \mu_{Ken})(L - \mu_{Lee})]}{\sigma_{Ken} \cdot \sigma_{Lee}} $$

$\sigma_{Ken}$ = standard deviation of Ken's ratings

$\mu_{Ken}$ = average of Ken's ratings
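The correlation can be sketched in a few lines, assuming ratings are stored as article-to-score mappings and that averages are taken over the articles both users rated. The fallback value of 0.0 for too little overlap or constant ratings is an assumption of this sketch, not something the slides specify.

```python
from math import sqrt

# Ratings from the example table (article number -> score); unrated articles omitted.
ratings = {
    "Ken": {1: 1, 2: 5, 4: 2, 5: 4},
    "Lee": {1: 4, 2: 2, 4: 5, 5: 1, 6: 2},
    "Meg": {1: 2, 2: 4, 3: 3, 6: 5},
    "Nan": {1: 2, 2: 4, 4: 5, 5: 1},
}

def pearson(a, b):
    """Pearson correlation over the articles rated by both users."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0  # assumption: too little overlap counts as no correlation
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: constant ratings carry no signal
    return cov / sqrt(var_x * var_y)

for other in ("Lee", "Meg", "Nan"):
    print(other, round(pearson(ratings["Ken"], ratings[other]), 2))
```

The normalisation by the number of articles cancels between numerator and denominator, so the sketch works with plain sums.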

slide-21
SLIDE 21

Predicting Scores

Correlation coefficients of Ken:

User   Correlation
Lee    -0.8
Meg    +1
Nan    0

slide-22
SLIDE 22

Predicting Scores

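  • Weighted average of all ratings on article 6
  • Ken's prediction is 4.56

$$ K_{6,\mathrm{Prediction}} = \mu_K + \frac{\sum_{J \in \mathrm{Raters} \setminus \{\mathrm{Ken}\}} (J_6 - \mu_J) \cdot r_{KJ}}{\sum_{J \in \mathrm{Raters} \setminus \{\mathrm{Ken}\}} |r_{KJ}|} $$

$J_6$ = rating of user J on article 6, $\mu_J$ = average of J's ratings, $r_{KJ}$ = correlation coefficient between Ken and J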
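A sketch of the weighted-average prediction with the example values plugged in. It assumes that each rater's average is taken over the articles shared with Ken (both come out as 3.0), which is the reading that reproduces 4.56. The fallback to the user's own average when no weights are available is an assumption of this sketch.

```python
def predict(target_mean, neighbours):
    """Weighted-average prediction for one article.

    neighbours: list of (rating_on_article, rater_mean, correlation_with_target).
    """
    weighted = sum((rating - mean) * r for rating, mean, r in neighbours)
    norm = sum(abs(r) for _, _, r in neighbours)
    if norm == 0:
        return target_mean  # assumption: fall back to the user's own average
    return target_mean + weighted / norm

ken_mean = 3.0  # average of Ken's ratings 1, 5, 2, 4
neighbours = [
    (2, 3.0, -0.8),  # Lee: rated article 6 with 2, correlation -0.8
    (5, 3.0, 1.0),   # Meg: rated article 6 with 5, correlation +1
]

prediction = predict(ken_mean, neighbours)
print(round(prediction, 2))
```

Lee's below-average rating counts in favour of the article, because Lee's taste is negatively correlated with Ken's.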

slide-23
SLIDE 23

Scaling Issues

Relevant performance measures:

  • Prediction quality
  • Compute time and disk storage

Storage:

  • A rating is small, but each article may be rated by many users
  • Volume of ratings could exceed volume of news

slide-24
SLIDE 24

Scaling Issues

  • Pre-fetching ratings and pre-computing predictions keeps user time constant
  • High computational complexity
  • Volume of all ratings may exceed the storage capacity

Example: 100,000 users rate 10 articles per day, and each rating takes 100 bytes to store. That is 1 GB of storage per 10 days.
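The storage estimate can be checked with a few lines of arithmetic, assuming decimal units (1 GB = 10^9 bytes).

```python
USERS = 100_000
RATINGS_PER_USER_PER_DAY = 10
BYTES_PER_RATING = 100

bytes_per_day = USERS * RATINGS_PER_USER_PER_DAY * BYTES_PER_RATING
days_per_gb = 10**9 / bytes_per_day  # assumes 1 GB = 10^9 bytes

print(bytes_per_day / 10**6, "MB per day")
print(days_per_gb, "days per GB")
```

That is 100 MB of ratings per day, so one gigabyte fills up in ten days.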


slide-25
SLIDE 25

Cluster Models


slide-26
SLIDE 26

Cluster Models

  • Better online scalability and performance than classical collaborative filtering
  • Complex and extensive clustering is run offline
  • Prediction quality is reduced
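A rough sketch of the offline/online split, assuming the clustering itself has already been done and only its result (a user-to-cluster mapping) is available. The users, items, scores and the choice of a per-cluster average as the prediction are all hypothetical.

```python
from collections import defaultdict

# Hypothetical output of an offline clustering step: user -> cluster id.
cluster_of = {"ann": 0, "bob": 0, "cat": 1, "dan": 1}

# Hypothetical ratings: (user, item) -> score.
ratings = {
    ("ann", "x"): 5, ("bob", "x"): 4, ("cat", "x"): 1,
    ("bob", "y"): 3, ("dan", "y"): 2,
}

def build_cluster_averages(cluster_of, ratings):
    """Offline: average rating per (cluster, item)."""
    totals = defaultdict(lambda: [0.0, 0])
    for (user, item), score in ratings.items():
        entry = totals[(cluster_of[user], item)]
        entry[0] += score
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

def predict(user, item, cluster_of, averages):
    """Online: one dictionary lookup; None if the cluster never rated the item."""
    return averages.get((cluster_of[user], item))

averages = build_cluster_averages(cluster_of, ratings)
print(predict("ann", "y", cluster_of, averages))
```

All members of a cluster receive the same prediction in this sketch, which illustrates where the loss in prediction quality can come from.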


slide-27
SLIDE 27

Item-to-Item Collaborative Filtering


slide-28
SLIDE 28

Item-to-Item Collaborative Filtering

  • Amazon.com uses recommendation algorithms extensively
  • 10,000,000 products and customers
  • Results returned in real time (< 0.5 s)
  • Algorithm must respond immediately to new information

slide-29
SLIDE 29

Amazon.com


slide-30
SLIDE 30

Amazon.com


slide-31
SLIDE 31

Amazon.com


slide-32
SLIDE 32

Amazon.com


slide-33
SLIDE 33

How It Works - Offline

  • Build a similar-items table
  • Calculate the similarity between a single product and all related products
  • Complexity: O(mn^2) in the worst case, in practice closer to O(mn)
  • m: number of users, n: number of items
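A sketch of how such a table could be built offline, assuming binary purchase data and cosine similarity between item purchase vectors. The customers, items and the function name `build_similar_items` are hypothetical, and this is not claimed to be Amazon's actual implementation.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical purchase histories: customer -> set of items.
purchases = {
    "c1": {"book", "lamp"},
    "c2": {"book", "lamp", "pen"},
    "c3": {"book", "pen"},
    "c4": {"lamp"},
}

def build_similar_items(purchases):
    """Offline step: item -> list of (other item, similarity), best first."""
    buyers = defaultdict(int)    # item -> number of customers who bought it
    together = defaultdict(int)  # (item_a, item_b) -> customers who bought both
    for items in purchases.values():
        for item in items:
            buyers[item] += 1
        for a, b in combinations(sorted(items), 2):
            together[(a, b)] += 1

    table = defaultdict(list)
    for (a, b), both in together.items():
        # Cosine similarity of two binary purchase vectors.
        sim = both / sqrt(buyers[a] * buyers[b])
        table[a].append((b, sim))
        table[b].append((a, sim))
    for item in table:
        table[item].sort(key=lambda pair: pair[1], reverse=True)
    return dict(table)

table = build_similar_items(purchases)
print(table["book"])
```

Only item pairs that some customer actually bought together are touched, which is one reason the practical cost can stay well below the worst case when most customers have few purchases.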


slide-34
SLIDE 34

How It Works - Online

  • Given a similar-items table
  • Find all items similar to each of the user's ratings and purchases
  • Aggregate those items
  • Recommend the most popular and correlated items
  • Number of users has no effect on performance
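The online steps can be sketched as a lookup plus an aggregation. The table contents are made up, and summing similarity scores is only one assumed way to combine "popular and correlated".

```python
from collections import defaultdict

# Hypothetical precomputed table: item -> [(similar item, similarity)].
similar_items = {
    "book": [("pen", 0.8), ("lamp", 0.7)],
    "pen": [("book", 0.8), ("ink", 0.6)],
    "lamp": [("bulb", 0.9), ("book", 0.7)],
}

def recommend(user_items, similar_items, top_n=3):
    """Aggregate items similar to what the user already has, best first."""
    scores = defaultdict(float)
    for item in user_items:
        for other, sim in similar_items.get(item, []):
            if other not in user_items:  # do not recommend owned items
                scores[other] += sim     # assumption: aggregate by summing
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]

print(recommend({"book", "pen"}, similar_items))
```

The work depends on how many items the user has and how long the similar-items lists are, not on how many other users exist.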


slide-35
SLIDE 35

General difficulties

  • Cold start
  • Self-fulfilling prophecy
  • Recommendations for groups
  • Evaluation of recommendation systems

slide-36
SLIDE 36

Conclusion

  • Effective form of targeted marketing
  • Mostly used in e-commerce business
  • But can be used whenever the signal-to-noise ratio is too low

slide-37
SLIDE 37

Questions?


slide-38
SLIDE 38

References

GroupLens: An Open Architecture for Collaborative Filtering of NetNews
Published 1994

  • Paul Resnick, MIT Center for Coordination Science
  • Neophytos Iacovou, University of Minnesota
  • Mitesh Suchak, MIT Center for Coordination Science
  • Peter Bergstrom, University of Minnesota
  • John Riedl, University of Minnesota

Amazon.com Recommendations
Published 2003

  • Greg Linden
  • Brent Smith
  • Jeremy York