SLIDE 1

ClustKNN: A Highly Scalable Hybrid Model- & Memory-Based CF Algorithm

Al Mamunur Rashid, Shyong K. Lam, George Karypis, and John Riedl

University of Minnesota

SLIDE 2

Al Mamunur Rashid, WebKDD 2006 2

Problem Domain

  • Collaborative filtering (CF)-based recommender systems (RS).

  • Issue:

− Scalability

SLIDE 3

Background: Why Recommender Systems?

Information overload:

  • More than 1.3 million articles!
  • About 50 million blogs!
  • About 130 million photos!

SLIDE 4

Background: Why Recommender Systems?

  • One solution:

− Recommender systems

Tools that suggest items of interest based on

  • Users’ expressed preferences
  • Observed behaviors
  • Information about the items

Collaborative Filtering

  • Recommendations based on like-minded users

SLIDE 5

Many CF Algorithms So Far…

  • Most of the early ones: kNN

− GroupLens (1994), Ringo (1995)

  • View it as a special regression problem.

− Nearly all statistical and ML approaches can be applied!

  • Classification by Breese et al. (1998):

Memory-based CF vs. model-based CF, compared on:

  • Simplicity
  • Training cost
  • Online prediction cost
  • Adding new information

SLIDE 6

Many CF Algorithms So Far…

  • Accuracy:

− So far the main focus

However, how much of a difference in accuracy do users actually perceive?

  • Does it scale though?

SLIDE 7

User-based kNN CF Algorithm

  • Classic memory-based CF
  • Assumption:

− Linear relationship between two users’ preferences

User similarities measured by the Pearson correlation coefficient.

  • Works very well

− Very good accuracy, and explainable to general users.

  • Problem: Doesn’t scale!

− O(mn) online cost (n users, m items)
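
A minimal sketch of the user-based kNN prediction outlined on this slide. It assumes ratings live in a hypothetical dict of dicts (user → item → rating). The slide names only the Pearson correlation; the mean-offset weighted average used for the final prediction is a common choice for this family of algorithms and is an assumption here, as are the names pearson, predict, and num_neighbors.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over the items that both users rated."""
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def predict(ratings, user, item, num_neighbors=20):
    """Predict a rating from the most similar users who rated the item."""
    target = ratings[user]
    target_mean = sum(target.values()) / len(target)
    # Online step: the target is compared with every other user (up to n),
    # and each comparison can touch up to m items, hence the O(mn) cost.
    scored = []
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = pearson(target, other_ratings)
        if sim > 0:  # assumption: keep positively correlated users only
            scored.append((sim, other_ratings))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:num_neighbors]
    if not top:
        return target_mean
    # Weighted average of each neighbor's offset from their own mean rating.
    num = sum(s * (r[item] - sum(r.values()) / len(r)) for s, r in top)
    den = sum(s for s, _ in top)
    return target_mean + num / den
```

The loop over all users is the part that does not scale: it runs at prediction time and grows with the user base.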

SLIDE 8

ClustKNN: Proposed Approach

  • Retain good properties of User-based kNN
  • Make it scale
  • Online cost: O(km) ≅ O(m)

− (k ≪ m, k ≪ n)

n users → (bisecting k-means clustering) → k clusters → (take the k centroids) → k surrogate users
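
A rough sketch of the idea on this slide: build k centroids offline, then run the kNN prediction against those k surrogate users instead of all n users. The cluster assignment is taken as given (a hypothetical dict user → cluster index), and the centroid definition (per-item average of the ratings from the cluster's users) and the mean-offset prediction formula are assumptions, as are all function names.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over the items rated in both profiles."""
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    return cov / sqrt(var_a * var_b) if var_a and var_b else 0.0


def build_centroids(ratings, assignment, k):
    """Offline step: one surrogate user (item -> mean rating) per cluster."""
    sums = [{} for _ in range(k)]
    counts = [{} for _ in range(k)]
    for user, user_ratings in ratings.items():
        c = assignment[user]
        for item, r in user_ratings.items():
            sums[c][item] = sums[c].get(item, 0.0) + r
            counts[c][item] = counts[c].get(item, 0) + 1
    return [{i: sums[c][i] / counts[c][i] for i in sums[c]} for c in range(k)]


def predict_from_centroids(centroids, target, item, num_neighbors=2):
    """Online step: k similarity computations instead of n."""
    target_mean = sum(target.values()) / len(target)
    scored = []
    for centroid in centroids:
        if item in centroid:
            sim = pearson(target, centroid)
            if sim > 0:  # assumption: keep positively correlated centroids
                scored.append((sim, centroid))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:num_neighbors]
    if not top:
        return target_mean
    num = sum(s * (c[item] - sum(c.values()) / len(c)) for s, c in top)
    den = sum(s for s, _ in top)
    return target_mean + num / den
```

Only the centroids and the target user's own ratings are needed at prediction time, which is where the O(km) online cost comes from.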

SLIDE 9

ClustKNN: Proposed Approach

  • Bisecting k-means clustering

− Better k-means

Cluster sizes are more uniform

Better results found in document clustering (Steinbach 2000)

  • Similarity function:

− Same in both cluster-building and CF

− The two nicely complement each other
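
A small sketch of bisecting k-means, under simplifying assumptions that differ from the slides: users are dense rating vectors, and squared Euclidean distance stands in for the CF similarity function that the slide says is used for cluster-building. Splitting the largest cluster first and the deterministic seeding are also assumptions; the slides do not say how the cluster to split is chosen.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def two_means(points, iterations=10):
    """Split points into two groups with plain 2-means."""
    c0 = points[0]                                   # first seed
    c1 = max(points, key=lambda p: sq_dist(p, c0))   # farthest point from it
    left, right = list(points), []
    for _ in range(iterations):
        left = [p for p in points if sq_dist(p, c0) <= sq_dist(p, c1)]
        right = [p for p in points if sq_dist(p, c0) > sq_dist(p, c1)]
        if not left or not right:
            break
        c0, c1 = mean(left), mean(right)
    return left, right


def bisecting_kmeans(points, k):
    """Start with one cluster and keep bisecting until there are k."""
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len)
        biggest = clusters.pop()        # assumption: split the largest cluster
        if len(biggest) < 2:
            clusters.append(biggest)
            break
        left, right = two_means(biggest)
        if not left or not right:       # could not be split any further
            clusters.append(biggest)
            break
        clusters.extend([left, right])
    return clusters
```

The k centroids would then be the means of the returned clusters, for example [mean(c) for c in bisecting_kmeans(points, k)].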

SLIDE 10

Other Algorithms Considered

SLIDE 11

Time-complexities

SLIDE 12

Experiments: Datasets

  • Movie recommendation data from

SLIDE 13

Experiments: Evaluation Metrics

  • Prediction eval metrics

− NMAE

Divide MAE by the expected MAE

Limitation:

  • Same value of error: same treatment

No difference between the two (pred, actual) pairs (5, 2) and (2, 5)

− Expected Utility (EU)

  • Recommendation list eval metrics

− Precision-recall-F1
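
A short sketch of NMAE as described above, illustrating the stated limitation. The slide does not say how the expected MAE is obtained; here it is assumed to be the MAE when predictions and actual ratings are independent and uniformly distributed over the rating scale. The function names and the 1-to-5 scale in the usage note are hypothetical.

```python
def mae(pairs):
    """Mean absolute error over (predicted, actual) rating pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


def expected_mae(scale):
    """MAE expected if predictions and ratings were independent and uniform
    over the rating scale (an assumption, not taken from the slides)."""
    return sum(abs(p - a) for p in scale for a in scale) / len(scale) ** 2


def nmae(pairs, scale):
    """MAE divided by the expected MAE for the given rating scale."""
    return mae(pairs) / expected_mae(scale)
```

On a 1-to-5 scale, nmae([(5, 2)], range(1, 6)) and nmae([(2, 5)], range(1, 6)) are equal, even though the first is a false positive (a poor item predicted as excellent) and the second a false negative. That symmetry is the limitation the slide points out.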

SLIDE 14

Evaluation Metric: EU

  • Two tables:

− A contingency table

Rows: predictions; columns: actual ratings

− A utility table

Filled with a linear utility function

Penalizes false positives more than false negatives

SLIDE 15

Results

[Figure: two plots comparing ClustKNN with User-based KNN as the number of clusters in the model varies from 20 to 500. One plots NMAE (axis range 0.425 to 0.47); the other plots Expected Utility (axis range 6 to 7).]

SLIDE 16

Results: Prediction Accuracy

SLIDE 17

Results: Recommendation List

SLIDE 18

ClustKNN: Discussion

  • Scalable!
  • Simple and explainable
  • Hybrid of model- and memory-based approaches
  • Great for occasionally-connected, low-storage devices!

− Memory requirement: only O(km + m)!

SLIDE 19

Thanks for listening!

Questions?