
SLIDE 1

On the Effectiveness of Linear Models for One-Class Collaborative Filtering

Suvash Sedhain¹,², Aditya Menon²,¹, Scott Sanner³,¹, Darius Braziunas⁴

¹Australian National University  ²NICTA  ³Oregon State University  ⁴Rakuten Kobo Inc.

SLIDE 2

Recommender Systems

  • Recommender Systems

– Objective: Present personalized items to users

  • Collaborative filtering

– De facto method for multi-user recommender systems
– Find people like you and leverage their preferences
– One-class: only positive feedback is observed

SLIDE 3

Sneak Peek: Model Proposal

  • Personalized, user-focused linear model
  • Convex
  • Embarrassingly parallel

– Each user's model is trained independently

SLIDE 4

State-of-the-art Collaborative Filtering

  • Neighborhood methods
  • Matrix Factorization
  • SLIM (Sparse Linear Method)
SLIDE 5

Nearest Neighbors: A Matrix View

[Figure: the user–item matrix (with "?" marking unobserved entries) multiplied by a similarity matrix yields predicted scores.]

  • {Jaccard, cosine} similarity S_ij used in practice
  • Keep only the top-k similarities
  • Simple, but learning is limited
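As a rough illustration of the matrix view above — a small binary user–item matrix R scored as R × S with top-k cosine item–item similarities; all names and values here are illustrative, not the paper's code:

```python
import numpy as np

def item_knn_scores(R, k=2):
    """Score items as R @ S, where S keeps the top-k cosine similarities."""
    norms = np.linalg.norm(R, axis=0, keepdims=True)
    norms[norms == 0] = 1.0
    Rn = R / norms                      # column-normalized matrix
    S = Rn.T @ Rn                       # cosine similarity between item columns
    np.fill_diagonal(S, 0.0)            # ignore self-similarity
    for j in range(S.shape[1]):         # keep only top-k entries per item
        idx = np.argsort(S[:, j])[:-k]
        S[idx, j] = 0.0
    return R @ S                        # predicted affinity scores

R = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1]], dtype=float)
scores = item_knn_scores(R)
```

The key limitation the slide points at is visible here: S comes from a fixed formula, so nothing in it is learned from data.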
SLIDE 6

Factorization Model

  • Works well in general, but non-convex!

(Weighted) Matrix Factorization

[Figure: R ≈ U × V, where U is the m × k user projection and V is the k × n item projection.]
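The factorization R ≈ U × V can be sketched with plain gradient descent on a toy binary matrix (unweighted; dimensions, step size, and iteration count are illustrative, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
R = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1]], dtype=float)
m, n, k = R.shape[0], R.shape[1], 2     # rank-k factorization

U = 0.1 * rng.standard_normal((m, k))   # m x k user projection
V = 0.1 * rng.standard_normal((k, n))   # k x n item projection

lr = 0.1
for _ in range(500):
    E = U @ V - R                       # reconstruction error
    # Simultaneous gradient step on both factors (the non-convex part:
    # the objective is only convex in U or V separately).
    U, V = U - lr * (E @ V.T), V - lr * (U.T @ E)
```

The joint objective over (U, V) is non-convex, which is exactly the drawback the slide flags.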

SLIDE 7

SLIM

[Figure: SLIM learns an item × item weight matrix W such that the user × item matrix R satisfies R × W ≈ R.]

  • Effectively trying to learn item-to-item similarities
  • Not user-focused; complicated optimization
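A per-item SLIM sketch, assuming scikit-learn's ElasticNet (with positive=True standing in for the non-negativity constraints); hyperparameters are illustrative:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

R = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1]], dtype=float)
n_items = R.shape[1]
W = np.zeros((n_items, n_items))        # item-to-item weight matrix

for j in range(n_items):
    X = R.copy()
    X[:, j] = 0.0                       # exclude item j itself (forces w_jj = 0)
    model = ElasticNet(alpha=0.01, l1_ratio=0.5, positive=True,
                       fit_intercept=False, max_iter=10000)
    model.fit(X, R[:, j])               # reconstruct column j from the others
    W[:, j] = model.coef_

scores = R @ W                          # predicted scores
```

Each column of W is a separate constrained regression, which is why the slide calls the optimization complicated compared to an unconstrained per-user fit.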

SLIDE 8

Recommender Systems Desiderata

  • Learning based
  • Convex objective
  • User focused
  • Parallelizable
SLIDE 9

Comparison of recommendation methods for OC-CF

SLIDE 10

Outline

  • Problem statement
  • Background
  • LRec Model
  • Experiments
  • Results
  • Summary
SLIDE 11

LRec

Recommendation = learning a model per user

[Figure: Rᵀ × w_{u₁} = r_{u₁} — the transposed user–item matrix times user u₁'s weight vector reproduces u₁'s preference column.]

Any loss function:

  • Squared
  • Logistic

  • Each item is a training instance
  • Can be interpreted as learning user–user affinities
  • Regularizer prevents the trivial solution
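The per-user training loop can be sketched as follows, using scikit-learn's LogisticRegression as a stand-in for the LIBLINEAR solver discussed on a later slide; the matrix and the C value are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

R = np.array([[1, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]], dtype=float)
m = R.shape[0]
X = R.T                                 # items x users: each item is an instance
W = np.zeros((m, m))                    # learned user-user affinities

for u in range(m):                      # each model is independent => parallel
    y = R[u]                            # user u's one-class feedback as labels
    clf = LogisticRegression(C=1.0, fit_intercept=False)
    clf.fit(X, y)                       # L2 regularizer rules out the trivial
    W[u] = clf.coef_.ravel()            # solution w = e_u (copying one's own column)

scores = W @ R                          # score of each item for each user
```

Each iteration of the loop is one small, convex logistic regression, which is what makes the whole scheme embarrassingly parallel.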

SLIDE 12

Properties of LRec

  • User focused

– Recommendation as learning a model per user

  • Convex objective

– Guarantees the optimal solution for the formulation

  • Embarrassingly parallel

– Each model is completely independent of the others

SLIDE 13

Relationship with Existing Models

LRec

  • User-focused
  • L2 penalty
  • Optimization: L2 loss, or logistic loss via LIBLINEAR (dual formulation iff #users ≫ #items)

SLIM

  • Item-focused
  • Elastic-net penalty + non-negativity constraints
  • Optimization: coordinate descent
  • Levy et al. relaxed the non-negativity constraints; optimization via SGD with truncated gradient

SLIDE 14

Relationship with Existing Models

LRec

  • Learns the weight matrix via a classification/regression problem
  • Can be interpreted as learning user–user similarities

Neighborhood models

  • Computes similarities using predefined similarity metrics (e.g., cosine, Jaccard)

SLIDE 15

Relationship with Existing Models

LRec

  • Learns the weight matrix via a classification/regression problem
  • Can be interpreted as learning user–user similarities
  • Convex objective
  • Full rank
  • Embarrassingly parallel

Matrix Factorization

  • Non-convex objective
  • Low rank
  • Parallelism via distributed communication
SLIDE 16

Other Advantages of LRec

  • Efficient hyper-parameter tuning for ranking

– Validate on a small subset of users

  • Model can be fine-tuned per user
SLIDE 17

Other Advantages of LRec: Incorporating Side Information

[Figure: the per-item feature matrix is augmented with columns of item features such as genre and actors before the per-user regression.]
  • Can easily incorporate abundant item-side information
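A sketch of how item-side columns could be appended before the per-user fit — the feature matrix F (genre-style indicators) and all values are hypothetical, and scikit-learn's LogisticRegression again stands in for LIBLINEAR:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

R = np.array([[1, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]], dtype=float)
# Hypothetical binary item features (e.g. genre indicators), one row per item.
F = np.array([[1, 0],
              [0, 1],
              [1, 1],
              [0, 0]], dtype=float)

X = np.hstack([R.T, F])                 # items x (users + item features)
W = np.zeros((R.shape[0], X.shape[1]))  # per-user weights over the wider input

for u in range(R.shape[0]):
    clf = LogisticRegression(C=1.0, fit_intercept=False)
    clf.fit(X, R[u])                    # same per-user fit, richer features
    W[u] = clf.coef_.ravel()

scores = X @ W.T                        # items x users score matrix
```

Nothing about the per-user formulation changes; the side information simply widens each item's feature vector.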
SLIDE 18

Outline

  • Problem statement
  • Background
  • LRec Model
  • Experiment & Results
  • Summary
SLIDE 19

Dataset Description and Evaluation

  • Movielens 1M (ML1M)
  • Kobo
  • Last FM (LASTFM)
  • Million Song Dataset (MSD)

Evaluation Metrics

  • precision@k
  • mean Average Precision@100 (mAP@100)
  • 10 random 80%–20% train–test splits
  • For MSD, we evaluate on 500 random users
  • Error bars denote 95% confidence intervals
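The two ranking metrics can be sketched directly from their definitions (item lists here are illustrative; `k` caps the evaluation at the top-k positions):

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k

def average_precision_at_k(ranked, relevant, k=100):
    """Average of precision@i over ranks i where a relevant item appears."""
    score, hits = 0.0, 0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / min(len(relevant), k) if relevant else 0.0

ranked = ["a", "b", "c", "d"]           # ranked recommendations for one user
relevant = {"a", "c"}                   # held-out positives
p = precision_at_k(ranked, relevant, 2) # → 0.5 ("a" hits, "b" misses)
```

mAP@100 as reported on the slides is then this average precision, averaged again over users.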
SLIDE 20

Experiment Setup

  • Baselines

– Most Popular – Neighborhood

  • User KNN (U-KNN)
  • Item KNN (I-KNN)

– Matrix Factorization

  • PureSVD
  • WRMF
  • LogisticMF
  • Bayesian Personalized Ranking (BPR)

  • SLIM
  • LRec

– Elastic-net LRec + non-negativity (LRec + Sq + L1 + NN)
– Squared-loss LRec (LRec + Sq)
– Logistic-loss LRec (LRec)

SLIDE 21

Results

[Chart not reproduced; some baselines did not finish.]

SLIDE 22

Results

Precision@20 on the ML1M and LastFM datasets

SLIDE 23

Results

[Chart not reproduced; some baselines did not finish.]

Precision@20 on the Kobo and LastFM datasets

SLIDE 24

Performance Evaluation

% improvement over WRMF on the ML1M dataset

Users segmented by number of observations

SLIDE 25

Case Study

Recommendations from WRMF vs. LRec: LRec is more personalized

SLIDE 26

Summary

  • LRec

– Personalized, user-focused linear recommender
– Convex objective
– Embarrassingly parallel

  • Future work

– Further scale LRec

  • Computational cost
  • Memory footprint
SLIDE 27

Thanks