Hybrid algorithms for recommending new items

2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011), Chicago, IL (USA), 27 October 2011. Hybrid algorithms for recommending new items. http://dx.doi.org/10.1145/2039320.2039325


slide-1
SLIDE 1

Hybrid algorithms for recommending new items

2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011)

Chicago, IL (USA) – 27 October 2011

http://dx.doi.org/10.1145/2039320.2039325

Roberto Turrin – Moviri, R&D
Paolo Cremonesi – Politecnico di Milano
Fabio Airoldi – Moviri, R&D
slide-2
SLIDE 2

...in a nutshell

  • Hybrid algorithms
  • Real-domain requirements
      • scalability
      • modularity
      • many unrated items
  • New-item stressing experiments
  • Datasets
      • Private TV dataset
      • MovieLens

Credits: http://dpaki.com/?p=2591
slide-3
SLIDE 3

Traditional recommender systems

Collaborative (CF)

Pros
  • High quality

Cons
  • New items problem (since they do not have ratings)
  • Popularity bias

Content-based (CBF)

Pros
  • Work on new items

Cons
  • Low quality (since user ratings are ignored)
  • Profile overfitting

R. Turrin, P. Cremonesi, F. Airoldi - Hybrid algorithms for recommending new items
slide-4
SLIDE 4

...so CF or CBF? ...many variables

[Figure: quality of CF and CBF over time, from a new system to a mature system]

slide-5
SLIDE 5

TV domain: new items

  • The EPG is characterized by many unrated, new TV programs
  • The percentage of new items cannot be neglected
slide-6
SLIDE 6

Existing hybrid algorithms

Several hybrid algorithms mix CF and CBF (but also demographic and social data), e.g.:

  • P. Melville, R. J. Mooney, and R. Nagarajan. “Content-boosted collaborative filtering for improved recommendations”, 2002
  • B. Mobasher, X. Jin, and Y. Zhou. “Semantically Enhanced Collaborative Filtering on the Web”, 2003

Pros
  • Some approaches show better quality than CF/CBF

Cons
  • Low scalability / no real-time recommendations
  • Only partial focus on the new-item problem
  • Not working with implicit, binary ratings
slide-7
SLIDE 7

Our hybrid algorithms

GOALS
  • New-item recommendation
  • Quality comparable to collaborative filtering

REQUIREMENTS
  • Batch/real-time scalability/complexity
  • Updated recommendations
  • Modularity: ability to reuse existing CF and CBF algorithms
  • Implicit/explicit ratings
slide-8
SLIDE 8

Main contributions

  • Two hybrid algorithms:
      • extension of the SimComb algorithm
      • introduction of a new hybrid algorithm
  • New-item stressing evaluation
slide-9
SLIDE 9

STATE-OF-THE-ART RECOMMENDER ALGORITHMS

slide-10
SLIDE 10

Collaborative algorithms

User Rating Matrix (URM)

  • Entry (u, i) is the rating given by user u to item i
  • In an implicit dataset it is either 1 or 0

Implemented strategies:

Item-item neighborhood-based (NNCos)

  • Recommendations are based on item-item similarities computed with the cosine metric

Latent factor models (PureSVD)

  • Recommendations are based on hidden factors implicitly discovered by means of a matrix factorization (SVD)
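The two collaborative strategies can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the function names, the tiny URM, and the choice to zero the diagonal of the similarity matrix are not taken from the slides. The PureSVD scoring uses the usual formulation in which a user's scores are the user's rating row projected onto the top item factors.

```python
import numpy as np

def cosine_item_similarity(urm):
    """Cosine similarity between the item columns of a users x items URM."""
    norms = np.linalg.norm(urm, axis=0)
    norms[norms == 0] = 1.0            # items with no ratings get all-zero similarities
    sim = (urm.T @ urm) / np.outer(norms, norms)
    np.fill_diagonal(sim, 0.0)         # assumption: an item is not its own neighbour
    return sim

def pure_svd_scores(urm, n_factors):
    """Score all (user, item) pairs with a truncated SVD of the URM."""
    _, _, vt = np.linalg.svd(urm, full_matrices=False)
    q = vt[:n_factors].T               # items x factors
    return urm @ q @ q.T               # project each user's ratings onto the factors

# Hypothetical implicit URM: 3 users x 4 items, entries are 1 or 0.
urm = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 0, 1, 1]], dtype=float)

sim = cosine_item_similarity(urm)
scores = pure_svd_scores(urm, n_factors=2)
```

Items 0 and 1 are rated by exactly the same users, so their cosine similarity is 1, while items 0 and 3 share no users and get similarity 0.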
slide-11
SLIDE 11

Content-based algorithm

Item-content matrix (ICM)

  • Entry (f, i) is the weight of feature f in item i, computed as TF-IDF
  • Example of features: genre, actors, directors, …

LSA (Latent Semantic Analysis)

  • The ICM is factorized by means of SVD in order to discover latent semantics
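A minimal sketch of this content-based pipeline follows, assuming a features x items ICM as in the slide. The specific TF-IDF variant (normalized term frequency times log inverse document frequency), the function names, and the toy feature counts are assumptions made for illustration.

```python
import numpy as np

def tfidf_icm(counts):
    """Build a TF-IDF weighted ICM from a features x items count matrix."""
    n_items = counts.shape[1]
    tf = counts / np.maximum(counts.sum(axis=0, keepdims=True), 1)
    df = np.count_nonzero(counts, axis=1)          # items containing each feature
    idf = np.log(n_items / np.maximum(df, 1))
    return tf * idf[:, None]

def lsa_item_factors(icm, n_factors):
    """Factorize the ICM with SVD and return an items x factors matrix."""
    _, s, vt = np.linalg.svd(icm, full_matrices=False)
    return (s[:n_factors, None] * vt[:n_factors]).T

def cosine_rows(x):
    """Cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / np.where(norms == 0, 1.0, norms)
    return x @ x.T

# Hypothetical counts: 4 features (rows) x 3 items (columns).
counts = np.array([[1, 1, 0],    # e.g. a genre shared by items 0 and 1
                   [0, 0, 1],    # a genre only item 2 has
                   [1, 1, 0],    # an actor shared by items 0 and 1
                   [1, 1, 1]],   # a feature every item has (IDF = 0)
                  dtype=float)

icm = tfidf_icm(counts)
cbf_sim = cosine_rows(lsa_item_factors(icm, n_factors=2))
```

The resulting `cbf_sim` is an item-item similarity matrix that needs no ratings at all, which is why a content-based model still works on new items.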
slide-12
SLIDE 12

Hybrid algorithms

Interleaved (INTL)

  • Trivial hybrid implementation where the final recommendation list is formed by alternating items recommended by the CF algorithm with items recommended by the CBF algorithm

CF list: Item A, Item B, Item C
CBF list: Item Z, Item Y, Item X
Hybrid list: Item A, Item Z, Item B, Item Y

SimComb [Mobasher et al. 2004]

  • Two item-item similarity matrices are computed and linearly combined

[Figure: CF item-item similarities and CBF item-item similarities, weighted by (1-α) and α, sum to the HYBRID item-item similarities]
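Both baseline hybrids are simple enough to sketch directly. The function names are hypothetical, skipping duplicates while interleaving is an assumption, and so is the choice of which matrix receives weight (1-α) and which receives α, since the flattened slide does not make that assignment unambiguous.

```python
from itertools import chain, zip_longest

import numpy as np

def interleave(cf_list, cbf_list, n):
    """Alternate CF and CBF recommendations, starting with CF."""
    merged, seen = [], set()
    for item in chain.from_iterable(zip_longest(cf_list, cbf_list)):
        # zip_longest pads the shorter list with None; skip padding and repeats
        if item is not None and item not in seen:
            seen.add(item)
            merged.append(item)
    return merged[:n]

def simcomb(cf_sim, cbf_sim, alpha):
    """Linear combination of two item-item similarity matrices.

    Assumption: (1 - alpha) weights the CF matrix and alpha the CBF matrix.
    """
    return (1.0 - alpha) * cf_sim + alpha * cbf_sim

hybrid_list = interleave(["Item A", "Item B", "Item C"],
                         ["Item Z", "Item Y", "Item X"], n=4)

cf_sim = np.array([[0.0, 0.8], [0.8, 0.0]])
cbf_sim = np.array([[0.0, 0.2], [0.2, 0.0]])
hybrid_sim = simcomb(cf_sim, cbf_sim, alpha=0.25)
```

With the lists from the slide, `hybrid_list` reproduces the hybrid list shown there: Item A, Item Z, Item B, Item Y.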
slide-13
SLIDE 13

PROPOSED HYBRID ALGORITHMS

  • FFA (Filtered Feature Augmentation)
  • SIMinjKnn (Similarity Injection Knn)
slide-14
SLIDE 14

Collaborative filtering as main brick

CF

We trust CF recommendations when the model has been trained with “enough” information (i.e., ratings)

CBF

We add CBF-based data (i.e., ratings) to better train the CF when not enough information is available

slide-15
SLIDE 15

Collaborative filtering as main brick

CF

We trust CF recommendations when the model has been trained with “enough” information (i.e., ratings)

CBF

We add CBF-based data (i.e., features) to better train the CF when not enough information is available

slide-16
SLIDE 16

Item-item model

[Figure: KNN item-item similarity matrix, indexed by items i and j]

A number of recommendation algorithms (both CF and CBF) can compute item-item similarities.

slide-17
SLIDE 17

Item-item model: real-time recommendations

[Figure: the user ratings are combined with the KNN item-item similarity matrix (items i and j) to fill in the unknown scores, shown as "? ? ? ?"]

slide-18
SLIDE 18

Item-item model: real-time recommendations

[Figure: the user ratings are multiplied by the MODEL (the KNN item-item similarity matrix) to fill in the unknown scores, shown as "? ? ? ?"]

Real-time requirements:

  • Memory: K * #items
  • Time: f(#ratings,K) * #items

  • Use of existing algorithms
  • Updated recommendations
  • Implicit/explicit ratings
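The real-time step of an item-item model can be sketched as follows: the model keeps only K neighbours per item, and a user's scores come from multiplying the current rating vector by that model. The function names, the toy similarities, and the choice to exclude already-rated items are assumptions; a dense array is used only for brevity, whereas the K * #items memory bound presumes a sparse representation.

```python
import numpy as np

def keep_top_k(sim, k):
    """Keep, for each item (column), only its k largest similarities."""
    model = np.zeros_like(sim)
    for j in range(sim.shape[1]):
        top = np.argsort(sim[:, j])[-k:]       # indices of the k largest values
        model[top, j] = sim[top, j]
    return model

def recommend(user_ratings, model, n):
    """Score all items from the user's current ratings and return the top n."""
    scores = user_ratings @ model
    scores[user_ratings > 0] = -np.inf         # assumption: skip already-rated items
    return [int(i) for i in np.argsort(-scores)[:n]]

# Hypothetical item-item similarities for 4 items.
sim = np.array([[0.0, 0.9, 0.1, 0.0],
                [0.9, 0.0, 0.2, 0.1],
                [0.1, 0.2, 0.0, 0.8],
                [0.0, 0.1, 0.8, 0.0]])

model = keep_top_k(sim, k=2)
user_ratings = np.array([1.0, 0.0, 0.0, 0.0])  # the user has rated only item 0
top_item = recommend(user_ratings, model, n=1)
```

Because the model is independent of any single user profile, a new rating changes only `user_ratings`, so recommendations can be refreshed without retraining.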
slide-19
SLIDE 19

Filtered Feature Augmentation (FFA)

Motivation

  • Pseudo-ratings model new items
  • Less sparse item profiles

Idea: add pseudo-ratings to the item profiles

[Diagram blocks: CONTENT, CBF, Filter, RATINGS, CF, Model]

slide-20
SLIDE 20

Filtered Feature Augmentation (FFA)

Motivation

  • Pseudo-ratings model new items
  • Less sparse item profiles

Idea: add pseudo-ratings to the item profiles

[Diagram blocks: CONTENT, CBF, Filter, RATINGS, CF, Model; labels: predicted ratings, entropy-based filtering (e.g., Gini impurity measure)]
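The slides do not say how the filter uses the Gini measure, so the following is only one possible reading, not the authors' implementation. As an assumption, a content-based predictor estimates for each (user, item) pair a probability-like score from content similarities, and a pseudo-rating is added only when that score is positive and confident, i.e. when its Bernoulli Gini impurity 2p(1-p) is low. All names and the threshold are hypothetical.

```python
import numpy as np

def gini_impurity(p):
    """Gini impurity of a binary outcome with probability p: 2p(1-p)."""
    return 2.0 * p * (1.0 - p)

def ffa_augment(urm, cbf_sim, max_impurity=0.32):
    """Add filtered CBF pseudo-ratings to a binary users x items URM."""
    sim = cbf_sim.copy()
    np.fill_diagonal(sim, 0.0)
    # Normalize each item's content neighbours so that scores fall in [0, 1].
    weights = sim / np.maximum(sim.sum(axis=0, keepdims=True), 1e-12)
    p = urm @ weights                          # CBF predicted ratings
    confident = (p > 0.5) & (gini_impurity(p) <= max_impurity)
    augmented = urm.copy()
    augmented[confident & (urm == 0)] = 1.0    # pseudo-ratings fill empty cells only
    return augmented

# Hypothetical data: item 2 is new (no ratings) and is close in content to item 0.
urm = np.array([[1, 0, 0],
                [0, 1, 0],
                [1, 1, 0]], dtype=float)
cbf_sim = np.array([[1.0, 0.0, 0.9],
                    [0.0, 1.0, 0.1],
                    [0.9, 0.1, 1.0]])

augmented = ffa_augment(urm, cbf_sim)
```

The augmented matrix can then be handed to any existing CF algorithm unchanged, which matches the modularity requirement: the new item now has a non-empty profile.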

slide-21
SLIDE 21

Similarity Injection Knn (SIMinjKnn)

Motivation

  • Discovering relationships between new and old items

Idea: mixing CF and CBF similarities

[Diagram blocks: CONTENT, CBF, CBF Model, RATINGS, CF, CF Model, Combiner, Model]
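The slides name a combiner of the CF and CBF models without detailing it. The sketch below is one hypothetical interpretation: each item keeps its CF neighbours first, and any of its K neighbourhood slots left empty (all of them, for a new item) are filled with the most similar items according to CBF. The function name and the fill rule are assumptions, and no rescaling between CF and CBF similarity values is attempted.

```python
import numpy as np

def inject_similarities(cf_sim, cbf_sim, k):
    """Build a K-neighbour item-item model from CF, topped up with CBF."""
    n_items = cf_sim.shape[0]
    model = np.zeros_like(cf_sim)
    for j in range(n_items):
        # CF neighbours of item j, strongest first, positive similarities only.
        cf_nb = [i for i in np.argsort(-cf_sim[:, j])
                 if i != j and cf_sim[i, j] > 0][:k]
        for i in cf_nb:
            model[i, j] = cf_sim[i, j]
        free = k - len(cf_nb)
        if free > 0:
            # Inject CBF neighbours into the slots CF could not fill.
            cbf_nb = [i for i in np.argsort(-cbf_sim[:, j])
                      if i != j and cbf_sim[i, j] > 0 and i not in cf_nb][:free]
            for i in cbf_nb:
                model[i, j] = cbf_sim[i, j]
    return model

# Hypothetical data: item 2 is new, so it has no CF similarities at all.
cf_sim = np.array([[0.0, 0.7, 0.0],
                   [0.7, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])
cbf_sim = np.array([[0.0, 0.1, 0.9],
                    [0.1, 0.0, 0.2],
                    [0.9, 0.2, 0.0]])

model = inject_similarities(cf_sim, cbf_sim, k=2)
```

In this toy case the new item 2 ends up connected to items 0 and 1 purely through content, while items 0 and 1 keep their CF link and gain one content-based link each.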

slide-22
SLIDE 22

EVALUATION

slide-23
SLIDE 23

Datasets

ML: MovieLens 1M

  • ~6K users, ~3.9K items, 1M ratings

TV: an implicit, binary dataset collected from 15,000 IPTV users over a period of six months

  • ~15K users, ~800 rated items out of ~4K, ~26K ratings
  • Multilanguage (mainly German, French) content data
  • available at http://home.dei.polimi.it/cremones/memo/downloads/TV2.zip
slide-24
SLIDE 24

Testing methodology (1)

  • H1: set of existing items
  • H2: set of new items

Training set: extracted from H1

Test set:

  • (100-β)% existing items: extracted from H1
  • β% new items: extracted from H2

[Figure labels: Training set, Test set, Discarded ratings]
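The split can be sketched as below. The slides do not state how items are assigned to H2 or how large the test set is, so `new_items`, `test_size`, the random sampling, and the function name are all assumptions; only the β mixing rule and the fact that training uses H1 alone come from the slide.

```python
import random

def new_item_split(ratings, new_items, beta, test_size, seed=0):
    """Split (user, item) ratings into training, test, and discarded sets.

    beta is the percentage of test ratings that must fall on new items (H2).
    """
    rng = random.Random(seed)
    h1 = [r for r in ratings if r[1] not in new_items]   # existing items
    h2 = [r for r in ratings if r[1] in new_items]       # new items
    n_new = round(test_size * beta / 100)
    n_old = test_size - n_new
    if n_new > len(h2) or n_old > len(h1):
        raise ValueError("not enough ratings for the requested test set")
    rng.shuffle(h1)
    rng.shuffle(h2)
    test = h1[:n_old] + h2[:n_new]
    train = h1[n_old:]          # training never sees a rating on a new item
    discarded = h2[n_new:]      # remaining H2 ratings are not used
    return train, test, discarded

# Hypothetical data: 10 users x 6 items, items 4 and 5 play the role of H2.
ratings = [(u, i) for u in range(10) for i in range(6)]
train, test, discarded = new_item_split(ratings, new_items={4, 5},
                                        beta=25, test_size=8)
```

Raising β stresses the new-item scenario: a larger share of the test ratings concerns items the training set has never seen.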
slide-26
SLIDE 26

Testing methodology (2)

For each <user, item> pair <u,i> in H1+2:

  • Generate the rating prediction for i
  • Generate the rating prediction for every other item
  • Sort the items according to predicted rating

There is a “hit” if rank(i) < N, i.e., item i appears in the top-N.

In our tests, N=20
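The hit test can be sketched as follows, using a 0-based rank so that rank(i) < N means that item i is in the top-N. Reporting recall as the fraction of test pairs that produce a hit is an assumption about how the hits are aggregated, and `score_fn` is a hypothetical callback that returns the predicted ratings of all items for a user.

```python
import numpy as np

def is_hit(scores, item, n=20):
    """True if `item` is among the n best-scored items."""
    rank = int(np.sum(scores > scores[item]))   # 0-based: items scored strictly higher
    return rank < n

def recall_at_n(score_fn, test_pairs, n=20):
    """Fraction of (user, item) test pairs whose item lands in the top-n."""
    hits = sum(is_hit(score_fn(u), i, n) for u, i in test_pairs)
    return hits / len(test_pairs)

# Hypothetical predicted ratings: 2 users x 4 items.
predicted = np.array([[0.9, 0.1, 0.5, 0.3],
                      [0.2, 0.8, 0.1, 0.7]])
test_pairs = [(0, 2), (1, 2)]

recall = recall_at_n(lambda u: predicted[u], test_pairs, n=2)
```

For user 0, item 2 is the second-best item and counts as a hit with N=2; for user 1 it is ranked last, so the recall in this toy case is 0.5.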
slide-27
SLIDE 27

Non-hybrid algorithms

[Figure: non-hybrid algorithms on the ML and TV datasets]

slide-28
SLIDE 28

Hybrid algorithms: ML

[Figure: hybrid algorithms on the ML dataset]

slide-30
SLIDE 30

Hybrid algorithms: TV

[Figure: hybrid algorithms on the TV dataset]

slide-31
SLIDE 31

Toy sample

[Figure: toy example with items IJ1, IJ2, BF1, BF2, BF3; panels: CBF, CF]

slide-32
SLIDE 32

Toy sample

[Figure: toy example with items IJ1, IJ2, BF1, BF2, BF3; panels: SIMInj, FFA; legend: CBF, CF, new “connections”]

slide-33
SLIDE 33

Conclusions / Future work

Proposed 2 hybrid algorithms:

  • Higher recall than CF and CBF in the presence of new items
  • Scalable / no impact on real-time performance
  • Handling implicit/explicit ratings

Future work:

  • Subjective evaluation
  • Improving the filter with other information
  • Other domains
slide-34
SLIDE 34

Thank you

Q&A

Roberto TURRIN, PhD
Moviri, R&D – Italy
roberto.turrin@moviri.com

Politecnico di Milano
www.contentwise.tv