

SLIDE 1

Lessons from the Netflix Prize: Going beyond the algorithms

Yehuda Koren

Haifa

SLIDE 2

We Know What You Ought To Be Watching This Summer

SLIDE 3

SLIDE 4

“We’re quite curious, really. To the tune of one million dollars.” – Netflix Prize rules

  • Goal: improve on Netflix's existing movie recommendation technology, Cinematch
  • Criterion: reduction in root mean squared error (RMSE)
  • Oct '06: contest began
  • Oct '07: $50K progress prize for 8.43% improvement
  • Oct '08: $50K progress prize for 9.44% improvement
  • Sept '09: $1 million grand prize for 10.06% improvement
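For reference, the criterion has the standard definition (not printed on the slide): with T the set of withheld (user, movie) test pairs, r_ui the true rating and r̂_ui the prediction,

```latex
\mathrm{RMSE} \;=\; \sqrt{\frac{1}{|\mathcal{T}|} \sum_{(u,i)\in\mathcal{T}} \bigl(\hat{r}_{ui} - r_{ui}\bigr)^{2}}
```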
SLIDE 5

Movie rating data

Training data (user, movie, score):

    user  movie  score
       1     21      1
       1    213      5
       2    345      4
       2    123      4
       2    768      3
       3     76      5
       4     45      4
       5    568      1
       5    342      2
       5    234      2
       6     76      5
       6     56      4

Test data (same format, scores withheld):

    user  movie  score
       1     62      ?
       1     96      ?
       2      7      ?
       2      3      ?
       3     47      ?
       3     15      ?
       4     41      ?
       4     28      ?
       5     93      ?
       5     74      ?
       6     69      ?
       6     83      ?

  • Training data
    – 100 million ratings
    – 480,000 users
    – 17,770 movies
    – 6 years of data: 2000-2005
  • Test data
    – Last few ratings of each user (2.8 million)
  • Dates of ratings are given
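As an aside, here is a minimal sketch (not from the slides) of holding a training set of this shape in memory; it assumes the ratings have been pre-flattened into 0-indexed (user, movie, rating) integer arrays:

```python
# Pack ~100M (user, movie, rating) triples into a sparse matrix.
# 100M ratings over 480K users x 17,770 movies is only ~1.2% density,
# which is why sparse storage is the natural representation.
import numpy as np
from scipy.sparse import csr_matrix

NUM_USERS, NUM_MOVIES = 480_000, 17_770

def to_sparse(users, movies, ratings):
    """users, movies: int arrays; ratings: 1-5 scores for each pair."""
    return csr_matrix(
        (ratings.astype(np.float32), (users, movies)),
        shape=(NUM_USERS, NUM_MOVIES),
    )
```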

SLIDE 6

Data >> Models

  • Very limited feature set
    – User, movie, date
    – Places focus on models/algorithms
  • Major steps forward were associated with incorporating new data features
    – Temporal effects
    – Selection bias:
      • What movies a user rated
      • Daily rating counts
SLIDE 7

Multiple sources of temporal dynamics

  • Item-side effects:
    – Product perception and popularity are constantly changing
    – Seasonal patterns influence items' popularity
  • User-side effects:
    – Customers continually redefine their tastes
    – Transient, short-term bias; anchoring
    – Drifting rating scale
    – Change of rater within a household

SLIDE 8

Something Happened in Early 2004…

[Plot: ratings over time, annotated at early 2004]

SLIDE 9

Are movies getting better with time?

SLIDE 10

Temporal dynamics - challenges

  • Multiple effects: both items and users are changing over time ⇒ scarce data per target
  • Inter-related targets: signal needs to be shared among users, the foundation of collaborative filtering ⇒ cannot isolate into multiple independent problems
  ⇒ Common "concept drift" methodologies won't hold. E.g., underweighting older instances is unappealing
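Rather than discarding old data, the published work from this effort lets the model parameters themselves drift. As one concrete example (from Koren's paper "Collaborative Filtering with Temporal Dynamics", KDD '09, not from this slide), the baseline biases become functions of time:

```latex
b_{ui}(t) \;=\; \mu + b_u + \alpha_u\,\mathrm{dev}_u(t) + b_i + b_{i,\mathrm{Bin}(t)},
\qquad
\mathrm{dev}_u(t) \;=\; \operatorname{sign}(t - t_u)\,\lvert t - t_u\rvert^{\beta}
```

Here t_u is the mean date of u's ratings, Bin(t) is a coarse time bin capturing item drift, and α_u, β are learned or tuned; every user still contributes to the shared, time-binned item parameters, so the signal stays pooled across users.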

SLIDE 11

Effect of daily rating counts

  • The number of ratings a user gave on the same day is an important indicator
  • It affects different movies differently

Credit to: Martin Piotte and Martin Chabbert
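A minimal sketch (mine, not from the slides) of computing that indicator; the DataFrame column names are assumptions:

```python
# For each rating, count how many ratings its user gave on that same day.
import pandas as pd

def add_daily_counts(ratings: pd.DataFrame) -> pd.DataFrame:
    """ratings has columns ['user', 'movie', 'date', 'score'], date = day."""
    ratings = ratings.copy()
    ratings["daily_count"] = (
        ratings.groupby(["user", "date"])["movie"].transform("count")
    )
    # Models typically feed in log(daily_count) or a bucketed version,
    # since the effect (next slide) is roughly logarithmic in the count.
    return ratings
```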

SLIDE 12

Memento vs Patch Adams

[Two bar charts of mean rating as a function of how many ratings the user gave that same day, in log-spaced buckets from 1 up to 257+. Left: Memento (127,318 samples); right: Patch Adams (121,769 samples). The two titles show visibly different trends.]

Credit to: Martin Piotte and Martin Chabbert

SLIDE 13

Why daily rating counts

  • The number of ratings a user gave on a date is a proxy for how long ago the movie was seen
    – Some movies age better than others
  • Also, two distinct rating tasks:
    – Seed Netflix recommendations
    – Rate movies as you see them
  • Related to selection bias?
SLIDE 14

Components of a rating predictor

In standard notation, with μ the overall mean rating, the prediction decomposes into a baseline part and an interaction part:

    r̂_ui = μ + b_u (user bias) + b_i (movie bias) + (user-movie interaction)

User-movie interaction
  • Characterizes the matching between users and movies
  • Attracts most research in the field
  • Benefits from algorithmic and mathematical innovations

Baseline predictor
  • Separates users and movies
  • Often overlooked
  • Benefits from insights into users' behavior
  • Among the main practical contributions of the competition

Biases matter!

SLIDE 15

A baseline predictor

  • We have expectations about the rating user u will give movie i, even without estimating u's attitude towards movies like i:
    – Rating scale of user u
    – Values of other ratings the user gave recently (day-specific mood, anchoring, multi-user accounts)
    – (Recent) popularity of movie i
    – Selection bias; related to the number of ratings the user gave on the same day
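A minimal sketch of such a bias-only baseline, r̂_ui = μ + b_u + b_i, using the regularized ("shrunk") per-movie and per-user averages common in the Netflix Prize literature; the lambda values here are hypothetical and would be tuned on a validation set:

```python
import numpy as np

def fit_baseline(users, movies, scores, n_users, n_movies,
                 lambda1=25.0, lambda2=10.0):
    """users, movies: int arrays; scores: float array of ratings."""
    mu = scores.mean()

    # Movie bias: shrunk mean of (r - mu) per movie.
    b_i = np.zeros(n_movies)
    cnt_i = np.zeros(n_movies)
    np.add.at(b_i, movies, scores - mu)
    np.add.at(cnt_i, movies, 1.0)
    b_i /= (lambda1 + cnt_i)  # shrinks rarely-rated movies towards 0

    # User bias: shrunk mean of the residual (r - mu - b_i) per user.
    b_u = np.zeros(n_users)
    cnt_u = np.zeros(n_users)
    np.add.at(b_u, users, scores - mu - b_i[movies])
    np.add.at(cnt_u, users, 1.0)
    b_u /= (lambda2 + cnt_u)

    return mu, b_u, b_i

def predict(mu, b_u, b_i, users, movies):
    return mu + b_u[users] + b_i[movies]
```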

SLIDE 16

Sources of Variance in Netflix data

    Total variance:    1.276  (100%)
      Unexplained:     0.732  ( 57%)
      Biases:          0.415  ( 33%)
      Personalization: 0.129  ( 10%)

    1.276 = 0.732 + 0.415 + 0.129

SLIDE 17

What drives user preferences?

  • Do they like certain genres, actors, directors, keywords, etc.?
  • Well, some do, but this is far from a complete characterization!
  • E.g., a recent paper is titled: "Recommending new movies: even a few ratings are more valuable than metadata" [Pilaszy and Tikk, '09]
  • User motives are latent, barely interpretable in human language
  • They can be captured when data is abundant
SLIDE 18

Wishful perception

[A 2-D map of movies along two intuitive axes: "geared towards females" vs. "geared towards males", and "serious" vs. "escapist". Plotted titles: The Princess Diaries, The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, The Color Purple, Dumb and Dumber, Ocean's 11, Sense and Sensibility.]

SLIDE 19

Complex reality…

SLIDE 20

Ratings are not given at random!

[Bar charts comparing the distribution of ratings in Yahoo! survey answers, Yahoo! music ratings, and Netflix ratings]

Marlin, Zemel, Roweis, Slaney, "Collaborative Filtering and the Missing at Random Assumption", UAI 2007

SLIDE 21

Which movies do users rate?

  • A powerful source of information: characterize users by which movies they rated, rather than how they rated them
  • ⇒ A dense binary representation of the data: alongside the sparse rating matrix R = [r_ui] (users × movies), form the binary matrix B = [b_ui], with b_ui = 1 if user u rated movie i and 0 otherwise; unlike R, every entry of B is known

[Illustration: the sparse matrix of 1-5 ratings next to its fully-filled 0/1 counterpart]
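A minimal sketch of that transformation, assuming the ratings already live in a SciPy CSR matrix as in the earlier loading sketch:

```python
from scipy.sparse import csr_matrix

def binarize(R: csr_matrix) -> csr_matrix:
    """Turn a sparse rating matrix into its rated/not-rated pattern."""
    B = R.copy()
    B.data[:] = 1.0  # every stored rating becomes a 1
    # B is still stored sparsely, but conceptually every entry is known:
    # the implicit zeros are real "did not rate" observations, which is
    # what makes this representation "dense" in information.
    return B
```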

SLIDE 22

Ensembles are Valuable for Prediction

  • Our final solution was a linear blend of over 700 prediction sets
    – Some of the 700 were themselves blends
  • Difficult, or impossible, to build a grand unified model
  • Blending techniques: linear regression, neural networks, gradient boosted decision trees, and more…
  • Mega-blends are not needed in practice
    – A handful of simple models achieves 90% of the improvement of the full blend
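A minimal sketch of the linear-blend idea (mine, not the team's actual code): fit ridge-regularized weights over the prediction sets on a held-out set; the function names and ridge parameter are illustrative:

```python
import numpy as np

def fit_linear_blend(preds, targets, ridge=1e-3):
    """preds: (n_samples, n_models) array, one column per prediction set.

    Solves ridge-regularized least squares for blend weights on a
    held-out ("probe") set; neural nets or boosted trees can replace
    this linear combiner.
    """
    X = np.column_stack([preds, np.ones(len(preds))])  # add intercept
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ targets)

def blend(preds, w):
    X = np.column_stack([preds, np.ones(len(preds))])
    return X @ w
```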

SLIDE 23

Yehuda Koren
Yahoo! Research
yehuda@yahoo-inc.com