SLIDE 1

Get on with it!

Recommender system industry challenges move towards real-world, online evaluation

Padova – March 23rd, 2016

Andreas Lommatzsch - TU Berlin, Berlin, Germany
Jonas Seiler - plista, Berlin, Germany
Daniel Kohlsdorf - XING, Hamburg, Germany

CrowdRec - www.crowdrec.eu

SLIDE 2

Andreas Lommatzsch

Andreas.Lommatzsch@tu-berlin.de http://www.dai-lab.de

SLIDE 3

Jonas Seiler

Jonas.Seiler@plista.com http://www.plista.com

SLIDE 4

Daniel Kohlsdorf

Daniel.Kohlsdorf@xing.com http://www.xing.com

SLIDE 5

Where are recommender system challenges headed?

Direction 1: Use information beyond the user-item matrix.
Direction 2: Online evaluation + multiple metrics.

Moving towards real-world evaluation

Flickr credit: rodneycampbell

SLIDE 6

Why evaluate?

<Images showing “our” use cases>

  • Evaluation is crucial for the success of real-life systems
  • How should we evaluate?
    – Precision and Recall
    – Technical complexity
    – Influence on sales
    – Required hardware resources
    – Business models
    – Scalability
    – Diversity of the presented results
    – User satisfaction

SLIDE 7

Traditional Evaluation in IR

“The Cranfield paradigm”

Evaluation Settings

  • A static collection of documents
  • A set of queries
  • A list of relevant documents defined by experts for each query

Advantages

  • Reproducible setting
  • All researchers have exactly the same information
  • Optimized for measuring precision

SLIDE 8

Traditional Evaluation in IR

Weaknesses of traditional IR evaluation

  • High costs for creating the dataset
  • Datasets are not up-to-date
  • Domain-specific documents
  • The expert-defined ground truth does not consider individual user preferences
  • Context-awareness is not considered
  • Technical aspects are ignored

Context is everything

SLIDE 9

Industry and recsys challenges

  • Challenges benefit both industry and academic research.
  • We look at how industry challenges have evolved since the Netflix Prize in 2009.

SLIDE 10

Traditional Evaluation in RecSys

Evaluation Settings

  • Rating prediction on user-item matrices
  • Large, sparse dataset
  • Predict personalized ratings
  • Cross-validation, RMSE

Advantages

  • Reproducible setting
  • Personalization
  • Dataset is based on real user ratings

“The Netflix paradigm”
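
As a concrete illustration of the metric behind this paradigm, a minimal RMSE sketch over held-out ratings (a sketch, not any challenge's official scorer):

    import numpy as np

    def rmse(predicted, actual):
        """Root Mean Squared Error, the metric popularized by the Netflix Prize."""
        predicted = np.asarray(predicted, dtype=float)
        actual = np.asarray(actual, dtype=float)
        return np.sqrt(np.mean((predicted - actual) ** 2))

    # Toy cross-validation fold: predicted vs. held-out ratings
    print(rmse([3.5, 4.0, 2.0], [4, 4, 1]))  # ~0.65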

SLIDE 11

Traditional Evaluation in RecSys

Weaknesses of traditional Recommender evaluation

  • Static data
  • Only one type of data: user ratings
  • User ratings are noisy
  • Temporal aspects tend to be ignored
  • Context-awareness is not considered
  • Technical aspects are ignored
SLIDE 12

Challenges of Developing Applications

Challenges

  • Data streams - continuous changes
  • Big data
  • Combine knowledge from different sources
  • Context-Awareness
  • Users expect personally relevant results
  • Heterogeneous devices
  • Technical complexity, real-time requirements
SLIDE 13

How to address these challenges in the Evaluation?

  • Realistic evaluation setting
    – Heterogeneous data sources
    – Streams
    – Dynamic user feedback
  • Appropriate metrics
    – Precision and user satisfaction
    – Technical complexity
    – Sales and business models
  • Online and offline evaluation

How to set up a better evaluation?

SLIDE 14

Approaches for a better Evaluation

  • News recommendations @ plista
  • Job recommendations @ XING

SLIDE 15

The plista Recommendation Scenario

Setting

  • 250 ms response time
  • 350 million ad impressions (AI) per day
  • In 10 countries

Challenges

  • News change continuously
  • Users do not log in explicitly
  • Seasonality, context-dependent user preferences

SLIDE 16

Evaluation @ plista

Offline

  • Cross-validation
    – Metric Optimization Engine (https://github.com/Yelp/MOE)
    – Integration into Spark
  • How well does it correlate with the online evaluation?
  • Time complexity

Online

  • A/B tests
    – Limited by caching memory and computational resources
    – MOE*

SLIDE 17

Evaluation using MOE

Offline

  • Mean and variance estimation of the parameter space with a Gaussian Process
  • Evaluate the parameters with the highest Expected Improvement (EI), Upper Confidence Bound, …
  • REST API
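
For intuition, a minimal sketch of the Expected Improvement acquisition function such a setup maximizes, assuming the Gaussian Process posterior mean and standard deviation at a candidate parameter point are given:

    import numpy as np
    from scipy.stats import norm

    def expected_improvement(mu, sigma, best_so_far):
        """EI for a maximization problem, computed from the GP posterior
        mean `mu` and std `sigma` at a candidate point."""
        sigma = np.maximum(sigma, 1e-12)   # guard against zero variance
        z = (mu - best_so_far) / sigma
        return (mu - best_so_far) * norm.cdf(z) + sigma * norm.pdf(z)

    # The candidate with the highest EI is the next one to evaluate online.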

SLIDE 18

Evaluation using MOE

Online

  • A/B tests are expensive
  • Model the non-stationarity
  • Integrate out the non-stationarity to get the mean EI

SLIDE 19

The CLEF-NewsREEL Challenge

Provide an API enabling researchers to test their own ideas

  • A challenge in CLEF (Conferences and Labs of the Evaluation Forum)
  • 2 tasks: online and offline evaluation

SLIDE 20

How does the challenge work?

  • Live streams consisting of impressions, requests, and clicks; 5 publishers; approx. 6 million messages per day
  • Technical requirement: 100 ms per request
  • Live evaluation based on CTR

CLEF-NewsREEL Online Task
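
The live ranking metric is simply clicks over delivered recommendations. A minimal sketch (the event field names here are assumptions, not the real NewsREEL message schema):

    from collections import Counter

    def ctr_per_team(events):
        """Click-through rate per participating team, from a stream of
        events shaped like {"team": ..., "type": "impression" | "click"}."""
        shown, clicked = Counter(), Counter()
        for e in events:
            if e["type"] == "impression":
                shown[e["team"]] += 1
            elif e["type"] == "click":
                clicked[e["team"]] += 1
        return {team: clicked[team] / shown[team] for team in shown}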

SLIDE 21

Online vs. Offline Evaluation

  • Technical aspects can be evaluated without user feedback
  • Analyze the required resources and the response time
  • Simulate the online evaluation by replaying a recorded stream

CLEF-NewsREEL Offline Task
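
A minimal sketch of such a replay loop; the recommender.handle interface and the message fields are assumptions, not the actual NewsREEL API:

    import time

    def replay(recorded_stream, recommender, limit_ms=100):
        """Replay a recorded message stream and count how often the
        100 ms response-time constraint is violated."""
        violations = 0
        for message in recorded_stream:        # impressions, requests, clicks
            start = time.perf_counter()
            recommender.handle(message)        # assumed handler interface
            elapsed_ms = (time.perf_counter() - start) * 1000
            if message.get("type") == "request" and elapsed_ms > limit_ms:
                violations += 1
        return violations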

SLIDE 22

Challenge

  • Realistic simulation of streams
  • Reproducible setup of computing environments

Solution

  • A framework simplifying the setup of the evaluation environment
  • The Idomaar framework developed in the CrowdRec project

CLEF-NewsREEL Offline Task

http://rf.crowdrec.eu

SLIDE 23

More Information

  • SIGIR Forum, Dec 2015 (Vol. 49, No. 2), http://sigir.org/files/forum/2015D/p129.pdf: evaluate your algorithm online and offline in NewsREEL
  • Register for the challenge (until 22nd of April): http://crowdrec.eu/2015/11/clef-newsreel-2016/
  • Tutorials and templates are provided at orp.plista.com

CLEF-NewsREEL

SLIDE 24

XING - RecSys Challenge

https://recsys.xing.com/

SLIDE 25

Job Recommendations @ XING

SLIDE 26

XING - Evaluation based on interaction

  • On XING, users can give explicit feedback on recommendations.
  • The amount of explicit feedback is far lower than that of implicit signals.
  • A/B tests focus on click-through rate.
SLIDE 27

XING - RecSys Challenge, Scoring, Space on Page

  • Predict 30 items for each user.
  • Score: weighted combination of the precisions
    – precisionAt(2)
    – precisionAt(4)
    – precisionAt(6)
    – precisionAt(20)

<Figure: the top 6 recommendation slots visible on the page>
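
To make the scoring concrete, a sketch of precisionAt(k) and an illustrative weighted combination; the equal weights are placeholders, since the official challenge weights are not stated here:

    def precision_at(recommended, relevant, k):
        """Fraction of the top-k recommended items the user interacted with."""
        return sum(1 for item in recommended[:k] if item in relevant) / k

    def challenge_score(recommended, relevant, weights=None):
        """Weighted combination of precisionAt(2/4/6/20); the equal
        weights below are illustrative, not the official ones."""
        weights = weights or {2: 1.0, 4: 1.0, 6: 1.0, 20: 1.0}
        return sum(w * precision_at(recommended, relevant, k)
                   for k, w in weights.items())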

SLIDE 28

XING - RecSys Challenge, User Data

  • User ID
  • Job Title
  • Educational Degree
  • Field of Study
  • Location
SLIDE 29

XING - RecSys Challenge, User Data

  • Number of past jobs
  • Years of Experience
  • Current career level
  • Current discipline
  • Current industry
SLIDE 30

XING - RecSys Challenge, Item Data

  • Job title
  • Desired career level
  • Desired discipline
  • Desired industry
SLIDE 31

XING - RecSys Challenge, Interaction Data

  • Timestamp
  • User
  • Job
  • Type:
    – Deletion
    – Click
    – Bookmark
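
A sketch of how one such interaction record could be modeled; the field names and types are assumptions based on the bullets above, not the released data format:

    from dataclasses import dataclass
    from enum import Enum

    class InteractionType(Enum):
        DELETION = "deletion"
        CLICK = "click"
        BOOKMARK = "bookmark"

    @dataclass(frozen=True)
    class Interaction:
        timestamp: int              # assumed Unix-epoch seconds
        user_id: int
        job_id: int
        kind: InteractionType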

SLIDE 32

XING - RecSys Challenge, Anonymization

SLIDE 33

XING - RecSys Challenge, Anonymization

SLIDE 34

XING - RecSys Challenge, Future

  • Live Challenge
    – Participants submit predicted future interactions
    – The solutions are recommended on the platform
    – Participants get points for actual user clicks

<Cycle: release to challenge → work on predictions → collect clicks → score>

SLIDE 35

How to set up a better evaluation

  • Consider different quality criteria (prediction, technical, business models)
  • Aggregate heterogeneous information sources
  • Consider user feedback
  • Use online and offline analyses to understand users and their requirements

Concluding ...

SLIDE 36

Concluding ...

Participate in challenges based on real-life scenarios

  • NewsREEL challenge: http://orp.plista.com
  • RecSys 2016 challenge: http://2016.recsyschallenge.com/

=> Organize a challenge. Focus on real-life data.

SLIDE 37

More Information

  • http://www.crowdrec.eu
  • http://www.clef-newsreel.org
  • http://orp.plista.com
  • http://2016.recsyschallenge.com
  • http://www.xing.com

Thank You