
Recommender system industry challenges move towards real-world, online evaluation (PowerPoint presentation)



  1. Get on with it! Recommender system industry challenges move towards real-world, online evaluation • Padova, March 23rd, 2016 • Andreas Lommatzsch, TU Berlin, Berlin, Germany • Jonas Seiler, plista, Berlin, Germany • Daniel Kohlsdorf, XING, Hamburg, Germany • CrowdRec, www.crowdrec.eu

  2. Andreas Lommatzsch • Andreas.Lommatzsch@tu-berlin.de • http://www.dai-lab.de

  3. Jonas Seiler • Jonas.Seiler@plista.com • http://www.plista.com

  4. Daniel Kohlsdorf • Daniel.Kohlsdorf@xing.com • http://www.xing.com

  5. Moving towards real-world evaluation. Where are recommender system challenges headed? Direction 1: Use information beyond the user-item matrix. Direction 2: Online evaluation with multiple metrics. (Image credit: Flickr, rodneycampbell)

  6. Why evaluate? Evaluation is crucial for the success of real-life systems. How should we evaluate? Quality criteria include: • Precision and recall • User satisfaction • Diversity of the presented results • Influence on sales • Business models • Required hardware resources • Technical complexity • Scalability

  7. Traditional Evaluation in IR: “The Cranfield paradigm” Evaluation settings • A static collection of documents • A set of queries • A list of relevant documents defined by experts for each query Advantages • Reproducible setting • All researchers have exactly the same information • Optimized for measuring precision
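To make the Cranfield setting concrete, here is a minimal sketch of how a retrieval run could be scored against expert relevance judgments. The dictionary layout (query id mapped to a ranked list of document ids, and to a set of relevant ids) is an assumption for illustration; the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def precision_recall(ranked: List[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision and recall of one retrieved list against the expert-judged relevant set."""
    if not ranked:
        return 0.0, 0.0
    hits = sum(1 for doc in ranked if doc in relevant)
    precision = hits / len(ranked)
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def evaluate_run(run: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Macro-average precision and recall over all queries of the static collection."""
    scores = [precision_recall(run.get(q, []), rel) for q, rel in qrels.items()]
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n


# Every researcher evaluates against the same static judgments, which is what makes it reproducible.
qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
run = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5", "d6"]}
print(evaluate_run(run, qrels))  # → (0.25, 0.5)
```

Note that nothing in this computation depends on who issued the query, which is exactly the weakness listed on the next slide.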

  8. Traditional Evaluation in IR: Weaknesses of traditional IR evaluation • High costs for creating datasets • Datasets are not up to date • Domain-specific documents • The expert-defined ground truth does not consider individual user preferences • Context-awareness is not considered (“Context is everything”) • Technical aspects are ignored

  9. Industry and RecSys challenges • Challenges benefit both industry and academic research. • We look at how industry challenges have evolved since the Netflix Prize (2009).

  10. Traditional Evaluation in RecSys: “The Netflix paradigm” Evaluation settings • Rating prediction on user-item matrices • Large, sparse dataset • Predict personalized ratings • Cross-validation, RMSE Advantages • Reproducible setting • Personalization • Dataset is based on real user ratings
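The Netflix-style protocol (k-fold cross-validation on (user, item, rating) triples, scored by RMSE) can be sketched as follows. The predictor is a deliberately simple baseline (global mean plus a per-item offset) chosen only for illustration; it is not a method from the slides or from the Netflix Prize.

```python
import math
import random
from collections import defaultdict
from typing import Callable, List, Tuple

Rating = Tuple[str, str, float]  # (user, item, rating)
Model = Callable[[str, str], float]


def fit_item_bias(train: List[Rating]) -> Model:
    """Illustrative baseline: global mean rating plus the item's mean deviation from it."""
    mu = sum(r for _, _, r in train) / len(train)
    sums, counts = defaultdict(float), defaultdict(int)
    for _, item, r in train:
        sums[item] += r - mu
        counts[item] += 1
    bias = {i: sums[i] / counts[i] for i in sums}
    # Unseen items fall back to the global mean; the user id is ignored by this baseline.
    return lambda user, item: mu + bias.get(item, 0.0)


def rmse(model: Model, test: List[Rating]) -> float:
    """Root mean squared error of predicted vs. observed ratings."""
    return math.sqrt(sum((model(u, i) - r) ** 2 for u, i, r in test) / len(test))


def cross_validate(ratings: List[Rating], k: int = 5, seed: int = 0) -> float:
    """Mean RMSE over k folds; assumes len(ratings) >= k so no fold is empty."""
    data = ratings[:]
    random.Random(seed).shuffle(data)  # fixed seed keeps the split reproducible
    folds = [data[f::k] for f in range(k)]
    errors = []
    for f in range(k):
        train = [r for g in range(k) if g != f for r in folds[g]]
        errors.append(rmse(fit_item_bias(train), folds[f]))
    return sum(errors) / k
```

The random shuffle mixes past and future ratings across folds, which illustrates the "temporal aspects tend to be ignored" weakness named on the following slide.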

  11. Traditional Evaluation in RecSys: Weaknesses of traditional recommender evaluation • Static data • Only one type of data (user ratings) • User ratings are noisy • Temporal aspects tend to be ignored • Context-awareness is not considered • Technical aspects are ignored

  12. Challenges of Developing Applications • Data streams with continuous changes • Big data • Combining knowledge from different sources • Context-awareness • Users expect personally relevant results • Heterogeneous devices • Technical complexity and real-time requirements

  13. How to Set Up a Better Evaluation? How can these challenges be addressed in the evaluation? • Realistic evaluation setting – Heterogeneous data sources – Streams – Dynamic user feedback • Appropriate metrics – Precision and user satisfaction – Technical complexity – Sales and business models • Online and offline evaluation

  14. Approaches for a better Evaluation • News recommendations @ plista • Job recommendations @ XING

  15. The plista Recommendation Scenario Setting ● 250 ms response time ● 350 million AI/day ● In 10 countries Challenges ● News items change continuously ● Users do not log in explicitly ● Seasonality, context-dependent user preferences

  16. Evaluation @ plista Offline • Cross-validation • Metric Optimization Engine (MOE, https://github.com/Yelp/MOE) • Integration into Spark • How well does it correlate with online evaluation? Online • A/B tests, limited by – Caching – Memory – Computational resources – Time complexity • MOE*

  17. Evaluation using MOE Offline • Estimate the mean and variance of the metric over the parameter space with a Gaussian process • Evaluate the parameter with the highest Expected Improvement (EI), Upper Confidence Bound, etc. • REST API
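Given a Gaussian process posterior mean and standard deviation for each candidate parameter setting, the next setting to evaluate is the one that maximizes an acquisition function. The sketch below implements the standard closed-form Expected Improvement (for maximizing a metric such as CTR) and an Upper Confidence Bound; it does not use MOE's own REST API, and the parameter names and posterior values in the usage example are hypothetical.

```python
import math
from typing import Dict, Tuple


def _pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Closed-form EI over the best value seen so far, for a maximization problem."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)  # no uncertainty: improvement is deterministic
    z = (mu - best) / sigma
    return (mu - best) * _cdf(z) + sigma * _pdf(z)


def upper_confidence_bound(mu: float, sigma: float, kappa: float = 2.0) -> float:
    """Optimistic estimate: posterior mean plus kappa standard deviations."""
    return mu + kappa * sigma


def pick_next(posterior: Dict[str, Tuple[float, float]], best: float) -> str:
    """Choose the candidate whose (mean, std) posterior has the highest EI."""
    return max(posterior, key=lambda p: expected_improvement(*posterior[p], best))


# Hypothetical posteriors (mean CTR, std) for two settings of a made-up parameter "alpha".
posterior = {"alpha=0.1": (0.030, 0.001), "alpha=0.5": (0.028, 0.006)}
print(pick_next(posterior, best=0.029))  # → alpha=0.5
```

In this example the setting with the lower mean but much higher uncertainty wins, which shows how EI trades off exploitation against exploration.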

  18. Evaluation using MOE Online • A/B Tests are expensive • Model non-stationarity • Integrate out non-stationarity to get mean EI
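One possible reading of "integrate out non-stationarity to get mean EI" is to let the Gaussian process model the online metric as a function of both the parameter setting x and a time (drift) variable t, and then average the expected improvement over t. The following is a sketch under that assumption, not MOE's documented behavior: f(x, t) is the online metric at time t, f* the best value observed so far, p(t) a distribution over time points, and t_1 to t_S samples drawn from it.

$$
\mathrm{EI}(x \mid t) = \mathbb{E}\left[\max\big(f(x, t) - f^{*},\, 0\big)\right], \qquad
\overline{\mathrm{EI}}(x) = \int \mathrm{EI}(x \mid t)\, p(t)\, \mathrm{d}t \;\approx\; \frac{1}{S} \sum_{s=1}^{S} \mathrm{EI}(x \mid t_s)
$$

Averaging in this way favors settings that are expected to improve the metric across the whole period rather than only at one moment, which matters when each A/B test is expensive.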

  19. The CLEF-NewsREEL challenge Provide an API that enables researchers to test their own ideas • A challenge in CLEF (Conference and Labs of the Evaluation Forum) • 2 tasks: online and offline evaluation

  20. CLEF-NewsREEL Online Task How does the challenge work? • Live streams consisting of impressions, requests, and clicks from 5 publishers, approx. 6 million messages per day • Technical requirements: 100 ms per request • Live evaluation based on CTR
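A participant in the online task has to react to a mixed stream of impressions, requests, and clicks. The sketch below is a simple most-popular recommender that updates item counts from impressions, answers requests, and tracks CTR as clicks on recommended items divided by answered requests. The message kinds, the attribution rule for clicks, and the class name are simplifying assumptions; this is not the actual ORP message format or the official CTR computation.

```python
import time
from collections import Counter
from typing import List, Optional, Set


class MostPopularRecommender:
    """Hypothetical stream handler: recommend the most frequently seen articles."""

    def __init__(self, n: int = 3, deadline_s: float = 0.100) -> None:
        self.n = n
        self.deadline_s = deadline_s       # 100 ms budget per request
        self.popularity: Counter = Counter()
        self.recommended: Set[str] = set()  # every item we have ever recommended
        self.requests = 0
        self.clicks = 0
        self.late = 0                       # responses that exceeded the budget

    def on_message(self, kind: str, item: Optional[str] = None) -> Optional[List[str]]:
        start = time.perf_counter()
        if kind == "impression":
            self.popularity[item] += 1
            return None
        if kind == "click":
            # Simplification: credit a click if we recommended that item at any earlier point.
            if item in self.recommended:
                self.clicks += 1
            return None
        if kind == "request":
            self.requests += 1
            recs = [i for i, _ in self.popularity.most_common(self.n)]
            self.recommended.update(recs)
            if time.perf_counter() - start > self.deadline_s:
                self.late += 1
            return recs
        raise ValueError(f"unknown message kind: {kind}")

    def ctr(self) -> float:
        """Clicks on recommended items per answered request."""
        return self.clicks / self.requests if self.requests else 0.0
```

A most-popular baseline is a common starting point for this setting because it is cheap enough to stay well within a 100 ms budget while the stream keeps changing.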

  21. CLEF-NewsREEL Offline Task Online vs. Offline Evaluation • Technical aspects can be evaluated without user feedback • Analyze the required resources and the response time • Simulate the online evaluation by replaying a recorded stream
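Replaying a recorded stream allows the technical side to be measured without live users. The sketch below feeds recorded events to any handler in their original order, times each request with `time.perf_counter`, and reports the median and 95th-percentile response time plus the share of requests over a latency budget. The event tuple layout and the function name `replay` are assumptions; this is not the Idomaar framework's interface.

```python
import math
import statistics
import time
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Event = Tuple[str, Optional[str]]  # (kind, item id), hypothetical recorded-log format


def replay(events: Iterable[Event],
           handle: Callable[[str, Optional[str]], object],
           budget_s: float = 0.100) -> Dict[str, float]:
    """Replay a recorded stream and summarize request latencies."""
    latencies: List[float] = []
    for kind, item in events:  # recorded order is preserved
        start = time.perf_counter()
        handle(kind, item)
        elapsed = time.perf_counter() - start
        if kind == "request":
            latencies.append(elapsed)
    if not latencies:
        return {"requests": 0, "median_ms": 0.0, "p95_ms": 0.0, "over_budget": 0.0}
    ordered = sorted(latencies)
    # Nearest-rank 95th percentile.
    p95 = ordered[max(0, math.ceil(0.95 * len(ordered)) - 1)]
    return {
        "requests": len(ordered),
        "median_ms": statistics.median(ordered) * 1000.0,
        "p95_ms": p95 * 1000.0,
        "over_budget": sum(l > budget_s for l in ordered) / len(ordered),
    }
```

Because the same recorded stream can be replayed for every algorithm, the comparison is reproducible, although the measured times still depend on the machine the replay runs on.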

  22. CLEF-NewsREEL Offline Task Challenge • Realistic simulation of streams • Reproducible setup of computing environments Solution • A framework simplifying the setup of the evaluation environment • The Idomaar framework developed in the CrowdRec project http://rf.crowdrec.eu

  23. CLEF-NewsREEL More Information • SIGIR Forum, Dec. 2015 (Vol. 49, No. 2): http://sigir.org/files/forum/2015D/p129.pdf • Evaluate your algorithm online and offline in NewsREEL • Register for the challenge: http://crowdrec.eu/2015/11/clef-newsreel-2016/ (registration until April 22nd) • Tutorials and templates are provided at orp.plista.com

  24. XING - RecSys Challenge https://recsys.xing.com/

  25. Job Recommendations @ XING

  26. XING - Evaluation based on interaction ● On XING, users can give explicit feedback on recommendations. ● The amount of explicit feedback is much lower than that of implicit signals. ● A/B tests focus on click-through rate.
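When an A/B test compares click-through rates, a standard way to check whether the difference between two variants is more than noise is a two-proportion z-test. The sketch below shows that textbook test; it is not a description of XING's internal A/B tooling.

```python
import math
from typing import Tuple


def ctr_ab_test(clicks_a: int, views_a: int,
                clicks_b: int, views_b: int) -> Tuple[float, float, float, float]:
    """Two-sided two-proportion z-test of variant B's CTR against variant A's.

    Returns (ctr_a, ctr_b, z, p_value).
    """
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled CTR under the null hypothesis that both variants perform the same.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / views_a + 1.0 / views_b))
    if se == 0.0:
        return p_a, p_b, 0.0, 1.0  # no clicks at all (or only clicks): nothing to test
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return p_a, p_b, z, p_value


# Example: 1.0% vs. 1.5% CTR on 10,000 impressions each.
print(ctr_ab_test(100, 10_000, 150, 10_000))
```

With CTRs around one percent, even half a percentage point of difference needs thousands of impressions per variant to become significant, which is one reason A/B tests are described as expensive.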

  27. XING - RecSys Challenge, Scoring (Space on Page: Top 6) ● Predict 30 items for each user. ● Score: weighted combination of precision at several cut-offs ○ precisionAt(2) ○ precisionAt(4) ○ precisionAt(6) ○ precisionAt(20)
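The scoring idea (a weighted sum of precision at the cut-offs 2, 4, 6, and 20 over at most 30 predicted items per user) can be sketched as follows. The slide does not give the weights, so `EXAMPLE_WEIGHTS` is a hypothetical placeholder, and summing over users is also an assumption; this is not the official challenge evaluation script.

```python
from typing import Dict, List, Set

# Hypothetical placeholder weights: the slide names the cut-offs, not the weights.
EXAMPLE_WEIGHTS: Dict[int, float] = {2: 1.0, 4: 1.0, 6: 1.0, 20: 1.0}


def precision_at(k: int, ranked: List[str], relevant: Set[str]) -> float:
    """Fraction of the top-k recommended items that the user actually interacted with."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def challenge_score(recs: Dict[str, List[str]],
                    truth: Dict[str, Set[str]],
                    weights: Dict[int, float] = EXAMPLE_WEIGHTS,
                    max_items: int = 30) -> float:
    """Weighted precision@k, summed over all users with ground-truth interactions."""
    total = 0.0
    for user, relevant in truth.items():
        ranked = recs.get(user, [])[:max_items]  # only the first 30 predictions count
        total += sum(w * precision_at(k, ranked, relevant) for k, w in weights.items())
    return total


recs = {"u1": ["j1", "j2", "j3", "j4"]}
truth = {"u1": {"j1", "j3"}}
print(challenge_score(recs, truth, weights={2: 1.0, 4: 1.0}))  # → 1.0
```

Small cut-offs such as 2, 4, and 6 reflect how few recommendations actually fit on the page, so getting the very top of the list right matters most.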

  28. XING - RecSys Challenge, User Data • User ID • Job Title • Educational Degree • Field of Study • Location

  29. XING - RecSys Challenge, User Data • Number of past jobs • Years of Experience • Current career level • Current discipline • Current industry

  30. XING - RecSys Challenge, Item Data • Job title • Desired career level • Desired discipline • Desired industry
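The user fields (current career level, discipline, industry) and the item fields (desired career level, discipline, industry) line up naturally, which suggests a simple content-based baseline: rank jobs by how well their requirements match the user's profile. The sketch below is a hypothetical baseline with made-up scoring rules; it assumes career levels are comparable integers, which may not hold for the anonymized challenge data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class User:
    user_id: str
    career_level: int  # assumption: an ordinal integer level
    discipline: str
    industry: str


@dataclass
class Job:
    job_id: str
    career_level: int  # desired career level
    discipline: str    # desired discipline
    industry: str      # desired industry


def match_score(user: User, job: Job) -> float:
    """Hypothetical score: one point per exact match, plus up to one point for a close level."""
    score = 0.0
    score += 1.0 if user.discipline == job.discipline else 0.0
    score += 1.0 if user.industry == job.industry else 0.0
    # A level gap of 0 gives 1.0, a gap of 1 gives 0.5, a gap of 2 or more gives 0.0.
    score += 1.0 - min(abs(user.career_level - job.career_level), 2) / 2.0
    return score


def recommend(user: User, jobs: List[Job], n: int = 30) -> List[str]:
    """Return up to n job ids, best match first."""
    ranked = sorted(jobs, key=lambda j: match_score(user, j), reverse=True)
    return [j.job_id for j in ranked[:n]]
```

A baseline like this uses only profile attributes, so it can recommend brand-new jobs that have no interactions yet; in practice it would be combined with the interaction data.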

  31. XING - RecSys Challenge, Interaction Data • Timestamp • User • Job • Type: – Deletion – Click – Bookmark
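The interaction records (timestamp, user, job, and type) can be turned into training signals in many ways. The sketch below shows one hypothetical option: weight bookmarks above clicks, treat deletions as negative feedback, and split the data by time rather than at random, so the evaluation predicts future interactions from past ones. The weights and function names are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class InteractionType(Enum):
    CLICK = "click"
    BOOKMARK = "bookmark"
    DELETION = "deletion"


# Hypothetical weights: a bookmark is a stronger signal than a click, a deletion is negative.
FEEDBACK_WEIGHT = {
    InteractionType.CLICK: 1.0,
    InteractionType.BOOKMARK: 2.0,
    InteractionType.DELETION: -1.0,
}


@dataclass(frozen=True)
class Interaction:
    timestamp: int
    user_id: str
    job_id: str
    kind: InteractionType


def preference_scores(interactions: List[Interaction]) -> Dict[Tuple[str, str], float]:
    """Sum the feedback weights for every (user, job) pair."""
    prefs: Dict[Tuple[str, str], float] = defaultdict(float)
    for it in interactions:
        prefs[(it.user_id, it.job_id)] += FEEDBACK_WEIGHT[it.kind]
    return dict(prefs)


def split_by_time(interactions: List[Interaction],
                  cutoff: int) -> Tuple[List[Interaction], List[Interaction]]:
    """Train on everything before the cutoff, test on everything at or after it."""
    train = [it for it in interactions if it.timestamp < cutoff]
    test = [it for it in interactions if it.timestamp >= cutoff]
    return train, test
```

A time-based split avoids leaking future behavior into training, which addresses the "temporal aspects tend to be ignored" weakness of random cross-validation.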

  32. XING - RecSys Challenge, Anonymization

  33. XING - RecSys Challenge, Anonymization

  34. XING - RecSys Challenge, Future • Live challenge – Participants submit predicted future interactions – The solutions are recommended on the platform – Participants get points for actual user clicks • Cycle: Work on predictions → Release to challenge → Collect clicks → Score
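The "score" step of the live cycle can be sketched as awarding each participant one point per actual user click on a job that the participant recommended to that user. The data layout is hypothetical, and crediting every participant who recommended the clicked job is an assumption, since the slide does not say how shared recommendations are attributed.

```python
from typing import Dict, Iterable, List, Tuple


def score_live_round(submissions: Dict[str, Dict[str, List[str]]],
                     clicks: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Points per participant = collected (user, job) clicks that match that participant's predictions."""
    click_list = list(clicks)  # clicks collected on the platform during the round
    points: Dict[str, int] = {}
    for participant, predictions in submissions.items():
        predicted = {(user, job) for user, jobs in predictions.items() for job in jobs}
        points[participant] = sum(1 for click in click_list if click in predicted)
    return points


submissions = {
    "teamA": {"u1": ["j1", "j2"]},
    "teamB": {"u1": ["j3"], "u2": ["j1"]},
}
clicks = [("u1", "j1"), ("u2", "j1"), ("u1", "j3"), ("u1", "j9")]
print(score_live_round(submissions, clicks))  # → {'teamA': 1, 'teamB': 2}
```

Unlike the offline score, these points can only be computed after real users have seen the recommendations, so each round of the cycle takes as long as it takes to collect clicks.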

  35. Concluding... How to set up a better evaluation • Consider different quality criteria (prediction, technical, business models) • Aggregate heterogeneous information sources • Consider user feedback • Use online and offline analyses to understand users and their requirements

  36. Concluding... Participate in challenges based on real-life scenarios • NewsREEL challenge: http://orp.plista.com • RecSys 2016 challenge: http://2016.recsyschallenge.com/ => Organize a challenge. Focus on real-life data.

  37. Thank You More Information • http://www.crowdrec.eu • http://www.clef-newsreel.org • http://orp.plista.com • http://2016.recsyschallenge.com • http://www.xing.com
