SLIDE 1

Recommender System Experiments with MyMediaLite

Or: Everything you always wanted to know about offline experiments* (*but were afraid to ask)

Zeno Gantner <zeno.gantner@nokia.com> Nokia Location & Commerce, Berlin

SLIDE 2

HERE Maps by Nokia … in Berlin

  • ca. 800 people
  • HERE Maps platform
    – mobile apps
      • HERE Drive
      • HERE Maps
      • HERE Transit (public transport)
    – customers
      • Yahoo Maps
      • Bing Maps
      • major car companies: BMW, VW, Toyota, ...

SLIDE 3

HERE Maps by Nokia … in Berlin

Maps Search Team

  • #bbuzz regulars
  • 3 of us contributed to Lucene 4.3.0 ;-)

http://2011.berlinbuzzwords.de/content/improving-search-ranking-through-ab-tests-case-study
http://2012.berlinbuzzwords.de/sessions/efficient-scoring-lucene
http://2012.berlinbuzzwords.de/sessions/introducing-cascalog-functional-data-processing-hadoop
http://2012.berlinbuzzwords.de/sessions/relevance-optimization-check-candidate-lists
https://issues.apache.org/jira/browse/LUCENE-4930
https://issues.apache.org/jira/browse/LUCENE-4571

SLIDE 4
(c) Paul L. Dineen; license: CC by; source: http://www.flickr.com/photos/pauldineen/4529216647/sizes/o/in/photostream/
SLIDE 5
SLIDE 6

+ = ?

SLIDE 7

Data + Software/Algorithms = ???

(c) Joon Han, license: CC by-sa 3.0, source: http://en.wikipedia.org/wiki/File:Groundhog_day_tip_top_bistro.jpg

(c) Diliff; license: CC by 3.0

Real-world deployments

SLIDE 8

Data mining competitions

SLIDE 9

Research

SLIDE 10

+ = ?

SLIDE 11

RecSys Experiments with MyMediaLite

  • 1. Interaction Data
  • 2. Baseline Methods
  • 3. Apples and Oranges
  • 4. Metrics
  • 5. Hyperparameter Tuning
  • 6. Reproducibility
SLIDE 12

Running Example: MyMediaLite

  • RecSys toolkit and evaluation framework
  • written in C#/Mono
  • usable from C#, Python, Ruby, F#
  • 2 Java ports (RapidMiner plugin)
  • regular releases (every 2-3 months) since 2010
  • simple
  • choice
  • free
  • documented
  • tested

http://mymedialite.net/
http://github.com/zenogantner/MyMediaLite

SLIDE 13

Running Example: MyMediaLite

command-line tools

  • rating_prediction
  • item_recommendation

Find all examples here: http://github.com/zenogantner/mml-eval-examples

SLIDE 14
  • 1. Interaction Data

Explicit feedback: not always there.

Implicit feedback:

  • views
  • clicks
  • purchases

Often positive-only.

SLIDE 15
  • 1. Interaction Data

User ID   Item ID   Timestamp
196       242       881250949
186       302       891717742
22        377       878887116
244       51        880606923
...       ...       ...

item_recommendation --training-file=F1 --test-file=F2

  • IDs can be (almost) arbitrary strings
  • timestamps are optional
  • Separator: whitespace, tab, comma, ::
  • Alternative timestamp format: yyyy-mm-dd
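To make the format concrete, here is a minimal reader sketch in Python (read_interactions is a hypothetical helper, not part of MyMediaLite; it assumes whitespace separators, while MyMediaLite also accepts tab, comma, and ::):

def read_interactions(path):
    """Read (user, item, timestamp) triples; the timestamp column is optional."""
    rows = []
    with open(path) as f:
        for line in f:
            fields = line.split()                 # whitespace-separated
            if not fields:
                continue                          # skip empty lines
            user, item = fields[0], fields[1]     # IDs kept as strings
            ts = int(fields[2]) if len(fields) > 2 else None
            rows.append((user, item, ts))
    return rows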

SLIDE 16

Random Splits

item_recommendation … --test-ratio=0.25

Shuffle and split: Simple, but:

  • Does not take temporal trends into account.
  • Does not use all data for testing.
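For illustration, a minimal shuffle-and-split sketch in Python (random_split is a hypothetical helper mirroring what --test-ratio=0.25 does, not MyMediaLite code; it assumes the interaction tuples from the reader sketch above):

import random

def random_split(interactions, test_ratio=0.25, seed=1):
    """Shuffle, then hold out a fraction of the data for testing."""
    data = list(interactions)
    random.Random(seed).shuffle(data)      # fixed seed => reproducible split
    n_test = int(len(data) * test_ratio)
    return data[n_test:], data[:n_test]    # (training set, test set)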
SLIDE 17

k-fold Cross-Validation

item_recommendation … --cross-validation=4

Shuffle and split:

  • Uses each data point for evaluation.
  • Does not take temporal trends into account.
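A minimal sketch of the k-fold idea in Python (hypothetical helper, not MyMediaLite's implementation; same interaction tuples as before):

import random

def k_fold_splits(interactions, k=4, seed=1):
    """Yield k (train, test) pairs; each point appears in exactly one test set."""
    data = list(interactions)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]   # k disjoint folds
    for i in range(k):
        train = [x for j in range(k) if j != i for x in folds[j]]
        yield train, folds[i]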
SLIDE 18

Chronological Splits

rating_prediction … --chronological-split=0.25 rating_prediction … --chronological-split=01/01/2002

Sort chronologically and split:

  • Use the past to predict the “future”.
  • Takes trends in the data into account.

    – time of day, day of week
    – season
    – trending products
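A minimal sketch of a chronological ratio split in Python (hypothetical helper; assumes each interaction carries a timestamp in its third field):

def chronological_split(interactions, test_ratio=0.25):
    """Sort by time; train on the oldest interactions, test on the newest."""
    data = sorted(interactions, key=lambda x: x[2])   # sort by timestamp
    cut = len(data) - int(len(data) * test_ratio)
    return data[:cut], data[cut:]                     # (past, "future")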

SLIDE 19
(c) Serolillo, license: CC by 2.5
SLIDE 20
  • 2. Baseline Methods

Why compare against baselines?

  • Absolute numbers have no meaning.
    – … well, at least here.
  • Relative numbers may also have no meaning …
    – … if you compare to the wrong things.

Good baselines:

  • the strongest solution that is still simple
  • the existing solution
  • standard solutions
    – coll. filtering: kNN, vanilla matrix factorization

SLIDE 21
  • 2. Baseline Methods

item_recommendation … --recommender=Random
item_recommendation … --recommender=MostPopular
item_recommendation … --recommender=MostPopularByAttributes --item-attributes=ARTISTS

Item recommendation baselines:

  • random
  • popular items (by attribute/category)
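For intuition, a minimal sketch of the most-popular baseline in Python (hypothetical helper, not MyMediaLite's implementation):

from collections import Counter

def most_popular(train, k=10):
    """Rank items by interaction count; recommend the same top-k to every user."""
    counts = Counter(item for user, item, ts in train)
    return [item for item, _ in counts.most_common(k)]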
SLIDE 22
(c) Michael Collins; license: CC by 2.0
SLIDE 23
  • 3. Apples and Oranges

Always check if you measure on the same splits. It happens quite often …

SLIDE 24
  • 3. Apples and Oranges

Always check if you measure on the same splits. It happens quite often … e.g. this ICML 2013 paper:

SLIDE 25
  • 3. Apples and Oranges
SLIDE 26
  • 3. Apples and Oranges
  • On chronological splits of the Netflix dataset, matrix factorization (“SVD”) models usually do not perform below 0.9 RMSE.
  • Chronological splits can be much harder than random splits!

Lessons:

  • Baselines are important – they can also help us to “debug” experiments.
  • Do not compare between simple splits and chronological splits.

SLIDE 27
(c) Pastorius; license: CC by 3.0; source: http://commons.wikimedia.org/wiki/File:Plastic_tape_m
SLIDE 28
  • 4. Metrics

What is the right metric?

  • Know your goal.
    – It always depends on what you want to achieve.
    – What to measure?
  • Criticize your metrics.
    – They may ignore important aspects of your problem.
    – They are just approximations of user behavior.
  • Eyeball the results.
    – Your metrics may fail to catch WTF results.

http://thenoisychannel.com/2012/08/20/wtf-k-measuring-ineffectiveness/

SLIDE 29
  • 4. Metrics

item_recommendation ... --measures="prec@5,NDCG"

Precision at k

  • fraction of “correct” items in the top k results (number of correct items divided by k)
  • The choice of k is specific to your application.
  • very simple
  • easy to understand and explain

More ranking measures: NDCG, MAP, ERR

SLIDE 30
  • 4. Metrics

Precision at k

Example: top-4 recommendations: bad, good, bad, bad
→ 1 “good” item in the top 4 → precision at 4 = 1/4
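The same computation as a minimal Python sketch (precision_at_k is a hypothetical helper, not MyMediaLite code):

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

# The slide's example: one relevant ("good") item among four recommendations.
print(precision_at_k(["a", "g", "c", "d"], relevant={"g"}, k=4))  # 0.25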
SLIDE 31
  • 5. Hyperparameter Tuning

item_recommendation … --recommender=WRMF --recommender-options="reg=0.01 alpha=2"

  • Hyperparameters, e.g.
    – regularization to control overfitting
    – learning rate (for gradient descent methods)
    – stopping criterion
  • You have to do it. Also for your baselines.
  • Don't get too fancy.
    – Grid search will do it in most cases.
  • More advanced:
    – Nelder-Mead/Simplex

SLIDE 32
  • 5. Hyperparameter Tuning

rating_prediction … --search-hp

Grid search

  • simple
  • brute force
  • embarrassingly parallel

“A practical guide to SVM classification” http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
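A minimal brute-force grid search sketch in Python (train_model and evaluate are hypothetical callbacks you would supply; the grid values are made up for illustration):

import itertools

def grid_search(train, valid, train_model, evaluate):
    """Try every combination in a small grid; keep the best validation score."""
    grid = {"reg": [0.001, 0.01, 0.1], "alpha": [1, 2, 4]}
    best_score, best_params = None, None
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid, values))
        model = train_model(train, **params)   # each grid point is independent,
        score = evaluate(model, valid)         # hence embarrassingly parallel
        if best_score is None or score > best_score:
            best_score, best_params = score, params
    return best_params, best_score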

SLIDE 33
  • 6. Reproducible Experiments

item_recommendation … --random-seed=1

Random seed

  • “random” splitting
  • training initialization
  • debugging
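To see why a fixed seed matters, a tiny Python check (illustrative only, mirroring what --random-seed=1 achieves in MyMediaLite):

import random

# Two shuffles with the same seed produce the same order,
# hence the same "random" split and the same initialization.
data = list(range(10))
a, b = list(data), list(data)
random.Random(1).shuffle(a)
random.Random(1).shuffle(b)
assert a == b   # reproducible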
SLIDE 34
  • 6. Reproducible Experiments

item_recommendation … --random-seed=1

Besides random seed:

  • Put everything in version control.
    – data, software
    – scripts and configuration
  • Use build tools like make for automation.
    – make knows when to re-run your data preprocessing steps.

http://bitaesthetics.com/posts/make-for-data-scientists.html

SLIDE 35
  • 6. Reproducible Experiments

item_recommendation … --recommender=ExternalItemRecommender --recommender-options="prediction_file=FILE"

Re-use evaluation code. Create predictions using external software. Use MyMediaLite for evaluation.

SLIDE 36
  • 6. Reproducible Experiments

item_recommendation … --recommender=ExternalItemRecommender --recommender-options="prediction_file=FILE"

Why re-use evaluation code?

  • Evaluation protocols (splitting + candidate selection + metrics) are not easy to get right.
  • Ensures comparability.
    – more configuration kept fixed => less risk of accidental differences
  • Laziness!
SLIDE 37
(c) by Caucas; license: CC by-nc-nd 2.0; source: http://www.flickr.com/photos/thecaucas/2597813380/sizes/o/
SLIDE 38

Summary

  • 1. Split your data appropriately.
  • 2. Do not compare apples and oranges.
  • 3. Compare against simple and strong baselines.
  • 4. Precision at k is a metric that is easy to explain.
  • 5. Grid search is a simple method for hyperparameter tuning.
  • 6. Make your experiments reproducible.
  • 7. MyMediaLite can help you with some of these things ;-). Try it out!

SLIDE 39

http://github.com/zenogantner/mml-eval-examples
http://mymedialite.net/
http://github.com/zenogantner/MyMediaLite

(c) Michael Sauers; license CC by-nc-sa 2.0