Recommender System Experiments with MyMediaLite
Or: Everything you always wanted to know about offline experiments* (*but were afraid to ask)
Zeno Gantner <zeno.gantner@nokia.com> Nokia Location & Commerce, Berlin
HERE Maps by Nokia … in Berlin
– mobile apps
– customers: Toyota, ...
Maps Search Team
Lucene 4.3.0 ;-)
http://2011.berlinbuzzwords.de/content/improving-search-ranking-through-ab-tests-case-study
http://2012.berlinbuzzwords.de/sessions/efficient-scoring-lucene
http://2012.berlinbuzzwords.de/sessions/introducing-cascalog-functional-data-processing-hadoop
http://2012.berlinbuzzwords.de/sessions/relevance-optimization-check-candidate-lists
https://issues.apache.org/jira/browse/LUCENE-4930
https://issues.apache.org/jira/browse/LUCENE-4571
Data + Software/Algorithms = ???
(c) Joon Han, license: CC by-sa 3.0, source: http://en.wikipedia.org/wiki/File:Groundhog_day_tip_top_bistro.jpg
(c) Diliff; license CC by-3.0
Real-world deployments
Data mining competitions
Research
RecSys Experiments with MyMediaLite
Running Example: MyMediaLite
evaluation framework
(RapidMiner plugin)
2-3 months) since 2010
http://mymedialite.net/ http://github.com/zenogantner/MyMediaLite
command-line tools
Find all examples here: http://github.com/zenogantner/mml-eval-examples
Explicit feedback: Not always there.
Implicit feedback: Often positive-only.
User ID   Item ID   Timestamp
196       242       881250949
186       302       891717742
22        377       878887116
244       51        880606923
...       ...       ...
item_recommendation --training-file=F1 --test-file=F2
IDs can be (almost) arbitrary strings
Separators: whitespace, tab, comma, ::
Alternative timestamp format: yyyy-mm-dd
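A minimal Python sketch of reading this format. It assumes one separator type per file and that all-digit timestamps are Unix seconds (as in the example rows above); `parse_line` and `parse_timestamp` are hypothetical helpers for illustration, not MyMediaLite's own parser.

```python
import re
from datetime import datetime, timezone
from typing import Tuple

# Separators from the slide: "::", comma, or whitespace (which includes tabs).
# Assumption: a file uses one separator type consistently.
SEPARATOR = re.compile(r"::|,|\s+")


def parse_line(line: str) -> Tuple[str, str, str]:
    """Split a 'user item timestamp' line; IDs stay strings, since they may be arbitrary."""
    fields = SEPARATOR.split(line.strip())
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields: {line!r}")
    user, item, timestamp = fields[:3]
    return user, item, timestamp


def parse_timestamp(field: str) -> datetime:
    """Assumption: all-digit values are Unix seconds; anything else is yyyy-mm-dd."""
    if field.isdigit():
        return datetime.fromtimestamp(int(field), tz=timezone.utc)
    return datetime.strptime(field, "%Y-%m-%d").replace(tzinfo=timezone.utc)


print(parse_line("196\t242\t881250949"))  # ('196', '242', '881250949')
print(parse_timestamp("2002-01-01").date())  # 2002-01-01
```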
Random Splits
item_recommendation … --test-ratio=0.25
Shuffle and split: Simple, but:
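To make "shuffle and split" concrete, here is a Python sketch of a random split with a test ratio of 0.25. It illustrates the idea behind `--test-ratio`, not MyMediaLite's implementation; the function name, the default seed, and the rounding of the test-set size are assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(events: Sequence[T], test_ratio: float, seed: int = 1) -> Tuple[List[T], List[T]]:
    """Shuffle all events, then put a test_ratio fraction of them into the test set."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)  # fixed seed: the same split on every run
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


train, test = random_split(range(100), test_ratio=0.25)
print(len(train), len(test))  # 75 25
```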
k-fold Cross-Validation
item_recommendation … --cross-validation=4
Shuffle and split:
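A Python sketch of k-fold cross-validation along the lines of `--cross-validation=4`: shuffle once, cut the data into k folds, and use each fold exactly once as the test set while the other k-1 folds form the training set. The reported result is then typically the average over the k runs. The round-robin fold assignment is an assumption of this sketch, not necessarily what MyMediaLite does.

```python
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(events: Sequence[T], k: int, seed: int = 1) -> Iterator[Tuple[List[T], List[T]]]:
    """Shuffle once, cut into k folds; each fold serves as the test set exactly once."""
    if k < 2:
        raise ValueError("k must be at least 2")
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # round-robin: fold sizes differ by at most 1
    for i, test in enumerate(folds):
        train = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield train, test


for fold, (train, test) in enumerate(k_fold_splits(range(10), k=4)):
    print(fold, len(train), len(test))
```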
Chronological Splits
rating_prediction … --chronological-split=0.25
rating_prediction … --chronological-split=01/01/2002
Sort chronologically and split:
– time of day, day of week
– season
– trending products
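The two variants of `--chronological-split` (a ratio or a cutoff date) can be sketched in Python as follows. This is an illustration, not MyMediaLite's code; in particular, putting events that fall exactly on the cutoff into the test set is an assumption of this sketch.

```python
from datetime import datetime
from typing import List, Sequence, Tuple

Event = Tuple[str, str, datetime]  # (user ID, item ID, timestamp)


def chronological_split_ratio(events: Sequence[Event], test_ratio: float) -> Tuple[List[Event], List[Event]]:
    """Sort by time; the most recent test_ratio fraction becomes the test set."""
    ordered = sorted(events, key=lambda e: e[2])
    cut = len(ordered) - round(len(ordered) * test_ratio)
    return ordered[:cut], ordered[cut:]


def chronological_split_date(events: Sequence[Event], cutoff: datetime) -> Tuple[List[Event], List[Event]]:
    """Events before the cutoff are training data; events from the cutoff on are test data."""
    train = [e for e in events if e[2] < cutoff]
    test = [e for e in events if e[2] >= cutoff]
    return train, test


events = [
    ("u1", "a", datetime(2001, 5, 1)),
    ("u2", "b", datetime(2002, 3, 1)),
    ("u1", "c", datetime(2000, 1, 1)),
    ("u3", "d", datetime(2002, 1, 1)),
]
train, test = chronological_split_ratio(events, 0.25)
print([item for _, item, _ in test])  # ['b']
train, test = chronological_split_date(events, datetime(2002, 1, 1))
print(sorted(item for _, item, _ in test))  # ['b', 'd']
```

Unlike a random split, the model never sees events from the "future" of its test set.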
Why compare against baselines?
– … well, at least here.
– Relative numbers may also have no meaning.
Good baselines:
– coll. filtering: kNN, vanilla matrix factorization
Item recommendation baselines:
item_recommendation … --recommender=Random
item_recommendation … --recommender=MostPopular
item_recommendation …
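What the Random and MostPopular baselines do can be sketched in a few lines of Python. This is not MyMediaLite's code. It assumes that items the user has already seen are excluded from the recommendations and that popularity means the number of training events per item. Any serious method should clearly beat both.

```python
import random
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple


def most_popular(train: Iterable[Tuple[str, str]], candidates: Sequence[str], seen: Set[str], n: int) -> List[str]:
    """Rank unseen candidate items by how many (user, item) training events they have."""
    counts = Counter(item for _, item in train)
    unseen = [i for i in candidates if i not in seen]
    # sorted() is stable (also with reverse=True), so ties keep the candidate order
    return sorted(unseen, key=lambda i: counts[i], reverse=True)[:n]


def random_items(candidates: Sequence[str], seen: Set[str], n: int, rng: random.Random) -> List[str]:
    """Pick n unseen candidates uniformly at random."""
    unseen = [i for i in candidates if i not in seen]
    return rng.sample(unseen, min(n, len(unseen)))


train = [("u1", "a"), ("u2", "a"), ("u3", "b"), ("u1", "c"), ("u2", "a")]
print(most_popular(train, ["a", "b", "c", "d"], seen={"a"}, n=2))  # ['b', 'c']
print(random_items(["a", "b", "c", "d"], {"a"}, 2, random.Random(1)))
```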
Always check if you measure on the same splits. It happens quite often … e.g. this ICML 2013 paper:
matrix factorization (“SVD”) models usually do not perform below 0.9 RMSE.
random splits! Lessons:
to “debug” experiments.
chronological splits.
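For reference, RMSE (root mean squared error) is the square root of the mean squared difference between predicted and actual ratings; lower is better. A minimal Python sketch:

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between predicted and true ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


print(rmse([3.5, 4.0, 2.0], [4, 4, 1]))
```

An RMSE value is only comparable to another one if both were measured on the same data split.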
What is the right metric?
– It always depends on what you want to achieve.
– What to measure?
– They may ignore important aspects of your problem.
– They are just approximations of user behavior.
– Your metrics may fail to catch WTF results.
http://thenoisychannel.com/2012/08/20/wtf-k-measuring-ineffectiveness/
item_recommendation … --measures="prec@5,NDCG"
Precision at k
More ranking measures: NDCG, MAP, ERR
(Figure: a ranked list of recommendations, each marked good or bad, illustrating precision at 4.)
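A Python sketch of precision at k and NDCG at k with binary relevance: an item counts as relevant if it appears in the user's test data. This uses the common textbook definition (gain 1 per hit, discount log2(position + 1) with 1-based positions). MyMediaLite's exact definitions, for example whether NDCG is cut off at k or computed over the full list, may differ.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommendations that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranking[:k] if item in relevant)
    return hits / k


def ndcg_at_k(ranking: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the ranking divided by the DCG of an ideal ranking."""
    # enumerate() is 0-based, so position p has discount log2(p + 2)
    dcg = sum(1.0 / math.log2(pos + 2) for pos, item in enumerate(ranking[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranking = ["w", "x", "y", "z"]
relevant = {"x"}
print(precision_at_k(ranking, relevant, 4))  # 0.25
print(round(ndcg_at_k(ranking, relevant, 4), 4))  # 0.6309
```

Precision at k ignores where in the top k the hits are; NDCG rewards hits near the top.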
item_recommendation … --recommender=WRMF
– regularization to control overfitting
– learning rate (for gradient descent methods)
– stopping criterion
– Grid search will do it in most cases.
– Nelder-Mead/Simplex
rating_prediction … --search-hp
Grid search
“A practical guide to SVM classification” http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
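Grid search simply tries every combination of candidate values and keeps the best one. The Python sketch below assumes a caller-supplied `evaluate` function that would train a model on the training split and return its error on a separate validation split (not the test set). The hyperparameter names and the toy error function are hypothetical stand-ins for a real training run.

```python
import itertools
import math
from typing import Callable, Dict, List, Optional, Tuple

Params = Dict[str, float]


def grid_search(grid: Dict[str, List[float]], evaluate: Callable[[Params], float]) -> Tuple[Optional[Params], float]:
    """Evaluate every combination of hyperparameter values; keep the one with the lowest error."""
    names = list(grid)
    best_params: Optional[Params] = None
    best_error = math.inf
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        error = evaluate(params)  # in practice: train, then measure on a validation split
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


# Exponentially spaced values, in the spirit of the SVM guide cited above.
grid = {"regularization": [2.0 ** e for e in range(-5, 2)], "learn_rate": [0.001, 0.01, 0.1]}


def toy_validation_error(p: Params) -> float:
    """Hypothetical stand-in for a real run; minimal at regularization=0.25, learn_rate=0.01."""
    return (math.log2(p["regularization"]) + 2) ** 2 + abs(p["learn_rate"] - 0.01)


print(grid_search(grid, toy_validation_error))
```

The cost grows multiplicatively with every added hyperparameter, which is where methods like Nelder-Mead become attractive.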
Random seed
item_recommendation … --random-seed=1
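The idea behind a fixed random seed, illustrated in Python rather than with MyMediaLite's internals: a dedicated generator created from a fixed seed produces the same shuffle (and thus the same split or initialization) on every run, so two experiments can be compared fairly. `seeded_shuffle` is a hypothetical helper.

```python
import random
from typing import List, Sequence


def seeded_shuffle(events: Sequence[int], seed: int) -> List[int]:
    """Shuffle with a private generator, so the result depends only on the seed."""
    rng = random.Random(seed)
    shuffled = list(events)
    rng.shuffle(shuffled)
    return shuffled


run_1 = seeded_shuffle(range(20), seed=1)
run_2 = seeded_shuffle(range(20), seed=1)
print(run_1 == run_2)  # True: same seed, same order
```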
Besides random seed:
– data, software
– scripts and configuration
– Make knows when to re-run your data preprocessing steps.
http://bitaesthetics.com/posts/make-for-data-scientists.html
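Make's basic rule is that a target is rebuilt if it is missing or older than any of its prerequisites. The Python sketch below shows only that staleness check (it ignores dependency chains, phony targets, and everything else make does); `needs_rebuild` is a hypothetical helper, and a real pipeline would just use a Makefile.

```python
import os
from typing import Iterable


def needs_rebuild(target: str, prerequisites: Iterable[str]) -> bool:
    """make's core rule: rebuild if the target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

For example, a split file such as `train.txt` would only be regenerated when the raw ratings file or the split script has changed since it was last written.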
Re-use evaluation code. Create predictions using external software. Use MyMediaLite for evaluation.
item_recommendation …
Why re-use evaluation code?
selection+metrics) are not easy to get right.
– more configuration kept fixed => less risk of accidental differences
Summary
baselines.
explain.
hyperparameter tuning.
things ;-). Try it out!
http://github.com/zenogantner/mml-eval-examples http://mymedialite.net/ http://github.com/zenogantner/MyMediaLite
(c) Michael Sauers; license CC by-nc-sa 2.0