Recommender System Experiments with MyMediaLite




  1. Recommender System Experiments with MyMediaLite
     Or: Everything you always wanted to know about offline experiments* (*but were afraid to ask)
     Zeno Gantner <zeno.gantner@nokia.com>
     Nokia Location & Commerce, Berlin

  2. HERE Maps by Nokia … in Berlin
     ● ca. 800 people
     ● HERE Maps platform
       – mobile apps: HERE Drive, HERE Maps, HERE Transit (public transport)
       – customers: Yahoo Maps, Bing Maps, major car companies (BMW, VW, Toyota, ...)

  3. HERE Maps by Nokia … in Berlin: Maps Search Team
     ● #bbuzz regulars
     ● 3 of us contributed to Lucene 4.3.0 ;-)
     http://2011.berlinbuzzwords.de/content/improving-search-ranking-through-ab-tests-case-study
     http://2012.berlinbuzzwords.de/sessions/efficient-scoring-lucene
     http://2012.berlinbuzzwords.de/sessions/introducing-cascalog-functional-data-processing-hadoop
     http://2012.berlinbuzzwords.de/sessions/relevance-optimization-check-candidate-lists
     https://issues.apache.org/jira/browse/LUCENE-4930
     https://issues.apache.org/jira/browse/LUCENE-4571

  4. (C) Paul L. Dineen; license: CC by; source http://www.flickr.com/photos/pauldineen/4529216647/sizes/o/in/photostream/

  5. + = ?

  6. Data + Software/Algorithms = ???
     Real-world deployments
     (c) Diliff; license: CC by 3.0
     (c) Joon Han; license: CC by-sa 3.0; source: http://en.wikipedia.org/wiki/File:Groundhog_day_tip_top_bistro.jpg

  7. Data mining competitions

  8. Research

  9. + = ?

  10. RecSys Experiments with MyMediaLite
      1. Interaction Data
      2. Baseline Methods
      3. Apples and Oranges
      4. Metrics
      5. Hyperparameter Tuning
      6. Reproducibility

  11. Running Example: MyMediaLite
      ● RecSys toolkit and evaluation framework
      ● written in C#/Mono
        – C#, Python, Ruby, F#
        – 2 Java ports
        – (RapidMiner plugin)
      ● regular releases (every 2-3 months) since 2010
      ● simple
      ● choice
      ● free
      ● documented
      ● tested
      http://mymedialite.net/
      http://github.com/zenogantner/MyMediaLite

  12. Running Example: MyMediaLite command-line tools
      ● rating_prediction
      ● item_recommendation
      Find all examples here: http://github.com/zenogantner/mml-eval-examples

  13. 1. Interaction Data
      ● Explicit feedback
      ● Implicit feedback
        – views
        – clicks
        – purchases
      Not always there. Often positive-only.

  14. 1. Interaction Data
      item_recommendation --training-file=F1 --test-file=F2
      User ID   Item ID   Timestamp (optional)
      196       242       881250949
      186       302       891717742
      22        377       878887116
      244       51        880606923
      ...       ...       ...
      ● IDs can be (almost) arbitrary strings.
      ● Separator: whitespace, tab, comma, or ::
      ● Alternative timestamp format: yyyy-mm-dd
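A minimal Python sketch of reading interaction lines in the layout above: user ID, item ID, and an optional timestamp, separated by whitespace, tabs, commas, or `::`. Treating integer timestamps as Unix seconds in UTC is an assumption, and `Interaction`/`parse_line` are hypothetical names, not MyMediaLite's own reader.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional


class Interaction(NamedTuple):
    user: str                  # IDs are kept as arbitrary strings
    item: str
    time: Optional[datetime]   # None when the optional timestamp column is missing


# Separators from the slide: "::" or a comma (optionally padded), or any whitespace run.
_SEPARATOR = re.compile(r"\s*(?:::|,)\s*|\s+")


def parse_line(line: str) -> Interaction:
    fields = _SEPARATOR.split(line.strip())
    user, item = fields[0], fields[1]
    time = None
    if len(fields) > 2:
        stamp = fields[2]
        if stamp.isdigit():
            # Assumption: integer timestamps are Unix seconds.
            time = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
        else:
            # Alternative format from the slide: yyyy-mm-dd.
            time = datetime.strptime(stamp, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return Interaction(user, item, time)


print(parse_line("196\t242\t881250949"))
print(parse_line("u1, i2, 2002-01-01"))
```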

  15. Random Splits
      item_recommendation … --test-ratio=0.25
      Shuffle and split. Simple, but:
      ● Does not take temporal trends into account.
      ● Does not use all data for testing.
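The hold-out protocol on this slide boils down to a shuffle and a cut. A minimal Python sketch, assuming the interactions fit in memory as a list; `random_split` is a hypothetical helper, and MyMediaLite's own `--test-ratio` handling may differ in details such as rounding.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(data: Sequence[T], test_ratio: float, seed: int = 1) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the data, then hold out test_ratio of it for testing."""
    rng = random.Random(seed)  # private RNG: the same seed gives the same split
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


train, test = random_split(range(100), test_ratio=0.25)
print(len(train), len(test))  # 75 25
```

Note that the 75 training interactions are never used for testing, which is the second drawback listed on the slide.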

  16. k-fold Cross-Validation
      item_recommendation … --cross-validation=4
      Shuffle and split into k folds; each fold serves once as the test set:
      ● Uses each data point for evaluation.
      ● Does not take temporal trends into account.
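With k folds, every data point ends up in exactly one test fold. A minimal Python sketch of that idea; the round-robin fold assignment after shuffling is an assumption, not necessarily how MyMediaLite builds its folds.

```python
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(data: Sequence[T], k: int, seed: int = 1) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield k (train, test) pairs; every data point lands in exactly one test fold."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    # Round-robin assignment: fold sizes differ by at most one.
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


for train, test in k_fold_splits(range(10), k=4):
    print(len(train), len(test))
```

The k per-fold results are then typically averaged into one number per method.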

  17. Chronological Splits
      rating_prediction … --chronological-split=0.25
      rating_prediction … --chronological-split=01/01/2002
      Sort chronologically and split:
      ● Use the past to predict the “future”.
      ● Takes trends in the data into account:
        – time of day, day of week
        – season
        – trending products
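The two chronological variants on the slide (a test ratio or a cut-off date) can be sketched in a few lines of Python. This assumes each event is a hypothetical `(user, item, timestamp)` tuple and that the date has already been parsed into a `datetime`; parsing the `01/01/2002` command-line string is not shown.

```python
from datetime import datetime
from typing import List, Sequence, Tuple, Union

Event = Tuple[str, str, datetime]  # (user, item, timestamp)


def chronological_split(events: Sequence[Event],
                        split: Union[float, datetime]) -> Tuple[List[Event], List[Event]]:
    """Sort by time; the latest events (by ratio, or from a cut-off date on) become the test set."""
    ordered = sorted(events, key=lambda e: e[2])
    if isinstance(split, datetime):
        train = [e for e in ordered if e[2] < split]
        test = [e for e in ordered if e[2] >= split]
        return train, test
    n_test = round(len(ordered) * split)  # split is the test ratio, e.g. 0.25
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


events = [
    ("u1", "a", datetime(2001, 5, 1)),
    ("u2", "b", datetime(2002, 3, 1)),
    ("u1", "c", datetime(2000, 1, 1)),
    ("u3", "a", datetime(2001, 12, 31)),
]
train, test = chronological_split(events, datetime(2002, 1, 1))
print([item for _, item, _ in test])  # ['b']
```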

  18. (c) Serolillo, license: CC by 2.5

  19. 2. Baseline Methods
      Why compare against baselines?
      ● Absolute numbers have no meaning.
        – … well, at least here.
      ● Relative numbers may also have no meaning
        – … if you compare to the wrong things.
      Good baselines:
      ● the strongest solution that is still simple
      ● the existing solution
      ● standard solutions
        – collaborative filtering: kNN, vanilla matrix factorization

  20. 2. Baseline Methods
      item_recommendation … --recommender=Random
      item_recommendation … --recommender=MostPopular
      item_recommendation … --recommender=MostPopularByAttributes --item-attributes=ARTISTS
      Item recommendation baselines:
      ● random
      ● popular items (by attribute/category)
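The two simplest item recommendation baselines fit in a few lines. A Python sketch of the random and most-popular ideas (not MyMediaLite's implementations; the per-attribute variant is not shown). The `exclude` parameter, for dropping items a user already has, is an assumption reflecting common practice.

```python
import random
from collections import Counter
from typing import Collection, List, Sequence, Tuple

Pair = Tuple[str, str]  # (user, item)


def most_popular(train: Sequence[Pair], n: int, exclude: Collection[str] = ()) -> List[str]:
    """Rank items by number of training interactions; ties broken by item ID for determinism."""
    counts = Counter(item for _, item in train)
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return [item for item in ranked if item not in exclude][:n]


def random_items(candidates: Collection[str], n: int, seed: int = 1) -> List[str]:
    """Random baseline: n distinct items drawn uniformly from the candidates."""
    rng = random.Random(seed)
    return rng.sample(sorted(candidates), min(n, len(candidates)))


train = [("u1", "a"), ("u2", "a"), ("u2", "b"), ("u3", "c"), ("u3", "a"), ("u1", "b")]
print(most_popular(train, n=2))                 # ['a', 'b']
print(most_popular(train, n=2, exclude={"a"}))  # ['b', 'c']
```

Popularity is often surprisingly hard to beat, which is why it belongs in every comparison.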

  21. (c) Michael Collins; license: CC by-2.0

  22. 3. Apples and Oranges Always check if you measure on the same splits. It happens quite often …

  23. 3. Apples and Oranges Always check if you measure on the same splits. It happens quite often … e.g. this ICML 2013 paper:

  24. 3. Apples and Oranges

  25. 3. Apples and Oranges
      ● On chronological splits of the Netflix dataset, matrix factorization (“SVD”) models usually do not reach an RMSE below 0.9.
      ● Chronological splits can be much harder than random splits!
      Lessons:
      ● Baselines are important; they can also help us to “debug” experiments.
      ● Do not compare results on random splits with results on chronological splits.
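An RMSE figure such as 0.9 only means something next to a baseline measured on the same split. A minimal Python sketch: RMSE itself, plus a global-mean predictor as one hypothetical sanity-check baseline (it is not named on the slide). If a sophisticated model does not clearly beat it, the experiment setup deserves a second look.

```python
import math
from typing import List, Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error; lower is better."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def global_mean_predictions(train_ratings: Sequence[float], n_test: int) -> List[float]:
    """Trivial baseline: predict the mean training rating for every test case."""
    mean = sum(train_ratings) / len(train_ratings)
    return [mean] * n_test


train_ratings = [4.0, 3.0, 5.0, 4.0]
test_ratings = [5.0, 3.0]
print(rmse(global_mean_predictions(train_ratings, len(test_ratings)), test_ratings))  # 1.0
```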

  26. (c) Pastorius; license: CC by 3.0; source: http://commons.wikimedia.org/wiki/File:Plastic_tape_m

  27. 4. Metrics
      What is the right metric?
      ● Know your goal.
        – It always depends on what you want to achieve.
        – What to measure?
      ● Criticize your metrics.
        – They may ignore important aspects of your problem.
        – They are just approximations of user behavior.
      ● Eyeball the results.
        – Your metrics may fail to catch WTF results.
      http://thenoisychannel.com/2012/08/20/wtf-k-measuring-ineffectiveness/

  28. 4. Metrics
      item_recommendation … --measures="prec@5,NDCG"
      Precision at k
      ● fraction of “correct” items among the top k results
      ● The choice of k is specific to your application.
      ● very simple
      ● easy to understand and explain
      More ranking measures: NDCG, MAP, ERR
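Unlike precision at k, NDCG rewards a relevant item more the higher it is ranked. A Python sketch of one common formulation (binary relevance, log2 rank discount, cut-off at k); MyMediaLite's exact NDCG definition may differ, for example in whether a cut-off is applied.

```python
import math
from typing import Sequence


def ndcg_at_k(hits: Sequence[int], k: int, n_relevant: int) -> float:
    """Binary-relevance NDCG@k.

    hits[r] is 1 if the item at rank r + 1 is relevant, else 0;
    n_relevant is the number of relevant test items for this user.
    """
    dcg = sum(h / math.log2(r + 2) for r, h in enumerate(hits[:k]))
    # Ideal ordering: all relevant items (up to k) at the top.
    ideal = sum(1 / math.log2(r + 2) for r in range(min(k, n_relevant)))
    return dcg / ideal if ideal > 0 else 0.0


print(round(ndcg_at_k([0, 1, 0, 0], k=4, n_relevant=1), 3))  # 0.631
```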

  29. 4. Metrics
      Precision at k, example: precision at 4
      rank   recommendation   hit
      1      bad              0
      2      good             1
      3      bad              0
      4      bad              0
      5      bad              --
      6      good             --
      7      bad              --
      precision at 4 = 1/4
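The worked example above translates directly into code. A minimal Python sketch with hypothetical item IDs, where the items at ranks 2 and 6 are the “good” ones; dividing by k even when fewer than k items are recommended is an assumption.

```python
from typing import Collection, Sequence


def precision_at_k(ranked: Sequence[str], relevant: Collection[str], k: int) -> float:
    """Share of the top-k recommendations that are relevant ("good")."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


# The slide's example: ranks 2 and 6 are good, everything else is bad.
ranked = ["i1", "i2", "i3", "i4", "i5", "i6", "i7"]
relevant = {"i2", "i6"}
print(precision_at_k(ranked, relevant, k=4))  # 0.25
```

The good item at rank 6 does not count, because it lies outside the top 4.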

  30. 5. Hyperparameter Tuning
      item_recommendation … --recommender=WRMF --recommender-options="reg=0.01 alpha=2"
      ● Hyperparameters, e.g.
        – regularization to control overfitting
        – learning rate (for gradient descent methods)
        – stopping criterion
      ● You have to tune them, also for your baselines.
      ● Don't get too fancy.
        – Grid search is good enough in most cases.
      ● More advanced:
        – Nelder-Mead/Simplex

  31. 5. Hyperparameter Tuning
      rating_prediction … --search-hp
      Grid search
      ● simple
      ● brute force
      ● embarrassingly parallel
      “A Practical Guide to Support Vector Classification” http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
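Grid search tries every combination of candidate values and keeps the best one. A Python sketch under stated assumptions: `evaluate` stands for “train on the training part and score on a validation split held out from it” (so the test set stays untouched), higher scores are better (negate error measures such as RMSE), and the `reg`/`alpha` names only mirror the WRMF example; the objective below is a toy stand-in.

```python
import itertools
from typing import Callable, Dict, Mapping, Optional, Sequence, Tuple

Params = Dict[str, float]


def grid_search(evaluate: Callable[[Params], float],
                grid: Mapping[str, Sequence[float]]) -> Tuple[Params, float]:
    """Try every combination in the grid; return the best params and score (higher is better)."""
    names = list(grid)
    best: Optional[Tuple[Params, float]] = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # each call is independent, hence "embarrassingly parallel"
        if best is None or score > best[1]:
            best = (params, score)
    if best is None:
        raise ValueError("the grid contains no combinations")
    return best


# Hypothetical objective standing in for a real validation score.
def toy_validation_score(p: Params) -> float:
    return -((p["reg"] - 0.01) ** 2) - (p["alpha"] - 2) ** 2


best_params, best_score = grid_search(toy_validation_score,
                                      {"reg": [0.001, 0.01, 0.1], "alpha": [1, 2, 4]})
print(best_params)  # {'reg': 0.01, 'alpha': 2}
```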

  32. 6. Reproducible Experiments
      item_recommendation … --random-seed=1
      The random seed matters for:
      ● “random” splitting
      ● training initialization
      ● debugging
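One way to get what a fixed seed is for is to give every random step its own seeded generator. A Python sketch; the helper names and the 0.1 initialization scale are hypothetical choices for illustration.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shuffled(data: Sequence[T], seed: int) -> List[T]:
    """Seeded shuffle, e.g. for a "random" train/test split."""
    rng = random.Random(seed)
    out = list(data)
    rng.shuffle(out)
    return out


def init_factors(n_rows: int, n_factors: int, seed: int, std: float = 0.1) -> List[List[float]]:
    """Seeded Gaussian initialization, e.g. for matrix factorization latent factors."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(n_factors)] for _ in range(n_rows)]


# Same seed, same split, and same starting point: reruns and debugging sessions are comparable.
assert shuffled(range(20), seed=1) == shuffled(range(20), seed=1)
assert init_factors(3, 2, seed=1) == init_factors(3, 2, seed=1)
```

Using a separate generator per step, rather than one global one, keeps each step reproducible even if the order of calls changes.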

  33. 6. Reproducible Experiments
      item_recommendation … --random-seed=1
      Besides the random seed:
      ● Put everything in version control.
        – data, software
        – scripts and configuration
      ● Use build tools like make for automation.
        – make knows when to re-run your data preprocessing steps.
      http://bitaesthetics.com/posts/make-for-data-scientists.html

  34. 6. Reproducible Experiments
      item_recommendation … --recommender=ExternalItemRecommender --recommender-options="prediction_file=FILE"
      ● Re-use evaluation code.
      ● Create predictions using external software.
      ● Use MyMediaLite for evaluation.

  35. 6. Reproducible Experiments
      item_recommendation … --recommender=ExternalItemRecommender --recommender-options="prediction_file=FILE"
      Why re-use evaluation code?
      ● Evaluation protocols (splitting + candidate selection + metrics) are not easy to get right.
      ● Ensures comparability.
        – more configuration kept fixed => less risk of accidental differences
      ● Laziness!
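The protocol pieces named on this slide (split, candidate selection, metric) can live in one function that every method, internal or external, goes through. A minimal Python sketch; dropping training items from each ranking and using mean precision at k are assumptions chosen for illustration, and parsing the actual `prediction_file` format is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Pair = Tuple[str, str]  # (user, item)


def evaluate_rankings(rankings: Dict[str, List[str]], train: Sequence[Pair],
                      test: Sequence[Pair], k: int = 5) -> float:
    """Mean precision@k over test users, with one fixed candidate rule.

    rankings may come from any recommender, including external software;
    items a user already has in the training data are removed before cutting at k.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for user, item in train:
        seen[user].add(item)
    relevant: Dict[str, Set[str]] = defaultdict(set)
    for user, item in test:
        relevant[user].add(item)
    if not relevant:
        return 0.0
    total = 0.0
    for user, items in relevant.items():
        candidates = [i for i in rankings.get(user, []) if i not in seen[user]]
        total += sum(1 for i in candidates[:k] if i in items) / k
    return total / len(relevant)


train = [("u1", "a")]
test = [("u1", "b"), ("u2", "c")]
external = {"u1": ["a", "b", "x"], "u2": ["y", "c"]}  # e.g. parsed from another tool's output
print(evaluate_rankings(external, train, test, k=2))  # 0.5
```

Because every method is scored by the same function, a difference in the numbers cannot come from a difference in the evaluation code.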

  36. (c) by Caucas; license: CC by-nc-nd 2.0; source: http://www.flickr.com/photos/thecaucas/2597813380/sizes/o/

  37. Summary
      1. Split your data appropriately.
      2. Do not compare apples and oranges.
      3. Compare against simple and strong baselines.
      4. Precision at k is a metric that is easy to explain.
      5. Grid search is a simple method for hyperparameter tuning.
      6. Make your experiments reproducible.
      7. MyMediaLite can help you with some of these things ;-). Try it out!

  38. (c) Michael Sauers; license: CC by-nc-sa 2.0
      http://github.com/zenogantner/mml-eval-examples
      http://mymedialite.net/
      http://github.com/zenogantner/MyMediaLite
