
A Reinforcement Learning and Synthetic Data Approach to Mobile Notification Management



  1. A Reinforcement Learning and Synthetic Data Approach to Mobile Notification Management Rowan Sutton, Kieran Fraser, Owen Conlan ADAPT Centre, Trinity College Dublin The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.

  2. Content www.adaptcentre.ie ❖ Motivation ❖ Research Design ❖ Experiment Implementation ❖ Results ❖ Limitations & Future Work ❖ Conclusion

  3. Motivation: Anecdotal

  4. Motivation: SOTA ❖ Growing number of notifications pushed at users (Pielot, M. et al, 2014) ❖ Notification delivery is not smart (Mehrotra, A. et al, 2016) ❖ Unnecessary notifications may dramatically decrease user productivity (Iqbal, S. T. et al, 2010) ❖ Large number of incoming notifications = negative emotions (Sahami Shirazi, A. et al, 2014)

  5. Motivation: Observed Problem

  6. Research Design: Gathering Data. WeAreUs Android App ❖ Experience Sampling Method ❖ Moments of notification interest and moments of phone-usage interest ❖ Anonymised & synthesised

  7. Research Design: Gathering Data ❖ 15 participants over 3 months ❖ 31,329 notifications logged ❖ 291 questionnaire responses ❖ 4,940 smartphone usage logs

  8. Research Design: Data Analysis

  9. Research Design: Data Analysis

  10. Research Design: Data Analysis

  11. Research Design: Data Analysis

  12. Research Design: Data Analysis

  13. Research Design: Synthesising Data

  14. Research Design: Synthesising Data

  15. Research Design: Synthesising Data

  16. Research Design: Synthesising Data

  17. Research Design: Synthesising Data. Train on Real, Test on Synthetic (TRTS)¹: RMSE/F1 scores differ in the range 0.02–0.07, indicating that the synthetic data imitates real-world data. 1. Esteban, C., Hyland, S.L., Ratsch, G.: Real-valued (medical) time series generation
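The TRTS check can be sketched as: fit a model on real notifications, then score it on held-out real data and on synthetic data, and compare the two scores. This is a minimal illustration, not the authors' pipeline. The classifier (`fit_majority`, a per-feature-tuple majority vote) is a hypothetical stand-in for whatever models were actually used, the records are toy data, and reading the slide's figure as an RMSE between real-test and synthetic-test F1 scores is one possible interpretation.

```python
from collections import Counter, defaultdict

def fit_majority(train):
    """Hypothetical stand-in classifier: predict the majority label seen for each
    feature tuple, falling back to the overall majority label for unseen tuples."""
    votes = defaultdict(Counter)
    for features, label in train:
        votes[features][label] += 1
    fallback = Counter(label for _, label in train).most_common(1)[0][0]
    table = {f: c.most_common(1)[0][0] for f, c in votes.items()}
    return lambda features: table.get(features, fallback)

def f1_score(y_true, y_pred):
    """Binary F1, treating 1 = opened as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def evaluate(model, data):
    return f1_score([label for _, label in data], [model(f) for f, _ in data])

def rmse(a, b):
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

# Toy records: ((app, time-of-day), label) with label 1 = opened, 0 = dismissed.
real_train = [(("mail", "morning"), 1), (("game", "night"), 0), (("mail", "night"), 1)]
real_test = [(("mail", "morning"), 1), (("game", "night"), 0)]
synthetic_test = [(("mail", "morning"), 1), (("game", "night"), 1)]

model = fit_majority(real_train)             # train on real
f1_real = evaluate(model, real_test)         # test on real
f1_synth = evaluate(model, synthetic_test)   # test on synthetic (TRTS)
gap = rmse([f1_real], [f1_synth])            # small gap = synthetic data behaves like real
```

With several classifiers, the same `rmse` call would take one F1 score per classifier in each list; the toy data here is far too small to say anything about real similarity.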

  18. Research Design: Reinforcement Learning ❖ OpenAI Gym: open-source toolkit for “developing and comparing reinforcement learning algorithms”¹ ❖ Gym-Push: custom OpenAI Gym environment simulating push-notification overload on mobile device users. 1. https://gym.openai.com/

  19. Research Design: Reinforcement Learning ❖ Gym-Push: custom OpenAI Gym environment simulating push-notification overload on mobile device users ❖ State: context + notification features ❖ Action: open / dismiss the notification
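The state/action design above can be sketched as a small Gym-style environment. It is plain Python so it runs without the `gym` package, and it is not Gym-Push's actual code: the class name `NotificationEnv`, the reward of +1 for matching the user's recorded action and -1 otherwise, and the one-notification-per-step episode are all assumptions for illustration. It follows the classic Gym convention where `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`.

```python
import random

OPEN, DISMISS = 0, 1  # the two actions on the slide

class NotificationEnv:
    """Hypothetical Gym-style environment: one notification per step.

    Each item in `notifications` is (state, user_action): `state` is a tuple of
    context + notification features, `user_action` is what the user really did.
    """

    def __init__(self, notifications, shuffle=True, seed=None):
        self.notifications = list(notifications)
        self.shuffle = shuffle
        self.rng = random.Random(seed)
        self.index = 0

    def reset(self):
        if self.shuffle:
            self.rng.shuffle(self.notifications)
        self.index = 0
        return self.notifications[0][0]

    def step(self, action):
        state, user_action = self.notifications[self.index]
        # Assumed reward: +1 if the agent's choice matches the user's real action.
        reward = 1.0 if action == user_action else -1.0
        self.index += 1
        done = self.index >= len(self.notifications)
        next_state = None if done else self.notifications[self.index][0]
        return next_state, reward, done, {"user_action": user_action}

# Usage: features follow the slide-22 set (app, category, time-of-day, day-of-week).
env = NotificationEnv([(("mail", "work", "morning", "mon"), OPEN),
                       (("game", "promo", "night", "sat"), DISMISS)], shuffle=False)
first_state = env.reset()
total, done = 0.0, False
while not done:
    state, reward, done, info = env.step(OPEN)  # naive policy: always open
    total += reward
```

The always-open policy earns +1 on the opened notification and -1 on the dismissed one, which is the signal an RL agent would learn to improve on.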

  20. Experiment Implementation ❖ Q-learning Agent • Learns a policy to maximise total reward • Creates a q-table to track the quality of state→action pairs • Updates q-values according to Watkins' one-step Q-learning algorithm (1) • Can explore or exploit (controlled by ε) ❖ Deep Q-learning Agent • Replaces the q-table with a DNN • Takes the state as input and outputs a Q-value for each action; the highest-valued action is chosen • Weights optimised using the Huber loss function (2)
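Watkins' one-step Q-learning updates Q(s, a) ← Q(s, a) + α[r + γ·max_a' Q(s', a') − Q(s, a)], and a DQN instead minimises a loss, here the Huber loss, on that same TD error. The sketch below is a minimal tabular agent with ε-greedy exploration plus a Huber loss function; it is illustrative only, and the class name and hyper-parameter values (α, γ, ε, δ) are assumptions, not the study's settings.

```python
import random
from collections import defaultdict

OPEN, DISMISS = 0, 1

class QLearningAgent:
    """Tabular Q-learning with epsilon-greedy exploration (illustrative sketch)."""

    def __init__(self, n_actions=2, alpha=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.q = defaultdict(lambda: [0.0] * n_actions)  # q-table: state -> action values
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        # Explore with probability epsilon, otherwise exploit the best known action.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state, action, reward, next_state, done):
        # One-step target: r + gamma * max_a' Q(s', a'), or just r at episode end.
        target = reward
        if not done and next_state is not None:
            target += self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])

def huber(error, delta=1.0):
    """Huber loss on a TD error: quadratic for |error| <= delta, linear beyond."""
    a = abs(error)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)

# Usage with toy states.
agent = QLearningAgent(alpha=0.5, gamma=0.9, epsilon=0.0)  # epsilon=0: always exploit
s, s_next = ("mail", "morning"), ("game", "night")
agent.update(s_next, DISMISS, 1.0, None, True)  # terminal step: target = r
agent.update(s, OPEN, 0.0, s_next, False)       # bootstraps from max Q(s_next, .)
best = agent.act(s_next)
```

The Huber loss keeps gradients bounded for large TD errors (linear tails) while behaving like squared error near zero, which is why it is a common choice for DQN training.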

  21. Experiment Implementation ❖ Individual User (Synth & Balanced) • Comprises ≈ 6000 synthetic notifications • Split into sets of size 50, 100, 250, 500, 1000, 2500, 5000 • Balanced ❖ Individual User (Real & Balanced) • Comprises ≈ 6000 real notifications • Split into sets of size 50, 100, 250, 500, 1000, 2500, 5000 • Balanced ❖ Individual User (Real & Unbalanced) • Comprises ≈ 6000 real notifications • Split into sets of size 50, 100, 250, 500, 1000, 2500, 5000 • Unbalanced ❖ Multiple Users (Real & Unbalanced) • Comprises ≈ 1000 real notifications • Unbalanced
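Building the per-size data sets can be sketched as drawing nested subsets of the listed sizes from one user's notifications. This assumes "balanced" means an equal number of opened and dismissed notifications per subset and that subsets are drawn without replacement; both are interpretations, and `make_subsets` is a hypothetical helper.

```python
import random

SIZES = [50, 100, 250, 500, 1000, 2500, 5000]

def make_subsets(notifications, sizes=SIZES, balanced=True, seed=0):
    """Return {size: subset}; each item is (features, label) with label 1 = opened.

    Subsets are prefixes of shuffled lists, so smaller sets nest inside larger
    ones. Sizes that cannot be filled (or balanced) from the data are skipped.
    """
    rng = random.Random(seed)
    subsets = {}
    if balanced:
        opened = [n for n in notifications if n[1] == 1]
        dismissed = [n for n in notifications if n[1] == 0]
        rng.shuffle(opened)
        rng.shuffle(dismissed)
    else:
        pool = list(notifications)
        rng.shuffle(pool)
    for size in sorted(sizes):
        if balanced:
            half = size // 2
            if half > len(opened) or size - half > len(dismissed):
                break  # not enough of one class for a balanced subset
            subsets[size] = opened[:half] + dismissed[:size - half]
        else:
            if size > len(pool):
                break
            subsets[size] = pool[:size]
    return subsets

# Usage: 600 toy notifications, half opened and half dismissed.
data = [((i,), i % 2) for i in range(600)]
subsets = make_subsets(data)
unbalanced = make_subsets(data, balanced=False)
```

With only 600 toy items, sizes from 1000 upwards are skipped; with the ≈ 6000 notifications per user on the slide, every listed size could be filled.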

  22. Experiment Implementation ❖ Evaluating the agents' ability to correctly predict the user's action (open/dismiss) on a notification ❖ Feature set: { app, category, time-of-day, day-of-week } ❖ Evaluated with 10-fold cross-validation ❖ Accuracy ❖ Precision: important when the cost of a false positive is high, e.g. the agent predicts the user wants to see a notification and delivers it, but the user dismisses it ❖ Recall: important when the cost of a false negative is high, e.g. the agent predicts the user doesn't need to see a notification and caches it, so the user misses an important message ❖ F1
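The evaluation protocol can be sketched as k-fold cross-validation with the four listed metrics, treating "opened" as the positive class so that a false positive is a delivered-then-dismissed notification and a false negative is a cached notification the user needed. This is plain Python for illustration; `metrics` and `k_fold_indices` are hypothetical helpers, and contiguous, unshuffled folds are an assumption.

```python
def metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = opened)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # delivered, then dismissed
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # cached, but was wanted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}

def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) for k contiguous folds that partition range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

# Typical loop: train the agent on nine folds, score it on the held-out fold
# with metrics(), then average each metric over the ten folds.
scores = metrics([1, 1, 0, 0], [1, 0, 1, 0])
folds = list(k_fold_indices(25, k=10))
```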

  23. Results: Q-learning on synthetic data

  24. Results: Q-learning on synthetic data

  25. Results: Q-learning on synthetic data

  26. Results: Q-learning on real data

  27. Results: Q-learning on real data

  28. Results: State Space Impact (panels: Synthetic Data, Real Data)
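One way to see why state space size matters for the tabular agent: the q-table needs one cell per combination of discrete feature values and action, so it grows multiplicatively with each feature. The sketch counts cells for the feature set {app, category, time-of-day, day-of-week}; the cardinalities are hypothetical, not figures from the study, and `q_table_entries` is an illustrative helper.

```python
from math import prod

def q_table_entries(cardinalities, n_actions=2):
    """Number of (state, action) cells in a full q-table over discrete features."""
    return prod(cardinalities.values()) * n_actions

# Hypothetical cardinalities, for illustration only.
features = {"app": 30, "category": 10, "time-of-day": 4, "day-of-week": 7}
entries = q_table_entries(features)  # 30 * 10 * 4 * 7 * 2 = 16800
```

With a training set of a few hundred notifications, most of these cells can never be visited, whereas a DQN shares parameters across states instead of storing one value per cell.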

  29. Applied Research: Observed User Problem

  30. Applied Research: Q-learning Solution

  31. Results: DQN on synthetic data

  32. Results: DQN on synthetic data

  33. Results: DQN on synthetic data

  34. Results: DQN on real data

  35. Results: DQN on real data

  36. Applied Research: Observed User Problem

  37. Applied Research: DQN Solution

  38. Results: Feature Importance

  39. Results: Train on Synthetic, Test on Real

  40. Results: Train on Synthetic, Test on Real

  41. Results: Multiple Users

  42. Results: Multiple Users

  43. Limitations & Future Work ❖ Limitations • Small set of users • Restricted set of features, e.g. ticker text not used • Fixed hyper-parameters ❖ Future Work • Generative modelling applied to text • Exploring other RL algorithms, e.g. HER, IMPALA • Larger user study

  44. Future Work – Conditional Ticker Text Generation

  45. Future Work – Autonomous Personalised Notifications

  46. Future Work – Autonomous Personalised Notifications

  47. Future Work – Autonomous Personalised Notifications

  48. Conclusion ❖ Shareable notification data set ❖ OpenAI Gym environment for training on notifications ❖ Two methods of RL applied to notification management ❖ Evaluations illustrate that the agents achieve performance comparable to SOTA

  49. EvalUMAP: http://evalumap.adaptcentre.ie/

  50. Thank you. Questions? Demo: https://review2019.github.io Email: kieran.fraser@adaptcentre.ie
