
Recent Advances in Reinforcement Learning (with a focus on )



  1. Recent Advances in Reinforcement Learning (with a focus on ), Patrick Scholz, Division of Computer Assisted Medical Interventions, 01/29/2020

  2. Taxonomic position of RL (slide footer: Author Division | 01/28/2020 | 01/29/2020)

  3. Basics of RL: Markov Decision Process
     ● S: states
     ● A: possible actions
     ● P: transition probability
     ● R: immediate reward
     ● Policy
     ● Cumulative reward
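Slide 3 lists the components of a Markov decision process, but the formulas for the policy and the cumulative reward did not survive extraction. The Python sketch below makes the pieces concrete under stated assumptions: a hypothetical two-state, two-action MDP, a discount factor γ = 0.9 that the slide does not mention, and value iteration as one standard way to turn P and R into an optimal policy (not necessarily the method presented in the talk).

```python
import numpy as np

# Hypothetical toy MDP (not from the slides): 2 states, 2 actions.
# P[s, a, s'] = probability of moving from s to s' when taking action a in s.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # transitions from state 0
    [[0.0, 1.0], [0.5, 0.5]],  # transitions from state 1
])
# R[s, a] = immediate reward for taking action a in state s.
R = np.array([
    [0.0, 1.0],
    [2.0, 0.0],
])
GAMMA = 0.9  # assumed discount factor; the slide only says "cumulative reward"


def discounted_return(rewards, gamma=GAMMA):
    """Cumulative reward G = sum_k gamma^k * r_k of one trajectory."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def value_iteration(P, R, gamma=GAMMA, tol=1e-8):
    """Return optimal state values V and a greedy deterministic policy."""
    V = np.zeros(P.shape[0])
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
        Q = R + gamma * P @ V
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```

For this toy MDP, `value_iteration(P, R)` should select action 1 in state 0 and action 0 in state 1, where repeatedly collecting the reward of 2 is worth 2 / (1 - 0.9) = 20.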

  4. Deep RL in recent years wrt  (timeline figure, 2015 to 2019: AlphaGo, AlphaGo Zero, AlphaZero, MuZero)

  5. “Deep” Learning and Reinforcement Learning
     Mnih, V., Kavukcuoglu, K., Silver, D. et al. ‘Human-level control through deep reinforcement learning’. Nature 518, 529–533 (2015). https://doi.org/10.1038/nature14236
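The DQN paper cited on slide 5 combines Q-learning with a deep network, an experience-replay memory and a separate target network. A minimal NumPy sketch of the one-step targets and the TD loss for a batch sampled from replay memory might look like this; the function names and the γ = 0.99 default are assumptions of this sketch, and the paper additionally clips the error term, which is omitted here.

```python
import numpy as np


def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """One-step Q-learning targets in the style of DQN.

    rewards:       shape (batch,)            immediate rewards r
    next_q_target: shape (batch, n_actions)  Q-values of s' from the target network
    dones:         shape (batch,)            1.0 if s' is terminal, else 0.0
    """
    # y = r                                   if s' is terminal
    # y = r + gamma * max_a' Q_target(s', a')  otherwise
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)


def td_loss(q_values, actions, targets):
    """Mean squared TD error between Q(s, a) of the taken actions and the targets."""
    q_taken = q_values[np.arange(len(actions)), actions]
    return np.mean((targets - q_taken) ** 2)
```

Keeping the target network separate (and only periodically copied from the online network) is what keeps `next_q_target` from shifting with every gradient step.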

  6. “Go” as the next holy grail
     ● Using expert moves for supervised learning
     ● Playing against earlier versions to generate data
     ● Defeated Lee Sedol (world champion) 4:1 in a regular match (using 48 TPUs)
     Silver, D., Huang, A., Maddison, C. et al. ‘Mastering the game of Go with deep neural networks and tree search’. Nature 529, 484–489 (2016). https://doi.org/10.1038/nature16961
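Slide 6 describes two training stages: supervised learning on expert moves, then self-play against earlier versions of the network. The heavily simplified sketch below assumes a tabular softmax policy over the moves of a single position instead of a deep network. It shows a REINFORCE-style policy-gradient step driven by the game outcome z, plus opponent sampling from a pool of earlier snapshots. The helper names are hypothetical. The supervised stage uses the same gradient with the expert move as `action` and z = +1.

```python
import random

import numpy as np


def softmax(logits):
    z = logits - logits.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def sample_opponent(snapshot_pool, rng=random):
    """Pick a randomly chosen earlier version of the policy as the opponent."""
    return rng.choice(snapshot_pool)


def reinforce_update(logits, action, outcome, lr=0.1):
    """REINFORCE step for a tabular softmax policy at a single position.

    outcome z is +1 for a win and -1 for a loss, from the mover's point of view.
    The gradient of log softmax(logits)[action] w.r.t. the logits is onehot(action) - p.
    """
    p = softmax(logits)
    grad_log_p = -p
    grad_log_p[action] += 1.0
    return logits + lr * outcome * grad_log_p
```

A win makes the played move more likely and a loss makes it less likely, which is the whole effect of the outcome-weighted gradient.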

  7. “Go” as the next holy grail: Monte Carlo Tree Search (figure)
     Silver, D., Huang, A., Maddison, C. et al. ‘Mastering the game of Go with deep neural networks and tree search’. Nature 529, 484–489 (2016). https://doi.org/10.1038/nature16961
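The MCTS figure on slide 7 is not reproduced here. As a sketch of the selection and backup steps, the code below uses the PUCT rule in the form given in the AlphaGo Zero paper, a = argmax_a Q(s,a) + c_puct · P(s,a) · sqrt(Σ_b N(s,b)) / (1 + N(s,a)); AlphaGo used a closely related exploration bonus proportional to P(s,a) / (1 + N(s,a)). Expansion and network evaluation are left out, and c_puct = 1.5 and the data structures are assumptions of this sketch.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Edge:
    prior: float            # P(s, a) from the policy network
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # W(s, a), sum of backed-up values

    @property
    def q(self) -> float:
        # Mean action value Q(s, a); unvisited edges count as 0 (an assumption).
        return self.value_sum / self.visits if self.visits else 0.0


@dataclass
class Node:
    edges: dict = field(default_factory=dict)  # action -> Edge


def select_action(node: Node, c_puct: float = 1.5):
    """PUCT: argmax_a Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a))."""
    sqrt_total = math.sqrt(sum(e.visits for e in node.edges.values()))

    def score(item):
        _, e = item
        return e.q + c_puct * e.prior * sqrt_total / (1 + e.visits)

    # On a fresh node all scores are 0 and max() returns the first action.
    return max(node.edges.items(), key=score)[0]


def backup(path, value):
    """Propagate a leaf evaluation up the path of edges (root first).

    value is from the point of view of the player who made the last move on
    the path; the sign flips each ply because the game is two-player zero-sum.
    """
    for edge in reversed(path):
        edge.visits += 1
        edge.value_sum += value
        value = -value
```

After a poor result on one edge, the visit-count term shifts the search towards less explored moves with reasonable priors.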

  8. Dropping initial human input
     Major design changes:
     ● using the MCTS action distribution as the training target
     ● combining the policy and value networks into a single network
     ● switching to a ResNet architecture
     ● no hand-crafted input features any more
     After 72 h of training, defeated AlphaGo (the version that beat Lee Sedol) 100:0 under the same match conditions (using 4 TPUs)
     Silver, D., Schrittwieser, J., Simonyan, K. et al. ‘Mastering the game of Go without human knowledge’. Nature 550, 354–359 (2017). https://doi.org/10.1038/nature24270
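Two of the changes on slide 8 can be written down directly. The policy head is trained towards the MCTS visit-count distribution π(a) ∝ N(s,a)^(1/τ), and because policy and value share one network, a single loss covers both heads: l = (z − v)² − πᵀ log p + c‖θ‖². A NumPy sketch for one position follows; the function names are illustrative, and the default c = 10⁻⁴ should be treated as an assumption.

```python
import numpy as np


def visit_count_policy(visits, temperature=1.0):
    """Search policy pi(a) proportional to N(s, a) ** (1 / temperature)."""
    visits = np.asarray(visits, dtype=float)
    if temperature == 0:
        # Limit tau -> 0: all probability mass on the most visited action.
        pi = np.zeros_like(visits)
        pi[np.argmax(visits)] = 1.0
        return pi
    scaled = visits ** (1.0 / temperature)
    return scaled / scaled.sum()


def alphago_zero_loss(z, v, pi, log_p, params, c=1e-4):
    """l = (z - v)^2 - pi^T log p + c * ||theta||^2 for a single position.

    z:      game outcome from the current player's view (+1 win, -1 loss)
    v:      value-head prediction
    pi:     MCTS visit-count distribution (target for the policy head)
    log_p:  log-probabilities from the policy head
    params: flat array of network weights theta
    """
    value_loss = (z - v) ** 2
    policy_loss = -np.dot(pi, log_p)
    l2 = c * np.sum(params ** 2)
    return value_loss + policy_loss + l2
```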

  9. Generalizing input/output representation
     Major design changes:
     ● including draws (the value head estimates the expected outcome instead of a win probability)
     ● no data augmentation by exploiting board symmetries any more
     ● one network updated continuously instead of choosing a winner after each iteration
     ● always the same hyperparameters, for every game
     Silver, D. et al. ‘A General Reinforcement Learning Algorithm That Masters Chess, Shogi, and Go through Self-Play’. Science 362, 1140–1144 (2018).
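The AlphaZero changes on slide 9 mostly concern the training loop rather than the network. The sketch below illustrates three of them with hypothetical helper names: value targets that include draws, the 8-fold board-symmetry augmentation that AlphaGo Zero used for Go and AlphaZero dropped, and the difference between AlphaGo Zero's gated evaluation step (a new network replaced the best one only after winning more than 55% of evaluation games) and AlphaZero's continuously updated network.

```python
import numpy as np

GATING_THRESHOLD = 0.55  # AlphaGo Zero's evaluator: candidate must win > 55%


def outcome_target(result: str) -> float:
    """Value target z as an expected outcome in [-1, 1]; draws map to 0."""
    return {"win": 1.0, "draw": 0.0, "loss": -1.0}[result]


def dihedral_augment(board):
    """The 8 rotations/reflections of a square board (used by AlphaGo Zero).

    The policy target would have to be transformed the same way (omitted here).
    AlphaZero skips this step because chess and shogi boards are not symmetric.
    """
    out = []
    for k in range(4):
        rotated = np.rot90(board, k)
        out.append(rotated)
        out.append(np.fliplr(rotated))
    return out


def next_selfplay_network(best, candidate, candidate_win_rate, gated):
    """Choose which network generates the next batch of self-play games.

    gated=True mimics AlphaGo Zero: keep the best player unless the candidate
    wins by a clear margin. gated=False mimics AlphaZero: always use the latest
    network.
    """
    if not gated:
        return candidate
    return candidate if candidate_win_rate > GATING_THRESHOLD else best
```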

  10. Leaving perfect information environments
      Figure: (A) planning, (B) acting, (C) training, built from the representation function h, the dynamics function g and the prediction function f
      Schrittwieser, J. et al. ‘Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model’. arXiv:1911.08265 (2019). http://arxiv.org/abs/1911.08265
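The three learned functions on slide 10 can be chained so that planning happens entirely inside the model: h encodes the observation into a hidden state, g predicts the reward and the next hidden state for an action, and f predicts a policy and a value from any hidden state. In the sketch below, random linear maps with tanh stand in for the trained networks, and all sizes are arbitrary assumptions; it only illustrates the data flow, not MuZero's architecture or training.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, HIDDEN_DIM, N_ACTIONS = 8, 4, 3  # toy sizes (assumptions)

# Random linear maps stand in for the learned networks h, g and f.
W_h = rng.normal(size=(HIDDEN_DIM, OBS_DIM))
W_g = rng.normal(size=(HIDDEN_DIM, HIDDEN_DIM + N_ACTIONS))
w_r = rng.normal(size=HIDDEN_DIM + N_ACTIONS)
W_p = rng.normal(size=(N_ACTIONS, HIDDEN_DIM))
w_v = rng.normal(size=HIDDEN_DIM)


def h(observation):
    """Representation function: observation -> initial hidden state s^0."""
    return np.tanh(W_h @ observation)


def g(state, action):
    """Dynamics function: (s^(k-1), a^k) -> (reward r^k, next state s^k)."""
    x = np.concatenate([state, np.eye(N_ACTIONS)[action]])
    return float(w_r @ x), np.tanh(W_g @ x)


def f(state):
    """Prediction function: s^k -> (policy p^k, value v^k)."""
    logits = W_p @ state
    p = np.exp(logits - logits.max())
    return p / p.sum(), float(np.tanh(w_v @ state))


def unroll(observation, actions):
    """Roll the learned model forward; no game rules or simulator are queried."""
    state = h(observation)
    outputs = [(None,) + f(state)]  # no reward before the first action
    for a in actions:
        reward, state = g(state, a)
        outputs.append((reward,) + f(state))
    return outputs  # list of (reward, policy, value) per unrolled step
```

During training, each unrolled step's reward, policy and value would be matched against observed rewards, search policies and returns; that part is not shown.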

  11. Leaving perfect information environments
      ● learns a model of the game dynamics on its own; the rules are never given to it
      ● compared against: Stockfish (chess), Elmo (shogi), AlphaZero (Go), R2D2 (Atari)
      Schrittwieser, J. et al. ‘Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model’. arXiv:1911.08265 (2019). http://arxiv.org/abs/1911.08265

  12. Some other advances
      ● Hide and Seek: multiple agents in an open environment
      ● AlphaStar: StarCraft II compared with chess and Go (approximate values)

                  Chess   Go    StarCraft II
        breadth   35      250   10^26
        depth     80      150   1000s
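The breadth b and depth d values on slide 12 are commonly combined into a rough game-tree size of b^d move sequences. Working in log10 avoids huge integers. The sketch assumes d = 1000 as a lower bound for StarCraft II's "1000s".

```python
import math

# Approximate breadth b (moves per position) and depth d (game length in moves).
GAMES = {
    "Chess": (35, 80),
    "Go": (250, 150),
    "StarCraft II (d = 1000 assumed)": (1e26, 1000),
}


def log10_tree_size(breadth: float, depth: float) -> float:
    """log10(b^d) = d * log10(b), so b^d never has to be materialized."""
    return depth * math.log10(breadth)


for name, (b, d) in GAMES.items():
    print(f"{name}: about 10^{log10_tree_size(b, d):.0f} move sequences")
```

This prints roughly 10^124 sequences for chess and 10^360 for Go, which is why exhaustive search is out of the question and why AlphaGo-style methods rely on learned policies and values to prune it.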

  13. Thank you for your attention! Any questions?
