
Recent Advances in Reinforcement Learning (with a focus on )
Patrick Scholz
Division of Computer Assisted Medical Interventions
01/29/2020

Page 2: Taxonomic position of RL

Page 3: Basics of RL
Markov Decision Process
● S – States
● A – Possible Actions
● P – Transition Probability
● R – Immediate Reward
Policy
Cumulative reward
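To make the notation on this slide concrete, here is a minimal sketch of a Markov Decision Process and the cumulative reward of one sampled episode. The two-state toy problem, the names `TRANSITIONS`, `policy`, `step`, and `rollout`, and the discount factor `gamma` are illustrative assumptions; none of them come from the slides.

```python
import random

# Hypothetical two-state MDP, for illustration only.
STATES = ["start", "goal"]   # S
ACTIONS = ["stay", "move"]   # A

# P and R together: (state, action) -> list of (probability, next_state, reward)
TRANSITIONS = {
    ("start", "stay"): [(1.0, "start", 0.0)],
    ("start", "move"): [(0.8, "goal", 1.0), (0.2, "start", 0.0)],
    ("goal", "stay"): [(1.0, "goal", 0.0)],
    ("goal", "move"): [(1.0, "goal", 0.0)],
}


def policy(state):
    """A fixed deterministic policy: always try to move."""
    return "move"


def step(state, action, rng):
    """Sample (next_state, reward) according to P and R."""
    outcomes = TRANSITIONS[(state, action)]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def rollout(start_state, horizon, gamma, rng):
    """Return the discounted cumulative reward of one sampled episode."""
    state, total, discount = start_state, 0.0, 1.0
    for _ in range(horizon):
        action = policy(state)
        state, reward = step(state, action, rng)
        total += discount * reward
        discount *= gamma
    return total


if __name__ == "__main__":
    print(rollout("start", horizon=10, gamma=0.9, rng=random.Random(0)))
```

In this toy setup the only reward is earned on the transition into `goal`, so the cumulative reward of an episode always lies between 0 and 1.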

Page 4: Deep RL within the last years wrt
Timeline (2015 to 2019): AlphaGo, AlphaGo Zero, AlphaZero, MuZero

Page 5: “Deep” Learning and Reinforcement learning
Mnih, V., Kavukcuoglu, K., Silver, D. et al. ‘Human-level control through deep reinforcement learning’. Nature 518, 529–533 (2015). https://doi.org/10.1038/nature14236

Page 6: “Go” as the next holy grail
● Using expert moves for supervised learning
● Playing against earlier versions to generate data
Defeated Lee Sedol (world champion) in a regular match 4:1 (using 48 TPUs)
Silver, D., Huang, A., Maddison, C. et al. ‘Mastering the game of Go with deep neural networks and tree search’. Nature 529, 484–489 (2016). https://doi.org/10.1038/nature16961

Page 7: “Go” as the next holy grail
Monte Carlo Tree Search
Silver, D., Huang, A., Maddison, C. et al. ‘Mastering the game of Go with deep neural networks and tree search’. Nature 529, 484–489 (2016). https://doi.org/10.1038/nature16961
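As a rough illustration of the selection step in Monte Carlo Tree Search, the sketch below scores each candidate action by its mean backed-up value plus an exploration bonus that grows with a prior probability and shrinks with the visit count (a PUCT-style rule). The class `Edge`, the functions `select_action` and `backup`, and the default constant `c_puct=1.0` are assumptions made for this example, not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    prior: float             # P(s, a): prior probability of the action
    visits: int = 0          # N(s, a): how often the action was tried
    value_sum: float = 0.0   # sum of values backed up through this edge

    @property
    def mean_value(self):
        """Q(s, a): mean backed-up value, 0 for unvisited edges."""
        return self.value_sum / self.visits if self.visits else 0.0


def select_action(edges, c_puct=1.0):
    """Pick the action maximising Q + U for a dict of action -> Edge."""
    total_visits = sum(edge.visits for edge in edges.values())

    def score(edge):
        # U is large for high-prior, rarely visited edges.
        bonus = c_puct * edge.prior * math.sqrt(total_visits) / (1 + edge.visits)
        return edge.mean_value + bonus

    return max(edges, key=lambda action: score(edges[action]))


def backup(edge, value):
    """Update the statistics of one edge after a simulation returned `value`."""
    edge.visits += 1
    edge.value_sum += value
```

A full search would repeat selection from the root down to a leaf, expand and evaluate the leaf, and call `backup` on every edge along the path; only the scoring rule is shown here.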

Page 8: Dropping initial human input
Major design changes:
● using MCTS action distribution to train
● combining policy and value network
● switching to ResNet architecture
● no hand-crafted input features any more
Defeated AlphaGo after 72h under same conditions 100:0 (using 4 TPUs)
Silver, D., Schrittwieser, J., Simonyan, K. et al. ‘Mastering the game of Go without human knowledge’. Nature 550, 354–359 (2017). https://doi.org/10.1038/nature24270
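The first two design changes can be sketched as follows: MCTS visit counts are normalised into a target distribution, and a single loss combines the value error with the cross-entropy against that target. The function names, the dictionary representation, and the `temperature` parameter are assumptions chosen for this example; the regularisation term used in practice is left out.

```python
import math


def visit_counts_to_policy(visit_counts, temperature=1.0):
    """Turn visit counts N(a) into a distribution proportional to N(a)^(1/temperature)."""
    powered = {a: n ** (1.0 / temperature) for a, n in visit_counts.items()}
    total = sum(powered.values())
    return {a: p / total for a, p in powered.items()}


def combined_loss(target_policy, predicted_policy, outcome, predicted_value):
    """Squared value error plus cross-entropy between search policy and network policy."""
    value_loss = (outcome - predicted_value) ** 2
    policy_loss = -sum(
        target_policy[a] * math.log(predicted_policy[a])
        for a in target_policy
        if target_policy[a] > 0.0  # skip zero-probability targets to avoid log(0)
    )
    return value_loss + policy_loss
```

For example, `visit_counts_to_policy({"a": 30, "b": 10})` gives the move visited three times as often three times the probability, and a lower `temperature` sharpens the distribution towards the most visited move.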

Page 9: Generalizing input/output representation
Major design changes:
● including draws
● no augmentation exploitation any more
● continuously updating instead of choosing a winner after each iteration
● always the same hyperparameters
Silver, David, et al. ‘A General Reinforcement Learning Algorithm That Masters Chess, Shogi, and Go through Self-Play’. Science, vol. 362, no. 6419, Dec. 2018, pp. 1140–44.

Page 10: Leaving perfect information environments
● representation function h
● prediction function f
● dynamics function g
A: planning
B: acting
C: training
Schrittwieser, Julian, et al. ‘Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model’. ArXiv:1911.08265 [Cs, Stat], Nov. 2019. arXiv.org, http://arxiv.org/abs/1911.08265.
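The interplay of the three functions can be sketched with a toy model: h encodes an observation into a hidden state, f predicts a policy and a value from a hidden state, and g advances the hidden state for a chosen action while predicting a reward. Everything concrete below (random weights in place of trained networks, the dimensions, and the helper names) is an assumption for illustration only.

```python
import math
import random

OBS_DIM, HIDDEN_DIM, NUM_ACTIONS = 8, 4, 3
_rng = random.Random(0)


def _random_matrix(rows, cols):
    """Random weights standing in for a trained network (illustration only)."""
    return [[_rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def _matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


W_H = _random_matrix(HIDDEN_DIM, OBS_DIM)
W_G = _random_matrix(HIDDEN_DIM, HIDDEN_DIM + NUM_ACTIONS)
W_R = _random_matrix(1, HIDDEN_DIM + NUM_ACTIONS)
W_P = _random_matrix(NUM_ACTIONS, HIDDEN_DIM)
W_V = _random_matrix(1, HIDDEN_DIM)


def representation(observation):
    """h: observation -> initial hidden state."""
    return [math.tanh(z) for z in _matvec(W_H, observation)]


def dynamics(state, action):
    """g: (hidden state, action) -> (predicted reward, next hidden state)."""
    one_hot = [1.0 if i == action else 0.0 for i in range(NUM_ACTIONS)]
    x = state + one_hot
    reward = _matvec(W_R, x)[0]
    next_state = [math.tanh(z) for z in _matvec(W_G, x)]
    return reward, next_state


def prediction(state):
    """f: hidden state -> (policy over actions, value estimate)."""
    logits = _matvec(W_P, state)
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps], _matvec(W_V, state)[0]


def unroll(observation, actions):
    """Apply h once, then alternate f and g along a hypothetical action sequence."""
    state = representation(observation)
    steps = []
    for action in actions:
        policy, value = prediction(state)
        reward, state = dynamics(state, action)
        steps.append((policy, value, reward))
    return steps
```

Calling `unroll([1.0] * OBS_DIM, [0, 2, 1])` imagines three moves without ever querying a real environment, which is the property a planner built on such a model relies on.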

Page 11: Leaving perfect information environments
Learns all game rules on its own
Compared against: Stockfish (chess), Elmo (Shogi), AlphaZero (Go), R2D2 (Atari)
Schrittwieser, Julian, et al. ‘Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model’. ArXiv:1911.08265 [Cs, Stat], Nov. 2019. arXiv.org, http://arxiv.org/abs/1911.08265.

Page 12: Some other advances
Hide and Seek: multiple agents in an open environment
AlphaStar

approx. values | Chess | Go  | StarCraft II
breadth        | 35    | 250 | 10^26
depth          | 80    | 150 | 1000s
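One way to read the breadth and depth figures on this slide is as a rough game tree size of breadth^depth. That reading is an assumption about what the numbers are meant to convey, and the helper name `log10_tree_size` is made up for this sketch. StarCraft II is left out because its depth is only given as "1000s".

```python
import math


def log10_tree_size(breadth, depth):
    """Return log10(breadth ** depth), the order of magnitude of the game tree."""
    return depth * math.log10(breadth)


for game, breadth, depth in [("Chess", 35, 80), ("Go", 250, 150)]:
    exponent = log10_tree_size(breadth, depth)
    print(f"{game}: roughly 10^{exponent:.1f} move sequences")
```

Working in log10 avoids building the huge integers directly and makes the gap between the two games easy to compare.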

Page 13: Thank you for your attention!
Any questions?
