SLIDE 1

Deep Reinforcement Learning

[Human-Level Control through deep reinforcement learning, Nature 2015]

CS 486/686, University of Waterloo, Lecture 20: July 10, 2017

SLIDE 2

Outline

  • Value Function Approximation

    – Linear approximation
    – Neural network approximation

  • Deep Q-network

SLIDE 3

Quick recap

  • Markov Decision Processes: value iteration
  • Reinforcement Learning: Q-Learning
  • Complexity depends on the number of states and actions

SLIDE 4

Large State Spaces

  • Computer Go: roughly 10^170 states
  • Inverted pendulum:
    – 4-dimensional continuous state space

  • Atari: 210x160x3 dimensions (pixel values)

SLIDE 5

Functions to be Approximated

  • Policy: π : S → A
  • Q-function: Q : S × A → ℝ
  • Value function: V : S → ℝ  (all three signatures are sketched below)
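
As a rough illustration (not from the slides), the three objects can be written as plain function types; the names and the choice of state encoding here are illustrative:

```python
from typing import Callable, Tuple

State = Tuple[float, ...]   # e.g., a feature vector; an illustrative choice
Action = int                # an index into a discrete action set

Policy = Callable[[State], Action]            # pi : S -> A
QFunction = Callable[[State, Action], float]  # Q  : S x A -> R
ValueFunction = Callable[[State], float]      # V  : S -> R
```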

SLIDE 6

Q-function Approximation

  • Let the state be described by a feature vector x = (x_1, ..., x_n)
  • Linear: Q_w(s, a) = Σ_i w_{a,i} x_i
  • Non-linear (e.g., neural network): Q_w(s, a) = g(x; w)  (both variants are sketched below)
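
A minimal numpy sketch of the two approximators. The feature count, hidden width, and function names are illustrative choices, not values from the slides:

```python
import numpy as np

n_features, n_actions = 4, 2

# Linear approximation: one weight vector per action, Q(s,a) = w_a . x
W = np.random.uniform(-1, 1, size=(n_actions, n_features))

def q_linear(x, a):
    return W[a] @ x

# Non-linear approximation: a small one-hidden-layer network
# (W1, W2 stand in for a deep network's parameters)
W1 = np.random.uniform(-1, 1, size=(32, n_features))
W2 = np.random.uniform(-1, 1, size=(n_actions, 32))

def q_network(x):
    h = np.maximum(0.0, W1 @ x)   # ReLU hidden layer
    return W2 @ h                 # one Q-value per action

x = np.random.rand(n_features)    # example feature vector for a state
print(q_linear(x, 0), q_network(x)[0])
```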

SLIDE 7

Gradient Q-learning

  • Minimize squared error between Q-value estimate and target
    – Q-value estimate: Q_w(s, a)
    – Target: r + γ max_{a'} Q_w(s', a')
  • Squared error: Err(w) = ½ [Q_w(s, a) - r - γ max_{a'} Q_w(s', a')]²
  • Gradient, treating the target as fixed:
    ∂Err/∂w = [Q_w(s, a) - r - γ max_{a'} Q_w(s', a')] ∂Q_w(s, a)/∂w
    (a worked numeric step follows below)
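
For the linear case, ∂Q_w(s, a)/∂w_a is just the feature vector x, so one gradient step can be computed directly. A minimal sketch, with all constants and the random transition illustrative:

```python
import numpy as np

gamma, alpha = 0.99, 0.01
n_features, n_actions = 4, 2
W = np.random.uniform(-1, 1, size=(n_actions, n_features))

def q(x, a):
    return W[a] @ x

# One (s, a, r, s') transition, with states already encoded as features
x, a, r, x_next = np.random.rand(n_features), 0, 1.0, np.random.rand(n_features)

target = r + gamma * max(q(x_next, b) for b in range(n_actions))  # held fixed
td_error = q(x, a) - target
grad = td_error * x            # for linear Q, dQ_w(s,a)/dw_a = x
W[a] -= alpha * grad           # one gradient step on the squared error
```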

SLIDE 8

Gradient Q-learning

Initialize weights w at random in [-1, 1]
Observe current state s
Loop
    Select action a and execute it
    Receive immediate reward r
    Observe new state s'
    Gradient: ∂Err/∂w = [Q_w(s, a) - r - γ max_{a'} Q_w(s', a')] ∂Q_w(s, a)/∂w
    Update weights: w ← w - α ∂Err/∂w
    Update state: s ← s'

(a runnable sketch of this loop follows below)
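
A runnable version of the loop above, under stated assumptions: the environment follows the Gymnasium API, actions are chosen epsilon-greedily (the slide leaves the exploration scheme unspecified), and `encode` plus all constants are illustrative:

```python
import numpy as np

def gradient_q_learning(env, n_features, n_actions, encode,
                        alpha=0.01, gamma=0.99, epsilon=0.1, n_steps=10_000):
    """Online gradient Q-learning with a linear Q-function.

    `env` is assumed to follow the Gymnasium API (reset/step) and
    `encode` maps raw observations to feature vectors of length n_features.
    """
    W = np.random.uniform(-1, 1, size=(n_actions, n_features))
    obs, _ = env.reset()
    x = encode(obs)
    for _ in range(n_steps):
        # Epsilon-greedy action selection (one common choice)
        if np.random.rand() < epsilon:
            a = np.random.randint(n_actions)
        else:
            a = int(np.argmax(W @ x))
        obs, r, terminated, truncated, _ = env.step(a)
        x_next = encode(obs)
        # The target is treated as a constant when differentiating
        target = r if terminated else r + gamma * np.max(W @ x_next)
        W[a] -= alpha * (W[a] @ x - target) * x
        if terminated or truncated:
            obs, _ = env.reset()
            x = encode(obs)
        else:
            x = x_next
    return W
```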

SLIDE 9

Recap: Convergence of Tabular Q-learning

  • Tabular Q-learning converges to the optimal Q-function under the following conditions:
    Σ_t α_t = ∞  and  Σ_t α_t² < ∞
  • Let α_{n(s,a)} = 1/n(s,a)
    – where n(s,a) is the # of times that (s, a) is visited
  • Q-learning update:
    Q(s, a) ← Q(s, a) + α_{n(s,a)} [r + γ max_{a'} Q(s', a') - Q(s, a)]
    (a numeric check of the step-size conditions follows below)
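
As a quick numeric sanity check of the two conditions for the schedule α_n = 1/n (a sketch, not from the slides): the partial sums of α_n grow without bound (harmonic series), while the partial sums of α_n² stay bounded, approaching π²/6:

```python
import numpy as np

n = np.arange(1, 10_000_001, dtype=np.float64)
alpha = 1.0 / n
print(alpha.sum())          # ~16.7 here, and still growing with more terms
print((alpha ** 2).sum())   # ~1.6449, close to pi^2 / 6
```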

SLIDE 10

Convergence of Linear Gradient Q-Learning

  • Linear gradient Q-learning converges under the same conditions:
    Σ_t α_t = ∞  and  Σ_t α_t² < ∞
  • Let α_t = 1/t
  • Let Q_w(s, a) = Σ_i w_{a,i} x_i
  • Q-learning update:
    w ← w + α_t [r + γ max_{a'} Q_w(s', a') - Q_w(s, a)] ∂Q_w(s, a)/∂w

SLIDE 11

Divergence of non-linear Q-learning

  • Even when the following conditions hold,
    Σ_t α_t = ∞  and  Σ_t α_t² < ∞
    non-linear Q-learning may diverge
  • Intuition:
    – Adjusting w to increase Q_w at (s, a) might introduce errors at nearby state-action pairs

SLIDE 12

Mitigating divergence

  • Two tricks are often used in practice:
    1. Experience replay
    2. Use two networks:
       – Q-network
       – Target network

SLIDE 13

Experience Replay

  • Idea: store previous experiences (s, a, r, s') in a buffer and sample a mini-batch of previous experiences at each step to learn by Q-learning (a buffer sketch follows this list)
  • Advantages
    – Breaks correlations between successive updates (more stable learning)
    – Fewer interactions with the environment needed to converge (greater data efficiency)
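
A minimal replay-buffer sketch; the capacity and batch size are illustrative choices, not values from the slides:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (s, a, r, s', done) transitions."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences fall off

    def add(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        # Uniform sampling breaks the temporal correlation between
        # successive transitions
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```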

SLIDE 14

Target Network

  • Idea: use a separate target network Q_w̄ that is updated only periodically:
        repeat for each (s, a, r, s') in mini-batch:
            w ← w - α [Q_w(s, a) - r - γ max_{a'} Q_w̄(s', a')] ∂Q_w(s, a)/∂w
        w̄ ← w    (target update)
  • Advantage: mitigates divergence (a code sketch follows below)
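
A small sketch of the mini-batch update against a frozen target copy, reusing the linear Q-function stand-in from earlier (all constants illustrative):

```python
import numpy as np

gamma, alpha = 0.99, 0.01
n_features, n_actions = 4, 2

W = np.random.uniform(-1, 1, size=(n_actions, n_features))  # Q-network
W_bar = W.copy()                                            # target network

def update_on_minibatch(batch):
    """One pass over a mini-batch of (x, a, r, x_next, done) transitions,
    with targets computed from the frozen copy W_bar."""
    for x, a, r, x_next, done in batch:
        target = r if done else r + gamma * np.max(W_bar @ x_next)
        W[a] -= alpha * (W[a] @ x - target) * x

# Periodically (e.g., every c gradient steps) refresh the target network:
# W_bar = W.copy()
```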

SLIDE 15

Target Network

  • Similar to value iteration, which backs up against a frozen estimate Q̂:
        repeat for all (s, a):
            Q(s, a) ← R(s, a) + γ Σ_{s'} P(s'|s, a) max_{a'} Q̂(s', a')
        Q̂ ← Q    (target update)
  • Target-network Q-learning mirrors this structure:
        repeat for each (s, a, r, s') in mini-batch:
            w ← w - α [Q_w(s, a) - r - γ max_{a'} Q_w̄(s', a')] ∂Q_w(s, a)/∂w
        w̄ ← w    (target update)

SLIDE 16

Deep Q-network

  • Google DeepMind
  • Deep Q-network: gradient Q-learning with
    – Deep neural networks
    – Experience replay
    – Target network
  • Breakthrough: human-level play in many Atari video games

SLIDE 17

Deep Q-network

Initialize weights w and w̄ at random in [-1, 1]
Observe current state s
Loop
    Select action a and execute it
    Receive immediate reward r
    Observe new state s'
    Add (s, a, r, s') to experience buffer
    Sample mini-batch of experiences from buffer
    For each experience (ŝ, â, r̂, ŝ') in mini-batch:
        Gradient: ∂Err/∂w = [Q_w(ŝ, â) - r̂ - γ max_{a'} Q_w̄(ŝ', a')] ∂Q_w(ŝ, â)/∂w
        Update weights: w ← w - α ∂Err/∂w
    Update state: s ← s'
    Every c steps, update target: w̄ ← w

(a runnable sketch of this loop follows below)
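
Putting the pieces together, a compact sketch of the full loop with a linear Q-function standing in for the deep network (the Nature DQN uses a convolutional network here); `env` is assumed to follow the Gymnasium API, and `encode` plus all hyperparameters are illustrative:

```python
import random
import numpy as np
from collections import deque

def dqn(env, n_features, n_actions, encode, alpha=0.01, gamma=0.99,
        epsilon=0.1, batch_size=32, target_every=1000, n_steps=100_000):
    """Gradient Q-learning + experience replay + target network."""
    W = np.random.uniform(-1, 1, size=(n_actions, n_features))
    W_bar = W.copy()                       # target network
    buffer = deque(maxlen=100_000)         # experience buffer
    obs, _ = env.reset()
    x = encode(obs)
    for step in range(1, n_steps + 1):
        # Epsilon-greedy exploration (an assumption; the slide does not
        # fix the exploration scheme)
        a = (np.random.randint(n_actions) if np.random.rand() < epsilon
             else int(np.argmax(W @ x)))
        obs, r, terminated, truncated, _ = env.step(a)
        x_next = encode(obs)
        buffer.append((x, a, r, x_next, terminated))
        if len(buffer) >= batch_size:
            # Mini-batch update with targets from the frozen copy W_bar
            for xh, ah, rh, xh_next, done in random.sample(buffer, batch_size):
                target = rh if done else rh + gamma * np.max(W_bar @ xh_next)
                W[ah] -= alpha * (W[ah] @ xh - target) * xh
        if terminated or truncated:
            obs, _ = env.reset()
            x = encode(obs)
        else:
            x = x_next
        if step % target_every == 0:
            W_bar = W.copy()               # periodic target update
    return W
```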

SLIDE 18

Deep Q-Network for Atari

SLIDE 19

DQN versus Linear approx.