Deep Neural Networks and Deep Reinforcement Learning
Readings: Goodfellow, Bengio, and Courville, Deep Learning [chapters 6, 7, 8]; AIMA [sections 21.1-21.3]; Sutton and Barto, Reinforcement Learning: An Introduction
Outline
♦ Neural Networks: intro
♦ Deep Neural Networks
♦ Deep Reinforcement Learning
♦ Deep Q Network
♦ Slides based on the course offered by Prof. Pascal Poupart at the Univ. of Waterloo
Reinforcement Learning: key points
♦ MDPs and Value iteration (planning)
Vk+1(s) = maxa Σs′ T(s, a, s′)(R(s, a, s′) + γVk(s′))
♦ TD Learning and Q-Learning (reinforcement learning)
Q(s, a) ← (1 − α)Q(s, a) + α(R(s, a, s′) + γ maxa′ Q(s′, a′))
♦ Key issue: the numbers of states and actions may be too large to maintain and update a Q-table
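To make the tabular update concrete, here is a minimal Python sketch (not from the slides); the Q-table is a dictionary and the hyperparameter values are illustrative:

from collections import defaultdict

Q = defaultdict(float)      # Q-table: maps (state, action) pairs to values
alpha, gamma = 0.1, 0.99    # learning rate and discount factor (illustrative)

def q_update(s, a, r, s_next, actions):
    # One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target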
Reinforcement Learning: Example of large state spaces
♦ the game of Go: 3^361 board configurations (19×19 intersections, each one empty, black, or white)
Reinforcement Learning: Example of large state spaces
♦ Cart pole control problem: state (x, x′, θ, θ′) is continuous (cart position and velocity, pole angle and angular velocity)
Reinforcement Learning: Example of large state spaces
♦ Atari games: frames of 210×160 pixels with 3 RGB channels
Key Idea: function approximation
♦ Which functions are we interested in?
Policy π(s) → a
Value function V (s) ∈ ℜ
Q-function Q(s, a) ∈ ℜ
Q-Function approximation
♦ State is a set of features: s = (x1, x2, · · · , xn)ᵀ
CartPole: s = (x, x′, θ, θ′)ᵀ
Atari: values of the pixels
♦ Linear approximation: Q(s, a) ≈ Σi=1..n wai xi
♦ Non-linear (neural network): Q(s, a) ≈ g(x; w)
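As a concrete illustration of the linear case, a small Python sketch (names and sizes are illustrative, not from the slides): one weight vector per action, so Q(s, a) is a dot product between weights and state features.

import numpy as np

# Linear Q-function: one weight vector per action (sizes are illustrative).
n_features, n_actions = 4, 2            # e.g., CartPole: s = (x, x′, θ, θ′)
W = np.zeros((n_actions, n_features))   # W[a, i] = w_ai

def q_linear(s, a):
    return W[a] @ s                     # Q(s, a) ≈ Σi w_ai · x_i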
Feed-Forward ANN
♦ Network of units (computational neurons)
♦ DAG connecting functions with weighted edges
♦ Each unit computes h(wᵀx + b)
w: weights, x: inputs to the unit, b: bias
h: activation function, usually non-linear
One hidden layer ANN
♦ hidden units: zj = h1(w(1)j · x + b(1)j)
♦ output units: yk = h2(w(2)k · z + b(2)k)
♦ overall: yk = h2(Σj w(2)kj h1(Σi w(1)ji xi + b(1)j) + b(2)k)
w: weights, x: inputs to the node, b: bias, h: activation function, usually non-linear
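A minimal NumPy sketch of this forward pass, assuming h1 = tanh and h2 = identity (the layer sizes and the example input are arbitrary choices):

import numpy as np

# One-hidden-layer forward pass: z = h1(W1·x + b1), y = h2(W2·z + b2).
rng = np.random.default_rng(0)
W1, b1 = rng.uniform(-1, 1, (3, 2)), np.zeros(3)  # 2 inputs -> 3 hidden units
W2, b2 = rng.uniform(-1, 1, (1, 3)), np.zeros(1)  # 3 hidden units -> 1 output

def forward(x):
    z = np.tanh(W1 @ x + b1)   # hidden units, h1 = tanh
    return W2 @ z + b2         # output units, h2 = identity

print(forward(np.array([0.5, -1.0])))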
Activation function h
♦ threshold: h(x) = 1 if x ≥ 0, −1 otherwise
♦ sigmoid: h(x) = σ(x) = 1/(1 + e−x)
♦ tanh: h(x) = tanh(x) = (ex − e−x)/(ex + e−x)
♦ gaussian: h(x) = e^(−(1/2)((x−µ)/σ)²)
♦ identity: h(x) = x
♦ rectified linear (ReLU): h(x) = max{0, x}
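These transcribe directly into NumPy; a small sketch for reference:

import numpy as np

# The activation functions listed above, as vectorized NumPy functions.
def threshold(x): return np.where(x >= 0, 1.0, -1.0)
def sigmoid(x):   return 1.0 / (1.0 + np.exp(-x))
def gaussian(x, mu=0.0, sigma=1.0):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)
def identity(x):  return x
def relu(x):      return np.maximum(0.0, x)
# tanh is available directly as np.tanh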
Universal approximation property
Theorem (Hornik et al., 1989; Cybenko, 1989): a feedforward network with a linear output layer and at least one hidden layer with any "squashing" activation function (sigmoid/tanh/gaussian) can approximate any function arbitrarily closely, provided that the network is given enough hidden units.
♦ any: continuous function on a closed and bounded subset of ℜn (relationship with Borel measurability)
Minimize least squared error
♦ Key idea to optimize the weights: minimize the error (loss) with respect to the output
E(W) = Σn En(W) = Σn (1/2)|f(xn; W) − yn|²
♦ Non-convex optimization problem: can train by gradient descent
Given sample (xn, yn), update the weights as follows: wji ← wji − η ∂En/∂wji
Backpropagation algorithm to compute the gradient in an ANN
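A minimal sketch of one such gradient step for a one-hidden-layer network, assuming h1 = tanh, h2 = identity, and the (1/2)|f − y|² loss above; a real implementation would use an autodiff library, but backpropagation by hand fits in a few lines here:

import numpy as np

# One SGD step on E_n = 0.5*|f(x_n; W) − y_n|^2 for a one-hidden-layer net.
def sgd_step(W1, b1, W2, b2, x, y, eta=0.01):
    z = np.tanh(W1 @ x + b1)                  # forward: hidden activations
    f = W2 @ z + b2                           # forward: network output
    delta2 = f - y                            # dE/df at the output layer
    delta1 = (W2.T @ delta2) * (1 - z ** 2)   # backprop through tanh (tanh' = 1 − tanh²)
    W2 -= eta * np.outer(delta2, z); b2 -= eta * delta2
    W1 -= eta * np.outer(delta1, x); b1 -= eta * delta1
    return W1, b1, W2, b2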
Deep Neural Networks
♦ Deep NN: ANN with many hidden layers
♦ Benefit: high expressivity (i.e., compact representation)
♦ Issues: can we train deep NNs in the same way? can we avoid overfitting?
Example: Image Classification
♦ ImageNet Large Scale Visual Recognition Challenge
Vanishing Gradient
♦ Deep neural networks that use "squashing" activation functions (e.g., sigmoid, tanh) suffer from vanishing gradients
Sigmoid and Hyperbolic functions
♦ The derivatives of the sigmoid and tanh are never greater than one!
♦ when backpropagating gradients we multiply several numbers that are less than one, so the gradient shrinks as it moves toward the early layers
Example: vanishing gradient
♦ y = t(w3t(w2t(w1x))), where t(·) is the tanh function ♦ common weight initialization in (-1,1) ♦ tanh function and its derivative are less than 1 ♦ vanishing gradient
∂y/∂w3 = t′(a3) t(a2)
∂y/∂w2 = t′(a3) w3 t′(a2) t(a1) ≤ ∂y/∂w3
∂y/∂w1 = t′(a3) w3 t′(a2) w2 t′(a1) x ≤ ∂y/∂w2
where a1 = w1x, a2 = w2 t(a1), a3 = w3 t(a2)
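A quick numeric check of this chain (the input and weight values below are arbitrary examples within (−1, 1)):

import numpy as np

# y = t(w3·t(w2·t(w1·x))) with t = tanh; gradients shrink toward early layers.
x, w1, w2, w3 = 1.0, 0.5, 0.5, 0.5
a1 = w1 * x
a2 = w2 * np.tanh(a1)
a3 = w3 * np.tanh(a2)
dt = lambda a: 1 - np.tanh(a) ** 2   # tanh'(a)

dy_dw3 = dt(a3) * np.tanh(a2)
dy_dw2 = dt(a3) * w3 * dt(a2) * np.tanh(a1)
dy_dw1 = dt(a3) * w3 * dt(a2) * w2 * dt(a1) * x
print(dy_dw3, dy_dw2, dy_dw1)        # each successive gradient is smaller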
Mitigations for vanishing gradient
♦ typical solutions to mitigate vanishing gradients:
Pre-training
Rectified linear units
Batch normalization
Skip connections
Rectified Linear Units (ReLU)
♦ Rectified linear: h(x) = max(0, x)
Gradient is 0 or 1
Piecewise linear
Sparse computation
♦ Smooth variant (softplus): h(x) = log(1 + ex)
♦ Softplus does not mitigate vanishing gradients (its derivative is the sigmoid, which is less than one)
Deep Reinforcement Learning: key points
♦ For many real-world domains we cannot explicitly represent the key functions for RL (π(s), V (s), Q(s, a))
♦ We can try to approximate them:
Linear approximation
Neural network approximation
Deep RL
♦ Deep Q Network approximates Q(s, a) with a DNN
Gradient Q-Learning
♦ approximate Q(s, a) with a parametrized function Qw(s, a)
♦ Minimize the squared error between estimate and target
Estimate: Qw(s, a)
Target: r(s, a, s′) + γ maxa′ Qw(s′, a′)
♦ squared error: Err(w) = (Qw(s, a) − r(s, a, s′) − γ maxa′ Qw(s′, a′))²
♦ gradient: ∂Err(w)/∂w = 2(Qw(s, a) − r(s, a, s′) − γ maxa′ Qw(s′, a′)) ∂Qw(s, a)/∂w
(the scalar 2 is a constant factor and not important for the update)
Gradient Q-Learning Algorithm
Algorithm 1 Gradient Q-Learning
1: Initialize weights w randomly in [−1, 1]
2: Initialize s {observe current state}
3: loop
4:   Select and execute action a
5:   Observe new state s′ and receive immediate reward r
6:   ∂Err(w)/∂w = (Qw(s, a) − r − γ maxa′ Qw(s′, a′)) ∂Qw(s, a)/∂w
7:   update weights w ← w − α ∂Err(w)/∂w
8:   update state s ← s′
9: end loop
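A Python sketch of this loop, using the linear Qw(s, a) = wa · s from earlier so that ∂Qw/∂w is just the feature vector; `env` is a hypothetical object with a gym-style reset/step interface, the ε-greedy rule is one possible way to implement line 4, and all hyperparameters are illustrative:

import numpy as np

# Gradient Q-learning (Algorithm 1) with a linear Q-function.
def gradient_q_learning(env, n_features, n_actions,
                        alpha=0.01, gamma=0.99, eps=0.1, n_steps=10000):
    rng = np.random.default_rng(0)
    W = rng.uniform(-1, 1, (n_actions, n_features))  # 1: init weights in [−1, 1]
    s = env.reset()                                  # 2: observe current state
    for _ in range(n_steps):                         # 3: loop
        # 4: select and execute action a (ε-greedy here)
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(W @ s))
        s_next, r, done = env.step(a)                # 5: observe s′ and reward r
        target = r if done else r + gamma * np.max(W @ s_next)
        td_err = (W[a] @ s) - target                 # 6: Qw(s,a) − r − γ maxa′ Qw(s′,a′)
        W[a] -= alpha * td_err * s                   # 7: w ← w − α ∂Err/∂w
        s = env.reset() if done else s_next          # 8: update state
    return W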
Convergence of tabular Q-Learning
♦ Q(s, a) ← Q(s, a) + α(r(s, a, s′) + γ maxa′ Q(s′, a′) − Q(s, a))
♦ Tabular Q-Learning converges to the optimal policy
if you explore enough
if you make the learning rate small enough ... but do not decrease it too quickly
e.g., α = 1/n(s, a), where n(s, a) is the number of visits to (s, a)
Convergence of linear gradient Q-Learning
♦ linear approximation of Q(s, a): Q(s, a) ≈ Σi wai xi = wᵀx
♦ learning rate αt = 1/t
♦ gradient Q-learning update:
w ← w − αt (Qw(s, a) − r − γ maxa′ Qw(s′, a′)) ∂Qw(s, a)/∂w
♦ under these conditions convergence can still be guaranteed (see next slide)
Non-Convergence of Non-linear gradient Q-Learning
♦ Non-linear approximation of Q(s, a): Q(s, a) ≈ g(x; w)
♦ Even if αt = 1/t, gradient Q-Learning may not converge
♦ Issue: we update the weights to reduce the error for a specific experience (i.e., a specific (s, a)), but by changing the weights we may end up changing Q(s, a) potentially everywhere
this is also true for linear approximation, but in that case convergence can still be guaranteed
Mitigating divergence
♦ Two main approaches to mitigate divergence:
1 experience replay
2 use two different networks:
Q-network
Target network
Experience replay
♦ Store previous experiences (i.e., (s, a, s′, r)) and reuse them at each step
Store each (s, a, s′, r) in a dedicated memory buffer
At each step, sample a mini-batch from this buffer and use it to update the weights (a minimal buffer sketch follows this list)
♦ Benefits
1 reduces the correlation between successive samples (increases stability)
2 reduces the number of interactions with the environment (increases data efficiency)
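A minimal replay-buffer sketch in Python (the capacity and batch size are illustrative choices):

import random
from collections import deque

# Fixed-capacity buffer of (s, a, s′, r) tuples; sampling uniformly at random
# breaks the correlation between successive experiences.
class ReplayBuffer:
    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)   # oldest experiences evicted first

    def add(self, s, a, s_next, r):
        self.buffer.append((s, a, s_next, r))

    def sample(self, batch_size=32):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))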
Target Network
♦ Maintain a separate target network and update it only periodically (not with every experience)
Q-network: Qw(s, a)
Target network: Qw̄(s, a)
♦ for every (s, a, s′, r) in the mini-batch, update the Q-network:
w ← w − αt (Qw(s, a) − r − γ maxa′ Qw̄(s′, a′)) ∂Qw(s, a)/∂w
♦ periodically update the target network: w̄ ← w
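The bookkeeping is just a periodic copy of the weights; a sketch, assuming the weights live in a dict of arrays (an illustrative structure):

import copy
import numpy as np

# Q-network weights and a frozen copy used as the target network.
w = {"W1": np.zeros((16, 4)), "W2": np.zeros((2, 16))}  # illustrative shapes
w_bar = copy.deepcopy(w)                                # initialize w̄ ← w

def sync_target(w, w_bar):
    # every c steps: refresh the target network, w̄ ← w
    for name in w:
        w_bar[name] = w[name].copy()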
Deep Q Network
♦ Human-level control through deep reinforcement learning (V. Mnih et al., Nature 2015)
♦ Gradient Q-Learning with:
deep neural networks to approximate Q(s, a)
experience replay and a target network
♦ above human-level performance in many Atari video games
Deep Q Network sketch of algorithm
Algorithm 2 DQN
1: Initialize weights w and w̄ randomly in [−1, 1]
2: Initialize s {observe current state}
3: loop
4:   Select and execute action a
5:   Observe new state s′ and receive immediate reward r
6:   Add (s, a, s′, r) to the experience buffer
7:   Sample a mini-batch MB of experiences from the buffer
8:   for (ŝ, â, ŝ′, r̂) ∈ MB do
9:     ∂Err(w)/∂w = (Qw(ŝ, â) − r̂ − γ maxâ′ Qw̄(ŝ′, â′)) ∂Qw(ŝ, â)/∂w
10:    update weights w ← w − α ∂Err(w)/∂w
11:   end for
12:   update state s ← s′
13:   every c steps, update target: w̄ ← w
14: end loop
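Putting the pieces together, here is a sketch of the inner mini-batch update (lines 8-11), again with the linear Qw stand-in from earlier; in a real DQN, Qw is a deep network and ∂Qw/∂w comes from backpropagation. W is the Q-network, W_bar the target network, and `batch` a list of (s, a, s′, r) tuples as sampled from the replay buffer above:

import numpy as np

# One DQN update over a sampled mini-batch with a linear Q-function.
def dqn_minibatch_update(W, W_bar, batch, alpha=0.001, gamma=0.99):
    for s, a, s_next, r in batch:
        target = r + gamma * np.max(W_bar @ s_next)  # line 9: target uses w̄, not w
        td_err = (W[a] @ s) - target
        W[a] -= alpha * td_err * s                   # line 10: gradient step on w
    return W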
Deep RL: current trends
♦ Formal verification of DRL models
ensure the learned model respects safety properties
♦ Transfer of Learning / Curriculum learning
ToL: train the model in one environment and deploy it in another
Curriculum learning: learn a difficult task by training on a series of simpler tasks
♦ DRL for robotics
adaptation to the environment is critical, but interacting with the environment is difficult, expensive, and potentially dangerous
♦ Multi-Agent DRL (MADRL)
a set of agents/robots that learn at the same time in the same environment
Verify DRL models
♦ Provably guarantee safety properties (not just empirical evaluation)
♦ Tools exist for DNNs, e.g., Neurify (Wang et al., NeurIPS 2018): ensure the learned model respects safety properties
♦ Formal verification for DRL is very challenging:
need to compare several outputs
need to work on large models
DRL for robotics
♦ Acting in the real environment is difficult/dangerous:
train in a synthetic environment
reduce the number of iterations needed for training (i.e., learn faster)
discrete representations, highly optimized approaches
combine DRL and evolutionary approaches