Human-level control through deep reinforcement learning
Liia Butler
But first... A quote
"The question of whether machines can think... is about as relevant as the question of whether submarines can swim" Edsger W. Dijkstra
Overview
- 1. Introduction
- 2. Reinforcement Learning
- 3. Deep neural networks
- 4. Markov Decision Process
- 5. Algorithm Breakdown
- 6. Evaluation and conclusions
Introduction
Deep Q-network (DQN)
- The agent
- Combines reinforcement learning with deep neural networks
- Goal: General artificial intelligence
- How little do we have to know to be intelligent? Can we solve a wide range of challenging tasks?
- Pixels and game score as input
Reinforcement Learning
- Theory of how software agents may optimize their control of the environment
- Inspired by psychological and neuroscientific perspectives on animal behavior
- One of the three main types of machine learning (alongside supervised and unsupervised learning)
http://en.proft.me/media/science/ml_types.png
Space Invaders
Deep Neural Networks
- An architecture in deep learning, type of artificial neural network
- Artificial neural network: a network of highly connected processing nodes that work together on specific problems, much like a biological nervous system
- Multiple layers of nodes with increasing abstraction of the data
- Extract high-level representations from raw data
- DQN uses "deep convolutional network"
- 84 × 84 × 4 image produced by the preprocessing map Φ
- Three convolutional layers
- Two fully connected layers
- http://www.nature.com/nature/journal/v518/n7540/carousel/nature14236-f1.jpg
- http://www.nature.com/nature/journal/v518/n7540/images/nature14236-f4.jpg
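As a rough sketch of the architecture described above, here is a minimal PyTorch version. The layer sizes follow the paper's description; the code itself and the class name are illustrative, not the authors' implementation:

```python
import torch.nn as nn

class DQN(nn.Module):
    """Deep convolutional Q-network: an 84x84x4 input, three
    convolutional layers, two fully connected layers, and one
    output unit per valid action."""

    def __init__(self, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_actions),  # one Q value per action
        )

    def forward(self, x):
        return self.net(x)
```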
Markov Decision Process
- State
- Action
- Reward
http://cse-wiki.unl.edu/wiki/images/5/58/ReinforJpeg.jpg
What these mean for DQN
- State - What is going on?
- Because the goal is generality, the state is represented by raw screen pixels
- Action - What can we do?
- e.g., moving in a direction or pressing buttons
- Reward - What's our motivation?
- Points, lives, etc.
http://www.retrogamer.net/wp-content/uploads/2014/07/Top-10-Atari-Jaguar-Games-616x410.png
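For concreteness, one time step of the agent-environment interaction above can be recorded as a simple tuple; a minimal Python sketch (the field names are illustrative, not from the paper):

```python
from collections import namedtuple

# One step of interaction in the MDP: the preprocessed screen (state),
# the joystick/button input (action), the change in game score (reward),
# the resulting screen (next_state), and whether the game ended (done).
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)
```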
How is DQN going to do this?
- Preprocessing - reduce input dimensionality; take the max pixel value over consecutive frames to remove flickering (sketched after this list)
- ε-greedy policy - choosing the action
- Bellman equation - defines the optimal action-value function for control of the environment
- Using a function approximator to estimate the action-value function
- Loss function and Q-learning gradient
- Experience replay - building a data set from agent's experience
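A minimal NumPy sketch of the preprocessing step named above (illustrative only; a real implementation would use a proper image-resizing routine):

```python
import numpy as np

def preprocess(frame, prev_frame):
    """Reduce an RGB Atari frame to a small grayscale image."""
    # Per-pixel max over two consecutive frames removes the flicker
    # caused by sprites that are only drawn on alternate frames.
    merged = np.maximum(frame, prev_frame)
    # Collapse the 3 color channels to luminance (grayscale).
    gray = merged @ np.array([0.299, 0.587, 0.114])
    # Crude nearest-neighbor downsample to 84x84.
    h, w = gray.shape
    rows = np.arange(84) * h // 84
    cols = np.arange(84) * w // 84
    return gray[np.ix_(rows, cols)]
```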
Algorithm Breakdown
Key
- D = replay memory (the data set)
- N = capacity of the replay memory, in experience tuples
- Q = action-value ("quality") function
- Θ = the network weights
- M = number of episodes
- s = state (a sequence of observations)
- x = observation/image
- Φ = preprocessing applied to a sequence
- T = time step at which the game terminates
- ε = exploration probability in the ε-greedy policy
- a = action
- y = target
- r = reward
- γ = reward discount factor
- C = number of steps between updates of the target network
ε-greedy policy
How to choose the action 'a' at time 't':
- Exploration: with probability ε, pick a random action
- Exploitation: otherwise, pick the best action according to the Q value
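A few lines of Python make the policy concrete (a sketch; `q_values`, the vector of estimated Q values for the current state, is an assumption here):

```python
import random

def epsilon_greedy_action(q_values, num_actions, epsilon):
    """With probability epsilon explore (random action); otherwise
    exploit (the action with the highest estimated Q value)."""
    if random.random() < epsilon:
        return random.randrange(num_actions)  # exploration
    return max(range(num_actions), key=lambda a: q_values[a])  # exploitation
```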
Experience Replay
1. Take an action
2. Store the transition in memory D
3. Sample a random minibatch of transitions from D
4. Optimize with gradient descent on the target 'y' and the Q-network
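A minimal replay-memory sketch in Python (the class and method names are illustrative, not from the paper's code):

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity buffer D of transitions (Φ_t, a_t, r_t, Φ_{t+1})."""

    def __init__(self, capacity):
        # Once full, the oldest transitions are discarded first.
        self.buffer = deque(maxlen=capacity)

    def store(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform random minibatch; sampling at random breaks the
        # strong correlations between consecutive transitions.
        return random.sample(self.buffer, batch_size)
```

Sampling uniformly from past experience, rather than learning from consecutive transitions, makes the updates behave more like updates on independent, identically distributed data.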
Optimizing the Q-Network
- Bellman equation (optimal action-value function):
  $Q^*(s,a) = \mathbb{E}_{s'}\left[\, r + \gamma \max_{a'} Q^*(s',a') \mid s,a \,\right]$
- The loss function we have, with target $y = r + \gamma \max_{a'} Q(s',a';\theta_i^{-})$:
  $L_i(\theta_i) = \mathbb{E}_{(s,a,r,s') \sim U(D)}\left[\left(y - Q(s,a;\theta_i)\right)^2\right]$
- From this, differentiating with respect to the weights gives us the Q-learning gradient:
  $\nabla_{\theta_i} L_i(\theta_i) = \mathbb{E}_{(s,a,r,s') \sim U(D)}\left[\left(y - Q(s,a;\theta_i)\right)\nabla_{\theta_i} Q(s,a;\theta_i)\right]$
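A small NumPy sketch of the target y and the TD error (y − Q(s,a;θ)) for a minibatch; the frozen target network producing `next_q_values` is an assumption of this sketch, and a real implementation would backpropagate the error through the network:

```python
import numpy as np

def td_targets(rewards, next_q_values, terminals, gamma=0.99):
    """y = r for terminal transitions, otherwise
    y = r + gamma * max_a' Q(s', a'; theta^-)."""
    # next_q_values: shape (batch, num_actions), from the frozen target net.
    max_next_q = next_q_values.max(axis=1)
    return rewards + gamma * (1.0 - terminals) * max_next_q

def td_errors(targets, q_values, actions):
    """y - Q(s, a; theta): multiplied by the gradient of Q, this is
    exactly the Q-learning gradient shown above."""
    chosen_q = q_values[np.arange(len(actions)), actions]
    return targets - chosen_q
```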
Breakout!
Evaluation and Conclusions
- Agents vs. Pro gamers
- Agents selected an action at 10 Hz (every 6th frame, one action per 0.1 seconds)
- At 60 Hz (every frame, one action per 0.017 seconds), performance was more than 5% better on only 6 games
- Humans played under controlled conditions
- Out of the 49 games
- 29 at human level or above
- 20 below
http://www.nature.com/nature/journal/v518/n7540/images_article/nature14236-f2.jpg
http://www.nature.com/nature/journal/v518/n7540/images_article/nature14236-f3.jpg
Questions and Discussion
- What do you think are some non-gaming applications of deep reinforcement learning?
- Do you think that comparing with the "professional human game tester" is a sufficient evaluation? Is there a better way?
- Should we even have a general AI, or are we better off with domain-specific AIs?
- Are there other consequences besides a computer beating your high score? (Have we doomed society?)