Randomized Prior Functions for Deep Reinforcement Learning
Ian Osband, John Aslanides, Albin Cassirer
bit.ly/rpf_nips @ianosband
Reinforcement Learning

Data & Estimation = Supervised Learning
+ partial feedback = Multi-armed Bandit
+ delayed consequences = Reinforcement Learning

… but we need practical solutions that combine them all.
We need effective uncertainty estimates for Deep RL.
Dropout sampling
“Dropout sample ≈ posterior sample” (Gal & Ghahramani, 2015).
But the dropout rate does not concentrate with the data, and even “concrete” dropout does not necessarily recover the right rate.
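To see why a fixed rate cannot concentrate, consider a toy Monte-Carlo-dropout sketch (the single linear layer and every name here are illustrative, not from any referenced implementation): the spread of dropout samples depends only on the weights and the rate p, never on how much data produced those weights.

```python
import numpy as np

# MC dropout on a fixed "trained" linear layer: sample random masks at
# test time and look at the spread of the outputs. That spread is set by
# the weights and the dropout rate p alone, so it stays the same whether
# the weights were fit on 10 data points or 10 million.
rng = np.random.default_rng(0)
W = rng.standard_normal(100)   # stand-in for trained weights
x = rng.standard_normal(100)   # a fixed test input
p = 0.5                        # dropout rate

samples = [(W * (rng.random(100) > p)) @ x / (1 - p) for _ in range(1000)]
print(np.std(samples))         # does not shrink as the dataset grows
```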
Variational inference
Apply VI to the Bellman error as if it were an i.i.d. supervised loss:
    Q(s, a) = r + γ max_α Q(s′, α)
But uncertainty in Q induces a correlated TD loss, and VI on an i.i.d. model does not propagate that uncertainty.
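To make the correlation concrete: supervised labels are fixed, but the TD target below is built from the very Q estimate being trained, so errors in Q feed straight back into its own “labels”. A minimal sketch with hypothetical names:

```python
import numpy as np

# TD regression targets depend on the current Q estimate at the next
# state. They are therefore not i.i.d. labels around a fixed truth:
# uncertainty in Q is correlated across the targets it generates.
def td_targets(q, transitions, gamma=0.99):
    """q: state -> array of action values; transitions: (s, a, r, s_next)."""
    return np.array([r + gamma * np.max(q(s_next))
                     for (_, _, r, s_next) in transitions])
```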
Distributional RL
Models the Q-value as a distribution rather than a point estimate.
But this distribution != posterior uncertainty: it is aleatoric rather than epistemic, so it’s not the right thing for exploration.
Count-based density
Estimate “visit counts” for each state and add an exploration bonus.
But the “density model” has nothing to do with the actual task, and with generalization a state “visit count” != uncertainty.
Bootstrap ensemble
Train an ensemble on noisy data: a classic statistical procedure!
But there is no explicit “prior” mechanism for “intrinsic motivation”: if you’ve never seen a reward, why would the agent explore?
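For reference, a minimal sketch of the classic bootstrap, with polynomial regression standing in for a network (the whole setup is illustrative): each member fits its own resample of the data, so members disagree where data is scarce, but nothing in the procedure expresses prior beliefs about unseen rewards.

```python
import numpy as np

# Classic statistical bootstrap: every ensemble member fits its own
# resample (with replacement) of the data. Disagreement across members
# reflects the data alone; no prior enters anywhere, so with zero
# observed reward all members happily agree on zero.
rng = np.random.default_rng(0)

def bootstrap_ensemble(x, y, n_members=10, degree=3):
    fits = []
    for _ in range(n_members):
        idx = rng.integers(0, len(x), size=len(x))  # resample with replacement
        fits.append(np.polyfit(x[idx], y[idx], degree))
    return fits
```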
Randomized prior functions
Key idea: add a random, fixed “prior” function to each member of the ensemble.
Exact Bayes posterior for linear functions!
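Concretely, the paper trains each member k as a network f_θk added to a fixed, randomly initialized prior network p_k, acting through Q_k(s, a) = f_θk(s, a) + β·p_k(s, a); only f_θk receives gradients. A minimal numpy sketch, assuming small MLPs (the helper names and the β value are illustrative, not the paper’s demo code):

```python
import numpy as np

# One ensemble member with a randomized prior function:
#     Q_k(x) = f_k(x) + beta * p_k(x)
# f_k is trainable; p_k is a fixed random network that is never updated.

def init_mlp(rng, sizes):
    """Random MLP weights for layer sizes, e.g. [inputs, hidden, actions]."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

class PriorMember:
    def __init__(self, rng, sizes, beta=3.0):
        self.f = init_mlp(rng, sizes)  # trainable network
        self.p = init_mlp(rng, sizes)  # fixed random prior, never trained
        self.beta = beta

    def q(self, x):
        return mlp(self.f, x) + self.beta * mlp(self.p, x)
```

In the linear-Gaussian case, fitting f to noise-perturbed targets and adding such a random prior yields exact posterior samples, which is the claim above.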
[1] “Deep Exploration via Randomized Value Functions”
Ensemble average of the greedy values:
    (1/K) Σ_{k=1}^{K} max_α Q_k(s, α)
Exploring potentially-rewarding states… the agent learns fast!
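A hypothetical usage sketch of that average, reusing the PriorMember class from the snippet above (ensemble size, network sizes, and batch are arbitrary):

```python
import numpy as np

# (1/K) sum_k max_a Q_k(s, a): average greedy value across the ensemble,
# evaluated for a small batch of states. Assumes PriorMember from above.
rng = np.random.default_rng(1)
members = [PriorMember(rng, [4, 32, 2]) for _ in range(10)]  # K = 10, 2 actions
states = rng.standard_normal((5, 4))                         # batch of 5 states
avg_greedy = np.mean([m.q(states).max(axis=1) for m in members], axis=0)
print(avg_greedy)  # one averaged value per state
```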
Ian Osband, John Aslanides, Albin Cassirer

Blog post: bit.ly/rpf_nips
Montezuma’s Revenge!
Demo code: bit.ly/rpf_nips