

  1. Off-Policy Deep Reinforcement Learning without Exploration. Scott Fujimoto, David Meger, Doina Precup. Mila, McGill University.

  2. Surprise! Agent orange and agent blue are trained with… 1. The same off-policy algorithm (DDPG). 2. The same dataset.

  3. The Difference? 1. Agent orange: interacted with the environment. • Standard RL loop: collect data, store it in the buffer, train, repeat. 2. Agent blue: never interacted with the environment. • Trained concurrently on the data collected by agent orange.

  4. 1. Trained with the same off-policy algorithm. 2. Trained with the same dataset. 3. One interacts with the environment. One doesn’t.

  5. Off-policy deep RL fails when truly off-policy.

  6. Value Predictions

  7. Extrapolation Error: Q(s, a) ← r + γ Q(s′, a′)

  8. Extrapolation Error: Q(s, a) ← r + γ Q(s′, a′). The transition (s, a, r, s′) is GIVEN by the data; the next action a′ is GENERATED by the policy.

  9. Extrapolation Error: Q(s, a) ← r + γ Q(s′, a′), where 1. (s, a, r, s′) ~ Dataset, and 2. a′ ~ π(s′).

  10.–12. Extrapolation Error: Q(s, a) ← r + γ Q(s′, a′). (s′, a′) ∉ Dataset → Q(s′, a′) = bad → Q(s, a) = bad.

  13. Extrapolation Error: attempting to evaluate π without (sufficient) access to the (s, a) pairs π visits.
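
To make this concrete, here is a minimal PyTorch sketch of the target computation above (the networks and dimensions are illustrative stand-ins, not the authors' code): the next action a′ comes from the policy rather than from the data, so the Q-network is queried at pairs it may never have seen during training.

```python
import torch
import torch.nn as nn

gamma = 0.99
state_dim, action_dim = 3, 1

# Hypothetical stand-ins for the learned networks.
q_net = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                      nn.Linear(64, 1))
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                       nn.Linear(64, action_dim), nn.Tanh())

def td_target(r, s_next):
    """Q(s, a) <- r + gamma * Q(s', a'), with a' ~ pi(s')."""
    a_next = policy(s_next)                              # GENERATED by the policy
    q_next = q_net(torch.cat([s_next, a_next], dim=-1))  # may be pure extrapolation
    # If (s', a') never occurs in the dataset, q_next is unconstrained by the
    # training data, and any error in it is copied into Q(s, a) by this backup.
    return r + gamma * q_next.squeeze(-1)

# (s, a, r, s') are GIVEN by a batch sampled from the fixed dataset.
r, s_next = torch.zeros(8), torch.randn(8, state_dim)
print(td_target(r, s_next))
```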

  14. Batch-Constrained Reinforcement Learning: only choose π such that we have access to the (s, a) pairs π visits.

  15. Batch-Constrained Reinforcement Learning: 1. a ~ π(s) such that (s, a) ∈ Dataset. 2. a ~ π(s) such that (s′, π(s′)) ∈ Dataset. 3. a ~ π(s) such that Q(s, a) is maximized.
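
In a discrete toy setting, constraint 1 is easy to picture. The sketch below (hypothetical data and Q-values, for illustration only) restricts the argmax to actions that actually co-occur with the state in the batch, so a spuriously extrapolated Q-value can never be selected.

```python
from collections import defaultdict

# Toy dataset of (s, a, r, s') transitions (illustrative values).
dataset = [(0, 0, 1.0, 1), (0, 1, 0.0, 1), (1, 0, 1.0, 0)]

# Q-table: the pair (1, 1) never appears in the data, so its entry is
# pure extrapolation (here, wildly optimistic).
Q = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.8, (1, 1): 99.0}

seen = defaultdict(set)
for s, a, r, s2 in dataset:
    seen[s].add(a)                      # actions observed together with state s

def batch_constrained_policy(s):
    # Maximize Q only over in-batch actions for this state.
    return max(seen[s], key=lambda a: Q[(s, a)])

print(batch_constrained_policy(1))      # -> 0; the bogus Q[(1, 1)] is never consulted
```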

  16. Batch-Constrained Deep Q-Learning (BCQ). First imitate the dataset via a generative model: G(a|s) ≈ P_Dataset(a|s). Then π(s) = argmax_{a_i} Q(s, a_i), where a_i ~ G. (I.e., select the best action among those that are likely under the dataset.) (+ some additional deep RL magic)
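
For continuous actions, that selection rule might look like the following minimal PyTorch sketch (illustrative networks only; the full method additionally trains G as a conditional VAE and adds a perturbation network and clipped double-Q estimates, the "magic" above): sample a handful of candidate actions from the generative model and take the one the Q-network scores highest.

```python
import torch
import torch.nn as nn

state_dim, action_dim, latent_dim, n = 3, 1, 2, 10

# Hypothetical stand-in for the trained generative model G(a|s): a decoder
# mapping (state, latent noise) to an action, as a VAE decoder would.
decoder = nn.Sequential(nn.Linear(state_dim + latent_dim, 64), nn.ReLU(),
                        nn.Linear(64, action_dim), nn.Tanh())
q_net = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                      nn.Linear(64, 1))

def select_action(s):
    """pi(s) = argmax_{a_i} Q(s, a_i), where a_i ~ G(a|s)."""
    s_rep = s.unsqueeze(0).repeat(n, 1)              # n copies of the state
    z = torch.randn(n, latent_dim).clamp(-0.5, 0.5)  # latent noise for sampling
    a_cand = decoder(torch.cat([s_rep, z], dim=-1))  # candidates plausible under the data
    q_vals = q_net(torch.cat([s_rep, a_cand], dim=-1)).squeeze(-1)
    return a_cand[q_vals.argmax()]                   # greedy only over in-distribution candidates

print(select_action(torch.randn(state_dim)))
```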

  17. [Results plot; legend: ∎ BCQ, ∎ DDPG]

  18. [Results plot; legend: ∎ BCQ, ∎ DDPG]

  19. Come say Hi @ Pacific Ballroom #38 (6:30 Tonight) https://github.com/sfujim/BCQ (Artist’s rendition of poster session)
