Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning - PowerPoint PPT Presentation


Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning. Oliver Richter, Manuel Fritsche, Gino Brunner, Roger Wattenhofer. ETH Zurich, Distributed Computing.


SLIDE 1

ETH Zurich – Distributed Computing – www.disco.ethz.ch

Manuel Fritsche


Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning

Oliver Richter, Manuel Fritsche, Gino Brunner, Roger Wattenhofer

SLIDE 2

Base actions on predictions

SLIDE 3

Reinforcement learning: Agent ↔ Environment
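The agent-environment loop on this slide can be sketched in a few lines. The toy chain environment and random policy below are illustrative stand-ins, not the environments used in the paper:

```python
import random

class ToyEnv:
    """Hypothetical chain environment: states 0..4, reward only at the right end."""
    def __init__(self):
        self.state = 2

    def step(self, action):  # action in {-1, +1}
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done

def random_policy(state):
    # placeholder policy: in RL this would be learned
    return random.choice([-1, +1])

def run_episode(env, policy, max_steps=100):
    total_reward = 0.0
    state = env.state
    for _ in range(max_steps):
        action = policy(state)                   # agent acts
        state, reward, done = env.step(action)   # environment responds
        total_reward += reward
        if done:
            break
    return total_reward
```

The loop structure (observe, act, receive reward, repeat) is the same regardless of how the policy is obtained.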

SLIDE 4

Reinforcement learning

SLIDE 5

How to choose the action?

SLIDE 6

Return value

SLIDE 7

Value function
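The return on the previous slide is the discounted sum of future rewards, and the value function is its expectation under the policy. A minimal sketch of the return computation (discount factor 0.99 is a common default, not necessarily the paper's setting):

```python
def discounted_return(rewards, gamma=0.99):
    """G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    Computed backwards so each reward is discounted once per step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

A value network is trained to regress V(s) toward such returns (or bootstrapped estimates of them).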

SLIDE 8

Reinforcement learning: Agent ↔ Environment

SLIDE 9

Sparse reward settings: Agent ↔ Environment

?

SLIDE 10

Agent ↔ Environment

SLIDE 11

Reward the exploration of novel states

SLIDE 12

Reward the exploration of novel states

SLIDE 13

How to find novel states? Make predictions.


SLIDE 14

How to find novel states? Make predictions, get surprised.
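"Getting surprised" is typically operationalized as the prediction error of a learned forward model: the worse the model predicts the next state, the more novel that state is, and the larger the intrinsic reward. A hedged sketch, with feature vectors and the scaling factor `eta` as illustrative placeholders for the learned representations in the actual system:

```python
import numpy as np

def intrinsic_reward(predicted_next, actual_next, eta=0.5):
    """Curiosity bonus: scaled squared error between the forward model's
    prediction of the next state features and the features actually observed."""
    return eta * float(np.sum((predicted_next - actual_next) ** 2))
```

States the model already predicts well yield little bonus, so the agent is pushed toward unfamiliar parts of the environment.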

F A

SLIDE 15

Curiosity: prediction vs. reality

SLIDE 16

Asynchronous Advantage Actor-Critic architecture (A3C)
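A3C trains an actor (policy) and a critic (value function) on top of a shared feature extractor. The core per-step losses can be sketched with toy scalars; the real implementation operates on neural-network outputs and adds an entropy bonus, omitted here for brevity:

```python
def a3c_losses(log_prob, value, ret, value_coef=0.5):
    """Actor-critic losses for a single transition.
    log_prob: log pi(a|s) of the taken action (from the actor head)
    value:    V(s) from the critic head
    ret:      observed (bootstrapped) return R"""
    advantage = ret - value                  # A = R - V(s)
    policy_loss = -log_prob * advantage      # push up log-prob of advantageous actions
    value_loss = value_coef * advantage ** 2 # regress V(s) toward the return
    return policy_loss, value_loss
```

In A3C multiple asynchronous workers compute these losses on their own rollouts and apply gradients to shared parameters.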

A3C Network Feature Extractor

A3C

SLIDE 17

Adding curiosity

Forward Model A3C Network Feature Extractor 1 Feature Extractor 2

SLIDE 18

Learning good features

Forward Model Inverse Model A3C Network Feature Extractor 1 Feature Extractor 2 Feature Extractor 2 (Pathak et al., ICML 2017: A3C + ICM)
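The ICM of Pathak et al. learns features via two heads: a forward model predicting the next feature vector from (features, action), and an inverse model predicting the action from (features, next features), which keeps the features focused on what the agent can influence. A numpy sketch with linear placeholder "models" standing in for the paper's networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative feature vectors for state s and next state s'
phi, phi_next = rng.normal(size=4), rng.normal(size=4)
action_onehot = np.array([0.0, 1.0])  # the taken action, one-hot

# forward model: (phi, action) -> predicted phi'; its error is the curiosity reward
W_fwd = rng.normal(size=(4, 6))  # hypothetical weights
pred_next = W_fwd @ np.concatenate([phi, action_onehot])
forward_loss = 0.5 * float(np.sum((pred_next - phi_next) ** 2))

# inverse model: (phi, phi') -> action logits; cross-entropy on the taken action
W_inv = rng.normal(size=(2, 8))  # hypothetical weights
logits = W_inv @ np.concatenate([phi, phi_next])
probs = np.exp(logits) / np.sum(np.exp(logits))
inverse_loss = -float(np.log(probs[1]))
```

Both losses are minimized jointly with the A3C objective; the forward loss doubles as the intrinsic reward signal.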

SLIDE 19

Good features for all

Forward Model Inverse Model A3C Network Feature Extractor Feature Extractor

A3C + Pred

SLIDE 20

Adding Value Prediction

Forward Model Inverse Model Feature Extractor Feature Extractor A3C Network A3C Network

A3C + Pred + VPC

SLIDE 21

Value Prediction Consistency
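One plausible reading of value prediction consistency, sketched here as an assumption rather than the paper's exact loss: since the forward model predicts the next state's features, the value of the current state can be regularized toward the bootstrapped value computed through that *predicted* state, tying the value function and the state predictor together:

```python
def vpc_loss(v_state, v_predicted_next, reward, gamma=0.99):
    """Consistency regularizer (hypothetical formulation):
    penalize disagreement between V(s) and r + gamma * V(phi_hat'),
    where phi_hat' are the forward model's predicted next-state features."""
    target = reward + gamma * v_predicted_next
    return 0.5 * (v_state - target) ** 2
```

Such a term would be added to the A3C + ICM objective with its own weighting coefficient.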

SLIDE 22

Value Prediction Consistency

SLIDE 23

Value Prediction Consistency

SLIDE 24

Let’s see how it works in practice

SLIDE 25

Rewards per episode

SLIDE 26

Rewards per episode

SLIDE 27

Rewards per episode

SLIDE 28

Rewards per episode

SLIDE 29

Thinking bigger

SLIDE 30

Rewards per episode

SLIDE 31

Rewards per episode

SLIDE 32

Rewards per episode

SLIDE 33

Rewards per episode

SLIDE 34

Doom environment

SLIDE 35

Doom Setup

SLIDE 36

Rewards per episode

SLIDE 37

Rewards per episode

SLIDE 38

Rewards per episode

SLIDE 39

Rewards per episode

SLIDE 40

Questions & Answers

?