SLIDE 1

Variational Deep Q-Networks in Edward

Harri Bell-Thomas

R244: Open Source Project Presentation

19/11/2019

SLIDE 2

Q-Learning

Q-Learning is model-free reinforcement learning. Q is the action-value function giving the expected return of taking an action in a state; this is what is learned. Conceptually,

$$Q^{\pi}(s, a) = \mathbb{E}_{a_t \sim \pi(\cdot \mid s_t)}\left[\, \sum_{t=0}^{\infty} r_t \gamma^{t} \;\middle|\; s_0 = s,\ a_0 = a \right]$$
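
As a concrete (and entirely illustrative) instance of this definition, a minimal tabular Q-learning loop on a toy 5-state chain with an ε-greedy behaviour policy; neither the environment nor the hyperparameters come from the slides:

```python
import numpy as np

# Tabular Q-learning on a toy 5-state chain (illustrative; the slide
# does not fix an environment). Action 1 moves right, action 0 left;
# reaching the final state pays reward 1.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.99, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s_next, (1.0 if s_next == n_states - 1 else 0.0)

for episode in range(500):
    s = 0
    for t in range(20):
        # epsilon-greedy behaviour policy
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r = step(s, a)
        # Move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
```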

SLIDE 3

Q-Learning: Bellman Error

The Bellman equation gives the value of Q^π at a point in time, t, in terms of the payoff from an initial choice, a_t, and the value of the remaining decision problem that results after that choice. The Bellman error J(π) measures the deviation from this self-consistency:

$$J(\pi) = \mathbb{E}\left[ \left( Q^{\pi}(s_t, a_t) - \max_{a} \mathbb{E}\left[ r_t + \gamma Q^{\pi}(s_{t+1}, a) \right] \right)^{2} \right], \qquad s_t \sim \rho,\ a_t \sim \pi(\cdot \mid s_t)$$
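
In code, a one-sample estimate of this error over a batch of observed transitions might look like the sketch below; the tabular Q and the batch layout are illustrative assumptions, not anything fixed by the slides:

```python
import numpy as np

def bellman_error(Q, transitions, gamma=0.99):
    """Mean squared Bellman error over (s, a, r, s_next) transitions.

    The inner expectation is replaced by its one-sample estimate from
    the observed transition, so max_a moves onto Q(s_next, a) directly,
    as in the DQN target on the next slide.
    """
    errors = [
        (Q[s, a] - (r + gamma * np.max(Q[s_next]))) ** 2
        for (s, a, r, s_next) in transitions
    ]
    return float(np.mean(errors))
```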

SLIDE 4

Deep Q-Networks

Briefly: approximate the action-value function Q^π(s, a) with a neural network Q_θ(s, a); the (greedy) policy this represents is π_θ. Discretise the expectation using K sampled trajectories, each of length T, and use this to approximate J(θ):

$$\tilde{J}(\theta) = \frac{1}{KT} \sum_{i=1}^{K} \sum_{t=1}^{T} \left( Q_{\theta}\big(s_t^{(i)}, a_t^{(i)}\big) - \max_{a} \left[ r_t^{(i)} + \gamma\, Q_{\theta}\big(s_{t+1}^{(i)}, a\big) \right] \right)^{2}$$
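
A direct transcription of J̃(θ) might look like the sketch below, with a linear Q_θ standing in for the neural network so the example is self-contained; the linear form, the feature vectors, and the data layout are all assumptions:

```python
import numpy as np

def q_theta(theta, s):
    # Linear stand-in for the Q-network: theta has shape (n_actions, d),
    # s is a d-dimensional state feature vector.
    return theta @ s

def j_tilde(theta, trajectories, gamma=0.99):
    """Monte-Carlo estimate of J(theta) from K trajectories of length T.

    trajectories: K lists of T transitions (s, a, r, s_next).
    """
    K, T = len(trajectories), len(trajectories[0])
    total = 0.0
    for traj in trajectories:
        for (s, a, r, s_next) in traj:
            target = r + gamma * np.max(q_theta(theta, s_next))
            total += (q_theta(theta, s)[a] - target) ** 2
    return total / (K * T)
```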

SLIDE 5

Variational Inference

Main Concepts:

  • 1. Solve an optimisation problem over a class of tractable distributions, q_φ, parameterised by φ, to find the member most similar to the target posterior p.

  • 2. $\phi^{*} = \arg\min_{\phi} \mathrm{KL}\big(q_\phi(\theta) \,\|\, p(\theta \mid \mathcal{D})\big)$

  • 3. Approximate this minimisation using stochastic gradient descent, as in the Edward sketch below.
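
Since the project targets Edward, a minimal sketch of this recipe using Edward's ed.KLqp is given below. The Bayesian linear-regression model, the toy data, and the variable names are illustrative stand-ins, not the project's actual model:

```python
import edward as ed
import numpy as np
import tensorflow as tf
from edward.models import Normal

# Toy data for an illustrative Bayesian linear regression.
x_train = np.random.randn(50, 1).astype(np.float32)
y_train = 2.0 * x_train[:, 0] + 0.1 * np.random.randn(50).astype(np.float32)

X = tf.placeholder(tf.float32, [50, 1])
w = Normal(loc=tf.zeros(1), scale=tf.ones(1))           # prior p(theta)
y = Normal(loc=ed.dot(X, w), scale=0.1 * tf.ones(50))   # likelihood p(D | theta)

# q_phi(theta): tractable family, parameterised by phi = (loc, scale).
qw = Normal(loc=tf.get_variable("qw_loc", [1]),
            scale=tf.nn.softplus(tf.get_variable("qw_scale", [1])))

# KLqp minimises KL(q_phi(theta) || p(theta | D)) by stochastic
# gradient descent on the ELBO, i.e. steps 2 and 3 above.
inference = ed.KLqp({w: qw}, data={X: x_train, y: y_train})
inference.run(n_iter=500)
```

KLqp estimates the gradient with reparameterised samples from q_φ, which is the same machinery a VDQN reuses when q_φ is placed over Q-network weights.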
SLIDE 6

Variational Deep Q-Networks

Idea: for efficient exploration we need q_φ(θ) to be dispersed, with near-even coverage of the parameter space. Encourage this by adding an entropy bonus to the objective:

$$\mathbb{E}_{\theta \sim q_\phi(\theta)}\left[ \left( Q_{\theta}(s_j, a_j) - \max_{a'} \mathbb{E}\left[ r_j + \gamma Q_{\theta}(s'_j, a') \right] \right)^{2} \right] - \lambda H\big(q_\phi(\theta)\big)$$

Assigning systematic randomness to Q enables efficient exploration of the policy space. Further, encouraging high entropy over the parameter distribution prevents premature convergence. tl;dr: a higher chance of finding maximal rewards, and in less time, than standard DQNs.
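
As a sketch of how the pieces compose, assuming a factorised Gaussian q_φ(θ) = N(μ, diag(σ²)), whose entropy has a closed form, and any Bellman-error function such as the one sketched after slide 3; all names here are illustrative:

```python
import numpy as np

def gaussian_entropy(sigma):
    # Closed-form entropy of N(mu, diag(sigma^2)), summed over dimensions.
    return float(np.sum(0.5 * np.log(2.0 * np.pi * np.e * sigma ** 2)))

def vdqn_objective(mu, sigma, bellman_error, batch, lam=0.01, n_samples=8):
    """E_{theta ~ q_phi}[Bellman error] - lambda * H(q_phi).

    The outer expectation is estimated with reparameterised samples
    theta = mu + sigma * eps; `bellman_error` is any function of
    (theta, batch).
    """
    rng = np.random.default_rng(0)
    samples = [
        bellman_error(mu + sigma * rng.standard_normal(mu.shape), batch)
        for _ in range(n_samples)
    ]
    return float(np.mean(samples)) - lam * gaussian_entropy(sigma)
```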

SLIDE 7

Algorithm

Figure: VDQN Pseudocode.
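
The pseudocode image itself does not survive this transcript. As a placeholder, below is a hedged, self-contained reconstruction of the loop in Python: a factorised Gaussian q_φ over the weights of a linear Q on a toy chain stands in for the Bayesian neural network, with a DQN-style semi-gradient update. Every concrete detail here is an assumption, not the author's code:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
mu = np.zeros((n_actions, n_states))      # phi: means of q_phi(theta)
sigma = np.ones((n_actions, n_states))    # phi: scales of q_phi(theta)
lr, gamma, lam = 0.05, 0.99, 0.01

def feat(s):
    v = np.zeros(n_states)                # one-hot state features
    v[s] = 1.0
    return v

def env_step(s, a):
    # Toy chain: action 1 moves right, 0 moves left; goal state pays 1.
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

for episode in range(300):
    eps = rng.standard_normal(mu.shape)
    theta = mu + sigma * eps              # theta ~ q_phi, reparameterised
    s, transitions = 0, []
    for t in range(20):
        a = int(np.argmax(theta @ feat(s)))   # greedy w.r.t. the sampled Q_theta:
        s2, r = env_step(s, a)                # exploration comes from q_phi itself
        transitions.append((s, a, r, s2))
        s = s2
    # One SGD step on E_q[Bellman error] - lam * H(q_phi), with the
    # bootstrapped target held constant (DQN-style semi-gradient).
    g = np.zeros_like(mu)
    for (s, a, r, s2) in transitions:
        target = r + gamma * np.max(theta @ feat(s2))
        delta = float((theta @ feat(s))[a] - target)
        g[a] += 2.0 * delta * feat(s) / len(transitions)
    mu -= lr * g                          # dL/dmu    = dL/dtheta
    sigma -= lr * (g * eps - lam / sigma) # dL/dsigma = dL/dtheta * eps; dH/dsigma = 1/sigma
    sigma = np.maximum(sigma, 1e-3)       # keep scales positive
```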

SLIDE 8

Aim / Goals / Workplan

SLIDE 9

Questions?