CS885 Reinforcement Learning
Lecture 14c: June 15, 2018
Trust Region Methods
[Nocedal and Wright, Chapter 4]
University of Waterloo CS885 Spring 2018 Pascal Poupart

Optimization in ML
• It is common to formulate ML problems as optimization problems.
– Min squared error
– Min cross entropy
– Max log likelihood
– Max discounted sum of rewards

Two important classes
• Line search methods
– Find a direction of improvement
– Select a step length
• Trust region methods
– Select a trust region (analogous to a max step length)
– Find a point of improvement in the region
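
To make the "direction first, then step length" pattern concrete, here is a minimal sketch of one line search step. It is only an illustration: the steepest-descent direction, the backtracking rule, and the names `line_search_step`, `alpha0`, `shrink`, and `c1` are assumptions chosen for the example, not something taken from the slides.

```python
import numpy as np

def line_search_step(f, grad, x, alpha0=1.0, shrink=0.5, c1=1e-4):
    """One line search step: pick a direction, then pick a step length."""
    g = grad(x)
    d = -g                      # 1) direction of improvement (steepest descent, assumed)
    alpha = alpha0              # 2) step length, found by backtracking
    # Shrink alpha until the sufficient-decrease (Armijo) condition holds.
    while f(x + alpha * d) > f(x) + c1 * alpha * (g @ d):
        alpha *= shrink
    return x + alpha * d, alpha

# Illustrative objective (assumed): f(x) = ||x||^2
f = lambda x: float(x @ x)
grad = lambda x: 2.0 * x

x = np.array([3.0, 4.0])
x_new, alpha = line_search_step(f, grad, x)
```

A trust region method reverses the order: it fixes the maximum step size first and only then looks for the best point within that radius.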

Trust Region Methods
• Idea:
– Approximate objective $f$ with a simpler objective $\tilde{f}$
– Solve $\tilde{x}^* = \arg\min_x \tilde{f}(x)$
• Problem: The optimum $\tilde{x}^*$ might be in a region where $\tilde{f}$ poorly approximates $f$ and therefore $\tilde{x}^*$ might be far from optimal
• Solution: restrict the search to a region where we trust $\tilde{f}$ to approximate $f$ well.
– Solve $\tilde{x}^* = \arg\min_{x \in \text{trustRegion}} \tilde{f}(x)$

Example
• $\tilde{f}$ often chosen to be a quadratic approximation of $f$
$f(x) \approx \tilde{f}(x) = f(c) + \nabla f(c)^T (x - c) + \frac{1}{2!} (x - c)^T H(c) (x - c)$
where $\nabla f$ is the gradient and $H$ is the Hessian
• Trust region often chosen to be a hypersphere
$\|x - c\|_2 \le \delta$
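
The quadratic model and the hypersphere test can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions: the objective $f(x) = x_0^4 + x_1^2$, the expansion point `c`, and the helper names `quadratic_model` and `in_trust_region` are made up for the example. It shows the model tracking $f$ closely near $c$ and poorly far from it, which is the reason for restricting the search.

```python
import numpy as np

def quadratic_model(f, grad, hess, c):
    """Build f_tilde(x) = f(c) + grad(c)^T (x - c) + 1/2 (x - c)^T H(c) (x - c)."""
    c = np.asarray(c, dtype=float)
    f_c, g_c, H_c = f(c), grad(c), hess(c)

    def f_tilde(x):
        d = np.asarray(x, dtype=float) - c
        return f_c + g_c @ d + 0.5 * d @ H_c @ d

    return f_tilde

def in_trust_region(x, c, delta):
    """Hypersphere trust region: ||x - c||_2 <= delta."""
    return np.linalg.norm(np.asarray(x) - np.asarray(c)) <= delta

# Illustrative objective (assumed): f(x) = x0^4 + x1^2
f = lambda x: x[0] ** 4 + x[1] ** 2
grad = lambda x: np.array([4.0 * x[0] ** 3, 2.0 * x[1]])
hess = lambda x: np.array([[12.0 * x[0] ** 2, 0.0], [0.0, 2.0]])

c = np.array([1.0, 1.0])
f_tilde = quadratic_model(f, grad, hess, c)

near = c + np.array([0.1, 0.0])   # inside a radius-0.5 trust region
far = c + np.array([2.0, 0.0])    # well outside it
err_near = abs(f(near) - f_tilde(near))
err_far = abs(f(far) - f_tilde(far))
```

Here `err_near` is a small fraction of `err_far`, so a radius such as `delta = 0.5` would keep the search where the model can be trusted.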

Generic Algorithm
trustRegionMethod
  Initialize $\delta$, $x_0^*$ and $n = 0$
  Repeat
    $n \leftarrow n + 1$
    Solve $x_n^* = \arg\min_x \tilde{f}(x)$ subject to $\|x - x_{n-1}^*\|_2 \le \delta$
    If $\tilde{f}(x_n^*) \approx f(x_n^*)$ then increase $\delta$
    else decrease $\delta$
  Until convergence
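
The loop above can be sketched as follows. Several details are assumptions that the slide leaves open: the subproblem is solved only approximately with the Cauchy point (the model minimizer along the negative gradient), agreement between $\tilde{f}$ and $f$ is measured by the ratio `rho` of actual to predicted decrease, the thresholds 0.25 / 0.75 and the shrink / grow factors are conventional but arbitrary, and a step is only accepted when it actually decreases $f$. The function names and the test objective are hypothetical.

```python
import numpy as np

def cauchy_point(g, H, delta):
    """Approximate subproblem solution: minimize the model along -g within radius delta."""
    g_norm = np.linalg.norm(g)
    gHg = g @ H @ g
    # With non-positive curvature along -g, go all the way to the boundary.
    tau = 1.0 if gHg <= 0 else min(g_norm ** 3 / (delta * gHg), 1.0)
    return -tau * delta / g_norm * g

def trust_region_method(f, grad, hess, x0, delta=1.0, delta_max=10.0,
                        tol=1e-6, max_iter=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        if np.linalg.norm(g) < tol:          # convergence test (assumed)
            break
        p = cauchy_point(g, H, delta)        # stays within ||p|| <= delta
        predicted = -(g @ p + 0.5 * p @ H @ p)   # decrease promised by f_tilde
        actual = f(x) - f(x + p)                 # decrease observed in f
        rho = actual / predicted
        if rho < 0.25:                       # model disagrees with f: shrink
            delta *= 0.25
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2.0 * delta, delta_max)  # model agrees: grow
        if rho > 0.1:                        # accept only real improvement
            x = x + p
    return x

# Illustrative objective (assumed): f(x) = x0^4 + x0^2 + (x1 - 1)^2, minimized at (0, 1)
f = lambda x: x[0] ** 4 + x[0] ** 2 + (x[1] - 1.0) ** 2
grad = lambda x: np.array([4.0 * x[0] ** 3 + 2.0 * x[0], 2.0 * (x[1] - 1.0)])
hess = lambda x: np.array([[12.0 * x[0] ** 2 + 2.0, 0.0], [0.0, 2.0]])

x_star = trust_region_method(f, grad, hess, x0=[2.0, -1.0])
```

A more accurate subproblem solver can be swapped in for `cauchy_point` without changing the outer loop.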

Trust Region Subproblem
• $\tilde{f}$ often chosen to be a quadratic approximation of $f$
$\min_x \; f(c) + \nabla f(c)^T (x - c) + \frac{1}{2!} (x - c)^T H(c) (x - c)$
subject to $\|x - c\|_2 \le \delta$
• When $H$ is positive semi-definite
– Convex optimization
– Simple and globally optimal solution
• When $H$ is not positive semi-definite
– Non-convex optimization
– Simple heuristics that guarantee improvement
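
For the convex case, one way to obtain the globally optimal step is sketched below. It assumes $H$ is symmetric and strictly positive definite (a stronger condition than the slide's positive semi-definite case) and works with the step $p = x - c$. If the unconstrained minimizer $-H^{-1} \nabla f(c)$ lies inside the region it is returned directly. Otherwise the solution lies on the boundary and has the form $p(\lambda) = -(H + \lambda I)^{-1} \nabla f(c)$ for some $\lambda > 0$, which the sketch finds by bisection. The function name and the test data are hypothetical.

```python
import numpy as np

def solve_subproblem_pd(g, H, delta, iters=100):
    """min_p g^T p + 1/2 p^T H p  s.t. ||p||_2 <= delta, with H positive definite (assumed)."""
    eigvals, Q = np.linalg.eigh(H)       # H = Q diag(eigvals) Q^T
    g_rot = Q.T @ g
    step_norm = lambda lam: np.linalg.norm(g_rot / (eigvals + lam))

    if step_norm(0.0) <= delta:
        # Unconstrained minimizer is inside the trust region.
        return -Q @ (g_rot / eigvals)

    # ||p(lam)|| decreases in lam; at lam = ||g|| / delta it is already below delta.
    lo, hi = 0.0, np.linalg.norm(g) / delta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if step_norm(mid) > delta:
            lo = mid
        else:
            hi = mid
    return -Q @ (g_rot / (eigvals + hi))

# Illustrative data (assumed)
g = np.array([-4.0, -20.0])
H = np.diag([2.0, 10.0])
model = lambda p: g @ p + 0.5 * p @ H @ p

p_big = solve_subproblem_pd(g, H, delta=5.0)     # region contains the unconstrained minimizer
p_small = solve_subproblem_pd(g, H, delta=1.0)   # solution sits on the boundary
```

With the small radius, `p_small` is not simply the unconstrained step scaled back to the boundary: it bends toward the gradient direction and achieves a lower model value.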
