
CS885 Reinforcement Learning Lecture 14c: June 15, 2018 Trust Region Methods



  1. CS885 Reinforcement Learning
     Lecture 14c: June 15, 2018
     Trust Region Methods [Nocedal and Wright, Chapter 4]
     University of Waterloo, CS885 Spring 2018, Pascal Poupart

  2. Optimization in ML
     • It is common to formulate ML problems as optimization problems.
       – Min squared error
       – Min cross entropy
       – Max log likelihood
       – Max discounted sum of rewards

  3. Two important classes
     • Line search methods
       – Find a direction of improvement
       – Select a step length
     • Trust region methods
       – Select a trust region (analogous to a maximum step length)
       – Find a point of improvement in the region

  4. Trust Region Methods
     • Idea:
       – Approximate objective $f$ with a simpler objective $\tilde{f}$
       – Solve $\tilde{x}^* = \operatorname{argmin}_x \tilde{f}(x)$
     • Problem: The optimum $\tilde{x}^*$ might be in a region where $\tilde{f}$ poorly approximates $f$, and therefore $\tilde{x}^*$ might be far from optimal.
     • Solution: restrict the search to a region where we trust $\tilde{f}$ to approximate $f$ well.
       – Solve $\tilde{x}^* = \operatorname{argmin}_{x \in \text{trustRegion}} \tilde{f}(x)$

  5. Example
     • $\tilde{f}$ often chosen to be a quadratic approximation of $f$:
       $f(x) \approx \tilde{f}(x) = f(c) + \nabla f(c)^T (x - c) + \frac{1}{2!} (x - c)^T H(c) (x - c)$
       where $\nabla f$ is the gradient and $H$ is the Hessian
     • Trust region often chosen to be a hypersphere: $\|x - c\|_2 \le \delta$
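The quadratic approximation and the hypersphere constraint on this slide can be written out directly. The following is a minimal sketch in plain Python, not code from the lecture: the function names `quadratic_model` and `in_trust_region`, and the test objective `f`, are assumptions introduced only for illustration. It shows that the model matches the objective at the expansion point and stays close nearby, which is what the trust region is meant to exploit.

```python
import math

def quadratic_model(f_c, grad_c, hess_c, c):
    """Build f_tilde(x) = f(c) + grad^T (x - c) + 0.5 (x - c)^T H (x - c).

    f_c, grad_c and hess_c are the value, gradient and Hessian of the
    objective evaluated at the expansion point c (hypothetical inputs).
    """
    def f_tilde(x):
        d = [xi - ci for xi, ci in zip(x, c)]
        linear = sum(g * di for g, di in zip(grad_c, d))
        quad = sum(d[i] * hess_c[i][j] * d[j]
                   for i in range(len(d)) for j in range(len(d)))
        return f_c + linear + 0.5 * quad
    return f_tilde

def in_trust_region(x, c, delta):
    """Hypersphere trust region: ||x - c||_2 <= delta."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c))) <= delta

# Illustrative objective (an assumption): f(x) = x0^4 + x1^2, expanded at c = (1, 1).
def f(x):
    return x[0] ** 4 + x[1] ** 2

c = [1.0, 1.0]
f_tilde = quadratic_model(f(c), [4.0, 2.0], [[12.0, 0.0], [0.0, 2.0]], c)

near, far = [1.1, 1.0], [3.0, 1.0]
print(f(near), f_tilde(near), in_trust_region(near, c, 0.5))
print(f(far), f_tilde(far), in_trust_region(far, c, 0.5))
```

Close to `c` the two values nearly agree, while at the far point the model underestimates the objective by a wide margin, which is why the search is restricted to the ball of radius `delta`.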

  6. Generic Algorithm
     trustRegionMethod
       Initialize $\delta$, $x_0^*$ and $n = 0$
       Repeat
         $n \leftarrow n + 1$
         Solve $x_n^* = \operatorname{argmin}_x \tilde{f}(x)$ subject to $\|x - x_{n-1}^*\|_2 \le \delta$
         If $\tilde{f}(x_n^*) \approx f(x_n^*)$ then increase $\delta$
         else decrease $\delta$
       Until convergence
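The generic loop above can be sketched in plain Python under a few stated assumptions. The slide does not say how to test whether the model agrees with the objective, so this sketch uses the ratio of actual to predicted decrease with thresholds 0.25 and 0.75, a common convention rather than something taken from the lecture. It also solves the subproblem only approximately with a Cauchy step (the model minimizer along the negative gradient inside the region), and it rejects steps that do not decrease the objective. All function names and default parameters are hypothetical.

```python
import math

def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def _matvec(H, v):
    return [_dot(row, v) for row in H]

def cauchy_step(g, H, delta):
    """Approximate subproblem solution: minimise the quadratic model
    along -g, staying inside the ball of radius delta."""
    g_norm = math.sqrt(_dot(g, g))
    if g_norm == 0.0:
        return [0.0] * len(g)
    gHg = _dot(g, _matvec(H, g))
    # Non-positive curvature along -g: go all the way to the boundary.
    step_len = delta if gHg <= 0.0 else min(delta, g_norm ** 3 / gHg)
    return [-step_len * gi / g_norm for gi in g]

def trust_region_method(f, grad, hess, x0, delta=1.0, max_delta=100.0,
                        tol=1e-8, max_iter=1000):
    x = list(x0)
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        if math.sqrt(_dot(g, g)) < tol:      # "until convergence"
            break
        p = cauchy_step(g, H, delta)
        # Decrease promised by the model versus decrease actually obtained.
        predicted = -(_dot(g, p) + 0.5 * _dot(p, _matvec(H, p)))
        x_new = [xi + pi for xi, pi in zip(x, p)]
        actual = f(x) - f(x_new)
        rho = actual / predicted if predicted > 0.0 else 0.0
        if rho > 0.75:                       # model agrees well: increase delta
            delta = min(2.0 * delta, max_delta)
        elif rho < 0.25:                     # model agrees poorly: decrease delta
            delta *= 0.25
        if rho > 0.0:                        # assumption: keep only improving steps
            x = x_new
    return x

# Illustrative objective (an assumption): a quadratic bowl with minimum at (1, -2).
f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2
grad = lambda x: [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]
hess = lambda x: [[2.0, 0.0], [0.0, 20.0]]

print(trust_region_method(f, grad, hess, [0.0, 0.0]))
```

Because the Cauchy step only searches along the negative gradient, this sketch converges slowly on ill-conditioned problems; a more accurate subproblem solver would take its place in practice.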

  7. Trust Region Subproblem
     • $\tilde{f}$ often chosen to be a quadratic approximation of $f$:
       $\min_x \; f(c) + \nabla f(c)^T (x - c) + \frac{1}{2!} (x - c)^T H(c) (x - c)$
       subject to $\|x - c\|_2 \le \delta$
     • When $H$ is positive semi-definite
       – Convex optimization
       – Simple and globally optimal solution
     • When $H$ is not positive semi-definite
       – Non-convex optimization
       – Simple heuristics that guarantee improvement
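For the positive semi-definite case, the globally optimal step $p = x - c$ is either the unconstrained minimizer of the model, if it lies inside the region, or a boundary point of the form $p(\lambda) = -(H + \lambda I)^{-1} \nabla f(c)$ with $\lambda > 0$ chosen so that $\|p(\lambda)\|_2 = \delta$. The sketch below illustrates this under a simplifying assumption that is not in the slides: $H$ is diagonal (equivalently, the problem is already expressed in the eigenbasis of $H$), so no linear solver is needed and $\lambda$ can be found by bisection. The function name `solve_subproblem_diag` is hypothetical.

```python
import math

def solve_subproblem_diag(g, h, delta, iters=100):
    """Minimise g^T p + 0.5 * sum_i h[i] * p[i]^2 subject to ||p||_2 <= delta.

    Assumes a diagonal, positive semi-definite Hessian with entries h[i] >= 0.
    """
    def step(lam):
        return [-gi / (hi + lam) for gi, hi in zip(g, h)]

    def norm(p):
        return math.sqrt(sum(pi * pi for pi in p))

    if all(gi == 0.0 for gi in g):
        return [0.0] * len(g)          # already stationary: stay at c

    # Interior case: H positive definite and the Newton step fits in the region.
    if all(hi > 0.0 for hi in h):
        p = step(0.0)
        if norm(p) <= delta:
            return p

    # Otherwise search for lam with ||p(lam)|| = delta. The norm of p(lam)
    # decreases in lam, and lam = ||g|| / delta is always large enough.
    lo, hi_lam = 0.0, norm(g) / delta
    for _ in range(iters):
        mid = 0.5 * (lo + hi_lam)
        if norm(step(mid)) > delta:
            lo = mid
        else:
            hi_lam = mid
    return step(hi_lam)

# Newton step (-1, -1) is too long for delta = 0.5, so the result sits on the boundary.
print(solve_subproblem_diag([1.0, 1.0], [1.0, 1.0], 0.5))
# With delta = 2.0 the Newton step fits and is returned unchanged.
print(solve_subproblem_diag([1.0, 1.0], [1.0, 1.0], 2.0))
```

For a general (non-diagonal) positive semi-definite $H$, the same idea applies after an eigendecomposition or with a linear solve for each trial $\lambda$. The non-convex case needs additional care and is not covered by this sketch.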
