Policy Search: Hill Climbing - PowerPoint PPT Presentation



  1. Policy Search: Hill Climbing

  2. Policy Search: Hill Climbing, Genetic Search

  3. Policy Search: Hill Climbing, Genetic Search

  4. Policy Search: CMA-ES

  5. Gradient Bandits

  6. Gradient Bandits: just a scalar per arm. No states! But in the full RL case, the policy influences future states.

  7. Proof of policy gradient theorem

  8. Proof of policy gradient theorem: push the gradient in; marginalize out R; reward and dynamics are constant w.r.t. θ

  9. Proof of policy gradient theorem: push the gradient in; marginalize out R; reward and dynamics are constant w.r.t. θ. Expanding ∇v(s') creates a deeply nested computation: at every step, compute every state you could get to from every state you could have been in. Transform into a simple sum over states and time steps: what is the total probability of being at each state at each time step?

  10. Proof of policy gradient theorem: push the gradient in; marginalize out R; reward and dynamics are constant w.r.t. θ. Expanding ∇v(s') creates a deeply nested computation: at every step, compute every state you could get to from every state you could have been in. Transform into a simple sum over states and time steps: what is the total probability of being at each state at each time step? Normalized version: the steady-state probability of s.

  11. REINFORCE: not all actions, sample a; no Q approximation, sample the return

  12. REINFORCE

  13. REINFORCE

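  14. Gradient Bandits + Baseline: the baseline is the mean of samples; the baseline term has an expectation of zero

  15. REINFORCE + Baseline! Gradient Bandits + Baseline: the baseline is the mean of samples; the baseline term has an expectation of zero. In REINFORCE the baseline is a function of the state s.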

  16. Actor only (policy search methods): directly parameterized policy; no value functions (except the baseline in REINFORCE); continuous actions are natural to represent; high variance, no bootstrapping; scales with policy complexity, not size of state space

  17. Critic only (value function methods): indirect policy via value functions; discrete actions only; lower variance with bootstrapping; scales with size of state space. Actor only (policy search methods): directly parameterized policy; no value functions (except the baseline in REINFORCE); continuous actions are natural to represent; high variance, no bootstrapping; scales with policy complexity, not size of state space

  18. Critic only (value function methods): indirect policy via value functions; discrete actions only; lower variance with bootstrapping; scales with size of state space. Actor only (policy search methods): directly parameterized policy; no value functions (except the baseline in REINFORCE); continuous actions are natural to represent; high variance, no bootstrapping; scales with policy complexity, not size of state space. Actor-Critic (policy search + value function): benefits of both! Directly parameterized policy; continuous actions; bootstrapping; scales primarily with policy complexity

  19. Critic only (value function methods): indirect policy via value functions; discrete actions only; lower variance with bootstrapping; scales with size of state space. Actor only (policy search methods): directly parameterized policy; no value functions (except the baseline in REINFORCE); continuous actions are natural to represent; high variance, no bootstrapping; scales with policy complexity, not size of state space. Actor-Critic (policy search + value function): benefits of both! Directly parameterized policy; continuous actions; bootstrapping; scales primarily with policy complexity. Many of the most popular contemporary methods are actor-critic: Proximal Policy Optimization, A3C, Soft Actor-Critic, DDPG
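
The gradient bandit with a baseline (slides 5, 6, 14 and 15) can be sketched in a few lines of Python. This is a minimal illustration and not the presenter's code: the softmax over per-arm scalars, the step size `alpha`, the Gaussian reward noise, the running-mean baseline, and the names `softmax` and `gradient_bandit` are all assumptions made for the example.

```python
import math
import random


def softmax(prefs):
    """Turn per-arm scalar preferences into action probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(h - m) for h in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_bandit(true_means, steps=5000, alpha=0.1, noise_sd=0.5, seed=0):
    """Gradient bandit with a mean-of-samples baseline (illustrative sketch).

    true_means, alpha, noise_sd and seed are hypothetical parameters
    chosen for this example only.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(true_means)  # one scalar per arm, no states
    baseline = 0.0                   # running mean of rewards seen so far
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        # Sample a single action from the current policy.
        a = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = rng.gauss(true_means[a], noise_sd)
        # Stochastic gradient ascent step on expected reward:
        # (reward - baseline) * d log pi(a) / d prefs[i]
        advantage = reward - baseline
        for i in range(len(prefs)):
            indicator = 1.0 if i == a else 0.0
            prefs[i] += alpha * advantage * (indicator - probs[i])
        # Update the baseline after using it, as an incremental mean.
        baseline += (reward - baseline) / t
    return softmax(prefs)
```

Calling `gradient_bandit([0.0, 0.5, 2.0])` returns the final action probabilities; with these illustrative arm means, most of the probability mass should end up on the last arm. Subtracting the baseline does not change the expected update, because the baseline term has zero expectation; it only reduces the variance of the sampled update.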
