Making Deep Q-learning Approaches Robust to Time Discretization


SLIDE 1

Making Deep Q-learning Approaches Robust to Time Discretization

Corentin Tallec, Léonard Blier, Yann Ollivier

Université Paris-Sud, Facebook AI Research

June 4, 2019


SLIDE 2

Reinforcement Learning in Near-Continuous Time

What happens when using standard RL methods with a small time discretization (high framerate)?

  • Usual RL algorithm + high framerate → failure.
  • Scalability is limited by the algorithms: better hardware, sensors, and actuators → worse performance.
  • Contributes to the lack of robustness of deep RL: new environment → different framerate → new hyperparameters (sketched below).

[Figure: agent behavior at low FPS vs. high FPS]
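
To see concretely why a new framerate forces new hyperparameters, consider the discount factor: with a fixed per-step γ, the effective planning horizon is about 1/(1−γ) steps, i.e. δt/(1−γ) of physical time, so halving δt halves the horizon unless γ is retuned (e.g. as γ_phys^δt). The snippet below is an illustrative sketch of this arithmetic, not something taken from the slides.

```python
# Illustrative sketch (not from the slides): with a fixed per-step discount gamma,
# the physical planning horizon shrinks as the time step dt shrinks, so gamma
# must be retuned (e.g. as gamma_phys ** dt) whenever the framerate changes.
gamma = 0.99                      # typical per-step discount
for dt in [0.1, 0.01, 0.001]:     # coarser to finer time discretization
    horizon_steps = 1.0 / (1.0 - gamma)   # ~ number of steps looked ahead
    horizon_time = horizon_steps * dt     # the same horizon in physical time
    print(f"dt={dt}: horizon ≈ {horizon_steps:.0f} steps ≈ {horizon_time:.2f} s")
```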


SLIDE 3

Why does near-continuous Q-learning fail?

There is no continuous-time Q-learning

As δt → 0, Q^π(s, a) → V^π(s): Q^π no longer depends on the action when δt → 0 ⟹ Q^π cannot be used to select actions!
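
A toy numerical check makes this collapse visible (a hypothetical one-dimensional system chosen for illustration, not the paper's benchmark): with per-step reward r(s, a)·δt and per-step discount γ^δt, the only thing the first action changes is a state displacement of order δt, so the gap between Q^π values of different actions shrinks roughly proportionally to δt.

```python
# Hypothetical toy system (not the paper's benchmark): state x in R,
# dynamics x' = x + a * dt, reward r(x, a) = -x**2, fixed policy pi(x) = -x,
# per-step reward r * dt and per-step discount gamma ** dt.
def q_value(x0, a0, dt, gamma=0.9, horizon=10.0):
    """Rollout estimate of Q^pi(x0, a0) in the deterministic toy system."""
    x, q, disc, a = x0, 0.0, 1.0, a0   # the first action is the one we evaluate
    for _ in range(int(horizon / dt)):
        q += disc * (-(x ** 2)) * dt   # reward per step scales with dt
        x += a * dt                    # Euler step of the dynamics
        disc *= gamma ** dt            # discounting per unit of physical time
        a = -x                         # afterwards, follow the policy pi(x) = -x
    return q

for dt in [0.5, 0.05, 0.005]:
    qs = [q_value(1.0, a, dt) for a in (-1.0, 0.0, 1.0)]
    print(f"dt={dt}: Q-gap over actions = {max(qs) - min(qs):.4f}")
```

In this toy example the printed gap shrinks roughly linearly in δt, so with function approximation the greedy action quickly becomes indistinguishable from noise.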

There is no continuous-time ε-greedy exploration

[Figure: ε-greedy exploration with ε = 1 on the pendulum, for δt = 0.05 vs. δt = 0.0001]
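
The exploration failure can also be sketched numerically. The snippet below uses a simplified pendulum (a hypothetical stand-in, not the exact environment behind the plots): with ε = 1, each tiny step applies an independent random torque for a duration δt, and those torques average out, so the state barely moves as δt → 0.

```python
import numpy as np

# Simplified pendulum sketch (hypothetical, not the exact environment from the
# slides): fully random (epsilon = 1) actions, each held for one step of length dt.
def exploration_spread(dt, total_time=5.0, n_rollouts=200, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_rollouts)   # angle, starting at the stable equilibrium
    omega = np.zeros(n_rollouts)   # angular velocity
    for _ in range(int(total_time / dt)):
        torque = rng.choice([-2.0, 2.0], size=n_rollouts)   # i.i.d. random action
        omega += (-np.sin(theta) + torque) * dt              # Euler integration
        theta += omega * dt
    return theta.std()             # spread of states reached by pure exploration

for dt in [0.05, 0.0001]:
    print(f"dt={dt}: spread of final angles = {exploration_spread(dt):.4f}")
```

Over a fixed physical duration, the variance of the velocity injected by the random torques is proportional to δt, so the spread of visited states collapses at high framerate, matching the failure shown in the plots.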


SLIDE 4

Can we solve this?

YES

To know how: Poster #32 this evening

[Figure: agent behavior at low FPS vs. high FPS]
