Making Deep Q-learning Approaches Robust to Time Discretization


SLIDE 1

Making Deep Q-learning Approaches Robust to Time Discretization

Corentin Tallec, Léonard Blier, Yann Ollivier

Université Paris-Sud, Facebook AI Research

June 4, 2019


SLIDE 2

Reinforcement Learning in Near-Continuous Time

What happens when using standard RL methods with a small time discretization (high framerate)?

  • Usual RL algorithm + high framerate → failure.
  • Scalability is limited by the algorithms: better hardware, sensors, and actuators → worse performance.
  • Contributes to the lack of robustness of deep RL: new environment → different framerate → new hyperparameters (sketched below).

[Figure: agent behavior at low FPS vs. high FPS]
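
To see concretely why a new framerate forces new hyperparameters, consider the discount factor: with a fixed per-step γ, the effective planning horizon is about 1/(1−γ) steps, i.e. δt/(1−γ) of physical time, so halving δt halves the horizon unless γ is retuned (e.g. as γ_phys^δt). The snippet below is an illustrative sketch of this arithmetic, not something taken from the slides.

```python
# Illustrative sketch (not from the slides): with a fixed per-step discount gamma,
# the physical planning horizon shrinks as the time step dt shrinks, so gamma
# must be retuned (e.g. as gamma_phys ** dt) whenever the framerate changes.
gamma = 0.99                      # typical per-step discount
for dt in [0.1, 0.01, 0.001]:     # coarser to finer time discretization
    horizon_steps = 1.0 / (1.0 - gamma)   # ~ number of steps looked ahead
    horizon_time = horizon_steps * dt     # the same horizon in physical time
    print(f"dt={dt}: horizon ≈ {horizon_steps:.0f} steps ≈ {horizon_time:.2f} s")
```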


SLIDE 3

Why does near-continuous Q-learning fail?

There is no continuous-time Q-learning

As δt → 0, Q^π(s, a) → V^π(s): Q^π no longer depends on the action when δt → 0 ⟹ Q^π cannot be used to select actions!
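
A toy numerical check makes this collapse visible (a hypothetical one-dimensional system chosen for illustration, not the paper's benchmark): with per-step reward r(s, a)·δt and per-step discount γ^δt, the only thing the first action changes is a state displacement of order δt, so the gap between Q^π values of different actions shrinks roughly proportionally to δt.

```python
# Hypothetical toy system (not the paper's benchmark): state x in R,
# dynamics x' = x + a * dt, reward r(x, a) = -x**2, fixed policy pi(x) = -x,
# per-step reward r * dt and per-step discount gamma ** dt.
def q_value(x0, a0, dt, gamma=0.9, horizon=10.0):
    """Rollout estimate of Q^pi(x0, a0) in the deterministic toy system."""
    x, q, disc, a = x0, 0.0, 1.0, a0   # the first action is the one we evaluate
    for _ in range(int(horizon / dt)):
        q += disc * (-(x ** 2)) * dt   # reward per step scales with dt
        x += a * dt                    # Euler step of the dynamics
        disc *= gamma ** dt            # discounting per unit of physical time
        a = -x                         # afterwards, follow the policy pi(x) = -x
    return q

for dt in [0.5, 0.05, 0.005]:
    qs = [q_value(1.0, a, dt) for a in (-1.0, 0.0, 1.0)]
    print(f"dt={dt}: Q-gap over actions = {max(qs) - min(qs):.4f}")
```

In this toy example the printed gap shrinks roughly linearly in δt, so with function approximation the greedy action quickly becomes indistinguishable from noise.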

There is no continuous-time ε-greedy exploration

[Figure: ε-greedy exploration with ε = 1 on the pendulum, for δt = 0.05 vs. δt = 0.0001]
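
The exploration failure can also be sketched numerically. The snippet below uses a simplified pendulum (a hypothetical stand-in, not the exact environment behind the plots): with ε = 1, each tiny step applies an independent random torque for a duration δt, and those torques average out, so the state barely moves as δt → 0.

```python
import numpy as np

# Simplified pendulum sketch (hypothetical, not the exact environment from the
# slides): fully random (epsilon = 1) actions, each held for one step of length dt.
def exploration_spread(dt, total_time=5.0, n_rollouts=200, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_rollouts)   # angle, starting at the stable equilibrium
    omega = np.zeros(n_rollouts)   # angular velocity
    for _ in range(int(total_time / dt)):
        torque = rng.choice([-2.0, 2.0], size=n_rollouts)   # i.i.d. random action
        omega += (-np.sin(theta) + torque) * dt              # Euler integration
        theta += omega * dt
    return theta.std()             # spread of states reached by pure exploration

for dt in [0.05, 0.0001]:
    print(f"dt={dt}: spread of final angles = {exploration_spread(dt):.4f}")
```

Over a fixed physical duration, the variance of the velocity injected by the random torques is proportional to δt, so the spread of visited states collapses at high framerate, matching the failure shown in the plots.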


SLIDE 4

Can we solve this?

YES

To know how: Poster #32 this evening

[Figure: agent behavior at low FPS vs. high FPS]
