Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
Daniel Brown*, Wonjoon Goo*, Prabhat Nagarajan, and Scott Niekum
Inverse Reinforcement Learning

Current approaches:
1. Can't do better than the demonstrator.
2. … problems.

IRL via Ranked Demonstrations

Instead of assuming the demonstrations are optimal, we rank them from worst to best. We find a reward function that explains the ranking, allowing for extrapolation beyond the demonstrator. Inverse reinforcement learning becomes standard binary classification.
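As a sketch of that classification objective (notation is mine; r̂_θ is the learned reward network, and τ_i ≺ τ_j means τ_j is ranked higher than τ_i), the paper's Bradley-Terry-style preference model is:

```latex
P(\tau_i \prec \tau_j) \approx
  \frac{\exp \sum_{s \in \tau_j} \hat{r}_\theta(s)}
       {\exp \sum_{s \in \tau_i} \hat{r}_\theta(s) + \exp \sum_{s \in \tau_j} \hat{r}_\theta(s)},
\qquad
\mathcal{L}(\theta) = -\sum_{\tau_i \prec \tau_j} \log P(\tau_i \prec \tau_j).
```

The predicted returns of the two trajectories act as logits, so training the reward network is ordinary cross-entropy classification over which trajectory is ranked higher.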
Given ranked demonstrations, how do we train the reward function?
We subsample trajectories to create a large dataset of weakly labeled pairs!
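A minimal sketch of the pair subsampling and training step, assuming `trajs` is a list of per-trajectory observation tensors already sorted from worst to best (the names, network, and snippet length are illustrative; the paper's implementation adds details such as constraints on where snippets may start):

```python
import random
import torch
import torch.nn as nn

def sample_labeled_pair(trajs, snippet_len=50):
    """Subsample one snippet from each of two differently-ranked trajectories.

    trajs: list of float tensors of shape (T, obs_dim), sorted worst -> best,
    so the index order itself provides the (weak) preference label.
    """
    i, j = sorted(random.sample(range(len(trajs)), 2))   # i is ranked below j
    worse, better = trajs[i], trajs[j]
    si = random.randint(0, len(worse) - snippet_len)
    sj = random.randint(0, len(better) - snippet_len)
    return worse[si:si + snippet_len], better[sj:sj + snippet_len]

def ranking_loss(reward_net, worse, better):
    """Binary classification: which snippet has the higher predicted return?"""
    returns = torch.stack([reward_net(worse).sum(),      # logit for "worse"
                           reward_net(better).sum()])    # logit for "better"
    # Label 1 = the better-ranked snippet should get the higher return.
    return nn.functional.cross_entropy(returns.unsqueeze(0), torch.tensor([1]))

# Usage: any small MLP (or CNN for pixels) can serve as the reward network.
reward_net = nn.Sequential(nn.Linear(11, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_net.parameters(), lr=1e-4)
trajs = [torch.randn(200, 11) for _ in range(12)]        # stand-in ranked demos
for _ in range(1000):
    worse, better = sample_labeled_pair(trajs)
    loss = ranking_loss(reward_net, worse, better)
    opt.zero_grad(); loss.backward(); opt.step()
```

Because every pair of differently-ranked trajectories yields many snippet pairs, a handful of ranked demonstrations expands into a large weakly labeled training set.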
[Panel fragments: comparison to imitation learning baselines (e.g. GAIL); high-dimensional tasks (e.g. Atari games); performance relative to the demonstrator.]
T-REX can extrapolate beyond the performance of the best demo.
[MuJoCo results: HalfCheetah, Hopper, Ant. Example: best demo (88.97) vs. T-REX (143.40).]
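The extrapolation step itself is plain reinforcement learning against the learned reward (the paper trains policies with PPO). A minimal sketch of swapping the environment reward for the learned one, assuming the `reward_net` above and the Gymnasium API:

```python
import gymnasium as gym
import torch

class LearnedRewardWrapper(gym.Wrapper):
    """Replace the environment's reward with the learned reward r_theta(s)."""
    def __init__(self, env, reward_net):
        super().__init__(env)
        self.reward_net = reward_net

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        with torch.no_grad():
            r = self.reward_net(torch.as_tensor(obs, dtype=torch.float32)).item()
        return obs, r, terminated, truncated, info

# Any RL algorithm (e.g. PPO) can then be run on the wrapped environment.
env = LearnedRewardWrapper(gym.make("Hopper-v4"), reward_net)
```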
T-REX outperforms the best demonstration on Atari (8 games)!
[Example: best demo (84) vs. T-REX (520).]
Additional results:
- Robust to noisy ranking labels
- Automatic ranking by watching a learner improve at a task
- Human demos / ranking labels
- Reward function visualization