
Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations


  1. Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations. Daniel Brown*, Wonjoon Goo*, Prabhat Nagarajan, and Scott Niekum

  5. Inverse Reinforcement Learning. Current approaches … 1. Can’t do better than the demonstrator. 2. Are hard to scale to complex problems. IRL via Ranked Demonstrations: We find a reward function that explains the ranking, allowing for extrapolation. Inverse Reinforcement Learning becomes standard binary classification.

  6. Trajectory-ranked Reward Extrapolation (T-REX)

  8. Trajectory-ranked Reward Extrapolation (T-REX): Given ranked demonstrations, how do we train the reward function?

  17. Trajectory-ranked Reward Extrapolation (T-REX): We subsample trajectories to create a large dataset of weakly labeled pairs!

  20. Trajectory-ranked Reward Extrapolation (T-REX)
      • Simple:
        • IRL as binary classification.
        • No human supervision during policy learning.
        • No inner-loop MDP solver.
        • No inference time data collection (e.g. GAIL).
        • No action labels required.
      • Scales to high-dimensional tasks (e.g. Atari games)
      • Can produce policies much better than demonstrator

  21. T-REX Policy Performance

  22. T-REX on HalfCheetah: Best demo (88.97) vs. T-REX (143.40)

  23. Reward Extrapolation: T-REX can extrapolate beyond the performance of the best demo. Panels: HalfCheetah, Hopper, Ant.

  24. Results: Atari Games. T-REX outperforms best demonstration on 7 out of 8 games!

  25. T-REX on Enduro: Best demo (84) vs. T-REX (520)

  26. Come see our poster @ Pacific Ballroom #47. Human demos / ranking labels. Robust to noisy ranking labels. Automatic ranking by watching a learner improve at a task. Reward function visualization.

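
The training recipe described on slides 8, 17 and 20 (ranked demonstrations, subsampled weakly labeled pairs, IRL as binary classification) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the linear reward model, the logistic pairwise loss, plain gradient descent, the toy data, and the names `subsample_pairs` and `train_reward` are all assumptions made for brevity, since the slides do not specify them.

```python
import numpy as np


def _random_snippet(traj, snippet_len, rng):
    """Cut a random contiguous window of snippet_len steps out of one trajectory."""
    start = rng.integers(0, len(traj) - snippet_len + 1)
    return traj[start:start + snippet_len]


def subsample_pairs(ranked_trajs, n_pairs, snippet_len, rng):
    """Build weakly labeled snippet pairs from demonstrations sorted worst -> best.

    Each snippet inherits the rank of its source trajectory (the "weak" label).
    Returns summed observations for snippets a and b, and label 1.0 if b is preferred.
    """
    feats_a, feats_b, labels = [], [], []
    for _ in range(n_pairs):
        i, j = sorted(rng.choice(len(ranked_trajs), size=2, replace=False))
        worse = _random_snippet(ranked_trajs[i], snippet_len, rng).sum(axis=0)
        better = _random_snippet(ranked_trajs[j], snippet_len, rng).sum(axis=0)
        if rng.random() < 0.5:  # shuffle presentation order so both classes occur
            feats_a.append(worse)
            feats_b.append(better)
            labels.append(1.0)
        else:
            feats_a.append(better)
            feats_b.append(worse)
            labels.append(0.0)
    return np.array(feats_a), np.array(feats_b), np.array(labels)


def train_reward(feats_a, feats_b, labels, lr=0.01, epochs=300):
    """Fit a linear reward r(s) = theta . s by binary classification on pairs.

    P(b preferred) = sigmoid(return_b - return_a); we minimise the logistic loss.
    """
    theta = np.zeros(feats_a.shape[1])
    diff = feats_b - feats_a  # difference of summed observations per pair
    for _ in range(epochs):
        logits = diff @ theta
        p_b = 0.5 * (1.0 + np.tanh(0.5 * logits))  # numerically stable sigmoid
        grad = diff.T @ (p_b - labels) / len(labels)
        theta -= lr * grad
    return theta


rng = np.random.default_rng(0)
# Toy data (assumed): 6 demos sorted worst -> best; feature 0 grows with rank,
# feature 1 is pure noise.
demos = [
    np.column_stack([rng.normal(k, 1.0, 50), rng.normal(0.0, 1.0, 50)])
    for k in range(6)
]
fa, fb, y = subsample_pairs(demos, n_pairs=500, snippet_len=10, rng=rng)
theta = train_reward(fa, fb, y)
print("learned reward weights:", theta)
```

Because the toy reward is linear, a snippet's predicted return is `theta` dotted with its summed observations, which is why only the sums are stored here; a nonlinear reward model would need the full snippets. Note that neither function uses action labels or an environment, in line with the properties listed on slide 20.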