Improving Imitation Learning with Reinforcement Learning

SLIDE 1

MIN Faculty Department of Informatics

Improving Imitation Learning with Reinforcement Learning

Niklas Fiedler

University of Hamburg
Faculty of Mathematics, Informatics and Natural Sciences
Department of Informatics
Technical Aspects of Multimodal Systems

November 26, 2019

  • N. Fiedler – Improving Imitation Learning with Reinforcement Learning

1 / 23

SLIDE 2

Outline

Introduction Imitation Learning Combining RL and IL Conclusion

  • 1. Introduction
       Motivation
  • 2. Imitation Learning
       Demonstration Methods
       Behavioral Cloning
       Inverse Reinforcement Learning
  • 3. Combining Reinforcement Learning and Imitation Learning
       BC Application
       IRL Application
  • 4. Conclusion

SLIDE 3

Goal

◮ Imitate expert behavior
◮ Improve learning by including knowledge given by demonstration
◮ Learn expert policies
→ Make use of expert demonstrations

SLIDE 4

Motivation

Humans are Awesome

https://rejectedprincesses.tumblr.com/post/150495232038/chynara-madinkulova-long-hair-and-aida-akmatova

SLIDE 5

Motivation

Learning from Demonstration

Learning from experts is natural behavior

[Haw50], https://www.wakecounseling.com/therapy-blog/play-therapy

SLIDE 6

Imitation Learning

Method to learn a behavior based on a demonstration.
Demonstrations can take various forms.
Two prominent methods of implementation:

  • 1. Behavioral Cloning
  • 2. Inverse Reinforcement Learning

SLIDE 9

Demonstration Methods

◮ Virtual/Augmented Reality
◮ Tracking of Human Motions
◮ Teleoperation
◮ Video Stream

s3.ap-south-1.amazonaws.com/kidobotikz.sprw/master/assets/images/blog/blog-2018110811630.jpg
siamagazin.com/bimanual-teleoperation-of-a-compliant-whole-body-controlled-humanoid-robot/
https://ar-tracking.com/applications/motion-capture/
https://www.youtube.com/watch?v=5BTIE_fhReo

SLIDE 13

Behavioral Cloning

◮ Training a direct link between demonstrated input and output
◮ Large amounts of training data necessary
◮ Poor generalization
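
The "direct link between demonstrated input and output" can be read as plain supervised learning on expert state-action pairs. A minimal sketch, assuming a linear policy fitted with NumPy least squares; the expert controller, the data, and the names fit_bc_policy and act are hypothetical and only serve the illustration:

```python
import numpy as np

def fit_bc_policy(states, actions):
    """Fit a linear policy a = [s, 1] @ W to expert pairs by least squares."""
    X = np.hstack([states, np.ones((states.shape[0], 1))])  # append bias column
    W, *_ = np.linalg.lstsq(X, actions, rcond=None)
    return W

def act(W, state):
    """Apply the cloned policy to a single state vector."""
    return np.append(state, 1.0) @ W

# Hypothetical expert: a proportional controller driving the state toward zero.
rng = np.random.default_rng(0)
demo_states = rng.uniform(-1.0, 1.0, size=(200, 2))
demo_actions = -0.5 * demo_states

W = fit_bc_policy(demo_states, demo_actions)
print(act(W, np.array([0.4, -0.2])))
```

The fit only sees states from the demonstrated range. Nothing constrains the policy elsewhere, which is one reading of the poor generalization noted above.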

SLIDE 14

Behavioral Cloning

Video

https://www.youtube.com/watch?v=5BTIE_fhReo

SLIDE 15

Inverse Reinforcement Learning

Reinforcement Learning

SLIDE 16

Inverse Reinforcement Learning

Reinforcement Learning vs. Inverse Reinforcement Learning

                             RL                                   IRL
given (partially observed)   reward function R                    policy π or history sampled from that policy
searching                    optimal policy π for given reward    reward function R for which given behavior is optimal

https://thinkingwires.com/posts/2018-02-13-irl-tutorial-1.html
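
The two directions can be made concrete on a toy problem. The sketch below assumes a hypothetical four-state chain with deterministic left/right moves: rl maps a reward to its optimal policy via value iteration, while explains only checks whether a candidate reward makes a demonstrated policy optimal. It is not a full IRL algorithm, and all names and numbers are illustrative:

```python
import numpy as np

N_STATES, GAMMA = 4, 0.9
ACTIONS = (-1, +1)  # move left, move right

def step(state, action):
    """Deterministic transition on a chain; walls clamp the state."""
    return min(max(state + action, 0), N_STATES - 1)

def q_values(reward, iters=200):
    """Value iteration; reward[s] is received on entering state s."""
    v = np.zeros(N_STATES)
    for _ in range(iters):
        q = np.array([[reward[step(s, a)] + GAMMA * v[step(s, a)]
                       for a in ACTIONS] for s in range(N_STATES)])
        v = q.max(axis=1)
    return q

def rl(reward):
    """RL direction: reward function -> greedy policy as action indices."""
    return q_values(reward).argmax(axis=1)

def explains(reward, demo_policy, tol=1e-9):
    """IRL-style check: is the demonstrated policy optimal under this reward?"""
    q = q_values(reward)
    chosen = q[np.arange(N_STATES), demo_policy]
    return bool(np.all(chosen >= q.max(axis=1) - tol))

demo_policy = np.array([1, 1, 1, 1])          # expert always moves right
goal_right = np.array([0.0, 0.0, 0.0, 1.0])   # candidate reward: goal at right end
goal_left = np.array([1.0, 0.0, 0.0, 0.0])    # candidate reward: goal at left end

print(rl(goal_right), explains(goal_right, demo_policy), explains(goal_left, demo_policy))
```

An all-zero reward also passes the explains check for any policy, which illustrates why IRL needs additional criteria to pick among the rewards consistent with a demonstration.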

SLIDE 17

Inverse Reinforcement Learning

https://medium.com/@sanketgujar95/generative-adversarial-imitation-learning-266f45634e60

SLIDE 18

Imitation Learning

Behavioral Cloning vs. Inverse Reinforcement Learning

Behavioral Cloning
◮ Weak generalization
◮ Relatively low computational effort

Inverse Reinforcement Learning
◮ Strong generalization
◮ Large computational effort
◮ Complex structure

SLIDE 19

Combining Reinforcement Learning and Imitation Learning

◮ Reducing the impact of the shortcomings of both methods
◮ Learned behavior should outperform the demonstrators after RL is applied
◮ Accelerated training process
◮ Extending the capabilities learned with imitation learning

SLIDE 20

BC Application

Overcoming Exploration in Reinforcement Learning with Demonstrations

Ashvin Nair¹,², Bob McGrew¹, Marcin Andrychowicz¹, Wojciech Zaremba¹ and Pieter Abbeel¹,²

2018 IEEE International Conference on Robotics and Automation (ICRA)

¹ OpenAI
² University of California, Berkeley
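
One way to use demonstrations inside an RL update, in the spirit of [NMA+18], is an auxiliary behavioral cloning loss on demonstration samples that is applied only where the critic rates the demonstrated action above the policy's own action. The sketch below illustrates that filtering idea under stated assumptions; actor and critic are hypothetical stand-in functions, not the networks or code from the paper:

```python
import numpy as np

def q_filtered_bc_loss(actor, critic, demo_states, demo_actions):
    """Squared BC error, kept only where the critic prefers the demo action."""
    policy_actions = actor(demo_states)
    keep = critic(demo_states, demo_actions) > critic(demo_states, policy_actions)
    sq_err = np.sum((policy_actions - demo_actions) ** 2, axis=1)
    return float(np.sum(sq_err * keep))

# Hypothetical stand-ins for learned networks.
def actor(states):
    return np.zeros_like(states)  # current policy: always output zero

def critic(states, actions):
    # Toy critic that prefers actions close to the state vector.
    return -np.sum((actions - states) ** 2, axis=1)

demo_states = np.array([[1.0, 0.0], [0.0, 0.0]])
demo_actions = np.array([[1.0, 0.0], [0.5, 0.5]])

loss = q_filtered_bc_loss(actor, critic, demo_states, demo_actions)
print(loss)
```

In this toy data the second demonstration is filtered out because the critic already prefers the policy's action there, so only the first sample contributes to the loss.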

SLIDE 21

BC Application

Goal

◮ Pushing
◮ Sliding
◮ Pick and Place

[NMA+18]

SLIDE 22

BC Application

Results

[NMA+18]

SLIDE 23

IRL Application

Reinforcement and Imitation Learning for Diverse Visuomotor Skills

Yuke Zhu¹, Ziyu Wang², Josh Merel², Andrei Rusu², Tom Erez², Serkan Cabi², Saran Tunyasuvunakool², János Kramár², Raia Hadsell², Nando de Freitas² and Nicolas Heess²

¹ Computer Science Department, Stanford University
² DeepMind

SLIDE 24

IRL Application

Goal

[ZWM+18]

SLIDE 25

IRL Application

Method

[ZWM+18]
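
One way to combine imitation and task learning, in the spirit of [ZWM+18], is to mix a discriminator-based imitation reward (as in GAIL) with the task reward. A minimal sketch, assuming d_prob is a discriminator's estimate that a state-action pair came from the demonstrator; the weight lam, the cap, and the function names are hypothetical choices for illustration:

```python
import math

def gail_reward(d_prob, cap=10.0):
    """Imitation reward from a discriminator probability, capped for stability."""
    # Clamp the argument of log so d_prob = 1.0 does not raise an error.
    return min(-math.log(max(1.0 - d_prob, 1e-12)), cap)

def hybrid_reward(task_reward, d_prob, lam=0.5):
    """Weighted mix: lam = 1 is pure imitation, lam = 0 is pure task RL."""
    return lam * gail_reward(d_prob) + (1.0 - lam) * task_reward

# Equal weighting of a task reward of 2.0 and an imitation reward of 0.0.
print(hybrid_reward(2.0, 0.0, lam=0.5))
```

Setting lam between the two extremes lets the demonstrations guide exploration while the task reward still allows the agent to improve beyond the demonstrator.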

SLIDE 26

IRL Example

Results - Block stacking

[ZWM+18]

SLIDE 27

Combining Reinforcement Learning and Imitation Learning

Comparison

BC Approach
◮ Behavioral Cloning
◮ Simulation only
◮ Goal: improve training performance and task complexity

IRL Approach
◮ Inverse Reinforcement Learning
◮ Policies transferred to real robot
◮ Goal: improve result performance and task complexity

SLIDE 28

Conclusion

◮ Behavioral cloning is a convenient option to directly mimic expert behavior
◮ Inverse reinforcement learning is able to learn expert policies
◮ More complex reinforcement learning tasks can be realized
◮ When combined with reinforcement learning, demonstrators can be outperformed

SLIDE 29

Useful Links

References

GAIL explained in a blog post:

https://medium.com/@sanketgujar95/generative-adversarial-imitation-learning-266f45634e60

Behavioral Cloning explained in a blog post:

https://medium.com/@ksakmann/behavioral-cloning-make-a-car-drive-like-yourself-dc6021152713

Source code and model of behavioral cloning based self-driving car:

https://github.com/ksakmann/CarND-BehavioralCloning

SLIDE 30

References

[Haw50] TH Hawkins, Opening of milk bottles by birds, Nature 165 (1950), no. 4194, 435–436.

[NMA+18] Ashvin Nair, Bob McGrew, Marcin Andrychowicz, Wojciech Zaremba, and Pieter Abbeel, Overcoming exploration in reinforcement learning with demonstrations, 2018 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2018, pp. 6292–6299.

[ZWM+18] Yuke Zhu, Ziyu Wang, Josh Merel, Andrei Rusu, Tom Erez, Serkan Cabi, Saran Tunyasuvunakool, János Kramár, Raia Hadsell, Nando de Freitas, et al., Reinforcement and imitation learning for diverse visuomotor skills, arXiv preprint arXiv:1802.09564 (2018).

SLIDE 31

Prediction in Collaboration

https://www.kuka.com/-/media/kuka-corporate/images/industries/case-studies/schwingenmontage/flexfellow_mrk_header.jpg

SLIDE 32

IRL Application

Network Structure

[ZWM+18]

SLIDE 33

IRL Application

Full Results

[ZWM+18]
