Cautious Adaptation For RL in Safety-Critical Settings
International Conference on Machine Learning 2020
Jesse Zhang¹, Brian Cheung¹, Chelsea Finn², Sergey Levine¹, Dinesh Jayaraman³
Outline: Short overview (4 minutes)
safety-critical settings
environments safely?
○ Pretraining: Sandbox environments
○ Adaptation: Safety-critical target environment
Transfer risk knowledge from prior experience
○ Pretraining: probabilistic models capture state transition uncertainty¹
○ Adaptation: utilize uncertainty to safely adapt to new environment (planning cost function modification)
¹ PETS (Chua et al., 2018)
○ Cartpole (varying pole lengths)
○ Duckietown¹ (varying car width)
○ Half Cheetah (varying disabled joint)
¹ Duckietown (Chevalier-Boisvert et al., 2018)
○ Probabilistic dynamics models
○ Comparison to other methods
○ Average reward, number of catastrophic events
(2015); Rajeswaran et al. (2016);
Fisac et al. (2017); Sadigh & Kapoor (2017); Berkenkamp et al. (2017); Ostafew et al. (2016); Hakobyan et al. (2019); Hanssen & Foss (2015); Hewing et al. (2019); Aswani et al. (2013)
Nagabandi et al. (2018); Sæmundsson et al. (2018); Finn et al. (2017)
○ Ensemble of probabilistic dynamics models
○ Trajectory sampling for candidate action selection
○ Sequence with highest action score is executed
Action score: computed over the predicted trajectories with actions A, from the reward for the i'th trajectory
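The selection step described here can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the action score is the mean reward across the sampled trajectories, and `predict_rewards` and `toy_predict_rewards` are hypothetical stand-ins for rolling out the learned ensemble.

```python
import random


def action_score(trajectory_rewards):
    """Mean reward over the predicted trajectories for one action sequence."""
    return sum(trajectory_rewards) / len(trajectory_rewards)


def select_action_sequence(candidates, predict_rewards):
    """Return the candidate action sequence with the highest action score.

    candidates: list of action sequences.
    predict_rewards: hypothetical callable mapping an action sequence to a
        list of total rewards, one per sampled trajectory.
    """
    return max(candidates, key=lambda a: action_score(predict_rewards(a)))


def toy_predict_rewards(actions, n_trajectories=20, seed=0):
    """Toy stand-in for the learned models (an assumption for illustration):
    reward is the negative squared sum of actions plus Gaussian noise."""
    rng = random.Random(seed)
    base = -sum(actions) ** 2
    return [base + rng.gauss(0.0, 0.1) for _ in range(n_trajectories)]


candidates = [[1.0, 1.0], [0.5, -0.5], [-2.0, 0.0]]
best = select_action_sequence(candidates, toy_predict_rewards)
```

In a planner, `candidates` would be resampled at every step and only the first action of `best` executed before replanning.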
single environment
○ CARL: Captures uncertainty induced by variations across environments
○ Randomly sample domain
○ Dynamics model captures uncertainty about state transitions, reward, and risk
○ Risk-averse action selection
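As a rough sketch of how sampling domains during pretraining can show up as model uncertainty, the toy below fits a small ensemble on a hypothetical 1-D system `s' = s + k*a`, where the gain `k` plays the role of the varying domain parameter. Everything here is an assumption for illustration: the system, the least-squares members (instead of probabilistic neural networks), and the simplification that each member sees data from a single sampled domain.

```python
import random
import statistics


def sample_domain(rng):
    """Randomly sample a domain parameter (hypothetical gain k)."""
    return rng.uniform(0.5, 1.5)


def collect_transitions(k, rng, n=50):
    """Collect (s, a, s') tuples from random actions in the domain with gain k."""
    data = []
    for _ in range(n):
        s, a = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append((s, a, s + k * a + rng.gauss(0.0, 0.01)))
    return data


def fit_member(data):
    """Least-squares estimate of k from (s, a, s') tuples."""
    num = sum(a * (s2 - s) for s, a, s2 in data)
    den = sum(a * a for _, a, _ in data)
    return num / den


rng = random.Random(0)
# Simplification: each ensemble member is fit on its own sampled domain.
ensemble = [fit_member(collect_transitions(sample_domain(rng), rng)) for _ in range(5)]


def predict(s, a):
    """Per-member next-state predictions; their spread reflects domain uncertainty."""
    return [s + k * a for k in ensemble]


preds = predict(0.0, 1.0)
spread = statistics.pstdev(preds)
```

The disagreement in `preds` grows with the magnitude of the action, which is the kind of signal a risk-averse planner can use when the target domain is unknown.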
Case 1: Low Reward Risk-Aversion, CARL (Reward)
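One way to make the action score averse to low reward is to average only the worst-performing fraction of predicted trajectories rather than all of them. The sketch below assumes that form; the function name and the `worst_fraction` parameter are hypothetical, and the slide text does not state the exact rule used.

```python
import math


def risk_averse_score(trajectory_rewards, worst_fraction=0.5):
    """Mean reward over the worst `worst_fraction` of predicted trajectories.

    With worst_fraction=1.0 this reduces to the plain mean.
    """
    rewards = sorted(trajectory_rewards)
    k = max(1, math.ceil(worst_fraction * len(rewards)))
    return sum(rewards[:k]) / k


# Two candidate action sequences with hypothetical predicted rewards.
safe = [1.0, 1.0, 1.0, 1.0]
risky = [4.0, 4.0, 4.0, -6.0]

mean_prefers_risky = sum(risky) / len(risky) > sum(safe) / len(safe)
cautious_prefers_safe = risk_averse_score(safe) > risk_averse_score(risky)
```

Here the plain mean favours `risky` (1.5 versus 1.0), while the pessimistic score favours `safe` (1.0 versus -1.0), because the single bad outcome dominates the worst half.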
Case 2: Catastrophic State Risk-Aversion, CARL (State)
a catastrophic set
Case 2: Catastrophic State Risk-Aversion
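A simple way to express aversion to catastrophic states is to estimate, from the sampled trajectories, how often a candidate action sequence enters the catastrophic set, and to subtract a penalty proportional to that estimate from the mean reward. The sketch below assumes this form; `is_catastrophic`, `penalty`, and the toy position threshold are hypothetical choices, not values from the talk.

```python
def catastrophe_probability(trajectories, is_catastrophic):
    """Fraction of predicted trajectories that enter the catastrophic set."""
    hits = sum(any(is_catastrophic(s) for s in traj) for traj in trajectories)
    return hits / len(trajectories)


def state_risk_averse_score(trajectories, rewards, is_catastrophic, penalty=10.0):
    """Mean reward minus a penalty on the estimated catastrophe probability."""
    mean_reward = sum(rewards) / len(rewards)
    return mean_reward - penalty * catastrophe_probability(trajectories, is_catastrophic)


def off_track(x):
    """Toy catastrophic set (assumption): a 1-D position beyond |x| > 1."""
    return abs(x) > 1.0


# Four predicted state trajectories for one candidate action sequence.
trajectories = [
    [0.0, 0.5, 0.9],
    [0.0, 0.7, 1.2],  # enters the catastrophic set
    [0.0, 0.2, 0.3],
    [0.0, 0.4, 0.8],
]
rewards = [2.0, 3.0, 1.0, 2.0]

p_cat = catastrophe_probability(trajectories, off_track)
score = state_risk_averse_score(trajectories, rewards, off_track)
```

A larger `penalty` makes the planner more conservative; setting it to zero recovers the plain mean-reward score.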
¹ (Pinto et al., 2017)
² (Finn et al., 2017)
○ Train on sandbox environments, adapt to safety-critical environments
○ Capture source uncertainty, perform risk-averse planning
Thank you!