SLIDE 23 Simulated Benchmark
[Figure: learning curves (average return vs. million environment steps) on six continuous-control benchmark tasks: Hopper-v2, Walker2d-v2, HalfCheetah-v2, Ant-v2, Humanoid-v2, and Humanoid (rllab). Each panel compares SAC (learned temperature), SAC (fixed temperature), DDPG, TD3, and PPO.]
Figure taken from [9]
◮ SAC performs comparably to the baselines on the simpler tasks ◮ SAC exceeds the baselines on the more challenging, high-dimensional tasks
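The "average return" plotted in the figure is the undiscounted episode return averaged over evaluation rollouts. A minimal sketch of that evaluation loop, using a hypothetical toy environment (`DummyEnv`) as a stand-in for the Gym/MuJoCo tasks shown above:

```python
class DummyEnv:
    """Toy stand-in for a Gym-style environment (hypothetical;
    the benchmarks above use MuJoCo tasks such as Hopper-v2)."""
    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        # (observation, reward, done, info) — reward 1 per step
        return 0.0, 1.0, self.t >= self.horizon, {}


def evaluate_average_return(env, policy, num_episodes=10):
    """Average undiscounted return over evaluation episodes,
    i.e. the quantity on the y-axis of the learning curves."""
    total = 0.0
    for _ in range(num_episodes):
        obs, done, ep_ret = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            ep_ret += reward
        total += ep_ret
    return total / num_episodes


print(evaluate_average_return(DummyEnv(), lambda obs: 0))  # → 5.0
```

During training this evaluation is typically run periodically (e.g. every few thousand environment steps) with the policy's deterministic mean action, which produces the points on the curves above.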
Finn Rietz – Soft actor-critic: Deep reinforcement learning for Robotics 23 / 26