© Han and Sung, ICML 2019
Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning
Seungyul Han and Youngchul Sung
Dept. of Electrical Engineering
KAIST
ICML 2019, Long Beach, CA, USA
Jun. 12, 2019
$\rho_m = \frac{\pi_\theta(a_m|s_m)}{\pi_{\theta_i}(a_m|s_m)}$, for $m = 0, \dots, M-1$, is the importance sampling (IS) weight.
$\rho'_t := |1 - \rho_t| + 1$: a larger value makes more zero-gradient samples.
In high-dimensional tasks, $\rho'_t$ is much larger than in lower-dimensional tasks.
Figure: $\rho'_t$ (left) and the amount of gradient vanishing (right)
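The link between dimensionality and zero-gradient samples can be illustrated with a minimal sketch. It assumes a PPO-style clipping rule with range `eps` and a policy whose action dimensions are independent, so that the total IS weight is the product of per-dimension ratios; neither assumption is stated on the slide, and the function names and the value `eps=0.2` are hypothetical.

```python
import math

def total_is_weight(dim_ratios):
    """Total IS weight rho_t as the product of per-dimension ratios
    (assumes independent action dimensions)."""
    return math.prod(dim_ratios)

def rho_prime(rho):
    """Deviation measure rho'_t := |1 - rho_t| + 1."""
    return abs(1.0 - rho) + 1.0

def has_zero_gradient(rho, advantage, eps=0.2):
    """PPO-style clipping (an assumption): the sample contributes no gradient
    when the clipped branch of the surrogate is the active one."""
    return (advantage > 0 and rho > 1.0 + eps) or (advantage < 0 and rho < 1.0 - eps)

# Same small per-dimension deviation (5%), different action dimensions.
low_dim = total_is_weight([1.05] * 2)    # 2 action dimensions
high_dim = total_is_weight([1.05] * 20)  # 20 action dimensions

print(rho_prime(low_dim), has_zero_gradient(low_dim, advantage=1.0))
print(rho_prime(high_dim), has_zero_gradient(high_dim, advantage=1.0))
```

With the same per-dimension deviation, the 20-dimensional weight drifts well outside the clipping range while the 2-dimensional one stays inside it.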
Clip the dimension-wise IS weight $\rho_{t,d} = \frac{\pi_\theta(a_{t,d}|s_t)}{\pi_{\theta_i}(a_{t,d}|s_t)}$ instead of the total IS weight $\rho_t$.
Add the loss term $\frac{1}{2M} \sum_{m=0}^{M-1} (\log \rho_m)^2$, which enables stable learning.
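The two ingredients above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: per-dimension log-probabilities are assumed to be available, the clipping range `eps=0.2` is a hypothetical value, and how the clipped weights and the loss term enter the full objective is not shown on the slide and is left out here.

```python
import math

def dimension_wise_weights(logp_new, logp_old):
    """rho_{t,d} = pi_theta(a_{t,d}|s_t) / pi_theta_i(a_{t,d}|s_t),
    computed from per-dimension log-probabilities of one sample."""
    return [math.exp(n - o) for n, o in zip(logp_new, logp_old)]

def clip_dimension_wise(rho_dims, eps=0.2):
    """Clip each dimension's weight to [1 - eps, 1 + eps] separately
    (eps is an assumed value)."""
    return [max(1.0 - eps, min(1.0 + eps, r)) for r in rho_dims]

def is_loss(total_weights):
    """(1 / 2M) * sum over m of (log rho_m)^2 for M total IS weights."""
    m = len(total_weights)
    return sum(math.log(r) ** 2 for r in total_weights) / (2 * m)

# One sample with three action dimensions (log-probs are made-up numbers).
rho_dims = dimension_wise_weights([-1.0, -0.5, -2.0], [-1.1, -0.5, -1.5])
clipped = clip_dimension_wise(rho_dims)
rho_total = math.prod(rho_dims)

print(rho_dims)    # only the third dimension falls outside [0.8, 1.2]
print(clipped)     # the other two dimensions keep their unclipped weights
print(is_loss([rho_total]))
```

Clipping per dimension means one badly off dimension does not remove the whole sample's gradient, while the squared-log term penalizes the total weight for drifting away from 1.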
Require $\rho'_{t,d} < 1 + \epsilon_b$ to avoid too much clipping *.
* Seungyul Han and Youngchul Sung, "AMBER: Adaptive Multi-Batch Experience Replay for Continuous Action Control," arXiv, Oct.
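A check of the threshold condition might look like the sketch below. Several points are assumptions rather than content of the slide: the quantity is taken to be $|1 - \rho_{t,d}| + 1$ per dimension, it is averaged over all samples and dimensions of a batch, and `eps_b=0.1` is a hypothetical value. The function names are illustrative only.

```python
def mean_rho_prime(rho_dims_batch):
    """Average of |1 - rho_{t,d}| + 1 over all samples t and dimensions d
    (averaging over the batch is an assumption)."""
    values = [abs(1.0 - r) + 1.0 for sample in rho_dims_batch for r in sample]
    return sum(values) / len(values)

def within_clipping_budget(rho_dims_batch, eps_b=0.1):
    """True when the batch satisfies the condition rho' < 1 + eps_b
    (eps_b is an assumed value)."""
    return mean_rho_prime(rho_dims_batch) < 1.0 + eps_b

# Two samples with two action dimensions each, weights close to 1.
near_policy = [[1.0, 1.05], [0.95, 1.0]]
# One sample whose weights have drifted far from 1.
far_policy = [[1.5, 0.6]]

print(within_clipping_budget(near_policy))
print(within_clipping_budget(far_policy))
```

A batch that fails the check would have most of its dimension-wise weights clipped, which is the situation the condition is meant to avoid.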