DualDICE
Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
Ofir Nachum,* Yinlam Chow,* Bo Dai, Lihong Li
Google Research
*Equal contribution
Reinforcement Learning
A policy acts on an
Initial state distribution β
s0 ~ β
a0 ~ 𝛒(-|s0), r0 ~ R(-|s0, a0), s1 ~ T(-|s0, a0)
a1 ~ 𝛒(-|s1), r1 ~ R(-|s1, a1), s2 ~ T(-|s1, a1)
a2 ~ 𝛒(-|s2), r2 ~ R(-|s2, a2), next state ~ T(-|s2, a2)
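The interaction loop on this slide can be sketched as a short rollout. This is a minimal illustration, not code from the talk: the function name `sample_trajectory`, the tabular 2-state MDP, and all of its numbers are hypothetical, and the reward is a deterministic table standing in for a draw from R(-|s, a).

```python
import random

def sample_trajectory(beta, policy, T, R, horizon, rng):
    """Roll out `horizon` steps and return a list of (s, a, r, s_next) tuples."""
    states = list(range(len(beta)))
    actions = list(range(len(policy[0])))
    s = rng.choices(states, weights=beta)[0]              # s0 ~ beta
    steps = []
    for _ in range(horizon):
        a = rng.choices(actions, weights=policy[s])[0]    # a_t ~ policy(-|s_t)
        r = R[s][a]                                       # stand-in for r_t ~ R(-|s_t, a_t)
        s_next = rng.choices(states, weights=T[s][a])[0]  # s_{t+1} ~ T(-|s_t, a_t)
        steps.append((s, a, r, s_next))
        s = s_next
    return steps

# Hypothetical 2-state, 2-action MDP; every number is made up for illustration.
beta = [1.0, 0.0]                      # always start in state 0
policy = [[0.5, 0.5], [0.9, 0.1]]      # policy[s][a]
T = [[[0.8, 0.2], [0.1, 0.9]],         # T[s][a][s_next]
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[0.0, 1.0], [2.0, 0.0]]           # R[s][a]

trajectory = sample_trajectory(beta, policy, T, R, horizon=3, rng=random.Random(0))
```

Each tuple's `s_next` is the `s` of the following tuple, matching the chain s0, s1, s2 on the slide.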
where transitions (s, a, r, s’) are from some unknown distribution:
(s, a, r, s’), (s, a, r, s’), (s, a, r, s’), (s, a, r, s’), (s, a, r, s’), ...
knowledge of d^D(s, a), only samples.
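To see why corrections are worth estimating: in the DualDICE paper, the correction for a state-action pair is the ratio of the target policy's discounted stationary distribution to d^D at that pair, and the policy's normalized discounted value is the expectation of correction times reward over d^D. The sketch below assumes the ratios are already available as a lookup table. The name `estimate_policy_value` and all numbers are hypothetical.

```python
def estimate_policy_value(transitions, correction):
    """Average of correction(s, a) * r over logged (s, a, r, s_next) tuples."""
    weighted = [correction[(s, a)] * r for s, a, r, _s_next in transitions]
    return sum(weighted) / len(weighted)

# Hypothetical logged transitions and correction ratios, made up for illustration.
transitions = [(0, 0, 1.0, 1), (0, 1, 0.0, 0), (1, 0, 2.0, 1)]
correction = {(0, 0): 0.5, (0, 1): 1.5, (1, 0): 1.0}

value_estimate = estimate_policy_value(transitions, correction)
```

Only samples from the unknown distribution enter the average; d^D itself is never evaluated.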
minimize squared Bellman error
s0 s1 s2 s3 s4
maximize initial “nu-values”
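Written out, the two goals combine into a single objective over a function ν of state-action pairs. The form below follows the DualDICE paper rather than anything legible on the slide. It assumes γ is the discount factor and writes π for the target policy, which the trajectory slide renders as 𝛒.

```latex
\min_{\nu : S \times A \to \mathbb{R}} \;
\underbrace{\tfrac{1}{2}\, \mathbb{E}_{(s,a) \sim d^{D}}
  \Big[ \big( \nu(s,a) - \gamma\, \mathbb{E}_{s' \sim T(\cdot|s,a),\, a' \sim \pi(\cdot|s')}
  [\nu(s',a')] \big)^{2} \Big]}_{\text{squared Bellman error}}
\;-\;
\underbrace{(1-\gamma)\, \mathbb{E}_{s_0 \sim \beta,\, a_0 \sim \pi(\cdot|s_0)}
  \big[ \nu(s_0,a_0) \big]}_{\text{initial ``nu-values''}}
```

The Bellman operator here carries no reward term. According to the paper, the Bellman residual of the minimizer at (s, a) equals the distribution correction at (s, a).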
solution by application of Fenchel conjugate!
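The conjugate identity in question is (1/2) x^2 = max over ζ of (x ζ - (1/2) ζ^2), attained at ζ = x. Applying it to the squared Bellman error moves the expectation over the next state outside the square, so single sampled transitions suffice. The sketch below evaluates the resulting min-max objective on tabular data. It is an assumption-laden illustration: `saddle_objective` and `expected_nu` are hypothetical names, the expectation over the next action is computed exactly rather than sampled, and all numbers are made up.

```python
def expected_nu(nu, policy, s):
    """E_{a ~ policy(-|s)}[nu(s, a)] for tabular nu[s][a] and policy[s][a]."""
    return sum(p * q for p, q in zip(policy[s], nu[s]))

def saddle_objective(nu, zeta, transitions, initial_states, policy, gamma):
    """Empirical objective to minimize over nu and maximize over zeta."""
    total = 0.0
    for s, a, _r, s_next in transitions:
        # Zero-reward Bellman residual of nu on one sampled transition.
        residual = nu[s][a] - gamma * expected_nu(nu, policy, s_next)
        # Fenchel form of 0.5 * residual**2, linear in the residual.
        total += residual * zeta[s][a] - 0.5 * zeta[s][a] ** 2
    bellman_term = total / len(transitions)
    initial_term = sum(expected_nu(nu, policy, s0) for s0 in initial_states) / len(initial_states)
    return bellman_term - (1.0 - gamma) * initial_term

# Hypothetical tabular values, made up for illustration.
nu = [[1.0, 0.0], [0.0, 2.0]]
zeta = [[0.5, 0.0], [0.0, 0.0]]    # zeta[0][0] equals the residual at (0, 0)
policy = [[0.5, 0.5], [0.5, 0.5]]
value = saddle_objective(nu, zeta, [(0, 0, 0.0, 1)], [0], policy, gamma=0.5)
```

With ζ set to the residual, the first term reduces to half the squared residual, so the value matches the squared-error objective on this one-sample dataset.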