Lecture 2: Gradient Estimators
CSC 2547 Spring 2018 David Duvenaud
Based mainly on slides by Will Grathwohl, Dami Choi, Yuhuai Wu and Geoff Roeder
Where do we see this guy?
$$\mathcal{L}(\theta) = \mathbb{E}_{p(b|\theta)}[f(b)]$$
Just about everywhere!
is the standard method used today to optimize expectations
neural-net based
be computed analytically
distribution and function being optimized
suffer from high variance
$$\hat{g}_{\text{REINFORCE}}[f] = f(b)\,\frac{\partial}{\partial\theta} \log p(b|\theta), \qquad b \sim p(b|\theta)$$
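As a minimal sketch of the REINFORCE estimator above, assume b is a Bernoulli variable with p(b = 1 | θ) = θ and take the hypothetical objective f(b) = (b − 0.45)². Both choices, and every function name below, are illustrative assumptions rather than anything fixed by the slides. The estimator only ever evaluates f, never differentiates it.

```python
import random

TARGET = 0.45  # assumed constant in the toy objective


def f(b):
    """Toy black-box objective; only its value at b is used."""
    return (b - TARGET) ** 2


def score(b, theta):
    """d/dtheta log p(b|theta) for a Bernoulli with p(b=1) = theta."""
    return 1.0 / theta if b == 1.0 else -1.0 / (1.0 - theta)


def reinforce_sample(theta, rng):
    """Single-sample REINFORCE estimate: f(b) * d/dtheta log p(b|theta)."""
    b = 1.0 if rng.random() < theta else 0.0
    return f(b) * score(b, theta)


def reinforce_gradient(theta, n_samples=100_000, seed=0):
    """Average many single-sample estimates."""
    rng = random.Random(seed)
    total = sum(reinforce_sample(theta, rng) for _ in range(n_samples))
    return total / n_samples


def exact_gradient(theta):
    """E[f(b)] = theta*f(1) + (1-theta)*f(0), so the gradient is f(1) - f(0)."""
    return f(1.0) - f(0.0)
```

Calling `reinforce_gradient(0.3)` and comparing it with `exact_gradient(0.3)` gives a quick check that the estimate is unbiased; the spread of individual `reinforce_sample` values illustrates the variance problem.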
differentiable
reparameterizable
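$$\hat{g}_{\text{reparam}}[f] = \frac{\partial f}{\partial b}\,\frac{\partial b}{\partial\theta}, \qquad b = T(\theta, \epsilon), \quad \epsilon \sim p(\epsilon)$$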
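A small sketch of the reparameterization estimator, assuming a unit-variance Gaussian b = T(θ, ε) = θ + ε with ε ∼ N(0, 1) and the hypothetical objective f(b) = b². These choices are assumptions made for illustration. Here f must be known and differentiable, since the estimator uses ∂f/∂b directly.

```python
import random


def f(b):
    """Toy differentiable objective."""
    return b * b


def df_db(b):
    """Analytic derivative of the toy objective."""
    return 2.0 * b


def reparam_sample(theta, rng):
    """Single-sample estimate: (df/db) * (db/dtheta) with b = theta + eps."""
    eps = rng.gauss(0.0, 1.0)
    b = theta + eps        # b = T(theta, eps)
    db_dtheta = 1.0        # derivative of T with respect to theta
    return df_db(b) * db_dtheta


def reparam_gradient(theta, n_samples=100_000, seed=0):
    """Average many single-sample estimates."""
    rng = random.Random(seed)
    total = sum(reparam_sample(theta, rng) for _ in range(n_samples))
    return total / n_samples


def exact_gradient(theta):
    """E[b^2] = theta^2 + 1 for b ~ N(theta, 1), so the gradient is 2*theta."""
    return 2.0 * theta
```

For this toy problem each sample equals 2(θ + ε), so its variance is 4 regardless of θ.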
f(b) p(b|θ)
reparameterization
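$$\hat{g}_{\text{concrete}}[f] = \frac{\partial f}{\partial \sigma(z/t)}\,\frac{\partial \sigma(z/t)}{\partial\theta}, \qquad z = T(\theta, \epsilon), \quad \epsilon \sim p(\epsilon)$$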
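The Concrete estimator can be sketched for a single binary variable, assuming the logistic reparameterization z = logit(θ) + logit(u) with u ∼ Uniform(0, 1) and the hypothetical objective f(x) = (x − 0.45)². The derivatives are written out by hand here; in practice an autodiff library would supply them. Note that this is the gradient of the relaxed objective E[f(σ(z/t))], which is in general a biased estimate of the gradient of the discrete objective.

```python
import math
import random

TARGET = 0.45   # assumed constant in the toy objective
CLIP = 1e-12    # keeps the uniform noise away from 0 and 1


def logit(p):
    return math.log(p) - math.log1p(-p)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def f(x):
    """Toy objective, evaluated at the relaxed sample."""
    return (x - TARGET) ** 2


def relaxed_sample(theta, u, temp):
    """sigma(z / t) with z = logit(theta) + logit(u)."""
    return sigmoid((logit(theta) + logit(u)) / temp)


def concrete_grad_sample(theta, u, temp):
    """Chain rule: df/dsigma * dsigma/dz * dz/dtheta."""
    s = relaxed_sample(theta, u, temp)
    df_ds = 2.0 * (s - TARGET)
    ds_dz = s * (1.0 - s) / temp
    dz_dtheta = 1.0 / (theta * (1.0 - theta))
    return df_ds * ds_dz * dz_dtheta


def _noise(n_samples, seed):
    rng = random.Random(seed)
    return [min(max(rng.random(), CLIP), 1.0 - CLIP) for _ in range(n_samples)]


def relaxed_objective(theta, temp=0.5, n_samples=20_000, seed=0):
    """Monte Carlo estimate of E[f(sigma(z/t))]."""
    us = _noise(n_samples, seed)
    return sum(f(relaxed_sample(theta, u, temp)) for u in us) / n_samples


def concrete_gradient(theta, temp=0.5, n_samples=20_000, seed=0):
    """Monte Carlo estimate of the gradient of the relaxed objective."""
    us = _noise(n_samples, seed)
    return sum(concrete_grad_sample(theta, u, temp) for u in us) / n_samples
```

Because both functions reuse the same noise for a given seed, `concrete_gradient` can be checked against a central finite difference of `relaxed_objective`.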
corr(g, c) > 0
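To illustrate why a control variate c that is positively correlated with the estimator g helps, here is a minimal sketch using a constant baseline in REINFORCE. It assumes a Bernoulli b with p(b = 1 | θ) = θ and the hypothetical objective f(b) = (b − 0.45)². The control variate is c = baseline · ∂/∂θ log p(b|θ), which has zero mean, so subtracting it leaves the expectation unchanged.

```python
import random
import statistics

TARGET = 0.45  # assumed constant in the toy objective


def f(b):
    return (b - TARGET) ** 2


def score(b, theta):
    """d/dtheta log p(b|theta) for a Bernoulli with p(b=1) = theta."""
    return 1.0 / theta if b == 1.0 else -1.0 / (1.0 - theta)


def sample_estimates(theta, baseline, n_samples=50_000, seed=0):
    """Samples of g - c with g = f(b)*score and c = baseline*score.

    E[score] = 0, so E[c] = 0 and the mean of g - c equals the mean of g.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_samples):
        b = 1.0 if rng.random() < theta else 0.0
        estimates.append((f(b) - baseline) * score(b, theta))
    return estimates


def summarize(theta, baseline):
    """Return (mean, variance) of the single-sample estimates."""
    samples = sample_estimates(theta, baseline)
    return statistics.fmean(samples), statistics.pvariance(samples)
```

Comparing `summarize(0.3, 0.0)` with `summarize(0.3, 0.25)` shows the same mean but a much smaller variance when the baseline is close to the typical value of f.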
Will Grathwohl Dami Choi Yuhuai Wu Geoff Roeder David Duvenaud
$$\hat{g}_{\text{LAX}} = g_{\text{REINFORCE}}[f] - g_{\text{REINFORCE}}[c_\phi] + g_{\text{reparam}}[c_\phi]$$
reparameterization estimator
variate
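$$\hat{g}_{\text{LAX}} = g_{\text{REINFORCE}}[f] - g_{\text{REINFORCE}}[c_\phi] + g_{\text{reparam}}[c_\phi] = [f(b) - c_\phi(b)]\,\frac{\partial}{\partial\theta} \log p(b|\theta) + \frac{\partial}{\partial\theta} c_\phi(b)$$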
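The LAX estimator can be sketched on a toy problem, assuming b = θ + ε with ε ∼ N(0, 1), a black-box objective f(b) = b², and a hypothetical one-parameter surrogate c_φ(b) = φ b². All of these are illustrative assumptions. The point of the sketch is that the estimate stays unbiased for any φ, while φ changes the variance.

```python
import random
import statistics


def f(b):
    """Toy objective, treated as a black box (only evaluated)."""
    return b * b


def c(b, phi):
    """Hypothetical surrogate c_phi(b) = phi * b^2."""
    return phi * b * b


def dc_db(b, phi):
    """Derivative of the surrogate with respect to b."""
    return 2.0 * phi * b


def lax_samples(theta, phi, n_samples=100_000, seed=0):
    """Samples of [f(b) - c(b)] * dlogp/dtheta + dc(b)/dtheta."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        b = theta + eps                   # reparameterized sample
        dlogp_dtheta = b - theta          # score of N(b; theta, 1)
        dc_dtheta = dc_db(b, phi) * 1.0   # db/dtheta = 1
        samples.append((f(b) - c(b, phi)) * dlogp_dtheta + dc_dtheta)
    return samples


def lax_gradient(theta, phi):
    """Mean of the single-sample LAX estimates."""
    return statistics.fmean(lax_samples(theta, phi))
```

With φ = 0 this reduces to plain REINFORCE; with φ = 1 the surrogate equals f and the estimator reduces to the reparameterization estimator. The true gradient of E[b²] is 2θ in both cases.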
estimates for the gradient of the variance of
ˆ g cφ
and a function where
(Tucker et al. 2017)
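$$\hat{g}_{\text{RELAX}} = [f(b) - c_\phi(\tilde{z})]\,\frac{\partial}{\partial\theta} \log p(b|\theta) + \frac{\partial}{\partial\theta} c_\phi(z) - \frac{\partial}{\partial\theta} c_\phi(\tilde{z})$$
$$b = H(z), \qquad z \sim p(z|\theta), \qquad \tilde{z} \sim p(z|b, \theta)$$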
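A hand-derived sketch of RELAX for a single Bernoulli variable follows. It assumes z = logit(θ) + logit(u), H(z) = 1 if z > 0 else 0, the hypothetical objective f(b) = (b − 0.45)², and a fixed surrogate c(z) = f(σ(z/t)) with no learned parameters. The conditional sample z̃ is drawn by reparameterizing u given b. All derivatives are written out manually; an autodiff library would normally handle them.

```python
import math
import random

TARGET = 0.45   # assumed constant in the toy objective
CLIP = 1e-9     # keeps uniform noise away from 0 and 1


def f(b):
    return (b - TARGET) ** 2


def logit(p):
    return math.log(p) - math.log1p(-p)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def c(z, temp):
    """Surrogate: f applied to the relaxed sample sigma(z / temp)."""
    return f(sigmoid(z / temp))


def dc_dz(z, temp):
    s = sigmoid(z / temp)
    return 2.0 * (s - TARGET) * s * (1.0 - s) / temp


def uniform(rng):
    return min(max(rng.random(), CLIP), 1.0 - CLIP)


def relax_sample(theta, rng, temp=1.0):
    """One single-sample RELAX estimate of d/dtheta E[f(b)]."""
    u, v = uniform(rng), uniform(rng)
    z = logit(theta) + logit(u)
    b = 1.0 if z > 0.0 else 0.0          # b = H(z), so p(b=1) = theta
    if b == 1.0:
        u_tilde = v * theta + (1.0 - theta)   # u | b=1 ~ U(1-theta, 1)
        du_tilde = v - 1.0                    # d u_tilde / d theta
        dlogp = 1.0 / theta
    else:
        u_tilde = v * (1.0 - theta)           # u | b=0 ~ U(0, 1-theta)
        du_tilde = -v
        dlogp = -1.0 / (1.0 - theta)
    z_tilde = logit(theta) + logit(u_tilde)
    dz = 1.0 / (theta * (1.0 - theta))
    dz_tilde = dz + du_tilde / (u_tilde * (1.0 - u_tilde))
    return ((f(b) - c(z_tilde, temp)) * dlogp
            + dc_dz(z, temp) * dz
            - dc_dz(z_tilde, temp) * dz_tilde)


def relax_gradient(theta, n_samples=100_000, seed=0):
    rng = random.Random(seed)
    return sum(relax_sample(theta, rng) for _ in range(n_samples)) / n_samples
```

For this objective the exact gradient is f(1) − f(0) = 0.1 for every θ, which gives a simple check that the three terms combine into an unbiased estimate.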
p(z|θ)
cφ
H(z) = b ∼ p(b|θ) H(
surrogate cannot produce consistent and correct gradients
balance REINFORCE variance and reparameterization variance
unusable
$$\log p(x) \geq \mathcal{L}(\theta) = \mathbb{E}_{q(b|x)}[\log p(x|b) + \log p(b) - \log q(b|x)]$$
$$c_\phi(z) = f(\sigma_\lambda(z)) + r_\rho(z)$$
A3C, ACKTR)
used
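$$\arg\max_\theta \mathbb{E}_{\tau \sim \pi(\tau|\theta)}[R(\tau)]$$
$$\frac{\partial}{\partial\theta} \mathbb{E}_{\tau \sim \pi(\tau|\theta)}[R(\tau)]$$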
estimate of the value function as a control variate
cφ
$$\hat{g}_{\text{AC}} = \sum_{t=1}^{T} \frac{\partial \log \pi(a_t|s_t, \theta)}{\partial\theta} \left[ \sum_{t'=t}^{T} r_{t'} - c_\phi(s_t) \right]$$
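The actor-critic estimator above can be sketched for one recorded trajectory, assuming a tabular softmax policy whose parameters θ are per-state action logits and a state-only baseline standing in for c_φ. The policy class, the data layout, and all names below are assumptions made for illustration.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_pi(theta, s, a):
    """Gradient of log pi(a|s, theta) w.r.t. the logits of state s."""
    probs = softmax(theta[s])
    return [(1.0 if k == a else 0.0) - p for k, p in enumerate(probs)]


def actor_critic_gradient(theta, states, actions, rewards, baseline):
    """Sum over t of grad log pi(a_t|s_t) * (reward-to-go_t - baseline(s_t)).

    theta: list of per-state logit lists; baseline: function of the state.
    Returns a gradient with the same shape as theta.
    """
    grad = [[0.0] * len(row) for row in theta]
    reward_to_go = 0.0
    # Walk backwards so the reward-to-go can be accumulated in one pass.
    for t in reversed(range(len(rewards))):
        reward_to_go += rewards[t]
        advantage = reward_to_go - baseline(states[t])
        s, a = states[t], actions[t]
        for k, g in enumerate(grad_log_pi(theta, s, a)):
            grad[s][k] += advantage * g
    return grad
```

For example, with one state, two actions, and uniform logits, `actor_critic_gradient([[0.0, 0.0]], [0, 0], [0, 1], [1.0, 2.0], lambda s: 0.5)` weights the two score vectors by advantages 2.5 and 1.5.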
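$$\hat{g}_{\text{LAX}} = \sum_{t=1}^{T} \frac{\partial \log \pi(a_t|s_t, \theta)}{\partial\theta} \left[ \sum_{t'=t}^{T} r_{t'} - c_\phi(s_t, a_t) \right] + \frac{\partial}{\partial\theta} c_\phi(s_t, a_t)$$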
bootstrapping, trust-region)
uses local info
distributions on permutations
Reparameterizing the Birkhoff Polytope for Variational Permutation Inference
Learning Latent Permutations with Gumbel-Sinkhorn Networks