The Differentiable Cross-Entropy Method
Brandon Amos (1), Denis Yarats (1,2)
(1) Facebook AI Research, (2) New York University
ICML 2020 | brandondamos, bamos.github.io | denisyarats, cs.nyu.edu/~dy1042
Brandon Amos The Differentiable Cross-Entropy Method 2
The cross-entropy method (CEM) is a powerful iterative sampling-based optimizer that:
1. Samples from the domain
2. Observes the function's values
3. Updates the sampling distribution
CEM is widely used in control and model-based RL.
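The loop above can be sketched in a few lines of NumPy (the function name `cem` and all hyper-parameters here are illustrative, not the paper's implementation):

```python
import numpy as np

def cem(f, mu, sigma, n_samples=100, n_elite=10, n_iters=20, seed=0):
    """Vanilla cross-entropy method with a Gaussian sampling distribution.

    Each iteration: sample candidates, observe f's values, keep the top-k
    (elite) samples, and refit the Gaussian to them (a hard top-k update).
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        x = rng.normal(mu, sigma, size=(n_samples, mu.size))  # sample the domain
        v = np.array([f(xi) for xi in x])                     # observe the values
        elite = x[np.argsort(v)[:n_elite]]                    # hard top-k
        mu, sigma = elite.mean(0), elite.std(0) + 1e-8        # refit distribution
    return mu

# Minimize a toy quadratic; the mean converges toward the optimum at [1, -2].
x_star = cem(lambda x: np.sum((x - np.array([1.0, -2.0])) ** 2),
             mu=np.zeros(2), sigma=np.ones(2))
```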
A common learning pipeline, e.g. for control, is to:
1. Fit the models with maximum likelihood
2. Run CEM on top of the learned models
This achieves reasonable downstream performance, but has an objective mismatch issue: the models are unaware of the downstream performance.
[Diagram: the policy interacts with the environment, which responds with state transitions and reward trajectories; the dynamics model is trained with maximum likelihood while control uses it downstream, illustrating the objective mismatch.]
DCEM differentiates backwards through the sequence of samples, using a differentiable top-k (LML) and reparameterization. This is useful when a fixed point is hard to find, or when unrolling gradient descent hits a local optimum, and it gives a differentiable controller in the RL setting.

Outline:
Method: the differentiable cross-entropy method. Applications: learning deep energy-based models, learning embedded optimizers, control.
Given $g(y, z)$ and $f(y) = z^\star$, where $z^\star \in \{z : g(y, z) = 0\}$: how can we compute $\mathrm{D}_y f(y)$?

The Implicit Function Theorem gives, under mild assumptions,
$$\mathrm{D}_y f(y) = -\mathrm{D}_z g(y, f(y))^{-1} \, \mathrm{D}_y g(y, f(y))$$
[Dini 1877; Dontchev and Rockafellar 2009]
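As a quick sanity check of this formula, here is a hypothetical scalar instance (`g` and `solve_z` are made-up names): the root of $g(y, z) = z^3 + z - y$ is found numerically, and the implicit-function-theorem derivative is compared against finite differences:

```python
import numpy as np

# g(y, z) = z^3 + z - y implicitly defines z(y) through g(y, z(y)) = 0.
def g(y, z):
    return z**3 + z - y

def solve_z(y, iters=50):
    """Find the root z(y) of g(y, .) with Newton's method."""
    z = 0.0
    for _ in range(iters):
        z -= g(y, z) / (3 * z**2 + 1)   # dg/dz = 3z^2 + 1 > 0
    return z

y = 2.0
z = solve_z(y)  # root is z = 1

# IFT: dz/dy = -(dg/dz)^{-1} (dg/dy) = -(3z^2 + 1)^{-1} (-1) = 1 / (3z^2 + 1).
dz_dy_ift = 1.0 / (3 * z**2 + 1)

# Finite-difference check of the same derivative.
eps = 1e-6
dz_dy_fd = (solve_z(y + eps) - solve_z(y - eps)) / (2 * eps)
```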
[Constrained softmax, constrained sparsemax, Limited Multi-Label Projection]

Optimization perspective of the softmax:
$$z^\star = \operatorname*{argmin}_z \; -z^\top y - H(z) \quad \text{subject to} \quad 0 \le z \le 1, \;\; 1^\top z = 1$$

The Limited Multi-Label (LML) projection, a differentiable top-$k$:
$$z^\star = \operatorname*{argmin}_z \; -z^\top y - H_b(z) \quad \text{subject to} \quad 0 \le z \le 1, \;\; 1^\top z = k$$
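A minimal sketch of this projection, assuming the stationarity condition $z_i = \sigma(y_i - \nu)$ of the entropy-regularized problem and solving for the scalar dual variable $\nu$ by bisection (`soft_topk` is an illustrative name; the actual LML layer also provides an analytic backward pass):

```python
import numpy as np

def soft_topk(y, k, n_bisect=60):
    """Entropy-regularized (LML-style) top-k projection.

    Solves argmin_z -z@y - H_b(z) s.t. 0 <= z <= 1, sum(z) = k, whose
    stationarity condition is z_i = sigmoid(y_i - nu); we bisect on the
    dual variable nu until the sum constraint holds.
    """
    sigmoid = lambda t: 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))
    lo, hi = y.min() - 20.0, y.max() + 20.0   # bracket for nu
    for _ in range(n_bisect):
        nu = 0.5 * (lo + hi)
        if sigmoid(y - nu).sum() > k:
            lo = nu   # too much mass: raise nu
        else:
            hi = nu
    return sigmoid(y - 0.5 * (lo + hi))

z = soft_topk(np.array([3.0, 1.0, 0.5, -2.0]), k=2)
```

Unlike a hard top-k, every entry of `z` gets a nonzero, smoothly varying weight, which is what makes the selection differentiable.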
In each DCEM iteration:
1. Sample from the domain: $X_{t,i} \sim g_{\theta_t}$, $i = 1, \ldots, N$
2. Observe the function values: $v_{t,i} = f_\phi(X_{t,i})$
3. Compute the differentiable top-k: $\ell_t = \Pi_{\text{LML}}(-v_t / \tau)$
4. Update $\theta_{t+1}$ with the maximum weighted likelihood
Finally, return $\mathbb{E}[g_{\theta_{T+1}}]$.

DCEM captures vanilla CEM when the soft top-k is hard, and is composed of operations with informative derivatives.
[Figure: the objective $f_\phi$ with the sampling distributions $g_{\theta_1}, g_{\theta_2}, g_{\theta_3}$ concentrating over iterations.]
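Putting the pieces together, here is a NumPy sketch of the DCEM-style loop under these assumptions: the hard elite selection is replaced by soft top-k weights and a weighted-MLE Gaussian refit (names, temperature, and hyper-parameters are illustrative, not the paper's implementation):

```python
import numpy as np

def soft_topk_weights(y, k, n_bisect=60):
    """LML-style soft top-k: z_i = sigmoid(y_i - nu), bisecting nu so sum(z) = k."""
    sigmoid = lambda t: 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))
    lo, hi = y.min() - 20.0, y.max() + 20.0
    for _ in range(n_bisect):
        nu = 0.5 * (lo + hi)
        lo, hi = (nu, hi) if sigmoid(y - nu).sum() > k else (lo, nu)
    return sigmoid(y - 0.5 * (lo + hi))

def dcem(f, mu, sigma, n_samples=100, k=10, n_iters=20, temp=1e-3, seed=0):
    """DCEM-style loop: soft top-k weights feed a weighted-MLE Gaussian refit.

    With a hard top-k this reduces to vanilla CEM; here every operation is
    smooth, so in an autodiff framework the loop is end-to-end differentiable.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        x = rng.normal(mu, sigma, size=(n_samples, mu.size))  # 1. sample
        v = np.array([f(xi) for xi in x])                     # 2. observe values
        w = soft_topk_weights(-v / temp, k)                   # 3. soft top-k
        w = w / w.sum()
        mu = w @ x                                            # 4. weighted-MLE mean
        sigma = np.sqrt(w @ (x - mu) ** 2) + 1e-8             #    weighted-MLE std
    return mu

x_star = dcem(lambda x: np.sum((x - np.array([0.5, -1.0])) ** 2),
              mu=np.zeros(2), sigma=np.ones(2))
```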
Method: the differentiable cross-entropy method. Applications: learning deep energy-based models, learning embedded optimizers, control.
Key idea: model $p(z \mid y) \propto \exp(-E_\theta(y, z))$, where $E_\theta$ is a deep energy model. This captures non-trivial structure in the output space while also subsuming feed-forward models; the feed-forward model corresponds to $E(y, z) = \|f(y) - z\|_2^2$.

Predict with the optimization problem $\hat{z}(y) = \operatorname*{argmin}_z E_\theta(y, z)$; learning can be done by unrolling an optimizer.
[Belanger and McCallum 2016; Amos, Xu, and Kolter 2017]
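A toy sketch of prediction by unrolling, assuming the feed-forward energy above with a made-up $f(y) = \theta y$ (in an autodiff framework, gradients with respect to $\theta$ flow back through the unrolled steps):

```python
import numpy as np

def energy(y, z, theta):
    """Hypothetical feed-forward energy E(y, z) = ||f(y) - z||^2 with f(y) = theta*y."""
    return np.sum((theta * y - z) ** 2)

def predict_unrolled(y, theta, lr=0.1, n_steps=100):
    """Predict zhat = argmin_z E(y, z) by unrolling gradient descent on z.

    Each step is differentiable, so d zhat / d theta can be backpropagated
    through the unrolled chain (or DCEM can replace the unrolled optimizer).
    """
    z = np.zeros_like(y)
    for _ in range(n_steps):
        grad_z = -2.0 * (theta * y - z)   # dE/dz
        z = z - lr * grad_z
    return z

# With theta = 3, the minimizer is z = theta * y = [3, 6].
zhat = predict_unrolled(np.array([1.0, 2.0]), theta=3.0)
```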
Unrolled optimizers lose the probabilistic interpretation and can overfit to the unrolled optimizer. In this regression setting, unrolled gradient descent learns barriers on the energy surface, while DCEM fits the data.
Method: the differentiable cross-entropy method. Applications: learning deep energy-based models, learning embedded optimizers, control.
[Figure: rather than optimizing $y^\star = \operatorname*{argmin}_y f(y)$ over the full domain, the optimizer is embedded in a learned latent manifold that captures the manifold of optimal solutions.]
Method: the differentiable cross-entropy method. Applications: learning deep energy-based models, learning embedded optimizers, control.
Model-free RL: more general and doesn't make as many assumptions about the world, but is rife with poor data efficiency and learning stability issues. Model-based RL (or control): a useful prior on the world, if the world lies within your set of assumptions.
[Diagram: a policy maps state to action, via neural networks and/or a future plan computed with the system dynamics.]
The dynamics can be known or learned from data. This gives a pure planning problem with a cost and dynamics:
$$\tau_{1:T}^\star = \operatorname*{argmin}_{\tau_{1:T}} \; \sum_t C_\theta(\tau_t) \quad \text{subject to} \quad y_1 = y_{\text{init}}, \;\; y_{t+1} = f_\theta(\tau_t), \;\; \underline{v} \le v_t \le \overline{v}$$
where $\tau_t = \{y_t, v_t\}$, $C_\theta$ is the cost, and $f_\theta$ is the dynamics.
Idea: solve this control optimization problem with DCEM and differentiate through it.
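This planning problem can be sketched with a (non-differentiable) CEM planner over action sequences; `dynamics` and `cost` below are toy stand-ins for the learned $f_\theta$ and $C_\theta$, and DCEM would make the same loop differentiable:

```python
import numpy as np

def plan_cem(y_init, dynamics, cost, horizon=10, n_samples=200, n_elite=20,
             n_iters=10, v_min=-1.0, v_max=1.0, seed=0):
    """Plan a control sequence v_{1:T} with CEM over a Gaussian on action space.

    Each iteration samples action sequences, rolls them through the dynamics,
    scores the summed cost, and refits the Gaussian to the elite sequences.
    """
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(horizon), np.ones(horizon)
    for _ in range(n_iters):
        v = np.clip(rng.normal(mu, sigma, size=(n_samples, horizon)), v_min, v_max)
        total = np.zeros(n_samples)
        for i in range(n_samples):
            y = y_init
            for t in range(horizon):
                total[i] += cost(y, v[i, t])
                y = dynamics(y, v[i, t])       # y_{t+1} = f(y_t, v_t)
        elite = v[np.argsort(total)[:n_elite]]
        mu, sigma = elite.mean(0), elite.std(0) + 1e-6
    return mu

# Toy scalar system: push the state from y = 2 toward zero with bounded controls.
plan = plan_cem(y_init=2.0,
                dynamics=lambda y, v: y + 0.5 * v,
                cost=lambda y, v: y**2 + 0.01 * v**2)
```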
[Diagram: DCEM as a differentiable layer producing $\hat{z}$, trained end-to-end: data flows through the model to predictions and a loss.]
This lets us augment neural network policies in model-free algorithms with MPC policies, and fight objective mismatch by learning the dynamics and the cost end-to-end. Caveat: control problems are often intractably high-dimensional, so we use embedded DCEM.
sites.google.com/view/diff-cross-entropy-method