The Differentiable Cross-Entropy Method (ICML 2020), Brandon Amos. PowerPoint PPT Presentation



SLIDE 1

The Differentiable Cross-Entropy Method

Brandon Amos¹   Denis Yarats¹,²

¹Facebook AI Research  ²New York University

ICML 2020 · brandondamos · bamos.github.io · denisyarats · cs.nyu.edu/~dy1042

SLIDE 2

The cross-entropy method is a powerful optimizer

Brandon Amos, The Differentiable Cross-Entropy Method

An iterative sampling-based optimizer that:

  • 1. Samples from the domain

  • 2. Observes the function's values

  • 3. Updates the sampling distribution

Widely used in control and model-based RL
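The three steps above can be sketched as a minimal CEM loop. This is an illustrative numpy sketch, not the paper's implementation; the function, names, and hyperparameters here are our own:

```python
import numpy as np

def cem(f, mu, sigma, n_samples=100, n_elite=10, n_iters=20, seed=0):
    """Vanilla cross-entropy method with a diagonal Gaussian sampling
    distribution, minimizing f over R^d."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.array(mu, float), np.array(sigma, float)
    for _ in range(n_iters):
        # 1. Sample from the domain
        X = rng.normal(mu, sigma, size=(n_samples, mu.size))
        # 2. Observe the function's values
        vals = np.array([f(x) for x in X])
        # 3. Refit the distribution to the elite (top-k) samples
        elite = X[np.argsort(vals)[:n_elite]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

# Usage: minimize a shifted quadratic
x_star = cem(lambda x: float(np.sum((x - 3.0) ** 2)), mu=[0.0, 0.0], sigma=[2.0, 2.0])
```

Refitting to the elite set is the maximum-likelihood update for a diagonal Gaussian, which is why step 3 concentrates the sampler on low-cost regions.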

SLIDE 3

Problem: CEM breaks end-to-end learning

A common learning pipeline, e.g. for control, is:

  • 1. Fit models with maximum likelihood

  • 2. Run CEM on top of the learned models

  • 3. Hope CEM induces reasonable downstream performance

Objective mismatch issue: the models are unaware of the downstream performance


[Diagram: the policy interacts with the environment; the dynamics model is trained by maximum likelihood on state transitions, while control uses it to produce rewards and trajectories, causing objective mismatch.]

SLIDE 4

The Differentiable Cross-Entropy Method (DCEM)


Differentiate backwards through the sequence of samples

  • Using differentiable top-k (LML) and reparameterization

  • Useful when a fixed point is hard to find, or when unrolling gradient descent hits a local optimum

  • A differentiable controller in the RL setting

SLIDE 5

This Talk

Method: The differentiable cross-entropy method

Applications:

  • Learning deep energy-based models
  • Learning embedded optimizers
  • Control


SLIDE 6

Foundation: The Implicit Function Theorem


Given h(y, z) and g(y) = z⋆(y), where z⋆(y) ∈ {z : h(y, z) = 0}, how can we compute D_y g(y)?

The Implicit Function Theorem gives

    D_y g(y) = −[D_z h(y, g(y))]⁻¹ D_y h(y, g(y))

under mild assumptions.

[Dini 1877; Dontchev and Rockafellar 2009]
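A quick numeric check of this identity, on a toy scalar root-finding problem of our own choosing, h(y, z) = z³ + z − y:

```python
def h_z(y, z):  # D_z h(y, z)
    return 3 * z**2 + 1

def h_y(y, z):  # D_y h(y, z)
    return -1.0

def g(y, n_steps=50):
    """g(y) = z*(y), the root of h(y, z) = 0, found by Newton's method."""
    z = 0.0
    for _ in range(n_steps):
        z -= (z**3 + z - y) / h_z(y, z)
    return z

y = 2.0
z = g(y)  # z*(2) = 1, since 1**3 + 1 = 2
# Implicit function theorem: D_y g(y) = -[D_z h(y, g(y))]^{-1} D_y h(y, g(y))
dg_ift = -h_y(y, z) / h_z(y, z)
# Finite-difference check of the same derivative
eps = 1e-6
dg_fd = (g(y + eps) - g(y - eps)) / (2 * eps)
```

The point of the theorem is that dg_ift needs only the solution z⋆(y), not the solver's internals, so the derivative is exact regardless of how the root was found.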

SLIDE 7

Foundation: Differentiable top-k operations


[Constrained softmax, constrained sparsemax, Limited Multi-Label Projection]

Optimization perspective of the softmax:

    z⋆ = argmin_z −zᵀy − H(z)  subject to  0 ≤ z ≤ 1,  1ᵀz = 1

Limited Multi-Label Projection (soft top-k):

    z⋆ = argmin_z −zᵀy − H_b(z)  subject to  0 ≤ z ≤ 1,  1ᵀz = k

where H is the entropy and H_b the elementwise binary entropy.

SLIDE 8

The Differentiable Cross-Entropy Method

In each iteration t, update a distribution h_{φ_t} with:

  • Sample from the domain: Y_{t,i} ∼ h_{φ_t}(·), i = 1, …, N

  • Observe the function values: v_{t,i} = g_θ(Y_{t,i})

  • Compute the differentiable top-k: I_t = Π_LML(−v_t / τ)

  • Update φ_{t+1} with maximum weighted likelihood

And finally return E[h_{φ_{T+1}}(·)]

Captures vanilla CEM when the soft top-k is made hard
Composed of operations with informative derivatives


[Figure: sampling distributions h_{φ_1}, h_{φ_2}, h_{φ_3} progressively concentrating on the minimum of g_θ.]
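The iteration above can be sketched in numpy as a forward pass. This is our own sketch: in the actual method these operations run under an autodiff framework so gradients flow through the samples via reparameterization; here we only show the smoothed, weighted update:

```python
import numpy as np

def soft_topk_weights(scores, k, n_iters=100):
    """Soft top-k weights in [0, 1] summing to k, via the sigmoid/bisection
    solution of the entropy-regularized projection."""
    lo, hi = scores.min() - 20.0, scores.max() + 20.0
    for _ in range(n_iters):
        nu = 0.5 * (lo + hi)
        if (1.0 / (1.0 + np.exp(-(scores - nu)))).sum() > k:
            lo = nu
        else:
            hi = nu
    return 1.0 / (1.0 + np.exp(-(scores - nu)))

def dcem(f, mu, sigma, n_samples=100, k=10, temp=0.1, n_iters=20, seed=0):
    """Forward pass of a DCEM-style loop: refit a diagonal Gaussian by
    *weighted* maximum likelihood, with soft top-k weights in place of a
    hard elite set."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.array(mu, float), np.array(sigma, float)
    for _ in range(n_iters):
        X = rng.normal(mu, sigma, size=(n_samples, mu.size))  # sample
        v = np.array([f(x) for x in X])                       # observe
        w = soft_topk_weights(-v / temp, k)[:, None]          # soft top-k
        mu = (w * X).sum(0) / w.sum()                         # weighted MLE
        sigma = np.sqrt((w * (X - mu) ** 2).sum(0) / w.sum()) + 1e-6
    return mu

mu_star = dcem(lambda x: float(np.sum((x - 1.0) ** 2)), mu=[0.0], sigma=[2.0])
```

As the temperature τ → 0 the weights approach a hard top-k indicator and the update recovers the vanilla CEM elite refit.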

SLIDE 9

This Talk

Method: The differentiable cross-entropy method

Applications:

  • Learning deep energy-based models
  • Learning embedded optimizers
  • Control


SLIDE 10

Deep Structured Energy Models (SPENs/ICNNs)

Key idea: Model p(y, z) ∝ exp(−F_θ(y, z)), where F_θ is a deep energy model

Captures non-trivial structures in the output space, while also subsuming feed-forward models
Feedforward model: F(y, z) = ‖g(y) − z‖₂²

Predict with the optimization problem: ẑ = argmin_z F_θ(y, z)

Learning can be done by unrolling optimization on F_θ using the derivative information ∇_z F


[Belanger and McCallum, 2016, Amos, Xu, and Kolter, 2017]
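Prediction by unrolled gradient descent can be sketched as follows, using a toy quadratic energy of our own in place of a deep F_θ:

```python
import numpy as np

def F(y, z):
    """Toy feedforward-style energy F(y, z) = ||g(y) - z||^2 with g(y) = 2y."""
    return float(np.sum((2.0 * y - z) ** 2))

def grad_z_F(y, z):
    # d/dz of sum((2y - z)^2)
    return -2.0 * (2.0 * y - z)

def predict(y, z0, lr=0.25, n_steps=50):
    """zhat ~= argmin_z F(y, z) by unrolled gradient descent; during
    training one backpropagates through these unrolled steps."""
    z = np.array(z0, float)
    for _ in range(n_steps):
        z = z - lr * grad_z_F(y, z)
    return z

zhat = predict(y=np.array([1.0, -2.0]), z0=[0.0, 0.0])  # -> approx [2, -4]
```

For this convex toy energy the unrolled steps converge cleanly to the minimizer g(y) = 2y; the next slide shows why the same recipe can fail on learned non-convex energies.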

SLIDE 11

Unrolling gradient descent may learn bad energies

Unrolling optimizers loses the probabilistic interpretation and can overfit to the optimizer

In this regression setting, GD learns barriers on the energy surface while DCEM fits the data


SLIDE 12

This Talk

Method: The differentiable cross-entropy method

Applications:

  • Learning deep energy-based models
  • Learning embedded optimizers
  • Control


SLIDE 13

DCEM can exploit the solution space structure


    y⋆ = argmin_{y ∈ [0,1]ⁿ} g(y)

[Figure: the full domain, the manifold of optimal solutions, and a latent manifold mapped onto the optimal solutions.]
SLIDE 14

This Talk

Method: The differentiable cross-entropy method

Applications:

  • Learning deep energy-based models
  • Learning embedded optimizers
  • Control


SLIDE 15

Should RL policies have a system dynamics model or not?

Model-free RL: more general, doesn't make as many assumptions about the world, but rife with poor data efficiency and learning stability issues

Model-based RL (or control): a useful prior on the world, if it lies within your set of assumptions


[Diagram: a policy maps state to action via neural network(s), a future plan, and a system dynamics model.]

SLIDE 16

Model Predictive Control


Known or learned from data

SLIDE 17

Differentiable Control via DCEM

A pure planning problem given (potentially non-convex) cost and dynamics:


    υ⋆_{1:T} = argmin_{v_{1:T}} Σ_t C_θ(υ_t)
    subject to  y_1 = y_init,  y_{t+1} = f_θ(υ_t),  v̲ ≤ v_t ≤ v̄

where υ_t = {y_t, v_t}, C_θ is the cost, and f_θ the dynamics

Idea: Solve this optimization problem with DCEM and differentiate through it
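As a concrete baseline for this kind of planning problem, here is a vanilla (non-differentiable) CEM planner on a toy 1-D system of our own: dynamics y_{t+1} = y_t + v_t, per-step cost y_t² + 0.1 v_t², controls clamped to [−1, 1]:

```python
import numpy as np

def plan_cem(y_init, horizon=10, n_samples=200, n_elite=20, n_iters=15, seed=0):
    """CEM planner: sample control sequences, roll out the dynamics,
    and refit a Gaussian over sequences to the lowest-cost elites."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(horizon)
    sigma = np.ones(horizon)
    for _ in range(n_iters):
        V = np.clip(rng.normal(mu, sigma, size=(n_samples, horizon)), -1.0, 1.0)
        costs = np.zeros(n_samples)
        for i in range(n_samples):
            y = y_init
            for t in range(horizon):
                costs[i] += y**2 + 0.1 * V[i, t]**2  # per-step cost
                y = y + V[i, t]                      # roll out the dynamics
        elite = V[np.argsort(costs)[:n_elite]]
        mu, sigma = elite.mean(0), elite.std(0) + 1e-3
    return mu  # planned control sequence v_{1:T}

v_plan = plan_cem(y_init=3.0)
```

DCEM replaces the hard elite refit here with the soft, reparameterized update so that the planned controls become differentiable in the cost and dynamics parameters.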

SLIDE 18

Differentiable Control via DCEM


[Diagram: a DCEM layer embedded in a larger pipeline: a lot of data → model → DCEM → predictions → loss, trained end-to-end.]

What can we do with this now?

  • Augment neural network policies in model-free algorithms with MPC policies

  • Fight objective mismatch by end-to-end learning the dynamics

  • The cost can also be end-to-end learned! No longer need to hard-code in values

Caveat: Control problems are often intractably high-dimensional, so we use embedded DCEM

SLIDE 19


DCEM fine-tunes highly non-convex controllers

sites.google.com/view/diff-cross-entropy-method

SLIDE 20

The Differentiable Cross-Entropy Method

Brandon Amos¹   Denis Yarats¹,²

¹Facebook AI Research  ²New York University

ICML 2020 · brandondamos · bamos.github.io · denisyarats · cs.nyu.edu/~dy1042