The Differentiable Cross-Entropy Method (ICML 2020), Brandon Amos. PowerPoint PPT Presentation



SLIDE 1

The Differentiable Cross-Entropy Method

Brandon Amos¹   Denis Yarats¹,²

¹Facebook AI Research  ²New York University

ICML 2020 · brandondamos · bamos.github.io · denisyarats · cs.nyu.edu/~dy1042

SLIDE 2

The cross-entropy method is a powerful optimizer

Brandon Amos, The Differentiable Cross-Entropy Method

An iterative sampling-based optimizer that:

  • 1. Samples from the domain

  • 2. Observes the function's values

  • 3. Updates the sampling distribution

Widely used in control and model-based RL
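The three steps above can be sketched as a minimal CEM loop. This is an illustrative numpy sketch, not the paper's implementation; the function, names, and hyperparameters here are our own:

```python
import numpy as np

def cem(f, mu, sigma, n_samples=100, n_elite=10, n_iters=20, seed=0):
    """Vanilla cross-entropy method with a diagonal Gaussian sampling
    distribution, minimizing f over R^d."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.array(mu, float), np.array(sigma, float)
    for _ in range(n_iters):
        # 1. Sample from the domain
        X = rng.normal(mu, sigma, size=(n_samples, mu.size))
        # 2. Observe the function's values
        vals = np.array([f(x) for x in X])
        # 3. Refit the distribution to the elite (top-k) samples
        elite = X[np.argsort(vals)[:n_elite]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

# Usage: minimize a shifted quadratic
x_star = cem(lambda x: float(np.sum((x - 3.0) ** 2)), mu=[0.0, 0.0], sigma=[2.0, 2.0])
```

Refitting to the elite set is the maximum-likelihood update for a diagonal Gaussian, which is why step 3 concentrates the sampler on low-cost regions.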

SLIDE 3

Problem: CEM breaks end-to-end learning

A common learning pipeline, e.g. for control, is:

  • 1. Fit models with maximum likelihood

  • 2. Run CEM on top of the learned models

  • 3. Hope CEM induces reasonable downstream performance

Objective mismatch issue: the models are unaware of the downstream performance


[Diagram: the policy interacts with the environment; the dynamics model is trained by maximum likelihood on state transitions, while control uses it to produce rewards and trajectories, causing objective mismatch.]

SLIDE 4

The Differentiable Cross-Entropy Method (DCEM)


Differentiate backwards through the sequence of samples

  • Using differentiable top-k (LML) and reparameterization

  • Useful when a fixed point is hard to find, or when unrolling gradient descent hits a local optimum

  • A differentiable controller in the RL setting

SLIDE 5

This Talk

Method: The differentiable cross-entropy method

Applications:

  • Learning deep energy-based models
  • Learning embedded optimizers
  • Control


SLIDE 6

Foundation: The Implicit Function Theorem


Given h(y, z) and g(y) = z⋆(y), where z⋆(y) ∈ {z : h(y, z) = 0}, how can we compute D_y g(y)?

The Implicit Function Theorem gives

    D_y g(y) = −[D_z h(y, g(y))]⁻¹ D_y h(y, g(y))

under mild assumptions.

[Dini 1877; Dontchev and Rockafellar 2009]
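A quick numeric check of this identity, on a toy scalar root-finding problem of our own choosing, h(y, z) = z³ + z − y:

```python
def h_z(y, z):  # D_z h(y, z)
    return 3 * z**2 + 1

def h_y(y, z):  # D_y h(y, z)
    return -1.0

def g(y, n_steps=50):
    """g(y) = z*(y), the root of h(y, z) = 0, found by Newton's method."""
    z = 0.0
    for _ in range(n_steps):
        z -= (z**3 + z - y) / h_z(y, z)
    return z

y = 2.0
z = g(y)  # z*(2) = 1, since 1**3 + 1 = 2
# Implicit function theorem: D_y g(y) = -[D_z h(y, g(y))]^{-1} D_y h(y, g(y))
dg_ift = -h_y(y, z) / h_z(y, z)
# Finite-difference check of the same derivative
eps = 1e-6
dg_fd = (g(y + eps) - g(y - eps)) / (2 * eps)
```

The point of the theorem is that dg_ift needs only the solution z⋆(y), not the solver's internals, so the derivative is exact regardless of how the root was found.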

SLIDE 7

Foundation: Differentiable top-k operations


[Constrained softmax, constrained sparsemax, Limited Multi-Label Projection]

Optimization perspective of the softmax:

    z⋆ = argmin_z −zᵀy − H(z)  subject to  0 ≤ z ≤ 1,  1ᵀz = 1

Limited Multi-Label Projection (soft top-k):

    z⋆ = argmin_z −zᵀy − H_b(z)  subject to  0 ≤ z ≤ 1,  1ᵀz = k

where H is the entropy and H_b the elementwise binary entropy.

SLIDE 8

The Differentiable Cross-Entropy Method

In each iteration t, update a distribution h_{φ_t} with:

  • Sample from the domain: Y_{t,i} ∼ h_{φ_t}(·), i = 1, …, N

  • Observe the function values: v_{t,i} = g_θ(Y_{t,i})

  • Compute the differentiable top-k: I_t = Π_LML(−v_t / τ)

  • Update φ_{t+1} with maximum weighted likelihood

And finally return E[h_{φ_{T+1}}(·)]

Captures vanilla CEM when the soft top-k is made hard
Composed of operations with informative derivatives


[Figure: sampling distributions h_{φ_1}, h_{φ_2}, h_{φ_3} progressively concentrating on the minimum of g_θ.]
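The iteration above can be sketched in numpy as a forward pass. This is our own sketch: in the actual method these operations run under an autodiff framework so gradients flow through the samples via reparameterization; here we only show the smoothed, weighted update:

```python
import numpy as np

def soft_topk_weights(scores, k, n_iters=100):
    """Soft top-k weights in [0, 1] summing to k, via the sigmoid/bisection
    solution of the entropy-regularized projection."""
    lo, hi = scores.min() - 20.0, scores.max() + 20.0
    for _ in range(n_iters):
        nu = 0.5 * (lo + hi)
        if (1.0 / (1.0 + np.exp(-(scores - nu)))).sum() > k:
            lo = nu
        else:
            hi = nu
    return 1.0 / (1.0 + np.exp(-(scores - nu)))

def dcem(f, mu, sigma, n_samples=100, k=10, temp=0.1, n_iters=20, seed=0):
    """Forward pass of a DCEM-style loop: refit a diagonal Gaussian by
    *weighted* maximum likelihood, with soft top-k weights in place of a
    hard elite set."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.array(mu, float), np.array(sigma, float)
    for _ in range(n_iters):
        X = rng.normal(mu, sigma, size=(n_samples, mu.size))  # sample
        v = np.array([f(x) for x in X])                       # observe
        w = soft_topk_weights(-v / temp, k)[:, None]          # soft top-k
        mu = (w * X).sum(0) / w.sum()                         # weighted MLE
        sigma = np.sqrt((w * (X - mu) ** 2).sum(0) / w.sum()) + 1e-6
    return mu

mu_star = dcem(lambda x: float(np.sum((x - 1.0) ** 2)), mu=[0.0], sigma=[2.0])
```

As the temperature τ → 0 the weights approach a hard top-k indicator and the update recovers the vanilla CEM elite refit.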

SLIDE 9

This Talk

Method: The differentiable cross-entropy method

Applications:

  • Learning deep energy-based models
  • Learning embedded optimizers
  • Control


SLIDE 10

Deep Structured Energy Models (SPENs/ICNNs)

Key idea: Model p(y, z) ∝ exp(−F_θ(y, z)), where F_θ is a deep energy model

Captures non-trivial structures in the output space, while also subsuming feed-forward models
Feedforward model: F(y, z) = ‖g(y) − z‖₂²

Predict with the optimization problem: ẑ = argmin_z F_θ(y, z)

Learning can be done by unrolling optimization on F_θ using the derivative information ∇_z F


[Belanger and McCallum, 2016, Amos, Xu, and Kolter, 2017]
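Prediction by unrolled gradient descent can be sketched as follows, using a toy quadratic energy of our own in place of a deep F_θ:

```python
import numpy as np

def F(y, z):
    """Toy feedforward-style energy F(y, z) = ||g(y) - z||^2 with g(y) = 2y."""
    return float(np.sum((2.0 * y - z) ** 2))

def grad_z_F(y, z):
    # d/dz of sum((2y - z)^2)
    return -2.0 * (2.0 * y - z)

def predict(y, z0, lr=0.25, n_steps=50):
    """zhat ~= argmin_z F(y, z) by unrolled gradient descent; during
    training one backpropagates through these unrolled steps."""
    z = np.array(z0, float)
    for _ in range(n_steps):
        z = z - lr * grad_z_F(y, z)
    return z

zhat = predict(y=np.array([1.0, -2.0]), z0=[0.0, 0.0])  # -> approx [2, -4]
```

For this convex toy energy the unrolled steps converge cleanly to the minimizer g(y) = 2y; the next slide shows why the same recipe can fail on learned non-convex energies.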

SLIDE 11

Unrolling gradient descent may learn bad energies

Unrolling optimizers loses the probabilistic interpretation and can overfit to the optimizer

In this regression setting, GD learns barriers on the energy surface while DCEM fits the data


SLIDE 12

This Talk

Method: The differentiable cross-entropy method

Applications:

  • Learning deep energy-based models
  • Learning embedded optimizers
  • Control


SLIDE 13

DCEM can exploit the solution space structure


    y⋆ = argmin_{y ∈ [0,1]ⁿ} g(y)

[Figure: the full domain, the manifold of optimal solutions, and a latent manifold mapped onto the optimal solutions.]
SLIDE 14

This Talk

Method: The differentiable cross-entropy method

Applications:

  • Learning deep energy-based models
  • Learning embedded optimizers
  • Control


SLIDE 15

Should RL policies have a system dynamics model or not?

Model-free RL: more general, doesn't make as many assumptions about the world, but rife with poor data efficiency and learning stability issues

Model-based RL (or control): a useful prior on the world, if it lies within your set of assumptions


[Diagram: a policy maps state to action via neural network(s), a future plan, and a system dynamics model.]

SLIDE 16

Model Predictive Control


Known or learned from data

SLIDE 17

Differentiable Control via DCEM

A pure planning problem given (potentially non-convex) cost and dynamics:


    υ⋆_{1:T} = argmin_{v_{1:T}} Σ_t C_θ(υ_t)
    subject to  y_1 = y_init,  y_{t+1} = f_θ(υ_t),  v̲ ≤ v_t ≤ v̄

where υ_t = {y_t, v_t}, C_θ is the cost, and f_θ the dynamics

Idea: Solve this optimization problem with DCEM and differentiate through it
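As a concrete baseline for this kind of planning problem, here is a vanilla (non-differentiable) CEM planner on a toy 1-D system of our own: dynamics y_{t+1} = y_t + v_t, per-step cost y_t² + 0.1 v_t², controls clamped to [−1, 1]:

```python
import numpy as np

def plan_cem(y_init, horizon=10, n_samples=200, n_elite=20, n_iters=15, seed=0):
    """CEM planner: sample control sequences, roll out the dynamics,
    and refit a Gaussian over sequences to the lowest-cost elites."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(horizon)
    sigma = np.ones(horizon)
    for _ in range(n_iters):
        V = np.clip(rng.normal(mu, sigma, size=(n_samples, horizon)), -1.0, 1.0)
        costs = np.zeros(n_samples)
        for i in range(n_samples):
            y = y_init
            for t in range(horizon):
                costs[i] += y**2 + 0.1 * V[i, t]**2  # per-step cost
                y = y + V[i, t]                      # roll out the dynamics
        elite = V[np.argsort(costs)[:n_elite]]
        mu, sigma = elite.mean(0), elite.std(0) + 1e-3
    return mu  # planned control sequence v_{1:T}

v_plan = plan_cem(y_init=3.0)
```

DCEM replaces the hard elite refit here with the soft, reparameterized update so that the planned controls become differentiable in the cost and dynamics parameters.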

SLIDE 18

Differentiable Control via DCEM


[Diagram: a DCEM layer embedded in a larger pipeline: a lot of data → model → DCEM → predictions → loss, trained end-to-end.]

What can we do with this now?

  • Augment neural network policies in model-free algorithms with MPC policies

  • Fight objective mismatch by end-to-end learning the dynamics

  • The cost can also be end-to-end learned! No longer need to hard-code in values

Caveat: Control problems are often intractably high-dimensional, so we use embedded DCEM

SLIDE 19


DCEM fine-tunes highly non-convex controllers

sites.google.com/view/diff-cross-entropy-method

SLIDE 20

The Differentiable Cross-Entropy Method

Brandon Amos¹   Denis Yarats¹,²

¹Facebook AI Research  ²New York University

ICML 2020 · brandondamos · bamos.github.io · denisyarats · cs.nyu.edu/~dy1042