Machine Learning Optimal Transport Sayas Numerics Seminar Lars - PowerPoint PPT Presentation

Ruthotto ML meet OT @ Oct 2020 Machine Learning ↔ Optimal Transport Sayas Numerics Seminar Lars Ruthotto Departments of Mathematics and Computer Science Emory University lruthotto@emory.edu @lruthotto Title ML → OT Lag NN Exp OT → CNF Σ 1

Ruthotto ML meet OT @ Oct 2020 Agenda: Machine Learning meets Optimal Transport ◮ ML → OT: New Tricks from Learning ◮ based on relaxed dynamical optimal transport ◮ combine macroscopic / microscopic / HJB equations ◮ neural networks for value function ◮ combine analytic gradients and automatic differentiation ◮ generalization to mean field games and control problems ◮ OT → ML: Learning from Old Tricks ◮ variational inference via continuous normalizing flows ◮ applications: density estimation, generative modeling ◮ OT � uniqueness and regularity of dynamics ◮ HJB, solid numerics, and efficient implementation ◮ orders of magnitude speedup training and inference LR, S Osher, W Li, L Nurbekyan, S Wu Fung D Onken, S Wu Fung, X Li, LR A ML Framework for Solving High-Dimensional MFG and MFC OT-Flow: Fast and Accurate CNF via OT PNAS 117 (17), 9183-9193, 2020 arXiv:2006.00104, 2020. Title ML → OT Lag NN Exp OT → CNF Σ 2

Ruthotto ML meet OT @ Oct 2020 Collaborators and Funding Emory Funding: DMS 1751636 ◮ BSF 2018209 ◮ FA9550-20-1-0372 ◮ Special Thanks: ◮ Organizers and staff of IPAM Li Long Program MLP 2019. ◮ Osher’s funding AFOSR Stan Osher MURI and ONR Onken Wu Fung Nurbekyan Title ML → OT Lag NN Exp OT → CNF Σ 3

Ruthotto ML meet OT @ Oct 2020 initial density, ρ 0 target density, ρ 1 density evolution Dynamic Optimal Transport (Benamou and Brenier, ’00) Given the initial density, ρ 0 , and the target density, ρ 1 , find the velocity v that renders the push-forward of ρ 0 equal to ρ 1 and minimizes the transport costs, i.e., � 1 � 1 2 � v ( x , t ) � 2 ρ ( x , t ) dxdt minimize v ,ρ Ω 0 ∂ t ρ + ∇ · ( ρ v ) = 0 , ρ ( · , 0 ) = ρ 0 ( · ) , ρ ( · , 1 ) = ρ 1 ( · ) subject to Title ML → OT Lag NN Exp OT → CNF Σ 4

Ruthotto ML meet OT @ Oct 2020 initial density, ρ 0 target density, ρ 1 density evolution ρ ( · , 1 ) push-fwd of ρ 0 Dynamic Optimal Transport (Benamou and Brenier, ’00) Given the initial density, ρ 0 , and the target density, ρ 1 , find the velocity v that renders the push-forward of ρ 0 equal to ρ 1 and minimizes the transport costs, i.e., � 1 � 1 2 � v ( x , t ) � 2 ρ ( x , t ) dxdt minimize v ,ρ Ω 0 ∂ t ρ + ∇ · ( ρ v ) = 0 , ρ ( · , 0 ) = ρ 0 ( · ) , ρ ( · , 1 ) = ρ 1 ( · ) subject to Title ML → OT Lag NN Exp OT → CNF Σ 4

Ruthotto ML meet OT @ Oct 2020 initial density, ρ 0 target density, ρ 1 density evolution Relaxed Dynamical Optimal Transport Given the initial density, ρ 0 , and the target density, ρ 1 , find the velocity v that minimizes the discrepancy between the push-forward of ρ 0 and ρ 1 and the transport costs, i.e., � 1 � 1 def 2 � v ( x , t ) � 2 ρ ( x , t ) dxdt + G ( ρ ( · , 1 ) , ρ 1 ) minimize v ,ρ J MFG ( ρ, v ) = Ω 0 subject to ∂ t ρ + ∇ · ( ρ v ) = 0 , ρ ( · , 0 ) = ρ 0 ( · ) (CE) Examples for terminal cost G : L 2 , Kullback Leibler divergence,. . . Side note: relaxed OT problem is a potential mean field game (MFG) Title ML → OT Lag NN Exp OT → CNF Σ 5

Ruthotto ML meet OT @ Oct 2020 initial density, ρ 0 target density, ρ 1 density evolution ρ ( · , 1 ) push-fwd of ρ 0 Relaxed Dynamical Optimal Transport Given the initial density, ρ 0 , and the target density, ρ 1 , find the velocity v that minimizes the discrepancy between the push-forward of ρ 0 and ρ 1 and the transport costs, i.e., � 1 � 1 def 2 � v ( x , t ) � 2 ρ ( x , t ) dxdt + G ( ρ ( · , 1 ) , ρ 1 ) minimize v ,ρ J MFG ( ρ, v ) = Ω 0 subject to ∂ t ρ + ∇ · ( ρ v ) = 0 , ρ ( · , 0 ) = ρ 0 ( · ) (CE) Examples for terminal cost G : L 2 , Kullback Leibler divergence,. . . Side note: relaxed OT problem is a potential mean field game (MFG) Title ML → OT Lag NN Exp OT → CNF Σ 5

Ruthotto ML meet OT @ Oct 2020 Relaxed Dynamic Optimal Transport: A Microscopic View A single agent with initial position x ∈ Ω aims at choosing v that minimizes � 1 1 2 � v ( s ) � 2 ds + G ( z ( 1 ) , ρ ( z ( 1 ) , 1 )) , J x , 0 ( v ) = 0 where their position changes according to ∂ t z ( s ) = v ( s ) , 0 ≤ s ≤ 1 , z ( 0 ) = x . ◮ G ( x , ρ ) = δ G ( ρ,ρ 1 ) ( x ) (variational derivative of G ) δρ ◮ agent interacts with the population through ρ and G ◮ z ( · ) is characteristic curve of (CE) starting at x Title ML → OT Lag NN Exp OT → CNF Σ 6

Ruthotto ML meet OT @ Oct 2020 Relaxed Dynamic Optimal Transport: A Microscopic View A single agent with initial position x ∈ Ω aims at choosing v that minimizes � 1 1 2 � v ( s ) � 2 ds + G ( z ( 1 ) , ρ ( z ( 1 ) , 1 )) , J x , 0 ( v ) = 0 where their position changes according to ∂ t z ( s ) = v ( s ) , 0 ≤ s ≤ 1 , z ( 0 ) = x . ◮ G ( x , ρ ) = δ G ( ρ,ρ 1 ) ( x ) (variational derivative of G ) δρ ◮ agent interacts with the population through ρ and G ◮ z ( · ) is characteristic curve of (CE) starting at x Useful to define the value of an agent’s state ( x , t ) as Φ( x , t ) = inf v J x , t ( v ) Title ML → OT Lag NN Exp OT → CNF Σ 6

Ruthotto ML meet OT @ Oct 2020 Hamilton-Jacobi-Bellman (HJB) Equation initial density, ρ 0 value function density evolution target density, ρ 1 Lasry & Lions ’06: First-order optimality conditions of relaxed OT are − ∂ t Φ( x , t ) + 1 2 �∇ Φ( x , t ) � 2 = 0 , Φ( x , 1 ) = G ( x , ρ ( x , 1 )) (HJB) and optimal strategy is v ( x , t ) = −∇ Φ( x , t ) , which gives ∂ t ρ ( x , t ) − ∇ · ( ρ ( x , t ) ∇ Φ( x , t )) = 0 , ρ ( x , 0 ) = ρ 0 ( x ) (CE) challenges: forward-backward structure and high-dimensionality of PDE system Title ML → OT Lag NN Exp OT → CNF Σ 7

Ruthotto ML meet OT @ Oct 2020 Machine Learning for High-Dimensional OT: Overview Three options for solving the problem 1. minimize J MFG w.r.t. ( ρ, v ) , or ( ρ, −∇ Φ) (variational problem) 2. minimize J x , t w.r.t. v or −∇ Φ for some points x (microscopic view) 3. compute value function by solving (HJB) and (CE) (high-dimensional PDEs) Title ML → OT Lag NN Exp OT → CNF Σ 8

Ruthotto ML meet OT @ Oct 2020 Machine Learning for High-Dimensional OT: Overview Three options for solving the problem 1. minimize J MFG w.r.t. ( ρ, v ) , or ( ρ, −∇ Φ) (variational problem) 2. minimize J x , t w.r.t. v or −∇ Φ for some points x (microscopic view) 3. compute value function by solving (HJB) and (CE) (high-dimensional PDEs) Idea: Combine advantages of the above to tackle curse of dimensionality Title ML → OT Lag NN Exp OT → CNF Σ 8

Ruthotto ML meet OT @ Oct 2020 Machine Learning for High-Dimensional OT: Overview Three options for solving the problem 1. minimize J MFG w.r.t. ( ρ, v ) , or ( ρ, −∇ Φ) (variational problem) 2. minimize J x , t w.r.t. v or −∇ Φ for some points x (microscopic view) 3. compute value function by solving (HJB) and (CE) (high-dimensional PDEs) Idea: Combine advantages of the above to tackle curse of dimensionality ◮ formulate as variational problem. minimize J MFG ( ρ, −∇ Φ) ◮ eliminate (CE) with Lagrangian PDE solver � mesh-free, parallel ◮ parameterize Φ with NN � universal approximator, mesh-free, cheap(?) ◮ penalize violations of (HJB) � regularity, global convergence(?) Title ML → OT Lag NN Exp OT → CNF Σ 8

Ruthotto ML meet OT @ Oct 2020 Lagrangian Method for Continuity Equation Assume Φ given. Then, the solution to ∂ t ρ ( x , t ) − ∇ · ( ρ ( x , t ) ∇ Φ( x , t )) = 0 , ρ ( x , 0 ) = ρ 0 ( x ) satisfies ρ ( z ( x , t ) , t ) det ∇ z ( x , t ) = ρ 0 ( x ) along the characteristic curve ∂ t z ( x , t ) = −∇ Φ( z ( x , t )) , z ( x , 0 ) = x . Title ML → OT Lag NN Exp OT → CNF Σ 9

Ruthotto ML meet OT @ Oct 2020 Lagrangian Method for Continuity Equation Assume Φ given. Then, the solution to ∂ t ρ ( x , t ) − ∇ · ( ρ ( x , t ) ∇ Φ( x , t )) = 0 , ρ ( x , 0 ) = ρ 0 ( x ) satisfies ρ ( z ( x , t ) , t ) det ∇ z ( x , t ) = ρ 0 ( x ) along the characteristic curve ∂ t z ( x , t ) = −∇ Φ( z ( x , t )) , z ( x , 0 ) = x . instead of computing det ∇ z ( x , t ) (cost O ( d 3 ) flops) use � 1 def = log det( ∇ z ( x , t )) = l ( x , t ) ∆Φ( z ( x , t ) , t ) dt 0 Hint: Compute z and l in one ODE solve (parallelize over x 1 , x 2 , . . . ). Title ML → OT Lag NN Exp OT → CNF Σ 9

Ruthotto ML meet OT @ Oct 2020 Lagrangian Method for Optimal Transport � � minimize Φ c L ( x , 1 ) + G ( z ( x , 1 )) + α 1 c H ( x , 1 ) + α 2 � Φ( z ( x , 1 ) , 1 ) − G ( z ( x , 1 )) � E ρ 0     −∇ Φ( z ( x , t ) , t ) z ( x , t ) l ( x , t ) − ∆Φ( z ( x , t ) , t )     subject to  = t ∈ ( 0 , 1 ] ∂ t  ,    1  c L ( x , t ) 2 �∇ Φ( z ( x , t ) , t ) � 2   � 2 �∇ Φ( z ( x , t ) , t ) � 2 � � ∂ t Φ( z ( x , t ) , t ) + 1 c H ( x , t ) � z ( x , 0 ) = x , l ( x , 0 ) = c L ( x , 0 ) = c H ( x , 0 ) = 0 Title ML → OT Lag NN Exp OT → CNF Σ 10

Machine Learning Optimal Transport Sayas Numerics Seminar Lars - PowerPoint PPT Presentation

Ruthotto ML meet OT @ Oct 2020 Machine Learning Optimal Transport Sayas Numerics Seminar Lars Ruthotto Departments of Mathematics and Computer Science Emory University lruthotto@emory.edu @lruthotto Title ML OT Lag NN Exp OT

Martingale Optimal Transport in Higher Hadrien De March Dimension Optimal transport

Introduction to Machine Learning Introduction to Machine Learning Introduction to Machine

Quantum Machine Learning Adam Brown, HEP-AI Quantum Computing Machine Learning Quantum

MICROSOFT AZURE MACHINE LEARNING Oscar Naim Microsoft Microsoft Azure Machine Learning What is

MACHINE LEARNING Overview 1 1 APPLIED MACHINE LEARNING 2011-2012 APPLIED MACHINE LEARNING

MACHINE LEARNING kernels 1 MACHINE LEARNING 2012 MACHINE LEARNING Kernels: Intuition How

A Machine Learning Approach A Machine Learning Approach A Machine Learning Approach A Machine

1 Transport Layer Transport Layer Outline Message, Segment, Datagram Transport-layer

Optimal Transport for Machine Learning Aude Genevay CEREMADE (Universit Paris-Dauphine) DMA

An Optimal Transport View on Generalization Nemo Fournier January 13, 2020 An Optimal Transport

Welcome to the Machine Learning Toolbox! Machine Learning Toolbox Supervised learning caret

Optimal Agents Nick Hay 27th September 2005 1 / 36 Nick Hay Optimal Agents The Optimal Agent

Toward Computing Towards an Optimal . . . An (Almost) Optimal . . . Minor Problem an Optimal

Introduction to Machine Learning COMPSCI 371D Machine Learning COMPSCI 371D Machine

INTRODUCTION TO MACHINE LEARNING Joseph C. Osborn CS 51A Spring 2020 Machine Learning is

Human and Machine Learning Tom Mitchell Machine Learning Department Carnegie Mellon University

GRAVITY AS AN EMERGENT FORCE Erik Verlinde University of Amsterdam Emergence Current Paradigm

Bose-Einstein condensation and limit theorems Kay Kirkpatrick, UIUC 2015 Bose-Einstein

Optimization of bodies with locally periodic microstructure Cristian Barbarosie CMAF,

Macroscopic convective phenomena in non-uniformly heated liquid mixtures Vitaly . Demin 1 ,

Mathematical models of self-organization Pierre Degond Imperial College London

Birth of the Ehrenfest time Quantum Chaos in Mesoscopic Superconductivity Philippe Jacquod U

Cem 484: Molecular Thermodynamics Prof. Liddick Office: NSCL 1006 Email:

Condensation of fermion pairs in a domain Marius Lemm (Caltech) joint with Rupert L. Frank and