Optimal Transport for Machine Learning
Aude Genevay
CEREMADE (Université Paris-Dauphine), DMA (École Normale Supérieure), MOKAPLAN Team (INRIA Paris)
Imaging in Paris - February 2018
Optimal transport Aude Genevay Entropy Regularized OT Applications in Imaging Large Scale "OT" for Machine Learning Application to Generative Models
Outline
1. Entropy Regularized OT
2. Applications in Imaging
3. Large Scale "OT" for Machine Learning
4. Application to Generative Models
Shortcomings of OT
Two main issues when using OT in practice:
- Poor sample complexity: many samples from µ and ν are needed to get a good approximation of W(µ, ν).
- Heavy computational cost: solving discrete OT requires solving an LP, e.g. with a network simplex solver in O(n³ log n) [Pele and Werman '09].
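To make the LP concrete, the discrete problem can be solved directly with a generic solver; a minimal sketch using scipy's `linprog`, with illustrative random points and uniform weights (all sizes are our assumptions):

```python
# Solve a small discrete OT problem as a linear program:
# minimize <C, gamma> subject to marginal constraints on gamma >= 0.
import numpy as np
from scipy.optimize import linprog

n, m = 3, 4
rng = np.random.default_rng(0)
x = rng.normal(size=(n, 1))          # source points
y = rng.normal(size=(m, 1))          # target points
mu = np.full(n, 1.0 / n)             # uniform source weights
nu = np.full(m, 1.0 / m)             # uniform target weights
C = (x - y.T) ** 2                   # squared-distance ground cost

# Marginal constraints: row sums of gamma equal mu, column sums equal nu.
A_eq = np.zeros((n + m, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0
for j in range(m):
    A_eq[n + j, j::m] = 1.0
b_eq = np.concatenate([mu, nu])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
gamma = res.x.reshape(n, m)
print("W(mu, nu) ~", res.fun)
```

A generic LP is only viable for small problems; the dedicated network simplex solver cited above is far faster in practice.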
Entropy!
- Basically: adding an entropic regularization smoothes the constraint.
- Makes the problem easier:
  - yields an unconstrained dual problem
  - the discrete case can be solved efficiently with an iterative algorithm (more on that later)
- For ML applications, the regularized Wasserstein distance behaves better than the standard one.
- In high dimension, it helps avoid overfitting.
Entropic Relaxation of OT [Cuturi ’13]
Add an entropic penalty to the Kantorovich formulation of OT:

min_{γ ∈ Π(µ,ν)} ∫_{X×Y} c(x, y) dγ(x, y) + ε KL(γ | µ ⊗ ν)

where

KL(γ | µ ⊗ ν) := ∫_{X×Y} ( log( dγ/d(µ⊗ν) (x, y) ) − 1 ) dγ(x, y)
Dual Formulation
max_{u ∈ C(X), v ∈ C(Y)} ∫_X u(x) dµ(x) + ∫_Y v(y) dν(y) − ε ∫_{X×Y} e^{(u(x)+v(y)−c(x,y))/ε} dµ(x) dν(y)

The constraint u(x) + v(y) ≤ c(x, y) of standard OT is replaced by a smooth penalty term.
Dual Formulation
The dual problem is concave in u and v; the first-order condition for each variable yields:

∇_u = 0 ⇔ u(x) = −ε log( ∫_Y e^{(v(y)−c(x,y))/ε} dν(y) )
∇_v = 0 ⇔ v(y) = −ε log( ∫_X e^{(u(x)−c(x,y))/ε} dµ(x) )
The Discrete Case
Dual problem:

max_{u ∈ ℝ^n, v ∈ ℝ^m} Σ_{i=1}^n u_i µ_i + Σ_{j=1}^m v_j ν_j − ε Σ_{i,j=1}^{n,m} e^{(u_i + v_j − c(x_i, y_j))/ε} µ_i ν_j

First-order conditions for each variable:

∇_u = 0 ⇔ u_i = −ε log( Σ_{j=1}^m e^{(v_j − c(x_i, y_j))/ε} ν_j )
∇_v = 0 ⇔ v_j = −ε log( Σ_{i=1}^n e^{(u_i − c(x_i, y_j))/ε} µ_i )

⇒ Do alternate maximizations!
Sinkhorn’s Algorithm
- Iterates: (a, b) := (e^{u/ε}, e^{v/ε})

Sinkhorn algorithm [Cuturi '13]:
initialize b ← 1_m, K ← (e^{−c_ij/ε})_ij
repeat
  a ← µ ⊘ Kb
  b ← ν ⊘ K^T a
return γ = diag(a) K diag(b)
- each iteration has O(nm) complexity (matrix-vector multiplication)
- can be improved to O(n log n) on gridded spaces with convolutions [Solomon et al. '15]
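The pseudocode above translates almost line-for-line into NumPy; a minimal sketch with illustrative marginals and cost (the function name `sinkhorn`, the point sets, and the iteration count are our assumptions):

```python
import numpy as np

def sinkhorn(mu, nu, C, eps, n_iters=1000):
    """Sinkhorn iterations for entropy-regularized discrete OT.

    Returns the regularized coupling gamma = diag(a) K diag(b)."""
    K = np.exp(-C / eps)
    b = np.ones_like(nu)
    for _ in range(n_iters):
        a = mu / (K @ b)          # a <- mu ./ (K b)
        b = nu / (K.T @ a)        # b <- nu ./ (K^T a)
    return a[:, None] * K * b[None, :]

# Illustrative example: uniform marginals on random 2-D point clouds.
rng = np.random.default_rng(0)
x, y = rng.normal(size=(5, 2)), rng.normal(size=(6, 2))
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared distances
mu, nu = np.full(5, 0.2), np.full(6, 1 / 6)
gamma = sinkhorn(mu, nu, C, eps=0.1)
print("total mass:", gamma.sum())
```

Each matrix-vector product is the O(nm) step mentioned above; for very small ε this naive version can underflow, and log-domain updates are the usual remedy.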
Sinkhorn - Toy Example
Marginals µ and ν. Top: evolution of γ with the number of iterations ℓ. Bottom: evolution of γ with the regularization parameter ε.
Sinkhorn - Convergence
Definition (Hilbert metric)
A projective metric defined for x, y ∈ ℝ^d_{++} by

d_H(x, y) := log( max_i(x_i/y_i) / min_i(x_i/y_i) )

Theorem
The iterates (a^(ℓ), b^(ℓ)) converge linearly for the Hilbert metric.
Remark: the contraction coefficient deteriorates quickly as ε → 0 (exponentially in the worst case).
Sinkhorn - Convergence
Constraint violation

We have the following bound on the iterates:

d_H(a^(ℓ), a⋆) ≤ κ d_H(γ^(ℓ) 1_m, µ)

So monitoring the violation of the marginal constraints is a good way to monitor the convergence of Sinkhorn's algorithm.
Figure: ‖γ^(ℓ) 1_m − µ‖ for various regularizations.
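A small numerical illustration of both diagnostics, using the Hilbert-metric formula above; the problem data (4×4 random cost, ε = 0.5, 50 iterations) are illustrative assumptions:

```python
import numpy as np

def hilbert_metric(x, y):
    # d_H(x, y) = log( max_i(x_i/y_i) / min_i(x_i/y_i) ), for x, y > 0
    r = x / y
    return np.log(r.max() / r.min())

# Run Sinkhorn on a small random problem and track both diagnostics.
rng = np.random.default_rng(1)
C = rng.uniform(size=(4, 4))
mu = nu = np.full(4, 0.25)
K = np.exp(-C / 0.5)                      # eps = 0.5
b = np.ones(4)
violations, dHs, a_prev = [], [], None
for _ in range(50):
    a = mu / (K @ b)
    b = nu / (K.T @ a)
    gamma = a[:, None] * K * b[None, :]
    violations.append(np.abs(gamma.sum(axis=1) - mu).sum())
    if a_prev is not None:
        dHs.append(hilbert_metric(a, a_prev))  # distance between iterates
    a_prev = a
print("marginal violation:", violations[0], "->", violations[-1])
```

Both quantities shrink as the iterates converge, which is exactly why the marginal violation is a practical stopping criterion.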
Color Transfer
Image courtesy of G. Peyré
Shape / Image Barycenters
Regularized Wasserstein Barycenters [Nenna et al. '15]

µ̄ = argmin_{µ ∈ Σ_n} Σ_k W_ε(µ_k, µ)
Image from [Solomon et al. ’15]
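A sketch of how such barycenters can be computed on a fixed grid with a Sinkhorn-like iterative Bregman projection scheme; the grid, weights, ε, and iteration count below are illustrative assumptions:

```python
import numpy as np

def barycenter(mus, C, eps, weights, n_iters=1000):
    """Entropic Wasserstein barycenter of histograms `mus` (one per row)
    on a fixed grid with ground cost C, via iterative Bregman projections."""
    K = np.exp(-C / eps)
    v = np.ones_like(mus)
    for _ in range(n_iters):
        u = mus / (v @ K.T)                  # u_k = mu_k / (K v_k)
        # weighted geometric mean of the K^T u_k, computed in log domain
        p = np.exp(weights @ np.log(u @ K))
        v = p[None, :] / (u @ K)             # v_k = p / (K^T u_k)
    return p

# Two bumps on a 1-D grid; their barycenter concentrates in the middle.
grid = np.linspace(0, 1, 20)
C = (grid[:, None] - grid[None, :]) ** 2
mu1 = np.exp(-((grid - 0.2) ** 2) / 0.005); mu1 /= mu1.sum()
mu2 = np.exp(-((grid - 0.8) ** 2) / 0.005); mu2 /= mu2.sum()
p = barycenter(np.stack([mu1, mu2]), C, eps=0.05,
               weights=np.array([0.5, 0.5]))
print("barycenter mode at:", grid[p.argmax()])
```

With the quadratic cost the barycenter of two symmetric bumps peaks near the midpoint, with extra blur coming from the entropic regularization.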
Sinkhorn loss
Consider entropy-regularized OT:

min_{π ∈ Π(µ,ν)} ∫_{X×Y} c(x, y) dπ(x, y) + ε KL(π | µ ⊗ ν)

Regularized loss:

W_{c,ε}(µ, ν) := ∫_{X×Y} c(x, y) dπ_ε(x, y)

where π_ε is the solution of the regularized problem above.
Sinkhorn Divergences : interpolation between OT and MMD
Theorem
The Sinkhorn loss between two measures µ, ν is defined as:

W̄_{c,ε}(µ, ν) = 2 W_{c,ε}(µ, ν) − W_{c,ε}(µ, µ) − W_{c,ε}(ν, ν)

with the following limiting behavior in ε:
1. as ε → 0, W̄_{c,ε}(µ, ν) → 2 W_c(µ, ν)
2. as ε → +∞, W̄_{c,ε}(µ, ν) → ‖µ − ν‖_{−c}
where ‖·‖_{−c} is the MMD distance whose kernel is minus the cost from OT.
Remark: some conditions are required on c to get an MMD distance when ε → ∞. In particular, c = ‖·‖_p^p with 0 < p < 2 is valid.
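The three Sinkhorn-loss terms in the definition above can be computed by reusing the Sinkhorn iterations; a minimal sketch with a squared-Euclidean cost (sample sizes, ε, and iteration counts are illustrative):

```python
import numpy as np

def sinkhorn_cost(mu, nu, C, eps, n_iters=500):
    # W_{c,eps}(mu, nu): transport cost <gamma, C> at the regularized optimum.
    K = np.exp(-C / eps)
    b = np.ones_like(nu)
    for _ in range(n_iters):
        a = mu / (K @ b)
        b = nu / (K.T @ a)
    return ((a[:, None] * K * b[None, :]) * C).sum()

def pairwise_sq(p, q):
    return ((p[:, None, :] - q[None, :, :]) ** 2).sum(-1)

def sinkhorn_divergence(x, y, mu, nu, eps):
    # bar{W}_{c,eps} = 2 W(mu, nu) - W(mu, mu) - W(nu, nu)
    return (2 * sinkhorn_cost(mu, nu, pairwise_sq(x, y), eps)
            - sinkhorn_cost(mu, mu, pairwise_sq(x, x), eps)
            - sinkhorn_cost(nu, nu, pairwise_sq(y, y), eps))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 2))
mu = np.full(8, 1 / 8)
d_self = sinkhorn_divergence(x, x, mu, mu, eps=1.0)
d_shift = sinkhorn_divergence(x, x + 2.0, mu, mu, eps=1.0)
print(d_self, d_shift)
```

The two corrective self-terms cancel the entropic bias, so the divergence of a measure against itself is (numerically) zero, while distinct measures give a positive value.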
Sample Complexity
Sample Complexity of OT and MMD
Let µ be a probability distribution on ℝ^d, and µ̂_n an empirical measure from µ:

W(µ, µ̂_n) = O(n^{−1/d})
MMD(µ, µ̂_n) = O(n^{−1/2})

⇒ the number n of samples needed to reach a precision η on the Wasserstein distance grows exponentially with the dimension d of the space!
Sample Complexity - Sinkhorn loss
The sample complexity of the Sinkhorn loss seems to improve as ε grows.
Plots courtesy of G. Peyré and M. Cuturi
Generative Models
Figure: Illustration of Density Fitting on a Generative Model
Density Fitting with Sinkhorn loss "Formally"
Solve min_θ E(θ) where E(θ) := W̄_{c,ε}(µ_θ, ν)

⇒ Issue: intractable gradient
Approximating Sinkhorn loss
- Rather than approximating the gradient, approximate the loss itself.
- Minibatches: Ê(θ)
  - sample x_1, . . . , x_m from µ_θ
  - use the empirical Wasserstein distance W_{c,ε}(µ̂_θ, ν̂) where µ̂_θ = (1/m) Σ_{i=1}^m δ_{x_i}
- Use L iterations of Sinkhorn's algorithm: Ê^(L)(θ)
  - compute L steps of the algorithm
  - use this as a proxy for W_{c,ε}(µ̂_θ, ν̂)
Computing the Gradient in Practice
Figure: Scheme of the loss approximation. The generative model maps latent samples (z_1, . . . , z_m) through layers θ_1, θ_2 to samples (x_1, . . . , x_m); together with the input data (y_1, . . . , y_n) these define the cost matrix C = (c(x_i, y_j))_{i,j} and kernel K = e^{−C/ε}; L Sinkhorn iterations (a^ℓ, b^ℓ), ℓ = 1, . . . , L, starting from 1_m, yield the output Ê^(L)(θ) = ⟨(C ⊙ K) b^L, a^L⟩.
- Compute the exact gradient of Ê^(L)(θ) with autodiff.
- Backpropagation through the above graph.
- Same computational cost as an evaluation of Ê^(L)(θ).
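As a framework-free sketch of this pipeline, the snippet below evaluates Ê^(L)(θ) for a toy generator g_θ(z) = θz and replaces the autodiff backward pass with central finite differences; the generator, step sizes, and sample sizes are illustrative assumptions, not the setup used in the experiments:

```python
import numpy as np

def sinkhorn_loss_L(xs, ys, eps, L):
    # hat{E}^{(L)}: run exactly L Sinkhorn iterations, then read off <gamma, C>.
    C = ((xs[:, None, :] - ys[None, :, :]) ** 2).sum(-1)
    m, n = len(xs), len(ys)
    K = np.exp(-C / eps)
    b = np.ones(n)
    for _ in range(L):
        a = (1.0 / m) / (K @ b)       # uniform weights on the minibatch
        b = (1.0 / n) / (K.T @ a)
    return ((a[:, None] * K * b[None, :]) * C).sum()

rng = np.random.default_rng(0)
z = rng.normal(size=(16, 1))           # latent samples
data = 2.0 * rng.normal(size=(16, 1))  # "real" samples

def E_hat(theta):
    return sinkhorn_loss_L(theta * z, data, eps=0.5, L=20)

def grad(theta, h=1e-4):
    # central finite differences stand in for the autodiff backward pass
    return (E_hat(theta + h) - E_hat(theta - h)) / (2 * h)

theta = 0.5
for _ in range(50):
    theta -= 0.05 * grad(theta)
print("loss:", E_hat(0.5), "->", E_hat(theta))
```

In practice one differentiates through the L iterations with backpropagation at the same cost as the forward pass; finite differences are used here only to keep the sketch dependency-free.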
Numerical Results on MNIST (L2 cost)
Figure: Samples from MNIST dataset
Numerical Results on MNIST (L2 cost)
Figure: Fully connected NN with 2 hidden layers
Numerical Results on MNIST (L2 cost)
Figure: Manifolds in the latent space for various parameters
Learning the cost [Li et al. ’17, Bellemare et al. ’17]
- On complex datasets, the choice of a good ground metric c is not trivial.
- Use a parametric cost function c_φ(x, y) = ‖f_φ(x) − f_φ(y)‖_2^2 (where f_φ : X → ℝ^d).
- The optimization problem becomes a min-max (like GANs): min_θ max_φ W̄_{c_φ,ε}(µ_θ, ν)
- Use the same approximations, but alternate between updating the cost parameters φ and the measure parameters θ.
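A minimal sketch of such a parametric cost with a linear feature map, a hypothetical stand-in for the neural network f_φ used in practice (the shapes and the name `cost_phi` are our assumptions):

```python
import numpy as np

def cost_phi(phi, x, y):
    # c_phi(x, y) = || f_phi(x) - f_phi(y) ||_2^2 with a linear feature
    # map f_phi(x) = phi @ x (illustrative; a neural net in practice).
    fx, fy = x @ phi.T, y @ phi.T
    return ((fx[:, None, :] - fy[None, :, :]) ** 2).sum(-1)

rng = np.random.default_rng(0)
phi = rng.normal(size=(3, 5))        # feature map from R^5 to R^3
x = rng.normal(size=(4, 5))          # generated samples
y = rng.normal(size=(6, 5))          # data samples
C = cost_phi(phi, x, y)
print(C.shape)  # (4, 6)
```

The resulting cost matrix C feeds into the same Sinkhorn iterations as before; the min-max training alternates gradient steps on θ (through the samples x) and on φ (through C).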
Numerical Results on CIFAR (learning the cost)
Figure: Samples from CIFAR dataset
Numerical Results on CIFAR (learning the cost)
Figure: Fully connected NN with 2 hidden layers
Numerical Results on CIFAR (learning the cost)
(a) MMD (b) ε = 1000 (c) ε = 10
Figure: Samples from the generator trained on CIFAR 10 for MMD and the Sinkhorn loss (coming from the same samples in the latent space).
Which is better? It is not just about generating nice images, but about capturing a high-dimensional distribution... hard to evaluate.