

SLIDE 1

Geometric losses for distributional learning

Arthur Mensch (1), Mathieu Blondel (2), Gabriel Peyré (1)

(1) École Normale Supérieure, DMA; Centre National pour la Recherche Scientifique, Paris, France
(2) NTT Communication Science Laboratories, Kyoto, Japan

June 12, 2019

SLIDE 2

1. Introduction
2. Fenchel-Young losses for distribution spaces
3. Geometric softmax from Sinkhorn negentropies
4. Applications

SLIDES 3-8

Introduction: Predicting distributions

SLIDES 9-11

Contribution: losses and links for continuous metrized outputs

Handling output geometry:
- Link and loss with a cost between classes C : Y × Y → R
- Output distributions over a continuous space Y

New geometric losses and associated link functions:
1. Construction from the duality between distributions and scores
2. Needed: a convex functional on the distribution space, provided by regularized optimal transport

SLIDES 12-13

Background: learning with a cost over outputs Y

Cost augmentation of losses [1, 2]: convex cost-aware loss L_C : [1, d] × R^d → R
⚠ Undefined link function R^d → △^d: what should we predict at test time?

Use a Wasserstein distance between output distributions [3]:
- The ground metric C defines a distance W_C between distributions
- Prediction with a softmax link: ℓ(α, f) = W_C(softmax(f), α)
⚠ Non-convex loss, costly to compute

[1] Ioannis Tsochantaridis et al. "Large margin methods for structured and interdependent output variables". In: JMLR (2005).
[2] Kevin Gimpel and Noah A. Smith. "Softmax-margin CRFs: Training log-linear models with cost functions". In: NAACL. 2010.
[3] Charlie Frogner et al. "Learning with a Wasserstein loss". In: NIPS. 2015.
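To make this baseline concrete, here is a minimal log-domain Sinkhorn sketch (NumPy; the function name, the fixed iteration budget, and the ε value are my own choices, not the paper's): it approximates W_C(softmax(f), α) on a discrete Y, illustrating both the softmax link and the inner iterative solver that makes this loss costly.

```python
import numpy as np
from scipy.special import logsumexp, softmax

def sinkhorn_distance(a, b, C, eps=0.1, n_iter=500):
    """Entropic approximation of W_C(a, b) by alternating log-domain dual updates."""
    log_a, log_b = np.log(a), np.log(b)
    f, g = np.zeros_like(a), np.zeros_like(b)
    for _ in range(n_iter):
        f = -eps * logsumexp(log_b + (g - C) / eps, axis=1)    # f_i = −ε log Σ_j b_j e^{(g_j − C_ij)/ε}
        g = -eps * logsumexp(log_a + (f - C.T) / eps, axis=1)  # symmetric update for g
    # Recover the regularized transport plan and evaluate its cost.
    pi = np.exp(log_a[:, None] + log_b[None, :] + (f[:, None] + g[None, :] - C) / eps)
    return float(np.sum(pi * C))

C = np.abs(np.arange(3)[:, None] - np.arange(3)[None, :]).astype(float)  # ordinal ground cost
scores, alpha = np.array([2.0, 0.0, -1.0]), np.array([0.2, 0.5, 0.3])
loss = sinkhorn_distance(softmax(scores), alpha, C)  # W_C(softmax(f), α): non-convex in the scores
print(loss)
```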

SLIDE 14

1. Introduction
2. Fenchel-Young losses for distribution spaces
3. Geometric softmax from Sinkhorn negentropies
4. Applications

SLIDES 15-20

Predicting distributions from topological duality


SLIDES 21-24

All you need is a convex functional

Fenchel-Young losses [4, 5]: a convex function Ω : △^d → R and its conjugate

  Ω*(f) = max_{α ∈ △^d} ⟨α, f⟩ − Ω(α)

  ℓ_Ω(α, f) = Ω(α) + Ω*(f) − ⟨α, f⟩ ≥ 0

Define link functions between dual and primal:

  ∇Ω(α) = argmin_{f ∈ R^d} ℓ_Ω(α, f)        ∇Ω*(f) = argmin_{α ∈ △^d} ℓ_Ω(α, f)

[4] John C. Duchi et al. "Multiclass Classification, Information, Divergence, and Surrogate Risk". In: Annals of Statistics (2018).
[5] Mathieu Blondel et al. "Learning Classifiers with Fenchel-Young Losses: Generalized Entropies, Margins, and Algorithms". In: AISTATS. 2019.

SLIDES 25-27

Discrete canonical example: Shannon entropy

  Ω(α) = −H(α) = Σ_{i=1}^d α_i log α_i        Ω*(f) = logsumexp(f)

Not defined on continuous distributions, and cost-agnostic.
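A small NumPy sketch of this canonical case (function names are mine): with the Shannon negentropy, Ω* is logsumexp, the link ∇Ω*(f) is the softmax, and the Fenchel-Young loss vanishes exactly when the link matches the target distribution.

```python
import numpy as np
from scipy.special import logsumexp, softmax

def shannon_negentropy(alpha, eps=1e-12):
    """Ω(α) = Σ_i α_i log α_i (non-positive on the simplex)."""
    return float(np.sum(alpha * np.log(alpha + eps)))

def fy_loss(alpha, f):
    """ℓ_Ω(α, f) = Ω(α) + Ω*(f) − ⟨α, f⟩ ≥ 0, with Ω* = logsumexp."""
    return shannon_negentropy(alpha) + logsumexp(f) - float(alpha @ f)

alpha = np.array([0.7, 0.2, 0.1])
print(fy_loss(alpha, np.log(alpha)))   # ≈ 0: softmax(log α) = α, so the loss vanishes
print(fy_loss(alpha, np.zeros(3)))     # > 0 for a score vector whose softmax is not α
print(softmax(np.log(alpha)))          # ∇Ω*(f), recovering α
```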

SLIDE 28

1. Introduction
2. Fenchel-Young losses for distribution spaces
3. Geometric softmax from Sinkhorn negentropies
4. Applications

SLIDES 29-31

Sinkhorn entropies from regularized optimal transport

Self-regularized optimal transport distance:

  Ω_C(α) = −½ OT_{C,ε=2}(α, α) = − max_{f ∈ C(Y)} ⟨α, f⟩ − log⟨α ⊗ α, exp((f ⊕ f − C)/2)⟩

Continuous and convex. Special cases:
- ε → ∞: MMD autocorrelation
- Specific costs C (e.g. +∞ off the diagonal) recover the Shannon entropy and the Gini index
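On a discrete Y, the maximization above reduces to a symmetric Sinkhorn fixed point; the following NumPy sketch follows that reading (function names and the averaged update are mine). At a fixed point f = T(f) the log term vanishes, so Ω_C(α) = −⟨α, f⟩.

```python
import numpy as np
from scipy.special import logsumexp

def sym_sinkhorn_potential(alpha, C, n_iter=200, eps=2.0):
    """Fixed point of T(f)_i = −ε log Σ_j α_j exp((f_j − C_ij)/ε), via averaged updates."""
    log_alpha = np.log(alpha)
    f = np.zeros_like(alpha)
    for _ in range(n_iter):
        T = -eps * logsumexp(log_alpha + (f - C) / eps, axis=1)
        f = 0.5 * (f + T)     # averaging stabilizes the symmetric iteration
    return f

def sinkhorn_negentropy(alpha, C):
    f = sym_sinkhorn_potential(alpha, C)
    return -float(alpha @ f)  # at the fixed point, Ω_C(α) = −⟨α, f⟩

# Sanity check on the Shannon special case: with zero self-cost and a huge
# off-diagonal cost, Ω_C(α) should match Σ_i α_i log α_i.
alpha = np.array([0.7, 0.2, 0.1])
C = 1e3 * (1 - np.eye(3))
print(sinkhorn_negentropy(alpha, C), np.sum(alpha * np.log(alpha)))
```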

SLIDE 32

Dual mapping from Sinkhorn negentropy

Sinkhorn entropy:

  Ω(α) = − max_{f ∈ C(Y)} ⟨α, f⟩ − log⟨α ⊗ α, e^{(f ⊕ f − C)/2}⟩

SLIDE 33

Returning to the primal: geometric softmax

  Ω* = g-logsumexp : f ↦ − log min_{α ∈ M₁⁺(Y)} ⟨α ⊗ α, exp(−(f ⊕ f + C)/2)⟩

∇Ω* = geometric-softmax: the inner problem is a simple quadratic in α, since ⟨α ⊗ α, K⟩ = αᵀKα.

SLIDE 34

Geometric loss construction and computation

Training with the geometric logistic loss:

  ℓ_C(α, f) = geometric-LSE_C(f) + sinkhorn-negentropy_C(α) − ⟨α, f⟩

- Tractable for discrete Y: mirror descent or L-BFGS, about 10× as costly as a softmax
- Backpropagation through ∇Ω* = geometric-softmax
- Continuous Y: Frank-Wolfe scheme, adding one Dirac at each iteration
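A minimal discrete-case sketch of the mirror-descent route (NumPy; names, step size, and iteration budget are mine): exponentiated-gradient steps minimize the simple quadratic α ↦ ⟨α ⊗ α, K⟩ = αᵀKα over the simplex, with K = exp(−(f ⊕ f + C)/2) as on the previous slide.

```python
import numpy as np

def geometric_softmax(f, C, n_iter=2000, lr=0.5):
    """∇Ω*(f): minimize αᵀKα over the simplex by exponentiated gradient."""
    K = np.exp(-(f[:, None] + f[None, :] + C) / 2.0)
    alpha = np.full_like(f, 1.0 / len(f))    # start from the uniform distribution
    for _ in range(n_iter):
        grad = 2.0 * K @ alpha               # gradient of the quadratic: 2Kα
        alpha = alpha * np.exp(-lr * grad)   # multiplicative (mirror) step
        alpha /= alpha.sum()                 # renormalize onto the simplex
    return alpha

# Shannon sanity check: a huge off-diagonal cost makes K effectively diagonal
# with entries e^{−f_i}, whose simplex minimizer is the ordinary softmax.
f = np.array([2.0, 1.0, 0.0])
C = 1e3 * (1 - np.eye(3))
print(geometric_softmax(f, C))
print(np.exp(f) / np.exp(f).sum())
```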


SLIDES 35-36

Properties of the geometric-softmax

[Figure: binary case. Left: the link coordinate ∇Ω*(f)₁ against f₁ − f₂, for costs scaled by γ. Right: the negentropy −Ω([α, 1 − α]) against α. Curves for γ = 0.1, γ = 2, γ = ∞.]

- ∇Ω* returns from Sinkhorn potentials: ∇Ω* ∘ ∇Ω = Id
- ∇Ω ∘ ∇Ω* projects f onto F, the set of symmetric Sinkhorn potentials
- ∇Ω*(f) can be sparse, a consequence of the minimization over the simplex
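The first identity can be checked numerically by composing the two sketches above, under the assumption (a reading of mine, consistent with the dual formulas on the preceding slides) that ∇Ω(α) is the negated symmetric Sinkhorn potential up to an additive constant, which the link ignores:

```python
import numpy as np  # reusing sym_sinkhorn_potential and geometric_softmax from above

alpha = np.array([0.5, 0.3, 0.2])
C = np.abs(np.arange(3)[:, None] - np.arange(3)[None, :]).astype(float)
f = -sym_sinkhorn_potential(alpha, C)  # a representative of ∇Ω(α)
print(geometric_softmax(f, C))         # ≈ α: ∇Ω* returns from the potential
```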

SLIDE 37

Properties of the geometric-softmax:
- ε → 0: mode finding
- ε → ∞: positive deconvolution

SLIDES 38-40

Consistent learning with the geometric logistic loss

Bregman divergence from the Sinkhorn negentropy:

  D(α | β) = Ω(α) − Ω(β) − ⟨∇Ω(β), α − β⟩

Sample distributions (x_i, α_i)_i ∈ X × M₁⁺(Y).

Fisher consistency:

  min_{β : X → M₁⁺(Y)} E[D(α | β(x))] = min_{g : X → C(Y)} E[ℓ_C(α, g(x))],

with the minimizers related through the link, β = ∇Ω* ∘ g.
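Numerically, this divergence can be assembled from the potential sketch above. With the fixed-point normalization, Ω(α) = −⟨α, g_α⟩ and ∇Ω(β) = −g_β up to an additive constant that cancels in ⟨∇Ω(β), α − β⟩, so the divergence collapses to ⟨α, g_β − g_α⟩ (a derivation of mine from the slides' formulas; in the Shannon special case it reduces to KL(α‖β)):

```python
import numpy as np  # reusing sym_sinkhorn_potential from above

def sinkhorn_bregman(alpha, beta, C):
    """D(α|β) = Ω(α) − Ω(β) − ⟨∇Ω(β), α − β⟩, reduced to ⟨α, g_β − g_α⟩."""
    g_a = sym_sinkhorn_potential(alpha, C)
    g_b = sym_sinkhorn_potential(beta, C)
    return float(alpha @ (g_b - g_a))

alpha, beta = np.array([0.7, 0.2, 0.1]), np.array([0.3, 0.4, 0.3])
C = np.abs(np.arange(3)[:, None] - np.arange(3)[None, :]).astype(float)
print(sinkhorn_bregman(alpha, beta, C))   # > 0 for α ≠ β
print(sinkhorn_bregman(alpha, alpha, C))  # ≈ 0
```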

SLIDE 41

1. Introduction
2. Fenchel-Young losses for distribution spaces
3. Geometric softmax from Sinkhorn negentropies
4. Applications


SLIDES 42-43

Applications: variational auto-encoder

- Goal: generate nearly one-dimensional distributions in 2D images
- Dataset: Google Quickdraw
- The traditional sigmoid activation layer is replaced by the geometric softmax
- Deconvolutional effect; cost-informed non-linearity

SLIDE 44

Applications: variational auto-encoders

[Figure: reconstructions and generated samples, softmax vs. g-softmax]

Better-defined generated images


SLIDES 45-46

Conclusion

Geometric softmax: a new loss and projection onto output probabilities
- Discrete or continuous outputs, aware of a cost between outputs
- Fenchel duality in Banach spaces + regularized optimal transport
- Applications in VAEs and ordinal regression

Future directions:
- Improving computation methods (continuous Frank-Wolfe)
- Geometric logistic loss in super-resolution [6]

[6] Nicholas Boyd et al. "DeepLoco: Fast 3D localization microscopy using neural networks". In: bioRxiv (2018), p. 267096.

SLIDE 47

Mathieu Blondel, Gabriel Peyré

Poster #179