Geometric losses for distributional learning




  1. Geometric losses for distributional learning. Arthur Mensch (1), Mathieu Blondel (2), Gabriel Peyré (1). (1) École Normale Supérieure, DMA; Centre National pour la Recherche Scientifique, Paris, France. (2) NTT Communication Science Laboratories, Kyoto, Japan. June 12, 2019

  2. Outline: 1. Introduction; 2. Fenchel-Young losses for distribution spaces; 3. Geometric softmax from Sinkhorn negentropies; 4. Applications

  3. Introduction: Predicting distributions. Arthur Mensch, Mathieu Blondel, Gabriel Peyré. Geometric losses for distributional learning, 1 / 16


  9. Contribution: losses and links for continuous, metrized outputs. Handling output geometry: a link and a loss built on a cost between classes, C : Y × Y → R, for output distributions over a continuous space Y. New geometric losses and associated link functions: 1. constructed from the duality between distributions and scores; 2. requiring a convex functional on distribution space, which regularized optimal transport provides.

  12. Background: learning with a cost over outputs Y. Cost augmentation of losses [1, 2]: a convex, cost-aware loss L_C : [1, d] × R^d → R. Caveat: the link function R^d → △^d is left undefined, so it is unclear what to predict at test time. Alternative: use a Wasserstein distance between output distributions [3]: the ground metric C defines a distance W_C between distributions, and prediction uses a softmax link, ℓ(α, f) ≜ W_C(softmax(f), α). Caveat: this loss is non-convex and costly to compute. [1] Ioannis Tsochantaridis et al. "Large margin methods for structured and interdependent output variables". JMLR, 2005. [2] Kevin Gimpel and Noah A. Smith. "Softmax-margin CRFs: Training log-linear models with cost functions". NAACL, 2010. [3] Charlie Frogner et al. "Learning with a Wasserstein loss". NIPS, 2015.
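The Wasserstein-loss pipeline above can be sketched in a few lines of numpy: a softmax link turns scores into a histogram, and Sinkhorn iterations approximate the entropy-regularized transport cost to the target distribution. Everything here (function names, the toy line-shaped cost, the value of ε) is an illustrative assumption, not the reference implementation of Frogner et al.

```python
import numpy as np

def softmax(f):
    """Stable softmax link: scores f -> probability vector."""
    z = np.exp(f - f.max())
    return z / z.sum()

def sinkhorn_ot(a, b, C, eps=0.1, n_iter=200):
    """Entropy-regularized OT cost <P, C> between histograms a and b."""
    K = np.exp(-C / eps)             # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):          # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]  # approximate transport plan
    return float((P * C).sum())

# Toy setup: 4 classes on a line, squared-distance ground cost.
y = np.arange(4.0)
C = (y[:, None] - y[None, :]) ** 2
alpha = np.array([0.0, 1.0, 0.0, 0.0])  # ground-truth distribution (class 1)
f = np.array([2.0, 1.0, 0.0, -1.0])     # model scores

loss = sinkhorn_ot(softmax(f), alpha, C)
```

Because the target is a Dirac, the plan is forced to send each predicted mass to class 1, so the loss weights each mistake by its squared distance to the truth, exactly the geometry-awareness the plain softmax loss lacks.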


  15. Predicting distributions from topological duality


  21. All you need is a convex functional. Fenchel-Young losses [4, 5]: take a convex function Ω : △^d → R with conjugate Ω⋆(f) = max_{α ∈ △^d} ⟨α, f⟩ − Ω(α). The loss ℓ_Ω(α, f) = Ω(α) + Ω⋆(f) − ⟨α, f⟩ ≥ 0 by the Fenchel-Young inequality. Its gradients define link functions between dual and primal spaces: ∇Ω⋆(f) = argmin_{α ∈ △^d} ℓ_Ω(α, f) and ∇Ω(α) = argmin_{f ∈ R^d} ℓ_Ω(α, f). [4] John C. Duchi et al. "Multiclass Classification, Information, Divergence, and Surrogate Risk". Annals of Statistics, 2018. [5] Mathieu Blondel et al. "Learning Classifiers with Fenchel-Young Losses: Generalized Entropies, Margins, and Algorithms". AISTATS, 2019.
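As a concrete instance of this construction, here is a small numpy sketch of a Fenchel-Young loss using the Shannon negentropy as Ω (so Ω⋆ = logsumexp and the link ∇Ω⋆ is the softmax); the function names and toy values are illustrative.

```python
import numpy as np

def logsumexp(f):
    """Stable Omega*(f) = log sum_i exp(f_i) for the Shannon case."""
    m = f.max()
    return m + np.log(np.exp(f - m).sum())

def shannon_neg_entropy(alpha):
    """Omega(alpha) = sum_i alpha_i log alpha_i (with 0 log 0 = 0)."""
    a = alpha[alpha > 0]
    return float((a * np.log(a)).sum())

def fy_loss(alpha, f):
    """Fenchel-Young loss: Omega(alpha) + Omega*(f) - <alpha, f> >= 0."""
    return shannon_neg_entropy(alpha) + logsumexp(f) - float(alpha @ f)

f = np.array([1.0, 0.5, -1.0])
alpha = np.array([0.2, 0.5, 0.3])

loss = fy_loss(alpha, f)

# The link is the gradient of the conjugate: here, the softmax of f.
link = np.exp(f - logsumexp(f))
```

By the Fenchel-Young inequality the loss is nonnegative, and it vanishes exactly at the link, i.e. fy_loss(link, f) = 0: the link is the loss's own optimal prediction, which is the property the slide uses to define it.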

  25. Discrete canonical example: Shannon entropy. Ω(α) = −H(α) = Σ_{i=1}^d α_i log α_i, with conjugate Ω⋆(f) = logsumexp(f). Caveat: not defined on continuous distributions, and cost-agnostic.
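The cost-agnostic caveat can be seen directly: with a Dirac target α = δ_y, the Shannon Fenchel-Young loss reduces to the cross-entropy logsumexp(f) − f_y, which only looks at the score of the true class. A tiny numpy illustration (toy scores, names are ours):

```python
import numpy as np

def logsumexp(f):
    m = f.max()
    return m + np.log(np.exp(f - m).sum())

def cross_entropy(y, f):
    """Shannon Fenchel-Young loss with a Dirac target alpha = delta_y."""
    return logsumexp(f) - f[y]

# Four classes sitting on a line; the target is class 0.
f_near = np.array([0.0, 3.0, 0.0, 0.0])  # model favors class 1 (distance 1)
f_far  = np.array([0.0, 0.0, 0.0, 3.0])  # model favors class 3 (distance 3)

# Both mistakes incur the same loss: the geometry of Y is ignored.
gap = cross_entropy(0, f_near) - cross_entropy(0, f_far)
```

A near miss and a far miss cost exactly the same, which is what the geometric losses of the next section are built to fix.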


  29. Sinkhorn entropies from regularized optimal transport. The self-regularized optimal transport distance defines Ω_C(α) = −(1/2) OT_{C, ε=2}(α, α) = −max_{f ∈ C(Y)} [ ⟨α, f⟩ − log ⟨α ⊗ α, exp((f ⊕ f − C)/2) ⟩ ]. This functional is continuous and convex.
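On a discrete support, this semi-dual can be maximized with a damped fixed-point iteration on the soft C-transform, a standard solver for the symmetric entropic OT problem; at the optimum the log term vanishes, leaving Ω_C(α) = −⟨α, f⟩. A minimal numpy sketch (the solver choice, iteration count, and names are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def sym_potential(alpha, C, eps=2.0, n_iter=300):
    """Damped iteration toward the symmetric fixed point f = T(f), where T is
    the soft C-transform: (T f)_i = -eps * log sum_j alpha_j exp((f_j - C_ij)/eps)."""
    f = np.zeros_like(alpha)
    for _ in range(n_iter):
        Tf = -eps * np.log(np.exp((f[None, :] - C) / eps) @ alpha)
        f = 0.5 * (f + Tf)  # damping keeps the symmetric iteration stable
    return f

def sinkhorn_neg_entropy(alpha, C):
    """Omega_C(alpha) = -(1/2) OT_{C, eps=2}(alpha, alpha); at the optimum the
    log term of the semi-dual vanishes, so Omega_C(alpha) = -<alpha, f>."""
    f = sym_potential(alpha, C)
    return -float(alpha @ f)

# Two classes at distance 1, squared ground cost, uniform alpha.
y = np.array([0.0, 1.0])
C = (y[:, None] - y[None, :]) ** 2
alpha = np.array([0.5, 0.5])
omega = sinkhorn_neg_entropy(alpha, C)
```

Like the Shannon negentropy, the value is nonpositive, but unlike it, the value depends on the ground cost C (with C = 0 it collapses to zero), which is exactly the geometry-awareness the construction is after.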
