SLIDE 1

Supervised Metric Learning

M. Sebban

Laboratoire Hubert Curien, UMR CNRS 5516, University Jean Monnet, Saint-Étienne (France)

AAFD’14, Paris 13, April, 2014

Sebban (LaHC) Supervised Metric Learning 1 / 45

SLIDE 2

Outline

1. Intuition behind Metric Learning
2. State of the Art: Mahalanobis Distance Learning, Nonlinear Metric Learning, Online Metric Learning
3. Similarity Learning for Provably Accurate Linear Classification
4. Consistency and Generalization Guarantees
5. Experiments

SLIDE 3

Intuition behind Metric Learning

Importance of Metrics

Pairwise metric: the notion of metric plays an important role in many domains such as classification, regression, clustering, and ranking.

SLIDE 4

Intuition behind Metric Learning

Minkowski distances: the family of distances induced by ℓp norms:

d_p(x, x′) = ‖x − x′‖_p = (∑_{i=1}^d |x_i − x′_i|^p)^{1/p}

For p = 1, the Manhattan distance: d_man(x, x′) = ∑_{i=1}^d |x_i − x′_i|.

For p = 2, the “ordinary” Euclidean distance: d_euc(x, x′) = (∑_{i=1}^d |x_i − x′_i|^2)^{1/2} = √((x − x′)^T (x − x′)).

For p → ∞, the Chebyshev distance: d_che(x, x′) = max_i |x_i − x′_i|.

[Figure: unit balls of the ℓp norm for p = 0, 0.3, 0.5, 1, 1.5, 2, ∞]
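As a concrete illustration (a minimal pure-Python sketch, not part of the slides), the three special cases above can be computed as:

```python
def minkowski(x, xp, p):
    """Minkowski distance d_p induced by the l_p norm (a true metric for p >= 1)."""
    diffs = [abs(a - b) for a, b in zip(x, xp)]
    if p == float('inf'):        # Chebyshev limit
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

x, xp = [1.0, 2.0], [4.0, 6.0]
print(minkowski(x, xp, 1))             # Manhattan: 7.0
print(minkowski(x, xp, 2))             # Euclidean: 5.0
print(minkowski(x, xp, float('inf')))  # Chebyshev: 4.0
```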

SLIDE 5

Intuition behind Metric Learning

Key question: how to choose the right metric? The notion of a good metric is problem-dependent: each problem has its own notion of similarity, which is often badly captured by standard metrics.

SLIDE 6

Intuition behind Metric Learning

Metric learning

Adapt the metric to the problem of interest

Solution: learn the metric from data.

Basic idea: learn a metric that assigns a small (resp. large) distance to pairs of examples that are semantically similar (resp. dissimilar).

Metric Learning

It typically induces a change of representation space which satisfies the constraints.

SLIDE 7

Intuition behind Metric Learning

“Learnable” Metrics

The Mahalanobis distance

∀x, x′ ∈ R^d, the Mahalanobis distance is defined as follows:

d_M(x, x′) = √((x − x′)^T M (x − x′)),

where M ∈ R^{d×d} is a symmetric PSD matrix (M ⪰ 0). The original term refers to the case where x and x′ are random vectors from the same distribution with covariance matrix Σ, with M = Σ^{−1}.

Useful properties: if M ⪰ 0, then x^T M x ≥ 0 ∀x (as a linear operator, M can be seen as a nonnegative scaling), and M = L^T L for some matrix L.

SLIDE 8

Intuition behind Metric Learning

Mahalanobis distance learning

Using the decomposition M = L^T L, where L ∈ R^{k×d} and k is the rank of M, one can rewrite d_M(x, x′):

d_M(x, x′) = √((x − x′)^T L^T L (x − x′)) = √((Lx − Lx′)^T (Lx − Lx′)).

Mahalanobis distance learning = learning a linear projection

If M is learned, a Mahalanobis distance implicitly corresponds to computing the Euclidean distance after a learned linear projection of the data by L into a k-dimensional space.
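This equivalence is easy to check numerically; below is a small NumPy sketch in which a randomly drawn L stands in for a learned projection:

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.normal(size=(2, 4))   # hypothetical learned projection, k=2, d=4
M = L.T @ L                   # induced symmetric PSD matrix

x, xp = rng.normal(size=4), rng.normal(size=4)

d_mahalanobis = np.sqrt((x - xp) @ M @ (x - xp))
d_projected = np.linalg.norm(L @ x - L @ xp)   # Euclidean distance after projecting by L

assert np.isclose(d_mahalanobis, d_projected)  # the two distances coincide
```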

SLIDE 9

State of the Art

Metric learning in a nutshell

General formulation

Given a metric, find its parameters M* as

M* = argmin_{M ⪰ 0} [ℓ(M, S, D, R) + λ R(M)],

where ℓ(M, S, D, R) is a loss function that penalizes violated constraints, R(M) is some regularizer on M, and λ ≥ 0 is the regularization parameter. State-of-the-art methods essentially differ by their choice of constraints, loss function, and regularizer on M.

SLIDE 10

State of the Art Mahalanobis Distance Learning

LMNN (Weinberger et al. 2005)

Main idea: define constraints tailored to k-NN in a local way. The k nearest neighbors should be of the same class (“target neighbors”), while examples of different classes should be kept away (“impostors”):

S = {(x_i, x_j) : y_i = y_j and x_j belongs to the k-neighborhood of x_i},
R = {(x_i, x_j, x_k) : (x_i, x_j) ∈ S, y_i ≠ y_k}.
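The two constraint sets can be sketched as follows (a hedged NumPy illustration; the helper name and the brute-force neighbor search are mine, not LMNN's actual implementation):

```python
import numpy as np

def lmnn_constraints(X, y, k=2):
    """Sketch of LMNN-style constraint sets: S holds (i, j) target-neighbor
    pairs (same class, among the k nearest); R holds (i, j, l) triplets where
    x_l is any differently labeled point (candidate impostor)."""
    X, y = np.asarray(X, float), np.asarray(y)
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # pairwise Euclidean distances
    S, R = [], []
    for i in range(n):
        same = np.where((y == y[i]) & (np.arange(n) != i))[0]
        targets = same[np.argsort(D[i, same])[:k]]       # k nearest same-class points
        diff = np.where(y != y[i])[0]
        for j in targets:
            S.append((i, int(j)))
            R.extend((i, int(j), int(l)) for l in diff)
    return S, R

X = [[0, 0], [0, 1], [5, 5], [5, 6]]
y = [0, 0, 1, 1]
S, R = lmnn_constraints(X, y, k=1)
print(S)   # [(0, 1), (1, 0), (2, 3), (3, 2)]
```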

SLIDE 11

State of the Art Mahalanobis Distance Learning

LMNN (Weinberger et al. 2005)

Formulation

min_{M ⪰ 0} ∑_{(x_i,x_j) ∈ S} d²_M(x_i, x_j)
s.t. d²_M(x_i, x_k) − d²_M(x_i, x_j) ≥ 1 ∀(x_i, x_j, x_k) ∈ R.

Remarks

Advantages: convex, with a solver based on a working set and subgradient descent; it can deal with millions of constraints and is very popular in practice.
Drawback: subject to overfitting in high dimension.

SLIDE 12

State of the Art Mahalanobis Distance Learning

ITML (Davis et al. 2007)

Information-Theoretic Metric Learning (ITML) introduces LogDet divergence regularization. This Bregman divergence on PSD matrices is defined as:

D_ld(M, M_0) = trace(M M_0^{−1}) − log det(M M_0^{−1}) − d,

where d is the dimension of the input space and M_0 is some PSD matrix we want to remain close to. ITML is formulated as follows:

min_{M ⪰ 0} D_ld(M, M_0)
s.t. d²_M(x_i, x_j) ≤ u ∀(x_i, x_j) ∈ S,
     d²_M(x_i, x_j) ≥ v ∀(x_i, x_j) ∈ D.

The LogDet divergence is finite iff M is PSD (a cheap way of keeping M PSD). It is also rank-preserving.
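The LogDet divergence above is straightforward to compute; a minimal NumPy sketch (function name is mine):

```python
import numpy as np

def logdet_div(M, M0):
    """LogDet (Bregman) divergence D_ld(M, M0) between PSD matrices (a sketch)."""
    P = M @ np.linalg.inv(M0)
    _, logdet = np.linalg.slogdet(P)        # numerically stable log-determinant
    return np.trace(P) - logdet - M.shape[0]

I = np.eye(3)
print(logdet_div(I, I))       # divergence of a matrix to itself: 0.0
print(logdet_div(2 * I, I))   # trace(2I) - log det(2I) - 3 = 6 - 3 ln 2 - 3 ≈ 0.921
```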

SLIDE 13

State of the Art Nonlinear Metric Learning

Nonlinear metric learning

The big picture

Three approaches:

1. Kernelization of linear methods.
2. Learning a nonlinear metric.
3. Learning several local linear metrics.

SLIDE 14

State of the Art Nonlinear Metric Learning

Nonlinear metric learning

Kernelization of linear methods

Some algorithms have been shown to be kernelizable, but in general this is not trivial: a new formulation of the problem has to be derived, in which the interface to the data is limited to inner products, and sometimes a different implementation is necessary. Moreover, when the number of training examples n is large, learning n² parameters may be intractable.

A solution: the KPCA trick (Chatpatanasiri et al., 2010). Use KPCA (PCA in kernel space) to get a nonlinear but low-dimensional projection of the data, then run the unchanged linear algorithm!
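The KPCA trick can be sketched in a few lines of NumPy (a bare-bones illustration assuming an RBF kernel; real implementations handle centering and conditioning more carefully):

```python
import numpy as np

def kpca(X, n_components, gamma=0.1):
    """Bare-bones sketch of the KPCA trick: PCA in an RBF kernel space,
    giving a nonlinear low-dimensional representation of the data."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)                        # RBF kernel matrix
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                                 # center the kernel matrix
    vals, vecs = np.linalg.eigh(Kc)
    top = np.argsort(vals)[::-1][:n_components]    # leading eigenpairs
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
Z = kpca(X, n_components=3)
print(Z.shape)   # (50, 3): run any unchanged linear metric learner on Z
```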

SLIDE 15

State of the Art Nonlinear Metric Learning

Nonlinear metric learning

Learning a nonlinear metric: GB-LMNN (Kedem et al. 2012)

Main idea: learn a nonlinear mapping φ to optimize the Euclidean distance d_φ(x, x′) = ‖φ(x) − φ(x′)‖₂ in the transformed space:

φ = φ_0 + α ∑_{t=1}^T h_t,

where φ_0 is the mapping learned by linear LMNN, and h_1, …, h_T are gradient boosted regression trees. Intuitively, each tree of depth p divides the space into 2^p regions, and instances falling in the same region are translated by the same vector.

SLIDE 16

State of the Art Nonlinear Metric Learning

Nonlinear metric learning

Local metric learning

Motivation: simple linear metrics perform well locally, and since everything is linear, the formulation can be kept convex.

Pitfalls: how to split the space? How to avoid a blow-up in the number of parameters to learn, and avoid overfitting? How to obtain a proper (continuous) global metric? …

SLIDE 17

State of the Art Online Metric Learning

Online learning

If the number of training constraints is very large (this can happen even with a moderate number of training examples), the previous algorithms become huge, possibly intractable optimization problems (gradient computation and/or projections become very expensive).

One solution: online learning. In online learning, the algorithm receives training pairs of instances one at a time and updates the current hypothesis at each step.

Performance is typically inferior to that of batch algorithms, but online learning makes it possible to tackle large-scale problems. Online algorithms often come with guarantees in the form of regret bounds, stating that the accumulated loss suffered along the way is not much worse than that of the best hypothesis chosen in hindsight.

SLIDE 18

State of the Art Online Metric Learning

Mahalanobis distance learning

LEGO (Jain et al. 2008)

Formulation

At each step, receive (x_t, x′_t, y_t), where y_t is the target distance between x_t and x′_t, and update as follows:

M_{t+1} = argmin_{M ⪰ 0} D_ld(M, M_t) + λ ℓ(M, x_t, x′_t, y_t),

where ℓ is a loss function (square loss, hinge loss, …).

Remarks: it turns out that the above update has a closed-form solution which maintains M ⪰ 0 automatically, and a regret bound can be derived.

SLIDE 19

State of the Art Online Metric Learning

A quick advertisement...

Recent survey: there exist many other metric learning approaches. Most of them are discussed at greater length in our recent survey: Bellet, A., Habrard, A., and Sebban, M. (2013). A Survey on Metric Learning for Feature Vectors and Structured Data. Technical report, available at http://arxiv.org/abs/1306.6709

SLIDE 20

State of the Art Online Metric Learning

Limitations of the state of the art ML algorithms

Algorithmic limitations

Drawbacks of Mahalanobis distance learning: maintaining M ⪰ 0 is often costly, especially in high dimensions, and the objects being compared must have the same dimension. Moreover, distance properties can be useful (e.g., for fast neighbor search) but restrictive: there is evidence that our notion of (visual) similarity violates the triangle inequality. This motivates learning similarity functions instead.

SLIDE 21

State of the Art Online Metric Learning

Similarity learning

Cosine similarity: widely used in data mining, the cosine similarity measures the cosine of the angle between two instances and can be computed as

K_cos(x, x′) = x^T x′ / (‖x‖₂ ‖x′‖₂).

Bilinear similarity: the bilinear similarity is related to the cosine but does not include normalization and is parameterized by a matrix M:

K_M(x, x′) = x^T M x′,

where M is required to be neither PSD nor symmetric.
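Both similarities are one-liners; a small NumPy sketch:

```python
import numpy as np

def k_cos(x, xp):
    """Cosine similarity: cosine of the angle between the two instances."""
    return float(x @ xp / (np.linalg.norm(x) * np.linalg.norm(xp)))

def k_bilinear(x, xp, M):
    """Bilinear similarity x^T M x'; M need be neither PSD nor symmetric."""
    return float(x @ M @ xp)

x, xp = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(k_cos(x, xp))                     # orthogonal vectors: 0.0
print(k_bilinear(x, xp, np.eye(2)))     # M = I recovers the plain inner product: 0.0
M = np.array([[0.0, 1.0], [0.0, 0.0]])  # an asymmetric M is perfectly legal
print(k_bilinear(x, xp, M))             # 1.0
```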

SLIDE 22

State of the Art Online Metric Learning

Limitations of the state of the art ML algorithms

Theoretical limitations

Establishing theoretical guarantees for metric learning algorithms has so far received very little attention. However, we may be interested in theoretical results both on the learned metric d_M itself (optimized w.r.t. the training data) and on the algorithm that makes use of it, rather than following a “plug and hope” strategy.

SLIDE 23

Similarity Learning for Provably Accurate Linear Classification

Contribution of the Machine Learning Team of Saint-Étienne

Bellet, A., Habrard, A., and Sebban, M. Similarity Learning for Provably Accurate Sparse Linear Classification. ICML 2012.

Three contributions:

1. Optimize a similarity function (the bilinear similarity) rather than a true distance.
2. Consistency guarantees for the learned similarity, using the uniform stability framework.
3. Generalization guarantees for the algorithm using the similarity, by optimizing the notion of goodness (theory of Balcan et al., 2008).

SLIDE 24

Similarity Learning for Provably Accurate Linear Classification

Deriving generalization guarantees

Generalization guarantees for the classifier using the metric: (ǫ, γ, τ)-goodness

Definition (Balcan et al., 2008). A similarity function K ∈ [−1, 1] is (ǫ, γ, τ)-good w.r.t. an indicator function R(x) defining a set of “reasonable points” if:

1. A 1 − ǫ probability mass of examples (x, y) satisfies E_{(x′,y′)∼P}[y y′ K(x, x′) | R(x′)] ≥ γ.
2. Pr_{x′}[R(x′)] ≥ τ,

with ǫ, γ, τ ∈ [0, 1]. The first condition requires that a 1 − ǫ proportion of examples x be on average more similar to reasonable examples of the same class than to reasonable examples of the opposite class, by a margin γ. The second condition means that at least a τ proportion of the examples are reasonable.

SLIDE 25

Similarity Learning for Provably Accurate Linear Classification

Strategy: if R is known, use K to map the examples to the space φ of “similarity scores with the reasonable points” (the similarity map).

[Figure: examples A–H plotted in the similarity-map coordinates K(x,A), K(x,B), K(x,E).]
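The similarity map can be sketched as follows (a minimal NumPy illustration; the reasonable points and the similarity function used are hypothetical):

```python
import numpy as np

def similarity_map(X, reasonable, K):
    """Map each example x to its vector of similarity scores with the
    reasonable points (a sketch; K is any similarity function)."""
    return np.array([[K(x, r) for r in reasonable] for x in X])

K = lambda x, xp: float(x @ xp)          # e.g. the plain bilinear similarity with M = I
X = np.array([[1.0, 0.0], [0.0, 2.0]])
R = np.array([[1.0, 1.0], [2.0, 0.0]])   # hypothetical reasonable points
phi = similarity_map(X, R, K)
print(phi)   # [[1. 2.] [2. 0.]]: each row is an example in the similarity-map space
```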

SLIDE 26

Similarity Learning for Provably Accurate Linear Classification

Deriving generalization guarantees

Generalization guarantees for the classifier using the metric: (ǫ, γ, τ)-goodness

A trivial linear classifier: by definition of (ǫ, γ, τ)-goodness, there is a linear classifier in φ that achieves true risk ǫ at margin γ.

SLIDE 27

Similarity Learning for Provably Accurate Linear Classification

Deriving generalization guarantees

Generalization guarantees for the classifier using the metric: (ǫ, γ, τ)-goodness

Theorem (Balcan et al., 2008). If R is unknown, given that K is (ǫ, γ, τ)-good and enough points to create a similarity map, with high probability there exists a linear separator α that has true risk ǫ at margin γ.

Question: can we find this linear classifier in an efficient way?

SLIDE 28

Similarity Learning for Provably Accurate Linear Classification

Deriving generalization guarantees

Answer: basically yes, by solving a linear program with L1-norm regularization, which yields a sparse linear classifier:

min_α ∑_{i=1}^n [1 − ∑_{j=1}^n α_j y_i K(x_i, x_j)]_+ + λ ‖α‖₁

The L1 norm induces sparsity.

[Figure: L1 vs. L2 constraint balls.]
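A hedged sketch of this idea: the slides solve a linear program, while the illustration below minimizes the same L1-regularized empirical hinge loss by plain subgradient descent on a toy dataset (all names are mine):

```python
import numpy as np

def sparse_similarity_classifier(Ksim, y, lam=0.1, lr=0.01, epochs=500):
    """Minimize sum_i [1 - y_i sum_j alpha_j K(x_i, x_j)]_+ + lam * ||alpha||_1
    by subgradient descent (a sketch of the slide's linear program)."""
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(epochs):
        margins = y * (Ksim @ alpha)            # y_i * sum_j alpha_j K(x_i, x_j)
        active = margins < 1                    # margin-violating examples
        grad = -Ksim[active].T @ y[active] + lam * np.sign(alpha)
        alpha -= lr * grad / n
    return alpha

X = np.array([[2.0, 0.0], [1.5, 0.5], [-2.0, 0.0], [-1.5, -0.5]])
y = np.array([1, 1, -1, -1])
Ksim = X @ X.T                                  # similarity = plain inner product
alpha = sparse_similarity_classifier(Ksim, y)
print(np.sign(Ksim @ alpha))                    # recovers the labels on this toy set
```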

SLIDE 29

Consistency and Generalization Guarantees

SLLC (Bellet et al. 2012)

The performance of the linear classifier theoretically depends on how well the similarity function satisfies the definition of goodness:

E_{(x′,y′)∼P}[y y′ K(x, x′) | R(x′)] ≥ γ.

SLLC optimizes the empirical goodness of K over the training set.

Formulation of SLLC

min_{M ∈ R^{d×d}} (1/n) ∑_{i=1}^n [1 − y_i (1/(γ|R|)) ∑_{x_j ∈ R} y_j K_M(x_i, x_j)]_+ + β ‖M‖²_F,

where K_M(x, x′) = x^T M x′.
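The empirical SLLC objective can be sketched as follows (assuming the set R of reasonable points is given; the function and argument names are mine):

```python
import numpy as np

def sllc_empirical_loss(M, X, y, R_idx, gamma=0.5, beta=0.1):
    """Sketch of the SLLC objective: hinge loss on each example's average
    bilinear similarity to the reasonable points (indexed by R_idx, assumed
    given), plus a squared Frobenius-norm regularizer."""
    K = X @ M @ X[R_idx].T                 # K_M(x_i, x_j) for each x_j in R
    avg = (K * y[R_idx]).mean(axis=1)      # (1/|R|) sum_j y_j K_M(x_i, x_j)
    hinge = np.maximum(0.0, 1.0 - y * avg / gamma)
    return hinge.mean() + beta * np.linalg.norm(M, 'fro') ** 2

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1, -1])
print(sllc_empirical_loss(np.zeros((2, 2)), X, y, R_idx=[0, 1]))  # 1.0 at M = 0
```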

SLIDE 30

Consistency and Generalization Guarantees

SLLC (Bellet et al. 2012)

Properties of SLLC

SLLC has a number of desirable properties. It optimizes a direct link between the quality of the metric and the quality of the linear classifier. Unlike classic algorithms, which rely on pair- or triplet-based constraints, SLLC satisfies constraints defined over an average of similarity scores, so it has only one constraint per training example instead of one for each pair or triplet. Finally, we can derive consistency guarantees on the learned similarity.

SLIDE 31

Consistency and Generalization Guarantees

Deriving consistency guarantees

Consistency guarantees for the learned metric: uniform stability

Definition (Uniform stability for metric learning). A learning algorithm A has uniform stability κ/n, where κ > 0, if ∀(T, x), ∀i,

sup_{x₁,x₂} |ℓ(A_T, x₁, x₂) − ℓ(A_{T^{i,x}}, x₁, x₂)| ≤ κ/n,

where A_T is the metric learned by A from T, and T^{i,x} is the set obtained by replacing x_i ∈ T by a new example x.

Theorem (Uniform stability bound). For any algorithm A with uniform stability κ/n, with probability 1 − δ over the random sample T, we have:

R_ℓ(A_T) ≤ R̂_ℓ^T(A_T) + 2κ/n + (2κ + B) √(ln(2/δ) / (2n)),

where R̂_ℓ^T is the empirical risk on T and B is a problem-dependent constant.

SLIDE 32

Consistency and Generalization Guarantees

Stability of SLLC

Formulation of SLLC

min_{M ∈ R^{d×d}} (1/n) ∑_{i=1}^n [1 − y_i (1/(γ|R|)) ∑_{x_j ∈ R} y_j K_M(x_i, x_j)]_+ + β ‖M‖²_F,

where K_M(x, x′) = x^T M x′.

Lemma. Let n and |R| be the numbers of training examples and reasonable points respectively, with |R| = τ̂n and τ̂ ∈ ]0, 1]. SLLC has uniform stability κ/n with

κ = (1/γ) (1/(βγ) + 2/τ̂),

where β is the regularization parameter and γ the margin.

SLIDE 33

Consistency and Generalization Guarantees

Consistency guarantees of SLLC

Theorem. Let γ > 0, δ > 0 and n > 1. With probability at least 1 − δ, for any model M learned with SLLC, we have:

ǫ ≤ ǫ̂ + (1/n)(1/γ)(1/(βγ) + 2/τ̂) + [(1/γ)(1/(βγ) + 2/τ̂) + 1] √(ln(1/δ) / (2n)),

where:

ǫ̂ = (1/n) ∑_{i=1}^n [1 − y_i (1/(γ|R|)) ∑_{k=1}^{|R|} y_k K_M(x_i, x_k)]_+ (empirical goodness),
ǫ = E_{(x_i,y_i)∼P} [1 − y_i (1/(γ|R|)) ∑_{k=1}^{|R|} y_k K_M(x_i, x_k)]_+ (true goodness).

SLIDE 34

Experiments

Experimental Results

Comparison of a kernelized version of SLLC (using KPCA) with: the standard bilinear similarity (KI), LMNN, LMNN+KPCA, ITML, and ITML+KPCA.

SLIDE 35

Experiments

Experiments with linear classifiers

For each method, the first row reports classification accuracy (%) and the second row the size of the resulting linear classifier (smaller = sparser):

Dataset     | Breast | Iono. | Rings  | Pima  | Splice | Svmguide1 | Cod-RNA
KI          | 96.57  | 89.81 | 100.00 | 75.62 | 83.86  | 96.95     | 95.91
            | 20.39  | 52.93 | 18.20  | 25.93 | 362    | 64        | 557
SLLC        | 96.90  | 93.25 | 100.00 | 75.94 | 87.36  | 96.55     | 94.08
            | 1.00   | 1.00  | 1.00   | 1.00  | 1      | 8         | 1
LMNN        | 96.81  | 90.21 | 100.00 | 75.15 | 85.61  | 95.80     | 88.40
            | 9.98   | 13.30 | 18.04  | 69.71 | 315    | 157       | 61
LMNN KPCA   | 96.01  | 86.12 | 100.00 | 74.92 | 86.85  | 96.53     | 95.15
            | 8.46   | 9.96  | 8.73   | 22.20 | 156    | 82        | 591
ITML        | 96.80  | 92.09 | 100.00 | 75.25 | 81.47  | 96.70     | 95.06
            | 9.79   | 9.51  | 17.85  | 56.22 | 377    | 49        | 164
ITML KPCA   | 96.23  | 93.05 | 100.00 | 75.25 | 85.29  | 96.55     | 95.14
            | 17.17  | 18.01 | 15.21  | 16.40 | 287    | 89        | 206

SLIDE 36

Experiments

Rings

SLIDE 37

Experiments

Conclusion and Perspectives

Conclusion

New metric learning algorithm with theoretical guarantees:

1. on the metric itself (using uniform stability);
2. on the learning algorithm making use of it (theory of good similarities).

SLLC is robust to overfitting because of its constraints based on an average of similarity scores, and it achieves good results with very sparse linear classifiers.

Perspectives

Full kernelization of SLLC. Adaptation of Balcan et al.'s framework to local metrics for local classifiers (such as k-NN).
