Information-Theoretic Metric Learning

Jason V. Davis, Brian Kulis, Suvrit Sra, and Inderjit Dhillon. PowerPoint PPT presentation.



  1. Title slide: Information-Theoretic Metric Learning. Jason V. Davis, Brian Kulis, Suvrit Sra, and Inderjit Dhillon, The University of Texas at Austin. December 9, 2006. Presenter: Jason V. Davis. (Talk sections, repeated in each slide header: Formulation, Algorithm, Experiments.)

  2-5. Introduction (built up over four slides)
  ◮ Problem: learn a Mahalanobis distance function subject to linear constraints
  ◮ Information-theoretic viewpoint
    ◮ Bijection between Gaussian distributions and Mahalanobis distances
    ◮ Natural entropy-based objective
  ◮ Connections with kernel learning
  ◮ Fast and simple methods
    ◮ Based on Bregman's method for convex optimization
    ◮ No eigenvalue computations are needed

  6-8. Learning a Mahalanobis Distance (built up over three slides)
  ◮ Given n points {x_1, ..., x_n} in ℜ^d
  ◮ Given inequality constraints relating pairs of points
    ◮ Similarity constraints: d_A(x_i, x_j) ≤ u
    ◮ Dissimilarity constraints: d_A(x_i, x_j) ≥ ℓ
  ◮ Problem: learn a Mahalanobis distance that satisfies these constraints:
    d_A(x_i, x_j) = (x_i − x_j)^T A (x_i − x_j)
  ◮ Applications: k-means clustering, nearest neighbor searches
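
A minimal numpy sketch of the distance just defined and of the two constraint types. The matrix A, the sample points, and the thresholds u and ℓ below are made-up illustrative values, not taken from the paper.

```python
import numpy as np

def mahalanobis_dist(A, x_i, x_j):
    """d_A(x_i, x_j) = (x_i - x_j)^T A (x_i - x_j), as defined on the slide (no square root)."""
    diff = x_i - x_j
    return float(diff @ A @ diff)

# Toy data: A must be positive definite; here we construct a random PD matrix.
rng = np.random.default_rng(0)
d = 3
L = rng.normal(size=(d, d))
A = L @ L.T + np.eye(d)              # positive definite by construction

x1, x2, x3 = rng.normal(size=(3, d))
u, ell = 1.0, 5.0                    # illustrative similarity / dissimilarity thresholds

# Similarity constraint d_A(x1, x2) <= u and dissimilarity constraint d_A(x1, x3) >= ell
print("d_A(x1, x2) =", mahalanobis_dist(A, x1, x2), "satisfied:", mahalanobis_dist(A, x1, x2) <= u)
print("d_A(x1, x3) =", mahalanobis_dist(A, x1, x3), "satisfied:", mahalanobis_dist(A, x1, x3) >= ell)
```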

  9-12. Mahalanobis Distance and the Multivariate Gaussian (built up over four slides)
  ◮ Problem: how to choose the 'best' Mahalanobis distance from the feasible set?
  ◮ Solution: regularize by choosing the one that is 'closest' to the Euclidean distance
  ◮ Bijection between the multivariate Gaussian and the Mahalanobis distance:
    p(x; m, A) = (1/Z) exp( −(1/2) (x − m)^T A (x − m) )
  ◮ Allows for comparison of two Mahalanobis distances
    ◮ Differential relative entropy between the associated Gaussians:
      KL( p(x; m_1, A_1) ‖ p(x; m_2, A_2) ) = ∫ p(x; m_1, A_1) log [ p(x; m_1, A_1) / p(x; m_2, A_2) ] dx
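
To make this comparison concrete, here is a small numpy sketch of the differential relative entropy between two Gaussians that share a mean m but have different precision (Mahalanobis) matrices A_1 and A_2, using the standard closed form for Gaussian KL divergence. The function name and the example matrices are illustrative, not from the paper.

```python
import numpy as np

def kl_gaussians_same_mean(A1, A2):
    """KL( p(x; m, A1) || p(x; m, A2) ) for Gaussians sharing a mean m, where A1 and A2
    are the precision matrices appearing in the exponent.  Standard closed form:
    0.5 * ( tr(A2 A1^{-1}) - log det(A2 A1^{-1}) - d ).  The mean terms cancel."""
    d = A1.shape[0]
    M = A2 @ np.linalg.inv(A1)
    sign, logdet = np.linalg.slogdet(M)
    assert sign > 0, "A1 and A2 should be positive definite"
    return 0.5 * (np.trace(M) - logdet - d)

# Sanity checks: the KL of a Gaussian against itself is zero, and the KL to the
# identity-precision Gaussian (the Euclidean case) is the regularizer used later.
rng = np.random.default_rng(1)
d = 4
L = rng.normal(size=(d, d))
A = L @ L.T + np.eye(d)
print(kl_gaussians_same_mean(A, A))            # ~0.0
print(kl_gaussians_same_mean(A, np.eye(d)))    # >= 0
```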

  13. Problem Formulation
  Goal: minimize the differential relative entropy subject to pairwise inequality constraints:
    min_A   KL( p(x; m, A) ‖ p(x; m, I) )
    subject to   d_A(x_i, x_j) ≤ u   for (i, j) ∈ S,
                 d_A(x_i, x_j) ≥ ℓ   for (i, j) ∈ D,
                 A ≻ 0
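
The slides do not say where S, D, u, and ℓ come from. A common recipe in this line of metric learning work is to build them from class-labeled data: same-class pairs go into S with an upper bound u, different-class pairs into D with a lower bound ℓ, and the bounds are set from percentiles of the observed distances. The sketch below follows that recipe; the percentile choices, sampling scheme, and function name are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np
from itertools import combinations

def make_constraints(X, y, num_pairs=50, pct=(5, 95), rng=None):
    """Sample pairwise constraints from labeled points X (n x d) with labels y.
    Same-class pairs go to S (upper bound u), different-class pairs to D (lower
    bound ell); u and ell are percentiles of the squared Euclidean distances.
    Assumes y contains at least two classes."""
    rng = rng or np.random.default_rng(0)
    dists = [np.sum((a - b) ** 2) for a, b in combinations(X, 2)]
    u, ell = np.percentile(dists, pct)
    S, D = [], []
    n = len(y)
    while len(S) < num_pairs or len(D) < num_pairs:
        i, j = rng.integers(n, size=2)
        if i == j:
            continue
        if y[i] == y[j] and len(S) < num_pairs:
            S.append((i, j))
        elif y[i] != y[j] and len(D) < num_pairs:
            D.append((i, j))
    return S, D, u, ell
```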

  14-15. Overview: Optimizing the Model (built up over two slides)
  ◮ Show an equivalence between our problem and a low-rank kernel learning problem [Kulis, 2006]
    ◮ Yields closed-form solutions to compute the problem objective
    ◮ Shows that the problem is convex
  ◮ Use this equivalence to solve our problem efficiently
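
The introduction promises fast updates based on Bregman's method with no eigenvalue computations. The sketch below illustrates that structure with a single Bregman-style projection of A onto one similarity constraint d_A(x_i, x_j) ≤ u, enforced with equality when violated: the update is rank-one and costs O(d^2). The full algorithm in the paper (with slack and dual variables, cycling over all constraints) is more involved, so treat this only as an illustration of the rank-one update, not as the paper's method.

```python
import numpy as np

def project_onto_similarity(A, x_i, x_j, u):
    """If d_A(x_i, x_j) > u, return A' = A + beta * A v v^T A  (v = x_i - x_j),
    with beta chosen so that v^T A' v = u.  A' stays symmetric positive definite
    (for u > 0) and no eigendecomposition is needed."""
    v = x_i - x_j
    p = float(v @ A @ v)              # current distance d_A(x_i, x_j)
    if p <= u:                        # constraint already satisfied
        return A
    beta = (u - p) / (p * p)          # solves p + beta * p^2 = u
    Av = A @ v
    return A + beta * np.outer(Av, Av)

# Tiny demo with made-up data.
rng = np.random.default_rng(2)
d = 3
A = np.eye(d)
x_i, x_j = rng.normal(size=(2, d))
A_new = project_onto_similarity(A, x_i, x_j, u=0.5)
v = x_i - x_j
print(v @ A @ v, "->", v @ A_new @ v)   # second value equals u if the constraint was violated
```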

  16-17. Low-Rank Kernel Learning (built up over two slides)
  ◮ Given X = [x_1 x_2 ... x_n], x_i ∈ ℜ^d, define K_0 = X^T X
  ◮ Constraints: similarity (S) or dissimilarity (D) between pairs of points
  ◮ Objective: learn K that minimizes the divergence to K_0:
    min_K   D_Burg(K, K_0)
    subject to   K_ii + K_jj − 2K_ij ≤ u   for (i, j) ∈ S,
                 K_ii + K_jj − 2K_ij ≥ ℓ   for (i, j) ∈ D,
                 K ⪰ 0
  ◮ D_Burg is the Burg divergence: D_Burg(K, K_0) = Tr(K K_0^{−1}) − log det(K K_0^{−1}) − n
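
A short numpy sketch of the Burg divergence objective and of the kernelized constraint quantity K_ii + K_jj − 2K_ij. The random X below is an illustrative assumption and is taken with d = n so that K_0 is invertible, which sidesteps the genuinely low-rank case the paper handles.

```python
import numpy as np

def burg_divergence(K, K0):
    """D_Burg(K, K0) = tr(K K0^{-1}) - log det(K K0^{-1}) - n."""
    n = K.shape[0]
    M = K @ np.linalg.inv(K0)
    sign, logdet = np.linalg.slogdet(M)
    assert sign > 0, "K and K0 should be positive definite here"
    return np.trace(M) - logdet - n

def kernel_pair_distance(K, i, j):
    """Kernelized squared distance K_ii + K_jj - 2 K_ij used in the constraints."""
    return K[i, i] + K[j, j] - 2 * K[i, j]

# Illustrative setup: d = n so K0 = X^T X is full rank.
rng = np.random.default_rng(3)
d = n = 4
X = rng.normal(size=(d, n))               # columns are the points x_1 ... x_n
K0 = X.T @ X
A = np.eye(d) + 0.1 * np.ones((d, d))     # some candidate Mahalanobis matrix
K = X.T @ A @ X                           # candidate kernel of the form X^T A X
print("D_Burg(K, K0) =", burg_divergence(K, K0))
print("kernelized distance between points 0 and 1:", kernel_pair_distance(K, 0, 1))
```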

  18-19. Equivalence to Kernel Learning [Kulis, 2006] (built up over two slides)
  ◮ Let K be the optimal solution to the low-rank kernel learning problem
    ◮ Then K has the same range space as K_0
    ◮ K = X^T W^T W X
  ◮ Theorem: let K = X^T W^T W X be an optimal solution to the low-rank kernel learning problem. Then A = W^T W is an optimal solution to the corresponding metric learning problem
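
One piece of this equivalence can be checked directly: if K = X^T A X, then the kernel-side quantity K_ii + K_jj − 2K_ij equals the metric-side distance d_A(x_i, x_j) = (x_i − x_j)^T A (x_i − x_j), so the two problems impose the same constraints. A small numerical check with illustrative random values (any positive definite A can be written as W^T W):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 3, 6
X = rng.normal(size=(d, n))          # columns x_1 ... x_n
L = rng.normal(size=(d, d))
A = L @ L.T + np.eye(d)              # arbitrary positive definite A
K = X.T @ A @ X                      # kernel induced by A

i, j = 0, 4
diff = X[:, i] - X[:, j]
metric_side = diff @ A @ diff                    # d_A(x_i, x_j)
kernel_side = K[i, i] + K[j, j] - 2 * K[i, j]    # K_ii + K_jj - 2 K_ij
print(metric_side, kernel_side)                  # identical up to floating-point error
```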

  20. Proof Sketch
  ◮ Lemma 1: D_Burg(K, K_0) = 2 KL( p(x; m, A) ‖ p(x; m, I) ) + c
    ◮ Establishes that the objectives of the two problems are the same
    ◮ Builds on a recent connection relating the relative entropy between Gaussians and the Burg divergence [Davis, 2006]
