

SLIDE 1

Information-Theoretic Metric Learning

Jason V. Davis, Brian Kulis, Suvrit Sra, and Inderjit Dhillon

The University of Texas at Austin

December 9, 2006
Presenter: Jason V. Davis


SLIDES 2-5

Introduction

◮ Problem: Learn a Mahalanobis distance function subject to linear constraints
◮ Information-theoretic viewpoint
  ◮ Bijection between Gaussian distributions and Mahalanobis distances
  ◮ Natural entropy-based objective
◮ Connections with kernel learning
◮ Fast and simple methods
  ◮ Based on Bregman's method for convex optimization
  ◮ No eigenvalue computations are needed!

SLIDES 6-8

Learning a Mahalanobis Distance

◮ Given n points {x_1, ..., x_n} in ℜ^d
◮ Given inequality constraints relating pairs of points
  ◮ Similarity constraints: d_A(x_i, x_j) ≤ u
  ◮ Dissimilarity constraints: d_A(x_i, x_j) ≥ ℓ
◮ Problem: Learn a Mahalanobis distance that satisfies these constraints (see the sketch after this slide):

  d_A(x_i, x_j) = (x_i − x_j)^T A (x_i − x_j)

◮ Applications
  ◮ k-means clustering
  ◮ Nearest neighbor searches
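A minimal NumPy sketch (not from the slides) of the parametrized distance; the points and matrices below are hypothetical. With A = I the distance reduces to the squared Euclidean distance, and any positive definite A yields a valid (squared) metric.

```python
import numpy as np

def mahalanobis_dist(A, xi, xj):
    """Squared Mahalanobis distance d_A(xi, xj) = (xi - xj)^T A (xi - xj)."""
    v = xi - xj
    return v @ A @ v

# Hypothetical 2-D example: A = I recovers the squared Euclidean distance.
xi = np.array([1.0, 2.0])
xj = np.array([3.0, 1.0])
print(mahalanobis_dist(np.eye(2), xi, xj))             # 5.0
print(mahalanobis_dist(np.diag([2.0, 0.5]), xi, xj))   # 8.5
```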

SLIDES 9-12

Mahalanobis Distance and the Multivariate Gaussian

◮ Problem: How to choose the 'best' Mahalanobis distance from the feasible set?
◮ Solution: Regularize by choosing the one 'closest' to the Euclidean distance
◮ Bijection between the multivariate Gaussian and the Mahalanobis distance:

  p(x; m, A) = (1/Z) exp( −(1/2) (x − m)^T A (x − m) )

◮ Allows for comparison of two Mahalanobis distances
  ◮ Differential relative entropy between the associated Gaussians (closed form in the sketch after this slide):

  KL( p(x; m_1, A_1) ‖ p(x; m_2, A_2) ) = ∫ p(x; m_1, A_1) log [ p(x; m_1, A_1) / p(x; m_2, A_2) ] dx
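For equal means the integral above has a standard closed form. A sketch (not from the slides), assuming the density is parametrized by the precision matrix A as written above, i.e. covariance Σ = A^{-1}:

```python
import numpy as np

def kl_same_mean_gaussians(A1, A2):
    """KL( p(x; m, A1) || p(x; m, A2) ) for Gaussians with equal means,
    where A1 and A2 are precision (inverse-covariance) matrices:
    0.5 * ( tr(A2 A1^{-1}) - log det(A2 A1^{-1}) - d )."""
    d = A1.shape[0]
    M = A2 @ np.linalg.inv(A1)
    sign, logdet = np.linalg.slogdet(M)
    assert sign > 0, "precision matrices must be positive definite"
    return 0.5 * (np.trace(M) - logdet - d)

# Hypothetical example: divergence of a learned metric A from the
# Euclidean baseline A = I; zero iff A equals the identity.
A = np.array([[2.0, 0.3], [0.3, 1.0]])
print(kl_same_mean_gaussians(A, np.eye(2)))
print(kl_same_mean_gaussians(np.eye(2), np.eye(2)))  # 0.0
```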

SLIDE 13

Problem Formulation

Goal: Minimize the differential relative entropy subject to pairwise inequality constraints:

  min_A   KL( p(x; m, A) ‖ p(x; m, I) )
  s.t.    d_A(x_i, x_j) ≤ u   for (i, j) ∈ S,
          d_A(x_i, x_j) ≥ ℓ   for (i, j) ∈ D,
          A ≻ 0


SLIDES 14-15

Overview: Optimizing the Model

◮ Show an equivalence between our problem and a low-rank kernel learning problem [Kulis, 2006]
  ◮ Yields closed-form expressions for computing the problem objective
  ◮ Shows that the problem is convex
◮ Use this equivalence to solve our problem efficiently

SLIDES 16-17

Low-Rank Kernel Learning

◮ Given X = [x_1 x_2 ... x_n], x_i ∈ ℜ^d, define K_0 = X^T X
◮ Constraints: similarity (S) or dissimilarity (D) between pairs of points
◮ Objective: Learn a kernel K that minimizes the divergence to K_0:

  min_K   D_Burg(K, K_0)
  s.t.    K_ii + K_jj − 2K_ij ≤ u   for (i, j) ∈ S,
          K_ii + K_jj − 2K_ij ≥ ℓ   for (i, j) ∈ D,
          K ⪰ 0

◮ D_Burg is the Burg divergence (see the sketch after this slide):

  D_Burg(K, K_0) = tr(K K_0^{-1}) − log det(K K_0^{-1}) − n
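A sketch (not from the slides) of evaluating D_Burg numerically; `burg_divergence` is a hypothetical helper, and it assumes full-rank positive definite inputs, which sidesteps the low-rank subtlety (K_0 = X^T X with d < n is singular) that the range-space result of [Kulis, 2006] handles:

```python
import numpy as np

def burg_divergence(K, K0):
    """Burg (LogDet) divergence: tr(K K0^{-1}) - log det(K K0^{-1}) - n.
    Assumes K and K0 are positive definite; the genuinely low-rank case
    needs the range-space machinery of [Kulis, 2006]."""
    n = K.shape[0]
    M = K @ np.linalg.inv(K0)
    sign, logdet = np.linalg.slogdet(M)
    assert sign > 0
    return np.trace(M) - logdet - n

# Hypothetical check: the divergence is zero iff K equals K0.
K0 = np.array([[2.0, 0.5], [0.5, 1.0]])
print(burg_divergence(K0, K0))                     # ~0.0
print(burg_divergence(K0 + 0.1 * np.eye(2), K0))   # > 0
```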

SLIDES 18-19

Equivalence to Kernel Learning

[Kulis, 2006] Let K be the optimal solution to the low-rank kernel learning problem.
◮ Then K has the same range space as K_0
◮ K = X^T W^T W X

Theorem: Let K = X^T W^T W X be an optimal solution to the low-rank kernel learning problem.
◮ Then A = W^T W is an optimal solution to the corresponding metric learning problem

SLIDES 20-21

Proof Sketch

Lemma 1: D_Burg(K, K_0) = 2 KL( p(x; m, A) ‖ p(x; m, I) ) + c
◮ Establishes that the objectives of the two problems are the same
◮ Builds on a recent connection relating the relative entropy between Gaussians and the Burg divergence [Davis, 2006]

Lemma 2: Given K = X^T A X, A is feasible if and only if K is feasible (see the numeric check below)
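A numeric check (not from the slides) of the identity behind Lemma 2: for K = X^T A X, the kernel distance K_ii + K_jj − 2K_ij coincides with d_A(x_i, x_j), so A and K satisfy the same constraints. All data below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 5
X = rng.normal(size=(d, n))      # columns are the points x_1, ..., x_n
W = rng.normal(size=(d, d))
A = W.T @ W                      # a positive semidefinite A = W^T W
K = X.T @ A @ X                  # the corresponding kernel

i, j = 0, 3
v = X[:, i] - X[:, j]
print(np.isclose(v @ A @ v, K[i, i] + K[j, j] - 2 * K[i, j]))  # True
```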

SLIDES 22-24

Optimization via Bregman's Method

◮ Solve the associated kernel learning problem via Bregman's method (sketch after this slide)
  ◮ Dual ascent method
  ◮ Iteratively projects onto one constraint at a time
  ◮ Closed-form updates are known for this projection
◮ Running time per iteration: O(cd^2)
  ◮ Works on the kernel in factored form
  ◮ Uses closed-form Bregman projections
◮ Requires no eigenvalue decomposition
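A minimal sketch (not the authors' implementation) of the closed-form rank-one Bregman projection, written in the d × d metric parametrization, wrapped in a naive cyclic sweep. It omits the dual-variable bookkeeping and slack handling of the full method, and the helper names are hypothetical.

```python
import numpy as np

def bregman_project(A, v, target):
    """Closed-form Bregman (LogDet) projection of A onto { A' : v^T A' v = target }.
    A rank-one update costing O(d^2), with no eigenvalue computation.
    Requires target > 0 and A positive definite; one can verify
    v^T A' v = p + beta * p^2 = target."""
    Av = A @ v
    p = v @ Av                       # current distance v^T A v
    beta = (target - p) / p ** 2
    return A + beta * np.outer(Av, Av)

def cyclic_projections(X, S, D, u, l, n_sweeps=50):
    """Naive cyclic projections starting from the Euclidean metric A = I.
    X has points as rows; S and D are lists of (i, j) index pairs."""
    A = np.eye(X.shape[1])
    constraints = [(c, u, True) for c in S] + [(c, l, False) for c in D]
    for _ in range(n_sweeps):
        for (i, j), bound, is_sim in constraints:
            v = X[i] - X[j]
            dist = v @ A @ v
            # project only onto violated constraints
            if (is_sim and dist > bound) or (not is_sim and dist < bound):
                A = bregman_project(A, v, bound)
    return A
```

Each projection forms only A v and an outer product, which is where the per-constraint O(d^2) cost and the "no eigenvalue decomposition" claim above come from.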

SLIDES 25-27

Extensions

◮ Minimizing the KL-divergence to a different Mahalanobis matrix
  ◮ e.g., the inverse of the sample covariance matrix
◮ Slack variables
◮ General linear inequality constraints
  ◮ e.g., relative distance comparisons [Schultz, 2003] (see the check below)
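To see why relative comparisons fit as general linear constraints: since d_A(x, y) = tr( A (x − y)(x − y)^T ), the comparison d_A(x_i, x_j) ≤ d_A(x_i, x_k) is linear in A. A small check (not from the slides; data hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
xi, xj, xk = rng.normal(size=(3, 4))
A = np.eye(4)
vij, vik = xi - xj, xi - xk
C = np.outer(vij, vij) - np.outer(vik, vik)
lhs = vij @ A @ vij - vik @ A @ vik
# the relative comparison is the linear constraint tr(A C) <= 0
print(np.isclose(lhs, np.trace(A @ C)))  # True
```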

SLIDES 28-31

Experimental Methodology

◮ Goal: learn a Mahalanobis distance function for kNN classification
◮ Approach (see the sketch after this slide):
  ◮ Constrain points in the same class to be similar
  ◮ Constrain points in different classes to be dissimilar
  ◮ Upper and lower bounds determined empirically
  ◮ Sample 100 such constraints
  ◮ No parameter tuning
◮ Evaluate via cross-validation
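A sketch (not from the slides) of the constraint-generation step; `sample_constraints` is a hypothetical helper, and the 5th/95th-percentile bounds are only an illustrative reading of "determined empirically".

```python
import numpy as np

def sample_constraints(X, y, n_constraints=100, seed=0):
    """Sample same-class (similar) and different-class (dissimilar) pairs.
    X has points as rows, y holds class labels. The percentile bounds are
    an assumption, not a detail stated on the slides."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Empirical bounds from the distribution of squared Euclidean distances.
    idx = rng.integers(0, n, size=(1000, 2))
    dists = np.sum((X[idx[:, 0]] - X[idx[:, 1]]) ** 2, axis=1)
    u, l = np.percentile(dists, 5), np.percentile(dists, 95)
    S, D = [], []
    while len(S) + len(D) < n_constraints:
        i, j = rng.integers(0, n, size=2)
        if i != j:
            (S if y[i] == y[j] else D).append((int(i), int(j)))
    return S, D, u, l
```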

SLIDES 32-33

Experimental Results

◮ ITML: Information-Theoretic Metric Learning
◮ Sample Cov: parametrize the Mahalanobis distance by the inverse of the sample covariance of the data
◮ LDA: Linear Discriminant Analysis
◮ MCML: Maximally Collapsing Metric Learning [Globerson, 2005]

  Dataset        ITML     Sample Cov   Euclidean   LDA      MCML
  Balance-scale  0.9312   0.9072       0.9120      0.9312   0.9536
  Wine           0.8315   0.8258       0.8427      0.7303   0.8034
  Iris           1.0000   0.9733       0.9667      1.0000   0.9600
  Ionosphere     0.9915   0.9858       0.9829      0.5128   0.9915
  Soybean        0.9283   0.9429       0.9283      0.9385   0.9590

SLIDE 34

Conclusion

◮ Presented an information-theoretic formulation for metric learning
◮ Gave an equivalence between this problem and low-rank kernel learning
◮ Provided efficient algorithms
◮ Experiments are promising, but much more work is needed!