Classification with mixtures of curved Mahalanobis metrics
— or LMNN in Cayley-Klein geometries — Frank Nielsen1,2 Boris Muzellec1 Richard Nock3,4
1Ecole Polytechnique, France 2Sony CSL, Japan 3Data61, Australia 4ANU, Australia
23rd September 2016
◮ For Q ≻ 0, a symmetric positive-definite matrix (e.g., a covariance matrix), define the Mahalanobis distance:
D_Q(p, q) = √((p − q)^⊤ Q (p − q))
◮ Metric distance (law of indiscernibles / symmetry / triangle inequality). E.g., Q = precision matrix Σ^{−1}, where Σ is a covariance matrix
◮ Generalizes the Euclidean distance, recovered when Q = I: D_I(p, q) = ‖p − q‖
◮ The Mahalanobis distance is interpreted as a Euclidean distance after the Cholesky decomposition Q = L^⊤L and the affine transformation x′ ← Lx: D_Q(p, q) = D_I(Lp, Lq) = ‖p′ − q′‖
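A quick numerical sketch of these two facts (the matrix Q and the points are illustrative):

```python
import numpy as np

# Illustrative symmetric positive-definite matrix and points.
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
p = np.array([1.0, 2.0])
q = np.array([3.0, 1.0])

def mahalanobis(p, q, Q):
    """D_Q(p, q) = sqrt((p - q)^T Q (p - q))."""
    d = p - q
    return float(np.sqrt(d @ Q @ d))

# Euclidean distance after the affine change of coordinates x' <- L x.
# np.linalg.cholesky returns lower-triangular L0 with Q = L0 L0^T,
# so L = L0^T gives the factorization Q = L^T L used above.
L = np.linalg.cholesky(Q).T
d1 = mahalanobis(p, q, Q)
d2 = float(np.linalg.norm(L @ p - L @ q))
assert abs(d1 - d2) < 1e-12
```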
◮ Real projective space RP^d: (λx, λ) ∼ (x, 1); homogeneous coordinates x → x̃ = (x, w = 1), and dehomogenization by “perspective division” x̃ → x/w
◮ The cross-ratio measure is invariant under projectivities (homographies):
(p, q; P, Q) = (|pP| |qQ|) / (|pQ| |qP|), where p, q, P, Q are collinear
(figure: collinear points P, p, q, Q on a line)
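The projective invariance of the cross-ratio can be checked numerically; a small sketch with illustrative collinear points and a random homography:

```python
import numpy as np

def cross_ratio(p, q, P, Q):
    """Cross-ratio (p, q; P, Q) = (|pP| |qQ|) / (|pQ| |qP|) of four
    collinear points, via signed coordinates along the common line."""
    u = (Q - P) / np.linalg.norm(Q - P)   # unit direction of the line
    t = lambda x: float((x - P) @ u)      # 1D coordinate on the line
    return ((t(p) - t(P)) * (t(q) - t(Q))) / ((t(p) - t(Q)) * (t(q) - t(P)))

# Four collinear points (illustrative).
P = np.array([0.0, 0.0]); p = np.array([1.0, 1.0])
q = np.array([2.0, 2.0]); Q = np.array([4.0, 4.0])
c1 = cross_ratio(p, q, P, Q)

# Apply a random projectivity in homogeneous coordinates.
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 3))
def apply(H, x):
    xh = H @ np.append(x, 1.0)            # homogenize, map, dehomogenize
    return xh[:2] / xh[2]

c2 = cross_ratio(apply(H, p), apply(H, q), apply(H, P), apply(H, Q))
assert abs(c1 - c2) < 1e-6                # projectively invariant
```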
A Cayley-Klein geometry is a triple K = (F, c_dist, c_angle): a fundamental conic F, a constant c_dist for distances, and a constant c_angle for angles.
See the monograph [5].
dist(p, q) = c_dist · log((p, q; P, Q)), where P and Q are the intersection points of the line l = (pq) (l̃ = p̃ × q̃) with the fundamental conic.
(figure: line l through p and q cutting the conic F at P and Q)
This extends to Hilbert projective geometries: a bounded convex subset of R^d instead of a conic.
◮ dist(p, p) = 0 (law of indiscernibles)
◮ Signed distances: dist(p, q) = −dist(q, p)
◮ When p, q, r are collinear: dist(p, q) = dist(p, r) + dist(r, q)
Geodesics in Cayley-Klein geometries are straight lines (possibly clipped to the conic domain). The logarithm transfers the multiplicative property of the cross-ratio to the additive property of Cayley-Klein distances: when p, q, r, P, Q are collinear, (p, q; P, Q) = (p, r; P, Q) · (r, q; P, Q).
In projective geometry, points and lines are dual concepts. Dual parameterizations of the fundamental conic F = (A, A^Δ), with quadratic form Q_A(x̃) = x̃^⊤ A x̃:
◮ primal conic = set of border points: C_A = {p̃ : Q_A(p̃) = 0}
◮ dual conic = set of tangent hyperplanes: C*_A = {l̃ : Q_{A^Δ}(l̃) = 0}
A^Δ = A^{−1}|A| is the adjoint (adjugate) matrix. The adjoint can be computed even when A is not invertible (|A| = 0).
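A minimal numerical illustration, taking the unit circle x² + y² − w² = 0 as fundamental conic (the point and its tangent line below are illustrative):

```python
import numpy as np

# Unit circle as a conic: points satisfy x^2 + y^2 - w^2 = 0 homogeneously.
A = np.diag([1.0, 1.0, -1.0])

# Adjoint (adjugate) A_delta = A^{-1} |A|, valid here since A is invertible.
A_delta = np.linalg.inv(A) * np.linalg.det(A)

# The point p~ = (1, 0, 1) lies on the primal conic C_A ...
p = np.array([1.0, 0.0, 1.0])
assert abs(float(p @ A @ p)) < 1e-12

# ... and its tangent line l~ = A p~ (the polar of p~) lies on the dual conic C*_A.
l = A @ p
assert abs(float(l @ A_delta @ l)) < 1e-12
```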
Signature of a matrix = signs of the eigenvalues in its eigendecomposition.

Type | A | A^Δ | Conic
Elliptic | (+, +, +) | (+, +, +) | non-degenerate complex conic
Hyperbolic | (+, +, −) | (+, +, −) | non-degenerate real conic
Dual Euclidean | (+, +, 0) | (+, 0, 0) | two complex lines with a real intersection point
Dual pseudo-Euclidean | (+, −, 0) | (+, 0, 0) | two real lines with a double real intersection point
Euclidean | (+, 0, 0) | (+, +, 0) | two complex points with a double real line passing through
Pseudo-Euclidean | (+, 0, 0) | (+, −, 0) | two real points with a double real line passing through
Galilean | (+, 0, 0) | (+, 0, 0) | double real line with a real intersection point

Degenerate cases are obtained as limits of non-degenerate cases. Thus we restrict to “three kinds” of Cayley-Klein geometries [5]:
For real Cayley-Klein measures, we choose the constants (κ is the curvature):
◮ Elliptic (κ > 0): c_dist = κ/(2i)
◮ Hyperbolic (κ < 0): c_dist = −κ/2
◮ Bilinear form S_pq = (p^⊤, 1) S (q^⊤, 1)^⊤ = p̃^⊤ S q̃
◮ Get rid of the cross-ratio using:
(p, q; P, Q) = (S_pq + √(S_pq² − S_pp S_qq)) / (S_pq − √(S_pq² − S_pp S_qq))
d_E(p, q) = (κ/2i) · log[(S_pq + √(S_pq² − S_pp S_qq)) / (S_pq − √(S_pq² − S_pp S_qq))] = κ · arccos(S_pq / √(S_pp S_qq))
(figure: gnomonic projection mapping x, y to x′, y′ on the sphere)
Gnomonic projection: d_E(x, y) = κ · arccos⟨x′, y′⟩
When p, q ∈ D_S := {p : S_pp < 0}, the hyperbolic domain:
d_H(p, q) = −(κ/2) · log[(S_pq + √(S_pq² − S_pp S_qq)) / (S_pq − √(S_pq² − S_pp S_qq))] = κ · arctanh(√(S_pq² − S_pp S_qq) / S_pq)
with arccosh(x) = log(x + √(x² − 1)) and arctanh(x) = (1/2) log((1 + x)/(1 − x))
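The hyperbolic Cayley-Klein distance can be sketched numerically; a minimal check assuming the canonical form S = diag(1, 1, −1) and κ = −1 (points are illustrative, inside the unit-disk domain):

```python
import numpy as np

# Canonical hyperbolic bilinear form in 2D: D_S = {p : S_pp < 0} is the open unit disk.
S = np.diag([1.0, 1.0, -1.0])

def bilinear(S, p, q):
    """S_pq = p~^T S q~ on homogeneous coordinates p~ = (p, 1)."""
    return float(np.append(p, 1.0) @ S @ np.append(q, 1.0))

def d_hyperbolic(S, p, q, kappa=-1.0):
    """Hyperbolic CK distance via the log of the cross-ratio; the absolute
    value absorbs the orientation of (P, Q) on the line (pq)."""
    Spq, Spp, Sqq = bilinear(S, p, q), bilinear(S, p, p), bilinear(S, q, q)
    disc = np.sqrt(Spq ** 2 - Spp * Sqq)
    return (-kappa / 2.0) * abs(np.log((Spq + disc) / (Spq - disc)))

p = np.array([0.1, 0.2])
q = np.array([0.3, -0.1])
d = d_hyperbolic(S, p, q)

# Same value through the arccosh closed form on the bilinear form.
Spq, Spp, Sqq = bilinear(S, p, q), bilinear(S, p, p), bilinear(S, q, q)
d_closed = float(np.arccosh(abs(Spq) / np.sqrt(Spp * Sqq)))
assert abs(d - d_closed) < 1e-9
```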
Write S = [[Σ, a], [a^⊤, b]] so that S_pq = p̃^⊤ S q̃ = p^⊤Σq + p^⊤a + a^⊤q + b.
Let µ = −Σ^{−1}a ∈ R^d (i.e., a = −Σµ) and b = µ^⊤Σµ + sign(κ)/κ², with
κ = (b − µ^⊤Σµ)^{−1/2} if b > µ^⊤Σµ, and κ = −(µ^⊤Σµ − b)^{−1/2} if b < µ^⊤Σµ.
Then the bilinear form writes as:
S(p, q) = S_{Σ,µ,κ}(p, q) = (p − µ)^⊤ Σ (q − µ) + sign(κ)/κ²
We have [1]:
lim_{κ→0⁺} D_{Σ,µ,κ}(p, q) = lim_{κ→0⁻} D_{Σ,µ,κ}(p, q) = D_Σ(p, q)
with the Mahalanobis distance D_Σ(p, q) = D_{Σ,0,0}(p, q). Thus hyperbolic/elliptic Cayley-Klein distances can be interpreted as curved Mahalanobis distances, or κ-Mahalanobis distances.
When S = diag(1, 1, ..., 1, −1), we recover the canonical hyperbolic distance [3] in the Cayley-Klein (Klein ball) model:
D_h(p, q) = arccosh((1 − ⟨p, q⟩) / √((1 − ⟨p, p⟩)(1 − ⟨q, q⟩)))
Bisector: Bi(p, q) = {x ∈ D_S : dist_S(p, x) = dist_S(x, q)}
Cayley-Klein Voronoi diagrams can be computed from equivalent (clipped) power diagrams: https://www.youtube.com/watch?v=YHJLq3-RL58
(figure legend) Blue: Mahalanobis; Red: elliptical; Green: hyperbolic. Cayley-Klein balls have Mahalanobis ball shapes with displaced centers.
Learn [6] a Mahalanobis distance M = L^⊤L ≻ 0 for a given input data-set P:
◮ shrink the distance of each point to its target neighbors, ε_pull(L)
◮ keep a distance margin from each point to its impostors, ε_push(L)
http://www.cs.cornell.edu/~kilian/code/lmnn/lmnn.html
Objective cost function [6], convex and piecewise linear:
ε_pull(L) = Σ_{i, i→j} ‖L(x_i − x_j)‖²
ε_push(L) = Σ_{i, i→j} Σ_l (1 − y_il) [1 + ‖L(x_i − x_j)‖² − ‖L(x_i − x_l)‖²]_+
ε(L) = (1 − µ) ε_pull(L) + µ ε_push(L)
where i → j means x_j is a target neighbor of x_i, and y_il = 1 iff x_i and x_l have the same label, y_il = 0 otherwise.
Optimize by gradient descent, L_{t+1} = L_t − γ ∂ε(L_t)/∂L, with
∂ε/∂L = (1 − µ) Σ_{i, i→j} C_ij + µ Σ_{(i,j,l)∈N_t} (C_ij − C_il), where C_ij = (x_i − x_j)(x_i − x_j)^⊤ and N_t is the set of active (margin-violating) triplets at step t.
Easy: no projection mechanism needed, unlike for Mahalanobis Metric for Clustering (MMC) [7].
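The pull/push objective can be sketched as follows; a toy version where, for illustration, every same-class point is treated as a target neighbor (actual LMNN uses only the k nearest same-class neighbors, and data here is random):

```python
import numpy as np

# Illustrative toy data: 6 points in 2D with two classes.
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
y = np.array([0, 0, 0, 1, 1, 1])

def lmnn_loss(L, X, y, mu=0.5):
    """eps(L) = (1 - mu) * eps_pull + mu * eps_push, with a unit margin."""
    Z = X @ L.T                  # map each point x -> L x
    pull, push = 0.0, 0.0
    n = len(X)
    for i in range(n):
        for j in range(n):
            if j == i or y[j] != y[i]:
                continue         # j ranges over target neighbors (same class here)
            dij = float(np.sum((Z[i] - Z[j]) ** 2))
            pull += dij
            for l in range(n):
                if y[l] == y[i]:
                    continue     # l ranges over impostors (different class)
                dil = float(np.sum((Z[i] - Z[l]) ** 2))
                push += max(0.0, 1.0 + dij - dil)   # hinge with margin 1
    return (1 - mu) * pull + mu * push

loss = lmnn_loss(np.eye(2), X, y)
assert loss >= 0.0
```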
Elliptical Cayley-Klein LMNN:
ε(L) = (1 − µ) Σ_{i, i→j} d_E(x_i, x_j) + µ Σ_{i, i→j} Σ_l (1 − y_il) ζ_ijl, with ζ_ijl = [1 + d_E(x_i, x_j) − d_E(x_i, x_l)]_+
∂ε(L)/∂L = (1 − µ) Σ_{i, i→j} ∂d_E(x_i, x_j)/∂L + µ Σ_{i, i→j} Σ_l (1 − y_il) ∂ζ_ijl/∂L
With C_ij = x̃_i x̃_j^⊤ = (x_i^⊤, 1)^⊤ (x_j^⊤, 1):
∂d_E(x_i, x_j)/∂L = k_ij L [ (S_ij/S_ii) C_ii + (S_ij/S_jj) C_jj − (C_ij + C_ji) ]
for a scalar factor k_ij, and
∂ζ_ijl/∂L = ∂d_E(x_i, x_j)/∂L − ∂d_E(x_i, x_l)/∂L if ζ_ijl ≥ 0, and 0 otherwise.
To ensure that S keeps the correct signature (1, n, 0) during the LMNN gradient descent, we decompose S = L^⊤DL (with L ≻ 0) and perform the gradient descent on L with the following gradient:
∂d_H(x_i, x_j)/∂L = k_ij DL [ (S_ij/S_ii) C_ii + (S_ij/S_jj) C_jj − (C_ij + C_ji) ]
where the scalar k_ij now involves the factor √(S_ij² − S_ii S_jj).
◮ The hyperbolic Cayley-Klein distance may be very large (unbounded, vs. < κπ for the elliptical case)
◮ The data-set should be contained inside the compact domain D_S
◮ Initialize L = [[L′, 0], [0, 1]] with Σ^{−1} = L′^⊤L′ (e.g., the precision matrix of P), and D = diag(−1, ..., −1, κ · max_{x∈P} ‖L′x‖²) with κ > 1, so that all points of P start inside the domain.
◮ At iteration t, it may happen that P ⊄ D_{S_t}, since we do not know the optimal learning rate γ. When this happens, we reduce γ ← γ/2; otherwise we let γ ← 1.01γ.
Experimental results on some UCI data-sets:

k | Data-set | Elliptical | Hyperbolic | Mahalanobis
1 | wine | 0.989 | 0.865 | 0.984
1 | vowel | 0.832 | 0.797 | 0.827
1 | balance | 0.924 | 0.891 | 0.846
1 | pima | 0.726 | 0.706 | 0.709
3 | wine | 0.983 | 0.871 | 0.984
3 | vowel | 0.828 | 0.782 | 0.827
3 | balance | 0.917 | 0.911 | 0.846
3 | pima | 0.706 | 0.695 | 0.709
5 | wine | 0.983 | – | 0.984
5 | vowel | 0.826 | 0.805 | 0.827
5 | balance | 0.907 | 0.895 | 0.846
5 | pima | 0.714 | 0.712 | 0.709
11 | wine | 0.994 | 0.983 | 0.984
11 | vowel | 0.839 | 0.767 | 0.827
11 | balance | 0.874 | 0.897 | 0.846
11 | pima | 0.713 | 0.698 | 0.709
◮ Avoid computing d_E or d_H for an arbitrary S
◮ Apply the spectral decomposition (elliptical case S = L^⊤L, or hyperbolic case S = L^⊤DL) and perform a change of coordinates so that we consider the canonical metric distances:
d_E(x′, y′) = arccos(⟨x′, y′⟩ / (‖x′‖ ‖y′‖))
d_H(x′, y′) = arccosh((1 − ⟨x′, y′⟩) / √((1 − ⟨x′, x′⟩)(1 − ⟨y′, y′⟩)))
(with metric pruning).
Mixed curved Mahalanobis distance:
d(x, y) = α d_E(x, y) + (1 − α) d_H(x, y)
mixing an elliptical CK distance (positive constant curvature) with a hyperbolic CK distance (negative constant curvature); the hyperparameter α is tuned.

Datasets | Mahalanobis | Elliptical | Hyperbolic | Mixed | α | β = 1 − α
Wine | 0.993 | 0.984 | 0.893 | 0.986 | 0.741 | 0.259
Sonar | 0.733 | 0.788 | 0.640 | 0.802 | 0.794 | 0.206
Balance | 0.846 | 0.910 | 0.904 | 0.920 | 0.440 | 0.560
Pima | 0.709 | 0.712 | 0.699 | 0.720 | 0.584 | 0.416
Vowel | 0.827 | 0.825 | 0.816 | 0.841 | 0.407 | 0.593

Although the mixed CK distance is a Riemannian metric distance, it is not of constant curvature.
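The mixed distance can be sketched on the canonical elliptic and hyperbolic (Klein ball) forms; the points below are illustrative and must lie in the unit ball for the hyperbolic term:

```python
import numpy as np

def d_elliptic(x, y):
    """Canonical elliptic CK distance (angle between directions)."""
    c = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

def d_hyperbolic(x, y):
    """Canonical hyperbolic CK distance in the Klein ball (||x||, ||y|| < 1)."""
    num = 1.0 - x @ y
    den = np.sqrt((1.0 - x @ x) * (1.0 - y @ y))
    return float(np.arccosh(max(num / den, 1.0)))   # clamp guards rounding

def d_mixed(x, y, alpha=0.5):
    """Convex combination of a positive- and a negative-curvature CK distance."""
    return alpha * d_elliptic(x, y) + (1.0 - alpha) * d_hyperbolic(x, y)

x = np.array([0.1, 0.2])
z = np.array([0.3, -0.1])
assert d_mixed(x, z) >= 0.0
assert abs(d_mixed(x, z) - d_mixed(z, x)) < 1e-12   # symmetry
assert d_mixed(x, x) < 1e-9                         # law of indiscernibles
```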
◮ Study of Cayley-Klein elliptical/hyperbolic geometries: affine bisectors, Voronoi diagrams from (clipped) power diagrams, Cayley-Klein balls (Mahalanobis shapes with displaced centers), etc.
◮ Classification with Large Margin Nearest Neighbor (LMNN) in Cayley-Klein elliptical/hyperbolic geometries (hyperbolic geometry: compact domain & unbounded distance)
◮ Experiments on mixed Cayley-Klein distances
Ongoing work: extensions of Cayley-Klein geometries to Machine Learning
https://www.lix.polytechnique.fr/~nielsen/CayleyKlein/
[1] Yanhong Bi, Bin Fan, and Fuchao Wu. Beyond Mahalanobis metric: Cayley-Klein metric learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
[2] Geometry, Kinematics, and Rigid Body Mechanics in Cayley-Klein Geometries. PhD thesis, Technische Universität Berlin, 2011.
[3] Frank Nielsen and Richard Nock. Hyperbolic Voronoi diagrams made easy. In IEEE International Conference on Computational Science and Its Applications (ICCSA), pages 74–80, 2010.
[4] Frank Nielsen, Paolo Piro, and Michel Barlaud. Bregman vantage point trees for efficient nearest neighbor queries. In IEEE International Conference on Multimedia and Expo (ICME), pages 878–881, 2009.
[5] Jürgen Richter-Gebert. Perspectives on Projective Geometry: A Guided Tour Through Real and Complex Geometry. Springer, 1st edition, 2011.
[6] Kilian Q. Weinberger, John Blitzer, and Lawrence K. Saul. Distance metric learning for large margin nearest neighbor classification. In Advances in Neural Information Processing Systems (NIPS). MIT Press, 2006.
[7] Eric P. Xing, Andrew Y. Ng, Michael I. Jordan, and Stuart Russell. Distance metric learning, with application to clustering with side-information. In Advances in Neural Information Processing Systems 15, pages 505–512. MIT Press, 2003.
[8] Peter N. Yianilos. Data structures and algorithms for nearest neighbor search in general metric spaces. In Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Philadelphia, PA, USA, 1993. Society for Industrial and Applied Mathematics.
Outline:
◮ Review of Mahalanobis distances
◮ Basics of Cayley-Klein geometry: distance from cross-ratio measures, distance expressions, dual conics
◮ Cayley-Klein distances as curved Mahalanobis distances
◮ Computational geometry in Cayley-Klein geometries
◮ Learning curved Mahalanobis metrics: Large Margin Nearest Neighbors (LMNN), elliptical Cayley-Klein LMNN, hyperbolic Cayley-Klein LMNN, experimental results
◮ Nearest-neighbor classification in Cayley-Klein geometries
◮ Mixed curved Mahalanobis distance
◮ Contributions and perspectives
◮ Bibliography
◮ Supplemental information
(figure: collinear points P, p, r, q, Q on a line, illustrating the cross-ratio (p, q; P, Q))
angle(l, m) = c_angle · log((l, m; L, M)), where L and M are the tangent lines to F passing through the intersection point p = l × m of l and m.
(figure: lines l and m meeting at p, with tangent lines L and M to the conic F)
d_H(x, y) = κ · arccosh⟨x′, y′⟩
(figure: points x, y projected to x′, y′)
Mapping to an equivalent power diagram, each site is mapped to a Euclidean ball:
p_i = (Σp + a)/2,  r_i² = ‖Σp + a‖² / (4 S_pp) + a^⊤p + b
p_j = (Σq + a)/2,  r_j² = ‖Σq + a‖² / (4 S_qq) + a^⊤q + b
Elliptical Cayley-Klein ball case (ball of center c and radius r):
Σ′ = r̃² Σ − a′a′^⊤,  c′ = Σ′^{−1}(b′a′ − r̃² a),  r′² = b′² − r̃² b + ⟨c′, c′⟩_{Σ′}
for a suitable r̃ (depending on the Cayley-Klein radius), where a′ = Σc + a and b′ = a^⊤c + b.
Hyperbolic Cayley-Klein ball case:
Σ′ = a′a′^⊤ − r̃² Σ,  c′ = Σ′^{−1}(r̃² a − b′a′),  r′² = r̃² b − b′² + ⟨c′, c′⟩_{Σ′}
with the same a′ = Σc + a and b′ = a^⊤c + b.
... and drawing a Mahalanobis ball amounts to drawing a Euclidean ball after the affine transformation x′ ← L^⊤x.
◮ Eigenvalue decomposition: S = O Λ O^⊤
◮ Canonical decomposition: S = O D^{1/2} [[I, 0], [0, λ]] D^{1/2} O^⊤, where λ ∈ {−1, 1} and O is an orthogonal matrix (O^{−1} = O^⊤)
◮ The diagonal matrix D has all positive entries, with D_{i,i} = Λ_{i,i} and D_{d+1,d+1} = |Λ_{d+1,d+1}|
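The canonical decomposition can be computed from the eigendecomposition; a sketch with an illustrative matrix of signature (+, +, −):

```python
import numpy as np

# Illustrative symmetric matrix with signature (+, +, -).
S = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.5, 0.0],
              [0.0, 0.0, -0.8]])

lam_vals, O = np.linalg.eigh(S)      # eigendecomposition S = O Lambda O^T
order = np.argsort(-lam_vals)        # reorder: single negative eigenvalue last
lam_vals, O = lam_vals[order], O[:, order]

D = np.diag(np.abs(lam_vals))        # positive diagonal matrix
J = np.diag(np.sign(lam_vals))       # diag(I, -1) for the hyperbolic signature
# Reconstruct S = O D^{1/2} J D^{1/2} O^T (diagonal factors commute).
S_rec = O @ np.sqrt(D) @ J @ np.sqrt(D) @ O.T
assert np.allclose(S, S_rec)
```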