Classification with mixtures of curved Mahalanobis metrics, or LMNN in Cayley-Klein geometries (arXiv:1609.07082)


  1. Classification with mixtures of curved Mahalanobis metrics, or LMNN in Cayley-Klein geometries. arXiv:1609.07082. Frank Nielsen 1,2, Boris Muzellec 1, Richard Nock 3,4,5. 1 Ecole Polytechnique, France; 2 Sony CSL, Japan; 3 Data61, Australia; 4 ANU, Australia; 5 The University of Sydney, Australia. 26th September 2016

  2. Mahalanobis distances ◮ For Q ≻ 0, a symmetric positive-definite matrix (such as a covariance matrix), define the Mahalanobis distance: D_Q(p, q) = √((p − q)⊤ Q (p − q)). It is a metric distance (identity of indiscernibles, symmetry, triangle inequality). E.g., Q = precision matrix Σ⁻¹, where Σ = covariance matrix. ◮ Generalizes the Euclidean distance, recovered when Q = I: D_I(p, q) = ‖p − q‖. ◮ The Mahalanobis distance can be interpreted as a Euclidean distance after the Cholesky decomposition Q = L⊤L and the affine transformation x′ ← L x: D_Q(p, q) = D_I(L p, L q) = ‖p′ − q′‖
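This change of variables is easy to verify numerically. A minimal sketch (note that NumPy's `cholesky` returns a lower-triangular L with Q = L L⊤, so the affine map in that convention is x′ ← L⊤ x):

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = rng.normal(size=3), rng.normal(size=3)

# A random symmetric positive-definite matrix Q
A = rng.normal(size=(3, 3))
Q = A @ A.T + 3 * np.eye(3)

# Mahalanobis distance, direct formula
d_direct = np.sqrt((p - q) @ Q @ (p - q))

# Same distance as a Euclidean distance after an affine map:
# np.linalg.cholesky gives lower-triangular L with Q = L L^T,
# so x -> L^T x turns D_Q into the ordinary Euclidean distance.
L = np.linalg.cholesky(Q)
d_affine = np.linalg.norm(L.T @ p - L.T @ q)

print(d_direct, d_affine)  # the two values agree
```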

  3. Generalizing Mahalanobis distances with Cayley-Klein projective geometries + Learning in Cayley-Klein spaces

  4. Cayley-Klein geometry: projective geometry [7, 3] ◮ RP^d: (λx, λ) ∼ (x, 1); homogeneous coordinates x ↦ x̃ = (x, w = 1), and dehomogenization x̃ ↦ x = x̃/w by “perspective division”. ◮ The cross-ratio measure is invariant under projectivities (homographies/collineations): (p, q; P, Q) = ((p − P)(q − Q)) / ((p − Q)(q − P)), where p, q, P, Q are collinear points.
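Collinear points can be given scalar coordinates along their line; a small sketch checking the invariance of the cross-ratio under an arbitrarily chosen homography of the line, x ↦ (ax + b)/(cx + d):

```python
def cross_ratio(p, q, P, Q):
    # (p, q; P, Q) for four collinear points in line coordinates
    return ((p - P) * (q - Q)) / ((p - Q) * (q - P))

def homography(x, a=2.0, b=1.0, c=1.0, d=3.0):
    # A projectivity of the line (requires ad - bc != 0)
    return (a * x + b) / (c * x + d)

p, q, P, Q = 0.5, 2.0, -1.0, 4.0
before = cross_ratio(p, q, P, Q)
after = cross_ratio(*(homography(x) for x in (p, q, P, Q)))
print(before, after)  # the cross-ratio is unchanged by the homography
```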

  5. Definition of Cayley-Klein geometries A Cayley-Klein geometry is a triple K = (F, c_dist, c_angle): 1. A fundamental conic F 2. A constant unit c_dist ∈ ℂ for measuring distances 3. A constant unit c_angle ∈ ℂ for measuring angles. See the monograph [7]

  6. Distance in Cayley-Klein geometries dist(p, q) = c_dist Log((p, q; P, Q)), where P and Q are the intersection points of the line l = (pq) (l̃ = p̃ × q̃ in 2D) with the fundamental conic, and Log is the principal complex logarithm (modulo 2πi).

  7. Key properties of Cayley-Klein distances ◮ dist(p, p) = 0 (law of indiscernibles) ◮ Signed distances: dist(p, q) = −dist(q, p) ◮ Additivity along lines: when p, q, r are collinear, dist(p, q) = dist(p, r) + dist(r, q). Geodesics in Cayley-Klein geometries are straight lines (possibly clipped to the conic domain). The logarithm transfers the multiplicative property of the cross-ratio to the additive property of Cayley-Klein distances: when p, q, r, P, Q are collinear, (p, q; P, Q) = (p, r; P, Q) · (r, q; P, Q)
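The multiplicative-to-additive transfer is easy to check in line coordinates; a small sketch with arbitrarily chosen collinear points, where P and Q stand for the intersections with the fundamental conic:

```python
import math

def cross_ratio(p, q, P, Q):
    return ((p - P) * (q - Q)) / ((p - Q) * (q - P))

# Five collinear points in line coordinates
p, r, q, P, Q = 0.2, 0.5, 0.8, -1.0, 1.0

# Multiplicativity of the cross-ratio ...
lhs = cross_ratio(p, q, P, Q)
rhs = cross_ratio(p, r, P, Q) * cross_ratio(r, q, P, Q)

# ... becomes additivity of the log-based Cayley-Klein distance
dist = lambda a, b: math.log(cross_ratio(a, b, P, Q))
print(lhs, rhs, dist(p, q), dist(p, r) + dist(r, q))
```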

  8. Dual conics In projective geometry, points and lines are dual concepts. Dual parameterizations of the fundamental conic F = (A, A^Δ), with quadratic form Q_A(x̃) = x̃⊤ A x̃: ◮ primal conic = set of border points: C_A = { p̃ : Q_A(p̃) = 0 } ◮ dual conic = set of tangent hyperplanes: C*_A = { l̃ : Q_{A^Δ}(l̃) = 0 } Here A^Δ = |A| A⁻¹ is the adjoint (adjugate) matrix. The adjoint can be computed even when A is not invertible (|A| = 0).

  9. Taxonomy The signature of a matrix is the list of signs of the eigenvalues in its eigendecomposition.
Type | A | A^Δ | Conic
Elliptic | (+,+,+) | (+,+,+) | non-degenerate complex conic
Hyperbolic | (+,+,−) | (+,+,−) | non-degenerate real conic
Dual Euclidean | (+,+,0) | (+,0,0) | two complex lines with a real intersection point
Dual pseudo-Euclidean | (+,−,0) | (+,0,0) | two real lines with a double real intersection point
Euclidean | (+,0,0) | (+,+,0) | two complex points with a double real line passing through
Pseudo-Euclidean | (+,0,0) | (+,−,0) | two real points with a double real line passing through
Galilean | (+,0,0) | (+,0,0) | double real line with a real intersection point
Degenerate cases are obtained as limits of non-degenerate cases. Measurements can be elliptic, hyperbolic, or parabolic (degenerate case).

  10. Real CK distances without cross-ratio expressions For real Cayley-Klein measures, we choose the constants (κ is the curvature): ◮ Elliptic (κ > 0): c_dist = κ/(2i) ◮ Hyperbolic (κ < 0): c_dist = −κ/2 ◮ Bilinear form: S_pq = (p⊤, 1) S (q⊤, 1)⊤ = p̃⊤ S q̃ ◮ Get rid of the cross-ratio using: (p, q; P, Q) = (S_pq + √(S_pq² − S_pp S_qq)) / (S_pq − √(S_pq² − S_pp S_qq))

  11. Elliptic Cayley-Klein metric distance d_E(p, q) = (κ/(2i)) Log((S_pq + √(S_pq² − S_pp S_qq)) / (S_pq − √(S_pq² − S_pp S_qq))) = κ arccos(S_pq / √(S_pp S_qq)) Notice that d_E(p, q) < κπ, and the domain is D_S = R^d in the elliptic case. Under the gnomonic projection x ↦ x′, y ↦ y′ onto the unit sphere: d_E(x, y) = κ · arccos(⟨x′, y′⟩)
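The three expressions agree numerically. A minimal sketch with S = I (so the bilinear form is the dot product of the homogeneous lifts), using Python complex arithmetic for the principal Log:

```python
import numpy as np

kappa = 1.0
S = np.eye(3)                                    # elliptic case: S positive definite
p, q = np.array([0.3, -0.2]), np.array([1.1, 0.7])
pt, qt = np.append(p, 1.0), np.append(q, 1.0)    # homogeneous lifts

Spq, Spp, Sqq = pt @ S @ qt, pt @ S @ pt, qt @ S @ qt

# arccos form of the elliptic Cayley-Klein distance
d_arccos = kappa * np.arccos(Spq / np.sqrt(Spp * Sqq))

# complex-Log form: the discriminant is negative in the elliptic case,
# so its square root is purely imaginary and the cross-ratio is unimodular
disc = complex(Spq**2 - Spp * Sqq)
root = np.sqrt(disc)
d_log = (kappa / 2j) * np.log((Spq + root) / (Spq - root))

# gnomonic-projection form: angle between the normalized lifts
d_gnomonic = kappa * np.arccos(pt @ qt / (np.linalg.norm(pt) * np.linalg.norm(qt)))

print(d_arccos, d_log.real, d_gnomonic)  # all three values agree
```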

  12. Hyperbolic Cayley-Klein distance When p, q ∈ D_S := { p : S_pp < 0 }, the hyperbolic domain: d_H(p, q) = (−κ/2) log((S_pq + √(S_pq² − S_pp S_qq)) / (S_pq − √(S_pq² − S_pp S_qq))) = −κ arctanh(√(1 − S_pp S_qq / S_pq²)) = −κ arccosh(|S_pq| / √(S_pp S_qq)) with arccosh(x) = log(x + √(x² − 1)) and arctanh(x) = (1/2) log((1 + x)/(1 − x)). Curvature κ < 0
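For S = diag(1, …, 1, −1) this is the Klein (Beltrami) ball model of the next slide. A small sketch checking the arccosh form against the known closed form arctanh(t) for the distance from the origin to (t, 0), with the absolute value |S_pq| in the arccosh argument as above:

```python
import numpy as np

kappa = -1.0
S = np.diag([1.0, 1.0, -1.0])        # Klein unit-ball bilinear form

def d_H(p, q):
    pt, qt = np.append(p, 1.0), np.append(q, 1.0)
    Spq, Spp, Sqq = pt @ S @ qt, pt @ S @ pt, qt @ S @ qt
    assert Spp < 0 and Sqq < 0       # points must lie in the domain D_S
    return -kappa * np.arccosh(abs(Spq) / np.sqrt(Spp * Sqq))

t = 0.6
d = d_H(np.zeros(2), np.array([t, 0.0]))
print(d)  # hyperbolic distance from the origin, equals arctanh(t)
```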

  13. Decomposition of the bilinear form [1] Write S = [ Σ a ; a⊤ b ] = S_{Σ,a,b} with Σ ≻ 0. S_{p,q} = p̃⊤ S q̃ = p⊤ Σ q + p⊤ a + a⊤ q + b Let μ = −Σ⁻¹ a ∈ R^d (so a = −Σμ) and b = μ⊤Σμ + sign(κ)/κ², i.e.: κ = (b − μ⊤Σμ)^(−1/2) if b > μ⊤Σμ, and κ = −(μ⊤Σμ − b)^(−1/2) if b < μ⊤Σμ. Then the bilinear form writes as: S(p, q) = S_{Σ,μ,κ}(p, q) = (p − μ)⊤ Σ (q − μ) + sign(κ)/κ²
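A quick numerical sanity check of this decomposition, assembling S from arbitrarily chosen (Σ, μ, κ):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
B = rng.normal(size=(d, d))
Sigma = B @ B.T + d * np.eye(d)      # Sigma > 0
mu = rng.normal(size=d)
kappa = 0.7                          # elliptic case: kappa > 0

# Assemble S = [[Sigma, a], [a^T, b]] from (Sigma, mu, kappa)
a = -Sigma @ mu
b = mu @ Sigma @ mu + np.sign(kappa) / kappa**2
S = np.block([[Sigma, a[:, None]], [a[None, :], np.array([[b]])]])

p, q = rng.normal(size=d), rng.normal(size=d)
pt, qt = np.append(p, 1.0), np.append(q, 1.0)

lhs = pt @ S @ qt                                          # homogeneous form
rhs = (p - mu) @ Sigma @ (q - mu) + np.sign(kappa) / kappa**2
print(lhs, rhs)  # the two values agree
```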

  14. Curved Mahalanobis metric distances We have [1]: lim_{κ→0+} D_{Σ,μ,κ}(p, q) = lim_{κ→0−} D_{Σ,μ,κ}(p, q) = D_Σ(p, q), the Mahalanobis distance D_Σ(p, q) = D_{Σ,0,0}(p, q). Thus hyperbolic/elliptic Cayley-Klein distances can be interpreted as curved Mahalanobis distances, or κ-Mahalanobis distances. When S = diag(1, 1, ..., 1, −1), we recover the canonical hyperbolic distance [5] in the Cayley-Klein model: D_h(p, q) = arccosh((1 − ⟨p, q⟩) / √((1 − ⟨p, p⟩)(1 − ⟨q, q⟩))) defined in the interior of the unit ball.

  15. Cayley-Klein bisectors are affine Bisector Bi(p, q): Bi(p, q) = { x ∈ D_S : dist_S(p, x) = dist_S(x, q) } Since arccos and arccosh are monotone (injective), this is equivalent to S(p, x)/√|S(p, p)| = S(q, x)/√|S(q, q)|, which expands to the linear equation ⟨x, √|S(p, p)| Σq − √|S(q, q)| Σp⟩ + √|S(p, p)| (a⊤(q + x) + b) − √|S(q, q)| (a⊤(p + x) + b) = 0. Bisectors are thus hyperplanes (restricted to the domain).
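A small numeric check in the elliptic case S = I (so Σ = I, a = 0, b = 1): the bisector condition reduces to ⟨αp − βq, x⟩ = β − α with α = √S_qq, β = √S_pp, and any solution x is equidistant from p and q:

```python
import numpy as np

def d_E(p, q):
    # Elliptic Cayley-Klein distance with S = I (kappa = 1)
    pt, qt = np.append(p, 1.0), np.append(q, 1.0)
    return np.arccos(pt @ qt / np.sqrt((pt @ pt) * (qt @ qt)))

p, q = np.array([1.0, 0.2]), np.array([-0.4, 0.9])
alpha, beta = np.sqrt(q @ q + 1), np.sqrt(p @ p + 1)

# Linear bisector equation <alpha*p - beta*q, x> = beta - alpha;
# pick the particular solution along the normal direction n.
n = alpha * p - beta * q
x = (beta - alpha) / (n @ n) * n

print(d_E(p, x), d_E(q, x))  # equidistant: the bisector is affine
```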

  16. Cayley-Klein Voronoi diagrams are affine They can be computed from equivalent (clipped) power diagrams [2, 5]. https://www.youtube.com/watch?v=YHJLq3-RL58

  17. Cayley-Klein balls Blue: Mahalanobis; red: elliptic; green: hyperbolic. Cayley-Klein balls have Mahalanobis ball shapes with displaced centers.

  18. Learning curved Mahalanobis metrics

  19. Large Margin Nearest Neighbors (LMNN) [8] Learn a Mahalanobis distance M = L⊤L ≻ 0 for a given input data-set P: ◮ Shrink the distance of each point to its target neighbors, ε_pull(L): S = { (x_i, x_j) : y_i = y_j and x_j ∈ N(x_i) } ◮ Keep a distance margin between each point and its impostors, ε_push(L): R = { (x_i, x_j, x_l) : (x_i, x_j) ∈ S and y_i ≠ y_l } http://www.cs.cornell.edu/~kilian/code/lmnn/lmnn.html

  20. LMNN: cost function and optimization The objective cost function [8] is convex and piecewise linear (an SDP): ε_pull(L) = Σ_{i, i→j} ‖L(x_i − x_j)‖², ε_push(L) = Σ_{i, i→j} Σ_l (1 − y_il) [1 + ‖L(x_i − x_j)‖² − ‖L(x_i − x_l)‖²]_+, ε(L) = (1 − μ) ε_pull(L) + μ ε_push(L), where i→j means x_j is a target neighbor of x_i, y_il = 1 iff x_i and x_l have the same label (y_il = 0 otherwise), and μ is set by cross-validation. Optimize by gradient descent: L_{t+1} = L_t − γ ∂ε(L_t)/∂L with ∂ε/∂L = (1 − μ) Σ_{i, i→j} C_ij + μ Σ_{(i,j,l)∈R_t} (C_ij − C_il), where C_ij = (x_i − x_j)(x_i − x_j)⊤. Easy: no projection mechanism needed, unlike Mahalanobis Metric for Clustering (MMC) [9]
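The outer-product form of the gradient can be checked against finite differences. A minimal sketch for the pull term, viewed as a function of the metric matrix M, on toy data with hypothetical target-neighbor pairs:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 2))              # 5 toy points in 2D
pairs = [(0, 1), (2, 3), (4, 0)]         # hypothetical target-neighbor pairs

def eps_pull(M):
    # sum of squared Mahalanobis distances to target neighbors
    return sum((X[i] - X[j]) @ M @ (X[i] - X[j]) for i, j in pairs)

# Analytic gradient w.r.t. M: sum of outer products C_ij
grad = sum(np.outer(X[i] - X[j], X[i] - X[j]) for i, j in pairs)

# Finite-difference check on one entry of M
M0 = np.eye(2)
h = 1e-6
E = np.zeros((2, 2)); E[0, 1] = 1.0
num = (eps_pull(M0 + h * E) - eps_pull(M0 - h * E)) / (2 * h)
print(num, grad[0, 1])  # the two values agree
```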

  21. Elliptic Cayley-Klein LMNN [1], CVPR 2015 ε(L) = (1 − μ) Σ_{i, i→j} d_E(x_i, x_j) + μ Σ_{i, i→j} Σ_l (1 − y_il) ζ_ijl with ζ_ijl = [1 + d_E(x_i, x_j) − d_E(x_i, x_l)]_+ (hinge loss). ∂ε(L)/∂L = (1 − μ) Σ_{i, i→j} ∂d_E(x_i, x_j)/∂L + μ Σ_{i, i→j} Σ_l (1 − y_il) ∂ζ_ijl/∂L With C_ij = (x_i⊤, 1)⊤ (x_j⊤, 1) = x̃_i x̃_j⊤: ∂d_E(x_i, x_j)/∂L = (κ/√(S_ii S_jj − S_ij²)) ((S_ij/S_ii) C_ii + (S_ij/S_jj) C_jj − (C_ij + C_ji)) L ∂ζ_ijl/∂L = ∂d_E(x_i, x_j)/∂L − ∂d_E(x_i, x_l)/∂L if ζ_ijl ≥ 0, and 0 otherwise.
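This gradient formula can also be verified by finite differences; a sketch assuming the parameterization S = L L⊤ (for which the chain rule produces exactly the right-multiplication by L above):

```python
import numpy as np

rng = np.random.default_rng(3)
kappa = 1.0
xi, xj = rng.normal(size=2), rng.normal(size=2)
ti, tj = np.append(xi, 1.0), np.append(xj, 1.0)   # homogeneous lifts

def d_E(L):
    S = L @ L.T                      # parameterization keeping S positive definite
    Sij, Sii, Sjj = ti @ S @ tj, ti @ S @ ti, tj @ S @ tj
    return kappa * np.arccos(Sij / np.sqrt(Sii * Sjj))

def grad_dE(L):
    S = L @ L.T
    Sij, Sii, Sjj = ti @ S @ tj, ti @ S @ ti, tj @ S @ tj
    Cii, Cjj = np.outer(ti, ti), np.outer(tj, tj)
    Cij, Cji = np.outer(ti, tj), np.outer(tj, ti)
    G = (Sij / Sii) * Cii + (Sij / Sjj) * Cjj - (Cij + Cji)
    return kappa / np.sqrt(Sii * Sjj - Sij**2) * (G @ L)

L0 = np.eye(3) + 0.1 * rng.normal(size=(3, 3))
h = 1e-6
E = np.zeros((3, 3)); E[1, 2] = 1.0
num = (d_E(L0 + h * E) - d_E(L0 - h * E)) / (2 * h)
print(num, grad_dE(L0)[1, 2])  # analytic gradient matches finite differences
```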

  22. Hyperbolic Cayley-Klein LMNN (new case) To ensure that S keeps the correct signature (1, d, 0) during the LMNN gradient descent, we decompose S = L⊤DL (with L ≻ 0) and perform a gradient descent on L with the following gradient: ∂d_H(x_i, x_j)/∂L = (κ/√(S_ij² − S_ii S_jj)) ((S_ij/S_ii) C_ii + (S_ij/S_jj) C_jj − (C_ij + C_ji)) DL Recall two difficulties of the hyperbolic case compared to the elliptic case: ◮ The hyperbolic Cayley-Klein distance may be very large (unbounded, vs. < κπ in the elliptic case) ◮ The data-set must be contained inside the compact domain D_S
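The reason this parameterization is safe is Sylvester's law of inertia: a congruence S = L⊤DL has the same signature as D for any invertible L, so gradient steps on L cannot change the signature. A quick sketch with an arbitrary invertible L:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 3
D = np.diag([1.0] * d + [-1.0])          # fixed hyperbolic signature

# Any invertible congruence L^T D L preserves the signature of D
L = np.eye(d + 1) + 0.3 * rng.normal(size=(d + 1, d + 1))
S = L.T @ D @ L

eigs = np.linalg.eigvalsh(S)
print(np.sum(eigs > 0), np.sum(eigs < 0))  # d positive, 1 negative
```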
