Classification with mixtures of curved Mahalanobis metrics


  1. Classification with mixtures of curved Mahalanobis metrics (or LMNN in Cayley-Klein geometries). Frank Nielsen (1, 2), Boris Muzellec (1), Richard Nock (3, 4). 1: Ecole Polytechnique, France; 2: Sony CSL, Japan; 3: Data61, Australia; 4: ANU, Australia. 23rd September 2016

  2. Mahalanobis distances
◮ For Q ≻ 0, a symmetric positive-definite matrix (e.g., a covariance matrix), define the Mahalanobis distance:
    D_Q(p, q) = √((p − q)^⊤ Q (p − q))
It is a metric distance (identity of indiscernibles, symmetry, triangle inequality). E.g., Q = precision matrix Σ^{−1}, where Σ is the covariance matrix.
◮ Generalizes the Euclidean distance, recovered when Q = I: D_I(p, q) = ‖p − q‖
◮ The Mahalanobis distance can be interpreted as a Euclidean distance after the decomposition Q = L^⊤ L (e.g., from a Cholesky factorization) and the affine transformation x′ ← L x:
    D_Q(p, q) = D_I(L p, L q) = ‖p′ − q′‖
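The equivalence between the quadratic-form definition and the whitened Euclidean distance is easy to check numerically. A minimal NumPy sketch (the function names and the random test data are ours, not from the slides):

```python
import numpy as np

def mahalanobis(p, q, Q):
    """D_Q(p, q) = sqrt((p - q)^T Q (p - q)) for symmetric positive-definite Q."""
    d = p - q
    return float(np.sqrt(d @ Q @ d))

def mahalanobis_whitened(p, q, Q):
    """Same distance as a Euclidean distance after the affine map x' <- L x,
    where Q = L^T L (L obtained here from a Cholesky factorization)."""
    L = np.linalg.cholesky(Q).T  # cholesky returns C with Q = C C^T, so L = C^T
    return float(np.linalg.norm(L @ p - L @ q))

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
Q = A @ A.T + 3.0 * np.eye(3)          # a random SPD matrix
p, q = rng.normal(size=3), rng.normal(size=3)
assert abs(mahalanobis(p, q, Q) - mahalanobis_whitened(p, q, Q)) < 1e-10
```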

  3. Generalizing Mahalanobis distances with Cayley-Klein projective geometries + learning in Cayley-Klein spaces

  4. Cayley-Klein geometry: projective geometry [5, 2]
◮ RP^d: homogeneous coordinates (λx, λ) ∼ (x, 1), x ↦ x̃ = (x, w = 1), and dehomogenization by "perspective division" x̃ ↦ x/w
◮ The cross-ratio is invariant under projectivities (homographies):
    (p, q; P, Q) = (|pP| |qQ|) / (|pQ| |qP|)
where p, q, P, Q are collinear. (Figure: four collinear points p, q, P, Q.)
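The projective invariance of the cross-ratio can be checked numerically. A small sketch (the homography `H` and the helper names are illustrative choices of ours):

```python
import numpy as np

def cross_ratio(p, q, P, Q):
    """(p, q; P, Q) = (|pP| |qQ|) / (|pQ| |qP|) for four collinear points."""
    d = np.linalg.norm
    return (d(p - P) * d(q - Q)) / (d(p - Q) * d(q - P))

# Four collinear points along one direction
t = np.array([0.2, 0.5, 1.3, 2.0])
direction = np.array([1.0, 2.0])
pts = [ti * direction for ti in t]

# A projective map (homography) acting on homogeneous coordinates
H = np.array([[2.0, 0.3, 0.1],
              [0.1, 1.5, -0.2],
              [0.05, 0.02, 1.0]])

def apply_homography(H, x):
    xh = H @ np.append(x, 1.0)   # homogenize, then map
    return xh[:2] / xh[2]        # perspective division

before = cross_ratio(*pts)
after = cross_ratio(*[apply_homography(H, x) for x in pts])
assert abs(before - after) < 1e-9   # cross-ratio is a projective invariant
```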

  5. Definition of Cayley-Klein geometries
A Cayley-Klein geometry is a triple K = (F, c_dist, c_angle):
1. a fundamental conic F
2. a constant unit c_dist ∈ C for measuring distances
3. a constant unit c_angle ∈ C for measuring angles
See the monograph [5]

  6. Distance in Cayley-Klein geometries
    dist(p, q) = c_dist · log((p, q; P, Q))
where P and Q are the intersection points of the line l = (pq) (l̃ = p̃ × q̃) with the conic F. (Figure: the line l through p and q meeting the conic F at P and Q.)
Extends to Hilbert projective geometries: a bounded convex subset of R^d instead of a conic

  7. Key properties of Cayley-Klein distances
◮ dist(p, p) = 0 (law of indiscernibles)
◮ Signed distances: dist(p, q) = −dist(q, p)
◮ When p, q, r are collinear: dist(p, q) = dist(p, r) + dist(r, q)
Geodesics in Cayley-Klein geometries are straight lines (possibly clipped to the conic domain).
The logarithm transfers the multiplicative property of the cross-ratio to the additive property of Cayley-Klein distances. When p, q, r, P, Q are collinear:
    (p, q; P, Q) = (p, r; P, Q) · (r, q; P, Q)

  8. Dual conics
In projective geometry, points and lines are dual concepts.
Dual parameterizations of the fundamental conic F = (A, A^∆), with quadratic form Q_A(x̃) = x̃^⊤ A x̃:
◮ primal conic = set of border points: C_A = { p̃ : Q_A(p̃) = 0 }
◮ dual conic = set of tangent hyperplanes: C*_A = { l̃ : Q_{A^∆}(l̃) = 0 }
A^∆ = |A| A^{−1} is the adjoint (adjugate) matrix.
The adjoint can be computed even when A is not invertible (|A| = 0).
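The adjoint can be computed directly via cofactors, which works even when A is singular; it then satisfies A^∆ A = |A| I. A sketch (`adjugate` is our name for it):

```python
import numpy as np

def adjugate(A):
    """Adjoint (adjugate) A^D: transpose of the cofactor matrix.
    Satisfies adjugate(A) @ A = det(A) * I, even when det(A) = 0."""
    n = A.shape[0]
    adj = np.empty_like(A, dtype=float)
    for r in range(n):
        for c in range(n):
            minor = np.delete(np.delete(A, r, axis=0), c, axis=1)
            adj[c, r] = (-1) ** (r + c) * np.linalg.det(minor)
    return adj

A = np.array([[1.0, 2.0], [2.0, 4.0]])   # a singular (rank-1) matrix
assert np.allclose(adjugate(A) @ A, np.linalg.det(A) * np.eye(2))
```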

  9. Taxonomy
Signature of a matrix = signs of the eigenvalues of its eigendecomposition.

Type                  | A         | A^∆       | Conic
Elliptic              | (+, +, +) | (+, +, +) | non-degenerate complex conic
Hyperbolic            | (+, +, −) | (+, +, −) | non-degenerate real conic
Dual Euclidean        | (+, +, 0) | (+, 0, 0) | two complex lines with a real intersection point
Dual pseudo-Euclidean | (+, −, 0) | (+, 0, 0) | two real lines with a real intersection point
Euclidean             | (+, 0, 0) | (+, +, 0) | two complex points with a double real line passing through them
Pseudo-Euclidean      | (+, 0, 0) | (+, −, 0) | two real points with a double real line passing through them
Galilean              | (+, 0, 0) | (+, 0, 0) | a double real line with a real intersection point

Degenerate cases are obtained as limits of non-degenerate cases. Thus we restrict to the "three kinds" of Cayley-Klein geometries [5]:
1. elliptic
2. hyperbolic
3. parabolic

  10. Real CK distances without cross-ratio expressions
For real Cayley-Klein measures, we choose the constants (κ is the curvature):
◮ Elliptic (κ > 0): c_dist = κ / (2i)
◮ Hyperbolic (κ < 0): c_dist = −κ / 2
◮ Bilinear form: S_pq = (p^⊤, 1) S (q^⊤, 1)^⊤ = p̃^⊤ S q̃
◮ Get rid of the cross-ratio using:
    (p, q; P, Q) = (S_pq + √(S_pq² − S_pp S_qq)) / (S_pq − √(S_pq² − S_pp S_qq))

  11. Elliptical Cayley-Klein metric distance
    d_E(p, q) = (κ / (2i)) · log( (S_pq + √(S_pq² − S_pp S_qq)) / (S_pq − √(S_pq² − S_pp S_qq)) )
              = κ · arccos( S_pq / √(S_pp S_qq) )
Notice that d_E(p, q) < κπ, and the domain is D_S = R^d in the elliptic case.
(Figure: gnomonic projection mapping x, y to x′, y′ on the sphere, with d_E(x, y) = κ · arccos(⟨x′, y′⟩).)
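A minimal implementation of the arccos form (the function name is ours; the clip guards against round-off pushing the argument just outside [−1, 1]):

```python
import numpy as np

def elliptic_ck_distance(p, q, S, kappa=1.0):
    """d_E(p, q) = kappa * arccos(S_pq / sqrt(S_pp * S_qq)),
    with S_xy the bilinear form on homogeneous coordinates, S positive definite."""
    ph, qh = np.append(p, 1.0), np.append(q, 1.0)
    Spq, Spp, Sqq = ph @ S @ qh, ph @ S @ ph, qh @ S @ qh
    u = np.clip(Spq / np.sqrt(Spp * Sqq), -1.0, 1.0)  # guard round-off
    return kappa * float(np.arccos(u))

S = np.eye(3)                       # a simple elliptic geometry in R^2
p, q = np.array([0.3, -0.2]), np.array([1.0, 0.5])
d = elliptic_ck_distance(p, q, S, kappa=2.0)
assert 0.0 <= d < 2.0 * np.pi       # bounded by kappa * pi
```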

  12. Hyperbolic Cayley-Klein distance
When p, q ∈ D_S := { p : S_pp < 0 }, the hyperbolic domain:
    d_H(p, q) = −(κ/2) · log( (S_pq + √(S_pq² − S_pp S_qq)) / (S_pq − √(S_pq² − S_pp S_qq)) )
              = κ · arctanh( √(1 − (S_pp S_qq) / S_pq²) )
              = κ · arccosh( S_pq / √(S_pp S_qq) )
with arccosh(x) = log(x + √(x² − 1)) and arctanh(x) = (1/2) · log((1 + x)/(1 − x))
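A sketch of the arccosh form, sanity-checked against the canonical Klein-ball distance of slide 14. We take |S_pq| inside the arccosh so the code is insensitive to the overall sign convention of S (with S = diag(1, ..., 1, −1) and points in the unit ball, S_pq itself is negative); this is our robustness choice, not something stated on the slide:

```python
import numpy as np

def hyperbolic_ck_distance(p, q, S, kappa=1.0):
    """d_H(p, q) = kappa * arccosh(|S_pq| / sqrt(S_pp * S_qq)),
    valid on the domain D_S = {p : S_pp < 0}."""
    ph, qh = np.append(p, 1.0), np.append(q, 1.0)
    Spq, Spp, Sqq = ph @ S @ qh, ph @ S @ ph, qh @ S @ qh
    assert Spp < 0 and Sqq < 0, "points must lie in the domain D_S"
    return kappa * float(np.arccosh(abs(Spq) / np.sqrt(Spp * Sqq)))

# Check against the canonical hyperbolic distance in the Klein ball (slide 14)
S = np.diag([1.0, 1.0, -1.0])
p, q = np.array([0.1, 0.2]), np.array([-0.3, 0.4])
klein = np.arccosh((1 - p @ q) / (np.sqrt(1 - p @ p) * np.sqrt(1 - q @ q)))
assert abs(hyperbolic_ck_distance(p, q, S) - klein) < 1e-12
```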

  13. Decomposition of the bilinear form [1]
Write S = [ Σ  a ; a^⊤  b ] = S_{Σ,a,b} with Σ ≻ 0.
    S_pq = p̃^⊤ S q̃ = p^⊤ Σ q + p^⊤ a + a^⊤ q + b
Let μ = −Σ^{−1} a ∈ R^d (so a = −Σ μ) and b = μ^⊤ Σ μ + sign(κ)/κ², where
    κ = (b − μ^⊤ Σ μ)^{−1/2}    if b > μ^⊤ Σ μ,
    κ = −(μ^⊤ Σ μ − b)^{−1/2}   if b < μ^⊤ Σ μ.
Then the bilinear form writes as:
    S_pq = S_{Σ,μ,κ}(p, q) = (p − μ)^⊤ Σ (q − μ) + sign(κ)/κ²
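The identity between the block form and the centered form can be verified numerically. A sketch with a random Σ ≻ 0, a random μ, and an arbitrary κ > 0 (all test values are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)     # Sigma > 0
mu = rng.normal(size=d)
kappa = 0.7                          # kappa > 0: elliptic-type form

a = -Sigma @ mu
b = mu @ Sigma @ mu + np.sign(kappa) / kappa**2
S = np.block([[Sigma, a[:, None]], [a[None, :], np.array([[b]])]])

p, q = rng.normal(size=d), rng.normal(size=d)
lhs = np.append(p, 1.0) @ S @ np.append(q, 1.0)                  # S_pq via block form
rhs = (p - mu) @ Sigma @ (q - mu) + np.sign(kappa) / kappa**2    # centered form
assert abs(lhs - rhs) < 1e-9
```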

  14. Curved Mahalanobis metric distances
We have [1]:
    lim_{κ→0+} D_{Σ,μ,κ}(p, q) = lim_{κ→0−} D_{Σ,μ,κ}(p, q) = D_Σ(p, q)
with the Mahalanobis distance D_Σ(p, q) = D_{Σ,0,0}(p, q).
Thus hyperbolic/elliptical Cayley-Klein distances can be interpreted as curved Mahalanobis distances, or κ-Mahalanobis distances.
When S = diag(1, 1, ..., 1, −1), we recover the canonical hyperbolic distance [3] in the Cayley-Klein model:
    D_h(p, q) = arccosh( (1 − ⟨p, q⟩) / (√(1 − ⟨p, p⟩) · √(1 − ⟨q, q⟩)) )
defined in the interior of the unit ball.

  15. Cayley-Klein bisectors are affine
Bisector Bi(p, q):
    Bi(p, q) = { x ∈ D_S : dist_S(p, x) = dist_S(x, q) }
It is the solution set of an equation affine in x:
    ⟨x, √(|S(p,p)|) Σ q − √(|S(q,q)|) Σ p⟩ + √(|S(p,p)|) (a^⊤(q + x) + b) − √(|S(q,q)|) (a^⊤(p + x) + b) = 0
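The affinity claim can be checked empirically: locate three bisector points by bisection along three different chords and verify that they are collinear (in the plane, an affine bisector is a line). A sketch in the elliptic case; the helper names and test points are ours:

```python
import numpy as np

def d_E(p, q, S):
    ph, qh = np.append(p, 1.0), np.append(q, 1.0)
    u = (ph @ S @ qh) / np.sqrt((ph @ S @ ph) * (qh @ S @ qh))
    return np.arccos(np.clip(u, -1.0, 1.0))

S = np.diag([2.0, 0.5, 1.0])                      # an elliptic geometry in R^2
p, q = np.array([0.8, -0.1]), np.array([-0.4, 0.6])

def bisector_point(a, b):
    """Bisection for a root of d_E(p, x) - d_E(x, q) on the segment [a, b]."""
    gap = lambda x: d_E(p, x, S) - d_E(x, q, S)
    assert gap(a) < 0 < gap(b)       # sign change guarantees a root
    for _ in range(80):
        m = 0.5 * (a + b)
        if gap(m) < 0:
            a = m
        else:
            b = m
    return 0.5 * (a + b)

rng = np.random.default_rng(2)
pts = [bisector_point(p + w, q + w) for w in rng.normal(scale=0.1, size=(3, 2))]
x0, x1, x2 = pts
# Collinearity: 2x2 determinant of the two difference vectors vanishes
det = (x1 - x0)[0] * (x2 - x0)[1] - (x1 - x0)[1] * (x2 - x0)[0]
assert abs(det) < 1e-6
```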

  16. Cayley-Klein Voronoi diagrams are affine
They can be computed from equivalent (clipped) power diagrams.
https://www.youtube.com/watch?v=YHJLq3-RL58

  17. Cayley-Klein balls
(Figure: blue = Mahalanobis, red = elliptical, green = hyperbolic balls.)
Cayley-Klein balls have Mahalanobis ball shapes with displaced centers.

  18. Learning curved Mahalanobis metrics

  19. Large Margin Nearest Neighbors (LMNN)
Learn [6] a Mahalanobis distance M = L^⊤ L ≻ 0 for a given input data-set P:
◮ Shrink the distance of each point to its target neighbors: ε_pull(L)
◮ Keep a distance margin of each point to its impostors: ε_push(L)
http://www.cs.cornell.edu/~kilian/code/lmnn/lmnn.html

  20. LMNN: Cost function and optimization
Objective cost function [6] (convex and piecewise linear in M = L^⊤ L):
    ε_pull(L) = Σ_{i, i→j} ‖L(x_i − x_j)‖²
    ε_push(L) = Σ_{i, i→j} Σ_l (1 − y_il) [1 + ‖L(x_i − x_j)‖² − ‖L(x_i − x_l)‖²]_+
    ε(L) = (1 − μ) ε_pull(L) + μ ε_push(L)
where i→j means that x_j is a target neighbor of x_i, and y_il = 1 iff x_i and x_l have the same label, y_il = 0 otherwise.
Optimize by gradient descent:
    L_{t+1} = L_t − γ ∂ε(L_t)/∂L
    ∂ε/∂L = 2 L ( (1 − μ) Σ_{i, i→j} C_ij + μ Σ_{(i,j,l) ∈ N_t} (C_ij − C_il) )
where C_ij = (x_i − x_j)(x_i − x_j)^⊤ and N_t is the set of triplets with an active hinge at iteration t.
Easy: no projection mechanism is needed, unlike for Mahalanobis Metric for Clustering (MMC) [7].

  21. Elliptical Cayley-Klein LMNN [1], CVPR 2015
    ε(L) = (1 − μ) Σ_{i, i→j} d_E(x_i, x_j) + μ Σ_{i, i→j} Σ_l (1 − y_il) ζ_ijl
with ζ_ijl = [1 + d_E(x_i, x_j) − d_E(x_i, x_l)]_+
    ∂ε(L)/∂L = (1 − μ) Σ_{i, i→j} ∂d_E(x_i, x_j)/∂L + μ Σ_{i, i→j} Σ_l (1 − y_il) ∂ζ_ijl/∂L
With C_ij = x̃_i x̃_j^⊤ = (x_i^⊤, 1)^⊤ (x_j^⊤, 1):
    ∂d_E(x_i, x_j)/∂L = κ / √(S_ii S_jj − S_ij²) · ( (S_ij/S_ii) C_ii + (S_ij/S_jj) C_jj − (C_ij + C_ji) ) L
    ∂ζ_ijl/∂L = ∂d_E(x_i, x_j)/∂L − ∂d_E(x_i, x_l)/∂L if the hinge is active (ζ_ijl > 0), and 0 otherwise.
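The closed-form gradient of d_E can be checked against finite differences. The sketch below assumes the parameterization S = L^⊤ L; under this convention the matrix factor multiplies on the left (the slide's right multiplication corresponds to the transposed convention), and all names are ours:

```python
import numpy as np

def dE(L, xi, xj, kappa=1.0):
    S = L.T @ L
    ti, tj = np.append(xi, 1.0), np.append(xj, 1.0)
    Sij, Sii, Sjj = ti @ S @ tj, ti @ S @ ti, tj @ S @ tj
    return kappa * np.arccos(Sij / np.sqrt(Sii * Sjj))

def dE_grad(L, xi, xj, kappa=1.0):
    """Closed-form gradient of d_E w.r.t. L, assuming S = L^T L."""
    S = L.T @ L
    ti, tj = np.append(xi, 1.0), np.append(xj, 1.0)
    Sij, Sii, Sjj = ti @ S @ tj, ti @ S @ ti, tj @ S @ tj
    Cii, Cjj = np.outer(ti, ti), np.outer(tj, tj)
    Cij, Cji = np.outer(ti, tj), np.outer(tj, ti)
    M = (Sij / Sii) * Cii + (Sij / Sjj) * Cjj - (Cij + Cji)
    return kappa / np.sqrt(Sii * Sjj - Sij**2) * (L @ M)

# Finite-difference check of the closed form
rng = np.random.default_rng(3)
L = np.eye(3) + 0.1 * rng.normal(size=(3, 3))
xi, xj = rng.normal(size=2), rng.normal(size=2)
G = dE_grad(L, xi, xj)
h, num = 1e-6, np.zeros_like(L)
for r in range(3):
    for c in range(3):
        Lp, Lm = L.copy(), L.copy()
        Lp[r, c] += h
        Lm[r, c] -= h
        num[r, c] = (dE(Lp, xi, xj) - dE(Lm, xi, xj)) / (2 * h)
assert np.allclose(G, num, atol=1e-5)
```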

  22. Hyperbolic Cayley-Klein LMNN
To ensure that S keeps the correct signature (1, n, 0) during the LMNN gradient descent, we decompose S = L^⊤ D L (with L ≻ 0) and perform the gradient descent on L with the following gradient:
    ∂d_H(x_i, x_j)/∂L = κ / √(S_ij² − S_ii S_jj) · ( (S_ij/S_ii) C_ii + (S_ij/S_jj) C_jj − (C_ij + C_ji) ) D L
Recall two difficulties of the hyperbolic case compared to the elliptical case:
◮ The hyperbolic Cayley-Klein distance may be very large (unbounded, vs. < κπ in the elliptical case)
◮ The data-set must be contained inside the compact domain D_S

  23. Hyperbolic CK-LMNN: Initialization and learning rate
◮ Initialize L = [ L′  0 ; 0  1 ] and D so that P ⊂ D_S, with Σ^{−1} = L′^⊤ L′ (e.g., the precision matrix of P), and
    D = diag(−1, ..., −1, κ · max_x ‖L′x‖²), with κ > 1.
◮ At iteration t, it may happen that P ⊄ D_{S_t}, since we do not know the optimal learning rate γ. When this happens, we halve it: γ ← γ/2; otherwise we let γ ← 1.01 γ.
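The learning-rate heuristic can be sketched as a guarded gradient step; `in_domain` is a hypothetical callback that would check P ⊂ D_S for the candidate metric (the toy scalar illustration below is ours):

```python
def adaptive_step(L, grad, gamma, in_domain):
    """One gradient step with the slide's heuristic: halve gamma until the
    data-set stays in the domain, then grow gamma by 1% for the next step."""
    while True:
        candidate = L - gamma * grad
        if in_domain(candidate):
            return candidate, 1.01 * gamma
        gamma *= 0.5

# Toy illustration with scalars: the "domain" constraint is |L| < 1
L, gamma = 0.9, 1.0
L_next, gamma = adaptive_step(L, -1.0, gamma, lambda L: abs(L) < 1.0)
assert abs(L_next) < 1.0 and gamma < 1.0   # gamma was halved until feasible
```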
