Covariance Matrices and Covariance Operators in Machine Learning and Pattern Recognition
A geometrical framework
Hà Quang Minh
Pattern Analysis and Computer Vision (PAVIS), Istituto Italiano di Tecnologia, Italy
November 13, 2017


  1. Affine-invariant Riemannian metric
Riemannian metric: on the tangent space $T_P(\mathrm{Sym}^{++}(n)) \cong \mathrm{Sym}(n)$, the inner product $\langle\cdot,\cdot\rangle_P$ is
$\langle V, W\rangle_P = \langle P^{-1/2} V P^{-1/2},\, P^{-1/2} W P^{-1/2}\rangle_F = \mathrm{tr}(P^{-1} V P^{-1} W)$,
$P \in \mathrm{Sym}^{++}(n)$, $V, W \in \mathrm{Sym}(n)$
H.Q. Minh (IIT) Covariance matrices & covariance operators November 13, 2017 25 / 103

  2. Affine-invariant Riemannian metric
Geodesically complete Riemannian manifold of nonpositive curvature.
Unique geodesic joining $A, B \in \mathrm{Sym}^{++}(n)$:
$\gamma_{AB}(t) = A^{1/2}(A^{-1/2} B A^{-1/2})^t A^{1/2}$, with $\gamma_{AB}(0) = A$, $\gamma_{AB}(1) = B$
Riemannian (geodesic) distance:
$d_{\mathrm{aiE}}(A, B) = \|\log(A^{-1/2} B A^{-1/2})\|_F$
where $\log(A)$ is the principal logarithm of $A$: if $A = UDU^T = U\,\mathrm{diag}(\lambda_1, \dots, \lambda_n)\,U^T$, then $\log(A) = U\log(D)U^T = U\,\mathrm{diag}(\log\lambda_1, \dots, \log\lambda_n)\,U^T$
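As a quick numerical illustration of the geodesic formula above, here is a minimal numpy sketch; the helper names `spd_power`, `airm_geodesic`, and `rand_spd` are ours, not from the slides:

```python
import numpy as np

def spd_power(M, t):
    """M^t for a symmetric positive definite matrix M, via eigendecomposition."""
    w, U = np.linalg.eigh(M)
    return (U * w**t) @ U.T

def airm_geodesic(A, B, t):
    """gamma_AB(t) = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}."""
    A_half = spd_power(A, 0.5)
    A_inv_half = spd_power(A, -0.5)
    return A_half @ spd_power(A_inv_half @ B @ A_inv_half, t) @ A_half

def rand_spd(rng, n):
    """A random well-conditioned SPD matrix (for testing only)."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

rng = np.random.default_rng(0)
A, B = rand_spd(rng, 4), rand_spd(rng, 4)
# the geodesic joins A (at t = 0) to B (at t = 1)
assert np.allclose(airm_geodesic(A, B, 0.0), A)
assert np.allclose(airm_geodesic(A, B, 1.0), B)
```

At $t = 1/2$ this yields the Riemannian geometric mean of $A$ and $B$.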

  3. Affine-invariant Riemannian distance - Properties
Affine invariance: $d_{\mathrm{aiE}}(CAC^T, CBC^T) = d_{\mathrm{aiE}}(A, B)$ for any invertible $C$
Scale invariance: taking $C = \sqrt{s}\,I$, $s > 0$: $d_{\mathrm{aiE}}(sA, sB) = d_{\mathrm{aiE}}(A, B)$
Unitary (orthogonal) invariance: $CC^T = I \iff C^{-1} = C^T$, and $d_{\mathrm{aiE}}(CAC^{-1}, CBC^{-1}) = d_{\mathrm{aiE}}(A, B)$

  4. Affine-invariant Riemannian distance - Properties
Invariance under inversion: $d_{\mathrm{aiE}}(A^{-1}, B^{-1}) = d_{\mathrm{aiE}}(A, B)$
$(\mathrm{Sym}^{++}(n), d_{\mathrm{aiE}})$ is a complete metric space

  5. Connection with Fisher-Rao metric
Close connection with the Fisher-Rao metric in information geometry (e.g. Amari 1985, 2016).
For two multivariate Gaussian probability densities with the same mean, $\rho_1 \sim \mathcal{N}(\mu, C_1)$ and $\rho_2 \sim \mathcal{N}(\mu, C_2)$:
$d_{\mathrm{aiE}}(C_1, C_2) = 2 \times (\text{Fisher-Rao distance between } \rho_1 \text{ and } \rho_2)$

  6. Affine-invariant Riemannian distance - Complexity
For two matrices $A, B \in \mathrm{Sym}^{++}(n)$:
$d^2_{\mathrm{aiE}}(A, B) = \|\log(A^{-1/2} B A^{-1/2})\|^2_F = \sum_{k=1}^n (\log\lambda_k)^2$
where $\{\lambda_k\}_{k=1}^n$ are the eigenvalues of $A^{-1/2} B A^{-1/2}$, or equivalently of $A^{-1}B$.
Matrix inversion, SVD, and eigenvalue computation all have computational complexity $O(n^3)$.
Therefore $d_{\mathrm{aiE}}(A, B)$ has computational complexity $O(n^3)$.
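In practice the distance can be computed from the eigenvalues of $A^{-1}B$, avoiding the explicit matrix square root. A minimal numpy sketch (the function name `d_aiE` is ours), with the invariance properties from the previous slides used as checks:

```python
import numpy as np

def d_aiE(A, B):
    """Affine-invariant distance: sqrt of the sum of (log lambda_k)^2 over
    the eigenvalues lambda_k of A^{-1} B (same spectrum as A^{-1/2} B A^{-1/2})."""
    lam = np.linalg.eigvals(np.linalg.solve(A, B)).real  # positive for SPD A, B
    return np.sqrt(np.sum(np.log(lam) ** 2))

rng = np.random.default_rng(1)
M1, M2 = rng.standard_normal((5, 5)), rng.standard_normal((5, 5))
A = M1 @ M1.T + 5 * np.eye(5)
B = M2 @ M2.T + 5 * np.eye(5)
C = rng.standard_normal((5, 5))  # a random (almost surely invertible) matrix

assert np.isclose(d_aiE(A, A), 0.0)
# affine invariance: d(C A C^T, C B C^T) = d(A, B)
assert np.isclose(d_aiE(C @ A @ C.T, C @ B @ C.T), d_aiE(A, B))
# invariance under inversion
assert np.isclose(d_aiE(np.linalg.inv(A), np.linalg.inv(B)), d_aiE(A, B))
```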

  7. Affine-invariant Riemannian distance - Complexity
For a set $\{A_i\}_{i=1}^N$ of $N$ SPD matrices, consider computing all the pairwise distances
$d_{\mathrm{aiE}}(A_i, A_j) = \|\log(A_i^{-1/2} A_j A_i^{-1/2})\|_F$, $1 \le i, j \le N$
The matrices $A_i, A_j$ are all coupled together, so the computational complexity required is $O(N^2 n^3)$.
This is very large when $N$ is large.

  8. Log-Euclidean metric
Arsigny, Fillard, Pennec, Ayache (SIAM Journal on Matrix Analysis and Applications, 2007)
Another Riemannian metric on $\mathrm{Sym}^{++}(n)$
Much faster to compute than the affine-invariant Riemannian distance on large sets of matrices
Can be used to define many positive definite kernels on $\mathrm{Sym}^{++}(n)$

  9. Log-Euclidean metric
Riemannian metric: on the tangent space $T_P(\mathrm{Sym}^{++}(n))$,
$\langle V, W\rangle_P = \langle D\log(P)(V),\, D\log(P)(W)\rangle_F$,
$P \in \mathrm{Sym}^{++}(n)$, $V, W \in \mathrm{Sym}(n)$
where $D\log$ is the Fréchet derivative of the function $\log : \mathrm{Sym}^{++}(n) \to \mathrm{Sym}(n)$; $D\log(P) : \mathrm{Sym}(n) \to \mathrm{Sym}(n)$ is a linear map.
Explicit knowledge of $\langle\cdot,\cdot\rangle_P$ is not necessary for computing geodesics and Riemannian distances.

  10. Log-Euclidean metric
Unique geodesic joining $A, B \in \mathrm{Sym}^{++}(n)$:
$\gamma_{AB}(t) = \exp[(1-t)\log(A) + t\log(B)]$
Riemannian (geodesic) distance:
$d_{\mathrm{logE}}(A, B) = \|\log(A) - \log(B)\|_F$

  11. Log-Euclidean distance - Complexity
For two matrices $A, B \in \mathrm{Sym}^{++}(n)$: $d_{\mathrm{logE}}(A, B) = \|\log(A) - \log(B)\|_F$
The computation of the log function, requiring an SVD, has computational complexity $O(n^3)$.
Therefore $d_{\mathrm{logE}}(A, B)$ has computational complexity $O(n^3)$.

  12. Log-Euclidean distance - Complexity
For a set $\{A_i\}_{i=1}^N$ of $N$ SPD matrices, consider computing all the pairwise distances
$d_{\mathrm{logE}}(A_i, A_j) = \|\log(A_i) - \log(A_j)\|_F$, $1 \le i, j \le N$
The matrices $A_i, A_j$ are all uncoupled: each logarithm is computed only once, so the computational complexity required for the logarithms is $O(Nn^3)$.
This is much faster than the affine-invariant Riemannian distance $d_{\mathrm{aiE}}$ when $N$ is large.
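The decoupling can be exploited directly: compute each matrix logarithm once, then pairwise distances are cheap Frobenius norms of differences. A minimal sketch (the helper names `spd_logm` and `pairwise_logE` are ours):

```python
import numpy as np

def spd_logm(A):
    """Principal logarithm of an SPD matrix via eigendecomposition."""
    w, U = np.linalg.eigh(A)
    return (U * np.log(w)) @ U.T

def pairwise_logE(mats):
    """All pairwise Log-Euclidean distances: N matrix logs at O(n^3) each,
    then an O(n^2) Frobenius-norm evaluation per pair."""
    logs = [spd_logm(A) for A in mats]
    N = len(mats)
    D = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            D[i, j] = D[j, i] = np.linalg.norm(logs[i] - logs[j])
    return D

rng = np.random.default_rng(2)
mats = []
for _ in range(4):
    M = rng.standard_normal((3, 3))
    mats.append(M @ M.T + 3 * np.eye(3))

D = pairwise_logE(mats)
assert np.allclose(D, D.T) and np.allclose(np.diag(D), 0.0)
```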

  13. Log-Euclidean vector space
Arsigny et al (2007): the Log-Euclidean metric is a bi-invariant Riemannian metric associated with the Lie group operation $\odot : \mathrm{Sym}^{++}(n) \times \mathrm{Sym}^{++}(n) \to \mathrm{Sym}^{++}(n)$,
$A \odot B = \exp(\log(A) + \log(B)) = B \odot A$
Bi-invariance: for any $C \in \mathrm{Sym}^{++}(n)$,
$d_{\mathrm{logE}}[(A \odot C), (B \odot C)] = d_{\mathrm{logE}}[(C \odot A), (C \odot B)] = d_{\mathrm{logE}}(A, B)$

  14. Log-Euclidean vector space
Arsigny et al (2007): scalar multiplication operation $\circledast : \mathbb{R} \times \mathrm{Sym}^{++}(n) \to \mathrm{Sym}^{++}(n)$,
$\lambda \circledast A = \exp(\lambda\log(A)) = A^\lambda$
$(\mathrm{Sym}^{++}(n), \odot, \circledast)$ is a vector space, with $\odot$ acting as vector addition and $\circledast$ acting as scalar multiplication.
$\mathrm{Sym}^{++}(n)$ under the Log-Euclidean metric is a Riemannian manifold with zero curvature.

  15. Log-Euclidean vector space
Vector space isomorphism: $\log : (\mathrm{Sym}^{++}(n), \odot, \circledast) \to (\mathrm{Sym}(n), +, \cdot)$, $A \mapsto \log(A)$
The vector space $(\mathrm{Sym}^{++}(n), \odot, \circledast)$ is not a subspace of the Euclidean vector space $(\mathrm{Sym}(n), +, \cdot)$.

  16. Log-Euclidean inner product space
Log-Euclidean inner product (Li, Wang, Zuo, Zhang, ICCV 2013):
$\langle A, B\rangle_{\mathrm{logE}} = \langle\log(A), \log(B)\rangle_F$, $\quad \|A\|_{\mathrm{logE}} = \|\log(A)\|_F$
Log-Euclidean inner product space: $(\mathrm{Sym}^{++}(n), \odot, \circledast, \langle\cdot,\cdot\rangle_{\mathrm{logE}})$
Log-Euclidean distance:
$d_{\mathrm{logE}}(A, B) = \|\log(A) - \log(B)\|_F = \|A \odot B^{-1}\|_{\mathrm{logE}}$

  17. Log-Euclidean vs. Euclidean
Unitary (orthogonal) invariance ($CC^T = I \iff C^T = C^{-1}$):
Euclidean distance: $d_E(CAC^{-1}, CBC^{-1}) = \|CAC^{-1} - CBC^{-1}\|_F = \|A - B\|_F = d_E(A, B)$
Log-Euclidean distance: $d_{\mathrm{logE}}(CAC^{-1}, CBC^{-1}) = \|\log(CAC^{-1}) - \log(CBC^{-1})\|_F = \|\log(A) - \log(B)\|_F = d_{\mathrm{logE}}(A, B)$

  18. Log-Euclidean vs. Euclidean
Log-Euclidean distance is scale-invariant:
$d_{\mathrm{logE}}(sA, sB) = \|\log(sA) - \log(sB)\|_F = \|\log(A) - \log(B)\|_F = d_{\mathrm{logE}}(A, B)$
Euclidean distance is not scale-invariant:
$d_E(sA, sB) = s\|A - B\|_F = s\,d_E(A, B)$

  19. Log-Euclidean vs. Euclidean
Log-Euclidean distance is inversion-invariant:
$d_{\mathrm{logE}}(A^{-1}, B^{-1}) = \|\log(A^{-1}) - \log(B^{-1})\|_F = \|{-\log(A)} + \log(B)\|_F = d_{\mathrm{logE}}(A, B)$
Euclidean distance is not inversion-invariant:
$d_E(A^{-1}, B^{-1}) = \|A^{-1} - B^{-1}\|_F \neq \|A - B\|_F = d_E(A, B)$ in general
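These invariances are easy to confirm numerically. A short sketch (`spd_logm` and `d_logE` are our helper names, not code from the talk):

```python
import numpy as np

def spd_logm(A):
    """Principal logarithm of an SPD matrix via eigendecomposition."""
    w, U = np.linalg.eigh(A)
    return (U * np.log(w)) @ U.T

def d_logE(A, B):
    return np.linalg.norm(spd_logm(A) - spd_logm(B))

rng = np.random.default_rng(3)
M1, M2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
A = M1 @ M1.T + 4 * np.eye(4)
B = M2 @ M2.T + 4 * np.eye(4)
s = 2.5

# scale invariance of d_logE; the Euclidean distance scales by s instead
assert np.isclose(d_logE(s * A, s * B), d_logE(A, B))
assert np.isclose(np.linalg.norm(s * A - s * B), s * np.linalg.norm(A - B))
# inversion invariance of d_logE
inv = np.linalg.inv
assert np.isclose(d_logE(inv(A), inv(B)), d_logE(A, B))
```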

  20. Log-Euclidean vs. Euclidean
As metric spaces:
$(\mathrm{Sym}^{++}(n), d_E)$ is incomplete
$(\mathrm{Sym}^{++}(n), d_{\mathrm{logE}})$ is complete

  21. Log-Euclidean vs. Euclidean
Summary of the comparison:
The two metrics are fundamentally different.
The Euclidean metric is extrinsic to $\mathrm{Sym}^{++}(n)$; the Log-Euclidean metric is intrinsic to $\mathrm{Sym}^{++}(n)$.
The vector space structures are fundamentally different.
They have different invariance properties.

  22. Geometry of SPD matrices
Euclidean metric
Set of SPD matrices viewed as a Riemannian manifold: affine-invariant Riemannian metric, Log-Euclidean metric
Set of SPD matrices viewed as a convex cone: Log-Determinant divergences (symmetric Stein divergence)

  23. Alpha Log-Determinant divergences
Chebbi and Moakher (Linear Algebra and Its Applications, 2012)
$\Omega = \mathrm{Sym}^{++}(n)$, $\phi(X) = -\log\det(X)$
$d^\alpha_{\mathrm{logdet}}(A, B) = \frac{4}{1-\alpha^2}\log\frac{\det\left(\frac{1-\alpha}{2}A + \frac{1+\alpha}{2}B\right)}{\det(A)^{\frac{1-\alpha}{2}}\det(B)^{\frac{1+\alpha}{2}}}$, $\quad -1 < \alpha < 1$
Limiting cases:
$d^1_{\mathrm{logdet}}(A, B) = \lim_{\alpha\to 1} d^\alpha_{\mathrm{logdet}}(A, B) = \mathrm{tr}(B^{-1}A - I) - \log\det(B^{-1}A)$ (Burg divergence)
$d^{-1}_{\mathrm{logdet}}(A, B) = \lim_{\alpha\to -1} d^\alpha_{\mathrm{logdet}}(A, B) = \mathrm{tr}(A^{-1}B - I) - \log\det(A^{-1}B)$

  24. Alpha Log-Determinant divergences
$\alpha = 0$: symmetric Stein divergence (also called $S$-divergence)
$d^0_{\mathrm{logdet}}(A, B) = 4\log\frac{\det\left(\frac{A+B}{2}\right)}{\det(AB)^{1/2}} = 4\,d^2_{\mathrm{stein}}(A, B)$
Sra (NIPS 2012):
$d_{\mathrm{stein}}(A, B) = \sqrt{\log\det\left(\frac{A+B}{2}\right) - \frac{1}{2}\log\det(AB)}$
is a metric (satisfying positivity, symmetry, and the triangle inequality).
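Since $d_{\mathrm{stein}}$ needs only determinants, it avoids matrix logarithms entirely. A minimal sketch using log-determinants for numerical stability (`d_stein` is our function name):

```python
import numpy as np

def d_stein(A, B):
    """Square root of the symmetric Stein (S-)divergence:
    log det((A+B)/2) - (1/2) log det(A B), via stable slogdet calls."""
    ld_mid = np.linalg.slogdet((A + B) / 2)[1]
    ld_A = np.linalg.slogdet(A)[1]
    ld_B = np.linalg.slogdet(B)[1]
    return np.sqrt(ld_mid - 0.5 * (ld_A + ld_B))

rng = np.random.default_rng(4)
M1, M2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
A = M1 @ M1.T + 4 * np.eye(4)
B = M2 @ M2.T + 4 * np.eye(4)

assert np.isclose(d_stein(A, A), 0.0)            # identity of indiscernibles
assert np.isclose(d_stein(A, B), d_stein(B, A))  # symmetry
```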

  25. Outline
Covariance matrices
Covariance matrix representation in computer vision
Geometry of SPD matrices
Kernel methods on covariance matrices

  26. Positive Definite Kernels
$X$ any nonempty set. $K : X \times X \to \mathbb{R}$ is a (real-valued) positive definite kernel if it is symmetric and
$\sum_{i,j=1}^N a_i a_j K(x_i, x_j) \ge 0$
for any finite set of points $\{x_i\}_{i=1}^N \subset X$ and real numbers $\{a_i\}_{i=1}^N \subset \mathbb{R}$.
Equivalently, the Gram matrix $[K(x_i, x_j)]_{i,j=1}^N$ is symmetric positive semi-definite.

  27. Reproducing Kernel Hilbert Spaces
$K$ a positive definite kernel on $X \times X$. For each $x \in X$, there is a function $K_x : X \to \mathbb{R}$, with $K_x(t) = K(x, t)$.
$\mathcal{H}_K$ = completion of $\left\{\sum_{i=1}^N a_i K_{x_i} : N \in \mathbb{N}\right\}$
with inner product
$\left\langle\sum_i a_i K_{x_i}, \sum_j b_j K_{y_j}\right\rangle_{\mathcal{H}_K} = \sum_{i,j} a_i b_j K(x_i, y_j)$
$\mathcal{H}_K$ = the RKHS associated with $K$ (unique).

  28. Reproducing Kernel Hilbert Spaces
Reproducing property: for each $f \in \mathcal{H}_K$ and every $x \in X$,
$f(x) = \langle f, K_x\rangle_{\mathcal{H}_K}$
Abstract theory due to Aronszajn (1950).
Numerous applications in machine learning (kernel methods).

  29. Examples: RKHS
Polynomial kernels: $K(x, y) = (\langle x, y\rangle + c)^d$, $c \ge 0$, $d \in \mathbb{N}$, $x, y \in \mathbb{R}^n$
The Gaussian kernel $K(x, y) = \exp\left(-\frac{|x-y|^2}{\sigma^2}\right)$ on $\mathbb{R}^n$ induces the space
$\mathcal{H}_K = \left\{f : \|f\|^2_{\mathcal{H}_K} = \frac{1}{(2\pi)^n(\sigma\sqrt{\pi})^n}\int_{\mathbb{R}^n} e^{\frac{\sigma^2|\xi|^2}{4}}|\hat{f}(\xi)|^2\,d\xi < \infty\right\}$

  30. Kernels with Log-Euclidean metric
Positive definite kernels on $\mathrm{Sym}^{++}(n)$ defined with the Log-Euclidean inner product $\langle\cdot,\cdot\rangle_{\mathrm{logE}}$ and norm $\|\cdot\|_{\mathrm{logE}}$
Polynomial kernels:
$K(A, B) = (\langle A, B\rangle_{\mathrm{logE}} + c)^d = (\langle\log(A), \log(B)\rangle_F + c)^d$, $d \in \mathbb{N}$, $c \ge 0$
Gaussian and Gaussian-like kernels:
$K(A, B) = \exp\left(-\frac{1}{\sigma^2}\|A \odot B^{-1}\|^p_{\mathrm{logE}}\right) = \exp\left(-\frac{1}{\sigma^2}\|\log(A) - \log(B)\|^p_F\right)$, $0 < p \le 2$
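Because $\langle\cdot,\cdot\rangle_{\mathrm{logE}}$ is a genuine inner product, these kernels are positive definite. A sketch that builds the Gaussian Log-Euclidean Gram matrix ($p = 2$) and checks positive semi-definiteness numerically (helper names are ours):

```python
import numpy as np

def spd_logm(A):
    """Principal logarithm of an SPD matrix via eigendecomposition."""
    w, U = np.linalg.eigh(A)
    return (U * np.log(w)) @ U.T

def logE_gaussian_gram(mats, sigma):
    """Gram matrix K_ij = exp(-||log(A_i) - log(A_j)||_F^2 / sigma^2)."""
    logs = np.array([spd_logm(A).ravel() for A in mats])
    sq = ((logs[:, None, :] - logs[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / sigma**2)

rng = np.random.default_rng(5)
mats = []
for _ in range(6):
    M = rng.standard_normal((3, 3))
    mats.append(M @ M.T + 3 * np.eye(3))

K = logE_gaussian_gram(mats, sigma=2.0)
assert np.allclose(K, K.T) and np.allclose(np.diag(K), 1.0)
assert np.linalg.eigvalsh(K).min() > -1e-8   # positive semi-definite
```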

  31. Kernel methods with Log-Euclidean metric
S. Jayasumana, R. Hartley, M. Salzmann, H. Li, and M. Harandi. Kernel methods on the Riemannian manifold of symmetric positive definite matrices. CVPR 2013.
S. Jayasumana, R. Hartley, M. Salzmann, H. Li, and M. Harandi. Kernel methods on Riemannian manifolds with Gaussian RBF kernels. PAMI 2015.
P. Li, Q. Wang, W. Zuo, and L. Zhang. Log-Euclidean kernels for sparse representation and dictionary learning. ICCV 2013.
D. Tosato, M. Spera, M. Cristani, and V. Murino. Characterizing humans on Riemannian manifolds. PAMI 2013.

  32. Kernel methods with Log-Euclidean metric for image classification

  33. Material classification
Example: KTH-TIPS2b data set
$f(x, y) = \left[R(x, y), G(x, y), B(x, y), |G_{0,0}(x, y)|, \dots, |G_{3,4}(x, y)|\right]$

  34. Object recognition
Example: ETH-80 data set
$f(x, y) = [x, y, I(x, y), |I_x|, |I_y|]$

  35. Numerical results
Better results with covariance operators (Part II)!

Method   KTH-TIPS2b       ETH-80
E        55.3% (±7.6%)    64.4% (±0.9%)
Stein    73.1% (±8.0%)    67.5% (±0.4%)
Log-E    74.1% (±7.4%)    71.1% (±1.0%)

  36. Comparison of metrics
Results from Cherian et al (PAMI 2013) using Nearest Neighbor:

Method            Texture   Activity
Affine-invariant  85.5%     99.5%
Stein             85.5%     99.5%
Log-E             82.0%     96.5%

Texture: images from the Brodatz and CURET datasets
Activity: videos from the Weizmann, KTH, and UT Tower datasets

  37. Outline
Covariance operators
Covariance operator representation in computer vision
Geometry of covariance operators
Kernel methods on covariance operators

  38. Covariance operator representation - Motivation
Covariance matrices encode linear correlations of input features.
Nonlinearization:
1. Map the original input features into a high (generally infinite) dimensional feature space (via kernels)
2. Covariance operators: covariance matrices of the infinite-dimensional features
3. Encode nonlinear correlations of the input features
4. Provide a richer, more expressive representation of the data

  39. Covariance operator representation
S.K. Zhou and R. Chellappa. From sample similarity to ensemble similarity: Probabilistic distance measures in reproducing kernel Hilbert space. PAMI 2006.
M. Harandi, M. Salzmann, and F. Porikli. Bregman divergences for infinite-dimensional covariance matrices. CVPR 2014.
H.Q. Minh, M. San Biagio, V. Murino. Log-Hilbert-Schmidt metric between positive definite operators on Hilbert spaces. NIPS 2014.
H.Q. Minh, M. San Biagio, L. Bazzani, V. Murino. Approximate Log-Hilbert-Schmidt distances between covariance operators for image classification. CVPR 2016.

  40. Positive definite kernels, feature map, and feature space
$K$ = positive definite kernel on $X \times X$, $\mathcal{H}_K$ = corresponding RKHS
Geometric viewpoint from machine learning: a positive definite kernel $K$ on $X \times X$ induces the feature map $\Phi : X \to \mathcal{H}_K$,
$\Phi(x) = K_x \in \mathcal{H}_K$, with $\mathcal{H}_K$ = feature space
$\langle\Phi(x), \Phi(y)\rangle_{\mathcal{H}_K} = \langle K_x, K_y\rangle_{\mathcal{H}_K} = K(x, y)$
Kernelization: transform linear algorithms depending on $\langle x, y\rangle_{\mathbb{R}^n}$ into nonlinear algorithms depending on $K(x, y)$.

  41. RKHS covariance operators
$\rho$ = Borel probability distribution on $X$, with
$\int_X \|\Phi(x)\|^2_{\mathcal{H}_K}\,d\rho(x) = \int_X K(x, x)\,d\rho(x) < \infty$
RKHS mean vector:
$\mu_\Phi = \mathbb{E}_\rho[\Phi(x)] = \int_X \Phi(x)\,d\rho(x) \in \mathcal{H}_K$

  42. RKHS covariance operators
RKHS covariance operator $C_\Phi : \mathcal{H}_K \to \mathcal{H}_K$:
$C_\Phi = \mathbb{E}_\rho[(\Phi(x) - \mu) \otimes (\Phi(x) - \mu)] = \int_X \Phi(x) \otimes \Phi(x)\,d\rho(x) - \mu \otimes \mu$

  43. Empirical mean and covariance
$X = [x_1, \dots, x_m]$ = data matrix of $m$ observations randomly sampled from $X$ according to $\rho$
Informally, $\Phi$ gives an infinite feature matrix $\Phi(X) = [\Phi(x_1), \dots, \Phi(x_m)]$ in the feature space $\mathcal{H}_K$, of size $\dim(\mathcal{H}_K) \times m$.
Formally, $\Phi(X) : \mathbb{R}^m \to \mathcal{H}_K$ is the bounded linear operator
$\Phi(X)w = \sum_{i=1}^m w_i\Phi(x_i)$, $w \in \mathbb{R}^m$

  44. Empirical mean and covariance
Theoretical RKHS mean: $\mu_\Phi = \int_X \Phi(x)\,d\rho(x) \in \mathcal{H}_K$
Empirical RKHS mean:
$\mu_{\Phi(X)} = \frac{1}{m}\sum_{i=1}^m \Phi(x_i) = \frac{1}{m}\Phi(X)\mathbf{1}_m \in \mathcal{H}_K$

  45. Empirical mean and covariance
Theoretical covariance operator: $C_\Phi : \mathcal{H}_K \to \mathcal{H}_K$, $C_\Phi = \int_X \Phi(x) \otimes \Phi(x)\,d\rho(x) - \mu \otimes \mu$
Empirical covariance operator $C_{\Phi(X)} : \mathcal{H}_K \to \mathcal{H}_K$:
$C_{\Phi(X)} = \frac{1}{m}\sum_{i=1}^m \Phi(x_i) \otimes \Phi(x_i) - \mu_{\Phi(X)} \otimes \mu_{\Phi(X)} = \frac{1}{m}\Phi(X)J_m\Phi(X)^*$
$J_m = I_m - \frac{1}{m}\mathbf{1}_m\mathbf{1}_m^T$ = centering matrix
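In the finite-dimensional case (linear kernel, $\Phi(x) = x$) this formula reduces to the ordinary empirical covariance, which gives a quick sanity check of the centering-matrix identity:

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 3, 50                       # n features, m observations
X = rng.standard_normal((n, m))    # data matrix, one observation per column

J = np.eye(m) - np.ones((m, m)) / m   # centering matrix J_m
C = X @ J @ X.T / m                   # (1/m) X J_m X^T

# agrees with the usual biased empirical covariance (divide by m, not m - 1)
assert np.allclose(C, np.cov(X, bias=True))
# J_m is idempotent and annihilates the all-ones vector
assert np.allclose(J @ J, J) and np.allclose(J @ np.ones(m), 0.0)
```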

  46. Covariance operator representation of images
Given an image $F$ (or a patch in $F$), at each pixel extract a feature vector (e.g. intensity, colors, filter responses, etc.).
Each image corresponds to a data matrix $X = [x_1, \dots, x_m]$, an $n \times m$ matrix, where $m$ = number of pixels and $n$ = number of features at each pixel.
Define a kernel $K$, with corresponding feature map $\Phi$ and feature matrix $\Phi(X) = [\Phi(x_1), \dots, \Phi(x_m)]$.

  47. Covariance operator representation of images
Each image is represented by the covariance operator $C_{\Phi(X)} = \frac{1}{m}\Phi(X)J_m\Phi(X)^*$
This representation is implicit, since $\Phi$ is generally implicit.
Computations are carried out via Gram matrices.

  48. Infinite-dimensional generalization of $\mathrm{Sym}^{++}(n)$

  49. Outline
Covariance operators
Covariance operator representation in computer vision
Geometry of covariance operators
Kernel methods on covariance operators

  50. Affine-invariant Riemannian metric
Affine-invariant Riemannian metric: Larotonda (2005), Larotonda (2007), Andruchow and Varela (2007), Lawson and Lim (2013)
Larotonda. Nonpositive curvature: A geometrical approach to Hilbert-Schmidt operators. Differential Geometry and Its Applications, 2007.
In the setting of RKHS covariance operators:
H.Q.M. Affine-invariant Riemannian distance between infinite-dimensional covariance operators. Geometric Science of Information, 2015.

  51. Log-Determinant divergences
Zhou and Chellappa (PAMI 2006), Harandi et al (CVPR 2014): finite-dimensional RKHS
H.Q.M. Infinite-dimensional Log-Determinant divergences between positive definite trace class operators. Linear Algebra and its Applications, 2017.
H.Q.M. Log-Determinant divergences between positive definite Hilbert-Schmidt operators. Geometric Science of Information, 2017.

  52. Log Hilbert-Schmidt metric
H.Q. Minh, M. San Biagio, V. Murino. Log-Hilbert-Schmidt metric between positive definite operators on Hilbert spaces. NIPS 2014.
H.Q. Minh, M. San Biagio, L. Bazzani, V. Murino. Approximate Log-Hilbert-Schmidt distances between covariance operators for image classification. CVPR 2016.

  53. Distances between positive definite operators
Larotonda (2007): generalization of the manifold $\mathrm{Sym}^{++}(n)$ of SPD matrices to the infinite-dimensional Hilbert manifold
$\Sigma(\mathcal{H}) = \{A + \gamma I > 0 : A^* = A,\ A \in \mathrm{HS}(\mathcal{H}),\ \gamma \in \mathbb{R}\}$
Hilbert-Schmidt operators on the Hilbert space $\mathcal{H}$:
$\mathrm{HS}(\mathcal{H}) = \{A : \|A\|^2_{\mathrm{HS}} = \mathrm{tr}(A^*A) = \sum_{k=1}^\infty \|Ae_k\|^2 < \infty\}$
for any orthonormal basis $\{e_k\}_{k=1}^\infty$
Hilbert-Schmidt inner product (generalizing the Frobenius inner product $\langle A, B\rangle_F = \mathrm{tr}(A^TB)$):
$\langle A, B\rangle_{\mathrm{HS}} = \mathrm{tr}(A^*B) = \sum_{k=1}^\infty \langle e_k, A^*Be_k\rangle = \sum_{k=1}^\infty \langle Ae_k, Be_k\rangle$

  54. Distances between positive definite operators
On the infinite-dimensional manifold $\Sigma(\mathcal{H})$:
Larotonda (2007): infinite-dimensional affine-invariant Riemannian distance
H.Q. Minh et al (2014): Log-Hilbert-Schmidt distance, the infinite-dimensional generalization of the Log-Euclidean distance
H.Q. Minh (2017): infinite-dimensional Log-Determinant divergences

  55. Log-Hilbert-Schmidt distance
Generalizing the Log-Euclidean distance $d_{\mathrm{logE}}(A, B) = \|\log(A) - \log(B)\|_F$:
Log-Hilbert-Schmidt distance
$d_{\mathrm{logHS}}[(A + \gamma I), (B + \nu I)] = \|\log(A + \gamma I) - \log(B + \nu I)\|_{\mathrm{eHS}}$
Extended Hilbert-Schmidt norm: $\|A + \gamma I\|^2_{\mathrm{eHS}} = \|A\|^2_{\mathrm{HS}} + \gamma^2$
Extended Hilbert-Schmidt inner product: $\langle A + \gamma I, B + \nu I\rangle_{\mathrm{eHS}} = \langle A, B\rangle_{\mathrm{HS}} + \gamma\nu$

  56. Log-Hilbert-Schmidt distance
Why $\log(A + \gamma I)$? Why the extended Hilbert-Schmidt norm?
$A \in \mathrm{Sym}^{++}(n)$, with eigenvalues $\{\lambda_k\}_{k=1}^n$ and orthonormal eigenvectors $\{u_k\}_{k=1}^n$:
$A = \sum_{k=1}^n \lambda_k u_k u_k^T$, $\quad \log(A) = \sum_{k=1}^n \log(\lambda_k)u_k u_k^T$
$A : \mathcal{H} \to \mathcal{H}$ self-adjoint, positive, compact operator, with eigenvalues $\{\lambda_k\}_{k=1}^\infty$, $\lambda_k > 0$, $\lim_{k\to\infty}\lambda_k = 0$, and orthonormal eigenvectors $\{u_k\}_{k=1}^\infty$:
$A = \sum_{k=1}^\infty \lambda_k(u_k \otimes u_k)$, $\quad (u_k \otimes u_k)w = \langle u_k, w\rangle u_k$
$\log(A) = \sum_{k=1}^\infty \log(\lambda_k)(u_k \otimes u_k)$, with $\lim_{k\to\infty}\log(\lambda_k) = -\infty$

  57. Log-Hilbert-Schmidt distance
Why $\log(A + \gamma I)$? Why the extended Hilbert-Schmidt norm?
$\log(A)$ is unbounded; $\log(A + \gamma I)$ is bounded.
Hilbert-Schmidt norm:
$\|\log(A + \gamma I)\|^2_{\mathrm{HS}} = \sum_{k=1}^\infty [\log(\lambda_k + \gamma)]^2 = \infty$ if $\gamma \neq 1$
The extended Hilbert-Schmidt norm:
$\|\log(A + \gamma I)\|^2_{\mathrm{eHS}} = \left\|\log\left(\frac{A}{\gamma} + I\right)\right\|^2_{\mathrm{HS}} + (\log\gamma)^2 = \sum_{k=1}^\infty \left[\log\left(\frac{\lambda_k}{\gamma} + 1\right)\right]^2 + (\log\gamma)^2 < \infty$

  58. Log-Hilbert-Schmidt metric
Generalization from $\mathrm{Sym}^{++}(n)$ to $\Sigma(\mathcal{H})$:
$\odot : \Sigma(\mathcal{H}) \times \Sigma(\mathcal{H}) \to \Sigma(\mathcal{H})$, $\quad (A + \gamma I) \odot (B + \nu I) = \exp[\log(A + \gamma I) + \log(B + \nu I)]$
$\circledast : \mathbb{R} \times \Sigma(\mathcal{H}) \to \Sigma(\mathcal{H})$, $\quad \lambda \circledast (A + \gamma I) = \exp[\lambda\log(A + \gamma I)] = (A + \gamma I)^\lambda$, $\lambda \in \mathbb{R}$
$(\Sigma(\mathcal{H}), \odot, \circledast)$ is a vector space, with $\odot$ acting as vector addition and $\circledast$ acting as scalar multiplication.

  59. Log-Hilbert-Schmidt metric
$(\Sigma(\mathcal{H}), \odot, \circledast)$ is a vector space.
Log-Hilbert-Schmidt inner product:
$\langle A + \gamma I, B + \nu I\rangle_{\mathrm{logHS}} = \langle\log(A + \gamma I), \log(B + \nu I)\rangle_{\mathrm{eHS}}$, $\quad \|A + \gamma I\|_{\mathrm{logHS}} = \|\log(A + \gamma I)\|_{\mathrm{eHS}}$
$(\Sigma(\mathcal{H}), \odot, \circledast, \langle\cdot,\cdot\rangle_{\mathrm{logHS}})$ is a Hilbert space.
The Log-Hilbert-Schmidt distance is the Hilbert distance:
$d_{\mathrm{logHS}}(A + \gamma I, B + \nu I) = \|\log(A + \gamma I) - \log(B + \nu I)\|_{\mathrm{eHS}} = \|(A + \gamma I) \odot (B + \nu I)^{-1}\|_{\mathrm{logHS}}$

  60. Log-Hilbert-Schmidt distance between RKHS covariance operators
The distance
$d_{\mathrm{logHS}}[(C_{\Phi(X)} + \gamma I_{\mathcal{H}_K}), (C_{\Phi(Y)} + \nu I_{\mathcal{H}_K})] = d_{\mathrm{logHS}}\left[\left(\tfrac{1}{m}\Phi(X)J_m\Phi(X)^* + \gamma I_{\mathcal{H}_K}\right), \left(\tfrac{1}{m}\Phi(Y)J_m\Phi(Y)^* + \nu I_{\mathcal{H}_K}\right)\right]$
has a closed form in terms of the $m \times m$ Gram matrices
$K[X] = \Phi(X)^*\Phi(X)$, $\quad (K[X])_{ij} = K(x_i, x_j)$
$K[Y] = \Phi(Y)^*\Phi(Y)$, $\quad (K[Y])_{ij} = K(y_i, y_j)$
$K[X, Y] = \Phi(X)^*\Phi(Y)$, $\quad (K[X, Y])_{ij} = K(x_i, y_j)$
$K[Y, X] = \Phi(Y)^*\Phi(X)$, $\quad (K[Y, X])_{ij} = K(y_i, x_j)$

  61. Log-Hilbert-Schmidt distance between RKHS covariance operators
$\frac{1}{\gamma m}J_mK[X]J_m = U_A\Sigma_AU_A^T$, $\quad \frac{1}{\nu m}J_mK[Y]J_m = U_B\Sigma_BU_B^T$
$A^*B = \frac{1}{m\sqrt{\gamma\nu}}J_mK[X, Y]J_m$
$C_{AB} = \mathbf{1}_{N_A}^T\log(I_{N_A} + \Sigma_A)\Sigma_A^{-1}\left(U_A^TA^*BU_B \circ U_A^TA^*BU_B\right)\Sigma_B^{-1}\log(I_{N_B} + \Sigma_B)\mathbf{1}_{N_B}$
(where $\circ$ denotes the Hadamard product)

  62. Log-Hilbert-Schmidt distance between RKHS covariance operators
Theorem (H.Q.M. et al, NIPS 2014)
Assume that $\dim(\mathcal{H}_K) = \infty$. Let $\gamma > 0$, $\nu > 0$. The Log-Hilbert-Schmidt distance between $(C_{\Phi(X)} + \gamma I_{\mathcal{H}_K})$ and $(C_{\Phi(Y)} + \nu I_{\mathcal{H}_K})$ is
$d^2_{\mathrm{logHS}}[(C_{\Phi(X)} + \gamma I_{\mathcal{H}_K}), (C_{\Phi(Y)} + \nu I_{\mathcal{H}_K})] = \mathrm{tr}[\log(I_{N_A} + \Sigma_A)]^2 + \mathrm{tr}[\log(I_{N_B} + \Sigma_B)]^2 - 2C_{AB} + (\log\gamma - \log\nu)^2$
The Log-Hilbert-Schmidt inner product between $(C_{\Phi(X)} + \gamma I_{\mathcal{H}_K})$ and $(C_{\Phi(Y)} + \nu I_{\mathcal{H}_K})$ is
$\langle(C_{\Phi(X)} + \gamma I_{\mathcal{H}_K}), (C_{\Phi(Y)} + \nu I_{\mathcal{H}_K})\rangle_{\mathrm{logHS}} = C_{AB} + (\log\gamma)(\log\nu)$

  63. Log-Hilbert-Schmidt distance between RKHS covariance operators
Theorem (H.Q.M. et al, NIPS 2014)
Assume that $\dim(\mathcal{H}_K) = \infty$. Let $\gamma > 0$. The Log-Hilbert-Schmidt norm of the operator $(C_{\Phi(X)} + \gamma I_{\mathcal{H}_K})$ is
$\|(C_{\Phi(X)} + \gamma I_{\mathcal{H}_K})\|^2_{\mathrm{logHS}} = \mathrm{tr}[\log(I_{N_A} + \Sigma_A)]^2 + (\log\gamma)^2$

  64. Log-Hilbert-Schmidt distance between RKHS covariance operators
Theorem (H.Q.M. et al, NIPS 2014)
Assume that $\dim(\mathcal{H}_K) < \infty$. Let $\gamma > 0$, $\nu > 0$. The Log-Hilbert-Schmidt distance between $(C_{\Phi(X)} + \gamma I_{\mathcal{H}_K})$ and $(C_{\Phi(Y)} + \nu I_{\mathcal{H}_K})$ is
$d^2_{\mathrm{logHS}}[(C_{\Phi(X)} + \gamma I_{\mathcal{H}_K}), (C_{\Phi(Y)} + \nu I_{\mathcal{H}_K})] = \mathrm{tr}[\log(I_{N_A} + \Sigma_A)]^2 + \mathrm{tr}[\log(I_{N_B} + \Sigma_B)]^2 - 2C_{AB} + 2\log\frac{\gamma}{\nu}\left(\mathrm{tr}[\log(I_{N_A} + \Sigma_A)] - \mathrm{tr}[\log(I_{N_B} + \Sigma_B)]\right) + (\log\gamma - \log\nu)^2\dim(\mathcal{H}_K)$

  65. Log-Hilbert-Schmidt distance between RKHS covariance operators
Theorem (H.Q.M. et al, NIPS 2014)
Assume that $\dim(\mathcal{H}_K) < \infty$. Let $\gamma > 0$, $\nu > 0$. The Log-Hilbert-Schmidt inner product between $(C_{\Phi(X)} + \gamma I_{\mathcal{H}_K})$ and $(C_{\Phi(Y)} + \nu I_{\mathcal{H}_K})$ is
$\langle(C_{\Phi(X)} + \gamma I_{\mathcal{H}_K}), (C_{\Phi(Y)} + \nu I_{\mathcal{H}_K})\rangle_{\mathrm{logHS}} = C_{AB} + (\log\nu)\mathrm{tr}[\log(I_{N_A} + \Sigma_A)] + (\log\gamma)\mathrm{tr}[\log(I_{N_B} + \Sigma_B)] + (\log\gamma\log\nu)\dim(\mathcal{H}_K)$
The Log-Hilbert-Schmidt norm of $(C_{\Phi(X)} + \gamma I_{\mathcal{H}_K})$ is
$\|(C_{\Phi(X)} + \gamma I_{\mathcal{H}_K})\|^2_{\mathrm{logHS}} = \mathrm{tr}[\log(I_{N_A} + \Sigma_A)]^2 + 2(\log\gamma)\mathrm{tr}[\log(I_{N_A} + \Sigma_A)] + (\log\gamma)^2\dim(\mathcal{H}_K)$

  66. Log-Hilbert-Schmidt distance between RKHS covariance operators
Special case: for the linear kernel $K(x, y) = \langle x, y\rangle$, $x, y \in \mathbb{R}^n$:
$d_{\mathrm{logHS}}[(C_{\Phi(X)} + \gamma I_{\mathcal{H}_K}), (C_{\Phi(Y)} + \nu I_{\mathcal{H}_K})] = d_{\mathrm{logE}}[(C_X + \gamma I_n), (C_Y + \nu I_n)]$
$\langle(C_{\Phi(X)} + \gamma I_{\mathcal{H}_K}), (C_{\Phi(Y)} + \nu I_{\mathcal{H}_K})\rangle_{\mathrm{logHS}} = \langle(C_X + \gamma I_n), (C_Y + \nu I_n)\rangle_{\mathrm{logE}}$
$\|(C_{\Phi(X)} + \gamma I_{\mathcal{H}_K})\|_{\mathrm{logHS}} = \|(C_X + \gamma I_n)\|_{\mathrm{logE}}$
These identities can be used to verify the correctness of an implementation.
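A sketch of this sanity check in the finite-dimensional (linear-kernel) case, where the covariance operator is just the $n \times n$ covariance matrix: a full Gram-matrix implementation of $d_{\mathrm{logHS}}$ should reproduce the reference value computed here (helper names `spd_logm` and `d_logE` are ours):

```python
import numpy as np

def spd_logm(A):
    """Principal logarithm of an SPD matrix via eigendecomposition."""
    w, U = np.linalg.eigh(A)
    return (U * np.log(w)) @ U.T

def d_logE(A, B):
    return np.linalg.norm(spd_logm(A) - spd_logm(B))

rng = np.random.default_rng(7)
n, m = 3, 40
X = rng.standard_normal((n, m))
Y = rng.standard_normal((n, m))
gamma, nu = 0.1, 0.2

# with the linear kernel, C_Phi(X) is the ordinary covariance matrix C_X
CX = np.cov(X, bias=True) + gamma * np.eye(n)
CY = np.cov(Y, bias=True) + nu * np.eye(n)

d_ref = d_logE(CX, CY)   # reference value that d_logHS must match here
assert d_ref > 0 and np.isclose(d_logE(CX, CX), 0.0)
```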

  67. Log-Hilbert-Schmidt distance between RKHS covariance operators
For $m \in \mathbb{N}$ fixed and $\gamma \neq \nu$:
$\lim_{\dim(\mathcal{H}_K)\to\infty} d_{\mathrm{logHS}}[(C_{\Phi(X)} + \gamma I_{\mathcal{H}_K}), (C_{\Phi(Y)} + \nu I_{\mathcal{H}_K})] = \infty$
In general, the infinite-dimensional formulation cannot be approximated by its finite-dimensional counterpart.

  68. Outline
Covariance operators
Covariance operator representation in computer vision
Geometry of covariance operators
Kernel methods on covariance operators

  69. Kernels with Log-Hilbert-Schmidt metric
$(\Sigma(\mathcal{H}), \odot, \circledast, \langle\cdot,\cdot\rangle_{\mathrm{logHS}})$ is a Hilbert space.
Theorem (H.Q.M. et al, NIPS 2014)
The following kernels $K : \Sigma(\mathcal{H}) \times \Sigma(\mathcal{H}) \to \mathbb{R}$ are positive definite:
$K[(A + \gamma I), (B + \nu I)] = (c + \langle A + \gamma I, B + \nu I\rangle_{\mathrm{logHS}})^d$, $\quad c \ge 0$, $d \in \mathbb{N}$
$K[(A + \gamma I), (B + \nu I)] = \exp\left(-\frac{1}{\sigma^2}\|\log(A + \gamma I) - \log(B + \nu I)\|^p_{\mathrm{eHS}}\right)$, $\quad 0 < p \le 2$, $\sigma \neq 0$

  70. Two-layer kernel machine with Log-Hilbert-Schmidt metric
1. First layer: kernel $K_1$, inducing the covariance operators
2. Second layer: kernel $K_2$, defined using the Log-Hilbert-Schmidt distance or inner product between the covariance operators

  71. Two-layer kernel machine with Log-Hilbert-Schmidt metric

  72. Material classification
Example: KTH-TIPS2b data set (Caputo et al, ICCV 2005)
$f(x, y) = \left[R(x, y), G(x, y), B(x, y), |G_{0,0}(x, y)|, \dots, |G_{3,4}(x, y)|\right]$

  73. Material classification

Method        KTH-TIPS2b
E             55.3% (±7.6%)
Stein         73.1% (±8.0%)
Log-E         74.1% (±7.4%)
HS            79.3% (±8.2%)
Log-HS        81.9% (±3.3%)
Log-HS (CNN)  96.6% (±3.4%)

CNN features = MatConvNet features

  74. Object recognition
Example: ETH-80 data set
$f(x, y) = [x, y, I(x, y), |I_x|, |I_y|]$

  75. Approximate methods for reducing computational complexity
M. Faraki, M. Harandi, and F. Porikli. Approximate infinite-dimensional region covariance descriptors for image classification. ICASSP 2015.
H.Q. Minh, M. San Biagio, L. Bazzani, V. Murino. Approximate Log-Hilbert-Schmidt distances between covariance operators for image classification. CVPR 2016.
Q. Wang, P. Li, W. Zuo, and L. Zhang. RAID-G: Robust estimation of approximate infinite-dimensional Gaussian with application to material recognition. CVPR 2016.

  76. Object recognition
Results obtained using the approximate Log-HS distance:

Method        ETH-80
E             64.4% (±0.9%)
Stein         67.5% (±0.4%)
Log-E         71.1% (±1.0%)
HS            93.1% (±0.4%)
Approx-LogHS  95.0% (±0.5%)
