
Kernel Properties - Convexity. Leila Wehbe, October 1st 2013



  1. Kernel Properties - Convexity. Leila Wehbe, October 1st 2013.

  2. Kernel Properties. When the data is not linearly separable, use a feature vector $\Phi(x)$ of the data in another space. We can even use infinite-dimensional feature vectors: because of the kernel trick, you will not have to explicitly compute the feature vectors $\Phi(x)$. (You will kernelize an algorithm in HW2.)

  3. Kernel Properties: Kernels. A kernel is a dot product in feature space:
     $$k(x, x') = \langle \Phi(x), \Phi(x') \rangle$$
     We can write the kernel in matrix form over the data sample: $K_{ij} = \langle \Phi(x_i), \Phi(x_j) \rangle = k(x_i, x_j)$. This is called a Gram matrix.
     $K$ is positive semi-definite, i.e. $\alpha^\top K \alpha \ge 0$ for all $\alpha \in \mathbb{R}^m$ and all kernel matrices $K \in \mathbb{R}^{m \times m}$.
     Proof (from class):
     $$\sum_{i,j}^{m} \alpha_i \alpha_j K_{ij} = \sum_{i,j}^{m} \alpha_i \alpha_j \langle \Phi(x_i), \Phi(x_j) \rangle = \Big\langle \sum_{i}^{m} \alpha_i \Phi(x_i), \sum_{j}^{m} \alpha_j \Phi(x_j) \Big\rangle = \Big\| \sum_{i}^{m} \alpha_i \Phi(x_i) \Big\|^2 \ge 0$$
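The proof above can be checked numerically on a small sample. The following is a minimal sketch, assuming a hypothetical explicit feature map `phi(x) = (1, x, x^2)` on scalar inputs (the slides do not specify one); it confirms that the quadratic form $\alpha^\top K \alpha$ equals $\|\sum_i \alpha_i \Phi(x_i)\|^2$ and is therefore non-negative.

```python
import random

def phi(x):
    # Hypothetical explicit feature map, chosen only for illustration.
    return [1.0, x, x * x]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_matrix(xs):
    # K[i][j] = <phi(x_i), phi(x_j)>
    feats = [phi(x) for x in xs]
    return [[dot(fi, fj) for fj in feats] for fi in feats]

def quadratic_form(K, alpha):
    # alpha^T K alpha written as the double sum from the proof
    m = len(alpha)
    return sum(alpha[i] * alpha[j] * K[i][j] for i in range(m) for j in range(m))

def squared_norm_of_combination(xs, alpha):
    # || sum_i alpha_i phi(x_i) ||^2
    combo = [0.0, 0.0, 0.0]
    for a, x in zip(alpha, xs):
        combo = [c + a * f for c, f in zip(combo, phi(x))]
    return dot(combo, combo)

random.seed(0)
xs = [-2.0, -0.5, 0.0, 1.0, 3.0]
K = gram_matrix(xs)
alpha = [random.uniform(-1, 1) for _ in xs]

lhs = quadratic_form(K, alpha)                # alpha^T K alpha
rhs = squared_norm_of_combination(xs, alpha)  # squared norm, always >= 0
print(lhs, rhs)
```

The two printed values should agree up to floating-point rounding, which is the chain of equalities in the proof evaluated for one particular $\alpha$.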

  4. Kernel Properties: Kernels. By Mercer's theorem, for any symmetric, square-integrable function $k : X \times X \to \mathbb{R}$ that satisfies
     $$\int_{X \times X} k(x, x') f(x) f(x') \, dx \, dx' \ge 0$$
     for all square-integrable $f$, there exist a feature space $\Phi(x)$ and $\lambda \ge 0$ such that $k(x, x') = \sum_i \lambda_i \phi_i(x) \phi_i(x')$ (so we have $k(x, x') = \langle \Phi'(x), \Phi'(x') \rangle$).
     In discrete space the condition reads $\sum_i \sum_j K(x_i, x_j) c_i c_j \ge 0$.
     Any Gram matrix derived from a kernel $k$ is positive semi-definite $\Leftrightarrow$ $k$ is a valid kernel (dot product).

  5. Kernel Properties: Exercises. Given that $k(x, x')$ is a valid kernel, show that $f(x) f(x') k(x, x')$ is a kernel.

  6. Kernel Properties: Exercises. Answer:
     $$f(x) f(y) k(x, y) = f(x) f(y) \langle \phi(x), \phi(y) \rangle = \langle f(x) \phi(x), f(y) \phi(y) \rangle = \langle \phi'(x), \phi'(y) \rangle$$
     where $\phi'(x) = f(x) \phi(x)$.

  7. Kernel Properties: Exercises. Given that $k_1(x, x')$ and $k_2(x, x')$ are valid kernels, show that $c_1 k_1(x, x') + c_2 k_2(x, x')$, where $c_1, c_2 \ge 0$, is a valid kernel (there are multiple ways to show it).

  8. Kernel Properties: Exercises. Answer 1: For any function $f(\cdot)$:
     $$\int_{x, x'} f(x) f(x') [c_1 k_1(x, x') + c_2 k_2(x, x')] \, dx \, dx' = c_1 \int_{x, x'} f(x) f(x') k_1(x, x') \, dx \, dx' + c_2 \int_{x, x'} f(x) f(x') k_2(x, x') \, dx \, dx' \ge 0$$
     since $c_1, c_2 \ge 0$, $\int_{x, x'} f(x) f(x') k_1(x, x') \, dx \, dx' \ge 0$ and $\int_{x, x'} f(x) f(x') k_2(x, x') \, dx \, dx' \ge 0$, because $k_1$ and $k_2$ are valid kernels.

  9. Kernel Properties: Exercises. Answer 2: Here is another way to prove it. Given any finite set of instances $\{x_1, \ldots, x_n\}$, let $K_1$ (resp. $K_2$) be the $n \times n$ Gram matrix associated with $k_1$ (resp. $k_2$). The Gram matrix associated with $c_1 k_1 + c_2 k_2$ is just $K = c_1 K_1 + c_2 K_2$. $K$ is PSD because for any $v \in \mathbb{R}^n$,
     $$v^\top (c_1 K_1 + c_2 K_2) v = c_1 (v^\top K_1 v) + c_2 (v^\top K_2 v) \ge 0,$$
     as $v^\top K_1 v \ge 0$ and $v^\top K_2 v \ge 0$ follow from $K_1$ and $K_2$ being positive semi-definite. Hence $k$ is a valid kernel.

  10. Kernel Properties: Exercises. Answer 3: Let $\Phi^1$ and $\Phi^2$ be the feature vectors associated with $k_1$ and $k_2$ respectively. Take the vector $\Phi$ that is the concatenation of $\sqrt{c_1} \Phi^1$ and $\sqrt{c_2} \Phi^2$, i.e.
      $$\Phi(x) = [\sqrt{c_1} \phi^1_1(x), \sqrt{c_1} \phi^1_2(x), \ldots, \sqrt{c_1} \phi^1_m(x), \sqrt{c_2} \phi^2_1(x), \sqrt{c_2} \phi^2_2(x), \ldots, \sqrt{c_2} \phi^2_m(x)].$$
      It is easy to check that
      $$\langle \Phi(x), \Phi(x') \rangle = \sum_{i=1}^{N} \phi_i(x) \phi_i(x') = c_1 \sum_{i=1}^{m} \phi^1_i(x) \phi^1_i(x') + c_2 \sum_{i=1}^{m} \phi^2_i(x) \phi^2_i(x') = c_1 \langle \Phi^1(x), \Phi^1(x') \rangle + c_2 \langle \Phi^2(x), \Phi^2(x') \rangle = c_1 k_1(x, x') + c_2 k_2(x, x') = k(x, x'),$$
      therefore $k$ is a valid kernel.

  11. Kernel Properties: Exercises. Given that $k_1$ and $k_2$ are valid kernels, show that $k_1(x, x') - k_2(x, x')$ is not necessarily a kernel.

  12. Kernel Properties: Exercises. Proof by counter-example: Consider the kernel $k_1$ being the identity ($k_1(x, x') = 1$ iff $x = x'$ and $0$ otherwise), and $k_2$ being twice the identity ($k_2(x, x') = 2$ iff $x = x'$ and $0$ otherwise). Let $K_1 = I_p$ be the $p \times p$ identity matrix and $K_2 = 2 I_p$ be 2 times that identity matrix. $K_1$ and $K_2$ are the Gram matrices associated with $k_1$ and $k_2$ respectively. Clearly both $K_1$ and $K_2$ are positive semi-definite, however $K_1 - K_2 = -I$ is not, as its eigenvalues are all $-1$. Therefore $k = k_1 - k_2$ is not a valid kernel.
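The counter-example can be made concrete with a single test vector. This is a small sketch, assuming $p = 3$ and the vector $v = e_1$ (both chosen here for illustration only); it evaluates $v^\top K v$ for $K_1$, $K_2$ and $K_1 - K_2$.

```python
def quadratic_form(K, v):
    # v^T K v
    n = len(v)
    return sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))

p = 3  # illustrative size, not taken from the slides
K1 = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]   # I_p
K2 = [[2.0 * K1[i][j] for j in range(p)] for i in range(p)]           # 2 I_p
K_diff = [[K1[i][j] - K2[i][j] for j in range(p)] for i in range(p)]  # -I_p

v = [1.0, 0.0, 0.0]
q1 = quadratic_form(K1, v)          # non-negative, K1 is PSD
q2 = quadratic_form(K2, v)          # non-negative, K2 is PSD
q_diff = quadratic_form(K_diff, v)  # negative, so K1 - K2 is not PSD
print(q1, q2, q_diff)
```

A single vector with a negative quadratic form is enough to rule out positive semi-definiteness, which is why the difference of two valid kernels can fail to be a kernel.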

  13. Kernel Properties: Exercises. Given PSD matrices $A$ and $B$, show that $AB$ is not necessarily PSD.

  14. Kernel Properties: Exercises. For PSD matrices $A$ and $B$, it suffices to show that $AB$ is not symmetric, so just use
      $$A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix};$$
      here
      $$AB = \begin{pmatrix} 2 & 1 \\ 2 & 4 \end{pmatrix},$$
      which is not symmetric.
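The product in this counter-example is easy to verify by hand or in a few lines of code. The sketch below uses plain nested lists and two small helper functions (`matmul` and `is_symmetric` are names introduced here, not from the slides).

```python
def matmul(X, Y):
    # Plain matrix product of two nested-list matrices.
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def is_symmetric(M):
    return all(M[i][j] == M[j][i] for i in range(len(M)) for j in range(len(M)))

A = [[1, 0], [0, 2]]  # PSD: diagonal with non-negative entries
B = [[2, 1], [1, 2]]  # PSD: eigenvalues 1 and 3
AB = matmul(A, B)

print(AB)                # [[2, 1], [2, 4]]
print(is_symmetric(AB))  # False
```

Both factors are symmetric, but the product is not, because $A$ and $B$ do not commute.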

  15. Kernel Properties: Exercises. Given that $k_1$ and $k_2$ are valid kernels, show that the element-wise product $k(x_i, x_j) = k_1(x_i, x_j) \times k_2(x_i, x_j)$ is a valid kernel. Start by showing that if matrices $A$ and $B$ are PSD, then $C$ with $C_{ij} = A_{ij} \times B_{ij}$ is PSD.

  16. Kernel Properties: Exercises. Answer: First show that $C$ such that $C_{ij} = A_{ij} \times B_{ij}$ is PSD. One way to show it:
      (1) Any PSD matrix $Q$ is a covariance matrix. To see this, think of a $p$-dimensional random variable $x$ with covariance matrix $I_p$, the identity matrix ($Q$ is $p \times p$). Because $Q$ is PSD, it admits a non-negative symmetric square root $Q^{1/2}$. Then:
      $$\mathrm{cov}(Q^{1/2} x) = Q^{1/2} \, \mathrm{cov}(x) \, Q^{1/2} = Q^{1/2} I Q^{1/2} = Q$$
      And therefore $Q$ is a covariance matrix. We also know that any covariance matrix is PSD.
      (2) So given $A$ and $B$ PSD, we know that they are covariance matrices. We want to show that $C$ is also a covariance matrix and therefore PSD.

  17. Kernel Properties: Exercises.
      (3) Let $u = (u_1, \ldots, u_p)^\top \sim N(0_p, A)$ and $v = (v_1, \ldots, v_p)^\top \sim N(0_p, B)$ be independent, where $0_p$ is a $p$-dimensional vector of zeros.
      (4) Define the vector $w = (u_1 v_1, \ldots, u_p v_p)^\top$. Then
      $$\mathrm{cov}(w) = E[(w - \mu_w)(w - \mu_w)^\top] = E[w w^\top].$$
      This is because $\mu_{w_i} = 0$ for all $i$, which holds because $u$ and $v$ are independent, so $\mu_w = \mu_u \times \mu_v = 0_p$. Entry-wise,
      $$\mathrm{cov}(w)_{i,j} = E[w_i w_j] = E[(u_i v_i)(u_j v_j)] = E[(u_i u_j)(v_i v_j)] = E[u_i u_j] \, E[v_i v_j].$$
      This is again because $u$ and $v$ are independent. Hence
      $$\mathrm{cov}(w)_{i,j} = E[u_i u_j] \, E[v_i v_j] = A_{i,j} \times B_{i,j} = C_{i,j}.$$

  18. Kernel Properties: Exercises.
      (5) Therefore $C$ is a covariance matrix and therefore PSD.
      (6) Since any kernel matrix created from $k(x_i, x_j) = k_1(x_i, x_j) \times k_2(x_i, x_j)$ is PSD, $k$ is a valid kernel.
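The conclusion can be sanity-checked numerically. The following sketch assumes two illustrative kernels on scalar inputs (a linear kernel and a degree-2 polynomial kernel, neither taken from the slides), forms the element-wise product of their Gram matrices, and evaluates the quadratic form for many random vectors. Random checks like this are only a sanity test, not a proof.

```python
import random

def k1(x, y):
    # Linear kernel on scalars (illustrative choice).
    return x * y

def k2(x, y):
    # Degree-2 polynomial kernel (illustrative choice).
    return (1.0 + x * y) ** 2

def gram(k, xs):
    return [[k(a, b) for b in xs] for a in xs]

def hadamard(A, B):
    # Element-wise product C_ij = A_ij * B_ij
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def quadratic_form(K, v):
    n = len(v)
    return sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))

random.seed(1)
xs = [-1.0, 0.5, 2.0, 3.0]
C = hadamard(gram(k1, xs), gram(k2, xs))

# Smallest quadratic form observed over 1000 random Gaussian test vectors.
min_q = min(
    quadratic_form(C, [random.gauss(0.0, 1.0) for _ in xs])
    for _ in range(1000)
)
print(min_q)
```

No test vector should produce a value below zero (beyond floating-point rounding), consistent with $C$ being PSD.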

  19. Kernel Properties: Exercises. Given that $A$ is PSD, show that $A^m$ is PSD.

  20. Kernel Properties: Exercises. Answer: Recall the eigendecomposition $A = U D U^\top$, with $U$ orthogonal and $D$ diagonal. First we show that $A^m = U D^m U^\top$. Proof by induction: it is trivially true for $m = 1$, and
      $$A^{m+1} = A A^m = U D U^\top (U D^m U^\top) = U D (U^\top U) D^m U^\top = U D D^m U^\top = U D^{m+1} U^\top.$$
      Hence, the eigenvalues of $A^m$ are the diagonal elements of $D^m$, which are $\lambda_i^m$ (where $\{\lambda_i\}$ are the diagonal elements of $D$). Since $\lambda_i \ge 0$, these eigenvalues $\lambda_i^m$ are also $\ge 0$. This means $A^m$ is PSD.
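A small worked instance may help here. The sketch below takes the $2 \times 2$ matrix with rows $(2, 1)$ and $(1, 2)$, whose eigenvalues are 1 and 3, and computes its cube by repeated multiplication. For a symmetric $2 \times 2$ matrix, being PSD is equivalent to having non-negative trace and determinant, which is the test used in the hypothetical helper `is_psd_2x2`.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def matpow(A, m):
    # A^m for an integer m >= 1, by repeated multiplication.
    result = A
    for _ in range(m - 1):
        result = matmul(A, result)
    return result

def is_psd_2x2(M):
    # A symmetric 2x2 matrix is PSD iff its trace and determinant are >= 0.
    symmetric = M[0][1] == M[1][0]
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return symmetric and trace >= 0 and det >= 0

A = [[2, 1], [1, 2]]   # eigenvalues 1 and 3
A3 = matpow(A, 3)      # eigenvalues 1**3 = 1 and 3**3 = 27

print(A3)              # [[14, 13], [13, 14]]
print(is_psd_2x2(A3))  # True
```

The trace of `A3` is 28 and its determinant is 27, matching the sum and product of the eigenvalues $1$ and $27$ predicted by $A^m = U D^m U^\top$.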

  21. Kernel Properties: Exercises. Given that $k(x, x')$ is a valid kernel, show that $k(x, y)^2 \le k(x, x) \, k(y, y)$.

  22. Kernel Properties: Exercises. Answer:
      $$k(x, y)^2 = \langle \phi(x), \phi(y) \rangle^2 = \|\phi(x)\|^2 \|\phi(y)\|^2 \big(\cos \theta_{\phi(x), \phi(y)}\big)^2 \le \|\phi(x)\|^2 \|\phi(y)\|^2 = k(x, x) \, k(y, y),$$
      where $\theta_{\phi(x), \phi(y)}$ is the angle between $\phi(x)$ and $\phi(y)$.
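The inequality can be checked on concrete kernels. This sketch assumes two standard kernels on scalar inputs, a Gaussian (RBF) kernel and a degree-2 polynomial kernel, neither of which is named in the slides, and tests $k(x, y)^2 \le k(x, x) \, k(y, y)$ on a small grid of points with a tiny relative tolerance for rounding.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    # Gaussian kernel; gamma is an illustrative parameter value.
    return math.exp(-gamma * (x - y) ** 2)

def poly_kernel(x, y):
    # Degree-2 polynomial kernel.
    return (1.0 + x * y) ** 2

def satisfies_cauchy_schwarz(k, xs, tol=1e-12):
    # Check k(x, y)^2 <= k(x, x) k(y, y) for all pairs, up to rounding.
    return all(
        k(x, y) ** 2 <= k(x, x) * k(y, y) * (1.0 + tol)
        for x in xs for y in xs
    )

points = [-2.0, -0.3, 0.0, 0.7, 1.5]
rbf_ok = satisfies_cauchy_schwarz(rbf_kernel, points)
poly_ok = satisfies_cauchy_schwarz(poly_kernel, points)
print(rbf_ok, poly_ok)
```

For the Gaussian kernel the bound is especially easy to see, since $k(x, x) = 1$ for every $x$ and $k(x, y) \le 1$.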

  23. Introduction to Convex Optimization. Xuezhi Wang, Computer Science Department, Carnegie Mellon University. 10701-recitation, Jan 29.

  24. Outline. (1) Convexity: Convex Sets, Convex Functions. (2) Unconstrained Convex Optimization: First-order Methods, Newton's Method.

  25. Outline. (1) Convexity: Convex Sets, Convex Functions. (2) Unconstrained Convex Optimization: First-order Methods, Newton's Method.

  26. Convexity: Convex Sets.
      Definition: a set $X$ is convex if for all $x, x' \in X$ it follows that $\lambda x + (1 - \lambda) x' \in X$ for $\lambda \in [0, 1]$.
      Examples:
      Empty set $\emptyset$, single point $\{x_0\}$, the whole space $\mathbb{R}^n$.
      Hyperplane: $\{x \mid a^\top x = b\}$; halfspaces: $\{x \mid a^\top x \le b\}$.
      Euclidean balls: $\{x \mid \|x - x_c\|_2 \le r\}$.
      Positive semidefinite matrices: $S^n_+ = \{A \in S^n \mid A \succeq 0\}$ ($S^n$ is the set of symmetric $n \times n$ matrices).
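The definition can be exercised directly on one of the examples. The sketch below, assuming an illustrative Euclidean ball in $\mathbb{R}^2$ with center $(1, -1)$ and radius 2 (values chosen here, not from the slides), samples pairs of points inside the ball and checks that random convex combinations stay inside. As with any sampling check, this illustrates the definition rather than proving convexity.

```python
import random

def in_ball(x, center, r, tol=1e-12):
    # Membership test for {x : ||x - center||_2 <= r}
    return sum((a - c) ** 2 for a, c in zip(x, center)) <= r * r + tol

def convex_combination(x, y, lam):
    # lam * x + (1 - lam) * y
    return [lam * a + (1.0 - lam) * b for a, b in zip(x, y)]

def sample_in_ball(center, r, rng):
    # Rejection sampling from the bounding cube.
    while True:
        x = [c + rng.uniform(-r, r) for c in center]
        if in_ball(x, center, r, tol=0.0):
            return x

rng = random.Random(0)
center, r = [1.0, -1.0], 2.0  # illustrative ball

all_in_ball = True
for _ in range(1000):
    x = sample_in_ball(center, r, rng)
    y = sample_in_ball(center, r, rng)
    lam = rng.random()  # lambda in [0, 1)
    z = convex_combination(x, y, lam)
    all_in_ball = all_in_ball and in_ball(z, center, r)

print(all_in_ball)
```

The same pattern works for the other examples by swapping the membership test, for instance a halfspace test `a . x <= b`.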
