Principal Components Analysis (PCA) and Singular Value Decomposition


  1. Principal Components Analysis (PCA) and Singular Value Decomposition (SVD), with applications to Microarrays
     Prof. Tesler, Math 283, Fall 2018

  2. Covariance
     Let X and Y be random variables, possibly dependent. Recall that the covariance of X and Y is defined as
         Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]
     and that an alternate formula is
         Cov(X, Y) = E(XY) - E(X) E(Y).
     Previously we used
         Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
     and, for independent X_1, ..., X_n,
         Var(X_1 + X_2 + ... + X_n) = Var(X_1) + ... + Var(X_n).
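
     The two formulas agree; as a quick illustration (my own NumPy sketch, not part of the original slides, with simulated dependent data), both estimates come out near the true covariance of 2:

         import numpy as np

         rng = np.random.default_rng(0)
         x = rng.normal(size=100_000)
         y = 2 * x + rng.normal(size=100_000)   # Y depends on X, so Cov(X, Y) should be near 2

         # Definition: E[(X - mu_X)(Y - mu_Y)]
         cov_def = np.mean((x - x.mean()) * (y - y.mean()))
         # Alternate formula: E[XY] - E[X] E[Y]
         cov_alt = np.mean(x * y) - x.mean() * y.mean()

         print(cov_def, cov_alt)   # both approximately 2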

  3. Covariance properties
         Cov(X, X) = Var(X)
         Cov(X, Y) = Cov(Y, X)
         Cov(aX + b, cY + d) = ac Cov(X, Y)
     Sign of covariance: Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)].
     When Cov(X, Y) is positive: there is a tendency to have X > \mu_X when Y > \mu_Y (and vice-versa), and X < \mu_X when Y < \mu_Y (and vice-versa).
     When Cov(X, Y) is negative: there is a tendency to have X > \mu_X when Y < \mu_Y (and vice-versa), and X < \mu_X when Y > \mu_Y (and vice-versa).
     When Cov(X, Y) = 0:
         (a) X and Y might be independent, but it's not guaranteed.
         (b) Var(X + Y) = Var(X) + Var(Y).

  4. Sample variance
     Variance of a random variable:
         \sigma^2 = Var(X) = E[(X - \mu_X)^2] = E(X^2) - (E(X))^2
     Sample variance from data x_1, ..., x_n:
         s^2 = var(x) = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 = \frac{1}{n-1} \sum_{i=1}^n x_i^2 - \frac{n}{n-1} \bar{x}^2
     Vector formula: load the centered data into a row vector
         M = [ x_1 - \bar{x}   x_2 - \bar{x}   ...   x_n - \bar{x} ]
     so that
         s^2 = \frac{M \cdot M}{n-1} = \frac{M M'}{n-1}.
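
     As an illustration (my own sketch, not from the slides), the centered-vector formula can be checked against NumPy's unbiased sample variance:

         import numpy as np

         x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
         n = len(x)

         M = x - x.mean()                 # centered data, a length-n row vector
         s2_vector = M @ M / (n - 1)      # s^2 = M . M / (n - 1)

         print(s2_vector, np.var(x, ddof=1))   # identical values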

  5. Sample covariance
     Covariance between random variables X, Y:
         \sigma_{XY} = Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - E(X) E(Y)
     Sample covariance from data (x_1, y_1), ..., (x_n, y_n):
         s_{XY} = cov(x, y) = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{n-1} \sum_{i=1}^n x_i y_i - \frac{n}{n-1} \bar{x} \bar{y}
     Vector formula: center each variable into a row vector
         M_X = [ x_1 - \bar{x}   x_2 - \bar{x}   ...   x_n - \bar{x} ]
         M_Y = [ y_1 - \bar{y}   y_2 - \bar{y}   ...   y_n - \bar{y} ]
     so that
         s_{XY} = \frac{M_X \cdot M_Y}{n-1} = \frac{M_X M_Y'}{n-1}.
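
     Similarly, a small sketch (assuming paired data in two NumPy arrays of my own choosing) checking the vector formula for s_XY against np.cov:

         import numpy as np

         x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
         y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
         n = len(x)

         MX, MY = x - x.mean(), y - y.mean()    # centered row vectors
         s_xy = MX @ MY / (n - 1)               # s_XY = MX . MY / (n - 1)

         print(s_xy, np.cov(x, y)[0, 1])        # same off-diagonal entry of the 2x2 covariance matrix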

  6. Covariance matrix
     For problems with many simultaneous random variables, put them into vectors:
         \vec{X} = [ R ]        \vec{Y} = [ T ]
                   [ S ]                  [ U ]
                                          [ V ]
     and then form a covariance matrix:
         Cov(\vec{X}, \vec{Y}) = [ Cov(R, T)   Cov(R, U)   Cov(R, V) ]
                                 [ Cov(S, T)   Cov(S, U)   Cov(S, V) ]
     In matrix/vector notation,
         Cov(\vec{X}, \vec{Y}) = E[ (\vec{X} - E(\vec{X})) (\vec{Y} - E(\vec{Y}))' ]
     where the left side is 2 x 3, \vec{X} - E(\vec{X}) is 2 x 1, and (\vec{Y} - E(\vec{Y}))' is (3 x 1)' = 1 x 3.
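
     To make the 2 x 3 shape concrete, here is a sketch (the particular variables R, S, T, U, V are my own hypothetical illustration) that estimates Cov(\vec{X}, \vec{Y}) from samples of a 2-dimensional \vec{X} = (R, S)' and a 3-dimensional \vec{Y} = (T, U, V)':

         import numpy as np

         rng = np.random.default_rng(1)
         n = 50_000
         R = rng.normal(size=n)
         S = rng.normal(size=n)
         T, U, V = R + rng.normal(size=n), S - R, rng.normal(size=n)

         X = np.vstack([R, S])           # 2 x n samples of X = (R, S)'
         Y = np.vstack([T, U, V])        # 3 x n samples of Y = (T, U, V)'

         # Sample version of E[(X - E X)(Y - E Y)']: a 2 x 3 matrix
         Xc = X - X.mean(axis=1, keepdims=True)
         Yc = Y - Y.mean(axis=1, keepdims=True)
         cross_cov = Xc @ Yc.T / (n - 1)
         print(cross_cov.shape)          # (2, 3)
         print(cross_cov.round(2))       # e.g. Cov(R, T) ~ 1, Cov(R, U) ~ -1, Cov(R, V) ~ 0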

  7. Covariance matrix (a.k.a. Variance-Covariance matrix)
     Often there's one vector with all the variables:
         \vec{X} = [ R ]
                   [ S ]
                   [ T ]
     Then
         Cov(\vec{X}) = Cov(\vec{X}, \vec{X}) = E[ (\vec{X} - E(\vec{X})) (\vec{X} - E(\vec{X}))' ]
                      = [ Cov(R, R)   Cov(R, S)   Cov(R, T) ]
                        [ Cov(S, R)   Cov(S, S)   Cov(S, T) ]
                        [ Cov(T, R)   Cov(T, S)   Cov(T, T) ]
                      = [ Var(R)      Cov(R, S)   Cov(R, T) ]
                        [ Cov(R, S)   Var(S)      Cov(S, T) ]
                        [ Cov(R, T)   Cov(S, T)   Var(T)    ]
     This matrix is symmetric (it equals its own transpose). The diagonal entries are ordinary variances.
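
     A minimal sketch (with my own stand-in data) showing that the sample version produced by np.cov is symmetric with the variances on the diagonal:

         import numpy as np

         rng = np.random.default_rng(2)
         data = rng.normal(size=(3, 1000))      # rows are samples of R, S, T

         C = np.cov(data)                       # 3 x 3 sample variance-covariance matrix
         print(np.allclose(C, C.T))             # True: symmetric
         print(np.allclose(np.diag(C), data.var(axis=1, ddof=1)))   # True: diagonal = variances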

  8. Covariance matrix properties
         Cov(\vec{X}, \vec{Y}) = Cov(\vec{Y}, \vec{X})'
         Cov(A\vec{X} + \vec{B}, \vec{Y}) = A Cov(\vec{X}, \vec{Y})
         Cov(\vec{X}, C\vec{Y} + \vec{D}) = Cov(\vec{X}, \vec{Y}) C'
         Cov(A\vec{X} + \vec{B}) = A Cov(\vec{X}) A'
         Cov(\vec{X}_1 + \vec{X}_2, \vec{Y}) = Cov(\vec{X}_1, \vec{Y}) + Cov(\vec{X}_2, \vec{Y})
         Cov(\vec{X}, \vec{Y}_1 + \vec{Y}_2) = Cov(\vec{X}, \vec{Y}_1) + Cov(\vec{X}, \vec{Y}_2)
     Here A, C are constant matrices, \vec{B}, \vec{D} are constant vectors, and all dimensions must be correct for matrix arithmetic.
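
     These identities also hold exactly for sample covariance matrices; a quick numerical check (with a hypothetical A and B of my own choosing) of Cov(A\vec{X} + \vec{B}) = A Cov(\vec{X}) A':

         import numpy as np

         rng = np.random.default_rng(3)
         X = rng.normal(size=(3, 500))              # 500 samples of a 3-dimensional X
         A = np.array([[1.0, 2.0, 0.0],
                       [0.0, 1.0, -1.0]])           # constant 2 x 3 matrix
         B = np.array([[5.0], [7.0]])               # constant 2 x 1 vector

         lhs = np.cov(A @ X + B)                    # Cov(AX + B)
         rhs = A @ np.cov(X) @ A.T                  # A Cov(X) A'
         print(np.allclose(lhs, rhs))               # True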

  9. Example (2D, but works for higher dimensions too)
     Data (x_1, y_1), ..., (x_100, y_100):
         M_0 = [ x_1 ... x_100 ] = [  3.0858   0.8806   9.8850  ...   4.4106 ]
               [ y_1 ... y_100 ]   [ 12.8562  10.7804   8.7504  ...  13.5627 ]
     [Figure: scatter plot of the original data]

  10. Centered data
     [Figure: scatter plots of the original data and the centered data]

  11. Computing sample covariance matrix
     Original data: 100 (x, y) points in a 2 x 100 matrix M_0:
         M_0 = [ x_1 ... x_100 ] = [  3.0858   0.8806   9.8850  ...   4.4106 ]
               [ y_1 ... y_100 ]   [ 12.8562  10.7804   8.7504  ...  13.5627 ]
     Centered data: subtract \bar{x} from the x's and \bar{y} from the y's to get M; here \bar{x} = 5, \bar{y} = 10:
         M = [ -1.9142  -4.1194   4.8850  ...  -0.5894 ]
             [  2.8562   0.7804  -1.2496  ...   3.5627 ]
     Sample covariance:
         C = \frac{M M'}{100 - 1} = [  31.9702  -16.5683 ] = [ s_{XX}  s_{XY} ] = [ s_X^2   s_{XY} ]
                                    [ -16.5683   13.0018 ]   [ s_{YX}  s_{YY} ]   [ s_{XY}  s_Y^2  ]
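
     The same computation in NumPy (a sketch; M0 here is random stand-in data, since the full 2 x 100 matrix from the slide isn't reproduced):

         import numpy as np

         rng = np.random.default_rng(4)
         M0 = rng.normal(loc=[[5.0], [10.0]], scale=[[5.0], [3.0]], size=(2, 100))   # stand-in 2 x 100 data

         n = M0.shape[1]
         M = M0 - M0.mean(axis=1, keepdims=True)    # center each row (subtract x-bar and y-bar)
         C = M @ M.T / (n - 1)                      # sample covariance matrix, 2 x 2

         print(np.allclose(C, np.cov(M0)))          # True: matches NumPy's built-in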

  12. Orthonormal matrix
     Recall that for vectors \vec{v}, \vec{w}, we have \vec{v} \cdot \vec{w} = |\vec{v}| |\vec{w}| \cos(\theta), where \theta is the angle between the vectors.
     Orthogonal means perpendicular. \vec{v} and \vec{w} are orthogonal when the angle between them is \theta = 90° = \pi/2 radians, so \cos(\theta) = 0 and \vec{v} \cdot \vec{w} = 0.
     Vectors \vec{v}_1, ..., \vec{v}_n are orthonormal when
         \vec{v}_i \cdot \vec{v}_j = 0 for i \ne j (different vectors are orthogonal)
         \vec{v}_i \cdot \vec{v}_i = 1 for all i (each vector has length 1; they are all unit vectors)
     In short: \vec{v}_i \cdot \vec{v}_j = \delta_{ij} = 0 if i \ne j, and 1 if i = j.
     Example: \hat{\imath}, \hat{\jmath}, \hat{k} (the 3D unit vectors along the x, y, z axes) are orthonormal. These can be rotated into other orientations to give new "axes" in other directions; that will be our focus.
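
     A small check (my own sketch) that the standard basis vectors are orthonormal, and that rotating them preserves orthonormality:

         import numpy as np

         i_hat, j_hat, k_hat = np.eye(3)                  # unit vectors along the x, y, z axes

         vectors = [i_hat, j_hat, k_hat]
         dots = np.array([[u @ v for v in vectors] for u in vectors])
         print(dots)                                      # identity matrix: v_i . v_j = delta_ij

         theta = np.pi / 6                                # rotate the basis 30 degrees about the z axis
         R = np.array([[np.cos(theta), -np.sin(theta), 0],
                       [np.sin(theta),  np.cos(theta), 0],
                       [0, 0, 1]])
         rotated = [R @ v for v in vectors]
         print(np.allclose([[u @ v for v in rotated] for u in rotated], np.eye(3)))   # True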

  13. Orthonormal matrix
     Form an n x n matrix of orthonormal vectors
         V = [ \vec{v}_1 | ... | \vec{v}_n ]
     by loading n-dimensional column vectors into the columns of V. Transpose it to convert the vectors to row vectors:
         V' = [ \vec{v}_1' ]
              [ \vec{v}_2' ]
              [    ...     ]
              [ \vec{v}_n' ]
     (V'V)_{ij} is the ith row of V' dotted with the jth column of V:
         (V'V)_{ij} = \vec{v}_i \cdot \vec{v}_j = \delta_{ij}
     Thus V'V = I (the n x n identity matrix), so V' = V^{-1}.
     An n x n matrix V is orthonormal when V'V = I (or equivalently, VV' = I), where I is the n x n identity matrix.
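
     One way to produce an orthonormal matrix for experimentation (a sketch, using the Q factor of a QR factorization of random data) and to confirm V'V = I and V' = V^{-1}:

         import numpy as np

         rng = np.random.default_rng(5)
         V, _ = np.linalg.qr(rng.normal(size=(4, 4)))    # Q factor has orthonormal columns

         print(np.allclose(V.T @ V, np.eye(4)))          # True: V'V = I
         print(np.allclose(V @ V.T, np.eye(4)))          # True: VV' = I
         print(np.allclose(V.T, np.linalg.inv(V)))       # True: V' = V^{-1}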

  14. Diagonalizing the sample covariance matrix C
     C = V D V':
         [  31.9702  -16.5683 ] = [ -0.8651  -0.5016 ] [ 41.5768   0      ] [ -0.8651   0.5016 ]
         [ -16.5683   13.0018 ]   [  0.5016  -0.8651 ] [  0        3.3952 ] [ -0.5016  -0.8651 ]
     C is a real-valued symmetric matrix. It can be shown that:
         C can be diagonalized (recall that not all matrices are diagonalizable);
         the diagonalization takes the special form C = V D V' with V orthonormal, so V^{-1} = V';
         all eigenvalues are real numbers \ge 0, so we can put them on the diagonal of D in decreasing order: \lambda_1 \ge \lambda_2 \ge ... \ge 0.
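
     The slide's decomposition can be reproduced with np.linalg.eigh, which handles real symmetric matrices. Note that eigh returns eigenvalues in increasing order, so they are reversed below to match the decreasing convention, and the eigenvector columns may differ in sign (eigenvectors are only determined up to sign):

         import numpy as np

         C = np.array([[ 31.9702, -16.5683],
                       [-16.5683,  13.0018]])

         eigvals, eigvecs = np.linalg.eigh(C)       # eigendecomposition of a real symmetric matrix
         order = np.argsort(eigvals)[::-1]          # sort eigenvalues in decreasing order
         D = np.diag(eigvals[order])                # approximately diag(41.5768, 3.3952)
         V = eigvecs[:, order]                      # orthonormal eigenvectors as columns

         print(D.round(4))
         print(V.round(4))
         print(np.allclose(V @ D @ V.T, C))         # True: C = V D V'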

  15. Diagonalizing the sample covariance matrix C
     Since C is symmetric, if \vec{v} is a right eigenvector with eigenvalue \lambda, then \vec{v}' is a left eigenvector with eigenvalue \lambda, and vice-versa:
         C \vec{v} = \lambda \vec{v}   so   \vec{v}' C = \vec{v}' C' = (C \vec{v})' = \lambda \vec{v}'.
     Diagonalization C = V D V^{-1} loads right and left eigenvectors into V and V^{-1}. Here those eigenvectors are transposes of each other, leading to the special form C = V D V'.
     Also, all eigenvalues are \ge 0 ("C is positive semidefinite"): for all vectors \vec{w},
         \vec{w}' C \vec{w} = \frac{\vec{w}' M M' \vec{w}}{n - 1} = \frac{|M' \vec{w}|^2}{n - 1} \ge 0.
     The eigenvector equation C \vec{w} = \lambda \vec{w} gives \vec{w}' C \vec{w} = \lambda \vec{w}' \vec{w} = \lambda |\vec{w}|^2. So \lambda |\vec{w}|^2 \ge 0, giving \lambda \ge 0.
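
     A quick numerical confirmation (my own sketch with random stand-in data) that a sample covariance matrix built as M M'/(n - 1) satisfies \vec{w}' C \vec{w} = |M' \vec{w}|^2/(n - 1) \ge 0 and has nonnegative eigenvalues:

         import numpy as np

         rng = np.random.default_rng(6)
         M = rng.normal(size=(3, 40))
         M = M - M.mean(axis=1, keepdims=True)      # centered data
         C = M @ M.T / (M.shape[1] - 1)             # sample covariance matrix

         # w' C w = |M' w|^2 / (n - 1) >= 0 for every w
         w = rng.normal(size=3)
         print(w @ C @ w, np.linalg.norm(M.T @ w) ** 2 / (M.shape[1] - 1))   # equal, nonnegative

         print(np.linalg.eigvalsh(C) >= -1e-12)     # all eigenvalues nonnegative (up to rounding)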
