Principal Components Analysis (PCA) and Singular Value Decomposition


  1. Principal Components Analysis (PCA) and Singular Value Decomposition (SVD), with applications to Microarrays
     Prof. Tesler, Math 283, Fall 2018

  2. Covariance
     Let X and Y be random variables, possibly dependent. Recall that the covariance of X and Y is defined as
         Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]
     and that an alternate formula is
         Cov(X, Y) = E(XY) - E(X) E(Y).
     Previously we used
         Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
     and, for independent X_1, ..., X_n,
         Var(X_1 + X_2 + ... + X_n) = Var(X_1) + ... + Var(X_n).
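
     The two formulas agree; as a quick illustration (my own NumPy sketch, not part of the original slides, with simulated dependent data), both estimates come out near the true covariance of 2:

         import numpy as np

         rng = np.random.default_rng(0)
         x = rng.normal(size=100_000)
         y = 2 * x + rng.normal(size=100_000)   # Y depends on X, so Cov(X, Y) should be near 2

         # Definition: E[(X - mu_X)(Y - mu_Y)]
         cov_def = np.mean((x - x.mean()) * (y - y.mean()))
         # Alternate formula: E[XY] - E[X] E[Y]
         cov_alt = np.mean(x * y) - x.mean() * y.mean()

         print(cov_def, cov_alt)   # both approximately 2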

  3. Covariance properties
         Cov(X, X) = Var(X)
         Cov(X, Y) = Cov(Y, X)
         Cov(aX + b, cY + d) = ac Cov(X, Y)
     Sign of covariance: Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)].
     When Cov(X, Y) is positive: there is a tendency to have X > \mu_X when Y > \mu_Y (and vice-versa), and X < \mu_X when Y < \mu_Y (and vice-versa).
     When Cov(X, Y) is negative: there is a tendency to have X > \mu_X when Y < \mu_Y (and vice-versa), and X < \mu_X when Y > \mu_Y (and vice-versa).
     When Cov(X, Y) = 0:
         (a) X and Y might be independent, but it's not guaranteed.
         (b) Var(X + Y) = Var(X) + Var(Y).

  4. Sample variance
     Variance of a random variable:
         \sigma^2 = Var(X) = E[(X - \mu_X)^2] = E(X^2) - (E(X))^2
     Sample variance from data x_1, ..., x_n:
         s^2 = var(x) = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 = \frac{1}{n-1} \sum_{i=1}^n x_i^2 - \frac{n}{n-1} \bar{x}^2
     Vector formula: load the centered data into a row vector
         M = [ x_1 - \bar{x}   x_2 - \bar{x}   ...   x_n - \bar{x} ]
     so that
         s^2 = \frac{M \cdot M}{n-1} = \frac{M M'}{n-1}.
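
     As an illustration (my own sketch, not from the slides), the centered-vector formula can be checked against NumPy's unbiased sample variance:

         import numpy as np

         x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
         n = len(x)

         M = x - x.mean()                 # centered data, a length-n row vector
         s2_vector = M @ M / (n - 1)      # s^2 = M . M / (n - 1)

         print(s2_vector, np.var(x, ddof=1))   # identical values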

  5. Sample covariance
     Covariance between random variables X, Y:
         \sigma_{XY} = Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - E(X) E(Y)
     Sample covariance from data (x_1, y_1), ..., (x_n, y_n):
         s_{XY} = cov(x, y) = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{n-1} \sum_{i=1}^n x_i y_i - \frac{n}{n-1} \bar{x} \bar{y}
     Vector formula: center each variable into a row vector
         M_X = [ x_1 - \bar{x}   x_2 - \bar{x}   ...   x_n - \bar{x} ]
         M_Y = [ y_1 - \bar{y}   y_2 - \bar{y}   ...   y_n - \bar{y} ]
     so that
         s_{XY} = \frac{M_X \cdot M_Y}{n-1} = \frac{M_X M_Y'}{n-1}.
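
     Similarly, a small sketch (assuming paired data in two NumPy arrays of my own choosing) checking the vector formula for s_XY against np.cov:

         import numpy as np

         x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
         y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
         n = len(x)

         MX, MY = x - x.mean(), y - y.mean()    # centered row vectors
         s_xy = MX @ MY / (n - 1)               # s_XY = MX . MY / (n - 1)

         print(s_xy, np.cov(x, y)[0, 1])        # same off-diagonal entry of the 2x2 covariance matrix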

  6. Covariance matrix
     For problems with many simultaneous random variables, put them into vectors:
         \vec{X} = [ R ]        \vec{Y} = [ T ]
                   [ S ]                  [ U ]
                                          [ V ]
     and then form a covariance matrix:
         Cov(\vec{X}, \vec{Y}) = [ Cov(R, T)   Cov(R, U)   Cov(R, V) ]
                                 [ Cov(S, T)   Cov(S, U)   Cov(S, V) ]
     In matrix/vector notation,
         Cov(\vec{X}, \vec{Y}) = E[ (\vec{X} - E(\vec{X})) (\vec{Y} - E(\vec{Y}))' ]
     where the left side is 2 x 3, \vec{X} - E(\vec{X}) is 2 x 1, and (\vec{Y} - E(\vec{Y}))' is (3 x 1)' = 1 x 3.
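
     To make the 2 x 3 shape concrete, here is a sketch (the particular variables R, S, T, U, V are my own hypothetical illustration) that estimates Cov(\vec{X}, \vec{Y}) from samples of a 2-dimensional \vec{X} = (R, S)' and a 3-dimensional \vec{Y} = (T, U, V)':

         import numpy as np

         rng = np.random.default_rng(1)
         n = 50_000
         R = rng.normal(size=n)
         S = rng.normal(size=n)
         T, U, V = R + rng.normal(size=n), S - R, rng.normal(size=n)

         X = np.vstack([R, S])           # 2 x n samples of X = (R, S)'
         Y = np.vstack([T, U, V])        # 3 x n samples of Y = (T, U, V)'

         # Sample version of E[(X - E X)(Y - E Y)']: a 2 x 3 matrix
         Xc = X - X.mean(axis=1, keepdims=True)
         Yc = Y - Y.mean(axis=1, keepdims=True)
         cross_cov = Xc @ Yc.T / (n - 1)
         print(cross_cov.shape)          # (2, 3)
         print(cross_cov.round(2))       # e.g. Cov(R, T) ~ 1, Cov(R, U) ~ -1, Cov(R, V) ~ 0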

  7. Covariance matrix (a.k.a. Variance-Covariance matrix)
     Often there's one vector with all the variables:
         \vec{X} = [ R ]
                   [ S ]
                   [ T ]
     Then
         Cov(\vec{X}) = Cov(\vec{X}, \vec{X}) = E[ (\vec{X} - E(\vec{X})) (\vec{X} - E(\vec{X}))' ]
                      = [ Cov(R, R)   Cov(R, S)   Cov(R, T) ]
                        [ Cov(S, R)   Cov(S, S)   Cov(S, T) ]
                        [ Cov(T, R)   Cov(T, S)   Cov(T, T) ]
                      = [ Var(R)      Cov(R, S)   Cov(R, T) ]
                        [ Cov(R, S)   Var(S)      Cov(S, T) ]
                        [ Cov(R, T)   Cov(S, T)   Var(T)    ]
     This matrix is symmetric (it equals its own transpose). The diagonal entries are ordinary variances.
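
     A minimal sketch (with my own stand-in data) showing that the sample version produced by np.cov is symmetric with the variances on the diagonal:

         import numpy as np

         rng = np.random.default_rng(2)
         data = rng.normal(size=(3, 1000))      # rows are samples of R, S, T

         C = np.cov(data)                       # 3 x 3 sample variance-covariance matrix
         print(np.allclose(C, C.T))             # True: symmetric
         print(np.allclose(np.diag(C), data.var(axis=1, ddof=1)))   # True: diagonal = variances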

  8. Covariance matrix properties
         Cov(\vec{X}, \vec{Y}) = Cov(\vec{Y}, \vec{X})'
         Cov(A\vec{X} + \vec{B}, \vec{Y}) = A Cov(\vec{X}, \vec{Y})
         Cov(\vec{X}, C\vec{Y} + \vec{D}) = Cov(\vec{X}, \vec{Y}) C'
         Cov(A\vec{X} + \vec{B}) = A Cov(\vec{X}) A'
         Cov(\vec{X}_1 + \vec{X}_2, \vec{Y}) = Cov(\vec{X}_1, \vec{Y}) + Cov(\vec{X}_2, \vec{Y})
         Cov(\vec{X}, \vec{Y}_1 + \vec{Y}_2) = Cov(\vec{X}, \vec{Y}_1) + Cov(\vec{X}, \vec{Y}_2)
     Here A, C are constant matrices, \vec{B}, \vec{D} are constant vectors, and all dimensions must be correct for matrix arithmetic.
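
     These identities also hold exactly for sample covariance matrices; a quick numerical check (with a hypothetical A and B of my own choosing) of Cov(A\vec{X} + \vec{B}) = A Cov(\vec{X}) A':

         import numpy as np

         rng = np.random.default_rng(3)
         X = rng.normal(size=(3, 500))              # 500 samples of a 3-dimensional X
         A = np.array([[1.0, 2.0, 0.0],
                       [0.0, 1.0, -1.0]])           # constant 2 x 3 matrix
         B = np.array([[5.0], [7.0]])               # constant 2 x 1 vector

         lhs = np.cov(A @ X + B)                    # Cov(AX + B)
         rhs = A @ np.cov(X) @ A.T                  # A Cov(X) A'
         print(np.allclose(lhs, rhs))               # True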

  9. Example (2D, but works for higher dimensions too)
     Data (x_1, y_1), ..., (x_100, y_100):
         M_0 = [ x_1 ... x_100 ] = [  3.0858   0.8806   9.8850  ...   4.4106 ]
               [ y_1 ... y_100 ]   [ 12.8562  10.7804   8.7504  ...  13.5627 ]
     [Figure: scatter plot of the original data]

  10. Centered data
     [Figure: scatter plots of the original data and the centered data]

  11. Computing sample covariance matrix
     Original data: 100 (x, y) points in a 2 x 100 matrix M_0:
         M_0 = [ x_1 ... x_100 ] = [  3.0858   0.8806   9.8850  ...   4.4106 ]
               [ y_1 ... y_100 ]   [ 12.8562  10.7804   8.7504  ...  13.5627 ]
     Centered data: subtract \bar{x} from the x's and \bar{y} from the y's to get M; here \bar{x} = 5, \bar{y} = 10:
         M = [ -1.9142  -4.1194   4.8850  ...  -0.5894 ]
             [  2.8562   0.7804  -1.2496  ...   3.5627 ]
     Sample covariance:
         C = \frac{M M'}{100 - 1} = [  31.9702  -16.5683 ] = [ s_{XX}  s_{XY} ] = [ s_X^2   s_{XY} ]
                                    [ -16.5683   13.0018 ]   [ s_{YX}  s_{YY} ]   [ s_{XY}  s_Y^2  ]
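
     The same computation in NumPy (a sketch; M0 here is random stand-in data, since the full 2 x 100 matrix from the slide isn't reproduced):

         import numpy as np

         rng = np.random.default_rng(4)
         M0 = rng.normal(loc=[[5.0], [10.0]], scale=[[5.0], [3.0]], size=(2, 100))   # stand-in 2 x 100 data

         n = M0.shape[1]
         M = M0 - M0.mean(axis=1, keepdims=True)    # center each row (subtract x-bar and y-bar)
         C = M @ M.T / (n - 1)                      # sample covariance matrix, 2 x 2

         print(np.allclose(C, np.cov(M0)))          # True: matches NumPy's built-in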

  12. Orthonormal matrix
     Recall that for vectors \vec{v}, \vec{w}, we have \vec{v} \cdot \vec{w} = |\vec{v}| |\vec{w}| \cos(\theta), where \theta is the angle between the vectors.
     Orthogonal means perpendicular. \vec{v} and \vec{w} are orthogonal when the angle between them is \theta = 90° = \pi/2 radians, so \cos(\theta) = 0 and \vec{v} \cdot \vec{w} = 0.
     Vectors \vec{v}_1, ..., \vec{v}_n are orthonormal when
         \vec{v}_i \cdot \vec{v}_j = 0 for i \ne j (different vectors are orthogonal)
         \vec{v}_i \cdot \vec{v}_i = 1 for all i (each vector has length 1; they are all unit vectors)
     In short: \vec{v}_i \cdot \vec{v}_j = \delta_{ij} = 0 if i \ne j, and 1 if i = j.
     Example: \hat{\imath}, \hat{\jmath}, \hat{k} (the 3D unit vectors along the x, y, z axes) are orthonormal. These can be rotated into other orientations to give new "axes" in other directions; that will be our focus.
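
     A small check (my own sketch) that the standard basis vectors are orthonormal, and that rotating them preserves orthonormality:

         import numpy as np

         i_hat, j_hat, k_hat = np.eye(3)                  # unit vectors along the x, y, z axes

         vectors = [i_hat, j_hat, k_hat]
         dots = np.array([[u @ v for v in vectors] for u in vectors])
         print(dots)                                      # identity matrix: v_i . v_j = delta_ij

         theta = np.pi / 6                                # rotate the basis 30 degrees about the z axis
         R = np.array([[np.cos(theta), -np.sin(theta), 0],
                       [np.sin(theta),  np.cos(theta), 0],
                       [0, 0, 1]])
         rotated = [R @ v for v in vectors]
         print(np.allclose([[u @ v for v in rotated] for u in rotated], np.eye(3)))   # True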

  13. Orthonormal matrix
     Form an n x n matrix of orthonormal vectors
         V = [ \vec{v}_1 | ... | \vec{v}_n ]
     by loading n-dimensional column vectors into the columns of V. Transpose it to convert the vectors to row vectors:
         V' = [ \vec{v}_1' ]
              [ \vec{v}_2' ]
              [    ...     ]
              [ \vec{v}_n' ]
     (V'V)_{ij} is the ith row of V' dotted with the jth column of V:
         (V'V)_{ij} = \vec{v}_i \cdot \vec{v}_j = \delta_{ij}
     Thus V'V = I (the n x n identity matrix), so V' = V^{-1}.
     An n x n matrix V is orthonormal when V'V = I (or equivalently, VV' = I), where I is the n x n identity matrix.
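
     One way to produce an orthonormal matrix for experimentation (a sketch, using the Q factor of a QR factorization of random data) and to confirm V'V = I and V' = V^{-1}:

         import numpy as np

         rng = np.random.default_rng(5)
         V, _ = np.linalg.qr(rng.normal(size=(4, 4)))    # Q factor has orthonormal columns

         print(np.allclose(V.T @ V, np.eye(4)))          # True: V'V = I
         print(np.allclose(V @ V.T, np.eye(4)))          # True: VV' = I
         print(np.allclose(V.T, np.linalg.inv(V)))       # True: V' = V^{-1}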

  14. Diagonalizing the sample covariance matrix C
     C = V D V':
         [  31.9702  -16.5683 ] = [ -0.8651  -0.5016 ] [ 41.5768   0      ] [ -0.8651   0.5016 ]
         [ -16.5683   13.0018 ]   [  0.5016  -0.8651 ] [  0        3.3952 ] [ -0.5016  -0.8651 ]
     C is a real-valued symmetric matrix. It can be shown that:
         C can be diagonalized (recall that not all matrices are diagonalizable);
         the diagonalization takes the special form C = V D V' with V orthonormal, so V^{-1} = V';
         all eigenvalues are real numbers \ge 0, so we can put them on the diagonal of D in decreasing order: \lambda_1 \ge \lambda_2 \ge ... \ge 0.
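
     The slide's decomposition can be reproduced with np.linalg.eigh, which handles real symmetric matrices. Note that eigh returns eigenvalues in increasing order, so they are reversed below to match the decreasing convention, and the eigenvector columns may differ in sign (eigenvectors are only determined up to sign):

         import numpy as np

         C = np.array([[ 31.9702, -16.5683],
                       [-16.5683,  13.0018]])

         eigvals, eigvecs = np.linalg.eigh(C)       # eigendecomposition of a real symmetric matrix
         order = np.argsort(eigvals)[::-1]          # sort eigenvalues in decreasing order
         D = np.diag(eigvals[order])                # approximately diag(41.5768, 3.3952)
         V = eigvecs[:, order]                      # orthonormal eigenvectors as columns

         print(D.round(4))
         print(V.round(4))
         print(np.allclose(V @ D @ V.T, C))         # True: C = V D V'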

  15. Diagonalizing the sample covariance matrix C
     Since C is symmetric, if \vec{v} is a right eigenvector with eigenvalue \lambda, then \vec{v}' is a left eigenvector with eigenvalue \lambda, and vice-versa:
         C \vec{v} = \lambda \vec{v}   so   \vec{v}' C = \vec{v}' C' = (C \vec{v})' = \lambda \vec{v}'.
     Diagonalization C = V D V^{-1} loads right and left eigenvectors into V and V^{-1}. Here those eigenvectors are transposes of each other, leading to the special form C = V D V'.
     Also, all eigenvalues are \ge 0 ("C is positive semidefinite"): for all vectors \vec{w},
         \vec{w}' C \vec{w} = \frac{\vec{w}' M M' \vec{w}}{n - 1} = \frac{|M' \vec{w}|^2}{n - 1} \ge 0.
     The eigenvector equation C \vec{w} = \lambda \vec{w} gives \vec{w}' C \vec{w} = \lambda \vec{w}' \vec{w} = \lambda |\vec{w}|^2. So \lambda |\vec{w}|^2 \ge 0, giving \lambda \ge 0.
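
     A quick numerical confirmation (my own sketch with random stand-in data) that a sample covariance matrix built as M M'/(n - 1) satisfies \vec{w}' C \vec{w} = |M' \vec{w}|^2/(n - 1) \ge 0 and has nonnegative eigenvalues:

         import numpy as np

         rng = np.random.default_rng(6)
         M = rng.normal(size=(3, 40))
         M = M - M.mean(axis=1, keepdims=True)      # centered data
         C = M @ M.T / (M.shape[1] - 1)             # sample covariance matrix

         # w' C w = |M' w|^2 / (n - 1) >= 0 for every w
         w = rng.normal(size=3)
         print(w @ C @ w, np.linalg.norm(M.T @ w) ** 2 / (M.shape[1] - 1))   # equal, nonnegative

         print(np.linalg.eigvalsh(C) >= -1e-12)     # all eigenvalues nonnegative (up to rounding)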
