  1. Probability and Statistics for Computer Science: Principal Component Analysis --- Exploring the data in fewer dimensions. Credit: Wikipedia. Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 10.27.2020

  2. Last time ✺ Review of Bayesian inference ✺ Visualizing high dimensional data & Summarizing data ✺ The covariance matrix

  3. Objectives ✺ Principal Component Analysis ✺ Examples of PCA

  4. Diagonalization of a symmetric matrix ✺ If A is an n × n symmetric square matrix, its eigenvalues are real. ✺ If the eigenvalues are also distinct, their eigenvectors are orthogonal. ✺ We can then scale the eigenvectors to unit length and place them into an orthogonal matrix U = [ u_1 u_2 ... u_n ]. ✺ We can write the diagonal matrix Λ = U^T A U such that the diagonal entries of Λ are λ_1, λ_2, ..., λ_n in that order.

  5. Diagonalization example ✺ For A = [ 5 3 ; 3 5 ]
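For reference, a minimal NumPy sketch (not part of the slides) that diagonalizes this example matrix:

    import numpy as np

    A = np.array([[5.0, 3.0],
                  [3.0, 5.0]])

    # eigh is meant for symmetric matrices: it returns real eigenvalues (ascending)
    # and orthonormal eigenvectors as the columns of U.
    eigvals, U = np.linalg.eigh(A)

    # The slides sort eigenvalues in decreasing order, so flip them here.
    order = np.argsort(eigvals)[::-1]
    eigvals, U = eigvals[order], U[:, order]

    Lambda = U.T @ A @ U
    print(np.round(eigvals, 6))   # [8. 2.]
    print(np.round(Lambda, 6))    # [[8. 0.]
                                  #  [0. 2.]]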

  6. Covariance for a pair of components in a data set ✺ For the jth and kth components of a data set {x}: cov({x}; j, k) = Σ_i ( x_i^(j) - mean({x^(j)}) )( x_i^(k) - mean({x^(k)}) ) / N
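A small sketch of this formula in NumPy (my own code, assuming the data set is stored as a d × N array with components in rows and items in columns, as in the slides):

    import numpy as np

    def cov_jk(x, j, k):
        """cov({x}; j, k) for a d x N data matrix x, using the 1/N definition above."""
        N = x.shape[1]
        xj = x[j] - x[j].mean()   # center the jth component
        xk = x[k] - x[k].mean()   # center the kth component
        return xj @ xk / N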

  7. Covariance matrix (Figure: a data set {x} stored as a 7×8 matrix, d = 7 components by N = 8 items, next to its 7×7 covariance matrix Covmat({x}); the entry in row 3, column 5 of Covmat({x}) is cov({x}; 3, 5).)

  8. Properties of Covariance matrix ✺ The diagonal elements of the covariance matrix are just the variances of each jth component: cov({x}; j, j) = var({x^(j)}) ✺ The off-diagonals are covariances between different components.

  9. Properties of Covariance matrix ✺ The covariance matrix is symmetric: cov({x}; j, k) = cov({x}; k, j) ✺ And it's positive semi-definite, that is, all λ_i ≥ 0 ✺ The covariance matrix is diagonalizable
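A quick numerical sanity check of these properties (my own sketch on a random data set):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(7, 8))                      # a random 7 x 8 data set (d = 7, N = 8)
    C = np.cov(x)                                    # its 7 x 7 covariance matrix

    print(np.allclose(C, C.T))                       # True: symmetric
    print(np.all(np.linalg.eigvalsh(C) >= -1e-12))   # True: eigenvalues >= 0 (PSD)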

  10. Properties of Covariance matrix ✺ If we define x_c as the mean-centered matrix for dataset {x}, then Covmat({x}) = x_c x_c^T / N ✺ The covariance matrix is a d × d matrix (d = 7 in the running figure)
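This matrix form translates directly to NumPy; a minimal sketch under the same d × N layout (the helper name is my own):

    import numpy as np

    def covmat(x):
        """Covmat({x}) = x_c x_c^T / N for a d x N data matrix x."""
        N = x.shape[1]
        xc = x - x.mean(axis=1, keepdims=True)   # mean-center each component (row)
        return xc @ xc.T / N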

  11. Example: covariance matrix of a data set (I) What are the dimensions of the covariance matrix of this data? A_0 = [ X^(1) ; X^(2) ] = [ 5 4 3 2 1 ; -1 1 0 1 -1 ] A) 2 by 2 B) 5 by 5 C) 5 by 2 D) 2 by 5

  12. Example: covariance matrix of a data set (I) A_0 = [ 5 4 3 2 1 ; -1 1 0 1 -1 ] (II) Mean centering: A_1 = [ 2 1 0 -1 -2 ; -1 1 0 1 -1 ], then A_2 = A_1 A_1^T, the inner product of each pair of rows: A_2[1,1] = 10, A_2[2,2] = 4, A_2[1,2] = 0 (III) Divide the matrix by N, the number of data points: Covmat({x}) = (1/N) A_2 = (1/5) [ 10 0 ; 0 4 ] = [ 2 0 ; 0 0.8 ]
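The same arithmetic, checked in NumPy (a sketch, not course code):

    import numpy as np

    A0 = np.array([[ 5, 4, 3, 2, 1],
                   [-1, 1, 0, 1, -1]], dtype=float)

    A1 = A0 - A0.mean(axis=1, keepdims=True)   # mean centering: [[2, 1, 0, -1, -2], [-1, 1, 0, 1, -1]]
    A2 = A1 @ A1.T                             # [[10, 0], [0, 4]]
    print(A2 / A0.shape[1])                    # [[2.  0. ]
                                               #  [0.  0.8]]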

  13. What do the data look like when Covmat({x}) is diagonal? A_0 = [ 5 4 3 2 1 ; -1 1 0 1 -1 ], Covmat({x}) = (1/N) A_2 = (1/5) [ 10 0 ; 0 4 ] = [ 2 0 ; 0 0.8 ] (Figure: scatter plot of the data in the X^(1), X^(2) plane.)

  14. What is the correlation between the 2 components for the data m? Covmat(m) = [ 20 25 ; 25 40 ]
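For reference (my own computation, not shown on the slide): corr = cov(1, 2) / (σ_1 σ_2) = 25 / √(20 · 40) = 25 / √800 ≈ 0.88.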

  15. Q. Is this true? Transforming a data matrix with an orthonormal matrix only rotates the data. A. Yes B. No

  16. Dimension Reduction ✺ Instead of showing more dimensions through visualization, it is a good idea to do dimension reduction in order to see the major features of the data set. ✺ For example, principal component analysis helps find the major components of the data set. ✺ PCA is essentially about finding eigenvectors of the covariance matrix of the data set {x}.

  17. Dimension reduction from 2D to 1D Credit: Prof. Forsyth

  18. Step 1: subtract the mean Credit: Prof. Forsyth

  19. Step 2: Rotate to diagonalize the covariance Credit: Prof. Forsyth

  20. Step 3: Drop component(s) Credit: Prof. Forsyth

  21. Principal Components ✺ The columns of U are the normalized eigenvectors of Covmat({x}) and are called the principal components of the data {x}

  22. Principal components analysis ✺ We reduce the dimensionality of dataset {x}, represented by the d × n matrix D, from d to s (s < d). ✺ Step 1. Define the d × n matrix m such that m = D - mean(D) ✺ Step 2. Define the d × n matrix r such that r_i = U^T m_i, where U satisfies Λ = U^T Covmat({x}) U, the diagonalization of Covmat({x}) with the eigenvalues sorted in decreasing order, and U is the orthonormal eigenvectors' matrix ✺ Step 3. Define the d × n matrix p such that p is r with the last d - s components of r made zero
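The three steps translate into a short NumPy sketch (my own function name; it uses the 1/N covariance from slide 10, whereas some programs divide by N - 1 as slide 25 notes):

    import numpy as np

    def pca_project(D, s):
        """Project a d x n data matrix D onto its first s principal components."""
        # Step 1: subtract the mean of each component.
        m = D - D.mean(axis=1, keepdims=True)

        # Diagonalize Covmat({x}); sort eigenvalues in decreasing order.
        C = m @ m.T / D.shape[1]
        eigvals, U = np.linalg.eigh(C)
        order = np.argsort(eigvals)[::-1]
        eigvals, U = eigvals[order], U[:, order]

        # Step 2: rotate every data item into the eigenvector basis.
        r = U.T @ m

        # Step 3: zero out the last d - s components.
        p = r.copy()
        p[s:, :] = 0.0
        return p, r, U, eigvals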

  23. What happened to the mean? ✺ Step 1. mean(m) = mean(D - mean(D)) = 0 ✺ Step 2. mean(r) = U^T mean(m) = U^T 0 = 0 ✺ Step 3. mean(p_i) = mean(r_i) = 0 for i ∈ 1 : s, and mean(p_i) = 0 for i ∈ s + 1 : d

  24. What happened to the covariances? ✺ Step 1. Covmat(m) = Covmat(D) = Covmat({x}) ✺ Step 2. Covmat(r) = U^T Covmat(m) U = Λ ✺ Step 3. Covmat(p) is Λ with the last/smallest d - s diagonal terms turned to 0.
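A one-line justification for Step 2 (my paraphrase, using the mean-centered form from slide 10): since mean(r) = 0, Covmat(r) = r r^T / N = (U^T m)(U^T m)^T / N = U^T (m m^T / N) U = U^T Covmat(m) U.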

  25. Sample covariance matrix ✺ In many statistical programs, the sample covariance matrix is defined to be Covmat(m) = m m^T / (N - 1) ✺ Similar to what happens with the unbiased standard deviation.
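For instance, NumPy's np.cov follows this N - 1 convention; a quick check (my own snippet) using the mean-centered matrix m from the example on the next slides:

    import numpy as np

    m = np.array([[3, -4, 7, 1, -4, -3],
                  [7, -6, 8, -1, -1, -7]], dtype=float)

    # np.cov treats rows as variables and divides by N - 1 by default.
    print(np.cov(m))                      # [[20. 25.], [25. 40.]]
    print(m @ m.T / (m.shape[1] - 1))     # same thing, since m is already mean-centered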

  26. PCA an example ✺ Step 1. D = [ 3 -4 7 1 -4 -3 ; 7 -6 8 -1 -1 -7 ] ⇒ mean(D) = [ 0 ; 0 ], so m = D - mean(D) = [ 3 -4 7 1 -4 -3 ; 7 -6 8 -1 -1 -7 ] ✺ Step 2. ✺ Step 3.

  27. PCA an example ✺ Step 1. D = [ 3 -4 7 1 -4 -3 ; 7 -6 8 -1 -1 -7 ] ⇒ mean(D) = [ 0 ; 0 ], so m = [ 3 -4 7 1 -4 -3 ; 7 -6 8 -1 -1 -7 ] ✺ Step 2. Covmat(m) = [ 20 25 ; 25 40 ] ⇒ λ_1 ≃ 57, λ_2 ≃ 3; U^T = [ 0.5606288 0.8280672 ; -0.8280672 0.5606288 ] ⇒ U = [ 0.5606288 -0.8280672 ; 0.8280672 0.5606288 ] ✺ Step 3.

  28. PCA an example ✺ Step 1. D = [ 3 -4 7 1 -4 -3 ; 7 -6 8 -1 -1 -7 ] ⇒ mean(D) = [ 0 ; 0 ], so m = [ 3 -4 7 1 -4 -3 ; 7 -6 8 -1 -1 -7 ] ✺ Step 2. Covmat(m) = [ 20 25 ; 25 40 ] ⇒ λ_1 ≃ 57, λ_2 ≃ 3; U^T = [ 0.5606288 0.8280672 ; -0.8280672 0.5606288 ] ⇒ U = [ 0.5606288 -0.8280672 ; 0.8280672 0.5606288 ] ⇒ r = U^T m = [ 7.478 -7.211 10.549 -0.267 -3.071 -7.478 ; 1.440 -0.052 -1.311 -1.389 2.752 -1.440 ] ✺ Step 3.

  29. PCA an example ✺ Step 1. D = [ 3 -4 7 1 -4 -3 ; 7 -6 8 -1 -1 -7 ] ⇒ mean(D) = [ 0 ; 0 ], so m = [ 3 -4 7 1 -4 -3 ; 7 -6 8 -1 -1 -7 ] ✺ Step 2. Covmat(m) = [ 20 25 ; 25 40 ] ⇒ λ_1 ≃ 57, λ_2 ≃ 3; U^T = [ 0.5606288 0.8280672 ; -0.8280672 0.5606288 ] ⇒ U = [ 0.5606288 -0.8280672 ; 0.8280672 0.5606288 ] ⇒ r = U^T m = [ 7.478 -7.211 10.549 -0.267 -3.071 -7.478 ; 1.440 -0.052 -1.311 -1.389 2.752 -1.440 ] ✺ Step 3. ⇒ p = [ 7.478 -7.211 10.549 -0.267 -3.071 -7.478 ; 0 0 0 0 0 0 ]
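As a numerical check of this worked example, here is a small NumPy sketch (my own code; eigenvector signs from eigh are arbitrary, so the rows of r may come out with flipped signs relative to the slide):

    import numpy as np

    D = np.array([[3, -4, 7, 1, -4, -3],
                  [7, -6, 8, -1, -1, -7]], dtype=float)

    m = D - D.mean(axis=1, keepdims=True)      # Step 1 (the mean is already 0 here)
    C = np.cov(m)                              # [[20, 25], [25, 40]], divides by N - 1

    eigvals, U = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]          # decreasing order: ~57, ~3
    eigvals, U = eigvals[order], U[:, order]

    r = U.T @ m                                # Step 2
    p = r.copy()
    p[1:, :] = 0.0                             # Step 3: keep s = 1 component
    print(np.round(r, 3))
    print(np.round(p, 3))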

  30. What is this matrix for the previous example? U^T Covmat(m) U = ?

  31. What is this matrix for the previous example? U^T Covmat(m) U = [ 57 0 ; 0 3 ]

  32. The Mean square error of the projection ✺ The mean square error is the sum of the smallest d - s eigenvalues in Λ: (1/(N-1)) Σ_i ‖r_i - p_i‖² = (1/(N-1)) Σ_i Σ_{j=s+1..d} (r_i^(j))²

  33. The Mean square error of the projection ✺ The mean square error is the sum of the smallest d - s eigenvalues in Λ: (1/(N-1)) Σ_i ‖r_i - p_i‖² = (1/(N-1)) Σ_i Σ_{j=s+1..d} (r_i^(j))² = Σ_{j=s+1..d} (1/(N-1)) Σ_i (r_i^(j))²

  34. The Mean square error of the projection ✺ The mean square error is the sum of the smallest d - s eigenvalues in Λ: (1/(N-1)) Σ_i ‖r_i - p_i‖² = (1/(N-1)) Σ_i Σ_{j=s+1..d} (r_i^(j))² = Σ_{j=s+1..d} (1/(N-1)) Σ_i (r_i^(j))² = Σ_{j=s+1..d} var({r^(j)})

  35. The Mean square error of the projection ✺ The mean square error is the sum of the smallest d - s eigenvalues in Λ: (1/(N-1)) Σ_i ‖r_i - p_i‖² = (1/(N-1)) Σ_i Σ_{j=s+1..d} (r_i^(j))² = Σ_{j=s+1..d} (1/(N-1)) Σ_i (r_i^(j))² = Σ_{j=s+1..d} var({r^(j)}) = Σ_{j=s+1..d} λ_j
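Continuing the numeric example from slides 26-29 (my own check, not on the slides): with s = 1 the mean square error should come out to λ_2 ≈ 3.

    import numpy as np

    D = np.array([[3, -4, 7, 1, -4, -3],
                  [7, -6, 8, -1, -1, -7]], dtype=float)
    m = D - D.mean(axis=1, keepdims=True)

    eigvals, U = np.linalg.eigh(np.cov(m))
    order = np.argsort(eigvals)[::-1]
    eigvals, U = eigvals[order], U[:, order]

    r = U.T @ m
    p = r.copy()
    p[1:, :] = 0.0                              # keep s = 1 component

    N = D.shape[1]
    mse = np.sum((r - p) ** 2) / (N - 1)
    print(mse, eigvals[1:].sum())               # both ~3.07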
