  1. Scientific Computing Maastricht Science Program Week 5 Frans Oliehoek <frans.oliehoek@maastrichtuniversity.nl>

  2. Announcements
     - I will be more strict!
     - Requirements updated... YOU are responsible that the submission satisfies the requirements!!!
     - I will not email you until the rest have their marks.

  3. Recap: Last Two Weeks
     - Supervised learning: find f that maps {x_1^(j), ..., x_D^(j)} → y^(j)
       - Interpolation: f goes through the data points
       - Linear regression: lossy fit, minimizes the 'vertical' SSE
     - Unsupervised learning: we just have the data points {x_1^(j), ..., x_D^(j)}
       - PCA: minimizes the orthogonal projection error
     [Figure: 2-D scatter plot with axes x_1, x_2 and a direction u = (u_1, u_2)]

  4. Recap: Clustering
     - Clustering or cluster analysis has many applications:
       - Understanding: astronomy, biology, etc.
       - Data (pre)processing: summarization of a data set, compression
     - Are there questions about k-means clustering?

  5. This Lecture
     - Last week: unlabeled data (also 'unsupervised learning'), i.e. the data is just x
       - Clustering
       - Principal component analysis (PCA) – what?
     - This week:
       - Principal component analysis (PCA) – how?
       - Numerical differentiation and integration

  6. Part 1: Principal Component Analysis ● Recap ● How to do it?

  7. PCA – Intuition
     - How would you summarize this data using 1 dimension? (Which variable contains the most information?)
     - Very important idea: the most information is contained by the variable with the largest spread, i.e. the highest variance (information theory).
     [Figure: 2-D scatter plot with axes x_1 and x_2]

  8. PCA – Intuition
     - How would you summarize this data using 1 dimension? (Which variable contains the most information?)
     - Very important idea: the most information is contained by the variable with the largest spread, i.e. the highest variance (information theory).
     - So if we have to choose between x_1 and x_2 → remember x_2.
     - Transform of the k-th point: (x_1^(k), x_2^(k)) → (z_1^(k)), where z_1^(k) = x_2^(k).

  9. PCA – Intuition
     - How would you summarize this data using 1 dimension?
     - Transform of the k-th point (a code sketch follows below): (x_1^(k), x_2^(k)) → (z_1^(k)), where z_1 is the orthogonal scalar projection onto the unit vector u^(1):
       z_1^(k) = u_1^(1) x_1^(k) + u_2^(1) x_2^(k) = (u^(1), x^(k))
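
     A minimal MATLAB sketch of this projection, using made-up example points and an assumed (not computed) unit direction u1:

         % Project 2-D points onto a unit direction u1 to get the scores z1.
         X  = [2.5 0.5 2.2 1.9;      % row 1: x_1 values of 4 example points
               2.4 0.7 2.9 2.2];     % row 2: x_2 values
         u1 = [1; 1] / sqrt(2);      % assumed unit-length direction u^(1)
         z1 = u1' * X;               % z1(k) = u1(1)*x1(k) + u1(2)*x2(k) = (u^(1), x^(k))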

  10. More Principal Components
     - u^(2) is the direction with the most 'remaining' variance, orthogonal to u^(1)!
     - In general:
       - If the data is D-dimensional, we can find D directions u^(1), ..., u^(D)
       - Each direction is itself a D-vector: u^(i) = (u_1^(i), ..., u_D^(i))
       - Each direction is orthogonal to the others: (u^(i), u^(j)) = 0
       - The first direction has the most variance
       - The least variance is in direction u^(D)

  11. PCA – Goals
     - All directions of high variance might be useful in themselves: analysis of the data.
     - In the lab you will analyze the ECG signal of a patient with a heart disease.

  12. PCA – Goals
     - All directions of high variance might be useful in themselves, but not for dimension reduction...
     - Given X (N data points of D variables), convert to Z (N data points of d variables):
       (x_1^(0), x_2^(0), ..., x_D^(0)) → (z_1^(0), z_2^(0), ..., z_d^(0))
       (x_1^(1), x_2^(1), ..., x_D^(1)) → (z_1^(1), z_2^(1), ..., z_d^(1))
       ...
       (x_1^(n), x_2^(n), ..., x_D^(n)) → (z_1^(n), z_2^(n), ..., z_d^(n))
     - The vector (z_i^(0), z_i^(1), ..., z_i^(n)) is called the i-th principal component (of the data set).

  13. PCA – Dimension Reduction
     - Approach:
       - Step 1: find all directions (and principal components):
         (x_1^(0), x_2^(0), ..., x_D^(0)) → (z_1^(0), z_2^(0), ..., z_D^(0))
         (x_1^(1), x_2^(1), ..., x_D^(1)) → (z_1^(1), z_2^(1), ..., z_D^(1))
         ...
         (x_1^(n), x_2^(n), ..., x_D^(n)) → (z_1^(n), z_2^(n), ..., z_D^(n))
       - Step 2: ...?

  14. PCA – Dimension Reduction
     - Approach (see the sketch below):
       - Step 1: find all directions (and principal components):
         (x_1^(k), x_2^(k), ..., x_D^(k)) → (z_1^(k), z_2^(k), ..., z_D^(k))   for every point k
       - Step 2: keep only the directions with the most information. The first d < D PCs contain the high variance, i.e. the principal components with much information:
         (x_1^(k), x_2^(k), ..., x_D^(k)) → (z_1^(k), z_2^(k), ..., z_d^(k))   for every point k
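
     A minimal MATLAB sketch of the two steps, assuming the directions are already available as the columns of a D x D matrix U and that X is the (zero-mean) D x N data matrix; U, X and the choice d = 2 are assumptions for illustration:

         Z_full = U' * X;          % step 1: all D principal components, one PC per row
         d      = 2;               % step 2: number of directions to keep (chosen arbitrarily here)
         Z      = Z_full(1:d, :);  % d x N compressed representation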

  16. PCA – More Concrete
     - PCA:
       - finding all the directions, and
       - the principal components
     - Data compression using PCA:
       - computing the compressed representation
       - computing the reconstruction

  17. PCA – More Concrete
     - PCA:
       - finding all the directions (using the eigen decomposition of the covariance matrix) → still to be shown
       - the principal components → easy! For the k-th point: z_j^(k) = (u^(j), x^(k))
     - Data compression using PCA (see the sketch below):
       - computing the compressed representation → easy! For the k-th point just keep (z_1^(k), ..., z_d^(k))
       - computing the reconstruction → still to be shown (we show that the data is a linear combination of the PCs)
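
     A minimal MATLAB sketch of compression and reconstruction for a single point x, assuming U(:, j) holds the direction u^(j) and d is the number of kept components; the reconstruction x ≈ z_1 u^(1) + ... + z_d u^(d) is the "linear combination of the PCs" the slide refers to, stated here as an assumption rather than derived:

         z  = U' * x;           % z(j) = (u^(j), x): all D coefficients
         zc = z(1:d);           % compressed representation: keep only the first d
         xr = U(:, 1:d) * zc;   % approximate reconstruction of x from the kept PCs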

  19. Computing the Directions U
     - Note: X is now D x N (before: N x D).
     - Algorithm (X is the D x N data matrix):
       1) Preprocessing: scale the features; make X zero mean
       2) Compute the data covariance matrix
       3) Perform the eigen decomposition
     - The directions u_i are the eigenvectors of C; the variance along u_i is the corresponding eigenvalue.

  20. Computing the Directions U
     - Step 1) Preprocessing, scale the features (sketch below):
       x_i^(k) ← x_i^(k) / (max_l x_i^(l) − min_l x_i^(l))
     (Same algorithm overview as slide 19.)
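
     One way this scaling might look in MATLAB, assuming X is the D x N data matrix (an illustrative sketch, not code from the slides):

         % Divide each feature (row of X) by its range across all N points.
         range = max(X, [], 2) - min(X, [], 2);     % D x 1 vector of per-feature ranges
         X     = X ./ repmat(range, 1, size(X, 2)); % scaled features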

  21. Computing the Directions U
     - Step 1) Preprocessing, make X zero mean (sketch below):
       - Compute the mean data point: μ_i = (1/N) Σ_{k=1..N} x_i^(k)
       - Subtract the mean from each point: x^(k) ← x^(k) − μ
     (Same algorithm overview as slide 19.)
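
     A corresponding MATLAB sketch, again assuming X is the D x N data matrix:

         mu = mean(X, 2);                    % D x 1 mean data point
         X  = X - repmat(mu, 1, size(X, 2)); % subtract it from every point: X is now zero mean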

  22. Computing the Directions U
     - Step 2) Compute the data covariance matrix (sketch below):
       C = (1/N) X X^T
     (Same algorithm overview as slide 19.)
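
     In MATLAB, assuming X is already scaled and zero mean (a sketch under those assumptions):

         N = size(X, 2);
         C = (1 / N) * (X * X');   % D x D data covariance matrix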

  23. Computing the Directions U
     - Step 3) Perform the eigen decomposition:
       - A square matrix has eigenvectors: vectors that it maps to a multiple of themselves,
         C x = λ x,   with x an eigenvector and λ the (scalar) eigenvalue.
     - The directions u_i are the eigenvectors of C; the variance along u_i is the corresponding eigenvalue.

  24. Computing the Directions U
     - Step 3) Perform the eigen decomposition, in MATLAB (alternative sketch below):

           [eigenvectors, eigenvals] = eig(C)
           % 'eig' delivers the eigenvectors in the wrong order,
           % so we flip the matrix
           U = fliplr(eigenvectors)
           % U(:, i) now is the i-th direction

     - A square matrix has eigenvectors, which it maps to a multiple of themselves: C x = λ x.
     - The directions u_i are the eigenvectors of C; the variance along u_i is the corresponding eigenvalue.
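
     An alternative sketch that orders the eigenpairs explicitly by descending eigenvalue instead of relying on fliplr, and also extracts the variances (an assumption about how one might implement it, not code from the slides):

         [V, L] = eig(C);                                % eigenvectors (columns of V), eigenvalues (diagonal of L)
         [variances, order] = sort(diag(L), 'descend');  % variance along u_i = i-th largest eigenvalue
         U = V(:, order);                                % U(:, i) is the i-th direction u_i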

  25. PCA – More Concrete (recap of slide 17)
     - PCA:
       - finding all the directions (using the eigen decomposition of the covariance matrix)
       - the principal components: for the k-th point, z_j^(k) = (u^(j), x^(k))
     - Data compression using PCA:
       - computing the compressed representation: for the k-th point just keep (z_1^(k), ..., z_d^(k))
       - computing the reconstruction → still to be shown (we show that the data is a linear combination of the PCs)
