Lecture 24: Principal Component Analysis (Aykut Erdem, January 2017, Hacettepe University)


slide-1
SLIDE 1

Lecture 24: Principal Component Analysis

Aykut Erdem

January 2017, Hacettepe University

slide-2
SLIDE 2

This week

  • Motivation
  • PCA algorithms
  • Applications
  • PCA shortcomings
  • Autoencoders
  • Kernel PCA

2

slide-3
SLIDE 3

PCA Applications

  • Data Visualization
  • Data Compression
  • Noise Reduction
  • Learning
  • Anomaly detection

3

slide by Barnabás Póczos and Aarti Singh

slide-4
SLIDE 4

Data Visualization

Example:

  • Given 53 blood and urine samples (features) from 65 people.

  • How can we visualize the measurements?

4

slide by Barnabás Póczos and Aarti Singh

slide-5
SLIDE 5

Data Visualization

  • Matrix format (65 × 53)

        H-WBC   H-RBC   H-Hgb   H-Hct   H-MCV   H-MCH   H-MCHC
  A1    8.0     4.82    14.1    41.0    85.0    29.0    34.0
  A2    7.3     5.02    14.7    43.0    86.0    29.0    34.0
  A3    4.3     4.48    14.1    41.0    91.0    32.0    35.0
  A4    7.5     4.47    14.9    45.0    101.0   33.0    33.0
  A5    7.3     5.52    15.4    46.0    84.0    28.0    33.0
  A6    6.9     4.86    16.0    47.0    97.0    33.0    34.0
  A7    7.8     4.68    14.7    43.0    92.0    31.0    34.0
  A8    8.6     4.82    15.8    42.0    88.0    33.0    37.0
  A9    5.1     4.71    14.0    43.0    92.0    30.0    32.0

Rows: instances (people); columns: features. Difficult to see the correlations between the features...

5

slide by Barnabás Póczos and Aarti Singh

slide-6
SLIDE 6

Data Visualization

  • Spectral format (65 curves, one for each person)

6

(Figure: measurement value vs. measurement number, one curve per person)

Difficult to compare the different patients...

slide by Barnabás Póczos and Aarti Singh

slide-7
SLIDE 7

Data Visualization

  • Spectral format (53 pictures, one for each feature)

7

(Figure: H-Bands value vs. person)

  • Difficult to see the correlations between the features...

slide by Barnabás Póczos and Aarti Singh

slide-8
SLIDE 8

Data Visualization

8

(Figures: bi-variate scatter plot of C-LDH vs. C-Triglycerides; tri-variate scatter plot of C-Triglycerides, C-LDH, and M-EPI)

How can we visualize the other variables? ... difficult to see in 4- or higher-dimensional spaces...

slide by Barnabás Póczos and Aarti Singh

slide-9
SLIDE 9

Data Visualization

  • Is there a representation better than the coordinate axes?
  • Is it really necessary to show all the 53 dimensions?
  • ... what if there are strong correlations between the features?
  • How could we find the smallest subspace of the 53-D space that keeps the most information about the original data?

  • A solution: Principal Component Analysis

9

slide by Barnabás Póczos and Aarti Singh

slide-10
SLIDE 10

PCA algorithms

10

slide-11
SLIDE 11

Principal Component Analysis

PCA: Orthogonal projection of the data onto a lower-dimensional linear space that...

  • maximizes the variance of the projected data (purple line)
  • minimizes the mean squared distance between the data points and their projections (sum of blue lines)

11

slide by Barnabás Póczos and Aarti Singh
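A short side note (not on the slide) on why the two criteria above pick the same direction: for centered data and a unit vector w, the reconstruction error decomposes into a constant minus the projected variance, so minimizing one maximizes the other.

```latex
\frac{1}{m}\sum_{i=1}^{m}\bigl\|x_i - w\,(w^{\top}x_i)\bigr\|^{2}
  \;=\; \frac{1}{m}\sum_{i=1}^{m}\|x_i\|^{2}
  \;-\; \frac{1}{m}\sum_{i=1}^{m}(w^{\top}x_i)^{2},
\qquad \|w\| = 1 .
```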
slide-12
SLIDE 12

Principal Component Analysis

Idea:

  • Given data points in a d-dimensional space, project them into a lower-dimensional space while preserving as much information as possible.
  • Find the best planar approximation to 3D data
  • Find the best 12-D approximation to 10⁴-D data
  • In particular, choose the projection that minimizes the squared error in reconstructing the original data.

12

slide by Barnabás Póczos and Aarti Singh

slide-13
SLIDE 13

Principal Component Analysis

  • PCA vectors originate from the center of mass.

  • Principal component #1 points in the direction of the largest variance.

  • Each subsequent principal component is orthogonal to the previous ones and points in the direction of the largest variance of the residual subspace.

13

slide by Barnabás Póczos and Aarti Singh
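A small numerical illustration (my own, not from the slides) of the next three figures: draw a correlated 2-D Gaussian sample, center it, and read the two PCA axes off the eigenvectors of the sample covariance. The names `X`, `S`, and `pcs` are just for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated 2-D Gaussian dataset, one row per data point
cov = np.array([[3.0, 1.5],
                [1.5, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=500)

# PCA vectors originate from the center of mass: subtract the mean
Xc = X - X.mean(axis=0)

# Eigendecomposition of the sample covariance (2 x 2 here)
S = Xc.T @ Xc / Xc.shape[0]
eigvals, eigvecs = np.linalg.eigh(S)        # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]
eigvals, pcs = eigvals[order], eigvecs[:, order]

print("1st PCA axis:", pcs[:, 0], "variance:", eigvals[0])
print("2nd PCA axis:", pcs[:, 1], "variance:", eigvals[1])
# The two axes are orthogonal: pcs[:, 0] @ pcs[:, 1] is ~ 0
```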

slide-14
SLIDE 14

2D Gaussian dataset

14

slide by Barnabás Póczos and Aarti Singh

slide-15
SLIDE 15

1st PCA axis

15

slide by Barnabás Póczos and Aarti Singh

slide-16
SLIDE 16

2nd PCA axis

16

slide by Barnabás Póczos and Aarti Singh

slide-17
SLIDE 17

PCA algorithm I (sequential)

17

Given the centered data {x_1, …, x_m}, compute the principal vectors:

1st PCA vector:

  w_1 = \arg\max_{\|w\|=1} \frac{1}{m}\sum_{i=1}^{m} (w^\top x_i)^2

  We maximize the variance of the projection of x.

2nd PCA vector:

  w_2 = \arg\max_{\|w\|=1} \frac{1}{m}\sum_{i=1}^{m} \bigl[ w^\top (x_i - w_1 w_1^\top x_i) \bigr]^2

  We maximize the variance of the projection in the residual subspace.

(Figure: a data point x, its PCA reconstruction x' = w_1(w_1^\top x), and the residual x − x')

slide by Barnabás Póczos and Aarti Singh

slide-18
SLIDE 18

PCA algorithm I (sequential)

18

Given w_1, …, w_{k-1}, we compute the k-th principal vector as before:

k-th PCA vector:

  w_k = \arg\max_{\|w\|=1} \frac{1}{m}\sum_{i=1}^{m} \Bigl[ w^\top \Bigl( x_i - \sum_{j=1}^{k-1} w_j w_j^\top x_i \Bigr) \Bigr]^2

  We maximize the variance of the projection in the residual subspace.

(Figure: x, its PCA reconstruction x' = w_1(w_1^\top x) + w_2(w_2^\top x), and the residual x − x')

slide by Barnabás Póczos and Aarti Singh
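A sketch of this sequential view (my own helper, assuming centered data points in the rows of `X`): find each direction by maximizing the variance of the residual, here via power iteration on the residual covariance, then deflate.

```python
import numpy as np

def sequential_pca(X, k, n_iter=200):
    """Sequential PCA by deflation.

    X : (m, d) array of centered data points (one per row).
    Returns W : (d, k) with the principal vectors as columns.
    """
    m, d = X.shape
    R = X.copy()                      # residual data
    W = np.zeros((d, k))
    for comp in range(k):
        S = R.T @ R / m               # covariance of the residual
        w = np.random.default_rng(comp).standard_normal(d)
        for _ in range(n_iter):       # power iteration -> top eigenvector of S
            w = S @ w
            w /= np.linalg.norm(w)
        W[:, comp] = w
        R = R - np.outer(R @ w, w)    # remove the projection onto w (deflation)
    return W

# Toy usage on random 5-D data
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 5))
X = X - X.mean(axis=0)
W = sequential_pca(X, k=2)
print(W.T @ W)    # ~ identity: the components are orthonormal
```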

slide-19
SLIDE 19

PCA algorithm II (sample covariance matrix)

  • Given data {x_1, …, x_m}, compute the covariance matrix Σ:

      \Sigma = \frac{1}{m}\sum_{i=1}^{m} (x_i - \bar{x})(x_i - \bar{x})^\top ,
      \qquad \bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i

  • PCA basis vectors = the eigenvectors of Σ
  • Larger eigenvalue ⇒ more important eigenvector

19

slide by Barnabás Póczos and Aarti Singh

slide-20
SLIDE 20

PCA algorithm II 
 (sample covariance matrix)

20

PCA algorithm(X, k): top k eigenvalues/eigenvectors
  % X = N × m data matrix,
  % ... each data point x_i = column vector, i = 1..m

  • x̄ ← (1/m) Σ_{i=1..m} x_i
  • X ← subtract mean x̄ from each column vector x_i in X
  • Σ ← X Xᵀ                        ... covariance matrix of X
  • {λ_i, u_i}_{i=1..N} = eigenvectors/eigenvalues of Σ,  λ_1 ≥ λ_2 ≥ … ≥ λ_N
  • Return {λ_i, u_i}_{i=1..k}      % top k PCA components

slide by Barnabás Póczos and Aarti Singh
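A runnable counterpart to the pseudocode above (a sketch, keeping the slide's convention of data points as columns of `X`; the 1/m factor only rescales the eigenvalues, not the eigenvectors).

```python
import numpy as np

def pca_algorithm(X, k):
    """PCA algorithm II: X is an (N, m) matrix with data points as columns."""
    x_bar = X.mean(axis=1, keepdims=True)       # mean data point
    Xc = X - x_bar                              # subtract the mean from every column
    Sigma = Xc @ Xc.T / X.shape[1]              # sample covariance matrix (N x N)
    eigvals, eigvecs = np.linalg.eigh(Sigma)    # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]           # sort so lambda_1 >= lambda_2 >= ...
    return eigvals[order][:k], eigvecs[:, order][:, :k]

# Toy usage: 10-dimensional points, keep the top 3 components
rng = np.random.default_rng(1)
X = rng.standard_normal((10, 200))
lam, U = pca_algorithm(X, k=3)
print(lam)        # top 3 eigenvalues, descending
print(U.shape)    # (10, 3): the top 3 principal vectors as columns
```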

slide-21
SLIDE 21

PCA algorithm III 
 (SVD of the data matrix)

21

Singular Value Decomposition of the centered data matrix X:

  X_{features × samples} = U S Vᵀ

(Figure: block diagram of X = U S Vᵀ over the samples; the leading singular values/vectors carry the significant structure, the trailing ones mostly noise)

slide by Barnabás Póczos and Aarti Singh
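A sketch of algorithm III with NumPy (my own toy example, same layout as the slide: features in rows, samples in columns): the columns of U are the principal directions, and the singular values give the eigenvalue spectrum of the covariance via λ_i = s_i² / m.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((53, 65))              # features x samples, as on the slide
Xc = X - X.mean(axis=1, keepdims=True)         # center each feature

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 5
principal_vectors = U[:, :k]                   # columns of U: principal directions
coefficients = np.diag(s[:k]) @ Vt[:k, :]      # S V^T: coordinates of each sample
X_hat = principal_vectors @ coefficients       # rank-k reconstruction of Xc

# Same eigenvalues as the covariance route: lambda_i = s_i**2 / m
print(s[:k] ** 2 / Xc.shape[1])
print(np.linalg.norm(Xc - X_hat))              # reconstruction error with k components
```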

slide-22
SLIDE 22

PCA algorithm III

22

  • Columns of U
    - the principal vectors, {u(1), …, u(k)}
    - orthogonal and of unit norm, so UᵀU = I
    - we can reconstruct the data using linear combinations of {u(1), …, u(k)}
  • Matrix S
    - diagonal
    - shows the importance of each eigenvector
  • Columns of Vᵀ
    - the coefficients for reconstructing the samples

slide by Barnabás Póczos and Aarti Singh

slide-23
SLIDE 23

Applications

23

slide-24
SLIDE 24

Face Recognition

24

slide-25
SLIDE 25

Face Recognition

  • Want to identify specific person, based on facial image
  • Robust to glasses, lighting, …
  • Can’t just use the given 256 x 256 pixels

25


slide by Barnabás Póczos and Aarti Singh

slide-26
SLIDE 26

Applying PCA: Eigenfaces

26

Example data set: Images of faces

  • Famous Eigenface approach

[Turk & Pentland], [Sirovich & Kirby]

Each face x is ...

  • 256 × 256 values (luminance at each location)
  • x in ℝ^(256·256) (view as a 64K-dimensional vector)

Form X = [x1, …, xm], the centered data matrix.
Compute Σ = X Xᵀ.
Problem: Σ is 64K × 64K ... HUGE!!!

(Figure: X = [x1, …, xm]: m faces, each 256 × 256 real values)

Method A: Build a PCA subspace for each person and check which subspace can reconstruct the test image the best.
Method B: Build one PCA database for the whole dataset and then classify based on the weights.

slide by Barnabás Póczos and Aarti Singh
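A rough sketch of Method B (my own illustration, not code from the lecture, and it interprets "classify based on the weights" as a nearest-neighbour rule): project every training face onto a shared eigenface basis and classify a test face by the closest weight vector. The arrays `faces` and `labels` are hypothetical stand-ins for a real face dataset.

```python
import numpy as np

def fit_eigenfaces(faces, k):
    """faces: (m, d) matrix, one flattened face image per row."""
    mean_face = faces.mean(axis=0)
    U, s, Vt = np.linalg.svd(faces - mean_face, full_matrices=False)
    return mean_face, Vt[:k]                     # k eigenfaces (rows of length d)

def project(face, mean_face, eigenfaces):
    return eigenfaces @ (face - mean_face)       # weight vector of length k

def classify(test_face, train_faces, labels, mean_face, eigenfaces):
    """Method B: nearest neighbour in eigenface-weight space."""
    w_test = project(test_face, mean_face, eigenfaces)
    w_train = (train_faces - mean_face) @ eigenfaces.T
    dists = np.linalg.norm(w_train - w_test, axis=1)
    return labels[np.argmin(dists)]

# Toy usage with random "faces" standing in for real images
rng = np.random.default_rng(3)
faces = rng.standard_normal((50, 4096))          # 50 faces, 64 x 64 pixels flattened
labels = np.repeat(np.arange(10), 5)             # 10 people, 5 images each
mean_face, eigenfaces = fit_eigenfaces(faces, k=20)
print(classify(faces[0], faces, labels, mean_face, eigenfaces))
```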

slide-27
SLIDE 27

Computational Complexity

27

  • Suppose m instances, each of size N
    - Eigenfaces: m = 500 faces, each of size N = 64K
  • Given the N × N covariance matrix Σ, we can compute
    - all N eigenvectors/eigenvalues in O(N³)
    - the first k eigenvectors/eigenvalues in O(kN²)
  • But if N = 64K ... EXPENSIVE!

slide by Barnabás Póczos and Aarti Singh

slide-28
SLIDE 28

A Clever Workaround

28

  • Note that m << 64K
  • Use L = XᵀX (size m × m) instead of Σ = XXᵀ (size 64K × 64K)
  • If v is an eigenvector of L, then Xv is an eigenvector of Σ.

    Proof:   L v = λ v
             XᵀX v = λ v
             X (XᵀX v) = X (λ v) = λ (Xv)
             (XXᵀ)(X v) = λ (Xv)
             Σ (Xv) = λ (Xv)

slide by Barnabás Póczos and Aarti Singh
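A quick numerical check of this trick (a sketch with my own toy dimensions): eigenvectors of the small m × m matrix L = XᵀX are mapped through X to eigenvectors of XXᵀ, then normalized; the large covariance matrix never has to be formed.

```python
import numpy as np

rng = np.random.default_rng(4)
N, m = 4096, 50                        # N huge, m small (m << N)
X = rng.standard_normal((N, m))
X = X - X.mean(axis=1, keepdims=True)  # centered data, points as columns

L = X.T @ X                            # small m x m matrix
lam, V = np.linalg.eigh(L)             # eigenpairs of L, ascending
lam, V = lam[::-1], V[:, ::-1]         # descending order

U = X @ V                              # map eigenvectors of L to eigenvectors of X X^T
U = U / np.linalg.norm(U, axis=0)      # normalize each column

# Verify for the top component: (X X^T) u = lambda u,
# computed as X @ (X.T @ u) so the N x N matrix is never built
u, l = U[:, 0], lam[0]
print(np.allclose(X @ (X.T @ u), l * u))    # True
```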

slide-29
SLIDE 29

Principal Components (Method B)

29

slide by Barnabás Póczos and Aarti Singh

slide-30
SLIDE 30

Principal Components (Method B)

  • … faster if train with …
  • only people w/out glasses
  • same lighting conditions

30


  • Reconstructing… (Method B)

slide by Barnabás Póczos and Aarti Singh

slide-31
SLIDE 31

Shortcomings

  • Requires carefully controlled data:
  • All faces centered in frame
  • Same size
  • Some sensitivity to angle
  • Method is completely knowledge free
  • (sometimes this is good!)
  • Doesn’t know that faces are wrapped around 3D objects (heads)

  • Makes no effort to preserve class distinctions

31

slide by Barnabás Póczos and Aarti Singh

slide-32
SLIDE 32

Happiness subspace (method A)

32

slide by Barnabás Póczos and Aarti Singh

slide-33
SLIDE 33

Disgust subspace (method A)

33

slide by Barnabás Póczos and Aarti Singh

slide-34
SLIDE 34

Facial Expression Recognition 
 Movies

34

slide by Barnabás Póczos and Aarti Singh

slide-35
SLIDE 35

Facial Expression Recognition 
 Movies

35

slide by Barnabás Póczos and Aarti Singh

slide-36
SLIDE 36

Facial Expression Recognition 
 Movies

36

slide by Barnabás Póczos and Aarti Singh

slide-37
SLIDE 37

Image Compression

37

slide-38
SLIDE 38

Original Image

  • Divide the original 372 × 492 image into 12 × 12 patches
  • Each patch is an instance
  • View each patch as a 144-D vector

38

slide by Barnabás Póczos and Aarti Singh
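A sketch of the compression experiment shown on the following slides (my own reconstruction of the procedure, with a random array standing in for the actual photo): cut the image into 12 × 12 patches, run PCA on the patches, keep k coefficients per patch, and rebuild the image.

```python
import numpy as np

def compress_with_pca(img, patch=12, k=16):
    """Patch-based PCA compression: keep k coefficients per patch."""
    H, W = img.shape
    H, W = H - H % patch, W - W % patch          # crop to a multiple of the patch size
    patches = (img[:H, :W]
               .reshape(H // patch, patch, W // patch, patch)
               .transpose(0, 2, 1, 3)
               .reshape(-1, patch * patch))      # one 144-D row per patch

    mean = patches.mean(axis=0)
    U, s, Vt = np.linalg.svd(patches - mean, full_matrices=False)
    basis = Vt[:k]                               # top-k eigen-patches
    coeffs = (patches - mean) @ basis.T          # k numbers per patch
    recon = coeffs @ basis + mean                # reconstruct every patch

    return (recon.reshape(H // patch, W // patch, patch, patch)
                 .transpose(0, 2, 1, 3)
                 .reshape(H, W))

# Toy usage: a random "image" in place of the 372 x 492 photo
rng = np.random.default_rng(5)
img = rng.random((372, 492))
for k in (60, 16, 6, 3, 1):
    recon = compress_with_pca(img, k=k)          # 372 and 492 are multiples of 12, no crop
    print(f"k={k:2d}  mean squared error={np.mean((img - recon) ** 2):.4f}")
```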
slide-39
SLIDE 39

L2 error and PCA dim

39

slide by Barnabás Póczos and Aarti Singh

slide-40
SLIDE 40

PCA compression: 144D => 60D

40

slide by Barnabás Póczos and Aarti Singh

slide-41
SLIDE 41

PCA compression: 144D => 16D

41

slide by Barnabás Póczos and Aarti Singh

slide-42
SLIDE 42

16 most important eigenvectors

42

(Figure: the 16 eigenvectors, each displayed as a 12 × 12 image patch)

slide by Barnabás Póczos and Aarti Singh

slide-43
SLIDE 43

PCA compression: 144D => 6D

43

slide by Barnabás Póczos and Aarti Singh

slide-44
SLIDE 44

6 most important eigenvectors

44

(Figure: the 6 eigenvectors, each displayed as a 12 × 12 image patch)

slide by Barnabás Póczos and Aarti Singh

slide-45
SLIDE 45

PCA compression: 144D => 3D

45

slide by Barnabás Póczos and Aarti Singh

slide-46
SLIDE 46

3 most important eigenvectors

46

(Figure: the 3 eigenvectors, each displayed as a 12 × 12 image patch)

slide by Barnabás Póczos and Aarti Singh

slide-47
SLIDE 47

PCA compression: 144D => 1D

47

slide by Barnabás Póczos and Aarti Singh

slide-48
SLIDE 48

60 most important eigenvectors

  • Looks like the discrete cosine bases of JPEG!

48

slide by Barnabás Póczos and Aarti Singh

slide-49
SLIDE 49

2D Discrete Cosine Basis

49

http://en.wikipedia.org/wiki/Discrete_cosine_transform

slide by Barnabás Póczos and Aarti Singh

slide-50
SLIDE 50

Noise Filtering

50

slide-51
SLIDE 51

Noise Filtering

51

x' = U Uᵀ x   (project x onto the top principal components, then reconstruct)

slide by Barnabás Póczos and Aarti Singh
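A sketch of PCA denoising (my own toy example with 1-D signals standing in for image patches): learn the top components from many noisy samples, then reconstruct each sample from its first 15 coefficients, as in the images on the next slides.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy "images": smooth 1-D signals plus noise, one per row
t = np.linspace(0, 1, 200)
clean = np.array([np.sin(2 * np.pi * (3 + i % 5) * t) for i in range(300)])
noisy = clean + 0.5 * rng.standard_normal(clean.shape)

# PCA on the noisy data
mean = noisy.mean(axis=0)
U, s, Vt = np.linalg.svd(noisy - mean, full_matrices=False)

k = 15                                           # keep 15 PCA components
denoised = (noisy - mean) @ Vt[:k].T @ Vt[:k] + mean

print("noisy MSE   :", np.mean((noisy - clean) ** 2))
print("denoised MSE:", np.mean((denoised - clean) ** 2))   # noticeably smaller
```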

slide-52
SLIDE 52

Noisy image

52

slide by Barnabás Póczos and Aarti Singh

slide-53
SLIDE 53

Denoised image 
 using 15 PCA components

53

slide by Barnabás Póczos and Aarti Singh

slide-54
SLIDE 54

PCA Shortcomings

54

slide-55
SLIDE 55

Problematic Data Set for PCA

  • PCA doesn’t know labels!

55


slide by Barnabás Póczos and Aarti Singh

slide-56
SLIDE 56

PCA vs. Fisher Linear Discriminant

56

Principal Component Analysis
  • higher variance
  • bad for discriminability

Fisher Linear Discriminant
  • smaller variance
  • good discriminability

slide by Javier Hernandez Rivera
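A hedged illustration of this contrast (not from the slides), using scikit-learn on a toy two-class dataset where the direction of largest variance is nearly useless for separating the classes.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(7)

# Two elongated, parallel clusters: big variance along x, classes separated along y
n = 500
X0 = rng.multivariate_normal([0, 0], [[10, 0], [0, 0.2]], size=n)
X1 = rng.multivariate_normal([0, 2], [[10, 0], [0, 0.2]], size=n)
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

z_pca = PCA(n_components=1).fit_transform(X).ravel()       # ignores the labels
z_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y).ravel()

def separation(z, y):
    """Distance between class means, in units of the overall standard deviation."""
    return abs(z[y == 0].mean() - z[y == 1].mean()) / z.std()

print("separation along 1st PC :", separation(z_pca, y))   # near 0
print("separation along LDA dir:", separation(z_lda, y))   # large
```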

slide-57
SLIDE 57

Problematic Data Set for PCA

  • PCA cannot capture NON-LINEAR structure!

57

slide by Barnabás Póczos and Aarti Singh

slide-58
SLIDE 58

PCA Conclusions

  • PCA
  • Finds an orthonormal basis for the data
  • Sorts dimensions in order of “importance”
  • Discards low-significance dimensions

  • Uses:
  • Get compact description
  • Ignore noise
  • Improve classification (hopefully)

  • Not magic:
  • Doesn’t know class labels
  • Can only capture linear variations

  • One of many tricks to reduce dimensionality!

58

slide by Barnabás Póczos and Aarti Singh