MLCC 2015: Dimensionality Reduction and PCA
Lorenzo Rosasco
UNIGE-MIT-IIT
June 25, 2015
Outline

◮ PCA & Reconstruction
◮ PCA and Maximum Variance
◮ PCA and Associated Eigenproblem
◮ Beyond the First Principal Component
◮ PCA and Singular Value Decomposition
◮ Kernel PCA
Dimensionality Reduction

In many practical applications it is of interest to reduce the dimensionality of the data:

◮ data visualization
◮ data exploration: investigating the "effective" dimensionality of the data

This problem can be seen as that of defining a map M : X = R^D → R^k, with k ≪ D, according to some suitable criterion. In the following, data reconstruction will be our guiding principle.
Principal Component Analysis

PCA is arguably the most popular dimensionality reduction procedure. It is a data-driven procedure that, given an unsupervised sample S = (x_1, …, x_n), derives a dimensionality reduction defined by a linear map M. PCA can be derived from several perspectives; here we give a geometric derivation.
Dimensionality Reduction by Reconstruction

Recall that, if w ∈ R^D with ‖w‖ = 1, then (w^T x)w is the orthogonal projection of x onto w.

First, consider k = 1. The associated reconstruction error is

‖x − (w^T x)w‖²

(that is, how much we lose by projecting x along the direction w).

Problem: find the direction p allowing the best reconstruction of the training set.
Dimensionality Reduction by Reconstruction (cont.)

Let S^{D−1} = {w ∈ R^D | ‖w‖ = 1} be the unit sphere in R^D. Consider the empirical reconstruction minimization problem

min_{w ∈ S^{D−1}} (1/n) Σ_{i=1}^n ‖x_i − (w^T x_i)w‖².

The solution p of the above problem is called the first principal component of the data.
An Equivalent Formulation

A direct computation, using ‖w‖ = 1, shows that

‖x_i − (w^T x_i)w‖² = ‖x_i‖² − (w^T x_i)².

Then the problem

min_{w ∈ S^{D−1}} (1/n) Σ_{i=1}^n ‖x_i − (w^T x_i)w‖²

is equivalent to

max_{w ∈ S^{D−1}} (1/n) Σ_{i=1}^n (w^T x_i)².
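As a quick numerical sanity check of this identity (a sketch of ours, not part of the slides; the variable names are illustrative), one can verify the equality for a random x and a random unit vector w:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
w = rng.standard_normal(5)
w /= np.linalg.norm(w)  # the identity requires w to be a unit vector

lhs = np.linalg.norm(x - (w @ x) * w) ** 2   # reconstruction error
rhs = np.linalg.norm(x) ** 2 - (w @ x) ** 2  # equivalent form

assert np.isclose(lhs, rhs)
```

Since the ‖x_i‖² term does not depend on w, minimizing the left-hand side over unit vectors is the same as maximizing (w^T x_i)².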
Reconstruction and Variance

Assume the data to be centered, x̄ = (1/n) Σ_{i=1}^n x_i = 0. Then we can interpret the term (w^T x)² as the variance of x in the direction w, and the first principal component can be seen as the direction along which the data have maximum variance:

max_{w ∈ S^{D−1}} (1/n) Σ_{i=1}^n (w^T x_i)².
Centering

If the data are not centered, we should consider

max_{w ∈ S^{D−1}} (1/n) Σ_{i=1}^n (w^T (x_i − x̄))²    (1)

which is equivalent to

max_{w ∈ S^{D−1}} (1/n) Σ_{i=1}^n (w^T x_i^c)²

with x^c = x − x̄.
Centering and Reconstruction

Considering the effect of centering on reconstruction, it is easy to see that we get

min_{w ∈ S^{D−1}, b ∈ R^D} (1/n) Σ_{i=1}^n ‖x_i − ((w^T (x_i − b))w + b)‖²

where (w^T (x_i − b))w + b is an affine (rather than an orthogonal) projection.
PCA as an Eigenproblem

A further manipulation shows that PCA corresponds to an eigenvalue problem. Using the symmetry of the inner product,

(1/n) Σ_{i=1}^n (w^T x_i)² = (1/n) Σ_{i=1}^n (w^T x_i)(w^T x_i) = (1/n) Σ_{i=1}^n w^T x_i x_i^T w = w^T ( (1/n) Σ_{i=1}^n x_i x_i^T ) w.

Then, we can consider the problem

max_{w ∈ S^{D−1}} w^T C_n w,    C_n = (1/n) Σ_{i=1}^n x_i x_i^T.
PCA as an Eigenproblem (cont.)

We make two observations:

◮ The ("covariance") matrix C_n = (1/n) X_n^T X_n is symmetric and positive semi-definite.
◮ The objective function of PCA can be written as

w^T C_n w / (w^T w),

the so-called Rayleigh quotient.

Note that, if C_n u = λu, then u^T C_n u / (u^T u) = λ, since u is normalized. Indeed, it is possible to show that the Rayleigh quotient achieves its maximum at a vector corresponding to the maximum eigenvalue of C_n.
PCA as an Eigenproblem (cont.)

Computing the first principal component of the data thus reduces to computing the largest eigenvalue of the covariance matrix and the corresponding eigenvector:

C_n u = λu,    C_n = (1/n) X_n^T X_n.
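A minimal sketch of this computation in NumPy (our own illustration; the data and names are invented), using `numpy.linalg.eigh` for the symmetric matrix C_n:

```python
import numpy as np

rng = np.random.default_rng(0)
n, D = 200, 5
# synthetic data with one dominant direction of variance
X = rng.standard_normal((n, D)) * np.array([3.0, 1.0, 0.5, 0.5, 0.1])
X -= X.mean(axis=0)  # center the data

Cn = (X.T @ X) / n                 # covariance matrix, D x D
evals, evecs = np.linalg.eigh(Cn)  # ascending eigenvalues, orthonormal eigenvectors
p = evecs[:, -1]                   # first principal component = top eigenvector

# p attains the maximum of the Rayleigh quotient over unit vectors
var_p = p @ Cn @ p
assert np.isclose(var_p, evals[-1])

# any other unit direction has no more empirical variance
w = rng.standard_normal(D)
w /= np.linalg.norm(w)
assert var_p >= w @ Cn @ w
```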
Beyond the First Principal Component

We discuss how to consider more than one principal component (k > 1),

M : X = R^D → R^k, k ≪ D.

The idea is simply to iterate the previous reasoning.
Residual Reconstruction

The idea is to consider the one-dimensional projection that can best reconstruct the residuals

r_i = x_i − (p^T x_i)p.

An associated minimization problem is given by

min_{w ∈ S^{D−1}, w ⊥ p} (1/n) Σ_{i=1}^n ‖r_i − (w^T r_i)w‖²

(note the constraint w ⊥ p).
Residual Reconstruction (cont.)

Note that, for all i = 1, …, n,

‖r_i − (w^T r_i)w‖² = ‖r_i‖² − (w^T r_i)² = ‖r_i‖² − (w^T x_i)²,

since w ⊥ p implies w^T r_i = w^T x_i. Then, we can consider the following equivalent problem

max_{w ∈ S^{D−1}, w ⊥ p} (1/n) Σ_{i=1}^n (w^T x_i)² = w^T C_n w.
PCA as an Eigenproblem

max_{w ∈ S^{D−1}, w ⊥ p} (1/n) Σ_{i=1}^n (w^T x_i)² = w^T C_n w.

Again, we have to maximize the Rayleigh quotient of the covariance matrix, now with the extra constraint w ⊥ p. Similarly to before, it can be proved that the solution of the above problem is given by the second eigenvector of C_n and the corresponding eigenvalue.
PCA as an Eigenproblem (cont.)

C_n u = λu,    C_n = (1/n) Σ_{i=1}^n x_i x_i^T.

The reasoning generalizes to more than two components: the computation of k principal components reduces to finding the k largest eigenvalues and the corresponding eigenvectors of C_n.
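The top-k computation can be sketched as follows (an illustrative example of ours, not from the slides); as a by-product it checks that the average reconstruction error equals the sum of the discarded eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)
n, D, k = 300, 10, 3
X = rng.standard_normal((n, D))
X -= X.mean(axis=0)  # center the data

Cn = (X.T @ X) / n
evals, evecs = np.linalg.eigh(Cn)   # ascending order
P = evecs[:, ::-1][:, :k]           # top-k eigenvectors as columns: the k principal components

M = X @ P        # reduced representation, n x k
X_rec = M @ P.T  # reconstruction from the k components

# average reconstruction error = sum of the D - k discarded eigenvalues
err = np.mean(np.sum((X - X_rec) ** 2, axis=1))
assert np.isclose(err, evals[::-1][k:].sum())
```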
Remarks

◮ The computational complexity is roughly O(kD²) (the complexity of forming C_n is O(nD²)). If we have n points in D dimensions and n ≪ D, can we compute PCA in less than O(nD²)?
◮ The dimensionality reduction induced by PCA is a linear projection. Can we generalize PCA to non-linear dimensionality reduction?
Singular Value Decomposition

Consider the data matrix X_n; its singular value decomposition is given by

X_n = UΣV^T

where:

◮ U is an n × k orthogonal matrix,
◮ V is a D × k orthogonal matrix,
◮ Σ is a diagonal matrix such that Σ_{i,i} = √λ_i, i = 1, …, k, with k ≤ min{n, D}.

The columns of U and the columns of V are the left and right singular vectors, and the diagonal entries of Σ are the singular values.
Singular Value Decomposition (cont.)

The SVD can be equivalently described by the equations

C_n p_j = λ_j p_j,    (1/n) K_n u_j = λ_j u_j,
X_n p_j = √λ_j u_j,    (1/n) X_n^T u_j = √λ_j p_j,

for j = 1, …, k, where C_n = (1/n) X_n^T X_n and (1/n) K_n = (1/n) X_n X_n^T.
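These relations can be checked numerically (a sketch of ours under NumPy's SVD conventions; with `numpy.linalg.svd` the singular values s_j relate to the eigenvalues of C_n = (1/n)X^T X by λ_j = s_j²/n, so the scaling differs from the slides' Σ_{i,i} = √λ_i convention by factors of n):

```python
import numpy as np

rng = np.random.default_rng(2)
n, D = 8, 5
X = rng.standard_normal((n, D))

U, s, Vt = np.linalg.svd(X, full_matrices=False)  # X = U diag(s) V^T
Cn = (X.T @ X) / n  # covariance matrix
Kn = X @ X.T        # Gram matrix

for j in range(min(n, D)):
    v, u = Vt[j], U[:, j]
    # right singular vectors are eigenvectors of Cn, eigenvalue s_j^2 / n
    assert np.allclose(Cn @ v, (s[j] ** 2 / n) * v)
    # left singular vectors are eigenvectors of (1/n) Kn with the same eigenvalue
    assert np.allclose((Kn / n) @ u, (s[j] ** 2 / n) * u)
    # X v_j = s_j u_j links the two sets of eigenvectors
    assert np.allclose(X @ v, s[j] * u)
```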
PCA and Singular Value Decomposition

If n ≪ D we can consider the following procedure:

◮ form the matrix K_n, which is O(Dn²);
◮ find the first k eigenvectors of K_n, which is O(kn²);
◮ compute the principal components using

p_j = (1/√λ_j) X_n^T u_j = (1/√λ_j) Σ_{i=1}^n x_i u_j^i,    j = 1, …, k,

where u_j = (u_j^1, …, u_j^n). This is O(knD) if we consider k principal components.
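A sketch of this procedure (our own code; we normalize p_j explicitly rather than dividing by √λ_j, to sidestep the differing scaling conventions for the eigenvectors of K_n):

```python
import numpy as np

rng = np.random.default_rng(3)
n, D, k = 20, 1000, 3  # few points in high dimension: n << D
X = rng.standard_normal((n, D))
X -= X.mean(axis=0)

# O(D n^2): form the n x n Gram matrix instead of the D x D covariance
Kn = X @ X.T
evals, U = np.linalg.eigh(Kn / n)           # (1/n) Kn u_j = lambda_j u_j, ascending
lam, U = evals[::-1][:k], U[:, ::-1][:, :k]  # keep the top-k pairs

# recover principal components p_j from X^T u_j, O(k n D), then normalize
P = X.T @ U
P /= np.linalg.norm(P, axis=0)

# sanity check: each p_j is an eigenvector of the covariance Cn with eigenvalue lambda_j
Cn = (X.T @ X) / n
for j in range(k):
    assert np.allclose(Cn @ P[:, j], lam[j] * P[:, j])
```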
Beyond Linear Dimensionality Reduction?

By considering PCA we are implicitly assuming the data to lie on a linear subspace... and it is easy to think of situations where this assumption might be violated. Can we use kernels to obtain a non-linear generalization of PCA?
From SVD to KPCA

Using the SVD, the projection of a point x onto a principal component p_j, for j = 1, …, k, is

(M(x))_j = x^T p_j = (1/√λ_j) x^T X_n^T u_j = (1/√λ_j) Σ_{i=1}^n x^T x_i u_j^i.

Recall that

C_n p_j = λ_j p_j,    (1/n) K_n u_j = λ_j u_j,    X_n p_j = √λ_j u_j,    (1/n) X_n^T u_j = √λ_j p_j.
PCA and Feature Maps

(M(x))_j = (1/√λ_j) Σ_{i=1}^n x^T x_i u_j^i.

What if we consider a non-linear feature map Φ : X → F before performing PCA? Then

(M(x))_j = Φ(x)^T p_j = (1/√λ_j) Σ_{i=1}^n Φ(x)^T Φ(x_i) u_j^i,

where K_n u_j = σ_j u_j and (K_n)_{i,j} = Φ(x_i)^T Φ(x_j).
Kernel PCA

(M(x))_j = Φ(x)^T p_j = (1/√λ_j) Σ_{i=1}^n Φ(x)^T Φ(x_i) u_j^i.

If the feature map is defined by a positive definite kernel K : X × X → R, then

(M(x))_j = (1/√λ_j) Σ_{i=1}^n K(x, x_i) u_j^i,

where K_n u_j = σ_j u_j and (K_n)_{i,j} = K(x_i, x_j). Note that all the computations involve only kernel evaluations: the feature map Φ never needs to be computed explicitly.
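A minimal kernel PCA sketch with a Gaussian kernel (the helper names `gaussian_kernel` and `kernel_pca` are ours; we also center the kernel matrix in feature space, a step the slides omit, and eigenvalue scaling conventions differ by factors of n across references):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # K(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

def kernel_pca(X, k, sigma=1.0):
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    # center the data in feature space via the kernel matrix
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    evals, U = np.linalg.eigh(Kc)                # ascending order
    lam, U = evals[::-1][:k], U[:, ::-1][:, :k]  # top-k eigenpairs
    # embedding of the training points:
    # (M(x_i))_j = (1/sqrt(lam_j)) sum_l K(x_i, x_l) u_j^l
    return (Kc @ U) / np.sqrt(lam), lam

rng = np.random.default_rng(4)
X = rng.standard_normal((50, 3))
M, lam = kernel_pca(X, k=2)
assert M.shape == (50, 2) and np.all(lam > 0)
```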
Wrapping Up

In this class we introduced PCA as a basic tool for dimensionality reduction. We discussed computational aspects and extensions to non-linear dimensionality reduction (KPCA).
Next Class
In the next class, beyond dimensionality reduction, we ask how we can devise interpretable data models, and discuss a class of methods based on the concept of sparsity.