SLIDE 1

ECS231 PCA, revisited

May 28, 2019

SLIDE 2

Outline

  1. PCA for lossy data compression
  2. PCA for learning a representation of data
  3. Extra: learning XOR

SLIDE 3
1. PCA for lossy data compression [1]

◮ Data compression:

given data points $\{x^{(1)}, \ldots, x^{(m)}\} \subset \mathbb{R}^n$, for each $x^{(i)} \in \mathbb{R}^n$, find the code vector $c^{(i)} \in \mathbb{R}^\ell$, where $\ell < n$.

◮ Encoding function $f: x \mapsto c$
◮ Lossy decoding function $g: c \rightsquigarrow x$
◮ Reconstruction: $x \approx g(c) = g(f(x))$
◮ PCA is defined by the choice of decoding function:

$$g(c) = Dc,$$
where $D \in \mathbb{R}^{n \times \ell}$ defines the decoding and is constrained to have orthonormal columns, i.e., $D^T D = I_\ell$.

◮ Questions:

  1. How to generate the optimal code point $c^*$ for each input point $x$?
  2. How to choose the decoding matrix $D$?

[1] Section 2.12 of I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, deeplearningbook.org.

SLIDE 4
1. PCA for lossy data compression, cont'd

Question 1: How to generate the optimal code point $c^*$ for each input point $x$? I.e., solve
$$c^* = \arg\min_c \|x - g(c)\|_2^2.$$

◮ By vector calculus and the first-order necessary condition for optimality, we conclude
$$c^* = D^T x.$$

◮ To encode $x$, we just need the matrix-vector product
$$f(x) = D^T x.$$

◮ PCA reconstruction operation:
$$r(x) = g(f(x)) = g(D^T x) = D D^T x.$$
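The vector-calculus step above can be written out explicitly; a short derivation, using $g(c) = Dc$ and $D^T D = I_\ell$:

\begin{align*}
\|x - Dc\|_2^2 &= x^T x - 2\, x^T D c + c^T D^T D c = x^T x - 2\, x^T D c + c^T c, \\
\nabla_c \|x - Dc\|_2^2 &= -2\, D^T x + 2\, c = 0 \quad \Longrightarrow \quad c^* = D^T x.
\end{align*}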

SLIDE 5
1. PCA for lossy data compression, cont'd

Question 2: How to choose the decoding matrix D?

◮ Idea: minimize the $L^2$ distance between inputs and reconstructions:
$$D^* = \arg\min_D \sum_{i,j} \big(x^{(i)}_j - r(x^{(i)})_j\big)^2 \quad \text{s.t.} \quad D^T D = I_\ell.$$

◮ For simplicity, consider $\ell = 1$ and $D = d \in \mathbb{R}^n$; then
$$d^* = \arg\min_d \sum_i \|x^{(i)} - d d^T x^{(i)}\|_2^2 \quad \text{s.t.} \quad d^T d = 1.$$

◮ Let $X \in \mathbb{R}^{m \times n}$ with $X_{(i,:)} = (x^{(i)})^T$; stacking the $m$ residual vectors as the rows of $X - X d d^T$ turns the sum of squared 2-norms into a squared Frobenius norm, so
$$d^* = \arg\min_d \|X - X d d^T\|_F^2 \quad \text{s.t.} \quad d^T d = 1.$$

SLIDE 6
1. PCA for lossy data compression, cont'd

◮ Equivalently,
$$d^* = \arg\max_d \mathrm{tr}(X^T X d d^T) = \arg\max_d \|X d\|_2^2 \quad \text{s.t.} \quad d^T d = 1.$$

◮ Let $(\sigma_1, u_1, v_1)$ be the largest singular triplet of $X$, i.e., $X v_1 = \sigma_1 u_1$. Then we have
$$d^* = \arg\max_d \|X d\|_2^2 = v_1.$$
◮ In the general case, when $\ell > 1$, the matrix $D$ is given by the $\ell$ right singular vectors of $X$ corresponding to the $\ell$ largest singular values of $X$. (Exercise: write out the proof.)
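The "equivalently" step at the top of this slide follows by expanding the Frobenius norm and using $d^T d = 1$:

\begin{align*}
\|X - X d d^T\|_F^2
 &= \mathrm{tr}\big((X - X d d^T)^T (X - X d d^T)\big) \\
 &= \mathrm{tr}(X^T X) - 2\,\mathrm{tr}(X^T X d d^T) + \mathrm{tr}(d d^T X^T X d d^T) \\
 &= \mathrm{tr}(X^T X) - \mathrm{tr}(X^T X d d^T) = \|X\|_F^2 - \|X d\|_2^2,
\end{align*}

so minimizing $\|X - X d d^T\|_F^2$ over unit vectors $d$ is the same as maximizing $\|X d\|_2^2 = d^T X^T X d$, whose maximizer is the top right singular vector $v_1$, with value $\sigma_1^2$.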

SLIDE 7
1. PCA for lossy data compression, cont'd

MATLAB demo code: pca4ldc.m

>> ...
>> % SVD
>> [U,S,V] = svd(X,0);
>> %
>> % Decode matrix D = V(:,1)
>> %
>> % PCA reconstruction
>> % Xpca = (X*V(:,1))*V(:,1)' = sigma(1)*U(:,1)*V(:,1)'
>> %
>> Xpca = (X*V(:,1))*V(:,1)'
>> ...
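Since pca4ldc.m is only excerpted above, here is a minimal self-contained sketch of the same computation; the synthetic data matrix X below is an assumption for illustration, not the data used in the actual demo:

m = 100;
t = linspace(1, 6, m)';
% Synthetic stand-in data: two strongly correlated features
X = [t + 0.3*randn(m,1), 2*t + 0.5*randn(m,1)];

% Thin SVD and rank-1 PCA reconstruction, as on the slide
[U, S, V] = svd(X, 0);
Xpca = (X*V(:,1))*V(:,1)';

% Relative reconstruction error of the lossy compression
relerr = norm(X - Xpca, 'fro') / norm(X, 'fro')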

SLIDE 8
1. PCA for lossy data compression, cont'd

[Figure: two panels, "Height" and "Weight", each plotting the original data points ("data") together with their rank-1 PCA reconstruction ("pca").]

SLIDE 9
2. PCA for learning a representation of data [2]

◮ PCA as an unsupervised learning algorithm that learns a representation of data:

  ◮ learns a representation that has lower dimensionality than the original input;
  ◮ learns a representation whose elements have no linear correlation with each other (but may still have nonlinear relationships between variables).

◮ Consider the $m \times n$ "design" matrix $X$ of data $x$ with
$$\mathbb{E}[x] = 0, \qquad \mathrm{Var}[x] = \frac{1}{m-1} X^T X.$$
◮ PCA finds a representation of $x$ via an orthogonal linear transformation
$$z = x^T W$$
such that $\mathrm{Var}[z]$ is diagonal, where the transformation matrix $W$ satisfies $W^T W = I$.

[2] Section 5.8.1 of I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, deeplearningbook.org.

SLIDE 10
2. PCA for learning a representation of data, cont'd

Question: how to find W?

◮ Let $X = U \Sigma W^T$ be the SVD of $X$.
◮ Then
$$\mathrm{Var}[x] = \frac{1}{m-1} X^T X = \frac{1}{m-1} (U \Sigma W^T)^T U \Sigma W^T = \frac{1}{m-1} W \Sigma^T U^T U \Sigma W^T = \frac{1}{m-1} W \Sigma^T \Sigma W^T.$$

SLIDE 11
2. PCA for learning a representation of data, cont'd

◮ Therefore, if we take
$$z = x^T W, \quad \text{i.e.,} \quad Z = X W,$$
then
$$\mathrm{Var}[z] = \frac{1}{m-1} Z^T Z = \frac{1}{m-1} W^T X^T X W = \frac{1}{m-1} W^T W \Sigma^T \Sigma W^T W = \frac{1}{m-1} \Sigma^T \Sigma,$$
which is diagonal, with entries $\sigma_i^2/(m-1)$.

SLIDE 12
2. PCA for learning a representation of data, cont'd

◮ The individual elements of $z$ are mutually uncorrelated: PCA disentangles the unknown factors of variation underlying the data.

◮ While correlation is an important category of dependency between elements of the data, we are also interested in learning representations that disentangle more complicated forms of feature dependencies. For this, we will need more than what can be done with a simple linear transformation.

SLIDE 13
2. PCA for learning a representation of data, cont'd

MATLAB demo code: pca4dr.m

>> ...
>> % make E(x) = 0
>> X1 = X - ones(m,1)*mean(X);
>> %
>> % SVD
>> [U,S,W] = svd(X1);
>> %
>> % PCA
>> Z = X1*W;
>> %
>> % covariance of the new variable z (up to the 1/(m-1) factor)
>> var_z = Z'*Z
>> ...
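As a quick numerical check (a sketch assuming m is the number of rows of X, as in the demo), the properly scaled sample covariance of z comes out diagonal:

% Sample covariance of z: off-diagonal entries vanish up to roundoff,
% and the diagonal entries are sigma_i^2/(m-1)
cov_z = Z'*Z/(m-1)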

SLIDE 14
2. PCA for learning a representation of data, cont'd
[Figure: two scatter plots. Left: "Original data" in the $(x_1, x_2)$ plane; right: "PCA-transformed data" in the $(z_1, z_2)$ plane.]

SLIDE 15

Topic: extra

SLIDE 16

Learning XOR [3]

◮ The first (simplest) example of "Deep Learning"
◮ The XOR function ("exclusive or"):

   x1   x2  |  y
  ---------------
    0    0  |  0
    1    0  |  1
    0    1  |  1
    1    1  |  0

◮ Task: find the function $f^*$ such that
$$y = f^*(x) \quad \text{for } x \in \mathbb{X} = \{(0,0),\, (1,0),\, (0,1),\, (1,1)\}.$$

◮ Model: $\hat{y} = f(x; \theta)$, where $\theta$ are the parameters.

◮ Measure: the MSE loss function
$$J(\theta) = \frac{1}{4} \sum_{x \in \mathbb{X}} \big(f^*(x) - f(x; \theta)\big)^2.$$
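In MATLAB, the dataset and the loss are small enough to write down directly; a minimal sketch, where the function handle fhat is a hypothetical stand-in for any candidate model mapping the four input rows to four outputs:

% The four XOR input points (as rows) and their labels
Xs = [0 0; 1 0; 0 1; 1 1];
y  = [0; 1; 1; 0];

% MSE loss over the four points, for a candidate model fhat
J = @(fhat) mean((y - fhat(Xs)).^2);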

[3] Section 6.1 of I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, deeplearningbook.org.

SLIDE 17

Learning XOR, cont’d

◮ Linear model:
$$f(x; \theta) = f(x; w, b) = x^T w + b$$
◮ Solution of the minimization of the MSE loss function:
$$w = 0 \quad \text{and} \quad b = \frac{1}{2}.$$

◮ A linear model is not able to represent the XOR function
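One can check this claim numerically; a sketch reusing Xs, y, and J from the previous slide's snippet:

% Least-squares fit of the linear model x'*w + b
A = [Xs ones(4,1)];        % append a column of ones for the bias b
theta = A \ y              % returns [0; 0; 0.5], i.e., w = 0, b = 1/2

% The fitted model outputs 1/2 for every input, so J = 1/4
flin = @(X) X*theta(1:2) + theta(3);
J(flin)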

SLIDE 18

Learning XOR, cont’d

◮ Two-layer model:
$$f(x; \theta) = f^{(2)}\big(f^{(1)}(x; W, c);\, w, b\big),$$
where $\theta \equiv \{W, c, w, b\}$ and
$$f^{(1)}(x; W, c) = \max\{0,\, W^T x + c\} \equiv h, \qquad f^{(2)}(h; w, b) = w^T h + b;$$
$\max\{0, z\}$ is called an "activation function".

◮ Then by taking
$$\theta^* = \left\{\, W = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},\quad c = \begin{bmatrix} 0 \\ -1 \end{bmatrix},\quad w = \begin{bmatrix} 1 \\ -2 \end{bmatrix},\quad b = 0 \,\right\},$$
we can verify that the two-layer model ("neural network") obtains the correct answer for any $x \in \mathbb{X}$.

◮ Question: how to find $\theta^*$?
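A quick numerical check of the verification claimed above (a MATLAB sketch, reusing Xs from the earlier snippet; the ReLU is applied elementwise):

% Parameters theta* from the slide
W = [1 1; 1 1];  c = [0; -1];  w = [1; -2];  b = 0;

% Hidden layer h = max{0, W'*x + c}, for all four inputs at once
H = max(0, Xs*W + ones(4,1)*c');

% Output layer: reproduces the XOR labels [0; 1; 1; 0]
yhat = H*w + b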
