Advanced Section #4: Methods of Dimensionality Reduction: Principal Component Analysis (PCA)


SLIDE 1

Advanced Section #4: Methods of Dimensionality Reduction: Principal Component Analysis (PCA)

Cedric Flamant

CS109A Introduction to Data Science
Pavlos Protopapas, Kevin Rader, and Chris Tanner

SLIDE 2

Outline

  1. Introduction:
     a. Why Dimensionality Reduction?
     b. Linear Algebra (Recap).
     c. Statistics (Recap).
  2. Principal Component Analysis:
     a. Foundation.
     b. Assumptions & Limitations.
     c. Kernel PCA for nonlinear dimensionality reduction.


SLIDE 3

Dimensionality Reduction, why?

A process of reducing the number of predictor variables under consideration.

To find a more meaningful basis in which to express our data, filtering out the noise and revealing the hidden structure.

  • C. Bishop, Pattern Recognition and Machine Learning, Springer (2008).

SLIDE 4

A simple example taken from Physics

Consider an ideal spring-mass system oscillating along x. We seek the pressure Y that the spring exerts on the wall.

LASSO regression model:
LASSO variable selection:

  • J. Shlens, A Tutorial on Principal Component Analysis (2003).

SLIDE 5

Principal Component Analysis versus LASSO

LASSO simply selects one of the (arbitrary) measurement directions, which is scientifically unsatisfactory. We want to use all the measurements to situate the position of the mass, and to find a lower-dimensional manifold of predictors on which the data lie.

✓ Principal Component Analysis (PCA):

A powerful statistical tool for analyzing data sets, formulated in the language of linear algebra.


SLIDE 6

Linear Algebra (Recap)


SLIDE 7

Symmetric matrices

Consider a design (or data) matrix $X \in \mathbb{R}^{n \times p}$, consisting of $n$ observations and $p$ predictors.

Then $X^\top X$ is a $p \times p$ symmetric matrix. Symmetric: $(X^\top X)^\top = X^\top (X^\top)^\top = X^\top X$, using that $(AB)^\top = B^\top A^\top$.

Similarly for $X X^\top$, which is $n \times n$ and symmetric.

SLIDE 8

Eigenvalues and Eigenvectors

For a real and symmetric matrix $A \in \mathbb{R}^{p \times p}$, there exists a unique set of real eigenvalues $\lambda_1, \dots, \lambda_p$ and associated eigenvectors $v_1, \dots, v_p$ such that:

$A v_i = \lambda_i v_i$, with $v_i^\top v_j = 0$ for $i \neq j$ (orthogonal) and $v_i^\top v_i = 1$ (normalized).

➢ Hence, the eigenvectors form an orthonormal basis of $\mathbb{R}^p$.
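
A quick numerical illustration of these properties, as a minimal NumPy sketch (the matrix below is an arbitrary symmetric example, not one from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
A = B + B.T                                # an arbitrary real symmetric matrix

# eigh is specialized for symmetric matrices: real eigenvalues (ascending order)
# and orthonormal eigenvectors returned as the columns of V.
lam, V = np.linalg.eigh(A)

print(np.allclose(A @ V, V * lam))         # A v_i = lambda_i v_i for every column
print(np.allclose(V.T @ V, np.eye(4)))     # the eigenvectors form an orthonormal basis
```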

SLIDE 9

Spectrum and Eigen-decomposition

Spectrum: $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$.

Orthogonal matrix: $V = [v_1, \dots, v_p]$, with $V^\top V = V V^\top = I$.

Eigen-decomposition: $A = V \Lambda V^\top$.

SLIDE 10

Real & Positive Eigenvalues: Gram Matrix

  • The eigenvalues of $X^\top X$ are non-negative real numbers: for a normalized eigenvector $v_i$, $\lambda_i = v_i^\top X^\top X v_i = \|X v_i\|^2 \geq 0$. Similarly for $X X^\top$.
  • Hence, $X^\top X$ and $X X^\top$ are positive-semidefinite.

SLIDE 11

Same eigenvalues

  • $X^\top X$ and $X X^\top$ share the same nonzero eigenvalues: if $X^\top X\, v = \lambda v$, then $X X^\top (X v) = X (X^\top X v) = \lambda (X v)$.
  • Same eigenvalues, transformed eigenvectors: the eigenvector of $X X^\top$ associated with $\lambda$ is proportional to $X v$ (see the numerical check below).
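
A minimal NumPy sketch of this fact, using an arbitrary small data matrix (the shapes and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))                 # n = 6 observations, p = 3 predictors

# X^T X (p x p) and X X^T (n x n) share the same nonzero eigenvalues.
lam_p = np.linalg.eigvalsh(X.T @ X)         # 3 eigenvalues
lam_n = np.linalg.eigvalsh(X @ X.T)         # 6 eigenvalues, 3 of them ~ 0
print(np.allclose(lam_p, lam_n[-3:]))       # both returned in ascending order

# If v is an eigenvector of X^T X, then X v is an (unnormalized) eigenvector of X X^T.
lam, V = np.linalg.eigh(X.T @ X)
v = V[:, -1]                                # eigenvector with the largest eigenvalue
u = X @ v                                   # transformed eigenvector
print(np.allclose((X @ X.T) @ u, lam[-1] * u))
```
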
SLIDE 12

The sum of the eigenvalues of $X^\top X$ is equal to its trace

  • Cyclic property of the trace: for matrices $A$, $B$, $C$ whose products are defined, $\mathrm{tr}(ABC) = \mathrm{tr}(CAB) = \mathrm{tr}(BCA)$.
  • Using the eigen-decomposition $X^\top X = V \Lambda V^\top$: $\mathrm{tr}(X^\top X) = \mathrm{tr}(V \Lambda V^\top) = \mathrm{tr}(\Lambda V^\top V) = \mathrm{tr}(\Lambda) = \sum_{i=1}^{p} \lambda_i$.
  • The trace of a Gram matrix is the sum of its eigenvalues.
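
A one-line numerical check of this identity (a NumPy sketch with an arbitrary random design matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 4))
G = X.T @ X                                  # Gram matrix of the predictors

# The trace of the Gram matrix equals the sum of its eigenvalues.
print(np.isclose(np.trace(G), np.linalg.eigvalsh(G).sum()))
```
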
SLIDE 13

Statistics (Recap)


SLIDE 14

Centered Model Matrix

Consider the model (data) matrix $X \in \mathbb{R}^{n \times p}$.

Centered model matrix: we center the predictors (so each column has zero sample mean) by subtracting the column means,
$\tilde{x}_{ij} = x_{ij} - \bar{x}_j$, where $\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$.

SLIDE 15

Sample Covariance Matrix

Consider the sample covariance matrix $S = \frac{1}{n-1} \tilde{X}^\top \tilde{X}$.

Inspecting the terms:
➢ The diagonal terms are the sample variances: $S_{jj} = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2$.
➢ The off-diagonal terms are the sample covariances: $S_{jk} = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)$.
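
A minimal sketch of centering and forming the sample covariance matrix in NumPy, checked against np.cov (the toy data here are an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))                # toy model matrix: n = 100, p = 3

# Center each predictor (column) by its sample mean.
X_tilde = X - X.mean(axis=0)

# Sample covariance matrix S = X_tilde^T X_tilde / (n - 1).
n = X.shape[0]
S = X_tilde.T @ X_tilde / (n - 1)

# Agrees with NumPy's estimator (rowvar=False: columns are the variables).
print(np.allclose(S, np.cov(X, rowvar=False)))
```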

SLIDE 16

Principal Components Analysis (PCA)


SLIDE 17

PCA

PCA is a linear transformation that maps the data to a new coordinate system.

The direction of greatest variance in the data becomes the first axis (the first principal component), the next greatest the second axis, and so on. Geometrically, PCA fits an ellipsoid to the data. PCA reduces the dimension by discarding the low-variance principal components.

  • J. Jauregui (2012)
SLIDE 18

PCA foundation

Note that the covariance matrix $S$ is symmetric, so it permits an orthonormal eigenbasis:
$S v_i = \lambda_i v_i$, with $v_i^\top v_j = \delta_{ij}$.

The eigenvector $v_i$ is called the $i$th principal component of $X$. The eigenvalues can be sorted in decreasing order: $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_p \geq 0$.

SLIDE 19

Measure the importance of the principal components

The total sample variance of the predictors: $\sum_{j=1}^{p} S_{jj} = \mathrm{tr}(S) = \sum_{i=1}^{p} \lambda_i$.

The fraction of the total sample variance that corresponds to the $i$th principal component is $\pi_i = \lambda_i / \sum_{j=1}^{p} \lambda_j$; so $\pi_i$ indicates the "importance" of the $i$th principal component.
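
A minimal sketch (assuming NumPy and scikit-learn are available) that computes these importance fractions from the covariance eigenvalues and compares them with scikit-learn's explained_variance_ratio_:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
# Correlated toy data: 200 observations, 3 predictors.
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.0, 0.0],
                                          [0.8, 1.0, 0.0],
                                          [0.3, 0.2, 0.1]])

X_tilde = X - X.mean(axis=0)
S = X_tilde.T @ X_tilde / (X.shape[0] - 1)      # sample covariance matrix

lam = np.linalg.eigvalsh(S)[::-1]               # eigenvalues, largest first
importance = lam / lam.sum()                    # fraction of total sample variance

print(importance)
print(PCA().fit(X).explained_variance_ratio_)   # should agree
```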

SLIDE 20

Back to spring-mass example

PCA finds that a single principal component dominates the spectrum.

Hence, PCA indicates that fewer variables may be essentially responsible for the variability of the response, revealing the single degree of freedom of the spring-mass system.

SLIDE 21

PCA Dimensionality Reduction

The spectrum of eigenvalues illustrates the dimensionality reduction achieved by PCA: a few large eigenvalues account for most of the total variance.

SLIDE 22

PCA Dimensionality Reduction

There is no fixed rule for how many eigenvalues to keep; the cutoff is often clear from the spectrum, and the choice is left to the analyst's discretion.

  • C. Bishop, Pattern Recognition and Machine Learning, Springer (2008).

SLIDE 23

PCA Dimensionality Reduction

An example on leaves (thanks to Chris Rycroft, AM205)


SLIDE 24

PCA Dimensionality Reduction

The average leaf

(Why do we need this again? Recall that PCA operates on centered data, so the mean leaf is subtracted from each image first.)

SLIDE 25

PCA Dimensionality Reduction

First three principal components

(Color scale: positive vs. negative values of each component.)

SLIDE 26

PCA Dimensionality Reduction – Keeping up to k Components

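
The slide illustrates this with the leaf images; here is a minimal NumPy sketch of the same idea on synthetic data, keeping only the first k principal components and reconstructing in the original space:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 6)) @ rng.normal(size=(6, 6))   # toy data with 6 predictors

# Center the data and eigen-decompose the sample covariance matrix.
x_bar = X.mean(axis=0)
X_tilde = X - x_bar
S = X_tilde.T @ X_tilde / (X.shape[0] - 1)
lam, V = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]                  # sort eigenpairs, largest variance first
V, lam = V[:, order], lam[order]

k = 2                                          # keep only the first k principal components
V_k = V[:, :k]

scores = X_tilde @ V_k                         # n x k coordinates in the reduced basis
X_approx = scores @ V_k.T + x_bar              # rank-k reconstruction in the original space

print(X.shape, "->", scores.shape)
print("kept variance fraction:", lam[:k].sum() / lam.sum())
```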

SLIDE 27

Assumptions of PCA

Although PCA is a powerful tool for dimension reduction, it is based on some strong assumptions.

The assumptions are reasonable, but they must be checked in practice before drawing conclusions from PCA. When the PCA assumptions fail, we need to use other linear or nonlinear dimensionality reduction methods.

SLIDE 28

Mean/Variance are sufficient

In applying PCA, we assume that the means and the covariance matrix are sufficient for describing the distribution of the predictors.

This is only exactly true if the predictors are drawn from a multivariate Normal distribution, but it is a reasonable approximation in many situations. When a predictor deviates heavily from being Normally distributed, an appropriate nonlinear transformation may solve this problem.

Wikipedia – multivariate normal distribution
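
For instance, a strongly right-skewed predictor can be brought closer to Normal with a log transform before PCA; a minimal sketch with a synthetic skewed column (the choice of transform and the data are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 3))
X[:, 0] = rng.lognormal(mean=0.0, sigma=1.0, size=500)    # heavily right-skewed predictor

# A log transform brings the skewed column much closer to Normal before applying PCA.
X_log = X.copy()
X_log[:, 0] = np.log(X_log[:, 0])

print(PCA().fit(X).explained_variance_ratio_)
print(PCA().fit(X_log).explained_variance_ratio_)
```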

SLIDE 29

High Variance indicates importance

Assumption: the eigenvalue $\lambda_i$ measures the "importance" of the $i$th principal component.

It is intuitively reasonable that lower-variance components describe the data less well, but this is not always true.

SLIDE 30

Principal Components are orthogonal

PCA assumes that the intrinsic dimensions are orthogonal.

When this assumption fails, we need non-orthogonal components, which are not compatible with PCA.

Balaji Pitchai Kannu (on Quora)

SLIDE 31

Linear Change of Basis

PCA assumes that data lie on a lower dimensional linear manifold.


When the data lie on a nonlinear manifold in the predictor space, then linear methods are likely to be ineffective.

(Figures: linear vs. nonlinear manifold; credits: projectrhea.org, Alexsei Tiulpin.)

SLIDE 32

Kernel PCA for Nonlinear Dimensionality Reduction

Applying a nonlinear map $\Phi$ (called the feature map) to the data yields the PCA kernel:
$K_{ij} = \Phi(x_i)^\top \Phi(x_j) = k(x_i, x_j)$.

Centered nonlinear representation: $\tilde{K} = K - \mathbf{1}_n K - K \mathbf{1}_n + \mathbf{1}_n K \mathbf{1}_n$, where $\mathbf{1}_n$ is the $n \times n$ matrix with all entries equal to $1/n$.

Apply PCA to the modified kernel $\tilde{K}$.

Alexsei Tiulpin
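
A minimal scikit-learn sketch contrasting linear PCA with RBF kernel PCA on data lying on a nonlinear manifold (the concentric-circles toy set; the kernel and gamma are illustrative choices):

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: data on a nonlinear manifold that linear PCA cannot unfold.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

linear_scores = PCA(n_components=2).fit_transform(X)       # just a rotation of the plane
kernel_scores = KernelPCA(n_components=2, kernel="rbf",
                          gamma=10.0).fit_transform(X)

# Linear PCA leaves the two rings intertwined; in the RBF kernel PCA coordinates
# they are pulled apart, exposing the underlying low-dimensional structure.
print(linear_scores.shape, kernel_scores.shape)
```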

SLIDE 33

Summary

  • Dimensionality Reduction Methods

1. A process of reducing the number of predictor variables under consideration.
2. To find a more meaningful basis in which to express our data, filtering out the noise and revealing the hidden structure.

  • Principal Component Analysis

1. A powerful statistical tool for analyzing data sets, formulated in the language of linear algebra.
2. Spectral decomposition: we reduce the dimension of the predictors by keeping only the principal components with the largest eigenvalues.
3. PCA is based on strong assumptions that we need to check.
4. Kernel PCA extends PCA to nonlinear dimensionality reduction.

SLIDE 34

Advanced Section 4: Dimensionality Reduction, PCA

Thank you
