Geometric Data Analysis
Principal Component Analysis
MAT 6480W / STT 6705V
Guy Wolf guy.wolf@umontreal.ca
Université de Montréal, Fall 2019
Outline
1. Preprocessing for data simplification: sampling, aggregation, discretization, density estimation, dimensionality reduction
2. Principal component analysis (PCA): autoencoder, variance maximization, singular value decomposition (SVD)
Preprocessing for data simplification
Sampling
Select a subset of representative data points instead of processing the entire data. A sampled subset is useful only if its analysis yields the same patterns, results, conclusions, etc., as the analysis of the entire data.
[Figure: the same dataset shown with 8000, 2000, and 500 sampled points]
Common sampling approaches
- Random: each item has an equal probability of being selected.
  - Without replacement: selected items are iteratively removed from the population.
  - With replacement: selected items remain in the population and may be chosen again.
- Stratified: draw random samples from each partition of the data.
Choosing a sufficient sample size is often crucial for effective sampling (see the sketch below).
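A minimal sketch of these sampling schemes, assuming NumPy arrays and hypothetical group labels for the stratified case:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8000, 2))            # toy data: 8000 points in 2-D
groups = rng.integers(0, 4, size=8000)    # hypothetical group/stratum labels

# Random sampling without replacement: selected items are removed from the pool.
idx_without = rng.choice(len(X), size=500, replace=False)

# Random sampling with replacement: items may be selected more than once.
idx_with = rng.choice(len(X), size=500, replace=True)

# Stratified sampling: draw the same number of samples from each partition.
idx_strat = np.concatenate([
    rng.choice(np.where(groups == g)[0], size=125, replace=False)
    for g in np.unique(groups)
])

sample = X[idx_strat]
```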
Example
Choose enough samples to guarantee that at least one representative is selected from each distinct group/cluster/profile in the data.
Aggregation
Instead of sampling representative data points, we can coarse-grain the data by aggregating attributes or data points together.
Aggregation
Combining several attributes into a single feature, or several data points into a single observation.
Examples
- Change monthly revenues to annual revenues (see the sketch below)
- Analyze neighborhoods instead of houses
- Provide the average rating of a season (not per episode)
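A minimal pandas sketch of the first example, using a hypothetical table with one row per store per month:

```python
import pandas as pd

# Hypothetical monthly revenue records (one row per store per month).
monthly = pd.DataFrame({
    "store":   ["A", "A", "A", "B", "B", "B"],
    "year":    [2019, 2019, 2019, 2019, 2019, 2019],
    "month":   [1, 2, 3, 1, 2, 3],
    "revenue": [100.0, 120.0, 90.0, 80.0, 85.0, 95.0],
})

# Aggregate monthly revenues into annual revenues: many rows become one observation.
annual = monthly.groupby(["store", "year"], as_index=False)["revenue"].sum()
print(annual)
```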
Discretization
It is sometimes convenient to transform the entire data to nominal (or ordinal) attributes.
Discretization
Transformation of continuous attributes (or ones with an infinite range) to discrete ones with a finite range. Discretization can be done in a supervised manner (e.g., using class labels) or in an unsupervised manner (e.g., using clustering).
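A minimal sketch of unsupervised discretization by equal-width and equal-frequency binning, assuming NumPy (a supervised variant would instead choose the cut points using class labels):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=50.0, scale=10.0, size=1000)   # a continuous attribute

# Equal-width binning into 5 discrete values (unsupervised discretization).
edges = np.linspace(x.min(), x.max(), num=6)       # 5 bins -> 6 edges
x_equal_width = np.digitize(x, edges[1:-1])        # ordinal codes 0..4

# Equal-frequency binning: each bin holds roughly the same number of points.
quantile_edges = np.quantile(x, [0.2, 0.4, 0.6, 0.8])
x_equal_freq = np.digitize(x, quantile_edges)      # ordinal codes 0..4
```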
Supervised discretization based on minimizing impurity:
[Figure: discretized data with 3 values per axis and with 5 values per axis]
Unsupervised discretization: [Figure]
Density estimation
Transforming attributes from raw values to densities can be used to coarse-grain the data and bring its features to comparable scales between zero and one.
Cell-based density estimation: [Figure]
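A minimal sketch of cell-based density estimation on a 2-D grid, assuming NumPy; each point is assigned the count of its cell, scaled to [0, 1]:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))                      # 2-D data

# Partition the plane into a grid of cells and count points per cell.
bins = 20
counts, xedges, yedges = np.histogram2d(X[:, 0], X[:, 1], bins=bins)

# Map each point to the density of its cell, scaled to [0, 1].
ix = np.clip(np.digitize(X[:, 0], xedges) - 1, 0, bins - 1)
iy = np.clip(np.digitize(X[:, 1], yedges) - 1, 0, bins - 1)
density = counts[ix, iy] / counts.max()
```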
Center-based density estimation: [Figure]
Dimensionality reduction
Dimensionality of data is generally determined by the number of attributes or features that represent each data point.
Curse of dimensionality
A general term for various phenomena that arise when analyzing and processing high-dimensional data.
Common theme - statistical significance is difficult, impractical, or even impossible to obtain due to the sparsity of the data in high dimensions. This causes poor performance of classical statistical methods compared to low-dimensional data.
Common solution - reduce the dimensionality of the data as part of its (pre)processing.
There are several approaches to represent the data in a lower dimension, which can generally be split into two types:
Dimensionality reduction approaches
- Feature selection/weighting: select a subset of existing features and only use them in the analysis, while possibly also assigning them importance weights, to eliminate redundant information.
- Feature extraction/construction: create new features by extracting relevant information from the original features.
PCA and MDS are two of the most common dimensionality reduction methods in data analysis, but many others exist as well.
Feature subset selection
Ideally: choose the best feature subset out of all possible subsets.
Feature selection approaches
- Embedded methods: choose the best features for a task as part of running the data mining algorithm itself.
- Filter methods: choose features that optimize a general criterion (e.g., minimal correlation) as part of data preprocessing, using an efficient search algorithm.
- Wrapper methods: first formulate and handle a data mining task to select features, and then use the resulting subset to solve the real task.
Alternatively, expert knowledge can sometimes be used to eliminate redundant and unnecessary features. (A small filter-method sketch follows below.)
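A minimal sketch of a filter method, assuming NumPy: drop one feature from every highly correlated pair before any downstream analysis (the 0.95 threshold is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=500)     # feature 3 is redundant

# Filter criterion: absolute pairwise correlation between features.
corr = np.abs(np.corrcoef(X, rowvar=False))

keep = []
for j in range(X.shape[1]):
    # Keep feature j only if it is not highly correlated with an already kept one.
    if all(corr[j, k] < 0.95 for k in keep):
        keep.append(j)

X_selected = X[:, keep]                              # feature 3 gets dropped
```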
Principal component analysis
Given centered data (avg = 0), find the best k-dimensional projection.
Projection on principal components:
[Figure: data points and their principal components]
[Figure: data in a 3D space projected onto the 1D space spanned by the leading principal component $\lambda_1 \phi_1$]
What is the best projection?
Find a subspace $S \subseteq \mathbb{R}^n$ s.t. $\dim(S) = k$ and the data is well approximated by $\hat{x} = \mathrm{proj}_S\, x$.
⇓
Find a subspace $S \subseteq \mathbb{R}^n$ s.t. $S = \mathrm{span}\{u_1, \ldots, u_k\}$ and $\|x - \hat{x}\|$ is minimal over the data, with $\hat{x} = \mathrm{proj}_S\, x$.
⇓
Find $k$ vectors $u_1, \ldots, u_k$ s.t. $N^{-1} \sum_{i=1}^{N} \|x_i - \hat{x}_i\|^2$ is minimal, with $\hat{x}_i = \mathrm{proj}_{\mathrm{span}\{u_1,\ldots,u_k\}}\, x_i$.
How do we find these vectors $u_1, \ldots, u_k$?
Autoencoder
Minimize $N^{-1} \sum_{i=1}^{N} \|x_i - \hat{x}_i\|^2$ s.t. $\hat{x}_i = \mathrm{proj}_{\mathrm{span}\{u_1,\ldots,u_k\}}\, x_i$.
[Figure: a linear autoencoder. Input layer $x[1], \ldots, x[5]$; hidden layer $h[1], h[2], h[3]$ computed as $h_i = W x_i$; output layer $\hat{x}[1], \ldots, \hat{x}[5]$ computed as $\hat{x}_i = U h_i$.]
$\arg\min_{W \in \mathbb{R}^{k \times n},\, U \in \mathbb{R}^{n \times k}} \sum_{i=1}^{N} \|x_i - U W x_i\|^2$
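A minimal NumPy sketch of this linear autoencoder, trained with plain gradient descent (the hyperparameters are arbitrary illustrative choices); with enough iterations the learned subspace approaches the PCA subspace:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, N = 5, 2, 1000
X = rng.normal(size=(N, n)) @ rng.normal(size=(n, n))   # correlated toy data
X -= X.mean(axis=0)                                      # center the data

W = 0.01 * rng.normal(size=(k, n))   # encoder:  h_i = W x_i
U = 0.01 * rng.normal(size=(n, k))   # decoder:  x_hat_i = U h_i

lr = 1e-3
for _ in range(2000):
    H = X @ W.T                      # hidden representations, N x k
    Xhat = H @ U.T                   # reconstructions, N x n
    R = Xhat - X                     # residuals
    # Gradients of (1/N) * sum_i ||x_i - U W x_i||^2
    grad_U = 2.0 / N * R.T @ H
    grad_W = 2.0 / N * (U.T @ R.T @ X)
    U -= lr * grad_U
    W -= lr * grad_W

loss = np.mean(np.sum((X - X @ W.T @ U.T) ** 2, axis=1))
```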
Reconstruction error minimization
We only need to consider orthonormal vectors $u_1, \ldots, u_k$ (i.e., $\|u_i\| = 1$ and $\langle u_i, u_j \rangle = 0$ for $i \neq j$) that form a basis for the subspace. We can then extend this set to a basis $u_1, \ldots, u_n$ of the entire $\mathbb{R}^n$. Then, we can write
$x = \sum_{j=1}^{n} \langle x, u_j \rangle u_j = \sum_{j=1}^{n} u_j u_j^T x$ and $\mathrm{proj}_{\mathrm{span}\{u_1,\ldots,u_k\}}\, x = \sum_{j=1}^{k} u_j u_j^T x$.
We now consider the reconstruction error $N^{-1} \sum_{i=1}^{N} \|x_i - \hat{x}_i\|^2$.
First, notice that $x - \hat{x} = \sum_{j=1}^{n} u_j u_j^T x - \sum_{j=1}^{k} u_j u_j^T x = \sum_{j=k+1}^{n} u_j u_j^T x$.
⇓
$\|x - \hat{x}\|^2 = \sum_{q=1}^{n} \Big(\sum_{j=k+1}^{n} u_j[q]\, u_j^T x\Big)^2 = \sum_{j=k+1}^{n} \sum_{j'=k+1}^{n} \Big(\sum_{q=1}^{n} u_j[q]\, u_{j'}[q]\Big)(u_j^T x)(u_{j'}^T x) = \sum_{j=k+1}^{n} (u_j^T x)^2 = \sum_{j=1}^{n} (u_j^T x)^2 - \sum_{j=1}^{k} (u_j^T x)^2 = \|x\|^2 - \|\hat{x}\|^2$
⇓
Minimizing the reconstruction error is equivalent to maximizing
$N^{-1} \sum_{i=1}^{N} \|\hat{x}_i\|^2 = \sum_{j=1}^{k} N^{-1} \sum_{i=1}^{N} (u_j^T x_i)^2 = \sum_{j=1}^{k} \mathrm{variance}(u_j^T x)$
Variance maximization
Find a direction that maximizes the variance in the projected data.
⇓
Find a unit vector $u \in \mathbb{R}^n$ that maximizes:
$\mathrm{variance}(u^T x) = N^{-1} \sum_{i=1}^{N} (u^T x_i)^2 = N^{-1} \sum_{i=1}^{N} (u^T x_i)(x_i^T u) = u^T \Big(N^{-1} \sum_{i=1}^{N} x_i x_i^T\Big) u = u^T \Sigma u$,
where $\Sigma$ is the covariance matrix.
⇓
Solve the maximization problem: maximize $u^T \Sigma u$ s.t. $\|u\| = 1$.
Apply the Lagrange multipliers method:
$f(u, \alpha) = u^T \Sigma u + \alpha(1 - u^T u)$
$\nabla_u f(u, \alpha) = 2(\Sigma u - \alpha u)$
$\nabla_u f(u, \alpha) = 0 \Rightarrow \Sigma u = \alpha u$
Therefore, $u$ is an eigenvector of $\Sigma$ with eigenvalue $\alpha$, which has to be the maximal eigenvalue in order to maximize $u^T \Sigma u = \alpha$.
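A small NumPy check of this conclusion: the variance $u^T \Sigma u$ captured by any unit vector never exceeds the largest eigenvalue, which is attained by its eigenvector.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))
X -= X.mean(axis=0)
Sigma = X.T @ X / len(X)                     # covariance matrix

evals, evecs = np.linalg.eigh(Sigma)         # eigenvalues in ascending order
u1 = evecs[:, -1]                            # eigenvector of the largest eigenvalue

# Variance captured by random unit vectors never exceeds the top eigenvalue.
for _ in range(1000):
    u = rng.normal(size=4)
    u /= np.linalg.norm(u)
    assert u @ Sigma @ u <= evals[-1] + 1e-9

assert np.isclose(u1 @ Sigma @ u1, evals[-1])
```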
Similarly, a second direction is found via: maximize $u_2^T \Sigma u_2$ s.t. $\|u_2\| = 1$ and $\langle u_2, u_1 \rangle = 0$.
Apply the Lagrange multipliers method:
$f(u_2, \alpha, \beta) = u_2^T \Sigma u_2 + \alpha(1 - u_2^T u_2) - \beta\, u_2^T u_1$
$\nabla_{u_2} f(u_2, \alpha, \beta) = 2(\Sigma u_2 - \alpha u_2) - \beta u_1$
Multiplying $\nabla_{u_2} f(u_2, \alpha, \beta) = 0$ by $u_1^T$ gives $\beta = 0$, and then $\nabla_{u_2} f(u_2, \alpha, \beta) = 0 \Rightarrow \Sigma u_2 = \alpha u_2$.
Therefore, $u_2$ is an eigenvector of $\Sigma$ with the second largest eigenvalue.
Eigendecomposition and SVD
[Figure: the covariance matrix (features × features) computed from the data matrix (data points × features), with entries]
$\mathrm{cov}(q_1, q_2) = \sum_i x_i[q_1] \cdot x_i[q_2]$ (for centered data, up to normalization by $N$).
[Figure: eigendecomposition of the covariance matrix: an eigenvalue $\lambda_i$ and its eigenvector $\phi_i$ satisfy $\Sigma \phi_i = \lambda_i \phi_i$.]
[Figure: SVD of the covariance matrix as a product of singular vectors and singular values.]
The spectral theorem applies to covariance matrices, so SVD (Singular Value Decomposition) recovers their eigendecomposition; spectral theorem: $\mathrm{cov}(q_1, q_2) = \sum_i \lambda_i\, \phi_i[q_1]\, \phi_i[q_2]$.
Singular value decomposition
Any matrix $M \in \mathbb{R}^{n \times k}$ can be decomposed via $U, S, V \leftarrow \mathrm{SVD}(M)$ as $M = U S V^T$, where $U$ is $n \times n$, $S$ is an $n \times k$ diagonal matrix, and $V$ is $k \times k$.
- The singular values in $S$ are the square roots of the (nonnegative) eigenvalues of both $M M^T$ and $M^T M$.
- The singular vectors in (the columns of) $U$ are the eigenvectors of $M M^T$.
- The singular vectors in (the columns of) $V$ are the eigenvectors of $M^T M$.
Proof and more details about SVD can be found on Wikipedia.
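A small NumPy sketch verifying these relations for an arbitrary matrix $M$:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(6, 4))                  # M in R^{n x k}

U, s, Vt = np.linalg.svd(M)                  # M = U @ diag(s) @ Vt

# Singular values are the square roots of the eigenvalues of M M^T and M^T M.
eig_MMt = np.sort(np.linalg.eigvalsh(M @ M.T))[::-1]
eig_MtM = np.sort(np.linalg.eigvalsh(M.T @ M))[::-1]
assert np.allclose(s ** 2, eig_MMt[: len(s)])
assert np.allclose(s ** 2, eig_MtM[: len(s)])

# Columns of U (resp. V) are eigenvectors of M M^T (resp. M^T M).
assert np.allclose(M @ M.T @ U[:, 0], s[0] ** 2 * U[:, 0])
assert np.allclose(M.T @ M @ Vt[0], s[0] ** 2 * Vt[0])
```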
[Figure: scree plot of the eigenvalues $\lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5$.]
A decaying covariance spectrum reveals the (low) dimensionality of the data.
[Figure: the covariance matrix factored into its eigenvectors and eigenvalues.]
The covariance matrix can be approximated by a truncated SVD.
Trivial example
Consider the simple case of data points that all lie on the same high-dimensional line:
- The straight line is defined by a unit vector $\psi$.
- Points on the line are obtained by multiplying $\psi$ by scalars, so the points can be written as $x_i = c_i \psi$.
- Covariance: $\mathrm{cov}(t_1, t_2) = \sum_i x_i[t_1]\, x_i[t_2] = \sum_i c_i \psi[t_1]\, c_i \psi[t_2] = \big(\sum_i c_i^2\big)\, \psi[t_1]\, \psi[t_2] = c^2\, \psi[t_1]\, \psi[t_2]$, where $c^2 = \sum_i c_i^2$.
[Figure: the covariance matrix as the rank-one product $c^2\, \psi \psi^T$.]
The covariance matrix has a single (nonzero) eigenvalue $c^2$ and a single eigenvector $\psi$, which defines the principal direction of the data-point vectors.
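A small NumPy check of this trivial example, with points $x_i = c_i \psi$ on a line:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
psi = rng.normal(size=n)
psi /= np.linalg.norm(psi)                   # unit vector defining the line

c = rng.normal(size=200)                     # scalars c_i
X = np.outer(c, psi)                         # points x_i = c_i * psi

Sigma = X.T @ X                              # unnormalized covariance, as on the slide
evals, evecs = np.linalg.eigh(Sigma)

# Single nonzero eigenvalue c^2 = sum_i c_i^2, with eigenvector psi (up to sign).
assert np.isclose(evals[-1], np.sum(c ** 2))
assert np.allclose(np.abs(evecs[:, -1]), np.abs(psi))
assert np.allclose(evals[:-1], 0.0)
```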
[Figure: the data points on the line together with the leading eigenvector $\phi_1 = \psi$; the principal components $\lambda_1 \phi_1$ and $\lambda_2 \phi_2$ are drawn as arrows.]
Length: eigenvalues. Direction: eigenvectors.
Principal components ⇒ maximal-variance directions.
PCA algorithm:
1. Centering
2. Covariance
3. SVD (or eigendecomposition)
4. Projection (see the sketch below)
Alternative method: Multi-Dimensional Scaling (MDS) - preserve distances/inner products with a minimal set of coordinates.
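A minimal NumPy sketch of these four steps, written as a standalone function for illustration:

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto their k leading principal components."""
    # 1. Centering: subtract the mean of each feature.
    Xc = X - X.mean(axis=0)
    # 2. Covariance matrix of the centered data.
    Sigma = Xc.T @ Xc / len(Xc)
    # 3. SVD (or eigendecomposition) of the covariance matrix.
    U, S, _ = np.linalg.svd(Sigma)
    # 4. Projection onto the k leading principal components.
    return Xc @ U[:, :k], U[:, :k], S[:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))
Y, components, eigenvalues = pca(X, k=2)     # Y is the 2-D representation
```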
Summary
- Preprocessing steps are crucial in preparing data for meaningful analysis.
- Linear dimensionality reduction helps alleviate the curse of dimensionality.
- Principal Component Analysis (PCA) is a standard dimensionality reduction approach:
  - Based on projecting data onto the leading eigenvectors of the covariance matrix.
  - Minimizes the reconstruction error of the projection.
  - Equivalently, finds a subspace that maximizes the captured variance.
  - In practice, SVD is used instead of eigendecomposition.