SLIDE 1

Data Mining and Machine Learning: Fundamental Concepts and Algorithms

dataminingbook.info

Mohammed J. Zaki¹   Wagner Meira Jr.²

¹Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA

²Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Chapter 7: Dimensionality Reduction

SLIDE 2

Dimensionality Reduction

The goal of dimensionality reduction is to find a lower-dimensional representation of the data matrix D, to avoid the curse of dimensionality.

Given the n × d data matrix, each point xi = (xi1, xi2, ..., xid)^T is a vector in the ambient d-dimensional vector space spanned by the d standard basis vectors e1, e2, ..., ed. Given any other set of d orthonormal vectors u1, u2, ..., ud, we can re-express each point x as

x = a1u1 + a2u2 + ··· + adud

where a = (a1, a2, ..., ad)^T represents the coordinates of x in the new basis. More compactly, x = Ua, where U is the d × d orthogonal matrix whose ith column comprises the ith basis vector ui. Thus U^{-1} = U^T, and we have a = U^T x.
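As an illustration of this change of basis (a small numpy sketch added here, not part of the original slides), the snippet below builds an arbitrary orthonormal basis U, computes the coordinates a = U^T x, and checks that Ua reconstructs x:

import numpy as np

rng = np.random.default_rng(0)
d = 4

# Columns of Q from a QR factorization form an orthonormal basis u1, ..., ud.
U, _ = np.linalg.qr(rng.normal(size=(d, d)))

x = rng.normal(size=d)      # a point expressed in the standard basis
a = U.T @ x                 # coordinates of x in the new basis: a = U^T x

print(np.allclose(U @ a, x))            # x = Ua reconstructs the point
print(np.allclose(U.T @ U, np.eye(d)))  # U is orthogonal: U^T U = I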

SLIDE 3

Optimal Basis: Projection in Lower Dimensional Space

There are potentially infinitely many choices for the orthonormal basis vectors. Our goal is to choose an optimal basis that preserves essential information about D. We are interested in finding the optimal r-dimensional representation of D, with r ≪ d.

The projection of x onto the first r basis vectors is given as

x′ = a1u1 + a2u2 + ··· + arur = Σ_{i=1}^r ai ui = Ur ar

where Ur and ar comprise the r basis vectors and coordinates, respectively. Also, restricting a = U^T x to the first r terms, we have ar = Ur^T x.

The r-dimensional projection of x is thus given as

x′ = Ur Ur^T x = Pr x

where Pr = Ur Ur^T = Σ_{i=1}^r ui ui^T is the orthogonal projection matrix for the subspace spanned by the first r basis vectors.
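The projection can be sketched in numpy as well (illustrative code, not from the slides): form Pr = Ur Ur^T from the first r basis vectors and project a point.

import numpy as np

rng = np.random.default_rng(1)
d, r = 5, 2

U, _ = np.linalg.qr(rng.normal(size=(d, d)))  # full orthonormal basis
Ur = U[:, :r]                                 # first r basis vectors

Pr = Ur @ Ur.T               # orthogonal projection matrix Pr = Ur Ur^T
x = rng.normal(size=d)

ar = Ur.T @ x                # r-dimensional coordinates ar = Ur^T x
x_proj = Ur @ ar             # projection x' = Ur ar
print(np.allclose(x_proj, Pr @ x))   # same as x' = Pr x
print(np.allclose(Pr @ Pr, Pr))      # Pr is idempotent, as a projection matrix must be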

SLIDE 4

Optimal Basis: Error Vector

Given the projected vector x′ = Pr x, the corresponding error vector is the projection onto the remaining d − r basis vectors:

ε = Σ_{i=r+1}^d ai ui = x − x′

The error vector ε is orthogonal to x′. The goal of dimensionality reduction is to seek an r-dimensional basis that gives the best possible approximation x′i over all the points xi ∈ D. Alternatively, we seek to minimize the error εi = xi − x′i over all the points.
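Continuing the same illustrative setup, the orthogonality of the error vector can be checked directly (a sketch, not from the slides):

import numpy as np

rng = np.random.default_rng(2)
d, r = 5, 2
U, _ = np.linalg.qr(rng.normal(size=(d, d)))
Ur = U[:, :r]

x = rng.normal(size=d)
x_proj = Ur @ (Ur.T @ x)     # x' = Ur Ur^T x
eps = x - x_proj             # error vector ε = x − x'

print(np.isclose(eps @ x_proj, 0.0))   # ε is orthogonal to x'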

SLIDE 5

Iris Data: Optimal One-dimensional Basis

[Figure: Iris data (3D; axes X1, X2, X3) with the optimal one-dimensional basis u1.]

SLIDE 6

Iris Data: Optimal 2D Basis

[Figure: Iris data (3D; axes X1, X2, X3) with the optimal two-dimensional basis u1, u2.]

SLIDE 7

Principal Component Analysis

Principal Component Analysis (PCA) is a technique that seeks an r-dimensional basis that best captures the variance in the data. The direction with the largest projected variance is called the first principal component. The orthogonal direction that captures the second largest projected variance is called the second principal component, and so on. The direction that maximizes the variance is also the one that minimizes the mean squared error.

SLIDE 8

Principal Component: Direction of Most Variance

We seek to find the unit vector u that maximizes the projected variance of the points. Let D be centered, and let Σ be its covariance matrix.

The projection of xi on u is given as

x′i = ((u^T xi) / (u^T u)) u = (u^T xi) u = ai u

Across all the points, the projected variance along u is

σ_u^2 = (1/n) Σ_{i=1}^n (ai − µu)^2 = (1/n) Σ_{i=1}^n u^T xi xi^T u = u^T ( (1/n) Σ_{i=1}^n xi xi^T ) u = u^T Σ u

since D is centered and hence µu = 0. We have to find the optimal basis vector u that maximizes the projected variance σ_u^2 = u^T Σ u, subject to the constraint that u^T u = 1. Introducing a Lagrange multiplier α for this constraint, the maximization objective is given as

max_u J(u) = u^T Σ u − α(u^T u − 1)

SLIDE 9

Principal Component: Direction of Most Variance

Given the objective max_u J(u) = u^T Σ u − α(u^T u − 1), we solve it by setting the derivative of J(u) with respect to u to the zero vector, obtaining

∂/∂u ( u^T Σ u − α(u^T u − 1) ) = 0

that is, 2Σu − 2αu = 0, which implies Σu = αu.

Thus α is an eigenvalue of the covariance matrix Σ, with the associated eigenvector u. Taking the dot product with u on both sides, we have

σ_u^2 = u^T Σ u = u^T αu = α u^T u = α

To maximize the projected variance σ_u^2, we thus choose the largest eigenvalue λ1 of Σ, and the dominant eigenvector u1 specifies the direction of most variance, also called the first principal component.
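This eigenvalue characterization is easy to check numerically. The sketch below (synthetic data, since the slides' Iris matrix is not reproduced here) computes Σ, takes its dominant eigenvector, and confirms that no random unit vector achieves a larger projected variance:

import numpy as np

rng = np.random.default_rng(3)
D = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.0, 0.3]])
Z = D - D.mean(axis=0)                 # center the data
Sigma = (Z.T @ Z) / len(Z)             # covariance matrix

evals, evecs = np.linalg.eigh(Sigma)   # eigh returns eigenvalues in ascending order
lam1, u1 = evals[-1], evecs[:, -1]     # largest eigenvalue and its eigenvector

print(np.isclose(u1 @ Sigma @ u1, lam1))   # projected variance along u1 equals λ1

for _ in range(1000):                      # no random unit vector should do better
    v = rng.normal(size=3)
    v /= np.linalg.norm(v)
    assert v @ Sigma @ v <= lam1 + 1e-12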

SLIDE 10

Iris Data: First Principal Component

[Figure: Iris data (3D; axes X1, X2, X3) with the first principal component u1.]

SLIDE 11

Minimum Squared Error Approach

The direction that maximizes the projected variance is also the one that minimizes the average squared error. The mean squared error (MSE) optimization condition is

MSE(u) = (1/n) Σ_{i=1}^n ‖εi‖^2 = (1/n) Σ_{i=1}^n ‖xi − x′i‖^2 = Σ_{i=1}^n ‖xi‖^2 / n − u^T Σ u

Since the first term is fixed for a dataset D, we see that the direction u1 that maximizes the variance is also the one that minimizes the MSE. Further,

Σ_{i=1}^n ‖xi‖^2 / n = var(D) = tr(Σ) = Σ_{i=1}^d σi^2

Thus, for the direction u1 that minimizes the MSE, we have

MSE(u1) = var(D) − u1^T Σ u1 = var(D) − λ1
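The identity MSE(u1) = var(D) − λ1 can be verified numerically on any centered dataset; a short sketch (synthetic data, illustrative only):

import numpy as np

rng = np.random.default_rng(4)
Z = rng.normal(size=(500, 3)) * np.array([2.0, 1.0, 0.5])
Z = Z - Z.mean(axis=0)                 # center
Sigma = (Z.T @ Z) / len(Z)

evals, evecs = np.linalg.eigh(Sigma)
lam1, u1 = evals[-1], evecs[:, -1]

X_proj = Z @ np.outer(u1, u1)                     # x'_i for every point (rows)
mse = np.mean(np.sum((Z - X_proj) ** 2, axis=1))  # (1/n) Σ ||x_i − x'_i||²
var_D = np.trace(Sigma)                           # var(D) = tr(Σ)

print(np.isclose(mse, var_D - lam1))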

SLIDE 12

Best 2-dimensional Approximation

The best 2D subspace that captures the most variance in D comprises the eigenvectors u1 and u2 corresponding to the largest and second largest eigenvalues λ1 and λ2, respectively.

Let U2 = (u1 u2) be the matrix whose columns correspond to the two principal components. Given a point xi ∈ D, its projected coordinates are computed as

ai = U2^T xi

Let A denote the projected 2D dataset. The total projected variance for A is given as

var(A) = u1^T Σ u1 + u2^T Σ u2 = u1^T λ1 u1 + u2^T λ2 u2 = λ1 + λ2

The first two principal components also minimize the mean squared error objective, since

MSE = (1/n) Σ_{i=1}^n ‖xi − x′i‖^2 = var(D) − (1/n) Σ_{i=1}^n xi^T P2 xi = var(D) − var(A)

SLIDE 13

Optimal and Non-optimal 2D Approximations

The optimal subspace maximizes the variance, and minimizes the squared error, whereas the nonoptimal subspace captures less variance, and has a high mean squared error value, as seen from the lengths of the error vectors (line segments).

[Figure: Iris data (axes X1, X2, X3) projected onto the optimal 2D basis (u1, u2) and onto a nonoptimal 2D basis; the error vectors are shown as line segments.]

SLIDE 14

Best r-dimensional Approximation

To find the best r-dimensional approximation to D, we compute the eigenvalues of Σ. Because Σ is positive semidefinite, its eigenvalues are non-negative:

λ1 ≥ λ2 ≥ ··· ≥ λr ≥ λr+1 ≥ ··· ≥ λd ≥ 0

We select the r largest eigenvalues, and their corresponding eigenvectors, to form the best r-dimensional approximation.

Total Projected Variance: Let Ur = (u1 ··· ur) be the basis vector matrix, with the projection matrix given as Pr = Ur Ur^T = Σ_{i=1}^r ui ui^T.

Let A denote the dataset formed by the coordinates of the projected points in the r-dimensional subspace. The projected variance is given as

var(A) = (1/n) Σ_{i=1}^n xi^T Pr xi = Σ_{i=1}^r ui^T Σ ui = Σ_{i=1}^r λi

Mean Squared Error: The mean squared error in r dimensions is

MSE = (1/n) Σ_{i=1}^n ‖xi − x′i‖^2 = var(D) − Σ_{i=1}^r λi = Σ_{i=1}^d λi − Σ_{i=1}^r λi
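Both identities (var(A) = Σ_{i=1}^r λi and MSE = Σ_{i=r+1}^d λi) can be checked numerically; a sketch on synthetic centered data (not the slides' Iris data):

import numpy as np

rng = np.random.default_rng(5)
Z = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))
Z = Z - Z.mean(axis=0)                        # center
n, d = Z.shape
Sigma = (Z.T @ Z) / n

evals, evecs = np.linalg.eigh(Sigma)
evals, evecs = evals[::-1], evecs[:, ::-1]    # sort eigenpairs in decreasing order

r = 2
Ur = evecs[:, :r]
A = Z @ Ur                                    # projected r-dimensional coordinates

var_A = np.trace((A.T @ A) / n)               # projected variance
mse = np.mean(np.sum((Z - A @ Ur.T) ** 2, axis=1))

print(np.isclose(var_A, evals[:r].sum()))     # var(A) = λ1 + ··· + λr
print(np.isclose(mse, evals[r:].sum()))       # MSE = λ_{r+1} + ··· + λd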

SLIDE 15

Choosing the Dimensionality

One criterion for choosing r is to compute the fraction of the total variance captured by the first r principal components:

f(r) = (λ1 + λ2 + ··· + λr) / (λ1 + λ2 + ··· + λd) = Σ_{i=1}^r λi / Σ_{i=1}^d λi = Σ_{i=1}^r λi / var(D)

Given a desired variance threshold α, starting from the first principal component we keep adding components, and stop at the smallest value of r for which f(r) ≥ α. In other words, we select the fewest dimensions such that the subspace spanned by those r dimensions captures at least an α fraction (say 0.9) of the total variance.
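This stopping rule is a one-liner once the eigenvalues are sorted in decreasing order; a sketch using the Iris eigenvalues reported later in the deck:

import numpy as np

evals = np.array([3.662, 0.239, 0.059])   # λ1, λ2, λ3 in decreasing order
alpha = 0.95

f = np.cumsum(evals) / evals.sum()        # f(r) for r = 1, ..., d
r = int(np.searchsorted(f, alpha) + 1)    # smallest r with f(r) ≥ α
print(r, f)                               # r = 2, f ≈ [0.925, 0.985, 1.0]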

SLIDE 16

Principal Component Analysis: Algorithm

PCA (D, α):
1. µ = (1/n) Σ_{i=1}^n xi // compute mean
2. Z = D − 1·µ^T // center the data
3. Σ = (1/n) Z^T Z // compute covariance matrix
4. (λ1, λ2, ..., λd) = eigenvalues(Σ) // compute eigenvalues
5. U = (u1 u2 ··· ud) = eigenvectors(Σ) // compute eigenvectors
6. f(r) = Σ_{i=1}^r λi / Σ_{i=1}^d λi, for all r = 1, 2, ..., d // fraction of total variance
7. Choose smallest r so that f(r) ≥ α // choose dimensionality
8. Ur = (u1 u2 ··· ur) // reduced basis
9. A = {ai | ai = Ur^T xi, for i = 1, ..., n} // reduced-dimensionality data
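A direct Python/numpy transcription of this pseudocode (a sketch; the function and variable names are mine, not from the book):

import numpy as np

def pca(D, alpha):
    """PCA as in the pseudocode above: returns the reduced basis Ur, the coordinates A, and the eigenvalues."""
    n, d = D.shape
    mu = D.mean(axis=0)                          # 1: compute mean
    Z = D - mu                                   # 2: center the data
    Sigma = (Z.T @ Z) / n                        # 3: covariance matrix
    evals, evecs = np.linalg.eigh(Sigma)         # 4-5: eigenvalues and eigenvectors (ascending)
    evals, evecs = evals[::-1], evecs[:, ::-1]   #      reorder in decreasing order
    f = np.cumsum(evals) / evals.sum()           # 6: fraction of total variance
    r = int(np.searchsorted(f, alpha) + 1)       # 7: smallest r with f(r) >= alpha
    Ur = evecs[:, :r]                            # 8: reduced basis
    A = Z @ Ur                                   # 9: reduced-dimensionality data, a_i = Ur^T x_i
    return Ur, A, evals

# Example usage on random data:
D = np.random.default_rng(6).normal(size=(100, 4))
Ur, A, evals = pca(D, alpha=0.9)
print(Ur.shape, A.shape)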

SLIDE 17

Iris Principal Components

Covariance matrix:

Σ =
   0.681  −0.039   1.265
  −0.039   0.187  −0.320
   1.265  −0.320   3.092

The eigenvalues and eigenvectors of Σ are

λ1 = 3.662   λ2 = 0.239   λ3 = 0.059
u1 = (−0.390, 0.089, −0.916)^T   u2 = (−0.639, −0.742, 0.200)^T   u3 = (−0.663, 0.664, 0.346)^T

The total variance is therefore λ1 + λ2 + λ3 = 3.662 + 0.239 + 0.059 = 3.96. The fraction of the total variance for different values of r is given as

r      1      2      3
f(r)   0.925  0.985  1.0

Thus r = 2 principal components are needed to capture at least an α = 0.95 fraction of the total variance.

SLIDE 18

Iris Data: Optimal 3D PC Basis

[Figure: Iris data (3D; axes X1, X2, X3) with the full PC basis u1, u2, u3.]

SLIDE 19

Iris Principal Components: Projected Data (2D)

[Figure: Iris data projected onto the first two principal components (axes u1 and u2).]

SLIDE 20

Geometry of PCA

Geometrically, when r = d, PCA corresponds to an orthogonal change of basis, so that the total variance is captured by the sum of the variances along each of the principal directions u1, u2, ..., ud, and furthermore, all covariances are zero.

Let U be the d × d orthogonal matrix U = (u1 u2 ··· ud), with U^{-1} = U^T, and let Λ = diag(λ1, ..., λd) be the diagonal matrix of eigenvalues. Each principal component ui corresponds to an eigenvector of the covariance matrix Σ:

Σ ui = λi ui for all 1 ≤ i ≤ d

which can be written compactly in matrix notation as ΣU = UΛ, which implies Σ = UΛU^T. Thus, Λ represents the covariance matrix in the new PC basis.

In the new PC basis, the equation x^T Σ^{-1} x = 1 defines a d-dimensional ellipsoid (or hyper-ellipse). The eigenvectors ui of Σ, that is, the principal components, are the directions of the principal axes of the ellipsoid. The square roots of the eigenvalues, that is, √λi, give the lengths of the semi-axes.
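A sketch verifying the diagonalization Σ = UΛU^T, and that the covariance of the data re-expressed in the PC basis is exactly Λ (synthetic data, for illustration):

import numpy as np

rng = np.random.default_rng(7)
Z = rng.normal(size=(400, 3)) @ rng.normal(size=(3, 3))
Z = Z - Z.mean(axis=0)
Sigma = (Z.T @ Z) / len(Z)

evals, U = np.linalg.eigh(Sigma)
Lam = np.diag(evals)

print(np.allclose(Sigma, U @ Lam @ U.T))         # Σ = U Λ U^T

A = Z @ U                                        # data in the PC basis
Sigma_pc = (A.T @ A) / len(A)
print(np.allclose(Sigma_pc, Lam, atol=1e-10))    # covariance in the PC basis is diagonal (all covariances zero)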

SLIDE 21

Iris: Elliptic Contours in Standard Basis

[Figure: Iris data in the standard basis with elliptic contours; the principal axes u1, u2, u3 are shown.]

SLIDE 22

Iris: Axis-Parallel Ellipsoid in PC Basis

[Figure: Iris data in the PC basis (u1, u2, u3), where the ellipsoid is axis-parallel.]

SLIDE 23

Kernel Principal Component Analysis

Principal component analysis can be extended to find nonlinear “directions” in the data using kernel methods. Kernel PCA finds the directions of most variance in the feature space instead of the input space. Using the kernel trick, all PCA operations can be carried out in terms of the kernel function in input space, without having to transform the data into feature space.

Let φ be a function that maps a point xi in input space to its image φ(xi) in feature space. Let the points in feature space be centered, and let Σφ be the covariance matrix. The first PC in feature space corresponds to the dominant eigenvector:

Σφ u1 = λ1 u1 where Σφ = (1/n) Σ_{i=1}^n φ(xi) φ(xi)^T

SLIDE 24

Kernel Principal Component Analysis

It can be shown that u1 = Σ_{i=1}^n ci φ(xi). That is, the PC direction in feature space is a linear combination of the transformed points, with the coefficients captured in the weight vector c = (c1, c2, ..., cn)^T.

Substituting into the eigen-decomposition of Σφ and simplifying, we get

Kc = nλ1 c = η1 c

Thus, the weight vector c is the eigenvector corresponding to the largest eigenvalue η1 of the kernel matrix K.

SLIDE 25

Kernel Principal Component Analysis

The weight vector c can then be used to find u1 via u1 = Σ_{i=1}^n ci φ(xi). The only constraint we impose is that u1 should be normalized to be a unit vector, which implies ‖c‖^2 = 1/η1.

We cannot compute the principal direction u1 directly, but we can project any point φ(x) onto it as follows:

u1^T φ(x) = Σ_{i=1}^n ci φ(xi)^T φ(x) = Σ_{i=1}^n ci K(xi, x)

which requires only kernel operations. We can obtain the additional principal components by solving for the other eigenvalues and eigenvectors of K:

K cj = nλj cj = ηj cj

If we sort the eigenvalues of K in decreasing order η1 ≥ η2 ≥ ··· ≥ ηn ≥ 0, we obtain the jth principal component from the corresponding eigenvector cj. The variance along the jth principal component is given as λj = ηj / n.

SLIDE 26

Kernel PCA Algorithm

KernelPCA (D, K, α):
1. K = { K(xi, xj) } for i, j = 1, ..., n // compute n × n kernel matrix
2. K = (I − (1/n) 1_{n×n}) K (I − (1/n) 1_{n×n}) // center the kernel matrix
3. (η1, η2, ..., ηn) = eigenvalues(K) // compute eigenvalues
4. (c1 c2 ··· cn) = eigenvectors(K) // compute eigenvectors
5. λi = ηi / n for all i = 1, ..., n // compute variance for each component
6. ci = √(1/ηi) · ci for all i = 1, ..., n // ensure that ui^T ui = 1
7. f(r) = Σ_{i=1}^r λi / Σ_{i=1}^n λi, for all r = 1, 2, ..., n // fraction of total variance
8. Choose smallest r so that f(r) ≥ α // choose dimensionality
9. Cr = (c1 c2 ··· cr) // reduced basis
10. A = {ai | ai = Cr^T Ki, for i = 1, ..., n} // reduced-dimensionality data
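A numpy transcription of this algorithm (a sketch under my own naming; the homogeneous quadratic kernel used in the example matches the one on the following slides):

import numpy as np

def kernel_pca(D, kernel, alpha):
    """Kernel PCA as in the pseudocode above; returns the reduced-dimensionality data A and the variances."""
    n = D.shape[0]
    K = np.array([[kernel(xi, xj) for xj in D] for xi in D])   # 1: n x n kernel matrix
    J = np.eye(n) - np.ones((n, n)) / n
    K = J @ K @ J                                # 2: center the kernel matrix
    eta, C = np.linalg.eigh(K)                   # 3-4: eigenvalues/eigenvectors of K
    eta, C = eta[::-1], C[:, ::-1]               #      in decreasing order
    eta = np.clip(eta, 0.0, None)                #      clip tiny negatives from round-off
    keep = eta > 1e-12                           #      drop zero eigenvalues
    eta, C = eta[keep], C[:, keep]
    lam = eta / n                                # 5: variance along each component
    C = C / np.sqrt(eta)                         # 6: rescale so that ||c_i||^2 = 1/eta_i (unit u_i)
    f = np.cumsum(lam) / lam.sum()               # 7: fraction of total variance
    r = int(np.searchsorted(f, alpha) + 1)       # 8: smallest r with f(r) >= alpha
    Cr = C[:, :r]                                # 9: reduced basis
    A = K @ Cr                                   # 10: a_i = Cr^T K_i for each point i
    return A, lam

quad = lambda x, y: (x @ y) ** 2                 # homogeneous quadratic kernel K(xi, xj) = (xi^T xj)^2
D = np.random.default_rng(8).normal(size=(60, 2))
A, lam = kernel_pca(D, quad, alpha=0.95)
print(A.shape)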

SLIDE 27

Nonlinear Iris Data: PCA in Input Space

[Figure: nonlinear Iris data in input space (axes X1, X2), showing the directions u1 and u2 found by linear PCA.]

SLIDE 28

Nonlinear Iris Data: Projection onto PCs

[Figure: nonlinear Iris data projected onto the two principal components (axes u1 and u2).]

SLIDE 29

Kernel PCA: 3 PCs (Contours of Constant Projection)

Homogeneous quadratic kernel: K(xi, xj) = (xi^T xj)^2

[Figure: contours of constant projection onto the first three kernel principal components, plotted in input space (axes X1, X2): (a) λ1 = 0.2067, (b) λ2 = 0.0596, (c) λ3 = 0.0184.]

SLIDE 30

Kernel PCA: Projected Points onto 2 PCs

Homogeneous quadratic kernel: K(xi, xj) = (xi^T xj)^2

[Figure: points projected onto the first two kernel principal components (axes u1 and u2).]

SLIDE 31

Singular Value Decomposition

Principal components analysis is a special case of a more general matrix decomposition method called Singular Value Decomposition (SVD). PCA yields the following decomposition of the covariance matrix:

Σ = UΛU^T

where the covariance matrix has been factorized into the orthogonal matrix U containing its eigenvectors, and the diagonal matrix Λ containing its eigenvalues (sorted in decreasing order).

SVD generalizes this factorization to any matrix. In particular, for an n × d data matrix D with n points and d columns, SVD factorizes D as follows:

D = L∆R^T

where L is an orthogonal n × n matrix, R is an orthogonal d × d matrix, and ∆ is an n × d “diagonal” matrix defined as ∆(i, i) = δi, and 0 otherwise. The columns of L are called the left singular vectors, and the columns of R (or rows of R^T) are called the right singular vectors. The entries δi are called the singular values of D, and they are all non-negative.
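In numpy the factorization is available directly (illustrative sketch with random data):

import numpy as np

D = np.random.default_rng(9).normal(size=(8, 5))        # n x d data matrix

L, delta, Rt = np.linalg.svd(D, full_matrices=True)     # L: n x n, delta: singular values, Rt = R^T: d x d
Delta = np.zeros_like(D)
Delta[:len(delta), :len(delta)] = np.diag(delta)        # n x d "diagonal" matrix

print(np.allclose(D, L @ Delta @ Rt))                   # D = L Δ R^T
print(np.all(delta >= 0), np.all(np.diff(delta) <= 0))  # singular values are non-negative and sorted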

SLIDE 32

Reduced SVD

If the rank of D is r ≤ min(n, d), then there are only r nonzero singular values, ordered as δ1 ≥ δ2 ≥ ··· ≥ δr > 0. We can discard the left and right singular vectors that correspond to zero singular values, to obtain the reduced SVD

D = Lr ∆r Rr^T

where Lr is the n × r matrix of the left singular vectors, Rr is the d × r matrix of the right singular vectors, and ∆r is the r × r diagonal matrix containing the positive singular values. The reduced SVD leads directly to the spectral decomposition of D, given as

D = Σ_{i=1}^r δi li ri^T

where li and ri denote the ith left and right singular vectors. The best rank-q approximation to the original data D is the matrix Dq = Σ_{i=1}^q δi li ri^T that minimizes the expression ‖D − Dq‖_F, where ‖A‖_F = √( Σ_{i=1}^n Σ_{j=1}^d A(i, j)^2 ) is called the Frobenius norm of A.
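A sketch of the best rank-q approximation via the truncated SVD, checking the Frobenius error against the discarded singular values (random data, illustrative only):

import numpy as np

rng = np.random.default_rng(10)
D = rng.normal(size=(50, 8))

L, delta, Rt = np.linalg.svd(D, full_matrices=False)    # reduced SVD

q = 3
Dq = L[:, :q] @ np.diag(delta[:q]) @ Rt[:q, :]           # Dq = Σ_{i<=q} δi l_i r_i^T

err = np.linalg.norm(D - Dq, ord='fro')
# The squared Frobenius error of the rank-q SVD truncation equals the sum of the discarded squared singular values.
print(np.isclose(err ** 2, np.sum(delta[q:] ** 2)))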

SLIDE 33

Connection Between SVD and PCA

Assume D has been centered, and let D = L∆R^T via SVD. Consider the scatter matrix for D, given as D^T D. We have

D^T D = (L∆R^T)^T (L∆R^T) = R∆^T L^T L∆R^T = R(∆^T ∆)R^T = R ∆d^2 R^T

where ∆d^2 is the d × d diagonal matrix defined as ∆d^2(i, i) = δi^2, for i = 1, ..., d.

The covariance matrix of the centered D is given as Σ = (1/n) D^T D, so we get

D^T D = nΣ = nUΛU^T = U(nΛ)U^T

Thus, the right singular vectors R are the same as the eigenvectors of Σ, and the singular values of D are related to the eigenvalues of Σ as

nλi = δi^2, which implies λi = δi^2 / n, for i = 1, ..., d

Likewise, the left singular vectors in L are the eigenvectors of the n × n matrix DD^T, and the corresponding eigenvalues are given as δi^2.
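A numerical check of this connection (a sketch on centered random data): the right singular vectors agree with the eigenvectors of Σ up to sign, and λi = δi²/n:

import numpy as np

rng = np.random.default_rng(11)
D = rng.normal(size=(100, 4))
D = D - D.mean(axis=0)                         # center the data
n = len(D)

L, delta, Rt = np.linalg.svd(D, full_matrices=False)
Sigma = (D.T @ D) / n
evals, U = np.linalg.eigh(Sigma)
evals, U = evals[::-1], U[:, ::-1]             # decreasing order, to match the singular values

print(np.allclose(evals, delta ** 2 / n))      # λi = δi² / n
# Eigenvectors are defined only up to sign, so compare absolute values column by column.
print(np.allclose(np.abs(Rt.T), np.abs(U), atol=1e-8))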

SLIDE 34

Data Mining and Machine Learning: Fundamental Concepts and Algorithms

dataminingbook.info

Mohammed J. Zaki¹   Wagner Meira Jr.²

¹Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA

²Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Chapter 7: Dimensionality Reduction
