Machine Learning - MT 2016
13 & 14. PCA and MDS
Varun Kanade
University of Oxford
November 21 & 23, 2016
Announcements
◮ Sheet 4 due this Friday by noon
◮ Practical 3 this week (continue next week if necessary)
◮ Revision Class for M.Sc. + D.Phil. Thu Week 9 (2pm & 3pm)
◮ Work through ML HT2016 Exam (Problem 3 is optional)
Supervised Learning: Summary
◮ Training data is of the form $(x_i, y_i)$ where $x_i$ are features and $y_i$ is the target
◮ We formulate a model: generative or discriminative
◮ Choose a suitable training criterion (loss function, maximum likelihood)
◮ Use an optimisation procedure to learn parameters
◮ Use regularisation or other techniques to reduce overfitting
◮ Use the trained classifier to predict targets/labels on unseen $x_{\mathrm{new}}$
Unsupervised Learning
Training data is of the form $x_1, \ldots, x_N$. Infer properties about the data.

◮ Search: Identify patterns in data
◮ Density Estimation: Learn the underlying distribution generating the data
◮ Clustering: Group similar points together
◮ Today: Dimensionality Reduction
Outline
Today, we’ll study a technique for dimensionality reduction
◮ Principal Component Analysis (PCA) identifies a small number of directions which explain most of the variation in the data
◮ PCA can be kernelised
◮ Dimensionality reduction is important both for visualisation and as a preprocessing step before applying other (typically unsupervised) learning algorithms
Principal Component Analysis (PCA)
PCA: Maximum Variance View
PCA is a linear dimensionality reduction technique: find the directions of maximum variance in the data $(x_i)_{i=1}^N$.

Assume that the data is centered, i.e., $\sum_i x_i = 0$.

Find a set of orthogonal vectors $v_1, \ldots, v_k$:

◮ The first principal component (PC) $v_1$ is the direction of largest variance
◮ The second PC $v_2$ is the direction of largest variance orthogonal to $v_1$
◮ The $i$th PC $v_i$ is the direction of largest variance orthogonal to $v_1, \ldots, v_{i-1}$

The matrix $V \in \mathbb{R}^{D \times k}$ with columns $v_1, \ldots, v_k$ gives the projection $z_i = V^T x_i$ for datapoint $x_i$, and $Z = XV$ for the entire dataset.
PCA: Maximum Variance View
We are given i.i.d. data $(x_i)_{i=1}^N$; data matrix $X$.

We want to find $v_1 \in \mathbb{R}^D$, $\|v_1\| = 1$, that maximises $\|Xv_1\|^2$.

Let $z = Xv_1$, so $z_i = x_i \cdot v_1$. We wish to find $v_1$ so that $\sum_{i=1}^N z_i^2$ is maximised:

$$\sum_{i=1}^N z_i^2 = z^T z = v_1^T X^T X v_1$$

The maximum value attained by $v_1^T X^T X v_1$ subject to $\|v_1\| = 1$ is the largest eigenvalue of $X^T X$; the argmax is the corresponding eigenvector $v_1$.

Find $v_2, v_3, \ldots, v_k$ that are successively orthogonal to the previous directions and maximise the (as yet unexplained) variance.
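As an illustration of the maximum-variance view, here is a minimal NumPy sketch (the data below is a synthetic stand-in, not from the lecture): center the data, form $X^T X$, and take its top eigenvector as the first principal component.

```python
import numpy as np

# Synthetic stand-in data: N = 200 points in D = 5 dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# Center the data so that sum_i x_i = 0
X = X - X.mean(axis=0)

# The first principal component is the top eigenvector of X^T X
evals, evecs = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order
v1 = evecs[:, -1]                        # eigenvector of the largest eigenvalue

# Projection of each datapoint onto v1; its squared norm is the captured variance
z = X @ v1
print(np.sum(z**2), evals[-1])           # the two numbers agree
```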
PCA: Best Reconstruction
We have i.i.d. data $(x_i)_{i=1}^N$; data matrix $X$.

Find a $k$-dimensional linear projection that best represents the data.

Suppose $V_k \in \mathbb{R}^{D \times k}$ is such that the columns of $V_k$ are orthonormal. Project the data $X$ onto the subspace defined by $V_k$: $Z = XV_k$.

Minimise the reconstruction error

$$\sum_{i=1}^N \|x_i - V_k V_k^T x_i\|^2$$
Equivalence between the Two Objectives: One PC Case
Let $v_1$ be the direction of projection. The point $x$ is mapped to $\tilde{x} = (v_1 \cdot x)v_1$, where $\|v_1\| = 1$.

Maximum Variance: Find $v_1$ that maximises $\sum_{i=1}^N (v_1 \cdot x_i)^2$.

Best Reconstruction: Find $v_1$ that minimises:

$$\sum_{i=1}^N \|x_i - \tilde{x}_i\|^2 = \sum_{i=1}^N \left( \|x_i\|^2 - 2(x_i \cdot \tilde{x}_i) + \|\tilde{x}_i\|^2 \right) = \sum_{i=1}^N \left( \|x_i\|^2 - 2(v_1 \cdot x_i)^2 + (v_1 \cdot x_i)^2 \|v_1\|^2 \right) = \sum_{i=1}^N \|x_i\|^2 - \sum_{i=1}^N (v_1 \cdot x_i)^2$$

The first term does not depend on $v_1$, so the same $v_1$ satisfies the two objectives.
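A quick numerical check of this identity (a sketch with synthetic data): for any unit vector $v_1$, the reconstruction error equals $\sum_i \|x_i\|^2 - \sum_i (v_1 \cdot x_i)^2$, so minimising one is the same as maximising the other.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))             # rows are the datapoints x_i
v1 = rng.normal(size=4)
v1 /= np.linalg.norm(v1)                  # unit-norm direction

X_tilde = np.outer(X @ v1, v1)            # x~_i = (v1 . x_i) v1
recon_error = np.sum((X - X_tilde) ** 2)
identity = np.sum(X ** 2) - np.sum((X @ v1) ** 2)
print(np.isclose(recon_error, identity))  # True
```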
Finding Principal Components: SVD
Let $X$ be the $N \times D$ data matrix.

$u \in \mathbb{R}^N$, $v \in \mathbb{R}^D$ are a pair of singular vectors with singular value $\sigma \in \mathbb{R}_+$ if $\sigma u = Xv$ and $\sigma v = X^T u$.

◮ $v$ is an eigenvector of $X^T X$ with eigenvalue $\sigma^2$
◮ $u$ is an eigenvector of $XX^T$ with eigenvalue $\sigma^2$
Finding Principal Components: SVD
$X = U\Sigma V^T$ (say $N > D$)

Thin SVD: $U$ is $N \times D$, $\Sigma$ is $D \times D$, $V$ is $D \times D$, and $U^T U = V^T V = I_D$. $\Sigma$ is diagonal with $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_D \geq 0$.

The first $k$ principal components are the first $k$ columns of $V$.

Full SVD: $U$ is $N \times N$, $\Sigma$ is $N \times D$, $V$ is $D \times D$; $U$ and $V$ are orthonormal matrices.
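A minimal sketch of extracting principal components from the thin SVD with NumPy (`np.linalg.svd` returns $V^T$, and the singular values already come sorted in decreasing order); the data here is a random placeholder.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 20))
X = X - X.mean(axis=0)                   # PCA assumes centered data

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # thin SVD: X = U diag(s) Vt
k = 3
V_k = Vt[:k].T                           # first k principal components (columns)
Z = X @ V_k                              # k-dimensional projection of the data
# Equivalently, Z = U[:, :k] * s[:k]
```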
Algorithm for finding PCs (when N > D)
Constructing the matrix $X^T X$ takes time $O(D^2 N)$. The eigenvectors of $X^T X$ can then be computed in time $O(D^3)$.

Iterative methods can find the top $k$ (right) singular vectors directly (see the code sketch below):

◮ Initialise $v_0$ to be a random unit-norm vector
◮ Iterative update, until (approximate) convergence:
   ◮ $v_{t+1} = X^T X v_t$
   ◮ $v_{t+1} = v_{t+1} / \|v_{t+1}\|$
◮ The update step only takes $O(ND)$ time (compute $Xv_t$ first, then $X^T(Xv_t)$)
◮ This gives the singular vector corresponding to the largest singular value
◮ Subsequent singular vectors are obtained by choosing $v_0$ orthogonal to the previously identified singular vectors (this needs to be done at each iteration to avoid numerical errors creeping in)
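The scheme above is power iteration on $X^T X$. Below is one possible implementation as a sketch (the tolerance, iteration cap, and random seed are arbitrary choices, not from the slides); to obtain later components it re-orthogonalises against the singular vectors already found at every step, as the last bullet suggests.

```python
import numpy as np

def top_right_singular_vectors(X, k, n_iters=500, tol=1e-10):
    """Return the top-k right singular vectors of X by power iteration."""
    D = X.shape[1]
    rng = np.random.default_rng(0)
    V = []
    for _ in range(k):
        v = rng.normal(size=D)
        v /= np.linalg.norm(v)
        for _ in range(n_iters):
            w = X.T @ (X @ v)             # O(ND) per update; never form X^T X
            for u in V:                   # keep orthogonal to earlier components
                w -= (u @ w) * u
            w /= np.linalg.norm(w)
            if np.linalg.norm(w - v) < tol:
                v = w
                break
            v = w
        V.append(v)
    return np.column_stack(V)             # D x k matrix of principal components
```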
Algorithm for finding PCs (when D ≫ N)
Constructing the matrix $XX^T$ takes time $O(N^2 D)$. The eigenvectors of $XX^T$ can be computed in time $O(N^3)$.

The eigenvectors give the 'left' singular vectors $u_i$ of $X$. To obtain $v_i$, we use the fact that $v_i = \sigma_i^{-1} X^T u_i$.

The iterative method can be used directly, as in the case when $N > D$.
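A sketch of the $D \gg N$ route (with placeholder random data): eigendecompose the small $N \times N$ matrix $XX^T$ and map each left singular vector $u_i$ back to a principal component via $v_i = \sigma_i^{-1} X^T u_i$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 50, 10000                          # many more features than datapoints
X = rng.normal(size=(N, D))
X = X - X.mean(axis=0)

evals, U = np.linalg.eigh(X @ X.T)        # N x N problem, cheap compared to D x D
order = np.argsort(evals)[::-1]           # sort eigenvalues in decreasing order
evals, U = evals[order], U[:, order]

k = 5
sigmas = np.sqrt(evals[:k])               # singular values of X
V_k = (X.T @ U[:, :k]) / sigmas           # v_i = sigma_i^{-1} X^T u_i, columns are PCs
```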
PCA: Reconstruction Error
We have the thin SVD $X = U\Sigma V^T$. Let $V_k$ be the matrix containing the first $k$ columns of $V$.

Projection onto the $k$ PCs: $Z = XV_k = U_k\Sigma_k$, where $U_k$ is the matrix of the first $k$ columns of $U$ and $\Sigma_k$ is the $k \times k$ diagonal submatrix of $\Sigma$ with the top $k$ singular values.

Reconstruction: $\tilde{X} = ZV_k^T = U_k\Sigma_k V_k^T$

Reconstruction error:

$$\sum_{i=1}^N \|x_i - V_k V_k^T x_i\|^2 = \sum_{j=k+1}^D \sigma_j^2$$

This follows from the following calculations:

$$X = U\Sigma V^T = \sum_{j=1}^D \sigma_j u_j v_j^T \qquad \tilde{X} = U_k\Sigma_k V_k^T = \sum_{j=1}^k \sigma_j u_j v_j^T \qquad \|X - \tilde{X}\|_F^2 = \sum_{j=k+1}^D \sigma_j^2$$
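A numerical check of this identity, as a sketch with random placeholder data: the reconstruction error from keeping $k$ components equals the sum of the squared discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 8))
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
V_k = Vt[:k].T
X_rec = X @ V_k @ V_k.T                        # rank-k reconstruction

error = np.sum((X - X_rec) ** 2)               # sum_i ||x_i - V_k V_k^T x_i||^2
print(np.isclose(error, np.sum(s[k:] ** 2)))   # True
```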
Reconstruction of an Image using PCA
How many principal components to pick?
Look for an ‘elbow’ in the curve of reconstruction error vs # PCs
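One way to look for the elbow is to plot the reconstruction error against the number of components kept; a sketch using matplotlib and placeholder data (the real curve would of course come from your own centered data matrix):

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.random.default_rng(5).normal(size=(200, 30))   # placeholder data
X = X - X.mean(axis=0)

s2 = np.linalg.svd(X, compute_uv=False) ** 2    # squared singular values, decreasing
errors = np.sum(s2) - np.cumsum(s2)             # error after keeping k = 1, 2, ..., D PCs

plt.plot(np.arange(1, len(s2) + 1), errors, marker="o")
plt.xlabel("number of principal components k")
plt.ylabel("reconstruction error")
plt.show()
```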
Application: Eigenfaces
A popular application of PCA for face detection and recognition is known as Eigenfaces
◮ Face detection: Identify faces in a given image
◮ Face recognition: Classification (or search) problem to identify a certain person
Application: Eigenfaces
PCA on a dataset of face images. Each principal component can be thought of as being an 'element' of a face.

Source: http://vismod.media.mit.edu/vismod/demos/facerec/basic.html
Application: Eigenfaces
Detection: Each patch of the image can be checked to identify whether there is a face in it.

Recognition: Map all faces in terms of their principal components, then use some distance measure on the projections to find the faces that are most like the input image (a code sketch follows below).

Why use PCA for face detection?

◮ Even though images can be large, we can use the $D \gg N$ approach to be efficient
◮ The final model (the PCs) can be quite compact and can fit on cameras or phones
◮ It works very well given the simplicity of the model
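A rough sketch of the recognition step described above; the function and argument names here are hypothetical placeholders, not from the lecture. Gallery faces and the query face are projected onto the principal components ("eigenfaces") and the nearest gallery face is returned.

```python
import numpy as np

def nearest_face(query, gallery, V_k, mean_face):
    """Return the index of the gallery face closest to `query` in PC space.

    query:     flattened query image, shape (D,)
    gallery:   flattened gallery images, shape (N, D)
    V_k:       D x k matrix of principal components ("eigenfaces")
    mean_face: mean of the training faces, shape (D,)
    """
    z_gallery = (gallery - mean_face) @ V_k      # N x k projections
    z_query = (query - mean_face) @ V_k          # k-dimensional projection
    dists = np.linalg.norm(z_gallery - z_query, axis=1)
    return int(np.argmin(dists))
```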
Application: Latent Semantic Analysis
$X$ is an $N \times D$ matrix, where $D$ is the size of the dictionary; $x_i$ is a vector of word counts (bag of words).

Reconstruction using $k$ eigenvectors: $X \approx ZV_k^T$, where $Z = XV_k$.

$\langle z_i, z_j \rangle$ is probably a better notion of similarity between documents than $\langle x_i, x_j \rangle$.

Non-negative matrix factorisation has a more natural interpretation, but is harder to compute.
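A minimal LSA sketch (the toy documents are invented for illustration): build the count matrix, take a truncated SVD, and compare documents by inner products of their latent representations $z_i$ rather than of the raw counts $x_i$.

```python
import numpy as np

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "stocks fell as markets slid"]

vocab = sorted({w for d in docs for w in d.split()})
X = np.array([[d.split().count(w) for w in vocab] for d in docs], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
Z = X @ Vt[:k].T                 # latent representation of each document

similarity = Z @ Z.T             # <z_i, z_j>: document similarity in the latent space
```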
PCA: Beyond Linearity
Projection: Linear PCA
Projection: Kernel PCA
Kernel PCA
Suppose our original data is, for example, $x \in \mathbb{R}^2$. We could perform a degree-2 polynomial basis expansion as:

$$\phi(x) = \left(1, \sqrt{2}x_1, \sqrt{2}x_2, x_1^2, x_2^2, \sqrt{2}x_1x_2\right)^T$$

Recall that we can compute the inner products $\phi(x) \cdot \phi(x')$ efficiently using the kernel trick:

$$\phi(x) \cdot \phi(x') = 1 + 2x_1x'_1 + 2x_2x'_2 + x_1^2(x'_1)^2 + x_2^2(x'_2)^2 + 2x_1x_2x'_1x'_2 = (1 + x_1x'_1 + x_2x'_2)^2 = (1 + x \cdot x')^2 =: \kappa(x, x')$$
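A quick numerical check of this identity (a small sketch): the explicit degree-2 feature map and the kernel $\kappa(x, x') = (1 + x \cdot x')^2$ give the same inner product.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 polynomial feature map for x in R^2."""
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

def kappa(x, xp):
    return (1.0 + x @ xp) ** 2

x, xp = np.array([0.3, -1.2]), np.array([2.0, 0.5])
print(np.isclose(phi(x) @ phi(xp), kappa(x, xp)))   # True
```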
Kernel PCA
Suppose we use the feature map $\phi : \mathbb{R}^D \to \mathbb{R}^M$. Let $\phi(X)$ be the $N \times M$ matrix with rows $\phi(x_i)$.

We want to find the singular vectors of $\phi(X)$ (eigenvectors of $\phi(X)^T\phi(X)$). However, in general $M \gg N$ (in fact $M$ could be infinite for some kernels).

Instead, we'll find the eigenvectors of $\phi(X)\phi(X)^T$, the kernel matrix.
Kernel PCA
Recall that the kernel matrix is:

$$K = \phi(X)\phi(X)^T = \begin{bmatrix} \kappa(x_1, x_1) & \kappa(x_1, x_2) & \cdots & \kappa(x_1, x_N) \\ \kappa(x_2, x_1) & \kappa(x_2, x_2) & \cdots & \kappa(x_2, x_N) \\ \vdots & \vdots & \ddots & \vdots \\ \kappa(x_N, x_1) & \kappa(x_N, x_2) & \cdots & \kappa(x_N, x_N) \end{bmatrix}$$

Let $u \in \mathbb{R}^N$ be an eigenvector of $K$ (a left singular vector of $\phi(X)$). The corresponding principal component $v \in \mathbb{R}^M$ is $\sigma^{-1}\phi(X)^T u$.

We won't express $v$ explicitly; instead we can compute projections of a new datapoint $x_{\mathrm{new}}$ onto the principal component $v$ using the kernel function:

$$\phi(x_{\mathrm{new}})^T v = \sigma^{-1}\phi(x_{\mathrm{new}})^T\phi(X)^T u = \sigma^{-1}\left[\kappa(x_{\mathrm{new}}, x_1), \kappa(x_{\mathrm{new}}, x_2), \cdots, \kappa(x_{\mathrm{new}}, x_N)\right]u$$

So in order to compute projections onto principal components, we do not need to store the principal components explicitly!
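In code, projecting a new point onto a kernel principal component only needs kernel evaluations against the training data. This is a sketch that ignores the centering issue discussed on the next slide; the helper names are my own, and `kappa` is any kernel function such as the polynomial kernel above. Here `u` and `sigma` would be one column of the returned eigenvector matrix together with its corresponding singular value.

```python
import numpy as np

def kpca_components(X, kappa):
    """Eigendecompose the kernel matrix; return eigenvectors and singular values (largest first)."""
    N = X.shape[0]
    K = np.array([[kappa(X[i], X[j]) for j in range(N)] for i in range(N)])
    evals, U = np.linalg.eigh(K)
    order = np.argsort(evals)[::-1]
    return U[:, order], np.sqrt(np.maximum(evals[order], 0.0))   # u's and sigma's

def project_new(x_new, X, kappa, u, sigma):
    """phi(x_new)^T v = sigma^{-1} [kappa(x_new, x_1), ..., kappa(x_new, x_N)] u."""
    k_new = np.array([kappa(x_new, xi) for xi in X])
    return (k_new @ u) / sigma
```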
Kernel PCA
For PCA, we assumed that the data matrix $X$ is centered, i.e., $\sum_i x_i = 0$. However, this is not in general the case for the matrix $\phi(X)$.

Instead we can consider:

$$\tilde{\phi}(x_i) = \phi(x_i) - \frac{1}{N}\sum_{k=1}^N \phi(x_k)$$

The corresponding kernel matrix $\tilde{K}$ has entries

$$\tilde{K}_{ij} = \kappa(x_i, x_j) - \frac{1}{N}\sum_{l=1}^N \kappa(x_i, x_l) - \frac{1}{N}\sum_{l=1}^N \kappa(x_j, x_l) + \frac{1}{N^2}\sum_{k=1}^N\sum_{l=1}^N \kappa(x_l, x_k)$$

Succinctly, if $O$ is the matrix with every entry equal to $1/N$, i.e., $O = \mathbf{1}\mathbf{1}^T/N$, then

$$\tilde{K} = K - OK - KO + OKO$$

To perform kernel PCA, we need to find the eigenvectors of $\tilde{K}$.
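A sketch of the centering step in NumPy. The slide stops at "find the eigenvectors of $\tilde{K}$"; the final line below additionally reads off the projections of the training points, which is not stated on the slide but follows from the projection formula of the previous slide (the projection of training point $i$ onto component $j$ is $\sigma_j^{-1}(\tilde{K}u_j)_i = \sigma_j u_{j,i}$).

```python
import numpy as np

def centered_kernel_pca(K, k):
    """Kernel PCA given an (uncentered) N x N kernel matrix K."""
    N = K.shape[0]
    O = np.full((N, N), 1.0 / N)                 # O = 1 1^T / N
    K_tilde = K - O @ K - K @ O + O @ K @ O      # centered kernel matrix

    evals, U = np.linalg.eigh(K_tilde)
    order = np.argsort(evals)[::-1][:k]
    U_k = U[:, order]
    sigmas = np.sqrt(np.maximum(evals[order], 0.0))

    Z = U_k * sigmas        # projections of the training points onto the k kernel PCs
    return U_k, sigmas, Z
```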
Projection: PCA vs Kernel PCA
Kernel PCA Applications
◮ Kernel PCA is not necessarily very useful for visualisation
◮ Also, kernel PCA does not directly give a useful way to construct a low-dimensional reconstruction of the original data
◮ The most powerful uses of kernel PCA are in other machine learning applications
◮ After kernel PCA preprocessing, we may get higher accuracy for classification, clustering, etc.
PCA Summary
Algorithm: We've expressed PCA as the SVD of the data matrix $X$. Equivalently, we can use the eigendecomposition of the matrix $X^T X$.

Running Time: $O(NDk)$ to compute $k$ principal components (avoid computing the matrix $X^T X$).

PCs are uncorrelated, but there may be non-linear (higher-order) effects.

PCA depends on the scale or units of measurement; it may be a good idea to standardise the data.

PCA is sensitive to outliers.

PCA can be kernelised: useful as preprocessing for further ML applications, rather than for visualisation.