Linear Dimensionality Reduction (Practical Machine Learning, CS294-34) - PowerPoint PPT Presentation


slide-1
SLIDE 1

Linear Dimensionality Reduction

Practical Machine Learning (CS294-34) September 24, 2009 Percy Liang

slide-2
SLIDE 2

Lots of high-dimensional data...

face images

Zambian President Levy Mwanawasa has won a second term in office in an election his challenger Michael Sata accused him of rigging, official results showed on Monday. According to media reports, a pair of hackers said on Saturday that the Firefox Web browser, commonly perceived as the safer and more customizable alternative to market leader Internet Explorer, is critically flawed. A presentation on the flaw was shown during the ToorCon hacker conference in San Diego.

documents, gene expression data, MEG readings

2

slide-3
SLIDE 3

Motivation and context

Why do dimensionality reduction?

  • Computational: compress data ⇒ time/space efficiency

3

slide-4
SLIDE 4

Motivation and context

Why do dimensionality reduction?

  • Computational: compress data ⇒ time/space efficiency
  • Statistical: fewer dimensions ⇒ better generalization

3

slide-5
SLIDE 5

Motivation and context

Why do dimensionality reduction?

  • Computational: compress data ⇒ time/space efficiency
  • Statistical: fewer dimensions ⇒ better generalization
  • Visualization: understand structure of data

3

slide-6
SLIDE 6

Motivation and context

Why do dimensionality reduction?

  • Computational: compress data ⇒ time/space efficiency
  • Statistical: fewer dimensions ⇒ better generalization
  • Visualization: understand structure of data
  • Anomaly detection: describe normal data, detect outliers

3

slide-7
SLIDE 7

Motivation and context

Why do dimensionality reduction?

  • Computational: compress data ⇒ time/space efficiency
  • Statistical: fewer dimensions ⇒ better generalization
  • Visualization: understand structure of data
  • Anomaly detection: describe normal data, detect outliers

Dimensionality reduction in this course:

  • Linear methods (this week)
  • Clustering (last week)
  • Feature selection (next week)
  • Nonlinear methods (later)

3

slide-8
SLIDE 8

Types of problems

  • Prediction x → y: classification, regression

4

slide-9
SLIDE 9

Types of problems

  • Prediction x → y: classification, regression

Applications: face recognition, gene expression prediction
Techniques: kNN, SVM, least squares (+ dimensionality reduction preprocessing)

4

slide-10
SLIDE 10

Types of problems

  • Prediction x → y: classification, regression

Applications: face recognition, gene expression prediction
Techniques: kNN, SVM, least squares (+ dimensionality reduction preprocessing)

  • Structure discovery x → z: find an alternative

representation z of data x

4

slide-11
SLIDE 11

Types of problems

  • Prediction x → y: classification, regression

Applications: face recognition, gene expression prediction
Techniques: kNN, SVM, least squares (+ dimensionality reduction preprocessing)

  • Structure discovery x → z: find an alternative

representation z of data x

Applications: visualization
Techniques: clustering, linear dimensionality reduction

4

slide-12
SLIDE 12

Types of problems

  • Prediction x → y: classification, regression

Applications: face recognition, gene expression prediction
Techniques: kNN, SVM, least squares (+ dimensionality reduction preprocessing)

  • Structure discovery x → z: find an alternative

representation z of data x

Applications: visualization
Techniques: clustering, linear dimensionality reduction

  • Density estimation p(x): model the data

4

slide-13
SLIDE 13

Types of problems

  • Prediction x → y: classification, regression

Applications: face recognition, gene expression prediction
Techniques: kNN, SVM, least squares (+ dimensionality reduction preprocessing)

  • Structure discovery x → z: find an alternative

representation z of data x

Applications: visualization
Techniques: clustering, linear dimensionality reduction

  • Density estimation p(x): model the data

Applications: anomaly detection, language modeling
Techniques: clustering, linear dimensionality reduction

4

slide-14
SLIDE 14

Basic idea of linear dimensionality reduction

Represent each face as a high-dimensional vector x ∈ R361

5

slide-15
SLIDE 15

Basic idea of linear dimensionality reduction

Represent each face as a high-dimensional vector x ∈ R361
x ∈ R361  →  z = U⊤x  →  z ∈ R10

5

slide-16
SLIDE 16

Basic idea of linear dimensionality reduction

Represent each face as a high-dimensional vector x ∈ R361
x ∈ R361  →  z = U⊤x  →  z ∈ R10
How do we choose U?

5

slide-17
SLIDE 17

Outline

  • Principal component analysis (PCA)

– Basic principles – Case studies – Kernel PCA – Probabilistic PCA

  • Canonical correlation analysis (CCA)
  • Fisher discriminant analysis (FDA)
  • Summary

6

slide-18
SLIDE 18

Roadmap

  • Principal component analysis (PCA)

– Basic principles – Case studies – Kernel PCA – Probabilistic PCA

  • Canonical correlation analysis (CCA)
  • Fisher discriminant analysis (FDA)
  • Summary

Principal component analysis (PCA) / Basic principles 7

slide-19
SLIDE 19

Dimensionality reduction setup

Given n data points in d dimensions: x1, . . . , xn ∈ Rd

Principal component analysis (PCA) / Basic principles 8

slide-20
SLIDE 20

Dimensionality reduction setup

Given n data points in d dimensions: x1, . . . , xn ∈ Rd
X = (x1 · · · xn) ∈ Rd×n

Principal component analysis (PCA) / Basic principles 8

slide-21
SLIDE 21

Dimensionality reduction setup

Given n data points in d dimensions: x1, . . . , xn ∈ Rd
X = (x1 · · · xn) ∈ Rd×n
Want to reduce dimensionality from d to k

Principal component analysis (PCA) / Basic principles 8

slide-22
SLIDE 22

Dimensionality reduction setup

Given n data points in d dimensions: x1, . . . , xn ∈ Rd
X = (x1 · · · xn) ∈ Rd×n
Want to reduce dimensionality from d to k
Choose k directions u1, . . . , uk

Principal component analysis (PCA) / Basic principles 8

slide-23
SLIDE 23

Dimensionality reduction setup

Given n data points in d dimensions: x1, . . . , xn ∈ Rd
X = (x1 · · · xn) ∈ Rd×n
Want to reduce dimensionality from d to k
Choose k directions u1, . . . , uk
U = (u1 · · · uk) ∈ Rd×k

Principal component analysis (PCA) / Basic principles 8

slide-24
SLIDE 24

Dimensionality reduction setup

Given n data points in d dimensions: x1, . . . , xn ∈ Rd
X = (x1 · · · xn) ∈ Rd×n
Want to reduce dimensionality from d to k
Choose k directions u1, . . . , uk
U = (u1 · · · uk) ∈ Rd×k
For each uj, compute “similarity” zj = uj⊤x

Principal component analysis (PCA) / Basic principles 8

slide-25
SLIDE 25

Dimensionality reduction setup

Given n data points in d dimensions: x1, . . . , xn ∈ Rd
X = (x1 · · · xn) ∈ Rd×n
Want to reduce dimensionality from d to k
Choose k directions u1, . . . , uk
U = (u1 · · · uk) ∈ Rd×k
For each uj, compute “similarity” zj = uj⊤x
Project x down to z = (z1, . . . , zk)⊤ = U⊤x

Principal component analysis (PCA) / Basic principles 8

slide-26
SLIDE 26

Dimensionality reduction setup

Given n data points in d dimensions: x1, . . . , xn ∈ Rd
X = (x1 · · · xn) ∈ Rd×n
Want to reduce dimensionality from d to k
Choose k directions u1, . . . , uk
U = (u1 · · · uk) ∈ Rd×k
For each uj, compute “similarity” zj = uj⊤x
Project x down to z = (z1, . . . , zk)⊤ = U⊤x
How to choose U?

Principal component analysis (PCA) / Basic principles 8

slide-27
SLIDE 27

PCA objective 1: reconstruction error

U serves two functions:

  • Encode: z = U⊤x, zj = uj⊤x

Principal component analysis (PCA) / Basic principles 9

slide-28
SLIDE 28

PCA objective 1: reconstruction error

U serves two functions:

  • Encode: z = U⊤x, zj = uj⊤x
  • Decode: x̃ = Uz = Σ_{j=1}^k zj uj

Principal component analysis (PCA) / Basic principles 9

slide-29
SLIDE 29

PCA objective 1: reconstruction error

U serves two functions:

  • Encode: z = U⊤x, zj = uj⊤x
  • Decode: x̃ = Uz = Σ_{j=1}^k zj uj

Want reconstruction error ‖x − x̃‖ to be small

Principal component analysis (PCA) / Basic principles 9

slide-30
SLIDE 30

PCA objective 1: reconstruction error

U serves two functions:

  • Encode: z = U⊤x, zj = uj⊤x
  • Decode: x̃ = Uz = Σ_{j=1}^k zj uj

Want reconstruction error ‖x − x̃‖ to be small
Objective: minimize total squared reconstruction error
min_{U ∈ Rd×k} Σ_{i=1}^n ‖xi − UU⊤xi‖²

Principal component analysis (PCA) / Basic principles 9
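
The two bullets above define the encode/decode maps and the squared-error objective; the following is a minimal NumPy sketch of that computation, assuming a centered data matrix X (d × n) and some orthonormal U (here random, standing in for the PCA solution). None of the variable values come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 10, 100, 3
X = rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)          # center the data

# A random orthonormal U (d x k); PCA would choose this to minimize the error
U, _, _ = np.linalg.svd(rng.normal(size=(d, k)), full_matrices=False)

Z = U.T @ X                                    # encode: z = U^T x
X_tilde = U @ Z                                # decode: x_tilde = U z
error = np.sum((X - X_tilde) ** 2)             # sum_i ||x_i - U U^T x_i||^2
print(error)
```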

slide-31
SLIDE 31

PCA objective 2: projected variance

Empirical distribution: uniform over x1, . . . , xn

Principal component analysis (PCA) / Basic principles 10

slide-32
SLIDE 32

PCA objective 2: projected variance

Empirical distribution: uniform over x1, . . . , xn
Expectation (think sum over data points): Ê[f(x)] = (1/n) Σ_{i=1}^n f(xi)

Principal component analysis (PCA) / Basic principles 10

slide-33
SLIDE 33

PCA objective 2: projected variance

Empirical distribution: uniform over x1, . . . , xn
Expectation (think sum over data points): Ê[f(x)] = (1/n) Σ_{i=1}^n f(xi)
Variance (think sum of squares if centered):
v̂ar[f(x)] + (Ê[f(x)])² = Ê[f(x)²] = (1/n) Σ_{i=1}^n f(xi)²

Principal component analysis (PCA) / Basic principles 10

slide-34
SLIDE 34

PCA objective 2: projected variance

Empirical distribution: uniform over x1, . . . , xn
Expectation (think sum over data points): Ê[f(x)] = (1/n) Σ_{i=1}^n f(xi)
Variance (think sum of squares if centered):
v̂ar[f(x)] + (Ê[f(x)])² = Ê[f(x)²] = (1/n) Σ_{i=1}^n f(xi)²
Assume data is centered: Ê[x] = 0

Principal component analysis (PCA) / Basic principles 10

slide-35
SLIDE 35

PCA objective 2: projected variance

Empirical distribution: uniform over x1, . . . , xn
Expectation (think sum over data points): Ê[f(x)] = (1/n) Σ_{i=1}^n f(xi)
Variance (think sum of squares if centered):
v̂ar[f(x)] + (Ê[f(x)])² = Ê[f(x)²] = (1/n) Σ_{i=1}^n f(xi)²
Assume data is centered: Ê[x] = 0 (what’s Ê[U⊤x]?)

Principal component analysis (PCA) / Basic principles 10

slide-36
SLIDE 36

PCA objective 2: projected variance

Empirical distribution: uniform over x1, . . . , xn
Expectation (think sum over data points): Ê[f(x)] = (1/n) Σ_{i=1}^n f(xi)
Variance (think sum of squares if centered):
v̂ar[f(x)] + (Ê[f(x)])² = Ê[f(x)²] = (1/n) Σ_{i=1}^n f(xi)²
Assume data is centered: Ê[x] = 0 (what’s Ê[U⊤x]?)
Objective: maximize variance of projected data
max_{U ∈ Rd×k, U⊤U = I} Ê[‖U⊤x‖²]

Principal component analysis (PCA) / Basic principles 10

slide-37
SLIDE 37

Equivalence in two objectives

Key intuition:
variance of data (fixed) = captured variance (want large) + reconstruction error (want small)

Principal component analysis (PCA) / Basic principles 11

slide-38
SLIDE 38

Equivalence in two objectives

Key intuition:
variance of data (fixed) = captured variance (want large) + reconstruction error (want small)
Pythagorean decomposition: x = UU⊤x + (I − UU⊤)x

Principal component analysis (PCA) / Basic principles 11

slide-39
SLIDE 39

Equivalence in two objectives

Key intuition: variance of data

  • fixed

= captured variance

  • want large

+ reconstruction error

  • want small

Pythagorean decomposition: x = UU⊤x + (I − UU⊤)x UU⊤x (I − UU⊤)x x

Principal component analysis (PCA) / Basic principles 11

slide-40
SLIDE 40

Equivalence in two objectives

Key intuition: variance of data

  • fixed

= captured variance

  • want large

+ reconstruction error

  • want small

Pythagorean decomposition: x = UU⊤x + (I − UU⊤)x UU⊤x (I − UU⊤)x x Take expectations; note rotation U doesn’t affect length: ˆ E[x2] = ˆ E[U⊤x2] + ˆ E[x − UU⊤x2]

Principal component analysis (PCA) / Basic principles 11

slide-41
SLIDE 41

Equivalence in two objectives

Key intuition: variance of data

  • fixed

= captured variance

  • want large

+ reconstruction error

  • want small

Pythagorean decomposition: x = UU⊤x + (I − UU⊤)x UU⊤x (I − UU⊤)x x Take expectations; note rotation U doesn’t affect length: ˆ E[x2] = ˆ E[U⊤x2] + ˆ E[x − UU⊤x2] Minimize reconstruction error ↔ Maximize captured variance

Principal component analysis (PCA) / Basic principles 11

slide-42
SLIDE 42

Finding one principal component

Input data: X =( x1 . . . xn)

Principal component analysis (PCA) / Basic principles 12

slide-43
SLIDE 43

Finding one principal component

Input data: X = (x1 . . . xn)
Objective: maximize variance of projected data

Principal component analysis (PCA) / Basic principles 12

slide-44
SLIDE 44

Finding one principal component

Input data: X = (x1 . . . xn)
Objective: maximize variance of projected data
= max_{‖u‖=1} Ê[(u⊤x)²]

Principal component analysis (PCA) / Basic principles 12

slide-45
SLIDE 45

Finding one principal component

Input data: X = (x1 . . . xn)
Objective: maximize variance of projected data
= max_{‖u‖=1} Ê[(u⊤x)²]
= max_{‖u‖=1} (1/n) Σ_{i=1}^n (u⊤xi)²

Principal component analysis (PCA) / Basic principles 12

slide-46
SLIDE 46

Finding one principal component

Input data: X = (x1 . . . xn)
Objective: maximize variance of projected data
= max_{‖u‖=1} Ê[(u⊤x)²]
= max_{‖u‖=1} (1/n) Σ_{i=1}^n (u⊤xi)²
= max_{‖u‖=1} (1/n) ‖u⊤X‖²

Principal component analysis (PCA) / Basic principles 12

slide-47
SLIDE 47

Finding one principal component

Input data: X = (x1 . . . xn)
Objective: maximize variance of projected data
= max_{‖u‖=1} Ê[(u⊤x)²]
= max_{‖u‖=1} (1/n) Σ_{i=1}^n (u⊤xi)²
= max_{‖u‖=1} (1/n) ‖u⊤X‖²
= max_{‖u‖=1} u⊤ ((1/n) XX⊤) u

Principal component analysis (PCA) / Basic principles 12

slide-48
SLIDE 48

Finding one principal component

Input data: X = (x1 . . . xn)
Objective: maximize variance of projected data
= max_{‖u‖=1} Ê[(u⊤x)²]
= max_{‖u‖=1} (1/n) Σ_{i=1}^n (u⊤xi)²
= max_{‖u‖=1} (1/n) ‖u⊤X‖²
= max_{‖u‖=1} u⊤ ((1/n) XX⊤) u
= largest eigenvalue of C := (1/n) XX⊤ (C is the covariance matrix of the data)

Principal component analysis (PCA) / Basic principles 12
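
The derivation above says the best single direction is the top eigenvector of C = (1/n)XX⊤. A small NumPy sketch of exactly that computation, assuming the columns of X are already centered (the data here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 200
X = rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)    # center the data

C = (X @ X.T) / n                        # d x d covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
u = eigvecs[:, -1]                       # top principal component
projected_variance = eigvals[-1]         # = max_{||u||=1} u^T C u
print(u, projected_variance)
```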

slide-49
SLIDE 49

How many principal components?

  • Similar to question of “How many clusters?”
  • Magnitude of eigenvalues indicates the fraction of variance captured.

Principal component analysis (PCA) / Basic principles 15

slide-50
SLIDE 50

How many principal components?

  • Similar to question of “How many clusters?”
  • Magnitude of eigenvalues indicates the fraction of variance captured.
  • Eigenvalues on a face image dataset:

(plot: eigenvalue λi against index i for a face image dataset)

Principal component analysis (PCA) / Basic principles 15

slide-51
SLIDE 51

How many principal components?

  • Similar to question of “How many clusters?”
  • Magnitude of eigenvalues indicates the fraction of variance captured.
  • Eigenvalues on a face image dataset:

(plot: eigenvalue λi against index i for a face image dataset)

  • Eigenvalues typically drop off sharply, so don’t need that many.
  • Of course variance isn’t everything...

Principal component analysis (PCA) / Basic principles 15
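
A hedged sketch of one common way to pick k: keep enough eigenvalues of C to capture a target fraction of the variance. The 95% threshold and the synthetic data are illustrative choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 300))
X = X - X.mean(axis=1, keepdims=True)

eigvals = np.sort(np.linalg.eigvalsh(X @ X.T / X.shape[1]))[::-1]  # descending
frac = np.cumsum(eigvals) / eigvals.sum()          # fraction of variance captured
k = int(np.searchsorted(frac, 0.95) + 1)           # smallest k capturing 95%
print(k, frac[:10])
```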

slide-52
SLIDE 52

Computing PCA

Method 1: eigendecomposition
U are eigenvectors of the covariance matrix C = (1/n) XX⊤
Computing C already takes O(nd²) time (very expensive)

Principal component analysis (PCA) / Basic principles 16

slide-53
SLIDE 53

Computing PCA

Method 1: eigendecomposition
U are eigenvectors of the covariance matrix C = (1/n) XX⊤
Computing C already takes O(nd²) time (very expensive)
Method 2: singular value decomposition (SVD)
Find X = Ud×d Σd×n V⊤n×n
where U⊤U = Id×d, V⊤V = In×n, Σ is diagonal
Computing the top k singular vectors takes only O(ndk)

Principal component analysis (PCA) / Basic principles 16

slide-54
SLIDE 54

Computing PCA

Method 1: eigendecomposition
U are eigenvectors of the covariance matrix C = (1/n) XX⊤
Computing C already takes O(nd²) time (very expensive)
Method 2: singular value decomposition (SVD)
Find X = Ud×d Σd×n V⊤n×n
where U⊤U = Id×d, V⊤V = In×n, Σ is diagonal
Computing the top k singular vectors takes only O(ndk)
Relationship between eigendecomposition and SVD:
Left singular vectors are principal components (C = UΣ²U⊤)

Principal component analysis (PCA) / Basic principles 16
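
A minimal sketch of Method 2: get the top-k principal components from a thin SVD of the centered X rather than forming C explicitly. Variable names follow the slides; the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 100, 500, 10
X = rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)

# Thin SVD: X = U S V^T with U (d x r), S (r,), V^T (r x n)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
U_k = U[:, :k]                 # top-k principal components (left singular vectors)
Z = U_k.T @ X                  # k x n low-dimensional codes
eigvals = S[:k] ** 2 / n       # corresponding eigenvalues of C = (1/n) X X^T
```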

slide-55
SLIDE 55

Roadmap

  • Principal component analysis (PCA)

– Basic principles – Case studies – Kernel PCA – Probabilistic PCA

  • Canonical correlation analysis (CCA)
  • Fisher discriminant analysis (FDA)
  • Summary

Principal component analysis (PCA) / Case studies 17

slide-56
SLIDE 56

Eigen-faces [Turk and Pentland, 1991]

  • d = number of pixels
  • Each xi ∈ Rd is a face image
  • xji = intensity of the j-th pixel in image i

Principal component analysis (PCA) / Case studies 18

slide-57
SLIDE 57

Eigen-faces [Turk and Pentland, 1991]

  • d = number of pixels
  • Each xi ∈ Rd is a face image
  • xji = intensity of the j-th pixel in image i

Xd×n ≅ Ud×k Zk×n
(x1 . . . xn) ≅ U (z1 . . . zn)

Principal component analysis (PCA) / Case studies 18

slide-58
SLIDE 58

Eigen-faces [Turk and Pentland, 1991]

  • d = number of pixels
  • Each xi ∈ Rd is a face image
  • xji = intensity of the j-th pixel in image i

Xd×n ≅ Ud×k Zk×n
(x1 . . . xn) ≅ U (z1 . . . zn)

Idea: zi is a more “meaningful” representation of the i-th face than xi
Can use zi for nearest-neighbor classification
Much faster: O(dk + nk) time instead of O(dn) when n, d ≫ k
Why no time savings for a linear classifier?

Principal component analysis (PCA) / Case studies 18
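
A small illustrative helper for the nearest-neighbor use described above; U_k, Z_train, and labels are assumed to come from PCA on training faces and are hypothetical names, not part of the original deck.

```python
import numpy as np

def nearest_neighbor_label(x_test, U_k, Z_train, labels):
    """Project a test face into eigen-face space and return the closest training label.

    U_k: d x k principal components; Z_train: k x n codes (Z_train = U_k.T @ X_train);
    labels: length-n array of class labels.
    """
    z_test = U_k.T @ x_test                                      # O(dk) projection
    dists = np.linalg.norm(Z_train - z_test[:, None], axis=0)   # O(nk) distances
    return labels[int(np.argmin(dists))]
```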

slide-59
SLIDE 59

Latent Semantic Analysis [Deerwester, 1990]

  • d = number of words in the vocabulary
  • Each xi ∈ Rd is a vector of word counts
  • xji = frequency of word j in document i

Xd×n ≅ Ud×k Zk×n

X (word counts):               U (principal directions):
  stocks:   2  · · ·  0          0.4    · ·  -0.001
  chairman: 4  · · ·  1          0.8    · ·   0.03
  the:      8  · · ·  7          0.01   · ·   0.04
  ...                            ...
  wins:     0  · · ·  2          0.002  · ·   2.3
  game:     1  · · ·  3          0.003  · ·   1.9

Z = (z1 . . . zn)

Principal component analysis (PCA) / Case studies 19

slide-60
SLIDE 60

Latent Semantic Analysis [Deerwester, 1990]

  • d = number of words in the vocabulary
  • Each xi ∈ Rd is a vector of word counts
  • xji = frequency of word j in document i

Xd×n ≅ Ud×k Zk×n

X (word counts):               U (principal directions):
  stocks:   2  · · ·  0          0.4    · ·  -0.001
  chairman: 4  · · ·  1          0.8    · ·   0.03
  the:      8  · · ·  7          0.01   · ·   0.04
  ...                            ...
  wins:     0  · · ·  2          0.002  · ·   2.3
  game:     1  · · ·  3          0.003  · ·   1.9

Z = (z1 . . . zn)

How to measure similarity between two documents? z1⊤z2 is probably better than x1⊤x2

Principal component analysis (PCA) / Case studies 19

slide-61
SLIDE 61

Latent Semantic Analysis [Deerwester, 1990]

  • d = number of words in the vocabulary
  • Each xi ∈ Rd is a vector of word counts
  • xji = frequency of word j in document i

Xd×n ≅ Ud×k Zk×n

X (word counts):               U (principal directions):
  stocks:   2  · · ·  0          0.4    · ·  -0.001
  chairman: 4  · · ·  1          0.8    · ·   0.03
  the:      8  · · ·  7          0.01   · ·   0.04
  ...                            ...
  wins:     0  · · ·  2          0.002  · ·   2.3
  game:     1  · · ·  3          0.003  · ·   1.9

Z = (z1 . . . zn)

How to measure similarity between two documents? z1⊤z2 is probably better than x1⊤x2

Applications: information retrieval

Principal component analysis (PCA) / Case studies 19

slide-62
SLIDE 62

Latent Semantic Analysis [Deerwester, 1990]

  • d = number of words in the vocabulary
  • Each xi ∈ Rd is a vector of word counts
  • xji = frequency of word j in document i

Xd×n ≅ Ud×k Zk×n

X (word counts):               U (principal directions):
  stocks:   2  · · ·  0          0.4    · ·  -0.001
  chairman: 4  · · ·  1          0.8    · ·   0.03
  the:      8  · · ·  7          0.01   · ·   0.04
  ...                            ...
  wins:     0  · · ·  2          0.002  · ·   2.3
  game:     1  · · ·  3          0.003  · ·   1.9

Z = (z1 . . . zn)

How to measure similarity between two documents? z1⊤z2 is probably better than x1⊤x2

Applications: information retrieval
Note: no computational savings; the original x is already sparse

Principal component analysis (PCA) / Case studies 19
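
A hedged sketch of the LSA pipeline above: truncate the SVD of a term-document matrix and compare documents by z1⊤z2 instead of x1⊤x2. The random count matrix is a stand-in for real data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(5000, 200)).astype(float)   # d words x n documents

U, S, Vt = np.linalg.svd(X, full_matrices=False)        # thin SVD
k = 50
Z = U[:, :k].T @ X                                      # k x n document codes

sim_lsa = Z[:, 0] @ Z[:, 1]                             # z1^T z2 similarity
sim_raw = X[:, 0] @ X[:, 1]                             # x1^T x2 similarity
print(sim_lsa, sim_raw)
```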

slide-63
SLIDE 63

Network anomaly detection [Lakhina, ’05]

xji = amount of traffic on link j in the network during each time interval i

Principal component analysis (PCA) / Case studies 20

slide-64
SLIDE 64

Network anomaly detection [Lakhina, ’05]

xji = amount of traffic on link j in the network during each time interval i

Model assumption: total traffic is sum of flows along a few “paths”

Principal component analysis (PCA) / Case studies 20

slide-65
SLIDE 65

Network anomaly detection [Lakhina, ’05]

xji = amount of traffic on link j in the network during each time interval i

Model assumption: total traffic is sum of flows along a few “paths” Apply PCA: each principal component intuitively represents a “path”

Principal component analysis (PCA) / Case studies 20

slide-66
SLIDE 66

Network anomaly detection [Lakhina, ’05]

xji = amount of traffic on link j in the network during each time interval i

Model assumption: total traffic is sum of flows along a few “paths” Apply PCA: each principal component intuitively represents a “path” Anomaly when traffic deviates from first few principal components

Principal component analysis (PCA) / Case studies 20

slide-67
SLIDE 67

Network anomaly detection [Lakhina, ’05]

xji = amount of traffic on link j in the network during each time interval i

Model assumption: total traffic is sum of flows along a few “paths” Apply PCA: each principal component intuitively represents a “path” Anomaly when traffic deviates from first few principal components

Principal component analysis (PCA) / Case studies 20

slide-68
SLIDE 68

Unsupervised POS tagging [Schütze, ’95]

Part-of-speech (POS) tagging task:

Input: I like reducing the dimensionality of data .
Output: NOUN VERB VERB(-ING) DET NOUN PREP NOUN .

Principal component analysis (PCA) / Case studies 21

slide-69
SLIDE 69

Unsupervised POS tagging [Schütze, ’95]

Part-of-speech (POS) tagging task:

Input: I like reducing the dimensionality of data .
Output: NOUN VERB VERB(-ING) DET NOUN PREP NOUN .

Each xi is (the context distribution of) a word; xji is the number of times word i appeared in context j
Key idea: words appearing in similar contexts tend to have the same POS tags, so cluster using the contexts of each word type
Problem: contexts are too sparse

Principal component analysis (PCA) / Case studies 21

slide-70
SLIDE 70

Unsupervised POS tagging [Schütze, ’95]

Part-of-speech (POS) tagging task:

Input: I like reducing the dimensionality of data .
Output: NOUN VERB VERB(-ING) DET NOUN PREP NOUN .

Each xi is (the context distribution of) a word; xji is the number of times word i appeared in context j
Key idea: words appearing in similar contexts tend to have the same POS tags, so cluster using the contexts of each word type
Problem: contexts are too sparse
Solution: run PCA first, then cluster using the new representation

Principal component analysis (PCA) / Case studies 21

slide-71
SLIDE 71

Multi-task learning [Ando & Zhang, ’05]

  • Have n related tasks (classify documents for various users)
  • Each task has a linear classifier with weights xi
  • Want to share structure between classifiers

Principal component analysis (PCA) / Case studies 22

slide-72
SLIDE 72

Multi-task learning [Ando & Zhang, ’05]

  • Have n related tasks (classify documents for various users)
  • Each task has a linear classifier with weights xi
  • Want to share structure between classifiers

One step of their procedure: given n linear classifiers x1, . . . , xn, run PCA to identify shared structure:

Principal component analysis (PCA) / Case studies 22

slide-73
SLIDE 73

Multi-task learning [Ando & Zhang, ’05]

  • Have n related tasks (classify documents for various users)
  • Each task has a linear classifier with weights xi
  • Want to share structure between classifiers

One step of their procedure: given n linear classifiers x1, . . . , xn, run PCA to identify shared structure: X =( x1 . . . xn) ≅ UZ

Principal component analysis (PCA) / Case studies 22

slide-74
SLIDE 74

Multi-task learning [Ando & Zhang, ’05]

  • Have n related tasks (classify documents for various users)
  • Each task has a linear classifier with weights xi
  • Want to share structure between classifiers

One step of their procedure: given n linear classifiers x1, . . . , xn, run PCA to identify shared structure:
X = (x1 . . . xn) ≅ UZ
Each principal component is an eigen-classifier

Principal component analysis (PCA) / Case studies 22

slide-75
SLIDE 75

Multi-task learning [Ando & Zhang, ’05]

  • Have n related tasks (classify documents for various users)
  • Each task has a linear classifier with weights xi
  • Want to share structure between classifiers

One step of their procedure: given n linear classifiers x1, . . . , xn, run PCA to identify shared structure:
X = (x1 . . . xn) ≅ UZ
Each principal component is an eigen-classifier
Other step of their procedure: retrain classifiers, regularizing towards the subspace U

Principal component analysis (PCA) / Case studies 22

slide-76
SLIDE 76

PCA summary

  • Intuition: capture variance of data or minimize

reconstruction error

Principal component analysis (PCA) / Case studies 23

slide-77
SLIDE 77

PCA summary

  • Intuition: capture variance of data or minimize

reconstruction error

  • Algorithm: find eigendecomposition of covariance

matrix or SVD

Principal component analysis (PCA) / Case studies 23

slide-78
SLIDE 78

PCA summary

  • Intuition: capture variance of data or minimize

reconstruction error

  • Algorithm: find eigendecomposition of covariance

matrix or SVD

  • Impact: reduce storage (from O(nd) to O(nk)), reduce

time complexity

Principal component analysis (PCA) / Case studies 23

slide-79
SLIDE 79

PCA summary

  • Intuition: capture variance of data or minimize

reconstruction error

  • Algorithm: find eigendecomposition of covariance

matrix or SVD

  • Impact: reduce storage (from O(nd) to O(nk)), reduce

time complexity

  • Advantages: simple, fast

Principal component analysis (PCA) / Case studies 23

slide-80
SLIDE 80

PCA summary

  • Intuition: capture variance of data or minimize

reconstruction error

  • Algorithm: find eigendecomposition of covariance

matrix or SVD

  • Impact: reduce storage (from O(nd) to O(nk)), reduce

time complexity

  • Advantages: simple, fast
  • Applications: eigen-faces, eigen-documents, network

anomaly detection, etc.

Principal component analysis (PCA) / Case studies 23

slide-81
SLIDE 81

Roadmap

  • Principal component analysis (PCA)

– Basic principles – Case studies – Kernel PCA – Probabilistic PCA

  • Canonical correlation analysis (CCA)
  • Fisher discriminant analysis (FDA)
  • Summary

Principal component analysis (PCA) / Kernel PCA 24

slide-82
SLIDE 82

Limitations of linearity

Principal component analysis (PCA) / Kernel PCA 25

slide-83
SLIDE 83

Limitations of linearity

PCA is effective

Principal component analysis (PCA) / Kernel PCA 25

slide-84
SLIDE 84

Limitations of linearity

PCA is effective

Principal component analysis (PCA) / Kernel PCA 25

slide-85
SLIDE 85

Limitations of linearity

PCA is effective PCA is ineffective

Principal component analysis (PCA) / Kernel PCA 25

slide-86
SLIDE 86

Limitations of linearity

(figures: one case where PCA is effective and one where it is ineffective)
Problem is that the PCA subspace is linear: S = {x = Uz : z ∈ Rk}

Principal component analysis (PCA) / Kernel PCA 25

slide-87
SLIDE 87

Limitations of linearity

(figures: one case where PCA is effective and one where it is ineffective)
Problem is that the PCA subspace is linear: S = {x = Uz : z ∈ Rk}
In this example: S = {(x1, x2) : x2 = (u2/u1) x1}

Principal component analysis (PCA) / Kernel PCA 25

slide-88
SLIDE 88

Going beyond linearity: quick solution

Broken solution

Principal component analysis (PCA) / Kernel PCA 26

slide-89
SLIDE 89

Going beyond linearity: quick solution

(figures: broken solution vs. desired solution)
We want the desired solution: S = {(x1, x2) : x2 = (u2/u1) x1²}

Principal component analysis (PCA) / Kernel PCA 26

slide-90
SLIDE 90

Going beyond linearity: quick solution

(figures: broken solution vs. desired solution)
We want the desired solution: S = {(x1, x2) : x2 = (u2/u1) x1²}
We can get this: S = {φ(x) = Uz} with φ(x) = (x1², x2)⊤

Principal component analysis (PCA) / Kernel PCA 26

slide-91
SLIDE 91

Going beyond linearity: quick solution

(figures: broken solution vs. desired solution)
We want the desired solution: S = {(x1, x2) : x2 = (u2/u1) x1²}
We can get this: S = {φ(x) = Uz} with φ(x) = (x1², x2)⊤

Linear dimensionality reduction in φ(x) space ⇔ Nonlinear dimensionality reduction in x space

Principal component analysis (PCA) / Kernel PCA 26

slide-92
SLIDE 92

Going beyond linearity: quick solution

(figures: broken solution vs. desired solution)
We want the desired solution: S = {(x1, x2) : x2 = (u2/u1) x1²}
We can get this: S = {φ(x) = Uz} with φ(x) = (x1², x2)⊤

Linear dimensionality reduction in φ(x) space ⇔ Nonlinear dimensionality reduction in x space

In general, can set φ(x) = (x1, x1², x1x2, sin(x1), . . . )⊤

Principal component analysis (PCA) / Kernel PCA 26

slide-93
SLIDE 93

Going beyond linearity: quick solution

(figures: broken solution vs. desired solution)
We want the desired solution: S = {(x1, x2) : x2 = (u2/u1) x1²}
We can get this: S = {φ(x) = Uz} with φ(x) = (x1², x2)⊤

Linear dimensionality reduction in φ(x) space ⇔ Nonlinear dimensionality reduction in x space

In general, can set φ(x) = (x1, x1², x1x2, sin(x1), . . . )⊤

Problems: (1) ad-hoc and tedious (2) φ(x) large, computationally expensive

Principal component analysis (PCA) / Kernel PCA 26

slide-94
SLIDE 94

Towards kernels

Representer theorem: the PCA solution is a linear combination of the xi

Principal component analysis (PCA) / Kernel PCA 27

slide-95
SLIDE 95

Towards kernels

Representer theorem: the PCA solution is a linear combination of the xi
Why? Recall the PCA eigenvalue problem: XX⊤u = λu

Principal component analysis (PCA) / Kernel PCA 27

slide-96
SLIDE 96

Towards kernels

Representer theorem: the PCA solution is a linear combination of the xi
Why? Recall the PCA eigenvalue problem: XX⊤u = λu
Notice that u = Xα = Σ_{i=1}^n αi xi for some weights α

Principal component analysis (PCA) / Kernel PCA 27

slide-97
SLIDE 97

Towards kernels

Representer theorem: the PCA solution is a linear combination of the xi
Why? Recall the PCA eigenvalue problem: XX⊤u = λu
Notice that u = Xα = Σ_{i=1}^n αi xi for some weights α

Analogy with SVMs: weight vector w = Xα

Principal component analysis (PCA) / Kernel PCA 27

slide-98
SLIDE 98

Towards kernels

Representer theorem: the PCA solution is a linear combination of the xi
Why? Recall the PCA eigenvalue problem: XX⊤u = λu
Notice that u = Xα = Σ_{i=1}^n αi xi for some weights α
Analogy with SVMs: weight vector w = Xα
Key fact: PCA only needs inner products K = X⊤X

Principal component analysis (PCA) / Kernel PCA 27

slide-99
SLIDE 99

Towards kernels

Representer theorem: the PCA solution is a linear combination of the xi
Why? Recall the PCA eigenvalue problem: XX⊤u = λu
Notice that u = Xα = Σ_{i=1}^n αi xi for some weights α
Analogy with SVMs: weight vector w = Xα
Key fact: PCA only needs inner products K = X⊤X
Why? Use the representer theorem on the PCA objective:
max_{‖u‖=1} u⊤XX⊤u = max_{α⊤X⊤Xα=1} α⊤(X⊤X)(X⊤X)α

Principal component analysis (PCA) / Kernel PCA 27

slide-100
SLIDE 100

Kernel PCA

Kernel function: k(x1, x2) such that K, the kernel matrix formed by Kij = k(xi, xj), is positive semi-definite

Principal component analysis (PCA) / Kernel PCA 28

slide-101
SLIDE 101

Kernel PCA

Kernel function: k(x1, x2) such that K, the kernel matrix formed by Kij = k(xi, xj), is positive semi-definite
Examples:
Linear kernel: k(x1, x2) = x1⊤x2

Principal component analysis (PCA) / Kernel PCA 28

slide-102
SLIDE 102

Kernel PCA

Kernel function: k(x1, x2) such that K, the kernel matrix formed by Kij = k(xi, xj), is positive semi-definite
Examples:
Linear kernel: k(x1, x2) = x1⊤x2
Polynomial kernel: k(x1, x2) = (1 + x1⊤x2)²

Principal component analysis (PCA) / Kernel PCA 28

slide-103
SLIDE 103

Kernel PCA

Kernel function: k(x1, x2) such that K, the kernel matrix formed by Kij = k(xi, xj), is positive semi-definite
Examples:
Linear kernel: k(x1, x2) = x1⊤x2
Polynomial kernel: k(x1, x2) = (1 + x1⊤x2)²
Gaussian (RBF) kernel: k(x1, x2) = e^{−‖x1−x2‖²}

Principal component analysis (PCA) / Kernel PCA 28

slide-104
SLIDE 104

Kernel PCA

Kernel function: k(x1, x2) such that K, the kernel matrix formed by Kij = k(xi, xj), is positive semi-definite
Examples:
Linear kernel: k(x1, x2) = x1⊤x2
Polynomial kernel: k(x1, x2) = (1 + x1⊤x2)²
Gaussian (RBF) kernel: k(x1, x2) = e^{−‖x1−x2‖²}
Treat data points x as black boxes, only access via k

Principal component analysis (PCA) / Kernel PCA 28

slide-105
SLIDE 105

Kernel PCA

Kernel function: k(x1, x2) such that K, the kernel matrix formed by Kij = k(xi, xj), is positive semi-definite
Examples:
Linear kernel: k(x1, x2) = x1⊤x2
Polynomial kernel: k(x1, x2) = (1 + x1⊤x2)²
Gaussian (RBF) kernel: k(x1, x2) = e^{−‖x1−x2‖²}
Treat data points x as black boxes, only access via k
k intuitively measures “similarity” between two inputs

Principal component analysis (PCA) / Kernel PCA 28

slide-106
SLIDE 106

Kernel PCA

Kernel function: k(x1, x2) such that K, the kernel matrix formed by Kij = k(xi, xj), is positive semi-definite
Examples:
Linear kernel: k(x1, x2) = x1⊤x2
Polynomial kernel: k(x1, x2) = (1 + x1⊤x2)²
Gaussian (RBF) kernel: k(x1, x2) = e^{−‖x1−x2‖²}
Treat data points x as black boxes, only access via k
k intuitively measures “similarity” between two inputs
Mercer’s theorem (using kernels is sensible): there exists a high-dimensional feature space φ such that k(x1, x2) = φ(x1)⊤φ(x2) (like the quick solution earlier!)

Principal component analysis (PCA) / Kernel PCA 28

slide-107
SLIDE 107

Solving kernel PCA

Direct method:
Kernel PCA objective: max_{α⊤Kα=1} α⊤K²α

Principal component analysis (PCA) / Kernel PCA 29

slide-108
SLIDE 108

Solving kernel PCA

Direct method:
Kernel PCA objective: max_{α⊤Kα=1} α⊤K²α
⇒ kernel PCA eigenvalue problem: X⊤Xα = λ′α

Principal component analysis (PCA) / Kernel PCA 29

slide-109
SLIDE 109

Solving kernel PCA

Direct method:
Kernel PCA objective: max_{α⊤Kα=1} α⊤K²α
⇒ kernel PCA eigenvalue problem: X⊤Xα = λ′α
Modular method (if you don’t want to think about kernels):
Find vectors x′1, . . . , x′n such that x′i⊤x′j = Kij = φ(xi)⊤φ(xj)

Principal component analysis (PCA) / Kernel PCA 29

slide-110
SLIDE 110

Solving kernel PCA

Direct method:
Kernel PCA objective: max_{α⊤Kα=1} α⊤K²α
⇒ kernel PCA eigenvalue problem: X⊤Xα = λ′α
Modular method (if you don’t want to think about kernels):
Find vectors x′1, . . . , x′n such that x′i⊤x′j = Kij = φ(xi)⊤φ(xj)

Key: use any vectors that preserve inner products

Principal component analysis (PCA) / Kernel PCA 29

slide-111
SLIDE 111

Solving kernel PCA

Direct method:
Kernel PCA objective: max_{α⊤Kα=1} α⊤K²α
⇒ kernel PCA eigenvalue problem: X⊤Xα = λ′α
Modular method (if you don’t want to think about kernels):
Find vectors x′1, . . . , x′n such that x′i⊤x′j = Kij = φ(xi)⊤φ(xj)
Key: use any vectors that preserve inner products
One possibility is the Cholesky decomposition K = X′⊤X′

Principal component analysis (PCA) / Kernel PCA 29
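
A minimal sketch of the direct method, assuming the feature space is centered (centering of K is omitted for brevity): solve the eigenvalue problem on K, rescale α so that u = Xα has unit norm in feature space, and read off the projections of the training points. The function names are mine, not from the slides.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # X holds one data point per column (d x n), matching the slides
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    return np.exp(-gamma * d2)

def kernel_pca(K, k):
    eigvals, A = np.linalg.eigh(K)             # ascending order
    eigvals = eigvals[::-1][:k]                # top-k eigenvalues
    A = A[:, ::-1][:, :k]                      # corresponding alphas
    A = A / np.sqrt(eigvals)                   # so that u_j = X alpha_j has unit norm
    Z = K @ A                                  # n x k projections of training points
    return Z, A

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 100))
Z, A = kernel_pca(rbf_kernel(X), k=2)
```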

slide-112
SLIDE 112

Roadmap

  • Principal component analysis (PCA)

– Basic principles – Case studies – Kernel PCA – Probabilistic PCA

  • Canonical correlation analysis (CCA)
  • Fisher discriminant analysis (FDA)
  • Summary

Principal component analysis (PCA) / Probabilistic PCA 30

slide-113
SLIDE 113

Probabilistic modeling

So far, deal with objective functions: min_U f(X, U)

Principal component analysis (PCA) / Probabilistic PCA 31

slide-114
SLIDE 114

Probabilistic modeling

So far, deal with objective functions: min_U f(X, U)
Probabilistic modeling: max_U p(X | U)

Principal component analysis (PCA) / Probabilistic PCA 31

slide-115
SLIDE 115

Probabilistic modeling

So far, deal with objective functions: min_U f(X, U)
Probabilistic modeling: max_U p(X | U)

Invent a generative story of how data X arose Play detective: infer parameters U that produced X

Principal component analysis (PCA) / Probabilistic PCA 31

slide-116
SLIDE 116

Probabilistic modeling

So far, deal with objective functions: min_U f(X, U)
Probabilistic modeling: max_U p(X | U)

Invent a generative story of how data X arose Play detective: infer parameters U that produced X Advantages:

  • Model reports estimates of uncertainty
  • Natural way to handle missing data

Principal component analysis (PCA) / Probabilistic PCA 31

slide-117
SLIDE 117

Probabilistic modeling

So far, deal with objective functions: min_U f(X, U)
Probabilistic modeling: max_U p(X | U)

Invent a generative story of how data X arose Play detective: infer parameters U that produced X Advantages:

  • Model reports estimates of uncertainty
  • Natural way to handle missing data
  • Natural way to introduce prior knowledge
  • Natural way to incorporate in a larger model

Principal component analysis (PCA) / Probabilistic PCA 31

slide-118
SLIDE 118

Probabilistic modeling

So far, deal with objective functions: min_U f(X, U)
Probabilistic modeling: max_U p(X | U)

Invent a generative story of how data X arose Play detective: infer parameters U that produced X Advantages:

  • Model reports estimates of uncertainty
  • Natural way to handle missing data
  • Natural way to introduce prior knowledge
  • Natural way to incorporate in a larger model

Example from last lecture: k-means ⇒ GMMs

Principal component analysis (PCA) / Probabilistic PCA 31

slide-119
SLIDE 119

Probabilistic PCA

Generative story [Tipping and Bishop, 1999]: For each data point i = 1, . . . , n:

Principal component analysis (PCA) / Probabilistic PCA 32

slide-120
SLIDE 120

Probabilistic PCA

Generative story [Tipping and Bishop, 1999]: For each data point i = 1, . . . , n: Draw the latent vector: zi ∼ N(0, Ik×k)

Principal component analysis (PCA) / Probabilistic PCA 32

slide-121
SLIDE 121

Probabilistic PCA

Generative story [Tipping and Bishop, 1999]: For each data point i = 1, . . . , n: Draw the latent vector: zi ∼ N(0, Ik×k) Create the data point: xi ∼ N(Uzi, σ2Id×d)

Principal component analysis (PCA) / Probabilistic PCA 32

slide-122
SLIDE 122

Probabilistic PCA

Generative story [Tipping and Bishop, 1999]: For each data point i = 1, . . . , n: Draw the latent vector: zi ∼ N(0, Ik×k) Create the data point: xi ∼ N(Uzi, σ2Id×d) PCA finds the U that maximizes the likelihood of the data

Principal component analysis (PCA) / Probabilistic PCA 32

slide-123
SLIDE 123

Probabilistic PCA

Generative story [Tipping and Bishop, 1999]: For each data point i = 1, . . . , n: Draw the latent vector: zi ∼ N(0, Ik×k) Create the data point: xi ∼ N(Uzi, σ2Id×d) PCA finds the U that maximizes the likelihood of the data Advantages:

  • Handles missing data (important for collaborative

filtering)

Principal component analysis (PCA) / Probabilistic PCA 32

slide-124
SLIDE 124

Probabilistic PCA

Generative story [Tipping and Bishop, 1999]: For each data point i = 1, . . . , n: Draw the latent vector: zi ∼ N(0, Ik×k) Create the data point: xi ∼ N(Uzi, σ2Id×d) PCA finds the U that maximizes the likelihood of the data Advantages:

  • Handles missing data (important for collaborative

filtering)

  • Extension to factor analysis: allow non-isotropic noise

(replace σ2Id×d with arbitrary diagonal matrix)

Principal component analysis (PCA) / Probabilistic PCA 32
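
A tiny sketch of the generative story above, sampling synthetic data from z ∼ N(0, I) and x ∼ N(Uz, σ²I); the parameter values are arbitrary and U here is random rather than learned by maximum likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n, sigma = 20, 3, 1000, 0.1
U = rng.normal(size=(d, k))                    # parameters (would be fit by maximum likelihood)

Z = rng.normal(size=(k, n))                    # latent vectors z_i ~ N(0, I_k)
X = U @ Z + sigma * rng.normal(size=(d, n))    # data points x_i ~ N(U z_i, sigma^2 I_d)
```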

slide-125
SLIDE 125

Probabilistic latent semantic analysis (pLSA)

Motivation: in text analysis, X contains word counts; PCA (LSA) is bad model as it allows negative counts; pLSA fixes this

Principal component analysis (PCA) / Probabilistic PCA 33

slide-126
SLIDE 126

Probabilistic latent semantic analysis (pLSA)

Motivation: in text analysis, X contains word counts; PCA (LSA) is bad model as it allows negative counts; pLSA fixes this Generative story for pLSA [Hofmann, 1999]: For each document i = 1, . . . , n:

Principal component analysis (PCA) / Probabilistic PCA 33

slide-127
SLIDE 127

Probabilistic latent semantic analysis (pLSA)

Motivation: in text analysis, X contains word counts; PCA (LSA) is bad model as it allows negative counts; pLSA fixes this Generative story for pLSA [Hofmann, 1999]: For each document i = 1, . . . , n: Repeat M times (number of word tokens in document):

Principal component analysis (PCA) / Probabilistic PCA 33

slide-128
SLIDE 128

Probabilistic latent semantic analysis (pLSA)

Motivation: in text analysis, X contains word counts; PCA (LSA) is bad model as it allows negative counts; pLSA fixes this Generative story for pLSA [Hofmann, 1999]: For each document i = 1, . . . , n: Repeat M times (number of word tokens in document): Draw a latent topic: z ∼ p(z | i)

Principal component analysis (PCA) / Probabilistic PCA 33

slide-129
SLIDE 129

Probabilistic latent semantic analysis (pLSA)

Motivation: in text analysis, X contains word counts; PCA (LSA) is bad model as it allows negative counts; pLSA fixes this Generative story for pLSA [Hofmann, 1999]: For each document i = 1, . . . , n: Repeat M times (number of word tokens in document): Draw a latent topic: z ∼ p(z | i) Choose the word token: x ∼ p(x | z)

Principal component analysis (PCA) / Probabilistic PCA 33

slide-130
SLIDE 130

Probabilistic latent semantic analysis (pLSA)

Motivation: in text analysis, X contains word counts; PCA (LSA) is bad model as it allows negative counts; pLSA fixes this Generative story for pLSA [Hofmann, 1999]: For each document i = 1, . . . , n: Repeat M times (number of word tokens in document): Draw a latent topic: z ∼ p(z | i) Choose the word token: x ∼ p(x | z) Set xji to be the number of times word j was chosen

Principal component analysis (PCA) / Probabilistic PCA 33

slide-131
SLIDE 131

Probabilistic latent semantic analysis (pLSA)

Motivation: in text analysis, X contains word counts; PCA (LSA) is a bad model as it allows negative counts; pLSA fixes this
Generative story for pLSA [Hofmann, 1999]:
For each document i = 1, . . . , n:
  Repeat M times (number of word tokens in the document):
    Draw a latent topic: z ∼ p(z | i)
    Choose the word token: x ∼ p(x | z)
  Set xji to be the number of times word j was chosen
Learning using Hard EM (analog of k-means):
  E-step: fix parameters, choose best topics
  M-step: fix topics, optimize parameters
More sophisticated methods: EM, Latent Dirichlet Allocation

Principal component analysis (PCA) / Probabilistic PCA 33

slide-132
SLIDE 132

Probabilistic latent semantic analysis (pLSA)

Motivation: in text analysis, X contains word counts; PCA (LSA) is a bad model as it allows negative counts; pLSA fixes this
Generative story for pLSA [Hofmann, 1999]:
For each document i = 1, . . . , n:
  Repeat M times (number of word tokens in the document):
    Draw a latent topic: z ∼ p(z | i)
    Choose the word token: x ∼ p(x | z)
  Set xji to be the number of times word j was chosen
Learning using Hard EM (analog of k-means):
  E-step: fix parameters, choose best topics
  M-step: fix topics, optimize parameters
More sophisticated methods: EM, Latent Dirichlet Allocation
Comparison to a mixture model for clustering:
  Mixture model: assume a single topic for the entire document
  pLSA: allow multiple topics per document

Principal component analysis (PCA) / Probabilistic PCA 33

slide-133
SLIDE 133

Roadmap

  • Principal component analysis (PCA)

– Basic principles – Case studies – Kernel PCA – Probabilistic PCA

  • Canonical correlation analysis (CCA)
  • Fisher discriminant analysis (FDA)
  • Summary

Canonical correlation analysis (CCA) 34

slide-134
SLIDE 134

Motivation for CCA [Hotelling, 1936]

Often, each data point consists of two views:

  • Image retrieval: for each image, have the following:

– x: Pixels (or other visual features) – y: Text around the image

Canonical correlation analysis (CCA) 35

slide-135
SLIDE 135

Motivation for CCA [Hotelling, 1936]

Often, each data point consists of two views:

  • Image retrieval: for each image, have the following:

– x: Pixels (or other visual features) – y: Text around the image

  • Time series:

– x: Signal at time t – y: Signal at time t + 1

Canonical correlation analysis (CCA) 35

slide-136
SLIDE 136

Motivation for CCA [Hotelling, 1936]

Often, each data point consists of two views:

  • Image retrieval: for each image, have the following:

– x: Pixels (or other visual features) – y: Text around the image

  • Time series:

– x: Signal at time t – y: Signal at time t + 1

  • Two-view learning: divide features into two sets

– x: Features of a word/object, etc. – y: Features of the context in which it appears

Canonical correlation analysis (CCA) 35

slide-137
SLIDE 137

Motivation for CCA [Hotelling, 1936]

Often, each data point consists of two views:

  • Image retrieval: for each image, have the following:

– x: Pixels (or other visual features) – y: Text around the image

  • Time series:

– x: Signal at time t – y: Signal at time t + 1

  • Two-view learning: divide features into two sets

– x: Features of a word/object, etc. – y: Features of the context in which it appears Goal: reduce the dimensionality of the two views jointly

Canonical correlation analysis (CCA) 35

slide-138
SLIDE 138

An example

Setup: Input data: (x1, y1), . . . , (xn, yn) (matrices X, Y) Goal: find pair of projections (u, v)

Canonical correlation analysis (CCA) 36

slide-139
SLIDE 139

An example

Setup: Input data: (x1, y1), . . . , (xn, yn) (matrices X, Y) Goal: find pair of projections (u, v) In figure, x and y are paired by brightness

Canonical correlation analysis (CCA) 36

slide-140
SLIDE 140

An example

Setup: Input data: (x1, y1), . . . , (xn, yn) (matrices X, Y) Goal: find pair of projections (u, v) In figure, x and y are paired by brightness Dimensionality reduction solutions: Independent

Canonical correlation analysis (CCA) 36

slide-141
SLIDE 141

An example

Setup: Input data: (x1, y1), . . . , (xn, yn) (matrices X, Y) Goal: find pair of projections (u, v) In figure, x and y are paired by brightness Dimensionality reduction solutions: Independent Joint

Canonical correlation analysis (CCA) 36

slide-142
SLIDE 142

From PCA to CCA

PCA on views separately: no covariance term

max_{u,v}  u⊤XX⊤u / (u⊤u) + v⊤YY⊤v / (v⊤v)

Canonical correlation analysis (CCA) 37

slide-143
SLIDE 143

From PCA to CCA

PCA on views separately: no covariance term

max_{u,v}  u⊤XX⊤u / (u⊤u) + v⊤YY⊤v / (v⊤v)

PCA on concatenation (X⊤, Y⊤)⊤: includes covariance term
max_{u,v}  (u⊤XX⊤u + 2u⊤XY⊤v + v⊤YY⊤v) / (u⊤u + v⊤v)

Canonical correlation analysis (CCA) 37

slide-144
SLIDE 144

From PCA to CCA

PCA on views separately: no covariance term

max_{u,v}  u⊤XX⊤u / (u⊤u) + v⊤YY⊤v / (v⊤v)

PCA on concatenation (X⊤, Y⊤)⊤: includes covariance term
max_{u,v}  (u⊤XX⊤u + 2u⊤XY⊤v + v⊤YY⊤v) / (u⊤u + v⊤v)

Maximum covariance: drop variance terms
max_{u,v}  u⊤XY⊤v / (√(u⊤u) √(v⊤v))

Canonical correlation analysis (CCA) 37

slide-145
SLIDE 145

From PCA to CCA

PCA on views separately: no covariance term

max_{u,v}  u⊤XX⊤u / (u⊤u) + v⊤YY⊤v / (v⊤v)

PCA on concatenation (X⊤, Y⊤)⊤: includes covariance term
max_{u,v}  (u⊤XX⊤u + 2u⊤XY⊤v + v⊤YY⊤v) / (u⊤u + v⊤v)

Maximum covariance: drop variance terms
max_{u,v}  u⊤XY⊤v / (√(u⊤u) √(v⊤v))

Maximum correlation (CCA): divide out variance terms
max_{u,v}  u⊤XY⊤v / (√(u⊤XX⊤u) √(v⊤YY⊤v))

Canonical correlation analysis (CCA) 37

slide-146
SLIDE 146

Canonical correlation analysis (CCA)

Definitions: Variance: var(u⊤x) = u⊤XX⊤u

Canonical correlation analysis (CCA) 38

slide-147
SLIDE 147

Canonical correlation analysis (CCA)

Definitions: Variance: var(u⊤x) = u⊤XX⊤u Covariance: cov(u⊤x, v⊤y) = u⊤XY⊤v

Canonical correlation analysis (CCA) 38

slide-148
SLIDE 148

Canonical correlation analysis (CCA)

Definitions:
Variance: var(u⊤x) = u⊤XX⊤u
Covariance: cov(u⊤x, v⊤y) = u⊤XY⊤v
Correlation: corr(u⊤x, v⊤y) = cov(u⊤x, v⊤y) / (√var(u⊤x) √var(v⊤y))

Canonical correlation analysis (CCA) 38

slide-149
SLIDE 149

Canonical correlation analysis (CCA)

Definitions:
Variance: var(u⊤x) = u⊤XX⊤u
Covariance: cov(u⊤x, v⊤y) = u⊤XY⊤v
Correlation: corr(u⊤x, v⊤y) = cov(u⊤x, v⊤y) / (√var(u⊤x) √var(v⊤y))
Objective: maximize correlation between projected views
max_{u,v} corr(u⊤x, v⊤y)

Canonical correlation analysis (CCA) 38

slide-150
SLIDE 150

Canonical correlation analysis (CCA)

Definitions:
Variance: var(u⊤x) = u⊤XX⊤u
Covariance: cov(u⊤x, v⊤y) = u⊤XY⊤v
Correlation: corr(u⊤x, v⊤y) = cov(u⊤x, v⊤y) / (√var(u⊤x) √var(v⊤y))
Objective: maximize correlation between projected views
max_{u,v} corr(u⊤x, v⊤y)
Properties:

  • Focus on how variables are related, not how much they vary

Canonical correlation analysis (CCA) 38

slide-151
SLIDE 151

Canonical correlation analysis (CCA)

Definitions:
Variance: var(u⊤x) = u⊤XX⊤u
Covariance: cov(u⊤x, v⊤y) = u⊤XY⊤v
Correlation: corr(u⊤x, v⊤y) = cov(u⊤x, v⊤y) / (√var(u⊤x) √var(v⊤y))
Objective: maximize correlation between projected views
max_{u,v} corr(u⊤x, v⊤y)
Properties:

  • Focus on how variables are related, not how much they vary
  • Invariant to any rotation and scaling of data

Canonical correlation analysis (CCA) 38

slide-152
SLIDE 152

Canonical correlation analysis (CCA)

Definitions:
Variance: var(u⊤x) = u⊤XX⊤u
Covariance: cov(u⊤x, v⊤y) = u⊤XY⊤v
Correlation: corr(u⊤x, v⊤y) = cov(u⊤x, v⊤y) / (√var(u⊤x) √var(v⊤y))
Objective: maximize correlation between projected views
max_{u,v} corr(u⊤x, v⊤y)
Properties:

  • Focus on how variables are related, not how much they vary
  • Invariant to any rotation and scaling of data

Solved via a generalized eigenvalue problem (Aw = λBw)

Canonical correlation analysis (CCA) 38

slide-153
SLIDE 153

Regularization is important

Extreme examples of degeneracy:

  • If x = Ay, then any (u, v) with u = Av is optimal

(correlation 1)

Canonical correlation analysis (CCA) 41

slide-154
SLIDE 154

Regularization is important

Extreme examples of degeneracy:

  • If x = Ay, then any (u, v) with u = Av is optimal

(correlation 1)

  • If x and y are independent, then any (u, v) is optimal

(correlation 0)

Canonical correlation analysis (CCA) 41

slide-155
SLIDE 155

Regularization is important

Extreme examples of degeneracy:

  • If x = Ay, then any (u, v) with u = Av is optimal

(correlation 1)

  • If x and y are independent, then any (u, v) is optimal

(correlation 0) Problem: if X or Y has rank n, then any (u, v) is optimal (correlation 1) with u = X†⊤Yv ⇒ CCA is meaningless!

Canonical correlation analysis (CCA) 41

slide-156
SLIDE 156

Regularization is important

Extreme examples of degeneracy:

  • If x = Ay, then any (u, v) with u = Av is optimal

(correlation 1)

  • If x and y are independent, then any (u, v) is optimal

(correlation 0)

Problem: if X or Y has rank n, then any (u, v) is optimal (correlation 1) with u = X†⊤Yv ⇒ CCA is meaningless!
Solution: regularization (interpolate between maximum covariance and maximum correlation)
max_{u,v}  u⊤XY⊤v / (√(u⊤(XX⊤ + λI)u) √(v⊤(YY⊤ + λI)v))

Canonical correlation analysis (CCA) 41
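
One standard way to solve the regularized objective above is as a generalized eigenvalue problem Aw = λBw with w = (u; v), which is also how an earlier slide says CCA is solved. A hedged SciPy sketch under that formulation (the block structure and variable names are my own):

```python
import numpy as np
from scipy.linalg import eigh

def regularized_cca(X, Y, lam=1e-3):
    """Top regularized CCA direction pair for centered views X (dx x n), Y (dy x n)."""
    dx, dy = X.shape[0], Y.shape[0]
    Cxy = X @ Y.T
    A = np.block([[np.zeros((dx, dx)), Cxy],
                  [Cxy.T, np.zeros((dy, dy))]])
    B = np.block([[X @ X.T + lam * np.eye(dx), np.zeros((dx, dy))],
                  [np.zeros((dy, dx)), Y @ Y.T + lam * np.eye(dy)]])
    rho, W = eigh(A, B)            # generalized eigenvalue problem A w = rho B w
    w = W[:, -1]                   # direction with the largest correlation
    return w[:dx], w[dx:], rho[-1]

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(5, 100)), rng.normal(size=(4, 100))
X, Y = X - X.mean(1, keepdims=True), Y - Y.mean(1, keepdims=True)
u, v, rho = regularized_cca(X, Y)
```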

slide-157
SLIDE 157

Kernel CCA

Two kernels: kx and ky

Canonical correlation analysis (CCA) 42

slide-158
SLIDE 158

Kernel CCA

Two kernels: kx and ky Direct method: (some math)

Canonical correlation analysis (CCA) 42

slide-159
SLIDE 159

Kernel CCA

Two kernels: kx and ky Direct method: (some math) Modular method:

  • 1. Transform xi into x′i ∈ Rn satisfying k(xi, xj) = x′i⊤x′j (do same for y)

Canonical correlation analysis (CCA) 42

slide-160
SLIDE 160

Kernel CCA

Two kernels: kx and ky Direct method: (some math) Modular method:

  • 1. Transform xi into x′i ∈ Rn satisfying k(xi, xj) = x′i⊤x′j (do same for y)

  • 2. Perform regular CCA

Canonical correlation analysis (CCA) 42

slide-161
SLIDE 161

Kernel CCA

Two kernels: kx and ky Direct method: (some math) Modular method:

  • 1. Transform xi into x′i ∈ Rn satisfying k(xi, xj) = x′i⊤x′j (do same for y)

  • 2. Perform regular CCA

Regularization is especially important for kernel CCA!

Canonical correlation analysis (CCA) 42

slide-162
SLIDE 162

Roadmap

  • Principal component analysis (PCA)

– Basic principles – Case studies – Kernel PCA – Probabilistic PCA

  • Canonical correlation analysis (CCA)
  • Fisher discriminant analysis (FDA)
  • Summary

Fisher discriminant analysis (FDA) 43

slide-163
SLIDE 163

Motivation for FDA [Fisher, 1936]

What is the best linear projection?

Fisher discriminant analysis (FDA) 44

slide-164
SLIDE 164

Motivation for FDA [Fisher, 1936]

What is the best linear projection? PCA solution

Fisher discriminant analysis (FDA) 44

slide-165
SLIDE 165

Motivation for FDA [Fisher, 1936]

What is the best linear projection with these labels? PCA solution

Fisher discriminant analysis (FDA) 44

slide-166
SLIDE 166

Motivation for FDA [Fisher, 1936]

What is the best linear projection with these labels? PCA solution FDA solution

Fisher discriminant analysis (FDA) 44

slide-167
SLIDE 167

Motivation for FDA [Fisher, 1936]

What is the best linear projection with these labels? PCA solution FDA solution Goal: reduce the dimensionality given labels Idea: want projection to maximize overall interclass variance relative to intraclass variance

Fisher discriminant analysis (FDA) 44

slide-168
SLIDE 168

Motivation for FDA [Fisher, 1936]

What is the best linear projection with these labels? (figures: PCA solution vs. FDA solution)
Goal: reduce the dimensionality given labels
Idea: want the projection to maximize overall interclass variance relative to intraclass variance
Linear classifiers (logistic regression, SVMs) have a similar feel: find a one-dimensional subspace w, e.g., to maximize the margin between different classes

Fisher discriminant analysis (FDA) 44

slide-169
SLIDE 169

Motivation for FDA [Fisher, 1936]

What is the best linear projection with these labels? (figures: PCA solution vs. FDA solution)
Goal: reduce the dimensionality given labels
Idea: want the projection to maximize overall interclass variance relative to intraclass variance
Linear classifiers (logistic regression, SVMs) have a similar feel: find a one-dimensional subspace w, e.g., to maximize the margin between different classes
FDA handles multiple classes, allows multiple dimensions

Fisher discriminant analysis (FDA) 44

slide-170
SLIDE 170

FDA objective function

Setup: xi ∈ Rd, yi ∈ {1, . . . , m}, for i = 1, . . . , n

Fisher discriminant analysis (FDA) 45

slide-171
SLIDE 171

FDA objective function

Setup: xi ∈ Rd, yi ∈ {1, . . . , m}, for i = 1, . . . , n
Objective: maximize (interclass variance) / (intraclass variance) = (total variance) / (intraclass variance) − 1

Fisher discriminant analysis (FDA) 45

slide-172
SLIDE 172

FDA objective function

Setup: xi ∈ Rd, yi ∈ {1, . . . , m}, for i = 1, . . . , n
Objective: maximize (interclass variance) / (intraclass variance) = (total variance) / (intraclass variance) − 1
Total variance: (1/n) Σi (u⊤(xi − µ))²
Mean of all points: µ = (1/n) Σi xi

Fisher discriminant analysis (FDA) 45

slide-173
SLIDE 173

FDA objective function

Setup: xi ∈ Rd, yi ∈ {1, . . . , m}, for i = 1, . . . , n
Objective: maximize (interclass variance) / (intraclass variance) = (total variance) / (intraclass variance) − 1
Total variance: (1/n) Σi (u⊤(xi − µ))²
Mean of all points: µ = (1/n) Σi xi
Intraclass variance: (1/n) Σi (u⊤(xi − µ_yi))²
Mean of points in class y: µy = (1/|{i : yi = y}|) Σ_{i : yi = y} xi

Fisher discriminant analysis (FDA) 45

slide-174
SLIDE 174

FDA objective function

Setup: xi ∈ Rd, yi ∈ {1, . . . , m}, for i = 1, . . . , n
Objective: maximize (interclass variance) / (intraclass variance) = (total variance) / (intraclass variance) − 1
Total variance: (1/n) Σi (u⊤(xi − µ))²
Mean of all points: µ = (1/n) Σi xi
Intraclass variance: (1/n) Σi (u⊤(xi − µ_yi))²
Mean of points in class y: µy = (1/|{i : yi = y}|) Σ_{i : yi = y} xi

Reduces to a generalized eigenvalue problem.

Fisher discriminant analysis (FDA) 45

slide-175
SLIDE 175

FDA objective function

Setup: xi ∈ Rd, yi ∈ {1, . . . , m}, for i = 1, . . . , n
Objective: maximize (interclass variance) / (intraclass variance) = (total variance) / (intraclass variance) − 1
Total variance: (1/n) Σi (u⊤(xi − µ))²
Mean of all points: µ = (1/n) Σi xi
Intraclass variance: (1/n) Σi (u⊤(xi − µ_yi))²
Mean of points in class y: µy = (1/|{i : yi = y}|) Σ_{i : yi = y} xi

Reduces to a generalized eigenvalue problem. Kernel FDA: use modular method

Fisher discriminant analysis (FDA) 45
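
As the slide notes, the objective reduces to a generalized eigenvalue problem; here is a hedged SciPy sketch that maximizes (total variance)/(intraclass variance) directly. The small ridge term is my addition to keep the within-class matrix invertible; names are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def fda_directions(X, y, k=1, ridge=1e-6):
    """Top-k FDA directions for data X (d x n) with integer labels y (length n)."""
    d, n = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)
    S_total = Xc @ Xc.T / n                                  # total variance matrix
    S_within = np.zeros((d, d))
    for c in np.unique(y):
        Xcls = X[:, y == c]
        Xcls = Xcls - Xcls.mean(axis=1, keepdims=True)
        S_within += Xcls @ Xcls.T / n                        # intraclass variance matrix
    vals, vecs = eigh(S_total, S_within + ridge * np.eye(d)) # generalized eigenproblem
    return vecs[:, ::-1][:, :k]                              # directions with largest ratio

rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 50)
X = rng.normal(size=(10, 150)) + 2.0 * y                     # class-dependent shift
u = fda_directions(X, y, k=2)
```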

slide-176
SLIDE 176

Other linear methods

Random projections: Randomly project data onto k = O(log n) dimensions

Fisher discriminant analysis (FDA) 47

slide-177
SLIDE 177

Other linear methods

Random projections:
Randomly project data onto k = O(log n) dimensions
All pairwise distances preserved with high probability: ‖U⊤xi − U⊤xj‖² ≅ ‖xi − xj‖² for all i, j

Fisher discriminant analysis (FDA) 47

slide-178
SLIDE 178

Other linear methods

Random projections:
Randomly project data onto k = O(log n) dimensions
All pairwise distances preserved with high probability: ‖U⊤xi − U⊤xj‖² ≅ ‖xi − xj‖² for all i, j
Trivial to implement

Fisher discriminant analysis (FDA) 47
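
A minimal sketch of a Gaussian random projection with k = O(log n); the constant 8 and the synthetic data are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 1000, 500
X = rng.normal(size=(d, n))

k = int(np.ceil(8 * np.log(n)))                  # target dimension k = O(log n)
U = rng.normal(size=(d, k)) / np.sqrt(k)         # random projection matrix
Z = U.T @ X                                      # k x n projected data

# Distances are approximately preserved (Johnson-Lindenstrauss)
i, j = 0, 1
print(np.linalg.norm(X[:, i] - X[:, j]), np.linalg.norm(Z[:, i] - Z[:, j]))
```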

slide-179
SLIDE 179

Other linear methods

Random projections:
Randomly project data onto k = O(log n) dimensions
All pairwise distances preserved with high probability: ‖U⊤xi − U⊤xj‖² ≅ ‖xi − xj‖² for all i, j
Trivial to implement
Kernel dimensionality reduction:
One type of sufficient dimensionality reduction
Find subspace that contains all information about labels

Fisher discriminant analysis (FDA) 47

slide-180
SLIDE 180

Other linear methods

Random projections:
Randomly project data onto k = O(log n) dimensions
All pairwise distances preserved with high probability: ‖U⊤xi − U⊤xj‖² ≅ ‖xi − xj‖² for all i, j
Trivial to implement
Kernel dimensionality reduction:
One type of sufficient dimensionality reduction
Find subspace that contains all information about labels: y ⊥⊥ x | U⊤x

Fisher discriminant analysis (FDA) 47

slide-181
SLIDE 181

Other linear methods

Random projections:
Randomly project data onto k = O(log n) dimensions
All pairwise distances preserved with high probability: ‖U⊤xi − U⊤xj‖² ≅ ‖xi − xj‖² for all i, j
Trivial to implement
Kernel dimensionality reduction:
One type of sufficient dimensionality reduction
Find subspace that contains all information about labels: y ⊥⊥ x | U⊤x
Capturing information is stronger than capturing variance

Fisher discriminant analysis (FDA) 47

slide-182
SLIDE 182

Other linear methods

Random projections:
Randomly project data onto k = O(log n) dimensions
All pairwise distances preserved with high probability: ‖U⊤xi − U⊤xj‖² ≅ ‖xi − xj‖² for all i, j
Trivial to implement
Kernel dimensionality reduction:
One type of sufficient dimensionality reduction
Find subspace that contains all information about labels: y ⊥⊥ x | U⊤x
Capturing information is stronger than capturing variance
Hard nonconvex optimization problem

Fisher discriminant analysis (FDA) 47

slide-183
SLIDE 183

Summary

Framework: z = U⊤x, x ≅ Uz

Fisher discriminant analysis (FDA) 48

slide-184
SLIDE 184

Summary

Framework: z = U⊤x, x ≅ Uz
Criteria for choosing U:

  • PCA: maximize projected variance
  • CCA: maximize projected correlation
  • FDA: maximize projected (interclass variance) / (intraclass variance)

Fisher discriminant analysis (FDA) 48

slide-185
SLIDE 185

Summary

Framework: z = U⊤x, x ≅ Uz
Criteria for choosing U:

  • PCA: maximize projected variance
  • CCA: maximize projected correlation
  • FDA: maximize projected (interclass variance) / (intraclass variance)

Algorithm: generalized eigenvalue problem

Fisher discriminant analysis (FDA) 48

slide-186
SLIDE 186

Summary

Framework: z = U⊤x, x ≅ Uz
Criteria for choosing U:

  • PCA: maximize projected variance
  • CCA: maximize projected correlation
  • FDA: maximize projected (interclass variance) / (intraclass variance)

Algorithm: generalized eigenvalue problem
Extensions: non-linear using kernels (using the same linear framework); probabilistic, sparse, robust (hard optimization)

Fisher discriminant analysis (FDA) 48