Machine Learning 2 (DS 4420, Spring 2020): Dimensionality Reduction I


SLIDE 1

Machine Learning 2

DS 4420 - Spring 2020

Dimensionality reduction I

Byron C Wallace

SLIDE 2

Machine Learning 2

DS 4420 - Spring 2020

Some slides today borrowed from Percy Liang (Stanford). Other material is from the MML book (Deisenroth, Faisal, and Ong).

SLIDE 3

Motivation

  • We often want to work with high-dimensional data (e.g., images), and we often have lots of it.

  • This is computationally expensive to store and work with.
SLIDE 4

Fundamental idea: Exploit redundancy in the data; find a lower-dimensional representation.

Dimensionality Reduction

[Figure: 2D scatter plots (x1 vs. x2) of the data before and after dimensionality reduction]

SLIDE 5

Example (from lecture 5): Dimensionality reduction via k-means

SLIDE 6

Example (from lecture 5): Dimensionality reduction via k-means

This highlights the natural connection between dimensionality reduction and compression.
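To make the compression view concrete, here is a minimal sketch (not from the lecture) using scikit-learn's KMeans on illustrative random data: each point is stored only as the index of its nearest centroid, and "decompression" replaces that index with the centroid itself.

```python
# Sketch: dimensionality reduction / compression via k-means cluster assignments.
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(1000, 64))   # 1000 points in 64 dimensions
kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(X)

codes = kmeans.labels_                     # compressed: one integer in {0, ..., 15} per point
X_hat = kmeans.cluster_centers_[codes]     # "decompressed": each point becomes its centroid

print(np.mean(np.sum((X - X_hat) ** 2, axis=1)))       # distortion introduced by compression
```

Storing one integer per point instead of 64 floats is exactly the compression/dimensionality-reduction trade-off: the fewer centroids we keep, the smaller the code and the larger the distortion.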

SLIDE 7

Dimensionality reduction

[Figure: original data (4 dims) vs. projection with PCA (2 dims)]

Goal: Map high-dimensional data onto a lower-dimensional space in a manner that preserves distances/similarities.

Objective: the projection should “preserve” relative distances.

SLIDE 8

Linear dimensionality reduction

x ∈ R^361,   z = U^T x,   z ∈ R^10

Idea: Project a high-dimensional vector x onto a lower-dimensional space: z = U^T x.
SLIDE 9

Linear dimensionality reduction

Original x ∈ R^D   →   Compressed z ∈ R^M   →   Reconstructed x̃ ∈ R^D
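A minimal numpy sketch of this compress/reconstruct pipeline (not from the slides; data and names are illustrative), with the columns of U taken to be the top-M eigenvectors of the sample covariance:

```python
# Sketch: compress x in R^D down to z in R^M, then reconstruct x_tilde in R^D.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))            # 500 samples, D = 20
Xc = X - X.mean(axis=0)                   # center the data

M = 3
S = Xc.T @ Xc / Xc.shape[0]               # sample covariance (D x D)
eigvals, eigvecs = np.linalg.eigh(S)      # ascending eigenvalues
U = eigvecs[:, -M:]                       # D x M orthonormal basis (top-M directions)

Z = Xc @ U                                # compressed:    z = U^T x  (one row per sample)
X_tilde = Z @ U.T                         # reconstructed: x_tilde = U z

print(np.mean(np.sum((Xc - X_tilde) ** 2, axis=1)))  # average reconstruction error
```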

SLIDE 10

Objective

Key intuition:

    variance of data  =  captured variance  +  reconstruction error
        (fixed)             (want large)          (want small)
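This identity can be checked numerically. Below is a small sketch (not from the slides), assuming centered data and a projection onto the top two principal directions; all data and names are illustrative.

```python
# Sketch: total variance = captured variance + reconstruction error (per-sample averages).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 10)) @ rng.normal(size=(10, 10))  # correlated illustrative data
Xc = X - X.mean(axis=0)                                      # center

S = Xc.T @ Xc / Xc.shape[0]                                  # sample covariance
eigvals, eigvecs = np.linalg.eigh(S)
U = eigvecs[:, -2:]                                          # keep the top 2 directions

total_var = np.trace(S)                                           # fixed
captured  = np.mean(np.sum((Xc @ U) ** 2, axis=1))                # want large
recon_err = np.mean(np.sum((Xc - Xc @ U @ U.T) ** 2, axis=1))     # want small

print(np.isclose(total_var, captured + recon_err))                # True (up to float error)
```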

SLIDE 11

Principal Component Analysis (on board)

SLIDE 12

In Sum: Principal Component Analysis

Data:  X = (x1 · · · xn) ∈ R^{d×n}

Eigendecomposition of the covariance gives eigenvalues Λ = diag(λ1, λ2, ..., λd) and an orthonormal basis of eigenvectors U = (u1 · · · uk) ∈ R^{d×k}.

Idea: Take the top-k eigenvectors to maximize captured variance.

SLIDE 13

Getting the eigenvalues, two ways

  • Direct eigenvalue decomposition of the covariance matrix

        S = (1/N) Σ_{n=1}^{N} x_n x_n^T = (1/N) X X^T
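A minimal numpy sketch of this first route (illustrative data; the slides' convention of a d × N matrix X with samples as centered columns is assumed):

```python
# Sketch: PCA via eigendecomposition of S = (1/N) X X^T, with X a d x N data matrix.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 200))                 # d = 50 features, N = 200 samples
X = X - X.mean(axis=1, keepdims=True)          # center each feature

S = X @ X.T / X.shape[1]                       # d x d covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)           # eigh: for symmetric matrices, ascending order
order = np.argsort(eigvals)[::-1]              # re-sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 10
U_k = eigvecs[:, :k]                           # top-k principal directions (d x k)
Z = U_k.T @ X                                  # k x N compressed representation
```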
SLIDE 14

Getting the eigenvalues, two ways

  • Direct eigenvalue decomposition of the covariance matrix

        S = (1/N) Σ_{n=1}^{N} x_n x_n^T = (1/N) X X^T

  • Singular Value Decomposition (SVD)
SLIDE 15

Singular Value Decomposition

Idea: Decompose the d × n matrix X into:

  • 1. An n × n basis V (unitary matrix)

  • 2. A d × n matrix Σ (diagonal projection)

  • 3. A d × d basis U (unitary matrix)

        X = U_{d×d} Σ_{d×n} V^T_{n×n}
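A quick numpy sketch of these three factors and their shapes (illustrative data); full_matrices=True asks for the square d × d and n × n unitary bases described above:

```python
# Sketch: full SVD of a d x n matrix X into U (d x d), Sigma (d x n), V^T (n x n).
import numpy as np

d, n = 50, 200
X = np.random.default_rng(3).normal(size=(d, n))

U, s, Vt = np.linalg.svd(X, full_matrices=True)   # s holds the singular values
Sigma = np.zeros((d, n))
Sigma[:len(s), :len(s)] = np.diag(s)              # embed them in a d x n "diagonal" matrix

print(U.shape, Sigma.shape, Vt.shape)             # (50, 50) (50, 200) (200, 200)
print(np.allclose(X, U @ Sigma @ Vt))             # True: the factorization reproduces X
```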

SLIDE 16

SVD for PCA

        X_{D×N} = U_{D×D} Σ_{D×N} V^T_{N×N}

        S = (1/N) X X^T = (1/N) U Σ V^T V Σ^T U^T = (1/N) U Σ Σ^T U^T     (since V^T V = I_N)

SLIDE 17

SVD for PCA

        X_{D×N} = U_{D×D} Σ_{D×N} V^T_{N×N}

        S = (1/N) X X^T = (1/N) U Σ V^T V Σ^T U^T = (1/N) U Σ Σ^T U^T     (since V^T V = I_N)

It turns out the columns of U are the eigenvectors of X X^T.
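A small numerical check of this claim (a sketch on illustrative centered data, not from the slides): the columns of U from the SVD satisfy the eigenvector equation of S = (1/N) X X^T, with eigenvalues σ_i²/N.

```python
# Sketch: columns of U (left singular vectors of X) are eigenvectors of S = (1/N) X X^T,
# with eigenvalues lambda_i = sigma_i^2 / N.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 100))
X = X - X.mean(axis=1, keepdims=True)
N = X.shape[1]

U, s, Vt = np.linalg.svd(X, full_matrices=False)
S = X @ X.T / N

print(np.allclose(S @ U, U * (s ** 2 / N)))        # S u_i = (sigma_i^2 / N) u_i, column-wise
print(np.allclose(np.sort(np.linalg.eigvalsh(S)), np.sort(s ** 2 / N)))  # same spectrum
```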

SLIDE 18

Principal Component Analysis

Example 10.3 (MNIST Digits Embedding)


SLIDE 19

Principal Component Analysis

Data: three varieties of wheat: Kama, Rosa, Canadian
Attributes: Area, Perimeter, Compactness, Length of Kernel, Width of Kernel, Asymmetry Coefficient, Length of Groove

[Figure: projections onto the top 2 components vs. the bottom 2 components]
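A hypothetical sketch of this experiment with scikit-learn. The actual seeds measurements are assumed to already be loaded as an (n_samples × 7) array; a random placeholder stands in for them here.

```python
# Sketch: project the 7-attribute wheat data onto its top-2 vs. bottom-2 components.
import numpy as np
from sklearn.decomposition import PCA

wheat = np.random.default_rng(5).normal(size=(210, 7))  # placeholder for the real measurements

pca = PCA(n_components=7).fit(wheat)   # all 7 components, ordered by explained variance
Z = pca.transform(wheat)

top2 = Z[:, :2]      # directions of largest variance (where the varieties tend to separate)
bottom2 = Z[:, -2:]  # directions of smallest variance (little structure remains)
```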

SLIDE 20
Eigen-faces [Turk & Pentland 1991]

  • d = number of pixels
  • Each xi ∈ R^d is a face image
  • xji = intensity of the j-th pixel in image i

SLIDE 21

Eigen-faces [Turk & Pentland 1991]

  • d = number of pixels
  • Each xi ∈ R^d is a face image
  • xji = intensity of the j-th pixel in image i

        X_{d×n} ≈ U_{d×k} Z_{k×n},   i.e.,  (x1 · · · xn) ≈ U (z1 · · · zn)

SLIDE 22

Eigen-faces [Turk & Pentland 1991]

  • d = number of pixels
  • Each xi ∈ R^d is a face image
  • xji = intensity of the j-th pixel in image i

        X_{d×n} ≈ U_{d×k} Z_{k×n},   i.e.,  (x1 · · · xn) ≈ U (z1 · · · zn)

Idea: zi is a more “meaningful” representation of the i-th face than xi. Can use zi for nearest-neighbor classification.

SLIDE 23
Eigen-faces [Turk & Pentland 1991]

  • d = number of pixels
  • Each xi ∈ R^d is a face image
  • xji = intensity of the j-th pixel in image i

        X_{d×n} ≈ U_{d×k} Z_{k×n},   i.e.,  (x1 · · · xn) ≈ U (z1 · · · zn)

Idea: zi is a more “meaningful” representation of the i-th face than xi. Can use zi for nearest-neighbor classification.

Much faster: O(dk + nk) time instead of O(dn) when n, d ≫ k.
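A minimal sketch of this pipeline, with random stand-ins for the face images (real data would be substituted): compute the top-k eigenfaces, encode the gallery, and match a query by nearest neighbor in the k-dimensional code space.

```python
# Sketch: eigenfaces + nearest-neighbor matching in the k-dimensional code space.
import numpy as np

rng = np.random.default_rng(6)
d, n, k = 32 * 32, 400, 40
X = rng.normal(size=(d, n))                    # columns are flattened "face images"

mean_face = X.mean(axis=1, keepdims=True)
Xc = X - mean_face

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
U_k = U[:, :k]                                 # d x k: the top-k eigenfaces
Z = U_k.T @ Xc                                 # k x n: codes for the gallery

def nearest_face(x_new):
    """Index of the gallery face closest to x_new in eigenface space."""
    z_new = U_k.T @ (x_new - mean_face[:, 0])
    return int(np.argmin(np.sum((Z - z_new[:, None]) ** 2, axis=0)))

print(nearest_face(X[:, 123]))                 # a gallery image matches itself: prints 123
```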

SLIDE 24

Aside: How many components?

  • The magnitude of the eigenvalues indicates the fraction of variance captured.
  • Eigenvalues on a face image dataset:

[Figure: eigenvalue λi plotted against component index i]

  • Eigenvalues typically drop off sharply, so we don’t need that many components.
  • Of course, variance isn’t everything...
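One common heuristic (a sketch, not from the slides): keep the smallest k whose cumulative fraction of variance exceeds some threshold; the 95% below is an arbitrary illustrative choice.

```python
# Sketch: fraction of variance captured by the top-k components, from the eigenvalues.
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 50)) @ rng.normal(size=(50, 50))     # correlated illustrative data
Xc = X - X.mean(axis=0)

eigvals = np.linalg.eigvalsh(Xc.T @ Xc / Xc.shape[0])[::-1]    # eigenvalues, descending
frac = np.cumsum(eigvals) / np.sum(eigvals)                    # cumulative variance fraction

k = int(np.searchsorted(frac, 0.95)) + 1                       # smallest k capturing >= 95%
print(k, np.round(frac[:10], 3))
```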
SLIDE 25

Latent Semantic Analysis [Deerwester 1990]

  • d = number of words in the vocabulary
  • Each xi ∈ Rd is a vector of word counts


SLIDE 26

Latent Semantic Analysis [Deerwester 1990]

  • d = number of words in the vocabulary
  • Each xi ∈ R^d is a vector of word counts
  • xji = frequency of word j in document i

        X_{d×n} ≈ U_{d×k} Z_{k×n}

X holds the per-document word counts (e.g., stocks: 2 · · · 0, chairman: 4 · · · 1, the: 8 · · · 7, ..., wins: 0 · · · 2, game: 1 · · · 3); U holds the learned word loadings, and Z = (z1 · · · zn) holds the document embeddings.

SLIDE 27

Latent Semantic Analysis [Deerwester 1990]

  • d = number of words in the vocabulary
  • Each xi ∈ R^d is a vector of word counts
  • xji = frequency of word j in document i

        X_{d×n} ≈ U_{d×k} Z_{k×n}

X holds the per-document word counts (e.g., stocks: 2 · · · 0, chairman: 4 · · · 1, the: 8 · · · 7, ..., wins: 0 · · · 2, game: 1 · · · 3); U holds the learned word loadings, and Z = (z1 · · · zn) holds the document embeddings.

How to measure similarity between two documents? z1^T z2 is probably better than x1^T x2.
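A minimal sketch of this comparison on an illustrative random term-document count matrix (real documents would be substituted): embed each document with a truncated SVD and compare dot products in the two spaces.

```python
# Sketch: document similarity in raw count space vs. in the k-dimensional LSA space.
import numpy as np

rng = np.random.default_rng(8)
d, n, k = 5000, 300, 50
X = rng.poisson(0.1, size=(d, n)).astype(float)   # stand-in term-document count matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = np.diag(s[:k]) @ Vt[:k, :]                    # k x n document embeddings (z_i = U_k^T x_i)

i, j = 0, 1
print(X[:, i] @ X[:, j])    # x_i^T x_j: raw-count similarity
print(Z[:, i] @ Z[:, j])    # z_i^T z_j: similarity in the latent space
```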

SLIDE 28

Probabilistic PCA

  • If we define a prior over z, then we can sample from the latent space and “hallucinate” images.
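A sketch of this generative view (not from the slides): draw z from a standard normal prior and push it through the probabilistic PCA model x | z ~ N(Wz + μ, σ²I). W, μ, and σ here are placeholders; in practice they would be fit to data, e.g., by maximum likelihood.

```python
# Sketch: "hallucinate" an image by sampling from the probabilistic PCA generative model.
import numpy as np

rng = np.random.default_rng(9)
d, k = 784, 20
W = rng.normal(size=(d, k))        # placeholder factor loadings (would be learned)
mu = np.zeros(d)                   # placeholder mean image (would be learned)
sigma = 0.1                        # placeholder observation-noise scale

z = rng.normal(size=k)                           # z ~ N(0, I_k): sample from the prior
x = W @ z + mu + sigma * rng.normal(size=d)      # x | z ~ N(W z + mu, sigma^2 I)

image = x.reshape(28, 28)          # e.g., reinterpret the sample as a 28 x 28 image
```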

SLIDE 29

Limitations of Linearity

[Figure: a dataset where PCA is effective vs. one where it is ineffective]

SLIDE 30

Nonlinear PCA

[Figure: broken solution vs. desired solution]

We want the desired solution: S = {(x1, x2) : x2 = (u2/u1) x1^2}

SLIDE 31

Nonlinear PCA

[Figure: broken solution vs. desired solution]

We want the desired solution: S = {(x1, x2) : x2 = (u2/u1) x1^2}

We can get this with S = {x : φ(x) = Uz}, where φ(x) = (x1^2, x2)^T

Linear dimensionality reduction in φ(x) space ⇔ Nonlinear dimensionality reduction in x space

Idea: Use kernels

SLIDE 32

Kernel PCA
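The deck gives no further detail on kernel PCA here (presumably developed on the board), so the following is only a sketch of the standard recipe under an assumed RBF kernel: build the Gram matrix, center it in feature space, and eigendecompose. scikit-learn's KernelPCA implements the same idea.

```python
# Sketch: kernel PCA with an RBF kernel on illustrative 2-D data.
import numpy as np

rng = np.random.default_rng(10)
X = rng.normal(size=(200, 2))                    # n x d data
gamma, k = 1.0, 2

# Gram matrix K_ij = exp(-gamma * ||x_i - x_j||^2)
sq = np.sum(X ** 2, axis=1)
K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

# Center the Gram matrix (equivalent to centering phi(x) in feature space).
n = K.shape[0]
one = np.full((n, n), 1.0 / n)
Kc = K - one @ K - K @ one + one @ K @ one

# Eigendecompose and project the training points onto the top-k kernel components.
eigvals, eigvecs = np.linalg.eigh(Kc)
order = np.argsort(eigvals)[::-1][:k]
alphas = eigvecs[:, order] / np.sqrt(np.maximum(eigvals[order], 1e-12))  # normalize
Z = Kc @ alphas                                  # n x k nonlinear embedding
```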

SLIDE 33

Wrapping up

  • PCA is a linear model for dimensionality reduction which finds a mapping to a lower-dimensional space that maximizes variance

  • We saw that this is equivalent to performing an eigendecomposition on the covariance matrix of X

  • Next time: Auto-encoders and neural compression for non-linear projections
