SLIDE 1

Dimensionality Reduction & Embedding

Prof. Mike Hughes

Tufts COMP 135: Introduction to Machine Learning
https://www.cs.tufts.edu/comp/135/2019s/

Many ideas/slides attributable to: Liping Liu (Tufts), Emily Fox (UW), Matt Gormley (CMU)
SLIDE 2

What will we learn?

Data examples $\{x_n\}_{n=1}^N$ are the starting point for three paradigms: Supervised Learning, Unsupervised Learning, and Reinforcement Learning. Each paradigm is summarized by its task, the function $f(x)$ it learns, and a performance measure.

SLIDE 3

Task: Embedding

Of the three paradigms (Supervised, Unsupervised, Reinforcement Learning), embedding is an unsupervised learning task.

[Figure: 2D scatter of examples with axes x1 and x2, illustrating an embedding.]

SLIDE 4
Unit Objectives: Dim. Reduction / Embedding

  • Goals of dimensionality reduction
    • Reduce feature vector size (keep signal, discard noise)
    • “Interpret” features: visualize/explore/understand
  • Common approaches
    • Principal Component Analysis (PCA)
    • t-SNE (“tee-snee”)
    • word2vec and other neural embeddings
  • Evaluation metrics
    • Storage size
    • Reconstruction error
    • “Interpretability”
    • Prediction error

SLIDE 5

Example: 2D viz. of movies

SLIDE 6

Example: Genes vs. geography

SLIDE 7

Example: Eigen Clothing

SLIDE 8

SLIDE 9

Principal Component Analysis

SLIDE 10

Linear Projection to 1D

SLIDE 11

Reconstruction from 1D to 2D
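A minimal restatement of what these two slides depict, assuming a unit-length direction vector v and the data mean m (the same m and V notation used on the PCA input/output slides below):

```latex
% Linear projection of each example x_n down to a scalar code z_n,
% and reconstruction of a full-dimensional vector from that code:
z_n = v^\top (x_n - m), \qquad \hat{x}_n = m + z_n \, v
```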

SLIDE 12

2D Orthogonal Basis

SLIDE 13

Which 1D projection is best?

SLIDE 14

PCA Principles

  • Minimize reconstruction error
    • Should be able to recreate x from z
  • Equivalent to maximizing variance
    • Want z to retain maximum information (see the sketch below)
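Stated a bit more explicitly (a minimal sketch using the z_n and reconstruction defined earlier; not verbatim from the slides):

```latex
\text{reconstruction view:}\quad \min_{\|v\|_2 = 1} \; \sum_{n=1}^{N} \| x_n - \hat{x}_n \|_2^2
\qquad\Longleftrightarrow\qquad
\text{variance view:}\quad \max_{\|v\|_2 = 1} \; \frac{1}{N} \sum_{n=1}^{N} \big( v^\top (x_n - m) \big)^2
```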

SLIDE 15

Best Direction related to Eigenvalues of Data Covariance
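A small numpy sketch of that connection (my code, not from the slides): the best 1D direction is the eigenvector of the data covariance matrix with the largest eigenvalue.

```python
import numpy as np

def top_principal_direction(X):
    """Return (m, v): the data mean and the unit direction of maximum variance.

    X : array of shape (N, F), one example per row.
    """
    m = X.mean(axis=0)
    S = np.cov(X - m, rowvar=False)        # F x F covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
    return m, eigvecs[:, -1]               # eigenvector of the largest eigenvalue
```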

SLIDE 16

Principal Component Analysis: training step, .fit() (see the sketch below)

  • Input:
    • X : training data, N x F
      • N high-dim. example vectors
    • K : int, number of dimensions to discover
      • Satisfies 1 <= K <= F
  • Output:
    • m : mean vector, size F
    • V : learned eigenvector basis, K x F
      • One F-dimensional vector for each component
      • Each of the K vectors is orthogonal to every other
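A minimal numpy sketch of this training step (in scikit-learn the equivalent call would be something like sklearn.decomposition.PCA(n_components=K).fit(X)):

```python
import numpy as np

def pca_fit(X, K):
    """Fit PCA. Returns m (mean vector, size F) and V (eigenvector basis, K x F).

    X : (N, F) training data;  K : number of components, with 1 <= K <= F.
    """
    m = X.mean(axis=0)
    S = np.cov(X - m, rowvar=False)              # F x F covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)         # ascending eigenvalues
    topK = np.argsort(eigvals)[::-1][:K]         # indices of the K largest
    V = eigvecs[:, topK].T                       # K x F, rows are orthonormal
    return m, V
```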

SLIDE 17

Principal Component Analysis: transformation step, .transform() (see the sketch below)

  • Input:
    • X : training data, N x F
      • N high-dim. example vectors
    • Trained PCA “model”
      • m : mean vector, size F
      • V : learned eigenvector basis, K x F
        • One F-dimensional vector for each component
        • Each of the K vectors is orthogonal to every other
  • Output:
    • Z : projected data, N x K
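And a matching sketch of the transformation step (plus the reconstruction used when measuring reconstruction error), assuming the m and V returned by pca_fit above:

```python
def pca_transform(X, m, V):
    """Project data onto the learned basis: Z = (X - m) V^T, shape (N, K)."""
    return (X - m) @ V.T

def pca_reconstruct(Z, m, V):
    """Map K-dim codes back to F-dim vectors: X_hat = Z V + m."""
    return Z @ V + m
```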

SLIDE 18

PCA Demo

  • http://setosa.io/ev/principal-component-analysis/

SLIDE 19

Example: EigenFaces

SLIDE 20

PCA: How to Select K?

  • 1) Use downstream supervised task metric
    • e.g., regression error
  • 2) Use memory constraints of task
    • Can’t store more than 50 dims for 1M examples? Take K=50.
  • 3) Plot cumulative “variance explained” (see the sketch below)
    • Take the K that captures ~90% of all variance
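A small sketch of option 3 using scikit-learn (explained_variance_ratio_ is the fitted attribute holding per-component variance fractions):

```python
import numpy as np
from sklearn.decomposition import PCA

def choose_K(X, target=0.90):
    """Smallest K whose components explain at least `target` fraction of variance."""
    pca = PCA().fit(X)                               # fit with all components
    cum_var = np.cumsum(pca.explained_variance_ratio_)
    return int(np.searchsorted(cum_var, target)) + 1
```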

SLIDE 21

PCA Summary

PRO

  • Usually fast to train, fast to test
    • Slow only if finding K eigenvectors of an F x F matrix is slow
  • Nested model
    • PCA with K=5 contains the parameters of PCA with K=4 as a subset

CON

  • Learned basis known only up to a +/- sign flip
  • Not often best for supervised tasks

SLIDE 22

Visualization with t-SNE

SLIDE 23

[Figure] Credit: Luuk Derksen (https://medium.com/@luckylwk/visualising-high-dimensional-datasets-using-pca-and-t-sne-in-python-8ef87e7915b)

SLIDE 24

[Figure] Credit: Luuk Derksen (https://medium.com/@luckylwk/visualising-high-dimensional-datasets-using-pca-and-t-sne-in-python-8ef87e7915b)

SLIDE 25

SLIDE 26

Practical Tips for t-SNE

https://distill.pub/2016/misread-tsne/

  • If dim is very high, preprocess with PCA to ~30 dims, then apply t-SNE (see the sketch below)
  • Beware: Non-convex cost function
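A minimal sketch of that recipe with scikit-learn (the specific parameter values here are illustrative choices, not from the slides):

```python
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def embed_2d(X, pca_dims=30, perplexity=30.0, random_state=0):
    """PCA down to ~30 dims first, then t-SNE down to 2 dims for plotting."""
    X_reduced = PCA(n_components=pca_dims).fit_transform(X)
    # t-SNE's cost is non-convex: different random seeds can give different maps
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=random_state)
    return tsne.fit_transform(X_reduced)
```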
SLIDE 27


Matrix Factorization as Learned “Embedding”

SLIDE 28

Matrix Factorization (MF)

  • User $u$ represented by vector $w_u \in \mathbb{R}^K$
  • Item $i$ represented by vector $h_i \in \mathbb{R}^K$
  • Inner product $w_u^\top h_i$ approximates the utility $y_{ui}$ (see the sketch below)
  • Intuition:
    • Two items with similar vectors get similar utility scores from the same user
    • Two users with similar vectors give similar utility scores to the same item
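A tiny numpy sketch of the prediction rule and the squared-error objective MF is typically trained to minimize (variable names are mine, not from the slides):

```python
import numpy as np

def predict_utility(W, H, u, i):
    """Predicted utility of item i for user u: inner product of their K-dim vectors.

    W : (num_users, K) user vectors;  H : (num_items, K) item vectors.
    """
    return W[u] @ H[i]

def mf_loss(W, H, observed):
    """Sum of squared errors over observed (user, item, utility) triples."""
    return sum((y - predict_utility(W, H, u, i)) ** 2 for u, i, y in observed)
```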

SLIDE 29

SLIDE 30


Word Embeddings

SLIDE 31

Word Embeddings (word2vec)


Goal: map each word in vocabulary to an embedding vector

  • Preserve semantic meaning in this new vector space

vec(swimming) – vec(swim) + vec(walk) ≈ vec(walking)
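A minimal numpy sketch of how such an analogy is checked in practice, assuming a hypothetical dict `vecs` that maps each vocabulary word to its embedding vector:

```python
import numpy as np

def analogy(vecs, a, b, c):
    """Return the vocabulary word closest (by cosine) to vec(a) - vec(b) + vec(c)."""
    target = vecs[a] - vecs[b] + vecs[c]
    target = target / np.linalg.norm(target)
    best_word, best_sim = None, -np.inf
    for word, v in vecs.items():
        if word in (a, b, c):
            continue                        # skip the query words themselves
        sim = (v / np.linalg.norm(v)) @ target
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

# Expected behavior: analogy(vecs, "swimming", "swim", "walk") -> "walking"
```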

SLIDE 32

SLIDE 33

How to embed?

Training

Reward embeddings that predict nearby words in the sentence.

Goal: learn the weights of an embedding matrix W, with one row per word in a fixed vocabulary (typically 1000-100k words) and one column per embedding dimension (typically 100-1000). (A sketch follows below.)

[Figure: example rows of W for words such as “tacos”, “staff”, “dinosaur”, “hammer”, with entries like 3.2, 4.1, 7.1.]

Credit: https://www.tensorflow.org/tutorials/representation/word2vec
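A minimal sketch of where that training signal comes from, loosely following the skip-gram setup in the linked tutorial (simplified; function and variable names are mine):

```python
def skipgram_pairs(token_ids, window=2):
    """Yield (center, context) word-id pairs from one sentence of token ids."""
    for pos, center in enumerate(token_ids):
        for offset in range(-window, window + 1):
            ctx = pos + offset
            if offset != 0 and 0 <= ctx < len(token_ids):
                yield center, token_ids[ctx]

def score(W_in, W_out, center, context):
    """Dot-product score: high when the center word's embedding predicts the context word.

    W_in, W_out : (vocab_size, embed_dim) input and output embedding matrices.
    """
    return W_in[center] @ W_out[context]
```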