slide-1
SLIDE 1

Dimensionality Reduction Algorithms

(and how to interpret their output)

Dalya Baron (Tel Aviv University) XXX Winter School, November 2018

slide-2
SLIDE 2

What is Dimensionality Reduction?

28 x 28 features per object → Dimensionality Reduction algorithm → 2 features per object (feature 1, feature 2)

slide-3
SLIDE 3

Why do we need dimensionality reduction?

  • “Practical”:
    • Improve the performance of supervised learning algorithms: the original features can be correlated and redundant, and most algorithms cannot handle thousands of features.
    • Compressing data (e.g., SKA).
  • “Artistic”:
    • Data visualization and interpretation.
    • Uncover complex trends.
    • Look for “unknown unknowns”.
slide-4
SLIDE 4

Two types of dimensionality reduction

1. Decomposition of the objects into “prototypes”. Each object can be represented using the prototypes. We gain: prototypes that represent the population, and a low-dimensional embedding. For example: SVD, PCA, ICA, NNMF, SOM, and more…
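As a concrete illustration of this first type (an assumed example using scikit-learn's NMF, one of the algorithms listed above; the data and parameter values are placeholders), a decomposition returns both the prototypes and a low-dimensional set of coefficients per object:

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical non-negative data: 500 objects, 100 features each.
X = np.random.rand(500, 100)

nmf = NMF(n_components=3, init="nndsvda", random_state=0, max_iter=500)
W = nmf.fit_transform(X)   # (500, 3): low-dimensional embedding (coefficients per object)
H = nmf.components_        # (3, 100): the three non-negative "prototypes"

# Each object is approximately a non-negative mix of the prototypes: X ≈ W @ H.
```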

slide-5
SLIDE 5

Two types of dimensionality reduction

2. Embedding of a high-dimensional dataset into a lower-dimensional one. We gain: a low-dimensional embedding. For example: tSNE, autoencoders.

slide-6
SLIDE 6

Principal Component Analysis (PCA)

[Figure: data in the (feature 1, feature 2) plane, with principal component 1 and principal component 2 drawn as new orthogonal axes]

PCA is a transformation that converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables, called principal components.

  • The first principal component has the largest possible variance.
  • Each succeeding component has the highest possible variance, under the constraint that it is orthogonal to the preceding components.

slide-7
SLIDE 7

Principal Component Analysis (PCA)

[Figure: variance and cumulative percentage of the variance vs. index of the principal component]

PCA allows us to compress the data by representing each object as its projection onto the first few principal components.
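A minimal sketch of this workflow with scikit-learn (the synthetic data, component counts, and all parameter values below are assumptions, not from the slides):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 1000 objects with 784 (28 x 28) correlated features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 5))                      # 5 "true" degrees of freedom
X = latent @ rng.normal(size=(5, 784)) + 0.1 * rng.normal(size=(1000, 784))

pca = PCA(n_components=20)
X_proj = pca.fit_transform(X)                            # projections onto the first 20 PCs

# Variance and cumulative percentage of the variance per component (the plot above).
cum_var = np.cumsum(pca.explained_variance_ratio_)
print(cum_var[:5])                                       # a few components already capture most of the variance

# Compression: keep only the first 5 projections and reconstruct approximately.
pca5 = PCA(n_components=5).fit(X)
X_compressed = pca5.transform(X)                         # 784 features -> 5 features per object
X_reconstructed = pca5.inverse_transform(X_compressed)
```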

slide-8
SLIDE 8

Principal Component Analysis (PCA)

The principal components may represent the true building blocks of the objects in our dataset.

observed object ≈ A * (principal comp. 1) + B * (principal comp. 2) + C * (principal comp. 3)
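A self-contained sketch of this idea (synthetic data, not from the slides): build objects that really are mixtures of three building blocks, and check that the per-object PCA projections play the role of the weights A, B, C:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data built from three underlying "building blocks".
rng = np.random.default_rng(1)
blocks = rng.normal(size=(3, 200))                 # three true prototypes, 200 features each
coeffs = rng.normal(size=(500, 3))                 # mixing weights for 500 objects
X = coeffs @ blocks + 0.01 * rng.normal(size=(500, 200))

pca = PCA(n_components=3).fit(X)

# Weights A, B, C for one (arbitrary) object: its projections onto the components.
A, B, C = pca.transform(X)[0]

# observed object ≈ mean + A * (principal comp. 1) + B * (principal comp. 2) + C * (principal comp. 3)
approx = (pca.mean_
          + A * pca.components_[0]
          + B * pca.components_[1]
          + C * pca.components_[2])
print(np.linalg.norm(X[0] - approx))               # small: three components describe the object well
```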

slide-9
SLIDE 9

Principal Component Analysis (PCA)

The projection onto the principal components gives a low-dimensional representation of the objects in the sample.

[Figure: objects plotted in the plane spanned by A (the projection onto principal component 1) and B (the projection onto principal component 2)]

slide-10
SLIDE 10
PCA: Pros & Cons

  • Advantages:
    • Very simple and intuitive to use.
    • No free parameters!
    • Optimized to reduce variance.
  • Disadvantages:
    • Linear decomposition: we will not be able to describe absorption lines, dust extinction, distance, etc.
    • Can produce negative principal components, which is not always physical in astronomy.

slide-11
SLIDE 11

From: http://www.astroml.org/book_figures/chapter7/fig_spec_decompositions.html

slide-12
SLIDE 12

t-distributed stochastic neighbor embedding (tSNE)

Embedding high-dimensional data in a low-dimensional space (2 or 3 dimensions).

Input: (1) raw data, extracted features, or a distance matrix; (2) hyper-parameters: perplexity.

[Figure: schematic mapping from the high-dimensional space to the low-dimensional space]

slide-13
SLIDE 13

tSNE

Intuition: tSNE tries to find a low-dimensional embedding that preserves, as much as possible, the distribution of distances between different objects.

Perplexity: the size of the neighborhood that tSNE considers in the optimization.

[Figure: distance distributions in the high-dimensional and low-dimensional spaces; the perplexity sets the neighborhood over which distances are compared]
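A minimal scikit-learn sketch of this step (the dataset and all parameter values are assumptions; the slides only name the algorithm and its perplexity hyper-parameter):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Hand-written digits: 8 x 8 = 64 features per object (a small stand-in for the
# 28 x 28 example on the next slide).
X, y = load_digits(return_X_y=True)

# Perplexity sets the effective neighborhood size that tSNE tries to preserve.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_2d = tsne.fit_transform(X)        # (n_objects, 2): the low-dimensional embedding

# Objects of the same digit should land close together in the 2D plane.
print(X_2d.shape)
```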

slide-14
SLIDE 14

tSNE - example

[Figure: images with 28 x 28 features per object embedded into two features (feature 1, feature 2)]

slide-15
SLIDE 15

tSNE - example

https://distill.pub/2016/misread-tsne/

slide-16
SLIDE 16

tSNE: Pros & Cons

  • Advantages:
    • Can take a general distance matrix as input.
    • Non-linear embedding.
    • Preserves high-dimensional clustering well (depending on the chosen perplexity).
  • Disadvantages:
    • No prototypes.
    • Sensitive to distance scales < perplexity.
    • Large distances are meaningless.


slide-17
SLIDE 17

UMAP

See: https://arxiv.org/abs/1802.03426 and https://github.com/lmcinnes/umap
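The slides only point to the paper and the repository; as a hedged illustration, the umap-learn package linked above is typically used roughly like this (the dataset and parameter values are assumptions):

```python
import umap                        # the umap-learn package from the repository above
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)

# n_neighbors plays a role loosely analogous to tSNE's perplexity;
# min_dist controls how tightly points are packed in the embedding.
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2, random_state=0)
X_2d = reducer.fit_transform(X)    # (n_objects, 2) embedding
```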

slide-18
SLIDE 18

Autoencoders

An autoencoder compresses each object through a low-dimensional bottleneck (the embedding) and then tries to reconstruct the original input from it.

loss function = (input - reconstruction)²
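A minimal sketch (the architecture, layer sizes, and random stand-in data below are assumptions; the slides only define the reconstruction loss): a dense autoencoder in Keras that reduces 784 features to a 2-dimensional embedding and is trained with exactly the (input - reconstruction)² loss above.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical data: flattened 28 x 28 images, scaled to [0, 1].
x_train = np.random.rand(1000, 784).astype("float32")

# Encoder: 784 features -> 2-dimensional embedding (the bottleneck).
inputs = keras.Input(shape=(784,))
encoded = layers.Dense(128, activation="relu")(inputs)
encoded = layers.Dense(2, name="embedding")(encoded)

# Decoder: 2 -> 784, reconstructing the input.
decoded = layers.Dense(128, activation="relu")(encoded)
decoded = layers.Dense(784, activation="sigmoid")(decoded)

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")   # mean squared reconstruction error
autoencoder.fit(x_train, x_train, epochs=10, batch_size=64, verbose=0)

# The embedding is the output of the bottleneck layer.
encoder = keras.Model(inputs, encoded)
x_2d = encoder.predict(x_train)                     # (n_objects, 2)
```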
slide-19
SLIDE 19

Autoencoders - Pros & Cons

  • Advantages:
    • Can reduce the dimensions of raw images (CNN) or time-series (RNN)!
    • Can be used to produce an uncertainty on the embedding.
  • Disadvantages:
    • No prototypes.
    • Complexity and interpretability.
slide-20
SLIDE 20

Self Organizing Maps (SOM) and PINK

See: https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2016-116.pdf and http://www.astron.nl/LifeCycle2018/Documents/Talks_Session1/Harwood_LifeCycle18.pdf
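Neither link above shows code; as a hedged illustration of the general SOM idea (not PINK itself), the third-party MiniSom package can be used roughly like this (grid size, training length, and the stand-in data are assumptions):

```python
import numpy as np
from minisom import MiniSom   # third-party package, not part of the slides

# Hypothetical data: 1000 objects with 64 features each.
data = np.random.rand(1000, 64)

# A 10 x 10 grid of prototypes; each object is mapped to its best-matching unit.
som = MiniSom(10, 10, 64, sigma=1.0, learning_rate=0.5, random_seed=0)
som.random_weights_init(data)
som.train_random(data, 5000)

best_matching_unit = som.winner(data[0])   # (row, col) of the winning prototype
prototypes = som.get_weights()             # shape (10, 10, 64): the learned prototypes
```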

slide-21
SLIDE 21

How to interpret the output of a dimensionality reduction algorithm?

High-dimensional data → Dimensionality Reduction algorithm → 2D embedding

slide-22
SLIDE 22

How to interpret the output of a dimensionality reduction algorithm?

If we have prototypes, try to understand what they mean.

slide-23
SLIDE 23

How to interpret the output of a dimensionality reduction algorithm?

[Figure: 2D embedding, Dimension #1 vs. Dimension #2]

slide-24
SLIDE 24

How to interpret the output of a dimensionality reduction algorithm?

[Figure: 2D embedding, Dimension #1 vs. Dimension #2]

slide-25
SLIDE 25

[Figure: tSNE embedding in two dimensions (Dimension #1 vs. Dimension #2), with example spectra shown as normalized flux vs. wavelength (Å)]

slide-26
SLIDE 26

Example with the APOGEE dataset

  • APOGEE stars: infrared spectra of ~250K stars.
  • Calculate a Random Forest distance matrix → apply tSNE for dimensionality reduction (a sketch of this pipeline follows below).
  • See Reis+17.
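The slides do not include code for this step; the sketch below is a rough, hypothetical illustration of an unsupervised random-forest distance followed by tSNE (the synthetic-data trick, all parameter values, and the stand-in data are assumptions; see Reis+17 for the actual method):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import TSNE

def rf_distance(X, n_trees=200, random_state=0):
    """Unsupervised random-forest distance (in the spirit of Reis+17): train a
    forest to separate the real data from a synthetic dataset whose features are
    shuffled independently, then measure how often two objects share a leaf."""
    rng = np.random.default_rng(random_state)
    X_synth = np.column_stack([rng.permutation(col) for col in X.T])
    X_all = np.vstack([X, X_synth])
    y_all = np.concatenate([np.ones(len(X)), np.zeros(len(X_synth))])

    forest = RandomForestClassifier(n_estimators=n_trees, random_state=random_state)
    forest.fit(X_all, y_all)

    leaves = forest.apply(X)                        # (n_objects, n_trees) leaf indices
    same_leaf = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
    return 1.0 - same_leaf                          # distance matrix

# Hypothetical stand-in for the APOGEE spectra (n_objects x n_pixels).
X = np.random.rand(500, 100)
dist = rf_distance(X)

# tSNE accepts the precomputed distance matrix directly.
tsne = TSNE(n_components=2, metric="precomputed", init="random", perplexity=30)
embedding = tsne.fit_transform(dist)                # (n_objects, 2)
```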
slide-27
SLIDE 27

Example with the APOGEE dataset

[Figure: tSNE embedding, dimension #1 vs. dimension #2]

slide-28
SLIDE 28

Example with the APOGEE dataset

[Figure: tSNE embedding (dimension #1 vs. dimension #2), annotated “stack spectra along this axis”]

  • 1. Stack observations along different axes.
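For example (a hypothetical sketch, not code from the talk; the embedding and spectra below are random placeholders), one can bin the objects along one embedding axis and average their spectra within each bin:

```python
import numpy as np

# Placeholders standing in for the real quantities: a 2D embedding and the spectra.
embedding = np.random.rand(500, 2)      # e.g. the tSNE output from the sketch above
X = np.random.rand(500, 100)            # n_objects x n_pixels spectra

# Bin the objects along tSNE dimension #1 and average ("stack") the spectra per bin,
# to see how the spectra change across the map.
n_bins = 5
edges = np.linspace(embedding[:, 0].min(), embedding[:, 0].max(), n_bins + 1)
which_bin = np.clip(np.digitize(embedding[:, 0], edges) - 1, 0, n_bins - 1)
stacked = np.array([X[which_bin == b].mean(axis=0) for b in range(n_bins)])
print(stacked.shape)                    # (n_bins, n_pixels)
```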
slide-29
SLIDE 29

Example with the APOGEE dataset

[Figure: tSNE embedding, dimension #1 vs. dimension #2]

  • 2. Color points according to tabulated parameters (e.g., from the SDSS)
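A matplotlib sketch of this step (the embedding and the tabulated parameter below are random placeholders; any catalog quantity such as effective temperature or metallicity could be used):

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholders: a 2D embedding and one tabulated parameter per object.
embedding = np.random.rand(500, 2)
param = np.random.uniform(4000.0, 6500.0, size=500)

plt.scatter(embedding[:, 0], embedding[:, 1], c=param, s=5, cmap="viridis")
plt.colorbar(label="tabulated parameter (e.g. Teff [K])")
plt.xlabel("tSNE dimension #1")
plt.ylabel("tSNE dimension #2")
plt.show()
```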
slide-30
SLIDE 30

Example with the APOGEE dataset

[Figure: tSNE embedding, dimension #1 vs. dimension #2]

  • 2. Color points according to tabulated parameters (e.g., from the SDSS)
slide-31
SLIDE 31

Example with the APOGEE dataset

[Figure: tSNE embedding, dimension #1 vs. dimension #2]

  • 2. Color points according to tabulated parameters (e.g., from the SDSS)
slide-32
SLIDE 32

Example with the APOGEE dataset

[Figure: tSNE embedding, dimension #1 vs. dimension #2]

  • 2. Color points according to tabulated parameters (e.g., from the SDSS)
slide-33
SLIDE 33

Questions?