slide-1
SLIDE 1

Dimensionality Reduction Algorithms

(and how to interpret their output)

Dalya Baron (Tel Aviv University) XXX Winter School, November 2018

slide-2
SLIDE 2

What is Dimensionality Reduction?

28 x 28 features per object → Dimensionality Reduction algorithm → 2 features per object (feature 1, feature 2)

slide-3
SLIDE 3

Why do we need dimensionality reduction?

  • “Practical”:
    • Improve the performance of supervised learning algorithms: the original features can be correlated and redundant, and most algorithms cannot handle thousands of features.
    • Compressing data (e.g., SKA).
  • “Artistic”:
    • Data visualization and interpretation.
    • Uncover complex trends.
    • Look for “unknown unknowns”.
slide-4
SLIDE 4

Two types of dimensionality reduction

1. Decomposition of the objects into “prototypes”. Each object can be represented using the prototypes. We gain: prototypes that represent the population, and a low-dimensional embedding. For example: SVD, PCA, ICA, NNMF, SOM, and more…
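As a concrete illustration of this first type (an assumed example using scikit-learn's NMF, one of the algorithms listed above; the data and parameter values are placeholders), a decomposition returns both the prototypes and a low-dimensional set of coefficients per object:

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical non-negative data: 500 objects, 100 features each.
X = np.random.rand(500, 100)

nmf = NMF(n_components=3, init="nndsvda", random_state=0, max_iter=500)
W = nmf.fit_transform(X)   # (500, 3): low-dimensional embedding (coefficients per object)
H = nmf.components_        # (3, 100): the three non-negative "prototypes"

# Each object is approximately a non-negative mix of the prototypes: X ≈ W @ H.
```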

slide-5
SLIDE 5

Two types of dimensionality reduction

2. Embedding of a high-dimensional dataset into a lower-dimensional one. We gain: a low-dimensional embedding. For example: tSNE, autoencoders.

slide-6
SLIDE 6

Principal Component Analysis (PCA)

[Figure: data in the (feature 1, feature 2) plane, with principal component 1 and principal component 2 drawn as new orthogonal axes]

PCA is a transformation that converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables, called principal components.

  • The first principal component has the largest possible variance.
  • Each succeeding component has the highest possible variance, under the constraint that it is orthogonal to the preceding components.

slide-7
SLIDE 7

Principal Component Analysis (PCA)

[Figure: variance and cumulative percentage of the variance vs. index of the principal component]

PCA allows us to compress the data by representing each object as its projection onto the first few principal components.
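A minimal sketch of this workflow with scikit-learn (the synthetic data, component counts, and all parameter values below are assumptions, not from the slides):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 1000 objects with 784 (28 x 28) correlated features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 5))                      # 5 "true" degrees of freedom
X = latent @ rng.normal(size=(5, 784)) + 0.1 * rng.normal(size=(1000, 784))

pca = PCA(n_components=20)
X_proj = pca.fit_transform(X)                            # projections onto the first 20 PCs

# Variance and cumulative percentage of the variance per component (the plot above).
cum_var = np.cumsum(pca.explained_variance_ratio_)
print(cum_var[:5])                                       # a few components already capture most of the variance

# Compression: keep only the first 5 projections and reconstruct approximately.
pca5 = PCA(n_components=5).fit(X)
X_compressed = pca5.transform(X)                         # 784 features -> 5 features per object
X_reconstructed = pca5.inverse_transform(X_compressed)
```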

slide-8
SLIDE 8

Principal Component Analysis (PCA)

The principal components may represent the true building blocks of the objects in our dataset.

observed object ≈ A * (principal comp. 1) + B * (principal comp. 2) + C * (principal comp. 3)
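A self-contained sketch of this idea (synthetic data, not from the slides): build objects that really are mixtures of three building blocks, and check that the per-object PCA projections play the role of the weights A, B, C:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data built from three underlying "building blocks".
rng = np.random.default_rng(1)
blocks = rng.normal(size=(3, 200))                 # three true prototypes, 200 features each
coeffs = rng.normal(size=(500, 3))                 # mixing weights for 500 objects
X = coeffs @ blocks + 0.01 * rng.normal(size=(500, 200))

pca = PCA(n_components=3).fit(X)

# Weights A, B, C for one (arbitrary) object: its projections onto the components.
A, B, C = pca.transform(X)[0]

# observed object ≈ mean + A * (principal comp. 1) + B * (principal comp. 2) + C * (principal comp. 3)
approx = (pca.mean_
          + A * pca.components_[0]
          + B * pca.components_[1]
          + C * pca.components_[2])
print(np.linalg.norm(X[0] - approx))               # small: three components describe the object well
```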

slide-9
SLIDE 9

Principal Component Analysis (PCA)

The projection onto the principal components gives a low-dimensional representation of the objects in the sample.

[Figure: objects plotted in the plane spanned by A (the projection onto principal component 1) and B (the projection onto principal component 2)]

slide-10
SLIDE 10
PCA: Pros & Cons

  • Advantages:
    • Very simple and intuitive to use.
    • No free parameters!
    • Optimized to reduce variance.
  • Disadvantages:
    • Linear decomposition: we will not be able to describe absorption lines, dust extinction, distance, etc.
    • Can produce negative principal components, which is not always physical in astronomy.

slide-11
SLIDE 11

From: http://www.astroml.org/book_figures/chapter7/fig_spec_decompositions.html

slide-12
SLIDE 12

t-distributed stochastic neighbor embedding (tSNE)

Embedding high-dimensional data in a low-dimensional space (2 or 3 dimensions).

Input: (1) raw data, extracted features, or a distance matrix; (2) hyper-parameters: perplexity.

[Figure: schematic mapping from the high-dimensional space to the low-dimensional space]

slide-13
SLIDE 13

tSNE

Intuition: tSNE tries to find a low-dimensional embedding that preserves, as much as possible, the distribution of distances between different objects.

Perplexity: the size of the neighborhood that tSNE considers in the optimization.

[Figure: distance distributions in the high-dimensional and low-dimensional spaces; the perplexity sets the neighborhood over which distances are compared]
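A minimal scikit-learn sketch of this step (the dataset and all parameter values are assumptions; the slides only name the algorithm and its perplexity hyper-parameter):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Hand-written digits: 8 x 8 = 64 features per object (a small stand-in for the
# 28 x 28 example on the next slide).
X, y = load_digits(return_X_y=True)

# Perplexity sets the effective neighborhood size that tSNE tries to preserve.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_2d = tsne.fit_transform(X)        # (n_objects, 2): the low-dimensional embedding

# Objects of the same digit should land close together in the 2D plane.
print(X_2d.shape)
```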

slide-14
SLIDE 14

tSNE - example

[Figure: images with 28 x 28 features per object embedded into two features (feature 1, feature 2)]

slide-15
SLIDE 15

tSNE - example

https://distill.pub/2016/misread-tsne/

slide-16
SLIDE 16

tSNE: Pros & Cons

  • Advantages:
    • Can take a general distance matrix as input.
    • Non-linear embedding.
    • Preserves high-dimensional clustering well (depending on the chosen perplexity).
  • Disadvantages:
    • No prototypes.
    • Sensitive to distance scales < perplexity.
    • Large distances are meaningless.


slide-17
SLIDE 17

UMAP

See: https://arxiv.org/abs/1802.03426 and https://github.com/lmcinnes/umap
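The slides only point to the paper and the repository; as a hedged illustration, the umap-learn package linked above is typically used roughly like this (the dataset and parameter values are assumptions):

```python
import umap                        # the umap-learn package from the repository above
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)

# n_neighbors plays a role loosely analogous to tSNE's perplexity;
# min_dist controls how tightly points are packed in the embedding.
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2, random_state=0)
X_2d = reducer.fit_transform(X)    # (n_objects, 2) embedding
```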

slide-18
SLIDE 18

Autoencoders

An autoencoder compresses each object through a low-dimensional bottleneck (the embedding) and then tries to reconstruct the original input from it.

loss function = (input - reconstruction)²
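A minimal sketch (the architecture, layer sizes, and random stand-in data below are assumptions; the slides only define the reconstruction loss): a dense autoencoder in Keras that reduces 784 features to a 2-dimensional embedding and is trained with exactly the (input - reconstruction)² loss above.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical data: flattened 28 x 28 images, scaled to [0, 1].
x_train = np.random.rand(1000, 784).astype("float32")

# Encoder: 784 features -> 2-dimensional embedding (the bottleneck).
inputs = keras.Input(shape=(784,))
encoded = layers.Dense(128, activation="relu")(inputs)
encoded = layers.Dense(2, name="embedding")(encoded)

# Decoder: 2 -> 784, reconstructing the input.
decoded = layers.Dense(128, activation="relu")(encoded)
decoded = layers.Dense(784, activation="sigmoid")(decoded)

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")   # mean squared reconstruction error
autoencoder.fit(x_train, x_train, epochs=10, batch_size=64, verbose=0)

# The embedding is the output of the bottleneck layer.
encoder = keras.Model(inputs, encoded)
x_2d = encoder.predict(x_train)                     # (n_objects, 2)
```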
slide-19
SLIDE 19

Autoencoders - Pros & Cons

  • Advantages:
    • Can reduce the dimensions of raw images (CNN) or time-series (RNN)!
    • Can be used to produce an uncertainty on the embedding.
  • Disadvantages:
    • No prototypes.
    • Complexity and interpretability.
slide-20
SLIDE 20

Self Organizing Maps (SOM) and PINK

See: https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2016-116.pdf and http://www.astron.nl/LifeCycle2018/Documents/Talks_Session1/Harwood_LifeCycle18.pdf
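Neither link above shows code; as a hedged illustration of the general SOM idea (not PINK itself), the third-party MiniSom package can be used roughly like this (grid size, training length, and the stand-in data are assumptions):

```python
import numpy as np
from minisom import MiniSom   # third-party package, not part of the slides

# Hypothetical data: 1000 objects with 64 features each.
data = np.random.rand(1000, 64)

# A 10 x 10 grid of prototypes; each object is mapped to its best-matching unit.
som = MiniSom(10, 10, 64, sigma=1.0, learning_rate=0.5, random_seed=0)
som.random_weights_init(data)
som.train_random(data, 5000)

best_matching_unit = som.winner(data[0])   # (row, col) of the winning prototype
prototypes = som.get_weights()             # shape (10, 10, 64): the learned prototypes
```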

slide-21
SLIDE 21

How to interpret the output of a dimensionality reduction algorithm?

High-dimensional data → Dimensionality Reduction algorithm → 2D embedding

slide-22
SLIDE 22

How to interpret the output of a dimensionality reduction algorithm?

If we have prototypes, try to understand what they mean.

slide-23
SLIDE 23

How to interpret the output of a dimensionality reduction algorithm?

[Figure: 2D embedding, Dimension #1 vs. Dimension #2]

slide-24
SLIDE 24

How to interpret the output of a dimensionality reduction algorithm?

[Figure: 2D embedding, Dimension #1 vs. Dimension #2]

slide-25
SLIDE 25

[Figure: tSNE embedding in two dimensions (Dimension #1 vs. Dimension #2), with example spectra shown as normalized flux vs. wavelength (Å)]

slide-26
SLIDE 26

Example with the APOGEE dataset

  • APOGEE stars: infrared spectra of ~250K stars.
  • Calculate a Random Forest distance matrix → apply tSNE for dimensionality reduction (a sketch of this pipeline follows below).
  • See Reis+17.
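The slides do not include code for this step; the sketch below is a rough, hypothetical illustration of an unsupervised random-forest distance followed by tSNE (the synthetic-data trick, all parameter values, and the stand-in data are assumptions; see Reis+17 for the actual method):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import TSNE

def rf_distance(X, n_trees=200, random_state=0):
    """Unsupervised random-forest distance (in the spirit of Reis+17): train a
    forest to separate the real data from a synthetic dataset whose features are
    shuffled independently, then measure how often two objects share a leaf."""
    rng = np.random.default_rng(random_state)
    X_synth = np.column_stack([rng.permutation(col) for col in X.T])
    X_all = np.vstack([X, X_synth])
    y_all = np.concatenate([np.ones(len(X)), np.zeros(len(X_synth))])

    forest = RandomForestClassifier(n_estimators=n_trees, random_state=random_state)
    forest.fit(X_all, y_all)

    leaves = forest.apply(X)                        # (n_objects, n_trees) leaf indices
    same_leaf = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
    return 1.0 - same_leaf                          # distance matrix

# Hypothetical stand-in for the APOGEE spectra (n_objects x n_pixels).
X = np.random.rand(500, 100)
dist = rf_distance(X)

# tSNE accepts the precomputed distance matrix directly.
tsne = TSNE(n_components=2, metric="precomputed", init="random", perplexity=30)
embedding = tsne.fit_transform(dist)                # (n_objects, 2)
```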
slide-27
SLIDE 27

Example with the APOGEE dataset

[Figure: tSNE embedding, dimension #1 vs. dimension #2]

slide-28
SLIDE 28

Example with the APOGEE dataset

[Figure: tSNE embedding (dimension #1 vs. dimension #2), annotated “stack spectra along this axis”]

  • 1. Stack observations along different axes.
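For example (a hypothetical sketch, not code from the talk; the embedding and spectra below are random placeholders), one can bin the objects along one embedding axis and average their spectra within each bin:

```python
import numpy as np

# Placeholders standing in for the real quantities: a 2D embedding and the spectra.
embedding = np.random.rand(500, 2)      # e.g. the tSNE output from the sketch above
X = np.random.rand(500, 100)            # n_objects x n_pixels spectra

# Bin the objects along tSNE dimension #1 and average ("stack") the spectra per bin,
# to see how the spectra change across the map.
n_bins = 5
edges = np.linspace(embedding[:, 0].min(), embedding[:, 0].max(), n_bins + 1)
which_bin = np.clip(np.digitize(embedding[:, 0], edges) - 1, 0, n_bins - 1)
stacked = np.array([X[which_bin == b].mean(axis=0) for b in range(n_bins)])
print(stacked.shape)                    # (n_bins, n_pixels)
```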
slide-29
SLIDE 29

Example with the APOGEE dataset

[Figure: tSNE embedding, dimension #1 vs. dimension #2]

  • 2. Color points according to tabulated parameters (e.g., from the SDSS)
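A matplotlib sketch of this step (the embedding and the tabulated parameter below are random placeholders; any catalog quantity such as effective temperature or metallicity could be used):

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholders: a 2D embedding and one tabulated parameter per object.
embedding = np.random.rand(500, 2)
param = np.random.uniform(4000.0, 6500.0, size=500)

plt.scatter(embedding[:, 0], embedding[:, 1], c=param, s=5, cmap="viridis")
plt.colorbar(label="tabulated parameter (e.g. Teff [K])")
plt.xlabel("tSNE dimension #1")
plt.ylabel("tSNE dimension #2")
plt.show()
```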
slide-30
SLIDE 30

Example with the APOGEE dataset

[Figure: tSNE embedding, dimension #1 vs. dimension #2]

  • 2. Color points according to tabulated parameters (e.g., from the SDSS)
slide-31
SLIDE 31

Example with the APOGEE dataset

[Figure: tSNE embedding, dimension #1 vs. dimension #2]

  • 2. Color points according to tabulated parameters (e.g., from the SDSS)
slide-32
SLIDE 32

Example with the APOGEE dataset

[Figure: tSNE embedding, dimension #1 vs. dimension #2]

  • 2. Color points according to tabulated parameters (e.g., from the SDSS)
slide-33
SLIDE 33

Questions?