

SLIDE 1

Spatial Data: Dimensionality Reduction

CSC444 Techniques

SLIDE 2

In this subfield, we think of a data point as a vector in R^n

(what could possibly go wrong?)

SLIDE 3

“Linear” dimensionality reduction:

Reduction is achieved by multiplying every point by the same single matrix.

SLIDE 4

Regular Scatterplots

  • Every data point is a vector: $v = (v_0, v_1, v_2, v_3)^T$
  • Every scatterplot is produced by a very simple matrix, e.g. one that selects the first two coordinates:

    $\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$
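A minimal numpy sketch of the last two slides (illustrative; the data and names are made up): a scatterplot is just multiplication by a 0/1 selection matrix, and any other 2×4 matrix gives some other linear projection of the same data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 100))          # 100 data points as the columns of a 4 x 100 matrix

A = np.array([[1.0, 0.0, 0.0, 0.0],    # keep v0
              [0.0, 1.0, 0.0, 0.0]])   # keep v1
Y = A @ X                              # 2 x 100: the (v0, v1) scatterplot coordinates

B = rng.normal(size=(2, 4))            # any other 2 x 4 matrix...
Z = B @ X                              # ...gives a different linear projection
```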

SLIDE 5

What about other matrices?

SLIDE 6

Grand Tour (Asimov, 1985)

http://cscheid.github.io/lux/demos/tour/tour.html
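The linked page is an interactive demo; as a rough numpy-only sketch of one ingredient (my illustration, not the demo's code), each frame of a grand tour projects the data onto a 2-plane with an orthonormal basis, and the tour animates by smoothly rotating that plane.

```python
import numpy as np

def random_frame(d, rng):
    """Orthonormal basis of a random 2-plane in R^d (QR of a Gaussian matrix)."""
    Q, _ = np.linalg.qr(rng.normal(size=(d, 2)))
    return Q                            # d x 2, with Q.T @ Q == I

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 500))           # 500 points in R^6, as columns
F = random_frame(6, rng)
Y = F.T @ X                             # 2 x 500 coordinates for one frame of the tour
```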

SLIDE 7

Is there a best matrix? How do we think about that?

SLIDE 8

Linear Algebra review

  • Vectors
  • Inner Products
  • Lengths
  • Angles
  • Bases
  • Linear Transformations and Eigenvectors
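Illustrative numpy versions of the reviewed concepts (the vectors and the matrix below are arbitrary examples).

```python
import numpy as np

u = np.array([1.0, 2.0, 2.0])
v = np.array([2.0, 0.0, 1.0])

inner  = u @ v                                            # inner product <u, v>
length = np.linalg.norm(u)                                # length: sqrt(<u, u>)
angle  = np.arccos(inner / (length * np.linalg.norm(v)))  # angle: cos(theta) = <u,v> / (|u||v|)

basis  = np.eye(3)                                        # the standard basis of R^3
coords = basis.T @ u                                      # coordinates of u in that basis

M = np.array([[2.0, 1.0],                                 # a symmetric linear transformation
              [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eigh(M)             # columns satisfy M e = lambda e
```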
SLIDE 9

Principal Component Analysis

[Figure: PCA scatterplot of the iris data (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) projected onto PC1 and PC2, colored by Species: setosa, versicolor, virginica.]

SLIDE 10

Principal Component Analysis

  • Given a data set as a matrix $X \in \mathbb{R}^{d \times n}$
  • Center the matrix: $\tilde{X} = X(I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T) = XH$
  • $\tilde{X}^T \tilde{X}$ is a matrix of inner products of the centered rows of $X$
  • Compute the eigendecomposition $\tilde{X}^T \tilde{X} = U \Sigma U^T$
  • The principal components are the first few rows of $U \Sigma^{1/2}$
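A minimal numpy sketch of the recipe above, assuming the columns of X hold the data points (the toy data and variable names are mine, not the slides').

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 150
X = rng.normal(size=(d, n))                    # toy stand-in for a real data set

H = np.eye(n) - np.ones((n, n)) / n            # H = I - (1/n) 1 1^T
Xc = X @ H                                     # centered data, X~ = XH
G = Xc.T @ Xc                                  # n x n matrix of inner products
evals, U = np.linalg.eigh(G)                   # G = U Sigma U^T (eigh returns ascending order)
order = np.argsort(evals)[::-1]                # largest eigenvalues first
evals, U = evals[order], U[:, order]

scores = U * np.sqrt(np.clip(evals, 0, None))  # U Sigma^(1/2)
pc12 = scores[:, :2]                           # 2-D principal-component coordinates, one row per point
```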

SLIDE 11
SLIDE 12

What if we don’t have coordinates, but distances? “Classical” Multidimensional Scaling

slide-13
SLIDE 13

http://www.math.pku.edu.cn/teachers/yaoy/Fall2011/lecture11.pdf

SLIDE 14

What if we don’t have distances, but similarities? Classical Multidimensional Scaling works, too!

SLIDE 15

Borg and Groenen, Modern Multidimensional Scaling

SLIDE 16

Borg and Groenen, Modern Multidimensional Scaling

SLIDE 17

Classical Multidimensional Scaling

  • Algorithm:
  • Given $D_{ij} = |X_i - X_j|^2$, create $B = -\tfrac{1}{2} H D H^T$
  • The PCA of $B$ is equal to the PCA of $X$
  • (Huh?!)
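A minimal numpy sketch of the algorithm (illustrative; the function name and the sanity check are mine).

```python
import numpy as np

def classical_mds(D, k=2):
    """D: n x n matrix of squared pairwise distances; returns an n x k embedding."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * H @ D @ H.T                      # B = -1/2 H D H^T
    evals, U = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1]             # largest eigenvalues first
    evals, U = evals[order], U[:, order]
    return U[:, :k] * np.sqrt(np.clip(evals[:k], 0, None))

# Sanity check: build D from known points; the embedding recovers them
# up to rotation, reflection, and translation.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
Y = classical_mds(D, k=3)
```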

SLIDE 18

“Nonlinear” dimensionality reduction (i.e., the projection is not a matrix operation)

SLIDE 19

Data might have “high-order” structure
SLIDE 20

http://isomap.stanford.edu/Supplemental_Fig.pdf
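One concrete illustration of such structure (assuming scikit-learn is available; this is not the linked material's code): points on a rolled-up 2-D sheet in R^3 cannot be flattened by any single matrix, but a nonlinear method such as Isomap can unroll them.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, t = make_swiss_roll(n_samples=1000, random_state=0)        # points in R^3 on a rolled-up sheet
Y = Isomap(n_neighbors=10, n_components=2).fit_transform(X)   # 1000 x 2 "unrolled" coordinates
```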

SLIDE 21

We might want to minimize something else besides “difference between squared distances”

t-SNE: difference between neighbor ordering

Why not distances?
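A usage sketch (assuming scikit-learn's t-SNE implementation; the toy data and parameters are illustrative): the optimization tries to keep each point's neighborhood the same in 2-D rather than preserving the pairwise distances themselves.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))                            # toy data: 300 points in R^50
Y = TSNE(n_components=2, perplexity=30).fit_transform(X)  # 300 x 2 embedding
```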

SLIDE 22

The Curse of Dimensionality

  • High-dimensional space looks nothing like low-dimensional space
  • Most distances become meaningless
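A quick numerical illustration of the second bullet (my example, not from the slides): for uniform random points, the relative gap between the nearest and farthest distance from a query point shrinks as the dimension grows, so "near" and "far" stop being distinguishable.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(500, d))
    dists = np.linalg.norm(X[1:] - X[0], axis=1)          # distances from one query point
    print(d, (dists.max() - dists.min()) / dists.min())   # relative contrast shrinks with d
```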