CS 171: Visualization High-Dimensional Data Hanspeter Pfister



slide-1
SLIDE 1

CS 171: Visualization

High-Dimensional Data

Hanspeter Pfister pfister@seas.harvard.edu

slide-2
SLIDE 2

This Week

  • PII due Monday, April 8
  • Friday lab (10-11:30 am, MD G115): Hands-On D3 with Azalea, Sofia, and Billy (our last lab!)

slide-3
SLIDE 3

Th: Alberto Cairo

A Functional Art: Storytelling with Data, Graphs, Maps, and Diagrams

slide-4
SLIDE 4

High-Dimensional Data

slide-5
SLIDE 5

Item

slide-6
SLIDE 6

Attribute

slide-7
SLIDE 7

Taxonomy

  • Based on number of attributes
  • 1: Univariate
  • 2: Bivariate
  • 3: Trivariate
  • >3: Multivariate
slide-8
SLIDE 8

Tableau

slide-9
SLIDE 9

Linked Views

slide-10
SLIDE 10

Multivariate Plots

ggplot2

slide-11
SLIDE 11

Multivariate Plots

R

slide-12
SLIDE 12

Heatmap

ggplot2

slide-13
SLIDE 13
  • A. Lex

Hierarchical Heatmap

slide-14
SLIDE 14

3D Scatter Plots

R, lattice

slide-15
SLIDE 15

R, lattice

3D Continuous Plots

slide-16
SLIDE 16

Small Multiples

Tableau

slide-17
SLIDE 17

Small Multiples

Protovis

slide-18
SLIDE 18

Horizon Graphs

slide-19
SLIDE 19

Becker 1996

slide-20
SLIDE 20

D3

slide-21
SLIDE 21

EnRoute

  • A. Lex
slide-22
SLIDE 22

Parallel Coordinates

slide-23
SLIDE 23

Use more than two axes

“Hyperdimensional Data Analysis Using Parallel Coordinates”, Wegman, 1990

Based on slide from Munzner
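A minimal sketch of how one item becomes a polyline across more than two axes. It assumes each attribute is min-max scaled onto its own vertical axis; the function name `to_polylines` and the sample items are illustrative and not from the lecture.

```python
def to_polylines(rows):
    """Map each item (a list of attribute values) to a polyline.

    Attribute j gets its own vertical axis at x = j.
    Values are min-max scaled to [0, 1] per attribute (an assumed convention).
    """
    n_attrs = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_attrs)]
    highs = [max(r[j] for r in rows) for j in range(n_attrs)]
    polylines = []
    for r in rows:
        points = []
        for j in range(n_attrs):
            span = highs[j] - lows[j]
            # A constant attribute has no spread; place it mid-axis.
            y = 0.5 if span == 0 else (r[j] - lows[j]) / span
            points.append((j, y))
        polylines.append(points)
    return polylines

# Hypothetical items with four attributes each.
items = [[1.0, 200.0, 3.0, 10.0],
         [2.0, 150.0, 4.0, 30.0],
         [3.0, 100.0, 2.0, 20.0]]
lines = to_polylines(items)
print(lines[0])  # [(0, 0.0), (1, 1.0), (2, 0.5), (3, 0.0)]
```

Each polyline has one vertex per axis, so adding an attribute adds an axis rather than a spatial dimension.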

slide-24
SLIDE 24

Parallel Coordinates

slide-25
SLIDE 25

Parallel Coordinates

slide-26
SLIDE 26

Parallel Coordinates

slide-27
SLIDE 27

Correlation

“Hyperdimensional Data Analysis Using Parallel Coordinates”, Wegman, 1990

Based on slide from Munzner
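Between two adjacent axes, positively correlated attributes produce roughly parallel segments, while negatively correlated attributes produce segments that cross. A small sketch that counts crossings, assuming segment i runs from `left[i]` to `right[i]`; the function name and data are illustrative.

```python
from itertools import combinations

def count_crossings(left, right):
    """Count pairs of line segments that cross between two adjacent axes.

    Two segments cross when their vertical order flips between the axes.
    """
    crossings = 0
    for i, j in combinations(range(len(left)), 2):
        if (left[i] - left[j]) * (right[i] - right[j]) < 0:
            crossings += 1
    return crossings

a = [1, 2, 3, 4, 5]
positively_related = [2, 4, 6, 8, 10]   # rises with a
negatively_related = [10, 8, 6, 4, 2]   # falls as a rises

print(count_crossings(a, positively_related))  # 0
print(count_crossings(a, negatively_related))  # 10 (every pair crosses)
```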

slide-28
SLIDE 28
slide-29
SLIDE 29

Filtering & Brushing

D3

slide-30
SLIDE 30

Parallel Sets

D3

slide-31
SLIDE 31

StratomeX

  • A. Lex
slide-32
SLIDE 32

Glyphs

slide-33
SLIDE 33

Star Plots

  • Space variables around a circle
  • Encode values on “spokes”
  • Data point is now a shape
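The three steps above can be sketched as a function that turns one data point into polygon vertices. It assumes evenly spaced spokes and values scaled by a per-variable maximum; `star_plot_vertices` and the sample item are hypothetical.

```python
import math

def star_plot_vertices(values, max_values):
    """Return (x, y) vertices of the star-plot polygon for one data point.

    Variable k sits on a spoke at angle 2*pi*k/n, and its value
    (scaled to [0, 1] by max_values[k]) sets the distance from the center.
    """
    n = len(values)
    vertices = []
    for k, (v, vmax) in enumerate(zip(values, max_values)):
        angle = 2 * math.pi * k / n
        radius = v / vmax
        vertices.append((radius * math.cos(angle), radius * math.sin(angle)))
    return vertices

# Hypothetical item with four variables, each on a 0-10 scale.
shape = star_plot_vertices([10, 5, 10, 5], [10, 10, 10, 10])
```

Connecting the vertices in order gives the closed shape that represents the data point.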
slide-34
SLIDE 34
slide-35
SLIDE 35
slide-36
SLIDE 36
slide-37
SLIDE 37
  • C. Nussbaumer
slide-38
SLIDE 38
  • M. Kirby, H. Marmanis, and D. Laidlaw

  • Velocity (magnitude & direction)
  • Vorticity (scalar, CW/CCW)
  • Strain Tensor (second order)
  • Turbulent Charge (vector & scalar)

slide-39
SLIDE 39
  • M. Kirby, H. Marmanis, and D. Laidlaw
slide-40
SLIDE 40
  • M. Kirby, H. Marmanis, and D. Laidlaw
slide-41
SLIDE 41
  • M. Kirby, H. Marmanis, and D. Laidlaw
slide-42
SLIDE 42

  • G. Kindlmann 2006
slide-43
SLIDE 43
  • G. Kindlmann 2006
slide-44
SLIDE 44
  • G. Kindlmann 2006
slide-45
SLIDE 45

Chernoff Faces

slide-46
SLIDE 46

Dimensionality Reduction

slide-47
SLIDE 47

What about very high-dimensional data?

Based on slide from P. Liang

slide-48
SLIDE 48

Basic Idea

Project the high-dimensional data onto a lower-dimensional subspace using linear or non-linear transformations

$x \in \mathbb{R}^{64 \times 64} = \mathbb{R}^{4096}$, $y \in \mathbb{R}^{10}$

$y = Ux$

Based on slide from P. Liang
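A minimal sketch of the linear case, where a 64×64 input is flattened to 4096 values and mapped to 10 values by a matrix U. The data is random, and U here is a random matrix with orthonormal rows standing in for a learned projection; both are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 64x64 "image" flattened into a vector x with 4096 entries (random stand-in).
image = rng.random((64, 64))
x = image.reshape(-1)

# U is a 10 x 4096 matrix. Its rows are a random orthonormal basis,
# used only as a placeholder for a projection learned from data.
Q, _ = np.linalg.qr(rng.standard_normal((4096, 10)))
U = Q.T

y = U @ x  # low-dimensional representation with 10 entries
print(x.shape, y.shape)  # (4096,) (10,)
```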

slide-49
SLIDE 49

Linear Methods

  • Does the data lie mostly in a hyperplane?
  • If so, what is its dimensionality?

Based on slide from F. Sha

slide-50
SLIDE 50

http://www.youtube.com/watch?v=4pnQd6jnCWk

slide-51
SLIDE 51

PCA

Project data to a subspace such as to maximize the variance of the projected data

Based on slide from J. Leskovec

PC vectors are orthogonal
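One common way to compute such a projection is an eigen-decomposition of the covariance matrix. The sketch below assumes that approach and uses synthetic data; the function name `pca` is illustrative.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the k directions of largest variance."""
    Xc = X - X.mean(axis=0)                 # center each attribute
    cov = np.cov(Xc, rowvar=False)          # attribute covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest
    components = eigvecs[:, order]          # columns are the PC vectors
    return Xc @ components, components, eigvals[order]

rng = np.random.default_rng(0)
# Synthetic 3-attribute data, stretched unevenly along its axes.
X = rng.standard_normal((500, 3)) * np.array([5.0, 2.0, 0.5])
projected, components, variances = pca(X, k=2)

# The PC vectors are orthonormal, so this product is the identity matrix.
print(np.round(components.T @ components, 6))
```

The variance of the projected data along each PC equals the corresponding eigenvalue, which is what the method maximizes.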

slide-52
SLIDE 52

MusicBox [Anita Lillie]

slide-53
SLIDE 53
slide-54
SLIDE 54
slide-55
SLIDE 55

Variance and Covariance

  • Variance: How far are the data points spread?
  • Covariance: How much do the variables change together?
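Both quantities can be computed directly from their definitions. The sketch below uses the sample (n - 1) normalization, which is an assumption since the slide does not specify one; the data is made up.

```python
def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    """Sample variance: average squared distance from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def covariance(xs, ys):
    """Sample covariance: how much xs and ys vary together."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 6.0, 8.0, 10.0]   # rises with x1
x3 = [5.0, 4.0, 3.0, 2.0, 1.0]    # falls as x1 rises

print(variance(x1))        # 2.5
print(covariance(x1, x2))  # 5.0
print(covariance(x1, x3))  # -2.5
```

A positive covariance means the variables tend to rise together, and a negative one means one falls as the other rises.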
slide-56
SLIDE 56

Covariance Matrix

        x1     x2
  x1    1
  x2           1

[Plot with axes x1 and x2]

slide-57
SLIDE 57

Covariance Matrix

        x1     x2
  x1    1      0.7
  x2    0.7    1

[Plot with axes x1 and x2]
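For a covariance matrix with 1 on the diagonal and 0.7 off the diagonal, the directions of largest and smallest variance can be read from its eigenvectors. This is a sketch using NumPy's symmetric eigen-solver; the variable names are illustrative.

```python
import numpy as np

cov = np.array([[1.0, 0.7],
                [0.7, 1.0]])

eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
pc1 = eigvecs[:, 1]                      # direction of largest variance
pc2 = eigvecs[:, 0]                      # orthogonal direction

print(eigvals)          # approximately [0.3, 1.7]
print(abs(pc1 @ pc2))   # approximately 0: the directions are orthogonal
```

The larger eigenvalue belongs to the diagonal direction along which x1 and x2 increase together (up to sign).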

slide-58
SLIDE 58

Covariance Matrix

        x1      x2
  x1    1       -0.7
  x2    -0.7    1

[Plot with axes x1 and x2]

slide-59
SLIDE 59

[Plot with axes x1 and x2]

PCA

slide-60
SLIDE 60

[Plot with axes x1 and x2]

PCA

slide-61
SLIDE 61

[Plot with axes x1 and x2]

PCA

PC 1, PC 2

slide-62
SLIDE 62

How many PC vectors?

Enough PC vectors to cover 80-90% of the variance

Based on slide from J. Leskovec

Screeplot
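The rule of thumb can be sketched as a cumulative sum over the eigenvalues shown in a scree plot. The eigenvalues below are hypothetical, and `components_needed` is an illustrative name.

```python
import numpy as np

def components_needed(eigenvalues, threshold=0.9):
    """Smallest number of PCs whose cumulative variance share reaches threshold."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(ev) / ev.sum()
    return int(np.searchsorted(cumulative, threshold) + 1), cumulative

# Hypothetical eigenvalues of a covariance matrix (the scree plot heights).
eigenvalues = [5.0, 2.5, 1.0, 0.8, 0.7]
k80, cumulative = components_needed(eigenvalues, threshold=0.8)
k90, _ = components_needed(eigenvalues, threshold=0.9)
print(k80, k90)  # 3 4
```

Here the first three PCs cover about 85% of the variance and the first four about 93%.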

slide-63
SLIDE 63

Hastie et al., “The Elements of Statistical Learning: Data Mining, Inference, and Prediction”, Springer (2009)

PCA for Handwritten Digits

slide-64
SLIDE 64

Gunnar Grimnes: http://www.flickr.com/photos/gromgull/3329844591/in/photostream/

Eigenfaces

slide-65
SLIDE 65

PCA Visualization

  • Mondrian paintings
  • http://www.youtube.com/watch?v=xiWpZ5jhvx4

http://www.youtube.com/watch?v=7jLXDyQxck

slide-66
SLIDE 66

Gene Array Data

First two PC directions

First three PC directions

slide-67
SLIDE 67

Text Documents

>45 features, projected onto two PC dimensions

slide-68
SLIDE 68

Multidimensional Scaling (MDS)

slide-69
SLIDE 69
Multi-Dimensional Scaling

  • A different goal:

– Find a set of points whose pairwise distances match a given distance matrix

        p1   p2   p3   p4   p5
  p1         1    2    3    1
  p2    1         2    4    1
  p3    2    2         1    3
  p4    3    4    1         1
  p5    1    1    3    1

[Figure: layout of points p1 to p5 with edges labeled by distance]
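A sketch of classical (Torgerson) MDS applied to the slide's five-point distance matrix, with zeros assumed on the diagonal. Classical MDS is one of several MDS variants and is chosen here only for brevity. Note that this matrix violates the triangle inequality (p2 to p4 is 4, but p2 to p5 to p4 is 2), so a layout can only approximate it.

```python
import numpy as np

# Pairwise distances between p1..p5 (0 assumed on the diagonal).
D = np.array([[0, 1, 2, 3, 1],
              [1, 0, 2, 4, 1],
              [2, 2, 0, 1, 3],
              [3, 4, 1, 0, 1],
              [1, 1, 3, 1, 0]], dtype=float)

def classical_mds(D, k=2):
    """Embed points in k dimensions so their distances approximate D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]
    # Negative eigenvalues arise when D is not Euclidean; clip them to zero.
    scale = np.sqrt(np.clip(eigvals[top], 0, None))
    return eigvecs[:, top] * scale

points = classical_mds(D, k=2)
# Distances realized by the 2D layout, for comparison with D.
realized = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
print(np.round(realized, 2))
```

Comparing `realized` with `D` shows how much distortion the 2D layout introduces.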

slide-70
SLIDE 70

European Cities Data

  • Distances between European cities:
slide-71
SLIDE 71

Result of MDS

Based on slide from T. Yang

slide-72
SLIDE 72

Color Images

  • N. Bonneel
slide-73
SLIDE 73

Facebook Friends

– Distance = 1 for friends
– Distance = 2 for friends of friends; etc.

  • N. Bonneel
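This distance is the number of hops in the friendship graph, which a breadth-first search computes. The graph below is hypothetical, and `hop_distances` is an illustrative name.

```python
from collections import deque

def hop_distances(friends, start):
    """Breadth-first search: number of friendship hops from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for other in friends[person]:
            if other not in dist:
                dist[other] = dist[person] + 1
                queue.append(other)
    return dist

# Hypothetical friendship graph (undirected adjacency lists).
friends = {
    "Ann": ["Bob", "Cy"],
    "Bob": ["Ann", "Dee"],
    "Cy": ["Ann"],
    "Dee": ["Bob", "Eve"],
    "Eve": ["Dee"],
}
print(hop_distances(friends, "Ann"))
# {'Ann': 0, 'Bob': 1, 'Cy': 1, 'Dee': 2, 'Eve': 3}
```

Running the search from every person fills in the full distance matrix that MDS takes as input.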
slide-74
SLIDE 74

IN-SPIRE, PNNL

slide-75
SLIDE 75

What if data is non-linear?

  • Classic “Swiss Roll” example

PCA

xi

Based on slide from F. Sha
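A small sketch of why a linear projection struggles here, using a 2D spiral as a stand-in for a Swiss roll cross-section. Two points on neighboring turns are close in a straight line but far apart when measured along the roll. The parameterization and function names are assumptions for illustration.

```python
import math

def spiral_point(t):
    """Cross-section of a Swiss roll: a 2D spiral parameterized by t."""
    return (t * math.cos(t), t * math.sin(t))

def path_length(t_start, t_end, steps=10_000):
    """Approximate distance along the spiral by summing short segments."""
    total = 0.0
    prev = spiral_point(t_start)
    for i in range(1, steps + 1):
        cur = spiral_point(t_start + (t_end - t_start) * i / steps)
        total += math.dist(prev, cur)
        prev = cur
    return total

a, b = spiral_point(2 * math.pi), spiral_point(4 * math.pi)
straight = math.dist(a, b)                      # cutting through the roll
along = path_length(2 * math.pi, 4 * math.pi)   # following the roll
print(round(straight, 2), round(along, 2))
```

The along-the-roll distance is several times the straight-line distance, and a linear method such as PCA only sees the latter.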

slide-76
SLIDE 76

Non-Linear Methods

  • Intuition: Distortion in local areas, but faithful in the global structure

Based on slide from F. Sha

slide-77
SLIDE 77

Dimensionality Reduction

  • Linear methods:
  • Principal Component Analysis (PCA) – Hotelling[33]
  • Multidimensional Scaling (MDS) – Young[38]

  • Nonnegative Matrix Factorization (NMF) – Lee[99]
  • Nonlinear methods:
  • Locally Linear Embeddings (LLE) – Roweis[00]
  • IsoMap – Tenenbaum[00]