CS171 Visualization Alexander Lex alex@seas.harvard.edu Tables - - PowerPoint PPT Presentation



slide-1
SLIDE 1

CS171 Visualization

Alexander Lex alex@seas.harvard.edu

[xkcd]

Tables Part II

slide-2
SLIDE 2

Next Week

Reading: VAD, Chapter 9
Lecture 11: Text & Documents
Lecture 12: Homework 3 Design Studio
Sections: view coordination, linking & brushing

Updates

Design Studio moved to Thursday
Project Proposal moved to HW 4

slide-3
SLIDE 3

Tables & Multi-Dimensional Data

slide-4
SLIDE 4

Comparisons

slide-5
SLIDE 5

Direction

Nicolas Rapp

slide-6
SLIDE 6

Plot Change Instead

https://eagereyes.org/basics/baselines

slide-7
SLIDE 7

Trends Over Time

http://xkcd.com/605/

slide-8
SLIDE 8

Bars vs. Lines

Lines imply connections & sampling from continuous data.
Do not use for categorical data.

Zacks 1999

slide-9
SLIDE 9

Baseline Problem (again)

https://eagereyes.org/basics/baselines

[Figure panels: True Baseline, Clipped Baseline, Plotting Change]
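The "plotting change" option can be sketched as follows. This is a minimal illustration, assuming a hypothetical series and taking the first value as the reference; the function name `percent_change` is made up for this example.

```python
def percent_change(values):
    """Return each value's change relative to the first value, in percent."""
    base = values[0]
    if base == 0:
        raise ValueError("the first value must be non-zero to serve as a reference")
    return [100.0 * (v - base) / base for v in values]


# Hypothetical series whose variation is hard to see against a true zero baseline.
series = [1000.0, 1010.0, 1005.0, 1020.0]
changes = percent_change(series)
print(changes)
```

Plotting `changes` instead of `series` puts the natural baseline at 0% change, so the axis does not need to be clipped to make the variation visible.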

slide-10
SLIDE 10

Linear vs. Logarithmic Scale

Linear Scale Log Scale

http://finance.yahoo.com/echarts?s=AAPL

Apple Stock Price

http://xkcd.com/1162/
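A small sketch of what a log scale does to positions, assuming hypothetical prices: equal ratios between values become equal distances along the axis. The helper `log_positions` is an illustrative name, not part of any charting library.

```python
import math


def log_positions(values):
    """Map positive values to their positions on a base-10 log axis."""
    return [math.log10(v) for v in values]


# Hypothetical prices, each ten times the previous one.
prices = [1.0, 10.0, 100.0]
positions = log_positions(prices)

# Distance between neighbouring values on the log axis.
steps = [b - a for a, b in zip(positions, positions[1:])]
print(steps)
```

On a linear axis the same values would be 9 and 90 units apart; on the log axis both steps have the same length, which is why log scales suit data where relative change matters.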

slide-11
SLIDE 11

Aspect Ratios

Rule of Thumb: Banking to 45° (average line slope: 45°)

eagereyes.org
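One simple way to approximate banking to 45° is to pick the aspect ratio so that the median absolute slope of the line segments is drawn at 45°. The sketch below assumes that heuristic (other banking variants exist); `banked_aspect` is a hypothetical helper name.

```python
import statistics


def banked_aspect(xs, ys):
    """Return height/width so the median absolute segment slope is drawn at 45 degrees.

    Assumes the plot fills the full x and y data ranges.
    """
    slopes = []
    for i in range(len(xs) - 1):
        dx = xs[i + 1] - xs[i]
        dy = ys[i + 1] - ys[i]
        if dx != 0 and dy != 0:  # skip vertical and flat segments
            slopes.append(abs(dy / dx))
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    median_slope = statistics.median(slopes)
    # On screen, a data slope s becomes s * (x_range / y_range) * (height / width).
    # Setting that to 1 for the median slope gives the ratio below.
    return y_range / (x_range * median_slope)


# A straight line y = 2x should be drawn on a square plot (ratio 1).
print(banked_aspect([0, 1, 2, 3], [0, 2, 4, 6]))
```

The function raises `statistics.StatisticsError` if every segment is flat or vertical, since there is then no slope to bank.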

slide-12
SLIDE 12

Correlations

slide-13
SLIDE 13

Scatterplots

slide-14
SLIDE 14

Overplotting

alpha = 1/100
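A rough sketch of why a very small alpha helps with overplotting, assuming standard "over" compositing of identically colored marks: the accumulated opacity of n stacked marks is 1 - (1 - alpha)^n. The function name is illustrative.

```python
def stacked_opacity(alpha, n):
    """Opacity after drawing n marks of the same color on top of each other."""
    return 1.0 - (1.0 - alpha) ** n


alpha = 1 / 100
for n in (1, 10, 100, 500):
    print(n, round(stacked_opacity(alpha, n), 3))
```

With alpha = 1/100 a single point is nearly invisible, while dense regions with hundreds of overlapping points approach full opacity, so density becomes readable.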

slide-15
SLIDE 15

Compositions

slide-16
SLIDE 16

Stacked Bar Chart

slide-17
SLIDE 17

Comparison of bar chart types

Small Multiples
Stacked Bar Chart
Pie Chart
Layered Bar Chart
Grouped Bar Chart

Streit & Gehlenborg, PoV, Nature Methods, 2014

slide-18
SLIDE 18

Stacked Area Chart

http://stackoverflow.com/questions/2225995/how-can-i-create-stacked-line-graph-with-matplotlib

slide-19
SLIDE 19

100% Stacked Area Chart

http://stackoverflow.com/questions/16875546/create-a-100-stacked-area-chart-with-matplotlib

slide-20
SLIDE 20

Stacked Area vs. Line Graphs

leancrew.com & Practically Efficient

slide-21
SLIDE 21

Distributions

slide-22
SLIDE 22

Histogram

#bins hard to predict
make interactive!
rule of thumb: #bins = sqrt(n)

[Figure: histograms of passenger age with 10 bins and 20 bins; x-axis: age, y-axis: # passengers]
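The sqrt(n) rule of thumb can be sketched as below. Rounding up with `ceil` and using equal-width bins over the data range are assumptions of this example, not something the slide specifies.

```python
import math


def sqrt_bin_count(n):
    """Rule-of-thumb number of bins for n values (rounded up, at least 1)."""
    return max(1, math.ceil(math.sqrt(n)))


def histogram(values, bins):
    """Count values in equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value falls on the upper edge; keep it in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical ages, 16 values, so the rule suggests 4 bins.
ages = list(range(16))
bins = sqrt_bin_count(len(ages))
print(bins, histogram(ages, bins))
```

Since the right bin count is hard to predict, an interactive view would typically use this value only as the starting position of a slider.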

slide-23
SLIDE 23

Box Plots

aka Box-and-Whisker Plot

Wikipedia
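A minimal sketch of the numbers behind a box plot. It assumes the common convention of whiskers reaching the most extreme values within 1.5 × IQR of the box; other whisker conventions exist, and the function name is hypothetical.

```python
import statistics


def box_plot_stats(values):
    """Quartiles, whisker ends, and outliers using the 1.5 * IQR convention."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical sample with one extreme value.
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```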

slide-24
SLIDE 24

Comparison

Streit & Gehlenborg, PoV, Nature Methods, 2014

slide-25
SLIDE 25

Showing Expected Values & Uncertainty

Error Bars Considered Harmful: Exploring Alternate Encodings for Mean and Error
Michael Correll and Michael Gleicher

slide-26
SLIDE 26

High-dimensional Data

slide-27
SLIDE 27

What is High-dimensional Data?

Tabular data, containing

rows (items)
columns (attributes or dimensions)
rows >> columns

       Age  Gender  Height
Bob    25   M       181
Alice  22   F       185
Chris  19   M       175

slide-28
SLIDE 28

High-Dimensional Data Visualization

How many dimensions?

~50 – tractable with “just” vis
~1000 – need analytical methods

How many records?

~1000 – “just” vis is fine
>> 10,000 – need analytical methods

Homogeneity

Same data type?
Same scales?

       Age  Gender  Height
Bob    25   M       181
Alice  22   F       185
Chris  19   M       175

       BPM 1  BPM 2  BPM 3
Bob    65     120    145
Alice  80     135    185
Chris  45     115    135

slide-29
SLIDE 29

Analytic Component

no / little analytics
strong analytics component

Scatterplot Matrices [Bostock]
Parallel Coordinates [Bostock]
Pixel-based visualizations / heat maps
Multidimensional Scaling

[Doerk 2011] [Chuang 2012]

slide-30
SLIDE 30

Geometric Methods

slide-31
SLIDE 31

Parallel Coordinates (PC)

Axes represent attributes
Lines connecting axes represent items

Inselberg 1985

[Figure: items A and B shown on axes X and Y]

slide-32
SLIDE 32

Parallel Coordinates

Each axis represents a dimension
Lines connecting axes represent records
Suitable for

all tabular data types
heterogeneous data
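The axis/line mapping can be sketched as follows: each attribute gets its own axis with its own scale, and each record becomes a polyline across the axes. This example handles only quantitative attributes and reuses the Age/Height values from the earlier table; the function name and the (axis index, position) output format are assumptions for illustration.

```python
def parallel_coordinates(records, attributes):
    """Return one polyline per record as (axis_index, position in [0, 1]) points."""
    # Each axis is scaled independently, which is what lets heterogeneous
    # attributes share one plot.
    ranges = {
        a: (min(r[a] for r in records), max(r[a] for r in records))
        for a in attributes
    }
    lines = []
    for r in records:
        points = []
        for axis_index, a in enumerate(attributes):
            lo, hi = ranges[a]
            position = 0.5 if hi == lo else (r[a] - lo) / (hi - lo)
            points.append((axis_index, position))
        lines.append(points)
    return lines


records = [
    {"name": "Bob", "Age": 25, "Height": 181},
    {"name": "Alice", "Age": 22, "Height": 185},
    {"name": "Chris", "Age": 19, "Height": 175},
]
lines = parallel_coordinates(records, ["Age", "Height"])
for record, line in zip(records, lines):
    print(record["name"], line)
```

A categorical attribute such as Gender would need its own mapping from categories to positions on the axis, which this sketch leaves out.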

slide-33
SLIDE 33

PC Limitation: Scalability to Many Dimensions

500 axes

slide-34
SLIDE 34

PC Limitation: Scalability to Many Items

Solutions:

Transparency
Bundling, Clustering
Sampling

slide-35
SLIDE 35

PC Limitations

Correlations only between adjacent axes

Solution: Interaction

Brushing
Let user change order

slide-36
SLIDE 36

PC Limitation: Ambiguity

Solutions:

Brushing
Curves

Graham and Kennedy 2003

slide-37
SLIDE 37

Parallel Coordinates

Shows primarily relationships between adjacent axes
Limited scalability (~50 dimensions, ~1-5k records)

Transparency of lines

Interaction is crucial

Axis reordering
Brushing
Filtering

Algorithmic support:

Choosing dimensions
Choosing order
Clustering & aggregating records

http://bl.ocks.org/jasondavies/1341281

slide-38
SLIDE 38

Star Plot

Similar to parallel coordinates
Radiate from a common origin

[Coekin 1969]

http://www.itl.nist.gov/div898/handbook/eda/section3/starplot.htm
http://start1.jpl.nasa.gov/caseStudies/autoTool.cfm

http://bl.ocks.org/kevinschaul/raw/8833989/

slide-39
SLIDE 39

Multiple Line Charts

http://square.github.io/cubism/

slide-40
SLIDE 40

Combining Various Charts

slide-41
SLIDE 41

Scatterplot Matrices (SPLOM)

Matrix of size d*d
Each row/column is one dimension
Each cell plots a scatterplot of two dimensions
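The d*d layout can be sketched as a dictionary of cells, one per (row dimension, column dimension) pair. The data structure and the hypothetical records are assumptions for illustration; a real implementation would hand each cell's points to a plotting routine.

```python
def splom_cells(records, dimensions):
    """Build the d*d cells of a scatterplot matrix.

    Each cell holds the (x, y) points for one pair of dimensions:
    x comes from the column's dimension, y from the row's dimension.
    """
    cells = {}
    for row_dim in dimensions:
        for col_dim in dimensions:
            cells[(row_dim, col_dim)] = [(r[col_dim], r[row_dim]) for r in records]
    return cells


# Hypothetical records with three quantitative dimensions.
records = [
    {"a": 1, "b": 10, "c": 100},
    {"a": 2, "b": 20, "c": 300},
    {"a": 3, "b": 15, "c": 200},
]
cells = splom_cells(records, ["a", "b", "c"])
print(len(cells), cells[("b", "a")])
```

The quadratic growth in cells is what limits the technique to a modest number of dimensions.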

slide-42
SLIDE 42

Scatterplot Matrices

Limited scalability (~20 dimensions, ~500-1k records)
Brushing is important
Often combined with “Focus Scatterplot” as F+C technique

Algorithmic approaches:

Clustering & aggregating records
Choosing dimensions
Choosing order

slide-43
SLIDE 43

SPLOM Aggregation - Heat Map

Datavore: http://vis.stanford.edu/projects/datavore/splom/

slide-44
SLIDE 44

SPLOM F+C, Navigation

[Elmqvist]

slide-45
SLIDE 45
slide-46
SLIDE 46

Flexible Linked Axes (FLINA)

Claessen & van Wijk 2011

slide-47
SLIDE 47

Web-based implementation of FLINA concept

http://vis.pku.edu.cn/mddv/val/

slide-48
SLIDE 48

Connected Charts

Viau & McGuffin 2012

slide-49
SLIDE 49
Domino

[Figure: Domino example linking an Artists table and a Countries table; attributes shown include origin, continent, studio albums (count), first album (year), number one hits, start of career (year), career status, gender, sold albums (absolute), and population (million)]

Gratzl et al. 2014

slide-50
SLIDE 50

Data Reduction

Sampling

Don’t show every element, show a (random) subset
Efficient for large datasets
Apply only for display purposes
Outlier-preserving approaches

Filtering

Define criteria to remove data, e.g.,

minimum variability
> / < / = specific value for one dimension
consistency in replicates, …

Can be interactive, combined with sampling

[Ellis & Dix, 2006]
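Sampling and the first two filter criteria can be sketched as below. All function names and the example data are hypothetical; using population variance as the measure of "variability" is an assumption of this example.

```python
import random
import statistics


def sample_rows(rows, k, seed=0):
    """Random subset of at most k rows, for display purposes only."""
    rng = random.Random(seed)  # fixed seed so the view is reproducible
    return rng.sample(rows, min(k, len(rows)))


def filter_by_value(rows, dimension, predicate):
    """Keep rows whose value in one dimension satisfies the predicate."""
    return [r for r in rows if predicate(r[dimension])]


def filter_by_min_variability(columns, min_variance):
    """Keep columns whose variance reaches the threshold."""
    return {
        name: values
        for name, values in columns.items()
        if statistics.pvariance(values) >= min_variance
    }


rows = [{"id": i, "age": 18 + i % 50} for i in range(1000)]
shown = sample_rows(rows, 100)
adults_over_60 = filter_by_value(rows, "age", lambda age: age > 60)
informative = filter_by_min_variability(
    {"constant": [5, 5, 5, 5], "varying": [1, 9, 2, 8]}, min_variance=1.0
)
print(len(shown), len(adults_over_60), list(informative))
```

Plain random sampling can drop outliers; the outlier-preserving approaches mentioned above would need a different selection rule than `sample_rows`.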

slide-51
SLIDE 51

Filter Example

http://square.github.io/crossfilter/

slide-52
SLIDE 52

Pixel Based Methods

slide-53
SLIDE 53

Pixel Based Displays

Each cell is a “pixel”, value encoded in color / value
Meaning derived from ordering
If no ordering inherent, clustering is used
Scalable – 1 px per item
Good for homogeneous data

same scale & type

[Gehlenborg & Wong 2012]
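A minimal sketch of the value-to-pixel mapping, assuming homogeneous data so that one global scale applies to the whole table. It maps to 0-255 gray levels for simplicity; the function name is illustrative and the values are the BPM table from the earlier slide.

```python
def to_gray_pixels(matrix):
    """Map every cell of a homogeneous table to a 0-255 gray level."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant table
    return [[round(255 * (v - lo) / span) for v in row] for row in matrix]


# BPM 1-3 for Bob, Alice, Chris.
bpm = [
    [65, 120, 145],
    [80, 135, 185],
    [45, 115, 135],
]
pixels = to_gray_pixels(bpm)
for row in pixels:
    print(row)
```

A single global scale only makes sense because all columns share the same type and unit; mixing Age, Gender, and Height in one such scale would not be meaningful.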

slide-54
SLIDE 54

3D Pitfall: Occlusion & Perspective

[Gehlenborg and Wong, Nature Methods, 2012]

slide-55
SLIDE 55

3D Pitfall: Occlusion & Perspective

[Gehlenborg and Wong, Nature Methods, 2012]

slide-56
SLIDE 56

Heterogeneous Data?

[Verhaak 2012]

slide-57
SLIDE 57

Bad Color Mapping

slide-58
SLIDE 58

Good Color Mapping

slide-59
SLIDE 59

Color is relative!

slide-60
SLIDE 60

Clustering

Classification of items into “similar” bins
Based on similarity measures

Euclidean distance, Pearson correlation, ...

Partitional Algorithms

divide data into set of bins
# bins either manually set (e.g., k-means) or automatically determined (e.g., affinity propagation)

Hierarchical Algorithms

Produce “similarity tree” – dendrogram

Bi-Clustering

Clusters dimensions & records

Fuzzy clustering

allows occurrence of elements in multiple clusters
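As an illustration of a partitional algorithm with a manually set number of bins, here is a minimal k-means sketch using Euclidean distance. It assumes the initial centroids are supplied by the caller and runs a fixed number of iterations; production code would add smarter initialization and a convergence check.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd-style k-means: assign points to the nearest centroid, then re-average."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]
    return centroids, labels


# Hypothetical 2D points forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, [(0, 0), (10, 10)])
print(centroids, labels)
```

Swapping `math.dist` for a correlation-based measure would change which items count as "similar", which is why the choice of similarity measure matters.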

slide-61
SLIDE 61

Clustering Applications

Clusters can be used to

order (pixel based techniques)
brush (geometric techniques)
aggregate

Aggregation

cluster more homogeneous than whole dataset
statistical measures, distributions, etc. more meaningful
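A small sketch of aggregation by cluster, assuming cluster labels are already available (for example from a clustering step). The values, labels, and function name are hypothetical.

```python
import statistics
from collections import defaultdict


def aggregate_by_cluster(values, labels):
    """Summarize values per cluster label with count, mean, and standard deviation."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return {
        label: {
            "n": len(group),
            "mean": statistics.mean(group),
            "stdev": statistics.pstdev(group),
        }
        for label, group in groups.items()
    }


values = [1, 2, 3, 101, 102, 103]
labels = [0, 0, 0, 1, 1, 1]
summary = aggregate_by_cluster(values, labels)
overall_stdev = statistics.pstdev(values)
print(summary, overall_stdev)
```

The overall mean of this data (52) describes none of the items well, while each cluster mean sits in the middle of a tight group, which is the sense in which per-cluster statistics are more meaningful.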

slide-62
SLIDE 62

Clustered Heat Map

slide-63
SLIDE 63

F+C Approach, with Dendrograms

[Lex, PacificVis 2010]

slide-64
SLIDE 64

Cluster Comparison

slide-65
SLIDE 65

Aggregation

slide-66
SLIDE 66

Design Critique

slide-67
SLIDE 67

EdgeMaps: http://goo.gl/q8Cv7t

http://mariandoerk.de/edgemaps/demo/#music

slide-68
SLIDE 68

Dimensionality Reduction

slide-69
SLIDE 69

Dimensionality Reduction

Reduce high dimensional to lower dimensional space
Preserve as much of variation as possible
Plot lower dimensional space

Principal Component Analysis (PCA)

linear mapping, by order of variance
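A rough pure-Python sketch of finding the first principal component, the direction of largest variance. It uses power iteration on the covariance matrix, which is a simple stand-in chosen for this example; libraries typically use an eigendecomposition or SVD instead. The function name and data are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, projected values) for the direction of largest variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly apply the covariance matrix and renormalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance along the current vector
        v = [x / norm for x in w]
    # Linear mapping: project each centered row onto the component.
    projected = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, projected


# Hypothetical 2D data lying on the line y = 2x.
direction, projected = first_principal_component([(0, 0), (1, 2), (2, 4), (3, 6)])
print(direction, projected)
```

Power iteration can fail if the starting vector happens to be orthogonal to the leading component, so this sketch is for illustration only. Further components would be found in decreasing order of variance, each orthogonal to the previous ones.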

slide-70
SLIDE 70

PCA Example – CS 171 Project 2013

[Mercer & Pandian] http://mu-8.com/

slide-71
SLIDE 71

Multidimensional Scaling

Nonlinear, better suited for some datasets
Popular for text analysis

[Doerk 2011]

slide-72
SLIDE 72

Can we Trust Dimensionality Reduction?

http://www-nlp.stanford.edu/projects/dissertations/browser.html

Topical distances between departments in a 2D projection
Topical distances between the selected Petroleum Engineering department and the others

[Chuang et al., 2012]
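One way to probe how far a projection can be trusted is to compare pairwise distances before and after the reduction. The score below is a simple normalized distance error defined for this example (similar in spirit to the stress measures used with MDS, but not taken from the cited work); 0 means all pairwise distances are preserved.

```python
import math
from itertools import combinations


def distance_distortion(high, low):
    """Normalized error between pairwise distances in the original and projected space."""
    error = 0.0
    total = 0.0
    for i, j in combinations(range(len(high)), 2):
        d_high = math.dist(high[i], high[j])
        d_low = math.dist(low[i], low[j])
        error += (d_high - d_low) ** 2
        total += d_high ** 2
    return math.sqrt(error / total) if total else 0.0


# Hypothetical 3D points and a 2D projection that simply drops the third coordinate.
high = [(0, 0, 0), (1, 0, 0), (0, 0, 5)]
low = [(0, 0), (1, 0), (0, 0)]
print(distance_distortion(high, low))
```

In this example the third point is far from the first in 3D but lands on top of it in 2D, so the 2D view suggests a similarity that does not exist in the original data.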