CS-5630 / CS-6630 Visualization for DataScience Tables Alexander - - PowerPoint PPT Presentation



slide-1
SLIDE 1

CS-5630 / CS-6630 Visualization for Data Science: Tables

Alexander Lex alex@sci.utah.edu

[xkcd]

slide-2
SLIDE 2

Organizational

  • Review exam in my office hours
  • HW Lab: Wed, 6pm, L110
  • Make sure to form your project teams! If you can’t find a team, e-mail me
  • Develop project idea
  • Set up your GitHub repo
  • Guest lecture next Thursday
  • Project feedback the Tuesday after that
  • Need to submit this info by Friday!

https://goo.gl/4UrmjB

slide-3
SLIDE 3

dataset types

slide-4
SLIDE 4

Exercise: Sketch 2 Ways to Vis. Each Table

          BPM T1  BPM T2  BPM T3
Amy           90     130     150
Basil         70     110     109
Clara         60     140     141
Desmond       84     100     108
Charles       81     110     130

          Age  Best 100 m  Furthest Jump  Sex
Amy        16       13.2             5.2  F
Basil      18       12.4             4.2  F
Clara      14       14.1             2.5  F
Desmond    22       10.01            6.3  M
Charles    19       11.3             5.3  M

slide-5
SLIDE 5
slide-6
SLIDE 6

Spatial channels are the most effective for all attribute types

slide-7
SLIDE 7

Recall: attribute semantics

when we arrange tabular data, attributes are chosen to be keys and values

multidimensional

slide-8
SLIDE 8

Scale of Tables

Need different approaches for “normal” and “high-dimensional” tables.

Homogeneity

Same data type? Same scales?

        Age  Gender  Height
Bob      25  M          181
Alice    22  F          185
Chris    19  M          175

        BPM 1  BPM 2  BPM 3
Bob        65    120    145
Alice      80    135    185
Chris      45    115    135

How many dimensions?

  • ~50 – tractable with “just” vis
  • ~1000 – need analytical methods

How many records?

  • ~1000 – “just” vis is fine
  • >> 10,000 – need analytical methods

slide-9
SLIDE 9

Analytic Component

Spectrum: from no / little analytics to a strong analytics component

Scatterplot Matrices


[Bostock]

Parallel Coordinates


[Bostock]

Pixel-based visualizations / heat maps

Multidimensional Scaling

[Doerk 2011] [Chuang 2012]

slide-10
SLIDE 10

Express Values

No Keys

slide-11
SLIDE 11

Encode using zero keys: scatterplots

Axes: Infant Mortality, Life Expectancy

slide-12
SLIDE 12

Regression Lines

y ∼ β₀ + β₁x

Approach: use least squares to minimize the sum of the squares of the errors
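As a minimal sketch of the least-squares approach above, the closed-form solution for a single predictor can be written in a few lines. The function name `least_squares_line` is hypothetical, and the sample data reuses the Age and Best 100 m columns from the exercise table purely for illustration.

```python
def least_squares_line(xs, ys):
    """Return (beta0, beta1) minimizing sum((y - (beta0 + beta1 * x)) ** 2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    beta1 = sxy / sxx                # slope
    beta0 = mean_y - beta1 * mean_x  # intercept
    return beta0, beta1


# Illustrative data: Age vs. Best 100 m from the exercise table.
ages = [16, 18, 14, 22, 19]
best_100m = [13.2, 12.4, 14.1, 10.01, 11.3]
beta0, beta1 = least_squares_line(ages, best_100m)
print(f"best_100m ~ {beta0:.2f} + {beta1:.2f} * age")
```

The fitted line is what would be drawn on top of the scatterplot; it summarizes the trend but says nothing about how well a line describes the data.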

slide-13
SLIDE 13

Anscombe’s Quartet

slide-14
SLIDE 14

Encode one Key Attribute

slide-15
SLIDE 15

Encode one key attribute:
 bar, dot, & line charts

slide-16
SLIDE 16

Encode Multiple Key Attributes

slide-17
SLIDE 17
slide-18
SLIDE 18

Stacked Bar Chart

  • Keys: Class, Survival
  • Class is spatial
  • Survival is color
  • Left: absolute values
  • Right: proportional values

slide-19
SLIDE 19

Comparison of bar chart types

  • Stacked Bar Chart
  • Pie Chart
  • Layered Bar Chart
  • Grouped Bar Chart

Streit & Gehlenborg, PoV, Nature Methods, 2014

slide-20
SLIDE 20

Stacked Area Chart

slide-21
SLIDE 21

100% Stacked Area Chart

slide-22
SLIDE 22

Stacked Area vs. Line Graphs

leancrew.com & Practically Efficient

slide-23
SLIDE 23

Can you spot the trends?

VizWiz, A. Kriebel

slide-24
SLIDE 24

Tabular / Grid / Matrix - Based Representations

slide-25
SLIDE 25

Tabular Representation

  • Like a spreadsheet: each variable in its own column
  • Visual encodings to make it scalable

slide-26
SLIDE 26

Table Lens

Interactive table-based representation

Rao & Card 1994

slide-27
SLIDE 27

Taggle

slide-28
SLIDE 28

Bertifier

  • Matrix/Table representation
  • Authoring Interface

http://www.aviz.fr/bertifier
Charles Perin, Pierre Dragicevic and Jean-Daniel Fekete

slide-29
SLIDE 29

Multiple Line Charts

http://square.github.io/cubism/

[Heer, Sizing the Horizon, 2009]

slide-30
SLIDE 30

Combining Various Charts

slide-31
SLIDE 31

LineUp

Video at http://lineup.caleydo.org

slide-32
SLIDE 32

Rankings are popular


slide-33
SLIDE 33

University        Rank  Score
Harvard, USA      2.     84.2
Oxford, UK        5.     44.0
Cambridge, UK     4.     64.3
Princeton, USA    3.     73.8
MIT, USA          1.     89.4

slide-34
SLIDE 34


Support Multiple Attributes

slide-35
SLIDE 35

University        Rank
Harvard, USA      2.
Oxford, UK        5.
Cambridge, UK     4.
Princeton, USA    3.
MIT, USA          1.

Further columns: Score, A, B, C

Score = f(A, B, C)

slide-36
SLIDE 36

Combiner functions: f(A, B, C)

  • (Weighted) sum: Score = w_a·A + w_b·B + w_c·C
  • Maximum: Score = max(A, B, C)
  • Product
  • Nesting
  • …

Serial, Parallel, Complex Combiners
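The combiner functions listed above can be sketched as plain functions over a tuple of attribute values. This is only an illustration: the attribute values, the weights, and all function names are hypothetical and are not taken from LineUp.

```python
from math import prod


def weighted_sum(values, weights):
    """Score = w_a*A + w_b*B + w_c*C."""
    return sum(w * v for w, v in zip(weights, values))


def maximum(values):
    """Score = max(A, B, C)."""
    return max(values)


def product(values):
    """Score = A * B * C."""
    return prod(values)


def nested(values, weights):
    """Example of nesting: weighted sum of A and max(B, C)."""
    a, b, c = values
    return weighted_sum((a, max(b, c)), weights)


def rank_by(items, score_fn):
    """Return item names ordered from highest to lowest score."""
    return sorted(items, key=lambda name: score_fn(items[name]), reverse=True)


# Hypothetical, already normalized attribute values (A, B, C) per item.
items = {
    "Harvard, USA": (0.9, 0.6, 0.8),
    "Oxford, UK": (0.4, 0.9, 0.3),
    "MIT, USA": (0.8, 0.8, 0.9),
}
weights = (0.5, 0.3, 0.2)  # hypothetical weights w_a, w_b, w_c

print(rank_by(items, lambda v: weighted_sum(v, weights)))
print(rank_by(items, maximum))
```

Changing the combiner or the weights can change the order, which is why an interactive tool that exposes both is useful.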

slide-37
SLIDE 37

Serial Combiner

University        Rank
Harvard, USA      2.
Oxford, UK        5.
Cambridge, UK     4.
Princeton, USA    3.
MIT, USA          1.

Attribute columns: A, B, C

Score = w_a·A + w_b·B + w_c·C (as Stacked Bar)

slide-38
SLIDE 38

Serial Combiner

University        Rank
Harvard, USA      2.
Oxford, UK        5.
Cambridge, UK     4.
Princeton, USA    3.
MIT, USA          1.

Attribute columns: A, B, C

Score = w_a·A + w_b·B + w_c·C (as Stacked Bar)

slide-39
SLIDE 39

Serial Combiner

University        Rank
Harvard, USA      2.
Oxford, UK        5.
Cambridge, UK     4.
Princeton, USA    3.
MIT, USA          1.

Attribute columns: A, B, C

Score = w_a·A + w_b·B + w_c·C (as Stacked Bar)

slide-40
SLIDE 40
slide-41
SLIDE 41

Flexible Mapping of
 Attributes to Scores

slide-42
SLIDE 42

[Figure: attribute values from Min to Max mapped to scores from 1 to 100]

slide-43
SLIDE 43

[Figure: score axis from 1 to 100]

slide-44
SLIDE 44

[Figure: score axis from 1 to 100]

slide-45
SLIDE 45


slide-46
SLIDE 46


Compare Rankings

slide-47
SLIDE 47

Bump Charts

[Figure: two rankings (Rank, Score) of Harvard, Oxford, Cambridge, Princeton, and MIT shown side by side, with rank changes annotated: (+1), (-2), (+1)]

slide-48
SLIDE 48

Bump Charts

[Figure: the same two rankings with Harvard, USA highlighted at ranks 4. and 2., annotated (-2); other annotated changes: (+1), (+1)]
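The rank changes annotated in a bump chart can be derived from two score tables. A minimal sketch follows; the "before" scores are the ones from the ranking slide, while the "after" scores, the function names, and the sign convention (positive means the item moved up) are assumptions made for illustration. Ties are not handled.

```python
def ranks(scores):
    """Map each name to its rank, where rank 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


def rank_changes(before, after):
    """Positive value: the item moved up; negative: it moved down."""
    rank_before, rank_after = ranks(before), ranks(after)
    return {name: rank_before[name] - rank_after[name] for name in rank_before}


# Scores from the ranking slide.
before = {
    "Harvard, USA": 84.2,
    "Oxford, UK": 44.0,
    "Cambridge, UK": 64.3,
    "Princeton, USA": 73.8,
    "MIT, USA": 89.4,
}
# Hypothetical scores after changing the weights.
after = {
    "Harvard, USA": 60.0,
    "Oxford, UK": 44.0,
    "Cambridge, UK": 70.0,
    "Princeton, USA": 75.0,
    "MIT, USA": 90.0,
}

for name, delta in rank_changes(before, after).items():
    print(f"{name}: {delta:+d}")
```

Each line segment in the bump chart connects an item's position in the first ranking to its position in the second; the delta is the vertical distance in rank units.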

slide-49
SLIDE 49

Video showing:

  • Creating snapshot for comparison
  • Play with weights
  • Show delta
  • Select by clicking on slopegraph
slide-50
SLIDE 50

http://lineup.caleydo.org
http://taggle.caleydoapp.org


slide-51
SLIDE 51

Pixel Based Displays

  • Each cell is a “pixel”, value encoded in color / value
  • Ordering critical for interpretation
  • If no ordering inherent, clustering is used
  • Scalable – 1 px per item
  • Good for homogeneous data (same scale & type)

[Gehlenborg & Wong 2012]

slide-52
SLIDE 52

3D Pitfall: Occlusion & Perspective

[Gehlenborg and Wong, Nature Methods, 2012]

slide-53
SLIDE 53

3D Pitfall: Occlusion & Perspective

[Gehlenborg and Wong, Nature Methods, 2012]

slide-54
SLIDE 54

Heterogeneous Data?

[Verhaak 2012]

slide-55
SLIDE 55

Bad Color Mapping

slide-56
SLIDE 56

Good Color Mapping

slide-57
SLIDE 57

Color is relative!

slide-58
SLIDE 58

Clustered Heat Map

slide-59
SLIDE 59

Design Critique

slide-60
SLIDE 60

Document: https://goo.gl/W6w0iI Website: http://goo.gl/D3mIsy

slide-61
SLIDE 61

Context / Critiques

https://vimeo.com/127205447
https://community.jmp.com/t5/JMP-Blog/Graph-makeover-3-D-yield-curve-surface/ba-p/30573
http://www.visualisingdata.com/2015/03/when-3d-works/

slide-62
SLIDE 62

Spatial Axis Orientation

slide-63
SLIDE 63

spatial axis orientation

slide-64
SLIDE 64
slide-65
SLIDE 65

Spatial Axis Orientation

Scatterplot Matrix

slide-66
SLIDE 66

Scatterplot Matrices (SPLOM)

  • Matrix of size d*d
  • Each row/column is one dimension
  • Each cell plots a scatterplot of two dimensions
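To make the d*d structure concrete, the sketch below enumerates the cells of a scatterplot matrix and the point list each cell would plot. It only prepares data and draws nothing. The function name `splom_cells` is hypothetical, and the table reuses the numeric columns of the exercise table for illustration.

```python
from itertools import product


def splom_cells(table):
    """Return {(row, col): [(x, y), ...]} for every pair of dimensions.

    The row's dimension supplies the y values and the column's dimension
    supplies the x values, so a table with d dimensions yields d * d cells.
    """
    dims = list(table)
    cells = {}
    for (row, y_dim), (col, x_dim) in product(enumerate(dims), repeat=2):
        cells[(row, col)] = list(zip(table[x_dim], table[y_dim]))
    return cells


# Numeric columns of the exercise table (one entry per person).
table = {
    "Age": [16, 18, 14, 22, 19],
    "Best 100 m": [13.2, 12.4, 14.1, 10.01, 11.3],
    "Furthest Jump": [5.2, 4.2, 2.5, 6.3, 5.3],
}

cells = splom_cells(table)
print(len(cells))      # number of cells, d * d
print(cells[(0, 1)])   # x = Best 100 m, y = Age
```

The quadratic growth in cells is what limits the technique to a modest number of dimensions.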

slide-67
SLIDE 67

Scatterplot Matrices

  • Limited scalability (~20 dimensions, ~500-1k records)
  • Brushing is important
  • Often combined with “Focus Scatterplot” as a focus+context (F+C) technique
  • Algorithmic approaches:
      Clustering & aggregating records
      Choosing dimensions
      Choosing order

slide-68
SLIDE 68

SPLOM Aggregation - Heat Map

Datavore: http://vis.stanford.edu/projects/datavore/splom/

slide-69
SLIDE 69

SPLOM F+C, Navigation

[Elmqvist]

slide-70
SLIDE 70

Spatial Axis Orientation

Parallel Coordinates

slide-71
SLIDE 71

Parallel Coordinates (PC)

  • Axes represent attributes
  • Lines connecting axes represent items

Inselberg 1985

[Figure: items A and B drawn against axes X and Y]

slide-72
SLIDE 72

Parallel Coordinates

  • Each axis represents a dimension
  • Lines connecting axes represent records
  • Suitable for
      all tabular data types
      heterogeneous data
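As a rough sketch of the encoding described above, each record becomes a polyline with one vertex per axis. The version below scales every axis independently to the range 0 to 1 (min-max scaling is an assumption; other scalings are possible), and the function name `parallel_coordinates` is hypothetical. The records reuse numeric values from the Bob / Alice / Chris tables.

```python
def parallel_coordinates(records, dims):
    """Return one polyline per record as [(axis_index, scaled_value), ...].

    Each axis is scaled independently so that its minimum maps to 0.0
    and its maximum to 1.0; a constant axis maps every record to 0.5.
    """
    low = {d: min(r[d] for r in records) for d in dims}
    high = {d: max(r[d] for r in records) for d in dims}
    polylines = []
    for record in records:
        line = []
        for axis_index, d in enumerate(dims):
            span = high[d] - low[d]
            scaled = (record[d] - low[d]) / span if span else 0.5
            line.append((axis_index, scaled))
        polylines.append(line)
    return polylines


records = [
    {"name": "Bob", "Age": 25, "Height": 181, "BPM 1": 65},
    {"name": "Alice", "Age": 22, "Height": 185, "BPM 1": 80},
    {"name": "Chris", "Age": 19, "Height": 175, "BPM 1": 45},
]

lines = parallel_coordinates(records, ["Age", "Height", "BPM 1"])
for record, line in zip(records, lines):
    print(record["name"], line)
```

Because every axis has its own scale, attributes with different units can share one plot, which is why the technique suits heterogeneous data.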

slide-73
SLIDE 73

PC Limitation: 
 Scalability to Many Dimensions

500 axes

slide-74
SLIDE 74

PC Limitation: Scalability to Many Items

Solutions:

  • Transparency
  • Bundling, Clustering
  • Sampling

slide-75
SLIDE 75

PC Limitations 


Correlations only between adjacent axes

Solution: Interaction

  • Brushing
  • Let user change order

slide-76
SLIDE 76

PC Limitation: 
 Ambiguity

Solutions:

  • Brushing
  • Curves

Graham and Kennedy 2003

slide-77
SLIDE 77

Parallel Coordinates

  • Shows primarily relationships between adjacent axes
  • Limited scalability (~50 dimensions, ~1-5k records)
      Transparency of lines
  • Interaction is crucial
      Axis reordering
      Brushing
      Filtering
  • Algorithmic support:
      Choosing dimensions
      Choosing order
      Clustering & aggregating records

http://bl.ocks.org/jasondavies/1341281

slide-78
SLIDE 78

HIERARCHICAL PARALLEL COORDINATES

goal: scale up parallel coordinates to large datasets

challenge: overplotting/occlusion

Fua 1999

slide-79
SLIDE 79

HPC: ENCODING DERIVED DATA

visual representation: variable-width opacity bands

  • show whole cluster, not just single item
  • min / max: spatial position
  • cluster density: transparency
  • mean: opaque

Fua 1999

slide-80
SLIDE 80

HPC: INTERACTING WITH DERIVED DATA

interactively change level of detail to navigate cluster hierarchy

Fua 1999

slide-81
SLIDE 81

Star Plot

  • Similar to parallel coordinates
  • Axes radiate from a common origin

[Coekin 1969]

http://www.itl.nist.gov/div898/handbook/eda/section3/starplot.htm
http://start1.jpl.nasa.gov/caseStudies/autoTool.cfm

http://bl.ocks.org/kevinschaul/raw/8833989/

slide-82
SLIDE 82

Data Reduction

Sampling

  • Don’t show every element, show a (random) subset
  • Efficient for large datasets
  • Apply only for display purposes
  • Outlier-preserving approaches

Filtering

  • Define criteria to remove data, e.g.,
      minimum variability
      > / < / = specific value for one dimension
      consistency in replicates, …
  • Can be interactive, combined with sampling

[Ellis & Dix, 2006]
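The two reduction strategies can be sketched with the standard library alone. This is an illustration under stated assumptions: the function names, the fixed seed, and the variance threshold are hypothetical, and the sample data is the BPM table from the exercise slide. Outlier-preserving sampling is not covered here.

```python
import random
from statistics import pvariance


def sample_for_display(items, k, seed=0):
    """Return a random subset of at most k items, for display only.

    The full dataset stays untouched; a fixed seed keeps the view stable.
    """
    items = list(items)
    if k >= len(items):
        return items
    return random.Random(seed).sample(items, k)


def filter_min_variability(rows, min_variance):
    """Keep only the rows whose values vary at least min_variance."""
    return {
        name: values
        for name, values in rows.items()
        if pvariance(values) >= min_variance
    }


# BPM T1, T2, T3 per person, from the exercise table.
bpm = {
    "Amy": [90, 130, 150],
    "Basil": [70, 110, 109],
    "Clara": [60, 140, 141],
    "Desmond": [84, 100, 108],
    "Charles": [81, 110, 130],
}

kept = filter_min_variability(bpm, min_variance=200)  # hypothetical threshold
shown = sample_for_display(kept, k=3)
print(sorted(kept))
print(shown)
```

Filtering first and sampling second, as in the usage above, is one way to combine the two; the order could also be reversed depending on the task.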

slide-83
SLIDE 83

Spatial Axis Orientation

Hybrids

slide-84
SLIDE 84

Flexible Linked Axes (FLINA)

Claessen & van Wijk 2011

slide-85
SLIDE 85

Web-based implementation of the FLINA concept

http://vis.pku.edu.cn/mddv/val/

slide-86
SLIDE 86

Connected Charts

Viau & McGuffin 2012

slide-87
SLIDE 87
[Figure: Domino example relating two datasets, ARTISTS (12 items) and COUNTRIES (12 items).
Artist attributes: origin, continent (Australia, Europe, North America), studio albums (count), first album (year), number one hits, start of career (year), career status (in business / inactive), gender (male, female, group), sold albums (absolute).
Country attribute: population (million).
Artists: Rihanna, U2, ABBA, Elton John, The Beatles, Whitney Houston, The Black Eyed Peas, Britney Spears, Eminem, Michael Jackson, Madonna, Elvis Presley.
Countries: Australia, France, Italy, Sweden, Spain, Austria, Germany, Netherlands, Ireland, UK, US, Canada.]

Domino

Gratzl et al. 2014

slide-88
SLIDE 88

Spatial Axis Orientation

Parallel Sets

slide-89
SLIDE 89

Parallel Sets

builds on PC to better handle categorical data

  • discrete
  • small number of values
  • no implied ordering between attributes

task: find relationship between attributes

interaction-driven technique

slide-90
SLIDE 90

Visual Encoding

  • boxes scaled by frequency
  • color coded by values for current active dimension

Bendix, Kosara, Hauser, 2005

slide-91
SLIDE 91
slide-92
SLIDE 92

Bendix, Kosara, Hauser, 2005

Visual Encoding

Boxes expand to show histogram

slide-93
SLIDE 93

Bendix, Kosara, Hauser, 2005

Interaction: Reorder

slide-94
SLIDE 94

Bendix, Kosara, Hauser, 2005

Interaction: Aggregate

slide-95
SLIDE 95

Bendix, Kosara, Hauser, 2005

Interaction: Filter

slide-96
SLIDE 96

Bendix, Kosara, Hauser, 2005

Interaction: Highlight

slide-97
SLIDE 97

Filling Space

slide-98
SLIDE 98

filling space

slide-99
SLIDE 99

HiVE example: London property

partitioning attributes
  • house type
  • neighborhood
  • sale time

encoding attributes
  • average price (color)
  • number of sales (size)

results
  • between neighborhoods, different housing distributions
  • within neighborhoods, similar prices

Slingsby 2009

slide-100
SLIDE 100

Dense pixel display: VisDB

  • represent each data item, or each attribute in an item, as a single pixel
  • can fit as many items on the screen as there are pixels, on the order of millions
  • relies heavily on color coding
  • challenge: what’s the layout?

slide-101
SLIDE 101

The data…

  • large database where each item has multiple attributes (on the order of 10)
  • goal: visualize the relevance of the set of items which satisfy a query
  • plot out data items in a spiral pattern, ordered by relevance

Keim & Kriegel, 1994
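One way to picture the spiral arrangement is a rectangular spiral on the pixel grid. The sketch below is not the VisDB implementation: it assumes the most relevant item sits in the center and less relevant items wind outward, and the function names and relevance values are hypothetical.

```python
def spiral_positions(n):
    """Return n distinct (x, y) grid cells spiralling outward from (0, 0)."""
    positions = []
    x = y = 0
    dx, dy = 1, 0        # current direction of travel
    segment_length = 1   # cells to walk before turning
    while len(positions) < n:
        for _ in range(2):  # each segment length is used for two segments
            for _ in range(segment_length):
                if len(positions) == n:
                    return positions
                positions.append((x, y))
                x, y = x + dx, y + dy
            dx, dy = -dy, dx  # turn by 90 degrees
        segment_length += 1
    return positions


def spiral_layout(relevance):
    """Assign grid cells to items, most relevant item first (center)."""
    ordered = sorted(relevance, key=relevance.get, reverse=True)
    return dict(zip(ordered, spiral_positions(len(ordered))))


# Hypothetical relevance factors for five query results.
relevance = {"item1": 0.2, "item2": 0.9, "item3": 0.5, "item4": 0.7, "item5": 0.1}
print(spiral_layout(relevance))
```

Each pixel would then be colored by the item's relevance factor (or by one attribute's distance to the query), which is where the heavy reliance on color coding comes in.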

slide-102
SLIDE 102

[Figure: VisDB windows showing the relevance factor and dim. 1 through dim. 5]

Keim & Kriegel, 1994

slide-103
SLIDE 103
[Figure panels: a. Basic Visualization Technique; c. Grouping Arrangement]

Keim & Kriegel, 1994