CS171 Visualization Alexander Lex alex@seas.harvard.edu Design - - PowerPoint PPT Presentation



slide-1
SLIDE 1

CS171 Visualization

Alexander Lex alex@seas.harvard.edu

[xkcd]

Design Guidelines Tasks

slide-2
SLIDE 2

Next Week

Lecture 7: Homework 2 Design Studio
Lecture 8: Interaction
 Guest Lecture, Jean-Daniel Fekete (INRIA)
Sections: D3 & JS: Data Structures, Layouts

slide-3
SLIDE 3

Last Tuesday

The Visualization Alphabet: Marks and Channels

slide-4
SLIDE 4

How can I visually represent two numbers, e.g.,

4 and 8

slide-5
SLIDE 5

Marks & Channels

Marks: represent items or links
Channels: change appearance based on attribute
Channel = Visual Variable

slide-6
SLIDE 6

Marks for Items

Basic geometric elements
0D: points, 1D: lines, 2D: areas
3D mark: volume, but rarely used

slide-7
SLIDE 7

Marks for Links

Containment Connection

slide-8
SLIDE 8

Channels (aka Visual Variables)

Control appearance proportional to or based on attributes

slide-9
SLIDE 9

Types of Channels

Identity Channels (Categorical Data): What? Where?
 Shape, Color (hue), Spatial region, …
Magnitude Channels (Ordinal & Quantitative Data): How much?
 Position, Length, Saturation, …

slide-10
SLIDE 10

Channels: Expressiveness Types and Effectiveness Ranks

Magnitude Channels (Ordered Attributes), most to least effective:
 Position on common scale
 Position on unaligned scale
 Length (1D size)
 Tilt/angle
 Area (2D size)
 Depth (3D position)
 Color luminance
 Color saturation
 Curvature
 Volume (3D size)

Identity Channels (Categorical Attributes), most to least effective:
 Spatial region
 Color hue
 Motion
 Shape

slide-11
SLIDE 11

Position

Strongest visual variable
Suitable for all data types
Problems:
 Sometimes not available (spatial data)
 Cluttering

slide-12
SLIDE 12

Example: Scatterplot

slide-13
SLIDE 13

Length & Size

Good for 1D, OK for 2D, bad for 3D
Easy to see whether one is bigger
Aligned bars use position redundantly

slide-14
SLIDE 14

Example 2D Size: Bubbles

slide-15
SLIDE 15

Value/Luminance/Saturation

OK for quantitative data when length & size are already used.
Not very many shades are recognizable.

Selective: yes
Associative: yes
Quantitative: somewhat (with problems)
Order: yes
Length: limited

slide-16
SLIDE 16

Example: Diverging Value-Scale

slide-17
SLIDE 17

Color

Good for qualitative data (identity channel)
Limited number of classes/length (~7-10!)
Does not work for quantitative data!
Lots of pitfalls! Be careful!
My rule:
 minimize color use for encoding data
 use for brushing

Selective: yes
Associative: yes
Quantitative: no
Order: no
Length: limited


slide-18
SLIDE 18

Cliff Mass

Color: Bad Example

slide-19
SLIDE 19

Color: Good Example

slide-20
SLIDE 20

Shape

Great for recognizing many classes.
No grouping or ordering.

Selective: yes
Associative: limited
Quantitative: no
Order: no
Length: vast


slide-21
SLIDE 21

Why are quantitative channels different?

S = sensation I = intensity
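
The symbols on this slide match Stevens' psychophysical power law, S = I^n, which is assumed here to be the formula the slide shows. The sketch below illustrates why channels differ: the same physical change is perceived differently depending on the exponent n. The exponents are rough, illustrative values (an assumption, not figures from the lecture), and the function name is hypothetical.

```python
# Assumed, approximate exponents for S = I ** n; illustrative only.
ASSUMED_EXPONENTS = {
    "length": 1.0,      # perceived roughly linearly
    "area": 0.7,        # differences are underestimated
    "brightness": 0.5,  # differences are strongly underestimated
    "saturation": 1.7,  # differences are overestimated
}


def perceived_ratio(physical_ratio: float, channel: str) -> float:
    """Perceived magnitude ratio for a physical ratio under S = I ** n."""
    return physical_ratio ** ASSUMED_EXPONENTS[channel]


if __name__ == "__main__":
    for channel in ASSUMED_EXPONENTS:
        ratio = perceived_ratio(4.0, channel)
        print(f"{channel:>10}: a 4x physical change is perceived as about {ratio:.2f}x")
```

Under these assumed exponents only length is perceived in proportion to the data, which is consistent with length ranking above area and luminance among the magnitude channels.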

slide-22
SLIDE 22

How much longer?

A B

2x

slide-23
SLIDE 23

How much longer?

A B

4x

slide-24
SLIDE 24

How much steeper?

A B

~4x

slide-25
SLIDE 25

How much larger (area)?

A B

5x

slide-26
SLIDE 26

How much larger (area)?

A B

3x

slide-27
SLIDE 27

How much larger (diameter)?

A B

2x

slide-28
SLIDE 28

How much darker?

A B

2x

slide-29
SLIDE 29

How much darker?

A B

3x

slide-30
SLIDE 30

Other Factors Affecting Accuracy

Alignment
Distractors
Distance
Common scale
…

[Figure: comparing A vs. B when unframed and aligned, framed and unaligned, and unframed and unaligned]

slide-31
SLIDE 31

Channels: Expressiveness Types and Effectiveness Ranks

Magnitude Channels (Ordered Attributes), most to least effective:
 Position on common scale
 Position on unaligned scale
 Length (1D size)
 Tilt/angle
 Area (2D size)
 Depth (3D position)
 Color luminance
 Color saturation
 Curvature
 Volume (3D size)

Identity Channels (Categorical Attributes), most to least effective:
 Spatial region
 Color hue
 Motion
 Shape

slide-32
SLIDE 32

Separability of Attributes

Can we combine multiple visual variables?

slide-33
SLIDE 33

Sins from the past…

[Mueller 09, Mueller 14]

slide-34
SLIDE 34

Common Mistakes

slide-35
SLIDE 35

Death to Pie Charts

Cole Nussbaumer www.storytellingwithdata.com/2011/07/death-to-pie-charts.html

“I hate pie charts. I mean, really hate them.”

Share of coverage on TechCrunch
slide-36
SLIDE 36

Redesign

slide-37
SLIDE 37

Can you spot the differences?

slide-38
SLIDE 38

Can you spot the differences?

slide-39
SLIDE 39

My favorite pie chart

slide-40
SLIDE 40

My second favorite pie chart

slide-41
SLIDE 41

Sunday Star Times, 2012

slide-42
SLIDE 42
  • R. Cunliffe, Stats Chat

Quantity encoded by diameter, not area! Fixing that:
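
A minimal sketch of that fix, assuming circles are sized from a data value: scale the radius by the square root of the value so that area, not diameter, is proportional to it. The function name and its parameters are hypothetical.

```python
import math


def radius_for_value(value: float, max_value: float, max_radius: float) -> float:
    """Return a radius whose circle AREA is proportional to value.

    The largest value (max_value) is drawn with radius max_radius.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    # Area grows with radius squared, so take the square root of the share.
    return max_radius * math.sqrt(value / max_value)


if __name__ == "__main__":
    for value in (25.0, 50.0, 100.0):
        r = radius_for_value(value, max_value=100.0, max_radius=10.0)
        print(f"value={value:6.1f}  radius={r:5.2f}  area={math.pi * r * r:7.2f}")
```

Scaling the diameter linearly instead would make a value twice as large look four times as large by area.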

slide-43
SLIDE 43
  • R. Cunliffe, Stats Chat

But is this visual encoding appropriate in the first place?

slide-44
SLIDE 44

Graphical Integrity

Flowing Data

slide-45
SLIDE 45

Scale Distortions

Flowing Data

slide-46
SLIDE 46

What’s wrong?

slide-47
SLIDE 47

Scale Distortions

slide-48
SLIDE 48

Scale Distortions

slide-49
SLIDE 49

Start Scales at 0?

A. Kriebel, VizWiz

slide-50
SLIDE 50

Global Warming?

The Daily Mail, UK, Jan 2012

slide-51
SLIDE 51

Global Warming?

Mother Jones

slide-52
SLIDE 52

Global Warming - Frame the Data

Mother Jones

slide-53
SLIDE 53

The Lie Factor

Tufte, VDQI

Lie Factor = (size of effect shown in graphic) / (size of effect in data)

slide-54
SLIDE 54

The Lie Factor

(Size of effect in graphic)/(size of effect in data)

((5.3 − 0.6) / 0.6) / ((27.5 − 18) / 18) = 14.8

Tufte, VDQI
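
The arithmetic on this slide can be checked with a short sketch. It treats each "size of effect" as a relative change, (second − first) / first, which is how the numbers on the slide combine; the function names are hypothetical.

```python
def relative_change(first: float, second: float) -> float:
    """Relative change from first to second, e.g. 18 -> 27.5 is about 0.53."""
    return (second - first) / first


def lie_factor(graphic_first: float, graphic_second: float,
               data_first: float, data_second: float) -> float:
    """Size of effect shown in the graphic divided by size of effect in the data."""
    return (relative_change(graphic_first, graphic_second)
            / relative_change(data_first, data_second))


if __name__ == "__main__":
    # Values from the slide: graphic sizes 0.6 and 5.3, data values 18 and 27.5.
    print(round(lie_factor(0.6, 5.3, 18.0, 27.5), 1))  # 14.8
```

A lie factor of 1 means the graphic shows the effect at the size it has in the data.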

slide-55
SLIDE 55

The Lie Factor

Tufte, VDQI

slide-56
SLIDE 56

Tufte’s Integrity Principles

Show data variation, not design variation
Clear, detailed, and thorough labeling and appropriate scales
Size of the graphic effect should be directly proportional to the numerical quantities (“lie factor”)

slide-57
SLIDE 57

Visualization Design Principles

slide-58
SLIDE 58

Maximize Data-Ink Ratio

[Chart axis labels: 0-$24,999, $25,000+]

slide-59
SLIDE 59

Maximize Data-Ink Ratio

[Chart: Males vs. Females by income bracket (0-$24,999, $25,000+); axis ticks at 175, 350, 525, 700]

slide-60
SLIDE 60

Avoid Chartjunk

ongoing, Tim Bray

Extraneous visual elements that distract from the message

slide-61
SLIDE 61

Avoid Chartjunk

ongoing, Tim Bray
slide-62
SLIDE 62

Avoid Chartjunk

ongoing, Tim Bray
slide-63
SLIDE 63

Avoid Chartjunk

ongoing, Tim Bray
slide-64
SLIDE 64

Avoid Chartjunk

ongoing, Tim Bray
slide-65
SLIDE 65

Avoid Chartjunk

ongoing, Tim Bray
slide-66
SLIDE 66

Which is better?

[Bateman et al. 2010]

slide-67
SLIDE 67

Which is better?

https://eagereyes.org/criticism/chart-junk-considered-useful-after-all

[Bateman et al. 2010]

slide-68
SLIDE 68

Don’t

matplotlib gallery

Excel Charts Blog
slide-69
SLIDE 69

Design Critique

slide-70
SLIDE 70

Design Critique

http://goo.gl/DA67PG

slide-71
SLIDE 71

Tasks

Why are we using Visualization?

slide-72
SLIDE 72

Domain and Abstract Tasks

Infinite numbers of domain tasks
Can be broken down into simpler abstract tasks
We know how to address the abstract tasks!
Identify task - data combination: solutions probably exist

slide-73
SLIDE 73

Tasks

Analyze

high-level choices consume vs produce

Search

find a known/unknown item

Query

find out about characteristics of item by itself or relative to others

slide-74
SLIDE 74

Example 1

Find good universities with a high faculty-student ratio.

Identify high-ranked universities
In this subset: compare universities & identify high faculty-student ratio

OR

Derive a ranking with a high weight for faculty-student ratio
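
The second option, deriving a ranking, can be sketched as a weighted sum over normalized attributes. Everything in the example is hypothetical: the university names, the scores (assumed to be normalized to 0..1), the weights, and the function name.

```python
# Hypothetical, made-up scores normalized to 0..1; not real ranking data.
UNIVERSITIES = {
    "University A": {"reputation": 0.9, "faculty_student_ratio": 0.4},
    "University B": {"reputation": 0.7, "faculty_student_ratio": 0.8},
    "University C": {"reputation": 0.5, "faculty_student_ratio": 0.6},
}


def derive_ranking(items: dict, weights: dict) -> list:
    """Rank item names by the weighted sum of their attributes, best first."""
    scores = {
        name: sum(weights[attr] * attrs[attr] for attr in weights)
        for name, attrs in items.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # A high weight for faculty-student ratio changes who comes out on top.
    heavy_ratio = {"reputation": 0.3, "faculty_student_ratio": 0.7}
    print(derive_ranking(UNIVERSITIES, heavy_ratio))
```

The derived ranking is a new attribute computed from the input data, which is what makes this a "derive" task rather than a pure comparison.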

slide-75
SLIDE 75

Example 2

Contrast Harvard’s reputation scores with MIT’s
Match up Harvard with Yale

First, find Harvard and Yale, then compare their (two) reputation scores

slide-76
SLIDE 76

Example 3

Find a combination of weights and parameters where Harvard is better than MIT

Produce a new dataset by deriving from the input parameters

slide-77
SLIDE 77

Result

slide-78
SLIDE 78

High-level actions: Analyze

Consume
 Discover vs present
  classic split: explore vs explain
 Enjoy: casual, social
Produce
 Annotate, record
 Derive: crucial design choice

slide-79
SLIDE 79

Example: Annotate

slide-80
SLIDE 80

Example: Derive

slide-81
SLIDE 81

Example: Derive

Player    Country      Club            Club Continent
Ronaldo   Portugal     Real Madrid     Europe
Lahm      Germany      Bayern München  Europe
Robben    Netherlands  Bayern München  Europe
Khedira   Germany      Real Madrid     Europe
Pogba     Italy        Juventus        Europe
Messi     Argentina    Barcelona       Europe
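
A minimal sketch of deriving the "Club Continent" attribute from the existing "Club" attribute, using a few rows from the table. The lookup table and function name are hypothetical; the point is only that a derived attribute is computed from existing ones rather than collected separately.

```python
# A few rows from the slide's table.
PLAYERS = [
    {"player": "Ronaldo", "country": "Portugal", "club": "Real Madrid"},
    {"player": "Lahm", "country": "Germany", "club": "Bayern München"},
    {"player": "Messi", "country": "Argentina", "club": "Barcelona"},
]

# Hypothetical lookup used to derive the new attribute.
CLUB_CONTINENT = {
    "Real Madrid": "Europe",
    "Bayern München": "Europe",
    "Barcelona": "Europe",
}


def derive_club_continent(rows: list) -> list:
    """Return copies of the rows with a derived 'club_continent' attribute."""
    return [{**row, "club_continent": CLUB_CONTINENT[row["club"]]} for row in rows]


if __name__ == "__main__":
    for row in derive_club_continent(PLAYERS):
        print(row["player"], row["country"], row["club"], row["club_continent"])
```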

slide-82
SLIDE 82
slide-83
SLIDE 83

Actions: Mid-level search, low-level query

What does the user know?
 target, location

How much of the data matters?
 one, some, all

Search
                   Target known   Target unknown
Location known     Lookup         Browse
Location unknown   Locate         Explore

Query
 Identify, Compare, Summarize
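
The four search actions follow from two yes/no questions, which can be sketched as a small function. The function name and lowercase labels are hypothetical; the mapping is the standard one (lookup: both known, locate: target known only, browse: location known only, explore: neither known).

```python
def search_action(target_known: bool, location_known: bool) -> str:
    """Name the search action from what the user already knows."""
    if target_known:
        return "lookup" if location_known else "locate"
    return "browse" if location_known else "explore"


if __name__ == "__main__":
    for target_known in (True, False):
        for location_known in (True, False):
            action = search_action(target_known, location_known)
            print(f"target known={target_known!s:5}  location known={location_known!s:5}  -> {action}")
```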

slide-84
SLIDE 84

Example Compare (& Derive)

slide-85
SLIDE 85

Why: Targets

ALL DATA: Trends, Outliers, Features
ATTRIBUTES
 One: Distribution, Extremes
 Many: Dependency, Correlation, Similarity
NETWORK DATA: Topology, Paths
SPATIAL DATA: Shape

slide-86
SLIDE 86

Examples

Trends: How has the job market developed overall since the recession?
Outliers: Looking at real-estate-related jobs

slide-87
SLIDE 87

How? A Preview

Encode
 Arrange: Express, Separate, Order, Align, Use
 Map from categorical and ordered attributes
Manipulate: Change, Select, Navigate
Facet: Juxtapose, Partition, Superimpose
Reduce: Filter, Aggregate, Embed

slide-88
SLIDE 88

Next time: Evaluation