Visualization for Communication (cs109a)



slide-1
SLIDE 1

Visualization for Communication

cs109a

slide-2
SLIDE 2

(CNN)

slide-3
SLIDE 3

Engineer deck, the previous day… (PCSSCA)

slide-4
SLIDE 4

(PCSSCA)

slide-5
SLIDE 5

the previous day… (PCSSCA)

slide-6
SLIDE 6

(VST, Tufte)

slide-7
SLIDE 7
slide-8
SLIDE 8

RISK ASSESSMENT?

(VST, Tufte)

slide-9
SLIDE 9

Chartjunk at hearings

(PCSSCA)

slide-10
SLIDE 10

  • Ask an interesting question. What is the scientific goal? What would you do if you had all the data? What do you want to predict or estimate?
  • Get the data. How were the data sampled? Which data are relevant? Are there privacy issues?
  • Explore the data. Plot the data. Are there anomalies? Are there patterns?
  • Model the data. Build a model. Fit the model. Validate the model.
  • Communicate and visualize the results. What did we learn? Do the results make sense? Can we tell a story?

slide-11
SLIDE 11

Visualization Goals

Communicate (Explanatory)

  • Present data and ideas
  • Explain and inform
  • Provide evidence and support
  • Influence and persuade

Analyze (Exploratory)

  • Explore the data
  • Assess a situation
  • Determine how to proceed
  • Decide what to do

slide-12
SLIDE 12

Communicate

New York Times

slide-13
SLIDE 13

Napoleon’s March to Russia

https://robots.thoughtbot.com/analyzing-minards-visualization-of-napoleons-1812-march

slide-14
SLIDE 14

Minard’s Graphic on Napoleon’s Russia Campaign

(from wikipedia)

slide-15
SLIDE 15

Minard’s Graphic on Napoleon’s Russia Campaign

(from wikipedia)

slide-16
SLIDE 16

Key Considerations

  • Who is your audience?
  • What questions are you answering?
  • Why should the audience care?
  • What are your major insights and surprises?
  • What change do you want to effect?
slide-17
SLIDE 17

Effective Visualizations

  1. Have graphical integrity
  2. Keep it simple
  3. Use the right display
  4. Use color strategically
  5. Know your audience
slide-18
SLIDE 18

Have graphical integrity

slide-19
SLIDE 19

Alberto Cairo • University of Miami • www.thefunctionalart.com • Twitter: @albertocairo

slide-20
SLIDE 20

http://lsr.nellco.org/cgi/viewcontent.cgi?article=1476&context=nyu_plltwp

Alberto Cairo • University of Miami • www.thefunctionalart.com • Twitter: @albertocairo

slide-21
SLIDE 21

Keep it Simple

slide-22
SLIDE 22

Don’t Make Them Think!

  • Your audience does not want to spend cognitive effort on things you know and can just show them
  • Lead them through the major steps of your story
  • Point out interesting key facts and insights using captions and annotations

slide-23
SLIDE 23

Don’t Bury the Lead

Cole Nussbaumer

slide-24
SLIDE 24

Don’t Bury the Lead

Cole Nussbaumer

slide-25
SLIDE 25

Use the right display

slide-26
SLIDE 26

[Figure: visual encodings ordered from Most Efficient to Least Efficient, grouped under Quantitative, Ordered, Categories]

C. Mulbrandon, VisualizingEconomics.com

slide-27
SLIDE 27

Most Effective

VisualizingEconomics.com

slide-28
SLIDE 28

Less Effective

VisualizingEconomics.com

slide-29
SLIDE 29

[Figure: charts of items A to M for the years 2009 to 2015, drawn on separate axis scales]

Possible solution for cases when your data diverge a lot

Alberto Cairo • University of Miami • www.thefunctionalart.com • Twitter: @albertocairo

slide-30
SLIDE 30

Use color strategically

slide-31
SLIDE 31

Colors for Categories

Do not use more than 5-8 colors at once

Ware, “Information Visualization”
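One way to stay within a small palette is to lump the rarest categories into a single catch-all group before assigning colors. The sketch below is only an illustration of that idea, not something taken from the slides: the function name `lump_categories`, the limit of 6, and the label "Other" are all assumptions.

```python
from collections import Counter

MAX_COLORS = 6  # assumption: any limit in the 5-8 range from the slide


def lump_categories(labels, max_colors=MAX_COLORS):
    """Keep the most frequent categories; relabel the rest as "Other".

    The result has at most `max_colors` distinct labels, so each one
    can get its own clearly distinguishable color.
    """
    counts = Counter(labels)
    if len(counts) <= max_colors:
        return list(labels)
    # Reserve one color slot for the catch-all "Other" group.
    keep = {name for name, _ in counts.most_common(max_colors - 1)}
    return [name if name in keep else "Other" for name in labels]


# Hypothetical data: ten categories of very different sizes.
labels = list("AAAAABBBBCCCDDEFGHIJ")
lumped = lump_categories(labels)
print(sorted(set(lumped)))
```

Whether lumping is acceptable depends on the story: if one of the rare categories is the point of the chart, it should keep its own color.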

slide-32
SLIDE 32

Colors for Ordinal Data

Zeileis et al., 2009, “Escaping RGBland: Selecting Colors for Statistical Graphics”

Vary luminance and saturation

slide-33
SLIDE 33

Colors for Quantitative Data

Rogowitz and Treinish, Why should engineers and scientists be worried about color?

Hue (Rainbow) Luminance Luminance & Hue

slide-34
SLIDE 34

Rainbow Colormap

X

slide-35
SLIDE 35

Rainbow Colormap

  • R. Simmon

Perceptually nonlinear
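A rough way to see the problem is to compute the luminance of the key rainbow hues and compare it with a gray ramp. The sketch below uses the standard sRGB relative-luminance formula; the five anchor colors standing in for a rainbow colormap, and the helper names, are assumptions made for illustration.

```python
def linearize(c):
    """Convert one sRGB channel in [0, 1] to linear light."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an sRGB color, from 0 (black) to 1 (white)."""
    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def is_monotonic(values):
    """True if the sequence never reverses direction."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


# Assumed anchor colors of a rainbow map: blue, cyan, green, yellow, red.
rainbow = [(0, 0, 1), (0, 1, 1), (0, 1, 0), (1, 1, 0), (1, 0, 0)]
# A gray ramp from black to white for comparison.
gray = [(v, v, v) for v in (0.0, 0.25, 0.5, 0.75, 1.0)]

rainbow_lum = [relative_luminance(c) for c in rainbow]
gray_lum = [relative_luminance(c) for c in gray]

print("rainbow:", [round(v, 3) for v in rainbow_lum])
print("gray:   ", [round(v, 3) for v in gray_lum])
print("rainbow monotonic?", is_monotonic(rainbow_lum))
print("gray monotonic?   ", is_monotonic(gray_lum))
```

The rainbow sequence rises and falls in luminance as the data value increases, so equal steps in the data do not look like equal steps on screen, while the gray ramp changes in one direction only.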

slide-36
SLIDE 36

Gray

slide-37
SLIDE 37

Color Blindness

Deuteranope Protanope Tritanope

Based on slide from Stone

Red / green deficiencies Blue / Yellow deficiency

slide-38
SLIDE 38

Color Blindness

Normal Protanope Deuteranope Lightness

Based on slide from Stone

slide-39
SLIDE 39

Viridis

slide-40
SLIDE 40

Color Brewer

Cynthia Brewer, Color Use Guidelines for Data Representation

Nominal Ordinal

slide-41
SLIDE 41
slide-42
SLIDE 42

Know your audience

slide-43
SLIDE 43
  • What do they know?
  • What motivates them? What do they desire?
  • What experiences do you share? What are common goals?
  • What insights can you give them? What tools and “magical gifts”?

slide-44
SLIDE 44

What is the message?

Exploratory: Neutral
Explanatory: Opinionated

slide-45
SLIDE 45

Andy Cotgreave, Tableau

slide-46
SLIDE 46

Framing - Why should I care?

  • Tell the audience: “Here is the right way to think about the problem I was trying to solve.”
  • Catch the audience’s attention and frame the story using captions and annotations
  • If done well, your insights will seem obvious given this framing. And that’s a good thing!

slide-47
SLIDE 47

Gun Deaths in 2010

slide-48
SLIDE 48

Tools for interactive graphics

  • R/shiny
  • plotly/dash
  • Tableau
  • d3.js
  • vega-lite/vega
slide-49
SLIDE 49

Is there a story?

Surface it, even if it is incomplete

slide-50
SLIDE 50

2014 Gun Deaths

slide-51
SLIDE 51

(XKCD)

slide-52
SLIDE 52
slide-53
SLIDE 53

Deaths by county, 2014

(crimeresearch.org)

slide-54
SLIDE 54

Careful with amalgamation paradoxes and with outliers

http://journal.frontiersin.org/article/10.3389/fpsyg.2013.00513/full
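An amalgamation (Simpson's) paradox is easy to reproduce with a few numbers. In the sketch below, which uses made-up data chosen only for illustration, the trend is positive inside each group but negative once the groups are pooled.

```python
def slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical data: two groups, each with a clear upward trend.
group_a_x, group_a_y = [1, 2, 3], [6, 7, 8]
group_b_x, group_b_y = [6, 7, 8], [1, 2, 3]

slope_a = slope(group_a_x, group_a_y)
slope_b = slope(group_b_x, group_b_y)
slope_pooled = slope(group_a_x + group_b_x, group_a_y + group_b_y)

print("group A slope:", slope_a)
print("group B slope:", slope_b)
print("pooled slope: ", round(slope_pooled, 3))
```

Plotting the groups in separate colors, rather than as one cloud, is what makes this visible to the audience.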

Alberto Cairo • University of Miami • www.thefunctionalart.com • Twitter: @albertocairo

slide-55
SLIDE 55

Ask-n-Ask: what is the story?

  • Is the exact distribution of guns really the important concern?
  • Did we check the uncertainties?
  • Should we be looking at this from a “risk” perspective?
  • We tend to believe what we believe and look for confirmation.
  • We need to be disciplined about interrogating ourselves.
  • It is OK (and not against simplicity) to surface our process.
slide-56
SLIDE 56

Another example: OKC data

slide-57
SLIDE 57

a woman’s age vs. the age of the men who look best to her

Woman’s age: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
Age of the men who look best to her: 23 23 24 25 25 26 27 28 29 29 30 31 31 32 32 34 35 36 37 38 38 38 39 39 39 40 38 39 40 45 46

Alberto Cairo • University of Miami • www.thefunctionalart.com • Twitter: @albertocairo

(from Dataclysm)

slide-58
SLIDE 58

a man’s age vs. the age of the women who look best to him

Man’s age: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
Age of the women who look best to him: 20 20 21 21 21 21 22 21 20 20 20 20 20 20 20 20 20 22 20 20 21 21 20 23 21 24 20 20 23 20 22

Alberto Cairo • University of Miami • www.thefunctionalart.com • Twitter: @albertocairo

slide-59
SLIDE 59

Sample of 100 men aged 40 vs. the age of the women who look best to them

[Figure: histogram in which each symbol = 1 man; x-axis: women’s ages, 20 to 50; y-axis: number of men; most common value: 21]

Alberto Cairo • University of Miami • www.thefunctionalart.com • Twitter: @albertocairo

slide-60
SLIDE 60

Sample of 100 men aged 40 vs. the age of the women who look best to them

[Figure: two histograms in which each symbol = 1 man; x-axis: women’s ages, 20 to 50; y-axis: number of men; each panel is marked “most common value: 21”]

Alberto Cairo • University of Miami • www.thefunctionalart.com • Twitter: @albertocairo

slide-61
SLIDE 61

Structure of communication graphics

slide-62
SLIDE 62
  • E. Segel
slide-63
SLIDE 63
  • E. Segel
slide-64
SLIDE 64
  • E. Segel
slide-65
SLIDE 65
  • E. Segel
slide-66
SLIDE 66
  • E. Segel

Headline, Captions, Annotations, Call-Out Boxes

slide-67
SLIDE 67
  • M. Krzywinski & A. Cairo
slide-68
SLIDE 68

Application to modeling

slide-69
SLIDE 69

IMAC

  • I: inferential goal (scientific question of interest)
  • M: model (all models are wrong, some are useful)
  • A: algorithms
  • C: conclusions and checking

The C is crucial: what did we learn? Was the model useful, and how well does it fit? How do we know whether the method is working? Do we understand how it is working? Do we need to iterate and improve the model? What are the limitations and future directions?

slide-70
SLIDE 70

Communicating a model

slide-71
SLIDE 71

Telecom Churn Problem

Survey 1000 customers with an offer, an incentive for the customer to stay with us, that has an administrative cost of $3 and an offer cost of $100. We want to predict churn for our customer base of 100000. If a customer leaves us, we lose the customer lifetime value (CLV), a measure of the lost profit from that customer. Let's assume this is the average number of months a customer stays with the telecom times the net revenue from the customer per month. We'll assume 3 years and a $30/month margin per user lost, for roughly a $1000 loss.

admin_cost = 3
offer_cost = 100
clv = 1000  # customer lifetime value

  • TN = people we predict won't churn, and who won't. We associate no cost with them, as they continue being our customers.
  • FP = people we predict will churn, but who won't. We associate a cost of admin_cost + offer_cost per customer with them: we spend money on keeping them from churning, and that money is wasted.
  • FN = people we predict won't churn, so we send them nothing, but who will. This is the big loss: the clv.
  • TP = people we predict will churn, and who will. These are the people we can do something about, so we make them an offer. Say a fraction f accept it. Our cost is admin_cost + f*offer_cost + (1 - f)*clv.

f = 0.5
tnc = 0.
fpc = admin_cost + offer_cost
fnc = clv
tpc = admin_cost + f*offer_cost + (1. - f)*clv

slide-72
SLIDE 72

Average Cost = TN x TNC + FP x FPC + FN x FNC + TP x TPC

TPC = 553, FPC = 103, FNC = 1000
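The cost bookkeeping can be sketched in a few lines of Python. The cost values come from the slides; the function `average_cost`, the reading of TN, FP, FN, TP as confusion-matrix counts divided by the number of customers, and the example counts (150 churners out of 1000) are assumptions for illustration.

```python
admin_cost = 3
offer_cost = 100
clv = 1000  # customer lifetime value
f = 0.5     # fraction of true churners who accept the offer

tnc = 0.0                                                # true negative: no cost
fpc = admin_cost + offer_cost                            # false positive: wasted offer
fnc = clv                                                # false negative: customer lost
tpc = admin_cost + f * offer_cost + (1.0 - f) * clv      # true positive: offer made


def average_cost(tn, fp, fn, tp):
    """Average cost per customer, weighting each confusion-matrix cell
    by its own cost and dividing by the total number of customers."""
    n = tn + fp + fn + tp
    return (tn * tnc + fp * fpc + fn * fnc + tp * tpc) / n


print("fpc, fnc, tpc:", fpc, fnc, tpc)

# Hypothetical baseline: 1000 surveyed customers, 150 of whom churn,
# and a "model" that never predicts churn (so nobody gets an offer).
do_nothing = average_cost(tn=850, fp=0, fn=150, tp=0)
print("do nothing:", do_nothing)
```

With these assumed counts the do-nothing baseline comes to $150 per customer; any model worth deploying has to bring the average below that.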

slide-73
SLIDE 73
slide-74
SLIDE 74

Annotated Diagram

Loss made with Preview

slide-75
SLIDE 75

Reduce churn and our cost by sending customers an offer

Making offers within Budget

This study is based on a pilot survey of 1000 customers from our customer base of 100000. We make an offer, an incentive for the customer to stay with us, with an administrative cost of $3 and an offer cost of $100. If a customer leaves us, we lose the customer lifetime value (CLV), roughly a $1000 loss. We assume that 50% of the customers we target will stay with us. If we do nothing, we lose $150 per customer, including CLV. We choose which customers to target according to two different models, dt and gnb:

  • Making an offer to the 13% of our customers most likely to leave cuts this cost to its lowest value, $103 per customer, according to the dt model, for a total cost of $1.34 million.
  • If we target only 10% of the customers (Budget 1) using the dt model, we get by on $1.03 million but incur a loss of $110 per customer, including CLV.
  • If we target 40% of our customers, we need a budget (Budget 2) of $4.2 million. Here the gnb model performs better, so we choose customers according to it. We incur a loss of $116 per customer, including CLV.
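The budget figures can be checked with a short calculation, assuming the budget is simply the number of targeted customers times the per-offer cost (administrative cost plus offer cost). The function name `offer_budget` and that reading of "budget" are assumptions, not something the slide states.

```python
CUSTOMER_BASE = 100_000
ADMIN_COST = 3
OFFER_COST = 100


def offer_budget(target_percent):
    """Cost of sending an offer to `target_percent` percent of the base."""
    targeted = CUSTOMER_BASE * target_percent // 100
    return targeted * (ADMIN_COST + OFFER_COST)


for percent in (10, 13, 40):
    print(f"{percent}% targeted: ${offer_budget(percent):,}")
```

Under this assumption, 10% and 13% reproduce the $1.03 million and roughly $1.34 million on the slide, while 40% gives about $4.12 million, a little under the $4.2 million quoted there.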

Loss made with Pages

slide-76
SLIDE 76

Storytelling

slide-77
SLIDE 77
slide-78
SLIDE 78

Edward Tufte

slide-79
SLIDE 79

Stephen Few

slide-80
SLIDE 80

I’ve always believed in the power of data visualization (the representation of information by means of charts, diagrams, maps, etc.) to enable understanding

2012 2016

Alberto Cairo • University of Miami • www.thefunctionalart.com • Twitter: @albertocairo