Visualization for Communication cs109a (CNN) the previous day - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Visualization for Communication

cs109a

slide-2
SLIDE 2

(CNN)

slide-3
SLIDE 3

the previous day… (PCSSCA)

slide-4
SLIDE 4

(VST, Tufte)

slide-5
SLIDE 5

Engineer deck, the previous day… (PCSSCA)

slide-6
SLIDE 6

(PCSSCA)

slide-7
SLIDE 7

RISK ASSESSMENT?

(VST, Tufte)

slide-8
SLIDE 8

Chartjunk at hearings

(PCSSCA)

slide-9
SLIDE 9

Ask an interesting question. Get the data. Explore the data. Model the data. Communicate and visualize the results.

What is the scientific goal? What would you do if you had all the data? What do you want to predict or estimate? How were the data sampled? Which data are relevant? Are there privacy issues? Plot the data. Are there anomalies? Are there patterns? Build a model. Fit the model. Validate the model. What did we learn? Do the results make sense? Can we tell a story?

slide-10
SLIDE 10

Visualization Goals

Communicate (Explanatory)

Present data and ideas Explain and inform Provide evidence and support Influence and persuade

Analyze (Exploratory)

Explore the data Assess a situation Determine how to proceed Decide what to do

slide-11
SLIDE 11

Communicate

New York Times

slide-12
SLIDE 12

Napoleon’s March to Russia

https://robots.thoughtbot.com/analyzing-minards-visualization-of-napoleons-1812-march

slide-13
SLIDE 13

Minard’s Graphic on Napoleon’s Russia Campaign

(from Wikipedia)

slide-14
SLIDE 14

Minard’s Graphic on Napoleon’s Russia Campaign

(from Wikipedia)

slide-15
SLIDE 15

Key Considerations

  • Who is your audience?
  • What questions are you answering?
  • Why should the audience care?
  • What are your major insights and surprises?
  • What change do you want to effect?
slide-16
SLIDE 16

Effective Visualizations

  • 1. Have graphical integrity
  • 2. Keep it simple
  • 3. Use the right display
  • 4. Use color strategically
  • 5. Know your audience
slide-17
SLIDE 17

Alberto Cairo • University of Miami • www.thefunctionalart.com • Twitter: @albertocairo

slide-18
SLIDE 18

http://lsr.nellco.org/cgi/viewcontent.cgi?article=1476&context=nyu_plltwp

Alberto Cairo • University of Miami • www.thefunctionalart.com • Twitter: @albertocairo

slide-19
SLIDE 19

Keep it Simple

slide-20
SLIDE 20

Don’t Make Them Think!

  • Your audience does not want to spend cognitive effort on things you know and can just show them
  • Lead them through the major steps of your story
  • Point out interesting key facts and insights using captions and annotations

slide-21
SLIDE 21

Don’t Bury the Lead

Cole Nussbaumer

slide-22
SLIDE 22

Don’t Bury the Lead

Cole Nussbaumer

slide-23
SLIDE 23

Use the right display

slide-24
SLIDE 24

Most Efficient to Least Efficient

Quantitative, Ordered, Categories

C. Mulbrandon, VisualizingEconomics.com

slide-25
SLIDE 25

Most Effective

VisualizingEconomics.com

slide-26
SLIDE 26

Less Effective

VisualizingEconomics.com

slide-27
SLIDE 27

Possible solution to cases when you have data that diverge a lot

Alberto Cairo • University of Miami • www.thefunctionalart.com • Twitter: @albertocairo

slide-28
SLIDE 28

Use color strategically

slide-29
SLIDE 29

Colors for Categories

Do not use more than 5-8 colors at once

Ware, “Information Visualization”
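
One way to stay within that limit, sketched here as an assumption rather than something taken from the slides, is to keep the largest categories and fold the rest into a single "Other" group before assigning colors. The function name `collapse_categories` and the sample labels are hypothetical.

```python
from collections import Counter

def collapse_categories(labels, max_colors=6, other_label="Other"):
    """Keep the (max_colors - 1) most frequent labels; map the rest to other_label."""
    counts = Counter(labels)
    if len(counts) <= max_colors:
        return list(labels)
    keep = {label for label, _ in counts.most_common(max_colors - 1)}
    return [label if label in keep else other_label for label in labels]

# Hypothetical data: ten categories, "A" appearing 10 times down to "J" once.
labels = [name for i, name in enumerate("ABCDEFGHIJ") for _ in range(10 - i)]
collapsed = collapse_categories(labels, max_colors=6)
print(sorted(set(collapsed)))
```

The collapsed list needs only six colors, one of which can be a neutral gray for "Other".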

slide-30
SLIDE 30

Colors for Ordinal Data

Zeileis et al., 2009, “Escaping RGBland: Selecting Colors for Statistical Graphics”

Vary luminance and saturation
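
A minimal sketch of "vary luminance and saturation" for an ordered scale, using only the standard library. It keeps one hue fixed and moves from light and pale to dark and saturated. The HLS lightness used here is only a rough stand-in for perceived luminance, and the function name and default ranges are assumptions for illustration.

```python
import colorsys

def sequential_palette(n, hue=0.6, light_range=(0.9, 0.3), sat_range=(0.3, 0.9)):
    """Return n hex colors of one hue, going from light/pale to dark/saturated."""
    colors = []
    for i in range(n):
        t = i / (n - 1) if n > 1 else 0.0
        lightness = light_range[0] + t * (light_range[1] - light_range[0])
        saturation = sat_range[0] + t * (sat_range[1] - sat_range[0])
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors

palette = sequential_palette(5)
print(palette)
```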

slide-31
SLIDE 31

Colors for Quantitative Data

Rogowitz and Treinish, Why should engineers and scientists be worried about color?

Hue (Rainbow) Luminance Luminance & Hue

slide-32
SLIDE 32

Rainbow Colormap

X

slide-33
SLIDE 33

Rainbow Colormap

  • R. Simmon

Perceptually nonlinear
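
One rough way to see the problem, offered as a sketch and not as the analysis behind the slide: sweep a rainbow from blue to red and check whether its approximate luminance moves in one direction, as a gray ramp's does. The Rec. 709 weights are applied directly to RGB values with gamma ignored, which is a simplifying assumption.

```python
import colorsys

def relative_luminance(r, g, b):
    """Approximate luminance of an RGB color using Rec. 709 weights (gamma ignored)."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def is_monotonic(values):
    """True if values never decrease or never increase."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

steps = [i / 20 for i in range(21)]
# Rainbow: sweep hue from blue (2/3) to red (0) at full saturation and value.
rainbow = [colorsys.hsv_to_rgb((2 / 3) * (1 - t), 1.0, 1.0) for t in steps]
# Gray ramp: equal R, G, B from black to white.
gray = [(t, t, t) for t in steps]

rainbow_lum = [relative_luminance(*c) for c in rainbow]
gray_lum = [relative_luminance(*c) for c in gray]
print(is_monotonic(rainbow_lum), is_monotonic(gray_lum))
```

The rainbow's luminance rises and falls several times along the sweep, so equal steps in the data do not look like equal steps in brightness.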

slide-34
SLIDE 34

Gray

slide-35
SLIDE 35

Color Blindness

Deuteranope Protanope Tritanope

Based on slide from Stone

Red / green deficiencies Blue / Yellow deficiency

slide-36
SLIDE 36

Color Blindness

Normal Protanope Deuteranope Lightness

Based on slide from Stone

slide-37
SLIDE 37

Viridis

slide-38
SLIDE 38

Color Brewer

Cynthia Brewer, Color Use Guidelines for Data Representation

Nominal Ordinal

slide-39
SLIDE 39
slide-40
SLIDE 40

Know your audience

slide-41
SLIDE 41
  • What do they know?
  • What motivates them? What do they desire?
  • What experiences do you share? What are common goals?
  • What insights can you give them? What tools and “magical gifts”?

slide-42
SLIDE 42

What is the message?

Exploratory: neutral
Explanatory: opinionated

slide-43
SLIDE 43

Andy Cotgreave, Tableau

slide-44
SLIDE 44

Framing - Why should I care?

  • Tell the audience: “Here is the right way to think about the problem I was trying to solve.”
  • Catch the audience’s attention and frame the story using captions and annotations
  • If done well, your insights will seem obvious given this framing. And that’s a good thing!

slide-45
SLIDE 45

Gun Deaths in 2010

slide-46
SLIDE 46

Tools for interactive graphics

  • R/shiny
  • plotly/dash
  • Tableau
  • d3.js
  • vega-lite/vega
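
As a small illustration of one of these tools, the sketch below builds a Vega-Lite bar chart specification as a Python dictionary and serializes it to JSON. The data values and field names are hypothetical placeholders; only the standard library is used, so rendering is left to a Vega-Lite viewer.

```python
import json

# Hypothetical placeholder values; replace with real data.
values = [
    {"category": "A", "count": 3},
    {"category": "B", "count": 7},
    {"category": "C", "count": 5},
]

spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "description": "Bar chart with hover tooltips.",
    "data": {"values": values},
    # "tooltip": True asks Vega-Lite to show the encoded fields on hover.
    "mark": {"type": "bar", "tooltip": True},
    "encoding": {
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "count", "type": "quantitative"},
    },
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The printed JSON can be pasted into a Vega-Lite renderer such as the online Vega Editor.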
slide-47
SLIDE 47

Is there a story?

Surface it, even if it is incomplete.

slide-48
SLIDE 48

2014 Gun Deaths

slide-49
SLIDE 49

(XKCD)

slide-50
SLIDE 50
slide-51
SLIDE 51

Deaths by county, 2014

(crimeresearch.org)

slide-52
SLIDE 52

Careful with amalgamation paradoxes and with outliers

http://journal.frontiersin.org/article/10.3389/fpsyg.2013.00513/full

Alberto Cairo • University of Miami • www.thefunctionalart.com • Twitter: @albertocairo
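
Simpson's paradox is the best-known amalgamation paradox: a trend that holds in every subgroup reverses once the subgroups are pooled. The sketch below uses the widely cited kidney stone treatment counts often used to illustrate it; these numbers are not from the slides.

```python
# (successes, trials) per treatment and stone size, from the kidney stone example.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, trials):
    """Success rate for one subgroup."""
    return successes / trials

def overall(groups):
    """Success rate after pooling all subgroups of one treatment."""
    total_s = sum(s for s, _ in groups.values())
    total_n = sum(n for _, n in groups.values())
    return total_s / total_n

for size in ("small", "large"):
    print(size, {t: round(rate(*data[t][size]), 3) for t in data})
print("overall", {t: round(overall(data[t]), 3) for t in data})
```

Treatment A has the higher success rate within each stone size, yet B looks better in the pooled totals, because A was used far more often on the harder large-stone cases.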

slide-53
SLIDE 53

Ask Ask Ask

  • Is the exact distribution of guns really the important concern?
  • Did we check the uncertainties?
  • Should we be looking at this from a “risk” perspective?
  • We tend to believe what we believe and look for confirmation.
  • We need to be disciplined about interrogating ourselves.
  • It is OK (and not against simplicity) to surface our process.
slide-54
SLIDE 54

Woman’s age: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
Age of the men who look best to her: 23 23 24 25 25 26 27 28 29 29 30 31 31 32 32 34 35 36 37 38 38 38 39 39 39 40 38 39 40 45 46

a woman’s age vs. the age of the men who look best to her

Alberto Cairo • University of Miami • www.thefunctionalart.com • Twitter: @albertocairo

(from Dataclysm)

slide-55
SLIDE 55

a man’s age vs. the age of the women who look best to him

Man’s age: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
Age of the women who look best to him: 20 20 21 21 21 21 22 21 20 20 20 20 20 20 20 20 20 22 20 20 21 21 20 23 21 24 20 20 23 20 22

Alberto Cairo • University of Miami • www.thefunctionalart.com • Twitter: @albertocairo
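
The two series can be compared directly. This sketch assumes that, in the extracted slide text, the first 31 numbers are the x-axis ages (20 to 50) and the next 31 are the plotted values in the same order; that pairing is an assumption, not something the slides state.

```python
from collections import Counter

ages = list(range(20, 51))  # assumed x-axis: ages 20 through 50

# Assumed pairing: values read off the slide text in order, one per age.
men_preferred_by_women = [
    23, 23, 24, 25, 25, 26, 27, 28, 29, 29, 30, 31, 31, 32, 32, 34,
    35, 36, 37, 38, 38, 38, 39, 39, 39, 40, 38, 39, 40, 45, 46,
]
women_preferred_by_men = [
    20, 20, 21, 21, 21, 21, 22, 21, 20, 20, 20, 20, 20, 20, 20, 20,
    20, 22, 20, 20, 21, 21, 20, 23, 21, 24, 20, 20, 23, 20, 22,
]
assert len(ages) == len(men_preferred_by_women) == len(women_preferred_by_men)

# Preferred age minus own age, for each own age.
gap_women = [p - a for a, p in zip(ages, men_preferred_by_women)]
gap_men = [p - a for a, p in zip(ages, women_preferred_by_men)]
most_common_for_men = Counter(women_preferred_by_men).most_common(1)[0][0]

print("women, gap at 20 and 50:", gap_women[0], gap_women[-1])
print("men, gap at 20 and 50:", gap_men[0], gap_men[-1])
print("most common age preferred by men:", most_common_for_men)
```

Each value here is a single summary per age, so it says nothing about how spread out the individual answers were.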

slide-56
SLIDE 56

Sample of 100 men aged 40 vs. the age of the women who look best to them

[Histogram: x-axis, women’s ages (20 to 50); y-axis, number of men; most common value: 21]

Alberto Cairo • University of Miami • www.thefunctionalart.com • Twitter: @albertocairo

slide-57
SLIDE 57

Sample of 100 men aged 40 vs. the age of the women who look best to them

[Two histograms: x-axis, women’s ages (20 to 50); y-axis, number of men; most common value in each: 21]

Alberto Cairo • University of Miami • www.thefunctionalart.com • Twitter: @albertocairo

slide-58
SLIDE 58

Structure of communication graphics

slide-59
SLIDE 59
  • E. Segel
slide-60
SLIDE 60
  • E. Segel
slide-61
SLIDE 61
  • E. Segel
slide-62
SLIDE 62
  • E. Segel
slide-63
SLIDE 63
  • E. Segel

Headline Captions Annotations Call Out Boxes

slide-64
SLIDE 64
  • M. Krzywinski & A. Cairo
slide-65
SLIDE 65

Application to modeling

slide-66
SLIDE 66

IMAC

I: inferential goal (scientific question of interest)
M: model (all models are wrong, some are useful)
A: algorithms
C: conclusions and checking

The C is crucial: what did we learn? Was the model useful, and how well does it fit? How do we know whether the method is working? Do we understand how it is working? Do we need to iterate and improve the model? What are the limitations and future directions?

slide-67
SLIDE 67

Breast Cancer on a Mammogram

  • False positives are OK
  • False negatives are a disaster
  • Most people don’t have it

Which Model is Better?

(from Foster Provost and Tom Fawcett)
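
A sketch of why accuracy alone cannot answer "which model is better?" when most people do not have the disease and false negatives are the costly error. All counts below are hypothetical (1000 screenings, 10 true cancers) and are not from the slides.

```python
def metrics(tp, fp, fn, tn):
    """Return (accuracy, recall) for a confusion matrix given as counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall

# Hypothetical model 1: always predicts "no cancer".
always_negative = dict(tp=0, fp=0, fn=10, tn=990)
# Hypothetical model 2: flags 60 patients and catches 9 of the 10 cancers.
cautious = dict(tp=9, fp=51, fn=1, tn=939)

acc_a, rec_a = metrics(**always_negative)
acc_b, rec_b = metrics(**cautious)
print(f"always negative: accuracy={acc_a:.3f}, recall={rec_a:.3f}")
print(f"cautious:        accuracy={acc_b:.3f}, recall={rec_b:.3f}")
```

The always-negative model has the higher accuracy but misses every cancer, so a comparison that weights false negatives heavily would prefer the second model.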

slide-68
SLIDE 68

Communicating a model

slide-69
SLIDE 69

Telecom Churn Problem

We survey 1000 customers and want to predict churn for our base of 100000 customers. To give a customer an incentive to stay with us, we make an offer with an administrative cost of $3 and an offer cost of $100. If a customer leaves us, we lose the customer lifetime value (CLV), a measure of the profit lost from that customer. Let's assume this is the average number of months a customer stays with the telecom times the net revenue from the customer per month. We'll assume 3 years and a $30/month margin per user lost, for roughly a $1000 loss.

admin_cost = 3
offer_cost = 100
clv = 1000  # customer lifetime value

  • TN = people we predict will not churn, and who won't. We associate no cost with this, as they continue being our customers.
  • FP = people we predict will churn, but who won't. We associate a cost of admin_cost + offer_cost per customer with this: we spend money on getting them not to churn, and that money is lost.
  • FN = people we predict won't churn, so we send them nothing, but who will. This is the big loss, the clv.
  • TP = people we predict will churn, and who will. These are the people we can do something with, so we make them an offer. Say a fraction f accept it. Our cost is admin_cost + f*offer_cost + (1-f)*clv.

f = 0.5
tnc = 0.
fpc = admin_cost + offer_cost
fnc = clv
tpc = admin_cost + f*offer_cost + (1. - f)*clv

slide-70
SLIDE 70

Average Cost = (TN x TNC + FP x FPC + FN x FNC + TP x TPC) / (TN + FP + FN + TP)

TPC = 553, FPC = 103, FNC = 1000
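
The average cost per customer, with the false-positive term written as FP x FPC, can be sketched as below. The cost values follow the churn slide; the confusion-matrix counts are hypothetical, and the 15% churn rate used for the "do nothing" case is an assumption chosen to be consistent with a $1000 CLV and a $150 loss per customer.

```python
admin_cost = 3
offer_cost = 100
clv = 1000  # customer lifetime value
f = 0.5     # fraction of targeted churners who accept the offer

tnc = 0.0
fpc = admin_cost + offer_cost
fnc = clv
tpc = admin_cost + f * offer_cost + (1.0 - f) * clv

def average_cost(tn, fp, fn, tp):
    """Average cost per customer for a confusion matrix given as counts."""
    total = tn + fp + fn + tp
    return (tn * tnc + fp * fpc + fn * fnc + tp * tpc) / total

# Hypothetical confusion matrix for 1000 surveyed customers (not from the slides).
example = dict(tn=800, fp=50, fn=60, tp=90)
# "Do nothing": nobody is targeted, so every churner is a false negative.
do_nothing = dict(tn=850, fp=0, fn=150, tp=0)

print(fpc, fnc, tpc)
print(average_cost(**example), average_cost(**do_nothing))
```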

slide-71
SLIDE 71
slide-72
SLIDE 72

Annotated Diagram

Loss made with Preview

slide-73
SLIDE 73

Reduce churn and our cost by sending customers an offer

Making offers within Budget

This study was made on a pilot survey of 1000 customers from our 100000 customer base. We make an offer with an administrative cost of $3 and an offer cost of $100, as an incentive for the customer to stay with us. If a customer leaves us, we lose the customer lifetime value (CLV), roughly a $1000 loss. We assume that 50% of the customers targeted will stay with us. If we do nothing, we lose $150 per customer including CLV. We choose which customers to target according to 2 different models, dt and gnb:

  • Making an offer to the 13% of our customers most likely to leave will cut this cost to its lowest value, $103 per customer, according to the dt model, for a total cost of $1.34 million.
  • If we only target 10% of the customers (Budget 1) using the dt model, we get by with $1.03 million but incur a loss of $110 per customer including CLV.
  • If we target 40% of our customers, we need a budget (Budget 2) of $4.2 million. Here the gnb model performs better, and we will choose customers according to it. We incur a loss of $116 per customer including CLV.

Loss made with Pages
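
A rough check of the budget figures, assuming the budget is simply the number of customers targeted times the per-offer cost (administrative cost plus offer cost). That formula is an assumption; the slide does not say how the budgets were computed.

```python
customer_base = 100_000
cost_per_offer = 3 + 100  # admin_cost + offer_cost, in dollars

def offer_budget(fraction_targeted):
    """Dollars spent on offers when targeting a fraction of the customer base."""
    return fraction_targeted * customer_base * cost_per_offer

for fraction in (0.10, 0.13, 0.40):
    print(f"{fraction:.0%} targeted -> ${offer_budget(fraction):,.0f}")
```

Under this assumption the 10% and 13% cases reproduce the $1.03 million and $1.34 million on the slide. The 40% case comes out at $4.12 million, a little below the quoted $4.2 million, so that figure may be rounded differently or include something not modeled here.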

slide-74
SLIDE 74

Storytelling

slide-75
SLIDE 75
slide-76
SLIDE 76

Edward Tufte

slide-77
SLIDE 77

Stephen Few

slide-78
SLIDE 78

I’ve always believed in the power of data visualization (the representation of information by means of charts, diagrams, maps, etc.) to enable understanding

2012 2016

Alberto Cairo • University of Miami • www.thefunctionalart.com • Twitter: @albertocairo