SLIDE 1

Elective in Software and Services (Complementi di software e servizi per la società dell'informazione)
Section: Information Visualization
Number of credits: 3
Giuseppe Santucci

3 – Visualizing quantitative Information

SLIDE 2

Outline

  • New ideas about good and bad graphs
  • Meaning of numbers
  • Tables and graphs
  • Basic table variations
  • Basic graph variations

SLIDE 3

An example

  • You are a manager of a big company
  • Every Monday, you need to monitor and report the current state of quarterly sales in the Americas, Asia, and Europe, in order to check them against your forecast
  • Someone presents you with this graph
  • Are you happy with it?
  • Think about how to design something more informative for your job

SLIDE 4

All the needed information

  • Units!
  • The current date!
  • Some additional summarizing information (percentages)
  • Planned sales vs. actual sales

SLIDE 5

Another example

  • Is it OK?
  • Try to design a better bar chart
  • The focus is the comparison

SLIDE 6

Comparison!

  • A possible solution using percentages

SLIDE 7

The last example: our company against the world!

  • What is the purpose of this chart?
  • Comparison!
  • What is wrong with it?

SLIDE 8

Even worse: 3D!!!

SLIDE 9

The last example

  • Is the order clear?
  • Which one is my company?
  • Which is bigger, G or A?

SLIDE 10

SLIDE 11

A better solution

If the data admit an ordering (ranking), think about using it!

SLIDE 12

Why do I hate pie charts?

The relative difficulty of assessing quantitative values as a function of the visual encoding mechanism, as established by Cleveland and McGill (from most accurate to least accurate):

  • 1. Position
  • 2. Length
  • 3. Angle
  • 4. Slope
  • 5. Area
  • 6. Volume
  • 7. Colour / density

Pie charts discard the first two choices: I do NOT see ANY reason to use them

SLIDE 13

What about quantitative comparison?

  • Use position and length
  • Avoid angles
  • Avoid areas
  • Avoid volumes
  • Use colors carefully

SLIDE 14

Position

  • It works fine

SLIDE 15

Length?

  • Looking up precise numbers might be difficult if the position is not evident (e.g., in a stacked bar chart)
  • It makes sense to add the figures explicitly

SLIDE 16

Length?

  • Length works fine as well, but use the right scale!

(Figure panels: "Automatically produced by Excel" vs. "The reality")

SLIDE 17

Areas: some new surprising issues

  • Human beings are very bad at estimating area ratios
  • What is the ratio between these two circles? (35%, 40%, 45%, 50%, 55%, 60%?)
  • Which shape produces the biggest error?
  • The square!
  • Perceptual Guidelines for Creating Rectangular Treemaps (Nicholas Kong et al., InfoVis 2010)

SLIDE 18

Colors / Numerical data

  • Others have already studied how to map quantitative values to colors, and several choices are available
  • Do not reinvent the wheel
  • (The rainbow scale does not work)

(Figure: rainbow scale vs. HSI color model; Keim and Kriegel, "Issues in visualizing large databases", Proc. of the IFIP Working Conference on Visual Database Systems, 1995)

SLIDE 19

Other choices (Colin Ware)

SLIDE 20

Colors / Categorical data

  • Colors work fine with categorical data
  • Do not reinvent the wheel (again)
  • Ewald Hering's idea is that there are only 6 elementary colors, arranged in three opponent pairs (black/white, red/green, yellow/blue)
  • This gives us up to 12 (6+6) easily distinguishable colors (11!)

SLIDE 21

Some new considerations

  • Chartjunk is not the only enemy...
  • Before PCs, building graphs was a matter of paper and pencil
    – requiring time and effort
    – pushing you to better understand:
      • the meaning of the numbers
      • the purpose of the graph
      • the organization of the graph
  • Now, with Excel, you can produce graphs so fast that you might lose control...
    – you select predefined solutions
    – you might not understand how the graph is built (rows, columns, headings, ...)
    – you can make mistakes (e.g., missing a row...)

SLIDE 22

So...

  • 1. Look at the numbers and at the task
  • 2. Plan a graph (even on paper!), considering perceptual issues
  • 3. Look for an Excel implementation of your design
  • 4. If step 3 fails, proceed without Excel!

SLIDE 23

Outline

  • New ideas about good and bad graphs
  • Meaning of numbers
  • Tables and graphs
  • Basic table variations
  • Basic graph variations

SLIDE 24

Types of Data (C. Ware & B. Spence)

  • Entities
  • Relationships
  • Attributes of Entities or Relationships
    – Nominal / Ordinal / Interval / Ratio
    – Categorical / Integer / Real
  • Operations Considered as Data
    – Mathematical
    – Merging lists
    – Transforming data, etc.
    – Metadata (derived data)

SLIDE 25

Types of data (Stephen Few)

  • Relationships!
  • Quantitative data (allows arithmetic operations)
  • Categorical data (group, identify & organize; no arithmetic!)
    – Nominal
    – Ordinal
    – Interval
    – Hierarchical

SLIDE 26

Relationships

Quantitative information | Relationship
Units of product sold per geographical region | Sales related to geography
Expenses by department and month | Expenses related to organizational structure and time
Number of students who got each of the possible exam scores | Student counts related to exam performance

In these examples, the quantitative values are related to categories.

SLIDE 27

Relationships

Quantitative information | Relationship
The effect of a mass-mailing marketing campaign on order volume | Number of letters sent related to number of orders received
Units produced | Nominal product cost

These are more complex relationships. Different types of relationships require different types of display.

SLIDE 28

A quick example

Quantitative (y axis) vs categorical data (x axis and colors)

SLIDE 29

A quick example

  • Quantitative vs quantitative data

SLIDE 30

Types of Data (adapted from Stone & Zellweger)

  • Quantitative (allows arithmetic operations)
    – 123, 29.56, …
  • Categorical (group, identify & organize; no arithmetic)
    – Nominal (name only, no ordering)
      • Direction: North, East, South, West
    – Ordinal (ordered, not measurable)
      • First, second, third …
      • Hot, warm, cold
    – Interval (starts out as quantitative, but is made categorical by subdividing it into ordered ranges)
      • 0-999, 1000-4999, 5000-9999, 10000-19999, …
    – Hierarchical (successive inclusion)
      • Region: Continent > Country > State > City
      • Animal > Mammal > Horse

SLIDE 31

Nominal relationship

  • Order is not relevant
    – Beware of artificial orders (conventional or alphabetical order)
    – Keep the order consistent across different graphs
  • Just break down the quantitative values by category

Region | Sales
North | 50,000
South | 20,000
East | 40,000
West | 20,000
Total | 130,000

SLIDE 32

Ordinal relationship

  • Order is relevant
  • Altering it is not a good idea

Production office | Sales
First office (1977) | 50,000
Second office (2000) | 20,000
Third office (2005) | 40,000
Total | 110,000

SLIDE 33

Interval relationship

  • Several equal intervals (bins) covering the whole range
    – Frequency distribution
    – Other mathematical operations

Order size | Count
[0, 1000) | 25
[1000, 2000) | 19
[2000, 3000) | 13
[4000, 5000) | 14
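
Building a frequency distribution like the one above amounts to assigning each value to its bin and counting. A minimal Python sketch, assuming non-negative values and fixed-width, half-open bins of 1,000 starting at 0; the function name `frequency_distribution` and the order sizes are hypothetical, not taken from the slides:

```python
from collections import Counter


def frequency_distribution(values, bin_width=1000):
    """Count values per half-open bin [k*w, (k+1)*w); assumes non-negative values."""
    counts = Counter(int(v // bin_width) for v in values)
    # List every bin from 0 up to the highest non-empty one, so empty bins appear with count 0.
    last = max(counts) if counts else -1
    return [
        ((k * bin_width, (k + 1) * bin_width), counts.get(k, 0))
        for k in range(last + 1)
    ]


# Hypothetical order sizes.
orders = [120, 950, 1000, 1999, 2500, 4100, 4999]
for (low, high), count in frequency_distribution(orders):
    print(f"[{low}, {high}) {count}")
```

Listing empty bins keeps the ranges contiguous, so a gap in the data shows up as a zero count instead of a missing range.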

SLIDE 34

Time series relationship

  • Which kind of relationship best describes the categorical subdivision of time?

Dept | Jan | Feb | Mar | Q total
Marketing | 83,883 | 98,883 | 95,939 | 273,655
Sales | 38,838 | 39,848 | 39,488 | 118,174

SLIDE 35

Time series relationship

  • Which kind of relationship best describes the categorical subdivision of time?
    – Obviously it is ordinal
    – But months represent intervals as well

SLIDE 36

Hierarchical relationship

  • Multiple categories, closely related to each other as separate levels in a ranked arrangement
  • Commonly used in tables to arrange quantitative information (e.g., OLAP, On-Line Analytical Processing)
  • http://www.tableausoftware.com/products/desktop
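
A hierarchical arrangement can be illustrated with a simple roll-up, where each upper-level total is the sum of the level below it. A minimal sketch, assuming two levels (continent > country) and hypothetical sales records; `roll_up` is an illustrative name, not an OLAP or Tableau API:

```python
from collections import defaultdict


def roll_up(records):
    """Aggregate (continent, country, amount) records into a two-level hierarchy.

    Returns {continent: (continent_total, {country: country_total})}.
    """
    by_country = defaultdict(lambda: defaultdict(float))
    for continent, country, amount in records:
        by_country[continent][country] += amount
    return {
        continent: (sum(countries.values()), dict(countries))
        for continent, countries in by_country.items()
    }


# Hypothetical sales records.
sales = [
    ("Europe", "Italy", 120.0),
    ("Europe", "France", 80.0),
    ("Europe", "Italy", 30.0),
    ("Asia", "Japan", 200.0),
]
for continent, (total, countries) in roll_up(sales).items():
    print(continent, total)
    for country, subtotal in countries.items():
        print("   ", country, subtotal)
```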

SLIDE 37

Relationships between quantities

  • Ranking
  • Ratio / Proportion
  • Correlation

SLIDE 38

Ranking

  • It is an ordinal relationship in which the order is based on the associated quantitative values
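
A ranking can be derived from a nominal breakdown by sorting on the quantitative values. A minimal sketch reusing the regional sales of the nominal-relationship example; `rank` is a hypothetical helper, and ties keep their original order (Python's sort is stable) instead of sharing a position:

```python
def rank(values_by_category):
    """Return (position, category, value) tuples sorted by value, largest first."""
    ordered = sorted(values_by_category.items(), key=lambda item: item[1], reverse=True)
    return [(pos, cat, val) for pos, (cat, val) in enumerate(ordered, start=1)]


# Regional sales from the nominal-relationship table.
sales = {"North": 50_000, "South": 20_000, "East": 40_000, "West": 20_000}
for position, region, amount in rank(sales):
    print(position, region, amount)
```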

SLIDE 39

Ratio/Proportion

  • It is a relationship involving two quantitative values, compared by dividing one by the other
  • If one value is a part of the whole, a/(a+b), it is a proportion, typically represented as a percentage (ranging between 0 and 100)
  • If the two values come from different sets, it is a ratio: it can take any value, even above 100%, and it also makes sense to consider their difference, which can be negative
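
Both measures are simple divisions. A minimal sketch using figures from the nominal-relationship and ratio examples of this section; `proportion` and `ratio` are illustrative helper names:

```python
def proportion(part, whole):
    """Share of a part in the whole, as a percentage (0-100 for non-negative data)."""
    return 100 * part / whole


def ratio(a, b):
    """Ratio between two values from different sets; it can exceed 1 (i.e., 100%)."""
    return a / b


# Proportion: North's share of total regional sales (50,000 out of 130,000).
print(round(proportion(50_000, 130_000), 1))  # 38.5

# Ratio and difference: Marketing, February vs. January (5,832 vs. 5,385).
print(round(ratio(5_832, 5_385), 2))  # 1.08
print(5_832 - 5_385)                  # 447
```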

SLIDE 40

Proportion example

SLIDE 41

Ratio example

Department | Jan | Feb | Feb/Jan
Sales | 9,933 | 9,293 | 0.94
Marketing | 5,385 | 5,832 | 1.08

SLIDE 42

Correlation relationship

  • Correlation is a relationship in which the values of two paired sets of quantities are compared, looking for a (usually linear) function between them

(Figure: three scatter plots of M1 and M2: directly proportional, correlation +1; inversely proportional, correlation -1; no correlation, 0)
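
The strength of a linear relationship is usually measured with the Pearson correlation coefficient, which ranges from -1 to +1. A minimal sketch with hypothetical data; `pearson` is an illustrative name (Python 3.10+ also offers `statistics.correlation`):

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two paired sequences (between -1 and +1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # undefined (division by zero) if either sequence is constant


m1 = [1, 2, 3, 4, 5]
print(pearson(m1, [2, 4, 6, 8, 10]))  # directly proportional: about +1
print(pearson(m1, [10, 8, 6, 4, 2]))  # inversely proportional: about -1
print(pearson(m1, [2, 4, 6, 4, 2]))   # rises then falls: about 0
```

The last pair shows that a coefficient of 0 only rules out a linear relationship: there y clearly depends on x, but it first rises and then falls.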

SLIDE 43

Data & relationships summary

  • Quantitative information consists of two types of data
    – Quantitative
    – Categorical
  • Relationships among data can be
    – Simple associations between quantitative values and categorical subdivisions
    – More complex associations among multiple sets of values
  • Four types of relationships within categories
    – Nominal
    – Ordinal
    – Interval
    – Hierarchical
  • Three types of relationships between quantitative values
    – Ranking
    – Ratio
    – Correlation

SLIDE 44

Numbers that summarize

  • Measures of average
    – Mean
    – Median
    – Mode
    – Midrange
  • Measures of distribution
    – Range
    – Variance
    – Standard deviation

SLIDE 45

Mean

  • Nothing special to say, except that sometimes it is not informative

SLIDE 46

Median

  • It splits the sorted distribution into two halves

SLIDE 47

Mode and midrange (mmm...)

  • The mode is just the most common value
  • The midrange is (max+min)/2
  • Mode = 165,000
  • Midrange = (475,000+25,000)/2 = 250,000
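
The four measures of average can be compared on a single data set. A minimal sketch using Python's `statistics` module; the salary list is hypothetical, chosen only so that its minimum, maximum and mode match the figures above, and `midrange` is an illustrative helper:

```python
import statistics


def midrange(values):
    """(max + min) / 2: depends only on the two extreme values."""
    return (max(values) + min(values)) / 2


# Hypothetical salaries (min, max and mode match the slide's figures).
salaries = [25_000, 60_000, 90_000, 165_000, 165_000, 475_000]

print("mean    ", round(statistics.mean(salaries)))  # 163333
print("median  ", statistics.median(salaries))       # 127500.0
print("mode    ", statistics.mode(salaries))         # 165000
print("midrange", midrange(salaries))                # 250000.0
```

The single high value (475,000) pulls both the mean and the midrange upwards, while the median would not change if that value were lower (as long as it stayed the largest).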

SLIDE 48

Distribution

  • Delivery-time performance of two warehouses over 12 orders each
  • Do they perform the same?
  • What is missing?

Warehouse | Sum of shipping days | Delivery mean | Delivery median
A | 51 | 4.25 | 4.5
B | 51 | 4.25 | 4.5

SLIDE 49

Distribution

Order # | Warehouse A | Warehouse B
1 | 3 | 1
2 | 3 | 1
3 | 3 | 1
4 | 4 | 3
5 | 4 | 3
6 | 4 | 4
7 | 5 | 5
8 | 5 | 5
9 | 5 | 5
10 | 5 | 6
11 | 5 | 7
12 | 5 | 10

SLIDE 50

Range

  • Range is just max - min
  • Range A = 2
  • Range B = 9

Order # | Warehouse A | Warehouse B
1 | 3 | 1
2 | 3 | 1
3 | 3 | 1
4 | 4 | 3
5 | 4 | 3
6 | 4 | 4
7 | 5 | 5
8 | 5 | 5
9 | 5 | 5
10 | 5 | 6
11 | 5 | 7
12 | 5 | 10

SLIDE 51

Standard deviation

  • This variability is well described by variance and standard deviation
  • mean:

$$\mu = \frac{x_1 + x_2 + \dots + x_N}{N}$$

  • variance:

$$\mathrm{var} = \frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \dots + (x_N - \mu)^2}{N}$$

  • standard deviation:

$$\sigma = \sqrt{\mathrm{var}}$$

  • However, such concepts are hard to communicate

(Figure: normal distribution P(X); the interval [µ-σ, µ+σ] contains 68.26% of the data (~70%), the interval [µ-1.96σ, µ+1.96σ] contains 95% of the data)
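
The measures on these slides can be checked on the warehouse delivery data. A minimal sketch that follows the population formulas (dividing by N, as `statistics.pvariance` also does); `summarize` is an illustrative name:

```python
import math
import statistics

# Delivery days of the 12 orders of each warehouse (from the delivery tables).
warehouse_a = [3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5]
warehouse_b = [1, 1, 1, 3, 3, 4, 5, 5, 5, 6, 7, 10]


def summarize(days):
    """Mean, range, population variance (divided by N) and standard deviation."""
    mean = statistics.fmean(days)
    variance = sum((x - mean) ** 2 for x in days) / len(days)
    return {
        "mean": mean,
        "range": max(days) - min(days),
        "variance": variance,
        "std_dev": math.sqrt(variance),
    }


for name, days in (("A", warehouse_a), ("B", warehouse_b)):
    s = summarize(days)
    print(name, s["mean"], s["range"], s["variance"], round(s["std_dev"], 2))
# A 4.25 2 0.6875 0.83
# B 4.25 9 6.6875 2.59
```

Both warehouses share the mean of 4.25 days, but B's range (9 vs. 2) and standard deviation (about 2.59 vs. 0.83 days) reveal much less predictable deliveries.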

SLIDE 52

Standard deviation

  • These bar charts compare each value with the mean, providing a simpler way of communicating standard deviation

SLIDE 53

Measures of ratio

  • A simple numerical relationship between two values
  • It can be used to summarize data as well

SLIDE 54

Money (but also college grades)

  • It is one of the few measures whose scale changes over time
    – inflation / deflation
    – exchange rates
  • In comparisons you have to take this into account

http://www.gapminder.org/
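
To compare amounts over time, nominal values are usually rescaled to constant (base-year) money with a price index. A minimal sketch; the revenues and index levels are hypothetical, not real statistics, and `to_constant_prices` is an illustrative name:

```python
def to_constant_prices(nominal, price_index, base_year):
    """Rescale nominal amounts so that every year is expressed in base-year money.

    price_index maps year -> index level (any base; only ratios matter).
    """
    base = price_index[base_year]
    return {year: amount * base / price_index[year] for year, amount in nominal.items()}


# Hypothetical revenues and price-index levels.
revenue = {2010: 100.0, 2015: 110.0, 2020: 120.0}
index = {2010: 100.0, 2015: 110.0, 2020: 125.0}

real = to_constant_prices(revenue, index, base_year=2010)
for year, amount in sorted(real.items()):
    print(year, round(amount, 1))
```

Here nominal revenue grows by 20% between 2010 and 2020, but in 2010 money it actually falls by 4%. Amounts in different currencies need a similar rescaling through exchange rates.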

SLIDE 55

Numbers that summarize

SLIDE 56

Outline

  • New ideas about good and bad graphs
  • Meaning of numbers
  • Tables and graphs
  • Basic table variations
  • Basic graph variations
  • Relationships in graphs

SLIDE 57

Tables and graphs

  • Tables and graphs are widely used to communicate quantitative information
  • Sometimes it is better to just show the (few) numbers
  • The goals of presenting quantitative data are
    – Analyzing
    – Monitoring
    – Planning
    – Communicating
  • Remember that we are dealing with data that is
    – Quantitative
    – Categorical
  • Not all numbers carry quantitative information
    – Categorical intervals
    – IDs (e.g., order numbers)

SLIDE 58

A very bad table…

SLIDE 59

Quantitative or categorical?

  • X axis?
  • Y axis?
  • Legend?
  • Bars?
  • Title?

SLIDE 60

A table without quantitative values

  • Monday: Fondamenti di Informatica (Foundations of Computer Science)
  • Tuesday: Fondamenti di Informatica
  • Wednesday: Fondamenti di Informatica + Inf. Visualization
  • Friday: Inf. Visualization

SLIDE 61

Tables

  • Data are arranged in columns and rows
  • Data are (usually) encoded as text
  • Tables are also used for non-quantitative information (just for spatial arrangement)
  • 1. Tables make it easy to look up values
  • 2. Tables can display simple relationships between quantitative values and categorical subdivisions
  • 3. Tables allow for local comparisons
  • 4. Tables provide high precision
  • 5. Tables make it easy to handle different units of measure

SLIDE 62

Choose a table when...

  • If one of the following is true, a table could be a good choice:
  • 1. The report you produce will be used to look up single values
  • 2. It will be used to compare individual values
  • 3. Precise values are required
  • 4. Different units of measure are involved

SLIDE 63

A table with non-numerical values

SLIDE 64

Graphs

  • A graph is a visual display of quantitative information
  • Quantitative information is encoded visually
  • More precisely, values are displayed along one or more axes
  • Axes provide scales (quantitative or categorical)

SLIDE 65

Graphs

  • A graph reveals the overall shape of the data
    – Trends
    – Outliers
    – Similarities and differences
  • On the other hand, a graph offers
    – Low precision
    – No easy lookup
    – No easy local comparison
    – No easy handling of different units

SLIDE 66

Outline

  • New ideas about good and bad graphs
  • Meaning of numbers
  • Tables and graphs
  • Basic table variations
  • Basic graph variations
  • Relationships in graphs

SLIDE 67

Fundamental variations in table design

  • Relationships in tables
    – Quantitative to categorical
    – Quantitative to quantitative
  • Variations in table design
    – Unidirectional
    – Bidirectional
    – Table design solutions

SLIDE 68

Quantitative to categorical relationships

  • 1. 1:1 - One set of quantitative values and one set of categorical subdivisions
  • 2. 1:n - One set of quantitative values and the intersection of multiple categories
  • 3. 1:hn - One set of quantitative values and the intersection of hierarchical categories

SLIDE 69

1:1 - One set of quantitative values and one set of categorical subdivisions

nominal

SLIDE 70

1:n - One set of quantitative values (sales) and the intersection of multiple categories (salespersons & months)

nominal + interval (time)

SLIDE 71

1:hn - One set of quantitative values (sales) and the intersection of hierarchical categories (Product Line -> Family -> Product)

Interaction could be a key issue. (Figure panels: "Interaction?" and "No interaction!")

SLIDE 72

Quantitative to quantitative relationships

  • 1. Among one set of quantitative values associated with multiple categorical subdivisions
  • 2. Among distinct sets of quantitative values associated with the same categorical subdivision

SLIDE 73

Among one set of quantitative values (sales) associated with multiple categorical subdivisions (sales by several salespersons in different months)

  • Here the focus is the comparison among homogeneous values

SLIDE 74

Among distinct sets of quantitative values (sales, returns, net) associated with the same categorical subdivision (a salesperson)

  • Here the focus is the comparison among NON-homogeneous values (what they share is not the unit but the category)

SLIDE 75

Variation - Unidirectional

  • Categories are arranged across columns or rows, but not in both directions

SLIDE 76

Variation - Unidirectional

  • Categories are arranged across columns or rows, but not in both directions (here we have two categories)

SLIDE 77

Variation - Bidirectional

  • Categories are on both axes
  • Such tables are called crosstabs or pivot tables
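
A crosstab can be built from a unidirectional list of records by using one category for the rows and the other for the columns. A minimal sketch in plain Python; the salesperson names, months and figures are hypothetical, and `pivot` is an illustrative name (spreadsheets and dataframe libraries provide their own pivot features):

```python
from collections import defaultdict


def pivot(records):
    """Turn (row_category, column_category, value) records into a crosstab.

    Returns (row_labels, column_labels, cells), where cells[(row, col)] is the
    sum of the values for that pair; labels keep their first-seen order.
    """
    cells = defaultdict(float)
    rows, cols = [], []
    for row, col, value in records:
        if row not in rows:
            rows.append(row)
        if col not in cols:
            cols.append(col)
        cells[(row, col)] += value
    return rows, cols, cells


# Hypothetical sales by salesperson and month.
records = [
    ("Rossi", "Jan", 10.0), ("Rossi", "Feb", 12.0),
    ("Bianchi", "Jan", 7.0), ("Bianchi", "Feb", 9.5),
]
rows, cols, cells = pivot(records)
print("\t".join(["", *cols]))
for r in rows:
    print("\t".join([r, *(f"{cells[(r, c)]:g}" for c in cols)]))
```

The four records become a 2 x 2 grid of cells, which is where the space saving of bidirectional tables comes from.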

SLIDE 78

Variation - Bidirectional

  • They save space

(Figure: the same data as a unidirectional and as a bidirectional table)

SLIDE 79

Graphs

  • Several components
    – scales on axes
    – grid lines
    – bars
    – legends
    – ...
  • Quantitative values
  • Categorical subdivisions

SLIDE 80

Variations in graphs

  • The primary source of variation is the choice (or combination) of the components used to encode quantitative values:
    – points
    – lines
    – bars
    – shapes with 2D area

SLIDE 81

Points

  • Scatter plot
  • Points vs. lines or bars

SLIDE 82

Points vs lines

  • Points and lines
  • Only lines
  • Use lines only when both axes are numerical or when there is an order (e.g., intervals)

SLIDE 83

Trend line (correlation)
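
A trend line is commonly obtained with an ordinary least-squares fit. A minimal sketch with hypothetical paired data; `linear_trend` is an illustrative name (Python 3.10+ also provides `statistics.linear_regression`):

```python
def linear_trend(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal: slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
slope, intercept = linear_trend(xs, ys)
print(f"y = {slope:.2f} x + {intercept:.2f}")
```

The fitted line is drawn across the scatter plot; the points themselves are not connected.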

SLIDE 84

Bars

  • Thickness is not relevant
  • Thickness must be constant

SLIDE 85

Bars

  • Do not lie!

SLIDE 86

Bars

  • Start the scale at zero!
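
Bar length is proportional to the distance between the value and the start of the scale, so a non-zero start changes the apparent ratio between bars. A minimal sketch; `apparent_ratio` and the two values are hypothetical:

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn lengths of bars a and b when the value axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("both values must lie above the baseline")
    return (b - baseline) / (a - baseline)


a, b = 95.0, 100.0
print(round(apparent_ratio(a, b), 3))                 # 1.053 (true ratio, scale from zero)
print(round(apparent_ratio(a, b, baseline=90.0), 3))  # 2.0 (scale starting at 90)
```

With the scale starting at 90, a difference of about 5% is drawn as one bar twice as long as the other.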

SLIDE 87

Shapes with 2D area

  • Classical pie chart
  • Part of a larger family of area graphs
  • Remember its limitations
  • Where is the scale?
  • Our visual perception is not good at accurately assessing and comparing quantitative values using areas (or, even worse, slices)

So, simply, do not use them at all!!

SLIDE 88

Bargrams (not used in business)

SLIDE 89

Categorical subdivision

  • Position
  • Color
  • Point shape
  • Fill pattern
  • Line style

SLIDE 90

Position

  • X axis

SLIDE 91

Color

  • We will see perceptual issues related to colors later...

SLIDE 92

Point shape

  • Only applicable when points represent quantitative values

SLIDE 93

Position, Color, Point shape

SLIDE 94

Fill pattern

mmm, hard to see and causing moiré vibration

SLIDE 95

Moiré vibration

Use only as a last resort

SLIDE 96

Line style

SLIDE 97

Outline

  • New ideas about good and bad graphs
  • Meaning of numbers
  • Tables and graphs
  • Basic table variations
  • Basic graph variations
  • Relationships in graphs

SLIDE 98

Relationships in Graphs

  • Nominal comparison
  • Time series
  • Ranking
  • Part-to-whole
  • Deviation
  • Distribution
  • Correlation

SLIDE 99

Nominal comparison

  • Nominal categorical attribute
  • Quantitative values that are compared with each other

SLIDE 100

Nominal comparison

  • If the bars are quite similar, it is possible to narrow the quantitative scale, removing the zero and focusing on the range between the lowest and highest values
  • In this case it is better to use points (do not lie!)

SLIDE 101

Time series

  • Time as categorical subdivision
  • Quantitative values that are compared with each other to show
    – Change
    – Rise
    – Fluctuation
    – Decline
    – Trend
    – ...

SLIDE 102

Ranking

  • Categorical subdivisions sorted by size
  • Quantitative values that are compared with each other for
    – Larger than
    – Smaller than
    – Equal to
    – nth position
    – ...

SLIDE 103

Part-to-whole

  • How individual quantitative values, associated with categorical subdivisions, relate to the complete set of values
  • It is a proportion, usually expressed as a percentage
  • Quantitative values that are compared with each other for
    – Percent
    – Share
    – ...

Problems of shapes with 2D areas (like pie charts)

SLIDE 104

Part-to-whole

  • Much better

SLIDE 105

Part-to-whole

  • Stacked bars can be useful (mmm...)

SLIDE 106

Part-to-whole - Pareto diagrams

  • Share of software errors
  • Less intuitive
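
A Pareto diagram sorts the categories by decreasing value and adds a line for the cumulative percentage. A minimal sketch of that computation; the error categories and counts are hypothetical, not the slide's data, and `pareto` is an illustrative name:

```python
def pareto(counts):
    """Sort categories by count (descending) and attach cumulative percentages."""
    total = sum(counts.values())
    cumulative = 0
    rows = []
    for category, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        cumulative += count
        rows.append((category, count, 100 * cumulative / total))
    return rows


# Hypothetical counts of software errors by type.
errors = {"UI": 12, "Logic": 45, "I/O": 25, "Config": 8, "Other": 10}
for category, count, cum_pct in pareto(errors):
    print(f"{category:<7}{count:>4}{cum_pct:>7.1f}%")
```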

SLIDE 107

Deviation design

  • The degree to which one or more quantitative values differ from a primary (reference) set of values
  • Color is categorical: bad vs. good data

SLIDE 108

Deviation design

  • Same data as percentage

SLIDE 109

Deviation design

  • Same data as percentage
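
Deviation can be expressed either in the original units or as a percentage of the reference value. A minimal sketch with hypothetical planned vs. actual sales; `deviations` is an illustrative name:

```python
def deviations(actual, reference):
    """Absolute and percentage deviation of each actual value from its reference value."""
    result = {}
    for key, value in actual.items():
        ref = reference[key]
        result[key] = (value - ref, 100 * (value - ref) / ref)
    return result


# Hypothetical planned vs. actual sales per region.
planned = {"Americas": 200.0, "Asia": 150.0, "Europe": 100.0}
actual = {"Americas": 210.0, "Asia": 120.0, "Europe": 100.0}
for region, (diff, pct) in deviations(actual, planned).items():
    print(f"{region:<9}{diff:+7.1f}{pct:+7.1f}%")
```

The percentage form makes deviations comparable across regions of very different size.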

SLIDE 110

Deviation design + time-series

SLIDE 111

Deviation design + time-series

  • Note that the horizontal line represents very different values

SLIDE 112

Control chart

  • µ and σ can give more information to experts
  • What is wrong with this graph?

SLIDE 113

Lines are wrong here!

(Figure: control chart of 26 samples with the mean µ and the µ+3σ limit)
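
The center line and the control limits can be computed from the mean and the standard deviation. A minimal sketch, assuming limits at µ ± 3σ (population σ) estimated from a hypothetical in-control baseline; other ways of estimating σ exist in practice, and both function names are illustrative:

```python
import statistics


def control_limits(baseline, k=3.0):
    """Return (lower limit, center line, upper limit) as mean -/+ k population std devs."""
    mu = statistics.fmean(baseline)
    sigma = statistics.pstdev(baseline)
    return mu - k * sigma, mu, mu + k * sigma


def out_of_control(samples, limits):
    """1-based sample numbers that fall outside the control limits."""
    low, _, high = limits
    return [i for i, value in enumerate(samples, start=1) if value < low or value > high]


# Hypothetical in-control baseline measurements and new samples.
baseline = [20, 22, 19, 21, 20, 18, 22, 19, 21, 20]
limits = control_limits(baseline)
print([round(x, 2) for x in limits])
print(out_of_control([21, 24.5, 19, 15], limits))  # [2, 4]
```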

SLIDE 114

Distribution (values)

  • Histogram

SLIDE 115

Distribution (shape)

  • Frequency polygon

SLIDE 116

Distribution (values+shape)

SLIDE 117

Multiple distributions (shapes)

SLIDE 118

Multiple distributions (boxplots)

  • 1. On average, women are paid less
  • 2. The disparity becomes increasingly greater as salary increases
  • 3. Salaries vary the most for women in the higher salary grades
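
Each boxplot is drawn from a five-number summary: minimum, first quartile, median, third quartile and maximum. A minimal sketch using `statistics.quantiles` (Python 3.8+); the two salary samples are hypothetical, `five_number_summary` is an illustrative name, and other tools may use a different quartile definition and draw slightly different boxes:

```python
import statistics


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical salary samples (in thousands) for two groups.
groups = {
    "group 1": [30, 32, 35, 38, 40, 41, 45, 50, 52],
    "group 2": [28, 29, 30, 31, 33, 34, 36, 60, 70],
}
for name, salaries in groups.items():
    print(name, five_number_summary(salaries))
```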

SLIDE 119

Correlation

SLIDE 120

Correlation

SLIDE 121

Correlation

SLIDE 122

Correlation

SLIDE 123

Summary

Relationship | Points | Lines | Points & Lines | Bars
Nominal comparison | When narrowing the scale and removing the zero | Avoid | Avoid | Horizontal or vertical
Time series | Avoid | x = time, y = quantitative; emphasis on trends | x = time, y = quantitative; emphasis on trends and individual values | x = time, y = quantitative; emphasis on individual values
Ranking | When narrowing the scale and removing the zero | Avoid | Avoid | Horizontal or vertical
Part-to-whole | Avoid | Avoid | Avoid | Horizontal or vertical
Deviation | Avoid | Useful combined with time series | Useful combined with time series and emphasis on individual values | Horizontal or vertical (vertical with time series)
Single distribution | Avoid | Emphasis on pattern | Avoid | Histogram
Multiple distributions | Use to mark the median in boxplots | Emphasis on pattern, up to 5 distributions | Avoid | As boxplots
Correlation | Scatter plot | Avoid | Only as a trend (not connecting points) | Horizontal or vertical