CS 147: Computer Systems Performance Analysis: Mistakes in Graphical Presentation



slide-1
SLIDE 1

CS 147: Computer Systems Performance Analysis

Mistakes in Graphical Presentation


2015-06-15

CS147

slide-2
SLIDE 2

Overview

Common Mistakes in Graphics
    Excess Information
    Multiple Scales
    Symbols for Text
    Poor Scales
    Bad Line Usage
Pictorial Games
    Non-Zero Origins
    Double Whammy
    No Confidence Intervals
    Height Scaling
    Histogram Problems
Graphical Integrity
Special-Purpose Charts
A Few Examples

slide-3
SLIDE 3

Common Mistakes in Graphics Excess Information

Excess Information

◮ Sneaky trick to meet length limits
◮ Rules of thumb:
    ◮ 6 curves on line chart
    ◮ 10 bars on bar chart
    ◮ 8 slices on pie chart
    ◮ (But note that Tufte hates pie charts)
◮ Extract essence; don’t cram things in
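The rules of thumb above can be turned into a quick sanity check before a chart goes into a paper. This is only a sketch: it reads the three numbers as upper limits (an assumption), and the names `MAX_ITEMS` and `too_much_information` are invented for illustration.

```python
# Rule-of-thumb limits from the slide, read here as upper bounds (an assumption).
MAX_ITEMS = {"line": 6, "bar": 10, "pie": 8}


def too_much_information(chart_type: str, item_count: int) -> bool:
    """Return True if a chart shows more items than the rule of thumb allows."""
    return item_count > MAX_ITEMS[chart_type]


print(too_much_information("line", 8))  # True: eight curves exceed the limit of six
print(too_much_information("line", 3))  # False
```

A failed check is a prompt to extract the essence, not a hard rule.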

slide-4
SLIDE 4

Common Mistakes in Graphics Excess Information

Way Too Much Information

[Figure: chart of Time (y axis, ticks 100 to 400) against REPL (x axis, 1 to 8), with eight series: CP, FIND, FINDGREP, GREP, LS, MAB, RCP, RM]

What’s important on that chart?

  • Times for cp and rcp rise with number of replicas
  • Most other benchmarks are near constant
  • Exactly constant for rm
slide-5
SLIDE 5

Common Mistakes in Graphics Excess Information

The Right Amount of Information

[Figure: chart of Time (y axis, ticks 100 to 400) against Replicas (x axis, 1 to 8), with three series: cp, compile, rm]

slide-6
SLIDE 6

Common Mistakes in Graphics Multiple Scales

Multiple Scales

◮ Another way to meet length limits
◮ Basically, two graphs overlaid on each other
◮ Confuses reader (which line goes with which scale?)
◮ Misstates relationships
    ◮ Implies equality of magnitude that doesn’t exist

slide-7
SLIDE 7

Common Mistakes in Graphics Multiple Scales

Some Especially Bad Multiple Scales

[Figure: Throughput and Response Time overlaid on a single chart with two different vertical scales, one of them labeled 10, 100, 1000]

slide-8
SLIDE 8

Common Mistakes in Graphics Symbols for Text

Using Symbols in Place of Text

◮ Graphics should be self-explanatory
    ◮ Remember that the graphs often draw the reader in
◮ So use explanatory text, not symbols
◮ This means no Greek letters!
    ◮ Unless your conference is in Athens...

slide-9
SLIDE 9

Common Mistakes in Graphics Symbols for Text

It’s All Greek To Me...

[Figure: plot with x axis labeled ρ (0.0 to 0.8) and y axis labeled w (ticks 2 to 12)]

slide-10
SLIDE 10

Common Mistakes in Graphics Symbols for Text

Explanation is Easy

[Figure: plot titled “Waiting Time as a Function of Offered Load”, with x axis labeled Offered Load (0.0 to 0.8) and y axis labeled Waiting Time (ticks 2 to 12)]

slide-11
SLIDE 11

Common Mistakes in Graphics Poor Scales

Poor Scales

◮ Fiddle with axis ranges (and logarithms) to get your message across
    ◮ But don’t lie or cheat
◮ Sometimes trimming off high ends makes things clearer
    ◮ Brings out low-end detail

slide-12
SLIDE 12

Common Mistakes in Graphics Poor Scales

A Poor Axis Range

[Figure: chart over 1st Qtr to 4th Qtr with a linear y axis, ticks 2000 to 12000]

slide-13
SLIDE 13

Common Mistakes in Graphics Poor Scales

A Logarithmic Range

[Figure: chart over 1st Qtr to 4th Qtr with a logarithmic y axis, ticks 1, 10, 100, 1000, 10000]

slide-14
SLIDE 14

Common Mistakes in Graphics Poor Scales

A Truncated Range

[Figure: chart over 1st Qtr to 4th Qtr with a truncated y axis, ticks 10 to 50 and then 10000]

slide-15
SLIDE 15

Common Mistakes in Graphics Bad Line Usage

Using Lines Incorrectly

◮ Don’t connect points unless interpolation is meaningful
◮ Don’t smooth lines that are based on samples
    ◮ Exception: fitted non-linear curves

slide-16
SLIDE 16

Common Mistakes in Graphics Bad Line Usage

Incorrect Line Usage

[Figure: chart of Time (y axis, ticks 100 to 400) against Replicas (x axis, 1 to 8), with three series: cp, compile, rm]

slide-17
SLIDE 17

Pictorial Games Non-Zero Origins

Non-Zero Origins and Broken Scales

◮ People expect (0,0) origins
    ◮ Subconsciously
◮ So non-zero origins are great way to lie
◮ More common than not in popular press
◮ Also very common to cheat by omitting part of scale
    ◮ “Really, Your Honor, I included (0,0)”

slide-18
SLIDE 18

Pictorial Games Non-Zero Origins

Non-Zero Origins

[Figure: two charts of Us versus Them over 1st Qtr to 4th Qtr; the first has y ticks 20 to 27, the second has y ticks 20 to 100]

slide-19
SLIDE 19

Pictorial Games Non-Zero Origins

The Three-Quarters Rule

Highest point should be 3/4 of scale or more

[Figure: chart of Us versus Them over 1st Qtr to 4th Qtr, y ticks 5 to 30]
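The three-quarters rule is easy to check numerically. The sketch below assumes the rule is measured over the visible axis span, from the axis minimum to the axis maximum. The function names and sample values are hypothetical.

```python
def satisfies_three_quarters_rule(values, axis_max, axis_min=0.0):
    """True if the highest data point reaches at least 3/4 of the axis span."""
    span = axis_max - axis_min
    if span <= 0:
        raise ValueError("axis_max must be greater than axis_min")
    return (max(values) - axis_min) / span >= 0.75


def largest_allowed_axis_max(values, axis_min=0.0):
    """Largest axis maximum that still puts the top point at 3/4 of the span."""
    return axis_min + (max(values) - axis_min) / 0.75


# Hypothetical quarterly values, invented for illustration only.
quarterly = [21, 22, 24, 26]
print(satisfies_three_quarters_rule(quarterly, axis_max=30))   # True
print(satisfies_three_quarters_rule(quarterly, axis_max=100))  # False
```

With a zero origin and a top value of 26, any axis maximum up to about 34.7 keeps the chart within the rule.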

slide-20
SLIDE 20

Pictorial Games Double Whammy

Double-Whammy Graphs

◮ Put two related measures on same graph
    ◮ One is (almost) function of other
◮ Hits reader twice with same information
    ◮ And thus overstates impact

[Figure: chart of Sales ($) and Units Shipped over 1st Qtr to 4th Qtr, y ticks 20 to 60]

slide-21
SLIDE 21

Pictorial Games No Confidence Intervals

Omitting Confidence Intervals

◮ Statistical data is inherently fuzzy
◮ But means appear precise
◮ Giving confidence intervals can make it clear there’s no real difference
◮ So liars and fools leave them out
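A small sketch of why the intervals matter. It uses a normal approximation from the Python standard library; for small samples a t-distribution would be more appropriate. The measurements are invented, and `confidence_interval` and `intervals_overlap` are hypothetical helper names.

```python
from statistics import NormalDist, mean, stdev


def confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(sample) / len(sample) ** 0.5
    centre = mean(sample)
    return centre - half_width, centre + half_width


def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical measurements, invented for illustration only.
us = [48, 55, 61, 50, 58, 47, 63, 52]
them = [45, 53, 57, 49, 60, 44, 56, 51]

print(mean(us), mean(them))  # the bare means suggest "us" is ahead
print(intervals_overlap(confidence_interval(us), confidence_interval(them)))
```

Here the means differ, but the intervals overlap. That is a warning that the apparent gap may not be real, which a chart of means alone would hide.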

slide-22
SLIDE 22

Pictorial Games No Confidence Intervals

Graph Without Confidence Intervals

[Figure: chart over 1st Qtr to 4th Qtr, y ticks 10 to 70, drawn without confidence intervals]

slide-23
SLIDE 23

Pictorial Games No Confidence Intervals

Graph With Confidence Intervals

[Figure: chart over 1st Qtr to 4th Qtr, y ticks 10 to 70, drawn with confidence intervals]

slide-24
SLIDE 24

Pictorial Games Height Scaling

Scaling by Height Instead of Area

Clip art is popular with illustrators:

[Figure: “Women in the Workforce”, clip-art figures for 1960 and 1980, scaled by height]

slide-25
SLIDE 25

Pictorial Games Height Scaling

The Trouble with Height Scaling

◮ Previous graph had heights of 2:1
◮ But people perceive areas, not heights
    ◮ So areas should be what’s proportional to data
◮ Tufte defines lie factor: size of effect in graphic divided by size of effect in data
    ◮ Not limited to area scaling
    ◮ But especially insidious there (quadratic effect)
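A worked example of the lie factor for the 2:1 case. The sketch takes "size of effect" to be the relative change between the two values. It also assumes the clip art is scaled uniformly, so doubling the height doubles the width as well. The function names are hypothetical.

```python
import math


def effect_size(first, second):
    """Relative change from the first value to the second."""
    return (second - first) / first


def lie_factor(data_first, data_second, shown_first, shown_second):
    """Size of effect in the graphic divided by size of effect in the data."""
    return effect_size(shown_first, shown_second) / effect_size(data_first, data_second)


# Data doubles (1 -> 2). Scaling the figure's height 2:1 scales its area 4:1.
height_ratio = 2.0
print(lie_factor(1.0, 2.0, 1.0, height_ratio ** 2))  # 3.0

# Scaling each linear dimension by sqrt(2) instead makes the area ratio 2:1.
corrected_height_ratio = math.sqrt(2.0)
print(lie_factor(1.0, 2.0, 1.0, corrected_height_ratio ** 2))  # close to 1.0
```

The first result shows the quadratic effect: a 100% change in the data is drawn as a 300% change in perceived area.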

slide-26
SLIDE 26

Pictorial Games Height Scaling

Scaling by Area

Same graph with 2:1 area:

[Figure: “Women in the Workforce”, clip-art figures for 1960 and 1980, scaled by area]

slide-27
SLIDE 27

Pictorial Games Histogram Problems

Poor Histogram Cell Size

◮ Picking bucket size is always problem
◮ Prefer 5 or more observations per bucket
◮ Choice of bucket size can affect results:

[Figure: histogram of one data set drawn with two bucket sizes, one as green bars and one as blue bars; axis ticks 5 to 30 and 2 to 12]

Note that green bars are steadily decreasing, but blue bars rise, fall, and rise again. It’s not clear which is correct (given small counts in the smaller buckets).
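The "5 or more observations per bucket" guideline can be checked before plotting. This sketch uses fixed-width buckets and reports the non-empty buckets that fall short. The data and function names are invented for illustration.

```python
def bucket_counts(values, bucket_width, start=0.0):
    """Count observations per fixed-width bucket, keyed by bucket index."""
    counts = {}
    for value in values:
        index = int((value - start) // bucket_width)
        counts[index] = counts.get(index, 0) + 1
    return counts


def sparse_buckets(values, bucket_width, start=0.0, minimum=5):
    """Lower edges of non-empty buckets with fewer than `minimum` observations."""
    counts = bucket_counts(values, bucket_width, start)
    return [start + index * bucket_width
            for index, count in sorted(counts.items()) if count < minimum]


# Hypothetical observations, invented for illustration only.
data = [1, 2, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9, 11, 12, 13, 14, 16, 18, 21, 27]
print(sparse_buckets(data, bucket_width=10))  # [20.0]
print(sparse_buckets(data, bucket_width=5))   # [10.0, 15.0, 20.0, 25.0]
```

Halving the bucket width leaves four buckets under the guideline instead of one. Those thin buckets are where the shape of a histogram is least trustworthy.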

slide-28
SLIDE 28

Graphical Integrity

Principles of Graphics Integrity (Tufte)

◮ Proportional representation of numbers
◮ Clear, detailed, thorough labeling
◮ Show data variation, not design variation
◮ Use deflated money units
◮ Don’t have more dimensions than data has
◮ Don’t quote data out of context

slide-29
SLIDE 29

Graphical Integrity

Proportional Representation of Numbers

◮ Maintain lie factor of 1.0
◮ Use areas, not heights, with clip art
◮ Avoiding “decorative” graphs will do wonders
    ◮ Not too hard for most engineers!

slide-30
SLIDE 30

Graphical Integrity

Clear, Detailed, Thorough Labeling

◮ Goal is to defeat distortion and ambiguity
◮ Write explanations on graphic itself
◮ Label important events in the data

slide-31
SLIDE 31

Graphical Integrity

Show Data Variation, Not Design Variation

◮ Use one design for entire graphic
◮ In papers, try to use one design for all graphs
◮ Again, artistic license is big culprit

slide-32
SLIDE 32

Graphical Integrity

Use Deflated Money Units

◮ Often necessary to show money over time
    ◮ Even in computer science
    ◮ E.g., price/performance over time
    ◮ Or expected future cost of a disk
◮ Nominal dollars are meaningless
◮ Derate by some standard inflation measure
    ◮ That’s what the WWW is for!
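Derating is a single multiplication once a price index is at hand. The index values and disk prices below are made up purely for illustration. A real chart would take the index from a published inflation measure. `deflate` is a hypothetical helper name.

```python
def deflate(nominal, index_then, index_base):
    """Convert a nominal amount into base-year money units."""
    return nominal * index_base / index_then


# Hypothetical price index and nominal prices, invented for illustration only.
price_index = {2000: 100.0, 2005: 113.0, 2010: 127.0}
nominal_cost = {2000: 200.0, 2005: 180.0, 2010: 160.0}

base_year = 2010
real_cost = {
    year: deflate(cost, price_index[year], price_index[base_year])
    for year, cost in nominal_cost.items()
}
for year in sorted(real_cost):
    print(year, round(real_cost[year], 2))
```

In these invented numbers the nominal price falls by 20%, while the price in base-year units falls from 254 to 160, a considerably larger drop.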

slide-33
SLIDE 33

Graphical Integrity

Don’t Have More Dimensions Than Data Has

◮ This gets back to the Lie Factor
◮ 1-D data (e.g., money) should occupy one dimension on the graph: [image] not [image]
◮ Clip art is prohibited by this rule
    ◮ But if you have to, use an area measure

[Figure labels: $1.00, $2.00]

slide-34
SLIDE 34

Graphical Integrity

Don’t Quote Data Out of Context

Tufte’s example:

[Figure: “Traffic Deaths and Enforcement of Speed Limits”, x axis 1954 to 1957, y ticks 250 to 350, with labels “Before stricter enforcement” and “After stricter enforcement”]

slide-35
SLIDE 35

Graphical Integrity

The Same Data in Context

[Figure: “Connecticut Traffic Deaths, 1951-1959”, x axis 1950 to 1960, y ticks 50 to 350]

slide-36
SLIDE 36

Special-Purpose Charts

Special-Purpose Charts

◮ Tukey’s box plot
◮ Histograms
◮ Scatter plots
◮ Gantt charts
◮ Kiviat graphs

slide-37
SLIDE 37

Special-Purpose Charts

Tukey’s Box Plot

◮ Shows range, median, quartiles all in one:

[Figure: box plot with labels minimum, quartile, median, quartile, maximum]

◮ Tufte can’t resist improvements:

[Figure: three alternative box plot designs, joined by “or” and “or even”]
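The five values a box plot draws can be computed with the Python standard library. Quartile conventions differ between tools. The "inclusive" method used here is one choice among several, not something the slides specify, and `five_number_summary` is a hypothetical helper name.

```python
from statistics import quantiles


def five_number_summary(sample):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    lower, median, upper = quantiles(sample, n=4, method="inclusive")
    return min(sample), lower, median, upper, max(sample)


# Hypothetical sample, invented for illustration only.
print(five_number_summary([2, 4, 4, 5, 7, 9, 10, 12, 15]))
```

The box spans the two quartiles, the line inside it marks the median, and the whiskers reach out to the minimum and maximum.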

slide-38
SLIDE 38

Special-Purpose Charts

Histograms

Tufte improves everything about them:

[Figure: chart over 1st to 4th Quarter, y ticks 20 to 100]

slide-39
SLIDE 39

Special-Purpose Charts

Scatter Plots

◮ Useful in statistical analysis
◮ Also excellent for huge quantities of data
    ◮ Can show patterns otherwise invisible

[Figure: scatter plot; axis ticks 5, 10 and 5 to 20]

slide-40
SLIDE 40

Special-Purpose Charts

Better Scatter Plots

◮ Again, Tufte improves the standard
    ◮ But it can be a pain with automated tools
◮ Can use modified Tukey box plot for axes:

[Figure: scatter plot with modified box plots as axes; axis ticks 20 to 80 and 10 to 40]

slide-41
SLIDE 41

Special-Purpose Charts

Gantt Charts

◮ Shows relative duration of Boolean conditions
◮ Arranged to make lines continuous
    ◮ Each level after first follows FTTF pattern
    ◮ (Possibly repeated)

[Figure: Gantt chart with a horizontal axis from 20 to 100% and row labels Network, I/O, CPU]

Gantt charts are any chart with horizontal lines showing spans on the X axis. Also useful for scheduling; shows simultaneous tasks. Lines are divided in mid-true; any vertical line shows one unique combo of conditions. Length of line with particular condition shows percentage of time system spends in that state.

slide-42
SLIDE 42

Special-Purpose Charts

Kiviat Graphs

◮ Also called “star charts” or “radar plots”
◮ Useful for looking at balance between HB (higher is better) and LB (lower is better) metrics

slide-43
SLIDE 43

A Few Examples

A Very Bad Graph


slide-44
SLIDE 44

A Few Examples

A Good Graph: Sunspots

Vertical scale is latitude of sunspot; length of bar is extent of latitude width of sunspot (longitude width is not in the graph). The 11-year cycle is easily visible. The horizontal scale is empty in a few places where sunspot data extends into it. This graph was drawn in 1904 by Edward Walter Maunder (1851-1928). It is commonly called a “butterfly diagram” for obvious reasons.

slide-45
SLIDE 45

A Few Examples

A Superb Graph: DEC Traces

X axis is time (instructions executed). Y axis is memory address referenced, modulo 4 MB. Red lines are data accesses, blue instructions. Green is perhaps stack? Note how parallel access to arrays is easy to see, as well as occasional faster access and reverse-order access.