CS 147: Computer Systems Performance Analysis
Mistakes in Graphical Presentation
2015-06-15
Overview
◮ Common Mistakes in Graphics
  ◮ Excess Information
  ◮ Multiple Scales
  ◮ Symbols for Text
  ◮ Poor Scales
  ◮ Bad Line Usage
◮ Pictorial Games
  ◮ Non-Zero Origins
  ◮ Double Whammy
  ◮ No Confidence Intervals
  ◮ Height Scaling
  ◮ Histogram Problems
◮ Graphical Integrity
◮ Special-Purpose Charts
◮ A Few Examples
Common Mistakes in Graphics
Excess Information
◮ Sneaky trick to meet length limits
◮ Rules of thumb:
  ◮ 6 curves on line chart
  ◮ 10 bars on bar chart
  ◮ 8 slices on pie chart
  ◮ (But note that Tufte hates pie charts)
◮ Extract essence; don’t cram things in
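The rules of thumb above can be turned into a quick pre-submission check. This is a minimal sketch: `CHART_LIMITS` and `check_chart` are hypothetical names, and the limits are only the slide's rules of thumb, not hard standards.

```python
from typing import Optional

# Hypothetical table encoding the slide's rules of thumb:
# at most 6 curves per line chart, 10 bars per bar chart, 8 slices per pie chart.
CHART_LIMITS = {"line": 6, "bar": 10, "pie": 8}


def check_chart(kind: str, n_items: int) -> Optional[str]:
    """Return a warning string if the chart exceeds its rule of thumb, else None."""
    limit = CHART_LIMITS[kind]
    if n_items > limit:
        return f"{kind} chart has {n_items} items; rule of thumb is at most {limit}"
    return None


# The "Way Too Much Information" line chart has 8 curves (CP, FIND, ..., RM).
print(check_chart("line", 8))  # line chart has 8 items; rule of thumb is at most 6
print(check_chart("line", 3))  # None
```

Passing the check does not make a chart good; it only flags the obvious cases of cramming.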
Way Too Much Information
[Figure: Time vs. Replicas (1 to 8) with eight curves: CP, FIND, FINDGREP, GREP, LS, MAB, RCP, RM]
The Right Amount of Information
[Figure: Time vs. Replicas (1 to 8) with three curves: cp, compile, rm]
Multiple Scales
◮ Another way to meet length limits
◮ Basically, two graphs overlaid on each other
◮ Confuses reader (which line goes with which scale?)
◮ Misstates relationships
  ◮ Implies equality of magnitude that doesn’t exist
Some Especially Bad Multiple Scales
[Figure: Throughput and Response Time overlaid on one graph with two different y-axis scales (10 to 45 and 10 to 1000)]
Using Symbols in Place of Text
◮ Graphics should be self-explanatory
  ◮ Remember that the graphs often draw the reader in
◮ So use explanatory text, not symbols
  ◮ This means no Greek letters!
  ◮ Unless your conference is in Athens...
It’s All Greek To Me...
[Figure: w plotted against ρ (ρ from 0.0 to 0.8, w from 2 to 12), with axes labeled only by symbols]
Explanation is Easy
[Figure: the same curve with axes labeled Offered Load (0.0 to 0.8) and Waiting Time (2 to 12), plus a descriptive title]
Poor Scales
◮ Fiddle with axis ranges (and logarithms) to get your message across
  ◮ But don’t lie or cheat
◮ Sometimes trimming off high ends makes things clearer
  ◮ Brings out low-end detail
A Poor Axis Range
[Figure: quarterly data (1st to 4th Qtr) on a linear axis running to 12000]
A Logarithmic Range
[Figure: the same quarterly data on a logarithmic axis from 1 to 10000]
A Truncated Range
[Figure: the same quarterly data on an axis from 10 to 50, with a break up to 10000]
Using Lines Incorrectly
◮ Don’t connect points unless interpolation is meaningful
◮ Don’t smooth lines that are based on samples
  ◮ Exception: fitted non-linear curves
Incorrect Line Usage
[Figure: the Time vs. Replicas curves for cp, compile, and rm, drawn with incorrect line usage]
Pictorial Games
Non-Zero Origins and Broken Scales
◮ People expect (0,0) origins
  ◮ Subconsciously
◮ So non-zero origins are a great way to lie
  ◮ More common than not in the popular press
◮ Also very common to cheat by omitting part of the scale
  ◮ “Really, Your Honor, I included (0,0)”
Non-Zero Origins
[Figure: the same Us vs. Them quarterly data plotted twice, first on an axis from 20 to 27, then on an axis running to 100]
The Three-Quarters Rule
Highest point should be 3/4 of scale or more
[Figure: Us vs. Them quarterly data on an axis running to 30]
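The two axis rules so far (start at zero; highest point at 3/4 of the scale or more) are easy to check mechanically before a graph goes out. A minimal sketch; `axis_warnings` is a hypothetical helper and the Us values are made up, since the slide's exact numbers are not recoverable.

```python
def axis_warnings(values, axis_min, axis_max):
    """Check a y-axis range against two rules from these slides.

    values: the data points being plotted
    axis_min, axis_max: the bottom and top of the y-axis
    """
    warnings = []
    # Readers subconsciously expect the axis to start at zero.
    if axis_min != 0:
        warnings.append(f"non-zero origin: axis starts at {axis_min}")
    # Three-quarters rule: the highest point should reach 3/4 of the scale or more.
    if max(values) < axis_min + 0.75 * (axis_max - axis_min):
        warnings.append("highest point is below 3/4 of the scale")
    return warnings


us = [22, 24, 23, 26]  # hypothetical quarterly values
print(axis_warnings(us, 20, 27))   # ['non-zero origin: axis starts at 20']
print(axis_warnings(us, 0, 100))   # ['highest point is below 3/4 of the scale']
print(axis_warnings(us, 0, 30))    # []
```

The two rules pull in opposite directions: a zero origin alone can squash the data, so the 3/4 check keeps the top of the axis honest without cutting off the bottom.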
Double-Whammy Graphs
◮ Put two related measures on the same graph
  ◮ One is (almost) a function of the other
◮ Hits reader twice with the same information
  ◮ And thus overstates impact
[Figure: Sales ($) and Units Shipped by quarter (1st to 4th Qtr) on the same graph]
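One way to spot a potential double whammy before plotting is to check how strongly the two series are correlated: a Pearson correlation near ±1 means one is (almost) a linear function of the other. A sketch with made-up data; `pearson` is written out by hand here, and what counts as "near 1" is a judgment call.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical quarterly data: every unit sells for the same price,
# so sales are exactly a function of units shipped.
units_shipped = [12, 18, 25, 31]
sales = [2.5 * u for u in units_shipped]
print(round(pearson(units_shipped, sales), 3))  # 1.0
```

Correlation only detects linear dependence, so a low value does not prove the measures are independent.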
Omitting Confidence Intervals
◮ Statistical data is inherently fuzzy
◮ But means appear precise
◮ Giving confidence intervals can make it clear there’s no real difference
  ◮ So liars and fools leave them out
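Computing the intervals takes only a few lines. This sketch assumes a normal (z) quantile for simplicity; for small samples a Student t quantile with n-1 degrees of freedom is the usual choice and gives wider intervals. The sample values and helper names are hypothetical.

```python
import math
import statistics


def confidence_interval(samples, confidence=0.90):
    """Two-sided confidence interval for the mean.

    Assumption: uses the normal (z) quantile; a t quantile is more
    appropriate for small samples and widens the interval.
    """
    n = len(samples)
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * sem, mean + z * sem


def overlaps(a, b):
    """True if two (low, high) intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


q1 = [41, 47, 52, 44, 49]  # hypothetical measurements, 1st quarter
q2 = [45, 50, 43, 53, 48]  # hypothetical measurements, 2nd quarter
print(overlaps(confidence_interval(q1), confidence_interval(q2)))  # True
```

Overlap is only a quick visual cue; deciding whether two means really differ calls for a test on their difference.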
Graph Without Confidence Intervals
[Figure: quarterly means (1st to 4th Qtr) on an axis from 10 to 70, with no confidence intervals]
Graph With Confidence Intervals
[Figure: the same quarterly means with confidence intervals shown]
Scaling by Height Instead of Area
Clip art is popular with illustrators:
[Figure: “Women in the Workforce,” clip art for 1960 and 1980 with heights in a 2:1 ratio]
The Trouble with Height Scaling
◮ Previous graph had heights of 2:1
◮ But people perceive areas, not heights
  ◮ So areas should be what’s proportional to data
◮ Tufte defines lie factor: size of effect in graphic divided by size of effect in data
  ◮ Not limited to area scaling
  ◮ But especially insidious there (quadratic effect)
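The quadratic effect follows from simple arithmetic: scaling both width and height by k multiplies the area by k squared, so to make areas proportional to the data each side must scale by the square root of the data ratio. A sketch assuming the clip art keeps its aspect ratio; `clip_art_side_scale` is a hypothetical helper.

```python
import math


def clip_art_side_scale(value, base_value):
    """Factor to scale a picture's width AND height by so its area is proportional to value.

    Scaling both dimensions by k multiplies the area by k**2,
    so k = sqrt(value / base_value).
    """
    return math.sqrt(value / base_value)


ratio = 2.0  # the data doubled (2:1)
# Doubling the height while keeping the aspect ratio also doubles the width.
print(ratio ** 2)                                  # 4.0: area grows 4:1
print(round(clip_art_side_scale(ratio, 1.0), 3))  # 1.414: area grows 2:1
```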
Scaling by Area
Same graph with 2:1 area:
[Figure: “Women in the Workforce,” clip art for 1960 and 1980 with areas in a 2:1 ratio]
Poor Histogram Cell Size
◮ Picking bucket size is always a problem
◮ Prefer 5 or more observations per bucket
◮ Choice of bucket size can affect results:
[Figure: histogram example showing the effect of bucket size]
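The 5-observations guideline can be checked for any candidate bucket width. A minimal sketch with made-up data; `bucket_counts` and `sparse_buckets` are hypothetical helpers, and buckets are assumed to be half-open intervals starting at `start`.

```python
from collections import Counter


def bucket_counts(data, width, start=0):
    """Count observations in buckets [start + i*width, start + (i+1)*width)."""
    counts = Counter(int((x - start) // width) for x in data)
    # Include empty buckets between the lowest and highest occupied ones.
    lo, hi = min(counts), max(counts)
    return {start + i * width: counts.get(i, 0) for i in range(lo, hi + 1)}


def sparse_buckets(counts, minimum=5):
    """Start values of buckets holding fewer than `minimum` observations."""
    return [b for b, c in counts.items() if c < minimum]


data = [3, 5, 6, 7, 8, 9, 11, 12, 12, 13, 14, 15, 16, 17, 18, 21, 22, 24, 27, 29]
print(bucket_counts(data, 5))                   # {0: 1, 5: 5, 10: 5, 15: 4, 20: 3, 25: 2}
print(sparse_buckets(bucket_counts(data, 5)))   # [0, 15, 20, 25]
print(sparse_buckets(bucket_counts(data, 10)))  # []
```

Widening the buckets satisfies the guideline here, but it also hides detail, so the final choice of width is still a judgment call.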
Graphical Integrity
Principles of Graphics Integrity (Tufte)
◮ Proportional representation of numbers
◮ Clear, detailed, thorough labeling
◮ Show data variation, not design variation
◮ Use deflated money units
◮ Don’t have more dimensions than data has
◮ Don’t quote data out of context
Proportional Representation of Numbers
◮ Maintain lie factor of 1.0
◮ Use areas, not heights, with clip art
◮ Avoiding “decorative” graphs will do wonders
  ◮ Not too hard for most engineers!
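Tufte measures the size of an effect as |second value − first value| / first value, and the lie factor is the size of effect shown in the graphic divided by the size of effect in the data. A sketch applying it to the clip-art example; the function names are hypothetical, and it assumes readers perceive the clip art's area.

```python
def effect_size(first, second):
    """Tufte's 'size of effect': relative change from first to second."""
    return abs(second - first) / first


def lie_factor(data_first, data_second, graphic_first, graphic_second):
    """Size of effect shown in the graphic divided by size of effect in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)


# Data doubles (2:1). Clip art doubled in height, with its aspect ratio kept,
# quadruples in area, which is what readers perceive.
print(lie_factor(1, 2, 1, 4))  # 3.0
# Scaling the area 2:1 instead keeps the lie factor at 1.0.
print(lie_factor(1, 2, 1, 2))  # 1.0
```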
Clear, Detailed, Thorough Labeling
◮ Goal is to defeat distortion and ambiguity
◮ Write explanations on the graphic itself
◮ Label important events in the data
Show Data Variation, Not Design Variation
◮ Use one design for the entire graphic
◮ In papers, try to use one design for all graphs
◮ Again, artistic license is the big culprit
Use Deflated Money Units
◮ Often necessary to show money over time
  ◮ Even in computer science
  ◮ E.g., price/performance over time
  ◮ Or expected future cost of a disk
◮ Nominal dollars are meaningless
  ◮ Derate by some standard inflation measure
  ◮ That’s what the WWW is for!
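Derating is a single ratio: constant (base-year) dollars = nominal dollars × index in base year / index in the year of the price, where the index is a published price index such as a consumer price index. In this sketch the index values and disk prices are invented purely for illustration; `to_constant_dollars` is a hypothetical helper.

```python
def to_constant_dollars(nominal, index_in_year, index_in_base_year):
    """Convert a nominal amount to base-year dollars using a price index."""
    return nominal * index_in_base_year / index_in_year


price_index = {2005: 80.0, 2010: 90.0, 2015: 100.0}  # hypothetical index values
disk_price = {2005: 120.0, 2010: 90.0, 2015: 60.0}   # hypothetical nominal prices

for year in sorted(disk_price):
    real = to_constant_dollars(disk_price[year], price_index[year], price_index[2015])
    # Nominal prices fall by half; in 2015 dollars they fall by 60%.
    print(year, disk_price[year], round(real, 2))
```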
Don’t Have More Dimensions Than Data Has
◮ This gets back to the Lie Factor
◮ 1-D data (e.g., money) should occupy one dimension on the graph, not this:
[Figure: $1.00 and $2.00 drawn as two-dimensional pictures]
◮ Clip art is prohibited by this rule
  ◮ But if you have to, use an area measure
Don’t Quote Data Out of Context
Tufte’s example:
[Figure: “Traffic Deaths and Enforcement of Speed Limits,” 1954 to 1957 on an axis from 250 to 350, annotated “Before stricter enforcement” and “After stricter enforcement”]
The Same Data in Context
[Figure: “Connecticut Traffic Deaths, 1951-1959,” yearly values plotted over 1950 to 1960 on an axis from 50 to 350]
Special-Purpose Charts
◮ Tukey’s box plot
◮ Histograms
◮ Scatter plots
◮ Gantt charts
◮ Kiviat graphs
Tukey’s Box Plot
◮ Shows range, median, quartiles all in one:
[Figure: box plot labeled minimum, quartile, median, quartile, maximum]
◮ Tufte can’t resist improvements:
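The five numbers a box plot draws can be computed with Python's standard `statistics` module. Quartile conventions differ between tools; this sketch picks `method="inclusive"` (interpolation between order statistics) as one common choice, so another tool may report slightly different quartiles for the same data. The data values are made up.

```python
import statistics


def five_number_summary(data):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


print(five_number_summary([2, 4, 4, 5, 7, 8, 9, 11, 15]))  # (2, 4.0, 7.0, 9.0, 15)
```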
Histograms
Tufte improves everything about them:
[Figure: histogram by quarter (1st to 4th) on an axis running to 100]
Scatter Plots
◮ Useful in statistical analysis
◮ Also excellent for huge quantities of data
◮ Can show patterns otherwise invisible
[Figure: example scatter plot]
Better Scatter Plots
◮ Again, Tufte improves the standard
  ◮ But it can be a pain with automated tools
◮ Can use modified Tukey box plot for axes:
[Figure: scatter plot whose axes are drawn as modified box plots]
Gantt Charts
◮ Shows relative duration of Boolean conditions
◮ Arranged to make lines continuous
◮ Each level after the first follows an FTTF (false, true, true, false) pattern
  ◮ (Possibly repeated)
[Figure: Gantt chart of CPU, I/O, and Network activity, 0 to 100%]
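The level patterns above can be generated mechanically. This sketch computes only the order of the true/false segments for n conditions, not their widths (in a real chart each width is the measured fraction of time that combination holds). `gantt_levels` is a hypothetical helper, and assigning CPU, I/O, and Network to levels 1 to 3 is illustrative. The resulting order is a reflected Gray code: neighboring segments differ in exactly one condition, which is what keeps the lines continuous.

```python
def gantt_levels(n):
    """Truth-value pattern for each of n Boolean conditions (levels).

    Level 1 is T...T F...F; every later level is F T T F, repeated and
    stretched so that all levels have 2**n segments.
    """
    levels = []
    for level in range(1, n + 1):
        run = 2 ** (n - level)  # width of each run of equal values
        if level == 1:
            base = [True, False]
        else:
            base = [False, True, True, False] * (2 ** (level - 2))
        levels.append([value for value in base for _ in range(run)])
    return levels


def as_text(pattern):
    """Render a pattern as a string of T and F characters."""
    return "".join("T" if v else "F" for v in pattern)


# Illustrative assignment of conditions to levels.
for name, pattern in zip(["CPU", "I/O", "Network"], gantt_levels(3)):
    print(f"{name}: {as_text(pattern)}")
# CPU: TTTTFFFF
# I/O: FFTTTTFF
# Network: FTTFFTTF
```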
Kiviat Graphs
◮ Also called “star charts” or “radar plots”
◮ Useful for looking at balance between higher-is-better (HB) and lower-is-better (LB) metrics
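Geometrically, a Kiviat graph places each metric on its own axis radiating from a common center and joins the points into a polygon. A sketch of the vertex computation, assuming each metric is already normalized to the range 0 to 1 (for example, a utilization fraction) and that axes are spaced evenly, starting straight up and going clockwise; `kiviat_vertices` is a hypothetical helper.

```python
import math


def kiviat_vertices(values):
    """(x, y) vertex of each metric on a Kiviat (radar) graph.

    Assumes values are normalized to 0..1; axes are evenly spaced,
    starting at 12 o'clock and going clockwise.
    """
    n = len(values)
    points = []
    for i, v in enumerate(values):
        angle = math.pi / 2 - 2 * math.pi * i / n  # clockwise from straight up
        points.append((v * math.cos(angle), v * math.sin(angle)))
    return points


# Four hypothetical metrics, all at 100%: the polygon is a square "diamond".
for x, y in kiviat_vertices([1.0, 1.0, 1.0, 1.0]):
    print(round(x, 3), round(y, 3))
```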
A Few Examples
A Very Bad Graph
A Good Graph: Sunspots
A Superb Graph: DEC Traces