What can go wrong with statistics: Some typical errors & How to - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Network Security, WS 2008/09, Chapter 9 / IN2045 – Discrete Event Simulation, WS 2011/2012

Chair for Network Architectures and Services, Prof. Carle
Department of Computer Science, TU München

What can go wrong with statistics: Some typical errors & How to lie with statistics

Content adapted partially from: Lutz Prechelt, Darrell Huff, Jon Hasenbank, Gerd Bosbach / Jens Jürgen Korff

slide-2
SLIDE 2


Some Starters – Russian Election 2012

slide-3
SLIDE 3


Some Starters – Car Production in UK


slide-4
SLIDE 4


Motivation

“There are three kinds of lies: Lies, Damned Lies, and Statistics.” – attributed to Benjamin Disraeli

• Statistics are commonly used to make a point or to back up one's position
  • 82.7% of all statistics are made up on the spot.
• Three sources of errors:
  • If done in a manipulative way, statistics can be deceiving
  • If not done carefully, statistics can be deceiving
    • Inadvertent methodological errors and/or wrong assumptions will also fool the person who is doing the statistics!
  • If not read carefully, statistics can be deceiving
slide-5
SLIDE 5


Purpose of this talk

• Avoid common inadvertent errors
  • “Lessons for author”
• Be aware of the subtle tricks that others may play on you
  • (and that you should never play on others!)
  • “Lessons for reader”
slide-6
SLIDE 6


Source #1

• Large parts of this slide set are based on ideas from Darrell Huff: How to Lie With Statistics (Victor Gollancz 1954, Pelican Books 1973, Penguin Books 1991)
  • but the slides use different examples
  • Most slides made by Lutz Prechelt
• The book is short (120 p.), entertaining, and insightful
  • Many different editions available
  • Other, similar books exist as well

slide-7
SLIDE 7


Source #2

• Other source of ideas: Gerd Bosbach, Jens Jürgen Korff: Lügen mit Zahlen (Heyne-Verlag, 2nd edition, 2011)
  • The book is very readable and entertaining
  • You may notice strong political opinions – sometimes you might ask yourself whether the book itself uses the power of numbers and graphs to manipulate the reader…

slide-8
SLIDE 8


Example: Human Growth Hormone Spam (HGH)

slide-9
SLIDE 9


Remark

• We use this real spam email as an arbitrary example
• and will make unwarranted assumptions about what is behind it
  • for illustrative purposes
  • I do not claim that HGH treatment is useful, useless, or harmful

Note:

• HGH is on the IOC doping list
  • http://www.dshs-koeln.de/biochemie/rubriken/01_doping/06.html
  • "For the therapeutic use of HGH, only two major clinical conditions currently come into consideration: dwarfism in children and HGH deficiency in adults"
  • "The effectiveness of HGH in athletes, however, must so far be strongly questioned, since no scientific study has yet been able to show that additional HGH administration in persons with normal HGH production can lead to performance increases."

slide-10
SLIDE 10


Problem 1: What do they mean?

• "Body fat loss: up to 82%"
  • OK, can be measured
• "Wrinkle reduction: up to 61%"
  • Maybe they count the wrinkles and measure their depth?
• "Energy level: up to 84%"
  • What is this?
• Also note they use language loosely:
  • Loss in percent: OK; reduction in percent: OK
  • Level in percent??? (should be 'increase')
slide-11
SLIDE 11


Lesson for readers: What did they actually measure?

• Always question the definition of the measures for which somebody gives you statistics
  • Surprisingly often, there is no stringent definition at all
  • Or multiple different definitions are used
    • and incomparable data get mixed
  • Or the definition has dubious value
    • For example, "Energy level" may be a subjective estimate of patients who knew they were treated with a "wonder drug"

slide-12
SLIDE 12


Lesson for authors: Be clear about what you measure

• Before you start:
  • What effect do you want to analyze?
  • What could be good metrics to measure it?
  • Try out different metrics and compare them
• When writing things up:
  • Define your metrics clearly and understandably.
  • Bad example: “We analyzed the delays in our simulated network”.
    • One-way or RTT?
    • Total delays? But what if wire length is constant?
  • Good example: “We analyzed the one-way delays in our simulated network. Since propagation delays are constant in a wired network, we analyzed only the queuing delays and transmission delays.”

slide-13
SLIDE 13


Problem 2: A maximum does not say much

• Wrinkle reduction: up to 61%
• So that was the best value. What about the rest?
• Maybe the distribution was like this:

[Figure: distribution of the measured values along an axis "reduction" (ticks up to 60), with individual points marked "o" and a marker "M"]

Note: This data is pure fantasy!

slide-14
SLIDE 14


Lesson for readers: Dare ask for unbiased measures

• Always ask for neutral, informative measures
  • in particular when talking to a party with vested interest
• Extremes are rarely useful to show that something is generally large (or small)
• Averages are better
  • But even averages can be very misleading
    • see the example later in this presentation
• If the shape of the distribution is unknown, we need summary information about variability at the very least
  • e.g. the data from the plot in the previous slide has arithmetic mean 10 and standard deviation 8
• Note: In different situations, rather different kinds of information might be required for judging something

slide-15
SLIDE 15


Lesson for authors: Is it really significant?

• Are there many outliers?
• Do not use minimum or maximum values for comparison of, e.g., “before – after”
  • Compare the means
  • Think about what kind of mean to use:
    • Arithmetic mean?
    • Geometric mean?
  • Better: compare the medians
• Or even better: Use statistical tests (e.g., Student's t-test) to show that the change (before – after) is statistically significant
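
The difference between these summaries can be sketched in a few lines of Python. The list `reductions` below is invented purely for illustration (it is not data from the slides); it contains one extreme value of 61 so that the maximum, the two means and the median visibly disagree.

```python
import statistics

# Hypothetical "wrinkle reduction" percentages, invented for this sketch.
# One extreme value (61) plays the role of the advertised "up to 61%".
reductions = [2, 3, 4, 5, 5, 6, 8, 9, 12, 61]

summary = {
    "maximum": max(reductions),
    "arithmetic mean": statistics.mean(reductions),
    "geometric mean": statistics.geometric_mean(reductions),
    "median": statistics.median(reductions),
}

for name, value in summary.items():
    print(f"{name:>15}: {value:.1f}")
```

With this made-up sample the maximum is 61, the arithmetic mean is 11.5 and the median is 5.5: the single outlier pulls the arithmetic mean well above what a typical participant saw, while the median barely notices it.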

slide-16
SLIDE 16


Problem 3: Underlying population

• Wrinkle reduction: up to 61%
• Maybe they measured a very special set of people?

[Figure: values along an axis "reduction" for two groups, "healthy" and "heartAttack"]

Note: This data is pure fantasy!

slide-17
SLIDE 17


Lesson: Insist on unbiased samples

• How and where the data was collected can have a tremendous impact on the results
• It is important to understand whether there is a certain (possibly intended) tendency in this
• A fair statistic talks about possible bias it contains
• If it does not, ask.

Notes:

• A biased sample may be the best one can get
• Sometimes we can suspect that there is a bias, but cannot be sure, and we do not know the exact type of the bias

slide-18
SLIDE 18


Lesson 4: ‘Cum hoc ergo propter hoc’ is wrong!

• Translation: “With this, therefore because of this”
• Meaning: Correlation does not mean causation
• Correlation may suggest causation (effect A causes effect B), but there can also be other reasons for a correlation between A and B
• Nitpicking: ‘Post hoc ergo propter hoc’ is almost the same thing:
  • After this, therefore because of this
  • Implies a temporal relation between A and B,
  • whereas ‘cum hoc…’ only implies some correlation
slide-19
SLIDE 19


Correlation does not mean causation (1)

• “If A is correlated with B, then A causes B”
  • Perhaps neither of these things has produced the other, but both are a product of some third factor C
  • It may be the other way round: B causes A
  • Correlation can actually be of any of several types and can be limited to a range
  • The correlation may be pure coincidence, e.g. #pirates vs. global temperature
  • Given a small sample, you are likely to find some substantial correlation between any pair of characteristics or events

slide-20
SLIDE 20


Correlation does not mean causation (2)

• Example 1: “Queueing delays increased; therefore throughput for individual TCP connections decreased”
  • Could be true
  • Could be due to an increased # of total TCP connections
  • Could be actually unrelated
• Example 2: “Chance for recovery decreases with an increasing period of cancer treatment by radiation; this shows that longer exposure to radiation is dangerous”. Well, maybe, but…
  • …usually, longer therapies are required for more severe/bigger types of cancer – and you are less likely to survive these

slide-21
SLIDE 21


Correlation does not mean causation (3)

• Example 3: “Birth rates have been decreasing for decades. So has the number of storks. This proves that babies are delivered by the stork!”
• Example 4: “The number of TV stations has increased, as well as the amount of money that people spend on travelling. This proves the effectiveness of travel ads on TV.”

slide-22
SLIDE 22


Correlation does not mean causation: Lessons

• Often, there is a hidden background variable (e.g., size of the tumor)
• Time is a good candidate for a background variable (e.g., storks vs. babies, TV stations vs. travel expenses)
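
How time can act as a hidden background variable may be easier to see in a small simulation. The sketch below invents two series ("storks" and "births") that both decline over the years but are otherwise independent; all numbers, names and noise levels are assumptions made for illustration. The raw series correlate strongly, while their year-to-year changes, where the shared trend cancels out, should not.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def simulate_trends(years=50, seed=1):
    """Two invented series that share nothing except a downward trend in time."""
    rng = random.Random(seed)
    storks = [1000 - 10 * t + rng.gauss(0, 20) for t in range(years)]
    births = [500 - 5 * t + rng.gauss(0, 10) for t in range(years)]
    return storks, births


def first_differences(series):
    """Year-to-year changes; a linear trend turns into a constant offset."""
    return [b - a for a, b in zip(series, series[1:])]


storks, births = simulate_trends()
r_raw = pearson(storks, births)
r_diff = pearson(first_differences(storks), first_differences(births))

print(f"correlation of the raw series:       {r_raw:.2f}")
print(f"correlation of year-to-year changes: {r_diff:.2f}")
```

Taking first differences is only one simple way to remove a common trend; it is used here because it needs no model fitting.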

slide-23
SLIDE 23


Fishing for correlations

• Correlation can be a purely random effect!
• At the usual 5% significance level, about 5% of all pairs of truly unrelated variables will still appear to be correlated
• Example:
  • Determine 20 parameters (= random variables) in some simulation experiment
  • Can create ½ · 20 · 19 = 190 pairs of random variables
  • 5% of 190 = about 9 – 10 “correlations” that are in fact purely random!
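
The arithmetic of this example, plus a quick simulation of it, can be sketched as follows. The simulation draws 20 independent variables with 30 observations each (the sample size is an assumption, the slide does not give one) and counts the pairs whose correlation exceeds 0.361 in absolute value, which is approximately the two-sided 5% critical value of Pearson's r for 30 observations.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def count_spurious(n_vars=20, n_samples=30, r_crit=0.361, seed=0):
    """Count 'significant' correlations among variables that are independent."""
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_vars)]
    return sum(
        1
        for i in range(n_vars)
        for j in range(i + 1, n_vars)
        if abs(pearson(data[i], data[j])) > r_crit
    )


n_pairs = math.comb(20, 2)   # 1/2 * 20 * 19 pairs
expected = 0.05 * n_pairs    # false positives expected at the 5% level
hits = count_spurious()

print(f"pairs: {n_pairs}, expected spurious correlations: {expected:.1f}")
print(f"spurious correlations found in one simulated experiment: {hits}")
```

The simulated count varies from seed to seed; on average it should land near the 9 to 10 named above.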

http://www.xkcd.com/882/

slide-24
SLIDE 24


Problem 5: Is HGH even part of the cause?

• Wrinkle reduction: up to 61%
• Maybe that could happen even without HGH?

[Figure: values along an axis "reduction" for three groups: "healthy", "heartAttack" and "h.A.,noHGH"]

Note: This data is pure fantasy!

slide-25
SLIDE 25


Lesson: Question causality

• Sometimes the data is not just biased, it contains hardly anything other than bias
• If you see a causal relationship ("A causes B") that you presume yourself (as author) or that is asserted to you (as reader), ask yourself:
  • Does it really make sense?
  • Would A really have this much influence on B?
  • Couldn't it be just the other way round?
  • What other influences besides A may be important?
  • What is the relative weight of A compared to these?

slide-26
SLIDE 26


Percentages

“Benevolent and malevolent users alike value it [the percent] for its aura of mathematical neutrality and objectivity. ‘Percent’ […] smells of the merchant's counting house and double-entry bookkeeping; respectability simply oozes from its buttonholes. Percentages stand for credibility and authority, percentages radiate certainty, percentages show that one can do arithmetic, they confer authority and superiority, all the more so, and probably reinforced by this, because many an addressee of a modern percentage sermon does not know at all what percentages actually are.” – Walter Krämer

slide-27
SLIDE 27


Percentages and absolute numbers (1)

You're in hospital, and the doctor tells you…:

• “Medication A has a 10% higher chance to cure your disease, but the thrombosis risk is increased by 100% in comparison to medication B.”
  • Which one would you pick?
• “With medication B, about 1 in 7,000 patients suffers from thrombosis. With medication A, about 2 in 7,000 patients suffer from thrombosis, but it has a 10% higher chance to cure your disease.”
  • Which one would you pick?
  • Mathematically, the two descriptions are equivalent!
• Your decision probably depends on the severity of your disease (e.g., headache vs. liver cancer)
• Lesson: Percentages can be misleading!
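
The equivalence of the two descriptions can be checked with exact fractions. The sketch below uses only the numbers from the doctor's statements (1 in 7,000 and 2 in 7,000); the variable names are made up for this example.

```python
from fractions import Fraction

# Thrombosis risks taken from the two statements above.
risk_b = Fraction(1, 7000)
risk_a = Fraction(2, 7000)

# Relative view: "the risk is increased by 100%".
relative_increase = (risk_a - risk_b) / risk_b

# Absolute view: one additional case per 7,000 patients.
absolute_increase = risk_a - risk_b
extra_cases_per_7000 = absolute_increase * 7000

print(f"relative increase: {float(relative_increase) * 100:.0f}%")
print(f"absolute increase: {float(absolute_increase) * 100:.4f} percentage points")
print(f"extra thrombosis cases per 7,000 patients: {extra_cases_per_7000}")
```

Both views describe the same change; the relative one sounds alarming, the absolute one shows that the risk stays small.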

slide-28
SLIDE 28


Example - Percentages and absolute numbers


slide-29
SLIDE 29


Example - Percentages and absolute numbers


slide-30
SLIDE 30


Percentages and absolute numbers (2)

• “In the past year, we have employed an additional 1,000 teachers in North Rhine-Westphalia. This shows our great commitment and financial efforts to improve our school system.” – Sounds good, doesn't it?
• How many schools are there in NRW?
  • About 7,000
  • Only one in seven schools (about 14%) gets an additional teacher!
• How many teachers are there in NRW in total?
  • About 130,000
  • Result: Less than 1% increase…
• Lesson: Absolute numbers can be misleading, too!

slide-31
SLIDE 31


Percentages of what? – Two examples

• In 2008, President Bush asserted that the USA would reduce its emissions of greenhouse gases by at least 50% by the year 2050.
• 50% – but as compared to what?
  • In relation to the year 1990? – International standard
  • In relation to the year with the highest emissions?
    • …which might yet be to come!?
• The share of nuclear energy in Germany is about 25%
  • True for electrical energy
• The share of nuclear energy in Germany is about 13%
  • True for total primary energy consumption
slide-32
SLIDE 32


Percentages (4)

• “In the past year, we could boost our company's rate of return by 400%!”
  • Wow, 400%. Impressive!
• “That is because we increased our rate of return from 0.1% to 0.5%.”
  • Just 0.5%. How inefficient!
• Lessons
  • Always ask (or write out): “percentage of what?”
  • Always ask for (or write out)
    • The percentages
    • And the absolute numbers
  • Percentages of percentages often don't make sense and can be an indication of foul play (cf. next slide)

slide-33
SLIDE 33


Percentages and percentage points

• Election 2010:
  • Party A: 40%
  • Party B: 10%
• Election 2014:
  • Party A: 30%
  • Party B: 20%
• “Party A has lost 10%, Party B has gained 10%”
  • Wrong: Party A has
    • lost 10 percentage points
    • lost 25% of its vote share (since 30/40 = 0.75)
      – …but not necessarily 25% of its absolute votes, since voter turnout, the number of eligible voters, etc. probably differ
• Lesson: There is an important difference between percent and percentage points!

slide-34
SLIDE 34


Example 2: Tungu and Bulugu

• We look at the yearly per-capita income in two small hypothetical island states: Tungu and Bulugu
• Statement: "The average yearly income in Tungu is 94.3% higher than in Bulugu."

slide-35
SLIDE 35


Problem 1: Misleading averages

• The island states are rather small: 81 people in Tungu and 80 in Bulugu
• And the income distribution is not as even in Tungu:

[Figure: income distributions of Bulugu and Tungu on an axis "income" from 1000 to 5000]

Note: This data is pure fantasy!

slide-36
SLIDE 36


Misleading averages and outliers

[Figure: income distributions of Bulugu and Tungu on a logarithmic axis "income" from 10^3.0 to 10^5.0]

• The only reason is Dr. Waldner, owner of a software company, who has been enjoying his retirement in Tungu for a year

slide-37
SLIDE 37


Lesson: Question appropriateness

• A certain statistic (very often the arithmetic average) may be inappropriate for characterizing a sample
• If there is any doubt, ask that additional information be provided
  • such as standard deviation
  • or some quantiles, e.g.: 0, 0.25, 0.5, 0.75, 1

Note: The 0.25 quantile is equivalent to the 25th percentile etc.

[Figure: income distribution of Tungu]
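
A minimal sketch of such a quantile summary, using Python's standard `statistics` module. The nine incomes below are invented for illustration (one very rich resident among otherwise similar incomes); they are not the Tungu data from the plots.

```python
import statistics

# Hypothetical yearly incomes, invented for this sketch.
incomes = [1200, 1500, 1800, 2000, 2100, 2300, 2500, 2800, 160000]

mean = statistics.mean(incomes)
stdev = statistics.stdev(incomes)

# Quartiles (0.25, 0.5 and 0.75 quantiles); "inclusive" treats the data
# as the whole population, so minimum and maximum are the 0 and 1 quantiles.
q1, median, q3 = statistics.quantiles(incomes, n=4, method="inclusive")
five_numbers = (min(incomes), q1, median, q3, max(incomes))

print(f"mean: {mean:.0f}, standard deviation: {stdev:.0f}")
print("quantiles 0, 0.25, 0.5, 0.75, 1:", five_numbers)
```

For this made-up sample the mean is close to 20,000 although eight of the nine people earn less than 3,000; the quantiles (1200, 1800, 2100, 2500, 160000) show at a glance that a single extreme value is responsible.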
slide-38
SLIDE 38


Logarithmic axes

• Waldner earns 160,000 per year. How much more that is than the other Tunguans have is impossible to see on the logarithmic axis we just used

[Figure: income distributions of Bulugu and Tungu on a linear axis "income" from 50000 to 150000, with the point "Waldner" far to the right]

slide-39
SLIDE 39


Lesson: Beware of inappropriate visualizations (#1)

• Lesson for reader: Always look at the axes. Are they linear or logarithmic?
• Lesson for author:
  • Logarithmic axes are very useful for reading hugely different values from a graph with some precision
  • But they totally defeat the imagination!
  • If you decide to use logarithmic axes, always state this fact in your text!
• There are many more kinds of inappropriate visualizations
  • see later in this presentation
slide-40
SLIDE 40


Problem 4: Misleading precision

• "The average yearly income in Tungu is 94.3% higher than in Bulugu"
• Assume that tomorrow Mrs. Alulu Nirudu from Tungu gives birth to her twins
• There are now 83 rather than 81 people on Tungu
• The average income drops from 3922 to 3827
• The difference to Bulugu drops from 94.3% to 89.7%
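
The effect of the twins can be recomputed from the figures on the slides. The sketch below takes the Tungu mean (3922), the population sizes (81 and 83), the 94.3% statement and Waldner's income (160,000, from the slide on logarithmic axes); the Bulugu mean is not given anywhere and is derived here from the 94.3% statement, which is an assumption of this sketch.

```python
tungu_mean_before = 3922
people_before, people_after = 81, 83
waldner_income = 160_000

# Assumption: Bulugu's mean is back-computed from "94.3% higher".
bulugu_mean = tungu_mean_before / 1.943

# The twins have no income, so the total stays the same.
total_income = tungu_mean_before * people_before
tungu_mean_after = total_income / people_after
difference_after = (tungu_mean_after / bulugu_mean - 1) * 100

# What the other 80 Tunguans earn on average without Waldner.
mean_without_waldner = (total_income - waldner_income) / (people_before - 1)

print(f"Tungu mean after the birth: {tungu_mean_after:.0f}")
print(f"difference to Bulugu after the birth: {difference_after:.1f}%")
print(f"Tungu mean without Waldner: {mean_without_waldner:.0f}")
```

From these rounded inputs the new difference comes out at roughly 89.6%, close to the 89.7% on the slide (the slide presumably used unrounded data). Without Waldner the Tungu mean is about 1971, slightly below the derived Bulugu mean of about 2019, so the entire "94.3%" rests on one person.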

slide-41
SLIDE 41


Lesson for reader: Do not be easily impressed

• The usual reason for presenting very precise numbers is the wish to impress people
  • “Round numbers are always false”
  • But round numbers are much easier to remember and compare
• Clearly tell people you will not be impressed by precision
  • in particular if the precision is purely imaginary
slide-42
SLIDE 42


Lesson for author: Think about precision

• Do you really have enough data for it to make sense to give out precise numbers?
• Compromise: Give exact numbers in tables/figures, but round them in the text.
• Do not exaggerate: If you find your system yields a 52.91% increase in throughput
  • Don't say: “Our system increases throughput by more than 50%”
  • Do say: “Our experiments suggest that our system can achieve throughput increases of around 50%”

slide-43
SLIDE 43


Example 3: Phantasmo Corporation stock price

• We look at the recent development of the price of shares for Phantasmo Corporation
• "Phantasmo shows a remarkably strong and consistent value growth and continues to be a top recommendation"

[Figure: stock price (y-axis, 180 to 192) over day (x-axis, 100 to 400)]

(Phantasmo and this data are purely imaginary)

slide-44
SLIDE 44


Problem: Looks can be misleading

  • The following two plots show exactly the same data!
  • and the same as the plot on the previous slide!

[Figure: two plots, each showing stock price (y-axis, 180 to 192) over day (x-axis, 100 to 400)]

slide-45
SLIDE 45


Problem: Scales can be misleading

• What really happened is shown here: We intuitively interpret a trend plot on a ratio scale

[Figure: two plots of stock price over day (100 to 400): one with the y-axis from 180 to 192, one with the y-axis from 50 to 200]

slide-46
SLIDE 46


So look carefully!

found on focus.msn.de on 2004-03-04:

slide-47
SLIDE 47


Problem: Scales can be missing

• The most insolent persuaders may even leave the scale out altogether!

[Figure: plot over day (x-axis, 100 to 400) without any scale on the y-axis]

  • Never forget to label your axes!
  • Never forget to put a scale on your axes!

slide-48
SLIDE 48


Problem: Scales can be abused

• Observe the global impression first

[Figure: chart with the label "2005"]

slide-49
SLIDE 49


Problem: People may invent unexpected things

• Source: advertisement of Donau-Universität Krems
  • DIE ZEIT, 07.10.2004
  • What's wrong?

[Figure: advertisement graphic with the labels "2 Jahre" (2 years) and "4 Jahre" (4 years)]

slide-50
SLIDE 50


Axis scales: Lessons for author

• Warning: Most plotting software automatically selects boundaries for you (e.g., GNU R)
  • Always ask yourself: Do these automatically chosen axis limits make sense?
• When plotting probabilities, please consider manually setting the axis to the interval [0 … 1]
• When using a log scale, please
  • … explicitly write about this either in the text or in the caption
  • … explicitly tell this to your audience when giving a talk

slide-51
SLIDE 51


Pie charts (1/3)

Note: This data is pure fantasy!

slide-52
SLIDE 52


Pie charts (2/3)

Note: This data is pure fantasy!

slide-53
SLIDE 53


Pie charts (3/3)

• What percentages do the two graphs show? Guess!
• Answer:
  • Both show the same data: A 94% : 6% ratio!
  • The difference only lies in the angle of the pies.
slide-54
SLIDE 54


Lesson: Distrust pie charts!

• Pie charts should never be used
  • Perception depends on the angle
  • Even worse with 3D pie charts: Parts at the front are artificially enlarged due to the pie's 3D height; they thus seem to be bigger
  • A very subtle way to visually tune your data
  • Unfortunately, still very common
• Distrust pie charts that do not give numbers as well
  • Think about the numbers, compare them
  • Think about the presentation: are they trying to beautify the impression?

slide-55
SLIDE 55


Bubble charts

• Which diagram shows the values 2, 3, 4? Both do!
• Left one: Radius is proportional to measurements
  • Exaggerates differences: 4 looks much larger than 2
• Right one: Area is proportional to measurements
  • Underestimates differences: 4 looks only slightly larger than 2

Note: This data is pure fantasy!
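
The two bubble encodings can be compared numerically. The sketch below uses the values 2, 3 and 4 from the slide; the function names are made up for this example. It computes, for each encoding, how the drawn radius and the drawn area of the bubble for 4 relate to those of the bubble for 2.

```python
import math


def radius_proportional(value):
    """Radius equals the value; the area grows with the square of the value."""
    radius = value
    return radius, math.pi * radius ** 2


def area_proportional(value):
    """Area equals the value; the radius grows only with its square root."""
    area = value
    return math.sqrt(area / math.pi), area


values = [2, 3, 4]

r2, a2 = radius_proportional(values[0])
r4, a4 = radius_proportional(values[-1])
print(f"radius-proportional: radius x{r4 / r2:.2f}, area x{a4 / a2:.2f}")

r2, a2 = area_proportional(values[0])
r4, a4 = area_proportional(values[-1])
print(f"area-proportional:   radius x{r4 / r2:.2f}, area x{a4 / a2:.2f}")
```

With radius proportionality the bubble for 4 covers four times the area of the bubble for 2, although the value only doubled. With area proportionality the area doubles correctly, but the diameter grows only by a factor of about 1.41, which is why the difference looks small.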

slide-56
SLIDE 56


Pictograms

Note: This data is pure fantasy!

http://sciencev1.orf.at/static2.orf.at/science/storyimg/storypart_155543.jpg

slide-57
SLIDE 57


Pictogram – Comparison of apartment size

slide-58
SLIDE 58


Pictogram – Comparison of apartment size

slide-59
SLIDE 59


Lesson: Bubble charts and pictograms

• This lesson is more or less similar to pie charts:
• Bubble charts usually should not be used
  • Radius proportionality exaggerates differences, but area proportionality often makes readers underestimate differences
  • A very subtle way to visually tune your data
  • Of course, a bubble chart + pie chart may convey more information, but please try to visualize it differently…
  • If you really, really want to use a bubble chart, then use the area proportionality variant, clearly explain this in your text, and also put the actual numbers right next to the bubbles
• Distrust bubble charts that do not give the numbers as well
  • Think about the numbers, compare them
  • Think about the presentation: Did they really need to use bubble charts? Or are they trying to beautify the impression?

Sometimes size really matters.

slide-60
SLIDE 60


Summary lesson for the reader: Seeing is believing

• …but often, it shouldn't be!
• Always consider what it really is that you are seeing
• Do not believe anything purely intuitively
• Do not believe anything that does not have a well-defined meaning
• Be sceptical about pie and bubble charts
  • … in particular if they do not even print the actual numbers but only rely on the pure graphical presentation
  • … in particular if they use 3D pies
slide-61
SLIDE 61


Example 4: blend-a-med Night Effects

• What do they not say? Think about it…
• What exactly does "sichtbar" (visible) mean? What exactly does „hell“ (bright) or „heller“ (brighter) mean?
• What was the scope, what were the results of the clinical trials?
• What other effects does Night Effects have?

slide-62
SLIDE 62


Example 5: The better tool?

• We consider the time it takes programmers to write a certain program using different IDEs:
  • Aguilder or
  • Egglips
• Statement (by the maker of Aguilder): "In an experiment with 12 persons, the ones using Egglips required on average 24.6% more time to finish the same task than those using Aguilder. Both groups consisted of equally capable people and received the same amount and quality of training."
• Assume Egglips and Aguilder are in fact just as good. What may have gone wrong here?

slide-63
SLIDE 63

Problem: Has anybody ignored any data?

[Figure: four panels (1 to 4), each comparing the work times of Aguilder and Egglips on an axis "time" from 100 to 300]

• Solution: Just repeat the experiment a few times and pick the outcome you like best

Note: This data is pure fantasy!
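
How far this "solution" can take a dishonest experimenter may be illustrated with a simulation. In the sketch below both tools are identical by construction: every work time is drawn from the same normal distribution (mean 150, standard deviation 30, both invented for this example). The experiment with 6 persons per group is repeated 20 times, and the most flattering outcome is picked.

```python
import random
import statistics


def run_experiment(rng, group_size=6):
    """Percent extra time of 'Egglips' over 'Aguilder' when both tools are equal."""
    aguilder = [rng.gauss(150, 30) for _ in range(group_size)]
    egglips = [rng.gauss(150, 30) for _ in range(group_size)]
    return (statistics.mean(egglips) / statistics.mean(aguilder) - 1) * 100


rng = random.Random(42)  # fixed seed so the sketch is repeatable
results = [run_experiment(rng) for _ in range(20)]
best = max(results)

print("all outcomes (% extra time for Egglips):")
print(", ".join(f"{r:+.1f}" for r in results))
print(f"outcome the maker of Aguilder would publish: {best:+.1f}%")
```

The individual outcomes scatter around zero, as they should for equal tools; the published one is simply the largest of them.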

slide-64
SLIDE 64


Lesson for the reader: Demand complete information

• If somebody presents conclusions
  • based on only a subset of the available data
  • and has selected which subset to use
  • then everything is possible
• There is no direct way to detect such repetitions, BUT for any one single execution . . .

slide-65
SLIDE 65


Digression: Hypothesis testing

• …a significance test (or confidence intervals) can determine how likely it was to obtain this result if the conclusion is wrong:
  • Null hypothesis: Assume both tools produce equal work times overall
  • Then how often will we get a difference this large when we use samples of size 6 persons?
  • If the probability is small, the result is plausibly real
  • If the probability is large, the result is plausibly incidental
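
The question "how often will we get a difference this large?" can be answered quite literally by a permutation test. Note that this is a different technique from the t-test used in the slides; it is swapped in here because it needs no distribution tables. The sketch takes the twelve work times listed in the significance test example of this slide set, pretends the group labels are meaningless (the null hypothesis), and enumerates all ways of splitting the twelve times into two groups of six.

```python
from itertools import combinations

aguilder = [175, 186, 137, 117, 92.8, 93.7]
egglips = [171, 155, 157, 181, 175, 160]


def permutation_p_value(a, b):
    """Share of all regroupings whose mean difference is at least as large
    (in absolute value) as the observed one; also returns the number of splits."""
    pooled = a + b
    total = sum(pooled)
    n_a, n_b = len(a), len(b)
    observed = abs(sum(b) / n_b - sum(a) / n_a)

    splits = 0
    at_least_as_large = 0
    for chosen in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in chosen)
        difference = abs((total - sum_a) / n_b - sum_a / n_a)
        splits += 1
        if difference >= observed - 1e-9:  # tolerance for float rounding
            at_least_as_large += 1
    return at_least_as_large / splits, splits


p_value, n_splits = permutation_p_value(aguilder, egglips)
print(f"{n_splits} possible splits, two-sided p-value: {p_value:.3f}")
```

A small p-value means that few relabelings produce a difference as large as the observed one, so the result is "plausibly real" in the sense of the list above.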


slide-67
SLIDE 67

Statistical significance test: Example

 Our data:
  • Aguilder: 175, 186, 137, 117, 92.8, 93.7 (mean 133.6)
  • Egglips: 171, 155, 157, 181, 175, 160 (mean 166.5)

 Null hypothesis:
  • We assume the distributions underlying these data are both normal distributions with the same variance and
  • the means of the actual distributions are in fact equal

 Then we can compute the probability of seeing this difference of 33 from two samples of size 6

 The procedure for doing this is called the t-test (recall the confidence intervals? It's a very similar calculation)
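As a rough sketch of the calculation, the following computes the pooled two-sample t statistic for the data above using only the Python standard library. Turning the statistic into a p-value requires the t distribution with 10 degrees of freedom (a table or a statistics library). As a stand-in, and this is a swapped-in technique rather than the t-test itself, the sketch also runs an exact permutation test on the difference of means. The function names are made up for this illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

aguilder = [175, 186, 137, 117, 92.8, 93.7]
egglips = [171, 155, 157, 181, 175, 160]

def pooled_t(a, b):
    """Student's two-sample t statistic, assuming equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(pooled_var * (1 / na + 1 / nb))

def permutation_p_value(a, b):
    """Two-sided exact permutation test on the difference of means."""
    pooled = a + b
    observed = abs(mean(b) - mean(a))
    total = sum(pooled)
    hits = count = 0
    # Enumerate every way to relabel the 12 persons into two groups.
    for idx in combinations(range(len(pooled)), len(a)):
        sum_a = sum(pooled[i] for i in idx)
        mean_a = sum_a / len(a)
        mean_b = (total - sum_a) / len(b)
        count += 1
        if abs(mean_b - mean_a) >= observed - 1e-9:
            hits += 1
    return hits / count

t_stat = pooled_t(aguilder, egglips)
p_perm = permutation_p_value(aguilder, egglips)
print(f"difference of means: {mean(egglips) - mean(aguilder):.1f}")
print(f"t statistic (10 degrees of freedom): {t_stat:.2f}")
print(f"permutation p-value (two-sided): {p_perm:.3f}")
```

The permutation test makes no normality assumption, which is why it is a convenient cross-check for small samples like these.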

slide-68
SLIDE 68

So? (Lessons for the author)

 So in our case we probably would believe the result and not find out that the experimenters had in fact cheated
  • (And indeed they were lucky to get the result they got)

Note:

 There are many different kinds of hypothesis tests and various things can be done wrong when using them
  • In particular, watch out what the test assumes
  • and what the p-value means, namely:
  • The probability of seeing data at least this extreme if the null hypothesis is true
  • Note: The p-value is not the probability that the null hypothesis is true!
  • But unless the distribution of your samples is very strange or very different, using the t-test is usually OK.
  • Note: There are quite a number of different tests called “t test”.
  • They have subtle yet important differences…
slide-69
SLIDE 69

Example: Error bars

 “Although a high variability in our measurements results in rather large error bars, our simulation results show a clear increase in [whatever].”

 What’s wrong here?

slide-70
SLIDE 70

Lesson: Error bars

 What are the error bars? How are they defined?
  • Minimum and maximum values?
  • Confidence intervals?
  • If so, at which level? 95%? 99%?
  • Mean ± two standard deviations?
  • Mean ± two standard errors?
  • First and third quartile? 10% and 90% quantile?
  • Chebyshev* or Chernoff bounds?

*also: Tschebyscheff, Tschebyschow, Chebyshov, … Same with Tschernoff, …

 Reader: Distrust error bars that are not explained

 Author:
  • Clearly state what kind of error bars you’re using
  • Usually, the best choice is to use confidence intervals, but standard deviation and standard error are also very common
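To see how much the choice matters, the sketch below computes several of these error-bar definitions for one and the same sample. The measurements are invented for illustration, and the 95% confidence interval uses the normal approximation (for only ten samples a t quantile would be more appropriate and would give a somewhat wider interval).

```python
from math import sqrt
from statistics import NormalDist, mean, quantiles, stdev

# Hypothetical measurements, made up for illustration only.
samples = [12.1, 9.8, 14.3, 11.0, 10.4, 13.7, 9.1, 12.9, 11.6, 10.8]

m = mean(samples)
sd = stdev(samples)                   # sample standard deviation
se = sd / sqrt(len(samples))          # standard error of the mean
q1, _, q3 = quantiles(samples, n=4)   # first and third quartile
z95 = NormalDist().inv_cdf(0.975)     # normal approximation, about 1.96

error_bars = {
    "min / max": (min(samples), max(samples)),
    "mean +/- 2 standard deviations": (m - 2 * sd, m + 2 * sd),
    "mean +/- 2 standard errors": (m - 2 * se, m + 2 * se),
    "first / third quartile": (q1, q3),
    "95% confidence interval (normal approx.)": (m - z95 * se, m + z95 * se),
}
for name, (low, high) in error_bars.items():
    print(f"{name:42s} [{low:6.2f}, {high:6.2f}]")
```

The same data yields bars of very different width, which is why an unexplained error bar carries little information.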

slide-71
SLIDE 71

Lesson for the author: Common errors for t tests and confidence intervals

 Recall: “But unless the distribution of your samples is very strange or very different, using the t-test is usually OK.”

 If you do not have many samples (fewer than ~30), then you must check that your input data looks more or less normally distributed
  • At least check that the distribution does not look terribly skewed
  • Better: do a QQ plot
  • Even better: use a normality test

 You might make many runs, group them together and exploit the Central Limit Theorem to get normally distributed data, but…:
  • Warning: This only works if the variance of your samples is finite!
  • Therefore it won’t work with, e.g., Pareto-distributed samples (α < 2)

 You must ensure that the samples are not correlated!
  • For example, a time series is often autocorrelated
  • Group samples and calculate their average (Central Limit Theorem); make groups large enough to let autocorrelation vanish
  • Check with ACF plot or autocorrelation test or stationarity test
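The grouping step can be sketched as follows. The series is a hypothetical AR(1) process chosen only because it is strongly autocorrelated; the coefficient 0.9, the batch size of 200 and the helper names are assumptions for this illustration, and the lag-1 autocorrelation stands in for a full ACF plot.

```python
import random
from statistics import mean

def lag1_autocorrelation(xs):
    """Sample autocorrelation at lag 1."""
    m = mean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den

def batch_means(xs, batch_size):
    """Average consecutive, non-overlapping batches of equal size."""
    return [mean(xs[i:i + batch_size])
            for i in range(0, len(xs) - batch_size + 1, batch_size)]

# Hypothetical autocorrelated series: x[t] = 0.9 * x[t-1] + noise.
rng = random.Random(42)
series, x = [], 0.0
for _ in range(10_000):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    series.append(x)

batches = batch_means(series, batch_size=200)
raw_acf = lag1_autocorrelation(series)
batch_acf = lag1_autocorrelation(batches)
print(f"lag-1 autocorrelation, raw samples: {raw_acf:.2f}")
print(f"lag-1 autocorrelation, batch means: {batch_acf:.2f}")
```

If the batch means still show noticeable autocorrelation, the batches are too small and should be enlarged before computing a t-test or confidence interval on them.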
slide-72
SLIDE 72

Lesson for the author: Check your prerequisites and assumptions!

 Similar errors can be committed with other statistical methods

 Usual suspects:
  • Input has to be normally distributed, or follow some other distribution
  • Input must not be correlated
  • Input has to come from a stationary process
  • Input must be at least 30 samples (10; 50; 100; …)
  • The two inputs must have the same variances
  • The variance must be finite
  • The two inputs must have the same distribution types
  • Of course, all this depends on the chosen method!
slide-73
SLIDE 73

Example 6: Economic growth (GER vs. USA)

 On 2003-10-30, the US Bureau of Economic Analysis (BEA) announced
  • USA economic growth in 3rd quarter: 7.2%

 Assume that same day the German Statistisches Bundesamt had announced
  • German economic growth in 3rd quarter: 2%
  • (Note: This value is fictitious)

 Note: Both values refer to gross domestic product (GDP, "Brutto-Inlandsprodukt", BIP)

 Which economy was growing faster?

slide-74
SLIDE 74

Problem: Different definitions

 The US BEA extrapolates the growth for each quarter to a full year
  • Statistisches Bundesamt does not

 Thus, the actual US growth factor during (from start to end of) this quarter was only x, where x^4 = 1.072.
  • x = 1.0175
  • US growth was only 1.75% in this quarter
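The conversion between the two conventions is a one-liner in each direction. The sketch below assumes plain compounding over four quarters, as in the calculation above; the function names are made up for this illustration.

```python
def quarterly_from_annualized(annualized_rate):
    """Growth within one quarter that compounds to the annualized rate."""
    return (1.0 + annualized_rate) ** 0.25 - 1.0

def annualized_from_quarterly(quarterly_rate):
    """Annualized rate obtained by compounding one quarter four times."""
    return (1.0 + quarterly_rate) ** 4 - 1.0

us_quarter = quarterly_from_annualized(0.072)
de_annualized = annualized_from_quarterly(0.02)
print(f"USA, growth within the quarter: {us_quarter:.2%}")
print(f"Germany, annualized: {de_annualized:.2%}")
```

Expressed in the US convention, the fictitious German 2% would be roughly 8.2%, so on a like-for-like basis the German economy in this example grew faster.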

slide-75
SLIDE 75

Example 7: Unemployment rate (D vs. USA)

 (Source: DIE ZEIT 2004-02-05, p. 23: "Rot-weiß-blaues Zahlenwunder")

 2003-11: USA: 5.9%, D (Germany): 10.5%

 Which country had the higher unemployment rate?

 What does the number mean?
  • D: registered as unemployed at the Arbeitsamt (employment office)
  • USA: telephone-based micro-census by Bureau of Labor Statistics (BLS):
  • 1. Are you without work? (less than 1 hour last week)
  • 2. Are you actively searching for work?
  • 3. Could you start on a new job within 14 days?
  • Only people with 3x "yes" qualify as unemployed
  • A similar census is performed by Statistisches Bundesamt
  • Result: 9.3% unemployed (rather than 10.5%)
– called "erwerbslos" (as opposed to "arbeitslos")
  • Because people are more honest on the telephone
  • But the rules are still not quite the same…
slide-76
SLIDE 76

Unemployment rate (continued)

 USA: The census ignores
  • people who read job ads, but do not search actively
  • people who do not believe they can find a job
  • counting them would increase the rate by 0.5%
  • 15-year-olds (who are unemployed very frequently)

 D: All these are included in the numbers

 Furthermore: People disappear from the statistic
  • USA: 760 of every 100000 people are in prison (as of 2003). That decreases the rate by 0.75%
  • D: 80 of every 100000. Decreases rate by 0.08%
  • D: Some people are "parked" on ABM (job-creation schemes)
  • And more effects (in both countries)

 The overall effect is hard to determine

slide-77
SLIDE 77

Lesson: Demand precise definitions

 Just because two numbers have the same name does not mean they are equivalent
  • in particular if they come from different contexts

 If no precise definitions of terms are available,
  • only very large differences can be trusted
slide-78
SLIDE 78

Example 8: productivity

 Steve Walters on comp.software-eng (early 1990s):
  • "We just finished a software development project and discovered some curious metrics. This was a project in which we had good domain experience and about six years of metrics, both team productivity and other analogous software of similar scope and functionality.
  • The difference with this project was that we switched from a functional design methodology to OO.
  • First the good news: the overall team productivity (SLOC/person month) was almost three times our previous rate.
  • Now for the bad news: the delivered SLOC was almost three times greater than estimated, based on the metrics from our previous projects."

slide-79
SLIDE 79

Lesson: Precise measurements can be invalid

 Often a statistic is used for a purpose that it does not exactly fit.
  • Perhaps nothing better is realistically possible

 But even if the numbers themselves are correct and precise, the conclusions may be totally wrong.
  • It is not sufficient that statistics are correct when at the same time they are inappropriate
  • Here: SLOC/person month has low construct validity for measuring productivity

 Such proxy measurements are very common.

  • Beware!
slide-80
SLIDE 80

Lutz Prechelt, prechelt@inf.fu-berlin.de

Real-world example: 25-fold reliability

 "Warum billigere Tintenpatronen verwenden, wenn Original HP Tinten bis zu 25-mal zuverlässiger sind?"
  • "Why use cheaper ink cartridges when genuine HP ink is up to 25 times more reliable?"

slide-81
SLIDE 81

25-fold reliability explanation

 DOA: Dead-on-arrival (< 10 pages usable capacity)

 PF: premature failure (< 75% of avg. non-DOA yield)

 HU: high unusable (> 10% pages with low quality)

[Figure: results for color cartridges]

slide-82
SLIDE 82

25-fold reliability explanation (2)

 Percentage of PF cartridges (less than 75% of the avg. capacity of all cart's.) per brand

[Figure: percentage of PF cartridges per brand; axes labelled "size" (20 to 120) and "percent" (10 to 50)]

slide-83
SLIDE 83

25-fold reliability explanation (3)

More problems with this data:

 52/120 = 43% is what they used

 52/103 = 50% is right if PF excludes DOA (as claimed)

 (52–17)/103 = 34% is right if PF includes DOA

slide-84
SLIDE 84

Summary

 When confronted with data or conclusions from data one should always ask:
  • Can they possibly know this? How?
  • What do they really mean?
  • Is the purported reason the real reason?
  • Are the samples and measures unbiased and appropriate?
  • Are the measures well-defined and valid?
  • Are measures or visualizations misleading?
  • Has something important been left out?
  • Are there any inconsistencies (contradictions)?

 When we collect and prepare data, we should

  • work thoroughly and carefully
  • and avoid distortions of any kind
slide-85
SLIDE 85

Will Rogers phenomenon (1)

Revenues per salesman of company HuiSoft for two consecutive years, in k€:

            2010                      2011
   Bielefeld   München       Bielefeld   München
      5000       5000           5000       5000
      6000      10000           6000
      7000      15000           7000      15000
                20000          10000      20000
    µ=6000    µ=12500         µ=7000    µ=13333
                              +16.7%      +6.7%

No increase in total numbers

Just one employee moved from München to Bielefeld

Yet an increase in revenue per salesman at both POPs!
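The effect is easy to reproduce with the figures from this slide. The sketch below moves the one salesman with 10000 k€ from München to Bielefeld and recomputes both averages; the variable names are made up for this illustration.

```python
from statistics import mean

# Revenues per salesman in k€, taken from the slide.
bielefeld_2010 = [5000, 6000, 7000]
muenchen_2010 = [5000, 10000, 15000, 20000]

# 2011: the salesman with 10000 k€ moves from München to Bielefeld.
moved = 10000
bielefeld_2011 = bielefeld_2010 + [moved]
muenchen_2011 = [r for r in muenchen_2010 if r != moved]

for city, before, after in [("Bielefeld", bielefeld_2010, bielefeld_2011),
                            ("München", muenchen_2010, muenchen_2011)]:
    change = mean(after) / mean(before) - 1.0
    print(f"{city}: {mean(before):.0f} -> {mean(after):.0f} ({change:+.1%})")

total_2010 = sum(bielefeld_2010) + sum(muenchen_2010)
total_2011 = sum(bielefeld_2011) + sum(muenchen_2011)
print(f"total revenue: {total_2010} -> {total_2011}")
```

The moved value lies above the old Bielefeld mean and below the old München mean, which is exactly the condition under which both averages rise while the total stays the same.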

slide-86
SLIDE 86

Will Rogers phenomenon (2)

 Will Rogers (1879–1935), American comedian and philosopher

 Named after one of his jokes:
Question: If the dumbest 10% of the people of Saarland move to Rhineland-Palatinate, what happens? Answer: The IQ rises in both federal states.

 (originally with Oklahomans and Californians …)

 Lesson:
  • Will Rogers phenomena are ubiquitous,
  • yet can be difficult to spot
  • …even for the authors themselves!
  • Warning – it’s a sword that cuts both ways: Sometimes looking at the details is better, sometimes looking at the aggregated numbers makes more sense (as in the sales example)

slide-87
SLIDE 87

Simpson Paradox (1)

 Universität Eschweilerhof discriminates against female students!

 Let’s see which faculties are the most sexist ones:

                  Applications               Acceptance rate
Faculty        female  acc.   male  acc.     female   male
Engineering      10      8     80    50        80%     63%
CS                5      4     60    40        80%     67%
Philosophy       80     20     40    10        25%     25%
Law              30     15     40    10        50%     25%
Total           125     47    220   110      (← significant numbers)
Acc. rate                                     37.6%   50.0%

 None of them!? How can that be?
  • Women applied at faculties with more competition
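The figures can be checked directly. The sketch below uses the application and acceptance counts from this slide and compares the per-faculty rates with the aggregated rates; the dictionary layout and function names are made up for this illustration.

```python
# (applications, accepted) per faculty, taken from the slide.
female = {"Engineering": (10, 8), "CS": (5, 4), "Philosophy": (80, 20), "Law": (30, 15)}
male = {"Engineering": (80, 50), "CS": (60, 40), "Philosophy": (40, 10), "Law": (40, 10)}

def rate(applications, accepted):
    """Acceptance rate for one group."""
    return accepted / applications

def overall_rate(groups):
    """Acceptance rate after summing over all faculties."""
    total_applications = sum(a for a, _ in groups.values())
    total_accepted = sum(acc for _, acc in groups.values())
    return rate(total_applications, total_accepted)

for faculty in female:
    f, m = rate(*female[faculty]), rate(*male[faculty])
    print(f"{faculty:12s} female {f:.1%}  male {m:.1%}")
print(f"{'Total':12s} female {overall_rate(female):.1%}  male {overall_rate(male):.1%}")
```

In every faculty the female rate is at least as high as the male rate, yet the aggregated female rate is lower, because most female applications went to the two faculties with the lowest acceptance rates.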
slide-88
SLIDE 88

Network Security, WS 2008/09, Chapter 9 88 IN2045 – Dis crete Event Simulation, WS 2011/2012 88

Simps on Paradox (2)

 So who is right? Should the univers ity be punis hed?

  • The women’s rights activis ts ? After all, 37.6% vs . 50% is

s ignificant – and dividing the total number into faculties s imply introduces a bias into the picture.

  • The univers ity? After all, not a s ingle faculty does actually

dis criminate agains t women (in fact, mos t dis criminate agains t men).

 Ans wer: In this cas e, the univers ity is right

  • A s tudent applies at a s pecific faculty that he or s he choos es

hers elf

  • A s tudent does not apply at univers ity and lets the univers ity

choos e the faculty

 Les s on:

  • Simps on Paradox is more ubiquitous than you would think,

yet can be difficult to s pot …even for the authors thems elves !

  • Warning – it’s a s word that cuts both ways :

Sometimes looking at the details makes more s ens e (as in this cas e), s ometimes looking at the aggregated numbers is better.

slide-89
SLIDE 89

Simpson Paradox (3)

slide-90
SLIDE 90

Philosophical / meta-aspects

slide-91
SLIDE 91

Problem: Skew/leptokurtic distributions are not made for man (1)

 In the stone age, man was surrounded mainly by more or less normally distributed (i.e., symmetrically distributed) random variables: sizes of people, pregnancy durations, food consumption, etc.
  • Once you’ve seen a few samples, you get the picture
  • Outliers are rare
  • Outliers hardly affect the mean (e.g., avg weight is 80kg, fattest man on earth weighs 400kg)

[Figure: 99% of all values between the red bars]

slide-92
SLIDE 92

Problem: Skew/leptokurtic distributions are not made for man (2)

 Today, man is surrounded by skew distributions with high kurtosis (leptokurtic), e.g., income (log-normal/Pareto), earthquakes (Pareto), popularities (Zipf), …
  • Outliers like Dr. Waldner are comparatively common, but you need more than just “a few” samples to see them
  • Outliers like Dr. Waldner do strongly affect the mean!

 Lesson: Ask: Is it a skew, leptokurtic distribution?

90% of all values right of red bar; Median way more to the right; Mean even waaaaaay more to the right
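The contrast between the two kinds of distribution can be sketched numerically. All parameters below are made up for illustration: a normal distribution with mean 80 and standard deviation 12 stands in for body weights, and a Pareto distribution with shape 1.5 stands in for a heavy-tailed quantity such as income.

```python
import random
from statistics import mean, median

rng = random.Random(7)  # fixed seed for a repeatable run

# Symmetric case: roughly normal values (parameters are made up).
weights = [rng.gauss(80.0, 12.0) for _ in range(10_000)]
# Skewed, heavy-tailed case: Pareto with shape alpha = 1.5 (made up as well).
incomes = [rng.paretovariate(1.5) for _ in range(10_000)]

for name, data in [("normal", weights), ("Pareto", incomes)]:
    few = data[:5]
    print(f"{name:7s} mean of 5 samples {mean(few):8.2f} | "
          f"mean of all {mean(data):8.2f} | median {median(data):8.2f} | "
          f"max {max(data):10.2f}")
```

For the normal data a handful of samples already gives the picture and mean and median nearly coincide; for the Pareto data the mean sits well above the median and is driven by a few very large values.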

slide-93
SLIDE 93

Catastrophe probabilities

 Some (fictitious!) statements:
  • The probability that nuclear power plant X suffers a catastrophic accident is less than 10^-10 per year
  • The probability that the AFDX avionics network in an aircraft fails is less than 10^-11 per hour of operation
  • The probability that Rigel will burst into a supernova is less than 10^-7 during the next thousand years
  • The probability for an eruption of the Laacher See volcano in the Eifel region is less than 10^-8 during the next hundred years

 What do they have in common? (apart from being made up)
  • A [catastrophic] high-impact event…
  • …with an extremely low probability
slide-94
SLIDE 94

Low probabilities, high stakes

 On what grounds do these probabilities hold?
  • The underlying theory is correct
  • The underlying theory is applicable for the case being considered
  • The case being considered is really the general case, not a hidden special case
  • The confidence level for the result (if applicable) also shows a very high probability that the result is correct
  • The system under consideration has been correctly transformed into a correct theoretical model
  • The measurement data used to parameterize/calibrate the theoretical model has been measured correctly
  • The software that analyses the theoretical model (e.g., simulation, numerical analysis, …) has been correctly implemented
  • The hardware that executes the model software does not introduce errors (FDIV bug; RAM contents altered due to α particle decay; …)

 If just one condition fails, the entire probability calculation is flawed!

slide-95
SLIDE 95

Low probabilities, high stakes

[Diagram: Claim vs. Reality, with the outcomes "Everything alright", "Catastrophe occurs", and "Don’t know, because the calculations are flawed"]

slide-96
SLIDE 96

Low probabilities, high stakes

 Estimated probability that a scientific claim is flawed?
  • About 10^-4, according to the paper below
  • Mileage will vary: some fields are more rigorous, some less

 Consequences
  • Let’s not take any risks!? No LHC, no SETI, no biotech, no ITER, no nothing? Should we live in caves!?
  • Have we become too risk-averse?

 More information in this very readable paper:
Ord, T., Hillerbrand, R., Sandberg, A.: Probing the improbable: Methodological challenges for risks with low probabilities and high stakes. Journal of Risk Research, 2010
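The argument can be made concrete with the law of total probability: the catastrophe can happen either although the argument is sound or because the argument is flawed. In the sketch below, the claimed 10^-10 is the fictitious power plant figure and 10^-4 is the flaw probability quoted above; the probability of a catastrophe given a flawed argument is unknown, so the value 10^-3 is a purely hypothetical placeholder.

```python
def probability_of_catastrophe(p_claimed, p_argument_flawed, p_if_flawed):
    """Law of total probability over 'argument sound' vs. 'argument flawed'."""
    p_sound = 1.0 - p_argument_flawed
    return p_sound * p_claimed + p_argument_flawed * p_if_flawed

p_claimed = 1e-10          # the (fictitious) power plant claim
p_argument_flawed = 1e-4   # estimate quoted on the slide
p_if_flawed = 1e-3         # HYPOTHETICAL: nobody knows this value

p_total = probability_of_catastrophe(p_claimed, p_argument_flawed, p_if_flawed)
print(f"claimed:  {p_claimed:.1e}")
print(f"adjusted: {p_total:.1e}")
```

Under these assumptions the adjusted probability is dominated entirely by the chance that the argument is wrong, so the extremely small claimed figure tells us very little.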

slide-97
SLIDE 97

Lessons

 For authors:
  • Know your boundaries
  • Clearly state your assumptions
  • Clearly warn about possibilities that assumptions may not hold in reality

 For readers:
  • Double-check the assumptions
  • Ask for second, third, … opinions, preferably using completely different methods

slide-98
SLIDE 98

Risk aversion: How we lie to ourselves

 Do mobile phones cause cancer?
  • Very little evidence, long-term studies were needed
  • Result:
  • Possibly causes cancer
  • Only for people who use them for many hours per week
  • Still a very low incidence rate

 But many people try to get rid of base stations in their neighbourhood
  • “Well, it is just in case – you never know if there is something about those allegations”

 How often is calling an ambulance/the firemen via a mobile phone significantly faster than running to the nearest land-line phone?
  • How many “non-casualties” this way per year?
slide-99
SLIDE 99

Risk aversion: How we lie to ourselves

 Do cars and motorcycles cause deaths?

Yes, and very much so:
  • About 4,000 fatalities in Germany per year (p.a.) due to traffic accidents
  • About 80,000,000 inhabitants in Germany
  • Roughly 800,000 people die in Germany p.a.

 Incidence: About 0.5% of all deaths are due to traffic accidents!
  • That’s just the deaths. We are ignoring other serious consequences such as mutilations, month-long recovery treatments, psychological traumata, financial losses, etc.

 Compare: What percentage of all deaths in Germany are directly or indirectly linked to mobile phones p.a.?

slide-100
SLIDE 100

Risk aversion: How we lie to ourselves

 Reproduction is fun! (if done on purpose…)

 But what about the risks?
  • Mortality among mothers in labour: 80 ppm = almost 0.1‰
  • Risk that the child suffers from a chromosome aberration (trisomy 21/Down syndrome, Cri du Chat, trisomy 18, trisomy 13, etc.): about 1/160 = 0.63%

 Would you enter a car if the risk of having a serious accident (fatal or heavy injuries) were 0.63% per…
  • Per journey?
  • Per 100km?
  • Per 10,000km?
  • Per car lifetime?
slide-101
SLIDE 101

Risk aversion: How we lie to ourselves

Lessons:
1. Often, we take risks without noticing their true extent (even though we actually know it)
2. Often, we refuse to take risks that are orders of magnitude smaller than those from point 1.
3. Most occurrences of point 2 do not make any sense, but we just do not notice.
4. On the other hand: If we are aware of these phenomena, if we counter them by acting “rationally” against our intuition/common standards, and then the unlikely accident happens, we will feel very guilty, and everybody will say “I told you so”…
5. Also note that we mostly are talking about very low probabilities again…

slide-102
SLIDE 102

Summary

 When confronted with data or conclusions from data one should always ask:
  • Can they possibly know this? How?
  • What do they really mean?
  • Is the purported reason the real reason?
  • Are the samples and measures unbiased and appropriate?
  • Are the measures well-defined and valid?
  • Are measures or visualizations misleading?
  • Has something important been left out?
  • Are there any inconsistencies (contradictions)?

 When we collect and prepare data, we should

  • work thoroughly and carefully
  • check our assumptions and prerequisites
  • avoid distortions of any kind
slide-103
SLIDE 103


Chair for Network Architectures and Services —Prof. Carle Department of Computer Science TU München

How to Lie with Statistics, Chapters 8-10: Darrell Huff

slide-104
SLIDE 104

Chapter 8: Post Hoc Rides Again

 A common kind of fallacy is when the relationship is real but it is not possible to be sure which of the variables is the cause and which is the effect
  • Perhaps there is a real correlation, yet neither of the variables has any effect at all on the other
  • Watch out for a correlation being extended beyond the data with which it has been demonstrated. (Ex. More rain = better crops)

slide-105
SLIDE 105

Chapter 9: How to Statisculate

 Any percentage figure based on a small number of cases is likely to be misleading

 The “shifting base”: percentages taken off different totals to imply different amounts

 Percentages added together, or mathematically used in other ways (Ex. “I mix ‘em fifty-fifty: one horse, one rabbit.”)

slide-106
SLIDE 106

Chapter 10: How to Talk Back to a Statistic

 1st thing to look for is bias

 Conscious bias
  • Direct misstatements
  • Ambiguous statement
  • Selection of favorable data
  • Suppression of unfavorable data
  • Units of measurement may be shifted
  • Improper measure (trickery covered by the use of the word “average”)

slide-107
SLIDE 107

Chapter 10: How to Talk Back to a Statistic

 Who says so? How do they know?
  • Watch for a biased sample: one that has been selected improperly or has selected itself
  • Reported correlation: is it big enough to mean anything? Are there enough cases to add up to any significance? Look for a measure of reliability (sources of error)