
What can go wrong with statistics: Some typical errors & How to lie with statistics

Chair for Network Architectures and Services, Prof. Carle
Department of Computer Science, TU München

Content adopted partially from: Lutz Prechelt


1. Correlation does not mean causation (1)
• "If A is correlated with B, then A causes B"
• Perhaps neither of these things has produced the other, but both are a product of some third factor C
• It may be the other way round: B causes A
• Correlation can actually be of any of several types and can be limited to a range
• The correlation may be pure coincidence, e.g., number of pirates vs. global temperature
• Given a small sample, you are likely to find some substantial correlation between any pair of characters or events

IN2045 – Discrete Event Simulation, WS 2011/2012 Network Security, WS 2008/09, Chapter 9
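To make the small-sample point concrete, here is a minimal Python simulation (the sample sizes 5 and 50, the threshold |r| > 0.5 and the seed are arbitrary choices for illustration). It draws pairs of completely independent normal samples and counts how often they nevertheless look strongly correlated. With these settings, roughly two in five unrelated size-5 pairs exceed |r| = 0.5, while almost none of the size-50 pairs do.

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def share_of_strong_correlations(n, trials, threshold=0.5, seed=0):
    """Fraction of pairs of independent N(0,1) samples of size n with |r| > threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        if abs(pearson_r(x, y)) > threshold:
            hits += 1
    return hits / trials

small = share_of_strong_correlations(n=5, trials=20000)
large = share_of_strong_correlations(n=50, trials=2000)
print(f"n=5:  {small:.1%} of unrelated pairs show |r| > 0.5")
print(f"n=50: {large:.1%} of unrelated pairs show |r| > 0.5")
```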

2. Correlation does not mean causation (2)
• Example 1: "Queueing delays increased; therefore throughput for individual TCP connections decreased"
  – Could be true
  – Could be due to an increased number of total TCP connections
  – Could actually be unrelated
• Example 2: "The chance of recovery decreases with an increasing period of cancer treatment by radiation; this shows that longer exposure to radiation is dangerous." Well, maybe, but…
  – …usually, longer therapies are required for more severe/bigger types of cancer, and you are less likely to survive these

3. Correlation does not mean causation (3)
• Example 3: "Birth rates have been decreasing for decades. So has the number of storks. This proves that babies are delivered by the stork!"
• Example 4: "The number of TV stations has increased, as well as the amount of money that people spend on travelling. This proves the efficiency of travel ads on TV."

4. Correlation does not mean causation: Lessons
• Often, there is a hidden background variable (e.g., size of the tumor)
• Time is a good candidate for a background variable (e.g., storks vs. babies, TV stations vs. travel expenses)
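A minimal simulation of a hidden background variable, loosely modeled on the radiation-therapy example (the linear model, noise levels and seed are assumptions, not real clinical data): tumor size drives both therapy length and recovery chance, while therapy length itself has no effect. Under these assumptions the naive correlation comes out strongly negative (around -0.8); the partial correlation, which removes the linear influence of tumor size, is close to zero.

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(x, y, z):
    """Correlation of x and y after removing the linear influence of z."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(1)
n = 2000
# Hypothetical model: tumor size drives BOTH variables; therapy itself has no effect.
size = [rng.gauss(0, 1) for _ in range(n)]
therapy_length = [s + rng.gauss(0, 0.5) for s in size]
recovery_chance = [-s + rng.gauss(0, 0.5) for s in size]

naive = pearson_r(therapy_length, recovery_chance)
controlled = partial_r(therapy_length, recovery_chance, size)
print(f"naive r = {naive:.2f}, r controlling for tumor size = {controlled:.2f}")
```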

5. Fishing for correlations
• Correlation can be a purely random effect!
• When testing for correlation at the usual 5% significance level, about 5% of all pairs of arbitrarily chosen (actually unrelated) variables will appear to be correlated
• Example:
  – Determine 20 parameters (= random variables) in some simulation experiment
  – These form ½ · 20 · 19 = 190 pairs of random variables
  – 5% of 190 = about 9–10 "correlations" that are in fact purely random!
• http://www.xkcd.com/882/
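The arithmetic above, plus a small simulation of it, as a Python sketch. Assumptions: 20 mutually independent standard-normal variables, 100 observations each, and the large-sample cut-off |r| > 1.96/√n for a 5% two-sided test (exact tests use the t-distribution). Averaged over repeated runs, the number of "significant" pairs lands near the predicted 9.5, even though no variable has anything to do with any other.

```python
import math
import random

def standardize(xs):
    """Return z-scores (using the population standard deviation)."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return [(x - m) / sd for x in xs]

def count_spurious_correlations(num_vars, n, rng):
    """Simulate num_vars independent variables and count 'significantly' correlated pairs."""
    threshold = 1.96 / math.sqrt(n)   # approximate 5% two-sided critical value for r
    zs = [standardize([rng.gauss(0, 1) for _ in range(n)]) for _ in range(num_vars)]
    hits = 0
    for i in range(num_vars):
        for j in range(i + 1, num_vars):
            r = sum(a * b for a, b in zip(zs[i], zs[j])) / n   # Pearson r from z-scores
            if abs(r) > threshold:
                hits += 1
    return hits

pairs = math.comb(20, 2)      # 1/2 * 20 * 19 = 190
expected = 0.05 * pairs       # about 9.5 chance "findings"
rng = random.Random(42)
runs = [count_spurious_correlations(20, 100, rng) for _ in range(40)]
print(pairs, expected, sum(runs) / len(runs))
```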

6. Problem 5: Is HGH even part of the cause?
• Wrinkle reduction: up to 61%
• Maybe that could happen even without HGH?
Note: This data is pure fantasy!
[Figure: dot plots of wrinkle reduction (axis from -20 to 60) for the groups "heartAttack", "healthy" and "h.A., noHGH"]

7. Lesson: Question causality
• Sometimes the data is not just biased, it contains hardly anything other than bias
• If you see a presumably (author) or assertedly (reader) causal relationship ("A causes B"), ask yourself:
  – Does it really make sense?
  – Would A really have this much influence on B?
  – Couldn't it be just the other way round?
  – What other influences besides A may be important?
  – What is the relative weight of A compared to these?

8. Percentages
• "Well-meaning and ill-meaning users alike value it [the percentage] for its aura of mathematical neutrality and objectivity. 'Percent' […] smells of the merchant's counting house and of double-entry bookkeeping; respectability simply oozes from its buttonholes. Percentages stand for credibility and authority, percentages radiate certainty, percentages show that one knows how to calculate; they lend authority and superiority, all the more so, and probably reinforced by the fact, that many an addressee of a modern percentage sermon does not even know what percentages actually are." – Walter Krämer (translated from German)

9. Percentages and absolute numbers (1)
You're in hospital, and the doctor tells you…:
• "Medication A has a 10% higher chance to cure your disease, but the thrombosis risk is increased by 100% in comparison to medication B."
  – Which one would you pick?
• "With medication B, about 1 in 7,000 patients suffers from thrombosis. With medication A, about 2 in 7,000 patients suffer from thrombosis, but it has a 10% higher chance to cure your disease."
  – Which one would you pick?
• Mathematically, the two descriptions are equivalent!
• Your decision probably depends on the severity of your disease (e.g., headache vs. liver cancer)
• Lesson: Percentages can be misleading!
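The same numbers expressed both ways, as a small Python sketch. Only the 1-in-7,000 and 2-in-7,000 figures come from the slide; the "number needed to harm" (patients treated per additional case) is a standard epidemiological way of stating the absolute difference, added here for illustration.

```python
from fractions import Fraction

def describe_risk_change(risk_before, risk_after):
    """Return (relative change, absolute change, number needed to harm)."""
    relative = (risk_after - risk_before) / risk_before
    absolute = risk_after - risk_before
    number_needed_to_harm = 1 / absolute   # patients treated per additional case
    return relative, absolute, number_needed_to_harm

# Thrombosis risk from the slide: 1 in 7,000 (medication B) vs. 2 in 7,000 (medication A).
rel, abs_diff, nnh = describe_risk_change(Fraction(1, 7000), Fraction(2, 7000))
print(f"relative increase: {float(rel):.0%}")           # 100%
print(f"absolute increase: {float(abs_diff):.4%}")      # 0.0143%
print(f"one extra thrombosis per {nnh} patients")       # 7000
```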

10. Example - Percentages and absolute numbers
[Figure]

11. Example - Percentages and absolute numbers
[Figure]

12. Percentages and absolute numbers (2)
• "In the past year, we have employed an additional 1,000 teachers in North Rhine-Westphalia. This shows our great commitment and financial efforts to improve our school system." Sounds good, doesn't it?
• How many schools are there in NRW?
  – About 7,000
  – Only one in seven schools (about 14%) gets an additional teacher!
• How many teachers are there in NRW in total?
  – About 130,000
  – Result: Less than 1% increase…
• Lesson: Absolute numbers can be misleading, too!
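The two ratios from this slide, as a tiny Python check (school and teacher counts are the slide's approximate figures):

```python
new_teachers = 1_000
schools = 7_000           # approximate figure from the slide
teachers_total = 130_000  # approximate figure from the slide

share_of_schools = new_teachers / schools          # at most one extra teacher per school
relative_increase = new_teachers / teachers_total  # growth of the teaching staff
print(f"{share_of_schools:.0%} of schools at most")   # 14%
print(f"{relative_increase:.2%} more teachers")      # 0.77%
```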

13. Percentages of what? – Two examples
• In 2008, President Bush asserted that the USA would reduce their emissions of greenhouse gases by the year 2050 by at least 50%.
  – 50%, but as compared to what?
  – In relation to the year 1990? (International standard)
  – In relation to the year with the highest emissions?
    • …which might yet be to come!?
• The share of nuclear energy in Germany is about 25%
  – True for electrical energy
• The share of nuclear energy in Germany is about 13%
  – True for total primary energy consumption

14. Percentages (4)
• "In the past year, we could boost our company's rate of return by 400%!"
  – Wow, 400%. Impressive!
• "That is because we increased our rate of return from 0.1% to 0.5%."
  – Just 0.5%. How inefficient!
• Lessons
  – Always ask (or write out): "percentage of what?"
  – Always ask for (or write out)
    • the percentages
    • and the absolute numbers
  – Percentages of percentages often don't make sense and can be an indication of foul play (cf. next slide)

15. Percentages and percentage points
• Election 2010:
  – Party A: 40%
  – Party B: 10%
• Election 2014:
  – Party A: 30%
  – Party B: 20%
• "Party A lost 10%, party B gained 10%"
• Wrong: Party A lost
  – 10 percentage points
  – 25% of its vote share (since 30/40 = 0.75)
    • …but not 25% of its absolute votes either, since turnout, the number of eligible voters, etc. presumably differ
• Lesson: There is an important difference between percent and percentage points!
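A small helper that reports both kinds of change side by side, applied to the numbers of this slide and of the previous rate-of-return slide (the function name `change` is just an illustrative choice):

```python
def change(old_pct, new_pct):
    """Compare two percentages: difference in percentage points and relative change."""
    points = new_pct - old_pct                 # absolute difference, in percentage points
    relative = (new_pct - old_pct) / old_pct   # relative difference, as a fraction
    return points, relative

# Election example: party A falls from 40% to 30%, party B rises from 10% to 20%.
for party, old, new in [("A", 40, 30), ("B", 10, 20)]:
    pp, rel = change(old, new)
    print(f"Party {party}: {pp:+} percentage points, {rel:+.0%} relative")

# Rate-of-return example: 0.1% -> 0.5%.
pp, rel = change(0.1, 0.5)
print(f"Rate of return: {pp:+.1f} percentage points, {rel:+.0%} relative")
```

The same 10-point move is a 25% loss for party A but a 100% gain for party B, which is why "10%" alone says little.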

16. Example 2: Tungu and Bulugu
• We look at the yearly per-capita income in two small hypothetical island states: Tungu and Bulugu
• Statement: "The average yearly income in Tungu is 94.3% higher than in Bulugu."

17. Problem 1: Misleading averages
• The island states are rather small: 81 people in Tungu and 80 in Bulugu
• And the income distribution is not as even in Tungu:
Note: This data is pure fantasy!
[Figure: dot plots of income (linear axis, 0 to 5000) for Tungu and Bulugu]

18. Misleading averages and outliers
• The only reason is Dr. Waldner, owner of a software company, who has been enjoying his retirement in Tungu for a year
[Figure: dot plots of income for Tungu and Bulugu on a logarithmic axis from 10^3.0 to 10^5.0]

19. Lesson: Question appropriateness
• A certain statistic (very often the arithmetic average) may be inappropriate for characterizing a sample
• If there is any doubt, ask that additional information be provided
  – such as the standard deviation
  – or some quantiles, e.g.: 0, 0.25, 0.5, 0.75, 1
Note: The 0.25 quantile is equivalent to the 25th percentile, etc.
[Figure: dot plot of income for Tungu]
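A Python sketch of why the mean alone can mislead and what the suggested quantiles add. The 80 "ordinary" incomes are hypothetical (evenly spaced from 1,000 to 2,975), since the slide's data is not given; only the single outlier of 160,000 is taken from the Waldner example. The mean roughly doubles because of one person, while the median and quartiles barely notice.

```python
import statistics

# Hypothetical incomes (NOT the slide's data): 80 residents earning 1,000 ... 2,975,
# plus one outlier earning 160,000 (the Waldner example).
incomes = [1000 + 25 * i for i in range(80)] + [160_000]

def five_number_summary(xs):
    """Quantiles 0, 0.25, 0.5, 0.75, 1 (minimum, quartiles, maximum)."""
    q1, q2, q3 = statistics.quantiles(xs, n=4)
    return min(xs), q1, q2, q3, max(xs)

print(f"mean   = {statistics.mean(incomes):.0f}")
print(f"median = {statistics.median(incomes):.0f}")
print("summary:", five_number_summary(incomes))
```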

20. Logarithmic axes
• Waldner earns 160,000 per year. How much more that is than what the other Tunguans earn is impossible to see on the logarithmic axis we just used
[Figure: dot plots of income for Tungu and Bulugu on a linear axis from 0 to 150,000, with Waldner marked]

21. Lesson: Beware of inappropriate visualizations (#1)
• Lesson for reader: Always look at the axes. Are they linear or logarithmic?
• Lesson for author:
  – Logarithmic axes are very useful for reading hugely different values from a graph with some precision
  – But they totally defeat the imagination!
  – If you decide to use logarithmic axes, always state this fact in your text!
• There are many more kinds of inappropriate visualizations
  – see later in this presentation

22. Problem 4: Misleading precision
• "The average yearly income in Tungu is 94.3% higher than in Bulugu"
• Assume that tomorrow Mrs. Alulu Nirudu from Tungu gives birth to twins
  – There are now 83 rather than 81 people in Tungu
  – The average income drops from 3922 to 3827
  – The difference to Bulugu drops from 94.3% to 89.7%
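The effect of the twins can be recomputed from the slide's own numbers (a sketch; it assumes the newborns have zero income and Bulugu's average is unchanged). Two newborns move the "precise" 94.3% by almost five percentage points; with these rounded inputs the result is about 89.6%, so even the slide's 89.7% depends on rounding.

```python
people_before, people_after = 81, 83
mean_before = 3922                         # average yearly income in Tungu (slide)
total_income = mean_before * people_before
mean_after = total_income / people_after   # the twins add people but no income

gap_before = 0.943                         # "94.3% higher than in Bulugu"
# Bulugu's average is unchanged, so Tungu's ratio to it shrinks by the factor 81/83.
gap_after = (1 + gap_before) * people_before / people_after - 1
print(f"mean income: {mean_before} -> {mean_after:.0f}")
print(f"gap to Bulugu: {gap_before:.1%} -> {gap_after:.1%}")
```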

23. Lesson for reader: Do not be easily impressed
• The usual reason for presenting very precise numbers is the wish to impress people
• "Round numbers are always false"
  – But round numbers are much easier to remember and compare
• Clearly tell people you will not be impressed by precision
  – in particular if the precision is purely imaginary

24. Lesson for author: Think about precision
• Do you really have enough data to justify giving out precise numbers?
• Compromise: Give exact numbers in tables/figures, but round them in the text.
• Do not exaggerate: If you find your system yields a 52.91% increase in throughput,
  – don't say: "Our system increases throughput by more than 50%"
  – do say: "Our experiments suggest that our system can achieve throughput increases of around 50%"

25. Example 3: Phantasmo Corporation stock price
• We look at the recent development of the price of shares for Phantasmo Corporation (Phantasmo and this data are purely imaginary)
• "Phantasmo shows a remarkably strong and consistent value growth and continues to be a top recommendation"
[Figure: stock price (axis from 180 to 192) vs. day (0 to 400)]

26. Problem: Looks can be misleading
• The following two plots show exactly the same data!
• …and the same as the plot on the previous slide!
[Figure: two plots of stock price (axis from 180 to 192) vs. day (0 to 400)]

27. Problem: Scales can be misleading
• What really happened is shown here:
• We intuitively interpret a trend plot on a ratio scale
[Figure: stock price on an axis from 0 to 200 vs. day (0 to 400), next to the earlier plot with an axis from 180 to 192]

28. So look carefully!
Found on focus.msn.de on 2004-03-04:
[Figure]

29. Problem: Scales can be missing
• The most insolent persuaders may even leave the scale out altogether!
  – Never forget to label your axes!
  – Never forget to put a scale on your axes!
[Figure: plot whose only labeled axis is "day" (0 to 400)]

30. Problem: Scales can be abused
• Observe the global impression first
[Figure, labeled "2005"]

31. Problem: People may invent unexpected things
• Source: advertisement by Donau-Universität Krems, DIE ZEIT, 2004-10-07
• What's wrong?
[Figure, with labels "2 Jahre" (2 years) and "4 Jahre" (4 years)]

32. Axis scales: Lessons for author
• Warning: Most plotting software automatically selects boundaries for you (e.g., GNU R)
  – Always ask yourself: Do these automatically chosen axis limits make sense?
• When plotting probabilities, please consider manually setting the axis to the interval [0, 1]
• When using a log scale, please
  – … explicitly write about this either in the text or in the caption
  – … explicitly tell this to your audience when giving a talk

33. Pie charts (1/3)
Note: This data is pure fantasy!
[Figure]

34. Pie charts (2/3)
Note: This data is pure fantasy!
[Figure]

35. Pie charts (3/3)
• What percentages do the two graphs show? Guess!
• Answer:
  – Both show the same data: a 94% : 6% ratio!
  – The difference only lies in the angle of the pies.

36. Lesson: Distrust pie charts!
• Pie charts should never be used
  – Perception depends on the angle
  – Even worse with 3D pie charts: Parts at the front are artificially enlarged by the pie's 3D height; they thus seem to be bigger
  – A very subtle way to visually tune your data
  – Unfortunately, still very common
• Distrust pie charts that do not give numbers as well
  – Think about the numbers, compare them
  – Think about the presentation: are they trying to beautify the impression?

37. Bubble charts
Note: This data is pure fantasy!
[Figure: two bubble charts]
• Which diagram shows the values 2, 3, 4? Both do!
• Left one: Radius is proportional to measurements
  – Exaggerates differences: 4 looks much larger than 2
• Right one: Area is proportional to measurements
  – Viewers tend to underestimate differences: 4 looks only slightly larger than 2
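The geometry behind the two encodings, as a short Python sketch for the slide's values 2, 3, 4 (the variable names are illustrative): with radius encoding, the "4" bubble has four times the area of the "2" bubble; with area encoding, its area is correct (twice as large), but its radius is only about 1.41 times larger, which may be why it looks only slightly larger.

```python
import math

values = [2, 3, 4]

# Encoding 1: radius proportional to the value.
radius_enc = list(values)
# Encoding 2: area proportional to the value, i.e. radius proportional to sqrt(value).
area_enc = [math.sqrt(v) for v in values]

def area(r):
    """Area of a circle with radius r."""
    return math.pi * r ** 2

# How much bigger is the "4" bubble than the "2" bubble under each encoding?
print("radius encoding: area ratio", area(radius_enc[2]) / area(radius_enc[0]))   # 4.0
print("area encoding:   area ratio", area(area_enc[2]) / area(area_enc[0]),
      "but radius ratio only", area_enc[2] / area_enc[0])
```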

38. Pictograms
Note: This data is pure fantasy!
[Figure; source: http://sciencev1.orf.at/static2.orf.at/science/storyimg/storypart_155543.jpg]

39. Pictogram – Comparison: apartment size
[Figure]

40. Pictogram – Comparison: apartment size
[Figure]

41. Lesson: Bubble charts and pictograms
• This lesson is more or less the same as for pie charts:
  – Bubble charts usually should not be used
  – Radius proportionality exaggerates differences, but area proportionality often leads viewers to underestimate differences
  – A very subtle way to visually tune your data
  – Of course, a bubble chart + pie chart may convey more information, but please try to visualize it differently…
  – If you really, really want to use a bubble chart, then use the area proportionality variant, clearly explain this in your text, and also put the actual numbers right next to the bubbles
• Distrust bubble charts that do not give the numbers as well
  – Think about the numbers, compare them
  – Think about the presentation: Did they really need to use bubble charts? Or are they trying to beautify the impression?
Sometimes size really matters.

42. Summary lesson for the reader: Seeing is believing
• …but often, it shouldn't be!
• Always consider what it really is that you are seeing
• Do not believe anything purely intuitively
• Do not believe anything that does not have a well-defined meaning
• Be sceptical about pie and bubble charts
  – … in particular if they do not even print the actual numbers but only rely on the pure graphical presentation
  – … in particular if they use 3D pies

43. Example 4: blend-a-med Night Effects
• What do they not say? Think about it…
  – What exactly does "sichtbar" (visible) mean? What exactly does "hell" (bright) or "heller" (brighter) mean?
  – What was the scope of the clinical trials, and what were their results?
  – What other effects does Night Effects have?

44. Example 5: The better tool?
• We consider the time it takes programmers to write a certain program using different IDEs:
  – Aguilder or
  – Egglips
• Statement (by the maker of Aguilder): "In an experiment with 12 persons, the ones using Egglips required on average 24.6% more time to finish the same task than those using Aguilder. Both groups consisted of equally capable people and received the same amount and quality of training."
• Assume Egglips and Aguilder are in fact just as good. What may have gone wrong here?

45. Problem: Has anybody ignored any data?
• Solution: Just repeat the experiment a few times and pick the outcome you like best
Note: This data is pure fantasy!
[Figure: four repetitions (1 to 4) of the experiment, each with dot plots of time (0 to 300) for Egglips and Aguilder]
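A Python sketch of this "repeat until it looks good" strategy. Assumptions, not taken from the slides: both tools produce work times from the same normal distribution (mean 150, standard deviation 35), 6 persons per group, and the experimenter reports the best of 4 repetitions. Honest single runs show no difference on average; the cherry-picked reports consistently show Egglips as slower, by about 20 time units, roughly one standard deviation of the single-run difference.

```python
import random
import statistics

def one_experiment(rng, n=6, mean=150, sd=35):
    """Both tools are equally good: times come from the same hypothetical distribution.

    Returns how much slower the Egglips group was on average.
    """
    egglips = [rng.gauss(mean, sd) for _ in range(n)]
    aguilder = [rng.gauss(mean, sd) for _ in range(n)]
    return statistics.fmean(egglips) - statistics.fmean(aguilder)

def best_of(k, rng):
    """Repeat the experiment k times and report only the most 'favorable' outcome."""
    return max(one_experiment(rng) for _ in range(k))

rng = random.Random(7)
honest = [one_experiment(rng) for _ in range(2000)]
cherry = [best_of(4, rng) for _ in range(2000)]
print(f"honest reports: mean difference {statistics.fmean(honest):+.1f}")
print(f"best of 4:      mean difference {statistics.fmean(cherry):+.1f}")
```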

46. Lesson for the reader: Demand complete information
• If somebody presents conclusions
  – based on only a subset of the available data
  – and has selected which subset to use
  – then everything is possible
• There is no direct way to detect such repetitions, BUT for any one single execution . . .

47. Digression: Hypothesis testing
• …a significance test (or confidence intervals) can determine how likely it would be to obtain a result at least this extreme if the conclusion were wrong:
  – Null hypothesis: Assume both tools produce equal work times overall
  – Then how often will we get a difference this large when we use samples of 6 persons each?
    • If the probability is small, the result is plausibly real
    • If the probability is large, the result is plausibly incidental


49. Statistical significance test: Example
• Our data:
  – Aguilder: 175, 186, 137, 117, 92.8, 93.7 (mean 133.6)
  – Egglips: 171, 155, 157, 181, 175, 160 (mean 166.5)
• Null hypothesis:
  – We assume the distributions underlying these data are both normal distributions with the same variance, and
  – the means of the actual distributions are in fact equal
• Then we can compute the probability of seeing a difference of about 33 or more from two samples of size 6
• The procedure for doing this is called the t-test (recall the confidence intervals? It's a very similar calculation)
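The t statistic for the slide's data can be computed with the standard library alone, as sketched below (pooled-variance Student's t-test, matching the equal-variance assumption above). The critical values 1.812 and 2.228 are the usual table values for 10 degrees of freedom at the 5% level (one-sided and two-sided). If SciPy is available, `scipy.stats.ttest_ind(egglips, aguilder)` should give the same statistic together with a two-sided p-value. Note how the verdict depends on the choice: t ≈ 1.95 passes the one-sided test but not the two-sided one.

```python
import math
import statistics

aguilder = [175, 186, 137, 117, 92.8, 93.7]
egglips = [171, 155, 157, 181, 175, 160]

def pooled_t(a, b):
    """Student's two-sample t statistic (pooled variance) for mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)   # sample variances (n - 1)
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(b) - statistics.mean(a)) / se, na + nb - 2

t, df = pooled_t(aguilder, egglips)
print(f"difference of means: {statistics.mean(egglips) - statistics.mean(aguilder):.1f}")
print(f"t = {t:.2f} with {df} degrees of freedom")
# Table values for df = 10: 1.812 (one-sided, 5%) and 2.228 (two-sided, 5%).
print("significant one-sided:", t > 1.812, "| two-sided:", t > 2.228)
```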

50. So? (Lessons for the author)
• So in our case we probably would believe the result and not find out that the experimenters had in fact cheated
  – (And indeed they were lucky to get the result they got)
Note:
• There are many different kinds of hypothesis tests, and various things can go wrong when using them
• In particular, watch out for what the test assumes
• and what the p-value means, namely:
  – The probability of seeing data at least as extreme as this if the null hypothesis is true
  – Note: The p-value is not the probability that the null hypothesis is true!
• But unless the distribution of your samples is very strange or very different, using the t-test is usually OK.
  – Note: There are quite a number of different tests called "t-test".
  – They have subtle yet important differences…
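One way to see what "the probability of seeing data at least this extreme if the null hypothesis is true" means is an exact permutation test, sketched below in Python. This is a different test from the t-test (it replaces the normality assumption with the assumption that, under the null hypothesis, the tool labels are interchangeable): it enumerates all 924 ways to split the 12 measured times into two groups of 6 and counts how many splits give a difference at least as large as the observed one. The resulting fraction is a p-value; it says nothing directly about the probability that the null hypothesis is true.

```python
from itertools import combinations
from statistics import fmean

aguilder = [175, 186, 137, 117, 92.8, 93.7]
egglips = [171, 155, 157, 181, 175, 160]
observed = fmean(egglips) - fmean(aguilder)

pooled = aguilder + egglips
total = sum(pooled)
as_extreme = 0
splits = 0
# Under the null hypothesis the tool labels are arbitrary: try every way of
# choosing which 6 of the 12 times are labeled "Egglips".
for idx in combinations(range(len(pooled)), 6):
    group = sum(pooled[i] for i in idx)
    diff = group / 6 - (total - group) / 6
    splits += 1
    if diff >= observed - 1e-9:   # small tolerance guards against float rounding
        as_extreme += 1

p_one_sided = as_extreme / splits
print(f"{as_extreme} of {splits} relabelings give a difference >= {observed:.1f}")
print(f"one-sided permutation p-value: {p_one_sided:.3f}")
```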

51. Example: Error bars
• "Although a high variability in our measurements results in rather large error bars, our simulation results show a clear increase in [whatever]."
• What's wrong here?

52. Lesson: Error bars
• What are the error bars? How are they defined?
  – Minimum and maximum values?
  – Confidence intervals?
    • If so, at which level? 95%? 99%?
  – Mean ± two standard deviations?
  – Mean ± two standard errors?
  – First and third quartile? 10% and 90% quantile?
  – Chebyshov* or Chernov bounds?
• Reader: Distrust error bars that are not explained
• Author:
  – Clearly state what kind of error bars you're using
  – Usually, the best choice is to use confidence intervals, but standard deviation and standard error are also very common
*also: Tschebyscheff, Tschebyschow, Chebyshev, … Same with Tschernoff, …
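To see how much the choice matters, here is a Python sketch computing four of the listed error-bar types for one sample, the six Aguilder times from the t-test example. The t quantile 2.571 (0.975 quantile, 5 degrees of freedom) is a standard table value. For the same data, the bars differ in width by more than a factor of two, so unexplained error bars are hard to interpret.

```python
import math
import statistics

sample = [175, 186, 137, 117, 92.8, 93.7]   # Aguilder times from the t-test example
n = len(sample)
m = statistics.mean(sample)
sd = statistics.stdev(sample)               # sample standard deviation
se = sd / math.sqrt(n)                      # standard error of the mean
t_crit = 2.571                              # 0.975 t quantile for n - 1 = 5 degrees of freedom

bars = {
    "min/max": (min(sample), max(sample)),
    "mean +/- 2 SD": (m - 2 * sd, m + 2 * sd),
    "mean +/- 2 SE": (m - 2 * se, m + 2 * se),
    "95% CI (t)": (m - t_crit * se, m + t_crit * se),
}
for name, (lo, hi) in bars.items():
    print(f"{name:14s} [{lo:6.1f}, {hi:6.1f}]  width {hi - lo:6.1f}")
```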

53. Lesson for the author: Common errors for t-tests and confidence intervals
• Recall: "But unless the distribution of your samples is very strange or very different, using the t-test is usually OK."
• If you do not have many samples (less than ~30), then you must check that your input data looks more or less normally distributed
  – At least check that the distribution does not look terribly skewed
  – Better: do a QQ plot
  – Even better: use a normality test
• You might make many runs, group them together, and exploit the Central Limit Theorem to get normally distributed data, but…:
  – Warning: This only works if the variance of your samples is finite!
  – Therefore it won't work with, e.g., Pareto-distributed samples (α ≤ 2)
• You must ensure that the samples are not correlated!
  – For example, a time series is often autocorrelated
  – Group samples and calculate their average (Central Limit Theorem); make groups large enough to let the autocorrelation vanish
  – Check with an ACF plot, an autocorrelation test, or a stationarity test
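A Python sketch of the grouping advice, assuming (hypothetically) that the simulation output behaves like an AR(1) process with coefficient 0.9, a simple stand-in for autocorrelated output such as successive queue lengths. The raw samples are strongly autocorrelated at lag 1; the means of 250 non-overlapping batches of 200 samples each are close to uncorrelated. The batch size of 200 is an illustrative choice; in practice it should be checked, e.g. with the ACF of the batch means.

```python
import random

def lag1_autocorrelation(xs):
    """Sample autocorrelation at lag 1."""
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[t] - m) * (xs[t + 1] - m) for t in range(n - 1))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

def batch_means(xs, batch_size):
    """Average consecutive non-overlapping batches (an incomplete last batch is dropped)."""
    k = len(xs) // batch_size
    return [sum(xs[i * batch_size:(i + 1) * batch_size]) / batch_size for i in range(k)]

# Hypothetical autocorrelated simulation output: an AR(1) process x_t = 0.9 x_{t-1} + noise.
rng = random.Random(3)
phi, x, series = 0.9, 0.0, []
for _ in range(50_000):
    x = phi * x + rng.gauss(0, 1)
    series.append(x)

batches = batch_means(series, batch_size=200)
print(f"raw samples: lag-1 autocorrelation {lag1_autocorrelation(series):.2f}")
print(f"{len(batches)} batch means: lag-1 autocorrelation {lag1_autocorrelation(batches):.2f}")
```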

54. Lesson for the author: Check your prerequisites and assumptions!
• Similar errors can be committed with other statistical methods
• Usual suspects:
  – Input has to be normally distributed, or follow some other distribution
  – Input must not be correlated
  – Input has to come from a stationary process
  – Input must be at least 30 samples (10; 50; 100; …)
  – The two inputs must have the same variances
  – The variance must be finite
  – The two inputs must have the same distribution types
  – …
• Of course, all this depends on the chosen method!

55. Example 6: Economic growth (GER vs. USA)
• On 2003-10-30, the US Bureau of Economic Analysis (BEA) announced
  – USA economic growth in 3rd quarter: 7.2%
• Assume that on the same day the German Statistisches Bundesamt had announced
  – D economic growth in 3rd quarter: 2%
  – (Note: This value is fictitious)
• Note: Both values refer to gross domestic product (GDP, "Brutto-Inlandsprodukt", BIP)
• Which economy was growing faster?

56. Problem: Different definitions
• The US BEA extrapolates the growth for each quarter to a full year
• The Statistisches Bundesamt does not
• Thus, the actual US growth factor during (from start to end of) this quarter was only x, where x^4 = 1.072
  – x = 1.0175, i.e., US growth was only 1.75% in this quarter
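The conversion in both directions, as a Python sketch (0.072 is the BEA figure, 0.02 the fictitious German one): de-annualizing the US value gives the quarterly 1.75% from the slide, and annualizing the German 2% the BEA way gives about 8.24%, which would be higher than the US 7.2%.

```python
annualized_us = 0.072   # BEA figure: quarterly growth extrapolated to a full year
quarterly_de = 0.02     # fictitious German quarterly figure from the slide

# De-annualize: the quarterly growth factor x satisfies x**4 = 1.072.
quarterly_us = (1 + annualized_us) ** 0.25 - 1
# Or annualize the German number the same way the BEA does.
annualized_de = (1 + quarterly_de) ** 4 - 1

print(f"US quarterly growth:       {quarterly_us:.2%}")    # 1.75%
print(f"German growth, annualized: {annualized_de:.2%}")   # 8.24%
```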

57. Example 7: Unemployment rate (D vs. USA)
• (Source: DIE ZEIT 2004-02-05, p. 23: "Rot-weiß-blaues Zahlenwunder")
• 2003-11: USA: 5.9%, D: 10.5%
• Which country had the higher unemployment rate?
• What does the number mean?
  – D: registered as unemployed at the Arbeitsamt (employment office)
  – USA: telephone-based micro-census by the Bureau of Labor Statistics (BLS):
    • 1. Are you without work? (less than 1 hour last week)
    • 2. Are you actively searching for work?
    • 3. Could you start on a new job within 14 days?
    • Only people with 3x "yes" qualify as unemployed
  – A similar census is performed by the Statistisches Bundesamt
    • Result: 9.3% unemployed (rather than 10.5%), called "erwerbslos" (as opposed to "arbeitslos")
    • Because people are more honest on the telephone
    • But the rules are still not quite the same…

58. Unemployment rate (continued)
• USA: The census ignores
  – people who read job ads, but do not search actively
  – people who do not believe they can find a job
    • counting them would increase the rate by 0.5%
  – 15-year-olds (who are unemployed very frequently)
• D: All these are included in the numbers
• Furthermore: People disappear from the statistic
  – USA: 760 of every 100,000 people are in prison (as of 2003). That decreases the rate by 0.75%
  – D: 80 of every 100,000. Decreases the rate by 0.08%
  – D: Some people are "parked" in ABM (Arbeitsbeschaffungsmaßnahmen, government job-creation schemes)
  – And more effects (in both countries)
• The overall result is hard to say

59. Lesson: Demand precise definitions
• Just because two numbers have the same name does not mean they are equivalent
  – in particular if they come from different contexts
• If no precise definitions of terms are available, only very large differences can be trusted

60. Example 8: Productivity
• Steve Walters on comp.software-eng (early 1990s):
• "We just finished a software development project and discovered some curious metrics. This was a project in which we had good domain experience and about six years of metrics, both team productivity and other analogous software of similar scope and functionality.
• The difference with this project was that we switched from a functional design methodology to OO.
• First the good news: the overall team productivity (SLOC/person month) was almost three times our previous rate.
• Now for the bad news: the delivered SLOC was almost three times greater than estimated, based on the metrics from our previous projects."

61. Lesson: Precise measurements can be invalid
• Often a statistic is used for a purpose that it does not exactly fit.
  – Perhaps nothing better is realistically possible
• But even if the numbers themselves are correct and precise, the conclusions may be totally wrong.
  – It is not sufficient that statistics are correct when at the same time they are inappropriate
    • Here: SLOC/person-month has low construct validity for measuring productivity
• Such proxy measurements are very common.
  – Beware!

62. Real-world example: 25-fold reliability
- Advertising slogan: "Warum billigere Tintenpatronen verwenden, wenn Original HP Tinten bis zu 25-mal zuverlässiger sind?"
- Translation: "Why use cheaper ink cartridges when genuine HP ink is up to 25 times more reliable?"
Lutz Prechelt, prechelt@inf.fu-berlin.de

63. 25-fold reliability explanation
Color cartridges:
- DOA: dead on arrival (< 10 pages usable capacity)
- PF: premature failure (< 75% of avg. non-DOA yield)
- HU: high unusable (> 10% of pages with low quality)

64. 25-fold reliability explanation (2)
[Figure: percentage of PF cartridges (less than 75% of the avg. capacity of all cartridges) per brand; y-axis: percent (0 to 50), x-axis: size (0 to 120)]

65. 25-fold reliability explanation (3)
More problems with this data:
- 52/120 = 43% is what they used
- 52/103 = 50% is right if PF excludes DOA (as claimed)
- (52 - 17)/103 = 34% is right if PF includes DOA
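
The three ratios can be recomputed directly. The sketch assumes, as the numbers suggest but the slide does not state explicitly, that 120 is the total number of cartridges, 103 the number of non-DOA cartridges, and 17 = 120 - 103 the number of DOA cartridges.

```python
# Counts from the slide; reading 17 = 120 - 103 as the DOA count is an assumption.
total = 120
non_doa = 103
doa = total - non_doa  # 17
pf = 52


def pct(part: int, whole: int) -> int:
    """Percentage, rounded to a whole number."""
    return round(100 * part / whole)


value_used = pct(pf, total)                # denominator wrongly includes DOA cartridges
pf_excludes_doa = pct(pf, non_doa)         # PF count already free of DOA cartridges
pf_includes_doa = pct(pf - doa, non_doa)   # remove DOA cartridges from the PF count first
print(value_used, pf_excludes_doa, pf_includes_doa)  # 43 50 34
```

Depending on an unstated definition, the same data gives anything from 34% to 50%.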

66. Summary
- When confronted with data or conclusions from data, one should always ask:
  - Can they possibly know this? How?
  - What do they really mean?
  - Is the purported reason the real reason?
  - Are the samples and measures unbiased and appropriate?
  - Are the measures well-defined and valid?
  - Are measures or visualizations misleading?
  - Has something important been left out?
  - Are there any inconsistencies (contradictions)?
- When we collect and prepare data, we should
  - work thoroughly and carefully
  - and avoid distortions of any kind

67. Will Rogers phenomenon (1)
- Revenues per salesman of company HuiSoft for two consecutive years, in k€:
  - 2010: Bielefeld 5000, 6000, 7000 (µ = 6000); München 5000, 10000, 15000, 20000 (µ = 12500)
  - 2011: Bielefeld 5000, 6000, 7000, 10000 (µ = 7000, +16.7%); München 5000, 15000, 20000 (µ = 13333, +6.7%)
- No increase in the totals (seven salesmen and 68,000 k€ in both years)
- Just one employee moved from München to Bielefeld
- Yet revenue per salesman increased at both POPs!
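
A short Python check of the HuiSoft example, using the revenue figures from the table. The salesman who moves (10,000 k€) is below the München mean but above the Bielefeld mean, so removing him raises the München mean and adding him raises the Bielefeld mean, while total revenue and headcount stay the same.

```python
from statistics import mean

# Revenues per salesman in k€, from the slide.
before = {"Bielefeld": [5000, 6000, 7000], "München": [5000, 10000, 15000, 20000]}
# 2011: the salesman with 10000 k€ has moved from München to Bielefeld.
after = {"Bielefeld": [5000, 6000, 7000, 10000], "München": [5000, 15000, 20000]}

for site in before:
    m0, m1 = mean(before[site]), mean(after[site])
    print(f"{site}: {m0:.0f} -> {m1:.0f} ({100 * (m1 / m0 - 1):+.1f}%)")

total_before = sum(sum(v) for v in before.values())
total_after = sum(sum(v) for v in after.values())
print(total_before, total_after)  # 68000 68000
```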

68. Will Rogers phenomenon (2)
- Will Rogers (1879–1935), American comedian and philosopher
- Named after one of his jokes (here in a German version):
  Question: "If the 10% dumbest Saarlanders move to Rhineland-Palatinate, what happens?"
  Answer: "The IQ rises in both states."
  (originally with Oklahomans and Californians …)
- Lesson:
  - Will Rogers phenomena are ubiquitous,
  - yet can be difficult to spot
  - …even for the authors themselves!
- Warning: it's a double-edged sword. Sometimes looking at the details is better, sometimes looking at the aggregated numbers makes more sense (as in the sales example)

69. Simpson's Paradox (1)
- Universität Eschweilerhof discriminates against female students!
- Let's see which faculties are the most sexist ones (accepted of applications, acceptance rate):
  - Engineering: female 8 of 10 (80%), male 50 of 80 (63%)
  - CS: female 4 of 5 (80%), male 40 of 60 (67%)
  - Philosophy: female 20 of 80 (25%), male 10 of 40 (25%)
  - Law: female 15 of 30 (50%), male 10 of 40 (25%)
  - Total (significant numbers): female 47 of 125 (37.6%), male 110 of 220 (50.0%)
- None of them!? How can that be?
- Women mostly applied to faculties with more competition (lower acceptance rates)
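
The paradox can be reproduced with the application figures from the table. The sketch computes acceptance rates per faculty and then pooled over all faculties: in every faculty women do at least as well as men, yet the pooled rate is lower for women, because 110 of their 125 applications go to Philosophy and Law, the two faculties with the lowest overall acceptance rates.

```python
# (applied, accepted) per faculty and sex, from the slide's table.
data = {
    "Engineering": {"female": (10, 8), "male": (80, 50)},
    "CS": {"female": (5, 4), "male": (60, 40)},
    "Philosophy": {"female": (80, 20), "male": (40, 10)},
    "Law": {"female": (30, 15), "male": (40, 10)},
}


def rate(applied: int, accepted: int) -> float:
    """Acceptance rate in percent."""
    return 100 * accepted / applied


# Disaggregated view: one rate per faculty.
for faculty, groups in data.items():
    f, m = rate(*groups["female"]), rate(*groups["male"])
    print(f"{faculty:12} female {f:5.1f}%  male {m:5.1f}%")

# Aggregated view: pool all applications per sex before dividing.
totals = {}
for sex in ("female", "male"):
    applied = sum(g[sex][0] for g in data.values())
    accepted = sum(g[sex][1] for g in data.values())
    totals[sex] = rate(applied, accepted)
print(f"{'Total':12} female {totals['female']:5.1f}%  male {totals['male']:5.1f}%")
```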

70. Simpson's Paradox (2)
- So who is right? Should the university be punished?
  - The women's rights activists? After all, 37.6% vs. 50% is significant, and dividing the total number into faculties simply introduces a bias into the picture.
  - The university? After all, not a single faculty actually discriminates against women (in fact, most discriminate against men).
- Answer: in this case, the university is right
  - A student applies to a specific faculty that he or she chooses
  - A student does not apply to the university as a whole and let the university choose the faculty
- Lesson:
  - Simpson's paradox is more ubiquitous than you would think, yet can be difficult to spot …even for the authors themselves!
- Warning: it's a double-edged sword. Sometimes looking at the details makes more sense (as in this case), sometimes looking at the aggregated numbers is better.

71. Simpson's Paradox (3)

72. Philosophical / meta-aspects

73. Problem: Skewed/leptokurtic distributions are not made for man (1)
- In the Stone Age, man was surrounded mainly by more or less normally distributed (and hence symmetric) random variables: sizes of people, pregnancy durations, food consumption, etc.
  - Once you've seen a few samples, you get the picture
  - Outliers are rare
  - Outliers barely affect the mean (e.g., the average weight is 80 kg, the fattest man on earth weighs 400 kg)
[Figure: normal distribution; 99% of all values lie between the red bars]
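
How little one extreme value moves the mean of a symmetric quantity, in a tiny sketch. It assumes, unrealistically but for simplicity, 1,000 people who all weigh exactly the slide's average of 80 kg, and then adds the 400 kg outlier.

```python
# Simplifying assumption: 1,000 people who all weigh exactly 80 kg.
weights = [80.0] * 1000
mean_before = sum(weights) / len(weights)

weights.append(400.0)  # add the 400 kg outlier
mean_after = sum(weights) / len(weights)

print(mean_before, round(mean_after, 2))  # 80.0 80.32
```

The heaviest man on earth shifts the average of a thousand people by about a third of a kilogram.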

74. Problem: Skewed/leptokurtic distributions are not made for man (2)
- Today, man is surrounded by skewed distributions with high kurtosis (leptokurtic), e.g., income (log-normal/Pareto), earthquakes (Pareto), popularities (Zipf), …
  - Outliers like Dr. Waldner are comparatively common, but you need more than just "a few" samples to see them
  - Outliers like Dr. Waldner do strongly affect the mean!
[Figure: 90% of all values lie right of the red bar; the median lies much further to the right, the mean further still]
- Lesson: Ask: is it a skewed, leptokurtic distribution?
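
To see the contrast with the symmetric case, the sketch below draws samples from a normal and from a Pareto distribution with Python's `random` module. The parameters (μ = 80, σ = 10 for the normal case; shape α = 1.16, the value often associated with the 80/20 rule, and minimum 1 for the Pareto case) are assumptions chosen for illustration. For the normal sample, mean and median nearly coincide; for the Pareto sample, the mean lies well above the median and most values fall below the mean.

```python
import random
from statistics import mean, median

random.seed(1)
n = 100_000

# Symmetric case: normal distribution with assumed mu = 80, sigma = 10.
normal = [random.gauss(80, 10) for _ in range(n)]
# Skewed, heavy-tailed case: Pareto with assumed shape alpha = 1.16 (minimum value 1).
pareto = [random.paretovariate(1.16) for _ in range(n)]

for name, xs in (("normal", normal), ("pareto", pareto)):
    m, med = mean(xs), median(xs)
    share_below_mean = sum(x < m for x in xs) / n
    print(f"{name}: mean={m:.2f} median={med:.2f} below mean={share_below_mean:.0%}")
```

Rerunning with other seeds shows the second point of the slide as well: the Pareto sample mean, dominated by a few huge values, jumps around far more than the median does.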

75. Catastrophe probabilities
- Some (fictitious!) statements:
  - The probability that nuclear power plant X suffers a catastrophic accident is less than 10⁻¹⁰ per year
  - The probability that the AFDX avionics network in an aircraft fails is less than 10⁻¹¹ per hour of operation
  - The probability that Rigel will burst into a supernova is less than 10⁻⁷ during the next thousand years
  - The probability of an eruption of the Laacher See volcano in the Eifel region is less than 10⁻⁸ during the next hundred years
- What do they have in common? (apart from being made up)
  - A [catastrophic] high-impact event…
  - …with an extremely low probability

76. Low probabilities, high stakes
- On what grounds do these probabilities hold? Only if all of the following are true:
  - The underlying theory is correct
  - The underlying theory is applicable to the case being considered
  - The case being considered is really the general case, not a hidden special case
  - The confidence level for the result (if applicable) also shows a very high probability that the result is correct
  - The system under consideration has been correctly transformed into a correct theoretical model
  - The measurement data used to parameterize/calibrate the theoretical model has been measured correctly
  - The software that analyses the theoretical model (e.g., simulation, numerical analysis, …) has been correctly implemented
  - The hardware that executes the model software does not introduce errors (FDIV bug; RAM contents altered by α particles from radioactive decay; …)
- If just one condition fails, the entire probability calculation is flawed!
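
The compounding effect of this list can be made concrete. In the sketch, both the per-condition probability of 0.9999 and the independence of the eight conditions are assumptions for illustration; even under these generous values, the chance that at least one condition fails is about 8 × 10⁻⁴, far above claimed figures such as 10⁻¹⁰.

```python
# Hypothetical: each of the eight conditions holds with probability 0.9999,
# independently of the others (both are assumptions, not data).
p_condition_holds = 0.9999
n_conditions = 8

p_all_hold = p_condition_holds ** n_conditions
p_flawed = 1 - p_all_hold  # at least one condition fails
print(f"P(calculation flawed) = {p_flawed:.1e}")
```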

77. Low probabilities, high stakes
[Figure: claim vs. reality. The claim only distinguishes "everything alright" from "catastrophe occurs"; in reality there is a third possibility: "don't know, because the calculations are flawed".]

78. Low probabilities, high stakes
- Estimated probability that a scientific claim is flawed?
  - About 10⁻⁴, according to the paper below
  - Mileage will vary: some fields are more rigorous, some less
- Consequences
  - Let's not take any risks!? No LHC, no SETI, no biotech, no ITER, no nothing? Should we live in caves!?
  - Have we become too risk-averse?
- More information in this very readable paper: Ord, T., Hillerbrand, R., Sandberg, A.: Probing the improbable: Methodological challenges for risks with low probabilities and high stakes. Journal of Risk Research, 2010
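
The consequence can be written with the law of total probability, the decomposition used in the Ord et al. paper: P(X) = P(X | argument sound) · P(sound) + P(X | argument flawed) · P(flawed). The sketch uses the fictitious 10⁻¹⁰ claim and the 10⁻⁴ flaw estimate from the slides; P(X | flawed) = 10⁻³ is purely hypothetical.

```python
# X = the catastrophe; the argument behind the claimed figure may be sound or flawed.
p_claimed = 1e-10        # P(X | argument sound): the fictitious claimed figure
p_flawed = 1e-4          # P(argument flawed): rough estimate quoted on the slide
p_x_given_flawed = 1e-3  # P(X | argument flawed): purely hypothetical

p_x = p_claimed * (1 - p_flawed) + p_x_given_flawed * p_flawed
print(f"P(X) = {p_x:.2e}, about {p_x / p_claimed:.0f} times the claimed value")
```

The second term dominates: the realistic risk is set by the chance that the analysis is wrong, not by the claimed number.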

79. Lessons
- For authors:
  - Know your boundaries
  - Clearly state your assumptions
  - Clearly warn about the possibility that assumptions may not hold in reality
- For readers:
  - Double-check the assumptions
  - Ask for second, third, … opinions, preferably using completely different methods

80. Risk aversion: How we lie to ourselves
- Do mobile phones cause cancer?
  - Very little evidence; long-term studies were needed
  - Result:
    - Possibly causes cancer
    - Only for people who use them for many hours per week
    - Still a very low incidence rate
- But many people try to get rid of base stations in their neighbourhood
  - "Well, it is just in case; you never know if there is something to those allegations"
- How often is calling an ambulance/the fire brigade via a mobile phone significantly faster than running to the nearest landline phone?
  - How many "non-casualties" this way per year?

81. Risk aversion: How we lie to ourselves
- Do cars and motorcycles cause deaths? Yes, and very much so:
  - About 4,000 traffic deaths in Germany per year (p.a.)
  - About 80,000,000 inhabitants in Germany
  - Roughly 800,000 people die in Germany p.a.
  - Share: about 0.5% of all deaths are caused by traffic accidents!
- That's just the deaths. We are ignoring other serious consequences such as mutilations, month-long recovery treatments, psychological trauma, financial losses, etc.
- Compare: what percentage of all deaths in Germany is directly or indirectly linked to mobile phones p.a.?
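
The share follows directly from the rounded figures on the slide. The sketch also expresses the same deaths per million inhabitants, which shows how much the choice of denominator changes the impression a number makes.

```python
# Rounded figures for Germany, per year, from the slide.
traffic_deaths = 4_000
inhabitants = 80_000_000
all_deaths = 800_000

share_of_deaths = traffic_deaths / all_deaths               # fraction of all deaths
deaths_per_million = traffic_deaths / inhabitants * 1_000_000  # per million inhabitants

print(f"{share_of_deaths:.1%} of all deaths")             # 0.5% of all deaths
print(f"{deaths_per_million:.0f} per million inhabitants")  # 50 per million inhabitants
```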

82. Risk aversion: How we lie to ourselves
- Reproduction is fun! (if done on purpose…)
- But what about the risks?
  - Maternal mortality during labour: 80 ppm = almost 0.1‰
  - Risk that the child has a chromosomal aberration (trisomy 21/Down syndrome, cri du chat, trisomy 18, trisomy 13, etc.): about 1/160 = 0.63%
- Would you get into a car if the risk of having a serious accident (fatal or heavy injuries) were 0.63% per…
  - journey?
  - 100 km?
  - 10,000 km?
  - car lifetime?
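
Why the unit in the last question matters: with a risk p per journey and independent journeys, the probability of at least one accident in n journeys is 1 - (1 - p)ⁿ. The sketch plugs in p = 1/160 (the 0.63% from the slide) purely as a hypothetical per-journey risk; independence between journeys is also an assumption.

```python
# Hypothetical per-journey accident risk, borrowed from the slide's 1/160.
p_per_journey = 1 / 160


def p_at_least_one(p: float, n: int) -> float:
    """Probability of at least one event in n independent trials with probability p each."""
    return 1 - (1 - p) ** n


for n in (1, 10, 100, 1000):
    print(f"{n:5d} journeys: {p_at_least_one(p_per_journey, n):.1%}")
```

The same 0.63% is a modest risk per car lifetime but a near-certainty when incurred on every journey.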
