How to lie with statistics

Course "Empirical Evaluation in Informatics"
How to lie with statistics
Prof. Dr. Lutz Prechelt
Freie Universität Berlin, Institut für Informatik
http://www.inf.fu-berlin.de/inst/ag-se/
• What do they mean?
• Biased measures
• Biased samples
• What is the real reason?
• Misleading averages
• Misleading visualizations
• Pseudo-precision
• Plain false statements
• What is not being said?
• "Just try again"
• Incomparable measures
• Invalid measures
Lutz Prechelt, prechelt@inf.fu-berlin.de

German title slide: "Empirische Bewertung in der Informatik" (Empirical Evaluation in Informatics), "Wie man mit Statistik lügt" (How to lie with statistics)
• What is actually meant?
• Is the measure used biased?
• Is the sample selection biased?
• Is that really the reason?
• Misleading averages
• Misleading presentations
• Pseudo-precision
• Plain false statements
• What is not being said?
• "Just try again"
• Incomparable data
• Validity of measures

Source
• This slide set is based on ideas from Darrell Huff: "How to Lie With Statistics" (Victor Gollancz 1954, Pelican Books 1973, Penguin Books 1991)
  • but the slides use different examples
• I urge everyone to read this book in full
  • It is short (120 p.), entertaining, and insightful
  • Many different editions are available
• Other, similar books exist as well

Example: Human Growth Hormone (HGH)
[Figure: original spam email, received 2004-02]

Remark
• We use this real spam email as an arbitrary example
  • and will make unwarranted assumptions about what is behind it
  • for illustrative purposes
• I do not claim that HGH treatment is useful, useless, or harmful
Note:
• HGH is on the IOC doping list
• http://www.dshs-koeln.de/biochemie/rubriken/01_doping/06.html
  • "Currently, only two major conditions are candidates for the therapeutic use of HGH: dwarfism in children and HGH deficiency in adults"
  • "However, the effectiveness of HGH in athletes must so far be strongly questioned, since no scientific study has yet been able to show that additional HGH administration in persons with normal HGH production can lead to performance gains."

Problem 1: What do they mean?
• "Body fat loss: up to 82%"
  • OK, can be measured
• "Wrinkle reduction: up to 61%"
  • Maybe they count the wrinkles and measure their depth?
• "Energy level: up to 84%"
  • What is this?
• Also note they use language loosely:
  • Loss in percent: OK; reduction in percent: OK
  • Level in percent??? (should be 'increase')

Lesson: Dare ask what
• Always question the definition of the measures for which somebody gives you statistics
• Surprisingly often, there is no stringent definition at all
• Or multiple different definitions are used
  • and incomparable data get mixed
• Or the definition has dubious value
  • e.g. "Energy level" may be a subjective estimate by patients who knew they were treated with a "wonder drug"

Problem 2: A maximum does not say much
• Wrinkle reduction: up to 61%
  • So that was the best value. What about the rest?
• Maybe the distribution was like this:
[Figure: strip plot of individual values (o) with a marker M; x-axis: reduction, 0 to 60]

Lesson: Dare ask for unbiased measures
• Always ask for neutral, informative measures
  • in particular when talking to a party with a vested interest
• Extremes are rarely useful for showing that something is generally large (or small)
• Averages are better
  • But even averages can be very misleading
  • see the example later in this presentation
• If the shape of the distribution is unknown, we need summary information about variability at the very least
  • e.g. the data from the plot in the previous slide has arithmetic mean 10 and standard deviation 8
• Note: In different situations, rather different kinds of information might be required for judging something
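
To make the contrast between an extreme and a neutral summary concrete, here is a minimal Python sketch. The values in `reductions` are invented for illustration and are not the data behind the slide's plot; the helper name `summarize` is likewise hypothetical.

```python
from statistics import mean, median, stdev

# Hypothetical wrinkle reductions in percent (invented, not the slide's data).
reductions = [2, 3, 4, 5, 5, 6, 7, 8, 8, 9, 10, 10, 11, 12, 14, 15, 61]

def summarize(values):
    """Return the extreme next to more neutral summary measures."""
    return {
        "max": max(values),        # the "up to ..." figure an advertiser would quote
        "mean": mean(values),      # arithmetic mean
        "median": median(values),  # middle value, robust against single extremes
        "stdev": stdev(values),    # sample standard deviation (variability)
    }

summary = summarize(reductions)
for name, value in summary.items():
    print(f"{name:>6}: {value:.1f}")
```

With data of this shape, the maximum describes one person, while the mean, median, and standard deviation describe the group as a whole.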

Problem 3: Underlying population
• Wrinkle reduction: up to 61%
• Maybe they measured a very special set of people?
[Figure: strip plots for two groups, "heartAttack" and "healthy", each with a marker M; x-axis: reduction, -20 to 60]

Lesson: Insist on unbiased samples
• How and where the data were collected can have a tremendous impact on the results
  • It is important to understand whether there is a certain (possibly intended) tendency in this
• A fair statistic talks about possible bias it contains
  • If it does not, ask.
Notes:
• A biased sample may be the best one can get
• Sometimes we can suspect that there is a bias, but cannot be sure
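
The effect of sample selection can be sketched in a few lines of Python. Everything here is assumed for illustration: the group labels `"special"` and `"other"`, the values, and the helper `mean_reduction` are hypothetical and do not come from the slides.

```python
from statistics import mean

# Hypothetical records: (group label, wrinkle reduction in percent). Invented values.
records = (
    [("special", r) for r in (20, 25, 30, 35, 61)]
    + [("other", r) for r in (-5, 0, 0, 2, 3, 5, 5, 8, 10, 12)]
)

def mean_reduction(records, group=None):
    """Mean reduction over all records, or over one group if a label is given."""
    values = [r for g, r in records if group is None or g == group]
    return mean(values)

print("whole population:", round(mean_reduction(records), 1))
print("special group only:", round(mean_reduction(records, "special"), 1))
```

Reporting only the second number without saying who was measured is exactly the kind of unstated bias the lesson warns about.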

Problem 4: Is HGH even part of the cause?
• Wrinkle reduction: up to 61%
• Maybe that could happen even without HGH?
[Figure: strip plots for three groups, "heartAttack", "healthy", and "h.A.,noHGH", each with a marker M; x-axis: reduction, -20 to 60]

Lesson: Question causality
• Sometimes the data is not just biased, it contains hardly anything but bias
• If somebody presents you with a presumably causal relationship ("A causes B"), ask yourself:
  • What other influences besides A may be important?
  • What is the relative weight of A compared to these?
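
One way to ask about the relative weight of A is to compare against a group that differs only in A. The sketch below assumes such a comparison group exists; the values and the helper `attributable_effect` are hypothetical, invented for illustration.

```python
from statistics import mean

# Hypothetical wrinkle reductions in percent for two otherwise similar groups.
treated = [18, 22, 25, 30, 35]  # received the treatment (invented values)
control = [15, 20, 24, 28, 33]  # same kind of people, no treatment (invented values)

def attributable_effect(treated, control):
    """Difference of group means: the part of the change not seen without treatment."""
    return mean(treated) - mean(control)

print("mean with treatment:   ", mean(treated))
print("mean without treatment:", mean(control))
print("difference:            ", attributable_effect(treated, control))
```

In this invented data most of the reduction appears without the treatment as well, so quoting the treated group's figures alone would overstate the role of A. A real analysis would also need to check whether such a small difference could be due to chance.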

Example 2: Tungu and Bulugu
• We look at the yearly per-capita income in two small hypothetical island states: Tungu and Bulugu
• Statement: "The average yearly income in Tungu is 94.3% higher than in Bulugu."

Problem 1: Misleading averages
• The island states are rather small: 81 people in Tungu and 80 in Bulugu
• And the income distribution is not as even in Tungu:
[Figure: strip plots for "Tungu" and "Bulugu", each with a marker M; x-axis: income, 0 to 5000]

Misleading averages and outliers
• The only reason for the higher average is Dr. Waldner, owner of a small software company in Berlin, who has been enjoying his retirement in Tungu since last year
[Figure: strip plots for "Tungu" and "Bulugu", each with a marker M; x-axis: income on a logarithmic scale, 10^3.0 to 10^5.0]
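
How a single outlier can dominate an average is easy to reproduce. The following Python sketch uses invented incomes, not the slide's data, so it does not reproduce the 94.3% figure; the population sizes (80 and 81) follow the slides, while the income values and the helper `percent_higher` are assumptions.

```python
from statistics import mean, median

# Hypothetical yearly incomes, invented for illustration.
bulugu = [500 + 25 * i for i in range(80)]  # 80 residents, evenly spread incomes
tungu = bulugu + [120_000]                  # the same 80 incomes plus one very rich resident

def percent_higher(a, b):
    """Return by how many percent a exceeds b."""
    return (a / b - 1) * 100

mean_gap = percent_higher(mean(tungu), mean(bulugu))
median_gap = percent_higher(median(tungu), median(bulugu))

print(f"mean income:   {mean_gap:.1f}% higher in Tungu")
print(f"median income: {median_gap:.1f}% higher in Tungu")
```

The mean nearly doubles because of one person, while the median barely moves. This is why a statement about "average income" should say which average is meant.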
