Statistical testing in the era of big data
(p < 0.05)
Dimitri Van De Ville
MIP:lab
IBI-STI/CNP (EPFL) RADIO (UniGE)
http://miplab.epfl.ch/ @dvdevill #CNP Retreat Feb 11-12, 2020
Dimitri Van De Ville CNP Retreat 2020 — Stats Workshop 2
p<0.05
▪ Contradictory tendencies
  ▪ Many (emotive) reports about the p-value crisis
  ▪ Reviewers even pickier on statistical significance: sufficient power, multiple comparisons, replication, …
  ▪ Adage: never enough data
  ▪ Big data has arrived, and will become bigger
▪ Is classical hypothesis testing doomed? Should we all switch to Bayesian statistics? Are machine-learning approaches the only solution?
▪ Here, we revisit basic statistical hypothesis testing
  ▪ to understand the core issue
  ▪ to solve it within the conventional framework
▪ Consider N samples modeled to reflect a true effect μ with a random Gaussian* deviation: xₙ = μ + εₙ, with εₙ ~ 𝒩(0, σ²)
▪ Estimator of μ is the sample average x̄
▪ Estimator of the uncertainty on x̄ is the standard error s/√N, with s the sample standard deviation
▪ We define the test statistic t = √N · x̄ / s
▪ Question: is there evidence from the data that the underlying effect μ is non-zero?
* Popularity of Gaussian hypothesis? Central limit theorem!
▪ Null hypothesis H₀: no effect, μ = 0
▪ (Implicit) alternative hypothesis H₁: μ ≠ 0
▪ Under the null, t follows a known distribution (Student t-distribution with N − 1 degrees of freedom)
▪ p-value: probability, under H₀, of a test statistic at least as extreme as the observed t
▪ Result is considered significant if p < 0.05
“If one in twenty does not seem high enough odds, we may, if we prefer it, draw the line at one in fifty or one in a hundred. Personally, the writer prefers to set a low standard of significance at the 5 per cent point, and ignore entirely all results which fail to reach this level.”
— R.A. Fisher, “The arrangement of field experiments”, Journal of the Ministry of Agriculture of Great Britain, 33:503–513, 1926
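As a minimal numerical sketch of this test (the simulated data and names are illustrative, not from the slides):

```python
import numpy as np
from scipy import stats

def one_sample_t_test(x, alpha=0.05):
    """Two-sided one-sample t-test of H0: mu = 0."""
    x = np.asarray(x, dtype=float)
    N = x.size
    t_stat = np.sqrt(N) * x.mean() / x.std(ddof=1)  # t = sqrt(N) * xbar / s
    # Under H0, t follows a Student t-distribution with N - 1 dof
    p = 2 * stats.t.sf(abs(t_stat), df=N - 1)
    return t_stat, p, bool(p < alpha)

rng = np.random.default_rng(0)
x = 0.8 + rng.standard_normal(50)  # simulated effect mu = 0.8, sigma = 1
t_stat, p, significant = one_sample_t_test(x)
```

The same statistic and p-value are returned by `scipy.stats.ttest_1samp(x, 0.0)`.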
▪ Thus, the p-value indicates the probability of a false positive (FP)
▪ Typically, no explicit alternative H₁: μ = μ₁ is specified
▪ No control of false negatives (FN); i.e., the FN rate β is unknown
▪ One can only control specificity (1 − FP rate), not sensitivity (1 − FN rate)
▪ No proof of no effect, because there is no point of comparison
▪ Any true effect μ ≠ 0 can become significant for sufficiently large N
▪ “[N] must be big enough that an effect of such magnitude as to be of scientific significance will also be statistically significant. It is just as important, however, that the study not be too big, where an effect of little scientific importance is nevertheless statistically detectable”
▪ As N increases, discriminability, as measured by classification accuracy of individual samples, becomes very small
▪ As N increases, consistency, as measured by population prevalence of the effect, becomes very small
[Lenth, 2001]
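A quick simulation illustrates the point (the trivial effect size and the sample sizes are illustrative): the same trivial effect that a modest study cannot distinguish from noise becomes overwhelmingly significant at very large N.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
d = 0.05  # trivial true effect, in units of the noise standard deviation

# Same trivial effect, two sample sizes
p_small = stats.ttest_1samp(d + rng.standard_normal(100), 0.0).pvalue
p_big = stats.ttest_1samp(d + rng.standard_normal(100_000), 0.0).pvalue
# With N = 100,000 the trivial effect is detected with overwhelming confidence
```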
▪ Bottom line: p-values are relevant only if the effect size is non-trivial!
▪ Standardized effect size: Cohen's d = μ/σ, estimated as d̂ = x̄/s
▪ “… one should be cautious that extremely large studies may be more likely to find a formally statistical significant difference for a trivial effect that is not really meaningfully different from the null.” (Ioannidis, 2005)
[Friston, NeuroImage, 2012]
Effect size | Cohen's d   | Coefficient of determination R² | Correlation | Classification accuracy | Population prevalence
Large       | ~1          | ~1/2 = 0.50                     | ~0.71       | ~70%                    | ~50%
Medium      | ~1/2 = 0.50 | ~1/5 = 0.20                     | ~0.45       | ~60%                    | ~20%
Small       | ~1/4 = 0.25 | ~1/17 = 0.06                    | ~0.24       | ~55%                    | ~6%
Trivial     | ~1/8 = 0.13 | ~1/65 = 0.02                    | ~0.12       | ~52.5%                  | ~1%
None        | 0           | 0                               | 0           | 50%                     | 0%
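The columns of this table can be linked back to Cohen's d; the conversions below (correlation r = d/√(d² + 1), hence R² = d²/(d² + 1), and ideal two-class accuracy Φ(d/2)) are my reconstruction of how the table was likely generated, not formulas stated on the slide:

```python
import math

def d_to_r(d):
    """Correlation for Cohen's d (assumed convention r = d / sqrt(d^2 + 1))."""
    return d / math.sqrt(d**2 + 1)

def d_to_accuracy(d):
    """Accuracy of the ideal threshold classifier between N(0,1) and N(d,1): Phi(d/2)."""
    return 0.5 * (1 + math.erf(d / (2 * math.sqrt(2))))

for label, d in [("Large", 1.0), ("Medium", 0.5), ("Small", 0.25), ("Trivial", 0.125)]:
    print(f"{label:8s} d={d:5.3f}  r={d_to_r(d):.2f}  R2={d_to_r(d)**2:.2f}  "
          f"accuracy={100 * d_to_accuracy(d):.1f}%")
```

These assumed conversions reproduce the table's correlation, R², and accuracy columns to the stated precision.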
▪ Consider now a fixed specificity 1 − α; the threshold u(α) then satisfies
  α = ∫_{u(α)}^{∞} T(t; N − 1, 0) dt,
  where T(t; N − 1, 0) is the (central) Student t-distribution
[Friston, NeuroImage, 2012]
▪ Consider now a fixed specificity 1 − α; the threshold u(α) satisfies
  α = ∫_{u(α)}^{∞} T(t; N − 1, 0) dt
▪ Under the assumption of a true effect size d, we can compute the sensitivity as
  1 − β(d) = ∫_{u(α)}^{∞} T(t; N − 1, d√N) dt,
  where T(t; N − 1, d√N) is the non-central t-distribution with N − 1 degrees of freedom and non-centrality parameter d√N
▪ Sensitivity depends on sample size (N) and effect size (d)
[Friston, NeuroImage, 2012]
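A sketch of this sensitivity computation using scipy's non-central t distribution (`stats.nct`); the function name is illustrative:

```python
import numpy as np
from scipy import stats

def sensitivity(d, N, alpha=0.05):
    """One-sided power 1 - beta(d): upper-tail mass above u(alpha) of the
    non-central t-distribution with N-1 dof and non-centrality d*sqrt(N)."""
    df = N - 1
    u = stats.t.ppf(1 - alpha, df)  # threshold u(alpha), set under the null
    return stats.nct.sf(u, df, d * np.sqrt(N))
```

For example, `sensitivity(0.5, 30)` is roughly 0.85, and for any d > 0 the sensitivity approaches 1 as N grows.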
▪ Sensitivity depends on sample size (N) and effect size (d)
▪ A significant effect with a small sample size is likely to be caused by a large effect size!
▪ If you are criticized in this way: “The fact that we have demonstrated a significant result in a relatively under-powered study suggests that the effect size is large. This means, quantitatively, […] used a larger sample-size.” = conflation of significance and power
[Friston, NeuroImage, 2012]
▪ Sensitivity depends on sample size (N) and effect size (d)
▪ Sensitivity to trivial effect sizes increases with sample size!
▪ Ultimately, with very large sample sizes, sensitivity will reach 100% for every non-null effect size
▪ This explains a lot about the crisis!
▪ More is not better

[Figure: sensitivity vs. sample size]
[Friston, NeuroImage, 2012]
[Figures: expected loss vs. sample size]

▪ Let us define a simple loss function:
  ▪ a cost for detecting a trivial effect size of d = 1/8 [bad]
  ▪ a (negative) cost for detecting a large effect size of d = 1 [good]
▪ Expected loss: the cost-weighted sum of the two sensitivities
▪ Optimal sample size at minimal loss
▪ The optimum does not increase dramatically even if significance needs to be (much) stronger (e.g., due to multiple comparisons)
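Reading the slide as assigning cost +1 to detecting the trivial effect (d = 1/8) and −1 to detecting the large effect (d = 1) — the exact cost values are my assumption — the expected loss can be scanned over N:

```python
import numpy as np
from scipy import stats

def sensitivity(d, N, alpha=0.05):
    """One-sided power 1 - beta(d), as defined above."""
    df = N - 1
    u = stats.t.ppf(1 - alpha, df)
    return stats.nct.sf(u, df, d * np.sqrt(N))

def expected_loss(N, alpha=0.05):
    # +1 * P(detect trivial d = 1/8) - 1 * P(detect large d = 1)  [assumed costs]
    return sensitivity(1 / 8, N, alpha) - sensitivity(1.0, N, alpha)

Ns = np.arange(2, 101)
losses = np.array([expected_loss(N) for N in Ns])
N_opt = int(Ns[np.argmin(losses)])  # a moderate optimal sample size
```

The minimum sits at a modest N: beyond it, the gain in sensitivity to the large effect is outweighed by the growing sensitivity to the trivial one.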
▪ Inference is based on controlling the FP rate under H₀, which translates into a flat sensitivity at α for no effect:
  ▪ specificity = 1 − (sensitivity to null effects)
▪ So let us suppress sensitivity to trivial effects instead! The sensitivity is still
  1 − β(d) = ∫_{u(α)}^{∞} T(t; N − 1, d√N) dt,
  but this time the threshold u(α) is set via
  α(d′) = ∫_{u(α)}^{∞} T(t; N − 1, d′√N) dt, with d′ = 1/8
[Friston, NeuroImage, 2012]
[Figure: sensitivity and specificity vs. sample size]
▪ Protection fixes the sensitivity to trivial effects, and thus increasing N becomes harmless
▪ Concretely, the threshold to be applied to t-values is penalized (it grows with N)
[Figures: expected loss and t threshold vs. sample size, with and without protection]
[Friston, NeuroImage, 2012]
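The penalized threshold can be sketched by inverting the non-central t tail at d′ = 1/8 (using scipy's `nct.ppf`; the function name is illustrative):

```python
import numpy as np
from scipy import stats

def t_threshold(N, alpha=0.05, d_protect=0.0):
    """Threshold u whose upper-tail mass is alpha under non-centrality
    d_protect * sqrt(N); d_protect = 0 gives the classical threshold."""
    return stats.nct.ppf(1 - alpha, N - 1, d_protect * np.sqrt(N))

# Classical threshold is roughly flat in N; the protected threshold grows
# with N, so trivial effects are not picked up by large samples
u_classical = [t_threshold(N) for N in (10, 40, 100)]
u_protected = [t_threshold(N, d_protect=1 / 8) for N in (10, 40, 100)]
```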
▪ Consider N samples modeled to reflect a true effect μ with a random deviation of unknown, but symmetric, distribution: xₙ = μ + εₙ
▪ Estimator of μ is the average x̄ (could also be the median, etc.)
▪ Null hypothesis H₀: no effect, μ = 0
▪ In that case, we can randomly flip (permute) the signs of the xₙ and recompute our measure of interest under the null as x̄⁽ᵏ⁾, k = 1, …, K
▪ If x̄ exceeds the (1 − α)-quantile of the null values x̄⁽ᵏ⁾, then H₀ is rejected with significance α
▪ Use sufficiently many randomizations K to be able to assess 0.05 significance
▪ Fewer assumptions about the distribution, but essentially the same problem: trivial effects will be picked up as N increases
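A minimal sign-flip test along these lines (data and names are illustrative):

```python
import numpy as np

def sign_flip_test(x, K=9999, seed=0):
    """One-sided randomization test of H0: distribution symmetric around 0."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    observed = x.mean()
    # Null distribution: the same statistic after random sign flips
    null = np.array([(x * rng.choice([-1.0, 1.0], size=x.size)).mean()
                     for _ in range(K)])
    # +1 correction: the observed statistic counts as one randomization
    return (1 + np.sum(null >= observed)) / (K + 1)

rng = np.random.default_rng(1)
p = sign_flip_test(1.0 + rng.standard_normal(40))  # simulated true effect
```

With K = 9999 the smallest attainable p-value is 1/10000, comfortably below 0.05.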
▪ Inferential statistics; e.g., testing for the presence of a treatment effect
  ▪ The in-sample effect size is about the data at hand
  ▪ The in-sample effect size overestimates the true effect size, because some large test statistics can also be obtained by chance
▪ Estimation; e.g., predicting a treatment effect
  ▪ The out-of-sample effect size is an unbiased estimate
  ▪ However, the test is less efficient
▪ “Le beurre et l’argent du beurre” (you cannot have your cake and eat it too)
[Friston, NeuroImage, 2012]
▪ More data allows you to do more things
▪ Terminology becomes important!
https://www.nature.com/collections/qghhqm/pointsofsignificance
▪ The nine circles of scientific hell
[Neuroskeptic,Perspectives on Psychological Science, 2012]
Circle | Dante's hell | Scientific hell
I      | Limbo        | Limbo
II     | Lust         | Overselling
III    | Gluttony     | Post-Hoc Storytelling
IV     | Greed        | P-Value Fishing
V      | Anger        | Creative Outliers
VI     | Heresy       | Plagiarism
VII    | Violence     | Non-Publication
VIII   | Fraud        | Partial Publication
IX     | Treachery    | Inventing Data

@Neuro_Skeptic