Statistical Significance Testing In Theory and In Practice

Statistical Significance Testing In Theory and In Practice
Ben Carterette
University of Delaware
http://ir.cis.udel.edu/ICTIR13tutorial

Hypotheses and Experiments
• Hypothesis:
  – Using an SVM for classification will give better accuracy than using Naïve Bayes
  – A “Symbol-Refined Tree Substitution Grammar” will give better parsing results than a simple TSG
  – Expanding a short keyword query with synonyms will improve search engine effectiveness
• Experiment:
  – Build a baseline system
  – “Improve” it based on your hypothesis
  – Test both systems on one or more datasets

Experimental Results
from Shindo et al., Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing, ACL 2012

So What?
• “Do these results support my hypothesis?”
• “Are these results meaningful?”
• “Is it possible that my results are due to chance?”
→ statistical significance testing!

Part 1
TESTING STATISTICAL SIGNIFICANCE

Using R
• R is a software environment for statistical computing
• Includes built-in implementations of many common tests
  – Also has its own programming language for implementing your own
• Download from http://r-project.org
  – Download TREC-8 evaluation data from http://ir.cis.udel.edu/ICTIR13tutorial/trec8.RData

Commonly-Used Tests
• Parametric:
  – Student’s t-test
  – ANOVA
• Non-parametric:
  – Wilcoxon signed rank test
  – Sign test/binomial test
• Distribution-free:
  – Randomization test
  – Bootstrap test

Student’s t-test

Example    A      B      B-A
1          .25    .35    +.10
2          .43    .84    +.41
3          .39    .15    -.24
4          .75    .75     0
5          .43    .68    +.25
6          .15    .85    +.70
7          .20    .80    +.60
8          .52    .50    -.02
9          .49    .58    +.09
10         .50    .75    +.25

$\hat{\mu} = \overline{B - A} = 0.214$
$\hat{\sigma}_{B-A} = 0.291$
$t = \frac{\hat{\mu}\sqrt{n}}{\hat{\sigma}_{B-A}} = 2.33$

Student’s t-test
$\hat{\mu} = \overline{B - A} = 0.214$
$\hat{\sigma}_{B-A} = 0.291$
$t = \frac{\hat{\mu}\sqrt{n}}{\hat{\sigma}_{B-A}} = 2.33$
p-value = 0.02
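The numbers on the t-test slides can be reproduced by hand. The following is a minimal Python sketch, not the tutorial's own code (the tutorial uses R). It assumes a paired test on the ten B-A differences with n-1 = 9 degrees of freedom, and it gets the tail probability by numerically integrating the t density, so `t_pdf` and `upper_tail` are illustrative helpers rather than library functions. Under these assumptions the one-sided tail comes out close to the 0.02 shown on the slide; a two-sided p-value would be roughly double that.

```python
import math
from statistics import mean, stdev

# Scores for systems A and B on the ten examples from the slide.
A = [.25, .43, .39, .75, .43, .15, .20, .52, .49, .50]
B = [.35, .84, .15, .75, .68, .85, .80, .50, .58, .75]

d = [b - a for a, b in zip(A, B)]   # paired differences B-A
n = len(d)
mu_hat = mean(d)                    # mean difference
sigma_hat = stdev(d)                # sample standard deviation (n-1 denominator)
t = mu_hat * math.sqrt(n) / sigma_hat


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t_obs, df, upper=60.0, steps=20000):
    """Approximate P(T >= t_obs) with composite Simpson's rule on [t_obs, upper].

    Assumption: the mass beyond `upper` is negligible for this df.
    """
    h = (upper - t_obs) / steps     # steps must be even for Simpson's rule
    total = t_pdf(t_obs, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(t_obs + i * h, df)
    return total * h / 3


p_one_sided = upper_tail(t, n - 1)
print(f"mean={mu_hat:.3f} sd={sigma_hat:.3f} t={t:.2f} one-sided p={p_one_sided:.3f}")
```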

Wilcoxon Signed-Rank Test

Example    A      B      B-A       Rank    B-A (sorted by |B-A|, zero dropped)
1          .25    .35    +.10      1       -.02
2          .43    .84    +.41      2       +.09
3          .39    .15    -.24      3       +.10
4          .75    .75     0        4       -.24
5          .43    .68    +.25      5.5     +.25
6          .15    .85    +.70      5.5     +.25
7          .20    .80    +.60      7       +.41
8          .52    .50    -.02      8       +.60
9          .49    .58    +.09      9       +.70
10         .50    .75    +.25

$W = 40 - 5 = 35$

Wilcoxon Signed-Rank Test
$W = 40 - 5 = 35$
p-value = 0.03
[Figure: density of W; x-axis: W, from -60 to 60; y-axis: Density]
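The statistic W = 40 - 5 = 35 is the sum of the ranks of the positive differences minus the sum of the ranks of the negative ones. A minimal Python sketch of that calculation follows. It assumes zero differences are dropped and tied absolute differences share their average rank, which is what the ranks on the slide suggest. The helper `avg_rank` is illustrative, and the sketch does not compute the p-value.

```python
# Scores for systems A and B on the ten examples from the slide.
A = [.25, .43, .39, .75, .43, .15, .20, .52, .49, .50]
B = [.35, .84, .15, .75, .68, .85, .80, .50, .58, .75]

# Round to two decimals so that tied differences compare equal as floats.
d = [round(b - a, 2) for a, b in zip(A, B)]
nonzero = [x for x in d if x != 0]             # zero differences are dropped
abs_sorted = sorted(abs(x) for x in nonzero)   # absolute differences, ascending


def avg_rank(value):
    """1-based rank of `value` in abs_sorted; ties share the average rank."""
    first = abs_sorted.index(value) + 1
    ties = abs_sorted.count(value)
    return first + (ties - 1) / 2


w_plus = sum(avg_rank(abs(x)) for x in nonzero if x > 0)    # ranks of positive differences
w_minus = sum(avg_rank(abs(x)) for x in nonzero if x < 0)   # ranks of negative differences
W = w_plus - w_minus
print(f"W+ = {w_plus}, W- = {w_minus}, W = {W}")
```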

Sign Test

Example    A      B      B-A     B > A?
1          .25    .35    +.10    +1
2          .43    .84    +.41    +1
3          .39    .15    -.24    -1
4          .75    .75     0       0
5          .43    .68    +.25    +1
6          .15    .85    +.70    +1
7          .20    .80    +.60    +1
8          .52    .50    -.02    -1
9          .49    .58    +.09    +1
10         .50    .75    +.25    +1

S = 7
p(7 | 10 trials, ½ probability) = 0.05
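The sign test only counts how often B beats A and compares that count to a binomial distribution with success probability ½. The sketch below is an illustration, not the tutorial's code: it counts the wins and computes exact binomial upper tails with `math.comb`. The helper `binom_upper_tail` is a made-up name. The slide does not say which tail convention it uses. For reference, with 10 trials P(X ≥ 7) = 176/1024 ≈ 0.172 and P(X ≥ 8) = 56/1024 ≈ 0.055, and the latter is the one that rounds to the 0.05 shown on the slide.

```python
from math import comb

# Scores for systems A and B on the ten examples from the slide.
A = [.25, .43, .39, .75, .43, .15, .20, .52, .49, .50]
B = [.35, .84, .15, .75, .68, .85, .80, .50, .58, .75]

n = len(A)
S = sum(1 for a, b in zip(A, B) if b > a)   # number of examples where B wins


def binom_upper_tail(k, trials, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, i) * p**i * (1 - p) ** (trials - i)
               for i in range(k, trials + 1))


p_at_least_S = binom_upper_tail(S, n)        # P(X >= 7)
p_more_than_S = binom_upper_tail(S + 1, n)   # P(X >= 8), i.e. P(X > 7)
print(f"S = {S}, P(X >= S) = {p_at_least_S:.3f}, P(X > S) = {p_more_than_S:.3f}")
```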

Randomization Test
[Table: the example data shown three times, once as observed and twice with the A and B labels swapped at random within individual examples; B-A is recomputed for each version]
$\hat{\mu}_0 = \overline{B - A} = 0.214$ (observed)
$\hat{\mu}_1 = -0.008$
$\hat{\mu}_2 = -0.093$

Randomization Test
$\hat{\mu}_0 = \overline{B - A} = 0.214$
p-value = 0.02
[Figure: distribution of the mean difference over randomizations; x-axis: mean, from -0.3 to 0.3]
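Swapping the A and B labels within an example is the same as flipping the sign of that example's difference. With only ten examples, all 2^10 = 1024 sign assignments can be enumerated instead of sampled. The sketch below takes that exact approach, which is an assumption: the slides do not say whether the distribution was enumerated or sampled. It works in integer hundredths so the comparison with the observed sum is exact. The one-sided p-value it computes is the fraction of assignments whose mean difference is at least the observed 0.214.

```python
from itertools import product

# Scores for systems A and B on the ten examples from the slide.
A = [.25, .43, .39, .75, .43, .15, .20, .52, .49, .50]
B = [.35, .84, .15, .75, .68, .85, .80, .50, .58, .75]

# Differences B-A in integer hundredths, to avoid floating-point ties.
d = [round((b - a) * 100) for a, b in zip(A, B)]
observed = sum(d)   # 100 * n * observed mean difference

count = 0   # sign assignments at least as extreme as the observed data
total = 0   # all sign assignments
for signs in product((1, -1), repeat=len(d)):
    total += 1
    if sum(s * x for s, x in zip(signs, d)) >= observed:
        count += 1

p_one_sided = count / total
print(f"{count} of {total} assignments, one-sided p = {p_one_sided:.4f}")
```

For larger samples full enumeration is infeasible, and the usual alternative is to draw a few thousand random sign assignments instead.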

Bootstrap Test

Example    A      B      B-A     s1      s2      s3
1          .25    .35    +.10    -.24    +.25    -.24
2          .43    .84    +.41    +.41    +.10    +.60
3          .39    .15    -.24    -.02    +.25    -.70
4          .75    .75     0       0      +.60    +.25
5          .43    .68    +.25    +.25    +.70    +.70
6          .15    .85    +.70    +.10    -.02    +.41
7          .20    .80    +.60    +.25    +.10    -.02
8          .52    .50    -.02    +.10    +.25    -.24
9          .49    .58    +.09    +.25     0      +.70
10         .50    .75    +.25    +.10    -.02    +.25

Bootstrap Distribution
p-value = 0.005
[Figure: bootstrap distribution of the mean difference; x-axis: mean, from -0.1 to 0.5]
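A bootstrap sample such as s1, s2 or s3 draws ten differences from the observed B-A column with replacement, and the bootstrap distribution is the distribution of the means of many such samples. The sketch below is one plausible reading of the test, not the tutorial's code. It assumes the p-value is the fraction of bootstrap means at or below zero; the number of resamples and the seed are arbitrary choices. The result varies with the seed, so it will only approximate the value on the slide.

```python
import random

# Scores for systems A and B on the ten examples from the slide.
A = [.25, .43, .39, .75, .43, .15, .20, .52, .49, .50]
B = [.35, .84, .15, .75, .68, .85, .80, .50, .58, .75]

d = [b - a for a, b in zip(A, B)]   # observed differences B-A
n = len(d)

rng = random.Random(0)              # fixed seed so the run is repeatable
n_resamples = 10000                 # arbitrary; more resamples give a smoother estimate

boot_means = []
for _ in range(n_resamples):
    sample = rng.choices(d, k=n)    # draw n differences with replacement
    boot_means.append(sum(sample) / n)

# Assumed definition: share of bootstrap means that do not favor B over A.
p_one_sided = sum(1 for m in boot_means if m <= 0) / n_resamples
print(f"one-sided bootstrap p = {p_one_sided:.4f}")
```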

ANOVA
• Compare variance due to system to variance due to topic

Example    A      B      B-A
1          .25    .35    +.10
2          .43    .84    +.41
3          .39    .15    -.24
4          .75    .75     0
5          .43    .68    +.25
6          .15    .85    +.70
7          .20    .80    +.60
8          .52    .50    -.02
9          .49    .58    +.09
10         .50    .75    +.25

$\hat{\sigma}^2 = \mathrm{MSE} = 0.042$
$\hat{\sigma}_S^2 = \mathrm{MST} = 0.229$
$F = \frac{\mathrm{MST}}{\mathrm{MSE}} = 5.41$
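The MST, MSE and F values on the slide are consistent with a two-way layout (system × topic) with one observation per cell. In that layout the total sum of squares splits into a system part, a topic part and a residual. The sketch below assumes that layout; the variable names are illustrative and this is not the tutorial's code. With only two systems, F equals the square of the paired t statistic (2.33² ≈ 5.41).

```python
# Scores for systems A and B on the ten examples (topics) from the slide.
A = [.25, .43, .39, .75, .43, .15, .20, .52, .49, .50]
B = [.35, .84, .15, .75, .68, .85, .80, .50, .58, .75]

n_topics = len(A)
n_systems = 2

grand = (sum(A) + sum(B)) / (n_systems * n_topics)
system_means = [sum(A) / n_topics, sum(B) / n_topics]
topic_means = [(a + b) / n_systems for a, b in zip(A, B)]

# Sum-of-squares decomposition: total = system + topic + residual.
ss_total = sum((x - grand) ** 2 for x in A + B)
ss_system = n_topics * sum((m - grand) ** 2 for m in system_means)
ss_topic = n_systems * sum((m - grand) ** 2 for m in topic_means)
ss_error = ss_total - ss_system - ss_topic

mst = ss_system / (n_systems - 1)                     # mean square for systems
mse = ss_error / ((n_systems - 1) * (n_topics - 1))   # residual mean square
F = mst / mse
print(f"MST = {mst:.3f}, MSE = {mse:.3f}, F = {F:.2f}")
```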

Summary
• These are 6 of the most common tests seen in IR experimentation
  – Many others in the literature:
    • Chi-squared
    • Proportion test
    • ANCOVA/MANOVA/MANCOVA
• All have in common:
  – The use of some probability distribution, computation of a p-value from that distribution

Part 2
FUNDAMENTALS OF SIGNIFICANCE TESTING

Testing Paradigms
Ronald Fisher, Egon Pearson, Jerzy Neyman, Harold Jeffreys

What Are Tests Really Telling Us?
• Formal set-up:
  – $H_0: \mu = 0$
  – $H_1: \mu \neq 0$
• The null hypothesis is a model
  – We are looking to prove the model false
• The p-value is the probability that you would have found results at least as extreme as the observed ones if $H_0$ were true
  – If that probability is low, conclude $H_0$ is false
