Permutation tests
Fabian Pedregosa October 3, 2017
Data Science Learn2Launch, UC Berkeley
Announcements
Next week is the first presentation!
1. 10 min presentation (by teams) + 5 min questions
2. At least: objective of the project,
⇒ register at AWSEducate: https://www.awseducate.com/Registration. If this is not enough, come and see me.
Structure of this lecture
10) on final grade.
Motivation
We will answer the burning question
Experiment
Data
Beer (n = 25): 27, 19, 20, 20, 23, 17, 21, 24, 31, 26, 28, 20, 27, 19, 25, 31, 24, 28, 24, 29, 21, 21, 18, 27, 20
Water (n = 18): 21, 19, 13, 22, 15, 22, 15, 22, 20, 12, 24, 24, 21, 19, 18, 16, 23, 20
mean_beer = 23.6, mean_water = 19.2, mean_beer − mean_water = 4.4
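As a quick check of the numbers on this slide, here is a minimal Python sketch that recomputes the two means and their difference. The split of the values into the `beer` and `water` lists is read off the flattened slide table (it reproduces the reported means); the variable names are illustrative and not from the lecture material.

```python
from statistics import mean

# Values from the slide's data table, split by group.
# The grouping is reconstructed from the slide layout (assumption).
beer = [27, 19, 20, 20, 23, 17, 21, 24, 31, 26, 28, 20, 27,
        19, 25, 31, 24, 28, 24, 29, 21, 21, 18, 27, 20]
water = [21, 19, 13, 22, 15, 22, 15, 22, 20,
         12, 24, 24, 21, 19, 18, 16, 23, 20]

mean_beer = mean(beer)
mean_water = mean(water)
observed_diff = mean_beer - mean_water  # the test statistic used later

print(f"mean_beer = {mean_beer:.1f}")
print(f"mean_water = {mean_water:.1f}")
print(f"mean_beer - mean_water = {observed_diff:.1f}")
```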
Statistical problem
Is the difference of 4.4 sufficient to claim that drinking beer makes you more attractive to mosquitos? What is the probability of this happening by chance? ⇒ Statistical problem.
Null hypothesis (H0): both means are equal and the difference is due to chance.
Instances of this problem are pervasive in data science: does an upgrade increase user engagement? Is the new algorithm generating more revenue? Is the new treatment effective? Etc.
Two approaches: i) Statistics 101 and ii) computational method.
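Stats 101
$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{2/n}}, \quad \text{where } s_p = \sqrt{\frac{s_{X_1}^2 + s_{X_2}^2}{2}}$$
$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$
using the Welch–Satterthwaite equation
$$\nu \approx \frac{\left(\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}\right)^2}{\frac{s_1^4}{N_1^2 \nu_1} + \frac{s_2^4}{N_2^2 \nu_2}}$$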
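To make the formulas above concrete, here is a hedged sketch of the classical route in plain Python. Because the two groups have different sizes (25 and 18), it uses the unequal-size form of the denominator, sqrt(s1²/N1 + s2²/N2), which reduces to s_p·sqrt(2/n) when both groups have n samples. It takes ν_i = N_i − 1, the usual choice in the Welch–Satterthwaite equation, and obtains a one-sided p-value by numerically integrating the density f(t). The function names (`welch_t`, `t_pdf`, `upper_tail`), the Simpson's-rule integration and the number of steps are illustrative assumptions, not part of the lecture material.

```python
import math
from statistics import mean, variance

# Values from the slide's data table (grouping reconstructed, assumption).
beer = [27, 19, 20, 20, 23, 17, 21, 24, 31, 26, 28, 20, 27,
        19, 25, 31, 24, 28, 24, 29, 21, 21, 18, 27, 20]
water = [21, 19, 13, 22, 15, 22, 15, 22, 20,
         12, 24, 24, 21, 19, 18, 16, 23, 20]


def welch_t(x1, x2):
    """Return the t statistic and the Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    v1 = variance(x1) / n1  # s_1^2 / N_1 (sample variance, N - 1 denominator)
    v2 = variance(x2) / n2  # s_2^2 / N_2
    t = (mean(x1) - mean(x2)) / math.sqrt(v1 + v2)
    # nu_i = N_i - 1 is assumed for each group.
    nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, nu


def t_pdf(t, nu):
    """Student's t density f(t) with nu degrees of freedom."""
    coef = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coef * (1 + t * t / nu) ** (-(nu + 1) / 2)


def upper_tail(t, nu, steps=2000):
    """P(T >= t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    # The density is symmetric, so P(T >= 0) = 0.5.
    return 0.5 - total * h / 3


t_stat, nu = welch_t(beer, water)
p_one_sided = upper_tail(t_stat, nu)
print(f"t = {t_stat:.3f}, nu = {nu:.1f}, one-sided p = {p_one_sided:.5f}")
```

The one-sided tail is used here because the question is whether beer increases attractiveness; a two-sided test would double this value.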
Skeptic: I don’t believe this!
Data
Beer: 21, 19, 20, 15, 23, 17, 21, 24, 31, 26, 28, 20, 27, 19, 25, 31, 24, 28, 24, 29, 21, 17, 18, 27, 20
Water: 27, 19, 27, 22, 20, 22, 15, 22, 20, 12, 24, 24, 23, 19, 27, 16, 21, 20
mean_beer = X, mean_water = Y, mean_beer − mean_water = −0.9
Data
[Figure: histograms of mean_beer − mean_water after 1, 10, 100, 1000, 10000 and 100000 permutations]
Data
We have constructed the empirical distribution of the test statistic mean_beer − mean_water.
How likely is it that we arrived at a value of 4.4 by chance?
Easy:
$$p = \frac{\text{number of times that the statistic} \geq 4.4}{\text{total number of permutations}}$$
This is the exact definition of p-value!
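The procedure described on these slides can be sketched in a few lines of Python: pool all values, shuffle them, treat the first 25 as "beer" and the rest as "water", and count how often the shuffled difference of means is at least as large as the observed one. This is a minimal sketch, not the lecture's reference solution: the function names, the seed and the default of 10,000 permutations are assumptions, and the threshold is the unrounded observed difference (about 4.38) rather than the rounded 4.4.

```python
import random

# Values from the slide's data table (grouping reconstructed, assumption).
beer = [27, 19, 20, 20, 23, 17, 21, 24, 31, 26, 28, 20, 27,
        19, 25, 31, 24, 28, 24, 29, 21, 21, 18, 27, 20]
water = [21, 19, 13, 22, 15, 22, 15, 22, 20,
         12, 24, 24, 21, 19, 18, 16, 23, 20]


def mean_diff(a, b):
    """Test statistic: mean(a) - mean(b)."""
    return sum(a) / len(a) - sum(b) / len(b)


def permutation_test(x1, x2, n_permutations=10_000, seed=0):
    """Estimate the one-sided p-value of mean(x1) - mean(x2) by random shuffling."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    observed = mean_diff(x1, x2)
    pooled = list(x1) + list(x2)
    n1 = len(x1)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling under the null hypothesis
        diff = mean_diff(pooled[:n1], pooled[n1:])
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return observed, hits / n_permutations


observed, p_value = permutation_test(beer, water)
print(f"observed difference = {observed:.2f}, p-value = {p_value:.4f}")
```

With a finite number of random shuffles the result is an estimate, so the printed p-value varies slightly with the seed and the number of permutations.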
In this experiment, p-value = 0.0004 and so the null hypothesis can be rejected.
Go to the GitHub repository for lecture 2: https://github.com/dsl2l2017/lecture_2
Do the third and last exercise.
References
Marti Anderson and Cajo Ter Braak. Permutation tests for multi-factorial analysis of variance. Journal of Statistical Computation and Simulation, 2003.
Marti J Anderson. Permutation tests for univariate or multivariate analysis of variance and regression. Canadian Journal of Fisheries and Aquatic Sciences, 2001.
Phillip Good. Permutation, Parametric, and Bootstrap Tests of Hypotheses. Springer Science & Business Media, 2013.
Thierry Lefèvre, Louis-Clément Gouagna, Kounbobr Roch Dabiré, Eric Elguero, Didier Fontenille, François Renaud, Carlo Costantini, and Frédéric Thomas. Beer consumption increases human attractiveness to malaria mosquitoes. PLoS ONE, 2010.