Permutation tests Fabian Pedregosa October 3, 2017 Data Science

Permutation tests Fabian Pedregosa October 3, 2017 Data Science Learn2Launch, UC Berkeley

Announcements
• Next week is the first presentation!
  1. 10 min presentation (by teams) + 5 min questions
  2. At least: objective of the project, dataset, exploratory analysis.
• Server, more CPUs, GPUs, etc. ⟹ register at AWSEducate (https://www.awseducate.com/Registration). If this is not enough, come and see me.
• Office hours: me 3pm-5pm SDH 421, Bowen Mondays on demand.

Structure of this lecture
• Me: explain the method of permutation tests.
• You: solve a problem based on this method.
• You: a volunteer presents their solution and gets a +0.5 point bonus (out of 10) on the final grade.
• Me: Introduction to supervised learning. Logistic regression.

Permutation tests

Motivation
We will answer the burning question: does drinking beer make you more attractive to mosquitos?

Experiment

Data
Beer Water
27 19 20 21 19 13 20 23 17 22 15 22 21 24 31 15 22 20 26 28 20 12 24 24 27 19 25 21 19 18 31 24 28 16 23 20 24 29 21 21 18 27 20
mean beer = 23.6
mean water = 19.2
mean beer − mean water = 4.4
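The summary numbers on this slide can be checked with a few lines of Python. This is a minimal sketch: the slide's two-column table was flattened, so the assignment of each value to `beer` or `water` below is an assumption, chosen so that the pooled values and both means agree with the slide.

```python
from statistics import mean

# Assumed assignment of the slide's pooled values to the two groups
# (25 beer drinkers, 18 water drinkers); the original table layout was lost.
beer = [27, 20, 21, 26, 27, 31, 24, 21, 20, 19, 23, 24, 28,
        19, 24, 29, 18, 20, 17, 31, 20, 25, 28, 21, 27]
water = [21, 22, 15, 12, 21, 16, 19, 15, 22, 24, 19, 23, 13,
         22, 20, 24, 18, 20]

# Observed test statistic: difference between the two group means.
observed = mean(beer) - mean(water)

print(f"mean beer  = {mean(beer):.1f}")
print(f"mean water = {mean(water):.1f}")
print(f"difference = {observed:.1f}")
```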

Statistical problem
Is the difference of 4.4 sufficient to claim that drinking beer makes you more attractive to mosquitos? What is the probability of this happening by chance? ⟹ Statistical problem.
Null hypothesis (H0): both means are equal and the difference is due to chance.
Instances of this problem are pervasive in data science: does an upgrade increase user engagement? Is the new algorithm generating more revenue? Is the new treatment effective? Etc.
Two approaches: i) Statistics 101 and ii) computational method.

Statistics 101

Stats 101
• t-test
• Test statistic:
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{2/n}}, \qquad \text{where } s_p = \sqrt{\frac{s_{X_1}^2 + s_{X_2}^2}{2}} \]
• Which under the null hypothesis follows a Student t distribution
\[ f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}} \]
• ν = degrees of freedom
The degrees of freedom ν is approximated using the Welch–Satterthwaite equation
\[ \nu \approx \frac{\left(\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}\right)^2}{\frac{s_1^4}{N_1^2 \nu_1} + \frac{s_2^4}{N_2^2 \nu_2}} \]
Skeptic: I don't believe this!
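For comparison with the computational method, the classical recipe can be sketched with the standard library alone. Two assumptions to flag: the beer/water split is reconstructed to match the slide's pooled values and means, and because the two groups differ in size the sketch uses the unequal-variance (Welch) form of the statistic, t = (X̄1 − X̄2) / sqrt(s1²/N1 + s2²/N2), rather than the equal-n pooled form, with ν1 = N1 − 1 and ν2 = N2 − 1 in the Welch–Satterthwaite equation.

```python
from math import sqrt
from statistics import mean, variance

# Assumed assignment of the slide's pooled values to the two groups.
beer = [27, 20, 21, 26, 27, 31, 24, 21, 20, 19, 23, 24, 28,
        19, 24, 29, 18, 20, 17, 31, 20, 25, 28, 21, 27]
water = [21, 22, 15, 12, 21, 16, 19, 15, 22, 24, 19, 23, 13,
         22, 20, 24, 18, 20]

def welch_t(x, y):
    """Return (t, nu): Welch's t statistic and Welch-Satterthwaite df."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)   # sample variances s_1^2 and s_2^2
    se2 = v1 / n1 + v2 / n2             # squared standard error of the difference
    t = (mean(x) - mean(y)) / sqrt(se2)
    # Welch-Satterthwaite approximation with nu_i = N_i - 1.
    nu = se2 ** 2 / (v1 ** 2 / (n1 ** 2 * (n1 - 1))
                     + v2 ** 2 / (n2 ** 2 * (n2 - 1)))
    return t, nu

t, nu = welch_t(beer, water)
print(f"t = {t:.2f}, degrees of freedom = {nu:.1f}")
```

Turning t and ν into a p-value still requires the Student t distribution above (or a table, or a statistics library), which is exactly the machinery the skeptic is asked to take on faith.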

Computational method


Data (after randomly shuffling the Beer/Water labels)
Beer Water
21 19 20 27 19 27 15 23 17 22 20 22 21 24 31 15 22 20 26 28 20 12 24 24 27 19 25 23 19 27 31 24 28 16 21 20 24 29 21 17 18 27 20
mean beer = X
mean water = Y
mean beer − mean water = −0.9
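One shuffle of the labels might look like the sketch below. The helper name `permuted_difference` is hypothetical, and the beer/water split is an assumption reconstructed to match the slide's pooled values and means. The resulting difference depends on the random seed, so it will generally not equal the −0.9 shown here.

```python
import random
from statistics import mean

# Assumed assignment of the slide's pooled values to the two groups.
beer = [27, 20, 21, 26, 27, 31, 24, 21, 20, 19, 23, 24, 28,
        19, 24, 29, 18, 20, 17, 31, 20, 25, 28, 21, 27]
water = [21, 22, 15, 12, 21, 16, 19, 15, 22, 24, 19, 23, 13,
         22, 20, 24, 18, 20]

def permuted_difference(x, y, rng):
    """Pool both groups, shuffle, re-split at the original sizes,
    and return the difference of the two new group means."""
    pooled = list(x) + list(y)          # copy, so the inputs are not modified
    rng.shuffle(pooled)
    new_x, new_y = pooled[:len(x)], pooled[len(x):]
    return mean(new_x) - mean(new_y)

rng = random.Random(0)                  # fixed seed only for reproducibility
diff = permuted_difference(beer, water, rng)
print(f"difference after one shuffle = {diff:.1f}")
```

Under the null hypothesis the labels carry no information, so every such shuffle is as plausible as the observed assignment.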

Data
[Figures: histograms of mean beer − mean water after 1, 10, 100, 1000, 10000, and 100000 permutations.]
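The histograms can be approximated by repeating the shuffle many times and collecting the differences. This is a rough sketch under the same assumption about the beer/water split; the function name `permutation_distribution`, the seed, and the text histogram are illustrative choices, not the code used to produce the figures.

```python
import random
from collections import Counter

# Assumed assignment of the slide's pooled values to the two groups.
beer = [27, 20, 21, 26, 27, 31, 24, 21, 20, 19, 23, 24, 28,
        19, 24, 29, 18, 20, 17, 31, 20, 25, 28, 21, 27]
water = [21, 22, 15, 12, 21, 16, 19, 15, 22, 24, 19, 23, 13,
         22, 20, 24, 18, 20]

def permutation_distribution(x, y, n_permutations, seed=0):
    """Return the difference of group means for n_permutations random relabelings."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x, n_y = len(x), len(y)
    total = sum(pooled)
    diffs = []
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        sum_x = sum(pooled[:n_x])       # the remaining values sum to total - sum_x
        diffs.append(sum_x / n_x - (total - sum_x) / n_y)
    return diffs

diffs = permutation_distribution(beer, water, 10_000)

# Crude text histogram: one '#' per 100 permutations, binned to the nearest integer.
hist = Counter(round(d) for d in diffs)
for value in sorted(hist):
    print(f"{value:+3d} {'#' * (hist[value] // 100)}")
```

As the number of permutations grows, the histogram settles into a bell-like shape centred near zero, which is what the difference looks like when the labels do not matter.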

Data
We have constructed the empirical distribution of the test statistic mean beer − mean water.
How likely is it that we arrived at a value of 4.4 by chance? Easy:
\[ p = \frac{\text{number of times that the statistic} \geq 4.4}{\text{total number of permutations}} \]
This is the exact definition of the p-value!
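The ratio above translates almost directly into code. A minimal sketch, again assuming the reconstructed beer/water split; `permutation_p_value`, the default of 10,000 permutations, and the seed are illustrative, and the estimate will vary from run to run with different seeds.

```python
import random

# Assumed assignment of the slide's pooled values to the two groups.
beer = [27, 20, 21, 26, 27, 31, 24, 21, 20, 19, 23, 24, 28,
        19, 24, 29, 18, 20, 17, 31, 20, 25, 28, 21, 27]
water = [21, 22, 15, 12, 21, 16, 19, 15, 22, 24, 19, 23, 13,
         22, 20, 24, 18, 20]

def permutation_p_value(x, y, n_permutations=10_000, seed=0):
    """One-sided permutation test on the difference of means.

    Returns (observed, p), where p is the fraction of random relabelings
    whose difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    n_x, n_y = len(x), len(y)
    observed = sum(x) / n_x - sum(y) / n_y
    pooled = list(x) + list(y)
    total = sum(pooled)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        sum_x = sum(pooled[:n_x])
        diff = sum_x / n_x - (total - sum_x) / n_y
        if diff >= observed:
            count += 1
    return observed, count / n_permutations

observed, p = permutation_p_value(beer, water)
print(f"observed difference = {observed:.1f}, estimated p-value = {p}")
```

With a p-value this small only a handful of the 10,000 shuffles reach the observed difference, so the estimate is noisy; raising `n_permutations` makes it more stable at the cost of run time.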

In this experiment, p-value = 0.0004 and so the null hypothesis can be rejected.

Now it's your turn! Go to the GitHub repository for lecture 2: https://github.com/dsl2l2017/lecture_2
Do the third and last exercise.

References
Marti Anderson and Cajo Ter Braak. Permutation tests for multi-factorial analysis of variance. Journal of Statistical Computation and Simulation, 2003.
Marti J Anderson. Permutation tests for univariate or multivariate analysis of variance and regression. Canadian Journal of Fisheries and Aquatic Sciences, 2001.
Phillip Good. Permutation, Parametric, and Bootstrap Tests of Hypotheses. Springer Science & Business Media, 2013.
Thierry Lefèvre, Louis-Clément Gouagna, Kounbobr Roch Dabiré, Eric Elguero, Didier Fontenille, François Renaud, Carlo Costantini, and Frédéric Thomas. Beer consumption increases human attractiveness to malaria mosquitoes. PLoS ONE, 2010.
