SLIDE 1

Section 3: Permutation Inference

Yotam Shem-Tov Fall 2015

Yotam Shem-Tov STAT 239/ PS 236A September 26, 2015 1 / 47

SLIDE 2

Introduction

Throughout these slides we will focus only on randomized experiments, i.e., the treatment is assigned at random. We will follow the notation of Paul Rosenbaum's book Observational Studies, which is highly recommended.

SLIDE 3

Fisher’s exact inference

Fisher introduced the idea of exact inference in his book The Design of Experiments (1935).

Exact inference uses only the random assignment of treatment to test the sharp null of no treatment effect, H0 : τi = 0 ∀ i. The test of the sharp null hypothesis is distribution- and model-free! The key elements in Fisher's argument are:

1. For a valid test of no treatment effect on the units included in an experiment, it is sufficient to require that treatment be allocated at random to experimental units. These units may be both heterogeneous in their responses and not a sample from a population.

2. Probability enters the experiment only through the random assignment of treatments, a process controlled by the experimenter.

SLIDE 4

Introduction

As with any statistical hypothesis test, we need the following elements:

1. Data
2. Null hypothesis
3. Test statistic
4. The distribution of the test statistic under the null hypothesis

SLIDE 5

Definitions: basic setup

Using Rosenbaum's notation: there are N units divided into S strata or blocks, which are formed on the basis of pre-treatment characteristics. There are n_s units in stratum s, for s = 1, ..., S, so N = Σ_{s=1}^{S} n_s.

Define Z_si as an indicator for whether the i-th unit in stratum s receives treatment or control: if unit i in stratum s receives treatment, Z_si = 1, and if the unit receives control, Z_si = 0. Define m_s as the number of treated units in stratum s, so m_s = Σ_{i=1}^{n_s} Z_si, with 0 ≤ m_s ≤ n_s.

SLIDE 6

Definitions: Unit

We will simplify the notation and focus on the case in which there is only one stratum, i.e., S = 1 and N = n_s. The number of treated units is m = Σ_{i=1}^{N} Z_i.

What is a unit? Answer: a unit is an opportunity to apply or withhold the treatment. A unit may be a person who will receive either the treatment or the control. A group of people may form a single unit: all children in a particular classroom or school. A single person may present several opportunities to apply different treatments, in which case each opportunity is a unit.

SLIDE 7

Notation

Let r = (r1, . . . , rN) be the vector of observed responses.
Let Ω be the set containing all possible treatment assignments.
Let z = (z1, . . . , zN) be a treatment assignment, z ∈ Ω, zi ∈ {0, 1}.

SLIDE 8

Treatment assignment

The set Ω contains K = (N choose m) possible treatment assignments z.

In the most common experiments, each possible treatment assignment is given the same probability, Pr(Z = z) = 1/K for all z in Ω. For example, consider a randomized experiment with 2 strata, S = 2, four units in the first stratum, n1 = 4, and two units in the second stratum, n2 = 2. Half of the units in each stratum received treatment, m1 = 2 and m2 = 1. What is the set of all possible treatment assignments?

|Ω| = (4 choose 2) · (2 choose 1) = 12
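The count above can be checked directly: the assignments are independent across strata, so the binomial coefficients multiply. A minimal Python sketch (the slides use R; this is only an illustration):

```python
from math import comb

# Number of equally likely treatment assignments within each stratum,
# multiplied across strata (randomization is independent across strata).
n = [4, 2]   # stratum sizes n1, n2
m = [2, 1]   # treated counts m1, m2

K = 1
for ns, ms in zip(n, m):
    K *= comb(ns, ms)

print(K)  # 12 possible treatment assignments
```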

SLIDE 9

Treatment assignment: example

Example 1: N = 30; given that the number of treated units is m = 15, how large is Ω? Is the allocation of treatment i.i.d.? Answer: cov(Ti, Tj) ≠ 0, so the treatment assignment is not i.i.d.

> choose(30,15)
[1] 155117520

Example 2: N = 30, and the treatment assignment mechanism is Ti ∼ Bernoulli(p = 1/2). How large is Ω? Is the allocation of treatment i.i.d.? Answer: the treatment assignment is i.i.d.

> 2^30
[1] 1073741824

In example 2, Ω is 6.92 times larger than in example 1. We will usually consider the situation in which m is given.

SLIDE 10

Sharp null hypothesis

The most common hypothesis associated with randomization inference is the sharp null of no effect for all units: a unit labeled as treated will have the exact same outcome as a unit labeled as control. Let rz be the vector of potential responses for randomization assignment z. The sharp null hypothesis is that rz is the same for all z: ∀z, rz = r. Under the null, the units' responses are fixed and the only random element is the meaningless rotation of labels (between control and treatment).

SLIDE 11

Test statistic

A test statistic t(Z, r) is a quantity computed from the treatment assignment Z and the response r. Consider the following test statistic: the difference in sample means for the treated and control groups,

t(Z, r) = Σ_{i=1}^{N} [ Zi ri / m − (1 − Zi) ri / (N − m) ] = (Σ_{i=1}^{N} Zi ri) / m − (Σ_{i=1}^{N} (1 − Zi) ri) / (N − m)

and in matrix notation,

t(Z, r) = Z^T r / Z^T 1 − (1 − Z)^T r / (1 − Z)^T 1

Why is Z a capital letter while r is not? To indicate that under the null, Z is a random variable and r is fixed.
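In code, the matrix form of the statistic is a one-liner. A Python/NumPy sketch with a made-up toy response vector (the slides work in R; this is only an illustration):

```python
import numpy as np

def diff_in_means(Z, r):
    """t(Z, r) = Z'r / Z'1 - (1-Z)'r / (1-Z)'1: treated mean minus control mean."""
    Z = np.asarray(Z, dtype=float)
    r = np.asarray(r, dtype=float)
    ones = np.ones_like(Z)
    return Z @ r / (Z @ ones) - (1 - Z) @ r / ((1 - Z) @ ones)

Z = np.array([1, 1, 0, 0])          # treatment labels
r = np.array([5.0, 7.0, 4.0, 2.0])  # fixed responses
print(diff_in_means(Z, r))  # treated mean 6.0 minus control mean 3.0 = 3.0
```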

SLIDE 12

Hypothesis testing

The hypothesis test of the sharp null:

H0 : rz = r,  H1 : rz ≠ r

We seek the probability of a value of the test statistic as extreme as or more extreme than the observed one, under the null hypothesis. In order to calculate the P-value, we need to know (or approximate) the distribution of the test statistic. The treatment assignment Z follows a known randomization mechanism which we can simulate or exhaustively list.

SLIDE 13

Calculating the significance level (P-value)

Let T be the observed value of this test statistic. Suppose we would like to reject the null for large values of T. The p-value is

Pr_H0(t(Z, r) ≥ T) = Σ_{z∈Ω} I[t(z, r) ≥ T] Pr_H0(Z = z)

where I[t(z, r) ≥ T] is an indicator of whether the value of the test statistic under the treatment assignment z is higher than the observed test statistic, T.

Under the null, H0, the treatment has no effect and hence r is fixed regardless of the assignment z.

SLIDE 14

Calculating the significance level (P-value)

In the case that all treatment assignments are equally likely,

Pr_H0(Z = z) = 1/|Ω|

and

Pr_H0(t(Z, r) ≥ T) = Σ_{z∈Ω} I[t(z, r) ≥ T] / |Ω| = |{z ∈ Ω : t(z, r) ≥ T}| / |Ω|

The indicator variable I[t(z, r) ≥ T] is a random variable distributed B(n = 1, prob = P_H0(t(z, r) ≥ T)) (a Bernoulli distribution), and

|{z ∈ Ω : t(z, r) ≥ T}| / |Ω| = (1/|Ω|) Σ_{z∈Ω} I[t(z, r) ≥ T] = E(I[t(Z, r) ≥ T])
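When Ω is small enough, this p-value can be computed exactly by enumerating all (N choose m) assignments. A Python sketch on made-up toy data (the responses and the observed assignment here are hypothetical):

```python
from itertools import combinations

r = [5.0, 7.0, 4.0, 2.0, 6.0, 1.0]  # fixed responses under the sharp null
m = 3                               # number of treated units
N = len(r)

def t_stat(treated):
    # difference in means for a given set of treated indices
    tr = [r[i] for i in treated]
    co = [r[i] for i in range(N) if i not in treated]
    return sum(tr) / len(tr) - sum(co) / len(co)

T = t_stat((0, 1, 4))  # the (hypothetical) observed assignment

# Pr_H0(t(Z, r) >= T) = |{z : t(z, r) >= T}| / |Omega|
omega = list(combinations(range(N), m))
p_value = sum(t_stat(z) >= T for z in omega) / len(omega)
print(p_value)  # 1 of the C(6,3) = 20 assignments is as extreme: 0.05
```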

SLIDE 15

Calculating significant level (P-value)

When Ω is small we can exhaustively go over all the elements in Ω and calculate |{z ∈ Ω : t(z, r) ≥ T}|, as in the Lady Tasting Tea example. How can we calculate the P-value when Ω is too large to enumerate all possible treatment assignments?

1. Use a Monte-Carlo approximation
2. Use an asymptotic approximation for the distribution of the test statistic

SLIDE 16

Monte-Carlo approximation: step-by-step

1. Draw an SRS (simple random sample) of size m from the data and call it X (the treatment group); call the rest of the data Y (the control group).

2. Compute the test statistic, t(Z, r), as you would if X and Y were the original data; denote this test statistic by tb(Z, r).

3. Repeat this procedure B times (many times), saving the results, so you have t1(Z, r), t2(Z, r), t3(Z, r), . . . , tB(Z, r).

4. The distribution of tb(Z, r) approximates the true distribution of t(Z, r) under the null (the sharp null). In particular, a p-value can be computed using (1/B) × #{b : tb(z, r) ≥ T}.

SLIDE 17

Monte-Carlo approximation: Theory

Recall Pr_H0(t(Z, r) ≥ T) = E(I[t(Z, r) ≥ T]). The intuitive estimator for the P-value is the proportion of times the indicator variable takes the value 1 in the Monte-Carlo simulation:

P-value = Ê(I[t(z, r) ≥ T]) = (1/B) Σ_{b=1}^{B} I[tb(z, r) ≥ T]

where B is the number of samples.

SLIDE 18

Example of permutation inference

We want to compare x1 and x2; denote x2 as the treatment group and x1 as the control group. The treatment was allocated randomly.

> set.seed(13)
> x1 = rexp(1000,rate=0.6)
> x2 = rexp(1000,rate=0.5)

The observed difference in means is

> mean(x2)-mean(x1)
[1] 0.4204367

In order to calculate a significance level we need to know (or approximate) the distribution of t(Z, r) under the null.

SLIDE 19

Example continued

In this case the size of Ω is large and we cannot go over all the elements of Ω:

> choose(200,100)
[1] 9.054851e+58

We will use Monte-Carlo simulations. The R code is below:

f.permute = function(){
  id = sample(c(1:length(x)),length(x2))
  t0 = rep(0,length(x))
  t0[id] = 1
  statistic0 = mean(x[t0==1]) - mean(x[t0==0])
  return(statistic0)
}
stat.permutation = replicate(10000,f.permute())

What is B in the code? B = 10000.

SLIDE 20

Example continued

[Figure: permutation distribution of the test statistic (histogram), with the observed value marked]

SLIDE 21

Example continued

Should we reject the null if we had observed the green line?

[Figure: permutation distribution of the test statistic, with the observed value and a hypothetical observed value (green line) marked]

SLIDE 22

Example continued

What is the P-value? A one-sided P-value needs to specify whether we are looking for extreme values from the right or the left, Pr_H0(t(Z, r) ≥ T) or Pr_H0(t(Z, r) ≤ T). Solutions:

1. Choose the lower option and multiply the P-value by 2 (a Bonferroni correction):

P-value = min [Pr_H0(t(Z, r) ≥ T), Pr_H0(t(Z, r) ≤ T)] × 2

SLIDE 23

Example continued

Solutions:

2. Adjust the test statistic in a way that makes us reject the null only for extreme values on one side (this is not always possible). In our case we re-define t(Z, r) as

t(Z, r) = | Z^T r / Z^T 1 − (1 − Z)^T r / (1 − Z)^T 1 |

i.e., the absolute difference in means.

[Figure: permutation distribution of the absolute difference in means, with the observed value marked]

SLIDE 24

Wilcoxon rank sum test (WRST)

The Wilcoxon rank sum test is one of the most commonly used non-parametric tests. Let q = (q1, q2, . . . , qN) be the ranks of the responses r. The test statistic is the sum of the ranks of the treated units,

t(Z, r) = W = Z^T q = Σ_{i=1}^{N} Zi qi

where qi = rank(ri). The WRST is usually used to test for differences in medians (and means) between two distributions. It has lower power for detecting differences in variance (when the medians are similar).
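The rank-sum statistic is easy to compute by hand. A Python sketch with made-up data (simple ranking with no tie handling, which implementations such as R's wilcox.test treat more carefully):

```python
def ranks(values):
    # rank 1 for the smallest value, ..., N for the largest (no tie handling)
    order = sorted(range(len(values)), key=lambda i: values[i])
    q = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        q[i] = rank
    return q

r = [3.1, 0.2, 5.4, 2.2, 4.8, 1.7]  # responses
Z = [1, 0, 1, 0, 1, 0]              # treatment labels

q = ranks(r)
W = sum(zi * qi for zi, qi in zip(Z, q))  # W = Z'q, sum of treated ranks
print(W)  # treated units have ranks 4, 6, 5, so W = 15
```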

SLIDE 25

WRST: example

Consider the same data as in the previous example. The code below calculates the permutation distribution and the observed statistic for the WRST:

q = rank(x)
f.permute = function(){
  id = sample(c(1:length(x)),length(x2))
  t0 = rep(0,length(x))
  t0[id] = 1
  statistic0 = sum(q[t0==1])
  return(statistic0)
}
stat.permutation = replicate(10000,f.permute())
statistic.obs = sum(q[t==1])

SLIDE 26

WRST: example

The permutation distribution is:

[Figure: permutation distribution of the WRST statistic (histogram), with the observed value marked]

SLIDE 27

WRST: example

The P-value (two-sided hypothesis test) is

P-value = min [Pr_H0(t(Z, r) ≥ W), Pr_H0(t(Z, r) ≤ W)] × 2 = (1 / (10000 + 1)) × 2 = 0.00019998

The implementation in R: wilcox.test()

SLIDE 28

Kolmogorov-Smirnov (KS) test

The KS test is used to detect differences between two distributions; it can detect differences in features other than the expectation, such as the variance or quantiles. The hypothesis test to have in mind is

H0 : Fx = Fy,  H1 : Fx ≠ Fy

The KS test statistic is the largest difference between the CDFs of group x and group y,

D = max_w |Fx(w) − Fy(w)|

SLIDE 29

Kolmogorov-Smirnov (KS) test

The CDF is not a known function and needs to be approximated. We use the empirical CDF, defined as

F̂x(w) = #{x ≤ w} / nx

Consider the following example:

set.seed(16)
x=rnorm(50,mean=2,sd=1)
y=rnorm(100,mean=2,sd=2)
### The empirical CDF:
Fx = function(w,x){
  return(sum(x<=w)/length(x))
}
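The KS statistic D can then be computed by evaluating both empirical CDFs at every pooled data point, since the maximum gap is attained at one of them. A minimal Python sketch with made-up data, mirroring the Fx above:

```python
def ecdf(sample, w):
    # empirical CDF: fraction of observations <= w
    return sum(v <= w for v in sample) / len(sample)

def ks_statistic(x, y):
    # D = max_w |F_x(w) - F_y(w)|; the max is attained at a data point
    return max(abs(ecdf(x, w) - ecdf(y, w)) for w in x + y)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.5, 3.5, 4.5, 5.5]
print(ks_statistic(x, y))  # 0.5
```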

SLIDE 30

Kolmogorov-Smirnov: Example

[Figure: distributions of X and Y]

What do you think the results will be using the WRST? Will a t-test detect the difference in distributions? What is the null in a t-test? Are the assumptions of a t-test satisfied?

SLIDE 31

Kolmogorov-Smirnov: Example

Welch Two Sample t-test
data: x and y
t = 0.9403, df = 144.938, p-value = 0.3486
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.2573415  0.7244316
sample estimates:
mean of x mean of y
 2.157051  1.923506

Wilcoxon rank sum test with continuity correction
data: x and y
W = 2706, p-value = 0.4126
alternative hypothesis: true location shift is not equal to 0

SLIDE 32

Kolmogorov-Smirnov: Example

[Figure: empirical CDFs F(w) of X and Y; KS statistic: D = 0.25]

SLIDE 33

Kolmogorov-Smirnov: Example

[Figure: "KS test − Binary variables": permutation distribution of the KS statistic (ks.permutation1); one-sided P-value: 0.02]

SLIDE 34

Kolmogorov-Smirnov: Example

Should we reject the null if we observe the green line?

[Figure: the same permutation distribution with a hypothetical observed value (green line) marked; one-sided P-value: 0.02]

SLIDE 35

Kolmogorov-Smirnov: Binary variables

Mr. Sceptical argued that the KS test has low power when considering binary distributions (Bernoulli), and suggested using the difference in means instead (the difference in proportions). What do you think? Answer: the tests are the same! When the two distributions under comparison are both Bernoulli, the null of the KS test is H0 : Px = Py, and the test statistic becomes D = |Px − Py|. Example:

set.seed(14)
x=rbinom(50,size=1,prob=0.5)
y=rbinom(100,size=1,prob=0.7)
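That equivalence is easy to verify numerically: for 0/1 data the two empirical CDFs can only differ at w = 0, where F(0) = 1 − p̂. A Python sketch with Bernoulli samples (the slides' R example uses rbinom; the seed here is arbitrary):

```python
import random

random.seed(14)
x = [int(random.random() < 0.5) for _ in range(50)]
y = [int(random.random() < 0.7) for _ in range(100)]

def ecdf(sample, w):
    # empirical CDF: fraction of observations <= w
    return sum(v <= w for v in sample) / len(sample)

# For binary data the CDFs can differ only at w = 0 (at w = 1 both equal 1),
# so D reduces to the absolute difference in proportions.
D = max(abs(ecdf(x, w) - ecdf(y, w)) for w in (0, 1))
px, py = sum(x) / len(x), sum(y) / len(y)
print(abs(D - abs(px - py)) < 1e-12)  # the KS statistic equals |px - py|
```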

SLIDE 36

Kolmogorov-Smirnov: Binary variables

[Figure: empirical CDFs F(t) of the binary X and Y; KS statistic: 0.14]

SLIDE 37

Kolmogorov-Smirnov: Binary variables

[Figure: "KS test − Binary variables": permutation distribution of the KS statistic (ks.permutation2); one-sided P-value: 0.402]

SLIDE 38

Kolmogorov-Smirnov: Binary variables

Welch Two Sample t-test
data: x and y
t = -1.6926, df = 88.688, p-value = 0.09404
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.30435636  0.02435636
sample estimates:
mean of x mean of y
     0.60      0.74

SLIDE 39

Siegel-Tukey test (ST)

The ST test is used to test for differences in scale between two groups, i.e., equality of variance between two groups. There is an implementation in R in the package "DescTools", via the command "SiegelTukeyTest()". What is the standard test for equality of variance? Levene's test, which can be implemented using "levene.test()" in R, from the package "lawstat". Example code:

library(DescTools)
x <- c(12, 13, 29, 30)
y <- c(15, 17, 18, 24, 25, 26)
SiegelTukeyTest(x, y)

When would we want to test for equality of variance between groups? One example is to validate a constant treatment effect model.

SLIDE 40

Siegel-Tukey test statistic

Constructing the test statistic step-by-step:

1. Assign rank 1 to the smallest observation, rank 2 to the highest observation, rank 3 to the second highest observation, rank 4 to the second smallest observation, and so on.

2. The ST test statistic is the sum of the ranks in the treated group,

t(Z, r) = Σ_{i=1}^{N} Zi si

where s1, . . . , sN are the ranks of the observations under the ranking scheme above.
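The alternating ranking scheme in step 1 can be sketched in Python (the slides leave an R implementation as homework; this toy version is only illustrative and assumes no ties):

```python
def siegel_tukey_ranks(values):
    # Alternate ends: rank 1 -> smallest, ranks 2,3 -> two largest,
    # ranks 4,5 -> next two smallest, ranks 6,7 -> next two largest, ...
    order = sorted(range(len(values)), key=lambda i: values[i])
    lo, hi = 0, len(order) - 1
    rank, take_low = 1, True
    ranks = [0] * len(values)
    while lo <= hi:
        # take one index for rank 1, then two indices per end thereafter
        for _ in range(2 if rank > 1 else 1):
            if lo > hi:
                break
            if take_low:
                ranks[order[lo]] = rank
                lo += 1
            else:
                ranks[order[hi]] = rank
                hi -= 1
            rank += 1
        take_low = not take_low
    return ranks

# values 1,10,9,2,3,8 get ranks 1,2,3,4,5,6 respectively
print(siegel_tukey_ranks([10, 1, 9, 2, 8, 3]))  # [2, 1, 3, 4, 6, 5]
```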

SLIDE 41

Siegel-Tukey test statistic

What are possible problems with the construction of this test statistic? We can construct a symmetric test statistic by starting the ranking from the highest observation instead. In general this will not matter, but in some cases it can lead to different conclusions. A possible solution is to calculate the test statistic as the average of the statistics obtained starting from both directions. When using the Siegel-Tukey test statistic, what null hypothesis are we testing? The sharp null of no treatment effect! This is always the null hypothesis that we test using permutation inference. Homework assignment: write R code to calculate the Siegel-Tukey test statistic.

SLIDE 42

Example:

Next we compare Siegel-Tukey to Levene's test in terms of power. The DGP is:

rm(list=ls())
set.seed(12345)
library(DescTools)
library(lawstat)
n=200
x = rnorm(n)
y = rnorm(n,sd=0.8)
outcome = c(x,y)
tr = c(rep(1,n),rep(0,n))

Using Levene's test, is the inference based on permutation inference or asymptotic theory? Asymptotic theory. Suggest a way to use Levene's test and make inference using permutation inference.

SLIDE 43

Example:

> SiegelTukeyTest(outcome[tr==1], outcome[tr==0])

Siegel-Tukey-test for equal variability
data: outcome[tr == 1] and outcome[tr == 0]
ST = 2228, p-value = 0.1087
alternative hypothesis: true ratio of scales is not equal to 1

> levene.test(outcome,factor(tr))

modified robust Brown-Forsythe Levene-type test based on the absolute deviations from the median
data: outcome
Test Statistic = 24.963, p-value = 8.767e-07

SLIDE 44

Levene’s test using permutation inference

### Levene's test with permutation inference:
S=300
st <- levene <- rep(NA,S)
for (s in c(1:S)){
  tr0 = sample(tr,length(tr),replace=FALSE)
  st[s] <- SiegelTukeyTest(outcome[tr0==1], outcome[tr0==0])$p.value
  levene[s] <- levene.test(outcome,factor(tr0))$p.value
}
st.obs <- SiegelTukeyTest(outcome[tr==1], outcome[tr==0])$p.value
levene.obs <- levene.test(outcome,factor(tr))$p.value

SLIDE 45

Siegel-Tukey

[Figure: distribution of the Siegel-Tukey P-value under the null (histogram); P-value: 0.0963]

SLIDE 46

Levene’s

[Figure: distribution of Levene's P-value under the null (histogram); P-value: 0.0033]

SLIDE 47

Examples of permutation inference

Is there an association between playing against each other in the World Cup and military conflict? The paper in the link below tries to answer this question using permutation inference, and is a nice and simple example of the method: http://www.andrewbertoli.org/wp-content/uploads/2013/02/Direct-Sports-Competition.pdf
