Hypothesis testing DS GA 1002 Statistical and Mathematical Models - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Hypothesis testing

DS GA 1002 Statistical and Mathematical Models

http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall15 Carlos Fernandez-Granda

slide-2
SLIDE 2

Example

In a medical study 10% of women and 12.5% of men suffer from heart disease
Hypothesis: Men are more prone to have heart disease than women
If there are 20 people in the study, the effect could be by chance
If there are 20 000 people, we are more convinced
Hypothesis testing makes this precise

slide-3
SLIDE 3

Hypothesis testing

Framework to decide whether patterns in data are random fluctuations
Aim: Establish whether a predefined hypothesis is supported by the data

slide-4
SLIDE 4

  • The hypothesis testing framework
  • Parametric testing
  • Nonparametric testing
  • Multiple testing

slide-5
SLIDE 5

Null and alternative hypotheses

Null hypothesis H0: There is no underlying phenomenon (men are not more prone to heart disease)
Alternative hypothesis H1: There is an underlying phenomenon
We reject H0 if it does not explain the data well
Failing to reject H0 does not mean that we think it holds, we just don’t have enough evidence
Frequentist perspective: A hypothesis holds or does not hold deterministically

slide-6
SLIDE 6

Tests

A test is a procedure to decide whether to reject the null hypothesis
General strategy:

  • 1. Compute a test statistic from the data T (x1, . . . , xn)
  • 2. Decide on a rejection region R such that if T (x1, . . . , xn) ∈ R it is very unlikely that the null hypothesis holds
  • 3. Reject the null hypothesis if T (x1, . . . , xn) ∈ R
slide-7
SLIDE 7

Errors

              Reject H0? No      Reject H0? Yes
H0 is true    Correct decision   Type I error
H1 is true    Type II error      Correct decision

slide-8
SLIDE 8

Size and significance level

Priority: Control Type I errors
The size of a test is the probability of making a Type I error
The significance level is an upper bound on the size

slide-9
SLIDE 9

Significance level

"The effect is significant (at a level of 5%)"
Translation: Given the assumed probabilistic model, the probability that we reject the null hypothesis when it is true is at most 5%

slide-10
SLIDE 10

p value

The p value is the smallest significance level at which we would reject H0 for a particular dataset
It is a function of the data, not a probability

slide-11
SLIDE 11

Power

The power of a test is the probability of rejecting H0 under H1
For a given significance level, we want as much power as possible
Problem: We need to know the distribution of the data under H1!

slide-12
SLIDE 12

Overview

  • 1. Choose a conjecture
  • 2. Determine the corresponding null hypothesis
  • 3. Choose a test
  • 4. Gather the data
  • 5. Compute the test statistic from the data
  • 6. Compute the p value and reject the null hypothesis if it is below a predefined limit (typically 1% or 5%)

slide-13
SLIDE 13

Example: Clutch

Conjecture: NBA player is more effective in the 4th quarter
Null hypothesis: He’s equally effective
Test statistic: Games out of 20 in which he scores more points per minute in the 4th quarter
What threshold do we need to ensure a significance level of 1%? Of 5%?
The test statistic is 14, what is the p value?

slide-17
SLIDE 17

Example: Clutch

T0 represents the test statistic under the null hypothesis
We consider a rejection region of the form R := {t | t ≥ η}
What is the distribution of the test statistic T0 under the null hypothesis?
Binomial with parameters 20 and 1/2
The size of the test is
$$P(T_0 \ge \eta) = \frac{1}{2^n} \sum_{k=\eta}^{n} \binom{n}{k}, \qquad n = 20$$

slide-18
SLIDE 18

Distribution under null hypothesis

η             1      2      3      4      5
P (T0 ≥ η)    1.000  1.000  1.000  0.999  0.994

η             6      7      8      9      10
P (T0 ≥ η)    0.979  0.942  0.868  0.748  0.588

η             11     12     13     14     15
P (T0 ≥ η)    0.412  0.252  0.132  0.058  0.021

η             16     17     18     19     20
P (T0 ≥ η)    0.006  0.001  0.000  0.000  0.000

slide-24
SLIDE 24

Example: Clutch

What threshold do we need to ensure a significance level of 1%? 16
What threshold do we need to ensure a significance level of 5%? 15
The test statistic is 14, what is the p value? 5.8%
Is this the probability that the null hypothesis holds? No!
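
The thresholds and the p value can be checked numerically. The following is a minimal sketch, assuming the null model stated earlier (T0 is binomial with parameters 20 and 1/2); the function names `tail_prob` and `threshold` are illustrative and not part of the slides.

```python
from math import comb

def tail_prob(eta: int, n: int = 20) -> float:
    """P(T0 >= eta) when T0 is binomial with parameters n and 1/2."""
    return sum(comb(n, k) for k in range(eta, n + 1)) / 2 ** n

def threshold(alpha: float, n: int = 20) -> int:
    """Smallest eta such that the region {t >= eta} has size at most alpha."""
    # tail_prob(n + 1) is 0, so the search always terminates.
    return next(eta for eta in range(n + 2) if tail_prob(eta, n) <= alpha)

print(threshold(0.01))          # threshold for a 1% significance level
print(threshold(0.05))          # threshold for a 5% significance level
print(round(tail_prob(14), 3))  # p value when the test statistic is 14
```

The p value is simply the tail probability evaluated at the observed statistic, which is why the same helper serves both questions.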

slide-25
SLIDE 25

  • The hypothesis testing framework
  • Parametric testing
  • Nonparametric testing
  • Multiple testing

slide-26
SLIDE 26

Parametric testing

Data are sampled from a known distribution with unknown parameters
Probability measure Pθ depends on θ
Frequentist perspective: The parameter is deterministic and so are the hypotheses
Notation: X is a random vector distributed according to Pθ, the data x are a realization of X

slide-27
SLIDE 27

If H0 is θ = θ0

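The size of a test with test statistic T and rejection region R is
$$\alpha := P_{\theta_0}\left( T(\vec{X}) \in R \right)$$
If the rejection region is of the form T(x) ≥ η
$$\alpha = P_{\theta_0}\left( T(\vec{X}) \ge \eta \right)$$
The largest η at which we still reject H0 is T(x), which gives the smallest size at which we reject
$$p = P_{\theta_0}\left( T(\vec{X}) \ge T(\vec{x}) \right)$$
p value: probability under H0 of observing a test statistic that is at least as extreme as the one we observe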

slide-28
SLIDE 28

Composite hypotheses

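θ = θ0 is a simple hypothesis
A composite hypothesis is of the form θ ∈ S for a certain set S
The size of a composite test is
$$\alpha = \sup_{\theta \in H_0} P_{\theta}\left( T(\vec{X}) \ge \eta \right)$$
The p value is
$$p = \sup_{\theta \in H_0} P_{\theta}\left( T(\vec{X}) \ge T(\vec{x}) \right)$$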

slide-29
SLIDE 29

Power function

The power function of the test is defined as
$$\beta(\theta) := P_{\theta}\left( T(\vec{X}) \in R \right)$$
We want β (θ) ≈ 0 for θ ∈ H0 and β (θ) ≈ 1 for θ ∈ H1
slide-30
SLIDE 30

Example: Coin flip

Conjecture: Coin is biased towards heads, θ > 1/2
Null hypothesis: Coin not biased towards heads, θ ≤ 1/2
Test statistic: Number of heads out of n = 5, 10, 100 flips
Rejection region: Heads = n, Heads ≥ 3n/5
Power function?

slide-33
SLIDE 33

Coin flip power function

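If η = n,
$$\beta_1(\theta) = P_{\theta}\left( T(\vec{X}) \in R \right) = \theta^n$$
If η = 3n/5,
$$\beta_2(\theta) = P_{\theta}\left( T(\vec{X}) \in R \right) = \sum_{k=3n/5}^{n} \binom{n}{k} \theta^k (1-\theta)^{n-k}$$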
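
Both power functions can be evaluated directly from the binomial formula. The sketch below assumes the number of heads is binomial with parameters n and θ; the function name `power` and the grid of θ values are illustrative choices, not taken from the slides.

```python
from math import comb

def power(theta: float, n: int, eta: int) -> float:
    """beta(theta) = P_theta(T >= eta) for T binomial with parameters n and theta."""
    return sum(
        comb(n, k) * theta ** k * (1 - theta) ** (n - k)
        for k in range(eta, n + 1)
    )

for n in (5, 50, 100):
    for theta in (0.25, 0.5, 0.75):
        beta1 = power(theta, n, n)           # rejection region: Heads = n
        beta2 = power(theta, n, 3 * n // 5)  # rejection region: Heads >= 3n/5
        print(f"n={n:3d} theta={theta:.2f} beta1={beta1:.4f} beta2={beta2:.4f}")
```

Since β is increasing in θ for both regions, the size of each test over the composite null θ ≤ 1/2 is the value at θ = 1/2; for n = 5 the second test has β2(1/2) = 0.5, so it controls Type I errors poorly for small n.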
slide-34
SLIDE 34

η = n

[Figure: power function β(θ) as a function of θ, with curves for n = 5, n = 50 and n = 100]

slide-35
SLIDE 35

η = 3n/5

[Figure: power function β(θ) as a function of θ, with curves for n = 5, n = 50 and n = 100]

slide-36
SLIDE 36

Likelihood-ratio test

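Threshold the ratio between likelihoods: the rejection region is {Λ (x) ≤ η}, where
$$\Lambda(\vec{x}) := \frac{\sup_{\theta \in H_0} \mathcal{L}_{\vec{x}}(\theta)}{\sup_{\theta \in H_1} \mathcal{L}_{\vec{x}}(\theta)}$$
Intuition: Unless the ratio is low, we cannot rule out the null hypothesis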

slide-37
SLIDE 37

Example: Gaussian with known variance σ2

Conjecture: µ ≠ µ0
Null hypothesis: µ = µ0
Test statistic: Likelihood ratio
Find the threshold for significance level α

slide-38
SLIDE 38

Example: Gaussian with known variance σ2

Empirical mean maximizes the likelihood for any value of σ
$$\operatorname{av}(\vec{x}) := \frac{1}{n} \sum_{i=1}^{n} x_i = \arg\max_{\mu} \mathcal{L}_{\vec{x}}(\mu, \sigma)$$

slide-43
SLIDE 43

Example: Gaussian with known variance σ2

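$$
\begin{aligned}
\Lambda(\vec{x}) &= \frac{\sup_{\mu \in H_0} \mathcal{L}_{\vec{x}}(\mu)}{\sup_{\mu \in H_1} \mathcal{L}_{\vec{x}}(\mu)} = \frac{\mathcal{L}_{\vec{x}}(\mu_0)}{\mathcal{L}_{\vec{x}}(\operatorname{av}(\vec{x}))} \\
&= \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \left[ (x_i - \mu_0)^2 - (x_i - \operatorname{av}(\vec{x}))^2 \right] \right) \\
&= \exp\left( -\frac{1}{2\sigma^2} \left[ 2 \operatorname{av}(\vec{x}) \sum_{i=1}^{n} x_i - n \operatorname{av}(\vec{x})^2 - 2\mu_0 \sum_{i=1}^{n} x_i + n\mu_0^2 \right] \right) \\
&= \exp\left( -\frac{n \left( \operatorname{av}(\vec{x}) - \mu_0 \right)^2}{2\sigma^2} \right)
\end{aligned}
$$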

slide-44
SLIDE 44

Example: Gaussian with known variance σ2

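The likelihood-ratio test is
$$\left| \operatorname{av}(\vec{x}) - \mu_0 \right| \ge \sigma \sqrt{\frac{-2 \log \eta}{n}}$$
Under the null hypothesis av(X) is Gaussian with mean µ0 and variance σ2/n
$$\alpha = P_{\mu_0}\left( \frac{\left| \operatorname{av}(\vec{X}) - \mu_0 \right|}{\sigma / \sqrt{n}} \ge \sqrt{-2 \log \eta} \right) = 2\, Q\left( \sqrt{-2 \log \eta} \right)$$
For a significance level of α,
$$\left| \operatorname{av}(\vec{x}) - \mu_0 \right| \ge \frac{\sigma\, Q^{-1}(\alpha/2)}{\sqrt{n}}$$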
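
The final rule is straightforward to apply in code. This is a minimal sketch, assuming Q is the tail function of the standard Gaussian, so that Q^{-1}(α/2) equals the inverse cdf evaluated at 1 − α/2; the function names and the sample data are hypothetical and only illustrate the rule.

```python
from math import sqrt
from statistics import NormalDist

def critical_distance(sigma: float, n: int, alpha: float) -> float:
    """sigma * Q^{-1}(alpha / 2) / sqrt(n) for the two-sided test."""
    q_inv = NormalDist().inv_cdf(1 - alpha / 2)  # Q^{-1}(alpha / 2)
    return sigma * q_inv / sqrt(n)

def rejects_null(x, mu0: float, sigma: float, alpha: float) -> bool:
    """Reject H0: mu = mu0 if the empirical mean is far enough from mu0."""
    av = sum(x) / len(x)
    return abs(av - mu0) >= critical_distance(sigma, len(x), alpha)

# Hypothetical data with known sigma = 1 and mu0 = 0.
print(rejects_null([0.1, -0.2, 0.3, 0.0], mu0=0.0, sigma=1.0, alpha=0.05))
print(rejects_null([2.1, 1.9, 2.3, 2.0], mu0=0.0, sigma=1.0, alpha=0.05))
```

The critical distance shrinks like 1/√n, so smaller deviations from µ0 become significant as more data are gathered.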

slide-45
SLIDE 45

Neyman-Pearson Lemma

If H0 is θ = θ0 and H1 is θ = θ1 then the likelihood-ratio test has the highest power among all tests with a fixed size

slide-46
SLIDE 46

Neyman-Pearson Lemma: Proof

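We denote the rejection region of the likelihood-ratio test by R_LR
An arbitrary test with rejection region R has power
$$P_{\theta_1}\left( \vec{X} \in R \right)$$
Our aim is to prove
$$P_{\theta_1}\left( \vec{X} \in R_{LR} \right) \ge P_{\theta_1}\left( \vec{X} \in R \right)$$
or equivalently
$$P_{\theta_1}\left( \vec{X} \in R^c \cap R_{LR} \right) \ge P_{\theta_1}\left( \vec{X} \in R_{LR}^c \cap R \right)$$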

slide-47
SLIDE 47

Neyman-Pearson Lemma: Proof

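Both tests have size α so
$$P_{\theta_0}\left( \vec{X} \in R \right) = \alpha = P_{\theta_0}\left( \vec{X} \in R_{LR} \right)$$
and consequently
$$
\begin{aligned}
P_{\theta_0}\left( \vec{X} \in R^c \cap R_{LR} \right) &= P_{\theta_0}\left( \vec{X} \in R_{LR} \right) - P_{\theta_0}\left( \vec{X} \in R \cap R_{LR} \right) \\
&= P_{\theta_0}\left( \vec{X} \in R \right) - P_{\theta_0}\left( \vec{X} \in R \cap R_{LR} \right) \\
&= P_{\theta_0}\left( \vec{X} \in R \cap R_{LR}^c \right)
\end{aligned}
$$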

slide-48
SLIDE 48

Neyman-Pearson Lemma: Proof

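◮ If x ∈ R_LR, then Λ (x) ≤ η, so
$$f_{\theta_1}(\vec{x}) \ge \frac{f_{\theta_0}(\vec{x})}{\eta}$$

◮ If x ∈ R_LR^c, then Λ (x) > η, so
$$f_{\theta_1}(\vec{x}) \le \frac{f_{\theta_0}(\vec{x})}{\eta}$$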

slide-56
SLIDE 56

Neyman-Pearson Lemma: Proof

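$$
\begin{aligned}
P_{\theta_1}\left( \vec{X} \in R^c \cap R_{LR} \right) &= \int_{\vec{x} \in R^c \cap R_{LR}} f_{\theta_1}(\vec{x}) \, d\vec{x} \\
&\ge \frac{1}{\eta} \int_{\vec{x} \in R^c \cap R_{LR}} f_{\theta_0}(\vec{x}) \, d\vec{x} \\
&= \frac{1}{\eta} P_{\theta_0}\left( \vec{X} \in R^c \cap R_{LR} \right) \\
&= \frac{1}{\eta} P_{\theta_0}\left( \vec{X} \in R \cap R_{LR}^c \right) \\
&= \frac{1}{\eta} \int_{\vec{x} \in R \cap R_{LR}^c} f_{\theta_0}(\vec{x}) \, d\vec{x} \\
&\ge \int_{\vec{x} \in R \cap R_{LR}^c} f_{\theta_1}(\vec{x}) \, d\vec{x} \\
&= P_{\theta_1}\left( \vec{X} \in R \cap R_{LR}^c \right)
\end{aligned}
$$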

slide-57
SLIDE 57

  • The hypothesis testing framework
  • Parametric testing
  • Nonparametric testing
  • Multiple testing

slide-58
SLIDE 58

Permutation test

Aim: Compare two datasets xA and xB
Null hypothesis: The two datasets are sampled from the same distribution
No parametric model...

slide-59
SLIDE 59

Test statistic

Choose a test statistic T and evaluate the difference
$$T_{\text{diff}}(\vec{x}) := T(\vec{x}_A) - T(\vec{x}_B)$$
Test: R := {t | t ≥ η}
Problem: How do we determine the significance level or the p value?

slide-60
SLIDE 60

Main insight: Exchangeability under permutations

Under H0 the distribution of Tdiff(X) does not change if we permute the labels
The joint distributions of X1, X2, . . . , Xn and of any permutation (for example X24, Xn, . . . , X3) are the same
The values of Tdiff after permuting, tdiff,1, . . . , tdiff,n!, are uniformly distributed
$$P\left( T_{\text{diff}}(\vec{X}) \ge \eta \right) = \frac{1}{n!} \sum_{i=1}^{n!} 1_{t_{\text{diff},i} \ge \eta}$$
This is the size of the test!
$$p = P\left( T_{\text{diff}}(\vec{X}) \ge T_{\text{diff}}(\vec{x}) \right) = \frac{1}{n!} \sum_{i=1}^{n!} 1_{t_{\text{diff},i} \ge T_{\text{diff}}(\vec{x})}$$

slide-61
SLIDE 61

Permutation test

  • 1. Choose a conjecture as to how xA and xB are different
  • 2. Choose a test statistic Tdiff
  • 3. Compute Tdiff (x)
  • 4. Permute the labels m times and compute the corresponding values of Tdiff: tdiff,1, tdiff,2, . . . , tdiff,m
  • 5. Compute the approximate p value
$$p = P\left( T_{\text{diff}}(\vec{X}) \ge T_{\text{diff}}(\vec{x}) \right) \approx \frac{1}{m} \sum_{i=1}^{m} 1_{t_{\text{diff},i} \ge T_{\text{diff}}(\vec{x})}$$
and reject the null hypothesis if it is below a predefined limit (1%, 5%)
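
The steps above can be sketched in a few lines. This is a minimal illustration, assuming the test statistic is the empirical mean and that labels are permuted by shuffling the pooled data; the function name `permutation_p_value`, the seed, and the sample values are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(x_a, x_b, m=10_000, seed=0):
    """Approximate p value for T_diff = mean(x_a) - mean(x_b) from m random relabelings."""
    rng = random.Random(seed)
    observed = mean(x_a) - mean(x_b)
    pooled = list(x_a) + list(x_b)
    n_a = len(x_a)
    count = 0
    for _ in range(m):
        rng.shuffle(pooled)  # random permutation of the labels
        t_diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if t_diff >= observed:
            count += 1
    return count / m

# Hypothetical measurements for two groups.
group_a = [5.1, 6.3, 7.0, 6.8, 5.9, 7.4]
group_b = [4.8, 5.2, 5.0, 6.1, 4.9, 5.5]
print(permutation_p_value(group_a, group_b))
```

Because only m of the n! permutations are sampled, the result fluctuates from run to run unless the seed is fixed.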

slide-62
SLIDE 62

Cholesterol levels

  • 1. Study with 86 men and 182 women
  • 2. Conjecture: men have higher cholesterol than women
  • 3. Test statistic: empirical mean of cholesterol level
  • 4. 261.3 mg/dl amongst men and 242.0 mg/dl amongst women
  • 5. Null hypothesis: No difference, permuting data yields same distribution
  • 6. We sample 10^6 permutations to compute an approximate p value
slide-63
SLIDE 63

Cholesterol levels

[Figure: histograms of cholesterol levels for men and women]

slide-64
SLIDE 64

p value = 0.119%

[Figure: approximate distribution under the null hypothesis of the difference between the empirical means in men and women; the value 19.22 is marked on the horizontal axis]

slide-65
SLIDE 65

p value = 0.112%

[Figure: approximate distribution under the null hypothesis of the difference between the empirical means in men and women; the value 19.22 is marked on the horizontal axis]

slide-66
SLIDE 66

p value = 0.115%

[Figure: approximate distribution under the null hypothesis of the difference between the empirical means in men and women; the value 19.22 is marked on the horizontal axis]

slide-67
SLIDE 67

Blood pressure

  • 1. Study with 86 men and 182 women
  • 2. Conjecture: men have higher blood pressure than women
  • 3. Test statistic: empirical mean of blood pressure
  • 4. 133.2 mmHg amongst men and 130.6 mmHg amongst women
  • 5. Null hypothesis: No difference, permuting data yields same distribution
  • 6. We sample 10^6 permutations to compute an approximate p value
slide-68
SLIDE 68

Blood pressure

[Figure: histograms of blood pressure for men and women]

slide-69
SLIDE 69

p value = 13.48%

[Figure: approximate distribution under the null hypothesis of the difference between the empirical means in men and women; the value 2.6 is marked on the horizontal axis]

slide-70
SLIDE 70

p value = 13.56%

[Figure: approximate distribution under the null hypothesis of the difference between the empirical means in men and women; the value 2.6 is marked on the horizontal axis]

slide-71
SLIDE 71

p value = 13.50%

[Figure: approximate distribution under the null hypothesis of the difference between the empirical means in men and women; the value 2.6 is marked on the horizontal axis]

slide-72
SLIDE 72

  • The hypothesis testing framework
  • Parametric testing
  • Nonparametric testing
  • Multiple testing

slide-74
SLIDE 74

Multiple testing

Often, we perform many simultaneous hypothesis tests
Computational genomics, many genes could be relevant
For n independent tests of size α
$$P(\text{at least one false positive}) = 1 - P(\text{no false positives}) = 1 - (1 - \alpha)^n$$
For α = 1% and n = 500 genes, P (at least one false positive) = 0.99!
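
The figure for 500 genes follows directly from the formula. A short sketch, assuming every null hypothesis is true and the tests are independent; the function name is illustrative.

```python
def prob_at_least_one_false_positive(alpha: float, n: int) -> float:
    """1 - (1 - alpha)^n for n independent tests of size alpha."""
    return 1 - (1 - alpha) ** n

for n in (1, 10, 100, 500):
    print(n, round(prob_at_least_one_false_positive(0.01, n), 4))
```

Even a small per-test size compounds quickly: the probability grows towards 1 as the number of tests increases.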

slide-75
SLIDE 75

Bonferroni’s method

Given n hypothesis tests, compute the corresponding p values p1, . . . , pn
For a fixed significance level α reject the ith null hypothesis if pi ≤ α/n
The probability of making at least one Type I error is bounded by α

slide-79
SLIDE 79

Bonferroni’s method

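Union bound: For any events S1, . . . , Sn
$$P\left( \cup_{i=1}^{n} S_i \right) \le \sum_{i=1}^{n} P(S_i)$$
$$
\begin{aligned}
P(\text{Type I error}) &= P\left( \cup_{i=1}^{n} \text{Type I error for test } i \right) \\
&\le \sum_{i=1}^{n} P(\text{Type I error for test } i) \\
&= n \cdot \frac{\alpha}{n} = \alpha
\end{aligned}
$$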

slide-80
SLIDE 80

Example: Clutch

Conjecture: 10 NBA players, some are more effective in the 4th quarter
Null hypothesis: None is more effective in the 4th quarter
Test statistic: Games out of 20 where the player scores more in the 4th
What threshold do we need to ensure a significance level of 5%?

slide-81
SLIDE 81

Distribution under null hypothesis

η             1      2      3      4      5
P (T0 ≥ η)    1.000  1.000  1.000  0.999  0.994

η             6      7      8      9      10
P (T0 ≥ η)    0.979  0.942  0.868  0.748  0.588

η             11     12     13     14     15
P (T0 ≥ η)    0.412  0.252  0.132  0.058  0.021

η             16     17     18     19     20
P (T0 ≥ η)    0.006  0.001  0.000  0.000  0.000
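
One way to answer the question is to combine this table with Bonferroni's method. The sketch below assumes each player's statistic is binomial with parameters 20 and 1/2 under the null hypothesis and that each of the 10 tests is run at level α/10; the function names are illustrative and not part of the slides.

```python
from math import comb

def tail_prob(eta: int, n: int = 20) -> float:
    """P(T0 >= eta) when T0 is binomial with parameters n and 1/2."""
    return sum(comb(n, k) for k in range(eta, n + 1)) / 2 ** n

def bonferroni_threshold(alpha: float, num_tests: int, n: int = 20) -> int:
    """Smallest eta whose per-test size is at most alpha / num_tests."""
    per_test = alpha / num_tests
    return next(eta for eta in range(n + 2) if tail_prob(eta, n) <= per_test)

def bonferroni_reject(p_values, alpha: float):
    """Reject the ith null hypothesis if p_i <= alpha / n."""
    limit = alpha / len(p_values)
    return [p <= limit for p in p_values]

print(bonferroni_threshold(0.05, 10))  # threshold per player, 10 players at 5%
print(bonferroni_reject([0.001, 0.02, 0.2], 0.05))
```

Under these assumptions the per-test level is 0.5%, so the table entry for η = 16 (0.006) is just too large and the threshold moves up to 17 games.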