Hypothesis testing DS GA 1002 Statistical and Mathematical Models

Hypothesis testing DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall15 Carlos Fernandez-Granda

Example
In a medical study, 10% of women and 12.5% of men suffer from heart disease.
Hypothesis: Men are more prone to heart disease than women.
If there are 20 people in the study, the effect could be due to chance.
If there are 20 000 people, we are more convinced.
Hypothesis testing makes this precise.
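The effect of sample size in this example can be sketched numerically. The snippet below is only an illustration under assumptions that are not in the slides: the study is split evenly between men and women, and a two-proportion z-test with a normal approximation (a swapped-in technique, and a crude one for a group of 10) measures how likely a gap at least this large is if both groups share the same underlying rate. The function name `one_sided_p_value` is hypothetical.

```python
import math

def one_sided_p_value(rate_men, rate_women, n_men, n_women):
    """Normal-approximation probability of a gap at least this large
    if men and women actually share the same underlying rate."""
    # Pooled rate under the assumption of no difference between groups
    pooled = (rate_men * n_men + rate_women * n_women) / (n_men + n_women)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_men + 1 / n_women))
    z = (rate_men - rate_women) / std_err
    # Upper tail of the standard normal distribution: P(Z >= z)
    return 0.5 * math.erfc(z / math.sqrt(2))

# Assumption: equal group sizes (10 + 10 and 10 000 + 10 000)
p_small = one_sided_p_value(0.125, 0.10, 10, 10)
p_large = one_sided_p_value(0.125, 0.10, 10_000, 10_000)
print(f"20 people:     chance probability ~ {p_small:.2f}")
print(f"20 000 people: chance probability ~ {p_large:.1e}")
```

With 20 people the same 2.5-point gap is entirely compatible with chance, whereas with 20 000 people it is not.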

Hypothesis testing
Framework to decide whether patterns in data are random fluctuations.
Aim: Establish whether a predefined hypothesis is supported by the data.

The hypothesis testing framework
Parametric testing
Nonparametric testing
Multiple testing

Null and alternative hypotheses
Null hypothesis $H_0$: There is no underlying phenomenon (men are not more prone to heart disease).
Alternative hypothesis $H_1$: There is an underlying phenomenon.
We reject $H_0$ if it does not explain the data well.
Failing to reject $H_0$ does not mean that we think it holds; we just don’t have enough evidence.
Frequentist perspective: A hypothesis holds or does not hold deterministically.

Tests
A test is a procedure to decide whether to reject the null hypothesis.
General strategy:
1. Compute a test statistic $T(x_1, \ldots, x_n)$ from the data.
2. Decide on a rejection region $R$ such that $T(x_1, \ldots, x_n) \in R$ would be very unlikely if the null hypothesis held.
3. Reject the null hypothesis if $T(x_1, \ldots, x_n) \in R$.

Errors
                 Reject H_0?
                 No               Yes
H_0 is true      ✓                Type I error
H_1 is true      Type II error    ✓

Size and significance level
Priority: Control Type I errors.
The size of a test is the probability of making a Type I error.
The significance level is an upper bound on the size.

Significance level
"The effect is significant (at a level of 5%)"
Translation: Given the assumed probabilistic model, the probability that we reject the null hypothesis when it is true is at most 5%.

p value
The p value is the smallest significance level at which we would reject $H_0$ for a particular dataset.
It is a function of the data, not the probability that $H_0$ holds.

Power
The power of a test is the probability of rejecting $H_0$ under $H_1$.
For a given significance level, we want as much power as possible.
Problem: We need to know the distribution of the data under $H_1$!

Overview
1. Choose a conjecture.
2. Determine the corresponding null hypothesis.
3. Choose a test.
4. Gather the data.
5. Compute the test statistic from the data.
6. Compute the p value and reject the null hypothesis if it is below a predefined limit (typically 1% or 5%).

Example: Clutch
Conjecture: An NBA player is more effective in the 4th quarter.
Null hypothesis: He’s equally effective.
Test statistic: Number of games out of 20 in which he scores more points per minute in the 4th quarter.
What threshold do we need to ensure a significance level of 1%? Of 5%?
The test statistic is 14. What is the p value?

Example: Clutch
$T_0$ represents the test statistic under the null hypothesis.
We consider a rejection region of the form $R := \{ t \mid t \geq \eta \}$.
What is the distribution of the test statistic $T_0$ under the null hypothesis? Binomial with parameters 20 and 1/2.
The size of the test is
$$P(T_0 \geq \eta) = \frac{1}{2^n} \sum_{k=\eta}^{n} \binom{n}{k}, \qquad n = 20$$

Distribution under null hypothesis

η             1      2      3      4      5
P(T_0 ≥ η)    1.000  1.000  1.000  0.999  0.994

η             6      7      8      9      10
P(T_0 ≥ η)    0.979  0.942  0.868  0.748  0.588

η             11     12     13     14     15
P(T_0 ≥ η)    0.412  0.252  0.132  0.058  0.021

η             16     17     18     19     20
P(T_0 ≥ η)    0.006  0.001  0.000  0.000  0.000
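The tail probabilities in this table can be reproduced with a few lines of Python. This is a minimal sketch that assumes, as in the example, that the test statistic is binomial with parameters 20 and 1/2 under the null hypothesis; the helper name `tail_prob` is an illustrative choice, not something from the slides.

```python
from math import comb

def tail_prob(eta, n=20):
    """P(T0 >= eta) when T0 is binomial with parameters n and 1/2."""
    # Each outcome with k successes has probability comb(n, k) / 2**n
    return sum(comb(n, k) for k in range(eta, n + 1)) / 2 ** n

for eta in range(1, 21):
    print(f"eta = {eta:2d}   P(T0 >= eta) = {tail_prob(eta):.3f}")
```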

Example: Clutch
What threshold do we need to ensure a significance level of 1%? 16
What threshold do we need to ensure a significance level of 5%? 15
The test statistic is 14. What is the p value? 5.8%
Is this the probability that the null hypothesis holds? No!
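The three answers can be checked directly from the binomial null distribution. The sketch below assumes a rejection region of the form $t \geq \eta$ and a Binomial(20, 1/2) null distribution, as in the example; the function names `smallest_threshold` and `p_value` are illustrative.

```python
from math import comb

N_GAMES = 20

def tail_prob(eta, n=N_GAMES):
    """P(T0 >= eta) when T0 is binomial with parameters n and 1/2."""
    return sum(comb(n, k) for k in range(eta, n + 1)) / 2 ** n

def smallest_threshold(alpha, n=N_GAMES):
    """Smallest eta whose test size P(T0 >= eta) is at most alpha."""
    for eta in range(n + 1):
        if tail_prob(eta, n) <= alpha:
            return eta
    return n + 1  # no threshold within 0..n is strict enough

def p_value(observed, n=N_GAMES):
    """Smallest significance level at which the observed statistic is rejected."""
    return tail_prob(observed, n)

print("threshold for 1%:", smallest_threshold(0.01))
print("threshold for 5%:", smallest_threshold(0.05))
print(f"p value for 14 games: {p_value(14):.3f}")
```

Because the tail probability decreases as the threshold grows, the first threshold that meets the level is also the one with the most power among thresholds of this form.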

Parametric testing
Data are sampled from a known distribution with unknown parameters.
The probability measure $P_\theta$ depends on $\theta$.
Frequentist perspective: The parameter is deterministic and so are the hypotheses.
Notation: $\vec{X}$ is a random vector distributed according to $P_\theta$; $\vec{x}$ are the data, a realization of $\vec{X}$.

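If $H_0$ is $\theta = \theta_0$, the size of a test with test statistic $T$ and rejection region $R$ is
$$\alpha := P_{\theta_0}\left( T(\vec{X}) \in R \right)$$
If the rejection region is of the form $T(\vec{x}) \geq \eta$,
$$\alpha = P_{\theta_0}\left( T(\vec{X}) \geq \eta \right)$$
The largest $\eta$ at which we still reject $H_0$ is $T(\vec{x})$, which gives the smallest size at which we reject:
$$p = P_{\theta_0}\left( T(\vec{X}) \geq T(\vec{x}) \right)$$
p value: probability under $H_0$ of observing a test statistic that is at least as extreme as the one we observe.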

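Composite hypotheses
$\theta = \theta_0$ is a simple hypothesis.
A composite hypothesis is of the form $\theta \in S$ for a certain set $S$.
The size of a test with a composite null hypothesis is
$$\alpha = \sup_{\theta \in \mathcal{H}_0} P_{\theta}\left( T(\vec{X}) \geq \eta \right)$$
The p value is
$$p = \sup_{\theta \in \mathcal{H}_0} P_{\theta}\left( T(\vec{X}) \geq T(\vec{x}) \right)$$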
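To make the supremum concrete, here is a small sketch for one hypothetical composite null hypothesis: the number of heads in $n$ coin flips, with null hypothesis $\theta \leq 1/2$. The supremum is only approximated on a grid, and the names `reject_prob` and `composite_size` are assumptions made for illustration.

```python
from math import comb

def reject_prob(theta, eta, n):
    """P_theta(T >= eta) when T is binomial with parameters n and theta."""
    return sum(comb(n, k) * theta ** k * (1 - theta) ** (n - k)
               for k in range(eta, n + 1))

def composite_size(eta, n, grid_points=501):
    """Grid approximation of the sup over theta in [0, 1/2] of P_theta(T >= eta)."""
    thetas = [0.5 * i / (grid_points - 1) for i in range(grid_points)]
    return max(reject_prob(theta, eta, n) for theta in thetas)

# Hypothetical numbers: reject if at least 8 of 10 flips are heads
print(f"size ~ {composite_size(8, 10):.4f}")
```

For this family the rejection probability increases with $\theta$, so the supremum is attained at the boundary $\theta = 1/2$. Calling `composite_size` with the observed statistic in place of the threshold gives the corresponding p value.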

Power function
The power function of the test is defined as
$$\beta(\theta) := P_{\theta}\left( T(\vec{X}) \in R \right)$$
We want $\beta(\theta) \approx 0$ for $\theta \in \mathcal{H}_0$ and $\beta(\theta) \approx 1$ for $\theta \in \mathcal{H}_1$.

Example: Coin flip
Conjecture: Coin is biased towards heads, $\theta > 1/2$.
Null hypothesis: Coin is not biased towards heads, $\theta \leq 1/2$.
Test statistic: Number of heads out of $n = 5, 10, 100$ flips.
Rejection regions: Heads $= n$, or Heads $\geq 3n/5$.
Power function?

Coin flip power function
If $\eta = n$,
$$\beta_1(\theta) = P_{\theta}\left( T(\vec{X}) \in R \right) = \theta^n$$
If $\eta = 3n/5$,
$$\beta_2(\theta) = P_{\theta}\left( T(\vec{X}) \in R \right) = \sum_{k=3n/5}^{n} \binom{n}{k} \theta^k (1 - \theta)^{n-k}$$
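Both power functions are easy to evaluate numerically. The sketch below is an illustration only: the function names are hypothetical, the values of $n$ are the ones that appear in the plot legends, and $3n/5$ is rounded up when it is not an integer (an assumption, since the slides only use multiples of 5).

```python
from math import comb

def power_all_heads(theta, n):
    """beta_1(theta): probability that all n flips are heads."""
    return theta ** n

def power_three_fifths(theta, n):
    """beta_2(theta): probability of at least 3n/5 heads in n flips."""
    eta = (3 * n + 4) // 5  # ceiling of 3n/5 using integer arithmetic
    return sum(comb(n, k) * theta ** k * (1 - theta) ** (n - k)
               for k in range(eta, n + 1))

for n in (5, 50, 100):
    for theta in (0.25, 0.5, 0.75):
        print(f"n = {n:3d}  theta = {theta:.2f}  "
              f"beta_1 = {power_all_heads(theta, n):.3g}  "
              f"beta_2 = {power_three_fifths(theta, n):.3g}")
```

The first region almost never rejects once $n$ is large, so it controls Type I errors at the cost of having essentially no power. The second region gains power as $n$ grows while its rejection probability at $\theta = 1/2$ shrinks.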

[Figure: power function $\beta(\theta)$ versus $\theta$ for $\eta = n$, with curves for n = 5, n = 50 and n = 100]

[Figure: power function $\beta(\theta)$ versus $\theta$ for the rejection region Heads $\geq 3n/5$, with curves for n = 5, n = 50 and n = 100]

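Likelihood-ratio test
Threshold the ratio between likelihoods: the rejection region is $\{ \Lambda(\vec{x}) \leq \eta \}$, where
$$\Lambda(\vec{x}) := \frac{\sup_{\theta \in \mathcal{H}_0} L_{\vec{x}}(\theta)}{\sup_{\theta \in \mathcal{H}_1} L_{\vec{x}}(\theta)}$$
Intuition: Unless the ratio is low, we cannot rule out the null hypothesis.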

Example: Gaussian with known variance $\sigma^2$
Conjecture: $\mu \neq \mu_0$
Null hypothesis: $\mu = \mu_0$
Test statistic: Likelihood ratio
Find the threshold for significance level $\alpha$.

Example: Gaussian with known variance $\sigma^2$
The empirical mean maximizes the likelihood for any value of $\sigma$:
$$\operatorname{av}(\vec{x}) := \frac{1}{n} \sum_{i=1}^{n} x_i = \arg\max_{\mu} L_{\vec{x}}(\mu, \sigma)$$

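Example: Gaussian with known variance $\sigma^2$
$$\Lambda(\vec{x}) = \frac{\sup_{\mu \in \mathcal{H}_0} L_{\vec{x}}(\mu)}{\sup_{\mu \in \mathcal{H}_1} L_{\vec{x}}(\mu)} = \frac{L_{\vec{x}}(\mu_0)}{L_{\vec{x}}(\operatorname{av}(\vec{x}))} = \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \left( (x_i - \mu_0)^2 - (x_i - \operatorname{av}(\vec{x}))^2 \right) \right)$$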
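The transcript stops at this expression, so the remaining step is sketched here as a standard completion rather than as the author's own derivation. Expanding the squares, the sum in the exponent equals $n (\operatorname{av}(\vec{x}) - \mu_0)^2$, so the likelihood ratio is small exactly when the empirical mean is far from $\mu_0$. Under the null hypothesis the empirical mean is Gaussian with mean $\mu_0$ and variance $\sigma^2 / n$, so rejecting when $|\operatorname{av}(\vec{x}) - \mu_0| \geq (\sigma / \sqrt{n}) \, z_{1 - \alpha/2}$ gives size $\alpha$, where $z_{1 - \alpha/2}$ is the standard normal quantile. The Python below checks the algebra numerically; the function names and the sample data are made up for illustration.

```python
import math
from statistics import NormalDist, fmean

def likelihood_ratio(xs, mu0, sigma):
    """Lambda(x) = L(mu0) / L(av(x)) for i.i.d. Gaussian data with known sigma."""
    av = fmean(xs)
    diff = sum((x - mu0) ** 2 - (x - av) ** 2 for x in xs)
    return math.exp(-diff / (2 * sigma ** 2))

def likelihood_ratio_closed_form(xs, mu0, sigma):
    """Same ratio after expanding the squares: exp(-n (av - mu0)^2 / (2 sigma^2))."""
    av = fmean(xs)
    return math.exp(-len(xs) * (av - mu0) ** 2 / (2 * sigma ** 2))

def reject_null(xs, mu0, sigma, alpha):
    """Two-sided test: reject if |av - mu0| >= sigma / sqrt(n) * z_{1 - alpha/2}."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(fmean(xs) - mu0) >= sigma / math.sqrt(len(xs)) * z

data = [0.3, -0.1, 0.8, 0.4, 0.6]  # made-up sample for illustration
print(likelihood_ratio(data, 0.0, 1.0), likelihood_ratio_closed_form(data, 0.0, 1.0))
print(reject_null(data, 0.0, 1.0, 0.05), reject_null(data, -1.0, 1.0, 0.05))
```

In terms of the likelihood ratio itself, the same test rejects when $\Lambda(\vec{x}) \leq \exp(-z_{1-\alpha/2}^2 / 2)$.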
