Lecture 9: Hypothesis testing II
Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01
Jason Mezey, jgm45@cornell.edu, March 3, 2016 (Th) 8:40-9:55

Announcements: Homework #3 and #4 will be graded and available next week.

Summary of today's lecture: a review of hypothesis testing concepts, followed by an introduction to a particular class of hypothesis tests (Likelihood Ratio Tests), which will be important throughout our work in quantitative genetics.
[Figure: the inference framework. A system motivates a question; the question is addressed by an experiment that produces a sample; inference proceeds from the sample via probability models and statistics.]

Formally, the chain of concepts is: the experiment defines a sample space Ω with sigma algebra F and probability measure Pr(F); a random variable X(ω), ω ∈ Ω, taking values X = x with distribution Pr(X); a sample of size n, [X1 = x1, ..., Xn = xn], with sampling distribution Pr([X1 = x1, ..., Xn = xn]); a statistic T(x) with its own sampling distribution; a parameterized probability model with parameter θ ∈ Θ; a null hypothesis H0 : θ = c; and the sampling distribution of the statistic conditional on the null, Pr(T(X)|H0 : θ = c).
Inference: the process of reaching a conclusion about the true probability distribution (from an assumed family of probability distributions indexed by parameters) on the basis of a sample.

Terms covered so far: Sigma Algebra, Probability Measure, Random Vector, Parameterized Probability Model, Sample, Sampling Distribution, Statistic, Statistic Sampling Distribution, Estimator, Estimator Sampling Distribution, Null Hypothesis, Sampling Distribution Conditional on the Null, p-value, One- or Two-Tailed, Type I Error, Critical Value, Reject / Do Not Reject, 1 - Type I, Type II Error, Power, Alternative Hypothesis.
To build the hypothesis testing framework, we start with a definition of a hypothesis: an assumption about a parameter of the probability model. The null hypothesis H0 is a hypothesis which states that a parameter takes a specific value, i.e. a constant:

H0 : θ = c

To test the null, we consider the sampling distribution of the statistic, assuming the null hypothesis is true: Pr(T(X)|H0 : θ = c).

The p-value is the probability of obtaining a value of the statistic T(x), or a value more extreme, conditional on H0 being true:

pval = Pr(|T(X)| ≥ |t| | H0 : θ = c)

so that the p-value is a function taking the statistic to the unit interval: pval(T(x)) : T(x) → [0, 1], computed from the null sampling distribution Pr(T(x) | H0).
As an example, suppose the null hypothesis is H0 : µ = 0 and we observe two samples: Sample I with T(x) = -0.755 and Sample II with T(x) = 2.8.

[Figure: the null sampling distribution Pr(T(x) | H0) with critical values marked. For a one-sided test at α = 0.05 the critical value is cα = 1.64; for a two-sided test at α = 0.05 it is cα = 1.96. Sample I: one-sided p = 0.77, two-sided p = 0.45 (do not reject). Sample II: one-sided p = 0.0025 (reject).]
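The p-values in the example above can be reproduced with a short sketch using the standard normal CDF built from `math.erf` (a minimal illustration, not the lecture's code; it assumes T(x) is standard normal under H0):

```python
import math

def norm_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_one_sided(t):
    """One-sided (upper-tail) p-value: Pr(T(X) >= t | H0)."""
    return 1.0 - norm_cdf(t)

def p_two_sided(t):
    """Two-sided p-value: Pr(|T(X)| >= |t| | H0)."""
    return 2.0 * (1.0 - norm_cdf(abs(t)))

# Sample I: T(x) = -0.755; Sample II: T(x) = 2.8
print(round(p_one_sided(-0.755), 2))  # 0.77
print(round(p_two_sided(-0.755), 2))  # 0.45
print(round(p_one_sided(2.8), 4))     # 0.0026 (~0.0025 on the slide)
```

Note that Sample I gives a large p-value either way (do not reject), while Sample II is extreme in the upper tail (reject at α = 0.05).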
A hypothesis test has two possible outcomes: reject or cannot reject H0. The possible results are:

                     H0 is true           H0 is false
cannot reject H0     1 - α (correct)      β, type II error
reject H0            α, type I error      1 - β, power (correct)
For a chosen Type I error level α, the critical value cα is defined by (shown here for a one-sided test; the definition is analogous for two-sided):

1 − α = ∫_{−∞}^{cα} Pr(T(x)|θ = c) dT(x),    α = ∫_{cα}^{∞} Pr(T(x)|θ = c) dT(x)

When the true parameter value θ differs from c, the Type II error β and the power 1 − β are:

β = ∫_{−∞}^{cα} Pr(T(x)|θ) dT(x),    1 − β = ∫_{cα}^{∞} Pr(T(x)|θ) dT(x)
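For a one-sided z-test these integrals have closed forms in the standard normal CDF Φ: α = 1 − Φ(cα), and if the statistic is centered at a standardized effect δ under the true θ, then 1 − β = 1 − Φ(cα − δ). A small numeric sketch (δ = 2.0 is a hypothetical value chosen for illustration, not from the lecture):

```python
import math

def norm_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

c_alpha = 1.64                      # one-sided critical value for alpha = 0.05
alpha = 1.0 - norm_cdf(c_alpha)     # Type I error: upper-tail area under H0

# Under the alternative, suppose T(x) ~ N(delta, 1) with
# standardized effect delta (hypothetical, for illustration).
delta = 2.0
beta = norm_cdf(c_alpha - delta)    # Type II error: Pr(T(x) < c_alpha | theta)
power = 1.0 - beta                  # power: Pr(T(x) >= c_alpha | theta)

print(alpha)   # ~0.05
print(power)   # ~0.64 for this delta
```

Moving δ farther from 0 (a true parameter farther from c) pushes the alternative distribution past cα and increases the power.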
Note that power is not something we observe from the outcome of a hypothesis test: we reject or we cannot reject. We also cannot calculate power directly (since it depends on the actual parameter value, which is unknown). However, even without knowing how far the true parameter is from the H0, we can make decisions to increase power depending on how we set up our experiment and test:

- Increasing the Type I error level α increases the power 1 − β (trade-off!)
- Increasing the sample size n increases the power 1 − β for a fixed α
One last concept in hypothesis testing: the alternative hypothesis (HA), i.e. where we suspect our true parameter value will fall if our H0 is incorrect. For our example above, a one-sided test corresponds to HA : µ > 0 and a two-sided test to HA : µ ≠ 0.
Since a statistic is a function on a sample (sample -> statistic), and a p-value is a function on a statistic, we also have a probability distribution on our p-values. This distribution is uniform on [0, 1] when the null hypothesis is true (!!) regardless of the statistic or hypothesis test: Pr(pval) ~ Unif[0, 1] under H0.
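This uniformity can be checked by simulation: draw repeated samples under H0, compute a p-value for each, and the p-values come out approximately uniform. A minimal sketch using a z-statistic for H0 : µ = 0 with known σ = 1 (illustrative, not the lecture's code):

```python
import math
import random

def norm_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

random.seed(0)
n, sims = 25, 2000
pvals = []
for _ in range(sims):
    # Sample under H0: X ~ N(0, 1), so H0 : mu = 0 is true
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    t = (sum(x) / n) * math.sqrt(n)               # z-statistic (sigma = 1 known)
    pvals.append(2.0 * (1.0 - norm_cdf(abs(t))))  # two-sided p-value

# Under H0, pval ~ Unif[0, 1]: about 5% of p-values fall below 0.05,
# about half below 0.5, etc.
frac_05 = sum(p < 0.05 for p in pvals) / sims
frac_50 = sum(p < 0.50 for p in pvals) / sims
print(frac_05, frac_50)
```

This is exactly why rejecting at p < α controls the Type I error at level α: under H0, the event {pval < α} has probability α.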
Note that there are an unlimited number of ways to define hypothesis tests, but some are preferable because they have desirable properties: good power, nice mathematical properties, etc. A particularly important class (and the class we will largely be concerned with in this class) are Likelihood Ratio Tests (LRT). They have a confusing structure at first glance; however, just remember these are simply a statistic (sample in, number out) that we use like any other statistic, i.e. with the number out, we can calculate a p-value, etc.
The LRT statistic is a ratio of two maximized likelihoods: the numerator is the maximum of the likelihood given the sample restricted to the set of parameters defined by H0, which we symbolize by Θ0; the denominator is the maximum of the likelihood given the sample restricted to the set of parameters defined by HA, symbolized by Θ1 = ΘA, or more usually the values of the whole parameter space (in the cases we will consider, if H0 : µ = c then HA : µ ≠ c).
Λ = L(θ̂0|x) / L(θ̂1|x)

where, for the likelihood L(θ|x):

θ̂0 = argmax_{θ ∈ Θ0} L(θ|x)
θ̂1 = argmax_{θ ∈ Θ1} L(θ|x)
For example, for a sample from a normal distribution with null hypothesis H0 : µ = c, the likelihood is

L(θ|x) = (1 / (2πσ²)^(n/2)) · e^( Σ_{i=1}^{n} −(xi − µ)² / (2σ²) )

and the LRT statistic is

LRT = Λ = [ (1 / (2π·MLE(σ̂²))^(n/2)) · e^( Σ_{i=1}^{n} −(xi − H0(µ))² / (2·MLE(σ̂²)) ) ] / [ (1 / (2π·MLE(σ̂²))^(n/2)) · e^( Σ_{i=1}^{n} −(xi − MLE(µ̂))² / (2·MLE(σ̂²)) ) ]

where

MLE(µ̂) = mean(x) = (1/n) Σ_{i=1}^{n} xi
MLE(σ̂²) = (1/n) Σ_{i=1}^{n} (xi − mean(x))²
H0(µ) = c
To use the LRT for a hypothesis test, we need the distribution of the statistic under the null (note likelihood ratio tests are two-sided tests). As the sample size grows (n → ∞), the sampling distribution of the following transformed statistic under the null approaches a chi-square distribution with d.f. degrees of freedom (in the specific case on the last slide, the d.f. = k = 1!!):

LRT = −2ln(Λ) = −2ln[ L(θ̂0|x) / L(θ̂1|x) ]

Pr(LRT|H0 : θ = c) → χ²_{d.f.}   (n → ∞)
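For the normal-mean example, plugging the MLEs into the ratio and taking −2 ln collapses the statistic to n·(mean(x) − c)² / σ̂², which is compared to χ²₁. A small sketch, with the χ²₁ tail probability computed via erfc (the data values are hypothetical, chosen only for illustration):

```python
import math

def lrt_normal_mean(x, c):
    """-2 ln(Lambda) for H0: mu = c with a normal sample, using the
    plug-in form from the slides: n * (mean(x) - c)^2 / sigma2_hat."""
    n = len(x)
    mean_x = sum(x) / n
    sigma2_hat = sum((xi - mean_x) ** 2 for xi in x) / n  # MLE of sigma^2
    return n * (mean_x - c) ** 2 / sigma2_hat

def chi2_1_sf(q):
    """Upper-tail probability Pr(chi^2_1 > q) = erfc(sqrt(q / 2))."""
    return math.erfc(math.sqrt(q / 2.0))

# Hypothetical sample, for illustration only
x = [0.3, -0.1, 1.2, 0.8, 0.5, -0.4, 0.9, 0.2, 0.7, 1.1]
lrt = lrt_normal_mean(x, c=0.0)   # test H0 : mu = 0
pval = chi2_1_sf(lrt)             # asymptotic chi^2 (d.f. = 1) p-value
print(lrt, pval)
```

Statistic in, number out, p-value computed from the (here asymptotic) null distribution: the LRT is used exactly like any other test statistic.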
Note the difference between a case where the statistic only approaches a distribution as n → ∞ and a case where we know the exact distribution for any size n (i.e., for the former, the null distribution is approximate). Why do we care about knowing the null distribution of the LRT statistic (since we need to know this distribution to do the hypothesis test!)? LRTs have good properties for many types of cases. Moreover, in some important cases, the distribution under the null for ANY sample size n is known exactly for a specified transformation of the likelihood ratio statistic (e.g. t-tests, tests of the linear regression slope, etc.), that is, these tests are forms of likelihood ratio test statistic!!!