Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01


SLIDE 1

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01

Lecture 9: Hypothesis testing II

Jason Mezey jgm45@cornell.edu March 3, 2016 (Th) 8:40-9:55

SLIDE 2

Announcements

  • Homework #3, #4 will be graded and available next week
SLIDE 3

Summary of lecture 9

  • Last lecture, we began our discussion of hypothesis testing
  • Today, we will review critical concepts and consider a particular class of hypothesis tests (Likelihood Ratio Tests)
  • Next lecture (!!!) we will begin our discussion of concepts in quantitative genetics

SLIDE 4

Conceptual Overview

[Diagram relating: System, Question, Experiment, Sample, Assumptions, Prob. Models, Inference, Statistics]

SLIDE 5

[Diagram of the framework: an Experiment defines a random variable X(ω), ω ∈ Ω, with probability measure Pr(F) on the sigma algebra F; X = x, Pr(X) → a sample [X1 = x1, ..., Xn = xn] of size n with sampling distribution Pr([X1 = x1, ..., Xn = xn]); a statistic T(x) with parameter θ ∈ Θ; a null hypothesis H0 : θ = c with conditional sampling distribution Pr(T(X)|H0 : θ = c); Hypothesis Tests]

SLIDE 6

Overview of essential concepts

  • Inference - the process of reaching a conclusion about the true probability distribution (from an assumed family of probability distributions indexed by parameters) on the basis of a sample
  • System, Experiment, Experimental Trial, Sample Space, Sigma Algebra, Probability Measure, Random Vector, Parameterized Probability Model, Sample, Sampling Distribution, Statistic, Statistic Sampling Distribution, Estimator, Estimator Sampling Distribution, Null Hypothesis, Sampling Distribution Conditional on the Null, p-value, One- or Two-Tailed, Type I Error, Critical Value, Reject / Do Not Reject, 1 − Type I, Type II Error, Power, Alternative Hypothesis

SLIDE 7

Review of hypothesis testing

  • To build a framework to answer a question about a parameter, we need to start with a definition of hypothesis
  • Hypothesis - an assumption about a parameter
  • More specifically, we are going to start our discussion with a null hypothesis, which states that a parameter takes a specific value, i.e. a constant:

    H0 : θ = c

  • Once we have assumed a null hypothesis, we know the probability distribution of the statistic, assuming the null hypothesis is true:

    Pr(T(X) = t | θ = c)

  • p-value - the probability of obtaining a value of a statistic T(x), or more extreme, conditional on H0 being true:

    pval = Pr(|T(X)| ≥ |t| | H0 : θ = c)

  • Note that a p-value is a function of a statistic (!!):

    pval(T(x)) : T(x) → [0, 1]

SLIDE 8

Assume H0 is correct (!): H0 : µ = 0

[Figure: the null sampling distribution Pr(T(x) | H0), with critical values cα = 1.64 (one-sided test, α = 0.05) and cα = 1.96 (two-sided test, α = 0.05)]

Sample I: T(x) = −0.755 → p = 0.77 (one-sided), p = 0.45 (two-sided)
Sample II: T(x) = 2.8 → p = 0.0025 (one-sided), p = 0.005 (two-sided)
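The p-values on this slide can be checked directly from the standard normal CDF. A minimal sketch (assuming, as in the slide's example, that T(x) is standard normal under H0; the function names are ours):

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_one_sided(t):
    """One-sided (upper-tail) p-value for HA: mu > 0."""
    return 1.0 - norm_cdf(t)

def p_two_sided(t):
    """Two-sided p-value: probability of a statistic as or more extreme than |t|."""
    return 2.0 * (1.0 - norm_cdf(abs(t)))

# Sample I: T(x) = -0.755; Sample II: T(x) = 2.8 (values from the slide)
for t in (-0.755, 2.8):
    print(t, round(p_one_sided(t), 4), round(p_two_sided(t), 4))
```

Running this reproduces the four p-values shown above (0.77 and 0.45 for Sample I; 0.0025 and 0.005 for Sample II, to slide precision).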

SLIDE 9

Results of hypothesis decisions I: when H0 is correct (!!)

  • There are only two possible decisions we can make as a result of our hypothesis test: reject or cannot reject

                        H0 is true          H0 is false
    cannot reject H0    1 − α (correct)     β, type II error
    reject H0           α, type I error     1 − β, power (correct)

[Figure: the null sampling distribution Pr(T(x) | H0)]

SLIDE 10

Results of hypothesis decisions I: when H0 is correct (!!)

  • There are only two possible decisions we can make as a result of our hypothesis test: reject or cannot reject

                        H0 is true          H0 is false
    cannot reject H0    1 − α (correct)     β, type II error
    reject H0           α, type I error     1 − β, power (correct)

[Figure: Pr(T(x) | H0) with one-sided critical value cα = 1.64]

SLIDE 11

Results of hypothesis decisions I: when H0 is correct (!!)

  • There are only two possible decisions we can make as a result of our hypothesis test: reject or cannot reject

                        H0 is true          H0 is false
    cannot reject H0    1 − α (correct)     β, type II error
    reject H0           α, type I error     1 − β, power (correct)

[Figure: Pr(T(x) | H0) with one-sided critical value cα = 1.64]

SLIDE 12

Assume H0 is wrong (!): the true µ = 3, while H0 : µ = 0

[Figure: the null sampling distribution Pr(T(x) | H0), with critical values cα = 1.64 (one-sided test, α = 0.05) and cα = 1.96 (two-sided test, α = 0.05)]

Sample I: T(x) = −0.755 → p = 0.77 (one-sided), p = 0.45 (two-sided)
Sample II: T(x) = 2.8 → p = 0.0025 (one-sided), p = 0.005 (two-sided)

SLIDE 13

Results of hypothesis decisions II: when H0 is wrong (!!)

  • There are only two possible decisions we can make as a result of our hypothesis test: reject or cannot reject

                        H0 is true          H0 is false
    cannot reject H0    1 − α (correct)     β, type II error
    reject H0           α, type I error     1 − β, power (correct)

[Figure: the null sampling distribution Pr(T(x) | H0)]

SLIDE 14

Results of hypothesis decisions II: when H0 is wrong (!!)

  • There are only two possible decisions we can make as a result of our hypothesis test: reject or cannot reject

                        H0 is true          H0 is false
    cannot reject H0    1 − α (correct)     β, type II error
    reject H0           α, type I error     1 − β, power (correct)

[Figure: Pr(T(x) | H0) with one-sided critical value cα = 1.64]

SLIDE 15

Results of hypothesis decisions II: when H0 is wrong (!!)

  • There are only two possible decisions we can make as a result of our hypothesis test: reject or cannot reject

                        H0 is true          H0 is false
    cannot reject H0    1 − α (correct)     β, type II error
    reject H0           α, type I error     1 − β, power (correct)

[Figure: Pr(T(x) | H0) with one-sided critical value cα = 1.64]

SLIDE 16

Technical definitions

  • Technically, the correct decision given H0 is true is (for one-sided, similar for two-sided):

    1 − α = ∫_{−∞}^{cα} Pr(T(x)|θ = c) dT(x)

  • Type I error (H0 is true) is (for one-sided):

    α = ∫_{cα}^{∞} Pr(T(x)|θ = c) dT(x)

  • Type II error given H0 is false is (for one-sided):

    β = ∫_{−∞}^{cα} Pr(T(x)|θ) dT(x)

  • Power (the correct decision given H0 is false) is (for one-sided):

    1 − β = ∫_{cα}^{∞} Pr(T(x)|θ) dT(x)
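For a normal statistic these integrals reduce to differences of normal CDFs. A minimal sketch, assuming T(x) ~ N(0, 1) under H0 and T(x) ~ N(µ, 1) under the truth, with the hypothetical values cα = 1.64 and true µ = 3 (matching the deck's running example):

```python
import math

def norm_cdf(z, mu=0.0):
    """CDF of a normal with mean mu and standard deviation 1."""
    return 0.5 * (1.0 + math.erf((z - mu) / math.sqrt(2.0)))

c_alpha = 1.64   # one-sided critical value giving alpha ~ 0.05
mu_true = 3.0    # hypothetical true parameter value (H0: mu = 0)

correct_under_null = norm_cdf(c_alpha)   # 1 - alpha: mass below c_alpha under H0
alpha = 1.0 - correct_under_null         # type I error
beta = norm_cdf(c_alpha, mu=mu_true)     # type II error: mass below c_alpha under the truth
power = 1.0 - beta                       # 1 - beta
print(round(alpha, 4), round(beta, 4), round(power, 4))
```

Here α ≈ 0.05 by construction, while β and power depend on how far the true µ sits from the null.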

SLIDE 17

Important concepts I

  • REMEMBER (!!): there are two possible outcomes of a hypothesis test: we reject or we cannot reject
  • We never know for sure whether we are right (!!)
  • If we cannot reject, this does not mean H0 is true (why?)
  • Note that we can control the level of type I error because we decide on the value of α
SLIDE 18

Important concepts II

  • Unlike type I error α, which we can set, we cannot control power 1 − β directly (since it depends on the actual parameter value)
  • However, since power 1 − β depends on how far the true value of the parameter is from the H0, we can make decisions to increase power depending on how we set up our experiment and test:
    • Greater sample size = greater power
    • Greater the value of α that we set = greater power 1 − β (trade-off!)
    • One-sided or two-sided test (which is more powerful?)
    • How we define our statistic (a more technical concept...)
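The sample-size bullet can be made concrete for a one-sided z-test with known σ, where power = 1 − Φ(cα − µ√n/σ). A sketch with a hypothetical true effect µ = 0.5 (the test setup is ours, chosen to match the deck's simplified normal example):

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def power_one_sided(mu_true, n, c_alpha=1.64, sigma=1.0):
    """Power of a one-sided z-test of H0: mu = 0 with known sigma.
    The statistic sqrt(n) * mean(x) / sigma is N(sqrt(n) * mu_true / sigma, 1)
    when the true mean is mu_true."""
    shift = math.sqrt(n) * mu_true / sigma
    return 1.0 - norm_cdf(c_alpha - shift)

# Power grows with sample size n (hypothetical true effect mu = 0.5) ...
for n in (5, 10, 20, 40):
    print(n, round(power_one_sided(0.5, n), 3))
# ... and with a larger alpha (smaller critical value) -- the trade-off
print(round(power_one_sided(0.5, 10, c_alpha=1.28), 3))
```

Both levers from the bullets show up directly: increasing n or increasing α (lowering cα) raises 1 − β.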

SLIDE 19

Final general concept

  • We need one more concept to complete our formal introduction to hypothesis testing: the alternative hypothesis (HA)
  • This defines the set (interval) of values that we are concerned with, i.e. where we suspect our true parameter value will fall if our H0 is incorrect, i.e. for our example above: HA : µ > 0 (one-sided) or HA : µ ≠ 0 (two-sided)
  • A complete hypothesis testing setup includes both H0 and HA
  • HA makes the concept of one- and two-tailed explicit
  • REMINDER (!!): If you reject H0 you cannot say HA is true (!!)

SLIDE 20

Understanding p-values...

  • Inference - the process of reaching a conclusion about the true probability distribution (from an assumed family of probability distributions indexed by parameters) on the basis of a sample
  • System, Experiment, Experimental Trial, Sample Space, Sigma Algebra, Probability Measure, Random Vector, Parameterized Probability Model, Sample, Sampling Distribution, Statistic, Statistic Sampling Distribution, Estimator, Estimator Sampling Distribution, Null Hypothesis, Sampling Distribution Conditional on the Null, p-value, One- or Two-Tailed, Type I Error, Critical Value, Reject / Do Not Reject, 1 − Type I, Type II Error, Power, Alternative Hypothesis

SLIDE 21

What if we did an infinite number of experiments to test our null?

  • Note that since we have induced a probability model on our r.v. -> sample -> statistic, and a p-value is a function on a statistic, we also have a probability distribution on our p-values
  • These are the possible p-values we could obtain over an infinite number of different samples (sets of experimental trials)!
  • This distribution is always (!!) the uniform distribution on [0,1] when the null hypothesis is true (!!), regardless of the statistic or hypothesis test:

    Pr(pval) ∼ U[0, 1]
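This uniformity can be checked by simulation. A minimal sketch (our own construction, not from the lecture) repeating a two-sided z-test with known σ = 1 over many samples drawn with the null true:

```python
import math
import random

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

random.seed(1)
n, reps = 25, 10_000
pvals = []
for _ in range(reps):
    # Each iteration is one "experiment" with the null true: X ~ N(0, 1)
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    t = math.sqrt(n) * (sum(x) / n)               # z-statistic, known sigma = 1
    pvals.append(2.0 * (1.0 - norm_cdf(abs(t))))  # two-sided p-value

# If pval ~ U[0, 1], about 5% of p-values fall below 0.05, 50% below 0.5
frac_05 = sum(p < 0.05 for p in pvals) / reps
frac_50 = sum(p < 0.5 for p in pvals) / reps
print(frac_05, frac_50)
```

The two fractions land near 0.05 and 0.5, as the uniform distribution on [0, 1] predicts.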
SLIDE 22

Likelihood ratio tests I

  • Since there are an unlimited number of ways to define statistics, there are an unlimited number of ways to define hypothesis tests
  • However, some are more “optimal” than others in terms of having good power, having nice mathematical properties, etc.
  • The most widely used framework (which we will largely be concerned with in this class) is the Likelihood Ratio Test (LRT)
  • Like MLEs (and they use MLEs to calculate the statistic!), they have a confusing structure at first glance; however, just remember these are simply a statistic (sample in, number out) that we use like any other statistic, i.e., with the number out we can calculate a p-value, etc.

SLIDE 23

Likelihood ratio tests II

  • Likelihood Ratio Tests use a statistic with the following structure:

    Λ = L(θ̂0|x) / L(θ̂1|x)

  • L(θ|x) is the likelihood function
  • θ̂0 = argmax_{θ ∈ Θ0} L(θ|x) is the parameter that maximizes the likelihood given the sample restricted to the set of parameters defined by H0, which we symbolize by Θ0
  • θ̂1 = argmax_{θ ∈ Θ1} L(θ|x) is the parameter that maximizes the likelihood given the sample restricted to the set of parameters defined by HA (Θ1 = ΘA), or more usually the values Θ1 = ΘA ∪ Θ0
  • We will assume the following for the alternative set of hypotheses in the cases we will consider, for example: if H0 : µ = c then HA : µ ≠ c

SLIDE 24

Likelihood ratio tests III

  • Again, consider our simplified normal r.v. with a sample of size n
  • The likelihood is:

    L(θ|x) = (1 / (2πσ²)^(n/2)) exp( Σ_{i=1}^{n} −(x_i − µ)² / (2σ²) )

  • and the LRT statistic for H0 : µ = c is:

    Λ = [ (1 / (2π MLE(σ̂²))^(n/2)) exp( Σ_{i=1}^{n} −(x_i − H0(µ))² / (2 MLE(σ̂²)) ) ] / [ (1 / (2π MLE(σ̂²))^(n/2)) exp( Σ_{i=1}^{n} −(x_i − MLE(µ̂))² / (2 MLE(σ̂²)) ) ]

  • where we have:

    MLE(µ̂) = mean(x) = (1/n) Σ_{i=1}^{n} x_i
    MLE(σ̂²) = (1/n) Σ_{i=1}^{n} (x_i − mean(x))²
    H0(µ) = c
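A sketch of this statistic in code (the sample data are hypothetical and `lrt_lambda` is our own helper name). For the formula above, taking logs shows that −2 ln(Λ) reduces to n(µ̂ − c)²/σ̂², since Σ(x_i − c)² = Σ(x_i − µ̂)² + n(µ̂ − c)²:

```python
import math

def lrt_lambda(x, c):
    """Likelihood ratio Lambda for H0: mu = c in the normal model, plugging
    the unrestricted MLEs mu_hat = mean(x) and sigma2_hat = (1/n) * sum of
    squared deviations into both the numerator and the denominator, as on
    the slide."""
    n = len(x)
    mu_hat = sum(x) / n
    sigma2_hat = sum((xi - mu_hat) ** 2 for xi in x) / n
    def log_lik(mu):
        # Log of the normal likelihood at (mu, sigma2_hat)
        return (-0.5 * n * math.log(2.0 * math.pi * sigma2_hat)
                - sum((xi - mu) ** 2 for xi in x) / (2.0 * sigma2_hat))
    # Work on the log scale for numerical stability, then exponentiate
    return math.exp(log_lik(c) - log_lik(mu_hat))

# Hypothetical sample; test H0: mu = 0
x = [0.2, -0.4, 1.1, 0.9, 0.3, -0.1]
lam = lrt_lambda(x, 0.0)
print(lam, -2.0 * math.log(lam))  # Lambda <= 1 since the null is a restriction
```

Note that Λ ≤ 1 always: restricting the likelihood to Θ0 can only lower (or match) its maximum.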

SLIDE 25

Likelihood ratio tests IV

  • Remember, to calculate a p-value, we need to know the sampling distribution under the null (note likelihood ratio tests are two-sided tests)
  • If we consider the following transformation:

    LRT = −2 ln(Λ) = −2 ln( L(θ̂0|x) / L(θ̂1|x) )

  • It turns out that, under conditions that often apply, as the sample size (n → ∞), the sampling distribution of this statistic under the null approaches a chi-square distribution (in the specific case on the last slide, the d.f. = k = 1!!):

    Pr(LRT|H0 : θ = c) → χ²_{d.f.}
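This convergence can be checked by simulation (our own sketch, not from the lecture). For the normal example, −2 ln(Λ) = n(µ̂ − c)²/σ̂², and under H0 roughly 5% of simulated statistics should exceed the χ²₁ 95th percentile, 3.841:

```python
import math
import random

random.seed(2)
n, reps = 100, 10_000
exceed = 0
for _ in range(reps):
    # Sample with H0: mu = 0 true (true sigma = 1, but estimated by its MLE)
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    mu_hat = sum(x) / n
    sigma2_hat = sum((xi - mu_hat) ** 2 for xi in x) / n
    stat = n * mu_hat ** 2 / sigma2_hat   # -2 ln(Lambda) for this model
    exceed += stat > 3.841                # chi-square(1) 95th percentile

print(exceed / reps)  # close to the nominal 0.05 for n = 100
```

The fraction is slightly above 0.05 at finite n (the null distribution is only approximate), illustrating the point made on the next slide.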

SLIDE 26

Likelihood ratio tests V

  • There is a difference between a sampling distribution (under the null) that approaches a distribution as (n → ∞) and a case where we know the exact distribution for any size n (i.e., for the former, the null distribution is approximate)
  • Why use a test statistic where the distribution under the null is approximate (since we need to know this distribution to do the hypothesis test!)?
  • The approximation is very close even for moderate sized n
  • An LRT is a very versatile way of constructing a hypothesis test with “good” properties for many types of cases
  • Even better, for some specific tests, the sampling distribution under the null for ANY sample size n is known exactly for a specified transformation of the likelihood ratio statistic
  • Note that this is the case for many of the tests you are familiar with (t-tests, F-tests, tests of the linear regression slope, etc.), that is, these tests are forms of the likelihood ratio test statistic!!!

SLIDE 27

That’s it for today

  • Next week: foundational concepts in quantitative genomics!