Chapter 8: Hypothesis Testing, STK4011/9011: Statistical Inference Theory (PowerPoint presentation)




SLIDE 1

Chapter 8: Hypothesis Testing

STK4011/9011: Statistical Inference Theory

Johan Pensar

STK4011/9011: Statistical Inference Theory Chapter 8: Hypothesis Testing 1 / 36

SLIDE 2

Overview

1. Methods of Finding Tests
   - Likelihood Ratio Tests
   - Union-Intersection and Intersection-Union Tests

2. Methods of Evaluating Tests
   - Error Probabilities and The Power Function
   - Most Powerful Tests

Covers Sec 8.1, 8.2.1, 8.2.3, 8.3.1, and 8.3.2 in CB.

SLIDE 3

Hypothesis Testing

A hypothesis is here a statement about a population parameter θ. The goal of a hypothesis test is to decide, based on a sample from the population, which of two complementary hypotheses is true:

H0 : θ ∈ Θ0 (null hypothesis),    H1 : θ ∈ Θ0^c (alternative hypothesis).

Example: Let θ denote the average change in a patient's blood pressure after taking a drug. Then, an experimenter might want to test

H0 : θ = 0,    H1 : θ ≠ 0.

SLIDE 4

Hypothesis Testing

A hypothesis test is thus basically a rule that specifies:

- For which sample values x, H0 is accepted as true (or not rejected),
- For which sample values x, H0 is rejected and H1 is instead accepted as true.

The subset of the sample space for which H0 is rejected is called the rejection region. Typically, a hypothesis test is specified in terms of a test statistic W(X).

Example: A test could be that H0 is to be rejected if W(X) = X̄ is larger than some constant c, that is, the rejection region is {x : x̄ > c}.
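A rejection rule of this kind is easy to express directly in code. The sketch below (in Python, with made-up sample values and cutoff, not taken from the slides) implements the test "reject H0 iff x̄ > c":

```python
from statistics import mean

# Hypothetical illustration: a test given by the statistic W(X) = x̄
# and a cutoff c, with rejection region {x : x̄ > c}.
def reject_H0(sample, c):
    # True exactly when the sample falls in the rejection region
    return mean(sample) > c

x = [1.2, 0.8, 1.5, 1.1]      # x̄ = 1.15
print(reject_H0(x, c=1.0))    # True, since 1.15 > 1.0
```

Raising the cutoff c shrinks the rejection region, making the test more conservative.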

SLIDE 5

Overview

1. Methods of Finding Tests
   - Likelihood Ratio Tests
   - Union-Intersection and Intersection-Union Tests

2. Methods of Evaluating Tests
   - Error Probabilities and The Power Function
   - Most Powerful Tests

SLIDE 6

The Likelihood Ratio Test (LRT)

Recall, for a random sample X1, . . . , Xn from a population with pdf/pmf f(x | θ), the likelihood function is

L(θ | x) = f(x | θ) = ∏_{i=1}^n f(xi | θ).

Def: Let Θ denote the entire parameter space. The likelihood ratio test statistic for testing H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ0^c is

λ(x) = sup_{Θ0} L(θ | x) / sup_Θ L(θ | x),

and a likelihood ratio test (LRT) is any test that has a rejection region of the form {x : λ(x) ≤ c}, where 0 ≤ c ≤ 1.
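For a concrete case the two suprema can be computed in closed form. The Python sketch below (data values are made up for illustration) evaluates λ(x) for iid N(θ, 1) observations and the simple null H0 : θ = θ0, where the unrestricted supremum is attained at the MLE x̄:

```python
from math import exp, log, pi
from statistics import mean

def loglik(theta, xs):
    # log-likelihood of an iid N(theta, 1) sample
    return sum(-0.5 * log(2 * pi) - 0.5 * (x - theta) ** 2 for x in xs)

def lrt_statistic(theta0, xs):
    # λ(x) = L(θ0 | x) / L(x̄ | x), since the unrestricted MLE is x̄
    return exp(loglik(theta0, xs) - loglik(mean(xs), xs))

xs = [0.3, -0.1, 0.5, 0.2]    # hypothetical data, x̄ = 0.225
lam = lrt_statistic(0.0, xs)
print(round(lam, 4))          # close to 1: the data do not contradict H0
```

Values of λ(x) near 1 indicate the restricted maximum is almost as good as the unrestricted one; small values lead to rejection.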

SLIDE 7

The Rationale Behind the LRT

Let θ̂ be the MLE of θ obtained by doing unrestricted maximization of L(θ | x), that is, w.r.t. the entire parameter space Θ = Θ0 ∪ Θ0^c.

Let θ̂0 be the MLE of θ obtained by doing restricted maximization of L(θ | x) w.r.t. the "null parameter space" Θ0.

Then, the LRT statistic is

λ(x) = L(θ̂0 | x) / L(θ̂ | x),

and it has a small value if the observed sample is much more likely for a parameter point in Θ0^c than for any parameter point in Θ0 (in which case H0 is rejected).

SLIDE 8

Example: Normal LRT

SLIDE 9

Example: Exponential LRT

SLIDE 10

LRT with Sufficient Statistics

Let T(X) be a sufficient statistic for θ, that is, it contains all the information about θ available in the sample. Then, we can construct an LRT statistic λ*(t), based on T and with likelihood function L*(θ | t) = g(t | θ), which is equivalent to the LRT statistic λ(x) based on the complete sample.

Thm 8.2.4: If T(X) is a sufficient statistic for θ, and λ*(t) and λ(x) are the LRT statistics based on T and X, respectively, then λ*(T(x)) = λ(x) for every x in the sample space.
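The theorem can be checked numerically in a simple case. The Python sketch below (with made-up data) compares, for iid N(θ, 1) observations and H0 : θ = 0, the full-sample LRT statistic λ(x) against λ*(t) computed from the density of the sufficient statistic T = X̄ ∼ N(θ, 1/n):

```python
from math import exp, log, pi
from statistics import mean

def loglik(theta, xs):
    # log-likelihood of an iid N(theta, 1) sample
    return sum(-0.5 * log(2 * pi) - 0.5 * (x - theta) ** 2 for x in xs)

def lam_full(xs, theta0):
    # full-sample LRT statistic; unrestricted MLE is x̄
    return exp(loglik(theta0, xs) - loglik(mean(xs), xs))

def lam_suff(t, n, theta0):
    # LRT statistic from T = X̄ ~ N(θ, 1/n): the density ratio
    # g(t | θ0) / g(t | t), since the sup over θ is attained at θ = t
    return exp(-0.5 * n * (t - theta0) ** 2)

xs = [0.4, -0.2, 0.9, 0.1]   # hypothetical data
print(abs(lam_full(xs, 0.0) - lam_suff(mean(xs), len(xs), 0.0)) < 1e-12)
```

Both routes give the same number, as Thm 8.2.4 guarantees.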

SLIDE 11

Proof of Thm 8.2.4

SLIDE 12

Examples: LRT and Sufficiency

Let X1, . . . , Xn be iid N(θ, 1). Then, X̄ is a sufficient statistic for θ, and the likelihood function associated with X̄ ∼ N(θ, 1/n) can be used to conclude that an LRT of H0 : θ = θ0 versus H1 : θ ≠ θ0 rejects H0 for large values of |X̄ − θ0|.

Let X1, . . . , Xn be iid with an exponential pdf f(x | θ) = e^−(x−θ), x ≥ θ. Then, X(1) = min_i Xi is a sufficient statistic for θ, and the likelihood function of X(1) can be used to conclude that an LRT of H0 : θ ≤ θ0 versus H1 : θ > θ0 rejects H0 for large values of X(1).
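For the shifted exponential example the LRT statistic has a closed form depending only on X(1). The Python sketch below (made-up data) uses the fact that the likelihood exp(−Σ(xi − θ)) is maximized at θ = x(1) unrestrictedly and at θ = min(θ0, x(1)) under H0 : θ ≤ θ0, so λ(x) = exp(−n(x(1) − θ0)) when x(1) > θ0 and λ(x) = 1 otherwise:

```python
from math import exp

def exp_lrt(theta0, xs):
    # LRT statistic for H0: θ ≤ θ0 in the shifted exponential model
    n, x1 = len(xs), min(xs)
    return 1.0 if x1 <= theta0 else exp(-n * (x1 - theta0))

xs = [1.4, 2.0, 1.7, 3.1]     # hypothetical data, x(1) = 1.4
print(exp_lrt(1.0, xs))       # small when x(1) is far above θ0
print(exp_lrt(2.0, xs))       # 1.0: x(1) ≤ θ0, so H0 fits perfectly
```

Small λ, i.e. large X(1) relative to θ0, leads to rejection, matching the slide's conclusion.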

SLIDE 13

Nuisance Parameters

Likelihood ratio tests are also useful when nuisance parameters are present in the model, that is, parameters that are not of direct inferential interest. The presence of nuisance parameters does not affect the LRT construction method, but it might lead to a different test.

SLIDE 14

Example: Normal LRT with Unknown Variance

SLIDE 15

Overview

1. Methods of Finding Tests
   - Likelihood Ratio Tests
   - Union-Intersection and Intersection-Union Tests

2. Methods of Evaluating Tests
   - Error Probabilities and The Power Function
   - Most Powerful Tests

SLIDE 16

The Union-Intersection Method

The union-intersection method can be used when the null hypothesis can be expressed as an intersection:

H0 : θ ∈ ∩_{γ∈Γ} Θγ.

Assume that there are tests available for each of the problems of testing

H0γ : θ ∈ Θγ versus H1γ : θ ∈ Θγ^c,

with the rejection region of H0γ being {x : Tγ(x) ∈ Rγ}. Then, the rejection region for H0 is

∪_{γ∈Γ} {x : Tγ(x) ∈ Rγ}.

SLIDE 17

Example: Normal Union-Intersection Test

SLIDE 18

The Intersection-Union Method

The intersection-union method can be used when the null hypothesis can be expressed as a union:

H0 : θ ∈ ∪_{γ∈Γ} Θγ.

Assume that there are tests available for each of the problems of testing

H0γ : θ ∈ Θγ versus H1γ : θ ∈ Θγ^c,

with the rejection region of H0γ being {x : Tγ(x) ∈ Rγ}. Then, the rejection region for H0 is

∩_{γ∈Γ} {x : Tγ(x) ∈ Rγ}.
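The logical difference between the two constructions can be stated in one line each. In the Python sketch below (component test outcomes are made up), union-intersection rejects a null that is an intersection as soon as any component test rejects, while intersection-union rejects a null that is a union only if every component test rejects:

```python
def ui_reject(component_rejects):
    # H0 = intersection of the H0γ: rejection region is the UNION
    # of the component rejection regions, so ANY rejection suffices
    return any(component_rejects)

def iu_reject(component_rejects):
    # H0 = union of the H0γ: rejection region is the INTERSECTION
    # of the component rejection regions, so ALL must reject
    return all(component_rejects)

flags = [True, False, True]   # hypothetical component test outcomes
print(ui_reject(flags), iu_reject(flags))   # True False
```

This is why intersection-union tests are the more conservative of the two.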

SLIDE 19

Overview

1. Methods of Finding Tests
   - Likelihood Ratio Tests
   - Union-Intersection and Intersection-Union Tests

2. Methods of Evaluating Tests
   - Error Probabilities and The Power Function
   - Most Powerful Tests

SLIDE 20

Evaluating Tests

When deciding whether to reject or accept (or not reject) the null hypothesis H0, the experimenter might make a mistake. Hypothesis tests are evaluated by how likely they are to make different types of errors. While the error probabilities can be controlled to some extent, there is typically a trade-off between the different types of errors.

SLIDE 21

The Two Types of Errors in Hypothesis Testing

A hypothesis test of H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ0^c can make two types of mistakes:

- Type I error: H0 is rejected when θ ∈ Θ0,
- Type II error: H0 is accepted when θ ∈ Θ0^c.

                 Decision
Truth    Accept H0           Reject H0
H0       Correct decision    Type I error
H1       Type II error       Correct decision

SLIDE 22

The Power Function

Let R denote a test's rejection region (samples for which H0 is rejected):

- If θ ∈ Θ0, the probability of a Type I error is Pθ(X ∈ R),
- If θ ∈ Θ0^c, the probability of a Type II error is Pθ(X ∈ R^c) = 1 − Pθ(X ∈ R).

Thus, we have that

Pθ(X ∈ R) = P(Type I Error) if θ ∈ Θ0,  and  Pθ(X ∈ R) = 1 − P(Type II Error) if θ ∈ Θ0^c.

Def: The power function of a hypothesis test with rejection region R is the function of θ defined by β(θ) = Pθ(X ∈ R).
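For the running normal example the power function is available in closed form. The Python sketch below (cutoff, sample size, and evaluation points are made up for illustration) computes β(θ) = Pθ(X̄ > c) for the test "reject H0 iff x̄ > c" when X̄ ∼ N(θ, σ²/n):

```python
from math import erf, sqrt

def Phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

def power(theta, c, n, sigma=1.0):
    # β(θ) = P_θ(X̄ > c), with X̄ ~ N(θ, σ²/n)
    return 1 - Phi((c - theta) / (sigma / sqrt(n)))

# hypothetical test: reject H0 iff x̄ > 0.5, with n = 9, σ = 1
print(round(power(0.0, 0.5, 9), 3))   # β(0): a Type I error probability
print(round(power(1.0, 0.5, 9), 3))   # β(1): power at an alternative
```

The same β(θ) reads as a Type I error probability for θ in Θ0 and as one minus a Type II error probability for θ in Θ0^c.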

SLIDE 23

Example: Binomial Power Function

SLIDE 24

Example: Normal Power Function

SLIDE 25

Controlling the Probability of Type I Errors

For a fixed sample size, it is typically impossible to make the probabilities of both types of error arbitrarily small. The common approach is to consider only tests that control the Type I error probability at a specified level; within this class of tests, one then wants to find the test with the smallest Type II error probability.

Def: For 0 ≤ α ≤ 1, a test with power function β(θ) is

- a size α test if sup_{θ∈Θ0} β(θ) = α,
- a level α test if sup_{θ∈Θ0} β(θ) ≤ α.
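In the one-sided normal example, the cutoff achieving size exactly α can be solved for directly. The Python sketch below (θ0, α, and n are made-up illustration values) uses the fact that for H0 : θ ≤ θ0 with X̄ ∼ N(θ, σ²/n), the supremum of β(θ) over Θ0 is attained at θ0, so sup_{θ≤θ0} β(θ) = Pθ0(X̄ > c) = α gives c = θ0 + z_{1−α} σ/√n:

```python
from statistics import NormalDist

def size_alpha_cutoff(theta0, alpha, n, sigma=1.0):
    # choose c so that "reject H0 iff x̄ > c" has size exactly α
    z = NormalDist().inv_cdf(1 - alpha)   # standard normal quantile z_{1-α}
    return theta0 + z * sigma / n ** 0.5

c = size_alpha_cutoff(0.0, 0.05, 25)
print(round(c, 3))   # z_0.95 ≈ 1.645, so c ≈ 0.329
```

Any larger cutoff would give a level α (but not size α) test, trading Type I error for Type II error.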

SLIDE 26

Important when Setting Up a Hypothesis Test

An experimenter typically specifies the level of the test she wants to use, with common choices being α = 0.1, 0.05, 0.01, thereby controlling the probability of Type I error. In this approach, the probability of Type II error is not controlled, and care must be taken when formulating H0 and H1:

- Assume that the experimenter expects/hopes that an experiment will support a particular hypothesis (the research hypothesis),
- H1 should be defined as the hypothesis the experimenter hopes to prove,
- By using a level α test with a small α, the experimenter is guarding against saying that the data support the research hypothesis when it is false.

SLIDE 27

Example: Size of Normal LRT

SLIDE 28

Example: Size of Exponential LRT

SLIDE 29

Unbiased Tests

A natural feature that we would like in a test is that it is more likely to reject H0 if θ ∈ Θ0^c than if θ ∈ Θ0.

Def: A test with power function β(θ) is unbiased if β(θ′) ≥ β(θ″) for every θ′ ∈ Θ0^c and θ″ ∈ Θ0.

Example: Let X1, . . . , Xn be a random sample from a N(µ, σ2) population with known σ2. We saw earlier that an LRT of H0 : θ ≤ θ0 versus H1 : θ > θ0 has a power function β(θ) that is increasing in θ. It follows that the test is unbiased, since

β(θ) > β(θ0) = max_{t≤θ0} β(t) for all θ > θ0.

SLIDE 30

Overview

1. Methods of Finding Tests
   - Likelihood Ratio Tests
   - Union-Intersection and Intersection-Union Tests

2. Methods of Evaluating Tests
   - Error Probabilities and The Power Function
   - Most Powerful Tests

SLIDE 31

Uniformly Most Powerful Test Within a Class

A class of level α tests controls the probability of Type I error. A good test in such a class would also have a small Type II error probability, that is, a large power function for θ ∈ Θ0^c.

Def: Let C be a class of tests for testing H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ0^c. A test in class C, with power function β(θ), is a uniformly most powerful (UMP) class C test if β(θ) ≥ β′(θ) for every θ ∈ Θ0^c and every β′(θ) that is a power function of a test in class C.

Here, class C will be the class of all level α tests, and the test is then called a UMP level α test.

SLIDE 32

The Neyman-Pearson Lemma

Thm 8.3.12: Consider testing H0 : θ = θ0 versus H1 : θ = θ1, where the pdf or pmf corresponding to θi is f(x | θi), i = 0, 1, using a test with rejection region R that satisfies

x ∈ R if f(x | θ1) > k f(x | θ0)  and  x ∈ R^c if f(x | θ1) < k f(x | θ0),   (1)

for some k ≥ 0, and

α = Pθ0(X ∈ R).   (2)

Then:

(i) (Sufficiency) Any test that satisfies (1) and (2) is a UMP level α test.
(ii) (Necessity) If there exists a test satisfying (1) and (2) with k > 0, then every UMP level α test is a size α test (satisfies (2)), and every UMP level α test satisfies (1) except perhaps on a set A satisfying Pθ0(X ∈ A) = Pθ1(X ∈ A) = 0.
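Condition (1) is just a comparison of two density values. The Python sketch below (densities, parameter values, and k are illustrative choices, not from the slides) applies the rule to a single N(θ, 1) observation; since normalizing constants cancel in the ratio, an unnormalized density suffices:

```python
from math import exp

def f(x, theta):
    # unnormalized N(theta, 1) density; constants cancel in the ratio
    return exp(-0.5 * (x - theta) ** 2)

def np_reject(x, theta0, theta1, k):
    # Neyman-Pearson rule (1): reject H0 when f(x|θ1) > k f(x|θ0)
    return f(x, theta1) > k * f(x, theta0)

# With θ0 = 0, θ1 = 1, the ratio f(x|1)/f(x|0) = e^{x - 1/2} is
# increasing in x, so the rule reduces to "reject for large x".
print(np_reject(2.0, 0.0, 1.0, k=1.0))    # True: ratio e^{1.5} > 1
print(np_reject(-1.0, 0.0, 1.0, k=1.0))   # False: ratio e^{-1.5} < 1
```

In practice k is tuned so that condition (2) holds, i.e. so that the rejection probability under θ0 equals the desired α.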

SLIDE 33

Connecting the Neyman-Pearson Lemma to Sufficiency

Cor 8.3.13: Consider the hypothesis problem posed in Theorem 8.3.12. Assume that T(X) is a sufficient statistic for θ and g(t | θi) is the pdf or pmf of T corresponding to θi, i = 0, 1. Then any test based on T with rejection region S (a subset of the sample space of T) is a UMP level α test if it satisfies

t ∈ S if g(t | θ1) > k g(t | θ0)  and  t ∈ S^c if g(t | θ1) < k g(t | θ0),   (3)

for some k ≥ 0, where

α = Pθ0(T ∈ S).   (4)

SLIDE 34

Composite Hypotheses

Hypotheses that specify only a single distribution for the sample X (such as in the N-P Lemma) are called simple hypotheses. In most cases, the hypotheses of interest specify more than one distribution, and are then called composite hypotheses:

- Hypotheses stating that a parameter is large, e.g. H : θ ≥ θ0, or small, e.g. H : θ < θ0, are called one-sided hypotheses,
- Hypotheses stating that a parameter is either large or small, e.g. H : θ ≠ θ0, are called two-sided hypotheses.

SLIDE 35

The Karlin-Rubin Theorem - Monotone Likelihood Ratio

A large class of problems that admit UMP level α tests involve one-sided hypotheses and pdfs or pmfs with the monotone likelihood ratio property.

Def: A family of pdfs/pmfs {g(t | θ) : θ ∈ Θ} for a univariate random variable T with real-valued parameter θ has a monotone likelihood ratio (MLR) if, for every θ2 > θ1, g(t | θ2)/g(t | θ1) is a monotone function of t on {t : g(t | θ1) > 0 or g(t | θ2) > 0}. Note that c/0 is defined as ∞ if c > 0.

An exponential family with g(t | θ) = h(t)c(θ)e^{w(θ)t} has an MLR if w(θ) is nondecreasing.

Thm 8.3.17: Consider testing H0 : θ ≤ θ0 versus H1 : θ > θ0. Assume that T is a sufficient statistic for θ and the family of pdfs/pmfs {g(t | θ) : θ ∈ Θ} of T has an MLR. Then for any t0, the test that rejects H0 iff T > t0 is a UMP level α test, where α = Pθ0(T > t0).
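The MLR property can be probed numerically on a grid. The Python sketch below (family, parameter values, and grid are illustrative choices) checks that the N(θ, 1) family, an exponential family with w(θ) = θ nondecreasing, has a nondecreasing likelihood ratio in t for θ2 = 1 > θ1 = 0:

```python
from math import exp

def g(t, theta):
    # unnormalized N(theta, 1) density; constants cancel in the ratio
    return exp(-0.5 * (t - theta) ** 2)

# evaluate g(t | θ2)/g(t | θ1) on a grid; here the ratio is e^{t - 1/2}
ts = [x / 10 for x in range(-30, 31)]
ratios = [g(t, 1.0) / g(t, 0.0) for t in ts]

# MLR check: the ratio should be monotone (here, nondecreasing) in t
print(all(a <= b for a, b in zip(ratios, ratios[1:])))
```

With MLR established, Thm 8.3.17 says a one-sided test based on T that rejects for large T is UMP at its size.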

SLIDE 36

Example: UMP Unbiased Normal Test
