

SLIDE 1

Lecture 8 Hypothesis Testing

I-Hsiang Wang

Department of Electrical Engineering, National Taiwan University
ihwang@ntu.edu.tw

December 20, 2016


SLIDE 2

In this lecture, we elaborate more on binary hypothesis testing, focusing on the following aspects:

1. Fundamental performance limits of binary hypothesis testing.
   • Log likelihood ratio, Neyman-Pearson test
   • Optimal trade-off between α and β
     (α: probability of false alarm / type-I error / false positive;
      β: probability of miss detection / type-II error / false negative)

2. Asymptotic performance of testing from n i.i.d. samples as n → ∞.
   • Stein's regime vs. Chernoff's regime
   • Error exponents

Along the way, we will introduce large deviation theory, an important set of probabilistic tools that not only help characterize the asymptotic performance limits of binary hypothesis testing but also play an important role in other problems.


SLIDE 3

1. Binary Hypothesis Testing: More Details
   • Recap: Log Likelihood, Neyman-Pearson Test
   • Tradeoff between α and β
   • Asymptotic Performance: Prelude


SLIDE 4

1. Binary Hypothesis Testing: More Details
   • Recap: Log Likelihood, Neyman-Pearson Test   (current)
   • Tradeoff between α and β
   • Asymptotic Performance: Prelude


SLIDE 5


Setup (Recap)

H0 : X ∼ P0   (null hypothesis, θ = 0)
H1 : X ∼ P1   (alternative hypothesis, θ = 1)        (1)

• Unknown binary parameter θ; data-generating distribution Pθ.
• Data/observation/sample X ∼ Pθ.
• Decision rule (randomized test) φ : X → [0, 1]; the outcome is θ̂ = 1 with probability φ(X).
• Loss function: 0-1 loss 1{θ̂ ≠ θ}.

Probability of Errors (prove the following as an exercise)

Probability of type-I error:   αφ = E_{X∼P0}[φ(X)]
Probability of type-II error:  βφ = E_{X∼P1}[1 − φ(X)]        (2)
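To make the formulas in (2) concrete, here is a minimal numerical sketch; the alphabet, the distributions P0, P1, and the test φ below are made-up examples, not from the lecture.

```python
import numpy as np

# Made-up example: a 4-letter alphabet with hypothetical distributions
P0 = np.array([0.4, 0.3, 0.2, 0.1])   # null distribution P0
P1 = np.array([0.1, 0.2, 0.3, 0.4])   # alternative distribution P1

# A randomized test: phi[x] = P(decide H1 | X = x)
phi = np.array([0.0, 0.2, 0.7, 1.0])

alpha = np.sum(P0 * phi)         # type-I error:  E_{X~P0}[phi(X)]
beta  = np.sum(P1 * (1 - phi))   # type-II error: E_{X~P1}[1 - phi(X)]
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")   # 0.300, 0.350
```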


SLIDE 6


Likelihood, Log Likelihood Ratio, and Likelihood Ratio Test

1. L(θ|x) ≜ Pθ(x), viewed as a function of the parameter θ given the data x, is called the likelihood function of θ.
2. For binary HT, the likelihood ratio is L(x) ≜ L(1|x)/L(0|x) = P1(x)/P0(x).
3. The log likelihood ratio (LLR) is l(x) ≜ log L(x) = log P1(x) − log P0(x).
4. A (randomized) likelihood ratio test (LRT) is a test φ_{τ,γ}, parametrized by constants τ ∈ ℝ and γ ∈ (0, 1), defined as follows:

   φ_{τ,γ}(x) = { 1  if l(x) > τ
                  γ  if l(x) = τ
                  0  if l(x) < τ

Remark: In this lecture, the logarithm is base 2.
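A small sketch of the randomized LRT defined above, keeping the base-2 convention; τ and γ are free parameters, and P0, P1 are the same made-up example as in the previous snippet.

```python
import numpy as np

def llr(P0, P1):
    """Per-symbol LLR l(x) = log2 P1(x) - log2 P0(x) (base 2, as in the slides)."""
    return np.log2(P1) - np.log2(P0)

def lrt(P0, P1, tau, gamma):
    """phi_{tau,gamma}(x): 1 if l(x) > tau, gamma if l(x) = tau, 0 if l(x) < tau."""
    l = llr(P0, P1)
    phi = np.where(l > tau, 1.0, 0.0)
    return np.where(np.isclose(l, tau), gamma, phi)

# Example with the made-up P0, P1 from the previous snippet:
P0 = np.array([0.4, 0.3, 0.2, 0.1])
P1 = np.array([0.1, 0.2, 0.3, 0.4])
print(lrt(P0, P1, tau=0.0, gamma=0.5))   # [0. 0. 1. 1.] (no tie at tau = 0 here)
```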


SLIDE 7


Performance of LRT

For an LRT φ_{τ,γ}, the probabilities of errors are

   α = P0{l(X) > τ} + γ P0{l(X) = τ} = L0{l > τ} + γ L0{l = τ}
   β = P1{l(X) < τ} + (1 − γ) P1{l(X) = τ} = L1{l ≤ τ} − γ L1{l = τ}        (3)

where L0, L1 are the distributions of the LLR under P0 and P1 respectively. The following facts will be useful later; the proofs are left as an exercise.

Proposition 1
For an LRT φ_{τ,γ}, its probabilities of type-I and type-II errors satisfy

   α ≤ 2^{−τ},   β ≤ 2^{τ},
   L0{l > τ} ≤ α ≤ L0{l ≥ τ},   and   L1{l < τ} ≤ β ≤ L1{l ≤ τ}.

Furthermore, the distributions of the LLR satisfy L1(l) = 2^{l} L0(l).
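The claims of Proposition 1 are easy to check numerically; here is a sketch on the same made-up example, where the asserts encode α ≤ 2^{−τ}, β ≤ 2^{τ}, and the change of measure.

```python
import numpy as np

P0 = np.array([0.4, 0.3, 0.2, 0.1])   # same made-up example as before
P1 = np.array([0.1, 0.2, 0.3, 0.4])
l = np.log2(P1) - np.log2(P0)

tau, gamma = 0.5, 0.3
alpha = np.sum(P0[l > tau]) + gamma * np.sum(P0[np.isclose(l, tau)])
beta  = np.sum(P1[l < tau]) + (1 - gamma) * np.sum(P1[np.isclose(l, tau)])
assert alpha <= 2.0 ** (-tau) and beta <= 2.0 ** tau

# Change of measure: the L1-mass of each LLR level {x : l(x) = v}
# equals 2^v times its L0-mass.
for v in np.unique(l):
    on_level = np.isclose(l, v)
    assert np.isclose(np.sum(P1[on_level]), 2.0 ** v * np.sum(P0[on_level]))
print("Proposition 1 verified on this example")
```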


SLIDE 8


Neyman-Pearson Theorem and Neyman-Pearson Test

Recall the Neyman-Pearson problem, which aims to find the lowest probability of type-II error under the constraint that the probability of type-I error is at most α:

   β*(α) ≜ inf { βφ : φ : X → [0, 1], αφ ≤ α }        (4)

Let us re-state the Neyman-Pearson theorem to emphasize the fact that β*(α) can be attained by a randomized LRT φ_{τ*,γ*}, called the Neyman-Pearson test.

Theorem 1 (Neyman-Pearson: (Randomized) LRT is Optimal)
For any α ∈ [0, 1], β*(α) is attained by a (randomized) LRT φ_{τ*,γ*}, where the pair (τ*, γ*) ∈ ℝ × [0, 1] is the unique solution to α = L0{l > τ} + γ L0{l = τ}. Hence the inf{·} in (4) is attainable and becomes min{·}.
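One plausible way to compute (τ*, γ*) on a finite alphabet is to accumulate L0-mass over the LLR levels in decreasing order until the target α is reached; the sketch below (the function name neyman_pearson is my own) does exactly that.

```python
import numpy as np

def neyman_pearson(P0, P1, alpha):
    """Return (tau, gamma) solving alpha = L0{l > tau} + gamma * L0{l = tau}."""
    l = np.log2(P1) - np.log2(P0)
    mass_above = 0.0                         # running value of L0{l > tau}
    for tau in np.sort(np.unique(l))[::-1]:  # LLR levels, largest first
        mass_at = np.sum(P0[np.isclose(l, tau)])
        if mass_above + mass_at >= alpha:
            gamma = (alpha - mass_above) / mass_at   # randomize on {l = tau}
            return tau, gamma
        mass_above += mass_at
    raise ValueError("alpha must lie in [0, 1]")

# Usage on the running made-up example:
P0 = np.array([0.4, 0.3, 0.2, 0.1])
P1 = np.array([0.1, 0.2, 0.3, 0.4])
print(neyman_pearson(P0, P1, alpha=0.2))   # tau ≈ 0.585, gamma ≈ 0.5
```

The randomization γ* matters exactly when α falls strictly between two values of L0{l > τ} achievable by deterministic thresholding.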


SLIDE 9

1. Binary Hypothesis Testing: More Details
   • Recap: Log Likelihood, Neyman-Pearson Test
   • Tradeoff between α and β   (current)
   • Asymptotic Performance: Prelude


SLIDE 10


Tradeoff between Probability of Type-I and Type-II Errors

Define the collection of all feasible pairs (probability of type-I error, probability of type-II error) as follows:

   R(P0, P1) ≜ {(αφ, βφ) | φ : X → [0, 1]}        (5)

Proposition 2 (Properties of R(P0, P1))
R(P0, P1) satisfies the following properties:
1. It is closed and convex.
2. It contains the diagonal line {(a, 1 − a) | a ∈ [0, 1]}.
3. It is symmetric w.r.t. the diagonal line: (α, β) ∈ R(P0, P1) ⟺ (1 − α, 1 − β) ∈ R(P0, P1).
4. The lower boundary (below the diagonal line) {(α, β*(α)) | α ∈ [0, 1]} is attained by the Neyman-Pearson test.


SLIDE 11


[Figure: the region R(P0, P1) in the (α, β)-plane, with α (PFA) on one axis and β (PMD) on the other; panel (a): |X| = ∞, panel (b): |X| < ∞.]

Intuition:

R(P0, P1) tells how "dissimilar" P0 and P1 are.

The larger R(P0, P1) is, the easier it is to distinguish P0 and P1.


SLIDE 12


Proof Sketch of Proposition 2

Closedness is due to the Neyman-Pearson theorem (the inf{·} is attainable and becomes min{·}) and the symmetry property (Property 3).

Convexity is proved by considering a convex combination φ^(λ) of two tests φ^(0) and φ^(1), where
   φ^(λ)(x) ≜ (1 − λ) φ^(0)(x) + λ φ^(1)(x).
Deriving its (α, β) immediately proves convexity.

For Property 2, consider a blind test which flips a biased (Ber(a)) coin to make the decision regardless of which x it observes; in other words, φ(x) = a for all x ∈ X. Then show that the type-I and type-II error probabilities are indeed a and 1 − a respectively.

Symmetry is proved by considering the opposite test φ̄ of a test φ achieving (α, β), where
   φ̄(x) = 1 − φ(x), ∀ x ∈ X.
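The three constructions above can be verified numerically; below is a sketch with made-up tests on the running example, checking convexity, the diagonal, and symmetry, all of which follow from the linearity of (αφ, βφ) in φ.

```python
import numpy as np

P0 = np.array([0.4, 0.3, 0.2, 0.1])   # running made-up example
P1 = np.array([0.1, 0.2, 0.3, 0.4])

def errors(phi):
    """(alpha, beta) of a randomized test phi, as in (2)."""
    return np.sum(P0 * phi), np.sum(P1 * (1 - phi))

phi0 = np.array([0.0, 0.2, 0.7, 1.0])   # two arbitrary tests
phi1 = np.array([1.0, 0.5, 0.0, 0.0])

# Convexity: the mixture's (alpha, beta) is the mixture of the (alpha, beta)'s.
lam = 0.4
a0, b0 = errors(phi0)
a1, b1 = errors(phi1)
am, bm = errors((1 - lam) * phi0 + lam * phi1)
assert np.isclose(am, (1 - lam) * a0 + lam * a1)
assert np.isclose(bm, (1 - lam) * b0 + lam * b1)

# Blind test: (alpha, beta) = (a, 1 - a) lies on the diagonal.
a = 0.3
assert np.allclose(errors(np.full(4, a)), (a, 1 - a))

# Opposite test: (alpha, beta) maps to (1 - alpha, 1 - beta).
assert np.allclose(errors(1 - phi0), (1 - a0, 1 - b0))
```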


SLIDE 13


Example 1
Draw R(P0, P1) for the following cases:
• P0 = Ber(a) and P1 = Ber(b).
• P0 = P1.
• P0 ⊥ P1, that is, ⟨P0, P1⟩ = 0.
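For the first case, one possible way to draw the region is to sweep the lower boundary β*(α) using the neyman_pearson() sketch from an earlier slide and mirror it across the diagonal (Property 3 of Proposition 2); a = 0.2 and b = 0.6 are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt
# assumes neyman_pearson() from the earlier sketch is in scope

a, b = 0.2, 0.6                 # arbitrary biases
P0 = np.array([1 - a, a])       # Ber(a)
P1 = np.array([1 - b, b])       # Ber(b)
l = np.log2(P1) - np.log2(P0)

alphas = np.linspace(0.0, 1.0, 201)
betas = np.empty_like(alphas)
for k, alpha in enumerate(alphas):
    tau, gamma = neyman_pearson(P0, P1, alpha)
    # beta of the NP test, as in (3)
    betas[k] = np.sum(P1[l < tau]) + (1 - gamma) * np.sum(P1[np.isclose(l, tau)])

plt.plot(alphas, betas, label="lower boundary beta*(alpha)")
plt.plot(1 - alphas, 1 - betas, label="upper boundary (mirror image)")
plt.xlabel("alpha (PFA)")
plt.ylabel("beta (PMD)")
plt.legend()
plt.show()
```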


SLIDE 14


Bounds on R(P0, P1)

We can run through Neyman-Pearson tests over all α ∈ [0, 1] and obtain the lower boundary {(α, β*(α)) | α ∈ [0, 1]} of the region R(P0, P1), which suffices to characterize the entire region.

However, this might be more challenging than you initially think, especially when the observation becomes high-dimensional (as in the decoding of a channel code). Hence, we often would like to have inner and outer bounds on the region R(P0, P1).

• Inner bounds are about achievability: come up with tests whose performance admits tractable bounds. Often we use a deterministic LRT with a carefully chosen threshold.
• Outer bounds are about the converse: show that the performance of all feasible tests must satisfy certain properties.


SLIDE 15


Outer Bounds

Lemma 1 (Weak Converse)
For all (α, β) ∈ R(P0, P1),
   d_b(1 − α ∥ β) ≤ D(P0 ∥ P1),   d_b(β ∥ 1 − α) ≤ D(P1 ∥ P0).

Remark: The weak converse bound is characterized by the information divergence. Interestingly, the information divergences are expectations of the LLR:
   D(P0 ∥ P1) = E_{X∼P0}[−l(X)] = −E_{L0}[l],   D(P1 ∥ P0) = E_{X∼P1}[l(X)] = E_{L1}[l].

Lemma 2 (Strong Converse)
For all (α, β) ∈ R(P0, P1) and τ ∈ ℝ,
   α + 2^{−τ} β ≥ L0{l > τ},   β + 2^{τ} α ≥ L1{l < τ}.

Remark: The strong converse requires knowledge of the distributions of the LLR, while the weak converse only needs the expected values of the LLR.
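Both lemmas can be sanity-checked numerically on the running made-up example; in the sketch below, d_b is the binary divergence in bits with the 0 log 0 = 0 convention, and φ is the test from the first snippet.

```python
import numpy as np

def d_b(p, q):
    """Binary divergence d_b(p || q) in bits (0 log 0 := 0)."""
    return sum(0.0 if u == 0 else u * np.log2(u / v)
               for u, v in ((p, q), (1 - p, 1 - q)))

P0 = np.array([0.4, 0.3, 0.2, 0.1])   # running made-up example
P1 = np.array([0.1, 0.2, 0.3, 0.4])
phi = np.array([0.0, 0.2, 0.7, 1.0])
l = np.log2(P1) - np.log2(P0)

alpha, beta = np.sum(P0 * phi), np.sum(P1 * (1 - phi))
D01 = np.sum(P0 * -l)   # D(P0||P1) = -E_{L0}[l]
D10 = np.sum(P1 * l)    # D(P1||P0) =  E_{L1}[l]
assert d_b(1 - alpha, beta) <= D01 and d_b(beta, 1 - alpha) <= D10  # weak
for tau in np.linspace(-2.0, 2.0, 9):                               # strong
    assert alpha + 2.0 ** (-tau) * beta >= np.sum(P0[l > tau]) - 1e-12
    assert beta + 2.0 ** tau * alpha >= np.sum(P1[l < tau]) - 1e-12
print("weak and strong converse hold for this test")
```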


SLIDE 16


Proof Sketch of Converse Bounds

Lemma 1: data processing decreases the information divergence.

Lemma 2: say (α, β) ∈ R(P0, P1) is achieved by a test φ. Then

   α + 2^{−τ} β = E_{X∼P0}[φ(X)] + 2^{−τ} E_{X∼P1}[1 − φ(X)]
                = E_{X∼P0}[φ(X)] + 2^{−τ} E_{X∼P0}[2^{l(X)} (1 − φ(X))]
                = E_{X∼P0}[(1 − 2^{l(X)−τ}) φ(X) + 2^{l(X)−τ}]
                ≥ E_{X∼P0}[((1 − 2^{l(X)−τ}) φ(X) + 2^{l(X)−τ}) 1{l(X) > τ}]
                ≥ E_{X∼P0}[1{l(X) > τ}] = L0{l > τ}.

(The first inequality holds because the integrand is nonnegative; the second because, on {l(X) > τ}, the factor 1 − 2^{l(X)−τ} is negative, so φ(X) ≤ 1 makes the integrand at least 1.)

The other bound can be shown similarly by swapping α ↔ β, P0 ↔ P1, τ ↔ −τ.


SLIDE 17

1. Binary Hypothesis Testing: More Details
   • Recap: Log Likelihood, Neyman-Pearson Test
   • Tradeoff between α and β
   • Asymptotic Performance: Prelude   (current)


SLIDE 18


i.i.d. Observations

So far we have focused on the general setting where the observation space X can be an arbitrary alphabet. In the following, we consider the product space X^n, a length-n observation sequence X^n drawn i.i.d. from one of the two distributions, and the two hypotheses

   H0 : X_i ∼ P0 i.i.d., i = 1, 2, …, n,   i.e.,   X^n ∼ P0^⊗n
   H1 : X_i ∼ P1 i.i.d., i = 1, 2, …, n,   i.e.,   X^n ∼ P1^⊗n        (6)

The corresponding probabilities of error for a test φ^(n) are

   α^(n) ≡ P_FA^(n) ≜ P{H1 is chosen | H0} = E_{X^n∼P0^⊗n}[φ^(n)(X^n)]
   β^(n) ≡ P_MD^(n) ≜ P{H0 is chosen | H1} = E_{X^n∼P1^⊗n}[1 − φ^(n)(X^n)]


SLIDE 19


Main Question

What does R(P0^⊗n, P1^⊗n) look like as n → ∞?

Since Neyman-Pearson tests (LRTs) characterize R, we shall begin by exploring the LRT under i.i.d. observations.


SLIDE 20


LRT under i.i.d. Observations

With i.i.d. observations, the LLR of a sequence x^n ∈ X^n is

   l(x^n) = log ∏_{i=1}^n P1(x_i)/P0(x_i) = ∑_{i=1}^n l(x_i),

and hence a (randomized) LRT becomes

   φ^(n)_{τ_n,γ_n}(x^n) = { 1/0   if ∑_{i=1}^n l(x_i) ≷ τ_n
                            γ_n   if ∑_{i=1}^n l(x_i) = τ_n        (7)

Furthermore, the probabilities of type-I and type-II errors are

   α^(n) = L0^⊗n{∑_{i=1}^n l_i > τ_n} + γ_n L0^⊗n{∑_{i=1}^n l_i = τ_n}
   β^(n) = L1^⊗n{∑_{i=1}^n l_i ≤ τ_n} − γ_n L1^⊗n{∑_{i=1}^n l_i = τ_n}        (8)
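A Monte Carlo sketch of (7) and (8) with γ_n = 0, i.e., a deterministic LRT that sends ties to H0; n, τ_n, and the distributions below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
P0 = np.array([0.4, 0.3, 0.2, 0.1])   # running made-up example
P1 = np.array([0.1, 0.2, 0.3, 0.4])
l = np.log2(P1) - np.log2(P0)
n, tau_n, trials = 10, 0.0, 100_000

# n-sample LLR sums under each hypothesis
S0 = l[rng.choice(len(P0), size=(trials, n), p=P0)].sum(axis=1)
S1 = l[rng.choice(len(P1), size=(trials, n), p=P1)].sum(axis=1)

alpha_n = np.mean(S0 > tau_n)    # decide H1 iff the LLR sum exceeds tau_n
beta_n  = np.mean(S1 <= tau_n)   # ties go to H0 since gamma_n = 0
# both errors should be small and shrink exponentially as n grows
print(f"alpha^(n) ~ {alpha_n:.4f}, beta^(n) ~ {beta_n:.4f}")
```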


SLIDE 21


By Proposition 1, we have

   L0^⊗n{∑_{i=1}^n l_i > τ_n} ≤ α^(n) ≤ L0^⊗n{∑_{i=1}^n l_i ≥ τ_n},
   L1^⊗n{∑_{i=1}^n l_i < τ_n} ≤ β^(n) ≤ L1^⊗n{∑_{i=1}^n l_i ≤ τ_n}.

So it seems the key is the asymptotic behavior of P^⊗n{∑_{i=1}^n Z_i ≥ ζ} for some Z_i i.i.d. ∼ P.

By the weak law of large numbers, we know that if ζ = n(E[Z] + ε) for some ε > 0, then P^⊗n{∑_{i=1}^n Z_i ≥ ζ} vanishes as n → ∞. But the question is, how fast does it converge to 0? O(n^{−1})? O(n^{−2})? O((log n)^{−1})? O(2^{−n})?

It turns out that large deviation theory establishes the asymptotic behavior as follows:

   P^⊗n{∑_{i=1}^n Z_i ≥ ζ} ≍ 2^{−n(I(ζ)+o(1))},   where I(ζ) ∈ [0, ∞).
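Before the detour, a quick empirical look at this exponential decay for Z_i i.i.d. ∼ Ber(1/2): by Cramér's theorem the exponent at per-sample threshold 0.5 + ε is the binary divergence d_b(0.5 + ε ∥ 0.5), about 0.029 bits for ε = 0.1. The slow convergence of the empirical exponent reflects the polynomial prefactor hidden in the o(1) term.

```python
import numpy as np

rng = np.random.default_rng(1)
eps, trials = 0.1, 200_000
for n in (20, 40, 80, 160):
    S = rng.integers(0, 2, size=(trials, n)).sum(axis=1)   # Ber(1/2) sums
    p_hat = np.mean(S >= n * (0.5 + eps))
    # empirical exponent -(1/n) log2 p_hat should creep toward ~0.029
    exponent = -np.log2(p_hat) / n if p_hat > 0 else float("inf")
    print(f"n = {n:4d}: p_hat = {p_hat:.5f}, exponent = {exponent:.4f}")
```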

Next, we first take a detour to introduce Large Deviation Theory and then come back to characterize the asymptotic performance limits of hypothesis testing.
