


Second-Order Asymptotics of Sequential Hypothesis Testing

Yonglong Li and Vincent Y. F. Tan June 4, 2020

1 / 20


Outline

◮ Problem Setup
◮ Literature Review
◮ Main Result
◮ Numerical Examples
◮ Proof of the Main Result

2 / 20


Sequential Hypothesis Testing

Problem Setup

◮ Binary hypothesis testing: $H_0 : P = P_0$ and $H_1 : P = P_1$, where $P_0$ and $P_1$ are probability distributions defined on the same alphabet $\mathcal{X}$.
◮ $\{X_k\}_{k=1}^{\infty}$ are i.i.d. random variables distributed according to $P$.
◮ $\mathcal{F}_n$ is the $\sigma$-algebra generated by $X_1^n$.
◮ $T$ is a stopping time adapted to the filtration $\{\mathcal{F}_n\}_{n=1}^{\infty}$, and $\mathcal{F}_T$ is the $\sigma$-algebra associated with $T$.
◮ $\delta$ is a $\{0,1\}$-valued function measurable with respect to $\mathcal{F}_T$; $\delta = i$ means that $H_i$ is declared to be the underlying hypothesis. A pair $(\delta, T)$ is called a sequential hypothesis test (SHT).

3 / 20


Sequential Hypothesis Testing

Problem Setup

◮ Error probabilities: $P_{1|0}(\delta, T) = P_0^T(\delta = 1)$ and $P_{0|1}(\delta, T) = P_1^T(\delta = 0)$.
◮ Expectation constraint on the sample size $T$: for any integer $n$, $\max_{i=0,1} \mathbb{E}_{P_i}[T] \le n$.

Sequential Probability Ratio Test

One important class of SHTs is the family of sequential probability ratio tests (SPRTs). Let $Y_k = \log \frac{p_0(X_k)}{p_1(X_k)}$ and $S_n = \sum_{k=1}^{n} Y_k$. For any pair of positive real numbers $\alpha$ and $\beta$, an SPRT with parameters $(\alpha, \beta)$ is defined as follows:
$$\delta = \begin{cases} 0 & \text{if } S_T > \beta, \\ 1 & \text{if } S_T < -\alpha, \end{cases} \qquad \text{where } T = \inf\{n \ge 1 : S_n \notin [-\alpha, \beta]\}.$$
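As a concrete illustration (not from the slides), the SPRT above is easy to simulate. The distributions, parameters, and function names below are my own choices for this sketch: $H_0 = \mathrm{Exp}(1)$, $H_1 = \mathrm{Exp}(2)$, and $\alpha = \beta = 5$.

```python
import math, random

def sprt(samples, alpha, beta, llr):
    """Run an SPRT: accumulate the log-likelihood ratio S_n until it leaves
    [-alpha, beta]. Returns (decision, stopping_time); 0 favors H0, 1 favors H1."""
    s = 0.0
    for n, x in enumerate(samples, start=1):
        s += llr(x)
        if s > beta:
            return 0, n
        if s < -alpha:
            return 1, n
    raise RuntimeError("ran out of samples before stopping")

# Hypothetical pair: p0 = Exp(1), p1 = Exp(2), so one sample contributes
# Y = log(p0/p1)(x) = log(1/2) + x to the random walk S_n.
def llr(x):
    return math.log(0.5) + x

random.seed(0)
alpha = beta = 5.0
trials = 2000
errors = 0
total_T = 0
for _ in range(trials):
    xs = (random.expovariate(1.0) for _ in range(10000))  # data drawn under H0
    d, t = sprt(xs, alpha, beta, llr)
    errors += (d == 1)       # declaring H1 under H0 is the error counted by P_{1|0}
    total_T += t
print(errors / trials, total_T / trials)
```

Under $H_0$ the drift of $S_n$ is $D(P_0\|P_1) = 1 - \log 2 \approx 0.307$, so the mean stopping time should be roughly $\beta / 0.307 \approx 16$ plus an overshoot correction, and the error frequency should be below Wald's bound $e^{-\alpha}$.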

4 / 20


Problem Setup

Error Exponents

Given a sequence of SHTs $\{(\delta_n, T_n)\}_{n=1}^{\infty}$ satisfying the expectation constraint, we are concerned with the error exponents $(E_0, E_1)$ defined as
$$E_0 = \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{P_{1|0}(\delta_n, T_n)} \quad \text{and} \quad E_1 = \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{P_{0|1}(\delta_n, T_n)}.$$

5 / 20



Literature Review

◮ In 1948, Wald and Wolfowitz showed that $E_0 \le D(P_1 \| P_0)$ and $E_1 \le D(P_0 \| P_1)$, and that these error exponents can be achieved by a sequence of SPRTs.
◮ In 1986 and 1988, Lotov studied the series expansion of the error probabilities for a sequence of SPRTs.
◮ For the fixed-length binary hypothesis testing problem, Strassen showed that the backoff from the optimal exponent $D(P_0 \| P_1)$ is of the order $\Theta(\frac{1}{\sqrt{n}})$ and characterized the implied constant as a function of the relative entropy variance and the Gaussian cumulative distribution function.

6 / 20


Main Result

Second-order Term under the Expectation Constraint

For fixed $\lambda \in [0, 1]$, let
$$F_n(\lambda) = \sup_{(\delta_n, T_n):\, \max_{i=0,1} \mathbb{E}_{P_i}[T_n] \le n} \lambda\Big(\log P_{1|0}(\delta_n, T_n) + nD(P_1 \| P_0)\Big) + (1-\lambda)\Big(\log P_{0|1}(\delta_n, T_n) + nD(P_0 \| P_1)\Big). \tag{1}$$
Let $\overline{F}(\lambda) = \limsup_{n\to\infty} F_n(\lambda)$ and $\underline{F}(\lambda) = \liminf_{n\to\infty} F_n(\lambda)$. If $\overline{F}(\lambda) = \underline{F}(\lambda)$, then we term this common value the second-order exponent of SHT under the expectation constraint, and we denote it simply as $F(\lambda)$.

7 / 20


Main Result

Second-order Asymptotics under the Expectation Constraint

Let $\{\alpha_k\}_{k=1}^{\infty}$ and $\{\beta_k\}_{k=1}^{\infty}$ be two increasing sequences of positive real numbers such that $\alpha_k \to \infty$ and $\beta_k \to \infty$ as $k \to \infty$. Let $T(\beta_k) = \inf\{n \ge 1 : S_n > \beta_k\}$ and $\tilde{T}(\alpha_k) = \inf\{n \ge 1 : -S_n > \alpha_k\}$. Furthermore, let the overshoots be $R_k = S_{T(\beta_k)} - \beta_k$ and $\tilde{R}_k = -S_{\tilde{T}(\alpha_k)} - \alpha_k$. It is known that

◮ if the true hypothesis is $H_0$, then $\{R_k\}_{k=1}^{\infty}$ converges in distribution to some random variable $R$, and the limit is independent of the choice of $\{\beta_k\}_{k=1}^{\infty}$;
◮ if the true hypothesis is $H_1$, then $\{\tilde{R}_k\}_{k=1}^{\infty}$ converges in distribution to some random variable $\tilde{R}$, and the limit is independent of the choice of $\{\alpha_k\}_{k=1}^{\infty}$.
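A quick way to get a feel for $R$ is Monte Carlo: simulate the random walk $S_n$ under $H_0$, record its overshoot over a level $\beta$, and average. The pair below ($H_0 = \mathrm{Exp}(1)$, $H_1 = \mathrm{Exp}(2)$) and the level $\beta = 8$ are my own illustrative choices, not taken from the slides; the estimates correspond to the quantities $A(P_0,P_1) = \mathbb{E}[R]$ and $B(P_0,P_1) = \log \mathbb{E}[e^{-R}]$ defined on the next slide.

```python
import math, random

random.seed(1)

def overshoot(beta):
    """Overshoot R = S_T - beta of the LLR walk over level beta, simulated under H0."""
    s = 0.0
    while True:
        x = random.expovariate(1.0)   # sample under H0 = Exp(1) (hypothetical choice)
        s += math.log(0.5) + x        # LLR against H1 = Exp(2); positive drift 1 - log 2
        if s > beta:
            return s - beta

samples = [overshoot(beta=8.0) for _ in range(20000)]
A = sum(samples) / len(samples)                                   # estimate of E[R]
B = math.log(sum(math.exp(-r) for r in samples) / len(samples))   # estimate of log E[e^{-R}]
print(A, B)
```

Since $R \ge 0$ we must have $A \ge 0$ and $B \le 0$, and Jensen's inequality forces $B \ge -A$; the empirical estimates satisfy these bounds exactly, which is a useful sanity check on the simulation.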

8 / 20


Main Result

Second-order Asymptotics under the Expectation Constraint

Define $A(P_0,P_1) = \mathbb{E}[R]$, $\tilde{A}(P_0,P_1) = \mathbb{E}[\tilde{R}]$, $B(P_0,P_1) = \log \mathbb{E}[e^{-R}]$, and $\tilde{B}(P_0,P_1) = \log \mathbb{E}[e^{-\tilde{R}}]$. We note that $\tilde{A}(P_0,P_1) = A(P_1,P_0)$ and $\tilde{B}(P_0,P_1) = B(P_1,P_0)$.

Theorem 1

Let $P_0$ and $P_1$ be such that $\max_{i=0,1} \mathbb{E}_{P_i}\big[\big(\log \frac{p_0(X_1)}{p_1(X_1)}\big)^2\big] < \infty$ and $\log \frac{p_0(X_1)}{p_1(X_1)}$ is non-arithmetic when $X_1 \sim P_0$. Then for every $\lambda \in [0,1]$,
$$\overline{F}(\lambda) = \underline{F}(\lambda) = F(\lambda) = \lambda\big(A(P_0,P_1) + \tilde{B}(P_0,P_1)\big) + (1-\lambda)\big(\tilde{A}(P_0,P_1) + B(P_0,P_1)\big).$$

9 / 20


Second-order Results

Remark 1

The rate of convergence of the optimal $\lambda$-weighted finite-length exponents $\sup_{(\delta_n,T_n)} -\frac{\lambda}{n}\log P_{1|0}(\delta_n,T_n) - \frac{1-\lambda}{n}\log P_{0|1}(\delta_n,T_n)$ to the $\lambda$-weighted exponent $\lambda D(P_1\|P_0) + (1-\lambda)D(P_0\|P_1)$ is $\Theta(\frac{1}{n})$.

10 / 20


Numerical Examples

Example 1

Let $\gamma_0$ and $\gamma_1$ be two positive real numbers such that $\gamma_0 < \gamma_1$. Let $p_0(x) = \gamma_0 e^{-\gamma_0 x}$ and $p_1(x) = \gamma_1 e^{-\gamma_1 x}$ for $x > 0$. We can numerically compute the second-order exponent under the expectation constraint. This is illustrated in the figure below for various values of $\lambda$.

[Plot: $F$ versus $\gamma \in (0.2, 1.0)$ for $\lambda = 0.1, 0.5, 0.9$.]

Figure: Exponential distributions as in Example 1 with $\gamma_0 = \gamma$ and $\gamma_1 = 1$.
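For this exponential family the first-order terms are available in closed form: $D(P_0\|P_1) = \log(\gamma_0/\gamma_1) + \gamma_1/\gamma_0 - 1$. The slides do not state this formula, so the sketch below checks it against a Monte Carlo estimate of $\mathbb{E}_{P_0}[\log \frac{p_0(X)}{p_1(X)}]$ for one (arbitrarily chosen) parameter pair.

```python
import math, random

def kl_exp(g0, g1):
    """Closed-form D(Exp(g0) || Exp(g1)) = log(g0/g1) + g1/g0 - 1."""
    return math.log(g0 / g1) + g1 / g0 - 1.0

random.seed(2)
g0, g1 = 0.5, 1.0          # hypothetical rates for the check
n = 200000
acc = 0.0
for _ in range(n):
    x = random.expovariate(g0)                 # X ~ P0 = Exp(g0)
    acc += math.log(g0 / g1) + (g1 - g0) * x   # log p0(x)/p1(x)
mc = acc / n
print(kl_exp(g0, g1), mc)
```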

11 / 20


Numerical Examples

Example 2

Let $\theta_0$ and $\theta_1$ be two distinct real numbers. Let $p_0(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{(x-\theta_0)^2}{2}}$ and $p_1(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{(x-\theta_1)^2}{2}}$ for $x \in \mathbb{R}$. Let $\Delta\theta = |\theta_0 - \theta_1|$. Then we can numerically compute the second-order exponent under the expectation constraint. This is illustrated in the figure below. We note that for this case of discriminating between two Gaussians, $F(\lambda)$ does not depend on $\lambda \in [0, 1]$.

[Plot: $F$ versus $|\Delta\theta| \in (0, 8)$.]

Figure: Gaussian distributions.
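The $\lambda$-independence is plausible from symmetry: for unit-variance Gaussians the log-likelihood ratio is $(\theta_0-\theta_1)X + (\theta_1^2-\theta_0^2)/2$, so $D(P_0\|P_1) = D(P_1\|P_0) = \Delta\theta^2/2$ and the roles of the two hypotheses are mirror images. The short sketch below (my own illustration, not from the slides) checks this KL symmetry numerically for one parameter pair.

```python
import math, random

def kl_gauss(a, b):
    """Monte Carlo estimate of D(N(a,1) || N(b,1)), which equals (a - b)^2 / 2."""
    rng = random.Random(3)
    n = 100000
    acc = 0.0
    for _ in range(n):
        x = rng.gauss(a, 1.0)                    # X ~ N(a, 1)
        acc += ((x - b) ** 2 - (x - a) ** 2) / 2.0   # log p_a(x) - log p_b(x)
    return acc / n

theta0, theta1 = 0.0, 1.5    # hypothetical means
d01 = kl_gauss(theta0, theta1)
d10 = kl_gauss(theta1, theta0)
print(d01, d10, (theta0 - theta1) ** 2 / 2)
```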

12 / 20


Proof of the Main Result

Auxiliary Tools

In the proof of Theorem 1, we use the following results on the asymptotics of the first passage time. Let $\{\alpha_i\}_{i=1}^{\infty}$ and $\{\beta_i\}_{i=1}^{\infty}$ be two increasing sequences of positive real numbers such that $\alpha_i \to \infty$ and $\beta_i \to \infty$ as $i \to \infty$. Let $(\delta_i, T_i)$ be an SPRT with parameters $(\alpha_i, \beta_i)$.

Theorem 2 (Woodroofe)

Assume that $\max\{\mathbb{E}_{P_1}[Y_1^2], \mathbb{E}_{P_0}[Y_1^2]\} < \infty$ and $Y_1$ is non-arithmetic. Then as $i \to \infty$,
$$\mathbb{E}_{P_0}[T_i] = \frac{\beta_i}{D(P_0\|P_1)} + \frac{A(P_0,P_1)}{D(P_0\|P_1)} + o(1) \quad \text{and} \quad \mathbb{E}_{P_1}[T_i] = \frac{\alpha_i}{D(P_1\|P_0)} + \frac{\tilde{A}(P_0,P_1)}{D(P_1\|P_0)} + o(1).$$
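The expansion in Theorem 2 is consistent with Wald's identity $\mathbb{E}_{P_0}[S_T] = D(P_0\|P_1)\,\mathbb{E}_{P_0}[T]$: for the one-sided passage time, $S_{T(\beta)} = \beta + R$, so $\mathbb{E}_{P_0}[T(\beta)] = (\beta + \mathbb{E}[R])/D(P_0\|P_1)$. The sketch below checks this numerically for the same hypothetical pair $H_0 = \mathrm{Exp}(1)$, $H_1 = \mathrm{Exp}(2)$ used earlier (not from the slides).

```python
import math, random

random.seed(4)
D = 1.0 - math.log(2.0)   # D(P0 || P1) for the hypothetical pair Exp(1) vs Exp(2)
beta = 8.0

ts, rs = [], []
for _ in range(20000):
    s, n = 0.0, 0
    while s <= beta:                              # first passage of the LLR walk over beta
        s += math.log(0.5) + random.expovariate(1.0)
        n += 1
    ts.append(n)
    rs.append(s - beta)                           # overshoot R

mean_T = sum(ts) / len(ts)
mean_R = sum(rs) / len(rs)
predicted = (beta + mean_R) / D                   # Wald's identity prediction for E[T]
print(mean_T, predicted)
```

The overshoot term is exactly what separates `mean_T` from the naive first-order guess `beta / D`, which is the role $A(P_0,P_1)/D(P_0\|P_1)$ plays in Theorem 2.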

13 / 20


Proof of the Main Result

Auxiliary Tools

Theorem 3 (Woodroofe)

Assume that $\max\{\mathbb{E}_{P_1}[Y_1^2], \mathbb{E}_{P_0}[Y_1^2]\} < \infty$ and $Y_1$ is non-arithmetic. Then
$$\lim_{i\to\infty} P_{1|0}(\delta_i, T_i)\, e^{\alpha_i} = e^{\tilde{B}(P_0,P_1)} \quad \text{and} \quad \lim_{i\to\infty} P_{0|1}(\delta_i, T_i)\, e^{\beta_i} = e^{B(P_0,P_1)}.$$

The following lemma characterizes the optimality of the SPRT.

Lemma 4 (Ferguson)

Let $(\delta, T)$ be an SPRT. Let $(\tilde{\delta}, \tilde{T})$ be any SHT such that $\mathbb{E}_{P_0}[\tilde{T}] \le \mathbb{E}_{P_0}[T]$ and $\mathbb{E}_{P_1}[\tilde{T}] \le \mathbb{E}_{P_1}[T]$. Then $P_{0|1}(\delta, T) \le P_{0|1}(\tilde{\delta}, \tilde{T})$ and $P_{1|0}(\delta, T) \le P_{1|0}(\tilde{\delta}, \tilde{T})$.

14 / 20


Proof of the Main Result

Upper Bound on the Error Probabilities

For any $\delta > 0$, let $\alpha_n = nD(P_1\|P_0)\Big(1 - \frac{\tilde{A}(P_0,P_1)}{nD(P_1\|P_0)} - \frac{\delta}{n}\Big)$ and $\beta_n = nD(P_0\|P_1)\Big(1 - \frac{A(P_0,P_1)}{nD(P_0\|P_1)} - \frac{\delta}{n}\Big)$. Consider the SPRT $(\delta_n, T_n)$ with parameters $(\alpha_n, \beta_n)$. As $\alpha_n \to \infty$ and $\beta_n \to \infty$, from Theorem 3, it follows that
$$P_{1|0}(\delta_n, T_n) = P_0(S_{T_n} \le -\alpha_n) = \big(e^{\tilde{B}(P_0,P_1)} + o(1)\big) e^{-\alpha_n},$$
and similarly, $P_{0|1}(\delta_n, T_n) = \big(e^{B(P_0,P_1)} + o(1)\big) e^{-\beta_n}$. We now show that $\mathbb{E}_{P_0}[T_n] \le n$ and $\mathbb{E}_{P_1}[T_n] \le n$. From Theorem 2, it follows that
$$\mathbb{E}_{P_0}[T_n] = \frac{\beta_n}{D(P_0\|P_1)} + \frac{A(P_0,P_1)}{D(P_0\|P_1)} + o(1) = n - \delta + o(1). \tag{2}$$

15 / 20


Proof of the Main Result

Upper Bound on the Error Probabilities

Similarly, $\mathbb{E}_{P_1}[T_n] \le n - \delta + o(1)$. Thus, for sufficiently large $n$, we have $\mathbb{E}_{P_0}[T_n] \le n$ and $\mathbb{E}_{P_1}[T_n] \le n$. It then follows that
$$\overline{F}(\lambda) = \limsup_{n\to\infty} F_n(\lambda) \le \lambda\big(A(P_0,P_1) + \tilde{B}(P_0,P_1)\big) + (1-\lambda)\big(\tilde{A}(P_0,P_1) + B(P_0,P_1)\big) + \big(\lambda D(P_0\|P_1) + (1-\lambda) D(P_1\|P_0)\big)\delta.$$
As $\delta > 0$ is arbitrary, we conclude that
$$\overline{F}(\lambda) \le \lambda\big(A(P_0,P_1) + \tilde{B}(P_0,P_1)\big) + (1-\lambda)\big(\tilde{A}(P_0,P_1) + B(P_0,P_1)\big).$$

16 / 20


Proof of the Main Result

Lower Bound on the Error Probabilities

For any $\delta > 0$, let $\hat{\alpha}_n = nD(P_1\|P_0)\Big(1 - \frac{\tilde{A}(P_0,P_1)}{nD(P_1\|P_0)} + \frac{\delta}{n}\Big)$ and $\hat{\beta}_n = nD(P_0\|P_1)\Big(1 - \frac{A(P_0,P_1)}{nD(P_0\|P_1)} + \frac{\delta}{n}\Big)$. Consider a sequence of SPRTs $\{(\hat{\delta}_n, \hat{T}_n)\}_{n=1}^{\infty}$ with parameters $\{(\hat{\alpha}_n, \hat{\beta}_n)\}_{n=1}^{\infty}$. We now show that for sufficiently large $n$, we have, for $i = 0, 1$, $\mathbb{E}_{P_i}[\hat{T}_n] \ge n + \frac{\delta}{2}$. From Theorem 2 and the choice of $\hat{\beta}_n$, it follows that for sufficiently large $n$,
$$\mathbb{E}_{P_0}[\hat{T}_n] = \frac{\hat{\beta}_n}{D(P_0\|P_1)} + \frac{A(P_0,P_1)}{D(P_0\|P_1)} + o(1) = n + \delta + o(1) \ge n + \frac{\delta}{2} > n.$$
Similarly, for sufficiently large $n$, $\mathbb{E}_{P_1}[\hat{T}_n] \ge n + \frac{\delta}{2} > n$.

17 / 20


Proof of the Main Result

Lower Bound on the Error Probabilities

Then, from Lemma 4, we conclude that for any SHT $(\delta_n, T_n)$ with $\max\{\mathbb{E}_{P_0}[T_n], \mathbb{E}_{P_1}[T_n]\} \le n$, we have $P_{0|1}(\delta_n, T_n) \ge P_{0|1}(\hat{\delta}_n, \hat{T}_n)$ and $P_{1|0}(\delta_n, T_n) \ge P_{1|0}(\hat{\delta}_n, \hat{T}_n)$. From Theorem 3, we have that
$$\log P_{1|0}(\hat{\delta}_n, \hat{T}_n) = \tilde{B}(P_0,P_1) - \hat{\alpha}_n + \log(1 + o(1)) \tag{3}$$
and $\log P_{0|1}(\hat{\delta}_n, \hat{T}_n) = B(P_0,P_1) - \hat{\beta}_n + \log(1 + o(1))$. Combining $P_{1|0}(\delta_n, T_n) \ge P_{1|0}(\hat{\delta}_n, \hat{T}_n)$ and (3), we have that
$$\log P_{1|0}(\delta_n, T_n) + nD(P_1\|P_0) \ge \log P_{1|0}(\hat{\delta}_n, \hat{T}_n) + nD(P_1\|P_0) \ge \tilde{B}(P_0,P_1) + A(P_0,P_1) - \delta D(P_0\|P_1) + \log(1 + o(1)). \tag{4}$$

18 / 20


Proof of the Main Result

Lower Bound on the Error Probabilities

Similarly, we have that
$$\log P_{0|1}(\delta_n, T_n) + nD(P_0\|P_1) \ge B(P_0,P_1) + \tilde{A}(P_0,P_1) - \delta D(P_1\|P_0) + \log(1 + o(1)). \tag{5}$$
As $\lim_{x\to 0} \log(1 + x) = 0$, combining (4) and (5), we have that
$$\underline{F}(\lambda) = \liminf_{n\to\infty} F_n(\lambda) \ge \lambda\big(\tilde{B}(P_0,P_1) + A(P_0,P_1)\big) + (1-\lambda)\big(B(P_0,P_1) + \tilde{A}(P_0,P_1)\big) - \big(\lambda D(P_0\|P_1) + (1-\lambda) D(P_1\|P_0)\big)\delta.$$
Finally, letting $\delta \to 0^+$, we have
$$\underline{F}(\lambda) \ge \lambda\big(\tilde{B}(P_0,P_1) + A(P_0,P_1)\big) + (1-\lambda)\big(B(P_0,P_1) + \tilde{A}(P_0,P_1)\big),$$
as desired.

19 / 20


Thanks for Your Attention!

20 / 20