Second-Order Asymptotics of Sequential Hypothesis Testing
Yonglong Li and Vincent Y. F. Tan June 4, 2020
Outline

◮ Problem Setup
◮ Literature Review
◮ Main Result
◮ Numerical Examples
◮ Proof of the Main Result
Problem Setup

◮ Binary hypothesis testing: H0 : P = P0 and H1 : P = P1, where P0 and P1 are probability distributions defined on the same alphabet X.
◮ {Xk}∞k=1 are i.i.d. random variables distributed according to P.
◮ Fn is the σ-algebra generated by (X1, . . . , Xn).
◮ T is a stopping time adapted to the filtration {Fn}∞n=1, and FT is the σ-algebra associated with T.
◮ δ is a {0, 1}-valued function measurable with respect to FT; δ = i means that Hi is declared as the underlying hypothesis. A pair (δ, T) is called a sequential hypothesis test (SHT).
Problem Setup

◮ P1|0(δ, T) = P0(δ = 1) and P0|1(δ, T) = P1(δ = 0).
◮ Expectation constraint on the sample size T: for a given integer n, maxi=0,1 EPi[T] ≤ n.

Sequential Probability Ratio Test

One important class of SHTs is the family of sequential probability ratio tests (SPRTs). Let Yk = log(p0(Xk)/p1(Xk)) and Sn = ∑nk=1 Yk. For any pair of positive real numbers α and β, an SPRT with parameters (α, β) is defined as follows:

δ = 1 if ST ≤ −α and δ = 0 if ST ≥ β, where T = inf{n ≥ 1 : Sn ∉ [−α, β]}.
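As an illustration (not part of the original slides), the SPRT above can be sketched in Python. The concrete pair H0 = N(0, 1) versus H1 = N(1, 1) is an assumed example, for which Yk = 1/2 − Xk and D(P0‖P1) = D(P1‖P0) = 1/2:

```python
import random

def sprt(stream, llr, alpha, beta):
    """Run an SPRT with thresholds alpha, beta > 0.

    stream: iterable of observations X_1, X_2, ...
    llr:    x -> log(p0(x)/p1(x)), the increment of S_n
    Returns (delta, T): delta = 0 accepts H0 (S_T >= beta),
    delta = 1 accepts H1 (S_T <= -alpha)."""
    s = 0.0
    for n, x in enumerate(stream, start=1):
        s += llr(x)
        if s >= beta:
            return 0, n
        if s <= -alpha:
            return 1, n
    raise RuntimeError("stream ended before S_n left [-alpha, beta]")

def gaussian_stream(theta, rng):
    while True:
        yield rng.gauss(theta, 1.0)

# Assumed instance: H0 = N(0,1), H1 = N(1,1), so that
# Y_k = log p0(X_k)/p1(X_k) = 1/2 - X_k.
rng = random.Random(1)
delta, t = sprt(gaussian_stream(0.0, rng), lambda x: 0.5 - x, alpha=4.0, beta=4.0)
# under H0, delta = 0 on most runs; errors occur with probability
# roughly e^{-alpha} by Wald's approximation
```

With α = β = 4, Wald's approximation puts both error probabilities below e−4 ≈ 0.018, and the expected sample size under either hypothesis near (threshold + mean overshoot)/(1/2).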
Error Exponents

Given a sequence of SHTs {(δn, Tn)}∞n=1 satisfying the expectation constraint, we are concerned with the error exponents (E0, E1) defined as

E0 = lim infn→∞ (1/n) log(1/P1|0(δn, Tn)) and E1 = lim infn→∞ (1/n) log(1/P0|1(δn, Tn)).
◮ In 1948, Wald and Wolfowitz showed that E0 ≤ D(P1‖P0) and E1 ≤ D(P0‖P1), and that these error exponents can be achieved by a sequence of SPRTs.
◮ In 1986 and 1988, Lotov studied series expansions of the error probabilities for sequences of SPRTs.
◮ For the fixed-length binary hypothesis testing problem, Strassen showed that the backoff from the optimal exponent D(P0‖P1) is of the order Θ(1/√n) and characterized the implied constant as a function of the relative entropy variance and the Gaussian cumulative distribution function.
Second-order Term under Expectation Constraint

For fixed λ ∈ [0, 1], let

Fn(λ) = inf { λ[log P1|0(δn, Tn) + nD(P1‖P0)] + (1 − λ)[log P0|1(δn, Tn) + nD(P0‖P1)] },   (1)

where the infimum is over all SHTs (δn, Tn) with maxi=0,1 EPi[Tn] ≤ n. Let F̄(λ) = lim supn→∞ Fn(λ) and F̲(λ) = lim infn→∞ Fn(λ). If F̄(λ) = F̲(λ), then we term this common value the second-order exponent of SHT under the expectation constraint and denote it simply by F(λ).
Second-order Asymptotics under the Expectation Constraint

Let {αk}∞k=1 and {βk}∞k=1 be two increasing sequences of positive real numbers such that αk → ∞ and βk → ∞ as k → ∞. Let T(βk) = inf{n ≥ 1 : Sn > βk} and T̃(αk) = inf{n ≥ 1 : −Sn > αk}. Furthermore, let Rk = ST(βk) − βk and R̃k = −ST̃(αk) − αk. It is known that

◮ if the true hypothesis is H0, {Rk}∞k=1 converges in distribution to some random variable R, and the limit is independent of the choice of {βk}∞k=1;
◮ if the true hypothesis is H1, {R̃k}∞k=1 converges in distribution to some random variable R̃, and the limit is independent of the choice of {αk}∞k=1.
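The convergence of the overshoots can be checked by Monte Carlo simulation. The sketch below is an added illustration; the pair H0 = N(0, 1), H1 = N(1, 1) is an assumption, giving increments Yk = 1/2 − Xk. It estimates E[Rk] at two well-separated thresholds; the two estimates nearly coincide, consistent with the limit R not depending on the threshold sequence:

```python
import random

def overshoot(beta, rng):
    """First-passage overshoot R = S_T - beta for the walk S_n with
    i.i.d. increments Y_k = 1/2 - X_k, X_k ~ N(0,1) (i.e. H0 is true)."""
    s = 0.0
    while s <= beta:
        s += 0.5 - rng.gauss(0.0, 1.0)
    return s - beta

def mean_overshoot(beta, trials, seed):
    rng = random.Random(seed)
    return sum(overshoot(beta, rng) for _ in range(trials)) / trials

a_at_10 = mean_overshoot(10.0, 20000, seed=0)
a_at_30 = mean_overshoot(30.0, 20000, seed=1)
# both numbers estimate A(P0, P1) = E[R] and should agree to within
# Monte Carlo error
```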
Second-order Asymptotics under the Expectation Constraint

Define A(P0, P1) = E[R], Ã(P0, P1) = E[R̃], B(P0, P1) = log E[e−R] and B̃(P0, P1) = log E[e−R̃]. We note that, by symmetry, Ã(P0, P1) = A(P1, P0) and B̃(P0, P1) = B(P1, P0).

Theorem 1
Let P0 and P1 be such that maxi=0,1 EPi[(log(p0(X1)/p1(X1)))²] < ∞ and log(p0(X1)/p1(X1)) is non-arithmetic when X1 ∼ P0. Then for every λ ∈ [0, 1],

F̄(λ) = F̲(λ) = F(λ) = λ(Ã(P0, P1) + B̃(P0, P1)) + (1 − λ)(A(P0, P1) + B(P0, P1)).
Remark 1
The rate of convergence of the optimal λ-weighted finite-length exponents sup(δn,Tn){−(λ/n) log P1|0(δn, Tn) − ((1 − λ)/n) log P0|1(δn, Tn)} to the λ-weighted exponent λD(P1‖P0) + (1 − λ)D(P0‖P1) is Θ(1/n).
Example 1
Let γ0 and γ1 be two positive real numbers such that γ0 < γ1. Let p0(x) = γ0e−γ0x and p1(x) = γ1e−γ1x for x > 0. We can numerically compute the second-order exponent under the expectation constraint. This is illustrated in Figure 3 for various values of λ.

Figure: Exponential distributions as in Example 1 with γ0 = γ and γ1 = 1 (F versus γ for λ = 0.1, 0.5, 0.9)
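The constants appearing in Theorem 1 can be estimated by Monte Carlo simulation of the two one-sided walks. The sketch below (an added illustration) does this for the exponential pair of Example 1 with the assumed values γ0 = 0.5 and γ1 = 1, and evaluates the expression F(λ) = λ(Ã(P0, P1) + B̃(P0, P1)) + (1 − λ)(A(P0, P1) + B(P0, P1)) of Theorem 1:

```python
import math
import random

def overshoots(draw_y, threshold, trials, rng):
    """Collect first-passage overshoots over `threshold` of a walk with
    i.i.d. increments drawn by draw_y(rng) (positive drift assumed)."""
    out = []
    for _ in range(trials):
        s = 0.0
        while s <= threshold:
            s += draw_y(rng)
        out.append(s - threshold)
    return out

def constants(rs):
    """Return (E[R], log E[exp(-R)]) for a sample of overshoots."""
    a = sum(rs) / len(rs)
    b = math.log(sum(math.exp(-r) for r in rs) / len(rs))
    return a, b

g0, g1 = 0.5, 1.0                                   # assumed gamma_0 < gamma_1
llr = lambda x: math.log(g0 / g1) + (g1 - g0) * x   # Y = log(p0(x)/p1(x))

rng = random.Random(0)
# Under H0, X ~ Exp(g0) and S_n drifts upward: overshoots give A and B.
A, B = constants(overshoots(lambda r: llr(r.expovariate(g0)), 15.0, 20000, rng))
# Under H1, X ~ Exp(g1) and -S_n drifts upward: overshoots give A~ and B~.
At, Bt = constants(overshoots(lambda r: -llr(r.expovariate(g1)), 15.0, 20000, rng))

F = lambda lam: lam * (At + Bt) + (1 - lam) * (A + B)
# F(lam) is nonnegative: by Jensen's inequality A + B >= 0 and At + Bt >= 0
```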
Example 2
Let θ0 and θ1 be two distinct real numbers. Let p0(x) = (1/√(2π))e−(x−θ0)²/2 and p1(x) = (1/√(2π))e−(x−θ1)²/2 for x ∈ R. Let ∆θ = |θ0 − θ1|. Then we can numerically compute the second-order exponent under the expectation constraint. This is illustrated in Figure 3. We note that for this case of discriminating between two Gaussians, F(λ) does not depend on λ ∈ [0, 1].

Figure: Gaussian distributions (F versus ∆θ)
Auxiliary Tools

In the proof of Theorem 1, we use the following results on the asymptotics of the first passage time. Let {αi}∞i=1 and {βi}∞i=1 be two increasing sequences of positive real numbers such that αi → ∞ and βi → ∞ as i → ∞. Let (δi, Ti) be an SPRT with parameters (αi, βi).

Theorem 2 (Woodroofe)
Assume that max{EP1[Y1²], EP0[Y1²]} < ∞ and Y1 is non-arithmetic. Then as n → ∞,

EP0[Tn] = βn/D(P0‖P1) + A(P0, P1)/D(P0‖P1) + o(1), and
EP1[Tn] = αn/D(P1‖P0) + Ã(P0, P1)/D(P1‖P0) + o(1).
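Theorem 2 can be sanity-checked by simulation. The sketch below is an added illustration; the pair H0 = N(0, 1), H1 = N(1, 1) with Yk = 1/2 − Xk and D(P0‖P1) = 1/2 is an assumption. The mean stopping time under P0 exceeds the first-order prediction βn/D(P0‖P1) by roughly the constant A(P0, P1)/D(P0‖P1):

```python
import random

def sprt_time_under_h0(alpha, beta, rng):
    """Stopping time T = inf{n : S_n not in [-alpha, beta]} when the data
    are i.i.d. N(0,1) (H0 true), with increments Y_k = 1/2 - X_k."""
    s, n = 0.0, 0
    while -alpha <= s <= beta:
        s += 0.5 - rng.gauss(0.0, 1.0)
        n += 1
    return n

rng = random.Random(0)
beta = 20.0
trials = 4000
mean_t = sum(sprt_time_under_h0(20.0, beta, rng) for _ in range(trials)) / trials
first_order = beta / 0.5   # beta_n / D(P0||P1) = 40 samples
# mean_t sits a small, positive constant above first_order,
# consistent with the A(P0, P1)/D(P0||P1) correction of Theorem 2
```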
Auxiliary Tools

Theorem 3 (Woodroofe)
Assume that max{EP1[Y1²], EP0[Y1²]} < ∞ and Y1 is non-arithmetic. Then

limi→∞ P1|0(δi, Ti)eαi = eB̃(P0,P1) and limi→∞ P0|1(δi, Ti)eβi = eB(P0,P1).

The following lemma characterizes the optimality of the SPRT.

Lemma 4 (Ferguson)
Let (δ, T) be an SPRT. Let (δ̃, T̃) be any SHT such that EP0[T̃] ≤ EP0[T] and EP1[T̃] ≤ EP1[T]. Then P0|1(δ, T) ≤ P0|1(δ̃, T̃) and P1|0(δ, T) ≤ P1|0(δ̃, T̃).
Upper Bound on the Error Probabilities

For any δ > 0, let

αn = (n − δ)D(P1‖P0) − Ã(P0, P1) and βn = (n − δ)D(P0‖P1) − A(P0, P1),

and let (δn, Tn) be the SPRT with parameters (αn, βn). As αn → ∞ and βn → ∞, it follows from Theorem 3 that

P1|0(δn, Tn) = P0(STn ≤ −αn) = (eB̃(P0,P1) + o(1))e−αn,

and similarly P0|1(δn, Tn) = (eB(P0,P1) + o(1))e−βn. We now show that EP0[Tn] ≤ n and EP1[Tn] ≤ n. From Theorem 2, it follows that

EP0[Tn] = βn/D(P0‖P1) + A(P0, P1)/D(P0‖P1) + o(1) = n − δ + o(1). (2)
Upper Bound on the Error Probabilities

Similarly, EP1[Tn] = n − δ + o(1). Thus for sufficiently large n, we have EP0[Tn] ≤ n and EP1[Tn] ≤ n. It then follows that

F̄(λ) = lim supn→∞ Fn(λ) ≤ λ(Ã(P0, P1) + B̃(P0, P1)) + (1 − λ)(A(P0, P1) + B(P0, P1)) + (λD(P1‖P0) + (1 − λ)D(P0‖P1))δ.

As δ > 0 is arbitrary, we conclude that F̄(λ) ≤ λ(Ã(P0, P1) + B̃(P0, P1)) + (1 − λ)(A(P0, P1) + B(P0, P1)).
Lower Bound on the Error Probabilities

For any δ > 0, let

α̂n = (n + δ)D(P1‖P0) − Ã(P0, P1) and β̂n = (n + δ)D(P0‖P1) − A(P0, P1),

and consider the SPRTs {(δ̂n, T̂n)}∞n=1 with parameters {(α̂n, β̂n)}∞n=1. We now show that for sufficiently large n, we have EPi[T̂n] > n for i = 0, 1. From Theorem 2 and the choice of β̂n, it follows that for sufficiently large n,

EP0[T̂n] = β̂n/D(P0‖P1) + A(P0, P1)/D(P0‖P1) + o(1) = n + δ + o(1) ≥ n + δ/2 > n.

Similarly, for sufficiently large n, EP1[T̂n] ≥ n + δ/2 > n.
Lower Bound on the Error Probabilities

Then from Lemma 4, we conclude that for any SHT (δn, Tn) with max{EP0[Tn], EP1[Tn]} ≤ n, we have P0|1(δn, Tn) ≥ P0|1(δ̂n, T̂n) and P1|0(δn, Tn) ≥ P1|0(δ̂n, T̂n). From Theorem 3, we have

log P1|0(δ̂n, T̂n) = B̃(P0, P1) − α̂n + log(1 + o(1)) (3)

and log P0|1(δ̂n, T̂n) = B(P0, P1) − β̂n + log(1 + o(1)). Combining P1|0(δn, Tn) ≥ P1|0(δ̂n, T̂n) and (3), we have

log P1|0(δn, Tn) + nD(P1‖P0) ≥ log P1|0(δ̂n, T̂n) + nD(P1‖P0) ≥ B̃(P0, P1) + Ã(P0, P1) − δD(P1‖P0) + log(1 + o(1)). (4)
Lower Bound on the Error Probabilities

Similarly, we have

log P0|1(δn, Tn) + nD(P0‖P1) ≥ B(P0, P1) + A(P0, P1) − δD(P0‖P1) + log(1 + o(1)). (5)

As limx→0 log(1 + x) = 0, combining (4) and (5), we have

F̲(λ) = lim infn→∞ Fn(λ) ≥ λ(Ã(P0, P1) + B̃(P0, P1)) + (1 − λ)(A(P0, P1) + B(P0, P1)) − (λD(P1‖P0) + (1 − λ)D(P0‖P1))δ.

Finally, letting δ → 0+, we obtain F̲(λ) ≥ λ(Ã(P0, P1) + B̃(P0, P1)) + (1 − λ)(A(P0, P1) + B(P0, P1)), as desired.