


Second-Order Asymptotics of Sequential Hypothesis Testing

Yonglong Li and Vincent Y. F. Tan June 4, 2020

1 / 20


Outline

◮ Problem Setup
◮ Literature Review
◮ Main Result
◮ Numerical Examples
◮ Proof of the Main Result

2 / 20


Sequential Hypothesis Testing

Problem Setup

◮ Binary hypothesis testing: $H_0 : P = P_0$ and $H_1 : P = P_1$, where $P_0$ and $P_1$ are probability distributions defined on the same alphabet $\mathcal{X}$.
◮ $\{X_k\}_{k=1}^{\infty}$ are i.i.d. random variables distributed according to $P$.
◮ $\mathcal{F}_n$ is the $\sigma$-algebra generated by $X_1^n$.
◮ $T$ is a stopping time adapted to the filtration $\{\mathcal{F}_n\}_{n=1}^{\infty}$, and $\mathcal{F}_T$ is the $\sigma$-algebra associated with $T$.
◮ $\delta$ is a $\{0,1\}$-valued function measurable with respect to $\mathcal{F}_T$; $\delta = i$ means that $H_i$ is declared to be the underlying hypothesis. A pair $(\delta, T)$ is called a sequential hypothesis test (SHT).

3 / 20


Sequential Hypothesis Testing

Problem Setup

◮ Error probabilities: $P_{1|0}(\delta, T) = P_0^T(\delta = 1)$ and $P_{0|1}(\delta, T) = P_1^T(\delta = 0)$.
◮ Expectation constraint on the sample size $T$: for any integer $n$, $\max_{i=0,1} \mathbb{E}_{P_i}[T] \le n$.

Sequential Probability Ratio Test

One important class of SHTs is the family of sequential probability ratio tests (SPRTs). Let $Y_k = \log \frac{p_0(X_k)}{p_1(X_k)}$ and $S_n = \sum_{k=1}^{n} Y_k$. For any pair of positive real numbers $\alpha$ and $\beta$, an SPRT with parameters $(\alpha, \beta)$ is defined as follows:
$$\delta = \begin{cases} 0 & \text{if } S_T > \beta, \\ 1 & \text{if } S_T < -\alpha, \end{cases} \qquad \text{where } T = \inf\{n \ge 1 : S_n \notin [-\alpha, \beta]\}.$$
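As a concrete illustration (not from the slides), the SPRT above is easy to simulate. The distributions, parameters, and function names below are my own choices for this sketch: $H_0 = \mathrm{Exp}(1)$, $H_1 = \mathrm{Exp}(2)$, and $\alpha = \beta = 5$.

```python
import math, random

def sprt(samples, alpha, beta, llr):
    """Run an SPRT: accumulate the log-likelihood ratio S_n until it leaves
    [-alpha, beta]. Returns (decision, stopping_time); 0 favors H0, 1 favors H1."""
    s = 0.0
    for n, x in enumerate(samples, start=1):
        s += llr(x)
        if s > beta:
            return 0, n
        if s < -alpha:
            return 1, n
    raise RuntimeError("ran out of samples before stopping")

# Hypothetical pair: p0 = Exp(1), p1 = Exp(2), so one sample contributes
# Y = log(p0/p1)(x) = log(1/2) + x to the random walk S_n.
def llr(x):
    return math.log(0.5) + x

random.seed(0)
alpha = beta = 5.0
trials = 2000
errors = 0
total_T = 0
for _ in range(trials):
    xs = (random.expovariate(1.0) for _ in range(10000))  # data drawn under H0
    d, t = sprt(xs, alpha, beta, llr)
    errors += (d == 1)       # declaring H1 under H0 is the error counted by P_{1|0}
    total_T += t
print(errors / trials, total_T / trials)
```

Under $H_0$ the drift of $S_n$ is $D(P_0\|P_1) = 1 - \log 2 \approx 0.307$, so the mean stopping time should be roughly $\beta / 0.307 \approx 16$ plus an overshoot correction, and the error frequency should be below Wald's bound $e^{-\alpha}$.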

4 / 20


Problem Setup

Error Exponents

Given a sequence of SHTs $\{(\delta_n, T_n)\}_{n=1}^{\infty}$ satisfying the expectation constraint, we are concerned with the error exponents $(E_0, E_1)$ defined as
$$E_0 = \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{P_{1|0}(\delta_n, T_n)} \quad \text{and} \quad E_1 = \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{P_{0|1}(\delta_n, T_n)}.$$

5 / 20



Literature Review

◮ In 1948, Wald and Wolfowitz showed that $E_0 \le D(P_1 \| P_0)$ and $E_1 \le D(P_0 \| P_1)$, and that these error exponents can be achieved by a sequence of SPRTs.
◮ In 1986 and 1988, Lotov studied the series expansion of the error probabilities for a sequence of SPRTs.
◮ For the fixed-length binary hypothesis testing problem, Strassen showed that the backoff from the optimal exponent $D(P_0 \| P_1)$ is of the order $\Theta(\frac{1}{\sqrt{n}})$ and characterized the implied constant as a function of the relative entropy variance and the Gaussian cumulative distribution function.

6 / 20


Main Result

Second-order Term under the Expectation Constraint

For fixed $\lambda \in [0, 1]$, let
$$F_n(\lambda) = \sup_{(\delta_n, T_n):\, \max_{i=0,1} \mathbb{E}_{P_i}[T_n] \le n} \lambda\Big(\log P_{1|0}(\delta_n, T_n) + nD(P_1 \| P_0)\Big) + (1-\lambda)\Big(\log P_{0|1}(\delta_n, T_n) + nD(P_0 \| P_1)\Big). \tag{1}$$
Let $\overline{F}(\lambda) = \limsup_{n\to\infty} F_n(\lambda)$ and $\underline{F}(\lambda) = \liminf_{n\to\infty} F_n(\lambda)$. If $\overline{F}(\lambda) = \underline{F}(\lambda)$, then we term this common value the second-order exponent of SHT under the expectation constraint, and we denote it simply as $F(\lambda)$.

7 / 20


Main Result

Second-order Asymptotics under the Expectation Constraint

Let $\{\alpha_k\}_{k=1}^{\infty}$ and $\{\beta_k\}_{k=1}^{\infty}$ be two increasing sequences of positive real numbers such that $\alpha_k \to \infty$ and $\beta_k \to \infty$ as $k \to \infty$. Let $T(\beta_k) = \inf\{n \ge 1 : S_n > \beta_k\}$ and $\tilde{T}(\alpha_k) = \inf\{n \ge 1 : -S_n > \alpha_k\}$. Furthermore, let the overshoots be $R_k = S_{T(\beta_k)} - \beta_k$ and $\tilde{R}_k = -S_{\tilde{T}(\alpha_k)} - \alpha_k$. It is known that

◮ if the true hypothesis is $H_0$, then $\{R_k\}_{k=1}^{\infty}$ converges in distribution to some random variable $R$, and the limit is independent of the choice of $\{\beta_k\}_{k=1}^{\infty}$;
◮ if the true hypothesis is $H_1$, then $\{\tilde{R}_k\}_{k=1}^{\infty}$ converges in distribution to some random variable $\tilde{R}$, and the limit is independent of the choice of $\{\alpha_k\}_{k=1}^{\infty}$.
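A quick way to get a feel for $R$ is Monte Carlo: simulate the random walk $S_n$ under $H_0$, record its overshoot over a level $\beta$, and average. The pair below ($H_0 = \mathrm{Exp}(1)$, $H_1 = \mathrm{Exp}(2)$) and the level $\beta = 8$ are my own illustrative choices, not taken from the slides; the estimates correspond to the quantities $A(P_0,P_1) = \mathbb{E}[R]$ and $B(P_0,P_1) = \log \mathbb{E}[e^{-R}]$ defined on the next slide.

```python
import math, random

random.seed(1)

def overshoot(beta):
    """Overshoot R = S_T - beta of the LLR walk over level beta, simulated under H0."""
    s = 0.0
    while True:
        x = random.expovariate(1.0)   # sample under H0 = Exp(1) (hypothetical choice)
        s += math.log(0.5) + x        # LLR against H1 = Exp(2); positive drift 1 - log 2
        if s > beta:
            return s - beta

samples = [overshoot(beta=8.0) for _ in range(20000)]
A = sum(samples) / len(samples)                                   # estimate of E[R]
B = math.log(sum(math.exp(-r) for r in samples) / len(samples))   # estimate of log E[e^{-R}]
print(A, B)
```

Since $R \ge 0$ we must have $A \ge 0$ and $B \le 0$, and Jensen's inequality forces $B \ge -A$; the empirical estimates satisfy these bounds exactly, which is a useful sanity check on the simulation.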

8 / 20


Main Result

Second-order Asymptotics under the Expectation Constraint

Define $A(P_0,P_1) = \mathbb{E}[R]$, $\tilde{A}(P_0,P_1) = \mathbb{E}[\tilde{R}]$, $B(P_0,P_1) = \log \mathbb{E}[e^{-R}]$, and $\tilde{B}(P_0,P_1) = \log \mathbb{E}[e^{-\tilde{R}}]$. We note that $\tilde{A}(P_0,P_1) = A(P_1,P_0)$ and $\tilde{B}(P_0,P_1) = B(P_1,P_0)$.

Theorem 1

Let $P_0$ and $P_1$ be such that $\max_{i=0,1} \mathbb{E}_{P_i}\big[\big(\log \frac{p_0(X_1)}{p_1(X_1)}\big)^2\big] < \infty$ and $\log \frac{p_0(X_1)}{p_1(X_1)}$ is non-arithmetic when $X_1 \sim P_0$. Then for every $\lambda \in [0,1]$,
$$\overline{F}(\lambda) = \underline{F}(\lambda) = F(\lambda) = \lambda\big(A(P_0,P_1) + \tilde{B}(P_0,P_1)\big) + (1-\lambda)\big(\tilde{A}(P_0,P_1) + B(P_0,P_1)\big).$$

9 / 20


Second-order Results

Remark 1

The rate of convergence of the optimal $\lambda$-weighted finite-length exponents $\sup_{(\delta_n,T_n)} -\frac{\lambda}{n}\log P_{1|0}(\delta_n,T_n) - \frac{1-\lambda}{n}\log P_{0|1}(\delta_n,T_n)$ to the $\lambda$-weighted exponent $\lambda D(P_1\|P_0) + (1-\lambda)D(P_0\|P_1)$ is $\Theta(\frac{1}{n})$.

10 / 20


Numerical Examples

Example 1

Let $\gamma_0$ and $\gamma_1$ be two positive real numbers such that $\gamma_0 < \gamma_1$. Let $p_0(x) = \gamma_0 e^{-\gamma_0 x}$ and $p_1(x) = \gamma_1 e^{-\gamma_1 x}$ for $x > 0$. We can numerically compute the second-order exponent under the expectation constraint. This is illustrated in the figure below for various values of $\lambda$.

[Plot: $F$ versus $\gamma \in (0.2, 1.0)$ for $\lambda = 0.1, 0.5, 0.9$.]

Figure: Exponential distributions as in Example 1 with $\gamma_0 = \gamma$ and $\gamma_1 = 1$.
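For this exponential family the first-order terms are available in closed form: $D(P_0\|P_1) = \log(\gamma_0/\gamma_1) + \gamma_1/\gamma_0 - 1$. The slides do not state this formula, so the sketch below checks it against a Monte Carlo estimate of $\mathbb{E}_{P_0}[\log \frac{p_0(X)}{p_1(X)}]$ for one (arbitrarily chosen) parameter pair.

```python
import math, random

def kl_exp(g0, g1):
    """Closed-form D(Exp(g0) || Exp(g1)) = log(g0/g1) + g1/g0 - 1."""
    return math.log(g0 / g1) + g1 / g0 - 1.0

random.seed(2)
g0, g1 = 0.5, 1.0          # hypothetical rates for the check
n = 200000
acc = 0.0
for _ in range(n):
    x = random.expovariate(g0)                 # X ~ P0 = Exp(g0)
    acc += math.log(g0 / g1) + (g1 - g0) * x   # log p0(x)/p1(x)
mc = acc / n
print(kl_exp(g0, g1), mc)
```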

11 / 20


Numerical Examples

Example 2

Let $\theta_0$ and $\theta_1$ be two distinct real numbers. Let $p_0(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{(x-\theta_0)^2}{2}}$ and $p_1(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{(x-\theta_1)^2}{2}}$ for $x \in \mathbb{R}$. Let $\Delta\theta = |\theta_0 - \theta_1|$. Then we can numerically compute the second-order exponent under the expectation constraint. This is illustrated in the figure below. We note that for this case of discriminating between two Gaussians, $F(\lambda)$ does not depend on $\lambda \in [0, 1]$.

[Plot: $F$ versus $|\Delta\theta| \in (0, 8)$.]

Figure: Gaussian distributions.
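The $\lambda$-independence is plausible from symmetry: for unit-variance Gaussians the log-likelihood ratio is $(\theta_0-\theta_1)X + (\theta_1^2-\theta_0^2)/2$, so $D(P_0\|P_1) = D(P_1\|P_0) = \Delta\theta^2/2$ and the roles of the two hypotheses are mirror images. The short sketch below (my own illustration, not from the slides) checks this KL symmetry numerically for one parameter pair.

```python
import math, random

def kl_gauss(a, b):
    """Monte Carlo estimate of D(N(a,1) || N(b,1)), which equals (a - b)^2 / 2."""
    rng = random.Random(3)
    n = 100000
    acc = 0.0
    for _ in range(n):
        x = rng.gauss(a, 1.0)                    # X ~ N(a, 1)
        acc += ((x - b) ** 2 - (x - a) ** 2) / 2.0   # log p_a(x) - log p_b(x)
    return acc / n

theta0, theta1 = 0.0, 1.5    # hypothetical means
d01 = kl_gauss(theta0, theta1)
d10 = kl_gauss(theta1, theta0)
print(d01, d10, (theta0 - theta1) ** 2 / 2)
```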

12 / 20


Proof of the Main Result

Auxiliary Tools

In the proof of Theorem 1, we use the following results on the asymptotics of the first passage time. Let $\{\alpha_i\}_{i=1}^{\infty}$ and $\{\beta_i\}_{i=1}^{\infty}$ be two increasing sequences of positive real numbers such that $\alpha_i \to \infty$ and $\beta_i \to \infty$ as $i \to \infty$. Let $(\delta_i, T_i)$ be an SPRT with parameters $(\alpha_i, \beta_i)$.

Theorem 2 (Woodroofe)

Assume that $\max\{\mathbb{E}_{P_1}[Y_1^2], \mathbb{E}_{P_0}[Y_1^2]\} < \infty$ and $Y_1$ is non-arithmetic. Then as $i \to \infty$,
$$\mathbb{E}_{P_0}[T_i] = \frac{\beta_i}{D(P_0\|P_1)} + \frac{A(P_0,P_1)}{D(P_0\|P_1)} + o(1) \quad \text{and} \quad \mathbb{E}_{P_1}[T_i] = \frac{\alpha_i}{D(P_1\|P_0)} + \frac{\tilde{A}(P_0,P_1)}{D(P_1\|P_0)} + o(1).$$
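The expansion in Theorem 2 is consistent with Wald's identity $\mathbb{E}_{P_0}[S_T] = D(P_0\|P_1)\,\mathbb{E}_{P_0}[T]$: for the one-sided passage time, $S_{T(\beta)} = \beta + R$, so $\mathbb{E}_{P_0}[T(\beta)] = (\beta + \mathbb{E}[R])/D(P_0\|P_1)$. The sketch below checks this numerically for the same hypothetical pair $H_0 = \mathrm{Exp}(1)$, $H_1 = \mathrm{Exp}(2)$ used earlier (not from the slides).

```python
import math, random

random.seed(4)
D = 1.0 - math.log(2.0)   # D(P0 || P1) for the hypothetical pair Exp(1) vs Exp(2)
beta = 8.0

ts, rs = [], []
for _ in range(20000):
    s, n = 0.0, 0
    while s <= beta:                              # first passage of the LLR walk over beta
        s += math.log(0.5) + random.expovariate(1.0)
        n += 1
    ts.append(n)
    rs.append(s - beta)                           # overshoot R

mean_T = sum(ts) / len(ts)
mean_R = sum(rs) / len(rs)
predicted = (beta + mean_R) / D                   # Wald's identity prediction for E[T]
print(mean_T, predicted)
```

The overshoot term is exactly what separates `mean_T` from the naive first-order guess `beta / D`, which is the role $A(P_0,P_1)/D(P_0\|P_1)$ plays in Theorem 2.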

13 / 20


Proof of the Main Result

Auxiliary Tools

Theorem 3 (Woodroofe)

Assume that $\max\{\mathbb{E}_{P_1}[Y_1^2], \mathbb{E}_{P_0}[Y_1^2]\} < \infty$ and $Y_1$ is non-arithmetic. Then
$$\lim_{i\to\infty} P_{1|0}(\delta_i, T_i)\, e^{\alpha_i} = e^{\tilde{B}(P_0,P_1)} \quad \text{and} \quad \lim_{i\to\infty} P_{0|1}(\delta_i, T_i)\, e^{\beta_i} = e^{B(P_0,P_1)}.$$

The following lemma characterizes the optimality of the SPRT.

Lemma 4 (Ferguson)

Let $(\delta, T)$ be an SPRT. Let $(\tilde{\delta}, \tilde{T})$ be any SHT such that $\mathbb{E}_{P_0}[\tilde{T}] \le \mathbb{E}_{P_0}[T]$ and $\mathbb{E}_{P_1}[\tilde{T}] \le \mathbb{E}_{P_1}[T]$. Then $P_{0|1}(\delta, T) \le P_{0|1}(\tilde{\delta}, \tilde{T})$ and $P_{1|0}(\delta, T) \le P_{1|0}(\tilde{\delta}, \tilde{T})$.

14 / 20


Proof of the Main Result

Upper Bound on the Error Probabilities

For any $\delta > 0$, let $\alpha_n = nD(P_1\|P_0)\Big(1 - \frac{\tilde{A}(P_0,P_1)}{nD(P_1\|P_0)} - \frac{\delta}{n}\Big)$ and $\beta_n = nD(P_0\|P_1)\Big(1 - \frac{A(P_0,P_1)}{nD(P_0\|P_1)} - \frac{\delta}{n}\Big)$. Consider the SPRT $(\delta_n, T_n)$ with parameters $(\alpha_n, \beta_n)$. As $\alpha_n \to \infty$ and $\beta_n \to \infty$, from Theorem 3, it follows that
$$P_{1|0}(\delta_n, T_n) = P_0(S_{T_n} \le -\alpha_n) = \big(e^{\tilde{B}(P_0,P_1)} + o(1)\big) e^{-\alpha_n},$$
and similarly, $P_{0|1}(\delta_n, T_n) = \big(e^{B(P_0,P_1)} + o(1)\big) e^{-\beta_n}$. We now show that $\mathbb{E}_{P_0}[T_n] \le n$ and $\mathbb{E}_{P_1}[T_n] \le n$. From Theorem 2, it follows that
$$\mathbb{E}_{P_0}[T_n] = \frac{\beta_n}{D(P_0\|P_1)} + \frac{A(P_0,P_1)}{D(P_0\|P_1)} + o(1) = n - \delta + o(1). \tag{2}$$

15 / 20


Proof of the Main Result

Upper Bound on the Error Probabilities

Similarly, $\mathbb{E}_{P_1}[T_n] \le n - \delta + o(1)$. Thus, for sufficiently large $n$, we have $\mathbb{E}_{P_0}[T_n] \le n$ and $\mathbb{E}_{P_1}[T_n] \le n$. It then follows that
$$\overline{F}(\lambda) = \limsup_{n\to\infty} F_n(\lambda) \le \lambda\big(A(P_0,P_1) + \tilde{B}(P_0,P_1)\big) + (1-\lambda)\big(\tilde{A}(P_0,P_1) + B(P_0,P_1)\big) + \big(\lambda D(P_0\|P_1) + (1-\lambda) D(P_1\|P_0)\big)\delta.$$
As $\delta > 0$ is arbitrary, we conclude that
$$\overline{F}(\lambda) \le \lambda\big(A(P_0,P_1) + \tilde{B}(P_0,P_1)\big) + (1-\lambda)\big(\tilde{A}(P_0,P_1) + B(P_0,P_1)\big).$$

16 / 20


Proof of the Main Result

Lower Bound on the Error Probabilities

For any $\delta > 0$, let $\hat{\alpha}_n = nD(P_1\|P_0)\Big(1 - \frac{\tilde{A}(P_0,P_1)}{nD(P_1\|P_0)} + \frac{\delta}{n}\Big)$ and $\hat{\beta}_n = nD(P_0\|P_1)\Big(1 - \frac{A(P_0,P_1)}{nD(P_0\|P_1)} + \frac{\delta}{n}\Big)$. Consider a sequence of SPRTs $\{(\hat{\delta}_n, \hat{T}_n)\}_{n=1}^{\infty}$ with parameters $\{(\hat{\alpha}_n, \hat{\beta}_n)\}_{n=1}^{\infty}$. We now show that for sufficiently large $n$, we have, for $i = 0, 1$, $\mathbb{E}_{P_i}[\hat{T}_n] \ge n + \frac{\delta}{2}$. From Theorem 2 and the choice of $\hat{\beta}_n$, it follows that for sufficiently large $n$,
$$\mathbb{E}_{P_0}[\hat{T}_n] = \frac{\hat{\beta}_n}{D(P_0\|P_1)} + \frac{A(P_0,P_1)}{D(P_0\|P_1)} + o(1) = n + \delta + o(1) \ge n + \frac{\delta}{2} > n.$$
Similarly, for sufficiently large $n$, $\mathbb{E}_{P_1}[\hat{T}_n] \ge n + \frac{\delta}{2} > n$.

17 / 20


Proof of the Main Result

Lower Bound on the Error Probabilities

Then, from Lemma 4, we conclude that for any SHT $(\delta_n, T_n)$ with $\max\{\mathbb{E}_{P_0}[T_n], \mathbb{E}_{P_1}[T_n]\} \le n$, we have $P_{0|1}(\delta_n, T_n) \ge P_{0|1}(\hat{\delta}_n, \hat{T}_n)$ and $P_{1|0}(\delta_n, T_n) \ge P_{1|0}(\hat{\delta}_n, \hat{T}_n)$. From Theorem 3, we have that
$$\log P_{1|0}(\hat{\delta}_n, \hat{T}_n) = \tilde{B}(P_0,P_1) - \hat{\alpha}_n + \log(1 + o(1)) \tag{3}$$
and $\log P_{0|1}(\hat{\delta}_n, \hat{T}_n) = B(P_0,P_1) - \hat{\beta}_n + \log(1 + o(1))$. Combining $P_{1|0}(\delta_n, T_n) \ge P_{1|0}(\hat{\delta}_n, \hat{T}_n)$ and (3), we have that
$$\log P_{1|0}(\delta_n, T_n) + nD(P_1\|P_0) \ge \log P_{1|0}(\hat{\delta}_n, \hat{T}_n) + nD(P_1\|P_0) \ge \tilde{B}(P_0,P_1) + A(P_0,P_1) - \delta D(P_0\|P_1) + \log(1 + o(1)). \tag{4}$$

18 / 20


Proof of the Main Result

Lower Bound on the Error Probabilities

Similarly, we have that
$$\log P_{0|1}(\delta_n, T_n) + nD(P_0\|P_1) \ge B(P_0,P_1) + \tilde{A}(P_0,P_1) - \delta D(P_1\|P_0) + \log(1 + o(1)). \tag{5}$$
As $\lim_{x\to 0} \log(1 + x) = 0$, combining (4) and (5), we have that
$$\underline{F}(\lambda) = \liminf_{n\to\infty} F_n(\lambda) \ge \lambda\big(\tilde{B}(P_0,P_1) + A(P_0,P_1)\big) + (1-\lambda)\big(B(P_0,P_1) + \tilde{A}(P_0,P_1)\big) - \big(\lambda D(P_0\|P_1) + (1-\lambda) D(P_1\|P_0)\big)\delta.$$
Finally, letting $\delta \to 0^+$, we have
$$\underline{F}(\lambda) \ge \lambda\big(\tilde{B}(P_0,P_1) + A(P_0,P_1)\big) + (1-\lambda)\big(B(P_0,P_1) + \tilde{A}(P_0,P_1)\big),$$
as desired.

19 / 20


Thanks for Your Attention!

20 / 20