Lecture 8: Information Theory and Statistics




  1. Lecture 8: Information Theory and Statistics. Part II: Hypothesis Testing and Estimation. I-Hsiang Wang, Department of Electrical Engineering, National Taiwan University, ihwang@ntu.edu.tw. December 22, 2015.

  2. Outline: 1. Hypothesis Testing — Basic Theory; Asymptotics.

  3. Hypothesis Testing: Basic Theory (section divider).

  4. Basic Setup. We begin with the simplest setup, binary hypothesis testing:
     1. Two hypotheses regarding the observation $X$, indexed by $\theta \in \{0, 1\}$:
        $\mathcal{H}_0 : X \sim P_0$ (null hypothesis, $\theta = 0$),
        $\mathcal{H}_1 : X \sim P_1$ (alternative hypothesis, $\theta = 1$).
     2. Goal: design a decision-making algorithm $\phi : \mathcal{X} \to \{0, 1\}$, $x \mapsto \hat\theta$, to choose one of the two hypotheses based on the observed realization of $X$, so that a certain cost (or risk) is minimized.
     3. A popular measure of the cost is based on probabilities of error:
        Probability of false alarm (false positive; type I error): $\alpha_\phi \equiv P_{\mathrm{FA}}(\phi) \triangleq P\{\mathcal{H}_1 \text{ is chosen} \mid \mathcal{H}_0\}$.
        Probability of miss detection (false negative; type II error): $\beta_\phi \equiv P_{\mathrm{MD}}(\phi) \triangleq P\{\mathcal{H}_0 \text{ is chosen} \mid \mathcal{H}_1\}$.

  5. Deterministic Testing Algorithm ≡ Decision Regions. A test $\phi : \mathcal{X} \to \{0, 1\}$ is equivalently characterized by its corresponding acceptance (decision) regions:
     $\mathcal{A}_{\hat\theta}(\phi) \equiv \phi^{-1}(\hat\theta) \triangleq \{x \in \mathcal{X} : \phi(x) = \hat\theta\}, \quad \hat\theta = 0, 1,$
     where $\mathcal{A}_1(\phi)$ is the acceptance region of $\mathcal{H}_1$ and $\mathcal{A}_0(\phi)$ is the acceptance region of $\mathcal{H}_0$. Hence, the two types of error probability can be equivalently represented as
     $\alpha_\phi = \sum_{x \in \mathcal{X}} \phi(x)\, P_0(x) = \sum_{x \in \mathcal{A}_1(\phi)} P_0(x), \qquad \beta_\phi = \sum_{x \in \mathcal{X}} (1 - \phi(x))\, P_1(x) = \sum_{x \in \mathcal{A}_0(\phi)} P_1(x).$
     When the context is clear, we often drop the dependency on the test $\phi$ when dealing with the acceptance regions $\mathcal{A}_{\hat\theta}$.
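To make the definitions concrete, here is a minimal Python sketch (the 4-symbol alphabet and the distributions P0, P1 are illustrative choices, not from the slides) that computes $\alpha_\phi$ and $\beta_\phi$ of a deterministic test from its acceptance regions:

```python
# Toy binary hypothesis testing instance (illustrative values).
X = ['a', 'b', 'c', 'd']                        # observation space
P0 = {'a': 0.4, 'b': 0.3, 'c': 0.2, 'd': 0.1}   # H0: X ~ P0
P1 = {'a': 0.1, 'b': 0.2, 'c': 0.3, 'd': 0.4}   # H1: X ~ P1

# A deterministic test phi: X -> {0, 1}; its acceptance regions are
# A1 = phi^{-1}(1) (accept H1) and A0 = phi^{-1}(0) (accept H0).
phi = {'a': 0, 'b': 0, 'c': 1, 'd': 1}

alpha = sum(P0[x] for x in X if phi[x] == 1)    # P_FA: P0-mass of A1
beta  = sum(P1[x] for x in X if phi[x] == 0)    # P_MD: P1-mass of A0
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")  # alpha = 0.30, beta = 0.30
```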

  6. Definition 1 (Likelihood Ratio Test). A (deterministic) likelihood ratio test (LRT) is a test $\phi_\tau$, parametrized by a constant $\tau > 0$ (called the threshold), defined as follows:
     $\phi_\tau(x) = \begin{cases} 1 & \text{if } P_1(x) > \tau P_0(x) \\ 0 & \text{if } P_1(x) \le \tau P_0(x). \end{cases}$
     For $x \in \operatorname{supp} P_0$, the likelihood ratio is $L(x) \triangleq \frac{P_1(x)}{P_0(x)}$. Hence, the LRT is a thresholding algorithm on the likelihood ratio $L(x)$.
     Remark: For computational convenience, one often works with the log-likelihood ratio (LLR) $\log L(x) = \log P_1(x) - \log P_0(x)$.
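A minimal sketch of the LRT and its LLR form, reusing the illustrative P0, P1 from the previous snippet (redefined here so the snippet runs on its own):

```python
import math

X = ['a', 'b', 'c', 'd']
P0 = {'a': 0.4, 'b': 0.3, 'c': 0.2, 'd': 0.1}
P1 = {'a': 0.1, 'b': 0.2, 'c': 0.3, 'd': 0.4}

def lrt(x, tau):
    """Deterministic LRT: decide 1 iff P1(x) > tau * P0(x), i.e. L(x) > tau."""
    return 1 if P1[x] > tau * P0[x] else 0

def llr(x):
    """Log-likelihood ratio: log L(x) = log P1(x) - log P0(x)."""
    return math.log(P1[x]) - math.log(P0[x])

tau = 1.0
for x in X:
    # Thresholding L(x) at tau is the same as thresholding the LLR at log(tau).
    assert lrt(x, tau) == int(llr(x) > math.log(tau))
    print(x, lrt(x, tau), round(llr(x), 3))
```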

  7. Theorem 1 (Neyman-Pearson Lemma): Trade-Off Between $\alpha$ ($P_{\mathrm{FA}}$) and $\beta$ ($P_{\mathrm{MD}}$). For a likelihood ratio test $\phi_\tau$ and any other deterministic test $\phi$,
     $\alpha_\phi \le \alpha_{\phi_\tau} \implies \beta_\phi \ge \beta_{\phi_\tau}.$
     Proof: Observe that $\forall x \in \mathcal{X}$, $0 \le (\phi_\tau(x) - \phi(x))(P_1(x) - \tau P_0(x))$, because
     if $P_1(x) - \tau P_0(x) > 0 \implies \phi_\tau(x) = 1 \implies \phi_\tau(x) - \phi(x) \ge 0$;
     if $P_1(x) - \tau P_0(x) \le 0 \implies \phi_\tau(x) = 0 \implies \phi_\tau(x) - \phi(x) \le 0$.
     Summing over all $x \in \mathcal{X}$, we get
     $0 \le (1 - \beta_{\phi_\tau}) - (1 - \beta_\phi) - \tau(\alpha_{\phi_\tau} - \alpha_\phi) = (\beta_\phi - \beta_{\phi_\tau}) + \tau(\alpha_\phi - \alpha_{\phi_\tau}).$
     Since $\tau > 0$, from the above we conclude that $\alpha_\phi \le \alpha_{\phi_\tau} \implies \beta_\phi \ge \beta_{\phi_\tau}$.
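The lemma can be checked by brute force on a small alphabet. The sketch below (same illustrative P0, P1) enumerates all $2^4$ deterministic tests and confirms that none beats the LRT on both error probabilities:

```python
from itertools import product

X = ['a', 'b', 'c', 'd']
P0 = {'a': 0.4, 'b': 0.3, 'c': 0.2, 'd': 0.1}
P1 = {'a': 0.1, 'b': 0.2, 'c': 0.3, 'd': 0.4}

def errors(phi):
    alpha = sum(P0[x] for x in X if phi[x] == 1)   # false alarm
    beta  = sum(P1[x] for x in X if phi[x] == 0)   # miss detection
    return alpha, beta

tau = 1.0
lrt = {x: 1 if P1[x] > tau * P0[x] else 0 for x in X}
a_lrt, b_lrt = errors(lrt)

# Any deterministic test with alpha <= alpha_LRT must have beta >= beta_LRT.
for bits in product([0, 1], repeat=len(X)):
    a, b = errors(dict(zip(X, bits)))
    assert not (a <= a_lrt and b < b_lrt)
print("Neyman-Pearson lemma verified on this instance.")
```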

  8. Question: What is the optimal trade-off curve between $\alpha$ ($P_{\mathrm{FA}}$) and $\beta$ ($P_{\mathrm{MD}}$)? What is the optimal test achieving the curve?
     [Figure: two sketches of the $(\alpha, \beta)$ plane, both axes running from 0 to 1, showing candidate trade-off curves between $\alpha$ ($P_{\mathrm{FA}}$) and $\beta$ ($P_{\mathrm{MD}}$).]
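One way to visualize the achievable region before answering: sweep the LRT threshold and record the resulting $(\alpha, \beta)$ pairs. A minimal sketch with the same illustrative P0, P1:

```python
X = ['a', 'b', 'c', 'd']
P0 = {'a': 0.4, 'b': 0.3, 'c': 0.2, 'd': 0.1}
P1 = {'a': 0.1, 'b': 0.2, 'c': 0.3, 'd': 0.4}
L = {x: P1[x] / P0[x] for x in X}   # likelihood ratios 0.25, 0.67, 1.5, 4.0

# Each threshold gives a corner point of the trade-off curve; randomizing
# on {x : L(x) = tau} (next slide) fills in the segments between corners.
for tau in sorted(set(L.values())) + [5.0]:
    A1 = [x for x in X if L[x] > tau]          # acceptance region of H1
    alpha = sum(P0[x] for x in A1)
    beta = 1.0 - sum(P1[x] for x in A1)
    print(f"tau = {tau:4.2f}: alpha = {alpha:.2f}, beta = {beta:.2f}")
```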

  9. Definition 2 (Randomized Test). A randomized test decides $\hat\theta = 1$ with probability $\phi(x)$ and $\hat\theta = 0$ with probability $1 - \phi(x)$, where $\phi$ is a mapping $\phi : \mathcal{X} \to [0, 1]$.
     Note: A randomized test is characterized by $\phi$, as in deterministic tests. Randomized tests include deterministic tests as special cases.
     Definition 3 (Randomized LRT). A randomized likelihood ratio test (LRT) is a test $\phi_{\tau,\gamma}$, parametrized by constants $\tau > 0$ and $\gamma \in (0, 1)$, defined as follows:
     $\phi_{\tau,\gamma}(x) = \begin{cases} 1 & \text{if } P_1(x) > \tau P_0(x) \\ \gamma & \text{if } P_1(x) = \tau P_0(x) \\ 0 & \text{if } P_1(x) < \tau P_0(x). \end{cases}$

  10. Randomized LRT Achieves the Optimal Trade-Off. Neyman-Pearson Problem: consider the following optimization problem:
     minimize $\beta_\phi$ over $\phi : \mathcal{X} \to [0, 1]$, subject to $\alpha_\phi \le \alpha^*$.
     Theorem 2 (Neyman-Pearson). A randomized LRT $\phi_{\tau^*, \gamma^*}$ with the parameters $(\tau^*, \gamma^*)$ satisfying $\alpha^* = \alpha_{\phi_{\tau^*, \gamma^*}}$ attains optimality for the Neyman-Pearson problem.

  11. Proof: First argue that for any $\alpha^* \in (0, 1)$, one can find $(\tau^*, \gamma^*)$ such that
     $\alpha^* = \alpha_{\phi_{\tau^*, \gamma^*}} = \sum_{x \in \mathcal{X}} \phi_{\tau^*, \gamma^*}(x)\, P_0(x) = \sum_{x : L(x) > \tau^*} P_0(x) + \gamma^* \sum_{x : L(x) = \tau^*} P_0(x).$
     For any test $\phi$, by an argument similar to that of Theorem 1, we have $\forall x \in \mathcal{X}$, $(\phi_{\tau^*, \gamma^*}(x) - \phi(x))(P_1(x) - \tau^* P_0(x)) \ge 0$. Summing over all $x \in \mathcal{X}$, we similarly get
     $(\beta_\phi - \beta_{\phi_{\tau^*, \gamma^*}}) + \tau^* (\alpha_\phi - \alpha_{\phi_{\tau^*, \gamma^*}}) \ge 0.$
     Hence, for any feasible test $\phi$ with $\alpha_\phi \le \alpha^* = \alpha_{\phi_{\tau^*, \gamma^*}}$, its probability of type II error satisfies $\beta_\phi \ge \beta_{\phi_{\tau^*, \gamma^*}}$.
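The first step of the proof is constructive. A minimal sketch (same illustrative P0, P1) that finds $(\tau^*, \gamma^*)$ hitting a target $\alpha^*$ exactly:

```python
X = ['a', 'b', 'c', 'd']
P0 = {'a': 0.4, 'b': 0.3, 'c': 0.2, 'd': 0.1}
P1 = {'a': 0.1, 'b': 0.2, 'c': 0.3, 'd': 0.4}
L = {x: P1[x] / P0[x] for x in X}

def np_parameters(alpha_star):
    """Find (tau*, gamma*) so that the randomized LRT has P_FA = alpha*."""
    for tau in sorted(set(L.values()), reverse=True):
        above = sum(P0[x] for x in X if L[x] > tau)    # strict part of alpha
        at    = sum(P0[x] for x in X if L[x] == tau)   # boundary P0-mass
        if above + at >= alpha_star:
            gamma = (alpha_star - above) / at          # randomize on the boundary
            return tau, gamma
    raise ValueError("alpha* not achievable with tau > 0")

tau_s, gamma_s = np_parameters(0.2)
print(tau_s, gamma_s)   # tau* = 1.5, gamma* = 0.5: flip a fair coin on 'c'
```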

  12. Bayesian Setup. Sometimes prior probabilities of the two hypotheses are known:
     $\pi_\theta \triangleq P\{\mathcal{H}_\theta \text{ is true}\}, \quad \theta = 0, 1, \quad \pi_0 + \pi_1 = 1.$
     In this sense, one can view the index $\Theta$ as a (binary) random variable with (prior) distribution $P\{\Theta = \theta\} = \pi_\theta$, for $\theta = 0, 1$. With prior probabilities, it then makes sense to talk about the average probability of error for a test $\phi$, or more generally, the average cost (risk):
     $P_e(\phi) \triangleq \pi_0 \alpha_\phi + \pi_1 \beta_\phi = \mathbb{E}_{\Theta, X}\!\left[\mathbf{1}\{\Theta \ne \hat\Theta\}\right], \qquad R(\phi) \triangleq \mathbb{E}_{\Theta, X}\!\left[r_{\Theta, \hat\Theta}\right].$
     The Bayesian hypothesis testing problem is to test the two hypotheses with knowledge of the prior probabilities so that the average probability of error (or, in general, a risk function) is minimized.

  13. Minimizing Bayes Risk. Bayesian Problem: consider the following problem of minimizing the Bayes risk, with known priors $(\pi_0, \pi_1)$ and costs $r_{\theta, \hat\theta}$:
     minimize $R(\phi) \triangleq \mathbb{E}_{\Theta, X}\!\left[r_{\Theta, \hat\Theta}\right]$ over $\phi : \mathcal{X} \to [0, 1]$.
     Theorem 3 (LRT is an Optimal Bayesian Test). Assume $r_{0,0} < r_{0,1}$ and $r_{1,1} < r_{1,0}$. A deterministic LRT $\phi_{\tau^*}$ with threshold
     $\tau^* = \frac{(r_{0,1} - r_{0,0})\, \pi_0}{(r_{1,0} - r_{1,1})\, \pi_1}$
     attains optimality for the Bayesian problem.

  14. Proof:
     $R(\phi) = \sum_{x \in \mathcal{X}} r_{0,0}\, \pi_0 P_0(x) (1 - \phi(x)) + \sum_{x \in \mathcal{X}} r_{0,1}\, \pi_0 P_0(x)\, \phi(x) + \sum_{x \in \mathcal{X}} r_{1,0}\, \pi_1 P_1(x) (1 - \phi(x)) + \sum_{x \in \mathcal{X}} r_{1,1}\, \pi_1 P_1(x)\, \phi(x)$
     $= r_{0,0}\, \pi_0 + r_{1,0}\, \pi_1 + \underbrace{\sum_{x \in \mathcal{X}} \left[(r_{0,1} - r_{0,0})\, \pi_0 P_0(x) - (r_{1,0} - r_{1,1})\, \pi_1 P_1(x)\right] \phi(x)}_{(*)}.$
     For each $x \in \mathcal{X}$, we shall choose $\phi(x) \in [0, 1]$ such that $(*)$ is minimized. It is then obvious that we should choose
     $\phi(x) = \begin{cases} 1 & \text{if } (r_{0,1} - r_{0,0})\, \pi_0 P_0(x) - (r_{1,0} - r_{1,1})\, \pi_1 P_1(x) < 0 \\ 0 & \text{if } (r_{0,1} - r_{0,0})\, \pi_0 P_0(x) - (r_{1,0} - r_{1,1})\, \pi_1 P_1(x) \ge 0, \end{cases}$
     which is exactly the deterministic LRT $\phi_{\tau^*}$ with $\tau^* = \frac{(r_{0,1} - r_{0,0})\, \pi_0}{(r_{1,0} - r_{1,1})\, \pi_1}$.
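A minimal numerical check of Theorem 3 (same illustrative P0, P1; the priors and cost matrix below are also illustrative choices, with 0-1 costs so the risk equals $P_e$): compute $\tau^*$, run the LRT, and verify by enumerating all deterministic tests that nothing achieves lower Bayes risk:

```python
from itertools import product

X = ['a', 'b', 'c', 'd']
P0 = {'a': 0.4, 'b': 0.3, 'c': 0.2, 'd': 0.1}
P1 = {'a': 0.1, 'b': 0.2, 'c': 0.3, 'd': 0.4}
pi0, pi1 = 0.6, 0.4                          # priors (illustrative)
r00, r01, r10, r11 = 0.0, 1.0, 1.0, 0.0      # 0-1 costs: risk = P_e

def bayes_risk(phi):
    return sum(pi0 * P0[x] * (r00 * (1 - phi[x]) + r01 * phi[x])
             + pi1 * P1[x] * (r10 * (1 - phi[x]) + r11 * phi[x]) for x in X)

tau_star = (r01 - r00) * pi0 / ((r10 - r11) * pi1)          # = 1.5 here
lrt = {x: 1 if P1[x] > tau_star * P0[x] else 0 for x in X}

best = min(bayes_risk(dict(zip(X, bits)))
           for bits in product([0, 1], repeat=len(X)))
assert abs(bayes_risk(lrt) - best) < 1e-12
print(f"tau* = {tau_star}, minimal Bayes risk = {bayes_risk(lrt):.2f}")  # 0.30
```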

  15. Discussion. For binary hypothesis testing problems, the likelihood ratio $L(x) \triangleq \frac{P_1(x)}{P_0(x)}$ turns out to be a sufficient statistic. Moreover, a likelihood ratio test (LRT) is optimal in both the Bayesian and Neyman-Pearson settings. Extensions include: $M$-ary hypothesis testing; minimax risk optimization (with unknown prior); composite hypothesis testing; etc. Here we do not pursue these directions further. Instead, we would like to explore the asymptotic behavior of hypothesis testing, and the connection with information-theoretic tools.

  16. Hypothesis Testing: Asymptotics (section divider).
