 
Hypothesis Testing
Saravanan Vijayakumaran
sarva@ee.iitb.ac.in
Department of Electrical Engineering
Indian Institute of Technology Bombay
August 22, 2012
Basics of Hypothesis Testing
What is a Hypothesis?
• A hypothesis is one situation among a set of possible situations
Example (Radar): EM waves are transmitted and their reflections are observed.
• Null hypothesis: plane absent
• Alternative hypothesis: plane present
For a given set of observations, either hypothesis may be true.
What is Hypothesis Testing?
• A statistical framework for deciding which hypothesis is true
• Under each hypothesis, the observations are assumed to have a known distribution
• Consider the case of two hypotheses (binary hypothesis testing):
  $H_0: Y \sim P_0$
  $H_1: Y \sim P_1$
  where $Y$ is the random observation vector belonging to the observation set $\Gamma \subseteq \mathbb{R}^n$ for $n \in \mathbb{N}$
• The hypotheses are assumed to occur with given prior probabilities
  $\Pr(H_0 \text{ is true}) = \pi_0, \qquad \Pr(H_1 \text{ is true}) = \pi_1$, where $\pi_0 + \pi_1 = 1$.
Location Testing with Gaussian Error
• Let the observation set be $\Gamma = \mathbb{R}$ and let $\mu > 0$:
  $H_0: Y \sim \mathcal{N}(-\mu, \sigma^2)$
  $H_1: Y \sim \mathcal{N}(\mu, \sigma^2)$
[Figure: densities $p_0(y)$ and $p_1(y)$ plotted against $y$, centered at $-\mu$ and $\mu$]
• Any point in $\Gamma$ can be generated under both $H_0$ and $H_1$
• What is a good decision rule for this hypothesis testing problem that takes the prior probabilities into account?
What is a Decision Rule?
• A decision rule for binary hypothesis testing is a partition of $\Gamma$ into $\Gamma_0$ and $\Gamma_1$ such that
  $$\delta(y) = \begin{cases} 0 & \text{if } y \in \Gamma_0 \\ 1 & \text{if } y \in \Gamma_1 \end{cases}$$
  We decide $H_i$ is true when $\delta(y) = i$ for $i \in \{0, 1\}$
• For the location testing with Gaussian error problem, one possible decision rule is
  $\Gamma_0 = (-\infty, 0], \qquad \Gamma_1 = (0, \infty)$
  and another possible decision rule is
  $\Gamma_0 = (-\infty, -100) \cup (-50, 0), \qquad \Gamma_1 = [-100, -50] \cup [0, \infty)$
• Given that partitions of the observation set define decision rules, what is the optimal partition?
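As a minimal Python sketch, the two example partitions can be written as decision functions $\delta(y)$ that return the index of the chosen hypothesis. The names `delta_a` and `delta_b` are hypothetical labels for the first and second rules.

```python
def delta_a(y: float) -> int:
    """First rule: Gamma_0 = (-inf, 0], Gamma_1 = (0, inf)."""
    return 0 if y <= 0 else 1


def delta_b(y: float) -> int:
    """Second rule: Gamma_0 = (-inf, -100) U (-50, 0), Gamma_1 = [-100, -50] U [0, inf)."""
    if y < -100 or -50 < y < 0:
        return 0
    return 1


# Each function assigns every real y to exactly one hypothesis index.
for y in (-120.0, -75.0, -10.0, 0.0, 3.0):
    print(y, delta_a(y), delta_b(y))
```

Both functions are valid decision rules because the regions are disjoint and cover $\mathbb{R}$; which one is better is the question the next slides answer.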
Which is the Optimal Decision Rule?
• Minimizing the probability of decision error gives the optimal decision rule
• For the binary hypothesis testing problem of $H_0$ versus $H_1$, the conditional decision error probability given that $H_i$ is true is
  $$P_{e|i} = \Pr[\text{Deciding } H_{1-i} \text{ is true} \mid H_i \text{ is true}] = \Pr[Y \in \Gamma_{1-i} \mid H_i] = 1 - \Pr[Y \in \Gamma_i \mid H_i] = 1 - P_{c|i}$$
• The probability of decision error is $P_e = \pi_0 P_{e|0} + \pi_1 P_{e|1}$
• The probability of correct decision is $P_c = \pi_0 P_{c|0} + \pi_1 P_{c|1} = 1 - P_e$
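For interval-shaped partitions such as the two example rules, $P_{e|0}$ and $P_{e|1}$ can be computed exactly from the Gaussian CDF. This sketch evaluates $P_e$ for both rules on the location-testing problem $H_0: \mathcal{N}(-\mu, \sigma^2)$ versus $H_1: \mathcal{N}(\mu, \sigma^2)$, assuming equal priors and the illustrative values $\mu = 20$, $\sigma = 40$ (not from the slides). The helper names `interval_prob` and `error_probability` are hypothetical; the code uses only the standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

INF = float("inf")


def interval_prob(dist, intervals):
    """Pr[Y in a union of disjoint intervals (a, b)] for continuous Y ~ dist.

    Open vs. closed endpoints do not matter for a continuous distribution,
    and NormalDist.cdf evaluates to 1.0 at +inf and 0.0 at -inf.
    """
    return sum(dist.cdf(b) - dist.cdf(a) for a, b in intervals)


def error_probability(gamma1, p0, p1, pi0):
    """P_e = pi0 * P_{e|0} + pi1 * P_{e|1}; gamma1 lists the intervals of Gamma_1."""
    pe0 = interval_prob(p0, gamma1)        # Pr[Y in Gamma_1 | H_0]
    pe1 = 1.0 - interval_prob(p1, gamma1)  # Pr[Y in Gamma_0 | H_1] = 1 - P_{c|1}
    return pi0 * pe0 + (1.0 - pi0) * pe1


mu, sigma = 20.0, 40.0                      # illustrative values
p0, p1 = NormalDist(-mu, sigma), NormalDist(mu, sigma)
rule_a = [(0.0, INF)]                       # Gamma_1 = (0, inf)
rule_b = [(-100.0, -50.0), (0.0, INF)]      # Gamma_1 = [-100, -50] U [0, inf)
pe_a = error_probability(rule_a, p0, p1, 0.5)
pe_b = error_probability(rule_b, p0, p1, 0.5)
print(pe_a, pe_b)
```

With these values the second rule has a clearly larger error probability than the first, because it places in $\Gamma_1$ the interval $[-100, -50]$, where $p_0(y)$ exceeds $p_1(y)$.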
Which is the Optimal Decision Rule?
• Maximizing the probability of correct decision minimizes the probability of decision error
• The probability of correct decision is
  $$P_c = \pi_0 P_{c|0} + \pi_1 P_{c|1} = \pi_0 \int_{y \in \Gamma_0} p_0(y)\,dy + \pi_1 \int_{y \in \Gamma_1} p_1(y)\,dy$$
• If a point $y$ in $\Gamma$ belongs to $\Gamma_i$, its contribution to $P_c$ is proportional to $\pi_i p_i(y)$
• To maximize $P_c$, we choose the partition $\{\Gamma_0, \Gamma_1\}$ as
  $\Gamma_0 = \{y \in \Gamma \mid \pi_0 p_0(y) \ge \pi_1 p_1(y)\}$
  $\Gamma_1 = \{y \in \Gamma \mid \pi_1 p_1(y) > \pi_0 p_0(y)\}$
• The points $y$ for which $\pi_0 p_0(y) = \pi_1 p_1(y)$ can be placed in either $\Gamma_0$ or $\Gamma_1$ (the optimal decision rule is not unique)
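The optimal partition amounts to a pointwise comparison of $\pi_0 p_0(y)$ and $\pi_1 p_1(y)$. A minimal sketch, assuming the densities are passed in as Python callables; `mpe_binary` is a hypothetical name, and the example reuses the Gaussian location problem with the illustrative values $\mu = \sigma = 1$. Ties go to $H_0$, matching the $\ge$ in the definition of $\Gamma_0$.

```python
from statistics import NormalDist


def mpe_binary(y, pi0, p0, p1):
    """Decide 0 if pi0*p0(y) >= pi1*p1(y), else 1 (ties go to H_0, as in Gamma_0)."""
    pi1 = 1.0 - pi0
    return 0 if pi0 * p0(y) >= pi1 * p1(y) else 1


# Illustrative location problem with mu = 1, sigma = 1: H_0: N(-1, 1), H_1: N(1, 1).
p0 = NormalDist(-1.0, 1.0).pdf
p1 = NormalDist(1.0, 1.0).pdf
decisions = [mpe_binary(y, 0.5, p0, p1) for y in (-2.0, -0.1, 0.0, 0.1, 2.0)]
print(decisions)
```

With equal priors this reproduces the rule $\Gamma_0 = (-\infty, 0]$, $\Gamma_1 = (0, \infty)$; changing `pi0` moves the decision boundary.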
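Location Testing with Gaussian Error
• Let $\mu_1 > \mu_0$ and $\pi_0 = \pi_1 = \frac{1}{2}$
  $H_0: Y = \mu_0 + Z$
  $H_1: Y = \mu_1 + Z$
  where $Z \sim \mathcal{N}(0, \sigma^2)$
[Figure: densities $p_0(y)$ and $p_1(y)$ plotted against $y$, centered at $\mu_0$ and $\mu_1$]
  $$p_0(y) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y-\mu_0)^2}{2\sigma^2}}, \qquad p_1(y) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y-\mu_1)^2}{2\sigma^2}}$$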
Location Testing with Gaussian Error
• The optimal decision rule is given by the partition $\{\Gamma_0, \Gamma_1\}$
  $\Gamma_0 = \{y \in \Gamma \mid \pi_0 p_0(y) \ge \pi_1 p_1(y)\}$
  $\Gamma_1 = \{y \in \Gamma \mid \pi_1 p_1(y) > \pi_0 p_0(y)\}$
• For $\pi_0 = \pi_1 = \frac{1}{2}$
  $$\Gamma_0 = \left\{ y \in \Gamma \,\middle|\, y \le \frac{\mu_1 + \mu_0}{2} \right\}, \qquad \Gamma_1 = \left\{ y \in \Gamma \,\middle|\, y > \frac{\mu_1 + \mu_0}{2} \right\}$$
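The midpoint threshold follows from comparing the two Gaussian densities directly: the equal priors and the common factor $\frac{1}{\sqrt{2\pi\sigma^2}}$ cancel, the exponential is increasing, and $\mu_1 > \mu_0$ fixes the direction of the final inequality. A short worked derivation:

```latex
\begin{align*}
\tfrac{1}{2} p_0(y) \ge \tfrac{1}{2} p_1(y)
  &\iff e^{-\frac{(y-\mu_0)^2}{2\sigma^2}} \ge e^{-\frac{(y-\mu_1)^2}{2\sigma^2}}
   \iff (y-\mu_1)^2 - (y-\mu_0)^2 \ge 0 \\
  &\iff (\mu_0-\mu_1)(2y-\mu_0-\mu_1) \ge 0
   \iff y \le \frac{\mu_0+\mu_1}{2} \qquad (\text{since } \mu_0-\mu_1 < 0)
\end{align*}
```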
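Location Testing with Gaussian Error
[Figure: $p_0(y)$ and $p_1(y)$ centered at $\mu_0$ and $\mu_1$, with the threshold $\frac{\mu_0+\mu_1}{2}$ and the error regions $P_{e|0}$ and $P_{e|1}$ shaded]
$$P_{e|0} = \Pr\left[ Y > \frac{\mu_0+\mu_1}{2} \,\middle|\, H_0 \right] = Q\left( \frac{\mu_1-\mu_0}{2\sigma} \right)$$
$$P_{e|1} = \Pr\left[ Y \le \frac{\mu_0+\mu_1}{2} \,\middle|\, H_1 \right] = \Phi\left( \frac{\mu_0-\mu_1}{2\sigma} \right) = Q\left( \frac{\mu_1-\mu_0}{2\sigma} \right)$$
$$P_e = \pi_0 P_{e|0} + \pi_1 P_{e|1} = Q\left( \frac{\mu_1-\mu_0}{2\sigma} \right)$$
where $\Phi$ is the standard Gaussian CDF and $Q(x) = 1 - \Phi(x)$. This is the minimum $P_e$ for $\pi_0 = \pi_1 = \frac{1}{2}$.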
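The closed form can be checked numerically. This sketch assumes the illustrative values $\mu_0 = 0$, $\mu_1 = 2$, $\sigma = 1$ (so $P_e = Q(1)$), computes $Q$ via `math.erfc` using $Q(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$, and estimates $P_e$ by Monte Carlo simulation of the midpoint rule with equal priors. The helper names `q_function` and `simulate_pe` are hypothetical.

```python
import math
import random


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def simulate_pe(mu0, mu1, sigma, trials=200_000, seed=1):
    """Monte Carlo estimate of P_e for the midpoint rule with equal priors."""
    rng = random.Random(seed)
    tau = (mu0 + mu1) / 2.0
    errors = 0
    for _ in range(trials):
        h = rng.randint(0, 1)                          # pi_0 = pi_1 = 1/2
        y = (mu1 if h else mu0) + rng.gauss(0.0, sigma)
        decision = 1 if y > tau else 0
        errors += decision != h
    return errors / trials


mu0, mu1, sigma = 0.0, 2.0, 1.0                        # illustrative values
exact = q_function((mu1 - mu0) / (2 * sigma))          # closed form Q(1)
estimate = simulate_pe(mu0, mu1, sigma)
print(exact, estimate)
```

The fixed seed makes the estimate reproducible; its statistical error is on the order of $10^{-3}$ for $2 \times 10^5$ trials.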
Location Testing with Gaussian Error
• Suppose $\pi_0 \neq \pi_1$
• The optimal decision rule is still given by the partition $\{\Gamma_0, \Gamma_1\}$
  $\Gamma_0 = \{y \in \Gamma \mid \pi_0 p_0(y) \ge \pi_1 p_1(y)\}$
  $\Gamma_1 = \{y \in \Gamma \mid \pi_1 p_1(y) > \pi_0 p_0(y)\}$
• The partitions specialized to this problem are
  $$\Gamma_0 = \left\{ y \in \Gamma \,\middle|\, y \le \frac{\mu_1+\mu_0}{2} + \frac{\sigma^2}{\mu_1-\mu_0} \log\frac{\pi_0}{\pi_1} \right\}$$
  $$\Gamma_1 = \left\{ y \in \Gamma \,\middle|\, y > \frac{\mu_1+\mu_0}{2} + \frac{\sigma^2}{\mu_1-\mu_0} \log\frac{\pi_0}{\pi_1} \right\}$$
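The shifted threshold comes from taking logarithms of both sides of $\pi_0 p_0(y) \ge \pi_1 p_1(y)$ (the common factor $\frac{1}{\sqrt{2\pi\sigma^2}}$ cancels) and using $\mu_1 > \mu_0$. A worked derivation; setting $\pi_0 = \pi_1$ recovers the midpoint rule:

```latex
\begin{align*}
\pi_0 p_0(y) \ge \pi_1 p_1(y)
  &\iff \log\pi_0 - \frac{(y-\mu_0)^2}{2\sigma^2} \ge \log\pi_1 - \frac{(y-\mu_1)^2}{2\sigma^2} \\
  &\iff \log\frac{\pi_0}{\pi_1} \ge \frac{(y-\mu_0)^2 - (y-\mu_1)^2}{2\sigma^2}
   = \frac{(\mu_1-\mu_0)(2y-\mu_0-\mu_1)}{2\sigma^2} \\
  &\iff y \le \frac{\mu_1+\mu_0}{2} + \frac{\sigma^2}{\mu_1-\mu_0}\log\frac{\pi_0}{\pi_1}
   \qquad \Big(\text{multiplying by } \tfrac{\sigma^2}{\mu_1-\mu_0} > 0\Big)
\end{align*}
```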
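Location Testing with Gaussian Error
Suppose $\pi_0 = 0.6$ and $\pi_1 = 0.4$:
$$\tau = \frac{\mu_1+\mu_0}{2} + \frac{\sigma^2}{\mu_1-\mu_0} \log\frac{\pi_0}{\pi_1} \approx \frac{\mu_1+\mu_0}{2} + \frac{0.4055\,\sigma^2}{\mu_1-\mu_0}$$
[Figure: error regions $P_{e|0}$ and $P_{e|1}$ with the threshold $\tau$ between $\mu_0$ and $\mu_1$, shifted toward $\mu_1$]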
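Location Testing with Gaussian Error
Suppose $\pi_0 = 0.4$ and $\pi_1 = 0.6$:
$$\tau = \frac{\mu_1+\mu_0}{2} + \frac{\sigma^2}{\mu_1-\mu_0} \log\frac{\pi_0}{\pi_1} \approx \frac{\mu_1+\mu_0}{2} - \frac{0.4055\,\sigma^2}{\mu_1-\mu_0}$$
[Figure: error regions $P_{e|0}$ and $P_{e|1}$ with the threshold $\tau$ between $\mu_0$ and $\mu_1$, shifted toward $\mu_0$]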
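The threshold formula can be evaluated for both prior pairs. This sketch assumes the illustrative values $\mu_0 = 0$, $\mu_1 = 2$, $\sigma = 1$; `threshold` and `pe_at` are hypothetical helpers, where `pe_at(t, ...)` is the error probability of the rule that decides $H_1$ if and only if $y > t$. It also compares $P_e$ at $\tau$ with $P_e$ at the midpoint $\frac{\mu_0+\mu_1}{2}$.

```python
import math
from statistics import NormalDist


def threshold(mu0, mu1, sigma, pi0):
    """MPE threshold tau for H_0: N(mu0, sigma^2) vs H_1: N(mu1, sigma^2), with mu1 > mu0."""
    pi1 = 1.0 - pi0
    return (mu0 + mu1) / 2.0 + sigma**2 / (mu1 - mu0) * math.log(pi0 / pi1)


def pe_at(t, mu0, mu1, sigma, pi0):
    """Error probability of the rule 'decide H_1 iff y > t'."""
    pe0 = 1.0 - NormalDist(mu0, sigma).cdf(t)  # P_{e|0} = Pr[Y > t | H_0]
    pe1 = NormalDist(mu1, sigma).cdf(t)        # P_{e|1} = Pr[Y <= t | H_1]
    return pi0 * pe0 + (1.0 - pi0) * pe1


mu0, mu1, sigma = 0.0, 2.0, 1.0  # illustrative values
midpoint = (mu0 + mu1) / 2.0
print("log(0.6/0.4) =", math.log(0.6 / 0.4))
for pi0 in (0.6, 0.4):
    tau = threshold(mu0, mu1, sigma, pi0)
    print(f"pi0={pi0}: tau={tau:.4f}, "
          f"P_e(tau)={pe_at(tau, mu0, mu1, sigma, pi0):.4f}, "
          f"P_e(midpoint)={pe_at(midpoint, mu0, mu1, sigma, pi0):.4f}")
```

For $\pi_0 = 0.6$ the threshold moves above the midpoint (toward $\mu_1$), and for $\pi_0 = 0.4$ it moves below (toward $\mu_0$); in both cases the error probability at $\tau$ is smaller than that of the midpoint rule.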
M-ary Hypothesis Testing
• $M$ hypotheses with prior probabilities $\pi_i$, $i = 1, \ldots, M$
  $H_1: Y \sim P_1$
  $H_2: Y \sim P_2$
  $\vdots$
  $H_M: Y \sim P_M$
• A decision rule for M-ary hypothesis testing is a partition of $\Gamma$ into $M$ disjoint regions $\{\Gamma_i \mid i = 1, \ldots, M\}$ such that $\delta(y) = i$ if $y \in \Gamma_i$. We decide $H_i$ is true when $\delta(y) = i$ for $i \in \{1, \ldots, M\}$
• The minimum probability of error rule is
  $$\delta_{\text{MPE}}(y) = \arg\max_{1 \le i \le M} \pi_i p_i(y)$$
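For concreteness, here is a sketch of $\delta_{\text{MPE}}$ assuming Gaussian hypotheses $H_i: Y \sim \mathcal{N}(m_i, \sigma^2)$ with a shared variance. The means $-3, -1, 1, 3$, the priors, and the names `gaussian_pdf` and `mpe_decide` are illustrative assumptions. The returned index is 1-based to match $H_1, \ldots, H_M$.

```python
import math


def gaussian_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) at y."""
    return math.exp(-((y - mean) ** 2) / (2.0 * sigma**2)) / math.sqrt(2.0 * math.pi * sigma**2)


def mpe_decide(y, priors, means, sigma):
    """delta_MPE(y) = argmax_i pi_i p_i(y), returned as a 1-based index i."""
    scores = [pi * gaussian_pdf(y, m, sigma) for pi, m in zip(priors, means)]
    return 1 + max(range(len(scores)), key=scores.__getitem__)


means = [-3.0, -1.0, 1.0, 3.0]              # illustrative hypothesis means
equal = [0.25] * 4
skewed = [0.1, 0.7, 0.1, 0.1]
print(mpe_decide(0.1, equal, means, 1.0))   # nearest mean is 1.0, i.e. H_3
print(mpe_decide(0.1, skewed, means, 1.0))  # the large prior on H_2 wins
```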
Maximum A Posteriori Decision Rule
• The a posteriori probability of $H_i$ being true given observation $y$ is
  $$P(H_i \text{ is true} \mid y) = \frac{\pi_i p_i(y)}{p(y)}, \qquad \text{where } p(y) = \sum_{j=1}^{M} \pi_j p_j(y) \text{ does not depend on } i$$
• The MAP decision rule is given by
  $$\delta_{\text{MAP}}(y) = \arg\max_{1 \le i \le M} P(H_i \text{ is true} \mid y) = \delta_{\text{MPE}}(y)$$
MAP decision rule = MPE decision rule
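Numerically, dividing every $\pi_i p_i(y)$ by the same $p(y)$ changes the values but not the location of the maximum. A sketch with three illustrative Gaussian hypotheses $H_i: Y \sim \mathcal{N}(m_i, 1)$; the priors, means, and the helper names `posteriors` and `argmax1` are assumptions.

```python
from statistics import NormalDist


def posteriors(y, priors, densities):
    """P[H_i | y] = pi_i p_i(y) / p(y), where p(y) = sum_j pi_j p_j(y)."""
    joint = [pi * p(y) for pi, p in zip(priors, densities)]
    evidence = sum(joint)  # p(y): a common denominator, independent of i
    return [j / evidence for j in joint]


def argmax1(values):
    """1-based index of the largest entry (first one on ties)."""
    return 1 + max(range(len(values)), key=values.__getitem__)


priors = [0.2, 0.5, 0.3]                                    # illustrative priors
densities = [NormalDist(m, 1.0).pdf for m in (-2.0, 0.0, 2.0)]
for y in (-1.5, 0.3, 1.2):
    post = posteriors(y, priors, densities)
    mpe_scores = [pi * p(y) for pi, p in zip(priors, densities)]
    print(y, [round(q, 3) for q in post], argmax1(post), argmax1(mpe_scores))
```

At $y = 1.2$ the MAP decision is $H_2$ (mean 0) rather than $H_3$ (mean 2), even though $y$ is closer to 2, because $\pi_2 = 0.5 > \pi_3 = 0.3$.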
Maximum Likelihood Decision Rule
• The ML decision rule is given by
  $$\delta_{\text{ML}}(y) = \arg\max_{1 \le i \le M} p_i(y)$$
• If the $M$ hypotheses are equally likely, $\pi_i = \frac{1}{M}$
• The MPE decision rule is then given by
  $$\delta_{\text{MPE}}(y) = \arg\max_{1 \le i \le M} \pi_i p_i(y) = \arg\max_{1 \le i \le M} \frac{1}{M} p_i(y) = \delta_{\text{ML}}(y)$$
For equal priors, ML decision rule = MPE decision rule
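A two-hypothesis sketch showing that ML and MPE agree for equal priors but can disagree otherwise. The densities $\mathcal{N}(0, 1)$ and $\mathcal{N}(2, 1)$, the observation $y = 1.2$, and the helper names `ml_rule` and `mpe_rule` are illustrative assumptions.

```python
from statistics import NormalDist


def argmax(values):
    """0-based index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def ml_rule(y, densities):
    """delta_ML(y) = argmax_i p_i(y), as a 1-based index."""
    return 1 + argmax([p(y) for p in densities])


def mpe_rule(y, priors, densities):
    """delta_MPE(y) = argmax_i pi_i p_i(y), as a 1-based index."""
    return 1 + argmax([pi * p(y) for pi, p in zip(priors, densities)])


densities = [NormalDist(0.0, 1.0).pdf, NormalDist(2.0, 1.0).pdf]
y = 1.2
print(ml_rule(y, densities))               # y is closer to mean 2, so ML picks H_2
print(mpe_rule(y, [0.9, 0.1], densities))  # a strong prior on H_1 flips the decision
print(mpe_rule(y, [0.5, 0.5], densities))  # equal priors: same as ML
```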
Irrelevant Statistics
Irrelevant Statistics
• In this context, the term statistic means an observation
• For a given hypothesis testing problem, not all of the observations may be useful
Example (Irrelevant Statistic)
$Y = \begin{bmatrix} Y_1 & Y_2 \end{bmatrix}^T$
  $H_1: Y_1 = A + N_1, \quad Y_2 = N_2$
  $H_0: Y_1 = N_1, \quad Y_2 = N_2$
where $A > 0$, $N_1 \sim \mathcal{N}(0, \sigma^2)$, $N_2 \sim \mathcal{N}(0, \sigma^2)$.
• If $N_1$ and $N_2$ are independent, $Y_2$ is irrelevant.
• If $N_1$ and $N_2$ are correlated, $Y_2$ is relevant.
• We need a method to recognize irrelevant components of the observations.
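Characterizing an Irrelevant Statistic
Theorem: For M-ary hypothesis testing using an observation $Y = \begin{bmatrix} Y_1 & Y_2 \end{bmatrix}^T$, the statistic $Y_2$ is irrelevant if the conditional distribution of $Y_2$, given $Y_1$ and $H_i$, is independent of $i$. In terms of densities, the condition for irrelevance is
$$p(y_2 \mid y_1, H_i) = p(y_2 \mid y_1) \quad \forall i.$$
Proof:
$$\delta_{\text{MPE}}(y) = \arg\max_{1 \le i \le M} \pi_i p_i(y) = \arg\max_{1 \le i \le M} \pi_i p(y \mid H_i)$$
$$p(y \mid H_i) = p(y_1, y_2 \mid H_i) = p(y_2 \mid y_1, H_i)\, p(y_1 \mid H_i) = p(y_2 \mid y_1)\, p(y_1 \mid H_i)$$
Since the factor $p(y_2 \mid y_1)$ is common to all $i$, it does not affect the maximization:
$$\delta_{\text{MPE}}(y) = \arg\max_{1 \le i \le M} \pi_i\, p(y_1 \mid H_i),$$
so the optimal decision depends on $y$ only through $y_1$.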
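Example of an Irrelevant Statistic
Example (Independent Noise)
$Y = \begin{bmatrix} Y_1 & Y_2 \end{bmatrix}^T$
  $H_1: Y_1 = A + N_1, \quad Y_2 = N_2$
  $H_0: Y_1 = N_1, \quad Y_2 = N_2$
where $A > 0$, $N_1 \sim \mathcal{N}(0, \sigma^2)$, $N_2 \sim \mathcal{N}(0, \sigma^2)$, $N_1 \perp N_2$.
$$p(y_2 \mid y_1, H_0) = p(y_2), \qquad p(y_2 \mid y_1, H_1) = p(y_2)$$
The conditional density of $Y_2$ does not depend on the hypothesis, so $Y_2$ is irrelevant.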
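Example of a Relevant Statistic
Example (Correlated Noise)
$Y = \begin{bmatrix} Y_1 & Y_2 \end{bmatrix}^T$
  $H_1: Y_1 = A + N_1, \quad Y_2 = N_2$
  $H_0: Y_1 = N_1, \quad Y_2 = N_2$
where $A > 0$, $N_1 \sim \mathcal{N}(0, \sigma^2)$, $N_2 \sim \mathcal{N}(0, \sigma^2)$, $C_Y = \sigma^2 \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$
$$p(y_2 \mid y_1, H_0) = \frac{1}{\sqrt{2\pi(1-\rho^2)\sigma^2}}\, e^{-\frac{(y_2 - \rho y_1)^2}{2(1-\rho^2)\sigma^2}}$$
$$p(y_2 \mid y_1, H_1) = \frac{1}{\sqrt{2\pi(1-\rho^2)\sigma^2}}\, e^{-\frac{[y_2 - \rho(y_1 - A)]^2}{2(1-\rho^2)\sigma^2}}$$
For $0 < |\rho| < 1$ the two conditional densities have different means ($\rho y_1$ versus $\rho(y_1 - A)$), so $Y_2$ is relevant.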
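One way to see the relevance of $Y_2$ is to compare detectors. For Gaussian noise with known covariance $C_Y$, the log-likelihood ratio of this problem is an affine function of $Y_1 - \rho Y_2$ (since $C_Y^{-1} [A \;\; 0]^T \propto [1 \;\; -\rho]^T$), and with equal priors the optimal rule decides $H_1$ if and only if $y_1 - \rho y_2 > A/2$. This statistic has variance $\sigma^2(1-\rho^2)$ under both hypotheses, giving $P_e = Q\big(\frac{A}{2\sigma\sqrt{1-\rho^2}}\big)$, versus $Q\big(\frac{A}{2\sigma}\big)$ for the rule that uses $y_1$ alone. The Monte Carlo sketch below checks both, assuming equal priors and the illustrative values $A = \sigma = 1$, $\rho = 0.8$; `q_function` and `simulate` are hypothetical helpers.

```python
import math
import random


def q_function(x):
    """Gaussian tail probability Q(x) = Pr[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def simulate(A, sigma, rho, trials=200_000, seed=7):
    """Monte Carlo P_e (equal priors) of two rules: threshold y1 alone, and threshold y1 - rho*y2."""
    rng = random.Random(seed)
    err_y1 = err_both = 0
    for _ in range(trials):
        h = rng.randint(0, 1)                                    # pi_0 = pi_1 = 1/2
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        n1 = sigma * z1
        n2 = sigma * (rho * z1 + math.sqrt(1.0 - rho**2) * z2)  # Var = sigma^2, Corr(N1, N2) = rho
        y1, y2 = h * A + n1, n2
        err_y1 += (y1 > A / 2) != h                              # uses Y1 only
        err_both += (y1 - rho * y2 > A / 2) != h                 # uses Y1 and Y2
    return err_y1 / trials, err_both / trials


A, sigma, rho = 1.0, 1.0, 0.8  # illustrative values
err_y1, err_both = simulate(A, sigma, rho)
print("Y1 only    :", round(q_function(A / (2 * sigma)), 4), round(err_y1, 4))
print("Y1 - rho*Y2:", round(q_function(A / (2 * sigma * math.sqrt(1 - rho**2))), 4), round(err_both, 4))
```

With $\rho = 0$ (the independent-noise example) the two statistics coincide, consistent with $Y_2$ being irrelevant there.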
Thanks for your attention