Hypothesis Testing

Saravanan Vijayakumaran, sarva@ee.iitb.ac.in
Department of Electrical Engineering, Indian Institute of Technology Bombay
August 22, 2012

Basics of Hypothesis Testing

What is a Hypothesis?
A hypothesis is one situation among a set of possible situations.
Example (Radar): EM waves are transmitted and the reflections are observed.
• Null hypothesis: plane absent
• Alternative hypothesis: plane present
For a given set of observations, either hypothesis may be true.

What is Hypothesis Testing?
• A statistical framework for deciding which hypothesis is true
• Under each hypothesis the observations are assumed to have a known distribution
• Consider the case of two hypotheses (binary hypothesis testing)
  $H_0 : Y \sim P_0$
  $H_1 : Y \sim P_1$
  where $Y$ is the random observation vector belonging to the observation set $\Gamma \subseteq \mathbb{R}^n$ for $n \in \mathbb{N}$
• The hypotheses are assumed to occur with given prior probabilities
  $\Pr(H_0 \text{ is true}) = \pi_0$
  $\Pr(H_1 \text{ is true}) = \pi_1$
  where $\pi_0 + \pi_1 = 1$.

Location Testing with Gaussian Error
• Let the observation set be $\Gamma = \mathbb{R}$ and let $\mu > 0$
  $H_0 : Y \sim \mathcal{N}(-\mu, \sigma^2)$
  $H_1 : Y \sim \mathcal{N}(\mu, \sigma^2)$
  [Figure: densities $p_0(y)$ and $p_1(y)$ centered at $-\mu$ and $\mu$ on the $y$ axis]
• Any point in $\Gamma$ can be generated under both $H_0$ and $H_1$
• What is a good decision rule for this hypothesis testing problem which takes the prior probabilities into account?

What is a Decision Rule?
• A decision rule for binary hypothesis testing is a partition of $\Gamma$ into $\Gamma_0$ and $\Gamma_1$ such that
  $\delta(y) = \begin{cases} 0 & \text{if } y \in \Gamma_0 \\ 1 & \text{if } y \in \Gamma_1 \end{cases}$
  We decide $H_i$ is true when $\delta(y) = i$ for $i \in \{0, 1\}$
• For the location testing with Gaussian error problem, one possible decision rule is
  $\Gamma_0 = (-\infty, 0]$, $\Gamma_1 = (0, \infty)$
  and another possible decision rule is
  $\Gamma_0 = (-\infty, -100) \cup (-50, 0)$, $\Gamma_1 = [-100, -50] \cup [0, \infty)$
• Given that partitions of the observation set define decision rules, what is the optimal partition?
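As a small illustration, the two example partitions on this slide can be written as functions that map an observation to a decision. This is only a sketch; the function names `delta_threshold` and `delta_split` are hypothetical and not from the slides.

```python
def delta_threshold(y: float) -> int:
    """Rule 1: Gamma_0 = (-inf, 0], Gamma_1 = (0, inf)."""
    return 0 if y <= 0 else 1


def delta_split(y: float) -> int:
    """Rule 2: Gamma_1 = [-100, -50] U [0, inf); Gamma_0 is everything else."""
    in_gamma_1 = (-100 <= y <= -50) or (y >= 0)
    return 1 if in_gamma_1 else 0


# Both rules are valid partitions of the real line; they simply disagree
# on some observations (for example y = -75 and y = 0).
for y in (-120.0, -75.0, -10.0, 0.0, 3.0):
    print(y, delta_threshold(y), delta_split(y))
```

Every observation lands in exactly one region under either rule, which is all that the definition of a decision rule requires.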

Which is the Optimal Decision Rule?
• Minimizing the probability of decision error gives the optimal decision rule
• For the binary hypothesis testing problem of $H_0$ versus $H_1$, the conditional decision error probability given $H_i$ is true is
  $P_{e|i} = \Pr[\text{Deciding } H_{1-i} \text{ is true} \mid H_i \text{ is true}] = \Pr[Y \in \Gamma_{1-i} \mid H_i] = 1 - \Pr[Y \in \Gamma_i \mid H_i] = 1 - P_{c|i}$
• Probability of decision error is
  $P_e = \pi_0 P_{e|0} + \pi_1 P_{e|1}$
• Probability of correct decision is
  $P_c = \pi_0 P_{c|0} + \pi_1 P_{c|1} = 1 - P_e$
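The error probability of any rule whose region $\Gamma_1$ is a union of intervals can be evaluated directly from these definitions. The sketch below does this for the two example partitions in the $\pm\mu$ Gaussian problem. The helper names and the numeric values ($\mu = 1$, $\sigma = 40$, equal priors) are assumptions chosen for illustration only.

```python
import math


def phi(x: float) -> float:
    """Standard normal CDF, written in terms of the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def prob_in_intervals(intervals, mean: float, sigma: float) -> float:
    """Pr[Y in union of disjoint intervals] for Y ~ N(mean, sigma^2)."""
    return sum(phi((b - mean) / sigma) - phi((a - mean) / sigma)
               for a, b in intervals)


def error_probability(gamma1, mu, sigma, pi0, pi1) -> float:
    """P_e = pi0 * P_{e|0} + pi1 * P_{e|1} for H0: N(-mu, s^2), H1: N(mu, s^2)."""
    pe0 = prob_in_intervals(gamma1, -mu, sigma)       # decide H1 although H0 holds
    pe1 = 1.0 - prob_in_intervals(gamma1, mu, sigma)  # decide H0 although H1 holds
    return pi0 * pe0 + pi1 * pe1


rule1 = [(0.0, math.inf)]                     # Gamma_1 = (0, inf)
rule2 = [(-100.0, -50.0), (0.0, math.inf)]    # Gamma_1 = [-100, -50] U [0, inf)

pe_rule1 = error_probability(rule1, mu=1.0, sigma=40.0, pi0=0.5, pi1=0.5)
pe_rule2 = error_probability(rule2, mu=1.0, sigma=40.0, pi0=0.5, pi1=0.5)
print(pe_rule1, pe_rule2)
```

Under these assumed values the second rule has the larger error probability, since it assigns to $H_1$ an interval on which $p_0(y)$ exceeds $p_1(y)$.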

Which is the Optimal Decision Rule?
• Maximizing the probability of correct decision minimizes the probability of decision error
• Probability of correct decision is
  $P_c = \pi_0 P_{c|0} + \pi_1 P_{c|1} = \pi_0 \int_{y \in \Gamma_0} p_0(y)\,dy + \pi_1 \int_{y \in \Gamma_1} p_1(y)\,dy$
• If a point $y$ in $\Gamma$ belongs to $\Gamma_i$, its contribution to $P_c$ is proportional to $\pi_i p_i(y)$
• To maximize $P_c$, we choose the partition $\{\Gamma_0, \Gamma_1\}$ as
  $\Gamma_0 = \{ y \in \Gamma \mid \pi_0 p_0(y) \ge \pi_1 p_1(y) \}$
  $\Gamma_1 = \{ y \in \Gamma \mid \pi_1 p_1(y) > \pi_0 p_0(y) \}$
• The points $y$ for which $\pi_0 p_0(y) = \pi_1 p_1(y)$ can be placed in either $\Gamma_0$ or $\Gamma_1$ (the optimal decision rule is not unique)
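The optimal partition amounts to a pointwise comparison of $\pi_0 p_0(y)$ with $\pi_1 p_1(y)$. A minimal sketch follows, assuming Gaussian densities with illustrative values $\mu = 1$, $\sigma = 1$; the names `gaussian_pdf` and `mpe_binary` are hypothetical.

```python
import math


def gaussian_pdf(y: float, mean: float, sigma: float) -> float:
    """Density of N(mean, sigma^2) evaluated at y."""
    return math.exp(-((y - mean) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def mpe_binary(y, pi0, p0, pi1, p1) -> int:
    """Decide 0 when pi0*p0(y) >= pi1*p1(y), else 1 (ties are assigned to Gamma_0)."""
    return 0 if pi0 * p0(y) >= pi1 * p1(y) else 1


mu, sigma = 1.0, 1.0                       # assumed values for illustration
p0 = lambda y: gaussian_pdf(y, -mu, sigma)
p1 = lambda y: gaussian_pdf(y, mu, sigma)

print(mpe_binary(0.5, 0.5, p0, 0.5, p1))   # equal priors
print(mpe_binary(0.5, 0.9, p0, 0.1, p1))   # strong prior for H0
```

The same observation can be assigned to different hypotheses as the priors change, which is why the rule weights each density by its prior.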

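Location Testing with Gaussian Error
• Let $\mu_1 > \mu_0$ and $\pi_0 = \pi_1 = \frac{1}{2}$
  $H_0 : Y = \mu_0 + Z$
  $H_1 : Y = \mu_1 + Z$
  where $Z \sim \mathcal{N}(0, \sigma^2)$
  [Figure: densities $p_0(y)$ and $p_1(y)$ centered at $\mu_0$ and $\mu_1$ on the $y$ axis]
  $p_0(y) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y - \mu_0)^2}{2\sigma^2}}$
  $p_1(y) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y - \mu_1)^2}{2\sigma^2}}$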

Location Testing with Gaussian Error
• The optimal decision rule is given by the partition $\{\Gamma_0, \Gamma_1\}$
  $\Gamma_0 = \{ y \in \Gamma \mid \pi_0 p_0(y) \ge \pi_1 p_1(y) \}$
  $\Gamma_1 = \{ y \in \Gamma \mid \pi_1 p_1(y) > \pi_0 p_0(y) \}$
• For $\pi_0 = \pi_1 = \frac{1}{2}$
  $\Gamma_0 = \left\{ y \in \Gamma \,\middle|\, y \le \frac{\mu_1 + \mu_0}{2} \right\}$
  $\Gamma_1 = \left\{ y \in \Gamma \,\middle|\, y > \frac{\mu_1 + \mu_0}{2} \right\}$

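Location Testing with Gaussian Error
[Figure: densities centered at $\mu_0$ and $\mu_1$ with the threshold at $\frac{\mu_0 + \mu_1}{2}$; the tail areas are labeled $P_{e|0}$ and $P_{e|1}$]
$P_{e|0} = \Pr\left[ Y > \frac{\mu_0 + \mu_1}{2} \,\middle|\, H_0 \right] = Q\left( \frac{\mu_1 - \mu_0}{2\sigma} \right)$
$P_{e|1} = \Pr\left[ Y \le \frac{\mu_0 + \mu_1}{2} \,\middle|\, H_1 \right] = \Phi\left( \frac{\mu_0 - \mu_1}{2\sigma} \right) = Q\left( \frac{\mu_1 - \mu_0}{2\sigma} \right)$
$P_e = \pi_0 P_{e|0} + \pi_1 P_{e|1} = Q\left( \frac{\mu_1 - \mu_0}{2\sigma} \right)$
This $P_e$ is for $\pi_0 = \pi_1 = \frac{1}{2}$.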
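The closed-form error probability can be checked numerically. The sketch below evaluates $Q\left(\frac{\mu_1 - \mu_0}{2\sigma}\right)$ through the standard identity $Q(x) = \frac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$ and compares it with a Monte Carlo estimate of the midpoint-threshold rule. The parameter values, trial count, and function names are assumptions made for illustration.

```python
import math
import random


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def error_probability_equal_priors(mu0: float, mu1: float, sigma: float) -> float:
    """P_e = Q((mu1 - mu0) / (2 sigma)) for pi0 = pi1 = 1/2."""
    return q_function((mu1 - mu0) / (2 * sigma))


def simulate_error_rate(mu0, mu1, sigma, trials=100_000, seed=0) -> float:
    """Monte Carlo estimate of P_e for the rule 'decide H1 if y > (mu0 + mu1)/2'."""
    rng = random.Random(seed)
    threshold = (mu0 + mu1) / 2
    errors = 0
    for _ in range(trials):
        h = rng.randrange(2)                       # true hypothesis, equal priors
        y = (mu1 if h else mu0) + rng.gauss(0, sigma)
        decision = 1 if y > threshold else 0
        errors += decision != h
    return errors / trials


analytic = error_probability_equal_priors(0.0, 2.0, 1.0)
estimate = simulate_error_rate(0.0, 2.0, 1.0)
print(analytic, estimate)
```

The two numbers should agree up to the sampling error of the simulation.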

Location Testing with Gaussian Error
• Suppose $\pi_0 \ne \pi_1$
• The optimal decision rule is still given by the partition $\{\Gamma_0, \Gamma_1\}$
  $\Gamma_0 = \{ y \in \Gamma \mid \pi_0 p_0(y) \ge \pi_1 p_1(y) \}$
  $\Gamma_1 = \{ y \in \Gamma \mid \pi_1 p_1(y) > \pi_0 p_0(y) \}$
• The partitions specialized to this problem are
  $\Gamma_0 = \left\{ y \in \Gamma \,\middle|\, y \le \frac{\mu_1 + \mu_0}{2} + \frac{\sigma^2}{\mu_1 - \mu_0} \log \frac{\pi_0}{\pi_1} \right\}$
  $\Gamma_1 = \left\{ y \in \Gamma \,\middle|\, y > \frac{\mu_1 + \mu_0}{2} + \frac{\sigma^2}{\mu_1 - \mu_0} \log \frac{\pi_0}{\pi_1} \right\}$

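Location Testing with Gaussian Error
Suppose $\pi_0 = 0.6$ and $\pi_1 = 0.4$
$\tau = \frac{\mu_1 + \mu_0}{2} + \frac{\sigma^2}{\mu_1 - \mu_0} \log \frac{\pi_0}{\pi_1} = \frac{\mu_1 + \mu_0}{2} + \frac{0.4054\,\sigma^2}{\mu_1 - \mu_0}$
[Figure: densities centered at $\mu_0$ and $\mu_1$ with the threshold $\tau$ to the right of the midpoint; the tail areas are labeled $P_{e|0}$ and $P_{e|1}$]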

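Location Testing with Gaussian Error
Suppose $\pi_0 = 0.4$ and $\pi_1 = 0.6$
$\tau = \frac{\mu_1 + \mu_0}{2} + \frac{\sigma^2}{\mu_1 - \mu_0} \log \frac{\pi_0}{\pi_1} = \frac{\mu_1 + \mu_0}{2} - \frac{0.4054\,\sigma^2}{\mu_1 - \mu_0}$
[Figure: densities centered at $\mu_0$ and $\mu_1$ with the threshold $\tau$ to the left of the midpoint; the tail areas are labeled $P_{e|0}$ and $P_{e|1}$]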
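The threshold formula for unequal priors is easy to evaluate. The sketch below computes $\tau$ for both prior assignments and the resulting error probability, using $P_{e|0} = Q\left(\frac{\tau - \mu_0}{\sigma}\right)$ and $P_{e|1} = Q\left(\frac{\mu_1 - \tau}{\sigma}\right)$, which follow from the same tail-probability argument as in the equal-prior case. The values $\mu_0 = 0$, $\mu_1 = 2$, $\sigma = 1$ and the function names are assumptions for illustration.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def optimal_threshold(mu0, mu1, sigma, pi0, pi1) -> float:
    """tau = (mu1 + mu0)/2 + sigma^2/(mu1 - mu0) * log(pi0/pi1), natural log."""
    return (mu1 + mu0) / 2 + sigma ** 2 / (mu1 - mu0) * math.log(pi0 / pi1)


def error_probability(mu0, mu1, sigma, pi0, pi1) -> float:
    """P_e of the rule 'decide H1 if y > tau' with the optimal tau."""
    tau = optimal_threshold(mu0, mu1, sigma, pi0, pi1)
    pe0 = q_function((tau - mu0) / sigma)   # Pr[Y > tau | H0]
    pe1 = q_function((mu1 - tau) / sigma)   # Pr[Y <= tau | H1]
    return pi0 * pe0 + pi1 * pe1


tau_a = optimal_threshold(0.0, 2.0, 1.0, 0.6, 0.4)   # moves toward mu1
tau_b = optimal_threshold(0.0, 2.0, 1.0, 0.4, 0.6)   # moves toward mu0
print(tau_a, tau_b, error_probability(0.0, 2.0, 1.0, 0.6, 0.4))
```

The threshold shifts away from the mean of the more likely hypothesis, enlarging that hypothesis's decision region.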

M-ary Hypothesis Testing
• $M$ hypotheses with prior probabilities $\pi_i$, $i = 1, \ldots, M$
  $H_1 : Y \sim P_1$
  $H_2 : Y \sim P_2$
  $\vdots$
  $H_M : Y \sim P_M$
• A decision rule for M-ary hypothesis testing is a partition of $\Gamma$ into $M$ disjoint regions $\{ \Gamma_i \mid i = 1, \ldots, M \}$ such that
  $\delta(y) = i$ if $y \in \Gamma_i$
  We decide $H_i$ is true when $\delta(y) = i$ for $i \in \{1, \ldots, M\}$
• The minimum probability of error rule is
  $\delta_{\text{MPE}}(y) = \arg\max_{1 \le i \le M} \pi_i p_i(y)$
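The M-ary rule is a direct arg max over the prior-weighted densities. A minimal sketch, assuming three Gaussian hypotheses with illustrative means, a common $\sigma$, and made-up priors; `mpe_decide` is a hypothetical name.

```python
import math


def gaussian_pdf(y: float, mean: float, sigma: float) -> float:
    """Density of N(mean, sigma^2) evaluated at y."""
    return math.exp(-((y - mean) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def mpe_decide(y, priors, densities) -> int:
    """Return the 1-based index i that maximizes priors[i-1] * densities[i-1](y)."""
    scores = [pi * p(y) for pi, p in zip(priors, densities)]
    return 1 + max(range(len(scores)), key=scores.__getitem__)


# Assumed example: M = 3 hypotheses with means -2, 0, 2 and sigma = 1.
means = [-2.0, 0.0, 2.0]
priors = [0.2, 0.5, 0.3]
densities = [lambda y, m=m: gaussian_pdf(y, m, 1.0) for m in means]

for y in (-3.0, 0.1, 3.0):
    print(y, mpe_decide(y, priors, densities))
```

When several hypotheses tie, `max` returns the first maximizing index, which is one valid way to break ties, since the optimal rule is not unique at such points.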

Maximum A Posteriori Decision Rule
• The a posteriori probability of $H_i$ being true given observation $y$ is
  $P\left[ H_i \text{ is true} \mid y \right] = \frac{\pi_i p_i(y)}{p(y)}$
• The MAP decision rule is given by
  $\delta_{\text{MAP}}(y) = \arg\max_{1 \le i \le M} P\left[ H_i \text{ is true} \mid y \right] = \delta_{\text{MPE}}(y)$
MAP decision rule = MPE decision rule
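The equality of the MAP and MPE rules can be seen numerically: the posterior is $\pi_i p_i(y)$ divided by $p(y) = \sum_j \pi_j p_j(y)$, a positive factor that is the same for every $i$. The sketch below uses assumed Gaussian hypotheses and hypothetical function names.

```python
import math


def gaussian_pdf(y: float, mean: float, sigma: float) -> float:
    """Density of N(mean, sigma^2) evaluated at y."""
    return math.exp(-((y - mean) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def posteriors(y, priors, densities):
    """Return [P(H_i | y)] by normalizing pi_i * p_i(y) with p(y) = sum_j pi_j * p_j(y)."""
    joint = [pi * p(y) for pi, p in zip(priors, densities)]
    evidence = sum(joint)
    return [j / evidence for j in joint]


def map_decide(y, priors, densities) -> int:
    """1-based index of the hypothesis with the largest posterior probability."""
    post = posteriors(y, priors, densities)
    return 1 + max(range(len(post)), key=post.__getitem__)


def mpe_decide(y, priors, densities) -> int:
    """1-based index maximizing pi_i * p_i(y), without normalization."""
    scores = [pi * p(y) for pi, p in zip(priors, densities)]
    return 1 + max(range(len(scores)), key=scores.__getitem__)


# Assumed example values.
means = [-2.0, 0.0, 2.0]
priors = [0.2, 0.5, 0.3]
densities = [lambda y, m=m: gaussian_pdf(y, m, 1.0) for m in means]

print(posteriors(0.3, priors, densities))
print(map_decide(0.3, priors, densities), mpe_decide(0.3, priors, densities))
```

Dividing every score by the same $p(y)$ does not change which index is largest, so the two functions return the same decision.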

Maximum Likelihood Decision Rule
• The ML decision rule is given by
  $\delta_{\text{ML}}(y) = \arg\max_{1 \le i \le M} p_i(y)$
• If the $M$ hypotheses are equally likely, $\pi_i = \frac{1}{M}$
• The MPE decision rule is then given by
  $\delta_{\text{MPE}}(y) = \arg\max_{1 \le i \le M} \pi_i p_i(y) = \delta_{\text{ML}}(y)$
For equal priors, ML decision rule = MPE decision rule
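A short sketch of the relationship between the two rules: with equal priors they coincide, while skewed priors can pull the MPE decision away from the ML decision. The means, priors, and function names below are assumptions for illustration.

```python
import math


def gaussian_pdf(y: float, mean: float, sigma: float) -> float:
    """Density of N(mean, sigma^2) evaluated at y."""
    return math.exp(-((y - mean) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def ml_decide(y, densities) -> int:
    """1-based index maximizing the likelihood p_i(y); priors are ignored."""
    scores = [p(y) for p in densities]
    return 1 + max(range(len(scores)), key=scores.__getitem__)


def mpe_decide(y, priors, densities) -> int:
    """1-based index maximizing pi_i * p_i(y)."""
    scores = [pi * p(y) for pi, p in zip(priors, densities)]
    return 1 + max(range(len(scores)), key=scores.__getitem__)


# Assumed example: means 0, 2, 4 with sigma = 1.
densities = [lambda y, m=m: gaussian_pdf(y, m, 1.0) for m in (0.0, 2.0, 4.0)]
equal_priors = [1 / 3, 1 / 3, 1 / 3]
skewed_priors = [0.9, 0.05, 0.05]

y = 1.5
print(ml_decide(y, densities))                   # closest mean wins
print(mpe_decide(y, equal_priors, densities))    # agrees with ML
print(mpe_decide(y, skewed_priors, densities))   # the strong prior on H_1 can override
```

The ML rule is therefore a reasonable default when the priors are unknown, at the cost of optimality when they are in fact unequal.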

Irrelevant Statistics

Irrelevant Statistics
• In this context, the term statistic means an observation
• For a given hypothesis testing problem, not all of the observations may be useful
Example (Irrelevant Statistic)
  $Y = \begin{bmatrix} Y_1 & Y_2 \end{bmatrix}$
  $H_1 : Y_1 = A + N_1, \quad Y_2 = N_2$
  $H_0 : Y_1 = N_1, \quad Y_2 = N_2$
  where $A > 0$, $N_1 \sim \mathcal{N}(0, \sigma^2)$, $N_2 \sim \mathcal{N}(0, \sigma^2)$.
• If $N_1$ and $N_2$ are independent, $Y_2$ is irrelevant.
• If $N_1$ and $N_2$ are correlated, $Y_2$ is relevant.
• We need a method to recognize irrelevant components of the observations

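Characterizing an Irrelevant Statistic
Theorem: For M-ary hypothesis testing using an observation $Y = \begin{bmatrix} Y_1 & Y_2 \end{bmatrix}$, the statistic $Y_2$ is irrelevant if the conditional distribution of $Y_2$, given $Y_1$ and $H_i$, is independent of $i$. In terms of densities, the condition for irrelevance is
  $p(y_2 \mid y_1, H_i) = p(y_2 \mid y_1) \quad \forall i.$
Proof:
  $\delta_{\text{MPE}}(y) = \arg\max_{1 \le i \le M} \pi_i p_i(y) = \arg\max_{1 \le i \le M} \pi_i p(y \mid H_i)$
  $p(y \mid H_i) = p(y_1, y_2 \mid H_i) = p(y_2 \mid y_1, H_i)\, p(y_1 \mid H_i) = p(y_2 \mid y_1)\, p(y_1 \mid H_i)$
Since the factor $p(y_2 \mid y_1)$ does not depend on $i$, it can be dropped from the maximization, so $\delta_{\text{MPE}}(y) = \arg\max_{1 \le i \le M} \pi_i p(y_1 \mid H_i)$, which depends on $y$ only through $y_1$.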

Example of an Irrelevant Statistic
Example (Independent Noise)
  $Y = \begin{bmatrix} Y_1 & Y_2 \end{bmatrix}$
  $H_1 : Y_1 = A + N_1, \quad Y_2 = N_2$
  $H_0 : Y_1 = N_1, \quad Y_2 = N_2$
  where $A > 0$, $N_1 \sim \mathcal{N}(0, \sigma^2)$, $N_2 \sim \mathcal{N}(0, \sigma^2)$, $N_1 \perp N_2$.
  $p(y_2 \mid y_1, H_0) = p(y_2)$
  $p(y_2 \mid y_1, H_1) = p(y_2)$

Example of a Relevant Statistic
Example (Correlated Noise)
  $Y = \begin{bmatrix} Y_1 & Y_2 \end{bmatrix}^T$
  $H_1 : Y_1 = A + N_1, \quad Y_2 = N_2$
  $H_0 : Y_1 = N_1, \quad Y_2 = N_2$
  where $A > 0$, $N_1 \sim \mathcal{N}(0, \sigma^2)$, $N_2 \sim \mathcal{N}(0, \sigma^2)$, $C_Y = \sigma^2 \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$
  $p(y_2 \mid y_1, H_0) = \frac{1}{\sqrt{2\pi(1 - \rho^2)\sigma^2}}\, e^{-\frac{(y_2 - \rho y_1)^2}{2(1 - \rho^2)\sigma^2}}$
  $p(y_2 \mid y_1, H_1) = \frac{1}{\sqrt{2\pi(1 - \rho^2)\sigma^2}}\, e^{-\frac{[y_2 - \rho(y_1 - A)]^2}{2(1 - \rho^2)\sigma^2}}$
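The irrelevance condition can be checked numerically with the conditional densities from this example. The sketch below evaluates $p(y_2 \mid y_1, H_0)$ and $p(y_2 \mid y_1, H_1)$ on a small grid and tests whether they coincide. The grid, the values of $A$ and $\sigma$, and the function names are assumptions for illustration.

```python
import math


def cond_density_y2(y2, y1, rho, sigma, shift) -> float:
    """p(y2 | y1, H_i) for the correlated-noise example.

    shift is 0 under H0 and A under H1, so the conditional mean is rho * (y1 - shift).
    """
    var = (1 - rho ** 2) * sigma ** 2
    mean = rho * (y1 - shift)
    return math.exp(-((y2 - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def y2_is_irrelevant(rho, sigma, A, grid) -> bool:
    """True if p(y2 | y1, H0) equals p(y2 | y1, H1) at every grid point."""
    return all(
        math.isclose(cond_density_y2(y2, y1, rho, sigma, 0.0),
                     cond_density_y2(y2, y1, rho, sigma, A))
        for y1 in grid for y2 in grid
    )


grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(y2_is_irrelevant(rho=0.0, sigma=1.0, A=1.0, grid=grid))   # independent noise
print(y2_is_irrelevant(rho=0.5, sigma=1.0, A=1.0, grid=grid))   # correlated noise
```

With $\rho = 0$ the conditional mean no longer depends on the hypothesis, which recovers the independent-noise example; any $\rho \ne 0$ makes the two conditional densities differ, so $Y_2$ carries information about the hypothesis.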

Thanks for your attention
