

SLIDE 1

Lecture 8: Information Theory and Statistics

Part II: Hypothesis Testing and Estimation

I-Hsiang Wang

Department of Electrical Engineering, National Taiwan University, ihwang@ntu.edu.tw

December 22, 2015

SLIDE 2

1 Hypothesis Testing
   Basic Theory
   Asymptotics

SLIDE 3

1 Hypothesis Testing
   Basic Theory
   Asymptotics

SLIDE 4

Basic Setup

We begin with the simplest setup – binary hypothesis testing:

1 Two hypotheses regarding the observation X, indexed by θ ∈ {0, 1}:

   H0 : X ∼ P0 (Null Hypothesis, θ = 0)
   H1 : X ∼ P1 (Alternative Hypothesis, θ = 1)

2 Goal: design a decision-making algorithm φ : X → {0, 1}, x ↦ θ̂, to choose one of the two hypotheses based on the observed realization x of X, so that a certain cost (or risk) is minimized.

3 A popular measure of the cost is based on the probabilities of error:

   Probability of false alarm (false positive; type I error): α_φ ≡ P_FA(φ) ≜ P{H1 is chosen | H0}.
   Probability of miss detection (false negative; type II error): β_φ ≡ P_MD(φ) ≜ P{H0 is chosen | H1}.

SLIDE 5

Deterministic Testing Algorithm ≡ Decision Regions

[Figure: the observation space X partitioned into A1(φ), the acceptance region of H1, and A0(φ), the acceptance region of H0.]

A test φ : X → {0, 1} is equivalently characterized by its corresponding acceptance (decision) regions:

   A_θ̂(φ) ≡ φ⁻¹(θ̂) ≜ {x ∈ X : φ(x) = θ̂},  θ̂ = 0, 1.

Hence, the two types of error probability can be equivalently represented as

   α_φ = Σ_{x∈A1(φ)} P0(x) = Σ_{x∈X} φ(x) P0(x),
   β_φ = Σ_{x∈A0(φ)} P1(x) = Σ_{x∈X} (1 − φ(x)) P1(x).

When the context is clear, we often drop the dependency on the test φ and simply write the acceptance regions as A_θ̂.

SLIDE 6

Likelihood Ratio Test

Definition 1 (Likelihood Ratio Test)
A (deterministic) likelihood ratio test (LRT) is a test φ_τ, parametrized by a constant τ > 0 (called the threshold), defined as follows:

   φ_τ(x) = 1 if P1(x) > τ P0(x);  φ_τ(x) = 0 if P1(x) ≤ τ P0(x).

For x ∈ supp P0, the likelihood ratio is L(x) ≜ P1(x)/P0(x). Hence, an LRT is a thresholding algorithm on the likelihood ratio L(x).

Remark: For computational convenience, one often works with the log likelihood ratio (LLR) log L(x) = log P1(x) − log P0(x).
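To make the thresholding concrete, here is a minimal Python sketch (not from the lecture; the helper names lrt and llr and the example distributions are illustrative assumptions):

```python
import numpy as np

# Sketch of a deterministic LRT on a finite alphabet {0, ..., d-1},
# where P0 and P1 are probability vectors over the alphabet.
def lrt(x, P0, P1, tau):
    """Decide 1 (H1) iff P1(x) > tau * P0(x); otherwise decide 0 (H0)."""
    return int(P1[x] > tau * P0[x])

def llr(xs, P0, P1):
    """Log likelihood ratio of a sequence; working in the log domain
    avoids numerical underflow for long observation sequences."""
    xs = np.asarray(xs)
    return np.sum(np.log(P1[xs]) - np.log(P0[xs]))

P0 = np.array([0.5, 0.5])
P1 = np.array([0.2, 0.8])
print(lrt(1, P0, P1, tau=1.0))  # P1(1) = 0.8 > 0.5 = P0(1), so decide H1
print(llr([0, 1, 1], P0, P1))   # log(0.4) + 2*log(1.6), about 0.024
```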

SLIDE 7

Trade-Off Between α (PFA) and β (PMD)

Theorem 1 (Neyman-Pearson Lemma)
For a likelihood ratio test φ_τ and any other deterministic test φ,

   α_φ ≤ α_{φ_τ}  ⟹  β_φ ≥ β_{φ_τ}.

pf: Observe that ∀ x ∈ X,

   0 ≤ (φ_τ(x) − φ(x)) (P1(x) − τ P0(x)),

because
   if P1(x) − τ P0(x) > 0 ⟹ φ_τ(x) = 1 ⟹ φ_τ(x) − φ(x) ≥ 0;
   if P1(x) − τ P0(x) ≤ 0 ⟹ φ_τ(x) = 0 ⟹ φ_τ(x) − φ(x) ≤ 0.

Summing over all x ∈ X, we get

   0 ≤ (1 − β_{φ_τ}) − (1 − β_φ) − τ (α_{φ_τ} − α_φ) = (β_φ − β_{φ_τ}) + τ (α_φ − α_{φ_τ}).

Since τ > 0, we conclude that α_φ ≤ α_{φ_τ} ⟹ β_φ ≥ β_{φ_τ}.
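Since the lemma quantifies over all deterministic tests, it is easy to check numerically on a tiny alphabet. The following sketch (my own illustration; the distributions are arbitrary assumptions) enumerates all 2³ deterministic tests and verifies that none improves on the LRT in both error probabilities:

```python
import itertools
import numpy as np

P0 = np.array([0.5, 0.3, 0.2])
P1 = np.array([0.2, 0.3, 0.5])
tau = 1.0
phi_lrt = (P1 > tau * P0).astype(float)   # phi_tau(x) = 1{P1(x) > tau*P0(x)}

def errors(phi):
    alpha = np.sum(phi * P0)              # P{decide H1 | H0} (false alarm)
    beta = np.sum((1 - phi) * P1)         # P{decide H0 | H1} (miss)
    return alpha, beta

a_lrt, b_lrt = errors(phi_lrt)
for bits in itertools.product([0.0, 1.0], repeat=3):
    a, b = errors(np.array(bits))
    # Neyman-Pearson: alpha <= alpha_LRT must imply beta >= beta_LRT.
    assert not (a <= a_lrt and b < b_lrt)
print(f"LRT: alpha = {a_lrt:.2f}, beta = {b_lrt:.2f}")  # 0.20, 0.50
```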

SLIDE 8

[Figure: trade-off curves between α (P_FA) and β (P_MD), plotted on the unit square [0, 1] × [0, 1].]

Question: What is the optimal trade-off curve? What is the optimal test achieving the curve?

SLIDE 9

Randomized Testing Algorithm

Randomized tests include deterministic tests as special cases.

Definition 2 (Randomized Test)
A randomized test decides θ̂ = 1 with probability φ(x) and θ̂ = 0 with probability 1 − φ(x), where φ is a mapping φ : X → [0, 1].

Note: A randomized test is characterized by φ, just as in the deterministic case.

Definition 3 (Randomized LRT)
A randomized likelihood ratio test (LRT) is a test φ_{τ,γ}, parametrized by constants τ > 0 and γ ∈ (0, 1), defined as follows:

   φ_{τ,γ}(x) = 1 if P1(x) > τ P0(x);  γ if P1(x) = τ P0(x);  0 if P1(x) < τ P0(x).

SLIDE 10

Randomized LRT Achieves the Optimal Trade-Off

Consider the following optimization problem:

Neyman-Pearson Problem
   minimize over φ : X → [0, 1]:  β_φ
   subject to:  α_φ ≤ α*

Theorem 2 (Neyman-Pearson)
A randomized LRT φ_{τ*,γ*} with parameters (τ*, γ*) satisfying α* = α_{φ_{τ*,γ*}} attains optimality for the Neyman-Pearson Problem.

SLIDE 11

pf: First argue that for any α* ∈ (0, 1), one can find (τ*, γ*) such that

   α* = α_{φ_{τ*,γ*}} = Σ_{x∈X} φ_{τ*,γ*}(x) P0(x) = Σ_{x: L(x)>τ*} P0(x) + γ* Σ_{x: L(x)=τ*} P0(x).

For any test φ, by the same argument as in Theorem 1, we have ∀ x ∈ X,

   (φ_{τ*,γ*}(x) − φ(x)) (P1(x) − τ* P0(x)) ≥ 0.

Summing over all x ∈ X, we similarly get

   (β_φ − β_{φ_{τ*,γ*}}) + τ* (α_φ − α_{φ_{τ*,γ*}}) ≥ 0.

Hence, for any feasible test φ with α_φ ≤ α* = α_{φ_{τ*,γ*}}, its probability of type II error satisfies β_φ ≥ β_{φ_{τ*,γ*}}.
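The existence argument for (τ*, γ*) is constructive: sort the symbols by likelihood ratio, accumulate P0-mass until the target α* is reached, and randomize on the boundary set. A sketch, assuming a finite alphabet with P0(x) > 0 for all x (the helper name np_threshold is my own):

```python
import numpy as np

def np_threshold(P0, P1, alpha_star):
    """Find (tau, gamma) so the randomized LRT has false alarm alpha_star."""
    L = P1 / P0                            # likelihood ratios
    order = np.argsort(-L)                 # symbols by decreasing L(x)
    mass = np.cumsum(P0[order])            # alpha of "accept H1 on a prefix"
    k = np.searchsorted(mass, alpha_star)  # first prefix with mass >= alpha*
    tau = L[order[k]]                      # threshold sits at this symbol
    above = np.sum(P0[L > tau])            # P0{L(x) > tau}
    at = np.sum(P0[L == tau])              # P0{L(x) = tau}
    gamma = (alpha_star - above) / at      # randomize on the boundary set
    return tau, gamma

P0 = np.array([0.5, 0.3, 0.2])
P1 = np.array([0.2, 0.3, 0.5])
tau, gamma = np_threshold(P0, P1, alpha_star=0.35)
L = P1 / P0
alpha = np.sum(P0[L > tau]) + gamma * np.sum(P0[L == tau])
print(tau, gamma, alpha)                   # 1.0, 0.5, 0.35
```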

SLIDE 12

Bayesian Setup

Sometimes prior probabilities of the two hypotheses are known:

   π_θ ≜ P{Hθ is true}, θ = 0, 1,  with π0 + π1 = 1.

In this sense, one can view the index Θ as a (binary) random variable with prior distribution P{Θ = θ} = π_θ, for θ = 0, 1. With prior probabilities, it then makes sense to talk about the average probability of error for a test φ, or more generally, the average cost (risk):

   Pe(φ) ≜ π0 α_φ + π1 β_φ = E_{Θ,X}[1{Θ ≠ Θ̂}],  R(φ) ≜ E_{Θ,X}[r_{Θ,Θ̂}].

The Bayesian hypothesis testing problem is to test the two hypotheses with knowledge of the prior probabilities so that the average probability of error (or, in general, a risk function) is minimized.

SLIDE 13

Minimizing Bayes Risk

Consider the following problem of minimizing the Bayes risk.

Bayesian Problem
   minimize over φ : X → [0, 1]:  R(φ) ≜ E_{Θ,X}[r_{Θ,Θ̂}]
   with known priors (π0, π1) and costs r_{θ,θ̂}

Theorem 3 (LRT is an Optimal Bayesian Test)
Assume r_{0,0} < r_{0,1} and r_{1,1} < r_{1,0}. A deterministic LRT φ_{τ*} with threshold

   τ* = ((r_{0,1} − r_{0,0}) π0) / ((r_{1,0} − r_{1,1}) π1)

attains optimality for the Bayesian Problem.

SLIDE 14

pf: Expand the risk:

   R(φ) = Σ_{x∈X} r_{0,0} π0 P0(x) (1 − φ(x)) + Σ_{x∈X} r_{0,1} π0 P0(x) φ(x)
        + Σ_{x∈X} r_{1,0} π1 P1(x) (1 − φ(x)) + Σ_{x∈X} r_{1,1} π1 P1(x) φ(x)
        = r_{0,0} π0 + r_{1,0} π1 + Σ_{x∈X} [(r_{0,1} − r_{0,0}) π0 P0(x) − (r_{1,0} − r_{1,1}) π1 P1(x)] φ(x).  (∗)

For each x ∈ X, we shall choose φ(x) ∈ [0, 1] such that the summand in (∗) is minimized. It is then obvious that we should choose

   φ(x) = 1 if (r_{0,1} − r_{0,0}) π0 P0(x) − (r_{1,0} − r_{1,1}) π1 P1(x) < 0,
   φ(x) = 0 if (r_{0,1} − r_{0,0}) π0 P0(x) − (r_{1,0} − r_{1,1}) π1 P1(x) ≥ 0,

which is exactly the deterministic LRT with threshold τ* given in Theorem 3.
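As a quick sanity check of Theorem 3, the following sketch (illustrative names and numbers, not from the slides) evaluates the optimal threshold and the resulting Bayes risk; with 0-1 costs the risk reduces to the average error probability Pe:

```python
import numpy as np

def bayes_lrt(P0, P1, pi0, pi1, r):
    """Bayes-optimal deterministic LRT; r[theta, theta_hat] are the costs."""
    tau = (r[0, 1] - r[0, 0]) * pi0 / ((r[1, 0] - r[1, 1]) * pi1)
    phi = (P1 > tau * P0).astype(float)    # decide H1 iff L(x) > tau*
    alpha = np.sum(phi * P0)
    beta = np.sum((1 - phi) * P1)
    risk = (r[0, 0] * pi0 * (1 - alpha) + r[0, 1] * pi0 * alpha
            + r[1, 0] * pi1 * beta + r[1, 1] * pi1 * (1 - beta))
    return tau, phi, risk

P0 = np.array([0.5, 0.3, 0.2])
P1 = np.array([0.2, 0.3, 0.5])
r = np.array([[0.0, 1.0],                  # 0-1 costs: risk equals Pe
              [1.0, 0.0]])
tau, phi, risk = bayes_lrt(P0, P1, pi0=0.5, pi1=0.5, r=r)
print(tau, phi, risk)                      # tau = 1.0, Pe = 0.35
```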

SLIDE 15

Discussions

For binary hypothesis testing problems, the likelihood ratio L(x) ≜ P1(x)/P0(x) turns out to be a sufficient statistic. Moreover, a likelihood ratio test (LRT) is optimal in both the Bayesian and the Neyman-Pearson settings.

Extensions include:
   M-ary hypothesis testing
   Minimax risk optimization (with unknown prior)
   Composite hypothesis testing, etc.

Here we do not pursue these directions further. Instead, we explore the asymptotic behavior of hypothesis testing and its connections with information-theoretic tools.

SLIDE 16

1 Hypothesis Testing
   Basic Theory
   Asymptotics

SLIDE 17

i.i.d. Observations

So far we have focused on the general setting where the observation space X can be an arbitrary alphabet. In the following, we consider the product space Xⁿ and a length-n observation sequence Xⁿ drawn i.i.d. from one of the two distributions; the two hypotheses are

   H0 : Xi i.i.d. ∼ P0, i = 1, 2, ..., n
   H1 : Xi i.i.d. ∼ P1, i = 1, 2, ..., n

The corresponding probabilities of error are denoted by

   α(n) ≡ P_FA^(n) ≜ P{H1 is chosen | H0},
   β(n) ≡ P_MD^(n) ≜ P{H0 is chosen | H1}.

Throughout the lecture we assume X = {a1, a2, ..., ad} is a finite set.

SLIDE 18

LRT under i.i.d. Observation (1)

With i.i.d. observations, the likelihood ratio of a sequence xⁿ ∈ Xⁿ is

   L(xⁿ) = Π_{i=1}^{n} P1(xi)/P0(xi) = Π_{a∈X} (P1(a)/P0(a))^{N(a|xⁿ)} = Π_{a∈X} (P1(a)/P0(a))^{n π(a|xⁿ)},

where N(a|xⁿ) ≜ the number of occurrences of a in xⁿ, and π(a|xⁿ) ≜ (1/n) N(a|xⁿ) is the relative frequency of occurrence of symbol a in the sequence xⁿ.

Note: From the above manipulation, we see that the collection of relative frequencies of occurrence (as a |X|-dimensional probability vector),

   Π_{xⁿ} ≜ [π(a1|xⁿ) π(a2|xⁿ) ··· π(ad|xⁿ)]ᵀ,

called the type of the sequence xⁿ, is a sufficient statistic for all the previously mentioned hypothesis testing problems.
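The sufficiency claim can be checked numerically: the LLR of a sequence is a function of its type alone. A short sketch (illustrative distributions; the helper name sequence_type is my own):

```python
import numpy as np

def sequence_type(xs, d):
    """Type (empirical distribution) of xs over the alphabet {0, ..., d-1}."""
    return np.bincount(xs, minlength=d) / len(xs)

P0 = np.array([0.5, 0.3, 0.2])
P1 = np.array([0.2, 0.3, 0.5])
rng = np.random.default_rng(0)
xs = rng.choice(3, size=1000, p=P1)        # sample a sequence under H1
Pi = sequence_type(xs, d=3)

n = len(xs)
llr_direct = np.sum(np.log(P1[xs] / P0[xs]))      # sum over the sequence
llr_from_type = n * np.sum(Pi * np.log(P1 / P0))  # function of the type only
print(np.allclose(llr_direct, llr_from_type))     # True
```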

SLIDE 19

LRT under i.i.d. Observation (2)

Let us further manipulate the LRT by taking the log likelihood ratio:

   L(xⁿ) ⋛ τn
   ⟺ log L(xⁿ) ⋛ log τn
   ⟺ Σ_{a∈X} n π(a|xⁿ) log(P1(a)/P0(a)) ⋛ log τn
   ⟺ Σ_{a∈X} π(a|xⁿ) log(π(a|xⁿ)/P0(a)) − Σ_{a∈X} π(a|xⁿ) log(π(a|xⁿ)/P1(a)) ⋛ (1/n) log τn
   ⟺ D(Π_{xⁿ} ‖ P0) − D(Π_{xⁿ} ‖ P1) ⋛ (1/n) log τn
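The last equivalence is an identity worth checking numerically: the normalized LLR equals D(Π_{xⁿ} ‖ P0) − D(Π_{xⁿ} ‖ P1). A sketch under the same illustrative distributions as before:

```python
import numpy as np

def kl(P, Q):
    """KL divergence in nats, with the 0*log 0 = 0 convention."""
    mask = P > 0
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))

P0 = np.array([0.5, 0.3, 0.2])
P1 = np.array([0.2, 0.3, 0.5])
rng = np.random.default_rng(1)
xs = rng.choice(3, size=500, p=P0)
Pi = np.bincount(xs, minlength=3) / len(xs)       # type of the sequence

lhs = np.mean(np.log(P1[xs] / P0[xs]))            # (1/n) log L(x^n)
rhs = kl(Pi, P0) - kl(Pi, P1)
print(np.allclose(lhs, rhs))                      # True
```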

SLIDE 20

[Figure: the correspondence between decision regions in the observation space and regions in the probability simplex. Left: Xⁿ partitioned into A1(φ) (acceptance region of H1) and A0(φ) (acceptance region of H0). Right: the probability simplex P(X), containing P0 and P1, partitioned into F1^(n) and F0^(n).]

   {xⁿ ∈ Ai ⟹ decide Hi}  ⟷  {Π_{xⁿ} ∈ Fi^(n) ⟹ decide Hi}.

SLIDE 21

[Figure: the probability simplex P(X), with P0, P1, the optimizer P*, and the regions F1^(n) and F0^(n).]

By Sanov's Theorem, we know that

   α(n) = P0ⁿ(F1^(n)) ≈ 2^{−n D(P* ‖ P0)},  β(n) = P1ⁿ(F0^(n)) ≈ 2^{−n D(P* ‖ P1)}.

SLIDE 22

Asymptotic Behaviors

1 Neyman-Pearson:

   β*(n, ε) ≜ min over φn : Xⁿ → [0, 1] of β_{φn}^(n), subject to α_{φn}^(n) ≤ ε.

It turns out that for all ε ∈ (0, 1),

   lim_{n→∞} {−(1/n) log β*(n, ε)} = D(P0 ‖ P1).

2 Bayesian:

   Pe*(n) ≜ min over φn : Xⁿ → [0, 1] of {π0 α_{φn}^(n) + π1 β_{φn}^(n)}.

It turns out that

   lim_{n→∞} {−(1/n) log Pe*(n)} = D(Pλ* ‖ P0) = D(Pλ* ‖ P1),

where

   Pλ(a) ≜ (P0(a))^λ (P1(a))^{1−λ} / Σ_{x∈X} (P0(x))^λ (P1(x))^{1−λ}, ∀ a ∈ X,

and λ* ∈ (0, 1) is such that D(Pλ* ‖ P0) = D(Pλ* ‖ P1).
SLIDE 23

Asymptotics in Neyman-Pearson Setup

Theorem 4 (Chernoff-Stein)
For all ε ∈ (0, 1), lim_{n→∞} {−(1/n) log β*(n, ε)} = D(P0 ‖ P1).

pf: We shall prove the achievability and the converse parts separately.

Achievability: construct a sequence of tests {φn} with α_{φn}^(n) ≤ ε for n sufficiently large, such that

   lim inf_{n→∞} {−(1/n) log β_{φn}^(n)} ≥ D(P0 ‖ P1).

Converse: for any sequence of tests {φn} with α_{φn}^(n) ≤ ε for n sufficiently large, show that

   lim sup_{n→∞} {−(1/n) log β_{φn}^(n)} ≤ D(P0 ‖ P1).

We use the method of types to prove both the achievability and the converse. Alternatively, Chapter 11.8 of Cover & Thomas [1] proves the theorem with a kind of weak typicality.

SLIDE 24

Achievability: Consider the deterministic test

   φn(xⁿ) = 1{D(Π_{xⁿ} ‖ P0) ≥ δn},  δn ≜ (1/n) (log(1/ε) + d log(n+1)).

In other words, it decides H1 if D(Π_{xⁿ} ‖ P0) ≥ δn, and H0 otherwise.

Check the probability of type I error: by Prop. 4 in Part I, we have

   α_{φn}^(n) = P_{Xi i.i.d.∼P0} {D(Π_{Xⁿ} ‖ P0) ≥ δn} ≤ 2^{−n(δn − d log(n+1)/n)} =(a) ε,

where (a) is due to our construction.

Analyze the probability of type II error ((b) is due to Prop. 3 in Part I):

   β_{φn}^(n) = Σ_{Q∈Pn: D(Q‖P0)<δn} P1ⁿ(Tn(Q)) ≤(b) Σ_{Q∈Pn: D(Q‖P0)<δn} 2^{−n D(Q‖P1)} ≤ |Pn| 2^{−n Dn*},

where Dn* ≜ min_{Q∈Pn: D(Q‖P0)<δn} D(Q‖P1).

Since lim_{n→∞} δn = 0, we have lim_{n→∞} Dn* = D(P0 ‖ P1), and the achievability part is done.
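For a small alphabet, β_{φn}^(n) can be computed exactly by enumerating all types in Pn, which makes the convergence of the exponent toward D(P0 ‖ P1) visible. A sketch with d = 3 and illustrative distributions (since δn shrinks only like (log n)/n, the exponent approaches its limit gradually):

```python
import numpy as np
from scipy.stats import multinomial

def kl2(P, Q):
    mask = P > 0
    return np.sum(P[mask] * np.log2(P[mask] / Q[mask]))

P0 = np.array([0.5, 0.3, 0.2])
P1 = np.array([0.2, 0.3, 0.5])
d, eps = 3, 0.1
for n in (50, 100, 200):
    delta_n = (np.log2(1 / eps) + d * np.log2(n + 1)) / n
    beta = 0.0
    for k0 in range(n + 1):                     # enumerate all types in P_n
        for k1 in range(n + 1 - k0):
            counts = np.array([k0, k1, n - k0 - k1])
            if kl2(counts / n, P0) < delta_n:   # test decides H0 on this type
                beta += multinomial.pmf(counts, n, P1)
    print(n, -np.log2(beta) / n, "->", kl2(P0, P1))
```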

SLIDE 25

Converse: We prove the converse for deterministic tests. The extension to randomized tests is left as an exercise (HW6).

Let A_i^(n) ≜ {xⁿ | φn(xⁿ) = i} denote the acceptance region of Hi, for i = 0, 1. Let B^(n) ≜ {xⁿ | D(Π_{xⁿ} ‖ P0) < εn}, with εn ≜ 2d log(n+1)/n. By Prop. 4, we have

   P0ⁿ(B^(n)) = 1 − P_{Xi i.i.d.∼P0} {D(Π_{Xⁿ} ‖ P0) ≥ εn} ≥ 1 − 2^{−n(εn − d log(n+1)/n)} = 1 − 2^{−d log(n+1)} → 1 as n → ∞.

Hence, for sufficiently large n, both P0ⁿ(B^(n)) and P0ⁿ(A0^(n)) > 1 − ε, and

   P0ⁿ(B^(n) ∩ A0^(n)) = P0ⁿ(B^(n)) + P0ⁿ(A0^(n)) − P0ⁿ(B^(n) ∪ A0^(n)) > 2(1 − ε) − 1 = 1 − 2ε.

Note that B^(n) = ∪_{Q∈Pn: D(Q‖P0)<εn} Tn(Q). Hence ∃ Qn ∈ Pn with D(Qn ‖ P0) < εn such that

   P0ⁿ(Tn(Qn) ∩ A0^(n)) > (1 − 2ε) P0ⁿ(Tn(Qn)).  (1)

SLIDE 26

Key Observation: the probability of each sequence in the same type class is the same under any product distribution. Hence, (1) is equivalent to

   |Tn(Qn) ∩ A0^(n)| > (1 − 2ε) |Tn(Qn)|,

which in turn implies

   P1ⁿ(Tn(Qn) ∩ A0^(n)) > (1 − 2ε) P1ⁿ(Tn(Qn)).

Hence, for sufficiently large n, ∃ Qn ∈ Pn with D(Qn ‖ P0) < εn such that

   β_{φn}^(n) = P1ⁿ(A0^(n)) ≥ P1ⁿ(Tn(Qn) ∩ A0^(n)) > (1 − 2ε) P1ⁿ(Tn(Qn)) ≥(c) (1 − 2ε) |Pn|⁻¹ 2^{−n D(Qn ‖ P1)},

where (c) is due to Prop. 3. Finally, since lim_{n→∞} εn = 0, we have lim_{n→∞} D(Qn ‖ P1) = D(P0 ‖ P1), and the converse proof is done.

SLIDE 27

Asymptotics in Bayesian Setup

Theorem 5 (Chernoff)

   lim_{n→∞} {−(1/n) log Pe*(n)} = D(Pλ* ‖ P0) = D(Pλ* ‖ P1) = max_{λ∈[0,1]} log(1 / Σ_{x∈X} (P0(x))^λ (P1(x))^{1−λ}) ≜ Chernoff Information CI(P0, P1),

where

   Pλ(a) ≜ (P0(a))^λ (P1(a))^{1−λ} / Σ_{x∈X} (P0(x))^λ (P1(x))^{1−λ}, ∀ a ∈ X,

and λ* ∈ (0, 1) is such that D(Pλ* ‖ P0) = D(Pλ* ‖ P1).

Note: The optimal Bayesian test (for minimizing Pe) is the maximum a posteriori (MAP) test: φ_MAP(xⁿ) = 1{π1 P1ⁿ(xⁿ) ≥ π0 P0ⁿ(xⁿ)}.
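Numerically, CI(P0, P1) can be obtained by maximizing −log₂ Σ_x (P0(x))^λ (P1(x))^{1−λ} over λ ∈ [0, 1]; the function being minimized is convex in λ. A sketch (illustrative distributions, divergences in bits) that also checks the equalizer property of Pλ*:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def kl2(P, Q):
    return np.sum(P * np.log2(P / Q))      # KL divergence in bits

P0 = np.array([0.5, 0.3, 0.2])
P1 = np.array([0.2, 0.3, 0.5])
f = lambda lam: np.log2(np.sum(P0**lam * P1**(1 - lam)))  # convex in lam
res = minimize_scalar(f, bounds=(0.0, 1.0), method="bounded")
lam_star, ci = res.x, -res.fun             # CI(P0, P1) = max of -f

P_star = P0**lam_star * P1**(1 - lam_star)
P_star /= P_star.sum()                     # tilted distribution at lambda*
print(ci, kl2(P_star, P0), kl2(P_star, P1))  # the three values should agree
```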

SLIDE 28

pf: The proof is based on applying large deviations to the analysis of the optimal test, MAP: φ_MAP(xⁿ) = 1{π1 P1ⁿ(xⁿ) ≥ π0 P0ⁿ(xⁿ)}.

Analysis of the error probabilities of the MAP test:

   α(n) = P0ⁿ(F1^(n)),  β(n) = P1ⁿ(F0^(n)),

where

   F1^(n) ≜ {Q ∈ P(X) | D(Q ‖ P0) − D(Q ‖ P1) ≥ (1/n) log(π0/π1)},
   F0^(n) ≜ {Q ∈ P(X) | D(Q ‖ P0) − D(Q ‖ P1) ≤ (1/n) log(π0/π1)}.

Asymptotics: By Sanov's Theorem, we have

   lim_{n→∞} {−(1/n) log α(n)} = min_{Q∈F1} D(Q ‖ P0),
   lim_{n→∞} {−(1/n) log β(n)} = min_{Q∈F0} D(Q ‖ P1),

where F1 ≜ {Q ∈ P(X) | D(Q ‖ P0) − D(Q ‖ P1) ≥ 0} and F0 ≜ {Q ∈ P(X) | D(Q ‖ P0) − D(Q ‖ P1) ≤ 0}.

SLIDE 29

Exponents: Characterizing the two exponents is equivalent to solving the two (convex) optimization problems:

min_{Q∈F1} D(Q ‖ P0):
   minimize over (Q1, ..., Qd):  Σ_{l=1}^{d} Ql log(Ql / P0(al))
   subject to:  Σ_{l=1}^{d} Ql log(P1(al)/P0(al)) ≥ 0;  Ql ≥ 0, l = 1, ..., d;  Σ_{l=1}^{d} Ql = 1.

min_{Q∈F0} D(Q ‖ P1):
   minimize over (Q1, ..., Qd):  Σ_{l=1}^{d} Ql log(Ql / P1(al))
   subject to:  Σ_{l=1}^{d} Ql log(P1(al)/P0(al)) ≤ 0;  Ql ≥ 0, l = 1, ..., d;  Σ_{l=1}^{d} Ql = 1.

It turns out that both problems have a common optimal solution

   Pλ*(a) = (P0(a))^{λ*} (P1(a))^{1−λ*} / Σ_{x∈X} (P0(x))^{λ*} (P1(x))^{1−λ*}, ∀ a ∈ X,

with λ* ∈ [0, 1] such that D(Pλ* ‖ P0) = D(Pλ* ‖ P1).

SLIDE 30

Hence, both types of error probability have the same exponent, and so does the average error probability. This completes the proof of the first part.

Chernoff Information: To show that

   CI(P0, P1) ≜ max_{λ∈[0,1]} log(1 / Σ_{x∈X} (P0(x))^λ (P1(x))^{1−λ}) = D(Pλ* ‖ P0),

simply observe that

   D(Pλ ‖ P0) = D(Pλ ‖ P1)
   ⟺ Σ_{a∈X} (P0(a))^λ (P1(a))^{1−λ} (log P0(a) − log P1(a)) = 0
   ⟺ D(Pλ ‖ P0) = D(Pλ ‖ P1) = log(1 / Σ_{x∈X} (P0(x))^λ (P1(x))^{1−λ}).

Proof complete.
