SLIDE 1
Unbiased Estimation

The binomial problem shows a general phenomenon: an estimator can be good for some values of θ and bad for others. To compare $\hat\theta$ and $\tilde\theta$, two estimators of θ: say $\hat\theta$ is better than $\tilde\theta$ if it has uniformly smaller MSE:
$$\mathrm{MSE}_{\hat\theta}(\theta) \le \mathrm{MSE}_{\tilde\theta}(\theta) \quad\text{for all } \theta.$$
Normally we also require that the inequality be strict for at least one θ.
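A small numerical sketch (my own illustration, in Python with scipy; the shrinkage competitor is hypothetical, not from the course): compare the exact MSE of $\hat p = X/n$ with that of an estimator shrinking toward 1/2. Neither dominates the other, which is exactly the phenomenon above.

```python
import numpy as np
from scipy.stats import binom

# Exact MSE of an estimator given its value at each possible X = 0..n.
n = 20
x = np.arange(n + 1)

def mse(estimates, p):
    # Average (estimate - p)^2 against the Binomial(n, p) pmf.
    return np.sum(binom.pmf(x, n, p) * (estimates - p) ** 2)

p_hat = x / n                                      # usual unbiased estimate
p_tilde = (x + np.sqrt(n) / 2) / (n + np.sqrt(n))  # shrinks toward 1/2

for p in (0.1, 0.5):
    print(f"p={p}: MSE(p_hat)={mse(p_hat, p):.5f}  MSE(p_tilde)={mse(p_tilde, p):.5f}")
# Near p = 0.5 the shrinkage estimator wins; near 0 or 1 the usual one wins.
```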
SLIDE 2

Question: is there a best estimate, one which is better than every other estimator? Answer: NO. Suppose $\hat\theta$ were such a best estimate. Fix a θ* in Θ and let $\tilde\theta \equiv \theta^*$. Then the MSE of $\tilde\theta$ is 0 when θ = θ*. Since $\hat\theta$ is better than $\tilde\theta$ we must have
$$\mathrm{MSE}_{\hat\theta}(\theta^*) = 0,$$
so that $\hat\theta = \theta^*$ with probability equal to 1. So $\hat\theta = \tilde\theta$. If there are actually two different possible values of θ this gives a contradiction; so no such $\hat\theta$ exists.
SLIDE 3
Principle of Unbiasedness: a good estimate is unbiased, that is,
$$E_\theta(\hat\theta) \equiv \theta.$$
WARNING: In my view the Principle of Unbiasedness is a load of hogwash.

For an unbiased estimate the MSE is just the variance.

Definition: An estimator $\hat\phi$ of a parameter φ = φ(θ) is Uniformly Minimum Variance Unbiased (UMVU) if, whenever $\tilde\phi$ is an unbiased estimate of φ, we have
$$\mathrm{Var}_\theta(\hat\phi) \le \mathrm{Var}_\theta(\tilde\phi).$$
We call $\hat\phi$ the UMVUE. ('E' is for Estimator.) The point of having φ(θ) is to study problems like estimating µ when you have two parameters, like µ and σ, for example.
SLIDE 4

Cramér-Rao Inequality

Suppose T(X) is some unbiased estimator of θ. We can derive some information from the identity
$$E_\theta(T(X)) \equiv \theta.$$
When we worked with the score function we derived information from the identity $\int f(x,\theta)\,dx \equiv 1$ by differentiation, and we do the same here. Since T(X) is an unbiased estimate of θ,
$$E_\theta(T(X)) = \int T(x) f(x,\theta)\,dx = \theta.$$
Differentiate both sides to get
$$1 = \frac{d}{d\theta}\int T(x) f(x,\theta)\,dx = \int T(x)\,\frac{\partial}{\partial\theta} f(x,\theta)\,dx = \int T(x)\,\frac{\partial}{\partial\theta}\log f(x,\theta)\, f(x,\theta)\,dx = E_\theta(T(X)U(\theta)),$$
where U is the score function.
SLIDE 5
Remember: Cov(W, Z) = E(WZ) − E(W)E(Z). Here
$$\mathrm{Cov}_\theta(T(X), U(\theta)) = E(T(X)U(\theta)) - E(T(X))E(U(\theta)).$$
But recall that the score U(θ) has mean 0, so
$$\mathrm{Cov}_\theta(T(X), U(\theta)) = 1.$$
The definition of correlation gives
$$\{\mathrm{Corr}(W, Z)\}^2 = \frac{\{\mathrm{Cov}(W, Z)\}^2}{\mathrm{Var}(W)\mathrm{Var}(Z)}.$$
Squared correlations are at most 1, therefore
$$\mathrm{Var}_\theta(T)\,\mathrm{Var}_\theta(U(\theta)) \ge 1.$$
Remember Var(U(θ)) = I(θ). Therefore
$$\mathrm{Var}_\theta(T) \ge \frac{1}{I(\theta)}.$$
The RHS is called the Cramér-Rao Lower Bound.
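A Monte Carlo sketch (my own, in Python) of the key identity Cov(T, U) = 1: in the N(µ, 1) model the score is $U(\mu) = \sum(X_i - \mu)$ and I(µ) = n; the sample median is unbiased for µ but does not attain the bound.

```python
import numpy as np

# N(mu, 1) model: score U(mu) = sum(X_i - mu), I(mu) = n. The sample
# median is unbiased for mu (by symmetry) but does not attain the bound.
rng = np.random.default_rng(0)
mu, n, reps = 2.0, 15, 200_000
x = rng.normal(mu, 1.0, size=(reps, n))

T = np.median(x, axis=1)     # an unbiased estimator of mu
U = np.sum(x - mu, axis=1)   # score evaluated at the true mu

print("Cov(T, U) ~", np.cov(T, U)[0, 1])              # close to 1
print("Var(T)    ~", T.var(), ">= 1/I(mu) =", 1 / n)  # ~ pi/(2n) > 1/n
```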
SLIDE 6

Examples of the Cramér-Rao Lower Bound:

1) $X_1, \ldots, X_n$ iid Exponential with mean µ:
$$f(x) = \frac{1}{\mu}e^{-x/\mu} \quad\text{for } x > 0.$$
Log-likelihood:
$$\ell = -n\log\mu - \frac{\sum X_i}{\mu}.$$
Score (U(µ) = ∂ℓ/∂µ):
$$U(\mu) = -\frac{n}{\mu} + \frac{\sum X_i}{\mu^2}.$$
Negative second derivative:
$$V(\mu) = -U'(\mu) = -\frac{n}{\mu^2} + \frac{2\sum X_i}{\mu^3}.$$
SLIDE 7

Take expected values to compute the Fisher information:
$$I(\mu) = -\frac{n}{\mu^2} + \frac{2\sum E(X_i)}{\mu^3} = -\frac{n}{\mu^2} + \frac{2n\mu}{\mu^3} = \frac{n}{\mu^2}.$$
So if $T(X_1, \ldots, X_n)$ is an unbiased estimator of µ then
$$\mathrm{Var}(T) \ge \frac{1}{I(\mu)} = \frac{\mu^2}{n}.$$
Example: $\bar X$ is an unbiased estimate of µ. Also $\mathrm{Var}(\bar X) = \mu^2/n$. SO: $\bar X$ is the best unbiased estimate; it has the smallest possible variance. We say it is a Uniformly Minimum Variance Unbiased Estimator of µ.
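A quick simulation check (my own sketch in Python) that $\bar X$ attains the bound µ²/n in the exponential model:

```python
import numpy as np

# Exponential(mean mu) model: CRLB for unbiased estimators of mu is mu^2/n.
rng = np.random.default_rng(1)
mu, n, reps = 3.0, 25, 200_000
xbar = rng.exponential(mu, size=(reps, n)).mean(axis=1)

print("Var(xbar) ~", xbar.var())   # simulated variance of the sample mean
print("CRLB      =", mu**2 / n)    # 0.36; xbar attains the bound
```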
SLIDE 8
Similar ideas work with more than one parameter. Example: $X_1, \ldots, X_n$ a sample from N(µ, σ²). Suppose $(\hat\mu, \hat\sigma^2)$ is an unbiased estimator of (µ, σ²) (such as $(\bar X, s^2)$). We are estimating σ², not σ, so give σ² its own symbol: define τ = σ². Find the information matrix because the CRLB is its inverse.

Log-likelihood:
$$\ell = -\frac{n}{2}\log\tau - \frac{\sum(X_i-\mu)^2}{2\tau} - \frac{n}{2}\log(2\pi).$$
Score:
$$U = \begin{pmatrix} \dfrac{\sum(X_i-\mu)}{\tau} \\[2ex] -\dfrac{n}{2\tau} + \dfrac{\sum(X_i-\mu)^2}{2\tau^2} \end{pmatrix}.$$
SLIDE 9

Negative second derivative matrix:
$$V = \begin{pmatrix} \dfrac{n}{\tau} & \dfrac{\sum(X_i-\mu)}{\tau^2} \\[2ex] \dfrac{\sum(X_i-\mu)}{\tau^2} & -\dfrac{n}{2\tau^2} + \dfrac{\sum(X_i-\mu)^2}{\tau^3} \end{pmatrix}.$$
Fisher information matrix:
$$I(\mu,\tau) = \begin{pmatrix} \dfrac{n}{\tau} & 0 \\[2ex] 0 & \dfrac{n}{2\tau^2} \end{pmatrix}.$$
Cramér-Rao lower bound:
$$\mathrm{Var}(\hat\mu, \hat\sigma^2) \ge \{I(\mu,\tau)\}^{-1} = \begin{pmatrix} \dfrac{\tau}{n} & 0 \\[2ex] 0 & \dfrac{2\tau^2}{n} \end{pmatrix}.$$
In particular,
$$\mathrm{Var}(\hat\sigma^2) \ge \frac{2\tau^2}{n} = \frac{2\sigma^4}{n}.$$
SLIDE 10

Notice:
$$\mathrm{Var}(s^2) = \mathrm{Var}\left(\frac{\sigma^2}{n-1}\,\chi^2_{n-1}\right) = \frac{\sigma^4}{(n-1)^2}\,\mathrm{Var}(\chi^2_{n-1}) = \frac{2\sigma^4}{n-1} > \frac{2\sigma^4}{n}.$$
Conclusions: the variance of the sample variance is larger than the lower bound. But the ratio of the variance to the lower bound is n/(n − 1), which is nearly 1. Fact: s² is UMVUE anyway; see later in the course.
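A simulation sketch (mine, not from the slides) confirming that Var(s²) matches 2σ⁴/(n − 1) and so misses the bound 2σ⁴/n slightly:

```python
import numpy as np

# N(0, sigma^2) samples: Var(s^2) should be 2*sigma^4/(n-1), just above
# the Cramer-Rao bound 2*sigma^4/n.
rng = np.random.default_rng(2)
sigma, n, reps = 1.5, 10, 400_000
s2 = rng.normal(0.0, sigma, size=(reps, n)).var(axis=1, ddof=1)

print("Var(s^2)          ~", s2.var())
print("2 sigma^4/(n-1)   =", 2 * sigma**4 / (n - 1))
print("CRLB 2 sigma^4/n  =", 2 * sigma**4 / n)
```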
SLIDE 11
Slightly more generally: if E(T) = φ(θ) for some function φ then a similar argument gives
$$\mathrm{Var}(T) \ge \frac{\{\phi'(\theta)\}^2}{I(\theta)}.$$
The inequality is strict unless the correlation is 1, so that
$$U(\theta) = A(\theta)T(X) + B(\theta)$$
for non-random constants A and B (which may depend on θ). This would prove that
$$\ell(\theta) = A^*(\theta)T(X) + B^*(\theta) + C(X)$$
for other constants A* and B*, and finally
$$f(x,\theta) = h(x)e^{A^*(\theta)T(x) + B^*(\theta)}$$
for $h = e^C$.
SLIDE 12

Summary of Implications

- You can sometimes recognize a UMVUE: if $\mathrm{Var}_\theta(T(X)) \equiv 1/I(\theta)$ then T(X) is the UMVUE. In the N(µ, 1) example the Fisher information is n and $\mathrm{Var}(\bar X) = 1/n$, so $\bar X$ is the UMVUE of µ.
- In an asymptotic sense the MLE is nearly optimal: it is nearly unbiased and its (approximate) variance is nearly 1/I(θ).
- Good estimates are highly correlated with the score.
- Densities of the exponential form given above (called exponential families) are somehow special.
- Usually the inequality is strict: strict, that is, unless the score is an affine function of a statistic T and T (or T/c for a constant c) is unbiased for θ.
SLIDE 13
What can we do to find UMVUEs when the CRLB is a strict inequality? Use:

- Sufficiency: choose good summary statistics.
- Completeness: recognize unique good statistics.
- The Rao-Blackwell theorem: a mechanical way to improve unbiased estimates.
- The Lehmann-Scheffé theorem: a way to prove an estimate is UMVUE.
SLIDE 14

Sufficiency

Example: Suppose $X_1, \ldots, X_n$ are iid Bernoulli(θ) and $T(X_1, \ldots, X_n) = \sum X_i$. Consider the conditional distribution of $X_1, \ldots, X_n$ given T. Take n = 4:
$$P(X_1=1, X_2=1, X_3=0, X_4=0 \mid T=2) = \frac{P(X_1=1, X_2=1, X_3=0, X_4=0, T=2)}{P(T=2)}$$
$$= \frac{P(X_1=1, X_2=1, X_3=0, X_4=0)}{P(T=2)} = \frac{\theta^2(1-\theta)^2}{\binom{4}{2}\theta^2(1-\theta)^2} = \frac{1}{6}.$$
Notice the disappearance of θ! This happens for all possibilities for n, T and the Xs.
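A short enumeration sketch (my own, in Python) of the same computation: the conditional probability of a particular sequence given T = 2 comes out 1/6 no matter what θ is.

```python
from math import comb

# P(X = x | T = t) for a specific Bernoulli sequence x with sum(x) = t:
# the theta's cancel, leaving 1 / C(n, t).
n, t = 4, 2
x = (1, 1, 0, 0)

def cond_prob(x, theta):
    px = theta ** sum(x) * (1 - theta) ** (n - sum(x))      # P(X = x)
    pt = comb(n, t) * theta ** t * (1 - theta) ** (n - t)   # P(T = t)
    return px / pt

for theta in (0.1, 0.5, 0.9):
    print(theta, cond_prob(x, theta))   # always 1/6 = 0.1666...
```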
SLIDE 15
In the binomial situation we say the conditional distribution of the data given the summary statistic T is free of θ.

Defn: A statistic T(X) is sufficient for the model $\{P_\theta;\ \theta \in \Theta\}$ if the conditional distribution of the data X given T = t is free of θ.

Intuition: Data tell us about θ if different values of θ give different distributions to X. If two different values of θ correspond to the same density or cdf for X, we cannot distinguish these two values of θ by examining X. Extension of this notion: if two values of θ give the same conditional distribution of X given T, then observing T in addition to X doesn't improve our ability to distinguish the two values.
SLIDE 16

Theorem [Rao-Blackwell]: Suppose S(X) is a sufficient statistic for the model $\{P_\theta;\ \theta \in \Theta\}$. If T is an estimate of φ(θ) then:

1. E(T|S) is a statistic.
2. E(T|S) has the same bias as T; if T is unbiased, so is E(T|S).
3. $\mathrm{Var}_\theta(E(T|S)) \le \mathrm{Var}_\theta(T)$, and the inequality is strict unless T is a function of S.
4. The MSE of E(T|S) is no more than the MSE of T.
SLIDE 17

Usage: Review conditional distributions.

Defn: if X, Y are rvs with joint density then
$$E(g(Y)\mid X = x) = \int g(y)\, f(y\mid x)\,dy$$
and E(Y|X) is this function of x evaluated at X.

Important property:
$$E\{R(X)E(Y\mid X)\} = \int R(x)\,E(Y\mid X = x)\, f_X(x)\,dx$$
$$= \int R(x)\left(\int y\, f(y\mid x)\,dy\right) f_X(x)\,dx$$
$$= \int\!\!\int R(x)\, y\, f(x,y)\,dy\,dx$$
$$= E(R(X)Y).$$
Think of E(Y|X) as the average of Y holding X fixed. It behaves like an ordinary expected value, but functions of X only are like constants:
$$E\Big(\sum A_i(X)Y_i \,\Big|\, X\Big) = \sum A_i(X)E(Y_i\mid X).$$
SLIDE 18
Examples: In the binomial problem $Y_1(1 - Y_2)$ is an unbiased estimate of p(1 − p). We improve this by computing
$$E(Y_1(1-Y_2)\mid X).$$
We do this in two steps: first compute $E(Y_1(1-Y_2)\mid X = x)$.
SLIDE 19

Notice that the random variable $Y_1(1 - Y_2)$ is either 1 or 0, so its expected value is just the probability that it is equal to 1:
$$E(Y_1(1-Y_2)\mid X = x) = P(Y_1(1-Y_2) = 1 \mid X = x) = P(Y_1 = 1, Y_2 = 0 \mid Y_1 + Y_2 + \cdots + Y_n = x)$$
$$= \frac{P(Y_1 = 1, Y_2 = 0, Y_1 + \cdots + Y_n = x)}{P(Y_1 + Y_2 + \cdots + Y_n = x)} = \frac{P(Y_1 = 1, Y_2 = 0, Y_3 + \cdots + Y_n = x - 1)}{\binom{n}{x}p^x(1-p)^{n-x}}$$
$$= \frac{p(1-p)\binom{n-2}{x-1}p^{x-1}(1-p)^{n-x-1}}{\binom{n}{x}p^x(1-p)^{n-x}} = \frac{\binom{n-2}{x-1}}{\binom{n}{x}} = \frac{x(n-x)}{n(n-1)}.$$
This is simply $n\hat p(1 - \hat p)/(n-1)$ (which can be bigger than 1/4, the maximum value of p(1 − p)).
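A Monte Carlo sketch (mine) of this Rao-Blackwellization: the crude estimate $Y_1(1-Y_2)$ and its improvement $x(n-x)/(n(n-1))$ have the same mean p(1 − p), but the improved version has far smaller variance.

```python
import numpy as np

# Bernoulli(p) trials: compare the crude unbiased estimate Y1*(1 - Y2)
# of p(1-p) with its Rao-Blackwellization x(n-x)/(n(n-1)).
rng = np.random.default_rng(3)
p, n, reps = 0.3, 10, 400_000
y = rng.random((reps, n)) < p

T = (y[:, 0] & ~y[:, 1]).astype(float)   # crude unbiased estimate
s = y.sum(axis=1)
RB = s * (n - s) / (n * (n - 1))         # E(T | sum) from the slide

print("target p(1-p) =", p * (1 - p))
print("mean(T)  ~", T.mean(), "  Var(T)  ~", T.var())
print("mean(RB) ~", RB.mean(), "  Var(RB) ~", RB.var())
```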
SLIDE 20
Example: If $X_1, \ldots, X_n$ are iid N(µ, 1) then $\bar X$ is sufficient and $X_1$ is an unbiased estimate of µ. Now
$$E(X_1\mid \bar X) = E[X_1 - \bar X + \bar X \mid \bar X] = E[X_1 - \bar X \mid \bar X] + \bar X = \bar X,$$
since $X_1 - \bar X$ has mean 0 and is independent of $\bar X$. So $\bar X$ is (as we will see later) the UMVUE.
SLIDE 21

Finding Sufficient Statistics

Binomial(n, θ): the log-likelihood ℓ(θ) (the part depending on θ) is a function of X alone, not of $Y_1, \ldots, Y_n$ as well. Normal example: ℓ(µ) is, ignoring terms not containing µ,
$$\ell(\mu) = \mu\sum X_i - n\mu^2/2.$$
These are examples of the Factorization Criterion:

Theorem: If the model for data X has density f(x, θ) then the statistic S(X) is sufficient if and only if the density can be factored as
$$f(x,\theta) = g(S(x),\theta)\,h(x).$$
SLIDE 22

Example: If $X_1, \ldots, X_n$ are iid N(µ, σ²) then the joint density is
$$(2\pi)^{-n/2}\sigma^{-n}\exp\Big\{-\sum X_i^2/(2\sigma^2) + \mu\sum X_i/\sigma^2 - n\mu^2/(2\sigma^2)\Big\},$$
which is evidently a function of
$$\Big(\sum X_i^2,\ \sum X_i\Big).$$
This pair is a sufficient statistic. You can write this pair as a bijective function of $(\bar X, \sum(X_i - \bar X)^2)$, so that this pair is also sufficient.

Example: If $Y_1, \ldots, Y_n$ are iid Bernoulli(p) then
$$f(y_1,\ldots,y_n; p) = \prod p^{y_i}(1-p)^{1-y_i} = p^{\sum y_i}(1-p)^{n - \sum y_i}.$$
Define $g(x, p) = p^x(1-p)^{n-x}$ and h ≡ 1 to see that $X = \sum Y_i$ is sufficient by the factorization criterion.
SLIDE 23

Completeness; Lehmann-Scheffé

Example: Suppose X has a Binomial(n, p) distribution. The score function is
$$U(p) = \frac{X}{p(1-p)} - \frac{n}{1-p}.$$
The CRLB will be strict unless T = cX for some c. If we are trying to estimate p then choosing $c = n^{-1}$ does give an unbiased estimate $\hat p = X/n$, and T = X/n achieves the CRLB, so it is UMVU.

Different tactic: Suppose T(X) is some unbiased function of X. Then we have
$$E_p(T(X) - X/n) \equiv 0,$$
because $\hat p = X/n$ is also unbiased. If h(k) = T(k) − k/n then
$$E_p(h(X)) = \sum_{k=0}^{n} h(k)\binom{n}{k}p^k(1-p)^{n-k} \equiv 0.$$
SLIDE 24

The LHS of the ≡ sign is a polynomial function of p, as is the RHS. Thus, if the left hand side is expanded out, the coefficient of each power $p^k$ is 0. The constant term occurs only in the k = 0 term; its coefficient is
$$h(0)\binom{n}{0} = h(0).$$
Thus h(0) = 0. Now $p^1 = p$ occurs only in the k = 1 term, with coefficient nh(1), so h(1) = 0. Since the terms with k = 0 or 1 vanish, the quantity $p^2$ occurs only in the k = 2 term; its coefficient is n(n − 1)h(2)/2, so h(2) = 0. Continue in this way to see that h(k) = 0 for each k. So the only unbiased function of X is X/n.
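A symbolic sketch (my own, using sympy) of this argument for n = 4: expanding $E_p(h(X))$ in powers of p and setting every coefficient to 0 forces h ≡ 0.

```python
import sympy as sp

# Binomial(n, p) with n = 4: impose E_p(h(X)) = 0 identically in p and
# solve for h(0), ..., h(4); the only solution is h identically 0.
p = sp.symbols('p')
n = 4
h = sp.symbols('h0:5')   # unknowns h(0), ..., h(4)

Eh = sum(h[k] * sp.binomial(n, k) * p**k * (1 - p)**(n - k)
         for k in range(n + 1))
coeffs = sp.Poly(sp.expand(Eh), p).all_coeffs()   # coefficients in p
print(sp.solve(coeffs, h))   # {h0: 0, h1: 0, h2: 0, h3: 0, h4: 0}
```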
SLIDE 25
Completeness

In the Binomial(n, p) example only one function of X is an unbiased estimate of p. Rao-Blackwell shows that the UMVUE, if it exists, will be a function of any sufficient statistic. Q: Can there be more than one such function? A: Yes in general, but no for some models, like the binomial.

Definition: A statistic T is complete for a model $\{P_\theta;\ \theta \in \Theta\}$ if
$$E_\theta(h(T)) = 0 \text{ for all } \theta$$
implies h(T) = 0.
SLIDE 26
We have already seen that X is complete in the Binomial(n, p) model. In the N(µ, 1) model suppose $E_\mu(h(\bar X)) \equiv 0$. Since $\bar X$ has a N(µ, 1/n) distribution we find that
$$E(h(\bar X)) = \frac{\sqrt{n}\,e^{-n\mu^2/2}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} h(x)e^{-nx^2/2}e^{n\mu x}\,dx,$$
so that
$$\int_{-\infty}^{\infty} h(x)e^{-nx^2/2}e^{n\mu x}\,dx \equiv 0.$$
This is called the Laplace transform of $h(x)e^{-nx^2/2}$. Theorem: the Laplace transform is 0 if and only if the function is 0 (because you can invert the transform). Hence h ≡ 0.
SLIDE 27

How to Prove Completeness

There is only one general tactic: suppose X has density
$$f(x,\theta) = h(x)\exp\Big\{\sum_{i=1}^{p} a_i(\theta)S_i(x) + c(\theta)\Big\}.$$
If the range of the function $(a_1(\theta), \ldots, a_p(\theta))$, as θ varies over Θ, contains a (hyper-)rectangle in $R^p$, then the statistic $(S_1(X), \ldots, S_p(X))$ is complete and sufficient.

You prove the sufficiency by the factorization criterion and the completeness using the properties of Laplace transforms and the fact that the joint density of $S_1, \ldots, S_p$ has the same exponential form
$$g(s_1,\ldots,s_p;\theta) = h^*(s)\exp\Big\{\sum_{i=1}^{p} a_i(\theta)s_i + c^*(\theta)\Big\}.$$
SLIDE 28

Example: The N(µ, σ²) model density has the form
$$\exp\Big\{-\frac{x^2}{2\sigma^2} + \frac{\mu}{\sigma^2}\,x - \frac{\mu^2}{2\sigma^2} - \log\sigma\Big\}\frac{1}{\sqrt{2\pi}},$$
which is an exponential family with
$$h(x) = \frac{1}{\sqrt{2\pi}},\quad a_1(\theta) = -\frac{1}{2\sigma^2},\quad S_1(x) = x^2,\quad a_2(\theta) = \frac{\mu}{\sigma^2},\quad S_2(x) = x$$
and
$$c(\theta) = -\frac{\mu^2}{2\sigma^2} - \log\sigma.$$
It follows that
$$\Big(\sum X_i^2,\ \sum X_i\Big)$$
is a complete sufficient statistic.
SLIDE 29

Remark: The statistic $(s^2, \bar X)$ is a one-to-one function of $(\sum X_i^2, \sum X_i)$, so it must be complete and sufficient, too. Any function of the latter statistic can be rewritten as a function of the former and vice versa.

FACT: A complete sufficient statistic is also minimal sufficient.

The Lehmann-Scheffé Theorem

Theorem: If S is a complete sufficient statistic for some model and h(S) is an unbiased estimate of some parameter φ(θ) then h(S) is the UMVUE of φ(θ).

Proof: Suppose T is another unbiased estimate of φ. According to Rao-Blackwell, T is improved by E(T|S), so if h(S) is not UMVUE then there must exist another function h*(S) which is unbiased and whose variance is smaller than that of h(S) for some value of θ. But $E_\theta(h^*(S) - h(S)) \equiv 0$, so, by completeness, in fact h*(S) = h(S).
SLIDE 30

Example: In the N(µ, σ²) example the random variable $(n-1)s^2/\sigma^2$ has a $\chi^2_{n-1}$ distribution. It follows that
$$E\left(\frac{\sqrt{n-1}\,s}{\sigma}\right) = \int_0^\infty x^{1/2}\,\frac{(x/2)^{(n-1)/2-1}e^{-x/2}}{2\Gamma((n-1)/2)}\,dx.$$
Make the substitution y = x/2 and get
$$E(s) = \frac{\sigma}{\sqrt{n-1}}\cdot\frac{\sqrt 2}{\Gamma((n-1)/2)}\int_0^\infty y^{n/2-1}e^{-y}\,dy.$$
Hence
$$E(s) = \frac{\sigma\sqrt 2\,\Gamma(n/2)}{\sqrt{n-1}\,\Gamma((n-1)/2)}.$$
The UMVUE of σ is then
$$s\cdot\frac{\sqrt{n-1}\,\Gamma((n-1)/2)}{\sqrt 2\,\Gamma(n/2)}$$
by the Lehmann-Scheffé theorem.
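A simulation check (my own sketch; the constant is computed with scipy's gammaln for numerical stability): rescaling s by $\sqrt{n-1}\,\Gamma((n-1)/2)/(\sqrt 2\,\Gamma(n/2))$ removes the downward bias.

```python
import numpy as np
from scipy.special import gammaln

# Check that c * s is unbiased for sigma, where
# c = sqrt(n-1) * Gamma((n-1)/2) / (sqrt(2) * Gamma(n/2)).
rng = np.random.default_rng(4)
sigma, n, reps = 2.0, 8, 400_000

s = rng.normal(0.0, sigma, size=(reps, n)).std(axis=1, ddof=1)
c = np.sqrt((n - 1) / 2) * np.exp(gammaln((n - 1) / 2) - gammaln(n / 2))

print("E(s)     ~", s.mean(), "(biased low)")
print("E(c * s) ~", (c * s).mean(), " target sigma =", sigma)
```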
SLIDE 31

Criticism of Unbiasedness

- The UMVUE can be inadmissible for squared error loss, meaning there is a (biased, of course) estimate whose MSE is smaller for every parameter value. An example is the UMVUE of φ = p(1 − p), which is $\hat\phi = n\hat p(1-\hat p)/(n-1)$. The MSE of $\tilde\phi = \min(\hat\phi, 1/4)$ is smaller than that of $\hat\phi$; a numerical sketch follows this list.
- Unbiased estimation may be impossible. The Binomial(n, p) log odds is φ = log(p/(1 − p)). Since the expectation of any function of the data is a polynomial function of p, and since φ is not a polynomial function of p, there is no unbiased estimate of φ.
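The numerical sketch promised in the first bullet (my own, in Python): exact MSEs of the UMVUE of p(1 − p) and its truncation at 1/4.

```python
import numpy as np
from scipy.stats import binom

# Exact MSE of the UMVUE of p(1-p) versus its truncation at 1/4.
n = 5
x = np.arange(n + 1)
phi_hat = x * (n - x) / (n * (n - 1))   # UMVUE of p(1-p)
phi_tilde = np.minimum(phi_hat, 0.25)   # truncated, biased version

for p in (0.2, 0.5, 0.8):
    w = binom.pmf(x, n, p)
    target = p * (1 - p)
    print(f"p={p}: MSE(UMVUE)={np.sum(w*(phi_hat-target)**2):.5f}  "
          f"MSE(truncated)={np.sum(w*(phi_tilde-target)**2):.5f}")
```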
SLIDE 32
- The UMVUE of σ is not the square root of the UMVUE of σ². This method of estimation does not have the parameterization equivariance that maximum likelihood does.
- Unbiasedness is irrelevant (unless you average together many estimators). The property is an average over possible values of the estimate in which positive errors are allowed to cancel negative errors. Exception to the criticism: if you average a number of estimators to get a single estimator, then it is a problem if all the estimators have the same bias. See assignment 5, the one-way layout example: the mle of the residual variance averages together many biased estimates and so is very badly biased. That assignment shows that the solution is not really to insist on unbiasedness but to consider an alternative to averaging for putting the individual estimates together.