Unbiased Estimation


1. Unbiased Estimation. The binomial problem shows a general phenomenon: an estimator can be good for some values of $\theta$ and bad for others. To compare $\hat\theta$ and $\tilde\theta$, two estimators of $\theta$: say $\hat\theta$ is better than $\tilde\theta$ if it has uniformly smaller MSE:
$$\mathrm{MSE}_{\hat\theta}(\theta) \le \mathrm{MSE}_{\tilde\theta}(\theta) \quad \text{for all } \theta.$$
Normally we also require that the inequality be strict for at least one $\theta$.
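As a concrete illustration (a minimal sketch, not from the slides; the two estimators, the sample size, and the values of $p$ are assumptions chosen for the example), a small simulation shows that neither the usual estimator $X/n$ nor the shrinkage estimator $(X+1)/(n+2)$ has uniformly smaller MSE:

```python
import numpy as np

# Hypothetical illustration: compare the MSE of two estimators of a binomial p.
# Neither estimator has uniformly smaller MSE than the other.
rng = np.random.default_rng(0)
n, reps = 20, 100_000

for p in [0.05, 0.3, 0.5, 0.7, 0.95]:
    x = rng.binomial(n, p, size=reps)
    mle = x / n                  # the usual unbiased estimator
    shrink = (x + 1) / (n + 2)   # shrinkage ("add one success, one failure")
    mse_mle = np.mean((mle - p) ** 2)
    mse_shrink = np.mean((shrink - p) ** 2)
    print(f"p={p:.2f}  MSE(X/n)={mse_mle:.5f}  MSE((X+1)/(n+2))={mse_shrink:.5f}")
```

Near $p = 1/2$ the shrinkage estimator wins; near 0 or 1 the usual estimator wins.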

2. Question: is there a best estimator, one which is better than every other estimator? Answer: NO. Suppose $\hat\theta$ were such a best estimator. Fix a $\theta^*$ in $\Theta$ and let $\tilde\theta \equiv \theta^*$. Then the MSE of $\tilde\theta$ is 0 when $\theta = \theta^*$. Since $\hat\theta$ is better than $\tilde\theta$ we must have $\mathrm{MSE}_{\hat\theta}(\theta^*) = 0$, so that $\hat\theta = \theta^*$ with probability 1. So $\hat\theta = \tilde\theta$. If there are actually two different possible values of $\theta$ this gives a contradiction; so no such $\hat\theta$ exists.

3. Principle of Unbiasedness: a good estimator is unbiased, that is, $\mathrm{E}_\theta(\hat\theta) \equiv \theta$. WARNING: in my view the Principle of Unbiasedness is a load of hogwash. For an unbiased estimator the MSE is just the variance.
Definition: An estimator $\hat\phi$ of a parameter $\phi = \phi(\theta)$ is Uniformly Minimum Variance Unbiased (UMVU) if, whenever $\tilde\phi$ is an unbiased estimator of $\phi$, we have
$$\mathrm{Var}_\theta(\hat\phi) \le \mathrm{Var}_\theta(\tilde\phi).$$
We call $\hat\phi$ the UMVUE. ('E' is for Estimator.) The point of having $\phi(\theta)$ is to study problems like estimating $\mu$ when you have two parameters, like $\mu$ and $\sigma$, for example.

4. Cramér-Rao Inequality. Suppose $T(X)$ is some unbiased estimator of $\theta$. We can derive some information from the identity $\mathrm{E}_\theta(T(X)) \equiv \theta$. When we worked with the score function we derived some information from the identity $\int f(x,\theta)\,dx \equiv 1$ by differentiation, and we do the same here. Since $T(X)$ is an unbiased estimator of $\theta$,
$$\mathrm{E}_\theta(T(X)) = \int T(x)\, f(x,\theta)\,dx \equiv \theta.$$
Differentiate both sides to get
$$1 = \frac{d}{d\theta}\int T(x)\, f(x,\theta)\,dx = \int T(x)\,\frac{\partial}{\partial\theta} f(x,\theta)\,dx = \int T(x)\,\frac{\partial}{\partial\theta}\log f(x,\theta)\; f(x,\theta)\,dx = \mathrm{E}_\theta\big(T(X)\,U(\theta)\big),$$
where $U$ is the score function.

5. Remember: $\mathrm{Cov}(W,Z) = \mathrm{E}(WZ) - \mathrm{E}(W)\mathrm{E}(Z)$. Here
$$\mathrm{Cov}_\theta(T(X), U(\theta)) = \mathrm{E}(T(X)U(\theta)) - \mathrm{E}(T(X))\,\mathrm{E}(U(\theta)).$$
But recall that the score $U(\theta)$ has mean 0, so
$$\mathrm{Cov}_\theta(T(X), U(\theta)) = 1.$$
The definition of correlation gives
$$\{\mathrm{Corr}(W,Z)\}^2 = \frac{\{\mathrm{Cov}(W,Z)\}^2}{\mathrm{Var}(W)\,\mathrm{Var}(Z)}.$$
Squared correlations are at most 1, therefore $\mathrm{Var}_\theta(T)\,\mathrm{Var}_\theta(U(\theta)) \ge 1$. Remember $\mathrm{Var}(U(\theta)) = I(\theta)$. Therefore
$$\mathrm{Var}_\theta(T) \ge \frac{1}{I(\theta)}.$$
The right-hand side is called the Cramér-Rao Lower Bound.
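Both facts can be checked numerically. In this minimal sketch (the model, $N(\mu,1)$ with $T = \bar X$, score $U(\mu) = \sum(X_i - \mu)$ and $I(\mu) = n$, together with the specific numbers, are assumptions chosen for illustration), the simulated covariance is close to 1 and $\mathrm{Var}(T)$ sits at the bound:

```python
import numpy as np

# Assumed example: N(mu, 1) data, T = sample mean, score U(mu) = sum(X_i - mu).
# Check E(T U) = Cov(T, U) = 1 and Var(T) >= 1 / I(mu) with I(mu) = n.
rng = np.random.default_rng(1)
mu, n, reps = 2.0, 10, 200_000

x = rng.normal(mu, 1.0, size=(reps, n))
T = x.mean(axis=1)            # unbiased estimator of mu
U = (x - mu).sum(axis=1)      # score evaluated at the true mu

print("Cov(T, U) ~", np.cov(T, U)[0, 1])          # close to 1
print("Var(T)    ~", T.var(), " CRLB =", 1 / n)   # equality: Xbar attains the bound
```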

6. Examples of the Cramér-Rao Lower Bound. 1) $X_1, \ldots, X_n$ iid Exponential with mean $\mu$:
$$f(x) = \frac{1}{\mu}\exp\{-x/\mu\} \quad \text{for } x > 0.$$
Log-likelihood:
$$\ell = -n\log\mu - \sum X_i/\mu.$$
Score ($U(\mu) = \partial\ell/\partial\mu$):
$$U(\mu) = -\frac{n}{\mu} + \frac{\sum X_i}{\mu^2}.$$
Negative second derivative:
$$V(\mu) = -U'(\mu) = -\frac{n}{\mu^2} + \frac{2\sum X_i}{\mu^3}.$$

7. Take the expected value to compute the Fisher information:
$$I(\mu) = -\frac{n}{\mu^2} + \frac{2\sum \mathrm{E}(X_i)}{\mu^3} = -\frac{n}{\mu^2} + \frac{2n\mu}{\mu^3} = \frac{n}{\mu^2}.$$
So if $T(X_1,\ldots,X_n)$ is an unbiased estimator of $\mu$ then
$$\mathrm{Var}(T) \ge \frac{1}{I(\mu)} = \frac{\mu^2}{n}.$$
Example: $\bar X$ is an unbiased estimator of $\mu$. Also
$$\mathrm{Var}(\bar X) = \frac{\mu^2}{n}.$$
So $\bar X$ is the best unbiased estimator: it has the smallest possible variance. We say it is a Uniformly Minimum Variance Unbiased Estimator of $\mu$.
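A short simulation sketch (the values of $\mu$, $n$ and the seed are assumptions chosen for illustration) checks that $\bar X$ is unbiased and that its variance sits at the bound $\mu^2/n$:

```python
import numpy as np

# Assumed numbers: iid Exponential with mean mu; the sample mean should be
# unbiased with variance mu^2 / n, i.e. it attains the CRLB.
rng = np.random.default_rng(2)
mu, n, reps = 3.0, 25, 200_000

x = rng.exponential(scale=mu, size=(reps, n))
xbar = x.mean(axis=1)

print("E(Xbar)   ~", xbar.mean(), " (target:", mu, ")")
print("Var(Xbar) ~", xbar.var(), " CRLB = mu^2/n =", mu**2 / n)
```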

8. Similar ideas apply with more than one parameter. Example: $X_1, \ldots, X_n$ a sample from $N(\mu, \sigma^2)$. Suppose $(\hat\mu, \hat\sigma^2)$ is an unbiased estimator of $(\mu, \sigma^2)$ (such as $(\bar X, s^2)$). We are estimating $\sigma^2$, not $\sigma$, so give $\sigma^2$ its own symbol: define $\tau = \sigma^2$. Find the information matrix because the CRLB is its inverse. Log-likelihood:
$$\ell = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\tau - \frac{\sum(X_i - \mu)^2}{2\tau}.$$
Score:
$$U = \begin{pmatrix} \dfrac{\sum(X_i - \mu)}{\tau} \\[2ex] -\dfrac{n}{2\tau} + \dfrac{\sum(X_i - \mu)^2}{2\tau^2} \end{pmatrix}$$

9. Negative second derivative matrix:
$$V = \begin{pmatrix} \dfrac{n}{\tau} & \dfrac{\sum(X_i - \mu)}{\tau^2} \\[2ex] \dfrac{\sum(X_i - \mu)}{\tau^2} & -\dfrac{n}{2\tau^2} + \dfrac{\sum(X_i - \mu)^2}{\tau^3} \end{pmatrix}$$
Fisher information matrix:
$$I(\mu, \tau) = \begin{pmatrix} \dfrac{n}{\tau} & 0 \\[1ex] 0 & \dfrac{n}{2\tau^2} \end{pmatrix}$$
Cramér-Rao lower bound:
$$\mathrm{Var}(\hat\mu, \hat\sigma^2) \ge \{I(\mu, \tau)\}^{-1} = \begin{pmatrix} \dfrac{\tau}{n} & 0 \\[1ex] 0 & \dfrac{2\tau^2}{n} \end{pmatrix}$$
In particular
$$\mathrm{Var}(\hat\sigma^2) \ge \frac{2\tau^2}{n} = \frac{2\sigma^4}{n}.$$
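A Monte Carlo sketch (the parameter values, sample size and seed are assumptions) can check that the average of the negative-Hessian matrix $V$ matches $\mathrm{diag}(n/\tau,\ n/(2\tau^2))$:

```python
import numpy as np

# Assumed numbers: check that the expected negative Hessian of the N(mu, tau)
# log-likelihood (tau = sigma^2) is diag(n/tau, n/(2*tau^2)).
rng = np.random.default_rng(3)
mu, tau, n, reps = 1.0, 4.0, 30, 100_000

x = rng.normal(mu, np.sqrt(tau), size=(reps, n))
d = x - mu
V11 = np.full(reps, n / tau)                             # -d^2 l / d mu^2
V12 = d.sum(axis=1) / tau**2                             # -d^2 l / d mu d tau
V22 = -n / (2 * tau**2) + (d**2).sum(axis=1) / tau**3    # -d^2 l / d tau^2

print("E(V) ~", [[V11.mean(), V12.mean()], [V12.mean(), V22.mean()]])
print("I    =", [[n / tau, 0.0], [0.0, n / (2 * tau**2)]])
```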

10. Notice:
$$\mathrm{Var}(s^2) = \left(\frac{\sigma^2}{n-1}\right)^2 \mathrm{Var}\!\left(\frac{(n-1)s^2}{\sigma^2}\right) = \frac{\sigma^4}{(n-1)^2}\,\mathrm{Var}(\chi^2_{n-1}) = \frac{2\sigma^4}{n-1} > \frac{2\sigma^4}{n}.$$
Conclusions: the variance of the sample variance is larger than the lower bound, but the ratio of the variance to the lower bound is $n/(n-1)$, which is nearly 1. Fact: $s^2$ is the UMVUE anyway; see later in the course.
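A quick simulation sketch (the values of $\sigma$, $n$ and the seed are assumptions) illustrates that $\mathrm{Var}(s^2)$ matches $2\sigma^4/(n-1)$ and exceeds the CRLB $2\sigma^4/n$ only slightly:

```python
import numpy as np

# Assumed numbers: compare the simulated variance of s^2 with 2*sigma^4/(n-1)
# and with the CRLB 2*sigma^4/n.
rng = np.random.default_rng(4)
mu, sigma, n, reps = 0.0, 2.0, 10, 200_000

x = rng.normal(mu, sigma, size=(reps, n))
s2 = x.var(axis=1, ddof=1)     # the usual unbiased sample variance

print("Var(s^2)          ~", s2.var())
print("2 sigma^4/(n-1)   =", 2 * sigma**4 / (n - 1))
print("CRLB 2 sigma^4/n  =", 2 * sigma**4 / n)
```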

11. Slightly more generally, if $\mathrm{E}(T) = \phi(\theta)$ for some function $\phi$, then a similar argument gives
$$\mathrm{Var}(T) \ge \frac{\{\phi'(\theta)\}^2}{I(\theta)}.$$
The inequality is strict unless the correlation is 1, so that
$$U(\theta) = A(\theta)\, T(X) + B(\theta)$$
for non-random constants $A$ and $B$ (which may depend on $\theta$). This would prove that
$$\ell(\theta) = A^*(\theta)\, T(X) + B^*(\theta) + C(X)$$
for other constants $A^*$ and $B^*$, and finally
$$f(x, \theta) = h(x)\, e^{A^*(\theta) T(x) + B^*(\theta)}$$
for $h = e^C$. (For instance, the Exponential density $f(x,\mu) = \mu^{-1}e^{-x/\mu}$ has this form with $T(x) = x$, $A^*(\mu) = -1/\mu$, $B^*(\mu) = -\log\mu$ and $h \equiv 1$.)

12. Summary of Implications
• You can sometimes recognize a UMVUE: if $\mathrm{Var}_\theta(T(X)) \equiv 1/I(\theta)$ then $T(X)$ is the UMVUE. In the $N(\mu, 1)$ example the Fisher information is $n$ and $\mathrm{Var}(\bar X) = 1/n$, so $\bar X$ is the UMVUE of $\mu$.
• In an asymptotic sense the MLE is nearly optimal: it is nearly unbiased and its (approximate) variance is nearly $1/I(\theta)$.
• Good estimators are highly correlated with the score.
• Densities of the exponential form given above (called an exponential family) are somehow special.
• Usually the inequality is strict: strict unless the score is an affine function of a statistic $T$ and $T$ (or $T/c$ for a constant $c$) is unbiased for $\theta$.

13. What can we do to find UMVUEs when the CRLB is a strict inequality? Use:
Sufficiency: choose good summary statistics.
Completeness: recognize unique good statistics.
Rao-Blackwell theorem: a mechanical way to improve unbiased estimators.
Lehmann-Scheffé theorem: a way to prove an estimator is UMVUE.

14. Sufficiency. Example: suppose $X_1, \ldots, X_n$ are iid Bernoulli($\theta$) and
$$T(X_1, \ldots, X_n) = \sum X_i.$$
Consider the conditional distribution of $X_1, \ldots, X_n$ given $T$. Take $n = 4$:
$$P(X_1 = 1, X_2 = 1, X_3 = 0, X_4 = 0 \mid T = 2) = \frac{P(X_1 = 1, X_2 = 1, X_3 = 0, X_4 = 0,\ T = 2)}{P(T = 2)} = \frac{P(X_1 = 1, X_2 = 1, X_3 = 0, X_4 = 0)}{P(T = 2)} = \frac{\theta^2(1-\theta)^2}{\binom{4}{2}\theta^2(1-\theta)^2} = \frac{1}{6}.$$
Notice the disappearance of $\theta$! This happens for all possibilities for $n$, $T$ and the $X$s.
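A small enumeration sketch (the helper function and the particular values of $\theta$ are illustrative assumptions) verifies that this conditional probability is $1/6$ no matter what $\theta$ is:

```python
from itertools import product

# Enumerate all Bernoulli(theta) sequences of length n = 4 and check that
# P(X = x | sum = t) does not depend on theta (here 1/6 for each x with sum 2).
def conditional_probs(theta, n=4, t=2):
    probs, p_t = {}, 0.0
    for x in product([0, 1], repeat=n):
        p = theta ** sum(x) * (1 - theta) ** (n - sum(x))
        if sum(x) == t:
            probs[x] = p
            p_t += p
    return {x: p / p_t for x, p in probs.items()}

for theta in (0.2, 0.5, 0.9):
    print(theta, conditional_probs(theta)[(1, 1, 0, 0)])   # always 1/6
```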

15. In the binomial situation we say the conditional distribution of the data given the summary statistic $T$ is free of $\theta$.
Definition: A statistic $T(X)$ is sufficient for the model $\{P_\theta;\ \theta \in \Theta\}$ if the conditional distribution of the data $X$ given $T = t$ is free of $\theta$.
Intuition: Data tell us about $\theta$ if different values of $\theta$ give different distributions to $X$. If two different values of $\theta$ correspond to the same density or cdf for $X$, we cannot distinguish these two values of $\theta$ by examining $X$. Extension of this notion: if two values of $\theta$ give the same conditional distribution of $X$ given $T$, then observing $T$ in addition to $X$ doesn't improve our ability to distinguish the two values.

16. Theorem [Rao-Blackwell]: Suppose $S(X)$ is a sufficient statistic for the model $\{P_\theta,\ \theta \in \Theta\}$. If $T$ is an estimator of $\phi(\theta)$ then:
1. $\mathrm{E}(T \mid S)$ is a statistic.
2. $\mathrm{E}(T \mid S)$ has the same bias as $T$; if $T$ is unbiased so is $\mathrm{E}(T \mid S)$.
3. $\mathrm{Var}_\theta(\mathrm{E}(T \mid S)) \le \mathrm{Var}_\theta(T)$, and the inequality is strict unless $T$ is a function of $S$.
4. The MSE of $\mathrm{E}(T \mid S)$ is no more than the MSE of $T$.

17. Usage: Review conditional distributions.
Definition: if $X$, $Y$ are random variables with a joint density then
$$\mathrm{E}(g(Y) \mid X = x) = \int g(y)\, f_{Y\mid X}(y \mid x)\,dy,$$
and $\mathrm{E}(g(Y) \mid X)$ is this function of $x$ evaluated at $X$.
Important property:
$$\mathrm{E}\{R(X)\,\mathrm{E}(Y \mid X)\} = \int R(x)\,\mathrm{E}(Y \mid X = x)\, f_X(x)\,dx = \iint R(x)\, y\, f_X(x) f(y \mid x)\,dy\,dx = \iint R(x)\, y\, f_{X,Y}(x,y)\,dy\,dx = \mathrm{E}(R(X)\, Y).$$
Think of $\mathrm{E}(Y \mid X)$ as the average of $Y$ holding $X$ fixed. It behaves like an ordinary expected value, but functions of $X$ only are like constants:
$$\mathrm{E}\Big(\sum A_i(X)\, Y_i \,\Big|\, X\Big) = \sum A_i(X)\,\mathrm{E}(Y_i \mid X).$$
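As a toy numeric check (an assumed model, not from the slides: $X \sim$ Exponential(1) and $Y \mid X \sim N(X, 1)$, so $\mathrm{E}(Y \mid X) = X$), the identity $\mathrm{E}\{R(X)\,\mathrm{E}(Y \mid X)\} = \mathrm{E}(R(X)Y)$ can be verified by simulation with $R(x) = x$:

```python
import numpy as np

# Assumed toy model: X ~ Exponential(1), Y | X ~ N(X, 1), so E(Y | X) = X.
# Check numerically that E{ R(X) E(Y|X) } = E( R(X) Y ) for R(x) = x.
rng = np.random.default_rng(5)
reps = 1_000_000

x = rng.exponential(1.0, size=reps)
y = rng.normal(loc=x, scale=1.0)

lhs = np.mean(x * x)     # R(X) * E(Y|X), with E(Y|X) = X
rhs = np.mean(x * y)     # R(X) * Y
print(lhs, rhs)          # both close to E(X^2) = 2
```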

18. Examples: In the binomial problem $Y_1(1 - Y_2)$ is an unbiased estimator of $p(1-p)$. We improve this by computing
$$\mathrm{E}(Y_1(1 - Y_2) \mid X).$$
We do this in two steps. First compute $\mathrm{E}(Y_1(1 - Y_2) \mid X = x)$.

19. Notice that the random variable $Y_1(1 - Y_2)$ is either 1 or 0, so its expected value is just the probability that it equals 1:
$$\begin{aligned}
\mathrm{E}(Y_1(1 - Y_2) \mid X = x) &= P(Y_1(1 - Y_2) = 1 \mid X = x) \\
&= P(Y_1 = 1, Y_2 = 0 \mid Y_1 + Y_2 + \cdots + Y_n = x) \\
&= \frac{P(Y_1 = 1, Y_2 = 0, Y_1 + \cdots + Y_n = x)}{P(Y_1 + Y_2 + \cdots + Y_n = x)} \\
&= \frac{P(Y_1 = 1, Y_2 = 0, Y_3 + \cdots + Y_n = x - 1)}{\binom{n}{x} p^x (1-p)^{n-x}} \\
&= \frac{p(1-p)\binom{n-2}{x-1} p^{x-1}(1-p)^{(n-2)-(x-1)}}{\binom{n}{x} p^x (1-p)^{n-x}} \\
&= \frac{\binom{n-2}{x-1}}{\binom{n}{x}} = \frac{x(n-x)}{n(n-1)}.
\end{aligned}$$
This is simply $n\hat p(1 - \hat p)/(n-1)$, where $\hat p = x/n$ (and it can be bigger than $1/4$, the maximum value of $p(1-p)$).
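A simulation sketch (the values of $n$, $p$ and the seed are assumptions chosen for illustration) confirms that, within each value of $X = x$, the average of $Y_1(1 - Y_2)$ agrees with $x(n-x)/(n(n-1))$:

```python
import numpy as np

# Assumed numbers: simulate Bernoulli data and check that, conditionally on
# X = sum(Y_i) = x, the mean of Y1*(1 - Y2) equals x*(n - x)/(n*(n - 1)).
rng = np.random.default_rng(6)
n, p, reps = 6, 0.4, 500_000

y = rng.binomial(1, p, size=(reps, n))
x = y.sum(axis=1)
t = y[:, 0] * (1 - y[:, 1])        # the crude unbiased estimator Y1(1 - Y2)

for xv in range(n + 1):
    mask = x == xv
    if mask.any():
        print(xv, t[mask].mean(), xv * (n - xv) / (n * (n - 1)))
```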

20. Example: If $X_1, \ldots, X_n$ are iid $N(\mu, 1)$ then $\bar X$ is sufficient and $X_1$ is an unbiased estimator of $\mu$. Now
$$\mathrm{E}(X_1 \mid \bar X) = \mathrm{E}[X_1 - \bar X + \bar X \mid \bar X] = \mathrm{E}[X_1 - \bar X \mid \bar X] + \bar X = \bar X,$$
since $X_1 - \bar X$ has mean 0 and is independent of $\bar X$. This is (as we will see later) the UMVUE.
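A final simulation sketch (the values of $\mu$, $n$ and the seed are assumptions) shows the variance reduction from Rao-Blackwellizing $X_1$ into $\bar X$:

```python
import numpy as np

# Assumed numbers: X_1 is unbiased for mu with variance 1, while its
# Rao-Blackwellization E(X_1 | Xbar) = Xbar has variance 1/n.
rng = np.random.default_rng(7)
mu, n, reps = 0.5, 20, 200_000

x = rng.normal(mu, 1.0, size=(reps, n))
x1 = x[:, 0]
xbar = x.mean(axis=1)

print("Var(X_1)  ~", x1.var(), " (target 1)")
print("Var(Xbar) ~", xbar.var(), " (target 1/n =", 1 / n, ")")
```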
