Lecture 3. Sufficiency
  1. Lecture 3. Sufficiency

  2. 3. Sufficiency — 3.1. Sufficient statistics

  The concept of sufficiency addresses the question "Is there a statistic $T(X)$ that in some sense contains all the information about $\theta$ that is in the sample?"

  Example 3.1. Let $X_1, \ldots, X_n$ be iid Bernoulli($\theta$), so that $P(X_i = 1) = 1 - P(X_i = 0) = \theta$ for some $0 < \theta < 1$. So

  $$f_X(x \mid \theta) = \prod_{i=1}^n \theta^{x_i} (1 - \theta)^{1 - x_i} = \theta^{\sum x_i} (1 - \theta)^{n - \sum x_i}.$$

  This depends on the data only through $T(x) = \sum x_i$, the total number of ones. Note that $T(X) \sim \mathrm{Bin}(n, \theta)$. If $T(x) = t$, then

  $$f_{X \mid T}(x \mid T = t) = \frac{P_\theta(X = x, T = t)}{P_\theta(T = t)} = \frac{\theta^{\sum x_i} (1 - \theta)^{n - \sum x_i}}{\binom{n}{t} \theta^t (1 - \theta)^{n - t}} = \binom{n}{t}^{-1},$$

  i.e. the conditional distribution of $X$ given $T = t$ does not depend on $\theta$. Thus if we know $T$, then additional knowledge of $x$ (knowing the exact sequence of 0's and 1's) gives no extra information about $\theta$. □
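  A quick empirical check of Example 3.1 (an illustrative sketch, not part of the slides; the sample size, $\theta$ values and replication count are arbitrary choices): for $n = 3$ trials we condition on $T = 1$ and tabulate the conditional distribution of the full sequence under two different values of $\theta$. Both come out approximately uniform over the three sequences containing a single 1, i.e. free of $\theta$.

```python
import random
from collections import Counter

def conditional_dist(theta, n=3, t=1, reps=200_000):
    """Empirical distribution of the full sequence X given T = sum(X) = t."""
    counts = Counter()
    for _ in range(reps):
        x = tuple(1 if random.random() < theta else 0 for _ in range(n))
        if sum(x) == t:
            counts[x] += 1
    total = sum(counts.values())
    return {seq: round(c / total, 3) for seq, c in sorted(counts.items())}

random.seed(0)
print(conditional_dist(theta=0.2))  # each single-1 sequence: ~1/3
print(conditional_dist(theta=0.7))  # same ~1/3 each, despite different theta
```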

  3. 3. Sufficiency — 3.1. Sufficient statistics

  Definition 3.1. A statistic $T$ is sufficient for $\theta$ if the conditional distribution of $X$ given $T$ does not depend on $\theta$.

  Note that $T$ and/or $\theta$ may be vectors. In practice, the following theorem is used to find sufficient statistics.

  4. 3. Sufficiency — 3.1. Sufficient statistics

  Theorem 3.2 (The Factorisation criterion). $T$ is sufficient for $\theta$ iff $f_X(x \mid \theta) = g(T(x), \theta)\, h(x)$ for suitable functions $g$ and $h$.

  Proof (discrete case only). Suppose $f_X(x \mid \theta) = g(T(x), \theta)\, h(x)$. If $T(x) = t$ then

  $$f_{X \mid T}(x \mid T = t) = \frac{P_\theta(X = x, T(X) = t)}{P_\theta(T = t)} = \frac{g(T(x), \theta)\, h(x)}{\sum_{x' : T(x') = t} g(T(x'), \theta)\, h(x')} = \frac{g(t, \theta)\, h(x)}{g(t, \theta) \sum_{x' : T(x') = t} h(x')} = \frac{h(x)}{\sum_{x' : T(x') = t} h(x')},$$

  which does not depend on $\theta$, so $T$ is sufficient.

  Now suppose that $T$ is sufficient, so that the conditional distribution of $X \mid T = t$ does not depend on $\theta$. Then

  $$P_\theta(X = x) = P_\theta(X = x, T(X) = T(x)) = P_\theta(X = x \mid T = T(x))\, P_\theta(T = T(x)).$$

  The first factor does not depend on $\theta$ by assumption; call it $h(x)$. Let the second factor be $g(t, \theta)$, and we have the required factorisation. □
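  As a further illustration of the criterion (an extra example, not in the original slides): for $X_1, \ldots, X_n$ iid Poisson($\theta$),

  $$f_X(x \mid \theta) = \prod_{i=1}^n \frac{e^{-\theta} \theta^{x_i}}{x_i!} = \underbrace{e^{-n\theta}\, \theta^{\sum x_i}}_{g(T(x),\, \theta)} \cdot \underbrace{\Big( \prod_{i=1}^n x_i! \Big)^{-1}}_{h(x)},$$

  so $T(X) = \sum_i X_i$ is sufficient for $\theta$.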

  5. 3. Sufficiency — 3.1. Sufficient statistics

  Example 3.1 continued. For Bernoulli trials, $f_X(x \mid \theta) = \theta^{\sum x_i} (1 - \theta)^{n - \sum x_i}$. Take $g(t, \theta) = \theta^t (1 - \theta)^{n - t}$ and $h(x) = 1$ to see that $T(X) = \sum X_i$ is sufficient for $\theta$. □

  Example 3.2. Let $X_1, \ldots, X_n$ be iid $U[0, \theta]$. Write $1_A(x)$ for the indicator function: $1_A(x) = 1$ if $x \in A$, $= 0$ otherwise. We have

  $$f_X(x \mid \theta) = \prod_{i=1}^n \frac{1}{\theta}\, 1_{[0,\theta]}(x_i) = \frac{1}{\theta^n}\, 1_{\{\max_i x_i \le \theta\}}\, 1_{\{0 \le \min_i x_i\}}.$$

  Then (taking $g(t, \theta) = \theta^{-n} 1_{\{t \le \theta\}}$ and $h(x) = 1_{\{0 \le \min_i x_i\}}$) $T(X) = \max_i X_i$ is sufficient for $\theta$. □
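  A numeric illustration of Example 3.2 (a sketch, not from the slides; the data values are arbitrary): two samples sharing the same maximum have identical $U[0, \theta]$ likelihoods at every $\theta$, since the likelihood depends on the data only through $\max_i x_i$.

```python
x = [0.2, 0.9, 0.5]  # max 0.9
y = [0.7, 0.1, 0.9]  # max 0.9, otherwise different data

def lik(data, theta):
    """Likelihood of iid U[0, theta] data."""
    if min(data) < 0 or max(data) > theta:
        return 0.0
    return theta ** (-len(data))

for theta in [0.5, 0.9, 1.0, 2.0, 10.0]:
    print(f"theta={theta:5.2f}  L(x)={lik(x, theta):.6f}  L(y)={lik(y, theta):.6f}")
# The two likelihood columns agree at every theta (both 0 for theta < 0.9).
```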

  6. 3. Sufficiency — 3.2. Minimal sufficient statistics

  Sufficient statistics are not unique: if $T$ is sufficient for $\theta$, then so is any (1–1) function of $T$.

  $X$ itself is always sufficient for $\theta$: take $T(X) = X$, $g(t, \theta) = f_X(t \mid \theta)$ and $h(x) = 1$. But this is not much use.

  The sample space $\mathcal{X}^n$ is partitioned by $T$ into sets $\{x \in \mathcal{X}^n : T(x) = t\}$. If $T$ is sufficient, then this data reduction does not lose any information about $\theta$. We seek a sufficient statistic that achieves the maximum possible reduction.

  Definition 3.3. A sufficient statistic $T(X)$ is minimal sufficient if it is a function of every other sufficient statistic: i.e. if $T'(X)$ is also sufficient, then $T'(x) = T'(y) \Rightarrow T(x) = T(y)$, i.e. the partition for $T$ is coarser than that for $T'$.
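  To make the partition picture concrete (a small added illustration, not from the slides): for $n = 2$ Bernoulli trials, $T(x) = x_1 + x_2$ partitions the sample space as

  $$\{0,1\}^2 = \underbrace{\{(0,0)\}}_{T=0} \;\cup\; \underbrace{\{(0,1), (1,0)\}}_{T=1} \;\cup\; \underbrace{\{(1,1)\}}_{T=2},$$

  which is coarser than the four singleton cells of the trivial sufficient statistic $T'(X) = X$; within each cell the conditional distribution of $X$ is free of $\theta$.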

  7. 3. Sufficiency — 3.2. Minimal sufficient statistics

  Minimal sufficient statistics can be found using the following theorem.

  Theorem 3.4. Suppose $T = T(X)$ is a statistic such that $f_X(x; \theta) / f_X(y; \theta)$ is constant as a function of $\theta$ if and only if $T(x) = T(y)$. Then $T$ is minimal sufficient for $\theta$.

  Sketch of proof (non-examinable). First, we aim to use the Factorisation Criterion to show sufficiency. Define an equivalence relation $\sim$ on $\mathcal{X}^n$ by setting $x \sim y$ when $T(x) = T(y)$. (Check that this is indeed an equivalence relation.) Let $U = \{T(x) : x \in \mathcal{X}^n\}$, and for each $u \in U$ choose a representative $x_u$ from the equivalence class $\{x : T(x) = u\}$.

  Let $x \in \mathcal{X}^n$ with $T(x) = t$. Then $x$ is in the equivalence class $\{x' : T(x') = t\}$, which has representative $x_t$; this representative may also be written $x_{T(x)}$. We have $x \sim x_t$, so $T(x) = T(x_t)$, i.e. $T(x) = T(x_{T(x)})$. Hence, by hypothesis, the ratio $f_X(x; \theta) / f_X(x_{T(x)}; \theta)$ does not depend on $\theta$; call it $h(x)$. Let $g(t, \theta) = f_X(x_t; \theta)$. Then

  $$f_X(x; \theta) = f_X(x_{T(x)}; \theta)\, \frac{f_X(x; \theta)}{f_X(x_{T(x)}; \theta)} = g(T(x), \theta)\, h(x),$$

  and so $T = T(X)$ is sufficient for $\theta$ by the Factorisation Criterion.
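  As a quick sanity check on Theorem 3.4 (an added illustration, not in the slides), apply it to Example 3.1:

  $$\frac{f_X(x; \theta)}{f_X(y; \theta)} = \frac{\theta^{\sum x_i} (1-\theta)^{n - \sum x_i}}{\theta^{\sum y_i} (1-\theta)^{n - \sum y_i}} = \left( \frac{\theta}{1 - \theta} \right)^{\sum x_i - \sum y_i},$$

  which is constant in $\theta$ iff $\sum x_i = \sum y_i$, recovering $T(X) = \sum_i X_i$ as minimal sufficient.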

  8. 3. Sufficiency — 3.2. Minimal sufficient statistics

  Next we aim to show that $T(X)$ is a function of every other sufficient statistic. Suppose that $S(X)$ is also sufficient for $\theta$, so that, by the Factorisation Criterion, there exist functions $g_S$ and $h_S$ (so called to show that they belong to $S$, and to distinguish them from $g$ and $h$ above) such that

  $$f_X(x; \theta) = g_S(S(x), \theta)\, h_S(x).$$

  Suppose that $S(x) = S(y)$. Then

  $$\frac{f_X(x; \theta)}{f_X(y; \theta)} = \frac{g_S(S(x), \theta)\, h_S(x)}{g_S(S(y), \theta)\, h_S(y)} = \frac{h_S(x)}{h_S(y)},$$

  because $S(x) = S(y)$. This means that the ratio $f_X(x; \theta) / f_X(y; \theta)$ does not depend on $\theta$, which implies that $T(x) = T(y)$ by hypothesis.

  So we have shown that $S(x) = S(y)$ implies $T(x) = T(y)$, i.e. $T$ is a function of $S$. Hence $T$ is minimal sufficient. □

  9. 3. Sufficiency — 3.2. Minimal sufficient statistics

  Example 3.3. Suppose $X_1, \ldots, X_n$ are iid $N(\mu, \sigma^2)$. Then

  $$\frac{f_X(x \mid \mu, \sigma^2)}{f_X(y \mid \mu, \sigma^2)} = \frac{(2\pi\sigma^2)^{-n/2} \exp\{-\frac{1}{2\sigma^2} \sum_i (x_i - \mu)^2\}}{(2\pi\sigma^2)^{-n/2} \exp\{-\frac{1}{2\sigma^2} \sum_i (y_i - \mu)^2\}} = \exp\left\{ -\frac{1}{2\sigma^2} \Big( \sum_i x_i^2 - \sum_i y_i^2 \Big) + \frac{\mu}{\sigma^2} \Big( \sum_i x_i - \sum_i y_i \Big) \right\}.$$

  This is constant as a function of $(\mu, \sigma^2)$ iff $\sum_i x_i^2 = \sum_i y_i^2$ and $\sum_i x_i = \sum_i y_i$. So $T(X) = \big( \sum_i X_i^2, \sum_i X_i \big)$ is minimal sufficient for $(\mu, \sigma^2)$. □

  (1–1) functions of minimal sufficient statistics are also minimal sufficient. So $T'(X) = (\bar{X}, \sum_i (X_i - \bar{X})^2)$ is also minimal sufficient for $(\mu, \sigma^2)$, where $\bar{X} = \sum_i X_i / n$. We write $S_{XX}$ for $\sum_i (X_i - \bar{X})^2$.
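  A numeric check of Example 3.3 (illustrative, not from the slides; the two samples are constructed by hand to share $\sum x_i = 4$ and $\sum x_i^2 = 10$): distinct samples with matching sums give a likelihood ratio that is constant (here identically 1) across $(\mu, \sigma^2)$.

```python
import math

x = [0.0, 1.0, 3.0]                                # sum 4, sum of squares 10
y = [2.0, 1.0 + math.sqrt(2), 1.0 - math.sqrt(2)]  # sum 4, sum of squares 10

def log_lik(data, mu, var):
    """Log-likelihood of iid N(mu, var) data."""
    n = len(data)
    return (-n / 2 * math.log(2 * math.pi * var)
            - sum((d - mu) ** 2 for d in data) / (2 * var))

for mu in [-1.0, 0.0, 2.5]:
    for var in [0.5, 1.0, 4.0]:
        ratio = math.exp(log_lik(x, mu, var) - log_lik(y, mu, var))
        print(f"mu={mu:5.1f}  var={var:3.1f}  f(x)/f(y) = {ratio:.6f}")  # always 1.000000
```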

  10. 3. Sufficiency — 3.2. Minimal sufficient statistics

  Notes

  Example 3.3 has a vector $T$ sufficient for a vector $\theta$. The dimensions do not have to be the same: e.g. for $N(\mu, \mu^2)$, $T(X) = \big( \sum_i X_i^2, \sum_i X_i \big)$ is minimal sufficient for $\mu$ [check: see the sketch below].

  If the range of $X$ depends on $\theta$, then "$f_X(x; \theta) / f_X(y; \theta)$ is constant in $\theta$" means "$f_X(x; \theta) = c(x, y)\, f_X(y; \theta)$".
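  A sketch of the [check] above (my verification, not worked in the slides): setting $\sigma^2 = \mu^2$ in the ratio from Example 3.3 gives

  $$\frac{f_X(x \mid \mu)}{f_X(y \mid \mu)} = \exp\left\{ -\frac{1}{2\mu^2} \Big( \sum_i x_i^2 - \sum_i y_i^2 \Big) + \frac{1}{\mu} \Big( \sum_i x_i - \sum_i y_i \Big) \right\}.$$

  Since $\mu^{-2}$ and $\mu^{-1}$ are linearly independent as functions of $\mu$, this is constant in $\mu$ iff $\sum_i x_i^2 = \sum_i y_i^2$ and $\sum_i x_i = \sum_i y_i$, so Theorem 3.4 gives the two-dimensional $T(X) = (\sum_i X_i^2, \sum_i X_i)$ as minimal sufficient for the scalar $\mu$.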

  11. 3. Sufficiency — 3.3. The Rao–Blackwell Theorem

  The Rao–Blackwell theorem gives a way to improve estimators in the mse sense.

  Theorem 3.5 (The Rao–Blackwell theorem). Let $T$ be a sufficient statistic for $\theta$ and let $\tilde{\theta}$ be an estimator for $\theta$ with $E(\tilde{\theta}^2) < \infty$ for all $\theta$. Let $\hat{\theta} = E[\tilde{\theta} \mid T]$. Then for all $\theta$,

  $$E[(\hat{\theta} - \theta)^2] \le E[(\tilde{\theta} - \theta)^2].$$

  The inequality is strict unless $\tilde{\theta}$ is a function of $T$.

  Proof. By the conditional expectation formula we have $E\hat{\theta} = E[E(\tilde{\theta} \mid T)] = E\tilde{\theta}$, so $\hat{\theta}$ and $\tilde{\theta}$ have the same bias. By the conditional variance formula,

  $$\mathrm{var}(\tilde{\theta}) = E[\mathrm{var}(\tilde{\theta} \mid T)] + \mathrm{var}[E(\tilde{\theta} \mid T)] = E[\mathrm{var}(\tilde{\theta} \mid T)] + \mathrm{var}(\hat{\theta}).$$

  Hence $\mathrm{var}(\tilde{\theta}) \ge \mathrm{var}(\hat{\theta})$, and so $\mathrm{mse}(\tilde{\theta}) \ge \mathrm{mse}(\hat{\theta})$, with equality only if $\mathrm{var}(\tilde{\theta} \mid T) = 0$. □
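  A simulation sketch of the theorem in action (my example, not the slides'; the crude estimator $\tilde{\theta} = X_1$ and all numerical settings are arbitrary choices): for Bernoulli($\theta$) data with $T = \sum_i X_i$ sufficient, symmetry gives $\hat{\theta} = E[X_1 \mid T] = T/n = \bar{X}$, and the simulation shows the expected $n$-fold mse reduction.

```python
import random

random.seed(1)
n, theta, reps = 10, 0.3, 100_000

se_tilde = se_hat = 0.0  # accumulated squared errors
for _ in range(reps):
    x = [1 if random.random() < theta else 0 for _ in range(n)]
    theta_tilde = x[0]      # crude unbiased estimator: first observation only
    theta_hat = sum(x) / n  # its Rao-Blackwellisation E[X_1 | T] = X_bar
    se_tilde += (theta_tilde - theta) ** 2
    se_hat += (theta_hat - theta) ** 2

print(f"mse(theta_tilde) ~ {se_tilde / reps:.4f}")  # ~ theta(1-theta) = 0.21
print(f"mse(theta_hat)   ~ {se_hat / reps:.4f}")    # ~ theta(1-theta)/n = 0.021
```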

  12. 3. Sufficiency — 3.3. The Rao–Blackwell Theorem

  Notes

  (i) Since $T$ is sufficient for $\theta$, the conditional distribution of $X$ given $T = t$ does not depend on $\theta$. Hence $\hat{\theta} = E[\tilde{\theta}(X) \mid T]$ does not depend on $\theta$, and so is a bona fide estimator.

  (ii) The theorem says that given any estimator, we can find one that is a function of a sufficient statistic and is at least as good in terms of mean squared error of estimation.

  (iii) If $\tilde{\theta}$ is unbiased, then so is $\hat{\theta}$.

  (iv) If $\tilde{\theta}$ is already a function of $T$, then $\hat{\theta} = \tilde{\theta}$.
