SLIDE 1

Estimation II: Sufficiency

Stat 3202 @ OSU, Autumn 2018 Dalpiaz

SLIDE 2

The Main Idea

Suppose we have a random sample Y1, . . . , Yn from a N(µ, σ²) population, with mean µ (unknown) and variance σ² (known). To estimate µ, we have proposed using the sample mean Ȳ. This is a nice, intuitive, unbiased estimator of µ – but we could ask: does it encode all the information we can glean from the data about the parameter µ?

  • Another way of asking this question: if I collected the data and calculated Ȳ, and I kept the data secret and only told you Ȳ, do I have any more information than you do about where µ is? In this model, the answer is: Ȳ does encode all the information in the data about the location of µ – there is nothing more we can get from the actual data values Y1, . . . , Yn.

  • We will call Ȳ a sufficient statistic for µ.
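A quick numerical illustration of this idea (my own sketch, not from the slides, assuming NumPy is available): with σ known, the normal log-likelihood depends on the data only through Ȳ, so two samples of the same size with the same mean produce log-likelihood curves for µ that differ only by a constant.

```python
import numpy as np

# Two different samples of the same size, shifted so they share the same mean.
rng = np.random.default_rng(0)
y1 = rng.normal(loc=5, scale=2, size=10)
y2 = rng.normal(loc=5, scale=2, size=10)
y2 = y2 - y2.mean() + y1.mean()  # force equal sample means

sigma = 2  # the variance sigma^2 is treated as known

def log_lik(mu, y):
    """Normal log-likelihood of mu given data y, with sigma known."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (y - mu)**2 / (2 * sigma**2))

mus = np.linspace(3, 7, 101)
ll1 = np.array([log_lik(m, y1) for m in mus])
ll2 = np.array([log_lik(m, y2) for m in mus])

# The two curves differ only by a constant: their difference is flat in mu,
# so the data tell us about mu only through the sample mean.
print(np.ptp(ll1 - ll2))  # ~ 0 (up to floating-point error)
```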

SLIDE 3

Multivariate Probability Distributions

Last semester you learned a lot about probability (mass) functions (pmfs) for discrete random variables X, and probability density functions (pdfs) for continuous random variables X.

  • For discrete variables, the pmf gives you the probability of observing a particular value:

p(x) = P(X = x)

  • For two discrete variables, the joint pmf gives the probability of observing particular values for each variable:

p(x, y) = P(X = x and Y = y)
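To make the joint pmf concrete, here is a small illustrative sketch of my own (the variables and names are hypothetical): X is a fair die roll and Y indicates whether the roll is even, and the joint pmf is built by enumeration.

```python
from fractions import Fraction

# X: a fair six-sided die roll; Y: 1 if the roll is even, 0 otherwise.
pmf_x = {x: Fraction(1, 6) for x in range(1, 7)}

# Joint pmf of (X, Y): accumulate probability over the outcomes of X.
joint = {}
for x, px in pmf_x.items():
    y = 1 if x % 2 == 0 else 0
    joint[(x, y)] = joint.get((x, y), Fraction(0)) + px

print(joint[(4, 1)])                   # P(X = 4 and Y = 1) = 1/6
print(joint.get((3, 1), Fraction(0)))  # P(X = 3 and Y = 1) = 0
```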

SLIDE 4

Multivariate Probability Distributions

If you have an iid sample X1, . . . , Xn from a population with pmf f(x | θ), then the joint pmf is the function that gives the probability of observing a particular array of values x1, . . . , xn:

f(x1, x2, . . . , xn | θ) = ∏_{i=1}^{n} f(xi | θ)

Later, we will consider x1, . . . , xn known and instead consider θ as unknown; we then call this function the likelihood:

L(θ | x1, x2, . . . , xn) = ∏_{i=1}^{n} f(xi | θ)
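As a concrete sketch of these two readings of the same product (my own illustration, assuming SciPy is available; the data values are made up):

```python
import numpy as np
from scipy.stats import poisson

x = np.array([2, 0, 3, 1, 2])  # an observed iid Poisson sample

def joint_pmf(x, theta):
    """Joint pmf: probability of this exact array for a given rate theta."""
    return np.prod(poisson.pmf(x, theta))

# Reading 1: theta fixed, probability of the observed array.
print(joint_pmf(x, 1.5))

# Reading 2: data fixed, the same product viewed as a function of theta.
thetas = np.linspace(0.5, 3.5, 7)
L = [joint_pmf(x, t) for t in thetas]
print(max(zip(L, thetas)))  # likelihood is largest near the sample mean 1.6
```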

SLIDE 5

Conditional Distributions

You can also construct conditional pmfs, which give the probability of observing X = x given that you have already observed Y = y:

p(x | y) = P(X = x | Y = y) = P(X = x and Y = y) / P(Y = y)
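Continuing the hypothetical die example from SLIDE 3 (again my own sketch): a conditional pmf is just the joint pmf renormalized by a marginal probability.

```python
from fractions import Fraction

# Joint pmf of (X, Y) from the die example: X is the roll, Y indicates even.
joint = {(x, 1 if x % 2 == 0 else 0): Fraction(1, 6) for x in range(1, 7)}

# Marginal: P(Y = 1) is the total joint probability over outcomes with y = 1.
p_y1 = sum(p for (x, y), p in joint.items() if y == 1)

# Conditional: p(x | y = 1) = P(X = x and Y = 1) / P(Y = 1).
cond = {x: joint.get((x, 1), Fraction(0)) / p_y1 for x in range(1, 7)}
print(cond[4])  # 1/3: given the roll is even, X is uniform on {2, 4, 6}
print(cond[3])  # 0
```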

SLIDE 6

Definition of Sufficiency

Let Y1, . . . , Yn denote a random sample from a probability distribution with unknown parameter θ. Then a statistic U = g(Y1, . . . , Yn) is said to be sufficient for θ if the conditional distribution of Y1, . . . , Yn, given U, does not depend on θ.

  • The intuition is that the statistic U contains all the information in the sample that is relevant for estimating θ.

  • What is the conditional distribution of Y1, Y2, . . . , Yn given U? (See the simulation sketch below.)
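As an illustration of the definition (my own sketch, assuming NumPy): for X1, X2, X3 iid Bernoulli(θ) with U = X1 + X2 + X3, the conditional distribution of the sample given U = 2 puts probability 1/3 on each arrangement of two 1s, whatever θ is.

```python
import numpy as np
from collections import Counter

def cond_given_sum(theta, n_sims=200_000, seed=1):
    """Empirical distribution of (X1, X2, X3) among draws with sum 2."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, theta, size=(n_sims, 3))
    kept = x[x.sum(axis=1) == 2]          # condition on U = 2
    counts = Counter(map(tuple, kept))
    total = sum(counts.values())
    return {k: round(v / total, 3) for k, v in sorted(counts.items())}

# Roughly the same (uniform) conditional distribution for very different thetas.
print(cond_given_sum(0.2))  # (0,1,1), (1,0,1), (1,1,0) each near 1/3
print(cond_given_sum(0.7))  # same, even though theta changed
```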

SLIDE 7

Example: Using the Definition of Sufficiency

Let Y1, Y2, . . . , Yn be iid observations from a Poisson distribution with parameter λ. Show that U = ∑_{i=1}^{n} Yi is sufficient for λ.
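A sketch of the calculation the slide asks for (worked here for reference; the course's own solution may be organized differently). The key fact is that, given U = u, the sample is multinomial with equal cell probabilities:

```latex
% Conditional pmf of the sample given U = u, using Y_i iid Poisson(lambda)
% and U ~ Poisson(n * lambda):
\[
P(Y_1 = y_1, \dots, Y_n = y_n \mid U = u)
  = \frac{\prod_{i=1}^{n} e^{-\lambda} \lambda^{y_i} / y_i!}
         {e^{-n\lambda} (n\lambda)^{u} / u!}
  = \frac{u!}{y_1! \cdots y_n!} \left(\frac{1}{n}\right)^{u},
\qquad \text{for } \textstyle\sum_i y_i = u,
\]
% which is free of lambda, so U is sufficient for lambda.
```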

SLIDE 8

An Anti-Example

If X1, X2, X3 ∼ iid Bernoulli(θ), show that Y = X1 + 2X2 + X3 is not a sufficient statistic for θ.
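A numerical way to see what goes wrong (my own sketch): enumerate all eight outcomes and compute the conditional distribution given Y = 2. Unlike in the sufficiency simulation above, the answer changes with θ.

```python
from itertools import product

def cond_given_y2(theta):
    """Conditional pmf of (X1, X2, X3) given Y = X1 + 2*X2 + X3 = 2."""
    probs = {}
    for x in product([0, 1], repeat=3):
        p = 1.0
        for xi in x:
            p *= theta if xi == 1 else 1 - theta
        if x[0] + 2 * x[1] + x[2] == 2:
            probs[x] = p
    total = sum(probs.values())
    return {x: round(p / total, 3) for x, p in probs.items()}

# Only (1, 0, 1) and (0, 1, 0) give Y = 2, and their conditional
# probabilities depend on theta, so Y cannot be sufficient.
print(cond_given_y2(0.3))  # {(0, 1, 0): 0.7, (1, 0, 1): 0.3}
print(cond_given_y2(0.7))  # {(0, 1, 0): 0.3, (1, 0, 1): 0.7}
```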

SLIDE 9

The Factorization Theorem

Let U be a statistic based on a random sample Y1, Y2, . . . , Yn. Then U is a sufficient statistic for θ if and only if the joint probability distribution or density function can be factored into two nonnegative functions,

f(y1, y2, . . . , yn | θ) = g(u, θ) · h(y1, y2, . . . , yn),

where g(u, θ) is a function only of u and θ, and h(y1, y2, . . . , yn) is not a function of θ.

SLIDE 10

Poisson Example, Again

Let Y1, Y2, . . . , Yn be iid observations from a Poisson distribution with parameter λ. Show that U = ∑_{i=1}^{n} Yi is sufficient for λ.
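A sketch of the factorization the theorem calls for (my own working, for reference, taking u = ∑ yi):

```latex
% Joint pmf of an iid Poisson(lambda) sample, factored as g(u, lambda) * h(y):
\[
f(y_1, \dots, y_n \mid \lambda)
  = \prod_{i=1}^{n} \frac{e^{-\lambda} \lambda^{y_i}}{y_i!}
  = \underbrace{\lambda^{u} e^{-n\lambda}}_{g(u,\, \lambda)}
    \cdot
    \underbrace{\frac{1}{\prod_{i=1}^{n} y_i!}}_{h(y_1, \dots, y_n)},
\qquad u = \sum_{i=1}^{n} y_i,
\]
% so by the Factorization Theorem, U = sum of the Y_i is sufficient for lambda.
```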

SLIDE 11

Another Example

Let X1, X2, . . . , Xn be iid observations from a distribution

f(x | θ) = θ / (1 + x)^(θ+1),  0 < θ < ∞, 0 < x < ∞

Find a sufficient statistic for θ.
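One way the factorization can go (a sketch, not necessarily the form presented in lecture), taking u = ∏(1 + xi):

```latex
% Joint density of the sample, with u = prod of (1 + x_i):
\[
f(x_1, \dots, x_n \mid \theta)
  = \prod_{i=1}^{n} \frac{\theta}{(1 + x_i)^{\theta + 1}}
  = \underbrace{\theta^{n} \, u^{-(\theta + 1)}}_{g(u,\, \theta)}
    \cdot
    \underbrace{1}_{h(x_1, \dots, x_n)},
\qquad u = \prod_{i=1}^{n} (1 + x_i),
\]
% so U = prod of (1 + X_i), or equivalently sum of log(1 + X_i), is sufficient.
```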

SLIDE 12

Another Anti-Example

Let X1, X2, . . . , Xn be iid observations from a distribution

f(x | θ) = 1 / (π(1 + (x − θ)²))

Can you find a sufficient statistic for θ?

SLIDE 13

One-To-One Functions of Sufficient Statistics

Any one-to-one function of a sufficient statistic is sufficient.

Example: We found U = ∑_{i=1}^{n} Xi is sufficient for λ in the Poisson example. Thus X̄ = U/n, a one-to-one function of U, is also sufficient for λ.

SLIDE 14

Bernoulli Example

Let X1, X2, . . . , Xn be iid from a Bernoulli distribution, such that P(Xi = 1) = p and P(Xi = 0) = 1 − p, for each i. Note that E[Xi] = p and Var[Xi] = p(1 − p).

  • Show that ∑_{i=1}^{n} Xi is a sufficient statistic for p.

  • Find an unbiased estimator of p that is also a sufficient statistic.

  • We might try to estimate the variance of Xi by using the statistic V = X̄(1 − X̄). Is V a sufficient statistic for p? (See the simulation sketch below.)
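A simulation sketch of the last bullet (my own illustration, assuming NumPy): X̄ = ∑Xi/n is unbiased and sufficient for p, while V = X̄(1 − X̄) turns out to be a biased estimator of p(1 − p); in fact E[V] = (1 − 1/n) p(1 − p).

```python
import numpy as np

n, p = 10, 0.3
rng = np.random.default_rng(42)

# Many Bernoulli samples of size n; compare E[V] with Var[Xi] = p(1 - p).
x = rng.binomial(1, p, size=(100_000, n))
xbar = x.mean(axis=1)
v = xbar * (1 - xbar)

print(p * (1 - p))  # 0.21, the true variance of Xi
print(v.mean())     # close to (1 - 1/n) * 0.21 = 0.189, so V is biased
# Note: xbar = 0.2 and xbar = 0.8 give the same V, so V is not a
# one-to-one function of the sufficient statistic xbar.
```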

SLIDE 15

Where We Are

So far, we have looked at some properties of estimators which we may use to judge estimators. Specifically, if we have an estimator, θ̂, of θ, it'd be nice if:

  • θ̂ was unbiased for θ, or had low bias

  • θ̂ had low variance

  • θ̂ had low MSE

  • θ̂ was consistent for θ

  • θ̂ was sufficient for θ

These properties are useful for evaluating a single estimator θ̂, or for comparing two estimators θ̂1 and θ̂2 to decide which one is "better." Next up, we will discuss methods for finding estimators for parameters.
