Estimation II: Sufficiency
Stat 3202 @ OSU, Autumn 2018
Dalpiaz
The Main Idea

Suppose we have a random sample $Y_1, \ldots, Y_n$ from a $N(\mu, \sigma^2)$ population, with mean $\mu$ (unknown) and variance $\sigma^2$ (known). To estimate $\mu$, we have proposed using the sample mean $\bar{Y}$. This is a nice, intuitive, unbiased estimator of $\mu$ – but we could ask: does it encode all the information we can glean from the data about the parameter $\mu$?

• Another way of asking this question: if I collected the data and calculated $\bar{Y}$, and I kept the data secret and only told you $\bar{Y}$, would I have any more information than you do about where $\mu$ is?

In this model, the answer is no: $\bar{Y}$ does encode all the information in the data about the location of $\mu$ – there is nothing more we can get from the actual data values $Y_1, \ldots, Y_n$.

• We will call $\bar{Y}$ a sufficient statistic for $\mu$.
Multivariate Probability Distributions

Last semester you learned a lot about probability mass functions (pmfs) for discrete random variables $X$, and probability density functions (pdfs) for continuous random variables $X$.

• For a discrete variable, the pmf gives the probability of observing a particular value:
$$p(x) = P(X = x)$$

• For two discrete variables, the joint pmf gives the probability of observing particular values for each variable:
$$p(x, y) = P(X = x \text{ and } Y = y)$$
Multivariate Probability Distributions

If you have an iid sample $X_1, \ldots, X_n$ from a population with pmf $f(x \mid \theta)$, then the joint pmf is the function that gives the probability of observing a particular array of values $x_1, \ldots, x_n$:

$$f(x_1, x_2, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$$

Later, we will consider $x_1, \ldots, x_n$ known and instead treat $\theta$ as unknown; we then call this function the likelihood:

$$L(\theta \mid x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f(x_i \mid \theta)$$
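As a concrete illustration, here is a minimal Python sketch of the joint pmf viewed as a likelihood, for a hypothetical iid Poisson sample (the function names and the data values are our own, not from the slides):

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson(lam) random variable."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

def likelihood(lam, sample):
    """Joint pmf of an iid sample, viewed as a function of lam."""
    prod = 1.0
    for x in sample:
        prod *= poisson_pmf(x, lam)
    return prod

sample = [2, 0, 3, 1, 2]  # hypothetical observed data; sample mean is 1.6
print(likelihood(1.6, sample))
print(likelihood(2.0, sample))
```

Evaluating the likelihood at several values of $\lambda$ previews the next topic: the value of $\lambda$ that maximizes this product turns out to be the sample mean.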
Conditional Distributions

You can also construct conditional pmfs, which give the probability of observing $X = x$ given that you have already observed $Y = y$:

$$p(x \mid y) = P(X = x \mid Y = y) = \frac{P(X = x \text{ and } Y = y)}{P(Y = y)}$$
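The same joint-over-marginal division is easy to mimic numerically. A small sketch with a hypothetical joint pmf (the probabilities are made-up illustration values):

```python
# Joint pmf of (X, Y) stored as a dictionary; values are hypothetical probabilities.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def conditional(x, y):
    """p(x | y) = P(X = x and Y = y) / P(Y = y)."""
    p_y = sum(p for (xi, yi), p in joint.items() if yi == y)  # marginal of Y
    return joint.get((x, y), 0.0) / p_y

print(conditional(0, 1))  # 0.2 / (0.2 + 0.4)
print(conditional(1, 1))  # 0.4 / (0.2 + 0.4)
```

Note that the conditional probabilities for a fixed $y$ sum to one, as any pmf must.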
Definition of Sufficiency

Let $Y_1, \ldots, Y_n$ denote a random sample from a probability distribution with unknown parameter $\theta$. Then a statistic $U = g(Y_1, \ldots, Y_n)$ is said to be sufficient for $\theta$ if the conditional distribution of $Y_1, \ldots, Y_n$, given $U$, does not depend on $\theta$.

• The intuition is that the statistic $U$ contains all the information in the sample that is relevant for estimating $\theta$.
• What is the conditional distribution of $Y_1, Y_2, \ldots, Y_n$ given $U$?
Example: Using the Definition of Sufficiency

Let $Y_1, Y_2, \ldots, Y_n$ be iid observations from a Poisson distribution with parameter $\lambda$. Show that $U = \sum_{i=1}^{n} Y_i$ is sufficient for $\lambda$.
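One way the verification can proceed, using the fact that a sum of independent Poissons is Poisson, so $U = \sum_{i=1}^{n} Y_i \sim \text{Poisson}(n\lambda)$: for any $y_1, \ldots, y_n$ with $\sum_{i=1}^{n} y_i = u$,

$$P(Y_1 = y_1, \ldots, Y_n = y_n \mid U = u)
= \frac{\prod_{i=1}^{n} \lambda^{y_i} e^{-\lambda} / y_i!}{(n\lambda)^{u} e^{-n\lambda} / u!}
= \frac{u!}{n^{u} \prod_{i=1}^{n} y_i!},$$

which does not depend on $\lambda$ – it is a Multinomial$(u;\, 1/n, \ldots, 1/n)$ probability.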
An Anti-Example

If $X_1, X_2, X_3 \sim$ iid Bernoulli($\theta$), show that $Y = X_1 + 2X_2 + X_3$ is not a sufficient statistic for $\theta$.
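To see why $Y$ fails, one can enumerate all eight outcomes and check that the conditional distribution of the sample given, say, $Y = 2$ still depends on $\theta$. A quick Python sketch (our own code, not from the slides):

```python
from itertools import product

def cond_given_y2(theta):
    """P(X = x | Y = 2) for each outcome x = (x1, x2, x3) with x1 + 2*x2 + x3 = 2."""
    probs = {}
    for x in product([0, 1], repeat=3):
        p = 1.0
        for xi in x:
            p *= theta if xi == 1 else 1 - theta
        if x[0] + 2 * x[1] + x[2] == 2:
            probs[x] = p
    total = sum(probs.values())  # P(Y = 2)
    return {x: p / total for x, p in probs.items()}

print(cond_given_y2(0.3))  # the conditional probabilities change with theta...
print(cond_given_y2(0.7))  # ...so Y is not sufficient
```

Only $(1, 0, 1)$ and $(0, 1, 0)$ give $Y = 2$, and a little algebra shows $P\big((1,0,1) \mid Y = 2\big) = \theta$: the conditional distribution given $Y$ depends on $\theta$, violating the definition.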
The Factorization Theorem

Let $U$ be a statistic based on a random sample $Y_1, Y_2, \ldots, Y_n$. Then $U$ is a sufficient statistic for $\theta$ if and only if the joint probability mass or density function can be factored into two nonnegative functions,

$$f(y_1, y_2, \ldots, y_n \mid \theta) = g(u, \theta) \cdot h(y_1, y_2, \ldots, y_n),$$

where $g(u, \theta)$ is a function only of $u$ and $\theta$, and $h(y_1, y_2, \ldots, y_n)$ is not a function of $\theta$.
Poisson Example, Again

Let $Y_1, Y_2, \ldots, Y_n$ be iid observations from a Poisson distribution with parameter $\lambda$. Show that $U = \sum_{i=1}^{n} Y_i$ is sufficient for $\lambda$.
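A sketch of the factorization, with $u = \sum_{i=1}^{n} y_i$:

$$f(y_1, \ldots, y_n \mid \lambda) = \prod_{i=1}^{n} \frac{\lambda^{y_i} e^{-\lambda}}{y_i!}
= \underbrace{\lambda^{u} e^{-n\lambda}}_{g(u,\, \lambda)} \cdot \underbrace{\frac{1}{\prod_{i=1}^{n} y_i!}}_{h(y_1, \ldots, y_n)},$$

so by the factorization theorem $U = \sum_{i=1}^{n} Y_i$ is sufficient for $\lambda$ – with much less work than conditioning directly.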
Another Example

Let $X_1, X_2, \ldots, X_n$ be iid observations from a distribution with density

$$f(x \mid \theta) = \frac{\theta}{(1 + x)^{\theta + 1}}, \quad 0 < \theta < \infty, \; 0 < x < \infty.$$

Find a sufficient statistic for $\theta$.
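A sketch of how the factorization can go here:

$$f(x_1, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} \frac{\theta}{(1 + x_i)^{\theta + 1}}
= \theta^{n} \left[ \prod_{i=1}^{n} (1 + x_i) \right]^{-(\theta + 1)},$$

so with $u = \prod_{i=1}^{n} (1 + x_i)$ we may take $g(u, \theta) = \theta^{n} u^{-(\theta + 1)}$ and $h \equiv 1$. Hence $U = \prod_{i=1}^{n} (1 + X_i)$, or equivalently $\sum_{i=1}^{n} \log(1 + X_i)$, is sufficient for $\theta$.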
Another Anti-Example

Let $X_1, X_2, \ldots, X_n$ be iid observations from a distribution with density

$$f(x \mid \theta) = \frac{1}{\pi \left( 1 + (x - \theta)^2 \right)}.$$

Can you find a sufficient statistic for $\theta$?
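A sketch of the difficulty: the joint density is

$$f(x_1, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} \frac{1}{\pi \left( 1 + (x_i - \theta)^2 \right)},$$

and $\theta$ is tied to each $x_i$ individually inside its own factor, so the product does not collapse into $g(u, \theta) \cdot h(x_1, \ldots, x_n)$ for any statistic $u$ of dimension smaller than $n$. For this Cauchy location family, the best available reduction is the vector of order statistics $(X_{(1)}, \ldots, X_{(n)})$, which is sufficient but does not reduce the dimension of the data.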
One-To-One Functions of Sufficient Statistics

Any one-to-one function of a sufficient statistic is sufficient.

Example: We found $U = \sum_{i=1}^{n} X_i$ is sufficient for $\lambda$ in the Poisson example. Since $\bar{X} = U / n$ is a one-to-one function of $U$, $\bar{X}$ is also sufficient for $\lambda$.
Bernoulli Example

Let $X_1, X_2, \ldots, X_n$ be iid from a Bernoulli distribution, such that $P(X_i = 1) = p$ and $P(X_i = 0) = 1 - p$, for each $i$. Note that $\mathrm{E}[X_i] = p$ and $\mathrm{Var}[X_i] = p(1 - p)$.

• Show that $\sum_{i=1}^{n} X_i$ is a sufficient statistic for $p$.
• Find an unbiased estimator of $p$ that is also a sufficient statistic.
• We might try to estimate the variance of $X_i$ by using the statistic $V = \bar{X}(1 - \bar{X})$. Is $V$ a sufficient statistic for $p$?
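For the last bullet, a quick numerical hint (our own sketch, with made-up samples): $V$ takes the same value on samples with different totals $\sum x_i$, and the relative probability of those totals depends on $p$, so $V$ discards information that the sufficient statistic retains.

```python
def v_stat(sample):
    """V = xbar * (1 - xbar), a candidate estimator of Var(X_i) = p(1 - p)."""
    xbar = sum(sample) / len(sample)
    return xbar * (1 - xbar)

a = [1, 0, 0, 0]  # total = 1, xbar = 0.25
b = [1, 1, 1, 0]  # total = 3, xbar = 0.75
print(sum(a), sum(b))        # different values of the sufficient statistic
print(v_stat(a), v_stat(b))  # identical V: xbar and 1 - xbar give the same product
```

Because $V$ collapses $\bar{X}$ and $1 - \bar{X}$ to the same value, the conditional distribution of the data given $V$ still depends on $p$.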
Where We Are

So far, we have looked at some properties which we may use to judge estimators. Specifically, if we have an estimator $\hat{\theta}$ of $\theta$, it would be nice if:

• $\hat{\theta}$ were unbiased for $\theta$, or had low bias
• $\hat{\theta}$ had low variance
• $\hat{\theta}$ had low MSE
• $\hat{\theta}$ were consistent for $\theta$
• $\hat{\theta}$ were sufficient for $\theta$

These properties are useful for evaluating a single estimator $\hat{\theta}$, or for comparing two estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ to decide which one is "better."

Next up, we will discuss methods for finding estimators for parameters.