SLIDE 1

Point Estimation

Edwin Leuven

SLIDE 2

Introduction

Last time we reviewed statistical inference. We saw that while in probability we ask:

◮ given a data generating process, what are the properties of the outcomes?

in statistics the question is the reverse:

◮ given the outcomes, what can we say about the process that generated the data?

Statistical inference consists of:

  • 1. Estimation (point, interval)
  • 2. Inference (quantifying sampling error, hypothesis testing)

SLIDE 3

Introduction

Today we take a closer look at point estimation. We will go over three desirable properties of estimators:

  • 1. Unbiasedness
  • 2. Consistency
  • 3. Efficiency

And how to quantify the trade-off between location and variance using the

◮ Mean Squared Error (MSE)

SLIDE 4

Random sampling

Statistical inference starts with an assumption about how our data came about (the “data generating process”). We introduced the notion of sampling, where we consider

◮ observations in our data X1, . . . , Xn as draws from a population or, more generally, an unknown probability distribution f(X)

Simple Random Sample

We call a sample X1, . . . , Xn random if the Xi are independent and identically distributed (i.i.d.) random variables. Random samples arise if we draw each unit in the population into our sample with equal probability.

SLIDE 5

Random sampling

We will assume throughout that our samples are random! The aim is to use our data X1, . . . , Xn to learn something about the unknown probability distribution f(X) the data came from. We typically focus on E[X], the mean of X, to explain things, but we can ask many different questions:

◮ What is the variance of X?
◮ What is the 10th percentile of X?
◮ What fraction of X lies below 100,000?
◮ etc.

Very often we are interested in comparing measurements across populations:

◮ What is the difference in earnings between men and women?

SLIDE 6

Bias

Consider

  • 1. the estimand E[X], and
  • 2. an estimator X̂

What properties do we want our estimator X̂ to have? One desirable property is that X̂ is on average correct:

E[X̂] = E[X]

We call such estimators unbiased.

Bias

Bias = E[X̂] − E[X]

SLIDE 7

Bias

The estimand – in our example the population mean E[X] – is a number. For a given sample X̂ is also a number, which we call the estimate. Bias is not the difference between the estimate and the estimand

◮ this is the estimation error

Bias is the average estimation error across (infinitely) many random samples!

SLIDE 8

Estimating the Mean of X

The sample average is an unbiased estimator of the mean:

$$E[\bar{X}_n] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = E[X]$$

but we can think of different unbiased estimators, e.g. X1 is also an unbiased estimator of E[X]. If X has a symmetric distribution then both

◮ median(X), and
◮ (min(X) + max(X))/2

are unbiased
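
A small simulation can illustrate that all of these estimators are centered on E[X] when the distribution is symmetric. This is only a sketch: the population N(5, 2²), the sample size, and the number of replications are arbitrary choices, not taken from the slides.

n = 50; nrep = 10^4
avg = first = med = mid = rep(0, nrep)
for (i in 1:nrep) {
  x = rnorm(n, 5, 2)                  # symmetric population with E[X] = 5
  avg[i]   = mean(x)                  # sample average
  first[i] = x[1]                     # a single observation X1
  med[i]   = median(x)                # sample median
  mid[i]   = (min(x) + max(x)) / 2    # midrange
}
c(mean(avg), mean(first), mean(med), mean(mid))   # all close to 5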

SLIDE 9

Estimating the Variance of X

The estimator of the variance is

$$\widehat{Var}(X) = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2$$

Why divide by n − 1 and not n?

$$E\Big[\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2\Big] = \frac{1}{n}\sum_{i=1}^{n} E[(X_i - \bar{X}_n)^2] = \frac{1}{n}\sum_{i=1}^{n} E[X_i^2 - 2X_i\bar{X}_n + \bar{X}_n^2]$$

$$= E[X_i^2] - 2E[X_i\bar{X}_n] + E[\bar{X}_n^2] = \frac{n-1}{n}\big(E[X_i^2] - E[X_i]^2\big) = \frac{n-1}{n}\,Var(X_i)$$

where the last line follows since

$$E[\bar{X}_n^2] = E[X_i\bar{X}_n] = \frac{1}{n}E[X_i^2] + \frac{n-1}{n}E[X_i]^2$$

SLIDE 10

Variance Estimation

We can verify this through numerical simulation:

n = 20; nrep = 10^5
varhat1 = rep(0, nrep); varhat2 = rep(0, nrep)
for (i in 1:nrep) {
  x = rnorm(n, 5, sqrt(3))        # sample of size 20 from N(5, 3)
  sx = sum((x - mean(x))^2)
  varhat1[i] = sx / (n - 1)       # divide by n - 1
  varhat2[i] = sx / n             # divide by n
}
mean(varhat1)
## [1] 3.0000818
mean(varhat2)
## [1] 2.8500777

SLIDE 11

How to choose between two unbiased estimators?

Since both are centered around the truth:

◮ pick the one that tends to be closest!

One measure of “close” is Var(X̂), the sampling variance of X̂:

x1 = rep(0, nrep); x2 = rep(0, nrep)
for (i in 1:nrep) {
  x = rnorm(100, 0, 1)                # sample from N(0, 1)
  x1[i] = mean(x)                     # sample average
  x2[i] = (min(x) + max(x)) / 2       # midrange
}
var(x1)
## [1] 0.0099863036
var(x2)
## [1] 0.092541761

SLIDE 12

How to choose between two unbiased estimators?

Since both are centered around the truth:

◮ pick the one that tends to be closest!

One measure of “close” is Var(X̂), the sampling variance of X̂:

y1 = rep(0, nrep); y2 = rep(0, nrep)
for (i in 1:nrep) {
  x = runif(100, 0, 1)                # sample from Uniform[0, 1]
  y1[i] = mean(x)                     # sample average
  y2[i] = (min(x) + max(x)) / 2       # midrange
}
var(y1)
## [1] 0.00083522986
var(y2)
## [1] 0.00004879091

SLIDE 13

How to choose between two unbiased estimators?

Normal(0,1) distribution

[Figure: simulated sampling distributions of the two estimators; x-axis: x1, y-axis: Density]

SLIDE 14

How to choose between two unbiased estimators?

Uniform[0,1] distribution

[Figure: simulated sampling distributions of the two estimators; x-axis: y1, y-axis: Density]

SLIDE 15

How to choose between two unbiased estimators?

The sampling distribution of our estimator depends on the underlying distribution of Xi in the population!

◮ Xi ∼ Normal: the sample average outperforms the midrange
◮ Xi ∼ Uniform: the midrange outperforms the sample average

However, the sample average is an attractive default because it often

  • 1. has a sampling distribution that is well understood
  • 2. is more efficient (smaller sampling variance) than alternative estimators

We will say more about this in the context of the WLLN and the CLT

SLIDE 16

The Standard Error

Above we compared the average and the midrange estimators using the sampling variance

$$Var(\hat{X}) = E[(\hat{X} - E[\hat{X}])^2] = E[\hat{X}^2] - E[\hat{X}]^2$$

It is however common to use the square root of the sampling variance of our estimators. This is called the standard error:

$$\text{Standard Error of } \hat{X} = \sqrt{Var(\hat{X})}$$

SLIDE 17

The Standard Error of the Sample Proportion

Consider a Bernoulli random variable X where

$$X = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1-p \end{cases}$$

The sample proportion is $\bar{X}_n = \frac{1}{n}\sum_i X_i$ with variance

$$Var(\bar{X}_n) = \frac{1}{n^2}\sum_i Var(X_i) = \frac{n\,Var(X)}{n^2} = \frac{p(1-p)}{n}$$

but this depends on p, which is unknown! We have an unbiased estimator of p, namely $\bar{X}_n$, and we can therefore estimate the variance as follows:

$$\widehat{Var}(\bar{X}_n) = \bar{X}_n(1-\bar{X}_n)/n$$

SLIDE 18

The Standard Error of the Sample Mean

When the distribution of X is unknown but the Xi are i.i.d. we can also more generally derive the variance of the sample mean as follows:

$$Var(\bar{X}_n) = \frac{1}{n^2}\sum_i Var(X_i) = \frac{Var(X)}{n}$$

this again depends on an unknown parameter, Var(X), but one that we also have an estimator of:

$$\widehat{Var}(X) = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2$$

so that

$$\widehat{Var}(\bar{X}_n) = \widehat{Var}(X)/n$$

and we get the standard error by taking the square root
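
A quick simulation check of Var(X̄n) = Var(X)/n, as a sketch only; the population N(1, 2²) and n = 100 are chosen to match the example on the next slide, where the theoretical standard error is sqrt(4/100) = 0.2.

nrep = 10^4
xbar = rep(0, nrep)
for (i in 1:nrep) xbar[i] = mean(rnorm(100, 1, 2))   # Var(X) = 4, n = 100
var(xbar)        # should be close to 4 / 100 = 0.04
sqrt(var(xbar))  # should be close to 0.2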

SLIDE 19

Calculating Standard Errors

phat = mean(rbinom(100, 1, .54))      # sample proportion, n = 100, p = 0.54
sqrt(phat * (1 - phat) / 100)         # estimate
## [1] 0.049638695
sqrt(.54 * (1 - .54) / 100)           # theoretical s.e.
## [1] 0.049839743
sqrt(var(rnorm(100, 1, 2)) / 100)     # estimate
## [1] 0.16962716
sqrt(2^2 / 100)                       # theoretical s.e.
## [1] 0.2

SLIDE 20

Bias vs Variance

Suppose we have

  • 1. an unbiased estimator with a large sampling variance
  • 2. a biased estimator with a small sampling variance

Should we choose our “best” estimator based on

◮ bias, or
◮ variance?

SLIDE 21

Bias vs Variance

[Figure: scatter of simulated estimates illustrating the four combinations: High Bias / Low Bias crossed with Low Variance / High Variance]

SLIDE 22

Bias vs Variance

E[X] = 0

[Figure: sampling distribution of the estimator around E[X] = 0; x-axis: xhat, y-axis: Density]

SLIDE 23

Bias vs Variance

E[X] = 0

[Figure: sampling distribution of the estimator around E[X] = 0; x-axis: xhat, y-axis: Density]

SLIDE 24

Mean Squared Error

We may need to choose between two estimators, one of which is unbiased. Consider the biased estimator: is the sampling variance (or the standard error) still a good measure?

$$Var(\hat{X}) = E[(\hat{X} - E[\hat{X}])^2] = E[(\hat{X} - (E[\hat{X}] - E[X]) - E[X])^2] = E[(\hat{X} - E[X] - \text{Bias})^2]$$

Suppose $Var(\hat{X}_{biased}) < Var(\hat{X}_{unbiased})$, what would you conclude?

SLIDE 25

Mean Squared Error

We are interested in the spread relative to the truth!! This is called the Mean Squared Error (MSE)

Mean Squared Error

$$MSE = E[(\hat{X} - E[X])^2]$$

We can show that

$$MSE = E[(\hat{X} - E[X])^2] = E[(\hat{X} - E[\hat{X}] + E[\hat{X}] - E[X])^2] = \underbrace{E[(\hat{X} - E[\hat{X}])^2]}_{Var(\hat{X})} + \underbrace{(E[\hat{X}] - E[X])^2}_{\text{Bias}^2}$$

There is a potential trade-off between Bias and Variance

SLIDE 26

Mean Squared Error

Consider again the following two estimators of the variance:

1. $\widehat{Var}(X) = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2$

2. $\widehat{Var}(X) = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2$

We saw that 1. is unbiased while 2. is not. How about the MSE? Consider the example on slide 10:

          bias2        var          mse
vhat1     0.00000001   0.94744282   0.94743335
vhat2     0.02247669   0.85506714   0.87753529

here X ∼ N(5, 3) and n = 20; try for X ∼ χ2(1) and vary n
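
The table above comes from a simulation like the one on slide 10. The sketch below is one possible way to reproduce this kind of bias²/variance/MSE comparison (the exact code is not from the slides) and makes it easy to swap in X ∼ χ2(1) or vary n as suggested.

n = 20; nrep = 10^5
truevar = 3                                   # Var(X) for N(5, 3); use 2 for chi2(1)
vhat1 = vhat2 = rep(0, nrep)
for (i in 1:nrep) {
  x = rnorm(n, 5, sqrt(3))                    # swap in rchisq(n, 1) to try X ~ chi2(1)
  sx = sum((x - mean(x))^2)
  vhat1[i] = sx / (n - 1)                     # unbiased estimator
  vhat2[i] = sx / n                           # biased estimator
}
summ = function(v) c(bias2 = (mean(v) - truevar)^2,
                     var   = var(v),
                     mse   = mean((v - truevar)^2))
rbind(vhat1 = summ(vhat1), vhat2 = summ(vhat2))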

SLIDE 27

Consistency

We mentioned unbiasedness as an attractive property of an estimator. But unbiasedness is a finite sample property

◮ silent on “how close” the estimate is to the truth
◮ a nonlinear function of an unbiased estimator is typically not unbiased

We will now consider consistency, which is a large sample property

◮ consistent estimators converge to the truth as sample sizes grow large
◮ a nonlinear function of a consistent estimator is typically consistent

SLIDE 28

Consistency

Consistency

Let θ̂n be an estimator of θ based on a sample of size n. We call θ̂n consistent if it gets closer and closer to θ as data accumulates, and write: θ̂n → θ. The precise definition is:

$$\lim_{n\to\infty} \Pr(|\hat{\theta}_n - \theta| > \epsilon) = 0 \quad \forall\, \epsilon > 0$$

Weak law of large numbers

If the Xi are i.i.d. random variables with E[|Xi|] < ∞, then

$$\frac{1}{n}\sum_i X_i \to E[X_i]$$

SLIDE 29

Consistency

Consider sampling from a population of voters where

$$X_i = \begin{cases} 1 & \text{if person } i \text{ supports the right} \\ 0 & \text{if person } i \text{ supports the left} \end{cases}$$

and Pr(Xi = 1) = 0.54. Denote our data by x1, . . . , xn. We estimate p by p̂ = (x1 + . . . + xn)/n
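
The convergence plots on the next slides can be reproduced with a simulation along the following lines; this is a sketch, and the log-scaled x-axis and other plotting choices are assumptions about how the original figures were made.

p = 0.54; nmax = 10^5
x = rbinom(nmax, 1, p)                  # one long sequence of sampled voters
phat = cumsum(x) / (1:nmax)             # running estimate phat(n) after n observations
plot(1:nmax, phat, type = "l", log = "x", ylim = c(0, 1),
     xlab = "n", ylab = "phat(n)")
abline(h = p, lty = 2)                  # true p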

SLIDE 30

Consistency

[Figure: p̂(n) plotted against n (10 to 100,000); x-axis: n, y-axis: phat(n)]

SLIDE 31

Consistency

[Figure: p̂(n) plotted against n (10 to 100,000); x-axis: n, y-axis: phat(n)]

SLIDE 32

Consistency

[Figure: p̂(n) plotted against n (10 to 100,000); x-axis: n, y-axis: phat(n)]

SLIDE 33

Biased and Consistent

Consider U ∼ Uniform[0, θ]; then θ̂ = max(u1, . . . , un) is a biased estimator since

$$E[\hat{\theta}] = \frac{n}{n+1}\,\theta$$

but it is consistent
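
A quick simulation illustrates both the downward bias for small n and the convergence as n grows. This is a sketch; θ = 10 is an assumption suggested by the scale of the plot on the next slide.

theta = 10                                   # assumed true parameter value
for (n in c(5, 50, 500, 5000)) {
  thetahat = replicate(10^4, max(runif(n, 0, theta)))
  cat("n =", n,
      " mean estimate =", round(mean(thetahat), 3),
      " theory n/(n+1)*theta =", round(n / (n + 1) * theta, 3), "\n")
}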

SLIDE 34

Biased and Consistent

[Figure: θ̂(n) = max(u1, . . . , un) plotted against n (10 to 100,000); x-axis: n, y-axis from 2 to 10]

SLIDE 35

Biased vs Consistent

Estimates of the mean

◮ unbiased and consistent
  ◮ X̄
◮ unbiased and inconsistent
  ◮ X1
◮ biased and consistent
  ◮ X̄ + 1/n
◮ biased and inconsistent
  ◮ can you think of one?
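
A small simulation (assuming X ∼ N(0, 1), a choice not taken from the slides) makes the middle two cases concrete: X̄ + 1/n starts off biased but both its bias and its spread vanish as n grows, whereas X1 never settles down.

for (n in c(10, 100, 10000)) {
  est = replicate(2000, mean(rnorm(n, 0, 1)) + 1/n)   # Xbar + 1/n, with E[X] = 0
  cat("n =", n, " mean =", round(mean(est), 4), " sd =", round(sd(est), 4), "\n")
}
# X1, in contrast, has standard deviation 1 no matter how large n gets:
# unbiased, but it does not converge to E[X]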

SLIDE 36

Consistent Estimators

Finding unbiased estimators is not so easy because even if E[θ̂] = θ, in general

$$E[g(\hat{\theta})] \neq g(\theta)$$

For example, if we know that E[θ̂] = σ², then E[√θ̂] ≠ σ. Finding consistent estimators is much easier because of the WLLN and because functions and combinations of consistent estimators are often again consistent
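
That unbiasedness does not survive a nonlinear transformation can be checked by simulation. A sketch, assuming X ∼ N(0, 2²) so that σ = 2 and σ² = 4, with a small sample size where the effect is clearly visible:

n = 5; nrep = 10^5
s2 = replicate(nrep, var(rnorm(n, 0, 2)))   # var() is unbiased for sigma^2 = 4
mean(s2)           # close to 4
mean(sqrt(s2))     # noticeably below sigma = 2: the square root of an unbiased estimator is biased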

SLIDE 37

Consistent Estimators

Continuous Mapping Theorem (CMT)

If g(·) is a continuous function and θ̂ a consistent estimator of θ, then g(θ̂) → g(θ). This means that if θ̂ → σ² then √θ̂ → σ
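
The large-sample counterpart of the previous sketch: with the same assumed population N(0, 2²), the square root of a consistent variance estimator does converge to σ as n grows.

sigma = 2
for (n in c(10^2, 10^4, 10^6)) {
  x = rnorm(n, 0, sigma)
  cat("n =", n, " sqrt(varhat) =", round(sqrt(var(x)), 4), "\n")   # approaches sigma = 2
}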

SLIDE 38

Consistent Estimators

Suppose you want a consistent estimator of the variance of X:

$$Var(X) = E[X^2] - E[X]^2$$

By the WLLN you know that

$$\frac{1}{n}\sum_i X_i \to E[X], \quad \text{and} \quad \frac{1}{n}\sum_i X_i^2 \to E[X^2]$$

and by the CMT

$$\Big(\frac{1}{n}\sum_i X_i\Big)^2 \to E[X]^2$$

and therefore that

$$\frac{1}{n}\sum_i X_i^2 - \Big(\frac{1}{n}\sum_i X_i\Big)^2 \to Var(X)$$

This is an application of the Method of Moments
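
In R this plug-in estimator is simply mean(x^2) - mean(x)^2. A minimal sketch of its consistency, assuming an exponential population with rate 1 (so Var(X) = 1; the distribution is an arbitrary illustrative choice):

for (n in c(10^2, 10^4, 10^6)) {
  x = rexp(n)                                 # Var(X) = 1 for this population
  mm = mean(x^2) - mean(x)^2                  # method of moments estimator
  cat("n =", n, " estimate =", round(mm, 4), "\n")
}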

SLIDE 39

Summary

With point estimation the objective is to estimate (compute a “best guess” of) a population parameter θ using our data. Parameters are things like:

◮ means, percentiles, minima, maxima, differences in means between groups, etc.

Estimates differ across samples, and estimators are therefore random variables. Estimators have a distribution.

SLIDE 40

Summary

To characterize an estimator we focussed on two key properties of its sampling distribution:

  • 1. location (unbiasedness, consistency)
  • 2. spread (variance, MSE)

Unbiasedness, E[θ̂] = θ, means that the expectation of our estimator equals the population parameter it intends to estimate. The expectation here is across infinitely many random samples, and unbiasedness means that we are correct on average. Unbiasedness is a finite sample property because it holds for samples of any size.

SLIDE 41

Summary

While being on target on average (location) is important, we never have this average estimate but a single one. We would therefore prefer to be close to the target in a given sample. This is more likely to happen if the spread of our estimator is small. A natural measure of spread is the variance:

$$Var(\hat{\theta}) = E[(\hat{\theta} - E[\hat{\theta}])^2]$$

But for a biased estimator it measures the spread around the wrong location, since then E[θ̂] = θ + Bias

SLIDE 42

Summary

This is why we turned to the Mean Squared Error (MSE)

$$MSE(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$$

which measures the spread of the estimator θ̂ around the true parameter value θ. We saw that

MSE = Variance + Bias²

and that a trade-off between bias and variance can make us prefer a biased estimator over an unbiased one.

SLIDE 43

Summary

We often use consistent estimators because unbiased estimators are difficult to find or may not exist. Consistent estimators can be biased in small samples, but converge to the population parameter as more data become available: θ̂ → θ. The Weak Law of Large Numbers says that with random sampling, sample averages are consistent estimators of the corresponding population averages. We can often combine consistent estimators to construct new consistent estimators. Consistency is a large sample property.