Evaluating Estimators



About this class

We’ll talk about the concepts of mean squared error, bias, and variance, and discuss the tradeoffs. We’ll also discuss linear regression and show how to estimate the parameters of a linear model.


Evaluating Estimators

Statistical evaluation: ways of choosing without access to test data.

Mean Squared Error (MSE): The MSE of an estimator $W$ of a parameter $\theta$ is the function of $\theta$ defined by $E_\theta(W - \theta)^2$.

Alternatives? (Any increasing function of $|W - \theta|$ could work...)

Bias/variance decomposition:

$$E_\theta(W - \theta)^2 = E[W^2] + \theta^2 - 2\theta E[W] + (E[W])^2 - (E[W])^2 = (\operatorname{Bias} W)^2 + E[W^2] - (E[W])^2 = (\operatorname{Var} W) + (\operatorname{Bias} W)^2$$

where $\operatorname{Bias} W = E_\theta W - \theta$. Unbiased estimators ($E_\theta W = \theta$ for all $\theta$) are good at controlling bias! An unbiased estimator has MSE equal to its variance.
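To make the decomposition concrete, here is a small Monte Carlo sketch (my addition, not from the slides). It estimates the mean of a normal distribution with a deliberately biased shrinkage estimator; the shrinkage factor 0.9, the sample size, and the trial count are arbitrary illustration choices.

```python
import numpy as np

# Hypothetical setup: estimate theta from a N(theta, 1) sample using the
# deliberately biased estimator W = 0.9 * Xbar (the 0.9 is arbitrary).
rng = np.random.default_rng(0)
theta, n, trials = 2.0, 20, 200_000

samples = rng.normal(theta, 1.0, size=(trials, n))
W = 0.9 * samples.mean(axis=1)

mse = np.mean((W - theta) ** 2)   # E(W - theta)^2
var = np.var(W)                   # Var W
bias = np.mean(W) - theta         # Bias W = EW - theta

print(f"MSE          = {mse:.6f}")
print(f"Var + Bias^2 = {var + bias ** 2:.6f}")  # matches MSE up to MC noise
```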

Estimators for the Normal Distribution

Let $X_1, \ldots, X_n$ be iid $N(\mu, \sigma^2)$. The unbiased estimator for the mean is the sample mean $\bar X$; the unbiased estimator for the variance is the sample variance:

$$S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2$$

Proof:

$$E[S^2] = E\left[\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2\right] = \frac{1}{n-1}E\left[\sum_{i=1}^n X_i^2 + n\bar X^2 - 2\bar X\sum_{i=1}^n X_i\right] = \frac{1}{n-1}E\left[\sum_{i=1}^n X_i^2 - n\bar X^2\right]$$

$$= \frac{1}{n-1}\left(n E X_1^2 - n E \bar X^2\right)$$

Now we need to use a couple of additional facts:

$$E X_1^2 - (E X_1)^2 = \sigma^2 \qquad \text{and} \qquad E\bar X^2 - (E\bar X)^2 = \sigma^2/n$$

(The second is basically the definition of the standard error.) To show the second, here's a lemma:

$$\operatorname{Var}\sum_{i=1}^n g(X_i) = n \operatorname{Var} g(X_1)$$

(where $E g(X_i)$ and $\operatorname{Var} g(X_i)$ exist). Proof:

$$\operatorname{Var}\sum_{i=1}^n g(X_i) = E\left[\sum_{i=1}^n g(X_i) - E\left(\sum_{i=1}^n g(X_i)\right)\right]^2 = E\left[\sum_{i=1}^n \left(g(X_i) - E g(X_i)\right)\right]^2$$

If we expand this, there are $n$ terms of the form $(g(X_i) - E g(X_i))^2$. The expectation of each such term is $\operatorname{Var} g(X_i)$, so for $n$ of them we get $n \operatorname{Var} g(X_1)$. What about the other terms? They are all of the form $(g(X_i) - E g(X_i))(g(X_j) - E g(X_j))$ with $i \neq j$. The expectation of this is the covariance of $g(X_i)$ and $g(X_j)$, which is 0 by independence.
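A quick Monte Carlo check of the lemma (my addition); the choice $g(x) = x^2$ and the $N(0, 1)$ distribution are arbitrary.

```python
import numpy as np

# Check Var(sum_i g(X_i)) = n * Var g(X_1) for iid X_i.
rng = np.random.default_rng(1)
n, trials = 10, 500_000
g = lambda x: x ** 2  # any g with finite mean and variance

X = rng.normal(0.0, 1.0, size=(trials, n))  # each row: one iid sample of size n
lhs = np.var(g(X).sum(axis=1))              # Var of the sum, across trials
rhs = n * np.var(g(X[:, 0]))                # n * Var g(X_1)
print(lhs, rhs)                             # approximately equal
```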


Now we plug back into the expression for $E[S^2]$ and find:

$$E[S^2] = \frac{1}{n-1}\left(n E X_1^2 - n E \bar X^2\right) = \frac{1}{n-1}\left(n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right)\right) = \sigma^2$$
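A minimal numerical sketch of this result (my addition, with arbitrary parameter values): the $1/(n-1)$ normalization is unbiased for $\sigma^2$, while the $1/n$ version comes out low.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, trials = 1.0, 2.0, 5, 500_000

X = rng.normal(mu, sigma, size=(trials, n))
S2 = X.var(axis=1, ddof=1)           # sample variance S^2 (divide by n-1)
sigma2_mle = X.var(axis=1, ddof=0)   # divide by n instead

print(S2.mean())          # ~ sigma^2 = 4.0             (unbiased)
print(sigma2_mle.mean())  # ~ (n-1)/n * sigma^2 = 3.2   (biased low)
```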

MSEs for Estimators for the Normal Distribution

The unbiased estimator for the mean $\mu$ is $\bar X$; the unbiased estimator for the variance $\sigma^2$ is $S^2$. The MSEs for these estimators are:

$$E(\bar X - \mu)^2 = \operatorname{Var} \bar X = \frac{\sigma^2}{n} \qquad\qquad E(S^2 - \sigma^2)^2 = \operatorname{Var} S^2 = \frac{2\sigma^4}{n-1}$$

The MLE for the variance is

$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2 = \frac{n-1}{n} S^2$$

$$E\hat\sigma^2 = E\left(\frac{n-1}{n} S^2\right) = \frac{n-1}{n}\sigma^2 \qquad\qquad \operatorname{Var}\hat\sigma^2 = \operatorname{Var}\left(\frac{n-1}{n} S^2\right) = \left(\frac{n-1}{n}\right)^2 \operatorname{Var} S^2 = \left(\frac{n-1}{n}\right)^2 \frac{2\sigma^4}{n-1} = \frac{2(n-1)\sigma^4}{n^2}$$

The MSE, using the bias/variance decomposition:

$$E(\hat\sigma^2 - \sigma^2)^2 = \frac{2(n-1)\sigma^4}{n^2} + \left(\frac{n-1}{n}\sigma^2 - \sigma^2\right)^2 = \frac{2n-1}{n^2}\sigma^4$$

which is less than $\frac{2\sigma^4}{n-1}$, the MSE of $S^2$.
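As a sanity check (my addition, arbitrary parameter values), here is a Monte Carlo comparison of the two MSEs against the closed forms just derived; the biased MLE indeed achieves the smaller MSE.

```python
import numpy as np

# Compare MSE(S^2) = 2*sigma^4/(n-1) vs MSE(MLE) = (2n-1)*sigma^4/n^2.
rng = np.random.default_rng(3)
mu, sigma, n, trials = 0.0, 1.5, 8, 500_000
sigma2, sigma4 = sigma ** 2, sigma ** 4

X = rng.normal(mu, sigma, size=(trials, n))
S2 = X.var(axis=1, ddof=1)
mle = X.var(axis=1, ddof=0)

print(np.mean((S2 - sigma2) ** 2), 2 * sigma4 / (n - 1))           # S^2
print(np.mean((mle - sigma2) ** 2), (2 * n - 1) * sigma4 / n**2)   # MLE, smaller
```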

Bias/Variance Tradeoff in General

Keep in mind: MSE is not the last word.

  • Should we be comfortable using biased estimators? Why are they biased?
  • Is MSE reasonable for scale parameters (as opposed to location parameters)? It forgives underestimation...
  • Hypothesis space too simple? High bias, low variance.
  • Hypothesis space too complex? Low bias, high variance.



Regression

Statistics: describing data, inferring conclusions. Machine learning: predicting future data (out-of-sample).

What would be a reasonable thing to do in the following case (diagram of a point cloud)?

Assumption for linear regression: the data can be modeled by $y_i = \alpha + \beta x_i + \epsilon_i$. First algorithmic question for us: how do we find $\alpha$ and $\beta$?


Least Squares

Define $\bar x$ and $\bar y$ as usual from our sample data. Now define:

$$S_{xx} = \sum_{i=1}^n (x_i - \bar x)^2 \qquad S_{yy} = \sum_{i=1}^n (y_i - \bar y)^2 \qquad S_{xy} = \sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)$$

Let's fit a line to the data as best we can. How do we define this? By the residual sum of squares (RSS):

$$\sum_{i=1}^n \left(y_i - (c + d x_i)\right)^2$$

Now, find $a$ and $b$, estimators of $\alpha$ and $\beta$, such that:

$$\min_{c,d} \sum_{i=1}^n \left(y_i - (c + d x_i)\right)^2 = \sum_{i=1}^n \left(y_i - (a + b x_i)\right)^2$$

For any fixed value of $d$, the minimizing value of $c$ can be found as follows:

$$\sum_{i=1}^n \left(y_i - (c + d x_i)\right)^2 = \sum_{i=1}^n \left((y_i - d x_i) - c\right)^2$$

It turns out the right side is minimized at

$$c = \frac{1}{n}\sum_{i=1}^n (y_i - d x_i) = \bar y - d \bar x$$

Why?

$$\min_a \sum_{i=1}^n (x_i - a)^2 = \min_a \sum_{i=1}^n (x_i - \bar x + \bar x - a)^2 = \min_a \left[\sum_{i=1}^n (x_i - \bar x)^2 + 2\sum_{i=1}^n (x_i - \bar x)(\bar x - a) + \sum_{i=1}^n (\bar x - a)^2\right]$$

The second term drops out (since $\sum_{i=1}^n (x_i - \bar x) = 0$), basically giving us our result: the minimum is at $a = \bar x$.

For a given value of $d$, the minimum value of the RSS is then

$$\sum_{i=1}^n \left((y_i - d x_i) - (\bar y - d \bar x)\right)^2 = \sum_{i=1}^n \left((y_i - \bar y) - d(x_i - \bar x)\right)^2 = S_{yy} - 2d S_{xy} + d^2 S_{xx}$$

Take the derivative with respect to $d$ and set it to 0:

$$-2 S_{xy} + 2 d S_{xx} = 0 \;\Rightarrow\; d = \frac{S_{xy}}{S_{xx}}$$
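Here is a short sketch (my addition) that computes the fit exactly as derived: $b = S_{xy}/S_{xx}$ and $a = \bar y - b\bar x$. The synthetic data, with $\alpha = 1.0$, $\beta = 2.5$, and unit noise, is my own choice.

```python
import numpy as np

rng = np.random.default_rng(4)
n, alpha, beta = 100, 1.0, 2.5
x = rng.uniform(0, 10, n)
y = alpha + beta * x + rng.normal(0, 1.0, n)  # y_i = alpha + beta*x_i + eps_i

xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
Sxy = np.sum((x - xbar) * (y - ybar))

b = Sxy / Sxx        # minimizing slope d
a = ybar - b * xbar  # minimizing intercept c for that slope
print(a, b)          # close to (1.0, 2.5)
```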


We’ll get different lines if we regress x on y! (exercise)
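A quick check of the exercise, continuing the sketch above (it reuses `x`, `y`, `xbar`, `ybar`, `Sxx`, `Sxy`): regressing $x$ on $y$ and re-expressing the result as a line in the $(x, y)$ plane gives slope $S_{yy}/S_{xy}$, which matches $S_{xy}/S_{xx}$ only when the data are perfectly correlated.

```python
# Continuing from the previous sketch:
Syy = np.sum((y - ybar) ** 2)

slope_y_on_x = Sxy / Sxx  # fitted slope when minimizing vertical residuals
slope_x_on_y = Syy / Sxy  # x = c + d*y, rewritten as a line in the (x, y) plane
print(slope_y_on_x, slope_x_on_y)  # differ unless the correlation is exactly +-1
```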

A Statistical Method: BLUE

Assumptions:

$$E Y_i = \alpha + \beta x_i \qquad \operatorname{Var} Y_i = \sigma^2$$

The second implies that the variance is the same for all data points. No assumption is needed on the distribution of the $Y_i$.

BLUE: Best Linear Unbiased Estimator.

Linear: an estimator of the form $\sum_{i=1}^n d_i Y_i$.

Unbiased: the estimator must satisfy $E \sum_{i=1}^n d_i Y_i = \beta$. Therefore

$$\beta = \sum_{i=1}^n d_i E[Y_i] = \sum_{i=1}^n d_i (\alpha + \beta x_i) = \alpha \sum_{i=1}^n d_i + \beta \sum_{i=1}^n d_i x_i$$

This must hold for all $\alpha$ and $\beta$, which is true iff $\sum_{i=1}^n d_i = 0$ and $\sum_{i=1}^n d_i x_i = 1$.

Best: smallest variance (equal to the MSE for unbiased estimators).

$$\operatorname{Var} \sum_{i=1}^n d_i Y_i = \sum_{i=1}^n d_i^2 \operatorname{Var} Y_i = \sum_{i=1}^n d_i^2 \sigma^2 = \sigma^2 \sum_{i=1}^n d_i^2$$

The BLUE is then defined by constants $d_i$ that minimize $\sum_{i=1}^n d_i^2$ while satisfying the constraints derived above. It turns out that the choices

$$d_i = \frac{x_i - \bar x}{S_{xx}}$$

are the ones that do this, which gives us $b = \frac{S_{xy}}{S_{xx}}$.

The advantage of working under statistically explicit assumptions is that we also get statistical knowledge about our estimator:

$$\operatorname{Var} b = \sigma^2 \sum_{i=1}^n d_i^2 = \frac{\sigma^2}{S_{xx}}$$

If you can choose the $x_i$, you can design the experiment to try to minimize the variance! A similar analysis shows that the BLUE of $\alpha$ is the same $a$ as in least squares.
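To close the loop, a small sketch (my addition, with synthetic data of my own choice) verifying that $d_i = (x_i - \bar x)/S_{xx}$ satisfies both unbiasedness constraints and that $\sum_i d_i Y_i$ reproduces the least-squares slope.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
x = rng.uniform(0, 10, n)
y = 1.0 + 2.5 * x + rng.normal(0, 1.0, n)

xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)
d = (x - xbar) / Sxx  # the claimed optimal coefficients

print(d.sum())        # ~ 0: constraint sum d_i = 0
print((d * x).sum())  # ~ 1: constraint sum d_i x_i = 1

b_blue = (d * y).sum()                            # the linear estimator sum d_i Y_i
b_ls = np.sum((x - xbar) * (y - y.mean())) / Sxx  # least-squares b = Sxy/Sxx
print(b_blue, b_ls)                               # identical up to float rounding
```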