  1. Statistical Decision Theory
  Advanced Econometrics 2, Hilary term 2020
  Maximilian Kasy, Department of Economics, Oxford University

  2. Takeaways for this part of class
  1. A general framework to think about what makes a “good” estimator, test, etc.
  2. How the foundations of statistics relate to those of microeconomic theory.
  3. In what sense the set of Bayesian estimators contains most “reasonable” estimators.

  3. Examples of decision problems
  ◮ Decide whether or not the hypothesis of no racial discrimination in job interviews is true.
  ◮ Provide a forecast of the unemployment rate next month.
  ◮ Provide an estimate of the returns to schooling.
  ◮ Pick a portfolio of assets to invest in.
  ◮ Decide whether to reduce class sizes for poor students.
  ◮ Recommend a level for the top income tax rate.

  4. Agenda
  ◮ Basic definitions
  ◮ Optimality criteria
  ◮ Relationships between optimality criteria
  ◮ Analogies to microeconomics
  ◮ Two justifications of the Bayesian approach

  5. Basic definitions: components of a general statistical decision problem
  ◮ Observed data X
  ◮ A statistical decision a
  ◮ A state of the world θ
  ◮ A loss function L(a, θ) (the negative of utility)
  ◮ A statistical model f(X | θ)
  ◮ A decision function a = δ(X)

  6. How they relate
  ◮ The underlying state of the world θ determines the distribution of the observation X.
  ◮ The decision maker observes X and picks a decision a.
  ◮ Her goal: pick a decision that minimizes the loss L(a, θ), where θ is the unknown state of the world.
  ◮ X is useful ⇔ it reveals some information about θ ⇔ f(X | θ) does depend on θ.
  ◮ The problem of statistical decision theory: find decision functions δ which “make loss small.”

  7. Graphical illustration
  [Diagram: the state of the world θ generates observed data X through the statistical model X ~ f(x, θ); the decision function a = δ(X) maps the data to a decision a; the decision and the state of the world together determine the loss L(a, θ).]

  8. Examples
  ◮ Investing in a portfolio of assets:
    ◮ X: past asset prices
    ◮ a: amount of each asset to hold
    ◮ θ: joint distribution of past and future asset prices
    ◮ L: minus expected utility of future income
  ◮ Deciding whether or not to reduce class size:
    ◮ X: data from the Project STAR experiment
    ◮ a: class size
    ◮ θ: distribution of student outcomes for different class sizes
    ◮ L: average of suitably scaled student outcomes, net of cost

  9. Practice problem
  For each of the examples of decision problems listed above, what are
  ◮ the data X,
  ◮ the possible actions a,
  ◮ the relevant states of the world θ, and
  ◮ reasonable choices of loss function L?

  10. Loss functions in estimation
  ◮ Goal: find an a which is close to some function µ of θ.
  ◮ For instance: µ(θ) = E[X].
  ◮ Loss is larger if the difference between our estimate and the true value is larger.
  Some possible loss functions:
  1. squared error loss, L(a, θ) = (a − µ(θ))²
  2. absolute error loss, L(a, θ) = |a − µ(θ)|
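The two loss functions above are easy to write out directly; a minimal sketch (the numeric values in the printed example are illustrative, not from the slides):

```python
# Squared and absolute error loss for an estimate a of the target mu = mu(theta).
def squared_error_loss(a, mu):
    return (a - mu) ** 2

def absolute_error_loss(a, mu):
    return abs(a - mu)

# Squared error penalizes large deviations more heavily than absolute error:
print(squared_error_loss(3.0, 1.0))   # 4.0
print(absolute_error_loss(3.0, 1.0))  # 2.0
```

This difference in curvature is what later makes the Bayes estimator the posterior mean under squared error loss, but the posterior median under absolute error loss.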

  11. Loss functions in testing
  ◮ Goal: decide whether H0: θ ∈ Θ0 is true.
  ◮ Decision a ∈ {0, 1} (accept / reject).
  Possible loss function:
  L(a, θ) = 1 if a = 1 and θ ∈ Θ0,
            c if a = 0 and θ ∉ Θ0,
            0 else.
  In tabular form:
                           truth
                      θ ∈ Θ0   θ ∉ Θ0
  decision  a = 0        0        c
            a = 1        1        0

  12. Risk function
  R(δ, θ) = E_θ[L(δ(X), θ)].
  ◮ The expected loss of a decision function δ.
  ◮ R is a function of the true state of the world θ.
  ◮ A crucial intermediate object in evaluating a decision function: small R ⇔ good δ.
  ◮ δ might be good for some θ and bad for other θ; decision theory deals with this trade-off.

  13. Example: estimation of a mean
  ◮ Observe X ∼ N(µ, 1).
  ◮ Want to estimate µ.
  ◮ L(a, θ) = (a − µ(θ))²
  ◮ δ(X) = α + β · X
  Practice problem (Estimation of means): Find the risk function for this decision problem.

  14. Variance / bias trade-off
  Solution:
  R(δ, µ) = E[(δ(X) − µ)²]
          = Var(δ(X)) + Bias(δ(X))²
          = β² Var(X) + (α + β E[X] − E[X])²
          = β² + (α + (β − 1)µ)².
  ◮ The first and second equalities are always true for squared error loss.
  ◮ Choosing β (and α) involves a trade-off of bias and variance, and this trade-off depends on µ.
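The closed-form risk can be checked by simulation; a sketch, where the values of α, β, and µ are illustrative choices rather than anything from the slides:

```python
import random

# Monte Carlo check of the risk formula R(delta, mu) = beta^2 + (alpha + (beta - 1) mu)^2
# for the linear estimator delta(X) = alpha + beta * X with X ~ N(mu, 1).
random.seed(0)
alpha, beta, mu = 0.5, 0.8, 2.0
n_sim = 200_000

# Simulated expected squared error loss of delta(X):
draws = (alpha + beta * random.gauss(mu, 1.0) for _ in range(n_sim))
mc_risk = sum((d - mu) ** 2 for d in draws) / n_sim

analytic_risk = beta ** 2 + (alpha + (beta - 1) * mu) ** 2
print(mc_risk, analytic_risk)  # close, up to simulation noise
```

Re-running this for several values of µ makes the dependence of the bias term on µ visible, which is exactly why no single (α, β) is best for all states of the world.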

  15. Optimality criteria
  ◮ The ranking provided by the risk function is multidimensional: it ranks the performance of decision functions separately for every θ.
  ◮ To get a global comparison of their performance, we have to aggregate this into a global ranking.
  ◮ A preference relationship on the space of risk functions ⇒ a preference relationship on the space of decision functions.

  16. Illustrations for intuition
  ◮ Suppose θ can only take two values, θ0 and θ1.
  ◮ ⇒ Risk functions are points in a 2D graph, where each axis corresponds to R(δ, θ) for θ = θ0, θ1.
  [Figure: empty plane with axes R(·, θ0) and R(·, θ1).]

  17. Three approaches to get a global ranking
  1. Partial ordering: a decision function is better than another if it is better for every θ.
  2. Complete ordering, weighted average: a decision function is better than another if a weighted average of risk across θ is lower. The weights correspond to a prior distribution.
  3. Complete ordering, worst case: a decision function is better than another if it is better under its worst-case scenario.

  18. Approach 1: Admissibility
  Dominance: δ is said to dominate another decision function δ′ if R(δ, θ) ≤ R(δ′, θ) for all θ, and R(δ, θ) < R(δ′, θ) for at least one θ.
  Admissibility: decision functions which are not dominated are called admissible; all other decision functions are inadmissible.

  19. [Figure: the feasible set of risk functions in the (R(·, θ0), R(·, θ1)) plane; the admissible decision functions lie on its lower-left boundary.]

  20. ◮ Admissibility ∼ “Pareto frontier.”
  ◮ Dominance only generates a partial ordering of decision functions.
  ◮ In general, there are many different admissible decision functions.

  21. Practice problem
  ◮ You observe X_i ∼ iid N(µ, 1), i = 1, ..., n, for n > 1.
  ◮ Your goal is to estimate µ, with squared error loss.
  ◮ Consider the estimators
  1. δ(X) = X_1
  2. δ(X) = (1/n) ∑_i X_i
  ◮ Can you show that one of them is inadmissible?
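As a numerical hint for this practice problem, one can simulate the risk of both estimators at several values of µ; the chosen values of n, µ, and the simulation size are illustrative:

```python
import random

# With X_i ~ iid N(mu, 1), the estimator X_1 has risk 1 while the sample mean
# has risk 1/n, for every mu - so delta(X) = X_1 is dominated for n > 1.
random.seed(1)
n, n_sim = 10, 50_000

def risks(mu):
    # Monte Carlo risk of the first observation and of the sample mean at mu.
    loss_first, loss_mean = 0.0, 0.0
    for _ in range(n_sim):
        x = [random.gauss(mu, 1.0) for _ in range(n)]
        loss_first += (x[0] - mu) ** 2
        loss_mean += (sum(x) / n - mu) ** 2
    return loss_first / n_sim, loss_mean / n_sim

for mu in (-2.0, 0.0, 3.0):
    r1, rbar = risks(mu)
    print(mu, r1, rbar)  # r1 near 1, rbar near 1/n, at every mu
```

Since the inequality is strict for every µ, the sample mean dominates X_1, making the latter inadmissible.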

  22. Approach 2: Bayes optimality
  ◮ A natural approach for economists: trade off risk across different θ by assigning weights π(θ) to each θ.
  Integrated risk:
  R(δ, π) = ∫ R(δ, θ) π(θ) dθ.

  23. Bayes decision function: minimizes integrated risk,
  δ* = argmin_δ R(δ, π).
  ◮ Integrated risk ∼ linear indifference planes in the space of risk functions.
  ◮ The prior ∼ normal vector of the indifference planes.

  24. [Figure: linear indifference lines in the (R(·, θ0), R(·, θ1)) plane with normal vector π(θ); the Bayes decision function δ* attains the lowest indifference line within the feasible set.]

  25. Decision weights as prior probabilities
  ◮ Suppose 0 < ∫ π(θ) dθ < ∞.
  ◮ Then wlog ∫ π(θ) dθ = 1 (normalize).
  ◮ If additionally π ≥ 0, then π is called a prior distribution.

  26. Posterior
  ◮ Suppose π is a prior distribution.
  ◮ Posterior distribution:
  π(θ | X) = f(X | θ) π(θ) / m(X)
  ◮ Normalizing constant = prior likelihood of X:
  m(X) = ∫ f(X | θ) π(θ) dθ

  27. Practice problem
  ◮ You observe X ∼ N(θ, 1).
  ◮ Consider the prior θ ∼ N(0, τ²).
  ◮ Calculate
  1. m(X)
  2. π(θ | X)
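As a numerical check on this practice problem: in the normal-normal model the answers have closed forms, m(X) equal to the N(0, 1 + τ²) density at X and posterior N(τ²X/(1 + τ²), τ²/(1 + τ²)). The sketch below compares these against brute-force integration; τ² and the observed value x are illustrative choices:

```python
import math

# Check the normal-normal closed forms: X ~ N(theta, 1), prior theta ~ N(0, tau^2).
def normal_pdf(z, mean, var):
    return math.exp(-(z - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

tau2, x = 2.0, 1.5

# Brute-force m(X) = integral of f(X | theta) pi(theta) dtheta over a fine grid.
step = 0.001
grid = [-10 + i * step for i in range(20_001)]
m_numeric = sum(normal_pdf(x, t, 1.0) * normal_pdf(t, 0.0, tau2) for t in grid) * step
m_closed = normal_pdf(x, 0.0, 1.0 + tau2)

# Posterior mean by integration vs. the closed form tau^2 x / (1 + tau^2).
post_mean_numeric = sum(t * normal_pdf(x, t, 1.0) * normal_pdf(t, 0.0, tau2)
                        for t in grid) * step / m_numeric
print(m_numeric, m_closed, post_mean_numeric, tau2 * x / (1 + tau2))
```

The posterior mean τ²X/(1 + τ²) shrinks the observation toward the prior mean 0, with less shrinkage the larger the prior variance τ².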

  28. Posterior expected loss
  R(δ, π | X) := ∫ L(δ(X), θ) π(θ | X) dθ
  Proposition: Any Bayes decision function δ* can be obtained by minimizing R(δ, π | X) through choice of δ(X) for every X.
  Practice problem: Show that this is true. Hint: show first that R(δ, π) = ∫ R(δ(X), π | X) m(X) dX.

  29. Bayes estimator with quadratic loss
  ◮ Assume quadratic loss, L(a, θ) = (a − µ(θ))².
  ◮ Posterior expected loss:
  R(δ, π | X) = E_{θ|X}[L(δ(X), θ) | X]
              = E_{θ|X}[(δ(X) − µ(θ))² | X]
              = Var(µ(θ) | X) + (δ(X) − E[µ(θ) | X])².
  ◮ The Bayes estimator minimizes posterior expected loss, hence δ*(X) = E[µ(θ) | X].
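The conclusion δ*(X) = E[µ(θ) | X] can be illustrated numerically in the normal-normal model from the earlier practice problem, by minimizing the posterior expected loss over a grid of candidate actions; τ², the observed x, and the grids are illustrative choices:

```python
import math

# Under quadratic loss the Bayes estimator is the posterior mean. We check this
# numerically in the normal-normal model: X ~ N(theta, 1), theta ~ N(0, tau^2),
# posterior N(tau^2 X / (1 + tau^2), tau^2 / (1 + tau^2)).
def normal_pdf(z, mean, var):
    return math.exp(-(z - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

tau2, x = 2.0, 1.5
post_mean = tau2 * x / (1 + tau2)
post_var = tau2 / (1 + tau2)

step = 0.01
grid = [-10 + i * step for i in range(2_001)]

def posterior_expected_loss(a):
    # integral of (a - theta)^2 pi(theta | X) dtheta, approximated on the grid
    return sum((a - t) ** 2 * normal_pdf(t, post_mean, post_var) for t in grid) * step

# Minimize posterior expected loss over a grid of candidate actions a:
candidates = [i * 0.01 for i in range(-200, 201)]
best = min(candidates, key=posterior_expected_loss)
print(best, post_mean)  # the grid minimizer coincides with the posterior mean
```

Consistent with the decomposition on this slide, the minimized posterior expected loss equals the posterior variance Var(µ(θ) | X), which no choice of δ(X) can remove.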
