Econ 2148, Fall 2017: Applications of Gaussian process priors


1. Shrinkage: Econ 2148, fall 2017, Applications of Gaussian process priors. Maximilian Kasy, Department of Economics, Harvard University.

2. Shrinkage / Applications from my own work: Agenda
◮ Optimal treatment assignment in experiments.
  ◮ Setting: treatment assignment given baseline covariates.
  ◮ General decision theory result: non-random rules dominate random rules.
  ◮ Prior for the expectation of potential outcomes given covariates.
  ◮ Expression for the MSE of the estimator for the ATE, to be minimized by treatment assignment.
◮ Optimal insurance and taxation.
  ◮ Review: envelope theorem.
  ◮ Economic setting: co-insurance rate for health insurance.
  ◮ Statistical setting: prior for the behavioral average response function.
  ◮ Expression for posterior expected social welfare, to be maximized by choice of the co-insurance rate.

3. Shrinkage: Applications use Gaussian process priors
1. Optimal experimental design
  ◮ How to assign treatment to minimize the mean squared error of a treatment effect estimator?
  ◮ Gaussian process prior for the conditional expectation of potential outcomes given covariates.
2. Optimal insurance and taxation
  ◮ How to choose a co-insurance rate or tax rate to maximize social welfare, given (quasi-)experimental data?
  ◮ Gaussian process prior for the behavioral response function mapping the co-insurance rate into the tax base.

4. Shrinkage / Experimental design: Application 1, “Why experimenters might not always want to randomize”
Setup
1. Sampling: random sample of n units; baseline survey ⇒ vector of covariates $X_i$.
2. Treatment assignment: binary treatment assigned by $D_i = d_i(X, U)$, where X is the matrix of covariates and U a randomization device.
3. Realization of outcomes: $Y_i = D_i Y_i^1 + (1 - D_i) Y_i^0$.
4. Estimation: estimator $\widehat{\beta}$ of the (conditional) average treatment effect, $\beta = \frac{1}{n} \sum_i E[Y_i^1 - Y_i^0 \mid X_i, \theta]$.

5. Shrinkage / Experimental design: Questions
◮ How should we assign treatment?
◮ In particular, if $X_i$ has continuous or many discrete components?
◮ How should we estimate β?
◮ What is the role of prior information?

6. Shrinkage / Experimental design: Some intuition
◮ “Compare apples with apples” ⇒ balance the covariate distribution.
◮ Not just balance of means!
◮ We don’t add random noise to estimators – why add random noise to experimental designs?
◮ Identification requires controlled trials (CTs), but not randomized controlled trials (RCTs).

7. Shrinkage / Experimental design: General decision problem allowing for randomization
◮ General decision problem: state of the world θ, observed data X, randomization device U ⊥ X, decision procedure δ(X, U), loss L(δ(X, U), θ).
◮ Conditional expected loss of the decision procedure δ(X, U):
  $R(\delta, \theta \mid U = u) = E[L(\delta(X, u), \theta) \mid \theta]$
◮ Bayes risk:
  $R_B(\delta, \pi) = \int\!\!\int R(\delta, \theta \mid U = u) \, d\pi(\theta) \, dP(u)$
◮ Minimax risk:
  $R_{mm}(\delta) = \int \max_\theta R(\delta, \theta \mid U = u) \, dP(u)$

8. Shrinkage / Experimental design
Theorem (Optimality of deterministic decisions)
Consider a general decision problem. Let $R^*$ equal $R_B$ or $R_{mm}$. Then:
1. The optimal risk $R^*(\delta^*)$, when considering only deterministic procedures δ(X), is no larger than the optimal risk when allowing for randomized procedures δ(X, U).
2. If the optimal deterministic procedure δ* is unique, then it has strictly lower risk than any non-trivial randomized procedure.

9. Shrinkage / Experimental design: Practice problem
Prove this. Hints:
◮ Assume for simplicity that U has finite support.
◮ Note that a (weighted) average of numbers is always at least as large as their minimum.
◮ Write the risk (Bayes or minimax) of any randomized assignment rule as a (weighted) average of the risks of deterministic rules.

10. Shrinkage / Experimental design: Solution
◮ Any probability distribution P(u) satisfies $\sum_u P(u) = 1$ and $P(u) \geq 0$ for all u.
◮ Thus $\sum_u R_u \cdot P(u) \geq \min_u R_u$ for any set of values $R_u$.
◮ Let $\delta_u(x) = \delta(x, u)$.
◮ Then
  $R_B(\delta, \pi) = \sum_u \left[ \int R(\delta_u, \theta) \, d\pi(\theta) \right] P(u) \geq \min_u \int R(\delta_u, \theta) \, d\pi(\theta) = \min_u R_B(\delta_u, \pi)$.
◮ Similarly
  $R_{mm}(\delta) = \sum_u \left[ \max_\theta R(\delta_u, \theta) \right] P(u) \geq \min_u \max_\theta R(\delta_u, \theta) = \min_u R_{mm}(\delta_u)$.
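The averaging inequality at the heart of this proof is easy to check numerically. A minimal sketch with made-up risk values $R_u$ and randomization weights P(u) (none of these numbers are from the slides):

```python
import numpy as np

# Hypothetical risks R_u of the deterministic rules delta_u, u = 1..4,
# and an arbitrary randomization distribution P(u).
R_u = np.array([3.0, 1.5, 2.2, 4.0])
P_u = np.array([0.4, 0.1, 0.3, 0.2])

risk_randomized = R_u @ P_u          # risk of the randomized rule: sum_u R_u * P(u)
risk_best_deterministic = R_u.min()  # risk of the best deterministic rule

# A weighted average can never fall below the minimum.
assert risk_randomized >= risk_best_deterministic
print(risk_randomized, risk_best_deterministic)  # 2.81 1.5
```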

11. Shrinkage / Experimental design: Bayesian setup
◮ Back to the experimental design setting.
◮ Conditional distribution of potential outcomes: for d = 0, 1,
  $Y_i^d \mid X_i = x \sim N(f(x, d), \sigma^2)$.
◮ Gaussian process prior: $f \sim GP(\mu, C)$, with
  $E[f(x, d)] = \mu(x, d)$,
  $Cov(f(x_1, d_1), f(x_2, d_2)) = C((x_1, d_1), (x_2, d_2))$.
◮ Conditional average treatment effect (CATE):
  $\beta = \frac{1}{n} \sum_i E[Y_i^1 - Y_i^0 \mid X_i, \theta] = \frac{1}{n} \sum_i \left( f(X_i, 1) - f(X_i, 0) \right)$.
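The slides leave the prior moments (μ, C) abstract. For concreteness, here is a minimal sketch of one possible choice, assuming a zero prior mean and a squared-exponential kernel in x with a correlation parameter rho across treatment arms; the kernel form, its parameters, and the helper names are illustrative assumptions, not from the lecture:

```python
import numpy as np

def kernel(x1, d1, x2, d2, length_scale=1.0, rho=0.5, variance=1.0):
    """Prior covariance C((x1,d1),(x2,d2)).
    Assumed form (not from the slides): squared-exponential in x,
    scaled by a correlation rho across treatment arms."""
    k_x = variance * np.exp(-np.sum((x1 - x2) ** 2) / (2 * length_scale ** 2))
    return k_x if d1 == d2 else rho * k_x

def prior_cov_matrix(X, D, **kw):
    """Matrix C with C[i, j] = C((X_i, D_i), (X_j, D_j))."""
    n = len(D)
    return np.array([[kernel(X[i], D[i], X[j], D[j], **kw)
                      for j in range(n)] for i in range(n)])
```

With ρ < 1, f(x, 0) and f(x, 1) are imperfectly correlated, so the treatment effect f(x, 1) − f(x, 0) is itself a nondegenerate Gaussian process.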

12. Shrinkage / Experimental design: Notation
◮ Covariance matrix C, where $C_{i,j} = C((X_i, D_i), (X_j, D_j))$.
◮ Mean vector μ, with components $\mu_i = \mu(X_i, D_i)$.
◮ Covariance of the observations with the CATE,
  $\overline{C}_i = Cov(Y_i, \beta \mid X, D) = \frac{1}{n} \sum_j \left( C((X_i, D_i), (X_j, 1)) - C((X_i, D_i), (X_j, 0)) \right)$.
Practice problem
◮ Derive the posterior expectation $\widehat{\beta}$ of β.
◮ Derive the risk of any deterministic treatment assignment vector d, assuming that
1. the estimator $\widehat{\beta}$ is used, and
2. the loss function $(\widehat{\beta} - \beta)^2$ is considered.

13. Shrinkage / Experimental design: Solution
◮ The posterior expectation $\widehat{\beta}$ of β equals
  $\widehat{\beta} = \mu_\beta + \overline{C}' \cdot (C + \sigma^2 I)^{-1} \cdot (Y - \mu)$.
◮ The corresponding risk equals
  $R_B(d, \widehat{\beta} \mid X) = Var(\beta \mid X, Y) = Var(\beta \mid X) - Var(E[\beta \mid X, Y] \mid X) = Var(\beta \mid X) - \overline{C}' \cdot (C + \sigma^2 I)^{-1} \cdot \overline{C}$.
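Translated into code, the posterior mean and the risk-reduction term follow directly from these formulas. A sketch, reusing numpy and the hypothetical kernel helpers from the block above and keeping the zero-prior-mean assumption (so $\mu = 0$ and $\mu_\beta = 0$):

```python
def cate_cov_vector(X, D, **kw):
    """Cbar_i = Cov(Y_i, beta | X, D)
             = (1/n) sum_j [C((X_i,D_i),(X_j,1)) - C((X_i,D_i),(X_j,0))]."""
    n = len(D)
    return np.array([np.mean([kernel(X[i], D[i], X[j], 1, **kw)
                              - kernel(X[i], D[i], X[j], 0, **kw)
                              for j in range(n)]) for i in range(n)])

def posterior_mean_and_reduction(X, D, Y, sigma2, **kw):
    """Posterior mean of beta (under the assumed zero prior mean) and the
    variance-reduction term Cbar' (C + sigma^2 I)^{-1} Cbar; the Bayes risk
    is Var(beta | X) minus this reduction."""
    C = prior_cov_matrix(X, D, **kw)
    Cbar = cate_cov_vector(X, D, **kw)
    A = np.linalg.solve(C + sigma2 * np.eye(len(D)), np.column_stack([Y, Cbar]))
    beta_hat = Cbar @ A[:, 0]   # mu_beta + Cbar'(C + s2 I)^{-1}(Y - mu), mu = 0
    reduction = Cbar @ A[:, 1]  # Cbar'(C + s2 I)^{-1} Cbar
    return beta_hat, reduction
```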

14. Shrinkage / Experimental design: Discrete optimization
◮ Since $Var(\beta \mid X)$ does not depend on the assignment, the optimal design solves
  $\max_d \, \overline{C}' \cdot (C + \sigma^2 I)^{-1} \cdot \overline{C}$.
◮ Possible optimization algorithms:
1. search over random d,
2. greedy algorithm,
3. simulated annealing.
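As a concrete illustration of option 2, here is a sketch of a coordinate-wise greedy search: flip one unit's treatment at a time and keep the flip whenever the objective increases. This is one plausible implementation, not code from the paper, and it reuses the helpers from the sketches above:

```python
def objective(X, d, sigma2, **kw):
    """Design objective Cbar' (C + sigma^2 I)^{-1} Cbar for assignment d."""
    C = prior_cov_matrix(X, d, **kw)
    Cbar = cate_cov_vector(X, d, **kw)
    return Cbar @ np.linalg.solve(C + sigma2 * np.eye(len(d)), Cbar)

def greedy_design(X, d0, sigma2, max_sweeps=10, **kw):
    """Coordinate-wise greedy search over deterministic assignments d."""
    d = d0.copy()
    best = objective(X, d, sigma2, **kw)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(d)):
            d[i] = 1 - d[i]                    # try flipping unit i
            val = objective(X, d, sigma2, **kw)
            if val > best:
                best, improved = val, True     # keep the flip
            else:
                d[i] = 1 - d[i]                # undo the flip
        if not improved:
            break
    return d, best
```

Simulated annealing (option 3) would modify this by occasionally accepting flips that lower the objective, which helps escape local optima of the discrete search.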

15. Shrinkage / Experimental design: Variation of the problem
Practice problem
◮ Suppose that the researcher insists on estimating β using a simple comparison of means,
  $\widehat{\beta} = \frac{1}{n_1} \sum_i D_i Y_i - \frac{1}{n_0} \sum_i (1 - D_i) Y_i$.
◮ Derive again the risk of any deterministic treatment assignment vector d, assuming that
1. the estimator $\widehat{\beta}$ is used, and
2. the loss function $(\widehat{\beta} - \beta)^2$ is considered.

16. Shrinkage / Experimental design: Solution
◮ Notation: let $\mu_i^d = \mu(X_i, d)$ and $C_{i,j}^{d_1, d_2} = C((X_i, d_1), (X_j, d_2))$.
◮ Collect these terms in the vectors $\mu^d$ and matrices $C^{d_1, d_2}$, and let
  $\widetilde{\mu} = (\mu^0, \mu^1)$,
  $\widetilde{C} = \begin{pmatrix} C^{00} & C^{01} \\ C^{10} & C^{11} \end{pmatrix}$.
◮ Weights $w = (w^0, w^1)$, where
  $w_i^1 = \frac{d_i}{n_1} - \frac{1}{n}$,  $w_i^0 = -\frac{1 - d_i}{n_0} + \frac{1}{n}$.
◮ Risk: sum of variance and squared bias,
  $R_B(d, \widehat{\beta} \mid X) = \sigma^2 \cdot \left( \frac{1}{n_1} + \frac{1}{n_0} \right) + \left( w' \cdot \widetilde{\mu} \right)^2 + w' \cdot \widetilde{C} \cdot w$.
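This risk is again straightforward to evaluate for any candidate assignment. A sketch under the same hypothetical kernel and zero-prior-mean assumptions as above (so the squared-bias term $(w' \widetilde{\mu})^2$ vanishes):

```python
def diff_in_means_risk(X, d, sigma2, **kw):
    """Risk of the difference-in-means estimator for assignment d:
    sigma^2 (1/n1 + 1/n0) + (w' mu_tilde)^2 + w' C_tilde w,
    with mu_tilde = 0 under the assumed zero-mean prior."""
    n = len(d)
    n1, n0 = d.sum(), n - d.sum()
    # Stacked weights w = (w^0, w^1), matching C_tilde's block order.
    w0 = -(1 - d) / n0 + 1 / n
    w1 = d / n1 - 1 / n
    w = np.concatenate([w0, w1])
    # Blocks C^{d1,d2}[i, j] = C((X_i, d1), (X_j, d2)).
    blocks = {(a, b): np.array([[kernel(X[i], a, X[j], b, **kw)
                                 for j in range(n)] for i in range(n)])
              for a in (0, 1) for b in (0, 1)}
    C_tilde = np.block([[blocks[0, 0], blocks[0, 1]],
                        [blocks[1, 0], blocks[1, 1]]])
    return sigma2 * (1 / n1 + 1 / n0) + w @ C_tilde @ w
```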

17. Shrinkage / Experimental design: Special case, linear separable model
◮ Suppose $f(x, d) = x' \cdot \gamma + d \cdot \beta$, with $\gamma \sim N(0, \Sigma)$, and we estimate β using a comparison of means.
◮ The bias of $\widehat{\beta}$ equals $(\overline{X}^1 - \overline{X}^0)' \cdot \gamma$, so the prior expected squared bias is $(\overline{X}^1 - \overline{X}^0)' \cdot \Sigma \cdot (\overline{X}^1 - \overline{X}^0)$.
◮ Mean squared error:
  $MSE(d_1, \ldots, d_n) = \sigma^2 \cdot \left( \frac{1}{n_1} + \frac{1}{n_0} \right) + (\overline{X}^1 - \overline{X}^0)' \cdot \Sigma \cdot (\overline{X}^1 - \overline{X}^0)$.
◮ ⇒ The risk is minimized by
1. choosing treatment and control arms of equal size, and
2. optimizing balance as measured by the difference in covariate means $(\overline{X}^1 - \overline{X}^0)$.
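To see the balance logic at work, here is a small illustrative comparison of a random assignment with a crudely balanced one (simulated covariates, Σ = I; all numbers are made up):

```python
rng = np.random.default_rng(0)
n, k = 100, 3
X = rng.normal(size=(n, k))   # simulated baseline covariates
sigma2, Sigma = 1.0, np.eye(k)

def mse_linear(X, d, sigma2, Sigma):
    """MSE(d) = sigma^2 (1/n1 + 1/n0) + (Xbar1 - Xbar0)' Sigma (Xbar1 - Xbar0)."""
    n1, n0 = d.sum(), len(d) - d.sum()
    diff = X[d == 1].mean(axis=0) - X[d == 0].mean(axis=0)
    return sigma2 * (1 / n1 + 1 / n0) + diff @ Sigma @ diff

d_random = rng.integers(0, 2, size=n)
# A simple balanced design: sort on the first covariate, alternate arms.
d_balanced = np.zeros(n, dtype=int)
d_balanced[np.argsort(X[:, 0])[::2]] = 1

# The balanced design has equal arm sizes and smaller mean imbalance,
# so its MSE is (weakly) lower than the random assignment's.
print(mse_linear(X, d_random, sigma2, Sigma),
      mse_linear(X, d_balanced, sigma2, Sigma))
```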

18. Shrinkage / Envelope theorem: Review for application 2, the envelope theorem
◮ Policy parameter t.
◮ Vector of individual choices x.
◮ Choice set X.
◮ Individual utility υ(x, t).
◮ Realized choices
  $x(t) \in \operatorname{argmax}_{x \in X} \upsilon(x, t)$.
◮ Realized utility
  $V(t) = \max_{x \in X} \upsilon(x, t) = \upsilon(x(t), t)$.

19. Shrinkage / Envelope theorem
◮ Let $x^* = x(t^*)$ for some fixed $t^*$.
◮ Define
  $\widetilde{V}(t) = V(t) - \upsilon(x^*, t)$  (1)
  $= \upsilon(x(t), t) - \upsilon(x(t^*), t)$
  $= \max_{x \in X} \upsilon(x, t) - \upsilon(x^*, t)$.  (2)
◮ The definition of $\widetilde{V}$ immediately implies: $\widetilde{V}(t) \geq 0$ for all t, and $\widetilde{V}(t^*) = 0$.
◮ Thus $t^*$ is a global minimizer of $\widetilde{V}$.
◮ If $\widetilde{V}$ is differentiable at $t^*$: $\widetilde{V}'(t^*) = 0$.
◮ Thus
  $V'(t^*) = \frac{\partial}{\partial t} \upsilon(x^*, t) \big|_{t = t^*}$.
◮ Behavioral responses don’t matter for the effect of a policy change on individual utility!
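A quick numerical check of the envelope formula, using the made-up utility $\upsilon(x, t) = t x - x^2 / 2$, for which $x(t) = t$ and $V(t) = t^2 / 2$:

```python
def upsilon(x, t):
    """Made-up utility; the optimal choice is x(t) = t."""
    return t * x - x ** 2 / 2

t_star, h = 0.7, 1e-6
x_star = t_star                     # x* = x(t*)

def V(t):                           # V(t) = max_x upsilon(x, t) = upsilon(x(t), t)
    return upsilon(t, t)

dV = (V(t_star + h) - V(t_star - h)) / (2 * h)            # V'(t*), numerically
d_upsilon = (upsilon(x_star, t_star + h)
             - upsilon(x_star, t_star - h)) / (2 * h)     # d/dt upsilon(x*, t) at t*

print(dV, d_upsilon)                # both approximately t* = 0.7
```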

20. Shrinkage / Optimal insurance: Application 2, “Optimal insurance and taxation using machine learning”
Economic setting
◮ Population of insured individuals i.
◮ $Y_i$: health care expenditures of individual i.
◮ $T_i$: share of health care expenditures covered by the insurance; $1 - T_i$: co-insurance rate; $Y_i \cdot (1 - T_i)$: out-of-pocket expenditures.
◮ Behavioral response to the share covered: structural function $Y_i = g(T_i, \varepsilon_i)$.
◮ Per capita expenditures under policy t: average structural function $m(t) = E[g(t, \varepsilon_i)]$.
