Approximate Cross-Validation and Dynamic Experiments for Policy Choice

Maximilian Kasy, Department of Economics, Harvard University

April 23, 2018


  1. Title slide: Approximate Cross-Validation and Dynamic Experiments for Policy Choice. Maximilian Kasy, Department of Economics, Harvard University. April 23, 2018.

  2. Introduction

     ◮ Two separate, early-stage projects:
       1. Approximate cross-validation
          ◮ First-order approximation to the leave-one-out estimator.
          ◮ Relationship to Stein's unbiased risk estimator.
          ◮ Accelerated tuning.
          ◮ Joint with Lester Mackey, MSR.
       2. Dynamic experiments for policy choice
          ◮ Experimental design problem for choosing a discrete treatment.
          ◮ Goal: maximize average outcome.
          ◮ Multiple waves.
          ◮ Joint with Anja Sautmann, J-PAL.
     ◮ Feedback appreciated!

  3. Project 1: Approximate cross-validation

     ◮ Different ways of estimating risk (mean squared error):
       ◮ covariance penalties,
       ◮ Stein's Unbiased Risk Estimate (SURE),
       ◮ cross-validation (CV).
     ◮ Result 1:
       ◮ Consider repeated draws of some vector.
       ◮ Then CV for estimating the mean is approximately equal to SURE,
       ◮ without normality or known variance!
     ◮ Result 2:
       ◮ Consider a penalized M-estimation problem.
       ◮ Then CV for prediction loss is approximately equal to in-sample risk plus a penalty,
       ◮ with a simple penalty based on the gradient and Hessian.
     ◮ ⇒ An algorithm for accelerated tuning!

  4. The normal means model

     ◮ $\theta, X \in \mathbb{R}^k$, with $X \sim N(\theta, \Sigma)$.
     ◮ Estimator $\hat\theta(X)$ of $\theta$ ("almost differentiable").
     ◮ Mean squared error:
       $$\mathrm{MSE}(\hat\theta, \theta) = \tfrac{1}{k}\, E_\theta\big[\|\hat\theta - \theta\|^2\big] = \tfrac{1}{k} \sum_j E_\theta\big[(\hat\theta_j - \theta_j)^2\big].$$
     ◮ Would like to estimate $\mathrm{MSE}(\hat\theta, \theta)$:
       ◮ Choose tuning parameters to minimize estimated MSE.
       ◮ Choose between estimators to minimize estimated MSE.
       ◮ Theoretical tool for proving dominance results.
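
As a concrete illustration of the object being estimated, here is a small Monte Carlo sketch (my own, not from the talk): it approximates $\mathrm{MSE}(\hat\theta, \theta)$ for the illustrative linear shrinkage estimator $\hat\theta(X) = cX$ with $\Sigma = \sigma^2 I$, and compares it to the closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
k, sigma, c = 50, 1.0, 0.8
theta = rng.normal(size=k)                  # true mean vector

mse_draws = []
for _ in range(10_000):
    x = theta + sigma * rng.normal(size=k)  # X ~ N(theta, sigma^2 I)
    mse_draws.append(np.mean((c * x - theta) ** 2))

print("Monte Carlo MSE:", np.mean(mse_draws))
# Closed form for theta_hat = c X: (1/k) sum_j [(1-c)^2 theta_j^2 + c^2 sigma^2]
print("Closed form:    ", np.mean((1 - c) ** 2 * theta**2 + c**2 * sigma**2))
```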

  5. Covariance penalty

     ◮ Efron (2004): Adding and subtracting $\theta_j$ gives
       $$(\hat\theta_j - X_j)^2 = (\hat\theta_j - \theta_j)^2 + 2(\hat\theta_j - \theta_j)(\theta_j - X_j) + (\theta_j - X_j)^2.$$
     ◮ Thus $\mathrm{MSE}(\hat\theta, \theta) = \tfrac{1}{k} \sum_j \mathrm{MSE}_j$, where
       $$\begin{aligned}
       \mathrm{MSE}_j &= E_\theta\big[(\hat\theta_j - \theta_j)^2\big] \\
       &= E_\theta\big[(\hat\theta_j - X_j)^2\big] + 2\, E_\theta\big[(\hat\theta_j - \theta_j)(X_j - \theta_j)\big] - E_\theta\big[(X_j - \theta_j)^2\big] \\
       &= E_\theta\big[(\hat\theta_j - X_j)^2\big] + 2\, \mathrm{Cov}_\theta(\hat\theta_j, X_j) - \mathrm{Var}_\theta(X_j).
       \end{aligned}$$
     ◮ First term: in-sample prediction error (observed).
     ◮ Second term: covariance penalty (depends on unobserved $\theta$).
     ◮ Third term: doesn't depend on $\hat\theta$.
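
A quick numeric check of this decomposition (my own construction; the estimator $\hat\theta = cX$ and $\Sigma = \sigma^2 I$ are arbitrary illustrative choices), verifying coordinate by coordinate that true risk equals in-sample error plus twice the covariance penalty minus the variance:

```python
import numpy as np

rng = np.random.default_rng(1)
k, sigma, reps, c = 20, 1.0, 200_000, 0.7
theta = np.linspace(-2.0, 2.0, k)

X = theta + sigma * rng.normal(size=(reps, k))
TH = c * X                                          # theta_hat, per draw

lhs = ((TH - theta) ** 2).mean(axis=0)              # true risk MSE_j
insample = ((TH - X) ** 2).mean(axis=0)             # observed term
cov = ((TH - TH.mean(0)) * (X - X.mean(0))).mean(axis=0)  # Cov(theta_hat_j, X_j)
rhs = insample + 2 * cov - sigma**2                 # Var(X_j) = sigma^2

print(np.max(np.abs(lhs - rhs)))                    # ~0 up to simulation noise
```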

  6. Stein's Unbiased Risk Estimate

     ◮ Using integration by parts and the fact that $\varphi'(x) = -x \cdot \varphi(x)$, one can show
       $$\mathrm{MSE} = \tfrac{1}{k}\, E_\theta\Big[\|\hat\theta - X\|^2 + 2\, \mathrm{trace}\big(\hat\theta'(X) \cdot \Sigma\big) - \mathrm{trace}(\Sigma)\Big],$$
       where $\hat\theta'(X)$ denotes the Jacobian of $\hat\theta$ with respect to $X$.
     ◮ All terms inside the expectation are observed! Sample version:
       $$\mathrm{SURE} = \tfrac{1}{k}\Big[\|\hat\theta - X\|^2 + 2\, \mathrm{trace}\big(\hat\theta'(X) \cdot \Sigma\big) - \mathrm{trace}(\Sigma)\Big].$$
     ◮ Key assumptions used:
       ◮ $X$ is normally distributed.
       ◮ $\Sigma$ is known.
       ◮ $\hat\theta$ is almost differentiable.
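
A minimal SURE sketch (my construction, assuming $\Sigma = \sigma^2 I$) for the soft-thresholding estimator $\hat\theta_j = \mathrm{sign}(X_j)\max(|X_j| - \lambda, 0)$, whose Jacobian trace is simply the number of coordinates above the threshold; the sparse $\theta$ is an illustrative choice:

```python
import numpy as np

def soft(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sure_soft(x, lam, sigma):
    k = x.size
    div = np.sum(np.abs(x) > lam)        # trace of the Jacobian of theta_hat
    return (np.sum((soft(x, lam) - x) ** 2)
            + 2 * sigma**2 * div - k * sigma**2) / k

rng = np.random.default_rng(2)
k, sigma = 200, 1.0
theta = np.concatenate([np.full(20, 3.0), np.zeros(k - 20)])  # sparse means
x = theta + sigma * rng.normal(size=k)   # X ~ N(theta, sigma^2 I)

for lam in (0.5, 1.0, 2.0):
    realized = np.mean((soft(x, lam) - theta) ** 2)   # unobservable loss
    print(lam, round(sure_soft(x, lam, sigma), 3), round(realized, 3))
```

SURE tracks the unobservable loss without ever seeing $\theta$, which is what makes it usable for choosing $\lambda$.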

  7. Cross-validation

     ◮ Assume a panel structure: $X$ is a sample average over $i = 1, \ldots, n$, with components $j = 1, \ldots, k$:
       $$Y_i \sim \text{i.i.d.}\,(\theta, n \cdot \Sigma), \qquad X = \tfrac{1}{n} \sum_i Y_i.$$
     ◮ Leave-one-out mean and estimator:
       $$X_{-i} = \tfrac{1}{n-1} \sum_{i' \neq i} Y_{i'}, \qquad \hat\theta_{-i} = \hat\theta(X_{-i}).$$
     ◮ $n$-fold cross-validation:
       $$\mathrm{CV}_i = \|Y_i - \hat\theta_{-i}\|^2, \qquad \mathrm{CV} = \tfrac{1}{n} \sum_i \mathrm{CV}_i.$$

  8. Large $n$: SURE ≈ CV

     Proposition. Suppose $\hat\theta(\cdot)$ is continuously differentiable in a neighborhood of $\theta$, and suppose $X^n = \tfrac{1}{n} \sum_i Y_i^n$ with $(Y_i^n - \theta)/\sqrt{n}$ i.i.d. with expectation 0 and variance $\Sigma$. Let $\hat\Sigma^n = \tfrac{1}{n^2} \sum_i (Y_i^n - X^n)(Y_i^n - X^n)'$. Then, as $n \to \infty$,
     $$\mathrm{CV}^n = \|X^n - \hat\theta^n\|^2 + 2\, \mathrm{trace}\big(\hat\theta' \cdot \hat\Sigma^n\big) + (n-1)\, \mathrm{trace}(\hat\Sigma^n) + o_p(1/n).$$

     ◮ New result, I believe.
     ◮ "For large $n$, CV is the same as SURE, plus the irreducible forecasting error $n \cdot \mathrm{trace}(\Sigma) = E_\theta\big[\|Y_i - \theta\|^2\big]$."
     ◮ Does not require:
       ◮ normality,
       ◮ known $\Sigma$!
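
A simulation sketch of the proposition (my own; the shrinkage estimator $\hat\theta(x) = cx$, for which $\hat\theta' = cI$, is an illustrative choice), comparing brute-force $n$-fold CV with the right-hand side of the display:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, c = 500, 5, 0.8                    # theta_hat(x) = c x, so theta_hat' = c I
theta = np.arange(1.0, k + 1.0)
Y = theta + rng.normal(size=(n, k))      # Y_i i.i.d., variance I (so Sigma = I/n)

X = Y.mean(axis=0)
Sigma_hat = (Y - X).T @ (Y - X) / n**2   # as in the proposition

cv = 0.0
for i in range(n):
    X_mi = (n * X - Y[i]) / (n - 1)      # leave-one-out mean X_{-i}
    cv += np.sum((Y[i] - c * X_mi) ** 2) # CV_i
cv /= n

rhs = (np.sum((X - c * X) ** 2)
       + 2 * c * np.trace(Sigma_hat)     # 2 trace(theta_hat' Sigma_hat)
       + (n - 1) * np.trace(Sigma_hat))
print(cv, rhs)                           # agree up to o_p(1/n)
```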

  9. Sketch of proof

     ◮ Let $s = \sqrt{n-1}$ and omit the superscript $n$. Define
       $$U_i = \tfrac{1}{s}(Y_i - X), \quad U_i \sim (0, \Sigma), \qquad X_{-i} = X - \tfrac{1}{s} U_i, \qquad Y_i = X + s U_i,$$
       $$\hat\theta(X_{-i}) = \hat\theta(X) - \tfrac{1}{s}\, \hat\theta'(X) \cdot U_i + \Delta_i, \quad \Delta_i = o\big(\tfrac{1}{s}\|U_i\|\big), \qquad \hat\Sigma = \tfrac{1}{n} \sum_i U_i U_i'.$$
     ◮ Then
       $$\begin{aligned}
       \mathrm{CV}_i = \|Y_i - \hat\theta_{-i}\|^2 &= \big\|X + s U_i - \big(\hat\theta(X) - \tfrac{1}{s}\, \hat\theta'(X) \cdot U_i + \Delta_i\big)\big\|^2 \\
       &= \|X - \hat\theta\|^2 + 2\big\langle U_i,\, \hat\theta'(X) \cdot U_i \big\rangle + s^2 \|U_i\|^2 \\
       &\quad + 2\big\langle X - \hat\theta,\, \big(s + \tfrac{1}{s}\hat\theta'\big) U_i \big\rangle + \tfrac{1}{s^2} \big\|\hat\theta'(X) \cdot U_i\big\|^2 + 2\big\langle \Delta_i,\, Y_i - \hat\theta_{-i} \big\rangle.
       \end{aligned}$$
     ◮ Averaging over $i$ (the cross term with $X - \hat\theta$ vanishes, since $\sum_i U_i = 0$):
       $$\mathrm{CV} = \tfrac{1}{n} \sum_i \mathrm{CV}_i = \|X - \hat\theta\|^2 + 2\, \mathrm{trace}\big(\hat\theta' \cdot \hat\Sigma\big) + (n-1)\, \mathrm{trace}(\hat\Sigma) + 0 + o_p(1/n).$$

  10. More general setting: Penalized M-estimation

     ◮ Suppose $\beta = \mathrm{argmin}_b\; E[m(X, b)]$.
     ◮ Estimate $\beta$ using penalized M-estimation:
       $$\hat\beta(\lambda) = \mathrm{argmin}_b\; \sum_i m(X_i, b) + \pi(b, \lambda).$$
     ◮ Would like to choose $\lambda$ to minimize the out-of-sample prediction error
       $$R(\lambda) = E\big[m\big(X, \hat\beta(\lambda)\big)\big].$$
     ◮ Leave-one-out estimator and $n$-fold cross-validation:
       $$\hat\beta_{-i}(\lambda) = \mathrm{argmin}_b\; \sum_{j \neq i} m(X_j, b) + \pi(b, \lambda), \qquad \mathrm{CV}(\lambda) = \tfrac{1}{n} \sum_i m\big(X_i, \hat\beta_{-i}(\lambda)\big).$$
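
To ground the notation, a brute-force sketch (my example, not from the talk) using ridge regression as the penalized M-estimation problem, with data $X_i = (w_i, y_i)$, loss $m((w, y), b) = (y - w'b)^2$, and penalty $\pi(b, \lambda) = \lambda \|b\|^2$; note the $n$ re-fits per value of $\lambda$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 100, 5
W = rng.normal(size=(n, p))                       # regressors
y = W @ np.array([1.0, -1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

def ridge(W, y, lam):
    """beta_hat(lam) = argmin_b sum_i (y_i - w_i'b)^2 + lam * ||b||^2."""
    return np.linalg.solve(W.T @ W + lam * np.eye(W.shape[1]), W.T @ y)

def cv_brute_force(W, y, lam):
    """n-fold CV, re-estimating beta_hat_{-i}(lam) for every i."""
    n = len(y)
    losses = []
    for i in range(n):
        keep = np.arange(n) != i
        b_mi = ridge(W[keep], y[keep], lam)       # beta_hat_{-i}(lam)
        losses.append((y[i] - W[i] @ b_mi) ** 2)  # m(X_i, beta_hat_{-i})
    return np.mean(losses)

print(cv_brute_force(W, y, lam=1.0))
```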

  11. ◮ Computationally costly to re-estimate $\beta$ for every choice of $i$ and $\lambda$!
     ◮ Notation for the Hessian and gradients:
       $$H = \sum_j m_{bb}\big(X_j, \hat\beta(\lambda)\big) + \pi_{bb}\big(\hat\beta(\lambda), \lambda\big), \qquad g_i = m_b\big(X_i, \hat\beta(\lambda)\big).$$
     ◮ First-order approximation to the leave-one-out estimator (assuming second derivatives exist):
       $$\hat\beta_{-i}(\lambda) - \hat\beta(\lambda) \approx H^{-1} \cdot g_i.$$
     ◮ In-sample prediction error:
       $$\bar R(\lambda) = \tfrac{1}{n} \sum_i m\big(X_i, \hat\beta(\lambda)\big).$$

  12. ◮ Another first-order approximation:
       $$\mathrm{CV}(\lambda) \approx \bar R(\lambda) + \tfrac{1}{n} \sum_i g_i' \cdot \big(\hat\beta_{-i}(\lambda) - \hat\beta(\lambda)\big).$$
     ◮ Combining the two approximations:
       $$\mathrm{CV}(\lambda) \approx \bar R(\lambda) + \tfrac{1}{n} \sum_i g_i' \cdot H^{-1} \cdot g_i.$$
     ◮ $\bar R$, the $g_i$, and $H$ are automatically available if Newton-Raphson was used to find $\hat\beta(\lambda)$!
     ◮ If not, they can be approximated without bias using a random subsample.
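
Continuing the ridge sketch above (same illustrative setup), the combined approximation replaces the $n$ re-fits with a single fit plus one linear solve; here $g_i = -2(y_i - w_i'\hat\beta)\, w_i$ and $H = 2(W'W + \lambda I)$:

```python
import numpy as np

def cv_approx(W, y, lam):
    """CV(lam) ~ R_bar(lam) + (1/n) sum_i g_i' H^{-1} g_i, for ridge."""
    n, p = W.shape
    b = np.linalg.solve(W.T @ W + lam * np.eye(p), W.T @ y)
    resid = y - W @ b
    R_bar = np.mean(resid ** 2)              # in-sample prediction error
    H = 2 * (W.T @ W + lam * np.eye(p))      # sum_j m_bb + pi_bb
    G = -2 * resid[:, None] * W              # row i is g_i'
    Hinv = np.linalg.inv(H)
    correction = np.mean(np.einsum("ip,pq,iq->i", G, Hinv, G))
    return R_bar + correction

# With W, y from the brute-force sketch above, the two versions agree
# closely, at the cost of a single fit plus one linear solve:
# print(cv_approx(W, y, 1.0), cv_brute_force(W, y, 1.0))
```

For this example the correction term reduces to twice the squared residual times the ridge leverage, which matches the exact leave-one-out loss to first order.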

  13. Open questions

     ◮ Implementation!
     ◮ Regularity conditions for the validity of the approximations?
     ◮ Speed gains in tuning, e.g., of neural nets?
     ◮ Efficiency gains relative to wasteful sample-partitioning methods?

  14. Project 2: Dynamic experiments for policy choice

     ◮ Setup:
       ◮ Optimal treatment assignment (multiple treatments)
       ◮ in multi-wave experiments.
     ◮ Goal: after the experiment, choose a policy
       ◮ to maximize welfare (average outcome net of costs).
     ◮ Dynamic stochastic optimization problem,
       ◮ used normatively (for the experimenter) rather than descriptively (as in structural models).
     ◮ Solution via exact backward induction.
     ◮ Outline:
       1. Setup: $\bar d$ treatments, binary outcomes, $T$ waves.
       2. Objective function: social welfare, max over treatments.
       3. Independent Beta priors for mean potential outcomes.
       4. Value functions, backward induction.

  15. Setup

     ◮ Waves $t = 1, \ldots, T$, sample sizes $N_t$.
     ◮ Treatment $D \in \{1, \ldots, \bar d\}$, outcomes $Y \in \{0, 1\}$, potential outcomes $Y^d$,
       $$Y_{it} = \sum_{d=1}^{\bar d} \mathbf{1}(D_{it} = d)\, Y_{it}^d.$$
     ◮ $(Y_{it}^1, \ldots, Y_{it}^{\bar d})$ are i.i.d. across both $i$ and $t$.
     ◮ Denote
       $$\theta^d = E\big[Y_{it}^d\big], \qquad n_t^d = \sum_i \mathbf{1}(D_{it} = d), \qquad s_t^d = \sum_i \mathbf{1}(D_{it} = d,\, Y_{it} = 1).$$
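
A tiny sketch (my own illustrative numbers) of these objects, drawing one wave's assignment counts and success counts:

```python
import numpy as np

rng = np.random.default_rng(5)
theta = np.array([0.3, 0.5, 0.6])   # theta^d = E[Y^d] for d_bar = 3 arms
n_t = np.array([20, 20, 20])        # assignment counts n_t^d in wave t
s_t = rng.binomial(n_t, theta)      # success counts s_t^d
print("n_t:", n_t, "s_t:", s_t)
```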

  16. Treatment assignment, outcomes, state space

     ◮ Treatment assignment in wave $t$: $n_t = (n_t^1, \ldots, n_t^{\bar d})$.
     ◮ Outcomes of wave $t$: $s_t = (s_t^1, \ldots, s_t^{\bar d})$.
     ◮ Cumulative versions:
       $$M_t = \sum_{t' \leq t} N_{t'}, \qquad m_t = (m_t^1, \ldots, m_t^{\bar d}) = \sum_{t' \leq t} n_{t'}, \qquad r_t = (r_t^1, \ldots, r_t^{\bar d}) = \sum_{t' \leq t} s_{t'}.$$
     ◮ The relevant information for the experimenter in period $t + 1$ is summarized by $m_t$ and $r_t$.
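
To make the backward induction concrete, a toy sketch that is entirely my own construction: it assumes independent uniform Beta(1, 1) priors on each $\theta^d$, $\bar d = 2$ arms, two waves of four units each, and welfare equal to the posterior-expected outcome of the arm chosen after the last wave; the talk's actual objective and priors may differ in detail. The state is exactly the pair $(m_t, r_t)$ from this slide:

```python
from functools import lru_cache
from math import comb, lgamma, exp

D = 2                  # number of treatments (d_bar)
WAVES = [4, 4]         # sample sizes N_1, ..., N_T

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def pred_prob(n, s, a, b):
    """Beta-Binomial predictive prob. of s successes in n draws, Beta(a, b) posterior."""
    return comb(n, s) * exp(log_beta(s + a, n - s + b) - log_beta(a, b))

def post_mean(m_d, r_d):
    """Posterior mean of theta^d under a Beta(1, 1) prior: Beta(1 + r_d, 1 + m_d - r_d)."""
    return (1 + r_d) / (2 + m_d)

@lru_cache(maxsize=None)
def value(t, m, r):
    """Expected welfare from state (m_t, r_t) = (m, r) under optimal continuation."""
    if t == len(WAVES):                    # terminal: implement best-looking arm
        return max(post_mean(m[d], r[d]) for d in range(D))
    N, best = WAVES[t], -1.0
    for n0 in range(N + 1):                # candidate assignment (n0, N - n0)
        n1 = N - n0
        ev = 0.0
        for s0 in range(n0 + 1):           # integrate over wave-t outcomes
            for s1 in range(n1 + 1):
                w = (pred_prob(n0, s0, 1 + r[0], 1 + m[0] - r[0])
                     * pred_prob(n1, s1, 1 + r[1], 1 + m[1] - r[1]))
                ev += w * value(t + 1, (m[0] + n0, m[1] + n1),
                                (r[0] + s0, r[1] + s1))
        best = max(best, ev)
    return best

print(value(0, (0, 0), (0, 0)))            # value of the optimal two-wave design
```

The memoization via lru_cache is what makes the state-space formulation pay off: many assignment and outcome paths lead to the same cumulative state $(m, r)$, so each state's value is computed only once.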
