

1. Estimating Nonlinear Functions of Means
Peter J. Haas
CS 590M: Simulation, Spring Semester 2020

2. Estimating Nonlinear Functions of Means: Overview
- Delta Method
- Jackknife Method
- Bootstrap Confidence Intervals
- Complete Bias Elimination

3. Nonlinear Functions of Means
Our focus up until now:
- Estimate quantities of the form $\mu = E[X]$
- E.g., expected win/loss of a gambling game
- We'll now focus on more complex quantities
Nonlinear functions of means: $\alpha = g(\mu_1, \mu_2, \ldots, \mu_d)$, where
- $g$ is a nonlinear function
- $\mu_i = E[X^{(i)}]$ for $1 \le i \le d$
- For simplicity, take $d = 2$ and focus on $\alpha = g(\mu_X, \mu_Y)$, where $\mu_X = E[X]$ and $\mu_Y = E[Y]$

4. Example: Retail Outlet
- Goal: Estimate $\alpha$ = long-run average revenue per customer
- $X_i = R_i$ = revenue generated on day $i$
- $Y_i$ = number of customers on day $i$
- Assume that the pairs $(X_1, Y_1), (X_2, Y_2), \ldots$ are i.i.d.
- Set $\bar X_n = (1/n) \sum_{i=1}^n X_i$ and $\bar Y_n = (1/n) \sum_{i=1}^n Y_i$; then

$$\alpha = \lim_{n \to \infty} \frac{X_1 + \cdots + X_n}{Y_1 + \cdots + Y_n} = \lim_{n \to \infty} \frac{\bar X_n}{\bar Y_n} = \frac{\mu_X}{\mu_Y}$$

- So $\alpha = g(\mu_X, \mu_Y)$, where $g(x, y) = x / y$

5. Example: Higher-Order Moments
- Let $R_1, R_2, \ldots$ be daily revenues as before
- Assume that the $R_i$'s are i.i.d. (Critique?)
- $\alpha = \mathrm{Var}[R]$ = variance of daily revenue
- Let $X = R^2$ and $Y = R$; then $\alpha = E[R^2] - (E[R])^2 = \mu_X - \mu_Y^2$
- So $\alpha = g(\mu_X, \mu_Y)$, where $g(x, y) = x - y^2$
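As a quick illustration of this plug-in idea, here is a minimal Python sketch; the gamma revenue distribution and all parameter values are assumptions, chosen only to generate synthetic data:

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic daily revenues; the gamma distribution is an assumption.
R = rng.gamma(shape=2.0, scale=50.0, size=10_000)

# Plug-in estimate of alpha = Var[R] via g(mu_X, mu_Y) = mu_X - mu_Y^2,
# with X = R^2 and Y = R.
x_bar = np.mean(R**2)   # estimates mu_X = E[R^2]
y_bar = np.mean(R)      # estimates mu_Y = E[R]
alpha_n = x_bar - y_bar**2
print(alpha_n, np.var(R))  # np.var (ddof=0) computes the same plug-in quantity
```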

6. Estimating Nonlinear Functions of Means: Overview
- Delta Method
- Jackknife Method
- Bootstrap Confidence Intervals
- Complete Bias Elimination

7. Delta Method (Taylor Series)
Assume that the function $g(x, y)$ is smooth:
- Continuously differentiable in a neighborhood of $(\mu_X, \mu_Y)$
- I.e., $g$ is continuous, as are $\partial g / \partial x$ and $\partial g / \partial y$
Point estimate:
- Run the simulation $n$ times to get $(X_1, Y_1), \ldots, (X_n, Y_n)$ i.i.d.
- Set $\alpha_n = g(\bar X_n, \bar Y_n)$
This estimator is biased:
- $E[\alpha_n] = E[g(\bar X_n, \bar Y_n)] \ne g(E[\bar X_n], E[\bar Y_n]) = g(\mu_X, \mu_Y) = \alpha$
- Jensen's inequality: $E[\alpha_n] = E[g(\bar X_n)] \ge g(\mu_X) = \alpha$ if $g$ is convex
- By the SLLN and continuity of $g$, the bias $\to 0$ as $n \to \infty$ (the estimator $\alpha_n$ is asymptotically unbiased and strongly consistent)

8. Delta Method, Continued
Confidence interval:
- $(\bar X_n, \bar Y_n)$ should be "close" to $(\mu_X, \mu_Y)$ for large $n$ by the SLLN
- So $\alpha_n = g(\bar X_n, \bar Y_n)$ should be close to $g(\mu_X, \mu_Y) = \alpha$, and a first-order Taylor expansion gives

$$\alpha_n - \alpha = g(\bar X_n, \bar Y_n) - g(\mu_X, \mu_Y) \approx \frac{\partial g}{\partial x}(\mu_X, \mu_Y) \cdot (\bar X_n - \mu_X) + \frac{\partial g}{\partial y}(\mu_X, \mu_Y) \cdot (\bar Y_n - \mu_Y) = \bar Z_n$$

- where $Z_i = c(X_i - \mu_X) + d(Y_i - \mu_Y)$ and $\bar Z_n = (1/n) \sum_{i=1}^n Z_i$
- with $c = \frac{\partial g}{\partial x}(\mu_X, \mu_Y)$ and $d = \frac{\partial g}{\partial y}(\mu_X, \mu_Y)$

9. Delta Method, Continued
Confidence interval, continued:
- $\{Z_n : n \ge 1\}$ are i.i.d. as $Z = c(X - \mu_X) + d(Y - \mu_Y)$, with $E[Z] = 0$
- By the CLT, $\sqrt{n}\,\bar Z_n / \sigma \sim N(0, 1)$ approximately for large $n$
- Thus $\sqrt{n}\,(\alpha_n - \alpha) / \sigma \sim N(0, 1)$ approximately for large $n$
- Here $\sigma^2 = \mathrm{Var}[Z] = E[Z^2] = E\bigl[\bigl(c(X - \mu_X) + d(Y - \mu_Y)\bigr)^2\bigr]$
- So the asymptotic $100(1 - \delta)\%$ CI is $\alpha_n \pm z_\delta \sigma / \sqrt{n}$
- $z_\delta$ is the $1 - (\delta/2)$ quantile of the standard normal distribution
- Estimate $c$, $d$, and $\sigma$ from the data

10. Delta Method, Continued
Delta Method CI Algorithm:
1. Simulate to get $(X_1, Y_1), \ldots, (X_n, Y_n)$ i.i.d.
2. $\alpha_n \leftarrow g(\bar X_n, \bar Y_n)$
3. $c_n \leftarrow \frac{\partial g}{\partial x}(\bar X_n, \bar Y_n)$ and $d_n \leftarrow \frac{\partial g}{\partial y}(\bar X_n, \bar Y_n)$
4. $s_n^2 \leftarrow (n - 1)^{-1} \sum_{i=1}^n \bigl(c_n(X_i - \bar X_n) + d_n(Y_i - \bar Y_n)\bigr)^2$
5. Return the asymptotic $100(1 - \delta)\%$ CI: $\bigl[\alpha_n - z_\delta s_n / \sqrt{n},\; \alpha_n + z_\delta s_n / \sqrt{n}\bigr]$
- The SLLN and continuity assumptions imply that, with probability 1, $c_n \to c$, $d_n \to d$, and $s_n^2 \to \sigma^2$
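A minimal Python sketch of this algorithm follows; the function name and interface are our own choice, with the user supplying $g$ and its partial derivatives as callables:

```python
import numpy as np
from scipy.stats import norm

def delta_method_ci(x, y, g, dg_dx, dg_dy, delta=0.05):
    """Asymptotic 100(1-delta)% CI for alpha = g(mu_X, mu_Y).

    x, y: paired i.i.d. samples; g, dg_dx, dg_dy: callables for g and its
    partial derivatives (matching step 3 of the algorithm above).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    x_bar, y_bar = x.mean(), y.mean()
    alpha_n = g(x_bar, y_bar)                            # step 2: plug-in estimate
    c_n, d_n = dg_dx(x_bar, y_bar), dg_dy(x_bar, y_bar)  # step 3
    z = c_n * (x - x_bar) + d_n * (y - y_bar)            # linearized observations
    s2_n = z @ z / (n - 1)                               # step 4: variance estimate
    hw = norm.ppf(1 - delta / 2) * np.sqrt(s2_n / n)     # z_delta * s_n / sqrt(n)
    return alpha_n, (alpha_n - hw, alpha_n + hw)
```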

11. Example: Ratio Estimation: $g(x, y) = x / y$
Multi-pass method (apply the previous algorithm directly):
- $\alpha = \mu_X / \mu_Y$
- $c = \frac{\partial g}{\partial x}(\mu_X, \mu_Y) = 1 / \mu_Y$ and $d = \frac{\partial g}{\partial y}(\mu_X, \mu_Y) = -\mu_X / \mu_Y^2$
- $\alpha_n = \bar X_n / \bar Y_n$, $c_n = 1 / \bar Y_n$, and $d_n = -\bar X_n / \bar Y_n^2$

$$s_n^2 = (n - 1)^{-1} \sum_{i=1}^n \bigl(c_n(X_i - \bar X_n) + d_n(Y_i - \bar Y_n)\bigr)^2$$
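Using the delta_method_ci sketch from the previous slide, the multi-pass ratio estimator plugs these derivatives in directly; the data-generating distributions below are assumptions chosen only for illustration:

```python
import numpy as np

# Hypothetical daily data: customer counts and revenues (assumed distributions).
rng = np.random.default_rng(1)
Y = rng.poisson(100, size=5000).astype(float)   # customers per day
X = Y * rng.gamma(2.0, 10.0, size=5000)         # revenue per day

alpha_n, ci = delta_method_ci(
    X, Y,
    g=lambda x, y: x / y,
    dg_dx=lambda x, y: 1.0 / y,       # c_n = 1 / Ybar_n
    dg_dy=lambda x, y: -x / y**2,     # d_n = -Xbar_n / Ybar_n^2
)
print(alpha_n, ci)
```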

12. Example: Ratio Estimation: $g(x, y) = x / y$
Single-pass method:

$$\sigma^2 = \mathrm{Var}[Z] = \mathrm{Var}\bigl[c(X - \mu_X) + d(Y - \mu_Y)\bigr] = \frac{\mathrm{Var}[X] - 2\alpha\,\mathrm{Cov}[X, Y] + \alpha^2\,\mathrm{Var}[Y]}{\mu_Y^2}$$

$$s_n^2 = \frac{s_n(1,1) - 2\alpha_n\, s_n(1,2) + \alpha_n^2\, s_n(2,2)}{\bar Y_n^2}$$

where (using single-pass formulas):
- $s_n(1,1) = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X_n)^2$
- $s_n(2,2) = \frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar Y_n)^2$
- $s_n(1,2) = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X_n)(Y_i - \bar Y_n)$
- Set $S_n^X = \sum_{i=1}^n X_i$ and $S_n^Y = \sum_{i=1}^n Y_i$; the centered sum $v_k = \sum_{i=1}^k (X_i - \bar X_k)(Y_i - \bar Y_k)$ can then be updated in a single pass via

$$v_k = v_{k-1} + \frac{\bigl(S_{k-1}^X - (k-1)X_k\bigr)\bigl(S_{k-1}^Y - (k-1)Y_k\bigr)}{k(k-1)}$$
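A one-pass Python sketch along these lines follows; accumulating all three centered co-moments with the recursion above is one reasonable bookkeeping choice, not necessarily the slide's original normalization:

```python
import math
from scipy.stats import norm

def ratio_ci_single_pass(pairs, delta=0.05):
    """One-pass delta-method CI for alpha = mu_X / mu_Y (a sketch).

    v_xx, v_yy, v_xy accumulate the centered co-moment sums via the
    recursion above, so the data are touched once and never stored.
    """
    sx = sy = v_xx = v_yy = v_xy = 0.0
    k = 0
    for x, y in pairs:
        k += 1
        if k > 1:
            dx = sx - (k - 1) * x       # S^X_{k-1} - (k-1) X_k
            dy = sy - (k - 1) * y       # S^Y_{k-1} - (k-1) Y_k
            v_xx += dx * dx / (k * (k - 1))
            v_yy += dy * dy / (k * (k - 1))
            v_xy += dx * dy / (k * (k - 1))
        sx, sy = sx + x, sy + y
    n = k
    alpha = sx / sy                     # equals Xbar_n / Ybar_n
    s11, s22, s12 = (v / (n - 1) for v in (v_xx, v_yy, v_xy))
    y_bar = sy / n
    s2 = (s11 - 2 * alpha * s12 + alpha**2 * s22) / y_bar**2
    hw = norm.ppf(1 - delta / 2) * math.sqrt(s2 / n)
    return alpha, (alpha - hw, alpha + hw)
```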

13. Delta Method for Stochastic Root-Finding
Problem: Find $\bar\theta$ such that $E[g(X, \bar\theta)] = 0$ (can replace 0 with any fixed constant)
Applications:
- Process control, risk management, finance, quantiles, ...
- Stochastic optimization: $\min_\theta E[h(X, \theta)]$
- Optimality condition: $\frac{\partial}{\partial\theta} E[h(X, \theta)] = 0$
- Can often show that $\frac{\partial}{\partial\theta} E[h(X, \theta)] = E\bigl[\frac{\partial}{\partial\theta} h(X, \theta)\bigr]$
- So take $g(X, \theta) = \frac{\partial}{\partial\theta} h(X, \theta)$
Point estimate (sample average approximation):
- Generate $X_1, \ldots, X_n$ i.i.d. as $X$
- Find $\theta_n$ s.t. $\frac{1}{n} \sum_{i=1}^n g(X_i, \theta_n) = 0$ (a deterministic problem)

14. Delta Method for Stochastic Root-Finding
Problem: Find $\bar\theta$ such that $E[g(X, \bar\theta)] = 0$
Point estimate (sample average approximation):
- Generate $X_1, \ldots, X_n$ i.i.d. as $X$
- Find $\theta_n$ s.t. $\frac{1}{n} \sum_{i=1}^n g(X_i, \theta_n) = 0$
How to find a confidence interval for $\bar\theta$?
- Taylor series: $g(X_i, \theta_n) \approx g(X_i, \bar\theta) + \frac{\partial g}{\partial\theta}(X_i, \bar\theta)(\theta_n - \bar\theta)$
- Implies: $\frac{1}{n} \sum_{i=1}^n g(X_i, \theta_n) \approx \frac{1}{n} \sum_{i=1}^n g(X_i, \bar\theta) - c_n(\bar\theta - \theta_n)$
- where $c_n = \frac{1}{n} \sum_{i=1}^n \frac{\partial g}{\partial\theta}(X_i, \bar\theta) \approx E\bigl[\frac{\partial g}{\partial\theta}(X, \bar\theta)\bigr]$
- Since the left-hand side is 0 by the definition of $\theta_n$, this implies: $\bar\theta - \theta_n \approx \frac{1}{c_n} \cdot \frac{1}{n} \sum_{i=1}^n g(X_i, \bar\theta)$
- Implies: $\theta_n - \bar\theta \approx N(0, \sigma_n^2 / n)$, where $\sigma_n^2 = \mathrm{Var}[g(X, \bar\theta)] / c_n^2 = E[g(X, \bar\theta)^2] / c_n^2$

15. Delta Method for Stochastic Root-Finding
Algorithm:
1. Simulate to get $X_1, \ldots, X_n$ i.i.d.
2. Find $\theta_n$ s.t. $\frac{1}{n} \sum_{i=1}^n g(X_i, \theta_n) = 0$
3. $\hat c_n \leftarrow \frac{1}{n} \sum_{i=1}^n \frac{\partial g}{\partial\theta}(X_i, \theta_n)$
4. $s_n^2 \leftarrow \frac{1}{n} \sum_{i=1}^n g(X_i, \theta_n)^2 / \hat c_n^2$
5. Return the asymptotic $100(1 - \delta)\%$ CI: $\bigl[\theta_n - z_\delta s_n / \sqrt{n},\; \theta_n + z_\delta s_n / \sqrt{n}\bigr]$
- Can use pilot runs, etc., in the usual way
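Here is a minimal Python sketch of this algorithm, using scipy's brentq to solve the sample-average equation in step 2; the test problem at the bottom (find $\theta$ with $E[e^{\theta X}] = 2$ for standard normal $X$) is our own example, not from the slides:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def root_finding_ci(x, g, dg_dtheta, bracket, delta=0.05):
    """CI for the root of E[g(X, theta)] = 0 (a sketch of the algorithm above).

    g, dg_dtheta: vectorized callables; bracket: (lo, hi) interval known to
    contain the root of the sample-average equation.
    """
    x = np.asarray(x, float)
    n = len(x)
    # Step 2: solve the deterministic sample-average equation.
    theta_n = brentq(lambda t: g(x, t).mean(), *bracket)
    c_n = dg_dtheta(x, theta_n).mean()             # step 3
    s2_n = (g(x, theta_n) ** 2).mean() / c_n**2    # step 4
    hw = norm.ppf(1 - delta / 2) * np.sqrt(s2_n / n)
    return theta_n, (theta_n - hw, theta_n + hw)

# Hypothetical example: find theta with E[exp(theta * X)] = 2, X ~ N(0, 1);
# the true root is sqrt(2 ln 2), about 1.177.
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
print(root_finding_ci(x, lambda x, t: np.exp(t * x) - 2.0,
                      lambda x, t: x * np.exp(t * x), bracket=(0.5, 2.0)))
```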

16. Estimating Nonlinear Functions of Means: Overview
- Delta Method
- Jackknife Method
- Bootstrap Confidence Intervals
- Complete Bias Elimination

17. Jackknife Method
Goal: estimate $\alpha = g(\mu_X, \mu_Y)$
Overview:
- The naive point estimator $\alpha_n = g(\bar X_n, \bar Y_n)$ is biased
- The jackknife estimator has lower bias
- Avoids the need to compute partial derivatives as in the delta method
- More computationally intensive
Starting point: Taylor series + expectation:

$$E[\alpha_n] = \alpha + \frac{b}{n} + \frac{c}{n^2} + \cdots$$

- Thus the bias is $O(n^{-1})$
- Estimate $b$ and adjust, i.e., $\alpha_n^* = \alpha_n - b_n / n$? This requires a messy partial-derivative calculation and adds noise

18. Jackknife, Continued
- Observe that

$$E[\alpha_n] = \alpha + \frac{b}{n} + \frac{c}{n^2} + \cdots \qquad\text{and}\qquad E[\alpha_{n-1}] = \alpha + \frac{b}{n-1} + \frac{c}{(n-1)^2} + \cdots$$

- and so

$$E[n\alpha_n - (n-1)\alpha_{n-1}] = \alpha + c\left(\frac{1}{n} - \frac{1}{n-1}\right) + \cdots = \alpha - \frac{c}{n(n-1)} + \cdots$$

- Bias reduced to $O(n^{-2})$!
- Q: What is special about deleting the $n$-th data point?
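A quick Monte Carlo check of this cancellation (our own example, not from the slides): take $g(\mu) = e^\mu$ with standard normal data, so $\alpha = 1$ and the bias of $\alpha_n$ is roughly $1/(2n)$:

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 20, 200_000
x = rng.standard_normal((reps, n))
a_n = np.exp(x.mean(axis=1))             # alpha_n, bias ~ b/n ~ 1/(2n)
a_nm1 = np.exp(x[:, :-1].mean(axis=1))   # alpha_{n-1}: drop the n-th point
combined = n * a_n - (n - 1) * a_nm1     # n*alpha_n - (n-1)*alpha_{n-1}
# First bias is about 0.025; the second should be near zero
# (within Monte Carlo noise), illustrating the O(1/n^2) reduction.
print(a_n.mean() - 1.0, combined.mean() - 1.0)
```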

19. Jackknife, Continued
- Delete each data point in turn to get a low-bias estimator
- Average the estimators to reduce variance
Jackknife CI Algorithm for $\alpha = g(\mu_X, \mu_Y)$:
1. Choose $n$ and $\delta$, and set $z_\delta$ = the $1 - (\delta/2)$ quantile of $N(0, 1)$
2. Simulate to get $(X_1, Y_1), \ldots, (X_n, Y_n)$ i.i.d.
3. $\alpha_n \leftarrow g(\bar X_n, \bar Y_n)$
4. For $i = 1$ to $n$:
   4.1 $\alpha_n^i \leftarrow g\Bigl(\frac{1}{n-1} \sum_{j \ne i} X_j,\; \frac{1}{n-1} \sum_{j \ne i} Y_j\Bigr)$ (leave out observation $i$)
   4.2 $\alpha_n(i) \leftarrow n\alpha_n - (n-1)\alpha_n^i$ (the $i$-th pseudovalue)
5. Point estimator: $\alpha_n^J \leftarrow (1/n) \sum_{i=1}^n \alpha_n(i)$
6. $v_n^J \leftarrow \frac{1}{n-1} \sum_{i=1}^n \bigl(\alpha_n(i) - \alpha_n^J\bigr)^2$
7. $100(1 - \delta)\%$ CI: $\bigl[\alpha_n^J - z_\delta \sqrt{v_n^J / n},\; \alpha_n^J + z_\delta \sqrt{v_n^J / n}\bigr]$
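A vectorized Python sketch of this algorithm, with the leave-one-out means computed by sum subtraction rather than an explicit loop ($g$ must accept arrays):

```python
import numpy as np
from scipy.stats import norm

def jackknife_ci(x, y, g, delta=0.05):
    """Jackknife CI for alpha = g(mu_X, mu_Y) (a sketch; only g is needed,
    no partial derivatives)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    alpha_n = g(x.mean(), y.mean())                 # step 3
    x_loo = (x.sum() - x) / (n - 1)                 # leave-one-out means
    y_loo = (y.sum() - y) / (n - 1)
    alpha_loo = g(x_loo, y_loo)                     # step 4.1: alpha_n^i
    pseudo = n * alpha_n - (n - 1) * alpha_loo      # step 4.2: pseudovalues
    alpha_J = pseudo.mean()                         # step 5
    v_J = pseudo.var(ddof=1)                        # step 6
    hw = norm.ppf(1 - delta / 2) * np.sqrt(v_J / n)
    return alpha_J, (alpha_J - hw, alpha_J + hw)

# Usage with the ratio example (g must be vectorized):
# alpha_J, ci = jackknife_ci(X, Y, lambda x, y: x / y)
```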

20. Jackknife, Continued
Observations:
- It is not obvious that the CI is correct (why?)
- Substitutes computational brute force for analytical complexity
- Not a one-pass algorithm
- The basic jackknife breaks down for "non-smooth" statistics such as quantiles and the maximum (but this can be fixed; see the next lecture)

21. Estimating Nonlinear Functions of Means: Overview
- Delta Method
- Jackknife Method
- Bootstrap Confidence Intervals
- Complete Bias Elimination

22. Bootstrap Confidence Intervals
Another brute-force method:
- Key idea: analyze the variability of the estimator using resamples of the original data
- More general than the jackknife (estimates the entire sampling distribution of the estimator, not just its mean and variance)
- The jackknife is somewhat better empirically at variance estimation
- "Non-repeatable", unlike the jackknife
- OK for quantiles, but still breaks down for the maximum
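As a preview of the resampling idea, here is a minimal percentile-bootstrap sketch; the percentile CI is one standard construction and an assumption on our part, since the transcript ends before the bootstrap details:

```python
import numpy as np

def bootstrap_percentile_ci(x, y, g, B=2000, delta=0.05, seed=0):
    """Percentile-bootstrap CI for alpha = g(mu_X, mu_Y) (a preview sketch)."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    idx = rng.integers(0, n, size=(B, n))   # B resamples drawn with replacement
    stats = g(x[idx].mean(axis=1), y[idx].mean(axis=1))  # B replicate estimates
    lo, hi = np.quantile(stats, [delta / 2, 1 - delta / 2])
    return g(x.mean(), y.mean()), (lo, hi)
```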
