Density estimation by Monte Carlo and randomized quasi-Monte Carlo (RQMC) - PowerPoint PPT Presentation

1. Density estimation by Monte Carlo and randomized quasi-Monte Carlo (RQMC). Pierre L'Ecuyer, joint work with Amal Ben Abdellah, Art B. Owen, and Florian Puchhammer. NUS, Singapore, October 2019.

2. What this talk is about. Monte Carlo (MC) simulation is widely used to estimate the expectation $E[X]$ of a random variable $X$ and to compute a confidence interval on $E[X]$; the estimator $\bar X_n$ satisfies $\text{MSE} = \text{Var}[\bar X_n] = O(n^{-1})$. But simulation usually provides information to do much more! The output data can be used to estimate the entire distribution of $X$, e.g., the cumulative distribution function (cdf) $F$ of $X$, defined by $F(x) = P[X \le x]$, or its density $f$, defined by $f(x) = F'(x)$. If $X_1, \ldots, X_n$ are $n$ independent realizations of $X$, the empirical cdf
$$\hat F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I[X_i \le x]$$
is unbiased for $F(x)$ at every $x$, and $\text{Var}[\hat F_n(x)] = O(n^{-1})$. For a continuous random variable $X$, the density $f$ provides a better visual idea of the distribution. Here we focus on estimating $f$ over an interval $[a, b] \subset \mathbb{R}$.
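For concreteness (this sketch is not from the slides), here is a minimal Python illustration of both estimators for a toy $X \sim$ Expon(mean 2); the distribution and parameter values are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
X = rng.exponential(scale=2.0, size=n)   # toy example: X ~ Expon with mean 2

# Point estimate and 95% confidence interval for E[X]; MSE = Var[X_bar_n] = O(1/n).
mean = X.mean()
half_width = 1.96 * X.std(ddof=1) / np.sqrt(n)
print(f"E[X] estimate: {mean:.4f} +/- {half_width:.4f}")

# Empirical cdf at a point x: unbiased for F(x), with Var = O(1/n).
x = 3.0
F_hat = np.mean(X <= x)
print(f"F_hat({x}) = {F_hat:.4f}")
```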

3. Setting. Classical density estimation was developed in the context where independent observations $X_1, \ldots, X_n$ of $X$ are given and one wishes to estimate the density $f$ of $X$ from them. Best rate: $\text{Var}[\hat f_n(x)] = O(n^{-4/5})$. Here we assume instead that $X_1, \ldots, X_n$ are generated by simulation from a stochastic model; we can choose $n$ and we have some freedom in how the simulation is performed. The $X_i$'s are realizations of a random variable $X = g(U) \in \mathbb{R}$ with density $f$, where $U = (U_1, \ldots, U_s) \sim U(0,1)^s$ and $g(u)$ can be computed easily for any $u \in (0,1)^s$. Two questions: 1. Can we obtain a better estimate of $f$ with RQMC instead of MC? How much better? What about taking a stratified sample over $[0,1)^s$ (see the sketch below)? 2. Is it possible to obtain unbiased density estimators that converge as $O(n^{-1})$ or faster, using clever sampling strategies?
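The following sketch shows the $X = g(U)$ setup and the stratified sample over $[0,1)^s$ mentioned in question 1, with one uniform point per sub-cube of side $1/m$; the function g below is a made-up stand-in for a real model:

```python
import numpy as np

rng = np.random.default_rng(7)

def g(u):
    """Toy g(u): sum of s Expon(mean 2) variates obtained by inversion (hypothetical model)."""
    return np.sum(-2.0 * np.log(1.0 - u), axis=-1)

s, m = 3, 16          # dimension s; m strata per axis, so n = m**s points
n = m ** s

# Plain Monte Carlo: n iid uniform points in (0,1)^s.
X_mc = g(rng.random((n, s)))

# Stratified sample over [0,1)^s: one uniform point in each of the m**s sub-cubes.
grid = np.stack(np.meshgrid(*(np.arange(m),) * s, indexing="ij"), axis=-1).reshape(-1, s)
X_strat = g((grid + rng.random(grid.shape)) / m)

# Both estimate E[X]; stratification typically reduces the variance.
print(X_mc.mean(), X_strat.mean())
```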

4. Small example: a stochastic activity network. The network gives precedence relations between activities. Activity $k$ has random duration $Y_k$ (also the length of arc $k$) with known cdf $F_k(y) := P[Y_k \le y]$. The project duration $X$ is the (random) length of the longest path from the source to the sink. One can look at the deterministic equivalent of $X$, at $E[X]$, at the cdf, the density, etc. The sample cdf $\hat F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I[X_i \le x]$ is an unbiased estimator of the cdf $F(x) = P[X \le x]$. We want to estimate the density of $X$, $f(x) = F'(x) = \frac{d}{dx} P[X \le x]$. The sample derivative $\hat F'_n(x)$ is useless to estimate $f(x)$, because it is 0 almost everywhere. [Figure: directed acyclic graph with nodes 0 through 8, source node 0 and sink node 8, and 13 arcs with lengths $Y_1, \ldots, Y_{13}$.]
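As an illustration of computing $X = g(U)$ in this example, here is a sketch of the longest-path recursion on a small hypothetical DAG; the slide's exact 9-node, 13-arc topology is not recoverable from this transcript, so the arc list below is a placeholder:

```python
import numpy as np

# Hypothetical small DAG: arcs[(i, j)] = index k of duration Y_k on arc i -> j.
arcs = {(0, 1): 0, (0, 2): 1, (1, 2): 2, (1, 3): 3, (2, 3): 4}
num_nodes, sink = 4, 3

def longest_path(Y):
    """X = length of the longest path from node 0 (source) to the sink,
    given the vector Y of arc durations. Nodes are assumed topologically ordered."""
    L = np.full(num_nodes, -np.inf)
    L[0] = 0.0
    for (i, j), k in sorted(arcs.items()):   # processes arcs out of node i after L[i] is final
        L[j] = max(L[j], L[i] + Y[k])
    return L[sink]
```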

5. Numerical illustration, from Elmaghraby (1977): $Y_k \sim N(\mu_k, \sigma_k^2)$ for $k = 1, 2, 4, 11, 12$, and $Y_k \sim \text{Expon}(1/\mu_k)$ otherwise. $\mu_1, \ldots, \mu_{13}$: 13.0, 5.5, 7.0, 5.2, 16.5, 14.7, 10.3, 6.0, 4.0, 20.0, 3.2, 3.2, 16.5. Results of an experiment with $n = 100\,000$. Note: $X$ is not normal! [Figure: histogram of the $n$ realizations of $X$ over $[0, 200]$, with mean $= 64.2$, deterministic equivalent $X_{\text{det}} = 48.2$, and empirical 0.99 quantile $\hat\xi_{0.99} = 131.8$ marked.]
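A sketch of sampling the durations under these distributions follows; note that the $\sigma_k$ values are not recoverable from this transcript, so a placeholder $\sigma_k = \mu_k/4$ is used below:

```python
import numpy as np

rng = np.random.default_rng(2019)

mu = np.array([13.0, 5.5, 7.0, 5.2, 16.5, 14.7, 10.3, 6.0, 4.0, 20.0, 3.2, 3.2, 16.5])
normal_idx = [0, 1, 3, 10, 11]   # 0-based positions of k = 1, 2, 4, 11, 12
sigma = mu / 4.0                 # placeholder: the slide's sigma_k are not shown here

def sample_Y(n):
    """Return an n x 13 array of activity durations, one row per replication."""
    Y = rng.exponential(scale=mu, size=(n, 13))   # Expon with rate 1/mu_k, i.e., mean mu_k
    Y[:, normal_idx] = rng.normal(mu[normal_idx], sigma[normal_idx], size=(n, 5))
    return Y

# Each X_i is then the longest-path length computed from row i of sample_Y(n).
```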

6. Density estimation. Suppose we estimate the density $f$ over a finite interval $[a, b]$. Let $\hat f_n(x)$ denote the density estimator at $x$, with sample size $n$. We use the following measures of error:
$$\text{MISE} = \text{mean integrated squared error} = \int_a^b E[(\hat f_n(x) - f(x))^2]\, dx = \text{IV} + \text{ISB},$$
$$\text{IV} = \text{integrated variance} = \int_a^b \text{Var}[\hat f_n(x)]\, dx,$$
$$\text{ISB} = \text{integrated squared bias} = \int_a^b (E[\hat f_n(x)] - f(x))^2\, dx.$$
To minimize the MISE, we need to balance the IV and the ISB.
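These quantities can be estimated empirically when $f$ is known. A minimal sketch with a standard normal $f$ and a histogram estimator, averaging over independent replications (all parameter values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# Empirical check of MISE = IV + ISB for a histogram estimator of a known
# density (standard normal) over [a, b].
a, b, m, n, reps = -3.0, 3.0, 32, 1_000, 200
edges = np.linspace(a, b, m + 1)
centers = 0.5 * (edges[:-1] + edges[1:])
h = (b - a) / m
f_true = np.exp(-centers**2 / 2.0) / np.sqrt(2.0 * np.pi)

est = np.empty((reps, m))
for r in range(reps):
    X = rng.standard_normal(n)
    counts, _ = np.histogram(X, bins=edges)
    est[r] = counts / (n * h)            # f_hat at each bin center

IV = est.var(axis=0).sum() * h                       # integrated variance (midpoint sum)
ISB = ((est.mean(axis=0) - f_true) ** 2).sum() * h   # integrated squared bias
print(f"IV = {IV:.5f}, ISB = {ISB:.5f}, MISE ~ {IV + ISB:.5f}")
```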

7. Density estimation. Simple histogram: partition $[a, b]$ into $m$ intervals of length $h = (b-a)/m$ and define
$$\hat f_n(x) = \frac{n_j}{nh} \quad \text{for } x \in I_j = [a + (j-1)h,\, a + jh), \quad j = 1, \ldots, m,$$
where $n_j$ is the number of observations $X_i$ that fall in interval $j$. Kernel density estimator (KDE): select a kernel $k$ (a unimodal symmetric density centered at 0) and a bandwidth $h > 0$ (a horizontal stretching factor for the kernel). The KDE is
$$\hat f_n(x) = \frac{1}{nh} \sum_{i=1}^{n} k\left(\frac{x - X_i}{h}\right) = \frac{1}{nh} \sum_{i=1}^{n} k\left(\frac{x - g(U_i)}{h}\right).$$
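A direct transcription of these two estimators in Python (a minimal sketch; the Gaussian kernel is one common choice of $k$, not mandated by the slide):

```python
import numpy as np

def hist_density(x, X, a, b, m):
    """Histogram estimator: f_hat(x) = n_j / (n h) for x in bin I_j."""
    h = (b - a) / m
    counts, _ = np.histogram(X, bins=np.linspace(a, b, m + 1))
    j = np.clip(((np.asarray(x) - a) / h).astype(int), 0, m - 1)
    return counts[j] / (len(X) * h)

def kde(x, X, h):
    """KDE: f_hat(x) = (1/(n h)) * sum_i k((x - X_i)/h), with Gaussian kernel k."""
    z = (np.asarray(x)[:, None] - X[None, :]) / h
    return (np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)).sum(axis=1) / (len(X) * h)
```

For example, `kde(np.linspace(a, b, 200), X, h)` evaluates the estimator on a grid over $[a, b]$.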

8. Example of a KDE in $s = 1$ dimension: KDE (blue) vs true density (red) with $n = 2^{19}$. Here we take $U_1, \ldots, U_n$ in $(0,1)$ and put $X_i = F^{-1}(U_i)$: midpoint rule (left panel) vs a stratified sample of $U = F(X)$ (right panel). [Figure: two panels plotting the KDE and the true density over approximately $[-5.1, -3.6]$, vertical axis in units of $10^{-5}$.]
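A sketch of the two sampling schemes compared on this slide, using a toy $F$ (standard exponential, so $F^{-1}(u) = -\ln(1-u)$); the slide's actual $F$ is not specified in this transcript. The resulting $X_i$ can be fed to the `kde()` sketch above:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2**19
i = np.arange(1, n + 1)

U_mid = (i - 0.5) / n                  # midpoint rule: deterministic u_i = (i - 1/2)/n
U_strat = (i - 1 + rng.random(n)) / n  # stratified: one uniform draw per interval [(i-1)/n, i/n)

F_inv = lambda u: -np.log(1.0 - u)     # toy F^{-1}; replace with the model's inverse cdf
X_mid, X_strat = F_inv(U_mid), F_inv(U_strat)
```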

9. Asymptotic convergence with Monte Carlo for smooth $f$. Here we assume independent random samples (Monte Carlo or given data). For histograms and KDEs, when $n \to \infty$ and $h \to 0$,
$$\text{AMISE} = \text{AIV} + \text{AISB} \sim \frac{C}{nh} + B h^{\alpha}.$$
For any $g: \mathbb{R} \to \mathbb{R}$, define $R(g) = \int_a^b (g(x))^2\, dx$ and $\mu_r(g) = \int_{-\infty}^{\infty} x^r g(x)\, dx$ for $r = 0, 1, 2, \ldots$

              α    C          B
  Histogram   2    1          R(f′)/12
  KDE         4    µ₀(k²)     (µ₂(k))² R(f″)/4
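The optimal $h$ on the next slide follows by minimizing this AMISE expression in $h$; the intermediate calculus step, omitted on the slides, is:

```latex
\frac{\partial}{\partial h}\left(\frac{C}{nh} + B h^{\alpha}\right)
  = -\frac{C}{nh^{2}} + \alpha B h^{\alpha-1} = 0
\quad\Longrightarrow\quad
h^{*} = \left(\frac{C}{B\alpha n}\right)^{1/(\alpha+1)},
\qquad
\text{AMISE}(h^{*}) = K\, n^{-\alpha/(\alpha+1)}.
```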

10. The asymptotically optimal $h$ is
$$h^* = \left(\frac{C}{B \alpha n}\right)^{1/(\alpha+1)},$$
and it gives $\text{AMISE} = K n^{-\alpha/(1+\alpha)}$.

              α    C          B                     h*                                     AMISE
  Histogram   2    1          R(f′)/12              (n R(f′)/6)^{−1/3}                     O(n^{−2/3})
  KDE         4    µ₀(k²)     (µ₂(k))² R(f″)/4      (µ₀(k²) / ((µ₂(k))² R(f″) n))^{1/5}    O(n^{−4/5})

To estimate $h^*$, one can estimate $R(f')$ and $R(f'')$ via a KDE (plug-in method). This holds under the simplifying assumption that $h$ must be the same over all of $[a, b]$.
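As one concrete instance of the plug-in idea, here is a sketch computing $h^*$ for a Gaussian-kernel KDE with a normal reference density substituted for $f$; this is Silverman's rule of thumb, a simpler stand-in for the slides' KDE-based plug-in:

```python
import numpy as np

def h_star_kde(X):
    """Plug-in h* for a Gaussian-kernel KDE, normal reference for R(f'').

    For the Gaussian kernel, mu_0(k^2) = 1/(2 sqrt(pi)) and mu_2(k) = 1; taking
    f ~ N(0, sigma^2) as reference gives R(f'') = 3 / (8 sqrt(pi) sigma^5), so the
    general formula h* = (mu_0(k^2) / ((mu_2(k))^2 R(f'') n))^{1/5} reduces to
    h* = (4/3)^{1/5} * sigma * n^{-1/5}  (Silverman's rule of thumb).
    """
    n = len(X)
    sigma = X.std(ddof=1)
    return (4.0 / 3.0) ** 0.2 * sigma * n ** -0.2
```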

11. Can we take the stochastic derivative of an estimator of $F$? That is, can we estimate the density $f(x) = F'(x)$ by the derivative of an estimator of $F(x)$? A simple candidate cdf estimator is the empirical cdf
$$\hat F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I[X_i \le x].$$
However, $d\hat F_n(x)/dx = 0$ almost everywhere, so this cannot be a useful density estimator! We need a smoother estimator of $F$.
