

  1. Draft 1. Density Estimation by Randomized Quasi-Monte Carlo. Pierre L’Ecuyer. Joint work with Amal Ben Abdellah, Art B. Owen, and Florian Puchhammer. SAMSI, Raleigh-Durham, North Carolina, May 2018.

  2. Draft 2. What this talk is about. Quasi-Monte Carlo (QMC) and randomized QMC (RQMC) methods have been studied extensively for estimating an integral, $E[X]$. Can they be useful for estimating the entire distribution of $X$? E.g., estimating a density, a cdf, some quantiles, etc. When hours or days of computing time are required to perform simulation runs, reporting only a confidence interval on the mean is a waste of information! People routinely look at empirical distributions, via histograms for example. More refined methods: kernel density estimators (KDEs). Can RQMC improve such density estimators, and by how much?

  3. Draft 3. Setting. Classical density estimation was developed in the context where independent observations $X_1, \dots, X_n$ are given and one wishes to estimate the density $f$ from which they come. Here we assume that $X_1, \dots, X_n$ are generated by simulation from a stochastic model. We can choose $n$ and we have some freedom in how the simulation is performed. The $X_i$'s are realizations of a random variable $X = g(\mathbf{U}) \in \mathbb{R}$ with density $f$, where $\mathbf{U} = (U_1, \dots, U_s) \sim U(0,1)^s$ and $g(\mathbf{u})$ can be computed easily for any $\mathbf{u} \in (0,1)^s$. Can we obtain a better estimate of $f$ with RQMC instead of MC? How much better?
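For concreteness, here is a minimal sketch of such a model $X = g(\mathbf{U})$. The specific choice of $g$ (the total of $s = 3$ exponential service times, each obtained from one uniform by inversion) is a hypothetical stand-in, not an example from the talk.

```python
import numpy as np

def g(u, rates=(1.0, 2.0, 3.0)):
    """Hypothetical model X = g(U): total of s = 3 exponential
    service times, each generated from one uniform by inversion."""
    u = np.atleast_2d(u)                                 # shape (n, s)
    return (-np.log1p(-u) / np.asarray(rates)).sum(axis=1)

# Crude Monte Carlo: n i.i.d. uniform points in (0,1)^s give n draws of X.
rng = np.random.default_rng(0)
X = g(rng.random((10_000, 3)))
```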

  4. Draft 4. Density Estimation. Suppose we estimate the density $f$ over a finite interval $(a, b)$. Let $\hat f_n(x)$ denote the density estimator at $x$, with sample size $n$. We use the following measures of error:
  MISE (mean integrated squared error) $= \int_a^b E[(\hat f_n(x) - f(x))^2]\,dx = \text{IV} + \text{ISB}$,
  IV (integrated variance) $= \int_a^b \operatorname{Var}[\hat f_n(x)]\,dx$,
  ISB (integrated squared bias) $= \int_a^b (E[\hat f_n(x)] - f(x))^2\,dx$.
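When $f$ is known (as in a controlled experiment), the IV and ISB can be estimated empirically by replicating the simulation. A minimal sketch, assuming a generic `estimator(X, xs)` callable and a `sampler(n)` that draws the data; the standard normal target is used only for illustration:

```python
import numpy as np
from scipy.stats import norm

def empirical_mise(estimator, sampler, n, n_rep=100, a=-3.0, b=3.0):
    """MISE = IV + ISB, estimated by replicating the density estimator
    n_rep times against a known target density (standard normal here)."""
    xs = np.linspace(a, b, 128)
    f = norm.pdf(xs)
    est = np.array([estimator(sampler(n), xs) for _ in range(n_rep)])
    iv = np.trapz(est.var(axis=0), xs)               # integrated variance
    isb = np.trapz((est.mean(axis=0) - f) ** 2, xs)  # integrated squared bias
    return iv + isb, iv, isb
```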

  5. Draft 5. Density Estimation. Simple histogram: partition $[a, b]$ into $m$ intervals of size $h = (b - a)/m$ and define
  $\hat f_n(x) = \dfrac{n_j}{nh}$ for $x \in I_j = [a + (j-1)h,\, a + jh)$, $j = 1, \dots, m$,
  where $n_j$ is the number of observations $X_i$ that fall in interval $j$.
  Kernel density estimator (KDE): select a kernel $k$ (a unimodal symmetric density centered at 0) and a bandwidth $h > 0$ (which serves as a horizontal stretching factor for the kernel). The KDE is defined by
  $\hat f_n(x) = \dfrac{1}{nh} \sum_{i=1}^{n} k\!\left(\dfrac{x - X_i}{h}\right)$.
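Both estimators translate directly into a few lines of NumPy. A sketch of the two formulas above; the Gaussian kernel is one possible choice of $k$, not mandated by the slide:

```python
import numpy as np

def histogram_density(X, xs, a, b, m):
    """Histogram estimator: f_hat(x) = n_j / (n h) for x in interval I_j."""
    h = (b - a) / m
    counts, _ = np.histogram(X, bins=m, range=(a, b))
    j = np.clip(((xs - a) / h).astype(int), 0, m - 1)  # interval index of each x
    return counts[j] / (len(X) * h)

def kde(X, xs, h):
    """KDE with a Gaussian kernel: f_hat(x) = (1/nh) sum_i k((x - X_i)/h)."""
    w = (xs[:, None] - X[None, :]) / h
    return (np.exp(-0.5 * w**2) / np.sqrt(2 * np.pi)).sum(axis=1) / (len(X) * h)
```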

  6. Draft 6. Asymptotic convergence with Monte Carlo for smooth $f$. For $g : \mathbb{R} \to \mathbb{R}$, define
  $R(g) = \int_a^b (g(x))^2\,dx$ and $\mu_r(g) = \int_{-\infty}^{\infty} x^r g(x)\,dx$ for $r = 0, 1, 2, \dots$
  For histograms and KDEs, when $n \to \infty$ and $h \to 0$:
  AMISE $=$ AIV $+$ AISB $\sim \dfrac{C}{nh} + B h^{\alpha}$,
  with $\alpha = 2$, $C = 1$, $B = R(f')/12$ for the histogram, and $\alpha = 4$, $C = \mu_0(k^2)$, $B = (\mu_2(k))^2 R(f'')/4$ for the KDE.

  7. Draft 7. The asymptotically optimal $h$ is
  $h^* = \left(\dfrac{C}{B \alpha n}\right)^{1/(\alpha+1)}$,
  and it gives AMISE $= K n^{-\alpha/(1+\alpha)}$.
  Histogram: $h^* = (n R(f')/6)^{-1/3}$, AMISE $= O(n^{-2/3})$. KDE: $h^* = \left(\dfrac{\mu_0(k^2)}{(\mu_2(k))^2 R(f'')\, n}\right)^{1/5}$, AMISE $= O(n^{-4/5})$.
  To estimate $h^*$, one can estimate $R(f')$ and $R(f'')$ via KDE (plug-in).
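The KDE formula for $h^*$ is easy to evaluate once $R(f'')$ is available. A sketch using the Gaussian-kernel constants $\mu_0(k^2) = 1/(2\sqrt{\pi})$ and $\mu_2(k) = 1$; the normal-reference plug-in for $R(f'')$ at the end is one standard choice (not from the slide), under which $h^*$ reduces to Silverman's rule $\approx 1.06\,\sigma\, n^{-1/5}$:

```python
import numpy as np

def kde_bandwidth(n, Rf2, mu0_k2=1 / (2 * np.sqrt(np.pi)), mu2_k=1.0):
    """h* = (C / (B alpha n))^(1/(alpha+1)) with alpha = 4, C = mu0(k^2),
    B = (mu2(k))^2 R(f'') / 4 (KDE case; Gaussian-kernel defaults)."""
    C = mu0_k2
    B = mu2_k**2 * Rf2 / 4.0
    return (C / (B * 4.0 * n)) ** 0.2

# Normal-reference plug-in: if f were N(0, sigma^2), R(f'') = 3/(8 sqrt(pi) sigma^5).
sigma, n = 1.0, 10_000
h_star = kde_bandwidth(n, Rf2=3 / (8 * np.sqrt(np.pi) * sigma**5))
```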

  8. Draft 8. Asymptotic convergence with RQMC for smooth $f$. Idea: replace $U_1, \dots, U_n$ by RQMC points. RQMC does not change the bias. For a KDE with smooth $k$, one could hope (perhaps) to get AIV $= C' n^{-\beta} h^{-1}$ for some $\beta > 1$, instead of $C n^{-1} h^{-1}$. If the IV is reduced, the optimal $h$ can be taken smaller to reduce the ISB as well (re-balance), and then reduce the MISE.

  9. Draft 8 (continued). Unfortunately, things are not so simple. Roughly, decreasing $h$ increases the variation of the function inside the estimator. So we may have something like AIV $= C' n^{-\beta} h^{-\delta}$, or IV $\approx C' n^{-\beta} h^{-\delta}$ in some bounded region.
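Replacing the uniforms by RQMC points is mechanical with SciPy's quasi-Monte Carlo module. A sketch reusing the hypothetical `g` and the `kde` helper from the earlier sketches; scrambled Sobol' points are one common RQMC choice, not necessarily the construction used in the talk:

```python
import numpy as np
from scipy.stats import qmc

# One RQMC replicate: a scrambled Sobol' point set drives the simulation.
s, m = 3, 14                            # dimension s, n = 2^m points
U = qmc.Sobol(d=s, scramble=True, seed=1).random_base2(m)
X_rqmc = g(U)                           # same model, RQMC inputs
xs = np.linspace(0.0, 6.0, 200)
f_hat = kde(X_rqmc, xs, h=0.05)         # same bias as MC; the IV may shrink
```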

  10. Draft 9. Elementary QMC bounds (recall). Integration error for $g : [0,1)^s \to \mathbb{R}$ with point set $P_n = \{\mathbf{u}_0, \dots, \mathbf{u}_{n-1}\} \subset [0,1)^s$:
  $E_n = \dfrac{1}{n} \sum_{i=0}^{n-1} g(\mathbf{u}_i) - \int_{[0,1)^s} g(\mathbf{u})\,d\mathbf{u}$.
  Koksma-Hlawka inequality: $|E_n| \le V_{\mathrm{HK}}(g)\, D^*(P_n)$, where
  $V_{\mathrm{HK}}(g) = \sum_{\emptyset \ne v \subseteq S} \int_{[0,1)^s} \left| \dfrac{\partial^{|v|} g}{\partial \mathbf{u}_v} \right| d\mathbf{u}$ (Hardy-Krause (HK) variation),
  $D^*(P_n) = \sup_{\mathbf{u} \in [0,1)^s} \left| \operatorname{vol}[\mathbf{0}, \mathbf{u}) - \dfrac{|P_n \cap [\mathbf{0}, \mathbf{u})|}{n} \right|$ (star discrepancy).
  There are explicit point sets for which $D^*(P_n) = O((\log n)^{s-1}/n) = O(n^{-1+\epsilon})$, and explicit RQMC constructions for which $E[E_n] = 0$ and $\operatorname{Var}[E_n] = O(n^{-2+\epsilon})$.
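The unbiasedness and the variance rate can be checked empirically by replicating the randomization. A sketch with scrambled Sobol' points and an arbitrary smooth test integrand; both choices are illustrative assumptions:

```python
import numpy as np
from scipy.stats import qmc

def rqmc_mean_var(f, s, m, n_rep=50):
    """Mean and variance of the RQMC average of f over [0,1)^s, across
    n_rep independent scramblings of a Sobol' net with n = 2^m points."""
    means = [f(qmc.Sobol(d=s, scramble=True, seed=r).random_base2(m)).mean()
             for r in range(n_rep)]
    return np.mean(means), np.var(means, ddof=1)

f = lambda u: np.prod(1.0 + 0.5 * (u - 0.5), axis=1)   # smooth test function
for m in (8, 10, 12):
    print(2**m, rqmc_mean_var(f, s=3, m=m))  # variance should decay faster than 1/n
```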

  11. Draft 10. Also $|E_n| \le V_2(g)\, D_2(P_n)$, where
  $V_2^2(g) = \sum_{\emptyset \ne v \subseteq S} \int_{[0,1)^s} \left( \dfrac{\partial^{|v|} g}{\partial \mathbf{u}_v} \right)^2 d\mathbf{u}$ (square $L_2$ variation),
  $D_2^2(P_n) = \int_{[0,1)^s} \left( \operatorname{vol}[\mathbf{0}, \mathbf{u}) - \dfrac{|P_n \cap [\mathbf{0}, \mathbf{u})|}{n} \right)^2 d\mathbf{u}$ (square $L_2$ star discrepancy).
  There are explicit constructions for which $D_2(P_n) = O(n^{-1+\epsilon})$. Moreover, if $P_n$ is a digital net randomized by a nested uniform scramble (NUS) and $V_2(g) < \infty$, then $E[E_n] = 0$ and $\operatorname{Var}[E_n] = O(V_2^2(g)\, n^{-3+\epsilon})$ for all $\epsilon > 0$.

  12. Draft 11. Bounding the AIV under RQMC for a KDE. KDE density estimator at a single point $x$:
  $\hat f_n(x) = \dfrac{1}{n} \sum_{i=1}^{n} \dfrac{1}{h} k\!\left(\dfrac{x - g(\mathbf{U}_i)}{h}\right) = \dfrac{1}{n} \sum_{i=1}^{n} \tilde g(\mathbf{U}_i)$.
  With RQMC points $\mathbf{U}_i$, this is an RQMC estimator of $E[\tilde g(\mathbf{U})] = \int_{[0,1)^s} \tilde g(\mathbf{u})\,d\mathbf{u} = E[\hat f_n(x)]$. RQMC does not change the bias, but may reduce $\operatorname{Var}[\hat f_n(x)]$, and then the IV. To get RQMC variance bounds, we need bounds on the variation of $\tilde g$.

  13. Draft 11 (continued). The partial derivatives are
  $\dfrac{\partial^{|v|}}{\partial \mathbf{u}_v} \tilde g(\mathbf{u}) = \dfrac{1}{h} \dfrac{\partial^{|v|}}{\partial \mathbf{u}_v} k\!\left(\dfrac{x - g(\mathbf{u})}{h}\right)$.
  We assume they exist and are uniformly bounded (e.g., a Gaussian kernel $k$). Expanding via the chain rule gives terms in $h^{-j}$ for $j = 2, \dots, |v| + 1$. One of the terms for $v = S$ grows as $h^{-s-1} k^{(s)}((g(\mathbf{u}) - x)/h) \prod_{j=1}^{s} g_j(\mathbf{u}) = O(h^{-s-1})$ when $h \to 0$, so this AIV bound grows as $h^{-2s-2}$. Not so good!
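The pointwise view is easy to exercise in code: $\tilde g$ is just the kernel evaluation wrapped around the model. A sketch reusing the hypothetical `g` from above, with a Gaussian $k$, and replicating the scramble to estimate $\operatorname{Var}[\hat f_n(x)]$ at one point:

```python
import numpy as np
from scipy.stats import qmc

def f_hat_point(x, h, m, seed, s=3):
    """f_hat(x) = (1/n) sum_i g_tilde(U_i), where g_tilde(u) = k((x - g(u))/h)/h
    (Gaussian k) and the U_i form one scrambled Sobol' point set."""
    U = qmc.Sobol(d=s, scramble=True, seed=seed).random_base2(m)
    w = (x - g(U)) / h
    return (np.exp(-0.5 * w**2) / (h * np.sqrt(2 * np.pi))).mean()

# Independent scramblings give an unbiased estimate and its variance at x:
vals = [f_hat_point(x=1.5, h=0.05, m=12, seed=r) for r in range(30)]
print(np.mean(vals), np.var(vals, ddof=1))
```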

  14. Draft 12. Improvement by a change of variable, in one dimension. Suppose $g : [0,1] \to \mathbb{R}$ is monotone. Change of variable: $w = (x - g(u))/h$. In one dimension ($s = 1$), we have $dw/du = -g'(u)/h$, so
  $V_{\mathrm{HK}}(\tilde g) = \dfrac{1}{h} \int_0^1 \left| k'\!\left(\dfrac{x - g(u)}{h}\right) \dfrac{-g'(u)}{h} \right| du = \dfrac{1}{h} \int_{-\infty}^{\infty} |k'(w)|\,dw = O(h^{-1})$.
  Then, if $k$ and $g$ are continuously differentiable, with RQMC points having $D^*(P_n) = O(n^{-1+\epsilon})$, we obtain AIV $= O(n^{-2+\epsilon} h^{-2})$. With $h = \Theta(n^{-1/3})$, this gives AMISE $= O(n^{-4/3})$.
  A similar argument gives
  $V_2^2(\tilde g) = \dfrac{1}{h^2} \int_0^1 \left( k'\!\left(\dfrac{x - g(u)}{h}\right) \dfrac{g'(u)}{h} \right)^2 du \le \dfrac{L_g}{h^3} \int_{-\infty}^{\infty} (k'(w))^2\,dw = O(h^{-3})$
  if $|g'| \le L_g$, and then with NUS: AIV $= O(n^{-3+\epsilon} h^{-3})$. With $h = \Theta(n^{-3/7})$, this gives AMISE $= O(n^{-12/7})$.
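These one-dimensional rates can be observed numerically. A sketch comparing the empirical IV of the Gaussian-kernel KDE under i.i.d. and scrambled-Sobol' sampling; the monotone $g(u) = u + u^2/2$ (so $|g'| \le 2$) is an illustrative choice, not from the talk:

```python
import numpy as np
from scipy.stats import qmc

g1 = lambda u: u + 0.5 * u**2            # monotone on [0,1], |g'| <= 2 (illustrative)

def empirical_iv(point_gen, m, h, xs, n_rep=50):
    """Empirical integrated variance of the Gaussian-kernel KDE, s = 1."""
    ests = []
    for r in range(n_rep):
        X = g1(point_gen(m, r).ravel())
        w = (xs[:, None] - X[None, :]) / h
        ests.append((np.exp(-0.5 * w**2) / np.sqrt(2 * np.pi)).sum(axis=1) / (len(X) * h))
    return np.trapz(np.var(ests, axis=0), xs)

mc   = lambda m, r: np.random.default_rng(r).random((2**m, 1))
rqmc = lambda m, r: qmc.Sobol(d=1, scramble=True, seed=r).random_base2(m)
xs = np.linspace(0.1, 1.3, 64)
for m in (8, 10, 12):                     # the RQMC IV should decay faster
    print(2**m, empirical_iv(mc, m, 0.05, xs), empirical_iv(rqmc, m, 0.05, xs))
```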

  15. Draft 13. Higher dimensions. Let $s = 2$ and $v = \{1, 2\}$. With the change of variable $(u_1, u_2) \to (w, u_2)$, the Jacobian is $|dw/du_1| = |g_1(u_1, u_2)/h|$, where $g_j = \partial g / \partial u_j$. If $|g_2|$ and $|g_{12}/g_1|$ are bounded by a constant $L$,
  $\int_{[0,1)^2} \left| \dfrac{\partial^{|v|} \tilde g}{\partial \mathbf{u}_v} \right| d\mathbf{u} = \int_{[0,1)^2} \left| \dfrac{\partial^2}{\partial u_1 \partial u_2} \dfrac{1}{h} k\!\left(\dfrac{x - g(\mathbf{u})}{h}\right) \right| du_1\, du_2$
  $= \int_{[0,1)^2} \dfrac{1}{h} \left| k''\!\left(\dfrac{x - g(\mathbf{u})}{h}\right) \dfrac{g_1(\mathbf{u})}{h} \dfrac{g_2(\mathbf{u})}{h} + k'\!\left(\dfrac{x - g(\mathbf{u})}{h}\right) \dfrac{g_{12}(\mathbf{u})}{h} \right| du_1\, du_2$
  $= \int_0^1 \int_{-\infty}^{\infty} \dfrac{1}{h} \left| k''(w) \dfrac{g_2(\mathbf{u})}{h} + k'(w) \dfrac{g_{12}(\mathbf{u})}{g_1(\mathbf{u})} \right| dw\, du_2 \le \dfrac{L}{h} \left[ \mu_0(|k''|)/h + \mu_0(|k'|) \right] = O(h^{-2})$.
  This provides a bound of $O(h^{-2})$ for $V_{\mathrm{HK}}(\tilde g)$, and then AIV $= O(n^{-2+\epsilon} h^{-4})$. Generalizing to $s \ge 2$ gives $V_{\mathrm{HK}}(\tilde g) = O(h^{-s})$, AIV $= O(n^{-2+\epsilon} h^{-2s})$, MISE $= O(n^{-4/(2+s)})$. Beats MC for $s < 3$, same rate for $s = 3$. Not very satisfactory.
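For completeness, the MISE rate in the last line follows from balancing the two AMISE terms; this balancing step is spelled out here, it is not on the slide:

```latex
% AMISE under the RQMC bound: AIV + AISB = O(n^{-2+\epsilon} h^{-2s}) + O(h^4).
% The two terms balance when
\[
  n^{-2} h^{-2s} \asymp h^{4}
  \quad\Longrightarrow\quad
  h = \Theta\bigl(n^{-1/(s+2)}\bigr)
  \quad\Longrightarrow\quad
  \mathrm{MISE} = O\bigl(h^{4}\bigr) = O\bigl(n^{-4/(s+2)}\bigr),
\]
% which beats the MC rate O(n^{-4/5}) iff 4/(s+2) > 4/5, i.e., s < 3.
```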
