
Introduction to (Randomized) Quasi-Monte Carlo
Pierre L'Ecuyer, MCQMC Conference, Stanford University, August 2016

Program: Monte Carlo, quasi-Monte Carlo, randomized quasi-Monte Carlo; QMC point sets and ...


Slide 15: Example of contract: discretely-monitored Asian call option

Payoff: $g(S(t_1), \dots, S(t_d)) = \max\bigl(\frac{1}{d} \sum_{j=1}^{d} S(t_j) - K,\; 0\bigr)$.

Option price written as an integral over the unit hypercube: let $Z_j = \Phi^{-1}(U_j)$, where the $U_j$ are i.i.d. $U(0,1)$. Here we have $s = d$ and
$$
v(s_0, T) = \int_{[0,1)^s} e^{-rT} \max\Biggl( \frac{1}{s} \sum_{i=1}^{s} s_0 \exp\Bigl[ (r - \sigma^2/2)\, t_i + \sigma \sum_{j=1}^{i} \sqrt{t_j - t_{j-1}}\, \Phi^{-1}(u_j) \Bigr] - K,\; 0 \Biggr) du_1 \cdots du_s
= \int_{[0,1)^s} f(u_1, \dots, u_s)\, du_1 \cdots du_s .
$$
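The integral above can be estimated by plain Monte Carlo before any QMC machinery is introduced. A minimal Python sketch, assuming the Black-Scholes dynamics stated on the slide; the helper names `asian_payoff` and `mc_estimate` are ours, not from the talk:

```python
import math
import random
from statistics import NormalDist

def asian_payoff(u, s0=100.0, K=100.0, r=0.05, sigma=0.5, T=1.0):
    """Integrand f(u_1,...,u_s) from the slide: discounted Asian-call payoff
    evaluated at one point u of the unit hypercube (s = d observation times)."""
    d = len(u)
    inv_phi = NormalDist().inv_cdf          # Phi^{-1}
    log_s, prev_t, avg = math.log(s0), 0.0, 0.0
    for j in range(d):
        t = T * (j + 1) / d
        log_s += (r - 0.5 * sigma**2) * (t - prev_t) \
                 + sigma * math.sqrt(t - prev_t) * inv_phi(u[j])
        prev_t = t
        avg += math.exp(log_s)              # running sum of S(t_j)
    return math.exp(-r * T) * max(avg / d - K, 0.0)

def mc_estimate(n, d=12, seed=1):
    """Crude MC estimate of the option price with n i.i.d. uniform points."""
    rng = random.Random(seed)
    return sum(asian_payoff([rng.random() for _ in range(d)])
               for _ in range(n)) / n
```

With the parameters of the numerical illustration (d = 12), the estimate should land near the stated mean of about 13.1.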

Slide 16: Numerical illustration: Bermudan Asian option with d = 12, T = 1 (one year), t_j = j/12 for j = 0, ..., 12, K = 100, s_0 = 100, r = 0.05, σ = 0.5. We performed n = 10^6 independent simulation runs. In 53.47% of cases the payoff is 0. Mean: 13.1. Max: 390.8. [Figure: histogram of the 46.53% positive payoff values; frequency (×10³) vs. payoff over [0, 150], with the mean 13.1 marked.]

Slide 17: Reducing the variance by changing f

If we replace the arithmetic average by a geometric average in the payoff, we obtain
$$
C = e^{-rT} \max\Biggl( \Bigl( \prod_{j=1}^{d} S(t_j) \Bigr)^{1/d} - K,\; 0 \Biggr),
$$
whose expectation ν = E[C] has a closed-form formula. When estimating the mean E[X] = v(s_0, T), we can then use C as a control variate (CV): replace the estimator X by the "corrected" version X_c = X − β(C − ν) for some well-chosen constant β. The optimal β is β* = Cov[C, X] / Var[C]. Using a CV makes the integrand f smoother. It can provide a huge variance reduction, e.g., by a factor of over a million in some examples.
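The CV correction is a few lines of code. A sketch under the same assumed dynamics; `crude_and_cv_pairs` and `cv_estimate` are hypothetical helper names, and the closed-form ν of the geometric option is not reproduced here (any consistent value can be passed in):

```python
import math
import random
from statistics import NormalDist, mean, variance

def crude_and_cv_pairs(n, seed=42):
    """(X, C) pairs on common paths: X = arithmetic-average payoff,
    C = geometric-average payoff (the control variate)."""
    s0, K, r, sigma, T, d = 100.0, 100.0, 0.05, 0.5, 1.0, 12
    inv = NormalDist().inv_cdf
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        log_s, prev, arith, geo_log = math.log(s0), 0.0, 0.0, 0.0
        for j in range(1, d + 1):
            t = T * j / d
            log_s += (r - 0.5 * sigma**2) * (t - prev) \
                     + sigma * math.sqrt(t - prev) * inv(rng.random())
            prev = t
            arith += math.exp(log_s)
            geo_log += log_s
        disc = math.exp(-r * T)
        pairs.append((disc * max(arith / d - K, 0.0),
                      disc * max(math.exp(geo_log / d) - K, 0.0)))
    return pairs

def cv_estimate(pairs, nu):
    """Control-variate estimator: mean of X - beta*(C - nu),
    with beta estimated as Cov[C, X] / Var[C] from the sample."""
    xs = [x for x, _ in pairs]
    cs = [c for _, c in pairs]
    mx, mc = mean(xs), mean(cs)
    cov = sum((x - mx) * (c - mc) for x, c in pairs) / (len(pairs) - 1)
    beta = cov / variance(cs)
    return mean(x - beta * (c - nu) for x, c in pairs)
```

Estimating β from the same sample introduces a small bias that vanishes as n grows; in practice β is often estimated from a pilot run.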

Slide 18: Quasi-Monte Carlo (QMC)

Replace the independent random points U_i by a set of deterministic points P_n = {u_0, ..., u_{n−1}} that cover [0, 1)^s more evenly. Estimate
$$
\mu = \int_{[0,1)^s} f(\mathbf{u})\, d\mathbf{u} \quad \text{by} \quad \bar{\mu}_n = \frac{1}{n} \sum_{i=0}^{n-1} f(\mathbf{u}_i).
$$
Integration error: E_n = \bar{μ}_n − μ. P_n is called a highly-uniform point set or low-discrepancy point set if some measure of discrepancy between the empirical distribution of P_n and the uniform distribution converges to 0 faster than O(n^{−1/2}) (the typical rate for independent random points).

Main construction methods: lattice rules and digital nets (Korobov, Hammersley, Halton, Sobol', Faure, Niederreiter, etc.).

Slide 19: Simple case: one dimension (s = 1)

Obvious solutions: P_n = Z_n/n = {0, 1/n, ..., (n−1)/n} (left Riemann sum), which gives
$$
\bar{\mu}_n = \frac{1}{n} \sum_{i=0}^{n-1} f(i/n), \qquad E_n = O(n^{-1}) \text{ if } f' \text{ is bounded},
$$
or P'_n = {1/(2n), 3/(2n), ..., (2n−1)/(2n)} (midpoint rule), for which E_n = O(n^{−2}) if f'' is bounded.

Slide 20: If we allow different weights on the f(u_i), we have the trapezoidal rule,
$$
\frac{1}{n} \Biggl[ \frac{f(0) + f(1)}{2} + \sum_{i=1}^{n-1} f(i/n) \Biggr],
$$
for which |E_n| = O(n^{−2}) if f'' is bounded, or the Simpson rule,
$$
\frac{f(0) + 4f(1/n) + 2f(2/n) + \cdots + 2f((n-2)/n) + 4f((n-1)/n) + f(1)}{3n},
$$
which gives |E_n| = O(n^{−4}) if f^{(4)} is bounded, etc.

Here, for QMC and RQMC, we restrict ourselves to equal-weight rules. For the RQMC points that we will examine, one can prove that equal weights are optimal.
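The one-dimensional convergence rates above are easy to observe numerically. A short sketch (function names are ours) using a smooth integrand with a known integral, ∫₀¹ e^u du = e − 1, where halving the step should roughly halve the left-Riemann error and divide the midpoint and trapezoid errors by four:

```python
import math

def left_riemann(f, n):
    """Left Riemann sum on {0, 1/n, ..., (n-1)/n}: error O(1/n)."""
    return sum(f(i / n) for i in range(n)) / n

def midpoint(f, n):
    """Midpoint rule on {1/(2n), 3/(2n), ...}: error O(1/n^2)."""
    return sum(f((2 * i + 1) / (2 * n)) for i in range(n)) / n

def trapezoid(f, n):
    """Trapezoidal rule: error O(1/n^2)."""
    return (0.5 * (f(0.0) + f(1.0)) + sum(f(i / n) for i in range(1, n))) / n

exact = math.e - 1.0            # integral of exp over [0, 1]

def err(rule, n):
    return abs(rule(math.exp, n) - exact)
```

For this integrand, `err(left_riemann, 100) / err(left_riemann, 200)` is close to 2, while the same ratio for `midpoint` and `trapezoid` is close to 4, matching the O(n⁻¹) and O(n⁻²) rates.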

Slide 21: Simplistic solution for s > 1: rectangular grid

P_n = {(i_1/d, ..., i_s/d) : 0 ≤ i_j < d for all j}, where n = d^s. [Figure: the grid in two dimensions.]

This is the midpoint rule in s dimensions. It quickly becomes impractical when s increases. Moreover, each one-dimensional projection has only d distinct points, each two-dimensional projection has only d^2 distinct points, etc.

Slide 22: Lattice rules (Korobov, Sloan, etc.)

Integration lattice:
$$
L_s = \Biggl\{ \mathbf{v} = \sum_{j=1}^{s} z_j \mathbf{v}_j \;:\; \text{each } z_j \in \mathbb{Z} \Biggr\},
$$
where v_1, ..., v_s ∈ R^s are linearly independent over R and where L_s contains Z^s.

Lattice rule: take P_n = {u_0, ..., u_{n−1}} = L_s ∩ [0, 1)^s.

Lattice rule of rank 1: u_i = i v_1 mod 1 for i = 0, ..., n−1, where n v_1 = a = (a_1, ..., a_s) ∈ {0, 1, ..., n−1}^s. Korobov rule: a = (1, a, a^2 mod n, ...).

For any u ⊂ {1, ..., s}, the projection L_s(u) of L_s is also a lattice.
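A rank-1 (Korobov) rule is a one-liner to generate. A minimal sketch (the function name is ours), reproducing the s = 2, n = 101, v_1 = (1, 12)/n example used later:

```python
def korobov_points(n, a, s):
    """Rank-1 lattice points u_i = i*v1 mod 1, with
    n*v1 = (1, a, a^2 mod n, ..., a^{s-1} mod n)."""
    gen = [pow(a, j, n) for j in range(s)]           # Korobov generating vector
    return [[(i * g) % n / n for g in gen] for i in range(n)]

pts = korobov_points(101, 12, 2)
```

Since gcd(12, 101) = 1, each one-dimensional projection of `pts` is exactly {0, 1/n, ..., (n−1)/n}, as the lattice slides point out.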

Slide 23: Example: lattice with s = 2, n = 101, v_1 = (1, 12)/n

P_n = {u_i = i v_1 mod 1 : i = 0, ..., n−1} = {(0, 0), (1/101, 12/101), (2/101, 24/101), ...}. [Figure: the 101 lattice points and the vector v_1.]

Here, each one-dimensional projection is {0, 1/n, ..., (n−1)/n}.

Slide 24: Another example: s = 2, n = 1021, v_1 = (1, 90)/n

P_n = {u_i = i v_1 mod 1 : i = 0, ..., n−1} = {(i/1021, (90i/1021) mod 1) : i = 0, ..., 1020}. [Figure: the 1021 lattice points and the vector v_1.]

Slide 25: A bad lattice: s = 2, n = 101, v_1 = (1, 51)/n

[Figure: the lattice points, which all lie on two parallel lines.] Good uniformity in one dimension, but not in two!

Slide 26: Digital net in base b (Niederreiter)

Gives n = b^k points. For i = 0, ..., b^k − 1 and j = 1, ..., s:
$$
i = a_{i,0} + a_{i,1} b + \cdots + a_{i,k-1} b^{k-1} = (a_{i,k-1} \cdots a_{i,1} a_{i,0})_b,
$$
$$
\begin{pmatrix} u_{i,j,1} \\ \vdots \\ u_{i,j,w} \end{pmatrix}
= \mathbf{C}_j \begin{pmatrix} a_{i,0} \\ \vdots \\ a_{i,k-1} \end{pmatrix} \bmod b,
$$
$$
u_{i,j} = \sum_{\ell=1}^{w} u_{i,j,\ell}\, b^{-\ell}, \qquad \mathbf{u}_i = (u_{i,1}, \dots, u_{i,s}),
$$
where the generating matrices C_j are w × k with elements in Z_b. In practice, w and k are finite, but there is no limit. Digital sequence: infinite sequence; we can stop at n = b^k for any k.

We can also multiply in some ring R, with bijections between Z_b and R. Each one-dimensional projection truncated to its first k digits is Z_n/n = {0, 1/n, ..., (n−1)/n}. Each C_j defines a permutation of Z_n/n.
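The digit-by-digit matrix construction above translates directly into code. A minimal base-2 sketch with w = k (function and variable names are ours); with C_1 the reflected identity and C_2 the identity, it reproduces the two-dimensional Hammersley/van der Corput example that follows:

```python
def digital_net_base2(gen_matrices, k):
    """Digital net in base 2 with w = k: digit l+1 of coordinate j of point i
    is (row l of C_j) . (a_{i,0}, ..., a_{i,k-1}) mod 2."""
    n = 1 << k
    points = []
    for i in range(n):
        a = [(i >> l) & 1 for l in range(k)]          # base-2 digits of i
        point = []
        for Cj in gen_matrices:
            u = 0.0
            for l, row in enumerate(Cj):
                bit = sum(r * d for r, d in zip(row, a)) % 2
                u += bit / (1 << (l + 1))             # bit * 2^{-(l+1)}
            point.append(u)
        points.append(point)
    return points

k = 4
identity = [[int(r == c) for c in range(k)] for r in range(k)]
reflected = [[int(c == k - 1 - r) for c in range(k)] for r in range(k)]
# C_1 = reflected identity gives u_{i,1} = i/n; C_2 = identity gives
# the van der Corput sequence (bit reversal) in the second coordinate.
hammersley = digital_net_base2([reflected, identity], k)
```

Both one-dimensional projections are Z_n/n, as stated on the slide: each C_j just permutes {0, 1/n, ..., (n−1)/n}.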

Slide 27: Small example: Hammersley in two dimensions

Let n = 2^8 = 256 and s = 2. Take the points (in binary):

      i      u_{i,1}      u_{i,2}
      0    .00000000    .0
      1    .00000001    .1
      2    .00000010    .01
      3    .00000011    .11
      4    .00000100    .001
      5    .00000101    .101
      6    .00000110    .011
    ...          ...          ...
    254    .11111110    .01111111
    255    .11111111    .11111111

The second coordinate follows the van der Corput sequence in base 2.

Slide 28: Hammersley point set, n = 2^8 = 256, s = 2. [Figure: the 256 points in the unit square.]

Slide 29: In general, we can take n = 2^k points. If we partition [0, 1)^2 into rectangles of sizes 2^{−k_1} by 2^{−k_2}, where k_1 + k_2 ≤ k, each rectangle will contain exactly the same number of points. We say that the points are equidistributed for this partition.

For a digital net in base b in s dimensions, we choose s permutations of {0, 1, ..., b^k − 1} (one per coordinate), then divide each coordinate by b^k. We can also have s = ∞ and/or n = ∞ (an infinite sequence of points).

Slide 30: Suppose we divide axis j into b^{q_j} equal parts, for each j. This determines a partition of [0, 1)^s into b^{q_1 + ⋯ + q_s} rectangles of equal size. If each rectangle contains exactly the same number of points, we say that the point set P_n is (q_1, ..., q_s)-equidistributed in base b. This occurs iff the matrix formed by the first q_1 rows of C_1, the first q_2 rows of C_2, ..., the first q_s rows of C_s is of full rank (mod b). To verify equidistribution, we can construct these matrices and compute their rank.

P_n is a (t, k, s)-net iff it is (q_1, ..., q_s)-equidistributed whenever q_1 + ⋯ + q_s = k − t. This is possible for t = 0 only if b ≥ s − 1. The t-value of a net is the smallest t for which it is a (t, k, s)-net.

An infinite sequence {u_0, u_1, ...} in [0, 1)^s is a (t, s)-sequence in base b if for all k > 0 and ν ≥ 0,
$$
Q(k, \nu) = \{\mathbf{u}_i : i = \nu b^k, \dots, (\nu + 1) b^k - 1\}
$$
is a (t, k, s)-net in base b. This is possible for t = 0 only if b ≥ s.
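The rank test described above is mechanical to code in base 2. A sketch (all names are ours) that stacks the first q_j rows of each generating matrix and checks full rank over GF(2), illustrated on the small w = k = 4 Hammersley matrices:

```python
def rank_gf2(bit_rows):
    """Rank over GF(2) of rows given as integer bitmasks
    (Gaussian elimination keyed on the highest set bit)."""
    pivots = {}
    for r in bit_rows:
        cur = r
        while cur:
            hb = cur.bit_length() - 1
            if hb not in pivots:
                pivots[hb] = cur
                break
            cur ^= pivots[hb]
    return len(pivots)

def equidistributed_base2(gen_matrices, qs):
    """(q_1,...,q_s)-equidistribution test: the first q_j rows of each C_j,
    stacked, must have full rank mod 2."""
    rows = []
    for Cj, q in zip(gen_matrices, qs):
        rows.extend(Cj[:q])
    bit_rows = [int("".join(str(b) for b in row), 2) for row in rows]
    return rank_gf2(bit_rows) == len(rows)

k = 4
identity = [[int(r == c) for c in range(k)] for r in range(k)]
reflected = [[int(c == k - 1 - r) for c in range(k)] for r in range(k)]
```

For the Hammersley pair (reflected identity, identity), every split q_1 + q_2 = k passes, so the t-value is 0; copying the same matrix into both coordinates fails already at (1, 1).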

Slide 31: Sobol' nets and sequences

Sobol' (1967) proposed a digital net in base b = 2 where each C_j is upper triangular with ones on the diagonal:
$$
\mathbf{C}_j = \begin{pmatrix}
1 & v_{j,2,1} & \cdots & v_{j,c,1} & \cdots \\
0 & 1 & \cdots & v_{j,c,2} & \cdots \\
\vdots & & \ddots & \vdots & \\
0 & \cdots & 0 & 1 &
\end{pmatrix}.
$$
Column c of C_j is represented by an odd integer
$$
m_{j,c} = \sum_{l=1}^{c} v_{j,c,l}\, 2^{c-l} = v_{j,c,1} 2^{c-1} + \cdots + v_{j,c,c-1}\, 2 + 1 < 2^c.
$$
The integers m_{j,c} are selected as follows.

Slide 32: For each j, we choose a primitive polynomial over F_2,
$$
f_j(z) = z^{d_j} + a_{j,1} z^{d_j - 1} + \cdots + a_{j,d_j},
$$
and we choose d_j integers m_{j,0}, ..., m_{j,d_j−1} (the first d_j columns). Then, m_{j,d_j}, m_{j,d_j+1}, ... are determined by the recurrence
$$
m_{j,c} = 2 a_{j,1} m_{j,c-1} \oplus \cdots \oplus 2^{d_j - 1} a_{j,d_j-1} m_{j,c-d_j+1} \oplus 2^{d_j} m_{j,c-d_j} \oplus m_{j,c-d_j}.
$$
Proposition. If the polynomials f_j(z) are all distinct, we obtain a (t, s)-sequence with t ≤ d_0 + ⋯ + d_{s−1} + 1 − s.

Sobol' suggests listing all primitive polynomials over F_2 in increasing order of degree, starting with f_0(z) ≡ 1 (which gives C_0 = I), and taking f_j(z) as the (j + 1)-th polynomial in the list. There are many ways of selecting the first m_{j,c}'s, which are called the direction numbers. They can be selected to minimize some discrepancy (or figure of merit). The values proposed by Sobol' give an (s, ℓ)-equidistribution for ℓ = 1 and ℓ = 2 (only the first two bits). For n = 2^k fixed, we can gain one dimension, as for the Faure sequence. Joe and Kuo (2008) tabulated direction numbers giving the best t-value for the two-dimensional projections, for given s and k.

Slide 33: Other constructions

- Faure nets and sequences
- Niederreiter-Xing point sets and sequences
- Polynomial lattice rules (a special case of digital nets)
- Halton sequence
- Etc.

Slide 34: Worst-case error bounds

Koksma-Hlawka-type inequalities (Koksma, Hlawka, Hickernell, etc.):
$$
|\bar{\mu}_n - \mu| \le V(f) \cdot D(P_n)
$$
for all f in some Hilbert space or Banach space H, where V(f) = ||f − μ||_H is the variation of f, and D(P_n) is the discrepancy of P_n.

Lattice rules: for certain Hilbert spaces of smooth periodic functions f with square-integrable partial derivatives of order up to α, D(P_n) = O(n^{−α+ε}) for arbitrarily small ε.

Digital nets: for the "classical" Koksma-Hlawka inequality for QMC, f must have finite variation in the sense of Hardy and Krause (which implies no discontinuity not aligned with the axes). Popular constructions achieve D(P_n) = O(n^{−1}(ln n)^s) = O(n^{−1+ε}) for arbitrarily small ε. More recent constructions offer better rates for smooth functions.

These bounds are conservative and too hard to compute in practice.

Slide 35: Randomized quasi-Monte Carlo (RQMC)

$$
\hat{\mu}_{n,\mathrm{rqmc}} = \frac{1}{n} \sum_{i=0}^{n-1} f(\mathbf{U}_i),
$$
with P_n = {U_0, ..., U_{n−1}} ⊂ (0, 1)^s an RQMC point set: (i) each point U_i has the uniform distribution over (0, 1)^s; (ii) P_n as a whole is a low-discrepancy point set. Then E[μ̂_{n,rqmc}] = μ (unbiased) and
$$
\mathrm{Var}[\hat{\mu}_{n,\mathrm{rqmc}}] = \frac{\mathrm{Var}[f(\mathbf{U}_i)]}{n} + \frac{2}{n^2} \sum_{i < j} \mathrm{Cov}[f(\mathbf{U}_i), f(\mathbf{U}_j)].
$$
We want to make the last sum as negative as possible.

Weaker attempts to do the same: antithetic variates (n = 2), Latin hypercube sampling (LHS), stratification, ...
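The simplest RQMC scheme, the random shift of a rank-1 lattice covered a few slides later, fits in a few lines. A sketch with hypothetical helper names; each replicate is an unbiased estimate of μ, and m replicates give a variance estimate exactly as in the variance-estimation slide:

```python
import random
from statistics import mean, variance

def shifted_lattice(n, a_vec, rng):
    """One randomly-shifted rank-1 lattice: u_i = (i*a/n + U) mod 1,
    with a fresh uniform shift U for the whole point set."""
    shift = [rng.random() for _ in a_vec]
    return [[((i * a) / n + d) % 1.0 for a, d in zip(a_vec, shift)]
            for i in range(n)]

def rqmc_estimates(f, n, a_vec, m, seed=7):
    """m independent RQMC replicates of (1/n) * sum_i f(U_i)."""
    rng = random.Random(seed)
    return [mean(f(u) for u in shifted_lattice(n, a_vec, rng))
            for _ in range(m)]

# Smooth toy integrand u1*u2 on the n = 101, a = (1, 12) lattice; exact mu = 1/4.
reps = rqmc_estimates(lambda u: u[0] * u[1], 101, (1, 12), 50)
```

For this smooth integrand the sample variance of the replicates falls well below the crude MC variance σ²/n, illustrating the negative-covariance effect the slide describes.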

Slide 36: Variance estimation

We can compute m independent realizations X_1, ..., X_m of μ̂_{n,rqmc}, then estimate μ and Var[μ̂_{n,rqmc}] by their sample mean \bar{X}_m and sample variance S_m². This could be used to compute a confidence interval. Temptation: assume that \bar{X}_m has the normal distribution. Beware: this is usually wrong unless m → ∞.

Slide 37: Stratification of the unit hypercube

Partition axis j into k_j ≥ 1 equal parts, for j = 1, ..., s. Draw n = k_1 ⋯ k_s random points, one per box, independently. Example: s = 2, k_1 = 12, k_2 = 8, n = 12 × 8 = 96. [Figure: one random point in each box of a 12 × 8 grid.]

Slide 38: Stratification of the unit hypercube. Example: s = 2, k_1 = 24, k_2 = 16, n = 384. [Figure: one random point in each box of a 24 × 16 grid.]

Slide 39: Stratified estimator:
$$
X_{s,n} = \frac{1}{n} \sum_{j=0}^{n-1} f(\mathbf{U}_j).
$$
The crude MC variance with n points can be decomposed as
$$
\mathrm{Var}[\bar{X}_n] = \mathrm{Var}[X_{s,n}] + \frac{1}{n^2} \sum_{j=0}^{n-1} (\mu_j - \mu)^2,
$$
where μ_j is the mean over box j. The more the μ_j differ, the more the variance is reduced.

If f' is continuous and bounded, and all k_j are equal, then Var[X_{s,n}] = O(n^{−1−2/s}).

For large s, this is not practical. For small s, it is not really better than the midpoint rule with a grid when f is smooth. But it can still be applied to a few important random variables. Also, it gives an unbiased estimator, and the variance can be estimated by replicating m ≥ 2 times.
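The stratified estimator is straightforward to implement for s = 2. A sketch (function name is ours) matching the 12 × 8 example, one uniform point per box:

```python
import random
from statistics import mean, variance

def stratified_estimate(f, k1, k2, rng):
    """One stratified estimate: one uniform point in each box of a
    k1 x k2 partition of [0,1)^2, so n = k1*k2 points in total."""
    n = k1 * k2
    total = 0.0
    for i in range(k1):
        for j in range(k2):
            u = ((i + rng.random()) / k1, (j + rng.random()) / k2)
            total += f(u)
    return total / n

rng = random.Random(2016)
# Smooth toy integrand u1 + u2 with exact mean 1; m = 30 replicates.
reps = [stratified_estimate(lambda u: u[0] + u[1], 12, 8, rng)
        for _ in range(30)]
```

Because the within-box means μ_j vary a lot for this integrand, the replicate variance is far below the crude MC value σ²/n = (1/6)/96, and replicating m ≥ 2 times gives the unbiased variance estimate mentioned on the slide.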

Slide 40: Randomly-shifted lattice

Example: lattice with s = 2, n = 101, v_1 = (1, 12)/101. [Figure: the lattice points, then a uniform random shift U applied to all points modulo 1; the shifted points retain the lattice structure.]

Slide 41: Random digital shift for digital net

Equidistribution in digital boxes is lost with a random shift modulo 1, but can be kept with a random digital shift in base b. In base 2: generate U ∼ U(0, 1)^s and XOR it bitwise with each u_i. Example for s = 2:

    u_i     = (0.01100100..., 0.10011000...)_2
    U       = (0.01001010..., 0.11101001...)_2
    u_i ⊕ U = (0.00101110..., 0.01110001...)_2

Each point has the U(0, 1)^s distribution. Preservation of the equidistribution (k_1 = 3, k_2 = 5):

    u_i     = (0.***, 0.*****)
    U       = (0.010, 0.11101)_2
    u_i ⊕ U = (0.***, 0.*****)
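The XOR shift is a tiny operation on the bit representations. A base-2 sketch (function name is ours), assuming each coordinate has an exact w-bit binary expansion, as for a digital net with w = k:

```python
import random

def digital_shift_base2(points, w, rng):
    """XOR one random w-bit shift per coordinate into every point (base 2).
    Since XOR with a fixed mask permutes {0, ..., 2^w - 1}, equidistribution
    in dyadic boxes is preserved."""
    s = len(points[0])
    shift = [rng.getrandbits(w) for _ in range(s)]
    scale = 1 << w
    return [[((round(x * scale) ^ d) / scale) for x, d in zip(p, shift)]
            for p in points]

pts = [[i / 16] for i in range(16)]                 # one-dim net Z_16 / 16
shifted = digital_shift_base2(pts, 4, random.Random(5))
```

The shifted set is a permutation of the original one-dimensional projection, which is exactly the preservation property stated on the slide.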

Slide 42: Example with
U = (0.1270111220, 0.3185275653)_{10} = (0.0010 0000100000111100, 0.0101 0001100010110000)_2.

This changes the bits 3, 9, 15, 16, 17, 18 of u_{i,1} and the bits 2, 4, 8, 9, 13, 15, 16 of u_{i,2}. [Figure: the point set before and after the digital shift; red and green squares are permuted (k_1 = k_2 = 4, first 4 bits of U).]

Slide 43: Random digital shift in base b

We have u_{i,j} = Σ_{ℓ=1}^{w} u_{i,j,ℓ} b^{−ℓ}. Let U = (U_1, ..., U_s) ∼ U[0, 1)^s, where U_j = Σ_{ℓ=1}^{w} U_{j,ℓ} b^{−ℓ}. We replace each u_{i,j} by
$$
\tilde{U}_{i,j} = \sum_{\ell=1}^{w} \bigl[ (u_{i,j,\ell} + U_{j,\ell}) \bmod b \bigr]\, b^{-\ell}.
$$
Proposition. \tilde{P}_n is (q_1, ..., q_s)-equidistributed in base b iff P_n is. For w = ∞, each point \tilde{U}_i has the uniform distribution over (0, 1)^s.

Slide 44: Other permutations that preserve equidistribution and may help reduce the variance further:

Linear matrix scrambling (Matoušek, Hickernell and Hong, Tezuka, Owen): left-multiply each matrix C_j by a random w × w matrix M_j, non-singular and lower triangular, mod b. Several variants exist. We then apply a random digital shift in base b to obtain the uniform distribution for each point (unbiasedness).

Nested uniform scrambling (Owen 1995): more costly, but provably reduces the variance to O(n^{−3}(log n)^s) when f is sufficiently smooth!

Slide 45: Asian option example

T = 1 (year), t_j = j/d, K = 100, s_0 = 100, r = 0.05, σ = 0.5, s = d = 2. Exact value: μ ≈ 17.0958. MC variance: 934.0. Lattice: Korobov with a from an old table + random shift. Sobol: left matrix scramble + random digital shift. Variance estimated from m = 1000 independent randomizations. VRF = (MC variance) / (n Var[μ̂_{n,rqmc}]).

    n      method     \bar{X}_m   n S_m^2   VRF
    2^10   stratif.   17.100      232.8     4
    2^10   lattice    17.092      20.8      45
    2^10   Sobol      17.094      1.66      563
    2^16   stratif.   17.046      135.3     7
    2^16   lattice    17.096      4.38      213
    2^16   Sobol      17.096      0.037     25,330
    2^20   stratif.   17.085      117.6     8
    2^20   lattice    17.096      0.112     8,318
    2^20   Sobol      17.096      0.0026    360,000

Slide 46: Same example with s = d = 12. μ ≈ 13.122. MC variance: 516.3. Lattice: Korobov + random shift. Sobol: left matrix scramble + random digital shift. Variance estimated from m = 1000 independent randomizations.

    n      method     \bar{X}_m   n S_m^2   VRF
    2^10   lattice    13.114      39.3      13
    2^10   Sobol      13.123      5.9       88
    2^16   lattice    13.122      6.61      78
    2^16   Sobol      13.122      1.63      317
    2^20   lattice    13.122      8.59      60
    2^20   Sobol      13.122      0.89      579

Slide 47: Variance for randomly-shifted lattice rules

Suppose f has Fourier expansion
$$
f(\mathbf{u}) = \sum_{\mathbf{h} \in \mathbb{Z}^s} \hat{f}(\mathbf{h})\, e^{2\pi \sqrt{-1}\, \mathbf{h}^{\mathsf{t}} \mathbf{u}}.
$$
For a randomly shifted lattice, the exact variance is always
$$
\mathrm{Var}[\hat{\mu}_{n,\mathrm{rqmc}}] = \sum_{\mathbf{0} \ne \mathbf{h} \in L_s^*} |\hat{f}(\mathbf{h})|^2,
$$
where L_s^* = {h ∈ R^s : h^t v ∈ Z for all v ∈ L_s} ⊆ Z^s is the dual lattice. From the viewpoint of variance reduction, an optimal lattice for f minimizes Var[μ̂_{n,rqmc}].

Slide 48:
$$
\mathrm{Var}[\hat{\mu}_{n,\mathrm{rqmc}}] = \sum_{\mathbf{0} \ne \mathbf{h} \in L_s^*} |\hat{f}(\mathbf{h})|^2.
$$
Let α > 0 be an even integer. If f has square-integrable mixed partial derivatives up to order α/2 > 0, and the periodic continuation of its derivatives up to order α/2 − 1 is continuous across the unit cube boundaries, then
$$
|\hat{f}(\mathbf{h})|^2 = O\bigl( (\max(1, |h_1|) \cdots \max(1, |h_s|))^{-\alpha} \bigr).
$$
Moreover, there is a vector v_1 = v_1(n) such that
$$
P_\alpha := \sum_{\mathbf{0} \ne \mathbf{h} \in L_s^*} (\max(1, |h_1|) \cdots \max(1, |h_s|))^{-\alpha} = O(n^{-\alpha + \epsilon}).
$$
This P_α has been proposed long ago as a figure of merit, often with α = 2. It is the variance for a worst-case f having |\hat{f}(h)|² = (max(1, |h_1|) ⋯ max(1, |h_s|))^{−α}. A larger α means a smoother f and a faster convergence rate.

Slide 49: For even integer α, this worst-case f is
$$
f^*(\mathbf{u}) = \sum_{\mathfrak{u} \subseteq \{1,\dots,s\}} \prod_{j \in \mathfrak{u}} \frac{(2\pi)^{\alpha/2}}{(\alpha/2)!}\, B_{\alpha/2}(u_j),
$$
where B_{α/2} is the Bernoulli polynomial of degree α/2. In particular, B_1(u) = u − 1/2 and B_2(u) = u² − u + 1/6. It is easy to compute P_α and search for good lattices in this case!

However: this worst-case function is not necessarily representative of what happens in applications. Also, the hidden factor in the O increases quickly with s, so this result is not very useful for large s. To get a bound that is uniform in s, the Fourier coefficients must decrease faster with the dimension and "size" of the vectors h; that is, f must be "smoother" in high-dimensional projections. This is typically what happens in applications for which RQMC is effective!
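For α = 2, P_α reduces to a finite sum over the lattice points via the Bernoulli polynomial B_2, which is why the search is "easy" as claimed. A sketch for Korobov rules (function names are ours), using the identity P_2 = −1 + (1/n) Σ_i Π_j [1 + 2π² B_2(u_{ij})], i.e., the weighted formula of the later P_{γ,α} slide with all γ_u = 1:

```python
import math

def B2(u):
    """Bernoulli polynomial of degree 2."""
    return u * u - u + 1.0 / 6.0

def P2_korobov(n, a, s):
    """Figure of merit P_alpha with alpha = 2 for the Korobov lattice
    with generator a, computed in O(n*s) operations."""
    gen = [pow(a, j, n) for j in range(s)]
    total = 0.0
    for i in range(n):
        prod = 1.0
        for g in gen:
            prod *= 1.0 + 2.0 * math.pi**2 * B2((i * g % n) / n)
        total += prod
    return total / n - 1.0
```

Comparing the good generator a = 12 with the bad one a = 51 from the "bad lattice" slide (both with n = 101, s = 2) shows the merit value clearly separating them.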

Slide 50: Baker's (or tent) transformation

To make the periodic continuation of f continuous: if f(0) ≠ f(1), define \tilde{f} by \tilde{f}(u) = \tilde{f}(1 − u) = f(2u) for 0 ≤ u ≤ 1/2. This \tilde{f} has the same integral as f, and \tilde{f}(0) = \tilde{f}(1). [Figure: f over [0, 1] compressed into [0, 1/2] and mirrored over [1/2, 1].]

For smooth f, this can reduce the variance to O(n^{−4+ε}) (Hickernell 2002). The resulting \tilde{f} is symmetric with respect to u = 1/2. In practice, we transform the points U_i instead of f.
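Transforming the points instead of f, as the slide recommends, is a one-line tent map per coordinate. A minimal sketch (function name is ours):

```python
def baker(point):
    """Tent (baker's) transformation applied coordinatewise:
    u_j -> min(2*u_j, 2*(1 - u_j)).  Applied to the points after the
    random shift, it is equivalent to replacing f by its folded version."""
    return [min(2.0 * x, 2.0 * (1.0 - x)) for x in point]
```

The map is symmetric about 1/2 (points u and 1 − u land on the same value), which is exactly what makes the shifted points locally antithetic in the one-dimensional picture that follows.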

Slide 51: One-dimensional case

Random shift followed by baker's transformation: along each coordinate, stretch everything by a factor of 2 and fold. This is the same as replacing U_j by min[2U_j, 2(1 − U_j)]. [Figure: the shifted points before and after folding.]

This gives locally antithetic points in intervals of size 2/n, which implies that linear pieces over these intervals are integrated exactly. Intuition: when f is smooth, it is well-approximated by a piecewise linear function, which is integrated exactly, so the error is small.

Slide 52: ANOVA decomposition

The Fourier expansion has too many terms to handle. As a cruder expansion, we can write f(u) = f(u_1, ..., u_s) as
$$
f(\mathbf{u}) = \sum_{\mathfrak{u} \subseteq \{1,\dots,s\}} f_{\mathfrak{u}}(\mathbf{u})
= \mu + \sum_{i=1}^{s} f_{\{i\}}(u_i) + \sum_{i,j=1}^{s} f_{\{i,j\}}(u_i, u_j) + \cdots,
$$
where
$$
f_{\mathfrak{u}}(\mathbf{u}) = \int_{[0,1)^{|\bar{\mathfrak{u}}|}} f(\mathbf{u})\, d\mathbf{u}_{\bar{\mathfrak{u}}} - \sum_{\mathfrak{v} \subset \mathfrak{u}} f_{\mathfrak{v}}(\mathbf{u}_{\mathfrak{v}}),
$$
and the Monte Carlo variance decomposes as σ² = Σ_{u⊆{1,...,s}} σ_u², where σ_u² = Var[f_u(U)]. The σ_u²'s can be estimated by MC or RQMC. Heuristic intuition: make sure the projections P_n(u) are very uniform for the important subsets u (i.e., those with larger σ_u²).

Slide 53: Weighted P_{γ,α} with projection-dependent weights γ_u

Denote by u(h) = u(h_1, ..., h_s) the set of indices j for which h_j ≠ 0. Define
$$
P_{\gamma,\alpha} = \sum_{\mathbf{0} \ne \mathbf{h} \in L_s^*} \gamma_{\mathfrak{u}(\mathbf{h})} (\max(1, |h_1|) \cdots \max(1, |h_s|))^{-\alpha}.
$$
For α/2 integer > 0, with u_i = (u_{i,1}, ..., u_{i,s}) = i v_1 mod 1,
$$
P_{\gamma,\alpha} = \sum_{\emptyset \ne \mathfrak{u} \subseteq \{1,\dots,s\}} \gamma_{\mathfrak{u}} \left[ \frac{-(-4\pi^2)^{\alpha/2}}{\alpha!} \right]^{|\mathfrak{u}|} \frac{1}{n} \sum_{i=0}^{n-1} \prod_{j \in \mathfrak{u}} B_\alpha(u_{i,j}),
$$
and the corresponding variation is
$$
V_\gamma^2(f) = \sum_{\emptyset \ne \mathfrak{u} \subseteq \{1,\dots,s\}} \frac{1}{\gamma_{\mathfrak{u}} (4\pi^2)^{\alpha|\mathfrak{u}|/2}} \int_{[0,1]^{|\mathfrak{u}|}} \left| \frac{\partial^{\alpha|\mathfrak{u}|/2}}{\partial \mathbf{u}^{\alpha/2}} f_{\mathfrak{u}}(\mathbf{u}) \right|^2 d\mathbf{u},
$$
for f : [0, 1)^s → R smooth enough. Then,
$$
\mathrm{Var}[\hat{\mu}_{n,\mathrm{rqmc}}] = \sum_{\mathfrak{u} \subseteq \{1,\dots,s\}} \mathrm{Var}[\hat{\mu}_{n,\mathrm{rqmc}}(f_{\mathfrak{u}})] \le V_\gamma^2(f)\, P_{\gamma,\alpha}.
$$

Slide 54: P_{γ,α} with α = 2 and properly chosen weights γ is a good practical choice of figure of merit. Simple choices of weights: order-dependent or product weights. Lattice Builder: software to search for good lattices with arbitrary n, s, weights, etc. See my web page.

Slide 55: ANOVA variances for the estimator of P[T > x] in a stochastic activity network. [Figure: for x = 64 and x = 100, with and without CMC, the percentage of total variance captured by each cardinality of u.]

Slide 56: Variance for the estimator of P[T > x] for the SAN (x = 64). [Figure: log-log plot of variance vs. n for MC, Sobol, and Lattice (P_2) + baker, with an n^{−2} reference line.] The variance decreases roughly as O(n^{−1.2}). For E[T], we observe O(n^{−1.4}).

Slide 57: Variance for the estimator of P[T > x] with CMC (x = 64). [Figure: log-log plot of variance vs. n for MC, Sobol, and Lattice (P_2) + baker, with an n^{−2} reference line.]
