SLIDE 1

Tutorial on quasi-Monte Carlo methods

Josef Dick

School of Mathematics and Statistics, UNSW, Sydney, Australia
josef.dick@unsw.edu.au

SLIDE 2

Comparison: MCMC, MC, QMC

Roughly speaking:

  • Markov chain Monte Carlo and quasi-Monte Carlo are for different types of problems;
  • If you have a problem where Monte Carlo does not work, then chances are quasi-Monte Carlo will not work either;
  • If Monte Carlo works, but you want a faster method ⇒ try (randomized) quasi-Monte Carlo (some tweaking might be necessary);
  • Quasi-Monte Carlo is an "experimental design" approach to Monte Carlo simulation.

In this talk we shall discuss how quasi-Monte Carlo can be faster than Monte Carlo under certain assumptions.

SLIDE 3

Outline

  • Discrepancy, reproducing kernel Hilbert spaces and worst-case error
  • Quasi-Monte Carlo point sets
  • Randomizations
  • Weighted function spaces and tractability

SLIDE 4

The task is to approximate an integral

$$I_s(f) = \int_{[0,1]^s} f(z)\,dz$$

for some integrand $f$ by a quadrature rule

$$Q_{N,s}(f) = \frac{1}{N}\sum_{n=0}^{N-1} f(x_n)$$

at some sample points $x_0, \dots, x_{N-1} \in [0,1]^s$.

SLIDE 5

$$\int_{[0,1]^s} f(x)\,dx \approx \frac{1}{N}\sum_{n=0}^{N-1} f(x_n)$$

In other words: Area under curve = Volume of cube × average function value.

  • If $x_0, \dots, x_{N-1} \in [0,1]^s$ are chosen randomly ⇒ Monte Carlo
  • If $x_0, \dots, x_{N-1} \in [0,1]^s$ are chosen deterministically ⇒ Quasi-Monte Carlo
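To make the two estimators concrete, here is a minimal sketch in Python/NumPy (the test integrand and the point count are illustrative choices; the deterministic points use the Fibonacci lattice rule with N = 55 and g = (1, 34) that appears later in the talk):

```python
import numpy as np

rng = np.random.default_rng(1)

def quadrature(f, points):
    # Q_{N,s}(f) = (1/N) * sum_{n=0}^{N-1} f(x_n)
    return f(points).mean()

# Test integrand on [0,1]^2 with exact integral 1.
f = lambda x: np.prod(1.0 + (x - 0.5), axis=1)

N, s = 55, 2
mc_points = rng.random((N, s))                                      # Monte Carlo: random points
qmc_points = (np.arange(N)[:, None] * np.array([1, 34]) / N) % 1.0  # quasi-Monte Carlo: lattice
print(quadrature(f, mc_points), quadrature(f, qmc_points))
```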

SLIDE 6

Smooth integrands

  • Integrand $f : [0,1] \to \mathbb{R}$; say continuously differentiable;
  • We want to study the integration error:

$$\int_0^1 f(x)\,dx - \frac{1}{N}\sum_{n=0}^{N-1} f(x_n).$$

  • Representation:

$$f(x) = f(1) - \int_x^1 f'(t)\,dt = f(1) - \int_0^1 1_{[0,t]}(x) f'(t)\,dt,$$

where

$$1_{[0,t]}(x) = \begin{cases} 1 & \text{if } x \in [0,t], \\ 0 & \text{otherwise.} \end{cases}$$
SLIDE 7

Integration error

  • Substitute:

$$\int_0^1 f(x)\,dx = \int_0^1 \left( f(1) - \int_0^1 1_{[0,t]}(x) f'(t)\,dt \right) dx = f(1) - \int_0^1 \int_0^1 1_{[0,t]}(x)\,dx\; f'(t)\,dt,$$

$$\frac{1}{N}\sum_{n=0}^{N-1} f(x_n) = \frac{1}{N}\sum_{n=0}^{N-1} \left( f(1) - \int_0^1 1_{[0,t]}(x_n) f'(t)\,dt \right) = f(1) - \int_0^1 \frac{1}{N}\sum_{n=0}^{N-1} 1_{[0,t]}(x_n)\, f'(t)\,dt.$$

  • Integration error:

$$\int_0^1 f(x)\,dx - \frac{1}{N}\sum_{n=0}^{N-1} f(x_n) = \int_0^1 \left( \frac{1}{N}\sum_{n=0}^{N-1} 1_{[0,t]}(x_n) - \int_0^1 1_{[0,t]}(x)\,dx \right) f'(t)\,dt.$$
SLIDE 8

Local discrepancy

Integration error (using $\int_0^1 1_{[0,t]}(x)\,dx = t$):

$$\int_0^1 f(x)\,dx - \frac{1}{N}\sum_{n=0}^{N-1} f(x_n) = \int_0^1 \left( \frac{1}{N}\sum_{n=0}^{N-1} 1_{[0,t]}(x_n) - t \right) f'(t)\,dt.$$

Let $P = \{x_0, \dots, x_{N-1}\} \subset [0,1]$. Define the local discrepancy by

$$\Delta_P(t) = \frac{1}{N}\sum_{n=0}^{N-1} 1_{[0,t]}(x_n) - t, \qquad t \in [0,1].$$

Then

$$\int_0^1 f(x)\,dx - \frac{1}{N}\sum_{n=0}^{N-1} f(x_n) = \int_0^1 \Delta_P(t) f'(t)\,dt.$$
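As a quick illustration (a sketch, not part of the slides), the local discrepancy of a one-dimensional point set can be evaluated directly from this definition:

```python
import numpy as np

def local_discrepancy(points, t):
    # Delta_P(t) = (1/N) * #{n : x_n <= t} - t
    return (np.asarray(points) <= t).mean() - t

# Equidistant points {0, 1/N, ..., (N-1)/N} satisfy |Delta_P(t)| <= 1/N for all t.
P = np.arange(8) / 8
print(max(abs(local_discrepancy(P, t)) for t in np.linspace(0, 1, 1001)))  # 0.125
```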

SLIDE 9

Koksma-Hlawka inequality

By Hölder's inequality,

$$\left| \int_0^1 f(x)\,dx - \frac{1}{N}\sum_{n=0}^{N-1} f(x_n) \right| = \left| \int_0^1 \Delta_P(t) f'(t)\,dt \right| \le \left( \int_0^1 |\Delta_P(t)|^p\,dt \right)^{1/p} \left( \int_0^1 |f'(t)|^q\,dt \right)^{1/q} = \|\Delta_P\|_{L_p} \|f'\|_{L_q},$$

where $\frac{1}{p} + \frac{1}{q} = 1$ and $\|g\|_{L_p} = \left( \int_0^1 |g|^p \right)^{1/p}$.
SLIDE 10

Interpretation of discrepancy

Let $P = \{x_0, \dots, x_{N-1}\} \subset [0,1]$. Recall the definition of the local discrepancy:

$$\Delta_P(t) = \frac{1}{N}\sum_{n=0}^{N-1} 1_{[0,t]}(x_n) - t, \qquad t \in [0,1].$$

The local discrepancy measures the difference between the uniform distribution and the empirical distribution of the quadrature points $P = \{x_0, \dots, x_{N-1}\}$. Its supremum over $t$ is the Kolmogorov-Smirnov statistic for the difference between the empirical distribution of $\{x_0, \dots, x_{N-1}\}$ and the uniform distribution.

SLIDE 11

Function space

Representation:

$$f(x) = f(1) - \int_x^1 f'(t)\,dt.$$

Define the inner product

$$\langle f, g \rangle = f(1)g(1) + \int_0^1 f'(t)g'(t)\,dt$$

and the norm

$$\|f\| = \sqrt{ |f(1)|^2 + \int_0^1 |f'(t)|^2\,dt }.$$

Function space: $H = \{ f : [0,1] \to \mathbb{R} : f \text{ absolutely continuous and } \|f\| < \infty \}$.

SLIDE 12

Worst-case error

The worst-case error is defined by

$$e(H, P) = \sup_{f \in H,\, \|f\| \le 1} \left| \int_0^1 f(x)\,dx - \frac{1}{N}\sum_{n=0}^{N-1} f(x_n) \right|.$$
SLIDE 13

Reproducing kernel Hilbert space

Recall:

$$f(y) = f(1) \cdot 1 + \int_0^1 f'(x) \left( -1_{[0,x]}(y) \right) dx \qquad \text{and} \qquad \langle f, g \rangle = f(1)g(1) + \int_0^1 f'(x)g'(x)\,dx.$$

Goal: Find a function $g_y \in H$ for each $y \in [0,1]$ such that $\langle f, g_y \rangle = f(y)$ for all $f \in H$. Conclusions:

  • $g_y(1) = 1$ for all $y \in [0,1]$;
  • $g_y'(x) = -1_{[0,x]}(y) = \begin{cases} -1 & \text{if } y \le x, \\ 0 & \text{otherwise;} \end{cases}$
  • Make $g_y$ continuous such that $g_y \in H$.
SLIDE 14

Reproducing kernel Hilbert space

It follows that $g_y(x) = 1 + \min\{1 - x, 1 - y\}$. The function

$$K(x, y) := g_y(x) = 1 + \min\{1 - x, 1 - y\}, \qquad x, y \in [0,1],$$

is called the reproducing kernel. The space $H$ is called a reproducing kernel Hilbert space (with reproducing kernel $K$).

SLIDE 15

Numerical integration in reproducing kernel Hilbert spaces

Function representation:

$$\int_0^1 f(z)\,dz = \int_0^1 \langle f, K(\cdot, z)\rangle\,dz = \left\langle f, \int_0^1 K(\cdot, z)\,dz \right\rangle,$$

$$\frac{1}{N}\sum_{n=0}^{N-1} f(x_n) = \frac{1}{N}\sum_{n=0}^{N-1} \langle f, K(\cdot, x_n)\rangle = \left\langle f, \frac{1}{N}\sum_{n=0}^{N-1} K(\cdot, x_n) \right\rangle.$$

  • Integration error:

$$\int_0^1 f(z)\,dz - \frac{1}{N}\sum_{n=0}^{N-1} f(x_n) = \left\langle f,\; \int_0^1 K(\cdot, z)\,dz - \frac{1}{N}\sum_{n=0}^{N-1} K(\cdot, x_n) \right\rangle = \langle f, h \rangle,$$

where $h(x) = \int_0^1 K(x, z)\,dz - \frac{1}{N}\sum_{n=0}^{N-1} K(x, x_n)$.

SLIDE 16

Worst-case error in reproducing kernel Hilbert spaces

Thus

$$e(H, P) = \sup_{f \in H,\, \|f\| \le 1} \left| \int_0^1 f(z)\,dz - \frac{1}{N}\sum_{n=0}^{N-1} f(x_n) \right| = \sup_{f \in H,\, \|f\| = 1} |\langle f, h \rangle| = \sup_{f \in H,\, f \ne 0} \frac{\langle f, h \rangle}{\|f\|} = \frac{\langle h, h \rangle}{\|h\|} = \|h\|,$$

since the supremum is attained when choosing $f = \frac{h}{\|h\|} \in H$.

SLIDE 17

Worst-case error in reproducing kernel Hilbert spaces

$$e^2(H, P) = \int_0^1 \int_0^1 K(x, y)\,dx\,dy - \frac{2}{N}\sum_{n=0}^{N-1} \int_0^1 K(x, x_n)\,dx + \frac{1}{N^2}\sum_{n,m=0}^{N-1} K(x_n, x_m).$$
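For the kernel $K(x,y) = 1 + \min\{1-x, 1-y\}$ both integrals in this formula have closed forms ($\int_0^1\int_0^1 K(x,y)\,dx\,dy = 4/3$ and $\int_0^1 K(x,y)\,dx = 1 + (1-y) - (1-y)^2/2$; these are worked out here, not stated on the slide), so the worst-case error of any point set can be evaluated exactly. A minimal sketch:

```python
import numpy as np

def wce_squared(points):
    # e^2(H, P) for K(x, y) = 1 + min(1 - x, 1 - y) on [0, 1].
    x = np.asarray(points, dtype=float)
    term1 = 4.0 / 3.0                                           # int int K dx dy
    term2 = (1.0 + (1.0 - x) - (1.0 - x) ** 2 / 2.0).mean()     # mean of int K(., x_n) dx
    term3 = (1.0 + np.minimum.outer(1.0 - x, 1.0 - x)).mean()   # (1/N^2) double sum
    return term1 - 2.0 * term2 + term3

# Midpoint points have worst-case error of order 1/N in this space.
for N in (8, 16, 32):
    print(N, np.sqrt(wce_squared((np.arange(N) + 0.5) / N)))
```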

SLIDE 18

Numerical integration in higher dimensions

  • Tensor product space $H_s = H \times \cdots \times H$.
  • Reproducing kernel

$$K(x, y) = \prod_{i=1}^s \left[ 1 + \min\{1 - x_i, 1 - y_i\} \right],$$

where $x = (x_1, \dots, x_s)$, $y = (y_1, \dots, y_s) \in [0,1]^s$.

  • Functions $f \in H_s$ have mixed partial derivatives up to order 1 in each variable,

$$\frac{\partial^{|u|} f}{\partial x_u} \in L_2([0,1]^s),$$

where $u \subseteq \{1, \dots, s\}$, $x_u = (x_i)_{i \in u}$ and $|u|$ denotes the cardinality of $u$, and where $\frac{\partial^{|u|} f}{\partial x_u}(x_u, 1) = 0$ for all proper subsets $u \subset \{1, \dots, s\}$ (here $(x_u, 1)$ denotes the point whose $i$th coordinate is $x_i$ for $i \in u$ and $1$ otherwise).

SLIDE 19

Worst-case error

Again

$$e^2(H, P) = \int_{[0,1]^s} \int_{[0,1]^s} K(x, y)\,dx\,dy - \frac{2}{N}\sum_{n=0}^{N-1} \int_{[0,1]^s} K(x, x_n)\,dx + \frac{1}{N^2}\sum_{n,m=0}^{N-1} K(x_n, x_m),$$

and the integration error has the representation

$$\int_{[0,1]^s} f(x)\,dx - \frac{1}{N}\sum_{n=0}^{N-1} f(x_n) = \int_{[0,1]^s} \Delta_P(x)\, \frac{\partial^s f}{\partial x}(x)\,dx,$$

where

$$\Delta_P(x) = \frac{1}{N}\sum_{n=0}^{N-1} 1_{[0,x]}(x_n) - \prod_{i=1}^s x_i.$$

SLIDE 20

Discrepancy in higher dimensions

Point set $P = \{x_0, \dots, x_{N-1}\} \subset [0,1]^s$, $t = (t_1, \dots, t_s) \in [0,1]^s$.

  • Local discrepancy:

$$\Delta_P(t) = \frac{1}{N}\sum_{n=0}^{N-1} 1_{[0,t]}(x_n) - \prod_{i=1}^s t_i, \qquad \text{where } [0, t] = \prod_{i=1}^s [0, t_i].$$

SLIDE 21

Koksma-Hlawka inequality

Let $f : [0,1]^s \to \mathbb{R}$ with

$$\|f\| = \left( \int_{[0,1]^s} \left| \frac{\partial^s f}{\partial x}(x) \right|^q dx \right)^{1/q},$$

and where $\frac{\partial^{|u|} f}{\partial x_u}(x_u, 1) = 0$ for all proper subsets $u \subset \{1, \dots, s\}$. Then

$$\left| \int_{[0,1]^s} f(x)\,dx - \frac{1}{N}\sum_{n=0}^{N-1} f(x_n) \right| \le \|f\|\, \|\Delta_P\|_{L_p}, \qquad \text{where } \frac{1}{p} + \frac{1}{q} = 1.$$

⇒ Construct points $P = \{x_0, \dots, x_{N-1}\} \subset [0,1]^s$ with small discrepancy $L_p(P) = \|\Delta_P\|_{L_p}$.

It is often useful to consider different reproducing kernels, yielding different worst-case errors and discrepancies. We will see further examples later.
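For $p = 2$ the discrepancy can be computed exactly from the points via Warnock's formula, $L_2^2(P) = 3^{-s} - \frac{2}{N}\sum_{n}\prod_{i}\frac{1 - x_{n,i}^2}{2} + \frac{1}{N^2}\sum_{n,m}\prod_{i}\left(1 - \max\{x_{n,i}, x_{m,i}\}\right)$, which follows by expanding $\int_{[0,1]^s}\Delta_P^2(t)\,dt$. A short sketch (the formula is standard, though not stated on the slide):

```python
import numpy as np

def l2_discrepancy(points):
    # Warnock's formula for the L_2 discrepancy of P in [0,1]^s.
    x = np.asarray(points, dtype=float)
    N, s = x.shape
    term1 = 3.0 ** (-s)                                  # int (t_1 ... t_s)^2 dt
    term2 = np.prod((1.0 - x**2) / 2.0, axis=1).mean()
    term3 = np.prod(1.0 - np.maximum(x[:, None, :], x[None, :, :]), axis=2).mean()
    return np.sqrt(term1 - 2.0 * term2 + term3)

print(l2_discrepancy(np.random.default_rng(0).random((64, 2))))  # random points, for comparison
```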

SLIDE 22

Constructions of low-discrepancy sequences

  • Lattice rules (Bilyk, Brauchart, Cools, D., Hellekalek, Hickernell, Hlawka, Joe, Keller, Korobov, Kritzer, Kuo, Larcher, L'Ecuyer, Lemieux, Leobacher, Niederreiter, Nuyens, Pillichshammer, Sinescu, Sloan, Temlyakov, Wang, Woźniakowski, ...)
  • Digital nets and sequences (Baldeaux, Bierbrauer, Brauchart, Chen, D., Edel, Faure, Hellekalek, Hofer, Keller, Kritzer, Kuo, Larcher, Leobacher, Niederreiter, Owen, Özbudak, Pillichshammer, Pirsic, Schmid, Skriganov, Sobol', Wang, Xing, Yue, ...)
  • Hammersley-Halton sequences (Atanassov, De Clerck, Faure, Halton, Hammersley, Kritzer, Larcher, Lemieux, Pillichshammer, Pirsic, White, ...)
  • Kronecker sequences (Beck, Hellekalek, Larcher, Niederreiter, Schoissengeier, ...)

SLIDE 23

Lattice rules

Let $N \in \mathbb{N}$ and let $g = (g_1, \dots, g_s) \in \{1, \dots, N-1\}^s$. Choose the quadrature points as

$$x_n = \left\{ \frac{n g}{N} \right\} \qquad \text{for } n = 0, \dots, N-1,$$

where $\{z\} = z - \lfloor z \rfloor$ for $z \in \mathbb{R}_0^+$, applied componentwise.
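Generating a rank-1 lattice point set takes only a few lines; this sketch reproduces the Fibonacci lattice shown on the next slide (N = 55 and g = (1, 34) are the slide's values):

```python
import numpy as np

def lattice_points(g, N):
    # x_n = {n g / N}, n = 0, ..., N-1, fractional part taken componentwise.
    return (np.arange(N)[:, None] * np.asarray(g)[None, :] / N) % 1.0

fib = lattice_points((1, 34), 55)   # the Fibonacci lattice rule below
```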

SLIDE 24

Fibonacci lattice rules

Lattice rule with N = 55 points and generating vector g = (1, 34).

SLIDE 25

Fibonacci lattice rules

Lattice rule with N = 89 points and generating vector g = (1, 55).

SLIDE 26

In comparison: Random point set

Random set of 64 points generated by Matlab using Mersenne Twister.

SLIDE 27

Lattice rules

How do we find the generating vector $g$? Reproducing kernel:

$$K(x, y) = \prod_{i=1}^s \left( 1 + 2\pi B_\alpha(\{x_i - y_i\}) \right),$$

where $\{x_i - y_i\} = (x_i - y_i) - \lfloor x_i - y_i \rfloor$ is the fractional part of $x_i - y_i$. This is the reproducing kernel of a Hilbert space of Fourier series:

$$f(x) = \sum_{h \in \mathbb{Z}^s} \hat{f}(h)\, e^{2\pi i h \cdot x}.$$
SLIDE 28

Worst-case error:

$$e_\alpha^2(g) = -1 + \frac{1}{N}\sum_{n=0}^{N-1} \prod_{i=1}^s \left( 1 + 2\pi B_\alpha\left( \left\{ \frac{n g_i}{N} \right\} \right) \right),$$

where $B_\alpha$ is the Bernoulli polynomial of order $\alpha$. For instance, $B_2(x) = x^2 - x + 1/6$.

SLIDE 29

Component-by-component construction (Korobov, Sloan-Reztsov, Sloan-Kuo-Joe, Nuyens-Cools)

  • Set $g_1 = 1$.
  • For $d = 2, \dots, s$, assume that we have found $g_2, \dots, g_{d-1}$. Then find $g_d \in \{1, \dots, N-1\}$ which minimizes $e_\alpha^2(g_1, \dots, g_{d-1}, g)$ as a function of $g$, i.e.

$$g_d = \underset{g \in \{1, \dots, N-1\}}{\operatorname{argmin}}\; e_\alpha^2(g_1, \dots, g_{d-1}, g).$$

Using the fast Fourier transform (Nuyens, Cools), a good generating vector $g \in \{1, \dots, N-1\}^s$ can be found in $O(sN \log N)$ operations.
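The naive search is easy to write down. The sketch below uses the kernel weight $2\pi^2 B_2(\cdot)$ (the constant $2\pi^2$ is the usual Fourier-series normalization for $B_2$ and is an assumption here, since the slides leave it implicit) and costs $O(sN^2)$ rather than the $O(sN\log N)$ of fast CBC:

```python
import numpy as np

def B2(x):
    # Bernoulli polynomial B_2(x) = x^2 - x + 1/6
    return x * x - x + 1.0 / 6.0

def cbc(N, s):
    # Naive O(s N^2) component-by-component construction: extend the
    # generating vector one coordinate at a time, greedily minimizing
    # e^2(g) = -1 + (1/N) sum_n prod_i (1 + 2 pi^2 B_2({n g_i / N})).
    n = np.arange(N)
    omega = lambda gi: 1.0 + 2.0 * np.pi**2 * B2((n * gi % N) / N)
    g = [1]
    prod = omega(1)                    # running product over the chosen components
    for _ in range(1, s):
        errs = [-1.0 + np.mean(prod * omega(c)) for c in range(1, N)]
        g.append(1 + int(np.argmin(errs)))
        prod = prod * omega(g[-1])
    return g

print(cbc(127, 5))
```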

SLIDE 30

Hammersley-Halton sequence

Radical inverse function in base $b$:

  • Let $n \in \mathbb{N}_0$ have base $b$ expansion $n = n_0 + n_1 b + \cdots + n_{a-1} b^{a-1}$.
  • Set

$$\phi_b(n) = \frac{n_0}{b} + \frac{n_1}{b^2} + \cdots + \frac{n_{a-1}}{b^a} \in [0, 1].$$

Hammersley-Halton sequence:

  • Let $\{p_1, p_2, \dots\}$ be the set of prime numbers in increasing order, i.e. $p_1 = 2$, $p_2 = 3$, $p_3 = 5$, $p_4 = 7, \dots$.
  • Define quadrature points $x_0, x_1, \dots$ by

$$x_n = (\phi_{p_1}(n), \phi_{p_2}(n), \dots, \phi_{p_s}(n)) \qquad \text{for } n = 0, 1, 2, \dots.$$
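A direct transcription of these definitions (the trial-division prime generator is a convenience added here):

```python
import numpy as np

def radical_inverse(n, b):
    # phi_b(n): mirror the base-b digits of n about the decimal point.
    x, base = 0.0, 1.0 / b
    while n > 0:
        n, d = divmod(n, b)
        x += d * base
        base /= b
    return x

def halton(N, s):
    # First N Hammersley-Halton points: coordinate i uses the i-th prime base.
    primes, c = [], 2
    while len(primes) < s:
        if all(c % p for p in primes):   # c is prime iff no smaller prime divides it
            primes.append(c)
        c += 1
    return np.array([[radical_inverse(n, p) for p in primes] for n in range(N)])

pts = halton(64, 2)   # the point set shown on the next slide
```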

SLIDE 31

Hammersley-Halton sequence

Hammersley-Halton point set with 64 points.

SLIDE 32

Digital nets

  • Choose a prime number $b$ and the finite field $\mathbb{Z}_b = \{0, 1, \dots, b-1\}$ of order $b$.
  • Choose $C_1, \dots, C_s \in \mathbb{Z}_b^{m \times m}$.
  • Let $n = n_0 + n_1 b + \cdots + n_{m-1} b^{m-1}$ and set $\vec{n} = (n_0, \dots, n_{m-1})^\top \in \mathbb{Z}_b^m$.
  • Let $\vec{y}_{n,i} = C_i \vec{n}$ for $1 \le i \le s$, $0 \le n < b^m$.
  • For $\vec{y}_{n,i} = (y_{n,i,1}, \dots, y_{n,i,m})^\top \in \mathbb{Z}_b^m$ let

$$x_{n,i} = \frac{y_{n,i,1}}{b} + \cdots + \frac{y_{n,i,m}}{b^m}.$$

  • Set $x_n = (x_{n,1}, \dots, x_{n,s})$ for $0 \le n < b^m$.
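A direct transcription of the construction in base b = 2. The choice of generating matrices (the identity together with the upper-triangular Pascal matrix mod 2, a classical pair) is an illustrative assumption, not taken from the slides; it yields a (0, m, 2)-net in base 2:

```python
import math
import numpy as np

def digital_net_b2(C, m):
    # Digital net in base 2: the binary digits of x_{n,i} are C_i * n (mod 2),
    # where n = (n_0, ..., n_{m-1}) is the vector of base-2 digits of n.
    weights = 2.0 ** -np.arange(1, m + 1)          # 1/2, 1/4, ..., 1/2^m
    pts = np.empty((2**m, len(C)))
    for n in range(2**m):
        nvec = np.array([(n >> k) & 1 for k in range(m)])
        for i, Ci in enumerate(C):
            y = Ci.dot(nvec) % 2                   # y_{n,i} = C_i n over Z_2
            pts[n, i] = y.dot(weights)             # x_{n,i} = y_1/2 + ... + y_m/2^m
    return pts

m = 6
I = np.eye(m, dtype=int)
P = np.array([[math.comb(j, i) % 2 for j in range(m)] for i in range(m)])
pts = digital_net_b2([I, P], m)                    # 64 points in [0,1]^2
```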
SLIDE 33

(t, m, s)-net property

Let $m, s \ge 1$ and $b \ge 2$ be integers. A point set $P = \{x_0, \dots, x_{b^m - 1}\}$ is called a $(t, m, s)$-net in base $b$ if, for all integers $d_1, \dots, d_s \ge 0$ with $d_1 + \cdots + d_s = m - t$, the number of points in each of the elementary intervals

$$\prod_{i=1}^s \left[ \frac{a_i}{b^{d_i}}, \frac{a_i + 1}{b^{d_i}} \right), \qquad \text{where } 0 \le a_i < b^{d_i},$$

is $b^t$.

SLIDE 34

If $P = \{x_0, \dots, x_{b^m - 1}\} \subset [0,1]^s$ is a $(t, m, s)$-net, then the local discrepancy function satisfies

$$\Delta_P\left( \frac{a_1}{b^{d_1}}, \dots, \frac{a_s}{b^{d_s}} \right) = 0$$

for all $0 \le a_i < b^{d_i}$, $d_i \ge 0$, $d_1 + \cdots + d_s = m - t$.
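For small parameters the net property can be verified by brute force. A sketch (`is_tms_net` is a hypothetical helper, not from the slides; `pts` is the base-2 digital net from the previous sketch):

```python
from itertools import product

def is_tms_net(points, t, m, s, b=2):
    # Check: every elementary interval of volume b^(t-m) contains exactly b^t points.
    for d in product(range(m - t + 1), repeat=s):
        if sum(d) != m - t:
            continue
        for a in product(*(range(b**di) for di in d)):
            # x in [a_i/b^{d_i}, (a_i+1)/b^{d_i})  <=>  a_i <= x_i * b^{d_i} < a_i + 1
            count = sum(all(a[i] <= x[i] * b**d[i] < a[i] + 1 for i in range(s))
                        for x in points)
            if count != b**t:
                return False
    return True

print(is_tms_net(pts, t=0, m=6, s=2))   # True for the identity/Pascal net above
```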

SLIDES 35-46

(Figures only; no recoverable text content.)
SLIDE 47

Sobol point set

The first 64 points of a Sobol sequence which form a (0, 6, 2)-net in base 2.

SLIDE 48

Niederreiter-Xing point set

The first 64 points of a Niederreiter-Xing sequence.

SLIDE 49

Niederreiter-Xing point set

The first 1024 points of a Niederreiter-Xing sequence.

SLIDE 50

Kronecker sequence

Let $\alpha_1, \dots, \alpha_s \in \mathbb{R}$ be such that $1, \alpha_1, \dots, \alpha_s$ are linearly independent over $\mathbb{Q}$. Let

$$x_n = (\{n\alpha_1\}, \dots, \{n\alpha_s\}) \qquad \text{for } n = 0, 1, 2, \dots.$$

For instance, one can choose $\alpha_i = \sqrt{p_i}$, where $p_i$ is the $i$th prime number.
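A sketch of the construction (with a numerical caveat, added here: for large n the fractional part $\{n\sqrt{p_i}\}$ loses accuracy in double-precision arithmetic):

```python
import numpy as np

def kronecker(N, s):
    # x_n = ({n sqrt(p_1)}, ..., {n sqrt(p_s)}) with p_i the i-th prime.
    primes, c = [], 2
    while len(primes) < s:
        if all(c % p for p in primes):
            primes.append(c)
        c += 1
    alpha = np.sqrt(np.array(primes, dtype=float))
    return (np.arange(N)[:, None] * alpha[None, :]) % 1.0

pts = kronecker(64, 2)   # the point set shown on the next slide
```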

SLIDE 51

Kronecker sequence

The first 64 points of a Kronecker sequence.

SLIDE 52

Discrepancy bounds

  • It is known that for all the constructions above one has

$$L_p(P) \le C_s \frac{(\log N)^{c(s)}}{N} \qquad \text{for all } N \ge 1,\ 1 \le p \le \infty,$$

where $C_s > 0$ and $c(s) \approx s$.

  • Lower bound: For all point sets $P$ consisting of $N$ points we have

$$L_p(P) \ge C_s' \frac{(\log N)^{(s-1)/2}}{N} \qquad \text{for all } N \ge 1,\ 1 < p \le \infty.$$

In comparison, a random point set satisfies

$$\mathbb{E}(L_2(P)) = O\left( \sqrt{\frac{\log\log N}{N}} \right).$$
SLIDE 53

For $1 < p < \infty$, the exact order of convergence is known to be

$$\frac{(\log N)^{(s-1)/2}}{N}.$$

Explicit constructions of such points were achieved by Chen & Skriganov for $p = 2$ and by Skriganov for $1 < p < \infty$.

Great open problem: What is the correct order of convergence of

$$\min_{\substack{P \subset [0,1]^s \\ |P| = N}} L_\infty(P)?$$

SLIDE 54

Randomized quasi-Monte Carlo

Introduce a random element into a deterministic point set. This has several advantages:

  • it yields an unbiased estimator;
  • there is a statistical error estimate;
  • the random-case error converges faster than the worst-case error;
  • it performs similarly to Monte Carlo for $L_2$ functions, but has a better rate of convergence for smooth functions.

SLIDE 55

Randomly shifted lattice rules

Lattice rules can be randomized using a random shift:

  • Lattice point set: $\{ng/N\}$, $n = 0, 1, \dots, N-1$.
  • Shifted lattice rule: choose $\sigma \in [0,1]^s$ uniformly distributed; the shifted lattice point set is then given by $\{ng/N + \sigma\}$, $n = 0, 1, \dots, N-1$.
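In practice one averages over a handful of independent shifts, which gives both the estimate and a statistical error bar. A minimal sketch (the test integrand and the number of shifts are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)

def shifted_lattice_estimate(f, g, N, n_shifts=10):
    # For each independent uniform shift sigma, average f over {n g / N + sigma};
    # the spread of the per-shift averages yields an unbiased error estimate.
    lattice = (np.arange(N)[:, None] * np.asarray(g)[None, :] / N) % 1.0
    ests = np.array([f((lattice + rng.random(len(g))) % 1.0).mean()
                     for _ in range(n_shifts)])
    return ests.mean(), ests.std(ddof=1) / np.sqrt(n_shifts)

f = lambda x: np.prod(1.0 + (x - 0.5), axis=1)   # exact integral over [0,1]^2 is 1
print(shifted_lattice_estimate(f, g=(1, 34), N=55))
```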

SLIDE 56

Lattice point set

SLIDE 57

Shifted lattice point set

SLIDE 58

Owen's scrambling

  • Let $\mathbb{Z}_b = \{0, 1, \dots, b-1\}$ and let

$$x = \frac{x_1}{b} + \frac{x_2}{b^2} + \cdots, \qquad x_1, x_2, \dots \in \mathbb{Z}_b.$$

  • Randomly choose permutations $\sigma, \sigma_{x_1}, \sigma_{x_1, x_2}, \dots : \mathbb{Z}_b \to \mathbb{Z}_b$.
  • Let

$$y_1 = \sigma(x_1), \quad y_2 = \sigma_{x_1}(x_2), \quad y_3 = \sigma_{x_1, x_2}(x_3), \quad \dots$$

  • Set

$$y = \frac{y_1}{b} + \frac{y_2}{b^2} + \cdots.$$
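In base 2 a permutation of {0, 1} is simply "flip" or "don't flip", so nested scrambling reduces to drawing one random bit per digit-prefix. A sketch under that simplification (digits truncated at m bits; coordinates scrambled independently):

```python
import numpy as np

rng = np.random.default_rng(3)

def owen_scramble_b2(points, m=32):
    # Owen's nested uniform scrambling in base 2: the permutation applied to
    # digit k depends on the preceding digits; here each sigma_{x_1,...,x_{k-1}}
    # is a single random bit-flip, drawn lazily per (depth, prefix) and
    # reused across all points of the set.
    points = np.atleast_2d(points)
    out = np.empty_like(points)
    for i in range(points.shape[1]):
        flips = {}
        for n, x in enumerate(points[:, i]):
            digits = int(x * 2**m) & (2**m - 1)   # first m binary digits of x
            prefix = y = 0
            for k in range(m - 1, -1, -1):        # most significant digit first
                bit = (digits >> k) & 1
                key = (k, prefix)
                if key not in flips:
                    flips[key] = int(rng.integers(2))
                y = (y << 1) | (bit ^ flips[key])
                prefix = (prefix << 1) | bit
            out[n, i] = y / 2.0**m
    return out
```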

SLIDE 59

Scrambled Sobol point set

The first 1024 points of a Sobol sequence (left) and a scrambled Sobol sequence (right).

SLIDE 60

Scrambled net variance

Let

$$e(R) = \int_{[0,1]^s} f(x)\,dx - \frac{1}{N}\sum_{n=0}^{b^m - 1} f(y_n), \qquad N = b^m.$$

Then

  • $\mathbb{E}(e(R)) = 0$;
  • $\mathrm{Var}(e(R)) = O\left( \frac{(\log N)^s}{N^{2\alpha + 1}} \right)$ for integrands with smoothness $0 \le \alpha \le 1$.
SLIDE 61

Dependence on the dimension

Although the convergence rate is better for quasi-Monte Carlo, there is a stronger dependence on the dimension:

  • Monte Carlo: error $= O(N^{-1/2})$;
  • Quasi-Monte Carlo: error $= O\left( \frac{(\log N)^{s-1}}{N} \right)$.

Notice that $g(N) := N^{-1}(\log N)^{s-1}$ is an increasing function for $N \le e^{s-1}$. So if $s$ is large, say $s \approx 30$, then $N^{-1}(\log N)^{29}$ increases for $N \le 10^{12}$.

SLIDE 62

Intractability

For the integration error in $H$ with reproducing kernel

$$K(x, y) = \prod_{i=1}^s \min\{1 - x_i, 1 - y_i\}$$

it is known that

$$e^2(H, P) \ge \left( 1 - N \left( \frac{8}{9} \right)^s \right) e^2(H, \emptyset).$$

⇒ The error can only decrease if $N > \text{constant} \cdot \left( \frac{9}{8} \right)^s$.

SLIDE 63

Weighted function spaces (Sloan and Woźniakowski)

Study weighted function spaces: introduce weights $\gamma_1, \gamma_2, \dots > 0$ and define

$$K(x, y) = \prod_{i=1}^s \left( 1 + \gamma_i \min\{1 - x_i, 1 - y_i\} \right).$$

Then if $\sum_{i=1}^\infty \gamma_i < \infty$, there are point sets $P$ with

$$e(H_{s,\gamma}, P) \le C N^{-\delta},$$

where $C > 0$ is independent of the dimension $s$ and $\delta < 1$.

SLIDE 64

Tractability

  • Minimal error over all methods using $N$ points:

$$e_N^*(H) = \inf_{P : |P| = N} e(H, P);$$

  • Inverse of the error:

$$N^*(s, \varepsilon) = \min\{ N \in \mathbb{N} : e_N^*(H) \le \varepsilon\, e_0^*(H) \} \qquad \text{for } \varepsilon > 0;$$

  • Strong tractability:

$$N^*(s, \varepsilon) \le C \varepsilon^{-\beta}$$

for some constants $C, \beta > 0$ independent of $s$.

Sloan and Woźniakowski: For the function space considered above, we get strong tractability if $\sum_{i=1}^\infty \gamma_i < \infty$.

SLIDE 65

Tractability of the star-discrepancy

Recall the local discrepancy function

$$\Delta_P(t) = \frac{1}{N}\sum_{n=0}^{N-1} 1_{[0,t)}(x_n) - t_1 \cdots t_s,$$

where $P = \{x_0, \dots, x_{N-1}\}$. The star-discrepancy is given by

$$D_N^*(P) = \sup_{t \in [0,1]^s} |\Delta_P(t)|.$$

Result by Heinrich, Novak, Wasilkowski and Woźniakowski: the minimal star-discrepancy satisfies

$$D^*(s, N) = \inf_P D_N^*(P) \le C \sqrt{\frac{s}{N}} \qquad \text{for all } s, N \in \mathbb{N}.$$

This implies that $N^*(s, \varepsilon) \le C s \varepsilon^{-2}$.

SLIDE 66

Extensions, open problems and future research

  • Quasi-Monte Carlo methods which achieve convergence rates of order $N^{-\alpha}(\log N)^{\alpha s}$, $\alpha > 1$, for sufficiently smooth integrands;
  • Construction of point sets whose star-discrepancy is tractable;
  • Completely uniformly distributed point sets to speed up convergence in Markov chain Monte Carlo algorithms;
  • Infinite-dimensional integration: integrands with infinitely many variables;
  • Connections to codes, orthogonal arrays and experimental designs;
  • Quasi-Monte Carlo for function approximation;
  • Choosing the weights in applications;
  • Quasi-Monte Carlo on the sphere.
SLIDE 67

Book

SLIDE 68

Thank You!

Special thanks to

  • Dirk Nuyens, for creating many of the pictures in this talk;
  • Friedrich Pillichshammer, for letting me use a picture of his daughters;
  • Fred Hickernell, Peter Kritzer, Art Owen, and Friedrich Pillichshammer, for helpful comments on the slides;
  • All colleagues who worked on quasi-Monte Carlo methods over the years.