SLIDE 1

Derandomised lattice rules for high dimensional integration

Ian Sloan

i.sloan@unsw.edu.au
University of New South Wales, Sydney, Australia
RICAM, December 2018
Joint with Yoshihito Kazashi (EPFL) and Frances Kuo (UNSW)

SLIDE 2

We want to approximate
$$I_s(f) := \int_0^1 \cdots \int_0^1 f(x_1,\ldots,x_s)\,dx_1\cdots dx_s = \int_{[0,1]^s} f(\mathbf{x})\,d\mathbf{x}.$$

In practice this usually means that some transformation to the unit cube has already been carried out.

And s may be large.

SLIDE 3

Quasi-Monte Carlo (QMC)

For QMC we take
$$I_s(f) \approx Q_{N,s}(f) := \frac{1}{N}\sum_{k=1}^{N} f(\mathbf{t}_k),$$
with $\mathbf{t}_1, \ldots, \mathbf{t}_N$ deterministic (and cleverly chosen).

How to choose $\mathbf{t}_1, \ldots, \mathbf{t}_N$?
- Low-discrepancy points (Halton, Sobol′, Faure, Niederreiter, ...)
- Lattice rules (Korobov, Hlawka, Zaremba, Hua)

SLIDE 4

We consider only the simplest kind of lattice rule, given by
$$Q_{N,s}(\mathbf{z}; f) = \frac{1}{N}\sum_{k=1}^{N} f\!\left(\left\{\frac{k\mathbf{z}}{N}\right\}\right),$$
where $\mathbf{z} \in \{1, \ldots, N-1\}^s = \mathbb{Z}_N^s$, and the braces mean that each component of the $s$-vector in the braces is to be replaced by its fractional part.

The corresponding shifted lattice rule is
$$Q_{N,s}(\mathbf{z}, \boldsymbol{\Delta}; f) = \frac{1}{N}\sum_{k=1}^{N} f\!\left(\left\{\frac{k\mathbf{z}}{N} + \boldsymbol{\Delta}\right\}\right),$$
where $\boldsymbol{\Delta} \in [0,1)^s$.
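
To make the rule concrete, here is a minimal Python sketch (my own illustration, not part of the talk; the function name `shifted_lattice_rule` and the test integrand are assumptions) that evaluates $Q_{N,s}(\mathbf{z},\boldsymbol{\Delta};f)$ directly from the definition above.

```python
import numpy as np

def shifted_lattice_rule(f, N, z, delta=None):
    """Approximate int_{[0,1]^s} f by the (shifted) rank-1 lattice rule
    Q_{N,s}(z, Delta; f) = (1/N) * sum_{k=1}^N f({k z / N + Delta})."""
    z = np.asarray(z, dtype=float)
    if delta is None:
        delta = np.zeros(z.size)                 # unshifted rule
    k = np.arange(1, N + 1)[:, None]             # k = 1, ..., N
    points = np.mod(k * z / N + delta, 1.0)      # braces = fractional part
    return np.mean([f(p) for p in points])

# The 2-d rule shown on the next slide: N = 34, z = (1, 21)
f = lambda x: np.exp(np.sum(x))                  # a smooth test integrand
print(shifted_lattice_rule(f, 34, [1, 21]))
print(shifted_lattice_rule(f, 34, [1, 21], delta=[0.8, 0.1]))
# Exact value of the integral: (e - 1)^2 ≈ 2.9525
```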

SLIDE 5

Example of lattice & shifted lattice rules

Left: N = 34, z = (1, 21) (lattice rule). Right: N = 34, z = (1, 21), ∆ = (0.8, 0.1) (shifted lattice rule).

[Figure: the 34 lattice points in the unit square, unshifted and shifted.]

SLIDE 6

How to choose z and ∆?

Recall the shifted lattice rule:
$$Q^{\mathrm{lattice}}_{N,s,\mathbf{z},\boldsymbol{\Delta}}(f) = \frac{1}{N}\sum_{k=1}^{N} f\!\left(\left\{\frac{k\mathbf{z}}{N} + \boldsymbol{\Delta}\right\}\right), \qquad \boldsymbol{\Delta} \in [0,1)^s.$$

Here are three possible strategies:
1. Choose ∆ randomly, from a uniform distribution on $[0,1]^s$. Then find the best possible choice of $\mathbf{z}$, by finding good values of $z_1, z_2, \ldots, z_s$ one after the other ("CBC"), by minimising some worst-case error.
2. Make an inspired choice of ∆, e.g. ∆ = 0. Then choose $z_1, z_2, \ldots, z_s$ by CBC.
3. Choose $z_1, \Delta_1, z_2, \Delta_2, \ldots, z_s, \Delta_s$ by a more general CBC.

SLIDE 7

Choosing ∆ randomly

There are big advantages in choosing ∆ randomly:
- Like Monte Carlo, this family is an unbiased estimator of $I_s(f)$.
- For free we get an estimate of the error: in practice we choose, say, 25 random shifts and evaluate the lattice rule with a given generating vector $\mathbf{z}$ for each shift. Then the mean gives an estimate of the integral, and the spread gives an estimate of the error.
- A good $\mathbf{z}$ can be constructed by minimising the shift-averaged worst-case error over $z_1, z_2, \ldots, z_s$ successively ("CBC"). There is excellent theoretical justification (Kuo) and fast implementation (Nuyens).

(More about randomly shifted lattice rules later.)

SLIDE 8

Worst-case error

The worst-case error of a given QMC rule with points $\mathbf{t}_1, \mathbf{t}_2, \ldots, \mathbf{t}_N$, in some Hilbert space $H_s$, is
$$e_s(N; \mathbf{t}_1, \ldots, \mathbf{t}_N) := \sup_{f \in H_s,\ \|f\|_{H_s} \le 1} \left| \int_{[0,1)^s} f(\mathbf{x})\,d\mathbf{x} - \frac{1}{N}\sum_{k=1}^{N} f(\mathbf{t}_k) \right|.$$

SLIDE 9

Choosing the function space Hs

We take the space $H_s$ to be a reproducing kernel Hilbert space with a kernel $K$:
$$K(\cdot,\mathbf{x}) \in H_s \ \ \forall\, \mathbf{x} \in [0,1)^s, \qquad K(\mathbf{x},\mathbf{y}) = K(\mathbf{y},\mathbf{x}), \qquad (f, K(\cdot,\mathbf{x}))_{H_s} = f(\mathbf{x}) \ \ \forall f \in H_s,\ \forall \mathbf{x}.$$

For every RKHS it is well known that
$$e_s^2(N; \mathbf{t}_1, \ldots, \mathbf{t}_N) = \int_{[0,1]^s}\int_{[0,1]^s} K(\mathbf{x},\mathbf{y})\,d\mathbf{x}\,d\mathbf{y} - \frac{2}{N}\sum_{k=1}^{N}\int_{[0,1]^s} K(\mathbf{t}_k,\mathbf{x})\,d\mathbf{x} + \frac{1}{N^2}\sum_{k=1}^{N}\sum_{k'=1}^{N} K(\mathbf{t}_k,\mathbf{t}_{k'}).$$

SLIDE 10

Specialising:

In particular we choose the RKHS to be $H_{s,\gamma}$ with kernel
$$K_{s,\gamma}(\mathbf{x},\mathbf{y}) = \sum_{u \subseteq \{1:s\}} \gamma_u \prod_{j \in u} \eta(x_j, y_j), \qquad \text{where} \qquad \eta(x,y) := \tfrac{1}{2}B_2(|x-y|) + \left(x - \tfrac{1}{2}\right)\left(y - \tfrac{1}{2}\right), \quad x, y \in [0,1].$$
Here $B_2(x) = x^2 - x + 1/6$. This implies
$$\int_{[0,1]^s} K(\mathbf{x},\mathbf{y})\,d\mathbf{x} = 1 \qquad \text{for any } \mathbf{y} \in [0,1]^s,$$
and hence the simpler formula
$$e_s^2(N; \mathbf{t}_1, \ldots, \mathbf{t}_N) = \frac{1}{N^2}\sum_{k=1}^{N}\sum_{k'=1}^{N} K(\mathbf{t}_k,\mathbf{t}_{k'}) - 1.$$
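
As a quick sanity check (my own sketch, not from the talk): the property $\int_{[0,1]^s} K(\mathbf{x},\mathbf{y})\,d\mathbf{x} = 1$ follows from $\int_0^1 \eta(x,y)\,dx = 0$, which is easy to verify numerically.

```python
import numpy as np

def B2(t):
    return t * t - t + 1.0 / 6.0           # Bernoulli polynomial B_2

def eta(x, y):
    return 0.5 * B2(np.abs(x - y)) + (x - 0.5) * (y - 0.5)

# Midpoint-rule approximation of int_0^1 eta(x, y) dx for a few y; all ~ 0.
# This is why int_{[0,1]^s} K_{s,gamma}(x, y) dx = 1 and the first two terms
# of the general RKHS error formula collapse, leaving the "-1".
xs = (np.arange(100000) + 0.5) / 100000
for y in (0.0, 0.3, 0.9):
    print(y, np.mean(eta(xs, y)))
```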

SLIDE 11

What is this space?

It is well known that this is a weighted "unanchored" Hilbert space, a space of functions with square-integrable mixed first derivatives on $[0,1]^s$, with squared norm
$$\|f\|_{H_{s,\gamma}}^2 := \sum_{u \subseteq \{1:s\}} \gamma_u^{-1} \int_{[0,1]^{|u|}} \left( \int_{[0,1]^{s-|u|}} \frac{\partial^{|u|} f}{\partial \mathbf{x}_u}(\mathbf{x}_u; \mathbf{x}_{\{1:s\}\setminus u})\, d\mathbf{x}_{\{1:s\}\setminus u} \right)^{2} d\mathbf{x}_u.$$
The $\gamma_u$ are "weights", with $\gamma_\emptyset = 1$. Example: $\gamma_u = \prod_{j \in u} \gamma_j$, for some $\gamma_1 \ge \gamma_2 \ge \cdots > 0$.

Weights are essential for "tractability" (IHS and H. Woźniakowski '98).

SLIDE 12

For this space, since $K_{s,\gamma}(\mathbf{x},\mathbf{y}) = \sum_u \gamma_u \prod_{j \in u} \eta(x_j, y_j)$, we have
$$e_s^2(N; \mathbf{z}, \boldsymbol{\Delta}) = \sum_{\emptyset \ne u \subseteq \{1:s\}} \gamma_u\, e_u^2(N; \mathbf{z}_u, \boldsymbol{\Delta}),$$
where for $u \subseteq \{1:s\}$ and $\mathbf{z}_u := (z_j)_{j \in u}$,
$$e_u^2(N; \mathbf{z}_u, \boldsymbol{\Delta}) := \frac{1}{N^2}\sum_{k=1}^{N}\sum_{k'=1}^{N} \prod_{j \in u} \left[ \tfrac{1}{2} B_2\!\left(\left|\left\{\tfrac{k z_j}{N}\right\} - \left\{\tfrac{k' z_j}{N}\right\}\right|\right) + \left(\left\{\tfrac{k z_j}{N} + \Delta_j\right\} - \tfrac{1}{2}\right)\left(\left\{\tfrac{k' z_j}{N} + \Delta_j\right\} - \tfrac{1}{2}\right) \right].$$
This worst-case error is easily computable for reasonable weights (such as product weights $\gamma_u = \prod_{j \in u} \gamma_j$).
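
A hedged sketch of that computation for product weights (function names are mine; cost is $O(N^2 s)$): the sum over nonempty subsets $u$ factorises as $\prod_j (1 + \gamma_j\,[\cdot]_j) - 1$.

```python
import numpy as np

def B2(t):
    return t * t - t + 1.0 / 6.0

def squared_wce(N, z, delta, gamma):
    """e_s^2(N; z, Delta) for product weights gamma_u = prod_{j in u} gamma_j,
    using sum_{u != empty} gamma_u prod_{j in u} a_j = prod_j (1 + gamma_j a_j) - 1."""
    z, delta, gamma = (np.asarray(v, dtype=float) for v in (z, delta, gamma))
    x = np.mod(np.outer(np.arange(1, N + 1), z) / N, 1.0)   # {k z_j / N}, shape (N, s)
    xs = np.mod(x + delta, 1.0)                              # shifted points
    total = 0.0
    for k in range(N):
        term = (0.5 * B2(np.abs(x[k] - x))                   # (1/2) B_2(|{kz/N} - {k'z/N}|)
                + (xs[k] - 0.5) * (xs - 0.5))                 # shifted product term
        total += np.sum(np.prod(1.0 + gamma[: z.size] * term, axis=1) - 1.0)
    return total / N**2

# Example: the N = 34, z = (1, 21), Delta = (0.8, 0.1) rule with gamma = (1, 1/2)
print(np.sqrt(squared_wce(34, [1, 21], [0.8, 0.1], [1.0, 0.5])))
```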

SLIDE 13

For the case of random shifts

For the case of random shifts the relevant quantity is the shift-averaged worst-case error,
$$e_s^{\mathrm{sh}\,2}(N; \mathbf{z}) = \int_{[0,1]^s} e_s^2(N; \mathbf{z}, \boldsymbol{\Delta})\,d\boldsymbol{\Delta} = \frac{1}{N^2}\sum_{k=1}^{N}\sum_{k'=1}^{N} \sum_{\emptyset \ne u \subseteq \{1:s\}} \gamma_u \prod_{j \in u} B_2\!\left(\left\{\tfrac{(k-k') z_j}{N}\right\}\right).$$

We used this fact, important to us later:
$$\int_0^1 \left(\{x+\Delta\} - \tfrac{1}{2}\right)\left(\{y+\Delta\} - \tfrac{1}{2}\right) d\Delta = \tfrac{1}{2}B_2(|x-y|),$$
$$\Longrightarrow \quad \int_0^1 \eta(\{x+\Delta\}, \{y+\Delta\})\,d\Delta = \tfrac{1}{2}B_2(|x-y|) + \tfrac{1}{2}B_2(|x-y|) = B_2(|x-y|), \qquad x, y \in [0,1].$$


SLIDE 14

For the case of random shifts the relevant quantity is the shift-averaged worst-case error $e_s^{\mathrm{sh}\,2}(N; \mathbf{z})$, displayed above.

Finding successive $z_1, z_2, \ldots, z_s$ from the set $\{1, \ldots, N-1\}$ that minimise this expression can be done very efficiently by fast CBC (Nuyens and Cools, ...).
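
Below is a plain (deliberately slow) sketch of the shift-averaged CBC construction for product weights. It is my own illustration with assumed helper names, not the fast FFT-based CBC of Nuyens and Cools; it only shows the structure of the search.

```python
import numpy as np

def B2(t):
    return t * t - t + 1.0 / 6.0

def sh_sq_wce(N, z, gamma):
    """Shift-averaged squared worst-case error for product weights.
    Because the summand depends only on (k - k') mod N, the double sum
    over k, k' reduces to a single sum over m = (k - k') mod N."""
    z = np.asarray(z)
    m = np.arange(N)[:, None]                         # m = 0, ..., N-1
    vals = B2(np.mod(m * z / N, 1.0))                 # B_2({m z_j / N})
    return np.mean(np.prod(1.0 + gamma[: z.size] * vals, axis=1) - 1.0)

def cbc(N, s, gamma):
    """Component-by-component search: for each dimension j, keep the earlier
    components fixed and pick z_j in {1, ..., N-1} minimising the criterion."""
    z = []
    for j in range(s):
        best = min(range(1, N), key=lambda c: sh_sq_wce(N, z + [c], gamma))
        z.append(best)
    return np.array(z)

gamma = 1.0 / np.arange(1, 51)                        # gamma_j = 1/j
z_star = cbc(127, 10, gamma)
print(z_star, np.sqrt(sh_sq_wce(127, z_star, gamma)))
```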

SLIDE 15

Derandomising

But there are benefits too in finding a good deterministic shift, sometimes called derandomising. The practical benefit is that we do not always have to make, say, 25 random shifts, because that is a big extra cost.

Of course we do get some "Monte Carlo" benefit by taking 25 random shifts, but the expected reduction of the error is of order only $1/\sqrt{25} = 1/5$, rather than something closer to $1/25$.

SLIDE 16

Using CBC to fix all ∆j and zj

In 2002 Sloan/Kuo/Joe proposed an algorithm to determine successively $z_1, \Delta_1, z_2, \Delta_2, \ldots, z_s, \Delta_s$.

The idea: suppose $z_1, \Delta_1, \ldots, z_{s-1}, \Delta_{s-1}$ are already determined. We next choose $z_s$ to minimise
$$\int_0^1 e_s^2(z_1, \Delta_1, \ldots, z_{s-1}, \Delta_{s-1}, z_s, \Delta_s)\,d\Delta_s,$$
which can be evaluated explicitly. Then (in principle) choose $\Delta_s \in [0,1]$ to minimise $e_s(z_1, \Delta_1, \ldots, z_{s-1}, \Delta_{s-1}, z_s, \Delta_s)$.

SLIDE 17

That’s not feasible, but ...

We can't search over all values of $\Delta_s \in [0,1]$, but SKJ 2002 showed that we need to search over only
$$\left\{\frac{1}{2N}, \frac{3}{2N}, \ldots, \frac{2N-1}{2N}\right\}.$$

Here's the argument. Suppose that $z_1, \Delta_1, \ldots, z_{s-1}, \Delta_{s-1}$ are fixed. It can easily be shown that
$$e_s^2(z_1, \Delta_1, \ldots, z_{s-1}, \Delta_{s-1}, z_s, \Delta_s) = e_{s-1}^2(z_1, \Delta_1, \ldots, z_{s-1}, \Delta_{s-1}) + \theta_s(z_1, \Delta_1, \ldots, z_{s-1}, \Delta_{s-1}, z_s, \Delta_s),$$

SLIDE 18

$$e_s^2(z_1, \Delta_1, \ldots, z_{s-1}, \Delta_{s-1}, z_s, \Delta_s) = e_{s-1}^2(z_1, \Delta_1, \ldots, z_{s-1}, \Delta_{s-1}) + \theta_s(z_1, \Delta_1, \ldots, z_{s-1}, \Delta_{s-1}, z_s, \Delta_s),$$
where for product weights we have
$$\theta_s(z_1, \ldots, \Delta_{s-1}, z_s, \Delta_s) := \frac{\gamma_s}{N^2}\sum_{k=1}^{N}\sum_{k'=1}^{N} \left[ \prod_{j=1}^{s-1}\bigl(1 + \gamma_j\,\eta(t_{k,j}, t_{k',j})\bigr) \right] \left[ \tfrac{1}{2}B_2\!\left(\left\{\tfrac{(k-k') z_s}{N}\right\}\right) + \left(\left\{\tfrac{k z_s}{N} + \Delta_s\right\} - \tfrac{1}{2}\right)\left(\left\{\tfrac{k' z_s}{N} + \Delta_s\right\} - \tfrac{1}{2}\right) \right],$$
with $t_{k,j} := \left\{\tfrac{k z_j}{N} + \Delta_j\right\}$ the components of the shifted lattice points already fixed.

SLIDE 19

The minimiser ∆*_s

The minimiser $\Delta_s^*$ satisfies
$$\theta_s(z_s, \Delta_s^*) \;\le\; \frac{1}{N}\sum_{\Delta_s \in \{\frac{1}{2N}, \frac{3}{2N}, \ldots, \frac{2N-1}{2N}\}} \theta_s(z_s, \Delta_s) \;\le\; \int_0^1 \theta_s(z_s, \Delta_s)\,d\Delta_s.$$

The first inequality holds because the minimum over the $N$ candidate shifts is at most their average.

The second inequality holds because the midpoint rule underestimates
$$\int_0^1 \left(\left\{\tfrac{k z_s}{N} + \Delta_s\right\} - \tfrac{1}{2}\right)\left(\left\{\tfrac{k' z_s}{N} + \Delta_s\right\} - \tfrac{1}{2}\right) d\Delta_s.$$
The integrand is a piecewise quadratic in $\Delta_s$, with breakpoints at multiples of $1/N$, and positive curvature in between.
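
A small numerical check of the second inequality (my own sketch; the variable names are illustrative): for points of the form $k z_s / N$ the midpoint average over the $N$ candidate shifts is indeed no larger than the exact shift average $\tfrac{1}{2}B_2(|x-y|)$ from Slide 13.

```python
import numpy as np

def B2(t):
    return t * t - t + 1.0 / 6.0

def g(delta, x, y):
    """({x+Delta}-1/2)({y+Delta}-1/2): piecewise quadratic in Delta with
    breakpoints at multiples of 1/N and positive curvature in between."""
    return (np.mod(x + delta, 1.0) - 0.5) * (np.mod(y + delta, 1.0) - 0.5)

N = 34
x, y = 5.0 / N, 27.0 / N                           # points of the form k z_s / N
mids = (2.0 * np.arange(N) + 1.0) / (2.0 * N)      # Delta in {1/(2N), ..., (2N-1)/(2N)}
midpoint_avg = np.mean(g(mids, x, y))
exact = 0.5 * B2(abs(x - y))                       # the shift-averaged value (Slide 13)
print(midpoint_avg, exact, midpoint_avg <= exact)  # midpoint rule underestimates: True
```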

SLIDE 20

And the minimiser $z_s^*$ satisfies, if $N$ is prime,
$$\theta_s(z_s^*, \Delta_s^*) \le \frac{1}{N-1}\sum_{z_s=1}^{N-1}\int_0^1 \theta_s(z_s, \Delta_s)\,d\Delta_s \le \frac{\gamma_s}{N^2}\sum_{k=1}^{N}\sum_{k'=1}^{N} \left[ \prod_{j=1}^{s-1}\bigl(1 + \gamma_j\,\eta(t_{k,j}, t_{k',j})\bigr) \right] \frac{1}{N-1}\sum_{z_s=1}^{N-1} B_2\!\left(\left\{\tfrac{(k-k') z_s}{N}\right\}\right).$$

But
$$\frac{1}{N-1}\sum_{z_s=1}^{N-1} B_2\!\left(\left\{\tfrac{(k-k') z_s}{N}\right\}\right) = \begin{cases} B_2(0) = \tfrac{1}{6} & \text{if } k = k', \\ -\tfrac{1}{6N} & \text{if } k \ne k', \end{cases}$$

$$\Longrightarrow \quad \theta_s(z_s^*, \Delta_s^*) \le \frac{\gamma_s}{6N}\prod_{j=1}^{s-1}\left(1 + \frac{\gamma_j}{3}\right).$$
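
A short check (not from the talk) of the displayed $z$-average of $B_2$, for a prime $N$:

```python
import numpy as np

def B2(t):
    return t * t - t + 1.0 / 6.0

N = 101                                            # any prime
z = np.arange(1, N)
for d in (0, 1, 7):                                # d = k - k'
    print(d, np.mean(B2(np.mod(d * z / N, 1.0))))  # 1/6 if d = 0, else -1/(6N)
```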

SLIDE 21

And so to a theorem:

$$e_s^2(\mathbf{t}_1, \ldots, \mathbf{t}_N) \le e_{s-1}^2(\mathbf{t}_1, \ldots, \mathbf{t}_N) + \frac{\gamma_s}{6N}\prod_{j=1}^{s-1}\left(1 + \frac{\gamma_j}{3}\right) \quad\Longrightarrow\quad e_s^2(\mathbf{t}_1, \ldots, \mathbf{t}_N) \le \frac{1}{N}\prod_{j=1}^{s}\left(1 + \frac{\gamma_j}{3}\right)$$
if $N$ is prime, which is $\le C/N$ independently of $s$ if $\sum_{j=1}^{\infty}\gamma_j < \infty$.

So what's the problem?
- It's too slow: at least $N^3 d$ operations. (It's the same for any method that searches the wce over shifts.)
- The proven convergence rate is only the Monte Carlo rate, $1/\sqrt{N}$.

SLIDE 22

Compare with shift-averaged CBC:

Frances Kuo (2003) showed: for all $\lambda \in (\tfrac{1}{2}, 1]$,
$$e_s^{\mathrm{sh}}(\mathbf{z}^*) \le \frac{1}{N^{1/(2\lambda)}}\left[\prod_{j=1}^{s}\left(1 + \frac{2\zeta(2\lambda)}{(2\pi^2)^\lambda}\,\gamma_j^\lambda\right) - 1\right]^{1/(2\lambda)}.$$
So the convergence rate is arbitrarily close to $1/N$ as $\lambda \to 1/2$, but with a constant that blows up as $\lambda \to 1/2$.

Ideally we would like to derandomise, yet get the same convergence result.

SLIDE 23

New approach to derandomising

Let's fix $z_1^*, z_2^*, \ldots, z_s^*$ as obtained by shift-averaged CBC. Then choose successively $\Delta_1, \Delta_2, \ldots, \Delta_s$ by "CBC for shifts".

Why would this be a good idea? We know that the generating vector $\mathbf{z}^* := (z_1^*, \ldots, z_s^*)$ is good for the randomly shifted wce, so there must be some shift vector $\boldsymbol{\Delta}$ which gives us at least as good an error. (The problem is to find it!)

Actually, we know more: there is a good derandomised rule even if each $\Delta_j$ is restricted to an odd multiple of $1/(2N)$:
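
A hedged sketch of the "CBC for shifts" idea (my own reconstruction with assumed helper names, not the authors' code): keep $\mathbf{z}^*$ fixed and pick each $\Delta_j$ in turn from the odd multiples of $1/(2N)$, minimising the computable worst-case error of Slide 12.

```python
import numpy as np

def B2(t):
    return t * t - t + 1.0 / 6.0

def sq_wce(N, z, delta, gamma):
    """Squared worst-case error e^2(N; z, Delta) for product weights.
    eta is evaluated at the shifted points; this is equivalent to the Slide 12
    formula because B_2(|{x+D} - {y+D}|) = B_2(|{x} - {y}|)."""
    z, delta = np.asarray(z, float), np.asarray(delta, float)
    xs = np.mod(np.outer(np.arange(1, N + 1), z) / N + delta, 1.0)  # shifted points
    total = 0.0
    for k in range(N):
        eta = 0.5 * B2(np.abs(xs[k] - xs)) + (xs[k] - 0.5) * (xs - 0.5)
        total += np.sum(np.prod(1.0 + gamma[: z.size] * eta, axis=1) - 1.0)
    return total / N**2

def cbc_for_shifts(N, z_star, gamma):
    """Given z* from the shift-averaged CBC, choose Delta_1, ..., Delta_s one at
    a time from {1/(2N), 3/(2N), ..., (2N-1)/(2N)}, minimising e^2 at each step.
    As noted later in the talk, each step costs roughly N * N^2 * j operations."""
    candidates = (2.0 * np.arange(N) + 1.0) / (2.0 * N)
    delta = []
    for j in range(len(z_star)):
        best = min(candidates,
                   key=lambda d: sq_wce(N, z_star[: j + 1], delta + [d], gamma))
        delta.append(best)
    return np.array(delta)
```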

SLIDE 24

Theorem

For arbitrary $\mathbf{z} \in \{1, 2, \ldots, N-1\}^s$,
$$\left|\, e^{\mathrm{sh}\,2}(\mathbf{z}) - \frac{1}{N^s}\sum_{\boldsymbol{\Delta} \in \{\frac{1}{2N}, \frac{3}{2N}, \ldots, \frac{2N-1}{2N}\}^s} e^2(\mathbf{z}, \boldsymbol{\Delta}) \,\right| \;\le\; \frac{s}{4N^2}\prod_{j=1}^{s}\left(1 + \frac{\gamma_j}{3}\right).$$

Proof: The left side is the integral over all shifts of the wce$^2$, minus the product-midpoint rule for the same integral.

So if $\mathbf{z} = \mathbf{z}^*$ then for moderate $s$ there exists a deterministic $\boldsymbol{\Delta} \in \{\frac{1}{2N}, \frac{3}{2N}, \ldots, \frac{2N-1}{2N}\}^s$ giving a wce close to $e^{\mathrm{sh}}(\mathbf{z}^*)$.

That doesn't prove we get a good result with CBC. We can't prove it!

SLIDE 25

A numerical experiment

We take product weights with $\gamma_j = 1/j$, $N = 1024$, and $s \le 50$. We take the generating vector $\mathbf{z}^*$ from the shift-averaged CBC.

At step $j$ we compute $\Delta_j^*$ and $e^2(z_1^*, \ldots, z_j^*, \Delta_1^*, \ldots, \Delta_j^*)$, but also the shift-averaged wce squared, $e^{\mathrm{sh}\,2}(z_1^*, \ldots, z_j^*)$, and express the quality of the results through the ratio
$$\kappa(j, N) := \frac{e(z_1^*, \ldots, z_j^*, \Delta_1^*, \ldots, \Delta_j^*)}{e^{\mathrm{sh}}(z_1^*, \ldots, z_j^*)}.$$

In our experiment $\kappa(j, N)$ increased from 0.71 at $j = 1$ to 0.957 at $j = 50$. With $\kappa(j, N)$ always less than 1, so far so good! BUT THEORY?

SLIDE 26

The practical side of “CBC for shifts”

The necessary input values of $z_j^*$, $j = 1, 2, \ldots, s$ are available for very large values of $s$ and $N$ and many choices of weights $\gamma$; see the websites of Dirk Nuyens and Frances Kuo.

A full search over all values of $\Delta_j$ at step $j$ is expensive: each evaluation of the worst-case error costs of order $N^2 j$ operations, and there are $N$ possible values of $\Delta_j$. But this can be an off-line calculation, grinding away in the background.

And when it becomes infeasible to search over all values of $\Delta_j$, one could search over a randomly selected subset.

And when that becomes impractical one could perhaps choose just ONE shift at random.

SLIDE 27

What about an a priori choice of ∆?

In a first paper we explored the case we think is the worst, namely $\boldsymbol{\Delta} = \mathbf{0}$. Why is it worth worrying about this case if it is so unpromising? Because in this case, the simplest one, we can begin to understand the mathematical issues.

So, let's set $\boldsymbol{\Delta} = \mathbf{0}$, and try to compute the average of the wce$^2$ over all choices of $\mathbf{z} \in \{1, 2, \ldots, N-1\}^s =: \mathbb{Z}_N^s$:
$$e_s^2(N) := \frac{1}{(N-1)^s}\sum_{\mathbf{z} \in \mathbb{Z}_N^s} e_s^2(N, \mathbf{z}).$$

Then we use again the principle: "there's always one choice that's as good as average".

SLIDE 28

Theorem

If a certain number-theoretic conjecture holds then there exists an unshifted lattice rule for which the worst-case error is bounded by
$$C\,\frac{(\ln N)^{\alpha/2}}{N}.$$
Moreover, if the weights $\gamma_u$ satisfy ... then $C$ is independent of the dimension.

SLIDE 29

Conjecture

$$T_N(\kappa) := \sum_{q=1}^{(N-1)/2} \frac{1}{q\,|r(q\kappa, N)|}, \qquad \kappa = 1, \ldots, \frac{N-1}{2}.$$
Conjecture: For $N \ge 3$ and prime, let $\kappa_j$ be an ordering of $1, 2, \ldots, (N-1)/2$ such that $T_N(\kappa_j)$ is non-increasing. Then there exist $C_1$, $C_2$ and $\alpha \ge 2$ such that
$$T_N(\kappa_j) \le C_1\frac{(\ln N)^\alpha}{N} \qquad \text{for all } j > C_2(\ln N)^\alpha.$$
That is, $T_N(\kappa)$ is bounded by a constant times $(\ln N)^\alpha/N$ for all but a few exceptional values of $\kappa$. (See the Appendix for details of the $\boldsymbol{\Delta} = \mathbf{0}$ case, including the definition of $r(j,N)$.)

SLIDE 30

Summary of “CBC for shifts”

We have not been able to prove that our new "CBC for shifts" (using the $z_j^*$ from shift-averaged CBC) always gives a good result. (And we can prove even less for zero shifts.)

BUT recall that we can compute
$$\kappa(s, N) := \frac{e(z_1^*, \ldots, z_s^*, \Delta_1^*, \ldots, \Delta_s^*)}{e^{\mathrm{sh}}(z_1^*, \ldots, z_s^*)}.$$
This leads to a rigorous bound
$$|I_s(f) - Q(N, \mathbf{z}^*, \boldsymbol{\Delta}^*)(f)| \le \kappa(s, N)\, e^{\mathrm{sh}}(z_1^*, \ldots, z_s^*)\,\|f\|_{H_s} \le \frac{\kappa(s, N)}{N^{1/(2\lambda)}}\left[\prod_{j=1}^{s}\left(1 + \frac{2\zeta(2\lambda)}{(2\pi^2)^\lambda}\gamma_j^\lambda\right) - 1\right]^{1/(2\lambda)}\|f\|_{H_s}.$$

SLIDE 31

A reflection on theoretical error analysis

We failed to prove an a priori bound. BUT we do have:
$$|I_s(f) - Q(N, \mathbf{z}^*, \boldsymbol{\Delta}^*)(f)| \le \frac{\kappa(s, N)}{N^{1/(2\lambda)}}\left[\prod_{j=1}^{s}\left(1 + \frac{2\zeta(2\lambda)}{(2\pi^2)^\lambda}\gamma_j^\lambda\right) - 1\right]^{1/(2\lambda)}\|f\|_{H_s},$$
where $\kappa(s, N)$ is computable.

SLIDE 32

References

- Dick J, Kuo F Y, Sloan I H. High-dimensional integration: the quasi-Monte Carlo way. Acta Numerica, 2013.
- Kazashi Y, Kuo F Y, Sloan I H. Worst-case error for unshifted lattice rules without randomisation. Submitted.
- Kazashi Y, Kuo F Y, Sloan I H. Derandomised lattice rules for high-dimensional integration. In preparation.

SLIDE 33

Appendix: the a priori choice ∆ = 0

We explored the case we think is the worst, namely $\boldsymbol{\Delta} = \mathbf{0}$. Why is it worth worrying about this case if it is so unpromising? Because in this case, the simplest one, we can begin to understand the mathematical issues.

So, let's set $\boldsymbol{\Delta} = \mathbf{0}$, and try to compute the average of the wce$^2$ over all choices of $\mathbf{z} \in \{1, 2, \ldots, N-1\}^s =: \mathbb{Z}_N^s$:
$$e_s^2(N) := \frac{1}{(N-1)^s}\sum_{\mathbf{z} \in \mathbb{Z}_N^s} e_s^2(N, \mathbf{z}) = \sum_{\emptyset \ne u \subseteq \{1:s\}} \gamma_u\, e_u^2(N),$$
where
$$e_u^2(N) := \frac{1}{(N-1)^s}\sum_{\mathbf{z} \in \mathbb{Z}_N^s} e_u^2(N, \mathbf{z}_u).$$

SLIDE 34

Now recall
$$e_u^2(N, \mathbf{z}_u) := \frac{1}{N^2}\sum_{k=1}^{N}\sum_{k'=1}^{N}\prod_{j \in u}\left[\tfrac{1}{2}B_2\!\left(\left|\left\{\tfrac{k z_j}{N}\right\} - \left\{\tfrac{k' z_j}{N}\right\}\right|\right) + \left(\left\{\tfrac{k z_j}{N}\right\} - \tfrac{1}{2}\right)\left(\left\{\tfrac{k' z_j}{N}\right\} - \tfrac{1}{2}\right)\right].$$
Thus
$$e_u^2(N) = \frac{1}{N^2}\sum_{k=1}^{N}\sum_{k'=1}^{N}\prod_{j \in u}\left[\frac{1}{N-1}\sum_{z_j=1}^{N-1}\left(\tfrac{1}{2}B_2\!\left(\left|\left\{\tfrac{k z_j}{N}\right\} - \left\{\tfrac{k' z_j}{N}\right\}\right|\right) + \left(\left\{\tfrac{k z_j}{N}\right\} - \tfrac{1}{2}\right)\left(\left\{\tfrac{k' z_j}{N}\right\} - \tfrac{1}{2}\right)\right)\right].$$

SLIDE 35

Or
$$e_u^2(N) = \frac{1}{N^2}\sum_{k=1}^{N}\sum_{k'=1}^{N}\left(X_{N;k,k'} + J_{N;k,k'}\right)^{|u|},$$
where
$$X_{N;k,k'} := \frac{1}{N-1}\sum_{z=1}^{N-1}\tfrac{1}{2}B_2\!\left(\left\{\tfrac{(k-k') z}{N}\right\}\right), \qquad J_{N;k,k'} := \frac{1}{N-1}\sum_{z=1}^{N-1}\left(\left\{\tfrac{k z}{N}\right\} - \tfrac{1}{2}\right)\left(\left\{\tfrac{k' z}{N}\right\} - \tfrac{1}{2}\right).$$

The heart of the matter lies in these two sums. While $X_{N;k,k'}$ can be treated in a standard way, $J_{N;k,k'}$ is not standard.
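
A small sketch (mine; the helper name is illustrative) computing the two $z$-averages for a given pair $(k, k')$ and prime $N$:

```python
import numpy as np

def B2(t):
    return t * t - t + 1.0 / 6.0

def X_and_J(N, k, kp):
    """The two z-averages X_{N;k,k'} and J_{N;k,k'} for the unshifted case."""
    z = np.arange(1, N)
    X = np.mean(0.5 * B2(np.mod((k - kp) * z / N, 1.0)))
    J = np.mean((np.mod(k * z / N, 1.0) - 0.5) * (np.mod(kp * z / N, 1.0) - 0.5))
    return X, J

print(X_and_J(101, 3, 20))
```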

SLIDE 36

$J_{N;k,k'}$? This is the delicate part

Recall:
$$J_{N;k,k'} := \frac{1}{N-1}\sum_{z=1}^{N-1}\left(\left\{\tfrac{k z}{N}\right\} - \tfrac{1}{2}\right)\left(\left\{\tfrac{k' z}{N}\right\} - \tfrac{1}{2}\right).$$

In the shift-averaged case this is replaced by
$$\frac{1}{N-1}\sum_{z=1}^{N-1}\tfrac{1}{2}B_2\!\left(\left\{\tfrac{(k-k') z}{N}\right\}\right) = X_{N;k,k'},$$
so we are finished. But in the unshifted case this term presents difficulties.

SLIDE 37

Again we use Fourier series:
$$x - \tfrac{1}{2} = \lim_{M\to\infty} \frac{i}{2\pi}\sum_{\substack{h=-M \\ h \ne 0}}^{M} \frac{\exp(2\pi i h x)}{h}, \qquad x \in (0,1).$$
Caution: this Fourier series is only conditionally convergent! We use
$$\sum_{z=0}^{N-1}\exp\bigl(2\pi i z(hk - h'k')/N\bigr) = \begin{cases} N & \text{if } hk - h'k' \equiv_N 0, \\ 0 & \text{if } hk - h'k' \not\equiv_N 0, \end{cases}$$
where $\equiv_N$ means "congruence modulo $N$".

SLIDE 38

We finish up having to evaluate
$$\lim_{M\to\infty}\lim_{M'\to\infty}\sum_{\substack{h=-M \\ h \ne 0}}^{M}\ \sum_{\substack{h'=-M' \\ h' \ne 0 \\ h'k' \equiv_N hk}}^{M'} \frac{1}{h h'}.$$
To keep things simple(!), let's assume $N$ is prime. Then $h'k' \equiv_N hk$ iff $h'k'k^{-1} \equiv_N h$, where $k^{-1}$ is the unique inverse of $k$ in $\mathbb{Z}_N$. That is, $k^{-1}k \equiv_N 1$.
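
For illustration (not part of the talk), the modular inverse used here is available directly in Python for prime $N$; the values chosen are arbitrary.

```python
N, k = 101, 37                    # N prime, k in {1, ..., N-1}
k_inv = pow(k, -1, N)             # modular inverse (built-in since Python 3.8)
print(k_inv, (k * k_inv) % N)     # prints k^{-1} and 1, since k^{-1} k ≡ 1 (mod N)
```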

SLIDE 39

To estimate
$$\lim_{M\to\infty}\lim_{M'\to\infty}\sum_{\substack{h=-M \\ h \ne 0}}^{M}\ \sum_{\substack{h'=-M' \\ h' \ne 0 \\ h'k' \equiv_N hk}}^{M'} \frac{1}{h h'},$$
the essence lies in this sum:
$$T_N(\kappa) := \sum_{q=1}^{(N-1)/2} \frac{1}{q\,|r(q\kappa, N)|}, \qquad \kappa \in \left\{1, \ldots, \frac{N-1}{2}\right\},$$
where $r(j, N)$ is the integer congruent to $j \pmod N$ with the smallest absolute value.

Here $\kappa = k k'^{-1}$.

SLIDE 40

Example

$$T_N(\kappa) := \sum_{q=1}^{(N-1)/2} \frac{1}{q\,|r(q\kappa, N)|}, \qquad \kappa \in \left\{1, \ldots, \frac{N-1}{2}\right\},$$
where $r(j, N)$ is the integer congruent to $j \pmod N$ with the smallest absolute value.

Suppose $N = 7$. Then
$$T_N(1) = \frac{1}{1\times 1} + \frac{1}{2\times 2} + \frac{1}{3\times 3} = 1 + \frac{1}{4} + \frac{1}{9},$$
$$T_N(2) = \frac{1}{1\times 2} + \frac{1}{2\times|-3|} + \frac{1}{3\times|-1|} = \frac{1}{2} + \frac{1}{6} + \frac{1}{3},$$
$$T_N(3) = \frac{1}{1\times 3} + \frac{1}{2\times|-1|} + \frac{1}{3\times 2} = \frac{1}{3} + \frac{1}{2} + \frac{1}{6}.$$
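
The example can be reproduced with a few lines (a sketch of mine, using exact rational arithmetic; function names are illustrative):

```python
from fractions import Fraction

def r(j, N):
    """The integer congruent to j (mod N) with the smallest absolute value."""
    m = j % N
    return m - N if m > N // 2 else m

def T(N, kappa):
    """T_N(kappa) = sum_{q=1}^{(N-1)/2} 1 / (q |r(q kappa, N)|)."""
    return sum(Fraction(1, q * abs(r(q * kappa, N)))
               for q in range(1, (N - 1) // 2 + 1))

for kappa in (1, 2, 3):
    print(kappa, T(7, kappa))
# T_7(1) = 1 + 1/4 + 1/9 = 49/36, and T_7(2) = T_7(3) = 1, as on the slide.
```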

SLIDE 41

Conjecture

$$T_N(\kappa) := \sum_{q=1}^{(N-1)/2} \frac{1}{q\,|r(q\kappa, N)|}.$$
Conjecture: For $N \ge 3$ and prime and $\kappa = 1, 2, \ldots, (N-1)/2$, let $\kappa_j$ be an ordering of $1, 2, \ldots, (N-1)/2$ such that $T_N(\kappa_j)$ is non-increasing. Then there exist $C_1$, $C_2$ and $\alpha \ge 2$ such that
$$T_N(\kappa_j) \le C_1\frac{(\ln N)^\alpha}{N} \qquad \text{for all } j > C_2(\ln N)^\alpha.$$
That is, $T_N(\kappa)$ is bounded by a constant times $(\ln N)^\alpha/N$ for all but a few exceptional values of $\kappa$.

SLIDE 42

Theorem

If the conjecture holds then there exists an unshifted lattice rule for which the worst-case error is bounded by
$$C\,\frac{(\ln N)^{\alpha/2}}{N}.$$
Moreover, if the weights $\gamma_u$ satisfy ... then $C$ is independent of the dimension.

SLIDE 43

Numerical experiments on the conjecture

Conjecture: For $N \ge 3$ and prime and $\kappa = 1, 2, \ldots, (N-1)/2$, let $\kappa_j$ be an ordering of $1, 2, \ldots, (N-1)/2$ such that $T_N(\kappa_j)$ is non-increasing. Then there exist $C_1$, $C_2$ and $\alpha \ge 2$ such that
$$T_N(\kappa_j) \le C_1\frac{(\ln N)^\alpha}{N} \qquad \text{for all } j > C_2(\ln N)^\alpha.$$

So we plot $N\,T_N(\kappa_j)/(\ln N)^\alpha$ against $j/(\ln N)^\alpha$ to see if it is plausible.
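
A sketch of such a test (my own; the primes chosen here are illustrative values, not necessarily those used in the talk):

```python
import numpy as np
import matplotlib.pyplot as plt

def r(j, N):
    m = j % N
    return m - N if m > N // 2 else m

def T(N, kappa):
    return sum(1.0 / (q * abs(r(q * kappa, N))) for q in range(1, (N - 1) // 2 + 1))

alpha = 2                                           # repeat with alpha = 3
for N in (521, 1009):                               # illustrative primes
    TN = np.sort([T(N, k) for k in range(1, (N - 1) // 2 + 1)])[::-1]  # non-increasing
    j = np.arange(1, TN.size + 1)
    plt.loglog(j / np.log(N) ** alpha, N * TN / np.log(N) ** alpha, label=f"N = {N}")
plt.xlabel("j / (ln N)^alpha")
plt.ylabel("N T_N(kappa_j) / (ln N)^alpha")
plt.legend()
plt.show()
```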

SLIDE 44

Test for α = 2

SLIDE 45

Test for α = 3