[PPT] - High-Order Quasi Monte-Carlo Integration for Bayesian Inversion of PowerPoint Presentation

SLIDE 1

High-Order Quasi Monte-Carlo Integration for Bayesian Inversion of Parametric Operator Equations Christoph Schwab Seminar f¨ ur Angewandte Mathematik ETH Z¨ urich, Switzerland Joint with J. Dick, F. Kuo and T. Le Gia (Sydney), D. Nuyens (Leuven)

ICERM WS on Integration and Optimization, September 2014 ARC QE2 (J. Dick) and ARC FF (F. Kuo), ERC AdG and SNF (Ch. Schwab)

SLIDE 2

Outline

Infinite-Dimensional Parametric Operator Equations
Quasi Monte-Carlo Integration and High Order Digital Nets
Quasi Monte-Carlo Petrov-Galerkin FEM convergence rate
Bayesian Inverse Problem
Numerical Results
Conclusions
References

SLIDE 3

Infinite-Dimensional Parametric Operator Equations

Example: Linear, Affine-Parametric Operator Equation

Givenf ∈ Y′ , for every y ∈ U find u(y) ∈ X : A(y) u(y) = f . (1) Here A(y) = A0 +

j≥1

yj Aj ∈ L(X; Y′) , ∀y := (yj)j≥1 ∈ U := [−1/2, 1/2]N . (2)

Assumptions:

Small Fluctuations:

j≥1

AjL(X,Y′) “small” , Sparsity: ∃0 < p < 1 :

j≥1

Ajp

L(X,Y′) < ∞ .

(3)

Example: Karh´

unen-Loeve expansion: a(x, y) = ¯ a(x) +

j≥1 yjψj(x)

A(y) = −∇x · a(x, y)∇x = −∇x · ¯ a(x)∇x

A0

+

j≥1

−∇x · ψj(x)∇x

Aj

, X = Y = H1

0(D) .

Goal: given G ∈ X ′, find E[G(u(·))] with respect to y ∈ U, i.e.,

I(G(u)) :=

U

G(u(y)) dy . (4)

SLIDE 4

Infinite-Dimensional Parametric Operator Equations

Strategy:
1. Dimension Truncation: truncate (2) to s terms,
2. solve s-dimensional equation (1) by Petrov-Galerkin discretization from {X h} ⊂ X,
3. approximate s-dimensional integral using QMC integration,

1 N

N−1

n=0

G

uh

s

yn − 1

2

,

(5) where y0, . . . , yN−1 ∈ [0, 1]s denote N points from a higher order digital net.

4. Error bounds explicit w.r. to N, h and truncation dimension s.

SLIDE 5

Infinite-Dimensional Parametric Operator Equations

Variational Formulation: aj(·, ·) : X × Y → R via

∀v ∈ X, w ∈ Y : aj(v, w) = Yw, AjvY′ , j = 0, 1, 2, . . . . Assumption: The sequence {Aj}j≥0 satisfies:

1. the nominal operator A0 ∈ L(X, Y′) is boundedly invertible:

inf

0=v∈X sup 0=w∈Y

a0(v, w) vXwY ≥ µ0 > 0 , inf

0=w∈Y sup 0=v∈X

a0(v, w) vXwY ≥ µ0 > 0 . (6)

2. the fluctuation operators {Aj}j≥1 are small with respect to A0: exists 0 < κ < 2 such that
j≥1

βj ≤ κ < 2 , where βj := A−1

0 AjL(X,Y′) ,

j = 1, 2, . . . . (7)

SLIDE 6

Infinite-Dimensional Parametric Operator Equations

Assumption is sufficient for bounded invertibility of A(y) uniformly w.r. to y ∈ U: for every realization y ∈ U of the parameter vector a(y; v, w) := Yw, A(y)vY′ , (8) satisfies uniform (with respect to y ∈ U) inf-sup conditions: with µ = (1 − κ/2)µ0 > 0, ∀y ∈ U : inf

0=v∈X sup 0=w∈Y

a(y; v, w) vXwY ≥ µ , inf

0=w∈Y sup 0=v∈X

a(y; v, w) vXwY ≥ µ . (9) For every f ∈ Y′ and for every y ∈ U, the parametric problem find u(y) ∈ X : a(y; u(y), w) = Yw, fY′ ∀w ∈ Y (10) admits unique solution u(y) which satisfies the a-priori estimate u(y)X ≤ 1 µ fY′ . (11)

SLIDE 7

Infinite-Dimensional Parametric Operator Equations

Parametric regularity of solutions

∀y ∈ U : (∂ν

yu)(y)X ≤ C0 |ν|! βνfY′

for all ν ∈ NN

0 with |ν| < ∞ ,

(12) where 0! := 1, βν :=

j≥1 β νj j , with βj = A−1 0 AjL(X,X) and |ν| = j≥1 νj.

Spatial regularity: scales of smoothness spaces {Xt}t≥0, {Yt}t≥0, with

X = X0 ⊃ X1 ⊃ X2 ⊃ · · · , Y = Y0 ⊃ Y1 ⊃ Y2 ⊃ · · · , and X ′ = X ′

0 ⊃ X ′ 1 ⊃ X ′ 2 ⊃ · · · , Y′ = Y′ 0 ⊃ Y′ 1 ⊃ Y′ 2 ⊃ · · · .

(13)

Uniform regularity shift (sufficient for single-level QMC-PG): for 0 < t ≤ ¯

t, ∀y ∈ U : f ∈ Y′

t =

⇒ u(y) = A(y)−1f ∈ Xt .

Mixed regularity shift (necessary for multi-level QMC-PG):

f ∈ Y′

t =

⇒ sup

y∈U

(∂ν

yu)(y)Xt ≤ C0 |ν|! βν t fY′

t

for all ν ∈ NN

0 with |ν| < ∞ .

SLIDE 8

Infinite-Dimensional Parametric Operator Equations

Best N-term polynomial chaos approximation

Under Assumption, u(y) : U → X can be expanded in (unconditionally convergent in L2(U; dy)) gpc series ∀y ∈ U : u(y) =

F

uνLν(y) , where uν = (u, Lν)L2(U;dy) ∈ X . Here F = {ν ∈ NN

0 : |ν| < ∞}, and Lν is the (L2(U; dy)-normalized) tensorized Legendre polynomial

∀ν ∈ F, ∀y ∈ U : Lν(y) :=

j≥1

Lνj(yj) (note L0 ≡ 1) .

[Cohen & DeVore & CS (2011), (Chkifa & Cohen & CS 2014)]

Assume that β ∈ ℓp(N) for some 0 < p < 1. Then for every N ∈ N exists Λ ⊂ F such that, for q = 2, ∞ #(Λ) = N and u − uΛLq(U,dy;X) ≤ C(q)N −(1/p−1/q′) . q′ conjugate of q = 2, ∞, constant C > 0 independent of N and of dimension.

Proof nonconstructive. “Constructive versions” (Chkifa & Cohen & DeVore & CS 2012-2014).

SLIDE 9

Infinite-Dimensional Parametric Operator Equations

Petrov-Galerkin discretization:

Let {X h}h>0 ⊂ X and {Yh}h>0 ⊂ Y dense families of subspaces in X and Y.

Approximation Properties: for 0 < t ≤ ¯

t and 0 < t′ ≤ ¯ t′, and for 0 < h ≤ h0, there hold ∀v ∈ Xt : inf

vh∈X h v − vhX ≤ Ct ht vXt ,

∀w ∈ Yt′ : inf

wh∈Yh w − whY ≤ Ct′ ht′ wYt′ .

(14) ∀ 0 ≤ t ≤ ¯ t : sup

y∈U

A(y)−1L(Y′

t,Xt) < ∞ .

(15)

Stability: assume that (X h, Yh) satisfy discr. inf-sup condition for A0.

Then there hold uniform (with respect to y ∈ U) discrete inf-sup conditions ∀y ∈ U : inf

0=vh∈X h

sup

0=wh∈Yh

a(y; vh, wh) vhXwhY ≥ ¯ µ > 0 , (16) ∀y ∈ U : inf

0=wh∈Yh

sup

0=vh∈X h

a(y; vh, wh) vhXwhY ≥ ¯ µ > 0 . (17)

SLIDE 10

Infinite-Dimensional Parametric Operator Equations

For every 0 < h ≤ h0 and for every y ∈ U, Petrov-Galerkin approximation

find uh(y) ∈ X h : a(y; uh(y), wh) = Ywh, fY′ ∀wh ∈ Yh , (18) admits a unique solution uh(y) which satisfies the a-priori estimate uh(y)X ≤ 1 ¯ µ fY′ . (19)

Quasioptimality: exists a constant C > 0 such that for all y ∈ U

u(y) − uh(y)X ≤ C ¯ µ inf

0=vh∈X h u(y) − vhX .

(20)

Convergence Rate: Ex. C > 0 such that for every f ∈ Y′

t with 0 < t ≤ ¯

t as h → 0 u(y) − uh(y)X ≤ C ht fY′

t .

(21)

Dimension-Truncation: For y ∈ U and s ∈ N, (y1, y2, ..., ys, 0, 0, ...) ∈ U, so

all bounds valid for [−1/2, 1/2]s uniformly w.r. to s.

SLIDE 11

Infinite-Dimensional Parametric Operator Equations

Regularity:

∃ 0 < t′ ≤ ¯ t : G(·) ∈ X ′

t′ ,

(22)

Adjoint Regularity: exists Ct′ > 0 such that for every y ∈ U,

∀y ∈ U : w(y) = (A∗(y))−1G ∈ Yt′ , w(y)Yt′ ≤ Ct′ GX ′

t′ .

(23)

Superconvergence (Aubin-Nitsche):

for every f ∈ Y′

t with 0 < t ≤ ¯

t, for every G(·) ∈ X ′

t′ with 0 < t′ ≤ ¯

t sup

y∈U

G(u(y)) − G(uh(y))
≤ C hτ fY′

t GX ′ t′ .

(24) where 0 < τ := t + t′ ≤ 2¯ t.

SLIDE 12

Dimension Truncation

Assume the Aj are enumerated s.t. βj := A−1

0 AjX satisfy

β1 ≥ β2 ≥ · · · ≥ βj ≥ · · · . (25) Then, for every f ∈ Y′, every y ∈ U and for every s ∈ N, sup

y∈U

u(y) − us(y)X ≤ C µ fY′

j≥s+1

βj . (26) For every G(·) ∈ X ′, |I(G(u)) − I(G(us))| ≤ ˜ C µ fY′ GX ′

j≥s+1

βj 2 (27) for ˜ C > 0 independent of s, f and G. If conditions (25) hold, for any 0 < p < 1 and for any s ∈ N holds the dimension-truncation error bound

j≥s+1

βj ≤ min

1

1/p − 1, 1

j≥1

βp

j

1/p s−(1/p−1) .

SLIDE 13

Quasi Monte-Carlo Integration and High Order Digital Nets

Consider general s-variate integrand F ∈ C0([0, 1]s). Approximate s-dimensional integral

Is(F) :=

[0,1]s F(y) dy

(28) where F(y) = G(uh

s(y − 1 2)) by

N-point QMC method: an equal-weight quadrature rule

QN,s(F) := 1 N

N−1

n=0

F(yn) , (29) with judiciously chosen points y0, . . . , yN−1 ∈ [0, 1]s.

SLIDE 14

Quasi Monte-Carlo Integration and High Order Digital Nets

Theorem 1 [Dick, Gia, Kuo, Nuyens, CS 2013] Let s ≥ 1 and N = bm for m ≥ 1 and prime b. Let β = (βj)j≥1 be a sequence of positive numbers s.t. ∃ 0 < p < 1 :

∞

j=1

βp

j < ∞ .

(30) Define β{1:s} = (βj)1≤j≤s and α := ⌊1/p⌋ + 1 . (31) Assume for every s ∈ N, for every y ∈ [−1/2, 1/2]N ∀ ν ∈ {0, 1, . . . , α}s : |(∂ν

yF)(y{1:s})| ≤ c |ν|! βν {1:s}

(32) Then, an interlaced polynomial lattice rule of order α ≥ 1 with N points can be constructed using a fast component-by-component algorithm, with cost O(α s N log N + α2 s2N) operations, such that |Is(F) − QN,s(F)| ≤ Cα,β,b,p N −1/p , where Cα,β,b,p < ∞ is independent of s and N.

SLIDE 15

Combined Error Bound

Theorem 2 [Single-Level High-Order Quasi Monte-Carlo Galerkin Error bound]

1. Approximate I(G(u(·))) by dimension-truncation and interlaced polynomial lattice rule (5)
rder α = ⌊1/p⌋ + 1 , withN = bm points in s dimensions,

with Petrov-Galerkin discretization in D with subspace X h with Mh = dim(X h) DoF, cost O(Mh). Then ex. C > 0 independent of s, h and N such that with τ = t + t′ |I(G(u)) − QN,s(G(uh

s))| ≤ C

(s−2(1/p−1) + N −1/p)fY′ G(·)X ′ + hτ fY′

tG(·)X ′ t′

.
2. Cost for evaluation of QN,s(G(uh

s)) is O(sNMh) operations.

3. Cost for CBC construction of the interlaced polynomial lattice rule

O(α s N log N + α2s2N) operations, O(α s N) memory . Proof I(G(u)) − QN,s(G(uh

s)) = [I(G(u)) − I(G(us))] + [I(G(us)) − I(G(uh s))] + [I(G(uh s)) − QN,s(G(uh s))] .

✷

SLIDE 16

Bayesian Inverse Problems

uncertain input data u(y) = u +

j≥1 yjψj ∈ X, uniform prior π0 on u,

forward equation: A(u; q) = f, f ∈ Y′ known, forward solution map G(u(y)) holomorphic w.r. to y.
noisy observation data: δ = G(u) + η ∈ Y , G = O ◦ G : X → Y , η ∼ N(0, Γ) ∈ Y = RK,
QoI: φ(u).

Bayes’ Estimate: IΓ(φ(u)) := 1 Z

y∈[−1/2,1/2]N exp
−1

Γδ − G(u(y))2

Y

φ(u(y))
=:F(y)

π0(dy) Theorem 3 [High-Order Quasi Monte-Carlo for Bayesian Inverse Problems] For every Γ > 0, the parametric, deterministic integrand function F(y) satisfies the derivative bounds (32). The preceding error bounds apply to Bayesian Estimation. Proof: Analytic continuation and Cauchy’s Theorem. ✷

SLIDE 17

10 10

1

10

2

10

3

Number of Dimensions s

10

2

10

3

10

4

10

5

10

6

10

7

10

8

CBC Generation Time [ms] 1 2

Generation Time vs. Dimension. θ =0.1, C =0.1, ζ =4, m =15 α =2 α =3 α =4

(a) SPOD, C = 0.1

10 10

1

10

2

10

3

Number of Dimensions s

10

2

10

3

10

4

10

5

CBC Generation Time [ms] 1 1

Generation Time vs. Dimension. θ =0.1, C =1, ζ =4, m =15 α =2 α =3 α =4

(b) product, C = 1

Figure 1: CPU time required for the construction of generating vectors of varying order α = 2, 3, 4 for product

and SPOD weights vs. the dimension s in (a) and (b) and vs. the number of points N = 2m in (a) and (b).

SLIDE 18

10 10

1

10

2

10

3

10

4

10

5

10

6

10

7

Number of Points N =bm

10

1

10

2

10

3

10

4

10

5

10

6

10

7

CBC Generation Time [ms] 1 1

Generation Time vs. N. s =100, θ =0.1, C =0.1, ζ =4 α =2 α =3 α =4

(a) SPOD, s = 100

10 10

1

10

2

10

3

10

4

10

5

10

6

10

7

Number of Points N =bm

10

1

10

2

10

3

10

4

10

5

10

6

10

7

CBC Generation Time [ms] 1 1

Generation Time vs. N. s =1000, θ =0.1, C =0.1, ζ =4 α =2 α =3 α =4

(b) product, s = 1000

Figure 2: CPU time required for the construction of generating vectors of varying order α = 2, 3, 4 for product

and SPOD weights vs. the dimension s in (a) and (b) and vs. the number of points N = 2m in (a) and (b).

SLIDE 19

10 10

1

10

2

10

3

10

4

10

5

10

6

10

7

Number of Points N =bm

10

15

10

14

10

13

10

12

10

11

10

10

10

9

10

8

10

7

10

6

10

5

10

4

10

3

10

2

10

1

Quadrature Error |I[f]−Q[f]| 1 −2

Error vs. Number of Points N. s =100, θ =0.2, C =Cα,b, ζ =4 QMC, α =2 QMC, α =3 QMC, α =4 QMC (pruned), α =3 QMC (pruned), α =4

(a) C = Cα,b

10 10

1

10

2

10

3

10

4

10

5

10

6

10

7

Number of Points N =bm

10

15

10

14

10

13

10

12

10

11

10

10

10

9

10

8

10

7

10

6

10

5

10

4

10

3

10

2

10

1

Quadrature Error |I[f]−Q[f]| 1 −2

Error vs. Number of Points N. s =100, θ =0.2, C =1, ζ =4 QMC, α =2 QMC, α =3 QMC, α =4 QMC (pruned), α =3 QMC (pruned), α =4

(b) C = 1

Figure 3: Convergence of QMC approximation for SPOD integrand (inv. Karh´

unen-Loeve ) in s = 100 dimensions with interlacing parameter α = 2, 3, 4, with and without pruning.

SLIDE 20

10 10

1

10

2

10

3

10

4

10

5

10

6

10

7

Number of Points N =bm

10

15

10

14

10

13

10

12

10

11

10

10

10

9

10

8

10

7

10

6

10

5

10

4

10

3

10

2

10

1

Quadrature Error |I[f]−Q[f]| 1 −2 1 −4

Error vs. Number of Points N. s =100, θ =0.1, C =0.1, ζ =2 α =2 α =3 α =4

(a) s = 100, ζ = 2

10 10

1

10

2

10

3

10

4

10

5

10

6

10

7

Number of Points N =bm

10

15

10

14

10

13

10

12

10

11

10

10

10

9

10

8

10

7

10

6

10

5

10

4

10

3

10

2

10

1

Quadrature Error |I[f]−Q[f]| 1 −2 1 −4

Error vs. Number of Points N. s =100, θ =0.1, C =0.1, ζ =4 α =2 α =3 α =4

(b) s = 100, ζ = 4

Figure 4: Convergence of QMC approximation to integral for product weight integrand in s = 100, 1000 dimensions

with interlacing parameter α = 2, 3, 4.

SLIDE 21

10 10

1

10

2

10

3

10

4

10

5

10

6

10

7

Number of Points N =bm

10

15

10

14

10

13

10

12

10

11

10

10

10

9

10

8

10

7

10

6

10

5

10

4

10

3

10

2

10

1

Quadrature Error |I[f]−Q[f]| 1 −2 1 −4

Error vs. Number of Points N. s =1000, θ =0.1, C =0.1, ζ =2 α =2 α =3 α =4

(a) s = 1000, ζ = 2

10 10

1

10

2

10

3

10

4

10

5

10

6

10

7

Number of Points N =bm

10

15

10

14

10

13

10

12

10

11

10

10

10

9

10

8

10

7

10

6

10

5

10

4

10

3

10

2

10

1

Quadrature Error |I[f]−Q[f]| 1 −2 1 −4

Error vs. Number of Points N. s =1000, θ =0.1, C =0.1, ζ =4 α =2 α =3 α =4

(b) s = 1000, ζ = 4

Figure 5: Convergence of QMC approximation to integral for product weight integrand in s = 100, 1000 dimensions

with interlacing parameter α = 2, 3, 4.

SLIDE 22

10 10

1

10

2

10

3

10

4

10

5

10

6

10

7

Number of Points N =bm

10

15

10

14

10

13

10

12

10

11

10

10

10

9

10

8

10

7

10

6

10

5

10

4

10

3

10

2

10

1

Quadrature Error |I[f]−Q[f]| 1 −2 1 −4

Error vs. Number of Points N. s =100, θ =0.1, C =0.1, ζ =2 α =2 α =3 α =4

(a) ζ = 2

10 10

1

10

2

10

3

10

4

10

5

10

6

10

7

Number of Points N =bm

10

15

10

14

10

13

10

12

10

11

10

10

10

9

10

8

10

7

10

6

10

5

10

4

10

3

10

2

10

1

Quadrature Error |I[f]−Q[f]| 1 −2 1 −4

Error vs. Number of Points N. s =100, θ =0.1, C =0.1, ζ =4 α =2 α =3 α =4

(b) ζ = 4

Figure 6: Convergence of QMC approximation for SPOD weight integrand in s = 100 dimensions with interlacing

parameter α = 2, 3, 4.

SLIDE 23

Conclusions

Infinite-dimensional, holomorphic-parametric operator equations,
Advection-Diffusion, Helmholtz in random media, random domains,
Sparsity:

– affine-parametric equations

j≥1 A−1

0 Ajp L(X,X) < ∞ some 0 < p < 1 .

– holomorphic-parametric equations A(y; ·) : X → Y′ is holomorphic in polydiscs.

Petrov-Galerkin discretization in x ∈ D, Dimension truncation at dimension s ∈ N,
N - point QMC quadrature based on digital net of order α = 1 + ⌊1/p⌋ imply convergence rate
E[G(u)] − QN,s[G(uh

s)]

≤ C(N −1/p + s−2(1/p−1) + ht+t′) .
Work = O(NsMh) analogous to (Single-Level) Monte-Carlo for p = 1.
Multi-Level Extension: Report 2014-16, SAM ETH Z¨

urich (Dick, Kuo, LeGia & CS)

Holomorphic-parametric operator equations: Report 2014-23, SAM ETH Z¨

urich (Dick, Kuo, LeGia & CS).

Bayesian Inverse Problems:

for holomorphic-parametric problems, countably-parametric posterior density satisfies derivative bounds (Sparsity: Schillings, CS and Stuart 2013, weighted RKHS: Dick, LeGia and CS 2014) = ⇒ QMC Rate N −1/p vs. 1/2 for MCMC, N −(1/p−1) for adaptive Smolyak, CS-based sampling,

SLIDE 24

References

J. Dick and F.Y. Kuo and Q.T. Le Gia and D. Nuyens and Ch. Schwab:

Higher order QMC Galerkin discretization for parametric operator equations, Research Report 2013-29, SAM ETH Z¨ urich (to appear in SINUM 2015)

J. Dick and F.Y. Kuo and Q.T. Le Gia and Ch. Schwab:

Multi-level Higher order QMC Galerkin discretization for parametric operator equations, Research Report 2014-14, SAM ETH Z¨ urich (in review)

J. Dick and Q.T. Le Gia and Ch. Schwab:

Higher order QMC Petrov-Galerkin discretizations for holomorphic parametric operator equations Research Report 2014-23, SAM ETH Z¨ urich (in review)

Ch. Schwab

QMC Galerkin discretization of parametric operator equations

Proc. MCMQC 2012, Springer Publ. 2014.
A. Chkifa, A. Cohen and Ch. Schwab:

Breaking the curse of dimensionality in sparse polynomial approximation of parametric PDEs

Journ. Math. Pures & Appliquees (2014).

SLIDE 25

R.N. Gantner and Ch. Schwab: