
Transport & Multilevel Approaches for Large-Scale PDE-Constrained Bayesian Inference
Robert Scheichl, Institute of Applied Mathematics & Interdisciplinary Center for Scientific Computing, Heidelberg University
Collaborators: K…


Variational Bayes (as opposed to Metropolis-Hastings MCMC)
Aim to characterise the posterior distribution (density π_pos) analytically (at least approximately) for more efficient inference. This is a challenging task since:
- x ∈ R^d is typically high-dimensional (e.g., a discretised function),
- π_pos is in general non-Gaussian (even if π_pr and the observational noise are Gaussian),
- evaluations of the likelihood may be expensive (e.g., require the solution of a PDE).
Key tools – a playground for a numerical analyst: transport maps, optimisation, principal component analysis, model order reduction, hierarchies, sparsity, low-rank approximation.

Deterministic Couplings of Probability Measures
Core idea [Moselhy, Marzouk, 2012]:
- Choose a reference distribution η (e.g., standard Gaussian).
- Seek an invertible transport map T: R^d → R^d such that T_♯ η = π (push-forward).
In principle, this enables exact (independent, unweighted) sampling!
Approximately satisfying these conditions is still useful: preconditioning!

Variational Inference
Goal: sampling from the target density π(x).
Given a reference density η, find an invertible map T̂ such that
  T̂ := argmin_T D_KL(T_♯ η ‖ π) = argmin_T D_KL(η ‖ T^{-1}_♯ π),
where
  D_KL(p ‖ q) := ∫ log( p(x)/q(x) ) p(x) dx   ... Kullback-Leibler divergence,
  T_♯ p(x) := p( T^{-1}(x) ) |det ∇_x T^{-1}(x)|   ... push-forward of p.
Advantage of using D_KL: the normalising constant of π is not needed.
Minimise over some suitable class T of maps T (where ideally the Jacobian determinant det ∇_x T^{-1}(x) is easy to evaluate).
To improve: enrich the class T, or use the (approximate) samples of π as proposals for MCMC or in importance sampling (see below).
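As a concrete (toy) illustration of this objective, the following sketch minimises a sample estimate of D_KL(T_♯ η ‖ π) over a simple diagonal affine map T(u) = a⊙u + b for a 2D unnormalised target; the target, the map class and the optimiser are illustrative choices, not the methods used in the talk.

```python
import numpy as np
from scipy.optimize import minimize

# Unnormalised toy target: a 2D "banana" density (illustrative only).
def log_pi(x):                                  # x has shape (N, 2)
    x1, x2 = x[:, 0], x[:, 1]
    return -0.5 * x1**2 - 0.5 * (x2 - x1**2)**2

rng = np.random.default_rng(0)
U = rng.standard_normal((2000, 2))              # fixed reference samples u ~ eta = N(0, I)

# Diagonal affine map T(u) = a*u + b with a = exp(s) > 0, so log|det grad T| = s1 + s2.
def neg_kl(theta):
    a, b = np.exp(theta[:2]), theta[2:]
    x = a * U + b
    # D_KL(T#eta || pi) = -E_{u~eta}[ log pi(T(u)) + log|det grad T(u)| ] + const
    return -np.mean(log_pi(x)) - np.sum(theta[:2])

res = minimize(neg_kl, x0=np.zeros(4), method="Nelder-Mead")
a, b = np.exp(res.x[:2]), res.x[2:]
samples = a * rng.standard_normal((5000, 2)) + b   # (approximate) samples of pi
print("fitted scales:", a, "shifts:", b, "objective:", res.fun)
```

Because the map class here is so restrictive, the fitted T_♯ η is only a crude approximation of π; richer classes (triangular maps, flows, TT surrogates) are exactly what the following slides introduce.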

Many Choices ("Architectures") for T Possible
Examples (list not comprehensive!):
1. Optimal transport or Knothe-Rosenblatt rearrangement [Moselhy, Marzouk, 2012], [Marzouk, Moselhy, Parno, Spantini, 2016]
2. Normalizing or autoregressive flows [Rezende, Mohamed, 2015] (and related methods in the ML literature)
3. Kernel-based variational inference: Stein variational methods [Liu, Wang, 2016], [Detommaso, Cui, Spantini, Marzouk, RS, 2018], [Chen, Wu, Chen, O'Leary-Roseberry, Ghattas, 2019] (not today!)
4. Layers of low-rank maps [Bigoni, Zahm, Spantini, Marzouk, arXiv 2019]
5. Layers of hierarchical invertible neural networks (HINT) [Detommaso, Kruse, Ardizzone, Rother, Köthe, RS, arXiv:1905.10687] (not today!)
6. Low-rank tensor approximation of the Knothe-Rosenblatt rearrangement [Dolgov, Anaya-Izquierdo, Fox, RS, 2019]

Approximation and Sampling of Multivariate Probability Distributions in the Tensor Train Decomposition [Dolgov, Anaya-Izquierdo, Fox, RS, 2019]

Variational Inference with Triangular Maps
In general, in variational inference we aim to find argmin_T D_KL(T_♯ η ‖ π). Note:
  D_KL(T_♯ η ‖ π) = −E_{u∼η}[ log π(T(u)) + log |det ∇T(u)| ] + const.
A particularly useful family T are the Knothe-Rosenblatt triangular rearrangements (see [Marzouk, Moselhy, Parno, Spantini, 2016]):
  T(x) = [ T_1(x_1);  T_2(x_1, x_2);  ...;  T_d(x_1, x_2, ..., x_d) ]   (= autoregressive flow in ML).
Then: log |det ∇T(x)| = Σ_k log ∂_{x_k} T_k.

Knothe-Rosenblatt via Conditional Distribution Sampling
In fact, there exists a unique triangular map satisfying T_♯ η = π (for absolutely continuous η, π on R^d): conditional distribution sampling [Rosenblatt '52] (explicitly available!).
Any density factorises into a product of conditional densities:
  π(x_1, ..., x_d) = π_1(x_1) π_2(x_2 | x_1) ··· π_d(x_d | x_1, ..., x_{d−1}),
and one can sample (up to normalisation with a known scaling factor)
  x_k ∼ π_k(x_k | x_1, ..., x_{k−1}) ∝ ∫ π(x_1, ..., x_d) dx_{k+1} ··· dx_d.
1st step: produce a sample x^i_1 via 1D CDF inversion from
  π_1(x_1) ∝ ∫ π(x_1, x_2, ..., x_d) dx_2 ··· dx_d.
k-th step: given x^i_1, ..., x^i_{k−1}, sample x^i_k via 1D CDF inversion from
  π_k(x_k | x^i_1, ..., x^i_{k−1}) ∝ ∫ π(x^i_1, ..., x^i_{k−1}, x_k, x_{k+1}, ..., x_d) dx_{k+1} ··· dx_d.
Problem: (d−k)-dimensional integration at the k-th step!
Remedy: find an approximation π̃ ≈ π for which integration is cheap!
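For intuition, here is a minimal grid-based sketch of this conditional-distribution (Rosenblatt) sampler in d = 2, where the marginalisation is still affordable; the target density, the grid and the discrete CDF inversion are all illustrative choices.

```python
import numpy as np

# Unnormalised target on an n x n tensor grid (d = 2 for readability).
n = 200
x1 = np.linspace(-3.0, 3.0, n)
x2 = np.linspace(-3.0, 3.0, n)
X1, X2 = np.meshgrid(x1, x2, indexing="ij")
P = np.exp(-0.5 * X1**2 - 0.5 * (X2 - X1**2)**2)      # pi(x1, x2), unnormalised

def inv_cdf_sample(grid, density, u):
    """Sample from an (unnormalised) 1D density on `grid` by inverting its discrete CDF at u."""
    cdf = np.cumsum(density)
    cdf = cdf / cdf[-1]
    return np.interp(u, cdf, grid)

rng = np.random.default_rng(1)
pi1 = P.sum(axis=1)                                   # marginal pi_1(x1) ~ integral over x2
samples = np.empty((1000, 2))
for s in range(1000):
    xi1 = inv_cdf_sample(x1, pi1, rng.uniform())      # 1st step: 1D CDF inversion
    # 2nd step: conditional pi_2(x2 | xi1) ~ pi(xi1, x2), again a 1D CDF inversion
    cond = np.array([np.interp(xi1, x1, P[:, j]) for j in range(n)])
    samples[s] = (xi1, inv_cdf_sample(x2, cond, rng.uniform()))
print(samples.mean(axis=0))
```

In d dimensions the marginal at step k requires a (d−k)-dimensional integral, which is exactly the bottleneck the separable (low-rank) approximation below removes.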

Low-Rank Tensor Approximation of Distributions
Low-rank tensor decomposition ⇔ separation of variables.
A tensor grid with n points per direction (or n polynomial basis functions) has O(n^d) entries; a separated (tensor product) representation needs only O(dn) values per rank-one term.
Approximate:
  π(x_1, ..., x_d) ≈ Σ_{|α|≤r} π^1_α(x_1) π^2_α(x_2) ··· π^d_α(x_d)   (tensor product decomposition).
Many low-rank tensor formats exist [Kolda, Bader '09], [Hackbusch '12].

Conditional Distribution Sampler (with factorised distribution)
For the low-rank tensor approximation
  π(x) ≈ π̃(x) = Σ_{|α|≤r} π^1_α(x_1) · π^2_α(x_2) ··· π^d_α(x_d),
the k-th step of the CD sampler, given x^i_1, ..., x^i_{k−1}, simplifies to
  π̃_k(x_k | x^i_1, ..., x^i_{k−1}) ∝ Σ_{|α|≤r} π^1_α(x^i_1) ··· π^{k−1}_α(x^i_{k−1}) · π^k_α(x_k) · ∫ π^{k+1}_α(x_{k+1}) dx_{k+1} ··· ∫ π^d_α(x_d) dx_d.
Only repeated 1D integrals are needed (linear in d), and to sample (in each step) only simple 1D CDF inversions (linear in d).
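The following sketch makes the "repeated 1D integrals" point concrete for a rank-r separable approximation stored as per-dimension factor matrices on a common grid; the factors, the grid and the quadrature weights are toy placeholders, not the TT objects introduced later in the talk.

```python
import numpy as np

# Assume a rank-r separable approximation on a common 1D grid:
#   pi_tilde(x) = sum_{a < r} prod_k F[k][:, a],  with factor matrices F[k] of shape (n, r).
rng = np.random.default_rng(2)
d, n, r = 11, 64, 5
grid = np.linspace(-1.0, 1.0, n)
F = [np.abs(rng.standard_normal((n, r))) + 0.1 for _ in range(d)]   # toy positive factors

w = np.full(n, (grid[-1] - grid[0]) / (n - 1))        # simple quadrature weights (uniform grid)
tails = [None] * (d + 1)                              # tails[k][a] = prod_{m >= k} integral of F[m][:, a]
tails[d] = np.ones(r)
for k in range(d - 1, -1, -1):
    tails[k] = tails[k + 1] * (w @ F[k])              # repeated 1D integrals, linear in d

def conditional(k, x_prev_idx):
    """Unnormalised pi_tilde(x_k | x_1, ..., x_{k-1}) on the grid, given previous grid indices."""
    head = np.ones(r)
    for m, idx in enumerate(x_prev_idx):              # prod_{m < k} F[m][x_m, a]
        head = head * F[m][idx, :]
    return F[k] @ (head * tails[k + 1])               # length-n vector, O(nr) work per step

print(conditional(3, [10, 20, 30]).shape)             # -> (n,), ready for 1D CDF inversion
```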

Low-Rank Decomposition (Two Variables)
Collect the discretised values of π(θ_1, θ_2) on an n × n grid into a matrix:
  P(i, j) = Σ_{α=1}^r V_α(i) W_α(j) + O(ε).
Rank r ≪ n (exploiting structure, smoothness, ...):
  mem(V) + mem(W) = 2nr ≪ n² = mem(P).
The SVD provides the optimal ε for a fixed r (i.e. it achieves min_{V,W} ‖P − VW*‖²_F), but it requires all n² entries of P (n^d in d dimensions!).
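A minimal numpy illustration of this rank-r truncation via the SVD, including the 2nr vs n² memory count; the density and the truncation tolerance are toy choices.

```python
import numpy as np

n = 500
t1 = np.linspace(-3.0, 3.0, n)
t2 = np.linspace(-3.0, 3.0, n)
# Discretised 2D density pi(theta1, theta2) collected into an n x n matrix P.
P = np.exp(-0.5 * (t1[:, None]**2 + t2[None, :]**2) - 0.3 * np.outer(t1, t2))

U, s, Vt = np.linalg.svd(P, full_matrices=False)
r = int(np.sum(s > 1e-8 * s[0]))               # smallest rank capturing sigma > 1e-8 * sigma_1
V = U[:, :r] * s[:r]                           # n x r
W = Vt[:r, :].T                                # n x r
print("rank:", r, " relative error:", np.linalg.norm(P - V @ W.T) / np.linalg.norm(P))
print("memory: 2nr =", 2 * n * r, " vs n^2 =", n * n)
```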

Cross Algorithm (construct a low-rank factorisation from few entries)
Interpolation arguments show: for some suitable index sets I, J ⊂ {1, ..., n} with |I| = |J| = r, the cross decomposition
  P ≈ P(:, J) P(I, J)^{−1} P(I, :).
The maxvol principle gives the 'best' indices I, J [Goreinov, Tyrtyshnikov '01]:
  |det P(I, J)| = max_{Î,Ĵ} |det P(Î, Ĵ)|   ⇒   ‖P − P̃‖_C ≤ (r+1) min_{rank P̂ = r} ‖P − P̂‖_2   (NP-hard).
But how can we find good sets I, J in practice? And how can we generalise this to d > 2?

Alternating Iteration (for cross approximation)
[Figure: alternating selection of column indices j_1, j_2, j_3 and row indices i_1, i_2, i_3.]
A practically realisable strategy (with O(2nr) samples & O(nr²) flops). For numerical stability, use a rank-revealing QR in practice. To adapt the rank, expand V → [V  Z] (with residual Z). Several similar algorithms exist, e.g. ACA [Bebendorf '00] or EIM [Barrault et al '04].
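A simplified sketch of such an alternating row/column cross iteration, treating the matrix as a function that can be evaluated entry-wise; pivoted QR is used here as a cheap stand-in for the maxvol index selection, and the test matrix, rank and sweep count are illustrative.

```python
import numpy as np
from scipy.linalg import qr

def cross_approx(entry, n, r, sweeps=4, seed=0):
    """Alternating cross approximation of an n x n matrix given entry-wise by `entry(i, j)`.
    Pivoted QR replaces the maxvol step; each sweep evaluates only O(2nr) entries."""
    rng = np.random.default_rng(seed)
    J = rng.choice(n, size=r, replace=False)                        # initial column indices
    I = np.arange(r)
    for _ in range(sweeps):
        C = np.array([[entry(i, j) for j in J] for i in range(n)])  # column slice P(:, J)
        _, _, piv = qr(C.T, pivoting=True)                          # pick r well-conditioned rows
        I = piv[:r]
        R = np.array([[entry(i, j) for j in range(n)] for i in I])  # row slice P(I, :)
        _, _, piv = qr(R, pivoting=True)                            # pick r well-conditioned columns
        J = piv[:r]
    C = np.array([[entry(i, j) for j in J] for i in range(n)])
    R = np.array([[entry(i, j) for j in range(n)] for i in I])
    # P(:, J) P(I, J)^{-1} P(I, :), with least squares for robustness to ill-conditioning
    return C @ np.linalg.lstsq(C[I, :], R, rcond=None)[0]

n = 300
f = lambda i, j: np.exp(-((i - j) / n) ** 2)          # smooth, hence numerically low-rank
A = np.fromfunction(f, (n, n))
A_hat = cross_approx(f, n, r=10)
print("relative error:", np.linalg.norm(A - A_hat) / np.linalg.norm(A))
```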

Tensor Train (TT) Decomposition (Many Variables)
Many variables: matrix product states / tensor train:
  π(i_1, ..., i_d) = Σ_{α_1, ..., α_{d−1}} π^1_{α_1}(i_1) · π^2_{α_1,α_2}(i_2) · π^3_{α_2,α_3}(i_3) ··· π^d_{α_{d−1}}(i_d),   with α_k = 1, ..., r_k for 0 < k < d.
[Wilson '75] (comput. physics), [White '93], [Verstraete '04]; [Oseledets '09]
The TT blocks π^k are three-dimensional r_{k−1} × n × r_k tensors with TT ranks r_1, ..., r_{d−1} ≤ r.
Storage: O(dnr²), linear in d.
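A small sketch of the TT format: cores stored as (r_{k−1}, n, r_k) arrays and a single entry evaluated as a product of the corresponding matrix slices; the cores here are random toy data, purely to show the mechanics and the storage count.

```python
import numpy as np

d, n, r = 6, 32, 4
ranks = [1] + [r] * (d - 1) + [1]                      # r_0 = r_d = 1
rng = np.random.default_rng(3)
# TT cores: cores[k] has shape (r_{k-1}, n, r_k)
cores = [rng.standard_normal((ranks[k], n, ranks[k + 1])) for k in range(d)]

def tt_entry(cores, idx):
    """Evaluate pi(i_1, ..., i_d) = core_1[:, i_1, :] @ core_2[:, i_2, :] @ ... @ core_d[:, i_d, :]."""
    v = np.ones((1, 1))
    for k, i in enumerate(idx):
        v = v @ cores[k][:, i, :]                      # (1, r_{k-1}) @ (r_{k-1}, r_k) -> (1, r_k)
    return float(v[0, 0])

print(tt_entry(cores, [0, 5, 7, 3, 1, 30]))
print("storage: full n^d =", n**d, " vs TT O(dnr^2) =", sum(c.size for c in cores))
```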

TT Cross – An Efficient Computation of a TT Decomposition
Given random initial index sets J_0, ..., J_d, iterate: [Oseledets, Tyrtyshnikov '10]
1. Update the k-th TT block: π^k(i_k) = π(I_{k−1}, i_k, J_k).
2. Update the k-th index set: I_k = pivots_row(π^k) (using the maxvol principle on different matricisations of the tensor in each step).
3. Set k → k + 1 and move to the next block.
4. When k = d, switch direction and update the index sets J_{k−1}.
[Figure: nested index sets (j_1, k_1), (j_2, k_2), (j_3, k_3) and rows i_1, i_2, i_3 selected during the sweep.]
Cost: O(dnr²) samples & O(dnr³) flops per iteration, linear in d.

Tensor Train (TT) Transport Maps (Summary & Comments) [Dolgov, Anaya-Izquierdo, Fox, RS, 2019]
- Generic, not problem specific ('black box').
- Cross approximation: 'sequential' design along 1D lines.
- Separable product form: π̃(x_1, ..., x_d) = Σ_{|α|≤r} π^1_α(x_1) ··· π^d_α(x_d).
- Cheap construction/storage & low number of model evaluations: linear in d.
- Cheap integration w.r.t. x: linear in d.
- Cheap samples via the conditional distribution method: linear in d.
- Tuneable approximation error ε (by adapting the ranks r) ⇒ cost & storage (poly)logarithmic in ε (next slide).
- Many known ways to use these samples for fast inference (as proposals for MCMC, as control variates, importance weighting, ...).

Theoretical Result [Rohrbach, Dolgov, Grasedyck, RS, 2020]
For Gaussian distributions π(x) we have the following result. Let
  π: R^d → R,   x ↦ exp(−½ xᵀ Σ x),
and define
  Σ =: [ Σ^(k)_11   Γ_kᵀ ;  Γ_k   Σ^(k)_22 ],   where Γ_k ∈ R^{(d−k)×k}.
Theorem. Let Σ be SPD with λ_min > 0. Suppose ρ := max_k rank(Γ_k) and σ := max_{k,i} σ^(k)_i, where σ^(k)_i are the singular values of Γ_k. Then, for all ε > 0, there exists a TT approximation π̃_ε such that
  ‖π − π̃_ε‖_{L²(R^d)} ≤ ε ‖π‖_{L²(R^d)},
and the TT ranks of π̃_ε are bounded by
  r ≤ ( 1 + (7σ/λ_min) log(7ρd/ε) )^ρ   (polylogarithmic growth).

How to Use the TT-CD Sampler to Estimate E_π Q?
Problem: we are sampling from the approximation π̃ = π + O(ε).
Option 0: classical variational inference. Explicit integration (linear in d) gives the biased estimator E_π̃ Q ≈ E_π Q. For non-smooth Q: use Monte Carlo sampling or, better, QMC 'seeds'.
[Figure: 2D projection of the 11-dimensional map applied to QMC seeds (problem specification below).]

Sampling from the Exact π: Unbiased Estimates of E_π Q
(using the TT approximation as preconditioner, importance weight or control variate)
Option 1: Use {x^i_π̃} as (i.i.d.) proposals in Metropolis-Hastings. Accept the proposal x^i_π̃ with probability
  α = min{ 1, [π(x^i_π̃) π̃(x^{i−1})] / [π(x^{i−1}) π̃(x^i_π̃)] }.
One can prove that the rejection rate ∼ ε and the IACT τ ∼ 1 + ε.
Option 2: Use π̃ for importance weighting with QMC quadrature:
  E_π Q ≈ (1/Z̃) (1/N) Σ_{i=1}^N Q(x^i_π̃) π(x^i_π̃)/π̃(x^i_π̃),   with   Z̃ = (1/N) Σ_{i=1}^N π(x^i_π̃)/π̃(x^i_π̃).
We can use an unbiased (randomised) QMC rule for both integrals.
Option 3: Use the estimate w.r.t. π̃ as a control variate (multilevel MCMC).
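A minimal 1D sketch of Options 1 and 2, assuming only that we can draw i.i.d. samples from a surrogate π̃ and evaluate both unnormalised densities; the target, the surrogate and the quantity of interest below are toys, and plain Monte Carlo seeds replace the QMC seeds used in the talk.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy densities: exact (unnormalised) target pi and a surrogate pi_tilde we can sample exactly.
log_pi       = lambda x: -0.5 * (x - 0.3) ** 2 - 0.1 * x ** 4
log_pi_tilde = lambda x: -0.5 * x ** 2                       # standard normal surrogate
sample_tilde = lambda size: rng.standard_normal(size)
Q            = lambda x: (x > 1.0).astype(float)             # non-smooth quantity of interest

# Option 1: independence Metropolis-Hastings with surrogate proposals.
N = 20000
props = sample_tilde(N)
chain = np.empty(N); chain[0] = props[0]
for i in range(1, N):
    x_old, x_new = chain[i - 1], props[i]
    # log of alpha = pi(x_new) pi_tilde(x_old) / (pi(x_old) pi_tilde(x_new))
    log_alpha = (log_pi(x_new) + log_pi_tilde(x_old)) - (log_pi(x_old) + log_pi_tilde(x_new))
    chain[i] = x_new if np.log(rng.uniform()) < log_alpha else x_old
print("MH estimate of E_pi[Q]:", Q(chain[N // 2:]).mean())

# Option 2: self-normalised importance weighting with surrogate samples.
xs = sample_tilde(N)
w = np.exp(log_pi(xs) - log_pi_tilde(xs))
print("IW estimate of E_pi[Q]:", np.sum(w * Q(xs)) / np.sum(w))
```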

Numerical Example (Inverse Stationary Diffusion Problem)
Model problem (representative for subsurface flow or structural mechanics):
  −∇·( κ(ξ, x) ∇u(ξ, x) ) = 0,   ξ ∈ (0, 1)²,
  u|_{ξ_1=0} = 1,   u|_{ξ_1=1} = 0,   ∂u/∂n|_{ξ_2=0} = 0,   ∂u/∂n|_{ξ_2=1} = 0.
[Figure: unit-square domain with samples of the pressure field u.]
Karhunen-Loève expansion of log κ(ξ, x) = Σ_{k=1}^d φ_k(ξ) x_k with prior d = 11, x_k ∼ U[−1, 1], ‖φ_k‖_∞ = O(k^{−3/2}) [Eigel, Pfeffer, Schneider '16].
Discretisation with bilinear FEs on a uniform mesh with h = 1/64.
Data: average pressure at 9 locations (synthetic, i.e. generated for some fixed x*).
QoI Q = h(u(·, x)): the probability that the flux exceeds 1.5 (not smooth!).
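To make the parametrisation concrete, here is a sketch of assembling log κ(ξ, x) = Σ_k φ_k(ξ) x_k on a grid for one prior sample; the specific modes φ_k below are hypothetical placeholders with the stated amplitude decay O(k^{−3/2}), not the exact functions from [Eigel, Pfeffer, Schneider '16], and no PDE solve is included.

```python
import numpy as np

d, m = 11, 65                                   # 11 parameters, 65 x 65 nodes (h = 1/64)
xi1, xi2 = np.meshgrid(np.linspace(0, 1, m), np.linspace(0, 1, m), indexing="ij")

def phi(k):
    """Illustrative KL-type mode with amplitude ~ k^(-3/2); NOT the paper's exact phi_k."""
    return k ** (-1.5) * np.cos(np.pi * k * xi1) * np.cos(np.pi * ((k + 1) // 2) * xi2)

rng = np.random.default_rng(5)
x = rng.uniform(-1.0, 1.0, size=d)              # one prior sample, x_k ~ U[-1, 1]
log_kappa = sum(x[k - 1] * phi(k) for k in range(1, d + 1))
kappa = np.exp(log_kappa)                       # coefficient field to feed into an FE solver
print(kappa.shape, kappa.min(), kappa.max())
```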

Comparison against DRAM (for the inverse diffusion problem)
[Figure: relative error in P(F > 1.5) vs. CPU time, for noise levels σ²_e = 0.01 (left) and σ²_e = 0.001 (right); the discretisation error is marked as a horizontal line.]
TT-MH: TT conditional-distribution samples (i.i.d.) used as proposals for MCMC.
TT-qIW: TT surrogate used for importance sampling with QMC.
DRAM: Delayed Rejection Adaptive Metropolis [Haario et al, 2006].

Samples – Comparison TT-CD vs. DRAM
[Figure: posterior sample scatter plots, TT-MH (i.i.d. seeds) vs. DRAM.]

Multilevel Markov Chain Monte Carlo [Dodwell, Ketelsen, RS, Teckentrup, 2015 & 2019], [Cui, Detommaso, RS, 2019]

Exploiting Model Hierarchy (same inverse diffusion problem)
[Figure: hierarchy of finite element spaces V_ℓ and parameter spaces X_ℓ, ℓ = 0, ..., L.]
Here h_ℓ = h_0 × 2^{−ℓ} and M_ℓ ≈ M_0 × 2^{2ℓ}.

Monte Carlo (assuming first that π can be sampled – forward problem)
Standard Monte Carlo estimator for E[Q] (where Q = h(u(·, x)) ∈ R):
  Q̂^MC_L := (1/N) Σ_{i=1}^N Q^(i)_L,   with i.i.d. samples Q^(i)_L computed with Model(L).
Convergence of plain vanilla MC (mean square error):
  E[ (Q̂^MC_L − E[Q])² ] = V[Q_L]/N + ( E[Q_L − Q] )²,
i.e. MSE = sampling error + model error ('bias').
Assuming |E[Q_ℓ − Q]| = O(2^{−αℓ}) and E[Cost_ℓ] = O(2^{γℓ}), to get MSE = O(ε²) we need
  L ∼ α^{−1} log_2(ε^{−1})   and   N ∼ ε^{−2}.
Monte Carlo Complexity Theorem (2D model problem with AMG: α = 1, γ = 2):
  Cost(Q̂^MC_L) = O(N M_L) = O(ε^{−2−γ/α})   to obtain MSE = O(ε²).
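A tiny sketch making this balancing concrete for the stated rates (all constants set to 1 purely for illustration):

```python
import numpy as np

alpha, gamma = 1.0, 2.0
for eps in [1e-1, 1e-2, 1e-3]:
    L = int(np.ceil(np.log2(1.0 / eps) / alpha))    # bias 2^{-alpha L} ~ eps
    N = int(np.ceil(eps ** -2))                     # sampling error V/N ~ eps^2
    cost = N * 2.0 ** (gamma * L)                   # N samples at O(2^{gamma L}) cost each
    print(f"eps={eps:g}: L={L}, N={N:.1e}, cost ~ {cost:.1e} (~ eps^-(2+gamma/alpha) = {eps**-(2 + gamma/alpha):.1e})")
```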

Multilevel Monte Carlo [Heinrich '98], [Giles '07]
Basic idea: note that trivially (due to the linearity of E)
  E[Q_L] = E[Q_0] + Σ_{ℓ=1}^L E[Q_ℓ − Q_{ℓ−1}]   (control variates!).
Define the following multilevel MC estimator for E[Q]:
  Q̂^MLMC_L := Q̂^MC_0 + Σ_{ℓ=1}^L Ŷ^MC_ℓ,   where Y_ℓ := Q_ℓ − Q_{ℓ−1}.
Key observation (variance reduction; the corrections are cheaper):
- Level L: V[Q_L − Q_{L−1}] → 0 as L → ∞ ⇒ N_L = O(1) (best case).
- Level ℓ: N_ℓ optimised to 'balance' with the cost on levels 0 and L.
- Level 0: N_0 ∼ N, but Cost_0 = O(M_0) = O(1).
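A generic sketch of this estimator: the user supplies a routine returning correction samples Y_ℓ = Q_ℓ − Q_{ℓ−1} (computed from the same random input on both levels) together with per-sample costs, and the sample sizes follow the usual N_ℓ ∝ √(V_ℓ/C_ℓ) balancing; the "model" below is a toy with the advertised variance decay, not the PDE problem from the talk.

```python
import numpy as np

rng = np.random.default_rng(6)

def sample_Y(level, size):
    """Toy correction samples: Y_0 = Q_0, Y_l = Q_l - Q_{l-1} with variance ~ 2^{-2l} (beta = 2)."""
    if level == 0:
        return 1.0 + rng.standard_normal(size)
    return 2.0 ** (-level) * rng.standard_normal(size)

def mlmc(L, eps, cost_per_sample=lambda l: 2.0 ** (2 * l), n_pilot=100):
    variances, costs, means = [], [], []
    for l in range(L + 1):
        variances.append(sample_Y(l, n_pilot).var())            # pilot estimate of V_l
        costs.append(cost_per_sample(l))
    # balancing: N_l ~ 2 eps^-2 sqrt(V_l / C_l) * sum_k sqrt(V_k C_k)
    lam = 2.0 / eps**2 * sum(np.sqrt(v * c) for v, c in zip(variances, costs))
    for l in range(L + 1):
        N_l = max(n_pilot, int(np.ceil(lam * np.sqrt(variances[l] / costs[l]))))
        means.append(sample_Y(l, N_l).mean())                   # independent estimator of E[Y_l]
    return sum(means)                                           # E[Q_L] = sum_l E[Y_l]

print("MLMC estimate of E[Q_L]:", mlmc(L=5, eps=0.01))
```

Note how most samples land on the cheap level 0, while only a handful are needed on the finest level where V[Q_ℓ − Q_{ℓ−1}] is already tiny.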

Complexity Theorem [Giles '07], [Cliffe, Giles, RS, Teckentrup '11]
Assume approximation error O(2^{−αℓ}), cost per sample O(2^{γℓ}) and V[Q_ℓ − Q_{ℓ−1}] = O(2^{−βℓ}) (strong error / variance reduction).
Then there exist L and {N_ℓ}_{ℓ=0}^L to obtain MSE = O(ε²) with
  Cost(Q̂^MLMC_L) = O( ε^{−2−max(0, (γ−β)/α)} ) + possible log-factor,
using dependent or independent estimators Q̂^MC_0 and {Ŷ^MC_ℓ}_{ℓ=1}^L.
Running example (for smooth functionals & AMG): α ≈ 1, β ≈ 2, γ ≈ 2:
  Cost(Q̂^MLMC_L) = O( ε^{−max(2, γ/α)} ) = O( max(N_0, M_L) ) ≈ O(ε^{−2}).
Optimality: asymptotically the cost of one deterministic solve (to tolerance ε)!

Numerical Example (Multilevel MC)
Running example with Q = ‖u‖_{L²(D)}; lognormal diffusion coefficient with exponential covariance (σ² = 1, λ = 0.3), h_0 = 1/8.

Inverse Problem: Multilevel Markov Chain Monte Carlo
Posterior distribution for the PDE model problem (Bayes):
  π_ℓ(x_ℓ | y_obs) ∝ exp( −‖y_obs − F_ℓ(x_ℓ)‖²_{Σ_obs} ) π_prior(x_ℓ).
What were the key ingredients of 'standard' multilevel Monte Carlo?
- Telescoping sum: E[Q_L] = E[Q_0] + Σ_{ℓ=1}^L E[Q_ℓ − Q_{ℓ−1}].
- Models on coarser levels are much cheaper to solve (M_0 ≪ M_L).
- V[Q_ℓ − Q_{ℓ−1}] → 0 as ℓ → ∞ ⇒ much fewer samples needed on finer levels.
