

SLIDE 1

Bayesian Multi-Fidelity Optimization under Uncertainty

Maximilian Koschade (maximilian.koschade@tum.de)
Phaedon S. Koutsourelakis (p.s.koutsourelakis@tum.de)

Continuum Mechanics Group Department of Mechanical Engineering Technical University of Munich, Germany

SIAM Computational Science and Engineering, Atlanta, 2017


SLIDE 2

Optimization under Uncertainty

In many engineering applications, deterministic optimization is a simplification that neglects aleatory and epistemic uncertainty.

  • z : design variables (topology or shape)
  • θ ∼ pθ (θ) : stochastic influences, e.g.
      • material: discretized random field λ (x)
      • temperature / load: stochastic process
      • manufacturing tolerances: distributed around a nominal value

Figure 1: Cross-section view of a stiffening rib; example of a material property modeled as a random field λ (x)

SLIDE 3

Optimization under Uncertainty - Objective Function

Maximize the expected utility:

z∗ = arg max_z V (z) = arg max_z ∫ U (z, θ) pθ (θ) dθ

  • z : design variables
  • θ ∼ pθ (θ) : stochastic influences on the system

Example: minimize the probability of failure, U (z, θ) = 1_A (z, θ), where A is the event of non-failure.

Example: design goal u_target, U (z, θ) = exp( −(u (z, θ) − u_target)² / (2 τQ) ), where τQ is a penalty parameter enforcing the design goal.
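The expected-utility objective above can be sketched with a plain Monte Carlo estimator. The response u (z, θ) below is a hypothetical analytic stand-in for the forward solver, and the prior on θ, the grid of candidate designs, and all parameter values are illustrative assumptions only.

```python
import numpy as np

def utility(z, theta, u_target=1.0, tau_q=0.1):
    """Design-goal utility U(z, theta) = exp(-(u - u_target)^2 / (2 tau_q)).

    u(z, theta) is a cheap analytic surrogate standing in for a forward
    solver; a real application would call the numerical model here.
    """
    u = z * np.exp(-0.5 * theta**2)  # hypothetical system response
    return np.exp(-(u - u_target) ** 2 / (2.0 * tau_q))

def expected_utility(z, n_samples=10_000, seed=0):
    """Monte Carlo estimate of V(z) = E_theta[ U(z, theta) ]."""
    rng = np.random.default_rng(seed)      # common random numbers across z
    theta = rng.standard_normal(n_samples)  # theta ~ p_theta, here N(0, 1)
    return utility(z, theta).mean()

# Pick the best design on a coarse grid of candidate z values.
z_grid = np.linspace(0.0, 3.0, 61)
v = np.array([expected_utility(z) for z in z_grid])
z_star = z_grid[np.argmax(v)]
```

Reusing the same θ samples for every candidate z (common random numbers) keeps the estimated curve V (z) smooth so that the arg max is well defined on the grid.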


SLIDE 5

Probabilistic Formulation of Optimization under Uncertainty

Reformulation as Probabilistic Inference¹: the solution is given by an auxiliary posterior distribution² π (z, θ),

V (z) ∝ ∫ π (z, θ) dθ ∝ ∫ U (z, θ) pθ (θ) pz (z) dθ

where the utility U (z, θ) plays the role of the likelihood, pθ (θ) is the prior on θ, and pz (z) is a flat prior on the design variables: the marginal satisfies π (z) ∝ V (z), given a flat prior pz (z). This is conducive to consistent incorporation of the epistemic uncertainty due to approximate, lower-fidelity solvers!

¹ Mueller (2005)   ² This approach should NOT be confused with Bayesian optimization.

SLIDE 6

Example: Stochastic Poisson Equation

∇ · (−λ (x) ∇u (x)) = 0,   dim (z) = 21,   dim (θ) = 800

  • z : controls the heat influx z (x2) on the boundary
  • θ : log-normal conductivity field λ (x)

Figure: problem domain (x1, x2) with realizations θ(1), θ(2) of the conductivity field, and the solution u (x2) against the target profile.

SLIDE 7

Solution via rank-1-perturbed Gaussian q∗

Figure 2: Black-box stochastic variational inference in dimension 821 (dim (θ) = 800, dim (z) = 21), showing the optimal heat flux z∗ and its sensitivity g (x2) (Hoffman et al., 2013; Ranganath et al., 2013)

Cost: O (10³) forward evaluations
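One common parameterization of such a rank-1-perturbed Gaussian family (an assumption here, the slides do not spell out q∗) takes the covariance diag (d²) + v vᵀ, which admits the cheap reparameterized sample x = µ + d ⊙ ε₁ + v ε₂ used by black-box SVI. A minimal sketch:

```python
import numpy as np

def sample_rank1_gaussian(mu, d, v, n, rng):
    """Draw n samples from N(mu, diag(d**2) + v v^T).

    x = mu + d * eps1 + v * eps2 with eps1 ~ N(0, I), eps2 ~ N(0, 1)
    is a differentiable (reparameterized) sample, so gradients of an
    ELBO estimate can flow back into (mu, d, v) during SVI.
    """
    dim = mu.size
    eps1 = rng.standard_normal((n, dim))
    eps2 = rng.standard_normal((n, 1))
    return mu + d * eps1 + eps2 * v

rng = np.random.default_rng(0)
dim = 5
mu = np.zeros(dim)
d = np.full(dim, 0.5)
v = np.linspace(0.1, 0.5, dim)
x = sample_rank1_gaussian(mu, d, v, 200_000, rng)

# Empirical covariance should match diag(d^2) + v v^T.
cov_target = np.diag(d**2) + np.outer(v, v)
cov_emp = np.cov(x, rowvar=False)
```

The appeal of this family in high dimensions is that sampling and log-density evaluation cost O (dim) rather than O (dim³), while still capturing one dominant correlation direction.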

SLIDE 9

  • high dimension
  • expensive numerical model

⇒ Probabilistic inference can quickly become prohibitive. How can we address this issue?

SLIDE 10

Introduction of approximate solvers

Denoting a = log U and y = [z, θ]ᵀ, we can rewrite π (y):

πa (y) ∝ U (y) py (y) = exp (a (y)) py (y) = ∫ exp (a) δ (a − log U (y)) py (y) da = ∫ exp (a) p (a|y) py (y) da

Approximate solvers = epistemic uncertainty

  • As long as p (a|y) is a Dirac delta, we recover the posterior perfectly
  • Introducing cheap, approximate solvers leads to dispersion of p (a|y) and an irrevocable loss of information regarding y
  • We can consistently incorporate this epistemic uncertainty in the Bayesian framework

SLIDE 11

Introduction of approximate solvers

Regression Model: we may learn p (a|y) from e.g. a Bayesian regression model or a Gaussian process (GP),

a = φ (y)ᵀ w + ε

This approach is impractical for a high-dimensional probability space y = [z, θ]ᵀ!

SLIDE 12

Introduction of approximate solvers

Suppose instead we introduce a low-fidelity log-likelihood A:

p (a|y) = ∫ p (a, A|y) dA = ∫ p (a|A, y) p (A|y) dA ≈ ∫ p (a|A) δ (A − log U_LowFi (y)) dA := pA (a|y)

⇒ πA (y) ∝ ∫ exp (a) pA (a|y) py (y) da


SLIDE 15

Learning p (a|y): Probabilistic multi-fidelity approach³

Introduce the low-fidelity log-likelihood A:

pA (a|y) ≈ ∫ p (a|A) δ (A − log U_LowFi (y)) dA

Figure: scatter of the high-fidelity a against the low-fidelity A (log scales, 10⁰ to 10⁶) with the learned predictive density pA (a|A, D)

  • Learn the predictive density using e.g. a variational relevance vector machine (VRVM) or a variational heteroscedastic Gaussian process (VHGP)
  • from a limited set of forward solver evaluations D = {a (yn), A (yn)}, n = 1, ..., N
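As a crude stand-in for the VRVM/VHGP models named above (which are not reproduced here), a heteroscedastic predictive density pA (a|A, D) can be sketched in two least-squares stages: fit a mean µ (A), then fit the log of the squared residuals to get an input-dependent variance σ² (A). The synthetic training pairs and polynomial features are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training pairs D = {(A_n, a_n)}: the high-fidelity log-likelihood
# a scatters around the low-fidelity A with input-dependent noise.
A = rng.uniform(-3.0, 0.0, 300)
a = 1.1 * A - 0.2 + (0.05 + 0.1 * np.abs(A)) * rng.standard_normal(300)

def features(x):
    # quadratic polynomial features in the low-fidelity value A
    return np.column_stack([np.ones_like(x), x, x**2])

# Stage 1: least-squares fit of the predictive mean mu(A).
Phi = features(A)
w_mu, *_ = np.linalg.lstsq(Phi, a, rcond=None)

# Stage 2: fit log squared residuals to get a variance model sigma^2(A).
resid2 = (a - Phi @ w_mu) ** 2
w_var, *_ = np.linalg.lstsq(Phi, np.log(resid2 + 1e-12), rcond=None)

def predict(A_new):
    """Gaussian predictive density p_A(a | A, D): mean and variance arrays."""
    P = features(np.atleast_1d(A_new))
    return P @ w_mu, np.exp(P @ w_var)

mu_pred, var_pred = predict(np.array([-2.0, -0.5]))
```

Unlike the variational Bayesian models on the slide, this two-stage fit gives point estimates only and underestimates the variance in small-data regimes; it is meant purely to make the mean/variance structure of pA (a|A, D) concrete.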

SLIDE 16

Learning p (a|y): Probabilistic multi-fidelity approach³

Conditioning on the training data D:

pA (a|y, D) ≈ ∫ p (a|A, D) δ (A − log U_LowFi (y)) dA

  • The predictive density pA (a|A, D) expresses our belief about the high-fidelity a given the low-fidelity A
  • It is learned from a limited set of forward solver evaluations D = {a (yn), A (yn)}, n = 1, ..., N


SLIDE 19

Extended Probability Space - Illustration

Figure: the extended density πA (a, y) over (y, a). The exact marginal πa (y) corresponds to the degenerate case δ (a − log U (y)); with an approximate solver, the dispersion of pA (a|y) increases the epistemic uncertainty reflected in πA (y).


SLIDE 22

Multi-Fidelity posterior πA (y)

Approximate πA (y): if the predictive density p (a|y) is given by a Gaussian N (a | µ (A (y)), σ² (A (y))), then we obtain

log πA (y) = µ (A (y)) + ½ σ² (A (y)) + log py (y)

Probability mass is placed on y associated with
  • (A): a high predictive mean µ (A (y))
  • (B): a large epistemic uncertainty σ² (A (y))
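The closed form follows because integrating exp (a) against the Gaussian predictive density gives the log-normal mean, ∫ exp (a) N (a | µ, σ²) da = exp (µ + σ²/2). A minimal sketch of evaluating this multi-fidelity log-posterior, with toy stand-ins for the low-fidelity solver, the learned mean/variance functions, and the prior (all hypothetical):

```python
import numpy as np

def log_post_multifidelity(y, low_fi_loglik, mu_fn, var_fn, log_prior):
    """log pi_A(y) = mu(A(y)) + sigma^2(A(y))/2 + log p_y(y).

    The sigma^2/2 term is the log-normal correction from integrating
    exp(a) against N(a | mu, sigma^2): it rewards both a high predicted
    log-utility (exploitation) and large epistemic uncertainty
    (exploration), as annotated (A) and (B) on the slide.
    """
    A = low_fi_loglik(y)  # low-fidelity log-likelihood A(y)
    return mu_fn(A) + 0.5 * var_fn(A) + log_prior(y)

# Toy stand-ins: identity mean, constant epistemic variance, Gaussian prior.
lp = log_post_multifidelity(
    np.array([0.5, -1.0]),
    low_fi_loglik=lambda y: -np.sum(y**2),
    mu_fn=lambda A: A,
    var_fn=lambda A: 0.1,
    log_prior=lambda y: -0.5 * np.sum(y**2),
)
```

In practice mu_fn and var_fn would be the predictive mean and variance of the regression model fitted on D, so evaluating log πA (y) costs one low-fidelity solve plus a cheap regression lookup.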

SLIDE 23

Example: Stochastic Poisson Equation

∇ · (−λ (x) ∇u (x)) = 0,   dim (z) = 1,   dim (θ) = 256

  • z : controls the heat influx on the boundary
  • θ : log-normal conductivity field λ (x)


SLIDE 25

Effect of lower-fidelity solvers⁴

Figure: density estimates πA (z|D) over the design variable z for low-fidelity solvers of decreasing resolution (8x8 and 4x4), compared against the reference optimum z∗ (32x32). dim (θ) = 256, speedup S4x4 ≈ 2,000, N = 200 training data samples; density estimates obtained using MALA.

⁴ Here the low-fidelity solvers are simply coarser discretizations of the stochastic Poisson equation.


SLIDE 28

Effect of training data D

Figure 3: density estimates πA (z|D) over the design variable z for 10, 50, and 500 data points, compared against the reference z∗. The data-restricted likelihood becomes more confident as additional training samples reduce the epistemic uncertainty. (dim (θ) = 256, averaged over 100 random sub-samplings of the data)

SLIDE 29

Review

Summary:

  • Optimization under uncertainty can be reformulated as Bayesian inference
  • This allows consistent incorporation of the epistemic uncertainty introduced by cheaper, approximate (probabilistic) models
  • There is an inevitable loss of information, but we may obtain a multi-fidelity posterior which contains the optimal design z∗ (MAP)
  • The approach is applicable to any problem of Bayesian inference

Outlook:

  • introduction of multiple predictors A(p)
  • adaptive enrichment of the training data D
  • a more flexible approach to learn p (a|A)


SLIDE 33

References

Hoffman, M. D., Blei, D. M., Wang, C., and Paisley, J. (2013). Stochastic variational inference. J. Mach. Learn. Res., 14(1):1303–1347.

Mueller, P. (2005). Simulation based optimal design. In Dey, D. and Rao, C., editors, Bayesian Thinking: Modeling and Computation, volume 25 of Handbook of Statistics, pages 509–518. Elsevier.

Ng, L. W. and Willcox, K. E. (2014). Multifidelity approaches for optimization under uncertainty. International Journal for Numerical Methods in Engineering, 100(10):746–772.

Perdikaris, P., Venturi, D., Royset, J., and Karniadakis, G. (2015). Multi-fidelity modelling via recursive co-kriging and Gaussian–Markov random fields. Proc. R. Soc. A, 471:20150018. The Royal Society.

Ranganath, R., Gerrish, S., and Blei, D. M. (2013). Black box variational inference. AISTATS, 33.

SLIDE 34

Addendum (1): Generating the training data D = {a (yn), A (yn)}, n = 1, ..., N

Batch: generate D such that equal numbers of the A (yn) fall in each level set

Ml = { y : πc^(l) ≤ πc (y) ≤ πc^(l+1) },   l = 1, ..., L,   with πc^(l+1) = const · πc^(l),

where πc is the posterior defined by the low-fidelity solver.

Adaptive Refinement: use π (y|D) as an acquisition function, favoring regions of large predictive mean and epistemic uncertainty.

Figure 4: Iso-probability lines πc^(1), ..., πc^(4) of the coarse posterior πc (y)
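The batch scheme above can be sketched as stratified subsampling of a candidate pool: levels πc^(l+1) = const · πc^(l) are equally spaced in log πc, so the strata are equal-width bins of the coarse log-posterior. The coarse posterior, pool size, and bin count below are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def coarse_log_post(y):
    # hypothetical coarse (low-fidelity) posterior pi_c; a toy Gaussian here
    return -0.5 * np.sum(y**2, axis=-1)

# Draw a candidate pool and stratify it by coarse posterior level sets:
# geometric levels in pi_c are equally spaced bins in log pi_c.
candidates = rng.standard_normal((10_000, 2))
log_pc = coarse_log_post(candidates)

L = 4                                            # number of level sets M_l
edges = np.linspace(log_pc.min(), log_pc.max(), L + 1)
bins = np.clip(np.digitize(log_pc, edges[1:-1]), 0, L - 1)
counts = np.bincount(bins, minlength=L)

# Select an equal number of training inputs y_n from every level set
# (limited by the sparsest stratum in the candidate pool).
per_bin = min(25, int(counts.min()))
train_idx = np.concatenate(
    [rng.choice(np.flatnonzero(bins == l), per_bin, replace=False) for l in range(L)]
)
D_inputs = candidates[train_idx]  # run both solvers here to obtain {a_n, A_n}
```

Stratifying by coarse posterior level rather than sampling the prior directly guarantees that the regression data D also cover the low-probability tails where the coarse and fine solvers may disagree most.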

SLIDE 35

Addendum (2): Is it possible to exclude the optimal design z∗ (MAP)?

This approach will, if executed correctly, never put zero probability mass on the MAP z∗ or any other value deemed probable under the high-fidelity posterior. Potential sources of error:

  • the generated D does not sufficiently encapsulate p (a|A)
  • the regression model is not flexible enough to learn p (a|A, D) correctly
  • the approximation of the intractable posterior πA (y|D) using e.g. VB, MCMC, or SMC