

SLIDE 1

Data-driven time parallelism and model reduction

Kevin Carlberg1, Lukas Brencher2, Bernard Haasdonk2, Andrea Barth2

Sandia National Laboratories1 University of Stuttgart2

SIAM Conference on UQ April 7, 2016

Data-driven time parallelism Carlberg, Brencher, Haasdonk, Barth 1 / 27

SLIDE 2

Model reduction and UQ at Sandia

CFD model: 100 million cells, 200,000 time steps

High simulation costs: 6 weeks on 5000 cores; 6 runs maxes out Cielo

Barrier to 'in the field' Bayesian inference and fast-turnaround stochastic optimization

SLIDE 3

Cavity-flow problem

Unsteady Navier–Stokes; DES turbulence model; 1.2 million degrees of freedom; Re = 6.3 × 10^6; M_∞ = 0.6; CFD code: AERO-F [Farhat et al., 2003]

SLIDE 4

GNAT model [C et al., 2011, C et al., 2013]

Sample mesh: 4.1% of nodes, 3.0% of cells

+ Small problem size: can run on many fewer cores

SLIDE 5

GNAT performance

Vorticity and pressure fields: GNAT ROM vs. FOM

FOM: 5 hours × 48 CPUs. GNAT ROM: 32 min × 2 CPUs.

+ 229× CPU-hour savings: good for many-query analyses

  • 9.4× walltime savings: bad for real time

Why?

SLIDE 6

GNAT: strong scaling (Ahmed body) [C, 2011]

[Plots vs. number of CPUs: (a) CPU-hour savings, (CPU × T_FOM)/(CPU × T_ROM); (b) walltime savings, T_FOM/T_ROM]

+ Significant CPU-hour savings (max: 438 for 4 CPUs)

  • Modest walltime savings (max: 7 for 12 CPUs)

Spatial parallelism is quickly saturated!

SLIDE 7

Time-parallel algorithms [Lions et al., 2001a, Farhat and Chandesris, 2003]

Goal: expose more parallelism to reduce walltime

[Diagram: time domain partitioned into intervals [T_0, T_1], ..., [T_{M̄−1}, T_{M̄}]; coarse step H across intervals, fine step h within, fine times t_0, t_1, t_2, ..., t_M]

Fine propagator F(x; τ_1, τ_2): time step h
Coarse propagator G(x; τ_1, τ_2): time step H

Parareal iteration k (sequential and parallel steps):

x^{m+1}_{k+1} = G(x^m_{k+1}; T_m, T_{m+1}) + F(x^m_k; T_m, T_{m+1}) − G(x^m_k; T_m, T_{m+1})
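A minimal sketch of this update loop on a toy scalar ODE, with explicit-Euler propagators; this is our illustration, not the talk's implementation:

```python
import numpy as np

def parareal(x0, T, n_intervals, fine, coarse, n_iter):
    """Minimal parareal sketch for a scalar ODE (illustration only).

    fine(x, t0, t1) / coarse(x, t0, t1) propagate the state x from t0 to t1.
    Returns the states at the interval boundaries T_0, ..., T_Mbar.
    """
    Tm = np.linspace(0.0, T, n_intervals + 1)
    x = np.empty(n_intervals + 1)
    x[0] = x0
    for m in range(n_intervals):            # sequential coarse seed
        x[m + 1] = coarse(x[m], Tm[m], Tm[m + 1])
    for _ in range(n_iter):
        # the fine sweeps over all intervals are independent (the parallel step)
        F_k = [fine(x[m], Tm[m], Tm[m + 1]) for m in range(n_intervals)]
        G_k = [coarse(x[m], Tm[m], Tm[m + 1]) for m in range(n_intervals)]
        x_new = x.copy()
        for m in range(n_intervals):        # sequential correction sweep
            x_new[m + 1] = (coarse(x_new[m], Tm[m], Tm[m + 1])
                            + F_k[m] - G_k[m])
        x = x_new
    return x

def euler(x, t0, t1, n):
    """n explicit-Euler steps for dx/dt = -x (toy propagator)."""
    h = (t1 - t0) / n
    for _ in range(n):
        x = x + h * (-x)
    return x

# fine: 100 small steps per interval; coarse: a single large step
x_par = parareal(1.0, 2.0, 8,
                 lambda x, a, b: euler(x, a, b, 100),
                 lambda x, a, b: euler(x, a, b, 1), n_iter=4)
```

After M̄ iterations the iterate reproduces the serial fine solution exactly, which is the usual parareal convergence property.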

Interpretations [Gander and Vandewalle, 2007, Falgout et al., 2014]:

Deferred/residual-correction scheme B(x_{k+1}) = B(x_k) − A(x_k)
Multiple shooting method with finite-difference Jacobian approximation
Two-level multigrid

SLIDE 8

Parareal: sequential and parallel steps [Lions et al., 2001a]

[Plots: state variable vs. time step over one parareal pass]

1. Coarse seed: x^{m+1}_0 = G(x^m_0; T_m, T_{m+1})
2. Parallel fine sweep: F(x^m_0; T_m, T_{m+1})
3. Correction: x^{m+1}_1 = F(x^m_0; T_m, T_{m+1}) + G(x^m_1; T_m, T_{m+1}) − G(x^m_0; T_m, T_{m+1})
4. Fine sweep on corrected states: F(x^m_1; T_m, T_{m+1})

SLIDE 9

Coarse propagator

Critical: the coarse propagator should be fast, accurate, and stable.

Existing coarse propagators:
Same integrator [Lions et al., 2001b, Bal and Maday, 2002]
Coarse spatial discretization [Fischer et al., 2005, Farhat et al., 2006, Cortial and Farhat, 2009]
Simplified physics model [Baffico et al., 2002, Maday and Turinici, 2003, Blouza et al., 2011, Engblom, 2009, Maday, 2007]
Relaxed solver tolerance [Guibert and Tromeur-Dervout, 2007]
Reduced-order model (on the fly) [Farhat et al., 2006, Cortial and Farhat, 2009, Ruprecht and Krause, 2012, Chen et al., 2014]

ROM context: can we leverage offline data to improve the coarse propagator?

SLIDE 10

Model reduction

Full-order model (FOM): ẋ(t, µ) = f(x; t, µ), x(0, µ) = x_0(µ)

Offline: snapshot collection X^i := [x(0, µ_i) ⋯ x(t_M, µ_i)] ∈ R^{N×M}, then SVD [X^1 ⋯ X^{n_train}] = U Σ V^T

Online: projection

Trial subspace: Φ = [u_1 ⋯ u_{N̂}] ∈ R^{N×N̂}, with x ≈ x̃(t, µ) = Φ x̂(t, µ)
Test subspace: Ψ ∈ R^{N×N̂}
  Ψ = Φ: Galerkin
  Ψ = (α_0 I − Δt β_0 ∂f/∂x) Φ: LSPG [C et al., 2015a]

ROM: ẋ̂(t, µ) = (Ψ^T Φ)^{−1} Ψ^T f(Φ x̂; t, µ), x̂(0, µ) = Φ^T x_0(µ)
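To make the offline/online split concrete, here is a small POD plus Galerkin projection sketch (Ψ = Φ) on a linear stand-in FOM; the test system, sizes, and names are our choices, not the talk's code:

```python
import numpy as np

rng = np.random.default_rng(0)
N, Nhat, M, dt = 50, 3, 40, 0.01

# Stand-in linear FOM xdot = A x and its snapshot collection (offline stage)
A = -np.eye(N) + 0.01 * rng.standard_normal((N, N))
x0 = rng.standard_normal(N)
X = np.empty((N, M))
x = x0.copy()
for m in range(M):
    x = x + dt * (A @ x)          # explicit-Euler FOM step
    X[:, m] = x

# POD: left singular vectors of the snapshot matrix give the trial basis Phi
U, S, Vt = np.linalg.svd(X, full_matrices=False)
Phi = U[:, :Nhat]

# Galerkin ROM (Psi = Phi): xhat_dot = (Phi^T A Phi) xhat, xhat(0) = Phi^T x0
Ar = Phi.T @ A @ Phi
xhat = Phi.T @ x0
for m in range(M):
    xhat = xhat + dt * (Ar @ xhat)
x_rom = Phi @ xhat                # approximate full state at the final time
err = np.linalg.norm(x_rom - x) / np.linalg.norm(x)
```

Because the POD basis is built from this trajectory's own snapshots, a very small N̂ already reconstructs the final state accurately.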

SLIDE 11

Revisit the SVD

[Diagram: [X^1 X^2 X^3] = U Σ V^T, with the first row of V^T plotted against time step]

The j-th row of V^T contains a basis for the time evolution of x̂_j. Construct Ξ_j, a global time-evolution basis for x̂_j:

Ξ_j := [ξ^1_j ⋯ ξ^{n_train}_j],  ξ^i_j := [v_{M(i−1)+1, j} ⋯ v_{Mi, j}]^T
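The slicing of V^T into time-evolution bases can be sketched directly; the snapshot matrices here are synthetic and the variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, ntrain = 30, 25, 4

# Synthetic training trajectories (stand-ins for FOM solves at mu_1..mu_ntrain)
snapshots = [np.cumsum(rng.standard_normal((N, M)), axis=1) for _ in range(ntrain)]
Xall = np.hstack(snapshots)                # N x (M * ntrain) stacked snapshots
U, S, Vt = np.linalg.svd(Xall, full_matrices=False)

def time_evolution_basis(Vt, j, M, ntrain):
    """Xi_j with columns xi^i_j = [v_{M(i-1)+1,j} ... v_{Mi,j}]^T, i = 1..ntrain."""
    row = Vt[j, :]                         # j-th row of V^T
    return np.stack([row[M * i : M * (i + 1)] for i in range(ntrain)], axis=1)

Xi0 = time_evolution_basis(Vt, 0, M, ntrain)   # basis for coordinate xhat_1
```

Since Xall = U Σ V^T, the time history of generalized coordinate j over training run i is exactly σ_j times column i of Ξ_j, which is why these rows serve as time-evolution bases.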

SLIDE 12

First attempt [C et al., 2015b]

1. Compute global forecast by gappy POD in the time domain:

[Plot: x̂_1 computed so far; memory α = 4; forecast; temporal basis]

z_j = arg min_{z ∈ R^{n_train}} ‖Z(m − 1) Ξ_j z − Z(m − 1) g(x̂_j)‖²

Time sampling: Z(k) := [e_{k−β} ⋯ e_k]^T
Time unrolling: g(x̂_j) : x̂_j ↦ [x̂_j(t_0) ⋯ x̂_j(t_M)]^T

2. Use e^T_m Ξ_j z_j as initial guess for x̂_j(t_m) in the Newton solver
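A minimal sketch of the forecast itself, under our reading of the gappy-POD least-squares problem above; the basis and signal are synthetic, and we sample the last α computed steps:

```python
import numpy as np

rng = np.random.default_rng(2)
M, alpha = 50, 4
t = np.linspace(0.0, 1.0, M)

# Training time histories (columns of Xi); the test signal lies in their span
Xi = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t], axis=1)
z_true = np.array([0.7, -0.3, 1.2])
signal = Xi @ z_true

def forecast(Xi, known_values, m, target):
    """Fit z on the memory window ending at step m, then predict step `target`."""
    alpha = len(known_values)
    rows = np.arange(m - alpha + 1, m + 1)    # the sampling matrix Z picks these rows
    z, *_ = np.linalg.lstsq(Xi[rows, :], known_values, rcond=None)
    return Xi[target, :] @ z

m = 9
pred = forecast(Xi, signal[m - alpha + 1 : m + 1], m, target=M - 1)
```

When the unrolled trajectory lies in range(Ξ_j), a few samples suffice to recover the coefficients and the forecast is exact, which foreshadows the ideal-conditions theorem later in the talk.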

SLIDE 13

First attempt: structural dynamics [C et al., 2015b]

[Plots vs. memory α: speedup improvement and Newton-iteration reduction]

+ Newton iterations reduced by up to ~2×
+ Speedup improved by up to ~1.5×
+ No accuracy loss
+ Applicable to any nonlinear ROM

  • Insufficient for real-time computation

Can we apply the same idea for the coarse propagator?

SLIDE 14

Coarse propagator via local forecasting

Offline: construct local time-evolution bases Ξ^m_j (pictured: Ξ^1_1, ..., Ξ^5_1, one per time interval for coordinate x̂_1)

Online: coarse propagator G^m_j defined via forecasting:

1. Compute α time steps with the fine propagator
2. Compute the local forecast via gappy POD
3. Select the last time step of the local forecast

G^m_j : (x̂_j; T_m, T_{m+1}) ↦ e^T_{H/h} Ξ^m_j (Z(α + 1) Ξ^m_j)^+ [F(x̂_j; T_m, T_m + h) ⋯ F(x̂_j; T_m, T_m + hα)]^T
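A toy sketch of this coarse propagator, under our reading of the formula above: take α fine steps, least-squares fit the local basis to them, and read off the basis prediction at the interval's final fine step. The linear scalar dynamics, explicit-Euler fine propagator, and all names are our assumptions:

```python
import numpy as np

H_over_h = 20     # fine steps per coarse interval
alpha = 3         # fine steps actually computed by the coarse propagator
h = 0.05

def fine_steps(x, n):
    """n explicit-Euler steps of xdot = -x; returns the n intermediate states."""
    out = []
    for _ in range(n):
        x = x * (1.0 - h)
        out.append(x)
    return np.array(out)

# Local basis: time histories of training runs over the same interval
Xi = np.stack([fine_steps(x0, H_over_h) for x0 in (0.5, 2.0)], axis=1)

def coarse_forecast(x, Xi, alpha):
    vals = fine_steps(x, alpha)                  # F(x; T_m, T_m + i*h), i = 1..alpha
    z, *_ = np.linalg.lstsq(Xi[:alpha, :], vals, rcond=None)  # gappy-POD fit
    return Xi[-1, :] @ z                         # forecast at T_m + H

pred = coarse_forecast(1.3, Xi, alpha)
```

Here the online trajectory lies in the span of the training histories, so the forecast matches the full fine propagation over the interval.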

SLIDE 15

Initial seed

x^{m+1}_{k+1} = G(x^m_{k+1}; T_m, T_{m+1}) + F(x^m_k; T_m, T_{m+1}) − G(x^m_k; T_m, T_{m+1})

How should the initial seed x^m_0, m = 0, ..., M̄, be computed?

1. Typical time integrator
2. Local forecast
3. Global forecast

SLIDE 16

Ideal-conditions speedup

Theorem. If g(x̂_j) ∈ range(Ξ_j), j = 1, ..., N̂, then the proposed method converges in one parareal iteration and realizes a theoretical speedup of M̄ / (1 + M̄(M̄ − 1)α/M).

[Plot: ideal-conditions speedup vs. number of processors M̄ for M = 5000, α ∈ {1, 2, 4, 8, 12}]
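The speedup expression can be evaluated directly; this is our reading of the garbled formula, derived from a cost count of M fine steps serially versus M/M̄ parallel fine steps plus (M̄ − 1) sequential coarse forecasts of α fine steps each:

```python
def ideal_speedup(Mbar, alpha, M=5000):
    """S = M / (M/Mbar + (Mbar - 1)*alpha) = Mbar / (1 + Mbar*(Mbar - 1)*alpha/M),
    the one-iteration-convergence speedup under our reading of the theorem."""
    return Mbar / (1.0 + Mbar * (Mbar - 1) * alpha / M)

# Speedup curve for the slide's setting M = 5000, alpha = 4
curve = [ideal_speedup(Mbar, alpha=4) for Mbar in range(5, 36, 5)]
```

The curve grows with M̄ until the sequential coarse work (which scales like M̄²α/M) catches up, matching the diminishing returns visible in the slide's plot.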

SLIDE 17

Ideal-conditions speedup with initial guesses

Corollary. If f is nonlinear, g(x̂_j) ∈ range(Ξ_j), j = 1, ..., N̂, and the forecasting method also provides Newton-solver initial guesses, then

1. the method converges in one parareal iteration, and
2. only α nonlinear systems of algebraic equations are solved in each time interval.

The method then realizes a theoretical speedup of M / (M̄α + (M/M̄ − α)τ_r) relative to the sequential algorithm without forecasting. Here, τ_r = (residual computation time) / (nonlinear-system solution time).

SLIDE 18

Ideal-conditions speedup with initial guesses

[Plot: speedup vs. number of processors M̄ for M = 5000, τ_r = 1/10, α ∈ {1, 2, 4, 8, 12}]

Significant speedups possible by leveraging time-domain data!

SLIDE 19

Stability

Theorem. If the fine propagator is stable, i.e.,

‖F(x; τ, τ + H)‖ ≤ (1 + C_F H)‖x‖, ∀ 0 ≤ τ ≤ τ + H,

then the proposed method is also stable, i.e., ‖x̂^m_{k+1}‖ ≤ C_m exp(C_F m H)‖x̂^0‖, where

C_m := Σ_{k=1}^m (m choose k) β_k γ^m α^k (H/h)^{m−k}
β_k := exp(−C_F k(H − hα)) ≤ 1
γ := max(max_{m,j} 1/‖Z(α + 1) Ξ^m_j‖, max_{m,j} 1/σ_min(Z(α + 1) Ξ^m_j))

SLIDE 20

Example: inviscid Burgers equation [Rewienski, 2003]

∂u(x, τ)/∂τ + (1/2) ∂(u²(x, τ))/∂x = 0.02 e^{µ₂ x}
u(0, τ) = µ₁, ∀τ ∈ [0, 25]
u(x, 0) = 1, ∀x ∈ [0, 100]

Discretization: Godunov's scheme
(µ₁, µ₂) ∈ [2.5, 3.5] × [0.02, 0.03]
h = 0.1, M = 250 fine time steps
FOM: N = 500 degrees of freedom
ROM: LSPG [C et al., 2011], POD basis dimension N̂ = 100
n_train = 4 training points (LHS sampling); random online point
2 coarse propagators: backward Euler and local forecast
3 initial seeds: backward Euler, local forecast, global forecast
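For concreteness, one explicit Godunov update for this problem might look as follows; the boundary treatment and names are our assumptions, not the authors' implementation:

```python
import numpy as np

def godunov_flux(ul, ur):
    """Exact Godunov numerical flux for the convex flux f(u) = u^2 / 2."""
    if ul <= ur:                          # rarefaction: minimize f over [ul, ur]
        if ul > 0.0:
            return 0.5 * ul * ul
        if ur < 0.0:
            return 0.5 * ur * ur
        return 0.0                        # sonic point u = 0 lies inside the fan
    return 0.5 * max(ul * ul, ur * ur)    # shock: maximize f over [ur, ul]

def step(u, dt, dx, mu1, mu2, x):
    """One explicit Godunov step with inflow u(0) = mu1 and source 0.02*exp(mu2*x).

    Stability requires the CFL condition dt <= dx / max|u|.
    """
    uext = np.concatenate(([mu1], u, [u[-1]]))   # ghost cells: inflow / outflow
    F = np.array([godunov_flux(uext[j], uext[j + 1]) for j in range(len(u) + 1)])
    return u - dt / dx * (F[1:] - F[:-1]) + dt * 0.02 * np.exp(mu2 * x)
```

With a spatially constant state matching the inflow, the flux differences vanish and only the source term acts, which is a convenient sanity check.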

SLIDE 21

Global temporal bases

[Plots: basis-vector value vs. time step for the global temporal bases; panels (a) coordinate 1, (b) coordinate 5, (c) coordinate 10, (d) coordinate 100]

Higher-index generalized coordinates not ‘forecastable’

SLIDE 22

Forecasting ‘high-frequency’ coordinates is dangerous

[Plot: time-parallel error vs. time-parallel iteration when forecasting the first 1, 5, 10, 15, 20, or 100 coordinates]

Proceed by forecasting the first 10 coordinates

SLIDE 23

Comparison: Initial seed and coarse propagator

[Plot: time-parallel error vs. time-parallel iteration for each combination of seed (backward Euler, local forecast, global forecast) and propagator (backward Euler, local forecast)]

Initial seed:

+ best performance: global forecast

  • worst performance: local forecast (error accumulation)

Coarse propagator:

+ local forecast outperforms backward Euler

Forecasting improves both the initial seed and the coarse propagator!

SLIDE 24

[Plots: state variable 100 vs. time step. Panels: (e) Seed: Euler, Prop: Euler; (f) Seed: Euler, Prop: local forecast; (g) Seed: global forecast, Prop: Euler; (h) Seed: global forecast, Prop: local forecast]

SLIDE 25

Parareal performance

[Plot: number of parareal iterations vs. number of processors for backward Euler, forecasting, and the worst case]

+ Forecasting: minimum possible iterations

  • Backward Euler: often close to worst-case performance

SLIDE 26

Conclusions

Use temporal data to reduce ROM simulation time

Offline: time-evolution bases from right singular vectors
Online:
1. global forecast as initial seed
2. local forecast as coarse propagator

+ theory: excellent speedup and stability
+ ideal parareal performance observed
+ significant improvement over backward Euler
+ no additional error introduced

References:
K. Carlberg, L. Brencher, B. Haasdonk, and A. Barth. "Data-driven time parallelism with application to reduced-order models," in preparation.
K. Carlberg, J. Ray, and B. van Bloemen Waanders. "Decreasing the temporal complexity for nonlinear, implicit reduced-order models by forecasting," CMAME, Vol. 289, pp. 79–103 (2015).

SLIDE 27

Questions?

[Recap figures: local time-evolution bases Ξ^1_1, ..., Ξ^5_1 over the time domain; state variable 100 vs. time step for backward Euler and for forecasting]

SLIDE 28

Acknowledgments

This research was supported in part by an appointment to the Sandia National Laboratories Truman Fellowship in National Security Science and Engineering, sponsored by Sandia Corporation (a wholly owned subsidiary of Lockheed Martin Corporation) as Operator of Sandia National Laboratories under its U.S. Department of Energy Contract No. DE-AC04-94AL85000.


SLIDES 29–35

References

Baffico, L., Bernard, S., Maday, Y., Turinici, G., and Zérah, G. (2002). Parallel-in-time molecular-dynamics simulations. Physical Review E, 66(5):057701.

Bal, G. and Maday, Y. (2002). A "parareal" time discretization for non-linear PDEs with application to the pricing of an American put. In Recent Developments in Domain Decomposition Methods, pages 189–202. Springer Berlin Heidelberg.

Blouza, A., Boudin, L., and Kaber, S. M. (2011). Parallel in time algorithms with reduction methods for solving chemical kinetics. Communications in Applied Mathematics and Computational Science, 5(2):241–263.

Carlberg, K. (2011). Model Reduction of Nonlinear Mechanical Systems via Optimal Projection and Tensor Approximation. PhD thesis, Stanford University.

Carlberg, K., Barone, M., and Antil, H. (2015a). Galerkin v. least-squares Petrov–Galerkin projection in nonlinear model reduction. arXiv e-print, 1504.03749.

Carlberg, K., Bou-Mosleh, C., and Farhat, C. (2011). Efficient non-linear model reduction via a least-squares Petrov–Galerkin projection and compressive tensor approximations. International Journal for Numerical Methods in Engineering, 86(2):155–181.

Carlberg, K., Farhat, C., Cortial, J., and Amsallem, D. (2013). The GNAT method for nonlinear model reduction: effective implementation and application to computational fluid dynamics and turbulent flows. Journal of Computational Physics, 242:623–647.

Carlberg, K., Ray, J., and van Bloemen Waanders, B. (2015b). Decreasing the temporal complexity for nonlinear, implicit reduced-order models by forecasting. Computer Methods in Applied Mechanics and Engineering, 289:79–103.

Chen, F., Hesthaven, J. S., and Zhu, X. (2014). On the use of reduced basis methods to accelerate and stabilize the parareal method. In Reduced Order Methods for Modeling and Computational Reduction, pages 187–214. Springer.

Cortial, J. and Farhat, C. (2009). A time-parallel implicit method for accelerating the solution of non-linear structural dynamics problems. International Journal for Numerical Methods in Engineering, 77(4):451.

Engblom, S. (2009). Parallel in time simulation of multiscale stochastic chemical kinetics. Multiscale Modeling & Simulation, 8(1):46–68.

Falgout, R. D., Friedhoff, S., Kolev, T. V., MacLachlan, S. P., and Schroder, J. B. (2014). Parallel time integration with multigrid. SIAM Journal on Scientific Computing, 36(6):C635–C661.

Farhat, C. and Chandesris, M. (2003). Time-decomposed parallel time-integrators: theory and feasibility studies for fluid, structure, and fluid–structure applications. International Journal for Numerical Methods in Engineering, 58(9):1397–1434.

Farhat, C., Cortial, J., Dastillung, C., and Bavestrello, H. (2006). Time-parallel implicit integrators for the near-real-time prediction of linear structural dynamic responses. International Journal for Numerical Methods in Engineering, 67:697–724.

Farhat, C., Geuzaine, P., and Brown, G. (2003). Application of a three-field nonlinear fluid–structure formulation to the prediction of the aeroelastic parameters of an F-16 fighter. Computers & Fluids, 32(1):3–29.

Fischer, P. F., Hecht, F., and Maday, Y. (2005). A parareal in time semi-implicit approximation of the Navier–Stokes equations. In Domain Decomposition Methods in Science and Engineering, pages 433–440. Springer.

Gander, M. and Vandewalle, S. (2007). Analysis of the parareal time-parallel time-integration method. SIAM Journal on Scientific Computing, 29(2):556–578.

Guibert, D. and Tromeur-Dervout, D. (2007). Adaptive parareal for systems of ODEs. In Domain Decomposition Methods in Science and Engineering XVI, pages 587–594. Springer.

Lions, J.-L., Maday, Y., and Turinici, G. (2001a). A "parareal" in time discretization of PDEs. Comptes Rendus de l'Académie des Sciences, Series I: Mathematics, 332(7):661–668.

Lions, J.-L., Maday, Y., and Turinici, G. (2001b). Résolution d'EDP par un schéma en temps « pararéel » [Solving PDEs with a "parareal" time scheme]. Comptes Rendus de l'Académie des Sciences, Series I: Mathematics, 332(7):661–668.

Maday, Y. (2007). Parareal in time algorithm for kinetic systems based on model reduction. High-Dimensional Partial Differential Equations in Science and Engineering, 41:183–194.

Maday, Y. and Turinici, G. (2003). Parallel in time algorithms for quantum control: parareal time discretization scheme. International Journal of Quantum Chemistry, 93(3):223–228.

Rewienski, M. J. (2003). A Trajectory Piecewise-Linear Approach to Model Order Reduction of Nonlinear Dynamical Systems. PhD thesis, Massachusetts Institute of Technology.

Ruprecht, D. and Krause, R. (2012). Explicit parallel-in-time integration of a linear acoustic-advection system. Computers & Fluids, 59:72–83.