

SLIDE 1

Stochastic solution of large least squares systems in variational data assimilation

Parallel 4DVAR without tangents and adjoints

  • J. Mandel, S. Gratton, and E. Bergou

University of Colorado Denver; Institute of Computer Science, Czech Academy of Sciences; INP-ENSEEIHT and CERFACS, Toulouse, France

Supported by Fondation STAE project ADTAO, NSF grants AGS-0835579 and DMS-1216481, and Czech Grant Agency project 13-34856S

Preconditioned Iterative Methods, Prague 2013

SLIDE 2

Data assimilation

◮ Standard computing: data in, results out
◮ Not much changed in principle since we were taking punched cards and magnetic tapes to the mainframe
◮ Data assimilation: combine data with a running model
◮ Cast as Bayesian update or least squares
◮ Used already in Apollo 11 navigation, now in GPS; standard in numerical weather prediction

SLIDE 3

Weak Constraint 4DVAR

◮ We want to determine x0, ..., xk (xi = state at time i) approximately from the model and the observations (data):

$$x_0 \approx x_b \qquad \text{state at time 0} \approx \text{the background}$$
$$x_i \approx M_i(x_{i-1}) \qquad \text{state evolution} \approx \text{by the model}$$
$$H_i(x_i) \approx y_i \qquad \text{value of the observation operator} \approx \text{the data}$$

◮ Quantify "≈" by covariances:

$$x_0 \approx x_b \iff \|x_0 - x_b\|_{B^{-1}}^2 = (x_0 - x_b)^T B^{-1} (x_0 - x_b) \approx 0, \quad \text{etc.}$$

◮ ⇒ the nonlinear least squares problem

$$\|x_0 - x_b\|_{B^{-1}}^2 + \sum_{i=1}^{k} \|x_i - M_i(x_{i-1})\|_{Q_i^{-1}}^2 + \sum_{i=1}^{k} \|y_i - H_i(x_i)\|_{R_i^{-1}}^2 \to \min_{x_{0:k}}$$
SLIDE 4

Incremental 4DVAR

◮ Linearization at [x0, ..., xk] = x0:k:

$$M_i(x_{i-1} + z_{i-1}) \approx M_i(x_{i-1}) + M_i'(x_{i-1})\, z_{i-1}$$
$$H_i(x_i + z_i) \approx H_i(x_i) + H_i'(x_i)\, z_i$$

◮ The Gauss-Newton method (Bell, 1994; Courtier et al., 1994) consists of the iterations x0:k ← x0:k + z0:k, with the increments z0:k solving the linear least squares problem

$$\|x_0 + z_0 - x_b\|_{B^{-1}}^2 + \sum_{i=1}^{k} \|x_i + z_i - M_i(x_{i-1}) - M_i'(x_{i-1})\, z_{i-1}\|_{Q_i^{-1}}^2 + \sum_{i=1}^{k} \|y_i - H_i(x_i) - H_i'(x_i)\, z_i\|_{R_i^{-1}}^2 \to \min_{z_{0:k}}$$
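A generic Gauss-Newton outer loop in this spirit might look as follows; this is a dense sketch under the assumption that the stacked residual vector and its Jacobian fit in memory, with `residual` and `jacobian` as hypothetical user-supplied callables.

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, iters=10):
    """Gauss-Newton for min ||r(x)||^2, where r(x) stacks all residuals of
    the 4DVAR cost, pre-scaled by the inverse square roots of B, Q_i, R_i."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual(x)                             # r(x), stacked
        J = jacobian(x)                             # Jacobian r'(x)
        z, *_ = np.linalg.lstsq(J, -r, rcond=None)  # min ||r + J z||^2
        x = x + z                                   # x_{0:k} <- x_{0:k} + z_{0:k}
    return x
```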

SLIDE 5

Linearized 4DVAR as Kalman smoother

◮ Write the linear least squares problem for the increments z0:k as

$$\|z_0 - z_b\|_{B^{-1}}^2 + \sum_{i=1}^{k} \|z_i - M_i z_{i-1} - m_i\|_{Q_i^{-1}}^2 + \sum_{i=1}^{k} \|d_i - H_i z_i\|_{R_i^{-1}}^2 \to \min_{z_{0:k}}$$

◮ Theorem. The least squares solution z0:k is the mean of the random vectors Z0:k satisfying the stochastic equations

$$Z_0 = z_b + V_0, \quad V_0 \sim N(0, B)$$
$$Z_i = M_i Z_{i-1} + m_i + V_i, \quad V_i \sim N(0, Q_i), \quad i = 1, \ldots, k$$
$$d_i = H_i Z_i + W_i, \quad W_i \sim N(0, R_i), \quad i = 1, \ldots, k$$

◮ This is the Kalman smoother (Rauch et al., 1965; Bell, 1994).
◮ Simulate by Monte Carlo ⇒ the Ensemble Kalman Smoother (EnKS) (Evensen and van Leeuwen, 2000).
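The theorem can be checked numerically in a scalar toy case (k = 1; all numbers below are made up). Prior samples of (Z0, Z1) are reweighted by the Gaussian likelihood of the data equation; this importance weighting is a verification device, not the EnKS of the following slides. The weighted mean should match the direct least squares solution.

```python
import numpy as np

rng = np.random.default_rng(0)
zb, B, M, m, Q, H, R, d = 1.0, 2.0, 0.9, 0.1, 0.5, 1.0, 0.3, 3.0  # made-up scalars

# Direct solution: the rows of A z - b are the residuals scaled by 1/sqrt(cov)
A = np.array([[1 / np.sqrt(B), 0.0],
              [-M / np.sqrt(Q), 1 / np.sqrt(Q)],
              [0.0, -H / np.sqrt(R)]])
b = np.array([zb / np.sqrt(B), m / np.sqrt(Q), -d / np.sqrt(R)])
z_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

# Monte Carlo: sample the first two stochastic equations, then impose the
# data equation d = H Z_1 + W_1 by importance weights
N = 200_000
Z0 = zb + np.sqrt(B) * rng.standard_normal(N)
Z1 = M * Z0 + m + np.sqrt(Q) * rng.standard_normal(N)
w = np.exp(-(d - H * Z1) ** 2 / (2 * R))
z_mc = np.array([np.average(Z0, weights=w), np.average(Z1, weights=w)])

print(z_ls, z_mc)  # the two should agree up to sampling error
```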

SLIDE 6

Kalman smoother

◮ To solve the stochastic Kalman smoother equations (Rauch et al., 1965; Bell, 1994):

$$Z_0 = z_b + V_0, \quad V_0 \sim N(0, B)$$
$$Z_i = M_i Z_{i-1} + m_i + V_i, \quad V_i \sim N(0, Q_i), \quad i = 1, \ldots, k$$
$$d_i = H_i Z_i + W_i, \quad W_i \sim N(0, R_i), \quad i = 1, \ldots, k$$

◮ Build the solution recursively: Zi|j incorporates the data d1:j only; we need Zi = Zi|k.
◮ The Kalman filter is the recursion Zi−1|i−1 → Zi|i.
◮ The Kalman smoother applies the same update to previous times to get Z0:i|i−1 → Z0:i|i.
◮ Eventually, we obtain Z0:k as Z0:k|k.

SLIDE 7

Kalman filter and smoother recursions

◮ Recall the notation: Z_{step number | data incorporated up to}
◮ The Kalman filter incorporates the data di in the latest state: Zi|i−1 → Zi|i upon data di:

$$Z_{0|0} \xrightarrow{M_1} Z_{1|0} \xrightarrow{\text{data } d_1} Z_{1|1} \xrightarrow{M_2} Z_{2|1} \xrightarrow{\text{data } d_2} Z_{2|2}$$

◮ The Kalman smoother incorporates the data di also in all previous states: Z0:i|i−1 → Z0:i|i upon data di:

$$Z_{0|0} \xrightarrow{M_1} \begin{bmatrix} Z_{0|0} \\ Z_{1|0} \end{bmatrix} \xrightarrow{\text{data } d_1} \begin{bmatrix} Z_{0|1} \\ Z_{1|1} \end{bmatrix} \xrightarrow{M_2} \begin{bmatrix} Z_{0|1} \\ Z_{1|1} \\ Z_{2|1} \end{bmatrix} \xrightarrow{\text{data } d_2} \begin{bmatrix} Z_{0|2} \\ Z_{1|2} \\ Z_{2|2} \end{bmatrix}$$

SLIDE 8

Ensemble Kalman filter (EnKF)

Represent the random vector Zi|j by an ensemble of realizations Z^N_{i|j} = [z^1_{i|j}, ..., z^N_{i|j}].

1. Initialize by sampling the background distribution:

$$z_{0|0}^{\ell} \sim N(z_b, B), \quad \ell = 1, \ldots, N.$$

2. For i = 1, ..., k:

2.1 Advance in time and randomize by sampling the model error:

$$z_{i|i-1}^{\ell} = M_i z_{i-1|i-1}^{\ell} + m_i + v_i^{\ell}, \quad v_i^{\ell} \sim N(0, Q_i)$$

2.2 Incorporate the data di:

$$z_{i|i}^{\ell} = z_{i|i-1}^{\ell} - P_i^N H_i^T \left(H_i P_i^N H_i^T + R_i\right)^{-1} \left(H_i z_{i|i-1}^{\ell} - d_i - w_i^{\ell}\right), \quad w_i^{\ell} \sim N(0, R_i),$$

where P_i^N is the sample covariance computed from the ensemble Z^N_{i|i−1}.

Approximate solution: the ensemble mean,

$$z_{i|i} \approx \frac{1}{N}\left(z_{i|i}^{1} + \cdots + z_{i|i}^{N}\right)$$
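A minimal dense sketch of the analysis step 2.2 (assuming state and data dimensions small enough that the sample covariance P^N and the inverse can be formed explicitly; the function name and signature are illustrative):

```python
import numpy as np

def enkf_analysis(Zf, Hmat, d, R, rng):
    """Perturbed-observations EnKF analysis (step 2.2).
    Zf: n x N forecast ensemble, one member per column; Hmat: m x n;
    d: data vector of length m; R: m x m data error covariance."""
    n, N = Zf.shape
    A = Zf - Zf.mean(axis=1, keepdims=True)          # ensemble anomalies
    P = A @ A.T / (N - 1)                            # sample covariance P^N
    W = rng.multivariate_normal(np.zeros(len(d)), R, size=N).T  # w^l ~ N(0, R)
    K = P @ Hmat.T @ np.linalg.inv(Hmat @ P @ Hmat.T + R)       # Kalman gain
    return Zf - K @ (Hmat @ Zf - d[:, None] - W)     # z - K (H z - d - w)
```

For large states one avoids forming P^N and works with the anomalies directly, which also yields the transform form of the next slide.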

SLIDE 9

Ensemble Kalman smoother (EnKS)

◮ Run the Ensemble Kalman filter (EnKF). It turns out that incorporating the data di replaces the ensemble by linear combinations of its members (a transformation by a matrix T^N_i):

$$Z_{i|i}^N = Z_{i|i-1}^N T_i^N, \quad T_i^N \in \mathbb{R}^{N \times N}.$$

◮ EnKS = EnKF + transform the history exactly the same way:

$$Z_{0:i|i}^N = Z_{0:i|i-1}^N T_i^N.$$

Approximate solution: the ensemble mean,

$$z_{i|k} \approx \frac{1}{N}\left(z_{i|k}^{1} + \cdots + z_{i|k}^{N}\right)$$
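One way to obtain the transform explicitly is to rewrite the perturbed-observations update in terms of the ensemble anomalies A = Zf J, with J the centering matrix, so that the analysis ensemble is Zf (I − JC). This sketch is consistent with the update on the previous slide, but is not necessarily the authors' implementation.

```python
import numpy as np

def enks_transform(Zf, Hmat, d, R, rng):
    """Return T (N x N) with Z_{i|i} = Z_{i|i-1} T for the
    perturbed-observations update of the previous slide."""
    n, N = Zf.shape
    HA = Hmat @ (Zf - Zf.mean(axis=1, keepdims=True))  # H times anomalies
    S = HA @ HA.T / (N - 1) + R                        # H P^N H^T + R
    W = rng.multivariate_normal(np.zeros(len(d)), R, size=N).T
    C = HA.T @ np.linalg.solve(S, Hmat @ Zf - d[:, None] - W) / (N - 1)
    J = np.eye(N) - np.ones((N, N)) / N                # centering: A = Zf J
    return np.eye(N) - J @ C                           # Z_a = Zf (I - J C)

# EnKS: apply the same T to every stored time level, e.g.
# history = [Z @ T for Z in history]   # history: list of n x N arrays
```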

SLIDE 10

Convergence of the EnKF and EnKS

Definition (Stochastic Lp norm). For a random vector X : Ω → R^n,

$$\|X\|_p = \left(E\,|X|^p\right)^{1/p}.$$

Theorem (Mandel et al. (2011); Le Gland et al. (2011)). z^ℓ_{i|i} → Z_{i|i} in all Lp, 1 ≤ p < ∞, as the ensemble size N → ∞.

Corollary. The EnKF ensemble mean converges,

$$\frac{1}{N}\left(z_{i|i}^{1} + \cdots + z_{i|i}^{N}\right) \to z_{i|i} = E\left(Z_{i|i}\right) \quad \text{in all } L^p,\ 1 \le p < \infty, \text{ as } N \to \infty.$$

Corollary. The EnKS ensemble mean converges to the solution of the linear least squares problem:

$$\frac{1}{N}\left(z_{i|k}^{1} + \cdots + z_{i|k}^{N}\right) \to z_i = z_{i|k} = E\left(Z_{i|k}\right) \quad \text{in all } L^p,\ 1 \le p < \infty, \text{ as } N \to \infty.$$

SLIDE 11

Model operator

The linearized model Mi = M′i(xi−1) occurs only in advancing the time, as its action on the ensemble of increments:

$$M_i\left(x_{i-1} + z_{i-1}^{\ell}\right) - x_i \approx M_i(x_{i-1}) + M_i'(x_{i-1})\, z_{i-1}^{\ell} - x_i = M_i z_{i-1}^{\ell} + m_i$$

Approximating by finite differences with a parameter τ > 0:

$$M_i z_{i-1}^{\ell} + m_i \approx \frac{M_i\left(x_{i-1} + \tau z_{i-1}^{\ell}\right) - M_i(x_{i-1})}{\tau} + M_i(x_{i-1}) - x_i$$

This needs N + 1 evaluations of Mi: at xi−1 and at xi−1 + τz^ℓ_{i−1}, ℓ = 1, ..., N. The approximation is accurate in the limit τ → 0. For τ = 1, we recover the nonlinear model:

$$M_i\left(x_{i-1} + z_{i-1}^{\ell}\right) - x_i$$
SLIDE 12

Observation operator

The observation matrix occurs only in its action on the ensemble, HiZ^N = [Hiz^1, ..., Hiz^N]. Approximating by finite differences with a parameter τ > 0:

$$H_i z_i^{\ell} \approx \frac{H_i\left(x_i + \tau z_i^{\ell}\right) - H_i(x_i)}{\tau}$$

This needs N + 1 evaluations of Hi: at xi and at xi + τz^ℓ_i, ℓ = 1, ..., N. The approximation is accurate in the limit τ → 0. For τ = 1, the method becomes exactly the EnKS run on the nonlinear problem ⇒ the EnKS is independent of the point of linearization ⇒ no convergence of the 4DVAR iterations. In our tests, τ = 0.1 seems to work well enough.
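This slide and the previous one use the same finite-difference trick, so one helper covers Mi and Hi alike; a sketch with hypothetical names (for the model, the affine part mi = Mi(xi−1) − xi is added separately, as on the previous slide):

```python
import numpy as np

def fd_ensemble_action(F, x, Z, tau=0.1):
    """Finite-difference action of the linearization F'(x) on each column
    of Z: (F(x + tau z) - F(x)) / tau -> F'(x) z as tau -> 0.
    F may be the model M_i or the observation operator H_i. Costs N + 1
    evaluations of F; the N member evaluations are independent, hence
    trivially parallel."""
    Fx = F(x)  # one evaluation at the linearization point
    return np.column_stack([(F(x + tau * Z[:, l]) - Fx) / tau
                            for l in range(Z.shape[1])])
```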

SLIDE 13

Tikhonov regularization and Levenberg-Marquardt method

◮ Gauss-Newton may not converge, even locally. Add a penalty (Tikhonov regularization) to control the size of the increments zi:

$$\|z_0 - z_b\|_{B^{-1}}^2 + \sum_{i=1}^{k} \|z_i - M_i z_{i-1} - m_i\|_{Q_i^{-1}}^2 + \sum_{i=1}^{k} \|d_i - H_i z_i\|_{R_i^{-1}}^2 + \gamma \sum_{i=0}^{k} \|z_i\|_{S_i^{-1}}^2 \to \min_{z_{0:k}}$$

◮ This becomes the Levenberg-Marquardt method, which is guaranteed to converge for large enough γ.
◮ Implement the regularization as independent observations zi ≈ 0 with error covariance Si: simply run the analysis step a second time (Johns and Mandel, 2008), as sketched below. Here, this is statistically exact because the distributions in the Kalman smoother are Gaussian.
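Since γ‖zi‖²_{Si⁻¹} = ‖zi − 0‖² weighted by the inverse of Si/γ, the penalty acts exactly like one more independent observation 0 ≈ zi, and can be imposed by rerunning the analysis; a sketch reusing the hypothetical `enkf_analysis` from the slide 8 sketch:

```python
import numpy as np

def regularize(Z, S, gamma, rng):
    """Impose the Tikhonov term gamma * ||z||^2_{S^{-1}} on the ensemble Z
    by a second analysis pass with H = I, d = 0, R = S / gamma."""
    n = Z.shape[0]
    return enkf_analysis(Z, np.eye(n), np.zeros(n), S / gamma, rng)
```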

SLIDE 14

Convergence

◮ Theorem. Every iteration of the Levenberg-Marquardt or the Gauss-Newton method with the stochastic linear solver by the EnKS converges in all Lp, 1 ≤ p < ∞, as N → ∞ and τ → 0, to the same iteration with the exact linear solver.

SLIDE 15

Functional interface

No matrices needed explicitly. Only certain matrix-vector operations.

◮ Evaluation of the model operator Mi(xi−1) for a given state xi−1.
◮ Evaluation of the observation operator Hi(xi) for a given state xi.
◮ Multiplication of a vector u0 by a square root V of the background covariance B, where B = VV^T. Similarly, multiplication of a vector ui by a square root of the model error covariance Qi, and of a data-dimensioned vector vi by a square root of the data error covariance Ri. These matrix roots are used only to generate samples from N(0, B), N(0, Qi), and N(0, Ri).
◮ Solving a system with the matrix Ri.
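Rendered as a Python protocol, the interface might look like this; the method names are illustrative, not taken from the authors' code.

```python
from typing import Protocol
import numpy as np

class FourDVarProblem(Protocol):
    """Functional interface of this slide: no matrices stored explicitly."""
    def model(self, i: int, x: np.ndarray) -> np.ndarray: ...           # M_i(x)
    def obs(self, i: int, x: np.ndarray) -> np.ndarray: ...             # H_i(x)
    def bg_sqrt(self, u: np.ndarray) -> np.ndarray: ...                 # V u, B = V V^T
    def model_err_sqrt(self, i: int, u: np.ndarray) -> np.ndarray: ...  # sqrt(Q_i) u
    def data_err_sqrt(self, i: int, v: np.ndarray) -> np.ndarray: ...   # sqrt(R_i) v
    def data_err_solve(self, i: int, v: np.ndarray) -> np.ndarray: ...  # R_i^{-1} v
```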

SLIDE 16

Computational results

The Lorenz 63 model:

$$\frac{dx}{dt} = -\sigma(x - y), \qquad \frac{dy}{dt} = \rho x - y - xz, \qquad \frac{dz}{dt} = xy - \beta z$$
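A sketch of the right-hand side; the classical chaotic parameter values σ = 10, ρ = 28, β = 8/3 are assumed, as the slide does not state them.

```python
from scipy.integrate import solve_ivp

def lorenz63(t, u, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz 63 system above."""
    x, y, z = u
    return [-sigma * (x - y), rho * x - y - x * z, x * y - beta * z]

# integrate one trajectory, e.g. as the truth of a twin experiment
sol = solve_ivp(lorenz63, (0.0, 10.0), [1.0, 1.0, 1.0], max_step=0.01)
```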

SLIDE 17

EnKS-4DVAR for Lorenz 63 model

Twin experiment: one solution chosen as the truth, with

$$B = \sigma_b^2\, \operatorname{diag}\left(1, \tfrac{1}{4}, \tfrac{1}{9}\right), \qquad R_i = \sigma_r^2 I, \qquad H_i(x, y, z) = \left(x^2, y^2, z^2\right).$$

Root mean square error of the EnKS-4DVAR iterations over 50 timesteps, ensemble size 100, model error covariances Qi ≈ 0:

Iteration   1      2      3     4     5     6
RMSE       20.16  15.37  3.73  2.53  0.09  0.09

SLIDE 18

Lorenz 96 model

$$\frac{dx_j}{dt} = \frac{1}{\kappa}\left(x_{j-1}\left(x_{j+1} - x_{j-2}\right) - x_j + F\right), \quad j = 1, \ldots, 40$$

with cyclic boundary conditions x−1 = x39, x0 = x40, x41 = x1 (Lorenz, 2006)

◮ behaves chaotically in the case κ = 1 and F = 8
◮ a model of the time evolution on a constant-latitude circle

[Figure: trajectory of the component x(20) over 100 time steps]
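A sketch of the right-hand side, with `np.roll` supplying the cyclic boundary conditions:

```python
import numpy as np

def lorenz96(x, F=8.0, kappa=1.0):
    """Lorenz 96 right-hand side for a state vector x of length 40;
    np.roll(x, 1)[j] = x[j-1] etc. implement the cyclic indexing."""
    return (np.roll(x, 1) * (np.roll(x, -1) - np.roll(x, 2)) - x + F) / kappa
```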

SLIDE 19

EnKS-4DVAR for Lorenz 96 model

◮ Twin experiment: one solution chosen as the truth. The whole state is observed.
◮ Background covariance B = σb² diag(1, 1/4, 1/9, ...)
◮ Data error covariance Ri = σr² I, observation operator Hi(Xi) = Xi + Xi²
◮ Model error covariance Qi = 0.1C, Cmn = exp(−5|m − n|), ensemble size N = 40.

[Figure: RMSE of the EnKS-4DVAR solution vs. time]

The RMSE of the EnKS-4DVAR solution remains bounded over 50 time steps, at most about 5%.

SLIDE 20

An example where Gauss-Newton does not converge

$$(x_0 - 2)^2 + \left(3 + x_1^2\right)^2 + 10^6 (x_0 - x_1)^2 \to \min$$

This is 4DVAR with xb = 2, B = I, M1 = I, H1(x) = x², y1 = 3, Q1 = 10⁻⁶. Thus, γ > 0 (regularization) is sometimes required.
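A quick numerical look at this example (an illustrative script: R1 = I is assumed, the starting point is the background, and γ is chosen ad hoc): run plain Gauss-Newton and a fixed-γ Levenberg-Marquardt iteration side by side and watch the cost.

```python
import numpy as np

def r(x):   # scaled residuals: cost = ||r(x)||^2
    return np.array([x[0] - 2.0, 3.0 + x[1] ** 2, 1000.0 * (x[0] - x[1])])

def Jr(x):  # Jacobian of r
    return np.array([[1.0, 0.0],
                     [0.0, 2.0 * x[1]],
                     [1000.0, -1000.0]])

x_gn = x_lm = np.array([2.0, 2.0])    # start from the background
gamma = 10.0                          # ad hoc regularization parameter
for it in range(20):
    z, *_ = np.linalg.lstsq(Jr(x_gn), -r(x_gn), rcond=None)  # Gauss-Newton step
    x_gn = x_gn + z
    J = Jr(x_lm)                      # Levenberg-Marquardt step, fixed gamma
    z = np.linalg.solve(J.T @ J + gamma * np.eye(2), -J.T @ r(x_lm))
    x_lm = x_lm + z
    print(it, np.sum(r(x_gn) ** 2), np.sum(r(x_lm) ** 2))
```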

SLIDE 21

Properties

4DVAR sets up a large nonlinear least squares problem.

Classical methods:
◮ In each iteration, sequential evaluations of the model, tangent, and adjoint operators, and solving a large linear least squares problem.
◮ Extra code for tangent and adjoint operators.
◮ Gauss-Newton iterations may not converge, not even locally.
◮ Difficult to parallelize: only inside the model.

The new method:
◮ Solve the linear least squares problem from 4DVAR by the EnKS, naturally parallel over the ensemble members.
◮ The linear algebra glue is cheap; use parallel dense libraries.
◮ Finite differences ⇒ no tangent and adjoint operators needed.
◮ Add Tikhonov regularization to the linear least squares ⇒ the Levenberg-Marquardt method with guaranteed convergence.
◮ Cheap and simple implementation of Tikhonov regularization within the EnKS as an additional observation.

SLIDE 22

To do

◮ Testing on large, realistic problems
◮ Localization in the EnKS (the standard sample covariance is low rank, with spurious terms)
◮ Adaptive control of the initial ensemble spread in each iteration
◮ Adaptive control of the regularization (the Levenberg-Marquardt parameter γ)
◮ More theory, especially nonlinear and Levenberg-Marquardt convergence

SLIDE 23

Related work

◮ The equivalence between weak constraint 4DVAR and Kalman smoothing is approximate for nonlinear problems, but still useful (Fisher et al., 2005).
◮ Hamill and Snyder (2000) estimated the background covariance from an ensemble for 4DVAR.
◮ Gradient methods in the span of the ensemble for one analysis cycle (i.e., 3DVAR) include Zupanski (2005), Sakov et al. (2012) (with a square root EnKF as a linear solver in a Newton method), and Bocquet and Sakov (2012), who added regularization and use an LETKF-like approach to minimize the nonlinear cost function over linear combinations of the ensemble.
◮ Liu et al. (2008, 2009) combine ensembles with (strong constraint) 4DVAR and minimize in the observation space.
◮ Zhang et al. (2009) use the EnKF to obtain the covariance for 4DVAR, and 4DVAR to feed the mean analysis into the EnKF.

SLIDE 24

References

  • B. Bell. The iterated Kalman smoother as a Gauss-Newton method. SIAM Journal on Optimization, 4(3):626–636, 1994. doi: 10.1137/0804035.

  • M. Bocquet and P. Sakov. Combining inflation-free and iterative ensemble Kalman filters for strongly nonlinear systems. Nonlinear Processes in Geophysics, 19(3):383–399, 2012. doi: 10.5194/npg-19-383-2012.

  • P. Courtier, J.-N. Thépaut, and A. Hollingsworth. A strategy for operational implementation of 4D-Var, using an incremental approach. Quarterly Journal of the Royal Meteorological Society, 120(519):1367–1387, 1994. doi: 10.1002/qj.49712051912.

  • G. Evensen and P. J. van Leeuwen. An ensemble Kalman smoother for nonlinear dynamics. Monthly Weather Review, 128(6):1852–1867, 2000. doi: 10.1175/1520-0493(2000)128<1852:AEKSFN>2.0.CO;2.

  • M. Fisher, M. Leutbecher, and G. A. Kelly. On the equivalence between Kalman smoothing and weak-constraint four-dimensional variational data assimilation. Quarterly Journal of the Royal Meteorological Society, 131(613):3235–3246, 2005. doi: 10.1256/qj.04.142.

  • T. M. Hamill and C. Snyder. A hybrid ensemble Kalman filter–3D variational analysis scheme. Monthly Weather Review, 128(8):2905–2919, 2000. doi: 10.1175/1520-0493(2000)128<2905:AHEKFV>2.0.CO;2.

  • C. J. Johns and J. Mandel. A two-stage ensemble Kalman filter for smooth data assimilation. Environmental and Ecological Statistics, 15:101–110, 2008. doi: 10.1007/s10651-007-0033-0.

SLIDE 25
  • F. Le Gland, V. Monbet, and V.-D. Tran. Large sample asymptotics for the ensemble Kalman filter. In D. Crisan and B. Rozovskii, editors, The Oxford Handbook of Nonlinear Filtering. Oxford University Press, 2011. Also INRIA Report 7014, August 2009.

  • C. Liu, Q. Xiao, and B. Wang. An ensemble-based four-dimensional variational data assimilation scheme. Part I: Technical formulation and preliminary test. Monthly Weather Review, 136(9):3363–3373, 2008. doi: 10.1175/2008MWR2312.1.

  • C. Liu, Q. Xiao, and B. Wang. An ensemble-based four-dimensional variational data assimilation scheme. Part II: Observing system simulation experiments with Advanced Research WRF (ARW). Monthly Weather Review, 137(5):1687–1704, 2009. doi: 10.1175/2008MWR2699.1.

  • E. N. Lorenz. Predictability – a problem partly solved. In T. Palmer and R. Hagendorn, editors, Predictability of Weather and Climate, pages 40–58. Cambridge University Press, 2006.

  • J. Mandel, L. Cobb, and J. D. Beezley. On the convergence of the ensemble Kalman filter. Applications of Mathematics, 56:533–541, 2011. doi: 10.1007/s10492-011-0031-2. Also arXiv:0901.2951, January 2009.

  • H. E. Rauch, F. Tung, and C. T. Striebel. Maximum likelihood estimates of linear dynamic systems. AIAA Journal, 3(8):1445–1450, 1965.

  • P. Sakov, D. S. Oliver, and L. Bertino. An iterative EnKF for strongly nonlinear systems. Monthly Weather Review, 140(6):1988–2004, 2012. doi: 10.1175/MWR-D-11-00176.1.

  • Y. Trémolet. Model-error estimation in 4D-Var. Quarterly Journal of the Royal Meteorological Society, 133(626):1267–1280, 2007. doi: 10.1002/qj.94.

  • F. Zhang, M. Zhang, and J. Hansen. Coupling ensemble Kalman filter with four-dimensional variational data assimilation. Advances in Atmospheric Sciences, 26(1):1–8, 2009. doi: 10.1007/s00376-009-0001-8.

SLIDE 26
  • M. Zupanski. Maximum likelihood ensemble filter: Theoretical aspects. Monthly Weather Review, 133(6):1710–1726, 2005. doi: 10.1175/MWR2946.1.