SLIDE 1 Stochastic solution of large least squares systems in variational data assimilation
Parallel 4DVAR without tangents and adjoints
- J. Mandel, S. Gratton, and E. Bergou
University of Colorado Denver; Institute of Computer Science, Czech Academy of Sciences; INP-ENSEEIHT and CERFACS, Toulouse, France
Supported by Fondation STAE project ADTAO, NSF grants AGS-0835579 and DMS-1216481, and Czech Grant Agency project 13-34856S
Preconditioned Iterative Methods, Prague 2013
SLIDE 2
Data assimilation
◮ Standard computing: data in, results out
◮ Not much changed in principle since we were taking punched cards and magnetic tapes to the mainframe
◮ Data assimilation: combine data with a running model
◮ Cast as a Bayesian update or a least squares problem
◮ Used already in Apollo 11 navigation, now in GPS; standard in numerical weather prediction
SLIDE 3 Weak Constraint 4DVAR
◮ We want to determine x_0, ..., x_k (x_i = state at time i) approximately from the model and the observations (data):

  x_0 \approx x_b            state at time 0 \approx the background
  x_i \approx M_i(x_{i-1})   state evolution \approx by the model
  H_i(x_i) \approx y_i       value of observation operator \approx the data

◮ Quantify "\approx" by covariances:

  x_0 \approx x_b \iff \|x_0 - x_b\|^2_{B^{-1}} = (x_0 - x_b)^T B^{-1} (x_0 - x_b) \approx 0, etc.

◮ \Rightarrow nonlinear least squares problem

  \|x_0 - x_b\|^2_{B^{-1}} + \sum_{i=1}^k \|x_i - M_i(x_{i-1})\|^2_{Q_i^{-1}} + \sum_{i=1}^k \|y_i - H_i(x_i)\|^2_{R_i^{-1}} \to \min_{x_{0:k}}
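The nonlinear least squares objective is straightforward to evaluate in code. Below is a minimal NumPy sketch (function and variable names are ours, not from the talk); M, H, Q, R, y are indexed by the time step i = 1, ..., k:

```python
import numpy as np

def wc4dvar_cost(x, xb, B, M, H, Q, R, y):
    """Weak-constraint 4DVAR cost: background + model error + data misfit terms."""
    def sq(v, C):
        # ||v||^2_{C^{-1}} = v^T C^{-1} v, computed via a linear solve
        return float(v @ np.linalg.solve(C, v))
    J = sq(x[0] - xb, B)                        # background term
    for i in range(1, len(x)):
        J += sq(x[i] - M[i](x[i - 1]), Q[i])    # model error term
        J += sq(y[i] - H[i](x[i]), R[i])        # observation term
    return J

# tiny example: k = 1, scalar state
x = [np.array([1.0]), np.array([2.0])]
J = wc4dvar_cost(x, xb=np.array([0.0]), B=np.eye(1),
                 M={1: lambda u: 2 * u}, H={1: lambda u: u},
                 Q={1: np.eye(1)}, R={1: np.eye(1)}, y={1: np.array([1.0])})
# J = (1 - 0)^2 + (2 - 2*1)^2 + (1 - 2)^2 = 2
```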
SLIDE 4 Incremental 4DVAR
◮ Linearization at [x_0, ..., x_k] = x_{0:k}:

  M_i(x_{i-1} + z_{i-1}) \approx M_i(x_{i-1}) + M_i'(x_{i-1}) z_{i-1}
  H_i(x_i + z_i) \approx H_i(x_i) + H_i'(x_i) z_i

◮ The Gauss-Newton method (Bell, 1994; Courtier et al., 1994) consists of iterations x_{0:k} \leftarrow x_{0:k} + z_{0:k}, with the increments z_{0:k} solving the linear least squares problem

  \|x_0 + z_0 - x_b\|^2_{B^{-1}} + \sum_{i=1}^k \|x_i + z_i - M_i(x_{i-1}) - M_i'(x_{i-1}) z_{i-1}\|^2_{Q_i^{-1}} + \sum_{i=1}^k \|y_i - H_i(x_i) - H_i'(x_i) z_i\|^2_{R_i^{-1}} \to \min_{z_{0:k}}
SLIDE 5 Linearized 4DVAR as Kalman smoother
◮ Write the linear least squares problem for the increments z_{0:k} as

  \|z_0 - z_b\|^2_{B^{-1}} + \sum_{i=1}^k \|z_i - M_i z_{i-1} - m_i\|^2_{Q_i^{-1}} + \sum_{i=1}^k \|d_i - H_i z_i\|^2_{R_i^{-1}} \to \min_{z_{0:k}}

◮ Theorem. The least squares solution z_{0:k} is the mean of the random vectors Z_{0:k} satisfying the stochastic equations

  Z_0 = z_b + V_0,                 V_0 \sim N(0, B)
  Z_i = M_i Z_{i-1} + m_i + V_i,   V_i \sim N(0, Q_i),  i = 1, ..., k
  d_i = H_i Z_i + W_i,             W_i \sim N(0, R_i),  i = 1, ..., k

◮ This is the Kalman smoother (Rauch et al., 1965; Bell, 1994).
◮ Simulate by Monte Carlo \Rightarrow Ensemble Kalman Smoother (EnKS) (Evensen and van Leeuwen, 2000).
SLIDE 6
Kalman smoother
◮ To solve the stochastic Kalman smoother equations (Rauch et al., 1965; Bell, 1994)

  Z_0 = z_b + V_0,                 V_0 \sim N(0, B)
  Z_i = M_i Z_{i-1} + m_i + V_i,   V_i \sim N(0, Q_i),  i = 1, ..., k
  d_i = H_i Z_i + W_i,             W_i \sim N(0, R_i),  i = 1, ..., k

◮ build the solution recursively: Z_{i|j} incorporates the data d_{1:j} only; we need Z_i = Z_{i|k}.
◮ The Kalman filter is the recursion Z_{i-1|i-1} \to Z_{i|i}.
◮ The Kalman smoother applies the same update to the previous times to get Z_{0:i|i-1} \to Z_{0:i|i}.
◮ Eventually, we obtain Z_{0:k} as Z_{0:k|k}.
SLIDE 7 Kalman filter and smoother recursions
◮ Recall the notation: Z_{step number | data incorporated up to}
◮ The Kalman filter incorporates the data d_i in the latest state:

  Z_{i|i-1} --data d_i--> Z_{i|i}

  Z_{0|0} --M_1--> Z_{1|0} --data d_1--> Z_{1|1} --M_2--> Z_{2|1} --data d_2--> Z_{2|2}

◮ The Kalman smoother incorporates the data d_i also in all previous states:

  Z_{0:i|i-1} --data d_i--> Z_{0:i|i}

  [Z_{0|0}] --M_1--> [Z_{0|0}, Z_{1|0}] --data d_1--> [Z_{0|1}, Z_{1|1}] --M_2--> [Z_{0|1}, Z_{1|1}, Z_{2|1}] --data d_2--> [Z_{0|2}, Z_{1|2}, Z_{2|2}]
SLIDE 8 Ensemble Kalman filter (EnKF)
Represent the random vector Z_{i|j} by an ensemble of realizations Z^N_{i|j} = [z^1_{i|j}, ..., z^N_{i|j}].

1. Initialize by sampling the background distribution:

   z^\ell_{0|0} \sim N(z_b, B), \quad \ell = 1, ..., N.

2.1 Advance in time and randomize by sampling the model error:

   z^\ell_{i|i-1} = M_i z^\ell_{i-1|i-1} + m_i + v^\ell_i, \quad v^\ell_i \sim N(0, Q_i)

2.2 Incorporate the data d_i:

   z^\ell_{i|i} = z^\ell_{i|i-1} - P^N_i H_i^T (H_i P^N_i H_i^T + R_i)^{-1} (H_i z^\ell_{i|i-1} - d_i - w^\ell_i), \quad w^\ell_i \sim N(0, R_i),

   where P^N_i is the sample covariance computed from the ensemble Z^N_{i|i-1}.

Approximate solution: the ensemble mean z_{i|i} \approx \frac{1}{N}(z^1_{i|i} + \cdots + z^N_{i|i}).
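Step 2.2 is a small amount of dense linear algebra. The following NumPy sketch of the perturbed-observation analysis step uses our own naming and is an illustration, not the talk's implementation:

```python
import numpy as np

def enkf_analysis(Z, H, R, d, rng):
    """Perturbed-observation EnKF analysis step:
    z <- z - P H^T (H P H^T + R)^{-1} (H z - d - w),  w ~ N(0, R),
    with P the sample covariance of the prior ensemble Z (n x N)."""
    N = Z.shape[1]
    A = Z - Z.mean(axis=1, keepdims=True)          # centered anomalies
    P = A @ A.T / (N - 1)                          # sample covariance P^N_i
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
    W = rng.multivariate_normal(np.zeros(len(d)), R, size=N).T
    return Z - K @ (H @ Z - d[:, None] - W)

rng = np.random.default_rng(0)
Z = rng.normal(size=(2, 500))                      # prior ensemble ~ N(0, I)
Zp = enkf_analysis(Z, H=np.eye(2), R=0.01 * np.eye(2),
                   d=np.array([1.0, 1.0]), rng=rng)
# with accurate data, the posterior ensemble mean is pulled close to d
```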
SLIDE 9 Ensemble Kalman smoother (EnKS)
◮ Run the ensemble Kalman filter (EnKF). It turns out that incorporating the data d_i replaces the ensemble by linear combinations of its members (a transformation by a matrix T^N_i):

  Z^N_{i|i} = Z^N_{i|i-1} T^N_i, \quad T^N_i \in R^{N \times N}.

◮ EnKS = EnKF + transform the history in exactly the same way:

  Z^N_{0:i|i} = Z^N_{0:i|i-1} T^N_i.

Approximate solution: the ensemble mean z_{i|k} \approx \frac{1}{N}(z^1_{i|k} + \cdots + z^N_{i|k}).
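That the analysis is a right multiplication by an N x N matrix can be checked numerically. In the sketch below (our construction, assuming the perturbed-observation analysis of the previous slide), T is built explicitly from the anomalies and innovations, and Z @ T reproduces the analysis ensemble; the EnKS then applies the same T to the ensemble at any earlier time:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 4, 2, 6
Z = rng.normal(size=(n, N))                        # prior ensemble Z^N_{i|i-1}
H = rng.normal(size=(m, n))
R = 0.5 * np.eye(m)
d = rng.normal(size=m)
W = rng.multivariate_normal(np.zeros(m), R, size=N).T   # observation perturbations

G = np.eye(N) - np.ones((N, N)) / N                # centering, so anomalies A = Z @ G
A = Z @ G
S = H @ (A @ A.T) @ H.T / (N - 1) + R
D = H @ Z - d[:, None] - W                         # innovation per member
Z_post = Z - (A @ A.T / (N - 1)) @ H.T @ np.linalg.inv(S) @ D   # EnKF analysis

# the same analysis as a right multiplication Z_post = Z @ T:
T = np.eye(N) - G @ (A.T @ H.T @ np.linalg.inv(S) / (N - 1)) @ D

# the EnKS applies the same T to the ensemble at an earlier time:
Z_prev = rng.normal(size=(n, N))
Z_prev_smoothed = Z_prev @ T
```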
SLIDE 10 Convergence of the EnKF and EnKS
Definition (Stochastic L^p norm). For a random vector X : \Omega \to R^n, \|X\|_p = (E|X|^p)^{1/p}.

Theorem (Mandel et al. (2011); Le Gland et al. (2011)). z^\ell_{i|i} \to Z_{i|i} in all L^p, 1 \le p < \infty, as the ensemble size N \to \infty.

Corollary. The EnKF ensemble mean converges, \frac{1}{N}(z^1_{i|i} + \cdots + z^N_{i|i}) \to z_{i|i} = E[Z_{i|i}], in all L^p, 1 \le p < \infty, as N \to \infty.

Corollary. The EnKS ensemble mean \frac{1}{N}(z^1_{i|k} + \cdots + z^N_{i|k}) converges to the solution of the linear least squares problem in all L^p, 1 \le p < \infty, as N \to \infty.
SLIDE 11 Model operator
The linearized model M_i = M_i'(x_{i-1}) occurs only in advancing the time, as its action on the ensemble of increments:

  M_i z^\ell_{i-1} + m_i = M_i'(x_{i-1}) z^\ell_{i-1} + M_i(x_{i-1}) - x_i

Approximating by finite differences with a parameter \tau > 0:

  M_i z^\ell_{i-1} + m_i \approx \frac{M_i(x_{i-1} + \tau z^\ell_{i-1}) - M_i(x_{i-1})}{\tau} + M_i(x_{i-1}) - x_i

This needs N + 1 evaluations of M_i, at x_{i-1} and at x_{i-1} + \tau z^\ell_{i-1}, \ell = 1, ..., N. The approximation is accurate in the limit \tau \to 0. For \tau = 1, we recover the nonlinear model: M_i(x_{i-1} + z^\ell_{i-1}) - x_i.
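The finite-difference action on the ensemble can be sketched as follows (names are ours; M is the nonlinear model, so no tangent or adjoint code is required, and the N evaluations parallelize over members):

```python
import numpy as np

def fd_model_action(M, x_prev, x_cur, Z, tau=0.1):
    """Apply M_i z^l + m_i to each ensemble increment by finite differences:
    (M(x_prev + tau z) - M(x_prev)) / tau + M(x_prev) - x_cur.
    Costs N + 1 evaluations of the nonlinear model M; no tangent code."""
    Mx = M(x_prev)                       # one evaluation at x_{i-1}
    cols = [(M(x_prev + tau * z) - Mx) / tau + Mx - x_cur for z in Z.T]
    return np.column_stack(cols)

# for a linear model the finite difference is exact for any tau:
out = fd_model_action(lambda u: 2 * u, np.array([1.0]), np.array([2.0]),
                      np.array([[3.0]]), tau=0.1)
# M' z + m = 2*3 + (2*1 - 2) = 6
```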
SLIDE 12 Observation operator
The observation matrix occurs only in its action on the ensemble, H_i Z^N = [H_i z^1_i, ..., H_i z^N_i]. Approximating by finite differences with a parameter \tau > 0:

  H_i z^\ell_i \approx \frac{H_i(x_i + \tau z^\ell_i) - H_i(x_i)}{\tau}

This needs N + 1 evaluations of H_i, at x_i and at x_i + \tau z^\ell_i, \ell = 1, ..., N. The approximation is accurate in the limit \tau \to 0. For \tau = 1, the method becomes exactly the EnKS run on the nonlinear problem \Rightarrow the EnKS is independent of the point of linearization \Rightarrow no convergence of the 4DVAR iterations. In our tests, \tau = 0.1 seems to work well enough.
SLIDE 13 Tikhonov regularization and Levenberg-Marquardt method
◮ Gauss-Newton may not converge, even locally. Add a penalty (Tikhonov regularization) to control the size of the increments z_i:

  \|z_0 - z_b\|^2_{B^{-1}} + \sum_{i=1}^k \|z_i - M_i z_{i-1} - m_i\|^2_{Q_i^{-1}} + \sum_{i=1}^k \|d_i - H_i z_i\|^2_{R_i^{-1}} + \gamma \sum_{i=0}^k \|z_i\|^2_{S_i^{-1}} \to \min_{z_{0:k}}

◮ This becomes the Levenberg-Marquardt method, which is guaranteed to converge for large enough \gamma.
◮ Implement the regularization as independent observations z_i \approx 0 with error covariance S_i: simply run the analysis step a second time (Johns and Mandel, 2008). Here this is statistically exact, because the distributions in the Kalman smoother are Gaussian.
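The regularization-as-observation trick can be sketched as below (our naming; note that, up to the scaling of S_i, the penalty \gamma \|z_i\|^2_{S_i^{-1}} corresponds to observing z_i \approx 0 with error covariance S_i/\gamma):

```python
import numpy as np

def analysis(Z, H, R, d, rng):
    """Perturbed-observation analysis step (same form as on the EnKF slide)."""
    N = Z.shape[1]
    A = Z - Z.mean(axis=1, keepdims=True)
    P = A @ A.T / (N - 1)
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    W = rng.multivariate_normal(np.zeros(len(d)), R, size=N).T
    return Z - K @ (H @ Z - d[:, None] - W)

def regularized_analysis(Z, H, R, d, S, gamma, rng):
    """Tikhonov term gamma ||z||^2_{S^{-1}} as an extra independent
    observation z ~ 0 with covariance S/gamma: run the analysis twice."""
    n = Z.shape[0]
    Z = analysis(Z, H, R, d, rng)                               # ordinary data d_i
    return analysis(Z, np.eye(n), S / gamma, np.zeros(n), rng)  # z_i ~ 0

rng = np.random.default_rng(2)
Z = 5.0 + rng.normal(size=(2, 300))                 # prior ensemble centered at 5
Zp = regularized_analysis(Z, np.eye(2), np.eye(2), np.array([5.0, 5.0]),
                          S=np.eye(2), gamma=1000.0, rng=rng)
# strong regularization pulls the ensemble mean toward 0
```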
SLIDE 14 Convergence
Theorem. Every iteration of the Levenberg-Marquardt or Gauss-Newton method with the stochastic linear solver by the EnKS converges, in all L^p, 1 \le p < \infty, as N \to \infty and \tau \to 0, to the same iteration with the exact linear solver.
SLIDE 15
Functional interface
No matrices are needed explicitly, only certain matrix-vector operations:

◮ Evaluation of the model operator M_i(x_{i-1}) for a given state x_{i-1}.
◮ Evaluation of the observation operator H_i(x_i) for a given state x_i.
◮ Multiplication of a vector u_0 by a square root V of the background covariance B, where B = VV^T. Similarly, multiplication of a vector u_i by a square root of the model error covariance Q_i, and of a data-dimensioned vector v_i by a square root of the data error covariance R_i. These matrix square roots are used only to generate samples from N(0, B), N(0, Q_i), and N(0, R_i).
◮ Solving a system with the matrix R_i.
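The interface above can be collected into a small container of callbacks; this is a hypothetical Python rendering (names are ours), showing in particular how sampling from N(z_b, B) needs only the action of the square root:

```python
from dataclasses import dataclass
from typing import Callable, Dict
import numpy as np

@dataclass
class WC4DVarOperators:
    """The functional interface as hypothetical Python callbacks; no
    covariance matrix is ever formed explicitly."""
    model: Dict[int, Callable]     # i -> (x -> M_i(x))
    obs: Dict[int, Callable]       # i -> (x -> H_i(x))
    B_sqrt: Callable               # u -> V u, where B = V V^T
    Q_sqrt: Dict[int, Callable]    # i -> action of a square root of Q_i
    R_sqrt: Dict[int, Callable]    # i -> action of a square root of R_i
    R_solve: Dict[int, Callable]   # i -> (v -> R_i^{-1} v)

    def sample_background(self, zb, rng):
        # draw from N(zb, B) using only the action of the square root V
        return zb + self.B_sqrt(rng.standard_normal(zb.shape))

# example: B = 4I in one dimension, so V = 2I
ops = WC4DVarOperators(model={}, obs={}, B_sqrt=lambda u: 2.0 * u,
                       Q_sqrt={}, R_sqrt={}, R_solve={})
rng = np.random.default_rng(4)
draws = np.array([ops.sample_background(np.zeros(1), rng)[0]
                  for _ in range(2000)])
# the sample variance should be close to 4
```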
SLIDE 16
Computational results
The Lorenz 63 model:

  dx/dt = -\sigma(x - y), \quad dy/dt = \rho x - y - xz, \quad dz/dt = xy - \beta z
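The model is simple to implement; a sketch with the standard parameter values (the talk does not state which integrator was used, so the Runge-Kutta step here is our assumption for M_i):

```python
import numpy as np

def lorenz63(u, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz 63 system above."""
    x, y, z = u
    return np.array([-sigma * (x - y), rho * x - y - x * z, x * y - beta * z])

def rk4_step(f, u, dt):
    """One classical Runge-Kutta step; a simple stand-in for the model M_i."""
    k1 = f(u)
    k2 = f(u + 0.5 * dt * k1)
    k3 = f(u + 0.5 * dt * k2)
    k4 = f(u + dt * k3)
    return u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

u = np.array([1.0, 1.0, 1.0])
for _ in range(100):                 # integrate to t = 1
    u = rk4_step(lorenz63, u, 0.01)
```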
SLIDE 17 EnKS-4DVAR for Lorenz 63 model
Twin experiment: one solution chosen as the truth.

  B = \sigma_b^2 \, diag(1, 1/4, 1/9), \quad R_i = \sigma_r^2 I, \quad H_i(x, y, z) = ...

Root mean square error of the EnKS-4DVAR iterations over 50 timesteps, ensemble size 100, model error covariances Q_i \approx 0:

  Iteration   1      2      3     4     5     6
  RMSE        20.16  15.37  3.73  2.53  0.09  0.09
SLIDE 18 Lorenz 96 model
  dx_j/dt = \frac{1}{\kappa} (x_{j-1}(x_{j+1} - x_{j-2}) - x_j + F), \quad j = 1, ..., 40

with cyclic boundary conditions x_{-1} = x_{39}, x_0 = x_{40}, x_{41} = x_1 (Lorenz, 2006)

◮ behaves chaotically in the case \kappa = 1 and F = 8
◮ a model of time evolution on a constant latitude circle

[Figure: evolution of x(20) over 100 time steps]
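The cyclic boundary conditions make for a compact vectorized implementation; a minimal sketch (the function name is ours):

```python
import numpy as np

def lorenz96(x, F=8.0, kappa=1.0):
    """Lorenz 96 right-hand side; cyclic boundary conditions via np.roll:
    dx_j/dt = (1/kappa) (x_{j-1} (x_{j+1} - x_{j-2}) - x_j + F)."""
    return (np.roll(x, 1) * (np.roll(x, -1) - np.roll(x, 2)) - x + F) / kappa
```

The constant state x_j = F is an equilibrium (every right-hand side term cancels), which gives a quick sanity check of the index shifts.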
SLIDE 19 EnKS-4DVAR for Lorenz 96 model
◮ Twin experiment: one solution chosen as the truth. The whole state is observed.
◮ Background covariance B = \sigma_b^2 \, diag(1, 1/4, 1/9, ...)
◮ Data error covariance R_i = \sigma_r^2 I, observation operator H_i(X_i) = X_i + X_i^2
◮ Model error covariance Q_i = 0.1C, C_{mn} = \exp(-5|m - n|), ensemble size N = 40.

[Figure: RMSE of the EnKS-4DVAR solution versus time over 50 time steps]

The RMSE of the EnKS-4DVAR solution remains bounded over 50 time steps, at most about 5%.
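The model error covariance Q_i = 0.1C with the exponential correlation C_{mn} = exp(-5|m - n|) is symmetric positive definite, so model error can be sampled through its Cholesky factor, consistent with the square-root-only functional interface (a sketch; variable names are ours):

```python
import numpy as np

n = 40
idx = np.arange(n)
C = np.exp(-5.0 * np.abs(idx[:, None] - idx[None, :]))   # C_mn = exp(-5|m - n|)
Q = 0.1 * C                                              # model error covariance Q_i

# sample v ~ N(0, Q) through a square root of Q, here the Cholesky factor;
# this is the only use of Q in the method
L = np.linalg.cholesky(Q)
rng = np.random.default_rng(3)
v = L @ rng.standard_normal(n)
```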
SLIDE 20
An example where Gauss-Newton does not converge
  (x_0 - 2)^2 + (3 + x_1^2)^2 + 10^6 (x_0 - x_1)^2 \to \min

This is 4DVAR with x_b = 2, B = I, M_1 = I, H_1(x) = x^2, y_1 = 3, Q_1 = 10^{-6}. Thus, \gamma > 0 (regularization) is sometimes required.
SLIDE 21
Properties
4DVAR sets up a large nonlinear least squares problem.

Classical methods:
◮ In each iteration, sequential evaluations of the model, tangent, and adjoint operators, and the solution of a large linear least squares problem.
◮ Extra code for the tangent and adjoint operators.
◮ Gauss-Newton iterations may not converge, not even locally.
◮ Difficult to parallelize: only inside the model.

The new method:
◮ Solve the linear least squares problem from 4DVAR by the EnKS, naturally parallel over the ensemble members.
◮ The linear algebra glue is cheap; use parallel dense libraries.
◮ Finite differences \Rightarrow no tangent and adjoint operators needed.
◮ Add Tikhonov regularization to the linear least squares \Rightarrow Levenberg-Marquardt method with guaranteed convergence.
◮ Cheap and simple implementation of Tikhonov regularization within the EnKS as an additional observation.
SLIDE 22
To do
◮ Testing on large, realistic problems
◮ Localization in the EnKS (the standard sample covariance is low rank, with spurious terms)
◮ Adaptive control of the initial ensemble spread in each iteration
◮ Adaptive control of the regularization (the Levenberg-Marquardt parameter \gamma)
◮ More theory, especially nonlinear and Levenberg-Marquardt convergence
SLIDE 23
Related work
◮ The equivalence between weak constraint 4DVAR and Kalman smoothing is approximate for nonlinear problems, but still useful (Fisher et al., 2005).
◮ Hamill and Snyder (2000) estimated the background covariance from an ensemble for 4DVAR.
◮ Gradient methods in the span of the ensemble for one analysis cycle (i.e., 3DVAR) include Zupanski (2005), Sakov et al. (2012) (with a square root EnKF as the linear solver in a Newton method), and Bocquet and Sakov (2012), who added regularization and use an LETKF-like approach to minimize the nonlinear cost function over linear combinations of the ensemble.
◮ Liu et al. (2008, 2009) combine ensembles with (strong constraint) 4DVAR and minimize in the observation space.
◮ Zhang et al. (2009) use the EnKF to obtain the covariance for 4DVAR, and 4DVAR to feed the mean analysis into the EnKF.
SLIDE 24 References
- B. Bell. The iterated Kalman smoother as a Gauss-Newton method. SIAM Journal on
Optimization, 4(3):626–636, 1994. doi: 10.1137/0804035.
- M. Bocquet and P. Sakov. Combining inflation-free and iterative ensemble Kalman
filters for strongly nonlinear systems. Nonlinear Processes in Geophysics, 19(3): 383–399, 2012. doi: 10.5194/npg-19-383-2012.
- P. Courtier, J.-N. Thépaut, and A. Hollingsworth. A strategy for operational implementation of 4D-Var, using an incremental approach. Quarterly Journal of the Royal Meteorological Society, 120(519):1367–1387, 1994. doi: 10.1002/qj.49712051912.
- G. Evensen and P. J. van Leeuwen. An ensemble Kalman smoother for nonlinear dynamics. Monthly Weather Review, 128(6):1852–1867, 2000. doi: 10.1175/1520-0493(2000)128<1852:AEKSFN>2.0.CO;2.
- M. Fisher, M. Leutbecher, and G. A. Kelly. On the equivalence between Kalman smoothing and weak-constraint four-dimensional variational data assimilation. Quarterly Journal of the Royal Meteorological Society, 131(613):3235–3246, 2005. doi: 10.1256/qj.04.142.
- T. M. Hamill and C. Snyder. A hybrid ensemble Kalman filter–3D variational analysis scheme. Monthly Weather Review, 128(8):2905–2919, 2000. doi: 10.1175/1520-0493(2000)128<2905:AHEKFV>2.0.CO;2.
- C. J. Johns and J. Mandel. A two-stage ensemble Kalman filter for smooth data
- assimilation. Environmental and Ecological Statistics, 15:101–110, 2008. doi:
10.1007/s10651-007-0033-0.
SLIDE 25
- F. Le Gland, V. Monbet, and V.-D. Tran. Large sample asymptotics for the ensemble
Kalman filter. In D. Crisan and B. Rozovskii, editors, The Oxford Handbook of Nonlinear Filtering. Oxford University Press, 2011. INRIA Report 7014, August 2009.
- C. Liu, Q. Xiao, and B. Wang. An ensemble-based four-dimensional variational data
assimilation scheme. Part I: Technical formulation and preliminary test. Monthly Weather Review, 136(9):3363–3373, 2008. doi: 10.1175/2008MWR2312.1.
- C. Liu, Q. Xiao, and B. Wang. An ensemble-based four-dimensional variational data
assimilation scheme. Part II: Observing system simulation experiments with Advanced Research WRF (ARW). Monthly Weather Review, 137(5):1687–1704,
- 2009. doi: 10.1175/2008MWR2699.1.
- E. N. Lorenz. Predictability - a problem partly solved. In T. Palmer and
- R. Hagendorn, editors, Predictability of Weather and Climate, pages 40–58.
Cambridge University Press, 2006.
- J. Mandel, L. Cobb, and J. D. Beezley. On the convergence of the ensemble Kalman
- filter. Applications of Mathematics, 56:533–541, 2011. doi:
10.1007/s10492-011-0031-2. arXiv:0901.2951, January 2009.
- H. E. Rauch, F. Tung, and C. T. Striebel. Maximum likelihood estimates of linear
dynamic systems. AIAA Journal, 3(8):1445–1450, 1965.
- P. Sakov, D. S. Oliver, and L. Bertino. An iterative EnKF for strongly nonlinear
- systems. Monthly Weather Review, 140(6):1988–2004, 2012. doi:
10.1175/MWR-D-11-00176.1.
- Y. Trémolet. Model-error estimation in 4D-Var. Quarterly Journal of the Royal Meteorological Society, 133(626):1267–1280, 2007. doi: 10.1002/qj.94.
- F. Zhang, M. Zhang, and J. Hansen. Coupling ensemble Kalman filter with
four-dimensional variational data assimilation. Advances in Atmospheric Sciences, 26(1):1–8, 2009. doi: 10.1007/s00376-009-0001-8.
SLIDE 26
- M. Zupanski. Maximum likelihood ensemble filter: Theoretical aspects. Monthly Weather Review, 133(6):1710–1726, 2005. doi: 10.1175/MWR2946.1.