

SLIDE 1

Stochastic solution of large least squares systems in variational data assimilation

Parallel 4DVAR without tangents and adjoints

  • J. Mandel, S. Gratton, and E. Bergou

University of Colorado Denver; Institute of Computer Science, Czech Academy of Sciences; INP-ENSEEIHT and CERFACS, Toulouse, France

Supported by Fondation STAE project ADTAO, NSF grants AGS-0835579 and DMS-1216481, and Czech Grant Agency project 13-34856S

Preconditioned Iterative Methods, Prague 2013

SLIDE 2

Data assimilation

◮ Standard computing: data in, results out
◮ Not much changed in principle since we were taking punched cards and magnetic tapes to the mainframe
◮ Data assimilation: combine data with a running model
◮ Cast as Bayesian update or least squares
◮ Used already in Apollo 11 navigation, now in GPS; standard in numerical weather prediction

SLIDE 3

Weak Constraint 4DVAR

◮ We want to determine x0, ..., xk (xi = state at time i) approximately from the model and the observations (data):

$$x_0 \approx x_b \qquad \text{state at time 0} \approx \text{the background}$$
$$x_i \approx M_i(x_{i-1}) \qquad \text{state evolution} \approx \text{by the model}$$
$$H_i(x_i) \approx y_i \qquad \text{value of the observation operator} \approx \text{the data}$$

◮ Quantify "≈" by covariances:

$$x_0 \approx x_b \iff \|x_0 - x_b\|_{B^{-1}}^2 = (x_0 - x_b)^T B^{-1} (x_0 - x_b) \approx 0, \quad \text{etc.}$$

◮ ⇒ the nonlinear least squares problem

$$\|x_0 - x_b\|_{B^{-1}}^2 + \sum_{i=1}^{k} \|x_i - M_i(x_{i-1})\|_{Q_i^{-1}}^2 + \sum_{i=1}^{k} \|y_i - H_i(x_i)\|_{R_i^{-1}}^2 \to \min_{x_{0:k}}$$
SLIDE 4

Incremental 4DVAR

◮ Linearization at [x0, ..., xk] = x0:k:

$$M_i(x_{i-1} + z_{i-1}) \approx M_i(x_{i-1}) + M_i'(x_{i-1})\, z_{i-1}$$
$$H_i(x_i + z_i) \approx H_i(x_i) + H_i'(x_i)\, z_i$$

◮ The Gauss-Newton method (Bell, 1994; Courtier et al., 1994) consists of the iterations x0:k ← x0:k + z0:k, with the increments z0:k solving the linear least squares problem

$$\|x_0 + z_0 - x_b\|_{B^{-1}}^2 + \sum_{i=1}^{k} \|x_i + z_i - M_i(x_{i-1}) - M_i'(x_{i-1})\, z_{i-1}\|_{Q_i^{-1}}^2 + \sum_{i=1}^{k} \|y_i - H_i(x_i) - H_i'(x_i)\, z_i\|_{R_i^{-1}}^2 \to \min_{z_{0:k}}$$
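A generic Gauss-Newton outer loop in this spirit might look as follows; this is a dense sketch under the assumption that the stacked residual vector and its Jacobian fit in memory, with `residual` and `jacobian` as hypothetical user-supplied callables.

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, iters=10):
    """Gauss-Newton for min ||r(x)||^2, where r(x) stacks all residuals of
    the 4DVAR cost, pre-scaled by the inverse square roots of B, Q_i, R_i."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual(x)                             # r(x), stacked
        J = jacobian(x)                             # Jacobian r'(x)
        z, *_ = np.linalg.lstsq(J, -r, rcond=None)  # min ||r + J z||^2
        x = x + z                                   # x_{0:k} <- x_{0:k} + z_{0:k}
    return x
```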

SLIDE 5

Linearized 4DVAR as Kalman smoother

◮ Write the linear least squares problem for the increments z0:k as

$$\|z_0 - z_b\|_{B^{-1}}^2 + \sum_{i=1}^{k} \|z_i - M_i z_{i-1} - m_i\|_{Q_i^{-1}}^2 + \sum_{i=1}^{k} \|d_i - H_i z_i\|_{R_i^{-1}}^2 \to \min_{z_{0:k}}$$

◮ Theorem. The least squares solution z0:k is the mean of the random vectors Z0:k satisfying the stochastic equations

$$Z_0 = z_b + V_0, \quad V_0 \sim N(0, B)$$
$$Z_i = M_i Z_{i-1} + m_i + V_i, \quad V_i \sim N(0, Q_i), \quad i = 1, \ldots, k$$
$$d_i = H_i Z_i + W_i, \quad W_i \sim N(0, R_i), \quad i = 1, \ldots, k$$

◮ This is the Kalman smoother (Rauch et al., 1965; Bell, 1994).
◮ Simulate by Monte Carlo ⇒ the Ensemble Kalman Smoother (EnKS) (Evensen and van Leeuwen, 2000).
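The theorem can be checked numerically in a scalar toy case (k = 1; all numbers below are made up). Prior samples of (Z0, Z1) are reweighted by the Gaussian likelihood of the data equation; this importance weighting is a verification device, not the EnKS of the following slides. The weighted mean should match the direct least squares solution.

```python
import numpy as np

rng = np.random.default_rng(0)
zb, B, M, m, Q, H, R, d = 1.0, 2.0, 0.9, 0.1, 0.5, 1.0, 0.3, 3.0  # made-up scalars

# Direct solution: the rows of A z - b are the residuals scaled by 1/sqrt(cov)
A = np.array([[1 / np.sqrt(B), 0.0],
              [-M / np.sqrt(Q), 1 / np.sqrt(Q)],
              [0.0, -H / np.sqrt(R)]])
b = np.array([zb / np.sqrt(B), m / np.sqrt(Q), -d / np.sqrt(R)])
z_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

# Monte Carlo: sample the first two stochastic equations, then impose the
# data equation d = H Z_1 + W_1 by importance weights
N = 200_000
Z0 = zb + np.sqrt(B) * rng.standard_normal(N)
Z1 = M * Z0 + m + np.sqrt(Q) * rng.standard_normal(N)
w = np.exp(-(d - H * Z1) ** 2 / (2 * R))
z_mc = np.array([np.average(Z0, weights=w), np.average(Z1, weights=w)])

print(z_ls, z_mc)  # the two should agree up to sampling error
```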

SLIDE 6

Kalman smoother

◮ To solve the stochastic Kalman smoother equations (Rauch et al., 1965; Bell, 1994):

$$Z_0 = z_b + V_0, \quad V_0 \sim N(0, B)$$
$$Z_i = M_i Z_{i-1} + m_i + V_i, \quad V_i \sim N(0, Q_i), \quad i = 1, \ldots, k$$
$$d_i = H_i Z_i + W_i, \quad W_i \sim N(0, R_i), \quad i = 1, \ldots, k$$

◮ Build the solution recursively: Zi|j incorporates the data d1:j only; we need Zi = Zi|k.
◮ The Kalman filter is the recursion Zi−1|i−1 → Zi|i.
◮ The Kalman smoother applies the same update to previous times to get Z0:i|i−1 → Z0:i|i.
◮ Eventually, we obtain Z0:k as Z0:k|k.

SLIDE 7

Kalman filter and smoother recursions

◮ Recall the notation: Z_{step number | data incorporated up to}
◮ The Kalman filter incorporates the data di in the latest state: Zi|i−1 → Zi|i upon data di:

$$Z_{0|0} \xrightarrow{M_1} Z_{1|0} \xrightarrow{\text{data } d_1} Z_{1|1} \xrightarrow{M_2} Z_{2|1} \xrightarrow{\text{data } d_2} Z_{2|2}$$

◮ The Kalman smoother incorporates the data di also in all previous states: Z0:i|i−1 → Z0:i|i upon data di:

$$Z_{0|0} \xrightarrow{M_1} \begin{bmatrix} Z_{0|0} \\ Z_{1|0} \end{bmatrix} \xrightarrow{\text{data } d_1} \begin{bmatrix} Z_{0|1} \\ Z_{1|1} \end{bmatrix} \xrightarrow{M_2} \begin{bmatrix} Z_{0|1} \\ Z_{1|1} \\ Z_{2|1} \end{bmatrix} \xrightarrow{\text{data } d_2} \begin{bmatrix} Z_{0|2} \\ Z_{1|2} \\ Z_{2|2} \end{bmatrix}$$

SLIDE 8

Ensemble Kalman filter (EnKF)

Represent the random vector Zi|j by an ensemble of realizations Z^N_{i|j} = [z^1_{i|j}, ..., z^N_{i|j}].

1. Initialize by sampling the background distribution:

$$z_{0|0}^{\ell} \sim N(z_b, B), \quad \ell = 1, \ldots, N.$$

2. For i = 1, ..., k:

2.1 Advance in time and randomize by sampling the model error:

$$z_{i|i-1}^{\ell} = M_i z_{i-1|i-1}^{\ell} + m_i + v_i^{\ell}, \quad v_i^{\ell} \sim N(0, Q_i)$$

2.2 Incorporate the data di:

$$z_{i|i}^{\ell} = z_{i|i-1}^{\ell} - P_i^N H_i^T \left(H_i P_i^N H_i^T + R_i\right)^{-1} \left(H_i z_{i|i-1}^{\ell} - d_i - w_i^{\ell}\right), \quad w_i^{\ell} \sim N(0, R_i),$$

where P_i^N is the sample covariance computed from the ensemble Z^N_{i|i−1}.

Approximate solution: the ensemble mean,

$$z_{i|i} \approx \frac{1}{N}\left(z_{i|i}^{1} + \cdots + z_{i|i}^{N}\right)$$
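A minimal dense sketch of the analysis step 2.2 (assuming state and data dimensions small enough that the sample covariance P^N and the inverse can be formed explicitly; the function name and signature are illustrative):

```python
import numpy as np

def enkf_analysis(Zf, Hmat, d, R, rng):
    """Perturbed-observations EnKF analysis (step 2.2).
    Zf: n x N forecast ensemble, one member per column; Hmat: m x n;
    d: data vector of length m; R: m x m data error covariance."""
    n, N = Zf.shape
    A = Zf - Zf.mean(axis=1, keepdims=True)          # ensemble anomalies
    P = A @ A.T / (N - 1)                            # sample covariance P^N
    W = rng.multivariate_normal(np.zeros(len(d)), R, size=N).T  # w^l ~ N(0, R)
    K = P @ Hmat.T @ np.linalg.inv(Hmat @ P @ Hmat.T + R)       # Kalman gain
    return Zf - K @ (Hmat @ Zf - d[:, None] - W)     # z - K (H z - d - w)
```

For large states one avoids forming P^N and works with the anomalies directly, which also yields the transform form of the next slide.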

SLIDE 9

Ensemble Kalman smoother (EnKS)

◮ Run the Ensemble Kalman filter (EnKF). It turns out that incorporating the data di replaces the ensemble by linear combinations of its members (a transformation by a matrix T^N_i):

$$Z_{i|i}^N = Z_{i|i-1}^N T_i^N, \quad T_i^N \in \mathbb{R}^{N \times N}.$$

◮ EnKS = EnKF + transform the history exactly the same way:

$$Z_{0:i|i}^N = Z_{0:i|i-1}^N T_i^N.$$

Approximate solution: the ensemble mean,

$$z_{i|k} \approx \frac{1}{N}\left(z_{i|k}^{1} + \cdots + z_{i|k}^{N}\right)$$
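One way to obtain the transform explicitly is to rewrite the perturbed-observations update in terms of the ensemble anomalies A = Zf J, with J the centering matrix, so that the analysis ensemble is Zf (I − JC). This sketch is consistent with the update on the previous slide, but is not necessarily the authors' implementation.

```python
import numpy as np

def enks_transform(Zf, Hmat, d, R, rng):
    """Return T (N x N) with Z_{i|i} = Z_{i|i-1} T for the
    perturbed-observations update of the previous slide."""
    n, N = Zf.shape
    HA = Hmat @ (Zf - Zf.mean(axis=1, keepdims=True))  # H times anomalies
    S = HA @ HA.T / (N - 1) + R                        # H P^N H^T + R
    W = rng.multivariate_normal(np.zeros(len(d)), R, size=N).T
    C = HA.T @ np.linalg.solve(S, Hmat @ Zf - d[:, None] - W) / (N - 1)
    J = np.eye(N) - np.ones((N, N)) / N                # centering: A = Zf J
    return np.eye(N) - J @ C                           # Z_a = Zf (I - J C)

# EnKS: apply the same T to every stored time level, e.g.
# history = [Z @ T for Z in history]   # history: list of n x N arrays
```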

SLIDE 10

Convergence of the EnKF and EnKS

Definition (Stochastic Lp norm). For a random vector X : Ω → R^n,

$$\|X\|_p = \left(E\,|X|^p\right)^{1/p}.$$

Theorem (Mandel et al. (2011); Le Gland et al. (2011)). z^ℓ_{i|i} → Z_{i|i} in all Lp, 1 ≤ p < ∞, as the ensemble size N → ∞.

Corollary. The EnKF ensemble mean converges,

$$\frac{1}{N}\left(z_{i|i}^{1} + \cdots + z_{i|i}^{N}\right) \to z_{i|i} = E\left(Z_{i|i}\right) \quad \text{in all } L^p,\ 1 \le p < \infty, \text{ as } N \to \infty.$$

Corollary. The EnKS ensemble mean converges to the solution of the linear least squares problem:

$$\frac{1}{N}\left(z_{i|k}^{1} + \cdots + z_{i|k}^{N}\right) \to z_i = z_{i|k} = E\left(Z_{i|k}\right) \quad \text{in all } L^p,\ 1 \le p < \infty, \text{ as } N \to \infty.$$

SLIDE 11

Model operator

The linearized model Mi = M′i(xi−1) occurs only in advancing the time, as its action on the ensemble of increments:

$$M_i\left(x_{i-1} + z_{i-1}^{\ell}\right) - x_i \approx M_i(x_{i-1}) + M_i'(x_{i-1})\, z_{i-1}^{\ell} - x_i = M_i z_{i-1}^{\ell} + m_i$$

Approximating by finite differences with a parameter τ > 0:

$$M_i z_{i-1}^{\ell} + m_i \approx \frac{M_i\left(x_{i-1} + \tau z_{i-1}^{\ell}\right) - M_i(x_{i-1})}{\tau} + M_i(x_{i-1}) - x_i$$

This needs N + 1 evaluations of Mi: at xi−1 and at xi−1 + τz^ℓ_{i−1}, ℓ = 1, ..., N. The approximation is accurate in the limit τ → 0. For τ = 1, we recover the nonlinear model:

$$M_i\left(x_{i-1} + z_{i-1}^{\ell}\right) - x_i$$
SLIDE 12

Observation operator

The observation matrix occurs only in its action on the ensemble, HiZ^N = [Hiz^1, ..., Hiz^N]. Approximating by finite differences with a parameter τ > 0:

$$H_i z_i^{\ell} \approx \frac{H_i\left(x_i + \tau z_i^{\ell}\right) - H_i(x_i)}{\tau}$$

This needs N + 1 evaluations of Hi: at xi and at xi + τz^ℓ_i, ℓ = 1, ..., N. The approximation is accurate in the limit τ → 0. For τ = 1, the method becomes exactly the EnKS run on the nonlinear problem ⇒ the EnKS is independent of the point of linearization ⇒ no convergence of the 4DVAR iterations. In our tests, τ = 0.1 seems to work well enough.
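This slide and the previous one use the same finite-difference trick, so one helper covers Mi and Hi alike; a sketch with hypothetical names (for the model, the affine part mi = Mi(xi−1) − xi is added separately, as on the previous slide):

```python
import numpy as np

def fd_ensemble_action(F, x, Z, tau=0.1):
    """Finite-difference action of the linearization F'(x) on each column
    of Z: (F(x + tau z) - F(x)) / tau -> F'(x) z as tau -> 0.
    F may be the model M_i or the observation operator H_i. Costs N + 1
    evaluations of F; the N member evaluations are independent, hence
    trivially parallel."""
    Fx = F(x)  # one evaluation at the linearization point
    return np.column_stack([(F(x + tau * Z[:, l]) - Fx) / tau
                            for l in range(Z.shape[1])])
```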

SLIDE 13

Tikhonov regularization and Levenberg-Marquardt method

◮ Gauss-Newton may not converge, even locally. Add a penalty (Tikhonov regularization) to control the size of the increments zi:

$$\|z_0 - z_b\|_{B^{-1}}^2 + \sum_{i=1}^{k} \|z_i - M_i z_{i-1} - m_i\|_{Q_i^{-1}}^2 + \sum_{i=1}^{k} \|d_i - H_i z_i\|_{R_i^{-1}}^2 + \gamma \sum_{i=0}^{k} \|z_i\|_{S_i^{-1}}^2 \to \min_{z_{0:k}}$$

◮ This becomes the Levenberg-Marquardt method, which is guaranteed to converge for large enough γ.
◮ Implement the regularization as independent observations zi ≈ 0 with error covariance Si: simply run the analysis step a second time (Johns and Mandel, 2008), as sketched below. Here, this is statistically exact because the distributions in the Kalman smoother are Gaussian.
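Since γ‖zi‖²_{Si⁻¹} = ‖zi − 0‖² weighted by the inverse of Si/γ, the penalty acts exactly like one more independent observation 0 ≈ zi, and can be imposed by rerunning the analysis; a sketch reusing the hypothetical `enkf_analysis` from the slide 8 sketch:

```python
import numpy as np

def regularize(Z, S, gamma, rng):
    """Impose the Tikhonov term gamma * ||z||^2_{S^{-1}} on the ensemble Z
    by a second analysis pass with H = I, d = 0, R = S / gamma."""
    n = Z.shape[0]
    return enkf_analysis(Z, np.eye(n), np.zeros(n), S / gamma, rng)
```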

SLIDE 14

Convergence

◮ Theorem. Every iteration of the Levenberg-Marquardt or the Gauss-Newton method with the stochastic linear solver by the EnKS converges in all Lp, 1 ≤ p < ∞, as N → ∞ and τ → 0, to the same iteration with the exact linear solver.

SLIDE 15

Functional interface

No matrices needed explicitly. Only certain matrix-vector operations.

◮ Evaluation of the model operator Mi(xi−1) for a given state xi−1.
◮ Evaluation of the observation operator Hi(xi) for a given state xi.
◮ Multiplication of a vector u0 by a square root V of the background covariance B, where B = VV^T. Similarly, multiplication of a vector ui by a square root of the model error covariance Qi, and of a data-dimensioned vector vi by a square root of the data error covariance Ri. These matrix roots are used only to generate samples from N(0, B), N(0, Qi), and N(0, Ri).
◮ Solving a system with the matrix Ri.
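Rendered as a Python protocol, the interface might look like this; the method names are illustrative, not taken from the authors' code.

```python
from typing import Protocol
import numpy as np

class FourDVarProblem(Protocol):
    """Functional interface of this slide: no matrices stored explicitly."""
    def model(self, i: int, x: np.ndarray) -> np.ndarray: ...           # M_i(x)
    def obs(self, i: int, x: np.ndarray) -> np.ndarray: ...             # H_i(x)
    def bg_sqrt(self, u: np.ndarray) -> np.ndarray: ...                 # V u, B = V V^T
    def model_err_sqrt(self, i: int, u: np.ndarray) -> np.ndarray: ...  # sqrt(Q_i) u
    def data_err_sqrt(self, i: int, v: np.ndarray) -> np.ndarray: ...   # sqrt(R_i) v
    def data_err_solve(self, i: int, v: np.ndarray) -> np.ndarray: ...  # R_i^{-1} v
```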

SLIDE 16

Computational results

The Lorenz 63 model:

$$\frac{dx}{dt} = -\sigma(x - y), \qquad \frac{dy}{dt} = \rho x - y - xz, \qquad \frac{dz}{dt} = xy - \beta z$$
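A sketch of the right-hand side; the classical chaotic parameter values σ = 10, ρ = 28, β = 8/3 are assumed, as the slide does not state them.

```python
from scipy.integrate import solve_ivp

def lorenz63(t, u, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz 63 system above."""
    x, y, z = u
    return [-sigma * (x - y), rho * x - y - x * z, x * y - beta * z]

# integrate one trajectory, e.g. as the truth of a twin experiment
sol = solve_ivp(lorenz63, (0.0, 10.0), [1.0, 1.0, 1.0], max_step=0.01)
```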

SLIDE 17

EnKS-4DVAR for Lorenz 63 model

Twin experiment: one solution chosen as the truth, with

$$B = \sigma_b^2\, \operatorname{diag}\left(1, \tfrac{1}{4}, \tfrac{1}{9}\right), \qquad R_i = \sigma_r^2 I, \qquad H_i(x, y, z) = \left(x^2, y^2, z^2\right).$$

Root mean square error of the EnKS-4DVAR iterations over 50 timesteps, ensemble size 100, model error covariances Qi ≈ 0:

Iteration   1      2      3     4     5     6
RMSE       20.16  15.37  3.73  2.53  0.09  0.09

SLIDE 18

Lorenz 96 model

$$\frac{dx_j}{dt} = \frac{1}{\kappa}\left(x_{j-1}\left(x_{j+1} - x_{j-2}\right) - x_j + F\right), \quad j = 1, \ldots, 40$$

with cyclic boundary conditions x−1 = x39, x0 = x40, x41 = x1 (Lorenz, 2006)

◮ behaves chaotically in the case κ = 1 and F = 8
◮ a model of the time evolution on a constant-latitude circle

[Figure: trajectory of the component x(20) over 100 time steps]
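A sketch of the right-hand side, with `np.roll` supplying the cyclic boundary conditions:

```python
import numpy as np

def lorenz96(x, F=8.0, kappa=1.0):
    """Lorenz 96 right-hand side for a state vector x of length 40;
    np.roll(x, 1)[j] = x[j-1] etc. implement the cyclic indexing."""
    return (np.roll(x, 1) * (np.roll(x, -1) - np.roll(x, 2)) - x + F) / kappa
```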

SLIDE 19

EnKS-4DVAR for Lorenz 96 model

◮ Twin experiment: one solution chosen as the truth. The whole state is observed.
◮ Background covariance B = σb² diag(1, 1/4, 1/9, ...)
◮ Data error covariance Ri = σr² I, observation operator Hi(Xi) = Xi + Xi²
◮ Model error covariance Qi = 0.1C, Cmn = exp(−5|m − n|), ensemble size N = 40.

[Figure: RMSE of the EnKS-4DVAR solution vs. time]

The RMSE of the EnKS-4DVAR solution remains bounded over 50 time steps, at most about 5%.

SLIDE 20

An example where Gauss-Newton does not converge

$$(x_0 - 2)^2 + \left(3 + x_1^2\right)^2 + 10^6 (x_0 - x_1)^2 \to \min$$

This is 4DVAR with xb = 2, B = I, M1 = I, H1(x) = x², y1 = 3, Q1 = 10⁻⁶. Thus, γ > 0 (regularization) is sometimes required.
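A quick numerical look at this example (an illustrative script: R1 = I is assumed, the starting point is the background, and γ is chosen ad hoc): run plain Gauss-Newton and a fixed-γ Levenberg-Marquardt iteration side by side and watch the cost.

```python
import numpy as np

def r(x):   # scaled residuals: cost = ||r(x)||^2
    return np.array([x[0] - 2.0, 3.0 + x[1] ** 2, 1000.0 * (x[0] - x[1])])

def Jr(x):  # Jacobian of r
    return np.array([[1.0, 0.0],
                     [0.0, 2.0 * x[1]],
                     [1000.0, -1000.0]])

x_gn = x_lm = np.array([2.0, 2.0])    # start from the background
gamma = 10.0                          # ad hoc regularization parameter
for it in range(20):
    z, *_ = np.linalg.lstsq(Jr(x_gn), -r(x_gn), rcond=None)  # Gauss-Newton step
    x_gn = x_gn + z
    J = Jr(x_lm)                      # Levenberg-Marquardt step, fixed gamma
    z = np.linalg.solve(J.T @ J + gamma * np.eye(2), -J.T @ r(x_lm))
    x_lm = x_lm + z
    print(it, np.sum(r(x_gn) ** 2), np.sum(r(x_lm) ** 2))
```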

SLIDE 21

Properties

4DVAR sets up a large nonlinear least squares problem.

Classical methods:
◮ In each iteration, sequential evaluations of the model, tangent, and adjoint operators, and solving a large linear least squares problem.
◮ Extra code for tangent and adjoint operators.
◮ Gauss-Newton iterations may not converge, not even locally.
◮ Difficult to parallelize: only inside the model.

The new method:
◮ Solve the linear least squares problem from 4DVAR by the EnKS, naturally parallel over the ensemble members.
◮ The linear algebra glue is cheap; use parallel dense libraries.
◮ Finite differences ⇒ no tangent and adjoint operators needed.
◮ Add Tikhonov regularization to the linear least squares ⇒ the Levenberg-Marquardt method with guaranteed convergence.
◮ Cheap and simple implementation of Tikhonov regularization within the EnKS as an additional observation.

SLIDE 22

To do

◮ Testing on large, realistic problems
◮ Localization in the EnKS (the standard sample covariance is low rank, with spurious terms)
◮ Adaptive control of the initial ensemble spread in each iteration
◮ Adaptive control of the regularization (the Levenberg-Marquardt parameter γ)
◮ More theory, especially nonlinear and Levenberg-Marquardt convergence

SLIDE 23

Related work

◮ The equivalence between weak constraint 4DVAR and Kalman smoothing is approximate for nonlinear problems, but still useful (Fisher et al., 2005).
◮ Hamill and Snyder (2000) estimated the background covariance from an ensemble for 4DVAR.
◮ Gradient methods in the span of the ensemble for one analysis cycle (i.e., 3DVAR) include Zupanski (2005), Sakov et al. (2012) (with a square root EnKF as a linear solver in a Newton method), and Bocquet and Sakov (2012), who added regularization and use an LETKF-like approach to minimize the nonlinear cost function over linear combinations of the ensemble.
◮ Liu et al. (2008, 2009) combine ensembles with (strong constraint) 4DVAR and minimize in the observation space.
◮ Zhang et al. (2009) use the EnKF to obtain the covariance for 4DVAR, and 4DVAR to feed the mean analysis into the EnKF.

SLIDE 24

References

  • B. Bell. The iterated Kalman smoother as a Gauss-Newton method. SIAM Journal on Optimization, 4(3):626–636, 1994. doi: 10.1137/0804035.

  • M. Bocquet and P. Sakov. Combining inflation-free and iterative ensemble Kalman filters for strongly nonlinear systems. Nonlinear Processes in Geophysics, 19(3):383–399, 2012. doi: 10.5194/npg-19-383-2012.

  • P. Courtier, J.-N. Thépaut, and A. Hollingsworth. A strategy for operational implementation of 4D-Var, using an incremental approach. Quarterly Journal of the Royal Meteorological Society, 120(519):1367–1387, 1994. doi: 10.1002/qj.49712051912.

  • G. Evensen and P. J. van Leeuwen. An ensemble Kalman smoother for nonlinear dynamics. Monthly Weather Review, 128(6):1852–1867, 2000. doi: 10.1175/1520-0493(2000)128<1852:AEKSFN>2.0.CO;2.

  • M. Fisher, M. Leutbecher, and G. A. Kelly. On the equivalence between Kalman smoothing and weak-constraint four-dimensional variational data assimilation. Quarterly Journal of the Royal Meteorological Society, 131(613):3235–3246, 2005. doi: 10.1256/qj.04.142.

  • T. M. Hamill and C. Snyder. A hybrid ensemble Kalman filter–3D variational analysis scheme. Monthly Weather Review, 128(8):2905–2919, 2000. doi: 10.1175/1520-0493(2000)128<2905:AHEKFV>2.0.CO;2.

  • C. J. Johns and J. Mandel. A two-stage ensemble Kalman filter for smooth data assimilation. Environmental and Ecological Statistics, 15:101–110, 2008. doi: 10.1007/s10651-007-0033-0.

SLIDE 25
  • F. Le Gland, V. Monbet, and V.-D. Tran. Large sample asymptotics for the ensemble Kalman filter. In D. Crisan and B. Rozovskii, editors, The Oxford Handbook of Nonlinear Filtering. Oxford University Press, 2011. Also INRIA Report 7014, August 2009.

  • C. Liu, Q. Xiao, and B. Wang. An ensemble-based four-dimensional variational data assimilation scheme. Part I: Technical formulation and preliminary test. Monthly Weather Review, 136(9):3363–3373, 2008. doi: 10.1175/2008MWR2312.1.

  • C. Liu, Q. Xiao, and B. Wang. An ensemble-based four-dimensional variational data assimilation scheme. Part II: Observing system simulation experiments with Advanced Research WRF (ARW). Monthly Weather Review, 137(5):1687–1704, 2009. doi: 10.1175/2008MWR2699.1.

  • E. N. Lorenz. Predictability – a problem partly solved. In T. Palmer and R. Hagendorn, editors, Predictability of Weather and Climate, pages 40–58. Cambridge University Press, 2006.

  • J. Mandel, L. Cobb, and J. D. Beezley. On the convergence of the ensemble Kalman filter. Applications of Mathematics, 56:533–541, 2011. doi: 10.1007/s10492-011-0031-2. Also arXiv:0901.2951, January 2009.

  • H. E. Rauch, F. Tung, and C. T. Striebel. Maximum likelihood estimates of linear dynamic systems. AIAA Journal, 3(8):1445–1450, 1965.

  • P. Sakov, D. S. Oliver, and L. Bertino. An iterative EnKF for strongly nonlinear systems. Monthly Weather Review, 140(6):1988–2004, 2012. doi: 10.1175/MWR-D-11-00176.1.

  • Y. Trémolet. Model-error estimation in 4D-Var. Quarterly Journal of the Royal Meteorological Society, 133(626):1267–1280, 2007. doi: 10.1002/qj.94.

  • F. Zhang, M. Zhang, and J. Hansen. Coupling ensemble Kalman filter with four-dimensional variational data assimilation. Advances in Atmospheric Sciences, 26(1):1–8, 2009. doi: 10.1007/s00376-009-0001-8.

SLIDE 26
  • M. Zupanski. Maximum likelihood ensemble filter: Theoretical aspects. Monthly Weather Review, 133(6):1710–1726, 2005. doi: 10.1175/MWR2946.1.