SLIDE 1

Preconditioning of conjugate-gradients in observation space for 4D-Var

S. Gratton^{1,2}, S. Gurol^2, Ph. L. Toint^3, J. Tshimanga^1

^1 ENSEEIHT, Toulouse, France   ^2 CERFACS, Toulouse, France   ^3 FUNDP, Namur, Belgium

Large scale problems and Applications in the Earth Sciences

SLIDE 2

Outline

1. Introduction
2. Algorithms for primal space
3. Working in the observation space: Krylov subspace methods; results on realistic systems; preconditioning; convergence properties
4. Conclusions

SLIDE 3

Introduction

4D-Var problem: Formulation

→ Large-scale nonlinear weighted least-squares problem:

min_{x ∈ R^n} f(x) = (1/2) ‖x − x_b‖²_{B^{-1}} + (1/2) Σ_{j=0}^{N} ‖H_j(M_j(x)) − y_j‖²_{R_j^{-1}}

where:
- x ∈ R^n is the control variable,
- the observations y_j and the background x_b are noisy,
- M_j are model operators,
- H_j are observation operators,
- B is the background-error covariance matrix,
- R_j are the observation-error covariance matrices.

SLIDE 4

Introduction

4D-Var problem: Formulation

→ Large-scale nonlinear weighted least-squares problem:

min_{x ∈ R^n} f(x) = (1/2) ‖x − x_b‖²_{B^{-1}} + (1/2) Σ_{j=0}^{N} ‖H_j(M_j(x)) − y_j‖²_{R_j^{-1}}

Typically solved by a standard Gauss-Newton method, known as Incremental 4D-Var in the data assimilation community:

1. Solve the linearized subproblem at iteration k (linesearch or trust-region techniques require this kind of step):

   min_{δx ∈ R^n} J(δx) = (1/2) ‖δx − [x_b − x]‖²_{B^{-1}} + (1/2) ‖H δx − d‖²_{R^{-1}}

   → a sequence of quadratic minimization problems.

2. Perform the update x^{(k+1)}(t_0) = x^{(k)}(t_0) + δx^{(k)}.
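To make the two-level structure concrete, here is a minimal numpy sketch of the incremental loop on a toy problem. The sizes, the quadratic observation operator `observe`, and the direct inner solve are illustrative assumptions only; operationally the inner quadratic subproblem is solved approximately by (preconditioned) CG, as discussed in the following slides.

```python
import numpy as np

# Toy sizes and operators (illustrative assumptions, not an operational setup).
n, m = 50, 20
rng = np.random.default_rng(0)
B = np.eye(n)                      # background-error covariance
R = 0.1 * np.eye(m)                # observation-error covariance
H = rng.standard_normal((m, n))

def observe(x):
    # Mildly nonlinear observation operator (hypothetical).
    return H @ x + 0.01 * (H @ x) ** 2

xt = rng.standard_normal(n)        # synthetic truth
y = observe(xt)                    # noiseless observations, for simplicity
xb = xt + 0.1 * rng.standard_normal(n)

x = xb.copy()
for k in range(5):                 # outer (Gauss-Newton) loop
    d = y - observe(x)             # innovation vector
    Hk = H + 0.02 * np.diag(H @ x) @ H        # Jacobian of observe at x
    # Inner loop: here the quadratic subproblem is solved exactly;
    # in practice this is done approximately by preconditioned CG.
    A = np.linalg.inv(B) + Hk.T @ np.linalg.inv(R) @ Hk
    rhs = np.linalg.inv(B) @ (xb - x) + Hk.T @ np.linalg.inv(R) @ d
    x = x + np.linalg.solve(A, rhs)
print("final observation misfit:", np.linalg.norm(y - observe(x)))
```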

SLIDE 5

Introduction

Quick note on globalization

We want to converge to a point where the gradient is zero, from any starting point. Technically, if the function is regular (gradient Lipschitz continuous, function bounded from below), an "approximate" gradient is all that is needed for a globally convergent algorithm. There are basically two techniques to obtain provable global convergence: linesearch and trust region.
- In linesearch, a steplength is controlled along a descent direction.
- In trust regions, a model is minimized inside a sphere; in the case of the Taylor model, truncated conjugate gradients (CG) do this.
In the rest of the talk, we focus on CG, and assume it is suitably truncated.

SLIDE 6

Introduction

4D-Var problem: Solution

From the optimality condition:

(B^{-1} + H^T R^{-1} H) δx = B^{-1}(x_b − x) + H^T R^{-1} d

The aim is to approximately solve sequences of this linear system. Solution algorithms: conjugate-gradient-like methods. The exact solution writes:

x_b − x_0 + (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} (d − H(x_b − x_0))

SLIDE 7

Algorithms for primal space

Preconditioned Lanczos algorithm (F^{1/2} is not required!)

For i = 1, 2, ..., l:
1. w_i = (B^{-1} + H^T R^{-1} H) z_i   → construction of the Krylov sequence
2. w_i = w_i − β_i v_{i−1}
3. α_i = ⟨w_i, z_i⟩   → orthogonalization
4. w_{i+1} = w_i − α_i v_i
5. z_{i+1} = F w_{i+1}   → apply the preconditioner
6. β_{i+1} = ⟨z_{i+1}, w_{i+1}⟩^{1/2}
7. v_{i+1} = w_{i+1}/β_{i+1}   → normalization
8. z_{i+1} = z_{i+1}/β_{i+1}
9. V = [V, v_{i+1}]   → orthonormal basis for the Krylov subspace
10. T_{i,i} = α_i; T_{i+1,i} = T_{i,i+1} = β_{i+1}   → generate the tridiagonal matrix

Solution:
1. y_l = T_l^{-1} β_0 e_1   → impose the condition r_k ⊥ K_l(A, r_0)
2. δx_l = F V_l y_l   → find the approximate solution
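A minimal numpy sketch of this loop, assuming callables for products with the SPD system matrix A = B^{-1} + H^T R^{-1} H and with the preconditioner F; only F itself is ever applied, never F^{1/2}. The helper name and the quick check at the end are illustrative.

```python
import numpy as np

def preconditioned_lanczos(A, F, b, l):
    # A, F: callables returning products with the SPD system matrix
    # and the preconditioner. Returns the approximate solution dx_l.
    beta0 = np.sqrt(b @ F(b))
    v = b / beta0                        # v_1
    z = F(b) / beta0                     # z_1 = F v_1
    V, Z = [v], [z]
    alphas, betas = [], []
    v_prev, beta = np.zeros_like(b), 0.0
    for i in range(l):
        w = A(z) - beta * v_prev         # Krylov sequence + recurrence
        alpha = w @ z                    # orthogonalization coefficient
        w = w - alpha * v
        z_next = F(w)                    # apply the preconditioner
        beta = np.sqrt(z_next @ w)
        alphas.append(alpha); betas.append(beta)
        v_prev, v, z = v, w / beta, z_next / beta
        V.append(v); Z.append(z)
    T = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
    rhs = np.zeros(l); rhs[0] = beta0
    y = np.linalg.solve(T, rhs)          # y_l = T_l^{-1} beta_0 e_1
    return np.column_stack(Z[:l]) @ y    # dx_l = F V_l y_l = Z_l y_l

# Quick check on a random SPD system with a Jacobi preconditioner:
rng = np.random.default_rng(1)
n = 30
Q = rng.standard_normal((n, n)); Amat = Q @ Q.T + n * np.eye(n)
bvec = rng.standard_normal(n)
dx = preconditioned_lanczos(lambda u: Amat @ u,
                            lambda u: u / np.diag(Amat), bvec, l=n)
print(np.linalg.norm(Amat @ dx - bvec))  # ~ rounding error after n steps
```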

SLIDE 8

Algorithms for primal space

Preconditioned CG algorithm (F^{1/2} is not required!)

Initialization: r_0 = b − A δx_0, z_0 = F r_0, p_0 = z_0
For i = 0, 1, ...
1. q_i = (B^{-1} + H^T R^{-1} H) p_i
2. α_i = ⟨r_i, z_i⟩ / ⟨q_i, p_i⟩   → compute the step length
3. δx_{i+1} = δx_i + α_i p_i   → update the iterate
4. r_{i+1} = r_i − α_i q_i   → update the residual
5. r_{i+1} = r_{i+1} − R Z^T r_{i+1}   → re-orthogonalization
6. z_{i+1} = F r_{i+1}   → update the preconditioned residual
7. β_i = ⟨r_{i+1}, z_{i+1}⟩ / ⟨r_i, z_i⟩   → ensure A-conjugate directions
8. R = [R, r_{i+1}/β_i]   → store for re-orthogonalization
9. Z = [Z, z_{i+1}/β_i]   → store for re-orthogonalization
10. p_{i+1} = z_{i+1} + β_i p_i   → update the descent direction

(In steps 5, 8 and 9, R and Z denote the matrices of stored residual pairs, not the observation-error covariance.)
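A sketch of this loop in numpy, assuming the stored pairs are normalized so that Zᵀ R = I (which is the role played by the normalization in steps 8-9); the helper name, the fixed iteration count and the Jacobi preconditioner in the check are illustrative.

```python
import numpy as np

def pcg_full_reortho(A, F, b, x0, iters):
    # A, F: callables for products with the system matrix and preconditioner.
    x = x0.copy()
    r = b - A(x)
    z = F(r)
    p = z.copy()
    R_store, Z_store = [], []          # stored residual pairs, Z^T R = I
    for i in range(iters):
        q = A(p)
        rz = r @ z
        alpha = rz / (q @ p)           # step length
        x = x + alpha * p              # update the iterate
        r_new = r - alpha * q          # update the residual
        s = np.sqrt(rz)
        R_store.append(r / s); Z_store.append(z / s)
        for rj, zj in zip(R_store, Z_store):
            r_new = r_new - rj * (zj @ r_new)   # re-orthogonalization
        z = F(r_new)
        beta = (r_new @ z) / rz        # ensures A-conjugate directions
        p = z + beta * p
        r = r_new
    return x

# Usage on a small SPD system:
rng = np.random.default_rng(8)
n = 40
Q = rng.standard_normal((n, n)); Amat = Q @ Q.T + n * np.eye(n)
bvec = rng.standard_normal(n)
x = pcg_full_reortho(lambda u: Amat @ u, lambda u: u / np.diag(Amat),
                     bvec, np.zeros(n), iters=n)
print(np.linalg.norm(Amat @ x - bvec))
```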

SLIDE 9

Algorithms for primal space

Inner minimization

Minimizing

J(δx_0) = (1/2) ‖δx_0 − [x_b − x_0]‖²_{B^{-1}} + (1/2) ‖H δx_0 − d‖²_{R^{-1}}

amounts to solving

(B^{-1} + H^T R^{-1} H) δx_0 = B^{-1}(x_b − x_0) + H^T R^{-1} d.

The exact solution writes

x_b − x_0 + (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} (d − H(x_b − x_0)),

or equivalently (using the Sherman-Morrison-Woodbury formula)

x_b − x_0 + B H^T (R + H B H^T)^{-1} (d − H(x_b − x_0)).
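The equivalence of the two expressions rests on the Sherman-Morrison-Woodbury identity (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} = B H^T (R + H B H^T)^{-1}; a quick numerical check on random SPD matrices (all sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 40, 10
H = rng.standard_normal((m, n))
Q = rng.standard_normal((n, n)); B = Q @ Q.T + n * np.eye(n)
S = rng.standard_normal((m, m)); R = S @ S.T + m * np.eye(m)

# Primal form: (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1}
primal = np.linalg.solve(np.linalg.inv(B) + H.T @ np.linalg.solve(R, H),
                         H.T @ np.linalg.inv(R))
# Dual form: B H^T (R + H B H^T)^{-1}
dual = B @ H.T @ np.linalg.inv(R + H @ B @ H.T)
print(np.max(np.abs(primal - dual)))   # agreement to rounding error
```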

SLIDE 10

Algorithms for primal space

Dual formulation: PSAS

1. Very popular when there are few observations compared to model variables.
2. Relies on

   x_b − x_0 + B H^T (R + H B H^T)^{-1} (d − H(x_b − x_0))

3. Iteratively solve

   (I + R^{-1} H B H^T) v = R^{-1}(d − H(x_b − x_0))

   for v.
4. Set δx_0 = x_b − x_0 + B H^T v.
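The computational point is the change of dimension: PSAS works with an m × m system instead of an n × n one. Below is a sketch with a direct solve standing in for the iteration (PSAS itself iterates, with CG in the R inner product); sizes and operators are illustrative, and the result is checked against the primal solution.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 400, 25                     # few observations vs. model variables
H = rng.standard_normal((m, n))
B = np.eye(n)
R = 0.1 * np.eye(m)
xb = rng.standard_normal(n); x0 = np.zeros(n)
d = rng.standard_normal(m)

rhs = np.linalg.solve(R, d - H @ (xb - x0))
A_dual = np.eye(m) + np.linalg.solve(R, H @ B @ H.T)   # m x m operator
v = np.linalg.solve(A_dual, rhs)                       # step 3 of the slide
dx0 = xb - x0 + B @ (H.T @ v)                          # step 4 of the slide

# Same solution from the primal n x n system:
A_prim = np.linalg.inv(B) + H.T @ np.linalg.solve(R, H)
b_prim = np.linalg.solve(B, xb - x0) + H.T @ np.linalg.solve(R, d)
print(np.linalg.norm(dx0 - np.linalg.solve(A_prim, b_prim)))
```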

SLIDE 11

Algorithms for primal space

Experiments

[Figure: quadratic cost function versus iteration count for the primal and dual solvers, RPCG and PSAS; size(B) = 1024, size(R) = 256.]

SLIDE 12

Algorithms for primal space

Motivation: PSAS and CG-like algorithms

1. CG minimizes the Incremental 4D-Var function during its iterations: it minimizes a quadratic approximation of the non-quadratic function (Gauss-Newton in the model space).
2. PSAS does not minimize the Incremental 4D-Var function during its iterations, but works in the observation space.

Our goal: combine the advantages of both approaches in a trust-region framework, to guarantee convergence:
- keeping the variational property, to get the so-called Cauchy decrease even when iterations are truncated;
- being computationally efficient whenever the number of observations is significantly smaller than the size of the state vector;
- getting global convergence in the observation space!

SLIDE 13

Algorithms for primal space

Dual CG algorithm: assumptions

1. Suppose the CG algorithm is applied to solve the incremental 4D-Var subproblem using a preconditioning matrix F.
2. Suppose there exists G ∈ R^{m×m} such that F H^T = B H^T G.

For "exact" preconditioners:

(B^{-1} + H^T R^{-1} H)^{-1} H^T = B H^T (I + R^{-1} H B H^T)^{-1}

SLIDE 14

Working in the observation space

Preconditioned CG on the Incremental 4D-Var cost function

The slide shows the same PCG loop twice: once with a generic matrix A, and once with A = H^T R^{-1} H + B^{-1} substituted in step 1.

Initialization steps
Loop: WHILE
1. q_{i−1} = A p_{i−1}, with A = H^T R^{-1} H + B^{-1}
2. α_{i−1} = r^T_{i−1} z_{i−1} / q^T_{i−1} p_{i−1}
3. v_i = v_{i−1} + α_{i−1} p_{i−1}
4. r_i = r_{i−1} + α_{i−1} q_{i−1}
5. z_i = F r_i
6. β_i = r^T_i z_i / r^T_{i−1} z_{i−1}
7. p_i = −z_i + β_i p_{i−1}

SLIDE 15

Working in the observation space

A useful observation

Suppose that
1. B H^T G = F H^T,
2. v_0 = x_b − x_0.

Then there exist coordinate vectors r̂_i, p̂_i, v̂_i, ẑ_i and q̂_i such that

r_i = H^T r̂_i,   p_i = B H^T p̂_i,   v_i = v_0 + B H^T v̂_i,   z_i = B H^T ẑ_i,   q_i = H^T q̂_i

SLIDE 16

Working in the observation space

Preconditioned CG on the Incremental 4D-Var cost function (bis)

Initialization steps: given v_0; r_0 = (H^T R^{-1} H + B^{-1}) v_0 − b, ...
Loop: WHILE
1. H^T q̂_{i−1} = H^T (R^{-1} H B H^T + I_m) p̂_{i−1}
2. α_{i−1} = r^T_{i−1} z_{i−1} / q^T_{i−1} p_{i−1}
3. B H^T v̂_i = B H^T (v̂_{i−1} + α_{i−1} p̂_{i−1})
4. H^T r̂_i = H^T (r̂_{i−1} + α_{i−1} q̂_{i−1})
5. B H^T ẑ_i = F H^T r̂_i = B H^T G r̂_i
6. β_i = r^T_i z_i / r^T_{i−1} z_{i−1}
7. B H^T p̂_i = B H^T (−ẑ_i + β_i p̂_{i−1})

SLIDE 17

Working in the observation space

Restricted PCG (version 1): expensive

Initialization steps: given v_0; r_0 = (H^T R^{-1} H + B^{-1}) v_0 − b, ...
Loop: WHILE
1. q̂_{i−1} = (I_m + R^{-1} H B H^T) p̂_{i−1}
2. α_{i−1} = r̂^T_{i−1} H B H^T ẑ_{i−1} / q̂^T_{i−1} H B H^T p̂_{i−1}
3. v̂_i = v̂_{i−1} + α_{i−1} p̂_{i−1}
4. r̂_i = r̂_{i−1} + α_{i−1} q̂_{i−1}
5. ẑ_i = G r̂_i   (from F H^T = B H^T G)
6. β_i = r̂^T_i H B H^T ẑ_i / r̂^T_{i−1} H B H^T ẑ_{i−1}
7. p̂_i = −ẑ_i + β_i p̂_{i−1}

SLIDE 18

Working in the observation space

More transformations

1. Consider w and t defined by w_i = H B H^T ẑ_i and t_i = H B H^T p̂_i.
2. From Restricted PCG (version 1):
   t_i = −w_0 if i = 0,   t_i = −w_i + β_i t_{i−1} if i > 0.
3. Use these relations in Restricted PCG (version 1).
4. This transforms Restricted PCG (version 1) into Restricted PCG (version 2).

SLIDE 19

Working in the observation space

Restricted PCG (version 2): the right inner product!

Initialization steps
Loop: WHILE
1. q̂_{i−1} = R^{-1} t_{i−1} + p̂_{i−1}
2. α_{i−1} = w^T_{i−1} r̂_{i−1} / q̂^T_{i−1} t_{i−1}
3. v̂_i = v̂_{i−1} + α_{i−1} p̂_{i−1}
4. r̂_i = r̂_{i−1} + α_{i−1} q̂_{i−1}
5. ẑ_i = G r̂_i
6. w_i = H B H^T ẑ_i
7. β_i = w^T_i r̂_i / w^T_{i−1} r̂_{i−1}
8. p̂_i = −ẑ_i + β_i p̂_{i−1}
9. t_i = −w_i + β_i t_{i−1}
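A runnable sketch of version 2, assuming callables for products with H B Hᵀ, R^{-1} and the dual preconditioner G, and initialized so that the iterates mirror primal PCG started from v_0 = x_b − x_0; the initialization details and the check below are illustrative. Note the single H B Hᵀ product per iteration.

```python
import numpy as np

def rpcg(HBHt, Rinv, G, innov, iters):
    # HBHt, Rinv, G: callables for products with H B H^T, R^{-1}, G.
    # innov = d - H(xb - x0). Returns vhat with dx = xb - x0 + B H^T vhat.
    vhat = np.zeros_like(innov)
    rhat = -Rinv(innov)            # dual residual at v0 = xb - x0
    zhat = G(rhat)
    w = HBHt(zhat)                 # the single H B H^T product per iteration
    phat = -zhat
    t = -w                         # t_0 = H B H^T phat_0
    for i in range(iters):
        qhat = Rinv(t) + phat
        alpha = (w @ rhat) / (qhat @ t)
        vhat = vhat + alpha * phat
        rhat_new = rhat + alpha * qhat
        zhat = G(rhat_new)
        w_new = HBHt(zhat)
        beta = (w_new @ rhat_new) / (w @ rhat)
        phat = -zhat + beta * phat
        t = -w_new + beta * t
        rhat, w = rhat_new, w_new
    return vhat

# Check against the exact primal solution on a small random problem:
rng = np.random.default_rng(4)
n, m = 60, 12
H = rng.standard_normal((m, n)); B = np.eye(n); R = 0.5 * np.eye(m)
xb = rng.standard_normal(n); x0 = np.zeros(n); d = rng.standard_normal(m)
vh = rpcg(lambda u: H @ B @ (H.T @ u), lambda u: np.linalg.solve(R, u),
          lambda u: u, d - H @ (xb - x0), iters=m)   # G = I_m here
dx = xb - x0 + B @ (H.T @ vh)
A_p = np.linalg.inv(B) + H.T @ np.linalg.solve(R, H)
b_p = np.linalg.solve(B, xb - x0) + H.T @ np.linalg.solve(R, d)
print(np.linalg.norm(dx - np.linalg.solve(A_p, b_p)))  # small after m steps
```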

SLIDE 20

Working in the observation space

Dual approach: RPCG and PSAS algorithms

Initialization: λ̂_0 = 0, r̂_0 = R^{-1}(d − H(x_b − x)), ẑ_0 = G r̂_0, p̂_1 = ẑ_0, k = 1
Loop on k:
1. q̂_i = Â p̂_i
2. α_i = ⟨r̂_{i−1}, ẑ_{i−1}⟩_M / ⟨q̂_i, p̂_i⟩_M
3. λ̂_i = λ̂_{i−1} + α_i p̂_i
4. r̂_i = r̂_{i−1} − α_i q̂_i
5. β_i = ⟨r̂_{i−1}, ẑ_{i−1}⟩_M / ⟨r̂_{i−2}, ẑ_{i−2}⟩_M
6. ẑ_i = G r̂_i
7. p̂_i = ẑ_{i−1} + β_i p̂_{i−1}

with Â = R^{-1} H B H^T + I_m.

G is the preconditioner; M defines the inner product.
- PSAS algorithm: M = R → cheap matvec.
- RPCG algorithm: M = H B H^T → expensive matvec (a model integration is required).
G should be symmetric with respect to M.

SLIDE 21

Working in the observation space

Dual approach: Preconditioned RLanczos algorithm and PSAS

For i = 1, 2, ..., l:
1. ŵ_i = (I + R^{-1} H B H^T) ẑ_i
2. ŵ_i = ŵ_i − β_i v̂_{i−1}
3. α_i = ⟨ŵ_i, ẑ_i⟩_M
4. ŵ_{i+1} = ŵ_i − α_i v̂_i
5. ẑ_{i+1} = G ŵ_{i+1}
6. β_{i+1} = ⟨ẑ_{i+1}, ŵ_{i+1}⟩_M^{1/2}
7. v̂_{i+1} = ŵ_{i+1}/β_{i+1}
8. ẑ_{i+1} = ẑ_{i+1}/β_{i+1}
9. V̂ = [V̂, v̂_{i+1}]
10. T_{i,i} = α_i; T_{i+1,i} = T_{i,i+1} = β_{i+1}

Solution:
1. y_l = T_l^{-1} β_0 e_1
2. δx_l = B H^T V̂_l y_l

Here M = H B H^T and G is the preconditioner. When M = R, the iterates are mathematically equivalent to those of PSAS.

SLIDE 22

Working in the observation space

Dual approach: Primal equivalent algorithms

Assume that r_0 ∈ range(H^T) and F H^T = B H^T G, where F is the preconditioner in primal space and G the preconditioner in dual space. Then RLanczos, RPCG, CONGRAD, and the PCG and Lanczos methods in the primal space are all mathematically equivalent to each other.

MINRES is not equivalent: it minimizes the Euclidean norm of the gradient of the quadratic.

SLIDE 23

Working in the observation space

Krylov subspace methods

(B^{-1} + H^T R^{-1} H) δx = B^{-1}(x_b − x) + H^T R^{-1} d,   i.e.   A δx = b

Krylov subspace methods search for an approximate solution δx_l in δx_0 + K_l(A, r_0), where

K_l(A, r_0) = span{r_0, A r_0, A² r_0, ..., A^{l−1} r_0},   r_0 = b − A δx_0,

and impose the Petrov-Galerkin condition r_k ⊥ L_l(A, r_0). Here A is symmetric and positive definite.
- L_l(A, r_0) = K_l(A, r_0) → Lanczos, Conjugate Gradients (CG); FOM in the unsymmetric case (for further reference); minimizes ‖r_k‖_{A^{-1}}.
- L_l(A, r_0) = A K_l(A, r_0) → MINRES; minimizes ‖r_k‖_2.

SLIDE 24

Working in the observation space

Results for ROMS

Observations: SST (Sea Surface Temperature) and SSH (Sea Surface Height) observations from satellites; sub-surface hydrographic observations from floats.

Number of observations (m): 10^5. Number of state variables (n): 10^6 for strong constraint and 10^7 for weak constraint. Computation: 64 CPUs.

SLIDE 25

Working in the observation space

Results for 3D-Var FGAT NEMOVAR

Observations: temperature, unbalanced salinity, unbalanced sea surface height. Number of observations (m): 2 × 10^5. Number of state variables (n): 8 × 10^6. Computation: 8 processors.

SLIDE 26

Working in the observation space

More on preconditioning

Use an approximation of the Hessian of the quadratic problem → limited-memory preconditioning (Fisher (1998); Morales and Nocedal (2000); Tshimanga, Gratton, Sartenaer, Weaver (2008)).

The idea is:
1. Formulate the limited-memory quasi-Newton matrix.
2. Generate the preconditioner using information from the CG iterations.

For equivalence with the primal method, find G that satisfies F H^T = B H^T G for a given F. For now, assume that H does not change between outer loops.

SLIDE 27

Working in the observation space

G as a quasi-Newton warm-start preconditioner: Gilbert, Lemaréchal, Nocedal, Byrd, Zhu

Formulation of F as a quasi-Newton limited-memory preconditioner:

F_{k+1} = (I − τ_k p_k q_k^T) F_k (I − τ_k q_k p_k^T) + τ_k p_k p_k^T

where p_k is the search direction, τ_k = 1/(q_k^T p_k), and q_k = (B^{-1} + H^T R^{-1} H) p_k. ΔF_k, defined by ΔF_k = F_{k+1} − F_k, is the solution to the problem

min_{ΔF_k} ‖W^{1/2} ΔF_k W^{1/2}‖_F   subject to ΔF_k = ΔF_k^T,   F_{k+1} q_k = p_k.

Formulation of G as a quasi-Newton limited-memory preconditioner:

G_{k+1} = (I − τ_k p̂_k (M q̂_k)^T) G_k (I − τ_k q̂_k p̂_k^T M) + τ_k p̂_k p̂_k^T M

where M = H B H^T, p̂_k is the search direction, q̂_k = (I_m + R^{-1} H B H^T) p̂_k, and τ_k = 1/(q̂_k^T H B H^T p̂_k). ΔG_k, defined by ΔG_k = G_{k+1} − G_k, is the solution to the problem

min_{ΔG_k} ‖(W M)^{1/2} ΔG_k (M^{-1} W)^{1/2}‖_F   subject to M ΔG_k = ΔG_k^T M,   G_{k+1} q̂_k = p̂_k.
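A sketch of applying the dual update to a vector, recursively over stored pairs; G_0 = I_m, the pair list and the callable M are illustrative assumptions.

```python
import numpy as np

def apply_dual_lmp(pairs, M, r):
    # pairs: list of (phat, qhat) collected during earlier iterations.
    # M: callable for products with H B H^T. Applies G_k to r, with G_0 = I.
    def apply_G(k, v):
        if k == 0:
            return v                            # G_0 = I_m (assumption)
        phat, qhat = pairs[k - 1]
        tau = 1.0 / (qhat @ M(phat))            # tau_k = 1/(qhat^T M phat)
        Mv = M(v)
        v1 = v - tau * qhat * (phat @ Mv)       # (I - tau qhat phat^T M) v
        v2 = apply_G(k - 1, v1)                 # G_{k-1} applied recursively
        v3 = v2 - tau * phat * (M(qhat) @ v2)   # (I - tau phat (M qhat)^T) ...
        return v3 + tau * phat * (phat @ Mv)    # + tau phat phat^T M v
    return apply_G(len(pairs), r)

# The secant condition G_{k+1} qhat_k = phat_k holds by construction:
rng = np.random.default_rng(9)
mdim = 6
Mmat = np.diag(1.0 + rng.random(mdim))          # SPD stand-in for H B H^T
pairs = [(rng.standard_normal(mdim), rng.standard_normal(mdim))
         for _ in range(3)]
ph, qh = pairs[-1]
print(np.linalg.norm(apply_dual_lmp(pairs, lambda u: Mmat @ u, qh) - ph))
```

In practice the products M p̂_k and M q̂_k would be stored alongside the pairs rather than recomputed; the recursion is written out here only for clarity.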

SLIDE 28

Working in the observation space

Computationally efficient RPCG algorithm using the quasi-Newton preconditioner

Loop: WHILE
1. q̂_{i−1} = R^{-1} t_{i−1} + p̂_{i−1}
2. α_{i−1} = w^T_{i−1} r̂_{i−1} / q̂^T_{i−1} t_{i−1}
3. λ̂_i = λ̂_{i−1} + α_{i−1} p̂_{i−1}
4. r̂_i = r̂_{i−1} + α_{i−1} q̂_{i−1}
5. l_i = H B H^T r̂_i
6. ẑ_i = G r̂_i
7. w_i = G^T l_i
8. β_i = w^T_i r̂_i / w^T_{i−1} r̂_{i−1}
9. p̂_i = −ẑ_i + β_i p̂_{i−1}
10. t_i = −w_i + β_i t_{i−1}
11. M q̂_{i−1} = (l_i − l_{i−1})/α_{i−1}

Key relations:
1. A new vector l is defined by l_i = H B H^T r̂_i.
2. ẑ_i = G r̂_i and w_i = H B H^T ẑ_i.
3. H B H^T G is symmetric (H F H^T = H B H^T G), so w_i = H B H^T G r̂_i = G^T H B H^T r̂_i = G^T l_i.
4. Multiplying the expression r̂_i = r̂_{i−1} + α_i q̂_i by H B H^T gives H B H^T q̂_i = (l_i − l_{i−1})/α_i.

SLIDE 29

Working in the observation space

Convergence Properties

If F A has eigenvalues μ_1 ≤ μ_2 ≤ ... ≤ μ_n, the PCG algorithm satisfies the inequality

‖x_{k+1} − x_*‖_A ≤ 2 ( (√μ_n − √μ_1) / (√μ_n + √μ_1) )^k ‖x_0 − x_*‖_A.

If G Â has eigenvalues ν_1 ≤ ν_2 ≤ ... ≤ ν_m, RPCG satisfies the inequality

‖x_{k+1} − x_*‖_A ≤ 2 ( (√ν_m − √ν_1) / (√ν_m + √ν_1) )^k ‖x_0 − x_*‖_A ≤ 2 ( (√μ_n − √μ_1) / (√μ_n + √μ_1) )^k ‖x_0 − x_*‖_A.

Same iterates, but a tighter bound on the convergence rate with the dual approach. The improvement in tightness can be arbitrarily large on purposely chosen problems.
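A small numeric illustration of why the dual bound can be tighter, under the illustrative assumptions F = B (so F A = I_n + B Hᵀ R^{-1} H) and G = I_m (so G Â = I_m + R^{-1} H B Hᵀ): the two operators share their nontrivial eigenvalues, but F A additionally carries n − m eigenvalues equal to 1, which can widen the spectral interval entering the primal bound.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 50, 5
H = rng.standard_normal((m, n))
B = np.eye(n); R = 0.1 * np.eye(m)

FA = np.eye(n) + B @ H.T @ np.linalg.solve(R, H)   # F = B (primal operator)
GA = np.eye(m) + np.linalg.solve(R, H @ B @ H.T)   # G = I_m (dual operator)
mu = np.sort(np.linalg.eigvals(FA).real)           # n eigenvalues, many = 1
nu = np.sort(np.linalg.eigvals(GA).real)           # only m eigenvalues

rate = lambda lo, hi: (np.sqrt(hi) - np.sqrt(lo)) / (np.sqrt(hi) + np.sqrt(lo))
print("primal bound factor:", rate(mu[0], mu[-1])) # uses mu_1 = 1
print("dual bound factor:  ", rate(nu[0], nu[-1])) # nu_1 > 1: tighter
```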

SLIDE 30

Working in the observation space

When H changes!

When H changes between nonlinear iterations, F H^T = B H^T G is no longer satisfied. The preconditioner is then no longer symmetric with respect to H B H^T, and the perturbed CG is in trouble.

Expression for G:

G_{k+1} = (I − τ_k p̂_k (M q̂_k)^T) G_k (I − τ_k q̂_k p̂_k^T M) + τ_k p̂_k p̂_k^T M

with M = H B H^T, τ_k = 1/(q̂_k^T H B H^T p̂_k), and q̂_k = (I_m + R^{-1} H B H^T) p̂_k.

SLIDE 31

Working in the observation space

Solutions (1/2)

- Straightforward approach: re-generate G using the recent p̂_k and H B H^T; costs one matvec per preconditioning pair.
- Accept to handle the non-symmetry: use the FOM algorithm.

SLIDE 32

Working in the observation space

Solutions (2/2)

Use FOM with the quasi-Newton preconditioner G, where an approximation of M is used. The approximation uses the Davidon-Fletcher-Powell (DFP) formula.

SLIDE 33

Working in the observation space

Towards further reduction of the cost

We have shown that RPCG reduces memory and computational cost whenever the number of observations is smaller than the size of the control vector. Similar results are possible with other Krylov methods (GMRES, FOM, ...). The question now is: can we reduce the cost further? A possible answer: inexact (cheap) matrix-vector products (truncated B^{-1}, R^{-1}, simplified models, ...) (Simoncini and Szyld; van den Eshof and Sleijpen; Giraud, Gratton and Langou; ...). → But a stable modification of RPCG is needed.

SLIDE 34

Working in the observation space

The Arnoldi process

Define (in the full space) A = I_n + B H^T R^{-1} H and set K^T = B H^T, L = R^{-1} H, so that A = I_n + K^T L. Consider the successive nested Krylov subspaces generated by the sequence

b, (γ I_n + K^T L) b, (γ I_n + K^T L)² b, (γ I_n + K^T L)³ b, ...   (1)

or, equivalently, by

b, (K^T L) b, (K^T L)² b, (K^T L)³ b, ...   (2)

The Arnoldi process generates an orthonormal basis of each of these subspaces, i.e. a set of vectors {v_i}_{i=1}^{k+1} with v_1 = b/‖b‖ such that, after k steps,

K^T L V_k = V_{k+1} H̄_k,   (3)

where V_k ≡ [v_1, ..., v_k] and H̄_k is a (k+1) × k upper-Hessenberg matrix.
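A compact numpy sketch of the Arnoldi process with modified Gram-Schmidt, checked on relation (3) for random K and L; the helper name and sizes are illustrative, and breakdown handling is omitted.

```python
import numpy as np

def arnoldi(matvec, b, k):
    # Build V_{k+1} (orthonormal columns) and the (k+1) x k Hessenberg
    # matrix Hbar such that matvec(V_k) = V_{k+1} @ Hbar.
    n = b.shape[0]
    V = np.zeros((n, k + 1))
    Hbar = np.zeros((k + 1, k))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        w = matvec(V[:, j])
        for i in range(j + 1):                 # modified Gram-Schmidt
            Hbar[i, j] = V[:, i] @ w
            w = w - Hbar[i, j] * V[:, i]
        Hbar[j + 1, j] = np.linalg.norm(w)     # breakdown check omitted
        V[:, j + 1] = w / Hbar[j + 1, j]
    return V, Hbar

# Check relation (3), K^T L V_k = V_{k+1} Hbar_k, on random K and L:
rng = np.random.default_rng(6)
n, m, k = 30, 8, 6
K = rng.standard_normal((m, n)); L = rng.standard_normal((m, n))
b = rng.standard_normal(n)
V, Hbar = arnoldi(lambda v: K.T @ (L @ v), b, k)
print(np.linalg.norm(K.T @ (L @ V[:, :k]) - V @ Hbar))  # ~ rounding error
```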

SLIDE 35

Working in the observation space

Related methods: GMRES, MINRES, FOM, CG

Depending on how the matrix H̄_k is exploited to solve the problem, we have:
- The GMRES algorithm (≡ MINRES for K = L):
  y_k = arg min_y ‖H̄_k y − β_1 e_1‖,   s_k = V_k y_k
- The FOM algorithm (≡ CG for K = L):
  H_k y = β_1 e_1,   s_k = V_k y_k
where H_k is the leading k × k submatrix of H̄_k.

GMRES (FOM) use long recurrences, while MINRES (CG) use short ones. Let

r_k = (γI + K^T K) V_k y_k − b   and   f_k = (1/2) y_k^T V_k^T (γI + K^T K) V_k y_k − b^T V_k y_k.

→ GMRES and MINRES monotonically minimize ‖r_k‖, while FOM and CG monotonically minimize f_k along the iterations.
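The two extraction rules differ only in how the same Hessenberg data is used; a sketch assuming the `arnoldi` helper from the previous block is in scope, with γ = 1 and K = L (both choices illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n, m, k = 30, 8, 8
Kmat = rng.standard_normal((m, n))
A = lambda v: v + Kmat.T @ (Kmat @ v)   # gamma*I + K^T K, K = L, gamma = 1
b = rng.standard_normal(n)

V, Hbar = arnoldi(A, b, k)              # from the previous sketch
e1 = np.zeros(k + 1); e1[0] = np.linalg.norm(b)      # beta_1 e_1

y_gmres, *_ = np.linalg.lstsq(Hbar, e1, rcond=None)  # least-squares: GMRES
y_fom = np.linalg.solve(Hbar[:k, :], e1[:k])         # square solve: FOM

for name, y in [("GMRES", y_gmres), ("FOM", y_fom)]:
    s = V[:, :k] @ y
    print(name, "residual norm:", np.linalg.norm(A(s) - b))
```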

SLIDE 36

Working in the observation space

Range-space GMRES and FOM (RSGMR and RSFOM)

Just as CG may be rewritten in the observation space to yield RPCG, the GMRES, MINRES and FOM algorithms may be rewritten to yield similar variants.

Why a range-space GMRES and FOM (RSGMR and RSFOM)?
- The FOM setting provides better accuracy and is much better suited to the use of inexact matrix-vector products.
- The cost of storing an orthonormal basis of the successive Krylov spaces is much lower for range-space methods than for full-space ones.

SLIDE 37

Working in the observation space

Exact and inexact products: FOM vs CG

Is CG a reasonable framework for inexact products?

[Figure: ‖r_k‖/(‖A‖ ‖s_∗‖) for FOM, CG with reorthogonalization, and CG, with exact (left) and inexact (right) products (τ = 10^{-9}, κ ≈ 10^6).]

SLIDE 38

Working in the observation space

Stability and convergence with inexact products

We want to bound ‖r_k‖ in the context of the Arnoldi process under inexact matrix-vector products. Some reasons to consider this question:
- The inexact nature of computer arithmetic implies that such errors are inevitable.
- To allow matrix-vector products in an inexact but cheaper form.
Note that the analysis is for GMRES, but similar conclusions hold in the context of FOM. Standard CG and MINRES are no longer equivalent to FOM and GMRES in the context of unsymmetric perturbations.

SLIDE 39

Working in the observation space

Two error models

Assume that at each iteration i the product by K, K^T or L is inexact, that is,

L_i = L + E_{L,i},   K^T_i = K^T + E_{K^T,i},   K_i = K + E_{K,i},

for some errors E_{L,i}, E_{K^T,i} and E_{K,i}. Consider two error models to describe the inaccuracy in the matrix-vector products.

Backward:
‖E_{K,i+1}‖ ≤ τ_{K,i+1} ‖K‖,   ‖E_{K^T,i+1}‖ ≤ τ_{K^T,i+1} ‖K‖,   ‖E_{L,i+1}‖ ≤ τ_{L,i+1} ‖L‖,   ‖E_{K^T,∗}‖ ≤ τ_∗ ‖K‖

Forward:
‖E_{K,i+1} u_n‖ ≤ τ_{K,i+1} ‖K‖ ‖u_n‖,   ‖E_{K^T,i+1} u_m‖ ≤ τ_{K^T,i+1} ‖K‖ ‖u_m‖,   ‖E_{L,i+1} u_n‖ ≤ τ_{L,i+1} ‖L‖ ‖u_n‖,   ‖E_{K^T,∗} u_m‖ ≤ τ_∗ ‖K‖ ‖u_m‖

SLIDE 40

Working in the observation space

Results for the backward error model

Define q_k = H̄_k y_k − β e_1, G = max[‖K‖, ‖L‖], ω_k = max_{i=1,...,k} ‖v̂_i‖, and let κ(K) denote the condition number of K. (... after some analysis ...)

Theorem. Assume the backward-error model. Then

‖r_k‖ ≤ √(2(k+1)) [ ‖q_k‖ + ‖K‖ ω_k ( τ_∗ γ √k ‖y_k‖ + 4 G² Σ_{i=1}^{k} |[y_k]_i| τ_i ) ]
      ≤ √(2(k+1)) [ ‖q_k‖ + τ_max κ(K) (γ + 4 G²) ‖y_k‖ ],

where τ_max = max[τ_1, ..., τ_k].

SLIDE 41

Working in the observation space

Results for the forward error model

Theorem. Assume the forward-error model. Then

‖r_k‖ ≤ √(2(k+1)) [ ‖q_k‖ + √2 ( τ_∗ γ √k ‖y_k‖ + 4 G ‖K‖ Σ_{i=1}^{k} |[y_k]_i| τ_i ) ]
      ≤ √(2(k+1)) [ ‖q_k‖ + τ_max (γ + 4 G ‖K‖) ‖y_k‖ ].

Note, for both sets of bounds: the first bound allows for variable accuracy requirements, with a special role for τ_∗.

SLIDE 42

Working in the observation space

Error models (1)

Is the error model important? (ε = 10^{-5}, κ ≈ 10^2)

[Figure: backward error model (left) vs. forward error model (right); normalized ‖r_k‖, normalized ‖q_k‖, and the accuracy threshold τ are plotted.]

SLIDE 43

Working in the observation space

Error models (2)

Yes, it can definitely make the difference (ε = 10^{-5}, κ ≈ 10^9).

[Figure: backward error model (left) vs. forward error model (right); normalized ‖r_k‖, normalized ‖q_k‖, and the accuracy threshold τ are plotted.]

SLIDE 44

Working in the observation space

Fixed vs variable accuracy threshold (1)

Can we use variable accuracy thresholds efficiently? (ε = 10^{-5}, κ ≈ 10^2)

[Figure: fixed τ (left) vs. τ ≈ 1/‖q_k‖ (right); normalized ‖r_k‖, normalized ‖q_k‖, and the accuracy threshold τ are plotted.]

SLIDE 45

Working in the observation space

Fixed vs variable accuracy threshold (2)

Maybe... it is not obvious. (ε = 10^{-5}, κ ≈ 10^2)

[Figure: fixed τ (left) vs. τ ≈ 1/‖q_k‖ (right); normalized ‖r_k‖, normalized ‖q_k‖, and the accuracy threshold τ are plotted.]

SLIDE 46

Conclusions

Conclusions

- We have dual-space methods, RPCG and RLanczos, that generate the same iterates as PCG and Lanczos in primal space.
- Further gains may be obtained from inexact products.
- The B^{1/2} operator is not required with the proposed solvers for B preconditioning.
- RPCG and RLanczos were implemented in realistic systems: NEMOVAR (thanks to Anthony Weaver and Andrea Piacentini) and ROMS (thanks to Andy Moore).
- Preconditioning is possible: find G such that F H^T = B H^T G.

SLIDE 47

Conclusions

Thank you for your attention!