Parallel Algorithms for Four-Dimensional Variational Data Assimilation - PowerPoint PPT Presentation

SLIDE 1

Parallel Algorithms for Four-Dimensional Variational Data Assimilation

Mike Fisher

ECMWF

October 24, 2011

SLIDE 2

Brief Introduction to 4D-Var

Four-Dimensional Variational Data Assimilation

4D-Var is the method used by most national and international Numerical Weather Forecasting Centres to provide initial conditions for their forecast models. 4D-Var combines observations with a prior estimate of the state, provided by an earlier forecast. The method is described as four-dimensional because it takes into account observations that are distributed in space and over an interval of time (typically 6 or 12 hours), often called the analysis window.

It does this by using a complex and computationally expensive numerical model to propagate information in time. In many applications of 4D-Var, the model is assumed to be perfect. In this talk, I will concentrate on so-called weak-constraint 4D-Var, which takes into account imperfections in the model.

SLIDE 3

Brief Introduction to 4D-Var

Weak-Constraint 4D-Var represents the data-assimilation problem as a very large least-squares problem:

$$J(x_0, x_1, \ldots, x_N) = \tfrac{1}{2}(x_0 - x_b)^T B^{-1}(x_0 - x_b) + \tfrac{1}{2}\sum_{k=0}^{N} (y_k - H_k(x_k))^T R_k^{-1} (y_k - H_k(x_k)) + \tfrac{1}{2}\sum_{k=1}^{N} (q_k - \bar q)^T Q_k^{-1} (q_k - \bar q)$$

where $q_k = x_k - M_k(x_{k-1})$. Here, the cost function J is a function of the states $x_0, x_1, \ldots, x_N$ defined at the start of each of a set of sub-windows that span the analysis window.
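To make the structure concrete, here is a minimal sketch of evaluating this cost function. It is my own illustration, not ECMWF code: the callables M and H stand in for the model and observation operators, and the arrays Binv, Rinv[k] and Qinv[k] (the latter defined for k = 1..N) stand in for the inverse covariances.

```python
def weak_constraint_cost(x, xb, y, qbar, M, H, Binv, Rinv, Qinv):
    """Evaluate J(x_0, ..., x_N) for the weak-constraint problem.
    x and y are lists of numpy vectors; M(k, v) applies M_k, H(k, v) applies H_k."""
    N = len(x) - 1
    Jb = 0.5 * (x[0] - xb) @ Binv @ (x[0] - xb)              # background term
    Jo = 0.5 * sum((y[k] - H(k, x[k])) @ Rinv[k] @ (y[k] - H(k, x[k]))
                   for k in range(N + 1))                    # observation term
    Jq = 0.5 * sum(((x[k] - M(k, x[k - 1])) - qbar) @ Qinv[k]
                   @ ((x[k] - M(k, x[k - 1])) - qbar)
                   for k in range(1, N + 1))                 # model-error term
    return Jb + Jo + Jq
```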

SLIDE 4

Brief Introduction to 4D-Var

$$J(x_0, x_1, \ldots, x_N) = \tfrac{1}{2}(x_0 - x_b)^T B^{-1}(x_0 - x_b) + \tfrac{1}{2}\sum_{k=0}^{N} (y_k - H_k(x_k))^T R_k^{-1} (y_k - H_k(x_k)) + \tfrac{1}{2}\sum_{k=1}^{N} (q_k - \bar q)^T Q_k^{-1} (q_k - \bar q)$$

Each $x_i$ contains ≈ 10⁷ elements. Each $y_i$ contains ≈ 10⁵ elements. The operators $H_k$ and $M_k$, and the matrices $B$, $R_k$ and $Q_k$, are represented by codes that apply them to vectors. We do not have access to their elements.

SLIDE 5

Brief Introduction to 4D-Var

$$J(x_0, x_1, \ldots, x_N) = \tfrac{1}{2}(x_0 - x_b)^T B^{-1}(x_0 - x_b) + \tfrac{1}{2}\sum_{k=0}^{N} (y_k - H_k(x_k))^T R_k^{-1} (y_k - H_k(x_k)) + \tfrac{1}{2}\sum_{k=1}^{N} (q_k - \bar q)^T Q_k^{-1} (q_k - \bar q)$$

$H_k$ and $M_k$ involve integrations of the numerical model, and are computationally expensive. The covariance matrices $B$, $R_k$ and $Q_k$ are less expensive to apply.

SLIDE 6

Brief Introduction to 4D-Var

[Figure: the analysis window divided into sub-windows along the time axis, showing the states x0, x1, x2, x3 and the model-error terms q1, q2, q3.]

SLIDE 7

Brief Introduction to 4D-Var

[Figure: the analysis window divided into sub-windows along the time axis, showing the states x0, x1, x2, x3, the model-error terms q1, q2, q3, and observations as blue circles.]

The cost function contains 3 terms:

  • $\tfrac{1}{2}(x_0 - x_b)^T B^{-1}(x_0 - x_b)$ ensures that $x_0$ stays close to the prior estimate.
  • $\tfrac{1}{2}\sum_{k=0}^{N}(y_k - H_k(x_k))^T R_k^{-1}(y_k - H_k(x_k))$ keeps the estimate close to the observations (blue circles).
  • $\tfrac{1}{2}\sum_{k=1}^{N}(q_k - \bar q)^T Q_k^{-1}(q_k - \bar q)$ keeps the jumps between sub-windows small.

SLIDE 8

Gauss-Newton (Incremental) Algorithm

It is usual to minimize the cost function using a modified Gauss-Newton algorithm. Nearly all the computational cost is in the inner loop, which minimizes the quadratic cost function:

$$\hat J(\delta x_0^{(n)}, \ldots, \delta x_N^{(n)}) = \tfrac{1}{2}\left(\delta x_0 - b^{(n)}\right)^T B^{-1}\left(\delta x_0 - b^{(n)}\right) + \tfrac{1}{2}\sum_{k=0}^{N}\left(H_k^{(n)}\delta x_k - d_k^{(n)}\right)^T R_k^{-1}\left(H_k^{(n)}\delta x_k - d_k^{(n)}\right) + \tfrac{1}{2}\sum_{k=1}^{N}\left(\delta q_k - c_k^{(n)}\right)^T Q_k^{-1}\left(\delta q_k - c_k^{(n)}\right)$$

where $\delta q_k = \delta x_k - M_k^{(n)}\delta x_{k-1}$, and where $b^{(n)}$, $c_k^{(n)}$ and $d_k^{(n)}$ come from the outer loop:

$$b^{(n)} = x_b - x_0^{(n)}, \qquad c_k^{(n)} = \bar q - q_k^{(n)}, \qquad d_k^{(n)} = y_k - H_k(x_k^{(n)})$$
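As a schematic (my own pseudocode-level sketch, not the operational implementation), the outer/inner structure looks like this, where inner_solve stands in for whatever minimises the quadratic cost above:

```python
def incremental_4dvar(x, xb, y, qbar, model, obs_op, inner_solve, n_outer=3):
    """Gauss-Newton (incremental) loop for weak-constraint 4D-Var.
    x is the list of sub-window states [x_0, ..., x_N]."""
    for n in range(n_outer):
        # Outer loop: recompute residuals about the current trajectory.
        b = xb - x[0]
        c = {k: qbar - (x[k] - model(k, x[k - 1])) for k in range(1, len(x))}
        d = {k: y[k] - obs_op(k, x[k]) for k in range(len(x))}
        # Inner loop: minimise the quadratic cost J-hat (nearly all the expense).
        dx = inner_solve(b, c, d)
        x = [xk + dxk for xk, dxk in zip(x, dx)]
    return x
```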

SLIDE 9

Gauss-Newton (Incremental) Algorithm

$$\hat J(\delta x_0^{(n)}, \ldots, \delta x_N^{(n)}) = \tfrac{1}{2}\left(\delta x_0 - b^{(n)}\right)^T B^{-1}\left(\delta x_0 - b^{(n)}\right) + \tfrac{1}{2}\sum_{k=0}^{N}\left(H_k^{(n)}\delta x_k - d_k^{(n)}\right)^T R_k^{-1}\left(H_k^{(n)}\delta x_k - d_k^{(n)}\right) + \tfrac{1}{2}\sum_{k=1}^{N}\left(\delta q_k - c_k^{(n)}\right)^T Q_k^{-1}\left(\delta q_k - c_k^{(n)}\right)$$

Note that 4D-Var requires tangent-linear versions of $M_k$ and $H_k$: $M_k^{(n)}$ and $H_k^{(n)}$, respectively. It also requires the transposes (adjoints) of these operators: $(M_k^{(n)})^T$ and $(H_k^{(n)})^T$, respectively.
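As an aside, the consistency of a tangent-linear operator with its adjoint is commonly verified with a dot-product test. A minimal sketch, with a random matrix standing in for one of the operators:

```python
import numpy as np

rng = np.random.default_rng(0)
M_tl = rng.standard_normal((5, 5))   # stand-in tangent-linear operator
u = rng.standard_normal(5)
v = rng.standard_normal(5)
# Dot-product (adjoint) test: <M u, v> must equal <u, M^T v>.
assert np.isclose((M_tl @ u) @ v, u @ (M_tl.T @ v))
```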

SLIDE 10

Why do we need more parallel algorithms?

In its usual implementation, 4D-Var is solved by applying a conjugate-gradient solver. This is highly sequential:

  • Iterations of CG.
  • Tangent Linear and Adjoint integrations run one after the other.
  • Model timesteps follow each other.

Computers are becoming ever more parallel, but processors are not getting faster. Unless we do something to make 4D-Var more parallel, we will soon find that 4D-Var becomes unaffordable (even with a 12-hour window).

We cannot make the model more parallel:

  • The inner loops of 4D-Var run with a few tens of grid columns per processor.
  • This is barely enough to mask inter-processor communication costs.

We have to use more parallel algorithms.

SLIDE 14

Parallelising within an Iteration

The model is already parallel in both horizontal directions. The modellers tell us that it will be hard to parallelise in the vertical (and we already have too little work per processor). We are left with parallelising in the time direction.

SLIDE 15

Weak Constraint 4D-Var: Inner Loop

Dropping the outer loop index (n), the inner loop of weak-constraint 4D-Var minimises:

$$\hat J(\delta x_0, \ldots, \delta x_N) = \tfrac{1}{2}(\delta x_0 - b)^T B^{-1}(\delta x_0 - b) + \tfrac{1}{2}\sum_{k=0}^{N}(H_k\delta x_k - d_k)^T R_k^{-1}(H_k\delta x_k - d_k) + \tfrac{1}{2}\sum_{k=1}^{N}(\delta q_k - c_k)^T Q_k^{-1}(\delta q_k - c_k)$$

where $\delta q_k = \delta x_k - M_k\delta x_{k-1}$, and where $b$, $c_k$ and $d_k$ come from the outer loop:

$$b = x_b - x_0, \qquad c_k = \bar q - q_k, \qquad d_k = y_k - H_k(x_k)$$

SLIDE 16

Weak Constraint 4D-Var: Inner Loop

We can simplify this further by defining some 4D vectors and matrices:

$$\delta x = \begin{pmatrix} \delta x_0 \\ \delta x_1 \\ \vdots \\ \delta x_N \end{pmatrix}, \qquad \delta p = \begin{pmatrix} \delta x_0 \\ \delta q_1 \\ \vdots \\ \delta q_N \end{pmatrix}$$

These vectors are related through $\delta q_k = \delta x_k - M_k\delta x_{k-1}$. We can write this relationship in matrix form as $\delta p = L\delta x$, where:

$$L = \begin{pmatrix} I & & & & \\ -M_1 & I & & & \\ & -M_2 & I & & \\ & & \ddots & \ddots & \\ & & & -M_N & I \end{pmatrix}$$

SLIDE 17

Weak Constraint 4D-Var: Inner Loop

$$L = \begin{pmatrix} I & & & & \\ -M_1 & I & & & \\ & -M_2 & I & & \\ & & \ddots & \ddots & \\ & & & -M_N & I \end{pmatrix}$$

$\delta p = L\delta x$ can be done in parallel: $\delta q_k = \delta x_k - M_k\delta x_{k-1}$. We know all the $\delta x_{k-1}$'s, so we can apply all the $M_k$'s simultaneously.

$\delta x = L^{-1}\delta p$ is sequential: $\delta x_k = M_k\delta x_{k-1} + \delta q_k$. We have to generate each $\delta x_{k-1}$ in turn before we can apply the next $M_k$.
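In code the contrast is easy to see. A toy sketch (my own illustration), with matrices M[k] standing in for the tangent-linear models:

```python
def apply_L(M, dx):
    """delta_p = L delta_x:  dq_k = dx_k - M_k dx_{k-1}.
    All dx_{k-1} are known inputs, so the k-loop can run in parallel."""
    return [dx[0]] + [dx[k] - M[k] @ dx[k - 1] for k in range(1, len(dx))]

def apply_Linv(M, dp):
    """delta_x = L^{-1} delta_p:  dx_k = M_k dx_{k-1} + dq_k.
    Each dx_k depends on the previous one, so this loop is sequential."""
    dx = [dp[0]]
    for k in range(1, len(dp)):
        dx.append(M[k] @ dx[k - 1] + dp[k])
    return dx
```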

SLIDE 18

Brief Introduction to 4D-Var

[Figure: the analysis window divided into sub-windows along the time axis, showing the states x0, x1, x2, x3 and the model-error terms q1, q2, q3.]

SLIDE 19

Weak Constraint 4D-Var: Inner Loop

We will also define:

$$R = \begin{pmatrix} R_0 & & & \\ & R_1 & & \\ & & \ddots & \\ & & & R_N \end{pmatrix}, \quad D = \begin{pmatrix} B & & & \\ & Q_1 & & \\ & & \ddots & \\ & & & Q_N \end{pmatrix}, \quad H = \begin{pmatrix} H_0 & & & \\ & H_1 & & \\ & & \ddots & \\ & & & H_N \end{pmatrix}$$

$$b = \begin{pmatrix} b \\ c_1 \\ \vdots \\ c_N \end{pmatrix}, \qquad d = \begin{pmatrix} d_0 \\ d_1 \\ \vdots \\ d_N \end{pmatrix}$$

SLIDE 20

Weak Constraint 4D-Var: Inner Loop

With these definitions, we can write the inner-loop cost function either as a function of δx:

$$J(\delta x) = (L\delta x - b)^T D^{-1}(L\delta x - b) + (H\delta x - d)^T R^{-1}(H\delta x - d)$$

or as a function of δp:

$$J(\delta p) = (\delta p - b)^T D^{-1}(\delta p - b) + (HL^{-1}\delta p - d)^T R^{-1}(HL^{-1}\delta p - d)$$

SLIDE 21

Forcing Formulation

$$J(\delta p) = (\delta p - b)^T D^{-1}(\delta p - b) + (HL^{-1}\delta p - d)^T R^{-1}(HL^{-1}\delta p - d)$$

This version of the cost function is sequential:

  • It contains L⁻¹.

It closely resembles strong-constraint 4D-Var. We can precondition it using D^{1/2}:

$$J(\chi) = \chi^T\chi + (HL^{-1}\delta p - d)^T R^{-1}(HL^{-1}\delta p - d) \quad\text{where}\quad \delta p = D^{1/2}\chi + b$$

This guarantees that the eigenvalues of J′′ are bounded away from zero. We understand how to minimise this.
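As a one-line check (my own expansion, not on the slide), substituting $\delta p = D^{1/2}\chi + b$ into the first term shows why it collapses to $\chi^T\chi$:

$$(\delta p - b)^T D^{-1}(\delta p - b) = (D^{1/2}\chi)^T D^{-1}(D^{1/2}\chi) = \chi^T D^{T/2} D^{-1} D^{1/2}\chi = \chi^T\chi.$$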

SLIDE 22

4D State Formulation

$$J(\delta x) = (L\delta x - b)^T D^{-1}(L\delta x - b) + (H\delta x - d)^T R^{-1}(H\delta x - d)$$

This version of the cost function is parallel:

  • It does not contain L⁻¹.

We could precondition it using $\delta x = L^{-1}(D^{1/2}\chi + b)$. This would give exactly the same J(χ) as before. But we have introduced a sequential model integration (i.e. L⁻¹) into the preconditioner.

SLIDE 23

Plan A: State Formulation, Approximate Preconditioner

In the forcing (δp) formulation, and in its Lagrangian dual formulation (4D-PSAS), L⁻¹ appears in the cost function:

  • These formulations are inherently sequential.
  • We cannot modify the cost function without changing the problem.

In the 4D-state (δx) formulation, L⁻¹ appears in the preconditioner:

  • We are free to modify the preconditioner as we wish.

This suggests we replace L⁻¹ by a cheap approximation:

$$\delta x = \tilde L^{-1}(D^{1/2}\chi + b)$$

If we do this, we can no longer write the first term as $\chi^T\chi$. We have to calculate δx, and explicitly evaluate the term as

$$(L\delta x - b)^T D^{-1}(L\delta x - b)$$

This is where we run into problems: D is very ill-conditioned.

SLIDE 24

Plan A: State Formulation, Approximate Preconditioner

When we approximate L⁻¹ in the preconditioner, the Hessian of the first term of the cost function (with respect to χ) is no longer the identity matrix, but:

$$D^{T/2}\tilde L^{-T} L^T D^{-1} L\, \tilde L^{-1} D^{1/2}$$

Because D is ill-conditioned, this is likely to have some very small (and some very large) eigenvalues, unless $\tilde L$ is a very good approximation to L. So far, I have not found a preconditioner that gives condition numbers for the minimisation less than O(10⁹).

We need a Plan B!

SLIDE 26

Plan B: Saddle Point Formulation

$$J(\delta x) = (L\delta x - b)^T D^{-1}(L\delta x - b) + (H\delta x - d)^T R^{-1}(H\delta x - d)$$

At the minimum:

$$\nabla J = L^T D^{-1}(L\delta x - b) + H^T R^{-1}(H\delta x - d) = 0$$

Define:

$$\lambda = D^{-1}(b - L\delta x), \qquad \mu = R^{-1}(d - H\delta x)$$

Then:

$$\begin{aligned} D\lambda + L\delta x &= b \\ R\mu + H\delta x &= d \\ L^T\lambda + H^T\mu &= 0 \end{aligned} \;\Longrightarrow\; \begin{pmatrix} D & & L \\ & R & H \\ L^T & H^T & \end{pmatrix} \begin{pmatrix} \lambda \\ \mu \\ \delta x \end{pmatrix} = \begin{pmatrix} b \\ d \\ 0 \end{pmatrix}$$

SLIDE 29

Saddle Point Formulation

$$\begin{pmatrix} D & & L \\ & R & H \\ L^T & H^T & \end{pmatrix} \begin{pmatrix} \lambda \\ \mu \\ \delta x \end{pmatrix} = \begin{pmatrix} b \\ d \\ 0 \end{pmatrix}$$

This is called the saddle point formulation of 4D-Var. The matrix is a saddle point matrix: it is real, symmetric and indefinite. Note that the matrix contains no inverse matrices. We can apply the matrix without requiring a sequential model integration (i.e. we can parallelise over sub-windows). We can hope that the problem is well conditioned (since we don't multiply by D⁻¹).
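A sketch of the corresponding matrix-vector product (my own illustration, with arrays standing in for the operators); only forward and adjoint applications of L and H appear, never L⁻¹:

```python
def saddle_matvec(D, R, L, H, lam, mu, dx):
    """Apply the saddle point matrix to the vector (lambda, mu, delta_x)."""
    return (D @ lam + L @ dx,        # D lambda + L delta_x
            R @ mu + H @ dx,         # R mu + H delta_x
            L.T @ lam + H.T @ mu)    # L^T lambda + H^T mu
```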

SLIDE 30

Saddle Point Formulation

Alternative derivation:

$$\min_{\delta p,\,\delta w} J(\delta p, \delta w) = (\delta p - b)^T D^{-1}(\delta p - b) + (\delta w - d)^T R^{-1}(\delta w - d)$$

subject to $\delta p = L\delta x$ and $\delta w = H\delta x$.

$$\mathcal{L}(\delta x, \delta p, \delta w, \lambda, \mu) = (\delta p - b)^T D^{-1}(\delta p - b) + (\delta w - d)^T R^{-1}(\delta w - d) + \lambda^T(\delta p - L\delta x) + \mu^T(\delta w - H\delta x)$$

$$\begin{aligned} \partial\mathcal{L}/\partial\lambda = 0 &\;\Rightarrow\; \delta p = L\delta x \\ \partial\mathcal{L}/\partial\mu = 0 &\;\Rightarrow\; \delta w = H\delta x \\ \partial\mathcal{L}/\partial\delta p = 0 &\;\Rightarrow\; D^{-1}(\delta p - b) + \lambda = 0 \\ \partial\mathcal{L}/\partial\delta w = 0 &\;\Rightarrow\; R^{-1}(\delta w - d) + \mu = 0 \\ \partial\mathcal{L}/\partial\delta x = 0 &\;\Rightarrow\; L^T\lambda + H^T\mu = 0 \end{aligned}$$

SLIDE 33

Saddle Point Formulation

Lagrangian: $\mathcal{L}(\delta x, \delta p, \delta w, \lambda, \mu)$

[Figure: the saddle surface of $\mathcal{L}$, with a primal section AXB and a dual section CXD crossing at the saddle point X.]

4D-Var solves the primal problem: minimise along AXB. 4D-PSAS solves the Lagrangian dual problem: maximise along CXD. The saddle point formulation finds the saddle point of $\mathcal{L}$. The saddle point formulation is neither 4D-Var nor 4D-PSAS.

SLIDE 34

Saddle Point Formulation

To solve the saddle point system, we have to precondition it. Preconditioning saddle point systems is the subject of much current research; it is something of a dark art!

  • See e.g. Benzi and Wathen (2008); Benzi, Golub and Liesen (2005).

Most preconditioners in the literature assume that D and R are expensive, and L and H are cheap. The opposite is true in 4D-Var!

Example: diagonal preconditioner:

$$P_D = \begin{pmatrix} \hat D & & \\ & \hat R & \\ & & -S \end{pmatrix} \quad\text{where}\quad S \approx -L^T D^{-1} L - H^T R^{-1} H$$

SLIDE 36

Saddle Point Formulation

One possibility is an approximate constraint preconditioner (Bergamaschi et al., 2007 & 2011):

$$\tilde P = \begin{pmatrix} D & & \tilde L \\ & R & \\ \tilde L^T & & \end{pmatrix} \;\Rightarrow\; \tilde P^{-1} = \begin{pmatrix} & & \tilde L^{-T} \\ & R^{-1} & \\ \tilde L^{-1} & & -\tilde L^{-1} D \tilde L^{-T} \end{pmatrix}$$

Note that $\tilde P^{-1}$ does not contain $D^{-1}$.
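Applying $\tilde P^{-1}$ amounts to a block substitution. A sketch under my own naming (not from the slides), where Lt_solve and LtT_solve apply $\tilde L^{-1}$ and $\tilde L^{-T}$:

```python
import numpy as np

def apply_P_inv(D, R, Lt_solve, LtT_solve, rb, rd, rz):
    """Apply the inverse of the approximate constraint preconditioner
    to the residual (rb, rd, rz). No application of D^{-1} is required."""
    lam = LtT_solve(rz)              # lambda = Ltilde^{-T} rz
    mu = np.linalg.solve(R, rd)      # mu = R^{-1} rd
    dx = Lt_solve(rb - D @ lam)      # dx = Ltilde^{-1} (rb - D lambda)
    return lam, mu, dx
```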

SLIDE 37

Saddle Point Formulation

With this preconditioner, we can prove some nice results for the case $\tilde L = L$:

$$\begin{pmatrix} & & L^{-T} \\ & R^{-1} & \\ L^{-1} & & -L^{-1}DL^{-T} \end{pmatrix} \begin{pmatrix} D & & L \\ & R & H \\ L^T & H^T & \end{pmatrix} \begin{pmatrix} \lambda \\ \mu \\ \delta x \end{pmatrix} = \tau \begin{pmatrix} \lambda \\ \mu \\ \delta x \end{pmatrix}$$

$$\Rightarrow\; \left[\, I + \begin{pmatrix} & L^{-T}H^T & \\ & & R^{-1}H \\ & -L^{-1}DL^{-T}H^T & \end{pmatrix} \right] \begin{pmatrix} \lambda \\ \mu \\ \delta x \end{pmatrix} = \tau \begin{pmatrix} \lambda \\ \mu \\ \delta x \end{pmatrix}$$

$$\Rightarrow\; HL^{-1}DL^{-T}H^T\mu + (\tau - 1)^2 R\mu = 0$$

$\Rightarrow$ $(\tau - 1)$ is imaginary (or zero), since $HL^{-1}DL^{-T}H^T$ is positive semi-definite and $R$ is positive definite.
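Writing the preconditioned eigenproblem out componentwise (my own expansion of the step above) makes the elimination explicit:

$$L^{-T}H^T\mu = (\tau - 1)\lambda, \qquad R^{-1}H\,\delta x = (\tau - 1)\mu, \qquad -L^{-1}DL^{-T}H^T\mu = (\tau - 1)\,\delta x.$$

Multiplying the third equation by $H$ and using the second to replace $H\delta x$ with $(\tau - 1)R\mu$ gives the quadratic relation above.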

SLIDE 38

Saddle Point Formulation

The eigenvalues τ of $\tilde P^{-1}A$ lie on the line ℜ(τ) = 1 in the complex plane. Their distance above/below the real axis is

$$\pm\sqrt{\frac{\mu_i^T\, HL^{-1}DL^{-T}H^T \mu_i}{\mu_i^T R \mu_i}}$$

where $\mu_i$ is the µ component of the i-th eigenvector. The fraction under the square root is, roughly speaking, the ratio of background-plus-model-error variance to observation-error variance associated with the pattern $\mu_i$. In the usual implementation of 4D-Var, the condition number is given by the ratio of these variances, not by its square root.

SLIDE 39

Saddle Point Formulation

For the preconditioner, we need an approximate inverse of L. One approach is to use the following identity (exercise for the reader!):

$$L^{-1} = I + (I - L) + (I - L)^2 + \ldots + (I - L)^{N-1}$$

Since this is a power series expansion, it suggests truncating the series at some order < N − 1. (A very few iterations of a Krylov solver may be a better idea; I've not tried this yet.)
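A sketch of the truncated-series application (my own illustration; M[k] @ v stands in for the tangent-linear model on sub-window k). Each extra term costs one parallel sweep of the M_k's, so the sequential depth is the truncation order rather than N:

```python
import numpy as np

def apply_Linv_approx(M, dp, order):
    """Approximate delta_x = L^{-1} delta_p by truncating
    L^{-1} = I + (I - L) + (I - L)^2 + ...  after `order` powers.
    Here (I - L) acts as a shift: ((I - L) v)_k = M_k v_{k-1}, zero for k = 0."""
    dx = list(dp)                 # the identity (zeroth) term
    term = list(dp)
    for _ in range(order):
        term = ([np.zeros_like(term[0])] +
                [M[k] @ term[k - 1] for k in range(1, len(term))])
        dx = [a + t for a, t in zip(dx, term)]
    return dx
```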

SLIDE 40

Results

The practical results shown in the next few slides are for a simplified (toy) analogue of a real system. The model is a two-level quasi-geostrophic channel model with 1600 gridpoints. The model has realistic error growth and time-to-nonlinearity. There are 100 observations of streamfunction every 3 hours, and 100 wind observations every 6 hours. The error covariances are assumed to be horizontally isotropic and homogeneous, with a Gaussian spatial structure.

SLIDE 41

Saddle Point Formulation

OOPS QG model. 24-hour window with 8 sub-windows.

[Figure: Ritz values of A in the complex plane. Converged Ritz values after 500 Arnoldi iterations are shown in blue; unconverged values in red.]

SLIDE 42

Saddle Point Formulation

OOPS QG model. 24-hour window with 8 sub-windows.

[Figure: Ritz values of $\tilde P^{-1}A$ for $\tilde L = L$. Converged Ritz values after 500 Arnoldi iterations are shown in blue; unconverged values in red.]

SLIDE 43

Saddle Point Formulation

It is much harder to prove results for the case $\tilde L \neq L$. Experimentally, it seems that many eigenvalues continue to lie on ℜ(τ) = 1, with the remainder forming a cloud around τ = 1.

[Figure: Ritz values of $\tilde P^{-1}A$ for $\tilde L = I$. Converged Ritz values after 500 Arnoldi iterations are shown in blue; unconverged values in red.]

SLIDE 44

Saddle Point Formulation

OOPS, QG model, 24-hour window with 8 sub-windows.

[Figure: reduction in residual norm (10⁰ down to 10⁻⁹, log scale) versus iteration (0 to 100); curves labelled F, 7, 3, 2, 1.]

Convergence as a function of iteration for different truncations of the series expansion for L. ("F" = forcing formulation.)

SLIDE 45

Saddle Point Formulation

OOPS, QG model, 24-hour window with 8 sub-windows.

[Figure: reduction in residual norm (10⁰ down to 10⁻⁹, log scale) versus sequential cost in TL+AD sub-window integrations (0 to 800); curves labelled F, 7, 3, 2, 1.]

Convergence as a function of sequential sub-window integrations for different truncations of the series expansion for L.

SLIDE 46

Conclusions

4D-Var was analysed from the point of view of parallelization. 4D-PSAS and the forcing formulation are inherently sequential. The 4D-state problem is parallel, but ill-conditioning of D makes it difficult to precondition. The saddle point formulation is parallel, and seems easier to precondition.

A saddle point method for strong-constraint 4D-Var was proposed by Thierry Lagarde in his PhD thesis (2000: Univ. Paul Sabatier, Toulouse). It didn't catch on:

  • Parallelization was not so important 10 years ago.
  • In strong-constraint 4D-Var, we only get a factor-of-two speed-up. This is not enough to overcome the slower convergence due to the fact that the system is indefinite.

The ability to also parallelize over sub-windows allows a much bigger speed-up in weak-constraint 4D-Var. The saddle point formulation is already fast enough to be useful. Better solvers and preconditioners can only make it faster.
