Improving EnKF with machine learning algorithms

John Harlim
Department of Mathematics and Department of Meteorology
The Pennsylvania State University
June 12, 2017

Overview

A supervised learning algorithm
An unsupervised learning algorithm (diffusion maps)
Learning the localization function of EnKF
Learning a likelihood function. Application: to correct biased observation model error in DA
A supervised learning algorithm

The basic idea of a supervised learning algorithm is to train a map H : X → Y from paired data {xi, yi}, i = 1, ..., N. Remarks:

◮ The objective is to use the estimated map Ĥ to predict ys = Ĥ(xs) given new data xs.
◮ Various methods to estimate H include regression, SVM, KNN, neural nets, etc.
◮ For this talk, we will focus on how to use regression in appropriate spaces to improve EnKF (a minimal regression sketch follows below).
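To make the regression remark concrete, here is a minimal sketch (not from the talk) of estimating a map Ĥ by least-squares regression on polynomial features and using it to predict at new inputs; the synthetic data and the feature choice are illustrative assumptions.

```python
import numpy as np

# Illustrative paired data {x_i, y_i}: a noisy nonlinear relation (assumed example).
rng = np.random.default_rng(0)
x_train = rng.uniform(-3, 3, size=200)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(200)

# Train the map H by least-squares regression on polynomial features.
degree = 5
A = np.vander(x_train, degree + 1)                 # design matrix [x^5, ..., x, 1]
coeffs, *_ = np.linalg.lstsq(A, y_train, rcond=None)

def H_hat(x_new):
    """Estimated map: predict y_s = H_hat(x_s) for new data x_s."""
    return np.vander(np.atleast_1d(x_new), degree + 1) @ coeffs

print(H_hat(0.5))                                  # prediction at a new input
```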
An unsupervised learning algorithm

Given a data set {xi}, the main task is to learn a function ϕ(xi) that can describe the data. In this talk, I will focus on a nonlinear manifold learning algorithm, the diffusion maps1: Given {xi} ∈ M ⊂ Rn with a sampling measure q, the diffusion maps algorithm is a kernel-based method that produces orthonormal basis functions on the manifold, ϕk ∈ L2(M, q). These basis functions are solutions of an eigenvalue problem,

q^{−1} div(q ∇ϕk(x)) = λk ϕk(x),

where the weighted Laplacian operator is approximated with an integral operator with appropriate normalization.

1Coifman & Lafon 2006, Berry & H, 2016.
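As a rough illustration of the construction above, here is a minimal diffusion-maps sketch (Gaussian kernel, density normalization with α = 1, eigendecomposition of the resulting Markov matrix); the bandwidth and the synthetic circle data are my own choices, not details from the talk.

```python
import numpy as np

def diffusion_maps(X, epsilon, n_eig=10, alpha=1.0):
    """Basic diffusion maps: approximate eigenpairs (lambda_k, phi_k) on the data manifold."""
    # Pairwise squared distances and Gaussian kernel
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (4.0 * epsilon))

    # Density normalization (alpha = 1 removes the sampling-density bias)
    q = K.sum(axis=1)
    K_alpha = K / np.outer(q ** alpha, q ** alpha)

    # Row-normalize to a Markov matrix and take its leading eigenvectors
    P = K_alpha / K_alpha.sum(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eig(P)
    order = np.argsort(-eigvals.real)[:n_eig]
    return eigvals.real[order], eigvecs.real[:, order]

# Example: uniformly sampled circle data should recover Fourier-like basis functions
theta = np.linspace(0, 2 * np.pi, 400, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])
lam, phi = diffusion_maps(X, epsilon=0.05)
```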
Examples:

Example: For uniformly distributed data on a circle, we obtain the Fourier basis. [Figure: the first three estimated eigenfunctions, corresponding to e^{ix}, e^{i2x}, e^{i3x}.]
Example: For Gaussian distributed data on the real line, we obtain the Hermite polynomials. [Figure: estimated vs. true ϕ1(x), ϕ2(x), ϕ3(x).]
Example: Nonparametric basis functions estimated on a nontrivial manifold. [Figure: leading basis functions plotted on the manifold.]
Remark: Essentially, one can view the diffusion maps as a method to learn a generalized Fourier basis on the manifold.
Learning the localization function of EnKF

◮ When EnKF is performed with a small ensemble size, one way to alleviate the spurious correlations is to employ a localization function.
◮ For example, in the serial EnKF, for each scalar observation yi, one "localizes" the Kalman gain, K = L_{xy_i} ∘ XY_i^⊤ (Y_i Y_i^⊤ + R)^{−1}, with an empirically chosen localization function L_{xy_i} (Gaspari-Cohn, etc.), which requires some tuning (a minimal sketch of this standard approach follows below).
◮ Let's use the idea from machine learning to train this localization function. The key idea is to find a map that takes poorly estimated correlations to accurately estimated correlations.
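For reference, a minimal sketch of the standard empirically chosen localization mentioned above: the Gaspari-Cohn function tapering the serial-EnKF gain for one scalar observation. The variable names, the distance argument, and the 1/(K−1) sample-covariance normalization are my own conventions, not details from the talk.

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Gaspari-Cohn (1999) fifth-order compactly supported correlation function."""
    r = np.abs(np.asarray(dist, dtype=float)) / c
    gc = np.zeros_like(r)
    m1 = r <= 1.0
    m2 = (r > 1.0) & (r <= 2.0)
    gc[m1] = -0.25 * r[m1]**5 + 0.5 * r[m1]**4 + 0.625 * r[m1]**3 - (5/3) * r[m1]**2 + 1.0
    gc[m2] = ((1/12) * r[m2]**5 - 0.5 * r[m2]**4 + 0.625 * r[m2]**3 + (5/3) * r[m2]**2
              - 5.0 * r[m2] + 4.0 - (2/3) / r[m2])
    return gc

def localized_gain(X, Y_i, R_i, dist_to_obs, c):
    """Localized gain K = L ∘ X Y_i^T (Y_i Y_i^T + R)^{-1} for one scalar observation.

    X           : (n_state, K) ensemble perturbation matrix of the state
    Y_i         : (K,) ensemble perturbations of the observed quantity y_i
    R_i         : scalar observation error variance
    dist_to_obs : (n_state,) distances from each state variable to the observation
    """
    K_ens = X.shape[1]
    cov_xy = X @ Y_i / (K_ens - 1)        # sample cross-covariance X Y_i^T
    var_y = Y_i @ Y_i / (K_ens - 1)       # sample variance Y_i Y_i^T
    return gaspari_cohn(dist_to_obs, c) * cov_xy / (var_y + R_i)
```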
Learning localization map2

Given a set of large-ensemble EnKF solutions, {x_m^{a,k}}, k = 1, ..., L, m = 1, ..., M, as a training data set, where L is large enough so that the correlation ρ^L_{ij} ≈ ρ(xi, yj) is accurate.

◮ Operationally, we wish to run EnKF with K ≪ L ensemble members. Then our goal is to train a map that transforms the subsampled correlation ρ^K_{ij} into the accurate correlation ρ^L_{ij}.
◮ Basically, we consider the following optimization problem (a sketch of the resulting estimator follows below):

min_{L_{x_i y_j}} ∫_{[−1,1]} ∫_{[−1,1]} (L_{x_i y_j} ρ^K_{ij} − ρ^L_{ij})^2 p(ρ^K_{ij} | ρ^L_{ij}) p(ρ^L_{ij}) dρ^K_{ij} dρ^L_{ij}
  ≈ (Monte Carlo) min_{L_{x_i y_j}} (1/MS) Σ_{m,s=1}^{M,S} (L_{x_i y_j} ρ^K_{ij,m,s} − ρ^L_{ij,m})^2,

where ρ^L_{ij,m} ∼ p(ρ^L_{ij}) and ρ^K_{ij,m,s} ∼ p(ρ^K_{ij} | ρ^L_{ij}) is an estimated correlation using only K out of L training data.

2De La Chevrotière & H, 2017.
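As a rough illustration (my own sketch, not the authors' code), the Monte-Carlo objective above reduces, for a fixed pair (i, j) and a scalar weight, to a closed-form least-squares slope between the subsampled K-member correlations and the accurate L-member correlations; the data layout and subsampling scheme are assumptions.

```python
import numpy as np

def learn_localization_weight(ensemble, i, j, K, S, rng):
    """Least-squares estimate of the scalar localization weight for the pair (i, j).

    ensemble : array (L, M, n_state) of large-ensemble analysis members over M cycles
    K        : small (operational) ensemble size, K << L
    S        : number of K-member subsamples drawn per cycle
    """
    L = ensemble.shape[0]
    rho_K, rho_L = [], []
    for m in range(ensemble.shape[1]):
        xi, xj = ensemble[:, m, i], ensemble[:, m, j]
        rho_L_m = np.corrcoef(xi, xj)[0, 1]                    # "accurate" L-member correlation
        for _ in range(S):
            idx = rng.choice(L, size=K, replace=False)
            rho_K.append(np.corrcoef(xi[idx], xj[idx])[0, 1])  # noisy K-member correlation
            rho_L.append(rho_L_m)
    rho_K, rho_L = np.array(rho_K), np.array(rho_L)
    # argmin_w sum (w * rho_K - rho_L)^2 has the closed-form least-squares solution:
    return np.sum(rho_K * rho_L) / np.sum(rho_K ** 2)

# Usage with synthetic data: 100-member "large" ensemble, 10-member subsamples
rng = np.random.default_rng(1)
ens = rng.standard_normal((100, 50, 8))
w = learn_localization_weight(ens, i=0, j=3, K=10, S=20, rng=rng)
```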
Example: On the Monsoon-Hadley multicloud model3

It is a Galerkin projection of the zonally symmetric β-plane primitive equations onto the barotropic and first two baroclinic modes, stochastically driven by a three-cloud model paradigm. Consider an observation model h(x) that is similar to an RTM.

3M. De La Chevrotière and B. Khouider, 2016.
Example of trained localization map

[Figure: trained localization maps for Channel 3 and θ1, and for Channel 6 and θeb.]
DA results

[Figure: data assimilation results for the monsoon-Hadley multicloud model.]
Correcting biased observation model error4

All Kalman-based DA methods assume unbiased observation model error, e.g., yi = h(xi) + ηi, ηi ∼ N(0, R). Suppose the operator h is unknown. Instead, we are only given h̃; then yi = h̃(xi) + bi, where we introduce a biased model error, bi = h(xi) − h̃(xi) + ηi.

4Berry & H, 2017.
Example: Basic radiative transfer model

Consider solutions of the stochastic cloud model5, {T(z), θeb, q, fd, fs, fc}. Based on these solutions, define a basic radiative transfer model as follows,

hν(x) = θeb Tν(0) + ∫_0^∞ T(z) ∂Tν/∂z (z) dz,

where Tν is the transmission from height z to ∞, defined to depend on q.

[Figure: the weighting function ∂Tν/∂z as a function of height z.]

5Khouider, Biello, Majda 2010
Example: Basic radiative transfer model

Suppose the deep and stratiform cloud-top height is zd = 12 km, while the cumulus cloud-top height is zc = 3 km. Define f = {fd, fc, fs} and x = {T(z), θeb, q}. Then the cloudy RTM is given by,

hν(x, f) = (1 − fd − fs) [ θeb Tν(0) + ∫_0^{zd} T(z) ∂Tν/∂z (z) dz ] + (fd + fs) T(zd) Tν(zd) + ∫_{zd}^∞ T(z) ∂Tν/∂z (z) dz

         = (1 − fd − fs) [ (1 − fc) ( θeb Tν(0) + ∫_0^{zc} T(z) ∂Tν/∂z (z) dz ) + fc T(zc) Tν(zc) + ∫_{zc}^{zd} T(z) ∂Tν/∂z (z) dz ] + (fd + fs) T(zd) Tν(zd) + ∫_{zd}^∞ T(z) ∂Tν/∂z (z) dz.

One can check that hν(x, 0) corresponds to the cloud-free RTM.
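A minimal numerical sketch of the cloudy RTM above (my own discretization, not from the talk): the grid, the interpolation of T and Tν, and the trapezoidal quadrature are illustrative assumptions.

```python
import numpy as np

def cloudy_rtm(z, T, Tnu, theta_eb, fd, fs, fc, zc=3.0, zd=12.0):
    """Evaluate h_nu(x, f) for the cloudy RTM by trapezoidal quadrature.

    z   : heights (km), increasing grid whose top approximates "infinity"
    T   : temperature profile T(z) on the grid
    Tnu : transmission profile T_nu(z) on the grid (depends on the moisture q)
    """
    dTnu_dz = np.gradient(Tnu, z)                    # weighting function dT_nu/dz

    def integral(a, b):                              # trapezoidal estimate of int_a^b T dT_nu/dz dz
        mask = (z >= a) & (z <= b)
        zz, ff = z[mask], T[mask] * dTnu_dz[mask]
        return np.sum(0.5 * (ff[1:] + ff[:-1]) * np.diff(zz))

    def interp(profile, h):
        return np.interp(h, z, profile)

    clear_below_zc = theta_eb * interp(Tnu, 0.0) + integral(0.0, zc)
    below_zd = ((1 - fc) * clear_below_zc
                + fc * interp(T, zc) * interp(Tnu, zc)
                + integral(zc, zd))
    return ((1 - fd - fs) * below_zd
            + (fd + fs) * interp(T, zd) * interp(Tnu, zd)
            + integral(zd, z[-1]))
```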
Systematic model error in data assimilation

Suppose the observation is generated with yν = hν(x, f) + η, η ∼ N(0, R). The difficulty in estimating the cloud fractions and cloud-top heights (in reality we don't know precisely how many clouds are under a column) induces model error. In an extreme case, we consider filtering with a cloud-free RTM: yν = hν(x, 0) + bν, where bν = hν(x, f) − hν(x, 0) + η is model error with bias.
Observations (yν) vs. model error (bν)
State estimation of the model error

We propose a secondary filter to estimate the statistics for bi as follows:

Prior p(xi)  --Primary Filter-->  Posterior p(xi | yi)
Error prior p(b)  --Secondary Filter-->  Error posterior p(b | yi)
Observation yi  --RKHS + training data-->  Likelihood p(yi | b)

A machine learning technique, kernel embedding of conditional distributions6, is employed to train a nonparametric likelihood function.

6Song, Fukumizu, Gretton, 2013.
Secondary Bayesian filter

p(b | yi) ∝ p(b) p(yi | b)

[Figure: two examples of the secondary Bayesian update, showing the prior p(bℓ), the trained likelihood p(yi | bℓ), the posterior p(bℓ | yi), and the true model error bi.]
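A minimal sketch of this secondary Bayesian update on a grid of candidate model-error values; the grid, prior, and likelihood arrays are placeholders standing in for the RKHS-trained quantities.

```python
import numpy as np

def secondary_filter_update(b_grid, prior, likelihood):
    """Pointwise Bayes update p(b | y_i) ∝ p(b) p(y_i | b) on a grid of b values.

    b_grid     : candidate model-error values b_ell (assumed nearly uniform grid)
    prior      : p(b_ell) evaluated on the grid
    likelihood : p(y_i | b_ell) at the observed y_i (e.g. from the trained RKHS model)
    """
    db = np.diff(b_grid).mean()
    post = prior * likelihood
    post /= post.sum() * db                       # normalize to a density
    mean_b = np.sum(b_grid * post) * db           # posterior-mean bias estimate
    return post, mean_b
```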
Filter estimates (with adaptive tuning of R and Q).

[Figure: time series of θ1, θeb, q, and fs, comparing the truth, EnKF with the true observation model, EnKF with the wrong observation model, and the RKHS-corrected filter.]
Example: Lorenz-96

Bias occurs randomly in space and time.

[Figure: space-time plots of the true state, the observations, and the RKHS and EnKF estimates.]
Nonparametric likelihood function

We will use the kernel embedding of conditional distributions.7 Recall: Let X be a random variable on M with distribution P(X). Given a kernel K : M × M → R, the Moore-Aronszajn theorem states that there exists a Reproducing Kernel Hilbert Space (RKHS), here taken to be L2(M, q), in which K has the reproducing property: f(x) = ⟨f, K(x, ·)⟩_q.

7Song, Fukumizu, Gretton, 2013.
Nonparametric likelihood function

The kernel embedding of the conditional distribution P(Y | B) is defined as,

µ_{Y|b} = E_{Y|b}[K̃(Y, ·)] = ∫_N K̃(y, ·) dP(y | b).

Given g ∈ L2(N, q̃),

E_{Y|b}[g(Y)] = ∫_N g(y) dP(y | b) = ∫_N ⟨g, K̃(y, ·)⟩_q̃ dP(y | b) = ⟨g, ∫_N K̃(y, ·) dP(y | b)⟩_q̃ = ⟨g, µ_{Y|b}⟩_q̃.

One can verify that

µ_{Y|b} = C_YB C_BB^{−1} K(b, ·),

where C_BY = ∫_{M×N} K(b, ·) ⊗ K̃(y, ·) dP(b, y) is the kernel embedding of P(B, Y) on the appropriate Hilbert spaces.
Nonparametric likelihood function p(y|b)

Given {bi} and {yi}, i = 1, ..., N, apply diffusion maps to learn the data-driven orthonormal basis functions ϕ_j(b) ∈ L2(M, q) and ϕ̃_k(y) ∈ L2(N, q̃). Let

p(y | b) = Σ_k µ_{Y|b,k} ϕ̃_k(y) q̃(y),

where

µ_{Y|b,k} = ⟨p(· | b), ϕ̃_k⟩ = E_{Y|b}[ϕ̃_k] = ⟨µ_{Y|b}, ϕ̃_k⟩_q̃ = ⟨C_YB C_BB^{−1} K(b, ·), ϕ̃_k⟩_q̃ = ... = Σ_j ϕ_j(b) [C_YB C_BB^{−1}]_{kj},

with the matrix entries estimated from the training data,

[C_YB]_{jk} = ⟨C_YB, ϕ̃_j ⊗ ϕ_k⟩_{q̃⊗q} ≈ (1/N) Σ_{i=1}^N ϕ̃_j(yi) ϕ_k(bi),
[C_BB]_{jk} = ⟨C_BB, ϕ_j ⊗ ϕ_k⟩_{q⊗q} ≈ (1/N) Σ_{i=1}^N ϕ_j(bi) ϕ_k(bi).
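A rough sketch (my own, not the paper's code) of the estimators above: forming the empirical matrices C_YB and C_BB from training pairs and evaluating p(y | b). The basis matrices are assumed to be precomputed, e.g. from diffusion maps, and the regularization term is an added assumption for a stable inverse.

```python
import numpy as np

def fit_conditional_coefficients(Phi_b, Phi_y, reg=1e-6):
    """Estimate the coefficient matrix C_YB C_BB^{-1} from training data.

    Phi_b : (N, J) matrix of basis values phi_j(b_i)     (e.g. diffusion-maps basis in b)
    Phi_y : (N, K) matrix of basis values phi~_k(y_i)    (e.g. diffusion-maps basis in y)
    """
    N = Phi_b.shape[0]
    C_YB = Phi_y.T @ Phi_b / N            # [C_YB]_{jk} ~ (1/N) sum_i phi~_j(y_i) phi_k(b_i)
    C_BB = Phi_b.T @ Phi_b / N            # [C_BB]_{jk} ~ (1/N) sum_i phi_j(b_i) phi_k(b_i)
    C_BB += reg * np.eye(C_BB.shape[0])   # small regularization for a stable inverse
    return C_YB @ np.linalg.inv(C_BB)

def likelihood_p_y_given_b(coeff, phi_b, phi_y, q_tilde_y):
    """Evaluate p(y | b) ~ sum_k [coeff @ phi(b)]_k phi~_k(y) q~(y)."""
    mu = coeff @ phi_b                    # coefficients mu_{Y|b,k}
    return float(mu @ phi_y) * q_tilde_y
```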
References:

1. M. De La Chevrotière & H, "A data-driven method for improving the correlation estimation in serial ensemble Kalman filters", Mon. Wea. Rev. 145(3), 985-1001, 2017.
2. M. De La Chevrotière & H, "Localization mappings for filtering a monsoon-Hadley multicloud model" (in progress).
3. T. Berry & H, "Correcting biased observation model error in data assimilation", Mon. Wea. Rev. (in press).
4. T. Berry & H, "Variable bandwidth diffusion kernels", Appl. Comput. Harmon. Anal. 40, 68-96, 2016.
5. H, "An introduction to data-driven methods for stochastic modeling of dynamical systems", Springer (to appear).