SLIDE 1

Predicting chaotic systems with sparse data

lessons from numerical weather prediction

David Kelly, Courant Institute, NYU


SLIDE 2

A toy atmospheric model

We have a three-dimensional variable (x, y, z) evolving in time, governed by a differential equation called the Lorenz model:

    ẋ = σ(y − x)
    ẏ = x(ρ − z) − y
    ż = xy − βz

The model is chaotic: an uncertain estimate of the current state becomes more uncertain over time.

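To see the chaos numerically, here is a minimal Python sketch (my own illustration; the slide gives no parameter values, so it assumes the classical choices σ = 10, ρ = 28, β = 8/3) that integrates the model with a fourth-order Runge-Kutta scheme and shows two nearly identical initial states drifting apart:

    import numpy as np

    def lorenz(u, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
        # Right-hand side of the Lorenz model from the slide.
        x, y, z = u
        return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

    def rk4_step(f, u, dt):
        # One classical fourth-order Runge-Kutta step.
        k1 = f(u)
        k2 = f(u + 0.5 * dt * k1)
        k3 = f(u + 0.5 * dt * k2)
        k4 = f(u + dt * k3)
        return u + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

    # Two trajectories starting 1e-8 apart diverge to the size of the attractor.
    dt, n_steps = 0.01, 5000
    u_a = np.array([1.0, 1.0, 1.0])
    u_b = u_a + np.array([1e-8, 0.0, 0.0])
    for _ in range(n_steps):
        u_a, u_b = rk4_step(lorenz, u_a, dt), rk4_step(lorenz, u_b, dt)
    print("separation after 50 time units:", np.linalg.norm(u_a - u_b))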

SLIDE 3


A single trajectory

SLIDE 4


An ensemble of trajectories

Small errors in the initial state will get larger over time

SLIDE 5


Suppose that we make a partial observation of the ‘true’ trajectory every τ seconds:

    x(nτ) + ξn,   n = 1, 2, …

where ξn is a source of observational noise. Can we combine the model and the data to improve the estimate of the current state?

SLIDE 6

Assimilating the x data


The blue curve is the true trajectory. The dots are an ensemble of state estimates. The state estimates synchronize with the true state.

SLIDE 7

Not just for toy models …


This precise idea is used in numerical weather prediction (and many, many more applications). The model can be extremely high dimensional (~10^9 variables). The NWP model is a family of layered, discretized partial differential equations (PDEs).

SLIDE 8

Not just for toy models …


Enormous amounts of observational data, which are accurate (small noise) but very sparse (we do not observe the whole ocean, or the small scales).

Observational data are not simply ‘observing the x variable’, but complicated functions of many variables (e.g. how does this GPS-enabled buoy move through the ocean?).

SLIDE 9

The next 25 slides

  • 1. The math behind data assimilation
  • 2. The linear model scenario
  • 3. The nonlinear model scenario


SLIDE 10

The math of data assimilation


SLIDE 11


The model

We are tracking a d-dimensional vector un whose motion is governed by the discrete-time random dynamical system

    un+1 = ψ(un) + ηn

where ηn ∼ N(0, Σ) is iid Gaussian noise (zero in the Lorenz example) and ψ is a deterministic map.

The initial state u0 is not known exactly, but is known to be distributed like a Gaussian:

    u0 ∼ N(m0, C0)

SLIDE 12


The data

We make a partial, noisy observation at every time step:

    zn+1 = h(un+1) + ξn+1

where ξn+1 ∼ N(0, Γ) is iid Gaussian noise and h is the observation function.

SLIDE 13

Bayes’ formula


The state un is a random variable. We would like to know the conditional distribution of un given all the data up to time n, Zn = {z1, z2, …, zn}. By Bayes’ formula we have

    P(un+1 | Zn+1) = P(un+1 | Zn, zn+1)
                   = P(zn+1 | un+1, Zn) P(un+1 | Zn) / P(zn+1 | Zn)
                   = P(zn+1 | un+1) P(un+1 | Zn) / P(zn+1 | Zn)

SLIDE 14

The filtering cycle


Ignoring the normalization constant, we have

    P(un+1 | Zn+1) ∝ P(zn+1 | un+1) P(un+1 | Zn)

We can use this formula to perform an update procedure

    P(un | Zn) ↦ P(un+1 | Zn+1)

which is called the filtering cycle.
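As a concrete numerical picture of this update (a toy sketch of my own, not from the talk: a hypothetical scalar state on a grid, with a Gaussian forecast density and Gaussian observation noise), one cycle is a pointwise multiplication followed by renormalization:

    import numpy as np

    grid = np.linspace(-5.0, 5.0, 1001)
    dx = grid[1] - grid[0]

    forecast = np.exp(-0.5 * (grid - 1.0) ** 2)            # P(u_{n+1} | Z_n), up to a constant
    z, gamma = 0.2, 0.5                                    # observation and its noise std
    likelihood = np.exp(-0.5 * ((z - grid) / gamma) ** 2)  # P(z_{n+1} | u_{n+1})

    posterior = likelihood * forecast                      # Bayes' formula, unnormalized
    posterior /= posterior.sum() * dx                      # normalize numerically
    print("posterior mean:", (grid * posterior).sum() * dx)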

SLIDE 15

The conditional distribution at time n: P(un | Zn)

[Figure: cloud of points in the (x, y) plane; ‘obs’ marks the observation and Ψ the model map.]

SLIDE 16

The ‘forecast’ distribution (i.e. the prior): P(un+1 | Zn), obtained by applying ψ(un) + ηn

[Figure: the forecast cloud in the (x, y) plane; ‘obs’ marks the observation.]

SLIDE 17

Make an observation (here just the x variable)

[Figure: the observation ‘obs’ marked in the (x, y) plane.]

SLIDE 18

Re-weight the forecast distribution with the likelihood of the observation:

    P(un+1 | Zn+1) ∝ P(zn+1 | un+1) P(un+1 | Zn)

The green distribution is the posterior.

[Figure: the re-weighted cloud in the (x, y) plane; ‘obs’ marks the observation.]

SLIDE 19

In the linear model scenario, everything has a formula!


SLIDE 20

The Kalman filter


When the dynamics ψ and the observation function h are both linear, the conditional random variable un | Zn is Gaussian and is characterized completely by its mean and covariance (mn, Cn). The mean and covariance satisfy a recursion formula:

    mn+1 = (I − Kn+1 H) ψ(mn) + Kn+1 zn+1
    Cn+1 = (I − Kn+1 H)(ψ Cn ψᵀ + Σ)

The Kalman gain Kn+1 is the correct convex combination of model and data; it is determined by the forecast and data covariances.
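In code, one cycle of this recursion might look like the sketch below (my own, assuming the linear model map is given as a matrix Psi; the gain formula K = Ĉ Hᵀ (H Ĉ Hᵀ + Γ)⁻¹, with Ĉ the forecast covariance, is the standard expression determined by the forecast and data covariances):

    import numpy as np

    def kalman_step(m, C, Psi, Sigma, H, Gamma, z):
        # One filtering cycle in the slide's notation:
        #   m, C       - mean and covariance of u_n | Z_n
        #   Psi, Sigma - linear model map and model-noise covariance
        #   H, Gamma   - linear observation map and observation-noise covariance
        #   z          - the new observation z_{n+1}
        m_hat = Psi @ m                     # forecast mean
        C_hat = Psi @ C @ Psi.T + Sigma     # forecast covariance
        K = C_hat @ H.T @ np.linalg.inv(H @ C_hat @ H.T + Gamma)
        m_new = (np.eye(len(m)) - K @ H) @ m_hat + K @ z
        C_new = (np.eye(len(m)) - K @ H) @ C_hat
        return m_new, C_new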

SLIDE 21

What can we do for nonlinear models?


SLIDE 22

3DVAR algorithm


Obtain a state estimate mn using the Kalman update formula

    mn+1 = (I − K H) ψ(mn) + K zn+1

Since the model is nonlinear, distributions are no longer Gaussian and there is no ‘correct’ choice for the Kalman gain.

SLIDE 23

For the Lorenz example, take H = (1, 0, 0) and K = (1, 1, 1)ᵀ. The update becomes

    mn+1 = ( x((n+1)τ) + ξn+1,
             ψy(mn) + (x((n+1)τ) + ξn+1 − ψx(mn)),
             ψz(mn) + (x((n+1)τ) + ξn+1 − ψx(mn)) )
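A runnable sketch of this filter (my own; it reuses lorenz() and rk4_step() from the earlier Lorenz snippet, and τ, the noise level, and the initial estimate are illustrative choices):

    import numpy as np
    # Assumes lorenz() and rk4_step() from the earlier snippet are in scope.

    def flow(u, tau, dt=0.01):
        # The map psi: integrate the Lorenz model forward by time tau.
        for _ in range(int(round(tau / dt))):
            u = rk4_step(lorenz, u, dt)
        return u

    tau, n_obs, gamma = 0.1, 200, 0.1          # illustrative values
    K = np.array([1.0, 1.0, 1.0])              # the fixed gain K = (1, 1, 1)^T
    rng = np.random.default_rng(0)

    truth = np.array([1.0, 1.0, 1.0])
    m = np.array([5.0, -5.0, 20.0])            # badly wrong initial estimate
    for _ in range(n_obs):
        truth = flow(truth, tau)
        z = truth[0] + gamma * rng.standard_normal()  # observe x only: H = (1, 0, 0)
        m = flow(m, tau)
        m = m + K * (z - m[0])                 # 3DVAR update, as in the formula above
    print("final estimation error:", np.linalg.norm(m - truth))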

SLIDE 24


The 3DVAR algorithm is accurate (if properly tuned), in that the estimates get closer to the true state when the observational noise is small and when enough variables are observed.

The Ensemble Kalman filter (EnKF) uses an ensemble of ‘particles’ to find a state estimate and quantify the uncertainty of the estimate.

When the observations are sparse, we cannot expect accuracy. Instead we would like a set of samples from the posterior P(un | Zn).

SLIDE 25


The EnKF maintains an ensemble {u(1)n, u(2)n, …, u(K)n} to represent the whole posterior and not just the mean.

Each particle is updated much like the 3DVAR:

    u(k)n+1 = (I − Kn+1 H) ψ(u(k)n) + Kn+1 zn+1

But the Kalman gain is chosen using the empirical covariance of the ‘forecast ensemble’ {ψ(u(1)n), …, ψ(u(K)n)}.
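One analysis step of this scheme might be sketched as follows (my own, assuming a linear observation matrix H; a comment notes the ‘perturbed observation’ variant often used in practice):

    import numpy as np

    def enkf_analysis(forecast, H, Gamma, z):
        # forecast : (K, d) array of forecast particles psi(u(k)_n)
        # H, Gamma : observation matrix (p, d) and observation-noise covariance (p, p)
        # z        : the observation z_{n+1}, shape (p,)
        C_hat = np.cov(forecast.T)             # empirical forecast covariance
        gain = C_hat @ H.T @ np.linalg.inv(H @ C_hat @ H.T + Gamma)
        # Slide's update: u(k)_{n+1} = (I - K_{n+1} H) psi(u(k)_n) + K_{n+1} z_{n+1}.
        # (Operational EnKFs often give each particle its own perturbed copy of z.)
        return forecast + (z - forecast @ H.T) @ gain.T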

SLIDE 26

The ensemble {u(1)n, u(2)n, …, u(K)n} represents the posterior at time n

[Figure: particle cloud in the (x, y) plane; ‘obs’ marks the observation and Ψ the model map.]

SLIDE 27

The forecast ensemble {ψ(u(1)n), …, ψ(u(K)n)} represents the distribution P(un+1 | Zn)

[Figure: forecast particle cloud in the (x, y) plane; ‘obs’ marks the observation.]

SLIDE 28

The ensemble is updated to {u(1)n+1, …, u(K)n+1} to incorporate the observation. The convex combination uses the forecast covariance.

[Figure: updated particle cloud in the (x, y) plane; ‘obs’ marks the observation.]

SLIDE 29

The Canadian Weather Bureau uses EnKF for operational NWP, with around 100 particles for a ~10^9-dimensional state variable (!!!)

EnKF works surprisingly well with high dimensional models that are effectively lower dimensional: the covariance of the forecast ensemble {ψ(u(1)n), …, ψ(u(K)n)} only represents the directions of highest model variation, which often span a much smaller dimension than the state.

SLIDE 30

EnKF and the Sombrero SDE


Consider the two-dimensional stochastic differential equation for u = (x, y)

    du = −∇V(u) dt + σ dW

with the ‘Sombrero potential’ V(x, y) = (1 − x² − y²)². And we only observe the x variable.
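A minimal Euler-Maruyama simulation of this SDE (my own sketch; σ, the step size, and the horizon are illustrative choices):

    import numpy as np

    def grad_V(u):
        # Gradient of the Sombrero potential V(x, y) = (1 - x^2 - y^2)^2.
        x, y = u
        common = -4.0 * (1.0 - x**2 - y**2)
        return np.array([common * x, common * y])

    dt, n_steps, sigma = 1e-3, 20_000, 0.5
    rng = np.random.default_rng(1)
    u = np.array([1.0, 0.0])
    for _ in range(n_steps):
        u = u - grad_V(u) * dt + sigma * np.sqrt(dt) * rng.standard_normal(2)
    print("final state (hovers near the ring x^2 + y^2 = 1):", u)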

SLIDE 31


EnKF - only observing x; the big red star is the true state

SLIDE 32


EnKF - exact same truth, different model noise

SLIDE 33


Particle filter - sampling from the posterior, not just tracking the truth
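For contrast with the EnKF update above, here is a minimal bootstrap particle filter step (my own sketch, assuming a scalar observation of the x coordinate with noise standard deviation gamma; propagate stands for ψ plus model noise):

    import numpy as np

    def particle_filter_step(particles, propagate, z, gamma, rng):
        # particles : (K, d) array of samples from P(u_n | Z_n)
        # propagate : function applying psi plus model noise to one particle
        # z, gamma  : scalar observation of x and its noise std
        forecast = np.array([propagate(p, rng) for p in particles])
        # Weight each particle by the likelihood of the observation (h(u) = x here).
        log_w = -0.5 * ((z - forecast[:, 0]) / gamma) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # Resample: replace the weighted cloud by an unweighted one (may duplicate particles).
        idx = rng.choice(len(particles), size=len(particles), p=w)
        return forecast[idx]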

SLIDE 34

Pros and cons


  • Kalman filter - very efficient, but requires linearity; can be expensive in high dimensions.
  • 3DVAR - very efficient, but requires tuning and has no uncertainty quantification.
  • EnKF - very efficient, works in high dimensions, provides UQ, but not for non-Gaussians.
  • Particle filter - samples from the posterior, but very inefficient in high dimensions.

SLIDE 35

Thank you for listening!

Slides are on my website dtbkelly.com
