Data-driven stochastic model reduction Fei Lu joint with Alexandre - PowerPoint PPT Presentation

Data-driven stochastic model reduction Fei Lu joint with Alexandre J. Chorin and Kevin Lin UC Berkeley Lawrence Berkeley National Laboratory U of Arizona July 18, 2016 ICERM, Brown University 1 / 22

Outline Part I: Motivation for stochastic model reduction Part II: A discrete-time approach Part III: Examples 2 / 22

Motivation Climate modeling, numerical weather prediction: High-dimensional Discrete partial Prediction Full system data d Forecast dt x = f ( x ) + U ( x , y ) , Observe only { x ( n δ ) } N n = 1 . x ( t ) , t ≥ N δ . d dt y = g ( x , y ) . Possible approaches: Estimate y , and solve the full system ◮ The full system is expensive to solve ◮ The full system may be unknown Construct a reduced model for x 3 / 22

Why stochastic models Data { x ( n δ ) } ⇒ properties of the full system: long-term statistics (ergodicity) N � 1 F ( x ( n i δ )) → E [ F ( x )] N i = 1 positive Lyapunov exponents the effects of y act like random forces Goal: develop a stochastic reduced system for x that can reproduce long-term statistics; make medium range predictions. 4 / 22

Outline Part I: Motivation for stochastic model reduction Part II: A discrete-time approach Previous work: continuous-time approach Discrete-time stochastic parametrization Part III: Examples 5 / 22

The Mori-Zwanzig Formalism ( Mori (65) and Zwanzig (73) ) Rewrite the deterministic system in a form which resembles a generalized Langevin equation ( Chorin-Hald (06) ): Z t dX t dt = f ( X t ) + Γ( t − s , X s ) ds + W t 0 Markovian memory noise Chorin et al (00), Zwanzig(01), Chorin-Hald(06) equation + a projection operator + Duhamel’ Message: memory exists in a closed system for x . Multi-level regression Atmospheric sciences: Kondrashov et al (05-15), Wilks (05) Hypoelliptic SDEs ( Majda-Harlim (13,14) ) dX t = f ( X t ) dt + V t dt , dV t = a ( X t , V t ) dt + σ dW t . 6 / 22

Continuous-time approach: Discrete partial data dX t = f ( X t ) dt + V t dt , parameter estimation structure dV t = a ( X t , V t ) dt + σ dW t . Prediction derivation A continuous-time A discrete-time stochastic system stochastic system discretization 7 / 22

Continuous-time approach: Discrete partial data dX t = f ( X t ) dt + V t dt , parameter estimation structure dV t = a ( X t , V t ) dt + σ dW t . Prediction derivation A continuous-time A discrete-time stochastic system stochastic system discretization Discrete-time approach: Discrete partial data X n = Φ( X n − p : n − 1 , ξ n − q : n − 1 ) + ξ n parameter structure estimation derivation Prediction A discrete-time stochastic system 8 / 22

Discrete-time stochastic parametrization Identify a discrete-time stochastic system: NARMA ( Chorin-Lu (15) ) X n = X n − 1 + δ R δ ( X n − 1 ) + δ Z n , Z n = Φ n + ξ n , p q r s � � � � Φ n = a j Z n − j + b i , j P i ( X n − j ) + c j ξ n − j ; j = 1 j = 1 i = 1 j = 1 R δ ( X n − 1 ) from a discrete scheme for x ′ ≈ f ( x ) ; Φ n = Φ( Z n − p : n − 1 , X n − r : n − 1 , ξ n − q : n − 1 ) depends on the past; ξ n are i.i.d N ( 0 , σ 2 ) ; a Nonlinear AutoRegression Moving Average model (NARMA). Structure: terms in Φ n ; Parameters: a j , b j , c j , and σ . 9 / 22

X n = X n − 1 + δ R δ ( X n − 1 ) + δ Z n , Z n = Φ n + ξ n , p r s q � � � � Φ n = a j Z n − j + b i , j P i ( X n − j ) + c j ξ n − j ; j = 1 j = 1 i = 1 j = 1 Parameter estimation: conditional maximum likelihood estimator (MLE) Conditional on ξ 1 , . . . , ξ q , the log-likelihood of X q + 1 : N is l ( θ | ξ 1 , . . . , ξ q ) = − � N | Z n − Φ n | 2 − N − q ln σ 2 . n = q + 1 2 σ 2 2 when q = 0, MLE = least squares estimator (LSE) when q > 0, set ξ 1 = · · · = ξ q = 0 ◮ ergodicity ⇒ MLE is consistent ⇒ independent of ξ 1 , . . . , ξ q Structure derivation/selection: stability, long-term statistics numerical schemes analytical properties of the full system 10 / 22

Outline Part I: Motivation for stochastic model reduction Part II: A discrete-time approach Part III: Examples: The Lorenz 96 system The Kuramoto-Sivashinsky equation 11 / 22

Example I: the Lorenz 96 system � dt x k = x k − 1 ( x k + 1 − x k − 2 ) − x k + 10 − 1 d y j , k , J j dt y j , k = 1 d ε [ y j + 1 , k ( y j − 1 , k − y j + 2 , k ) − y j , k + x k ] , where x ∈ R K , y ∈ R JK . a chaotic system K = 18 , J = 20: y ∈ R 360 Find a reduced system for x ∈ R 18 based on ➣ Data { x ( n δ ) } N n = 1 ➣ d dt x k ≈ x k − 1 ( x k + 1 − x k − 2 ) − x k + 10. Wilks, 2005 12 / 22

� NARMA [Chorin-Lu (15)] : x n − 1 + δ R δ ( x n − 1 ) + δ z n ; z n = Φ n + ξ n , x n = p r d x � � � a j z n − j + b j , l ( x n − j ) l Φ n = µ + j = 1 j = 1 l = 1 d R q s � � � c j , l ( R δ ( x n − j )) l + d j ξ n − j . + j = 1 l = 1 j = 1 � Polynomial autoregression (POLYAR) [Wilks (05)]: d = x k − 1 ( x k + 1 − x k − 2 ) − x k + 10 + U , dt x k U = P ( x k ) + η k , with η k ( t + δ ) = φη k ( t ) + ξ ( t ) where P ( x ) = � d j = 0 a j x j ; ξ ( t ) ∼ N ( 0 , σ 2 ) . 13 / 22

Long-term statistics run the reduced systems for a long time; compare with the data: 0.14 1 L96 L96 POLYAR POLYAR NARMAX NARMAX 0.8 0.12 0.6 0.1 0.4 0.08 pdf ACF 0.2 0.06 0 0.04 − 0.2 0.02 − 0.4 0 − 0.6 − 10 − 5 0 5 10 15 0 1 2 3 4 5 6 7 8 9 10 x 1 time empirical probability density function autocorrelation function 14 / 22

Short-term prediction A typical ensemble forecast: RMSE of many forecasts: 20 10 POLYAR 18 5 X 1 16 POLYAR 0 14 − 5 12 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 time RMSE 10 NARMAX 8 10 NARMAX 6 5 X 1 4 0 2 − 5 0 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 lead time time Forecast time: forecast trajectories in cyan POLYAR: t ≈ 1 true trajectory in blue NARMA: t ≈ 2 . 5 15 / 22

Example II: the Kuramoto-Sivashinsky equation The KSE with periodic BC: v ( x , t ) = v ( x + 2 πν, t ) ∂ t + ∂ 2 v ∂ x 2 + ∂ 4 v ∂ v ∂ x 4 + v ∂ v ∂ x = 0 , x ∈ R , t > 0 . spatio-temporally chaotic: Fourier: v ( x , t ) = � + ∞ k = −∞ ˆ v k e iq k x , (here q k = k ν ) ∞ d v k − iq k v k = ( q 2 k − q 4 � dt ˆ k )ˆ ˆ v l ˆ v k − l . 2 l = −∞ Problem setting: ν = 3 . 43 � observe only K = 5 modes, (ˆ v k , k = 1 , . . . , K ) � predict their evolution 16 / 22

The truncated system is not accurate: d v k − iq k v k = ( q 2 k − q 4 � dt ˆ k )ˆ ˆ v l ˆ | k | ≤ K . v k − l , 2 | l |≤ K , | k − l |≤ K NARMA compute R δ ( X n − 1 ) ( from the K -mode X n = X n − 1 + δ R δ ( X n − 1 ) + δ Z n , truncated system) Z n = Φ n + ξ n , structure for Φ n : p q � � a j Z n − j + c j ξ n − j Φ n = ◮ different modes have different j = 1 j = 1 dynamics r s � � b i , j P i ( X n − j ) . + ◮ nonlinear interaction between j = 1 i = 1 modes 17 / 22

Large time behavior of the KSE ( Constantin, Jolly, Kevrekidis, Titi et al (88-94) ) Inertial manifold M Let v = u + w . Rewrite the KSE: if M is the graph of a function du dt = Au + Pf ( u + w ) w = ψ ( u ) , du dw dt = Au + Pf ( u + ψ ( u )) . dt = Aw + Qf ( u + w ) Approximate inertial manifolds (AIMs) Approximate the function w = ψ ( u ) : dw dt ≈ 0 ⇒ w ≈ A − 1 Qf ( u + w ) , Fixed point: ψ 0 = 0; ψ n + 1 = A − 1 Qf ( u + ψ n ) . An accurate AIM requires m = dim ( u ) to be large! ( distance to the attractor ∼ | q m | − γ ) 18 / 22

NARMA with AIMs ( Lu-Lin-Chorin (15) ) The AIMs hint at how the high modes depend on the low modes: � � � A − 1 Qf ( u ) ˆ k ⇒ ˆ v l ˆ ˆ | k | > K : v k ≈ ψ 1 , k = v k ≈ c k v k − l . 1 ≤| l | , | k − l |≤ K Important: keep the terms, and estimate the coefficients � u n 1 ≤ j ≤ K ; j , u n � i � K j = l = j − K u n l u n j − l , K < j ≤ 2 K . A discrete-time stochastic system: u n k = u n − 1 + δ R δ k ( u n − 1 ) + δ z n k , k z n k = Φ n k + ξ n k , p r K � � � Φ n a k , j z n − j b k , j u n − j u n u n c k , j � j + K � k = µ k + + + j + K − k k k j = 1 j = 0 j = 1 q � k ( u n ) + d k , j ξ n − j + c k , ( K + 1 ) R δ . k j = 1 19 / 22

Long-term statistics: data data truncated system truncated system 0.8 NARMAX NARMAX 0 10 0.6 ACF pdf 0.4 0.2 − 2 10 0 − 0.2 − 0.4 − 0.2 0 0.2 0.4 0.6 0 10 20 30 40 50 v x 4 time lead time auto-correlation function probability density function 20 / 22

Short-term prediction A typical forecast: RMSE of many forecasts: 0.5 the truncated system 15 v 4 0 the truncated system − 0.5 RMSE 10 20 40 60 80 ensemble size = 1 0.4 NARMA 0.2 5 NARMA 0 v 4 − 0.2 ensemble size = 20 − 0.4 0 20 40 60 80 20 40 60 80 lead time time time t Forecast time: the truncated system: t ≈ 15 the NARMA system: t ≈ 50 21 / 22

Summary Data-driven stochastic model reduction: High-dimensional Full system Discrete-time stochastic Prediction reduced system Discrete partial data X n = Φ( X n − p : n − 1 , ξ n − q : n − 1 ) + Ψ( X n − p : n − 1 , ξ n − q : n − 1 ) ξ n integrate numerical scheme into statistical inference easy parameter estimation flexible structure selection Thanks for your attention! 22 / 22

Data-driven stochastic model reduction Fei Lu joint with Alexandre - PowerPoint PPT Presentation

Data-driven stochastic model reduction Fei Lu joint with Alexandre J. Chorin and Kevin Lin UC Berkeley Lawrence Berkeley National Laboratory U of Arizona July 18, 2016 ICERM, Brown University 1 / 22 Outline Part I: Motivation for

Priority-Driven Scheduling of Periodic Tasks Priority-driven vs. clock-driven scheduling:

Diffusion-Driven Congestion Reduction Diffusion-Driven Congestion Reduction for Substrate

False fasting is driven by pride False fasting is driven by pride False fasting is

Data-driven model reduction for stochastic Burgers equations Fei Lu Department of Mathematics,

Stochastic Processes Will Perkins March 7, 2013 Stochastic Processes Q: What is a Stochastic

What If We Only Have Stochastic . . . What if the Stochastic . . . Approximate Stochastic

SAS data reduction Haydyn Mertens (EMBL-Hamburg) Data reduction steps Acquisition Reduction

Introduction to Harm Reduction Definition of Harm Reduction Harm reduction refers to policies,

Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure

Data-driven time parallelism and model reduction Kevin Carlberg 1 , Lukas Brencher 2 , Bernard

Outline 1 Presentation of the problem Truncated Stochastic Algorithms and Variance Reduction:

Entropy Estimation on the Basis Stochastic Model of a Stochastic Model Werner Schindler Bundesamt

Stochastic Model Checking Stochastic Model Checking Marta Kwiatkowska Kwiatkowska Marta

Stochastic Simulation Variance reduction methods Bo Friis Nielsen Applied Mathematics and

Stochastic Simulation Methods: Variance reduction methods Antithetic variables Bo Friis

STAT 209 Dimensionality Reduction November 26, 2019 Colin Reimer Dawson 1 / 24 Dimensionality

Partial Galaxy Clustering : An Estimator Incorporating Probabilistic Distance Measurements Humna

Lecture 6 Discrete Time Series 9/21/2018 1 Discrete Time Series Stationary Processes A

Sailing Through Data: Discoveries and Mirages Emmanuel Cand` es, Stanford University 2018 Machine

CAD&CG Bundle Adjustment

Lecture 16: Linear Prediction Mark Hasegawa-Johnson ECE 401: Signal and Image Analysis, Fall 2020

Instance-level recognition: Local invariant features Cordelia Schmid INRIA, Grenoble Overview

single-electron devices interpreted by means of waiting time distributions Vincent Talbo, Javier

PhD Research Proposal A SAT+CAS system for checking math conjectures Curtis Bright University of

Data-driven stochastic model reduction Fei Lu joint with Alexandre - PowerPoint PPT Presentation

Data-driven stochastic model reduction Fei Lu joint with Alexandre J. Chorin and Kevin Lin UC Berkeley Lawrence Berkeley National Laboratory U of Arizona July 18, 2016 ICERM, Brown University 1 / 22 Outline Part I: Motivation for

Priority-Driven Scheduling of Periodic Tasks Priority-driven vs. clock-driven scheduling:

Diffusion-Driven Congestion Reduction Diffusion-Driven Congestion Reduction for Substrate

False fasting is driven by pride False fasting is driven by pride False fasting is

Data-driven model reduction for stochastic Burgers equations Fei Lu Department of Mathematics,

Stochastic Processes Will Perkins March 7, 2013 Stochastic Processes Q: What is a Stochastic

What If We Only Have Stochastic . . . What if the Stochastic . . . Approximate Stochastic

SAS data reduction Haydyn Mertens (EMBL-Hamburg) Data reduction steps Acquisition Reduction

Introduction to Harm Reduction Definition of Harm Reduction Harm reduction refers to policies,

Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure

Data-driven time parallelism and model reduction Kevin Carlberg 1 , Lukas Brencher 2 , Bernard

Outline 1 Presentation of the problem Truncated Stochastic Algorithms and Variance Reduction:

Entropy Estimation on the Basis Stochastic Model of a Stochastic Model Werner Schindler Bundesamt

Stochastic Model Checking Stochastic Model Checking Marta Kwiatkowska Kwiatkowska Marta

Stochastic Simulation Variance reduction methods Bo Friis Nielsen Applied Mathematics and

Stochastic Simulation Methods: Variance reduction methods Antithetic variables Bo Friis

STAT 209 Dimensionality Reduction November 26, 2019 Colin Reimer Dawson 1 / 24 Dimensionality

Partial Galaxy Clustering : An Estimator Incorporating Probabilistic Distance Measurements Humna

Lecture 6 Discrete Time Series 9/21/2018 1 Discrete Time Series Stationary Processes A

Sailing Through Data: Discoveries and Mirages Emmanuel Cand` es, Stanford University 2018 Machine

CAD&amp;CG Bundle Adjustment

Lecture 16: Linear Prediction Mark Hasegawa-Johnson ECE 401: Signal and Image Analysis, Fall 2020

Instance-level recognition: Local invariant features Cordelia Schmid INRIA, Grenoble Overview

single-electron devices interpreted by means of waiting time distributions Vincent Talbo, Javier

PhD Research Proposal A SAT+CAS system for checking math conjectures Curtis Bright University of

CAD&CG Bundle Adjustment