
slide-1
SLIDE 1

Hidden Physics Models

Maziar Raissi September 14, 2017

Division of Applied Mathematics Brown University, Providence, RI, USA maziar_raissi@brown.edu

slide-2
SLIDE 2

Problem Setup

slide-3
SLIDE 3

Problem Setup

Let us consider parametrized and nonlinear partial differential equations of the general form

h_t + N^λ_x h = 0,  x ∈ Ω,  t ∈ [0, T],

where h(t, x) denotes the latent (hidden) solution, N^λ_x is a nonlinear operator parametrized by λ, and Ω is a subset of R^D.

Example
As an example, the one dimensional Burgers' equation corresponds to the case where N^λ_x h = λ1 h h_x − λ2 h_xx and λ = (λ1, λ2).

Raissi, Maziar, and George Em Karniadakis. “Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations.” arXiv preprint arXiv:1708.00588 (2017). 1

slide-6
SLIDE 6

Two Distinct Problems

Given noisy measurements of the system, one is typically interested in the solution of two distinct problems.

Identification Problem
The first problem is that of learning, system identification, or data driven discovery of partial differential equations, stating: what are the parameters λ that best describe the observed data?

Inference Problem
The second problem is that of inference, filtering and smoothing, or data driven solutions of partial differential equations, which states: given fixed model parameters λ, what can be said about the unknown hidden state h(t, x) of the system?

Raissi, Maziar, and George Em Karniadakis. “Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations.” arXiv preprint arXiv:1708.00588 (2017). 2

slide-7
SLIDE 7

Identification Problem – Hidden Physics Models

Raissi, Maziar, and George Em Karniadakis. “Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations.” arXiv preprint arXiv:1708.00588 (2017). 3

slide-8
SLIDE 8

Inference Problem – Numerical Gaussian Processes

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 4

slide-9
SLIDE 9

Gaussian Processes

slide-11
SLIDE 11

Gaussian processes

A Gaussian process f(x) ∼ GP(0, k(x, x′; θ)) is just a shorthand notation for

[f(x); f(x′)] ∼ N( 0, [k(x, x; θ), k(x, x′; θ); k(x′, x; θ), k(x′, x′; θ)] ).

Rasmussen, Carl Edward, and Christopher KI Williams. “Gaussian processes for machine learning. 2006.” The MIT Press, Cambridge, MA, USA 38 (2006): 715-719. 6

slide-12
SLIDE 12

Covariance Function

A typical example for the kernel k(x, x′; θ) is the squared exponential covariance function

k(x, x′; θ) = γ² exp( −(x − x′)² / (2w²) ),

where θ = (γ, w) are the hyper-parameters of the kernel.

Rasmussen, Carl Edward, and Christopher KI Williams. “Gaussian processes for machine learning. 2006.” The MIT Press, Cambridge, MA, USA 38 (2006): 715-719. 7
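
As a concrete illustration, a minimal NumPy sketch of this squared exponential kernel might look as follows; the function and argument names are mine, not taken from the accompanying codes.

```python
import numpy as np

def sq_exp_kernel(x, x_prime, gamma, w):
    """k(x, x'; theta) = gamma^2 * exp(-(x - x')^2 / (2 w^2)) for 1-D inputs.

    Returns the len(x) x len(x_prime) matrix of pairwise covariances.
    """
    diff = x[:, None] - x_prime[None, :]          # pairwise differences
    return gamma**2 * np.exp(-0.5 * diff**2 / w**2)
```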

slide-13
SLIDE 13

Training

Given a dataset {x, y} of size N, the hyper-parameters θ and the noise variance parameter σ can be trained by minimizing the negative log marginal likelihood

NLML(θ, σ) = (1/2) yᵀK⁻¹y + (1/2) log |K| + (N/2) log(2π),

resulting from y ∼ N(0, K), where K = k(x, x; θ) + σ²I.

Rasmussen, Carl Edward, and Christopher KI Williams. “Gaussian processes for machine learning. 2006.” The MIT Press, Cambridge, MA, USA 38 (2006): 715-719. 8
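
A hedged sketch of this training step, reusing the sq_exp_kernel function above together with SciPy's L-BFGS-B optimizer; optimizing in log-space to keep γ, w, and σ positive is an illustrative choice, not a detail from the slides.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.optimize import minimize

def nlml(log_params, x, y):
    """Negative log marginal likelihood of y ~ N(0, K) with K = k(x, x; theta) + sigma^2 I."""
    gamma, w, sigma = np.exp(log_params)
    N = y.size
    K = sq_exp_kernel(x, x, gamma, w) + sigma**2 * np.eye(N)
    L, lower = cho_factor(K, lower=True)            # Cholesky gives a stable K^{-1} y and log|K|
    alpha = cho_solve((L, lower), y)
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * N * np.log(2.0 * np.pi)

# x, y : 1-D arrays of training inputs and noisy observations
# opt = minimize(nlml, x0=np.zeros(3), args=(x, y), method="L-BFGS-B")
```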

slide-15
SLIDE 15

Prediction

Having trained the model, one can use the posterior distribution

f(x*) | y ∼ N( k(x*, x)K⁻¹y, k(x*, x*) − k(x*, x)K⁻¹k(x, x*) ),

to make predictions at a new test point x*.

Further Explanation
This is obtained by writing the joint distribution

[f(x*); y] ∼ N( 0, [k(x*, x*), k(x*, x); k(x, x*), K] ),

and conditioning on the observed data y.

Rasmussen, Carl Edward, and Christopher KI Williams. “Gaussian processes for machine learning. 2006.” The MIT Press, Cambridge, MA, USA 38 (2006): 715-719. 9
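
A minimal prediction routine consistent with these formulas, again building on the sq_exp_kernel sketch above; x_star plays the role of the new test points, and the names are illustrative.

```python
import numpy as np

def gp_predict(x_star, x, y, gamma, w, sigma):
    """Posterior mean and pointwise variance of f(x_star) | y."""
    K = sq_exp_kernel(x, x, gamma, w) + sigma**2 * np.eye(x.size)
    k_star = sq_exp_kernel(x_star, x, gamma, w)                  # k(x_star, x)
    mean = k_star @ np.linalg.solve(K, y)
    cov = sq_exp_kernel(x_star, x_star, gamma, w) - k_star @ np.linalg.solve(K, k_star.T)
    return mean, np.diag(cov)
```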

slide-16
SLIDE 16

Example

Code

Rasmussen, Carl Edward, and Christopher KI Williams. “Gaussian processes for machine learning. 2006.” The MIT Press, Cambridge, MA, USA 38 (2006): 715-719. 10

slide-17
SLIDE 17

Multi-fidelity Gaussian Processes

slide-18
SLIDE 18

Multi-fidelity Gaussian Processes

Let us consider the following autoregressive model fH(x) = ρ fL(x) + δ(x), where fL(x) and δ(x) are two independent Gaussian processes with fL(x) ∼ GP(0, k1(x, x′; θ1)) and δ(x) ∼ GP(0, k2(x, x′; θ2)).

Kennedy, Marc C., and Anthony O’Hagan. “Predicting the output from a complex computer code when fast approximations are available.” Biometrika 87.1 (2000): 1-13. 11

slide-19
SLIDE 19

Multi-fidelity Gaussian Processes

Therefore,

[fL(x); fH(x)] ∼ GP( 0, [kLL(x, x′; θ1), kLH(x, x′; θ1, ρ); kHL(x, x′; θ1, ρ), kHH(x, x′; θ1, θ2, ρ)] ),

with kLL(x, x′; θ1) = k1(x, x′; θ1), kLH(x, x′; θ1, ρ) = kHL(x′, x; θ1, ρ) = ρ k1(x, x′; θ1), and kHH(x, x′; θ1, θ2, ρ) = ρ² k1(x, x′; θ1) + k2(x, x′; θ2).

Kennedy, Marc C., and Anthony O’Hagan. “Predicting the output from a complex computer code when fast approximations are available.” Biometrika 87.1 (2000): 1-13. 12
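
A small sketch of how these blocks can be assembled in NumPy, using two squared exponential kernels k1 and k2 built from the earlier sq_exp_kernel; packing θ1 = (γ1, w1) and θ2 = (γ2, w2) as tuples is my own convention.

```python
import numpy as np

def multifidelity_K(xL, xH, theta1, theta2, rho):
    """Joint covariance of [f_L(xL); f_H(xH)] under f_H = rho * f_L + delta."""
    g1, w1 = theta1
    g2, w2 = theta2
    K_LL = sq_exp_kernel(xL, xL, g1, w1)                                # k_LL
    K_LH = rho * sq_exp_kernel(xL, xH, g1, w1)                          # k_LH = rho * k1
    K_HH = (rho**2 * sq_exp_kernel(xH, xH, g1, w1)
            + sq_exp_kernel(xH, xH, g2, w2))                            # k_HH = rho^2 k1 + k2
    return np.block([[K_LL, K_LH], [K_LH.T, K_HH]])
```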

slide-20
SLIDE 20

Training Data

In the following, we assume that we have access to data with two levels of fidelity {xL, yL}, {xH, yH}, where yH has a higher level of fidelity. Main Assumption We use NL to denote the number of observations in xL and NH to denote the sample size of xH. The main assumption is that NH ≪ NL.

Kennedy, Marc C., and Anthony O’Hagan. “Predicting the output from a complex computer code when fast approximations are available.” Biometrika 87.1 (2000): 1-13. 13

slide-21
SLIDE 21

Training

The hyper-parameters {θ1, θ2}, the parameter ρ, and the noise variance parameters {σL, σH} can be trained by minimizing the negative log marginal likelihood

NLML(θ1, θ2, ρ, σL, σH) = (1/2) yᵀK⁻¹y + (1/2) log |K| + ((NL + NH)/2) log(2π),

resulting from y := [yL; yH] ∼ N(0, K), where

K = [ kLL(xL, xL; θ1) + σL²I,  kLH(xL, xH; θ1, ρ);
      kHL(xH, xL; θ1, ρ),      kHH(xH, xH; θ1, θ2, ρ) + σH²I ].

Kennedy, Marc C., and Anthony O’Hagan. “Predicting the output from a complex computer code when fast approximations are available.” Biometrika 87.1 (2000): 1-13. 14
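
Continuing the sketch, the negative log marginal likelihood of the stacked data might be written as below, reusing multifidelity_K from the previous slide; packing all parameters into one vector and exponentiating the positive ones is an illustrative choice.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def mf_nlml(params, xL, yL, xH, yH):
    """NLML(theta1, theta2, rho, sigmaL, sigmaH) for y = [yL; yH] ~ N(0, K)."""
    g1, w1, g2, w2, sL, sH = np.exp(params[:6])
    rho = params[6]
    NL, NH = xL.size, xH.size
    K = multifidelity_K(xL, xH, (g1, w1), (g2, w2), rho)
    K += np.diag(np.concatenate([sL**2 * np.ones(NL), sH**2 * np.ones(NH)]))
    y = np.concatenate([yL, yH])
    L, lower = cho_factor(K, lower=True)
    return (0.5 * y @ cho_solve((L, lower), y)
            + np.sum(np.log(np.diag(L)))
            + 0.5 * (NL + NH) * np.log(2.0 * np.pi))
```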

slide-22
SLIDE 22

Prediction

Having trained the model, one can use the posterior distribution

fH(x) | y ∼ N( qᵀK⁻¹y, kHH(x, x) − qᵀK⁻¹q ),

to make predictions at a new test point x. Here, qᵀ = [ kHL(x, xL)  kHH(x, xH) ].

Kennedy, Marc C., and Anthony O’Hagan. “Predicting the output from a complex computer code when fast approximations are available.” Biometrika 87.1 (2000): 1-13. 15

slide-23
SLIDE 23

Example

Code

Kennedy, Marc C., and Anthony O’Hagan. “Predicting the output from a complex computer code when fast approximations are available.” Biometrika 87.1 (2000): 1-13. 16

slide-24
SLIDE 24

Deep Multi-fidelity Gaussian Processes

slide-25
SLIDE 25

Deep Multi-fidelity Gaussian Processes

A simple way to explain the main idea of this work is to consider the following structure:

[fL(h); fH(h)] ∼ GP( 0, [kLL(h, h′; θ1), kLH(h, h′; θ1, ρ); kHL(h, h′; θ1, ρ), kHH(h, h′; θ1, θ2, ρ)] ),

where h := h(x) and h′ := h(x′).

Raissi, Maziar, and George Karniadakis. “Deep Multi-fidelity Gaussian Processes.” arXiv preprint arXiv:1604.07484 (2016). 17

slide-26
SLIDE 26

Deep Multi-fidelity Gaussian Processes

Here, we employ multi-layer neural networks of the form h(x) := (hL ◦ . . . ◦ h1)(x), where each layer of the network performs the transformation hℓ(z) = σℓ(wℓz + bℓ), with σℓ being the transfer function, wℓ being the shared weights, and bℓ being the biases of each layer ℓ = 1, . . . , L.

Raissi, Maziar, and George Karniadakis. “Deep Multi-fidelity Gaussian Processes.” arXiv preprint arXiv:1604.07484 (2016). 18
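
A minimal sketch of such a warping in NumPy, with tanh standing in for the transfer function (the slides do not commit to a specific choice here); the warped inputs h(x), h(x′) then replace x, x′ in the multi-fidelity kernels of the previous slides.

```python
import numpy as np

def neural_net_warp(x, weights, biases):
    """h(x) = (h_L o ... o h_1)(x): feed-forward warping of the inputs before the kernel.

    x       : (N, D) array of inputs.
    weights : list of (D_in, D_out) weight matrices w_l.
    biases  : list of (D_out,) bias vectors b_l.
    """
    h = x
    for w, b in zip(weights, biases):
        h = np.tanh(h @ w + b)          # h_l(z) = sigma_l(w_l z + b_l)
    return h
```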

slide-27
SLIDE 27

Training

The parameters of the neural networks {w1, b1, . . . , wL, bL}, the hyper-parameters {θ1, θ2}, the parameter ρ, and the noise variance parameters {σL, σH} can be trained by minimizing the negative log marginal likelihood.

Raissi, Maziar, and George Karniadakis. “Deep Multi-fidelity Gaussian Processes.” arXiv preprint arXiv:1604.07484 (2016). 19

slide-28
SLIDE 28

Example

[Figure: deep multi-fidelity GP prediction, showing the posterior mean and a two standard deviation band together with the high and low fidelity data and the exact low and high fidelity functions.]

Raissi, Maziar, and George Karniadakis. “Deep Multi-fidelity Gaussian Processes.” arXiv preprint arXiv:1604.07484 (2016). 20

slide-29
SLIDE 29

Linear Differential Equations – Identification Problem

slide-30
SLIDE 30

Linear Differential Equations

Let us investigate governing equations of the form

L^ϕ_x u(x) = f(x),

where u(x) is the unknown solution to a differential equation defined by the operator L^ϕ_x, f(x) is a black-box forcing term, and x is a vector that can include space, time, or parameter coordinates.

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Machine learning of linear differential equations using Gaussian processes.” Journal of Computational Physics 348 (2017): 683-693. 21

slide-31
SLIDE 31

Prior

Let us start by making the assumption that u(x) is a Gaussian process with mean 0 and covariance function kuu(x, x′; θ), i.e., u(x) ∼ GP(0, kuu(x, x′; θ)), where θ denotes the hyper-parameters of the kernel kuu.

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Machine learning of linear differential equations using Gaussian processes.” Journal of Computational Physics 348 (2017): 683-693. 22

slide-33
SLIDE 33

Kernels

Consequently,

L^ϕ_x u(x) = f(x) ∼ GP(0, kff(x, x′; θ, ϕ)),

with the following fundamental relationship between the kernels kuu and kff:

kff(x, x′; θ, ϕ) = L^ϕ_x L^ϕ_x′ kuu(x, x′; θ).

Cross-covariance functions
Moreover, the covariance between u(x) and f(x′), and similarly the one between f(x) and u(x′), are given by

kuf(x, x′; θ, ϕ) = L^ϕ_x′ kuu(x, x′; θ),  and  kfu(x, x′; θ, ϕ) = L^ϕ_x kuu(x, x′; θ),

respectively.

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Machine learning of linear differential equations using Gaussian processes.” Journal of Computational Physics 348 (2017): 683-693. 23
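
These kernel relationships can be derived mechanically with a computer algebra system. The following SymPy sketch uses the simple illustrative operator L^ϕ_x u = ϕ u_x + u (my example, not an operator from the paper) applied to a squared exponential kuu.

```python
import sympy as sp

x, xp, gamma, w, phi = sp.symbols("x x' gamma w phi", real=True)
k_uu = gamma**2 * sp.exp(-(x - xp)**2 / (2 * w**2))

def L(expr, var):
    """Illustrative linear operator applied in `var`: phi * d/dvar + identity."""
    return phi * sp.diff(expr, var) + expr

k_uf = sp.simplify(L(k_uu, xp))            # k_uf(x, x') = L^phi_{x'} k_uu
k_fu = sp.simplify(L(k_uu, x))             # k_fu(x, x') = L^phi_{x}  k_uu
k_ff = sp.simplify(L(L(k_uu, xp), x))      # k_ff(x, x') = L^phi_{x} L^phi_{x'} k_uu

print(k_ff)
```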

slide-34
SLIDE 34

Training

The hyper-parameters θ and, more importantly, the parameters ϕ of the linear operator L^ϕ_x can be trained by employing a Quasi-Newton optimizer (L-BFGS) to minimize the negative log marginal likelihood

− log p(y | ϕ, θ, σu², σf²) = (1/2) yᵀK⁻¹y + (1/2) log |K| + ((Nu + Nf)/2) log 2π,

where y = [yu; yf], p(y | ϕ, θ, σu², σf²) = N(0, K), and K is given by

K = [ kuu(xu, xu; θ) + σu²I,  kuf(xu, xf; θ, ϕ);
      kfu(xf, xu; θ, ϕ),      kff(xf, xf; θ, ϕ) + σf²I ].

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Machine learning of linear differential equations using Gaussian processes.” Journal of Computational Physics 348 (2017): 683-693. 24

slide-35
SLIDE 35

Prediction

Having trained the model, one can predict the values u(x) and f(x) at a new test point x by writing the posterior distributions

p(u(x) | y) = N( u(x), su²(x) ),   p(f(x) | y) = N( f(x), sf²(x) ),

with

u(x) = quᵀ K⁻¹ y,   su²(x) = kuu(x, x) − quᵀ K⁻¹ qu,
f(x) = qfᵀ K⁻¹ y,   sf²(x) = kff(x, x) − qfᵀ K⁻¹ qf,

where

quᵀ = [ kuu(x, xu)  kuf(x, xf) ],   qfᵀ = [ kfu(x, xu)  kff(x, xf) ].

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Machine learning of linear differential equations using Gaussian processes.” Journal of Computational Physics 348 (2017): 683-693. 25

slide-36
SLIDE 36

Fractional Equation

Consider the one dimensional fractional equation

L^α_x u(x) = D^α_{−∞,x} u(x) − u(x) = f(x),

where α ∈ R is the fractional order of the operator, defined in the Riemann-Liouville sense. The exact value of the parameter is α = √2 ≈ 1.414214; the algorithm learns the value 1.412104.

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Machine learning of linear differential equations using Gaussian processes.” Journal of Computational Physics 348 (2017): 683-693. 26

slide-37
SLIDE 37

Fractional Equation

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Machine learning of linear differential equations using Gaussian processes.” Journal of Computational Physics 348 (2017): 683-693. 27

slide-38
SLIDE 38

Linear Differential Equations – Inference Problem

slide-39
SLIDE 39

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Inferring solutions of differential equations using noisy multi-fidelity data.” Journal of Computational Physics 335 (2017): 736-746. 28

slide-40
SLIDE 40

Hidden Physics Models

slide-41
SLIDE 41

Problem Setup

Let us consider parametrized and nonlinear partial differential equations of the general form

h_t + N^λ_x h = 0,  x ∈ Ω,  t ∈ [0, T],

where h(t, x) denotes the latent (hidden) solution, N^λ_x is a nonlinear operator parametrized by λ, and Ω is a subset of R^D.

Example
As an example, the one dimensional Burgers' equation corresponds to the case where N^λ_x h = λ1 h h_x − λ2 h_xx and λ = (λ1, λ2).

Raissi, Maziar, and George Em Karniadakis. “Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations.” arXiv preprint arXiv:1708.00588 (2017). 29

slide-42
SLIDE 42

Backward Euler

Let us employ the backward Euler time stepping scheme to obtain

h^n + ∆t N^λ_x h^n = h^{n−1}.

Here, h^n(x) = h(t^n, x) is the hidden state of the system at time t^n.

Raissi, Maziar, and George Em Karniadakis. “Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations.” arXiv preprint arXiv:1708.00588 (2017). 30

slide-43
SLIDE 43

Linearization

Approximating the nonlinear operator on the left-hand-side of the backward Euler scheme by a linear one, we obtain

L^λ_x h^n = h^{n−1}.

Example
For instance, the nonlinear operator

h^n + ∆t N^λ_x h^n = h^n + ∆t (λ1 h^n h^n_x − λ2 h^n_xx),

involved in the Burgers' equation can be approximated by the linear operator

L^λ_x h^n = h^n + ∆t (λ1 h^{n−1} h^n_x − λ2 h^n_xx),

where h^{n−1}(x) is the state of the system at the previous time t^{n−1}.

Raissi, Maziar, and George Em Karniadakis. “Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations.” arXiv preprint arXiv:1708.00588 (2017). 31

slide-44
SLIDE 44

Prior

We proceed by placing a Gaussian process prior over the latent function h^n(x); i.e., h^n(x) ∼ GP(0, k(x, x′; θ)). Here, θ denotes the hyper-parameters of the covariance function k.

Raissi, Maziar, and George Em Karniadakis. “Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations.” arXiv preprint arXiv:1708.00588 (2017). 32

slide-45
SLIDE 45

Hidden Physics Models

This enables us to capture the entire structure of the operator L^λ_x in the resulting multi-output Gaussian process

[h^n; h^{n−1}] ∼ GP( 0, [k^{n,n}, k^{n,n−1}; k^{n−1,n}, k^{n−1,n−1}] ).

Raissi, Maziar, and George Em Karniadakis. “Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations.” arXiv preprint arXiv:1708.00588 (2017). 33

slide-46
SLIDE 46

Training

Given noisy data {x^{n−1}, h^{n−1}} and {x^n, h^n} on the latent solution at times t^{n−1} and t^n, respectively, the hyper-parameters θ of the covariance functions and, more importantly, the parameters λ of the operators L^λ_x and N^λ_x can be learned by employing a Quasi-Newton optimizer (L-BFGS) to minimize the negative log marginal likelihood

− log p(h | θ, λ, σ²) = (1/2) hᵀK⁻¹h + (1/2) log |K| + (N/2) log(2π),   (1)

where h = [h^n; h^{n−1}], p(h | θ, λ, σ²) = N(0, K), and K is given by

K = [ k^{n,n}(x^n, x^n),        k^{n,n−1}(x^n, x^{n−1});
      k^{n−1,n}(x^{n−1}, x^n),  k^{n−1,n−1}(x^{n−1}, x^{n−1}) ] + σ²I.

Raissi, Maziar, and George Em Karniadakis. “Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations.” arXiv preprint arXiv:1708.00588 (2017). 34
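
As a rough sketch of how λ enters the covariance structure for the Burgers' example, the kernels k^{n,n−1} and k^{n−1,n−1} can be obtained symbolically by applying the linearized backward Euler operator of the previous slides to k^{n,n}; here the previous state h^{n−1}(x) is represented by a placeholder SymPy function h_prev, and all names are illustrative.

```python
import sympy as sp

x, xp, gamma, w, lam1, lam2, dt = sp.symbols("x x' gamma w lambda1 lambda2 Deltat", real=True)
h_prev = sp.Function("h_prev")                       # stands in for h^{n-1}(x)

k_nn = gamma**2 * sp.exp(-(x - xp)**2 / (2 * w**2))  # prior kernel of h^n

def L_lin(expr, var):
    """Linearized backward Euler operator: h + Dt*(lambda1 * h^{n-1} * h_x - lambda2 * h_xx)."""
    return (expr
            + dt * lam1 * h_prev(var) * sp.diff(expr, var)
            - dt * lam2 * sp.diff(expr, var, 2))

k_n_nm1 = L_lin(k_nn, xp)                    # k^{n,n-1}(x, x')   = L^lambda_{x'} k^{n,n}
k_nm1_nm1 = L_lin(L_lin(k_nn, xp), x)        # k^{n-1,n-1}(x, x') = L^lambda_{x} L^lambda_{x'} k^{n,n}
```

Because λ1 and λ2 appear explicitly in these kernels, minimizing the negative log marginal likelihood above with respect to θ and λ identifies the hidden physics directly from the two snapshots.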

slide-47
SLIDE 47

Kuramoto–Sivashinsky Equation

Raissi, Maziar, and George Em Karniadakis. “Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations.” arXiv preprint arXiv:1708.00588 (2017). 35

slide-48
SLIDE 48

Navier–Stokes Equation

Raissi, Maziar, and George Em Karniadakis. “Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations.” arXiv preprint arXiv:1708.00588 (2017). 36

slide-49
SLIDE 49

Lévy Processes

Raissi, Maziar, and George Em Karniadakis. “Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations.” arXiv preprint arXiv:1708.00588 (2017). 37

slide-50
SLIDE 50

Numerical Gaussian Processes

slide-51
SLIDE 51

Burgers’ Equation

In one space dimension the Burgers' equation reads as

u_t + u u_x = ν u_xx,

along with Dirichlet boundary conditions u(t, −1) = u(t, 1) = 0, where u(t, x) denotes the unknown solution and ν = 0.01/π is a viscosity parameter.

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 38

slide-52
SLIDE 52

Problem Setup

Let us assume that all we observe are noisy measurements {x^0, u^0} of the black-box initial function u(0, x).

Given such measurements, we would like to solve the Burgers' equation while propagating through time the uncertainty associated with the noisy initial data.

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 39

slide-53
SLIDE 53

Backward Euler

Let us apply the backward Euler scheme to the Burgers' equation. This can be written as

u^n + ∆t u^n (d/dx) u^n − ν ∆t (d²/dx²) u^n = u^{n−1}.

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 40

slide-54
SLIDE 54

Backward Euler

Let us apply the backward Euler scheme to the Burgers' equation. This can be written as

u^n + ∆t µ^{n−1} (d/dx) u^n − ν ∆t (d²/dx²) u^n = u^{n−1},

where the nonlinear coefficient u^n has been replaced by µ^{n−1}(x), the posterior mean of the previous time step, in order to linearize the scheme.

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 41

slide-55
SLIDE 55

Prior Assumption

Let us make the prior assumption that

u^n(x) ∼ GP(0, k(x, x′; θ))

is a Gaussian process with a neural network covariance function

k(x, x′; θ) = (2/π) sin⁻¹( 2(σ0² + σ²xx′) / √( (1 + 2(σ0² + σ²x²)) (1 + 2(σ0² + σ²x′²)) ) ),

where θ = (σ0², σ²) denotes the hyper-parameters.

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 42
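
A direct NumPy transcription of this covariance function might look as follows, vectorized over two sets of one dimensional inputs; the argument names are mine.

```python
import numpy as np

def neural_net_kernel(x, x_prime, sigma0_sq, sigma_sq):
    """Neural network covariance: (2/pi) * arcsin( 2(s0 + s x x') / sqrt((1 + 2(s0 + s x^2))(1 + 2(s0 + s x'^2))) )."""
    xa = x[:, None]
    xb = x_prime[None, :]
    num = 2.0 * (sigma0_sq + sigma_sq * xa * xb)
    den = np.sqrt((1.0 + 2.0 * (sigma0_sq + sigma_sq * xa**2))
                  * (1.0 + 2.0 * (sigma0_sq + sigma_sq * xb**2)))
    return (2.0 / np.pi) * np.arcsin(num / den)
```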

slide-56
SLIDE 56

Numerical Gaussian Process

This enables us to obtain the following Numerical Gaussian Process

[u^n; u^{n−1}] ∼ GP( 0, [k^{n,n}_{u,u}, k^{n,n−1}_{u,u}; k^{n−1,n}_{u,u}, k^{n−1,n−1}_{u,u}] ).

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 43

slide-58
SLIDE 58

Kernels

The covariance functions for the Burgers' equation example are given by

k^{n,n}_{u,u} = k,
k^{n,n−1}_{u,u} = k + ∆t µ^{n−1}(x′) (d/dx′) k − ν ∆t (d²/dx′²) k.

Compare this with

u^n + ∆t µ^{n−1} (d/dx) u^n − ν ∆t (d²/dx²) u^n = u^{n−1}.

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 44

slide-59
SLIDE 59

Kernels

k^{n−1,n−1}_{u,u} = k + ∆t µ^{n−1}(x′) (d/dx′) k − ν ∆t (d²/dx′²) k
                    + ∆t µ^{n−1}(x) (d/dx) k + ∆t² µ^{n−1}(x) µ^{n−1}(x′) (d/dx)(d/dx′) k − ν ∆t² µ^{n−1}(x) (d/dx)(d²/dx′²) k
                    − ν ∆t (d²/dx²) k − ν ∆t² µ^{n−1}(x′) (d²/dx²)(d/dx′) k + ν² ∆t² (d²/dx²)(d²/dx′²) k.

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 45

slide-60
SLIDE 60

Training

The hyper-parameters θ and the noise parameters σ_n², σ_{n−1}² can be trained by employing the negative log marginal likelihood resulting from

[u^n_b; u^{n−1}] ∼ N(0, K),

where {x^n_b, u^n_b} are the (noisy) data on the boundary and {x^{n−1}, u^{n−1}} are artificially generated data. Here,

K = [ k^{n,n}_{u,u}(x^n_b, x^n_b; θ) + σ_n²I,   k^{n,n−1}_{u,u}(x^n_b, x^{n−1}; θ);
      k^{n−1,n}_{u,u}(x^{n−1}, x^n_b; θ),       k^{n−1,n−1}_{u,u}(x^{n−1}, x^{n−1}; θ) + σ_{n−1}²I ].

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 46

slide-61
SLIDE 61

Prediction & Propagating Uncertainty

In order to predict u^n(x^n) at a new test point x^n, we use the following conditional distribution

u^n(x^n) | u^n_b ∼ N( µ^n(x^n), Σ^{n,n}(x^n, x^n) ),

where

µ^n(x^n) = qᵀ K⁻¹ [u^n_b; µ^{n−1}],

and

Σ^{n,n}(x^n, x^n) = k^{n,n}_{u,u}(x^n, x^n) − qᵀK⁻¹q + qᵀK⁻¹ [0, 0; 0, Σ^{n−1,n−1}] K⁻¹q.

Here, qᵀ = [ k^{n,n}_{u,u}(x^n, x^n_b)   k^{n,n−1}_{u,u}(x^n, x^{n−1}) ].

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 47
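
One time step of this prediction-and-propagation loop might be sketched as follows; K is the training covariance assembled on the previous slide, k_nn and k_nnm1 stand for callables evaluating the blocks k^{n,n}_{u,u} and k^{n,n−1}_{u,u}, and all names are illustrative.

```python
import numpy as np

def propagate_step(x_star, x_b, u_b, x_prev, mu_prev, Sigma_prev, K, k_nn, k_nnm1):
    """Posterior mean and covariance of u^n(x_star) given boundary data u_b and the
    artificial data (mu_prev, Sigma_prev) carried over from the previous time step."""
    q = np.concatenate([k_nn(x_star, x_b), k_nnm1(x_star, x_prev)], axis=1)   # rows of q^T
    rhs = np.concatenate([u_b, mu_prev])
    A = np.linalg.solve(K, q.T)                       # K^{-1} q
    A_prev = A[u_b.size:, :]                          # rows matching the artificial data
    mu_n = A.T @ rhs                                  # q^T K^{-1} [u_b; mu^{n-1}]
    Sigma_n = (k_nn(x_star, x_star) - q @ A           # k^{n,n} - q^T K^{-1} q
               + A_prev.T @ Sigma_prev @ A_prev)      # + q^T K^{-1} blkdiag(0, Sigma^{n-1,n-1}) K^{-1} q
    return mu_n, Sigma_n
```

Sampling u^n ∼ N(µ^n, Σ^{n,n}) at the chosen points x^n then yields the artificial data for the next step, exactly as described on the following slide.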

slide-62
SLIDE 62

Artificial data

Now, one can use the resulting posterior distribution to obtain the artificially generated data {x^n, u^n} for the next time step with u^n ∼ N(µ^n, Σ^{n,n}). Here, µ^n = µ^n(x^n) and Σ^{n,n} = Σ^{n,n}(x^n, x^n).

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 48

slide-63
SLIDE 63

Burgers’ Equation

Movie Code

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 49

slide-64
SLIDE 64

Burgers’ Equation

Movie Code

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 50

slide-65
SLIDE 65

Higher Order Time Stepping

Let us consider linear partial differential equations of the form u_t = L_x u, x ∈ Ω, t ∈ [0, T], where L_x is a linear operator and u(t, x) denotes the latent solution.

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 51

slide-67
SLIDE 67

Linear Multi-step Methods

The trapezoidal time-stepping scheme can be written as

u^n − (1/2)∆t L_x u^n =: u^{n−1/2} := u^{n−1} + (1/2)∆t L_x u^{n−1}.

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 53

slide-68
SLIDE 68

Linear Multi-step Methods

By assuming u^{n−1/2}(x) ∼ GP(0, k(x, x′; θ)), we can capture the entire structure of the trapezoidal rule in the resulting joint distribution of u^n and u^{n−1}.

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 54

slide-69
SLIDE 69

Wave Equation – Trapezoidal Rule

Movie Code

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 55

slide-71
SLIDE 71

Runge-Kutta Methods

The trapezoidal time-stepping scheme can be equivalently written as

u^n_2 = u^{n−1} + (1/2)∆t L_x u^{n−1} + (1/2)∆t L_x u^n,
u^n_1 = u^n.

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 57

slide-72
SLIDE 72

Runge-Kutta Methods

By assuming

u^n(x) ∼ GP(0, k^{n,n}(x, x′; θ^n))  and  u^{n−1}(x) ∼ GP(0, k^{n−1,n−1}(x, x′; θ^{n−1})),

we can capture the entire structure of the trapezoidal rule in the resulting joint distribution of u^n, u^{n−1}, u^n_2, and u^n_1. Here, u^n_2 = u^n_1 = u^n.

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 58

slide-73
SLIDE 73

Advection Equation – Gauss-Legendre

Movie Code

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 59

slide-74
SLIDE 74

Heat Equation – Trapezoidal Rule

Movie Code

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. “Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.” arXiv preprint arXiv:1703.10230 (2017). 60

slide-75
SLIDE 75

Parametric Gaussian Process Regression for Big Data

slide-76
SLIDE 76

Gaussian Process – u(x)

Let us start by making the prior assumption that u(x) ∼ GP(0, k(x, x′; θ)) is a zero mean Gaussian process with covariance function k(x, x′; θ), which depends on the hyper-parameters θ.

Raissi, Maziar. “Parametric Gaussian Process Regression for Big Data.” arXiv preprint arXiv:1704.03144 (2017). 61

slide-77
SLIDE 77

Hypothetical Dataset

Moreover, let us postulate the existence of some hypothetical dataset {Z, u} with u ∼ N(m, S). Here, Z = {z_i}_{i=1}^M and u = {u_i}_{i=1}^M.

Raissi, Maziar. “Parametric Gaussian Process Regression for Big Data.” arXiv preprint arXiv:1704.03144 (2017). 62

slide-78
SLIDE 78

Parametric Gaussian Process – f(x)

Let us define a parametric Gaussian process by the resulting conditional distribution

f(x) := u(x) | m, S ∼ GP( µ(x; θ, m), Σ(x, x′; θ, S) ),

where

µ(x; θ, m) = k(x, Z; θ) k(Z, Z; θ)⁻¹ m,
Σ(x, x′; θ, S) = k(x, x′; θ) − k(x, Z; θ) k(Z, Z; θ)⁻¹ k(Z, x′; θ) + k(x, Z; θ) k(Z, Z; θ)⁻¹ S k(Z, Z; θ)⁻¹ k(Z, x′; θ).

Raissi, Maziar. “Parametric Gaussian Process Regression for Big Data.” arXiv preprint arXiv:1704.03144 (2017). 63
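
A small sketch of these two quantities, reusing sq_exp_kernel from earlier with θ = (γ, w); the helper names are mine, not the released code.

```python
import numpy as np

def pgp_mean(xa, Z, m, theta):
    """mu(xa; theta, m) = k(xa, Z) k(Z, Z)^{-1} m."""
    K_ZZ = sq_exp_kernel(Z, Z, *theta)
    return np.linalg.solve(K_ZZ, sq_exp_kernel(Z, xa, *theta)).T @ m

def pgp_cov(xa, xb, Z, S, theta):
    """Sigma(xa, xb; theta, S), including the extra k K^{-1} S K^{-1} k term."""
    K_ZZ = sq_exp_kernel(Z, Z, *theta)
    A = np.linalg.solve(K_ZZ, sq_exp_kernel(Z, xa, *theta)).T    # k(xa, Z) k(Z, Z)^{-1}
    B = np.linalg.solve(K_ZZ, sq_exp_kernel(Z, xb, *theta)).T    # k(xb, Z) k(Z, Z)^{-1}
    return (sq_exp_kernel(xa, xb, *theta)
            - A @ sq_exp_kernel(Z, xb, *theta)
            + A @ S @ B.T)
```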

slide-79
SLIDE 79

Parameters m and S

  • The parameters m and S of a parametric Gaussian process will play a crucial role; the data will be distilled into these parameters and any subsequent predictions will not make use of the original dataset.
  • This is very convenient as it enables an efficient mini-batch training procedure, outlined in the following.

Raissi, Maziar. “Parametric Gaussian Process Regression for Big Data.” arXiv preprint arXiv:1704.03144 (2017). 64

slide-80
SLIDE 80

Update m and S

Taking advantage of the favorable form of a parametric Gaussian process, the mean m and covariance matrix S of the hypothetical dataset can be updated by employing the posterior distribution resulting from conditioning on the observed mini-batch of data {X, y} of size N; i.e.,

m ← µ(Z; θ, m) + Σ(Z, X; θ, S) ( Σ(X, X; θ, S) + σϵ²I )⁻¹ [ y − µ(X; θ, m) ],
S ← Σ(Z, Z; θ, S) − Σ(Z, X; θ, S) ( Σ(X, X; θ, S) + σϵ²I )⁻¹ Σ(X, Z; θ, S).

Raissi, Maziar. “Parametric Gaussian Process Regression for Big Data.” arXiv preprint arXiv:1704.03144 (2017). 65
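
A sketch of this conditioning step, in terms of the pgp_mean and pgp_cov helpers above; again illustrative names, not the released code.

```python
import numpy as np

def pgp_update(Z, m, S, X_batch, y_batch, theta, sigma_eps):
    """Update (m, S) by conditioning the parametric GP on one observed mini-batch {X_batch, y_batch}."""
    Sigma_XX = pgp_cov(X_batch, X_batch, Z, S, theta) + sigma_eps**2 * np.eye(X_batch.size)
    Sigma_ZX = pgp_cov(Z, X_batch, Z, S, theta)
    gain = np.linalg.solve(Sigma_XX, Sigma_ZX.T).T               # Sigma(Z, X) (Sigma(X, X) + s^2 I)^{-1}
    m_new = pgp_mean(Z, Z, m, theta) + gain @ (y_batch - pgp_mean(X_batch, Z, m, theta))
    S_new = pgp_cov(Z, Z, Z, S, theta) - gain @ Sigma_ZX.T
    return m_new, S_new
```

Alternating this update with a gradient step on the hyper-parameters, described on the next slide, gives the mini-batch training loop.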

slide-81
SLIDE 81

Update Hyper-parameters

The information corresponding to the mini-batch {X, y} is now distilled in the parameters m and S. The hyper-parameters θ and the noise variance parameter σϵ² can be updated by taking a step proportional to the gradient of the negative log marginal likelihood

NLML(θ, σϵ²) := (1/2) mᵀ k(Z, Z; θ)⁻¹ m + (1/2) log |k(Z, Z; θ)| + (M/2) log(2π).

Raissi, Maziar. “Parametric Gaussian Process Regression for Big Data.” arXiv preprint arXiv:1704.03144 (2017). 66

slide-82
SLIDE 82

Prediction

The training procedure is initialized by setting m0 = 0 and S0 = k(Z, Z; θ0), where θ0 is some initial set of hyper-parameters. Having trained the hyper-parameters and parameters of the model, one can use µ(x; θ, m) to predict the mean of the solution at a new test point x. Moreover, the predicted variance is given by Σ(x, x; θ, S).

Raissi, Maziar. “Parametric Gaussian Process Regression for Big Data.” arXiv preprint arXiv:1704.03144 (2017). 67

slide-83
SLIDE 83

Results

Code

[Figure: (A) 6,000 training data points of f(x); (B) the parametric GP reconstruction based on only 8 hypothetical data points, showing the predicted mean and a two standard deviations band against the true function.]

Raissi, Maziar. “Parametric Gaussian Process Regression for Big Data.” arXiv preprint arXiv:1704.03144 (2017). 68

slide-84
SLIDE 84

Conclusion

slide-85
SLIDE 85

Summary

1. We introduced hidden physics models, which are essentially data-efficient learning machines capable of leveraging the underlying laws of physics, expressed by time dependent and nonlinear partial differential equations, to extract patterns from high-dimensional data generated from experiments.

2. The methodology provides a promising new direction for harnessing the long-standing developments of classical methods in applied mathematics and mathematical physics to design learning machines with the ability to operate in complex domains without requiring large quantities of data.

69

slide-86
SLIDE 86

Acknowledgements

This work received support by the DARPA EQUiPS grant N66001-15-2-4055, the MURI/ARO grant W911NF-15-1-0562, and the AFOSR grant FA9550-17-1-0013. All data and codes are publicly available on GitHub at https://github.com/maziarraissi/.

70

slide-87
SLIDE 87

Questions?

70