Stochastic (partial) differential equations and Gaussian processes - PowerPoint PPT Presentation


SLIDE 1

Stochastic (partial) differential equations and Gaussian processes

Simo Särkkä

Aalto University, Finland

SLIDE 2

S(P)DEs and GPs Simo Särkkä 2 / 24

Contents

1. Basic ideas
2. Stochastic differential equations and Gaussian processes
3. Stochastic partial differential equations and Gaussian processes
4. Conclusion

SLIDE 7

Kernel vs. SPDE representations of GPs

GP model (x ∈ Rd, t ∈ R) and the equivalent S(P)DE model:

- Spatial kernel k(x, x′) ↔ SPDE model (L is an operator): L f(x) = w(x)
- Temporal kernel k(t, t′) ↔ state-space/SDE model: df(t)/dt = A f(t) + L w(t)
- Spatio-temporal kernel k(x, t; x′, t′) ↔ stochastic evolution equation: ∂f(x, t)/∂t = Ax f(x, t) + L w(x, t)

SLIDE 8

Why use S(P)DE solvers for GPs?

The O(n³) computational complexity is a challenge. What do we get:

- O(n) state-space methods for SDEs/SPDEs.
- Sparse approximations developed for SPDEs.
- Reduced-rank Fourier/basis-function approximations.
- A path to non-Gaussian processes.

Downsides:

- We often need to approximate.
- The mathematics can become messy.


SLIDE 17

Contents

1. Basic ideas
2. Stochastic differential equations and Gaussian processes
3. Stochastic partial differential equations and Gaussian processes
4. Conclusion

SLIDE 18

Ornstein–Uhlenbeck process

The mean and covariance functions:

m(x) = 0,    k(x, x′) = σ² exp(−λ|x − x′|)

This has a path representation as a stochastic differential equation (SDE):

df(t)/dt = −λ f(t) + w(t),

where w(t) is a white-noise process and x has been relabeled as t. The Ornstein–Uhlenbeck process is a Markov process. What does this actually mean ⇒ whiteboard.
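The kernel/SDE equivalence can be checked numerically. The following is a minimal sketch (not from the slides): it simulates the OU SDE with its exact discretization f(t+Δt) = e^{−λΔt} f(t) + noise and compares the empirical autocovariance to σ² exp(−λ|τ|). The diffusion constant q = 2λσ² is an assumption here; it is the standard choice that makes the stationary variance equal σ².

```python
import numpy as np

# Exact discretization of the OU SDE df/dt = -lam*f + w(t), with the white-noise
# spectral density chosen as q = 2*lam*sigma2 so the stationary variance is sigma2.
lam, sigma2, dt, n = 1.0, 1.0, 0.01, 400_000
a = np.exp(-lam * dt)            # transition multiplier over one step
qd = sigma2 * (1.0 - a**2)       # discrete-time process-noise variance

rng = np.random.default_rng(0)
f = np.empty(n)
f[0] = rng.normal(0.0, np.sqrt(sigma2))   # start from stationarity
for k in range(n - 1):
    f[k + 1] = a * f[k] + rng.normal(0.0, np.sqrt(qd))

# Empirical autocovariance at lag tau vs. the kernel sigma2 * exp(-lam*tau)
lag = 100                         # tau = 1.0
tau = lag * dt
emp = np.mean(f[:-lag] * f[lag:])
print(emp, sigma2 * np.exp(-lam * tau))   # both close to exp(-1) = 0.368
```

The agreement is up to Monte Carlo error; a longer path tightens it.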


SLIDE 22

Ornstein–Uhlenbeck process (cont.)

Consider a Gaussian process regression problem

f(x) ∼ GP(0, σ² exp(−λ|x − x′|)),    yk = f(xk) + εk.

This is equivalent to the state-space model

df(t)/dt = −λ f(t) + w(t),    yk = f(tk) + εk,

that is, with fk = f(tk) we have a Gauss–Markov model

fk+1 ∼ p(fk+1 | fk),    yk ∼ p(yk | fk).

Solvable in O(n) time using a Kalman filter/smoother.
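The O(n) claim can be demonstrated end to end. Below is a minimal sketch (not from the slides, synthetic data and parameter values assumed) that runs a scalar Kalman filter and Rauch–Tung–Striebel smoother on the exact discretization of the OU model and checks that the smoothed means coincide with the O(n³) closed-form GP posterior means.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, sigma2, noise = 1.0, 1.0, 0.1**2
t = np.sort(rng.uniform(0.0, 5.0, 50))
y = np.sin(t) + rng.normal(0.0, 0.1, t.size)

# O(n^3) GP regression with the exponential (OU) kernel
K = sigma2 * np.exp(-lam * np.abs(t[:, None] - t[None, :]))
mean_gp = K @ np.linalg.solve(K + noise * np.eye(t.size), y)

# O(n) Kalman filter on the equivalent scalar state-space model
m, P, prev = 0.0, sigma2, None
ms_f, Ps_f, ms_p, Ps_p = [], [], [], []
for tk, yk in zip(t, y):
    if prev is not None:                       # predict step (exact discretization)
        a = np.exp(-lam * (tk - prev))
        m, P = a * m, a * a * P + sigma2 * (1.0 - a * a)
    ms_p.append(m); Ps_p.append(P)
    g = P / (P + noise)                        # Kalman gain
    m, P = m + g * (yk - m), (1.0 - g) * P     # update step
    ms_f.append(m); Ps_f.append(P)
    prev = tk

# Rauch-Tung-Striebel smoother (backward pass)
mean_kf = np.array(ms_f)
for k in range(t.size - 2, -1, -1):
    a = np.exp(-lam * (t[k + 1] - t[k]))
    G = Ps_f[k] * a / Ps_p[k + 1]
    mean_kf[k] = ms_f[k] + G * (mean_kf[k + 1] - ms_p[k + 1])

print(np.max(np.abs(mean_kf - mean_gp)))   # agreement to numerical precision
```

Because the OU prior is Markovian, the two posteriors agree exactly; only the cost differs.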


SLIDE 25

State Space Form of Linear Time-Invariant SDEs

Consider an Nth-order LTI SDE of the form

d^N f/dt^N + a_{N−1} d^{N−1}f/dt^{N−1} + · · · + a_0 f = w(t).

If we define f = (f, df/dt, . . . , d^{N−1}f/dt^{N−1}), we get a state-space model:

df/dt = A f + L w(t),    f(t) = H f,

where A is the companion matrix

A = [  0     1                   ]
    [              ...           ]
    [                      1     ]
    [ −a_0  −a_1  · · ·  −a_{N−1} ]

with L = (0, . . . , 0, 1)^T and H = (1, 0, . . . , 0).

The vector process f(t) is Markovian although the scalar process f(t) isn't.
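The companion-form construction above is purely mechanical, so it can be sketched in a few lines. The coefficients in the example call are hypothetical, chosen only for illustration.

```python
import numpy as np

def companion_form(a):
    """State matrices (A, L, H) for the LTI SDE
    d^N f/dt^N + a[N-1] d^{N-1}f/dt^{N-1} + ... + a[0] f = w(t)."""
    N = len(a)
    A = np.zeros((N, N))
    A[:-1, 1:] = np.eye(N - 1)            # superdiagonal of ones
    A[-1, :] = -np.asarray(a)             # last row: -a0, -a1, ..., -a_{N-1}
    L = np.zeros((N, 1)); L[-1, 0] = 1.0  # white noise enters the last component
    H = np.zeros((1, N)); H[0, 0] = 1.0   # observe the first component, f itself
    return A, L, H

# Hypothetical example: f'' + 3 f' + 2 f = w(t)
A, L, H = companion_form([2.0, 3.0])
print(A)   # [[ 0.  1.] [-2. -3.]]
```

The same A, L, H then feed directly into the Kalman filtering machinery of the previous slides.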


SLIDE 28

Spectra of Linear Time-Invariant SDEs

By taking the Fourier transform of the LTI SDE, we can derive the spectral density, which has the form

S(ω) = (constant) / (polynomial in ω²).

We can also do this conversion in the other direction:

- With certain parameter values, the Matérn spectral density has the form S(ω) ∝ (λ² + ω²)^{−(p+1)}.
- Many non-rational spectral densities can be approximated; e.g., the squared-exponential spectrum S(ω) = σ² √(π/κ) exp(−ω²/(4κ)) admits a rational approximation of the form (const) / (N!/0! (4κ)^N + · · · + ω^{2N}).
- For the conversion of a rational spectral density to a Markovian (state-space) model, we can use spectral factorization.
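Spectral factorization of a rational spectrum amounts to root finding: substitute ω² = −s², take the stable (left-half-plane) roots of the denominator, and read off the SDE coefficients. A minimal sketch (not from the slides) for the Matérn ν = 3/2 spectrum S(ω) ∝ (λ² + ω²)^{−2}, with an assumed value of λ:

```python
import numpy as np

lam = 1.5   # assumed rate parameter

# Denominator (lam^2 + w^2)^2; substituting w^2 = -s^2 gives the polynomial
# (lam^2 - s^2)^2 = s^4 - 2 lam^2 s^2 + lam^4  (coefficients, highest degree first).
den_s = np.array([1.0, 0.0, -2.0 * lam**2, 0.0, lam**4])

roots = np.roots(den_s)
stable = roots[roots.real < 0]        # keep only the left-half-plane roots
a = np.real(np.poly(stable))          # stable factor: s^2 + 2*lam*s + lam^2

print(a)   # -> [1, 2*lam, lam^2], i.e. the SDE  f'' + 2*lam*f' + lam^2*f = w(t)
```

The recovered polynomial is exactly the drift of the Matérn 3/2 state-space model shown a few slides later.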


SLIDE 33

State-space methods for Gaussian processes

Approximation:

S(ω) ≈ (b0 + b1 ω² + · · · + bM ω^{2M}) / (a0 + a1 ω² + · · · + aN ω^{2N})

[Figure: the state f(x, t) at a fixed time t, shown over location (x) and time (t).]

- Results in a linear stochastic differential equation (SDE): df(t) = A f(t) dt + L dW.
- More generally, stochastic evolution equations.
- O(n) GP regression with Kalman filters and smoothers.
- Parallel block-sparse precision methods → O(log n).


SLIDE 38

State-space methods – temporal example

Example (Matérn class 1d)

The Matérn class of covariance functions is

k(t, t′) = σ² (2^{1−ν}/Γ(ν)) (√(2ν) |t − t′|/ℓ)^ν Kν(√(2ν) |t − t′|/ℓ).

When, e.g., ν = 3/2, we have

df(t) = [  0     1  ] f(t) dt + [   0    ] dW(t),    f(t) = [1  0] f(t).
        [ −λ²  −2λ  ]           [ q^{1/2} ]

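A quick consistency check for this Matérn 3/2 model: the stationary covariance P of the state solves the Lyapunov equation A P + Pᵀ A + L q Lᵀ = 0, and its (1,1) element should equal the kernel variance σ². A minimal numpy-only sketch (not from the slides; q = 4λ³σ² is an assumed standard value, and the Lyapunov equation is solved by vectorization):

```python
import numpy as np

lam, sigma2 = 2.0, 1.5                 # assumed parameter values
A = np.array([[0.0, 1.0], [-lam**2, -2.0 * lam]])
L = np.array([[0.0], [1.0]])
q = 4.0 * lam**3 * sigma2              # makes the stationary variance of f equal sigma2

# Solve A P + P A^T + L q L^T = 0 via vectorization:
# (I kron A + A kron I) vec(P) = -vec(L q L^T)
n = A.shape[0]
lhs = np.kron(np.eye(n), A) + np.kron(A, np.eye(n))
P = np.linalg.solve(lhs, -(q * (L @ L.T)).reshape(-1)).reshape(n, n)

print(P)   # diag(sigma2, lam^2 * sigma2): the marginal variance of f is sigma2
```

So the SDE really does reproduce the Matérn 3/2 marginal variance, tying the state-space parameters back to the kernel.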

SLIDE 39

State-space methods – spatio-temporal example

Example (2D Matérn covariance function)

Consider a space-time Matérn covariance function

k(x, t; x′, t′) = σ² (2^{1−ν}/Γ(ν)) (√(2ν) ρ/ℓ)^ν Kν(√(2ν) ρ/ℓ),

where ρ = √((t − t′)² + (x − x′)²), ν = 1, and d = 2. We get the following representation:

df(x, t) = [      0                 1           ] f(x, t) dt + [ 0 ] dW(x, t).
           [ ∂²/∂x² − λ²   −2 (λ² − ∂²/∂x²)^{1/2} ]             [ 1 ]

SLIDE 41

Contents

1. Basic ideas
2. Stochastic differential equations and Gaussian processes
3. Stochastic partial differential equations and Gaussian processes
4. Conclusion

SLIDE 42

Basic idea of SPDE inference on GPs [1/2]

Consider, e.g., the stochastic partial differential equation

∂²f(x, y)/∂x² + ∂²f(x, y)/∂y² − λ² f(x, y) = w(x, y).

Fourier transforming gives the spectral density

S(ωx, ωy) ∝ (λ² + ωx² + ωy²)^{−2}.

The inverse Fourier transform gives the covariance function

k(x, y; x′, y′) = (√((x − x′)² + (y − y′)²)/(2λ)) K₁(λ √((x − x′)² + (y − y′)²)).

But this is just the Matérn covariance function.


SLIDE 46

Basic idea of SPDE inference on GPs [2/2]

More generally, an SPDE for some linear operator L:

L f(x) = w(x)

Now f is a GP with precision and covariance operators

K⁻¹ = L* L,    K = (L* L)⁻¹.

Idea: approximate L or L⁻¹ using PDE/ODE methods:

1. Finite-difference/FEM methods lead to sparse precision approximations.
2. Fourier/basis-function methods lead to reduced-rank covariance approximations.
3. Spectral factorization leads to state-space (Kalman) methods, which are time-recursive (or sparse in precision).


SLIDE 52

Finite-differences/FEM – sparse precision

Basic idea:

∂f(x)/∂x ≈ (f(x + h) − f(x))/h,    ∂²f(x)/∂x² ≈ (f(x + h) − 2f(x) + f(x − h))/h²

- We get an SPDE approximation L ≈ L̃, where the matrix L̃ is sparse.
- The precision operator approximation is then sparse: K⁻¹ ≈ L̃ᵀ L̃ = sparse.
- L may need to be approximated as an integro-differential operator.
- Requires formation of a grid, but parallelizes well.
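The sparsity claim is easy to see on a grid. A minimal 1-d sketch (not from the slides; the operator −d²/dx² + λ² is a hypothetical 1-d analogue of the slide's SPDE operator, and grid parameters are assumed):

```python
import numpy as np

# Finite-difference matrix for the operator -d^2/dx^2 + lam^2 on a uniform grid.
lam, h, n = 1.0, 0.1, 200
main = np.full(n, 2.0 / h**2 + lam**2)
off = np.full(n - 1, -1.0 / h**2)
Lmat = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)   # tridiagonal

# The precision approximation K^{-1} ~ L^T L is banded (pentadiagonal),
# even though the corresponding covariance K = (L^T L)^{-1} is dense.
prec = Lmat.T @ Lmat
bandwidth = max(abs(i - j) for i in range(n) for j in range(n)
                if abs(prec[i, j]) > 1e-12)
print(bandwidth)   # 2: only two off-diagonals on each side are nonzero
```

This is exactly why GMRF-style inference stores the precision rather than the covariance.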


SLIDE 57

Classical and random Fourier methods – reduced rank approximations and FFT

Approximation:

f(x) ≈ Σ_{k ∈ N^d} ck exp(2π i kᵀ x),    ck ∼ Gaussian

- We use fewer coefficients ck than the number of data points.
- Leads to reduced-rank covariance approximations:

k(x, x′) ≈ Σ_{|k| ≤ N} σk² exp(2π i kᵀ x) exp(2π i kᵀ x′)*

- Truncated series, random frequencies, FFT, . . .
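The "random frequencies" variant is the random Fourier feature method: by Bochner's theorem, sampling frequencies from the kernel's spectral density gives a low-rank feature map whose inner product approximates the kernel. A minimal sketch (not from the slides; the RBF kernel, lengthscale, and feature count are assumed for illustration):

```python
import numpy as np

# Random Fourier feature approximation of k(x, x') = exp(-|x - x'|^2 / (2 ell^2)).
rng = np.random.default_rng(0)
ell, D, d = 1.0, 20_000, 2                  # lengthscale, #features, input dim
W = rng.normal(0.0, 1.0 / ell, (D, d))      # frequencies ~ spectral density
b = rng.uniform(0.0, 2.0 * np.pi, D)        # random phases

def phi(x):
    """Feature map: E[phi(x) . phi(x')] equals the RBF kernel value."""
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x1, x2 = np.array([0.3, -0.5]), np.array([0.9, 0.2])
approx = phi(x1) @ phi(x2)
exact = np.exp(-np.sum((x1 - x2)**2) / (2.0 * ell**2))
print(approx, exact)   # close; Monte Carlo error shrinks as 1/sqrt(D)
```

With D features, GP regression then costs O(n D²) instead of O(n³).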


SLIDE 61

Hilbert-space/Galerkin methods – reduced rank approximations

Approximation:

f(x) ≈ Σ_i ci φi(x),    ⟨φi, φj⟩_H ≈ δij,  e.g. ∇²φi = −λi φi

- Again, we use fewer coefficients than the number of data points.
- Reduced-rank covariance approximations such as

k(x, x′) ≈ Σ_{i=1}^{N} σi² φi(x) φi(x′).

- Wavelets, Galerkin, finite elements, . . .
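With Laplacian eigenfunctions, the weights σi² are values of the kernel's spectral density at the eigenfrequencies. A minimal 1-d sketch (not from the slides; the domain [0, Lb], Dirichlet boundary, squared-exponential kernel, and all parameter values are assumptions):

```python
import numpy as np

# Reduced-rank approximation with Laplacian eigenfunctions on [0, Lb]:
# k(x, x') ~ sum_j S(sqrt(lambda_j)) phi_j(x) phi_j(x').
Lb, m, ell, sigma2 = 10.0, 200, 1.0, 1.0
j = np.arange(1, m + 1)
freq = np.pi * j / Lb                       # sqrt of the Laplacian eigenvalues

def phi(x):
    """Dirichlet-Laplacian eigenfunctions sqrt(2/Lb) * sin(freq * x)."""
    return np.sqrt(2.0 / Lb) * np.sin(np.outer(np.atleast_1d(x), freq))

# Spectral density of the 1-d squared-exponential kernel
S = sigma2 * np.sqrt(2.0 * np.pi) * ell * np.exp(-0.5 * (freq * ell)**2)

x1, x2 = 4.0, 5.0                           # points well inside the domain
approx = float(phi(x1) @ (S * phi(x2).ravel()))
exact = sigma2 * np.exp(-0.5 * (x1 - x2)**2 / ell**2)
print(approx, exact)   # close for points away from the boundary
```

The approximation degrades near the boundary (where the Dirichlet condition bites) but is excellent in the interior, and inference with the N basis functions costs O(n N²).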


SLIDE 65

Contents

1. Basic ideas
2. Stochastic differential equations and Gaussian processes
3. Stochastic partial differential equations and Gaussian processes
4. Conclusion

SLIDE 66

Back to SPDE representations of GPs

GP model (x ∈ Rd, t ∈ R) and the equivalent S(P)DE model:

- Spatial kernel k(x, x′) ↔ SPDE model (L is an operator): L f(x) = w(x)
- Temporal kernel k(t, t′) ↔ state-space/SDE model: df(t)/dt = A f(t) + L w(t)
- Spatio-temporal kernel k(x, t; x′, t′) ↔ stochastic evolution equation: ∂f(x, t)/∂t = Ax f(x, t) + L w(x, t)

SLIDE 67

What then?

Exchange and map approximations between the fields:

- inducing points ↔ point collocation;
- spectral methods ↔ Galerkin methods;
- finite differences ↔ GMRFs.

Further directions:

- Non-Gaussian processes: Student's t processes, non-linear Itô processes, jump processes, hybrid point/Gaussian processes.
- Hierarchical (deep) SPDE models: we stack SPDEs on top of each other – the SPDE just becomes non-linear.
- Combined first-principles and nonparametric models – latent force models (LFMs), also non-linear and non-Gaussian LFMs.
- Inverse problems – operators in the measurement model.


SLIDE 73

Summary

- Gaussian processes (GPs) are nice, but the computational scaling is bad.
- GPs have representations as solutions to SPDEs.
- In temporal models we can use Kalman/Bayesian filters and smoothers.
- SPDE methods can be used to speed up GP inference.
- New paths towards non-linear GP models.
- Work nicely in latent force models.


SLIDE 79

Useful references

- N. Wiener (1950). Extrapolation, Interpolation and Smoothing of Stationary Time Series with Engineering Applications. John Wiley & Sons, Inc.
- R. L. Stratonovich (1963). Topics in the Theory of Random Noise. Gordon and Breach.
- J. Hartikainen and S. Särkkä (2010). Kalman Filtering and Smoothing Solutions to Temporal Gaussian Process Regression Models. Proc. MLSP.
- S. Särkkä, A. Solin, and J. Hartikainen (2013). Spatio-Temporal Learning via Infinite-Dimensional Bayesian Filtering and Smoothing. IEEE Signal Processing Magazine, 30(5):51–61.
- S. Särkkä (2013). Bayesian Filtering and Smoothing. Cambridge University Press.
- S. Särkkä and A. Solin (2017, to appear). Applied Stochastic Differential Equations. Cambridge University Press.