SLIDE 1

ODE Filtering: A Gaussian Decision Agent for Forward Problems

Hans Kersting, Alan Turing Institute, 12 April 2018

Some of the presented work is supported by the European Research Council.

SLIDE 2

ODEs from a Bayesian machine learning perspective

How we think about ODEs...

$$ \dot{x}(t) = f(x(t)), \quad t \in [0, T], \qquad x(0) = x_0 \in \mathbb{R}^d \tag{1} $$

We model all unknown (even if deterministic) objects, i.e.

• the solution $x \in C^1([0, T]; \mathbb{R}^d)$,
• the vector field $f \in C^0(\mathbb{R}^d; \mathbb{R}^d)$,

by random variables or stochastic processes (prior information), and define which information we obtain in the course of the numerical computation of the solution (measurement model).

Prior information + measurement model → application of Bayes' rule.
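In symbols, this pipeline yields a posterior of the familiar form. As a minimal sketch (the factorization assumes conditionally independent measurements $y_i$, an assumption of this sketch rather than a statement on the slide):

$$ p\big(x \mid y_1, \dots, y_N\big) \;\propto\; p(x) \prod_{i=1}^{N} p\big(y_i \mid x(t_i)\big), $$

where $p(x)$ is the prior over the solution (next slide) and $p(y_i \mid x(t_i))$ is the measurement model linking evaluations of $f$ to the derivative of $x$ (slide 8).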

SLIDE 3

Prior information on x

For prior information on $f$, see our publications.

We a priori model $x$ and $\dot{x}$ with an arbitrary Gauss–Markov process, i.e. with a linear SDE

$$ dX_t = F X_t \, dt + L \, dB_t, $$

with Gaussian initial condition $X_0 \sim \mathcal{N}(m_0, P_0)$.

For an integrated Brownian motion (Wiener process) prior on $(x(t), \dot{x}(t))$,

$$ d \begin{pmatrix} X_t \\ \dot{X}_t \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} X_t \\ \dot{X}_t \end{pmatrix} dt + \begin{pmatrix} 0 \\ \sigma \end{pmatrix} dB_t, \tag{2} $$

the ODE filter coincides with Runge–Kutta and Nordsieck methods in a certain sense [SSH18]. An Ornstein–Uhlenbeck prior,

$$ d \begin{pmatrix} X_t \\ \dot{X}_t \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & -\theta \end{pmatrix} \begin{pmatrix} X_t \\ \dot{X}_t \end{pmatrix} dt + \begin{pmatrix} 0 \\ \sigma \end{pmatrix} dB_t, \tag{3} $$

has also been studied [MKSH17].
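For the integrated Wiener process prior (2), the discrete-time transition used by the filter on the later slides has a closed form. A minimal sketch (the standard discretization of this linear SDE; the function name is ours):

```python
import numpy as np

def iwp_transition(h, sigma):
    """Closed-form discretization of the once-integrated Wiener process
    dX_t = F X_t dt + L dB_t with F = [[0, 1], [0, 0]] and L = (0, sigma)^T,
    so that X_{t+h} | X_t ~ N(A(h) X_t, Q(h))."""
    A = np.array([[1.0, h],
                  [0.0, 1.0]])                         # A(h) = exp(F h)
    Q = sigma**2 * np.array([[h**3 / 3.0, h**2 / 2.0],
                             [h**2 / 2.0, h]])         # Q(h) = ∫ e^{Fs} L L^T e^{F^T s} ds
    return A, Q
```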

SLIDE 4–7

Numerical solutions of IVPs

[plot: Runge–Kutta of order 3]

How classical solvers extrapolate forward from time $t_0$ to $t_0 + h$:

• Estimate $\dot{x}(t_i)$ at nodes $t_0 \le t_1 \le \dots \le t_n \le t_0 + h$ by evaluating $y_i \approx f(\hat{x}(t_i))$, where $\hat{x}(t)$ is itself an estimate of $x(t)$.

• Use this data $y_i := \dot{x}(t_i)$ to estimate $x(t_0 + h)$, i.e.

$$ \hat{x}(t_0 + h) \approx x(t_0) + h \sum_{i=1}^{b} w_i y_i. $$

[plot: trajectory $x(t)$ with evaluation nodes at $t_0$, $t_0 + c_1$, $t_0 + c_2$, $t_0 + h$]

Information in these calculations:

$$ \dot{x}(t) = f(x(t)) \approx f(\hat{x}(t)) \tag{4} $$

To gain information, $f$ is evaluated at (or around) the current numerical estimate $\hat{x}$ of $x$.
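As a concrete instance of this pattern, here is a minimal sketch of one step of Kutta's third-order Runge–Kutta method (the classical nodes and weights, chosen to match the plotted order; they are not spelled out on the slide):

```python
def rk3_step(f, x0, h):
    """One step of Kutta's third-order Runge-Kutta method.
    Each k_i estimates x'(t) at a node in [t0, t0 + h] by evaluating f
    at a running estimate of x there; the step is the weighted sum
    x0 + h * sum_i w_i k_i with weights (1/6, 2/3, 1/6)."""
    k1 = f(x0)                          # slope estimate at t0
    k2 = f(x0 + 0.5 * h * k1)           # slope estimate at t0 + h/2
    k3 = f(x0 - h * k1 + 2.0 * h * k2)  # slope estimate at t0 + h
    return x0 + h * (k1 + 4.0 * k2 + k3) / 6.0

# Example: logistic growth x' = x(1 - x), x(0) = 0.1
x = 0.1
for _ in range(100):
    x = rk3_step(lambda z: z * (1.0 - z), x, 0.1)
```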

SLIDE 8

Measurement Models

In principle, given a Gaussian belief

$$ \begin{pmatrix} x(t) \\ \dot{x}(t) \end{pmatrix} \sim \mathcal{N}\!\left( \begin{pmatrix} m(t) \\ \dot{m}(t) \end{pmatrix}, \begin{pmatrix} P_{00} & P_{01} \\ P_{10} & P_{11} \end{pmatrix} \right), \tag{5} $$

the 'true' information on $\dot{x}(t)$ would be the pushforward measure $f_* \mathcal{N}(m(t), P_{00})$. For computational speed, we want a Gaussian with matched moments: mean

$$ y = \int f(\xi) \, d\mathcal{N}(\xi; m(t), P_{00}) \tag{6} $$

and covariance

$$ R = \int f(\xi) f^T(\xi) \, d\mathcal{N}(\xi; m(t), P_{00}). \tag{7} $$

Suitable ways to approximate these integrals have been studied in [KH16]. For maximum speed, we can just use $y = f(m(t))$ and $R = 0$, as proposed in [SSH18]. This yields a (Kalman) filtering algorithm for ODEs.
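A minimal sketch of the two extremes of this design choice, the $y = f(m)$, $R = 0$ shortcut and a Monte Carlo approximation of the moment integrals (6)-(7) (function names and the sample-based estimator are ours, not from [KH16]):

```python
import numpy as np

def measure_zeroth_order(f, m, P00):
    """Fastest option [SSH18]: evaluate f at the mean and report zero noise."""
    y = np.atleast_1d(f(m))
    return y, np.zeros((y.size, y.size))

def measure_monte_carlo(f, m, P00, n_samples=1000, rng=None):
    """Moment-matched Gaussian approximation of the pushforward f_* N(m, P00),
    with the integrals (6) and (7) approximated by Monte Carlo."""
    rng = np.random.default_rng() if rng is None else rng
    xi = rng.multivariate_normal(m, P00, size=n_samples)
    fx = np.array([f(x) for x in xi])
    y = fx.mean(axis=0)              # eq. (6)
    R = (fx.T @ fx) / n_samples      # eq. (7), second moment as on the slide
    return y, R
```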

SLIDE 9–13

Filtering-based probabilistic ODE solvers

Gaussian filtering [SDH14]

We interpret $(x, \dot{x}, x^{(2)}, \dots, x^{(q-1)})$ as a draw from a $q$-times-integrated Wiener process $(X_t)_{t \in [0,T]} = (X_t^{(1)}, \dots, X_t^{(q)})^T_{t \in [0,T]}$ given by a linear time-invariant SDE:

$$ dX_t = F X_t \, dt + Q \, dW_t, \qquad X_0 = \xi, \quad \xi \sim \mathcal{N}(m(0), P(0)). $$

Calculation of the posterior by Gaussian filtering:

Prediction step:

$$ m^-_{t+h} = A(h) m_t, \qquad P^-_{t+h} = A(h) P_t A(h)^T + Q(h). $$

Gradient prediction at $t + h$: approximate

$$ y \approx \int f(\xi) \, d\mathcal{N}(\xi; m(t), P_{00}), \qquad R \approx \int f(\xi) f^T(\xi) \, d\mathcal{N}(\xi; m(t), P_{00}). $$

Update step:

$$ z = y - e_n^T m^-_{t+h}, \qquad S = e_n^T P^-_{t+h} e_n + R, \qquad K = P^-_{t+h} e_n S^{-1}, $$

$$ m_{t+h} = m^-_{t+h} + K z, \qquad P_{t+h} = P^-_{t+h} - K e_n^T P^-_{t+h}. $$
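Putting the pieces together, here is a minimal sketch of the resulting Kalman ODE filter for a scalar IVP with the once-integrated Wiener process prior (2) and the $y = f(m)$, $R = 0$ measurement model of [SSH18]. It reuses the `iwp_transition` helper sketched earlier; the initialization and all other names are ours:

```python
import numpy as np

def ode_filter(f, x0, sigma, h, T):
    """Kalman filtering for x'(t) = f(x(t)), x(0) = x0 (scalar case)."""
    A, Q = iwp_transition(h, sigma)   # A(h), Q(h) for the IWP prior
    e = np.array([0.0, 1.0])          # observe the derivative component
    m = np.array([x0, f(x0)])         # exact initial mean (x(0), x'(0))
    P = np.zeros((2, 2))              # certain initial condition
    means = [m]
    for _ in range(round(T / h)):
        # Prediction step
        m_pred = A @ m
        P_pred = A @ P @ A.T + Q
        # Gradient "measurement": y = f(m), R = 0  [SSH18]
        y, R = f(m_pred[0]), 0.0
        # Update step
        z = y - e @ m_pred                 # innovation
        S = e @ P_pred @ e + R             # innovation variance
        K = P_pred @ e / S                 # Kalman gain
        m = m_pred + K * z
        P = P_pred - np.outer(K, e @ P_pred)
        means.append(m)
    return np.array(means)

# Example: x' = -x, x(0) = 1; means[:, 0] approximates exp(-t)
means = ode_filter(lambda x: -x, 1.0, sigma=1.0, h=0.01, T=1.0)
```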

SLIDE 14

Research Questions

  1. Worst-case convergence rates vs. average convergence rates (over a measure on f)?
  2. Trade-off between computational speed (with Gaussians) and statistical accuracy (with samples)?
  3. Properties of different priors on x?
  4. In which sense are 'Bayesian' algorithms (like the above) approximations of Bayesian algorithms in the sense of [COSG17]?
  5. Can PN algorithms for ODEs be extended to SDEs?
  6. Bayesian inverse problems: inner-loop vs. outer-loop trade-off as in Bayesian optimization?
  7. Different filters (particle filter, ensemble Kalman filter)?

SLIDE 15

Thank you for listening!

SLIDE 16

Bibliography

◮ [COSG17] J. Cockayne, C. J. Oates, T. Sullivan, and M. A. Girolami. Bayesian probabilistic numerical methods. arXiv:1702.03673 [stat.ME], February 2017.
◮ [KH16] H. Kersting and P. Hennig. Active Uncertainty Calibration in Bayesian ODE Solvers. Uncertainty in Artificial Intelligence (UAI), 2016.
◮ [MKSH17] E. Magnani, H. Kersting, M. Schober, and P. Hennig. Bayesian Filtering for ODEs with Bounded Derivatives. arXiv:1709.08471 [cs.NA], September 2017.
◮ [SDH14] M. Schober, D. Duvenaud, and P. Hennig. Probabilistic ODE Solvers with Runge–Kutta Means. Advances in Neural Information Processing Systems (NIPS), 2014.
◮ [SSH18] M. Schober, S. Särkkä, and P. Hennig. A probabilistic model for the numerical solution of initial value problems. Statistics and Computing, January 2018.
