AReS and MaRS - Adversarial and MMD-Minimizing Regression for SDEs



SLIDE 1

AReS and MaRS - Adversarial and MMD-Minimizing Regression for SDEs

Gabriele Abbati*1, Philippe Wenk*23, Michael A Osborne1, Andreas Krause2, Bernhard Schölkopf4, Stefan Bauer4

1University of Oxford, 2ETH Zürich, 3Max Planck ETH Center for Learning Systems, 4Max Planck Institute for Intelligent Systems

Thirty-sixth International Conference on Machine Learning


SLIDE 2

Stochastic Differential Equations in the Wild

(a) Robotics (source: Athena robot, MPI-IS)
(b) Atmospheric modeling (source: Wikipedia)
(c) Stock markets (source: Yahoo Finance)


SLIDE 3

Gradient Matching

ODE: ẋ = f(x, θ), y = x + ϵ with ϵ ∼ N(0, σy). Given f and y, infer x and θ.

SDE: dx = f(x, θ)dt + G dW, y = x + ϵ with ϵ ∼ N(0, σy). Given f and y, infer x, G and θ.

Integration-based methods: parameters → trajectory.
Integration-free methods: trajectory → parameters.


SLIDE 4

Classic Gradient Matching - Model

(1) Gaussian process prior on states:
p(x | ϕ) = N(x | μy, Cϕ)
p(ẋ | x, ϕ) = N(ẋ | Dx, A)

(2) ODE model:
p(ẋ | x, θ, γ) = N(ẋ | f(x, θ), γI)
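The data-based derivative model p(ẋ | x, ϕ) = N(ẋ | Dx, A) can be sketched numerically: for a differentiable kernel, the conditional mean of the derivatives given the states is a linear map of the states. This is a minimal 1-D sketch, assuming an RBF kernel with hand-picked hyperparameters and a small jitter term (all illustrative, not the paper's configuration):

```python
import numpy as np

def rbf(a, b, ell=1.0):
    # k(a_i, b_j) = exp(-(a_i - b_j)^2 / (2 ell^2))
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def rbf_d1(a, b, ell=1.0):
    # derivative of the kernel w.r.t. its first argument
    d = a[:, None] - b[None, :]
    return -(d / ell ** 2) * rbf(a, b, ell)

# noiseless states x(t) = sin(t) on a grid
t = np.linspace(0.0, 2.0 * np.pi, 50)
x = np.sin(t)

# conditional mean of the derivatives: E[xdot | x] = C' K^{-1} x = D x
K = rbf(t, t) + 1e-6 * np.eye(len(t))   # jitter for numerical stability
xdot = rbf_d1(t, t) @ np.linalg.solve(K, x)
```

Away from the boundary, `xdot` closely tracks the true derivative cos(t), which is exactly the mechanism gradient matching relies on.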


SLIDE 5

Classic Gradient Matching - Inference

Calderhead, Girolami, and Lawrence (2009) and Dondelinger et al. (2013): product of experts
p(ẋ) ∝ pdata(ẋ) pODE(ẋ)

Wenk et al. (2018), FGPGM: forced equality
p(ẋ) ∝ pdata(ẋdata) pODE(ẋODE) δ(ẋdata − ẋ) δ(ẋODE − ẋ)

Wenk*, Abbati* et al. (2019), ODIN: ODEs as constraints


SLIDE 6

Stochastic Differential Equations

General SDE problem: dx = f(x, θ)dt + G dW, y = x + ϵ with ϵ ∼ N(0, σy). Given f and y, infer x, G and θ.

Example: dx = θ0x(θ1 − x²)dt + G dw, y = x + ϵ with ϵ ∼ N(0, σy). Given f and y, infer x, G and θ.

(Figure: sample path of the example SDE, x(t) vs. t.)
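The example SDE can be forward-simulated with a standard Euler–Maruyama scheme. This sketch assumes the double-well ground-truth values used later in the deck (θ0 = 0.1, θ1 = 4, G² = 0.25); step size, initial state, and observation-noise level are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

theta0, theta1 = 0.1, 4.0   # drift parameters (double-well ground truth)
G = 0.5                     # diffusion coefficient, so G^2 = 0.25
sigma_y = 0.1               # observation-noise std (illustrative)

h, T = 0.01, 20.0
n = int(T / h)
x = np.empty(n + 1)
x[0] = 1.0                  # illustrative initial state

# Euler-Maruyama: x_{k+1} = x_k + f(x_k, theta) h + G sqrt(h) N(0, 1)
for k in range(n):
    drift = theta0 * x[k] * (theta1 - x[k] ** 2)
    x[k + 1] = x[k] + drift * h + G * np.sqrt(h) * rng.standard_normal()

# noisy observations y = x + eps
y = x + sigma_y * rng.standard_normal(n + 1)
```

The restoring drift keeps the path near the wells at ±√θ1 = ±2, matching the plotted trajectory's range.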


SLIDE 7

Stochastic Gradient Matching?

Problems:
• Both observation and process noise
• Stochastic sample paths
• Paths are not differentiable

(Figure: sample path of the example SDE, x(t) vs. t.)


SLIDE 8

The Doss-Sussmann Transformation

General SDE problem: dx = f(x, θ)dt + G dW, y = x + ϵ with ϵ ∼ N(0, σy). Given f and y, infer x, G and θ.

Definition (Ornstein-Uhlenbeck process). A stochastic process o defined by the equation
do = −o dt + G dW

We introduce the latent variable z = x − o to get the stochastic gradients
dz(t) = {f(z(t) + o(t), θ) + o(t)} dt
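The transformation can be checked numerically: simulating x and the OU process o on the same Brownian increments, the G dW terms cancel in z = x − o, so a plain Euler step of the noise-free ODE for z reproduces x − o. A sketch using the double-well drift with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

f = lambda x: 0.1 * x * (4.0 - x ** 2)   # illustrative drift f(x, theta)
G, h, n = 0.5, 0.01, 2000

dW = np.sqrt(h) * rng.standard_normal(n)   # shared Brownian increments
x = np.empty(n + 1); o = np.empty(n + 1); z = np.empty(n + 1)
x[0], o[0], z[0] = 1.0, 0.0, 1.0           # z(0) = x(0) - o(0)

for k in range(n):
    x[k + 1] = x[k] + f(x[k]) * h + G * dW[k]        # dx = f dt + G dW
    o[k + 1] = o[k] - o[k] * h + G * dW[k]           # do = -o dt + G dW
    z[k + 1] = z[k] + (f(z[k] + o[k]) + o[k]) * h    # dz = {f(z+o) + o} dt
```

Because the stochastic increments cancel exactly step by step, `z` and `x - o` coincide up to floating-point error, which is what makes z amenable to gradient matching.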



SLIDE 11

A Novel Generative Model

Previous generative model: Y = X + E
New generative model: Y = Z + O + E

Resulting observation marginal distribution:
p(ỹ | ϕ, G, σ) = N(0, Cϕ + BΩBᵀ + T)
with Cϕ from the Gaussian process prior, BΩBᵀ from the OU process, and T from the observation noise.
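In the 1-D case the marginal covariance can be assembled directly. A sketch under simplifying assumptions: an RBF kernel stands in for Cϕ, the stationary OU covariance (G²/2)·exp(−|t−s|) stands in for BΩBᵀ, and T = σ²I; all hyperparameters are illustrative:

```python
import numpy as np

def marginal_cov(t, ell=1.0, G=0.5, sigma=0.1):
    # Covariance of y = z + o + e under the stated simplifying assumptions.
    d = t[:, None] - t[None, :]
    C_phi = np.exp(-0.5 * (d / ell) ** 2)      # GP prior term
    C_ou = 0.5 * G ** 2 * np.exp(-np.abs(d))   # stationary OU term
    T = sigma ** 2 * np.eye(len(t))            # observation-noise term
    return C_phi + C_ou + T

t = np.linspace(0.0, 20.0, 50)
S = marginal_cov(t)
```

Since each term is a valid covariance, the sum is symmetric positive definite, so the marginal likelihood of the observations can be evaluated in closed form to fit ϕ, G, and σ.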



SLIDE 13

A Tale of Two Graphical Models

SDE-based model:
p(ż | o, z, θ) = δ(ż − f(z + o, θ) − o)

Data-based model:
p(ż | z, ϕ) = N(ż | Dz, A)

A good θ estimate makes p(ż | o, z, θ) ∼ p(ż | z, ϕ).



SLIDE 15

Sample-based Parameter Inference

GP fit (Z + O + E noise model)

Samples from pSDE: żSDE ∼ p(ż | o, z, θ)
Samples from pdata: żdata ∼ p(ż | z, ϕ)

Iterative gradient-based optimization → match żSDE to żdata:
(1) AReS (WGAN): gradient steps on θ with loss −(1/M) Σ_{i=1..M} fω(żSDE^(i))
(2) MaRS (MMD): gradient steps on θ with loss MMD²_u[żSDE, żdata]
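The MaRS objective rests on the standard unbiased MMD² estimator. A minimal sketch with an RBF kernel; the bandwidth and the Gaussian toy samples are illustrative, not the paper's configuration:

```python
import numpy as np

def mmd2_u(x, y, gamma=0.5):
    # Unbiased estimator of MMD^2 between 1-D samples x and y,
    # with RBF kernel k(a, b) = exp(-gamma (a - b)^2).
    m, n = len(x), len(y)
    Kxx = np.exp(-gamma * (x[:, None] - x[None, :]) ** 2)
    Kyy = np.exp(-gamma * (y[:, None] - y[None, :]) ** 2)
    Kxy = np.exp(-gamma * (x[:, None] - y[None, :]) ** 2)
    # drop diagonal terms for the unbiased within-sample averages
    tx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    ty = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return tx + ty - 2.0 * Kxy.mean()

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 500)
b = rng.normal(0.0, 1.0, 500)   # same distribution as a
c = rng.normal(2.0, 1.0, 500)   # shifted distribution
```

The estimator hovers near zero for matched distributions and grows when they differ, so driving it down with gradient steps on θ pulls the model samples żSDE toward the data samples żdata.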



SLIDE 19

Samples during Training

(Figure: data-based and model-based samples of ż(t) over t; (a) before training, (b) after training.)


SLIDE 20

Experimental Results - Lotka Volterra

LV, GT      | NPSDE       | ESGF        | AReS        | MaRS
θ0 = 2      | 1.58 ± 0.71 | 2.04 ± 0.09 | 2.36 ± 0.18 | 2.00 ± 0.09
θ1 = 1      | 0.74 ± 0.31 | 1.02 ± 0.05 | 1.18 ± 0.9  | 1.00 ± 0.04
θ2 = 4      | 2.26 ± 1.51 | 3.87 ± 0.59 | 3.97 ± 0.63 | 3.70 ± 0.51
θ3 = 1      | 0.49 ± 0.35 | 0.96 ± 0.14 | 0.98 ± 0.18 | 0.91 ± 0.14
H1,1 = 0.05 | /           | /           | 0.01 ± 0.03 | 0.03 ± 0.004
H1,2 = 0.03 | /           | /           | 0.01 ± 0.01 | 0.02 ± 0.01
H2,1 = 0.03 | /           | /           | 0.01 ± 0.01 | 0.02 ± 0.01
H2,2 = 0.09 | /           | /           | 0.03 ± 0.02 | 0.09 ± 0.03

(Figure: sample trajectories of x1(t) and x2(t) for t ∈ [0, 2].)

dx1(t) = [θ0x1(t) − θ1x1(t)x2(t)]dt + G11dw1(t)
dx2(t) = [−θ2x2(t) + θ3x1(t)x2(t)]dt + G21dw1(t) + G22dw2(t)
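The Lotka-Volterra system can be forward-simulated with Euler–Maruyama using the ground-truth values from the table; reading the reported H as G Gᵀ is an assumption here, and the initial state and step size are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

theta = np.array([2.0, 1.0, 4.0, 1.0])        # ground-truth drift parameters
H = np.array([[0.05, 0.03], [0.03, 0.09]])    # ground truth, read as H = G G^T
G = np.linalg.cholesky(H)                     # lower-triangular diffusion G

def drift(x):
    x1, x2 = x
    return np.array([theta[0] * x1 - theta[1] * x1 * x2,
                     -theta[2] * x2 + theta[3] * x1 * x2])

h, T = 0.001, 2.0
n = int(T / h)
x = np.empty((n + 1, 2))
x[0] = [5.0, 3.0]                             # illustrative initial state

# Euler-Maruyama with correlated process noise via G
for k in range(n):
    dW = np.sqrt(h) * rng.standard_normal(2)
    x[k + 1] = x[k] + drift(x[k]) * h + G @ dW
```

With this small diffusion, both populations stay positive over the plotted horizon, so the noisy cycles resemble the deterministic predator-prey oscillations.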


SLIDE 21

Experimental Results - Double Well Potential

DW, GT     | NPSDE         | VGPA        | ESGF        | AReS        | MaRS
θ0 = 0.1   | 0.09 ± 7.00   | 0.05 ± 0.04 | 0.01 ± 0.03 | 0.09 ± 0.04 | 0.10 ± 0.05
θ1 = 4     | 3.36 ± 248.82 | 1.11 ± 0.66 | 0.11 ± 0.16 | 3.68 ± 1.34 | 3.85 ± 1.10
H = 0.25   | 0.00 ± 0.02   | /           | /           | 0.20 ± 0.05 | 0.21 ± 0.09

(Figure: sample path of the double-well SDE, x(t) vs. t.)

dx(t) = θ0x(θ1 − x²)dt + G dw(t)


SLIDE 22

Contributions

• We extend classical gradient matching to SDEs
• We introduce a novel statistical framework combining the Doss-Sussmann transformation and GPs
• We introduce a novel parameter inference scheme that leverages adversarial and moment-matching loss functions
• We improve parameter inference accuracy in systems of SDEs


SLIDE 23

Thank you

Come and catch us → poster #216

Bonus round: check out our paper on classic gradient matching!

Wenk*, P., Abbati*, G., Bauer, S., Osborne, M. A., Krause, A., Schölkopf, B. (2019). ODIN: ODE-Informed Regression for Parameter and State Inference in Time-Continuous Dynamical Systems. arXiv preprint arXiv:1902.06278.
