SLIDE 1

Robust, Deep Recurrent Gaussian Processes

Andreas Damianou

with César Lincoln Mattos, Zhenwen Dai, Neil Lawrence, Jeremy Forth, Guilherme Barreto

Royal Society, 06 June 2016

SLIDE 2

SLIDE 3

SLIDE 4

SLIDE 5

Challenge: Learn patterns from sequences

◮ Recurrent Gaussian Processes (RGP): a family of recurrent Bayesian nonparametric models (data-efficient, with uncertainty handling).
◮ Latent deep RGP: a deep RGP with latent states (simultaneous representation and dynamical learning).
◮ REcurrent VARiational Bayes (REVARB) framework (efficient inference and coherent propagation of uncertainty).
◮ Extension: RNN-based sequential recognition models (regularization and parameter reduction).
◮ Extension: robustness to outliers.
◮ Comparison with LSTMs, parametric and non-latent models.

SLIDE 6

NARX model

A standard NARX model considers an input vector x_i ∈ R^D comprised of the L_y past observed outputs y_i ∈ R and the L_u past exogenous inputs u_i ∈ R:

x_i = [y_{i−1}, …, y_{i−L_y}, u_{i−1}, …, u_{i−L_u}]^⊤,
y_i = f(x_i) + ε_i^{(y)},   ε_i^{(y)} ∼ N(ε_i^{(y)} | 0, σ_y²).

State-space model:

x_i = f(x_{i−1}, …, x_{i−L_x}, u_{i−1}, …, u_{i−L_u}) + ε_i^{(x)},   (transition)
y_i = x_i + ε_i^{(y)}.   (emission)

Non-linear emission: y_i = g(x_i) + ε_i^{(y)}.
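The NARX regressor construction can be sketched in a few lines of NumPy. `narx_inputs` is a hypothetical helper (not from the talk) that stacks the L_y most recent outputs and L_u most recent exogenous inputs, newest first, into the input vector x_i:

```python
import numpy as np

def narx_inputs(y, u, Ly, Lu):
    """Build NARX regressors x_i = [y_{i-1},...,y_{i-Ly}, u_{i-1},...,u_{i-Lu}]^T.

    Returns (X, targets): one row of X per usable time step i,
    paired with the output y_i that the model f(x_i) should predict.
    """
    start = max(Ly, Lu)                # first index with a full lag window
    X, targets = [], []
    for i in range(start, len(y)):
        past_y = y[i - Ly:i][::-1]     # y_{i-1}, ..., y_{i-Ly}
        past_u = u[i - Lu:i][::-1]     # u_{i-1}, ..., u_{i-Lu}
        X.append(np.concatenate([past_y, past_u]))
        targets.append(y[i])
    return np.array(X), np.array(targets)

# Toy sequence: 6 steps, Ly = 2, Lu = 1, so D = Ly + Lu = 3.
y = np.arange(6, dtype=float)
u = 10.0 * np.arange(6, dtype=float)
X, t = narx_inputs(y, u, Ly=2, Lu=1)
print(X.shape)   # (4, 3)
print(t[0])      # 2.0, i.e. y_2 predicted from [y_1, y_0, u_1]
```

Any regressor (here a GP over x_i) can then be fit to (X, targets); in free simulation the model's own predictions are fed back in place of the past y's.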

SLIDE 7
SLIDE 8

NARX vs State-space

◮ Latent inputs allow for simultaneous representation learning and dynamical learning.
◮ Latent inputs mean that noisy predictions are not fed back to the model.

SLIDE 9

(Deep) RGP

Start from a deep GP:

u → x^{(1)} → x^{(2)} → ⋯ → x^{(H)} → y

SLIDE 10

(Deep) RGP

Latent states formed from a lagged window of length L:

u → x̄^{(1)} → x̄^{(2)} → ⋯ → x̄^{(H)} → y

For one layer: x̄_i = [x_i, …, x_{i−L+1}]^⊤, x_j ∈ R.

SLIDE 11

(Deep) RGP

Add recursion in the latent states:

u → x̄^{(1)} → x̄^{(2)} → ⋯ → x̄^{(H)} → y

For one layer: x̄_i = [x_i, …, x_{i−L+1}]^⊤, x_j ∈ R, so that:

x_i = f(x̄_{i−1}, ū_{i−1}) + ε_i^{(x)},
y_i = g(x̄_i) + ε_i^{(y)}.
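A minimal sketch of rolling one such recurrent layer forward. The toy deterministic `f` stands in for a draw from the GP transition function, and the noise level is illustrative; neither is from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_rgp_layer(u, L, f, noise_std=0.01):
    """Roll one layer forward: x_i = f(xbar_{i-1}, ubar_{i-1}) + eps_i.

    xbar_i = [x_i, ..., x_{i-L+1}] is the lagged latent window from the
    slide; history before t=0 is zero-padded for simplicity.
    """
    T = len(u)
    x = np.zeros(T)
    for i in range(1, T):
        xbar = np.array([x[max(i - 1 - k, 0)] for k in range(L)])
        ubar = np.array([u[max(i - 1 - k, 0)] for k in range(L)])
        x[i] = f(xbar, ubar) + noise_std * rng.standard_normal()
    return x

# Toy transition in place of a GP sample: damped, driven nonlinearity.
f = lambda xbar, ubar: 0.7 * np.tanh(xbar.mean()) + 0.3 * ubar[0]
x = simulate_rgp_layer(np.sin(np.linspace(0, 6, 50)), L=3, f=f)
print(x.shape)   # (50,)
```

In the actual model, f is not a fixed function but a GP, and inference must account for the uncertainty in the latent states x — which is what REVARB, next, addresses.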

SLIDE 12

REVARB: REcurrent VARiational Bayes

Extend the joint probability with inducing points:

p(joint) = p(y, f^{(H+1)}, z^{(H+1)}, {x^{(h)}, f^{(h)}, z^{(h)}}_{h=1}^{H})

Lower bound: log p(y) ≥ ∫_{f,x,z} Q log [p(joint)/Q], with Q = ∏_{h} q(x^{(h)}, f^{(h)}, z^{(h)}).

Posterior marginal: q(x^{(h)}) = ∏_{i=1}^{N} N(x_i^{(h)} | μ_i^{(h)}, λ_i^{(h)}).

◮ Mean-field for q(x) allows an analytical solution without having to resort to sampling. Additional layers compensate for the uncorrelated posterior.
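The lower bound on the slide is the standard variational argument, obtained by multiplying and dividing by Q inside the marginal likelihood and applying Jensen's inequality:

```latex
\log p(\mathbf{y})
  = \log \int_{f,x,z} Q \,\frac{p(\mathrm{joint})}{Q}
  \;\ge\; \int_{f,x,z} Q \log \frac{p(\mathrm{joint})}{Q}
  =: \mathcal{L},
\qquad
Q = \prod_{h} q\!\big(x^{(h)}, f^{(h)}, z^{(h)}\big).
```

Equality holds when Q equals the true posterior, so maximizing the bound L over the variational parameters (the means μ_i^{(h)}, variances λ_i^{(h)}, and inducing quantities) both tightens the bound and fits the model.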

SLIDE 13

SLIDE 14

SLIDE 15

SLIDE 16

RNN-based recognition model

Reduce the number of variational parameters by reparameterizing the variational means μ_i^{(h)} using RNNs:

μ_i^{(h)} = g^{(h)}(x̂_{i−1}^{(h)}),

where g(x) = V_{L_N}^⊤ φ_{L_N}(W_{L_N−1} φ_{L_N−1}(⋯ W_2 φ_1(U_1 x))).

Amortized inference also regularizes the optimization procedure.
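A minimal NumPy sketch of the per-step recognition map g: a small feed-forward stack (tanh activations here) that turns the previous lagged state into the variational mean μ_i, applied recurrently along the sequence. The weights are random stand-ins for the trained U, W, V matrices, and all sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

def recognition_mean(x_prev, layers, V):
    """g(x) = V^T phi(W_{L-1} phi(... phi(U_1 x))): maps the previous
    lagged state x_prev to a scalar variational mean mu_i."""
    h = x_prev
    for W in layers:            # U_1, then W_2, ..., W_{L-1}
        h = np.tanh(W @ h)      # phi applied layer by layer
    return float(V @ h)

# Hypothetical sizes: lag window of length 3, two hidden layers of 8 units.
layers = [rng.standard_normal((8, 3)), rng.standard_normal((8, 8))]
V = rng.standard_normal(8)

mu = recognition_mean(np.array([0.1, -0.2, 0.05]), layers, V)
```

The payoff is that instead of optimizing one free mean μ_i^{(h)} per time step and layer, only the shared network weights are optimized, which both shrinks the parameter count and regularizes inference.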

SLIDE 17

SLIDE 18

Robustness to outliers

Recall the RGP variant with parametric emission:

x_i = f(x_{i−1}, …, x_{i−L_x}, u_{i−1}, …, u_{i−L_u}) + ε_i^{(x)},
y_i = x_i + ε_i^{(y)},
ε_i^{(x)} ∼ N(ε_i^{(x)} | 0, σ_x²),
ε_i^{(y)} ∼ N(ε_i^{(y)} | 0, τ_i^{−1}),   τ_i ∼ Γ(τ_i | α, β).

◮ "Switching off" outliers by including the above Student-t likelihood.
◮ Modified REVARB allows for an analytic solution.
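The Gamma prior on the per-point precision τ_i is what produces the Student-t likelihood named in the bullet: marginalizing τ_i is the standard Gaussian scale-mixture identity,

```latex
\int_0^{\infty} \mathcal{N}\!\left(y_i \mid x_i, \tau_i^{-1}\right)
  \Gamma(\tau_i \mid \alpha, \beta)\, d\tau_i
  \;=\;
  \mathrm{St}\!\left(y_i \,\middle|\, x_i,\; \lambda = \tfrac{\alpha}{\beta},\; \nu = 2\alpha\right).
```

The heavy tails let an outlying point be explained by a small inferred τ_i (low precision) rather than by distorting the latent dynamics, which is the "switching-off" effect.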

SLIDE 19

SLIDE 20

Robust GP autoregressive model: demonstration

[Figure: RMSE values for free simulation on test data with different levels of contamination by outliers. Panels: (a) Artificial 1, (b) Artificial 2.]

SLIDE 21

[Figure, continued. Panels: (c) Artificial 3, (d) Artificial 4, (e) Artificial 5.]

SLIDE 22

Results

Results in nonlinear system identification:

1. artificial datasets
2. "drive" dataset: a system with two electric motors that drive a pulley using a flexible belt.
   ◮ input: the sum of voltages applied to the motors
   ◮ output: speed of the belt

SLIDE 23

[Figure: results comparing RGP, GPNARX, MLP-NARX and LSTM.]

SLIDE 24

[Figure: further results comparing RGP, GPNARX, MLP-NARX and LSTM.]

SLIDE 25

Avatar control

Figure: the generated motion with a step-function control signal, starting with walking (blue), switching to running (red) and switching back to walking (blue).

Videos:

◮ https://youtu.be/FR-oeGxV6yY (switching between learned speeds)
◮ https://youtu.be/AT0HMtoPgjc (interpolating (un)seen speed)
◮ https://youtu.be/FuF-uZ83VMw (constant unseen speed)