SLIDE 1
Robust, Deep Recurrent Gaussian Processes
Andreas Damianou, with César Lincoln Mattos, Zhenwen Dai, Neil Lawrence, Jeremy Forth, Guilherme Barreto
Royal Society, 06 June 2016
SLIDE 2
SLIDE 3
SLIDE 4
SLIDE 5
Challenge: Learn patterns from sequences
◮ Recurrent Gaussian Processes (RGP): a family of recurrent Bayesian nonparametric models (data efficient, uncertainty handling).
◮ Latent deep RGP: a deep RGP with latent states (simultaneous representation + dynamical learning).
◮ Recurrent Variational Bayes (REVARB) framework (efficient inference + coherent propagation of uncertainty).
◮ Extension: RNN-based sequential recognition models (regularization + parameter reduction).
◮ Extension: robustness to outliers.
◮ Comparison with LSTMs, parametric and non-latent models.
SLIDE 6
NARX model
A standard NARX model considers an input vector $x_i \in \mathbb{R}^D$ comprised of $L_y$ past observed outputs $y_i \in \mathbb{R}$ and $L_u$ past exogenous inputs $u_i \in \mathbb{R}$:
$$x_i = [y_{i-1}, \cdots, y_{i-L_y}, u_{i-1}, \cdots, u_{i-L_u}]^\top,$$
$$y_i = f(x_i) + \epsilon_i^{(y)}, \qquad \epsilon_i^{(y)} \sim \mathcal{N}\big(\epsilon_i^{(y)} \mid 0, \sigma_y^2\big).$$

State-space model:
$$x_i = f(x_{i-1}, \cdots, x_{i-L_x}, u_{i-1}, \cdots, u_{i-L_u}) + \epsilon_i^{(x)} \quad \text{(transition)}$$
$$y_i = x_i + \epsilon_i^{(y)} \quad \text{(emission)}$$

Non-linear emission: $y_i = g(x_i) + \epsilon_i^{(y)}$.
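To make the regressor construction concrete, here is a minimal GP-NARX sketch in numpy: it assembles the lagged input vectors and fits a one-step-ahead GP predictor on a toy system. The toy dynamics, lag orders (Ly = 2, Lu = 1), RBF kernel, and fixed hyperparameters are illustrative assumptions, not the talk's experimental setup.

```python
import numpy as np

def narx_inputs(y, u, Ly, Lu):
    """Stack NARX regressors x_i = [y_{i-1},...,y_{i-Ly}, u_{i-1},...,u_{i-Lu}]."""
    start = max(Ly, Lu)
    X = np.array([np.concatenate([y[i - Ly:i][::-1], u[i - Lu:i][::-1]])
                  for i in range(start, len(y))])
    return X, y[start:]

def rbf(A, B, ell=1.0, var=1.0):
    """RBF kernel matrix between row-vector sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / ell ** 2)

# Toy nonlinear system (illustrative only).
rng = np.random.default_rng(0)
N = 300
u = rng.uniform(-1.0, 1.0, N)
y = np.zeros(N)
for i in range(2, N):
    y[i] = (0.5 * np.tanh(y[i - 1]) - 0.3 * y[i - 2]
            + 0.4 * u[i - 1] + 0.02 * rng.standard_normal())

X, t = narx_inputs(y, u, Ly=2, Lu=1)

# GP regression with fixed hyperparameters; hold out the last point.
K = rbf(X[:-1], X[:-1]) + 1e-2 * np.eye(len(X) - 1)  # kernel + noise variance
alpha = np.linalg.solve(K, t[:-1])
mean = rbf(X[-1:], X[:-1]) @ alpha                   # one-step-ahead posterior mean
print(f"predicted {mean[0]:+.3f}, observed {t[-1]:+.3f}")
```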
SLIDE 8
NARX vs State-space
◮ Latent inputs allow for simultaneous representation learning and dynamical learning.
◮ Latent inputs mean that noisy predictions are not fed back to the model.
SLIDE 9
(Deep) RGP
Start from a deep GP:
[Diagram: deep GP chain, u → x(1) → x(2) → · · · → x(H) → y]
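A minimal sketch of what "deep" means here, assuming an RBF kernel and three hidden layers chosen purely for illustration: a draw from a deep GP prior is obtained by composing independent layer-wise GP draws along the chain.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_gp(x, rng, ell=0.3, var=1.0, jitter=1e-4):
    """Draw one function sample from a zero-mean GP with an RBF kernel."""
    d2 = (x[:, None] - x[None, :]) ** 2
    K = var * np.exp(-0.5 * d2 / ell ** 2) + jitter * np.eye(len(x))
    return np.linalg.cholesky(K) @ rng.standard_normal(len(x))

# Compose layer-wise draws along the chain u -> x(1) -> ... -> x(H) -> y.
u = np.linspace(-1.0, 1.0, 200)
h = u
for _ in range(3):              # H = 3 hidden layers, purely illustrative
    h = sample_gp(h, rng)       # each layer warps the previous layer's output
y = sample_gp(h, rng)           # the final draw plays the role of the output
```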
SLIDE 10
(Deep) RGP
Latent states formed from lagged window of length L:
[Diagram: deep GP chain with lagged latent states, u → x̄(1) → x̄(2) → · · · → x̄(H) → y]
For one layer: $\bar{x}_i = [x_i, \cdots, x_{i-L+1}]^\top$, $x_j \in \mathbb{R}$.
SLIDE 11
(Deep) RGP
Add recursion in the latent states:
[Diagram: recurrent deep GP chain, u → x̄(1) → x̄(2) → · · · → x̄(H) → y, with recursive connections within each latent layer]
For one layer: $\bar{x}_i = [x_i, \cdots, x_{i-L+1}]^\top$, $x_j \in \mathbb{R}$, so that:
$$x_i = f(\bar{x}_{i-1}, \bar{u}_{i-1}) + \epsilon_i^{(x)},$$
$$y_i = g(\bar{x}_i) + \epsilon_i^{(y)}.$$
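A minimal sketch of this recursion, with stand-in parametric maps where the RGP actually places GP priors over f and g; the lag L, the particular f and g, and the noise levels are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
L, T = 3, 100                        # lag window length and sequence length

# Stand-ins for the transition and emission maps; in the RGP these are
# functions drawn from GP priors, not fixed parametric forms.
f = lambda x_bar, u_bar: np.tanh(x_bar.sum() + 0.5 * u_bar.sum())
g = lambda x_bar: 0.8 * x_bar[0] - 0.2 * x_bar[1]

u = rng.uniform(-1.0, 1.0, T)
x = np.zeros(T)
y = np.zeros(T)
for i in range(L, T):
    x_bar_prev = x[i - L:i][::-1]    # \bar{x}_{i-1} = [x_{i-1}, ..., x_{i-L}]
    u_bar_prev = u[i - L:i][::-1]    # \bar{u}_{i-1}: same lagged window of inputs
    x[i] = f(x_bar_prev, u_bar_prev) + 0.01 * rng.standard_normal()
    x_bar = x[i - L + 1:i + 1][::-1] # \bar{x}_i = [x_i, ..., x_{i-L+1}]
    y[i] = g(x_bar) + 0.01 * rng.standard_normal()
```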
SLIDE 12
REVARB: REcurrent VARiational Bayes
Extend the joint probability with inducing points:
$$p(\text{joint}) = p\Big(y, f^{(H+1)}, z^{(H+1)}, \big\{x^{(h)}, f^{(h)}, z^{(h)}\big\}_{h=1}^{H}\Big).$$

Lower bound:
$$\log p(y) \geq \int_{f,x,z} Q \log \frac{p(\text{joint})}{Q}, \qquad Q = q\big(x^{(h)}, f^{(h)}, z^{(h)}\big), \ \forall h.$$

Posterior marginal:
$$q\big(x^{(h)}\big) = \prod_{i=1}^{N} \mathcal{N}\big(x_i^{(h)} \mid \mu_i^{(h)}, \lambda_i^{(h)}\big).$$

◮ Mean-field for $q(x)$ allows an analytical solution without having to resort to sampling.
◮ Additional layers help compensate for the uncorrelated (mean-field) posterior.
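The "no sampling" point can be made concrete: with a factorized Gaussian q(x), Gaussian expectations are available in closed form. The sketch below evaluates a KL term against a standard-normal prior analytically; the prior and the variational values are illustrative assumptions (the full REVARB bound additionally involves kernel expectations under q(x)).

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50

# Mean-field Gaussian posterior over one layer's latent states:
# q(x) = prod_i N(x_i | mu_i, lam_i), with illustrative values.
mu = rng.standard_normal(N)        # variational means mu_i
lam = np.full(N, 0.1)              # variational variances lam_i

# Because q factorizes into Gaussians, terms like KL[q || N(0, I)]
# come out in closed form -- no Monte Carlo sampling required.
kl = 0.5 * np.sum(lam + mu ** 2 - 1.0 - np.log(lam))
print(f"KL[q(x) || N(0, I)] = {kl:.3f}")
```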
SLIDE 16
RNN-based recognition model
Reduce the number of variational parameters by reparameterizing the variational means $\mu_i^{(h)}$ using RNNs:
$$\mu_i^{(h)} = g^{(h)}\big(\hat{x}_{i-1}^{(h)}\big),$$
where
$$g(x) = V_{L_N}^{\top} \phi_{L_N}\big(W_{L_N-1} \phi_{L_N-1}(\cdots W_2 \phi_1(U_1 x))\big).$$
Amortized inference also regularizes the optimization procedure.
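A minimal numpy sketch of the recognition map, assuming tanh activations, two hidden layers, and random weights purely for illustration: a single shared network produces every variational mean, instead of one free parameter per time step and layer.

```python
import numpy as np

rng = np.random.default_rng(4)
D, H_units, n_hidden = 4, 16, 2      # window size and layer widths (illustrative)

U1 = 0.1 * rng.standard_normal((H_units, D))
Ws = [0.1 * rng.standard_normal((H_units, H_units)) for _ in range(n_hidden - 1)]
V = 0.1 * rng.standard_normal(H_units)

def g(x):
    """Recognition map: lagged state window -> variational mean mu_i."""
    h = np.tanh(U1 @ x)              # phi_1(U_1 x)
    for W in Ws:
        h = np.tanh(W @ h)           # phi_l(W_l h)
    return V @ h                     # V^T phi_{L_N}(...)

x_hat_prev = rng.standard_normal(D)  # \hat{x}_{i-1}: the network's sequential input
mu_i = g(x_hat_prev)
# The same weights produce every mu_i, so the number of variational
# parameters no longer grows with the sequence length N.
```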
SLIDE 18
Robustness to outliers
Recall the RGP variant with parametric emission:
$$x_i = f(x_{i-1}, \cdots, x_{i-L_x}, u_{i-1}, \cdots, u_{i-L_u}) + \epsilon_i^{(x)}, \qquad \epsilon_i^{(x)} \sim \mathcal{N}\big(\epsilon_i^{(x)} \mid 0, \sigma_x^2\big),$$
$$y_i = x_i + \epsilon_i^{(y)}, \qquad \epsilon_i^{(y)} \sim \mathcal{N}\big(\epsilon_i^{(y)} \mid 0, \tau_i^{-1}\big), \qquad \tau_i \sim \Gamma(\tau_i \mid \alpha, \beta).$$

◮ Marginalizing the per-point precisions $\tau_i$ yields a Student-t likelihood, whose heavy tails "switch off" outliers instead of letting them drag the fit.
◮ Modified REVARB allows for an analytic solution.
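To see why the Gamma prior on per-point precisions buys robustness, the sketch below samples from the scale mixture: marginally the noise follows a Student-t with 2α degrees of freedom, whose tails are far heavier than a variance-matched Gaussian's. The hyperparameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, beta = 2.0, 2.0                     # illustrative Gamma hyperparameters
n = 200_000

# Scale mixture: tau_i ~ Gamma(alpha, rate=beta), eps_i | tau_i ~ N(0, 1/tau_i).
tau = rng.gamma(alpha, 1.0 / beta, n)      # numpy's gamma takes (shape, scale)
eps = rng.standard_normal(n) / np.sqrt(tau)

# Marginally, eps follows a Student-t with nu = 2*alpha degrees of freedom;
# a point that infers a small tau_i gets a large noise variance and is
# effectively "switched off" rather than pulling the fit towards it.
gauss = eps.std() * rng.standard_normal(n) # Gaussian with the same variance
print("Student-t tail P(|eps| > 6):", (np.abs(eps) > 6).mean())
print("Gaussian  tail P(|eps| > 6):", (np.abs(gauss) > 6).mean())
```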
SLIDE 20
Robust GP autoregressive model: demonstration
(a) Artificial 1. (b) Artificial 2.
Figure: RMSE values for free simulation on test data with different levels of contamination by outliers.
SLIDE 21
(c) Artificial 3. (d) Artificial 4. (e) Artificial 5. (continued from the figure above)
SLIDE 22
Results
Results in nonlinear system identification:
1. artificial dataset
2. "drive" dataset: data from a system with two electric motors that drive a pulley using a flexible belt.
◮ input: the sum of voltages applied to the motors
◮ output: speed of the belt
SLIDE 23
[Figure: free simulation on test data; legend: RGP, GPNARX, MLP-NARX, LSTM]
SLIDE 24
[Figure: free simulation on test data; legend: RGP, GPNARX, MLP-NARX, LSTM]
SLIDE 25
Avatar control
Figure: The generated motion with a step function signal, starting with walking (blue), switching to running (red) and switching back to walking (blue).
Videos:
◮ Switching between learned speeds: https://youtu.be/FR-oeGxV6yY
◮ Interpolating (un)seen speed: https://youtu.be/AT0HMtoPgjc
◮ Constant unseen speed: https://youtu.be/FuF-uZ83VMw