Derivative-Free Methods for Machine Learning Tasks: The Ensemble Kalman Filter

SLIDE 1

Derivative-Free Methods for Machine Learning Tasks
The Ensemble Kalman Filter

Nikola B. Kovachki¹ and Andrew M. Stuart¹
¹ Computational and Mathematical Sciences, California Institute of Technology

Inverse Problems and Machine Learning, February 9-11, 2018

SLIDE 2

Table of Contents

1. Inverse Problem Formulations
2. Ensemble Kalman Filter
3. Numerics

SLIDE 3

Table of Contents

1. Inverse Problem Formulations
2. Ensemble Kalman Filter
3. Numerics

SLIDE 4

Supervised Learning

  • Data: {(x_j, y_j)}_{j=1}^N with x_j ∈ X, y_j ∈ Y and X, Y Hilbert spaces.
  • Find: G(u|·) : X → Y for parameter u ∈ U consistent with the data.
  • Concatenate:

y = G(u|x) + η where G(·|x) : U → Y^N and η is model or data error.

  • Losses: either

Φ(u; x, y) = ½ ‖y − G(u|x)‖²_{Y^N} + R(u)

or

Φ(u; x, y) = −Σ_{j=1}^N ⟨y_j, log G(u|x_j)⟩_Y + R(u)

  • Standard Solution (SGD):

u̇ = −∇_u Φ(u; x, y),  u(0) = u_0,  u* = u(T)
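
A minimal NumPy sketch of this loss and gradient flow, assuming for illustration a linear model G(u|x) = x·u and a Tikhonov regulariser R(u) = (λ/2)‖u‖² (neither is prescribed by the slide); explicit Euler on the flow recovers plain gradient descent:

```python
import numpy as np

def loss(u, X, Y, lam=1e-3):
    """Phi(u; x, y) = 1/2 ||Y - X u||^2 + (lam/2) ||u||^2  (linear G, Tikhonov R: illustrative choices)."""
    r = Y - X @ u
    return 0.5 * r @ r + 0.5 * lam * u @ u

def grad_loss(u, X, Y, lam=1e-3):
    """Gradient of Phi with respect to u for the linear/Tikhonov choice above."""
    return -X.T @ (Y - X @ u) + lam * u

def gradient_flow(u0, X, Y, h=1e-2, n_steps=1000):
    """Explicit Euler on du/dt = -grad Phi(u; x, y), u(0) = u0; returns u* = u(T)."""
    u = u0.copy()
    for _ in range(n_steps):
        u = u - h * grad_loss(u, X, Y)
    return u

# Example: N = 50 data points with X = R^5, Y = R.
X = np.random.randn(50, 5)
Y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * np.random.randn(50)
u_star = gradient_flow(np.zeros(5), X, Y)
```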

SLIDE 5

Example

  • Classification:
  • NLP:
SLIDE 6

Online Supervised Learning

  • Data: As before, possibly with N = ∞.
  • Dynamic: For j = 0, 1, 2, . . .

u_{j+1} = u_j,  y_{j+1} = G(u_{j+1}|x_{j+1}) + η_{j+1}

  • Find: u_j given Y_j = {y_k}_{k=1}^j and update sequentially.
  • Loss: either

Φ(u; x, y) = ½ ‖y − G(u|x)‖²_Y + R(u)

or

Φ(u; x, y) = −⟨y, log G(u|x)⟩_Y + R(u)

  • Standard Solution (OGD):

u̇ = −∇_u Φ(u; x_{j+1}, y_{j+1}),  u(0) = u_j,  u_{j+1} = u(T_j)

SLIDE 7

Example

  • Model Improvement:
  • Stream Data:
SLIDE 8

Semi-Supervised Learning (on a graph)

Bertozzi and Flenner 2012. (MMS) Bertozzi, Luo, Stuart, Zygalakis 2017. (preprint)

  • Data: {x_j}_{j∈Z} and {y_j}_{j∈Z′} with Z′ ⊂ Z and |Z′| ≪ |Z|.
  • Find: u : Z → R^m such that

y_j = S(u(j)) + η_j ∀ j ∈ Z′, where S : R^m → Y is pre-specified.

  • Loss (a sketch follows this list):

Φ(u; x, y) = 1/(2γ²) Σ_{j∈Z′} ‖y_j − S(u(j))‖²_Y + R(u; x)

  • Standard Solution: Probit (convex optimization) or MCMC (Bayesian).
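
A small NumPy sketch of this loss; the Gaussian-kernel weight matrix, the graph-Laplacian regulariser R(u; x) = ½⟨u, Lu⟩, and the choice S = identity are illustrative assumptions standing in for the pre-specified S and the graph-based prior of Bertozzi et al.:

```python
import numpy as np

def graph_laplacian(X, sigma=1.0):
    """Unnormalised graph Laplacian L = D - W from feature vectors, with Gaussian-kernel weights
    W_ij = exp(-|x_i - x_j|^2 / (2 sigma^2)) (an illustrative similarity, not prescribed by the talk)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

def phi(u, L, labels, gamma=0.1):
    """Phi(u; x, y) = 1/(2 gamma^2) sum_{j in Z'} ||y_j - S(u(j))||^2 + R(u; x),
    with S = identity and R(u; x) = 1/2 <u, L u> as stand-ins.
    u: (|Z|, m) latent variables; labels: dict {j: y_j} over the labelled set Z'."""
    misfit = sum(np.sum((y - u[j]) ** 2) for j, y in labels.items()) / (2.0 * gamma**2)
    return misfit + 0.5 * np.trace(u.T @ L @ u)
```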
SLIDE 9

Example

  • Clustering:
SLIDE 10

Table of Contents

1. Inverse Problem Formulations
2. Ensemble Kalman Filter
3. Numerics

SLIDE 11

Continuous-time EnKF

Kantas, Beskos, Jasra, (2014) (JUQ) Iglesias, Law and Stuart, 2013. (IP)

  • Inverse Problem:

y = G(u) + η,  η ∼ N(0, Γ),  u ∼ µ_0(u)

  • Sequential Monte Carlo (SMC):

µ_n(du) ∝ exp(−nh Φ(u; y)) µ_0(du)

  • Approximate SMC (EnKF):

u^(j)_{n+1} = u^(j)_n + C^{uw}(u_n)(C^{ww}(u_n) + Γ)^{−1}(y − G(u^(j)_n))

  • Continuous-time limit (Γ → (1/h)Γ, h → 0):

u̇^(j) = −(1/J) Σ_{k=1}^J ⟨G(u^(k)) − Ḡ, G(u^(j)) − y⟩_Γ u^(k)
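
A minimal NumPy sketch of the discrete EnKF update above, with the ensemble stored as rows of a matrix; the empirical covariance formulas and the dense solve of (C^{ww} + Γ) are assumptions of this sketch:

```python
import numpy as np

def enkf_step(U, G, y, Gamma):
    """One EnKF iteration  u^(j)_{n+1} = u^(j)_n + C^{uw}(C^{ww} + Gamma)^{-1} (y - G(u^(j)_n)).
    U: (J, d) ensemble, G: map R^d -> R^p, y: (p,) data, Gamma: (p, p) noise covariance."""
    W = np.array([G(u) for u in U])           # (J, p) forward evaluations
    Uc = U - U.mean(axis=0)                   # centred parameters
    Wc = W - W.mean(axis=0)                   # centred outputs
    J = U.shape[0]
    Cuw = Uc.T @ Wc / J                       # (d, p) cross-covariance C^{uw}
    Cww = Wc.T @ Wc / J                       # (p, p) output covariance C^{ww}
    K = Cuw @ np.linalg.inv(Cww + Gamma)      # Kalman-type gain
    return U + (y[None, :] - W) @ K.T         # update every ensemble member

# Example with a toy nonlinear forward map (illustrative only).
G = lambda u: np.tanh(u[:2])
U = np.random.randn(50, 3)
y = np.array([0.3, -0.1])
U = enkf_step(U, G, y, 0.01 * np.eye(2))
```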

SLIDE 12

Approximate Natural Gradient Descent

Amari, 1998. (NC)

  • Linear: G(·) = A·

u̇^(j) = −C(u) ∇_u Φ(u^(j), y)

where

C(u) = (1/J) Σ_{j=1}^J (u^(j) − ū) ⊗ (u^(j) − ū),  Φ(u, y) = ½ ‖y − Au‖²_Γ

  • Natural Gradient Descent:

u̇ = −F^{−1}(u) ∇_u Φ(u, y)

  • Cramér-Rao: Cov[û] ⪰ F^{−1}(u)

SLIDE 13

Long-time Linear Behavior

Schillings and Stuart 2017. (SINUM)

Theorem

Suppose G(·) = A· and that y is the image of a truth u† under A. Define r^(j)(t) = u^(j)(t) − u†. Then (under some assumptions)

A r^(j)(t) = A r^(j)_∥(t) + A r^(j)_⊥(t)

with A r^(j)_∥ ∈ span{A(u^(j)(0) − ū(0))} and A r^(j)_⊥ ∈ span{A(u^(j)(0) − ū(0))}^⊥. Furthermore

A r^(j)_∥(t) → 0 as t → ∞,  A r^(j)_⊥(t) = A r^(j)_⊥(0) ∀ t ≥ 0.

SLIDE 14

Arbitrary Loss

  • Non-linear:

u̇^(j) = −C^{uw}(u) Γ^{−1}(G(u^(j)) − y) = −C^{uw}(u) ∇_z Ψ(G(u^(j)), y)
       = −(1/J) Σ_{k=1}^J ⟨G(u^(k)) − Ḡ, ∇_z Ψ(G(u^(j)), y)⟩ u^(k)

where

C^{uw}(u) = (1/J) Σ_{j=1}^J (u^(j) − ū) ⊗ (G(u^(j)) − Ḡ),  Ψ(z, y) = ½ ‖y − z‖²_Γ

  • Concatenate: u = [u^(1), . . . , u^(J)]

u̇ = −D(u) u where D^(jk)(u) = (1/J) ⟨G(u^(k)) − Ḡ, ∇_z Ψ(G(u^(j)), y)⟩
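
A sketch of the right-hand side u̇ = −D(u)u of this coupled flow in NumPy; the Γ-weighted inner product is absorbed into ∇_z Ψ (e.g. ∇_z Ψ(z, y) = Γ⁻¹(z − y) for the quadratic loss), and storing the ensemble as rows of a matrix is an assumption of this sketch:

```python
import numpy as np

def ensemble_rhs(U, G, y, grad_Psi):
    """du^(j)/dt = -(1/J) sum_k <G(u^(k)) - Gbar, grad_z Psi(G(u^(j)), y)> u^(k),  i.e.  du/dt = -D(u) u
    with D^(jk) = (1/J) <G(u^(k)) - Gbar, grad_z Psi(G(u^(j)), y)>."""
    J = U.shape[0]
    W = np.array([G(u) for u in U])               # (J, p) forward evaluations
    Wc = W - W.mean(axis=0)                       # G(u^(k)) - Gbar, centred
    Grad = np.array([grad_Psi(w, y) for w in W])  # (J, p) output-space loss gradients
    D = Grad @ Wc.T / J                           # D[j, k] = (1/J) <Wc[k], Grad[j]>
    return -D @ U                                 # row j is du^(j)/dt

# Quadratic loss Psi(z, y) = 1/2 ||y - z||^2_Gamma recovers the EnKF flow:
# grad_Psi = lambda z, y: np.linalg.solve(Gamma, z - y)
```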

SLIDE 15

Nesterov Momentum

Su, Boyd, Candès 2014. (NIPS)

  • Momentum:

u_{n+1} = v_n − h ∇f(v_n),  v_{n+1} = u_{n+1} + (n/(n+3))(u_{n+1} − u_n),  v_0 = u_0

⟺

ü + (3/t) u̇ + ∇f(u) = 0,  u̇(0) = 0,  u(0) = u_0

  • Modified Limit:

ü^(j) + (3/t) u̇^(j) = −C^{uw}(u) ∇_z Ψ(G(u^(j)), y),  u̇^(j)(0) = 0,  u^(j)(0) = u^(j)_0
SLIDE 16

Discrete Scheme

  • Concatenate: u = [u^(1), . . . , u^(J)]

ü + (3/t) u̇ = −D(u) u where D^(jk)(u) = (1/J) ⟨G(u^(k)) − Ḡ, ∇_z Ψ(G(u^(j)), y)⟩

  • Discretize:

u_{n+1} = v_n − h_n D(v_n) v_n,  v_{n+1} = u_{n+1} + (n/(n+3))(u_{n+1} − u_n),  v_0 = u_0 = [u^(1)_0, . . . , u^(J)_0]^T
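
A sketch of this discretisation in NumPy, reusing a right-hand side of the form −D(v)v (e.g. the ensemble_rhs sketch above, passed in here as the callable `rhs`; the loop structure and names are illustrative):

```python
import numpy as np

def momentum_run(U0, rhs, h, n_steps):
    """Nesterov-style scheme:  u_{n+1} = v_n - h * D(v_n) v_n,
                               v_{n+1} = u_{n+1} + n/(n+3) * (u_{n+1} - u_n),   v_0 = u_0.
    U0: (J, d) initial ensemble; rhs(V) should return -D(V) V."""
    u, v = U0.copy(), U0.copy()
    for n in range(n_steps):
        u_next = v + h * rhs(v)                      # u_{n+1} = v_n - h D(v_n) v_n
        v = u_next + (n / (n + 3.0)) * (u_next - u)  # v_{n+1}
        u = u_next
    return u
```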

SLIDE 17

Initialization, Noise, and Predictions

  • Initial Ensemble:

u^(1)_0, . . . , u^(J)_0 ∼ µ_0(u)

  • Noise (Supervised):

ṽ^(j)_{n+1} = v^(j)_n + ξ^(j)_{n+1},  ξ^(j)_{n+1} ∼ µ_{n+1}(u),  Cov[µ_{n+1}] ∝ h_n Cov[µ_0]

  • Ensemble Refresh (Online):

u^(j)_{n+1} = ū_n + ξ^(j)_{n+1},  ξ^(j)_{n+1} ∼ µ_0(u)

  • Predict:

ū_{n+1} = (1/J) Σ_{j=1}^J u^(j)_{n+1}
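
These three operations are simple in NumPy; in the sketch below, the Gaussian form of µ_0 and µ_{n+1} and the exact perturbation scale are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(V, scale):
    """Supervised case: v~^(j)_{n+1} = v^(j)_n + xi^(j)_{n+1}, with a step-size-dependent
    perturbation scale (the precise scaling is an assumption here)."""
    return V + scale * rng.standard_normal(V.shape)

def refresh(U, scale):
    """Online case: u^(j)_{n+1} = ubar_n + xi^(j)_{n+1},  xi ~ mu_0 (taken Gaussian here)."""
    return U.mean(axis=0, keepdims=True) + scale * rng.standard_normal(U.shape)

def predict(U):
    """Prediction uses the ensemble mean ubar_{n+1} = (1/J) sum_j u^(j)_{n+1}."""
    return U.mean(axis=0)
```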

SLIDE 18

Complete Algorithm

  • Mini-batch data (at each step):

x_n = {x_{i_l^(n)}}_{l=1}^m,  y_n = {y_{i_l^(n)}}_{l=1}^m  where {i_1^(n), . . . , i_m^(n)} ⊆ {1, . . . , N}.

  • Compute: use the discrete scheme as shown with x → x_n, y → y_n.

  • Step-size: adaptive,

h_n = h / (ε + ‖D_n‖)
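
A sketch of the mini-batch sampling and adaptive step size; the Frobenius norm for ‖D_n‖, the default constants, and sampling without replacement are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def minibatch(X, Y, m):
    """Draw indices {i_1, ..., i_m} from {1, ..., N} and return the corresponding pairs (x_n, y_n)."""
    idx = rng.choice(len(X), size=m, replace=False)
    return X[idx], Y[idx]

def adaptive_step(D, h=2.0, eps=1e-8):
    """Adaptive step size h_n = h / (eps + ||D_n||)."""
    return h / (eps + np.linalg.norm(D))
```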

SLIDE 19

Table of Contents

1. Inverse Problem Formulations
2. Ensemble Kalman Filter
3. Numerics

SLIDE 20

Convolutional Models

Ba, Kiros and Hinton 2016. (NIPS)

Net 1 (≈14k parameters) and Net 2 (≈30k parameters): stacks of 3×3 convolutions with 12, 24, and 32 channels, interleaved with 2×2 max-pooling, followed by FC-100 and FC-10 fully-connected layers.

  • ReLU applied after each block.
  • Layer Normalization applied after each convolutional layer.
SLIDE 21

MNIST Dataset

LeCun and Cortes 1998.

SLIDE 22

MNIST Supervised

Figure: Test Accuracy of Net 1 on MNIST (batched).

Settings: J = 5000, cross-entropy loss.
SLIDE 23

MNIST Online

Figure: Test Accuracy of Net 1 on MNIST (online).

Settings: J = 5000, cross-entropy loss.

SLIDE 24

Fashion MNIST Dataset

Xiao, Rasul and Vollgraf 2017.

SLIDE 25

Fashion MNIST Supervised

Figure: Test Accuracy of Net 2 on Fashion MNIST (batched).

Settings: J = 5000, cross-entropy loss.
SLIDE 26

RNN

SLIDE 27

Time Series Online

Figure: Time series prediction with an RNN.

Settings: J = 1000, MSE loss, with momentum, noise, and ensemble refresh.

SLIDE 28

Voting Records Dataset

U.S. House of Representatives 1984, 16 key votes. Each representative has an associated feature vector x_j ∈ R^16, e.g. x_j = (1, −1, 0, · · · , 1)^T, where 1 is “yes”, −1 is “no”, and 0 is abstain/no-show. Here |Z| = 435 and |Z′| = 5.

Figure: Strong Prior Information: Fiedler Vector and Spectrum (Normalized).

SLIDE 29

Voting Records Semi-Supervised

Figure: Accuracy on Voting Records.

Settings: J = 2000, MSE loss.
SLIDE 30

Summary

  • Machine learning as an inverse/filtering problem.
  • EnKF as a minimization scheme.
  • Modifications to the original method.
  • Numerics show promise as an alternative to:
  • SGD
  • OGD
  • BPTT
  • MCMC
SLIDE 31

References I

Bertozzi A. L., Flenner A. Diffuse Interface Models on Graphs for Classification of High Dimensional Data. Multiscale Model. Simul., 10(3), pp. 1090-1118. 2012.

Bertozzi A. L., Luo X., Stuart A. M., Zygalakis K. C. Uncertainty Quantification in the Classification of High Dimensional Data. Preprint, arXiv:1703.08816. 2017.

Kantas N., Beskos A., Jasra A. Sequential Monte Carlo Methods for High-Dimensional Inverse Problems. SIAM/ASA Journal on Uncertainty Quantification, 2, pp. 464-489. 2014.

Iglesias M., Law K., Stuart A. M. Ensemble Kalman Methods for Inverse Problems. Inverse Problems, 29, p. 045001. 2013.

Schillings C., Stuart A. M. Analysis of the Ensemble Kalman Filter for Inverse Problems. SIAM J. Numerical Analysis, 55(3), pp. 1264-1290. 2017.
SLIDE 32

References II

Amari S. Natural Gradient Works Efficiently in Learning. Neural Computation, 10, pp. 251-276. 1998.

Su W., Boyd S., Candès E. J. A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights. Advances in Neural Information Processing Systems, 28. 2014.

Ba J., Kiros J., Hinton G. Layer Normalization. Advances in Neural Information Processing Systems, 30. 2016.

LeCun Y., Cortes C. The MNIST Database of Handwritten Digits. http://yann.lecun.com/exdb/mnist/. 1998.

Xiao H., Rasul K., Vollgraf R. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv:1708.07747. 2017.