

slide-1
SLIDE 1

Reparameterization Gradient for Non-differentiable Models

Wonyeol Lee Hangyeol Yu Hongseok Yang KAIST

Published at NeurIPS 2018

slide-5
SLIDE 5

Backgrounds

slide-6
SLIDE 6

Posterior inference

  • Latent variable z ∈ ℝⁿ.
  • Observed variable x ∈ ℝᵐ.
  • Joint density p(x, z).
  • Want to infer the posterior p(z|x0) given a particular value x0 of x.

slide-7
SLIDE 7

Variational inference

  • 1. Fix a family of variational distributions {qθ(z)}θ — differentiable & easy to sample.
  • 2. Find qθ(z) that approximates p(z|x0) well.
  • Typically, by solving

argmaxθ ELBOθ, where ELBOθ = E_{qθ(z)}[ log( p(x0,z) / qθ(z) ) ].
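As a concrete illustration (our toy model and guide family, not from the slides): a Monte-Carlo estimate of ELBOθ for the conjugate model z ~ N(0,1), x|z ~ N(z,1), with the hypothetical family qθ(z) = N(θ, 1).

```python
import numpy as np

rng = np.random.default_rng(0)
LOG2PI = np.log(2 * np.pi)

def logN(x, mu):  # log density of N(mu, 1)
    return -0.5 * LOG2PI - 0.5 * (x - mu) ** 2

# Toy model (our choice): z ~ N(0,1), x | z ~ N(z,1), observed x0.
# Variational family: q_theta(z) = N(theta, 1).
x0, theta = 0.5, 0.2
z = rng.normal(theta, 1.0, 100_000)  # z ~ q_theta

# ELBO_theta = E_{q_theta}[ log p(x0, z) - log q_theta(z) ]
elbo = np.mean(logN(z, 0) + logN(x0, z) - logN(z, theta))
print(elbo)  # a lower bound on the log evidence log N(x0 | 0, sqrt(2))
```

The estimate sits below the exact log evidence, with the gap equal to KL(qθ || p(z|x0)).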


slide-11
SLIDE 11

Gradient ascent

θn+1 = θn + η × ∇θ ELBOθ |θ=θn

  • Difficult to compute ∇θ ELBOθ exactly.
  • Use an estimated gradient instead.
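A minimal sketch of this idea (our toy objective, not the paper's): stochastic gradient ascent where each step uses a cheap unbiased Monte-Carlo estimate in place of the exact gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective: F(theta) = E_{z ~ N(theta,1)}[-(z - 3)**2], maximized at theta = 3.
# With z = theta + eps, grad F(theta) = E[-2 (theta + eps - 3)], so averaging
# -2 (theta + eps - 3) over a few eps samples is an unbiased gradient estimate.
theta, eta = 0.0, 0.05
for _ in range(500):
    eps = rng.standard_normal(64)
    grad_est = np.mean(-2.0 * (theta + eps - 3.0))  # estimated gradient
    theta = theta + eta * grad_est                  # ascent step
print(theta)  # close to the optimum 3
```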
slide-14
SLIDE 14

Reparameterization estimator

  • Works if p(x0,z) is differentiable w.r.t. z.
  • Need a distribution q(ε) & a smooth function fθ(ε) s.t. fθ(ε) for ε ~ q(ε) has the distribution qθ(z).
  • Derived from the equation:

∇θ ELBOθ = ∇θ E_{qθ(z)}[ .. z .. z .. ] = ∇θ E_{q(ε)}[ .. fθ(ε) .. fθ(ε) .. ] = E_{q(ε)}[ ∇θ( .. fθ(ε) .. fθ(ε) .. ) ]
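For a differentiable integrand this is easy to check numerically (our toy example, assuming qθ(z) = N(θ, 1), fθ(ε) = θ + ε, and integrand z²).

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 1.5
eps = rng.standard_normal(100_000)  # eps ~ q(eps) = N(0, 1)

# f_theta(eps) = theta + eps has the distribution q_theta(z) = N(theta, 1).
# For the objective E_{q_theta}[z**2], the true gradient is 2 * theta; pushing
# grad_theta inside the expectation gives E[2 * f_theta(eps)]:
grad_est = np.mean(2.0 * (theta + eps))
print(grad_est)  # ≈ 2 * theta = 3.0
```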


slide-20
SLIDE 20

Non-differentiable models from probabilistic programming

slide-21
SLIDE 21

(let [z (sample (normal 0 1))]
  (if (> z 0)
    (observe (normal 3 1) 0)
    (observe (normal -2 1) 0))
  z)


slide-26
SLIDE 26

r1(z) = N(z|0,1) · N(x=0|3,1)
r2(z) = N(z|0,1) · N(x=0|-2,1)
p(z, x=0) = [z > 0] r1(z) + [z ≤ 0] r2(z)

(let [z (sample (normal 0 1))]
  (if (> z 0)
    (observe (normal 3 1) 0)
    (observe (normal -2 1) 0))
  z)
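The joint density above can be evaluated directly; a small sketch (Python standing in for the probabilistic language) shows the discontinuity at z = 0 that makes the model non-differentiable.

```python
import numpy as np

LOG2PI = np.log(2 * np.pi)

def logN(x, mu):  # log density of N(mu, 1)
    return -0.5 * LOG2PI - 0.5 * (x - mu) ** 2

def p(z):  # p(z, x=0) = [z > 0] r1(z) + [z <= 0] r2(z)
    r1 = np.exp(logN(z, 0.0) + logN(0.0, 3.0))   # branch z > 0
    r2 = np.exp(logN(z, 0.0) + logN(0.0, -2.0))  # branch z <= 0
    return np.where(z > 0, r1, r2)

left, right = p(-1e-9), p(1e-9)
print(left, right)  # the density jumps at z = 0 (left branch is larger here)
```

The ratio of the two one-sided limits is N(0|-2,1)/N(0|3,1) = exp(2.5), so no smoothing of the branch can be ignored.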

slide-34
SLIDE 34

r1(z) = N(z|0,1) · N(x=0|3,1)
r2(z) = N(z|0,1) · N(x=0|-2,1)
p(z, x=0) = [z > 0] r1(z) + [z ≤ 0] r2(z)

q(ε) = N(ε|0,1), z = ε + θ

Model:
(let [z (sample (normal 0 1))]
  (if (> z 0)
    (observe (normal 3 1) 0)
    (observe (normal -2 1) 0))
  z)

Variational program:
(let [ε (sample (normal 0 1))
      z (+ ε θ)]
  z)


slide-36
SLIDE 36

How to find a good θ? By gradient ascent on ELBOθ:

θn+1 ← θn + η × ∇θ ELBOθ |θ=θn



slide-43
SLIDE 43

Naively applying the reparameterization estimator with q(ε) = N(ε|0,1), z = ε + θ:

∇θ ELBOθ = ∇θ E_{q(ε)}[ [ε > -θ] log r1(ε+θ) + [ε ≤ -θ] log r2(ε+θ) ]
         = E_{q(ε)}[ [ε > -θ] ∇θ log r1(ε+θ) + [ε ≤ -θ] ∇θ log r2(ε+θ) ]
         = E_{q(ε)}[ -θ - ε ] = -θ


slide-47
SLIDE 47

The exchange of ∇θ and E is not valid here; the correct equation has an extra term:

∇θ ELBOθ = E_{q(ε)}[ [ε > -θ] ∇θ log r1(ε+θ) + [ε ≤ -θ] ∇θ log r2(ε+θ) ] + Correction Term
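The naive exchange above yields -θ in expectation; a numeric check (a sketch, not the paper's code) compares that against a finite-difference estimate of the true gradient at θ = 1.

```python
import numpy as np

rng = np.random.default_rng(0)
LOG2PI = np.log(2 * np.pi)

def logN(x, mu):  # log density of N(mu, 1)
    return -0.5 * LOG2PI - 0.5 * (x - mu) ** 2

def elbo(theta, eps):  # Monte-Carlo ELBO with z = eps + theta, q_theta = N(theta, 1)
    z = eps + theta
    logp = np.where(z > 0, logN(z, 0) + logN(0, 3), logN(z, 0) + logN(0, -2))
    return np.mean(logp - logN(z, theta))

theta = 1.0
naive = -theta  # expectation of the naive reparameterization estimator

# Finite difference of the ELBO itself (common random numbers) tracks the
# true gradient, including the effect of samples crossing the boundary z = 0.
eps = rng.standard_normal(400_000)
h = 1e-3
true_grad = (elbo(theta + h, eps) - elbo(theta - h, eps)) / (2 * h)
print(naive, true_grad)  # they disagree: the naive estimator is biased
```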

slide-48
SLIDE 48

Why doesn’t it work?

  • Be careful when exchanging gradient and integration.
  • The exchange may fail unexpectedly.
  • It may hold unexpectedly, but then with a correction term.

slide-51
SLIDE 51

Our results formally


slide-55
SLIDE 55

Non-differentiable models

  • The regions (here {z > 0} and {z ≤ 0}) form a partition of the latent space.
  • On each region, the density is differentiable.
  • The union of the region boundaries has Lebesgue measure zero.

slide-56
SLIDE 56

Wishful thinking


slide-74
SLIDE 74

Correction

  • The correction term is a surface integral over the region boundaries.
  • It accounts for the impact of moving the boundaries as θ changes.
  • It can be estimated by manifold sampling.
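For the running 1D example the boundary {z = 0} is a single point, so the surface integral collapses to a point evaluation: the density of the boundary point under q(ε) times the jump of the integrand across it. A sketch (our reconstruction, not the paper's code) of the corrected gradient at θ = 1:

```python
import numpy as np

rng = np.random.default_rng(1)
LOG2PI = np.log(2 * np.pi)

def logN(x, mu):  # log density of N(mu, 1)
    return -0.5 * LOG2PI - 0.5 * (x - mu) ** 2

theta = 1.0
naive = -theta  # interior term: E_{q(eps)}[-theta - eps] = -theta

# Boundary z = 0 corresponds to eps = -theta; as theta moves, the boundary moves.
jump = logN(0.0, 3.0) - logN(0.0, -2.0)        # log r1(0) - log r2(0)
correction = np.exp(logN(-theta, 0.0)) * jump  # q(-theta) * jump
corrected = naive + correction

# Compare with a finite-difference estimate of the true gradient.
eps = rng.standard_normal(400_000)
def elbo(t):
    z = eps + t
    logp = np.where(z > 0, logN(z, 0) + logN(0, 3), logN(z, 0) + logN(0, -2))
    return np.mean(logp - logN(z, t))
h = 1e-3
true_grad = (elbo(theta + h) - elbo(theta - h)) / (2 * h)
print(corrected, true_grad)  # the corrected gradient matches the finite difference
```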


slide-77
SLIDE 77

Two ingredients

  • Differentiation under moving domain:
  • Divergence theorem:
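The transcript lost the formulas on this slide; in general form (our notation, not necessarily the slides': R(θ) a region moving with θ, n its outward unit normal, v the boundary velocity), the two ingredients are:

```latex
% Differentiation under a moving domain (Leibniz/Reynolds transport theorem):
\frac{d}{d\theta} \int_{R(\theta)} f(\theta, z)\, dz
  = \int_{R(\theta)} \frac{\partial f}{\partial \theta}(\theta, z)\, dz
  + \int_{\partial R(\theta)} f(\theta, z)\, (v \cdot n)\, dS(z)

% Divergence theorem:
\int_{R} (\nabla \cdot F)(z)\, dz = \int_{\partial R} F(z) \cdot n \, dS(z)
```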

slide-79
SLIDE 79

Surface integral over

  • The integrand of the correction term involves a normal vector of the boundary surface.

slide-80
SLIDE 80

Surface integral over

  • In general, the correction term requires manifold sampling and is hard to estimate.


slide-82
SLIDE 82

Surface integral over

  • Easy to estimate if the boundary is a hyperplane.
  • So assume the branch condition of each if-statement is linear in z.

slide-83
SLIDE 83

Subsampling

  • The correction term is a sum of surface integrals over the boundaries.
  • For computational efficiency, we subsample the surface integrals.


slide-85
SLIDE 85

Experiments

slide-86
SLIDE 86

Implementation

  • Implemented a black-box variational inference engine for a simple probabilistic programming language.
  • Supports sample, observe, if, ...
  • Written in Python, using the autograd package.
slide-87
SLIDE 87

Benchmarks

textmsg

  • Models counts of per-day SNS messages, where the SNS-usage pattern changes on some day.
  • Non-differentiable part: the day of the change in the SNS-usage pattern.
  • Given counts of per-day SNS messages over 2 months, infer the day when the pattern changes.

slide-88
SLIDE 88

Benchmarks

temperature

  • Models the random dynamics of a controller that tries to keep the room temperature stable.
  • Non-differentiable part: the on/off state of the air conditioner, on which the evolution of the room temperature depends.
  • Given noisy observations of the temperature at each step, infer the on/off status of the controller at each step.

slide-89
SLIDE 89

ELBO

{dotted, dashed, solid} lines: {N = 1, N = 8, N = 16}


slide-91
SLIDE 91

Computation time

slide-94
SLIDE 94

High-level message

  • Be careful when exchanging gradient and integration.
  • The exchange may fail unexpectedly.
  • It may hold unexpectedly, but then with a correction term.
slide-95
SLIDE 95

Any questions?