SLIDE 1

Deep Gaussian Processes with Importance-Weighted Variational Inference

Hugh Salimbeni, Vincent Dutordoir, James Hensman, Marc P. Deisenroth

SLIDE 2

Problem setting

SLIDE 3

Problem setting

Bimodal density

SLIDE 4

Problem setting

Changes with input

SLIDE 5

Problem setting

Skewness

SLIDE 6

Problem setting

Skewness

  • Bus arrival times
SLIDE 7

Problem setting

Skewness

  • Bus arrival times
  • Confounding variables
SLIDE 8

A possible approach

[Graphical model: x_n and w_n feed into f_φ, which produces y_n; plate over N]

y_n ∼ N(f_φ([x_n, w_n]), σ²),   w_n ∼ N(0, 1)

SLIDE 9

A possible approach

[Graphical model: x_n and w_n feed into f_φ, which produces y_n; plate over N]

y_n ∼ N(f_φ([x_n, w_n]), σ²),   w_n ∼ N(0, 1)

[Plot: training data and test samples]

SLIDE 10

A possible approach

[Graphical model: x_n and w_n feed into f_φ, which produces y_n; plate over N]

y_n ∼ N(f_φ([x_n, w_n]), σ²),   w_n ∼ N(0, 1)

  • Neural network

[Plot: training data and test samples]

SLIDE 11

A possible approach

[Graphical model: x_n and w_n feed into f_φ, which produces y_n; plate over N]

y_n ∼ N(f_φ([x_n, w_n]), σ²),   w_n ∼ N(0, 1)

  • Neural network
  • Latent variable (per point)

[Plot: training data and test samples]

SLIDE 12

A possible approach

[Graphical model: x_n and w_n feed into f_φ, which produces y_n; plate over N]

y_n ∼ N(f_φ([x_n, w_n]), σ²),   w_n ∼ N(0, 1)

  • Neural network
  • Latent variable (per point)
  • Concatenation with inputs

[Plot: training data and test samples]
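The generative process on these slides is easy to simulate. Below is a minimal NumPy sketch, assuming a small tanh MLP for f_φ (the slides do not specify the architecture; all weights and sizes here are hypothetical stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the neural network f_phi; its input [x_n, w_n] is 2-D.
W1, b1 = rng.normal(size=(2, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)
sigma = 0.1  # observation noise std, chosen arbitrarily for the sketch

def f_phi(z):
    """Small MLP mapping the concatenated input [x_n, w_n] to a mean for y_n."""
    return np.tanh(z @ W1 + b1) @ W2 + b2

def sample_y(x, n_samples=100):
    """Predictive samples at a scalar input x: draw w_n ~ N(0, 1), concatenate
    with x_n, push through f_phi, add noise: y_n ~ N(f_phi([x_n, w_n]), sigma^2)."""
    xs = np.full((n_samples, 1), float(x))
    w = rng.normal(size=(n_samples, 1))        # latent variable, one per sample
    z = np.concatenate([xs, w], axis=1)        # concatenation [x_n, w_n]
    return f_phi(z) + sigma * rng.normal(size=(n_samples, 1))

samples = sample_y(0.5)
print(samples.shape)  # (100, 1)
```

Because w_n varies per draw, the sampled y at a fixed x can be multimodal or skewed, which is exactly the behaviour motivated on the earlier "Problem setting" slides.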

SLIDE 13

A possible approach

[Graphical model: x_n and w_n feed into f_φ, which produces y_n; plate over N]

y_n ∼ N(f_φ([x_n, w_n]), σ²),   w_n ∼ N(0, 1)


SLIDE 17

A possible approach

  • Unreliable extrapolation

[Graphical model: x_n and w_n feed into f_φ, which produces y_n; plate over N]

y_n ∼ N(f_φ([x_n, w_n]), σ²),   w_n ∼ N(0, 1)

SLIDE 18

A possible approach

  • Unreliable extrapolation
  • Overfitting

[Graphical model: x_n and w_n feed into f_φ, which produces y_n; plate over N]

y_n ∼ N(f_φ([x_n, w_n]), σ²),   w_n ∼ N(0, 1)

SLIDE 19

A possible approach

  • Deterministic function
  • Unreliable extrapolation
  • Overfitting

[Graphical model: x_n and w_n feed into f_φ, which produces y_n; plate over N]

y_n ∼ N(f_φ([x_n, w_n]), σ²),   w_n ∼ N(0, 1)

SLIDE 20

A possible approach

  • Deterministic function
  • Unreliable extrapolation
  • Overfitting
  • Small number of examples per input x_n

[Graphical model: x_n and w_n feed into f_φ, which produces y_n; plate over N]

y_n ∼ N(f_φ([x_n, w_n]), σ²),   w_n ∼ N(0, 1)

SLIDE 21

Another possible approach

[Graphical model: x_n and w_n feed into f, which produces y_n; plate over N]

y_n ∼ N(f([x_n, w_n]), σ²),   w_n ∼ N(0, 1),   f ∼ GP(µ, k)

SLIDE 22

Another possible approach

[Graphical model: x_n and w_n feed into f, which produces y_n; plate over N]

y_n ∼ N(f([x_n, w_n]), σ²),   w_n ∼ N(0, 1),   f ∼ GP(µ, k)

  • Non-parametric prior

SLIDE 23

Another possible approach

[Graphical model: x_n and w_n feed into f, which produces y_n; plate over N]

y_n ∼ N(f([x_n, w_n]), σ²),   w_n ∼ N(0, 1),   f ∼ GP(µ, k)

  • Non-parametric prior
  • Better extrapolation

SLIDE 24

Another possible approach

[Graphical model: x_n and w_n feed into f, which produces y_n; plate over N]

y_n ∼ N(f([x_n, w_n]), σ²),   w_n ∼ N(0, 1),   f ∼ GP(µ, k)

  • Non-parametric prior
  • Better extrapolation
  • Underfitting
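For intuition, one joint prior draw from this model can be simulated. The sketch below assumes a zero mean function and a squared-exponential kernel on the concatenated input [x_n, w_n]; the slide only states f ∼ GP(µ, k), so both choices are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def k_rbf(A, B, lengthscale=0.5):
    """Illustrative squared-exponential kernel on the concatenated inputs [x, w]."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def sample_prior_y(x, sigma=0.1):
    """One joint prior draw: w_n ~ N(0, 1), f ~ GP(0, k) at [x_n, w_n], plus noise."""
    w = rng.normal(size=x.shape)                 # latent variable per data point
    Z = np.concatenate([x, w], axis=1)           # [x_n, w_n]
    K = k_rbf(Z, Z)
    evals, evecs = np.linalg.eigh(K)             # eigendecomposition avoids Cholesky jitter issues
    f = evecs @ (np.sqrt(np.clip(evals, 0.0, None)) * rng.normal(size=len(Z)))
    return f[:, None] + sigma * rng.normal(size=(len(Z), 1))

x = np.linspace(-1, 1, 20)[:, None]
y = sample_prior_y(x)
print(y.shape)  # (20, 1)
```

The latent dimension w gives the function a second input axis, so at a fixed x the marginal over y is a mixture over w rather than a single Gaussian.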

SLIDE 25

Our model

[Graphical model: x_n and w_n feed into g, then f, producing y_n; plate over N; f and g are infinite-dimensional]

y_n ∼ N(f(g([x_n, w_n])), σ²),   w_n ∼ N(0, 1),   f ∼ GP(µ1, k1),   g ∼ GP(µ2, k2)

SLIDE 28

Our model

[Graphical model: x_n and w_n feed into g, then f, producing y_n; plate over N; f and g are infinite-dimensional]

y_n ∼ N(f(g([x_n, w_n])), σ²),   w_n ∼ N(0, 1),   f ∼ GP(µ1, k1),   g ∼ GP(µ2, k2)

  • Extrapolating gracefully

SLIDE 29

Our model

[Graphical model: x_n and w_n feed into g, then f, producing y_n; plate over N; f and g are infinite-dimensional]

y_n ∼ N(f(g([x_n, w_n])), σ²),   w_n ∼ N(0, 1),   f ∼ GP(µ1, k1),   g ∼ GP(µ2, k2)

  • Extrapolating gracefully
  • Better data fit
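A prior draw from the two-layer model composes two GP samples. The sketch below assumes zero mean functions and squared-exponential kernels for k1 and k2 (the slides leave them unspecified):

```python
import numpy as np

rng = np.random.default_rng(2)

def gp_sample(Z, lengthscale=0.5):
    """One zero-mean GP prior draw at the rows of Z (squared-exponential kernel),
    sampled via an eigendecomposition of the Gram matrix."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * d2 / lengthscale**2)
    evals, evecs = np.linalg.eigh(K)
    return (evecs @ (np.sqrt(np.clip(evals, 0.0, None)) * rng.normal(size=len(Z))))[:, None]

def sample_deep_prior(x, sigma=0.1):
    """w_n ~ N(0, 1); g ~ GP at [x_n, w_n]; f ~ GP at g's outputs;
    y_n ~ N(f(g([x_n, w_n])), sigma^2)."""
    w = rng.normal(size=x.shape)
    Z = np.concatenate([x, w], axis=1)      # concatenation [x_n, w_n]
    g_vals = gp_sample(Z)                   # inner layer g
    f_vals = gp_sample(g_vals)              # outer layer f, evaluated at g's outputs
    return f_vals + sigma * rng.normal(size=f_vals.shape)

x = np.linspace(-1, 1, 15)[:, None]
y = sample_deep_prior(x)
print(y.shape)  # (15, 1)
```

The composition keeps the non-parametric GP prior (graceful extrapolation) while the extra layer and the latent input let the marginal density at each x warp away from a Gaussian (better data fit).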

SLIDE 30

Contributions

SLIDE 31

Contributions

  • New architecture: latent variables by concatenation, not addition

SLIDE 32

Contributions

  • New architecture: latent variables by concatenation, not addition
  • Importance-weighted variational inference, exploiting analytic results

SLIDE 33

Contributions

  • New architecture: latent variables by concatenation, not addition
  • Importance-weighted variational inference, exploiting analytic results
  • An extensive empirical comparison with all 41 UCI regression datasets
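The architectural distinction in the first contribution can be made concrete. A schematic NumPy sketch (dimensions are hypothetical; "addition" here refers to earlier latent-variable DGP formulations in which the latent variable enters additively, so its dimension must match the layer's):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=(5, 3))   # five inputs of dimension 3 (hypothetical sizes)
w = rng.normal(size=(5, 1))   # one latent variable per data point

# Concatenation (this work): the layer sees [x_n, w_n]; the input dimension grows by 1
# and the latent can be low-dimensional regardless of the layer width.
z_concat = np.concatenate([x, w], axis=1)
print(z_concat.shape)  # (5, 4)

# Addition (earlier formulations): the latent is added elementwise, so it must have
# the same dimension as the quantity it perturbs.
z_add = x + rng.normal(size=(5, 3))
print(z_add.shape)  # (5, 3)
```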

SLIDE 34

A few details

[Graphical model: x_n and w_n feed into g, then f, producing y_n; plate over N; f and g are infinite-dimensional]

y_n ∼ N(f(g([x_n, w_n])), σ²),   w_n ∼ N(0, 1),   f ∼ GP(µ1, k1),   g ∼ GP(µ2, k2)

SLIDE 35

A few details

[Graphical model: x_n and w_n feed into g, then f, producing y_n; plate over N; f and g are infinite-dimensional]

y_n ∼ N(f(g([x_n, w_n])), σ²),   w_n ∼ N(0, 1),   f ∼ GP(µ1, k1),   g ∼ GP(µ2, k2)

  • Importance weighting (Gaussian proposal)

SLIDE 36

A few details

[Graphical model: x_n and w_n feed into g, then f, producing y_n; plate over N; f and g are infinite-dimensional]

y_n ∼ N(f(g([x_n, w_n])), σ²),   w_n ∼ N(0, 1),   f ∼ GP(µ1, k1),   g ∼ GP(µ2, k2)

  • Importance weighting (Gaussian proposal)
  • Variational inference (sparse GP posterior)

SLIDE 37

A few details

[Graphical model: x_n and w_n feed into g, then f, producing y_n; plate over N; f and g are infinite-dimensional]

y_n ∼ N(f(g([x_n, w_n])), σ²),   w_n ∼ N(0, 1),   f ∼ GP(µ1, k1),   g ∼ GP(µ2, k2)

  • Importance weighting (Gaussian proposal)
  • Variational inference (sparse GP posterior)
  • Our approach exploits analytic results, leading to a tighter bound
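The effect of importance weighting can be illustrated on a toy model whose marginal likelihood is known in closed form. This is a sketch of the generic K-sample bound with a Gaussian proposal, not the paper's estimator (the toy model, proposal, and sample sizes are all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)

def log_normal(z, mean, std):
    return -0.5 * ((z - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi)

def iw_bound(y, K, sigma=0.5, n_rep=4000):
    """Monte Carlo estimate of the K-sample importance-weighted bound on log p(y)
    for the toy model w ~ N(0, 1), y | w ~ N(w, sigma^2), proposal q(w) = N(0, 1)."""
    w = rng.normal(size=(n_rep, K))                 # K proposal samples per replicate
    log_w = log_normal(w, 0.0, 1.0) + log_normal(y, w, sigma) - log_normal(w, 0.0, 1.0)
    m = log_w.max(axis=1, keepdims=True)            # stable log-mean-exp over the K weights
    return float((m[:, 0] + np.log(np.exp(log_w - m).mean(axis=1))).mean())

y, sigma = 1.0, 0.5
true_ll = log_normal(y, 0.0, np.sqrt(1 + sigma**2))  # marginally, y ~ N(0, 1 + sigma^2)
elbo = iw_bound(y, K=1, sigma=sigma)                 # K = 1 recovers the standard ELBO
iw20 = iw_bound(y, K=20, sigma=sigma)
print(elbo <= iw20 <= true_ll)  # True: the bound tightens as K grows
```

K = 1 gives ordinary variational inference; larger K gives a provably tighter lower bound on the marginal likelihood, which is the property the paper's inference scheme exploits.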

SLIDE 38

Results

SLIDE 39

Results

  • Latent variables in the DGP are highly beneficial

SLIDE 40

Results

  • Latent variables in the DGP are highly beneficial
  • Sometimes depth is enough. Sometimes latent variables are enough. Some datasets need both.

SLIDE 41

Results

  • Latent variables in the DGP are highly beneficial
  • Sometimes depth is enough. Sometimes latent variables are enough. Some datasets need both.
  • Importance-weighted VI outperforms VI
SLIDE 43

Thanks for listening

  • New architecture
  • Importance-weighted variational inference
  • 41 UCI regression datasets

Poster #218