Scalable Training of Inference Networks for Gaussian-Process Models


SLIDE 1

Scalable Training of Inference Networks for Gaussian-Process Models

Jiaxin Shi

Tsinghua University

Joint work with Mohammad Emtiyaz Khan and Jun Zhu

SLIDE 2

Gaussian Process

  • mean function, covariance function / kernel
  • Sparse variational GP [Titsias, 09; Hensman et al., 13]: approximate the posterior Gaussian field through M inducing points
  • Posterior inference at reduced O(NM^2) complexity; closed-form solutions require conjugate likelihoods
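
For reference, the setup behind this slide's equations (a sketch in my own LaTeX notation, not copied from the deck):

% GP prior with mean function m and covariance function / kernel k
f \sim \mathcal{GP}\big(m(x),\, k(x, x')\big)
% sparse variational posterior with M inducing points Z, u = f(Z)
q(f) = \int p(f \mid u)\, q(u)\, du, \qquad q(u) = \mathcal{N}(\mu, S)
% cost: O(N M^2) per step instead of O(N^3) for exact inference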

SLIDE 3

Inference Networks for GP Models

  • Remove the sparse (inducing-point) assumption
[Figure: an inference network maps the data (inputs, observations) to a Gaussian field used for prediction]


SLIDE 5

Examples of Inference Networks

weight space vs. function space

  • Bayesian neural networks:

○ intractable output density
[Figure: a random-feature network with sin/cos activations, frequencies (s), and weights (w)]

[Sun et al., 19]

  • Inference network architectures can be derived from the weight-space posterior:

○ random feature expansions
○ deep neural nets

[Cutajar et al., 18]
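
To make the sin/cos diagram concrete, the standard random Fourier feature view it depicts (a sketch; the symbols J, s_j, w_j are my naming):

% a stationary GP with kernel variance sigma^2 is approximated by a
% Bayesian linear model over J random features, with frequencies
% s_j ~ spectral density of k and weights w_j ~ N(0, 1)
f(x) \approx \sqrt{\sigma^2 / J} \sum_{j=1}^{J}
    \big[ w_j^{\cos} \cos(s_j^\top x) + w_j^{\sin} \sin(s_j^\top x) \big]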

SLIDE 6

Minibatch Training is Difficult

Functional Variational Bayesian Neural Networks (Sun et al., 19)

  • Full-batch fELBO: matches the variational and the true posterior process at arbitrary finite sets of measurement points
  • Practical fELBO: estimated on minibatches
○ this objective performs an improper minibatch approximation of the KL divergence term
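
For context, the fELBO in rough form (my notation, assuming X^M denotes the measurement points; see Sun et al., 19 for the exact statement):

% full-batch fELBO: likelihood over all N points, functional KL at X^M
\mathcal{L}(q) = \mathbb{E}_q\Big[\sum_{i=1}^{N} \log p(y_i \mid f(x_i))\Big]
    - \mathrm{KL}\big[\, q(f^{X^M}) \,\|\, p(f^{X^M}) \,\big]
% subsampling the likelihood sum is unbiased, but the functional KL term
% has no unbiased minibatch estimator, hence the "improper minibatch" issue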

SLIDE 7

Scalable Training of Inference Networks for GP Models

Stochastic, functional mirror descent

  • work with the functional density directly

○ natural gradient in the density space
○ minibatch approximation with a stochastic functional gradient

  • closed-form solution as an adaptive Bayesian filter

[Dai et al., 16; Cheng & Boots, 16]
[Figure: upon seeing the next data point, the adapted prior is updated by Bayes' rule]

  • sequentially applying Bayes' rule is the most "natural" gradient step

○ in conjugate models: equivalent to natural gradient for exponential families

[Raskutti & Mukherjee, 13; Khan & Lin, 17]
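
A sketch of the mirror-descent step these bullets describe (my notation: step size beta_t, minibatch B of size |B|, N data points in total):

% one step of stochastic functional mirror descent: a Bayes update of the
% "adapted prior" with the minibatch likelihood tempered by beta_t N / |B|
q_{t+1}(f) \propto \underbrace{q_t(f)^{1 - \beta_t}\, p(f)^{\beta_t}}_{\text{adapted prior}}
    \prod_{i \in B} p\big(y_i \mid f(x_i)\big)^{\beta_t N / |B|}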

SLIDE 8

Scalable Training of Inference Networks for GP Models

Minibatch training of inference networks

[Figure: teacher-student setup: the mirror-descent update provides the teacher; the inference network is the student trained to match it]

  • an idea from filtering: bootstrap

○ similar idea: temporal difference (TD) learning with function approximation
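
In symbols, the bootstrapped objective might be written as follows (a sketch; the divergence D and the teacher \hat q_{t+1} from the mirror-descent step above are my naming):

% the student network q_gamma is fit to the teacher's marginals at random
% measurement points X^M, bootstrapping from the current estimate
\gamma_{t+1} = \arg\min_{\gamma}\; \mathbb{E}_{X^M}\,
    D\big[\, \hat q_{t+1}(f^{X^M}),\; q_{\gamma}(f^{X^M}) \,\big]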

SLIDE 9

Scalable Training of Inference Networks for GP Models

Minibatch training of inference networks

  • (Gaussian likelihood case) closed-form marginals of the teacher at the measurement locations

○ equivalent to GP regression

  • (Non-conjugate case) optimize an upper bound of the KL divergence to the teacher
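
To illustrate the Gaussian-likelihood case, a self-contained toy sketch of the closed-form teacher update at a fixed grid of measurement points (all names, and the simplification that the batch lies on the grid, are my assumptions, not the authors' code):

import numpy as np

def rbf(x1, x2, ls=0.5, var=1.0):
    """RBF kernel matrix between two sets of 1-D points."""
    d = x1[:, None] - x2[None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2)

rng = np.random.default_rng(0)
X = np.linspace(-1, 1, 30)              # measurement points
N, B, sigma2, beta = 100, 10, 0.1, 0.3  # data size, batch size, noise, step size

# Prior p and current approximation q_t at X, in natural parameters
# (precision Lam, precision-times-mean eta); q_t starts at the prior.
K = rbf(X, X) + 1e-6 * np.eye(len(X))
Lam_p = np.linalg.inv(K)
eta_p = np.zeros(len(X))
Lam_t, eta_t = Lam_p.copy(), eta_p.copy()

# A minibatch observed at a subset of the measurement points.
idx = rng.choice(len(X), size=B, replace=False)
y = np.sin(3 * X[idx]) + np.sqrt(sigma2) * rng.standard_normal(B)

# Teacher: adapted prior q_t^{1-beta} p^{beta} (a convex combination in
# natural parameters), then a Bayes step with the tempered likelihood.
Lam = (1 - beta) * Lam_t + beta * Lam_p
eta = (1 - beta) * eta_t + beta * eta_p
scale = beta * N / B
Lam[idx, idx] += scale / sigma2
eta[idx] += scale * y / sigma2

teacher_cov = np.linalg.inv(Lam)
teacher_mean = teacher_cov @ eta  # marginals the student network is trained to match
print(teacher_mean[idx])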
SLIDE 10

Scalable Training of Inference Networks for GP Models

Measurement points vs. inducing points
[Figure: GPNet with M = 2, 5, 20 measurement points, compared with SVGP]

  • inducing points determine the expressiveness of the variational approximation
  • measurement points determine the variance of training
SLIDE 11

Scalable Training of Inference Networks for GP Models

Effect of proper minibatch training
[Figure: FBNN (M=20) vs. GPNet (M=20) on Airline Delay (700K)]

  • Fixes underfitting (N=100, batch size 20)
  • Better performance with more measurement points
SLIDE 12

Scalable Training of Inference Networks for GP Models

Regression & Classification

  • Regression benchmarks
  • GP classification with a prior derived from infinite-width Bayesian ConvNets

SLIDE 13

Poster #227

Code: https://github.com/thjashin/gp-infer-net