FUNCTION SPACE DISTRIBUTIONS OVER KERNELS (PowerPoint presentation transcript)



SLIDE 1

GREG BENTON, WESLEY MADDOX, JAYSON SALKEY, JULIO ALBINATI, ANDREW GORDON WILSON

FUNCTION SPACE DISTRIBUTIONS OVER KERNELS

SLIDE 2

HIGH LEVEL IDEA

FUNCTIONAL KERNEL LEARNING

  • Gaussian Process (GP): a stochastic process for which any finite collection of points is jointly normal:

y(x) ∼ GP(μ(x), k(x, x′))

  • k(x, x′) is a kernel function describing the covariance
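The defining property above translates directly into code: evaluating a GP at any finite set of inputs gives a multivariate normal, so a sample path at those inputs is a draw from that normal. A minimal sketch (the RBF kernel and the input grid are illustrative choices, not from the slides):

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    # k(x, x') = exp(-(x - x')^2 / (2 l^2)), evaluated pairwise
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

x = np.linspace(0.0, 5.0, 50)          # any finite collection of inputs
K = rbf_kernel(x, x)                   # covariance of that collection
rng = np.random.default_rng(0)
# Jointly normal: one sample of y(x) at the chosen inputs (mean μ = 0),
# with a small jitter on the diagonal for numerical stability
y = rng.multivariate_normal(np.zeros(len(x)), K + 1e-8 * np.eye(len(x)))
```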

SLIDE 3

HIGH LEVEL IDEA


y(x) ∼ GP(μ(x), k(x, x′))

SLIDE 4


OUTLINE

  • Introduction
  • Mathematical Foundation
  • Model Specification
  • Inference Procedure
SLIDE 5


OUTLINE

  • Introduction
  • Experimental Results
  • Recovery of known kernels
  • Interpolation and extrapolation of real data
SLIDE 6


OUTLINE

  • Introduction
  • Experimental Results
  • Extension to multi-task time-series
  • Precipitation data
SLIDE 7

BOCHNER’S THEOREM

  • If k(x, x′) = k(τ), where τ = x − x′ (i.e. the kernel is stationary), then we can represent k(τ) via its spectral density S(ω):

k(τ) = ∫ℝ e^{2πiωτ} S(ω) dω

  • Learning the spectral representation of k(τ) is sufficient to learn the entire kernel

SLIDE 8

BOCHNER’S THEOREM

  • If k(x, x′) = k(τ), then we can represent k(τ) via its spectral density S(ω):

k(τ) = ∫ℝ e^{2πiωτ} S(ω) dω

  • Learning the spectral representation of k(τ) is sufficient to learn the entire kernel
  • Assuming S(ω) is symmetric and the data are finitely sampled (with spacing Δ), the reconstruction simplifies to:

k(τ) = ∫[0, π/Δ) cos(2πτω) S(ω) dω
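The truncated cosine reconstruction above is a one-dimensional integral, so given S(ω) on a frequency grid the kernel can be recovered numerically. A sketch of one possible discretization (the sanity check uses the known one-sided spectral density of the RBF kernel; grid sizes are illustrative):

```python
import numpy as np

def kernel_from_density(S, omega, tau):
    # k(τ) ≈ Σ_i cos(2π τ ω_i) S(ω_i) Δω over a uniform one-sided grid
    domega = omega[1] - omega[0]
    return np.sum(np.cos(2.0 * np.pi * tau[:, None] * omega[None, :]) * S, axis=1) * domega

# Sanity check: the one-sided spectral density of the RBF kernel with l = 1
# is S(ω) = 2√(2π) exp(−2π²ω²), which should reconstruct k(τ) = exp(−τ²/2).
omega = np.linspace(0.0, 5.0, 2000)
S = 2.0 * np.sqrt(2.0 * np.pi) * np.exp(-2.0 * np.pi**2 * omega**2)
tau = np.array([0.0, 0.5, 1.0])
k = kernel_from_density(S, omega, tau)     # ≈ exp(−τ²/2) up to quadrature error
```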

SLIDE 9

FUNCTIONAL KERNEL LEARNING

Graphical Model: plates over frequencies ω_i (latent values g_i, spectral densities s_i) and data points x_n (function values f_n, observations y_n), with hyperparameters θ, γ

p(ϕ) = p(θ, γ)  (Hyper-prior)

g(ω)|θ ∼ GP(μ(ω; θ), kg(ω, ω′; θ))  (Latent GP)

S(ω) = exp{g(ω)}  (Spectral Density)

f(x)|S(ω), γ ∼ GP(γ0, k(τ; S(ω)))  (Data GP)

SLIDE 10

FUNCTIONAL KERNEL LEARNING

p(ϕ) = p(θ, γ)  (Hyper-prior)

g(ω)|θ ∼ GP(μ(ω; θ), kg(ω, ω′; θ))  (Latent GP)

S(ω) = exp{g(ω)}  (Spectral Density)

f(x)|S(ω), γ ∼ GP(γ0, k(τ; S(ω)) + γ1δτ=0)  (Data GP, with noise term γ1)
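Reading the four lines top-down gives an ancestral-sampling recipe: draw g from the latent GP, exponentiate to get S, build the data kernel from S via the truncated cosine transform, then draw f. A self-contained sketch (the frequency grid, the stand-in RBF covariance for the latent GP, and all scale choices are my illustrative assumptions, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(1)

# 1. Latent GP over a frequency grid (stand-in RBF covariance for kg)
omega = np.linspace(0.0, 2.0, 100)
d = omega[:, None] - omega[None, :]
K_g = np.exp(-0.5 * (d / 0.2) ** 2)
mu_g = -omega**2                     # log of an (unnormalized) RBF density
g = rng.multivariate_normal(mu_g, K_g + 1e-8 * np.eye(len(omega)))

# 2. Spectral density: exponentiate so that S(ω) > 0
S = np.exp(g)

# 3. Data kernel k(τ) ≈ Σ_i cos(2π τ ω_i) S(ω_i) Δω -- PSD by construction,
#    as a nonnegative combination of cosine (PSD) kernels
x = np.linspace(0.0, 5.0, 60)
tau = x[:, None] - x[None, :]
domega = omega[1] - omega[0]
K_f = np.sum(np.cos(2.0 * np.pi * tau[..., None] * omega) * S, axis=-1) * domega

# 4. Data GP draw (γ0 = 0; small jitter standing in for the γ1 noise term)
f = rng.multivariate_normal(np.zeros(len(x)), K_f + 1e-6 * np.eye(len(x)))
```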

SLIDE 11

LATENT MODEL

  • Mean of the latent GP is the log of the RBF spectral density:

μ(ω; θ) = θ0 − ω² / (2θ̃1²)

  • Covariance is a Matérn kernel with ν = 1.5:

kg(ω, ω′; θ) = (2^{1−ν} / Γ(ν)) (√(2ν) |ω − ω′| / θ̃2)^ν K_ν(√(2ν) |ω − ω′| / θ̃2) + θ̃3 δ_{ω=ω′}

where θ̃i = softplus(θi) keeps the transformed parameters positive
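For ν = 1.5 the Matérn covariance above has a simple closed form, which makes the latent model easy to write down directly. A sketch (function names and parameter values are mine; softplus is used as the positivity transform):

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))       # θ̃ = softplus(θ) > 0

def latent_mean(omega, theta0, theta1):
    # μ(ω; θ) = θ0 − ω² / (2 θ̃1²): the log of an RBF spectral density
    t1 = softplus(theta1)
    return theta0 - omega**2 / (2.0 * t1**2)

def matern32(omega1, omega2, theta2):
    # Matérn with ν = 1.5 in closed form: k(r) = (1 + √3 r/θ̃2) exp(−√3 r/θ̃2)
    t2 = softplus(theta2)
    r = np.abs(omega1[:, None] - omega2[None, :])
    a = np.sqrt(3.0) * r / t2
    return (1.0 + a) * np.exp(-a)
```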

SLIDE 12

INFERENCE

  • Need to update the hyperparameters ϕ and the latent GP g(ω)
  • Initialize g(ω) to the log-periodogram of the data
  • Alternate:
  • Fix g(ω) and use Adam to update ϕ
  • Fix ϕ and use elliptical slice sampling to draw samples of g(ω)

SLIDE 13


OUTLINE

  • Introduction
  • Experimental Results
  • Recovery of known kernels
  • Interpolation and extrapolation of real data
SLIDE 14

DATA FROM A SPECTRAL MIXTURE KERNEL

  • Generative kernel has a mixture of Gaussians as its spectral density
SLIDE 15

DATA FROM A SPECTRAL MIXTURE KERNEL

  • Generative kernel has a mixture of Gaussians as its spectral density
SLIDE 16

AIRLINE PASSENGER DATA


SLIDE 17


OUTLINE

  • Introduction
  • Experimental Results
  • Extension to multi-task time-series
  • Precipitation data
SLIDE 18

MULTIPLE TIME SERIES

  • Can ‘link’ multiple time series by sharing the latent GP across outputs
  • Let gt(ω) denote the tth realization of the latent GP and ft(x) be the GP over the tth time series

p(ϕ) = p(θ, γ)  (Hyper-prior)

gt(ω)|θ ∼ GP(μ(ω; θ), kg(ω, ω′; θ))  (Latent GP)

St(ω) = exp{gt(ω)}  (Spectral Density)

ft(x)|St(ω), γ ∼ GP(γ0, k(τ; St(ω)) + γ1δτ=0)  (GP for task t)

SLIDE 19

MULTIPLE TIME SERIES

  • Can ‘link’ multiple time series by sharing the latent GP across outputs
  • Test this on data from USHCN: daily precipitation values from the continental US
  • Inductive bias: yearly precipitation for climatologically similar regions should have similar covariance, and hence similar spectral densities

SLIDE 20

PRECIPITATION DATA


Ran on two climatologically similar locations

SLIDE 21

PRECIPITATION DATA

Used 108 locations across the Northeast USA. Each station has n = 300 observations; in total, 300 × 108 = 32,400 data points.

[Map of the 108 station locations (latitude 40–46, longitude −80 to −70); 48 of them are shown here]

SLIDE 22

CONCLUSION

  • FKL: a nonparametric, function-space view of kernel learning
  • Can express any stationary kernel, with uncertainty representation
  • GPyTorch code: https://github.com/wjmaddox/spectralgp

SLIDE 23

CONCLUSION

  • FKL: a nonparametric, function-space view of kernel learning
  • Can express any stationary kernel, with uncertainty representation
  • GPyTorch code: https://github.com/wjmaddox/spectralgp

QUESTIONS?

  • Poster 52
SLIDE 24

REFERENCES

Spectral Mixture Kernels: Wilson, Andrew, and Ryan Adams. "Gaussian process kernels for pattern discovery and extrapolation." International Conference on Machine Learning, 2013.

BNSE: Tobar, Felipe. "Bayesian Nonparametric Spectral Estimation." Advances in Neural Information Processing Systems, 2018.

Elliptical Slice Sampling: Murray, Iain, Ryan Adams, and David MacKay. "Elliptical slice sampling." Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010.

GPyTorch: Gardner, Jacob, et al. "GPyTorch: Blackbox matrix-matrix Gaussian process inference with GPU acceleration." Advances in Neural Information Processing Systems, 2018.

SLIDE 25

SINC DATA


sinc(x) = sin(πx)/(πx)

SLIDE 26

QUASI-PERIODIC DATA

  • Generative kernel is the product of RBF and periodic kernels
SLIDE 27
QUASI-PERIODIC DATA

  • Generative kernel is the product of RBF and periodic kernels

SLIDE 28

ELLIPTICAL SLICE SAMPLING (MURRAY, ADAMS, MACKAY, 2010)

  • Sample zero-mean Gaussians
  • Re-parameterize for non-zero mean
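The two bullets compress the whole algorithm: proposals live on an ellipse through the current state and a fresh draw from the zero-mean Gaussian prior, and a slice bracket over the ellipse angle shrinks until a point is accepted; a non-zero prior mean is handled by subtracting it before the update and adding it back afterwards. A minimal single-update sketch of the Murray, Adams & MacKay (2010) procedure (function and variable names are mine):

```python
import numpy as np

def elliptical_slice(f, prior_sample, log_lik, rng):
    """One elliptical slice sampling update.

    f            current state, assumed drawn from a zero-mean Gaussian prior
    prior_sample a fresh draw ν from the same prior
    log_lik      function returning the log-likelihood of a state
    """
    log_y = log_lik(f) + np.log(rng.uniform())          # slice height
    theta = rng.uniform(0.0, 2.0 * np.pi)               # initial angle
    theta_min, theta_max = theta - 2.0 * np.pi, theta   # angle bracket
    while True:
        # Proposal on the ellipse through f and the prior draw
        f_new = f * np.cos(theta) + prior_sample * np.sin(theta)
        if log_lik(f_new) > log_y:
            return f_new
        # Shrink the bracket toward the current state and retry
        if theta < 0:
            theta_min = theta
        else:
            theta_max = theta
        theta = rng.uniform(theta_min, theta_max)
```

In FKL, `log_lik` would be the GP marginal likelihood of the data as a function of the latent g(ω), and the prior draws come from the latent Matérn GP.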