FUNCTION SPACE DISTRIBUTIONS OVER KERNELS
GREG BENTON, WESLEY MADDOX, JAYSON SALKEY, JULIO ALBINATI, ANDREW GORDON WILSON
FUNCTIONAL KERNEL LEARNING
HIGH LEVEL IDEA
y(x) ∼ GP(μ(x), k(x, x′))
- Gaussian Process (GP): a stochastic process for which any finite collection of points is jointly Gaussian
- k(x, x′): a kernel function describing the covariance
OUTLINE
- Introduction
- Mathematical Foundation
- Model Specification
- Inference Procedure
- Experimental Results
  - Recovery of known kernels
  - Interpolation and extrapolation of real data
  - Extension to multi-task time-series
  - Precipitation data
BOCHNER’S THEOREM
- If k(x, x′) = k(τ), with τ = x − x′, then we can represent k(τ) via its spectral density S(ω):
k(τ) = ∫ℝ e^(2πiωτ) S(ω) dω
- Learning the spectral representation S(ω) is sufficient to learn the entire kernel
- Assuming S(ω) is symmetric and data are finitely sampled with spacing Δ, the reconstruction simplifies to:
k(τ) = ∫[0, π/Δ) cos(2πτω) S(ω) dω
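This reconstruction can be sanity-checked numerically: plugging the RBF kernel's known Gaussian spectral density into a discretized version of the cosine integral should recover the kernel. A minimal sketch, not the paper's code; the grid bound `omega_max` and resolution are illustrative choices:

```python
import math

def rbf_spectral_density(omega, ell=1.0):
    # Spectral density of the RBF kernel k(tau) = exp(-tau^2 / (2 ell^2))
    return ell * math.sqrt(2 * math.pi) * math.exp(-2 * math.pi**2 * ell**2 * omega**2)

def kernel_from_spectrum(tau, S, omega_max=5.0, n=2000):
    # Midpoint-rule discretization of k(tau) = 2 * int_0^inf cos(2 pi tau w) S(w) dw,
    # valid when S is symmetric about zero
    d = omega_max / n
    return 2.0 * d * sum(math.cos(2 * math.pi * tau * ((i + 0.5) * d)) * S((i + 0.5) * d)
                         for i in range(n))

for tau in (0.0, 0.5, 1.0):
    print(tau, math.exp(-tau**2 / 2), kernel_from_spectrum(tau, rbf_spectral_density))
```

The two printed columns agree to several decimal places, illustrating that a discretized spectral density is enough to evaluate the kernel anywhere.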
GRAPHICAL MODEL
Hyper-prior: p(ϕ) = p(θ, γ)
Latent GP: g(ω)|θ ∼ GP(μ(ω; θ), kg(ω, ω′; θ))
Spectral density: S(ω) = exp{g(ω)}
Data GP: f(x)|S(ω), γ ∼ GP(γ0, k(τ; S(ω)) + γ1δτ=0)
LATENT MODEL
- Mean of the latent GP is the log of the RBF spectral density:
μ(ω; θ) = θ0 − ω² / (2 θ̃1²)
- Covariance is Matérn with ν = 1.5:
kg(ω, ω′; θ) = (2^(1−ν) / Γ(ν)) (√(2ν) |ω − ω′| / θ̃2)^ν Kν(√(2ν) |ω − ω′| / θ̃2) + θ̃3 δω=ω′
θ̃i = softplus(θi)
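For ν = 3/2 the Matérn covariance has a well-known closed form, k(r) = (1 + √3·r/ℓ)·exp(−√3·r/ℓ). A minimal sketch of that closed form together with a softplus positivity transform for raw hyperparameters (the deck's transform notation is garbled, but for scalar hyperparameters a softplus is presumably intended; `lengthscale` stands in for θ̃2, and the white-noise term θ̃3 is omitted):

```python
import math

def matern32(r, lengthscale=1.0):
    # Matern covariance with nu = 1.5 in closed form:
    # k(r) = (1 + sqrt(3) r / l) * exp(-sqrt(3) r / l)
    s = math.sqrt(3.0) * abs(r) / lengthscale
    return (1.0 + s) * math.exp(-s)

def softplus(x):
    # smooth positivity transform: log(1 + exp(x))
    return math.log1p(math.exp(x))

print(matern32(0.0))   # -> 1.0 (unit variance at zero lag)
```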
INFERENCE
- Need to update the hyperparameters ϕ and the latent GP g(ω)
- Initialize g(ω) to the log-periodogram of the data
- Alternate:
  - Fix g(ω) and use Adam to update ϕ
  - Fix ϕ and use elliptical slice sampling to draw samples of g(ω)
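The initialization step can be made concrete: the log-periodogram is just the log squared magnitude of the data's discrete Fourier transform. A small stdlib sketch (an O(N²) DFT for clarity; real code would use an FFT):

```python
import cmath, math

def log_periodogram(y):
    # log(|DFT(y)_k|^2 / N) over the non-negative frequencies --
    # this is the quantity used to initialize the latent function g(omega)
    n = len(y)
    out = []
    for k in range(n // 2 + 1):
        s = sum(y[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        out.append(math.log(abs(s) ** 2 / n + 1e-12))  # small jitter avoids log(0)
    return out

# A pure tone at 3 cycles per 32 samples peaks at frequency index 3
y = [math.cos(2 * math.pi * 3 * t / 32) for t in range(32)]
lp = log_periodogram(y)
print(lp.index(max(lp)))   # -> 3
```

Starting the sampler from this crude spectral estimate gives the alternating updates a sensible place to begin.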
DATA FROM A SPECTRAL MIXTURE KERNEL
- Generative kernel has a mixture of Gaussians as its spectral density
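The spectral mixture kernel of Wilson & Adams (2013) has the closed form k(τ) = Σq wq · exp(−2π²τ²vq) · cos(2πτμq) for mixture weights wq, spectral means μq, and variances vq (one-dimensional case). A short sketch with illustrative parameter values:

```python
import math

def spectral_mixture(tau, weights, means, variances):
    # k(tau) = sum_q w_q * exp(-2 pi^2 tau^2 v_q) * cos(2 pi tau mu_q)
    return sum(w * math.exp(-2.0 * math.pi**2 * tau**2 * v) * math.cos(2.0 * math.pi * tau * m)
               for w, m, v in zip(weights, means, variances))

# Two components; at tau = 0 the kernel equals the sum of the weights
k0 = spectral_mixture(0.0, [0.6, 0.4], [1.0, 3.0], [0.1, 0.1])
print(k0)   # -> 1.0
```

Sampling a GP with this kernel gives quasi-periodic data whose spectral density FKL should recover as a mixture of bumps.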
AIRLINE PASSENGER DATA
MULTIPLE TIME SERIES
- Can ‘link’ multiple time series by sharing the latent GP across outputs
- Let gt(ω) denote the tth realization of the latent GP and ft(x) be the GP over the tth time series
Hyper-prior: p(ϕ) = p(θ, γ)
Latent GP: g(ω)|θ ∼ GP(μ(ω; θ), kg(ω, ω′; θ))
Spectral density: St(ω) = exp{gt(ω)}
GP for task t: ft(x)|St(ω), γ ∼ GP(γ0, k(τ; St(ω)) + γ1δτ=0)
PRECIPITATION DATA
- Test this on data from USHCN: daily precipitation values from the continental US
- Inductive bias: yearly precipitation for climatologically similar regions should have similar covariance and similar spectral densities
- Ran on two climatologically similar locations
- Used 108 locations across the Northeast USA; each station has n = 300 observations, for 300 × 108 = 32,400 data points in total
[Figure: latitude/longitude map of the locations used, 48 shown]
CONCLUSION
- FKL: nonparametric, function-space view of kernel learning
- Can express any stationary kernel, with uncertainty representation
- GPyTorch code: https://github.com/wjmaddox/spectralgp
QUESTIONS?
- Poster 52
REFERENCES
- Spectral Mixture Kernels: Wilson, Andrew, and Ryan Adams. "Gaussian process kernels for pattern discovery and extrapolation." International Conference on Machine Learning, 2013.
- BNSE: Tobar, Felipe. "Bayesian Nonparametric Spectral Estimation." Advances in Neural Information Processing Systems, 2018.
- Elliptical Slice Sampling: Murray, Iain, Ryan Adams, and David MacKay. "Elliptical slice sampling." Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010.
- GPyTorch: Gardner, Jacob, et al. "GPyTorch: Blackbox matrix-matrix Gaussian process inference with GPU acceleration." Advances in Neural Information Processing Systems, 2018.
SINC DATA
sinc(x) = sin(πx)/(πx)
QUASI-PERIODIC DATA
- Generative kernel is the product of RBF and periodic kernels
ELLIPTICAL SLICE SAMPLING (MURRAY, ADAMS, MACKAY, 2010)
- Samples zero-mean Gaussians; re-parameterize for a non-zero mean prior
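One ESS update can be sketched in a few lines. This is a minimal illustration for a diagonal-covariance Gaussian prior, not the paper's implementation (a real one would draw the auxiliary variable from the full prior covariance); the toy target has a known posterior, N(0.8, 0.2), against which the sampler can be checked:

```python
import math, random

def ess_step(f, log_lik, sample_prior):
    # One elliptical slice sampling update (Murray, Adams & MacKay, 2010)
    # for a zero-mean Gaussian prior on f
    nu = sample_prior()                              # auxiliary draw nu ~ N(0, Sigma)
    log_y = log_lik(f) + math.log(random.random())   # slice height under current state
    theta = random.uniform(0.0, 2.0 * math.pi)
    lo, hi = theta - 2.0 * math.pi, theta
    while True:
        # propose a point on the ellipse through f and nu
        fp = [a * math.cos(theta) + b * math.sin(theta) for a, b in zip(f, nu)]
        if log_lik(fp) > log_y:
            return fp
        # shrink the bracket toward theta = 0 and retry; always terminates
        if theta < 0.0:
            lo = theta
        else:
            hi = theta
        theta = random.uniform(lo, hi)

# Toy check: N(0, 1) prior, Gaussian likelihood centered at 1 with sd 0.5;
# the posterior is N(0.8, 0.2)
random.seed(0)
log_lik = lambda x: -0.5 * ((x[0] - 1.0) / 0.5) ** 2
f, samples = [0.0], []
for _ in range(5000):
    f = ess_step(f, log_lik, lambda: [random.gauss(0.0, 1.0)])
    samples.append(f[0])
print(sum(samples) / len(samples))   # close to 0.8
```

Because proposals stay on an ellipse defined by the prior, ESS needs no step-size tuning, which is what makes it convenient for sampling the latent g(ω).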