FUNCTION SPACE DISTRIBUTIONS OVER KERNELS (PowerPoint presentation transcript)



SLIDE 1

GREG BENTON, WESLEY MADDOX, JAYSON SALKEY, JULIO ALBINATI, ANDREW GORDON WILSON

FUNCTION SPACE DISTRIBUTIONS OVER KERNELS

SLIDE 2

HIGH LEVEL IDEA

FUNCTIONAL KERNEL LEARNING

  • Gaussian Process (GP): a stochastic process for which any finite collection of points is jointly normal:

y(x) ∼ GP(μ(x), k(x, x′))

  • k(x, x′) is a kernel function describing the covariance
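The defining property above translates directly into code: evaluating a GP at any finite set of inputs gives a multivariate normal, so a sample path at those inputs is a draw from that normal. A minimal sketch (the RBF kernel and the input grid are illustrative choices, not from the slides):

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    # k(x, x') = exp(-(x - x')^2 / (2 l^2)), evaluated pairwise
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

x = np.linspace(0.0, 5.0, 50)          # any finite collection of inputs
K = rbf_kernel(x, x)                   # covariance of that collection
rng = np.random.default_rng(0)
# Jointly normal: one sample of y(x) at the chosen inputs (mean μ = 0),
# with a small jitter on the diagonal for numerical stability
y = rng.multivariate_normal(np.zeros(len(x)), K + 1e-8 * np.eye(len(x)))
```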

SLIDE 3

HIGH LEVEL IDEA


y(x) ∼ GP(μ(x), k(x, x′))

SLIDE 4


OUTLINE

  • Introduction
  • Mathematical Foundation
  • Model Specification
  • Inference Procedure
SLIDE 5


OUTLINE

  • Introduction
  • Experimental Results
  • Recovery of known kernels
  • Interpolation and extrapolation of real data
SLIDE 6


OUTLINE

  • Introduction
  • Experimental Results
  • Extension to multi-task time-series
  • Precipitation data
SLIDE 7

BOCHNER’S THEOREM

  • If k(x, x′) = k(τ), where τ = x − x′ (i.e. the kernel is stationary), then we can represent k(τ) via its spectral density S(ω):

k(τ) = ∫ℝ e^{2πiωτ} S(ω) dω

  • Learning the spectral representation of k(τ) is sufficient to learn the entire kernel

SLIDE 8

BOCHNER’S THEOREM

  • If k(x, x′) = k(τ), then we can represent k(τ) via its spectral density S(ω):

k(τ) = ∫ℝ e^{2πiωτ} S(ω) dω

  • Learning the spectral representation of k(τ) is sufficient to learn the entire kernel
  • Assuming S(ω) is symmetric and the data are finitely sampled (with spacing Δ), the reconstruction simplifies to:

k(τ) = ∫[0, π/Δ) cos(2πτω) S(ω) dω
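The truncated cosine reconstruction above is a one-dimensional integral, so given S(ω) on a frequency grid the kernel can be recovered numerically. A sketch of one possible discretization (the sanity check uses the known one-sided spectral density of the RBF kernel; grid sizes are illustrative):

```python
import numpy as np

def kernel_from_density(S, omega, tau):
    # k(τ) ≈ Σ_i cos(2π τ ω_i) S(ω_i) Δω over a uniform one-sided grid
    domega = omega[1] - omega[0]
    return np.sum(np.cos(2.0 * np.pi * tau[:, None] * omega[None, :]) * S, axis=1) * domega

# Sanity check: the one-sided spectral density of the RBF kernel with l = 1
# is S(ω) = 2√(2π) exp(−2π²ω²), which should reconstruct k(τ) = exp(−τ²/2).
omega = np.linspace(0.0, 5.0, 2000)
S = 2.0 * np.sqrt(2.0 * np.pi) * np.exp(-2.0 * np.pi**2 * omega**2)
tau = np.array([0.0, 0.5, 1.0])
k = kernel_from_density(S, omega, tau)     # ≈ exp(−τ²/2) up to quadrature error
```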

SLIDE 9

FUNCTIONAL KERNEL LEARNING

Graphical Model: plates over frequencies ω_i (latent values g_i, spectral densities s_i) and data points x_n (function values f_n, observations y_n), with hyperparameters θ, γ

p(ϕ) = p(θ, γ)  (Hyper-prior)

g(ω)|θ ∼ GP(μ(ω; θ), kg(ω, ω′; θ))  (Latent GP)

S(ω) = exp{g(ω)}  (Spectral Density)

f(x)|S(ω), γ ∼ GP(γ0, k(τ; S(ω)))  (Data GP)

SLIDE 10

FUNCTIONAL KERNEL LEARNING

p(ϕ) = p(θ, γ)  (Hyper-prior)

g(ω)|θ ∼ GP(μ(ω; θ), kg(ω, ω′; θ))  (Latent GP)

S(ω) = exp{g(ω)}  (Spectral Density)

f(x)|S(ω), γ ∼ GP(γ0, k(τ; S(ω)) + γ1δτ=0)  (Data GP, with noise term γ1)
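Reading the four lines top-down gives an ancestral-sampling recipe: draw g from the latent GP, exponentiate to get S, build the data kernel from S via the truncated cosine transform, then draw f. A self-contained sketch (the frequency grid, the stand-in RBF covariance for the latent GP, and all scale choices are my illustrative assumptions, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(1)

# 1. Latent GP over a frequency grid (stand-in RBF covariance for kg)
omega = np.linspace(0.0, 2.0, 100)
d = omega[:, None] - omega[None, :]
K_g = np.exp(-0.5 * (d / 0.2) ** 2)
mu_g = -omega**2                     # log of an (unnormalized) RBF density
g = rng.multivariate_normal(mu_g, K_g + 1e-8 * np.eye(len(omega)))

# 2. Spectral density: exponentiate so that S(ω) > 0
S = np.exp(g)

# 3. Data kernel k(τ) ≈ Σ_i cos(2π τ ω_i) S(ω_i) Δω -- PSD by construction,
#    as a nonnegative combination of cosine (PSD) kernels
x = np.linspace(0.0, 5.0, 60)
tau = x[:, None] - x[None, :]
domega = omega[1] - omega[0]
K_f = np.sum(np.cos(2.0 * np.pi * tau[..., None] * omega) * S, axis=-1) * domega

# 4. Data GP draw (γ0 = 0; small jitter standing in for the γ1 noise term)
f = rng.multivariate_normal(np.zeros(len(x)), K_f + 1e-6 * np.eye(len(x)))
```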

SLIDE 11

LATENT MODEL

  • Mean of the latent GP is the log of the RBF spectral density:

μ(ω; θ) = θ0 − ω² / (2θ̃1²)

  • Covariance is a Matérn kernel with ν = 1.5:

kg(ω, ω′; θ) = (2^{1−ν} / Γ(ν)) (√(2ν) |ω − ω′| / θ̃2)^ν K_ν(√(2ν) |ω − ω′| / θ̃2) + θ̃3 δ_{ω=ω′}

where θ̃i = softplus(θi) keeps the transformed parameters positive
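For ν = 1.5 the Matérn covariance above has a simple closed form, which makes the latent model easy to write down directly. A sketch (function names and parameter values are mine; softplus is used as the positivity transform):

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))       # θ̃ = softplus(θ) > 0

def latent_mean(omega, theta0, theta1):
    # μ(ω; θ) = θ0 − ω² / (2 θ̃1²): the log of an RBF spectral density
    t1 = softplus(theta1)
    return theta0 - omega**2 / (2.0 * t1**2)

def matern32(omega1, omega2, theta2):
    # Matérn with ν = 1.5 in closed form: k(r) = (1 + √3 r/θ̃2) exp(−√3 r/θ̃2)
    t2 = softplus(theta2)
    r = np.abs(omega1[:, None] - omega2[None, :])
    a = np.sqrt(3.0) * r / t2
    return (1.0 + a) * np.exp(-a)
```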

SLIDE 12

INFERENCE

  • Need to update the hyperparameters ϕ and the latent GP g(ω)
  • Initialize g(ω) to the log-periodogram of the data
  • Alternate:
  • Fix g(ω) and use Adam to update ϕ
  • Fix ϕ and use elliptical slice sampling to draw samples of g(ω)

SLIDE 13


OUTLINE

  • Introduction
  • Experimental Results
  • Recovery of known kernels
  • Interpolation and extrapolation of real data
SLIDE 14

DATA FROM A SPECTRAL MIXTURE KERNEL

  • Generative kernel has a mixture of Gaussians as its spectral density
SLIDE 15

DATA FROM A SPECTRAL MIXTURE KERNEL

  • Generative kernel has a mixture of Gaussians as its spectral density
SLIDE 16

AIRLINE PASSENGER DATA


SLIDE 17


OUTLINE

  • Introduction
  • Experimental Results
  • Extension to multi-task time-series
  • Precipitation data
SLIDE 18

MULTIPLE TIME SERIES

  • Can ‘link’ multiple time series by sharing the latent GP across outputs
  • Let gt(ω) denote the tth realization of the latent GP and ft(x) be the GP over the tth time series

p(ϕ) = p(θ, γ)  (Hyper-prior)

gt(ω)|θ ∼ GP(μ(ω; θ), kg(ω, ω′; θ))  (Latent GP)

St(ω) = exp{gt(ω)}  (Spectral Density)

ft(x)|St(ω), γ ∼ GP(γ0, k(τ; St(ω)) + γ1δτ=0)  (GP for task t)

SLIDE 19

MULTIPLE TIME SERIES

  • Can ‘link’ multiple time series by sharing the latent GP across outputs
  • Test this on data from USHCN: daily precipitation values from the continental US
  • Inductive bias: yearly precipitation for climatologically similar regions should have similar covariance, and hence similar spectral densities

SLIDE 20

PRECIPITATION DATA


Ran on two climatologically similar locations

SLIDE 21

PRECIPITATION DATA

Used 108 locations across the Northeast USA. Each station has n = 300 observations; in total, 300 × 108 = 32,400 data points.

[Map of the 108 station locations (latitude 40–46, longitude −80 to −70); 48 of them are shown here]

SLIDE 22

CONCLUSION

  • FKL: a nonparametric, function-space view of kernel learning
  • Can express any stationary kernel, with uncertainty representation
  • GPyTorch code: https://github.com/wjmaddox/spectralgp

SLIDE 23

CONCLUSION

  • FKL: a nonparametric, function-space view of kernel learning
  • Can express any stationary kernel, with uncertainty representation
  • GPyTorch code: https://github.com/wjmaddox/spectralgp

QUESTIONS?

  • Poster 52
SLIDE 24

REFERENCES

Spectral Mixture Kernels: Wilson, Andrew, and Ryan Adams. "Gaussian process kernels for pattern discovery and extrapolation." International Conference on Machine Learning, 2013.

BNSE: Tobar, Felipe. "Bayesian Nonparametric Spectral Estimation." Advances in Neural Information Processing Systems, 2018.

Elliptical Slice Sampling: Murray, Iain, Ryan Adams, and David MacKay. "Elliptical slice sampling." Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010.

GPyTorch: Gardner, Jacob, et al. "GPyTorch: Blackbox matrix-matrix Gaussian process inference with GPU acceleration." Advances in Neural Information Processing Systems, 2018.

SLIDE 25

SINC DATA


sinc(x) = sin(πx)/(πx)

SLIDE 26

QUASI-PERIODIC DATA

  • Generative kernel is the product of RBF and periodic kernels
SLIDE 27
QUASI-PERIODIC DATA

  • Generative kernel is the product of RBF and periodic kernels

SLIDE 28

ELLIPTICAL SLICE SAMPLING (MURRAY, ADAMS, MACKAY, 2010)

  • Sample zero-mean Gaussians
  • Re-parameterize for non-zero mean
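The two bullets compress the whole algorithm: proposals live on an ellipse through the current state and a fresh draw from the zero-mean Gaussian prior, and a slice bracket over the ellipse angle shrinks until a point is accepted; a non-zero prior mean is handled by subtracting it before the update and adding it back afterwards. A minimal single-update sketch of the Murray, Adams & MacKay (2010) procedure (function and variable names are mine):

```python
import numpy as np

def elliptical_slice(f, prior_sample, log_lik, rng):
    """One elliptical slice sampling update.

    f            current state, assumed drawn from a zero-mean Gaussian prior
    prior_sample a fresh draw ν from the same prior
    log_lik      function returning the log-likelihood of a state
    """
    log_y = log_lik(f) + np.log(rng.uniform())          # slice height
    theta = rng.uniform(0.0, 2.0 * np.pi)               # initial angle
    theta_min, theta_max = theta - 2.0 * np.pi, theta   # angle bracket
    while True:
        # Proposal on the ellipse through f and the prior draw
        f_new = f * np.cos(theta) + prior_sample * np.sin(theta)
        if log_lik(f_new) > log_y:
            return f_new
        # Shrink the bracket toward the current state and retry
        if theta < 0:
            theta_min = theta
        else:
            theta_max = theta
        theta = rng.uniform(theta_min, theta_max)
```

In FKL, `log_lik` would be the GP marginal likelihood of the data as a function of the latent g(ω), and the prior draws come from the latent Matérn GP.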