SLIDE 1

INTEGRATION OVER HYPERPARAMETERS AND ESTIMATION OF PREDICTIVE PERFORMANCE

Aki Vehtari

Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Finland aki.vehtari@aalto.fi

SLIDE 2

Outline

◮ GP hyperparameter inference

◮ Priors on GP hyperparameters
◮ Benefits of integration vs. point estimate
◮ MCMC, CCD


SLIDE 4

Gaussian processes and hyperparameters

◮ Gaussian processes are priors on function space
◮ GPs are usually constructed with a parametric covariance function
◮ we need to think about priors on those parameters

◮ If we have “big data” and a small number of hyperparameters
◮ priors and integration over the posterior are not so important
◮ even more so when sparse approximations, which limit the complexity of the models, are used

SLIDE 5

1D demo

◮ 1D demo originally by Michael Betancourt


SLIDE 7

1D demo summary

◮ The likelihood for lengthscales beyond the data scale is flat and non-identifiable because the functions all look the same (see the sketch below)
◮ add a prior making large lengthscales less likely
◮ Without repeated measurements there is non-identifiability between signal magnitude and noise magnitude when the lengthscale is short
◮ add a prior making short lengthscales less likely
◮ add a prior on the measurement noise
◮ make repeated measurements
◮ Non-identifiability between lengthscale and magnitude
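
To make the flatness concrete, here is a minimal numpy sketch (the toy data and values are mine, not from the talk): it evaluates the GP log marginal likelihood on a grid of lengthscales and shows it levelling off once the lengthscale exceeds the input range.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 20)                      # inputs on the unit interval
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(20)

def log_marginal(ell, sigma_f=1.0, sigma_n=0.2):
    """Log marginal likelihood of a zero-mean GP with a squared-exponential covariance."""
    K = sigma_f**2 * np.exp(-0.5 * (x[:, None] - x[None, :])**2 / ell**2)
    C = K + sigma_n**2 * np.eye(len(x))
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(x) * np.log(2 * np.pi)

for ell in [0.01, 0.1, 1.0, 10.0, 100.0]:
    print(f"lengthscale {ell:7.2f}  log marginal {log_marginal(ell):9.2f}")
# lengthscales 10 and 100 give nearly identical values: on [0, 1] both
# correspond to effectively constant functions, so the data cannot tell
# them apart and a prior penalizing large lengthscales is needed
```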


SLIDE 10

Non-Gaussian likelihoods

◮ Poisson
◮ variance is equal to the mean, and thus can’t overfit
◮ except if the data are not conditionally Poisson distributed
◮ Binary classification (logit/probit)
◮ unbounded likelihood if the classes are separable
◮ with a short enough lengthscale the data become separable

SLIDE 11

Sparse approximations

◮ Sparse approximations limit the complexity

◮ FITC-type models work only with large lengthscales

SLIDE 12

Higher dimensions

◮ Separate lengthscale for each dimension, aka automatic relevance determination (ARD)
◮ the lengthscale is related to non-linearity, not directly to relevance (see the toy example below)


SLIDE 14

Toy example

[figure: the eight additive component functions f1(x1), …, f8(x8)]

f(x) = f_1(x_1) + · · · + f_8(x_8),   y ∼ N(f, 0.3²),   Var[f_j] = 1 for all j

⇒ All inputs equally relevant

[figure: true relevance vs. optimized ARD value for inputs 1–8]

Optimized ARD-values, ARD(j) = 1/ℓj (averaged over 100 data realizations, n = 200)
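
As a hedged illustration of the ARD construction (function and variable names are mine, not from the slides), here is a squared-exponential covariance with one lengthscale per input dimension, with the slide's relevance measure ARD(j) = 1/ℓ_j:

```python
import numpy as np

def ard_se_cov(X1, X2, ell, sigma_f=1.0):
    """ARD squared-exponential covariance between the rows of X1 and X2;
    ell holds one lengthscale per input dimension."""
    diff = (X1[:, None, :] - X2[None, :, :]) / ell   # scaled pairwise differences
    return sigma_f**2 * np.exp(-0.5 * (diff**2).sum(axis=-1))

ell = np.array([0.5, 2.0, 20.0])           # hypothetical fitted lengthscales
X = np.random.default_rng(0).standard_normal((5, 3))
K = ard_se_cov(X, X, ell)                  # 5 x 5 covariance matrix
print("ARD relevances 1/ell:", 1.0 / ell)  # short lengthscale = more non-linear
```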

SLIDE 15

Bayesian optimization

◮ GPs have been used too much as black boxes
◮ Bonus: use shape-constrained GPs (see, e.g., Siivola et al., 2017)

SLIDE 16

Periodic covariance function

◮ If you know the period, fix it
◮ If you don’t, there can be serious identifiability problems unless informative priors are used


SLIDE 18

Parametric model plus GP

◮ For example, a linear model plus a GP
◮ with a long lengthscale the GP is like a linear model, which causes non-identifiability and problems in interpretation
◮ The same holds for other parametric model + GP combinations
◮ need more informative priors

SLIDE 19

GP plus GP

[figure: relative number of births 1970–1988 decomposed into a slow trend, a fast non-periodic component, a day-of-week effect, a seasonal effect, and a day-of-year effect with special days (New year, Valentine’s day, Leap day, April 1st, Memorial day, Independence day, Labor day, Halloween, Thanksgiving, Christmas)]

SLIDE 20

GP plus GP

◮ Identifiability problems arise as different components explain the same features in the data
◮ priors which “encourage” specialization of the components


SLIDE 22

Summary on priors and benefits of integration

◮ Specific prior recommendations for the lengthscale (see the tail-mass sketch below)
◮ inverse gamma has a sharp left tail that puts negligible mass on small lengthscales, but a generous right tail that allows large lengthscales (while still reducing non-identifiability)
◮ generalized inverse Gaussian has an inverse gamma left tail (if p ≤ 0) and a Gaussian right tail (avoids the identifiability issue when combined with a linear model)
◮ Specific weakly informative prior recommendations for the signal and noise magnitudes
◮ half-normals are often enough if the lengthscale has an informative prior
◮ if information about the measurement accuracy is available, use an informative prior such as a gamma or scaled inverse-χ² on the variance
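
A small sketch of the tail behaviour using scipy (the shape and scale values are illustrative choices of mine, not recommendations from the talk):

```python
from scipy import stats

inv_gamma = stats.invgamma(a=2.0, scale=1.0)   # sharp left tail
half_normal = stats.halfnorm(scale=1.0)        # finite density at zero

for ell in [0.01, 0.1, 1.0]:
    print(f"P(lengthscale < {ell}):  inv-gamma {inv_gamma.cdf(ell):.2e}"
          f"  half-normal {half_normal.cdf(ell):.2e}")
# the inverse gamma puts essentially no mass on very small lengthscales,
# which is what suppresses the short-lengthscale non-identifiability
```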

SLIDE 23

GPs in Stan

◮ Stan manual 2.16.0 (and later), chapter 16: http://mc-stan.org/users/documentation/index.html
◮ code and documentation by Rob Trangucci
◮ prior recommendations by Rob Trangucci, Michael Betancourt, and Aki Vehtari
◮ Code examples: https://github.com/rtrangucci/gps_in_stan
◮ by Rob Trangucci


SLIDE 27

Hamiltonian Monte Carlo + NUTS

◮ Uses gradient information for more efficient sampling
◮ Alternates dynamic simulation and sampling of the energy level
◮ Parameters
◮ step size, number of simulation steps
◮ No-U-Turn Sampler (NUTS)
◮ adaptively selects the number of steps to improve robustness and efficiency
◮ Adaptation in Stan
◮ step size and mass matrix are estimated during the initial adaptation phase
◮ Demo: https://chi-feng.github.io/mcmc-demo/app.html#RandomWalkMH,donut
◮ note that HMC/NUTS in this demo is not exactly the same as in Stan; a minimal leapfrog sketch follows below
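
The following is a minimal sketch of one HMC transition with a plain leapfrog integrator and a fixed step count; Stan's sampler adds NUTS trajectory selection and adaptation on top of this, so this illustrates the idea rather than Stan's implementation.

```python
import numpy as np

def hmc_step(q, log_prob, grad_log_prob, eps=0.1, n_steps=20,
             rng=np.random.default_rng()):
    p = rng.standard_normal(q.shape)               # sample momentum
    q_new, p_new = q.copy(), p.copy()
    p_new += 0.5 * eps * grad_log_prob(q_new)      # half step for momentum
    for _ in range(n_steps - 1):
        q_new += eps * p_new                       # full step for position
        p_new += eps * grad_log_prob(q_new)        # full step for momentum
    q_new += eps * p_new
    p_new += 0.5 * eps * grad_log_prob(q_new)      # final half step
    # Metropolis accept/reject on the joint (position, momentum) energy
    h_old = -log_prob(q) + 0.5 * p @ p
    h_new = -log_prob(q_new) + 0.5 * p_new @ p_new
    return q_new if np.log(rng.uniform()) < h_old - h_new else q

# usage on a standard normal target
samples, q = [], np.zeros(2)
for _ in range(1000):
    q = hmc_step(q, lambda q: -0.5 * q @ q, lambda q: -q)
    samples.append(q)
```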

SLIDE 28

CCD

◮ Central composite design (CCD): deterministic placement of integration points in the hyperparameter posterior (see the sketch below)
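
A hedged sketch of the point placement, in standardized coordinates with the posterior mode at the origin; this is the textbook central composite design, and in practice the points are mapped through a scaling based on the Hessian at the mode, so the exact variant used for GP hyperparameters may differ.

```python
import itertools
import numpy as np

def ccd_points(d, alpha=None):
    """Central composite design: centre + 2*d axial points + 2**d corners."""
    alpha = np.sqrt(d) if alpha is None else alpha   # a common axial distance
    centre = np.zeros((1, d))
    axial = alpha * np.vstack([np.eye(d), -np.eye(d)])
    corners = np.array(list(itertools.product([-1.0, 1.0], repeat=d)))
    return np.vstack([centre, axial, corners])

print(ccd_points(2))   # 1 + 4 + 4 = 9 integration points for 2 hyperparameters
```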

SLIDE 29

Estimation of the predictive performance of GPs

◮ How to avoid naive k-fold-CV?
◮ leave-one-out (LOO) approximations
◮ The approximations depend on how the predictions are made
◮ analytically, Laplace, EP, VB, MCMC for the latents?
◮ marginal posterior improvements?
◮ integration over the hyperparameters?

SLIDE 30

Predictive distributions

◮ Posterior predictive distribution

    p(ỹ | x̃, D)   (1)

◮ LOO predictive distribution

    p(y_i | x_i, D_{-i})   (2)

SLIDE 31

Hierarchical LOO computation

◮ It is possible to first compute

    p(y_i | x_i, D_{-i}, θ, φ)   (3)

and then

    p(y_i | x_i, D_{-i}) = ∫ p(y_i | x_i, D_{-i}, θ, φ) p(θ, φ | D_{-i}) dθ dφ   (4)


SLIDE 34

Generic approach

◮ Consider the case where we have not yet seen the i-th observation. Then, using Bayes’ rule, we can add the information from the i-th observation:

    p(f_i | D) = p(y_i | f_i) p(f_i | x_i, D_{-i}) / p(y_i | x_i, D_{-i})   (5)

◮ Correspondingly, we can remove the effect of the i-th observation from the full posterior:

    p(f_i | x_i, D_{-i}) = p(f_i | D) p(y_i | x_i, D_{-i}) / p(y_i | f_i)   (6)

◮ If we now integrate both sides over f_i and rearrange the terms, we get

    p(y_i | x_i, D_{-i}) = 1 / ∫ [ p(f_i | D) / p(y_i | f_i) ] df_i   (7)

SLIDE 35

Generic approach

◮ In some cases we can compute p(f_i | x_i, D_{-i}) exactly or approximate it efficiently, and then we can compute the LOO predictive density (a quadrature sketch follows below):

    p(y_i | x_i, D_{-i}) = ∫ p(f_i | x_i, D_{-i}) p(y_i | f_i) df_i   (8)
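
A sketch of computing eq. (8) numerically with Gauss-Hermite quadrature; the probit classification likelihood and the numeric values are illustrative choices of mine.

```python
import numpy as np
from scipy.stats import norm

def loo_pred_density(y_i, mu_loo, v_loo, n_quad=32):
    """p(y_i | x_i, D_-i) = int N(f | mu_loo, v_loo) p(y_i | f) df, eq. (8)."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_quad)
    f = mu_loo + np.sqrt(2.0 * v_loo) * nodes   # change of variables for N(mu, v)
    lik = norm.cdf(y_i * f)                     # probit likelihood, y_i in {-1, +1}
    return (weights * lik).sum() / np.sqrt(np.pi)

print(loo_pred_density(y_i=1, mu_loo=0.8, v_loo=0.5))
```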

SLIDE 36

Analytic

◮ With a Gaussian likelihood and fixed hyperparameters there are analytic LOO equations (transcribed to code below):

    p(f_i | x_i, D_{-i}, θ, φ) ∝ p(f_i | D, θ, φ) / p(y_i | f_i, φ) = N(f_i | μ_{-i}, v_{-i})   (9)

where

    μ_{-i} = v_{-i} (Σ_{ii}^{-1} μ_i − σ^{-2} y_i)
    v_{-i} = (Σ_{ii}^{-1} − σ^{-2})^{-1}   (10)

which removes the effect of observation y_i from the marginal p(f_i | x_i, D, θ, φ)
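
A direct transcription of eqs. (9)-(10); the variable names are mine.

```python
def gaussian_loo_marginal(mu_i, Sigma_ii, y_i, sigma2):
    """LOO latent marginal N(f_i | mu_loo, v_loo) from the full-posterior
    marginal N(f_i | mu_i, Sigma_ii) and noise variance sigma2, eqs. (9)-(10)."""
    v_loo = 1.0 / (1.0 / Sigma_ii - 1.0 / sigma2)
    mu_loo = v_loo * (mu_i / Sigma_ii - y_i / sigma2)
    # the LOO predictive density for y_i is then N(y_i | mu_loo, v_loo + sigma2)
    return mu_loo, v_loo

print(gaussian_loo_marginal(mu_i=0.9, Sigma_ii=0.2, y_i=1.3, sigma2=0.5))
```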

SLIDE 37

EP

◮ Opper & Winther (2000) showed that EP cavity distribution

is up to first order LOO consistent

◮ this means that if we are going to use EP approximated

predictive distribution of the latent q(˜ f|˜ x, D, θ, φ) we can use analytic equations given the Gaussian latent posterior approximation by EP

◮ LOO distributions are cavity distributions, which are

  • btained as a byproduct of the method

SLIDE 39

Laplace

◮ First-order LOO consistency of the Laplace approximation was shown by Vehtari, Mononen, Tolvanen, and Winther (2016b)
◮ this means that if we are going to use the Laplace-approximated predictive distribution of the latent, q(f̃ | x̃, D, θ, φ), we can use the analytic equations given the Gaussian latent posterior approximation by Laplace, with site terms N(f_i | μ̃_i, Σ̃_i):

    Σ̃_i = −( ∇_i ∇_i log p(y_i | f_i, φ) |_{f_i = f̂_i} )^{-1}   (11)

    μ̃_i = f̂_i + Σ̃_i ∇_i log p(y_i | f_i, φ) |_{f_i = f̂_i}   (12)

◮ computation of LOO takes the same time as in the Gaussian-likelihood case (see the sketch below)
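
For concreteness, a sketch of the site terms for a Poisson likelihood p(y | f) = Poisson(y | exp(f)), where ∇ log p = y − exp(f) and ∇∇ log p = −exp(f); the example likelihood is my choice, not from the slides.

```python
import numpy as np

def laplace_site_terms_poisson(y_i, f_hat_i):
    """Site terms (11)-(12) at the Laplace mode f_hat_i for a Poisson likelihood."""
    grad = y_i - np.exp(f_hat_i)              # d/df log p(y_i | f)
    hess = -np.exp(f_hat_i)                   # d^2/df^2 log p(y_i | f)
    Sigma_tilde = -1.0 / hess                 # eq. (11)
    mu_tilde = f_hat_i + Sigma_tilde * grad   # eq. (12)
    return mu_tilde, Sigma_tilde

print(laplace_site_terms_poisson(y_i=3, f_hat_i=1.0))
```

With these Gaussian pseudo-observations the Gaussian-likelihood equations (9)-(10) apply unchanged, which is why the cost matches the Gaussian case.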

SLIDE 40

VB

◮ It is likely that the same holds for VB

SLIDE 41

Experimental results

◮ Small datasets, so that we can compute brute-force LOO
◮ Accuracy of the approximations improves for larger datasets

Data set     n     d   Observation model
Ripley       250    2  probit
Australian   690   14  probit
Ionosphere   351   33  probit
Sonar        208   60  probit
Leukemia    1043    4  log-logistic with censoring

Table: Summary of datasets and models in our examples.

SLIDE 42

LA results with fixed hyperparameters

[figure: bias vs. p_eff/n for the Ripley, Australian, Ionosphere, Sonar, and Leukemia datasets; methods: LA-LOO, TQ-LOO-LA-G, WAICG-LA-L, WAICV-LA-L]

Figure: Bias when the target is brute-force-LOO with Laplace and varying flexibility of the model. Model flexibility was varied by rescaling the length scale(s) in the GP model. Model flexibility is measured by the relative effective number of parameters peff/n. The flexibility of the MAP model is shown with a vertical dashed line.

SLIDE 43

EP results with fixed hyperparameters

[figure: bias vs. p_eff/n for the Ripley, Australian, Ionosphere, Sonar, and Leukemia datasets; methods: EP-LOO, TQ-LOO-EP-G, WAICG-EP-L, WAICV-EP-L]

Figure: Bias when the target is brute-force-LOO with EP and varying flexibility of the model. Model flexibility was varied by rescaling the length scale(s) in the GP model. Model flexibility is measured by the relative effective number of parameters peff/n. The flexibility of the MAP model is shown with a vertical dashed line.

SLIDE 44

LA-CM2 results with fixed hyperparameters

[figure: bias vs. p_eff/n for the Ripley, Australian, Ionosphere, Sonar, and Leukemia datasets; methods: LA-LOO, Q-LOO-LA-CM2, WAICG-LA-CM2, WAICV-LA-CM2]

Figure: Bias when the target is brute-force-LOO with Laplace-CM2 and varying flexibility of the model. Model flexibility was varied by rescaling the length scale(s) in the GP model. Model flexibility is measured by the relative effective number of parameters peff/n. The flexibility of the MAP model is shown with a vertical dashed line.

SLIDE 45

EP-FACT results with fixed hyperparameters

[figure: bias vs. p_eff/n for the Ripley, Australian, Ionosphere, Sonar, and Leukemia datasets; methods: EP-LOO, Q-LOO-EP-FACT, WAICG-EP-FACT, WAICV-EP-FACT]

Figure: Bias when the target is brute-force-LOO with EP-FACT and varying flexibility of the model. Model flexibility was varied by rescaling the length scale(s) in the GP model. Model flexibility is measured by the relative effective number of parameters peff/n. The flexibility of the MAP model is shown with a vertical dashed line.


SLIDE 47

Unknown hyperparameters

◮ If the hyperparameters are unknown and optimised, the above estimates are optimistic
◮ the bias can be negligible with big data and a small number of hyperparameters
◮ Better to integrate over the hyperparameters
◮ deterministic samples, e.g., CCD
◮ stochastic samples, e.g., importance sampling, MCMC


SLIDE 50

Hierarchical approximation using IS

◮ Using the above results for the conditional part p(y_i | x_i, D_{-i}, θ, φ), the LOO predictive distribution can be approximated using importance sampling (IS) over the hyperparameters (see the sketch below):

    p(ỹ_i | x_i, D_{-i}) ≈ [ Σ_{s=1}^{S} p(ỹ_i | x_i, D_{-i}, θ^s, φ^s) w_i^s ] / [ Σ_{s=1}^{S} w_i^s ]   (13)

where the w_i^s are importance weights and

    w_i^s ∝ 1 / p(y_i | x_i, D_{-i}, θ^s, φ^s)   (14)

◮ The LOO predictive density simplifies to

    p(y_i | x_i, D_{-i}) ≈ 1 / [ (1/S) Σ_{s=1}^{S} 1 / p(y_i | x_i, D_{-i}, θ^s, φ^s) ]   (15)
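
A sketch of eqs. (13)-(15) evaluated at the observed y_i, where the weighted sum collapses to a harmonic mean (the numeric values are made up):

```python
import numpy as np

def is_loo_density(cond_dens):
    """cond_dens[s] = p(y_i | x_i, D_-i, theta_s, phi_s) for draws s = 1..S."""
    w = 1.0 / cond_dens        # eq. (14): unnormalized importance weights
    return 1.0 / np.mean(w)    # eq. (15): harmonic mean of conditional densities

dens = np.array([0.21, 0.35, 0.18, 0.27])   # hypothetical conditional LOO densities
print(is_loo_density(dens))
```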

SLIDE 51

Improving IS

◮ The variance of IS can be reduced by using truncated importance sampling (sketched below)
◮ “Very Good Importance Sampling” (work in progress)
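
A sketch of the truncation: raw weights are capped at √S times their mean, the rule from Ionides (2008); whether this matches the talk's exact variant is my assumption. The capped weights are then used in the weighted form of eq. (13).

```python
import numpy as np

def truncated_is_loo_density(cond_dens):
    S = len(cond_dens)
    w = 1.0 / cond_dens                          # raw weights, eq. (14)
    w = np.minimum(w, np.sqrt(S) * w.mean())     # truncate extreme weights
    return (w * cond_dens).sum() / w.sum()       # weighted eq. (13) at y_i

dens = np.array([0.21, 0.35, 0.0004, 0.27])      # one draw with a huge raw weight
print(truncated_is_loo_density(dens))
```

Truncation trades a small bias for a much smaller variance of the estimate.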

SLIDE 52

Hierarchical approximation using IS

◮ Importance weighting also works for the deterministic CCD method


SLIDE 54

LA/EP results with unknown hyperparameters

Method          Ripley       Australian   Ionosphere   Sonar          Leukemia
LA-LOO+CCD+IS   0.2 (0.1)    3.4 (0.4)    −0.1 (0.1)   −0.13 (0.06)   0.56 (0.05)
LA-LOO+CCD      0.8 (0.2)    7.2 (0.9)    0.6 (0.2)    0.5 (0.2)      4.8 (0.2)
LA-LOO+MAP      1.0 (0.2)    9.2 (1.8)    1.3 (0.2)    1.3 (0.3)      4.9 (0.6)

Table: Bias and standard deviation when the target is brute-force-LOO with Laplace and CCD.

Method          Ripley       Australian   Ionosphere   Sonar          Leukemia
EP-LOO+CCD+IS   0.42 (0.14)  7.3 (1.4)    0.8 (0.6)    −0.24 (0.14)   0.49 (0.04)
EP-LOO+CCD      1.3 (0.4)    15 (2)       2.8 (1.3)    0.6 (0.3)      4.8 (0.2)
EP-LOO+MAP      1.4 (0.3)    17 (2)       2.8 (0.7)    0.9 (0.3)      4.9 (0.6)

Table: Bias and standard deviation when the target is brute-force-LOO with EP and CCD.


SLIDE 56

Non-log-concave likelihoods

◮ The nice results above are for log-concave likelihoods
◮ Things do not work as well with non-log-concave likelihoods
◮ the first-order consistency proof assumes log-concave likelihoods
◮ the posterior can be multimodal → a unimodal approximation is bad
◮ pseudo observations may have a repulsive effect
◮ (current) marginal improvement methods don’t fix this problem

SLIDE 57

Summary

◮ LOO with LA or EP, log-concave likelihoods, and fixed hyperparameters is fast and reliable
◮ IS can be used to handle unknown hyperparameters


SLIDE 59

Warning

◮ LOO-CV can be used to compare a small set of models
◮ For a large number of models
◮ the selection process will cause overfitting
◮ the inference conditional on the selected model is wrong

[figure: selection-induced overfitting in model selection for n = 20, 50, and 100]

◮ Use instead a projection predictive approach

Piironen, J., and Vehtari, A. (2016b). Projection predictive input variable selection for Gaussian process models. In Machine Learning for Signal Processing (MLSP), 2016 IEEE International Workshop on, doi:10.1109/MLSP.2016.7738829. arXiv preprint arXiv:1510.04813.

SLIDE 60

Selection-induced bias in variable selection

[figure: selection-induced bias in variable selection for CV-10, WAIC, DIC, MPP, BMA-ref, and BMA-proj at n = 100, 200, and 400]

Piironen & Vehtari (2016)


SLIDE 62

References

Piironen, J. and Vehtari, A. (2016a). Comparison of Bayesian predictive methods for model selection. Statistics and Computing, 27(3):711–735.

Piironen, J. and Vehtari, A. (2016b). Projection predictive input variable selection for Gaussian process models. In Machine Learning for Signal Processing (MLSP), 2016 IEEE International Workshop on.

Siivola, E., Vehtari, A., Vanhatalo, J., and González, J. (2017). Bayesian optimization with virtual derivative sign observations. arXiv:1704.00963.

Stan Development Team (2017). Stan: A C++ library for probability and sampling, version 2.16.

Vehtari, A., Gelman, A., and Gabry, J. (2016a). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. arXiv:1507.04544.

Vehtari, A., Mononen, T., Tolvanen, V., and Winther, O. (2016b). Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models. Journal of Machine Learning Research, 17(103):1–38.