
Gaussian Process Regression with Mismatched Models & Can GP Regression Be Made Robust Against Model Mismatch?
Peter Sollich
NeurIPS 2002 & International Workshop on Deterministic and Statistical Methods in Machine Learning (2004)



  2. Learning curve
     Ideal learning curve (error as a function of the training set size n):
     • Performance measured on the true input distribution
     • Averaged over multiple training datasets
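A minimal sketch of how such an ideal learning curve could be estimated by Monte Carlo, assuming Python with NumPy: the error is measured on fresh test inputs drawn from p(x) and averaged over many training sets of each size n. The helper names `learning_curve` and `toy_task` are hypothetical, and the toy task (true function f = 0, sample-mean predictor) is chosen only because its expected error, σ²/n with σ² = 1, is known in closed form.

```python
import numpy as np

def learning_curve(ns, sample_task, fit_predict, n_datasets=200, n_test=200, seed=0):
    """Monte Carlo estimate of eps(n) = E_D[ E_x[(f(x) - fhat_D(x))^2] ] for each n in ns.

    sample_task(n, n_test, rng) -> (x, y, x_test, f_test): n noisy training examples,
        n_test fresh test inputs from the same p(x), and the true f at those inputs.
    fit_predict(x, y, x_test) -> predictions of f at x_test.
    """
    rng = np.random.default_rng(seed)
    curve = []
    for n in ns:
        errs = []
        for _ in range(n_datasets):  # average over independent training datasets
            x, y, x_test, f_test = sample_task(n, n_test, rng)
            pred = fit_predict(x, y, x_test)
            errs.append(np.mean((f_test - pred) ** 2))  # error on the true input distribution
        curve.append(np.mean(errs))
    return np.array(curve)

# Hypothetical toy task: f(x) = 0, unit-variance noise, inputs uniform on [0, 1].
def toy_task(n, n_test, rng):
    x = rng.uniform(0.0, 1.0, n)
    x_test = rng.uniform(0.0, 1.0, n_test)
    y = rng.standard_normal(n)  # y = f(x) + noise with f = 0
    return x, y, x_test, np.zeros(n_test)

# Predicting the sample mean of y everywhere has expected error 1/n here.
ns = [5, 20, 80]
eps = learning_curve(ns, toy_task, lambda x, y, xt: np.full(len(xt), y.mean()),
                     n_datasets=4000, n_test=10)
print(eps * np.array(ns))  # each entry should be close to 1
```

Any predictor with the same call signature (for example a GP posterior mean) can be plugged in as `fit_predict`.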

  3. What is GP regression?
     Model: $y = f(x) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma^2)$; we want to estimate $f$.
     Put a GP prior on $f$:
     • $\operatorname{Cov}(f(x_i), f(x_j)) = K(x_i, x_j)$
     • $\mathbb{E}[f(x)] = 0$
     Why GP regression?
     • The posterior is available analytically (requires $\mathcal{O}(n^3)$ operations)
     • Error bars
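The analytic posterior can be sketched in a few lines of NumPy, assuming a zero-mean prior, a squared-exponential (RBF) kernel and 1D inputs; `gp_posterior` and `rbf_kernel` are illustrative names, not an API from the talk. The Cholesky factorization of the n × n matrix K + σ²I is the O(n³) step, and the posterior variance supplies the error bars.

```python
import numpy as np

def rbf_kernel(a, b, length=0.2, amp=1.0):
    """Squared-exponential covariance amp * exp(-(x - x')^2 / (2 length^2)) for 1D inputs."""
    d = a[:, None] - b[None, :]
    return amp * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x, y, x_test, kernel, noise_var):
    """Posterior mean and variance of f(x_test) under a zero-mean GP prior
    and Gaussian observation noise with variance noise_var."""
    K = kernel(x, x) + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(K)                             # O(n^3): the dominant cost
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # alpha = K^{-1} y
    K_s = kernel(x, x_test)                               # n x m cross-covariances
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(kernel(x_test, x_test)) - np.sum(v ** 2, axis=0)
    return mean, var

# Example: noisy samples of sin(6x) on [0, 1].
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 30)
y = np.sin(6 * x) + 0.1 * rng.standard_normal(30)
x_test = np.linspace(0.0, 1.0, 5)
mean, var = gp_posterior(x, y, x_test, rbf_kernel, noise_var=0.01)
err_bar = 2 * np.sqrt(np.maximum(var, 0.0))  # approx. 95% error bars on f(x_test)
```

Solving with the Cholesky factor instead of forming K⁻¹ explicitly is the usual, numerically safer choice.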

  4. Mismatched model?
     • Inputs to the GP: kernel $K$ and noise level $\sigma$. What if we use the wrong ones?
     Setting:
     • Assume $p(x)$ is known: uniform on a line or on a hypercube
     • Assume a stationary kernel, $K(x, x') = g(x - x')$
     • The theory is exact for $d = \infty$; otherwise it needs all kinds of approximations
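To make the mismatch setting concrete, the sketch below draws a "teacher" function from a GP with one stationary kernel and noise level, then predicts with a "student" GP that uses another. It assumes inputs uniform on the line segment [0, 1], an OU teacher and an RBF student, and arbitrary illustrative length scales and noise variances; `mismatched_error`, `gram` and the `g_*` functions are hypothetical names.

```python
import numpy as np

# Stationary covariance functions K(x, x') = g(x - x'), written as g(r) with r = x - x'.
def g_rbf(r, length):
    return np.exp(-0.5 * (r / length) ** 2)   # smooth sample paths

def g_ou(r, length):
    return np.exp(-np.abs(r) / length)        # Ornstein-Uhlenbeck: rough sample paths

def gram(g, a, b, length):
    return g(a[:, None] - b[None, :], length)

def mismatched_error(n, teacher=(g_ou, 0.1, 0.1), student=(g_rbf, 0.1, 0.001),
                     n_test=200, seed=0):
    """Test error of a student GP whose kernel and/or noise variance differ from the
    teacher's. teacher and student are (g, length, noise_var) triples."""
    rng = np.random.default_rng(seed)
    g_t, l_t, s2_t = teacher
    g_s, l_s, s2_s = student
    # Inputs uniform on [0, 1].
    x_all = rng.uniform(0.0, 1.0, n + n_test)
    # Draw the true function jointly at training and test inputs from the teacher prior
    # (a small jitter keeps the Cholesky factorization stable).
    K_t = gram(g_t, x_all, x_all, l_t) + 1e-8 * np.eye(n + n_test)
    f_all = np.linalg.cholesky(K_t) @ rng.standard_normal(n + n_test)
    x, x_test = x_all[:n], x_all[n:]
    f, f_test = f_all[:n], f_all[n:]
    y = f + np.sqrt(s2_t) * rng.standard_normal(n)  # noise at the teacher's level
    # Student posterior mean with its own (possibly wrong) kernel and noise variance.
    K_s = gram(g_s, x, x, l_s) + s2_s * np.eye(n)
    mean = gram(g_s, x_test, x, l_s) @ np.linalg.solve(K_s, y)
    return np.mean((f_test - mean) ** 2)

for n in [10, 40, 160]:
    print(n, mismatched_error(n))  # OU teacher, RBF student with too-small noise variance
```

A single call gives one noisy sample of the error; averaging over seeds gives a learning curve.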

  5. Weird learning curves
     • Plateaus, or an arbitrary number of overfitting maxima
     • Line (1D): a noise level that is too low results in a plateau
     • Hypercube, d = 10: noise levels too small (1e-4, 1e-3, …) compared with the true value 1
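Learning curves of this kind can be computed with less Monte Carlo noise by averaging analytically over the teacher function and the noise for fixed inputs, because the student's prediction is linear in y. The sketch below does this on a 1D line with matched RBF kernels, varying only the student's noise variance while the true noise variance is 1. The kernel, length scale, test grid and function names are assumptions for illustration, and this is not the approximation used in the paper's theory.

```python
import numpy as np

def k_rbf(a, b, length=0.1):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def expected_error(x, x_test, k_teacher, noise_t, k_student, noise_s):
    """Generalization error for fixed inputs, averaged exactly over teacher functions
    f ~ GP(0, k_teacher) and Gaussian noise of variance noise_t.

    The student predicts fhat(x*) = h^T y with h = (K_s + noise_s I)^{-1} k_s(x, x*), so
    E[(f(x*) - h^T y)^2] = k_t(x*, x*) - 2 h^T k_t(x, x*) + h^T (K_t + noise_t I) h.
    """
    n = len(x)
    H = np.linalg.solve(k_student(x, x) + noise_s * np.eye(n), k_student(x, x_test))  # n x m
    Kt = k_teacher(x, x) + noise_t * np.eye(n)
    kt_star = k_teacher(x, x_test)
    prior_var = np.diag(k_teacher(x_test, x_test))
    err = prior_var - 2 * np.sum(H * kt_star, axis=0) + np.sum(H * (Kt @ H), axis=0)
    return err.mean()  # average over the test grid

def learning_curve(ns, noise_s, noise_t=1.0, n_inputs=20, seed=0):
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.0, 1.0, 200)  # dense grid approximating uniform p(x) on [0, 1]
    curve = []
    for n in ns:
        errs = [expected_error(rng.uniform(0.0, 1.0, n), x_test, k_rbf, noise_t, k_rbf, noise_s)
                for _ in range(n_inputs)]  # Monte Carlo only over the training inputs
        curve.append(np.mean(errs))
    return np.array(curve)

ns = [10, 40, 160]
for noise_s in [1e-4, 1e-2, 1.0]:  # student noise variances; the true value is 1
    print(noise_s, learning_curve(ns, noise_s))
```

Whether a plateau shows up depends on the length scale and the range of n; the code only computes the curves and does not reproduce the figures from the talk.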

  6. Asymptotic problems
     • No asymptotic decay of the generalization error as $\epsilon = \mathcal{O}(1/n)$, as for parametric models; the decay is much slower (logarithmically slow)
     • This occurs if the true kernel (OU, MB2) is less smooth than the chosen kernel (RBF)
     • The prior cannot be overwhelmed by the data (it is too strong)
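One simple way to check the asymptotic behaviour of a measured learning curve is to look at its local slopes on a log-log scale. This is a generic diagnostic, not something taken from the paper: an O(1/n) decay gives slopes near −1, while a logarithmically slow decay gives slopes that drift towards 0 as n grows. `local_decay_exponents` is a hypothetical helper name.

```python
import numpy as np

def local_decay_exponents(ns, eps):
    """Local slopes d log(eps) / d log(n) between consecutive points of a learning curve."""
    ln_n = np.log(np.asarray(ns, dtype=float))
    ln_e = np.log(np.asarray(eps, dtype=float))
    return np.diff(ln_e) / np.diff(ln_n)

ns = np.array([10, 100, 1000, 10000])
print(local_decay_exponents(ns, 3.0 / ns))          # approximately [-1, -1, -1]
print(local_decay_exponents(ns, 1.0 / np.log(ns)))  # about [-0.30, -0.18, -0.12], tending to 0
```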

  7. Fix?
     • But maybe we just chose very bad hyperparameters?
     • What if we use evidence maximization? Maximize the evidence $P(D) = \int P(D \mid \theta)\, P(\theta)\, d\theta$ w.r.t. the hyperparameters.
     • A true Bayesian treatment (also integrating over the hyperparameters) is too expensive…
     • Setting: assume the wrong kernel, but tune $\sigma$, $a$, $l$ using the evidence.
     • Various approximations are needed to make the analysis tractable…
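Evidence maximization can be sketched as a grid search over the log marginal likelihood log P(D | θ) of a zero-mean GP. This assumes an RBF student kernel a·exp(−(x − x′)²/(2l²)), reading a as an amplitude and l as a length scale, with σ² the noise variance; the data come from an OU teacher, so the student's kernel form is deliberately wrong. The grids, data and function names are illustrative, and in practice a gradient-based optimizer would usually replace the grid.

```python
import numpy as np
from itertools import product

def log_evidence(x, y, noise_var, amp, length):
    """log P(D | theta) for a zero-mean GP with an RBF kernel and Gaussian noise:
    -1/2 y^T C^{-1} y - 1/2 log det C - n/2 log(2 pi), with C = K + noise_var I."""
    n = len(x)
    C = amp * np.exp(-0.5 * ((x[:, None] - x[None, :]) / length) ** 2) + noise_var * np.eye(n)
    L = np.linalg.cholesky(C)
    z = np.linalg.solve(L, y)  # z^T z = y^T C^{-1} y
    # log det C = 2 * sum(log diag L)
    return -0.5 * z @ z - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)

def maximize_evidence(x, y, noise_grid, amp_grid, length_grid):
    """Type-II maximum likelihood by brute-force search over (noise_var, amp, length)."""
    return max(product(noise_grid, amp_grid, length_grid),
               key=lambda theta: log_evidence(x, y, *theta))

# Hypothetical data: OU teacher exp(-|x - x'| / 0.1) with noise variance 0.1.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 60)
K_t = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.1) + 1e-10 * np.eye(60)
y = np.linalg.cholesky(K_t) @ rng.standard_normal(60) + np.sqrt(0.1) * rng.standard_normal(60)
best = maximize_evidence(x, y, noise_grid=[0.01, 0.1, 1.0], amp_grid=[0.3, 1.0, 3.0],
                         length_grid=[0.02, 0.05, 0.1, 0.3])
print(best)  # (noise_var, amp, length) with the highest evidence on this grid
```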

  8. Hypercube analysis
     • If the hyperparameters can be tuned to give Bayes-optimal performance, evidence maximization finds them.
     • If those hyperparameters cannot be reached (for example, $l \to \infty$), convergence is still very slow.
     • No maxima
     • No experiments???
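On the hypercube, the average over p(x) can be done exactly by enumerating all 2^d inputs. The sketch below assumes the inputs are binary vectors uniformly distributed over the vertices {0, 1}^d and uses an illustrative kernel exp(−h/l) of the Hamming distance h; it computes the Bayes-optimal error of a matched student, i.e. the posterior variance averaged over all vertices. The kernel, noise variance, length scale and names are assumptions, not the paper's exact setup.

```python
import numpy as np
from itertools import product

def hypercube_vertices(d):
    """All 2^d vertices of {0, 1}^d; uniform p(x) puts mass 2^-d on each."""
    return np.array(list(product([0, 1], repeat=d)), dtype=float)

def kernel(a, b, length):
    # Illustrative kernel exp(-h / length), h = Hamming distance between binary vectors
    h = np.abs(a[:, None, :] - b[None, :, :]).sum(axis=-1)
    return np.exp(-h / length)

def bayes_error(x, d, length, noise_var):
    """Bayes-optimal generalization error for a matched student: the posterior
    variance averaged exactly over all 2^d equally likely test inputs."""
    X = hypercube_vertices(d)
    A = kernel(x, x, length) + noise_var * np.eye(len(x))
    k_star = kernel(x, X, length)  # n x 2^d
    post_var = 1.0 - np.sum(k_star * np.linalg.solve(A, k_star), axis=0)  # prior variance is 1
    return post_var.mean()

d = 10
rng = np.random.default_rng(0)
V = hypercube_vertices(d)
x_all = V[rng.integers(0, 2 ** d, 200)]  # training inputs drawn uniformly from the vertices
for n in [10, 50, 200]:
    print(n, bayes_error(x_all[:n], d, length=5.0, noise_var=1.0))
```

Because the training sets are nested, the printed errors cannot increase with n: conditioning on more data never increases a GP's posterior variance.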

  9. 1D case
     • True kernel: MB2
     • Kernel used: as shown in the plot
     • No maxima or plateaus
     • Optimal rate achieved
