Projection Predictive Model Selection for Gaussian Processes
Juho Piironen, Aki Vehtari
Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Finland
juho.piironen@aalto.fi
Contents
- Introduction
- Automatic relevance determination (ARD)
- Projection predictive method
- Examples
- Summary
Introduction
- Model the target y with several input variables x
- Only some of the inputs x are relevant
- Bayesian approach: use a relevant prior and integrate over all uncertainties
- Radford Neal won the NIPS 2003 feature selection competition using Bayesian methods with all the features (500–100 000)
- Sometimes we want to select a minimal subset of x with good predictive performance:
  - improved model interpretability
  - reduced measurement costs in the future
  - reduced prediction time
Gaussian process (GP) regression
- GP prior:
  $f(x) \sim \mathcal{GP}\big(0,\, k(x, x')\big)$
- Observation model:
  $y \mid f \sim \mathrm{N}\big(y \mid f, \sigma^2 I\big)$
- Predictive distribution:
  $f_* \mid y \sim \mathrm{N}(f_* \mid \mu_*, \Sigma_*)$, where
  $\mu_* = K_*(K + \sigma^2 I)^{-1} y$, $\quad \Sigma_* = K_{**} - K_*(K + \sigma^2 I)^{-1} K_*^{\mathsf{T}}$.
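These predictive equations can be sketched in a few lines of NumPy; `se_kernel` and `gp_predict` are our own illustrative names, and for simplicity a single shared length-scale is used:

```python
import numpy as np

def se_kernel(X1, X2, sigma_f=1.0, ell=1.0):
    """Squared exponential covariance with one shared length-scale."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return sigma_f**2 * np.exp(-0.5 * d2 / ell**2)

def gp_predict(X, y, X_star, sigma=0.1, ell=1.0):
    """GP predictive mean and covariance at test inputs X_star."""
    K = se_kernel(X, X, ell=ell)
    K_star = se_kernel(X_star, X, ell=ell)
    K_ss = se_kernel(X_star, X_star, ell=ell)
    A = K + sigma**2 * np.eye(len(X))
    mu = K_star @ np.linalg.solve(A, y)                  # K_*(K + s^2 I)^-1 y
    Sigma = K_ss - K_star @ np.linalg.solve(A, K_star.T)
    return mu, Sigma
```

With a small noise level the predictive mean essentially interpolates the training targets.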
“Automatic relevance determination”
- Squared exponential (SE), or exponentiated quadratic, covariance function:
  $k_{\mathrm{SE}}(x, x') = \sigma_f^2 \exp\left( -\frac{1}{2} \sum_{j=1}^{D} \frac{(x_j - x'_j)^2}{\ell_j^2} \right)$
- Using a separate length-scale $\ell_j$ for each input is referred to as automatic relevance determination (ARD)
- Idea: optimizing the marginal likelihood will yield large values $\ell_j$ for irrelevant inputs
- Problem: a large length-scale may simply mean linearity w.r.t. the input (not irrelevance)
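A minimal sketch of the ARD kernel above (the function name and toy length-scales are ours): a large $\ell_j$ makes dimension $j$ nearly invisible to the covariance, which is why $1/\ell_j$ is read as a relevance value:

```python
import numpy as np

def ard_se_kernel(X1, X2, sigma_f=1.0, ell=None):
    """SE kernel with a separate length-scale per input dimension (ARD)."""
    ell = np.ones(X1.shape[1]) if ell is None else np.asarray(ell, dtype=float)
    diff = (X1[:, None, :] - X2[None, :, :]) / ell   # scale each dim by ell_j
    return sigma_f**2 * np.exp(-0.5 * (diff**2).sum(-1))
```

For two points differing only in a dimension with a very long length-scale, the covariance stays close to $\sigma_f^2$, i.e. that input is effectively ignored.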
Toy example
[Figure: draws of the eight additive components $f_1(x_1), \dots, f_8(x_8)$]
$f(x) = f_1(x_1) + \dots + f_8(x_8)$, $\quad y \sim \mathrm{N}\big(f, 0.3^2\big)$, $\quad \mathrm{Var}[f_j] = 1$ for all $j$
$\Rightarrow$ all inputs equally relevant
[Figure: true relevance vs. optimized ARD values for each input]
Optimized ARD values, $\mathrm{ARD}(j) = 1/\ell_j$ (averaged over 100 data realizations, $n = 200$)
How about estimating the predictive performance?
- Cross-validation gives an (almost) unbiased estimate of the predictive performance
- Fast LOO-CV approximations in: Vehtari, Mononen, Tolvanen, Sivula, and Winther (2017). Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models. JMLR, 17(103):1–38.
- But...
Selection induced bias in variable selection
- Even if the model performance estimate is unbiased (like LOO-CV) but noisy (also like LOO-CV), using it for model selection introduces additional fitting to the data
- The performance of the selection process itself can be assessed with two-level cross-validation, but this does not help in choosing better models
- The problem is bigger when there is a large number of candidate models, as in covariate selection
- Juho Piironen and Aki Vehtari (2017). Comparison of Bayesian predictive methods for model selection. Statistics and Computing, 27(3):711–735. doi:10.1007/s11222-016-9649-y. arXiv:1503.08650.
Selection induced bias in variable selection
[Figure: selection induced bias in a simulated example, performance vs. number of selected variables for n = 20, 50, 100]
[Figure: performance vs. number of selected variables for n = 100, 200, 400, comparing CV-10, WAIC, DIC, MPP, BMA-ref, and BMA-proj; Piironen & Vehtari (2017)]
Selection induced bias in variable selection
[Figure: performance vs. number of selected variables for the real datasets Sonar, Ionosphere, Ovarian, and Colon, comparing CV-10 / IS-LOO-CV, WAIC, DIC, MPP, BMA-ref, and BMA-proj; Piironen & Vehtari (2017)]
Projection predictive method, general idea
- Originally proposed for generalized linear models by Goutis and Robert (1998) and Dupuis and Robert (2003); the decision theoretic idea of using the full model can be traced to Lindley (1968), see also the many related references in Vehtari and Ojanen (2012)
- Performs well in practice in comparison to many other methods (Piironen and Vehtari, 2016):
  - has low variance
  - able to preserve information from the full model
- General idea:
  1. Fit the full encompassing model (with all the inputs) using the best possible prior information
  2. Train any submodel (with a reduced number of inputs) by minimizing the predictive Kullback-Leibler (KL) divergence to the full model (= projection)
- For a given number of variables, choose the model with minimal projection discrepancy
Projective predictive covariate selection, idea
- The full model predictive distribution represents our best knowledge about the future $\tilde{y}$:
  $p(\tilde{y} \mid D) = \int p(\tilde{y} \mid \theta)\, p(\theta \mid D)\, d\theta$,
  where $\theta = (\beta, \sigma^2)$ and $\beta$ is in general non-sparse (all $\beta_j \neq 0$)
- What is the best distribution $q^*(\theta)$ given the constraint that only the selected covariates have nonzero coefficients?
- Optimization problem:
  $q^* = \arg\min_{q} \frac{1}{n} \sum_{i=1}^{n} \mathrm{KL}\left( p(\tilde{y}_i \mid D) \,\middle\|\, \int p(\tilde{y}_i \mid \theta)\, q(\theta)\, d\theta \right)$
- Optimal projection from the full posterior to a sparse posterior (with minimal predictive loss)
Projective predictive feature selection, computation
- We have posterior draws $\{\theta^s\}_{s=1}^{S}$ for the full model ($\theta = (\beta, \sigma^2)$, and $\beta$ is in general non-sparse, all $\beta_j \neq 0$)
- The predictive distribution $p(\tilde{y} \mid D) \approx \frac{1}{S} \sum_s p(\tilde{y} \mid \theta^s)$ represents our best knowledge about the future $\tilde{y}$
- Easier optimization problem by changing the order of integration and optimization (Goutis & Robert, 1998):
  $\theta^s_* = \arg\min_{\hat{\theta}} \frac{1}{n} \sum_{i=1}^{n} \mathrm{KL}\left( p(\tilde{y}_i \mid \theta^s) \,\middle\|\, p(\tilde{y}_i \mid \hat{\theta}) \right)$
- The $\theta^s_*$ are now (approximate) draws from the projected distribution
Projection by draws
- The projection of one Monte Carlo draw can be solved
- Gaussian case, analytically:
  $w_\perp = (X_\perp^{\mathsf{T}} X_\perp)^{-1} X_\perp^{\mathsf{T}} f$
  $\sigma_\perp^2 = \sigma^2 + \frac{1}{n} (f - f_\perp)^{\mathsf{T}} (f - f_\perp)$
- Exponential family case: equivalent to finding the maximum likelihood parameters for the submodel with the observations replaced by the fit of the reference model (Goutis & Robert, 1998; Dupuis & Robert, 2003)
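In the Gaussian case the projection of a single draw is just a least-squares fit of the submodel to the full-model fit $f$, plus an inflated noise variance. A sketch of those two formulas (the function name is ours):

```python
import numpy as np

def project_draw(X_sub, f, sigma2):
    """Project one draw (f, sigma2) of the full model onto the submodel
    with design matrix X_sub (Gaussian case, closed form)."""
    # w_perp = (X^T X)^-1 X^T f : least-squares fit to the full-model fit f
    w_perp, *_ = np.linalg.lstsq(X_sub, f, rcond=None)
    f_perp = X_sub @ w_perp
    # sigma_perp^2 = sigma^2 + (1/n) ||f - f_perp||^2
    sigma2_perp = sigma2 + np.mean((f - f_perp) ** 2)
    return w_perp, sigma2_perp
```

If $f$ lies exactly in the column space of the submodel, no noise is added; any unexplained part of the full-model fit inflates the submodel's noise variance instead.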
Projection predictive method for GPs
- The parameters of the GP are essentially the latent values f (and likelihood parameters such as $\sigma$)
- Without constraints on the latent values in the submodel, the solution to the minimization problem is $f_\perp = f$
- We therefore require the constraint that the submodel prediction satisfies the usual GP predictive equations
Projection predictive method for GPs
- Fit the full model $M$ by learning the hyperparameters $\theta$ to obtain the latent fit $f \mid y, \theta \sim \mathrm{N}(f \mid \mu_\theta, \Sigma_\theta)$
- The projection onto a submodel $M_\perp$ with a smaller number of variables $D_\perp$ is obtained by solving
  $\delta(M \,\|\, M_\perp) = \min_{\theta_\perp} \mathrm{KL}\left( \mathrm{N}(f \mid \mu_\theta, \Sigma_\theta) \,\middle\|\, \mathrm{N}\big(f \mid \mu_{\theta_\perp}, \Sigma_{\theta_\perp}\big) \right) \quad (1)$
  where $\mu_\perp = K_\perp(K_\perp + \sigma^2 I)^{-1} y$ and $\Sigma_\perp = K_\perp - K_\perp(K_\perp + \sigma^2 I)^{-1} K_\perp$
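The discrepancy in (1) is a KL divergence between two multivariate Gaussians, which has a well-known closed form. A small sketch (our own helper, using plain inverses and determinants rather than Cholesky factors for brevity):

```python
import numpy as np

def kl_gaussians(mu1, S1, mu2, S2):
    """KL( N(mu1, S1) || N(mu2, S2) ) between multivariate Gaussians."""
    n = len(mu1)
    S2_inv = np.linalg.inv(S2)
    d = mu2 - mu1
    return 0.5 * (np.trace(S2_inv @ S1) + d @ S2_inv @ d - n
                  + np.log(np.linalg.det(S2) / np.linalg.det(S1)))
```

The KL divergence is zero only when the two Gaussians coincide, so minimizing it over the submodel's hyperparameters drives the submodel's latent fit toward the full model's.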
Toy example
[Figure: draws of the eight additive components $f_1(x_1), \dots, f_8(x_8)$]
$f(x) = f_1(x_1) + \dots + f_8(x_8)$, $\quad y \sim \mathrm{N}\big(f, 0.3^2\big)$, $\quad \mathrm{Var}[f_j] = 1$ for all $j$
$\Rightarrow$ all inputs equally relevant
[Figure: true relevance, optimized ARD values, and LIO projection errors for each input]
Leave-input-out (LIO) projection errors (averaged over 100 data realizations, $n = 200$)
Projection predictive variable selection
- In variable selection it is usually not feasible to go through all variable combinations
- Use e.g. forward search to explore promising combinations:
  - start from the empty model; at each step add the variable that reduces the objective (1) the most
  - stop when the performance is similar to the full model
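The forward search can be sketched as a greedy loop; here `objective` stands in for the projection discrepancy (1) of a candidate variable set, and all names and the stopping tolerance are our own illustration:

```python
def forward_search(candidates, objective, tol=1e-3, full_value=0.0):
    """Greedy forward search: at each step add the variable that most
    reduces objective(selected). Stops when the objective is within
    tol of the full model's value (full_value)."""
    selected = []
    remaining = list(candidates)
    while remaining:
        # the single addition with the smallest objective value
        best = min(remaining, key=lambda j: objective(selected + [j]))
        selected.append(best)
        remaining.remove(best)
        if objective(selected) - full_value < tol:
            break
    return selected
```

With a toy objective counting how many truly relevant variables are still missing, the loop picks exactly the relevant ones and then stops.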
Real world examples
[Figure: MLPD on test data vs. number of variables for Boston Housing (D = 13), Automobile (D = 38), and Crime (D = 102); curves for the full model, ARD, and projection]
- Full model: mean log predictive density (MLPD) on test data for the model with all inputs and sampled hyperparameters
- ARD: accuracy for each submodel size, variables sorted by ARD (length-scales), hyperparameters optimized to maximum marginal likelihood
- Projection: accuracy for each submodel size, variables sorted by stepwise minimization of the projection error (forward search), hyperparameters learned via the projection
Non-Gaussian likelihood
- Given a Gaussian posterior approximation (e.g. obtained using EP), we can make the projection conditional on the Gaussian likelihood approximations
Projection predictive method, pros and cons
- Advantage:
  - the discrepancy to the full model is a much more reliable indicator of a submodel's performance than the length-scales
- Disadvantage:
  - the computational complexity of the projection is $O(n^3)$ (unless sparse approximations are used) $\Rightarrow$ slow if several submodels (e.g. variable combinations) are explored
Summary
- Carry out inference for the full model for the best performance; select only if necessary
- ARD values (length-scales) are unreliable for assessing input relevance
- The projection discrepancy to the full model is a more robust indicator
- However, the forward search requires a substantial amount of additional computation (on top of fitting the full model)
References
Dupuis, J. A. and Robert, C. P. (2003). Variable selection in qualitative models via an entropic explanatory power. Journal of Statistical Planning and Inference, 111(1-2):77–94.
Goutis, C. and Robert, C. P. (1998). Model choice in generalised linear models: A Bayesian approach via Kullback-Leibler projections. Biometrika, 85(1):29–37.