

  1. Multivariate Emulation: Is it Worth the Trouble?
  Tom Fricker and Jeremy Oakley, University of Sheffield
  Statistics and Machine Learning Interface Meeting, 24th July 2009

  2. Emulators for computer models
  We want to emulate a p-input, k-output deterministic computer model.
  • Treat the computer model as an unknown function η : X ⊂ ℝ^p → ℝ^k
  • Prior: η(·) | β, Σ, Φ ∼ GP_k[m(·), C(·,·)]
  • m(x) = (1 xᵀ)β : we use a linear trend
  • C(x, x′) : a k × k matrix covariance function with hyperparameters (Σ, Φ)
  ⊲ A more complex regression structure may reduce the importance of the covariance function (cf. J. Rougier)
  ⊲ But only if it is a good representation of the structure of the computer model.
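The two ingredients of the prior above, a squared-exponential spatial correlation and the linear trend m(x) = (1 xᵀ)β, can be sketched as follows. This is an illustrative sketch, not the authors' code; the function names and the per-dimension lengthscale parameterisation are assumptions.

```python
import numpy as np

def sq_exp_corr(x, xp, lengthscales):
    """Squared-exponential spatial correlation:
    c(x, x') = exp(-sum_i ((x_i - x'_i) / phi_i)^2)."""
    d = (np.asarray(x, dtype=float) - np.asarray(xp, dtype=float)) / np.asarray(lengthscales, dtype=float)
    return float(np.exp(-np.sum(d ** 2)))

def linear_trend(x, beta):
    """m(x) = (1, x^T) beta -- the linear mean function used in the slides."""
    h = np.concatenate(([1.0], np.asarray(x, dtype=float)))
    return float(h @ np.asarray(beta, dtype=float))
```

The correlation equals 1 at zero separation and decays smoothly with distance, encoding the belief that η(·) is smooth over the input space.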

  3. The covariance function
  We assume there is little knowledge about the structure of η(·). The focus of our work is the multivariate covariance function C(·,·).
  • Represents 2 types of correlation in our beliefs about the residuals (after subtracting the trend):
  ⊲ correlation between different outputs
  ⊲ correlation over input space - η(·) is smooth
  • Remember: there is no 'true' correlation between the outputs.
  How do we go about specifying and combining the 2 types of correlation?

  4. 1. Independent outputs (IND)
  Most straightforward: ignore any between-output correlation and treat the outputs as independent:
  cov[η_i(x), η_j(x′)] = δ_ij σ_j² c_j(x, x′)
  • Build a univariate GP emulator for each output
  • Each output has its own spatial correlation function
  • Train the emulator for output j using only data from output j.
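A minimal sketch of the IND approach: one univariate GP posterior mean per output column, each with its own spatial correlation. For brevity this assumes a zero prior mean and no nugget (the slides use a linear trend with β integrated out); all names here are illustrative.

```python
import numpy as np

def corr(a, b):
    """Squared-exponential correlation with unit lengthscale (illustrative)."""
    return float(np.exp(-np.sum((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2)))

def gp_post_mean(X, y, x_star, c):
    """Univariate GP posterior mean at x_star (zero prior mean, no nugget)."""
    n = len(X)
    K = np.array([[c(X[i], X[j]) for j in range(n)] for i in range(n)])
    k_star = np.array([c(x_star, X[i]) for i in range(n)])
    return float(k_star @ np.linalg.solve(K, np.asarray(y, dtype=float)))

def ind_emulator_means(X, Y, x_star, corrs):
    """IND: an independent univariate emulator per output column of Y,
    each trained only on its own output's data."""
    Y = np.asarray(Y, dtype=float)
    return np.array([gp_post_mean(X, Y[:, j], x_star, corrs[j])
                     for j in range(Y.shape[1])])
```

Because there is no nugget, each emulator interpolates: at a training input it reproduces that output's training value exactly.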

  5. 2. Separable covariance (SEP)
  Easiest way to define a multivariate covariance function: treat the two types of correlation as separable (e.g. Conti & O'Hagan, 2007):
  C(x, x′) = Σ c(x, x′)
  • Σ : between-outputs covariance matrix
  • c(x, x′) : spatial correlation function
  Disadvantage: all outputs share the same spatial correlation function c(x, x′).
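Over a finite design, the separable form gives a Kronecker-structured joint covariance: Σ for the outputs, one shared spatial correlation matrix for the inputs. A sketch under illustrative assumptions (unit lengthscale, hypothetical function names):

```python
import numpy as np

def sq_exp(x, xp):
    """Shared spatial correlation c(x, x') (unit lengthscale, illustrative)."""
    return float(np.exp(-np.sum((np.asarray(x, dtype=float) - np.asarray(xp, dtype=float)) ** 2)))

def sep_covariance(Sigma, X, c=sq_exp):
    """SEP: joint covariance of k outputs at n inputs is the Kronecker product
    Sigma (x) C_x, with one spatial correlation matrix C_x shared by all outputs."""
    n = len(X)
    C_x = np.array([[c(X[i], X[j]) for j in range(n)] for i in range(n)])
    return np.kron(np.asarray(Sigma, dtype=float), C_x)

# k = 2 outputs, n = 3 design points -> a 6 x 6 joint covariance
C = sep_covariance([[2.0, 0.5], [0.5, 1.0]], [[0.0], [1.0], [2.0]])
```

The Kronecker structure is what makes the conjugate integration over Σ tractable, but it also forces every output to share the same spatial correlation.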

  6. 3. Non-separable covariance
  Somewhere between IND and SEP: the Linear Model of Coregionalization (LMC) (e.g. Wackernagel, 1995; Gelfand et al., 2004)
  • Outputs are a linear combination of independent univariate GPs in the vector Z(·):
  η(·) = β h(·) + R Z(·), Z_j(·) ∼ GP[0, κ_j(·,·)], j = 1, ..., k
  ⊲ we use squared exponentials for the κ_j(·,·)
  • The between-output covariance at any given input is Σ = R Rᵀ

  7. η(·) = β h(·) + R Z(·), Z_j(·) ∼ GP[0, κ_j(·,·)]
  ⇒ C(x, x′) = Σ_{ℓ=1}^{k} T_ℓ κ_ℓ(x, x′), where T_ℓ = R_{·ℓ} R_{·ℓ}ᵀ
  This is a special case of the 'nested covariance' model,
  C(x, x′) = Σ_{ℓ=1}^{S} T_ℓ κ_ℓ(x, x′)
  • Taking S = k and T_ℓ = R_{·ℓ} R_{·ℓ}ᵀ is a 'natural' way of ensuring the T_ℓ are positive semi-definite:
  ⊲ parameterise by Σ = cov[η(x), η(x)]
  ⊲ decompose as Σ = R Rᵀ
  ⊲ the correlation function for an individual output is a weighted sum of 'basis' functions κ_j(·,·)
  ⊲ if there is no between-output correlation, then corr[η_j(x), η_j(x′)] = κ_j(x, x′), i.e. equivalent to IND.
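The LMC covariance above can be sketched directly from its definition: sum the rank-one coregionalization matrices T_ℓ = R_{·ℓ} R_{·ℓ}ᵀ weighted by the basis correlations κ_ℓ. The R matrix and lengthscales below are illustrative, not from the case studies.

```python
import numpy as np

def lmc_cov(R, kappas, x, xp):
    """LMC cross-covariance matrix C(x, x') = sum_l T_l kappa_l(x, x'),
    with T_l = R[:, l] R[:, l]^T a rank-one coregionalization matrix."""
    k_out = R.shape[0]
    C = np.zeros((k_out, k_out))
    for l in range(R.shape[1]):
        C += np.outer(R[:, l], R[:, l]) * kappas[l](x, xp)
    return C

# k = 2 outputs, each basis process gets its own lengthscale
R = np.array([[1.0, 0.0],
              [0.5, 0.8]])
kappas = [lambda x, xp: np.exp(-((x - xp) / 0.5) ** 2),
          lambda x, xp: np.exp(-((x - xp) / 2.0) ** 2)]
```

At x = x′ every κ_ℓ equals 1, so C(x, x) collapses to Σ = R Rᵀ, matching the parameterisation on the slide; at x ≠ x′ the different lengthscales let each output decay at its own rate, which is exactly the flexibility SEP lacks.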

  8. Inference for hyperparameters
  Hyperparameters in the GP prior, η(·) | β, Σ, Φ ∼ GP_k[m(·), C(·,·)]:
  • β, regression coefficients
  ⊲ conjugate prior, integrated out
  • Σ, between-output covariance
  ⊲ SEP / IND: conjugate prior, integrated out
  ⊲ LMC: analytic integration not possible
  • Φ, spatial correlation function parameters
  ⊲ analytic integration not possible for any of the emulators
  For hyperparameters that cannot be analytically integrated, we estimate by MLE and treat the estimates as fixed.
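The MLE step can be sketched as minimising a negative log marginal likelihood over the correlation parameters. The sketch below is deliberately simplified (zero mean, unit variance, single lengthscale, crude grid search); the slides integrate β, and Σ where conjugate, out analytically before maximising over the remainder.

```python
import numpy as np

def neg_log_lik(phi, X, y):
    """Negative log marginal likelihood of a zero-mean, unit-variance GP with
    squared-exponential correlation, as a function of the lengthscale phi only.
    A small jitter keeps the correlation matrix numerically invertible."""
    X = np.ravel(np.asarray(X, dtype=float))
    y = np.asarray(y, dtype=float)
    D = np.subtract.outer(X, X)
    K = np.exp(-(D / phi) ** 2) + 1e-9 * np.eye(len(X))
    _, logdet = np.linalg.slogdet(K)
    return 0.5 * (logdet + y @ np.linalg.solve(K, y))

# crude grid-search MLE over the lengthscale (toy data)
X = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [0.0, 0.4, 0.7, 0.4, 0.0]
grid = np.linspace(0.1, 3.0, 30)
phi_hat = grid[np.argmin([neg_log_lik(p, X, y) for p in grid])]
```

Plugging in a point estimate and treating it as fixed, as on the slide, understates posterior uncertainty relative to a full Bayesian treatment of Φ; that is the usual trade-off for tractability.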

  9. Regular outputs
  We assume the computer model has regular outputs:
  • The set of outputs is finite and fixed.
  • Every output is observed at every input point (cf. isotopic data in geostatistics).
  For SEP, this implies that the posterior for output j is a function only of data from output j:
  η_j(·) | y_j ⊥ y_i ∀ i ≠ j
  Does a multivariate specification ever help?

  10. Case Study 1: Simple Climate Model (work with Nathan Urban)
  • 5 inputs
  • We focus on 2 univariate outputs:
  ⊲ CO₂ flux in the year 2000 (CO₂)
  ⊲ Surface temperature in the year 2000 (temp)
  • Data: 60 training runs in a Latin hypercube design.
  • Validation: a further 100 model runs.
  • Emulators:
  ⊲ SEP, a separable emulator with 1 squared-exponential correlation function
  ⊲ LMC, an LMC emulator with 2 squared-exponential basis correlation functions
  ⊲ IND, 2 independent univariate emulators, each with 1 squared-exponential correlation function

  11. Mean squared prediction error (MSPE) on the validation runs:

           SEP     LMC     IND
  CO₂      82.4    19.0    15.2
  temp      7.4     4.0     3.0

  12. MSPE on the validation runs, with credible-interval coverage:

           SEP     LMC     IND
  CO₂      82.4    19.0    15.2
  temp      7.4     4.0     3.0

  [Figure: for each output, the proportion D_α of validation points whose 100α% credible intervals contain the true value, plotted against α, for IND, SEP and LMC.]

  13. Independent emulators do just as well as LMC - so why bother with the multivariate specification?
  Example: Gross Primary Productivity (GPP), Π, a univariate function of the outputs:
  Π = Π_max [ CO₂ / (CO₂ + C) ] + ( T_opt × temp + 0.5 × temp² )
  What is the predictive distribution of Π?
  • simulate from the joint posterior of (CO₂, temp)
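The simulation step can be sketched as follows: draw from the joint posterior of (CO₂, temp) at an input of interest and push each draw through Π. The combination function and all constants below are illustrative stand-ins, not the values from the climate-model case study; the key point is that only a multivariate emulator supplies the off-diagonal of the covariance, which IND forces to zero.

```python
import numpy as np

rng = np.random.default_rng(1)

def gpp(co2, temp, pi_max=1.0, c=400.0, t_opt=1.0):
    """Stand-in for the GPP combination Pi(CO2, temp); constants are made up
    for illustration and are not those of the actual case study."""
    return pi_max * co2 / (co2 + c) + t_opt * temp + 0.5 * temp ** 2

def propagate(mean, cov, n_sims=20_000):
    """Sample the joint posterior of (CO2, temp) and push the samples through Pi.
    With IND, cov would have zero off-diagonal; SEP/LMC supply the correlation."""
    s = rng.multivariate_normal(mean, cov, size=n_sims)
    return gpp(s[:, 0], s[:, 1])

# hypothetical posterior mean and covariance at one validation input
g = propagate([380.0, 0.85], [[4.0, 0.15], [0.15, 0.01]])
```

Ignoring the between-output correlation changes the spread (and, through the nonlinearity, the location) of the predictive distribution of Π even when each marginal prediction is fine, which is exactly the effect the slides go on to show.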

  14. GPP
  [Figure: joint posterior of (CO₂, temp) at one particular validation point under SEP, LMC and IND; CO₂ roughly 378-386 against temp roughly 0.75-0.90.]

  15. GPP

           SEP     LMC     IND
  MSPE     9.35    1.97    2.13

  [Figure: proportion D_α of validation points whose 100α% credible intervals for Π contain the true value, plotted against α, for IND, SEP and LMC.]

  16. Case Study 2: A Finite Element Model
  A simple finite element model for an aeroplane (work with Neil Sims)
  • The structure is represented by a large number of nodes.
  ⊲ A smaller number of parameters set the overall physical properties of the structure, e.g. wing length, fuselage thickness, etc.
  ⊲ Select 5 as the variable inputs
  • Outputs:
  ⊲ 3 pairs of mass and stiffness 'modal parameters', (m_i, k_i).
  • The outputs are then combined to form the coefficients in a frequency response function,
  FRF(ω) = Σ_{i=1}^{3} 1 / (k_i − ω² m_i)
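The frequency response function above is a direct sum over the three modes and is straightforward to evaluate; the modal masses and stiffnesses below are illustrative values, not those of the aeroplane model.

```python
import numpy as np

def frf(omega, m, k):
    """FRF(omega) = sum over the 3 modes of 1 / (k_i - omega^2 * m_i)."""
    omega = np.asarray(omega, dtype=float)
    return sum(1.0 / (k_i - omega ** 2 * m_i) for m_i, k_i in zip(m, k))

m = [1.0, 1.0, 1.0]    # illustrative modal masses
k = [1.0, 4.0, 16.0]   # illustrative modal stiffnesses
```

Note the resonances: as ω² approaches k_i/m_i for any mode, one term blows up, so small joint errors in the emulated (m_i, k_i) pairs can translate into large errors in FRF(ω), which is why joint prediction of the six outputs matters here.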

  17. The 5 inputs (x₁, ..., x₅) are mapped by η to the modal parameters (m₁, k₁), (m₂, k₂), (m₃, k₃), which give
  FRF(ω) = Σ_{i=1}^{3} 1 / (k_i − ω² m_i)
  [Figure: |FRF| plotted against ω from 0 to 300, showing the resonance peaks.]

  18. Single validation point, m v. k
  [Figure: predicted and true (m, k) pairs for the three modes under the Independent, Separable and LMC emulators.]

  19. Single validation point, FRF(ω)
  [Figure: predicted and true |FRF| against ω, over ω ≈ 100-110 and ω ≈ 230-270, for the Independent, Separable and LMC emulators.]

  20. Conclusions
  • I have not found any circumstances where a multivariate emulator outperforms independent univariate emulators if we are only interested in marginal predictions of individual outputs.
  • But it does not seem uncommon for multiple outputs of a computer model to be used jointly.
  • In that case, a multivariate specification can be important for propagating the uncertainty surrounding the joint predictions.
  • A non-separable covariance structure can lead to better predictions by allowing different spatial correlation functions for different outputs.

  21. Acknowledgements
  Many thanks to Dr. Nathan Urban (Geosciences, Penn State University) for providing the Simple Climate Model data, and Neil Sims (Dept. of Mechanical Engineering, University of Sheffield) for providing the FEM data.

  22. References
  • Conti, S. and O'Hagan, A. (2007). Bayesian emulation of complex multi-output and dynamic computer models. Journal of Statistical Planning and Inference, in review.
  • Wackernagel, H. (1995). Multivariate Geostatistics. Springer.
  • Gelfand, A. E., Schmidt, A. M., Banerjee, S. and Sirmans, C. F. (2004). Nonstationary multivariate process modeling through spatially varying coregionalization (with discussion). Test, 13(2), 1-50.
  • Urban, N. M. and Keller, K. (2008). Probabilistic hindcasts and projections of the coupled climate, carbon cycle, and Atlantic meridional overturning circulation systems: A Bayesian fusion of century-scale observations with a simple model. Tellus A, in review.
