A Sequential Split-Conquer-Combine Approach for Gaussian Process Modeling in Computer Experiments - PowerPoint PPT Presentation




SLIDE 1

A Sequential Split-Conquer-Combine Approach for Gaussian Process Modeling in Computer Experiments

Chengrui Li

Department of Statistics and Biostatistics, Rutgers University
Joint work with Ying Hung and Min-ge Xie
2017 QPRC, June 13, 2017

SLIDE 2

Outline

  • Introduction
  • A Unified Framework with Theoretical Support
  • Simulation Study
  • Real Data Example
  • Summary

SLIDE 3

Introduction

SLIDE 4

Motivating example: Data center thermal management

  • A data center is an integrated facility housing multiple-unit servers, providing application services or management for data processing.
  • Goal: Design a data center with an efficient heat removal mechanism.
  • Computational Fluid Dynamics (CFD) simulation (n = 26820, p = 9)

Figure 1: Heat map for IBM T. J. Watson Data Center



SLIDE 6

Gaussian process model

  • Gaussian process (GP) model: y = Xβ + Z(x)
  • y: n × 1 vector of observations (e.g., room temperatures)
  • X: n × p design matrix
  • β: p × 1 vector of unknown parameters
  • Z(x): a GP with mean 0 and covariance σ²Σ(θ)
  • Σ(θ): n × n correlation matrix with correlation parameters θ

The ijth element of Σ is defined by the power exponential correlation function

  corr(Z(xi), Z(xj)) = exp(−θ⊤|xi − xj|) = exp(−∑_{k=1}^p θk |xik − xjk|).

  • Remark: σ is assumed known for simplicity in this talk.
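As a concrete illustration, the correlation matrix above can be computed in a few lines; a minimal numpy sketch (the helper name power_exp_corr is an assumption, not from the talk):

```python
import numpy as np

def power_exp_corr(X, theta):
    """corr(Z(xi), Z(xj)) = exp(-sum_k theta_k |x_ik - x_jk|)."""
    diff = np.abs(X[:, None, :] - X[None, :, :])           # pairwise |x_ik - x_jk|, shape (n, n, p)
    return np.exp(-diff @ np.asarray(theta, dtype=float))  # theta-weighted L1 distance, then exp

X = np.array([[0.0, 0.0],
              [0.5, 0.0],
              [1.0, 1.0]])
Sigma = power_exp_corr(X, theta=[1.0, 2.0])
# Diagonal entries are exactly 1; off-diagonals decay with weighted distance.
```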


SLIDE 8

Estimation and prediction

  • Likelihood inference:

    l(β, θ, σ) = −(1/(2σ²))(y − Xβ)⊤Σ⁻¹(θ)(y − Xβ) − (1/2) log|Σ(θ)| − (n/2) log σ²

    So,
  • β̂|θ = arg max_β {l(β|θ, σ²)} = (X⊤Σ⁻¹(θ)X)⁻¹X⊤Σ⁻¹(θ)y
  • θ̂|β̂, σ² = arg max_θ {l(θ|β̂, σ²)}
  • The GP prediction y0 at a new point x0, given parameters (β, θ), follows a normal distribution with mean p0(β, θ) and variance m0(β, θ), where

    p0(β, θ) = x0⊤β + γ(θ)⊤Σ⁻¹(θ)(y − Xβ)
    m0(β, θ) = σ²(1 − γ(θ)⊤Σ⁻¹(θ)γ(θ)),

    and γ(θ) is an n × 1 vector whose ith element equals φ(‖xi − x0‖; θ).
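For fixed θ, the β̂ formula above is a generalized least squares fit. A minimal sketch on simulated toy data (all names and simulation settings are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 60, 2
Xd = rng.uniform(size=(n, p))                    # design matrix (no intercept, for brevity)
beta_true = np.array([2.0, -1.0])

# Power exponential correlation on the inputs.
dist = np.abs(Xd[:, None, :] - Xd[None, :, :])
Sigma = np.exp(-dist @ np.array([5.0, 5.0]))

# Simulate y = X beta + Z with Z ~ N(0, Sigma) (sigma^2 = 1); jitter for stability.
L = np.linalg.cholesky(Sigma + 1e-10 * np.eye(n))
y = Xd @ beta_true + L @ rng.standard_normal(n)

# beta_hat | theta = (X' Sigma^-1 X)^-1 X' Sigma^-1 y  (the GLS form from the slide)
Si_X = np.linalg.solve(Sigma, Xd)
Si_y = np.linalg.solve(Sigma, y)
beta_hat = np.linalg.solve(Xd.T @ Si_X, Xd.T @ Si_y)
```

Using `solve` rather than forming Σ⁻¹ explicitly is the standard numerically stable way to evaluate these expressions.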


SLIDE 10

Two challenges in GP modeling

Computational issue:

  • Estimation and prediction involve Σ⁻¹ and |Σ|, with cost of order O(n³)
  • Not feasible when n is large

Uncertainty quantification of the GP predictor:

  • The plug-in predictive distribution is widely used
  • It underestimates the uncertainty

SLIDE 11

Existing methods

For the computational issue:

  • Change the model to one that is computationally convenient: Rue and Held (2005), Cressie and Johannesson (2008)
  • Approximate the likelihood function: Stein et al. (2004), Furrer et al. (2006), Fuentes (2007), Kaufman et al. (2008)
  • These approaches do not focus on uncertainty quantification and can introduce additional uncertainty

For uncertainty quantification of the GP predictor:

  • Bayesian predictive distribution
  • Bootstrap approach (Luna and Young 2003)
  • Both require intensive computation

SLIDE 12

Solve both problems by a unified framework?

  • Yes!


SLIDE 13

A Unified Framework


SLIDE 15

Introduction to confidence distribution (CD)

Statistical inference (parameter estimation):

  • Point estimate
  • Interval estimate
  • Distribution estimate

Example: X1, ..., Xn i.i.d. N(µ, 1)

  • Point estimate: x̄n = (1/n) ∑_{i=1}^n xi
  • Interval estimate: (x̄n − 1.96/√n, x̄n + 1.96/√n)
  • Distribution estimate: N(x̄n, 1/n)

The idea of the CD approach is to use a sample-dependent distribution (or density) function to estimate the parameter of interest.

  • Wide range of examples: bootstrap distributions, (normalized) likelihood functions, p-value functions, fiducial distributions, some informative priors and Bayesian posteriors, among others (Xie and Singh 2013)
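The three kinds of estimates in the N(µ, 1) example can be computed directly; a small sketch assuming numpy/scipy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
mu_true, n = 3.0, 400
x = rng.normal(mu_true, 1.0, size=n)

xbar = x.mean()                                            # point estimate
ci = (xbar - 1.96 / np.sqrt(n), xbar + 1.96 / np.sqrt(n))  # 95% interval estimate
cd = stats.norm(loc=xbar, scale=1.0 / np.sqrt(n))          # distribution estimate N(xbar, 1/n)

# The CD carries both of the others: its median is the point estimate
# and its 2.5%/97.5% quantiles recover the interval estimate.
ci_from_cd = (cd.ppf(0.025), cd.ppf(0.975))
```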

SLIDE 16

Overview: Sequential Split-Conquer-Combine

Figure 2: The Sequential Split-Conquer-Combine approach. The data are split into subsets D1, ..., Dm; in Step a, subset Da is conquered to produce an estimator λ̂a; the individual estimators λ̂1, ..., λ̂m are then combined into λ̂c.


SLIDE 20

Ingredients

  • Split the entire dataset into (correlated) subsets, based on a compactly supported correlation assumption in one dimension
  • Perform a sequential update to create independent subsets, and estimate on each updated subset
  • Combine the estimators
  • Quantify the prediction uncertainty

SLIDE 21

Split

Split the entire dataset into subsets y = {ya}, a = 1, ..., m. Denote the size of ya by na, so that ∑a na = n.

  • Assumption: compactly supported correlation, so that (after sorting the indices according to the X1 values) Σt is block tri-diagonal:

    ⎡ Σ11   Σ12    O     ···        O    ⎤
    ⎢ Σ21   Σ22   Σ23    ···        O    ⎥
    ⎢  ⋮     ⋱     ⋱      ⋱         ⋮    ⎥
    ⎢  O    ···   Σ(m−1)(m−1)   Σ(m−1)m  ⎥
    ⎣  O    ···    O    Σm(m−1)    Σmm   ⎦  (n × n)
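To see why compact support produces exact zero blocks after sorting, here is a toy 1-D sketch; the truncated exponential below is only a stand-in for a genuine compactly supported correlation function (naive truncation does not in general preserve positive definiteness):

```python
import numpy as np

def truncated_exp_corr(x, theta, tau):
    """exp(-theta |xi - xj|), set to exactly 0 beyond the range tau."""
    d = np.abs(x[:, None] - x[None, :])
    return np.where(d < tau, np.exp(-theta * d), 0.0)

x = np.sort(np.random.default_rng(1).uniform(size=30))  # sort by the 1-D input (the X1 values)
Sigma_t = truncated_exp_corr(x, theta=2.0, tau=0.15)
# Entries for points more than tau apart are exactly zero, so after sorting,
# blocks of sufficiently separated points vanish and Sigma_t becomes banded.
```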

SLIDE 22

Sequentially update data

Transform y to y* by sequentially updating

  y*a = ya − La(a−1) y*(a−1),

where L(a+1)a = Σt,(a+1)a Da⁻¹ and Da = Σaa − La(a−1) D(a−1) La(a−1)⊤.

  • The sequential updates are computationally efficient.
  • The updated blocks y*a are independent.

SLIDE 23

Estimation from each subset

Given θ:

  • MLE from the ath subset: β̂a = arg max_β l_t^(a)(β|θ) = (Ca⊤Da⁻¹Ca)⁻¹Ca⊤Da⁻¹y*a
  • An individual CD for the ath updated subset is (cf. Xie and Singh 2013): Np(β̂a, Cov(β̂a))

Given β:

  • MLE from the ath subset: θ̂a = arg max_θ l_t^(a)(θ|β)
  • An individual CD for the ath updated subset is N(θ̂a, Cov(θ̂a))

Significant computational reduction: each Da is much smaller than the original covariance matrix.


SLIDE 25

CD combining

  • Following Singh, Xie and Strawderman (2005), Liu, Liu and Xie (2014), and Yang et al. (2014), a combined CD is Np(β̂c, Sc), where β̂c = (∑a Wa)⁻¹(∑a Wa β̂a) with Wa = Ca⊤Da⁻¹Ca and Sc = Cov(β̂c).
  • A similar framework can be applied to all the parameters (β, θ).

Theorem 1. Under some regularity assumptions, when τ > Op(n^{1/2}) and n → ∞, the SSCC estimator λ̂c = (β̂c, θ̂c) is asymptotically as efficient as the MLE λ̂mle = (β̂mle, θ̂mle).
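The combining step can be sketched as a precision-weighted average (hypothetical helper; the weights Wa stand in for Ca⊤Da⁻¹Ca):

```python
import numpy as np

def combine_cd(beta_hats, W):
    """Combine per-subset CDs N_p(beta_a, W_a^{-1}) with precision weights W_a:
       beta_c = (sum_a W_a)^{-1} (sum_a W_a beta_a),  S_c = (sum_a W_a)^{-1}."""
    W_sum = np.sum(W, axis=0)
    Wb = np.sum([Wa @ ba for Wa, ba in zip(W, beta_hats)], axis=0)
    beta_c = np.linalg.solve(W_sum, Wb)
    S_c = np.linalg.inv(W_sum)
    return beta_c, S_c

# Toy check with two subsets: the more precise subset (W1) dominates.
W1, W2 = 4.0 * np.eye(2), 1.0 * np.eye(2)
b1, b2 = np.array([1.0, 0.0]), np.array([2.0, 1.0])
beta_c, S_c = combine_cd([b1, b2], [W1, W2])
```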


SLIDE 27

GP predictive distribution

  • The GP predictor at a new point x0, given parameters (β, θ), follows a normal distribution with mean p0(β, θ) and variance m0(β, θ), where

    p0(β, θ) = x0⊤β + γ(θ)⊤Σ⁻¹(θ)(y − Xβ)
    m0(β, θ) = σ²(1 − γ(θ)⊤Σ⁻¹(θ)γ(θ)),

    and γ(θ) is an n × 1 vector whose ith element equals φ(‖xi − x0‖; θ).

  • Drawbacks:
    • The computational issue arises again
    • The conventional plug-in predictive distribution underestimates the uncertainty


SLIDE 29

Quantify GP prediction uncertainty using SSCC and CD

An easy-to-compute GP predictor:

  p1(β, θ) = x0⊤β + ∑_{a=1}^m γ*a(θ)⊤Da⁻¹(θ)y*a − ∑_{a=1}^m γ*a(θ)⊤Da⁻¹(θ)Ca(θ)β

  m1(β, θ) = σ²(1 − ∑_{a=1}^m γ*a(θ)⊤Da⁻¹(θ)γ*a(θ))

A better way to quantify uncertainty: use a CD-based predictive distribution function

  Q(y0; y) = ∫_{λ∈Θ} Gλ(y0) dHc(λ; y),

where Gλ(·) is the CDF of N(p1(λ), m1(λ)) and Hc(λ; y) is the joint CD of λ = (β, θ) obtained from the SSCC approach.
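The mixture Q(y0; y) can be approximated by Monte Carlo: draw λ from the combined CD Hc, evaluate the plug-in normal CDF Gλ at each draw, and average. A sketch with hypothetical one-dimensional stand-ins for p1 and m1:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical scalar setup: combined CD H_c for lambda is N(0.5, 0.1^2),
# and p1, m1 are stand-ins for the easy-to-compute predictor mean/variance.
lam_draws = rng.normal(0.5, 0.1, size=20000)      # draws from H_c
p1 = 2.0 + lam_draws                              # predictive means at the draws
m1 = 1.0 + lam_draws**2                           # predictive variances (kept positive)

def Q(y0):
    """Q(y0; y) ~= average over the lambda-draws of G_lambda(y0)."""
    return float(np.mean(stats.norm.cdf(y0, loc=p1, scale=np.sqrt(m1))))
```

Averaging over the CD draws widens the predictive distribution relative to plugging in a single point estimate, which is exactly the underestimation the plug-in approach suffers from.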

SLIDE 30

Theoretical support

Theorem 2. Under some regularity assumptions, when τ > Op(n^{1/2}) and n → ∞,

  p1(β, θ) → p0(β, θ),  m1(β, θ) → m0(β, θ).

  • The easy-to-compute GP predictor is asymptotically equivalent to the original predictor.
  • Compared with the plug-in predictive distribution, the predictive confidence distribution has a smaller average Kullback-Leibler distance to the true predictive distribution (Shen, Liu, and Xie 2017+).

SLIDE 31

Simulation Study

SLIDE 32

Simulation study: setup

Suppose we have the GP model y = Xβ + Z(X), with

  • Sample sizes: n = 1000, 1500, 2000; four covariates (p = 4); x ∈ [0, 1]⁴
  • True parameter values: β = (2, 3, 1, 2, 1.5), θ = (15, 1.5, 2, 3), and σ² = 1
  • The ijth element of Σ(θ) computed by σij = exp(−∑_{k=1}^p θk |xik − xjk|)
  • Number of splits: m = 5
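A sketch of this simulation setup (with a smaller n so the example runs quickly; the model-fitting step is omitted):

```python
import numpy as np

rng = np.random.default_rng(2017)
n, p = 200, 4                                    # smaller n than the study, to keep the sketch fast
theta = np.array([15.0, 1.5, 2.0, 3.0])
beta = np.array([2.0, 3.0, 1.0, 2.0, 1.5])       # intercept + 4 slopes
sigma2 = 1.0

x = rng.uniform(size=(n, p))                     # x in [0, 1]^4
X = np.column_stack([np.ones(n), x])             # design matrix with intercept

# sigma_ij = exp(-sum_k theta_k |x_ik - x_jk|)
d = np.abs(x[:, None, :] - x[None, :, :])
Sigma = np.exp(-d @ theta)

L = np.linalg.cholesky(Sigma + 1e-8 * np.eye(n)) # jitter for numerical stability
y = X @ beta + np.sqrt(sigma2) * (L @ rng.standard_normal(n))
```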

SLIDE 33

Simulation study: analysis results

         n = 1000                              n = 1500                              n = 2000
      MLE          Compact      SSCC        MLE          Compact      SSCC        MLE          Compact      SSCC
β̂0   1.96(0.33)   1.96(0.33)   1.96(0.33)   2.00(0.30)   2.00(0.30)   2.00(0.30)   2.08(0.23)   2.08(0.23)   2.08(0.23)
β̂1   3.03(0.43)   3.02(0.43)   3.02(0.43)   3.01(0.36)   3.01(0.36)   3.01(0.36)   2.90(0.28)   2.90(0.28)   2.90(0.28)
β̂2   1.00(0.23)   1.00(0.23)   1.00(0.23)   0.99(0.21)   0.99(0.21)   0.99(0.21)   1.04(0.20)   1.04(0.20)   1.04(0.20)
β̂3   2.04(0.24)   2.04(0.24)   2.04(0.24)   1.97(0.25)   1.97(0.25)   1.97(0.25)   1.98(0.24)   1.98(0.24)   1.98(0.24)
β̂4   1.53(0.25)   1.53(0.25)   1.53(0.25)   1.52(0.28)   1.52(0.28)   1.52(0.28)   1.49(0.18)   1.49(0.18)   1.49(0.18)
θ̂1  14.72(0.42)  14.72(0.42)  14.80(0.43)  14.65(0.39)  14.65(0.39)  14.65(0.49)  14.69(0.45)  14.74(0.45)  14.79(0.46)
θ̂2   1.49(0.06)   1.49(0.06)   1.51(0.06)   1.49(0.06)   1.49(0.06)   1.50(0.10)   1.49(0.06)   1.49(0.06)   1.49(0.10)
θ̂3   2.00(0.06)   2.00(0.06)   2.01(0.06)   1.98(0.06)   1.98(0.06)   2.00(0.06)   1.99(0.07)   1.99(0.07)   2.00(0.10)
θ̂4   3.01(0.06)   3.01(0.06)   2.99(0.06)   3.00(0.06)   3.00(0.06)   3.00(0.08)   3.01(0.06)   3.01(0.06)   3.00(0.07)
CT  32.46(1.29)  30.61(1.34)   4.54(0.11)  99.66(3.83)  95.90(5.44)  10.39(0.53) 227.63(6.96) 222.18(9.33)  20.32(0.95)

Table 1: Simulation results (mean with standard deviation in parentheses) for n = 1000, 1500, 2000 over 100 replications; CT is computing time in seconds.

SLIDE 34

Simulation study: computing time

Figure: computing time (seconds) versus sample size (n = 1000, 1500, 2000) for the MLE, compact, and SSCC methods.

Remark: the computational complexity is reduced from O(n³) to O(∑a na³), where na is the size of the ath subset and ∑a na = n.

SLIDE 35

Real data example

SLIDE 36

Real data example: IBM data center thermal management

             n = 1800                                n = 3600      n = 26820
Variable   MLE           Compact       SSCC
β1     -8.28(0.10)  -8.29(0.10)  -8.29(0.10)       8.05(0.09)    7.39(0.08)
θ1      0.84(0.01)   0.84(0.01)   0.86(0.01)       0.85(0.01)    0.86(0.01)
β2     -9.00(0.10)  -8.99(0.10)  -8.99(0.10)      10.14(0.09)    9.09(0.08)
θ2      0.76(0.01)   0.76(0.01)   0.79(0.01)       0.77(0.01)    0.76(0.01)
β3     -6.44(0.10)  -6.45(0.10)  -6.45(0.10)       7.08(0.09)    6.59(0.09)
θ3      1.20(0.01)   1.20(0.01)   1.19(0.01)       1.14(0.01)    1.13(0.01)
β4     -5.42(0.11)  -5.41(0.11)  -5.41(0.11)       6.52(0.10)    5.86(0.10)
θ4      1.90(0.01)   1.90(0.01)   1.80(0.01)       1.70(0.01)    1.83(0.01)
β5     -0.08(0.13)  -0.07(0.13)  -0.07(0.13)       0.68(0.13)    0.29(0.13)
θ5      3.50(0.01)   3.50(0.01)   3.40(0.01)       3.39(0.01)    3.50(0.01)
β6     -1.98(0.10)  -1.97(0.10)  -1.97(0.10)       2.12(0.10)    1.79(0.09)
θ6      1.29(0.01)   1.29(0.01)   1.20(0.01)       1.24(0.01)    1.28(0.01)
β7     -3.39(0.06)  -3.41(0.06)  -3.41(0.06)       4.04(0.04)    2.99(0.03)
θ7      0.20(0.01)   0.20(0.01)   0.17(0.01)       0.14(0.01)    0.15(0.01)
β8      2.80(0.08)   2.80(0.08)   2.80(0.08)       1.72(0.07)    0.04(0.06)
θ8      0.60(0.01)   0.60(0.01)   0.50(0.01)       0.62(0.01)    0.50(0.01)
β9     22.33(0.18)  22.35(0.18)  22.35(0.18)      24.75(0.18)   23.90(0.18)
θ9     21.90(0.03)  21.90(0.03)  21.45(0.03)      21.61(0.03)   21.10(0.03)
CT (s)  2768.70      2753.91       55.07           372.67       26257.2

Table 2: Mean, standard deviation, and computing time of the estimates by the MLE, compact, and SSCC methods with subsample sizes n = 1800, 3600 and the entire CFD data (n = 26820); one set of estimates is reported for each of the two larger sample sizes.

SLIDE 37

Real data example: quantification of the prediction uncertainty

Figure 3: Comparison of the CD (SSCC) predictive density and the plug-in predictive density at height = 0 when n = 1800.

SLIDE 38

Summary

SLIDE 39

Summary

  • Introduced a unified framework for GP modeling that dramatically reduces computation and also provides better quantification of the prediction uncertainty:
    • A Sequential Split-Conquer-Combine (SSCC) procedure
    • CD combining
    • A predictive confidence distribution
  • Provides asymptotically equivalent estimation
  • Reduces computational complexity from O(n³) to O(∑a na³), where na is the size of the ath subset and ∑a na = n

SLIDE 40

Thank You!


SLIDE 41

A General Framework of CD Combining

  • Univariate CD combining (Singh, Xie and Strawderman 2005): suppose H1(θ), ..., Hk(θ) are CDs from k studies.
  • Take a continuous function gc(u1, ..., uk): [0, 1]^k → R¹ that is monotonic in each coordinate, and define Gc(t) = P(gc(U1, ..., Uk) ≤ t), where U1, ..., Uk are i.i.d. U(0, 1) random variables.
  • Combined CD function: Hc(θ) = Gc{gc(H1(θ), ..., Hk(θ))} is a CD function for the parameter θ.
  • Multivariate CD combining: Liu, Liu and Xie (2014) and Yang et al. (2014).
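A common concrete choice is gc(u1, ..., uk) = Φ⁻¹(u1) + ... + Φ⁻¹(uk), for which Gc is the CDF of N(0, k). A sketch for combining k normal CDs (illustrative helper name):

```python
import numpy as np
from scipy import stats

def combine_normal_cds(mus, ss):
    """Combine k normal CDs H_i(theta) = Phi((theta - mu_i)/s_i) using
    g_c(u_1, ..., u_k) = sum_i Phi^{-1}(u_i), so G_c is the CDF of N(0, k)."""
    mus, ss = np.asarray(mus, float), np.asarray(ss, float)
    k = len(mus)
    def Hc(theta):
        z = np.sum((theta - mus) / ss)            # g_c(H_1(theta), ..., H_k(theta))
        return float(stats.norm.cdf(z / np.sqrt(k)))
    return Hc

Hc = combine_normal_cds([0.9, 1.1], [0.2, 0.2])
# With equal s_i, the combined CD is centered at the average of the mu_i.
```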

SLIDE 42

Introduction to computer experiments

  • If either
    (i) the components of the process of interest and their interactions are adequately understood, so that the process can be simulated, or
    (ii) the physics of the process is sufficiently well understood to be described by a mathematical model relating the response to the potential factors that affect the outputs,
    then the computer code/simulation can serve as a proxy for the physical process.
  • Features of computer experiments:
    (i) deterministic outputs;
    (ii) traditional principles used in designing physical experiments (e.g., randomization, blocking, replication) are irrelevant;
    (iii) high-dimensional inputs and time-consuming runs.