SLIDE 1
Tutorials on the Gaussian Random Process and its OR Applications
By
Juta Pichitlamken
Department of Industrial Engineering Kasetsart University, Bangkok juta.p@ku.ac.th http://pirun.ku.ac.th/∼fengjtp/
September 3, 2009
Operations Research Network of Thailand (OR-NET 2009)
SLIDE 2 Motivation: Dagstuhl Seminar
- Near Frankfurt, Germany. Web: http://www.dagstuhl.de/
- Dagstuhl seminars concentrate on computer science, but the one I attended was on
Sampling-based Optimization in the Presence of Uncertainty.
- Organized by Jürgen Branke (Universität Karlsruhe, Germany),
Barry Nelson (Northwestern University, US), Warren Powell (Princeton University, US), and Thomas J. Santner (Ohio State University, US).
- Small workshop of 20-30 people (invitation only) from different
fields (Statistics, Simulation, OR, Computer Sciences, Business Administration, Mathematics, ...).
- Despite this diversity, many people use Gaussian Random
Processes (GP) as modeling tools.
SLIDE 3 Gaussian Process (GP) Application: Spatial Statistics
- Model spatial distribution of environmental or socioeconomic data (e.g.,
geographical distributions of cases of type-A (H1N1) influenza, or data from Geographic Information System (GIS)) for statistical inference.
- Intuition: “Everything is related to everything else, but near things are more
related than distant things.” (Waldo Tobler)
- GP prediction (known as kriging) can be used to model this spatial
dependency, i.e., spatial data is viewed as a realization of a stochastic process.
- To build a model, the stochastic process is assumed stationary: the mean is
constant, and the covariance depends only on the distance between points.
Covariance is a measure of how much two variables change together:
Cov(X, Y) = E[(X − µX)(Y − µY)].
- Classic textbook: Cressie, N.A.C. 1993. Statistics for Spatial Data. Wiley.
Source: Câmara, G., A.M. Monteiro, S.D. Fucks, and M.S. Carvalho. 2004. Spatial Analysis and GIS Primer. Downloadable from www.dpi.inpe.br/gilberto/spatial analysis.html
SLIDE 4 GP Application: Metamodeling of Deterministic Responses
- In OR, a metamodel is a mathematical model of a set of related models
(Xavier 2004).
- Design and analysis of computer experiments: Data are generated from a
computer code (e.g., finite element models) whose responses are deterministic; computation may be time consuming; and large number of factors are involved.
- Given the training data {(xi, yi)}, i = 1, . . . , n, we want to find an approximation
(metamodel) to the computer code. These are called surrogates in the (deterministic) global optimization literature.
- Ex: a computational fluid dynamics (CFD) model to study the oil mist
separator system in the internal combustion engine (Satoh, Kawai, Ishikawa, and Matsuoka 2000).
- Computer models with multiple levels of fidelity for optimization: Huang,
Allen, Notz, and Miller (2006).
- Textbooks: Fang K.-T., R. Li, and A. Sudjianto. 2006. Design and
Modeling for Computer Experiments. Taylor and Francis. Santner, T.J., B.J. Williams, and W. I. Notz. 2003. The Design and Analysis of Computer Experiments. Springer-Verlag.
SLIDE 5 GP Application: Metamodeling of Probabilistic Responses Probabilistic responses (e.g., discrete-event simulation outputs). Approaches:
- 1. Consider the variability in the observed responses and fitting
errors simultaneously: Ankenman, Nelson, and Staum (2008).
- 2. Separate the trend in the data using least squares models.
Then, apply kriging models to model the residuals: van Beers and Kleijnen (2003, 2007).
SLIDE 6 GP Application: Machine learning
- Supervised learning: learn f: input → output from empirical data (the training data set).
– For continuous outputs (aka responses or dependent variables), f is called regression. Ex: in manufacturing systems, WIP = f(throughput).
– For discrete outputs, f is known as classification. Ex: classification of handwritten images into digits (0-9).
- Approach: Assume prior distributions (beliefs over the types of f we will
observe, e.g., mean and variance) to be GP. Once we get actual data, we
can reject f that do not agree with the data, i.e., find the GP parameters that best fit the data.
- Desirable properties of the prior, such as smoothness and stationarity, are controlled by the covariance function.
- Resources:
MacKay, D. 2002. Information Theory, Inference & Learning Algorithms. Cambridge University Press.
Rasmussen, C.E., and C.K.I. Williams. 2006. Gaussian Processes for Machine Learning. MIT Press. Also available online: www.gaussianprocess.org/gpml
SLIDE 7 Gaussian Process Regression Demo
Thanks to Rasmussen, C.E., and C.K.I. Williams. 2006. GPML Code.
Sample Y from a zero-mean Gaussian process with the exponential covariance function
Cov(Y(xp), Y(xq)) = σf² exp(−(xp − xq)²/(2ℓ²)) + σn² 1{xp = xq}.  (1)
Note:
- The covariance between outputs is a function of the inputs.
- If xp ≈ xq, then Corr(Y(xp), Y(xq)) ≈ 1.
- Hyperparameters are the length-scale ℓ = 1, signal variance σf² = 1, and noise variance σn² = 0.01.
(A small sampling sketch follows.)
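Not the GPML Matlab demo itself, but a minimal Python/numpy sketch of sampling one realization of Y on a grid of inputs from a zero-mean GP with covariance (1), using the hyperparameters above (σn² = 0.01, so σn = 0.1); names and the input grid are illustrative.

import numpy as np

def sq_exp_cov(x1, x2, ell=1.0, sigma_f=1.0, sigma_n=0.1):
    # sigma_f^2 * exp(-(x1 - x2)^2 / (2 ell^2)), plus sigma_n^2 on matching inputs
    d2 = (x1[:, None] - x2[None, :]) ** 2
    K = sigma_f**2 * np.exp(-d2 / (2.0 * ell**2))
    if x1 is x2:
        K = K + sigma_n**2 * np.eye(len(x1))   # noise term sigma_n^2 1{xp = xq}
    return K

x = np.linspace(-5.0, 5.0, 100)                # grid of inputs
K = sq_exp_cov(x, x)                           # covariance matrix from (1)
y = np.random.multivariate_normal(np.zeros(len(x)), K)   # one realization of Y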
SLIDE 8 Gaussian Process Regression Demo (Cont’d) Consider the following cases:
- 1. Fit a GP assuming zero mean and using the true covariance (1).
- 2. Shorten the length-scale to 0.3 (σf = 1.08, σn = 0.00005); the noise level is
much reduced.
- 3. Lengthen the length-scale to 3 (σf = 1.16, σn = 0.89); the noise level is
higher.
- 4. Assume the hyperparameters are unknown and estimated as the maximizer
of the posterior probability given the training data, using the same
covariance family as (1).
- 5. Assume the hyperparameters are unknown, using the Matérn covariance family.
Directory: D:\GP\gpml_matlab\gpml-demo
SLIDE 9 Definition of GP
- A probability distribution characterizes a random variable (scalar
or vector). A stochastic process governs the properties of
functions.
- GP is a collection of random variables, any finite number of
which have a joint Gaussian distribution.
- Suppose that X ⊆ Rd has positive d-dimensional volume. We
say that Y(x), for x ∈ X, is a Gaussian process if for any L ≥ 1 and any choice of x1, x2, . . . , xL, the vector (Y(x1), Y(x2), . . . , Y(xL)) has a multivariate normal distribution.
- Other examples of GP: Brownian motion processes and
Kalman filters.
SLIDE 10 Review of Joint, Marginal, and Conditional Probability
Suppose we partition y into two groups, yA and yB, so that the joint probability is p(y) = p(yA, yB).
Marginal probability of yA: p(yA) = ∫ p(yA, yB) dyB.
Conditional probability: p(yA|yB) = p(yA, yB) / p(yB), for p(yB) > 0.
Using the definitions of both p(yA|yB) and p(yB|yA), we get Bayes' theorem:
p(yA|yB) = p(yA) p(yB|yA) / p(yB).
In Bayesian terminology, this is posterior = likelihood × prior / marginal likelihood.
SLIDE 11 The Univariate Normal Distribution
Consider Z ∼ Normal with mean 0 and variance 1 (N(0, 1)). We can transform Z to have mean µ and variance σ²: Y = σZ + µ, where Y is also normal with density
p(y|µ, σ²) = (1/(√(2π) σ)) exp(−(y − µ)²/(2σ²)).
SLIDE 12 The Multivariate Normal Distribution (MVN)
Consider Zi ∼ N(0, 1), i = 1, 2, . . . , d, and group them into a random vector:
Z = (Z1, Z2, . . . , Zd)′ ∼ Nd(0, I).
Similar to the N(0, 1) case, we can transform Z to have mean vector µ and covariance matrix Σ:
Y = Σ^{1/2} Z + µ, where Σ^{1/2} Σ^{1/2} = Σ, a positive definite matrix.
Y is normal with density
p(y|µ, Σ) = (1/((2π)^{d/2} |Σ|^{1/2})) exp(−(1/2)(y − µ)′ Σ⁻¹ (y − µ)).
[Figure: bivariate normal densities with small |Σ| vs. large |Σ|.]
(A small sampling sketch follows.)
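A minimal Python sketch of the transformation Y = Σ^{1/2}Z + µ; here a Cholesky factor L with LL′ = Σ serves as the square root, and the values of µ and Σ are illustrative.

import numpy as np

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])          # symmetric, positive definite
L = np.linalg.cholesky(Sigma)           # L @ L.T == Sigma, one valid "square root"
Z = np.random.standard_normal((2, 10000))
Y = mu[:, None] + L @ Z                 # each column is one draw from N(mu, Sigma)
print(np.cov(Y))                        # sample covariance should be close to Sigma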
SLIDE 13 The Multivariate Normal Distribution (cont'd)
Suppose that W = (W1′, W2′)′ ∼ N((µ1′, µ2′)′, Σ), where Σ is partitioned into blocks Σ1,1, Σ1,2, Σ2,1, Σ2,2.
Then the marginal distribution of W1 is N(µ1, Σ1,1), and the conditional distribution is
[W1|W2] ∼ Nm( µ1 + Σ1,2 Σ2,2⁻¹ (W2 − µ2), Σ1,1 − Σ1,2 Σ2,2⁻¹ Σ2,1 ).  (2)
The product of two Gaussians gives another (unnormalized) Gaussian:
Nd(x|a, A) Nd(x|b, B) = Z⁻¹ Nd(x|c, C),
where c = C(A⁻¹a + B⁻¹b), C = (A⁻¹ + B⁻¹)⁻¹, and
Z⁻¹ = (2π)^{−d/2} |A + B|^{−1/2} exp(−(1/2)(a − b)′(A + B)⁻¹(a − b)).
(A sketch of the conditional computation (2) follows.)
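A minimal Python sketch of the conditional distribution (2); this is exactly the computation behind the GP predictors later on. The partition sizes and numbers below are illustrative.

import numpy as np

def mvn_conditional(mu1, mu2, S11, S12, S22, w2):
    # [W1 | W2 = w2] ~ N(mu1 + S12 S22^{-1}(w2 - mu2), S11 - S12 S22^{-1} S21)
    cond_mean = mu1 + S12 @ np.linalg.solve(S22, w2 - mu2)
    cond_cov = S11 - S12 @ np.linalg.solve(S22, S12.T)
    return cond_mean, cond_cov

# Illustrative 1-1 partition of a bivariate normal with correlation 0.9:
mu1, mu2 = np.array([0.0]), np.array([0.0])
S11, S12, S22 = np.array([[1.0]]), np.array([[0.9]]), np.array([[1.0]])
m, C = mvn_conditional(mu1, mu2, S11, S12, S22, w2=np.array([2.0]))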
SLIDE 14 GP Characterization
Mostly, we consider strongly stationary GPs: (Y(x1), Y(x2), . . . , Y(xL)) has the same distribution as (Y(x1 + h), Y(x2 + h), . . . , Y(xL + h)), i.e., the process is invariant to translation. Therefore, the distribution of Y(x), its mean, and its covariance are the same for all x, and the covariance function must depend only on x1 − x2. Thus, we can characterize a GP by:
- 1. Mean function E[Y(x)]. Generally, we consider E[Y(x)] = 0.
- 2. Process variance C(0) and correlation function R(x1 − x2), or covariance function C(x1 − x2).
Because correlation functions dictate the behavior of the GP (e.g., continuity and differentiability), certain families of correlation functions are generally used. Specification of C(x1 − x2) implies a distribution over functions.
SLIDE 15 Correlation or Covariance Functions
- Recall the intuition about GP: “Points with inputs x which are
close are likely to have similar responses y.” The correlation or the covariance function defines this similarity.
- Covariance matrices Σ must be symmetric and positive
definite, i.e., x′Σx > 0 for all x ≠ 0.
- Correlation functions also imply the smoothness (and thus
differentiability) of predictors.
- Mostly, we consider stationary correlation functions, ones that
depend only on x − x′.
- Commonly used in the DACE community: power exponential and
Matérn.
SLIDE 16 GP Regression
Example: To forecast housing demand as a function of interest rates. Goal: Given training data {(xi, Yi)}, i = 1, . . . , n, find the relationship between x and Y.
Standard linear model: Y(x) = f(x) + ε, where f(x) = x′β, e.g., f(x) = β0 + β1x. Let X be the n × d design matrix. Assume ε ∼ N(0, σn²). We have the likelihood function
p(y|X, β) = ∏i=1,...,n p(yi|xi, β) ⇒ Y|X, β ∼ N(Xβ, σn² I).  (3)
Suppose, before we see the data, we believe that (i.e., the prior distribution)
β ∼ Nd(0, Σp).  (4)
Given the data, we want to know the distribution of β, i.e., the posterior distribution:
p(β|y, X) = p(y|X, β) p(β) / p(y|X).
The marginal likelihood p(y|X) is independent of β, so consider only (3) and (4):
p(β|X, y) ∝ exp(−(1/(2σn²))(y − Xβ)′(y − Xβ)) exp(−(1/2) β′Σp⁻¹β)
⇒ [β|X, y] ∼ Nd((1/σn²) A⁻¹X′y, A⁻¹), where A = (1/σn²) X′X + Σp⁻¹.
SLIDE 17 GP Regression Predictors
Because [β|X, y] ∼ Nd((1/σn²) A⁻¹X′y, A⁻¹), with A = (1/σn²) X′X + Σp⁻¹:
- In our usual multiple regression, Σp⁻¹ = 0 (an uninformative prior); β̂ becomes
(X′X)⁻¹X′y, the ordinary least squares estimate of β.
- In the Bayesian approach, the prediction of f(x0) = f0 is obtained by
averaging over the Gaussian posterior:
[f0|x0, X, y] ∼ N(x0′β̂, x0′A⁻¹x0).
Note that we get a predictive distribution, not just one predicted value:
f̂(x0) = x0′β̂ and Var(f̂(x0)) = x0′A⁻¹x0, so we can construct a
prediction interval. (A small numerical sketch follows.)
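A minimal Python sketch of the posterior of β and the predictive distribution of f(x0) from Slides 16-17; the design matrix, prior Σp, σn, and the test point x0 below are all illustrative.

import numpy as np

def bayes_linreg_posterior(X, y, Sigma_p, sigma_n):
    # [beta | X, y] ~ N(sigma_n^{-2} A^{-1} X'y, A^{-1}), A = sigma_n^{-2} X'X + Sigma_p^{-1}
    A = X.T @ X / sigma_n**2 + np.linalg.inv(Sigma_p)
    A_inv = np.linalg.inv(A)
    beta_hat = A_inv @ X.T @ y / sigma_n**2
    return beta_hat, A_inv

def predictive(x0, beta_hat, A_inv):
    # [f0 | x0, X, y] ~ N(x0' beta_hat, x0' A^{-1} x0)
    return x0 @ beta_hat, x0 @ A_inv @ x0

# Illustrative data with d = 2 (intercept and slope), n = 5:
X = np.column_stack([np.ones(5), np.arange(5.0)])
y = np.array([0.9, 2.1, 2.9, 4.2, 5.1])
beta_hat, A_inv = bayes_linreg_posterior(X, y, Sigma_p=np.eye(2), sigma_n=0.5)
mean0, var0 = predictive(np.array([1.0, 2.5]), beta_hat, A_inv)
# A 95% prediction interval for f(x0) is mean0 +/- 1.96 * sqrt(var0).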
SLIDE 18
Generalization of GP Regression
Projections of inputs into feature space, e.g., a scalar x → φ(x) = (1, x, x², x³, . . .)′, give a polynomial regression. Consider the basis function φ(x): Rd → RN. We now have Y(x) = φ(x)′β + ε. To get the predictive distribution, the analysis is the same, but we replace
X ← Φ(X) (whose columns are φ(x) for x in the training set), and
x0 ← φ(x0).
SLIDE 19 Design and Analysis of Computer Experiments (DACE): Basics
Given the training data {(xi, Y(xi))}, i = 1, . . . , n, where Y(·) is deterministic, predict Y(x0) ≡ Y0. Let Yn = (Y(x1), Y(x2), . . . , Y(xn))′ and (Y0, Yn) ∼ F.
- A predictor Ŷ0 = Ŷ(x0) is unbiased if EF[Ŷ(x0)] = EF[Y0].
- Mean squared prediction error: MSPE(Ŷ0, F) = EF[(Ŷ0 − Y0)²].
- Theorem 3.2.1 in Santner et al. (2003):
E[Y0|Yn] is the best MSPE predictor (minimum MSPE) of Y0.
SLIDE 20 DACE: Predicting Outputs from Computer Experiments
Consider another regression model: Y(xi) = φ(xi)′β + Z(xi), 0 ≤ i ≤ n, where φ(·) are known regression functions, β is a p × 1 vector, and Z(x) is a GP with zero mean and covariance Cov(Z(xi), Z(xj)) = σZ² R(xi − xj).
Then (Y0, Yn′)′ is multivariate normal with mean (φ(x0)′β, (Φβ)′)′ and covariance σZ² [1, r0′; r0, R],
where Φij = φj(xi), r0 = (R(x0 − x1), . . . , R(x0 − xn))′, and Rij = R(xi − xj). From Theorem 3.2.1 and the result on the conditional distribution of the MVN (2):
Ŷ0 = E[Y0|Yn] = φ(x0)′β + r0′R⁻¹(Yn − Φβ).
- If a prior on β is given, then
Ŷ0 = φ(x0)′E[β|Yn] + r0′R⁻¹(Yn − ΦE[β|Yn]);
e.g., if [β] ∼ uniform (an uninformative prior), then
[β|Yn] ∼ Np((Φ′R⁻¹Φ)⁻¹Φ′R⁻¹Yn, σZ²(Φ′R⁻¹Φ)⁻¹).  (5)
SLIDE 21 DACE: Prediction When the Correlation Function is Known
We consider the properties of this predictor,
Ŷ0 = φ(x0)′β̂ + r0′R⁻¹(Yn − Φβ̂),  (6)
where, from (5), β̂ = (Φ′R⁻¹Φ)⁻¹Φ′R⁻¹Yn:
- 1. It interpolates the training data: if x0 = xi for some i, 1 ≤ i ≤ n, then Ŷ0 = Yi.
- 2. It is a linear unbiased predictor (LUP) of Y(x0).
- 3. The behavior (e.g., smoothness) of Ŷ(x0) depends on x0 and R(x0 − xi).
(A code sketch of (5) and (6) follows.)
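A minimal Python sketch of the predictor (6) with β̂ computed as in (5); Φ, R, r0, and Yn follow the notation above and are assumed to be supplied by the user.

import numpy as np

def kriging_predict(phi0, Phi, R, r0, Yn):
    # beta_hat = (Phi' R^{-1} Phi)^{-1} Phi' R^{-1} Yn, from (5)
    R_inv_Yn = np.linalg.solve(R, Yn)
    R_inv_Phi = np.linalg.solve(R, Phi)
    beta_hat = np.linalg.solve(Phi.T @ R_inv_Phi, Phi.T @ R_inv_Yn)
    # Y0_hat = phi(x0)' beta_hat + r0' R^{-1} (Yn - Phi beta_hat), from (6)
    return phi0 @ beta_hat + r0 @ np.linalg.solve(R, Yn - Phi @ beta_hat)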
SLIDE 22 DACE: Prediction When the Correlation Function is Unknown
We use correlation estimates (r̂0 and R̂) in (6) to get the empirical BLUP (EBLUP) predictor
Ŷ0 = φ(x0)′β̂ + r̂0′R̂⁻¹(Yn − Φβ̂),
where β̂ = (Φ′R̂⁻¹Φ)⁻¹Φ′R̂⁻¹Yn.
We assume the form (i.e., family) of the correlation function, but we still need to estimate its parameters, say ψ. For example, the exponential correlation function
R(h, ψ) = exp(−Σj=1,...,d |hj/θj|^pj)
has 2d parameters, ψ = (θ1, . . . , θd, p1, . . . , pd). (A small sketch of this correlation family follows.)
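A minimal Python sketch of this exponential correlation family for a single pair of inputs; the ψ values below are illustrative, and pj = 2 gives the Gaussian correlation.

import numpy as np

def power_exp_corr(x1, x2, theta, p):
    # R(h, psi) = exp(-sum_j |h_j / theta_j|^{p_j}), h = x1 - x2, psi = (theta, p)
    h = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.exp(-np.sum(np.abs(h / theta) ** p)))

# Illustrative psi for d = 2: theta = (1.0, 0.5), p = (2.0, 2.0)
r = power_exp_corr([0.0, 0.0], [0.3, 0.1],
                   theta=np.array([1.0, 0.5]), p=np.array([2.0, 2.0]))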
SLIDE 23 DACE: Estimating Parameters for Correlation Functions
To estimate ψ (except for the cross-validation method), we assume
[Yn | β, σZ², ψ] ∼ Nn(Φβ, σZ² R).  (7)
- 1. Maximum likelihood EBLUPs: Most popular. Under assumption (7), we
can write a log likelihood function ℓ(β, σZ², ψ), from which the MLEs of β and
σZ² are determined. These β̂ and σ̂Z² are substituted back into ℓ(ψ).
- 2. Restricted maximum likelihood EBLUPs: To get rid of the unknown β,
transform Yn → W ≡ CYn, so that its mean is a zero vector. Under assumption (7), we have the likelihood of W.
- 3. Cross-validation EBLUPs: Let Ŷ−i(ψ) be the predictor when the ith training
point is left out. Find ψ that minimizes MSPE = Σi=1,...,n (Ŷ−i(ψ) − y(xi))².
- 4. Posterior mode EBLUPs: E[Y(x0)|Yn, ψ̂], where ψ̂ maximizes
[ψ|Yn] ∝ [Yn|ψ][ψ]. But finding the right choice of prior for ψ is non-trivial.
From the likelihood or MSPE expressions, we can do a steepest descent search, which uses gradients, to find the optimal ψ. Santner et al. (2003) recommend MLE-EBLUP or REML-EBLUP based on the power exponential R(·). Other considerations: the choice of training sites and the size of the training data. (A sketch of the MLE approach follows.)
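A minimal Python sketch of option 1 (MLE-EBLUP) under assumption (7): fix pj = 2 (Gaussian correlation), profile out β and σZ², and minimize the negative concentrated log likelihood over θ with a derivative-free search rather than the steepest descent mentioned above. X, Yn, and Φ are assumed to be the training design, responses, and regression matrix.

import numpy as np
from scipy.optimize import minimize

def neg_concentrated_loglik(log_theta, X, Yn, Phi):
    # Negative log likelihood of (7) with beta and sigma_Z^2 profiled out (p_j = 2)
    theta = np.exp(log_theta)                               # keep length-scales positive
    diff = X[:, None, :] - X[None, :, :]
    R = np.exp(-np.sum(np.abs(diff / theta) ** 2, axis=2))  # correlation matrix R(psi)
    R = R + 1e-10 * np.eye(len(X))                          # numerical jitter
    R_inv_Phi = np.linalg.solve(R, Phi)
    beta_hat = np.linalg.solve(Phi.T @ R_inv_Phi, Phi.T @ np.linalg.solve(R, Yn))
    resid = Yn - Phi @ beta_hat
    sigma2_hat = resid @ np.linalg.solve(R, resid) / len(Yn)
    _, logdet = np.linalg.slogdet(R)
    return 0.5 * (len(Yn) * np.log(sigma2_hat) + logdet)

# Usage (X, Yn, Phi supplied by the user):
# theta_hat = np.exp(minimize(neg_concentrated_loglik, x0=np.zeros(X.shape[1]),
#                             args=(X, Yn, Phi), method="Nelder-Mead").x)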
SLIDE 24
Stochastic Kriging
Standard kriging considers deterministic responses; a typical model is Y(x) = φ(x)′β + Z(x). In stochastic kriging, responses or observations have noise; thus, Ankenman, Nelson, and Staum (2008) propose the following model:
Yj(x) = φ(x)′β + Z(x) + εj(x),
where Yj(x) is the jth observation at x, and Z(x) and ε(x) are independent. Two types of uncertainty: extrinsic uncertainty Z(x) and intrinsic uncertainty ε(x).
Training data: (xi, {Y1(xi), . . . , Yni(xi)}), for training site i, i = 1, 2, . . . , n.
SLIDE 25 Stochastic Kriging Results
For the case φ(xi)′β = β0 (common in DACE), with β0, ΣZ, and Σε known, the MSE-optimal predictor is
Ŷ(x0) = β0 + ΣZ(x0, ·)′ [ΣZ + Σε]⁻¹ (Ȳ − β0 1n),  (8)
where Ȳ = (Ȳ(x1), Ȳ(x2), . . . , Ȳ(xn))′ and 1n is the n × 1 vector of ones.
Insights:
- The intrinsic uncertainty impacts the prediction everywhere.
- The choice of training sites and their sample sizes influence the MSE.
For the case that Z(x) ∼ GP(0, ΣZ), the εj(xi) are i.i.d. N(0, V(xi)), independent of εh(x) for h ≠ j, and
Σε = Diag(V(x1)/n1, V(x2)/n2, . . . , V(xn)/nn),
the predictor (8) with Σε ← Σ̂ε is unbiased. (A small sketch of (8) follows.)
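A minimal Python sketch of the stochastic kriging predictor (8), assuming β0, ΣZ, ΣZ(x0, ·), and Σε (or their estimates) are already available; in practice V(xi) is estimated from the ni replications at each training site.

import numpy as np

def stochastic_kriging_predict(beta0, SigmaZ_x0, SigmaZ, Sigma_eps, Ybar):
    # Y_hat(x0) = beta0 + SigmaZ(x0,.)' [SigmaZ + Sigma_eps]^{-1} (Ybar - beta0 * 1)
    rhs = Ybar - beta0 * np.ones(len(Ybar))
    return beta0 + SigmaZ_x0 @ np.linalg.solve(SigmaZ + Sigma_eps, rhs)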
SLIDE 26 GP Classification
- Ex: Classification of handwritten digits into 0-9 (10 classes).
Model images with tensors (x is a multi-dimensional array). Many real-world applications, e.g., classification of satellite images or images from medical scans.
- Focus on probabilistic classification: predictions take the form
of class probabilities.
- Output y is discrete (class labels), so the Gaussian likelihood
p(y|x) is not appropriate.
- Approach for two-class classification (+1 and −1): similar to
GP regression, but model a latent function and transform it to get p(y = +1|x) (with range [0, 1]) by using
– the logistic function λ(z) = 1/(1 + exp(−z)) → logistic regression, or
– the standard normal cdf Φ(z) → probit regression.
(A small sketch of both links follows.)
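A minimal Python sketch of the two link ("squashing") functions; both map a latent GP value z to a probability in [0, 1].

import numpy as np
from scipy.stats import norm

def logistic(z):
    # lambda(z) = 1 / (1 + exp(-z)): logistic (logit) link
    return 1.0 / (1.0 + np.exp(-z))

def probit(z):
    # Phi(z), the standard normal cdf: probit link
    return norm.cdf(z)

print(logistic(0.0), probit(0.0))   # both give 0.5 at z = 0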
SLIDE 27 Resources
- Comprehensive listing: www.gaussianprocess.org
- MATLAB code:
– Rasmussen, C.E., and C.K.I. Williams. 2006. Gaussian Processes for Machine Learning. GPML Code: www.gaussianprocess.org/gpml/code/matlab/doc/
– Lophaven, S.N., H.B. Nielsen, and J. Søndergaard. 2002. DACE: A Matlab Kriging Toolbox. www2.imm.dtu.dk/∼hbn/dace/
- In other languages (e.g., C, C++, or Octave, a Matlab-equivalent
freeware): www.gaussianprocess.org and the PeRK toolbox for DACE: www.stat.osu.edu/∼comp exp/book.html
SLIDE 28
References