Lecture 12: Gaussian Process Models
10/16/2018
Multivariate Normal

Multivariate Normal Distribution

An $n$-dimensional multivariate normal distribution with mean $\mu$ and covariance $\Sigma$ (positive semidefinite) can be written as

$$\underset{n \times 1}{Z} \sim N(\underset{n \times 1}{\mu},\ \underset{n \times n}{\Sigma}) \quad \text{where} \quad \{\Sigma\}_{ij} = \sigma^2_{ij} = \rho_{ij}\,\sigma_i\,\sigma_j$$

$$\begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_n \end{pmatrix},\ \begin{pmatrix} \rho_{11}\sigma_1\sigma_1 & \cdots & \rho_{1n}\sigma_1\sigma_n \\ \vdots & \ddots & \vdots \\ \rho_{n1}\sigma_n\sigma_1 & \cdots & \rho_{nn}\sigma_n\sigma_n \end{pmatrix} \right)$$
Density

For the $n$-dimensional multivariate normal given on the last slide, the density is

$$(2\pi)^{-n/2}\,\det(\Sigma)^{-1/2}\,\exp\!\left(-\frac{1}{2}\,\underset{1 \times n}{(Z-\mu)'}\ \underset{n \times n}{\Sigma^{-1}}\ \underset{n \times 1}{(Z-\mu)}\right)$$

and the log density is

$$-\frac{n}{2}\log 2\pi \;-\; \frac{1}{2}\log\det(\Sigma) \;-\; \frac{1}{2}\,(Z-\mu)'\,\Sigma^{-1}\,(Z-\mu)$$
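The log-density formula above is straightforward to evaluate numerically. A minimal NumPy sketch (the function name `mvn_logpdf` and the sanity-check values are ours, not from the lecture):

```python
import numpy as np

def mvn_logpdf(z, mu, Sigma):
    """Log density of an n-dimensional multivariate normal:
    -n/2 log(2 pi) - 1/2 log det(Sigma) - 1/2 (z - mu)' Sigma^{-1} (z - mu)."""
    n = len(mu)
    diff = z - mu
    quad = diff @ np.linalg.solve(Sigma, diff)  # avoids forming Sigma^{-1} explicitly
    _, logdet = np.linalg.slogdet(Sigma)        # stable log determinant
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

# sanity check: a 1-d standard normal at 0 has log density -log(sqrt(2 pi))
val = mvn_logpdf(np.array([0.0]), np.array([0.0]), np.array([[1.0]]))
```

Using `solve` and `slogdet` rather than an explicit inverse and determinant is the usual choice for numerical stability.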
Sampling

To generate draws from an $n$-dimensional multivariate normal with mean $\mu$ and covariance matrix $\Sigma$:

- Find a matrix $A$ such that $\Sigma = A\,A^t$; most often we use $A = \text{Chol}(\Sigma)$, where $A$ is a lower triangular matrix.
- Draw $n$ iid unit normals ($N(0,1)$) as $z$.
- Obtain multivariate normal draws using $Y = \mu + A\,z$.
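The three steps above can be sketched directly in NumPy (the function name and the test mean/covariance are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def rmvnorm(n_draws, mu, Sigma):
    """Draw from N(mu, Sigma) via the Cholesky factorization Sigma = A A'."""
    A = np.linalg.cholesky(Sigma)                 # lower triangular factor
    z = rng.standard_normal((len(mu), n_draws))   # iid N(0, 1) draws
    return (mu[:, None] + A @ z).T                # one multivariate draw per row

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
draws = rmvnorm(100_000, mu, Sigma)
```

With many draws, the sample mean and sample covariance should recover `mu` and `Sigma` closely.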
Bivariate Example

$$\mu = \begin{pmatrix}0\\0\end{pmatrix} \qquad \Sigma = \begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}$$

[Figure: scatter plots of bivariate normal draws for $\rho = \pm 0.1, \pm 0.5, \pm 0.7, \pm 0.9$]
Marginal distributions

Proposition: for an $n$-dimensional multivariate normal with mean $\mu$ and covariance matrix $\Sigma$, any marginal or conditional distribution of the $z$'s will also be (multivariate) normal.

For a univariate marginal distribution,

$$z_i \sim N(\mu_i,\ \Sigma_{ii})$$

For a bivariate marginal distribution,

$$Z_{ij} \sim N\left(\begin{pmatrix}\mu_i\\ \mu_j\end{pmatrix},\ \begin{pmatrix}\Sigma_{ii} & \Sigma_{ij}\\ \Sigma_{ji} & \Sigma_{jj}\end{pmatrix}\right)$$

For a $k$-dimensional marginal distribution,

$$Z_{i,\ldots,k} \sim N\left(\begin{pmatrix}\mu_i\\ \vdots\\ \mu_k\end{pmatrix},\ \begin{pmatrix}\Sigma_{ii} & \cdots & \Sigma_{ik}\\ \vdots & \ddots & \vdots\\ \Sigma_{ki} & \cdots & \Sigma_{kk}\end{pmatrix}\right)$$
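In code, the proposition makes marginalization trivial: just subset the mean vector and covariance matrix. A small illustration (the numeric values are made up):

```python
import numpy as np

def marginal(mu, Sigma, idx):
    """Marginal distribution of the components listed in idx:
    subset mu, and take the corresponding rows/columns of Sigma."""
    idx = np.asarray(idx)
    return mu[idx], Sigma[np.ix_(idx, idx)]

mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.5]])
m, S = marginal(mu, Sigma, [0, 2])
```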
Conditional Distributions

If we partition the $n$ dimensions into two pieces such that $Z = (Z_1, Z_2)^t$, then

$$\underset{n \times 1}{Z} \sim N\left(\underset{n \times 1}{\begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix}},\ \underset{n \times n}{\begin{pmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{pmatrix}}\right)$$

with

$$\underset{k \times 1}{Z_1} \sim N(\underset{k \times 1}{\mu_1},\ \underset{k \times k}{\Sigma_{11}}) \qquad \underset{(n-k) \times 1}{Z_2} \sim N(\underset{(n-k) \times 1}{\mu_2},\ \underset{(n-k) \times (n-k)}{\Sigma_{22}})$$

The conditional distributions are then given by

$$Z_1 \mid Z_2 = a \;\sim\; N\!\left(\mu_1 + \Sigma_{12}\,\Sigma_{22}^{-1}(a - \mu_2),\ \Sigma_{11} - \Sigma_{12}\,\Sigma_{22}^{-1}\,\Sigma_{21}\right)$$

$$Z_2 \mid Z_1 = b \;\sim\; N\!\left(\mu_2 + \Sigma_{21}\,\Sigma_{11}^{-1}(b - \mu_1),\ \Sigma_{22} - \Sigma_{21}\,\Sigma_{11}^{-1}\,\Sigma_{12}\right)$$
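These conditioning formulas translate directly into code. A sketch (the function name is ours); the check uses the standard bivariate result that $z_1 \mid z_2 = a$ is $N(\rho a,\ 1 - \rho^2)$ when both margins are standard normal:

```python
import numpy as np

def conditional_mvn(mu, Sigma, idx1, idx2, a):
    """Mean and covariance of Z[idx1] | Z[idx2] = a:
    mu1 + S12 S22^{-1} (a - mu2),  S11 - S12 S22^{-1} S21."""
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S22 = Sigma[np.ix_(idx2, idx2)]
    W = np.linalg.solve(S22, S12.T).T   # S12 S22^{-1} without an explicit inverse
    cond_mu = mu[idx1] + W @ (a - mu[idx2])
    cond_Sigma = S11 - W @ S12.T
    return cond_mu, cond_Sigma

rho = 0.5
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])
cm, cS = conditional_mvn(mu, Sigma, [0], [1], np.array([2.0]))
```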
Gaussian Processes

From Shumway: a process $y = \{y(t) : t \in T\}$ is said to be a Gaussian process if all possible finite-dimensional vectors $Z = (z_{t_1}, z_{t_2}, \ldots, z_{t_n})^t$, for every collection of time points $t_1, t_2, \ldots, t_n$ and every positive integer $n$, have a multivariate normal distribution.

So far we have only looked at examples of time series where $T$ is discrete (and evenly spaced and contiguous). It turns out things get a lot more interesting when we explore the case where $T$ is defined on a continuous space (e.g. $\mathbb{R}$ or some subset of $\mathbb{R}$).
Gaussian Process Regression

Parameterizing a Gaussian Process

Imagine we have a Gaussian process defined such that $y = \{y(t) : t \in [0, 1]\}$.

- We now have an uncountably infinite set of possible $t$'s and $y(t)$'s.
- We will only have a (small) finite number of observations $y(t_1), \ldots, y(t_n)$ with which to say something useful about this infinite-dimensional process.
- The unconstrained covariance matrix for the observed data can have up to $n(n+1)/2$ unique values!
- It is therefore necessary to make some simplifying assumptions:
  - Stationarity
  - Simple parameterization of $\Sigma$
Covariance Functions

More on these next week, but for now some simple and common examples.

Exponential covariance:

$$\Sigma(z_t, z_{t'}) = \sigma^2 \exp\big(-|t - t'|\,l\big)$$

Squared exponential covariance:

$$\Sigma(z_t, z_{t'}) = \sigma^2 \exp\big(-(|t - t'|\,l)^2\big)$$

Powered exponential covariance ($p \in (0, 2]$):

$$\Sigma(z_t, z_{t'}) = \sigma^2 \exp\big(-(|t - t'|\,l)^p\big)$$
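All three share one functional form, so a single function covers them; a sketch (note that $l$ multiplies the distance here, matching the `l*d[i,j]` term in the JAGS code later in the lecture):

```python
import numpy as np

def pow_exp_cov(d, sigma2, l, p):
    """Powered exponential covariance as parameterized on these slides:
    sigma2 * exp(-(|d| * l)^p). p = 1 gives the exponential covariance,
    p = 2 the squared exponential."""
    return sigma2 * np.exp(-(np.abs(d) * l) ** p)

d = np.linspace(0.0, 1.0, 5)
exp_cov    = pow_exp_cov(d, sigma2=1.0, l=3.0, p=1.0)
sq_exp_cov = pow_exp_cov(d, sigma2=1.0, l=3.0, p=2.0)
```

At distance 0 both equal $\sigma^2$; the squared exponential decays more slowly near 0 (it is smoother) but much faster at large distances.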
Covariance Function - Correlation Decay

[Figure: correlation as a function of distance $d$ for the exponential and squared exponential covariances, with $l = 1, \ldots, 10$]
Correlation Decay - AR(1)

Recall that for a stationary AR(1) process,

$$\gamma(h) = \frac{\sigma_w^2\,\phi^{|h|}}{1 - \phi^2} \qquad \text{and} \qquad \rho(h) = \phi^{|h|},$$

so we can draw a somewhat similar picture of the decay of $\rho$ as a function of distance.

[Figure: $\rho(h)$ versus lag for $\phi = 0.1, 0.3, 0.5, 0.7, 0.9$]
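The resemblance is in fact exact: the AR(1) correlation $\phi^{|h|}$ is an exponential-covariance correlation in disguise, with range parameter $l = -\log\phi$. A quick check (values are illustrative):

```python
import numpy as np

phi = 0.7
l = -np.log(phi)          # range parameter reproducing the AR(1) decay
h = np.arange(0, 10)
ar1_corr = phi ** h       # rho(h) = phi^|h|
exp_corr = np.exp(-h * l) # exponential covariance correlation at distance h
```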
Example

[Figure: observed data, $y$ versus $t \in [0, 1]$]
Prediction

Our example has 15 observations, which we would like to use as the basis for predicting $y(t)$ at other values of $t$ (say, a sequence of values from 0 to 1). For now let's use a squared exponential covariance with $\sigma^2 = 10$ and $l = 5$.

We therefore want to sample from $y_{pred} \mid y_{obs}$:

$$y_{pred} \mid y_{obs} = Y \;\sim\; N\!\left(\Sigma_{po}\,\Sigma_{obs}^{-1}\,Y,\ \ \Sigma_{pred} - \Sigma_{po}\,\Sigma_{obs}^{-1}\,\Sigma_{op}\right)$$
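This is just the multivariate normal conditioning formula applied with kernel-built covariance blocks. A sketch, with toy data standing in for the 15 observations on the slide (the data, the jitter term, and the function names are ours):

```python
import numpy as np

def sq_exp_cov(d, sigma2, l):
    return sigma2 * np.exp(-(d * l) ** 2)

def gp_predict(t_obs, y_obs, t_pred, sigma2, l, jitter=1e-6):
    """Conditional mean and covariance of y_pred | y_obs for a zero-mean GP."""
    S_oo = sq_exp_cov(np.abs(t_obs[:, None] - t_obs[None, :]), sigma2, l)
    S_po = sq_exp_cov(np.abs(t_pred[:, None] - t_obs[None, :]), sigma2, l)
    S_pp = sq_exp_cov(np.abs(t_pred[:, None] - t_pred[None, :]), sigma2, l)
    S_oo += jitter * np.eye(len(t_obs))      # numerical stabilization
    W = np.linalg.solve(S_oo, S_po.T).T      # S_po S_oo^{-1}
    return W @ y_obs, S_pp - W @ S_po.T

# toy data standing in for the 15 observations on the slide
t_obs = np.linspace(0.0, 1.0, 15)
y_obs = np.cos(2 * np.pi * t_obs)
t_pred = np.linspace(0.0, 1.0, 101)
mu_pred, Sigma_pred = gp_predict(t_obs, y_obs, t_pred, sigma2=10.0, l=5.0)
```

Draws like the ones on the following slides would then come from sampling $N(\mu_{pred}, \Sigma_{pred})$, e.g. via the Cholesky method from earlier. Conditioning can only shrink the variance, and at an observed location the predictive variance collapses to (nearly) zero.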
Draws 1-5

[Figures: five individual draws of $y_{pred} \mid y_{obs}$ over $t \in [0, 1]$]

Many draws later

[Figure: many posterior predictive draws overlaid, showing the mean and spread of $y_{pred} \mid y_{obs}$]
Exponential Covariance

[Figures: draws 1-3 and the posterior of $y_{pred} \mid y_{obs}$ under the exponential covariance]

Powered Exponential Covariance ($p = 1.5$)

[Figure: posterior of $y_{pred} \mid y_{obs}$ under the powered exponential covariance]

Back to the squared exponential

[Figure: posterior of $y_{pred} \mid y_{obs}$ under the squared exponential covariance]
Changing the range ($l$)

[Figure: squared exponential covariance posteriors with $\sigma^2 = 10$ and $l = 5, 10, 15, 20$]
Effective Range

For the squared exponential covariance,

$$Cov(d) = \sigma^2 \exp\big(-(d\,l)^2\big) \qquad Corr(d) = \exp\big(-(d\,l)^2\big),$$

we would like to know, for a given value of $l$, beyond what distance apart observations must be to have a correlation less than 0.05.
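Working the question out: set the correlation below the 0.05 threshold and solve for $d$.

```latex
\exp\big(-(d\,l)^2\big) < 0.05
\;\Longleftrightarrow\; (d\,l)^2 > -\log 0.05 \approx 3
\;\Longleftrightarrow\; d > \frac{\sqrt{3}}{l} \approx \frac{1.73}{l}
```

So with $l = 5$, for example, observations more than roughly $0.35$ apart are essentially uncorrelated; this distance is the effective range.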
Changing the scale ($\sigma^2$)

[Figure: squared exponential covariance posteriors with $l = 15, 20$ and $\sigma^2 = 5, 15$]
Fitting

gp_sq_exp_model = "model{
  y ~ dmnorm(mu, inverse(Sigma))

  for (i in 1:N) {
    mu[i] <- 0
  }

  for (i in 1:(N-1)) {
    for (j in (i+1):N) {
      Sigma[i,j] <- sigma2 * exp(- pow(l*d[i,j], 2))
      Sigma[j,i] <- Sigma[i,j]
    }
  }

  for (k in 1:N) {
    Sigma[k,k] <- sigma2 + 0.00001
  }

  sigma2 ~ dlnorm(0, 1.5)
  l ~ dt(0, 2.5, 1) T(0,)  # Half-Cauchy(0, 2.5)
}"
Trace plots

[Figure: trace plots of the posterior samples for l and sigma2]

param   post_mean  post_med  post_lower  post_upper
l       30.20      28.70     20.63       51.51
sigma2  1.44       1.33      0.72        2.78
Fitted models

[Figure: posterior mean model over $t \in [0, 1]$]
Forecasting

[Figure: posterior mean model extrapolated over $t \in [0, 1.5]$]
Improving the model

gp_sq_exp_model2 = "model{
  y ~ dmnorm(mu, inverse(Sigma))

  for (i in 1:N) {
    mu[i] <- 0
  }

  for (i in 1:(N-1)) {
    for (j in (i+1):N) {
      Sigma[i,j] <- sigma2 * exp(- pow(l*d[i,j], 2))
      Sigma[j,i] <- Sigma[i,j]
    }
  }

  for (k in 1:N) {
    Sigma[k,k] <- sigma2 + nugget
  }

  sigma2 ~ dlnorm(0, 1.5)
  l ~ dt(0, 2.5, 1) T(0,)  # Half-Cauchy(0, 2.5)
  nugget ~ dlnorm(0, 1)
}"
Trace plots

[Figure: trace plots of the posterior samples for l, nugget, and sigma2]

param   post_mean  post_med  post_lower  post_upper
l       7.01       6.75      2.17        11.79
nugget  0.13       0.09      0.03        0.57
sigma2  1.73       1.53      0.64        4.04
Fitted models

[Figure: posterior mean model with nugget over $t \in [0, 1]$]

Forecasting

[Figure: posterior mean model with nugget extrapolated over $t \in [0, 1.5]$]