SLIDE 1
Lecture 1
Spatio-temporal data & Linear Models
Colin Rundel 1/18/2017
SLIDE 2
Spatio-temporal data
SLIDE 3 Time Series Data - Discrete
[Figure: daily time series of the S&P 500 Open (^GSPC), Jan 3-17, 2017]
SLIDE 4 Time Series Data - Continuous
[Figure: FRN measured PM25 (µg/m3), Jan 1 - Feb 15]
SLIDE 5 Spatial Data - Areal
[Figure: choropleth map of SID79 by county]
SLIDE 6 Spatial Data - Point referenced
[Figure: Meuse River point-referenced measurements of copper, lead, and zinc]
SLIDE 7 Point Pattern Data - Time
[Figure: Old Faithful eruption duration vs. time]
SLIDE 8
Point Pattern Data - Space
SLIDE 9
Point Pattern Data - Space + Time
SLIDE 10
(Bayesian) Linear Models
SLIDE 11
Linear Models
Pretty much everything we are going to see in this course will fall under the umbrella of linear or generalized linear models.

Yi = β0 + β1 xi1 + · · · + βp xip + ϵi,   ϵi ∼ N(0, σ²)

which we can also express using matrix notation as

Y = X β + ϵ,   ϵ ∼ N(0, σ² In)

where Y is n×1, X is n×p, β is p×1, and ϵ is n×1.
SLIDE 12
Multivariate Normal Distribution
For an n-dimension multivate normal distribution with covariance Σ (positive semidefinite) can be written as Y
n×1 ∼ N( µ n×1, Σ n×n) where {Σ}ij = ρijσiσj
Y1 . . . Yn
∼ N
µ1
. . .
µn
,
ρ11σ1σ1 · · · ρ1nσ1σn
. . . ... . . .
ρn1σnσ1 · · · ρnnσnσn
12
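As a quick sanity check of the construction {Σ}ij = ρij σi σj, the sketch below (Python/NumPy rather than the course's R; the numerical values are arbitrary) builds Σ from a correlation matrix and a vector of standard deviations and verifies it is a valid covariance matrix.

```python
# Building a covariance matrix from correlations and standard deviations,
# per {Σ}_ij = ρ_ij σ_i σ_j. The numerical values here are arbitrary.
import numpy as np

rho = np.array([[1.0, 0.5, 0.2],
                [0.5, 1.0, -0.3],
                [0.2, -0.3, 1.0]])   # correlation matrix (ρ_ii = 1)
sigma = np.array([2.0, 1.0, 0.5])    # marginal standard deviations

D = np.diag(sigma)
Sigma = D @ rho @ D                  # {Σ}_ij = ρ_ij σ_i σ_j

# Σ is symmetric positive semidefinite, hence a valid covariance matrix
assert np.allclose(Sigma, Sigma.T)
assert np.all(np.linalg.eigvalsh(Sigma) >= -1e-10)
```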
SLIDE 13
Multivariate Normal Distribution - Density
For the n-dimensional multivariate normal given on the last slide, its density is given by

(2π)^(−n/2) det(Σ)^(−1/2) exp( −½ (Y − µ)ᵀ Σ⁻¹ (Y − µ) )

and its log density is given by

−(n/2) log 2π − ½ log det(Σ) − ½ (Y − µ)ᵀ Σ⁻¹ (Y − µ)

where (Y − µ)ᵀ is 1×n, Σ⁻¹ is n×n, and (Y − µ) is n×1.
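To check the log density formula numerically (a Python/NumPy sketch with arbitrary values, not course code): for a diagonal Σ the multivariate normal factors into independent univariate normals, so the joint log density must equal the sum of univariate log densities.

```python
# Numerical sanity check of the MVN log density formula. For diagonal Σ the
# multivariate normal factors into independent univariate normals, so the
# joint log density equals the sum of the univariate log densities.
import numpy as np

y = np.array([1.0, -0.5, 2.0])
mu = np.array([0.0, 0.5, 1.0])
sigma = np.array([2.0, 1.0, 0.5])   # marginal standard deviations
Sigma = np.diag(sigma**2)           # diagonal covariance matrix
n = len(y)

diff = y - mu
# -(n/2) log 2π - (1/2) log det(Σ) - (1/2) (y-µ)' Σ⁻¹ (y-µ)
logdens = (-n / 2 * np.log(2 * np.pi)
           - 0.5 * np.linalg.slogdet(Sigma)[1]
           - 0.5 * diff @ np.linalg.solve(Sigma, diff))

# sum of univariate N(µ_i, σ_i²) log densities
logdens_uni = np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                     - diff**2 / (2 * sigma**2))
assert np.isclose(logdens, logdens_uni)
```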
SLIDE 14
A Simple Linear Regression Example
Let's generate some simulated data where the underlying model is known and see how various regression procedures perform.
β0 = 0.7, β1 = 1.5, β2 = −2.2, β3 = 0.1
n = 100,
ϵi ∼ N(0, 1)
SLIDE 15
Generating the data
set.seed(01172017)
n = 100
beta = c(0.7, 1.5, -2.2, 0.1)
eps = rnorm(n)

X0 = rep(1, n)
X1 = rt(n, df=5)
X2 = rt(n, df=5)
X3 = rt(n, df=5)

X = cbind(X0, X1, X2, X3)
Y = X %*% beta + eps
d = data.frame(Y, X[,-1])
SLIDE 18
Least squares fit
Let Ŷ be our estimate for Y based on our estimate of β,

Ŷ = β̂0 + β̂1 X1 + β̂2 X2 + β̂3 X3 = X β̂

The least squares estimate, β̂ls, is given by

β̂ls = arg min_β Σᵢ₌₁ⁿ (Yi − Xi· β)²

With a bit of calculus and algebra we can derive

β̂ls = (XᵀX)⁻¹ Xᵀ Y
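The closed-form estimate β̂ls = (XᵀX)⁻¹XᵀY can be checked numerically. The sketch below (Python/NumPy rather than the course's R; the seed and variable names are illustrative) mirrors the R simulation and compares the closed form against NumPy's least squares solver.

```python
# NumPy sketch of the closed-form least squares estimate, checked against
# numpy's own solver. Data generation mirrors the R simulation on the
# earlier slide: t-distributed predictors, standard normal noise.
import numpy as np

rng = np.random.default_rng(20170117)
n = 100
beta = np.array([0.7, 1.5, -2.2, 0.1])

X = np.column_stack([np.ones(n),
                     rng.standard_t(df=5, size=n),
                     rng.standard_t(df=5, size=n),
                     rng.standard_t(df=5, size=n)])
Y = X @ beta + rng.standard_normal(n)

# beta_hat = (X'X)^{-1} X'Y, computed via a linear solve for stability
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# agrees with numpy's QR/SVD-based least squares solution
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```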
SLIDE 19
Maximum Likelihood
SLIDE 20
Frequentist Fit
lm(Y ~ ., data=d)$coefficients
## (Intercept)          X1          X2          X3
##  0.73726738  1.65321096 -2.16499958  0.07996257

(beta_hat = solve(t(X) %*% X, t(X)) %*% Y)
##           [,1]
## X0  0.73726738
## X1  1.65321096
## X2 -2.16499958
## X3  0.07996257
SLIDE 21
Bayesian Model
Y1, . . . , Y100 | β, σ² ∼ N(Xi· β, σ²)

β0, β1, β2, β3 ∼ N(0, σβ² = 100)

τ² = 1/σ² ∼ Gamma(a = 1, b = 1)
SLIDE 25
Deriving the posterior
[β0, β1, β2, β3, σ² | Y] = [Y | β, σ²] [β, σ²] / [Y] ∝ [Y | β, σ²] [β] [σ²]

where,

[Y | β, σ²] = (2πσ²)^(−n/2) exp( −Σᵢ₌₁ⁿ (Yi − β0 − β1 Xi,1 − β2 Xi,2 − β3 Xi,3)² / (2σ²) )

[β0, β1, β2, β3 | σβ²] = (2πσβ²)^(−4/2) exp( −Σᵢ₌₀³ βi² / (2σβ²) )

[σ² | a, b] = (bᵃ / Γ(a)) (σ²)^(−a−1) exp( −b / σ² )
SLIDE 26
Deriving the posterior (cont.)
[β0, β1, β2, β3, σ² | Y] ∝ (2πσ²)^(−n/2) exp( −Σᵢ₌₁ⁿ (Yi − β0 − β1 Xi,1 − β2 Xi,2 − β3 Xi,3)² / (2σ²) )

× (2πσβ²)^(−4/2) exp( −(β0² + β1² + β2² + β3²) / (2σβ²) )

× (bᵃ / Γ(a)) (σ²)^(−a−1) exp( −b / σ² )
SLIDE 27
Deriving the Gibbs sampler (σ2 step)
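This step is worked through in lecture; as a sketch of where it lands, collecting the terms in σ² from the posterior above (the likelihood and the inverse-gamma prior) shows the full conditional for the precision 1/σ² is conjugate:

$$
1/\sigma^2 \;\big|\; \beta, Y \;\sim\; \text{Gamma}\!\left(a + \frac{n}{2},\; b + \frac{1}{2}\sum_{i=1}^{n}\bigl(Y_i - X_{i\cdot}\beta\bigr)^2\right)
$$

where the second argument is a rate parameter.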
SLIDE 28
Deriving the Gibbs sampler (βi step)
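The βi step is likewise left for the board. As an illustration of where the derivation leads, the sketch below implements the full sampler in Python/NumPy (not the course's R code) under the priors from the Bayesian model slide; for brevity it updates β as a block from its joint normal full conditional rather than one βi at a time, and the seed and variable names are illustrative.

```python
# Minimal Gibbs sampler for the Bayesian regression model on the earlier
# slides (a sketch, not the instructor's code).
# Priors: beta_j ~ N(0, 100), 1/sigma^2 ~ Gamma(a=1, b=1).
import numpy as np

rng = np.random.default_rng(20170117)

# simulate data as in the R code: t-distributed predictors, N(0,1) noise
n = 100
beta_true = np.array([0.7, 1.5, -2.2, 0.1])
X = np.column_stack([np.ones(n)] +
                    [rng.standard_t(df=5, size=n) for _ in range(3)])
Y = X @ beta_true + rng.standard_normal(n)

a, b, sigma2_beta = 1.0, 1.0, 100.0
p = X.shape[1]

beta = np.zeros(p)
sigma2 = 1.0
n_iter, burn = 3000, 500
draws = np.empty((n_iter, p))

for s in range(n_iter):
    # beta block update: N(mu_star, Sigma_star) with
    # Sigma_star = (X'X/sigma2 + I/sigma2_beta)^{-1},
    # mu_star = Sigma_star X'Y / sigma2
    prec = X.T @ X / sigma2 + np.eye(p) / sigma2_beta
    cov = np.linalg.inv(prec)
    mu = cov @ (X.T @ Y) / sigma2
    beta = rng.multivariate_normal(mu, cov)

    # sigma^2 step: 1/sigma^2 ~ Gamma(a + n/2, rate = b + SS/2)
    resid = Y - X @ beta
    tau = rng.gamma(a + n / 2, 1.0 / (b + 0.5 * resid @ resid))
    sigma2 = 1.0 / tau

    draws[s] = beta

post_mean = draws[burn:].mean(axis=0)
```

With σβ² = 100 the prior is diffuse, so the posterior means should land close to the least squares estimates.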