Gaussian Processes
Dan Cervone
NYU CDS
November 10, 2015
Dan Cervone (NYU CDS) Gaussian Processes November 10, 2015 1 / 22
What are Gaussian processes?
GPs let us do Bayesian inference on functions. Using GPs we can:
Interpolate spatial
[https://pythonhosted.org/infpy/gps.html] [http://becs.aalto.fi/en/research/bayes/mcmcstuff/traindata.jpg]
$\epsilon_i \stackrel{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2_\epsilon)$
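The noise model $y_i = f(x_i) + \epsilon_i$ with iid Gaussian errors can be simulated directly. A minimal pure-Python sketch, assuming a zero-mean GP prior with a squared-exponential kernel (the kernel, `simulate`, and its parameters are illustrative choices, not taken from the slides): draw $f(X) \sim \mathcal{N}(0, K(X, X))$ through a Cholesky factor, then add iid noise.

```python
import math
import random


def sq_exp_kernel(a, b, length_scale=1.0, variance=1.0):
    # Assumed covariance function (squared exponential); the slides do not fix one here.
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular L with L L' = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate(xs, sigma_eps, jitter=1e-9, seed=0):
    """Draw f(X) ~ N(0, K(X, X)), then y_i = f(x_i) + eps_i, eps_i iid N(0, sigma_eps^2)."""
    rng = random.Random(seed)
    n = len(xs)
    # Small jitter on the diagonal guards against round-off in the Cholesky factor.
    k = [[sq_exp_kernel(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    low = cholesky(k)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    f = [sum(low[i][j] * z[j] for j in range(i + 1)) for i in range(n)]  # f = L z
    y = [fi + rng.gauss(0.0, sigma_eps) for fi in f]
    return f, y


f, y = simulate([0.0, 0.5, 1.0, 1.5, 2.0], sigma_eps=0.1)
```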
$\{f(x^*_1), \ldots, f(x^*_k)\}$ at $x^*$.
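$\mathrm{Cov}(y) = K(X, X) + \sigma^2_\epsilon I_n$
$E[f(X^*) \mid y] = K(X^*, X)[K(X, X) + \sigma^2_\epsilon I_n]^{-1} y,$
$\mathrm{Var}[f(X^*) \mid y] = K(X^*, X^*) - K(X^*, X)[K(X, X) + \sigma^2_\epsilon I_n]^{-1} K(X, X^*)$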
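The GP predictive mean and covariance can be computed with a Cholesky factor of $K(X, X) + \sigma^2_\epsilon I_n$ instead of an explicit inverse. A minimal pure-Python sketch, assuming a zero prior mean and a squared-exponential kernel (`gp_predict` and the helper names are hypothetical):

```python
import math


def sq_exp_kernel(a, b, length_scale=1.0, variance=1.0):
    # Assumed covariance function; any positive-definite kernel could be swapped in.
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def gram(xs1, xs2):
    """Matrix K(xs1, xs2) with entries k(a, b)."""
    return [[sq_exp_kernel(a, b) for b in xs2] for a in xs1]


def cholesky(a):
    """Lower-triangular L with L L' = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / low[j][j]
    return low


def chol_solve(low, b):
    """Solve (L L') x = b: forward substitution, then back substitution."""
    n = len(low)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict(x_train, y, x_test, sigma_eps):
    """Posterior mean and covariance of f(X*) given y, assuming a zero prior mean."""
    n, m = len(x_train), len(x_test)
    k_xx = gram(x_train, x_train)
    for i in range(n):
        k_xx[i][i] += sigma_eps ** 2            # K(X, X) + sigma_eps^2 I_n
    low = cholesky(k_xx)
    k_sx = gram(x_test, x_train)                # K(X*, X)
    k_ss = gram(x_test, x_test)                 # K(X*, X*)
    alpha = chol_solve(low, y)                  # [K(X, X) + sigma_eps^2 I_n]^{-1} y
    mean = [sum(row[j] * alpha[j] for j in range(n)) for row in k_sx]
    v = [chol_solve(low, row) for row in k_sx]  # v[b] = [...]^{-1} K(X, x*_b)
    cov = [[k_ss[a][b] - sum(k_sx[a][j] * v[b][j] for j in range(n)) for b in range(m)]
           for a in range(m)]
    return mean, cov


mean, cov = gp_predict([0.0, 1.0, 2.0], [1.0, -0.5, 0.3], [0.5, 1.5, 3.0], sigma_eps=0.1)
```

Factoring once costs $O(n^3)$; each extra solve against the factor is then $O(n^2)$, and the result is numerically more stable than forming the inverse.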
$K(X, X) + \sigma^2_\epsilon I_n$:
$f(x_i) = \phi_i' w$
$y_i \stackrel{\text{ind}}{\sim} \mathcal{N}(\phi_i' w, \sigma^2_\epsilon)$
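$w \mid y \sim \mathcal{N}\left(A^{-1} \Phi y / \sigma^2_\epsilon,\ A^{-1}\right)$, where
$A = \Phi \Phi' / \sigma^2_\epsilon + \Sigma^{-1}$.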
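$E[f(x^*) \mid y] = \phi^{*\prime} \Sigma \Phi [\Phi' \Sigma \Phi + \sigma^2_\epsilon I]^{-1} y,$
$\mathrm{Var}[f(x^*) \mid y] = \phi^{*\prime} \Sigma \phi^* - \phi^{*\prime} \Sigma \Phi [\Phi' \Sigma \Phi + \sigma^2_\epsilon I]^{-1} \Phi' \Sigma \phi^*$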
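The weight-space and function-space forms give the same prediction when the kernel is $k(x, x') = \phi(x)' \Sigma \phi(x')$. A small sketch checking this numerically, assuming a hypothetical quadratic basis $\phi(x) = (1, x, x^2)$ and a diagonal prior covariance $\Sigma$ (both illustrative; the function names are hypothetical):

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def features(x):
    # Hypothetical basis phi(x) = (1, x, x^2); the slides leave phi unspecified.
    return [1.0, x, x * x]


def weight_space(xs, y, x_star, prior_var, s2):
    """Predict via w | y ~ N(A^{-1} Phi y / s2, A^{-1}), A = Phi Phi' / s2 + Sigma^{-1}."""
    cols = [features(x) for x in xs]            # column i of Phi is phi_i
    p = len(prior_var)
    a = [[sum(c[r] * c[q] for c in cols) / s2 + (1.0 / prior_var[r] if r == q else 0.0)
          for q in range(p)] for r in range(p)]
    rhs = [sum(c[r] * yi for c, yi in zip(cols, y)) / s2 for r in range(p)]
    w_bar = solve(a, rhs)
    phi_s = features(x_star)
    mean = sum(f * w for f, w in zip(phi_s, w_bar))
    var = sum(f * g for f, g in zip(phi_s, solve(a, phi_s)))  # phi*' A^{-1} phi*
    return mean, var


def function_space(xs, y, x_star, prior_var, s2):
    """Same prediction through the kernel k(x, x') = phi(x)' Sigma phi(x')."""
    def k(u, v):
        return sum(pu * s * pv for pu, s, pv in zip(features(u), prior_var, features(v)))
    n = len(xs)
    gram = [[k(xs[i], xs[j]) + (s2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    k_star = [k(x_star, x) for x in xs]         # phi*' Sigma Phi
    mean = sum(ks * al for ks, al in zip(k_star, solve(gram, y)))
    var = k(x_star, x_star) - sum(ks * v for ks, v in zip(k_star, solve(gram, k_star)))
    return mean, var


args = ([-1.0, 0.0, 0.5, 2.0], [0.3, -0.1, 0.4, 1.2], 1.3, [1.0, 2.0, 0.5], 0.1)
print(weight_space(*args), function_space(*args))
```

The weight-space route solves $p \times p$ systems and the function-space route $n \times n$ ones, so the cheaper form depends on whether there are more basis functions or more observations.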
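$y \sim \mathcal{N}(0, K(X, X) + \sigma^2_\epsilon I)$
$\log p(y) = -\frac{1}{2} y' K_y^{-1} y - \frac{1}{2} \log |K_y| - \frac{n}{2} \log 2\pi$, where
$K_y = K(X, X) + \sigma^2_\epsilon I$.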
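The log marginal likelihood can be evaluated stably through a Cholesky factor $K_y = LL'$, using $y' K_y^{-1} y = \lVert L^{-1} y \rVert^2$ and $\log |K_y| = 2 \sum_i \log L_{ii}$. A minimal sketch; the squared-exponential kernel with unit signal variance, the grid search over length scales, and the function name are assumptions for illustration:

```python
import math


def sq_exp_kernel(a, b, length_scale):
    # Assumed squared-exponential kernel with unit signal variance.
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular L with L L' = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / low[j][j]
    return low


def log_marginal_likelihood(xs, y, length_scale, sigma_eps):
    """log p(y) = -y' K_y^{-1} y / 2 - log|K_y| / 2 - (n / 2) log(2 pi)."""
    n = len(xs)
    k_y = [[sq_exp_kernel(a, b, length_scale) + (sigma_eps ** 2 if i == j else 0.0)
            for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    low = cholesky(k_y)
    # Forward substitution: z = L^{-1} y, so y' K_y^{-1} y = z' z.
    z = [0.0] * n
    for i in range(n):
        z[i] = (y[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    quad = sum(zi * zi for zi in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))  # log|K_y|
    return -0.5 * quad - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)


# Usage: pick the length scale with the highest marginal likelihood on a small grid.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [0.1, 0.4, 0.9, 0.7, 0.2]
best = max([0.1, 0.5, 1.0, 2.0], key=lambda ell: log_marginal_likelihood(xs, y, ell, 0.1))
```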
$p(y_i = j) = \frac{\exp(f_j(x_i))}{\sum_{j'=1}^{J} \exp(f_{j'}(x_i))}$
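The softmax link $p(y_i = j) = \exp(f_j(x_i)) / \sum_{j'} \exp(f_{j'}(x_i))$ turns latent function values into class probabilities. A minimal sketch (the function name is hypothetical) that subtracts the largest latent value first, which leaves every ratio unchanged but avoids overflow in `exp`:

```python
import math


def class_probabilities(latent):
    """Softmax of latent values f_1(x_i), ..., f_J(x_i), computed stably."""
    m = max(latent)                       # shift by the max; ratios are unchanged
    exps = [math.exp(f - m) for f in latent]
    total = sum(exps)
    return [e / total for e in exps]


probs = class_probabilities([1.2, -0.3, 0.4])
```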
iid
ind
[M. Tingley and P. Huybers, “Recent temperature extremes at high northern latitudes unprecedented in the past 600 years.” Nature, 2013]
[Figure S.1 panels: (a) map of data locations (MXD, Ice δ18O, Varves, CRU, Target); (b) count of proxy records by year, 1400 to 2000; (c) count of instrumental records by year, 1850 to 2000.]
Figure S.1: Data availability in space and time. (a) Locations of the data time series. In the legend, MXD refers to the tree ring density series, and Target refers to locations where temperature anomalies are inferred but where there are no observations. The two areas outlined in black are used to assess anomalous warmth in 2010. (b) and (c) The number and type of proxy (b) and instrumental observations (c) available at each year.
[Tingley & Huybers]
Data
$O_i$: temperature data for year $i$ from location set $X_O$.
$R_i$: "proxy" data for year $i$ from location set $X_R$.
Model
$T^O_i$: latent true temperature for year $i$ at locations $X_O$. $O_i = A_O T^O_i$.
$T^R_i$: latent true temperature for year $i$ at locations $X_R$. $R_i = A_R T^R_i$.
$T_i = (T^O_i, T^R_i)$
$T_i = \Gamma T_{i-1} + \eta_i$, $\eta \sim \mathrm{GP}$.
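The model $T_i = \Gamma T_{i-1} + \eta_i$ can be simulated forward to see how the pieces fit. This is a toy sketch under assumptions that come neither from the slides nor from Tingley & Huybers' implementation: one-dimensional location coordinates, $\Gamma = \gamma I$, an exponential covariance for $\eta_i$, $T_0 = 0$, and $A_O$ taken to be a 0/1 selection of observed locations. All names are hypothetical.

```python
import math
import random


def exp_cov(locs, sill=1.0, range_=1.0):
    # Assumed exponential covariance for eta_i across locations (illustrative only).
    return [[sill * math.exp(-abs(a - b) / range_) for b in locs] for a in locs]


def cholesky(a):
    """Lower-triangular L with L L' = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / low[j][j]
    return low


def simulate_fields(locs, n_years, gamma, observed_idx, seed=0):
    """Simulate T_i = gamma * T_{i-1} + eta_i, eta_i ~ N(0, K), starting from T_0 = 0.

    Observations O_i keep the entries of T_i listed in observed_idx, i.e. A_O is
    assumed to be a 0/1 selection matrix. Proxy data R_i could be added the same way.
    """
    rng = random.Random(seed)
    low = cholesky(exp_cov(locs))
    n = len(locs)
    t_prev = [0.0] * n
    temps, obs = [], []
    for _ in range(n_years):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        eta = [sum(low[i][j] * z[j] for j in range(i + 1)) for i in range(n)]  # eta = L z
        t_cur = [gamma * tp + e for tp, e in zip(t_prev, eta)]
        temps.append(t_cur)
        obs.append([t_cur[i] for i in observed_idx])
        t_prev = t_cur
    return temps, obs


temps, obs = simulate_fields([0.0, 0.7, 1.5, 3.0], n_years=5, gamma=0.6, observed_idx=[0, 2])
```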
[Figure S.8 panels: maps of the posterior median temperature anomaly (°C, top row) and the width of the 90% credible interval (°C, bottom row) for Year = 1453, 1601, 1642, and 1695; symbols mark Proxy, Inst., or Both.]
Figure S.8: Temperature anomaly estimates and uncertainties for four years. The top row plots the posterior median of the temperature distribution for each location for 1453, 1601, 1642, and 1695, respectively, while the bottom row plots the widths of the corresponding 90% credible intervals. In the bottom row, symbols denote that a proxy, and/or instrumental observation is available for that location and year.
[Tingley & Huybers]
[A. Chakraborty et al., “Modeling large scale species abundance with latent spatial processes.” Annals of Applied Statistics, 2010.]
Posterior mean spatial effects (θ) for Protea punctata (PRPUNC) and Protea repens (PRREPE). These effects offer local adjustment to potential abundance. Cells with values greater than zero represent regions with larger than expected populations, conditional
[Chakraborty et al.]
[R. Gramacy and H. Lee, “Bayesian treed Gaussian process models with an application to computer modeling.” Journal of the American Statistical Association, 2008.]
[Figure 1 panels: lift = f(mach, alpha, beta) plotted against Mach (speed) and alpha (angle of attack) for sideslip beta = 0, 0.5, 1, 2, 3, and 4.]
Figure 1: Interpolation of lift by speed and angle of attack for all sideslip levels. Note that for levels 0.5 and 3 (center), Mach ranges only in (1, 5) and (1.2, 2.2).
[Gramacy and Lee]
[J. Snoek et al. “Practical Bayesian optimization of machine learning algorithms.” NIPS, 2012.]
[Figure 4 panels: minimum function value against (a) function evaluations, (b) time in days, and (c) function evaluations on grid vs. off grid; strategies: GP EI MCMC, GP EI per second, GP EI Opt, Random Grid Search, and 3x/5x/10x GP EI MCMC.]
Figure 4: Different strategies of optimization on the Online LDA problem compared in terms of function evaluations (4a), walltime (4b) and constrained to a grid or not (4c).
[Snoek et al.]
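The "GP EI" strategies in Snoek et al. choose the next setting to evaluate by expected improvement (EI). For minimization, with GP posterior $\mathcal{N}(\mu(x), \sigma^2(x))$ and best value so far $f_{\text{best}}$, EI has the closed form $(f_{\text{best}} - \mu)\,\mathrm{cdf}(z) + \sigma\,\mathrm{pdf}(z)$, where cdf and pdf are those of the standard normal and $z = (f_{\text{best}} - \mu)/\sigma$. A minimal sketch of this basic criterion only; it omits the MCMC averaging over kernel hyperparameters and the per-second variant that the figure also compares, and the names and candidate values are hypothetical:

```python
import math


def expected_improvement(mu, sigma, f_best):
    """EI for minimization at a point whose GP posterior is N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(f_best - mu, 0.0)      # no uncertainty: improvement is deterministic
    z = (f_best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (f_best - mu) * cdf + sigma * pdf


# Usage: hypothetical (mu, sigma) posteriors at three candidate settings.
candidates = [(0.2, 0.1), (0.5, 0.6), (0.9, 0.05)]
next_idx = max(range(len(candidates)),
               key=lambda i: expected_improvement(*candidates[i], f_best=0.3))
```

EI trades off a low predicted mean against high posterior uncertainty, so a candidate with a worse mean can still be chosen if its $\sigma$ is large.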
$\sum_{j=1}^{J} w_j m_j(x)$.