Multiple-output Gaussian processes
Mauricio A. Álvarez
Department of Computer Science, The University of Sheffield.
1 / 76
Sensor Network
South Coast of England: sensor locations.
Modelling each output with an independent Gaussian process:

p(y_1 | X) = N(y_1 | 0, K_1 + σ_1^2 I),   p(y_2 | X) = N(y_2 | 0, K_2 + σ_2^2 I),

so no information is shared between the outputs.

6 / 76
7 / 76
8 / 76
i=1}
i=1}
8 / 76
i=1}
i=1}
8 / 76
i=1}
i=1}
i=1}
i=1}
8 / 76
9 / 76
Sampling with a single latent function u^1(x):

f_1(x) = a_1^1 u^1(x),   f_2(x) = a_2^1 u^1(x).
10 / 76
11 / 76
The covariances between the outputs follow from the shared latent function:

cov[f_1(x), f_1(x′)] = (a_1^1)^2 E[u^1(x) u^1(x′)] = (a_1^1)^2 k(x, x′)
cov[f_1(x), f_2(x′)] = a_1^1 a_2^1 E[u^1(x) u^1(x′)] = a_1^1 a_2^1 k(x, x′)
cov[f_2(x), f_2(x′)] = (a_2^1)^2 E[u^1(x) u^1(x′)] = (a_2^1)^2 k(x, x′)

Stacking the outputs, K(x, x′) = B k(x, x′), where B = a^1 (a^1)^⊤ and a^1 = [a_1^1 a_2^1]^⊤.
14 / 76
With two latent functions:

f_1(x) = a_1^1 u^1(x) + a_1^2 u^2(x),   f_2(x) = a_2^1 u^1(x) + a_2^2 u^2(x).
15 / 76
16 / 76
Now K(x, x′) = B k(x, x′) with B = a^1 (a^1)^⊤ + a^2 (a^2)^⊤, where a^1 = [a_1^1 a_2^1]^⊤ and a^2 = [a_1^2 a_2^2]^⊤.
17 / 76
18 / 76
Intrinsic Coregionalisation Model (ICM): for outputs {f_d(x)}_{d=1}^{D},

f_d(x) = Σ_{i=1}^{R} a_d^i u^i(x),
19 / 76
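The ICM construction above can be sketched numerically. This is a minimal example, assuming NumPy; the function names, kernel and the values of a are illustrative, not from the slides:

```python
import numpy as np

def eq_kernel(X, X2, lengthscale=0.2):
    # exponentiated-quadratic kernel on 1-D inputs of shape (N, 1)
    return np.exp(-0.5 * (X - X2.T) ** 2 / lengthscale**2)

def icm_covariance(B, X):
    # K = B kron K_x : the ICM multi-output covariance
    return np.kron(B, eq_kernel(X, X))

rng = np.random.default_rng(0)
N = 50
X = np.linspace(0, 1, N)[:, None]

# R = 1: a single latent function, so B = a a^T is rank one
a = np.array([[1.0], [0.5]])          # [a_1^1, a_2^1]^T
B = a @ a.T
K = icm_covariance(B, X)

# draw f = [f_1^T, f_2^T]^T from the joint prior (small jitter for stability)
f = rng.multivariate_normal(np.zeros(2 * N), K + 1e-8 * np.eye(2 * N))
f1, f2 = f[:N], f[N:]
# rank-one B makes the outputs scaled copies: f_2(x) = 0.5 f_1(x)
```

With R = 1 the draw makes the limitation of the rank-one model visible: the two outputs are exact scalings of each other.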
20 / 76
21 / 76
22 / 76
23 / 76
24 / 76
25 / 76
For outputs {f_d(x)}_{d=1}^{D}, the multi-output covariance takes the form

K(x, x′) = Σ_{q=1}^{Q} B_q k_q(x, x′).
26 / 76
27 / 76
Linear Model of Coregionalisation (LMC): for outputs {f_d(x)}_{d=1}^{D},

f_d(x) = Σ_{q=1}^{Q} Σ_{i=1}^{R_q} a_{d,q}^i u_q^i(x),

where the latent functions u_q^i(x) are GPs with zero means and covariance

cov[u_q^i(x), u_{q′}^{i′}(x′)] = k_q(x, x′) δ_{i,i′} δ_{q,q′},
28 / 76
For D = 2, Q = 2 and R_q = 2:

f_1(x) = a_{1,1}^1 u_1^1(x) + a_{1,1}^2 u_1^2(x) + a_{1,2}^1 u_2^1(x) + a_{1,2}^2 u_2^2(x),
f_2(x) = a_{2,1}^1 u_1^1(x) + a_{2,1}^2 u_1^2(x) + a_{2,2}^1 u_2^1(x) + a_{2,2}^2 u_2^2(x),
30 / 76
The covariance is K(x, x′) = Σ_{q=1}^{Q} B_q k_q(x, x′) = Σ_{q=1}^{Q} A_q A_q^⊤ k_q(x, x′), with A_q = [a_q^1 a_q^2 ··· a_q^{R_q}].
31 / 76
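The sum-of-coregionalisation-matrices form lends itself to a short sketch, assuming NumPy; the choices of D, Q, R_q, the lengthscales and the entries of A_q are illustrative:

```python
import numpy as np

def eq_kernel(X, X2, ell):
    # exponentiated-quadratic kernel on 1-D inputs of shape (N, 1)
    return np.exp(-0.5 * (X - X2.T) ** 2 / ell**2)

def lmc_covariance(X, A_list, ells):
    # K = sum_q B_q kron K_q, with B_q = A_q A_q^T of rank R_q
    D, N = A_list[0].shape[0], X.shape[0]
    K = np.zeros((D * N, D * N))
    for A_q, ell in zip(A_list, ells):
        K += np.kron(A_q @ A_q.T, eq_kernel(X, X, ell))
    return K

X = np.linspace(0, 1, 40)[:, None]
# D = 2 outputs, Q = 2 latent groups, R_q = 1 in each group,
# each group with its own lengthscale
A_list = [np.array([[1.0], [0.8]]), np.array([[0.3], [-1.0]])]
K = lmc_covariance(X, A_list, ells=[0.1, 0.3])

# K is symmetric positive semi-definite by construction
eigs = np.linalg.eigvalsh(K)
```

Because each B_q = A_q A_q^⊤ is positive semi-definite and each k_q is a valid kernel, the sum is again a valid multi-output covariance.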
For a set of inputs X, the joint covariance matrix is K = Σ_{q=1}^{Q} B_q ⊗ K_q, where each coregionalisation matrix B_q has entries (B_q)_{ij}.
32 / 76
33 / 76
Process convolutions: consider outputs {f_d(x)}_{d=1}^{D}, smoothing kernels {G_d(x)}_{d=1}^{D}, and a latent function u(x),

f_d(x) = ∫ G_d(x − z) u(z) dz.
34 / 76
u(x): latent function. G_1(x), G_2(x): smoothing kernels. f_1(x), f_2(x): output functions.
35 / 76
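The idea above — one shared latent function pushed through different smoothing kernels — can be illustrated with a discretised convolution. A sketch assuming NumPy; the grid sizes and kernel widths are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x = np.linspace(0, 1, n)
dx = x[1] - x[0]
u = rng.standard_normal(n)            # white-noise sample of the latent u(x)

def smoothing_kernel(t, width, dt):
    # normalised Gaussian smoothing kernel G(t)
    g = np.exp(-0.5 * t**2 / width**2)
    return g / (g.sum() * dt)

t = np.linspace(-0.5, 0.5, n)         # same grid spacing as x
G1 = smoothing_kernel(t, 0.01, dx)
G2 = smoothing_kernel(t, 0.05, dx)

# f_d(x) = ∫ G_d(x − z) u(z) dz, approximated by a discrete convolution
f1 = np.convolve(u, G1, mode="same") * dx
f2 = np.convolve(u, G2, mode="same") * dx

# both outputs are smoothed versions of the same u, hence dependent
corr = np.corrcoef(f1, f2)[0, 1]
```

The wider kernel G2 produces a smoother output than G1, but because both integrate the same u(x), the two outputs co-vary.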
36 / 76
Product of Gaussians:

N(x|μ_1, P_1^{−1}) N(x|μ_2, P_2^{−1}) = N(μ_1|μ_2, P_1^{−1} + P_2^{−1}) N(x|μ_c, P_c^{−1}),

where P_c = P_1 + P_2 and μ_c = P_c^{−1}(P_1 μ_1 + P_2 μ_2).
37 / 76
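The identity can be checked numerically in the univariate case; a sketch assuming NumPy, with arbitrary values for the means, precisions and evaluation point:

```python
import numpy as np

def npdf(x, mu, var):
    # univariate Gaussian density N(x | mu, var)
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

mu1, p1 = 0.3, 4.0                    # mean and precision of the first factor
mu2, p2 = -0.1, 2.5                   # mean and precision of the second factor
x = 0.7

lhs = npdf(x, mu1, 1 / p1) * npdf(x, mu2, 1 / p2)

pc = p1 + p2                          # combined precision P_c
muc = (p1 * mu1 + p2 * mu2) / pc      # combined mean mu_c
rhs = npdf(mu1, mu2, 1 / p1 + 1 / p2) * npdf(x, muc, 1 / pc)
# lhs and rhs agree for any x: the product of Gaussians is Gaussian in x
```

This identity is what makes the convolution integrals with Gaussian smoothing kernels tractable in closed form.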
With Gaussian smoothing kernels, cov[f_d(x), f_{d′}(x′)] ∝ K_eq(x − x′) with covariance P_d^{−1} + P_{d′}^{−1}.
38 / 76
39 / 76
40 / 76
Now consider outputs {f_d(x)}_{d=1}^{D}, smoothing kernels {G_d(x)}_{d=1}^{D}, and a latent GP u(x) with covariance k(x, x′).
41 / 76
With an exponentiated-quadratic latent covariance of precision Λ, the cross-covariance is again an exponentiated quadratic:

cov[f_d(x), f_{d′}(x′)] ∝ K_eq(x − x′) with covariance P_d^{−1} + P_{d′}^{−1} + Λ^{−1}.
42 / 76
In general, take Q groups of latent processes,

f_d(x) = Σ_{q=1}^{Q} Σ_{i=1}^{R_q} ∫ G_{d,q}^i(x − z) u_q^i(z) dz,

with cov[u_q^i(z), u_{q′}^{i′}(z′)] = k_q(z, z′) δ_{i,i′} δ_{q,q′}. Then

cov[f_d(x), f_{d′}(x′)] = Σ_{q=1}^{Q} Σ_{i=1}^{R_q} ∫∫ G_{d,q}^i(x − z) G_{d′,q}^i(x′ − z′) k_q(z, z′) dz′ dz.
43 / 76
For outputs {f_d(x)}_{d=1}^{D}, the covariance between f_d(x) and f_{d′}(x′) is

cov[f_d(x), f_{d′}(x′)] = Σ_{q=1}^{Q} Σ_{i=1}^{R_q} ∫∫ G_{d,q}^i(x − z) G_{d′,q}^i(x′ − z′) k_q(z, z′) dz′ dz.

Choosing G_{d,q}^i(x − z) = a_{d,q}^i δ(x − z) with Q = 1 recovers the ICM:

cov[f_d(x), f_{d′}(x′)] = Σ_{i=1}^{R_1} a_{d,1}^i a_{d′,1}^i k_1(x, x′).
44 / 76
45 / 76
[Samples: ICM with R_q = 1, f_1(x) and f_2(x); ICM with R_q = 2, f_1(x) and f_2(x).]
45 / 76
The covariance between f_d(x) and f_{d′}(x′) is

cov[f_d(x), f_{d′}(x′)] = Σ_{q=1}^{Q} Σ_{i=1}^{R_q} ∫∫ G_{d,q}^i(x − z) G_{d′,q}^i(x′ − z′) k_q(z, z′) dz′ dz.

Choosing G_{d,q}^i(x − z) = a_{d,q}^i δ(x − z) with R_q = 1 recovers the semiparametric latent factor model:

cov[f_d(x), f_{d′}(x′)] = Σ_{q=1}^{Q} a_{d,q}^1 a_{d′,q}^1 k_q(x, x′).
46 / 76
The joint covariance matrix is the sum of Kronecker products K = Σ_{q=1}^{Q} B_q ⊗ K_q.

[Samples: LMC with R_q = 1 and Q = 2, f_1(x) and f_2(x).]
47 / 76
The covariance between f_d(x) and f_{d′}(x′) is

cov[f_d(x), f_{d′}(x′)] = Σ_{q=1}^{Q} Σ_{i=1}^{R_q} ∫∫ G_{d,q}^i(x − z) G_{d′,q}^i(x′ − z′) k_q(z, z′) dz′ dz.

Choosing G_{d,q}^i(x − z) = a_{d,q}^i δ(x − z) recovers the general LMC:

cov[f_d(x), f_{d′}(x′)] = Σ_{q=1}^{Q} Σ_{i=1}^{R_q} a_{d,q}^i a_{d′,q}^i k_q(x, x′).
48 / 76
K = Σ_{q=1}^{Q} B_q ⊗ K_q.

[Samples: LMC with R_q = 2 and Q = 2, f_1(x) and f_2(x).]
49 / 76
The covariance between f_d(x) and f_{d′}(x′) is

cov[f_d(x), f_{d′}(x′)] = Σ_{q=1}^{Q} Σ_{i=1}^{R_q} ∫∫ G_{d,q}^i(x − z) G_{d′,q}^i(x′ − z′) k_q(z, z′) dz′ dz.

Keeping non-degenerate smoothing kernels G_{d,q}^i gives the full process convolution (PC) construction.
50 / 76
[Samples compared: ICM, LMC and PC, for f_1(x) and f_2(x).]
51 / 76
Kernels for Vector-Valued Functions: A Review
Foundations and Trends® in Machine Learning
© 2012 M. A. Álvarez, L. Rosasco and N. D. Lawrence. DOI: 10.1561/2200000036
52 / 76
53 / 76
Given the kernel functions for {f_d(x)}_{d=1}^{D} that the constructions above led to, and inputs {x_n}_{n=1}^{N}, the prior distribution over the stacked vector f = [f_1^⊤, …, f_D^⊤]^⊤ is given as

p(f | X) = N(f | 0, K_{f,f}).
54 / 76
Each output is observed with noise, y_d(x) = f_d(x) + ε_d(x), where {ε_d(x)}_{d=1}^{D} are independent white Gaussian noise processes with variances σ_d^2. Stacking the observations y = [y_1^⊤, y_2^⊤, …, y_D^⊤]^⊤, the likelihood is

p(y | f) = N(y | f, Σ),

with Σ block-diagonal, blocks σ_d^2 I.
55 / 76
Hyperparameters are estimated by maximising the marginal likelihood p(y | X, θ) = N(y | 0, K_{f,f} + Σ), where {x_n, y_n}_{n=1}^{N} represents the data, and θ represents the hyperparameters of the covariance and the noise.
56 / 76
The predictive distribution at test inputs X∗ is Gaussian with mean K_{f∗,f}(K_{f,f} + Σ)^{−1} y and covariance K_{f∗,f∗} − K_{f∗,f}(K_{f,f} + Σ)^{−1} K_{f,f∗} + Σ∗.
57 / 76
58 / 76
59 / 76
60 / 76
61 / 76
62 / 76
63 / 76
64 / 76
65 / 76
66 / 76
67 / 76
68 / 76
f_d(x) = Σ_{q=1}^{Q} Σ_{i=1}^{R_q} a_{d,q}^i u_q^i(x), with nonorthogonal latent functions u_q^i(x) [Guzmán et al., 2002].
69 / 76
70 / 76
71 / 76
72 / 76
Σ_i (x_i − x′_i)^2 / (4C)
73 / 76
74 / 76
75 / 76
References

Mauricio A. Álvarez, David Luengo, and Neil D. Lawrence. Latent force models. In David van Dyk and Max Welling, editors, Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, pages 9–16, Clearwater Beach, Florida, 16–18 April 2009. JMLR W&CP 5.

Mauricio A. Álvarez, Lorenzo Rosasco, and Neil D. Lawrence. Kernels for vector-valued functions: a review. Foundations and Trends® in Machine Learning, 4(3):195–266, 2012.

Edwin V. Bonilla, Kian Ming Chai, and Christopher K. I. Williams. Multi-task Gaussian process prediction. In John C. Platt, Daphne Koller, Yoram Singer, and Sam Roweis, editors, NIPS, volume 20, Cambridge, MA, 2008. MIT Press.

Phillip Boyle and Marcus Frean. Dependent Gaussian processes. In Lawrence Saul, Yair Weiss, and Léon Bottou, editors, NIPS, volume 17, pages 217–224, Cambridge, MA, 2005. MIT Press.

Catherine A. Calder and Noel Cressie. Some topics in convolution-based spatial modeling. In Proceedings of the 56th Session of the International Statistics Institute, August 2007.

Alan E. Gelfand, Alexandra M. Schmidt, Sudipto Banerjee, and C. F. Sirmans. Nonstationary multivariate process modeling through spatially varying coregionalization. Test, 13(2):263–312, 2004.

Pierre Goovaerts. Geostatistics for Natural Resources Evaluation. Oxford University Press, USA, 1997.

J. A. Vargas Guzmán, A. W. Warrick, and D. E. Myers. Coregionalization by linear combination of nonorthogonal components. Mathematical Geology, 34(4):405–419, 2002.

David M. Higdon. Space and space-time modelling using process convolutions. In C. Anderson, V. Barnett, P. Chatwin, and A. El-Shaarawi, editors, Quantitative Methods for Current Environmental Issues, pages 37–56. Springer-Verlag, 2002.

Andre G. Journel and Charles J. Huijbregts. Mining Geostatistics. Academic Press, London, 1978. ISBN 0-12391-050-1.

Yee Whye Teh, Matthias Seeger, and Michael I. Jordan. Semiparametric latent factor models. In Robert G. Cowell and Zoubin Ghahramani, editors, AISTATS 10, pages 333–340, Barbados, 6–8 January 2005. Society for Artificial Intelligence and Statistics.

Hans Wackernagel. Multivariate Geostatistics. Springer-Verlag, Berlin Heidelberg, 2003.

Andrew Gordon Wilson, David A. Knowles, and Zoubin Ghahramani. Gaussian process regression networks. In Proceedings of the 29th International Conference on Machine Learning, ICML '12, pages 1139–1146, 2012.

76 / 76