Scalable Gaussian Processes
Zhenwen Dai
Amazon
September 4, 2018 @GPSS2018
Gaussian process
Input and Output Data: $X = (x_1, \ldots, x_N)$, $y = (y_1,$
[Figure: legend entries Mean, Data, Confidence]
[Figure: time (second) versus data size (N); legend entries Mean, Data, Confidence]
$\frac{1}{2l^2}(x_i - x_j)^\top (x_i - x_j)$
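A minimal sketch, assuming the expression above is the exponent of an exponentiated-quadratic (RBF) kernel with lengthscale $l$. The function name `rbf_kernel` and the `variance` scale parameter are illustrative assumptions, not taken from the slides.

```python
import numpy as np


def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Exponentiated-quadratic kernel (assumed form; `variance` is a hypothetical scale)."""
    # Squared distances (x_i - x_j)^T (x_i - x_j) for all pairs of rows.
    sq_dist = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return variance * np.exp(-sq_dist / (2.0 * lengthscale**2))
```

Building the full N x N matrix this way costs O(N^2) memory, which is the starting point for the approximations that follow.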
[Figure: legend entries Mean, Data, Confidence]
[Figure: legend entries Mean, Inducing, Data, Confidence]
$K_z K_{zz}^{-1} K_z^\top$, where $K_z = K(X, Z)$ and $K_{zz} = K(Z, Z)$.
$K_z K_{zz}^{-1} K_z^\top + \sigma^2 I$
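$(K_z K_{zz}^{-1} K_z^\top + \sigma^2 I)^{-1} = \sigma^{-2} I - \sigma^{-4} K_z (K_{zz} + \sigma^{-2} K_z^\top K_z)^{-1} K_z^\top$
$(K_{zz} + \sigma^{-2} K_z^\top K_z) \in \mathbb{R}^{M \times M}$.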
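The identity above can be checked numerically. The following is a sketch under stated assumptions: the kernel `rbf`, the data, and the inducing inputs `Z` are made up for illustration, and a small jitter is added to $K_{zz}$ for numerical stability. Only an M x M system is solved in `lowrank_inverse`.

```python
import numpy as np


def rbf(A, B, lengthscale=0.2):
    # Unit-variance exponentiated-quadratic kernel (assumed for illustration).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)


def lowrank_inverse(Kz, Kzz, sigma2):
    """(Kz Kzz^{-1} Kz^T + sigma2 I)^{-1} using only an M x M solve."""
    N = Kz.shape[0]
    inner = Kzz + Kz.T @ Kz / sigma2  # M x M matrix
    return np.eye(N) / sigma2 - Kz @ np.linalg.solve(inner, Kz.T) / sigma2**2


rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 1))  # N = 50 inputs
Z = np.linspace(0, 1, 5)[:, None]    # M = 5 inducing inputs
sigma2 = 0.1

Kz = rbf(X, Z)
Kzz = rbf(Z, Z) + 1e-8 * np.eye(5)   # jitter for stability

direct = np.linalg.inv(Kz @ np.linalg.solve(Kzz, Kz.T) + sigma2 * np.eye(50))
woodbury = lowrank_inverse(Kz, Kzz, sigma2)
max_abs_diff = np.max(np.abs(direct - woodbury))
```

The direct route inverts an N x N matrix at O(N^3) cost, while the right-hand side is dominated by forming $K_z^\top K_z$ at O(NM^2).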
$\mathcal{N}\left(y \mid K_{fu} K_{uu}^{-1} u,\; K_{ff} - K_{fu} K_{uu}^{-1} K_{fu}^\top + \sigma^2 I\right)$
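$K_{fu} K_{uu}^{-1} K_{fu}^\top + \sigma^2 I$.
$\mathcal{N}\left(y \mid K_{fu} K_{uu}^{-1} u,\; \Lambda + \sigma^2 I\right)$
$\Lambda = (K_{ff} - K_{fu} K_{uu}^{-1} K_{fu}^\top) \circ I$.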
$K_{fu} K_{uu}^{-1} K_{fu}^\top + \Lambda + \sigma^2 I$
$(K_z K_{zz}^{-1} K_z^\top + \Lambda + \sigma^2 I)^{-1} = A - A K_z (K_{zz} + K_z^\top A K_z)^{-1} K_z^\top A$,
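A sketch of the inversion with the diagonal correction, assuming $A = (\Lambda + \sigma^2 I)^{-1}$, which is what the matrix inversion lemma requires for the identity above to hold. Because $\Lambda$ is diagonal, $A$ is stored as a vector. The kernel, data, and helper names are illustrative assumptions.

```python
import numpy as np


def rbf(A, B, lengthscale=0.2):
    # Unit-variance exponentiated-quadratic kernel (assumed for illustration).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)


def fitc_inverse(Kff_diag, Kz, Kzz, sigma2):
    """Return (Kz Kzz^{-1} Kz^T + Lambda + sigma2 I)^{-1} and diag(Lambda)."""
    # diag(Kz Kzz^{-1} Kz^T) without forming the N x N matrix.
    Qff_diag = np.sum(Kz * np.linalg.solve(Kzz, Kz.T).T, axis=1)
    lam = Kff_diag - Qff_diag            # diagonal of Lambda
    a = 1.0 / (lam + sigma2)             # diagonal of A (assumed definition)
    AKz = a[:, None] * Kz                # A Kz, using that A is diagonal
    inner = Kzz + Kz.T @ AKz             # M x M matrix
    return np.diag(a) - AKz @ np.linalg.solve(inner, AKz.T), lam


rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(40, 1))
Z = np.linspace(0, 1, 5)[:, None]
sigma2 = 0.1
Kz, Kzz = rbf(X, Z), rbf(Z, Z) + 1e-8 * np.eye(5)

# The diagonal of a unit-variance RBF kernel is all ones.
fast, lam = fitc_inverse(np.ones(40), Kz, Kzz, sigma2)
direct = np.linalg.inv(
    Kz @ np.linalg.solve(Kzz, Kz.T) + np.diag(lam) + sigma2 * np.eye(40)
)
fitc_max_abs_diff = np.max(np.abs(fast - direct))
```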
$\mathcal{N}\left(y \mid K_{fu} K_{uu}^{-1} u,\; K_{ff} - K_{fu} K_{uu}^{-1} K_{fu}^\top + \sigma^2 I\right)$
$\mathcal{N}\left(y \mid K_{fu} K_{uu}^{-1} u,\; \sigma^2 I\right)$
$(K_{fu} K_{uu}^{-1} u - y)^\top (K_{fu} K_{uu}^{-1} u - y)$
$\mu, \Sigma$
$K_{fu} K_{uu}^{-1} K_{fu}^\top + \sigma^2 I$
$K_{fu} K_{uu}^{-1} K_{fu}^\top$
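A minimal sketch, assuming the fragments above belong to the collapsed variational lower bound for sparse GP regression, $\log \mathcal{N}(y \mid 0, Q + \sigma^2 I) - \frac{1}{2\sigma^2}\mathrm{tr}(K_{ff} - Q)$ with $Q = K_{fu} K_{uu}^{-1} K_{fu}^\top$. The kernel, the toy data, and all function names are illustrative assumptions. The example compares the bound with the exact log marginal likelihood, which it should never exceed.

```python
import numpy as np


def rbf(A, B, lengthscale=0.2):
    # Unit-variance exponentiated-quadratic kernel (assumed for illustration).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)


def gaussian_logpdf_zero_mean(y, C):
    """log N(y | 0, C) via a Cholesky factorisation."""
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L, y)
    return (
        -0.5 * alpha @ alpha
        - np.log(np.diag(L)).sum()
        - 0.5 * len(y) * np.log(2 * np.pi)
    )


def collapsed_bound(X, y, Z, sigma2, jitter=1e-8):
    Kfu = rbf(X, Z)
    Kuu = rbf(Z, Z) + jitter * np.eye(len(Z))
    Qff = Kfu @ np.linalg.solve(Kuu, Kfu.T)
    # Formed densely here for clarity; a scalable version would avoid N x N matrices.
    trace_term = np.trace(rbf(X, X) - Qff)
    fit_term = gaussian_logpdf_zero_mean(y, Qff + sigma2 * np.eye(len(y)))
    return fit_term - 0.5 * trace_term / sigma2


rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(40, 1))
y = np.sin(6 * X[:, 0]) + 0.1 * rng.standard_normal(40)
Z = np.linspace(0, 1, 6)[:, None]
sigma2 = 0.01

exact = gaussian_logpdf_zero_mean(y, rbf(X, X) + sigma2 * np.eye(40))
bound = collapsed_bound(X, y, Z, sigma2)
```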
[Figure: legend entries Mean, Inducing, Data, Confidence]
$\prod_{l=1}^{L-1} p(h_l \mid h_{l+1})\, p(h_L)$
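The factorisation above suggests ancestral sampling: draw the top layer $h_L$ first, then each lower layer given the one above it. The sketch below assumes binary units with sigmoid conditionals; that parameterisation, the layer sizes, and the weights are hypothetical choices for illustration and are not stated in the slides.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def ancestral_sample(weights, biases, top_prior, rng):
    """Sample h_L ~ p(h_L), then h_l ~ p(h_l | h_{l+1}) for l = L-1, ..., 1."""
    h = (rng.random(top_prior.shape) < top_prior).astype(float)  # h_L
    layers = [h]
    for W, b in zip(weights, biases):      # ordered from the top layer downwards
        p = sigmoid(W @ h + b)             # assumed form of p(h_l = 1 | h_{l+1})
        h = (rng.random(p.shape) < p).astype(float)
        layers.append(h)
    return layers[::-1]                    # [h_1, ..., h_L]


rng = np.random.default_rng(0)
sizes = [10, 100, 100]                     # top layer first (illustrative sizes)
weights = [
    0.1 * rng.standard_normal((n_out, n_in))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]
biases = [np.zeros(n) for n in sizes[1:]]
layers = ancestral_sample(weights, biases, top_prior=np.full(10, 0.5), rng=rng)
```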
3 hidden layers (100-100-10). The generated examples from each value in the top layer: 1024 examples in total. Columns encode the first 5 bits; rows encode the last 5 bits.
$\mathcal{N}\left(u \mid K_{fu}^\top K_{ff}^{-1} f,\; K_{uu} - K_{fu}^\top K_{ff}^{-1} K_{fu}\right)$
$D = \bigcup_{c=1}^{C} D_c$.
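$l_c = \sum_{n_c \in D_c} \lVert y_{n_c} - f_\theta(x_{n_c}) \rVert^2$.
$l = \sum_{c=1}^{C} l_c$ and $\partial l / \partial \theta = \sum_{c=1}^{C} \partial l_c / \partial \theta$.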
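The data-parallel scheme above can be sketched as follows: each chunk $D_c$ yields a local loss $l_c$ and gradient, and the totals are plain sums over chunks. The linear model $f_\theta(x) = x^\top \theta$ and the function names are hypothetical stand-ins; the list comprehension marks where the per-worker computation would run in parallel.

```python
import numpy as np


def chunk_loss_and_grad(theta, X_c, y_c):
    """l_c and dl_c/dtheta for an assumed linear model f_theta(x) = x @ theta."""
    resid = y_c - X_c @ theta
    return resid @ resid, -2.0 * X_c.T @ resid


def distributed_loss_and_grad(theta, chunks):
    # Each tuple could be computed on a separate worker; here it runs serially.
    results = [chunk_loss_and_grad(theta, X_c, y_c) for X_c, y_c in chunks]
    loss = sum(l for l, _ in results)      # l = sum_c l_c
    grad = sum(g for _, g in results)      # dl/dtheta = sum_c dl_c/dtheta
    return loss, grad


rng = np.random.default_rng(0)
X = rng.standard_normal((90, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(90)
theta = np.zeros(3)

# Split the data into C = 3 chunks.
chunks = list(zip(np.array_split(X, 3), np.array_split(y, 3)))
loss, grad = distributed_loss_and_grad(theta, chunks)
full_loss, full_grad = chunk_loss_and_grad(theta, X, y)
```

Only the scalar loss and a gradient of the size of $\theta$ need to be communicated per chunk, independent of the chunk size.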
$\ldots K_{fu}^\top y - \ldots$
$\ldots K_{uu}^{-1} \Phi \ldots$
$\Phi = K_{fu}^\top K_{fu}$ and $\phi = \mathrm{tr}(K_{ff})$.
$\Phi = \sum_{n=1}^{N} K_{f_n u}^\top K_{f_n u}$
$y_c^\top y_c$, $y_c^\top K_{f_c u}$,
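A sketch of the per-chunk statistics, assuming the quantities communicated by each worker are $y_c^\top y_c$, $y_c^\top K_{f_c u}$, $K_{f_c u}^\top K_{f_c u}$ and $\mathrm{tr}(K_{f_c f_c})$. Each is a sum over data points, so the full-data values are sums of the chunk values. The kernel, the dictionary keys, and the function names are illustrative assumptions.

```python
import numpy as np


def rbf(A, B, lengthscale=0.2):
    # Unit-variance exponentiated-quadratic kernel (assumed for illustration).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)


def rbf_diag(A):
    # Diagonal of the unit-variance kernel above: k(x, x) = 1.
    return np.ones(len(A))


def local_statistics(X_c, y_c, Z):
    """Statistics computed on one chunk; sizes depend on M, not on the chunk size."""
    Kfcu = rbf(X_c, Z)
    return {
        "yTy": y_c @ y_c,
        "yTKfu": y_c @ Kfcu,
        "Phi": Kfcu.T @ Kfcu,
        "phi": rbf_diag(X_c).sum(),
    }


def aggregate(stats_list):
    # Sum each statistic over chunks.
    return {k: sum(s[k] for s in stats_list) for k in stats_list[0]}


rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 1))
y = np.sin(6 * X[:, 0]) + 0.1 * rng.standard_normal(60)
Z = np.linspace(0, 1, 5)[:, None]

chunks = zip(np.array_split(X, 4), np.array_split(y, 4))
total = aggregate([local_statistics(X_c, y_c, Z) for X_c, y_c in chunks])
full = local_statistics(X, y, Z)
```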
[Figure: average time per iteration (seconds) versus number of datapoints; legend entries 1, 2, 4, 8, 16, 32 CPUs and 1, 2, 4 GPUs]
[Figure: percentage of indistributable computational time versus number of datapoints; legend entries 1, 2, 4, 8, 16, 32 CPU cores and 1, 2, 4 GPUs]
GTX 580 GPU has only 3 GB of memory
◮ writing new kernel with auto-differentiation
◮ scalable inference on GPU
◮ Construct hybrid GP, deep GP, recurrent GP by re-using GP module with