Scalable Multi-Class Gaussian Process Classification using Expectation Propagation
Carlos Villacampa-Calvo and Daniel Hernández-Lobato
Computer Science Department, Universidad Autónoma de Madrid
http://dhnzl.org, daniel.hernandez@uam.es
$$y_i = \arg\max_k f^k(\mathbf{x}_i)$$
[Figure: latent functions f(x) plotted against x, and the resulting class labels (1, 2, 3) plotted against x.]
1 Approximate inference is more difficult.
2 C > 2 latent functions instead of just one.
3 Deal with more complicated likelihood factors.
4 More expensive algorithms, computationally.
(Kim & Ghahramani, 2006; Girolami & Rogers, 2006; Chai, 2012; Riihimäki et al., 2013).
$$p(y_i \mid \mathbf{f}_i) = (1 - \epsilon)\, p_i + \frac{\epsilon}{C - 1}\,(1 - p_i), \quad \text{with } p_i = \begin{cases} 1 & \text{if } y_i = \arg\max_k f^k(\mathbf{x}_i), \\ 0 & \text{otherwise.} \end{cases}$$
$$q(\bar{\mathbf{f}}) = \prod_{k=1}^{C} \mathcal{N}(\bar{\mathbf{f}}^k \mid \boldsymbol{\mu}^k, \boldsymbol{\Sigma}^k)$$
$$\bar{\mathbf{f}}^k = \big(f^k(\bar{\mathbf{x}}^k_1), \ldots, f^k(\bar{\mathbf{x}}^k_M)\big)^{\mathsf T}$$
$$\bar{\mathbf{X}}^k = (\bar{\mathbf{x}}^k_1, \ldots, \bar{\mathbf{x}}^k_M)^{\mathsf T}$$
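A minimal sketch of the noisy labelling rule $p(y_i \mid \mathbf{f}_i) = (1 - \epsilon) p_i + \frac{\epsilon}{C-1}(1 - p_i)$, assuming NumPy. The function name `robust_likelihood`, the zero-based labels and the example values are illustrative choices, not taken from the slides.

```python
import numpy as np

def robust_likelihood(f_i, y_i, eps=1e-3):
    """Evaluate (1 - eps) * p_i + eps / (C - 1) * (1 - p_i).

    f_i : array of shape (C,), latent values f^k(x_i) for each class k.
    y_i : int, observed label in {0, ..., C-1} (zero-based, an assumption).
    eps : labelling-noise rate (the default is a hypothetical value).
    """
    f_i = np.asarray(f_i, dtype=float)
    n_classes = f_i.shape[0]
    # p_i is 1 when the observed label matches the arg max of the latents.
    p_i = 1.0 if y_i == int(np.argmax(f_i)) else 0.0
    return (1.0 - eps) * p_i + eps / (n_classes - 1) * (1.0 - p_i)

# Example with C = 3 classes: class 1 has the largest latent value.
f = np.array([0.3, 1.2, -0.5])
probs = [robust_likelihood(f, y, eps=0.1) for y in range(3)]
```

The probabilities over the C labels sum to one for any latent vector, because the mass $\epsilon$ is split evenly among the C - 1 labels that are not the arg max.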
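EP approximates $p(\theta) \propto p_0(\theta) \prod_{n=1}^{N} f_n(\theta)$ with $q(\theta) \propto p_0(\theta) \prod_{n=1}^{N} \tilde{f}_n(\theta)$.
Each $\tilde{f}_n$ is refined by minimizing $\mathrm{KL}[p_n \,\|\, q]$, where
$$p_n(\theta) \propto f_n(\theta)\, p_0(\theta) \prod_{j \neq n} \tilde{f}_j(\theta), \qquad q(\theta) \propto \tilde{f}_n(\theta)\, p_0(\theta) \prod_{j \neq n} \tilde{f}_j(\theta).$$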
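As an illustration of one EP refinement, the sketch below matches moments for a single one-dimensional factor. It assumes a probit factor $f(x) = \Phi(x)$ and a Gaussian cavity $\mathcal{N}(x \mid m, v)$, for which the tilted moments are available in closed form. The probit factor and the function name `ep_probit_site` are stand-ins chosen for this example; they are not the exact factors used in the slides.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ep_probit_site(m_cav, v_cav):
    """Moment-match Phi(x) * N(x | m_cav, v_cav) and return the new site.

    Returns the normalizer Z, the matched mean and variance, and the
    natural parameters (precision, precision * mean) of the refined site,
    obtained by dividing the matched Gaussian by the cavity.
    """
    z = m_cav / math.sqrt(1.0 + v_cav)
    Z = norm_cdf(z)
    ratio = norm_pdf(z) / Z
    mean = m_cav + v_cav * ratio / math.sqrt(1.0 + v_cav)
    var = v_cav - v_cav ** 2 * ratio * (z + ratio) / (1.0 + v_cav)
    # Site = matched Gaussian / cavity, expressed in natural parameters.
    site_prec = 1.0 / var - 1.0 / v_cav
    site_prec_mean = mean / var - m_cav / v_cav
    return Z, mean, var, site_prec, site_prec_mean

# Standard normal cavity: Z = 1/2, mean = 1/sqrt(pi), var = 1 - 1/pi.
Z, mean, var, site_prec, site_prec_mean = ep_probit_site(0.0, 1.0)
```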
$$y_i = \arg\max_k f^k(\mathbf{x}_i)$$
$$p(\mathbf{y} \mid \mathbf{f}) = \prod_{i=1}^{N} p(y_i \mid \mathbf{f}_i) = \prod_{i=1}^{N} \prod_{k \neq y_i} \Theta\big(f^{y_i}(\mathbf{x}_i) - f^k(\mathbf{x}_i)\big)$$
$$p(\bar{\mathbf{f}} \mid \mathbf{y}) = \frac{\int p(\mathbf{y} \mid \mathbf{f})\, p(\mathbf{f} \mid \bar{\mathbf{f}})\, d\mathbf{f}\; p(\bar{\mathbf{f}})}{p(\mathbf{y})} \approx \frac{\Big[\prod_{i=1}^{N} \int p(y_i \mid \mathbf{f}_i)\, p(\mathbf{f}_i \mid \bar{\mathbf{f}})\, d\mathbf{f}_i\Big]\, p(\bar{\mathbf{f}})}{p(\mathbf{y})}$$
using the approximation $p(\mathbf{f} \mid \bar{\mathbf{f}}) \approx \prod_{i=1}^{N} p(\mathbf{f}_i \mid \bar{\mathbf{f}})$.
$$\phi_i(\bar{\mathbf{f}}) = \int \Big[\prod_{k \neq y_i} \Theta\big(f_i^{y_i} - f_i^k\big)\Big] \prod_{k=1}^{C} p(f_i^k \mid \bar{\mathbf{f}}^k)\, d\mathbf{f}_i$$
$$\phi_i(\bar{\mathbf{f}}) = p\big(f_i^{y_i} > f_i^{1}, \ldots, f_i^{y_i} > f_i^{y_i - 1}, f_i^{y_i} > f_i^{y_i + 1}, \ldots, f_i^{y_i} > f_i^{C}\big)$$
$$= p\big(f_i^{y_i} > f_i^{1} \mid \ldots\big) \times p\big(f_i^{y_i} > f_i^{2} \mid \ldots\big) \times \cdots \times p\big(f_i^{y_i} > f_i^{C}\big)$$
$$\approx \prod_{k \neq y_i} p\big(f_i^{y_i} > f_i^{k}\big) = \prod_{k \neq y_i} \Phi(\alpha_i^k)$$
where $\alpha_i^k = (m_i^{y_i} - m_i^k) / \sqrt{v_i^{y_i} + v_i^k}$, with $m_i^{y_i}$, $m_i^k$, $v_i^{y_i}$ and $v_i^k$ the mean and variances of $f_i^{y_i}$ and $f_i^k$.
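The product $\prod_{k \neq y_i} \Phi(\alpha_i^k)$ with $\alpha_i^k = (m_i^{y_i} - m_i^k)/\sqrt{v_i^{y_i} + v_i^k}$ can be evaluated as in the sketch below. The function name `approx_factor`, the zero-based class index and the example inputs are assumptions made for illustration.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def approx_factor(means, variances, y):
    """Return prod_{k != y} Phi((m^y - m^k) / sqrt(v^y + v^k)).

    means, variances : sequences of length C with the mean and variance
        of each latent function value at one data point.
    y : int, zero-based index of the observed class (an assumption).
    """
    value = 1.0
    for k, (m_k, v_k) in enumerate(zip(means, variances)):
        if k == y:
            continue
        alpha = (means[y] - m_k) / math.sqrt(variances[y] + v_k)
        value *= norm_cdf(alpha)
    return value

# Three classes with identical means and unit variances.
symmetric = approx_factor([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], y=0)
# Two classes: the product has a single term and no approximation is made.
binary = approx_factor([1.0, 0.0], [1.0, 1.0], y=0)
```

In the symmetric three-class case the product gives 0.25, whereas the exact probability that one of three exchangeable latent values is the largest is 1/3. The gap shows the effect of treating the events $f_i^{y_i} > f_i^k$ as independent.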
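EP approximates each $\phi_i^k$ with a Gaussian factor:
$$\Phi(\alpha_i^k) = \phi_i^k(\bar{\mathbf{f}}) \approx \tilde{\phi}_i^k(\bar{\mathbf{f}}) = \tilde{s}_{i,k} \exp\Big\{-\tfrac{1}{2} (\bar{\mathbf{f}}^{y_i})^{\mathsf T} \tilde{\mathbf{V}}^{y_i}_{i,k} \bar{\mathbf{f}}^{y_i} + (\bar{\mathbf{f}}^{y_i})^{\mathsf T} \tilde{\mathbf{m}}^{y_i}_{i,k}\Big\} \exp\Big\{-\tfrac{1}{2} (\bar{\mathbf{f}}^{k})^{\mathsf T} \tilde{\mathbf{V}}_{i,k} \bar{\mathbf{f}}^{k} + (\bar{\mathbf{f}}^{k})^{\mathsf T} \tilde{\mathbf{m}}_{i,k}\Big\}$$
where $\tilde{\mathbf{V}}^{y_i}_{i,k}$ and $\tilde{\mathbf{V}}_{i,k}$ are rank-1 matrices. Each $\tilde{\phi}_i^k$ has only $O(M)$ parameters.
$$q(\bar{\mathbf{f}}) = \frac{1}{Z_q} \prod_{i=1}^{N} \prod_{k \neq y_i} \tilde{\phi}_i^k(\bar{\mathbf{f}})\, p(\bar{\mathbf{f}})$$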
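To see why rank-1 precision matrices keep each factor at $O(M)$ parameters, the sketch below builds a Gaussian posterior from a zero-mean Gaussian prior and a set of rank-1 sites. It assumes each site precision can be stored as a scalar $c_i$ and a vector $\mathbf{u}_i$ with $\tilde{\mathbf{V}}_i = c_i \mathbf{u}_i \mathbf{u}_i^{\mathsf T}$; this parameterization, the positive scales, the random inputs and the function name `posterior_from_rank1_sites` are assumptions for illustration and only handle one latent function.

```python
import numpy as np

def posterior_from_rank1_sites(K, scales, directions, lin_terms):
    """Combine a N(0, K) prior with sites exp(-0.5 f^T V_i f + f^T m_i).

    K          : (M, M) prior covariance.
    scales     : (n_sites,) scalars c_i.
    directions : (n_sites, M) vectors u_i, so that V_i = c_i u_i u_i^T.
    lin_terms  : (n_sites, M) linear terms m_i.
    Each site stores 2 * M + 1 numbers instead of a full M x M matrix.
    """
    # Sum of c_i * u_i u_i^T computed without forming each outer product.
    site_precision = (directions.T * scales) @ directions
    precision = np.linalg.inv(K) + site_precision
    cov = np.linalg.inv(precision)
    mean = cov @ lin_terms.sum(axis=0)
    return mean, cov

rng = np.random.default_rng(0)
M, n_sites = 4, 6
A = rng.normal(size=(M, M))
K = A @ A.T + M * np.eye(M)            # a positive definite prior covariance
scales = rng.uniform(0.1, 1.0, size=n_sites)
directions = rng.normal(size=(n_sites, M))
lin_terms = rng.normal(size=(n_sites, M))
mean, cov = posterior_from_rank1_sites(K, scales, directions, lin_terms)
```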
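Maximize $\log Z_q$ w.r.t. $\xi_j^k$ and $\bar{\mathbf{X}}^k$ to find good hyper-parameters.
$$\frac{\partial \log Z_q}{\partial \xi_j^k} = \boldsymbol{\eta}^{\mathsf T} \frac{\partial \boldsymbol{\theta}_{\text{prior}}}{\partial \xi_j^k} - \boldsymbol{\eta}_{\text{prior}}^{\mathsf T} \frac{\partial \boldsymbol{\theta}_{\text{prior}}}{\partial \xi_j^k} + \sum_{i=1}^{N} \sum_{k \neq y_i} \frac{\partial \log Z_i^k}{\partial \xi_j^k}$$
Hernández-Lobato and Hernández-Lobato (2016) show that EP convergence is not needed.
[Figure: log Z_q against training time in seconds for EP - Inner - Approx. Gradient, EP - Inner - Exact Gradient and EP - Outer Update.]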
1 Refine in parallel all approximate factors $\tilde{\phi}_i^k$.
2 Reconstruct the posterior approximation $q$.
3 Get a noisy estimate of the gradient of $\log Z_q$ w.r.t. each $\xi_j^k$ and $\bar{x}^k_{i,d}$.
4 Update all model hyper-parameters.
5 Reconstruct the posterior approximation $q$.
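One common way to obtain a noisy gradient estimate, assumed here rather than stated in the slides, is to evaluate the per-data-point terms of the gradient on a minibatch and rescale by N divided by the minibatch size. The sketch uses random stand-in gradient terms; `minibatch_gradient` and all array sizes are hypothetical.

```python
import numpy as np

def minibatch_gradient(per_point_grads, batch_idx):
    """Estimate sum_i g_i from a minibatch, rescaled by N / |batch|."""
    n_points = per_point_grads.shape[0]
    scale = n_points / len(batch_idx)
    return scale * per_point_grads[batch_idx].sum(axis=0)

rng = np.random.default_rng(0)
n_points, n_params, n_batches = 1000, 3, 10
# Stand-in for the per-data-point gradient contributions.
g = rng.normal(size=(n_points, n_params))
full_gradient = g.sum(axis=0)

# Split a random permutation into disjoint minibatches of equal size.
batches = np.split(rng.permutation(n_points), n_batches)
estimates = np.array([minibatch_gradient(g, b) for b in batches])
```

Each row of `estimates` is a noisy version of `full_gradient`, and averaging over the disjoint minibatches of one pass recovers the full sum exactly, which is why the rescaled estimate is unbiased.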
SEP ties all the approximate factors $\tilde{\phi}_i^k$:
$$\prod_{i=1}^{N} \prod_{k \neq y_i} \tilde{\phi}_i^k = \tilde{\phi}^{\,N_{\text{factors}}}.$$
[Figure: EP compared with SEP.]
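When all factors share one global factor $\tilde{\phi}$ and $q \propto p \cdot \tilde{\phi}^{N_{\text{factors}}}$, the shared factor and the cavity follow from simple arithmetic on natural parameters. The sketch below shows this for a one-dimensional Gaussian with natural parameters (precision times mean, precision). The function name `sep_cavity` and the numbers are illustrative assumptions.

```python
import numpy as np

def sep_cavity(q_nat, prior_nat, n_factors):
    """Return the cavity and the shared factor in natural parameters.

    With q = prior * shared^n_factors, the shared factor has natural
    parameters (q_nat - prior_nat) / n_factors, and the cavity removes
    one copy of it from q.
    """
    shared_nat = (q_nat - prior_nat) / n_factors
    cavity_nat = q_nat - shared_nat
    return cavity_nat, shared_nat

q_nat = np.array([2.0, 5.0])       # (precision * mean, precision) of q
prior_nat = np.array([0.0, 1.0])   # zero-mean, unit-variance prior
cavity_nat, shared_nat = sep_cavity(q_nat, prior_nat, n_factors=4)
```

Only `shared_nat` has to be stored, whatever the number of factors, whereas EP keeps a separate set of parameters for every factor.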
C
k=1
NN diag
NN Qk NN
Problem       GFITC          EP             SEP            VI
M = 5%
Glass         0.23 ± 0.02    0.31 ± 0.02    0.31 ± 0.02    0.35 ± 0.02
New-thyroid   0.02 ± 0.01    0.04 ± 0.01    0.02 ± 0.01    0.03 ± 0.01
Satellite     0.12 ± 0.01    0.11 ± 0.01    0.12 ± 0.01    0.12 ± 0.01
Svmguide2     0.2 ± 0.01     0.2 ± 0.01     0.2 ± 0.02     0.19 ± 0.01
Vehicle       0.17 ± 0.01    0.17 ± 0.01    0.16 ± 0.01    0.17 ± 0.01
Vowel         0.05 ± 0.01    0.09 ± 0.01    0.09 ± 0.01    0.06 ± 0.01
Waveform      0.17 ± 0.01    0.15 ± 0.01    0.16 ± 0.01    0.17 ± 0.01
Wine          0.03 ± 0.01    0.03 ± 0.01    0.03 ± 0.01    0.04 ± 0.01
Avg. rank     2.24 ± 0.07    2.33 ± 0.07    2.61 ± 0.06    2.82 ± 0.08
Avg. time     131 ± 3.11     53.8 ± 0.19    48.5 ± 0.97    157 ± 0.59
M = 10%
Glass         0.2 ± 0.01     0.29 ± 0.02    0.3 ± 0.02     0.35 ± 0.02
New-thyroid   0.03 ± 0.01    0.02 ± 0.01    0.03 ± 0.01    0.03 ± 0.01
Satellite     0.11 ± 0.01    0.11 ± 0.01    0.12 ± 0.01    0.12 ± 0.01
Svmguide2     0.19 ± 0.02    0.2 ± 0.02     0.2 ± 0.02     0.17 ± 0.02
Vehicle       0.17 ± 0.01    0.16 ± 0.01    0.16 ± 0.01    0.15 ± 0.01
Vowel         0.03 ± 0.01    0.05 ± 0.01    0.06 ± 0.01    0.06 ± 0.01
Waveform      0.17 ± 0.01    0.16 ± 0.01    0.16 ± 0.01    0.18 ± 0.01
Wine          0.04 ± 0.01    0.02 ± 0.01    0.03 ± 0.01    0.03 ± 0.01
Avg. rank     2.4 ± 0.08     2.21 ± 0.07    2.62 ± 0.06    2.76 ± 0.08
Avg. time     264 ± 6.91     102 ± 0.64     96.6 ± 1.99    179 ± 0.78
M = 20%
Glass         0.2 ± 0.02     0.28 ± 0.02    0.28 ± 0.02    0.36 ± 0.02
New-thyroid   0.03 ± 0.01    0.02 ± 0.01    0.02 ± 0.01    0.03 ± 0.01
Satellite     0.11 ± 0.01    0.11 ± 0.01    0.12 ± 0.01    0.11 ± 0.01
Svmguide2     0.2 ± 0.01     0.19 ± 0.01    0.2 ± 0.02     0.19 ± 0.02
Vehicle       0.17 ± 0.01    0.16 ± 0.01    0.16 ± 0.01    0.15 ± 0.01
Vowel         0.03 ± 0.01    0.03 ± 0.01    0.05 ± 0.01    0.03 ± 0.01
Waveform      0.17 ± 0.01    0.16 ± 0.01    0.17 ± 0.01    0.18 ± 0.01
Wine          0.04 ± 0.01    0.01 ± 0.01    0.03 ± 0.01    0.03 ± 0.01
Avg. rank     2.48 ± 0.08    2.06 ± 0.07    2.69 ± 0.07    2.77 ± 0.08
Avg. time     683 ± 17.3     228 ± 0.78     216 ± 2.88     248 ± 0.66
Problem       GFITC          EP             SEP            VI
M = 5%
Glass         0.61 ± 0.05    0.78 ± 0.06    0.77 ± 0.07    2.45 ± 0.14
New-thyroid   0.06 ± 0.01    0.11 ± 0.03    0.06 ± 0.01    0.09 ± 0.02
Satellite     0.33 ± 0.01    0.31 ± 0.01    0.33 ± 0.01    0.61 ± 0.01
Svmguide2     0.63 ± 0.06    0.63 ± 0.06    0.67 ± 0.06    1.03 ± 0.08
Vehicle       0.32 ± 0.01    0.34 ± 0.02    0.34 ± 0.02    0.76 ± 0.05
Vowel         0.16 ± 0.01    0.25 ± 0.01    0.25 ± 0.01    0.41 ± 0.05
Waveform      0.42 ± 0.01    0.36 ± 0.01    0.39 ± 0.01    0.89 ± 0.02
Wine          0.08 ± 0.02    0.07 ± 0.01    0.08 ± 0.01    0.08 ± 0.02
Avg. rank     1.92 ± 0.07    2.09 ± 0.07    2.46 ± 0.06    3.52 ± 0.08
Avg. time     131 ± 3.11     53.8 ± 0.19    48.5 ± 0.97    157 ± 0.59
M = 10%
Glass         0.58 ± 0.05    0.74 ± 0.06    0.79 ± 0.07    2.18 ± 0.14
New-thyroid   0.07 ± 0.01    0.06 ± 0.01    0.06 ± 0.01    0.05 ± 0.01
Satellite     0.34 ± 0.01    0.30 ± 0.01    0.34 ± 0.01    0.58 ± 0.01
Svmguide2     0.67 ± 0.05    0.67 ± 0.05    0.74 ± 0.07    0.90 ± 0.10
Vehicle       0.33 ± 0.01    0.33 ± 0.02    0.34 ± 0.02    0.72 ± 0.04
Vowel         0.14 ± 0.01    0.19 ± 0.01    0.19 ± 0.01    0.30 ± 0.04
Waveform      0.42 ± 0.01    0.36 ± 0.01    0.41 ± 0.01    0.85 ± 0.01
Wine          0.07 ± 0.01    0.06 ± 0.01    0.07 ± 0.01    0.07 ± 0.01
Avg. rank     2.11 ± 0.08    2.01 ± 0.08    2.58 ± 0.07    3.31 ± 0.1
Avg. time     264 ± 6.91     102 ± 0.64     96.6 ± 1.99    179 ± 0.78
M = 20%
Glass         0.6 ± 0.07     0.75 ± 0.06    0.81 ± 0.07    2.30 ± 0.15
New-thyroid   0.07 ± 0.01    0.06 ± 0.01    0.05 ± 0.01    0.05 ± 0.01
Satellite     0.34 ± 0.01    0.30 ± 0.01    0.36 ± 0.01    0.53 ± 0.01
Svmguide2     0.67 ± 0.05    0.65 ± 0.06    0.74 ± 0.07    0.94 ± 0.08
Vehicle       0.33 ± 0.01    0.33 ± 0.02    0.34 ± 0.02    0.63 ± 0.04
Vowel         0.12 ± 0.01    0.16 ± 0.01    0.18 ± 0.01    0.15 ± 0.03
Waveform      0.43 ± 0.01    0.37 ± 0.01    0.45 ± 0.01    0.80 ± 0.01
Wine          0.07 ± 0.01    0.05 ± 0.01    0.06 ± 0.01    0.06 ± 0.02
Avg. rank     2.17 ± 0.07    1.91 ± 0.07    2.68 ± 0.06    3.23 ± 0.1
Avg. time     683 ± 17.3     228 ± 0.78     216 ± 2.88     248 ± 0.66
[Figure: two panels with legend M = 1, M = 2, M = 4, M = 8, M = 32, M = 128, M = 256 and GFITC.]
[Figure: two panels with legend GFITC, SEP and VI, each for M = 4, M = 20 and M = 100.]
[Figure: two panels against training time in seconds (log10 scale) for the methods EP, SEP and VI; the first panel shows test error.]
[Figure: two panels against training time in seconds (log10 scale) for the methods EP, Linear, SEP and VI; the first panel shows test error.]
[Figure: x-axis is training time (log10 scale).]
Gaussian process approximations. NIPS 29, pp. 1533-1541. 2016.
2012.
Gaussian process priors. Neural Computation, 18:1790-1817, 2006.
variationally sparse Gaussian processes. NIPS 28, pp. 1648-1656. 2015.
Hernández-Lobato, D. and Hernández-Lobato, J. M. Scalable Gaussian process classification via expectation propagation. AISTATS, pp. 168-176, 2016.
EM-EP algorithm. IEEE PAMI, 28, 1948-1959, 2006.
1057-1064. 2008.
Riihimäki, J., Jylänki, P., and Vehtari, A. Nested expectation propagation for Gaussian process classification with a multinomial probit likelihood. JMLR, 14, 75-109, 2013.
18, pp. 1257-1264, 2006.
PAMI, 20, 1342-1351, 1998.