Measuring Sample Quality with Kernels
Lester Mackey∗
Joint work with Jackson Gorham†
Microsoft Research∗, Opendoor Labs†
June 25, 2018
Mackey (MSR) Kernel Stein Discrepancy June 25, 2018 1 / 31
$\overset{\text{ind}}{\sim}$ … $\frac{1}{1+e^{-\langle \beta, v_l \rangle}}$
$\frac{1}{n}\sum_{i=1}^{n} h(x_i)$
[Welling and Teh, 2011, Ahn, Korattikara, and Welling, 2012, Korattikara, Chen, and Welling, 2014]
$\frac{1}{n}\sum_{i=1}^{n} h(x_i)$ used to approximate $\mathbb{E}_P[h(Z)]$
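As a minimal sketch of the sample estimate above, the average of a test function over sample points can be compared against a known expectation. The standard normal target, the test function h(x) = x^2, and the helper name `sample_mean` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def sample_mean(h, xs):
    """Return (1/n) * sum_i h(x_i) for the sample points xs."""
    xs = np.asarray(xs, dtype=float)
    return float(np.mean(h(xs)))

# Hypothetical check: for Z ~ N(0, 1), E[Z^2] = 1.
rng = np.random.default_rng(0)
xs = rng.standard_normal(100_000)
estimate = sample_mean(lambda x: x**2, xs)
```

This comparison only works when the expectation is known in closed form, which is typically not the case for a posterior target.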
[Müller, 1997]
$\sup_{h \in \mathcal{H}}$
$\frac{|h(x)-h(y)|}{\|x-y\|}$
g∈G
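$\lim_{t \to 0} \left(\mathbb{E}[u(Z_t) \mid Z_0 = x] - u(x)\right)/t$
$dZ_t = \frac{1}{2}\nabla \log p(Z_t)\,dt + dW_t$
$\frac{1}{2}\langle \nabla u(x), \nabla \log p(x) \rangle + \frac{1}{2}\langle \nabla, \nabla u(x) \rangle$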
[Gorham and Mackey, 2015, Oates, Girolami, and Chopin, 2016]
$g(x)\,\frac{d}{dx}\log p(x) + g'(x)$ [Stein, Diaconis, Holmes, and Reinert, 2004]
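The one-dimensional operator above can be checked numerically: for a smooth, bounded g, the expectation of g(x) d/dx log p(x) + g'(x) under P should vanish. The sketch below assumes a standard normal target and the test function g(x) = sin(x); both choices, and the helper name `stein_operator_1d`, are illustrative assumptions.

```python
import numpy as np

def stein_operator_1d(g, dg, score):
    """Return the function x -> g(x) * (d/dx log p)(x) + g'(x)."""
    return lambda x: g(x) * score(x) + dg(x)

# Assumed target: standard normal, so (d/dx) log p(x) = -x.
score = lambda x: -x
density = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

# Assumed test function g(x) = sin(x), with derivative cos(x).
tg = stein_operator_1d(np.sin, np.cos, score)

# Approximate E_P[(T_P g)(Z)] with a Riemann sum on a wide uniform grid.
xs = np.linspace(-12.0, 12.0, 200_001)
dx = xs[1] - xs[0]
stein_expectation = float(np.sum(tg(xs) * density(xs)) * dx)
```

Here the integrand is the exact derivative of sin(x) p(x), so the integral is zero up to numerical error.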
($\sum_{i,l} c_i c_l\, k(z_i, z_l) \ge 0,\ \forall z_i \in \mathcal{X},\ c_i \in \mathbb{R}$)
… $\tfrac{1}{2}\|x-y\|_2^2$
… $\|x-y\|_2^2)^{-1/2}$
and Mackey, 2017]
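The positive semidefiniteness condition on a kernel can be probed empirically: for any finite set of points, the Gram matrix should have no negative eigenvalues. The sketch below is a hedged illustration; the Gaussian form exp(-||x - y||^2 / 2), the inverse multiquadric parametrization (c^2 + ||x - y||^2)^beta with c = 1 and beta = -1/2, and all function names are assumptions made for this example.

```python
import numpy as np

def gram_matrix(kernel, zs):
    """Return the matrix K with K[i, l] = kernel(z_i, z_l)."""
    n = len(zs)
    return np.array([[kernel(zs[i], zs[l]) for l in range(n)] for i in range(n)])

def gaussian_kernel(x, y):
    # Assumed form: exp(-||x - y||_2^2 / 2).
    return np.exp(-0.5 * np.sum((x - y) ** 2))

def imq_kernel(x, y, c=1.0, beta=-0.5):
    # Assumed parametrization: (c^2 + ||x - y||_2^2)^beta.
    return (c**2 + np.sum((x - y) ** 2)) ** beta

rng = np.random.default_rng(0)
zs = rng.standard_normal((30, 2))

# Smallest eigenvalue of each Gram matrix; nonnegative up to round-off
# when sum_{i,l} c_i c_l k(z_i, z_l) >= 0 holds for every choice of c.
min_eigs = {
    name: float(np.linalg.eigvalsh(gram_matrix(k, zs)).min())
    for name, k in [("gaussian", gaussian_kernel), ("imq", imq_kernel)]
}
```

A nonnegative spectrum on one point set does not prove the property; it only fails to contradict it.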
$\sum_{i,i'=1}^{n} k_0^j(x_i, x_{i'})$.
$k_0^j(x, y) = \frac{1}{p(x)p(y)} \nabla_{x_j} \nabla_{y_j}\left(p(x)\,k(x, y)\,p(y)\right)$
$\sum_{j=1}^{d} k_0^j$ of Oates, Girolami, and Chopin [2016]
Gretton [2016], Liu, Lee, and Jordan [2016]
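A hedged sketch of how a kernel Stein discrepancy might be computed from pairwise Stein kernel evaluations follows. Expanding the definition of the Stein kernel with the product rule gives, with s = grad log p, the sum over coordinates as div_x div_y k + <grad_x k, s(y)> + <grad_y k, s(x)> + k <s(x), s(y)>. The code assumes an inverse multiquadric base kernel (c^2 + ||x - y||^2)^beta with c = 1 and beta = -1/2, equal sample weights 1/n, and a standard normal target in the usage lines; the function names are hypothetical.

```python
import numpy as np

def imq_stein_kernel_matrix(xs, score, c=1.0, beta=-0.5):
    """Matrix of k_0(x_i, x_i') summed over coordinates j, for an IMQ base kernel."""
    xs = np.asarray(xs, dtype=float)
    n, d = xs.shape
    s = score(xs)                              # (n, d): grad log p at each point
    r = xs[:, None, :] - xs[None, :, :]        # (n, n, d): differences x - y
    sq = np.sum(r**2, axis=-1)                 # (n, n): squared distances
    u = c**2 + sq
    k = u**beta                                # base kernel values
    grad_x = 2.0 * beta * u[..., None] ** (beta - 1) * r   # grad_x k; grad_y k = -grad_x
    # sum_j d/dx_j d/dy_j k(x, y)
    trace = (-4.0 * beta * (beta - 1) * u ** (beta - 2) * sq
             - 2.0 * beta * d * u ** (beta - 1))
    sx = s[:, None, :]                         # score at x_i
    sy = s[None, :, :]                         # score at x_i'
    return (trace
            + np.sum(grad_x * sy, axis=-1)     # <grad_x k, s(y)>
            - np.sum(grad_x * sx, axis=-1)     # <grad_y k, s(x)>
            + k * np.sum(sx * sy, axis=-1))    # k <s(x), s(y)>

def ksd(xs, score, **kernel_params):
    """KSD of the equally weighted sample: sqrt(sum_{i,i'} k_0(x_i, x_i')) / n."""
    k0 = imq_stein_kernel_matrix(xs, score, **kernel_params)
    return float(np.sqrt(max(k0.sum(), 0.0)) / len(xs))

# Assumed target N(0, I), so the score is -x.
score = lambda x: -x
rng = np.random.default_rng(0)
on_target = rng.standard_normal((300, 2))
off_target = on_target + 2.0                   # same points shifted away from the target
ksd_on = ksd(on_target, score)
ksd_off = ksd(off_target, score)
```

Only the score of the target enters the computation, so the normalizing constant of p is never needed. The pairwise matrix costs O(n^2 d) time and memory, which limits this direct version to moderate n.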
$\|x-y\|_2^2$
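$(\frac{1}{2} - \frac{1}{d})^{-1}$. If $k(x, y)$ and its
… ) rate as $\|x - y\|_2 \to \infty$, then
… $\tfrac{1}{2}\|x-y\|_2^2$) and Matérn
… $\|x-y\|_2^2)^{\beta}$) with
$\frac{2\beta}{1+\beta}$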
$(c^2 + \|x-y\|_2^2)^{\beta}$ for some $\beta < 0$, $c \in \mathbb{R}$.
$(c^2 + \|x-y\|_2^2)^{\beta}$ for $\beta \in (-1, 0)$. If
[Figure: discrepancy value (including Wasserstein) vs. number of sample points n, for samples i.i.d. from the mixture target P and i.i.d. from a single mixture component; companion panels plot g and h = T_P g against x.]
$e^{-\frac{1}{2}(x+1.5)^2} + e^{-\frac{1}{2}(x-1.5)^2}$, compare an i.i.d.
… $n$ from one component
$S(\ldots_{1:n}, \mathcal{T}_P, \mathcal{G}_k) \to 0$
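The two-component mixture target in this example only enters a Stein discrepancy through its score, d/dx log p(x). The sketch below writes that score down and draws the two kinds of samples being compared. It assumes the unnormalized density exp(-(x+1.5)^2/2) + exp(-(x-1.5)^2/2); the choice of the component centred at +1.5 for the single-component sample, and all function names, are assumptions for illustration.

```python
import numpy as np

def mixture_log_density(x, shift=1.5):
    """Unnormalized log p(x) for p(x) ∝ exp(-(x+shift)^2/2) + exp(-(x-shift)^2/2)."""
    return np.logaddexp(-0.5 * (x + shift) ** 2, -0.5 * (x - shift) ** 2)

def mixture_score(x, shift=1.5):
    """d/dx log p(x): a responsibility-weighted average of the component scores."""
    # Weight of the component centred at -shift, computed on the log scale.
    w_left = np.exp(-0.5 * (x + shift) ** 2 - mixture_log_density(x, shift))
    return w_left * (-(x + shift)) + (1.0 - w_left) * (-(x - shift))

def draw_samples(n, rng, shift=1.5, single_component=False):
    """Draw n i.i.d. points from the mixture, or only from the component at +shift."""
    signs = np.ones(n) if single_component else rng.choice([-1.0, 1.0], size=n)
    return signs * shift + rng.standard_normal(n)

rng = np.random.default_rng(0)
mixture_sample = draw_samples(20_000, rng)
component_sample = draw_samples(20_000, rng, single_component=True)
```

Near x = 1.5 the mixture score is close to the score of the nearby component alone, which is why a sample from one component is a hard case for score-based discrepancies.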
[Figure: kernel Stein discrepancy vs. number of sample points n for Gaussian, Matérn, and inverse multiquadric kernels, in dimensions d = 8 and d = 20, comparing samples i.i.d. from target P with an off-target sample.]
… $\tfrac{1}{2}$, $c = 1$) does
$\prod_{l=1}^{L} \pi(y_l \mid x)$
$\overset{\text{iid}}{\sim} \frac{1}{2}\mathcal{N}(X_1, 2) + \frac{1}{2}\mathcal{N}(X_1 + X_2, 2)$
… $\tfrac{1}{2}$, $c = 1$) to select appropriate $\epsilon$
[Figure: KSD (lower is better) vs. tolerance parameter ε; sample scatter plots over (x1, x2) for ε = 0 (n = 230), ε = 10^-2 (n = 416), and ε = 10^-1 (n = 1000).]
$\prod_{l=1}^{L} \pi(y_l \mid x)$
[Ahn, Korattikara, and Welling, 2012]
… $\tfrac{1}{2}$, $c = 1$) to compare SGFS-f to SGFS-d
[Figure: IMQ kernel Stein discrepancy vs. number of sample points n for the samplers SGFS-f and SGFS-d; BEST and WORST coordinate-pair scatter panels for each sampler (x7 vs. x51, x8 vs. x42, x32 vs. x34, x2 vs. x25).]
Chwialkowski, Strathmann, and Gretton [2016] used the KSD $S(Q_n, \mathcal{T}_P, \mathcal{G}_k)$
… $\tfrac{1}{2}$, $c = 1$)
… ${}_{i=1}$ with $x_i = z_i + u_i e_1$
iid
iid
i=1, can minimize KSD S( ˜
$\sum_{i=1}^{n} q_n(x_i)\,\delta_{x_i}$ for $q_n$ a …
Liu and Lee [2016] do this with Gaussian kernel $k(x, y) = e^{-\frac{1}{h}\|x-y\|_2^2}$
… $\frac{1}{h}\|x-y\|_2^2)^{-1/2}$
[Figure: Gaussian KSD and IMQ KSD compared across dimension d.]
$\frac{1}{n}\sum_{i=1}^{n} \delta_{x_i}$ for $x_i \overset{\text{iid}}{\sim}$ …
[Oates, Girolami, and Chopin, 2016]
$\frac{1}{p(x)}\langle \nabla, p(x)\,m(x)\,g(x) \rangle$ of
Gorham, Duncan, Vollmer, and Mackey [2016] may be appropriate for heavy tails
ICML’12, 2012.
0021-9002. A celebration of applied probability.
0178-8051. doi: 10.1007/BF01197887.
Metrika, 35(1):339–348, 1988.
17th AISTATS, pages 185–193, 2014.
ICML, ICML’14, 2014.
ICML, pages 276–284, 2016.
Society: Series B (Statistical Methodology), pages n/a–n/a, 2016. ISSN 1467-9868. doi: 10.1111/rssb.12185.
Information Processing Systems, pages 4237–4246, 2017.
expository lectures and applications, volume 46 of IMS Lecture Notes Monogr. Ser., pages 1–26. Inst. Math. Statist., Beachwood, OH, 2004.
arXiv:1611.01722, Nov. 2016.
[Figure: discrepancy value (graph Stein discrepancy, Wasserstein) and computation time (sec) vs. number of sample points n, d = 4, for samples i.i.d. from the mixture target P and from a single mixture component.]
$e^{-\frac{1}{2}(x+1.5)^2} + e^{-\frac{1}{2}(x-1.5)^2}$ or a single