A Random Walk Around The Block
Johan Ugander Stanford University Joint work with: Isabel Kloumann (Facebook) & Jon Kleinberg (Cornell) Google Mountain View August 17, 2016
Seed set expansion
Given a graph with node set V, identify a target set T ⊂ V from a smaller seed set S ⊂ T.
Personalized PageRank
(Jeh & Widom ’03, Kloster & Gleich ’14)
(Bagrow ’08, Kloumann & Kleinberg ’14)
Kloumann & Kleinberg ’14
$\mathrm{score}(v) = \sum_{k=1}^{\infty} w_k r_k^v$
$r_k^v$: k-step landing probability at node v
$\mathrm{PPR}(v) \propto \sum_{k=1}^{\infty} \alpha^k r_k^v$
$\mathrm{HK}(v) \propto \sum_{k=1}^{\infty} \frac{t^k}{k!} r_k^v$
Landing probabilities: $(r_1^v, r_2^v, \ldots, r_K^v)$
PPR and HK = two parametric families of linear weights
What weights are “optimal” for diffusion-based classification?
[Figure: Weight versus Length for HK (t = 1, 5, 15) and PPR (α = 0.85, 0.99)]
(Kloster & Gleich, ’14)
PPR: $w_k = \alpha^k$
HK: $w_k = t^k / k!$
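The two weight families above can be tabulated directly. The following is a minimal sketch; the function names `ppr_weights` and `hk_weights` are illustrative and not from the talk, and the parameter values are taken from the figure legend.

```python
import math

def ppr_weights(alpha, K):
    """PPR weights w_k = alpha**k for k = 1..K."""
    return [alpha ** k for k in range(1, K + 1)]

def hk_weights(t, K):
    """Heat kernel weights w_k = t**k / k! for k = 1..K."""
    return [t ** k / math.factorial(k) for k in range(1, K + 1)]

# Parameter values as in the figure legend (alpha = 0.85, t = 5).
ppr = ppr_weights(0.85, 10)
hk = hk_weights(5.0, 10)

for k, (wp, wh) in enumerate(zip(ppr, hk), start=1):
    print(f"k={k:2d}  PPR={wp:.4f}  HK={wh:.4f}")
```

PPR weights decay geometrically from the first step, while HK weights first grow and then decay once k exceeds t.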
$\mathrm{score}(v) = \sum_{k=1}^{K} w_k r_k^v$
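As a rough illustration of the truncated score, the sketch below computes k-step landing probabilities for a uniform random walk started on a seed set and combines them with a weight vector. The toy graph, the function names, and the choice of PPR-style weights are assumptions made for this example only.

```python
import numpy as np

def landing_probabilities(adjacency, seeds, K):
    """Return a (K, n) array; row k-1 holds the k-step landing probabilities."""
    A = np.asarray(adjacency, dtype=float)
    P = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    x = np.zeros(A.shape[0])
    x[list(seeds)] = 1.0 / len(seeds)      # start uniformly on the seed set
    rows = []
    for _ in range(K):
        x = x @ P                          # advance the walk by one step
        rows.append(x)
    return np.vstack(rows)

def diffusion_scores(R, weights):
    """score(v) = sum_k w_k * r_k^v for every node v."""
    return np.asarray(weights, dtype=float) @ R

# Toy graph (illustrative only): triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

K = 4
R = landing_probabilities(A, seeds=[0], K=K)
weights = [0.85 ** k for k in range(1, K + 1)]   # PPR-style weights
scores = diffusion_scores(R, weights)
print(np.round(scores, 4))
```

Nodes in the seed's own triangle receive higher scores than nodes in the other triangle, which is the behaviour seed set expansion relies on.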
[Figure: two-block structure with edge density $p_{in}$ within blocks and $p_{out}$ between blocks]
If pin - pout = O(1)
If pin - pout ≥ Ω((pout(log n)/n)-1/2)
be “method up”: how to tune diffusion weights to find seed sets?
core-periphery models? Latent space models (Hoff et al. 2002)? Etc.
[Figure: nodes represented by their landing probabilities $(r_1^v, r_2^v, \ldots, r_K^v)$; target block nodes versus other block nodes; weights $(w_1, \ldots, w_K)$]
[Figure: 2-step landing probability versus 3-step landing probability, with centroids a and b marked]
and quadratic functions. Forward pointer, will return.
[Figure: three panels of landing probabilities: 1-step versus 2-step, 2-step versus 3-step, 3-step versus 4-step]
$r = (r_1, \ldots, r_K)$, $a = (a_1, \ldots, a_K)$, $b = (b_1, \ldots, b_K)$
$f(r) = (a - b)^T r$
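The geometric classifier f(r) can be sketched in a few lines: take the centroid of each class in landing-probability space and project onto their difference. The numbers below are made-up placeholders, not data from the talk, and the function names are hypothetical.

```python
import numpy as np

def centroid_difference(R_target, R_other):
    """Weights w = a - b, where a and b are the class centroids (rows are nodes)."""
    a = R_target.mean(axis=0)
    b = R_other.mean(axis=0)
    return a, b, a - b

def f(r, w):
    """Linear score f(r) = (a - b)^T r."""
    return r @ w

# Made-up landing probability vectors with K = 2 (illustrative only).
R_target = np.array([[0.006, 8e-4], [0.004, 6e-4]])
R_other = np.array([[0.001, 4e-4], [0.002, 2e-4]])

a, b, w = centroid_difference(R_target, R_other)
print(w, f(a, w), f(b, w))
```

Because f(a) − f(b) equals the squared norm of a − b, the target centroid always scores higher than the other centroid under these weights.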
For a 2-block SBM with equal-sized blocks and edge densities $p_{in}$, $p_{out}$:
$a_k - b_k = \left(\frac{p_{in} - p_{out}}{p_{in} + p_{out}}\right)^k$
and the optimal geometric classifier is therefore:
$\sum_{k=1}^{K} (\alpha^*)^k r_k$,
which is PPR(!) with $\alpha^* = \frac{p_{in} - p_{out}}{p_{in} + p_{out}}$.
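For a concrete feel of the resulting PPR parameter, here is a small sketch that evaluates α* and the corresponding weights. The edge densities 0.3 and 0.1 are arbitrary example values, and `optimal_alpha` is a hypothetical helper name.

```python
def optimal_alpha(p_in, p_out):
    """alpha* = (p_in - p_out) / (p_in + p_out) for the balanced 2-block SBM."""
    return (p_in - p_out) / (p_in + p_out)

# Example densities (assumed values, not from the talk).
alpha_star = optimal_alpha(0.3, 0.1)
K = 5
weights = [alpha_star ** k for k in range(1, K + 1)]
print(alpha_star, weights)
```

The closer $p_{out}$ is to $p_{in}$, the smaller α* becomes, so longer walks are discounted more heavily when the blocks are harder to tell apart.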
determined by the solution to a linear recurrence relation.
solved and yields PPR.
Lemma 1. For any $\epsilon, \delta > 0$, there is an $n$ sufficiently large such that the random landing probabilities $(\hat{a}_1, \ldots, \hat{a}_K)$ and $(\hat{b}_1, \ldots, \hat{b}_K)$ for a uniform random walk on $G_n$ starting in the seed block satisfy the following conditions with probability at least $1 - \delta$ for all $k > 0$:
$N \hat{a}_k \in \left[ (1 - \epsilon) \frac{A_k}{A_k + B_k}, \; (1 + \epsilon) \frac{A_k}{A_k + B_k} \right]$ (1)
$N \hat{b}_k \in \left[ (1 - \epsilon) \frac{B_k}{A_k + B_k}, \; (1 + \epsilon) \frac{B_k}{A_k + B_k} \right]$ (2)
where $A_k, B_k$ are the solutions to the matrix recurrence relation
$A_k = N (p_{in} A_{k-1} + p_{out} B_{k-1})$
$B_k = N (p_{out} A_{k-1} + p_{in} B_{k-1})$,
with $A_0 = 1$, $B_0 = 0$.
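The recurrence in Lemma 1 can be iterated numerically to check that the normalized gap $(A_k - B_k)/(A_k + B_k)$ is geometric in k with ratio $(p_{in} - p_{out})/(p_{in} + p_{out})$. This is only a sanity-check sketch; the values of N, $p_{in}$ and $p_{out}$ are assumptions chosen for the example.

```python
def solve_recurrence(N, p_in, p_out, K):
    """Iterate A_k, B_k from A_0 = 1, B_0 = 0 and return [(A_1, B_1), ..., (A_K, B_K)]."""
    A, B = 1.0, 0.0
    out = []
    for _ in range(K):
        # Simultaneous update: both right-hand sides use the previous A and B.
        A, B = N * (p_in * A + p_out * B), N * (p_out * A + p_in * B)
        out.append((A, B))
    return out

# Example parameters (assumed for illustration).
N, p_in, p_out, K = 100, 0.3, 0.1, 6
alpha = (p_in - p_out) / (p_in + p_out)
solution = solve_recurrence(N, p_in, p_out, K)

for k, (A, B) in enumerate(solution, start=1):
    gap = (A - B) / (A + B)
    print(k, gap, alpha ** k)
```

The sum $A_k + B_k$ grows by a factor $N(p_{in} + p_{out})$ per step and the difference $A_k - B_k$ by $N(p_{in} - p_{out})$, which is why the two printed columns agree.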
still obtainable from solutions to a matrix recurrence relation.
[Figure: empirical centroids versus predicted centroids of $r_k$ for blocks 1 to 4, k = 1, ..., 9; error $|\hat{w}_k - \Psi_k|$ between 1e−6 and 1e−5]
From matrix recurrence relation
$\alpha = \frac{p_{in} - p_{out}}{p_{in} + p_{out}}$
[Figure: 2-step landing probability versus 3-step landing probability, with the two classes summarized by $(a, \Sigma_a)$ and $(b, \Sigma_b)$]
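$\Pr(r \mid z = 1) \propto |\Sigma_a|^{-1/2} \exp\left( -\tfrac{1}{2} (r - a)^T \Sigma_a^{-1} (r - a) \right)$
$\Pr(r \mid z = 0) \propto |\Sigma_b|^{-1/2} \exp\left( -\tfrac{1}{2} (r - b)^T \Sigma_b^{-1} (r - b) \right)$
$g(r) = \log \frac{\Pr(r \mid z = 1) \Pr(z = 1)}{\Pr(r \mid z = 0) \Pr(z = 0)}$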
QuadSBMRank, LinSBMRank.
equal covariances; effective.
uniform variance, no covariance.
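General: $g_2(r) \propto (\Sigma_a^{-1} a - \Sigma_b^{-1} b)^T r + \tfrac{1}{2} r^T (\Sigma_b^{-1} - \Sigma_a^{-1}) r$
Assume $\Sigma_a = \Sigma_b = \Sigma$: $g_1(r) \propto (\Sigma^{-1} (a - b))^T r$
Assume $\Sigma_a = \Sigma_b = I$: $g_0(r) \propto (a - b)^T r$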
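The three discriminant functions nest inside each other, which the sketch below checks numerically: with equal covariances $g_2$ collapses to $g_1$, and with identity covariance $g_1$ collapses to $g_0$. All numeric values are invented for illustration, and the functions omit the additive constants hidden by the ∝ signs.

```python
import numpy as np

def g0(r, a, b):
    """Identity covariances: (a - b)^T r."""
    return (a - b) @ r

def g1(r, a, b, Sigma):
    """Shared covariance: (Sigma^{-1} (a - b))^T r."""
    return np.linalg.solve(Sigma, a - b) @ r

def g2(r, a, b, Sigma_a, Sigma_b):
    """General case: linear term plus quadratic term in r."""
    Sa_inv = np.linalg.inv(Sigma_a)
    Sb_inv = np.linalg.inv(Sigma_b)
    linear = (Sa_inv @ a - Sb_inv @ b) @ r
    quadratic = 0.5 * r @ (Sb_inv - Sa_inv) @ r
    return linear + quadratic

# Invented values with K = 2 (illustrative only).
a = np.array([0.005, 0.0007])
b = np.array([0.0015, 0.0003])
r = np.array([0.004, 0.0006])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])

print(g0(r, a, b), g1(r, a, b, Sigma), g2(r, a, b, Sigma, 2.0 * Sigma))
```

Using `np.linalg.solve` for $g_1$ avoids forming the inverse explicitly; the explicit inverses in $g_2$ are kept only to mirror the formula term by term.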
[Figure: 2-step landing probability versus 3-step landing probability, with $g_1(r)$ and $g_0(r)$ marked]
show asymptotic normality and characterize covariance matrices?
to resolution limit (dotted line), with slower decay rate. PPR, HK, LinSBMRank, QuadSBMRank, BP
discriminant function for balanced 2-block SBM.
follow from recurrence relation.
the space of landing probabilities greatly improves classification.
that can hopefully open new doors.
$\alpha = \frac{p_{in} - p_{out}}{p_{in} + p_{out}}$
Isabel Kloumann, Johan Ugander, Jon Kleinberg “Block Models and Personalized PageRank” arXiv:1607.03483
possible to derive weights for bounded degree SBMs?
Hoff latent space model, etc, etc.