SLIDE 1

A Random Walk Around The Block

Johan Ugander, Stanford University
Joint work with: Isabel Kloumann (Facebook) & Jon Kleinberg (Cornell)
Google Mountain View, August 17, 2016

SLIDE 2

Seed set expansion

  • Given a graph G = (V, E), the goal is to accurately identify a target set T ⊂ V from a smaller seed set S ⊂ T.
  • Scored by Personalized PageRank.
  • (Figure: a graph with the target set T and the seed set S highlighted)
SLIDE 6

Seed set expansion

  • Given a graph G = (V, E), the goal is to accurately identify a target set T ⊂ V from a smaller seed set S ⊂ T.
  • Applications:
  • Broadly: ranking on graphs, recommendation systems
  • Spam filtering (Wu & Chellapilla '07)
  • Community detection (Weber et al. '13)
  • Missing data inference (Mislove et al. '14)
  • Common methods:
  • Semi-supervised learning (Zhu et al. '03)
  • Diffusion-based classification (Jeh & Widom '03; Kloster & Gleich '14)
  • Outwardness, modularity, and more (Bagrow '08; Kloumann & Kleinberg '14)



SLIDE 9

Recall curves for seed set expansion

  • Recall curve: true positive rate as a function of the number of items returned, based on a small uniformly random seed set.
  • Kloumann & Kleinberg '14 tested many different methods on data, broadly finding Personalized PageRank to be best.
  • Truncated PPR (first K steps) is comparable to PPR from K = 4.
  • The Heat Kernel was later found to be comparable to PPR.
  • (Figure: recall curves from Kloumann & Kleinberg '14)

SLIDE 10

Diffusion-based node classification

  • Classification based on random walk landing probabilities:
  • $r^v_k$: the probability that a random walk starting in S is at v after k steps.
  • $(r^v_1, r^v_2, \ldots, r^v_K)$: the truncated vector of landing probabilities.
  • Personalized PageRank and Heat Kernel ranking:
    $\mathrm{PPR}(v) \propto \sum_{k=1}^{\infty} \alpha^k \, r^v_k, \qquad \mathrm{HK}(v) \propto \sum_{k=1}^{\infty} \frac{t^k}{k!} \, r^v_k$
  • General diffusion score function:
    $\mathrm{score}(v) = \sum_{k=1}^{K} w_k \, r^v_k$
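To make the notation concrete, here is a minimal numpy sketch of these quantities, assuming a simple undirected graph given as a dense adjacency matrix with no isolated nodes (the function names are illustrative, not from the talk):

```python
import numpy as np
from math import factorial

def landing_probabilities(A, seed_set, K):
    """Row k-1 holds r_k: the probability that a random walk started
    uniformly in seed_set is at each node after k steps."""
    P = A / A.sum(axis=1, keepdims=True)      # row-stochastic transition matrix
    r = np.zeros(A.shape[0])
    r[list(seed_set)] = 1.0 / len(seed_set)   # uniform start within S
    steps = []
    for _ in range(K):
        r = r @ P                             # one random-walk step
        steps.append(r.copy())
    return np.array(steps)                    # shape (K, n)

def diffusion_score(R, w):
    """General diffusion score: score(v) = sum_k w_k * r_k^v."""
    return w @ R                              # shape (n,)

# PPR and HK are particular (truncated) weight choices:
K, alpha, t = 10, 0.85, 5.0
ks = np.arange(1, K + 1)
w_ppr = alpha ** ks
w_hk = np.array([t**k / factorial(k) for k in ks])
```

Nodes are then ranked by descending score; the truncation at K matches the vector $(r^v_1, \ldots, r^v_K)$ above.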

SLIDE 11

Diffusion-based node classification

  • Personalized PageRank and Heat Kernel = two parametric families of linear weights in $\mathrm{score}(v) = \sum_{k=1}^{K} w_k \, r^v_k$: PPR uses $w_k = \alpha^k$ and HK uses $w_k = t^k / k!$.
  • Question in this work: what weights are "optimal" for diffusion-based classification?
  • (Figure: weight vs. walk length for t = 1, 5, 15 and α = 0.85, 0.99; cf. Kloster & Gleich '14)

SLIDE 12

The stochastic block model

  • C blocks; focus on C = 2 blocks: 1 = "Target", 2 = "Other".
  • n1, n2 nodes in the blocks.
  • Independent edge probabilities:
  • Edge probability within a block = pin
  • Edge probability across blocks = pout
  • (Results for C > 2 as well; see the paper.)
  • A model with many names:
  • Stochastic Block Model (Holland et al. '83)
  • Affiliation Model (Frank & Harary '82)
  • Planted Partition Model (Dyer & Frieze '89)
  • (Figure: two blocks with pin edges within and pout edges across)
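A short sketch of sampling this two-block model with numpy (a hedged illustration; the helper name is mine):

```python
import numpy as np

def sample_two_block_sbm(n1, n2, p_in, p_out, seed=None):
    """Symmetric adjacency matrix: edges appear independently with
    probability p_in within a block and p_out across blocks."""
    rng = np.random.default_rng(seed)
    n = n1 + n2
    probs = np.full((n, n), p_out)            # cross-block probability
    probs[:n1, :n1] = p_in                    # Target block
    probs[n1:, n1:] = p_in                    # Other block
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # flip each pair once
    return (upper | upper.T).astype(float)    # symmetrize; no self-loops

A = sample_two_block_sbm(1000, 1000, p_in=0.2, p_out=0.05, seed=0)
```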


SLIDE 15

The SBM resolution limit

  • Find the true partition in poly(n) time w.h.p. as n → ∞:
  • Dyer & Frieze '89: if $p_{in} - p_{out} = O(1)$
  • Condon & Karp '01: if $p_{in} - p_{out} \geq \Omega(n^{-1/2})$
  • McSherry '01: if $p_{in} - p_{out} \geq \Omega\big((p_{out}(\log n)/n)^{1/2}\big)$
  • Find a partition positively correlated with the true partition:
  • Coja-Oghlan '06: if $p_{in} - p_{out} \geq \Omega\big((p_{out}/n)^{1/2}\big)$
  • If and only if $(a - b)^2 > 2(a + b)$, where $p_{in} = a/n$, $p_{out} = b/n$ (a quick numeric check follows below):
  • Decelle et al. '11: conjecture and belief propagation numerics
  • Mossel et al. '12, '13; Massoulié '13; Abbe et al. '14: proven
  • Recent extensions:
  • More than two blocks (e.g., Neeman & Netrapalli '14)
  • Unequal block sizes (e.g., Zhang et al. '16)
  • (Figure: two blocks with pin edges within and pout edges across)

SLIDE 16

The SBM resolution limit

  • Is block recovery/classification over? No!
  • Unsupervised vs. semi-supervised
  • Empirical graphs ≠ SBMs
  • Optimal algorithms are not practical
  • Beyond asymptotic limits, what are the decay rates?
  • Rather than being "problem down" (SBM classification), this talk is "method up": how should we tune the diffusion weights in $\mathrm{score}(v) = \sum_{k=1}^{K} w_k \, r^v_k$ to find seed sets?
  • Possible variations: diffusion weights for seed set expansion in core-periphery models? Latent space models (Hoff et al. 2002)? Etc.

SLIDE 17

Diffusion-based classification in SBMs

  • SBMs present a natural binary classification problem.
  • Recall the notation:
  • $r^v_k$: the probability that a random walk starting in S is at v after k steps.
  • $(r^v_1, r^v_2, \ldots, r^v_K)$: the truncated vector of landing probabilities.
  • Choices of $(w_1, \ldots, w_K)$ define sweep directions through this space.
  • Optimistically: (Figure: Target block nodes and Other block nodes separating in the $(r_i, r_j)$ plane)

SLIDE 18

The space of landing probabilities

  • SBM: 2000 nodes, Target & Other blocks, pin = 0.2, pout = 0.05.
  • One seed node (chosen uniformly at random from the Target set).
  • (Figure: scatter of 2-step vs. 3-step landing probabilities; pin = 0.2, pout = 0.05)
SLIDE 20

The space of landing probabilities

  • Geometric discriminant function: sweeps through the space of landing probabilities following the vector from b to a.
  • (Figure: scatter of 2-step vs. 3-step landing probabilities with centroids a and b; pin = 0.2, pout = 0.05)

SLIDE 21

The space of landing probabilities

  • Fisher discriminant functions: clearly better linear and quadratic functions exist. (Forward pointer; we will return to this.)
  • (Figure: scatter of 2-step vs. 3-step landing probabilities; pin = 0.2, pout = 0.05)

SLIDE 22

The space of landing probabilities

  • Focus on deriving the optimal geometric discriminant function first.
  • (Figure: scatters of 1-step vs. 2-step, 2-step vs. 3-step, and 3-step vs. 4-step landing probabilities; pin = 0.2, pout = 0.05)

SLIDE 23

Geometric discriminant functions

  • Let $r = (r_1, \ldots, r_K)$ be the landing probabilities of a node.
  • Let $a = (a_1, \ldots, a_K)$ be the Target class centroid.
  • Let $b = (b_1, \ldots, b_K)$ be the Other class centroid.
  • Then $f(r) = (a - b)^T r$ is the geometric discriminant function.
  • Notice: $f(r)$ increases when r moves in the direction of $a - b$.
  • We can classify nodes based on thresholds of $f(r)$; a sketch follows below.
  • (Figure: scatter of 2-step vs. 3-step landing probabilities with centroids a and b; pin = 0.2, pout = 0.05)
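A minimal sketch of this classifier, assuming the (K, n) matrix of landing probabilities from the earlier sketch and, for illustration only, known labels from which to estimate the centroids:

```python
import numpy as np

def geometric_discriminant(R, target_idx, other_idx):
    """f(r) = (a - b)^T r for every node, where a and b are the Target
    and Other centroids in landing-probability space.
    R: (K, n) matrix whose column v is node v's vector r."""
    a = R[:, target_idx].mean(axis=1)   # Target centroid, shape (K,)
    b = R[:, other_idx].mean(axis=1)    # Other centroid, shape (K,)
    return (a - b) @ R                  # f(r) for all n nodes at once

# Classify by thresholding f(r), e.g. take the top-|T| scores as Target.
```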

SLIDE 25

Personalized PageRank is "optimal"

  • Main Theorem (informal version). For the 2-block SBM with equal-sized blocks and edge densities pin, pout:
    $a_k - b_k = \left(\frac{p_{in} - p_{out}}{p_{in} + p_{out}}\right)^k$
    and the optimal geometric classifier is therefore
    $\sum_{k=1}^{K} (\alpha^*)^k \, r_k$,
    which is PPR(!) with $\alpha^* = \frac{p_{in} - p_{out}}{p_{in} + p_{out}}$.
  • Two main parts:
  • 1. The centroids a, b concentrate on quantities determined by the solution to a linear recurrence relation.
  • 2. That linear recurrence relation can be solved and yields PPR.
  • (Figure: scatter of 2-step vs. 3-step landing probabilities with centroids a and b; pin = 0.2, pout = 0.05)

SLIDE 27

PPR is "optimal": proof idea

  • Part 1: concentration of landing probabilities.
  • $A_k, B_k$ are interpretable as length-k walk counts to nodes in block 1 vs. block 2.
  • For large n, block walk counts increase by factors of roughly E[degree].
  • (A numeric check of the recurrence follows below.)

Lemma 1. For any $\varepsilon, \delta > 0$, there is an $n$ sufficiently large such that the random landing probabilities $(\hat{a}_1, \ldots, \hat{a}_K)$ and $(\hat{b}_1, \ldots, \hat{b}_K)$ for a uniform random walk on $G_n$ starting in the seed block satisfy the following conditions with probability at least $1 - \delta$ for all $k > 0$:
$$N\hat{a}_k \in \left[(1 - \varepsilon)\frac{A_k}{A_k + B_k},\ (1 + \varepsilon)\frac{A_k}{A_k + B_k}\right] \quad (1)$$
and
$$N\hat{b}_k \in \left[(1 - \varepsilon)\frac{B_k}{A_k + B_k},\ (1 + \varepsilon)\frac{B_k}{A_k + B_k}\right], \quad (2)$$
where $A_k, B_k$ are the solutions to the matrix recurrence relation
$$A_k = N(p_{in}A_{k-1} + p_{out}B_{k-1}), \qquad B_k = N(p_{out}A_{k-1} + p_{in}B_{k-1}),$$
with $A_0 = 1$, $B_0 = 0$.
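The recurrence is a two-line loop; the sketch below iterates it and checks the normalized centroid gap against the Main Theorem's $(\alpha^*)^k$ (my own verification, not code from the paper):

```python
import numpy as np

def centroid_gaps(p_in, p_out, N, K):
    """Iterate A_k = N(p_in A_{k-1} + p_out B_{k-1}),
               B_k = N(p_out A_{k-1} + p_in B_{k-1}),
    from A_0 = 1, B_0 = 0, returning (A_k - B_k)/(A_k + B_k) for k = 1..K."""
    A, B = 1.0, 0.0
    gaps = []
    for _ in range(K):
        A, B = N * (p_in * A + p_out * B), N * (p_out * A + p_in * B)
        gaps.append((A - B) / (A + B))
    return np.array(gaps)

p_in, p_out, N, K = 0.2, 0.05, 1000, 8
alpha_star = (p_in - p_out) / (p_in + p_out)
# The gap at step k is exactly (alpha*)^k: the PPR weights of the Main Theorem.
assert np.allclose(centroid_gaps(p_in, p_out, N, K),
                   alpha_star ** np.arange(1, K + 1))
```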

SLIDE 28

More general SBMs

  • For SBMs with C > 2 blocks and/or an arbitrary connectivity matrix P:
  • Seed set expansion asks: identify the nodes in a target set of blocks.
  • With conditions on equal expected degrees: PPR(!).
  • Without those conditions, asymptotically optimal weights for geometric classification are still obtainable from the solutions to a matrix recurrence relation (sketched below).
  • (Figure: a 4-block SBM, blocks 1 through 4)
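As a rough illustration only (this is my extrapolation of the slide, not the paper's exact construction), the same idea for C equal-sized blocks with connectivity matrix P: iterate block-level walk counts and read off centroid gaps as candidate weights:

```python
import numpy as np

def block_gap_weights(P, N, seed_block, target_blocks, K):
    """m_k = N * P @ m_{k-1} tracks expected length-k walk counts per block
    (equal block sizes N assumed). The gap between the average landing mass
    on target vs. non-target blocks plays the role of w_k = a_k - b_k."""
    C = P.shape[0]
    m = np.zeros(C)
    m[seed_block] = 1.0
    target = np.zeros(C, dtype=bool)
    target[list(target_blocks)] = True
    weights = []
    for _ in range(K):
        m = N * (P @ m)
        mass = m / m.sum()                  # landing-probability mass per block
        weights.append(mass[target].mean() - mass[~target].mean())
    return np.array(weights)

# Example: 4 blocks, seed in block 0, target blocks {0, 1}.
P = np.array([[0.20, 0.10, 0.05, 0.05],
              [0.10, 0.20, 0.05, 0.05],
              [0.05, 0.05, 0.20, 0.10],
              [0.05, 0.05, 0.10, 0.20]])
w = block_gap_weights(P, N=512, seed_block=0, target_blocks=[0, 1], K=9)
```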

SLIDE 30

Empirical vs. theoretical centroids

  • 2048-node, 4-block SBM: empirical class centroids vs. theory (predicted from the matrix recurrence relation).
  • (Figure: empirical and predicted centroids $r_k$ for $k = 1, \ldots, 9$, for a, the Target blocks, and b, the Other blocks; the error $|\hat{w}_k - \Psi_k|$ is on the order of 1e-6 to 1e-5)

SLIDE 32

Theories of graph diffusion

  • Other motivations for PPR:
  • Random Surfer Model (Brin & Page '98)
  • Cheeger inequalities for PPR, HK (Andersen et al. '06; Chung '09)
  • Local spectral algorithm with regularization (Mahoney et al. '12)
  • Our work shows PPR can be derived as the "optimal" geometric classifier.
  • It also motivates how to choose the PPR $\alpha$, as $\alpha = \frac{p_{in} - p_{out}}{p_{in} + p_{out}}$.
  • Most importantly: it also opens the door to methods beyond PPR.

SLIDE 33

PPR is "optimal" in a narrow sense

  • Could discriminant functions model higher moments of the point clouds?
  • (Figure: scatter of 2-step vs. 3-step landing probabilities with centroids a and b; pin = 0.2, pout = 0.05)

SLIDE 34

Fisher discriminant functions

  • Discriminant functions that model higher moments of the point clouds.
  • (Figure: scatter of 2-step vs. 3-step landing probabilities with fitted Gaussians $(a, \Sigma_a)$ and $(b, \Sigma_b)$)

SLIDE 35

Fisher discriminant functions

  • Let z be the latent class of each node.
  • Capture the (mean, variance) of the class point clouds:
    $$\Pr(r \mid z = 1) \propto |\Sigma_a|^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(r - a)^T \Sigma_a^{-1}(r - a)\right)$$
    $$\Pr(r \mid z = 0) \propto |\Sigma_b|^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(r - b)^T \Sigma_b^{-1}(r - b)\right)$$
  • Log-likelihood ratio as the discriminant function:
    $$g(r) = \log \frac{\Pr(r \mid z = 1)\Pr(z = 1)}{\Pr(r \mid z = 0)\Pr(z = 0)}$$
  • (Figure: scatter of 2-step vs. 3-step landing probabilities)

SLIDE 37

Fisher discriminant functions

  • Three approaches (a sketch of all three follows below):
    General: $g_2(r) \propto \left(\Sigma_a^{-1}a - \Sigma_b^{-1}b\right)^T r + \frac{1}{2} r^T \left(\Sigma_b^{-1} - \Sigma_a^{-1}\right) r$
    Assume $\Sigma_a = \Sigma_b = \Sigma$: $g_1(r) \propto \left(\Sigma^{-1}(a - b)\right)^T r$
    Assume $\Sigma_a = \Sigma_b = I$: $g_0(r) \propto (a - b)^T r$
  • We call the first two methods QuadSBMRank and LinSBMRank.
  • It is perhaps reasonable to assume equal covariances; this is effective.
  • PPR follows from an assumption of uniform variance and no covariance.
  • Open challenge: is it possible to show asymptotic normality and characterize the covariance matrices?
  • (Figure: scatter of 2-step vs. 3-step landing probabilities with the $g_2(r)$, $g_1(r)$, $g_0(r)$ decision boundaries)
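A compact sketch of all three discriminants from empirical class statistics (illustrative names; assumes enough samples per class that the empirical covariances are invertible):

```python
import numpy as np

def fisher_discriminants(R_target, R_other, R):
    """Evaluate g2 (general quadratic), g1 (shared covariance), and
    g0 (identity covariance) at every column of R (shape (K, n)).
    R_target, R_other: (K, m) landing-probability samples per class."""
    a, b = R_target.mean(axis=1), R_other.mean(axis=1)
    Sa, Sb = np.cov(R_target), np.cov(R_other)
    Sa_inv, Sb_inv = np.linalg.inv(Sa), np.linalg.inv(Sb)
    S_inv = np.linalg.inv((Sa + Sb) / 2)      # pooled covariance for g1
    # Quadratic form r^T (Sb_inv - Sa_inv) r, one value per column of R.
    quad = np.einsum('in,ij,jn->n', R, Sb_inv - Sa_inv, R)
    g2 = (Sa_inv @ a - Sb_inv @ b) @ R + 0.5 * quad
    g1 = (S_inv @ (a - b)) @ R
    g0 = (a - b) @ R
    return g2, g1, g0
```

Here g2 and g1 correspond to QuadSBMRank and LinSBMRank above, while g0 recovers the geometric discriminant and hence PPR-style weights.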

SLIDE 38

Evaluation: recall curves

  • SBM with 2 blocks, 64 nodes/block, 1 seed node.
  • Recall that Belief Propagation reaches the resolution limit.
  • Easy instance (pin >> pout): everything does well.
SLIDE 39

Evaluation: recall curves

  • SBM with 2 blocks, 64 nodes/block, 1 seed node.
  • Recall that Belief Propagation reaches the resolution limit.
  • Hard instance: PPR and HK lose all recall; LinSBMRank and QuadSBMRank are near BP.
SLIDE 40

Evaluation: recall curves

  • SBM with 2 blocks, 64 nodes/block, 1 seed node.
  • Recall that Belief Propagation reaches the resolution limit.
  • Even harder instance: LinSBMRank and QuadSBMRank outperform BP by a hair...?
SLIDE 41

Evaluation: recall curves

  • SBM with 2 blocks, 64 nodes/block, 1 seed node.
  • Recall that Belief Propagation reaches the resolution limit.
  • Impossible instance (pin = pout): nothing works.
SLIDE 42

Evaluation: resolution limit

  • Pearson correlation r between the true partition and the inferred partition, for PPR, HK, LinSBMRank, QuadSBMRank, and BP.
  • Empirically, we see LinSBMRank and QuadSBMRank get very close to the resolution limit (dotted line), with a slower decay rate.

SLIDE 43

Conclusions

  • Personalized PageRank with $\alpha = \frac{p_{in} - p_{out}}{p_{in} + p_{out}}$ is the optimal geometric discriminant function for the balanced 2-block SBM.
  • Geometric discriminant functions for more general block models follow from a recurrence relation.
  • Landing probabilities are correlated; correcting for higher moments in the space of landing probabilities greatly improves classification.
  • In practice: fit GMMs in the space of landing probabilities (a sketch follows below).
  • A new perspective on diffusion-based ranking that can hopefully open new doors.
  • Pre-print: Isabel Kloumann, Johan Ugander, Jon Kleinberg, "Block Models and Personalized PageRank," arXiv:1607.03483.
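One way to realize the "fit GMMs" bullet, as a hedged sketch using scikit-learn's GaussianMixture (my substitution for whatever fitting procedure the authors used):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_rank(R, seed_idx):
    """R: (K, n) landing probabilities; one K-dim point per node.
    Fit a 2-component GMM and score each node by the posterior
    probability of the component that contains the seed nodes."""
    X = R.T
    gmm = GaussianMixture(n_components=2, covariance_type='full',
                          random_state=0).fit(X)
    post = gmm.predict_proba(X)                       # (n, 2) posteriors
    seed_component = post[seed_idx].mean(axis=0).argmax()
    return post[:, seed_component]                    # rank by this score
```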

SLIDE 44

Open directions

  • Model the covariance of landing probabilities?
  • Currently requires at least ~logarithmic degrees (we think); is it possible to derive weights for bounded-degree SBMs?
  • Better classifiers in the space of landing probabilities for other random walks? (Non-backtracking, etc.)
  • Not just the SBM? Optimal weights for the dcSBM, core-periphery models, the Hoff latent space model, etc.
  • Slow decay beyond the resolution limit?
  • Pre-print: Isabel Kloumann, Johan Ugander, Jon Kleinberg, "Block Models and Personalized PageRank," arXiv:1607.03483.