SLIDE 1

A Random Walk Around The Block

Johan Ugander, Stanford University
Joint work with: Isabel Kloumann (Facebook) & Jon Kleinberg (Cornell)
Google Mountain View, August 17, 2016

SLIDE 2

Seed set expansion

  • Given a graph G = (V, E), the goal is to accurately identify a target set T ⊂ V from a smaller seed set S ⊂ T.
  • Scored by Personalized PageRank.
  • (Figure: a graph with the target set T and the seed set S highlighted)
SLIDE 6

Seed set expansion

  • Given a graph G = (V, E), the goal is to accurately identify a target set T ⊂ V from a smaller seed set S ⊂ T.
  • Applications:
  • Broadly: ranking on graphs, recommendation systems
  • Spam filtering (Wu & Chellapilla '07)
  • Community detection (Weber et al. '13)
  • Missing data inference (Mislove et al. '14)
  • Common methods:
  • Semi-supervised learning (Zhu et al. '03)
  • Diffusion-based classification (Jeh & Widom '03; Kloster & Gleich '14)
  • Outwardness, modularity, and more (Bagrow '08; Kloumann & Kleinberg '14)



SLIDE 9

Recall curves for seed set expansion

  • Recall curve: true positive rate as a function of the number of items returned, based on a small uniformly random seed set.
  • Kloumann & Kleinberg '14 tested many different methods on data, broadly finding Personalized PageRank to be best.
  • Truncated PPR (first K steps) is comparable to PPR from K = 4.
  • The Heat Kernel was later found to be comparable to PPR.
  • (Figure: recall curves from Kloumann & Kleinberg '14)

SLIDE 10

Diffusion-based node classification

  • Classification based on random walk landing probabilities:
  • $r^v_k$: the probability that a random walk starting in S is at v after k steps.
  • $(r^v_1, r^v_2, \ldots, r^v_K)$: the truncated vector of landing probabilities.
  • Personalized PageRank and Heat Kernel ranking:
    $\mathrm{PPR}(v) \propto \sum_{k=1}^{\infty} \alpha^k \, r^v_k, \qquad \mathrm{HK}(v) \propto \sum_{k=1}^{\infty} \frac{t^k}{k!} \, r^v_k$
  • General diffusion score function:
    $\mathrm{score}(v) = \sum_{k=1}^{K} w_k \, r^v_k$
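To make the notation concrete, here is a minimal numpy sketch of these quantities, assuming a simple undirected graph given as a dense adjacency matrix with no isolated nodes (the function names are illustrative, not from the talk):

```python
import numpy as np
from math import factorial

def landing_probabilities(A, seed_set, K):
    """Row k-1 holds r_k: the probability that a random walk started
    uniformly in seed_set is at each node after k steps."""
    P = A / A.sum(axis=1, keepdims=True)      # row-stochastic transition matrix
    r = np.zeros(A.shape[0])
    r[list(seed_set)] = 1.0 / len(seed_set)   # uniform start within S
    steps = []
    for _ in range(K):
        r = r @ P                             # one random-walk step
        steps.append(r.copy())
    return np.array(steps)                    # shape (K, n)

def diffusion_score(R, w):
    """General diffusion score: score(v) = sum_k w_k * r_k^v."""
    return w @ R                              # shape (n,)

# PPR and HK are particular (truncated) weight choices:
K, alpha, t = 10, 0.85, 5.0
ks = np.arange(1, K + 1)
w_ppr = alpha ** ks
w_hk = np.array([t**k / factorial(k) for k in ks])
```

Nodes are then ranked by descending score; the truncation at K matches the vector $(r^v_1, \ldots, r^v_K)$ above.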

SLIDE 11

Diffusion-based node classification

  • Personalized PageRank and Heat Kernel = two parametric families of linear weights in $\mathrm{score}(v) = \sum_{k=1}^{K} w_k \, r^v_k$: PPR uses $w_k = \alpha^k$ and HK uses $w_k = t^k / k!$.
  • Question in this work: what weights are "optimal" for diffusion-based classification?
  • (Figure: weight vs. walk length for t = 1, 5, 15 and α = 0.85, 0.99; cf. Kloster & Gleich '14)

SLIDE 12

The stochastic block model

  • C blocks; focus on C = 2 blocks: 1 = "Target", 2 = "Other".
  • n1, n2 nodes in the blocks.
  • Independent edge probabilities:
  • Edge probability within a block = pin
  • Edge probability across blocks = pout
  • (Results for C > 2 as well; see the paper.)
  • A model with many names:
  • Stochastic Block Model (Holland et al. '83)
  • Affiliation Model (Frank & Harary '82)
  • Planted Partition Model (Dyer & Frieze '89)
  • (Figure: two blocks with pin edges within and pout edges across)
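A short sketch of sampling this two-block model with numpy (a hedged illustration; the helper name is mine):

```python
import numpy as np

def sample_two_block_sbm(n1, n2, p_in, p_out, seed=None):
    """Symmetric adjacency matrix: edges appear independently with
    probability p_in within a block and p_out across blocks."""
    rng = np.random.default_rng(seed)
    n = n1 + n2
    probs = np.full((n, n), p_out)            # cross-block probability
    probs[:n1, :n1] = p_in                    # Target block
    probs[n1:, n1:] = p_in                    # Other block
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # flip each pair once
    return (upper | upper.T).astype(float)    # symmetrize; no self-loops

A = sample_two_block_sbm(1000, 1000, p_in=0.2, p_out=0.05, seed=0)
```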


SLIDE 15

The SBM resolution limit

  • Find the true partition in poly(n) time w.h.p. as n → ∞:
  • Dyer & Frieze '89: if $p_{in} - p_{out} = O(1)$
  • Condon & Karp '01: if $p_{in} - p_{out} \geq \Omega(n^{-1/2})$
  • McSherry '01: if $p_{in} - p_{out} \geq \Omega\big((p_{out}(\log n)/n)^{1/2}\big)$
  • Find a partition positively correlated with the true partition:
  • Coja-Oghlan '06: if $p_{in} - p_{out} \geq \Omega\big((p_{out}/n)^{1/2}\big)$
  • If and only if $(a - b)^2 > 2(a + b)$, where $p_{in} = a/n$, $p_{out} = b/n$ (a quick numeric check follows below):
  • Decelle et al. '11: conjecture and belief propagation numerics
  • Mossel et al. '12, '13; Massoulié '13; Abbe et al. '14: proven
  • Recent extensions:
  • More than two blocks (e.g., Neeman & Netrapalli '14)
  • Unequal block sizes (e.g., Zhang et al. '16)
  • (Figure: two blocks with pin edges within and pout edges across)

SLIDE 16

The SBM resolution limit

  • Is block recovery/classification over? No!
  • Unsupervised vs. semi-supervised
  • Empirical graphs ≠ SBMs
  • Optimal algorithms are not practical
  • Beyond asymptotic limits, what are the decay rates?
  • Rather than being "problem down" (SBM classification), this talk is "method up": how should we tune the diffusion weights in $\mathrm{score}(v) = \sum_{k=1}^{K} w_k \, r^v_k$ to find seed sets?
  • Possible variations: diffusion weights for seed set expansion in core-periphery models? Latent space models (Hoff et al. 2002)? Etc.

SLIDE 17

Diffusion-based classification in SBMs

  • SBMs present a natural binary classification problem.
  • Recall the notation:
  • $r^v_k$: the probability that a random walk starting in S is at v after k steps.
  • $(r^v_1, r^v_2, \ldots, r^v_K)$: the truncated vector of landing probabilities.
  • Choices of $(w_1, \ldots, w_K)$ define sweep directions through this space.
  • Optimistically: (Figure: Target block nodes and Other block nodes separating in the $(r_i, r_j)$ plane)

SLIDE 18

The space of landing probabilities

  • SBM: 2000 nodes, Target & Other blocks, pin = 0.2, pout = 0.05.
  • One seed node (chosen uniformly at random from the Target set).
  • (Figure: scatter of 2-step vs. 3-step landing probabilities; pin = 0.2, pout = 0.05)
SLIDE 20

The space of landing probabilities

  • Geometric discriminant function: sweeps through the space of landing probabilities following the vector from b to a.
  • (Figure: scatter of 2-step vs. 3-step landing probabilities with centroids a and b; pin = 0.2, pout = 0.05)

SLIDE 21

The space of landing probabilities

  • Fisher discriminant functions: clearly better linear and quadratic functions exist. (Forward pointer; we will return to this.)
  • (Figure: scatter of 2-step vs. 3-step landing probabilities; pin = 0.2, pout = 0.05)

SLIDE 22

The space of landing probabilities

  • Focus on deriving the optimal geometric discriminant function first.
  • (Figure: scatters of 1-step vs. 2-step, 2-step vs. 3-step, and 3-step vs. 4-step landing probabilities; pin = 0.2, pout = 0.05)

SLIDE 23

Geometric discriminant functions

  • Let $r = (r_1, \ldots, r_K)$ be the landing probabilities of a node.
  • Let $a = (a_1, \ldots, a_K)$ be the Target class centroid.
  • Let $b = (b_1, \ldots, b_K)$ be the Other class centroid.
  • Then $f(r) = (a - b)^T r$ is the geometric discriminant function.
  • Notice: $f(r)$ increases when r moves in the direction of $a - b$.
  • We can classify nodes based on thresholds of $f(r)$; a sketch follows below.
  • (Figure: scatter of 2-step vs. 3-step landing probabilities with centroids a and b; pin = 0.2, pout = 0.05)
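A minimal sketch of this classifier, assuming the (K, n) matrix of landing probabilities from the earlier sketch and, for illustration only, known labels from which to estimate the centroids:

```python
import numpy as np

def geometric_discriminant(R, target_idx, other_idx):
    """f(r) = (a - b)^T r for every node, where a and b are the Target
    and Other centroids in landing-probability space.
    R: (K, n) matrix whose column v is node v's vector r."""
    a = R[:, target_idx].mean(axis=1)   # Target centroid, shape (K,)
    b = R[:, other_idx].mean(axis=1)    # Other centroid, shape (K,)
    return (a - b) @ R                  # f(r) for all n nodes at once

# Classify by thresholding f(r), e.g. take the top-|T| scores as Target.
```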

SLIDE 25

Personalized PageRank is "optimal"

  • Main Theorem (informal version). For the 2-block SBM with equal-sized blocks and edge densities pin, pout:
    $a_k - b_k = \left(\frac{p_{in} - p_{out}}{p_{in} + p_{out}}\right)^k$
    and the optimal geometric classifier is therefore
    $\sum_{k=1}^{K} (\alpha^*)^k \, r_k$,
    which is PPR(!) with $\alpha^* = \frac{p_{in} - p_{out}}{p_{in} + p_{out}}$.
  • Two main parts:
  • 1. The centroids a, b concentrate on quantities determined by the solution to a linear recurrence relation.
  • 2. That linear recurrence relation can be solved and yields PPR.
  • (Figure: scatter of 2-step vs. 3-step landing probabilities with centroids a and b; pin = 0.2, pout = 0.05)

SLIDE 27

PPR is "optimal": proof idea

  • Part 1: concentration of landing probabilities.
  • $A_k, B_k$ are interpretable as length-k walk counts to nodes in block 1 vs. block 2.
  • For large n, block walk counts increase by factors of roughly E[degree].
  • (A numeric check of the recurrence follows below.)

Lemma 1. For any $\varepsilon, \delta > 0$, there is an $n$ sufficiently large such that the random landing probabilities $(\hat{a}_1, \ldots, \hat{a}_K)$ and $(\hat{b}_1, \ldots, \hat{b}_K)$ for a uniform random walk on $G_n$ starting in the seed block satisfy the following conditions with probability at least $1 - \delta$ for all $k > 0$:
$$N\hat{a}_k \in \left[(1 - \varepsilon)\frac{A_k}{A_k + B_k},\ (1 + \varepsilon)\frac{A_k}{A_k + B_k}\right] \quad (1)$$
and
$$N\hat{b}_k \in \left[(1 - \varepsilon)\frac{B_k}{A_k + B_k},\ (1 + \varepsilon)\frac{B_k}{A_k + B_k}\right], \quad (2)$$
where $A_k, B_k$ are the solutions to the matrix recurrence relation
$$A_k = N(p_{in}A_{k-1} + p_{out}B_{k-1}), \qquad B_k = N(p_{out}A_{k-1} + p_{in}B_{k-1}),$$
with $A_0 = 1$, $B_0 = 0$.
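The recurrence is a two-line loop; the sketch below iterates it and checks the normalized centroid gap against the Main Theorem's $(\alpha^*)^k$ (my own verification, not code from the paper):

```python
import numpy as np

def centroid_gaps(p_in, p_out, N, K):
    """Iterate A_k = N(p_in A_{k-1} + p_out B_{k-1}),
               B_k = N(p_out A_{k-1} + p_in B_{k-1}),
    from A_0 = 1, B_0 = 0, returning (A_k - B_k)/(A_k + B_k) for k = 1..K."""
    A, B = 1.0, 0.0
    gaps = []
    for _ in range(K):
        A, B = N * (p_in * A + p_out * B), N * (p_out * A + p_in * B)
        gaps.append((A - B) / (A + B))
    return np.array(gaps)

p_in, p_out, N, K = 0.2, 0.05, 1000, 8
alpha_star = (p_in - p_out) / (p_in + p_out)
# The gap at step k is exactly (alpha*)^k: the PPR weights of the Main Theorem.
assert np.allclose(centroid_gaps(p_in, p_out, N, K),
                   alpha_star ** np.arange(1, K + 1))
```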

SLIDE 28

More general SBMs

  • For SBMs with C > 2 blocks and/or an arbitrary connectivity matrix P:
  • Seed set expansion asks: identify the nodes in a target set of blocks.
  • With conditions on equal expected degrees: PPR(!).
  • Without those conditions, asymptotically optimal weights for geometric classification are still obtainable from the solutions to a matrix recurrence relation (sketched below).
  • (Figure: a 4-block SBM, blocks 1 through 4)
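As a rough illustration only (this is my extrapolation of the slide, not the paper's exact construction), the same idea for C equal-sized blocks with connectivity matrix P: iterate block-level walk counts and read off centroid gaps as candidate weights:

```python
import numpy as np

def block_gap_weights(P, N, seed_block, target_blocks, K):
    """m_k = N * P @ m_{k-1} tracks expected length-k walk counts per block
    (equal block sizes N assumed). The gap between the average landing mass
    on target vs. non-target blocks plays the role of w_k = a_k - b_k."""
    C = P.shape[0]
    m = np.zeros(C)
    m[seed_block] = 1.0
    target = np.zeros(C, dtype=bool)
    target[list(target_blocks)] = True
    weights = []
    for _ in range(K):
        m = N * (P @ m)
        mass = m / m.sum()                  # landing-probability mass per block
        weights.append(mass[target].mean() - mass[~target].mean())
    return np.array(weights)

# Example: 4 blocks, seed in block 0, target blocks {0, 1}.
P = np.array([[0.20, 0.10, 0.05, 0.05],
              [0.10, 0.20, 0.05, 0.05],
              [0.05, 0.05, 0.20, 0.10],
              [0.05, 0.05, 0.10, 0.20]])
w = block_gap_weights(P, N=512, seed_block=0, target_blocks=[0, 1], K=9)
```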

SLIDE 30

Empirical vs. theoretical centroids

  • 2048-node, 4-block SBM: empirical class centroids vs. theory (predicted from the matrix recurrence relation).
  • (Figure: empirical and predicted centroids $r_k$ for $k = 1, \ldots, 9$, for a, the Target blocks, and b, the Other blocks; the error $|\hat{w}_k - \Psi_k|$ is on the order of 1e-6 to 1e-5)

SLIDE 32

Theories of graph diffusion

  • Other motivations for PPR:
  • Random Surfer Model (Brin & Page '98)
  • Cheeger inequalities for PPR, HK (Andersen et al. '06; Chung '09)
  • Local spectral algorithm with regularization (Mahoney et al. '12)
  • Our work shows PPR can be derived as the "optimal" geometric classifier.
  • It also motivates how to choose the PPR $\alpha$, as $\alpha = \frac{p_{in} - p_{out}}{p_{in} + p_{out}}$.
  • Most importantly: it also opens the door to methods beyond PPR.

SLIDE 33

PPR is "optimal" in a narrow sense

  • Could discriminant functions model higher moments of the point clouds?
  • (Figure: scatter of 2-step vs. 3-step landing probabilities with centroids a and b; pin = 0.2, pout = 0.05)

SLIDE 34

Fisher discriminant functions

  • Discriminant functions that model higher moments of the point clouds.
  • (Figure: scatter of 2-step vs. 3-step landing probabilities with fitted Gaussians $(a, \Sigma_a)$ and $(b, \Sigma_b)$)

SLIDE 35

Fisher discriminant functions

  • Let z be the latent class of each node.
  • Capture the (mean, variance) of the class point clouds:
    $$\Pr(r \mid z = 1) \propto |\Sigma_a|^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(r - a)^T \Sigma_a^{-1}(r - a)\right)$$
    $$\Pr(r \mid z = 0) \propto |\Sigma_b|^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(r - b)^T \Sigma_b^{-1}(r - b)\right)$$
  • Log-likelihood ratio as the discriminant function:
    $$g(r) = \log \frac{\Pr(r \mid z = 1)\Pr(z = 1)}{\Pr(r \mid z = 0)\Pr(z = 0)}$$
  • (Figure: scatter of 2-step vs. 3-step landing probabilities)

SLIDE 37

Fisher discriminant functions

  • Three approaches (a sketch of all three follows below):
    General: $g_2(r) \propto \left(\Sigma_a^{-1}a - \Sigma_b^{-1}b\right)^T r + \frac{1}{2} r^T \left(\Sigma_b^{-1} - \Sigma_a^{-1}\right) r$
    Assume $\Sigma_a = \Sigma_b = \Sigma$: $g_1(r) \propto \left(\Sigma^{-1}(a - b)\right)^T r$
    Assume $\Sigma_a = \Sigma_b = I$: $g_0(r) \propto (a - b)^T r$
  • We call the first two methods QuadSBMRank and LinSBMRank.
  • It is perhaps reasonable to assume equal covariances; this is effective.
  • PPR follows from an assumption of uniform variance and no covariance.
  • Open challenge: is it possible to show asymptotic normality and characterize the covariance matrices?
  • (Figure: scatter of 2-step vs. 3-step landing probabilities with the $g_2(r)$, $g_1(r)$, $g_0(r)$ decision boundaries)
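A compact sketch of all three discriminants from empirical class statistics (illustrative names; assumes enough samples per class that the empirical covariances are invertible):

```python
import numpy as np

def fisher_discriminants(R_target, R_other, R):
    """Evaluate g2 (general quadratic), g1 (shared covariance), and
    g0 (identity covariance) at every column of R (shape (K, n)).
    R_target, R_other: (K, m) landing-probability samples per class."""
    a, b = R_target.mean(axis=1), R_other.mean(axis=1)
    Sa, Sb = np.cov(R_target), np.cov(R_other)
    Sa_inv, Sb_inv = np.linalg.inv(Sa), np.linalg.inv(Sb)
    S_inv = np.linalg.inv((Sa + Sb) / 2)      # pooled covariance for g1
    # Quadratic form r^T (Sb_inv - Sa_inv) r, one value per column of R.
    quad = np.einsum('in,ij,jn->n', R, Sb_inv - Sa_inv, R)
    g2 = (Sa_inv @ a - Sb_inv @ b) @ R + 0.5 * quad
    g1 = (S_inv @ (a - b)) @ R
    g0 = (a - b) @ R
    return g2, g1, g0
```

Here g2 and g1 correspond to QuadSBMRank and LinSBMRank above, while g0 recovers the geometric discriminant and hence PPR-style weights.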

SLIDE 38

Evaluation: recall curves

  • SBM with 2 blocks, 64 nodes/block, 1 seed node.
  • Recall that Belief Propagation reaches the resolution limit.
  • Easy instance (pin >> pout): everything does well.
SLIDE 39

Evaluation: recall curves

  • SBM with 2 blocks, 64 nodes/block, 1 seed node.
  • Recall that Belief Propagation reaches the resolution limit.
  • Hard instance: PPR and HK lose all recall; LinSBMRank and QuadSBMRank are near BP.
SLIDE 40

Evaluation: recall curves

  • SBM with 2 blocks, 64 nodes/block, 1 seed node.
  • Recall that Belief Propagation reaches the resolution limit.
  • Even harder instance: LinSBMRank and QuadSBMRank outperform BP by a hair...?
SLIDE 41

Evaluation: recall curves

  • SBM with 2 blocks, 64 nodes/block, 1 seed node.
  • Recall that Belief Propagation reaches the resolution limit.
  • Impossible instance (pin = pout): nothing works.
SLIDE 42

Evaluation: resolution limit

  • Pearson correlation r between the true partition and the inferred partition, for PPR, HK, LinSBMRank, QuadSBMRank, and BP.
  • Empirically, we see LinSBMRank and QuadSBMRank get very close to the resolution limit (dotted line), with a slower decay rate.

SLIDE 43

Conclusions

  • Personalized PageRank with $\alpha = \frac{p_{in} - p_{out}}{p_{in} + p_{out}}$ is the optimal geometric discriminant function for the balanced 2-block SBM.
  • Geometric discriminant functions for more general block models follow from a recurrence relation.
  • Landing probabilities are correlated; correcting for higher moments in the space of landing probabilities greatly improves classification.
  • In practice: fit GMMs in the space of landing probabilities (a sketch follows below).
  • A new perspective on diffusion-based ranking that can hopefully open new doors.
  • Pre-print: Isabel Kloumann, Johan Ugander, Jon Kleinberg, "Block Models and Personalized PageRank," arXiv:1607.03483.
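One way to realize the "fit GMMs" bullet, as a hedged sketch using scikit-learn's GaussianMixture (my substitution for whatever fitting procedure the authors used):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_rank(R, seed_idx):
    """R: (K, n) landing probabilities; one K-dim point per node.
    Fit a 2-component GMM and score each node by the posterior
    probability of the component that contains the seed nodes."""
    X = R.T
    gmm = GaussianMixture(n_components=2, covariance_type='full',
                          random_state=0).fit(X)
    post = gmm.predict_proba(X)                       # (n, 2) posteriors
    seed_component = post[seed_idx].mean(axis=0).argmax()
    return post[:, seed_component]                    # rank by this score
```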

SLIDE 44

Open directions

  • Model the covariance of landing probabilities?
  • Currently requires at least ~logarithmic degrees (we think); is it possible to derive weights for bounded-degree SBMs?
  • Better classifiers in the space of landing probabilities for other random walks? (Non-backtracking, etc.)
  • Not just the SBM? Optimal weights for the dcSBM, core-periphery models, the Hoff latent space model, etc.
  • Slow decay beyond the resolution limit?
  • Pre-print: Isabel Kloumann, Johan Ugander, Jon Kleinberg, "Block Models and Personalized PageRank," arXiv:1607.03483.