Local clustering with graph diffusions and spectral solution paths – PowerPoint PPT Presentation



SLIDE 1

Local clustering with graph diffusions and spectral solution paths

Kyle Kloster Purdue University

Joint with

David F. Gleich

(Purdue), supported by NSF CAREER 1149756-CCF

SLIDE 2

Local Clustering

Given seed(s) S in G, find a good cluster near S

seed

SLIDE 3

Local Clustering

Given seed(s) S in G, find a good cluster near S

seed

“Near”? → local, small set containing S
“Good”? → low conductance

SLIDE 4

Low-conductance sets are clusters

conductance( T ) = (# edges leaving T) / (# edge endpoints in T)

= “chance a random edge that touches T exits T” (for small sets T, i.e. vol(T) < vol(G)/2)
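As a concrete illustration (not code from the talk), here is a minimal sketch of this conductance computation; the two-triangle toy graph is an assumption for the example.

```python
def conductance(adj, T):
    """conductance(T) = (# edges leaving T) / (# edge endpoints in T).

    adj: dict mapping each node to its set of neighbors (undirected).
    The denominator is vol(T), the sum of degrees of nodes in T.
    """
    T = set(T)
    cut = sum(1 for u in T for v in adj[u] if v not in T)  # edges leaving T
    vol = sum(len(adj[u]) for u in T)                      # edge endpoints in T
    return cut / vol

# Toy graph (hypothetical example): two triangles joined by one bridge edge.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(conductance(adj, {0, 1, 2}))  # 1 cut edge / 7 endpoints ~ 0.143
```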

SLIDE 5

Low-conductance sets are clusters

conductance( T ) = (# edges leaving T) / (# edge endpoints in T)

(for small sets T, i.e. vol(T) < vol(G)/2)

For a global cluster, could use Fiedler… But we want a local cluster.

SLIDE 6

Fiedler

Compute Fiedler vector, v: Lv = λ2 Dv

“Sweep” over v:

  • 1. sort: v(1) ≥ v(2) ≥ · · ·
  • 2. for each set Sk = (1,…,k), compute conductance φ(Sk)
  • 3. output best Sk

SLIDE 7

Fiedler

Compute Fiedler vector, v: Lv = λ2 Dv

“Sweep” over v:

  • 1. sort: v(1) ≥ v(2) ≥ · · ·
  • 2. for each set Sk = (1,…,k), compute conductance φ(Sk)
  • 3. output best Sk

Cheeger Inequality: Fiedler finds a cluster “not too much worse” than the global optimum. But we want local…
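The sweep procedure can be sketched in a few lines. This is an illustrative implementation, not the talk's code: the toy graph and scores are assumptions, and using min(vol, vol(G) − vol) in the denominator (so large prefix sets are scored sensibly) is my choice.

```python
def sweep_cut(adj, v):
    """Sort nodes by v, scan prefix sets S_k, return the best-conductance set.

    adj: dict node -> set of neighbors; v: dict node -> score.
    Maintains the cut and volume incrementally as each node joins S.
    """
    vol_G = sum(len(adj[u]) for u in adj)
    order = sorted(adj, key=lambda u: v.get(u, 0.0), reverse=True)  # 1. sort
    in_S, cut, vol = set(), 0, 0
    best_phi, best_set = float("inf"), set()
    for u in order[:-1]:                 # S = all nodes has no cut; skip it
        inside = sum(1 for w in adj[u] if w in in_S)
        cut += len(adj[u]) - 2 * inside  # new boundary edges minus absorbed ones
        vol += len(adj[u])
        in_S.add(u)
        phi = cut / min(vol, vol_G - vol)      # 2. conductance of S_k
        if phi < best_phi:
            best_phi, best_set = phi, set(in_S)
    return best_set, best_phi                  # 3. best S_k

# Hypothetical example: two triangles bridged at node 2; scores favor 0, 1, 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
v = {0: 0.9, 1: 0.8, 2: 0.7, 3: 0.2, 4: 0.1, 5: 0.05}
print(sweep_cut(adj, v))  # best set {0, 1, 2}, conductance 1/7
```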

SLIDE 8

Local Fiedler and diffusions

Fiedler: Lv = λDv
Fiedler with local bias (MOV): Lv = λDv + “s” (normalized seed vector s)

[Mahoney Orecchia Vishnoi 12] “A local spectral method…”
THM: MOV is a scaling of personalized PageRank*!

SLIDE 9

Local Fiedler and diffusions

Intuition: why MOV ~ PageRank

Fiedler: Lv = λDv
Fiedler with local bias: Lv = λDv + “s”
(I − D^{−1/2} A D^{−1/2}) v̂ = λ v̂ + “s”
A D^{−1} v̂ = (1 − λ) v̂ + “s”
PageRank vector, a diffusion: (I − αP) v̂ = “s”

SLIDE 10

PageRank and other diffusions

“Personalized” PageRank (PPR) [Andersen, Chung, Lang 06]: local Cheeger inequality and fast algorithm, “Push” procedure

Diffusion perspective: x = Σ_{k=0}^∞ α^k P^k ŝ

Standard setting: (I − αP) x = ŝ
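The diffusion perspective translates directly into code. This sketch simply truncates the series (the graph, seed, and parameters are illustrative assumptions, not from the talk):

```python
import numpy as np

def ppr_series(A, s, alpha=0.85, n_terms=100):
    """Approximate x = sum_{k>=0} alpha^k P^k s by truncating the series.

    A: symmetric adjacency matrix; P = A D^{-1} is the random-walk matrix,
    so (I - alpha P) x = s up to the dropped tail of size ~alpha^n_terms.
    """
    P = A / A.sum(axis=0)              # scale each column j by 1/deg(j)
    x = np.zeros(len(s))
    term = np.asarray(s, dtype=float)  # holds alpha^k P^k s
    for _ in range(n_terms):
        x += term
        term = alpha * (P @ term)
    return x

# Two triangles joined by a bridge; seed on node 0 (illustrative graph).
A = np.array([[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]],
             dtype=float)
x = ppr_series(A, np.eye(6)[0])
print(x)
```

Most of the mass stays on the seed's triangle {0, 1, 2}, which is what the sweep then exploits.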

SLIDE 11

PageRank and other diffusions

“Personalized” PageRank (PPR) [Andersen, Chung, Lang 06]: local Cheeger inequality and fast algorithm, “Push” procedure

Heat Kernel diffusion (HK) (and many more!)

PPR: x = Σ_{k=0}^∞ α^k P^k ŝ

HK: f = Σ_{k=0}^∞ (t^k / k!) P^k ŝ

[Plot: diffusion weight vs. walk length for HK (t = 1, 5, 15) and PPR (α = 0.85, 0.99)]

Various diffusions explore different aspects of graphs.
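To see why the diffusions differ, compare the series weights themselves (the specific t and α values echo the plot; the snippet is purely illustrative):

```python
import math

# Series weights c_k multiplying P^k s in each diffusion:
# PPR uses c_k = alpha^k, the heat kernel uses c_k = t^k / k!.
def ppr_weight(alpha, k):
    return alpha ** k

def hk_weight(t, k):
    return t ** k / math.factorial(k)

for k in (0, 5, 20, 60):
    print(k, ppr_weight(0.85, k), hk_weight(5.0, k))
```

For small k the heat-kernel weights can dominate, but t^k/k! eventually decays faster than any geometric α^k, so HK concentrates on short walks while PPR keeps weight on long ones.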

SLIDE 12

Diffusions, theory & practice

Good conductance / fast algorithm, by diffusion:

PR: Local Cheeger Inequality [Andersen Chung Lang 06] / “PPR-push” is O(1/(ε(1−𝛽)))
HK: Local Cheeger Inequality [Chung 07] / “HK-push” is O(e^t C/ε) [K., Gleich 2014]
Gen. Diff. (TDPR): open question / open question [Avron, Horesh 2015] → this talk

SLIDE 13

Diffusions, theory & practice

Good conductance / fast algorithm, by diffusion:

PR: Local Cheeger Inequality [Andersen Chung Lang 06] / “PPR-push” is O(1/(ε(1−𝛽)))
HK: Local Cheeger Inequality [Chung 07] / “HK-push” is O(e^t C/ε) [K., Gleich 2014]
Gen. Diff. (TDPR): open question / open question [Avron, Horesh 2015] → this talk

David Gleich and I are working with Olivia Simpson (a student of Fan Chung’s)

SLIDE 14

General diffusions: intuition

seed

A diffusion propagates “rank” from a seed across a graph.

[Figure legend: node shading from high to low diffusion value; the highlighted region is a local cluster / low-conductance set]

SLIDE 15

General diffusions

A diffusion propagates “rank” from a seed across a graph.

General diffusion vector:

f = Σ_{k=0}^∞ c_k P^k ŝ = c0 p0 + c1 p1 + c2 p2 + c3 p3 + ⋯, where p_k = P^k ŝ

Sweep over f!
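Computed naively with dense matrix products, the general diffusion vector is a few lines of code (the push algorithm later makes it local); the example graph and coefficients are assumptions:

```python
import numpy as np

def general_diffusion(A, s, coeffs):
    """f = sum_k c_k P^k s for any coefficient sequence c_0, c_1, ...

    Tracks p_k = P^k s and accumulates c_k * p_k, exactly as on the slide.
    """
    P = A / A.sum(axis=0)           # random-walk matrix P = A D^{-1}
    f = np.zeros(len(s))
    p = np.asarray(s, dtype=float)  # p_k = P^k s, starting at p_0 = s
    for c in coeffs:
        f += c * p
        p = P @ p
    return f

A = np.array([[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]],
             dtype=float)
s = np.eye(6)[0]
alpha = 0.85
ppr_coeffs = [(1 - alpha) * alpha ** k for k in range(200)]  # sums to ~1
f = general_diffusion(A, s, ppr_coeffs)
```

Because P is column-stochastic and the coefficients sum to 1, f is a probability vector, ready to sweep over.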

SLIDE 16

General algorithm

  • 1. Approximate f̂ so that ‖D^{−1}(f − f̂)‖∞ ≤ ε
  • 2. Scale: D^{−1} f̂
  • 3. Then sweep!

How to do this efficiently?

SLIDE 17

Algorithm Intuition

From parameters ck, ε, seed s …

Starting from here (all mass at the seed)… how to end up here (the full diffusion)?

[Diagram: stages p0, p1, p2, p3, … spreading out from the seed, combined as f = c0 p0 + c1 p1 + c2 p2 + c3 p3 + ⋯]

SLIDE 18

Algorithm Intuition

Begin with mass at the seed(s) in a “residual” staging area, r0. The residuals rk hold mass that is unprocessed; it’s like error.

Idea: “push” any entry with rk(j)/dj > (some threshold)

[Diagram: residual stages r0, r1, r2, r3, … feed the solution stages p0, p1, p2, p3, weighted by c0, c1, c2, c3]

SLIDE 19

Push Operation

push – (1) remove entry in rk, (2) put in f,


SLIDE 20

Push Operation

push – (1) remove entry in rk, (2) put in f, (3) then scale and spread to neighbors in next r

SLIDE 21

Push Operation


push – (1) remove entry in rk, (2) put in f, (3) then scale and spread to neighbors in next r (repeat)



SLIDE 24

Thresholds

ERROR equals a weighted sum of the entries left in the rk

→ Set threshold so “leftovers” sum to < ε

[Diagram: entries below the threshold remain in the residual stages r0, r1, r2, r3, …]

SLIDE 25

Thresholds

ERROR equals a weighted sum of the entries left in the rk

→ Set threshold so “leftovers” sum to < ε

Threshold for stage rk is ε / (Σ_{j=k+1} cj)

Then ‖D^{−1}(f − f̂)‖∞ ≤ ε
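The staged push procedure can be sketched as below. This is my simplified reading of the slides, not the authors' code: the per-stage threshold uses the tail sum Σ_{j≥k} cj (a variant of the rule above), and the constants are not tuned to match the theorem.

```python
from collections import defaultdict

def generalized_push(adj, seed, coeffs, eps):
    """Approximate f = sum_k c_k P^k s locally via staged pushes.

    adj: node -> set of neighbors; seed: starting node; coeffs: c_0..c_{N-1}.
    r[k][j] holds unprocessed stage-k mass at node j; entries whose
    per-degree mass is below the stage threshold are left behind as error.
    """
    N = len(coeffs)
    tail = [0.0] * (N + 1)              # tail[k] = sum_{j >= k} c_j
    for k in range(N - 1, -1, -1):
        tail[k] = tail[k + 1] + coeffs[k]
    f = defaultdict(float)
    r = [defaultdict(float) for _ in range(N + 1)]
    r[0][seed] = 1.0
    for k in range(N):
        thresh = eps / tail[k] if tail[k] > 0 else float("inf")
        for j, mass in list(r[k].items()):
            if mass / len(adj[j]) < thresh:
                continue                # leftover: contributes to the error
            r[k][j] = 0.0
            f[j] += coeffs[k] * mass    # (1)-(2): move the entry into f
            spread = mass / len(adj[j]) # (3): one random-walk step, P = AD^{-1}
            for u in adj[j]:
                r[k + 1][u] += spread
    return dict(f)
```

With PPR coefficients this touches only nodes near the seed on a large graph; on a toy graph it closely matches the dense series computation.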

SLIDE 26

Another perspective

Fiedler: Lv = λDv
Fiedler with local bias: Lv = λDv + “s”
(I − D^{−1/2} A D^{−1/2}) v̂ = λ v̂ + “s”
A D^{−1} v̂ = (1 − λ) v̂ + “s”
PageRank vector, a diffusion: (I − αP) v̂ = “s”

SLIDE 27

Another perspective

Fiedler: L Vk = D Vk Λk
Fiedler with local bias: L Vk = D Vk Λk + S
(I − D^{−1/2} A D^{−1/2}) V̂k = V̂k Λk + Ŝ
A D^{−1} V̂k = V̂k (I − Λk) + Ŝ

SLIDE 28

Another perspective

Fiedler: L Vk = D Vk Λk
Fiedler with local bias: L Vk = D Vk Λk + S
A D^{−1} V̂k = V̂k (I − Λk) + Ŝ
(I − D^{−1/2} A D^{−1/2}) V̂k = V̂k Λk + Ŝ
P V̂k Γ = V̂k + S̄

Mix-product property for the Kronecker product

SLIDE 29

Another perspective

Fiedler: L Vk = D Vk Λk
Fiedler with local bias: L Vk = D Vk Λk + S
A D^{−1} V̂k = V̂k (I − Λk) + Ŝ
(I − D^{−1/2} A D^{−1/2}) V̂k = V̂k Λk + Ŝ
P V̂k Γ = V̂k + S̄

Mix-product property for the Kronecker product:
(I − Γ^T ⊗ P) vec(V̂k) = vec(S̃)
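The vec/Kronecker mixed-product identity behind this step can be checked numerically (the random matrices here are just for illustration):

```python
import numpy as np

# vec(P X G) = (G^T kron P) vec(X), with column-major ("Fortran") vec.
# This is the identity that turns the matrix equation P V G = V + S into
# one linear system over the stacked unknowns vec(V).
rng = np.random.default_rng(0)
P = rng.standard_normal((4, 4))
X = rng.standard_normal((4, 3))
G = rng.standard_normal((3, 3))

lhs = (P @ X @ G).flatten(order="F")
rhs = np.kron(G.T, P) @ X.flatten(order="F")
print(np.allclose(lhs, rhs))  # True
```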

SLIDE 30

Another perspective

Standard spectral approach: (I − αP) v̂ = s̃

(I − Γ^T ⊗ P) vec(V̂k) = vec(S̃)

  • generalizes PageRank to a “matrix teleportation parameter” Γ = (I − Λk)^{−1}

SLIDE 31

Another perspective

(I − αP) v̂ = s̃

(I − Γ^T ⊗ P) vec(V̂k) = vec(S̃)

  • generalizes PageRank to a “matrix teleportation parameter”

Our framework is equivalent to Γ = diag(c̃0, …, c̃N). (Details in [K., Gleich KDD 14])

SLIDE 32

General diffusions: conclusion

THM: For diffusion coefficients ck ≥ 0 satisfying Σ_{k=0}^∞ ck = 1, with N chosen by the “rate of decay” so that Σ_{k=N+1}^∞ ck ≤ ε/2, “generalized push” approximates the diffusion f on a symmetric graph so that

‖D^{−1}(f − f̂)‖∞ ≤ ε

in work bounded by O(2N²/ε). Constant for any inputs! (If the diffusion decays fast.)

SLIDE 33

Proof sketch

  • 1. Stop pushing after N terms: Σ_{k=N+1}^∞ ck ≤ ε/2
  • 2. Push residual entries in the first N terms if rk(j) ≥ d(j) ε/(2N)
  • 3. Total work is the # of pushes: Σ_{k=0}^{N−1} Σ_{t=1}^{mk} d(jt)

SLIDE 34

Push Recap

push – (1) remove entry in rk, (2) put in p, (3) then scale and spread to neighbors in next r

Each push on node j costs d(j) work.


SLIDE 36

Proof sketch

  • 1. Stop pushing after N terms: Σ_{k=N+1}^∞ ck ≤ ε/2
  • 2. Push residual entries in the first N terms if rk(j) ≥ d(j) ε/(2N)
  • 3. Total work is the # of pushes: Σ_{k=0}^{N−1} Σ_{t=1}^{mk} d(jt) ≤ Σ_{k=0}^{N−1} Σ_{t=1}^{mk} rk(jt) (2N)/ε

SLIDE 37

Proof sketch

  • 1. Stop pushing after N terms: Σ_{k=N+1}^∞ ck ≤ ε/2
  • 2. Push residual entries in the first N terms if rk(j) ≥ d(j) ε/(2N)
  • 3. Total work is the # of pushes: Σ_{k=0}^{N−1} Σ_{t=1}^{mk} d(jt) ≤ Σ_{k=0}^{N−1} Σ_{t=1}^{mk} rk(jt) (2N)/ε
  • 4. Each rk sums to ≤ 1 (each push is added to f, which sums to 1): Σ_{t=1}^{mk} rk(jt) ≤ 1, so the total work is O(2N²/ε)

SLIDE 38

Solution Paths

Benefit of these “push” diffusions? A direct decomposition is a black box: Feed in input, get output. In contrast, the iterative nature of “push” means running the algorithm is essentially “watching” the diffusion process occur.
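A rough illustration of this “watching” idea (my toy sketch, not the paper's algorithm: it refines a PPR series term by term, where the talk refines the push tolerance ε, and the graph is assumed):

```python
import numpy as np

# Record each node's degree-normalized diffusion value as the approximation
# is refined; plotting one curve per node gives a crude "solution path".
A = np.array([[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]],
             dtype=float)
d = A.sum(axis=0)
P = A / d
alpha, s = 0.85, np.eye(6)[0]
x, term, paths = np.zeros(6), s.copy(), []
for _ in range(30):
    x += term                 # add the next alpha^k P^k s term
    term = alpha * (P @ term)
    paths.append(x / d)       # degree-normalized snapshot
paths = np.array(paths)       # row k = path values after k+1 terms
# The seed's triangle {0, 1, 2} separates from {3, 4, 5} as accuracy grows.
print(paths[-1])
```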

SLIDE 39

Solution Paths

[Figure: solution-path plots at ε = 10^−2, 10^−3, 10^−4]

SLIDE 40

Solution Paths

[Plot: “Netscience -- PageRank Solution Paths”; x-axis 1/ε from 10^1 to 10^5, y-axis degree-normalized PageRank from 10^−5 to 10^−1, with markers at ε = 10^−2, 10^−3, 10^−4]

SLIDE 41

Solution Paths

[Plot: the same “Netscience -- PageRank Solution Paths” figure]

Each curve is a node. Its value increases as ε goes to 0. The thick black line shows the set of best conductance.

SLIDE 42

Solution Paths

[Plot: the same “Netscience -- PageRank Solution Paths” figure]

Bundles of curves are good clusters. Paths identify nested clusters.

Each curve is a node. Its value increases as ε goes to 0. The thick black line shows the set of best conductance.

SLIDE 43

Solution Paths

Locate nested, good-conductance sets that a single diffusion + sweep could miss. This can be done efficiently because the constant-time approach to computing diffusions enables efficient storage and analysis of the push process.

Total paths work (for PageRank): O((1/(ε(1 − α)))²). Still efficient!

SLIDE 44

Thank you

Heat kernel code available at

http://www.cs.purdue.edu/homes/dgleich/codes/hkgrow

Solution paths: http://arxiv.org/abs/1503.00322

(Solution paths, generalized diffusion code soon)

Ongoing work

  • Generalized local Cheeger Inequality for a broader class of diffusions

Questions or suggestions? Email Kyle Kloster at kkloste-at-purdue-dot-edu