Graph diffusions and matrix functions: fast algorithms and - - PowerPoint PPT Presentation



SLIDE 1

Graph diffusions and matrix functions: fast algorithms and localization results

Kyle Kloster, Purdue University

Thesis defense

Advised by David F. Gleich

Supported by NSF CAREER 1149756-CCF

SLIDE 2

Network Analysis

Graphs can model everything!

Graph G: V = nodes, E = edges

  • Erdős numbers, Facebook friends, Twitter followers
  • search engines, Amazon/Netflix recommendations
  • protein interactions, power grids, Google Maps
  • air traffic control, sports rankings, cell tower placement
  • scheduling, parallel programming… everything (Kevin Bacon)!

SLIDE 3

Network Analysis

Recommending Facebook friends: each node is a user, and the graph has edges between Facebook friends.

  • How should Facebook determine which users to recommend as new friends to the node colored black?

SLIDE 4

Network Analysis

PageRank: one of the best methods for recommending FB friends / Twitter followers is "seeded PageRank."

  • A diffusion process that leaks dye from a target node (the seed) to the rest of the graph.

More dye = higher probability that node is your friend!

SLIDE 5

And real-world networks are big:

  • ~10^9 nodes = |V|
  • ~10^10 edges = |E|

Nonzero dye lands on every node (nonzero probability you are friends with each person) -> must look at the whole graph to be accurate! Big networks pose a big problem for applications that need fast answers (like "which users should I befriend?").

Mo' data, mo' problems

SLIDE 6

State-of-the-art c. 2012: "Wild West"

There exist "fast" methods for seeded PageRank, but they were "compute first, ask questions later"* (or not at all!)

  • They lacked principled mathematical theory guaranteeing these fast approximations would be accurate.
  • But fast approximate methods "seemed to work."
SLIDE 7

Localization in seeded PageRank

Newman's netscience graph: 379 vertices, 924 edges.

In connected graphs, seeded PageRank is nonzero everywhere.

  • But in practice, x is "zero" on most of the nodes!

SLIDE 8

Solution to Big Data: localization

Local algorithms look at just the graph region near the nodes of interest. Localization occurs when a global object can be approximated accurately by being precise in only a small region.

SLIDE 9

Weak and strong localization

Weak localization: an approximation that is sparse, and accurate enough to use in applications that tolerate low accuracy (clustering!).

  • Strong localization: an approximation that is sparse, and accurate enough for use in any application.

SLIDE 10

State of the art, 2016/4/22

             Weak localization     Strong localization
PR                                 [Nassar, K., Gleich, 2015]
HK           [K. & Gleich 2014]    [Gleich & K., 2014]
Gen Diff     In preparation!       ?

SLIDE 11

Weak localization in diffusions

SLIDE 12

General diffusions: intuition

A diffusion propagates "rank" from a seed across a graph.

(Figure: diffusion values spreading from the seed; color indicates high vs. low diffusion value.)

SLIDE 13

Graph Matrices

Adjacency matrix A and random-walk transition matrix P:

    A_ij = 1 if node i links to node j, 0 otherwise

    P_ij = A_ji / d_j, where d_j is the out-degree of node j

P is column-stochastic, i.e. its column sums equal 1.

    P = A^T D^{-1}, where D is the diagonal degree matrix.
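The matrix construction above can be sketched in a few lines of NumPy (a dense illustration on a tiny made-up graph; real networks would use sparse matrices):

```python
import numpy as np

# Small directed graph: A[i, j] = 1 if node i links to node j.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 1, 0]], dtype=float)

d = A.sum(axis=1)   # out-degrees d_j (row sums of A)
P = A.T / d         # P = A^T D^{-1}: entry P_ij = A_ji / d_j

# P is column-stochastic: every column sums to 1.
print(P.sum(axis=0))   # -> [1. 1. 1.]
```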

SLIDE 14

General diffusions: intuition

A diffusion propagates "rank" from a seed across a graph.

General diffusion vector:

    f = c_0 p_0 + c_1 p_1 + c_2 p_2 + c_3 p_3 + … = Σ_{k=0}^∞ c_k P^k ŝ = f(P) ŝ
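The truncated sum can be computed directly (a dense sketch; the function name and the PageRank-style coefficients below are illustrative, not from the thesis):

```python
import numpy as np

def diffusion(P, s, coeffs):
    """Compute f = sum_k c_k P^k s for a finite list of coefficients c_k."""
    f = np.zeros_like(s)
    pk = s.copy()            # p_0 = s
    for c in coeffs:
        f = f + c * pk       # accumulate c_k * p_k
        pk = P @ pk          # p_{k+1} = P p_k
    return f

# Example: PageRank-style coefficients c_k = (1 - alpha) * alpha^k.
P = np.array([[0.0, 1.0], [1.0, 0.0]])   # two nodes joined by an edge
s = np.array([1.0, 0.0])
alpha = 0.85
f = diffusion(P, s, [(1 - alpha) * alpha**k for k in range(50)])
```

Since P is column-stochastic, the entries of f sum to exactly the sum of the coefficients used.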

SLIDE 15

Local Community Detection

Given seed(s) S in G, find a community that contains S.

(Figure: seed node inside a candidate community.)

"Community" = high internal, low external connectivity.

SLIDE 16

Low-conductance sets are communities

    conductance(T) = cut(T) / min( vol(T), vol(T^c) )

conductance(comm) = 39/381 ≈ 0.102 ~ "the chance that a random edge touching T also exits T"
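The formula translates directly to code (dense adjacency for brevity; the function name is illustrative):

```python
import numpy as np

def conductance(A, T):
    """Conductance of node set T in an undirected graph with adjacency matrix A."""
    mask = np.zeros(A.shape[0], dtype=bool)
    mask[list(T)] = True
    cut = A[mask][:, ~mask].sum()   # edges with one endpoint in T, one outside
    vol_T = A[mask].sum()           # sum of degrees of nodes in T
    vol_Tc = A[~mask].sum()         # volume of the complement
    return cut / min(vol_T, vol_Tc)

# Path graph 0-1-2-3, T = {0, 1}: cut = 1 and min volume = 3,
# so conductance(T) = 1/3.
A = np.zeros((4, 4))
for i in range(3):
    A[i, i + 1] = A[i + 1, i] = 1.0
print(conductance(A, [0, 1]))   # -> 0.333...
```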

SLIDE 17

Graph diffusions find low-conductance sets

(Figure: diffusion values from the seed; the high-value region is the local community / low-conductance set.)

SLIDE 18

Use a diffusion for good conductance sets

  • 1. Approximate f with f̂ so that ‖D^{-1}(f − f̂)‖_∞ ≤ ε and f ≥ f̂ ≥ 0.
  • 2. Then "sweep" for the best conductance set.

Sweep:

  • 1. Sort the diffusion vector so that f_1/d(1) ≥ f_2/d(2) ≥ ···
  • 2. Consider the sweep sets S(j) = {1, 2, …, j}.
  • 3. Return the set S(j) with the best conductance.
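The three sweep steps can be sketched with an incremental cut/volume update (dense arrays; the function name and the incremental bookkeeping are one illustrative way to implement the steps above):

```python
import numpy as np

def sweep_cut(A, f):
    """Sweep over f_i / d_i in decreasing order; return the best-conductance prefix set."""
    d = A.sum(axis=1)
    order = np.argsort(-f / d)              # step 1: sort by f_i / d_i
    total_vol = d.sum()
    mask = np.zeros(A.shape[0], dtype=bool)
    vol = cut = 0.0
    best_set, best_cond = None, np.inf
    for v in order:                         # step 2: sweep sets S(j)
        internal = A[v, mask].sum()         # edges from v into the current set
        cut += d[v] - 2 * internal          # those edges become internal
        vol += d[v]
        mask[v] = True
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_cond:
            best_cond = cut / denom
            best_set = set(np.flatnonzero(mask))
    return best_set, best_cond              # step 3: best-conductance set
```

On two triangles joined by a single edge, with a diffusion concentrated on the seed's triangle, the sweep recovers that triangle (conductance 1/7).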

SLIDE 19

Weak localization in diffusions

  • 1. Approximate f with f̂ so that ‖D^{-1}(f − f̂)‖_∞ ≤ ε and f ≥ f̂ ≥ 0.

Weak localization: when an approximation f̂ of f satisfies ‖D^{-1}(f − f̂)‖_∞ ≤ ε and is sparse, the diffusion is weakly localized.

  • Basically: "get just the biggest entries sort of correct."

SLIDE 20

Diffusions used for conductance

Personalized PageRank (PPR): f = Σ_{k=0}^∞ α^k P^k s̃

  • Heat Kernel (HK): f = Σ_{k=0}^∞ (t^k / k!) P^k s̃
  • Time-dependent PageRank (TDPR)

Various diffusions explore different aspects of graphs.
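For concreteness, the two coefficient sequences can be written out (with the normalizations (1 − α) and e^{-t} that make each sequence sum to 1; folding those factors in is an assumption here, and the parameter values are illustrative):

```python
import math

alpha, t, N = 0.85, 5.0, 50   # illustrative parameters and truncation length

ppr_c = [(1 - alpha) * alpha**k for k in range(N)]                  # PPR weights
hk_c = [math.exp(-t) * t**k / math.factorial(k) for k in range(N)]  # HK weights

# Both sequences sum to ~1, so each defines a weighted average of random
# walks; PPR decays geometrically, HK (Poisson weights) decays much faster
# in the tail, which is one way the two diffusions differ.
```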

SLIDE 21

Diffusions: conductance & algorithms

           Good conductance                        Fast algorithm
PR         Local Cheeger inequality                "PPR-push" is O(1/(ε(1−𝛽)))
           [Andersen, Chung, Lang 06]              [Andersen, Chung, Lang 06]
HK         Local Cheeger inequality [Chung '07]    "HK-push" is O(e^t C/ε) [K., Gleich '14]
TDPR       Open question [Avron, Horesh '15]
Gen Diff   Constant-time heuristically on L        Open question for general f;
           [Ghosh et al. '14]                      in preparation with Gleich and Simpson

SLIDE 22

Our algorithms for f̂ ≈ f(P) ŝ

  • accuracy: ‖D^{-1}(f − f̂)‖_∞ ≤ ε
  • constant time on any graph
  • heat kernel: Õ(e^t/ε) work
  • general: O(N²/ε) work
  • our experiments show the heat kernel outperforms PageRank on real-world communities

SLIDE 23

General diffusion: Algorithm Intuition

From parameters c_k, ε, and seed s…

Starting from here (mass on the seed)… how do we end up here (the diffusion values)?

    f = c_0 p_0 + c_1 p_1 + c_2 p_2 + c_3 p_3 + … = Σ_{k=0}^∞ c_k P^k ŝ = f(P) ŝ

"residual staging area"

SLIDE 24

General diffusion: Algorithm Intuition

    f = c_0 p_0 + c_1 p_1 + c_2 p_2 + c_3 p_3 + … = Σ_{k=0}^∞ c_k P^k ŝ = f(P) ŝ

The terms p_k = P^k ŝ solve the block lower-bidiagonal system

    ⎡  I           ⎤ ⎡ p_0 ⎤   ⎡ s ⎤
    ⎢ −P   I       ⎥ ⎢ p_1 ⎥ = ⎢ 0 ⎥
    ⎢     −P   I   ⎥ ⎢ p_2 ⎥   ⎢ 0 ⎥
    ⎣          ⋱ ⋱ ⎦ ⎣  ⋮  ⎦   ⎣ ⋮ ⎦

SLIDE 25

Algorithm Intuition

Begin with mass at the seed(s) in a "residual" staging area, r_0. The residuals r_k hold mass that is unprocessed – it's like error.

Idea: "push" any entry j with r_k(j) / d_j > (some threshold).

    f = c_0 p_0 + c_1 p_1 + c_2 p_2 + c_3 p_3 + …
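The staged-push idea can be sketched as follows (dense arrays on an undirected graph; the threshold d(j)·ε/(2N) is the one from the proof sketch, and the function name is illustrative, not the thesis implementation):

```python
import numpy as np

def push_diffusion(A, seed, coeffs, eps):
    """Approximate f = sum_k c_k P^k e_seed by pushing staged residual mass."""
    n = A.shape[0]
    d = A.sum(axis=1)              # degrees (undirected graph)
    N = len(coeffs)
    f = np.zeros(n)
    r = np.zeros((N + 1, n))       # r[k] = unprocessed mass for stage k
    r[0, seed] = 1.0
    for k in range(N):
        # Push every entry over its threshold; leftovers stay behind as error.
        for j in np.flatnonzero(r[k] >= d * eps / (2 * N)):
            m = r[k, j]
            r[k, j] = 0.0
            f[j] += coeffs[k] * m          # harvest c_k * p_k(j)
            r[k + 1] += m * A[:, j] / d[j] # spread mass one step: P e_j
    return f
```

With a tiny ε this reproduces the dense truncated sum; with a larger ε it only ever touches the neighborhood of the seed.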

SLIDE 26

Thresholds

ERROR equals the weighted sum of entries left in the vectors r_k.

→ Set the thresholds so the "leftovers" sum to < ε.

Threshold for stage r_k:  ε / ( Σ_{j=k+1}^∞ c_j )

Then ‖D^{-1}(f − f̂)‖_∞ ≤ ε.

SLIDE 27

General diffusions: conclusion

THM: For diffusion coefficients c_k ≥ 0 satisfying Σ_{k=0}^∞ c_k = 1, and a truncation N with Σ_{k=N+1}^∞ c_k ≤ ε/2 (the "rate of decay"), our algorithm approximates the diffusion f on an undirected graph so that

    ‖D^{-1}(f − f̂)‖_∞ ≤ ε

in work bounded by O(2N²/ε). Constant for any inputs (if the diffusion decays fast)!

SLIDE 28

Proof sketch

  • 1. Stop pushing after N terms, where N satisfies Σ_{k=N+1}^∞ c_k ≤ ε/2.
  • 2. Push a residual entry in the first N terms only if r_k(j) ≥ d(j)·ε/(2N).
  • 3. Total work is the number of pushes; each push on node j_t costs d(j_t), so

        Σ_{k=0}^{N−1} Σ_{t=1}^{m_k} d(j_t) ≤ Σ_{k=0}^{N−1} Σ_{t=1}^{m_k} r_k(j_t)·(2N)/ε ≤ O(2N²/ε)

  • 4. since each r_k sums to ≤ 1 (each push is added to f, which sums to 1), giving Σ_{t=1}^{m_k} r_k(j_t) ≤ 1.

SLIDE 29

Strong localization in seeded PageRank

SLIDE 30

Strong localization in seeded PageRank

Given a seed and a graph, seeded PageRank is defined as the solution x to

    (I − αP) x = (1 − α) e_s

  • where α is the "teleportation parameter" in (0,1), e_s is the indicator vector of the seed, and P = A^T D^{-1}.

Strong localization: if we can approximate x with x̂ so that ‖x − x̂‖_1 ≤ ε and the approximation is sparse, x is strongly localized.

SLIDE 31

An example on a bigger graph

Crawl of flickr from 2006: ~800K nodes, 6M edges; seeded PageRank with α = 0.5.

(Figure, left: plot(x), x-axis is the node index, y-axis is the value at that index in the true PageRank vector. Figure, right: ‖x_true − x_nnz‖_1 error versus the number of nonzeros retained.)

SLIDE 32

Conditions for localization?

When is localization in diffusions possible? We've observed localization in real-world graphs. Does it always occur?

  • Are there graphs in which no localization occurs?

If localization occurs *everywhere*, then our result is less meaningful…

SLIDE 33

Strong localization can be impossible

Consider a star graph.

Values in the PageRank vector seeded on the center node: the center gets 1/(1 + α) and each leaf gets α/((1 + α)(n − 1)). Essentially everything needs to be nonzero to get a global error bound.

SLIDE 34

Strong localization can be impossible

Consider a star graph. How many entries in x can we round to zero before the error is too large? Requiring

    ‖x − x*‖_1 ≤ ε

forces

    1 + n(1 − ε(1 + α)/α) ≤ nnz(x*).

Values in the PageRank vector seeded on the center node: essentially everything needs to be nonzero to get a global error bound.
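The star-graph values can be verified numerically (dense solve; the value of n is illustrative):

```python
import numpy as np

n, alpha = 1000, 0.85
A = np.zeros((n, n))               # star graph: node 0 is the center
A[0, 1:] = 1.0
A[1:, 0] = 1.0
P = A / A.sum(axis=0)

e_c = np.zeros(n)
e_c[0] = 1.0
x = np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * e_c)

# Closed-form values from the slide: center 1/(1+a), leaves a/((1+a)(n-1)).
assert abs(x[0] - 1 / (1 + alpha)) < 1e-10
assert np.allclose(x[1:], alpha / ((1 + alpha) * (n - 1)))
```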

SLIDE 35

Strong localization can be impossible

THM: (Nassar, K., Gleich) Seeded PageRank is non-local on any complete bipartite graph (generalizing star graphs).

SLIDE 36

Strong localization can be impossible

THM: (Nassar, K., Gleich) Seeded PageRank is non-local on any complete bipartite graph (generalizing star graphs). Why?

Fact: a graph is complete bipartite iff the eigenvalues of P are {−1, 0, 1}. PageRank is really a matrix function, f(x) = (1 − αx)^{−1}. Fact: a matrix function is equivalent to its interpolating polynomial: p(λ_i) = f(λ_i) → p(P) = f(P).

  • Only 3 eigenvalues → p(x) is degree 2 (!)

    (I − αP)^{−1} e_j = f(P) e_j = (c_0 + c_1 P + c_2 P²) e_j
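The degree-2 claim can be checked numerically on K_{3,4} (an illustrative verification; the interpolation nodes are the three eigenvalues {−1, 0, 1}):

```python
import numpy as np

n1, n2, alpha = 3, 4, 0.85
n = n1 + n2
A = np.zeros((n, n))                 # complete bipartite graph K_{3,4}
A[:n1, n1:] = 1.0
A[n1:, :n1] = 1.0
P = A / A.sum(axis=0)                # eigenvalues of P are {-1, 0, 1}

# Interpolate f(x) = 1/(1 - alpha x) at the three eigenvalues.
lam = np.array([-1.0, 0.0, 1.0])
V = np.vander(lam, 3, increasing=True)         # columns: 1, lam, lam^2
c = np.linalg.solve(V, 1 / (1 - alpha * lam))  # c_0, c_1, c_2

F = np.linalg.inv(np.eye(n) - alpha * P)       # the PageRank matrix function
F_poly = c[0] * np.eye(n) + c[1] * P + c[2] * (P @ P)
assert np.allclose(F, F_poly)   # a degree-2 polynomial reproduces (I - aP)^{-1}
```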

SLIDE 37

When is localization possible?

Graphs exist where seeded PageRank has no localized behavior (complete bipartite),

  • and graphs exist with localized behavior everywhere (max degree ≤ a constant, or ≤ log log n).
  • So what properties determine localization in seeded PageRank?

SLIDE 38

Skewed degree sequences

Graphs where the k-th largest degree satisfies d(k) ≤ max(d·k^{−p}, δ), where δ is the min degree, d is the max degree, and p is the decay exponent.

(Figure: log-log plot of degree vs. node rank for the YouTube graph: 1.1M nodes, 3M edges, p ≈ 0.71 [Yang and Leskovec, ICDM 2015].)

SLIDE 39

Strong localization in personalized PageRank vectors

Due to the maximum degree d, this does not say anything about traditional power-law graphs (e.g. the Pareto case).

Theorem (Nassar, K., Gleich): Let a graph have max degree d, min degree δ, n nodes, and decay exponent p. Then Gauss-Southwell computes x_ε with accuracy ‖x − x_ε‖_1 ≤ ε, and the number of non-zeros in x_ε is no greater than

    min{ n, (1/δ)·C_p·(1/ε)^{δ/(1−α)} }

where

    C_p = d(1 + log d)                            if p = 1
    C_p = d(1 + (1/(1−p))·(d^{(1/p)−1} − 1))      otherwise

SLIDE 40

Strong localization in personalized PageRank vectors (sketch)

We study the behavior of the Gauss-Southwell or "push" algorithm for computing PageRank:

  • residual = remaining rank/dye to assign
  • solution = assigned rank/dye

Algorithm:

1. pick the node with the most residual dye,
2. assign its dye to the solution,
3. update the residual dye on its neighbors,
4. then repeat.
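The four steps can be sketched directly (dense arrays; picking the max-residual entry is the Gauss-Southwell rule, and stopping at ‖r‖_1 ≤ ε is an illustrative choice that gives 1-norm error at most ε/(1 − α)):

```python
import numpy as np

def gauss_southwell_pagerank(A, seed, alpha, eps):
    """Push/Gauss-Southwell sketch for (I - alpha P) x = (1 - alpha) e_seed."""
    n = A.shape[0]
    d = A.sum(axis=0)
    x = np.zeros(n)
    r = np.zeros(n)
    r[seed] = 1.0 - alpha          # all dye starts on the seed
    while r.sum() > eps:           # ||r||_1 <= eps => error <= eps/(1-alpha)
        j = int(np.argmax(r))      # 1. pick the node with the most residual dye
        rj = r[j]
        r[j] = 0.0
        x[j] += rj                 # 2. assign its dye to the solution
        r += alpha * rj * A[:, j] / d[j]   # 3. update residuals on the neighbors
    return x                       # 4. repeat until the residual is small
```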

SLIDE 41

Coordinate relaxation for PageRank

Approximating (I − αP) x = s̃. Initial solution and residual:

    x^(0) = 0,  r^(0) = (1 − α)s

Iterative updates: first pick an entry j of the residual, then

  • update the solution: x^(k+1) = x^(k) + r_j · e_j
  • update the residual: r^(k+1) = s̃ − (I − αP) x^(k+1) = r^(k) − r_j e_j + r_j αP e_j
  • KEY: the non-zeros in the solution are bounded by the number of iterations, k+1.

SLIDE 42

PageRank Convergence: error & residual

Approximating a solution to (I − αP) x = s̃, the residual and error satisfy

    r^(k) = s̃ − (I − αP) x^(k) = (I − αP) x − (I − αP) x^(k)
    (I − αP)^{−1} r^(k) = x − x^(k)
    ‖x − x^(k)‖ ≤ ‖(I − αP)^{−1}‖ · ‖r^(k)‖

  • for any sub-multiplicative matrix norm ‖·‖.

SLIDE 43

PageRank Convergence: residual bound

Approximating a solution to (I − αP) x = s̃; the error satisfies ‖x − x^(k)‖ ≤ ‖(I − αP)^{−1}‖ · ‖r^(k)‖.

Initial solution and residual: x^(0) = 0, r^(0) = (1 − α)s. Updating the residual via r^(k+1) = r^(k) − r_j e_j + r_j αP e_j gives

    ‖r^(k+1)‖_1 ≤ ‖r^(k) − r_j e_j‖_1 + ‖r_j αP e_j‖_1    (triangle inequality)
              ≤ ‖r^(k)‖_1 − r_j + |r_j α| ‖P e_j‖_1       (residual nonnegative)
              ≤ ‖r^(k)‖_1 − r_j + |r_j α|                 (P is column-stochastic)
              ≤ ‖r^(k)‖_1 − r_j (1 − α)                   (residual nonnegative)

SLIDE 44

PageRank Convergence: residual bound

Approximating a solution to (I − αP) x = s̃; the error satisfies ‖x − x^(k)‖ ≤ ‖(I − αP)^{−1}‖ · ‖r^(k)‖.

Initial solution and residual: x^(0) = 0, r^(0) = (1 − α)s. Residual norm: ‖r^(k+1)‖_1 ≤ ‖r^(k)‖_1 − r_j(1 − α).

Assume we choose r_j to be at least as big as the average magnitude of the residual entries. Then

    r_j ≥ ‖r^(k)‖_1 / nnz(r^(k))    (definition of average)

Bounding nnz(r^(k)): use the skewed degree sequence!

SLIDE 45

PageRank Convergence: the weeds, briefly

Degree sequence assumption: d(t) ≤ d · t^{−p}

  • enables us to prove: nnz(r^(k)) ≤ C_p + δk
  • [ .... skipping the thickest weeds .... ]
  • which enables a bound on residual decay:

    ‖r^(k+1)‖_1 ≤ (1 − α) · ((δ(k + 1) + C_p)/C_p)^{−(1−α)/δ}

(recall that C_p ≈ d log d and δ is the min degree)
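The skipped step can be reconstructed as follows (a sketch under the slide's assumptions, combining the average-entry choice of r_j with the bound nnz(r^{(k)}) ≤ C_p + δk and an integral comparison):

```latex
\|r^{(k+1)}\|_1
  \le \|r^{(k)}\|_1 - r_j(1-\alpha)
  \le \|r^{(k)}\|_1\Bigl(1 - \tfrac{1-\alpha}{\operatorname{nnz}(r^{(k)})}\Bigr)
  \le \|r^{(k)}\|_1\Bigl(1 - \tfrac{1-\alpha}{C_p+\delta k}\Bigr)
```

and iterating from ‖r^{(0)}‖_1 = 1 − α,

```latex
\|r^{(k+1)}\|_1
  \le (1-\alpha)\prod_{t=0}^{k}\Bigl(1-\tfrac{1-\alpha}{C_p+\delta t}\Bigr)
  \le (1-\alpha)\exp\Bigl(-(1-\alpha)\sum_{t=0}^{k}\tfrac{1}{C_p+\delta t}\Bigr)
  \le (1-\alpha)\Bigl(\tfrac{\delta(k+1)+C_p}{C_p}\Bigr)^{-(1-\alpha)/\delta}
```

using 1 − x ≤ e^{−x} and Σ_{t=0}^{k} 1/(C_p + δt) ≥ (1/δ)·ln((C_p + δ(k+1))/C_p).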

SLIDE 46

Strong localization in personalized PageRank vectors (repeated)

Theorem (Nassar, K., Gleich): Let a graph have max degree d, min degree δ, n nodes, and decay exponent p. Then Gauss-Southwell computes x_ε with accuracy ‖x − x_ε‖_1 ≤ ε, and the number of non-zeros in x_ε is no greater than

    min{ n, (1/δ)·C_p·(1/ε)^{δ/(1−α)} }

where

    C_p = d(1 + log d)                            if p = 1
    C_p = d(1 + (1/(1−p))·(d^{(1/p)−1} − 1))      otherwise

Only C_p depends on n; the rest are constants!

SLIDE 47

State of the art, 2016/4/22

             Weak localization     Strong localization
PR                                 [Nassar, K., Gleich, 2015]
HK           [K. & Gleich 2014]    [Gleich & K., 2014]
Gen Diff     In preparation!       ?

SLIDE 48

(Figure: Netscience, PageRank solution paths; x-axis 1/ε, y-axis degree-normalized PageRank value.)

    f = Σ_{k=0}^∞ c_k P^k ŝ = f(P) ŝ

Diffusion paths (K. & Gleich); AptRank: adaptive diffusions (Jiang, K., Gleich, Gribskov)

SLIDE 49

Thank you

  • David F. Gleich (Purdue CS professor): advisor and collaborator on all projects
  • Huda Nassar (Purdue CS grad student): collaborator on PageRank localization and de-localization
  • Olivia Simpson (UCSD CS grad student): collaborator on generalized diffusion work
  • The whole research group, for many proofreads of slides and papers alike: Nichole Eikmeier, Yangyang Hou, Huda Nassar, Bryan Rainey, Yanfei Ren, Ayan Sinha, Varun Vasudevan, Nate Veldt, Tau Wu