
slide-1
SLIDE 1

Efficient Densest Subgraph Computation in Evolving Graphs

Joint work with Silvio Lattanzi (Google Research, NY) and Mauro Sozio (Télécom ParisTech)

Alessandro Epasto

slide-2
SLIDE 2

Social Networks are Constantly Evolving

Brutus Julius

slide-3
SLIDE 3

Brutus Julius Cleopatra

Social Networks are Constantly Evolving

slide-4
SLIDE 4

Brutus Julius Cleopatra

Social Networks are Constantly Evolving

slide-5
SLIDE 5

Brutus Julius Cleopatra

Social Networks are Constantly Evolving

slide-6
SLIDE 6

Brutus Cleopatra

Social Networks are Constantly Evolving

slide-7
SLIDE 7

Brutus Cleopatra Mark Anthony

Social Networks are Constantly Evolving

slide-8
SLIDE 8

Events in Social Media Streams

  • WWW2015 conference will be held in Florence.
  • Hofmann confirmed keynote at WWW2015 in Florence
  • WWW2015 opens May 20 in Florence

Dense subgraphs represent events!

slide-9
SLIDE 9

Event Detection

slide-10
SLIDE 10

Dynamic Community Detection Algorithms

Most algorithms assume a single static graph as input.

Naive solution: run the algorithm once for each update.

GOAL: efficiently keep track of the communities as the graph evolves.

slide-11
SLIDE 11

Densest Subgraph

H

Density H = 3/4
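
A minimal sketch, assuming the usual definition of density as the number of edges inside H divided by the number of nodes of H. The slide's figure is not preserved, so the 4-node path below is a hypothetical example that happens to give 3/4; the function name is illustrative.

```python
def density(edges, nodes):
    """Density of the subgraph induced by `nodes`: induced edges / nodes."""
    nodes = set(nodes)
    if not nodes:
        return 0.0
    # Count each undirected edge once; ignore self-loops.
    induced = {
        frozenset((u, v))
        for u, v in edges
        if u != v and u in nodes and v in nodes
    }
    return len(induced) / len(nodes)


# Hypothetical example: a path on 4 nodes has 3 edges.
path = [("a", "b"), ("b", "c"), ("c", "d")]
print(density(path, ["a", "b", "c", "d"]))  # 0.75
```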

slide-12
SLIDE 12

Densest Subgraph

H

slide-13
SLIDE 13

Densest Subgraph in Static Graphs

  • Community used in Social Networks, Web and Biology.
  • Polynomial exact algorithm (Goldberg, 1984)
  • (2+eps)-approximation MapReduce algorithm (Bahmani et al., 2012).

slide-14
SLIDE 14

Densest Subgraph in Dynamic Graphs

No results known* in dynamic graphs with sublinear update time (before our publication).

Naive approach: O(m + n) time per update!

* Bhattacharya et al. - to appear in STOC 2015. Strong guarantees in streaming model.

slide-15
SLIDE 15

Our Problem

Goal: Maintain a 2+eps approximation with average time O(poly-log(n+m)) per update.

Notice: Much better than O(n+m) per update, and this includes output time!

slide-16
SLIDE 16

Our Dynamic Graph Model

Start from an empty graph. An arbitrarily long sequence of edge updates arrives… This also implicitly models node additions and removals.

(A, B) (B, C) (A, B)

… …

slide-17
SLIDE 17

Incremental and Fully-Dynamic

INCREMENTAL: arbitrary stream of edge additions only.

(A, B) (B, C)

slide-18
SLIDE 18

Incremental and Fully-Dynamic

FULLY-DYNAMIC: stream of arbitrary edge additions and random edge deletions.

(A, B) (B, C) (A, B)

slide-19
SLIDE 19

Our Goal

Design a Data Structure:

1) AddEdge(u,v)
2) RemoveEdge(u,v)

Both operations can output a new densest subgraph S or nothing.

Invariant: the last subgraph in output is a 2+eps approx. for the current graph
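
A minimal sketch of this interface, using the naive approach of recomputing after every update. The names `NaiveDensest`, `add_edge`, `remove_edge` and `densest_bruteforce` are illustrative and not from the talk. The brute-force solver is exact but exponential, so it only suits tiny graphs; it is a stand-in for any static solver, not the authors' algorithm.

```python
from itertools import combinations


def densest_bruteforce(edges):
    """Exact densest subgraph by enumerating node subsets (tiny graphs only)."""
    nodes = sorted({x for e in edges for x in e})
    best, best_density = frozenset(), 0.0
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            s = set(subset)
            d = sum(1 for e in edges if set(e) <= s) / k
            if d > best_density:
                best, best_density = frozenset(s), d
    return best


class NaiveDensest:
    """Recomputes from scratch on every update (the naive baseline)."""

    def __init__(self):
        self.edges = set()          # each edge stored as a frozenset {u, v}
        self.last = frozenset()     # last subgraph that was output

    def _refresh(self):
        new = densest_bruteforce(self.edges)
        if new == self.last:
            return None             # nothing new to output
        self.last = new
        return new

    def add_edge(self, u, v):
        if u == v:
            return None             # self-loops are ignored in this sketch
        self.edges.add(frozenset((u, v)))
        return self._refresh()

    def remove_edge(self, u, v):
        self.edges.discard(frozenset((u, v)))
        return self._refresh()


ds = NaiveDensest()
r1 = ds.add_edge(1, 2)      # outputs {1, 2}
r2 = ds.add_edge(2, 3)      # outputs {1, 2, 3}
r3 = ds.add_edge(1, 3)      # same node set as before, so nothing is output
r4 = ds.remove_edge(1, 3)   # still {1, 2, 3}, nothing is output
r5 = ds.remove_edge(2, 3)   # outputs {1, 2}
```

Here the invariant holds trivially because the solver is exact; the point of the talk is to keep the same contract without paying for a full recomputation on each update.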

slide-20
SLIDE 20

Result for edge additions (incremental)

Theorem: We maintain a 2+eps approx. in O(log^2(n) / eps^2) average time and linear space

Significant improvement over naive approach: O(m+n) average time

slide-21
SLIDE 21

Result for edge additions and deletions (fully dynamic)

Theorem: We maintain a 2+eps approx. in O(log^4(n) / eps^4) average time and linear space.

Very fast also in practice!

slide-22
SLIDE 22

Roadmap

  • Review Bahmani et al. for static graphs.
  • A new static graph algorithm.
  • Incremental algorithm.
  • Randomized fully-dynamic algorithm.
slide-23
SLIDE 23

Static Case - Bahmani et al. Algorithm

Let eps > 0. Iteration 1:

1) Compute Avg. Deg = K

Graph G0

slide-24
SLIDE 24

Static Case - Bahmani et al. Algorithm

Let eps > 0. Iteration 1:

1) Compute Avg. Deg = K
2) Let T = K (1+eps)

Graph G0

T = 2.3

slide-25
SLIDE 25

Static Case - Bahmani et al. Algorithm

Let eps > 0. Iteration 1:

1) Compute Avg. Deg = K
2) Let T = K (1+eps)
3) Remove nodes with degree < T

Graph G0

T = 2.3

slide-26
SLIDE 26

Static Case - Bahmani et al. Algorithm

Let eps > 0. Iteration 2:

1) Compute Avg. Deg = K

Graph G0 Graph G1

T = 2.3

slide-27
SLIDE 27

Static Case - Bahmani et al. Algorithm

Let eps > 0. Iteration 2:

1) Compute Avg. Deg = K
2) Let T = K (1+eps)

Graph G0 Graph G1

T = 2.3 T = 3.2

slide-28
SLIDE 28

Static Case - Bahmani et al. Algorithm

Let eps > 0. Iteration 2:

1) Compute Avg. Deg = K
2) Let T = K (1+eps)
3) Remove nodes with degree < T

Graph G0 Graph G1

T = 2.3 T = 3.2

slide-29
SLIDE 29

Static Case - Bahmani et al. Algorithm

Iterate until all nodes are removed. Output the densest subgraph Gi.

T = 2.3

G2

T = 3.2

Graph G0 Graph G1

slide-30
SLIDE 30

Static Case - Bahmani et al. Algorithm

Iterate until all nodes are removed. Output the densest subgraph Gi.

Graph G0 Graph G1

T = 2.3

G2

Theorem: (Bahmani et al.) 2+eps approx. in log(n) steps.

T = 3.2
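
The iterations above can be sketched as follows. This is an in-memory illustration, not the MapReduce implementation of Bahmani et al.; the function name and the example graph are assumptions. It follows the slides in using T = K (1+eps), with K the average degree, and in removing nodes with degree < T. It also stops early when no edges remain, a guard added here so that the loop always terminates.

```python
from collections import defaultdict


def bahmani_peel(edges, eps=0.1):
    """Threshold peeling; returns (node set, density) of the densest Gi seen."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    alive = set(adj)                      # nodes of the current graph Gi
    best_nodes, best_density = set(), 0.0
    while alive:
        m = sum(len(adj[u] & alive) for u in alive) // 2
        if m == 0:
            break                         # only isolated nodes remain (added guard)
        dens = m / len(alive)
        if dens > best_density:
            best_nodes, best_density = set(alive), dens
        avg_deg = 2 * m / len(alive)      # K on the slides
        threshold = (1 + eps) * avg_deg   # T = K (1+eps)
        alive -= {u for u in alive if len(adj[u] & alive) < threshold}
    return best_nodes, best_density


# Hypothetical example: a 4-clique with a short tail attached to node 4.
example = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
nodes, dens = bahmani_peel(example, eps=0.1)
print(sorted(nodes), dens)  # [1, 2, 3, 4] 1.5
```

On this example the first iteration has K = 16/6 and T of about 2.93, which removes nodes 5 and 6; the second iteration sees the 4-clique with density 6/4 = 1.5 and then removes everything.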

slide-31
SLIDE 31

Towards a Dynamic Algorithm

  • Idea: Store graphs Gi’s.
  • When an edge is added update the Gi’s

u v T = 2.3 T = 3.2

This ensures a 2+eps approximation!

Graph G0 Graph G1

slide-32
SLIDE 32

Towards a Dynamic Algorithm

u v T = 2.3 T = 3.2

Deg > 2.3

Graph G0 Graph G1

  • Idea: Store graphs Gi’s.
  • When an edge is added update the Gi’s
slide-33
SLIDE 33

Towards a Dynamic Algorithm

Graph G0 Graph G1

T = 2.6 T = 4.0

Chain effect!

  • Idea: Store graphs Gi’s.
  • When an edge is added update the Gi’s
slide-34
SLIDE 34

Idea: fix Threshold T for all iterations

  • Use same threshold T at each iteration.
  • Easier to analyze and maintain.

For the correct threshold T: same approximation as Bahmani et al.’s algorithm.

You’d better use T = 3.1
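
A sketch of peeling with one fixed threshold T, under the assumption that each level keeps the nodes of the previous level whose degree inside that level is at least T. The names and the step budget `max_steps` are illustrative. Two facts make a fixed threshold informative: if the levels become empty, the maximum density is below T; and if a non-empty level stops shrinking, every node in it has degree at least T, so its density is at least T/2.

```python
from collections import defaultdict


def peel_fixed(edges, T, max_steps):
    """Return (levels, emptied): the nested node sets G0, G1, ... and whether
    the last level is empty after at most `max_steps` peeling steps."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    levels = [set(adj)]
    for _ in range(max_steps):
        cur = levels[-1]
        if not cur:
            break
        # Keep only nodes whose degree inside the current level is >= T.
        levels.append({u for u in cur if len(adj[u] & cur) >= T})
    return levels, not levels[-1]


# Hypothetical example: a 4-clique with a short tail attached to node 4.
example = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]

levels_low, emptied_low = peel_fixed(example, T=3, max_steps=4)
print(emptied_low, sorted(levels_low[-1]))    # False [1, 2, 3, 4]

levels_high, emptied_high = peel_fixed(example, T=4, max_steps=4)
print(emptied_high, len(levels_high))         # True 3
```

With T = 3 the peeling stalls on the 4-clique, whose density 1.5 equals T/2; with T = 4 everything is removed in two steps, which certifies that no subgraph has density 4 or more.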

slide-35
SLIDE 35

Moving Threshold (Only Additions)

1) Set T = 1 to compute densest subgraph H and output it.

This provides a 2+eps approx. in O(poly-log(n)) average time

slide-36
SLIDE 36

Moving Threshold (Only Additions)

1) Set T = 1 to compute densest subgraph H and output it.

2) Maintain the Gi’s using threshold T as long as all nodes are removed in O(log(n)) steps.

This provides a 2+eps approx. in O(poly-log(n)) average time

slide-37
SLIDE 37

Moving Threshold (Only Additions)

1) Set T = 1 to compute densest subgraph H and output it.

2) Maintain the Gi’s using threshold T as long as all nodes are removed in O(log(n)) steps.

3) Repeat from 1) with higher threshold T = T * 2

This provides a 2+eps approx. in O(poly-log(n)) average time

slide-38
SLIDE 38

Fully-Dynamic Case

The analysis is significantly harder:

  • The density can increase/decrease in complex patterns…
  • …but the densest subgraph is stable under random removals.
  • We exploit this stability to recompute the subgraph only a few times.

slide-39
SLIDE 39

Experimental Evaluation - Datasets

  • DBLP & Patent: co-authorship graphs.
  • LastFM: songs co-listened.
  • Yahoo! Answers: >1 billion edges. An edge connects two users who answered the same question.

slide-40
SLIDE 40

Evolution Densest Subgraph

Figure: DBLP, sliding window of 5 years. Density and size of the densest subgraph over time (1970 to 2010).

slide-41
SLIDE 41

Evolution Densest Subgraph

Figure: Patent Citations, sliding window of 5 years. Density and size of the densest subgraph over time (1975 to 1995).

slide-42
SLIDE 42

Evolution Densest Subgraph

Figure: Yahoo Answers, sliding window of 100M edges. Density and size of the densest subgraph over time (time axis ticks from 5e+08 to 2.5e+09).

Efficient in Highly Dynamic Datasets with Billions of Updates.

slide-43
SLIDE 43

Update Time vs Epsilon

Figure: Avg. Time per Update vs Epsilon. Average time per update in microseconds on dblp, patent-coaut, patent-cit, lastfm, and yahoo, for epsilon = 0.5, 0.3, 0.1, 0.05.

Scales much better with Epsilon than worst case.

slide-44
SLIDE 44

Comparison with Static Algorithm

Figure: Avg. Time per Update vs K. Average time per update in microseconds (log scale) on dblp, patent-coaut, patent-cit, and lastfm. Series: Our Algorithm, K=100000, K=10000, K=1000.

slide-45
SLIDE 45

Comparison With Static Algorithm

Figure: Max Relative Error Static Algorithm vs K. Relative error (log scale) on dblp, patent-cit, patent-coaut, and lastfm, for K = 100000, 10000, 1000.

slide-46
SLIDE 46

Conclusions and Future Work

  • It is possible to maintain the densest subgraph efficiently in dynamic graphs.
  • Future work: can recent techniques (Bhattacharya et al.) yield a 2+eps approximation under adversarial removals?
  • Top-k Densest Subgraph in Dynamic Graphs.
slide-47
SLIDE 47

Thank you for your attention

slide-48
SLIDE 48

Recent Results - STOC

Concurrently with our work, Bhattacharya et al. (STOC 2015) introduced a novel streaming algorithm for densest subgraph with strong guarantees.

  • Different model: Update vs Query time.
  • Strong space constraints (cannot store entire graph).
  • Adversarial additions and deletions.
  • 4+eps approx with O(n poly log) space, O(poly log) update time, O(n) query time.
  • 2+eps approx with O(n poly log) space, higher time complexity.

slide-49
SLIDE 49

Incremental Case: Only Additions

slide-50
SLIDE 50

Density vs Epsilon

Figure: Maximum Density vs Epsilon. Maximum density on dblp, patent-coaut, patent-cit, lastfm, and yahoo, for epsilon = 0.5, 0.3, 0.1, 0.05. LastFm and Yahoo are plotted on a separate density axis.

Max density is stable with different epsilons.

slide-51
SLIDE 51

Analysis of the Algorithm

We divide the edge additions in Rounds.

Figure: timeline of edge additions split into Round 1, Round 2, …, Round i. Within a round, edges are added (Add, Add, …) while the current output H is kept. On overflow, T <- T(1+eps), the static algorithm is run, and a new H is output, which starts the next round.

slide-52
SLIDE 52

Densest Subgraph - LP Primal
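
The formula on this slide did not survive extraction. For reference, the standard LP relaxation of the densest subgraph problem (due to Charikar) is given below; that the slide showed exactly this formulation is an assumption. There is one variable x_uv per edge and one variable y_v per node, and the optimal value equals the maximum density.

```latex
\begin{aligned}
\text{maximize}   \quad & \sum_{(u,v) \in E} x_{uv} \\
\text{subject to} \quad & x_{uv} \le y_u, \quad x_{uv} \le y_v && \forall (u,v) \in E \\
                        & \sum_{v \in V} y_v \le 1 \\
                        & x_{uv} \ge 0, \quad y_v \ge 0
\end{aligned}
```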

slide-53
SLIDE 53

Definitions

We say that an algorithm is an a-approximation of the densest subgraph problem, for a > 1, if it outputs a graph with density at least OPT / a.

We say that an operation has T amortized time if, for any sequence of k update operations, the total time is O(k T).

slide-54
SLIDE 54

Densest Subgraph - LP Primal Dual

  • The dual problem is the well-known graph orientation problem.
  • Given an undirected graph G, find a directed graph H, obtained by orienting the edges of G arbitrarily, that minimizes the maximum in-degree.
  • If G has an orientation of max in-degree < D then the density of the densest subgraph is < D.
  • Hence, if it is possible to remove all nodes by recursively removing nodes with degree < D then max density is < D.
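
The last bullet can be made concrete with a small sketch: whenever a node is removed, orient its remaining edges toward it. Its in-degree is then its degree at removal time, which is below D. The function names and the example graph are assumptions for illustration.

```python
from collections import defaultdict


def peel_orientation(edges, D):
    """Recursively remove nodes with degree < D, orienting each edge toward
    the endpoint that is removed first. Returns a list of (tail, head) pairs,
    or None if peeling gets stuck before all nodes are removed."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    alive = set(adj)
    oriented = []
    while alive:
        u = next((x for x in alive if len(adj[x] & alive) < D), None)
        if u is None:
            return None                  # every remaining node has degree >= D
        alive.remove(u)
        for w in adj[u] & alive:
            oriented.append((w, u))      # edge w -> u points into the removed node
    return oriented


def max_in_degree(oriented):
    indeg = defaultdict(int)
    for _, head in oriented:
        indeg[head] += 1
    return max(indeg.values(), default=0)


# Hypothetical example: a 4-clique with a short tail attached to node 4.
example = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]

orientation = peel_orientation(example, D=4)
print(len(orientation), max_in_degree(orientation) < 4)   # 8 True
print(peel_orientation(example, D=3))                     # None
```

With D = 4 all nodes can be peeled, so the maximum density (1.5, attained by the 4-clique) is below 4. With D = 3 peeling gets stuck on the 4-clique, so this test gives no bound.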

slide-55
SLIDE 55

Fully Dynamic Algorithm

We divide the edge additions and deletions in Rounds.

Figure: timeline of Round i (Add, Rem, Add, …) with current output H. The round ends when the invariant fails; the static algorithm is then run.

slide-56
SLIDE 56

Fully Dynamic Algorithm

We divide the edge additions and deletions in Rounds.

Figure: timeline of Round i (Add, Rem, Add, …) with current output H. The round ends when the invariant fails; the static algorithm is then run.

Bad Round: < O(m / log(n)) removals

Good Round: > O(m / log(n)) removals

slide-57
SLIDE 57

Fully Dynamic Algorithm

Round 1 (Good): Rem, Rem, Add, …

Round 2 (Bad): Add, Rem, Add, …

Round 3 (Bad): Add, Add, Add, …

Idea: in good rounds, removals “pay” for all the operations.

We can show that there are never more than poly-log consecutive bad rounds (w.h.p.).