Efficient Diameter Approximation for Large Graphs in MapReduce



slide-1
SLIDE 1

Efficient Diameter Approximation for Large Graphs in MapReduce

Geppino Pucci - Università di Padova, Italy

Based on joint works ([SPAA15], [IPDPS16]) with: Matteo Ceccarello, Andrea Pietracaprina (U. Padova), Eli Upfal (Brown U.)

slide-2
SLIDE 2

Outline

  • 1. Context
  • 2. Computational model
  • 3. Previous work
  • 4. Diameter approximation algorithm
  • 5. Experiments
  • 6. Conclusions
slide-3
SLIDE 3

Context

Scenario

◮ Large graph analytics: major discovery tool for diverse application domains (e.g., social/road/biological network analysis, cybersecurity, NLP, cognitive computing)

◮ (Commodity) computer clusters: cheap, widespread platforms with relatively high communication/synchronization costs

Focus

◮ Approximation of graph diameter

  ◮ very large, undirected, weighted (sparse) graphs
  ◮ linear space, few parallel rounds, practical efficiency

slide-4
SLIDE 4

Computational Model

MR model [PPRSU12]

◮ Abstraction of popular programming frameworks (MapReduce/Hadoop, Spark)

◮ Builds upon and simplifies [Karloff+’10][Goodrich’11]

◮ Underlying platform: unspecified number of interconnected commodity machines

◮ Algorithm: sequence of rounds

◮ 2 parameters: max local space $M_L$, max aggregate space $M_A$

MR($M_L$, $M_A$) round

Transforms a multiset X of key-value pairs into a new multiset Y of key-value pairs by applying a given reduce function to each group of input pairs that share the same key.
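The round described above can be sketched as a small sequential simulation. This is only an illustration, not the model's formal definition: the names `mr_round` and `degree_reducer` are hypothetical, and `max_local` is a simplified stand-in for the local space bound $M_L$ (it just caps how many values one reducer may receive).

```python
from collections import defaultdict


def mr_round(pairs, reduce_fn, max_local=None):
    """Simulate one MR round on a list of (key, value) pairs.

    reduce_fn(key, values) must return an iterable of output (key, value) pairs.
    max_local is a hypothetical stand-in for M_L: it caps the number of values
    that a single reducer may receive.
    """
    groups = defaultdict(list)
    for key, value in pairs:  # shuffle step: group the values by key
        groups[key].append(value)
    output = []
    for key, values in groups.items():
        if max_local is not None and len(values) > max_local:
            raise MemoryError(f"reducer for key {key!r} exceeds local space")
        output.extend(reduce_fn(key, values))
    return output


def degree_reducer(node, neighbours):
    """Example reducer: emit the degree of a node."""
    return [(node, len(neighbours))]


# Each undirected edge is emitted in both directions, keyed by endpoint.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
pairs = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
degrees = dict(mr_round(pairs, degree_reducer))
print(degrees)
```

An algorithm in this model is then a sequence of such calls, each consuming the output multiset of the previous one.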

slide-5
SLIDE 5

Previous work

Sequential setting

◮ APSP (Johnson’s alg.): $O(n \cdot m + n^2 \log n)$ time

◮ Roditty et al. (STOC’13, SODA’14): 3/2-approximation in $O(\min\{m^{3/2}, m n^{2/3}\}) = o(n \cdot m)$ time.

◮ Empirically: very few SSSPs guarantee accurate estimates ([MLH09, CGHLM13, C+12, C+13, C+15]).

Parallel setting

◮ Exact diameter through matrix-multiplication: $O(\log n)$ rounds but $\Omega(n^2)$ space.

◮ Cohen (JACM’00): $(1 + \epsilon)$-approximation in $O(\mathrm{poly}(\log n))$ time and superlinear space. Not easy to implement.

slide-6
SLIDE 6

Previous work (cont’d)

2-Approximation achievable through SSSP PRAM algorithms

∆-stepping (Meyer and Sanders, JoA’03)

◮ Parallel time-work tradeoff by staggering edge relaxations ($d_j \leftarrow \min\{d_j, d_i + w_{ij}\}$)

◮ At iteration $i$, compute distances $\in [(i - 1)\Delta, i\Delta]$.

◮ Small ∆’s: ≃ Dijkstra. Large ∆’s: ≃ Bellman-Ford

◮ Round complexity $= \Omega(\ell_{\Phi(G)})$, where $\ell_{\Phi(G)}$ edges are required to connect any two nodes at distance $\Phi(G)$.

Our aim

Diameter approximation in linear space and $o(\ell_{\Phi(G)})$ rounds
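To make the ∆-stepping bullets concrete, here is a minimal sequential sketch of the bucket scheme. It is not the PRAM algorithm of Meyer and Sanders: the function name `delta_stepping`, the adjacency-list format and the phase counter are assumptions made for illustration. Light edges (weight ≤ ∆) are relaxed repeatedly within a bucket, heavy edges once per bucket.

```python
import math


def delta_stepping(adj, source, delta):
    """Single-source shortest paths with distance buckets of width delta.

    adj: {node: [(neighbour, weight), ...]} with non-negative weights.
    Returns (dist, phases); phases counts the batches of relaxations, a rough
    proxy for parallel rounds.
    """
    dist = {u: math.inf for u in adj}
    buckets = {}  # index i -> nodes with tentative distance in [i*delta, (i+1)*delta)

    def relax(v, d):
        if d < dist[v]:
            if dist[v] != math.inf:
                buckets[int(dist[v] // delta)].discard(v)
            dist[v] = d
            buckets.setdefault(int(d // delta), set()).add(v)

    relax(source, 0)
    phases = 0
    while True:
        nonempty = [i for i, nodes in buckets.items() if nodes]
        if not nonempty:
            break
        i = min(nonempty)
        settled = set()
        while buckets[i]:  # light phases: bucket i may be refilled
            frontier, buckets[i] = buckets[i], set()
            settled |= frontier
            requests = [(v, dist[u] + w)
                        for u in frontier for v, w in adj[u] if w <= delta]
            for v, d in requests:
                relax(v, d)
            phases += 1
        # Heavy edges leave bucket i, so one pass over settled nodes suffices.
        requests = [(v, dist[u] + w)
                    for u in settled for v, w in adj[u] if w > delta]
        for v, d in requests:
            relax(v, d)
        phases += 1
    return dist, phases


adj = {
    "a": [("b", 1), ("c", 4)],
    "b": [("a", 1), ("c", 2), ("d", 6)],
    "c": [("a", 4), ("b", 2), ("d", 3)],
    "d": [("b", 6), ("c", 3)],
}
dist, phases = delta_stepping(adj, "a", delta=2)
ecc = max(dist.values())  # eccentricity of the source
print(f"{ecc} <= diameter <= {2 * ecc}")
```

The last two lines illustrate the 2-approximation: on a connected undirected graph, the eccentricity of any node lies between half the diameter and the diameter.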

slide-7
SLIDE 7

Diameter approximation: high-level strategy

Based on shallow-depth clustering:

  • 1. Compute a decomposition C of G into clusters of small radius
  • 2. Estimate diameter $\Phi(G)$ from diameter $\Phi(G_C)$, with $G_C$ a suitable quotient graph derived from C

Remarks

◮ Previous decompositions ([MPX13, Mey08]) do not guarantee small (unweighted+weighted) radius

◮ Cluster granularity chosen so that $G_C$ fits into local memory

◮ Small radius → low round complexity, better approximation
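The two-step strategy can be sketched sequentially as follows. Several choices here are assumptions for illustration and are not taken from the slides: clusters are formed by assigning each node to its nearest center (a stand-in for the actual decomposition), a quotient edge between two clusters weighs the shortest center-to-center route through a single crossing edge, and the estimate is formed as $\Phi(G_C)$ plus twice the maximum cluster radius. All function names are hypothetical.

```python
import heapq
import math


def voronoi_clusters(adj, centers):
    """Assign every node to its nearest center (multi-source Dijkstra)."""
    dist = {u: math.inf for u in adj}
    owner = {}
    heap = []
    for c in centers:
        dist[c], owner[c] = 0, c
        heap.append((0, c, c))
    heapq.heapify(heap)
    while heap:
        d, u, c = heapq.heappop(heap)
        if d > dist[u] or owner[u] != c:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v], owner[v] = d + w, c
                heapq.heappush(heap, (d + w, v, c))
    return dist, owner


def quotient_graph(adj, dist, owner):
    """Edge (c1, c2) weighs the cheapest center-to-center route via one crossing edge."""
    q = {}
    for u in adj:
        for v, w in adj[u]:
            key = (owner[u], owner[v])
            if key[0] != key[1]:
                q[key] = min(q.get(key, math.inf), dist[u] + w + dist[v])
    return q


def graph_diameter(nodes, weights):
    """Floyd-Warshall diameter; intended for graphs small enough for one machine."""
    d = {(a, b): 0 if a == b else weights.get((a, b), math.inf)
         for a in nodes for b in nodes}
    for k in nodes:
        for a in nodes:
            for b in nodes:
                if d[a, k] + d[k, b] < d[a, b]:
                    d[a, b] = d[a, k] + d[k, b]
    return max(d.values())


def diameter_upper_bound(adj, centers):
    dist, owner = voronoi_clusters(adj, centers)
    radius = max(dist.values())  # max cluster radius
    phi_q = graph_diameter(list(centers), quotient_graph(adj, dist, owner))
    return phi_q + 2 * radius


# Path a-b-c-d-e with weights 1, 1, 2, 1 and centers at both ends.
adj = {
    "a": [("b", 1)],
    "b": [("a", 1), ("c", 1)],
    "c": [("b", 1), ("d", 2)],
    "d": [("c", 2), ("e", 1)],
    "e": [("d", 1)],
}
estimate = diameter_upper_bound(adj, ["a", "e"])
print(estimate)
```

The estimate never undershoots on a connected graph: any two nodes can be joined by walking to their centers (at most one radius each) and following a quotient path, which corresponds to a real path in G.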

slide-8
SLIDE 8

Decomposition C: algorithm cluster(τ)

Challenges

Cluster centers are sampled at random. In order to attain small (unweighted+weighted) cluster radius we must

  • 1. Ensure higher sampling density in remote regions of the graph
  • 2. Avoid heavy edges for cluster growth

Key ingredients

  • 1. Progressive clustering strategy
  • 2. ∆-stepping approach to cluster growing
slide-9
SLIDE 9

Decomposition C: a pathological example

slide-10
SLIDE 10

Decomposition C: algorithm cluster(τ)

Progressive clustering [CPPU15]

  • 1. Select random batch of τ centers from uncovered nodes
  • 2. Grow both old and new clusters until covering half of the uncovered nodes
  • 3. Repeat steps 1-2 until complete coverage

∆-stepping-like cluster growth [CPPU16]

◮ ∆ ← guess on cluster’s minimum weighted radius

◮ In each iteration of progressive clustering (Steps 1-2):

  ◮ Use only light edges (weight < ∆) and stop at radius ∆
  ◮ If desired coverage cannot be obtained then ∆ ← 2∆
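The steps above can be sketched as a sequential simulation. This is a loose illustration under stated assumptions, not the MR algorithm itself: the function name `cluster`, the synchronous "proposal" rounds, the per-iteration distance budget `step`, and the `total_weight` guard (which stops doubling ∆ when the remaining nodes are unreachable) are all choices made here for the sake of a runnable example.

```python
import random


def cluster(adj, tau, delta, seed=0):
    """Sequential sketch of progressive clustering with light-edge growth.

    adj: {node: [(neighbour, weight), ...]} for an undirected graph.
    Returns (center, dist, delta): the center assigned to each node, the
    weight of the growth path from that center, and the final guess for delta.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    rng = random.Random(seed)
    total_weight = sum(w for u in adj for _, w in adj[u])
    center, dist = {}, {}
    uncovered = set(adj)
    while uncovered:
        # Step 1: sample a batch of at most tau centers among uncovered nodes.
        for c in rng.sample(sorted(uncovered), min(tau, len(uncovered))):
            center[c], dist[c] = c, 0
            uncovered.discard(c)
        # Step 2: grow old and new clusters until half of the uncovered
        # nodes are covered.
        target = len(uncovered) // 2
        step = {u: 0 for u in center}  # distance travelled in this iteration
        while len(uncovered) > target:
            proposals = {}  # uncovered node -> (step distance, parent, weight)
            for u, du in step.items():
                for v, w in adj[u]:
                    # Only light edges, and never beyond radius delta.
                    if v in uncovered and w < delta and du + w <= delta:
                        if v not in proposals or du + w < proposals[v][0]:
                            proposals[v] = (du + w, u, w)
            if proposals:
                for v, (d, u, w) in proposals.items():
                    center[v], dist[v], step[v] = center[u], dist[u] + w, d
                    uncovered.discard(v)
            elif delta > total_weight:
                break  # remaining nodes are unreachable from current clusters
            else:
                delta *= 2  # desired coverage not reachable: double the guess
    return center, dist, delta


adj = {
    "a": [("b", 1)],
    "b": [("a", 1), ("c", 5)],
    "c": [("b", 5), ("d", 1)],
    "d": [("c", 1), ("e", 2)],
    "e": [("d", 2)],
}
center, dist, final_delta = cluster(adj, tau=1, delta=1)
print(sorted(set(center.values())), final_delta)
```

Because each outer iteration halves the number of uncovered nodes on a connected graph, the number of batches is logarithmic in n, which is where the logarithmic factors in the cluster count come from.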

slide-11
SLIDE 11

Algorithm cluster(τ): example (τ = 1, ∆ = 4)

Graph G

[Figure: example graph G with 17 nodes (A, B, C, D, E, F, G, H, I, L, M, N, O, P, Q, R, S) and 20 edge-weight labels with values between 1 and 5.]

slide-12
SLIDE 12

Algorithm cluster(τ): example (τ = 1, ∆ = 4)

1st batch of τ centers

[Figure: graph G (17 nodes, edge weights between 1 and 5) with the 1st batch of centers.]

slide-13
SLIDE 13

Algorithm cluster(τ): example (τ = 1, ∆ = 4)

1st batch of τ centers

[Figure: graph G (17 nodes, edge weights between 1 and 5), next step for the 1st batch of centers.]

slide-14
SLIDE 14

Algorithm cluster(τ): example (τ = 1, ∆ = 4)

2nd batch of τ centers

[Figure: graph G (17 nodes, edge weights between 1 and 5) with the 2nd batch of centers; node M is set apart from the other labels.]

slide-15
SLIDE 15

Algorithm cluster(τ): example (τ = 1, ∆ = 4)

3rd batch of τ centers

[Figure: graph G (17 nodes, edge weights between 1 and 5) with the 3rd batch of centers; nodes M and R are set apart from the other labels.]

slide-16
SLIDE 16

Decomposition C: algorithm cluster(τ)

Theorem

W.h.p. cluster(τ) computes a decomposition C of G into $O(\tau \log^2 n)$ clusters

◮ Max cluster radius: $O(R(G, \tau) \log n)$

◮ Round complexity: $O(\min\{n/\tau, \ell_{R(G,\tau)}\} \log n)$ on MR($n^\epsilon$, $m$), for any constant $\epsilon \in (0, 1)$.

where:

◮ $R(G, \tau)$: minimum max radius in any τ-clustering of G

◮ $\ell_X$: max number of edges in a min-weight path of weight X

slide-17
SLIDE 17

Diameter approximation: example

Graph G, weighted diameter $\Phi(G) = 16$

[Figure: example graph G with 17 nodes and edge weights between 1 and 5.]

slide-18
SLIDE 18

Diameter approximation: example

[Figure: graph G with its clustering; nodes M and R are set apart from the other labels.]

Quotient graph $G_C$: $\Phi(G_C) = 12$, hence $\Phi(G) \le 12 + 4 + 2 = 18$ (vs 16)

[Figure: quotient graph on nodes A, M, R with edge weights 5 and 7; cluster radii 4, 2 and 3.]

slide-19
SLIDE 19

Diameter approximation: main result

Theorem

For a given weighted graph G, w.h.p. we can compute an upper bound to $\Phi(G)$

◮ Approximation ratio: $O(\log^3 n)$

◮ Round complexity: $O(\min\{n/\tau, \ell_{R(G,\tau)} \log n\} \log n)$ on MR($n^\epsilon$, $m$), for any constant $\epsilon \in (0, 1)$.

Remarks

◮ Round complexity becomes $o(\ell_{\Phi(G)}/n^\delta)$ on graphs of bounded doubling dimension

◮ Practical implementation. On real-world graphs, approximation ratio < 1.3

◮ Byproduct: linear-space, low-round k-center clustering in MR

slide-20
SLIDE 20

Proof Idea

◮ 2-phase decomposition strategy:

  ◮ Phase 1. Compute an estimate R of $R(G, \tau)$ through progressive sampling.
  ◮ Phase 2. Perform $\log n$ iterations of cluster-growing steps of fixed radius R from batches of centers selected with geometrically increasing probability

◮ $O(\log^3 n)$ approximation: w.h.p. the nodes of each shortest-path segment of length R belong to $O(\log^2 n)$ clusters of radius $O(R \log n)$.

slide-21
SLIDE 21

Diameter approximation: experiments

Experimental setup

◮ In-house cluster with 16 machines

◮ 18GB RAM / Intel i7 Nehalem 4-core processor

◮ Spark MapReduce platform

Datasets

Graph         n                   m                   Φ(G)
roads-USA     23,947,347          29,166,673          55,859,820
roads-CAL     1,890,815           2,328,872           16,425,258
livejournal   3,997,962           32,681,189          9.41
twitter       41,652,230          1,468,365,182       9.07
mesh(S)       S^2                 2S(S − 1)           †
R-MAT(S)      2^S                 16 · 2^S            †
roads(S)      ≈ S · 2.3 · 10^7    ≈ S · 5.3 · 10^7    †

† the diameter depends on the size of the graph, controlled by S > 1.

Scalability

[Figure: running time in seconds (y-axis, 500 to 4500) versus number of machines (x-axis, 2^1 to 2^4) for R-MAT(26) and roads(3).]

slide-22
SLIDE 22

Diameter approximation: experiments

◮ We compare our algorithm (CLUSTER) with ∆-stepping

[Figure, panel "Rounds": number of rounds (log scale, 10^0 to 10^5) for CLUSTER and ∆-stepping on roads-USA, roads-CAL, mesh, livejournal, twitter and R-MAT(24).]

[Figure, panel "Time": running time in seconds (log scale, 10^1 to 10^5) for CLUSTER and ∆-stepping on roads-USA, roads-CAL, mesh, livejournal, twitter and R-MAT(24).]

slide-23
SLIDE 23

Diameter approximation: experiments

[Figure, panel "Work": work (log scale, 10^7 to 10^12) for CLUSTER and ∆-stepping on roads-USA, roads-CAL, mesh, livejournal, twitter and R-MAT(24).]

[Figure, panel "Approximation": approximation ratio (1.0 to 1.5) for CLUSTER and ∆-stepping on roads-USA, roads-CAL, mesh, livejournal, twitter and R-MAT(24).]

The approximation quality does not depend on the granularity of the clustering.

slide-24
SLIDE 24

Conclusions

Summary

MR-algorithm for $O(\log^3 n)$ approximation of the diameter of a large, undirected, weighted graph G

◮ $o(\ell_{\Phi(G)})$ rounds, linear global space, sublinear local space

◮ Good performance/approximation on real-world graphs

Ongoing and future work

◮ Tighter analysis of approximation factor

◮ Clustering + constant doubling dimension yields a $(1 + \epsilon)$ (unweighted) diameter approximation in $O((m + n)/\epsilon)$ sequential time.

◮ Clustering for approximate centrality computations

Software

GRADIAS: crono.dei.unipd.it/gradias