Parallel Algorithms for Graphs on a Very Large Number of Nodes

SLIDE 1

Parallel Algorithms for Graphs on a Very Large Number of Nodes

Krzysztof Onak

IBM T.J. Watson Research Center

Krzysztof Onak (IBM Research) Parallel Algorithms for Graphs on a Very Large. . . 1 / 26

SLIDE 2

Outline

  1. Model of Computation
  2. Sample Algorithms and Their Limitations
  3. Efficiently Estimating MST Weight
  4. Computing MST in Geometric Setting

SLIDE 7

Model: Massive Parallel Computation

[Karloff, Suri, Vassilvitskii 2010; Beame, Koutris, Suciu 2013; . . . ]

  • n items on input
  • m machines
  • Space per machine: $s = (n/m) \cdot \text{small-factor}$
  • Initially: each machine receives n/m items
  • Single round:
    1. Each machine performs computation
    2. Each machine sends and receives at most O(s) data

SLIDE 13

Resources

  • n items on input
  • m machines
  • Space per machine: $s = (n/m) \cdot \text{small-factor}$
  • Popular assumption: $m = O(n^{\alpha})$ for $\alpha \in (0, 1)$ $\Rightarrow$ $s = n^{\Omega(1)}$
  • Likely to happen: $s \gg m$

Goals:

  • Minimize the number of rounds
  • Optimize running time
  • Keep the total amount of memory as close to linear as possible
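To make the resource assumptions above concrete, here is a small worked example. The values of n and α are hypothetical, chosen only to keep the arithmetic simple, and the small-factor is ignored.

```latex
\begin{align*}
n &= 10^{12}, \qquad \alpha = \tfrac{1}{3} \\
m &= n^{\alpha} = 10^{4} \\
s &\approx \frac{n}{m} = n^{1-\alpha} = 10^{8} \gg m
\end{align*}
```

Under these simplifications, $s \gg m$ holds whenever $\alpha < 1/2$, since then $n^{1-\alpha}$ outgrows $n^{\alpha}$.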

SLIDE 19

Comparison to PRAM

  • PRAM: classic parallel model
    • m processors
    • Processors access common memory
  • Many problems require $\tilde{\Omega}(\log n)$ rounds in PRAM
    Example: computing XOR of n bits requires $\Omega(\log n / \log\log n)$ time in the strongest PRAM model [Beame, Håstad 1989]
  • Our model: $O(\log_s n)$ rounds for XOR
    If $s = n^{\Omega(1)}$, the number of rounds is constant
  • Our goal: constant number of communication rounds
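The $O(\log_s n)$ bound for XOR can be illustrated with a toy simulation. This is a minimal sketch, not code from the talk: the function name `mpc_xor` is hypothetical, and machines are modelled simply as blocks of at most s values that are combined level by level.

```python
from functools import reduce
from operator import xor


def mpc_xor(bits, s):
    """Simulate XOR aggregation where each machine holds at most s values.

    Hypothetical helper for illustration. Returns (xor_of_all_bits, rounds).
    """
    if s < 2:
        raise ValueError("each machine must hold at least 2 values")
    values = list(bits)
    rounds = 0
    while len(values) > 1:
        # One round: every machine XORs its block of at most s values
        # and forwards the single result to the next level.
        values = [reduce(xor, values[i:i + s]) for i in range(0, len(values), s)]
        rounds += 1
    return (values[0] if values else 0), rounds
```

Each round shrinks the number of live values by a factor of s, so the loop runs about $\log_s n$ times. With $s = n^{\delta}$ this is roughly $1/\delta$ rounds, a constant.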

SLIDE 21

Main Subject of Study: Minimum Spanning Tree

Select the subset of edges of minimum total weight that connects all vertices

SLIDE 27

Filtering Technique

[Karloff, Suri, Vassilvitskii 2010] [Lattanzi, Moseley, Suri, Vassilvitskii 2011]

  • Input: weighted edges of a graph on N vertices
  • Main idea:
    1. Find a minimum spanning forest (MSF) for a subset of edges
    2. Remove edges not in the forest
  • Algorithm: repeat the process until the problem is solved
  • Caveat: ≥ N space per machine required
  • Complexity: $s = N^{1+\Omega(1)}$ $\Rightarrow$ O(1) rounds
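The filtering idea can be sketched sequentially as follows. This is an illustrative simulation under stated assumptions, not the authors' implementation: the names `kruskal_forest` and `filtering_mst` are hypothetical, machines are modelled as chunks of at most s edges, and the sketch requires s ≥ 2N so that every filtering round strictly shrinks the edge set.

```python
import random


def kruskal_forest(num_vertices, edges):
    """Return the minimum spanning forest of `edges`, given as (w, u, v)."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((w, u, v))
    return forest


def filtering_mst(num_vertices, edges, s, seed=0):
    """Filtering sketch (hypothetical helper). Returns (mst_edges, rounds)."""
    if s < 2 * num_vertices:
        # Assumption of this sketch: guarantees progress in every round.
        raise ValueError("this sketch needs s >= 2 * num_vertices")
    rng = random.Random(seed)
    edges = list(edges)
    rounds = 0
    while len(edges) > s:
        rng.shuffle(edges)
        survivors = []
        for i in range(0, len(edges), s):
            # One simulated machine: keep only the MSF of its chunk.
            survivors.extend(kruskal_forest(num_vertices, edges[i:i + s]))
        edges = survivors
        rounds += 1
    # Final round: the remaining edges fit on a single machine.
    return kruskal_forest(num_vertices, edges), rounds + 1
```

An edge dropped by a chunk is the heaviest edge on some cycle inside that chunk, so it cannot belong to the overall MST under the same tie-breaking order. Each chunk keeps at most N − 1 edges, which is why space of at least N per machine is needed.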

SLIDE 35

$N^{1-\Omega(1)}$ Space in O(1) Rounds?

  • Unlikely to be possible in general
  • Can reduce from Sparse Connectivity: do the edges span a connected graph?
  • Conjecture: superconstant number of rounds with $N^{1-\Omega(1)}$ memory
  • Is this instance hard? (solvable in O(log N) rounds)
    [Figure: two graph instances shown side by side, one vs. the other]
  • Reduction: connect a selected vertex to all vertices with heavy edges
  • This talk: algorithms with $O(N^{\epsilon})$ space per machine

SLIDE 41

Result

[Łącki, Mądry, Mitrović, O., Sankowski]

  • Input: M edges, weights in {1, 2, . . . , W} (#nodes N ≤ #edges M)
  • Algorithm:
    • Computes (1 + ε)-approximation to MST weight
    • Space per machine: $O\left(\frac{M}{m} + \frac{N}{m} \cdot \left(\frac{W}{\epsilon}\right)^{2}\right)$ for $M/m = M^{\Omega(1)}$
    • Number of rounds: $O(\log(W/\epsilon))$
  • Note: No dependence on W would disprove the Sparse Connectivity Conjecture

SLIDE 48

Approach

Use techniques of Chazelle, Rubinfeld, Trevisan (2005):

  • $G_i$ = graph restricted to edges of weight < i
  • $T_i$ = #connected components in $G_i$
  • Number of edges of weight ≥ i in MST = $T_i - 1$
    $\Rightarrow \text{weight(MST)} = \sum_{i=1}^{W} (T_i - 1)$
  • $C_i(v)$ = size of the component of v in $G_i$
    $T_i = \sum_{v} 1/C_i(v)$
  • Good approximation:
    • Compute sizes of small components
    • Replace $1/C_i(v)$ with 0 if $C_i(v) \ge W/\epsilon$
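The component-counting formula above can be checked on a small graph. The following is a sequential sketch assuming a connected input graph with integer weights in {1, ..., W}; the names `component_sizes` and `mst_weight_from_components` are hypothetical, and nothing here models the parallel implementation.

```python
from fractions import Fraction


def component_sizes(num_vertices, edges):
    """Return a list whose entry v is the size of v's connected component."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    counts = {}
    for v in range(num_vertices):
        root = find(v)
        counts[root] = counts.get(root, 0) + 1
    return [counts[find(v)] for v in range(num_vertices)]


def mst_weight_from_components(num_vertices, weighted_edges, max_weight, eps=None):
    """Sum over i = 1..W of (T_i - 1), where T_i = sum over v of 1 / C_i(v).

    If eps is given, vertices whose component has size >= W / eps contribute 0,
    mirroring the truncation rule on the slide.
    """
    cap = None if eps is None else max_weight / eps
    total = Fraction(0)
    for i in range(1, max_weight + 1):
        g_i = [(u, v) for w, u, v in weighted_edges if w < i]
        sizes = component_sizes(num_vertices, g_i)
        t_i = sum(Fraction(1, c) for c in sizes if cap is None or c < cap)
        total += t_i - 1
    return total
```

Each dropped term is at most ε/W, and there are at most N of them for each of the W thresholds, so the truncated sum undershoots by at most εN. Since the MST has weight at least N − 1, this is a small relative error.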

SLIDE 53

Implementation

  • Reachability sets $R_v$ for each node v: set of W/ε nodes accessible via cheapest edges
  • Initially: collect cheapest incident edges
  • Repeat $O(\log(W/\epsilon))$ times: ask nodes u in $R_v$ for their $R_u$ and update
  • $O(\log(W/\epsilon))$ updates suffice to explore useful nodes up to distance W/ε
  • Use the QuickSort-like sorting algorithm of Goodrich, Sitchinava, Zhang (2011) to organize communication
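The doubling step behind the reachability sets can be sketched as below. This is a deliberately simplified illustration, not the algorithm from the talk: it handles a single unweighted graph (think of one threshold graph), truncates sets arbitrarily rather than by cheapest edges, and uses the hypothetical name `reachability_sets` with a `cap` parameter standing in for W/ε. Node labels are assumed to be sortable.

```python
import math


def reachability_sets(adjacency, cap):
    """Grow truncated reachability sets by repeated doubling.

    adjacency: dict mapping every node to a list of its neighbours.
    Returns (sets, rounds). A set of size `cap` signals a large component;
    a smaller set is exactly the node's connected component.
    """
    def truncate(nodes):
        # Arbitrary but deterministic truncation (an assumption of this sketch).
        return set(sorted(nodes)[:cap])

    reach = {v: truncate({v, *adjacency[v]}) for v in adjacency}
    rounds = max(1, math.ceil(math.log2(cap)))
    for _ in range(rounds):
        # All nodes update simultaneously from the previous round's sets:
        # each node asks every u in its set for u's set and merges them.
        reach = {
            v: truncate(set(r_v).union(*(reach[u] for u in r_v)))
            for v, r_v in reach.items()
        }
    return reach, rounds
```

After k rounds an untruncated set covers all nodes within distance $2^k$, so about log(cap) rounds suffice to fully explore any component with fewer than `cap` nodes.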

SLIDE 57

Geometric Setting

Input: set of points in a low-dimensional metric space

[Figure: example point set; labels 9, 10, 8, 11, 14, 7]

  • Points induce a weighted graph
  • Graph problems to consider:
    • Minimum Spanning Tree
    • Earth Mover Distance
    • Transportation Problem
    • Travelling Salesman Problem
    • . . .

SLIDE 62

Result

[Andoni, Nikolov, O., Yaroslavtsev 2014]

  • Input: N points in a low-dimensional metric space
    • Example: $\mathbb{R}^2$
    • Generalizes to bounded doubling dimension
  • Algorithm:
    • Computes (1 + ε)-approximate MST
    • Space per machine: roughly O(N/m) (as long as it fits subproblems)
    • Number of rounds: O(1)
    • Running time: near-linear

SLIDE 67

Random Gridding

We reuse the Arora-Mitchell approach: apply a randomly shifted grid.

Key property: a cell of side ∆ separates points x and y with probability at most $O(1) \cdot \rho(x, y)/\Delta$
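The separation property of a randomly shifted grid can be checked empirically. This sketch assumes points in Euclidean space given as coordinate tuples and a uniform shift per axis; the helper names `cell_of` and `separation_frequency` are hypothetical.

```python
import math
import random


def cell_of(point, shift, side):
    """Index of the grid cell containing `point` when the grid is shifted."""
    return tuple(math.floor((x + t) / side) for x, t in zip(point, shift))


def separation_frequency(p, q, side, trials=20000, seed=0):
    """Fraction of random shifts for which p and q land in different cells."""
    rng = random.Random(seed)
    separated = 0
    for _ in range(trials):
        # Independent uniform shift in [0, side) along every axis.
        shift = [rng.uniform(0, side) for _ in p]
        if cell_of(p, shift, side) != cell_of(q, shift, side):
            separated += 1
    return separated / trials
```

For two points at distance 0.1 along one axis and cells of side 1, the measured frequency should be close to 0.1, in line with a bound proportional to the distance divided by the cell side.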

SLIDE 74

Using Random Gridding

Typical usage: recursive dynamic program for approximately solving the problem

Can partially isolate what happens inside a cell

SLIDE 85

Our Algorithm

  • Connect points closer than $\frac{\epsilon \cdot \mathrm{diam}(S)}{100 \cdot N}$ arbitrarily
  • Sub-solution for a cell of side ∆: $\epsilon^2\Delta$-covering with induced components
  • Combining sub-solutions: truncated version of Kruskal's algorithm
    1. Find the two closest clusters
    2. If their distance is less than ε∆, connect them and repeat
  • Pass up the $\epsilon^2\Delta$-covering with information about connected components
  • Expected cost of solution: optimum · (1 + ε · #levels)
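The truncated Kruskal step can be sketched for a single cell as follows. This is a simplified illustration only: it works directly on points rather than on coverings, examines all pairs in quadratic time, and the name `truncated_kruskal` and its `threshold` parameter (standing in for ε∆) are hypothetical.

```python
import math
from itertools import combinations


def truncated_kruskal(points, threshold):
    """Merge clusters in increasing order of distance, stopping at `threshold`.

    Returns (edges_added, cluster_label_per_point).
    """
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = sorted(
        (math.dist(points[a], points[b]), a, b)
        for a, b in combinations(range(len(points)), 2)
    )
    edges = []
    for d, a, b in pairs:
        if d >= threshold:
            break  # the closest remaining clusters are too far apart for this level
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            edges.append((a, b))
    return edges, [find(i) for i in range(len(points))]
```

Clusters left unmerged at this level would be handed to the enclosing, larger cell, where the threshold is larger.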

SLIDE 89

Select Implementation Details

  • Merge $N^{\Omega(1)} \times N^{\Omega(1)}$ cells at once
  • Sub-solutions for all subcells should fit on a single machine
  • Use sorting [Goodrich, Sitchinava, Zhang 2011] for grouping points and subcells that are close
  • Near-linear time:
    • Relax Kruskal's algorithm
    • Efficient nearest neighbor data structure [Krauthgamer, Lee 2004], [Cole, Gottlieb 2006]

SLIDE 92

Lower Bounds for MST

  • Natural questions to ask:
    • Can we generalize to unbounded dimension?
    • Can we compute an exact solution?
  • Query complexity:
    • Model: distance queries
    • Our algorithm can be adapted to an arbitrary metric of bounded doubling dimension in this model
    • Lower bound: $N^{\Omega(1)}$ rounds
  • We give a conditional lower bound based on Sparse Connectivity

SLIDE 96

Reduction

In a constant number of rounds: computing exact MST in $\ell_\infty^d$ for $d = 100 \log N$ $\Rightarrow$ deciding Sparse Connectivity

Construction:

  • For each vertex, pick a random vector $v_i \in \{-1, +1\}^d$
  • For each edge e = (i, j), add the point $f(e) = v_i + v_j$

Distances (whp.):

  • Adjacent edges: $\|f(e) - f(e')\|_\infty \le 2$
  • Non-adjacent edges: $\|f(e) - f(e')\|_\infty = 4$

MST weight:

  • Connected: ≤ 2(M − 1)
  • Not connected: ≥ 2M
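The distance claims in the construction can be checked numerically. The sketch below assumes the logarithm in d = 100 log N is base 2, which the slide does not specify; the names `embed_edges` and `linf` are hypothetical.

```python
import math
import random
from itertools import combinations


def embed_edges(num_vertices, edges, seed=0):
    """Map each edge (i, j) to the point v_i + v_j for random sign vectors."""
    rng = random.Random(seed)
    # Assumption: log base 2 in d = 100 log N.
    d = max(1, math.ceil(100 * math.log2(num_vertices)))
    vectors = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(num_vertices)]
    return [tuple(a + b for a, b in zip(vectors[i], vectors[j])) for i, j in edges]


def linf(p, q):
    """l-infinity distance between two points."""
    return max(abs(a - b) for a, b in zip(p, q))
```

Edges sharing a vertex differ by $v_j - v_k$, whose coordinates lie in {−2, 0, 2}. For vertex-disjoint edges, each coordinate of the difference reaches ±4 with probability 1/8, so with this many coordinates some coordinate attains 4 with high probability.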

SLIDE 97

Other Results

[Andoni, Nikolov, O., Yaroslavtsev 2014]

  • Algorithm for approximating Earth-Mover Distance
    • A new way of partitioning the instance into subproblems
    • Resolves an open question of Sharathkumar & Agarwal (2012) about the transportation problem: first near-linear time algorithm

SLIDE 102

Summary

  • Main goal: efficient algorithms for the Massive Parallel Computation Model
  • Important efficiency measure: number of rounds. When can it be made O(1) with low memory?
  • Well-known obstacle: Sparse Connectivity
  • This talk: efficient algorithms for MST
  • Future research:
    • More such algorithms
    • Better understanding of our limitations

SLIDE 103

Questions?
