Communication Complexity in the Field: New Questions from Practice


Qin Zhang, Indiana University Bloomington

BIRS Workshop, March 20, 2017


This talk

Not on a particular problem. Instead, I will try to present a few new questions that I have encountered when trying to apply communication complexity in various settings.


Agenda

I will talk about:

  • 1. Number-in-hand CC with input sharing
    – Distributed computation of graph problems
  • 2. Primitive problems overlap; direct-sum does not apply
    – Distributed joins
  • 3. Higher LB in simultaneous comm. than in one-way comm.?
    – Sketching edit distance


Distributed graph computation

Real-world systems: Pregel, Giraph, GPS, GraphLab, etc.


The coordinator model

We have k machines (sites) and one central server (coordinator).

– Each site has a 2-way comm. channel with the coordinator.
– Each site holds a piece of data x_i.
– Task: compute f(x_1, ..., x_k) together via communication, for some function f. The coordinator outputs the answer.
– Goal: minimize the total communication.

[Figure: sites S_1, S_2, S_3, ..., S_k, each connected to the coordinator C]


Distributed graph computation

Let’s think about the graph connectivity problem: k sites each hold a portion of a graph. Goal: determine whether the graph is connected. (n: number of nodes of the graph.)

[Figure: sites S_1, S_2, S_3, ..., S_k, each connected to the coordinator C]

A trivial solution: each S_i sends a spanning forest of its local subgraph to C. Cost: O(kn log n) bits.

Can we do better, e.g., o(kn) bits of comm. in total? If the graph is edge partitioned among the k sites, Ω(kn) bits are needed.

[Woodruff, Z. ’13]
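To make the cost of the trivial solution concrete, here is a minimal Python sketch (our own illustration; names such as `trivial_connectivity` are hypothetical). Each site reduces its edges to a local spanning forest with union-find and ships it to the coordinator, which merges the forests; the bit count assumes 2⌈log₂ n⌉ bits per edge. Since a forest has fewer than n edges, k sites send O(kn log n) bits in total.

```python
from math import ceil, log2


class DSU:
    """Union-find over nodes 0..n-1 (path halving + union by size)."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the components of a and b; return True if they were distinct."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


def local_spanning_forest(n, edges):
    """Edges of a spanning forest of one site's local subgraph (< n edges)."""
    dsu = DSU(n)
    return [(u, v) for u, v in edges if dsu.union(u, v)]


def trivial_connectivity(n, site_edges):
    """Return (is_connected, bits_sent) for the trivial coordinator protocol."""
    bits_per_edge = 2 * max(1, ceil(log2(n)))  # two node ids per edge
    coord = DSU(n)
    components = n
    bits = 0
    for edges in site_edges:  # one message per site S_i
        forest = local_spanning_forest(n, edges)
        bits += bits_per_edge * len(forest)
        for u, v in forest:  # coordinator merges the received forest
            if coord.union(u, v):
                components -= 1
    return components == 1, bits
```

For a 4-node path split as {(0, 1), (2, 3)} at one site and {(1, 2)} at another, `trivial_connectivity(4, [[(0, 1), (2, 3)], [(1, 2)]])` returns `(True, 12)`. The union of the local forests has the same connected components as the union of all edges, which is why shipping forests is enough.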

7-2

LB graph for edge partition

For each i ∈ [k], (X_i, Y) ∼ µ, where µ is a hard input distribution for set-disjointness. Each site S_i, holding X_i = {X_{i,1}, ..., X_{i,n}}, creates an edge (u_i, v_j) for each X_{i,j} = 1. The coordinator, holding Y = {Y_1, ..., Y_n}, creates a path containing {v_j | Y_j = 1} and a path containing {v_j | Y_j = 0}.

[Figure: LB graph for edge partition. Top row: u_1, u_2, u_3, ..., u_k, where the edges of u_i are given by X_i. Bottom row: v_{j_1}, ..., v_{j_|Y|} on the path for Y_j = 1 and v_{j_{|Y|+1}}, ..., v_{j_n} on the path for Y_j = 0.]

Graph connected ⇔ DISJ(X_1, Y) ∨ ... ∨ DISJ(X_k, Y) = 1 (LB: Ω(kn))
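A small Python check of the reduction, under stated assumptions: we read DISJ(X_i, Y) = 1 as "X_i and Y intersect", and we assume every X_i also contains some j with Y_j = 0 and that both coordinator paths are nonempty (the toy inputs below are chosen this way). Then u_i always attaches to the Y_j = 0 path, and it bridges the two paths exactly when X_i ∩ Y ≠ ∅. Function names are illustrative only.

```python
from collections import deque


def lb_graph(X, Y):
    """X: list of k 0/1 rows (site S_i holds X[i]); Y: 0/1 list of length n.
    Nodes are ('u', i) and ('v', j); returns an adjacency dict."""
    k, n = len(X), len(Y)
    adj = {("u", i): set() for i in range(k)}
    adj.update({("v", j): set() for j in range(n)})

    def add(a, b):
        adj[a].add(b)
        adj[b].add(a)

    # Site S_i: an edge (u_i, v_j) for every X_{i,j} = 1.
    for i, row in enumerate(X):
        for j, bit in enumerate(row):
            if bit:
                add(("u", i), ("v", j))
    # Coordinator: one path through {v_j | Y_j = 1}, one through {v_j | Y_j = 0}.
    for target in (1, 0):
        path = [("v", j) for j in range(n) if Y[j] == target]
        for a, b in zip(path, path[1:]):
            add(a, b)
    return adj


def is_connected(adj):
    """Breadth-first search from an arbitrary node."""
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        for w in adj[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(adj)


def disj(x, y):
    """1 iff the sets encoded by x and y intersect (assumed convention)."""
    return int(any(a and b for a, b in zip(x, y)))
```

With Y = [1, 1, 0, 0], the rows X = [[0, 0, 1, 1], [1, 0, 1, 0]] give a connected graph (the second site intersects Y), while replacing the second row by [0, 0, 1, 0] leaves the Y_j = 1 path cut off.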


What if the graph is node partitioned?

In most practical systems, the graph is node partitioned. Can we prove a similar LB?

[Figure: the LB graph above, with u_1, ..., u_k on top and v_{j_1}, ..., v_{j_n} on the bottom]

Basically, in this construction only the bottom nodes (and their adjacent edges) are partitioned; graph connected ⇔ DISJ(X_1, Y) ∨ ... ∨ DISJ(X_k, Y) = 1.

If we also partition the top nodes (and their adjacent edges), then the Ω(kn) LB does not hold. Not a surprise: if a graph is node partitioned, Õ(n) bits suffice [Ahn, Guha, McGregor ’12].


Input sharing

To prove an LB in the node partition model, one needs to deal with input sharing: each edge may be stored at two sites. Need new techniques?

A concrete problem: Breadth First Search Tree. Given a node u, the parties want to jointly compute a BFS tree rooted at u. The coordinator outputs the final BFS tree. What is the comm. complexity?
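To make input sharing and the BFS-tree question concrete, the Python sketch below (a hypothetical baseline, not a protocol from the talk) stores each node's adjacency list at the site owning that node, so an edge whose endpoints live at different sites is held by both. The coordinator grows the tree level by level, asking the owner of each frontier node for its neighbors. The `words` counter, a crude cost measure assumed here, shows that this baseline ships every adjacency list, i.e., Θ(n + |E|) node ids for a connected graph; the open question is how much better one can do, and an LB argument has to cope with the duplicated edges.

```python
from collections import defaultdict


def node_partition(edges, owner):
    """Site s stores the adjacency list of every node v with owner[v] == s.
    An edge (a, b) is recorded at owner[a] and at owner[b]: input sharing."""
    sites = defaultdict(lambda: defaultdict(set))
    for a, b in edges:
        sites[owner[a]][a].add(b)
        sites[owner[b]][b].add(a)
    return sites


def copies_per_edge(edges, owner):
    """How many distinct sites hold each edge (1 or 2)."""
    return {(a, b): len({owner[a], owner[b]}) for a, b in edges}


def naive_bfs_tree(edges, owner, root):
    """Coordinator-driven, level-by-level BFS from root.

    Returns (parent, words): parent maps each reached node to its BFS parent
    (root -> None); words counts node ids exchanged, a crude cost measure."""
    sites = node_partition(edges, owner)
    parent = {root: None}
    frontier = [root]
    words = 0
    while frontier:
        words += len(frontier)  # coordinator -> owners: "expand these nodes"
        next_frontier = []
        for v in frontier:
            nbrs = sites[owner[v]][v]
            words += len(nbrs)  # owner -> coordinator: adjacency list of v
            for w in sorted(nbrs):
                if w not in parent:  # first discovery fixes the BFS parent
                    parent[w] = v
                    next_frontier.append(w)
        frontier = next_frontier
    return parent, words
```

On the 4-cycle 0–1–3–2–0 with nodes {0, 1} at site 0 and {2, 3} at site 1, the two cross-site edges are stored twice, and the BFS tree from node 0 is {1: 0, 2: 0, 3: 1}.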


Distributed joins


Set-intersection join

Input: A_1, ..., A_m ⊆ [n] = {1, 2, ..., n} (e.g., skills of applicants) and B_1, ..., B_m ⊆ [n] (e.g., skills required by job positions). View A as the m × n 0/1 matrix with rows A_1, ..., A_m, and B as the n × m 0/1 matrix with columns B_1, ..., B_m.

Set-Intersection Join (cardinality version): SIJ(A, B) = |{(i, j) : C_{i,j} > 0}|, where C = A · B.

An important operation in databases.
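For reference, a direct Python computation of SIJ(A, B) from the definition, with the sets as Python `set`s (a centralized illustration with hypothetical function names, not a communication protocol). `product_matrix` builds C = A · B explicitly, taking A's rows to be A_1, ..., A_m and B's columns to be B_1, ..., B_m, so that C_{i,j} = |A_i ∩ B_j|.

```python
def sij(A, B):
    """Number of pairs (i, j) whose sets share at least one element."""
    return sum(1 for a in A for b in B if a & b)


def product_matrix(A, B, n):
    """C = A . B for the 0/1 incidence matrices over [n] = {1, ..., n}:
    A is m x n with rows A_1..A_m, B is n x m with columns B_1..B_m."""
    rows_a = [[1 if x in a else 0 for x in range(1, n + 1)] for a in A]
    cols_b = [[1 if x in b else 0 for x in range(1, n + 1)] for b in B]
    return [[sum(p * q for p, q in zip(ra, cb)) for cb in cols_b] for ra in rows_a]
```

On three applicants {1, 2}, {3}, {4} and three jobs {2, 3}, {5}, {1, 4}, four applicant/job pairs share a skill, so `sij` returns 4, matching the number of positive entries of C.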


Set-intersection join (cont.)

The problem: estimate SIJ(A, B) up to a (1 + ε) factor. Useful, e.g., in query planning.

Current LB: Ω(n/ε^{2/3}). For each i ∈ [m], choose (A_i, B_i) ∼ µ, where µ is a hard input distribution for set-disjointness. Define

SUM(A, B) = Σ_{i∈[m]} DISJ(A_i, B_i).

W.h.p. every off-diagonal pair (A_i, B_j), i ≠ j, intersects, while a diagonal pair (A_i, B_i) is counted exactly when DISJ(A_i, B_i) = 1; hence, w.h.p.,

SIJ(A, B) = SUM(A, B) + m(m − 1).

Using basically a direct-sum argument (Gap-Hamming + DISJ), any randomized algorithm that computes SUM(A, B) w.pr. 0.99 up to an additive error of √m/2 needs Ω(mn) comm.

Set m = 1/ε^{2/3} to get the Ω(n/ε^{2/3}) LB.

(Van Gucht, Williams, Woodruff, Z. ’15)
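The choice m = 1/ε^{2/3} can be checked as follows (our reconstruction, up to constant factors). Since 0 ≤ SUM(A, B) ≤ m, w.h.p. SIJ(A, B) ≤ m², so for a (1 + ε)-approximation S̃ of SIJ(A, B), the quantity S̃ − m(m − 1) estimates SUM(A, B) with additive error at most εm². With this m the error equals √m, and shrinking ε by a constant factor brings it within the additive error allowed by the Ω(mn) bound.

```latex
\begin{aligned}
\mathrm{SIJ}(A,B) &= \mathrm{SUM}(A,B) + m(m-1) \le m^2
  && \text{(w.h.p.; } 0 \le \mathrm{SUM}(A,B) \le m\text{)} \\
\bigl|\widetilde{S} - \mathrm{SIJ}(A,B)\bigr| &\le \epsilon\,\mathrm{SIJ}(A,B) \le \epsilon m^2
  && \text{(}\widetilde{S}\text{ a } (1+\epsilon)\text{-approximation)} \\
\epsilon m^2 &= \epsilon \cdot \epsilon^{-4/3} = \epsilon^{-1/3} = \sqrt{m}
  && \text{(choosing } m = \epsilon^{-2/3}\text{)} \\
\Omega(mn) &= \Omega\!\left(n/\epsilon^{2/3}\right).
\end{aligned}
```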


Set-intersection join (cont.)

The current best UB is Õ(n/ε²), using an F0-sketch, and the protocol is one-way. Can we prove an Ω(n/ε²) LB?

It is not enough to apply a direct-sum type argument on (A_1, B_1), ..., (A_m, B_m), since each A_i is going to join each B_j. In other words, the primitive problems overlap. Need new techniques?
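The UB above relies on an F0-sketch. One natural one-way protocol of that kind is sketched here in Python under our own assumptions (it is not necessarily the protocol of the cited work, and all names are hypothetical). For each element e ∈ [n], Alice sends a k-minimum-values (KMV) sketch of S_e = {i : e ∈ A_i} under a shared public hash h. For each B_j, Bob merges the sketches of S_e over e ∈ B_j, which estimates |{i : A_i ∩ B_j ≠ ∅}|, and sums over j. With k = Θ(1/ε²) each term is a (1 ± ε)-approximation with constant probability; taking medians over O(log m) independent hashes (omitted here) makes all m terms, and hence their sum, accurate w.h.p. The message is n sketches of at most k hash values each.

```python
import hashlib
import heapq


def h(i):
    """Public hash shared by Alice and Bob: row index i -> [0, 1)."""
    digest = hashlib.blake2b(str(i).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2.0**64


def kmv(values, k):
    """KMV sketch: the k smallest distinct hash values, sorted."""
    return heapq.nsmallest(k, set(values))


def kmv_estimate(sketch, k):
    """Distinct-count estimate: exact below k values, else (k - 1) / v_k."""
    if len(sketch) < k:
        return len(sketch)
    return (k - 1) / sketch[k - 1]


def alice_message(A, n, k):
    """One KMV sketch per element e in [n], of S_e = {i : e in A_i}."""
    S = {e: [] for e in range(1, n + 1)}
    for i, a in enumerate(A):
        for e in a:
            S[e].append(h(i))
    return {e: kmv(vals, k) for e, vals in S.items()}


def bob_estimate(message, B, k):
    """Sum over j of |{i : A_i and B_j intersect}| = |union of S_e, e in B_j|,
    each term estimated by merging the sketches of the S_e with e in B_j."""
    total = 0
    for b in B:
        merged = kmv([v for e in b for v in message[e]], k)
        total += kmv_estimate(merged, k)
    return total
```

Merging works because the k smallest values of a union are among the k smallest values of its parts. When every union has fewer than k distinct rows the estimate is exact; e.g., on the applicant/job instance {1, 2}, {3}, {4} vs. {2, 3}, {5}, {1, 4} with k = 16 it returns 4.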


Sketching threshold edit distance


Edit Distance

Definition: Given two strings s, t ∈ Σ^n, ed(s, t) = the minimum number of character operations (insertion/deletion/substitution) that transform s into t.

Example: ed(banana, ananas) = 2 (delete the leading b, then append s).

Applications: numerous. E.g.,
  • bioinformatics (measuring similarity between DNA sequences)
  • automatic spelling correction
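The definition can be made concrete with the textbook dynamic program (Levenshtein distance via Wagner–Fischer), sketched in Python below; it keeps only one row of the O(|s| · |t|) table at a time.

```python
def edit_distance(s, t):
    """Levenshtein distance by the textbook dynamic program, row by row."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, a in enumerate(s, start=1):
        cur = [i] + [0] * len(t)  # distance from s[:i] to ""
        for j, b in enumerate(t, start=1):
            cur[j] = min(prev[j] + 1,             # delete a
                         cur[j - 1] + 1,          # insert b
                         prev[j - 1] + (a != b))  # substitute (free if equal)
        prev = cur
    return prev[-1]
```

`edit_distance("banana", "ananas")` returns 2, matching the example above.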


Problems

The threshold version of ED: Given two strings s, t ∈ {0, 1}^n and a threshold K, output all the edits if ed(s, t) ≤ K; output “Error” otherwise.

  • Document exchange (one-way comm.): the holder of s sends a sketch sk(s) to the holder of t. App: remote file sync; file transmission through a noisy channel.
  • Sketching (simultaneous comm.): s and t are compressed independently into sk(s) and sk(t), and the answer is computed from the two sketches. App: distributed similarity join.
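As a centralized reference point for the threshold problem (both strings in one place, no sketching), the Python sketch below fills only the DP cells with |i − j| ≤ K, the standard diagonal-band restriction: any alignment that leaves the band already pays more than K insertions/deletions, so O(nK) work suffices, and a traceback lists the edits. The edit encoding (`("sub", p, c)`, `("del", p)`, `("ins", p, c)`, with positions p in s) and the helper `apply_edits` are our own hypothetical choices.

```python
INF = float("inf")


def threshold_edits(s, t, K):
    """Edits turning s into t if ed(s, t) <= K, else "Error".

    Only cells with |i - j| <= K are filled, so the work is O(len(s) * K)."""
    n, m = len(s), len(t)
    if abs(n - m) > K:
        return "Error"
    D = {}
    for i in range(n + 1):
        for j in range(max(0, i - K), min(m, i + K) + 1):
            if i == 0 or j == 0:
                D[i, j] = i + j
                continue
            D[i, j] = min(D.get((i - 1, j), INF) + 1,  # delete s[i-1]
                          D.get((i, j - 1), INF) + 1,  # insert t[j-1]
                          D[i - 1, j - 1] + (s[i - 1] != t[j - 1]))  # sub/match
    if D[n, m] > K:
        return "Error"
    # Trace back one optimal alignment; every costly step is one edit.
    edits, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i, j] == D[i - 1, j - 1] + (s[i - 1] != t[j - 1]):
            if s[i - 1] != t[j - 1]:
                edits.append(("sub", i - 1, t[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and D[i, j] == D.get((i - 1, j), INF) + 1:
            edits.append(("del", i - 1))
            i -= 1
        else:
            edits.append(("ins", i, t[j - 1]))
            j -= 1
    return edits[::-1]


def apply_edits(s, edits):
    """Rebuild t from s and an edit list as returned above (positions in s)."""
    out, pos = [], 0
    for op in edits:
        out.append(s[pos:op[1]])  # characters of s left unchanged
        pos = op[1]
        if op[0] == "sub":
            out.append(op[2])
            pos += 1
        elif op[0] == "del":
            pos += 1
        else:  # "ins": put op[2] before s[pos]
            out.append(op[2])
    out.append(s[pos:])
    return "".join(out)
```

For example, `threshold_edits("banana", "ananas", 2)` returns two edits that `apply_edits` turns back into `"ananas"`, while a threshold of 1 yields `"Error"`.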


What we know

New: results from [Belazzougui, Z. ’16]. For simplicity, assume K < n^{0.1}.

  • The one-way CC of K-threshold ED is Θ(K log n).
  • The simultaneous CC of K-threshold ED is O(K^8 log^5 n). It should be possible to improve this to K^4 · poly log(n) or K^3 · poly log(n), but I am not sure whether we can do it in o(K^2) · poly log(n). LB?


A possible hard distribution

Conjecture: the following may be a hard distribution for K-threshold ED in the simultaneous model, i.e., any algorithm needs Ω(K^2) comm.

  • W.pr. 1/2, the K edits are randomly located in s and t;
  • W.pr. 1/2, the K edits are located in a random group of adjacent positions.
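A Python sampler for the conjectured distribution, for experimentation. The slide does not fix the edit type, so we assume here that every edit is a substitution (a bit flip) at a distinct position; `sample_hard_instance` is a hypothetical name. In both branches s and t differ in exactly K positions, so ed(s, t) ≤ K; the branches differ only in where those K positions sit.

```python
import random


def sample_hard_instance(n, K, rng):
    """Sample (s, t) over {0, 1}^n; requires K <= n.
    Assumption: each edit is a bit flip at a distinct position."""
    s = [rng.randrange(2) for _ in range(n)]
    if rng.random() < 0.5:
        positions = rng.sample(range(n), K)  # K scattered positions
    else:
        start = rng.randrange(n - K + 1)     # K adjacent positions
        positions = range(start, start + K)
    t = list(s)
    for p in positions:
        t[p] ^= 1
    return "".join(map(str, s)), "".join(map(str, t))
```

A seeded generator such as `random.Random(0)` makes the samples reproducible across runs.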


The general question

Can we prove a higher LB in the simultaneous comm. model than in the one-way comm. model for natural problems?

If you know of any example/result, please let me know. Thanks.


Thank you! Questions?