Communication Complexity in the Field: New Questions from Practice

  1. Communication Complexity in the Field: New Questions from Practice. Qin Zhang, Indiana University Bloomington. BIRS Workshop, March 20, 2017.

  2. This talk: Not on a particular problem. I try to present a few new questions that I have encountered when trying to apply communication complexity in various settings.

  3. Agenda. I will talk about:
     1. Number-in-hand CC with input sharing – Distributed computation of graph problems
     2. Primitive problems overlap; direct-sum does not apply – Distributed joins
     3. Higher LB in simultaneous comm. than one-way comm.? – Sketching edit distance

  4. Distributed graph computation. Real-world systems: Pregel, Giraph, GPS, GraphLab, etc.

  5. The coordinator model: We have $k$ machines (sites) and one central server (coordinator).
     – Each site has a 2-way comm. channel with the coordinator.
     – Each site has a piece of data $x_i$.
     – Task: compute $f(x_1, \ldots, x_k)$ together via comm., for some $f$. The coordinator outputs the answer.
     – Goal: minimize total communication.
     [Figure: coordinator $C$ connected to sites $S_1, S_2, S_3, \ldots, S_k$]

  11. Distributed graph computation. Let's think about the graph connectivity problem: $k$ sites each hold a portion of a graph. Goal: compute whether the graph is connected.
      – A trivial solution: each $S_i$ sends a local spanning forest to $C$. Cost: $O(kn \log n)$ bits, where $n$ is the number of nodes of the graph.
      – Can we do better, e.g., $o(kn)$ bits of comm. in total?
      – If the graph is edge partitioned among the $k$ sites: $\Omega(kn)$ [Woodruff, Z. '13].
      [Figure: coordinator $C$ connected to sites $S_1, S_2, S_3, \ldots, S_k$]
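
The trivial solution can be sketched as follows. This is a minimal single-process simulation, not a real distributed implementation: the names `UnionFind`, `local_spanning_forest`, and `coordinator_is_connected` are hypothetical, nodes are assumed to be labeled `0..n-1`, and communication is measured only as the number of forest edges sent (each edge costs about $2 \log n$ bits, and each forest has at most $n - 1$ edges, which gives the $O(kn \log n)$ total).

```python
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]


class UnionFind:
    """Disjoint-set structure over nodes 0..n-1."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.components = n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        """Merge the components of a and b; return True if they were distinct."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        self.components -= 1
        return True


def local_spanning_forest(n: int, edges: Iterable[Edge]) -> List[Edge]:
    """What a site S_i sends: at most n - 1 edges spanning its local portion."""
    uf = UnionFind(n)
    return [(u, v) for u, v in edges if uf.union(u, v)]


def coordinator_is_connected(n: int, site_edges: List[List[Edge]]) -> Tuple[bool, int]:
    """Coordinator C merges all forests; also returns the number of edges sent."""
    uf = UnionFind(n)
    edges_sent = 0
    for edges in site_edges:
        forest = local_spanning_forest(n, edges)
        edges_sent += len(forest)
        for u, v in forest:
            uf.union(u, v)
    return uf.components == 1, edges_sent


# Three sites holding an edge-partitioned graph on 5 nodes.
sites = [[(0, 1), (1, 2), (0, 2)], [(2, 3), (3, 4)], [(0, 4)]]
connected, edges_sent = coordinator_is_connected(5, sites)
```

The redundant edge `(0, 2)` at the first site is dropped locally, so only five edges reach the coordinator in this example.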

  13. LB graph for edge partition: For each $i \in [k]$, $(X_i, Y) \sim \mu$, which is a hard input distribution for set-disjointness.
      – Each site $S_i$ holding $X_i = \{X_{i,1}, \ldots, X_{i,n}\}$ creates an edge $(u_i, v_j)$ for each $X_{i,j} = 1$.
      – The coordinator holding $Y = \{Y_1, \ldots, Y_n\}$ creates a path containing $\{v_j \mid Y_j = 1\}$ and a path containing $\{v_j \mid Y_j = 0\}$.
      – Graph connected $\Leftrightarrow$ $\mathrm{DISJ}(X_1, Y) \vee \ldots \vee \mathrm{DISJ}(X_k, Y) = 1$ (LB: $\Omega(kn)$).
      [Figure: top row, a path $v_{j_1}, v_{j_2}, \ldots, v_{j_{|Y|}}$ through $\{v_j \mid Y_j = 1\}$ and a path $v_{j_{|Y|+1}}, \ldots, v_{j_n}$ through $\{v_j \mid Y_j = 0\}$; bottom row, nodes $u_1, u_2, u_3, \ldots, u_k$ labeled $(X_1), (X_2), (X_3), \ldots, (X_k)$]
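
The construction can be checked on small inputs with a sketch like the one below. Several things here are assumptions rather than statements from the slides: the helper names are hypothetical, $\mathrm{DISJ}(X_i, Y) = 1$ is read as "$X_i$ and $Y$ intersect", and the inputs are assumed to be such that every $X_i$ contains some $j$ with $Y_j = 0$ and both paths are non-empty (so each $u_i$ already hangs off the $Y_j = 0$ path).

```python
def build_lb_graph(xs, y):
    """xs: k bit-vectors of length n (one per site); y: coordinator's bit-vector."""
    n = len(y)
    edges = []
    # Site S_i creates an edge (u_i, v_j) for each X_{i,j} = 1.
    for i, x in enumerate(xs):
        edges += [(("u", i), ("v", j)) for j in range(n) if x[j] == 1]
    # The coordinator creates one path over {v_j | Y_j = 1} and one over {v_j | Y_j = 0}.
    for bit in (1, 0):
        path = [("v", j) for j in range(n) if y[j] == bit]
        edges += list(zip(path, path[1:]))
    nodes = [("u", i) for i in range(len(xs))] + [("v", j) for j in range(n)]
    return nodes, edges


def is_connected(nodes, edges):
    """Depth-first search from an arbitrary node."""
    adj = {v: [] for v in nodes}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        cur = stack.pop()
        for nxt in adj[cur]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(nodes)


def some_pair_intersects(xs, y):
    """OR over i of [X_i intersects Y] (assumed meaning of DISJ = 1)."""
    return any(any(a and b for a, b in zip(x, y)) for x in xs)


y = [1, 1, 0, 0, 0]
no_hit = [[0, 0, 1, 0, 0], [0, 0, 0, 1, 1]]   # no X_i touches {j | Y_j = 1}
one_hit = [[0, 0, 1, 0, 0], [0, 1, 0, 1, 0]]  # X_2 touches both paths
results = [
    (is_connected(*build_lb_graph(xs, y)), some_pair_intersects(xs, y))
    for xs in (no_hit, one_hit)
]
```

In both examples connectivity agrees with the OR of the intersection predicates, which is the equivalence the reduction relies on under the stated assumptions.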

  17. What if the graph is node partitioned? In most practical systems, the graph is node partitioned. Can we prove a similar LB?
      – In the LB graph for edge partition (graph connected $\Leftrightarrow$ $\mathrm{DISJ}(X_1, Y) \vee \ldots \vee \mathrm{DISJ}(X_k, Y) = 1$), basically only the bottom nodes $u_1, \ldots, u_k$ (and their adjacent edges) are partitioned.
      – If we also partition the top nodes (and their adjacent edges), then the $\Omega(kn)$ LB does not hold.
      – Not a surprise: if a graph is node partitioned, $\tilde{O}(n)$ suffices [Ahn, Guha, McGregor '12].

  19. Input sharing. To prove an LB in the node partition model, one needs to deal with input sharing: each edge may be stored in two sites. Need new techniques?
      A concrete problem: Breadth First Search Tree. Given a node $u$, the parties want to jointly compute a BFS tree rooted at $u$. The coordinator outputs the final BFS tree. What is the comm. complexity?
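
To make input sharing concrete, here is a small sketch of a node-partitioned graph and a naive baseline for the BFS tree problem. It assumes that each site stores the full adjacency list of every node it owns, and that the coordinator simply collects all lists; the function names and the `owner` map are hypothetical, and this baseline says nothing about the optimal communication cost, which is the open question.

```python
from collections import deque


def node_partition(adj, owner):
    """Give each site the full adjacency list of every node it owns."""
    sites = {}
    for node, nbrs in adj.items():
        sites.setdefault(owner[node], {})[node] = set(nbrs)
    return sites


def shared_edges(sites, owner):
    """Edges whose endpoints live on different sites, hence stored twice."""
    shared = set()
    for site_id, lists in sites.items():
        for node, nbrs in lists.items():
            for nbr in nbrs:
                if owner[nbr] != site_id:
                    shared.add(frozenset((node, nbr)))
    return shared


def bfs_tree(sites, root):
    """Naive baseline: the coordinator gathers every list, then runs BFS.

    Returns a map from each reachable node to its parent in the BFS tree.
    """
    adj = {}
    for lists in sites.values():
        adj.update(lists)
    parent = {root: None}
    queue = deque([root])
    while queue:
        cur = queue.popleft()
        for nxt in sorted(adj[cur]):  # sorted only to make the tree deterministic
            if nxt not in parent:
                parent[nxt] = cur
                queue.append(nxt)
    return parent


# A 4-cycle 0-1-3-2-0 split across two sites.
adj = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}
owner = {0: 0, 1: 0, 2: 1, 3: 1}
sites = node_partition(adj, owner)
tree = bfs_tree(sites, root=0)
duplicated = shared_edges(sites, owner)
```

Here the edges `{0, 2}` and `{1, 3}` each appear in the input of both sites, which is exactly the kind of overlap that a lower-bound argument for the node partition model would have to handle.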

  20. Distributed joins

  21. Set-intersection join. $A_1, \ldots, A_m \subseteq [n] = \{1, 2, \ldots, n\}$, and $B_1, \ldots, B_m \subseteq [n]$.
      [Figure: matrix $A$ formed from $A_1, \ldots, A_m$ (e.g., skills of applicants) and matrix $B$ formed from $B_1, \ldots, B_m$ (e.g., skills required by job positions)]
      Set-Intersection Join (cardinality version): $\mathrm{SIJ}(A, B) = |\{(i, j) : C_{i,j} > 0\}|$, where $C = A \cdot B$.
      An important operation in databases.
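
A direct way to read the definition is the exact (non-communication-efficient) computation below. It is only a sketch: the function names are hypothetical, and the matrix version assumes that $A$ is the $m \times n$ indicator matrix whose rows are the $A_i$ and $B$ is the $n \times m$ indicator matrix whose columns are the $B_j$, so that $C_{i,j} = |A_i \cap B_j|$.

```python
def sij(a_sets, b_sets):
    """Number of pairs (i, j) such that A_i and B_j intersect."""
    return sum(1 for a in a_sets for b in b_sets if a & b)


def sij_via_matrix(a_sets, b_sets, n):
    """Same quantity via C = A . B on 0/1 indicator matrices over [n] = {1..n}."""
    a_rows = [[1 if e in a else 0 for e in range(1, n + 1)] for a in a_sets]
    b_cols = [[1 if e in b else 0 for e in range(1, n + 1)] for b in b_sets]
    count = 0
    for row in a_rows:
        for col in b_cols:
            c_ij = sum(x * y for x, y in zip(row, col))  # equals |A_i ∩ B_j|
            count += c_ij > 0
    return count


# Applicants' skills vs. skills required by positions, over [4].
applicants = [{1, 2}, {3}]
positions = [{2, 3}, {4}]
exact = sij(applicants, positions)
exact_matrix = sij_via_matrix(applicants, positions, n=4)
```

Both applicants match the first position and nobody matches the second, so the join has two pairs.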

  24. Set-intersection join (cont.) The problem: estimate $\mathrm{SIJ}(A, B)$ up to a $(1 + \epsilon)$ factor. Useful e.g. in query planning.
      Current LB: $\Omega(n/\epsilon^{2/3})$ (Van Gucht, Williams, Woodruff, Z. '15).
      – For each $i \in [m]$, choose $(A_i, B_i) \sim \mu$, where $\mu$ is a hard input distribution for set-disjointness.
      – Define $\mathrm{SUM}(A, B) = \sum_{i \in [m]} \mathrm{DISJ}(A_i, B_i)$. W.h.p. $\mathrm{SIJ}(A, B) = \mathrm{SUM}(A, B) + m(m-1)$.
      – Using basically a direct-sum (Gap-hamming + DISJ), any rand. algo. that computes $\mathrm{SUM}(A, B)$ w.pr. 0.99 up to an additive error $\sqrt{m}/2$ needs $\Omega(mn)$ comm.
      – Set $m = 1/\epsilon^{2/3}$ to get the $\Omega(n/\epsilon^{2/3})$ LB.
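
One way to see where the choice of $m$ comes from is the following back-of-the-envelope calculation. It assumes that the additive-error threshold for $\mathrm{SUM}$ is of order $\sqrt{m}$ (as is typical for Gap-Hamming-style bounds) and that $\mathrm{SIJ}(A, B) \le m^2$; the estimate $\widehat{S}$ is a hypothetical name for the output of a $(1+\epsilon)$-approximation algorithm.

```latex
\begin{align*}
  |\widehat{S} - \mathrm{SIJ}(A,B)| &\le \epsilon \cdot \mathrm{SIJ}(A,B) \le \epsilon m^2
    && \text{($(1+\epsilon)$-approximation, $\mathrm{SIJ} \le m^2$)} \\
  \bigl|(\widehat{S} - m(m-1)) - \mathrm{SUM}(A,B)\bigr| &\le \epsilon m^2
    && \text{(w.h.p.\ $\mathrm{SIJ} = \mathrm{SUM} + m(m-1)$)} \\
  \epsilon m^2 \le \tfrac{1}{2}\sqrt{m}
    \;&\Longleftrightarrow\; m \le (2\epsilon)^{-2/3}
    && \text{(error small enough for the $\mathrm{SUM}$ bound)} \\
  \Omega(mn) &= \Omega\!\left(n / \epsilon^{2/3}\right)
    && \text{(with $m = \Theta(\epsilon^{-2/3})$)}
\end{align*}
```

So an algorithm that approximates $\mathrm{SIJ}$ within $(1+\epsilon)$ would, after subtracting $m(m-1)$, solve the $\mathrm{SUM}$ problem for $m$ on the order of $\epsilon^{-2/3}$.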

  25. Set-intersection join (cont.) The current best UB: $\tilde{O}(m/\epsilon^2)$ using an $F_0$-sketch, and it is one-way. Can we prove an $\Omega(n/\epsilon^2)$ LB?
      Not enough to apply a direct-sum type argument on $(A_1, B_1), \ldots, (A_m, B_m)$, since each $A_i$ is going to join each $B_j$. In other words, the primitive problems overlap. Need new techniques?

  26. Sketching threshold edit distance

  30. Edit Distance. Definition: Given two strings $s, t \in \Sigma^n$, $\mathrm{ed}(s, t)$ = minimum number of character operations (insertion/deletion/substitution) that transform $s$ to $t$.
      Example: ed(banana, ananas) = 2.
      Applications: numerous. E.g., bioinformatics (measuring similarity between DNA seq.), automatic spelling correction.
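
For reference, the definition can be evaluated with the standard dynamic program below. This is the textbook quadratic-time algorithm run by a single party that sees both strings, not a sketching or communication protocol; the function name is an illustrative choice.

```python
def edit_distance(s: str, t: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning s into t."""
    # prev[j] = distance between the current prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]  # turning s[:i] into the empty string takes i deletions
        for j, ct in enumerate(t, start=1):
            cur.append(min(
                prev[j] + 1,               # delete cs
                cur[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),  # match or substitute
            ))
        prev = cur
    return prev[-1]


d = edit_distance("banana", "ananas")  # delete the leading "b", append "s"
```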

  32. Problems. The threshold version of ED: Given two strings $s, t \in \{0, 1\}^n$ and a threshold $K$, output all the edits if $\mathrm{ed}(s, t) \le K$; output "Error" otherwise.
      Document exchange (one-way comm.): the party holding $s$ sends a sketch $\mathrm{sk}(s)$ to the party holding $t$. App: remote file sync; file transmission through a noisy channel.
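
The required output can be illustrated with a centralized reference implementation. This is only a sketch of the input/output behaviour, assuming a single party that holds both strings; it does not build any sketch $\mathrm{sk}(s)$ and uses the full quadratic dynamic program. The function name and the edit encoding (operation, 0-based position in $s$, character) are hypothetical choices.

```python
def threshold_edits(s: str, t: str, k: int):
    """Return a list of edits turning s into t if ed(s, t) <= k, else "Error"."""
    n, m = len(s), len(t)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist[i][j] = min(
                dist[i - 1][j] + 1,                           # delete s[i-1]
                dist[i][j - 1] + 1,                           # insert t[j-1]
                dist[i - 1][j - 1] + (s[i - 1] != t[j - 1]),  # match or substitute
            )
    if dist[n][m] > k:
        return "Error"

    # Walk back from (n, m) to (0, 0), recording one optimal edit script.
    edits = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i - 1] == t[j - 1] and dist[i][j] == dist[i - 1][j - 1]:
            i, j = i - 1, j - 1  # characters match, no edit
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            edits.append(("substitute", i - 1, t[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            edits.append(("delete", i - 1, s[i - 1]))
            i -= 1
        else:
            edits.append(("insert", i, t[j - 1]))  # insert before position i of s
            j -= 1
    edits.reverse()
    return edits


within = threshold_edits("0110", "0100", k=1)
too_far = threshold_edits("0000", "1111", k=2)
```

In the document exchange setting the interesting question is how short $\mathrm{sk}(s)$ can be while still letting the receiver produce this output; the code above only pins down what a correct answer looks like.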
