Communication Complexity in the Field: New Questions from Practice

Qin Zhang, Indiana University Bloomington
BIRS Workshop, March 20, 2017

This talk

Not on a particular problem. I will try to present a few new questions that I have encountered when trying to apply communication complexity in various settings.

Agenda

I will talk about:
1. Number-in-hand CC with input sharing
   – Distributed computation of graph problems
2. Primitive problems overlap; direct-sum does not apply
   – Distributed joins
3. Higher LB in simultaneous comm. than in one-way comm.?
   – Sketching edit distance

Distributed graph computation

Real-world systems: Pregel, Giraph, GPS, GraphLab, etc.

The coordinator model

We have $k$ machines (sites) and one central server (coordinator).
– Each site has a 2-way comm. channel with the coordinator.
– Each site $S_i$ has a piece of data $x_i$.
– Task: compute $f(x_1, \dots, x_k)$ together via comm., for some $f$. The coordinator outputs the answer.
– Goal: minimize total communication.

[Figure: coordinator $C$ connected to sites $S_1, S_2, S_3, \dots, S_k$]

Distributed graph computation

Let's think about the graph connectivity problem: $k$ sites each hold a portion of a graph. Goal: compute whether the graph is connected.

A trivial solution: each $S_i$ sends a local spanning forest to $C$. Cost: $O(kn \log n)$ bits, where $n$ is the number of nodes of the graph.

Can we do better, e.g., $o(kn)$ bits of comm. in total?

If the graph is edge partitioned among the $k$ sites, the answer is no: $\Omega(kn)$ [Woodruff, Z. ’13].

[Figure: coordinator $C$ connected to sites $S_1, S_2, S_3, \dots, S_k$]
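The trivial solution can be sketched in a few lines. This is a minimal simulation on one machine, not a networked implementation; the names `UnionFind`, `local_spanning_forest`, and `coordinator_is_connected` are hypothetical and chosen only for illustration. Each site keeps at most $n - 1$ edges, and each edge costs $O(\log n)$ bits to write down, which is where the $O(kn \log n)$ total comes from.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]


class UnionFind:
    """Disjoint-set structure over nodes 0..n-1."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.components = n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        """Merge the components of a and b; return True if they were distinct."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        self.components -= 1
        return True


def local_spanning_forest(n: int, edges: Iterable[Edge]) -> List[Edge]:
    """What a site sends: a spanning forest of its own edges (at most n-1 edges)."""
    uf = UnionFind(n)
    return [(u, v) for (u, v) in edges if uf.union(u, v)]


def coordinator_is_connected(n: int, site_edges: List[List[Edge]]) -> bool:
    """The coordinator merges the k forests and checks for a single component."""
    uf = UnionFind(n)
    for edges in site_edges:
        for u, v in local_spanning_forest(n, edges):  # one site's message
            uf.union(u, v)
    return uf.components == 1


# Two sites, four nodes: site 0 holds a triangle, site 1 holds one edge.
sites = [[(0, 1), (1, 2), (0, 2)], [(2, 3)]]
print(len(local_spanning_forest(4, sites[0])))   # the triangle shrinks to 2 edges
print(coordinator_is_connected(4, sites))
```

Replacing each site's edge set by a spanning forest does not change which nodes are connected, so the coordinator's answer matches the answer on the full graph.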

LB graph for edge partition

For each $i \in [k]$, $(X_i, Y) \sim \mu$, where $\mu$ is a hard input distribution for set-disjointness.

Each site $S_i$ holding $X_i = \{X_{i,1}, \dots, X_{i,n}\}$ creates an edge $(u_i, v_j)$ for each $X_{i,j} = 1$.

The coordinator holding $Y = \{Y_1, \dots, Y_n\}$ creates a path containing $\{v_j \mid Y_j = 1\}$ and a path containing $\{v_j \mid Y_j = 0\}$.

[Figure: top row, the two paths over $\{v_j \mid Y_j = 1\}$ and $\{v_j \mid Y_j = 0\}$; bottom row, nodes $u_1, u_2, u_3, \dots, u_k$ labeled $(X_1), (X_2), (X_3), \dots, (X_k)$]

Graph connected $\Leftrightarrow$ $\mathrm{DISJ}(X_1, Y) \vee \dots \vee \mathrm{DISJ}(X_k, Y) = 1$ (LB: $\Omega(kn)$)
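A toy instance can make the reduction concrete. The sketch below builds the graph described above and checks connectivity. Two things here are my assumptions rather than statements from the slides: $\mathrm{DISJ}(X_i, Y) = 1$ is read as "$X_i$ and $Y$ share an element", and every $X_i$ in the example has at least one element outside $Y$, so each $u_i$ is attached to the $Y_j = 0$ path. The function names are hypothetical.

```python
from typing import List


def lb_graph_is_connected(X: List[List[int]], Y: List[int]) -> bool:
    """Build the lower-bound graph from 0/1 vectors and test connectivity."""
    k, n = len(X), len(Y)
    # Nodes 0..k-1 are u_1..u_k; nodes k..k+n-1 are v_1..v_n.
    parent = list(range(k + n))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a: int, b: int) -> None:
        parent[find(a)] = find(b)

    # Site S_i adds edge (u_i, v_j) whenever X[i][j] == 1.
    for i in range(k):
        for j in range(n):
            if X[i][j] == 1:
                union(i, k + j)

    # The coordinator adds one path over {v_j : Y_j = 1} and one over {v_j : Y_j = 0}.
    for bit in (0, 1):
        side = [k + j for j in range(n) if Y[j] == bit]
        for a, b in zip(side, side[1:]):
            union(a, b)

    return len({find(a) for a in range(k + n)}) == 1


def some_site_intersects(X: List[List[int]], Y: List[int]) -> bool:
    """OR over i of 'X_i and Y share an element' (assumed reading of DISJ = 1)."""
    return any(x == 1 and y == 1 for row in X for x, y in zip(row, Y))


Y = [1, 1, 0, 0]
X_disjoint = [[0, 0, 1, 0], [0, 0, 0, 1]]   # both sites miss Y
X_hit = [[0, 0, 1, 0], [0, 1, 0, 1]]        # site 2 shares element 2 with Y

for X in (X_disjoint, X_hit):
    print(lb_graph_is_connected(X, Y), some_site_intersects(X, Y))
```

On inputs that violate the second assumption (some $u_i$ with no edge to the $Y_j = 0$ path) the two functions can disagree, so this is an illustration of the construction on well-formed instances, not a proof of the equivalence.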

What if the graph is node partitioned?

In most practical systems, the graph is node partitioned. Can we prove a similar LB?

[Figure: the LB graph for edge partition, with path nodes $v_j$ on top and nodes $u_1, u_2, u_3, \dots, u_k$ at the bottom; graph connected $\Leftrightarrow$ $\mathrm{DISJ}(X_1, Y) \vee \dots \vee \mathrm{DISJ}(X_k, Y) = 1$]

Basically, in that construction only the bottom nodes (and their adjacent edges) are partitioned. If we also partition the top nodes (and their adjacent edges), then the $\Omega(kn)$ LB does not hold.

Not a surprise: if a graph is node partitioned, $\tilde{O}(n)$ suffices. [Ahn, Guha, McGregor ’12]

Input sharing

To prove an LB in the node partition model, one needs to deal with input sharing: each edge may be stored in two sites. Need new techniques?

A concrete problem: Breadth First Search Tree
Given a node $u$, the parties want to jointly compute a BFS tree rooted at $u$. The coordinator outputs the final BFS tree. What is the comm. complexity?
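To make "input sharing" and the target output concrete, here is a small centralized sketch. It assumes a node partition in which each site stores the full adjacency list of every node it owns, so an edge whose endpoints live on different sites is stored twice. The functions `node_partition` and `bfs_tree` and the `owner` map are hypothetical names; the BFS is the ordinary sequential algorithm and says nothing about the communication cost asked about above.

```python
from collections import deque
from typing import Dict, List, Optional


def node_partition(adj: Dict[int, List[int]], owner: Dict[int, int],
                   k: int) -> List[Dict[int, List[int]]]:
    """Site i stores the full adjacency list of every node it owns."""
    sites: List[Dict[int, List[int]]] = [dict() for _ in range(k)]
    for v, nbrs in adj.items():
        sites[owner[v]][v] = list(nbrs)
    return sites


def bfs_tree(adj: Dict[int, List[int]], root: int) -> Dict[int, Optional[int]]:
    """Return the BFS tree as a child -> parent map (the root maps to None)."""
    parent: Dict[int, Optional[int]] = {root: None}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return parent


# A 4-cycle 0-1-3-2-0, split across two sites.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
owner = {0: 0, 1: 0, 2: 1, 3: 1}
sites = node_partition(adj, owner, k=2)

# Edge (0, 2) is visible to both sites: as 2 in N(0) and as 0 in N(2).
print(2 in sites[0][0], 0 in sites[1][2])
print(bfs_tree(adj, root=0))
```

The edge $(0, 2)$ appearing in both sites' inputs is exactly the overlap that a lower-bound argument for this model would have to handle.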

Distributed joins

Set-intersection join

$A_1, \dots, A_m \subseteq [n] = \{1, 2, \dots, n\}$, and $B_1, \dots, B_m \subseteq [n]$.

[Figure: matrix $A$ with rows $A_1, \dots, A_m$ (e.g., skills of applicants) and matrix $B$ with columns $B_1, \dots, B_m$ (e.g., skills required by job positions)]

Set-Intersection Join (cardinality version):
$\mathrm{SIJ}(A, B) = |\{(i, j) : C_{i,j} > 0\}|$, where $C = A \cdot B$.

An important operation in databases.
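The definition can be checked on a tiny example. The sketch below computes the cardinality version two ways: directly on sets, and through the 0/1 matrix product $C = A \cdot B$, where I assume row $i$ of $A$ is the indicator vector of $A_i$ and column $j$ of $B$ is the indicator vector of $B_j$, so that $C_{i,j} = |A_i \cap B_j|$. The function names are hypothetical, and this is an exact centralized computation, not a communication protocol.

```python
from typing import List, Set


def sij_sets(A: List[Set[int]], B: List[Set[int]]) -> int:
    """Number of pairs (i, j) whose sets A_i and B_j intersect."""
    return sum(1 for a in A for b in B if a & b)


def sij_matrix(A: List[Set[int]], B: List[Set[int]], n: int) -> int:
    """Same count via C = A * B on 0/1 indicator matrices over [n] = {1..n}."""
    rows = [[1 if e in a else 0 for e in range(1, n + 1)] for a in A]
    cols = [[1 if e in b else 0 for e in range(1, n + 1)] for b in B]
    count = 0
    for r in rows:
        for c in cols:
            c_ij = sum(x * y for x, y in zip(r, c))  # equals |A_i ∩ B_j|
            if c_ij > 0:
                count += 1
    return count


# Applicants' skills and jobs' required skills over the universe [4].
A = [{1, 2}, {3}]
B = [{2, 3}, {4}]
print(sij_sets(A, B), sij_matrix(A, B, n=4))
```

Here applicant 1 and applicant 2 each match job 1 and nobody matches job 2, so both functions count two pairs.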

Set-intersection join (cont.)

The problem: estimate $\mathrm{SIJ}(A, B)$ up to a $(1 + \epsilon)$ factor. Useful, e.g., in query planning.

Current LB: $\Omega(n/\epsilon^{2/3})$ (Van Gucht, Williams, Woodruff, Z. ’15)

– For each $i \in [m]$, choose $(A_i, B_i) \sim \mu$, where $\mu$ is a hard input distribution for set-disjointness.
– Define $\mathrm{SUM}(A, B) = \sum_{i \in [m]} \mathrm{DISJ}(A_i, B_i)$.
– W.h.p. $\mathrm{SIJ}(A, B) = \mathrm{SUM}(A, B) + m(m-1)$.
– Using basically a direct-sum (Gap-Hamming + DISJ), any randomized algorithm that computes $\mathrm{SUM}(A, B)$ w.pr. 0.99 up to an additive error $\sqrt{m}/2$ needs $\Omega(mn)$ comm.
– Set $m = 1/\epsilon^{2/3}$ to get the $\Omega(n/\epsilon^{2/3})$ LB.
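The choice $m = 1/\epsilon^{2/3}$ can be motivated by a short calculation. This is my reading of the argument, not a step shown on the slide. It assumes that $\mathrm{SIJ}(A, B) = \Theta(m^2)$ on these inputs (which follows from the w.h.p. identity above, since $\mathrm{SUM} \le m$) and that the additive error tolerated for $\mathrm{SUM}$ is on the order of $\sqrt{m}$, as in Gap-Hamming; constant factors are ignored.

```latex
% A (1 + \epsilon)-approximation of SIJ has additive error at most
% \epsilon \cdot SIJ, and subtracting the known term m(m-1) turns it into
% an estimate of SUM with the same additive error.
\[
  \epsilon \cdot \mathrm{SIJ}(A, B) = O(\epsilon m^2).
\]
% Plugging in m = \epsilon^{-2/3}:
\[
  \epsilon m^2
  = \epsilon \cdot \epsilon^{-4/3}
  = \epsilon^{-1/3}
  = \sqrt{\epsilon^{-2/3}}
  = \sqrt{m},
\]
% so the approximation is just accurate enough to solve the SUM problem,
% and the \Omega(mn) bound for SUM becomes
\[
  \Omega(mn) = \Omega\!\left(n / \epsilon^{2/3}\right).
\]
```

A larger $m$ would make the error $\epsilon m^2$ exceed $\sqrt{m}$, so the reduction from $\mathrm{SUM}$ would no longer go through in this form.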

Set-intersection join (cont.)

The current best UB: $\tilde{O}(m/\epsilon^2)$ using an $F_0$-sketch; it is one-way.

Can we prove an $\Omega(n/\epsilon^2)$ LB?

It is not enough to apply a direct-sum type argument on $(A_1, B_1), \dots, (A_m, B_m)$, since each $A_i$ is going to join with each $B_j$. In other words, the primitive problems overlap.

Need new techniques?

Sketching threshold edit distance

Edit Distance

Definition: Given two strings $s, t \in \Sigma^n$, $\mathrm{ed}(s, t)$ is the minimum number of character operations (insertion/deletion/substitution) that transform $s$ to $t$.

Example: ed(banana, ananas) = 2

Applications: numerous. E.g., bioinformatics (measuring similarity between DNA sequences), automatic spelling correction.
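The definition corresponds to the textbook dynamic program, sketched below with two rolling rows. The function name `edit_distance` is hypothetical, and this is the standard quadratic-time algorithm, not anything specific to the talk.

```python
def edit_distance(s: str, t: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning s into t."""
    # prev[j] holds the distance between the first i-1 characters of s
    # and the first j characters of t.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]  # turning s[:i] into the empty string takes i deletions
        for j, ct in enumerate(t, start=1):
            cur.append(min(
                prev[j] + 1,               # delete cs
                cur[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


print(edit_distance("banana", "ananas"))
```

For the slide's example, deleting the leading `b` and appending `s` gives two operations, and no single operation can work because the strings have equal length and differ in every position.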

Problems

The threshold version of ED: Given two strings $s, t \in \{0, 1\}^n$ and a threshold $K$, output all the edits if $\mathrm{ed}(s, t) \le K$; output “Error” otherwise.

[Figure: document exchange; a sketch sk(s) of s is sent to the party holding t]

Applications: remote file sync; file transmission through a noisy channel.

One-way comm.
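The input/output behavior of the threshold problem can be pinned down with a centralized reference sketch. It sees both strings in full, so it is not a sketching or document exchange protocol, where the receiver holds only $t$ and sk(s). The function name `threshold_edits` and the `(operation, position, character)` encoding of an edit are hypothetical choices for illustration.

```python
from typing import List, Tuple, Union

Edit = Tuple[str, int, str]  # (operation, position in s, character involved)


def threshold_edits(s: str, t: str, K: int) -> Union[List[Edit], str]:
    """Return one optimal edit script if ed(s, t) <= K, else the string "Error"."""
    n, m = len(s), len(t)
    # D[i][j] = edit distance between s[:i] and t[:j].
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + 1,
                          D[i][j - 1] + 1,
                          D[i - 1][j - 1] + (s[i - 1] != t[j - 1]))
    if D[n][m] > K:
        return "Error"

    # Walk back from (n, m) to (0, 0), recording the edits on one optimal path.
    edits: List[Edit] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            if s[i - 1] != t[j - 1]:
                edits.append(("substitute", i - 1, t[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            edits.append(("delete", i - 1, s[i - 1]))
            i -= 1
        else:
            edits.append(("insert", i, t[j - 1]))
            j -= 1
    edits.reverse()
    return edits


print(threshold_edits("0110", "0100", K=1))
print(threshold_edits("0000", "1111", K=2))
```

The first call finds a single substitution at position 2; the second needs four edits, which exceeds $K = 2$, so it reports "Error".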
