Sublinear Algorithms for Big Data

Qin Zhang. Part 2: Sublinear in Communication (PowerPoint presentation)



  1. Sublinear Algorithms for Big Data (Qin Zhang)

  2. Part 2: Sublinear in Communication

  3. Sublinear in communication. The model: k players hold inputs $x_1 = 010011$, $x_2 = 111011$, $x_3 = 111111$, ..., $x_k = 100011$, and they want to jointly compute $f(x_1, x_2, \ldots, x_k)$. Goal: minimize the total number of bits communicated. Applications, etc.

  4. A natural approach. The model: a coordinator C and sites $S_1, S_2, S_3, \ldots, S_k$, where site $S_i$ holds $x_i$ ($x_1 = 010011$, $x_2 = 111011$, $x_3 = 111111$, ..., $x_k = 100011$). They want to jointly compute $f(x_1, x_2, \ldots, x_k)$. Goal: minimize the total number of bits communicated. The natural approach: each $S_i$ computes a sketch $sk(S_i)$ of its input and sends it to C, and then C computes $f(x_1, \ldots, x_k)$ based on $sk(S_1), \ldots, sk(S_k)$. The slides from the next page on are borrowed from Andrew McGregor.
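
Here is a minimal Python sketch of the coordinator model. For illustration it assumes that f is the total number of 1 bits across all inputs. In this toy case $sk(S_i)$ can simply be the popcount of $x_i$, which needs about $\log_2(n+1)$ bits instead of n. The functions `sketch`, `coordinator`, and `run_protocol` are hypothetical names and do not come from the slides.

```python
import math


def sketch(x: str) -> int:
    """Hypothetical sk(S_i): the number of 1 bits in site S_i's input x_i."""
    return x.count("1")


def coordinator(sketches: list[int]) -> int:
    """C computes f(x_1, ..., x_k) = total number of 1 bits, using only the sketches."""
    return sum(sketches)


def run_protocol(inputs: list[str]) -> tuple[int, int]:
    """Every site sends one sketch to C. Returns (f value, total bits communicated)."""
    n = len(inputs[0])
    bits_per_sketch = math.ceil(math.log2(n + 1))  # a count in [0, n] needs this many bits
    value = coordinator([sketch(x) for x in inputs])
    return value, bits_per_sketch * len(inputs)


if __name__ == "__main__":
    xs = ["010011", "111011", "111111", "100011"]  # x_1, x_2, x_3, x_k with k = 4
    print(run_protocol(xs))  # (17, 12): 12 bits instead of 4 * 6 = 24 raw bits
```

Richer functions f need genuinely lossy sketches. The connectivity sketches in the following slides are one example.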

  5. I. Connectivity; II. k-Connectivity; III. Min-Cut

  6. I. Connectivity. Theorem (testing connectivity): a) Dynamic graph stream: O(n polylog n) space. b) Simultaneous messages: O(polylog n) length.

  17. Ingredient 1: Basic Algorithm
      Algorithm (Spanning Forest):
        1. For each node: pick an incident edge.
        2. For each connected component: pick an incident edge.
        3. Repeat step 2 until there are no edges between connected components.
      Lemma: After O(log n) rounds, the selected edges include a spanning forest.
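
The basic algorithm is essentially Borůvka's algorithm. The Python sketch below assumes an undirected edge list over nodes 0 to n - 1, and its names are illustrative. In each round, every current component picks one edge leaving it, and the picked edges are merged with union-find. Every component that still has an outgoing edge merges with at least one other component. So the number of components in each connected part at least halves per round, which gives the O(log n) bound in the lemma.

```python
import math


def spanning_forest(n, edges):
    """Boruvka-style spanning forest: every component picks one incident edge per round.

    Returns (forest edges, number of rounds that found at least one edge).
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest = set()
    rounds = 0
    while True:
        # Each current component picks one edge leaving it (here: the first one found).
        picked = {}
        for u, v in edges:
            ru, rv = find(u), find(v)
            if ru != rv:
                picked.setdefault(ru, (u, v))
                picked.setdefault(rv, (u, v))
        if not picked:  # no edges between components: done
            return forest, rounds
        rounds += 1
        for u, v in picked.values():
            ru, rv = find(u), find(v)
            if ru != rv:  # skip edges that would close a cycle after earlier merges
                parent[ru] = rv
                forest.add((u, v))


if __name__ == "__main__":
    n = 8
    path = [(i, i + 1) for i in range(n - 1)]
    forest, rounds = spanning_forest(n, path)
    print(len(forest), rounds, rounds <= math.ceil(math.log2(n)))
```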

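  23. Ingredient 2: Sketching Neighborhoods
      For node i, let $a_i$ be a vector indexed by node pairs. Its non-zero entries are $a_i[i,j] = 1$ if $j > i$ and $a_i[i,j] = -1$ if $j < i$, for each edge $\{i, j\}$.
      [Figure: example graph on nodes 1 to 5]
      Example, with coordinates ordered {1,2}, {1,3}, {1,4}, {1,5}, {2,3}, {2,4}, {2,5}, {3,4}, {3,5}, {4,5}:
      $$a_1 = (1, 0, 1, 0, 0, 0, 0, 0, 0, 0), \quad a_2 = (-1, 0, 0, 0, 0, 1, 0, 0, 0, 0), \quad a_1 + a_2 = (0, 0, 1, 0, 0, 1, 0, 0, 0, 0).$$
      The entry for the internal edge {1,2} cancels, leaving the edges {1,4} and {2,4} that leave {1,2}.
      Lemma: For any subset of nodes $S \subset V$, $\operatorname{support}\big(\sum_{i \in S} a_i\big) = E(S, V \setminus S)$.
      Lemma: There exists a random $M: \mathbb{R}^N \to \mathbb{R}^k$ with $k = O(\operatorname{polylog} N)$ such that for any $a \in \mathbb{R}^N$, with high probability $Ma \to e \in \operatorname{support}(a)$; that is, some $e \in \operatorname{support}(a)$ can be recovered from $Ma$.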
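
The first lemma can be checked directly on a small example. This Python sketch stores each $a_i$ as a sparse dict keyed by node pairs. It sums the vectors over a node set S and compares the surviving keys with $E(S, V \setminus S)$. The edge list is a hypothetical graph chosen to agree with the slide's $a_1$ and $a_2$ (edges {1,2}, {1,4}, {2,4}); its remaining edges are made up.

```python
from collections import defaultdict


def incidence_vector(i, edges):
    """a_i as a sparse dict: a_i[{i, j}] = +1 if j > i and -1 if j < i, for each edge {i, j}."""
    a = {}
    for u, v in edges:
        if i in (u, v):
            j = v if u == i else u
            a[(min(i, j), max(i, j))] = 1 if j > i else -1
    return a


def vector_sum(vectors):
    """Coordinate-wise sum of sparse vectors; entries that cancel to 0 are dropped."""
    total = defaultdict(int)
    for vec in vectors:
        for key, val in vec.items():
            total[key] += val
    return {key: val for key, val in total.items() if val != 0}


def cut_edges(S, edges):
    """E(S, V \\ S): edges with exactly one endpoint in S."""
    return {(min(u, v), max(u, v)) for u, v in edges if (u in S) != (v in S)}


if __name__ == "__main__":
    # Hypothetical graph on nodes 1..5; only the edges at nodes 1 and 2 come from the slide.
    E = [(1, 2), (1, 4), (2, 4), (3, 4), (3, 5)]
    print(vector_sum([incidence_vector(1, E), incidence_vector(2, E)]))
    # {(1, 4): 1, (2, 4): 1}: the internal edge {1, 2} cancels
    S = {1, 2, 4}
    print(set(vector_sum(incidence_vector(i, E) for i in S)) == cut_edges(S, E))  # True
```

The cancellation happens because every internal edge contributes +1 from its smaller endpoint and -1 from its larger one. A cut edge contributes only once.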

  31. Recipe: Sketch & Compute on Sketches
      Sketch: Each player j sends $Ma_j$.
      Central player runs the algorithm in sketch space:
        Use $Ma_j$ to get an incident edge on each node j.
        For i = 2 to log n: to get an incident edge on a component $S \subset V$, use
        $$\sum_{j \in S} M a_j = M\Big(\sum_{j \in S} a_j\Big) \longrightarrow e \in \operatorname{support}\Big(\sum_{j \in S} a_j\Big) = E(S, V \setminus S).$$
      Detail: Actually, each player sends log n independent sketches $M_1 a_j, M_2 a_j, \ldots$, and the central player uses $M_i a_j$ when emulating the i-th iteration of the algorithm.
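
The recipe can be emulated end to end in Python. In this sketch, `L0Sampler` is a hypothetical stand-in for the random M of the lemma. It is built from geometric subsampling plus 1-sparse recovery with a polynomial fingerprint, and it is not necessarily the construction the slides have in mind. Each copy succeeds only with constant probability, so each $M_r$ bundles `reps` independent copies (an assumption of this sketch). Round r uses only $M_r$, matching the "Detail" above. Storing a level per coordinate keeps the code short but is not sublinear in space; a real implementation would derive levels from a hash function.

```python
import math
import random

P = (1 << 61) - 1  # Mersenne prime for fingerprints


class L0Sampler:
    """Hypothetical linear sketch M: recovers some index in support(a) with constant probability."""

    def __init__(self, dim, rng):
        self.dim = dim
        self.levels = math.ceil(math.log2(dim)) + 1
        # Coordinate e survives subsampling levels 0..depth[e], with P(depth >= j) = 2^-j.
        self.depth = []
        for _ in range(dim):
            d = 0
            while d < self.levels - 1 and rng.random() < 0.5:
                d += 1
            self.depth.append(d)
        self.z = rng.randrange(2, P - 1)  # random evaluation point for fingerprints

    def sketch(self, vec):
        """vec: {index: value}. Per level: (sum v, sum v*e, sum v*z^e mod P)."""
        out = [[0, 0, 0] for _ in range(self.levels)]
        for e, v in vec.items():
            for j in range(self.depth[e] + 1):
                out[j][0] += v
                out[j][1] += v * e
                out[j][2] = (out[j][2] + v * pow(self.z, e, P)) % P
        return [tuple(t) for t in out]

    @staticmethod
    def add(s, t):
        """Linearity: add(sketch(a), sketch(b)) == sketch(a + b)."""
        return [(a0 + b0, a1 + b1, (a2 + b2) % P) for (a0, a1, a2), (b0, b1, b2) in zip(s, t)]

    def sample(self, sk):
        """Return e from the first level that passes the 1-sparse test, else None."""
        for s0, s1, s2 in sk:
            if s0 != 0 and s1 % s0 == 0:
                e = s1 // s0
                if 0 <= e < self.dim and s2 == s0 * pow(self.z, e, P) % P:
                    return e
        return None


def sketch_connectivity(n, edges, reps=12, seed=1):
    """Players send sketches once; the coordinator runs Boruvka on summed sketches."""
    rng = random.Random(seed)  # shared randomness defining M_1, ..., M_R
    rounds = max(1, math.ceil(math.log2(n)))
    dim = n * n  # pair {u, v} with u < v has index u * n + v
    M = [[L0Sampler(dim, rng) for _ in range(reps)] for _ in range(rounds)]

    nbrs = {j: set() for j in range(n)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    msg = {}  # player j's message: M_r a_j for every round r and repetition
    for j in range(n):
        a_j = {min(j, k) * n + max(j, k): (1 if k > j else -1) for k in nbrs[j]}
        msg[j] = [[m.sketch(a_j) for m in M[r]] for r in range(rounds)]

    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = set()
    for r in range(rounds):  # round r uses only the fresh sketch M_r
        comp = {}  # component root -> sum over j in S of M_r a_j = M_r(sum a_j)
        for j in range(n):
            c = find(j)
            comp[c] = msg[j][r] if c not in comp else [
                L0Sampler.add(x, y) for x, y in zip(comp[c], msg[j][r])]
        picked = []
        for sks in comp.values():
            for m, sk in zip(M[r], sks):
                e = m.sample(sk)
                if e is not None:  # e indexes an edge in E(S, V \ S)
                    picked.append(divmod(e, n))
                    break
        for u, v in picked:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                forest.add((u, v))
    return forest


if __name__ == "__main__":
    E = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (5, 6), (6, 7), (5, 7)]
    print(sorted(sketch_connectivity(8, E)))  # 8 nodes, 2 components
```

The coordinator never sees an edge directly. Every edge it adds is decoded from a sum of the players' messages.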

  33. II. k-Connectivity. Theorem (checking that every cut has size ≥ k): a) Dynamic graph stream: O(nk polylog n) space. b) Simultaneous messages: O(k polylog n) length.

  38. Ingredient 1: Basic Algorithm
      Algorithm (k-Connectivity):
        1. Let $F_1$ be a spanning forest of $G(V, E)$.
        2. For i = 2 to k: let $F_i$ be a spanning forest of $G(V, E - F_1 - \cdots - F_{i-1})$.
      Lemma: $G(V, F_1 + \cdots + F_k)$ is k-connected (every cut has size ≥ k) iff $G(V, E)$ is.
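
The lemma can be sanity-checked by brute force on small graphs. This Python sketch assumes a simple undirected graph. It builds $F_1 + \cdots + F_k$ by peeling off k union-find spanning forests. It then compares "every cut has size ≥ k" for the certificate and for G by enumerating all cuts, which is exponential in n, so only tiny graphs are feasible. The helper names are illustrative. The certificate has at most k(n - 1) edges regardless of |E|.

```python
from itertools import combinations


def spanning_forest(n, edges):
    """Greedy spanning forest via union-find (any spanning forest works for the lemma)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((u, v))
    return forest


def certificate(n, edges, k):
    """F_1 + ... + F_k, where F_i is a spanning forest of G(V, E - F_1 - ... - F_{i-1})."""
    remaining, union = list(edges), []
    for _ in range(k):
        f = spanning_forest(n, remaining)
        union += f
        used = set(f)
        remaining = [e for e in remaining if e not in used]
    return union


def min_cut(n, edges):
    """Brute-force min |E(S, V \\ S)| over nonempty proper subsets S (small n only)."""
    best = len(edges)
    for size in range(1, n):
        for S in combinations(range(n), size):
            s = set(S)
            best = min(best, sum((u in s) != (v in s) for u, v in edges))
    return best


if __name__ == "__main__":
    K5 = list(combinations(range(5), 2))  # complete graph: every cut has size >= 4
    for k in range(1, 6):
        cert = certificate(5, K5, k)
        print(k, len(cert), min_cut(5, cert) >= k, min_cut(5, K5) >= k)
```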

  44. Ingredient 2: Connectivity Sketches
      Sketch: Simultaneously construct k independent connectivity sketches $\{M_1 G, M_2 G, \ldots, M_k G\}$.
      Run the algorithm in sketch space:
        Use $M_1 G$ to find a spanning forest $F_1$ of G.
        Use $M_2 G - M_2 F_1 = M_2(G - F_1)$ to find $F_2$.
        Use $M_3 G - M_3 F_1 - M_3 F_2 = M_3(G - F_1 - F_2)$ to find $F_3$.
        Etc.
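
The step $M_2 G - M_2 F_1 = M_2(G - F_1)$ relies only on linearity. Each node vector of $G - F_1$ equals the node vector of G minus that of $F_1$. Below is a minimal Python check. It assumes $M_2$ is a random ±1 matrix (a hypothetical stand-in for the actual connectivity sketch) and that the sketch of a graph is the list of sketches $M_2 a_j$ of its node vectors.

```python
import random


def node_vectors(n, edges):
    """a_j for every node j, as dense lists indexed by pair {u, v} -> u * n + v (u < v)."""
    vecs = [[0] * (n * n) for _ in range(n)]
    for u, v in edges:
        u, v = min(u, v), max(u, v)
        vecs[u][u * n + v] += 1  # a_u[{u,v}] = +1 since v > u
        vecs[v][u * n + v] -= 1  # a_v[{u,v}] = -1 since u < v
    return vecs


def mat_vec(M, x):
    """Matrix-vector product M x over the integers."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def sketch_graph(M, n, edges):
    """Sketch of a graph: the sketch M a_j of every node vector."""
    return [mat_vec(M, a) for a in node_vectors(n, edges)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, rows = 5, 6
    M2 = [[rng.choice((-1, 1)) for _ in range(n * n)] for _ in range(rows)]
    G = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (1, 3)]
    F1 = [(0, 1), (1, 2), (2, 3), (3, 4)]  # a spanning forest of G
    G_minus_F1 = [e for e in G if e not in F1]
    lhs = [[g - f for g, f in zip(sg, sf)]
           for sg, sf in zip(sketch_graph(M2, n, G), sketch_graph(M2, n, F1))]
    print(lhs == sketch_graph(M2, n, G_minus_F1))  # True
```

Because the coordinator already knows $F_1$, it can compute $M_2 F_1$ itself. No further communication is needed between iterations.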

  46. III. Min-Cut. Theorem ((1+ε)-approximating the minimum cut): a) Dynamic graph stream: O(ε^{-2} n polylog n) space. b) Simultaneous messages: O(ε^{-2} polylog n) length.

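  50. Ingredient 1: Subsampling
      Lemma (Karger): Define the subgraph $G_i$ by sampling each edge of G with probability $2^{-i}$, where $G = G_0$. Then
      $$\text{Min-Cut}(G) = (1 \pm \epsilon) \cdot 2^i \cdot \text{Min-Cut}(G_i) \quad \text{if } i < -\log p^*, \quad \text{where } p^* = 6\epsilon^{-2} \log n / \text{Min-Cut}(G).$$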
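
Here is a small Python experiment with the lemma. It assumes a hypothetical multigraph: two triangles with 1000 parallel copies of each edge, joined by 600 parallel edges, so Min-Cut(G) = 600. It also assumes ε = 0.5, the natural logarithm inside $p^*$, and base 2 in $-\log p^*$ (the slide does not fix the bases). For each integer i < -log2 p*, it samples $G_i$ and compares $2^i \cdot \text{Min-Cut}(G_i)$ with Min-Cut(G). Min cuts are computed by brute force, so only tiny graphs work.

```python
import math
import random
from itertools import combinations


def min_cut(n, w):
    """Brute-force min cut of a multigraph given as {(u, v): multiplicity}; small n only."""
    best = math.inf
    for size in range(1, n):
        for S in combinations(range(n), size):
            s = set(S)
            best = min(best, sum(m for (u, v), m in w.items() if (u in s) != (v in s)))
    return best


def subsample(w, i, rng):
    """G_i: keep each parallel edge independently with probability 2^-i."""
    p = 2.0 ** -i
    return {e: sum(rng.random() < p for _ in range(m)) for e, m in w.items()}


def p_star(n, eps, lam):
    """p* = 6 eps^-2 log n / Min-Cut(G); the natural log here is an assumption."""
    return 6 * eps ** -2 * math.log(n) / lam


if __name__ == "__main__":
    rng = random.Random(7)
    w = {(0, 1): 1000, (1, 2): 1000, (0, 2): 1000,
         (3, 4): 1000, (4, 5): 1000, (3, 5): 1000, (2, 3): 600}
    eps, lam = 0.5, min_cut(6, w)
    i_max = math.ceil(-math.log2(p_star(6, eps, lam))) - 1  # largest integer i < -log2 p*
    for i in range(i_max + 1):
        est = 2 ** i * min_cut(6, subsample(w, i, rng))
        print(i, est, abs(est - lam) <= eps * lam)
```

Subsampling is what makes the small-space bound possible. On $G_i$ the minimum cut is only about $6\epsilon^{-2}\log n$, so a k-connectivity sketch with k of that order suffices for that level.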
