Mapreduce With Parallelizable Reduce S. Muthu Muthukrishnan - PowerPoint PPT Presentation

  1. Mapreduce With Parallelizable Reduce S. Muthu Muthukrishnan

  7. Some Premises
      - At a deliberately high level, we know the MapReduce system.
      - Parallel. Map and Reduce functions. Used when data is large. Changing system.
      - There is a nice PRAM theory of parallel algorithms.
      - NC, prefix sums, list ranking, and more.
      - Goal: Develop a useful theory of MapReduce algorithms.
      - An algorithmist's role: interesting problems, algorithms. A bridge from the other side.

  13. Thoughts Circa 2006
      - Prefix sum in $O(1)$ rounds.
      - Problem: $A[1, \ldots, n] \rightarrow PA[1, \ldots, n]$, where $PA[i] = \sum_{j \le i} A[j]$.
      - Solution:
        - Assign $A[i\sqrt{n} + 1, \ldots, (i+1)\sqrt{n}]$ to key $i$.
        - Solve the problem on $B[1, \ldots, \sqrt{n}]$ with one processor, where $B[i] = \sum_{j = i\sqrt{n}+1}^{(i+1)\sqrt{n}} A[j]$. Doable?
        - Solve the problem for key $i$ using $PB[i-1]$. Doable?
      - List ranking in $O(1)$ rounds?
      - Some graph algorithms in $O(1)$ rounds recently.
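The two-round prefix-sum scheme can be sketched in Python by simulating each key one after another on a single machine. Several details are assumptions and do not come from the slide:
- Keys are numbered from 0.
- The block size is $\lfloor\sqrt{n}\rfloor$, which covers the case where $n$ is not a perfect square.
- The function name `mapreduce_prefix_sums` is hypothetical.

```python
import math
from collections import defaultdict


def mapreduce_prefix_sums(A):
    """Prefix sums of A via the two-round sqrt(n)-block scheme, simulated sequentially."""
    n = len(A)
    block = max(1, math.isqrt(n))  # block size ~ sqrt(n); floor for non-square n
    # Round 1 (map): element j goes to key j // block, so key i holds one contiguous block.
    groups = defaultdict(list)
    for j, a in enumerate(A):
        groups[j // block].append(a)
    keys = sorted(groups)
    # Round 1 (reduce): each key emits its block total B[i].
    B = [sum(groups[i]) for i in keys]
    # One processor: exclusive prefix sums of B, i.e. the offset PB[i - 1] for key i.
    offsets, running = [], 0
    for b in B:
        offsets.append(running)
        running += b
    # Round 2 (reduce): key i scans its own block, starting from its offset.
    PA = []
    for i, offset in zip(keys, offsets):
        acc = offset
        for a in groups[i]:
            acc += a
            PA.append(acc)
    return PA


print(mapreduce_prefix_sums([3, 1, 4, 1, 5, 9, 2, 6]))  # -> [3, 4, 8, 9, 14, 23, 25, 31]
```

The "one processor" step only has to hold about $\sqrt{n}$ block totals. This is what makes the "Doable?" question plausible when $\sqrt{n}$ values fit on one machine.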

  16. SIROCCO Challenge
      - Problem: Given a graph $G = (V, E)$, count the number of triangles.¹
      - Solution:
        - For each edge $(u, v)$, generate a tuple $(u, v, 0)$.
        - For each vertex $v$ and for each pair of neighbors $x, z$ of $v$, generate a tuple $(x, z, 1)$.
        - The presence of both a 0 tuple and a 1 tuple for the same pair indicates a triangle. With each pair written in a fixed order (e.g. $x < z$), every triangle is detected once from each of its three vertices.
      - Solution: The number of triangles is $\frac{1}{6}\sum_i \lambda_i^3$, where the $\lambda_i$ are the eigenvalues of the adjacency matrix $A$ of $G$, in sorted order.
        - $(A^3)_{ii}$ is twice the number of triangles involving $i$: each such triangle gives two closed walks of length 3 from $i$, one in each direction.
        - The trace of $A^3$ is therefore 6 times the number of triangles.
        - If $\lambda$ is an eigenvalue of $A$, i.e., $Ax = \lambda x$, then $\lambda^3$ is an eigenvalue of $A^3$, so $\operatorname{tr}(A^3) = \sum_i \lambda_i^3$.
        - In practice, computing the top few eigenvalues suffices.
      ¹ For example, see: C. Tsourakakis, "Fast Counting of Triangles in Large Real Networks without Counting: Algorithms and Laws," ICDM 2008.
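Both triangle-counting solutions can be checked against each other on small graphs.

The first function, `triangles_by_tuples`, simulates the tuple join on one machine:
- Map: emit the 0 tuples for edges and the 1 tuples for wedges.
- Reduce: group the tuples by pair.

The second function, `triangles_by_trace`, computes $\operatorname{tr}(A^3)/6$ exactly with plain matrix products, which equals $\frac{1}{6}\sum_i \lambda_i^3$. It is not the approximate "top few eigenvalues" method the slide mentions. Both function names are hypothetical, and vertices are assumed to be integers $0, \ldots, n-1$.

```python
from collections import defaultdict
from itertools import combinations


def triangles_by_tuples(edges):
    """Count triangles by joining edge tuples (tag 0) with wedge tuples (tag 1)."""
    # Normalize so each undirected edge is one key (min, max); self-loops dropped.
    E = {(min(u, v), max(u, v)) for u, v in edges if u != v}
    adj = defaultdict(set)
    for a, b in E:
        adj[a].add(b)
        adj[b].add(a)
    # "Map": group all tuples by the pair (x, z).
    groups = defaultdict(list)
    for a, b in E:
        groups[(a, b)].append(0)                      # (u, v, 0) for each edge
    for v, nbrs in adj.items():
        for x, z in combinations(sorted(nbrs), 2):
            groups[(x, z)].append(1)                  # (x, z, 1) for each wedge centred at v
    # "Reduce": every wedge whose pair is also an edge closes a triangle.
    closed = sum(tags.count(1) for tags in groups.values() if 0 in tags)
    return closed // 3  # each triangle is found once from each of its 3 vertices


def triangles_by_trace(edges, n):
    """Exact count trace(A^3) / 6 on vertices 0..n-1 (trace(A^3) = sum of lambda_i^3)."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        if u != v:
            A[u][v] = A[v][u] = 1

    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

    A3 = matmul(matmul(A, A), A)
    return sum(A3[i][i] for i in range(n)) // 6


square_with_diagonal = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(triangles_by_tuples(square_with_diagonal), triangles_by_trace(square_with_diagonal, 4))  # -> 2 2
```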

  18. Eigenvalue Estimation
      - $A$ is an $n \times n$ real-valued matrix.
      - Lanczos method.
      - Sketches: $Ar$ for a pseudorandom $n \times d$ matrix $r$, with $d \ll n$. Will the $O(nd)$ sketch fit into one machine?
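A sketch $Ar$ can be computed one row of $A$ at a time. Each row of the $n \times d$ result depends only on the matching row of $A$, so the rows could be produced independently by mappers.

The slide does not fix the distribution of $r$. The following are assumptions:
- $\pm 1$ entries drawn from Python's seeded `random.Random`.
- The helper names `pseudorandom_matrix` and `sketch`, which are hypothetical.

```python
import random


def pseudorandom_matrix(n, d, seed=0):
    """Hypothetical choice: an n x d matrix of +/-1 entries from a seeded PRNG."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(n)]


def sketch(A_rows, r):
    """Return S = A r (n x d); row i of S needs only row i of A (plus the shared r)."""
    d = len(r[0])
    return [[sum(a_ik * r[k][j] for k, a_ik in enumerate(row)) for j in range(d)] for row in A_rows]


n, d = 6, 2
A = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]  # path-graph adjacency
r = pseudorandom_matrix(n, d, seed=42)
S = sketch(A, r)
print(len(S), len(S[0]))  # -> 6 2: n*d numbers kept instead of n*n
```

Because the seed determines $r$, every machine can regenerate the same $r$ without shipping it. The sketch itself still has $nd$ entries, which is what the slide's "fit into one machine" question is about.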

  19. Special Case
      - Motivation: Log processing.

          x = inputrecord;
          x_squared = x * x;
          aggregator: table sum;
          emit aggregator <- x_squared;

      - MUD algorithm $m = (\Phi, \oplus, \eta)$.
        - Local function $\Phi: \Sigma \rightarrow Q$ maps an input item to a message.
        - Aggregator $\oplus: Q \times Q \rightarrow Q$ maps two messages to a single message.
        - Post-processing operator $\eta: Q \rightarrow \Sigma$ produces the final output by applying $\eta$ to $m_T(x)$, the message obtained by combining the messages $\Phi(x_i)$ with $\oplus$ along an aggregation tree $T$.
        - $m$ computes a function $f$ if $\eta(m_T(\cdot)) = f$ for all trees $T$.
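The mud definition can be exercised with a small simulator, assuming a non-empty input. It works in three steps:
- Shuffle the input.
- Apply $\Phi$ to every item.
- Combine the messages with $\oplus$ along a random binary tree built from random split points.

The helpers `fold_random_tree` and `run_mud` are hypothetical. Trying several seeds only spot-checks the "for all trees $T$" condition; it does not prove it. The example is the log-processing sum of squares: $\Phi(x) = x^2$, $\oplus = +$, and $\eta$ the identity.

```python
import operator
import random


def fold_random_tree(msgs, agg, rng):
    """Combine messages with agg along a random binary tree (random split points)."""
    if len(msgs) == 1:
        return msgs[0]
    k = rng.randint(1, len(msgs) - 1)
    return agg(fold_random_tree(msgs[:k], agg, rng), fold_random_tree(msgs[k:], agg, rng))


def run_mud(phi, agg, eta, xs, seed=0):
    """eta(m_T(x)) for one random tree T over a random ordering of the non-empty input xs."""
    rng = random.Random(seed)
    xs = list(xs)
    rng.shuffle(xs)
    return eta(fold_random_tree([phi(x) for x in xs], agg, rng))


# Sum of squares: Phi(x) = x * x, aggregator = +, eta = identity.
records = [3, 1, 4, 1, 5]
results = {run_mud(lambda x: x * x, operator.add, lambda q: q, records, seed=s) for s in range(20)}
print(results)  # -> {52}
```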

  20. MUD Examples
          $\Phi(x) = \langle x, x \rangle$
          $\oplus(\langle a_1, b_1 \rangle, \langle a_2, b_2 \rangle) = \langle \min(a_1, a_2), \max(b_1, b_2) \rangle$
          $\eta(\langle a, b \rangle) = b - a$
      Figure: mud algorithm for computing the total span (left).
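The total-span mud translates directly into three small functions. In the sketch below, they are evaluated along a random aggregation tree over a shuffled input, assuming the input is non-empty. The helper names are hypothetical, and the comments give the slide's formula for each function. Every tree gives the same answer because min and max are associative and commutative.

```python
import random


def fold_random_tree(msgs, agg, rng):
    """Combine messages with agg along a random binary tree (random split points)."""
    if len(msgs) == 1:
        return msgs[0]
    k = rng.randint(1, len(msgs) - 1)
    return agg(fold_random_tree(msgs[:k], agg, rng), fold_random_tree(msgs[k:], agg, rng))


def span_phi(x):
    return (x, x)                                   # Phi(x) = <x, x>


def span_agg(m1, m2):
    (a1, b1), (a2, b2) = m1, m2
    return (min(a1, a2), max(b1, b2))               # <min(a1, a2), max(b1, b2)>


def span_eta(m):
    a, b = m
    return b - a                                    # eta(<a, b>) = b - a


def total_span(xs, seed=0):
    rng = random.Random(seed)
    xs = list(xs)
    rng.shuffle(xs)
    return span_eta(fold_random_tree([span_phi(x) for x in xs], span_agg, rng))


print(total_span([7, 3, 12, 5]))  # -> 9
```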

  21. MUD Examples
          $\Phi(x) = \langle x, h(x), 1 \rangle$
          $\oplus(\langle a_1, h(a_1), c_1 \rangle, \langle a_2, h(a_2), c_2 \rangle) = \begin{cases} \langle a_i, h(a_i), c_i \rangle & \text{if } h(a_i) < h(a_j), \text{ where } \{i, j\} = \{1, 2\} \\ \langle a_1, h(a_1), c_1 + c_2 \rangle & \text{otherwise} \end{cases}$
          $\eta(\langle a, b, c \rangle) = a$ if $c = 1$
      Figure: Mud algorithms for computing a uniform random sample of the unique items in a set (right). Here $h$ is an approximate minwise hash function.
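The unique-item sampling mud can be sketched the same way. The assumptions are:
- A salted SHA-256 truncated to 64 bits stands in for the approximate minwise hash $h$.
- Equal hash values are treated as the same item, so hash collisions are ignored.
- $\eta$ returns `None` when $c \ne 1$, a case the slide leaves unspecified.

All helper names are hypothetical. The message with the smallest hash survives every merge, and its count accumulates every copy of that item. The output is therefore the min-hash distinct item if it occurs exactly once. Changing the salt gives an independent trial.

```python
import hashlib
import random


def h(x, salt="demo"):
    """Stand-in for an approximate minwise hash: 64 bits of a salted SHA-256 (assumption)."""
    return int.from_bytes(hashlib.sha256(f"{salt}:{x}".encode()).digest()[:8], "big")


def fold_random_tree(msgs, agg, rng):
    """Combine messages with agg along a random binary tree (random split points)."""
    if len(msgs) == 1:
        return msgs[0]
    k = rng.randint(1, len(msgs) - 1)
    return agg(fold_random_tree(msgs[:k], agg, rng), fold_random_tree(msgs[k:], agg, rng))


def sample_phi(x):
    return (x, h(x), 1)                             # Phi(x) = <x, h(x), 1>


def sample_agg(m1, m2):
    if m1[1] < m2[1]:
        return m1                                   # keep the message with the smaller hash
    if m2[1] < m1[1]:
        return m2
    return (m1[0], m1[1], m1[2] + m2[2])            # equal hashes: same item, counts add


def sample_eta(m):
    a, _, c = m
    return a if c == 1 else None                    # None: min-hash item is not unique (assumption)


def sample_unique(xs, seed=0):
    rng = random.Random(seed)
    xs = list(xs)
    rng.shuffle(xs)
    return sample_eta(fold_random_tree([sample_phi(x) for x in xs], sample_agg, rng))


print(sample_unique(["a", "b", "a", "c", "b", "d"]))
```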

  22. Streaming
      - A streaming algorithm is $s = (\sigma, \eta)$.
      - Operator $\sigma: Q \times \Sigma \rightarrow Q$.
      - $\eta: Q \rightarrow \Sigma$ converts the final state to the output.
      - On input $x \in \Sigma^n$, the streaming algorithm computes $f = \eta(s_0(x))$, where $0$ is the starting state and $s_q(x) = \sigma(\sigma(\cdots \sigma(\sigma(q, x_1), x_2), \ldots, x_{k-1}), x_k)$.
      - Communication complexity is $\log |Q|$.
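The nested application $s_q(x)$ is a left fold. Python's `functools.reduce(function, iterable, initializer)` calls `function(state, item)` from left to right, matching $\sigma(q, x)$.

The wrapper name `run_streaming` and the parity example are illustrations, not taken from the slide. The example's state space is $Q = \{0, 1\}$, so $\log |Q| = 1$ bit.

```python
from functools import reduce


def run_streaming(sigma, eta, xs, q0=0):
    """f = eta(s_{q0}(x)), where s_q(x) = sigma(...sigma(sigma(q, x1), x2)..., xk)."""
    return eta(reduce(sigma, xs, q0))


def parity_sigma(q, x):
    """State in Q = {0, 1}: parity of the number of odd items seen so far."""
    return q ^ (x & 1)


print(run_streaming(parity_sigma, lambda q: q, [1, 2, 3, 5]))  # -> 1
```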

  25. MUD vs Streaming
      - For a mud algorithm $m = (\Phi, \oplus, \eta)$, there is a streaming algorithm $s = (\sigma, \eta)$ of the same complexity with the same output, obtained by setting $\sigma(q, x) = \oplus(q, \Phi(x))$.
      - Central question: Can MUD simulate streaming?
      - Count the occurrences of the first odd number on the stream.
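The construction $\sigma(q, x) = \oplus(q, \Phi(x))$ can be written as a function that turns $(\Phi, \oplus)$ into a streaming operator. The sketch applies it to the total-span mud and also gives a plain streaming algorithm for the slide's "first odd number" task.

Assumptions:
- `None` is used as an explicit empty starting state, because $\oplus$ needs two messages.
- All names are hypothetical.

The first-odd state (value, count) depends on the order in which items arrive. The sketch does not attempt a mud algorithm for it.

```python
from functools import reduce


def mud_to_streaming(phi, agg):
    """sigma(q, x) = agg(q, phi(x)); None marks the empty starting state (assumption)."""
    def sigma(q, x):
        return phi(x) if q is None else agg(q, phi(x))
    return sigma


def span_agg(m1, m2):
    return (min(m1[0], m2[0]), max(m1[1], m2[1]))


span_sigma = mud_to_streaming(lambda x: (x, x), span_agg)


def span_stream(xs):
    """Total span of a non-empty stream via the converted mud."""
    a, b = reduce(span_sigma, xs, None)
    return b - a


def first_odd_sigma(q, x):
    """State (first odd value or None, its count so far)."""
    first, count = q
    if first is None:
        return (x, 1) if x % 2 == 1 else (None, 0)
    return (first, count + 1) if x == first else q


def count_first_odd(xs):
    return reduce(first_odd_sigma, xs, (None, 0))[1]


print(span_stream([7, 3, 12, 5]), count_first_odd([4, 7, 2, 7, 9, 7]))  # -> 9 3
```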
