FrogWild! Fast PageRank Approximations on Graph Engines - Ioannis Mitliagkas, Michael Borokhovich, Alex Dimakis, Constantine Caramanis (PowerPoint presentation)



  1. FrogWild! Fast PageRank Approximations on Graph Engines. Ioannis Mitliagkas, Michael Borokhovich, Alex Dimakis, Constantine Caramanis

  4. Web Ranking: Given a web graph, find “important” pages. Classic approach: rank pages by in-degree. This is susceptible to manipulation by spammer networks.
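
To make the manipulation concrete, here is a small Python sketch of in-degree ranking. The edge list over pages A to E is hypothetical (the slide figure's edges are not recoverable), as are the spam pages S0 to S4; the point is only that a handful of fake pages linking to one target is enough to push it to the top.

```python
from collections import Counter

def rank_by_in_degree(edges):
    """Rank vertices by in-degree (highest first); ties broken alphabetically."""
    indeg = Counter(dst for _, dst in edges)
    for src, _ in edges:
        indeg.setdefault(src, 0)  # vertices with no in-links still get ranked
    return sorted(indeg, key=lambda v: (-indeg[v], v))

# Hypothetical edge list on the slide's vertex labels.
honest = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "C"), ("E", "C"), ("A", "E")]
print(rank_by_in_degree(honest)[0])  # C

# A spammer adds four fake pages that all link to its own page S0.
spam = honest + [(f"S{i}", "S0") for i in range(1, 5)]
print(rank_by_in_degree(spam)[0])  # S0
```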

  7. PageRank [Page et al., 1999]: Page importance is described by a distribution π over pages. Recursive definition: important pages are pointed to by important pages, which are pointed to by important pages, and so on. More robust than in-degree to manipulation by spammer networks.
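
The recursive definition is usually written as a fixed-point equation. A sketch in the standard form, assuming n pages, out-degree d_out(u), teleportation probability p_T (the 0.15 fraction from the continuous interpretation), and no pages without out-links:

```latex
\pi(v) = \frac{p_T}{n} + (1 - p_T) \sum_{u \to v} \frac{\pi(u)}{d_{\mathrm{out}}(u)},
\qquad \sum_{v} \pi(v) = 1
```

The sum over in-neighbors u is exactly "important pages are pointed to by important pages": a page inherits a share of each linking page's importance.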

  12. PageRank: Continuous Interpretation. Start with a gallon of water distributed evenly over the vertices. In every iteration, each vertex spreads its water evenly to its successors, and a fraction p_T = 0.15 of all water is redistributed evenly over all vertices. Repeat until convergence; the resulting distribution is π. Power iteration is usually employed.
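
A minimal Python sketch of the water-spreading process, i.e., power iteration. The five-vertex graph is hypothetical, and the sketch assumes every vertex has at least one successor, since the slide does not say how dangling pages are handled.

```python
def pagerank_power_iteration(out_links, p_t=0.15, tol=1e-10, max_iter=1000):
    """Iterate the water-spreading update until the vector stops changing.

    Assumes every vertex has at least one successor (no dangling nodes).
    """
    vertices = list(out_links)
    n = len(vertices)
    pi = {v: 1.0 / n for v in vertices}  # one gallon spread evenly
    for _ in range(max_iter):
        new = {v: p_t / n for v in vertices}  # fraction p_t redistributed evenly
        for u in vertices:
            share = (1 - p_t) * pi[u] / len(out_links[u])
            for v in out_links[u]:
                new[v] += share  # u spreads the rest of its water to successors
        if sum(abs(new[v] - pi[v]) for v in vertices) < tol:
            return new
        pi = new
    return pi

graph = {"A": ["B", "E"], "B": ["C"], "C": ["A"], "D": ["C"], "E": ["C"]}
pi = pagerank_power_iteration(graph)
print(max(pi, key=pi.get))  # C
```

Total water is conserved at every iteration (p_t plus (1 - p_t) times the previous total), so the result is a probability distribution.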

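  19. Discrete Interpretation: A frog walks randomly on the graph; the next vertex is chosen uniformly at random among the current vertex's successors (e.g., 1/3 each for a vertex with three successors). Teleportation: at every step, the frog teleports to a uniformly random vertex w.p. p_T. Sampling: the frog's location after t steps gives an (approximate, for large enough t) sample from π, the PageRank vector. With many frogs, we can estimate the vector π.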
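
The frog picture translates directly into a Monte Carlo estimator. This sketch (hypothetical graph, frog count, and walk length t = 30; no dangling vertices assumed) releases independent frogs and normalizes their end positions. It is the plain random-walk estimator, not the distributed FrogWild algorithm.

```python
import random
from collections import Counter

def frog_walk_estimate(out_links, num_frogs=20_000, t=30, p_t=0.15, seed=0):
    """Estimate PageRank from the end positions of independent random walkers.

    Assumes every vertex has at least one successor (no dangling nodes).
    """
    rng = random.Random(seed)
    vertices = list(out_links)
    counts = Counter()
    for _ in range(num_frogs):
        v = rng.choice(vertices)              # start at a uniformly random vertex
        for _ in range(t):
            if rng.random() < p_t:
                v = rng.choice(vertices)      # teleport w.p. p_T
            else:
                v = rng.choice(out_links[v])  # next vertex uniform among successors
        counts[v] += 1                        # end position ~ sample from pi
    return {v: counts[v] / num_frogs for v in vertices}

graph = {"A": ["B", "E"], "B": ["C"], "C": ["A"], "D": ["C"], "E": ["C"]}
estimate = frog_walk_estimate(graph)
print(estimate)
```

More frogs shrink the sampling error; a longer walk shrinks the bias from not having fully mixed.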

  21. PageRank Approximation: We are looking for the k “heavy nodes” and do not need the full PageRank vector. Random-walk sampling favors heavy nodes. Captured-mass metric: for a node set S, the captured mass is π(S). Example with k = 2: return the set {E, D}; captured mass = π({E, D}).
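
Captured mass scores a returned set S by the true PageRank it covers, π(S) = Σ_{v∈S} π(v). In this sketch both vectors are made-up numbers chosen to reproduce the slide's k = 2 example: the top-k set is picked from the approximate vector and then measured against the true one.

```python
def top_k(estimate, k):
    """The k vertices with the largest estimated PageRank."""
    return sorted(estimate, key=estimate.get, reverse=True)[:k]

def captured_mass(pi, nodes):
    """pi(S): total true PageRank of the returned node set S."""
    return sum(pi[v] for v in nodes)

# Hypothetical true and approximate PageRank vectors.
pi_true = {"A": 0.10, "B": 0.15, "C": 0.10, "D": 0.30, "E": 0.35}
pi_est = {"A": 0.12, "B": 0.10, "C": 0.08, "D": 0.33, "E": 0.37}

S = top_k(pi_est, 2)
print(S)                                    # ['E', 'D']
print(round(captured_mass(pi_true, S), 2))  # 0.65
```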

  22. Platform

  26. Graph Engines: The engine splits the graph across a cluster, and a vertex program describes the per-vertex logic using the GAS abstraction: 1. Gather, 2. Apply, 3. Scatter. Other approaches: Giraph [Avery, 2011], Galois [Nguyen et al., 2013], GraphX [Xin et al., 2013].
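
A single-process sketch of how a PageRank vertex program splits into the three GAS phases. The class, method names, and the tiny scheduler are hypothetical illustrations, not the GraphLab/PowerGraph API; scatter here re-activates successors whenever a vertex's value moves by more than a tolerance.

```python
from collections import deque

class PageRankProgram:
    """Hypothetical GAS vertex program for PageRank (not the GraphLab API)."""

    def __init__(self, n, p_t=0.15, tol=1e-10):
        self.n, self.p_t, self.tol = n, p_t, tol

    def gather(self, v, in_nbrs, out_links, values):
        # 1. Gather: each in-neighbor u contributes its rank split over its out-edges.
        return sum(values[u] / len(out_links[u]) for u in in_nbrs)

    def apply(self, acc):
        # 2. Apply: combine the gathered sum with the teleportation term.
        return self.p_t / self.n + (1 - self.p_t) * acc

    def scatter(self, v, old, new, out_links):
        # 3. Scatter: if the value moved noticeably, re-activate the successors.
        return out_links[v] if abs(new - old) > self.tol else []


def run_gas(out_links, program):
    """Toy engine: runs the vertex program until no vertex is active."""
    n = len(out_links)
    values = {v: 1.0 / n for v in out_links}
    in_links = {v: [] for v in out_links}
    for u, succs in out_links.items():
        for w in succs:
            in_links[w].append(u)
    active, queued = deque(out_links), set(out_links)
    while active:
        v = active.popleft()
        queued.discard(v)
        old = values[v]
        values[v] = program.apply(program.gather(v, in_links[v], out_links, values))
        for w in program.scatter(v, old, values[v], out_links):
            if w not in queued:
                queued.add(w)
                active.append(w)
    return values


graph = {"A": ["B", "E"], "B": ["C"], "C": ["A"], "D": ["C"], "E": ["C"]}
ranks = run_gas(graph, PageRankProgram(n=len(graph)))
print(max(ranks, key=ranks.get))  # C
```

Keeping the per-vertex logic in three small methods is what lets an engine decide where each phase runs; in a distributed engine, gather and scatter would touch edges stored on other machines.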

  27. Edge Cuts: Assign vertices to machines; cross-machine edges require network communication (Pregel, GraphLab 1.0). High-degree nodes generate a large volume of traffic, and the computational load is imbalanced.
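
A quick way to see the edge-cut traffic problem: place vertices on machines at random (an assumption; real engines may use hashing or smarter partitioners) and count edges whose endpoints land on different machines. The star graph is hypothetical.

```python
import random

def edge_cut(out_links, num_machines, seed=0):
    """Place each vertex on a random machine; return placement and cross-machine edges."""
    rng = random.Random(seed)
    owner = {v: rng.randrange(num_machines) for v in out_links}
    cut = [(u, v) for u, succs in out_links.items() for v in succs if owner[u] != owner[v]]
    return owner, cut

# Hypothetical star: 1,000 pages all linking to one high-degree "hub".
leaves = [f"v{i}" for i in range(1000)]
star = {"hub": [], **{v: ["hub"] for v in leaves}}
owner, cut = edge_cut(star, num_machines=4)
# On average about 1 - 1/4 of the hub's in-edges cross machines,
# and all of that traffic converges on the single machine that owns the hub.
print(len(cut), "of", len(leaves), "hub edges need the network")
```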

  29. Vertex Cuts: Assign edges to machines; high-degree nodes are replicated, one replica is designated master, and replicas need synchronization. Each step runs 1. Gather, 2. Apply [on master], 3. Synchronize mirrors, 4. Scatter. Used by GraphLab 2.0 (PowerGraph). Balanced, but the network is still the bottleneck.
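
The replication side of a vertex cut can be sketched the same way: place each edge on a random machine and replicate each vertex wherever its edges land. Random edge placement and the "lowest machine id is master" rule are assumptions for illustration; the star graph is hypothetical.

```python
import random
from collections import defaultdict

def random_vertex_cut(edges, num_machines, seed=0):
    """Place each edge on a random machine; a vertex is replicated wherever its edges land."""
    rng = random.Random(seed)
    replicas = defaultdict(set)
    for u, v in edges:
        m = rng.randrange(num_machines)
        replicas[u].add(m)
        replicas[v].add(m)
    # Hypothetical rule: the replica on the lowest-numbered machine is the master.
    masters = {v: min(ms) for v, ms in replicas.items()}
    replication_factor = sum(len(ms) for ms in replicas.values()) / len(replicas)
    return replicas, masters, replication_factor

leaves = [f"v{i}" for i in range(1000)]
edges = [(v, "hub") for v in leaves]
replicas, masters, rf = random_vertex_cut(edges, num_machines=16)
print(len(replicas["hub"]))  # the hub ends up replicated on (almost surely) every machine
print(rf)                    # but the average over all vertices stays close to 1
```

Every one of the hub's replicas must be synchronized after each apply, which is where the network cost of vertex cuts comes from.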

  32. Random Walks on GraphLab: The master node decides the frog's next step, and the decision is synced to all mirrors, even though only machine M (where the frog is headed) needs it. This causes unnecessary network traffic; the average replication factor is ~8.

  35. Objective: Faster PageRank approximation on GraphLab. Idea: only synchronize the mirror that will receive the frog. This is doable, but requires 1. serious engine hacking and 2. exposing an ugly/complicated API to the programmer. Simpler: pick mirrors to synchronize at random! Synchronize each mirror independently with probability p_S.
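
Back-of-the-envelope traffic for one step at one vertex, assuming the ~8 replicas mean one master plus 7 mirrors and picking p_S = 0.1 purely for illustration: syncing every mirror costs 7 messages, while independent random syncing costs p_S × 7 = 0.7 on average. A short simulation confirms the expectation.

```python
import random

def mirrors_synced(num_mirrors, p_s, rng):
    """Number of mirrors synchronized when each is picked independently w.p. p_s."""
    return sum(rng.random() < p_s for _ in range(num_mirrors))

rng = random.Random(0)
mirrors = 8 - 1   # replication factor ~8: one master plus 7 mirrors (assumed)
p_s = 0.1         # hypothetical synchronization probability
trials = 100_000
avg = sum(mirrors_synced(mirrors, p_s, rng) for _ in range(trials)) / trials
print("sync all mirrors:", mirrors, "messages per step")
print("random sync, expected:", round(p_s * mirrors, 3), "simulated:", round(avg, 3))
```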

  36. FrogWild!: Release N frogs in parallel. Vertex program: 1. Each frog dies w.p. p_T (giving a sample); assume K frogs survive. 2. For every mirror, draw a bridge w.p. p_S (a Ber(p_S) draw). 3. Spread the surviving frogs evenly among the synchronized mirrors (e.g., K/2 each when two mirrors are synchronized).

