Fast PageRank Approximations
on Graph Engines
FrogWild!
Ioannis Mitliagkas Michael Borokhovich Alex Dimakis Constantine Caramanis
Web Ranking
❖ Given web graph
❖ Find important pages
Important pages are pointed to by
❖ important pages are pointed to by
❖ important pages are pointed to by…
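The recursive definition above can be unrolled into a fixed-point iteration. Below is a minimal power-iteration sketch; the five-node graph `example`, the function name `pagerank`, and the teleportation value `p_t=0.15` are illustrative assumptions, not taken from the slides.

```python
def pagerank(out_links, p_t=0.15, iters=100):
    """Power iteration on a dict mapping node -> list of successors.

    p_t is an assumed teleportation probability (hypothetical default).
    """
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Every node receives the teleportation share first.
        new_rank = {v: p_t / n for v in nodes}
        for u in nodes:
            # A node without out-links spreads its rank uniformly (assumption).
            targets = out_links[u] or nodes
            share = (1.0 - p_t) * rank[u] / len(targets)
            for v in targets:
                new_rank[v] += share
        rank = new_rank
    return rank


# Hypothetical example graph on nodes A..E.
example = {"A": ["B"], "B": ["E"], "C": ["A", "B", "E"], "D": ["E"], "E": ["D"]}
ranks = pagerank(example)
top2 = sorted(ranks, key=ranks.get, reverse=True)[:2]
```

In this toy graph E is pointed to by three nodes and D is pointed to by E, so the two of them end up with most of the rank.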
[Figure: random walk on the example graph, with edge transition probabilities 1 and 1/3, 1/3, 1/3]
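The edge labels (1 for a single out-link, 1/3 each for three out-links) describe a uniform random walk over out-links. As a rough sketch, the walk can be simulated directly: each walker starts at a uniformly random node and stops with an assumed teleportation probability `p_t` before every step, and the fraction of walkers stopping at a node estimates its PageRank. The graph, the function name, and all parameter values are hypothetical.

```python
import random
from collections import Counter


def random_walk_estimate(out_links, num_walkers=20000, p_t=0.15, seed=0):
    """Estimate PageRank from the stopping positions of independent walkers."""
    rng = random.Random(seed)
    nodes = sorted(out_links)
    stops = Counter()
    for _ in range(num_walkers):
        v = rng.choice(nodes)            # uniform starting node
        while rng.random() >= p_t:       # continue with probability 1 - p_t
            targets = out_links[v] or nodes
            v = rng.choice(targets)      # uniform over out-links (1, 1/3, ...)
        stops[v] += 1                    # walker stops here: one sample
    return {v: stops[v] / num_walkers for v in nodes}


# Hypothetical example graph on nodes A..E.
example = {"A": ["B"], "B": ["E"], "C": ["A", "B", "E"], "D": ["E"], "E": ["D"]}
estimate = random_walk_estimate(example)
top2 = sorted(estimate, key=estimate.get, reverse=True)[:2]
```

Because every walker is independent, the estimate is noisy for light nodes but already ranks the heavy nodes correctly with a modest number of walkers.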
Do not need full PageRank vector
Favors heavy nodes
For node set S: π(S)
k=2: return set {E,D}; captured mass = π({E,D})
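The captured-mass metric can be sketched as follows, assuming π(S) denotes the total true PageRank of the nodes in S. The numeric values in `pi` and `noisy` are made up for illustration, and `captured_mass` is a hypothetical helper name.

```python
def captured_mass(pi, estimate, k):
    """Return the top-k set chosen by `estimate` and its true PageRank mass."""
    top_k = sorted(estimate, key=estimate.get, reverse=True)[:k]
    return top_k, sum(pi[v] for v in top_k)


# Made-up true PageRank vector and a noisy approximation of it.
pi = {"A": 0.04, "B": 0.07, "C": 0.03, "D": 0.41, "E": 0.45}
noisy = {"A": 0.05, "B": 0.10, "C": 0.02, "D": 0.38, "E": 0.45}

returned_set, mass = captured_mass(pi, noisy, k=2)
_, best_mass = captured_mass(pi, pi, k=2)   # mass of the optimal top-2 set
```

The approximation gets individual values wrong but still returns {E, D}, so it captures the same mass as the exact vector. This is why the metric tolerates errors on light nodes.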
❖ Engine splits graph across cluster
❖ Vertex program describes logic
Other approaches: Giraph [Avery, 2011], Galois [Nguyen et al., 2013], GraphX [Xin et al., 2013]
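To make "vertex program describes logic" concrete, here is a toy single-process sketch in a gather/apply style. The class and function names are hypothetical and do not reproduce the API of any of the engines named above. It assumes every vertex has at least one out-link and uses an assumed teleportation probability `p_t`.

```python
class PageRankVertexProgram:
    """Per-vertex logic only; the engine decides where and when it runs."""

    def __init__(self, p_t=0.15):
        self.p_t = p_t

    def gather(self, neighbor_rank, neighbor_out_degree):
        # Contribution received along one in-edge.
        return neighbor_rank / neighbor_out_degree

    def apply(self, gathered_sum, num_vertices):
        # New value of the vertex from the sum of gathered contributions.
        return self.p_t / num_vertices + (1.0 - self.p_t) * gathered_sum


def run_supersteps(program, out_links, supersteps=100):
    """Stand-in for the engine: synchronous rounds over all vertices."""
    vertices = sorted(out_links)
    in_links = {v: [] for v in vertices}
    for u, targets in out_links.items():
        for v in targets:
            in_links[v].append(u)
    rank = {v: 1.0 / len(vertices) for v in vertices}
    for _ in range(supersteps):
        rank = {
            v: program.apply(
                sum(program.gather(rank[u], len(out_links[u])) for u in in_links[v]),
                len(vertices),
            )
            for v in vertices
        }
    return rank


# Hypothetical example graph on nodes A..E (no vertex without out-links).
example = {"A": ["B"], "B": ["E"], "C": ["A", "B", "E"], "D": ["E"], "E": ["D"]}
ranks = run_supersteps(PageRankVertexProgram(), example)
```

In a real engine the loop in `run_supersteps` is distributed across machines, which is where the communication costs discussed next come from.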
❖ Assign vertices to machines
❖ Cross-machine edges require network communication
❖ Pregel, GraphLab 1.0
❖ High-degree nodes generate large volume of traffic
❖ Computational load imbalance
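The points above can be illustrated with a small simulation. It assumes vertices are assigned to machines uniformly at random and that the work for an edge is done on the machine owning its source vertex; both are simplifications, and the star-plus-chain graph is hypothetical.

```python
import random


def edge_cut_stats(edges, num_machines, seed=0):
    """Assign each vertex to one machine; count cut edges and per-machine load."""
    rng = random.Random(seed)
    owner = {}
    for u, v in edges:
        for x in (u, v):
            if x not in owner:
                owner[x] = rng.randrange(num_machines)
    # Edges whose endpoints live on different machines need network traffic.
    cross = sum(1 for u, v in edges if owner[u] != owner[v])
    load = [0] * num_machines
    for u, _ in edges:
        load[owner[u]] += 1   # out-edge work charged to the source's machine
    return cross, load


# Hypothetical graph: one high-degree hub plus a chain of light nodes.
edges = [("hub", f"leaf{i}") for i in range(1000)]
edges += [(f"leaf{i}", f"leaf{i + 1}") for i in range(999)]

cross_edges, load = edge_cut_stats(edges, num_machines=4)
```

Whichever machine owns the hub carries all 1000 of its out-edges, so its load is far above the average of roughly 500 edges per machine.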
❖ Assign edges to machines
❖ High-degree nodes replicated
❖ One replica designated master
❖ Need for synchronization
❖ GraphLab 2.0 (PowerGraph)
❖ Balanced, but network is still the bottleneck
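A rough sketch of the edge-assignment scheme: edges are placed on machines uniformly at random (a simplification; real engines may use smarter placement), and a vertex gets a replica on every machine that holds one of its edges. The function name and the example graph are hypothetical.

```python
import random
from collections import defaultdict


def vertex_cut_stats(edges, num_machines, seed=0):
    """Assign each edge to one machine; report replication and edge load."""
    rng = random.Random(seed)
    replicas = defaultdict(set)      # vertex -> machines holding a replica
    load = [0] * num_machines
    for u, v in edges:
        m = rng.randrange(num_machines)
        load[m] += 1
        replicas[u].add(m)
        replicas[v].add(m)
    avg_replication = sum(len(s) for s in replicas.values()) / len(replicas)
    return avg_replication, replicas, load


# Hypothetical graph: a single hub with 1000 out-edges.
edges = [("hub", f"leaf{i}") for i in range(1000)]
avg_replication, replicas, load = vertex_cut_stats(edges, num_machines=4)
```

The edge load is now spread across machines, but the hub has a replica on each of them, and those replicas must be kept in sync over the network.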
[Figure: vertex B is replicated on every machine. Machine 1 holds A and B; Machine 2 holds B and C; Machine 3 holds B and D; Machine M holds B and Z.]
❖ Decision synced to all mirrors
❖ Only machine M needs it
❖ Average replication factor ~8
❖ Only synchronize the mirror that will receive the frog
❖ Doable, but requires …
❖ Pick mirrors to synchronize at random!
❖ Synchronize independently with probability pS
1. Each frog dies w.p. pT (gives sample). Assume K frogs survive.
2. For every mirror, draw bridge w.p. pS.
3. Spread frogs evenly among synchronized mirrors.
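The three steps can be sketched for a single vertex as below. This is an illustrative reading of the slide, not the authors' implementation: it assumes frogs die with probability `p_t`, bridges are drawn with probability `p_s`, and, as an extra assumption, one mirror is forced to synchronize when no bridge is drawn so that surviving frogs are never lost. The name `frog_step` is hypothetical.

```python
import random


def frog_step(num_frogs, mirrors, p_t, p_s, rng):
    """One step at a vertex: returns (samples, frogs sent to each mirror)."""
    # 1. Each frog dies independently w.p. p_t; a dead frog yields one sample.
    survivors = sum(1 for _ in range(num_frogs) if rng.random() >= p_t)
    samples = num_frogs - survivors

    # 2. Draw a bridge to every mirror independently w.p. p_s.
    synced = [m for m in mirrors if rng.random() < p_s]
    if not synced:
        # Assumption of this sketch: always keep at least one mirror in sync.
        synced = [rng.choice(mirrors)]

    # 3. Spread the K survivors as evenly as possible over synced mirrors.
    base, extra = divmod(survivors, len(synced))
    lucky = set(rng.sample(synced, extra))   # these mirrors get one more frog
    allocation = {m: base + (1 if m in lucky else 0) for m in synced}
    return samples, allocation


rng = random.Random(0)
mirrors = ["Machine 1", "Machine 2", "Machine 3", "Machine M"]
samples, allocation = frog_step(100, mirrors, p_t=0.15, p_s=0.5, rng=rng)
```

With a small `p_s` most mirrors receive no message in a given step, which is where the network savings come from, at the cost of frogs moving together more often than independent walkers would.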
[Equation not recoverable; surviving fragment: "2 + S) p∩(t)"]
Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web.
Malewicz, G., et al. (2010). Pregel: A system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. ACM.
Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., & Hellerstein, J. M. (2010). GraphLab: A new framework for parallel machine learning. arXiv preprint arXiv:1006.4990.
Gonzalez, J. E., Low, Y., Gu, H., Bickson, D., & Guestrin, C. (2012, October). PowerGraph: Distributed graph-parallel computation on natural graphs. In OSDI (Vol. 12, No. 1, p. 2).
Nguyen, D., Lenharth, A., & Pingali, K. (2013, November). A lightweight infrastructure for graph analytics. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (pp. 456-471). ACM.
Avery, C. (2011). Giraph: Large-scale graph processing infrastructure on Hadoop. Proceedings of Hadoop
Xin, R. S., Gonzalez, J. E., Franklin, M. J., & Stoica, I. (2013, June). GraphX: A resilient distributed graph system
[Backup figure: random walk on the example graph with edge probabilities 1 and 1/3, 1/3, 1/3, and parameter pT ∈ [0, 1]]