Walking Randomly, Massively, and Efficiently
Jakub Łącki Slobodan Mitrović Krzysztof Onak Piotr Sankowski
Why Random Walks?
Web ratings [Page, Brin, Motwani, Winograd 99] [Berkhin 05]
[Chierichetti, Haddadan ‘17]
[Czumaj, Sohler ’10] [Nachmias, Shapira ‘10] [Kale, Seshadhri ‘11] [Czumaj, Peng, Sohler ’15] [Chiplunkar, Kapralov, Khanna, Mousavifar, Peres ‘18] [Kumar, Seshadhri, Stolman ‘18] [Czumaj, Monemizadeh, Onak, Sohler ’19]
Input: Undirected graph G; length L
Output: An L-length random walk per vertex; walks mutually independent
Rounds: O(log L)
Space per machine: sublinear in n
Total space: O(m L log n)
Applications
Approximate bipartiteness testing
Approximate expansion testing
Approximate connectivity and MST
PageRank for directed graphs
Conditional lower bound of Ω(log L) rounds
Track spare random walks; double the wanted ones.
Output: deg(v) L-length random walks per v; walks mutually independent
[Figure: vertices v, w, x, y; a walk of length 2^i from v to w is joined with a walk of length 2^i from w to x, giving a walk of length 2^(i+1) from v to x.]
But how will w know a priori how many walks will pass through it?
Each vertex v maintains a number of random walks proportional to deg(v).
In expectation, after t steps the number of walks ending at v is proportional to deg(v).
Stationary probability of v: deg(v) / (2m) ≥ 1/(2m).
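The doubling idea on these slides can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the MPC algorithm from the talk: the names `doubling_walks`, `budget`, and `keep` are hypothetical, each vertex starts with `budget * deg(v)` one-step walks, and in every round one in `keep` walks is "wanted" while the rest serve as spare continuations. A spare walk is consumed at most once, so stitched walks do not share randomness. If a vertex runs out of spares the sketch simply gives up.

```python
import random


def doubling_walks(adj, length, budget, rng, keep=4):
    """Simulate walk doubling on an undirected graph.

    adj:    dict vertex -> list of neighbours (every vertex has degree >= 1)
    length: target walk length in edges, must be a power of two
    budget: initial number of one-step walks per unit of degree (assumption)
    keep:   one in `keep` walks is wanted, the others are spare (assumption)
    """
    if length < 1 or length & (length - 1):
        raise ValueError("length must be a power of two")
    # Round 0: budget * deg(v) independent walks of length 1 from each v.
    walks = {v: [[v, rng.choice(nbrs)] for _ in range(budget * len(nbrs))]
             for v, nbrs in adj.items()}
    cur = 1
    while cur < length:
        wanted, spare = {}, {}
        for v, ws in walks.items():
            cut = len(ws) // keep
            wanted[v], spare[v] = ws[:cut], ws[cut:]
        walks = {}
        for v, ws in wanted.items():
            walks[v] = []
            for w in ws:
                pool = spare[w[-1]]  # spare walks starting where w ends
                if not pool:
                    raise RuntimeError("no spare walk left at an endpoint")
                # Consume one spare walk; drop its first vertex to avoid
                # repeating the shared endpoint.
                walks[v].append(w + pool.pop()[1:])
        cur *= 2
    return walks
```

With `keep=4`, the expected demand for spares at a vertex is about a third of the supply when the walks are spread proportionally to degree, which is why the number of walks per vertex is tied to deg(v).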
Input: Directed graph G_D
Output: (1+α)-approximate PageRank; ε is the jumping probability
Rounds: Õ(ε⁻¹ log log n)
Space per machine: sublinear in n
Total space: Õ((m + n^(1+o(1))) ε⁻⁴ α⁻²)
Undirected graphs:
Stationary distribution is easy to compute: deg(v) / (2m).
Stationary distribution of v is “nicely” lower-bounded.
Directed graphs:
Stationary distribution can be difficult to compute.
Stationary distribution of v can be O(1/2^n).
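The undirected claim can be checked directly on a small example. The sketch below (helper names `degree_distribution` and `one_step` are illustrative, not from the talk) builds the distribution deg(v) / (2m) and applies one step of the simple random walk with exact rational arithmetic; the distribution should come back unchanged.

```python
from fractions import Fraction


def degree_distribution(adj):
    """Return {v: deg(v) / (2m)} for an undirected adjacency-list graph."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    return {v: Fraction(len(nbrs), two_m) for v, nbrs in adj.items()}


def one_step(adj, dist):
    """Push a distribution through one step of the simple random walk."""
    nxt = {v: Fraction(0) for v in adj}
    for v, p in dist.items():
        for u in adj[v]:
            nxt[u] += p / len(adj[v])  # v moves to each neighbour equally
    return nxt
```

On the path 0 - 1 - 2 - 3, for instance, `degree_distribution` gives 1/6, 1/3, 1/3, 1/6, and `one_step` maps it to itself.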
Input: P = G D⁻¹, T = (1 − ε) P + (ε/n) 11ᵀ
Output: Stationary distribution of T
P: walk matrix of G.
(1 − ε) P: following P with probability 1 − ε.
(ε/n) 11ᵀ: jumping to a random vertex with probability ε.
PageRank can be approximated from random walks of T. [Breyer ‘02]
Undirected graphs:
T and P are “similar”.
Directed graphs:
We do not know the stationary distribution of T.
Stationary distribution of v w.r.t. T is at least ε/n.
Stationary distribution of v w.r.t. P can be O(1/2^n).
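To make the link between random walks of T and PageRank concrete, here is a minimal Monte Carlo sketch. It uses a standard estimator and is not necessarily the method of [Breyer ‘02] or of the talk: start at a uniformly random vertex, follow P while a coin with probability 1 − ε says continue, and record where the walk stops. The endpoint is distributed as the stationary distribution of T. The function name `pagerank_monte_carlo` and its parameters are hypothetical, and every vertex is assumed to have at least one out-neighbour.

```python
import random


def pagerank_monte_carlo(out_nbrs, eps, num_walks, rng):
    """Estimate PageRank with jumping probability eps from walk endpoints.

    out_nbrs: dict vertex -> non-empty list of out-neighbours (assumption)
    """
    vertices = list(out_nbrs)
    counts = dict.fromkeys(vertices, 0)
    for _ in range(num_walks):
        v = rng.choice(vertices)          # a jump lands on a uniform vertex
        while rng.random() >= eps:        # continue with probability 1 - eps
            v = rng.choice(out_nbrs[v])   # follow the walk matrix P
        counts[v] += 1                    # the walk stops at the next jump
    return {v: c / num_walks for v, c in counts.items()}
```

On a directed 4-cycle the estimate should be close to 1/4 for every vertex, since PageRank is uniform there by symmetry.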
PageRank for undirected G.
Random walks for (1−δ)G + δ G_D.
PageRank for (1−δ)G + δ G_D.
PageRank for (1−2δ)G + 2δ G_D.
. . .
PageRank for δ G + (1−δ)G_D.
PageRank for directed G_D.
“Small” changes in T require a “small” increase in the number of spare walks.
PageRank can be approximated from random walks of T. [Breyer ‘02]
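The sequence of intermediate graphs can be sketched as follows. The slides do not spell out how (1−t)G + t G_D is formed, so this example assumes it means a weighted graph in which each edge of G has weight 1 − t and each edge of G_D has weight t, with G being G_D with directions ignored. PageRank is computed here by plain power iteration rather than by random walks, purely to show how the target distribution moves along the chain. The names `mix_weights` and `pagerank` are hypothetical.

```python
def mix_weights(undirected, directed, t):
    """Edge weights of (1 - t) * G + t * G_D as {u: {v: weight}} (assumed meaning)."""
    mixed = {}
    for u in undirected:
        row = {}
        for v in undirected[u]:
            row[v] = row.get(v, 0.0) + (1.0 - t)
        for v in directed.get(u, ()):
            row[v] = row.get(v, 0.0) + t
        mixed[u] = {v: w for v, w in row.items() if w > 0}
    return mixed


def pagerank(weights, eps, iters=200):
    """Power iteration for the stationary distribution with jumping probability eps.

    Assumes every vertex has positive total outgoing weight.
    """
    n = len(weights)
    pi = {v: 1.0 / n for v in weights}
    for _ in range(iters):
        nxt = {v: eps / n for v in weights}           # jump part
        for u, row in weights.items():
            total = sum(row.values())
            for v, w in row.items():
                nxt[v] += (1.0 - eps) * pi[u] * w / total  # walk part
        pi = nxt
    return pi
```

Calling `pagerank(mix_weights(G, GD, t), eps)` for t = 0, δ, 2δ, ..., 1 walks through the chain on the slides, from the undirected graph to the directed one.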
[Figure: N machines, each holding its share of the data; after a round of communication each machine holds its next-round data.]
For graphs, N * S = Θ(# of edges), where S is the space per machine
Goal: make the # of rounds small
Interesting case: S much smaller than the input size
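One round of this model can be mimicked sequentially. The sketch below is a toy illustration only: `mpc_round` and `route_edges_by_source` are hypothetical names, each machine turns its local items into (destination, item) messages, and the round is rejected if any machine would receive more than S items.

```python
def mpc_round(machines, step, space):
    """Run one communication round.

    machines: list of lists, machines[i] is the data held by machine i
    step:     function (machine_id, items) -> iterable of (dest, item)
    space:    per-machine space S, counted in items
    """
    inboxes = [[] for _ in machines]
    for i, items in enumerate(machines):
        for dest, item in step(i, items):
            inboxes[dest].append(item)
    for box in inboxes:
        if len(box) > space:
            raise ValueError("a machine received more than S items")
    return inboxes  # next-round data


def route_edges_by_source(num_machines):
    """Build a step that sends edge (u, v) to machine u mod num_machines."""
    def step(_machine_id, edges):
        for u, v in edges:
            yield u % num_machines, (u, v)
    return step
```

For example, with three machines and S = 4, edges are regrouped so that all edges leaving the same vertex end up on one machine, provided no machine exceeds its space.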