SLIDE 1

Walking Randomly, Massively, and Efficiently

Jakub Łącki Slobodan Mitrović Krzysztof Onak Piotr Sankowski

SLIDE 2

Why Random Walks?

Web ratings [Page, Brin, Motwani, Winograd ’99] [Berkhin ’05] [Chierichetti, Haddadan ’17]
Graph partitioning [Andersen, Chung, Lang ’06]
Random spanning trees [Kelner, Mądry ’09]
Property testing [Goldreich, Ron ’99] [Kaufman, Krivelevich, Ron ’04] [Czumaj, Sohler ’10] [Nachmias, Shapira ’10] [Kale, Seshadhri ’11] [Czumaj, Peng, Sohler ’15] [Chiplunkar, Kapralov, Khanna, Mousavifar, Peres ’18] [Kumar, Seshadhri, Stolman ’18] [Czumaj, Monemizadeh, Onak, Sohler ’19]
Connectivity [Reif ’85] [Halperin, Zwick ’94]
Matching [Goel, Kapralov, Khanna ’13]
Laplacian solvers [Andoni, Krauthgamer, Pogrow ’18]

SLIDE 4

How to Compute Random Walks?

Centralized [direct implementation]
Streaming [Sarma, Gollapudi, Panigrahy ’11] [Jin ’19]
Distributed (CONGEST) [Sarma, Nanongkai, Pandurangan, Tetali ’13]
MPC, undirected graphs (non-independent walks) [Bahmani, Chakrabarti, Xin ’11]

Our result (undirected graphs): Independent random walks in MPC with sublinear memory per machine.

SLIDE 8

Our Results

Input: Undirected graph G; length L
Output: A random walk of length L per vertex; walks mutually independent
Rounds: O(log L)
Space per machine: sublinear in n
Total space: O(m L log n)

Conditional lower bound of Ω(log L) rounds

Applications

Approximate bipartiteness testing
Approximate expansion testing
Approximate connectivity and MST
PageRank for directed graphs

SLIDE 9

Random Walks in Undirected Graphs

SLIDE 10

Random Walks: Doubling by Stitching

[Figure: graph G with a vertex v.]

Track spare random walks. Use the spares to double the wanted ones.

Output: deg(v) random walks of length L per vertex v; walks mutually independent

SLIDE 12

Random Walks: Doubling by Stitching

[Figure: graph G with a walk of length $2^i$ from v to w and a walk of length $2^i$ from w to x.]

SLIDE 13

Random Walks: Doubling by Stitching

[Figure: graph G with a walk of length $2^{i+1}$ from v through w to x.]

SLIDE 14

Random Walks: Doubling by Stitching

[Figure: graph G with a walk of length $2^{i+1}$ from v through w to x, and another vertex y.]

But how will w know a priori how many walks will pass through it?
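The doubling step pictured on these slides can be sketched sequentially as follows. This is a minimal single-machine illustration, not the MPC algorithm from the talk. The function names (`random_walk`, `double_by_stitching`), the even split of each vertex's pool into wanted and spare walks, and the fallback that samples a fresh walk when a vertex runs out of spares are all assumptions made for the sketch. The fallback counter shows how often a vertex such as w was asked for more spares than it had prepared.

```python
import random


def random_walk(adj, start, length, rng):
    """Sample a walk with `length` edges by direct simulation."""
    walk = [start]
    for _ in range(length):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk


def double_by_stitching(adj, rounds, walks_per_degree, seed=0):
    """Return (pool, fallbacks); pool[v] holds walks of 2**rounds edges from v.

    Assumes every vertex has at least one neighbour.
    """
    rng = random.Random(seed)
    # Start with length-1 walks, their number proportional to deg(v).
    pool = {
        v: [random_walk(adj, v, 1, rng) for _ in range(walks_per_degree * len(adj[v]))]
        for v in adj
    }
    fallbacks = 0
    length = 1  # current number of edges per walk
    for _ in range(rounds):
        # Hypothetical split: first half is wanted, second half is spare.
        wanted = {v: ws[: len(ws) // 2] for v, ws in pool.items()}
        spare = {v: ws[len(ws) // 2:] for v, ws in pool.items()}
        pool = {v: [] for v in adj}
        for v, ws in wanted.items():
            for walk in ws:
                end = walk[-1]
                if spare[end]:
                    # Each spare is used at most once, which keeps walks independent.
                    extension = spare[end].pop()
                else:
                    # Assumption of this sketch: fall back to direct sampling.
                    extension = random_walk(adj, end, length, rng)
                    fallbacks += 1
                pool[v].append(walk + extension[1:])
        length *= 2
    return pool, fallbacks


# Triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
pool, fallbacks = double_by_stitching(adj, rounds=3, walks_per_degree=8)
```

With `walks_per_degree = 2**rounds`, each vertex v ends up with deg(v) walks of length `2**rounds`, using a number of stitching rounds logarithmic in the walk length.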

SLIDE 17

Random Walks: Follow Stationary Distribution

But how will w know a priori how many walks will pass through it?

Each vertex v maintains a number of random walks proportional to deg(v).

[Figure: graph G with vertices v, w, y.]

In expectation, after t steps the number of walks ending at v is proportional to deg(v).
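The claim above can be checked exactly on a small example. The sketch below starts from the distribution deg(v) / (2m), applies one step of the simple random walk several times using exact fractions, and confirms that the distribution does not change. The helper names (`degree_distribution`, `step`) and the example graph are illustrative assumptions.

```python
from fractions import Fraction


def degree_distribution(adj):
    """Distribution with mass deg(v) / (2m) on each vertex v."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    return {v: Fraction(len(adj[v]), two_m) for v in adj}


def step(adj, dist):
    """One step of the simple random walk applied to a distribution."""
    nxt = {v: Fraction(0) for v in adj}
    for v, mass in dist.items():
        for u in adj[v]:
            nxt[u] += mass / len(adj[v])
    return nxt


# Triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
pi = degree_distribution(adj)
dist = pi
for _ in range(5):
    dist = step(adj, dist)
print(dist == pi)  # the degree-proportional distribution is preserved
```

Because the expected load at every vertex stays proportional to its degree, a vertex can prepare its spare walks in advance without knowing which walks will arrive.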

SLIDE 19

Random Walks: Takeaway

1. Following the stationary distribution allows us to “predict” the future.

2. The memory requirement is inversely proportional to the minimum entry of the stationary distribution, which is ≥ 1/(2m).

SLIDE 20

PageRank for Directed Graphs

Input: Directed graph $G_D$
Output: (1+α)-approximate PageRank; ε is the jumping probability
Rounds: $\tilde{O}(\varepsilon^{-1} \log \log n)$
Space per machine: sublinear in n
Total space: $\tilde{O}((m + n^{1+o(1)}) \, \varepsilon^{-4} \alpha^{-2})$

SLIDE 24

(Prelude) Random Walks: Undirected vs Directed

Undirected graphs:
Stationary distribution is easy to compute: deg(v) / (2m).
Stationary distribution of v is “nicely” lower-bounded.

vs

Directed graphs:
Stationary distribution can be difficult to compute.
Stationary distribution of v can be $O(1/2^n)$.

SLIDE 26

PageRank: Undirected vs Directed Graphs

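Input: $P = G D^{-1}$, $T = (1 - \varepsilon) P + \frac{\varepsilon}{n} \mathbf{1}\mathbf{1}^{\top}$
Output: Stationary distribution of $T$

$P$: walk matrix of G.
$(1 - \varepsilon) P$: following $P$ with prob. $1 - \varepsilon$.
$\frac{\varepsilon}{n} \mathbf{1}\mathbf{1}^{\top}$: jumping to a random vertex.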
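To make the definition concrete, here is a minimal sketch that computes the stationary distribution of the jump-augmented walk by power iteration on a tiny directed graph. It is a plain centralized computation, not the MPC algorithm of the talk. The function name `pagerank`, the iteration count, and the example graph are assumptions, and the code assumes every vertex has at least one out-edge.

```python
def pagerank(out_edges, eps, iterations=200):
    """Power iteration for the walk that jumps with probability eps.

    out_edges[v] lists the out-neighbours of v (assumed non-empty).
    """
    n = len(out_edges)
    x = {v: 1.0 / n for v in out_edges}
    for _ in range(iterations):
        # Jump term: every vertex receives eps / n.
        nxt = {v: eps / n for v in out_edges}
        # Walk term: v spreads (1 - eps) of its mass over its out-edges.
        for v, outs in out_edges.items():
            share = (1.0 - eps) * x[v] / len(outs)
            for u in outs:
                nxt[u] += share
        x = nxt
    return x


# Vertex 0 feeds into the directed cycle 1 -> 2 -> 3 -> 1 and has no in-edges.
graph = {0: [1], 1: [2], 2: [3], 3: [1]}
eps = 0.15
pr = pagerank(graph, eps)
print(pr[0], eps / len(graph))  # vertex 0 only receives the jump mass
```

In this example vertex 0 has stationary probability 0 under the plain walk, while under the jump-augmented walk it receives exactly the jump mass ε/n.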

SLIDE 30

PageRank: Undirected vs Directed Graphs

Undirected graphs vs Directed graphs

Input: $P = G D^{-1}$, $T = (1 - \varepsilon) P + \frac{\varepsilon}{n} \mathbf{1}\mathbf{1}^{\top}$
Output: Stationary distribution of $T$

PageRank can be approximated from random walks of $T$. [Breyer ’02]

$T$ and $P$ are “similar”.
We do not know the stationary distribution of $T$.
Stationary distribution of v w.r.t. $T$ is at least ε/n.
Stationary distribution of v w.r.t. $P$ can be $O(1/2^n)$.
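One simple way to see how random walks yield PageRank estimates is to simulate a single long walk that jumps with probability ε and to count visits. This is an illustrative Monte Carlo estimator and not necessarily the one from [Breyer ’02] or the one used in the talk. The function name `pagerank_by_walk`, the step count, and the example graph are assumptions.

```python
import random


def pagerank_by_walk(out_edges, eps, steps, seed=0):
    """Estimate PageRank as the visit frequency of one long walk.

    At each step the walk jumps to a uniformly random vertex with
    probability eps, and otherwise follows a uniformly random out-edge.
    Assumes every vertex has at least one out-edge.
    """
    rng = random.Random(seed)
    vertices = list(out_edges)
    visits = {v: 0 for v in vertices}
    current = rng.choice(vertices)
    for _ in range(steps):
        if rng.random() < eps:
            current = rng.choice(vertices)
        else:
            current = rng.choice(out_edges[current])
        visits[current] += 1
    return {v: count / steps for v, count in visits.items()}


# Vertex 0 feeds into the directed cycle 1 -> 2 -> 3 -> 1 and has no in-edges.
graph = {0: [1], 1: [2], 2: [3], 3: [1]}
estimate = pagerank_by_walk(graph, eps=0.15, steps=100_000)
print(estimate)
```

Vertex 0 is reached only through jumps, so its estimate should be close to ε/n = 0.0375, which matches the lower bound stated above.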

SLIDE 38

PageRank: Molding Undirected to Directed

PageRank for undirected G.
Random walks for $(1-\delta) G + \delta G_D$.
PageRank for $(1-\delta) G + \delta G_D$.
PageRank for $(1-2\delta) G + 2\delta G_D$.
. . .
PageRank for $\delta G + (1-\delta) G_D$.
PageRank for directed $G_D$.

“Small” changes in $T$ require a “small” increase in the number of spare walks.

PageRank can be approximated from random walks of $T$. [Breyer ’02]
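The gradual transition can be illustrated numerically. The sketch below assumes, purely for illustration, that $(1-\delta) G + \delta G_D$ means a walk that with probability $1-\delta$ follows a uniformly random edge ignoring directions and with probability $\delta$ follows a uniformly random out-edge; the slides do not spell out the definition, so this reading is an assumption. The function name `mixed_pagerank`, the step size 0.25, and the example graph are also assumptions, and the computation is centralized power iteration rather than the MPC procedure.

```python
def mixed_pagerank(out_edges, delta, eps, iterations=300):
    """PageRank of a walk mixing undirected and directed steps.

    Assumes every vertex has at least one out-edge.
    """
    n = len(out_edges)
    # Undirected version of the graph: forget edge directions.
    nbrs = {v: set() for v in out_edges}
    for v, outs in out_edges.items():
        for u in outs:
            nbrs[v].add(u)
            nbrs[u].add(v)
    x = {v: 1.0 / n for v in out_edges}
    for _ in range(iterations):
        nxt = {v: eps / n for v in out_edges}  # jump term
        for v in out_edges:
            mass = (1.0 - eps) * x[v]
            for u in nbrs[v]:  # undirected step, weight 1 - delta
                nxt[u] += (1.0 - delta) * mass / len(nbrs[v])
            for u in out_edges[v]:  # directed step, weight delta
                nxt[u] += delta * mass / len(out_edges[v])
        x = nxt
    return x


graph = {0: [1], 1: [2], 2: [3], 3: [1]}
previous = None
for k in range(5):
    delta = k / 4
    current = mixed_pagerank(graph, delta, eps=0.15)
    if previous is not None:
        change = sum(abs(current[v] - previous[v]) for v in graph)
        print(f"delta={delta:.2f}  L1 change from previous step: {change:.4f}")
    previous = current
```

Printing the L1 change between consecutive values of δ gives a feel for how much the stationary distribution moves per step of the interpolation.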

SLIDE 40

PageRank: Takeaway

1. The stationary distribution is lower-bounded by ε/n.

2. “Small” changes in a walk matrix change the stationary distribution only slightly.

SLIDE 42

Massively Parallel Computation (MPC) round

[Figure: N machines, each holding data of size S. In one round, every machine processes its data locally, and the results become the next-round data.]
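A round of this kind can be mimicked in a few lines. The sketch below is a toy single-process simulation, assuming that each machine emits key-value records and that a record is routed to the machine given by its key modulo the number of machines. The names `mpc_round`, `emit_endpoints`, and `sum_counts`, the routing rule, and the space check are all hypothetical choices for illustration. The example computes vertex degrees from a distributed edge list in two rounds.

```python
from collections import defaultdict


def mpc_round(machines, local_fn, space_limit):
    """Run one round: local processing, then routing to next-round data."""
    num_machines = len(machines)
    next_data = [[] for _ in range(num_machines)]
    for data in machines:
        if len(data) > space_limit:
            raise ValueError("machine exceeds its space S")
        # Process data locally; each output record is a (key, value) pair.
        for key, value in local_fn(data):
            # Keys are small non-negative integers here, so key % N is a valid index.
            next_data[key % num_machines].append((key, value))
    for data in next_data:
        if len(data) > space_limit:
            raise ValueError("next-round data exceeds space S")
    return next_data


def emit_endpoints(edges):
    """Round 1: every edge contributes 1 to the degree of both endpoints."""
    for u, v in edges:
        yield u, 1
        yield v, 1


def sum_counts(records):
    """Round 2: add up the contributions that arrived for each vertex."""
    totals = defaultdict(int)
    for vertex, count in records:
        totals[vertex] += count
    return totals.items()


edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
machines = [edges[:2], edges[2:]]  # N = 2 machines
state = mpc_round(machines, emit_endpoints, space_limit=6)
state = mpc_round(state, sum_counts, space_limit=6)
degrees = dict(pair for machine in state for pair in machine)
print(degrees)
```

The quantity of interest in the model is how many such rounds an algorithm needs while every machine stays within its space S.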

SLIDE 49

Massively Parallel Computation (MPC) parameters

N = # of machines

S = space per machine

[Figure: N machines, each with space ≤ S.]

For graphs, N * S = Θ(# of edges)

Goal: make the number of rounds small

Interesting case: S much smaller than the input size