Walking Randomly, Massively, and Efficiently



slide-1
SLIDE 1

Walking Randomly, Massively, and Efficiently

Jakub Łącki Slobodan Mitrović Krzysztof Onak Piotr Sankowski

slide-2
SLIDE 2

Why Random Walks?

  • Web ratings [Page, Brin, Motwani, Winograd ’99] [Berkhin ’05] [Chierichetti, Haddadan ’17]
  • Graph partitioning [Andersen, Chung, Lang ’06]
  • Random spanning trees [Kelner, Mądry ’09]
  • Property testing [Goldreich, Ron ’99] [Kaufman, Krivelevich, Ron ’04] [Czumaj, Sohler ’10] [Nachmias, Shapira ’10] [Kale, Seshadhri ’11] [Czumaj, Peng, Sohler ’15] [Chiplunkar, Kapralov, Khanna, Mousavifar, Peres ’18] [Kumar, Seshadhri, Stolman ’18] [Czumaj, Monemizadeh, Onak, Sohler ’19]
  • Connectivity [Reif ’85] [Halperin, Zwick ’94]
  • Matching [Goel, Kapralov, Khanna ’13]
  • Laplacian solvers [Andoni, Krauthgamer, Pogrow ’18]
slide-3
SLIDE 3

How to Compute Random Walks?

  • Centralized [direct implementation]
  • Streaming [Sarma, Gollapudi, Panigrahy ’11, Jin ’19]
  • Distributed (CONGEST) [Sarma, Nanongkai, Pandurangan, Tetali ’13]
  • MPC, undirected graphs (non-independent walks) [Bahmani, Chakrabarti, Xin ’11]
slide-4
SLIDE 4

How to Compute Random Walks?

  • Centralized [direct implementation]
  • Streaming [Sarma, Gollapudi, Panigrahy ’11, Jin ’19]
  • Distributed (CONGEST) [Sarma, Nanongkai, Pandurangan, Tetali ’13]
  • MPC, undirected graphs (non-independent walks) [Bahmani, Chakrabarti, Xin ’11]

Our result (undirected graphs):

Independent random walks in MPC with sublinear memory per machine.
slide-5
SLIDE 5

Our Results

Input: Undirected graph G; length L
Output: An L-length random walk per vertex; walks mutually independent
Rounds: O(log L)
Space per machine: sublinear in n
Total space: O(m L log n)

slide-6
SLIDE 6

Our Results

Input: Undirected graph G; length L
Output: An L-length random walk per vertex; walks mutually independent
Rounds: O(log L)
Space per machine: sublinear in n
Total space: O(m L log n)

Applications

  • Approximate bipartiteness testing
  • Approximate expansion testing
  • Approximate connectivity and MST
  • PageRank for directed graphs

slide-7
SLIDE 7

Our Results

Input: Undirected graph G; length L
Output: An L-length random walk per vertex; walks mutually independent
Rounds: O(log L)
Space per machine: sublinear in n
Total space: O(m L log n)

Applications

  • Approximate bipartiteness testing
  • Approximate expansion testing
  • Approximate connectivity and MST
  • PageRank for directed graphs

slide-8
SLIDE 8

Our Results

Input: Undirected graph G; length L
Output: An L-length random walk per vertex; walks mutually independent
Rounds: O(log L)
Space per machine: sublinear in n
Total space: O(m L log n)

Conditional lower bound of Ω(log L)

Applications

  • Approximate bipartiteness testing
  • Approximate expansion testing
  • Approximate connectivity and MST
  • PageRank for directed graphs

slide-9
SLIDE 9

Random Walks in Undirected Graphs

slide-10
SLIDE 10

Random Walks: Doubling by Stitching

[Figure: graph G; vertex v]

Track spare random walks. Use spare walks to double the wanted ones.

Output: deg(v) L-length random walks per vertex v; walks mutually independent

slide-11
SLIDE 11

Random Walks: Doubling by Stitching

[Figure: graph G; vertices v, w; walk length 2^i]

Track spare random walks. Use spare walks to double the wanted ones.

Output: deg(v) L-length random walks per vertex v; walks mutually independent

slide-12
SLIDE 12

Random Walks: Doubling by Stitching

[Figure: graph G; vertices v, w, x; two walks of length 2^i]

Track spare random walks. Use spare walks to double the wanted ones.

Output: deg(v) L-length random walks per vertex v; walks mutually independent

slide-13
SLIDE 13

Random Walks: Doubling by Stitching

[Figure: graph G; vertices v, w, x; walk length 2^(i+1)]

Track spare random walks. Use spare walks to double the wanted ones.

Output: deg(v) L-length random walks per vertex v; walks mutually independent
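The doubling step pictured on these slides can be sketched centrally as follows. This is a minimal sketch, assuming an adjacency-list graph with no isolated vertices. The names `step_walk` and `doubling_by_stitching`, the choice to treat one quarter of each pool as wanted walks, and the fallback that samples a fresh tail when a vertex runs out of spares are all illustrative assumptions, not the MPC algorithm from the talk.

```python
import random


def step_walk(adj, start, length, rng):
    """Walk `length` steps from `start`; return the list of visited vertices."""
    walk = [start]
    for _ in range(length):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk


def doubling_by_stitching(adj, rounds, seed=0):
    """Return (walks, shortages): deg(v) walks of 2**rounds steps per vertex v.

    Assumed budget: each vertex starts with deg(v) * 4**rounds one-step walks,
    so pool sizes stay proportional to deg(v) in every round.
    """
    rng = random.Random(seed)
    pools = {v: [step_walk(adj, v, 1, rng)
                 for _ in range(len(adj[v]) * 4 ** rounds)]
             for v in adj}
    shortages = 0
    for i in range(rounds):
        # The first quarter of each pool holds wanted walks, the rest are spares.
        next_spare = {v: len(pools[v]) // 4 for v in adj}
        new_pools = {}
        for v in adj:
            stitched = []
            for head in pools[v][: len(pools[v]) // 4]:
                w = head[-1]
                j = next_spare[w]
                if j < len(pools[w]):
                    tail = pools[w][j]      # consume an unused spare at w
                    next_spare[w] = j + 1
                else:
                    # Sketch-only fallback: w ran out of spares.
                    tail = step_walk(adj, w, 2 ** i, rng)
                    shortages += 1
                stitched.append(head + tail[1:])  # length doubles to 2**(i+1)
            new_pools[v] = stitched
        pools = new_pools
    return pools, shortages
```

Each spare is consumed at most once, so no two stitched walks share a tail segment. The `shortages` counter reports how often a vertex did not have enough spares, which is the issue the next slide raises.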

slide-14
SLIDE 14

Random Walks: Doubling by Stitching

[Figure: graph G; vertices v, w, x, y; walk length 2^(i+1)]

But how will w know a priori how many walks will pass through it?

Track spare random walks. Use spare walks to double the wanted ones.

Output: deg(v) L-length random walks per vertex v; walks mutually independent

slide-15
SLIDE 15

Random Walks: Follow Stationary Distribution

But how will w know a priori how many walks will pass through it?

slide-16
SLIDE 16

Random Walks: Follow Stationary Distribution

But how will w know a priori how many walks will pass through it?

Each vertex v maintains a number of random walks proportional to deg(v).

[Figure: graph G; vertices v, w, y]

slide-17
SLIDE 17

Random Walks: Follow Stationary Distribution

[Figure: graph G; vertices v, w, y]

But how will w know a priori how many walks will pass through it?

Each vertex v maintains a number of random walks proportional to deg(v).

In expectation, after t steps the number of walks ending at v is proportional to deg(v).
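The claim that walk counts stay proportional to deg(v) can be checked on a small example. The sketch below assumes a simple undirected graph given as adjacency lists; the helper names `stationary` and `one_step` are illustrative. It uses exact fractions to show that the distribution deg(v) / (2m) is unchanged by one step of the walk.

```python
from fractions import Fraction


def stationary(adj):
    """Return the distribution deg(v) / (2m) for an undirected graph."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    return {v: Fraction(len(adj[v]), two_m) for v in adj}


def one_step(adj, dist):
    """Push a distribution through one step of the simple random walk."""
    out = {v: Fraction(0) for v in adj}
    for v, p in dist.items():
        for w in adj[v]:
            out[w] += p / len(adj[v])  # v sends an equal share to each neighbor
    return out
```

If walks start in proportion to deg(v), the expected number ending at each vertex after any number of steps is again proportional to its degree, and the smallest entry of the distribution is at least 1/(2m).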

slide-18
SLIDE 18

Random Walks: Takeaway

  • 1. Following the stationary distribution allows us to “predict” the future.

slide-19
SLIDE 19

Random Walks: Takeaway

  • 1. Following the stationary distribution allows us to “predict” the future.
  • 2. The memory requirement is inversely proportional to the min entry of the stationary distribution (≥ 1/(2m)).

slide-20
SLIDE 20

PageRank for Directed Graphs

Input: Directed graph G_D
Output: (1+α)-approximate PageRank; ε is the jumping probability
Rounds: $\tilde{O}(\varepsilon^{-1} \log \log n)$
Space per machine: sublinear in n
Total space: $\tilde{O}((m + n^{1+o(1)}) \varepsilon^{-4} \alpha^{-2})$

slide-21
SLIDE 21

(Prelude) Random Walks: Undirected vs Directed

Undirected graphs vs Directed graphs

slide-22
SLIDE 22

(Prelude) Random Walks: Undirected vs Directed

Undirected graphs vs Directed graphs

Undirected graphs:
  • Stationary distribution is easy to compute: deg(v) / (2m).
  • Stationary distribution of v is “nicely” lower-bounded.

slide-23
SLIDE 23

(Prelude) Random Walks: Undirected vs Directed

Undirected graphs vs Directed graphs

Undirected graphs:
  • Stationary distribution is easy to compute: deg(v) / (2m).
  • Stationary distribution of v is “nicely” lower-bounded.

Directed graphs:
  • Stationary distribution can be difficult to compute.

slide-24
SLIDE 24

(Prelude) Random Walks: Undirected vs Directed

Undirected graphs vs Directed graphs

Undirected graphs:
  • Stationary distribution is easy to compute: deg(v) / (2m).
  • Stationary distribution of v is “nicely” lower-bounded.

Directed graphs:
  • Stationary distribution can be difficult to compute.
  • Stationary distribution of v can be O(1/2^n).

slide-25
SLIDE 25

PageRank: Undirected vs Directed Graphs

Input: $P = G D^{-1}$, $T = (1 - \varepsilon) P + \frac{\varepsilon}{n} \mathbf{1}\mathbf{1}^{\top}$
Output: Stationary distribution of $T$

slide-26
SLIDE 26

PageRank: Undirected vs Directed Graphs

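Input: $P = G D^{-1}$, $T = (1 - \varepsilon) P + \frac{\varepsilon}{n} \mathbf{1}\mathbf{1}^{\top}$
Output: Stationary distribution of $T$

  • $P$: walk matrix of G.
  • $(1 - \varepsilon) P$: following $P$ with prob. $1 - \varepsilon$.
  • $\frac{\varepsilon}{n} \mathbf{1}\mathbf{1}^{\top}$: jumping to a random vertex.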
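To make the definition concrete, here is a small sketch that iterates x ← T x for T = (1 − ε)P + (ε/n)11^⊤ without forming T explicitly. It assumes every vertex has at least one out-neighbor, so that P is column-stochastic. The name `pagerank_power_iteration` and the fixed iteration count are illustrative choices, and this is plain power iteration, not the MPC algorithm from the talk.

```python
def pagerank_power_iteration(out_nbrs, eps, iters=100):
    """Approximate the stationary distribution of T by repeated application.

    out_nbrs maps each vertex to its list of out-neighbors (assumed non-empty).
    """
    n = len(out_nbrs)
    x = {v: 1.0 / n for v in out_nbrs}
    for _ in range(iters):
        # Jump term: every vertex receives eps / n regardless of x.
        nxt = {v: eps / n for v in out_nbrs}
        # Walk term: v passes (1 - eps) of its mass evenly to its out-neighbors.
        for v, nbrs in out_nbrs.items():
            share = (1.0 - eps) * x[v] / len(nbrs)
            for w in nbrs:
                nxt[w] += share
        x = nxt
    return x
```

Because the jump term contributes ε/n to every vertex in every iteration, no entry of the result can fall below ε/n.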
slide-27
SLIDE 27

PageRank: Undirected vs Directed Graphs

Input: $P = G D^{-1}$, $T = (1 - \varepsilon) P + \frac{\varepsilon}{n} \mathbf{1}\mathbf{1}^{\top}$
Output: Stationary distribution of $T$

PageRank can be approximated from random walks of $T$. [Breyer ’02]
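One simple way to see this is to count where walks end. The sketch below is a plain endpoint-counting estimator and not necessarily the estimator analyzed in [Breyer ’02]. The function names, the number of walks per vertex, and the walk length are illustrative assumptions, and every vertex is assumed to have at least one out-neighbor.

```python
import random


def walk_endpoint(out_nbrs, vertices, start, length, eps, rng):
    """Run one walk of T for `length` steps and return its final vertex."""
    v = start
    for _ in range(length):
        if rng.random() < eps:
            v = rng.choice(vertices)     # jump to a uniformly random vertex
        else:
            v = rng.choice(out_nbrs[v])  # follow P: uniform random out-neighbor
    return v


def estimate_pagerank(out_nbrs, eps, walks_per_vertex=200, length=50, seed=0):
    """Estimate PageRank as the empirical distribution of walk endpoints."""
    rng = random.Random(seed)
    vertices = list(out_nbrs)
    counts = {v: 0 for v in vertices}
    for start in vertices:
        for _ in range(walks_per_vertex):
            end = walk_endpoint(out_nbrs, vertices, start, length, eps, rng)
            counts[end] += 1
    total = len(vertices) * walks_per_vertex
    return {v: c / total for v, c in counts.items()}
```

More walks reduce the sampling error of each entry; longer walks reduce the bias from the starting vertex.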

slide-28
SLIDE 28

PageRank: Undirected vs Directed Graphs

Input: $P = G D^{-1}$, $T = (1 - \varepsilon) P + \frac{\varepsilon}{n} \mathbf{1}\mathbf{1}^{\top}$
Output: Stationary distribution of $T$

PageRank can be approximated from random walks of $T$. [Breyer ’02]

Undirected graphs vs Directed graphs

$T$ and $P$ are “similar”.

slide-29
SLIDE 29

PageRank: Undirected vs Directed Graphs

Input: $P = G D^{-1}$, $T = (1 - \varepsilon) P + \frac{\varepsilon}{n} \mathbf{1}\mathbf{1}^{\top}$
Output: Stationary distribution of $T$

PageRank can be approximated from random walks of $T$. [Breyer ’02]

Undirected graphs vs Directed graphs

$T$ and $P$ are “similar”.

We do not know the stationary distribution of $T$.

slide-30
SLIDE 30

PageRank: Undirected vs Directed Graphs

Input: $P = G D^{-1}$, $T = (1 - \varepsilon) P + \frac{\varepsilon}{n} \mathbf{1}\mathbf{1}^{\top}$
Output: Stationary distribution of $T$

PageRank can be approximated from random walks of $T$. [Breyer ’02]

Undirected graphs vs Directed graphs

$T$ and $P$ are “similar”.

We do not know the stationary distribution of $T$.

Stationary distribution of v w.r.t. $T$ is at least ε/n.

Stationary distribution of v w.r.t. $P$ can be O(1/2^n).

slide-31
SLIDE 31
slide-32
SLIDE 32

PageRank: Molding Undirected to Directed

  • PageRank for undirected G.
  • PageRank for directed G_D.

slide-33
SLIDE 33

PageRank: Molding Undirected to Directed

  • PageRank for undirected G.
  • PageRank for directed G_D.

“Small” changes in $T$ require a “small” increase in the number of spare walks.

slide-34
SLIDE 34

PageRank: Molding Undirected to Directed

  • PageRank for undirected G.
  • Random walks for (1-δ)G + δG_D.
  • PageRank for directed G_D.

“Small” changes in $T$ require a “small” increase in the number of spare walks.

slide-35
SLIDE 35

PageRank: Molding Undirected to Directed

  • PageRank for undirected G.
  • Random walks for (1-δ)G + δG_D.
  • PageRank for (1-δ)G + δG_D.
  • PageRank for directed G_D.

“Small” changes in $T$ require a “small” increase in the number of spare walks.

PageRank can be approximated from random walks of $T$. [Breyer ’02]

slide-36
SLIDE 36

PageRank: Molding Undirected to Directed

  • PageRank for undirected G.
  • Random walks for (1-δ)G + δG_D.
  • PageRank for (1-δ)G + δG_D.
  • PageRank for (1-2δ)G + 2δG_D.
  • PageRank for directed G_D.

“Small” changes in $T$ require a “small” increase in the number of spare walks.

PageRank can be approximated from random walks of $T$. [Breyer ’02]

slide-37
SLIDE 37

PageRank: Molding Undirected to Directed

  • PageRank for undirected G.
  • Random walks for (1-δ)G + δG_D.
  • PageRank for (1-δ)G + δG_D.
  • PageRank for (1-2δ)G + 2δG_D.
  • . . .
  • PageRank for δG + (1-δ)G_D.
  • PageRank for directed G_D.

“Small” changes in $T$ require a “small” increase in the number of spare walks.

PageRank can be approximated from random walks of $T$. [Breyer ’02]
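The sequence of intermediate graphs can be illustrated with a small sketch. It assumes, as one possible reading of the slide, that (1-kδ)G + kδG_D denotes a convex combination of the two walk matrices; the talk may define the mixture differently. The names `walk_matrix`, `mix_walks`, and `mixing_schedule` are hypothetical, and exact fractions are used so the checks are not affected by rounding.

```python
from fractions import Fraction


def walk_matrix(out_nbrs):
    """Return {v: {w: probability of stepping from v to w}}."""
    return {v: {w: Fraction(nbrs.count(w), len(nbrs)) for w in set(nbrs)}
            for v, nbrs in out_nbrs.items()}


def mix_walks(p_undirected, p_directed, weight):
    """Return (1 - weight) * p_undirected + weight * p_directed, entry by entry."""
    mixed = {}
    for v in p_undirected:
        row = {}
        for w, p in p_undirected[v].items():
            row[w] = row.get(w, 0) + (1 - weight) * p
        for w, p in p_directed[v].items():
            row[w] = row.get(w, 0) + weight * p
        mixed[v] = row
    return mixed


def mixing_schedule(delta):
    """Yield the weights 0, delta, 2*delta, ... and finally 1."""
    k = 0
    while k * delta < 1:
        yield k * delta
        k += 1
    yield Fraction(1)
```

Each step of the schedule changes every transition probability by at most δ, which is the sense in which consecutive walk matrices differ by a “small” amount in this sketch.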


slide-39
SLIDE 39

PageRank: Takeaway

  • 1. The stationary distribution is lower-bounded by ε/n.

slide-40
SLIDE 40

PageRank: Takeaway

  • 1. The stationary distribution is lower-bounded by ε/n.
  • 2. “Small” changes in a walk matrix affect the stationary distribution by little.

slide-41
SLIDE 41
slide-42
SLIDE 42

Massively Parallel Computation (MPC) round

[Figure: N machines, each holding data of size S]


slide-44
SLIDE 44

Massively Parallel Computation (MPC) round

[Figure: N machines, each holding data of size S]

Process data locally

slide-45
SLIDE 45

Massively Parallel Computation (MPC) round

[Figure: N machines, each holding data of size S, and the next-round data]

slide-46
SLIDE 46

Massively Parallel Computation (MPC) round

[Figure: N machines, each holding data of size S, and the next-round data]

One round
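A single round, meaning local processing followed by redistribution of the results, can be simulated in a few lines. This is a toy sketch: the names `mpc_round`, `emit_endpoints`, and `sum_by_key` are hypothetical, messages are routed by hashing their key, and the per-machine space limit S is not enforced. The example computes vertex degrees from edges spread over machines.

```python
from collections import defaultdict


def mpc_round(machines, local_fn, n_machines):
    """Run local_fn on each machine's data, then route messages by key."""
    inboxes = [[] for _ in range(n_machines)]
    for items in machines:
        for key, msg in local_fn(items):  # local computation
            inboxes[hash(key) % n_machines].append((key, msg))  # communication
    return inboxes  # next-round data, one list per machine


def emit_endpoints(edges):
    """Local step: send a 1 to each endpoint of every locally stored edge."""
    for u, v in edges:
        yield u, 1
        yield v, 1


def sum_by_key(items):
    """Next-round local step: add up the messages received for each key."""
    totals = defaultdict(int)
    for key, msg in items:
        totals[key] += msg
    return dict(totals)
```

All messages for a given vertex land on the same machine, so each degree is computed in one place after a single round.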


slide-48
SLIDE 48

Massively Parallel Computation (MPC) parameters

N = # of machines

S = space per machine

For graphs, N * S = Θ(# of edges)

[Figure: a machine with space S]

slide-49
SLIDE 49

Massively Parallel Computation (MPC) parameters

N = # of machines

S = space per machine

For graphs, N * S = Θ(# of edges)

Goal: make the # of rounds small

[Figure: machines, each with space ≤ S]

Interesting case: S much smaller than the input size
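The relation between N, S, and the number of edges can be illustrated by splitting an edge list across machines. The sketch assumes a round-robin placement and a hypothetical helper `partition_edges`; real MPC systems may distribute the input differently.

```python
import math


def partition_edges(edges, space_per_machine):
    """Split edges over N = ceil(m / S) machines, at most S edges each."""
    n_machines = math.ceil(len(edges) / space_per_machine)
    machines = [[] for _ in range(n_machines)]
    for idx, edge in enumerate(edges):
        machines[idx % n_machines].append(edge)  # round-robin placement
    return machines
```

With m edges and space S per machine this uses ceil(m / S) machines, so N * S stays within a constant factor of the number of edges.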