Graph Streaming and Sketching, Lecture 19, Nov 5, 2020, Chandra



SLIDE 1

CS 498ABD: Algorithms for Big Data

Graph Streaming and Sketching

Lecture 19

Nov 5, 2020

Chandra (UIUC), CS498ABD, Fall 2020
SLIDE 3

Graphs

$G = (V, E)$ is an undirected graph with $n = |V|$ and $m = |E|$.
Edges $e_1, e_2, \ldots, e_m$ are seen as a stream; $n$ is known in advance.
Questions:
What graph problems can be solved with small space?
Can we handle edge deletions?

SLIDE 4

Semi-streaming Model

Lower bounds show that we require $\Omega(n)$ memory.
Assume we have $\Theta(n \,\mathrm{polylog}(n))$ memory: about $\mathrm{polylog}(n)$ per vertex of the graph.
Can solve several interesting problems.
Essentially reduces dense graphs to sparse graphs.

SLIDE 5

Connectivity

Is $G$ connected? Output a spanning tree if it is.
Output an MST of $G$ in the weighted case.
Is $G$ $k$-edge-connected?

SLIDE 6

Basic Connectivity

Maintain a spanning forest: need only $O(n)$ edges.
When edge $e_i = (u, v)$ arrives:
If $u$ and $v$ are in different components, add $e_i$ to the spanning forest.
Otherwise discard $e_i$.
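A minimal sketch of the rule above, assuming vertices are labeled 0..n-1 and edges arrive as pairs. The names `UnionFind` and `streaming_spanning_forest` are illustrative and not from the lecture; the union-find structure is one common way to answer "are u and v in different components?" quickly.

```python
class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, u, v):
        """Merge the components of u and v; return False if already merged."""
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]
        return True


def streaming_spanning_forest(n, edge_stream):
    """One pass over the stream, storing at most n - 1 edges."""
    uf = UnionFind(n)
    forest = []
    for u, v in edge_stream:
        if uf.union(u, v):          # endpoints were in different components
            forest.append((u, v))
        # otherwise the edge closes a cycle and is discarded
    return forest


stream = [(0, 1), (1, 2), (0, 2), (3, 4), (2, 3)]
forest = streaming_spanning_forest(5, stream)
connected = len(forest) == 5 - 1    # a spanning tree has exactly n - 1 edges
```

At the end of the stream the graph is connected exactly when the forest has n - 1 edges, so the memory used is O(n) words regardless of m.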

Given the stream $e_1, e_2, \ldots, e_m$, we want to know at the end of the stream whether $G$ is connected.

SLIDE 12

MST

Maintain a spanning forest: need only $O(n)$ edges.
When edge $e_i = (u, v)$ arrives:
If $u$ and $v$ are in different components, add $e_i$ to the spanning forest.
What if $u$ and $v$ are in the same connected component? Add $e_i$, check the cycle it forms, and discard the heaviest edge in that cycle (possibly $e_i$ itself).
Exercise: Prove that the algorithm outputs an MST if $G$ is connected.
Note: we did not focus on the time to process each edge in the stream. Can use data structures to implement in $O(\log n)$ time per operation.
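The cycle rule can be sketched as follows, assuming weighted edges arrive as triples (u, v, w). The helper names `tree_path` and `streaming_mst` are hypothetical. This version finds the cycle by a plain search of the current forest, so it spends O(n) time per edge; it does not attempt the O(log n) data structures mentioned in the note.

```python
from collections import defaultdict


def tree_path(adj, src, dst):
    """Edges (a, b, w) on the unique forest path from src to dst, or None."""
    prev = {src: None}              # node -> (previous node, edge weight)
    stack = [src]
    while stack:
        node = stack.pop()
        if node == dst:
            break
        for nbr, w in adj[node].items():
            if nbr not in prev:
                prev[nbr] = (node, w)
                stack.append(nbr)
    if dst not in prev:
        return None
    path, node = [], dst
    while prev[node] is not None:
        parent, w = prev[node]
        path.append((parent, node, w))
        node = parent
    return path


def streaming_mst(edge_stream):
    """Process edges (u, v, w) one at a time, keeping only a forest."""
    adj = defaultdict(dict)         # adj[u][v] = weight of forest edge {u, v}
    for u, v, w in edge_stream:
        if u == v:
            continue                # ignore self-loops
        path = tree_path(adj, u, v)
        if path is not None:        # the new edge closes a cycle
            a, b, heaviest = max(path, key=lambda e: e[2])
            if heaviest <= w:
                continue            # the new edge is a heaviest edge on its cycle
            del adj[a][b]
            del adj[b][a]
        adj[u][v] = w
        adj[v][u] = w
    return sorted((a, b, w) for a in adj for b, w in adj[a].items() if a < b)


stream = [(0, 1, 4), (1, 2, 3), (0, 2, 1), (2, 3, 5), (1, 3, 2)]
mst = streaming_mst(stream)
```

On this stream the edge (0, 1, 4) is evicted when (0, 2, 1) arrives, and (2, 3, 5) is evicted when (1, 3, 2) arrives, leaving a tree of total weight 6.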

SLIDE 16

k-edge-connectivity

Definition: A graph $G = (V, E)$ is $k$-edge-connected if deleting any $k - 1$ edges still leaves a connected graph.
Definition: Given a graph $G = (V, E)$ and $S \subset V$, $\delta(S)$ is the set of edges with exactly one endpoint in $S$.
Lemma: A graph $G$ is $k$-edge-connected iff $|\delta(S)| \ge k$ for all nonempty proper subsets $S \subset V$.
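As a sanity check of the lemma on tiny graphs, the brute-force sketch below compares the cut characterization with the deletion-based definition. It is exponential in the graph size and only meant for illustration; the function names are hypothetical, and vertices are assumed to be 0..n-1 with n at least 2.

```python
from itertools import combinations


def min_cut_size(n, edges):
    """Smallest |delta(S)| over all nonempty proper subsets S of range(n)."""
    best = len(edges)
    for size in range(1, n // 2 + 1):   # S and its complement give the same cut
        for subset in combinations(range(n), size):
            s = set(subset)
            crossing = sum((u in s) != (v in s) for u, v in edges)
            best = min(best, crossing)
    return best


def is_connected(n, edges):
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        for nbr in adj[stack.pop()]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return len(seen) == n


def k_edge_connected_by_deletion(n, edges, k):
    """Definition: removing any k - 1 edges leaves a connected graph."""
    r = min(k - 1, len(edges))
    return all(
        is_connected(n, [e for i, e in enumerate(edges) if i not in removed])
        for removed in combinations(range(len(edges)), r)
    )


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
k4 = list(combinations(range(4), 2))
agree = all(
    (min_cut_size(4, g) >= k) == k_edge_connected_by_deletion(4, g, k)
    for g in (cycle, k4)
    for k in (1, 2, 3, 4)
)
```

The 4-cycle has minimum cut 2 and the complete graph on 4 nodes has minimum cut 3, and both tests agree for every k tried.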

SLIDE 21

Sparse certificates for k-edge connectivity

Observation: If $G$ is $k$-edge-connected then $m \ge kn/2$. Why?
Question: Suppose $G$ is an edge-minimal $k$-edge-connected graph on $n$ nodes. What is an upper bound on the number of edges?
Theorem: An edge-minimal $k$-edge-connected graph on $n$ nodes has at most $k(n - 1)$ edges.
Theorem: Given a graph $G$, finding the smallest 2-edge-connected spanning subgraph is NP-Hard.
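One way to see the observation, assuming $n \ge 2$, is to apply the cut lemma to singleton sets $S = \{v\}$ and then count degrees:

```latex
\[
  \deg(v) = |\delta(\{v\})| \ge k \quad \text{for every } v \in V
  \qquad\Longrightarrow\qquad
  2m = \sum_{v \in V} \deg(v) \ge kn
  \qquad\Longrightarrow\qquad
  m \ge \frac{kn}{2}.
\]
```

So a $k$-edge-connected graph needs at least $kn/2$ edges, while the first theorem says that $k(n-1)$ edges always suffice; the two bounds differ by roughly a factor of 2.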

SLIDE 22

Sparse certificates for k-edge connectivity

Theorem: An edge-minimal $k$-edge-connected graph on $n$ nodes has at most $k(n - 1)$ edges.
Constructive proof via algorithm.

For $i = 1$ to $k$ do
    Let $F_i$ be a (maximal) spanning forest in $(V, E \setminus \bigcup_{j=1}^{i-1} F_j)$
Output $H = (V, F_1 \cup F_2 \cup \cdots \cup F_k)$

Claim: $G$ is 1-edge-connected iff $F_1$ is 1-edge-connected.
Claim: $G$ is 2-edge-connected iff $F_1 \cup F_2$ is 2-edge-connected.
Claim: $G$ is 3-edge-connected iff $F_1 \cup F_2 \cup F_3$ is 3-edge-connected.

Easy to see that $H$ has at most $k(n - 1)$ edges.
Lemma: $H$ is $k$-edge-connected if $G$ is.

SLIDE 25

Streaming setting

For $i = 1$ to $k$ do
    Let $F_i$ be a spanning forest in $(V, E \setminus \bigcup_{j=1}^{i-1} F_j)$
Output $H = (V, F_1 \cup F_2 \cup \cdots \cup F_k)$

The algorithm can be implemented in the streaming setting. How?

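Maintain $F_1, F_2, \ldots, F_k$: when an edge arrives, add it to the first forest $F_i$ in which its endpoints lie in different components, and discard it if there is no such forest.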
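A minimal sketch of maintaining the k forests in one pass, assuming vertices 0..n-1 and an insert-only stream. `UnionFind` and `streaming_k_forests` are illustrative names, not from the lecture. Each forest keeps its own union-find structure, so the total space is O(kn).

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, u, v):
        """Merge the components of u and v; return False if already merged."""
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False
        self.parent[rv] = ru
        return True


def streaming_k_forests(n, k, edge_stream):
    """Return [F_1, ..., F_k]; their union has at most k(n - 1) edges."""
    forests = [[] for _ in range(k)]
    components = [UnionFind(n) for _ in range(k)]
    for u, v in edge_stream:
        for forest, uf in zip(forests, components):
            if uf.union(u, v):      # first forest where u, v are not yet joined
                forest.append((u, v))
                break
        # if every forest already connects u and v, the edge is discarded
    return forests


k4_stream = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
forests = streaming_k_forests(4, 2, k4_stream)
```

Here F_2 only ever sees the edges rejected by F_1, so it is a spanning forest of the remaining edges, matching the offline loop.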

SLIDE 27

k-node-connectivity

Definition: A graph $G = (V, E)$ is $k$-node-connected (or $k$-vertex-connected) if deleting any $k - 1$ nodes leaves a connected graph.
Theorem: An edge-minimal $k$-node-connected graph on $n$ nodes has at most $kn$ edges.
The above theorem is much trickier to prove than the one for the edge case. See [Zelke] for references and a streaming algorithm.

SLIDE 28

Part I Graph sketching for connectivity

SLIDE 29
Example stream with deletions: add $(u, v)$, add $(u, w)$, delete $(u, w)$, $\ldots$
SLIDE 30

Graph sketching

We saw previously that linear sketching on vectors $x$ allows for several powerful applications, including the ability to handle deletions.
Graph streaming with deletions: each token in the stream is of the form $(e, \Delta)$ where $e$ is an edge and $\Delta \in \{-1, 1\}$.
Want to maintain a sketch/data structure of size $O(n \,\mathrm{polylog}(n))$ such that one can answer basic questions.
Example: connectivity queries.

SLIDE 31

Linear sketching recap

Vector $x \in \mathbb{R}^n$ that is updated one coordinate at a time.
Pick a sketch matrix $M_r \in \mathbb{R}^{k \times n}$ and maintain the sketch $M_r x$ of dimension $k$.
The sketch matrix $M_r$ depends on a random string $r$; it is implicitly defined and not explicitly stored.
Assumption: $M_r \mathbf{1}_i$, for the vector $\mathbf{1}_i$ (which has 1 in the $i$'th coordinate and 0 in all other entries), can be computed efficiently from $r$.
When $x$ is updated to $x + \alpha \mathbf{1}_i$ we update the sketch by adding $\alpha M_r \mathbf{1}_i$.
Do postprocessing of $M_r x$.
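The recap can be made concrete with a toy sketch. Here the matrix is assumed to have random sign entries, and each column is regenerated on demand from the seed and the column index. The class name `LinearSketch` and the use of Python's `random.Random` as the source of the "random string r" are illustrative choices, not the lecture's construction.

```python
import random


class LinearSketch:
    """Maintains y = M_r x for a k x n sign matrix defined implicitly by a seed."""

    def __init__(self, k, seed):
        self.k = k
        self.seed = seed
        self.y = [0] * k

    def column(self, i):
        """Column M_r 1_i, regenerated from (seed, i) and never stored."""
        rng = random.Random(f"{self.seed}:{i}")
        return [rng.choice((-1, 1)) for _ in range(self.k)]

    def update(self, i, alpha):
        """Apply x <- x + alpha * 1_i by adding alpha * M_r 1_i to the sketch."""
        for row, entry in enumerate(self.column(i)):
            self.y[row] += alpha * entry


sketch = LinearSketch(k=4, seed=2020)
x = [0] * 10
for i, alpha in [(3, 5), (7, -2), (3, -5), (1, 4)]:
    x[i] += alpha
    sketch.update(i, alpha)

# Recompute M_r x directly from the final vector for comparison.
direct = [sum(x[i] * sketch.column(i)[row] for i in range(10)) for row in range(4)]
```

The incrementally maintained sketch equals the sketch of the final vector, including the coordinate that was inserted and then deleted; this is the property that makes deletions harmless.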

SLIDE 32

$\ell_0$ sampling in turnstile model

$\|x\|_0$ is the number of non-zero coordinates (distinct elements).
$\ell_0$-sampling: output a non-zero coordinate of $x$ near uniformly.
Can be done with an $O(\log^2 n)$-sized sketch.
Note: allow positive and negative entries in $x$.
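To give a feel for how such a sampler can work, here is a simplified sketch: subsample coordinates at geometrically decreasing rates and, at each level, keep three linear counters that allow recovery when exactly one non-zero coordinate survives. Everything here is an assumption made for illustration: the class name `L0Sampler`, the modulus `P`, and the use of per-index seeding of Python's `random` in place of a proper hash family. It does not claim the $O(\log^2 n)$ space bound or a near-uniformity guarantee.

```python
import random

P = (1 << 61) - 1   # prime modulus for the fingerprint (assumed choice)


class L0Sampler:
    def __init__(self, n, seed):
        self.n = n
        self.seed = seed
        self.levels = n.bit_length() + 1
        self.z = random.Random(f"{seed}:z").randrange(1, P)
        # per level: [sum of x_i, sum of i * x_i, sum of x_i * z^i mod P]
        self.cells = [[0, 0, 0] for _ in range(self.levels)]

    def depth(self, i):
        """Deepest level keeping coordinate i; level j keeps i with probability 2^-j."""
        u = random.Random(f"{self.seed}:h:{i}").random()
        d = 0
        while d + 1 < self.levels and u < 2.0 ** -(d + 1):
            d += 1
        return d

    def update(self, i, delta):
        """Linear update for x_i <- x_i + delta (delta may be negative)."""
        for cell in self.cells[: self.depth(i) + 1]:
            cell[0] += delta
            cell[1] += delta * i
            cell[2] = (cell[2] + delta * pow(self.z, i, P)) % P

    def sample(self):
        """Return an index believed to be non-zero, or None on failure."""
        for s0, s1, s2 in self.cells:
            if s0 != 0 and s1 % s0 == 0:
                i = s1 // s0
                # fingerprint test: passes when the level holds only coordinate i
                if 0 <= i < self.n and s2 == (s0 * pow(self.z, i, P)) % P:
                    return i
        return None


one = L0Sampler(16, seed=1)
one.update(5, 1)
one.update(9, 1)
one.update(9, -1)               # deletion: x is now 1-sparse again
recovered = one.sample()

many = L0Sampler(16, seed=1)
for idx, delta in [(2, 1), (7, -1), (11, 1)]:
    many.update(idx, delta)
picked = many.sample()          # an element of {2, 7, 11}, or None on failure
```

All counters are linear in $x$, so two samplers built with the same seed can be added coordinate-wise to obtain the sampler of the sum of their vectors.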

SLIDE 36

Sketching for graphs

Consider the vector $f \in \mathbb{R}^{\binom{n}{2}}$ where $f_i \in \{0, 1\}$ indicates whether edge $i$ in the complete graph on $n$ nodes is in the graph or not.
Example: for $n = 4$ the coordinates are indexed by the pairs $(12), (13), (14), (23), (24), (34)$.
Sketching $f$ is not adequate for most graph applications. We need information about edges incident to each vertex.
For node $v$ let $f_v \in \mathbb{R}^{\binom{n}{2}}$ be a vector that only considers edges incident to $v$ in the complete graph. Essentially the row of $v$ in the adjacency matrix.
Why use $\binom{n}{2}$ dimensions? To be able to use linear operations over different nodes.
We sketch each $f_v$ using the same sketch matrix $M$ and this takes $O(n \,\mathrm{polylog}(n))$ space.

SLIDE 38

Sketching for graphs: connectivity

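For connectivity the following specific representation is useful.
Assume wlog that $V = [n]$.
Define the vector $a(i)$ for node $i$, of dimension $\binom{n}{2}$, as follows (coordinates are indexed by pairs $\{k, j\}$ with $k < j$, and only edges of the graph get non-zero entries):
$a(i)(\{k, j\}) = 0$ if $i \neq k$ and $i \neq j$ (edge is not incident to $i$)
$a(i)(\{k, j\}) = 1$ if $i = k$ and $i < j$ (edge is incident to $i$ and neighbor has higher index)
$a(i)(\{k, j\}) = -1$ if $i = j$ and $k < i$ (edge is incident to $i$ and neighbor has lower index)
Lemma: Suppose $S \subset [n]$. Then $\sum_{i \in S} a(i)$ is the representation for the node obtained by contracting $S$ into a single node. In particular, its non-zero coordinates are exactly the edges in $\delta(S)$, since the $+1$ and $-1$ entries of an edge inside $S$ cancel.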
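A small worked example of the lemma, assuming the sparse dictionary representation below (zero coordinates are simply omitted). The helpers `node_vector` and `add_vectors` are hypothetical names introduced for this illustration.

```python
def node_vector(i, edges):
    """a(i) as a dict from pairs (k, j), k < j, to +1 / -1 (zeros omitted)."""
    vec = {}
    for u, v in edges:
        k, j = min(u, v), max(u, v)
        if i == k:
            vec[(k, j)] = 1         # neighbor j has the higher index
        elif i == j:
            vec[(k, j)] = -1        # neighbor k has the lower index
    return vec


def add_vectors(vectors):
    """Coordinate-wise sum, dropping coordinates that cancel to zero."""
    total = {}
    for vec in vectors:
        for pair, value in vec.items():
            total[pair] = total.get(pair, 0) + value
    return {pair: value for pair, value in total.items() if value != 0}


# 4-cycle on V = {1, 2, 3, 4}; contract S = {1, 2}.
edges = [(1, 2), (2, 3), (3, 4), (1, 4)]
S = {1, 2}
contracted = add_vectors(node_vector(i, edges) for i in S)
crossing = {(min(u, v), max(u, v)) for u, v in edges if (u in S) != (v in S)}
```

The internal edge {1, 2} contributes +1 from a(1) and -1 from a(2), so it vanishes from the sum, and only the two edges leaving S remain.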
SLIDE 40

Connectivity using sketching

Setting: stream of edge updates $(e_i, \Delta_i)$ where $e_i$ specifies the endpoints and $\Delta_i \in \{-1, 1\}$ ($1$ for insert, $-1$ for delete). Strict turnstile.
Want to know if $G$ is connected at the end of the stream, and find a spanning tree.
Want to use $O(n \log^c n)$ space for some small $c$.

SLIDE 42

Offline algorithm

Consider the following “parallel” algorithm for spanning tree computation, similar to Borůvka's algorithm for MST:
Start with each vertex in a separate connected component.
In each round, each connected component picks a single edge leaving it.
All chosen edges are added and the connected components are updated (equivalently, shrink each connected component into a single node).
Repeat until the graph has a single connected component (or equivalently we have only one node).
The algorithm terminates in $O(\log n)$ iterations, since the number of components at least halves in each round.
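The offline rounds can be sketched as below, assuming vertices 0..n-1 and an explicit edge list (so this is the offline version, with no sketching). The function name `boruvka_spanning_forest` is illustrative, and "picks a single edge" is implemented as "picks the first leaving edge in list order", which is an arbitrary choice.

```python
def boruvka_spanning_forest(n, edges):
    """Return (forest edges, number of rounds) for the round-based algorithm."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    forest, rounds = [], 0
    while True:
        # Each component picks one edge leaving it.
        chosen = {}
        for u, v in edges:
            ru, rv = find(u), find(v)
            if ru != rv:
                chosen.setdefault(ru, (u, v))
                chosen.setdefault(rv, (u, v))
        if not chosen:                      # no component has a leaving edge
            return forest, rounds
        rounds += 1
        # Add the chosen edges and merge components.
        for u, v in chosen.values():
            ru, rv = find(u), find(v)
            if ru != rv:                    # skip edges that would close a cycle
                parent[ru] = rv
                forest.append((u, v))


path = [(i, i + 1) for i in range(7)]       # path on 8 vertices
forest, rounds = boruvka_spanning_forest(8, path)
```

For a connected graph the returned forest has n - 1 edges, and the round counter stays within about log2(n) because every component with a leaving edge merges with at least one other component per round.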

SLIDE 46

Emulation via sketching

Focus on implementing the first iteration of the offline algorithm.
Pick a sketching matrix $M$ and keep the sketch $M a(i)$ for each $i \in [n]$ while edges are seen in the stream. Note: each edge $e = (i, j)$ updates $a(i)$ and $a(j)$.
After seeing all edges, use $\ell_0$ sampling from the sketch to pick a non-zero coordinate of $a(i)$, which corresponds to an edge incident to node $i$.
Total sketch size is $O(n \log^c n)$ to enable correctness of $\ell_0$ sampling with high probability.
We need to recurse after picking edges in the first iteration and contract to create a new contracted graph.
But the contracted graph depends on the sketch and we cannot make another pass!
Linearity to the rescue!

SLIDE 48

Emulation via sketching

Implementing two iterations of the offline algorithm.
Pick independent sketching matrices $M_1$ and $M_2$ and keep sketches $M_1 a(i)$ and $M_2 a(i)$ for each $i$ as before.
Let $H$ be the contracted graph obtained by using $M_1$ for the first iteration.
Suppose $S$ is a connected component that gets contracted to a node $v$. By the lemma we have a sketch for the nodes in graph $H$! $M_2 a(v) = \sum_{i \in S} M_2 a(i)$.

Question: Why do we need $M_2$? Can we not use $M_1$ itself?

SLIDE 50

Emulation via sketching

Implementing the offline algorithm.
Pick independent sketching matrices $M_1, M_2, \ldots, M_t$ where $t = O(\log n)$ and keep sketches $M_j a(i)$ for each node $i$ and for each $1 \le j \le t$.
Total space is $O(n \log^c n)$ since $t = O(\log n)$.
Use $M_j$, via linearity, for the contracted graph in iteration $j$ to create the graph for the next iteration.
Correctness requires that each iteration succeeds with high probability. Use a union bound over iterations (the sketches are independent, so $M_j$ is independent of the contracted graph it is applied to) and in each iteration use a union bound over all vertices (using the high probability guarantee of $\ell_0$ sampling).
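The union bound can be written out as follows. The per-sample failure probability $\delta$ and the specific choice $\delta \le n^{-3}$ are assumptions made here for illustration; the slides only say "high probability".

```latex
\[
  \Pr[\text{some } \ell_0 \text{ sample fails}]
  \;\le\; \sum_{j=1}^{t} \sum_{v} \delta
  \;\le\; t \cdot n \cdot \delta
  \;\le\; O(\log n) \cdot n \cdot \frac{1}{n^{3}}
  \;=\; O\!\left(\frac{\log n}{n^{2}}\right).
\]
```

The inner sum is over the at most $n$ nodes of the contracted graph in iteration $j$, and the outer sum is over the $t = O(\log n)$ iterations.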
