Data Streams & Communication Complexity
Lecture 2: Graph Spanners, Sparsifiers, & Sketches Andrew McGregor, UMass Amherst
Graph Streams
◮ Consider a stream of m edges e_1, e_2, ..., e_m defining a graph G with n nodes.
◮ Semi-streaming: What can we compute with O(n · polylog n) space?
◮ Goal: Approximate the length of the shortest path d_G(u, v) between any pair of nodes u, v.
◮ Goal: Compute the number of connected components.
◮ Algorithm: Maintain a spanning forest F
  ◮ F ← ∅
  ◮ For each edge (u, v), if u and v aren’t connected in F, F ← F ∪ {(u, v)}
◮ Analysis:
  ◮ F has the same number of connected components as G
  ◮ F has at most n − 1 edges.
◮ Thm: Can count connected components in O(n log n) space.
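The spanning-forest algorithm above can be sketched in a few lines. This is a minimal illustration, assuming nodes are labelled 0, ..., n − 1 and using a union-find structure to answer "are u and v connected in F?"; the function name count_components is a hypothetical choice, not something from the slides.

```python
def count_components(n, edge_stream):
    """One pass over the edges, storing only a spanning forest F."""
    parent = list(range(n))  # union-find over the nodes 0..n-1 (assumed labelling)

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []  # F, at most n - 1 edges
    for u, v in edge_stream:
        ru, rv = find(u), find(v)
        if ru != rv:  # u and v are not yet connected in F
            parent[ru] = rv
            forest.append((u, v))
    # Each forest edge merges two components, so n - |F| components remain.
    return n - len(forest), forest


count, forest = count_components(6, [(0, 1), (1, 2), (0, 2), (3, 4)])
print(count, forest)  # 3 [(0, 1), (1, 2), (3, 4)]
```

The edge (0, 2) is discarded because 0 and 2 are already connected in F, so only the forest is ever held in memory.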
◮ Algorithm:
  ◮ H ← ∅.
  ◮ For each edge (u, v), if d_H(u, v) ≥ 2t, H ← H ∪ {(u, v)}
◮ Analysis:
  ◮ Distances increase by at most a factor 2t − 1 since an edge (u, v) is only discarded when H already contains a path between u and v of length at most 2t − 1.
  ◮ Lemma: H has O(n^{1+1/t}) edges since all cycles have length ≥ 2t + 1.
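The greedy rule above can be tried out directly. The following is a small sketch, assuming an unweighted graph and using a depth-limited BFS to test whether d_H(u, v) ≥ 2t; the helper names within_distance and greedy_spanner are illustrative only.

```python
from collections import deque


def within_distance(adj, source, target, limit):
    """True if target is at most `limit` hops from source in the graph adj."""
    if source == target:
        return True
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        if dist[x] == limit:
            continue  # do not explore beyond the hop limit
        for y in adj.get(x, ()):
            if y not in dist:
                if y == target:
                    return True
                dist[y] = dist[x] + 1
                queue.append(y)
    return False


def greedy_spanner(edge_stream, t):
    """Keep edge (u, v) only if d_H(u, v) >= 2t, i.e. no path of length <= 2t - 1."""
    adj, kept = {}, []
    for u, v in edge_stream:
        if not within_distance(adj, u, v, 2 * t - 1):
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
            kept.append((u, v))
    return kept


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(greedy_spanner(cycle, 2))  # [(0, 1), (1, 2), (2, 3)]
```

With t = 2 the closing edge (3, 0) is dropped because H already has a path of length 3 = 2t − 1 between its endpoints; with t = 1 all four edges would be kept.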
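◮ Let d = 2m/n be the average degree of H.
◮ Let J be the graph formed by repeatedly removing all nodes with degree less than d/2 = m/n. J is nonempty, since fewer than n · d/2 = m edges are removed in total.
◮ Grow a BFS of depth t from an arbitrary node in J.
◮ Because a) there are no cycles of length less than 2t + 1 and b) all degrees in J are at least m/n, the BFS tree contains at least (m/n − 1)^t nodes.
◮ But (m/n − 1)^t ≤ |J| ≤ n and therefore m ≤ n + n^{1+1/t}.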
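The last step of the argument is a short rearrangement. Written out (for the case m ≥ n; when m < n the bound holds trivially):

```latex
\[
\left(\frac{m}{n} - 1\right)^{t} \le n
\;\Longrightarrow\;
\frac{m}{n} - 1 \le n^{1/t}
\;\Longrightarrow\;
m \le n + n \cdot n^{1/t} = n + n^{1+1/t}.
\]
```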
◮ Goal: Approximate capacity C_G(S) of any cut (S, V \ S) in G.
◮ Idea: Use a sparsification algorithm A as a black box to recursively sparsify the graph stream.
◮ Divide stream into segments G_1, G_2, ... each of t = O(n ε^{-2}) edges.
◮ Consider binary tree over segments:
    G_1   G_2   G_3   G_4   G_5   G_6   G_7   G_8
    G_1∪G_2   G_3∪G_4   G_5∪G_6   G_7∪G_8
    G_1∪G_2∪G_3∪G_4   G_5∪G_6∪G_7∪G_8
    G = G_1∪G_2∪G_3∪G_4∪G_5∪G_6∪G_7∪G_8
◮ Recursively use A with parameter 1 + γ:
  ◮ Read in G_1: compute A(G_1) and forget G_1
  ◮ Read in G_2: compute A(G_2) and forget G_2
  ◮ Compute A(A(G_1) ∪ A(G_2)) and forget A(G_1) and A(G_2)
  ◮ Read in G_3: compute A(G_3) and forget G_3
  ◮ Read in G_4: compute A(G_4) and forget G_4
  ◮ Compute A(A(G_3) ∪ A(G_4)) and forget A(G_3) and A(G_4)
  ◮ Compute A(A(A(G_1) ∪ A(G_2)) ∪ A(A(G_3) ∪ A(G_4))) and forget . . .
◮ Results in a (1 + γ)^{log m}-sparsifier for G in O(n γ^{-2} log m) space.
◮ If γ = O(ε/ log m), we get (1 + ε)-sparsifier in O(n ε^{-2} log^3 m) space.
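The bookkeeping of this recursion can be sketched independently of the sparsifier itself. In the sketch below, the argument sparsify stands in for the black box A (any function mapping an edge list to a smaller summary); both that name and merge_and_reduce are hypothetical, and the toy A used in the example merely removes duplicate edges rather than preserving cuts.

```python
def merge_and_reduce(edge_stream, segment_size, sparsify):
    """Apply `sparsify` along a binary tree over stream segments.

    levels[i] holds at most one summary covering 2**i segments, so only
    O(log(#segments)) summaries are stored at any time.
    """
    levels = []

    def push(summary):
        i = 0
        # Like binary addition: merge equal-height summaries and carry upward.
        while i < len(levels) and levels[i] is not None:
            summary = sparsify(levels[i] + summary)
            levels[i] = None
            i += 1
        if i == len(levels):
            levels.append(summary)
        else:
            levels[i] = summary

    buffer = []
    for edge in edge_stream:
        buffer.append(edge)
        if len(buffer) == segment_size:
            push(sparsify(buffer))  # compute A(G_j), then forget G_j
            buffer = []
    if buffer:
        push(sparsify(buffer))

    # Combine whatever summaries remain (needed when #segments is not a power of 2).
    result = []
    for summary in levels:
        if summary is not None:
            result = sparsify(result + summary) if result else summary
    return result


def dedupe(edges):
    # Toy stand-in for A: NOT a cut sparsifier, only removes repeated edges.
    return sorted(set(edges))


stream = [(1, 2), (2, 3), (1, 2), (3, 4), (4, 5)]
print(merge_and_reduce(stream, 2, dedupe))  # [(1, 2), (2, 3), (3, 4), (4, 5)]
```

Each stored summary has passed through A once per tree level, which is where the (1 + γ)^{log m} error factor comes from.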
◮ Consider a stream of edge insertions and deletions, e.g.,
[Figure: example graph on nodes 1, 2, 3, 4, 5]
◮ Dynamic semi-streaming: What can we compute about a dynamic graph stream with O(n · polylog n) space?
◮ Goal: Test whether G is connected.
◮ Our algorithm will actually return a spanning forest of G.
◮ Idea: Emulate above algorithm in a single pass using ℓ0-sampling of
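◮ Represent graph on [n] with edges E ⊂ [n] × [n] as an n × (n choose 2) matrix
[Figure: example graph on nodes 1 to 5 and its matrix, with columns indexed by the node pairs (1,2), (1,3), (1,4), (1,5), (2,3), (2,4), (2,5), (3,4), (3,5), (4,5)]
◮ Lemma: For S ⊂ [n], support(Σ_{i∈S} a_i) = E(S) where a_i is the ith row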
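The lemma can be checked on a small example. The sketch below assumes the usual signed encoding, which is not spelled out in the text above: for an edge {j, k} with j < k, row j has +1 and row k has −1 in column (j, k), and all other entries are 0. It also takes E(S) to mean the edges with exactly one endpoint in S. The example edge set is made up for illustration.

```python
from itertools import combinations


def incidence_rows(n, edges):
    """Rows a_1..a_n of the n x (n choose 2) signed node-pair matrix (assumed encoding)."""
    pairs = list(combinations(range(1, n + 1), 2))  # column order (1,2), (1,3), ...
    col = {p: j for j, p in enumerate(pairs)}
    rows = {i: [0] * len(pairs) for i in range(1, n + 1)}
    for u, v in edges:
        lo, hi = min(u, v), max(u, v)
        rows[lo][col[(lo, hi)]] = 1    # smaller endpoint gets +1
        rows[hi][col[(lo, hi)]] = -1   # larger endpoint gets -1
    return pairs, rows


def support_of_sum(pairs, rows, S):
    """Node pairs where the sum of the rows indexed by S is nonzero."""
    total = [sum(rows[i][j] for i in S) for j in range(len(pairs))]
    return {pairs[j] for j, x in enumerate(total) if x != 0}


edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]  # hypothetical example graph
pairs, rows = incidence_rows(5, edges)
print(support_of_sum(pairs, rows, {1, 2, 3}))  # {(3, 4)}
```

Edges inside S contribute +1 and −1 to the same column and cancel, so only edges leaving S survive in the sum.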
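◮ Let A(a_1), A(a_2), ..., A(a_n) be sketches for ℓ_0 sampling. Can sample an edge incident to each node i from A(a_i).
◮ Suppose we found edges that connected, e.g., S = {a_1, a_2, a_3}. How do we find an edge leaving S?
◮ Linearity: Because of linearity we can just add sketches, e.g., A(a_1) + A(a_2) + A(a_3) = A(a_1 + a_2 + a_3).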
◮ Under-the-rug: Actually we need to use log n independent sketch
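To see why adding sketches works, it helps to look at a very small linear sketch. The example below is not a full ℓ_0 sampler: it is a 1-sparse recovery sketch (a common building block), which recovers the position of the nonzero entry only when the vector has exactly one. A real ℓ_0 sampler would add random subsampling and a verification fingerprint. The vectors a1, a2, a3 are rows of the signed node-pair matrix for a made-up 5-node graph with edges (1,2), (1,3), (2,3), (3,4), (4,5), under the assumed ±1 encoding.

```python
def one_sparse_sketch(x):
    """Linear sketch of x: (sum of entries, index-weighted sum of entries)."""
    return (sum(x), sum(j * xj for j, xj in enumerate(x)))


def add_sketches(*sketches):
    """Coordinate-wise sum; by linearity this is the sketch of the summed vectors."""
    return tuple(sum(parts) for parts in zip(*sketches))


def recover_single(sketch):
    """Index of the nonzero entry, valid only if x has exactly one nonzero entry."""
    total, weighted = sketch
    return weighted // total


# Columns: (1,2) (1,3) (1,4) (1,5) (2,3) (2,4) (2,5) (3,4) (3,5) (4,5)
a1 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
a2 = [-1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
a3 = [0, -1, 0, 0, -1, 0, 0, 1, 0, 0]

combined = add_sketches(*(one_sparse_sketch(a) for a in (a1, a2, a3)))
summed = [x + y + z for x, y, z in zip(a1, a2, a3)]
print(combined == one_sparse_sketch(summed))  # True
print(recover_single(combined))               # 7, the column of pair (3,4)
```

The three per-node sketches were computed separately, yet their sum reveals the single edge leaving S = {1, 2, 3} without revisiting the stream.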
◮ Goal: Test whether all cuts of G have size at least k.
◮ Our algorithm actually returns a certificate of k-connectivity.
◮ Idea: Emulate above algorithm in a single pass by exploiting linearity
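◮ Can find F_1 using the connectivity algorithm.
◮ But how can we find F_2 without taking another pass over the data?
◮ Linearity: Suppose we have independent connectivity sketches A(G), B(G), C(G), . . . Having found F_1 from A(G), linearity gives B(G) − B(F_1) = B(G − F_1), from which we can find F_2.
◮ Given A(G), B(G), C(G) we would find F_1 and F_2 as above. We can then use C(G) − C(F_1) − C(F_2) = C(G − F_1 − F_2) to find F_3.
◮ And so on . . . The resulting algorithm, connectivity_k, requires one pass and O(k · n · polylog n) space.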
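The structure being emulated can be shown offline. The sketch below assumes, as is standard for such certificates, that F_i is a spanning forest of what remains after removing F_1, ..., F_{i−1} from G. It works on an explicit edge list rather than on linear sketches, so it illustrates only the peeling order, not the single-pass implementation; nodes are assumed to be labelled 0, ..., n − 1 and the graph simple.

```python
def spanning_forest(n, edges):
    """Greedy spanning forest of the given edge list via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((u, v))
    return forest


def k_forests(n, edges, k):
    """F_1..F_k, where F_i is a spanning forest of G minus F_1, ..., F_{i-1}."""
    remaining = list(edges)
    forests = []
    for _ in range(k):
        forest = spanning_forest(n, remaining)
        forests.append(forest)
        chosen = set(forest)
        # "Subtract" the forest; this mirrors B(G) - B(F_1) = B(G - F_1).
        remaining = [e for e in remaining if e not in chosen]
    return forests


triangle = [(0, 1), (1, 2), (0, 2)]
print(k_forests(3, triangle, 2))  # [[(0, 1), (1, 2)], [(0, 2)]]
```

The union F_1 ∪ ... ∪ F_k has at most k(n − 1) edges, which is what makes it usable as a small certificate.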
◮ Goal: Estimate the size of the min-cut up to a (1 + ε) factor.
◮ If min-cut size is O(ε^{-2} · polylog n) then the connectivity_k algorithm, with k larger than the min-cut size, finds it exactly.
◮ What can be done if min-cut is large?
◮ Idea: Subsample the input graph at different rates and use connectivity_k on each subsampled graph.
◮ Let h_i be a hash function such that for each e ∈ [n] × [n], Pr[h_i(e) = 1] = 2^{-i}
◮ Let G_i = (V, E_i) where E_i = {e ∈ E : h_i(e) = 1}
◮ Let H_i = connectivity_k(G_i) where k := 24 ε^{-2} log n
◮ Post-Processing: Let µ_i be min-cut size of H_i. Return 2^j µ_j where j = min{i : µ_i < k}
◮ Analysis:
  ◮ Let λ_i be the size of min-cut of G_i, and λ the size of min-cut of G
  ◮ Karger’s result implies 2^i λ_i = (1 ± ε)λ for all i = 0, 1, . . . , ⌊lg 1/p*⌋.
  ◮ If λ_i < k, connectivity_k algorithm guarantees λ_i = µ_i.
  ◮ Lemma: j ≤ ⌊lg 1/p*⌋
  ◮ Total space is O(k · n · polylog n) = O(ε^{-2} · n · polylog n).
◮ Can extend these ideas to get (1 + ε)-sparsification of a dynamic graph stream.
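The subsampling step can be sketched with an ordinary hash. This example makes one design choice that the text above does not state: the levels are nested, so E_{i+1} ⊆ E_i, by declaring h_i(e) = 1 when the top i bits of a 64-bit hash of e are zero. Treating the hash as uniform, that event has probability 2^{-i}. The names level, h and subsample are illustrative.

```python
import hashlib


def level(edge, seed=0):
    """Number of leading zero bits in a 64-bit hash of the (unordered) edge."""
    u, v = sorted(edge)
    digest = hashlib.sha256(f"{seed}:{u}:{v}".encode()).digest()
    value = int.from_bytes(digest[:8], "big")
    return 64 - value.bit_length()


def h(i, edge, seed=0):
    """1 with probability about 2**-i (for 0 <= i <= 64), else 0."""
    return 1 if level(edge, seed) >= i else 0


def subsample(edges, i, seed=0):
    """Edge set E_i = {e in E : h_i(e) = 1}."""
    return [e for e in edges if h(i, e, seed) == 1]


edges = [(u, v) for u in range(20) for v in range(u + 1, 20)]
sizes = [len(subsample(edges, i)) for i in range(5)]
print(sizes[0] == len(edges))  # True: h_0 keeps every edge
```

Because the hash is a fixed function of the edge, an insertion and a later deletion of the same edge land in the same graphs G_i, which is what a dynamic stream needs.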
◮ Consider i = ⌊lg 1/p*⌋ and so sampling probability for G_i is 2^{-i} < 2p*.
◮ Consider a cut in G of size λ. Expected number of edges across this cut in G_i is 2^{-i} λ < 2p* λ.