Latest on Linear Sketches for Large Graphs: Lots of Problems, Little Space, and Loads of Handwaving
Andrew McGregor
University of Massachusetts
Vertex Connectivity and Sparsification
Guha, McGregor, Tench [PODS 2015]
Densest Subgraphs
McGregor, Tench, Vorotnikova, Vu [MFCS 2015]
Matching, Vertex Cover, Hitting Set
Chitnis, Cormode, Esfandiari, Hajiaghayi, McGregor, Monemizadeh, Vorotnikova [TBA 2016]
massive graph defined by a long sequence of edge insertions and deletions. Don’t want to have to store the entire graph.
Technique: Linear Sketches. Maintain random linear projections of vectors and matrices representing the graph.
spectral sparsification, matching, vertex cover, hitting set, correlation clustering, triangles, spanners, densest subgraph…
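A minimal sketch of why linear projections suit insertions and deletions, assuming the graph is represented as a vector x indexed by possible edges. The random ±1 matrix and the helper names (`make_sketch_matrix`, `apply_update`, `sketch_of`) are illustrative choices, not taken from the talk. Each stream update adds or subtracts one column of M, so the stored sketch always equals Mx for the current graph.

```python
import random

def make_sketch_matrix(rows, n, seed=0):
    """Random +/-1 projection matrix (a hypothetical choice of sketch)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(rows)]

def apply_update(sketch, matrix, index, delta):
    """Add delta * (column `index` of the matrix) to the sketch in place."""
    for r, row in enumerate(matrix):
        sketch[r] += delta * row[index]

def sketch_of(matrix, vector):
    """Compute M x directly, for comparison with the streamed sketch."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

n, rows = 10, 4  # 10 possible edges on 5 nodes, 4 sketch rows
M = make_sketch_matrix(rows, n, seed=1)
sketch = [0] * rows

# Stream: insert edges 2, 7, 5, then delete edge 7.
for index, delta in [(2, +1), (7, +1), (5, +1), (7, -1)]:
    apply_update(sketch, M, index, delta)

# The final graph contains edges 2 and 5 only.
final_vector = [0] * n
final_vector[2] = 1
final_vector[5] = 1
streamed_matches_direct = sketch == sketch_of(M, final_vector)
```

The deleted edge leaves no trace in the sketch, which is the property the dynamic stream model needs.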
Graph Streaming Survey
McGregor [SIGMOD Record 2014]
M ∈ ℝ^{polylog(N) × N} such that for any x ∈ ℝ^N, a random non-zero element of x can be reconstructed from Mx with high probability.
Tardos [PODS 2011]
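A toy illustration of how a linear sketch might return a non-zero coordinate of x under insertions and deletions. This is not the construction from the cited paper and makes no uniformity claim: the class name `L0Sketch`, the hash-based subsampling levels, and the fingerprint check are all assumptions made for this sketch. Each level keeps three linear counters and decodes correctly whenever exactly one non-zero coordinate survives at that level.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime, used for hashing and fingerprints

class L0Sketch:
    """Toy linear sketch that returns some non-zero coordinate of x."""

    def __init__(self, n, seed=0):
        rng = random.Random(seed)
        self.n = n
        self.levels = max(1, n.bit_length()) + 1
        self.a = rng.randrange(1, P)
        self.b = rng.randrange(P)
        self.r = rng.randrange(2, P)
        # Per level: [sum of x_i, sum of i * x_i, sum of x_i * r^i mod P].
        self.cells = [[0, 0, 0] for _ in range(self.levels)]

    def _depth(self, i):
        """Coordinate i is kept at levels 0..depth (about half survive each level)."""
        h = (self.a * i + self.b) % P
        depth = 0
        while depth < self.levels - 1 and (h >> depth) & 1 == 0:
            depth += 1
        return depth

    def update(self, i, delta):
        """Apply x_i += delta; every counter is a linear function of x."""
        for level in range(self._depth(i) + 1):
            cell = self.cells[level]
            cell[0] += delta
            cell[1] += delta * i
            cell[2] = (cell[2] + delta * pow(self.r, i, P)) % P

    def sample(self):
        """Return (index, value) from the sparsest level that decodes, else None."""
        for level in reversed(range(self.levels)):
            s0, s1, s2 = self.cells[level]
            if s0 != 0 and s1 % s0 == 0:
                i = s1 // s0
                if 0 <= i < self.n and s2 == (s0 * pow(self.r, i, P)) % P:
                    return i, s0
        return None

sk = L0Sketch(100, seed=3)
for i, delta in [(5, 1), (17, 1), (42, 1), (5, -1), (42, -1)]:
    sk.update(i, delta)
recovered = sk.sample()  # only coordinate 17 is still non-zero
```

Because every counter is linear in x, two such sketches built with the same seed could be added cell by cell to obtain the sketch of the summed vectors.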
dynamic graph stream model using O(polylog n) bits of space.
density among sampled edges, scaled by m/t. Return max_S Ď_S.
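The estimator described here can be sketched offline as follows, assuming density means (number of edges inside S) / |S| and that t edges have already been sampled out of m. The function names are hypothetical, and the brute-force maximisation over all subsets is exponential, so this is for tiny examples only. In a dynamic stream the samples would come from a sketch rather than from a stored edge list.

```python
from itertools import combinations

def estimated_density(sampled_edges, subset, m):
    """Density of `subset` among the sampled edges, rescaled by m / t."""
    t = len(sampled_edges)
    inside = sum(1 for u, v in sampled_edges if u in subset and v in subset)
    return (inside / len(subset)) * (m / t)

def densest_subgraph_estimate(nodes, sampled_edges, m):
    """Brute-force max over all non-empty node subsets S (toy sizes only)."""
    best_subset, best_value = None, -1.0
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            value = estimated_density(sampled_edges, set(subset), m)
            if value > best_value:
                best_subset, best_value = set(subset), value
    return best_subset, best_value

# Hypothetical example: a 4-clique on {0, 1, 2, 3} plus a pendant edge (3, 4).
nodes = [0, 1, 2, 3, 4]
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]

# Sanity check with t = m: the estimate coincides with the true density.
best_subset, best_value = densest_subgraph_estimate(nodes, edges, len(edges))
```

With a strict sample, for instance `random.sample(edges, t)` for some t < m, the same functions return an estimate whose accuracy depends on t.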
What other types of sampling are there that a) are useful for solving graph problems and b) can be supported on dynamic graph streams?
via SNAPE Sampling
via DEALS Sampling
matching in dynamic stream model using Õ(k^2) space.
cover, hitting set… but gets a lot more complicated.
size Ω(k/t) in the dynamic stream model using Õ(k^2/t^3) space.
using Õ(n^2/t^3) space. This is also optimal; see Sanjeev’s talk.
edges remain, return NULL.
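An offline simulation of one SNAPE sample, under the assumption that a sample works as follows: keep each node independently with probability p, delete the rest, and return an arbitrary surviving edge, or NULL if no edges remain. Only the final clause appears in the text above; the rest of the procedure, the function name `snape_sample`, and its parameters are assumptions. A streaming implementation would draw the edge from a sketch of the induced subgraph instead of scanning a stored edge list.

```python
import random

def snape_sample(nodes, edges, p, rng):
    """Keep each node with probability p; return one surviving edge or None."""
    kept = {v for v in nodes if rng.random() < p}
    for u, v in edges:
        if u in kept and v in kept:
            return (u, v)  # an arbitrary edge of the induced subgraph
    return None  # plays the role of NULL

# Hypothetical example graph: a path 0 - 1 - 2 - 3.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3)]

rng = random.Random(0)
samples = [snape_sample(nodes, edges, 0.5, rng) for _ in range(20)]
non_null = [e for e in samples if e is not None]
```

Repeating the sample many times and collecting the non-NULL edges gives the subgraph that the later slides analyse.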
SNAPE samples will include a max matching from G.
A node is heavy if its degree is ≥ 10k, and an edge is shallow if both endpoints aren’t heavy.
[Figure: a shallow edge and a heavy node]
G’ includes all shallow edges in G. Every heavy node in G has degree at least 5k in G’.
still have plenty of other edges on that node.
[Figure: shallow edge uv]
neighbors of u, v leaves exactly the edge uv if u and v sampled.
Pr[uv is only remaining edge] ≥ p^2 (1 − p)^{|Γ(u)| + |Γ(v)| + |W|} = Ω(k^{-2})
[Figure: heavy node]
Pr[edge incident to u is sampled] ≥ p (1 − p)^{|W|} = Ω(k^{-1})
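A quick numeric check of the two bounds, Ω(k^{-2}) for a shallow edge and Ω(k^{-1}) for a heavy node. It assumes p = 1/k and that each exponent is at most c·k for a constant c; neither value is stated in the text, and both are labelled as assumptions in the code. Under those assumptions (1 − 1/k)^{ck} ≥ 4^{-c} for k ≥ 2, so the rescaled bounds stay above a constant.

```python
def shallow_edge_bound(p, exponent):
    """p^2 (1 - p)^exponent, with exponent = |Γ(u)| + |Γ(v)| + |W|."""
    return p ** 2 * (1 - p) ** exponent

def heavy_node_bound(p, w):
    """p (1 - p)^w, with w = |W|."""
    return p * (1 - p) ** w

c = 3  # assumed constant: exponents are at most c * k
rescaled = []
for k in (2, 10, 100, 1000):
    p = 1 / k  # assumed sampling probability
    rescaled.append((
        k,
        shallow_edge_bound(p, c * k) * k ** 2,  # stays bounded below if Ω(k^-2)
        heavy_node_bound(p, c * k) * k,         # stays bounded below if Ω(k^-1)
    ))

for row in rescaled:
    print(row)
```

The printed columns approach e^{-c} as k grows, which is the constant hidden in the Ω notation under these assumptions.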
with p = Θ(t/k) has a matching of size Ω(k/t) with high probability.
consider constructing greedy matching M.
Pr[e_i added to M] ≈ Pr[e_i isn’t a NULL] · Pr[all endpoints in M are deleted] = Ω(k p^2) · (1 − p)^{o(k/t)} = Ω(t^2/k)
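The arithmetic behind the last equality, and how it connects to the Ω(k/t) matching size, can be written out as below. The number of samples r = Θ(k^2/t^3) is an assumption here, read off from the Õ(k^2/t^3) space bound; the calculation is heuristic and assumes p ≤ 1/2.

```latex
\begin{align*}
k p^{2} &= k \cdot \Theta\!\left(\frac{t}{k}\right)^{2}
         = \Theta\!\left(\frac{t^{2}}{k}\right), \\
(1-p)^{o(k/t)} &= \left(1 - \Theta\!\left(\frac{t}{k}\right)\right)^{o(k/t)}
         = 1 - o(1), \\
r \cdot \Omega\!\left(\frac{t^{2}}{k}\right)
        &= \Omega\!\left(\frac{k}{t}\right)
        \quad \text{when } r = \Theta\!\left(\frac{k^{2}}{t^{3}}\right).
\end{align*}
```

So each sample extends the greedy matching with probability Ω(t^2/k), and the expected matching size over r samples is Ω(k/t).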
via SNAPE Sampling
via DEALS Sampling
using Õ(ε^{-1} k n) space.
at end of the stream. May use Õ(n) space.
sketch and a_i encodes neighborhood of node i.
hence, ∑_{i∈S} M a_i = M(∑_{i∈S} a_i) yields a random edge across (S, V\S).
[Figure: example graph on nodes 1 to 5]
Coordinates: {1,2} {1,3} {1,4} {1,5} {2,3} {2,4} {2,5} {3,4} {3,5} {4,5}
a_1 = (1, 1, 0, 0, 0, 0, 0, 0, 0, 0)
a_2 = (−1, 0, 0, 0, 1, 0, 0, 0, 0, 0)
a_1 + a_2 = (0, 1, 0, 0, 1, 0, 0, 0, 0, 0)
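A small sketch of the node-vector encoding, assuming the usual signed convention: the coordinate for pair {u, v} with u < v is +1 in a_u and −1 in a_v when the edge is present, and 0 otherwise. The example graph and the function names are hypothetical. Summing the vectors over a node set S cancels every edge inside S, so the non-zero coordinates of the sum are exactly the edges crossing (S, V\S).

```python
from itertools import combinations

def node_vector(i, edges, pairs):
    """Signed incidence vector of node i over all node pairs (u, v), u < v."""
    edge_set = {frozenset(e) for e in edges}
    vec = []
    for u, v in pairs:
        if i in (u, v) and frozenset((u, v)) in edge_set:
            vec.append(1 if i == u else -1)  # +1 at smaller endpoint, -1 at larger
        else:
            vec.append(0)
    return vec

def cut_edges_from_sum(S, edges, pairs):
    """Sum a_i over i in S and read off the non-zero coordinates."""
    total = [0] * len(pairs)
    for i in S:
        total = [a + b for a, b in zip(total, node_vector(i, edges, pairs))]
    return {pair for pair, value in zip(pairs, total) if value != 0}

nodes = [1, 2, 3, 4, 5]
pairs = list(combinations(nodes, 2))  # (1,2), (1,3), ..., (4,5)
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]  # hypothetical example graph

crossing = cut_edges_from_sum({1, 2}, edges, pairs)
```

Since the sketch is linear, applying M to each a_i during the stream and adding the sketches for i ∈ S afterwards gives the sketch of this summed vector without ever storing it.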
connected after removal of set of k nodes S” using Õ(kn) space.
path between u and v amongst sampled edges.
spanning forest on these nodes. Repeat Õ(k^2) times.
between x_i and x_{i+1} with prob. p^2 (1 − p)^k ≈ k^{-2}. After Õ(k^2) repeats we have an S-avoiding path in E’ with high probability.
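The sampling scheme can be simulated offline roughly as follows, assuming each round keeps every node independently with probability p (about 1/k, to match the bound above), builds a spanning forest on the kept nodes, and adds its edges to E’. The function names are hypothetical, and a streaming version would build each forest from sketches rather than from a stored edge list. A query then checks whether u and v are still connected in E’ once the set S is removed.

```python
import random

def spanning_forest(nodes, edges):
    """Greedy spanning forest of the subgraph induced by `nodes` (union-find)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    forest = []
    for u, v in edges:
        if u in parent and v in parent:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                forest.append((u, v))
    return forest

def sampled_forest_union(nodes, edges, p, repeats, rng):
    """Union of spanning forests over `repeats` independent node samples."""
    kept_edges = set()
    for _ in range(repeats):
        sampled = [v for v in nodes if rng.random() < p]
        kept_edges.update(spanning_forest(sampled, edges))
    return kept_edges

def connected_avoiding(u, v, removed, edge_subset):
    """Is there a u-v path in `edge_subset` that avoids the nodes in `removed`?"""
    adjacency = {}
    for a, b in edge_subset:
        if a not in removed and b not in removed:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    seen, stack = {u}, [u]
    while stack:
        x = stack.pop()
        for y in adjacency.get(x, ()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return v in seen

# Hypothetical example: a 6-cycle, which stays connected after removing any one node.
nodes = list(range(6))
edges = [(i, (i + 1) % 6) for i in range(6)]
single_forest = sampled_forest_union(nodes, edges, 1.0, 1, random.Random(0))
many_rounds = sampled_forest_union(nodes, edges, 0.5, 50, random.Random(0))
```

A single forest on all nodes drops one cycle edge, so removing a node can disconnect it even though the original graph stays connected; repeated rounds on random node subsets are what recover the S-avoiding paths.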
cut sizes where λe is the edge connectivity. Fung et al. [STOC 2011]
sample each remaining edge with probability 1/2.
Graph Streaming Survey
McGregor [SIGMOD Record 2014]
Vertex Connectivity and Sparsification.
Guha, McGregor, Tench [PODS 2015]
Densest Subgraphs.
McGregor, Tench, Vorotnikova, Vu [MFCS 2015]
Matching, Vertex Cover, Hitting Set.
Chitnis, Cormode, Esfandiari, Hajiaghayi, McGregor, Monemizadeh, Vorotnikova [TBA 2016]