Latest on Linear Sketches for Large Graphs: Lots of Problems, Little Space, and Loads of Handwaving



slide-1
SLIDE 1

Latest on Linear Sketches for Large Graphs: Lots of Problems, Little Space, and Loads of Handwaving

Andrew McGregor

University of Massachusetts

slide-2
SLIDE 2

Vertex Connectivity and Sparsification

Guha, McGregor, Tench [PODS 15]

Densest Subgraphs

McGregor, Tench, Vorotnikova, Vu [MFCS 15]

Matching, Vertex Cover, Hitting Set

Chitnis, Cormode, Esfandiari, Hajiaghayi, McGregor, Monemizadeh, Vorotnikova [SODA 16]

slide-5
SLIDE 5
Background

  • Motivation: Dynamic Graph Streams. Want to analyze a massive graph defined by a long sequence of edge insertions and deletions. Don’t want to have to store the entire graph.
  • Main Technique: Linear Sketches. Maintain random linear projections of vectors and matrices representing the graph.
  • What’s Known: Lots and lots! Edge and vertex connectivity, spectral sparsification, matching, vertex cover, hitting set, correlation clustering, triangles, spanners, densest subgraph…
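Why linear projections cope with deletions can be seen in a toy sketch. The code below is a minimal illustration, not the construction from the talk: the class name `LinearSketch`, the helper `edge_index`, and the explicit random ±1 matrix are all assumptions made for the example (a real sketch would generate the matrix implicitly from hash functions rather than store it).

```python
import random

def edge_index(u, v, n):
    """Position of the unordered pair {u, v} (0-based nodes) among all n*(n-1)/2 pairs."""
    u, v = min(u, v), max(u, v)
    return u * n - u * (u + 1) // 2 + (v - u - 1)

class LinearSketch:
    """Maintains y = M x, where x is the edge-indicator vector of the graph."""

    def __init__(self, n, rows, seed=0):
        rng = random.Random(seed)
        self.n = n
        dim = n * (n - 1) // 2
        # Hypothetical choice: a dense random +/-1 matrix, stored explicitly for clarity.
        self.M = [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(rows)]
        self.y = [0] * rows

    def update(self, u, v, delta):
        """delta = +1 for an edge insertion, -1 for an edge deletion."""
        j = edge_index(u, v, self.n)
        for r, row in enumerate(self.M):
            self.y[r] += delta * row[j]

# A stream that inserts three edges and then deletes one of them...
with_deletion = LinearSketch(n=4, rows=3)
for u, v, delta in [(0, 1, +1), (1, 2, +1), (2, 3, +1), (1, 2, -1)]:
    with_deletion.update(u, v, delta)

# ...leaves the same sketch as a stream that only ever saw the two surviving edges.
final_graph_only = LinearSketch(n=4, rows=3)
for u, v in [(0, 1), (2, 3)]:
    final_graph_only.update(u, v, +1)
```

Because each update adds or subtracts one column of M, the sketch depends only on the final graph and not on the order or history of the stream.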

[Figure: first page of the survey paper “Graph Stream Algorithms: A Survey” by Andrew McGregor (University of Massachusetts, mcgregor@cs.umass.edu), showing the abstract, the start of the introduction, and the notation paragraph.]

Graph Streaming Survey

McGregor [SIGMOD Record 14]

slide-7
SLIDE 7

Preliminary

  • L0 Sampling Primitive: There’s a distribution over matrices $M \in \mathbb{R}^{\mathrm{polylog}(N) \times N}$ such that for any $x \in \mathbb{R}^N$, a random non-zero element of $x$ can be reconstructed from $Mx$ whp.
  • Jowhari, Saglam, Tardos [PODS 11]
  • Corollary: Can uniformly sample an edge in the dynamic graph stream model using $O(\mathrm{polylog}\, n)$ bits of space.
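To make the primitive concrete, here is a simplified L0 sampler, written as a sketch under stated assumptions rather than the Jowhari, Saglam, Tardos construction. It subsamples coordinates at geometrically decreasing rates and keeps three linear counters per level for 1-sparse recovery. The class name `L0Sampler`, the modulus `P`, the seeded-`random.Random` stand-in for a hash function, and the "deepest 1-sparse level" recovery rule are all illustrative choices. This version fails (returns `None`) with constant probability, so a real implementation would run several independent copies.

```python
import random

P = (1 << 61) - 1  # Mersenne prime modulus for fingerprints (assumed choice)

class L0Sampler:
    def __init__(self, N, seed=0):
        self.N, self.seed = N, seed
        self.levels = N.bit_length() + 1
        self.r = random.Random(seed).randrange(2, P)
        # Three linear measurements per level: sum x_i, sum i*x_i, sum x_i * r^i mod P.
        self.s0 = [0] * self.levels
        self.s1 = [0] * self.levels
        self.fp = [0] * self.levels

    def _hash(self, i):
        # Stand-in for a hash function: a fixed pseudo-random value in [0, 1) per coordinate.
        return random.Random(self.seed * 1_000_003 + i + 1).random()

    def update(self, i, delta):
        """Apply x[i] += delta. Coordinate i belongs to level l iff hash(i) < 2^-l."""
        h = self._hash(i)
        for l in range(self.levels):
            if h >= 2.0 ** -l:
                break
            self.s0[l] += delta
            self.s1[l] += delta * i
            self.fp[l] = (self.fp[l] + delta * pow(self.r, i, P)) % P

    def sample(self):
        """Return (index, value) of a non-zero coordinate, or None on failure."""
        for l in reversed(range(self.levels)):  # try the sparsest level first
            s0, s1 = self.s0[l], self.s1[l]
            if s0 != 0 and s1 % s0 == 0:
                i = s1 // s0
                # The fingerprint check rejects levels that are not truly 1-sparse (whp).
                if 0 <= i < self.N and (s0 * pow(self.r, i, P)) % P == self.fp[l]:
                    return i, s0
        return None

sampler = L0Sampler(N=100, seed=1)
for i in (3, 17, 42):
    sampler.update(i, +1)
sampler.update(3, -1)
sampler.update(42, -1)
only_survivor = sampler.sample()  # coordinate 17 is the only non-zero entry left
```

Every counter is a linear function of x, so two samplers built with the same seed can be added level by level to obtain a sampler for the sum of their vectors.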

slide-15
SLIDE 15
Densest Subgraph

  • Density of node set $S$ is $D_S = |E_S|/|S|$, where $E_S$ is the set of edges with both endpoints in $S$. Estimate $D^* = \max_S D_S$.
  • Previous Result: $(2+\varepsilon)$-approximation using $\tilde{O}(\varepsilon^{-2} n)$ space.
  • Bhattacharya et al. [STOC 15], Bahmani et al. [PVLDB 12]
  • Our Result: One-pass $(1+\varepsilon)$-approximation using $\tilde{O}(\varepsilon^{-2} n)$ space:
  • Uniformly sample $\tilde{O}(\varepsilon^{-2} n)$ edges. Let $\check{D}_S$ be the estimate of $D_S$ based on the sampled edges. Return $\max_S \check{D}_S$.
  • Analysis: For any subset $S$ of size $k$, with probability $1 - n^{-2k}$,
  • $\check{D}_S \approx_\varepsilon D_S$ if $D_S \approx D^*$, and $\check{D}_S \ll D^*$ if $D_S \ll D^*$.
  • Use a union bound over the $O(n^k)$ subsets of size $k$, for each $k$.
  • See also Mitzenmacher et al. [KDD 15], Esfandiari et al. [ArXiv 15]
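The sample-then-maximize idea can be sketched offline as follows. This is an illustration only: the function names are hypothetical, the maximization is brute force over all subsets (exponential time, usable only on tiny graphs), and the uniform sample is drawn with `random.sample` from a stored edge list, whereas in a dynamic stream it would come from L0 samplers.

```python
import random
from itertools import combinations

def density(edges, S):
    """|E_S| / |S| for the edge list `edges` restricted to node set S."""
    S = set(S)
    return sum(1 for u, v in edges if u in S and v in S) / len(S)

def densest_by_brute_force(nodes, edges, scale=1.0):
    """Return (best subset, scaled density); tries every non-empty subset."""
    candidates = (S for r in range(1, len(nodes) + 1) for S in combinations(nodes, r))
    best = max(candidates, key=lambda S: density(edges, S))
    return set(best), scale * density(edges, best)

def estimate_densest(nodes, edges, num_samples, seed=0):
    """Estimate D* from a uniform edge sample, rescaled by m / (sample size)."""
    rng = random.Random(seed)
    sample = rng.sample(edges, min(num_samples, len(edges)))
    return densest_by_brute_force(nodes, sample, scale=len(edges) / len(sample))

# A 5-clique (density 10/5 = 2) with a short path hanging off node 4.
nodes = list(range(8))
edges = list(combinations(range(5), 2)) + [(4, 5), (5, 6), (6, 7)]
exact = densest_by_brute_force(nodes, edges)
estimate = estimate_densest(nodes, edges, num_samples=8)
```

On a graph this small the estimate is noisy; the guarantee on the slide applies once the sample has $\tilde{O}(\varepsilon^{-2} n)$ edges.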

slide-16
SLIDE 16

What other types of sampling are there that a) are useful for solving graph problems and b) can be supported on dynamic graph streams?

slide-17
SLIDE 17
  • I. Graph Matching via SNAPE Sampling
  • II. Graph Connectivity via DEALS Sampling

slide-19
SLIDE 19
Graph Matchings

  • 1st Result: If the max matching has size $\le k$, can find a max matching exactly in the dynamic stream model using $\tilde{O}(k^2)$ space.
  • Optimal & Simple. Extends to hypergraph matching, vertex cover, hitting set… but gets a lot more complicated.
  • Basic Idea: “SNAPE” sampling primitive.
  • 2nd Result: If the max matching has size $\ge k$, can find a matching of size $\Omega(k/t)$ in the dynamic stream model using $\tilde{O}(k^2/t^3)$ space.
  • Application: Guessing $k$ gives an $O(t)$-approx for max matching using $\tilde{O}(n^2/t^3)$ space. This is also optimal; ask Grigory!

slide-26
SLIDE 26
SNAPE Sampling

Sample-Nodes-And-Pick-Edge

  • Sample each node with probability $\Theta(1/k)$ and delete the rest.
  • Return a random edge amongst those that remain. If no edges remain, return “null”.
  • Theorem: If $G$ has max matching size $\le k$ then $O(k^2 \log k)$ SNAPE samples will include a max matching from $G$.
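The two steps of the primitive can be sketched offline like this. The function names and the choice $p = 1/k$ (one constant inside the $\Theta(1/k)$) are assumptions for illustration; in a dynamic stream the "pick a random remaining edge" step would be an L0 sampler run on the edges induced by the sampled nodes, not a scan of a stored edge list.

```python
import random

def snape_sample(nodes, edges, p, rng):
    """One SNAPE sample: keep each node with probability p, then pick a surviving edge."""
    kept = {v for v in nodes if rng.random() < p}
    remaining = [(u, v) for u, v in edges if u in kept and v in kept]
    return rng.choice(remaining) if remaining else None  # None plays the role of "null"

def snape_subgraph(nodes, edges, k, num_samples, seed=0):
    """Union of the non-null edges returned by repeated SNAPE samples."""
    rng = random.Random(seed)
    p = 1.0 / k  # assumed constant; the slides only require p = Theta(1/k)
    samples = (snape_sample(nodes, edges, p, rng) for _ in range(num_samples))
    return {e for e in samples if e is not None}

# A star on node 0 plus one disjoint edge: the max matching has size 2.
nodes = list(range(7))
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (5, 6)]
sampled_edges = snape_subgraph(nodes, edges, k=2, num_samples=200)
```

The theorem says that with $O(k^2 \log k)$ samples the returned edge set contains a max matching of the input graph; the code does not verify that claim, it only produces the sampled subgraph.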

slide-32
SLIDE 32
Small Matching Analysis: Basic Idea

  • Let $G$ have a max matching of size $\le k$. Say a node is heavy if its degree is $\ge 10k$, and an edge is shallow if neither endpoint is heavy.
  • Lemma: $G'$ includes a max matching of $G$ if:
      • $G'$ includes all shallow edges in $G$.
      • Every heavy node in $G$ has degree at least $5k$ in $G'$.
  • Proof Idea: Each missing edge is incident to some heavy node but you still have plenty of other edges on that node.
  • Useful Fact: $G$ has a vertex cover $W$ of size at most $2k$.

[Figure: example graph with a shallow edge and a heavy node labeled.]

slide-39
SLIDE 39
Small Matching Analysis: Shallow Edges

  • If $u$ and $v$ are sampled but no other nodes in the vertex cover $W$ or $\Gamma(u)$ or $\Gamma(v)$ are sampled, then $uv$ is the only edge remaining!
  • Hence, if $uv$ is shallow:

$$\Pr[uv \text{ is only remaining edge}] \ge p^2 (1-p)^{|\Gamma(u)| + |\Gamma(v)| + |W|} = \Omega(k^{-2})$$

  • After $O(k^2 \log k)$ repetitions, have sampled edge $uv$ whp.

[Figure: a shallow edge between nodes u and v.]
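A worked version of the bound for shallow edges, as a sketch. The concrete value $p = 1/(44k)$ is an illustrative assumption (the slides only state $p = \Theta(1/k)$), and the count of shallow edges is filled in from the definitions of heavy, shallow, and the vertex cover $W$.

```latex
% Assumption for illustration: p = 1/(44k); the slides only say p = \Theta(1/k).
% uv shallow  =>  |\Gamma(u)|, |\Gamma(v)| < 10k;  also |W| \le 2k.
\begin{align*}
\Pr[uv \text{ is only remaining edge}]
  &\ge p^2 (1-p)^{|\Gamma(u)| + |\Gamma(v)| + |W|}
   \ge p^2 (1-p)^{22k} \\
  &\ge p^2 \,(1 - 22kp)            && \text{(Bernoulli's inequality)} \\
  &= \frac{1}{2 \cdot 44^2 \, k^2} = \Omega(k^{-2}).
\end{align*}
% Every edge has an endpoint in W, and both endpoints of a shallow edge have
% degree < 10k, so there are fewer than 2k \cdot 10k = 20k^2 shallow edges.
% With r = C k^2 \log k repetitions, a fixed shallow edge is missed with probability
% at most (1 - \Omega(k^{-2}))^r \le k^{-\Omega(C)}; a union bound over the
% < 20k^2 shallow edges finishes the argument once C is a large enough constant.
```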

slide-44
SLIDE 44
Small Matching Analysis: Edges on Heavy Nodes

  • For heavy $u$, deleting $W \setminus \{u\}$ leaves a star on $u$ with $\ge 8k$ leaves.
  • Hence,

$$\Pr[\text{edge incident to } u \text{ is sampled}] \approx \Omega(k p^2) \cdot (1-p)^{|W|} = \Omega(k^{-1})$$

  • After $O(k^2 \log k)$ repetitions, have sampled $5k$ edges on $u$.

[Figure: a heavy node.]

slide-48
SLIDE 48
Approximate Matching: Basic Idea

  • Theorem: If $G$ has a matching of size $\ge k$ then $O(k^2/t^3)$ SNAPE samples with $p = \Theta(t/k)$ contain a matching of size $\Omega(k/t)$ with high probability.
  • Proof
  • Let $e_1, e_2, e_3, \ldots$ be the sequence of SNAPE samples and consider constructing a greedy matching $M$.
  • Assuming $|M| = o(k/t)$ then

$$\Pr[e_i \text{ added to } M] \approx \Pr[e_i \text{ isn't NULL}] \cdot \Pr[\text{all endpoints in } M \text{ are deleted}] = \Omega(k p^2) \cdot (1-p)^{o(k/t)} = \Omega(t^2/k)$$

  • After $\Theta(k^2/t^3)$ SNAPE samples, each added with probability $\Omega(t^2/k)$ while $M$ is small, we have $|M| = \Omega(k/t)$.
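The greedy construction used in the proof is short enough to write out. The sketch below assumes the SNAPE samples arrive as a sequence of edges, with `None` standing in for a “null” sample; the function name is hypothetical.

```python
def greedy_matching(samples):
    """Scan samples in order; add an edge iff neither endpoint is already matched."""
    matched, M = set(), []
    for e in samples:
        if e is None:  # a "null" SNAPE sample contributes nothing
            continue
        u, v = e
        if u not in matched and v not in matched:
            M.append(e)
            matched.update(e)
    return M

M = greedy_matching([(1, 2), None, (2, 3), (3, 4), (1, 2), (5, 6)])
```

In the example, (2, 3) and the repeated (1, 2) are rejected because node 2 (respectively nodes 1 and 2) is already matched when they arrive.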

slide-49
SLIDE 49
  • I. Graph Matching via SNAPE Sampling
  • II. Graph Connectivity via DEALS Sampling

slide-53
SLIDE 53
Graph Connectivity

  • 1st Result: Test if $k$-edge-connected using $\tilde{O}(kn)$ space.
  • Basic Idea: “DEALS” sampling primitive.
  • 2nd Result: Distinguish node connectivity $\le k$ from $\ge (1+\varepsilon)k$ using $\tilde{O}(\varepsilon^{-1} k n)$ space.
  • Basic Idea: Combine node sampling and DEALS sampling.
  • Open: Testing exact node connectivity?
  • 3rd Result: $(1+\varepsilon)$-approx every cut using $\tilde{O}(\varepsilon^{-2} n)$ space.
  • Basic Idea: Combine edge sampling and DEALS sampling.
  • Hypergraph Sparsifiers: Extends Kogan, Krauthgamer [ITCS 15]

slide-63
SLIDE 63

DEALS Sampling

Direct-Edges-Add-L0-Sketches

  • Problem: Sample an edge across the cut $(S, V \setminus S)$ where the cut is specified at the end of the stream. May use $\tilde{O}(n)$ space.
  • Algorithm: Construct $M a_1, M a_2, \ldots, M a_n$ where $M$ is an L0-sampling sketch and $a_i$ is a vector encoding the neighborhood of node $i$.
  • Lemma: Non-zero entries of $\sum_{i \in S} a_i$ = edges across $(S, V \setminus S)$ and hence $\sum_{i \in S} M a_i = M\left(\sum_{i \in S} a_i\right)$ yields a random edge across $(S, V \setminus S)$.
  • Application: Find spanning trees and edges in light cuts.

[Figure: example graph on nodes 1 to 5. The vectors $a_1$, $a_2$, and $a_1 + a_2$ are shown with entries in {−1, 0, 1} and coordinates indexed by the node pairs {1,2}, {1,3}, {1,4}, {1,5}, {2,3}, {2,4}, {2,5}, {3,4}, {3,5}, {4,5}.]
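The lemma can be checked directly on a small graph. The sketch below builds the vectors $a_i$ explicitly and sums them without any sketching. The sign convention (+1 if $i$ is the smaller endpoint of the edge, −1 if it is the larger) is an assumption chosen to match the signs visible in the slide's example, and the function names are hypothetical. Nodes are 0-based here.

```python
from itertools import combinations

def node_vector(i, n, edges):
    """Signed neighborhood vector of node i, indexed by all pairs {u, v} with u < v."""
    edge_set = {frozenset(e) for e in edges}
    a = []
    for u, v in combinations(range(n), 2):
        if i in (u, v) and frozenset((u, v)) in edge_set:
            a.append(1 if i == u else -1)  # assumed "directed edge" sign convention
        else:
            a.append(0)
    return a

def cut_edges(S, n, edges):
    """Non-zero coordinates of the sum of a_i over i in S."""
    pairs = list(combinations(range(n), 2))
    total = [0] * len(pairs)
    for i in S:
        total = [t + x for t, x in zip(total, node_vector(i, n, edges))]
    return {pairs[j] for j, t in enumerate(total) if t != 0}

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
crossing = cut_edges({0, 1}, 5, edges)  # edge (0, 1) lies inside S and cancels
```

Edges with both endpoints in $S$ contribute +1 and −1 to the same coordinate and cancel, so only crossing edges survive. Since $M$ is linear, adding the sketches $M a_i$ for $i \in S$ gives an L0 sketch of exactly this sum.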

slide-69
SLIDE 69
Application to Node Connectivity…

  • Simplified Result: Can answer queries of the form “are $u$ and $v$ connected after removal of a set $S$ of $k$ nodes?” using $\tilde{O}(kn)$ space.
  • Algorithm
  • Sample some edges $E'$ and answer “no” iff there’s no $S$-avoiding path between $u$ and $v$ amongst the sampled edges.
  • How to sample: Pick each node with probability $p = 1/k$ and find a spanning forest on these nodes. Repeat $\tilde{O}(k^2)$ times.
  • Analysis: Let $u$-$x_1$-$x_2$-…-$x_t$-$v$ be an $S$-avoiding path in the input graph.
  • Spanning forest on sampled nodes contains an $S$-avoiding path between $x_i$ and $x_{i+1}$ with prob. $p^2 (1-p)^k \approx k^{-2}$. After $\tilde{O}(k^2)$ repeats we have an $S$-avoiding path in $E'$ with high probability.
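An offline sketch of the sample-and-query scheme follows. It is an illustration under assumptions: the function names are hypothetical, the spanning forests are computed with union-find over a stored edge list (in a dynamic stream they would be extracted from DEALS sketches restricted to the sampled nodes), and queries are assumed to have $u, v \notin S$.

```python
import random

def spanning_forest(nodes, edges):
    """Spanning forest of the subgraph induced by `nodes`, via union-find."""
    parent = {v: v for v in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest = []
    for u, v in edges:
        if u in parent and v in parent:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                forest.append((u, v))
    return forest

def build_certificate(nodes, edges, k, repeats, seed=0):
    """E': union of spanning forests on node samples taken with probability 1/k."""
    rng = random.Random(seed)
    kept = set()
    for _ in range(repeats):
        sampled = [v for v in nodes if rng.random() < 1.0 / k]
        kept.update(spanning_forest(sampled, edges))
    return kept

def connected_avoiding(cert, u, v, S):
    """Is there a u-v path in cert that avoids every node of S? (assumes u, v not in S)"""
    adj = {}
    for a, b in cert:
        if a not in S and b not in S:
            adj.setdefault(a, []).append(b)
            adj.setdefault(b, []).append(a)
    seen, stack = {u}, [u]
    while stack:
        x = stack.pop()
        for y in adj.get(x, []):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return v in seen

# A 4-cycle 0-1-2-3: nodes 0 and 2 stay connected after removing node 1 (via node 3).
nodes, edges = [0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)]
cert = build_certificate(nodes, edges, k=2, repeats=40)
answer = connected_avoiding(cert, 0, 2, {1})
```

The error is one-sided: because $E'$ is a subset of the input edges, a “yes” is always correct, while a “no” can be wrong if too few repetitions were used.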

slide-73
SLIDE 73
Application to Cut Sparsification…

  • Result: Can $(1+\varepsilon)$-approximate all cuts using $\tilde{O}(\varepsilon^{-2} n)$ space.
  • Basic Idea
  • Sampling each edge $e$ with probability $\ge (c\,\varepsilon^{-2} \log n)/\lambda_e$ preserves all cut sizes, where $\lambda_e$ is the edge connectivity of $e$ (the size of the smallest cut separating its endpoints). Fung et al. [STOC 11]
  • Use DEALS sampling to pick all edges with $\lambda_e \le 2c\,\varepsilon^{-2} \log n$ and sample each remaining edge with probability 1/2.
  • Recurse $O(\log n)$ times in parallel until we have a sparse graph.

slide-74
SLIDE 74

Thanks!

Graph Streaming Survey

McGregor [SIGMOD Record 14]

Vertex Connectivity and Sparsification

Guha, McGregor, Tench [PODS 15]

Densest Subgraphs

McGregor, Tench, Vorotnikova, Vu [MFCS 15]

Matching, Vertex Cover, Hitting Set

Chitnis, Cormode, Esfandiari, Hajiaghayi, McGregor, Monemizadeh, Vorotnikova [SODA 16]
