Streaming Algorithms for Matching Size in Sparse Graphs Graham - PowerPoint PPT Presentation

Streaming Algorithms for Matching Size in Sparse Graphs
Graham Cormode, g.cormode@warwick.ac.uk
Joint work with S. Muthukrishnan (Rutgers), Morteza Monemizadeh (Rutgers → Amazon), Hossein Jowhari (Warwick → K. N. Toosi U. of Technology)

Big Graphs  Increasingly many “big” graphs: – Internet/web graph (2 64 possible edges) – Online social networks (10 11 edges)  Many natural problems on big graphs: – Connectivity/reachability/distance between nodes – Summarization/sparsification – Traditional optimization goals: vertex cover, maximal matching  Various models for handling big graphs: – Parallel (BSP/MapReduce): store and process the whole graph – Sampling: try to capture a subset of nodes/edges – Streaming (this work): seek a compact summary of the graph 2

Streaming graph model  The “you get one chance” model: – Vertex set [n] known, see each edge only once – Space used must be sublinear in the size of the input – Analyze costs (time to process each edge, accuracy of answer)  Variations within the model: – See each edge exactly once or at least once?  Assume exactly once, this assumption can be removed – Insertions only, or edges added and deleted? – How sublinear is the space?  Semi-streaming: linear in n (nodes) but sublinear in m (edges)  “Strictly streaming”: sublinear in n, polynomial or logarithmic  Many problems “hard” (space lower bounds) for graph streaming 3

Streaming Matching  Aim to find a matching for the input graph – Subgraph with maximum degree 1  Easy linear space 2-approximation in insert-only – Just greedily construct a matching, O(n) space  We seek to approximate the size of the matching in o(n) space – Kapralov , Khanna, Sudan, SODA’14: O(poly log n) approx in O(poly log n) space, assuming random order of arrivals – Esfandiari et al., SODA’15 : O(c) approximation in O(c n 2/3 ) space, assuming graph has c-bounded arboricity – Bury and C. Schwiegelshohn , ESA’15: Weighted graphs – McGregor and Vorotnikova , APPROX’16: Improved constant factors 4

Matching under sparsity  Many graphs (phone, web, social) are ‘sparse’ – Asymptotically fewer than O(n 2 ) edges  Characterize sparsity by bounded arboricity c – Edges can be partitioned into at most c forests – Equivalent to the largest local density, |E(U)|/(|U|-1) for U  V  E(U) is the number of edges in the subgraph induced by U – E.g. planarity corresponds to 3-bounded arboricity  Use structural properties of graph streams to give results – Improved poly. space algorithm for matching with deletions – First polylog space algorithm for matching with inserts only 5

α -Goodness  Define an edge in a stream to be α -good if neither of its endpoints appears more than α times in the suffix of the input – Intuition: This definition sparsifies the graph but approximately preserves the matching  The number of α -good edges approximates the matching size – Edges on low degree nodes are already α -good – Every high degree node has at most α +1 α -good edges – Estimating the number of α -good edges is easier than finding the matching itself Edge is 1-good if at most 1 edge on each endpoint arrives later 6

Easy case: trees (c= 1 )  Consider a tree T with maximum matching size M*  |E 1 | ≤ 2M* : The subgraph E 1 has degree at most 2, no cycles – So can make a matching for T from E 1 using at least half the edges  |E 1 | ≥ M* : Proof by induction on number of nodes n – Base case: n=2 is trivial – Inductive case: add an edge (somewhere in the stream) that connects a new leaf to an existing node  Either M* and |E 1 | stay the same, or |E 1 | increases by 1 and M* increases by at most 1  At most 1 edge is ejected from E 1 , but the new edge replaces it 7

General case  Upper bound: |E 6c | ≤ (22.5c + 6)/3 M* – E α has degree at most α +1, and invoke a bound on M* [Han 08]  Lower bound: M* ≤ 3|E 6c | – Break nodes into low L and high degree H classes (as before) – Relate the size of a maximum matching to number of high degree nodes plus edges with both ends low degree – Define HH: the nodes in H that only link to others in H  There must still be plenty of these by a counting argument – Use bounded arboricity to argue that half the nodes in HH have degree less than 6c (averaging argument) – These must all have a 6c-good edge (not too many neighbors)  Combine these to conclude M* ≤ 3|E 6c | ≤ (22.5c + 6)M* 8

Testing edges for α -Goodness  To estimate matching size, count number of α -good edges  Follow a sampling strategy similar to L 0 sampling – Uniformly sample an edge (u, v) from the stream (easy to do) – Count number of subsequent edges incident on u and v – Terminate procedure if more than α incident edges  Need to sample many times in parallel to get result – Sample rate too low: no edges found are α -good – Sample rate too high: space too high  But we can drop the instances that fail  Goldilocks effect: We can find a sample rate that is just right – And bound the space of the over-sampling instances 9

p=1/U Parallel guessing p=1  Make parallel guesses of sampling rates p i – Run 1/ ε log n guesses with sampling rates p i = (1+ ε ) -i – Terminate level i if more than O( α log (n)/ ε 2 ) guesses are active  Estimate: Use lowest non-terminated level to make estimate  Correctness : there is a ‘good’ level that will not be terminated – E α not monotone! Might go up and down as we see more edges – But the matching size only increases as the stream goes on – Use the previous analysis relating E α to matching size to bound – Also argue that using other levels to estimate is OK  Result: use O(c/ ε 2 log n) space to O(c) approximate M* 10

Matching with deletions  We assume not too many deletions: bounded by O( α n)  Our algorithm samples nodes into a set T with probability p  In parallel as insertions/deletions of edges arrive, maintain: The induced subgraph on T 1. The cut edges between T and degrees of neighbors of T 2. A matching of size at most 1/p 3.  Via arboricity assumption, nodes have expected degree O( α )  Matching (3) maintained via randomized algorithm in space O(p -2 )  Result: Balancing the space costs sets p = n -1/3 , total space O(n 2/3 ) – Estimate matching size by #high degree nodes + #low degree edges – Maintained statistics are sufficient to O( α 2 ) approximate matching size based on number of surviving high degree nodes 11

Open Problems  Work in progress: improve constants and simplify analysis [McGregor and Vorotnikova: connection to fractional matchings]  Extensions to the parallel/distributed case – Obstacle: α -good definition seems inherently centralized  Other notions of structure/sparsity beyond arboricity?  Extend to the weighted matching case: some recent results here  Connections between the streaming and online models?  Cardinality estimation for other graph problems, e.g.: – Maximum Independent Set – Dominating Set Thank you! 12
