Streaming Set Cover Amit Chakrabarti Dartmouth College Joint work - PowerPoint PPT Presentation

Streaming Set Cover Amit Chakrabarti Dartmouth College Joint work with A. Wirth Sublinear Algorithms Workshop JHU, Jan 2016

Combinatorial Optimisation Problems
▸ 1950s, 60s: Operations research
▸ 1970s, 80s: NP-hardness
▸ 1990s, 2000s: Approximation algorithms, hardness of approximation
▸ 2010s: Space-constrained settings, e.g., streaming

Set Cover

Set Cover with Sets Streamed
▸ Input: stream of m sets, each ⊆ [n]
▸ Goal: cover universe [n] using as few sets as possible
  • Use sublinear (in m) space
  • Ideally O(n polylog n) ... "semi-streaming"
  • Need Ω(n log n) space to certify: for each item, who covered it?
Think m ≥ n

Background and Related Work
Offline results:
▸ Best possible poly-time approx (1 ± o(1)) ln n [Johnson'74] [Slavík'96] [Lund-Yannakakis'94] [Dinur-Steurer'14]
▸ Simple greedy strategy gets ln n-approx:
  • Repeatedly add set with highest contribution
  • Contribution := number of new elements covered
Streaming results:
▸ One pass semi-streaming O(√n) approx
▸ This is best possible in one semi-streaming pass [Emek-Rosén'14]
▸ O(log n) semi-streaming passes allow O(log n) approx [Saha-Getoor'09] [Cormode-Karloff-Wirth'10]
▸ There's more: wait till the end! [Nisan'02] [Demaine-Indyk-Mahabadi-Vakilian'14] [Indyk-M-V'16]
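The simple greedy strategy described above can be sketched as follows. This is a minimal offline illustration, not code from the talk: the function name greedy_set_cover, the 0-based set indices, and the example instance are all assumptions made for the sketch.

```python
def greedy_set_cover(n, sets):
    """Offline greedy: repeatedly add the set with the highest contribution.

    `sets` is a list of Python sets over the universe {1, ..., n}.
    Returns the (0-based) indices of the chosen sets.
    """
    uncovered = set(range(1, n + 1))
    chosen = []
    while uncovered:
        # Contribution := number of new (still uncovered) elements covered.
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("the given sets do not cover the universe")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


# Hypothetical instance over the universe {1, ..., 6}.
example_sets = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {5, 6}]
greedy_cover = greedy_set_cover(6, example_sets)
print(greedy_cover)
```

Note that this version rescans all sets after every choice, which is exactly what a space-constrained streaming algorithm cannot afford.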

Related Work: In Greater Detail
Algorithms using p passes, S space, giving α-approximation
Upper bounds:
▸ p = 1, S = Õ(n), α = O(√n) [Emek-Rosén'14]
▸ p = O(log n), S = Õ(n), α = O(log n) [Cormode-Karloff-Wirth'10]
▸ S = Õ(m n^{1/Ω(log p)}), α = O(p) [Demaine-Indyk-Mahabadi-Vakilian'14]
▸ S = Õ(m n^{1/Ω(p)}), α = O(p) [Indyk-Mahabadi-Vakilian'16]
Lower bounds:
▸ p = 1, S = Õ(n) ⇒ α = Ω(n^{1/2−δ}) [Emek-Rosén'14]
▸ α < (1/2) log_2 n ⇒ S = Ω(m) [Nisan'02]
▸ α = O(1), deterministic ⇒ S = Ω(mn) [Demaine-I-M-V'14]
▸ α = 1 ⇒ S = Ω̃(n^{1+1/(2(p+1))}) [Indyk-Mahabadi-Vakilian'16]
▸ p = 1, α = 3/2 ⇒ S = Ω(mn) [Indyk-Mahabadi-Vakilian'16]

Our Results
Upper bound
▸ With p passes, semi-streaming space, get O(n^{1/(p+1)})-approx
▸ Algorithm giving this approx based on very simple heuristic
▸ Deterministic
Lower bound
▸ Randomised
▸ In p passes, semi-streaming space, need Ω(n^{1/(p+1)} / p^2) approx
▸ Upper bound tight for all constant p
▸ Semi-streaming O(log n) approx requires Ω(log n / log log n) passes

Progressive Greedy Algorithm
Recall simple greedy:
▸ Repeatedly add set with highest contribution
▸ Contribution := number of new elements covered
Progressive greedy:
▸ In first pass, add all sets with contribution ≥ n^{1−1/p}
▸ In second pass, add all sets with contribution ≥ n^{1−2/p}
▸ ...
▸ In p-th pass, add all sets with contribution ≥ 1

Progressive Greedy Algorithm
procedure GreedyPass(stream σ, threshold τ, set Sol, array Coverer)
    for each set S_i in σ do
        C ← {x : Coverer[x] ≠ 0}            ▷ the already covered elements
        if |S_i \ C| ≥ τ then               ▷ set's contribution ≥ threshold
            Sol ← Sol ∪ {i}
            for each x ∈ S_i \ C do Coverer[x] ← i

procedure ProgGreedyNaive(stream σ, integer n, integer p ≥ 1)
    Coverer[1...n] ← 0^n; Sol ← ∅
    for j = 1 to p do GreedyPass(σ, n^{1−j/p}, Sol, Coverer)
    output Sol, Coverer
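A runnable sketch of the two procedures above, under some assumptions that are not part of the slide: the stream is modelled as an in-memory list of (id, set) pairs that is re-read once per pass, set ids start at 1 so that 0 can mean "uncovered", and the threshold is computed in floating point (which may need care when n^{1−j/p} is not exactly representable). The example instance is hypothetical.

```python
def greedy_pass(stream, threshold, sol, coverer):
    """One pass: admit every set whose contribution meets the threshold."""
    for i, s in stream:
        # Elements of S_i that are not yet covered (Coverer[x] == 0).
        new = [x for x in s if coverer[x] == 0]
        if len(new) >= threshold:
            sol.append(i)
            for x in new:
                coverer[x] = i


def prog_greedy_naive(stream, n, p):
    """p passes with thresholds n^(1 - j/p) for j = 1, ..., p."""
    coverer = [0] * (n + 1)  # index 0 unused; universe is 1..n
    sol = []
    for j in range(1, p + 1):
        greedy_pass(stream, n ** (1 - j / p), sol, coverer)
    return sol, coverer


# Hypothetical instance: n = 16, p = 2, so the thresholds are 4 and 1.
stream = [
    (1, {1, 2, 3, 4}),
    (2, {3, 4, 5}),
    (3, {5, 6, 7, 8, 9, 10}),
    (4, {11, 12}),
    (5, {13, 14, 15, 16, 1}),
    (6, {12}),
]
sol, coverer = prog_greedy_naive(stream, 16, 2)
print(sol)
```

The only state kept between passes is Sol and the Coverer array, which is what keeps the space near n log n bits plus the size of the solution.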

Progressive Greedy: Analysis Idea
Consider p = 2 passes
▸ First pass: admit sets iff contribution ≥ √n
▸ Thus, first pass adds at most √n sets to Sol
▸ Second pass: Opt covers remaining items with sets of contrib ≤ √n
▸ Thus, Sol will cover the same using ≤ √n |Opt| sets
But wait, this uses two passes for O(√n) approx!
▸ Logic of last pass especially simple: add set if positive contrib
▸ Can fold this into previous one
Final result: p passes, O(n^{1/(p+1)})-approx
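The two-pass counting argument can be written out as a short worked bound. The symbols Sol_1, Sol_2 (sets added in the first and second pass) and R (items still uncovered after the first pass) are notation introduced here for illustration; they do not appear on the slide.

```latex
\begin{align*}
  |\mathrm{Sol}_1| \cdot \sqrt{n} &\le n
    && \text{each admitted set covers at least } \sqrt{n} \text{ new items} \\
  |\mathrm{Sol}_1| &\le \sqrt{n} \\
  |R| &\le \sqrt{n}\,|\mathrm{Opt}|
    && \text{Opt covers } R \text{ with sets of contribution} \le \sqrt{n} \\
  |\mathrm{Sol}_2| &\le |R|
    && \text{each set added in pass 2 covers at least one new item} \\
  |\mathrm{Sol}| = |\mathrm{Sol}_1| + |\mathrm{Sol}_2|
    &\le \sqrt{n} + \sqrt{n}\,|\mathrm{Opt}|
    \le 2\sqrt{n}\,|\mathrm{Opt}|
    && \text{since } |\mathrm{Opt}| \ge 1
\end{align*}
```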

Lower Bound Idea: One Pass
Reduce from index: Alice gets x ∈ {0,1}^n, Bob gets j ∈ [n], Alice talks to Bob, who must determine x_j. Requires Ω(n)-bit message. [Ablayev'96]
[Figure: universe F_q^2 (n = q^2), showing Alice's sets and Bob's set]
If Alice has Bob's missing line, then |Opt| = 2, else |Opt| ≥ q
So Θ(√n) approx requires Ω(#lines) = Ω(q^2) = Ω(n) space
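The gap between |Opt| = 2 and |Opt| ≥ q rests on a standard fact about the affine plane F_q^2: two distinct lines share at most one point, so covering the q points of a missing line with other lines takes at least q of them. The sketch below checks that fact, and the Θ(q^2) line count, for a small prime q. It assumes q is prime (so arithmetic mod q is field arithmetic) and does not reproduce the exact encoding of x into sets, which the transcript does not spell out; the name lines_in_plane is hypothetical.

```python
from itertools import combinations


def lines_in_plane(q):
    """All lines of the affine plane F_q^2, for prime q, as frozensets of points."""
    lines = []
    # Non-vertical lines y = a*x + b.
    for a in range(q):
        for b in range(q):
            lines.append(frozenset((x, (a * x + b) % q) for x in range(q)))
    # Vertical lines x = c.
    for c in range(q):
        lines.append(frozenset((c, y) for y in range(q)))
    return lines


q = 5
lines = lines_in_plane(q)
num_lines = len(set(lines))
# Largest intersection between any two distinct lines.
max_overlap = max(len(l1 & l2) for l1, l2 in combinations(lines, 2))
print(num_lines, max_overlap)
```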

Next Steps
Goal: p semi-streaming passes require Ω(n^{1/(p+1)}) approx
▸ Handle more passes
  • Can't start from index, need harder communication problem
▸ Increase space bound
  • Need ω(n) to rule out semi-streaming

Tree Pointer Jumping
Multiplayer game tpj_{p+1,t} defined on complete (p+1)-level t-ary tree
▸ Pointer to child at each internal level-i node (known to Player i)
▸ Bit at each leaf node (known to Player 1)
▸ Goal: output (whp) bit reached by following pointers from root
[Figure: tree with levels labelled Level 3, Level 2, Level 1; leaf bits 1 0 0 1 1 1 0 0 1]
Model: p rounds of communication
Each round: player 1, player 2, ..., player p+1
Theorem: Longest message is Ω(t / p^2) bits [C.-Cormode-McGregor'08]
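To make the function being computed concrete, here is a small sketch that evaluates a tree pointer jumping instance when all inputs are in one place (so it says nothing about the communication cost). The list-based layout, the name tpj_value, and the pointer values are assumptions for illustration; only the nine leaf bits are taken from the slide's figure, whose arrows are not recoverable from the transcript.

```python
def tpj_value(t, pointers, leaf_bits):
    """Follow pointers from the root of a complete t-ary tree to a leaf bit.

    pointers[k] lists, for the t**k nodes at depth k (left to right),
    which child (0..t-1) each node points to; pointers[0] is the root.
    leaf_bits lists the bits at the t**len(pointers) leaves.
    """
    node = 0  # index of the current node within its depth
    for level in pointers:
        node = node * t + level[node]
    return leaf_bits[node]


leaf_bits = [1, 0, 0, 1, 1, 1, 0, 0, 1]  # leaf bits from the slide's figure
# Hypothetical pointers for a 3-level ternary tree.
answer_a = tpj_value(3, [[1], [2, 0, 1]], leaf_bits)  # root -> child 1 -> leaf 3
answer_b = tpj_value(3, [[0], [1, 0, 0]], leaf_bits)  # root -> child 0 -> leaf 1
print(answer_a, answer_b)
```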

Multi-Pass Set Cover: First Attempt
Two passes, reducing from tpj_{3,t}, using universe F_q^3 (so n = q^3)
▸ Three players: Alice, Bob, Carol
  • Alice encodes leaf bits: lines in F_q^3
  • Bob encodes lower pointers: planes in F_q^3 with a line deleted
  • Carol encodes root pointer: F_q^3 with a plane deleted
▸ (Carol set) ∪ (corresp. Bob set) = F_q^3 \ (a line)
▸ If Alice has the missing line, then |Opt| = 3, else |Opt| ≥ q (*)
How good is this?
▸ Each pointer encoded by Bob can choose from only as many leaves as there are lines in a specific plane ⟹ t = Θ(q^2) = Θ(n^{2/3})
