Tight Space-Approximation Tradeoff for the Multi-Pass Streaming Set Cover Problem Sepehr Assadi University of Pennsylvania Sepehr Assadi (Penn) PODS 2017

The Set Cover Problem
Input: A collection of m sets $S_1, \ldots, S_m$ from a universe $[n]$.
Goal: Choose a smallest subset C of the sets $S_1, \ldots, S_m$ such that C covers $[n]$, i.e., $\bigcup_{i \in C} S_i = [n]$.
We use OPT to denote the optimal solution size.
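To make the definition concrete, here is a minimal Python sketch that checks whether a chosen sub-collection covers $[n]$ and finds an optimal cover by exhaustive search. The function names `is_cover` and `brute_force_opt` and the example instance are illustrative assumptions, not part of the talk, and the search takes exponential time, so it is only meant for tiny inputs.

```python
from itertools import combinations

def is_cover(sets, chosen, n):
    """Return True if the sets indexed by `chosen` cover the universe {1, ..., n}."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return covered >= set(range(1, n + 1))

def brute_force_opt(sets, n):
    """Try all sub-collections in increasing size; exponential time, tiny inputs only."""
    for k in range(len(sets) + 1):
        for chosen in combinations(range(len(sets)), k):
            if is_cover(sets, chosen, n):
                return list(chosen)  # first hit has the smallest size, i.e., OPT sets
    return None  # the input sets do not cover [n] at all

# Hypothetical instance with m = 4 sets over the universe [5].
sets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
print(brute_force_opt(sets, 5))  # prints [0, 3], so OPT = 2
```

Here no single set covers $[5]$, and the first pair found in index order is sets 0 and 3.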

The Set Cover Problem
A classic optimization problem with many applications:
Information retrieval,
  ◮ e.g., finding a smallest number of documents covering all the topics in a given query.
Data mining,
  ◮ e.g., finding a smallest number of features explaining all positive examples, i.e., a “minimal explanation” of a pattern.
Web search and advertising,
  ◮ e.g., finding a smallest number of impressions to reach a certain set of users.
Operations research, machine learning, web host analysis, ...

The Set Cover Problem: Classical Setting
Theoretical aspects:
  ◮ One of Karp’s original 21 NP-hard problems [Karp, 1972].
  ◮ The greedy algorithm that picks the “best” set in each iteration achieves a $\ln(n)$ approximation [Johnson, 1974, Slavík, 1997].
  ◮ No better approximation factor is possible in polynomial time unless P = NP [Lund and Yannakakis, 1994, Feige, 1998, Dinur and Steurer, 2014, Moshkovitz, 2015].
In practice:
  ◮ The greedy algorithm is highly efficient and surprisingly accurate.
  ◮ On a typical data set, the returned solution has fewer than 10% · OPT sets more than the optimal solution [Grossman and Wool, 1997, Gomes et al., 2006, Cormode et al., 2010],
  ◮ as long as the dataset is relatively small!
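The greedy rule mentioned above can be sketched in a few lines of Python, assuming that the “best” set means the one covering the most still-uncovered elements. The function name `greedy_set_cover` and the example instance are illustrative assumptions; this is a plain in-memory version, not an implementation taken from the cited papers.

```python
def greedy_set_cover(sets, n):
    """Repeatedly pick the set that covers the most still-uncovered elements of {1, ..., n}."""
    uncovered = set(range(1, n + 1))
    chosen = []
    while uncovered:
        # Index of the set with the largest gain; ties go to the smallest index.
        best = max(range(len(sets)),
                   key=lambda i: len(sets[i] & uncovered),
                   default=None)
        if best is None or not sets[best] & uncovered:
            raise ValueError("the input sets do not cover the universe")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

# Hypothetical instance where greedy is not optimal: sets 1 and 2 alone cover [6].
sets = [{1, 2, 3, 4}, {1, 3, 5}, {2, 4, 6}]
print(greedy_set_cover(sets, 6))  # prints [0, 1, 2]: three sets, while OPT = 2
```

Each iteration rescans every set, which is cheap in main memory but is the kind of repeated access that becomes costly once the input no longer fits there.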

The Set Cover Problem: Big Data Scenario
[Cormode et al., 2010]: A direct implementation of the greedy algorithm scales surprisingly poorly when the data size grows.
  ◮ Efficient in main memory
  ◮ Inefficient on disk
One approach: the streaming model for the set cover problem, introduced by [Saha and Getoor, 2009].

The Streaming Set Cover Problem
Model:
Sequential access to the sets:
  ◮ The input sets $S_1, \ldots, S_m$ are presented one by one in a stream.
Small working memory:
  ◮ The streaming algorithm has a small space to maintain a summary of the input sets.
Efficiency:
  ◮ The algorithm can make one or a few passes over the stream and should output the answer using only the stored summary.
Small space:
  1. Semi-streaming space, i.e., $\tilde{O}(n)$.
  2. Sub-linear space, i.e., $o(mn)$.
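As a rough illustration of the model, the sketch below makes a fixed number of passes over the stream and keeps only the set of uncovered elements plus the indices of the chosen sets, so its working memory is about $n$ elements. It uses a simple shrinking-threshold rule that is an assumption made here for illustration: it is not the algorithm analyzed in this talk, and no approximation guarantee is claimed. The names `threshold_streaming_cover` and `stream_factory` are hypothetical.

```python
def threshold_streaming_cover(stream_factory, n, passes):
    """Multi-pass sketch: stream_factory() yields (index, set) pairs, one call per pass."""
    uncovered = set(range(1, n + 1))  # summary kept in memory: at most n elements
    chosen = []                       # indices of the sets taken so far
    for p in range(1, passes + 1):
        # Assumed rule: the threshold shrinks geometrically and equals 1 in the last pass.
        threshold = max(1.0, n ** (1 - p / passes))
        for index, s in stream_factory():
            # Take a set on the spot if it covers enough new elements.
            if len(s & uncovered) >= threshold:
                chosen.append(index)
                uncovered -= s
        if not uncovered:
            break
    if uncovered:
        raise ValueError("the input sets do not cover the universe")
    return chosen

# Hypothetical instance; each pass re-reads the sets in the same order.
sets = [{1, 2}, {3}, {1, 2, 3, 4}]
stream = lambda: enumerate(sets)
print(threshold_streaming_cover(stream, 4, passes=1))  # prints [0, 1, 2]
print(threshold_streaming_cover(stream, 4, passes=2))  # prints [0, 2]
```

On this instance a single pass must accept any set with a new element and takes all three, while two passes let the first pass skip the low-gain set and finish with two.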

The Streaming Set Cover Problem
Note. We do not restrict the computation time of the algorithms in this model, e.g., we allow exponential-time computation.
  ◮ For theoretical purposes: understanding the space complexity of streaming algorithms in the absence of time complexity restrictions.
  ◮ For practical purposes: we rarely need the full power of such exponential-time computation anyway.

State of the Art
Many interesting results: [Saha and Getoor, 2009, Cormode et al., 2010, Emek and Rosén, 2014, Demaine et al., 2014, Badanidiyuru et al., 2014, Indyk et al., 2015, Har-Peled et al., 2016, Chakrabarti and Wirth, 2016, Assadi et al., 2016, McGregor and Vu, 2016, Bateni et al., 2016].
In particular,
  ◮ Complete resolution of the complexity of multi-pass semi-streaming algorithms [Chakrabarti and Wirth, 2016].
  ◮ Complete resolution of the complexity of single-pass sub-linear space streaming algorithms [Assadi et al., 2016].
Short summary: to ensure efficiency, we need more than $\tilde{O}(n)$ space and more than one pass!

State of the Art
The best known sub-linear space algorithm [Har-Peled et al., 2016]:
