Interval Selection in the Streaming Model

Sergio Cabello, Pablo Pérez-Lantero
University of Ljubljana (Slovenia), Universidad de Santiago, USACH (Chile)
ADGO 2016

Introduction

Interval Selection in the Streaming Model: given a stream I of intervals, compute within one pass over I a maximum subset of I of independent (pairwise disjoint) intervals (of cardinality α(I)).

Data stream model
◮ widely used (Data Streams: Alg. & App., Muthukrishnan, 2005)
◮ data arrives sequentially (not necessarily sorted)
◮ bound on the amount of memory (e.g. polylog)
◮ past data can be accessed only if it is stored in the limited memory
◮ ⇒ approximate solutions in many cases

Introduction

Interval Selection ≡ Maximum Independent Set in Interval Graphs
◮ Fundamental optimization problem
◮ Greedy algorithm in linear time (once intervals are sorted)

Interval Selection in Data Stream:
◮ 2-approximation in the Data Stream Model with O(α(I)) space: Emek et al. (ICALP 2012); Cabello & Pérez-Lantero (2015)
◮ No (< 2)-approximation can be obtained in sublinear space: Emek et al. (ICALP 2012)
◮ Generalizes the distinct elements problem: given a data stream of numbers, identify how many distinct numbers are in the stream (Kane et al., PODS 2010)
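
The offline greedy algorithm mentioned above can be sketched as follows. This is a minimal illustration, assuming closed intervals given as `(left, right)` pairs; the function name `greedy_independent_set` is hypothetical and the sort dominates the running time, so the scan itself is linear only once the intervals are sorted by right endpoint.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # closed interval [left, right]


def greedy_independent_set(intervals: List[Interval]) -> List[Interval]:
    """Scan intervals by increasing right endpoint and keep every interval
    that starts strictly after the last kept interval ends."""
    chosen: List[Interval] = []
    last_right = None
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if last_right is None or left > last_right:
            chosen.append((left, right))
            last_right = right
    return chosen


if __name__ == "__main__":
    print(greedy_independent_set([(1, 3), (2, 5), (4, 6), (6, 8), (7, 9)]))
```

This needs all intervals in memory at once, which is exactly what the streaming model rules out.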

Introduction

We consider the estimation of α(I), assuming that the endpoints of the intervals are in [n] = {1, 2, ..., n}.

Our results

1. ((2 + ε)-approximation w.h.p.) An algorithm to compute $\hat{\alpha}(I)$ such that
   $$\left(\tfrac{1}{2} - \varepsilon\right) \alpha(I) \le \hat{\alpha}(I) \le \alpha(I)$$
   with probability at least 2/3, in $O(\varepsilon^{-5} \log^{6} n)$ space.
2. ((3/2 + ε)-approximation w.h.p.) For same-length intervals, a computation of $\hat{\alpha}(I)$ such that
   $$\left(\tfrac{2}{3} - \varepsilon\right) \alpha(I) \le \hat{\alpha}(I) \le \alpha(I)$$
   with probability at least 2/3, in $O(\varepsilon^{-2} \log(1/\varepsilon) + \log n)$ space.
3. (Lower bounds) The approximation ratios for estimating α(I) are essentially optimal, if we use o(n) bits of space.

A 2-approximation in O(α(I)) space (Cabello & Pérez-Lantero)

[Figure: window partition of R, with the other intervals of I]

◮ Maintain a partition of R into windows
◮ For each window, all intervals from I contained in it are pairwise intersecting
◮ Fact: since in the optimal solution no 2 intervals can fit within the same window, taking one interval from each window gives a 2-approximation

A 2-approximation in O(α(I)) space (Cabello & Pérez-Lantero)

Store ≤ 2 intervals per window of R:
◮ the interval with the Leftmost right endpoint
◮ the interval with the Rightmost left endpoint

Initialization: one window, i.e. R, and the 1st interval of I.

Processing a new interval of I (captions of the animation steps; the figures are not reproduced):
◮ [Figure: window partition of R] discard this new interval from I
◮ [Figure: a window of R] discard the new interval
◮ [Figure: a window of R] update the info of the window, remove this interval
◮ [Figure: a window of R] split the window!

A 2-approximation in O(α(I)) space (Cabello & Pérez-Lantero)

◮ there are ≤ α(I) windows
◮ the space is within O(α(I))
◮ each new interval is processed in O(log α(I)) time
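
The window-partition idea can be sketched in a few lines. This is one possible reading of the slides, not the authors' implementation: it assumes integer endpoints, closed intervals, and half-open windows `[start, next_start)`, and the split point and the tie-breaking rules are choices made for this sketch. The class name `WindowPartition` and its methods are hypothetical. It keeps the windows in plain Python lists, so an insertion costs time linear in the number of windows; a balanced search tree would be needed for the O(log α(I)) bound.

```python
import bisect
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # closed interval [left, right], integer endpoints


class WindowPartition:
    """Windows are the half-open ranges [starts[i], starts[i+1])."""

    def __init__(self) -> None:
        self.starts: List[float] = [float("-inf")]          # one window: everything
        self.leftmost: List[Optional[Interval]] = [None]    # min right endpoint
        self.rightmost: List[Optional[Interval]] = [None]   # max left endpoint

    def _split(self, i, cut, l1, r1, l2, r2) -> None:
        # Window i becomes [starts[i], cut); a new window [cut, old end) follows.
        self.leftmost[i], self.rightmost[i] = l1, r1
        self.starts.insert(i + 1, cut)
        self.leftmost.insert(i + 1, l2)
        self.rightmost.insert(i + 1, r2)

    def insert(self, iv: Interval) -> None:
        x, y = iv
        i = bisect.bisect_right(self.starts, x) - 1         # window containing x
        end = self.starts[i + 1] if i + 1 < len(self.starts) else float("inf")
        if y >= end:
            return                                          # not inside one window: discard
        lm, rm = self.leftmost[i], self.rightmost[i]
        if lm is None:                                      # first interval of the stream
            self.leftmost[i] = self.rightmost[i] = iv
        elif x > lm[1]:                                     # disjoint from lm: split after lm
            self._split(i, lm[1] + 1, lm, lm, iv, iv)
        elif y < rm[0]:                                     # disjoint from rm: split before rm
            self._split(i, rm[0], iv, iv, rm, rm)
        else:                                               # intersects all: update or discard
            if (y, -x) < (lm[1], -lm[0]):                   # smaller right end, ties: larger left
                self.leftmost[i] = iv
            if (x, -y) > (rm[0], -rm[1]):                   # larger left end, ties: smaller right
                self.rightmost[i] = iv

    def selection(self) -> List[Interval]:
        """One stored interval per window; these are pairwise disjoint."""
        return [lm for lm in self.leftmost if lm is not None]


if __name__ == "__main__":
    wp = WindowPartition()
    for iv in [(1, 10), (2, 5), (7, 9), (3, 4), (6, 6), (4, 8)]:
        wp.insert(iv)
    print(wp.selection())
```

In this sketch every window holds at least one stored interval, so the number of windows never exceeds α(I), and the tie-breaking keeps the two stored intervals exact after a split.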

Our assumptions for the estimation of α(I)

1. Endpoints of intervals are in [n] = {1, 2, ..., n}
2. A unit of memory can store a value from [n] = {1, 2, ..., n}

Sampling techniques

1. Suppose we have a stream I of numbers in [n] = {1, 2, ..., n}
2. Maintaining the minimum over the stream is easy
3. To maintain a (uniform) random element s over the stream, we would like to have a (uniform & computable) random permutation h : [n] → [n]:
   ◮ s = first element of I.
   ◮ for each new a ∈ I: if h(a) < h(s) then s = a.
4. The sampled element is chosen the first time it is seen
5. Problem: there is no compact way to encode a uniform-random permutation
6. Solution: construct h using hash functions and sacrifice uniformity
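
The idealized sampling rule above can be illustrated with an explicit random permutation. This is a sketch for small n only: storing the permutation as a list takes Θ(n) memory, which is precisely the problem the slide points out. The names `random_permutation` and `sample_distinct` are hypothetical.

```python
import random
from typing import Iterable, List, Optional


def random_permutation(n: int, rng: random.Random) -> List[int]:
    """Return a list h with h[a], for a in 1..n, a uniform random permutation
    of 1..n. Index 0 is unused so that elements can be used as indices."""
    values = list(range(1, n + 1))
    rng.shuffle(values)
    return [0] + values


def sample_distinct(stream: Iterable[int], h: List[int]) -> Optional[int]:
    """Keep the stream element with the smallest value under h."""
    s = None
    for a in stream:
        if s is None or h[a] < h[s]:
            s = a
    return s


if __name__ == "__main__":
    h = random_permutation(10, random.Random(2016))
    print(sample_distinct([4, 7, 7, 1, 4, 9], h))
```

Because only h(a) decides whether a replaces s, repeated occurrences of a value never change the sample, so the result is uniform over the distinct values of the stream.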

Sampling techniques

A family of permutations H = {h : [n] → [n]} is ε-min-wise independent if
$$\forall X \subseteq [n],\ y \in X:\quad \frac{1-\varepsilon}{|X|} \;\le\; \Pr_{h \in \mathcal{H}}\bigl[h(y) = \min h(X)\bigr] \;\le\; \frac{1+\varepsilon}{|X|}.$$

For X ⊆ [n], choosing h ∈ H uniformly at random: arg min {h(x) | x ∈ X} is a near-uniform random element of X.

Sampling techniques

Computable family of ε-min-wise independent permutations: for every ε ∈ (0, 1/2) and n > 0, there exists a family H(n, ε) = {h : [n] → [n]} of ε-min-wise independent permutations such that:
◮ a uniformly random element of H(n, ε) can be chosen in O(log(1/ε)) time (constructive);
◮ for h ∈ H(n, ε) and x, y ∈ [n], we can decide with O(log(1/ε)) arithmetic operations whether h(x) < h(y) (computable).

Proof: construct K-wise independent hash functions [c·n/ε] → [c·n/ε] for K = Θ(log(1/ε)) and some constant c (Indyk, 2001).
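
A standard way to obtain a K-wise independent hash function is a random polynomial of degree K − 1 over a prime field. The sketch below uses that construction to show why comparing h(x) and h(y) takes O(K) arithmetic operations. It does not reproduce the constants of the cited proof: the constant `c = 4`, the choice of K, the tie-breaking rule and the names `next_prime` and `PolyHash` are assumptions made for illustration.

```python
import math
import random


def next_prime(m: int) -> int:
    """Smallest prime >= m (trial division, good enough for a sketch)."""
    m = max(m, 2)
    while any(m % d == 0 for d in range(2, math.isqrt(m) + 1)):
        m += 1
    return m


class PolyHash:
    """Random polynomial of degree K-1 over Z_p, a K-wise independent hash."""

    def __init__(self, n: int, eps: float, rng: random.Random, c: int = 4) -> None:
        self.p = next_prime(math.ceil(c * n / eps))        # c is a hypothetical constant
        k = max(2, math.ceil(math.log2(1 / eps)))          # K = Theta(log(1/eps))
        self.coeffs = [rng.randrange(self.p) for _ in range(k)]

    def __call__(self, x: int) -> int:
        value = 0
        for a in self.coeffs:                              # Horner's rule: K multiplications
            value = (value * x + a) % self.p
        return value

    def less(self, x: int, y: int) -> bool:
        """Decide h(x) < h(y); ties between hash values are broken by the element."""
        return (self(x), x) < (self(y), y)


if __name__ == "__main__":
    h = PolyHash(n=100, eps=0.25, rng=random.Random(7))
    print(h.p, len(h.coeffs), h.less(3, 5))
```

Only the K coefficients and the prime are stored, so the description of h takes O(log(1/ε)) memory units instead of the Θ(n) needed for an explicit permutation.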

Sampling techniques

How to generate a near-uniform random element of X ⊆ [n] = {1, 2, ..., n}?
1. Let H = H(n, ε)
2. Choose h ∈ H uniformly at random
3. return s = arg min {h(x) | x ∈ X}

[Datar and Muthukrishnan (ESA 2002)] (near-uniform behavior) ∀ y ∈ Y ⊆ X ⊆ [n]:
$$(1-\varepsilon)\,\frac{|Y|}{|X|} \;\le\; \Pr[s \in Y] \;\le\; (1+\varepsilon)\,\frac{|Y|}{|X|},$$
$$\frac{1-4\varepsilon}{|Y|} \;\le\; \Pr[y = s \mid s \in Y] \;\le\; \frac{1+4\varepsilon}{|Y|}.$$

Sampling techniques

How to maintain a near-uniform random interval of the stream I = I_1, I_2, I_3, ...?
1. Fix an easy-to-compute mapping b : I → [n^2], e.g. b([x, y]) = n(x − 1) + y
2. Let H = H(n^2, ε)
3. Choose h ∈ H uniformly at random
   ◮ s = first interval of I.
   ◮ for each new interval a ∈ I: if h∘b(a) < h∘b(s) then s = a.
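
The steps above can be sketched as follows. The mapping `b` is the one given on the slide; the function `sample_interval` and the stand-in hash used in the usage example are hypothetical, and any function h on [n^2] with the min-wise property could be passed in instead.

```python
from typing import Callable, Iterable, Optional, Tuple

Interval = Tuple[int, int]  # closed interval [x, y] with 1 <= x <= y <= n


def b(interval: Interval, n: int) -> int:
    """Map an interval with endpoints in [n] to a distinct number in [n^2]."""
    x, y = interval
    return n * (x - 1) + y


def sample_interval(
    stream: Iterable[Interval], n: int, h: Callable[[int], int]
) -> Optional[Interval]:
    """Keep the interval whose code b(.) has the smallest value under h."""
    s = None
    for a in stream:
        if s is None or h(b(a, n)) < h(b(s, n)):
            s = a
    return s


if __name__ == "__main__":
    n = 4
    # Stand-in for h: injective on 1..16 because 17 is prime; NOT min-wise independent.
    h = lambda v: (5 * v + 3) % 17
    print(sample_interval([(1, 2), (2, 3), (1, 2), (3, 4)], n, h))
```

Since b is injective, two intervals receive the same code only if they are equal, so repeated intervals in the stream do not bias the sample.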

Streaming algorithm (general idea)

[Figure: the window from 1 to n + 1 divided into segments, each labelled "2-approx"]

◮ Find independent canonical segments in the window [1, n] = [1, n + 1)
◮ Compute a 2-approximation within each canonical segment S, in $O(\alpha(\{I \in \mathcal{I} \mid I \subset S\}))$ space
◮ Guarantee that each canonical segment S contains enough disjoint intervals from I, but not too many, to save space
◮ Estimate the number of independent canonical segments and the average of the 2-approximations of the segments
