Overlap Interval Partition Join os 1 ohlen 1 Johann Gamper 2 Anton - PowerPoint PPT Presentation

Overlap Interval Partition Join os 1 ohlen 1 Johann Gamper 2 Anton Dign¨ Michael H. B¨ 1 University of Z¨ urich, Switzerland 2 Free University of Bozen-Bolzano, Italy SIGMOD 2014 June 22-27, 2014 - Snowbird, Utah, USA

Introduction ◮ Temporal relations : tuples have a time interval. ◮ Overlap join : join tuples with overlapping time intervals. r 1 r 2 r 3 r s 1 s 3 s s 2 2012-1 2012-2 2012-3 2012-4 2012-5 2012-6 2012-7 2012-8 2012-9 2012-10 2012-11 2012-12 t r 1 ◦ s 2 r 2 ◦ s 3 r 3 ◦ s 3 r 2 ◦ s 2 r ⊲ ⊳ s SIGMOD 2014 2/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

Introduction ◮ Temporal relations : tuples have a time interval. ◮ Overlap join : join tuples with overlapping time intervals. r 1 r 2 r 3 r s 1 s 3 s s 2 2012-1 2012-2 2012-3 2012-4 2012-5 2012-6 2012-7 2012-8 2012-9 2012-10 2012-11 2012-12 t r 1 ◦ s 2 r 2 ◦ s 3 r 3 ◦ s 3 r 2 ◦ s 2 r ⊲ ⊳ s ◮ Goal : Efficient and robust overlap join ◮ Alternative for query optimizer when other predicates are absent, have poor selectivity (long histories), or need to be evaluated after the join (on overlapping interval) SIGMOD 2014 2/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

Outline ◮ OIP : an efficient partitioning for interval data ◮ OIPJoin : a partition join based on OIP ◮ Determine the optimal OIP parameter k for OIPJoin ◮ Empirical evaluation SIGMOD 2014 3/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

Idea of Overlap Interval Partitioning OIP ◮ Given input data with intervals t SIGMOD 2014 4/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

Idea of Overlap Interval Partitioning OIP ◮ Given input data with intervals t ◮ Partition intervals according to position and duration t ◮ Constant clustering guarantee: Difference in duration of tuple and partition is upper-bounded by a constant. SIGMOD 2014 4/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

Overlap Interval Partitioning ( OIP ) ◮ Divide time range into k granules of equal duration ◮ Partitions are sequences of contiguous granules ◮ Partitions can overlap SIGMOD 2014 5/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

Overlap Interval Partitioning ( OIP ) ◮ Divide time range into k granules of equal duration ◮ Partitions are sequences of contiguous granules ◮ Partitions can overlap k = 3: Q 2012-1 2012-2 2012-3 2012-4 2012-5 2012-6 2012-7 2012-8 2012-9 2012-10 2012-11 2012-12 t granule 0 granule 1 granule 2 SIGMOD 2014 5/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

Overlap Interval Partitioning ( OIP ) ◮ Divide time range into k granules of equal duration ◮ Partitions are sequences of contiguous granules ◮ Partitions can overlap k = 3: k = 4: Q Q 2012-1 2012-2 2012-3 2012-4 2012-5 2012-6 2012-7 2012-8 2012-9 2012-10 2012-11 2012-12 2012-1 2012-2 2012-3 2012-4 2012-5 2012-6 2012-7 2012-8 2012-9 2012-10 2012-11 2012-12 t t granule 0 granule 1 granule 2 granule 0 granule 1 granule 2 granule 3 SIGMOD 2014 5/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

Overlap Interval Partitioning ( OIP ) ◮ Divide time range into k granules of equal duration ◮ Partitions are sequences of contiguous granules ◮ Partitions can overlap k = 3: k = 4: Q Q 2012-1 2012-2 2012-3 2012-4 2012-5 2012-6 2012-7 2012-8 2012-9 2012-10 2012-11 2012-12 2012-1 2012-2 2012-3 2012-4 2012-5 2012-6 2012-7 2012-8 2012-9 2012-10 2012-11 2012-12 t t granule 0 granule 1 granule 2 granule 0 granule 1 granule 2 granule 3 Low k ⇒ fewer partition accesses High k ⇒ more precise partitions (less overlapping boxes) (better fitting boxes) SIGMOD 2014 5/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

The OIPJoin 1. Determine number of granules k r 1 r r 2 r 3 s 3 s s 4 s 6 s 1 s 2 s 5 s 7 2012-1 2012-2 2012-3 2012-4 2012-5 2012-6 2012-7 2012-8 2012-9 2012-10 2012-11 2012-12 t SIGMOD 2014 6/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

The OIPJoin 1. Determine number of granules k 2. Partition both input relations using OIP r 1 r r 2 r 3 s 3 s s 4 s 6 s 1 s 2 s 5 s 7 2012-1 2012-2 2012-3 2012-4 2012-5 2012-6 2012-7 2012-8 2012-9 2012-10 2012-11 2012-12 t SIGMOD 2014 6/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

The OIPJoin 1. Determine number of granules k 2. Partition both input relations using OIP 3. Join tuples within overlapping partitions r 1 r r 2 r 3 s 3 s s 4 s 6 s 1 s 2 s 5 s 7 2012-1 2012-2 2012-3 2012-4 2012-5 2012-6 2012-7 2012-8 2012-9 2012-10 2012-11 2012-12 t SIGMOD 2014 6/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

The OIPJoin 1. Determine number of granules k 2. Partition both input relations using OIP 3. Join tuples within overlapping partitions r 1 r r 2 r 3 s 3 s s 4 s 6 s 1 s 2 s 5 s 7 2012-1 2012-2 2012-3 2012-4 2012-5 2012-6 2012-7 2012-8 2012-9 2012-10 2012-11 2012-12 t Properties: ◮ Only 11 tuple comparisons ◮ 9 result tuples ◮ 2 false hits ( r 1 ◦ s 6 and r 2 ◦ s 5 ) ◮ Only 5 inner partitions scanned (5 partition accesses) SIGMOD 2014 6/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

Properties of OIP ◮ Constant clustering guarantee : The difference in duration between a tuple and its partition is less than two granules. ◮ All tuples in a partition behave similarly ◮ Very few false hits ◮ Scans of partitions instead of random tuple access : ◮ High cache locality ◮ Much faster than index look-ups SIGMOD 2014 7/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

How to Determine k ? Intuition: Find optimal k s.t. the number of false hits of OIP justifies the number of partition accesses and vice versa. SIGMOD 2014 8/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

Cost Dimensions We consider CPU and IO costs Cost CPU IO False Hits Increase the number of CPU Increase the number of operations (identifying and block transfers (more discarding false hits). data is fetched). Partition Increase the number of CPU Increase the number of Accesses operations (search in the ac- block transfers (more cess structure). partially filled blocks) SIGMOD 2014 9/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

Cost Dimensions We consider CPU and IO costs Cost CPU IO False Hits Increase the number of CPU Increase the number of operations (identifying and block transfers (more discarding false hits). data is fetched). Partition Increase the number of CPU Increase the number of Accesses operations (search in the ac- block transfers (more cess structure). partially filled blocks) What does that mean for k ? ◮ High k ⇒ few false hits, many partition accesses ◮ Low k ⇒ many false hits, few partition accesses SIGMOD 2014 9/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

Determining k for the OIPJoin 1. Quantify false hits on average: AFR ≤ 1 k (Probability that a tuple is a false hit) 2. Quantify partition accesses on average: APA = k 2 + k +1 3 (Number of partitions accessed by a query interval) 3. Define the cost function for the overhead due to AFR and APA using CPU and IO cost 4. Minimize the cost function w.r.t. k SIGMOD 2014 10/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

Overhead Cost for Partition Accesses ◮ For each of the | p r | outer partitions ◮ APA inner partition accesses (scans) partially filled blocks search in access structure (1 trailing block per partition) (2 comparisons in access list) | p r | · APA · ( c io + 2 · c cpu ) part. accesses ◮ Average number of Partition Accesses APA = k 2 + k +1 3 SIGMOD 2014 11/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

Overhead Cost for False Hits ◮ For each of the | p r | outer partitions ◮ AFR · n s false hits (inner) fetched ◮ Each outer tuple ◮ Is compared with AFR · n s false hits (inner) ◮ Is AFR · n s times a false hits | p r | · n s · AFR · c io + 2 · n s · n r · AFR · 2 · c cpu ) false hits b more data is fetched identifying and discarding (1 false hit within a block) (2 comparisons per false hit) ◮ Average False hit Ratio AFR ≤ 1 k SIGMOD 2014 12/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

The Overhead Cost Function partially filled blocks search in access structure (1 trailing block per partition) (2 comparisons in access list) cost ( k ) = | p r | · APA · ( c io + 2 · c cpu ) + part. accesses | p r | · n s · AFR · ( c io + 2 · n r | p r | · 2 · c cpu ) false hits b more data is fetched identifying and discarding (1 false hit within a block) (2 comparisons per false hit) SIGMOD 2014 13/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

Determining k for the OIPJoin ◮ By minimizing cost ( k ) we get: k = f ( n r , n s , c cpu , c io , b ) Example: ◮ n r = 10M tuples ◮ n r = 100M tuples ◮ c cpu = 0 . 5 ◮ c io = 10 ◮ b = 15 tuples on average in storage block k = f (10M , 100M , 0 . 5 , 10 , 15) = 16 , 521 SIGMOD 2014 14/23 A. Dign¨ os, M. H. B¨ ohlen, J. Gamper

Overlap Interval Partition Join os 1 ohlen 1 Johann Gamper 2 Anton - PowerPoint PPT Presentation

Overlap Interval Partition Join os 1 ohlen 1 Johann Gamper 2 Anton Dign Michael H. B 1 University of Z urich, Switzerland 2 Free University of Bozen-Bolzano, Italy SIGMOD 2014 June 22-27, 2014 - Snowbird, Utah, USA Introduction

Massive Data Algorithmics Lecture 6: Interval Trees Massive Data Algorithmics Lecture 6:

The Power of Two-Choices in Regulating Interval Partitions Ohad N. Feldheim (Stanford) Joint

Towards More Realistic How Interval Data Is . . . Discussion Interval Models in How to Actually

Interval Computations Interval . . . Linearization and their Possible Use Interval Arithmetic:

Dynamic Programming: Interval Scheduling and Knapsack 6.1 Weighted Interval Scheduling Weighted

JOINS IN SQL By Rohit Dhanwani OBJECTIVES Define and use different types of joins INNER

Overlap between VaD VaD and AD: and AD: Overlap between an epidemiological perspective an

Corrected network measures Introduction Overlap weight Corrected Vladimir Batagelj overlap

Cloud Layer Overlap and the Influence of Vertical and Temporal Resolution of Radar Data Oliver

Interval Partitions and Polynomial Factorization Daniel Panario School of Mathematics and

When to Optimize Enumerating all possible plans Selection Pushdown Join Conversion Join

Math 211 Math 211 Lecture #2 Separable Equations 2 Interval of Existence Interval of

Computing the Cube of an Computing the . . . Why Power of a Matrix Interval Matrix Is NP-Hard

On restrictions of balanced 2-interval graphs Philippe Gambette and Stphane Vialette Outline

Math 211 Math 211 Lecture #2 Separable Equations 2 Interval of Existence Interval of

I. FAQ S Q. What is partition? Partition is a proceeding in equity to determine the way in

Types of joins PAN DAS JOIN S F OR S P READS H EET US ERS John Miller Principal Data Scientist

P2P Traffic Localization by Alias Tracker for Tracker-based P2P applications (ATTP)

Game Graphics & Real-time Rendering CMPM 163, W2018 Prof. Angus Forbes (instructor)

Housekeeping Aims To enable participants to understand what coaching is To establish the

The hereditarily ordinal definable sets in models of determinacy John R. Steel University of

The hereditarily ordinal definable sets in models of determinacy John R. Steel University of

CS 220: Discrete Structures and their Applications Measuring algorithm running time using big O

Institutionalizing segregation conditional cash transfers and employment choices Public Economics

Overlap Interval Partition Join os 1 ohlen 1 Johann Gamper 2 Anton - PowerPoint PPT Presentation

Overlap Interval Partition Join os 1 ohlen 1 Johann Gamper 2 Anton Dign Michael H. B 1 University of Z urich, Switzerland 2 Free University of Bozen-Bolzano, Italy SIGMOD 2014 June 22-27, 2014 - Snowbird, Utah, USA Introduction

Massive Data Algorithmics Lecture 6: Interval Trees Massive Data Algorithmics Lecture 6:

The Power of Two-Choices in Regulating Interval Partitions Ohad N. Feldheim (Stanford) Joint

Towards More Realistic How Interval Data Is . . . Discussion Interval Models in How to Actually

Interval Computations Interval . . . Linearization and their Possible Use Interval Arithmetic:

Dynamic Programming: Interval Scheduling and Knapsack 6.1 Weighted Interval Scheduling Weighted

JOINS IN SQL By Rohit Dhanwani OBJECTIVES Define and use different types of joins INNER

Overlap between VaD VaD and AD: and AD: Overlap between an epidemiological perspective an

Corrected network measures Introduction Overlap weight Corrected Vladimir Batagelj overlap

Cloud Layer Overlap and the Influence of Vertical and Temporal Resolution of Radar Data Oliver

Interval Partitions and Polynomial Factorization Daniel Panario School of Mathematics and

When to Optimize Enumerating all possible plans Selection Pushdown Join Conversion Join

Math 211 Math 211 Lecture #2 Separable Equations 2 Interval of Existence Interval of

Computing the Cube of an Computing the . . . Why Power of a Matrix Interval Matrix Is NP-Hard

On restrictions of balanced 2-interval graphs Philippe Gambette and Stphane Vialette Outline

Math 211 Math 211 Lecture #2 Separable Equations 2 Interval of Existence Interval of

I. FAQ S Q. What is partition? Partition is a proceeding in equity to determine the way in

Types of joins PAN DAS JOIN S F OR S P READS H EET US ERS John Miller Principal Data Scientist

P2P Traffic Localization by Alias Tracker for Tracker-based P2P applications (ATTP)

Game Graphics &amp; Real-time Rendering CMPM 163, W2018 Prof. Angus Forbes (instructor)

Housekeeping Aims To enable participants to understand what coaching is To establish the

The hereditarily ordinal definable sets in models of determinacy John R. Steel University of

The hereditarily ordinal definable sets in models of determinacy John R. Steel University of

CS 220: Discrete Structures and their Applications Measuring algorithm running time using big O

Institutionalizing segregation conditional cash transfers and employment choices Public Economics

Game Graphics & Real-time Rendering CMPM 163, W2018 Prof. Angus Forbes (instructor)