RUBIK: Efficient Threshold Queries on Massive Time Series

Eleni Tzirita Zacharatou‡, Thomas Heinis*, Farhan Tauheed§, Anastasia Ailamaki‡ (‡École Polytechnique Fédérale de Lausanne, *Imperial College London, §Oracle Labs, Zurich)


SLIDE 1

RUBIK: Efficient Threshold Queries on Massive Time Series

Thomas Heinis*, Eleni Tzirita Zacharatou‡, Farhan Tauheed§, Anastasia Ailamaki‡

*Imperial College London
‡École Polytechnique Fédérale de Lausanne
§Oracle Labs, Zurich

SLIDE 2

Scaling up Brain Simulations

[Figure: 3D Neuron Model with voltage vs. time traces; labels: Temporal Resolution, Model Resolution]

Time Series Analysis: key to neuroscientific discovery

SLIDE 3

  • Exploration
  • Hypothesis Testing

Neuron firing: which and when

  • Identify subsets of interest:

Threshold Query: time series where voltage > -40 and time step ∈ [300,400]

[Figure: voltage vs. time]

Threshold queries fuel efficient data analysis
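To make the query semantics concrete, here is a minimal brute-force sketch of a threshold query. It is only a baseline scan over raw values, not RUBIK; the function name `threshold_query` and the toy traces are assumptions made up for illustration.

```python
from typing import Dict, List


def threshold_query(series: Dict[int, List[float]], threshold: float,
                    t_start: int, t_end: int) -> Dict[int, List[int]]:
    """Return, per time series id, the time steps in [t_start, t_end]
    whose value is strictly greater than the threshold (full scan)."""
    hits: Dict[int, List[int]] = {}
    for sid, values in series.items():
        last = min(t_end, len(values) - 1)  # clamp to the series length
        steps = [t for t in range(t_start, last + 1) if values[t] > threshold]
        if steps:
            hits[sid] = steps
    return hits


# Toy data (made up): three short voltage traces, values in mV.
traces = {
    0: [-65.0, -64.0, -30.0, -20.0, -66.0],
    1: [-65.0, -65.0, -65.0, -65.0, -65.0],
    2: [-70.0, -39.0, -70.0, -70.0, -10.0],
}

# "Which series exceed -40 in time steps 1..3, and when?"
result = threshold_query(traces, threshold=-40.0, t_start=1, t_end=3)
print(result)
```

The answer says both which series qualify and at which time steps, which is the "which and when" of the slide. The scan touches every value in the time window, which is the cost an index is meant to avoid.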

SLIDE 4

Time Series Correlation…

[Figure: axes time series id, voltage, time step]

…enables efficient time series-specific compression

Trends                          Correlation          Opportunity to scale with
Increased simulation duration   Across time          increase in temporal resolution
Increasingly detailed models    Across time series   increase in spatial resolution

SLIDE 5

Time Series Data Discretization

Binning: Partition the values into bins
Range encoding: Set bin to ‘1’ if condition satisfied, ‘0’ otherwise

Bins: 3: [15-20), 2: [10-15), 1: [5-10), 0: [0-5)
Range-encoding conditions: ≥ 5, ≥ 10, ≥ 15, ≥ 20

[Figure: example values 17, 9, 5, 2 (Timestep vs. Value) and the resulting bitmap (Timestep vs. Bin)]

Precomputed answers stored as a bitmap
Increased similarity across time series
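The binning and range-encoding steps can be sketched as follows. This is an illustrative sketch, not the authors' code: the helper names are hypothetical, and the order of the example values across timesteps is an assumption (the transcript does not preserve it), chosen so that the resulting bitvectors agree with the run lengths listed on the next slide.

```python
from bisect import bisect_right
from typing import List

BIN_EDGES = [5, 10, 15]        # bins: [0-5), [5-10), [10-15), [15-20)
THRESHOLDS = [5, 10, 15, 20]   # range-encoding conditions: >= 5, >= 10, ...


def bin_index(value: float) -> int:
    """Binning: map a value to the index of the bin that contains it."""
    return bisect_right(BIN_EDGES, value)


def range_encode(values: List[float], thresholds: List[float]) -> List[List[int]]:
    """Range encoding: one bitvector per condition; bit t is 1 iff
    values[t] >= threshold, 0 otherwise."""
    return [[1 if v >= th else 0 for v in values] for th in thresholds]


values = [9, 5, 17, 2]  # assumed timestep order of the slide's example values
bins = [bin_index(v) for v in values]
bitmap = range_encode(values, THRESHOLDS)

for th, row in zip(THRESHOLDS, bitmap):
    print(f">= {th:2d}: {row}")
```

Each row is the precomputed answer to one condition: reading the row for "≥ 10" gives the timesteps whose value is at least 10 without touching the raw data.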

SLIDE 6

Bitmap Compression Today

[Figure: bitmap, Timestep vs. Bin]

  • Run-Length-Encoding compresses each bitvector

§ Word-Aligned Hybrid Code (WAH) [SSDBM ’02]

Example bitvectors encoded as runs:
    4×‘0’
    2×‘0’, 1×‘1’, 1×‘0’
    2×‘0’, 1×‘1’, 1×‘0’
    3×‘1’, 1×‘0’

  • Compression prevents direct access

§ Timesteps don’t correspond to bit positions
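A small sketch of plain run-length encoding shows why compression gets in the way of direct access. Note the assumption: this is simple RLE over a Python list, not WAH itself (WAH additionally packs runs and literal bits into machine words); the function names are hypothetical.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (bit value, run length)


def run_length_encode(bits: List[int]) -> List[Run]:
    """Collapse consecutive equal bits into (bit, count) runs."""
    runs: List[Run] = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((b, 1))               # start a new run
    return runs


def bit_at(runs: List[Run], position: int) -> int:
    """Read the bit for one timestep. The compressed form has no direct
    offset for a position, so the runs are walked from the start."""
    seen = 0
    for bit, length in runs:
        seen += length
        if position < seen:
            return bit
    raise IndexError("position beyond end of bitvector")


# The four bitvectors whose run lists appear on this slide.
vectors = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0], [1, 1, 1, 0]]
encoded = [run_length_encode(v) for v in vectors]
print(encoded)
print(bit_at(encoded[1], 2))
```

Looking up a single timestep costs a walk over the preceding runs, which is the "timesteps don't correspond to bit positions" problem in miniature.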

SLIDE 7

Bitmap Compression Today

Values filtered independently of timesteps
Similarities across time series are not exploited

SLIDE 8

Our Approach: RUBIK

Bitmap index creation
Bitmap stacking
Quadtree-based bitmap decomposition

[Figure: bitmaps of several time series, stacked and decomposed]

Access specific timesteps
Exploit similarities

SLIDE 9

Quadtree-based 3D Bitmap Decomposition

[Figure: stacked bitmaps with axes Timestep, Time series, Bins]

Start: Mix
First Split: All 0, All 1, All 1, Mix
Second Split: All 0, All 1, Mix, All 0
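The split labels on this slide can be illustrated with a simplified quadtree over a single 2D bitmap. This is a sketch under stated assumptions, not the RUBIK implementation: it ignores the third (time series) dimension, assumes a square bitmap whose side is a power of two, and uses a hypothetical `build_quadtree` function and a made-up example bitmap.

```python
from typing import List, Tuple, Union

# A node is either a uniform leaf ("All 0" / "All 1") or
# ("Mix", [top-left, top-right, bottom-left, bottom-right]).
Node = Union[str, Tuple[str, list]]


def build_quadtree(bitmap: List[List[int]], row: int = 0, col: int = 0,
                   size: int = 0) -> Node:
    """Recursively split a square region until every leaf is uniform."""
    if size == 0:
        size = len(bitmap)
    cells = [bitmap[r][c]
             for r in range(row, row + size)
             for c in range(col, col + size)]
    if all(b == 0 for b in cells):
        return "All 0"
    if all(b == 1 for b in cells):
        return "All 1"
    half = size // 2  # a mixed region always has size >= 2
    children = [
        build_quadtree(bitmap, row, col, half),
        build_quadtree(bitmap, row, col + half, half),
        build_quadtree(bitmap, row + half, col, half),
        build_quadtree(bitmap, row + half, col + half, half),
    ]
    return ("Mix", children)


# Made-up 4x4 bitmap whose first split yields All 0, All 1, All 1, Mix.
example = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
]
tree = build_quadtree(example)
print(tree)
```

Uniform quadrants collapse into a single leaf, so large regions of identical bits cost one node regardless of their size; only mixed regions are split further.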

SLIDE 10

Quadtree-based 3D Bitmap Decomposition

Start: Mix
First Split: All 0, All 1, All 1, Mix
Second Split: All 0, All 1, Mix, All 0

Apply WAH

SLIDE 11

Query Execution

Query: voltage > 11 in time steps 1 and 2

[Figure: quadtree (Mix; All 0, All 1, All 1, Mix; All 0, All 1, Mix, All 0) over the bitmap, axes Timestep vs. Bin]

Transformation into a 2D bitmap problem
One tree traversal to retrieve multiple bitmaps
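One way to picture the "2D bitmap problem" is a rectangle query on a quadtree: the query selects a range of bins (rows) and a range of timesteps (columns), and a single traversal prunes every subtree that is outside the rectangle or all zeros. The sketch below is an assumption-laden illustration, not RUBIK's query algorithm: the functions are hypothetical, the bitmap is a small range-encoded example with conditions ≥ 5, ≥ 10, ≥ 15, ≥ 20 for assumed values 9, 5, 17, 2, and the query uses the condition "≥ 10" so that it falls exactly on a bin boundary.

```python
from typing import List, Tuple


def build(bitmap, row, col, size):
    """Quadtree node: 'All 0', 'All 1', or ('Mix', four children)."""
    cells = [bitmap[r][c] for r in range(row, row + size)
             for c in range(col, col + size)]
    if all(b == 0 for b in cells):
        return "All 0"
    if all(b == 1 for b in cells):
        return "All 1"
    h = size // 2
    return ("Mix", [build(bitmap, row, col, h), build(bitmap, row, col + h, h),
                    build(bitmap, row + h, col, h), build(bitmap, row + h, col + h, h)])


def query(node, row, col, size, r0, r1, c0, c1, out: List[Tuple[int, int]]):
    """Collect (row, col) of every 1 bit inside rows r0..r1, cols c0..c1."""
    # Prune regions that do not overlap the query rectangle.
    if row > r1 or row + size - 1 < r0 or col > c1 or col + size - 1 < c0:
        return
    if node == "All 0":
        return  # nothing set here, no need to descend
    if node == "All 1":
        for r in range(max(row, r0), min(row + size - 1, r1) + 1):
            for c in range(max(col, c0), min(col + size - 1, c1) + 1):
                out.append((r, c))
        return
    h = size // 2
    offsets = [(0, 0), (0, h), (h, 0), (h, h)]
    for child, (dr, dc) in zip(node[1], offsets):
        query(child, row + dr, col + dc, h, r0, r1, c0, c1, out)


# Rows: conditions >= 5, >= 10, >= 15, >= 20; columns: timesteps 0..3.
bitmap = [
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
tree = build(bitmap, 0, 0, 4)

hits: List[Tuple[int, int]] = []
query(tree, 0, 0, 4, r0=1, r1=1, c0=1, c1=2, out=hits)  # ">= 10" in timesteps 1..2
print(hits)
```

A threshold that falls inside a bin (such as 11 in the slide's query) cannot be answered from the bitmap alone in this sketch; the matching positions would be candidates that still need a check against the raw values.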

SLIDE 12

Stacking Time Series Bitmaps

Goal: Maximize size and number of common squares

[Figure: bitmap 1, bitmap 2 and bitmap 3 grouped into cluster 1 and cluster 2; quadtree nodes: Mix, Mix, All 1, All 1 and Mix, All 0, All 1, All 1]

⇒ Maximize compression across time series
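The idea of placing similar bitmaps next to each other can be sketched with a simple greedy ordering by Hamming distance. This heuristic is an assumption chosen for illustration; the slides do not say which clustering method RUBIK uses, and the function names and toy bitmaps are made up.

```python
from typing import List

Bitmap = List[List[int]]


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Number of positions at which two equally sized bitmaps differ."""
    return sum(x != y for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def greedy_stack_order(bitmaps: List[Bitmap]) -> List[int]:
    """Order bitmaps so that each one is followed by the most similar
    remaining bitmap (nearest neighbour; ties broken by index)."""
    if not bitmaps:
        return []
    order = [0]
    remaining = set(range(1, len(bitmaps)))
    while remaining:
        last = bitmaps[order[-1]]
        nxt = min(remaining, key=lambda i: (hamming(last, bitmaps[i]), i))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Toy bitmaps (made up): the first and third differ in a single bit.
toy = [
    [[1, 1], [0, 0]],
    [[0, 0], [1, 1]],
    [[1, 1], [0, 1]],
]
order = greedy_stack_order(toy)
print(order)
```

Stacking neighbours that agree in most positions makes it more likely that a region is uniform across several time series at once, which is what lets a decomposition collapse it into a single node.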

SLIDE 13

Scaling with Data Volume

In-memory indexes: FastBit (WAH-compressed bitmap index) and RUBIK
Configuration: 128 bins
Hardware: AMD Opteron CPU @ 2.7GHz, 32GB RAM
Time series data: 1000 time steps, 1.2GB – 4.8GB
#queries: 60

[Figure: Total execution time (s) vs. # time series (312K, 624K, 1.25M), FastBit vs. RUBIK]
[Figure: Index size (MB) vs. # time series (312K, 624K, 1.25M), FastBit vs. RUBIK]

RUBIK index size scales sublinearly
9X to 23X speedup

SLIDE 14

RUBIK Sensitivity Analysis

Datasets: 500K – 2M time series, 1024 time steps, 2.1GB – 8.4GB
Benchmark: 60 threshold queries, random thresholds, up to 15% selectivity
Configuration: 128 bins
Hardware: AMD Opteron, 2.7GHz, 32GB RAM

[Figure: size (GB) per dataset (small, medium (2x), large (4x)), Index Size vs. Dataset Size]
[Figure: query execution time (s) per dataset (small, medium (2X), large (4X)), split into 2D range query and Filtering]
Figure annotations: 6.7X, 5.8X, 7.5X

Increased similarity ⇒ Increased compression
~80% of the time is spent on filtering

SLIDE 15

Threshold Queries on Time Series

  • Subsets of interest in neuroscience simulations
  • RUBIK outperforms state-of-the-art by using:
    – Quadtree decomposition ⇒ Transformation into a 2D bitmap problem
    – Time series clustering ⇒ Similarities across time series are exploited
  • RUBIK scales particularly well with time series from increasingly detailed simulation models

Thank you!