
Optimally Leveraging Density and Locality for Exploratory Browsing and Sampling

Albert Kim¹*, Liqi Xu²*, Tarique Siddiqui¹, Silu Huang², Samuel Madden¹, Aditya Parameswaran² (¹MIT, ²University of Illinois)


  1. Optimally Leveraging Density and Locality for Exploratory Browsing and Sampling
     Albert Kim¹*, Liqi Xu²*, Tarique Siddiqui¹, Silu Huang², Samuel Madden¹, Aditya Parameswaran²
     ¹MIT, ²University of Illinois (UIUC)

  2. Motivation
     - A subset of voters who reside in Paris and voted for a specific candidate
     - Some of the genes that get positively induced after a clinical trial
     - Example sessions on a given website on an iPhone X
     - (Diagram labels: Summarization, Browsing)

  3. Motivation
     “Although big data demands aggregations, analysts wanted to see individual records to spotcheck their results, and to get a sense of what sat in a bucket.” [1]
     Any-k problem: how to quickly return a small subset of records that satisfy arbitrary user-specified predicates?
     [1] Moritz et al. Trust, but Verify: Optimistic Visualizations of Approximate Queries for Exploring Big Data.

  4. Existing Approach: Bitmap Index
     - Effective for traditional OLAP-style workloads
     - One bitmap per attribute value
     - Index at the record level
     - Inefficient for the any-k problem
     - High storage cost

     Bitmaps for the any-k problem (Airline dataset on the left, bitmap indices on the right):

     | … | Origin | Mon | Mon = 1 | Mon = 2 | Mon = 3 |
     |---|--------|-----|---------|---------|---------|
     | … | ORD    | 1   | 1       | 0       | 0       |
     | … | ORD    | 2   | 0       | 1       | 0       |
     | … | CMI    | 2   | 0       | 1       | 0       |
     | … | CMI    | 3   | 0       | 0       | 1       |
     | … | ORD    | 1   | 1       | 0       | 0       |
     | … | ORD    | 2   | 0       | 1       | 0       |
     | … | CMI    | 1   | 1       | 0       | 0       |
     | … | ORD    | 1   | 1       | 0       | 0       |
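As a rough illustration of the record-level bitmap index described on this slide, the sketch below builds one bitmap per attribute value over the eight example rows and answers an any-k query by ANDing two bitmaps. The function names `build_bitmaps` and `any_k` are hypothetical, chosen for this example only; this is not the authors' implementation.

```python
from collections import defaultdict

# Toy rows from the slide's Airline example: (Origin, Mon).
ROWS = [("ORD", 1), ("ORD", 2), ("CMI", 2), ("CMI", 3),
        ("ORD", 1), ("ORD", 2), ("CMI", 1), ("ORD", 1)]

def build_bitmaps(rows, column):
    """Return {value: [0/1 per record]}: one bitmap per attribute value."""
    bitmaps = defaultdict(lambda: [0] * len(rows))
    for i, row in enumerate(rows):
        bitmaps[row[column]][i] = 1
    return dict(bitmaps)

def any_k(bitmap_a, bitmap_b, k):
    """AND two bitmaps and return the ids of the first k matching records."""
    hits = [i for i, (a, b) in enumerate(zip(bitmap_a, bitmap_b)) if a and b]
    return hits[:k]

origin = build_bitmaps(ROWS, 0)
month = build_bitmaps(ROWS, 1)
print(month[1])                             # [1, 0, 0, 0, 1, 0, 1, 1]
print(any_k(month[1], origin["ORD"], k=2))  # [0, 4]
```

Note that every bitmap holds one bit per record, which is where the storage cost mentioned on the slide comes from.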

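  5. Our Approach: Density Map Index
     - Index at the block level
     - Reads and writes happen in units of a sector (e.g., 4 KB)
     - Consumes less memory
     - Stores the frequency of set bits per block

     Airline dataset and bitmap indices:

     | … | Origin | Mon | Mon = 1 | Mon = 2 | Mon = 3 |
     |---|--------|-----|---------|---------|---------|
     | … | ORD    | 1   | 1       | 0       | 0       |
     | … | ORD    | 2   | 0       | 1       | 0       |
     | … | CMI    | 2   | 0       | 1       | 0       |
     | … | CMI    | 3   | 0       | 0       | 1       |
     | … | ORD    | 1   | 1       | 0       | 0       |
     | … | ORD    | 2   | 0       | 1       | 0       |
     | … | CMI    | 1   | 1       | 0       | 0       |
     | … | ORD    | 1   | 1       | 0       | 0       |

     Density maps (# of tuples per block: 2):

     | Block | Mon = 1 | Mon = 2 | Mon = 3 |
     |-------|---------|---------|---------|
     | 1     | 0.5     | 0.5     | 0.0     |
     | 2     | 0.0     | 0.5     | 0.5     |
     | 3     | 0.5     | 0.5     | 0.0     |
     | 4     | 1.0     | 0.0     | 0.0     |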
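The density values on this slide can be reproduced from the bitmaps with a few lines of code. The sketch below is only an illustration: `density_map` is a hypothetical helper name, and the three bitmaps are copied from the slide's example with two tuples per block.

```python
def density_map(bitmap, tuples_per_block):
    """Fraction of set bits in each block of `tuples_per_block` records."""
    blocks = [bitmap[i:i + tuples_per_block]
              for i in range(0, len(bitmap), tuples_per_block)]
    return [sum(block) / len(block) for block in blocks]

# Record-level bitmaps from the slide's example (8 records).
MON_1 = [1, 0, 0, 0, 1, 0, 1, 1]
MON_2 = [0, 1, 1, 0, 0, 1, 0, 0]
MON_3 = [0, 0, 0, 1, 0, 0, 0, 0]

print(density_map(MON_1, 2))  # [0.5, 0.0, 0.5, 1.0]
print(density_map(MON_2, 2))  # [0.5, 0.5, 0.5, 0.0]
print(density_map(MON_3, 2))  # [0.0, 0.5, 0.0, 0.0]
```

Each printed list is one column of the density-map table: one number per block instead of one bit per record.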

  6. Our Approach: Density-Optimal
     Observation #1 [Density: denser is better]
     SELECT ANY-K(*) FROM T WHERE Month = 1 AND Origin = "ORD"
     (Figure panels: Origin = "ORD" (sorted); Month = 1 (sorted); Month = 1 AND Origin = "ORD".)
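One way to read "denser is better" is as a greedy selection: estimate a per-block density for the conjunction, then fetch the densest blocks first until enough matches are expected. The sketch below makes two assumptions that the slide does not state: the predicates are treated as independent, so per-block densities are multiplied, and the helper names `combine` and `density_optimal` are hypothetical.

```python
def combine(densities_a, densities_b):
    """Estimate per-block density of A AND B, ASSUMING independence."""
    return [a * b for a, b in zip(densities_a, densities_b)]

def density_optimal(densities, tuples_per_block, k):
    """Greedily pick the densest blocks until k matches are expected."""
    # sorted() is stable, so equally dense blocks keep their on-disk order.
    order = sorted(range(len(densities)), key=lambda b: -densities[b])
    chosen, expected = [], 0.0
    for b in order:
        if expected >= k or densities[b] == 0:
            break
        chosen.append(b)
        expected += densities[b] * tuples_per_block
    return chosen

# Per-block densities for the 8-row Airline example (2 tuples per block).
ORIGIN_ORD = [1.0, 0.0, 1.0, 0.5]
MONTH_1 = [0.5, 0.0, 0.5, 1.0]

both = combine(ORIGIN_ORD, MONTH_1)
print(both)                                            # [0.5, 0.0, 0.5, 0.5]
print(density_optimal(both, tuples_per_block=2, k=2))  # [0, 2]
```

The chosen blocks are dense but not adjacent on disk, which is the trade-off the next slide addresses.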

  7. Our Approach: Locality-Optimal
     Observation #2 [Locality: closer is better]
     SELECT ANY-K(*) FROM T WHERE Month = 1 AND Origin = "ORD"
     (Figure panels: Origin = "ORD"; Month = 1; Month = 1 AND Origin = "ORD".)
     Density-Optimal vs. Locality-Optimal?
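One possible reading of "closer is better" is to look for the shortest run of adjacent blocks that is expected to contain k matches. The sliding-window sketch below implements that interpretation only; it is an assumption, not necessarily the algorithm in the paper, and `locality_optimal` is a hypothetical name. The input densities are assumed per-block estimates for the conjunctive predicate.

```python
def locality_optimal(densities, tuples_per_block, k):
    """Return (start, end), inclusive, of the shortest contiguous block range
    whose expected number of matches reaches k, or None if no range does."""
    best = None
    start, expected = 0, 0.0
    for end, density in enumerate(densities):
        expected += density * tuples_per_block
        # Shrink from the left while the window still holds k expected matches.
        while start < end and expected - densities[start] * tuples_per_block >= k:
            expected -= densities[start] * tuples_per_block
            start += 1
        if expected >= k and (best is None or end - start < best[1] - best[0]):
            best = (start, end)
    return best

# Assumed per-block densities for Month = 1 AND Origin = "ORD".
DENSITIES = [0.5, 0.0, 0.5, 0.5]
print(locality_optimal(DENSITIES, tuples_per_block=2, k=2))  # (2, 3)
```

Here the window covers two neighbouring blocks, so a disk would need a single seek to read them.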

  8. Our Approach: I/O-Optimal
     - Leverages both density and locality
     - Uses dynamic programming
     - High computation cost

     Hybrid
     - Run both Density-Optimal and Locality-Optimal
     - Choose the set of blocks with the smaller estimated I/O cost

     (Figure: I/O cost model on HDDs; labels: # of samples, Blocks.)
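The Hybrid step can be sketched as "score both candidate block sets with a cost model and keep the cheaper one". The slide does not give the form of the HDD cost model, so the one below is a made-up placeholder: a fixed seek cost per run of non-adjacent blocks plus a fixed transfer cost per block. The names `estimated_io_cost` and `hybrid`, the constants, and the two candidate block sets are all hypothetical.

```python
def estimated_io_cost(blocks, seek_cost=10.0, transfer_cost=1.0):
    """Toy HDD cost: one seek per run of adjacent blocks, plus a per-block
    transfer cost. Both constants are illustrative placeholders."""
    blocks = sorted(blocks)
    seeks = sum(1 for i, b in enumerate(blocks)
                if i == 0 or b != blocks[i - 1] + 1)
    return seeks * seek_cost + len(blocks) * transfer_cost

def hybrid(density_blocks, locality_blocks):
    """Keep whichever candidate block set has the smaller estimated cost."""
    return min((density_blocks, locality_blocks), key=estimated_io_cost)

# Two hypothetical candidates: scattered dense blocks vs. adjacent blocks.
print(estimated_io_cost([0, 2]))  # 22.0
print(estimated_io_cost([2, 3]))  # 12.0
print(hybrid([0, 2], [2, 3]))     # [2, 3]
```

Under a cost model with cheap seeks the scattered, denser candidate could win instead, which is why both plans are evaluated.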

  9. Experimental Setting
     - Airline dataset: 123 million rows and 11 attributes, with a total size of 11 GB
     - Baselines: Bitmap-Scan, Lossy-Bitmap, EWAH
     - Queries

  10. Experimental Results
      - Hybrid: 4x faster
      - I/O: 90% of the runtime

      (Figure: query runtimes for the airline workload on an HDD, split into CPU and I/O.)

  11. Experimental Results
      - Uncompressed bitmaps: 47x more memory
      - EWAH: 3x more memory
      - Lossy: slower query performance due to high false positives

      (Figure: memory consumption of index structures.)

  12. More in the paper!
      - ✓ Density Maps
      - ✓ ANY-K algorithms
      - Aggregation Estimation
      - Grouping + Join
      - More experimental results

      (Figure: Needletail architecture.)
      Technical Report: http://data-people.cs.illinois.edu/needletail.pdf
