an in depth analysis of data aggregation cost factors in
play

An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar - PowerPoint PPT Presentation

An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database Stephan Mller, Hasso Plattner Enterprise Platform and Integration Concepts Hasso Plattner Institute, Potsdam (Germany) DOLAP 2012 November 2nd Agenda


  1. An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database Stephan Müller, Hasso Plattner Enterprise Platform and Integration Concepts Hasso Plattner Institute, Potsdam (Germany) DOLAP 2012 – November 2nd

  2. Agenda ■ Data aggregation ■ Aggregations in Hyrise ■ Cost factors and benchmarks ■ Summary and future work DOLAP 2012 | An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

  3. Motivation ■ Aggregation □ Abstractions for conceptualizing the real world □ Resource-intensive operation in OLAP workloads ■ Columnar in-memory databases □ Optimized for set processing □ Enables reunification of OLTP and OLAP [1] [1] H. Plattner. A common database approach for OLTP and OLAP using an in-memory column database. In SIGMOD, 2009. DOLAP 2012 | An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

  4. Data Aggregation ■ SELECT product, SUM (amount) AS revenue FROM sales WHERE year=2011 GROUP BY product ■ Two phases □ Grouping (hash-based, sorting, nested loops) □ Calculation DOLAP 2012 | An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

  5. Data Aggregation in Hyrise ■ Columnar Storage ■ Dictionary compression id fruit amount id fruit amount 1 apples 10 1 1 1 "fruit" dictionary 2 bananas 10 2 2 1 1 apples 3 cherries 20 3 3 2 2 bananas 4 apples 10 4 1 1 3 cherries 5 cherries 30 5 3 1 4 grapes 6 melons 50 6 5 3 5 melons 7 bananas 20 7 2 4 8 grapes 40 8 4 1 DOLAP 2012 | An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

  6. Data Aggregation – Grouping table id fruit amount 1 1 1 position on "fruit" dictionary 2 2 1 1 0, 3 1 apples 3 3 2 2 1, 6 2 bananas 4 1 1 3 2, 4 3 cherries 5 3 1 4 7 4 grapes 6 5 3 5 5 5 melons 7 2 4 8 4 1 DOLAP 2012 | An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

  7. Data Aggregation - Calculation table "amount" dictionary id fruit amount 1 1 1 1 10 position on "fruit" 2 2 1 2 20 1 0, 3 3 3 2 3 30 2 1, 6 4 1 1 4 40 3 2, 4 5 3 1 5 50 4 7 6 5 3 5 5 7 2 4 8 4 1 fruit sum(amount) apples 20 bananas 50 cherries 30 grapes 10 melons 40 DOLAP 2012 | An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

  8. Cost Modeling in IMDBs ■ Cost model based on cache misses [2] □ Basic patterns for each cache level □ Combine patterns to model complex operations [2] Manegold, S. et al. 2002. Generic database cost models for hierarchical memory systems. VLDB . (2002). DOLAP 2012 | An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

  9. Cost Modeling of the Aggregation DOLAP 2012 | An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

  10. Cost Factors ■ Data-dependent □ Dataset size □ Distinct grouping values □ Distinct grouping values distribution □ Sorting of grouping values ■ Data-independent □ Aggregation function □ Hash implementation DOLAP 2012 | An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

  11. Hash Implementations ■ Ordering □ Sorted tree ( std::map) □ Unsorted hash table ( std::unordered_map) ■ Pre-allocation strategies □ Naïve implementation (automatic growth) □ Distinct value setup (scaffold of hash map) □ Histogram-based setup (scaffold and position list) DOLAP 2012 | An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

  12. Data Set Size ■ Number of bytes processed during query □ Input table data □ Intermediate result data □ Final result data ■ Influencing factors □ Relation size □ Relational expression □ Grouping attributes □ Aggregation attributes □ Aggregation functions DOLAP 2012 | An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

  13. Distinct Values – L1 Cache Misses DOLAP 2012 | An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

  14. Distinct Values – CPU Cycles DOLAP 2012 | An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

  15. Distinct Values Distribution ● Uniform − L1 cache misses ● 8e+06 Exponential − L1 cache misses Uniform − L2 cache misses Exponential − L2 cache misses ● 6e+06 Number of cache misses ● 4e+06 2e+06 ● ● ● 0e+00 1 10 100 1000 10000 100000 Number of distinct values DOLAP 2012 | An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

  16. Impact of Sorted Attributes – L1 Misses DOLAP 2012 | An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

  17. Hash Implementations – L1 Misses DOLAP 2012 | An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

  18. Hash Implementations – CPU Cycles DOLAP 2012 | An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

  19. Summary ■ High impact of aggregation operation in a mixed OLTP and OLAP workload ■ Data characteristics influence aggregation performance ■ Hash implementation choice matters ■ Pre-allocation can reduce cache misses ■ Future work □ Multithreaded aggregations □ Extension of existing cost models DOLAP 2012 | An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

  20. 20 Thank you! stephan.mueller@hpi.uni-potsdam.de http://epic.hpi.uni-potsdam.de DOLAP 2012 | An In-Depth Analysis of Data Aggregation Cost Factors in a Columnar In-Memory Database

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend