
Large-Scale Data Engineering: Modern SQL-on-Hadoop Systems (PowerPoint PPT Presentation)



  1. Large-Scale Data Engineering: Modern SQL-on-Hadoop Systems (event.cwi.nl/lsde2015)

  2. Analytical Database Systems
Parallel (MPP): Teradata, ParAccel, Pivotal, Vertica, Redshift, Oracle (IMM), Netezza, DB2-BLU, InfoBright, SQL Server (columnstore), Vectorwise
Open source: MySQL (?), LucidDB, MonetDB

  3. SQL-on-Hadoop Systems
Open source: Hive (HortonWorks), Impala (Cloudera), Drill (MapR), Presto (Facebook), DataBricks, CitusData
Commercial: HAWQ (Pivotal), Vortex (Actian), Vertica Hadoop (HP), BigSQL (IBM), Splice Machine, InfiniDB Hadoop

  4. “SQL on Hadoop” Systems
(figure: systems plotted along two axes: SQL maturity (performance + features) from low to high, and Hadoop integration from “outside Hadoop” to native; “wrapped legacy” systems rank high on SQL maturity but low on integration, “from scratch” systems the other way around)

  5. Analytical DB Engines for Hadoop
Storage:
– columnar storage + compression
– table partitioning / distribution
– exploiting correlated data
Query processor:
– CPU-efficient query engine (vectorized or JIT code generation)
– many-core ready system
– rich SQL (+ authorization + ...)
– batch update infrastructure
– scaling with multiple nodes
– MetaStore & file formats
– YARN & elasticity

  6. Columnar Storage
(figure: the same table (Date, Store, Product, Customer, Price) laid out as a row store and as a column store, with a query reading data and a new record being inserted)
Row store: + easy to add/modify a record, – might read in unnecessary data.
Column store: + only need to read in the relevant data, – tuple writes require multiple accesses.
=> suitable for read-mostly, read-intensive, large data repositories
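To make the tradeoff concrete, here is a tiny Python sketch (made-up data, not from the slides) of the same table stored both ways:

```python
# Row store: one tuple per record (Date, Store, Product, Customer, Price).
rows = [
    ("2015-01-02", "S1", "P1", "C1", 10),
    ("2015-01-03", "S2", "P2", "C2", 20),
]
# Column store: one array per attribute.
columns = {
    "Date":     ["2015-01-02", "2015-01-03"],
    "Store":    ["S1", "S2"],
    "Product":  ["P1", "P2"],
    "Customer": ["C1", "C2"],
    "Price":    [10, 20],
}
# SUM(Price): the row store drags all five attributes through memory,
# the column store touches only the Price array.
assert sum(r[4] for r in rows) == sum(columns["Price"]) == 30
# Inserting a record is one append in the row store, five in the column store.
rows.append(("2015-01-04", "S1", "P2", "C3", 15))
for name, value in zip(columns, ("2015-01-04", "S1", "P2", "C3", 15)):
    columns[name].append(value)
```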

  7. Analytical DB Engines for Hadoop (agenda recap; next up: columnar compression)

  8. Columnar Compression
• Trades I/O for CPU
– a winning proposition currently
– even trading RAM bandwidth for CPU wins: 64-core machines are starved for RAM bandwidth
• Additional column-store synergy:
– data of the same distribution sits close together, giving better compression rates
– generic compression (gzip) vs. domain-aware compression
– synergy with vectorized processing (see later): compress/decompress/execute with SIMD
– the space saved can be used to store multiple copies of the data in different sort orders (see later)

  9. Run-Length Encoding
(figure: a table sorted on Quarter, with columns Quarter, Product ID, Price; each run of equal values compresses into a (value, start_pos, run_length) triple)
Quarter: (Q1, 1, 300), (Q2, 301, 350), (Q3, 651, 500), (Q4, 1151, 600)
Product ID: (1, 1, 5), (2, 6, 2), ..., (1, 301, 3), (2, 304, 1), ...
Price: left uncompressed (5, 7, 2, 9, 6, 8, 5, ..., 3, 8, 1, 4, ...)
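A minimal run-length encoder/decoder in Python, using the (value, start_pos, run_length) triples and 1-based positions from the slide (the function names are my own):

```python
def rle_encode(column):
    runs = []
    for pos, value in enumerate(column, start=1):   # 1-based positions
        if runs and runs[-1][0] == value:
            v, start, length = runs[-1]
            runs[-1] = (v, start, length + 1)       # extend the current run
        else:
            runs.append((value, pos, 1))            # start a new run
    return runs

def rle_decode(runs):
    out = []
    for value, _start, length in runs:
        out.extend([value] * length)
    return out

quarters = ["Q1"] * 300 + ["Q2"] * 350 + ["Q3"] * 500 + ["Q4"] * 600
runs = rle_encode(quarters)
assert runs[:2] == [("Q1", 1, 300), ("Q2", 301, 350)]
assert rle_decode(runs) == quarters
```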

  10. Bitmap Encoding (“Integrating Compression and Execution in Column-Oriented Database Systems”, Abadi et al., SIGMOD ’06)
• For each unique value v in column c, create a bit-vector b with b[i] = 1 if c[i] = v
• Good for columns with few unique values
• Each bit-vector can be further compressed if sparse
(figure: a Product ID column (1, 1, 1, 1, 1, 2, 2, ..., 1, 1, 2, 3, ...) next to one bit-vector per distinct ID)
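A sketch of the idea in Python (plain integer lists stand in for real bit-vectors; the helper name is made up):

```python
def bitmap_encode(column):
    bitmaps = {}
    for i, v in enumerate(column):
        bitmaps.setdefault(v, [0] * len(column))[i] = 1
    return bitmaps

product_ids = [1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 3]
bitmaps = bitmap_encode(product_ids)
assert bitmaps[2] == [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0]
# Predicates turn into bitwise operations, e.g. "id = 1 OR id = 3":
matches = [a | b for a, b in zip(bitmaps[1], bitmaps[3])]
assert matches == [1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1]
```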

  11. Dictionary Encoding (Abadi et al., SIGMOD ’06)
• For each unique value, create a dictionary entry
• The dictionary can be per-block or per-column
• Column stores have the advantage that dictionary entries may encode multiple values at once
(figure: a Quarter column encoded two ways: with per-value codes 0: Q1, 1: Q2, 2: Q3, 3: Q4, and with multi-value codes 24: Q1,Q2,Q4,Q1; 122: Q2,Q4,Q3,Q3; 128: Q3,Q1,Q1,Q1)
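A per-column dictionary encoder in Python (single-value entries only; the multi-value variant from the figure works the same way on tuples of values):

```python
def dict_encode(column):
    dictionary, codes = {}, []
    for v in column:
        code = dictionary.setdefault(v, len(dictionary))  # new values get the next code
        codes.append(code)
    return dictionary, codes

quarters = ["Q1", "Q2", "Q4", "Q1", "Q3", "Q1", "Q1"]
dictionary, codes = dict_encode(quarters)
decode = {code: v for v, code in dictionary.items()}
assert [decode[c] for c in codes] == quarters
# With 4 distinct quarters, each code fits in 2 bits instead of a 2-byte string.
```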

  12. Frame of Reference Encoding (“Compressing Relations and Indexes”, Goldstein, Ramakrishnan, Shaft, ICDE ’98)
• Encodes each value as a b-bit offset from a chosen frame of reference
• A special escape code (e.g., all bits set to 1) indicates a difference larger than can be stored in b bits
– after the escape code, the original (uncompressed) value is written (there are better ways to deal with exceptions)
(figure: Price values 45, 54, 48, 55, 51, 53, 40, 49, 62, 52, 50, 62, ... stored as 4-bit offsets from frame 50; 40 and 62 do not fit and are stored as exceptions)
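A Python sketch of the scheme, under stated assumptions: biased unsigned b-bit offsets with the all-ones pattern reserved as the escape code, and exceptions kept in a side list (the slide does not show where exactly exceptions live):

```python
def for_encode(values, frame, b=4):
    bias, escape = 1 << (b - 1), (1 << b) - 1
    codes, exceptions = [], []
    for v in values:
        off = v - frame + bias
        if 0 <= off < escape:
            codes.append(off)          # offset fits in b bits
        else:
            codes.append(escape)       # escape: value stored uncompressed
            exceptions.append(v)
    return codes, exceptions

def for_decode(codes, exceptions, frame, b=4):
    bias, escape = 1 << (b - 1), (1 << b) - 1
    exc = iter(exceptions)
    return [next(exc) if c == escape else frame + c - bias for c in codes]

prices = [45, 54, 48, 55, 51, 53, 40, 49, 62, 52, 50, 62]
codes, exceptions = for_encode(prices, frame=50)
assert exceptions == [40, 62, 62]      # the values that did not fit in 4 bits
assert for_decode(codes, exceptions, frame=50) == prices
```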

  13. Differential Encoding (“Improved Word-Aligned Binary Compression for Text Indexing”, Anh, Moffat, TKDE ’06)
• Encodes each value as a b-bit offset from the previous value
• A special escape code (just like frame-of-reference encoding) indicates a difference larger than can be stored in b bits
– after the escape code, the original (uncompressed) value is written (again, there are better ways to deal with exceptions)
• Performs well on columns containing increasing/decreasing sequences:
– inverted lists
– timestamps
– object IDs
– sorted / clustered columns
(figure: a Time column 5:00, 5:02, 5:03, 5:03, 5:04, 5:06, 5:07, 5:08, 5:10, 5:15, 5:16 stored with 2 bits per value; the jump to 5:15 does not fit and is stored as an exception)
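The same escape mechanism applied to deltas, sketched in Python; this sketch assumes a non-decreasing column (like the timestamps above), so deltas are unsigned:

```python
def delta_encode(values, b=2):
    escape = (1 << b) - 1              # 0b11 with 2-bit deltas
    codes, exceptions = [], []
    prev = values[0]                   # first value is stored uncompressed
    for v in values[1:]:
        d = v - prev
        if 0 <= d < escape:
            codes.append(d)
        else:
            codes.append(escape)
            exceptions.append(v)       # e.g. the 5:10 -> 5:15 jump
        prev = v
    return values[0], codes, exceptions

# Minutes since 5:00 for the column 5:00, 5:02, 5:03, 5:03, 5:04, 5:06, 5:07,
# 5:08, 5:10, 5:15, 5:16 from the figure.
times = [0, 2, 3, 3, 4, 6, 7, 8, 10, 15, 16]
first, codes, exceptions = delta_encode(times)
assert exceptions == [15]              # only 5:15 does not fit in 2 bits
```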

  14. Heavy-Weight Compression Schemes (“Super-Scalar RAM-CPU Cache Compression”, Zukowski, Heman, Nes, Boncz, ICDE ’06)
• Modern disks (SSDs) can achieve > 1 GB/s
• If decompression may only use 1/3 of the CPU, it must sustain 3 GB/s to keep such a disk busy
=> lightweight compression schemes are better
=> even better: operate directly on compressed data

  15. Operating Directly on Compressed Data (Abadi et al., SIGMOD ’06)
Examples:
• SUM_i(rle-compressed column[i]) => SUM_g(count[g] * value[g])
• (country == “Asia”) => countryCode == 6
• strcmp => SIMD comparison
Benefits:
• the I/O-CPU tradeoff is no longer a tradeoff (the CPU side also improves)
• reduces memory-CPU bandwidth requirements
• opens up the possibility of operating on multiple records at once
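The first example in Python, reusing the RLE triples from the sketch above: the sum is computed per run, never materializing the decompressed column:

```python
def sum_rle(runs):
    # SUM_g(count[g] * value[g]) over (value, start_pos, run_length) triples
    return sum(value * length for value, _start, length in runs)

runs = [(1, 1, 5), (2, 6, 2), (1, 8, 3)]   # encodes 1,1,1,1,1,2,2,1,1,1
assert sum_rle(runs) == sum([1, 1, 1, 1, 1, 2, 2, 1, 1, 1]) == 12
```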

  16. Analytical DB Engines for Hadoop (agenda recap; next up: table partitioning / distribution)

  17. Table Partitioning and Distribution
• Data is spread based on a key
– functions: hash, range, list
(figure: a table partitioned by range into Q1, Q2, Q3, Q4, and its rows distributed by hash)
• “Distribution”: goal is parallelism
– give each compute node a piece of the data
– each query has work on every piece (keep everyone busy)
• “Partitioning”: goal is data lifecycle management
– a data warehouse e.g. keeps the last six months
– every night: load one new day, drop the oldest partition
• “Partitioning”: goal is improving access patterns
– when querying for May, drop Q1, Q3, Q4 (“partition pruning”)
Which kind of function would you use for which method?
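A toy illustration of the two mechanisms (NODES, RANGES, and the function names are invented for the example):

```python
NODES = 4
RANGES = {"Q1": (1, 3), "Q2": (4, 6), "Q3": (7, 9), "Q4": (10, 12)}  # months

def node_for(key):
    # "distribute by hash": every node gets a piece, every query keeps all busy
    return hash(key) % NODES

def partitions_for(month):
    # "partition by range": partition pruning skips non-matching quarters
    return [q for q, (lo, hi) in RANGES.items() if lo <= month <= hi]

assert partitions_for(5) == ["Q2"]   # a query for May touches only Q2
```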

  18. Vertica: Multiple Orders (Projections)
• Precomputed projections reduce join effort
• Projections are ordered (e.g., on “date”, or on “cust”)
• Ordered data allows “selection pushdown”
– scan less data
• Ordered data enhances compression
– run-length encoding
– frame of reference
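Why ordering enables selection pushdown, sketched with Python's bisect: a range predicate on a sorted column becomes two binary searches instead of a full scan (the data here is made up):

```python
from bisect import bisect_left, bisect_right

dates = ["2015-01-03", "2015-02-11", "2015-02-19", "2015-03-07"]  # sorted projection

def select_range(column, lo, hi):
    # read only the slice inside [lo, hi]; everything before/after is skipped
    return column[bisect_left(column, lo):bisect_right(column, hi)]

assert select_range(dates, "2015-02-01", "2015-02-28") == ["2015-02-11", "2015-02-19"]
```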

  19. Data Placement in Hadoop
• Each node writes the partitions it owns
– where does the data end up, really?
(figure: partition by range into Q1..Q4, distribute by hash)
• HDFS default block placement strategy:
– the node that initiates the write gets the first copy
– the 2nd copy goes on the same rack
– the 3rd copy goes on a different rack
• The values of one record should end up on the same node
– not entirely trivial in column stores: column partitions should be co-located
– simple solution: put all columns together in one file (RCFile, ORCFile, Parquet)
– complex solution: replace the default HDFS block placement strategy with a custom one

  20. Example: Parquet Format (figure)

  21. Popular File Formats in Hadoop
• Good old CSV
– textual, easy to parse (but slow); better compress it!
• Sequence files
– binary data, faster to process
• RCFile
– Hive’s first attempt at a column store
• ORCFile
– columnar compression, MinMax
• Parquet
– proposed by Twitter and Cloudera Impala
– like ORCFile, but no MinMax
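A hedged, present-day example of the columnar access pattern with Parquet, using pyarrow (a library not mentioned on the slides): write a small table, then read back only one column.

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({
    "quarter": ["Q1", "Q1", "Q2", "Q2"],
    "price":   [5, 7, 3, 8],
})
pq.write_table(table, "sales.parquet")

# A columnar format only has to read the requested columns back from disk:
prices = pq.read_table("sales.parquet", columns=["price"])
assert prices.column("price").to_pylist() == [5, 7, 3, 8]
```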

  22. Example: Parquet Format (figure, continued)

  23. HCatalog (“Hive MetaStore”)
The de-facto metadata standard on Hadoop:
• Where are the tables? What do they contain? How are they partitioned?
• Can I read from them? Can I write to them?
SQL-on-Hadoop challenges:
• reading and writing many file formats
• opening up one’s own data store to foreign tools that read from it
HCatalog makes UDFs less important!
