
The DataPath System: A Data-Centric Analytic Processing Engine for Large Data Warehouses

Subi Arumugam (1), Alin Dobra (1), Christopher M. Jermaine (2), Niketan Pansare (2), Luis Perez (2)
(1) University of Florida, (2) Rice University
June 9, 2010


  1. The DataPath System: A Data-Centric Analytic Processing Engine for Large Data Warehouses (title slide: authors, affiliations, and date as above)

  2. Motivation
     • Storage is cheap: a 1 TB disk costs $80-100
     • Disks have high throughput
       • a $100 1 TB disk can do 150 MB/s reads/writes
       • a $4,000 1 TB SSD (OCZ p88) reads at 1.4 GB/s
     • Processors are fast: 6 GFLOPS/core, 24 GFLOPS for $100
     • TPC-H Q1 (at 1 TB scale factor)
       • 8 aggregates over 95-97% of lineitem
       • need to read about 160-700 GB: 2 p88 drives scan this in 60-250 s
       • need to perform 30 FLOPs × 6·10^9 tuples = 180 GFLOP; about 8 s
       • Q1 should be I/O bound; should be able to run 8 in parallel

  3. Motivation (continued)
     • Best non-clustered performer: 142 s for $1.7M
       • 64 cores, 512 GB memory, 576 disks
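The estimates on the Motivation slides can be reproduced with a few lines of arithmetic. The sketch below is only a sanity check: the function names are made up for illustration, and the inputs are the figures quoted on the slides (two SSDs at 1.4 GB/s each, 6·10^9 lineitem tuples, 30 FLOPs per tuple, 24 GFLOPS).

```cpp
#include <cassert>
#include <cstdio>

// Seconds needed to scan `gigabytes` of data at `gbPerSec` aggregate bandwidth.
double scanSeconds(double gigabytes, double gbPerSec) {
    assert(gbPerSec > 0.0);
    return gigabytes / gbPerSec;
}

// Seconds needed to run `flopsPerTuple` operations on each of `tuples` tuples
// on hardware sustaining `gflops` * 1e9 floating-point operations per second.
double computeSeconds(double tuples, double flopsPerTuple, double gflops) {
    assert(gflops > 0.0);
    return tuples * flopsPerTuple / (gflops * 1e9);
}

// Prints the I/O and CPU estimates using the numbers quoted on the slides.
void printEstimates() {
    const double ssdBandwidth = 2 * 1.4;  // two p88 drives, GB/s
    std::printf("scan:    %.0f to %.0f s\n",
                scanSeconds(160, ssdBandwidth), scanSeconds(700, ssdBandwidth));
    std::printf("compute: %.1f s\n", computeSeconds(6e9, 30, 24));
}
```

With these inputs the scan estimate comes out at roughly 57 to 250 s and the compute estimate at 7.5 s, in line with the 60-250 s and 8 s on the slide, which is why Q1 is expected to be I/O bound.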

  4. Large Scale Analytics Goals
     • Deal with analytical queries on large data (1-10 TB)
     • Get closer to theoretical CPU performance
       • the gap stands at 100-1000x for most databases
     • Sub-$100,000 system with minute-scale response time (1 TB)
       • stay I/O bound even with fast disks and multiple queries
     • No or little tuning: no indexing, no tunable partitioning

  5. Large Scale Analytics Goals (continued)
     DataPath
     • System designed from the ground up to meet these goals.

  6. Benchmark System
     Old System (2008) – $60,000
     • 8 processors, 32 cores
     • 128 GB DDR2 RAM (16 bays)
     • 2 Averatec RAID controllers, four 12-disk enclosures
     • 47 VelociRaptor disks, 8 Barracuda disks
     • Maximum aggregate throughput 2.2 GB/s
     New System (2010) – $20,000
     • 4 processors, 48 cores
     • 128 GB DDR3 memory
     • 2 OCZ Z-Drive 1 TB PCI SSDs
     • Maximum aggregate throughput 2.8 GB/s

  7. Data-centric Computation

  8. DataPath Execution Model
     • Tuple-oriented execution model
     • Tuples shared by queries in the system
     • Chunks of tuples pushed into waypoints for processing
     • Waypoints implement operations for multiple queries
     • Tuple processing loops run at full CPU speed

       // numTuples: number of tuples in the current chunk
       for (int i = 0; i < numTuples; i++) {
         if (tuple[i].BelongsTo(Q1)) Q1.Process(tuple[i]);
         if (tuple[i].BelongsTo(Q2)) Q2.Process(tuple[i]);
         if (tuple[i].BelongsTo(Q3)) Q3.Process(tuple[i]);
       }
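The loop on this slide can be fleshed out into a small compilable sketch. Everything here is an illustrative assumption rather than DataPath's actual data layout: query membership is modeled as a 64-bit mask carried by each tuple, and the hypothetical `SumWaypoint` keeps one running sum per query.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kMaxQueries = 64;  // one bit per query in the mask

struct Tuple {
    double value;           // attribute being aggregated
    std::uint64_t queries;  // bit q set => tuple belongs to query q

    bool BelongsTo(std::size_t q) const { return ((queries >> q) & 1u) != 0; }
};

// A waypoint that serves several queries at once: each tuple in the chunk
// is visited a single time and contributes to every query it belongs to.
struct SumWaypoint {
    std::array<double, kMaxQueries> sums{};  // zero-initialized

    void ProcessChunk(const std::vector<Tuple>& chunk, std::size_t numQueries) {
        assert(numQueries <= kMaxQueries);
        for (const Tuple& t : chunk) {
            for (std::size_t q = 0; q < numQueries; ++q) {
                if (t.BelongsTo(q)) sums[q] += t.value;
            }
        }
    }
};
```

A chunk is scanned once no matter how many queries are active; adding a query only adds one more test per tuple, which is the property the slide's loop illustrates.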

  9. Query Execution – Example
     Q1: SELECT SUM(l_quantity) FROM lineitem WHERE l_shipdate > '1-1-06';
     Plan diagram (bottom to top): lineitem, edge labeled Q1, selection [Q1: l_shipdate > '1-1-06'], aggregate [Q1: SUM(l_quantity)], out. The orders table is drawn next to lineitem.

  10. Query Execution – Example
     Q1: as on the previous slide.
     Q2: SELECT SUM(l_extendedprice) FROM lineitem, orders WHERE l_shipmode <> 'rail' AND o_orderdate < '1-1-08' AND l_orderkey = o_orderkey;
     Plan diagram (bottom to top):
     • lineitem, edge labeled Q1, Q2, feeds a shared selection [Q1: l_shipdate > '1-1-06'; Q2: l_shipmode <> 'rail']
     • orders, edge labeled Q2, feeds a selection [Q2: o_orderdate < '1-1-08']
     • both selections feed a join [Q2: l_orderkey = o_orderkey]
     • aggregates [Q1: SUM(l_quantity)] and [Q2: SUM(l_extendedprice)] each write to their own out

  11. Query Execution – Example
     Q1, Q2, and the plan diagram: as on the previous slide.
     Q3 (new): SELECT AVG(l_discount) FROM lineitem, orders WHERE o_custkey = 1234 AND l_orderkey = o_orderkey;
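To make the example concrete, here is a minimal sketch of how Q1, Q2, and Q3 could share one pass over lineitem and one hash table on orders. It is an assumption-laden illustration, not DataPath code: the per-tuple query bitmask, the `runSharedPlan` function, and the struct layouts are hypothetical, dates are encoded as yyyymmdd integers, and '1-1-06' and '1-1-08' are read as 1 January 2006 and 2008.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

enum : std::uint8_t { Q1 = 1, Q2 = 2, Q3 = 4 };  // one bit per query

struct LineItem {
    int orderkey;
    double quantity, extendedprice, discount;
    int shipdate;  // yyyymmdd
    std::string shipmode;
};
struct Order { int orderkey; int custkey; int orderdate; };

struct Results {
    double q1Sum = 0, q2Sum = 0, q3Sum = 0;
    long q3Count = 0;
    double q3Avg() const { return q3Count ? q3Sum / q3Count : 0.0; }
};

Results runSharedPlan(const std::vector<LineItem>& lineitem,
                      const std::vector<Order>& orders) {
    // Selection on orders: tag each order with the queries it satisfies and
    // build one hash table that serves the joins of both Q2 and Q3.
    std::unordered_map<int, std::uint8_t> build;
    for (const Order& o : orders) {
        std::uint8_t bits = 0;
        if (o.orderdate < 20080101) bits |= Q2;
        if (o.custkey == 1234) bits |= Q3;
        if (bits) build[o.orderkey] = bits;
    }
    Results r;
    for (const LineItem& l : lineitem) {
        // Shared selection on lineitem; Q3 has no lineitem predicate.
        std::uint8_t bits = Q3;
        if (l.shipdate > 20060101) bits |= Q1;
        if (l.shipmode != "rail") bits |= Q2;
        if (bits & Q1) r.q1Sum += l.quantity;  // Q1 needs no join
        // Shared join: a match counts for a query only if both sides do.
        auto it = build.find(l.orderkey);
        if (it == build.end()) continue;
        std::uint8_t joined = bits & it->second;
        if (joined & Q2) r.q2Sum += l.extendedprice;
        if (joined & Q3) { r.q3Sum += l.discount; ++r.q3Count; }
    }
    return r;
}
```

The design choice worth noting is the bitwise AND at the join: it lets a single probe decide, for all queries at once, which of them the joined tuple belongs to.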

  12. Tuple Processing Loop
     Usual problems:
     • branch mis-prediction
     • instruction cache misses
     • per-tuple overhead
     DataPath solution: use a C++ meta-compiler
     • generate new tuple processing loops for each waypoint when new queries are added
     • the generated code is human-readable (it even has comments)
     • compiled as a library with -O3 -msse4.1
     • everything is hardcoded
     • the compiler is left to deal with sharing, branch mis-prediction, and SSE
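The idea of generating a fresh, fully hardcoded loop per waypoint can be sketched as a function that emits C++ source text. This is not DataPath's meta-compiler; `QuerySpec`, `generateLoop`, and the `Tuple` and `State` types named in the emitted text are hypothetical and chosen only to show the shape of the approach.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// What one query contributes to a waypoint, expressed as C++ source text.
struct QuerySpec {
    std::string name;       // e.g. "Q1"; only used in generated comments
    std::string predicate;  // C++ expression over the tuple `t`
    std::string update;     // C++ statement updating the query state `s`
};

// Emits the source of a processing loop in which every predicate and
// update is written out literally, so no per-query dispatch remains at
// run time and the C++ compiler can optimize the whole loop body at once.
std::string generateLoop(const std::string& waypoint,
                         const std::vector<QuerySpec>& queries) {
    std::ostringstream out;
    out << "// Generated for waypoint " << waypoint << "\n";
    out << "void " << waypoint
        << "_ProcessChunk(const Tuple* tuples, int n, State& s) {\n";
    out << "  for (int i = 0; i < n; i++) {\n";
    out << "    const Tuple& t = tuples[i];\n";
    for (const QuerySpec& q : queries) {
        out << "    // " << q.name << "\n";
        out << "    if (" << q.predicate << ") { " << q.update << " }\n";
    }
    out << "  }\n";
    out << "}\n";
    return out.str();
}
```

In a setup like the one the slide describes, the emitted text would be written to a file, compiled into a shared library with the flags quoted above, and loaded by the running system whenever the set of queries at a waypoint changes.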

  13-23. File Scanner (animation, one frame per slide)
     Layout: a staging area with four chunk slots (Chunk 1 to Chunk 4) above five disks (Disk 1 to Disk 5), which together hold blocks numbered 1 to 20.
     • Slide 13: all 20 blocks are on the disks; the staging area is empty.
     • Slides 14-20: one block per slide moves from the disks into the staging area, in the order 2, 3, 5, 1, 4, 6, 8 (not in block-number order).
     • Slide 20: "Finished: Chunk 1".
     • Slide 21: blocks 1-4 have left the staging area, block 7 has arrived, and the first slot is relabeled Chunk 5.
     • Slides 22-23: blocks 10 and 9 arrive; "Finished: Chunk 1" is still shown.
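The behavior the File Scanner frames suggest (blocks arriving from several disks in arbitrary order and being assembled into chunks) can be sketched as follows. The sketch assumes that a chunk consists of four consecutive blocks, which matches blocks 1-4 disappearing when Chunk 1 finishes but is not stated on the slides. The `StagingArea` class is hypothetical, does not guard against the same block arriving twice, and does not model the fixed number of slots.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

class StagingArea {
public:
    explicit StagingArea(int blocksPerChunk) : blocksPerChunk_(blocksPerChunk) {
        assert(blocksPerChunk > 0);
    }

    // Records the arrival of a 1-based block number. Returns the 1-based
    // number of the chunk this block completes, or 0 if none is complete.
    int onBlockArrived(int block) {
        assert(block >= 1);
        const int chunk = (block - 1) / blocksPerChunk_ + 1;
        const int arrived = ++arrivedPerChunk_[chunk];
        if (arrived < blocksPerChunk_) return 0;
        arrivedPerChunk_.erase(chunk);  // chunk is handed on; free its slot
        return chunk;
    }

    // Number of partially assembled chunks currently held.
    std::size_t chunksInFlight() const { return arrivedPerChunk_.size(); }

private:
    int blocksPerChunk_;
    std::map<int, int> arrivedPerChunk_;  // chunk number -> blocks received
};

// Feeds blocks in arrival order and returns chunks in completion order.
std::vector<int> completionOrder(const std::vector<int>& arrivals,
                                 int blocksPerChunk) {
    StagingArea staging(blocksPerChunk);
    std::vector<int> finished;
    for (int block : arrivals) {
        const int chunk = staging.onBlockArrived(block);
        if (chunk != 0) finished.push_back(chunk);
    }
    return finished;
}
```

Fed the arrival order seen in the frames (2, 3, 5, 1, 4, 6, 8, 7, 10, 9), this sketch completes chunk 1 when block 4 arrives and chunk 2 when block 7 arrives, while blocks 9 and 10 leave a third chunk in flight. Because disks are never forced to deliver blocks in order, each disk can keep reading at its own pace.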
