
The Glass Half Full: Using Programmable Hardware Accelerators in Analytical Databases



  1. The Glass Half Full: Using Programmable Hardware Accelerators in Analytical Databases
     Zsolt István, IMDEA Software Institute

  2. IMDEA Software Institute
     • 16 faculty in the areas of:
       • Program Analysis and Verification
       • Languages and Compilers
       • Security and Privacy
       • Theoretical Computer Science
       • Distributed Systems and Databases
     • ~10 post-docs, ~25 PhD students, ~10 interns
     • Located in the UPM Montegancedo Campus, Madrid
     • We are hiring! https://software.imdea.org/

  3. Context: Analytical Databases
     ▪ OLAP (Online Analytical Processing)
       ▪ Large datasets, up to TBs
       ▪ Ad-hoc querying to extract insight, plus recurring reporting; possibly complex operations
       ▪ Read-mostly workloads, updates in batches
     ▪ OLTP (Online Transaction Processing)
       ▪ Smaller datasets
       ▪ Queries are known and relate to business actions
       ▪ Makes heavy use of indexes
       ▪ Reads and updates intermixed

  4. Databases were a $25 billion market in 2018… Could we specialize machines for them?
     https://www.statista.com/statistics/810188/worldwide-commercial-database-market-size/

  5. Database Computer (’70s)
     “The first goal is to design it with the capability of handling a very large on-line database of 10^10 bytes or beyond since special-purpose machines are not likely to be cost-effective for small databases.”
     ▪ Fully custom machine for databases
     ▪ Processors: special-ISA microprocessors
     ▪ Memory: magnetic bubbles and CCDs
     ▪ Semiconductor technology and general-purpose CPUs took over
     Jayanta Banerjee, David K. Hsiao, Krishnamurthi Kannan: DBC - A Database Computer for Very Large Databases. IEEE Trans. Computers 28(6): 414-429 (1979)

  6. Gamma Machine (’80s)
     ▪ Based on a VAX multiprocessor system
     ▪ By the time the software and hardware were developed, CPUs had become much faster
     ▪ Couldn’t keep up with Moore’s law
     David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna: GAMMA - A High Performance Dataflow Database Machine. VLDB 1986: 228-237

  7. Data/Compute Gap
     [Figure: eras labeled Specialized, CPU Scaling, Commodity, and Hardware Revival in Cloud]

  8. Renewed Interest in Specialized Hardware
     [Figure: CPUs, FPGAs, ASICs]

  9. Re-programmable Specialized Hardware: Field Programmable Gate Array (FPGA)
     ▪ Free choice of architecture
     ▪ Fine-grained pipelining, communication, distributed memory
     ▪ Tradeoff: all “code” occupies chip space
     ▪ Evolving platform: larger chips, more heterogeneity
     [Figure: operators Op 1, Op 2, Op 3 on the chip]

  10. Integration Options
     1) On the side
     2) In the data path
     3) Co-processor
     [Figure: placement of the accelerator (Accel.) relative to the data in each option]

  11. In the Cloud Today
     ▪ Accelerator (on the side): Amazon F1
     ▪ In the data path: Microsoft Catapult
     ▪ Co-processor FPGA: Intel Xeon+FPGA
     [Figure: CPU and FPGA socket layouts (Socket1, Socket2) of Intel Xeon+FPGA Gen.1 and Gen.2]

  12. The Glass Half Empty…

  13. The Glass Half Empty…
     ▪ 1) On-the-side acceleration introduces overhead
     [Figure: query execution time for Software vs. With Acceleration, split into Compute and Data Movement, with a 2x annotation]
     ▪ Much related work offers no real speedup once data movement, transformation, and software overhead are factored in
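Why data movement can erase an accelerator's advantage becomes clear from a back-of-the-envelope model: end-to-end time is the accelerated compute time plus the time to ship data to and from the device, plus any extra software overhead. The sketch below is a simple Amdahl-style model, not the measurement behind the chart; the function name, its parameters, and the example numbers are hypothetical.

```python
def end_to_end_speedup(compute_s: float, accel_factor: float,
                       bytes_moved: float, link_gbps: float,
                       overhead_s: float = 0.0) -> float:
    """Speedup of on-the-side acceleration over software-only execution.

    compute_s:    software-only execution time in seconds (assumed all compute)
    accel_factor: how many times faster the accelerator runs that compute
    bytes_moved:  bytes copied to and from the accelerator
    link_gbps:    effective transfer bandwidth in GB/s (10**9 bytes per second)
    overhead_s:   extra software or data-transformation overhead in seconds
    """
    accel_compute_s = compute_s / accel_factor
    transfer_s = bytes_moved / (link_gbps * 1e9)
    return compute_s / (accel_compute_s + transfer_s + overhead_s)


# A 10x faster operator that must move 8 GB over an 8 GB/s link ends up slower:
print(round(end_to_end_speedup(1.0, 10, 8e9, 8.0), 2))  # prints 0.91
```

With no data to move, the same call would report the full 10x; the transfer term alone turns it into a slowdown.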

  14. The Glass Half Empty…
     ▪ 2) “All or nothing” behavior makes query planning difficult
       ▪ Example: fixed-capacity hash table on the FPGA
       ▪ Constant-time access for reads and writes
       ▪ What happens if the data doesn’t fit?
       ▪ The number of keys can’t always be known a priori
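The “all or nothing” problem can be modeled in software. Below is a minimal sketch, assuming a hash table with a fixed number of slots and a bounded probe length (so every access takes bounded time, as on-chip memory would), used by a counting operator. The class, the `max_probes` limit, and `count_per_key` are hypothetical illustrations, not the hardware design from the talk. Keys must not be `None`, which marks an empty slot.

```python
from typing import Dict, Hashable, Iterable, Optional


class FixedCapacityTable:
    """Software model of a hash table with a fixed number of slots.

    Each access probes at most `max_probes` slots, so it takes bounded time,
    but the table can never grow (hypothetical model, not a real FPGA design).
    """

    def __init__(self, capacity: int, max_probes: int = 4):
        self.capacity = capacity
        self.max_probes = min(max_probes, capacity)
        self.keys: list = [None] * capacity   # None marks an empty slot
        self.values: list = [0] * capacity

    def add(self, key: Hashable, value: int) -> bool:
        """Add `value` to the entry for `key`; return False if the key does not fit."""
        start = hash(key) % self.capacity
        for i in range(self.max_probes):
            slot = (start + i) % self.capacity
            if self.keys[slot] is None:      # free slot: claim it
                self.keys[slot] = key
                self.values[slot] = value
                return True
            if self.keys[slot] == key:       # existing entry: update in place
                self.values[slot] += value
                return True
        return False                         # probe budget exhausted: overflow

    def items(self) -> Dict[Hashable, int]:
        return {k: v for k, v in zip(self.keys, self.values) if k is not None}


def count_per_key(keys: Iterable[Hashable], capacity: int) -> Optional[Dict[Hashable, int]]:
    """All-or-nothing operator: the complete result, or None on any overflow."""
    table = FixedCapacityTable(capacity)
    for k in keys:
        if not table.add(k, 1):
            return None  # the whole operator must be re-run in software
    return table.items()
```

The planner only learns that the operator failed after it has already consumed input, which is why the key count would need to be known up front.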

  15. The Glass Half Empty…
     ▪ 3) Analytical databases are becoming more optimized; there is not much compute in core SQL
       ▪ X100 [CIDR05] showed that <10% of compute time is spent on the SQL operators +, -, *, SUM, AVG in analytical queries
       ▪ Columnar stores are often memory-bound (10s of GB/s)

  16. The Glass Half Empty…
     ▪ On-the-side acceleration introduces overhead
     ▪ “All or nothing” behavior makes query planning difficult
     ▪ Analytical databases are becoming more optimized; not much compute in core SQL

  17. The Glass Half Full…
     ▪ On-the-side acceleration introduces overhead
       ✓ Reduce data movement bottlenecks

  18. Processing in the Data Path: Smart Flash
     ▪ IBEX: database storage engine with processing offload
     ▪ Filter and pre-aggregate for analytic workloads → larger bandwidth, more IOPS (Samsung YourSQL, MIT BlueDBM)
     ▪ Opportunity to extend SSDs/Flash with complex offload
     [Figure: Samsung “smart” SSD; SSD, IBEX, Database Server]
     IBEX - An Intelligent Storage Engine with Support for Advanced SQL Off-loading. L. Woods, Z. István and G. Alonso, VLDB’14

  19. Processing in the Data Path: Distributed Processing
     ▪ Caribou: distributed storage with processing
       ▪ Specialized HW nodes
       ▪ 10Gbps access
       ▪ 25W power consumption
     [Figure: Workers (Compute) separate from Storage; + Provisioning, + Scalability]
     Zsolt István, David Sidler, Gustavo Alonso: Caribou: Intelligent Distributed Storage. PVLDB 10(11), 2017.

  20. Smart Storage in Databases: Filter Push-down
     SELECT … FROM customer WHERE age<35 AND purchases>2 AND address LIKE “%PO. Box 123%”
     ▪ Challenge: guarantee that filtering never slows down retrieval
     ▪ Algorithms can be re-imagined to become bandwidth-bound instead of compute-bound
     ▪ Extend the state of the art: parameterization without re-programming [FCCM16]
     ▪ Many options: regular expressions, comparisons, decompression, …
     [Figure: comparison against the Intel Hyperscan library (Xeon E5-2680 v2), annotated 2.8x]
     [FCCM16] Runtime Parameterizable Regular Expression Operators for Databases. Zs. István, D. Sidler, G. Alonso. FCCM’16
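The idea of parameterization without re-programming can be sketched as a filter whose structure is fixed (a small set of comparison units) while the query supplies only the column, operator, and constant at runtime, much like loading values into registers. This is a minimal software model under that assumption; `OPS`, `Predicate`, and `storage_scan` are hypothetical names, and the LIKE pattern with leading and trailing `%` is approximated by a plain substring test rather than the general regular-expression engine of [FCCM16].

```python
import operator
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, List

# Comparison "units" available in the hypothetical storage-side filter.
# A query picks among them at runtime instead of generating new logic.
OPS = {
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
    "contains": lambda value, needle: needle in value,  # LIKE '%needle%'
}


@dataclass
class Predicate:
    column: str
    op: str          # key into OPS
    constant: Any    # runtime parameter, like a value loaded into a register


def storage_scan(rows: Iterable[Dict[str, Any]],
                 predicates: List[Predicate]) -> Iterator[Dict[str, Any]]:
    """Evaluate a conjunction of predicates next to the data; yield only matches."""
    for row in rows:
        if all(OPS[p.op](row[p.column], p.constant) for p in predicates):
            yield row


# WHERE age<35 AND purchases>2 AND address LIKE "%PO. Box 123%"
query = [Predicate("age", "<", 35),
         Predicate("purchases", ">", 2),
         Predicate("address", "contains", "PO. Box 123")]
```

Changing the query only changes the `query` list; the scan loop itself, standing in for the circuit, stays the same.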

  21. The Glass Half Full…
     ✓ Reduce data movement bottlenecks
     ▪ “All or nothing” behavior makes query planning difficult
       ✓ Hybrid processing

  22. IBEX’s Hybrid Group-by
     ▪ Group-by: compute an aggregate function over categories
     ▪ select avg(salary) from employees group by department
     [Figure: Ibex with SW-only Group-By; labels Input table, Selection, Projection, Filtered data, Group-by, Final Groups, CPU]

  23. IBEX’s Hybrid Group-by
     ▪ Group-by: compute an aggregate function over categories
     ▪ select avg(salary) from employees group by department
     [Figure: Ibex with HW-only Group-By; labels Input table, Selection, Projection, Group-by, Final Groups, CPU]

  24. IBEX’s Hybrid Group-by
     ▪ Group-by: compute an aggregate function over categories
     ▪ select avg(salary) from employees group by department
     ▪ What if the number of groups does not fit on the FPGA?
       ▪ Send partial aggregates and finalize in SW
       ▪ Worst case: same as no acceleration
       ▪ Best case: all in HW!
     [Figure: Ibex with Hybrid Group-by (a HW Group-by produces Partial Groups that a Group-by on the CPU finalizes) next to Ibex with HW-only Group-By]
     Challenge: How to split work across the accelerator and software?
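The hybrid scheme relies on AVG being decomposable into a (sum, count) pair, so partial aggregates from hardware and software can be merged. Below is a minimal sketch of that split for `avg(salary) ... group by department`, assuming a simple overflow policy: rows whose group does not fit in the on-chip table are forwarded unaggregated. The function names and this policy are hypothetical and may differ from what IBEX actually does.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Partial = Tuple[str, float, int]  # (group key, partial sum, partial count)


def hw_group_by(rows: Iterable[Tuple[str, float]], capacity: int) -> Iterator[Partial]:
    """Model of the accelerator stage: aggregate up to `capacity` groups on chip.

    Rows whose group does not fit are forwarded unaggregated as (key, value, 1).
    """
    table: Dict[str, list] = {}
    for key, value in rows:
        if key in table:
            table[key][0] += value
            table[key][1] += 1
        elif len(table) < capacity:
            table[key] = [value, 1]
        else:
            yield (key, value, 1)          # overflow: let software handle it
    for key, (total, count) in table.items():
        yield (key, total, count)          # flush on-chip partial aggregates


def sw_finalize_avg(partials: Iterable[Partial]) -> Dict[str, float]:
    """Software stage: merge partial (sum, count) pairs and compute AVG per group."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for key, total, count in partials:
        sums[key] += total
        counts[key] += count
    return {key: sums[key] / counts[key] for key in sums}


rows = [("eng", 100.0), ("ops", 50.0), ("eng", 200.0), ("hr", 80.0), ("ops", 70.0)]
result = sw_finalize_avg(hw_group_by(rows, capacity=1))
```

The result is correct for any capacity; capacity only decides how much work reaches the CPU. With capacity 0 every row is forwarded (the worst case), and with enough capacity only one partial per group is sent (the best case).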

  25. The Glass Half Full
     ✓ Reduce data movement bottlenecks
     ✓ Hybrid processing
     ▪ Analytical databases are becoming more optimized; not much compute in core SQL
       ✓ Emerging compute-intensive workloads

  26. The Rise of Machine Learning
     ▪ Databases are adopting new ways of analyzing data
       ▪ SAP HANA, Oracle, SQL Server, etc.
     ▪ Specialized hardware can help with both model building [Kara18] and inference [Owaida18]
     ▪ Benefits for “classical” algorithms as well
     [Kara18] Kara et al.: ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation. PVLDB 12(4): 348-361 (2018)
     [Owaida18] Owaida et al.: Application Partitioning on FPGA Clusters: Inference over Decision Tree Ensembles. FPL 2018: 295-300

  27. doppioDB: a Hybrid Database Engine
     ▪ Goal: extend the capabilities of analytical databases
     ▪ The FPGA works on the same data as software (cache-coherent access): no data copy, transformation, partitioning, etc.
     ▪ Can combine SW and HW operators inside the same query
     ▪ Challenge: ensure high utilization of the FPGA and use it in many queries
     [Figure: database engine (MonetDB) with software operators on the CPU and hardware operators on an FPGA co-processor, both accessing DRAM (DB Tables)]

  28. K-means: Algorithm
     ◼ Goal: partition unlabeled data into several clusters, where the number of clusters is the “k” in k-means
     ◼ Two steps in each iteration:
       ◼ Assignment: assign each data point to the closest centroid according to a distance metric
       ◼ Centroid update: recalculate each centroid by averaging all the data points within its cluster
     ◼ A long process if the data set and the number of iterations are large
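The two steps above can be sketched as plain k-means (Lloyd's algorithm) in pure Python. This is a minimal software reference, assuming squared Euclidean distance as the metric and a caller-supplied list of initial centroids; an empty cluster simply keeps its previous centroid, which is one common choice among several.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], centroids: List[Point], iterations: int) -> List[Point]:
    """Plain k-means (Lloyd's algorithm) with squared Euclidean distance."""
    k, dims = len(centroids), len(points[0])
    for _ in range(iterations):
        # Assignment step: each point goes to its closest centroid.
        sums = [[0.0] * dims for _ in range(k)]
        counts = [0] * k
        for p in points:
            best = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for d in range(dims):
                sums[best][d] += p[d]
            counts[best] += 1
        # Update step: each centroid becomes the mean of its assigned points.
        centroids = [
            tuple(s / counts[c] for s in sums[c]) if counts[c] else centroids[c]
            for c in range(k)
        ]
    return centroids


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeans(pts, [(0.0, 0.0), (10.0, 10.0)], iterations=5))
```

Every iteration touches every point once, which is why runtime grows with both the data size and the number of iterations.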

  29. Design: Execution Walk-Through
     1) Receives the K-Means parameters
     2) Fetches the initial centroids and the data
     3) Calculates the distance between a data point and all the centroids, and assigns the point to the closest centroid
     4) Accumulates data points per cluster and counts how many data points are assigned to each cluster
     5) Collects partial results from each pipeline
     6) Divides the accumulated sums by the counts to obtain the new centroids
     7) Writes back the final results
     [Figure: accelerator pipelines reading from and writing to DRAM (DB Tables), annotated with steps 1 to 7]
     Zhenhao He, David Sidler, Zsolt István, Gustavo Alonso: A Flexible K-Means Operator for Hybrid Databases. FPL 2018
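Steps 3 to 6 of the walk-through can be modeled in software to show why parallel pipelines give the same answer as a single one: each pipeline only produces per-cluster sums and counts, which are merged before the division. This sketch assumes the data is split round-robin across pipelines and runs the "pipelines" sequentially; `pipeline_accumulate` and `iteration_with_pipelines` are hypothetical names, not the operator's real interface.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def pipeline_accumulate(points: Sequence[Point], centroids: Sequence[Point]):
    """Steps 3-4 for one pipeline: assign its points, accumulate per-cluster sums/counts."""
    k, dims = len(centroids), len(centroids[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for p in points:
        best = min(range(k),
                   key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
        for d in range(dims):
            sums[best][d] += p[d]
        counts[best] += 1
    return sums, counts


def iteration_with_pipelines(points: List[Point], centroids: List[Point],
                             num_pipelines: int) -> List[Point]:
    """One k-means iteration with the data split across `num_pipelines` workers."""
    k, dims = len(centroids), len(centroids[0])
    # Data distribution (assumed round-robin here), one slice per pipeline.
    slices = [points[i::num_pipelines] for i in range(num_pipelines)]
    partials = [pipeline_accumulate(s, centroids) for s in slices]
    # Step 5: collect and merge partial results from each pipeline.
    total_sums = [[0.0] * dims for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in partials:
        for c in range(k):
            total_counts[c] += counts[c]
            for d in range(dims):
                total_sums[c][d] += sums[c][d]
    # Step 6: division to obtain the new centroids (an empty cluster keeps its old one).
    return [tuple(s / total_counts[c] for s in total_sums[c]) if total_counts[c]
            else centroids[c] for c in range(k)]
```

Because the partial results are sums and counts rather than means, the merge in step 5 is exact regardless of how the data is split.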

  30. Uses of Parallelism
     ▪ K-Means algorithm
     ▪ The FPGA outperforms several CPU cores
     ▪ Parallelism can be used in two ways, covering more queries:
       ▪ Need to determine K (Elbow method)
       ▪ K is known / centroids known
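When K must be determined, the Elbow method runs k-means once per candidate K and looks for the point where the within-cluster error stops dropping sharply. Those runs are independent, which is what makes them a natural fit for parallel units. The sketch below is a software illustration under stated assumptions: the initialization (first k points) is a hypothetical simplification that is sensitive to input order, and the "elbow" is left for the reader to spot in the returned curve rather than detected automatically.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iterations: int = 10) -> List[Point]:
    centroids = list(points[:k])  # simplistic deterministic initialization
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda c: sq_dist(p, centroids[c]))].append(p)
        centroids = [tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centroids[c]
                     for c, cl in enumerate(clusters)]
    return centroids


def sse(points: List[Point], centroids: List[Point]) -> float:
    """Within-cluster sum of squared distances to the nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def elbow_curve(points: List[Point], k_values: Sequence[int]) -> Dict[int, float]:
    """One independent k-means run per candidate K; these runs could execute in parallel."""
    return {k: sse(points, kmeans(points, k)) for k in k_values}


pts = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0), (0.0, 1.0), (10.0, 11.0), (20.0, 1.0)]
print(elbow_curve(pts, [1, 2, 3, 4]))
```

For these three well-separated groups, the error falls steeply up to K = 3 and barely changes afterwards, so K = 3 is the elbow.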
