DATABASES IN THE CLOUD @andy_pavlo CMU-Q 15-440 December 3rd, 2014


  1. DATABASES IN THE CLOUD @andy_pavlo CMU-Q 15-440 December 3rd, 2014

  2. OLTP vs. OLAP databases. Source: https://www.flickr.com/photos/adesigna/3237575990

  3. On-line Transaction Processing
     • Fast operations that ingest new data and then update state using ACID transactions.
     • Only access a small amount of data.
     • Volume: 1k to 1m txn/sec
     • Latency: >1-50 ms
     • Database Size: 100s of GB to 10s of TB
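
A minimal sketch of the kind of operation described above: a short ACID transaction that touches only a couple of rows. The `players` table, the `gold` column, and the `transfer_gold` helper are hypothetical names chosen for illustration, and SQLite stands in for a production OLTP DBMS.

```python
import sqlite3


def transfer_gold(conn, src, dst, amount):
    """Move `amount` gold between two players atomically (hypothetical schema)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE players SET gold = gold - ? WHERE id = ?", (amount, src)
            )
            conn.execute(
                "UPDATE players SET gold = gold + ? WHERE id = ?", (amount, dst)
            )
            (balance,) = conn.execute(
                "SELECT gold FROM players WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Abort: neither UPDATE becomes visible.
                raise ValueError("insufficient gold")
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, gold INTEGER NOT NULL)")
conn.executemany("INSERT INTO players VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok = transfer_gold(conn, 1, 2, 30)         # succeeds and commits
rejected = transfer_gold(conn, 1, 2, 500)  # would overdraw, so it rolls back
balances = dict(conn.execute("SELECT id, gold FROM players"))
```

Each call reads and writes two rows, which matches the "small amount of data" profile of an OLTP workload; the rejected transfer leaves both balances untouched.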

  4. Example
     • [...] on-line game in the OLTP database.
     • Pre-computed model decides the next level the player is shown.
     [Diagram: Click Stream, Game Application Framework, OLTP DBMS, Game Updates]

  5. Example
     • [...] on-line game in the OLTP database.
     [Diagram: Click Stream, Game Application Framework, OLTP DBMS, Real-time Monitoring, Game Updates]

  6. Data Warehouses
     • Complete history of OLTP databases.
     • Complex queries that analyze large segments of fact tables and combine them with dimension tables.
     • Volume: A couple of queries per second
     • Latency: 1-60 seconds
     • Database Size: 100s of TB to 10s of PB
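
A small sketch of a warehouse-style query that scans a fact table and combines it with a dimension table. The `fact_plays` and `dim_level` tables and their columns are assumptions made up for this example, and SQLite is used only because it ships with Python; a real warehouse would run the same shape of query on a column-oriented OLAP DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension table: one descriptive row per level (hypothetical).
    CREATE TABLE dim_level (level_id INTEGER PRIMARY KEY, difficulty TEXT);
    -- Fact table: one row per recorded play (hypothetical).
    CREATE TABLE fact_plays (player_id INTEGER, level_id INTEGER, seconds INTEGER);
    INSERT INTO dim_level VALUES (1, 'easy'), (2, 'hard');
    INSERT INTO fact_plays VALUES (10, 1, 30), (11, 1, 50), (10, 2, 120), (12, 2, 80);
    """
)

# Scan the whole fact table, join it to the dimension, and aggregate.
rows = conn.execute(
    """
    SELECT d.difficulty, COUNT(*) AS plays, AVG(f.seconds) AS avg_seconds
      FROM fact_plays AS f
      JOIN dim_level AS d ON f.level_id = d.level_id
     GROUP BY d.difficulty
     ORDER BY d.difficulty
    """
).fetchall()
```

Unlike an OLTP transaction, this query reads every fact row and writes nothing, which is why latencies of seconds are acceptable here.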

  7. Example
     • Compute model used to guide OLTP DBMS decisions from historical data.
     [Diagram: Click Stream, Game Application Framework, OLTP DBMS, ETL, OLAP DBMS, New Model, Game Updates]

  8. OLTP vs. OLAP
     • Storage Format:
       – OLTP → Row-oriented
       – OLAP → Column-oriented
     • Primary Database Location:
       – OLTP → In-Memory
       – OLAP → Disks
     • Workloads:
       – OLTP → Write-Heavy
       – OLAP → Read-Only
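
To make the storage-format contrast concrete, here is a toy sketch of the same data held row-wise and column-wise. The record fields and the helper names `update_score` and `average_score` are illustrative assumptions, not part of any DBMS.

```python
# Row-oriented: each record is stored together, so reading or updating
# one whole record touches one contiguous item.
rows = [
    {"id": 1, "player": "ann", "score": 10},
    {"id": 2, "player": "bob", "score": 30},
    {"id": 3, "player": "cy", "score": 20},
]

# Column-oriented: each attribute is stored together, so an aggregate
# over one attribute reads only that column. Built here from a snapshot
# of the rows, loosely mimicking a load into an OLAP system.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def update_score(rows, player_id, delta):
    """OLTP-style write: find one record and change it in place."""
    for row in rows:
        if row["id"] == player_id:
            row["score"] += delta
            return row
    raise KeyError(player_id)


def average_score(columns):
    """OLAP-style read: scan a single column, ignoring the others."""
    scores = columns["score"]
    return sum(scores) / len(scores)


updated = update_score(rows, 2, 5)
avg = average_score(columns)  # computed over the earlier snapshot
```

The write path never looks at other records, and the analytic path never looks at the `id` or `player` columns, which is the intuition behind pairing each layout with its workload.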

  9. Things to consider with databases in the cloud. Source: https://www.flickr.com/photos/arvidnn/15285491335

  10. Good Things
      • Better Resource Utilization
      • Elastic Scaling
      • Database-as-a-Service Offerings

  11. Better Resource Utilization
      • Combine multiple silos onto overprovisioned resources.
      • Public platform providers achieve better economies of scale.
      • Database machines are (mostly) dead.
      • Optimal multi-tenant placement is a difficult problem.

  12. Elastic Scaling
      • Automatically provision new resources on the fly as needed.
      • Scaling up vs. scaling out.
      • Difficult for OLTP DBMS to continue processing transactions while data migrates.

  13. OLTP Scale-out Example
      [Figure: TPC-C Benchmark on H-Store (Fall 2014), scaling from 3 to 4 nodes, plotted against elapsed time]
      R. Taft, E. Mansour, M. Serafini, J. Duggan, A. J. Elmore, A. Aboulnaga, A. Pavlo, M. Stonebraker. E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing. Proceedings of the VLDB Endowment, vol. 8, iss. 3, pages 245-256, November 2014.
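
A rough sketch of why scale-out forces data to migrate, and how the partitioning scheme changes how much moves when going from 3 to 4 nodes. This is not the E-Store algorithm; it compares two generic textbook schemes (modulo hashing and a consistent-hash ring). The node names, key names, and the choice of 100 virtual nodes are all assumptions for illustration.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Stable hash so the result does not depend on Python's hash seed."""
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)


def modulo_owner(key: str, n_nodes: int) -> int:
    """Naive placement: node index is hash(key) mod number of nodes."""
    return h(key) % n_nodes


class Ring:
    """Consistent-hash ring with virtual nodes (generic textbook scheme)."""

    def __init__(self, nodes, vnodes=100):
        self._points = sorted(
            (h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [point for point, _ in self._points]

    def owner(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._hashes, h(key)) % len(self._points)
        return self._points[idx][1]


keys = [f"warehouse-{i}" for i in range(10_000)]
old_nodes = ["n0", "n1", "n2"]
new_nodes = old_nodes + ["n3"]

# Fraction of keys whose owner changes under modulo placement.
moved_modulo = sum(modulo_owner(k, 3) != modulo_owner(k, 4) for k in keys) / len(keys)

# Fraction of keys whose owner changes on the ring.
old_ring, new_ring = Ring(old_nodes), Ring(new_nodes)
moved = [k for k in keys if old_ring.owner(k) != new_ring.owner(k)]
moved_ring = len(moved) / len(keys)
```

With modulo placement, roughly three quarters of the keys change owner, while on the ring only keys claimed by the new node move. Either way, the DBMS still has to keep executing transactions against data that is in flight, which is the hard part the slides point to.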

  14. Database-as-a-Service
      • Cloud provider manages physical configuration of a DBMS.
      • Ideal for applications that are co-located in [...]
      • Combine private data with curated databases (i.e., data marts)

  15. Bad Things
      • I/O Virtualization
      • File System Replication
      • Security + Privacy Concerns
      • Performance Variance

  16. I/O Virtualization
      • Distributed file system stores data transparently across multiple nodes.
      • This causes a DBMS to "pull data to query" instead of "push query to data".

  17. OLAP I/O Virtualization
      SELECT YEAR(o_date) AS o_year, AVG(o_amount)
        FROM orders
       GROUP BY o_year
       ORDER BY o_year ASC
      [Diagram: Distributed Filesystem and OLAP DBMS, transfer labeled "Terabytes!"]

  18. OLAP I/O Virtualization
      SELECT YEAR(o_date) AS o_year, AVG(o_amount)
        FROM orders
       GROUP BY o_year
       ORDER BY o_year ASC
      [Diagram: Distributed Filesystem and OLAP DBMS, transfer labeled "Bytes!"]
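
The difference between the two slides can be sketched by counting how many records cross the network for the average-per-year query. The partition layout, the synthetic rows, and the function names below are assumptions for illustration; real systems ship bytes, not Python tuples.

```python
from collections import defaultdict


def make_partition(n_rows):
    """Synthetic (o_date, o_amount) rows held by one storage node."""
    return [(f"{2012 + i % 3}-01-01", float(i % 10)) for i in range(n_rows)]


partitions = [make_partition(1000) for _ in range(4)]


def year(o_date):
    return int(o_date[:4])


def aggregate(rows):
    """Return {year: [sum, count]} for an iterable of (o_date, o_amount)."""
    acc = defaultdict(lambda: [0.0, 0])
    for o_date, o_amount in rows:
        entry = acc[year(o_date)]
        entry[0] += o_amount
        entry[1] += 1
    return acc


def finalize(acc):
    return {y: total / count for y, (total, count) in sorted(acc.items())}


def pull_data_to_query(partitions):
    # Every row is shipped to the DBMS node before any work happens.
    shipped = [row for part in partitions for row in part]
    return finalize(aggregate(shipped)), len(shipped)


def push_query_to_data(partitions):
    # Each storage node aggregates locally and ships only partial sums.
    partials = [aggregate(part) for part in partitions]
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for y, (total, count) in partial.items():
            merged[y][0] += total
            merged[y][1] += count
    shipped = sum(len(partial) for partial in partials)
    return finalize(merged), shipped


pulled, pulled_records = pull_data_to_query(partitions)
pushed, pushed_records = push_query_to_data(partitions)
```

Both strategies return the same averages, but the first ships every row while the second ships one (sum, count) pair per year per node. AVG has to be shipped as a sum and a count, because averaging the per-node averages would be wrong when partitions differ in size.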

  19. File System Replication
      • The DBMS should not rely on file system replication for durability.
      • OLTP systems maintain replicas in-memory.
      • OLAP systems can store copies of tables in different ways on replica nodes.

  20. OLAP Replication
      [Diagram: OLAP DBMS with two replicas]
      Replica #1 sort order: Table 1: name; Table 2: name
      Replica #2 sort order: Table 1: id; Table 2: id

  21. OLAP Replication
      [Diagram: OLAP DBMS with two replicas, query Table1.name ⨝ Table2.name]
      Replica #1 sort order: Table 1: name; Table 2: name
      Replica #2 sort order: Table 1: id; Table 2: id
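
One way to read the diagram above: a join on `name` can be routed to the replica that already stores both tables sorted by `name`, so a merge join needs no extra sort. The sketch below assumes that interpretation; the sample rows and the `pick_replica` and `merge_join` helpers are hypothetical.

```python
def merge_join(left, right, key):
    """Join two lists of dicts that are already sorted by `key`."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, then pair it with
            # every matching row on the left.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                out.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


table1 = [{"id": 1, "name": "carol"}, {"id": 2, "name": "alice"}, {"id": 3, "name": "bob"}]
table2 = [{"id": 1, "name": "bob"}, {"id": 2, "name": "dave"}, {"id": 3, "name": "alice"}]


def make_replica(sort_key):
    """Store a copy of both tables sorted by `sort_key`."""
    by_key = lambda row: row[sort_key]
    return {
        "sort_key": sort_key,
        "table1": sorted(table1, key=by_key),
        "table2": sorted(table2, key=by_key),
    }


replicas = {"replica1": make_replica("name"), "replica2": make_replica("id")}


def pick_replica(replicas, join_key):
    """Choose a replica whose sort order matches the join key."""
    return next(n for n, r in replicas.items() if r["sort_key"] == join_key)


chosen = pick_replica(replicas, "name")
pairs = merge_join(replicas[chosen]["table1"], replicas[chosen]["table2"], "name")
matched_ids = [(left["id"], right["id"]) for left, right in pairs]
```

Both replicas hold the same rows, so either can serve any query; the different sort orders only change which queries each one answers cheaply.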

  22. Security + Privacy Concerns
      • No truly encrypted solution exists.
      • Many companies are unable to use public cloud platforms.

  23. Performance Variance
      • DBMSs are sensitive to changes in underlying hardware performance.
      • [...] large fluctuations in performance.

  24. OLTP Performance Variance
      [Figure: YCSB on MySQL (Winter 2012), Medium EC2 Instances, 35% difference]
      Djellel Eddine Difallah, Andrew Pavlo, Carlo Curino, Philippe Cudre-Mauroux. OLTP-Bench: An Extensible Testbed for Benchmarking Relational Databases. Proceedings of the VLDB Endowment, vol. 7, pages 277-288, December 2013.

  25. Cloud database vendors. Source: https://www.flickr.com/photos/alestra/8891585632

  26. Important Features
      • Automatic Back-ups
      • Geo-replication
      • Elasticity / Live Reconfiguration
      • Efficient Multi-Tenancy
      • Workload Awareness

  27. Cloud Database Vendors
      • Cloud-friendly systems
      • Database-as-a-Service (DBaaS)

  28. Cloud-friendly DBMSs
      • Most DBMS vendors make it easy to deploy on cloud platforms.
      • Others provide support for easy scale-out in a cloud environment.
      • More than just pre-configured instances.

  29. OLTP DBaaS
      • Amazon RDS / Aurora
      • Microsoft Azure
      • Google Cloud SQL
      • Database.com
      • ClearDB
      • GenieDB
      • Clustrix

  30. OLAP DBaaS
      • Amazon Redshift
      • Google BigQuery
      • Microsoft Azure
      • Snowflake

  31. Parting Thoughts
      • The cloud does not magically make database problems go away.
      • [...] DBMS on the cloud.
      • AFAIK, there is no truly autonomous DBMS as of yet.

  33. END @andy_pavlo