DATABASES IN THE CLOUD @andy_pavlo CMU-Q 15-440 December 3rd, 2014


slide-1
SLIDE 1

DATABASES IN THE CLOUD

@andy_pavlo CMU-Q 15-440 December 3rd, 2014
slide-2
SLIDE 2

OLTP vs. OLAP databases.

Source: https://www.flickr.com/photos/adesigna/3237575990
slide-3
SLIDE 3

On-line Transaction Processing

  • Fast operations that ingest new data and then update state using ACID transactions.
  • Only access a small amount of data.
  • Volume: 1k to 1m txn/sec
  • Latency: >1-50 ms
  • Database Size: 100s GB to 10s TB
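
The kind of operation listed on this slide can be sketched with Python's built-in sqlite3 module. The tables `player_state` and `events` and the function `record_level_complete` are hypothetical names chosen for illustration and do not come from the lecture. The sketch only shows one short transaction that ingests a new event and updates a small amount of state atomically.

```python
import sqlite3

def record_level_complete(conn, player_id, points):
    """Ingest one event and update the player's state in a single transaction."""
    # Using the connection as a context manager commits on success
    # and rolls back if an exception is raised inside the block.
    with conn:
        conn.execute(
            "INSERT INTO events (player_id, points) VALUES (?, ?)",
            (player_id, points),
        )
        cur = conn.execute(
            "UPDATE player_state SET score = score + ?, level = level + 1 "
            "WHERE player_id = ?",
            (points, player_id),
        )
        if cur.rowcount != 1:
            # Unknown player: raising here rolls back the INSERT as well.
            raise ValueError(f"unknown player {player_id}")

def make_db():
    """Build a tiny in-memory database with one player (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE player_state "
        "(player_id INTEGER PRIMARY KEY, score INTEGER, level INTEGER)"
    )
    conn.execute("CREATE TABLE events (player_id INTEGER, points INTEGER)")
    conn.execute("INSERT INTO player_state VALUES (1, 0, 1)")
    conn.commit()
    return conn
```

Calling `record_level_complete(make_db(), 1, 50)` touches two rows in total, which matches the "small amount of data" profile of an OLTP operation.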
slide-4
SLIDE 4

Example

  • On-line game in the OLTP database.

Diagram labels: Game Application Framework, Click Stream, Game Updates, OLTP DBMS. Pre-computed model decides the next level the player is shown.

slide-5
SLIDE 5

Example

  • On-line game in the OLTP database.

Diagram labels: Game Application Framework, Click Stream, Game Updates, OLTP DBMS, Real-time Monitoring.

slide-6
SLIDE 6

Data Warehouses

  • Complete history of OLTP databases.
  • Complex queries that analyze large segments of fact tables and combine them with dimension tables.
  • Volume: A couple queries per second
  • Latency: 1-60 seconds
  • Database Size: 100s TB to 10s PB
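
A query of the shape described on this slide can be sketched with Python's built-in sqlite3 module. The star schema here (`play_fact` as the fact table, `level_dim` as the dimension table) is a made-up example in the spirit of the game scenario, not a schema from the lecture, and a real warehouse would hold far more rows.

```python
import sqlite3

def avg_seconds_by_difficulty(conn):
    """Scan the fact table, join it with a dimension table, and aggregate."""
    query = """
        SELECT d.difficulty, AVG(f.seconds_played) AS avg_seconds
        FROM play_fact AS f
        JOIN level_dim AS d ON d.level_id = f.level_id
        GROUP BY d.difficulty
        ORDER BY d.difficulty
    """
    return conn.execute(query).fetchall()

def make_warehouse():
    """Build a tiny in-memory star schema (hypothetical tables and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE level_dim (level_id INTEGER PRIMARY KEY, difficulty TEXT)")
    conn.execute(
        "CREATE TABLE play_fact (player_id INTEGER, level_id INTEGER, seconds_played REAL)"
    )
    conn.executemany("INSERT INTO level_dim VALUES (?, ?)", [(1, "easy"), (2, "hard")])
    conn.executemany(
        "INSERT INTO play_fact VALUES (?, ?, ?)",
        [(10, 1, 30.0), (11, 1, 50.0), (10, 2, 120.0)],
    )
    conn.commit()
    return conn
```

Unlike an OLTP transaction, this query reads every row of the fact table, which is why latency is measured in seconds rather than milliseconds.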
slide-7
SLIDE 7

Example

  • Compute model used to guide OLTP DBMS decisions from historical data.

Diagram labels: Game Application Framework, Click Stream, Game Updates, OLTP DBMS, ETL, OLAP DBMS, New Model.

slide-8
SLIDE 8

OLTP vs. OLAP

  • Storage Format:
    – OLTP → Row-oriented
    – OLAP → Column-oriented
  • Primary Database Location:
    – OLTP → In-Memory
    – OLAP → Disks
  • Workloads:
    – OLTP → Write-Heavy
    – OLAP → Read-Only
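
The storage-format contrast above can be illustrated with plain Python data structures. This is only a sketch: the `orders` records and the helper names are hypothetical, and real systems lay out bytes on disk or in memory rather than using lists and dictionaries.

```python
# Three records of a hypothetical orders table, stored row by row.
rows = [
    {"o_id": 1, "o_date": "2013-05-01", "o_amount": 20.0},
    {"o_id": 2, "o_date": "2014-01-15", "o_amount": 35.0},
    {"o_id": 3, "o_date": "2014-07-30", "o_amount": 45.0},
]

def to_columns(rows):
    """Convert a row-oriented table (list of records) into a column-oriented one."""
    return {name: [row[name] for row in rows] for name in rows[0]}

def insert_row(rows, record):
    """Row store: a write appends one contiguous record."""
    rows.append(record)

def average(columns, name):
    """Column store: an aggregate reads only the one column it needs."""
    values = columns[name]
    return sum(values) / len(values)

columns = to_columns(rows)
```

A write-heavy workload favors `insert_row`, which touches one record, while a read-only analytical workload favors `average`, which never looks at `o_id` or `o_date`.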
slide-9
SLIDE 9

Things to consider with databases in the cloud.

Source: https://www.flickr.com/photos/arvidnn/15285491335
slide-10
SLIDE 10

Good Things

  • Better Resource Utilization
  • Elastic Scaling
  • Database-as-a-Service Offerings
slide-11
SLIDE 11

Better Resource Utilization

  • Combine multiple silos onto overprovisioned resources.
  • Public platform providers achieve better economies of scale.
  • Database machines are (mostly) dead.
  • Optimal multi-tenant placement is a difficult problem.
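
Because optimal placement is hard, systems typically fall back on heuristics. The sketch below uses first-fit decreasing, a standard bin-packing heuristic, purely as an illustration. It is not the method of any particular provider, the function name `place_tenants` is hypothetical, and it assumes each tenant's demand is a single number, which is a strong simplification.

```python
def place_tenants(tenant_loads, capacity):
    """First-fit decreasing: a bin-packing heuristic, not an optimal placement.

    tenant_loads maps a tenant name to its resource demand (arbitrary units).
    Returns a list of servers, each a list of tenant names.
    """
    servers = []  # tenant names per server
    free = []     # remaining capacity per server
    # Place the largest tenants first; ties are broken by name for determinism.
    for name, load in sorted(tenant_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        if load > capacity:
            raise ValueError(f"tenant {name} does not fit on any server")
        for i, remaining in enumerate(free):
            if load <= remaining:
                servers[i].append(name)
                free[i] -= load
                break
        else:
            # No existing server has room, so provision a new one.
            servers.append([name])
            free.append(capacity - load)
    return servers
```

For example, `place_tenants({"a": 5, "b": 4, "c": 3, "d": 2, "e": 2}, capacity=8)` packs the five tenants onto two servers.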
slide-12
SLIDE 12

Elastic Scaling

  • Automatically provision new resources on the fly as needed.
  • Scaling up vs. Scaling out.
  • Difficult for OLTP DBMS to continue processing transactions while data migrates.
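
One way to see why scale-out implies data migration is to count how many keys change owner when a node is added. The sketch assumes integer keys placed by a naive `key mod n` rule; that rule and the function name `fraction_moved` are assumptions for illustration, not how any specific DBMS partitions data.

```python
def fraction_moved(keys, old_nodes, new_nodes):
    """Fraction of keys whose owner changes under naive `key mod n` placement."""
    moved = sum(1 for k in keys if k % old_nodes != k % new_nodes)
    return moved / len(keys)

share = fraction_moved(range(12_000), old_nodes=3, new_nodes=4)
print(f"{share:.0%} of keys change nodes")  # prints "75% of keys change nodes"
```

Under this naive scheme most of the database moves when going from 3 to 4 nodes, and the DBMS has to keep serving transactions on those keys while they are in flight.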
slide-13
SLIDE 13

OLTP Scale-out Example

Figure: TPC-C Benchmark on H-Store (Fall 2014), scaling from 3 to 4 nodes. Axis: Elapsed Time.

E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing. R. Taft, E. Mansour, M. Serafini, J. Duggan, A. J. Elmore, A. Aboulnaga, A. Pavlo, M. Stonebraker. Proceedings of the VLDB Endowment, vol. 8, iss. 3, pp. 245-256, November 2014.
slide-14
SLIDE 14

Database-as-a-Service

  • Cloud provider manages physical configuration of a DBMS.
  • Ideal for applications that are co-located in
  • Combine private data with curated databases (i.e., data marts)
slide-15
SLIDE 15

Bad Things

  • I/O Virtualization
  • File System Replication
  • Security + Privacy Concerns
  • Performance Variance
slide-16
SLIDE 16

I/O Virtualization

  • Distributed file system stores data transparently across multiple nodes.
  • This causes a DBMS to pull data to the query instead of pushing the query to the data.
slide-17
SLIDE 17

OLAP I/O Virtualization

SELECT YEAR(o_date) AS o_year, AVG(o_amount)
  FROM orders
 GROUP BY o_year
 ORDER BY o_year ASC

Diagram: OLAP DBMS and Distributed Filesystem, with the transfer between them labeled "Terabytes!"

slide-18
SLIDE 18

OLAP I/O Virtualization

SELECT YEAR(o_date) AS o_year, AVG(o_amount)
  FROM orders
 GROUP BY o_year
 ORDER BY o_year ASC

Diagram: OLAP DBMS and Distributed Filesystem, with the transfer between them labeled "Bytes!"
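
The difference between the two diagrams can be sketched in Python. The shards, data values, and function names below are hypothetical. The sketch runs the query from the slide both ways and counts how many items have to travel from the storage nodes to the DBMS in each case.

```python
from collections import defaultdict

# Each storage node holds a shard of a hypothetical orders table: (o_date, o_amount).
shards = [
    [("2013-02-01", 10.0), ("2013-06-10", 30.0), ("2014-03-05", 30.0), ("2014-04-01", 50.0)],
    [("2013-11-20", 20.0), ("2014-08-09", 40.0), ("2014-09-09", 40.0), ("2014-12-24", 40.0)],
]

def pull_data_to_query(shards):
    """Ship every row to the DBMS, then aggregate there."""
    rows = [row for shard in shards for row in shard]  # all rows cross the network
    sums, counts = defaultdict(float), defaultdict(int)
    for o_date, amount in rows:
        year = int(o_date[:4])
        sums[year] += amount
        counts[year] += 1
    return sorted((y, sums[y] / counts[y]) for y in sums), len(rows)

def partial_aggregate(shard):
    """Runs on the storage node: one (sum, count) pair per year."""
    partial = {}
    for o_date, amount in shard:
        year = int(o_date[:4])
        s, c = partial.get(year, (0.0, 0))
        partial[year] = (s + amount, c + 1)
    return partial

def push_query_to_data(shards):
    """Ship only per-year partial aggregates to the DBMS, then merge them."""
    partials = [partial_aggregate(shard) for shard in shards]
    sums, counts = defaultdict(float), defaultdict(int)
    for partial in partials:
        for year, (s, c) in partial.items():
            # AVG must be merged from sums and counts, not from per-node averages.
            sums[year] += s
            counts[year] += c
    shipped = sum(len(p) for p in partials)
    return sorted((y, sums[y] / counts[y]) for y in sums), shipped
```

Both functions return the same averages, but the pull version ships every row while the push version ships one small pair per year per node, and that gap grows with table size.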

slide-19
SLIDE 19

File System Replication

  • The DBMS should not rely on file system replication for durability.
  • OLTP systems maintain replicas in-memory.
  • OLAP systems can store copies of tables in different ways on replica nodes.
slide-20
SLIDE 20

OLAP Replication

Diagram: an OLAP DBMS with two replicas that store the same tables in different sort orders.
Replica #1 sort order: Table 1 by name, Table 2 by name.
Replica #2 sort order: Table 1 by id, Table 2 by id.

slide-21
SLIDE 21

OLAP Replication

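Diagram: an OLAP DBMS with two replicas that store the same tables in different sort orders.
Replica #1 sort order: Table 1 by name, Table 2 by name.
Replica #2 sort order: Table 1 by id, Table 2 by id.
Labels added on this slide: Table1.name, Table2.name.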
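
Keeping replicas in different sort orders lets a query run against whichever copy suits it best. The sketch below is one possible reading of the diagram, not the lecture's implementation: the `Replica` class, `choose_replica`, and the sample rows are all hypothetical.

```python
import bisect

class Replica:
    """One copy of a table, kept sorted on a single column."""

    def __init__(self, rows, sort_column):
        self.sort_column = sort_column
        self.rows = sorted(rows, key=lambda r: r[sort_column])
        self.keys = [r[sort_column] for r in self.rows]

    def lookup(self, value):
        """Binary search; only valid on the column this replica is sorted by."""
        lo = bisect.bisect_left(self.keys, value)
        hi = bisect.bisect_right(self.keys, value)
        return self.rows[lo:hi]

def choose_replica(replicas, column):
    """Prefer the replica whose sort order matches the predicate column."""
    for replica in replicas:
        if replica.sort_column == column:
            return replica
    return None  # no match: the caller would fall back to a full scan

rows = [
    {"id": 3, "name": "carol"},
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]
replicas = [Replica(rows, "name"), Replica(rows, "id")]
```

A predicate on `name` would be routed to the first replica and a predicate on `id` to the second, while both copies still provide redundancy for the same data.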

slide-22
SLIDE 22

Security + Privacy Concerns

  • No truly encrypted solution exists.
  • Many companies are unable to use public cloud platforms.
slide-23
SLIDE 23

Performance Variance

  • DBMSs are sensitive to changes in underlying hardware performance.
  • large fluctuations in performance.
slide-24
SLIDE 24

OLTP Performance Variance

Figure: YCSB on MySQL (Winter 2012), Medium EC2 Instances. Annotation: 35% Difference.

OLTP-Bench: An Extensible Testbed for Benchmarking Relational Databases. Djellel Eddine Difallah, Andrew Pavlo, Carlo Curino, Philippe Cudre-Mauroux. Proceedings of the VLDB Endowment, vol. 7, pp. 277-288, December 2013.
slide-25
SLIDE 25

Cloud database vendors.

Source: https://www.flickr.com/photos/alestra/8891585632
slide-26
SLIDE 26

Important Features

  • Automatic Back-ups
  • Geo-replication
  • Elasticity / Live Reconfiguration
  • Efficient Multi-Tenancy
  • Workload Awareness
slide-27
SLIDE 27

Cloud Database Vendors

  • Cloud-friendly systems
  • Database-as-a-Service (DBaaS)
slide-28
SLIDE 28

Cloud-friendly DBMSs

  • Most DBMS vendors make it easy to deploy on cloud platforms.
  • Others provide support for easy scale-out in a cloud environment.
  • More than just pre-configured instances.
slide-29
SLIDE 29

OLTP DBaaS

  • Amazon RDS / Aurora
  • Microsoft Azure
  • Google Cloud SQL
  • Database.com
  • ClearDB
  • GenieDB
  • Clustrix
slide-30
SLIDE 30

OLAP DBaaS

  • Amazon Redshift
  • Google BigQuery
  • Microsoft Azure
  • Snowflake
slide-31
SLIDE 31

Parting Thoughts

  • The cloud does not magically make database problems go away.
  • DBMS on the cloud.
  • AFAIK, there is no truly autonomous DBMS as of yet.
slide-32
SLIDE 32
slide-33
SLIDE 33

END

@andy_pavlo