CS 839: Design the Next-Generation Database, Lecture 20: OLTP in Cloud



SLIDE 1

CS 839: Design the Next-Generation Database
Lecture 20: OLTP in Cloud

Xiangyao Yu
4/2/2020

SLIDE 2

Discussion Highlights


SmartNIC for join

  • Filtering, hash table, indexing
  • Network traffic scheduling for shuffling (reduce the problem of bursty traffic)
  • Hash table in SmartNIC?
  • Sort in SmartNIC
  • Data partitioning

HW/SW techniques to improve performance of sort-merge join

  • Equivalent performance after removing bottlenecks? (Not necessarily)
  • Hardware acceleration for the sort and merge

Radix join to achieve theoretical maximum performance

  • Communication powered by SmartNICs/RDMA (network scheduling for shuffling)
  • Hash partitioning logic in SmartNIC
SLIDE 3

Today’s Paper


Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases (SIGMOD 2017)

SLIDE 4

Cloud Computing


SLIDE 5

Cloud Computing

Self-managed hardware, self-deployed database: managed by the customer
DB as a Service (DBaaS): managed by the provider

SLIDE 6

Shared Nothing vs. Shared Disk

Shared Nothing: each node bundles CPU, memory, and HDD; nodes communicate over the network
Shared Disk: compute nodes (CPU + memory) access a shared pool of HDDs over the network

SLIDE 7

Cloud Storage Disaggregation

[Figure: compute VMs (CPU + memory) connected over the network to storage nodes (CPU + HDD)]

Storage disaggregation

  • Independent management and scaling of compute and storage
  • Cost reduction

Smartness in Storage

  • Storage nodes contain CPUs for computation

SLIDE 8

Computation Pushdown in Cloud OLTP


Pushdown to cloud storage?

  • Concurrency control
  • Indexing
  • Buffer manager
  • Logging
SLIDE 9

Computation Pushdown in Cloud OLTP


Pushdown to cloud storage?

  • Concurrency control
  • Indexing
  • Buffer manager
  • Logging

Push redo processing into the storage service

SLIDE 10

Aurora – Single Master


SLIDE 11

Quorum-Based Voting Protocol

Data is replicated into V copies
A write must acquire votes from Vw copies
A read must acquire votes from Vr copies

  • Vw + Vw > V => Vw > V / 2 (any two write quorums must overlap)
  • Vr + Vw > V (a read quorum must overlap the latest write quorum)


Copy 1 Copy 2 Copy 3

For three copies: Vw ≥ 2, Vr ≥ 2
For six copies: Vw ≥ 4, Vr ≥ 3
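The minimum quorum sizes follow directly from the two rules above. A minimal sketch (the function name and minimize-Vw-first choice are mine, not the paper's):

```python
import math

def min_quorums(v):
    """Smallest write/read quorums for V copies under the standard rules
    Vw + Vw > V and Vr + Vw > V, minimizing Vw first."""
    vw = math.floor(v / 2) + 1   # Vw > V/2
    vr = v - vw + 1              # Vr + Vw > V
    return vw, vr

print(min_quorums(3))  # (2, 2)
print(min_quorums(6))  # (4, 3)
```

This reproduces the numbers on the slide: (2, 2) for three copies and (4, 3) for six.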

SLIDE 12

3-Way Replication

AZ: Availability zone

  • AZs fail independently

Data is unavailable if one AZ is unavailable and one other copy is unavailable

Copy 1 (AZ A), Copy 2 (AZ B), Copy 3 (AZ C)

SLIDE 13

6-Way Replication

Can write if one AZ fails (four copies remain, Vw = 4)

Can read if one AZ fails and one more node fails (three copies remain, Vr = 3)

  • Allows rebuilding a write quorum by adding an additional replica

Copies 1, 2 (AZ A); Copies 3, 4 (AZ B); Copies 5, 6 (AZ C)

SLIDE 14

Segmented Storage

Availability is determined by

  • MTTF: Mean time to failure
  • MTTR: Mean time to repair

Maximize availability

=> Minimize MTTR (MTTF is hard to reduce)

Segment: 10 GB block; the basic unit of failure and repair
Protection Group (PG): six replicated copies of a segment
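The MTTR argument can be made concrete with back-of-the-envelope numbers (the link speed and MTTF values below are illustrative assumptions, not figures from the paper):

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Repairing a single 10 GB segment over an assumed 10 Gbit/s link:
segment_bits = 10 * 8 * 10**9            # 10 GB in bits
repair_seconds = segment_bits / 10**10   # ~8 seconds at 10 Gbit/s

# Shrinking MTTR raises availability even when MTTF is fixed:
print(availability(1000, 10))   # 0.990...
print(availability(1000, 0.1))  # 0.9999...
```

This is the point of segmentation: a small repair unit makes MTTR tiny, which drives availability up without touching MTTF.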


SLIDE 15

Network IO in MySQL


IO traffic

  • REDO log
  • Binary log
  • Data
  • Double-write
  • Metadata (FRM)

Latency

  • Steps 1, 3, and 5 are sequential and synchronous

SLIDE 16

MySQL vs. Aurora


Aurora: send only REDO log to storage

SLIDE 17

MySQL vs. Aurora – Network IO


SLIDE 18

Storage Node


Only Steps 1 & 2 are in the foreground path

SLIDE 19

Storage Node


Identify gaps in the log
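Gap detection can be sketched as follows (a toy model: LSNs are dense integers here, whereas real LSNs are byte offsets in the log; the function name is mine):

```python
def find_gaps(received_lsns):
    """Return the missing LSN ranges in the sequence of log records a
    storage node has received, as inclusive (start, end) pairs."""
    lsns = sorted(received_lsns)
    gaps = []
    for prev, cur in zip(lsns, lsns[1:]):
        if cur > prev + 1:
            gaps.append((prev + 1, cur - 1))
    return gaps

print(find_gaps([1, 2, 3, 6, 7, 10]))  # [(4, 5), (8, 9)]
```

These are exactly the ranges a node would then request from its peers via gossip.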

SLIDE 20

Storage Node


Gossip with peers to fill gaps

SLIDE 21

Storage Node


Coalesce log records into data pages
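Coalescing applies a page's redo records, in LSN order, on top of the last materialized page image. A toy sketch (records here are raw byte overwrites; real Aurora redo records describe logical page changes):

```python
def coalesce(page, redo_records):
    """Materialize a page by applying redo records (lsn, offset, payload)
    in LSN order, skipping records already reflected in the page image."""
    data = bytearray(page["data"])
    for lsn, offset, payload in sorted(redo_records):
        if lsn <= page["page_lsn"]:
            continue  # already reflected in the page image
        data[offset:offset + len(payload)] = payload
        page["page_lsn"] = lsn
    page["data"] = bytes(data)
    return page

page = {"page_lsn": 0, "data": b"\x00" * 8}
coalesce(page, [(2, 4, b"\xff\xff"), (1, 0, b"\xaa")])
print(page)  # page_lsn == 2, bytes 0 and 4-5 overwritten
```

Because this runs in the background on storage nodes, the database never performs checkpointing or page write-back itself.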

SLIDE 22

Storage Node


Periodically stage log and pages to S3

SLIDE 23

Storage Node


Periodically garbage collect old versions and periodically validate CRC code on pages

* Cyclic redundancy check (CRC) is an error-detecting code
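A minimal illustration of CRC validation, using Python's `zlib.crc32` as a stand-in (the paper does not specify which CRC variant Aurora uses):

```python
import zlib

def checksum(page_bytes):
    """CRC-32 over a page image; recomputed periodically and compared
    against the stored value to detect corruption."""
    return zlib.crc32(page_bytes)

page = b"hello, page"
stored = checksum(page)

# Later, during background validation:
assert checksum(page) == stored            # page is intact
assert checksum(b"hellO, page") != stored  # a bit flip is detected
```

A mismatch means the local copy is corrupt; the node can then repair it from the other replicas in its protection group.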

SLIDE 24

Dirty Evict


[Figure: the DB sends the REDO log to storage; traditional MySQL additionally writes back dirty pages, Aurora does not]

A dirty page can be evicted if all changes in the page have been hardened in the log
Read from storage upon a cache miss
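The eviction rule reduces to a single LSN comparison. A sketch (the variable names are illustrative, not Aurora's):

```python
def can_evict(page_lsn, durable_lsn):
    """A dirty page may be dropped without write-back once every change it
    contains is hardened in the log, i.e. its newest LSN is at or below
    the storage layer's durable LSN."""
    return page_lsn <= durable_lsn

assert can_evict(page_lsn=90, durable_lsn=100)       # safe: log covers the page
assert not can_evict(page_lsn=110, durable_lsn=100)  # unsafe: changes not yet durable
```

On a later cache miss, the storage layer rebuilds the page by coalescing the log, so no data is lost by skipping the write-back.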

SLIDE 25

Read from One Quorum


Copies 1, 2 (AZ A); Copies 3, 4 (AZ B); Copies 5, 6 (AZ C)

A read quorum requires three votes, but the DB server knows which node contains the latest value => a single read from the up-to-date node suffices

SLIDE 26

Replication

If the page is in the replica's local buffer, apply the log record to update the page
Otherwise, discard the log record


[Figure: the primary ships the REDO log both to the storage layer and to the replicas]
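The replica-side rule above can be sketched as follows (field names and the after-image record format are illustrative assumptions):

```python
def on_log_record(buffer_pool, record):
    """Apply a redo record on a replica only if the page is already cached;
    otherwise drop it, since shared storage can materialize the page later."""
    page = buffer_pool.get(record["page_id"])
    if page is None:
        return  # not cached: discard; storage will serve the page on demand
    page["data"] = record["after_image"]
    page["page_lsn"] = record["lsn"]

pool = {7: {"data": b"old", "page_lsn": 1}}
on_log_record(pool, {"page_id": 7, "lsn": 2, "after_image": b"new"})
on_log_record(pool, {"page_id": 9, "lsn": 3, "after_image": b"zzz"})
print(pool)  # page 7 updated; page 9 never added
```

Discarding is safe because replicas share the storage layer with the primary: any page they drop can be read back fully up to date.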

SLIDE 27

Evaluation – Aurora vs. MySQL


SLIDE 28

Evaluation – Varying Data Sizes


Performance drops when data does not fit in main memory

SLIDE 29

Evaluation – Real Customer Workloads


SLIDE 30

Evaluation – Real Customer Workloads


SLIDE 31

Discussion

Quantify the cost of computation in the storage layer
Which fraction of the speedup comes from improving MySQL vs. the new disaggregation design?


SLIDE 32

OLTP in Cloud – Q/A

  • How can the log be considered the database?
  • Mini-transactions (MTR)?
  • Is a transaction not durable when the client receives the ack?
  • Do other big companies have similar offerings?
  • Adoption of log-as-database?
  • Global LSN bottleneck
  • Similar to log shipping
  • Aurora for OLAP?
  • Serverless and multi-master?


SLIDE 33

Group Discussion

Cloud storage and Smart SSD are similar in that both push computation to the data. What do you see as the key differences between the two?

The initial version of Aurora (i.e., the one presented in this paper) supports only a single master. What are the challenges of moving to a multi-master setting?

Can you think of other applications that can benefit from a smart and disaggregated storage service in the cloud?


SLIDE 34

Before Next Lecture

Submit discussion summary to https://wisc-cs839-ngdb20.hotcrp.com

  • Deadline: Friday 11:59pm

Submit review for

  • Choosing A Cloud DBMS: Architectures and Tradeoffs
  • [optional] Amazon Redshift and the Case for Simpler Data Warehouses
