cs 839 design the next generation database lecture 20
play

CS 839: Design the Next-Generation Database Lecture 20: OLTP in - PowerPoint PPT Presentation

CS 839: Design the Next-Generation Database Lecture 20: OLTP in Cloud Xiangyao Yu 4/2/2020 1 Discussion Highlights SmartNIC for join Filtering, hash table, indexing Network traffic scheduling for shuffling (reduce the problem of bursty


  1. CS 839: Design the Next-Generation Database Lecture 20: OLTP in Cloud Xiangyao Yu 4/2/2020 1

  2. Discussion Highlights SmartNIC for join • Filtering, hash table, indexing • Network traffic scheduling for shuffling (reduce the problem of bursty traffic) • Hash table in SmartNIC? • Sort in SmartNIC • Data partitioning HW/SW techniques to improve performance of sort-merge join • Equivalent performance after removing bottlenecks? (Not necessarily) • Hardware acceleration for the sort and merge Radix join to achieve theoretical maximum performance • Communication powered by SmartNICs/RDMA (network scheduling for shuffling) • Hash partitioning logic in SmartNIC 2

  3. Today’s Paper SIGMOD 2017 3

  4. Cloud Computing 4

  5. Cloud Computing Managed by customer Managed by provider Self-manage Hardware Self-deploy database DB as a Service (DBaaS)

  6. Shared Nothing vs. Shared Disk CPU CPU CPU CPU CPU CPU Memory Memory Memory Memory Memory Memory Network HDD HDD HDD … HDD HDD HDD HDD Network Shared Nothing Shared Disk 6

  7. Cloud Storage Disaggregation VM VM VM Storage disaggregation CPU CPU CPU • Independent management and scaling of compute and Mem Mem Mem storage • Cost reduction Network Smartness in Storage CPU CPU CPU CPU CPU • Storage nodes contain CPUs for computation HDD HDD HDD HDD HDD VM VM VM VM VM 7

  8. Computation Pushdown in Cloud OLTP Pushdown to cloud storage? • Concurrency control • Indexing • Buffer manager • Logging 8

  9. Computation Pushdown in Cloud OLTP Pushdown to cloud storage? • Concurrency control • Indexing • Buffer manager • Logging Push redo processing into the storage service 9

  10. Aurora – Single Master 10

  11. Quorum-Based Voting Protocol Data replicated into V copies A write must acquire votes from V w copies A read must acquire votes from V r copes V w + V w > V => V w > V / 2 For three copies V w ≥ 2 V r + V w > V V r ≥ 2 For six copies V w ≥ 4 V r ≥ 3 Copy 1 Copy 2 Copy 3 11

  12. 3-Way Replication AZ A AZ B AZ C Copy 1 Copy 2 Copy 3 AZ: Availability zone • AZs fail independently Data is unavailable if one AZ is unavailable and one other copy is unavailable 12

  13. 6-Way Replication AZ A AZ B AZ C Copies 1, 2 Copies 3, 4 Copies 5, 6 Can read if one AZ fails and one more node fails • Allow to rebuild a write quorum by adding additional replica Can write if one AZ fails 13

  14. Segmented Storage Availability is determined by • MTTF: Mean time to failure • MTTR: Mean time to repair Maximize availability => Minimize MTTR (MTTF is hard to reduce) Segment : 10 GB block. Basic unit of failure and repair Protection Group (PG) : Six replication copies of a segment 14

  15. Network IO in MySQL IO traffic • REDO Log • Binary log • Data • Double-write • metadata (FRM) Latency • Steps 1, 3, and 5 are sequential and synchronous 15

  16. MySQL vs. Aurora Aurora: send only REDO log to storage 16

  17. MySQL vs. Aurora – Network IO 17

  18. Storage Node Only Steps 1 & 2 are in the foreground path 18

  19. Storage Node Identify gaps in the log 19

  20. Storage Node Gossip with peers to fill gaps 20

  21. Storage Node Coalesce log records into data pages 21

  22. Storage Node Periodically stage log and pages to S3 22

  23. Storage Node Periodically garbage collect old versions and periodically validate CRC code on pages * Cyclic redundancy check (CRC) is an error-detecting code 23

  24. Dirty Evict DB DB P P REDO REDO Dirty Write Back Log Log Storage Storage P P A dirty page can be evicted if all changes in the page have been hardened in the log Read from storage upon a cache miss 24

  25. Read from One Quorum AZ A AZ B AZ C Copies 1, 2 Copies 3, 4 Copies 5, 6 Three votes to read data The DB server knows which node contains the latest value => A single read from the update-to-date node 25

  26. Replication REDO Log Primary Replica REDO Log Storage Layer If page is in replica’s local buffer, update the page Otherwise, discard the log record 26

  27. Evaluation – Aurora vs. MySQL 27

  28. Evaluation – Varying Data Sizes Performance drops when data does not fit in main memory 28

  29. Evaluation – Real Customer Workloads 29

  30. Evaluation – Real Customer Workloads 30

  31. Discussion VM VM VM Quantify the cost of CPU CPU CPU computation in the storage layer Mem Mem Mem Network Which fraction of speedup comes from improving CPU CPU CPU CPU CPU MySQL vs. the new disaggregation design HDD HDD HDD HDD HDD VM VM VM VM VM 31

  32. OLTP in Cloud – Q/A How the log can be considered as database? Mini-transactions (MTR)? Transaction not durable when client receives ack? Other bigger companies have similar offerings? Adoption of log as a database ? Global LSN bottleneck Similar to logging shipping Aurora for OLAP? Serverless and multi-master? 32

  33. Group Discussion Cloud storage and Smart SSD are similar in that both push computation to the data. What do you see as the key differences between the two? The initial version of Aurora (i.e., the one presented in this paper) supports only a single master. What are the challenges of moving to a multi-master setting? Can you think of other applications that can benefit from a smart and disaggregated storage service in the cloud? 33

  34. Before Next Lecture Submit discussion summary to https://wisc-cs839-ngdb20.hotcrp.com • Deadline: Friday 11:59pm Submit review for • Choosing A Cloud DBMS: Architectures and Tradeoffs • [optional] Amazon Redshift and the Case for Simpler Data Warehouses 34

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend