Distributed OLTP Databases (Part I), Lecture #22, Andy Pavlo, CMU 15-445/645 Database Systems (Fall 2018)

  1. Distributed OLTP Databases (Part I), Lecture #22, Andy Pavlo, Database Systems 15-445/15-645, Computer Science, Carnegie Mellon Univ., Fall 2018

  2. ADMINISTRIVIA Project #3: TODAY @ 11:59am Homework #5: Monday Dec 3rd @ 11:59pm Project #4: Monday Dec 10th @ 11:59pm Extra Credit: Wednesday Dec 12th @ 11:59pm Final Exam: Sunday Dec 16th @ 8:30am

  3. ADMINISTRIVIA Monday Dec 3rd – VoltDB Lecture → Dr. Ethan Zhang (Lead Engineer) Wednesday Dec 5th – Potpourri + Review → Vote for what system you want me to talk about. → https://cmudb.io/f18-systems Wednesday Dec 5th – Extra Credit Check → Submit your extra credit assignment early to get feedback from me.

  4. UPCOMING DATABASE EVENTS Swarm64 Tech Talk → Thursday November 29th @ 12pm → GHC 8102 ← Different Location! VoltDB Research Talk → Monday December 3rd @ 4:30pm → GHC 8102

  5. PARALLEL VS. DISTRIBUTED Parallel DBMSs: → Nodes are physically close to each other. → Nodes connected with high-speed LAN. → Communication cost is assumed to be small. Distributed DBMSs: → Nodes can be far from each other. → Nodes connected using public network. → Communication cost and problems cannot be ignored.

  6. DISTRIBUTED DBMSs Use the building blocks that we covered in single-node DBMSs to now support transaction processing and query execution in distributed environments. → Optimization & Planning → Concurrency Control → Logging & Recovery

  7. OLTP VS. OLAP On-line Transaction Processing (OLTP): → Short-lived read/write txns. → Small footprint. → Repetitive operations. On-line Analytical Processing (OLAP): → Long-running, read-only queries. → Complex joins. → Exploratory queries.
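To make the contrast concrete, here is a minimal sketch of what each workload might issue. The accounts/orders schema, column names, and values are hypothetical, chosen only for illustration.

```python
# OLTP: a short-lived read/write transaction that touches a handful of rows by key.
# (Hypothetical accounts table; the point is the small footprint and repetitive shape.)
oltp_txn = """
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE acct_id = 123;
UPDATE accounts SET balance = balance + 100 WHERE acct_id = 456;
COMMIT;
"""

# OLAP: a long-running, read-only query with a join and aggregation over many rows.
# (Hypothetical orders/customers tables; the point is the complex, exploratory shape.)
olap_query = """
SELECT c.region, SUM(o.amount) AS total
  FROM orders AS o JOIN customers AS c ON o.cust_id = c.cust_id
 WHERE o.order_date >= '2018-01-01'
 GROUP BY c.region;
"""
```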

  8. TODAY'S AGENDA System Architectures Design Issues Partitioning Schemes Distributed Concurrency Control

  9. SYSTEM ARCHITECTURE A DBMS's system architecture specifies what shared resources are directly accessible to CPUs. This affects how CPUs coordinate with each other and where they retrieve/store objects in the database.

  10. SYSTEM ARCHITECTURE [Diagram: four architectures connected over a network: Shared Everything, Shared Memory, Shared Disk, Shared Nothing.]

  11. SHARED MEMORY CPUs have access to a common memory address space via a fast interconnect. → Each processor has a global view of all the in-memory data structures. → Each DBMS instance on a processor has to "know" about the other instances.

  12. SHARED DISK All CPUs can access a single logical disk directly via an interconnect, but each has its own private memory. → Can scale execution layer independently from the storage layer. → Have to send messages between CPUs to learn about their current state.

  13-18. SHARED DISK EXAMPLE [Animated diagram: the Application Server issues Get Id=101 and Get Id=200 to compute Nodes, which fetch Page ABC and Page XYZ from the shared Storage; a new Node can be added and serve Get Id=101 from the same Storage; finally a Node applies Update 101 to Page ABC in the shared Storage.]

  19. SHARED NOTHING Each DBMS instance has its own CPU, memory, and disk. Nodes only communicate with each other via network. → Easy to increase capacity. → Hard to ensure consistency.

  20-23. SHARED NOTHING EXAMPLE [Animated diagram: two Nodes hold partitions P1→ID:1-150 and P2→ID:151-300; the Application Server's Get Id=200 is routed to the Node holding P2, while a request for Get Id=10 and Get Id=200 must touch both Nodes; after a third Node is added, the data is re-partitioned into P1→ID:1-100, P3→ID:101-200, P2→ID:201-300.]
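A minimal Python sketch of the routing logic this example implies; the node names are hypothetical and the partition boundaries mirror the diagram above. Adding a node under shared nothing means re-partitioning and physically moving tuples.

```python
# Range-partitioned routing for a shared-nothing cluster (sketch, not a real DBMS API).
PARTITION_MAP = {
    "node1": range(1, 151),    # P1 -> ID 1-150
    "node2": range(151, 301),  # P2 -> ID 151-300
}

def route(record_id, partition_map):
    """Return the node responsible for this id under range partitioning."""
    for node, id_range in partition_map.items():
        if record_id in id_range:
            return node
    raise KeyError(f"no partition owns id {record_id}")

print(route(200, PARTITION_MAP))   # -> node2

# After a third node joins, the ranges change and tuples must physically move:
PARTITION_MAP = {
    "node1": range(1, 101),    # P1 -> ID 1-100
    "node3": range(101, 201),  # P3 -> ID 101-200
    "node2": range(201, 301),  # P2 -> ID 201-300
}
print(route(200, PARTITION_MAP))   # -> node3
```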

  24. EARLY DISTRIBUTED DATABASE SYSTEMS MUFFIN – UC Berkeley (1979), SDD-1 – CCA (1979), System R* – IBM Research (1984), Gamma – Univ. of Wisconsin (1986), NonStop SQL – Tandem (1987). [Pictured: Stonebraker, Bernstein, Mohan, DeWitt, Gray]

  25. DESIGN ISSUES How does the application find data? How to execute queries on distributed data? → Push query to data. → Pull data to query. How does the DBMS ensure correctness?

  26. HOMOGENEOUS VS. HETEROGENEOUS Approach #1: Homogeneous Nodes → Every node in the cluster can perform the same set of tasks (albeit on potentially different partitions of data). → Makes provisioning and failover "easier". Approach #2: Heterogeneous Nodes → Nodes are assigned specific tasks. → Can allow a single physical node to host multiple "virtual" node types for dedicated tasks.

  27-29. MONGODB CLUSTER ARCHITECTURE [Diagram: the Application Server sends Get Id=101 to a Router (mongos); a Config Server (mongod) stores the partition map P1→ID:1-100, P2→ID:101-200, P3→ID:201-300, P4→ID:301-400, which the Router uses to forward the request to the correct Shard (mongod) among P1-P4.]
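From the application's side this routing is invisible. A minimal sketch using pymongo, assuming a mongos router is reachable at the hypothetical host below; the application connects only to the router, never to a shard directly.

```python
from pymongo import MongoClient

# Connect to a mongos router (hypothetical host); mongos consults the config
# server's partition map and forwards each request to the shard that owns the key.
client = MongoClient("mongodb://router1.example.com:27017")

# Routed to the shard whose range covers _id=101 (P2 in the diagram above).
doc = client["appdb"]["records"].find_one({"_id": 101})
```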

  30. DATA TRANSPARENCY Users should not be required to know where data is physically located, how tables are partitioned or replicated. A SQL query that works on a single-node DBMS should work the same on a distributed DBMS.

  31. DATABASE PARTITIONING Split database across multiple resources: → Disks, nodes, processors. → Sometimes called "sharding". The DBMS executes query fragments on each partition and then combines the results to produce a single answer.
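A minimal sketch of that execute-then-combine pattern; the node names, in-memory "partitions", and helper functions are hypothetical stand-ins for real query fragments running on separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Each entry stands in for the tuples stored on one partition/node.
PARTITIONS = {
    "node1": [{"id": 1, "val": 10}, {"id": 2, "val": 20}],
    "node2": [{"id": 3, "val": 30}, {"id": 4, "val": 40}],
}

def execute_fragment(node, predicate):
    """Stand-in for running 'SELECT * FROM table WHERE <predicate>' on one partition."""
    return [row for row in PARTITIONS[node] if predicate(row)]

def distributed_query(predicate):
    # Scatter the same fragment to every partition, then gather and combine the results.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(execute_fragment, n, predicate) for n in PARTITIONS]
        return [row for f in futures for row in f.result()]

print(distributed_query(lambda r: r["val"] >= 20))   # single combined answer
```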

  32. NAÏVE TABLE PARTITIONING Each node stores one and only one table. Assumes that each node has enough storage space for a table.

  33-35. NAÏVE TABLE PARTITIONING [Animated diagram: Table1 is assigned in its entirety to one partition and Table2 to another.] Ideal Query: SELECT * FROM table

  36. HORIZONTAL PARTITIONING Split a table's tuples into disjoint subsets. → Choose column(s) that divide the database equally in terms of size, load, or usage. → Each tuple contains all of its columns. → Hash Partitioning, Range Partitioning. The DBMS can partition a database physically (shared nothing) or logically (shared disk).

  37-39. HORIZONTAL PARTITIONING [Example: Table1 tuples (101, a, XXX, 2017-11-29), (102, b, XXY, 2017-11-28), (103, c, XYZ, 2017-11-29), (104, d, XYX, 2017-11-27), (105, e, XYY, 2017-11-29) are assigned to partitions P1-P4 by hashing the partitioning key: hash(a)%4 = P2, hash(b)%4 = P4, hash(c)%4 = P3, hash(d)%4 = P2, hash(e)%4 = P1.] Ideal Query: SELECT * FROM table WHERE partitionKey = ?
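A minimal sketch of the two partitioning functions named above. zlib.crc32 and the range boundaries are stand-ins; a real DBMS's hash function and partition count are internal details.

```python
import zlib

NUM_PARTITIONS = 4

def hash_partition(key):
    """Hash partitioning: the same key always maps to the same partition (as in hash(key) % 4 above)."""
    return zlib.crc32(str(key).encode()) % NUM_PARTITIONS        # -> partition 0..3

RANGE_UPPER_BOUNDS = [100, 200, 300, 400]   # inclusive upper bound of each partition's key range

def range_partition(key):
    """Range partitioning: contiguous key ranges map to the same partition."""
    for p, upper in enumerate(RANGE_UPPER_BOUNDS):
        if key <= upper:
            return p
    raise ValueError("key falls outside every range")

# The ideal query hits exactly one partition because its predicate is on the partitioning key.
print(hash_partition("a"), hash_partition("b"))   # deterministic partition ids
print(range_partition(101))                       # -> 1 (IDs 101-200)
```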

  40-41. LOGICAL PARTITIONING [Animated diagram: the Application Server sends Get Id=1 and Get Id=3 to the Nodes responsible for those logical partitions (Id=1,2 on one Node, Id=3,4 on the other), but the tuples themselves live in shared Storage.]

  42-43. PHYSICAL PARTITIONING [Animated diagram: each Node stores its partition's tuples locally (Id=1,2 on one Node, Id=3,4 on the other); the Application Server contacts the Node that owns the data.]
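A minimal sketch (all class names are hypothetical) contrasting the two layouts: with logical partitioning (shared disk) the owning node still fetches the tuple from a storage service every node can reach, while with physical partitioning (shared nothing) the owning node stores its tuples itself.

```python
class SharedStorage:
    """Stand-in for a storage service (e.g. a page/object store) reachable by all nodes."""
    def __init__(self, data):
        self.data = data
    def read(self, key):
        return self.data[key]              # in reality: a network round trip to the storage layer

class LogicalPartitionNode:
    """Shared disk: responsible for a logical id range, but the data lives in shared storage."""
    def __init__(self, id_range, storage):
        self.id_range, self.storage = id_range, storage
    def get(self, key):
        assert key in self.id_range
        return self.storage.read(key)

class PhysicalPartitionNode:
    """Shared nothing: responsible for an id range and stores those tuples locally."""
    def __init__(self, id_range, local_data):
        self.id_range, self.local = id_range, local_data
    def get(self, key):
        assert key in self.id_range
        return self.local[key]             # local read, no storage-layer network hop

storage = SharedStorage({1: "a", 2: "b", 3: "c", 4: "d"})
shared_disk_node = LogicalPartitionNode(range(1, 3), storage)               # handles Id=1,2
shared_nothing_node = PhysicalPartitionNode(range(3, 5), {3: "c", 4: "d"})  # stores Id=3,4
print(shared_disk_node.get(1), shared_nothing_node.get(3))
```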
