  1. Database Replication in Tashkent, CSEP 545 Transaction Processing, Sameh Elnikety

  2. Replication for Performance • Expensive • Limited scalability

  3. DB Replication is Challenging
     • Single database system
       – Large, persistent state
       – Transactions
       – Complex software
     • Replication challenges
       – Maintain consistency
       – Middleware replication

  4. Background [diagram: a standalone system, Replica 1, running a DBMS]

  5. Background [diagram: a load balancer in front of Replica 1, Replica 2, and Replica 3]

  6. Read Tx • A read tx does not change DB state [diagram: the load balancer sends read tx T to a single replica]

  7. Update Tx 1/2 • An update tx changes DB state [diagram: the load balancer sends update tx T to one replica, which produces a writeset ws]

  8. Update Tx 1/2 • An update tx changes DB state • Apply (or commit) T everywhere [diagram: the writeset ws is propagated to all replicas]. Example: T1 : { set x = 1 }

  9. Update Tx 2/2 • An update tx changes DB state [diagram: an ordering component joins the load balancer; writesets ws flow to the replicas]

  10. Update Tx 2/2 • Commit updates in order [diagram: the ordering component delivers writesets to every replica in the same order]. Example: T1 : { set x = 1 }, T2 : { set x = 7 }
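A minimal sketch (not from the slides; class and variable names are illustrative) of the rule these two slides state: every replica applies writesets in the single global order, so after T1 { set x = 1 } and T2 { set x = 7 } each copy of the database ends with x = 7.

    # Each replica commits writesets strictly in the global sequence order
    # decided by the ordering middleware.
    class Replica:
        def __init__(self):
            self.state = {}          # in-memory stand-in for the database
            self.next_seq = 1        # next global sequence number to commit

        def apply_writeset(self, seq, writeset):
            # Writesets must commit in sequence order on every replica.
            assert seq == self.next_seq, "out-of-order writeset"
            self.state.update(writeset)
            self.next_seq += 1

    r = Replica()
    r.apply_writeset(1, {"x": 1})    # T1
    r.apply_writeset(2, {"x": 7})    # T2
    print(r.state["x"])              # 7 on every replica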

  11. Sub-linear Scalability Wall [diagram: a fourth replica is added; ordered writesets ws still go to every replica]

  12. This Talk
      • General scaling techniques
        – Address fundamental bottlenecks
        – Synergistic, implemented in middleware
        – Evaluated experimentally

  13. Super-linear Scalability [chart: throughput (TPS) of Single (1x), Base (7x), United (12x), MALB (25x), and UF (37x)]

  14. Big Picture: Let's Oversimplify [diagram: a standalone DBMS spends its capacity on reading R, updates U, and logging]

  15. Big Picture: Let's Oversimplify [diagram: the standalone DBMS does reading R, updates U, and logging; a traditional Replica 1/N handles reading N·R, updates N·U plus (N-1)·ws writesets, and logging]

  16. Big Picture: Let's Oversimplify [diagram: alongside the standalone (R, U, logging) and traditional (N·R, N·U plus (N-1)·ws, logging) cases, an optimized Replica 1/N does reading R*, updates U* plus (N-1)·ws*, and logging]

  17. Big Picture: Let's Oversimplify [same diagram, annotated with the three techniques: MALB for reading (R*), Update Filtering for the (N-1)·ws* writesets, and Uniting O & D for logging]

  18. Key Points
      1. Commit updates in order
         – Perform serial synchronous disk writes
         – Unite ordering and durability
      2. Load balancing
         – Optimize for equal load: memory contention
         – MALB: optimize for in-memory execution
      3. Update propagation
         – Propagate updates everywhere
         – Update filtering: propagate to where needed
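The deck does not detail update filtering beyond the last bullet, so the following is only a hypothetical sketch of "propagate to where needed": a writeset is forwarded only to replicas whose assigned transactions read the tables that the writeset modified. All names and data are illustrative.

    # Hypothetical update filtering: forward a writeset only to replicas
    # that read the tables it modified.
    def filter_updates(writeset_tables, replica_read_sets):
        """Return the replicas that must receive this writeset."""
        return [replica
                for replica, tables_read in replica_read_sets.items()
                if writeset_tables & tables_read]

    replicas = {"r1": {"orders", "items"}, "r2": {"customers"}}
    print(filter_updates({"items"}, replicas))   # ['r1']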

  19. Roadmap [diagram: a load balancer and an ordering component in front of Replicas 1-3] Topics: commit updates in order, load balancing, update propagation

  20. Key Idea • Traditionally: commit ordering and durability are separated • Key idea: unite commit ordering and durability

  21. All Replicas Must Agree
      • All replicas agree on
        – which update txs commit
        – their commit order
      • Total order
        – Determined by middleware
        – Followed by each replica
      [diagram: Tx A and Tx B made durable on Replicas 1, 2, and 3]

  22. Order Outside DBMS [diagram: Tx A and Tx B arrive; an ordering component outside the DBMS sits between them and Replicas 1-3, each of which handles its own durability]

  23. Order Outside DBMS [diagram: the ordering component decides A → B and delivers that order to every replica, which makes A and B durable in that order]
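A minimal sketch (illustrative names, not the paper's code) of the middleware side of this: the ordering component assigns each update transaction one global position and sends the same order, e.g. A → B, to every replica.

    # The ordering middleware assigns a single global position per update tx
    # and broadcasts it identically to all replicas.
    class Sequencer:
        def __init__(self, replicas):
            self.replicas = replicas     # callables that deliver to a replica
            self.next_pos = 1

        def order(self, tx_id, writeset):
            pos = self.next_pos
            self.next_pos += 1
            for send in self.replicas:   # same order everywhere
                send(pos, tx_id, writeset)
            return pos

    log = []
    seq = Sequencer([lambda p, t, w: log.append((p, t))] * 3)
    seq.order("A", "set x = 1")
    seq.order("B", "set x = 7")
    print(log)    # every replica sees (1, 'A') before (2, 'B')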

  24. Enforce External Commit Order [diagram: at Replica 3, a proxy receives the order A → B from the ordering component and runs Task A and Task B against the DBMS SQL interface; the DBMS handles durability]

  25. Enforce External Commit Order [same diagram: if both commits are submitted together, the DBMS may make B durable before A]

  26. Enforce External Commit Order [same diagram] • Cannot commit A & B concurrently!

  27. Enforce Order = Serial Commit [diagram: the proxy commits A first; only A has reached the durability layer]

  28. Enforce Order = Serial Commit [diagram: after A is durable, the proxy commits B, so the durability order is A then B]
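A minimal sketch (illustrative, not the actual proxy code) of this serial-commit rule: the proxy releases COMMITs to the DBMS one at a time, in the externally decided order, because the DBMS by itself would be free to reorder concurrent commits.

    # The proxy gates COMMITs so they reach the DBMS strictly in the
    # external commit order.
    import threading

    class CommitGate:
        def __init__(self):
            self.lock = threading.Condition()
            self.next_to_commit = 1          # position in the global order

        def commit_in_order(self, position, do_commit):
            with self.lock:
                while position != self.next_to_commit:
                    self.lock.wait()         # wait for our turn
                do_commit()                  # e.g. send COMMIT over SQL
                self.next_to_commit += 1
                self.lock.notify_all()

    gate = CommitGate()
    gate.commit_in_order(1, lambda: print("COMMIT A"))
    gate.commit_in_order(2, lambda: print("COMMIT B"))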

  29. Commit Serialization is Slow [diagram: with commit order A → B → C, the proxy issues Commit A, Commit B, Commit C one after another; each commit finishes its CPU work and its durability (disk write) before the ack returns and the next commit starts]

  30. Commit Serialization is Slow [same diagram] • Problem: durability & ordering separated → serial disk writes

  31. Unite D. & O. in Middleware [diagram: durability moves into the middleware next to ordering; the middleware makes A → B → C durable, the proxy commits them, and DB-level durability is turned OFF]

  32. Unite D. & O. in Middleware [same diagram] • Solution: move durability to the middleware • Durability & ordering in middleware → group commit

  33. Implementation: Uniting D & O in MW
      • Middleware logs tx effects
        – Durability of update tx guaranteed in middleware
        – Turn durability off at the database
      • Middleware performs durability & ordering
        – United → group commit → fast
      • Database commits update txs serially
        – Commit = quick main-memory operation
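A minimal sketch (illustrative names, not the paper's code) of this implementation idea: the middleware appends writesets to its own log in commit order and forces a whole batch with a single fsync (group commit); only then are the transactions acknowledged and released to the replicas, which commit them serially with DB-level synchronous durability off, so each DB commit is just a fast in-memory operation.

    import os

    class MiddlewareLog:
        def __init__(self, path):
            self.f = open(path, "ab")

        def group_commit(self, ordered_writesets):
            # Append all writesets in commit order, then force them to disk
            # with one fsync: a single disk write makes the whole batch durable.
            for seq, ws in ordered_writesets:
                self.f.write(f"{seq}:{ws}\n".encode())
            self.f.flush()
            os.fsync(self.f.fileno())
            # The batch is now durable; the replicas can commit these txs
            # serially with database durability turned off.
            return [seq for seq, _ in ordered_writesets]

    log = MiddlewareLog("mw.log")
    acked = log.group_commit([(1, "set x = 1"), (2, "set x = 7")])
    print(acked)   # [1, 2]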

  34. Uniting Improves Throughput
      • Metric: throughput
      • Workload: TPC-W Ordering (50% updates)
      • System: Linux cluster, PostgreSQL, 16 replicas, serializable execution
      [chart: TPS for Single (1x), Base (7x), United (12x)]

  35. Roadmap [same diagram as slide 19: commit updates in order, load balancing, update propagation]

  36. Key Idea • Equal load on replicas [diagram: a load balancer in front of Replica 1 and Replica 2, each with memory (Mem) and disk]

  37. Key Idea • Equal load on replicas • MALB (Memory-Aware Load Balancing): optimize for in-memory execution [same diagram]

  38. How Does MALB Work? [setup: the database has three parts 1, 2, 3; the workload has tx A → {1, 2} and tx B → {2, 3}; a replica's memory holds less than the whole database]

  39. Read Data From Disk [diagram: a least-loaded balancer sends the mix A, B, A, B to both Replica 1 and Replica 2, so each replica needs parts 1, 2, and 3]

  40. Read Data From Disk [same diagram: parts 1, 2, 3 do not fit in either replica's memory, so both replicas read from disk and are slow]

  41. Data Fits in Memory [diagram: MALB splits the mix A, B, A, B so that each replica serves only one tx type]

  42. Data Fits in Memory [same diagram: Replica 1 keeps parts 1, 2 in memory for A, Replica 2 keeps parts 2, 3 for B; both run fast] • Open questions: where does the memory info come from? what if there are many tx types and replicas?

  43. Estimate Tx Memory Needs
      • Exploit the tx execution plan
        – Which tables & indices are accessed
        – Their access pattern: linear scan or direct access
      • Metadata from the database
        – Sizes of tables and indices
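A minimal sketch (assumed plan format and made-up sizes, purely illustrative) of the estimate this slide describes: a linear scan is charged the whole relation, a direct access only a small fraction of it, with relation sizes taken from database metadata.

    sizes_mb = {"orders": 400, "orders_pk": 40, "items": 1200}   # from DB metadata

    def estimate_memory_mb(plan, direct_access_fraction=0.05):
        """plan: list of (relation, access_pattern) pairs from the execution plan."""
        total = 0.0
        for relation, pattern in plan:
            if pattern == "linear_scan":
                total += sizes_mb[relation]                  # touches it all
            else:                                            # "direct_access"
                total += direct_access_fraction * sizes_mb[relation]
        return total

    tx_a_plan = [("orders", "direct_access"), ("orders_pk", "direct_access")]
    print(estimate_memory_mb(tx_a_plan))    # 22.0 MB estimated for tx type A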

  44. Grouping Transactions
      • Objective: construct tx groups that fit together in memory
      • Bin packing
        – Item: tx memory needs
        – Bin: memory of a replica
        – Heuristic: Best Fit Decreasing
      • Allocate replicas to tx groups, adjusting for group loads
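A minimal sketch of the Best Fit Decreasing heuristic named on this slide, with illustrative numbers: items are the estimated memory needs of tx types, the bin capacity is a replica's memory.

    def best_fit_decreasing(needs, capacity):
        """needs: dict of tx type -> memory need. Returns groups that fit in one bin."""
        bins = []                                  # each bin: [remaining, [tx types]]
        for tx, need in sorted(needs.items(), key=lambda kv: -kv[1]):
            # Best fit: the open bin whose remaining space is tightest but sufficient.
            candidates = [b for b in bins if b[0] >= need]
            if candidates:
                best = min(candidates, key=lambda b: b[0])
                best[0] -= need
                best[1].append(tx)
            else:
                bins.append([capacity - need, [tx]])
        return [group for _, group in bins]

    needs = {"A": 60, "B": 50, "C": 45, "D": 30, "E": 25, "F": 20}
    print(best_fit_decreasing(needs, 100))   # [['A', 'D'], ['B', 'C'], ['E', 'F']]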

  45. MALB in Action [diagram: tx types A, B, C, D, E, F enter MALB]

  46. MALB in Action [diagram: MALB first estimates the memory needs of A, B, C, D, E, F]

  47. MALB in Action [diagram: MALB forms groups that fit in memory: {A}, {B, C}, {D, E, F}]

  48. MALB in Action [diagram: each group gets its own replica: one replica runs A, one runs B and C, one runs D, E, and F]

  49. MALB Summary
      • Objective: optimize for in-memory execution
      • Method
        – Estimate tx memory needs
        – Construct tx groups
        – Allocate replicas to tx groups

  50. Experimental Evaluation
      • Implementation: no change in consistency, still middleware
      • Compare
        – United: efficient baseline system
        – MALB: exploits working-set information
      • Same environment
        – Linux cluster running PostgreSQL
        – Workload: TPC-W Ordering (50% update txs)

  51. MALB Doubles Throughput [chart: TPC-W Ordering, 16 replicas; TPS for Single (1x), Base (7x), United (12x), MALB (25x), UF; MALB improves on United by 105%]

  52. MALB Doubles Throughput [charts: the same TPS chart plus read I/O normalized to United, comparing United and MALB]

  53. Big Gains with MALB
      [table: MALB throughput gain; DB size on the vertical axis (Big at top, Small at bottom), memory size on the horizontal axis (Small at left, Big at right)]
        12%    75%   182%
        45%   105%    48%
        29%     0%     4%

  54. Big Gains with MALB [same table, annotated: with a big DB the workload runs from disk; with a small DB and big memory it runs from memory]
