Database Replication in Tashkent
CSEP 545 Transaction Processing
Sameh Elnikety
Replication for Performance
- Expensive
- Limited scalability
DB Replication is Challenging
- Single database system
  – Large, persistent state
  – Transactions
  – Complex software
- Replication challenges
  – Maintain consistency
  – Middleware replication
Background

[Diagram: a standalone DBMS forms Replica 1; the replicated system places Replicas 1, 2, and 3 behind a Load Balancer]
Read Tx

Read tx does not change DB state
[Diagram: the Load Balancer sends read transaction T to a single replica]
Update Tx 1/2

Update tx changes DB state
[Diagram: the Load Balancer sends update transaction T to one replica; executing T produces a writeset (ws), which is then applied at every replica]
Apply (or commit) T everywhere
Example: T1: { set x = 1 }
Update Tx 2/2: Ordering

Update tx changes DB state
[Diagram: two update transactions execute at different replicas; each produces a writeset (ws) that is propagated to all replicas]
Example: T1: { set x = 1 }, T2: { set x = 7 }
Commit updates in order
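The ordering rule above can be sketched as follows; this is my illustration with plain Python dictionaries, not Tashkent's code. Every replica applies writesets in the single global commit order, so all replicas converge to the same final state.

```python
# My illustration (plain dictionaries, not Tashkent's code): a replica
# applies writesets in the single global commit order.
db = {}  # this replica's (in-memory) database state

def apply_in_order(writesets):
    """Apply (seq, writeset) pairs sorted by global sequence number."""
    for seq, ws in sorted(writesets, key=lambda p: p[0]):
        db.update(ws)  # commit the writeset's changes

# T1: { set x = 1 } got sequence 1, T2: { set x = 7 } got sequence 2;
# even if they arrive out of order, the global order decides the result.
apply_in_order([(2, {"x": 7}), (1, {"x": 1})])
print(db["x"])  # 7: T2 commits after T1 on every replica
```

Because every replica sorts by the same sequence numbers, no replica can end up with x = 1.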
Sub-linear Scalability Wall

[Diagram: adding Replica 4 behind the Load Balancer gives diminishing returns; every replica still applies every writeset (ws)]
This Talk

- General scaling techniques
  – Address fundamental bottlenecks
  – Synergistic, implemented in middleware
  – Evaluated experimentally
Super-linear Scalability

[Bar chart, TPS: Single 1x, Base 7x, United 12x, MALB 25x, UF 37x]
Big Picture: Let's Oversimplify

- Standalone DBMS: serves R reads and U updates, plus logging
- Replica 1/N (traditional): for a system serving N.R reads and N.U updates in total, each replica performs R reads, U updates, and applies (N-1) writesets (ws)
- Replica 1/N (optimized): each cost shrinks to R*, U*, and (N-1).ws* (via MALB, Uniting O & D, and Update Filtering, respectively)
Key Points

- 1. Commit updates in order
  – Problem: commits require serial synchronous disk writes
  – Solution: unite ordering and durability
- 2. Load balancing
  – Problem: optimizing for equal load causes memory contention
  – Solution: MALB, optimize for in-memory execution
- 3. Update propagation
  – Problem: updates are propagated everywhere
  – Solution: update filtering, propagate only where needed
Roadmap

- Commit updates in order (ordering)
- Load balancing
- Update propagation
Key Idea

- Traditionally: commit ordering and durability are separated
- Key idea: unite commit ordering and durability
All Replicas Must Agree

- All replicas agree on
  – which update tx commit
  – their commit order
- Total order
  – Determined by middleware
  – Followed by each replica

[Diagram: Tx A and Tx B arrive at Replicas 1, 2, and 3; each replica provides its own durability]
Order Outside DBMS

[Diagram: the middleware fixes the total order A, B outside the DBMS; every replica must commit Tx A and Tx B, durably, in that order]
Enforce External Commit Order

[Diagram: at Replica 3, a proxy submits Task A and Task B through the SQL interface; left to itself, the DBMS may commit them as B, A]
Cannot commit A & B concurrently!
Enforce Order = Serial Commit

[Diagram: the proxy submits commit A to the DBMS, waits for it to finish, and only then submits commit B, so the external order A, B is enforced]
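A minimal sketch of this serial-commit workaround, with my stand-in names rather than the system's API: the proxy issues the next commit only after the previous one returns, so the DBMS necessarily commits in the external order.

```python
# My stand-in names, not the system's API: the proxy enforces the
# external commit order by issuing commits strictly one at a time.
commit_log = []  # order in which the DBMS actually committed

def dbms_commit(tx):
    """Stand-in for a synchronous DBMS commit; returns once durable."""
    commit_log.append(tx)
    return "ack " + tx

def proxy_commit_in_order(external_order):
    acks = []
    for tx in external_order:     # serial: commit B waits for A's ack
        acks.append(dbms_commit(tx))
    return acks

print(proxy_commit_in_order(["A", "B"]))  # ['ack A', 'ack B']
```

The correctness is bought with serialization: with durability inside the DBMS, each of these commits carries its own synchronous disk write, which is exactly the slowness the next slides address.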
Commit Serialization is Slow

[Diagram: the proxy commits A, B, C one at a time; each commit waits for its own synchronous disk write (durability) before the next can start]
Problem: Durability & ordering separated → serial disk writes
Unite D. & O. in Middleware

[Diagram: durability is turned OFF at the DBMS; the middleware orders A, B, C, makes them durable itself in one batch, and acknowledges the commits]
Solution: Move durability to MW
Durability & ordering in middleware → group commit
Implementation: Uniting D & O in MW

- Middleware logs tx effects
  – Durability of update tx guaranteed in middleware
- Turn durability off at database
- Middleware performs durability & ordering
  – United → group commit → fast
- Database commits update tx serially
  – Commit = quick main-memory operation
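The group-commit idea can be sketched roughly as follows; this is a simplified illustration, not Tashkent's implementation. The middleware appends a whole batch of ordered writesets to its own log and pays for a single synchronous disk write (`fsync`), then acknowledges all of them in commit order.

```python
# Simplified illustration of middleware group commit, not Tashkent's code:
# one fsync makes a whole batch of transactions durable, in order.
import json
import os
import tempfile

def group_commit(log_file, batch):
    """Durably log a batch of (seq, writeset) records with one fsync."""
    ordered = sorted(batch, key=lambda p: p[0])
    for seq, ws in ordered:
        log_file.write((json.dumps({"seq": seq, "ws": ws}) + "\n").encode())
    log_file.flush()
    os.fsync(log_file.fileno())   # one synchronous disk write for the batch
    return [seq for seq, _ in ordered]  # acknowledge in commit order

with tempfile.TemporaryFile() as log:
    acked = group_commit(log, [(2, {"x": 7}), (1, {"x": 1})])
print(acked)  # [1, 2]
```

With durability handled here, the database's own commit (durability off) becomes a quick main-memory operation, so committing serially at each replica no longer hurts.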
Uniting Improves Throughput

- Metric: throughput (TPS)
- Workload: TPC-W Ordering (50% updates)
- System: Linux cluster, PostgreSQL, 16 replicas, serializable execution

[Bar chart, TPS: Single 1x, Base 7x, United 12x]
Roadmap (next: load balancing)
Key Idea

- Traditional: equal load across the n replicas
- MALB (Memory-Aware Load Balancing): optimize for in-memory execution

[Diagram: Replicas 1 and 2, each with memory (Mem) and Disk, behind a Load Balancer]
How Does MALB Work?

[Diagram: the database holds tables 1, 2, and 3; in the workload, tx A accesses tables 1 and 2, tx B accesses tables 2 and 3; a replica's memory fits two tables, not all three]
Read Data From Disk

[Diagram: least-loaded balancing sends the mix A, B, A, B to both replicas; each replica then needs tables 1, 2, and 3, which do not all fit in memory, so both replicas run slowly from disk]
Data Fits in Memory

[Diagram: MALB sends all A transactions to Replica 1, which keeps tables 1 and 2 in memory, and all B transactions to Replica 2, which keeps tables 2 and 3 in memory; both replicas run fast, entirely from memory]

Two questions remain: where does the memory information come from, and what happens with many tx types and replicas?
Estimate Tx Memory Needs

- Exploit tx execution plan
  – Which tables & indices are accessed
  – Their access pattern (linear scan vs. direct access)
- Metadata from database
  – Sizes of tables and indices
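A rough sketch of the estimate, with made-up table names and sizes: a step that linearly scans a table is charged the whole table, while a direct (indexed) access is charged only the index it probes.

```python
# Hypothetical table names and sizes, for illustration only.
table_size = {"t1": 400, "t2": 300, "t3": 500}   # MB, from DB metadata
index_size = {"t1_pk": 40, "t2_pk": 30, "t3_pk": 50}

def estimate_memory(plan):
    """plan: (table, access) steps taken from the tx execution plan."""
    need = 0
    for table, access in plan:
        if access == "scan":               # linear scan: whole table
            need += table_size[table]
        else:                              # direct access: only the index
            need += index_size[table + "_pk"]
    return need

# A tx that scans t1 and probes t3 by key needs roughly 400 + 50 MB.
print(estimate_memory([("t1", "scan"), ("t3", "direct")]))  # 450
```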
Grouping Transactions

- Objective
  – Construct tx groups that fit together in memory
- Bin packing
  – Item: tx memory needs
  – Bin: memory of a replica
  – Heuristic: Best Fit Decreasing
- Allocate replicas to tx groups
  – Adjust for group loads
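The bin-packing step might look like this sketch (my illustration; the memory sizes are made up): with a 1024 MB replica memory, Best Fit Decreasing happens to reproduce the grouping A; B, C; D, E, F from the MALB example.

```python
# My illustration of Best Fit Decreasing; memory sizes (MB) are made up.
def best_fit_decreasing(needs, capacity):
    """Pack tx memory needs into replica-memory-sized bins (groups)."""
    bins = []  # each bin: [remaining_capacity, [tx names]]
    for tx, need in sorted(needs.items(), key=lambda p: -p[1]):
        fitting = [b for b in bins if b[0] >= need]
        if fitting:                      # best fit: tightest open bin
            best = min(fitting, key=lambda b: b[0])
        else:                            # nothing fits: open a new bin
            best = [capacity, []]
            bins.append(best)
        best[0] -= need
        best[1].append(tx)
    return [group for _, group in bins]

groups = best_fit_decreasing(
    {"A": 900, "B": 500, "C": 400, "D": 350, "E": 350, "F": 200}, 1024)
print(groups)  # [['A'], ['B', 'C'], ['D', 'E', 'F']]
```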
MALB in Action

[Diagram: MALB estimates the memory needs of tx types A, B, C, D, E, F, packs them into Group A, Group B C, and Group D E F, and assigns each group its own replica]
MALB Summary

- Objective
  – Optimize for in-memory execution
- Method
  – Estimate tx memory needs
  – Construct tx groups
  – Allocate replicas to tx groups
Experimental Evaluation

- Implementation
  – No change in consistency
  – Still middleware
- Compare
  – United: efficient baseline system
  – MALB: exploits working-set information
- Same environment
  – Linux cluster running PostgreSQL
  – Workload: TPC-W Ordering (50% update txs)
MALB Doubles Throughput

[Bar chart, TPC-W Ordering, 16 replicas, TPS: Single 1x, Base 7x, United 12x, MALB 25x; MALB beats United by 105%, and its normalized read I/O is a small fraction of United's]
Big Gains with MALB

[Table: MALB's gain over United as memory size and DB size vary from big to small; gains are smallest when the workload already runs from memory (0% to 4%) or entirely from disk (12%), and largest in between (up to 182%)]
Roadmap (next: update propagation)
Key Idea

- Traditional: propagate updates everywhere
- Update Filtering: propagate updates only to where they are needed
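A sketch of the filtering decision (hypothetical names, my illustration): the middleware keeps, per replica, the set of tables its transaction group uses, and sends a writeset only to the replicas that hold the updated table.

```python
# Hypothetical replica/table names, for illustration only.
replica_tables = {
    "replica1": {1, 2},   # runs tx group A, keeps tables 1 and 2 hot
    "replica2": {2, 3},   # runs tx group B, keeps tables 2 and 3 hot
}

def targets(ws_table):
    """Replicas that must receive the writeset for ws_table."""
    return sorted(r for r, tables in replica_tables.items()
                  if ws_table in tables)

print(targets(1))  # ['replica1']: replica2 never sees table-1 updates
print(targets(2))  # ['replica1', 'replica2']: both need table 2
```

Filtering thus cuts the (N-1).ws cost at each replica and also keeps filtered-out tables from polluting a replica's memory.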
Update Filtering Example

[Diagram: as in the MALB setup, Replica 1 serves group A with tables 1 and 2 in memory, and Replica 2 serves group B with tables 2 and 3 in memory; an update to table 1 is propagated only to Replica 1, and an update to table 3 only to Replica 2]
Update Filtering in Action

[Diagram: across many replicas, UF sends an update to the red table only to replicas that hold the red table, and an update to the green table only to replicas that hold the green table]
MALB+UF Triples Throughput

[Bar chart, TPC-W Ordering, 16 replicas, TPS: Single 1x, Base 7x, United 12x, MALB 25x, UF 37x; UF beats MALB by 49%, and propagated updates drop from 15 with MALB to 7 with UF]
Filtering Opportunities

[Bar chart, throughput ratio MALB+UF / MALB: 1.49 on the 50%-update Ordering mix, but only 1.02 on the 5%-update Browsing mix; filtering pays off when the workload has many updates]
Conclusions

- 1. Commit updates in order
  – Problem: commits require serial synchronous disk writes
  – Solution: unite ordering and durability
- 2. Load balancing
  – Problem: optimizing for equal load causes memory contention
  – Solution: MALB, optimize for in-memory execution
- 3. Update propagation
  – Problem: updates are propagated everywhere
  – Solution: update filtering, propagate only where needed