Database Systems Do Not Scale to 1000 CPU Cores
And Other Tales of the Macabre
@andy_pavlo
Three million children die per year due to poor nutrition.
Source: http://www.wfp.org/hunger/stats
Three days after you die, stomach enzymes start to digest you.
Source: http://discovermagazine.com/2006/sep/10-20thingsdeath
Everyone in this room will be dead in 65 years.
Source: http://discovermagazine.com/2006/sep/10-20thingsdeath
Database systems cannot scale to 1000 CPU cores.
Source: http://www.vldb.org/pvldb/vol8/p209-yu.pdf
DBx1000 on Graphite Simulator Write-Intensive Workload High Contention
Why This Matters
complex and larger.
future “many-core” CPU architectures.
Today’s Talk
Transaction Processing
On-line Transaction Processing
update state using ACID transactions.
– Send $50 from user A to user B
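The transfer example can be sketched as a single atomic transaction. This is a minimal illustration using Python's built-in sqlite3 module; the `accounts` table, its columns, and the `transfer` helper are assumptions made for the example and do not come from the talk.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst`; either both updates apply or neither."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE name = ? AND balance >= ?",
            (amount, src, amount),
        )
        if cur.rowcount != 1:
            raise ValueError("insufficient funds or unknown source account")
        cur = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        if cur.rowcount != 1:
            raise ValueError("unknown destination account")

# Hypothetical schema and starting balances, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()

transfer(conn, "A", "B", 50)  # Send $50 from user A to user B
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

If the second update fails, the `with conn:` block rolls back the debit as well, so user A never loses money that user B did not receive.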
Concurrency Control
multi-programmed fashion while preserving the illusion that each of them is executing alone on a dedicated system.
Two-Phase Locking (2PL)
Transaction #1
BEGIN LOCK(A) READ(A) LOCK(B) WRITE(B) UNLOCK(A) UNLOCK(B) COMMIT
Growing Phase: LOCK(A) LOCK(B)
Shrinking Phase: UNLOCK(A) UNLOCK(B)
Transaction #2
BEGIN LOCK(B) WRITE(B) LOCK(A) WRITE(A) UNLOCK(A) UNLOCK(B) COMMIT
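The growing and shrinking phases can be sketched in a few lines. This is a toy, single-threaded illustration of the 2PL rule only (no lock may be acquired after the first release); the `TwoPhaseTxn` class and its method names are hypothetical and are not taken from DBx1000.

```python
import threading

class TwoPhaseTxn:
    """Toy 2PL transaction: every lock is acquired before any lock is released."""

    def __init__(self, lock_table):
        self.lock_table = lock_table  # dict: record name -> threading.Lock
        self.held = []                # records currently locked by this transaction
        self.shrinking = False        # becomes True at the first unlock

    def lock(self, record):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot lock after unlocking")
        self.lock_table[record].acquire()
        self.held.append(record)

    def unlock(self, record):
        self.shrinking = True
        self.held.remove(record)
        self.lock_table[record].release()

locks = {"A": threading.Lock(), "B": threading.Lock()}
db = {"A": 1, "B": 2}

# Transaction #1 from the slide: READ(A), then WRITE(B).
txn = TwoPhaseTxn(locks)
txn.lock("A")            # growing phase
value = db["A"]
txn.lock("B")
db["B"] = value + 10
txn.unlock("A")          # shrinking phase begins here
txn.unlock("B")

# Any further lock request is rejected because the transaction is shrinking.
try:
    txn.lock("A")
    violated = False
except RuntimeError:
    violated = True
```

Run concurrently, Transaction #1 (A then B) and Transaction #2 (B then A) can each hold one lock while waiting for the other, which is the deadlock that the 2PL variants in this talk handle differently.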
Timestamp Ordering (T/O)
Transaction #1 (timestamp 10001)
BEGIN READ(A) WRITE(B) WRITE(A) COMMIT
Initial state:
Record   Read Timestamp   Write Timestamp
A        10000            10000
B        10000            10000
After READ(A) and WRITE(B):
Record   Read Timestamp   Write Timestamp
A        10001            10000
B        10000            10001
Before WRITE(A), with A's write timestamp now at 10005:
Record   Read Timestamp   Write Timestamp
A        10001            10005
B        10000            10001
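The timestamp checks can be sketched as follows. This is a minimal version of the textbook basic T/O rules (without the Thomas write rule), replayed with the timestamp values shown on the slides; the `Record`, `to_read`, and `to_write` names are hypothetical, and the assumption that a second transaction with timestamp 10005 writes A is an inference from the slide values, not something the slides state.

```python
class Abort(Exception):
    """Raised when an operation would violate timestamp order."""

class Record:
    def __init__(self, value, ts):
        self.value = value
        self.read_ts = ts    # largest timestamp that has read this record
        self.write_ts = ts   # timestamp of the last writer

def to_read(rec, txn_ts):
    # A read must not see a value written by a logically later transaction.
    if txn_ts < rec.write_ts:
        raise Abort("read arrives after a newer write")
    rec.read_ts = max(rec.read_ts, txn_ts)
    return rec.value

def to_write(rec, txn_ts, value):
    # A write must not invalidate a later read or overwrite a later write.
    if txn_ts < rec.read_ts or txn_ts < rec.write_ts:
        raise Abort("write arrives after a newer read or write")
    rec.value = value
    rec.write_ts = txn_ts

a = Record("a0", 10000)
b = Record("b0", 10000)

to_read(a, 10001)           # Transaction #1 reads A: A.read_ts becomes 10001
to_write(b, 10001, "b1")    # Transaction #1 writes B: B.write_ts becomes 10001
to_write(a, 10005, "a5")    # assumed second transaction (ts 10005) writes A

try:
    to_write(a, 10001, "a1")  # Transaction #1 writes A too late
    aborted = False
except Abort:
    aborted = True
```

Under these rules Transaction #1 would be aborted and restarted with a new timestamp, since its write to A arrives after a write with a larger timestamp.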
Concurrency Control Schemes
DL_DETECT   2PL w/ Deadlock Detection
NO_WAIT     2PL w/ Non-waiting Prevention
WAIT_DIE    2PL w/ Wait-and-Die Prevention
TIMESTAMP   Basic T/O Algorithm
MVCC        Multi-Version T/O
OCC         Optimistic Concurrency Control
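The two deadlock-prevention variants differ only in what a transaction does when the lock it wants is already held. A minimal sketch of the standard decision rules, assuming smaller timestamps mean older transactions; the function names and the "wait"/"abort" return values are illustrative, not DBx1000's API.

```python
def no_wait(requester_ts, holder_ts):
    """NO_WAIT: never wait; the requester aborts on any lock conflict."""
    return "abort"

def wait_die(requester_ts, holder_ts):
    """WAIT_DIE: an older requester waits; a younger requester aborts ("dies")."""
    return "wait" if requester_ts < holder_ts else "abort"

# An older transaction (ts=1) conflicting with a younger lock holder (ts=2):
older_requests = (no_wait(1, 2), wait_die(1, 2))
# A younger transaction (ts=2) conflicting with an older lock holder (ts=1):
younger_requests = (no_wait(2, 1), wait_die(2, 1))
```

Because only older transactions ever wait for younger ones under WAIT_DIE, a cycle of waiting transactions cannot form, so no deadlock detector is needed.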
Evaluation Testbed
No DBMS supports multiple CC schemes. No CPU supports 1000 cores.
Experimental Platform
DBx1000 Graphite Simulator Compute Cluster
Core L2 L1
Worker Threads
Target Workload
– 20 million tuples
– Each tuple is 1KB (total database is ~20GB)
Evaluation
DBx1000 on Graphite Simulator Read-Only Workload No Contention
DBx1000 on Graphite Simulator Write-Intensive Workload Medium Contention
DBx1000 on Graphite Simulator Write-Intensive Workload High Contention
Time % Breakdown (512 Cores)
Bottlenecks
– DL_DETECT, WAIT_DIE
– All T/O algorithms + WAIT_DIE
– OCC + MVCC
Locking Thrashing
causing other transactions to wait longer to acquire locks.
acquire locks in primary key order.
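Acquiring locks in primary key order rules out deadlocks, because every transaction requests shared records in the same global order. A minimal sketch, assuming one lock per record and sortable keys; the `run_with_ordered_locks` and `bump` helpers are hypothetical.

```python
import threading

def run_with_ordered_locks(lock_table, keys, body):
    """Acquire locks for `keys` in sorted (primary key) order, run `body`, release."""
    ordered = sorted(set(keys))
    for key in ordered:
        lock_table[key].acquire()
    try:
        return body()
    finally:
        for key in reversed(ordered):
            lock_table[key].release()

locks = {key: threading.Lock() for key in ("A", "B")}
db = {"A": 0, "B": 0}

def bump(keys, rounds=1000):
    """Increment every record in `keys`, `rounds` times, one transaction per round."""
    def body():
        for key in keys:
            db[key] += 1
    for _ in range(rounds):
        run_with_ordered_locks(locks, keys, body)

# The two workers name the records in opposite orders, as in the 2PL example,
# but both end up locking A before B, so neither can block the other forever.
t1 = threading.Thread(target=bump, args=(["A", "B"],))
t2 = threading.Thread(target=bump, args=(["B", "A"],))
t1.start()
t2.start()
t1.join()
t2.join()
```

Ordering removes deadlocks but not waiting: transactions still queue behind one another on hot records, which is the thrashing effect this slide isolates.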
DBx1000 with 2PL DL_DETECT Write-Intensive Workload No Deadlocks (Ordered Lock Acquisition)
Potential Solutions
Hardware/Software Co-Design
new hardware-level optimizations:
– Hardware-accelerated Lock Sharing
– Asynchronous Memory Copying
– Decentralized Memory Controller
Next Steps
– Logging + Recovery
– Indexes
concurrency control algorithms.
Andy Pavlo Mike Stonebraker Srini Devadas Xiangyao Yu
http://cmudb.io/1000cores
@andy_pavlo