OLTP on Hardware Islands
Danica Porobic, Ippokratis Pandis*, Miguel Branco, Pınar Tözün, Anastasia Ailamaki
Data-Intensive Application and Systems Lab, EPFL *IBM Research - Almaden
Hardware topologies have changed
[Figure: hardware topologies across successive slides, from a single core to multiple multicore processors]
[Figure: throughput vs. % multisite transactions in workload, for shared-nothing and shared-everything]
[Figure: performance vs. thread-to-core assignment, ranging from best to worst]
[Figure: multisocket multicore topology. Each core has private L1 and L2 caches; the cores of a socket share an L3 cache and a memory controller; sockets are connected by inter-socket links. Latency labels: <10 cycles, 50 cycles, 500 cycles]
[Figure: throughput under different thread placements (Spread, Island, and unspecified "?" placements). Panels: Counter microbenchmark (throughput in Mtps) and TPC-C Payment (throughput in Ktps). Machines: 8 socket x 10 cores, 4 socket x 6 cores. Annotations: 39%, 40%, 47%, "Unpredictable"]
[Figure: Counter microbenchmark, throughput in Mtps on a log scale, for counter per core, counter per socket, and single counter. TPC-C Payment (local-only), throughput in Ktps, for shared-nothing and shared-everything. Machines: 8 socket x 10 cores, 4 socket x 6 cores. Annotations: 18.7x, 516.8x, 4.5x]
– Experimental setup
– Read-only workloads
– Update workloads
– Impact of skew
– Top-of-the-line open source storage manager
– Enabled shared-nothing capability
– 4-socket, 6-core Intel Xeon E7530, 64GB RAM
– 8-socket, 10-core Intel Xeon E7-L8867, 192GB RAM
– Probe/update N rows from the local partition
– Probe/update 1 row from the local partition
– Probe/update N-1 rows uniformly from any partition
– Partitions may reside on the same instance
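The two row-access patterns above can be sketched as a small row-selection routine. This is only an illustration under stated assumptions, not the benchmark code used in the talk: the function name `pick_rows`, the `(partition, row)` representation, and all parameters are hypothetical.

```python
import random

def pick_rows(n_rows, local_partition, n_partitions, rows_per_partition,
              multisite, rng=random):
    """Return the (partition, row) pairs touched by one transaction.

    Local transaction: all N rows come from the local partition.
    Multisite transaction: 1 row from the local partition, then N-1 rows
    drawn uniformly from any partition (possibly the local one again).
    """
    def random_row(partition):
        # Hypothetical row addressing: a row is an index within a partition.
        return (partition, rng.randrange(rows_per_partition))

    if not multisite:
        return [random_row(local_partition) for _ in range(n_rows)]

    rows = [random_row(local_partition)]
    for _ in range(n_rows - 1):
        rows.append(random_row(rng.randrange(n_partitions)))
    return rows
```

Whether a multisite access actually crosses an instance boundary depends on how partitions are mapped to instances, which this sketch does not model.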
Shared-everything
Shared-nothing
Contention for shared data
No locks or latches
Messaging
Fewer messages for 1 transaction
[Figure: time per transaction (µs) at 0%, 50%, and 100% multisite transactions, broken down into logging, locking, communication, transaction management, and transaction execution. Configuration: 4 islands, 10 rows]
Physical contention
More instances per transaction
All instances involved in a transaction
No latches
2 rounds of messages
+ Extra logging
+ Locks held longer
[Figure: time per transaction (µs) at 0%, 50%, and 100% multisite transactions, broken down into logging, locking, communication, transaction management, and transaction execution. Configuration: 4 islands, 10 rows]
Efficient logging with Aether*
More instances per transaction
Increased contention
*R. Johnson et al., "Aether: A Scalable Approach to Logging", VLDB 2010
[Figure: throughput (KTps) vs. skew factor (0.25 to 1) for 24 Islands, 4 Islands, and 1 Island. Panels: local only, 50% multisite]
Few instances are highly loaded
Still few hot instances
Contention for hot data
Larger instances can balance load
– Runs on close cores
– Small instances limit contention between threads
– Few instances simplify partitioning
– Automatically choose and set up the optimal configuration
– Dynamically adjust to workload changes
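To make the island configurations compared in the talk (24 Islands, 4 Islands, 1 Island on the 4-socket, 6-core machine) concrete, the following sketch groups cores into islands. It is a hypothetical illustration, not the authors' tooling: the function name `island_layout` is invented here, and it assumes core IDs are numbered contiguously within each socket, which real machines do not guarantee.

```python
def island_layout(n_sockets, cores_per_socket, n_islands):
    """Group core IDs into equally sized islands of close cores.

    Assumption: cores 0..cores_per_socket-1 sit on socket 0, the next
    cores_per_socket IDs on socket 1, and so on. Check the real topology
    reported by the operating system before relying on this.
    """
    total = n_sockets * cores_per_socket
    if n_islands <= 0 or total % n_islands != 0:
        raise ValueError("number of islands must evenly divide the core count")
    size = total // n_islands
    # An island must either fit inside one socket or cover whole sockets,
    # otherwise some islands would be split unevenly across sockets.
    if size % cores_per_socket != 0 and cores_per_socket % size != 0:
        raise ValueError("islands would straddle socket boundaries")
    return [list(range(i * size, (i + 1) * size)) for i in range(n_islands)]
```

For example, `island_layout(4, 6, 4)` yields one island per socket, `island_layout(4, 6, 24)` yields one single-core instance per core (fine-grained shared-nothing), and `island_layout(4, 6, 1)` yields a single shared-everything instance. On Linux, each instance could then be pinned to its cores with `os.sched_setaffinity`.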