Hardware Read-Write Lock Elision
Alexander Matveev, Shady Issa, Pascal Felber, Paolo Romano
Multicores are everywhere
Hardware Read-Write Lock Elision - Eurosys 2016 21/4/16 2
Parallel programming
Main memory Core 1 Core 2 Core 3 Core 4
Complexity:
- deadlocks
- livelocks
- priority inversions
- convoy effects
Transactional memory
atomic{ if(bal>amount) withdraw(amount) }
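A minimal Python sketch of the atomic block above. The `Account` class and its method name are hypothetical, and a single lock only emulates the atomic region; real transactional memory would run the block speculatively instead of locking.

```python
import threading

class Account:
    """Hypothetical account; one lock emulates the atomic{} region."""

    def __init__(self, bal):
        self.bal = bal
        self._atomic = threading.Lock()

    def withdraw_if_covered(self, amount):
        # The balance check and the update run as one indivisible step.
        with self._atomic:
            if self.bal > amount:
                self.bal -= amount
                return True
            return False

acct = Account(100)
threads = [threading.Thread(target=acct.withdraw_if_covered, args=(30,))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(acct.bal)  # prints 10: exactly three withdrawals pass the check
```

Without the atomic region, two threads could both pass the check on the same balance and overdraw the account.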
Hardware Transactional Memory
Hardware lock elision
Thread 1: lock (Begin H/W Tx), r(A), w(B), unlock (Commit H/W Tx)
Thread 2: lock (Begin H/W Tx), w(X), w(Y), unlock (Commit H/W Tx)
Hardware lock elision
Thread 1: lock (Begin H/W Tx), r(A), r(X)
Thread 2: lock (Begin H/W Tx), w(X)
Conflict on X: abort, acquire lock normally
Other abort causes: capacity, prohibited instructions, page faults, TLB miss
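The abort-then-fallback behaviour can be sketched in Python as plain control flow. This is only an illustration: `run_elided`, `try_speculative` and `max_attempts` are hypothetical names, and the `try_speculative` hook stands in for a hardware transaction, which Python cannot express.

```python
import threading

def run_elided(lock, critical_section, try_speculative, max_attempts=3):
    """Run critical_section speculatively first, then under the lock.

    try_speculative is a hypothetical stand-in for a hardware transaction:
    it runs the section and returns True on commit, False on abort.
    """
    for _ in range(max_attempts):
        # Speculate only while nobody holds the lock.
        if not lock.locked() and try_speculative(critical_section):
            return "speculative"
    # Retry budget exhausted: acquire the lock normally.
    with lock:
        critical_section()
    return "fallback"

def always_commit(section):
    section()
    return True

def always_abort(section):
    return False  # e.g. a capacity abort on every attempt

log = []
lock = threading.Lock()
first = run_elided(lock, lambda: log.append("a"), always_commit)
second = run_elided(lock, lambda: log.append("b"), always_abort)
print(first, second, log)
```

The retry budget is a design choice: aborts caused by capacity or prohibited instructions repeat deterministically, so retrying forever would never make progress.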
Read-write Locks
read mode:
- concurrent readers ✔
- read dominated workloads
write mode:
- sequential writers ✘
- blocks readers ✘
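For reference, a minimal read-write lock sketch in Python showing the two modes. The `RWLock` class is hypothetical (the Python standard library has no read-write lock), and the sketch ignores fairness, so writers can starve.

```python
import threading

class RWLock:
    """Minimal read-write lock sketch: many readers or one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def r_lock(self):
        with self._cond:
            while self._writer:          # readers wait only for a writer
                self._cond.wait()
            self._readers += 1

    def r_unlock(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def w_lock(self):
        with self._cond:
            while self._writer or self._readers:  # writers wait for everyone
                self._cond.wait()
            self._writer = True

    def w_unlock(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

rw = RWLock()
rw.r_lock()
rw.r_lock()                      # two readers hold the lock at once
done = []

def writer():
    rw.w_lock()
    done.append("wrote")
    rw.w_unlock()

w = threading.Thread(target=writer)
w.start()
w.join(timeout=0.1)              # the writer is still blocked by the readers
blocked = (done == [])
rw.r_unlock()
rw.r_unlock()
w.join()
print(blocked, done)
```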
Hardware read-write lock elision
readers run without instrumentation:
- no H/W Txs
- no S/W tracking of read locations
- no lock acquisition
writers run in H/W Txs:
- no lock acquisition
- H/W tracking of read/write locations
Hardware read-write lock elision
Reader: r-lock, r(X), r(?), r-unlock
Writer: w-lock (Begin H/W Tx), w(X), w(Y), w-unlock (Commit H/W Tx)
- ? = Y, read after the writer commits: the reader sees the old X and the new Y
- ? = Y, read before the writer commits: the writer's H/W Tx aborts
- w-unlock: wait for the concurrent readers active here, then commit
Hardware read-write lock elision
R1 Writer
r-unlock r-unlock w-unlock w-unlock
reader state R1 inactive R2 active . .
R2
r-lock
reader state R1 inactive R2 inactive . .
abort
Hardware read-write lock elision
Reader: r-lock, r(X), r(?), r-unlock
Writer: w-lock (Begin H/W Tx), w(X), w(Y), w-unlock (Suspend H/W Tx, wait for concurrent readers active here, Resume H/W Tx, Commit H/W Tx)
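The writer's wait for active readers can be sketched in Python as follows. This covers only the waiting logic that would run while the H/W Tx is suspended; the `ReaderState` class and its per-reader counters (odd means active) are assumptions made for illustration, where the slides show only active/inactive flags.

```python
import time

class ReaderState:
    """Per-reader counters: odd means active, even means inactive."""

    def __init__(self, n_readers):
        self.counters = [0] * n_readers

    def r_lock(self, i):
        self.counters[i] += 1      # even -> odd: reader i becomes active

    def r_unlock(self, i):
        self.counters[i] += 1      # odd -> even: reader i becomes inactive

    def snapshot(self):
        return list(self.counters)

    def still_active(self, snap):
        # Readers that were active at snapshot time and have not unlocked since.
        return [i for i, c in enumerate(snap)
                if c % 2 == 1 and self.counters[i] == c]

    def wait_for_readers(self, snap):
        # Writer side: spin until every reader from the snapshot has left.
        while self.still_active(snap):
            time.sleep(0)

state = ReaderState(2)
state.r_lock(1)                        # R2 is active, R1 is inactive
snap = state.snapshot()
before = state.still_active(snap)      # R2 must be waited for
state.r_unlock(1)
state.r_lock(1)                        # R2 leaves, then starts a new section
after = state.still_active(snap)       # nothing left to wait for
state.wait_for_readers(snap)           # returns immediately
print(before, after)
```

Counters rather than boolean flags let the writer skip readers that started after the snapshot, since only readers active at that point need to finish.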
Hardware read-write lock elision
Writes use H/W Txs
- concurrency among writers ✔
- Txs may never commit
- fallback lock is a must
- readers must synchronize with lock holder ✘
Hardware read-write lock elision
Writer in lock fallback: w-lock (acquire lock, wait for concurrent readers), w(X), w(Y), w-unlock (release lock)
Reader already active: r-lock, r(X) (old X), r-unlock
New reader: r-lock (check lock, wait for lock), r(Y)
Rollback Only Transactions
With H/W Txs:
Thread 1: lock (Begin H/W Tx), r(X)
Thread 2: lock (Begin H/W Tx), w(X): abort
With ROTs:
Thread 1: lock (Begin ROT), r(X) (old X), r(X) (new X), unlock (Commit ROT)
Thread 2: lock (Begin ROT), w(X), unlock (Commit ROT)
Using Rollback Only Transactions (ROTs)
- atomic ✔
- no tracking of reads ✘
- not serializable ✘
- allow larger Txs ✔
- no need for Suspend/Resume ✔
- single writer ✘
Experiments
synthetic benchmarks
- degree of contention
- length of transactions
complex benchmarks and applications
- STMBench7
- TPC-C
- Kyoto Cabinet
10 cores, 80 H/W threads
RW-LE OPT (optimistic): HTM, ROT, GL
RW-LE PES (pessimistic): ROT, GL
Compared against: HLE, BRLock, RWL, SGL
Synthetic benchmarks
- degree of contention
- size of Txs
Schemes: RW-LE OPT, RW-LE PES, HLE, BRLock, RWL, SGL
Stress test – performance
10% writers
[Figure: time (s) vs. number of threads (16, 32, 64, 80) for RW-LE OPT, RW-LE PES, HLE, BRLock, RWL, SGL; panels: low/high contention × small/large Txs; annotations: 8X, 10X, 7X]
Stress test – abort rate
10% writers
[Figure: aborts (%) for HLE, RW-LE PES, RW-LE OPT, broken down into HTM tx, HTM non-tx, HTM capacity, lock aborts, ROT conflicts, ROT capacity; panels: low/high contention × small/large Txs]
Stress test
90% writers
[Figure: time (s) vs. number of threads (16, 32, 64, 80) for RW-LE OPT, RW-LE PES, HLE, BRLock, RWL, SGL; panels: low/high contention × small/large Txs; annotations: -10%, -25%]
Synthetic benchmarks
[Figure: commits (%) by mode (HTM, ROT, SGL, uninstrumented) for HLE, RW-LE PES, RW-LE OPT with 1%, 10%, 90% write locks; number of threads: 2, 4, 8, 16, 32, 64, 80; panels: low/high contention × low/high capacity]
Synthetic benchmarks
[Figure: aborts (%) by cause (HTM tx, HTM non-tx, HTM capacity, lock aborts, ROT conflicts, ROT capacity) for HLE, RW-LE PES, RW-LE OPT with 1%, 10%, 90% write locks; number of threads: 2, 4, 8, 16, 32, 64, 80; panels: low/high contention × low/high capacity]
TPC-C
[Figure: speedup (vs. SGL with 1 thread) vs. number of threads (16, 32, 64, 80) for RW-LE OPT, RW-LE PES, HLE, BRLock, RWL, SGL with 1%, 10%, 50% write locks; annotation: 6X]
TPC-C
[Figure: commits (%) by mode (HTM, ROT, SGL, uninstrumented) for HLE, RW-LE PES, RW-LE OPT with 1%, 10%, 90% write locks; number of threads: 1, 4, 8, 16, 32, 64, 80]
TPC-C
[Figure: aborts (%) by cause (HTM tx, HTM non-tx, HTM capacity, lock aborts, ROT conflicts, ROT capacity) for HLE, RW-LE PES, RW-LE OPT with 1%, 10%, 50% write locks; number of threads: 1, 4, 8, 16, 32, 64, 80]
STMbench7
[Figure: throughput (10^3 Tx/s) vs. number of threads (16, 32, 64, 80) for RW-LE OPT, RW-LE PES, HLE, BRLock, RWL, SGL with 10%, 50%, 90% write locks; annotation: 4X]
STMbench7
[Figure: commits (%) by mode (HTM, ROT, SGL, uninstrumented) for HLE, RW-LE PES, RW-LE OPT with 10%, 50%, 90% write locks; number of threads: 2, 4, 8, 16, 32, 64, 80]
STMbench7
[Figure: aborts (%) by cause (HTM tx, HTM non-tx, HTM capacity, lock aborts, ROT conflicts, ROT capacity) for HLE, RW-LE PES, RW-LE OPT with 10%, 50%, 90% write locks; number of threads: 2, 4, 8, 16, 32, 64, 80]
Kyoto Cabinet
[Figure: throughput (10^6 Tx/s) vs. number of threads (16, 32, 64) for RW-LE OPT, RW-LE PES, HLE, BRLock, Orig, SGL with <1%, 5%, 10% write locks; annotation: 2X]
Kyoto Cabinet
[Figure: commits (%) by mode (HTM, ROT, SGL, uninstrumented) for HLE, RW-LE PES, RW-LE OPT with <1%, 5%, 10% write locks; number of threads: 1, 4, 8, 16, 32, 64]
Kyoto Cabinet
[Figure: aborts (%) by cause (HTM tx, HTM non-tx, HTM capacity, lock aborts, ROT conflicts, ROT capacity) for HLE, RW-LE PES, RW-LE OPT with <1%, 5%, 10% write locks; number of threads: 1, 4, 8, 16, 32, 64]
Conclusions
- readers without instrumentation
- writers using H/W Tx
- suspend/resume
- ROTs
- 10x