Low-Overhead Software Transactional Memory with Progress Guarantees and Strong Semantics
Minjia Zhang, Jipeng Huang, Man Cao, Michael D. Bond
Do We Need Efficient STM?
Problem Solved! Blue Gene/Q
Problem Solved? HTM is limited
[1] I. Calciu et al. Invyswell: A Hybrid Transactional Memory for Haswell’s Restricted Transactional Memory. In PACT, 2014.
[2] R. M. Yoo et al. Performance Evaluation of Intel Transactional Synchronization Extensions for High-Performance
Transaction:
atomic {
  from.balance -= amount;
  to.balance += amount;
}
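Java has no `atomic` keyword, so the transfer above can only be approximated. The sketch below uses one global lock to stand in for the atomic block; this is an assumption made for illustration and is not how an STM executes transactions. The `Account` class and the helper `totalAfterConcurrentTransfers` are hypothetical names; only `from`, `to`, `balance`, and `amount` come from the slide.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical account type; only the field name `balance` comes from the slide.
final class Account {
    long balance;
    Account(long balance) { this.balance = balance; }
}

final class AtomicTransferSketch {
    // One global lock stands in for `atomic` (an assumption, not an STM).
    private static final ReentrantLock GLOBAL = new ReentrantLock();

    static void transfer(Account from, Account to, long amount) {
        GLOBAL.lock();
        try {
            from.balance -= amount;
            to.balance += amount;
        } finally {
            GLOBAL.unlock();
        }
    }

    // Starts `threads` workers; even-numbered workers move 1 unit a -> b,
    // odd-numbered workers move 1 unit b -> a, `rounds` times each.
    // Returns the combined balance once all workers have finished.
    static long totalAfterConcurrentTransfers(int threads, int rounds) {
        Account a = new Account(1_000);
        Account b = new Account(1_000);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final boolean forward = (i % 2 == 0);
            workers[i] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    if (forward) transfer(a, b, 1); else transfer(b, a, 1);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return a.balance + b.balance;
    }
}
```

Because both updates happen inside one critical section, the combined balance is preserved no matter how the workers interleave.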
[1] C. Cascaval et al. Software Transactional Memory: Why Is It Only a Research Toy? In CACM, 2008.
[2] A. Dragojević et al. Why STM Can Be More than a Research Toy. In CACM, 2011.
[3] R. M. Yoo et al. Kicking the Tires of Software Transactional Memory: Why the Going Gets Tough. In SPAA, 2008.
T1: atomic { … … = o.f; … = p.g; … p.g = …; … }
T2: p.g = …
T2: t.k = …
Instrumentation?
Object o: field f and a lock state, where lock state ∈ {WrExT, RdExT, RdSh}
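One way to read the three lock states is as a per-object ownership tag that tells an access whether it can proceed without contacting another thread. The sketch below is a simplified reading, not the presented implementation: the names `Kind`, `LockState`, `readIsFastPath`, and `writeIsFastPath` are hypothetical, and the real fast-path conditions may differ.

```java
// Hypothetical encoding of the three lock-state kinds named on the slide.
enum Kind { WR_EX, RD_EX, RD_SH }

final class LockState {
    final Kind kind;
    final int owner; // owning thread id for WR_EX / RD_EX; ignored for RD_SH

    LockState(Kind kind, int owner) {
        this.kind = kind;
        this.owner = owner;
    }

    // Assumption: a read proceeds directly if the object is read-shared
    // or the reading thread already owns it exclusively.
    boolean readIsFastPath(int thread) {
        return kind == Kind.RD_SH || owner == thread;
    }

    // Assumption: a write proceeds directly only if the writing thread
    // holds write-exclusive ownership.
    boolean writeIsFastPath(int thread) {
        return kind == Kind.WR_EX && owner == thread;
    }
}
```

Any access that fails these checks would need a state change before it can proceed.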
Time ↓ (threads T1, T2)
T1: transaction start (txn id: 42); lock state: WrExT1
T1: update last txn: 42
T1: add to undo log
T1: update; last txn: 1 42
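The steps on these slides (tag the object with the current transaction id, add to the undo log, update in place) can be sketched as follows. This is a minimal illustration under assumptions: `TxObject`, `UndoLogTxn`, and the rule "log only on the first write by this transaction" are hypothetical choices, not details confirmed by the slides.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical object with one field `f` and a "last txn" header word.
final class TxObject {
    int f;
    int lastTxn = -1; // id of the last transaction that logged this object
}

final class UndoLogTxn {
    private static final class Entry {
        final TxObject obj;
        final int oldF;
        Entry(TxObject obj, int oldF) { this.obj = obj; this.oldF = oldF; }
    }

    final int id;
    private final Deque<Entry> undoLog = new ArrayDeque<>();

    UndoLogTxn(int id) { this.id = id; }

    // Eager (in-place) write: the first time this transaction touches the
    // object, save the old value and record the transaction id; then update.
    void write(TxObject o, int value) {
        if (o.lastTxn != id) {
            undoLog.push(new Entry(o, o.f));
            o.lastTxn = id;
        }
        o.f = value;
    }

    int undoLogSize() { return undoLog.size(); }

    // Abort: restore saved values, most recently logged first.
    void abort() {
        while (!undoLog.isEmpty()) {
            Entry e = undoLog.pop();
            e.obj.f = e.oldF;
        }
    }
}
```

Checking `lastTxn` keeps repeated writes to the same object from growing the log, which is one plausible reason to store a per-object transaction id.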
T1: … …
No synchronization on T1’s accesses to o
Problem! T2 starts coordination
T2: update lock state to IntT2
T2: request to T1 (for … = o.f)
T1: … …; reaches a safe point
Detecting Conflicts
Contention Management: detected conflicts
Resolving Conflicts
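The request flow on these slides (move the object to an intermediate state, send a request, wait until the owner reaches a safe point) can be modeled in a few lines. The sketch is a simplified model under assumptions: the class and method names are hypothetical, state names are plain strings copied from the slide labels, and conflict detection is reduced to a comment.

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class CoordObject {
    private String lockState;
    CoordObject(String initial) { lockState = initial; }
    synchronized String state() { return lockState; }
    // Changes the state only if it currently equals `expected`.
    synchronized boolean changeState(String expected, String next) {
        if (!lockState.equals(expected)) return false;
        lockState = next;
        return true;
    }
}

final class CoordRequest {
    private boolean responded;
    synchronized void respond() { responded = true; }
    synchronized boolean isResponded() { return responded; }
}

final class OwnerSide {
    private final Queue<CoordRequest> inbox = new ArrayDeque<>();
    synchronized void deliver(CoordRequest r) { inbox.add(r); }

    // Called by the owning thread at a safe point: answer all pending
    // requests and return how many were answered.
    synchronized int safePoint() {
        int answered = 0;
        CoordRequest r;
        while ((r = inbox.poll()) != null) {
            // Conflict detection and resolution would happen here.
            r.respond();
            answered++;
        }
        return answered;
    }
}

final class RequesterSide {
    private final String name;
    RequesterSide(String name) { this.name = name; }

    // Step 1: put the object into the intermediate state, then send the request.
    // Returns null if the object was not in the expected state.
    CoordRequest begin(CoordObject o, String expected, OwnerSide owner) {
        if (!o.changeState(expected, "Int" + name)) return null;
        CoordRequest r = new CoordRequest();
        owner.deliver(r);
        return r;
    }

    // Step 2: once the owner has responded, take write-exclusive ownership.
    // Returns false while the response is still pending.
    boolean finish(CoordObject o, CoordRequest r) {
        if (!r.isResponded()) return false;
        return o.changeState("Int" + name, "WrEx" + name);
    }
}
```

The intermediate state keeps other threads from changing the object's state while the requester waits for the response.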
No conflict: T1's txn id is 43, o's last txn is 42
Conflict: T1's txn id is 42, o's last txn is 42
T2: waiting; T1: response at a safe point
T1 may abort; T2 may abort
Starvation and livelock freedom
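Since either side may abort, some policy has to pick the loser in a way that lets every transaction finish eventually. A common choice is an age-based policy, sketched below; whether the presented system uses exactly this rule is an assumption, and the names `AgedTxn`, `AgeBasedManager`, and `loser` are hypothetical.

```java
// A transaction tagged with the time of its first start. The stamp is kept
// across retries, so a repeatedly aborted transaction keeps getting older
// relative to newly started ones.
final class AgedTxn {
    final int thread;
    final long startStamp;
    int attempts = 1;

    AgedTxn(int thread, long startStamp) {
        this.thread = thread;
        this.startStamp = startStamp;
    }

    // Restart after an abort; the original stamp is deliberately unchanged.
    void retry() { attempts++; }
}

final class AgeBasedManager {
    // Returns the transaction that must abort: the younger one (larger
    // stamp). Ties are broken by thread id so the choice is deterministic.
    static AgedTxn loser(AgedTxn a, AgedTxn b) {
        if (a.startStamp != b.startStamp) {
            return a.startStamp > b.startStamp ? a : b;
        }
        return a.thread > b.thread ? a : b;
    }
}
```

Under this rule the oldest running transaction never loses a conflict, and a retried transaction eventually becomes the oldest, which is the usual argument for freedom from livelock and starvation.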
Transactional vs. Transactional Conflict
T2: transactional access, request, waiting for response
abort, then retry (transaction start)
Transactional vs. Non-transactional Conflict
T2: non-transactional access, request, waiting for response
T1: abort, then retry
T2: non-transactional access (… = o.f), request
T1: transaction end, safe point, response
Non-transactional accesses: short transactions, no setting up/tearing down cost
T1: transaction end; response at a safe point
T2: transaction start (txn id: 51); request, waiting for response
T2: acquire lock; lock state: WrExT2
T2: update, add to undo log; last txn: 2 51
Two versions of coordination protocol
[Timeline figure: T1 and T2 over time, running txn: 42, txn: 43, txn: 51, and txn: 52 with accesses … = o.f; request and response exchanged at safe points]
Handling High Contention
[Timeline figure: T1 and T2 over time, running txn: 42, txn: 43, txn: 51, and txn: 52 with accesses … = o.f]
[1] B. Saha et al. McRT-STM: A High Performance Software Transactional Memory System for a Multi-Core Runtime. In PPoPP, 2006.
[2] T. Shpeisman et al. Enforcing Isolation and Ordering in STM. In PLDI, 2007.
[3] L. Dalessandro et al. NOrec: Streamlining STM by Abolishing Ownership Records. In PPoPP, 2010.
System           Write concurrency control                     Read concurrency control
LarkTM-O         Eager per-object biased reader–writer lock    Eager per-object biased reader–writer lock
LarkTM-S         IntelSTM–LarkTM-O hybrid                      IntelSTM–LarkTM-O hybrid
IntelSTM [1,2]   Eager per-object lock                         Lazy version validation
NOrec [3]        Lazy global seqlock                           Lazy value validation
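To make "lazy value validation" concrete: a reader records each value it observes and later re-checks that the same locations still hold those values. The sketch below shows only that value-checking step and omits the global seqlock from the table; `Cell` and `ValueValidatingReader` are hypothetical names, and this is not the NOrec implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shared location holding a single int.
final class Cell {
    volatile int value;
    Cell(int value) { this.value = value; }
}

final class ValueValidatingReader {
    private static final class ReadEntry {
        final Cell cell;
        final int seen;
        ReadEntry(Cell cell, int seen) { this.cell = cell; this.seen = seen; }
    }

    private final List<ReadEntry> readSet = new ArrayList<>();

    // Read a cell and remember the value that was observed.
    int read(Cell c) {
        int v = c.value;
        readSet.add(new ReadEntry(c, v));
        return v;
    }

    // Lazy validation: the reads are still consistent if every cell
    // currently holds the value first observed by this reader.
    boolean validate() {
        for (ReadEntry e : readSet) {
            if (e.cell.value != e.seen) return false;
        }
        return true;
    }
}
```

Because the check compares values rather than versions, a location that was changed and then changed back still validates, which is the key difference from version validation.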
System      Instrumented accesses
LarkTM-O    All accesses
LarkTM-S    All accesses
IntelSTM    All accesses
NOrec       All transactional accesses
(except redundant accesses)
System      Progress guarantee
LarkTM-O    Livelock and starvation free
LarkTM-S    Livelock and starvation free
IntelSTM    None
NOrec       Livelock free
System      Semantics
LarkTM-O    Strong atomicity
LarkTM-S    Strong atomicity
IntelSTM    Strong atomicity
NOrec       Single global lock atomicity (SLA)
decisions, redundant barrier analysis, name-mangling)
the Jikes RVM Research Archive
[Figure: Overhead (%) for NOrec, IntelSTM, LarkTM-O, and LarkTM-S; y-axis ticks from 50 to 300; off-scale bar labels 610 and 2870; annotated values 40% and 73%]
[Figure: Speedup vs. Threads (1, 2, 4, 8) for NOrec, IntelSTM, LarkTM-O, and LarkTM-S; y-axis ticks from 0.2 to 2]
Low instrumentation
Scales well
Strong progress guarantees
Strong semantics