Practical Parallel Nesting for Software Transactional Memory
Nuno Diegues and João Cachopo
ndiegues@gsd.inesc-id.pt
Problem
[Figure: speedup (0 to 18) versus # threads (1, 2, 4, 8, 16, 32, 48), curves A and B]
A B C D B B D D D C
[Diagram: tree of nested transactions, each annotated with an nClock counter. After one transaction is marked committed, one nClock reads 1 and the others read 0. Later frames add per-transaction write-sets, a read operation, and the values held in shared memory.]
◮ uncommon case, however
◮ regardless of depth
◮ less metadata
◮ assumes no concurrent writes
  ⋆ for performance, not correctness
  ⋆ slow-mode fallback path
  ⋆ if common, exploit other inner parallelism level
[Diagram: variables X and Y, each with a permanent list and a tentative list. Permanent entries carry the fields version, value, and previous (values 99 and 42, versions 15 and 7 are shown). Tentative entries carry value and previous (X: 10; Y: 15 and 6) and are associated with transactions E, A, and B, each of which has a write-set. In the following frames three transactions are marked committed, one after another.]
[Diagram: variable Y with a permanent list and a tentative entry. The first frame shows a write by A. Later frames show the tentative value 42, transaction C, two writes, and three committed markers. One frame is marked aborted, and the final frame shows the value 10.]
1 Can we use parallel nesting to improve performance of STMs?
2 Compare with other STMs with parallel nesting support
3 Overhead assessment
[Figures: speedup (0.5 to 2.5) versus # threads, given as top-level txs x nested txs (1x1, 1x2, 1x3, 2x3, 4x3, 8x3, 16x3)]
[Figures: speedup (up to 14, then up to 18) versus # threads, given as top-level txs x nested txs (1x1, 1x2, 1x4, 1x8, 1x16, 2x16, 3x16)]
[Figure: throughput (10^5 ops/sec, 10 to 70) versus # threads (1x1, 1x2, 1x4, 1x8, 1x16, 2x16, 3x16); series: jvstm-tl, jvstm-pn, nestm-tl, nestm-pn, pnstm-tl, pnstm-pn]
[Figure: throughput (10^5 ops/sec, 1 to 6) versus nesting depth (1, 8, 32, 128); series: jvstm, nestm, pnstm]
◮ ...and exploit inner-parallelism
◮ Parallel Nesting is key to achieve that
◮ long running transactions
◮ improvements in motivation scenario
◮ ...and with respect to state of the art
◮ http://inesc-id-esw.github.io/jvstm/
[Animation, frame by frame]
Global latest commit: 0
1 A transaction starts with starting version: 0.
2 It executes w(x,6). write set: { x: 6 }
3 It commits. Global latest commit: 1. variable X, permanent: version 1, value 6.
4 Transaction B starts with starting version: 1. Orec1: status Alive, owner B.
5 B executes w(x,0). write set: { x }. variable X gains a tentative entry.
6 B commits. Global latest commit: 2. Orec1: status 2, owner B. variable X, permanent: version 2, whose previous entry is version 1, value 6.
7 Transaction C starts with starting version: 2. Orec2: status Alive, owner C. write set: { y, z, w, ... }. r(x) = 0
8 C executes w(x,42). write set: { x, y, z, w, ... }. variable X, tentative: value 42.
9 C aborts. Orec2: status Abort, owner C.
10 Another transaction starts with starting version: 2 and write set: { }, and executes r(x) = ?
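The frames above suggest that each variable keeps a list of committed versions, and that a transaction reads the newest entry that is no newer than its starting version. The following is a minimal Java sketch of that reading rule only, under stated assumptions: the names `Version`, `VBox`, `Clock`, and `commit` are hypothetical, the initial value -1 is invented for illustration, and commits are serialized with a plain lock, which is not necessarily how JVSTM does it.

```java
public class VersionedBoxSketch {
    /** One entry of a variable's permanent list: version, value, previous. */
    static final class Version<T> {
        final int version;
        final T value;
        final Version<T> previous;

        Version(int version, T value, Version<T> previous) {
            this.version = version;
            this.value = value;
            this.previous = previous;
        }
    }

    /** A variable holding its committed versions, newest first. */
    static final class VBox<T> {
        private volatile Version<T> permanent;

        VBox(T initial) {
            // Assumption: the initial value is stored as version 0.
            permanent = new Version<>(0, initial, null);
        }

        /** Returns the newest value whose version is <= startingVersion. */
        T read(int startingVersion) {
            Version<T> v = permanent;
            while (v.version > startingVersion) {
                v = v.previous;
            }
            return v.value;
        }

        void install(int version, T value) {
            permanent = new Version<>(version, value, permanent);
        }
    }

    /** Hypothetical stand-in for the "Global latest commit" counter. */
    static final class Clock {
        private int latestCommit = 0;

        synchronized int latest() {
            return latestCommit;
        }

        /** Installs the value as the next version, then publishes the new commit number. */
        synchronized <T> int commit(VBox<T> box, T value) {
            int next = latestCommit + 1;
            box.install(next, value);
            latestCommit = next;
            return next;
        }
    }

    public static void main(String[] args) {
        Clock clock = new Clock();
        VBox<Integer> x = new VBox<>(-1);
        int start = clock.latest();                 // starting version 0
        clock.commit(x, 6);                         // w(x,6) commits as version 1
        clock.commit(x, 0);                         // w(x,0) commits as version 2
        System.out.println(x.read(start));          // -1: version 0 is still visible
        System.out.println(x.read(1));              // 6
        System.out.println(x.read(clock.latest())); // 0, matching r(x) = 0 at starting version 2
    }
}
```

Because old versions stay reachable through `previous`, a transaction that started at version 0 keeps reading a consistent snapshot even after later commits. The sketch leaves out tentative entries and ownership records entirely.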
[Diagram: read-set pool. Reads r(x) are recorded in a new read-set. Later frames show a read-set pool used by thread1 and thread2, a transaction marked committed, and pooled entries with the fields array, isFree (true or false), and counter (thread2's), linked by next. An Atomic Counter advances from 1 to 2.]
[Diagram: tree of nested transactions. Every transaction has a clock (initially 0) and an ancV map of ancestor versions (for example ancV A: 0 B: 0). Variable Y has a permanent list and tentative entries (values 15 and 5); write-sets wS: { Y }. Orec1: status Alive, nestedVer 0, owner A. Orec5: status Alive, nestedVer 0, owner E. After a transaction is marked committed, Orec5 shows nestedVer 1 and owner B, and one clock reads 1. A transaction is marked restart, after which its ancV reads A: 0 B: 1 and its read-set shows rS: { Y -> IP }. After a second commit one clock reads 2.]
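One plausible reading of the diagram above is a visibility check: a nested transaction may read a tentative entry owned by an ancestor only if the entry's nestedVer is no greater than the version of that ancestor recorded in the reader's ancV when it started. The Java sketch below illustrates just that check, under stated assumptions: the class and method names (`Tx`, `Orec`, `commitChild`, `canRead`) are hypothetical, the code is single-threaded, and it is not taken from the JVSTM sources.

```java
import java.util.HashMap;
import java.util.Map;

public class NestedVisibilitySketch {
    /** A transaction with a clock and a snapshot of its ancestors' clocks (ancV). */
    static final class Tx {
        final Tx parent;
        int clock = 0;
        final Map<Tx, Integer> ancV = new HashMap<>();

        Tx(Tx parent) {
            this.parent = parent;
            // Record each ancestor's clock as seen when this transaction starts.
            for (Tx a = parent; a != null; a = a.parent) {
                ancV.put(a, a.clock);
            }
        }
    }

    /** Ownership record guarding a tentative entry. */
    static final class Orec {
        Tx owner;
        int nestedVer;

        Orec(Tx owner, int nestedVer) {
            this.owner = owner;
            this.nestedVer = nestedVer;
        }
    }

    /** Commits a child into its parent: the parent takes ownership and its clock advances. */
    static void commitChild(Tx child, Orec... written) {
        Tx parent = child.parent;
        int ver = parent.clock + 1;
        for (Orec o : written) {
            o.owner = parent;
            o.nestedVer = ver;
        }
        parent.clock = ver;
    }

    /** True if the reader owns the entry or saw the owning ancestor at nestedVer or later. */
    static boolean canRead(Tx reader, Orec orec) {
        if (orec.owner == reader) {
            return true;
        }
        Integer seen = reader.ancV.get(orec.owner);
        return seen != null && orec.nestedVer <= seen;
    }

    public static void main(String[] args) {
        Tx a = new Tx(null);
        Tx b = new Tx(a);
        Tx e = new Tx(b);
        Tx sibling = new Tx(b);                    // starts with ancV A: 0 B: 0
        Orec y = new Orec(e, 0);                   // E writes Y
        commitChild(e, y);                         // Orec now: owner B, nestedVer 1
        System.out.println(canRead(sibling, y));   // false: the sibling must restart
        Tx restarted = new Tx(b);                  // restarts with ancV A: 0 B: 1
        System.out.println(canRead(restarted, y)); // true
    }
}
```

In this reading, the restart shown in the diagram corresponds to the first `canRead` call failing, and the updated ancV entry for B corresponds to the restarted transaction taking a fresh snapshot.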
[Diagram: timeline of transactions A, B, and C with the operations write x=1, write y=42, read x:1, write y=0, write x:2, read x:2, and write z=2. The value of "y" for A is 42.]