NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY
Tim Harris, 25 November 2016
Lecture 8
– Problems with locks
– Atomic blocks and composition
– Hardware transactional memory
– Software transactional memory
– Transactional
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Art of Multiprocessor Programming
In this course, we covered …
– Best practices …
– New and clever ideas …
– And common-sense observations.
Nevertheless …
– Concurrent programming is still too hard …
– Here we explore why this is …
– And what we can do about it.
Easily made correct … But not scalable.
Can be tricky …
If a thread holding a lock is delayed … No one else can make progress
– The relation between locks and objects exists only in the programmer’s mind
/*
 * When a locked buffer is visible to the I/O layer
 * BH_Launder is set. This means before unlocking
 * we must clear BH_Launder, mb() on alpha and then
 * clear BH_Lock, so no reader can see BH_Launder set
 * on an unlocked buffer and then risk to deadlock.
 */
Actual comment from Linux Kernel
(hat tip: Bradley Kuszmaul)
enq(x), enq(y) on a double-ended queue
– No interference if ends are “far apart”
– Interference OK if queue is small
– A clean solution is a publishable result:
[Michael & Scott PODC 96]
Transfer item from one queue to another.
Must be atomic: no duplicate or missing items.
Lock source
Lock target
Unlock source & target
– Methods cannot provide internal synchronization
– Objects must expose locking protocols to clients
– Clients must devise and follow protocols
– Abstraction broken!
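The lock source / lock target / unlock steps can be sketched in Java roughly as follows. This is an illustration only, not code from the slides: `LockedQueue`, `LockTransfer`, and the id-based lock ordering are assumptions chosen to show how the queue has to expose its lock and how the client has to follow a protocol to avoid deadlock.

```java
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical queue that must expose its lock so clients can compose operations.
class LockedQueue<T> {
    private static final AtomicInteger NEXT_ID = new AtomicInteger();
    final int id = NEXT_ID.getAndIncrement();        // used only to order lock acquisition
    final ReentrantLock lock = new ReentrantLock();  // exposed to clients: the abstraction leaks
    private final ArrayDeque<T> items = new ArrayDeque<>();

    void enq(T x) { lock.lock(); try { items.addLast(x); } finally { lock.unlock(); } }
    T deq()       { lock.lock(); try { return items.pollFirst(); } finally { lock.unlock(); } }
    int size()    { lock.lock(); try { return items.size(); } finally { lock.unlock(); } }
}

class LockTransfer {
    // Client-side protocol (an assumption of this sketch): always lock the
    // queue with the smaller id first, so two opposite transfers cannot deadlock.
    static <T> void transfer(LockedQueue<T> source, LockedQueue<T> target) {
        LockedQueue<T> first = source.id < target.id ? source : target;
        LockedQueue<T> second = (first == source) ? target : source;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                T x = source.deq();            // reentrant: this thread already holds the lock
                if (x != null) target.enq(x);  // no other thread can see the item "in flight"
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }
}
```

Every client that wants an atomic transfer has to know about `lock` and about the ordering rule, which is exactly the broken abstraction the slide complains about.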
If buffer is empty, wait for item to show up.
[Figure: a consumer sleeps (“zzz”) on an empty buffer until an item shows up.]
Wait for either?
[Figure: a consumer sleeps (“zzz…”) in front of two empty buffers.]
– to meet the multicore challenge
– Replace locking with a transactional API
– Design languages or libraries
– Implement efficient run-time systems
Block of code …
– Atomic: appears to happen instantaneously
– Serializable: all appear to happen in one-at-a-time order
– Commit: takes effect (atomically)
– Abort: has no effect (typically restarted)
atomic {
  x.remove(3);
  y.add(3);
}

atomic {
  y = null;
}
No data race
public void LeftEnq(item x) {
  Qnode q = new Qnode(x);
  q.left = left;
  left.right = q;
  left = q;
}
Write sequential Code
public void LeftEnq(item x) {
  atomic {
    Qnode q = new Qnode(x);
    q.left = left;
    left.right = q;
    left = q;
  }
}
Enclose in atomic block
– Conditional waits
– Enhanced concurrency
– Complex patterns
public void Transfer(Queue<T> q1, Queue<T> q2) {
  atomic {
    T x = q1.deq();
    q2.enq(x);
  }
}
Trivial or what?
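Java has no `atomic` keyword, so the composition shown here can only be approximated. One rough stand-in, assumed for illustration only, is to read `atomic { … }` as "run this body while holding one global reentrant lock". The names `Atomic`, `AtomicQueue`, and `AtomicTransfer` are hypothetical. The sketch gives the one-at-a-time semantics but none of the concurrency a real transactional memory aims for.

```java
import java.util.ArrayDeque;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical stand-in for `atomic { ... }`: a single global reentrant lock.
class Atomic {
    private static final ReentrantLock GLOBAL = new ReentrantLock();

    static <R> R run(Supplier<R> body) {
        GLOBAL.lock();
        try {
            return body.get();
        } finally {
            GLOBAL.unlock();
        }
    }
}

// Queue whose methods are individually atomic and hide all synchronization.
class AtomicQueue<T> {
    private final ArrayDeque<T> items = new ArrayDeque<>();

    void enq(T x) { Atomic.run(() -> { items.addLast(x); return null; }); }
    T deq()       { return Atomic.run(() -> items.pollFirst()); }
    int size()    { return Atomic.run(() -> items.size()); }
}

class AtomicTransfer {
    // Composition: two atomic methods nested inside a larger atomic block.
    // The inner blocks re-enter the global lock, so the client needs no
    // knowledge of how the queues synchronize internally.
    static <T> void transfer(AtomicQueue<T> q1, AtomicQueue<T> q2) {
        Atomic.run(() -> {
            T x = q1.deq();
            if (x != null) q2.enq(x);
            return null;
        });
    }
}
```

Unlike the lock-based transfer, the client here never sees a lock and follows no ordering protocol; that is the property the slide calls trivial.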
public T LeftDeq() {
  atomic {
    if (left == null)
      retry;
    …
  }
}
Roll back transaction and restart when something changes
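A rough way to picture `retry` in plain Java, assuming the same global-lock reading of `atomic`, is to throw a signal from the body, wait until some other atomic block finishes, and then run the body again. All names below (`RetryException`, `RetryAtomic`, `RetryDeque`) are hypothetical, and the sketch assumes the body calls `retry` before it has written anything, so no rollback is needed.

```java
import java.util.ArrayDeque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical signal thrown by a body that wants to `retry`.
class RetryException extends RuntimeException {}

// Hypothetical emulation of atomic + retry: one global lock, one condition.
class RetryAtomic {
    private static final ReentrantLock GLOBAL = new ReentrantLock();
    private static final Condition CHANGED = GLOBAL.newCondition();

    static void retry() { throw new RetryException(); }

    static <R> R run(Supplier<R> body) {
        GLOBAL.lock();
        try {
            while (true) {
                try {
                    R result = body.get();
                    CHANGED.signalAll();             // something may have changed: wake waiters
                    return result;
                } catch (RetryException e) {
                    CHANGED.awaitUninterruptibly();  // releases the lock while waiting
                }
            }
        } finally {
            GLOBAL.unlock();
        }
    }
}

class RetryDeque<T> {
    private final ArrayDeque<T> items = new ArrayDeque<>();

    void leftEnq(T x) {
        RetryAtomic.run(() -> { items.addFirst(x); return null; });
    }

    T leftDeq() {
        return RetryAtomic.run(() -> {
            if (items.isEmpty()) RetryAtomic.retry();  // nothing written yet, nothing to roll back
            return items.pollFirst();
        });
    }
}
```

A thread calling `leftDeq()` on an empty deque sleeps until another thread's `leftEnq` completes, then re-runs its body from the start.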
atomic {
  x = q1.deq();
} orElse {
  x = q2.deq();
}
– Run 1st method. If it retries …
– Run 2nd method. If it retries …
– Entire statement retries
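The control flow of `orElse` can be sketched with exceptions. This is an assumption-laden illustration, not the slides' implementation: `RetrySignal` and `OrElse` are hypothetical names, the code is single-threaded, and it assumes an alternative retries before writing anything, so there is nothing to roll back when control moves to the second alternative.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Hypothetical signal thrown by a transaction body that wants to `retry`.
class RetrySignal extends RuntimeException {}

class OrElse {
    // Run the first alternative; if it retries, run the second.
    // If the second retries too, the signal propagates to the caller,
    // which stands for "the entire statement retries".
    static <R> R orElse(Supplier<R> first, Supplier<R> second) {
        try {
            return first.get();
        } catch (RetrySignal e) {
            return second.get();
        }
    }

    // Dequeue that retries when the queue is empty.
    static <T> T deq(ArrayDeque<T> q) {
        if (q.isEmpty()) throw new RetrySignal();
        return q.pollFirst();
    }
}
```

For example, `OrElse.orElse(() -> OrElse.deq(q1), () -> OrElse.deq(q2))` takes from `q1` if it has an item, otherwise from `q2`, and signals a retry only when both are empty.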
– Invalidation
– Consistency checking
– Branch prediction =
[Figure sequence: caches and memory connected by an interconnect.]
– Read: the transaction is active; the cache line is tagged T.
– Second read: two active transactions, each holding a T line.
– Commit: one transaction is committed, the other is still active.
– Write: the written line becomes dirty (D).
– Conflicting write: the transaction holding the T line is aborted.
– If no cache conflicts, we win.
– Read-only: valid
– Modified: dirty (eventually written back)
– Except for a few details …
– Transactional cache size
– Scheduling quantum
– Too big
– Too slow
– Actual limits platform-dependent
– Transaction size and length
– Bounded HW resources
– Guarantees vs best-effort
– Diagnostics essential
– Try again in software?
Locks don’t compose, transactions do.
Composition necessary for Software Engineering.
But practical HTM doesn’t really support composition!
Why we need STM
reads and writes executed atomically
– External: with respect to the interleavings
– Internal: the transaction itself should
Application memory: x = 4, y = 2 (later x = 8, y = 4)
Invariant: x = 2y
Transaction A: write x, write y
Transaction B: read x, read y, compute z = 1/(x-y) = 1/2
– Lock-based
– Lock-free
Consistency
– Read set: locations & values read
– Write set: locations & values to be written
– Changes installed at commit
– Conflicts detected at commit
[Figure: application memory is mapped to an array of version #s (V#) & locks.]
[Figures: memory words (Mem) next to their versioned locks (Locks, each holding a V#).]
On read: add version numbers & values to read set.
On write: add version numbers & new values to write set.
On commit:
– Acquire write locks
– Check version numbers unchanged
– Install new values
– Increment version numbers (V# becomes V#+1)
– Unlock
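The commit steps above can be sketched as a small word-based STM in Java. This is a minimal sketch under stated assumptions, not the implementation behind the slides: `SimpleStm`, `Tx`, and the encoding of each versioned lock as `(version << 1) | lockBit` are all hypothetical choices made for illustration, and a transaction that fails to commit is simply reported as aborted rather than restarted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical word-based STM; each word has a versioned lock: (version << 1) | lockBit.
class SimpleStm {
    final AtomicIntegerArray mem;   // application memory
    final AtomicLongArray vlocks;   // one versioned lock per word

    SimpleStm(int size) {
        mem = new AtomicIntegerArray(size);
        vlocks = new AtomicLongArray(size);
    }

    Tx begin() { return new Tx(); }

    class Tx {
        private final Map<Integer, Long> readSet = new HashMap<>();          // location -> version seen
        private final TreeMap<Integer, Integer> writeSet = new TreeMap<>();  // location -> new value
        private boolean doomed = false;

        int read(int loc) {
            Integer pending = writeSet.get(loc);
            if (pending != null) return pending;   // quick read of a value we wrote ourselves
            long before = vlocks.get(loc);
            int value = mem.get(loc);
            long after = vlocks.get(loc);
            if ((before & 1) != 0 || before != after) doomed = true;  // locked or changed under us
            Long first = readSet.putIfAbsent(loc, before);
            if (first != null && first != before) doomed = true;      // two reads saw two versions
            return value;
        }

        void write(int loc, int value) { writeSet.put(loc, value); }  // deferred update

        boolean commit() {
            if (doomed) return false;
            List<Integer> locked = new ArrayList<>();
            try {
                for (int loc : writeSet.keySet()) {                   // 1. acquire write locks
                    long v = vlocks.get(loc);
                    if ((v & 1) != 0 || !vlocks.compareAndSet(loc, v, v | 1)) return false;
                    locked.add(loc);
                }
                for (Map.Entry<Integer, Long> e : readSet.entrySet()) {  // 2. versions unchanged?
                    long seen = e.getValue();
                    long expected = writeSet.containsKey(e.getKey()) ? (seen | 1) : seen;
                    if (vlocks.get(e.getKey()) != expected) return false;
                }
                for (Map.Entry<Integer, Integer> w : writeSet.entrySet()) {
                    mem.set(w.getKey(), w.getValue());                // 3. install new values
                }
                for (int loc : locked) {
                    // 4+5. adding 1 to a locked word clears the lock bit and bumps the version
                    vlocks.set(loc, vlocks.get(loc) + 1);
                }
                locked.clear();
                return true;
            } finally {
                for (int loc : locked) vlocks.set(loc, vlocks.get(loc) - 1);  // abort: unlock only
            }
        }
    }
}
```

Write locks are taken in ascending location order (the `TreeMap` iteration order) and never waited for, so this sketch cannot deadlock; a failed lock attempt just aborts the commit.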
[Figures: memory words X and Y with their version numbers and lock bits, before and after commit (V# becomes V#+1).]
– Quick read of values freshly written by the reading transaction
– Hold locks for very short duration
[Figure: Red-Black Tree benchmark, 20% delete, 20% update, 60% lookup; series: ENC, Hand, MCS, COM.]
[Figure: Red-Black Tree benchmark, 5% delete, 5% update, 90% lookup; series: COM, ENC, Hand, MCS.]
abort.
can happen
Before: x = 4, y = 2. After: x = 8, y = 4.
Invariant: x = 2y
Transaction A: reads x = 4
Transaction B: writes 8 to x, 4 to y, aborts A
Transaction A (zombie): reads y = 4, computes 1/(x-y)
Divide by zero FAIL!
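The interleaving can be replayed deterministically in a few lines of Java. The sketch below is an assumption-level illustration: `ZombieDemo` is a hypothetical name, and the two "transactions" are just statements run in the problematic order on one thread, with no STM involved.

```java
// Hypothetical replay of the zombie interleaving on a single thread.
class ZombieDemo {
    static int x = 4, y = 2;           // invariant: x == 2 * y

    static int zombieCompute() {
        int seenX = x;                 // Transaction A reads x = 4
        x = 8;                         // Transaction B writes 8 to x ...
        y = 4;                         // ... and 4 to y, then commits (A is now doomed)
        int seenY = y;                 // zombie A reads y = 4
        return 1 / (seenX - seenY);    // 1 / (4 - 4): integer division by zero
    }
}
```

Both states (4, 2) and (8, 4) satisfy the invariant; the failure comes only from A mixing a value from the old state with a value from the new one.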
transactions
always consistent
[Figure: memory words with versioned locks (version numbers 12, 32, 56, 19, 17), a shared version clock at 100, and a private read version (RV) of 100.]
– Copy version clock to local read version clock
– Read lock, version #, and memory; check version # ≤ read clock
– On commit: check unlocked & version #s ≤ local read clock
We have taken a snapshot without keeping an explicit read set!
Read lock: check unlocked, unchanged, and v# <= RV
Reads form a snapshot of memory. No read set!
[Figure: memory words and their versioned locks, with the shared version clock and RV both at 100.]
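The per-read check for a read-only transaction can be sketched as follows. This is a minimal sketch, assuming a versioned lock encoded as `(version << 1) | lockBit`; `SnapshotStm`, `ReadOnlyTx`, `AbortException`, and the single-writer helper `writerCommit` are hypothetical names introduced only so the example can run.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch of snapshot reads against a shared version clock.
class SnapshotStm {
    final AtomicLong clock = new AtomicLong();   // shared version clock
    final AtomicIntegerArray mem;                // application memory
    final AtomicLongArray vlocks;                // (version << 1) | lockBit, one per word

    // Thrown when a read cannot belong to the transaction's snapshot.
    static class AbortException extends RuntimeException {}

    SnapshotStm(int size) {
        mem = new AtomicIntegerArray(size);
        vlocks = new AtomicLongArray(size);
    }

    ReadOnlyTx begin() { return new ReadOnlyTx(); }

    class ReadOnlyTx {
        final long rv = clock.get();             // private read version (RV)

        int read(int loc) {
            long before = vlocks.get(loc);       // read lock and version #
            int value = mem.get(loc);            // read memory
            long after = vlocks.get(loc);        // read lock again
            boolean locked = (before & 1) != 0;
            boolean changed = before != after;
            boolean tooNew = (before >>> 1) > rv;
            if (locked || changed || tooNew) throw new AbortException();
            return value;                        // consistent with the snapshot at RV: no read set
        }
    }

    // Demo-only writer, assumed to be the only writer running: it locks the
    // word, stores the value, then stamps it with a fresh clock value.
    void writerCommit(int loc, int value) {
        vlocks.set(loc, vlocks.get(loc) | 1);    // lock
        long wv = clock.incrementAndGet();       // new write version
        mem.set(loc, value);
        vlocks.set(loc, wv << 1);                // version = wv, unlocked
    }
}
```

Because every accepted read has a version no greater than RV, the transaction never observes a value written after it started, which is what rules out the zombie divide-by-zero.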
[Figure: memory words with versioned locks (version numbers 12, 32, 56, 19, 17), a shared version clock at 100, and a private read version (RV) of 100.]
– Copy version clock to local read version clock
– On read/write, check: unlocked & version # ≤ RV; add to R/W set
On commit (the shared version clock advances from 100 to 101):
– Acquire write locks
– Increment version clock
– Check version numbers ≤ RV
– Update memory
– Update write version #s (the written words get version 101)
unlocked and v# <= RV then add to Read/Write-Set
Reads+Inc+Writes = serializable
[Figure: commit of a transaction that writes X and Y. RV is 100; the shared version clock shows 100, 120, 121; the written words are locked, then stamped with version 121.]
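The commit sequence with a shared version clock can be sketched as below. This is an illustrative sketch under assumptions, not the authors' code: `ClockStm`, `Tx`, `Abort`, and the `(version << 1) | lockBit` encoding are hypothetical, and aborted transactions are reported rather than restarted.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical STM with a shared version clock; lock word = (version << 1) | lockBit.
class ClockStm {
    static class Abort extends RuntimeException {}

    final AtomicLong clock;
    final AtomicIntegerArray mem;
    final AtomicLongArray vlocks;

    ClockStm(int size, long initialClock) {
        clock = new AtomicLong(initialClock);
        mem = new AtomicIntegerArray(size);
        vlocks = new AtomicLongArray(size);
    }

    Tx begin() { return new Tx(); }

    class Tx {
        private final long rv = clock.get();                          // private read version
        private final Set<Integer> readSet = new HashSet<>();
        private final TreeMap<Integer, Integer> writeSet = new TreeMap<>();

        int read(int loc) {
            Integer pending = writeSet.get(loc);
            if (pending != null) return pending;
            long before = vlocks.get(loc);
            int value = mem.get(loc);
            long after = vlocks.get(loc);
            // unlocked, unchanged, and v# <= RV, otherwise abort at once
            if ((before & 1) != 0 || before != after || (before >>> 1) > rv) throw new Abort();
            readSet.add(loc);
            return value;
        }

        void write(int loc, int value) { writeSet.put(loc, value); }

        boolean commit() {
            List<Integer> locked = new ArrayList<>();
            try {
                for (int loc : writeSet.keySet()) {                   // acquire write locks
                    long v = vlocks.get(loc);
                    if ((v & 1) != 0 || !vlocks.compareAndSet(loc, v, v | 1)) return false;
                    locked.add(loc);
                }
                long wv = clock.incrementAndGet();                    // increment version clock
                for (int loc : readSet) {                             // check version numbers <= RV
                    long v = vlocks.get(loc);
                    boolean lockedByOther = (v & 1) != 0 && !writeSet.containsKey(loc);
                    if (lockedByOther || (v >>> 1) > rv) return false;
                }
                for (Map.Entry<Integer, Integer> w : writeSet.entrySet()) {
                    mem.set(w.getKey(), w.getValue());                // update memory
                }
                for (int loc : locked) vlocks.set(loc, wv << 1);      // stamp wv and unlock
                locked.clear();
                return true;
            } finally {
                for (int loc : locked) vlocks.set(loc, vlocks.get(loc) - 1);  // abort: unlock only
            }
        }
    }
}
```

Starting the clock at 100, as in the figures, the first committing writer moves it to 101 and stamps its written words with 101; any transaction that began with RV = 100 aborts as soon as it tries to read one of them.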
choices
issues
– Managed languages: Java, C#, …
– Easy to control interactions between transactional & non-transactional threads
– C, C++, …
– Hard to control interactions between transactional & non-transactional threads
– Modify private copies & install on commit
– Commit requires work
– Consistency easier
– Modify in place, roll back on abort
– Makes commit efficient
– Consistency harder
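The two update strategies can be contrasted in a short single-threaded sketch. Everything here is an assumption made for illustration: `UpdateStrategies`, `DeferredTx`, and `EagerTx` are hypothetical names, and conflict detection and locking are deliberately left out so that only the write buffer and the undo log remain.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-threaded sketch; no concurrency control is modelled.
class UpdateStrategies {

    // Deferred update: write to a private buffer, install on commit.
    static class DeferredTx {
        private final int[] mem;
        private final Map<Integer, Integer> buffer = new HashMap<>();

        DeferredTx(int[] mem) { this.mem = mem; }

        void write(int loc, int v) { buffer.put(loc, v); }
        int read(int loc) { return buffer.getOrDefault(loc, mem[loc]); }

        void commit() {                       // commit requires work: copy the buffer out
            buffer.forEach((loc, v) -> mem[loc] = v);
            buffer.clear();
        }

        void abort() { buffer.clear(); }      // shared memory was never touched
    }

    // Eager update: modify in place, keep an undo log for rollback.
    static class EagerTx {
        private final int[] mem;
        private final ArrayDeque<int[]> undoLog = new ArrayDeque<>();  // {location, old value}

        EagerTx(int[] mem) { this.mem = mem; }

        void write(int loc, int v) {
            undoLog.push(new int[] {loc, mem[loc]});   // remember the old value first
            mem[loc] = v;
        }

        int read(int loc) { return mem[loc]; }

        void commit() { undoLog.clear(); }    // commit is cheap: values are already in place

        void abort() {                        // roll back, newest entry first
            while (!undoLog.isEmpty()) {
                int[] entry = undoLog.pop();
                mem[entry[0]] = entry[1];
            }
        }
    }
}
```

With deferred update, other threads can never see a value from an uncommitted transaction; with eager update they could, which is one reason the slide lists consistency as harder there.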
– Detect before conflict arises
– “Contention manager” module resolves
– Detect on commit/abort
– Eager write/write, lazy read/write …
that could have committed.
computation.
conflicts?
and who rolls back?
work but formal work in infancy
– Oldest?
– Most work?
– Non-waiting?
Judgment of Solomon
– Provide transaction-safe libraries
– Undoable file system/DB calls
– Opening cash drawer
– Firing missile
irrevocable
– If transaction tries I/O, switch to irrevocable mode.
– Requires serial execution
– In irrevocable transactions
int i = 0;
try {
  atomic {
    i++;
    node = new Node();   // Throws OutOfMemoryException!
  }
} catch (Exception e) {
  print(i);
}
What is printed?
– Preserves invariants
– Safer
– Like locking semantics
– What if exception object refers to values modified in transaction?
atomic void foo() {
  bar();
}

atomic void bar() {
  …
}
– Who knew that cosine() contained a transaction?
– If child aborts, so does parent
– If child aborts, partial rollback of child only
STM is too inefficient
Requires radical change in programming style
Erlang-style shared nothing only true path to salvation
There is nothing wrong with what we do today.
Hat tip: Jeremy Kemp
You are here