Transactional Memory: Companion slides for The Art of Multiprocessor Programming (PowerPoint presentation transcript)



slide-1
SLIDE 1

Transactional Memory

Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

slide-2
SLIDE 2

Art of Multiprocessor Programming 2

Moore’s Law

Clock speed flattening sharply Transistor count still rising

slide-3
SLIDE 3

Moore’s Law (in practice)


slide-4
SLIDE 4


Nearly Extinct: the Uniprocessor

memory cpu

slide-5
SLIDE 5


Endangered: The Shared Memory Multiprocessor (SMP)

cache

Bus


shared memory

cache cache

slide-6
SLIDE 6


The New Boss: The Multicore Processor (CMP)

cache

Bus


shared memory

cache cache

All on the same chip Sun T2000 Niagara

slide-7
SLIDE 7


Traditional Scaling Process

User code Traditional Uniprocessor Speedup

1.8x 7x 3.6x Time: Moore’s law

slide-8
SLIDE 8

Ideal Scaling Process


User code Multicore Speedup

1.8x 7x 3.6x Unfortunately, not so simple…

slide-9
SLIDE 9

Actual Scaling Process


1.8x 2x 2.9x

User code Multicore Speedup

Parallelization and Synchronization require great care…

slide-10
SLIDE 10


Amdahl’s Law: Speedup = (1-thread execution time) / (n-thread execution time)

slide-11
SLIDE 11


Amdahl’s Law: Speedup = 1 / (1 − p + p/n)

slide-12
SLIDE 12


Amdahl’s Law: Speedup = 1 / (1 − p + p/n)

p = parallel fraction

slide-13
SLIDE 13


Amdahl’s Law: Speedup = 1 / (1 − p + p/n)

p = parallel fraction, n = number of threads

slide-14
SLIDE 14


Amdahl’s Law: Speedup = 1 / (1 − p + p/n)

p = parallel fraction, n = number of threads, 1 − p = sequential fraction

slide-15
SLIDE 15

Bad synchronization ruins everything

Amdahl’s Law

slide-16
SLIDE 16

16

Example


You buy a 10-core machine … Your application is: 60% concurrent 40% sequential How close to a 10-fold speedup?

slide-17
SLIDE 17

17

Example


You buy a 10-core machine … Your application is: 60% concurrent 40% sequential How close to a 10-fold speedup?

Speedup = 1 / (1 − 0.6 + 0.6/10) ≈ 2.17

slide-18
SLIDE 18

18

Example


You buy a 10-core machine … Your application is: 80% concurrent 20% sequential How close to a 10-fold speedup?

slide-19
SLIDE 19

19

Example


You buy a 10-core machine … Your application is: 80% concurrent 20% sequential How close to a 10-fold speedup?

Speedup = 1 / (1 − 0.8 + 0.8/10) ≈ 3.57

slide-20
SLIDE 20

20

Example


You buy a 10-core machine … Your application is: 90% concurrent 10% sequential How close to a 10-fold speedup?

slide-21
SLIDE 21

21

Example


You buy a 10-core machine … Your application is: 90% concurrent 10% sequential How close to a 10-fold speedup?

Speedup = 1 / (1 − 0.9 + 0.9/10) ≈ 5.26

slide-22
SLIDE 22

22

Example


You buy a 10-core machine … Your application is: 99% concurrent 01% sequential How close to a 10-fold speedup?

slide-23
SLIDE 23

23

Example


You buy a 10-core machine … Your application is: 99% concurrent 1% sequential How close to a 10-fold speedup?

Speedup = 1 / (1 − 0.99 + 0.99/10) ≈ 9.17
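The four examples boil down to one formula. A minimal sketch in C (the function name is mine, not the slides'):

```c
#include <math.h>

/* Amdahl's Law: speedup with parallel fraction p on n threads.
   The sequential fraction (1 - p) bounds the achievable speedup. */
static double amdahl_speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / (double)n);
}
```

With n = 10 this reproduces the answers above: roughly 2.17, 3.57, 5.26, and 9.17; even 99% concurrency falls short of a 10-fold speedup.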

slide-24
SLIDE 24


Diminishing Returns

(plot: speedup axis from 0.5 to 4.5, showing diminishing returns)

This course is about the parts that are hard to make concurrent … but still have a big influence on speedup!

slide-25
SLIDE 25

25

Locking


slide-26
SLIDE 26

26

Coarse-Grained Locking


Easily made correct … But not scalable.

slide-27
SLIDE 27

27

Fine-Grained Locking


Can be very tricky …

slide-28
SLIDE 28

28

Locks are not Robust


If a thread holding a lock is delayed … No one else can make progress

slide-29
SLIDE 29

Locking Relies on Conventions

/* * When a locked buffer is visible to the I/O layer * BH_Launder is set. This means before unlocking * we must clear BH_Launder,mb() on alpha and then * clear BH_Lock, so no reader can see BH_Launder set * on an unlocked buffer and then risk to deadlock. */


Relation between … Lock data and object data … Exists only in programmer’s mind

Actual comment from Linux Kernel

(hat tip: Bradley Kuszmaul)

slide-30
SLIDE 30

30

Simple Problems are hard

enq(x) enq(y) double-ended queue No interference if ends “far apart” Interference OK if queue is small Clean solution is publishable result:

[Michael & Scott PODC 97]


slide-31
SLIDE 31


Locks Not Composable

Transfer item from one queue to another Must be atomic : No duplicate or missing items

slide-32
SLIDE 32


Locks Not Composable

Lock source Lock target Unlock source & target

slide-33
SLIDE 33


Locks Not Composable

Lock source
Lock target
Unlock source & target
Methods cannot provide internal synchronization
Objects must expose locking protocols to clients
Clients must devise and follow protocols
Abstraction broken!

slide-34
SLIDE 34

34

Monitor Wait and Signal

zzz

Empty buffer

Yes!


If buffer is empty, wait for item to show up

slide-35
SLIDE 35

35

Wait and Signal do not Compose

empty empty zzz…


Wait for either?

slide-36
SLIDE 36


The Transactional Manifesto

Much modern programming practice is inadequate for the multicore world.
Agenda:
Replace locking with a transactional API
Design languages and libraries
Implement efficient run-times

slide-37
SLIDE 37

Road Map

37

Transactional Memory
Hardware Transactional Memory
Hybrid Transactional Memory
Software Transactional Memory
Research Questions

slide-38
SLIDE 38

Road Map

38

Transactional Memory
Hardware Transactional Memory
Hybrid Transactional Memory
Software Transactional Memory
Research Questions

slide-39
SLIDE 39


Transactions

Block of code ….
Atomic: appears to happen instantaneously
Serializable: all transactions appear to happen in one-at-a-time order
Commit: takes effect (atomically)
Abort: has no effect (typically restarted)

slide-40
SLIDE 40


atomic { x.remove(3); y.add(3); } atomic { y = null; }

Atomic Blocks

slide-41
SLIDE 41


atomic { x.remove(3); y.add(3); } atomic { y = null; }

Atomic Blocks

No data race

slide-42
SLIDE 42


public void LeftEnq(item x) { Qnode q = new Qnode(x); q.left = left; left.right = q; left = q; }

A Double-Ended Queue

Write sequential Code

slide-43
SLIDE 43


public void LeftEnq(item x) { atomic { Qnode q = new Qnode(x); q.left = left; left.right = q; left = q; } }

A Double-Ended Queue

slide-44
SLIDE 44


public void LeftEnq(item x) { atomic { Qnode q = new Qnode(x); q.left = left; left.right = q; left = q; } }

A Double-Ended Queue

Enclose in atomic block

slide-45
SLIDE 45


Warning

Not always this simple! Conditional waits? False conflicts? Resource limits? Better problems to have …

slide-46
SLIDE 46


Composition?

slide-47
SLIDE 47


Composition?

public void Transfer(Queue<T> q1, Queue<T> q2) { atomic { T x = q1.deq(); q2.enq(x); } }

Trivial or what?

slide-48
SLIDE 48


public T LeftDeq() { atomic { if (left == null) retry; … } }

Conditional Waiting

Roll back transaction and restart when something changes

slide-49
SLIDE 49


Composable Conditional Waiting

atomic { x = q1.deq(); } orElse { x = q2.deq(); }

Run 1st method. If it retries … Run 2nd method. If it retries … Entire statement retries
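The three-step rule above can be viewed as a small combinator. A sketch in C, not the slides' full semantics (a real retry also blocks until something it read changes); Status, TxBody, and or_else are illustrative names:

```c
/* TX_RETRY models a transaction body that executed `retry`. */
typedef enum { TX_OK, TX_RETRY } Status;
typedef Status (*TxBody)(void);

static Status body_ok(void)    { return TX_OK;    }  /* sample bodies */
static Status body_retry(void) { return TX_RETRY; }

/* orElse: run the first body; only if it retries, run the second.
   If both retry, the composite statement retries as a whole. */
static Status or_else(TxBody first, TxBody second) {
    Status s = first();
    return (s == TX_RETRY) ? second() : s;
}
```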

slide-50
SLIDE 50

Road Map

50

Transactional Memory
Hardware Transactional Memory
Hybrid Transactional Memory
Software Transactional Memory
Research Questions

slide-51
SLIDE 51


Hardware Transactional Memory

Exploit standard “cache coherence” Detect synchronization conflicts … Invalidate cached copies of data.

slide-52
SLIDE 52

Standard Cache Coherence

Bus

cache

memory

cache cache


slide-53
SLIDE 53

53

Standard Cache Coherence

Bus

cache

memory

cache cache

Random access memory (10s of cycles)


slide-54
SLIDE 54

54

Standard Cache Coherence

cache

memory

cache cache

Bus

Shared Bus

  • Broadcast medium
  • One broadcaster at a time
  • Processors and memory all “snoop”


slide-55
SLIDE 55

55

Standard Cache Coherence

Bus

cache

memory

cache cache

Per-Processor Caches

  • Small
  • Fast: 1 or 2 cycles
  • Address & state information


slide-56
SLIDE 56

56


Processor Issues Load Request

Bus

cache

memory

cache cache

data

load x


slide-57
SLIDE 57

57


Processor Issues Load Request

Bus

cache

memory

cache cache load x


Got it! data

E

data

slide-58
SLIDE 58

58


Processor Issues Load Request

Bus

memory

cache cache data data

Load x

E


slide-59
SLIDE 59

59


Other Cache Responds

memory

cache cache data Got it data data

Bus

E S S


slide-60
SLIDE 60

60

S

Modify Cached Data

Bus

data

memory

cache data

data

data

S


slide-61
SLIDE 61

61


Invalidate

Bus

memory

cache data data data cache Invalidate x

S S M I


slide-62
SLIDE 62

62

cache

Bus

Invalidate

memory

cache data data

This cache acquires write permission


slide-63
SLIDE 63

63

cache

Bus

Invalidate

memory

cache data data

Other caches lose read permission This cache acquires write permission


slide-64
SLIDE 64

64

cache

Bus

Invalidate

memory

cache data data

Memory provides data only if not present in any cache, so no need to change it now (expensive)


slide-65
SLIDE 65


HW Transactional Memory

Interconnect

caches memory

read

active

T

slide-66
SLIDE 66


Transactional Memory

read

active

T T

active

caches memory

slide-67
SLIDE 67


Transactional Memory

active

T T

active

committed

caches memory

slide-68
SLIDE 68


Transactional Memory

write

active

committed T D caches

memory

slide-69
SLIDE 69


Rewind

active

T T

active

write

aborted

D caches

memory

slide-70
SLIDE 70


Transaction Commit

At commit point … No cache conflicts? We win.
Mark transactional cache entries:
Was: read-only, Now: valid
Was: modified, Now: dirty (will be written back)
That’s (almost) everything!
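The commit rule can be pictured as a state transition on cache lines. A simplified model, not the hardware's actual encoding; the enum and function names are mine:

```c
/* Simplified cache-line states in a transactional cache. */
typedef enum { LINE_VALID, LINE_DIRTY, LINE_TX_READ, LINE_TX_MODIFIED } LineState;

/* On successful commit, transactional marks are simply cleared:
   read-only entries become plain valid lines; modified entries
   become dirty lines to be written back later. */
static LineState commit_line(LineState s) {
    switch (s) {
    case LINE_TX_READ:     return LINE_VALID;
    case LINE_TX_MODIFIED: return LINE_DIRTY;
    default:               return s;   /* non-transactional lines untouched */
    }
}
```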

slide-71
SLIDE 71

Road Map

71

Transactional Memory Hardware Transactional Memory Hybrid Transactional Memory Software Transactional Memory Research Questions

slide-72
SLIDE 72

72

Hardware Transactional Memory (HTM)

IBM’s Blue Gene/Q, System z, and POWER8; Intel’s Haswell TSX extensions

slide-73
SLIDE 73

if (_xbegin() == _XBEGIN_STARTED) { speculative code _xend() } else { abort handler }

Intel RTM

slide-74
SLIDE 74

if (_xbegin() == _XBEGIN_STARTED) { speculative code _xend() } else { abort handler }

Intel RTM

start a speculative transaction

slide-75
SLIDE 75

if (_xbegin() == _XBEGIN_STARTED) { speculative code _xend() } else { abort handler }

Intel RTM

If you see this, you are inside a transaction

slide-76
SLIDE 76

if (_xbegin() == _XBEGIN_STARTED) { speculative code _xend() } else { abort handler }

Intel RTM

If you see anything else, your transaction aborted

slide-77
SLIDE 77

if (_xbegin() == _XBEGIN_STARTED) { speculative code _xend() } else { abort handler }

Intel RTM

you could retry the transaction, or take an alternative path

slide-78
SLIDE 78

unsigned status = _xbegin(); if (status == _XBEGIN_STARTED) { speculative code } else if (status & _XABORT_EXPLICIT) { aborted by user code } else if (status & _XABORT_CONFLICT) { read-write conflict } else if (status & _XABORT_CAPACITY) { cache overflow } else { … }

Abort codes

slide-79
SLIDE 79

unsigned status = _xbegin(); if (status == _XBEGIN_STARTED) { speculative code } else if (status & _XABORT_EXPLICIT) { aborted by user code } else if (status & _XABORT_CONFLICT) { read-write conflict } else if (status & _XABORT_CAPACITY) { cache overflow } else { … }

Abort codes

speculative code can call _xabort()

slide-80
SLIDE 80

unsigned status = _xbegin(); if (status == _XBEGIN_STARTED) { speculative code } else if (status & _XABORT_EXPLICIT) { aborted by user code } else if (status & _XABORT_CONFLICT) { read-write conflict } else if (status & _XABORT_CAPACITY) { cache overflow } else { … }

Abort codes

synchronization conflict occurred (maybe retry)
slide-81
SLIDE 81

unsigned status = _xbegin(); if (status == _XBEGIN_STARTED) { speculative code } else if (status & _XABORT_EXPLICIT) { aborted by user code } else if (status & _XABORT_CONFLICT) { read-write conflict } else if (status & _XABORT_CAPACITY) { cache overflow } else { … }

Abort codes

read/write set too big (maybe don’t retry)

slide-82
SLIDE 82

unsigned status = _xbegin(); if (status == _XBEGIN_STARTED) { speculative code } else if (status & _XABORT_EXPLICIT) { aborted by user code } else if (status & _XABORT_CONFLICT) { read-write conflict } else if (status & _XABORT_CAPACITY) { cache overflow } else { … }

Abort codes

other abort codes …
slide-83
SLIDE 83

Too Big

Transaction aborts if data set overflows caches, internal buffers
slide-84
SLIDE 84

Too Slow

Transaction aborts on timer interrupt

slide-85
SLIDE 85

Just Not in the Mood

Many other reasons: TLB miss, illegal instruction, page fault …

slide-86
SLIDE 86

Hybrid Transactional Memory

slide-87
SLIDE 87

if (_xbegin() == _XBEGIN_STARTED) { read lock state if (lock taken) _xabort(); work; _xend() } else { lock->lock(); work; lock->unlock(); }

Non-Speculative Fallback

slide-88
SLIDE 88

if (_xbegin() == _XBEGIN_STARTED) { read lock state if (lock taken) _xabort(); work; _xend() } else { lock->lock(); work; lock->unlock(); }

Non-Speculative Fallback

reading lock ensures that transaction will abort if another thread acquires lock

slide-89
SLIDE 89

if (_xbegin() == _XBEGIN_STARTED) { read lock state if (lock taken) _xabort(); work; _xend() } else { lock->lock(); work; lock->unlock(); }

Non-Speculative Fallback

abort if another thread has acquired lock

slide-90
SLIDE 90

if (_xbegin() == _XBEGIN_STARTED) { read lock state if (lock taken) _xabort(); work; _xend() } else { lock->lock(); work; lock->unlock(); }

Non-Speculative Fallback

On abort, acquire lock & do work

(aborting concurrent speculative transactions)


slide-91
SLIDE 91

91

Lock Elision

<HLE acquire prefix> lock(); do work; <HLE release prefix> unlock()


slide-92
SLIDE 92

92

Lock Elision

<HLE acquire prefix> lock(); do work; <HLE release prefix> unlock()

first time around, read lock and execute speculatively


slide-93
SLIDE 93

93

Lock Elision

<HLE acquire prefix> lock(); do work; <HLE release prefix> unlock()

if speculation fails, no more Mr. Nice Guy, acquire the lock


slide-94
SLIDE 94

Conventional Locks

94

lock transfer latencies serialized execution locks


slide-95
SLIDE 95

Lock Elision

95

locks lock elision


slide-96
SLIDE 96

Lock Teleportation

96

slide-97
SLIDE 97


Hand-over-Hand locking

a b c


slide-98
SLIDE 98


Hand-over-Hand locking

a b c

slide-99
SLIDE 99

99

Hand-over-Hand locking

a b c


slide-100
SLIDE 100

100

Hand-over-Hand locking

a b c
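The pictures above can be sketched in code: lock the next node before releasing the current one, so a remover can never slip past the traversal. A POSIX-mutex sketch; Node, find_pred, and demo_find are illustrative names, not the book's code:

```c
#include <pthread.h>
#include <stddef.h>

typedef struct Node {
    int value;
    pthread_mutex_t lock;
    struct Node *next;
} Node;

/* Hand-over-hand traversal: hold at most two locks, grabbing the
   successor's lock before releasing the current node's. Returns the
   last node with value < v, still locked; the caller unlocks it. */
static Node *find_pred(Node *head, int v) {
    pthread_mutex_lock(&head->lock);
    Node *pred = head;
    while (pred->next != NULL && pred->next->value < v) {
        pthread_mutex_lock(&pred->next->lock);  /* grab the next hand */
        Node *prev = pred;
        pred = pred->next;
        pthread_mutex_unlock(&prev->lock);      /* release the old hand */
    }
    return pred;
}

/* Single-threaded check on the list a(10) -> b(20) -> c(30). */
static int demo_find(int v) {
    static Node c = { 30, PTHREAD_MUTEX_INITIALIZER, NULL };
    static Node b = { 20, PTHREAD_MUTEX_INITIALIZER, &c };
    static Node a = { 10, PTHREAD_MUTEX_INITIALIZER, &b };
    Node *p = find_pred(&a, v);
    int found = p->value;
    pthread_mutex_unlock(&p->lock);
    return found;
}
```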


slide-101
SLIDE 101


Removing a Node

a b c d remove(b)

slide-102
SLIDE 102

102

Removing a Node

a b c d


remove(b)

slide-103
SLIDE 103

Lock Teleportation

a b c d


slide-104
SLIDE 104

Lock Teleportation

a b c d

read transaction


slide-105
SLIDE 105

Lock Teleportation

a b c d

read transaction


slide-106
SLIDE 106

Lock Teleportation

a b c d no locks acquired


slide-107
SLIDE 107

How Far to Teleport?

107

Too short? Missed opportunity Too far? Transaction aborts, work lost

slide-108
SLIDE 108

Adaptive Teleportation

108

On Success: limit = limit + 1
On Failure: limit = limit / 2
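That is additive increase, multiplicative decrease, the same rule TCP uses for its congestion window. A sketch; the function names and the floor of 1 are my choices:

```c
/* Adaptive teleportation distance: grow gently while transactions
   commit, back off sharply when one aborts. */
static int limit_on_success(int limit) { return limit + 1; }

static int limit_on_failure(int limit) {
    int half = limit / 2;
    return half > 0 ? half : 1;   /* never teleport fewer than one node */
}
```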

slide-109
SLIDE 109


Node* teleport(Node* start, T v) { int retries = RETRY_THRESHOLD; while (--retries) { int distance = 0; if (xbegin() == _XBEGIN_STARTED) { traverse up to teleportLimit nodes move lock _xend(); teleportLimit++; return pred; } else { teleportLimit = teleportLimit/2 }}};

slide-110
SLIDE 110


Node* teleport(Node* start, T v) { int retries = RETRY_THRESHOLD; while (--retries) { int distance = 0; if (xbegin() == _XBEGIN_STARTED) { traverse up to teleportLimit nodes move lock _xend(); teleportLimit++; return pred; } else { teleportLimit = teleportLimit/2 }}};

locked node In sorted list

slide-111
SLIDE 111


Node* teleport(Node* start, T v) { int retries = RETRY_THRESHOLD; while (--retries) { int distance = 0; if (xbegin() == _XBEGIN_STARTED) { traverse up to teleportLimit nodes move lock _xend(); teleportLimit++; return pred; } else { teleportLimit = teleportLimit/2 }}};

locked node In sorted list Value to search for

slide-112
SLIDE 112


Node* teleport(Node* start, T v) { int retries = RETRY_THRESHOLD; while (--retries) { int distance = 0; if (xbegin() == _XBEGIN_STARTED) { traverse up to teleportLimit nodes move lock _xend(); teleportLimit++; return pred; } else { teleportLimit = teleportLimit/2 }}};

Returns locked node with value less than or equal to v

slide-113
SLIDE 113


Node* teleport(Node* start, T v) { int retries = RETRY_THRESHOLD; while (--retries) { int distance = 0; if (xbegin() == _XBEGIN_STARTED) { traverse up to teleportLimit nodes move lock _xend(); teleportLimit++; return pred; } else { teleportLimit = teleportLimit/2 }}};

Try for a fixed number of times

slide-114
SLIDE 114


Node* teleport(Node* start, T v) { int retries = RETRY_THRESHOLD; while (--retries) { int distance = 0; if (xbegin() == _XBEGIN_STARTED) { traverse up to teleportLimit nodes move lock _xend(); teleportLimit++; return pred; } else { teleportLimit = teleportLimit/2 }}};

Executed as read-only transaction

slide-115
SLIDE 115


Node* teleport(Node* start, T v) { int retries = RETRY_THRESHOLD; while (--retries) { int distance = 0; if (xbegin() == _XBEGIN_STARTED) { traverse up to teleportLimit nodes move lock _xend(); teleportLimit++; return pred; } else { teleportLimit = teleportLimit/2 }}};

Thread-local variable that controls how far to traverse the list

slide-116
SLIDE 116


Node* teleport(Node* start, T v) { int retries = RETRY_THRESHOLD; while (--retries) { int distance = 0; if (xbegin() == _XBEGIN_STARTED) { traverse up to teleportLimit nodes move lock _xend(); teleportLimit++; return pred; } else { teleportLimit = teleportLimit/2 }}};

Stop if either (1) we find value v, or (2) we traverse teleportLimit nodes

slide-117
SLIDE 117


Node* teleport(Node* start, T v) { int retries = RETRY_THRESHOLD; while (--retries) { int distance = 0; if (xbegin() == _XBEGIN_STARTED) { traverse up to teleportLimit nodes move lock _xend(); teleportLimit++; return pred; } else { teleportLimit = teleportLimit/2 }}};

Unlock starting node, lock final node

slide-118
SLIDE 118


Node* teleport(Node* start, T v) { int retries = RETRY_THRESHOLD; while (--retries) { int distance = 0; if (xbegin() == _XBEGIN_STARTED) { traverse up to teleportLimit nodes move lock _xend(); teleportLimit++; return pred; } else { teleportLimit = teleportLimit/2 }}};

Try to commit transaction

slide-119
SLIDE 119


Node* teleport(Node* start, T v) { int retries = RETRY_THRESHOLD; while (--retries) { int distance = 0; if (xbegin() == _XBEGIN_STARTED) { traverse list up to threshold move lock _xend(); teleportLimit++; return last node; } else { teleportLimit = teleportLimit/2 }}};

On commit, advance teleportLimit by 1, and return locked node

slide-120
SLIDE 120


Node* teleport(Node* start, T v) { int retries = RETRY_THRESHOLD; while (--retries) { int distance = 0; if (xbegin() == _XBEGIN_STARTED) { traverse list up to threshold move lock _xend(); teleportLimit++; return last node; } else { teleportLimit = teleportLimit/2 }}};

On abort, cut teleportLimit in half

slide-121
SLIDE 121

Lock-Based STMs

121

STMs come in different forms: Lock-Free Lock-based

slide-122
SLIDE 122

Lock-Based STM

122

But, didn’t you just say that locks are evil? For applications, yes! For run-time systems written by experts, maybe not ….

slide-123
SLIDE 123

Lock-Based STMs

123

Each transaction keeps
Read Set: locations and values read
Write Set: locations and values written
Changes installed at commit
Conflicts detected at commit
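A single-threaded miniature of that scheme, with deferred updates installed only after the read set validates. Everything here (Slot, ReadEntry, tm_commit, …) is an illustration, not the book's STM:

```c
#include <stdbool.h>
#include <stddef.h>

/* Shared memory: each slot carries a value and a version number. */
typedef struct { int value; int version; } Slot;

typedef struct { int addr; int version; } ReadEntry;   /* what we read */
typedef struct { int addr; int value;   } WriteEntry;  /* what we will write */

/* Commit: detect conflicts by revalidating the read set, then install
   the write set and bump versions. Returns false on abort. */
static bool tm_commit(Slot *mem,
                      const ReadEntry *reads, size_t nr,
                      const WriteEntry *writes, size_t nw) {
    for (size_t i = 0; i < nr; i++)
        if (mem[reads[i].addr].version != reads[i].version)
            return false;                       /* someone wrote under us */
    for (size_t i = 0; i < nw; i++) {
        mem[writes[i].addr].value = writes[i].value;
        mem[writes[i].addr].version++;
    }
    return true;
}
```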

slide-124
SLIDE 124

124

11:00 13:01 16:20

Client Memory lock Too many locks!

11:00 10:22

slide-125
SLIDE 125

11:00 11:00 10:22

125

Client Memory lock Lock Striping
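Striping hashes each address into a fixed pool of locks instead of one lock per location. A sketch; the pool size and the 8-byte granule are arbitrary choices of mine:

```c
#include <stdint.h>
#include <stddef.h>

#define N_STRIPES 1024   /* fixed pool of lock/version words */

/* Map an address to its stripe: addresses in the same 8-byte granule
   share a stripe, and the table stays N_STRIPES entries no matter how
   much memory the program touches. */
static size_t stripe_of(uintptr_t addr) {
    return (size_t)((addr >> 3) % N_STRIPES);
}
```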

slide-126
SLIDE 126

126

a b c d e 11:00 10:20 11:00

slide-127
SLIDE 127

a 11:00 11:00 10:22 b c

127

a b c d e 11:00 11:00 10:22

To read memory … Check unlocked
Read set: add address, values, and versions to read set

slide-128
SLIDE 128

128

To write memory …

c’ e’ 11:01 11:01 a b c d e 11:00 11:00 10:22

Write set Add address, new values and versions to write set

slide-129
SLIDE 129

a b c d e 1 2 1 11:00 11:00 10:22 c’ e’ 11:01 11:01

Write set

a 11:00 11:00 10:22 b c

Read set

slide-130
SLIDE 130

a b c d e 1 2 1 11:00 11:00 10:22 c’ e’ 11:01 11:01

Write set

a 11:00 11:00 10:22 b c

Read set

slide-131
SLIDE 131

a b c d e 1 2 1 11:00 11:00 10:22 c’ e’ 11:01 11:01

Write set

a 11:00 11:00 10:22 b c

Read set
To commit …
Acquire write locks
Compare version #s
Install new values

c’ e’

slide-132
SLIDE 132

a b c’ d e’ 1 2 1 11:00 11:00 10:22 c’ e’ 11:01 11:01

Write set

a 11:00 11:00 10:22 b c

Read set
To commit …
Acquire write locks
Compare version #s
Install new values
Increment version #s

11:01 11:01

Release locks

slide-133
SLIDE 133

Zombie Transactions

133

A zombie human is dead but acts as if it were alive … A zombie transaction is one that will certainly abort, but continues to run … Why do we care?

slide-134
SLIDE 134

134

2 1 x y

Invariant: x = 2 y

slide-135
SLIDE 135

135

2 1 x y

Invariant: x = 2 y read x = 2

slide-136
SLIDE 136

136

2 1 4 2 x y

Invariant: x = 2 y x ← 4 y ← 2 commit read x = 2

slide-137
SLIDE 137

137

2 1 4 2 x y

Invariant: x = 2y
This transaction is a zombie, doomed to die, but still running!
read x = 2
read y = 2
Who cares?

slide-138
SLIDE 138

138

2 1 4 2 x y

Invariant: x = 2 y z ← 1/(x-y) Oh, no! It divides by zero and crashes the system! read x = 2 read y = 2

slide-139
SLIDE 139

139

2 1 4 2 x y

Invariant: x = 2y
z ← 1/(x-y)
read x = 2
read y = 2
The property that every transaction sees a consistent state is called … Opacity

slide-140
SLIDE 140

Version Clock

140

11:00

Introduce version clock Incremented by (some) writers Guarantees opacity Read by everyone

slide-141
SLIDE 141

Transactions

141

11:00

a b c d e 11:00 10:30 09:00

Version numbers not really timestamps, but useful to pretend

slide-142
SLIDE 142

Transactions

142

11:00 11:00

a b c d e 11:00 10:30 09:00

Copy clock to rv

slide-143
SLIDE 143

a b c d e 1 2 1 11:00 11:00 10:22 c’ e’ 11:01 11:01

Write set

a 11:00 11:00 10:22 b c

Read set

11:00 11:00

Run speculative transaction as before …

slide-144
SLIDE 144

a b c d e 1 2 1 11:00 11:00 10:22 c’ e’ 11:01 11:01

Write set

a 11:00 11:00 10:22 b c

Read set

11:00 11:00

Lock Write Set …

slide-145
SLIDE 145

11:00

a b c d e 1 2 1 11:00 11:00 10:22 c’ e’ 11:01 11:01

Write set

a 11:00 11:00 10:22 b c

Read set

11:00

Increment global clock

11:01

slide-146
SLIDE 146

a b c d e 1 2 1 11:00 11:00 10:22 c’ e’ 11:01 11:01

Write set

a 11:00 11:00 10:22 b c

Read set

11:01 11:00

Validate read set…

slide-147
SLIDE 147

a b c d e 1 2 1 11:00 11:00 10:22 c’ e’ 11:01 11:01

Write set

a 11:00 11:00 10:22 b c

Read set

11:01 11:00

Commit & release locks

slide-148
SLIDE 148

a b c d e 1 2 1 11:00 11:00 10:22 a 11:00 11:00 10:22 b c

Read set

11:00 11:00

Read-only transactions?

slide-149
SLIDE 149

a b c d e 1 2 1 11:00 11:00 10:22 a 11:00 11:00 10:22 b c

Read set

11:00 11:00

Check that version numbers are less than or equal to the cached clock
Check that variables read are unlocked
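So a read-only transaction needs no read set at all: just two checks per location against rv, the clock value cached when it began. A TL2-style sketch; the names are mine:

```c
#include <stdbool.h>

/* A location read by a read-only transaction is consistent iff it is
   unlocked and its version is no newer than rv, the global version
   clock sampled when the transaction began. */
static bool ro_read_ok(int version, bool locked, int rv) {
    return !locked && version <= rv;
}
```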

slide-150
SLIDE 150

Road Map

150

Transactional Memory
Hardware Transactional Memory
Hybrid Transactional Memory
Software Transactional Memory
Research Questions

slide-151
SLIDE 151


TM Design Issues

  • Implementation choices
  • Language design issues
  • Semantic issues
slide-152
SLIDE 152


Granularity

  • Object

– managed languages, Java, C#, … – Easy to control interactions between transactional & non-trans threads

  • Word

– C, C++, … – Hard to control interactions between transactional & non-trans threads

slide-153
SLIDE 153


Direct/Deferred Update

  • Deferred

– modify private copies & install on commit – Commit requires work – Consistency easier

  • Direct

– Modify in place, roll back on abort – Makes commit efficient – Consistency harder

slide-154
SLIDE 154


Conflict Detection

  • Eager

– Detect before conflict arises – “Contention manager” module resolves

  • Lazy

– Detect on commit/abort

  • Mixed

– Eager write/write, lazy read/write …

slide-155
SLIDE 155


Conflict Detection

  • Eager detection may abort transactions that could have committed.
  • Lazy detection discards more computation.

slide-156
SLIDE 156


Contention Management & Scheduling

  • How to resolve conflicts?
  • Who moves forward and who rolls back?
  • Lots of empirical work but formal work in infancy

slide-157
SLIDE 157


Contention Manager Strategies

  • Exponential backoff
  • Priority to

– Oldest? – Most work? – Non-waiting?

  • None Dominates
  • But needed anyway

Judgment of Solomon

slide-158
SLIDE 158


I/O & System Calls?

  • Some I/O revocable

– Provide transaction-safe libraries – Undoable file system/DB calls

  • Some not

– Opening cash drawer – Firing missile

slide-159
SLIDE 159


I/O & System Calls

  • One solution: make transaction irrevocable
– If transaction tries I/O, switch to irrevocable mode.
  • There can be only one …
– Requires serial execution
  • No explicit aborts
– In irrevocable transactions

slide-160
SLIDE 160


Exceptions

int i = 0; try { atomic { i++; node = new Node(); } } catch (Exception e) { print(i); }

slide-161
SLIDE 161


Exceptions

int i = 0; try { atomic { i++; node = new Node(); } } catch (Exception e) { print(i); } Throws OutOfMemoryException!

slide-162
SLIDE 162


Exceptions

int i = 0; try { atomic { i++; node = new Node(); } } catch (Exception e) { print(i); } Throws OutOfMemoryException! What is printed?

slide-163
SLIDE 163


Unhandled Exceptions

  • Aborts transaction

– Preserves invariants – Safer

  • Commits transaction

– Like locking semantics – What if exception object refers to values modified in transaction?

slide-164
SLIDE 164


Nested Transactions

atomic void foo() { bar(); } atomic void bar() { … }

slide-165
SLIDE 165


Nested Transactions

  • Needed for modularity

– Who knew that cosine() contained a transaction?

  • Flat nesting

– If child aborts, so does parent

  • First-class nesting

– If child aborts, partial rollback of child only

slide-166
SLIDE 166

166

Locks and transactions complement one another

slide-167
SLIDE 167

167

TM can improve memory management, both automatic and explicit.

slide-168
SLIDE 168

168

TM restructures in-memory databases

slide-169
SLIDE 169

Power and Energy

169

New research in energy-efficient synchronization

slide-170
SLIDE 170

GPUs, etc.

170

GPUs and accelerators need synchronization

slide-171
SLIDE 171

171

TM can simplify operating system kernels, device drivers, security …

slide-172
SLIDE 172

172

Transaction-Friendly data structures

slide-173
SLIDE 173

Theory

173

slide-174
SLIDE 174

Architecture

174

slide-175
SLIDE 175

Gartner Hype Cycle

Hat tip: Jeremy Kemp

You are here

slide-176
SLIDE 176

176

slide-177
SLIDE 177

Thank you! (Спасибо!)


slide-178
SLIDE 178

178