SLIDE 1

Hardware Read-Write Lock Elision

Pascal Felber Shady Issa Paolo Romano Alexander Matveev

SLIDE 2

Multicores are everywhere

Hardware Read-Write Lock Elision - Eurosys 2016 21/4/16 2

SLIDE 3

Parallel programming

[Diagram: main memory shared by Core 1, Core 2, Core 3, Core 4]

SLIDE 5

Parallel programming

[Diagram: main memory shared by Core 1, Core 2, Core 3, Core 4]

complexity:

  • deadlocks
  • livelocks
  • priority inversions
  • convoy effects

SLIDE 6

Parallel programming

[Diagram: main memory shared by Core 1, Core 2, Core 3, Core 4]

Transactional memory

atomic {
    if (bal > amount)
        withdraw(amount)
}
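As a rough illustration of what the atomic block asks for, the Python sketch below makes the balance check and the withdrawal one indivisible step. Python has no transactional memory, so a single lock is used here purely as a stand-in for the slide's atomic construct. The Account class, withdraw_if_covered and the amounts are hypothetical, chosen only for the example.

```python
import threading


class Account:
    """Toy account; one lock stands in for the slide's atomic block."""

    def __init__(self, balance):
        self.balance = balance
        self._atomic = threading.Lock()

    def withdraw_if_covered(self, amount):
        # The check and the update happen as one indivisible step.
        with self._atomic:
            if self.balance > amount:
                self.balance -= amount
                return True
            return False


account = Account(100)
successes = []


def worker():
    # Each worker attempts 20 withdrawals of 10.
    done = sum(account.withdraw_if_covered(10) for _ in range(20))
    successes.append(done)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the strict bal > amount test from the slide, nine withdrawals succeed and the balance stops at 10, however the four threads interleave.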

SLIDE 7

Hardware Transactional Memory

SLIDE 8

Hardware lock elision

Thread 1: lock, r(A), w(B), unlock
Thread 2: lock, w(X), w(Y), unlock

SLIDE 9

Hardware lock elision

Thread 1: lock (Begin H/W Tx), r(A), w(B), unlock (Commit H/W Tx)
Thread 2: lock (Begin H/W Tx), w(X), w(Y), unlock (Commit H/W Tx)

SLIDE 10

Hardware lock elision

Thread 1: lock (Begin H/W Tx), r(A), r(X)
Thread 2: lock (Begin H/W Tx), w(X)

abort, then acquire lock normally

SLIDE 11

Hardware lock elision

Thread 1: lock (Begin H/W Tx), r(A), r(X)
Thread 2: lock (Begin H/W Tx), w(X)

abort, then acquire lock normally

abort causes:

  • capacity
  • prohibited instructions
  • page faults
  • TLB miss
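The control flow on this slide (speculate first, and on abort acquire the lock normally) can be sketched as below. This is a minimal model, not real hardware lock elision: Python cannot start hardware transactions, so make_speculate is a hypothetical stand-in that aborts a fixed number of times before committing. The retry budget of 3 and all function names are assumptions for the example.

```python
import threading


class TxAbort(Exception):
    """Signals that a simulated hardware transaction aborted."""


def make_speculate(aborts_before_commit):
    """Hypothetical stand-in for the hardware: abort N times, then commit."""
    remaining = [aborts_before_commit]

    def speculate(critical_section):
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TxAbort("simulated conflict, capacity or page-fault abort")
        return critical_section()

    return speculate


def run_elided(lock, critical_section, speculate, max_retries=3):
    """Try to run without the lock; fall back to acquiring it normally."""
    for _ in range(max_retries):
        if lock.locked():
            break  # a thread holds the real lock, so speculation cannot succeed
        try:
            return "speculative", speculate(critical_section)
        except TxAbort:
            continue
    with lock:
        return "locked", critical_section()


counter = {"value": 0}


def increment():
    counter["value"] += 1
    return counter["value"]


lock = threading.Lock()
first = run_elided(lock, increment, make_speculate(1))   # one abort, then commit
second = run_elided(lock, increment, make_speculate(5))  # budget exhausted
```

In this model an aborted attempt never runs the critical section, which mirrors the hardware discarding all speculative writes on abort.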

SLIDE 12

Read-write Locks

read mode:

  • concurrent readers ✔
  • read dominated workloads

write mode:

  • sequential writers ✘
  • blocks readers ✘
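The two modes can be made concrete with a small read-write lock. The sketch below is one simple way to get these semantics using a condition variable; it is an illustrative baseline (the RWLock class and its non-blocking option are assumptions for the example), not the lock used in the talk.

```python
import threading


class RWLock:
    """Minimal read-write lock: many readers, or exactly one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self, blocking=True):
        with self._cond:
            while self._writer:
                if not blocking:
                    return False
                self._cond.wait()
            self._readers += 1
            return True

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self, blocking=True):
        with self._cond:
            while self._writer or self._readers:
                if not blocking:
                    return False
                self._cond.wait()
            self._writer = True
            return True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


rw = RWLock()
# Read mode: two readers hold the lock at the same time.
two_readers = rw.acquire_read(blocking=False) and rw.acquire_read(blocking=False)
writer_kept_out = not rw.acquire_write(blocking=False)
rw.release_read()
rw.release_read()
# Write mode: one writer, which blocks readers and other writers.
writer_in = rw.acquire_write(blocking=False)
reader_kept_out = not rw.acquire_read(blocking=False)
second_writer_kept_out = not rw.acquire_write(blocking=False)
rw.release_write()
```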

SLIDE 13

Hardware read-write lock elision

readers run without instrumentation:

  • no H/W Txs
  • no S/W tracking of read locations
  • no lock acquisition

writers run in H/W Txs:

  • no lock acquisition
  • H/W tracking of read/write locations

SLIDE 14

Hardware read-write lock elision

Reader: r-lock, r(X), r(?), r-unlock
Writer: w-lock (Begin H/W Tx), w(X), w(Y), w-unlock (Commit H/W Tx)

SLIDE 15

Hardware read-write lock elision

Reader: r-lock, r(X), r(?), r-unlock
Writer: w-lock (Begin H/W Tx), w(X), w(Y), w-unlock (Commit H/W Tx)

? = Y

SLIDE 16

Hardware read-write lock elision

Reader: r-lock, r(X), r(?), r-unlock
Writer: w-lock (Begin H/W Tx), w(X), w(Y), w-unlock, w-unlock (Commit H/W Tx)

? = Y

SLIDE 17

Hardware read-write lock elision

Reader: r-lock, r(X), r(Y), r-unlock
Writer: w-lock (Begin H/W Tx), w(X), w(Y), w-unlock (Commit H/W Tx)

abort

SLIDE 18

Hardware read-write lock elision

Reader: r-lock, r(X), r(?), r-unlock
Writer: w-lock (Begin H/W Tx), w(X), w(Y), w-unlock (wait for concurrent readers active here), w-unlock (Commit H/W Tx)

SLIDE 19

Hardware read-write lock elision

R1: r-unlock
R2: r-lock, r-unlock
Writer: w-unlock, w-unlock

reader state: R1 inactive, R2 active, ...
reader state: R1 inactive, R2 inactive, ...

abort

SLIDE 20

Hardware read-write lock elision

Reader: r-lock, r(X), r(?), r-unlock
Writer: w-lock (Begin H/W Tx), w(X), w(Y), w-unlock (Suspend H/W Tx, wait for concurrent readers active here, Resume H/W Tx, Commit H/W Tx)
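The "wait for concurrent readers" step can be sketched with one counter per reader. The per-reader counter (odd while inside a read critical section) is an assumption made for this example, borrowed from common quiescence schemes; the slides only show an active/inactive reader state. The suspend and resume of the hardware transaction cannot be expressed in Python and is marked by a comment only.

```python
import threading
import time


class ReaderStates:
    """Per-reader counters: an odd value means the reader is active."""

    def __init__(self, n_readers):
        self.counters = [0] * n_readers

    def r_lock(self, tid):
        self.counters[tid] += 1  # becomes odd: active

    def r_unlock(self, tid):
        self.counters[tid] += 1  # becomes even: inactive

    def wait_for_readers(self, poll=0.001):
        # Only readers that were active at the snapshot must be waited for.
        snapshot = list(self.counters)
        for tid, seen in enumerate(snapshot):
            while seen % 2 == 1 and self.counters[tid] == seen:
                time.sleep(poll)


states = ReaderStates(n_readers=2)
reader_inside = threading.Event()
reader_may_leave = threading.Event()
writer_done = threading.Event()


def reader():
    states.r_lock(1)
    reader_inside.set()
    reader_may_leave.wait()
    states.r_unlock(1)


def writer():
    # On the slide, the H/W Tx is suspended here and resumed after the wait.
    states.wait_for_readers()
    writer_done.set()


r = threading.Thread(target=reader)
r.start()
reader_inside.wait()
w = threading.Thread(target=writer)
w.start()
blocked_while_reader_active = not writer_done.wait(0.05)
reader_may_leave.set()
r.join()
w.join()
finished_after_reader_left = writer_done.is_set()
```

Each counter is written only by its own reader, so the writer's polling needs no lock. A reader that starts after the snapshot is not waited for.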

SLIDE 21

Hardware read-write lock elision

Writes use H/W Txs:

  • concurrency among writers ✔

Txs may never commit:

  • fallback lock is a must
  • readers must synchronize with lock holder ✘

SLIDE 22

Hardware read-write lock elision

Reader / Writer in lock fallback

Writer: w-lock (Acquire lock, wait for concurrent readers), w(X), w(Y), w-unlock (Release lock)
Reader already active: r-lock, r(X), r-unlock (reads old X)
Reader arriving later: r-lock (check lock, wait for lock), r(Y)
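A minimal sketch of how readers might synchronize with a writer on the fallback path is given below. The ordering used here (a reader announces itself, then checks the lock, and steps back if the lock is held) is an assumption chosen so that the writer never misses an active reader; the slides only name the steps "check lock" and "wait for lock". FallbackSync and its methods are hypothetical names.

```python
import threading
import time


class FallbackSync:
    """Readers check the fallback lock; the lock holder waits for readers."""

    def __init__(self, n_readers):
        self.active = [False] * n_readers
        self.fallback_lock = threading.Lock()

    def r_lock(self, tid):
        while True:
            self.active[tid] = True              # announce the reader first
            if not self.fallback_lock.locked():  # check lock
                return
            self.active[tid] = False             # step back for the writer
            while self.fallback_lock.locked():   # wait for lock
                time.sleep(0.001)

    def r_unlock(self, tid):
        self.active[tid] = False

    def w_lock(self):
        self.fallback_lock.acquire()             # Acquire lock
        while any(self.active):                  # wait for concurrent readers
            time.sleep(0.001)

    def w_unlock(self):
        self.fallback_lock.release()             # Release lock


sync = FallbackSync(n_readers=1)
reader_entered = threading.Event()


def reader():
    sync.r_lock(0)
    reader_entered.set()
    sync.r_unlock(0)


sync.w_lock()  # the writer is on the fallback path
t = threading.Thread(target=reader)
t.start()
blocked_during_write = not reader_entered.wait(0.05)
sync.w_unlock()
t.join()
entered_after_release = reader_entered.is_set()
```

This is the cost the previous slide marks with ✘: once a writer falls back to the lock, readers are no longer free of synchronization.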

SLIDE 23

Rollback Only Transactions

With H/W Txs:
Thread 1: lock (Begin H/W Tx), r(X)
Thread 2: lock (Begin H/W Tx), w(X)
abort

With ROTs:
Thread 1: lock (Begin ROT), r(X), r(X), unlock (Commit ROT)
Thread 2: lock (Begin ROT), w(X), unlock (Commit ROT)

SLIDE 24

Rollback Only Transactions

With H/W Txs:
Thread 1: lock (Begin H/W Tx), r(X)
Thread 2: lock (Begin H/W Tx), w(X)
abort

With ROTs:
Thread 1: lock (Begin ROT), r(X) (old X), r(X) (new X), unlock (Commit ROT)
Thread 2: lock (Begin ROT), w(X), unlock (Commit ROT)

SLIDE 25

Using Rollback Only Transactions (ROTs)

  • atomic ✔
  • no tracking of reads ✘
  • not serializable ✘
  • allow larger Txs ✔
  • no need for Suspend/Resume ✔
  • single writer ✘

SLIDE 26

Experiments

synthetic benchmarks:

  • degree of contention
  • length of Transactions

complex benchmarks and applications:

  • STMBench7
  • TPC-C
  • Kyoto Cabinet

10 cores, 80 H/W threads

SLIDE 27

RW-LE OPT (Optimistic): HTM, then ROT, then GL
RW-LE PES (Pessimistic): ROT, then GL
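The two fallback chains can be sketched as an ordered list of execution paths, each tried a few times before moving on to the next. Everything numeric below is an assumption for illustration: the retry budget of 3 per path and the capacities (4 for HTM, 16 for ROT, unlimited for GL) are made-up values, and make_path merely simulates capacity aborts.

```python
class Abort(Exception):
    """Signals that a simulated execution path aborted."""


def make_path(capacity):
    """Hypothetical path that aborts when the footprint exceeds its capacity."""
    def run(body, footprint):
        if capacity is not None and footprint > capacity:
            raise Abort("simulated capacity abort")
        return body()
    return run


HTM = make_path(4)    # tracks reads and writes: smallest capacity (assumed)
ROT = make_path(16)   # no read tracking: allows larger Txs (assumed)
GL = make_path(None)  # global lock: never aborts

RW_LE_OPT = [("HTM", 3, HTM), ("ROT", 3, ROT), ("GL", 1, GL)]
RW_LE_PES = [("ROT", 3, ROT), ("GL", 1, GL)]


def execute(chain, body, footprint):
    """Return (path used, number of aborts, result of body)."""
    aborts = 0
    for name, attempts, run in chain:
        for _ in range(attempts):
            try:
                return name, aborts, run(body, footprint)
            except Abort:
                aborts += 1
    raise RuntimeError("the last path (GL) is expected never to abort")


small = execute(RW_LE_OPT, lambda: "ok", footprint=2)
medium = execute(RW_LE_OPT, lambda: "ok", footprint=10)
large_opt = execute(RW_LE_OPT, lambda: "ok", footprint=100)
large_pes = execute(RW_LE_PES, lambda: "ok", footprint=100)
```

Under these assumed capacities, the pessimistic chain reaches the global lock after fewer wasted attempts on large critical sections, while the optimistic chain keeps the HTM path for small ones.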

SLIDE 28

RW-LE OPT (Optimistic): HTM, then ROT, then GL
RW-LE PES (Pessimistic): ROT, then GL

Also shown: HLE, BRLock, RWL, SGL

SLIDE 29

Synthetic benchmarks

  • Degree of contention
  • Size of Txs

RW-LE OPT, RW-LE PES, HLE, BRLock, RWL, SGL

SLIDE 30

Stress test – performance

[Figure: four plots of Time (s) against Number of threads (16, 32, 64, 80) with 10% writers, one per combination of low/high contention and small/large Txs. Series: RW-LE OPT, RW-LE PES, HLE, BRLock, RWL, SGL. Annotations: 8X, 10X, 7X.]

SLIDE 31

Stress test – abort rate

[Figure: four plots of Aborts (%) for HLE, RW-LE PES and RW-LE OPT with 10% writers, one per combination of low/high contention and small/large Txs. Abort categories: HTM tx, HTM non-tx, HTM capacity, Lock aborts, ROT conflicts, ROT capacity.]

SLIDE 32

Stress test

[Figure: four plots of Time (s) against Number of threads (16, 32, 64, 80) with 90% writers, one per combination of low/high contention and small/large Txs. Series: RW-LE OPT, RW-LE PES, HLE, BRLock, RWL, SGL. Annotations: 10%, 25%.]

SLIDE 33

Synthetic benchmarks

[Figure: four groups of plots (low/high contention, low/high capacity) showing Commits (%) for HLE, RW-LE PES and RW-LE OPT at 1%, 10% and 90% write locks. x-axis: Number of threads (2, 4, 8, 16, 32, 64, 80). Commit categories: HTM, ROT, SGL, Uninstrumented.]

SLIDE 34

Synthetic benchmarks

[Figure: four groups of plots (low/high contention, low/high capacity) showing Aborts (%) for HLE, RW-LE PES and RW-LE OPT at 1%, 10% and 90% write locks. x-axis: Number of threads (2, 4, 8, 16, 32, 64, 80). Abort categories: HTM tx, HTM non-tx, HTM capacity, Lock aborts, ROT conflicts, ROT capacity.]

SLIDE 35

TPC-C

[Figure: Speedup (vs. SGL with 1 thread) against Number of threads (16, 32, 64, 80) at 1%, 10% and 50% write locks. Series: RW-LE OPT, RW-LE PES, HLE, BRLock, RWL, SGL. Annotation: 6X.]

SLIDE 36

TPC-C

[Figure: Commits (%) for HLE, RW-LE PES and RW-LE OPT at 1%, 10% and 90% write locks. x-axis: Number of threads (1, 4, 8, 16, 32, 64, 80). Commit categories: HTM, ROT, SGL, Uninstrumented.]

SLIDE 37

TPC-C

[Figure: Aborts (%) for HLE, RW-LE PES and RW-LE OPT at 1%, 10% and 50% write locks. x-axis: Number of threads (1, 4, 8, 16, 32, 64, 80). Abort categories: HTM tx, HTM non-tx, HTM capacity, Lock aborts, ROT conflicts, ROT capacity.]

SLIDE 38

STMbench7

[Figure: Throughput (10^3 Tx/s) against Number of threads (16, 32, 64, 80) at 10%, 50% and 90% write locks. Series: RW-LE OPT, RW-LE PES, HLE, BRLock, RWL, SGL. Annotation: 4X.]

SLIDE 39

STMbench7

[Figure: Commits (%) for HLE, RW-LE PES and RW-LE OPT at 10%, 50% and 90% write locks. x-axis: Number of threads (2, 4, 8, 16, 32, 64, 80). Commit categories: HTM, ROT, SGL, Uninstrumented.]

SLIDE 40

STMbench7

[Figure: Aborts (%) for HLE, RW-LE PES and RW-LE OPT at 10%, 50% and 90% write locks. x-axis: Number of threads (2, 4, 8, 16, 32, 64, 80). Abort categories: HTM tx, HTM non-tx, HTM capacity, Lock aborts, ROT conflicts, ROT capacity.]

SLIDE 41

Kyoto Cabinet

[Figure: Throughput (10^6 Tx/s) against Number of threads (16, 32, 64) at <1%, 5% and 10% write locks. Series: RW-LE OPT, RW-LE PES, HLE, BRLock, Orig, SGL. Annotation: 2X.]

SLIDE 42

Kyoto Cabinet

[Figure: Commits (%) for HLE, RW-LE PES and RW-LE OPT at <1%, 5% and 10% write locks. x-axis: Number of threads (1, 4, 8, 16, 32, 64). Commit categories: HTM, ROT, SGL, Uninstrumented.]

SLIDE 43

Kyoto Cabinet

[Figure: Aborts (%) for HLE, RW-LE PES and RW-LE OPT at <1%, 5% and 10% write locks. x-axis: Number of threads (1, 4, 8, 16, 32, 64). Abort categories: HTM tx, HTM non-tx, HTM capacity, Lock aborts, ROT conflicts, ROT capacity.]

SLIDE 44

Conclusions

  • readers without instrumentation
  • writers using H/W Tx

SLIDE 45

Conclusions

  • readers without instrumentation
  • writers using H/W Tx

10X

SLIDE 46

Conclusions

  • readers without instrumentation
  • writers using H/W Tx
  • suspend/resume
  • ROTs

10X
