Time-Warp: Lightweight Abort Minimization in Transactional Memory


slide-1
SLIDE 1

Time-Warp: Lightweight Abort Minimization in Transactional Memory

Nuno Diegues and Paolo Romano

ndiegues@gsd.inesc-id.pt

Nuno Diegues 1/27

slide-5
SLIDE 5

Transactional Memory

Powerful abstraction for synchronization in shared memory
Executions equivalent to serial ones
Optimistic
Transactions may abort to ensure correctness

◮ Typically, more aborts than needed

slide-6
SLIDE 6

Problem

Linked List: head → A → D → E → ...

Three concurrent transactions run on the list.

T: insert B

◮ read head.next: A
◮ read A.next: D
◮ write B.next = D
◮ write A.next = B
◮ read-set: { head.next, A.next }
◮ write-set: { A.next }

U: remove E

◮ read head.next: A
◮ read A.next: D
◮ read D.next: E
◮ write D.next = E.next
◮ read-set: { head.next, A.next, D.next }
◮ write-set: { D.next }

RO: contains D?

◮ read head.next: A
◮ read A.next: D
◮ read-set: { head.next, A.next }

[Figure: the list with B inserted between A and D. RO and U both read A.next: D, while T writes A.next = B; each pair is linked by an rw edge. An X marks an abort.]

slide-18
SLIDE 18

Problem

To guarantee a given correctness level, a TM aborts transactions. Typical STMs use the following rule:

function commit(Transaction tx):
    for each ‹datum, version› ∈ tx.readSet do
        if not latestVersion(datum, version) then
            abort(tx)
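The rule above can be sketched in Python as follows. This is a minimal illustration, not the code of any particular STM: the `Store` class, the per-datum version counters, and the dictionary-based read and write sets are assumptions made for the example. It replays the linked-list scenario: T commits its write to A.next first, so the version of A.next recorded by U is stale and U aborts, even though U only changes D.next.

```python
class Store:
    """Hypothetical single-version store: each datum has a value and a version."""

    def __init__(self, data):
        self.values = dict(data)
        self.versions = {key: 0 for key in data}


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_set = {}   # datum -> version observed at first read
        self.write_set = {}  # datum -> value to install at commit

    def read(self, datum):
        if datum in self.write_set:
            return self.write_set[datum]
        self.read_set.setdefault(datum, self.store.versions[datum])
        return self.store.values[datum]

    def write(self, datum, value):
        self.write_set[datum] = value

    def commit(self):
        # Classic rule: abort if any read is no longer the latest version.
        for datum, version in self.read_set.items():
            if self.store.versions[datum] != version:
                return False  # abort
        for datum, value in self.write_set.items():
            self.store.values[datum] = value
            self.store.versions[datum] = self.store.versions.get(datum, 0) + 1
        return True


store = Store({"head.next": "A", "A.next": "D", "D.next": "E", "E.next": None})
t, u = Transaction(store), Transaction(store)

# T: insert B between A and D.
t.read("head.next")
successor = t.read("A.next")
t.write("B.next", successor)
t.write("A.next", "B")

# U: remove E (runs concurrently, before T commits).
u.read("head.next")
u.read("A.next")
u.read("D.next")
u.write("D.next", u.read("E.next"))

t_ok = t.commit()  # T commits first and bumps the version of A.next
u_ok = u.commit()  # U's read of A.next is now stale, so U aborts
```

Serializing U before T would have been a valid outcome here, which is the kind of spurious abort the talk targets.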

slide-21
SLIDE 21

Problem

Condition: Abort T if its reads are not up-to-date when it attempts to commit.

Serializability:

◮ Stale reads are a necessary condition for a violation
◮ But not a sufficient one

Deemed to be practical

◮ without being overly conservative (e.g., precluding all concurrency)

slide-22
SLIDE 22

Objective

[Figure: the linked-list example again. T writes A.next = B, while RO and U have read A.next: D (rw edges). In the first step an X marks the abort; in the second step the X is gone.]

slide-24
SLIDE 24

Objective

Lightweight minimization of spurious aborts:

◮ More restrictive abort condition
◮ Always read consistently
◮ Read-only transactions that never abort

slide-25
SLIDE 25

Outline

Problem and Motivation
Objective
Existing Work
Time-Warp
Evaluation

slide-26
SLIDE 26

Existing Work

Additional versions: fixed number in LSA [DISC06]
MV-Permissiveness: as many as needed in JVSTM [PPoPP11]
Permissiveness: AbortsAvoider [SPAA09]
Interval-Based: AVSTM [DISC08]

slide-28
SLIDE 28

Existing Work

Interval-Based: AVSTM [DISC08], TSTM [TPDS12], IR_VWC_P [ICA3PP11]

◮ bounds for serialization order
◮ refined with transaction execution
◮ imposed by concurrent commits
◮ choose one value in the final interval
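The interval idea can be illustrated with a small Python sketch. It is a generic illustration and does not reproduce AVSTM, TSTM, or IR_VWC_P: the `SerializationInterval` class, the integer timestamps, and the choice of the lowest value at commit are all assumptions. Each read narrows the bounds, and the transaction must abort once the interval is empty.

```python
import math


class SerializationInterval:
    """Hypothetical bounds [lower, upper] on a transaction's serialization point."""

    def __init__(self):
        self.lower = 0
        self.upper = math.inf

    def on_read(self, written_at, overwritten_at=None):
        # The version read must already exist at the serialization point...
        self.lower = max(self.lower, written_at)
        # ...and must not yet have been overwritten by a concurrent commit.
        if overwritten_at is not None:
            self.upper = min(self.upper, overwritten_at - 1)

    def is_empty(self):
        return self.lower > self.upper

    def choose(self):
        # Any value in the final interval would do; this sketch takes the lowest.
        if self.is_empty():
            raise RuntimeError("abort: no serialization point left")
        return self.lower


ok = SerializationInterval()
ok.on_read(written_at=2, overwritten_at=6)  # interval becomes [2, 5]
ok.on_read(written_at=4)                    # interval becomes [4, 5]
point = ok.choose()

bad = SerializationInterval()
bad.on_read(written_at=5)                    # interval becomes [5, inf]
bad.on_read(written_at=1, overwritten_at=3)  # interval becomes [5, 2]: empty
```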

slide-29
SLIDE 29

Interval-based Approach

Disadvantages:

◮ A read-only transaction may abort
◮ An update transaction may abort due to one miss
◮ Scalability issues on commit

slide-30
SLIDE 30

[Figure: two concurrent transactions, T1 and T2, drawn on a time axis and on a serialization axis. Operations shown: read x, read y, write y, read x, write x. An rw edge links T1 and T2. Labels in the final step: t=4, t=3, time-warp=2.]

Decouple serialization order from commit order

Time-warp commit: versions are produced with a past version number

slide-42
SLIDE 42

Abort Condition

When can we not apply this idea?

Look out for a specific structure:

Three transactions connected

◮ a triad

The link between all three

◮ the pivot

Abort if:

◮ Completes a triad
◮ Whose pivot time-warp commits

[Figure: triad of A, T and B. A: write y. T (pivot): read y, write x. B: read x. One rw edge links T and A on y, another links B and T on x.]

slide-43
SLIDE 43

Abort Condition

Necessary condition (tighter)
Still cheap enough to check

[Figure: the same triad of A, T (pivot) and B, with an additional wr edge on z (write z, read z).]
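The triad structure can be made concrete with a short Python sketch. It treats rw conflicts as directed edges and reports every transaction that has both an incoming and an outgoing edge. The function names and the edge direction (from the reader to the transaction that overwrites what it read) are assumptions of this sketch, not taken from the slides.

```python
def find_pivots(rw_edges):
    """Return transactions with both an incoming and an outgoing rw edge.

    rw_edges holds (reader, writer) pairs; the reader -> writer direction
    is an assumption of this sketch.
    """
    edges = list(rw_edges)
    sources = {reader for reader, _ in edges}
    targets = {writer for _, writer in edges}
    return sources & targets


def find_triads(rw_edges):
    """List (predecessor, pivot, successor) triples formed by two rw edges."""
    edges = list(rw_edges)
    return sorted(
        (first, pivot, last)
        for first, pivot in edges
        for middle, last in edges
        if pivot == middle and first != last
    )


# Triad from the slide: B read x (written by T), T read y (written by A).
triad_edges = [("B", "T"), ("T", "A")]
pivots = find_pivots(triad_edges)
triads = find_triads(triad_edges)

# A single rw edge, as in the two-transaction example, forms no triad.
no_pivots = find_pivots([("T1", "T2")])
```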

slide-50
SLIDE 50

How to Validate

Upon commit, transaction T performs:

Validate each write k

◮ Detect if some concurrent T′ read k
◮ If so, T′ witnessed that T did not exist
◮ We forbid T from time-warping

Validate each read k

◮ Detect if some concurrent T′ committed a new version to k
◮ If so, T must time-warp

Abort T if it must time-warp, but cannot do so

Semi-visible readers scheme

◮ Know that some transaction read, not which
◮ Write transactions amortize the cost during read validation
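The validation steps above can be summarized in a few lines of Python. This is a sketch under stated assumptions, not TWM's implementation: the `Tx` record and the two input sets are hypothetical, and a real STM would derive them from per-datum metadata such as the semi-visible reader marks.

```python
from dataclasses import dataclass, field


@dataclass
class Tx:
    name: str
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)


def validate(tx, read_by_concurrent, overwritten_by_concurrent):
    """Return 'commit', 'time-warp commit' or 'abort' for tx.

    read_by_concurrent: data read by some concurrent transaction
        (semi-visible: we know a read happened, not who performed it).
    overwritten_by_concurrent: data for which a concurrent transaction
        committed a new version.
    Both sets are hypothetical inputs for this sketch.
    """
    # Write validation: a concurrent reader of k witnessed that tx did not exist.
    can_time_warp = not (tx.write_set & read_by_concurrent)
    # Read validation: a newer committed version of k means tx must time-warp.
    must_time_warp = bool(tx.read_set & overwritten_by_concurrent)
    if must_time_warp and not can_time_warp:
        return "abort"
    return "time-warp commit" if must_time_warp else "commit"


# Pivot of a triad: it must time-warp (y was overwritten) but cannot (x was read).
pivot = validate(Tx("T", {"y"}, {"x"}), {"x"}, {"y"})
# Stale read only: serialize in the past instead of aborting.
warped = validate(Tx("T1", {"x"}, {"y"}), set(), {"x"})
# No conflicts at all: ordinary commit.
plain = validate(Tx("T2", {"a"}, {"b"}), set(), set())
```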

slide-57
SLIDE 57

Evaluation Study

Wide variety of benchmarks:

◮ Micro-benchmarks: skip-list
◮ Macro-benchmarks: STMBench7 and STAMP

STMs spanning the design space:

◮ NOrec: aimed at low thread count; single commit lock
◮ TL2: commit-time locking
◮ JVSTM: lock-free, multi-version
◮ AVSTM: lock-free, probabilistic permissive
◮ TWM: lock-free, multi-version, time-warp

slide-58
SLIDE 58

Evaluation Environment

4x AMD Opteron 6272: 64 cores
32GB RAM
Ubuntu 12.04
Oracle JVM 1.7
10 retries minimum

[Plot: throughput (1000 * txs/s) vs. threads (1, 4, 8, 16, 32, 64) for JVSTM, TL2, NOrec, AVSTM, TWM]

slide-59
SLIDE 59

Skip-List


slide-60
SLIDE 60

Evaluation: skip-list

100 thousand elements
25% modifications
1.8× speedup over AVSTM

[Plot "Speedup": throughput (1000 * txs/s) vs. threads (1, 4, 8, 16, 32, 64) for JVSTM, TL2, NOrec, AVSTM, TWM]

slide-61
SLIDE 61

Evaluation: skip-list

Overhead of instrumentation
Time-warp benefits from concurrency
AVSTM lags behind

[Plot "Speedup - up to 8 threads": throughput (1000 * txs/s) vs. threads (1, 4, 8) for JVSTM, TL2, NOrec, AVSTM, TWM]

slide-62
SLIDE 62

Evaluation: skip-list

Multi-version is not enough
Time-Warp is similar to AVSTM

[Plot "Abort Rate": aborted txs (%) vs. threads (1, 4, 8, 16, 32, 64) for JVSTM, TL2, NOrec, AVSTM, TWM]

slide-63
SLIDE 63

Evaluation: skip-list without contention

Conflict-free workload
Unveil overheads and contention points

[Plot "Speedup": throughput (1000 * txs/s) vs. threads (1, 4, 8, 16, 32, 64) for JVSTM, TL2, NOrec, AVSTM, TWM]

slide-65
SLIDE 65

Evaluation: skip-list without contention

Increased parallelism bottlenecks on NOrec commit
TWM and AVSTM have most overheads
TWM remains close to JVSTM, both lock-free and multi-versioned

[Plot "Overhead breakdown": time (microseconds) split into read, commit, readSet-val and writeSet-val, per STM (TL2, JVSTM, TWM, AVSTM, NOrec) and thread count (1, 4, 8, 16, 32, 64)]

slide-66
SLIDE 66

STAMP


slide-68
SLIDE 68

Evaluation: STAMP summary

Always better than AVSTM
But difference is smaller with more threads
Takes some concurrency to improve

[Plot "Geometric Mean of Speedup": speedup of TWM relative to each STM (JVSTM, TL2, NOrec, AVSTM) at 1, 4, 8, 16, 32 and 64 threads]

slide-69
SLIDE 69

Evaluation: STAMP Kmeans

[Plot "Speedup Kmeans": speedup vs. threads (1, 4, 8, 16, 32, 64) for JVSTM, TL2, NOrec, AVSTM, TWM]

slide-70
SLIDE 70

Evaluation: STAMP Vacation

[Plot "Speedup Vacation": speedup vs. threads (1, 4, 8, 16, 32, 64) for JVSTM, TL2, NOrec, AVSTM, TWM]

slide-71
SLIDE 71

Evaluation: STAMP abort rate 1/2

Average per benchmark (abort rate, %)

STM     genome  intruder  kmeans-l  kmeans-h  labyrinth  ssca2  vac-l  vac-h
TWM        3.8       3.8       1.4       4.2        8.8   10.5    6.4   17.8
JVSTM     15.4       3.2       1.6       4.9       12.3   11.3   12.1   41.1
TL2       12.1       4.8       3.8       3.4       13.8   11.7   10.0   41.4
NOrec     21.1       6.0       3.8       6.4       27.6   14.9   19.9   55.0
AVSTM     13.0       3.5       2.6       4.8       10.4   11.5    9.4   18.9

slide-73
SLIDE 73

Evaluation: STAMP abort rate 2/2

Average per threads (abort rate, %)

STM     4 threads  8 threads  16 threads  32 threads  64 threads
TWM           1.2        4.4         6.6         9.9        15.7
JVSTM         1.8        7.0        10.2        15.7        21.2
TL2           2.6        6.5        11.4        16.1        20.9
NOrec         3.4        9.6        18.6        24.9        34.0
AVSTM         2.5        5.5         8.6        12.7        17.6
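To read the per-thread table more easily, the short Python sketch below computes how much lower TWM's average abort rate is than each other STM's at 64 threads. The numbers are copied from the table. The helper name `reduction_vs` and the choice of "relative reduction" as the metric are assumptions made for illustration, and the values are assumed to be percentages.

```python
threads = [4, 8, 16, 32, 64]

# Average abort rates per thread count, copied from the table above.
abort_rate = {
    "TWM":   [1.2, 4.4, 6.6, 9.9, 15.7],
    "JVSTM": [1.8, 7.0, 10.2, 15.7, 21.2],
    "TL2":   [2.6, 6.5, 11.4, 16.1, 20.9],
    "NOrec": [3.4, 9.6, 18.6, 24.9, 34.0],
    "AVSTM": [2.5, 5.5, 8.6, 12.7, 17.6],
}


def reduction_vs(baseline, n_threads):
    """Relative reduction of TWM's abort rate with respect to a baseline STM."""
    i = threads.index(n_threads)
    return 1.0 - abort_rate["TWM"][i] / abort_rate[baseline][i]


# Relative reductions at 64 threads, in percent, rounded to one decimal.
reductions_64 = {
    stm: round(100 * reduction_vs(stm, 64), 1)
    for stm in abort_rate
    if stm != "TWM"
}

for stm, value in reductions_64.items():
    print(f"TWM vs {stm} at 64 threads: {value}% fewer aborts")
```

At 64 threads the gap is widest against NOrec and narrowest against AVSTM, consistent with the observation that Time-Warp's abort rate is similar to AVSTM's.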

slide-74
SLIDE 74

Summary

Time-Warp:

◮ Also allow write transactions to commit in the past
◮ Efficient validation rules that scale
◮ Average improvement 65% in high concurrency

Coming next:

◮ Adaptive validation
◮ Hybridize with Intel TSX
◮ Time-Warp in single-version STMs

slide-75
SLIDE 75

Thank You

Questions?
