On the design space of Parallel Nesting, by Nuno Diegues and João Cachopo



SLIDE 1

On the design space of Parallel Nesting

Nuno Diegues, João Cachopo

{nmld, joao.cachopo}@ist.utl.pt, INESC-ID/Technical University of Lisbon

July 19, 2012

SLIDE 3

Introduction

Selling point of transactional memory (TM): Composability

Parallel Nesting

SLIDE 4

Time complexity analysis may be deceiving in TMs

SLIDE 6

Outline

Compare three parallel nesting approaches:

1. JVSTM ←
2. NesTM [1]
3. PNSTM [2]

[1] W. Baek, N. Bronson, C. Kozyrakis, and K. Olukotun. Implementing and evaluating nested parallel transactions in software transactional memory. In SPAA '10.

[2] J. Barreto, A. Dragojević, P. Ferreira, R. Guerraoui, and M. Kapalka. Leveraging parallel nesting in transactional memory. In PPoPP '10.

SLIDE 9

Worst-case complexities - JVSTM

          JVSTM
read      O(maxDepth)
write     O(1)
commit    O(r + children)

[Figure: transaction tree with nodes A, B, C, D, E, annotated with maxDepth. A second view labels two nodes "committed" and marks them as "children".]

SLIDE 11

Worst-case complexities - NesTM

          JVSTM             NesTM
read      O(maxDepth)       O(1)
write     O(1)              O(txDepth)
commit    O(r + children)   O(r + w)

[Figure: transaction tree with nodes A, B, C, D, E, annotated with txDepth.]

SLIDE 13

Worst-case complexities

          JVSTM             NesTM        PNSTM
read      O(maxDepth)       O(1)         O(1)
write     O(1)              O(txDepth)   O(1)
commit    O(r + children)   O(r + w)     O(1)

Best one?
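The bounds in the table can be turned into a small calculator to see how each design scales with its parameters. This is only a sketch: it treats every O(·) term as an exact unit count, and the reading of r and w as read-set and write-set sizes, the function name and the sample values are assumptions for illustration, not figures from the talk.

```python
def worst_case_costs(max_depth, tx_depth, r, w, children):
    """Return the table's worst-case bounds as unit counts (constants ignored)."""
    return {
        "JVSTM": {"read": max_depth, "write": 1, "commit": r + children},
        "NesTM": {"read": 1, "write": tx_depth, "commit": r + w},
        "PNSTM": {"read": 1, "write": 1, "commit": 1},
    }

# Hypothetical workload parameters, chosen only for illustration.
costs = worst_case_costs(max_depth=8, tx_depth=8, r=100, w=10, children=3)
for stm, ops in costs.items():
    print(stm, ops)
```

Under this model PNSTM has the lowest count for every operation, which is what prompts the "Best one?" question.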

SLIDE 17

Practical comparison

  • STMBench7, running a given number of transactions
  • Implementation of the STMs
  • Same API
  • 48-core machine

SLIDE 19

STMBench7

[Plot: throughput (txs/sec) versus # threads, given as tops(nested): 1(1), 1(2), 1(3), 2(3), 4(3), 8(3), 16(3). Series: jvstm, nestm, pnstm.]

5 and 15 times with 48 threads/parallel nested

SLIDE 20

STMBench7 - Large depth count

[Plot: throughput (txs/sec) versus nesting depth (1, 8, 32, 128). Series: jvstm, nestm, pnstm.]

SLIDE 21

Discussion

What is causing this?

SLIDE 22

Complexities of the fast-paths

          JVSTM   NesTM   PNSTM
read      O(1)    O(1)    O(1)
write     O(1)    O(1)    O(1)

SLIDE 24

Fast-paths occurrence

          Fast-path   Slow-path   Time (µs)
JVSTM     0.99        0.01        1046
NesTM     0.39        0.61        5200
PNSTM     0.39        0.61        7357

SLIDE 25

Conflicts detected

          Conflicts
JVSTM     845
NesTM     1627
PNSTM     84496
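To put the measurements side by side, the reported times and conflict counts can be expressed as multiples of the JVSTM values. The sketch below only does that arithmetic on the numbers shown on the "Fast-paths occurrence" and "Conflicts detected" slides; the helper name is an assumption for illustration.

```python
# Figures copied from the "Fast-paths occurrence" and "Conflicts detected" slides.
time_us = {"JVSTM": 1046, "NesTM": 5200, "PNSTM": 7357}
conflicts = {"JVSTM": 845, "NesTM": 1627, "PNSTM": 84496}

def relative_to(baseline, table):
    """Express each entry of `table` as a multiple of the baseline's value."""
    return {name: value / table[baseline] for name, value in table.items()}

time_ratio = relative_to("JVSTM", time_us)
conflict_ratio = relative_to("JVSTM", conflicts)
for name in time_us:
    print(f"{name}: {time_ratio[name]:.2f}x time, {conflict_ratio[name]:.2f}x conflicts")
```

On these numbers NesTM takes roughly 5 times and PNSTM roughly 7 times the JVSTM time, while PNSTM detects about 100 times as many conflicts as JVSTM.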

SLIDE 27

Conflict detection

          JVSTM             NesTM   PNSTM
r-r       -                 -       yes
r-w       yes               yes     yes
w-w       yes (if nested)   yes     yes

Cheaper complexity bounds, more conflicts detected?

SLIDE 28

Summary

  • Parallel nesting design is coupled with the baseline TM
  • Complexity analysis may be deceiving
  • Average case and conflict detection also matter

SLIDE 29

Thank you

Questions?

SLIDES 30-38

PNSTM

[Animated figure: transaction tree (A, then B and C, then D), the access stack of variable X (indexes 0 1 2 3), and the pool of free bitnums (1 2 3).]

  • A starts: bn: ?, then bn: 0
  • TA reads X; access stack of X: TA (Ok)
  • TA spawns two children, B and C, which take bn: 1 and bn: 2
  • TB reads X; access stack of X: TB, TA (Ok, Ok)
  • TB spawns a child, D, which takes bn: 3
  • TD reads X; access stack of X: TD, TB, TA (Conflict, Ok, Ok)

SLIDES 39-41

NesTM

[Animated figure: the global clock (a timestamp), variable X with the tid of its owner, and one box per transaction showing its tid and ts.]

  • T1 starts (tid: 1, ts: 0); global clock: 0
  • T1 writes to X: Ok; variable X: tid 1
  • T1 spawns two children (tid: 2 and tid: 3, both ts: 0)

SLIDE 42

NesTM - read operation

[State: global clock: 0; variable X: tid 1; transactions tid 1, tid 2, tid 3, all with ts: 0.]

  • T3 reads X; X is added to its read set (RS: X)

SLIDES 43-47

NesTM - write operation

[State: global clock: 0; variable X: tid 1; transactions tid 1, tid 2, tid 3 (RS: X), all with ts: 0.]

  • T3 spawns a child (tid: 4, ts: 0)
  • T4 writes to X:
      - T4 did not read X
      - T3 has RS: X, and X's timestamp ≤ T3's timestamp
      - T1 is the previous owner: stop
      - Ok; variable X: tid 4
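The walk-through above suggests why a NesTM write is O(txDepth): the writer visits itself and its ancestors until it reaches the previous owner of the variable, checking any read of that variable along the way. The following is a minimal sketch of that walk only. The class and function names, the data layout and the convention of returning False on a conflict are assumptions made for illustration; this is not NesTM's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Tx:
    tid: int
    ts: int                          # timestamp taken when the transaction started
    parent: Optional["Tx"] = None
    read_set: set = field(default_factory=set)

@dataclass
class Var:
    name: str
    ts: int = 0                      # timestamp of the variable's current version
    owner: Optional[Tx] = None       # transaction that currently owns the variable

def try_write(tx: Tx, var: Var) -> bool:
    """Walk from `tx` up to the previous owner; the walk is bounded by tx's depth."""
    node = tx
    while node is not None and node is not var.owner:
        # A transaction on the path that read the variable is only valid
        # if the variable's timestamp does not exceed its own timestamp.
        if var.name in node.read_set and var.ts > node.ts:
            return False             # conflict: the caller would abort
        node = node.parent
    if var.owner is not None and node is None:
        return False                 # the owner is not on the ancestor path
    var.owner = tx                   # take over ownership
    return True

# Replay of the slides: T1 owns X, T3 has read X, T4 (child of T3) writes X.
t1 = Tx(1, 0)
t2 = Tx(2, 0, t1)
t3 = Tx(3, 0, t1, {"X"})
t4 = Tx(4, 0, t3)
x = Var("X", ts=0, owner=t1)
print(try_write(t4, x), x.owner.tid)
```

In this replay the walk visits T4 (no read of X), then T3 (read of X, timestamps compatible), and stops at T1, matching the order of the annotations on the slides.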

SLIDES 48-49

NesTM - commit operation

[State: transactions tid 1, tid 2, tid 3 (RS: X), tid 4, all with ts: 0.]

  • T4 prepares to commit; global clock: 0; variable X: tid 4
  • T4 commits: committed; global clock: 1; variable X: tid 3, ts 1

SLIDES 50-53

JVSTM - write operation

[Animated figure: transactions A, B, C, D with ownership records Orec1 to Orec4, and the orecs recorded at variable X, newest first.]

  • TA (Orec1) writes to X; variable X: ORec1
  • TA spawns two children, B (Orec2) and C (Orec3)
  • TC writes to X; variable X: ORec3, ORec1
  • TD (Orec4) is spawned and writes to X; variable X: ORec4, ORec3, ORec1

SLIDES 54-58

JVSTM - read operation

[State: variable X: ORec4, ORec3, ORec1; transactions A (Orec1), B (Orec2), C (Orec3), D (Orec4).]

  • TD reads X
  • TB reads X (shown over four animation steps)
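The repeated "TB reads X" steps suggest how a nested read can cost O(maxDepth): the reader scans the writes recorded for X, newest first, until it finds one made by itself or by one of its ancestors. The sketch below illustrates that scan under stated assumptions: the tree shape (D as a child of C), the names and the list layout are hypothetical, and the real JVSTM tracks more state (versions, commit status) than is shown here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class Tx:
    name: str
    parent: Optional["Tx"] = None

    def is_self_or_descendant_of(self, other: "Tx") -> bool:
        """True if `other` is this transaction or one of its ancestors."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False

def read(tx, tentative_writes, committed_value=None):
    """Scan (owner, value) pairs, newest first; return (value, entries visited)."""
    steps = 0
    for owner, value in tentative_writes:
        steps += 1
        if tx.is_self_or_descendant_of(owner):
            return value, steps
    return committed_value, steps

# Assumed tree: A is the root, B and C are its children, D is a child of C.
a = Tx("A")
b = Tx("B", a)
c = Tx("C", a)
d = Tx("D", c)
x_writes = [(d, "by D"), (c, "by C"), (a, "by A")]   # ORec4, ORec3, ORec1
print(read(d, x_writes))   # D finds its own write at the head of the list
print(read(b, x_writes))   # B skips the writes of D and C, then finds A's write
```

The read by D corresponds to the fast path (one entry visited), while the read by B has to walk past every entry written outside its own ancestor chain.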

SLIDES 59-60

JVSTM - commit operation

[State: variable X: ORec4, ORec3, ORec1; transactions A (Orec1), B (Orec2), C (Orec3), D (Orec4).]

  • TD commits; the figure shows "ts: 0, committed" for Orec4
  • TC commits; the figure shows "ts: 0, committed" for Orec4 and Orec3

SLIDE 61

Evaluation - Top-level txs only

[Plot: throughput (txs/sec) versus # threads (1, 2, 3, 6, 12, 24, 48). Series: jvstm, nestm, pnstm.]