Concurrent Copying Garbage Collection



slide-1
SLIDE 1

Concurrent Copying Garbage Collection

Filip Pizlo, Erez Petrank, Bjarne Steensgaard
Purdue, Technion/Microsoft, Microsoft
PLDI’08 - Tucson, AZ

1

slide-2
SLIDE 2

Introduction

  • RTGC (real-time garbage collection) is gaining acceptance as an alternative to manual memory management for RT applications
  • But:
      • Multiprocessor support is problematic
      • ... especially if defragmentation is required.

2

slide-3
SLIDE 3
  • What we deliver:
      • Compaction.
      • Concurrency.
      • Lock freedom.
      • Efficiency.

3

slide-4
SLIDE 4

Why is it hard?

  • At some point during defragmentation there will be two copies of the same object.
  • Then: which version of the object should the mutator access?

[Diagram labels: From, To, Mutator, The Heap, ?]

4

slide-7
SLIDE 7

[Diagram labels: Original Object (From), Object Copy (To), Mutator, Field, Already Copied]

5

slide-10
SLIDE 10

[Diagram labels: Original Object (From), Object Copy (To), Mutator, Already Copied, Field, X]

6

slide-11
SLIDE 11

[Diagram labels: Original Object (From), Object Copy (To), Mutator, Already Copied, Field, X]

But: how do you know when to switch from the original to the to-space object?

6

slide-12
SLIDE 12

[Diagram labels: Original Object (From), Object Copy (To), Mutator, Already Copied, Field, X]

Immediately after you check which version of the field to use, the copier may advance past it.
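
The race can be made concrete with a small deterministic simulation. This is only a sketch under assumptions: the two-list heap, the `copier_cursor` variable, and the hand-scheduled interleaving are illustrative names and choices, not something taken from the talk.

```python
# Hypothetical model: an object is a list of fields. The copier copies
# fields in index order; `copier_cursor` counts the fields copied so far.
from_obj = [10, 20, 30]
to_obj = [None, None, None]
copier_cursor = 0

def copier_step():
    """Copy the next field from the original to the to-space copy."""
    global copier_cursor
    to_obj[copier_cursor] = from_obj[copier_cursor]
    copier_cursor += 1

def choose_version(index):
    """Naive check: use to-space only if the field was already copied."""
    return to_obj if index < copier_cursor else from_obj

# Interleaving, scheduled by hand to expose the race on field 0:
target = choose_version(0)   # mutator checks: field 0 is not copied yet
copier_step()                # copier advances past field 0
target[0] = 99               # mutator writes to the now-stale original

lost_update = (from_obj[0] == 99 and to_obj[0] == 10)
print("update lost in the to-space copy:", lost_update)
```

The check and the write are two separate steps, so no ordering of them alone can rule out the copier slipping in between.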

6

slide-13
SLIDE 13
  • Previous techniques:
      • Hudson & Moss ’01, Cheng & Blelloch ’01
      • Stopless (Pizlo et al. ’07)
  • Our new techniques:
      • Chicken
      • Clover

7

slide-14
SLIDE 14
  • Chicken:
      • Really fast
      • Does not guarantee that all objects are copied
  • Clover:
      • Probabilistic!
      • Guarantees that all objects get copied

8

slide-15
SLIDE 15
  • Both Chicken and Clover are simple to implement
      • (simpler, we argue, than any previously proposed concurrent copying technique).
  • Both Chicken and Clover preserve the underlying hardware memory model - no JMM tricks are necessary.

9

slide-16
SLIDE 16

Chicken

10

slide-17
SLIDE 17
  • Design Principles:
      • Use the cheapest barriers possible.
      • Don’t guarantee that objects tagged for copying will actually be copied.
      • Anytime the mutator writes to an object as it is being copied, abort the copying of the respective object.

11

slide-22
SLIDE 22

Use a Brooks-style forwarding pointer.

To copy the object, first “tag” the forwarding pointer (set a low-order bit).

[Diagram label: Mutator]

The mutator writes by first atomically clearing the tag, and then performing the write.

12

slide-24
SLIDE 24

If the object is already copied, the mutator writes to the new object via the forwarding pointer.

[Diagram label: Mutator]

13

slide-27
SLIDE 27

Write barrier

write(object, offset, value) {
    if object is tagged
        CAS(object.forward, tagged → untagged)   // clears the tag bit that we stole from the Brooks forwarding pointer
    object.forward[offset] = value               // writes to the field via the Brooks forwarding pointer
}
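
The barrier above can be sketched in Python as follows. This is a minimal model under stated assumptions, not the authors' implementation: Python has no hardware CAS, so the hypothetical `AtomicCell` class simulates one with a lock, and the forwarding word is modeled as a `(target, tagged)` tuple instead of a pointer with a stolen low-order bit. The sketch also ignores how the collector synchronizes tagging with barriers that are already in flight.

```python
import threading

class AtomicCell:
    """Stand-in for a CAS-able machine word (a lock simulates the CAS)."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def cas(self, expected, new):
        """Store `new` only if the cell still holds `expected`."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

class Obj:
    def __init__(self, fields):
        self.fields = list(fields)
        # Forwarding word modeled as (target, tagged); initially the
        # object forwards to itself and is untagged.
        self.forward = AtomicCell((self, False))

def read(obj, offset):
    """Brooks-style read barrier: always follow the forwarding pointer."""
    target, _tagged = obj.forward.load()
    return target.fields[offset]

def write(obj, offset, value):
    """Write barrier following the slide's pseudocode."""
    target, tagged = obj.forward.load()
    if tagged:
        # Slow path: clear the tag, which aborts the pending copy.
        obj.forward.cas((target, True), (target, False))
    # Re-read: the collector may have installed the copy in the meantime.
    target, _tagged = obj.forward.load()
    target.fields[offset] = value

# Usage: the collector tags an object, then the mutator writes to it.
o = Obj([1, 2])
o.forward.cas((o, False), (o, True))
write(o, 0, 42)
print(read(o, 0), o.forward.load()[1])
```

The fast path is a single branch on the tag; only a tagged object pays for the CAS.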

14

slide-32
SLIDE 32

The collector starts by tagging objects that it wishes to copy.

The object is then copied.

To get the mutator to use the new object, we atomically remove the tag and set the forwarding pointer.

This will fail if the mutator has written to the object!
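
The collector side (tag, copy, then install with a CAS) might look like the sketch below. It is written under assumptions: `AtomicCell` simulates CAS with a lock, the forwarding word is a `(target, tagged)` tuple, and the `mutator_action` hook is a hypothetical device for injecting a mutator write between the copy and the final CAS so the abort can be shown deterministically.

```python
import threading

class AtomicCell:
    """Stand-in for a CAS-able word; a lock simulates hardware CAS."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def cas(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

class Obj:
    def __init__(self, fields):
        self.fields = list(fields)
        self.forward = AtomicCell((self, False))  # (target, tagged)

def mutator_write(obj, offset, value):
    """Write barrier: clear the tag if set, then write via forwarding."""
    target, tagged = obj.forward.load()
    if tagged:
        obj.forward.cas((target, True), (target, False))
    obj.forward.load()[0].fields[offset] = value

def try_copy(obj, mutator_action=None):
    """Tag, copy, then try to install the copy. Returns the copy or None."""
    # 1. Tag the object (fails if it is already tagged or forwarded).
    if not obj.forward.cas((obj, False), (obj, True)):
        return None
    # 2. Copy the object.
    copy = Obj(obj.fields)
    if mutator_action is not None:
        mutator_action()  # simulate a concurrent mutator write
    # 3. Atomically remove the tag and set the forwarding pointer.
    if obj.forward.cas((obj, True), (copy, False)):
        return copy
    return None  # a mutator write cleared the tag, so the copy is aborted

quiet = Obj([1, 2])
copied = try_copy(quiet)

busy = Obj([1, 2])
aborted = try_copy(busy, lambda: mutator_write(busy, 0, 99))
```

In the `busy` case the mutator's write lands in the original object and the stale copy is simply discarded, which is why no update is lost.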

15

slide-33
SLIDE 33
  • Why this is good:
      • Read barrier is a wait-free Brooks barrier
      • Write barrier is a branch on the fast path, and a branch+CAS on the slow path (either way it’s wait-free)
      • Copying is simple and fast
      • In practice only ~1% of object copying gets aborted.
      • Abort rates can be easily reduced (see paper).

16

slide-36
SLIDE 36
  • Things that could be improved:
      • Eliminate object copy abort entirely.
      • Segue into Clover...

17

slide-37
SLIDE 37

Clover

18

slide-38
SLIDE 38

Clover

  • What if each field had a status field indicating whether the field has been copied?
  • And what if you could CAS the field’s value, as well as the status field, in one atomic, lock-free operation?

19

slide-41
SLIDE 41

[Diagram labels: Mutator, Status, Field, Not Copied, Atomically]

The idea is to allow the mutator to always write to the original object, and to have such writes force the collector to recopy the field at a later time.

20

slide-44
SLIDE 44

[Diagram labels: Mutator, Status, Field, Copied]

If the field is already copied, access to-space.

21

slide-49
SLIDE 49

[Diagram labels: Collector, Status, Field, Not Copied, Atomically, FIELD COPIED]

The collector repeatedly attempts to copy the field and assert it as copied until it does so without the field’s value changing.

22

slide-50
SLIDE 50

Problem: cannot CAS two separate fields in hardware

23

slide-51
SLIDE 51

If you could steal a bit in the field, this would be easy...

24

slide-52
SLIDE 52

But where do you get the bit?

Easy for reference fields - but really hard for integer fields!

25

slide-53
SLIDE 53

Use a random number!

I.e. we steal 2^-128 bits!

26

slide-54
SLIDE 54

Let R = random bits

R can be huge - it can be the largest CAS-able word - 128 bits on Intel!

27

slide-55
SLIDE 55
  • The random number is used to mark fields as copied.
  • This is correct if the mutator does not use R.
  • But R is selected at random, independently of the program - with R having 128 bits, the probability of “failure” is 2^-128.

28

slide-56
SLIDE 56
  • Put this in perspective:
      • Probability that a person dies from a car crash in a single day in the US is higher than 1/300,000
      • Even if we stored a random value into a field once a nanosecond since the Big Bang, the probability of ever colliding with Clover would be 1/1,000,000,000,000
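
The order of magnitude of the Big Bang estimate can be checked with a few lines of arithmetic. The sketch assumes a universe age of roughly 13.8 billion years (a figure not given on the slide) and that every stored value is uniformly random and independent of R; it uses a simple union bound over all stores.

```python
# Assumption: the universe is about 13.8 billion years old.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_YEARS = 13.8e9

# One random store per nanosecond since the Big Bang.
stores = AGE_YEARS * SECONDS_PER_YEAR * 1e9

# Chance that a single random 128-bit value equals R.
p_single = 2.0 ** -128

# Union bound: probability that any of the stores collides with R.
p_any = stores * p_single

print(f"stores since the Big Bang: {stores:.2e}")
print(f"collision probability bound: {p_any:.2e}")
```

Under these assumptions the bound comes out on the order of 10^-12, which agrees with the one-in-a-trillion figure quoted above.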

29

slide-57
SLIDE 57

So - how does it work?

30

slide-60
SLIDE 60

[Diagram label: Mutator]

The mutator writes to from-space using a CAS that asserts that the field is not copied (does not equal R).

CAS ¬R→v

31

slide-63
SLIDE 63

[Diagram label: Mutator]

If the CAS fails, the mutator just writes to to-space.

32

slide-68
SLIDE 68

[Diagram label: Collector]

The collector repeatedly attempts to copy the field and assert it as copied until it does so without the field’s value changing.

CAS v→R
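
Putting the mutator and collector steps together, a Clover-style field protocol might be sketched as below. Everything here is an illustrative assumption rather than the authors' code: `AtomicCell` simulates CAS with a lock, `CloverObject` and the function names are hypothetical, and the slide's "if the CAS fails, write to to-space" is expressed as a retry loop so that the sketch also tolerates more than one mutator. The sketch assumes the program never stores R itself.

```python
import secrets
import threading

R = secrets.randbits(128)  # random marker meaning "this field is copied"

class AtomicCell:
    """Stand-in for a CAS-able word; a lock simulates hardware CAS."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def store(self, value):
        with self._lock:
            self._value = value

    def cas(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

class CloverObject:
    def __init__(self, fields):
        self.from_space = [AtomicCell(v) for v in fields]
        self.to_space = [AtomicCell(None) for _ in fields]

def mutator_write(obj, i, value):
    """Write with a CAS asserting the field is not R (CAS ¬R→v)."""
    while True:
        old = obj.from_space[i].load()
        if old == R:
            obj.to_space[i].store(value)  # field already copied
            return
        if obj.from_space[i].cas(old, value):
            return
        # CAS failed: the field changed under us, so look again.

def mutator_read(obj, i):
    """Read from-space unless the field is marked as copied."""
    value = obj.from_space[i].load()
    return obj.to_space[i].load() if value == R else value

def collector_copy_field(obj, i):
    """Copy, then CAS v→R; retry if the value changed in between."""
    while True:
        value = obj.from_space[i].load()
        obj.to_space[i].store(value)
        if obj.from_space[i].cas(value, R):
            return

obj = CloverObject([1, 2])
mutator_write(obj, 0, 7)        # lands in from-space
collector_copy_field(obj, 0)    # copies 7, then marks the field with R
mutator_write(obj, 0, 8)        # sees R, so it writes to to-space
```

Because the marker lives in the field itself, a single-word CAS covers both the value and the "copied" status, which is the point of choosing R at random.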

33

slide-69
SLIDE 69
  • What you just saw is a probabilistically correct concurrent copying algorithm.
  • But we can:
      • Make the algorithm correct but probabilistically lock-free by detecting when the user uses R.

34

slide-70
SLIDE 70

Implementation

35

slide-71
SLIDE 71
  • Chicken and Clover are implemented in the same infrastructure as Stopless (ISMM’07)
  • We use the Microsoft Bartok Research Compiler, and extend the lock-free concurrent mark-sweep collector.
  • We use Path Specialization (ISMM’08) to optimize barrier performance.

36

slide-72
SLIDE 72

Results

37

slide-73
SLIDE 73

Summary of Results

  • Both schemes have ~20% throughput overhead
  • Clover leads to a ~3x slow-down when executing with full barriers
  • Chicken has almost no slow-down.

38

slide-74
SLIDE 74

Detail: throughput

  • MSR benchmark suite (four internal PL-type programs written in C#, VB, and C++, plus four traditional benchmarks ported to .NET)
  • Compare concurrent mark-sweep (CMS), Stopless (ISMM’07), Chicken, and Clover

39

slide-75
SLIDE 75

[Figure: execution times relative to the non-copying concurrent collector (axis from 0.8 to 1.6) for sat, lcsc, zing, Bartok, go, othello, xlisp, crafty, and the geometric mean. Series: Stopless5, Clover5, Chicken5.]

40

slide-76
SLIDE 76

Detail: scalability

  • SpecJBB2000 ported to C# using the Microsoft Visual Studio Java to C# converter
  • Compare CMS, Stopless, Clover, and Chicken

41

slide-77
SLIDE 77

[Figure: JBB transactions per second (10000 to 60000) versus number of warehouses (1 to 6). Series: Concurrent MS, STOPLESS, CLOVER, CHICKEN.]

42

slide-78
SLIDE 78

Detail: responsiveness

  • Two benchmarks:
      • Microbenchmark measuring responsiveness for short-running interrupt handlers
      • Our JBB port (measure transaction time distribution)

43

slide-79
SLIDE 79
  • For the Interrupt Microbenchmark we measure:
      • concurrent mark-sweep (see paper)
      • Stopless (see paper)
      • Clover
      • Chicken

44

slide-83
SLIDE 83

Interrupts: Clover

[Figure: distribution of interrupt response times in microseconds; axis ticks 5, 10, 15, 20 and 1, 100, 10^4, 10^6. Annotations: Clover outliers; OS outliers, visible in C code.]

Other RTGCs, like Metronome, would have a large peak well past the 200 microsecond mark.

45

slide-86
SLIDE 86

Interrupts: Clover

[Figure: distribution of interrupt response times in microseconds; axis ticks 5, 10, 15, 20 and 1, 100, 10^4, 10^6. Annotations: Chicken outlier; OS outliers, visible in C code.]

46

slide-87
SLIDE 87
  • For JBB we measure:
      • stop-the-world mark-sweep (see paper)
      • Stopless (see paper)
      • Clover
      • Chicken

47

slide-88
SLIDE 88

JBB: Clover

Worst case: 3ms

[Figure: JBB transaction time distribution for Clover; axis ticks 20, 40, 60, 80 and 1, 10, 100, 1000, 10^4, 10^5, 10^6.]

48

slide-89
SLIDE 89

JBB: Chicken

Worst case: 1ms

[Figure: JBB transaction time distribution for Chicken; axis ticks 20, 40, 60, 80 and 1, 10, 100, 1000, 10^4, 10^5, 10^6.]

49

slide-90
SLIDE 90

Summary

  • Presented two new concurrent copying strategies - one that is very light-weight, and another with strong (but probabilistic!) guarantees.
  • Both are simpler than previous techniques.
  • Both provide good throughput and responsiveness.

50

slide-91
SLIDE 91

Questions

51