Multicore OCaml GC - KC Sivaramakrishnan, Stephen Dolan, OCaml Labs, University of Cambridge



slide-1
SLIDE 1

Multicore OCaml GC

KC Sivaramakrishnan, Stephen Dolan

OCaml Labs University of Cambridge

slide-2
SLIDE 2

Multicore OCaml

slide-3
SLIDE 3
  • Adds native support for concurrency and parallelism in OCaml

Multicore OCaml

slide-4
SLIDE 4
  • Adds native support for concurrency and parallelism in OCaml
  • Fibers for concurrency, Domains for parallelism

✦ M fibers over N domains
✦ M >>> N

Multicore OCaml

slide-5
SLIDE 5
  • Adds native support for concurrency and parallelism in OCaml
  • Fibers for concurrency, Domains for parallelism

✦ M fibers over N domains
✦ M >>> N

  • This talk

✦ Overview of multicore GC with a few deep dives.

Multicore OCaml

slide-7
SLIDE 7

Outline

  • Difficult to appreciate GC choices in isolation
  • Begin with a GC for a sequential purely functional language

✦ Gradually add mutations, parallelism and concurrency

slide-8
SLIDE 8

B

Purely functional

stack registers heap

A C D E

slide-9
SLIDE 9

B

Purely functional

  • Stop-the-world mark and sweep

stack registers heap

A C D E

slide-10
SLIDE 10

B

Purely functional

  • Stop-the-world mark and sweep
  • Tri-color marking

✦ States: White (Unmarked), Grey (Marking), Black (Marked)

stack registers heap

A C D E

slide-11
SLIDE 11

B

Purely functional

  • Stop-the-world mark and sweep
  • Tri-color marking

✦ States: White (Unmarked), Grey (Marking), Black (Marked)

  • White —> Grey (mark stack) —> Black

stack registers heap

A C B D E B A

mark stack

slide-12
SLIDE 12

B

Purely functional

  • Stop-the-world mark and sweep
  • Tri-color marking

✦ States: White (Unmarked), Grey (Marking), Black (Marked)

  • White —> Grey (mark stack) —> Black
  • Mark stack is empty => done

stack registers heap

A C B D E A

mark stack

B D

slide-13
SLIDE 13

B

Purely functional

  • Stop-the-world mark and sweep
  • Tri-color marking

✦ States: White (Unmarked), Grey (Marking), Black (Marked)

  • White —> Grey (mark stack) —> Black
  • Mark stack is empty => done
  • Tri-color invariant: No black object points to a white object

stack registers heap

A C B D E A

mark stack

B D
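The marking scheme on this slide can be sketched in OCaml. This is a minimal model under assumptions, not the runtime's code: the `obj` record, `make`, `mark` and `sweep` are hypothetical names, and the heap is just a list of objects.

```ocaml
(* Hypothetical object model: names and fields are illustrative only. *)
type color = White | Grey | Black

type obj = {
  name : string;
  mutable color : color;
  mutable fields : obj list;
}

let make name = { name; color = White; fields = [] }

(* Mark everything reachable from [roots]. Grey objects live on the mark
   stack; an object turns Black once all of its fields have been shaded. *)
let mark roots =
  let stack = Stack.create () in
  let shade o =
    if o.color = White then begin
      o.color <- Grey;
      Stack.push o stack
    end
  in
  List.iter shade roots;
  while not (Stack.is_empty stack) do
    let o = Stack.pop stack in
    List.iter shade o.fields;
    o.color <- Black
  done

(* Sweep: whatever is still White is garbage; survivors are reset to
   White for the next cycle. Returns (live, dead). *)
let sweep heap =
  let live, dead = List.partition (fun o -> o.color = Black) heap in
  List.iter (fun o -> o.color <- White) live;
  (live, dead)
```

Because a Black object has had every field shaded before it changes color, the loop never leaves a Black object pointing at a White one, which is the tri-color invariant stated above.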

slide-14
SLIDE 14

B

Purely functional

stack registers heap

A C B D E A

mark stack

B D

slide-15
SLIDE 15

B

Purely functional

  • Pros

✦ Simple
✦ Can perform the GC incrementally

…|—mutator—|—mark—|—mutator—|—mark—|—mutator—|—sweep—|…

stack registers heap

A C B D E A

mark stack

B D

slide-16
SLIDE 16

B

Purely functional

  • Pros

✦ Simple
✦ Can perform the GC incrementally

…|—mutator—|—mark—|—mutator—|—mark—|—mutator—|—sweep—|…

  • Cons

✦ Need to maintain a free list of objects => allocation overhead + fragmentation

stack registers heap

A C B D E A

mark stack

B D

slide-17
SLIDE 17

Generational GC

slide-18
SLIDE 18

Generational GC

  • Generational Hypothesis

✦ Young objects are much more likely to die than old objects

slide-19
SLIDE 19

Generational GC

  • Generational Hypothesis

✦ Young objects are much more likely to die than old objects

minor heap major heap stack registers

slide-20
SLIDE 20

Generational GC

  • Generational Hypothesis

✦ Young objects are much more likely to die than old objects

minor heap major heap stack registers frontier

slide-21
SLIDE 21

Generational GC

  • Generational Hypothesis

✦ Young objects are much more likely to die than old objects

minor heap major heap stack registers frontier

  • Minor heap collected by copying collection

✦ Survivors promoted to major heap

slide-22
SLIDE 22

Generational GC

  • Generational Hypothesis

✦ Young objects are much more likely to die than old objects

minor heap major heap stack registers frontier

  • Minor heap collected by copying collection

✦ Survivors promoted to major heap

  • Roots are registers and stack

✦ purely functional => no pointers from major to minor

slide-23
SLIDE 23

Mutations — Minor GC

  • Old objects might point to young objects

minor heap major heap

slide-24
SLIDE 24

Mutations — Minor GC

  • Old objects might point to young objects
  • Must know those pointers for minor GC

✦ (Naively) scan the major heap for such pointers

minor heap major heap

slide-25
SLIDE 25

Mutations — Minor GC

  • Old objects might point to young objects
  • Must know those pointers for minor GC

✦ (Naively) scan the major heap for such pointers

  • Intercept mutations with write barrier

(* Before r := x *)
let write_barrier (r, x) =
  if is_major r && is_minor x then
    remembered_set.add r

minor heap major heap

slide-26
SLIDE 26

Mutations — Minor GC

  • Old objects might point to young objects
  • Must know those pointers for minor GC

✦ (Naively) scan the major heap for such pointers

  • Intercept mutations with write barrier

(* Before r := x *)
let write_barrier (r, x) =
  if is_major r && is_minor x then
    remembered_set.add r

  • Remembered set

✦ Set of major heap addresses that point to minor heap
✦ Used as root for minor collection
✦ Cleared after minor collection.

minor heap major heap
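A small runnable model of the remembered set may help here. It is a sketch under stated assumptions: objects are records with a single field, the generation is a tag on the record, and "promotion" just flips that tag instead of copying. All names (`obj`, `assign`, `minor_gc`) are hypothetical.

```ocaml
(* Illustrative model, not the runtime's representation. *)
type gen = Minor | Major

type obj = {
  id : int;
  mutable gen : gen;
  mutable field : obj option;
}

(* Major-heap objects that may point into the minor heap. *)
let remembered_set : obj list ref = ref []

let write_barrier r x =
  if r.gen = Major && x.gen = Minor then
    remembered_set := r :: !remembered_set

(* r := x, with the barrier run first. *)
let assign r x =
  write_barrier r x;
  r.field <- Some x

(* Minor collection: promote everything reachable from the roots and from
   the remembered set, then clear the set. Returns the promoted ids. *)
let minor_gc roots =
  let promoted = ref [] in
  let rec promote o =
    if o.gen = Minor then begin
      o.gen <- Major;
      promoted := o.id :: !promoted;
      Option.iter promote o.field
    end
  in
  List.iter promote roots;
  List.iter (fun r -> Option.iter promote r.field) !remembered_set;
  remembered_set := [];
  List.rev !promoted
```

In this model a young object that is reachable only from an old object survives the minor collection because the barrier recorded the old object, so the collector never has to scan the whole major heap.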

slide-27
SLIDE 27

Mutations — Major GC

A B C

slide-30
SLIDE 30

Mutations — Major GC

A B C A

slide-31
SLIDE 31

Mutations — Major GC

A C A

slide-32
SLIDE 32

Mutations — Major GC

  • Mutations are problematic if both conditions hold

1. There exists a Black -> White pointer
2. All Grey -> White* -> White paths are deleted

A C A

slide-33
SLIDE 33

Mutations — Major GC

  • Mutations are problematic if both conditions hold

1. There exists a Black -> White pointer
2. All Grey -> White* -> White paths are deleted

A C A

  • Insertion/Dijkstra/Incremental barrier prevents 1
slide-34
SLIDE 34

B

Mutations — Major GC

  • Mutations are problematic if both conditions hold

1. There exists a Black -> White pointer
2. All Grey -> White* -> White paths are deleted

A C A

  • Insertion/Dijkstra/Incremental barrier prevents 1

A C

slide-37
SLIDE 37

B

Mutations — Major GC

  • Mutations are problematic if both conditions hold

1. There exists a Black -> White pointer
2. All Grey -> White* -> White paths are deleted

A C A

  • Insertion/Dijkstra/Incremental barrier prevents 1

A C

  • Deletion/Yuasa/snapshot-at-beginning prevents 2
slide-38
SLIDE 38

B

Mutations — Major GC

  • Mutations are problematic if both conditions hold

1. There exists a Black -> White pointer
2. All Grey -> White* -> White paths are deleted

A C A

  • Insertion/Dijkstra/Incremental barrier prevents 1

A C B C A

  • Deletion/Yuasa/snapshot-at-beginning prevents 2
slide-40
SLIDE 40

B

Mutations — Major GC

  • Mutations are problematic if both conditions hold

1. There exists a Black -> White pointer
2. All Grey -> White* -> White paths are deleted

A C A

  • Insertion/Dijkstra/Incremental barrier prevents 1

A C B C A B

  • Deletion/Yuasa/snapshot-at-beginning prevents 2
slide-41
SLIDE 41

B

Mutations — Major GC

  • Mutations are problematic if both conditions hold

1. There exists a Black -> White pointer
2. All Grey -> White* -> White paths are deleted

A C A

  • Insertion/Dijkstra/Incremental barrier prevents 1

A C B C A B

  • Deletion/Yuasa/snapshot-at-beginning prevents 2

(* Before r := x *)
let write_barrier (r, x) =
  if is_major r && is_minor x then
    remembered_set.add r
  else if is_major r && is_major x then
    mark(!r)
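To see why shading the overwritten value matters, the following sketch replays the A/B/C scenario with and without a deletion barrier. It isolates the deletion part only (no minor/major checks), and the record type, `write` and `scenario` are hypothetical names introduced for illustration.

```ocaml
type color = White | Grey | Black

type obj = {
  name : string;
  mutable color : color;
  mutable field : obj option;
}

let mark_stack : obj Stack.t = Stack.create ()

let shade o =
  if o.color = White then begin
    o.color <- Grey;
    Stack.push o mark_stack
  end

(* Deletion (Yuasa) barrier: before r's field is overwritten, shade the
   value about to be lost, so everything reachable at the start of
   marking still gets marked. *)
let write ?(barrier = true) r x =
  if barrier then Option.iter shade r.field;
  r.field <- x

let finish_marking () =
  while not (Stack.is_empty mark_stack) do
    let o = Stack.pop mark_stack in
    Option.iter shade o.field;
    o.color <- Black
  done

(* A is Black, B is Grey and points to White C. The mutator stores C into
   A and then deletes the B -> C pointer. Returns C's final color. *)
let scenario ~barrier =
  Stack.clear mark_stack;
  let c = { name = "C"; color = White; field = None } in
  let b = { name = "B"; color = Grey; field = Some c } in
  let a = { name = "A"; color = Black; field = None } in
  Stack.push b mark_stack;
  write ~barrier a (Some c);  (* a Black -> White pointer appears *)
  write ~barrier b None;      (* the only Grey -> White path is deleted *)
  finish_marking ();
  c.color
```

With the barrier, C ends up Black even though A is never rescanned; without it, C stays White and would be swept while still reachable from A.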

slide-42
SLIDE 42

Parallelism — Minor GC

  • Domain.spawn : (unit -> unit) -> unit
slide-43
SLIDE 43

Parallelism — Minor GC

  • Domain.spawn : (unit -> unit) -> unit
  • Collect each domain’s young garbage independently?
slide-44
SLIDE 44

Parallelism — Minor GC

  • Domain.spawn : (unit -> unit) -> unit
  • Collect each domain’s young garbage independently?

major heap domain n minor heap(s) domain 0 …

slide-45
SLIDE 45

Parallelism — Minor GC

  • Domain.spawn : (unit -> unit) -> unit
  • Collect each domain’s young garbage independently?
  • Invariant: Minor heap objects are only accessed by owning domain

major heap domain n minor heap(s) domain 0 …

slide-46
SLIDE 46

Parallelism — Minor GC

  • Domain.spawn : (unit -> unit) -> unit
  • Collect each domain’s young garbage independently?
  • Invariant: Minor heap objects are only accessed by owning domain
  • Doligez-Leroy POPL’93

✦ No pointers between minor heaps
✦ No pointers from major to minor heaps

major heap domain n minor heap(s) domain 0 …

slide-47
SLIDE 47

Parallelism — Minor GC

  • Domain.spawn : (unit -> unit) -> unit
  • Collect each domain’s young garbage independently?
  • Invariant: Minor heap objects are only accessed by owning domain
  • Doligez-Leroy POPL’93

✦ No pointers between minor heaps
✦ No pointers from major to minor heaps

  • Before r := x, if is_major(r) && is_minor(x), then promote(x).

major heap domain n minor heap(s) domain 0 …

slide-48
SLIDE 48

Parallelism — Minor GC

  • Domain.spawn : (unit -> unit) -> unit
  • Collect each domain’s young garbage independently?
  • Invariant: Minor heap objects are only accessed by owning domain
  • Doligez-Leroy POPL’93

✦ No pointers between minor heaps
✦ No pointers from major to minor heaps

  • Before r := x, if is_major(r) && is_minor(x), then promote(x).
  • Too much promotion. Ex: work-stealing queue

major heap domain n minor heap(s) domain 0 …

slide-49
SLIDE 49

Parallelism — Minor GC

major heap domain n minor heap(s) domain 0 …

slide-50
SLIDE 50

Parallelism — Minor GC

major heap domain n minor heap(s)

  • Weaker invariant

✦ No pointers between minor heaps
✦ Objects in foreign minor heap are not accessed directly

domain 0 …

slide-51
SLIDE 51

Parallelism — Minor GC

major heap domain n minor heap(s)

  • Weaker invariant

✦ No pointers between minor heaps
✦ Objects in foreign minor heap are not accessed directly

  • Read barrier. If the value loaded is

✦ an integer, or an object in the shared heap or own minor heap => continue
✦ an object in a foreign minor heap => read fault (interrupt + promote)

domain 0 …

slide-52
SLIDE 52

Efficient read barrier check

slide-53
SLIDE 53

Efficient read barrier check

  • Given x, is x (1) an integer, (2) in the shared heap, or (3) in the domain's own minor heap?
slide-54
SLIDE 54

Efficient read barrier check

  • Given x, is x (1) an integer, (2) in the shared heap, or (3) in the domain's own minor heap?
  • Careful VM mapping + bit-twiddling

slide-55
SLIDE 55

Efficient read barrier check

  • Given x, is x (1) an integer, (2) in the shared heap, or (3) in the domain's own minor heap?
  • Careful VM mapping + bit-twiddling
  • Example: 16-bit address space, 0xPQRS

Minor area : 0x4200 to 0x42ff
Domain 0 : 0x4220 to 0x422f
Domain 1 : 0x4250 to 0x425f
Domain 2 : 0x42a0 to 0x42af

slide-56
SLIDE 56

Efficient read barrier check

  • Given x, is x (1) an integer, (2) in the shared heap, or (3) in the domain's own minor heap?
  • Careful VM mapping + bit-twiddling
  • Example: 16-bit address space, 0xPQRS

Minor area : 0x4200 to 0x42ff
Domain 0 : 0x4220 to 0x422f
Domain 1 : 0x4250 to 0x425f
Domain 2 : 0x42a0 to 0x42af

  • Integer: low_bit(S) = 1; minor heap pointer: PQ = 0x42; R determines the domain

slide-57
SLIDE 57

Efficient read barrier check

  • Given x, is x (1) an integer, (2) in the shared heap, or (3) in the domain's own minor heap?
  • Careful VM mapping + bit-twiddling
  • Example: 16-bit address space, 0xPQRS

Minor area : 0x4200 to 0x42ff
Domain 0 : 0x4220 to 0x422f
Domain 1 : 0x4250 to 0x425f
Domain 2 : 0x42a0 to 0x42af

  • Integer: low_bit(S) = 1; minor heap pointer: PQ = 0x42; R determines the domain
  • Compare with y, where y lies within the domain's own minor heap => allocation pointer!

✦ On amd64, allocation pointer is in r15 register

slide-58
SLIDE 58

Efficient read barrier check

# %rax holds x (value of interest)
xor  %r15, %rax
sub  $0x0010, %rax
test $0xff01, %rax
# Any bit set => ZF not set => not foreign minor

slide-59
SLIDE 59

Efficient read barrier check

# %rax holds x (value of interest)
xor  %r15, %rax
sub  $0x0010, %rax
test $0xff01, %rax
# Any bit set => ZF not set => not foreign minor

Integer
# low_bit(%rax) = 1
xor  %r15, %rax     # low_bit(%rax) = 1
sub  $0x0010, %rax  # low_bit(%rax) = 1
test $0xff01, %rax  # ZF not set

slide-60
SLIDE 60

Efficient read barrier check

# %rax holds x (value of interest)
xor  %r15, %rax
sub  $0x0010, %rax
test $0xff01, %rax
# Any bit set => ZF not set => not foreign minor

Integer
# low_bit(%rax) = 1
xor  %r15, %rax     # low_bit(%rax) = 1
sub  $0x0010, %rax  # low_bit(%rax) = 1
test $0xff01, %rax  # ZF not set

Shared heap
# PQ(%r15) != PQ(%rax)
xor  %r15, %rax     # PQ(%rax) is non-zero
sub  $0x0010, %rax  # PQ(%rax) is non-zero
test $0xff01, %rax  # ZF not set

slide-62
SLIDE 62

Efficient read barrier check

# %rax holds x (value of interest)
xor  %r15, %rax
sub  $0x0010, %rax
test $0xff01, %rax
# Any bit set => ZF not set => not foreign minor

Own minor heap
# PQR(%r15) = PQR(%rax)
xor  %r15, %rax     # PQR(%rax) is zero
sub  $0x0010, %rax  # PQ(%rax) is non-zero
test $0xff01, %rax  # ZF not set

slide-63
SLIDE 63

Efficient read barrier check

# %rax holds x (value of interest)
xor  %r15, %rax
sub  $0x0010, %rax
test $0xff01, %rax
# Any bit set => ZF not set => not foreign minor

Own minor heap
# PQR(%r15) = PQR(%rax)
xor  %r15, %rax     # PQR(%rax) is zero
sub  $0x0010, %rax  # PQ(%rax) is non-zero
test $0xff01, %rax  # ZF not set

Foreign minor heap
# PQ(%r15) = PQ(%rax)
# S(%r15) = S(%rax) = 0
# R(%r15) != R(%rax)
xor  %r15, %rax     # R(%rax) is non-zero, rest 0
sub  $0x0010, %rax  # rest 0
test $0xff01, %rax  # ZF set
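The same three-instruction check can be replayed in OCaml on the 16-bit toy address space. This is a sketch for experimentation only: `is_foreign_minor` and `alloc_ptr` (standing in for %r15) are hypothetical names, and the check is only meaningful for the address layout assumed in the example above.

```ocaml
(* 16-bit toy address space from the example; [alloc_ptr] plays the role
   of %r15 and always lies inside the current domain's minor heap. *)
let is_foreign_minor ~alloc_ptr x =
  let t = x lxor alloc_ptr in            (* xor  %r15, %rax    *)
  let t = (t - 0x0010) land 0xffff in    (* sub  $0x0010, %rax (16-bit wrap) *)
  t land 0xff01 = 0                      (* test $0xff01, %rax: ZF set => foreign *)

(* Domain 0 owns 0x4220 to 0x422f in the example. *)
let domain0_alloc_ptr = 0x4224

let examples =
  List.map
    (fun x -> is_foreign_minor ~alloc_ptr:domain0_alloc_ptr x)
    [ 0x0007;   (* integer: low bit set *)
      0x8000;   (* shared heap: PQ differs from 0x42 *)
      0x4228;   (* own minor heap: same PQR as the allocation pointer *)
      0x4254 ]  (* domain 1's minor heap: same PQ, different R *)
```

Only the last value takes the slow path: the subtraction borrows into the upper byte exactly when the R nibbles agree, which is what separates the own minor heap from a foreign one.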

slide-64
SLIDE 64

Promotion

slide-65
SLIDE 65
  • How do you promote objects to the major heap on read fault?

Promotion

slide-66
SLIDE 66
  • How do you promote objects to the major heap on read fault?
  • Several alternatives

1. Copy the object to major heap.
   Problems: mutable objects, Abstract_tag, …
2. Move the object closure + minor GC.
   Problems: false promotions, latency, …
3. Move the object closure + scan the minor heap.
   Need to examine all objects in the minor heap.

Promotion

slide-67
SLIDE 67
  • How do you promote objects to the major heap on read fault?
  • Several alternatives

1. Copy the object to major heap.
   Problems: mutable objects, Abstract_tag, …
2. Move the object closure + minor GC.
   Problems: false promotions, latency, …
3. Move the object closure + scan the minor heap.
   Need to examine all objects in the minor heap.

  • Hypothesis: most objects promoted on read faults are young.

✦ 95% of promoted objects are among the youngest 5%

Promotion

slide-68
SLIDE 68
  • How do you promote objects to the major heap on read fault?
  • Several alternatives

1. Copy the object to major heap.
   Problems: mutable objects, Abstract_tag, …
2. Move the object closure + minor GC.
   Problems: false promotions, latency, …
3. Move the object closure + scan the minor heap.
   Need to examine all objects in the minor heap.

  • Hypothesis: most objects promoted on read faults are young.

✦ 95% of promoted objects are among the youngest 5%

  • Combine 2 & 3

Promotion

slide-69
SLIDE 69

Promotion

slide-70
SLIDE 70
  • If promoted object among youngest x%,

✦ move + fix pointers to promoted object

❖ Scan roots = registers + current stack + remembered set
❖ Younger minor objects
❖ Older minor objects referring to younger objects (mutations!)

Promotion

slide-71
SLIDE 71
  • If promoted object among youngest x%,

✦ move + fix pointers to promoted object

❖ Scan roots = registers + current stack + remembered set
❖ Younger minor objects
❖ Older minor objects referring to younger objects (mutations!)

Promotion

(* r := x *)
let write_barrier (r, x) =
  if is_major r && is_minor x then
    remembered_set.add r
  else if is_major r && is_major x then
    mark(!r)
  else if is_minor r && is_minor x && addr r > addr x then
    promotion_set.add r

slide-72
SLIDE 72
  • If promoted object among youngest x%,

✦ move + fix pointers to promoted object

❖ Scan roots = registers + current stack + remembered set
❖ Younger minor objects
❖ Older minor objects referring to younger objects (mutations!)

  • Otherwise, move + minor GC

Promotion

(* r := x *)
let write_barrier (r, x) =
  if is_major r && is_minor x then
    remembered_set.add r
  else if is_major r && is_major x then
    mark(!r)
  else if is_minor r && is_minor x && addr r > addr x then
    promotion_set.add r
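The three branches of this barrier can be exercised on plain addresses. The sketch below is an assumption-laden model: the minor heap bounds are made up, both `r` and `x` are treated as pointers, and, as the `addr r > addr x` test suggests, younger minor objects are assumed to sit at lower addresses. `barrier_action` is a hypothetical helper that only reports which branch would fire.

```ocaml
(* Hypothetical layout: the minor heap is [minor_start, minor_end). *)
let minor_start = 0x4220
let minor_end = 0x4230

let is_minor a = a >= minor_start && a < minor_end
let is_major a = not (is_minor a)

type action =
  | Add_to_remembered_set   (* major object now points into the minor heap *)
  | Mark_old_value          (* deletion barrier for the major GC *)
  | Add_to_promotion_set    (* older minor object now points to a younger one *)
  | No_action

(* Which branch of the barrier fires for [r := x], given both addresses. *)
let barrier_action ~r ~x =
  if is_major r && is_minor x then Add_to_remembered_set
  else if is_major r && is_major x then Mark_old_value
  else if is_minor r && is_minor x && r > x then Add_to_promotion_set
  else No_action
```

A store from a younger minor object to an older one needs no bookkeeping in this model, since scanning the younger objects already covers it.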

slide-73
SLIDE 73

Parallelism — Major GC

slide-74
SLIDE 74

Parallelism — Major GC

  • OCaml’s GC is incremental, needs to be concurrent w/ parallelism
slide-75
SLIDE 75

Parallelism — Major GC

  • OCaml’s GC is incremental, needs to be concurrent w/ parallelism
  • Design based on

VCGC from Inferno project (ISMM’98)

slide-76
SLIDE 76

Parallelism — Major GC

  • OCaml’s GC is incremental, needs to be concurrent w/ parallelism
  • Design based on

VCGC from Inferno project (ISMM’98)

Allows mutator, marker, sweeper threads to run concurrently

slide-77
SLIDE 77

Parallelism — Major GC

  • OCaml’s GC is incremental, needs to be concurrent w/ parallelism
  • Design based on

VCGC from Inferno project (ISMM’98)

Allows mutator, marker, sweeper threads to run concurrently

  • Multicore OCaml is MCGC
slide-78
SLIDE 78

Parallelism — Major GC

  • OCaml’s GC is incremental, needs to be concurrent w/ parallelism
  • Design based on

VCGC from Inferno project (ISMM’98)

Allows mutator, marker, sweeper threads to run concurrently

  • Multicore OCaml is MCGC

States

Garbage Free Unmarked Marked

slide-79
SLIDE 79

Parallelism — Major GC

  • OCaml’s GC is incremental, needs to be concurrent w/ parallelism
  • Design based on

VCGC from Inferno project (ISMM’98)

Allows mutator, marker, sweeper threads to run concurrently

  • Multicore OCaml is MCGC

States

Domains alternate between mutator and gc thread

Garbage Free Unmarked Marked

slide-80
SLIDE 80

Parallelism — Major GC

  • OCaml’s GC is incremental, needs to be concurrent w/ parallelism
  • Design based on

VCGC from Inferno project (ISMM’98)

Allows mutator, marker, sweeper threads to run concurrently

  • Multicore OCaml is MCGC

States

Domains alternate between mutator and gc thread

GC thread

Garbage Free Unmarked Marked Garbage Free Unmarked Marked

slide-81
SLIDE 81

Parallelism — Major GC

  • OCaml’s GC is incremental, needs to be concurrent w/ parallelism
  • Design based on

VCGC from Inferno project (ISMM’98)

Allows mutator, marker, sweeper threads to run concurrently

  • Multicore OCaml is MCGC

States

Domains alternate between mutator and gc thread

GC thread

Marking is racy but idempotent

Garbage Free Unmarked Marked Garbage Free Unmarked Marked

slide-82
SLIDE 82

Parallelism — Major GC

  • OCaml’s GC is incremental, needs to be concurrent w/ parallelism
  • Design based on

VCGC from Inferno project (ISMM’98)

Allows mutator, marker, sweeper threads to run concurrently

  • Multicore OCaml is MCGC

States

Domains alternate between mutator and gc thread

GC thread

Marking is racy but idempotent

  • Stop-the-world

Garbage Free Unmarked Marked Garbage Free Unmarked Marked

slide-83
SLIDE 83

Parallelism — Major GC

  • OCaml’s GC is incremental, needs to be concurrent w/ parallelism
  • Design based on

VCGC from Inferno project (ISMM’98)

Allows mutator, marker, sweeper threads to run concurrently

  • Multicore OCaml is MCGC

States

Domains alternate between mutator and gc thread

GC thread

Marking is racy but idempotent

  • Stop-the-world

Garbage Free Unmarked Marked Garbage Free Unmarked Marked Garbage Free Unmarked Marked Garbage Free Unmarked Marked

slide-84
SLIDE 84
  • Fibers: vm-threads, 1-shot delimited continuations

✦ stack segments on heap

Concurrency — Minor GC

slide-85
SLIDE 85
  • Fibers: vm-threads, 1-shot delimited continuations

✦ stack segments on heap

  • stack operations are not protected by write barrier!

Concurrency — Minor GC

slide-86
SLIDE 86
  • Fibers: vm-threads, 1-shot delimited continuations

✦ stack segments on heap

  • stack operations are not protected by write barrier!

Concurrency — Minor GC

minor heap (domain x) major heap current stack registers

y x

remembered fiber set remembered set

slide-87
SLIDE 87
  • Fibers: vm-threads, 1-shot delimited continuations

✦ stack segments on heap

  • stack operations are not protected by write barrier!

Concurrency — Minor GC

minor heap (domain x) major heap current stack registers

y x

remembered fiber set remembered set

  • Remembered fiber set

✦ Set of fibers in the major heap that were run in the current cycle of domain x
✦ Cleared after minor GC

slide-88
SLIDE 88
  • Fibers transitively reachable are not promoted automatically

✦ Avoids false promotions

Concurrency — Promotions

minor heap (domain 0) major heap

r x

f

z

slide-89
SLIDE 89

Concurrency — Promotions

minor heap (domain 0) major heap

r x

f remembered set

z

slide-90
SLIDE 90
  • Fibers transitively reachable are not promoted automatically

✦ Avoids false promotions

Concurrency — Promotions

minor heap (domain 0) major heap

r x

f remembered set

z

slide-91
SLIDE 91
  • Fibers transitively reachable are not promoted automatically

✦ Avoids false promotions
✦ Promote on continuing foreign fiber

Concurrency — Promotions

minor heap (domain 0) major heap

r x

f remembered set

continue f v @ domain 1

z

slide-93
SLIDE 93

Concurrency — Promotions

slide-94
SLIDE 94
  • Recall, promotion fast path = move + scan and forward

✦ Do not scan remembered fiber set

✤ Context switches <<< promotions

Concurrency — Promotions

slide-95
SLIDE 95
  • Recall, promotion fast path = move + scan and forward

✦ Do not scan remembered fiber set

✤ Context switches <<< promotions

  • Scan lazily before context switch

✦ Only once per fiber per promotion
✦ In practice, scans a fiber once per batch of promotions

Concurrency — Promotions

slide-96
SLIDE 96

Concurrency — Major GC

slide-97
SLIDE 97
  • (Multicore) OCaml uses deletion barrier

Concurrency — Major GC

slide-98
SLIDE 98
  • (Multicore) OCaml uses deletion barrier
  • Fiber stack pop is a deletion

✦ Before switching to unmarked fiber, complete marking fiber

Concurrency — Major GC

slide-99
SLIDE 99
  • (Multicore) OCaml uses deletion barrier
  • Fiber stack pop is a deletion

✦ Before switching to unmarked fiber, complete marking fiber

  • Marking is racy but idempotent

✦ Race between mutator (context switch) and GC (marking) is unsafe

Concurrency — Major GC

slide-100
SLIDE 100
  • (Multicore) OCaml uses deletion barrier
  • Fiber stack pop is a deletion

✦ Before switching to unmarked fiber, complete marking fiber

  • Marking is racy but idempotent

✦ Race between mutator (context switch) and GC (marking) is unsafe

Concurrency — Major GC

Unmarked Marked Marking

Fibers
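One way to read the three fiber states is as a small compare-and-set protocol. The sketch below is an assumption about how such a protocol could look, not a description of the Multicore OCaml runtime: the `fiber` record, `scan_stack` and `ensure_marked` are hypothetical, and the busy-wait is only acceptable in a toy.

```ocaml
type state = Unmarked | Marking | Marked

(* Hypothetical fiber record; [scanned] counts how often the stack was
   scanned, so the example can check that it happens exactly once. *)
type fiber = { state : state Atomic.t; mutable scanned : int }

let scan_stack f = f.scanned <- f.scanned + 1

(* Called by the GC when it reaches [f], and by a mutator just before it
   switches to [f]. Only the caller that wins the Unmarked -> Marking
   transition scans the stack; any other caller waits for Marked. *)
let rec ensure_marked f =
  if Atomic.compare_and_set f.state Unmarked Marking then begin
    scan_stack f;
    Atomic.set f.state Marked
  end
  else if Atomic.get f.state <> Marked then
    ensure_marked f  (* someone else is mid-scan: spin until they finish *)
```

The intermediate Marking state is what keeps a mutator from running on a stack that the collector has only partly scanned, which is the unsafe race named above.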

slide-101
SLIDE 101

Summary

  • Multicore OCaml GC

✦ Optimize for latency
✦ Independent minor GCs + mostly-concurrent mark-and-sweep

             Mutations          Concurrency      Parallelism
Minor GC     rem set            rem fiber set    local heaps
Promotions   2y rem set         lazy scanning    read faults
Major GC     deletion barrier   mark & switch    MCGC

slide-102
SLIDE 102

Questions?

slide-103
SLIDE 103

Backup Slides

slide-104
SLIDE 104

Purely functional GC

stack registers heap

slide-105
SLIDE 105

Purely functional GC

stack registers heap

  • Stop-the-world mark and sweep
slide-106
SLIDE 106

Purely functional GC

stack registers heap

0x0000 0xffff

  • Stop-the-world mark and sweep
slide-107
SLIDE 107

Purely functional GC

stack registers heap

0x0000 0xffff

frontier

  • Stop-the-world mark and sweep
slide-108
SLIDE 108

Purely functional GC

stack registers heap

0x0000 0xffff

frontier

  • Stop-the-world mark and sweep
  • 2-pass mark compact

✦ Fast allocations by bumping the frontier

slide-109
SLIDE 109

Purely functional GC

stack registers heap

0x0000 0xffff

frontier

  • Stop-the-world mark and sweep
  • 2-pass mark compact

✦ Fast allocations by bumping the frontier

  • All heap pointers go right
slide-110
SLIDE 110

Purely functional GC

stack registers heap

0x0000 0xffff

frontier

  • Mark roots
slide-111
SLIDE 111

Purely functional GC

stack registers heap

0x0000 0xffff

frontier

  • Mark roots
  • Scan from frontier to start. For each marked object,
  • Mark reachable object & reverse pointers
slide-112
SLIDE 112

Purely functional GC

stack registers

0x0000 0xffff

frontier

  • Mark roots
  • Scan from frontier to start. For each marked object,
  • Mark reachable object & reverse pointers
  • Scan from start to frontier. For each marked object,
  • Copy to next available free space & reverse pointers pointing left
slide-113
SLIDE 113

Purely functional GC

stack registers

0x0000 0xffff

frontier

slide-114
SLIDE 114

Purely functional GC

stack registers

0x0000 0xffff

frontier

  • Pros

✦ Simple & fast allocation
✦ Efficient use of space

slide-115
SLIDE 115

Purely functional GC

stack registers

0x0000 0xffff

frontier

  • Pros

✦ Simple & fast allocation
✦ Efficient use of space

  • Cons

✦ Need to touch all the objects on the heap
✦ Compaction by default leads to long pause times