Multicore OCaml GC
KC Sivaramakrishnan, Stephen Dolan
OCaml Labs, University of Cambridge
Multicore OCaml
✦ Adds native support for concurrency and parallelism in OCaml
✦ M fibers over N domains
✦ M >>> N
✦ Overview of multicore GC with a few deep dives.
✦ Gradually add mutations, parallelism and concurrency
[Figure: roots (stack, registers) pointing into a heap of objects A, B, C, D, E, with a mark stack holding the objects still to be scanned]
✦ States: White (Unmarked), Grey (Marking), Black (Marked)
✦ Simple
✦ Can perform the GC incrementally
✤ …| mutator | mark | mutator | mark | mutator | sweep |…
✦ Need to maintain free-list of objects => allocation overheads + fragmentation
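The marking scheme above can be sketched on a toy heap. This is a minimal illustration, not the runtime's implementation: the `obj` record, the `shade` helper and the pointer graph used in the usage note are all hypothetical.

```ocaml
(* Toy heap object: a colour plus outgoing pointers (hypothetical model). *)
type colour = White | Grey | Black

type obj = {
  name : string;
  mutable colour : colour;
  mutable fields : obj list;
}

let make name = { name; colour = White; fields = [] }

(* Mark everything reachable from [roots] using an explicit mark stack. *)
let mark roots =
  let stack = Stack.create () in
  (* White -> Grey: the object is known to be live but not yet scanned. *)
  let shade o =
    if o.colour = White then begin
      o.colour <- Grey;
      Stack.push o stack
    end
  in
  List.iter shade roots;
  while not (Stack.is_empty stack) do
    let o = Stack.pop stack in
    List.iter shade o.fields;
    (* Grey -> Black: all children have been shaded. *)
    o.colour <- Black
  done

(* Sweep: keep black objects and reset them to white for the next cycle. *)
let sweep heap =
  List.filter
    (fun o ->
      let live = o.colour = Black in
      o.colour <- White;
      live)
    heap
```

For example, with an assumed graph A -> B -> D and C -> E where only A is a root, `mark [a]` followed by `sweep` keeps A, B and D and drops C and E.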
✦ Young objects are much more likely to die than old objects
[Figure: minor heap with an allocation frontier, major heap, and roots (stack, registers)]
✦ Survivors promoted to major heap
✦ purely functional => no pointers from major to minor
[Figure: minor heap and major heap]
✦ Mutations can create pointers from major to minor heap
✦ (Naively) scan the major heap for such pointers
✦ Write barrier:
(* Before r := x *)
let write_barrier (r, x) =
  if is_major r && is_minor x then
    remembered_set.add r
✦ Remembered set: set of major heap addresses that point to minor heap
✦ Used as root for minor collection
✦ Cleared after minor collection
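A runnable version of the barrier above, under a deliberately simplified model. The `block` and `value` types, the integer `id` field and `take_remembered` are assumptions made for this sketch; the real runtime works on raw addresses.

```ocaml
(* Hypothetical model: a block is tagged with the generation it lives in. *)
type gen = Minor | Major

type block = { id : int; gen : gen; mutable field : value }
and value = Int of int | Ptr of block

(* Remembered set, keyed by block id so repeated writes add one entry. *)
let remembered_set : (int, block) Hashtbl.t = Hashtbl.create 16

let is_major b = b.gen = Major
let is_minor = function Ptr b -> b.gen = Minor | Int _ -> false

(* r.field := x, with the barrier run before the store. *)
let write r x =
  if is_major r && is_minor x then Hashtbl.replace remembered_set r.id r;
  r.field <- x

(* Extra roots for a minor collection; the set is cleared afterwards. *)
let take_remembered () =
  let rs = Hashtbl.fold (fun _ b acc -> b :: acc) remembered_set [] in
  Hashtbl.reset remembered_set;
  rs
```

Only a store of a minor-heap pointer into a major-heap block records anything; stores of integers, or stores into minor-heap blocks, leave the set untouched.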
[Figure: objects A, B, C and the mark stack, shown while the mutator changes pointers during marking]
✦ A live object can be missed by the marker only if both hold:
1. Exists Black -> White
2. All Grey -> White* -> White paths are deleted
✦ Deletion barrier: mark the old value of r before it is overwritten
(* Before r := x *)
let write_barrier (r, x) =
  if is_major r && is_minor x then
    remembered_set.add r
  else if is_major r && is_major x then
    mark (!r)
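The two conditions can be replayed on a three-object heap. The sketch below is an illustration under assumed names (`obj`, `shade`, `write`, `mark_step`); it shows a deletion barrier shading the overwritten value so that an object hidden behind a black object is still marked.

```ocaml
type colour = White | Grey | Black

(* Hypothetical single-field object. *)
type obj = { mutable colour : colour; mutable field : obj option }

let mark_stack : obj Stack.t = Stack.create ()

(* White -> Grey, and remember the object for later scanning. *)
let shade o =
  if o.colour = White then begin
    o.colour <- Grey;
    Stack.push o mark_stack
  end

(* r.field := x with a deletion barrier: shade the value being overwritten. *)
let write r x =
  Option.iter shade r.field;
  r.field <- x

(* One increment of marking; returns false once the mark stack is empty. *)
let mark_step () =
  match Stack.pop_opt mark_stack with
  | None -> false
  | Some o ->
    Option.iter shade o.field;
    o.colour <- Black;
    true
```

Usage: start with A black, B grey and B -> C with C white. The mutator then stores C into A (creating Black -> White) and clears B's field (deleting the only Grey -> White path). Because `write` shades the old value of B's field, C is grey by then and marking still reaches it.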
[Figure: one shared major heap and one minor heap per domain (domain 0 … domain n)]
✦ No pointers between minor heaps
✦ No pointers from major to minor heaps
[Figure: one shared major heap and one minor heap per domain (domain 0 … domain n)]
✦ No pointers between minor heaps
✦ Objects in foreign minor heap are not accessed directly
✦ integers, object in shared heap or own minor heap => continue
✦ object in foreign minor heap => Read fault (Interrupt + promote)
VM mapping + bit-twiddling
✦ Minor area: 0x4200 to 0x42ff
✦ Domain 0: 0x4220 to 0x422f
✦ Domain 1: 0x4250 to 0x425f
✦ Domain 2: 0x42a0 to 0x42af
✦ On amd64, allocation pointer is in r15 register
[Figure: address line from 0x4200 to 0x42ff with the three domain ranges marked]
# %rax holds x (value of interest)
xor  %r15, %rax
sub  $0x0010, %rax
test $0xff01, %rax
# Any bit set => ZF not set => not foreign minor

Notation: an address is written as the hex digits 0xPQRS. In the example layout, PQ selects the minor area, R the domain's minor heap, and S the offset within it.

Integer
# low_bit(%rax) = 1
xor  %r15, %rax      # low_bit(%rax) = 1
sub  $0x0010, %rax   # low_bit(%rax) = 1
test $0xff01, %rax   # ZF not set

Shared heap
# PQ(%r15) != PQ(%rax)
xor  %r15, %rax      # PQ(%rax) is non-zero
sub  $0x0010, %rax   # PQ(%rax) is non-zero
test $0xff01, %rax   # ZF not set

Own minor heap
# PQR(%r15) = PQR(%rax)
xor  %r15, %rax      # PQR(%rax) is zero
sub  $0x0010, %rax   # PQ(%rax) is non-zero
test $0xff01, %rax   # ZF not set

Foreign minor heap
# PQ(%r15) = PQ(%rax)
# S(%r15) = S(%rax) = 0
# R(%r15) != R(%rax)
xor  %r15, %rax      # R(%rax) is non-zero, rest 0
sub  $0x0010, %rax   # rest 0
test $0xff01, %rax   # ZF set
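The same three instructions can be replayed on 16-bit values to check the four cases against the example layout. This is a toy model only: `is_foreign_minor` and its `alloc_ptr` argument (standing in for %r15) are names made up for the sketch, and the sample addresses in the usage note are chosen to fit the example ranges.

```ocaml
(* Toy 16-bit model of the check; [alloc_ptr] plays the role of %r15. *)
let is_foreign_minor ~alloc_ptr x =
  (* xor %r15, %rax *)
  let v = (x lxor alloc_ptr) land 0xffff in
  (* sub $0x0010, %rax, wrapping to 16 bits *)
  let v = (v - 0x0010) land 0xffff in
  (* test $0xff01, %rax: ZF is set exactly when no masked bit is set *)
  v land 0xff01 = 0
```

With the allocation pointer at 0x4250 (domain 1), the word 0x0003 (low bit set, so an integer), the shared-heap address 0x8000 and the own-minor address 0x4258 all return false, while 0x42a0 (domain 2) and 0x4220 (domain 0) return true.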
1. Copy the object to major heap.
✤ Mutable objects, Abstract_tag, …
2. Move the object closure + minor GC.
✤ False promotions, latency, …
3. Move the object closure + scan the minor heap
✤ Need to examine all objects on the minor heap
✦ 95% of promoted objects are among the youngest 5%
✦ move + fix pointers to promoted object
❖ Scan roots = registers + current stack + remembered set
❖ Younger minor objects
❖ Older minor objects referring to younger objects (mutations!)
(* Before r := x *)
let write_barrier (r, x) =
  if is_major r && is_minor x then
    remembered_set.add r
  else if is_major r && is_major x then
    mark (!r)
  else if is_minor r && is_minor x && addr r > addr x then
    promotion_set.add r
VCGC from Inferno project (ISMM’98)
✦ Allows mutator, marker, sweeper threads to run concurrently
✦ States: Garbage, Free, Unmarked, Marked
✦ Domains alternate between mutator and gc thread
✦ GC thread
✦ Marking is racy but idempotent
[Figure: state diagrams over Garbage, Free, Unmarked, Marked]
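A small sketch of the four states and the transitions between them. Two assumptions are made here that the slides do not spell out: that marking moves Unmarked to Marked and sweeping moves Garbage to Free, and that the labels are relabelled at the end of a cycle in the order shown in `rotate`. The function names are hypothetical.

```ocaml
type state = Garbage | Free | Unmarked | Marked

(* Marker: Unmarked -> Marked. Applying it twice gives the same result,
   which is why two threads racing to mark one object is harmless. *)
let mark = function
  | Unmarked -> Marked
  | s -> s

(* Sweeper: Garbage -> Free. *)
let sweep = function
  | Garbage -> Free
  | s -> s

(* Assumed end-of-cycle relabelling: survivors become Unmarked for the next
   cycle, and whatever was never marked becomes Garbage for the sweeper. *)
let rotate = function
  | Marked -> Unmarked
  | Unmarked -> Garbage
  | Garbage -> Marked
  | Free -> Free
```

Because marker and sweeper act on disjoint states, neither needs to wait for the other within a cycle.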
✦ stack segments on heap
[Figure: minor heap (domain x), major heap, current stack and registers, with fibers x and y, the remembered fiber set and the remembered set]
✦ Remembered fiber set: set of fibers in major heap that were run in the current cycle of domain x
✦ Cleared after minor GC
✦ Avoids false promotions
✦ Promote on continuing foreign fiber
[Figure: minor heap (domain 0) and major heap holding r, x, z, fiber f and the remembered set; f is resumed elsewhere with: continue f v @ domain 1]
✦ Do not scan remembered fiber set
✤ Context switches <<< promotions
✦ Only once per fiber per promotion
✦ In practice, scans a fiber once per batch of promotions
✦ Before switching to unmarked fiber, complete marking fiber
✦ Race between mutator (context switch) and gc (marking) unsafe
[Figure: fiber marking states: Unmarked, Marking, Marked]
✦ Optimize for latency
✦ Independent minor GCs + mostly-concurrent mark-and-sweep
Techniques covered for mutations, concurrency and parallelism, grouped by GC component:
✦ Minor GC: rem set, rem fiber set, local heaps
✦ Promotions: lazy scanning, read faults
✦ Major GC: deletion barrier, mark & switch, MCGC
[Figure: heap spanning 0x0000 to 0xffff with an allocation frontier; roots are the stack and registers]
✦ Fast allocations by bumping the frontier
✦ Simple & fast allocation
✦ Efficient use of space
✦ Need to touch all the objects on the heap
✦ Compaction by default leads to long pause times
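Bumping the frontier can be sketched in a few lines. The `heap` record, `create` and `alloc` are hypothetical names, and the address range simply mirrors the 0x0000 to 0xffff range in the figure.

```ocaml
(* Hypothetical bump allocator over the address range 0x0000..0xffff. *)
type heap = { limit : int; mutable frontier : int }

let create () = { limit = 0x10000; frontier = 0 }

(* Returns the start address of the new block, or None when the request does
   not fit (a real collector would run a GC at that point). *)
let alloc h size =
  if size <= 0 || h.frontier + size > h.limit then None
  else begin
    let addr = h.frontier in
    h.frontier <- h.frontier + size;
    Some addr
  end
```

Allocation is one comparison and one addition, and blocks are packed back to back, which is where both the speed and the efficient use of space come from.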