Retrofitting Parallelism onto OCaml
KC Sivaramakrishnan, Stephen Dolan, Leo White, Sadiq Jaffer, Tom Kelly, Anmol Sahoo, Sudha Parimala, Atul Dhiman, Anil Madhavapeddy
OCaml Labs
The Astrée Static Analyzer
Industry Projects
parallelism to OCaml
✦ Building a multicore GC for OCaml
✦ Backwards compatibility before parallel scalability
✦ Weak references, ephemerons, lazy values, finalisers
✦ Low-level C API that bakes in GC invariants
✦ Cost of refactoring sequential code itself is prohibitive
✦ Dolan et al., “Bounding Data Races in Space and Time”, PLDI’18
✦ Strong guarantees (including type safety) under data races
✦ Thanks to the GC design
Incremental and non-moving
Minor Heap
Major Heap
[Figure: the mutator interleaved with slices of a major GC cycle (labelled mark roots, mark main and sweep): Start of major cycle, Idle, Mark Roots, Mark, Sweep, End of major cycle]
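The cycle in the figure (Idle, Mark Roots, Mark, Sweep) can be sketched as a small state machine that does one bounded slice of work per call and then returns to the mutator. This is a toy model only: the types `phase`, `obj` and `gc` and the functions `step` and `run_cycle` are hypothetical names invented for illustration, not the OCaml runtime's data structures.

```ocaml
(* Toy model of an incremental, non-moving mark-and-sweep cycle.
   All names are hypothetical; this is not the OCaml runtime. *)
type phase = Idle | Mark_roots | Mark | Sweep

type obj = {
  fields : int list;       (* heap indices this object points to *)
  mutable marked : bool;
  mutable live : bool;     (* set to false when swept; the slot is never moved *)
}

type gc = {
  heap : obj array;        (* non-moving: an object keeps its index forever *)
  roots : int list;
  mutable phase : phase;
  mutable mark_stack : int list;
  mutable sweep_pos : int;
}

let make_gc heap roots =
  { heap; roots; phase = Idle; mark_stack = []; sweep_pos = 0 }

(* Mark one object and remember it so its fields get scanned later. *)
let shade gc i =
  let o = gc.heap.(i) in
  if o.live && not o.marked then begin
    o.marked <- true;
    gc.mark_stack <- i :: gc.mark_stack
  end

(* One bounded slice of GC work; the mutator runs between calls. *)
let step gc =
  match gc.phase with
  | Idle -> gc.phase <- Mark_roots
  | Mark_roots ->
    List.iter (shade gc) gc.roots;
    gc.phase <- Mark
  | Mark ->
    (match gc.mark_stack with
     | [] -> gc.sweep_pos <- 0; gc.phase <- Sweep
     | i :: rest ->
       gc.mark_stack <- rest;
       List.iter (shade gc) gc.heap.(i).fields)
  | Sweep ->
    if gc.sweep_pos >= Array.length gc.heap then gc.phase <- Idle
    else begin
      let o = gc.heap.(gc.sweep_pos) in
      (* Reachable objects are unmarked for the next cycle;
         unreachable ones are freed in place. *)
      if o.marked then o.marked <- false else o.live <- false;
      gc.sweep_pos <- gc.sweep_pos + 1
    end

(* Drive a whole cycle; a real mutator would interleave its own work. *)
let run_cycle gc =
  step gc;
  while gc.phase <> Idle do step gc done

let () =
  let mk fields = { fields; marked = false; live = true } in
  (* 0 -> 1 is reachable from the root; 2 <-> 3 is an unreachable cycle. *)
  let heap = [| mk [1]; mk []; mk [3]; mk [2] |] in
  run_cycle (make_gc heap [0]);
  assert (heap.(1).live && not heap.(2).live)
```

Because sweeping frees objects in place, no pointer ever needs updating, which is the property the slides call "non-moving".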
running time, GC pause time and memory usage.
✦ Based on Streamflow [Schneider et al. 2006]
✦ Thread-local, size-segmented free lists for small objects + malloc for large allocations
✦ Sequential performance on par with OCaml’s allocators
✦ Based on VCGC [Huelsbergen and Winterbottom 1998]
[Figure: Domain 0 and Domain 1 each run Mark Roots, Mark and Sweep between Start of major cycle and End of major cycle; mark and sweep phases may overlap]
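The allocator bullets above (size-segmented free lists for small objects, a fallback for large ones) can be illustrated with a toy model. Everything here is an assumption made for illustration: the `max_small_words` threshold, the `block` and `allocator` types, and the bump pointer that stands in for fresh memory. In a thread-local design each domain would own one such `allocator` value, so the fast path needs no locking.

```ocaml
(* Toy model of a size-segmented free-list allocator.
   Names and constants are hypothetical, not the runtime's. *)
let max_small_words = 8   (* assumed small/large threshold *)

type block = { addr : int; words : int }

type allocator = {
  free_lists : block list array;  (* one free list per size class, indexed by size *)
  mutable next_addr : int;        (* bump pointer used to carve fresh blocks *)
  mutable large_allocs : int;     (* requests sent to the fallback path *)
}

let create () =
  { free_lists = Array.make (max_small_words + 1) [];
    next_addr = 0;
    large_allocs = 0 }

(* Carve a fresh block of the requested size. *)
let fresh a words =
  let b = { addr = a.next_addr; words } in
  a.next_addr <- a.next_addr + words;
  b

let alloc a words =
  if words > max_small_words then begin
    (* Large request: this branch stands in for a call to malloc. *)
    a.large_allocs <- a.large_allocs + 1;
    fresh a words
  end else
    match a.free_lists.(words) with
    | b :: rest -> a.free_lists.(words) <- rest; b   (* reuse a freed block *)
    | [] -> fresh a words

(* Return a small block to the free list of its size class. *)
let free a b =
  if b.words <= max_small_words then
    a.free_lists.(b.words) <- b :: a.free_lists.(b.words)

let () =
  let a = create () in
  let b1 = alloc a 3 in
  free a b1;
  let b2 = alloc a 3 in
  (* The freed 3-word block is reused for the next 3-word request. *)
  assert (b2.addr = b1.addr)
```

Segregating by size means a freed block can be reused for any later request of the same size without splitting or coalescing.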
✦ A generalisation of weak references
✦ Introduce conjunction in the reachability property
✦ Requires multiple rounds of ephemeron marking
✦ Cycle-delimited handshaking without global barrier
✦ 3 barriers / cycle worst case
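To see why ephemerons need multiple rounds of marking, consider a toy reachability computation in which an ephemeron's data is marked only when the ephemeron and its key are both already marked (the conjunction mentioned above). This is a sequential sketch under assumed names (`ephemeron`, `mark`, `mark_with_ephemerons`, integer object ids); it does not model the multicore handshaking.

```ocaml
(* Toy fixpoint for ephemeron marking; all names are hypothetical. *)
module IntSet = Set.Make (Int)

type ephemeron = { self : int; key : int; data : int }

(* Ordinary marking over strong edges, given as (object, successors) pairs. *)
let rec mark edges seen = function
  | [] -> seen
  | x :: rest ->
    if IntSet.mem x seen then mark edges seen rest
    else
      let succs = Option.value (List.assoc_opt x edges) ~default:[] in
      mark edges (IntSet.add x seen) (succs @ rest)

(* Returns the marked set and the number of marking passes performed. *)
let mark_with_ephemerons edges ephes roots =
  let rec rounds seen passes =
    (* Data becomes reachable only if the ephemeron AND its key are marked. *)
    let newly =
      List.filter_map
        (fun e ->
           if IntSet.mem e.self seen && IntSet.mem e.key seen
              && not (IntSet.mem e.data seen)
           then Some e.data
           else None)
        ephes
    in
    if newly = [] then (seen, passes)
    else rounds (mark edges seen newly) (passes + 1)
  in
  rounds (mark edges IntSet.empty roots) 1

let () =
  (* Root 0 points to key 1 and to ephemerons 10, 11 and 12. *)
  let edges = [ (0, [1; 10; 11; 12]) ] in
  let ephes =
    [ { self = 10; key = 1; data = 2 };    (* key 1 is marked, so 2 is marked *)
      { self = 11; key = 2; data = 3 };    (* key 2 is only marked in a later pass *)
      { self = 12; key = 4; data = 5 } ]   (* key 4 is never marked *)
  in
  let seen, _passes = mark_with_ephemerons edges ephes [0] in
  assert (IntSet.mem 3 seen && not (IntSet.mem 5 seen))
```

Marking object 2 is what makes ephemeron 11's key reachable, so its data can only be discovered in a further pass; chains like this are why a single marking pass is not enough.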
Peyton Jones 2011] collector for GHC
Minor Heap Minor Heap Minor Heap Minor Heap
Major Heap
Domain 0 Domain 1 Domain 2 Domain 3
✦ Prevents early promotion & mirrors sequential behaviour
✦ Read barrier required for mutable field + promotion
✦ Read barriers need to be efficient for performance backwards compatibility
VMM + bit-twiddling tricks
✦ Proof of correctness available in the paper
✦ Minimal performance impact on sequential code
compatibility)
[Figure: Domain 0 and Domain 1, each with its own minor heap, and a shared major heap; objects x, y, a and b; the reads !y and !x lead to promote (!y) and promote (!x)]
✦ Mutable reads are GC safe points!
✦ Need to manually refactor tricky code
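The promotion scenario in the figure can be mimicked with a toy read barrier: reading a mutable field whose target sits in another domain's minor heap promotes the target to the major heap before the read returns. The layout is an assumption made for this sketch (x points to a in Domain 0's minor heap, y points to b in Domain 1's minor heap, Domain 0 reads y and Domain 1 reads x), and `location`, `cell` and `read` are hypothetical names. Domains are modelled as plain integers, so nothing here runs in parallel.

```ocaml
(* Toy model of a promoting read barrier; all names are hypothetical. *)
type location =
  | Major            (* shared major heap *)
  | Minor of int     (* minor heap owned by the given domain *)

type obj = { name : string; mutable loc : location }

(* A mutable field in the major heap pointing at some object. *)
type cell = { mutable target : obj }

let promotions = ref 0

(* Read barrier: a read by [domain] that finds a pointer into a foreign
   minor heap promotes the target first. Since the read can trigger GC
   work, it behaves like a safe point. *)
let read ~domain (c : cell) : obj =
  let o = c.target in
  (match o.loc with
   | Minor owner when owner <> domain ->
     o.loc <- Major;   (* stands in for the promote step in the figure *)
     incr promotions
   | Minor _ | Major -> ());
  o

let () =
  let a = { name = "a"; loc = Minor 0 } in
  let b = { name = "b"; loc = Minor 1 } in
  let x = { target = a } and y = { target = b } in
  let _ = read ~domain:0 y in   (* Domain 0 reads !y: b is promoted *)
  let _ = read ~domain:1 x in   (* Domain 1 reads !x: a is promoted *)
  assert (a.loc = Major && b.loc = Major);
  assert (!promotions = 2)
```

A domain reading a pointer into its own minor heap takes the second branch and pays nothing beyond the check, which is the cost the slides argue must stay small for sequential code.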
✦ Similar to GHC’s minor collection
✦ Insert poll points in code for timely inter-domain interrupt handling [Feeley 1993]
[Figure: timelines for Dom 0 and Dom 1 between Start major and End major. ConcMinor: Mutator, Minor GC, Major slice, Mutator, Minor GC. ParMinor: Mutator, Major slice, Mutator, with Start minor and End minor marked; slop space filled with major slices]
✦ 24 cores isolated for performance evaluation
✦ ConcMinor 4.9% slower and ParMinor 3.5% slower
✦ ConcMinor 54% lower peak memory and ParMinor 61% lower peak memory
ConcMinor suffers due to read faults
Unbalanced allocation leads to inopportune minor GCs in ParMinor
ConcMinor
✦ Does not break the C API
✦ Performs similarly to ConcMinor on 24 cores
✦ May revisit ConcMinor later for manycore future
✦ https://github.com/ocaml-multicore/ocaml-multicore
✦ https://github.com/ocaml-bench/sandmark/
✦ https://github.com/ocaml-multicore/multicore-ocaml-verify
✦ https://github.com/ocaml-multicore/parallel-programming-in-multicore-ocaml