Practical Algebraic Effect Handlers in Multicore OCaml


slide-1
SLIDE 1

Practical Algebraic Effect Handlers in Multicore OCaml

“KC” Sivaramakrishnan

OCaml Labs University of Cambridge

slide-2
SLIDE 2

Multicore OCaml

  • Native support for concurrency and parallelism
  • Led from OCaml Labs
    • KC, Stephen Dolan, Leo White (Jane Street) and others
  • In this talk: Practical algebraic effect handlers
    • Why algebraic effects in Multicore OCaml?
    • How to make them practical?
      • Don’t break existing programs
      • Performance backwards compatibility

https://github.com/ocamllabs/ocaml-multicore

slide-3
SLIDE 3

Concurrency ≠ Parallelism

  • Concurrency
    • Overlapped execution of processes
    • Fibers: language-level lightweight threads
    • 12M/s on 1 core. 30M/s on 4 cores.
  • Parallelism
    • Simultaneous execution of computations
    • Domains: system thread + context
  • Concurrency ∩ Parallelism ➔ Scalable Concurrency
slide-4
SLIDE 4

User-level Schedulers

  • Multiplexing fibers over domain(s)
  • Bake scheduler into the runtime system (GHC)
    • Lack of flexibility
    • Maintenance onus on the compiler developers
  • Allow programmers to describe schedulers!
    • Parallel search ➔ LIFO work-stealing
    • Web-server ➔ FIFO runqueue
    • Data parallel ➔ Gang scheduling
  • Algebraic Effects and Handlers

[Figure: the GHC runtime system, labelled with Scheduler, GC, MVars and Lazy Evaluation]

slide-5
SLIDE 5

Algebraic effects & handlers

  • Reasoning about computational effects in a pure setting
    • G. Plotkin and J. Power, Algebraic Operations and Generic Effects, 2002
  • Handlers for programming
    • G. Plotkin and M. Pretnar, Handlers of Algebraic Effects, 2009

slide-6
SLIDE 6

Algebraic Effects: Example

With exceptions:

    exception Foo of int
    let f () = 1 + (raise (Foo 3))
    let r = try f () with Foo i -> i + 1

    val r : int = 4

With effects (k has type ('a,'b) continuation):

    effect Foo : int -> int
    let f () = 1 + (perform (Foo 3))
    let r = try f () with effect (Foo i) k -> continue k (i + 1)

  • Nice abstraction for programming with control-flow
  • Separation of effect declaration from its interpretation

slide-7
SLIDE 7

Algebraic Effects: Example

    exception Foo of int
    let f () = 1 + (raise (Foo 3))
    let r = try f () with Foo i -> i + 1

    val r : int = 4

    effect Foo : int -> int
    let f () = 1 + (perform (Foo 3)) (* evaluates to 4 *)
    let r = try f () with effect (Foo i) k -> continue k (i + 1)

    val r : int = 5

f runs on a fiber: a lightweight stack.

  • Nice abstraction for programming with control-flow
  • Separation of effect declaration from its interpretation
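
The slides use the `effect` keyword of the Multicore OCaml prototype. As a rough sketch only, the same example can be written against the `Effect` module of the OCaml 5 standard library; the choice of OCaml 5.0 or later is an assumption of this sketch, not something the talk states.

```ocaml
(* Sketch for OCaml >= 5.0, using the stdlib Effect module instead of the
   `effect` keyword syntax shown on the slides. *)
open Effect
open Effect.Deep

(* Same operation as on the slide: takes an int, returns an int. *)
type _ Effect.t += Foo : int -> int Effect.t

let f () = 1 + perform (Foo 3)

let r =
  try_with f ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Foo i ->
          (* Resume f with i + 1, so `perform (Foo 3)` evaluates to 4. *)
          Some (fun (k : (a, _) continuation) -> continue k (i + 1))
        | _ -> None) }

let () = Printf.printf "r = %d\n" r
```

Unlike the exception version, the handler does not abandon `f`: it resumes it, so the pending `1 + ...` still runs.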
slide-8
SLIDE 8

Algebraic Effects in Multicore OCaml

  • Unchecked

    effect Foo : unit
    let _ = perform Foo

    Exception: Unhandled.

  • WIP: Effect System for OCaml
    • Accurately track user-defined as well as native effects
    • Makes OCaml a pure language

    effect foo = Foo : unit
    let _ = perform Foo

    Error: This expression performs effect foo,
    which has no default handler.

  • Deep handler semantics

    let f () =
      (perform (Foo 3))   (* 3 + 1 *)
      + (perform (Foo 3)) (* 3 + 1 *)
    let r =
      try f () with effect (Foo i) k ->
        (* continuation resumed outside try/with *)
        continue k (i + 1)
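
A small sketch of the two points above, again assuming the OCaml 5 stdlib `Effect` module rather than the prototype syntax. With a deep handler, the second `perform` is handled by the same handler even though the continuation was resumed from inside the handler clause. With no handler installed, the effect surfaces as an exception (`Effect.Unhandled` in OCaml 5).

```ocaml
open Effect
open Effect.Deep

type _ Effect.t += Foo : int -> int Effect.t

let f () =
  let a = perform (Foo 3) in   (* handled: 3 + 1 *)
  let b = perform (Foo 3) in   (* handled again by the same deep handler *)
  a + b

let r =
  try_with f ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Foo i -> Some (fun (k : (a, _) continuation) -> continue k (i + 1))
        | _ -> None) }

(* Effects are unchecked: performing Foo with no handler raises at run time. *)
let unhandled =
  match perform (Foo 3) with
  | _ -> false
  | exception Unhandled _ -> true
```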

slide-9
SLIDE 9

Demo

Concurrent round-robin scheduler
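
The demo itself is not part of the transcript. The following is only a sketch of what a round-robin scheduler of this kind can look like, assuming the OCaml 5 stdlib `Effect` module; the names `run`, `worker` and `trace` are illustrative and do not come from the talk.

```ocaml
open Effect
open Effect.Deep

type _ Effect.t +=
  | Yield : unit Effect.t
  | Fork : (unit -> unit) -> unit Effect.t

let run (main : unit -> unit) : unit =
  (* FIFO run queue of suspended fibers. *)
  let run_q : (unit -> unit) Queue.t = Queue.create () in
  let enqueue k = Queue.push (fun () -> continue k ()) run_q in
  let dequeue () =
    if Queue.is_empty run_q then () else (Queue.pop run_q) ()
  in
  let rec spawn (f : unit -> unit) : unit =
    match_with f ()
      { retc = (fun () -> dequeue ());   (* fiber finished: run the next one *)
        exnc = raise;
        effc = (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Yield ->
            Some (fun (k : (a, unit) continuation) -> enqueue k; dequeue ())
          | Fork g ->
            Some (fun (k : (a, unit) continuation) -> enqueue k; spawn g)
          | _ -> None) }
  in
  spawn main

(* Usage: two workers that each take two steps, yielding in between. *)
let trace = Buffer.create 16

let worker name () =
  for i = 1 to 2 do
    Buffer.add_string trace (Printf.sprintf "%s%d " name i);
    perform Yield
  done

let () =
  run (fun () ->
    perform (Fork (worker "a"));
    perform (Fork (worker "b")));
  print_endline (Buffer.contents trace)
```

Swapping the `Queue` for a stack would turn the FIFO policy into a LIFO one without touching `worker`, which is the flexibility argued for on the User-level Schedulers slide.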

slide-10
SLIDE 10

Callback Hell

Asynchronous I/O in direct-style

slide-11
SLIDE 11

Callback Hell

Asynchronous I/O in direct-style

  • Demo: Echo server
  • Killer App: Facebook’s new skin for OCaml + Optimising compiler for OCaml to JavaScript

slide-12
SLIDE 12

Concurrent data/sync structures

  • Channels, MVars, Queues, Stacks, Countdown latches, etc.
  • Need to interface with the scheduler!
  • MVar_put & MVar_get as algebraic operations?

[Figure: handler stack made of Program, MVars and Scheduler. What is this interface?]

slide-13
SLIDE 13

Scheduler Interface

    effect Suspend : (('a,unit) continuation -> unit) -> 'a
    effect Resume : (('a,unit) continuation * 'a) -> unit

    let rec spawn f =
      match f () with
      | () -> dequeue ()
      | effect Yield k -> enqueue k (); dequeue ()
      | effect (Fork f) k -> enqueue k (); spawn f
      | effect (Suspend f) k -> f k; dequeue ()
      | effect (Resume (k', v)) k -> enqueue k' v; ignore (continue k ())

slide-14
SLIDE 14

MVar

    type 'a mvar_state =
      | Full of 'a * ('a * (unit,unit) continuation) Queue.t
      | Empty of ('a,unit) continuation Queue.t
    type 'a t = 'a mvar_state ref

  • Reagents https://github.com/ocamllabs/reagents
    • Composable lock-free programming

    let put v mv =
      match !mv with
      | Full (_, q) -> perform @@ Suspend (fun k -> Queue.push (v,k) q)
      | Empty q ->
        if Queue.is_empty q then mv := Full (v, Queue.create ())
        else
          let t = Queue.pop q in
          perform @@ Resume (t, v)
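
To see the Suspend/Resume interface and the MVar working together, here is a self-contained sketch using the OCaml 5 stdlib `Effect` module. The `take` function, `create_empty` and the producer/consumer usage are assumptions added for illustration: the slide only shows `put`. The scheduler is single-threaded, so no locking is attempted.

```ocaml
open Effect
open Effect.Deep

type _ Effect.t +=
  | Fork : (unit -> unit) -> unit Effect.t
  | Suspend : (('a, unit) continuation -> unit) -> 'a Effect.t
  | Resume : (('a, unit) continuation * 'a) -> unit Effect.t

let run_q : (unit -> unit) Queue.t = Queue.create ()
let enqueue k v = Queue.push (fun () -> continue k v) run_q
let dequeue () = if Queue.is_empty run_q then () else (Queue.pop run_q) ()

let rec spawn (f : unit -> unit) : unit =
  match_with f ()
    { retc = (fun () -> dequeue ());
      exnc = raise;
      effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Fork g ->
          Some (fun (k : (a, unit) continuation) -> enqueue k (); spawn g)
        | Suspend g ->
          (* Hand the suspended fiber to g, then run something else. *)
          Some (fun (k : (a, unit) continuation) -> g k; dequeue ())
        | Resume (k', v) ->
          (* Make k' runnable with value v and carry on with the caller. *)
          Some (fun (k : (a, unit) continuation) -> enqueue k' v; continue k ())
        | _ -> None) }

type 'a mvar_state =
  | Full of 'a * ('a * (unit, unit) continuation) Queue.t
  | Empty of ('a, unit) continuation Queue.t

type 'a mvar = 'a mvar_state ref

let create_empty () : 'a mvar = ref (Empty (Queue.create ()))

let put v mv =
  match !mv with
  | Full (_, q) -> perform (Suspend (fun k -> Queue.push (v, k) q))
  | Empty q ->
    if Queue.is_empty q then mv := Full (v, Queue.create ())
    else perform (Resume (Queue.pop q, v))

(* Hypothetical counterpart to put; not shown on the slide. *)
let take mv =
  match !mv with
  | Empty q -> perform (Suspend (fun k -> Queue.push k q))
  | Full (v, q) ->
    if Queue.is_empty q then mv := Empty (Queue.create ())
    else begin
      (* Promote the first blocked writer's value and wake that writer. *)
      let (v', k) = Queue.pop q in
      mv := Full (v', q);
      perform (Resume (k, ()))
    end;
    v

let received = ref []

let () =
  let mv = create_empty () in
  spawn (fun () ->
    (* Consumer blocks on the empty MVar until the producer puts. *)
    perform (Fork (fun () ->
      for _i = 1 to 3 do received := take mv :: !received done));
    for i = 1 to 3 do put i mv done)
```

The MVar never touches the run queue directly: it only performs `Suspend` and `Resume`, so the same code would work under a different scheduler that handles those two effects.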

slide-15
SLIDE 15

Preemptive Multithreading

  • Conventional way: Build on top of signal handling

    open Sys

    set_signal sigalrm (Signal_handle (fun _ ->
      let k = (* Get current continuation *) in
      Sched.enqueue k;
      let k' = Sched.dequeue () in
      (* Set current continuation to k' *)));;
    Unix.setitimer Unix.ITIMER_REAL interval

  • Not compositional: Signal handler is a callback
  • Unclear where the handler runs
  • Can we do better with effect handlers?
slide-16
SLIDE 16

Preemptive Multithreading

  • Treat asynchronous interrupts as effects!
  • Can be raised asynchronously on demand

    effect TimerInterrupt : unit

    let rec spawn f =
      match f () with
      | () -> dequeue ()
      | effect Yield k -> yield k
      ...
      | effect TimerInterrupt k -> yield k
    and yield k = enqueue k; dequeue ()

  • What is the default behaviour for TimerInterrupt effect?
  • Should all signals be handled this way? effect Signal : int -> unit
slide-17
SLIDE 17

Implementation

  • Fibers: Heap allocated, dynamically resized stacks
    • ~10s of bytes
    • No unnecessary closure allocation costs unlike CPS
  • One-shot delimited continuations
    • Simplifies reasoning about resources - sockets, locks, etc.
  • Handlers ➔ Linked-list of fibers

[Figure: fiber diagram labelled handle / continue, handler, sp, call chain, reference]

slide-18
SLIDE 18

Implementation

[Figure: fiber diagram after a second handle / continue, labelled sp, handler, call chain, reference]

slide-19
SLIDE 19

Implementation

[Figure: fiber diagram at a perform, labelled sp, handle / continue, handler, call chain, reference]

slide-20
SLIDE 20

Tricky bug

  • One-shot continuations + multicore schedulers

    val call1cc : ('a cont -> 'a) -> 'a
    val throw : 'a cont -> 'a -> 'b

  • In call1cc f, f runs on the same stack!
  • Possible that k is concurrently resumed on a different core!

    let put v mv =
      match !mv with
      | Full (v', q) ->
        call1cc (fun k ->
          Queue.push (v,k) q;
          let k' = Sched.dequeue () in
          throw k' ())
      ....

slide-21
SLIDE 21

Tricky bug

  • No such bug here

    let rec spawn f =
      match f () with
      | () -> dequeue ()
      | effect Yield k -> enqueue k (); dequeue ()
      | effect (Fork f) k -> enqueue k (); spawn f
      | effect (Suspend f) k -> f k; dequeue ()
      | effect (Resume (k', v)) k -> enqueue k' v; ignore (continue k ())

  • f is run by the handler
  • The fiber performing the Suspend effect is already suspended!
slide-22
SLIDE 22

Native-code fibers: Vanilla

[Figure: a single system stack interleaving C and OCaml frames: OCaml start program, C call, OCaml callback, C call, OCaml callback]

slide-23
SLIDE 23

Native-code fibers: Effects

[Figure: the sequence OCaml start program, C call, handle, OCaml callback, C call, drawn across the system stack (C frames) and the OCaml heap]

slide-24
SLIDE 24

Native-code fibers: Effects

  • Stack overflow checks for OCaml functions
  • Eliminate SO checks for small tail recursive leaf functions
  • Slop space (16 words) at the bottom of stack
  • Frame sizes statically known
  • OCaml Compiler: 18K functions; Eliminate checks for 11K functions
  • FFI calls are more expensive due to stack switching
  • Small context
  • No callee saved registers in OCaml
  • Allocation, exception, stack pointers in registers
  • Specialise for calls which {allocate / pass arguments on stack / do neither}

slide-25
SLIDE 25

Performance: Vanilla OCaml

[Figure: normalised time (lower is better), Effects versus Vanilla, across a benchmark suite that includes the ae--*, numal-*, cpdf-*, menhir-*, chameneos-*, thread-ring-*, cohttp-* and async_* programs]

Effects ~0.9% slower

slide-26
SLIDE 26

Performance: Chameneos-Redux

[Figure: time (s) against iterations (×100,000, from 1 to 10) for Lwt, Concurrency Monad, GHC and Fibers; annotated "Direct-style" and "Specialised scheduler"]

slide-27
SLIDE 27

Generator from Iterator

    type 'a t =
      | Leaf
      | Node of 'a t * 'a * 'a t

    let rec iter f = function
      | Leaf -> ()
      | Node (l, x, r) -> iter f l; f x; iter f r

    (* val to_gen : 'a t -> (unit -> 'a option) *)
    let to_gen (type a) (t : a t) =
      let module M = struct effect Next : a -> unit end in
      let open M in
      let step = ref (fun () -> assert false) in
      let first_step () =
        try
          iter (fun x -> perform (Next x)) t;
          None
        with effect (Next v) k ->
          step := continue k;
          Some v
      in
      step := first_step;
      fun () -> !step ()
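
As a hedged sketch, the same generator can be written with the OCaml 5 stdlib `Effect` module. One deliberate deviation from the slide, flagged in the comments: once the traversal finishes, `step` is reset so that later calls keep returning `None` instead of resuming a continuation that has already been used.

```ocaml
open Effect
open Effect.Deep

type 'a t = Leaf | Node of 'a t * 'a * 'a t

let rec iter f = function
  | Leaf -> ()
  | Node (l, x, r) -> iter f l; f x; iter f r

(* val to_gen : 'a t -> (unit -> 'a option) *)
let to_gen (type a) (t : a t) : unit -> a option =
  (* A fresh effect per generator, carrying elements of type a. *)
  let module M = struct
    type _ Effect.t += Next : a -> unit Effect.t
  end in
  let open M in
  let step = ref (fun () -> assert false) in
  let first_step () =
    try_with
      (fun () ->
        iter (fun x -> perform (Next x)) t;
        (* Deviation from the slide: stay exhausted after the last element. *)
        step := (fun () -> None);
        None)
      ()
      { effc = (fun (type b) (eff : b Effect.t) ->
          match eff with
          | Next v ->
            Some (fun (k : (b, a option) continuation) ->
              (* Remember where the traversal stopped, hand back v. *)
              step := (fun () -> continue k ());
              Some v)
          | _ -> None) }
  in
  step := first_step;
  fun () -> !step ()

(* Usage: drain a three-element tree in order. *)
let drained =
  let next = to_gen (Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Leaf))) in
  let rec drain acc =
    match next () with
    | Some x -> drain (x :: acc)
    | None -> List.rev acc
  in
  drain []
```

The iterator `iter` is left untouched: the handler turns its internal control flow inside out, which is what the CPS or hand-written generator versions in the next slide have to do manually.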

slide-28
SLIDE 28

Performance: Generator

[Figure: time (s) against binary tree depth (15 to 25) for Iterator, Fiber Generator and H/W Generator]

slide-29
SLIDE 29

Continuation cloning

  • Our continuations are one-shot.
  • Multi-shot continuations are useful for backtracking computations
  • Explicit cloning on demand!
  • Obj.clone_continuation : ('a,'b) continuation -> ('a,'b) continuation

    effect Foo : unit
    let _ =
      try
        begin
          try perform Foo
          with effect Foo k -> continue k (perform Foo)
        end
      with effect Foo k ->
        continue (Obj.clone_continuation k) ();
        continue k ()

Continuation is resumed twice!

Exception: Invalid_argument "continuation already taken".
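
The one-shot restriction can be observed directly. The sketch below assumes the OCaml 5 stdlib `Effect` module, where a second resumption raises `Effect.Continuation_already_resumed` rather than the `Invalid_argument` shown on the slide. It does not use the cloning primitive, which belongs to the Multicore OCaml prototype and is not assumed to exist here.

```ocaml
open Effect
open Effect.Deep

type _ Effect.t += Foo : unit Effect.t

(* true if the runtime rejected the second resumption of k. *)
let second_resume_rejected =
  try_with (fun () -> perform Foo; false) ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Foo ->
          Some (fun (k : (a, bool) continuation) ->
            (* First resumption runs the body to completion. *)
            let _first = continue k () in
            (* Second resumption of the same continuation must fail. *)
            match continue k () with
            | _ -> false
            | exception Continuation_already_resumed -> true)
        | _ -> None) }
```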

slide-30
SLIDE 30

Continuation cloning

[Figure: slowdown relative to exception-based queens (× times) against number of queens (8 to 24) for Exception (ref = 1), Option, Multicore OCaml, Eff and Delimcc]

slide-31
SLIDE 31

Affine ➔ Linear

  • Affine continuations: resumed at-most once
  • Difficult to reason about resource cleanup

    let foo fd = perform DoesNotReturn

    let () =
      let fd = Unix.openfile "hello.ml" [Unix.O_RDWR] 0o640 in
      try
        foo fd;
        Unix.close fd
      with e ->
        Unix.close fd;
        raise e

slide-32
SLIDE 32

Affine ➔ Linear

  • Affine continuations: resumed at-most once
  • Difficult to reason about resource cleanup

    let foo fd = perform DoesNotReturn

    let () =
      let fd = ref @@ Unix.openfile "hello.ml" [Unix.O_RDWR] 0o640 in
      try
        foo !fd;
        Unix.close !fd
      with
      | e -> Unix.close !fd; raise e
      | effect e k ->
        (* Dynamic wind *)
        Unix.close !fd;
        let res = perform e in
        fd := Unix.openfile "hello.ml" [Unix.O_RDWR] 0o640;
        continue k res

slide-33
SLIDE 33

Affine ➔ Linear

  • Affine continuations: resumed at-most once
    • Difficult to reason about resource cleanup
  • Linear continuations: resumed exactly once
    • Implicit finalisers for fibers
    • Always unwind the stack with exception ThreadDeath

[Figure: a chain of five continuations, each labelled K]
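
The slides propose implicit finalisers. As an explicit approximation of the same idea, a handler that does not want to resume a continuation can unwind it instead of dropping it. This sketch assumes the OCaml 5 stdlib `Effect` module and its `discontinue` function; `Thread_death` is a locally declared stand-in for the ThreadDeath exception named on the slide, and the `log` list replaces the real file descriptor.

```ocaml
open Effect
open Effect.Deep

type _ Effect.t += DoesNotReturn : unit Effect.t

(* Hypothetical stand-in for the ThreadDeath exception on the slide. *)
exception Thread_death

let log = ref []

let foo () = perform DoesNotReturn

(* "Opens" a resource and guarantees cleanup when the stack unwinds. *)
let with_resource () =
  log := "open" :: !log;
  Fun.protect ~finally:(fun () -> log := "close" :: !log) foo

let () =
  try_with with_resource ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | DoesNotReturn ->
          Some (fun (k : (a, unit) continuation) ->
            (* Unwind the suspended fiber instead of silently dropping k:
               the exception is raised at the perform site, so the
               finaliser in with_resource runs. *)
            try discontinue k Thread_death with Thread_death -> ())
        | _ -> None) }
```

Each continuation is still used exactly once, either by `continue` or by `discontinue`, which is the linear discipline the slide argues for.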

slide-34
SLIDE 34

Affine ➔ Linear

[Figure: the chain of five continuations K, with raise ThreadDeath applied to unwind it]

slide-35
SLIDE 35

Affine ➔ Linear

[Figure: two continuations K remaining, with raise ThreadDeath (??)]

slide-36
SLIDE 36

Summary

  • Generalises control-flow programming
    • Async I/O, generators, promises, delimited control, etc.
  • Practicality
    • Native one-shot fibers for performance backwards compatibility
    • Backwards compatible effect system (Leo White, HOPE 2016 keynote)
  • Real world impact ➔ JavaScript :-)
    • React Fiber is based on OCaml effect handlers
    • Proposal to add effect handlers to ECMAScript
  • Effect-based programming still in its infancy