Effective Parallelism with Reagents


slide-1
SLIDE 1

Effective Parallelism with Reagents

“KC” Sivaramakrishnan

OCaml Labs, University of Cambridge

slide-9
SLIDE 9

Multicore OCaml

                    Concurrency                        Parallelism
Libraries           Cooperative concurrency,           Reagents: lock-free
                    async I/O, backtracking, ...       programming
Language + Stdlib   Effects                            Domain API
Compiler            Fibers                             Domains

Fibers:
  • 12M fibers/s on 1 core
  • 30M fibers/s on 4 cores

slide-10
SLIDE 10

Effects

Reagents: lock-free programming

slide-11
SLIDE 11

Algebraic effects & handlers

slide-13
SLIDE 13

Algebraic effects & handlers

  • Programming and reasoning about computational effects in a pure setting.
  • Cf. Monads
  • Eff: http://www.eff-lang.org/

slide-22
SLIDE 22

Algebraic Effects: Example

exception Foo of int
let f () = 1 + (raise (Foo 3))
let r = try f () with Foo i -> i + 1

val r : int = 4

effect Foo : int -> int
let f () = 1 + (perform (Foo 3))   (* perform (Foo 3) evaluates to 4 *)
let r = try f () with effect (Foo i) k -> continue k (i + 1)

val r : int = 5

fiber: lightweight stack

  • Heap-allocated
  • Dynamically resized
  • One-shot (affine), explicit cloning
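
The effect example above is written in the experimental Multicore OCaml syntax (`effect`, `perform`, `continue k`). As a rough sketch, assuming OCaml 5.0 or later, the same program can be expressed with the standard library's `Effect` module. The handler record and the locally abstract type are boilerplate required by that API and do not appear on the slides.

```ocaml
(* Requires OCaml 5.0 or later (stdlib Effect module). *)
open Effect
open Effect.Deep

(* Counterpart of the slide's `effect Foo : int -> int`. *)
type _ Effect.t += Foo : int -> int Effect.t

let f () = 1 + perform (Foo 3)

(* The handler resumes the suspended computation with i + 1 = 4,
   so f completes the addition 1 + 4. *)
let r =
  try_with f ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Foo i -> Some (fun (k : (a, _) continuation) -> continue k (i + 1))
        | _ -> None) }
```

Unlike the exception version, the handler does not discard the rest of `f`: the continuation `k` carries the pending `1 + _`, which is why the two results differ.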
slide-24
SLIDE 24

Cooperative Concurrency

(* Control operations on threads *)
val fork : (unit -> unit) -> unit
val yield : unit -> unit

(* Runs the scheduler. *)
val run : (unit -> unit) -> unit

effect Fork : (unit -> unit) -> unit
let fork f = perform (Fork f)

effect Yield : unit
let yield () = perform Yield

slide-25
SLIDE 25

Cooperative Concurrency

(* A concurrent round-robin scheduler *)
let run main =
  let run_q = Queue.create () in
  let enqueue k = Queue.push k run_q in
  let rec dequeue () =
    if Queue.is_empty run_q then ()
    else continue (Queue.pop run_q) ()
  in
  let rec spawn f =
    (* Effect handler => instantiates fiber *)
    match f () with
    | () -> dequeue ()
    | exception e ->
      print_string (Printexc.to_string e);
      dequeue ()
    | effect Yield k -> enqueue k; dequeue ()
    | effect (Fork f) k -> enqueue k; spawn f
  in
  spawn main
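
For readers on a released compiler, here is a hedged port of the scheduler above to the OCaml 5 standard library `Effect` module. It is a sketch under that assumption, not the speaker's code. The run queue stores thunks instead of raw continuations, and the `log` buffer and `worker` function are hypothetical additions used only to observe the interleaving.

```ocaml
(* Requires OCaml 5.0 or later. *)
open Effect
open Effect.Deep

type _ Effect.t +=
  | Fork : (unit -> unit) -> unit Effect.t
  | Yield : unit Effect.t

let fork f = perform (Fork f)
let yield () = perform Yield

(* Round-robin scheduler: suspended fibers wait in a FIFO queue. *)
let run main =
  let run_q : (unit -> unit) Queue.t = Queue.create () in
  let enqueue k = Queue.push (fun () -> continue k ()) run_q in
  let dequeue () =
    if Queue.is_empty run_q then () else (Queue.pop run_q) ()
  in
  let rec spawn f =
    (* Installing the handler around f () is what creates a new fiber. *)
    match_with f ()
      { retc = (fun () -> dequeue ());
        exnc = (fun e -> print_string (Printexc.to_string e); dequeue ());
        effc = (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Yield ->
            Some (fun (k : (a, unit) continuation) -> enqueue k; dequeue ())
          | Fork f ->
            Some (fun (k : (a, unit) continuation) -> enqueue k; spawn f)
          | _ -> None) }
  in
  spawn main

(* Hypothetical demo: two workers that record their steps and yield. *)
let log = Buffer.create 16

let worker name () =
  for i = 1 to 2 do
    Buffer.add_string log (Printf.sprintf "%s%d " name i);
    yield ()
  done

let () = run (fun () -> fork (worker "a"); fork (worker "b"))
```

Because `Fork` runs the child immediately and parks the parent, the two workers alternate steps rather than running one after the other.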

slide-28
SLIDE 28

Generator from Iterator

type 'a t =
  | Leaf
  | Node of 'a t * 'a * 'a t

let rec iter f = function
  | Leaf -> ()
  | Node (l, x, r) -> iter f l; f x; iter f r

(* val to_gen : 'a t -> (unit -> 'a option) *)
let to_gen (type a) (t : a t) =
  let module M = struct effect Next : a -> unit end in
  let open M in
  let step = ref (fun () -> assert false) in
  let first_step () =
    try
      iter (fun x -> perform (Next x)) t;
      None
    with effect (Next v) k ->
      step := continue k;
      Some v
  in
  step := first_step;
  fun () -> !step ()
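
A possible rendering of the generator above with the OCaml 5 `Effect` module is sketched below, assuming OCaml 5.0 or later. The tree type is renamed `tree` here to avoid clashing with `Effect.t`. One deliberate deviation is labelled in the code: once the traversal ends, `step` is reset so that later calls keep returning `None` instead of resuming a continuation that has already been used.

```ocaml
(* Requires OCaml 5.0 or later. *)
open Effect
open Effect.Deep

type 'a tree = Leaf | Node of 'a tree * 'a * 'a tree

let rec iter f = function
  | Leaf -> ()
  | Node (l, x, r) -> iter f l; f x; iter f r

let to_gen (type a) (t : a tree) : unit -> a option =
  let module M = struct
    type _ Effect.t += Next : a -> unit Effect.t
  end in
  let open M in
  let step = ref (fun () -> assert false) in
  let first_step () =
    try_with
      (fun () ->
        iter (fun x -> perform (Next x)) t;
        (* Addition to the slide's version: stay exhausted after the end. *)
        step := (fun () -> None);
        None)
      ()
      { effc = (fun (type b) (eff : b Effect.t) ->
          match eff with
          | Next v ->
            Some (fun (k : (b, a option) continuation) ->
              (* Suspend the traversal; the next call resumes it. *)
              step := (fun () -> continue k ());
              Some v)
          | _ -> None) }
  in
  step := first_step;
  fun () -> !step ()

let example = Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Leaf))
let next = to_gen example
```

Each call to `next ()` runs the in-order traversal up to the next element, which is the inversion of control the slide title refers to.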

slide-31
SLIDE 31

Concurrency

Algebraic effects & handlers

  • Cooperative concurrency
  • Backtracking computations
  • Selection functionals
  • Inversion of control
  • Event-based Async I/O in direct-style

Parallelism

Domain API

  • Spawn & Join domains

Reagents

  • Lock-free synchronisation & data structures
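
To make "Spawn & Join domains" concrete, here is a minimal sketch that assumes the `Domain` module shipped with OCaml 5 (`Domain.spawn`, `Domain.join`), which may differ from the Domain API of the Multicore OCaml prototype discussed in the talk. The function `parallel_sum` is a hypothetical example, not something from the slides.

```ocaml
(* Requires OCaml 5.0 or later. *)

(* Sum the half-open index range [lo, hi) of an array. *)
let sum_range (a : int array) lo hi =
  let s = ref 0 in
  for i = lo to hi - 1 do
    s := !s + a.(i)
  done;
  !s

(* Sum the left half on a freshly spawned domain while the
   current domain sums the right half, then join the results. *)
let parallel_sum (a : int array) : int =
  let n = Array.length a in
  let mid = n / 2 in
  let left = Domain.spawn (fun () -> sum_range a 0 mid) in
  let right = sum_range a mid n in
  Domain.join left + right
```

The two halves touch disjoint indices and only read the array, so no synchronisation beyond the join is needed here.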

slide-34
SLIDE 34

JVM: java.util.concurrent
.Net: System.Collections.Concurrent

Synchronization

  • Reentrant locks
  • Semaphores
  • R/W locks
  • Reentrant R/W locks
  • Condition variables
  • Countdown latches
  • Cyclic barriers
  • Phasers
  • Exchangers

Data structures

  • Queues
      • Nonblocking
      • Blocking (array & list)
      • Synchronous
      • Priority, nonblocking
      • Priority, blocking
  • Deques
  • Sets
  • Maps (hash & skiplist)

Not Composable

slide-35
SLIDE 35

How to build composable lock-free programs?

slide-39
SLIDE 39

lock-free
Under contention, at least 1 thread makes progress

wait-free
Under contention, each thread makes progress

obstruction-free
Single thread in isolation makes progress
slide-41
SLIDE 41

Compare-and-swap (CAS)

module CAS : sig
  val cas : 'a ref -> expect:'a -> update:'a -> bool
end = struct
  (* atomically... *)
  let cas r ~expect ~update =
    if !r = expect then (r := update; true)
    else false
end

  • Implemented atomically by processors
      • x86: CMPXCHG and friends
      • arm: LDREX, STREX, etc.
      • ppc: lwarx, stwcx, etc.
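
The `CAS` module above is a specification rather than a real atomic primitive. As a small sketch, assuming OCaml 5 and its standard `Atomic` module, the usual retry loop looks like the following. Note that `Atomic.compare_and_set` compares with physical equality, whereas the slide's model uses structural `=`. The names `incr_cas` and `counter` are illustrative only.

```ocaml
(* Requires OCaml 5.0 or later (Atomic and Domain modules). *)

(* Lock-free increment: read, compute, then CAS; retry if another
   domain changed the cell in between. *)
let rec incr_cas (r : int Atomic.t) : unit =
  let cur = Atomic.get r in
  if Atomic.compare_and_set r cur (cur + 1) then ()
  else begin
    Domain.cpu_relax ();
    incr_cas r
  end

let counter = Atomic.make 0

(* Four domains each perform 10_000 increments on the shared counter. *)
let () =
  let worker () = for _ = 1 to 10_000 do incr_cas counter done in
  let domains = List.init 4 (fun _ -> Domain.spawn worker) in
  List.iter Domain.join domains
```

A failed CAS means some other domain succeeded, which is exactly the lock-free guarantee: the system as a whole always makes progress.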

slide-42
SLIDE 42

CAS: cost versus contention

[Plot: throughput relative to sequential (y-axis, 0.04 to 1.0) versus contention (x-axis, log scale, 0.2% to 100%), one curve per thread count: 2, 4, 6, 8]

slide-43
SLIDE 43

[Diagram sequence (slides 43 to 49): Head points to a stack holding 3 and 2. Pushes of 7 and 5 each make a CAS attempt on Head; one attempt is marked "CAS fail".]

slide-50
SLIDE 50

module type TREIBER_STACK = sig
  type 'a t
  val push : 'a t -> 'a -> unit
  ...
end

module Treiber_stack : TREIBER_STACK = struct
  type 'a t = 'a list ref

  let rec push s t =
    let cur = !s in
    if CAS.cas s ~expect:cur ~update:(t::cur) then ()
    else (backoff (); push s t)
end

slide-51
SLIDE 51

module type TREIBER_STACK = sig
  type 'a t
  val push : 'a t -> 'a -> unit
  val try_pop : 'a t -> 'a option
end

module Treiber_stack : TREIBER_STACK = struct
  type 'a t = 'a list ref

  let rec push s t = ...

  let rec try_pop s =
    match !s with
    | [] -> None
    | (x::xs) as cur ->
      if CAS.cas s ~expect:cur ~update:xs then Some x
      else (backoff (); try_pop s)
end
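
The two Treiber stack fragments above rely on the model `CAS` module and an undefined `backoff`. A self-contained sketch, assuming OCaml 5's `Atomic` and `Domain` modules, might look like this. The `create` function and the trivial `backoff` (a single `Domain.cpu_relax`) are assumptions added for the example; a production backoff would typically be randomised and exponential.

```ocaml
(* Requires OCaml 5.0 or later. *)
module Treiber_stack : sig
  type 'a t
  val create : unit -> 'a t
  val push : 'a t -> 'a -> unit
  val try_pop : 'a t -> 'a option
end = struct
  (* The whole stack is one immutable list behind an atomic cell. *)
  type 'a t = 'a list Atomic.t

  let create () = Atomic.make []

  (* Placeholder backoff: just hint the CPU that we are spinning. *)
  let backoff () = Domain.cpu_relax ()

  let rec push s x =
    let cur = Atomic.get s in
    if Atomic.compare_and_set s cur (x :: cur) then ()
    else (backoff (); push s x)

  let rec try_pop s =
    match Atomic.get s with
    | [] -> None
    | (x :: xs) as cur ->
      if Atomic.compare_and_set s cur xs then Some x
      else (backoff (); try_pop s)
end

(* Drain a stack and count how many elements it held. *)
let drain_count s =
  let rec go n =
    match Treiber_stack.try_pop s with
    | None -> n
    | Some _ -> go (n + 1)
  in
  go 0

let shared : int Treiber_stack.t = Treiber_stack.create ()

(* Four domains push 1_000 elements each. *)
let () =
  let worker () = for i = 1 to 1_000 do Treiber_stack.push shared i done in
  List.iter Domain.join (List.init 4 (fun _ -> Domain.spawn worker))
```

Physical equality is sufficient for the CAS here because each successful push allocates a fresh cons cell, so an unchanged head pointer implies an unchanged stack.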

slide-53
SLIDE 53

The Problem:

Concurrency libraries are indispensable, but hard to build and extend

let v = Treiber_stack.pop s1 in
Treiber_stack.push s2 v

is not atomic

slide-54
SLIDE 54

Reagents

Scalable concurrent algorithms can be built and extended using abstraction and composition

Treiber_stack.pop s1 >>> Treiber_stack.push s2

is atomic

slide-57
SLIDE 57

PLDI 2012

Sequential   >>>   Software transactional memory
Parallel     <*>   Join Calculus
Selective    <+>   Concurrent ML

still lock-free!

slide-58
SLIDE 58

Design

slide-59
SLIDE 59

Lambda: the ultimate abstraction

val f : 'a -> 'b
val g : 'b -> 'c

slide-60
SLIDE 60

Lambda: the ultimate abstraction

[Diagram: the output of f feeds the input of g, 'a -> 'b -> 'c]

(compose g f) : 'a -> 'c

slide-63
SLIDE 63

Lambda abstraction:

f : 'a -> 'b

Reagent abstraction:

R : ('a,'b) Reagent.t

val run : ('a,'b) Reagent.t -> 'a -> 'b

slide-64
SLIDE 64

Thread Interaction

module type Reagents = sig
  type ('a,'b) t

  (* shared memory *)
  module Ref : Ref.S with type ('a,'b) reagent = ('a,'b) t

  (* communication channels *)
  module Channel : Channel.S with type ('a,'b) reagent = ('a,'b) t

  ...
end

slide-67
SLIDE 67

module type Channel = sig
  type ('a,'b) endpoint
  type ('a,'b) reagent
  val mk_chan : unit -> ('a,'b) endpoint * ('b,'a) endpoint
  val swap : ('a,'b) endpoint -> ('a,'b) reagent
end

c : ('a,'b) endpoint

[Diagram: swap on one endpoint of c takes 'a and returns 'b; swap on the dual endpoint takes 'b and returns 'a]

slide-70
SLIDE 70

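Message passing: swap

upd

type 'a ref
val upd : 'a ref
       -> f:('a -> 'b -> ('a * 'c) option)
       -> ('b, 'c) Reagent.t

[Diagram: upd on ref r applies f to the current value of r ('a) and the input ('b), yielding the new value of r ('a) and the output ('c)]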

slide-71
SLIDE 71

Message passing   swap
Shared state      upd f
Disjunction       R <+> S    with R : ('a,'b) and S : ('a,'b), R <+> S : ('a,'b)
Conjunction       R <*> S    with R : ('a,'b) and S : ('a,'c), R <*> S : ('a, 'b * 'c)

slide-78
SLIDE 78

module type TREIBER_STACK = sig
  type 'a t
  val create : unit -> 'a t
  val push : 'a t -> ('a, unit) Reagent.t
  val pop : 'a t -> (unit, 'a) Reagent.t
  ...
end

module Treiber_stack : TREIBER_STACK = struct
  type 'a t = 'a list Ref.ref

  let create () = Ref.ref []

  (* push takes only the stack; the element is the reagent's input *)
  let push r = Ref.upd r (fun xs x -> Some (x::xs,()))

  let pop r = Ref.upd r (fun l () ->
    match l with
    | [] -> None (* block *)
    | x::xs -> Some (xs,x))
  ...
end
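
To illustrate what a single `upd` does operationally, here is a standalone sketch built on OCaml 5's `Atomic` module. It is not the Reagents library: the `upd` function below is a hypothetical helper that runs immediately instead of building a composable reagent, and it returns `None` where a real reagent would block.

```ocaml
(* Requires OCaml 5.0 or later. Hypothetical helper, NOT the Reagents API. *)

(* Read the cell, ask f for a new state and an output, and install the
   new state with a CAS. Retry on interference; None means "would block". *)
let rec upd (r : 'a Atomic.t) (f : 'a -> 'b -> ('a * 'c) option) (input : 'b)
  : 'c option =
  let cur = Atomic.get r in
  match f cur input with
  | None -> None
  | Some (next, out) ->
    if Atomic.compare_and_set r cur next then Some out
    else upd r f input

(* The stack operations from the slide, expressed with the helper. *)
let create () = Atomic.make []

let push s x = upd s (fun xs x -> Some (x :: xs, ())) x

let pop s =
  upd s (fun xs () ->
    match xs with
    | [] -> None            (* a reagent would block here *)
    | x :: rest -> Some (rest, x)) ()

let s = create ()
let () = ignore (push s 1); ignore (push s 2)
let first = pop s
let second = pop s
let third = pop s
```

The point of the real library is that the CAS is not executed inside `upd` but deferred, so that several updates can be committed together.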

slide-81
SLIDE 81

Composability

Transfer elements atomically

Treiber_stack.pop s1 >>> Treiber_stack.push s2

Consume elements atomically

Treiber_stack.pop s1 <*> Treiber_stack.pop s2

Consume elements from either

Treiber_stack.pop s1 <+> Treiber_stack.pop s2

slide-85
SLIDE 85

Composability

Transform arbitrary blocking reagent to a non-blocking reagent

val lift : ('a -> 'b option) -> ('a,'b) t
val constant : 'a -> ('b,'a) t

let attempt (r : ('a,'b) t) : ('a,'b option) t =
  (r >>> lift (fun x -> Some (Some x)))
  <+> (constant None)

let try_pop stack = attempt (pop stack)
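
The nested `Some (Some x)` in `attempt` is easier to follow in a toy model. The sketch below assumes a purely sequential stand-in where a reagent is just a function returning `None` when it would block. It ignores atomicity, retries and real blocking, so it only illustrates how the combinators fit together and is not how the library is implemented.

```ocaml
(* Toy sequential model: None means "this reagent would block". *)
type ('a, 'b) t = 'a -> 'b option

let lift (f : 'a -> 'b option) : ('a, 'b) t = f

let constant (x : 'a) : ('b, 'a) t = fun _ -> Some x

(* Sequencing: feed the output of r into s. *)
let ( >>> ) (r : ('a, 'b) t) (s : ('b, 'c) t) : ('a, 'c) t =
  fun a -> match r a with None -> None | Some b -> s b

(* Left-biased choice: try r, fall back to s if r would block. *)
let ( <+> ) (r : ('a, 'b) t) (s : ('a, 'b) t) : ('a, 'b) t =
  fun a -> match r a with Some b -> Some b | None -> s a

(* Outer option: did the reagent complete? Inner option: attempt's result. *)
let attempt (r : ('a, 'b) t) : ('a, 'b option) t =
  (r >>> lift (fun x -> Some (Some x))) <+> constant None

(* Model runner: raises where the real run would block. *)
let run (r : ('a, 'b) t) (a : 'a) : 'b =
  match r a with Some b -> b | None -> failwith "would block"

(* Hypothetical stand-ins for popping a non-empty and an empty stack. *)
let pop_nonempty : (unit, int) t = fun () -> Some 7
let pop_empty : (unit, int) t = fun () -> None
```

In this model `run (attempt pop_empty) ()` returns `None` rather than blocking, which is the behaviour `try_pop` is after.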

slide-88
SLIDE 88

  • Philosophers alternate between thinking and eating
  • A philosopher can only eat after obtaining both forks
  • No philosopher starves

type fork =
  {drop : (unit,unit) endpoint;
   take : (unit,unit) endpoint}

let mk_fork () =
  let drop, take = mk_chan () in
  {drop; take}

let drop f = swap f.drop
let take f = swap f.take

let eat l_fork r_fork =
  run (take l_fork <*> take r_fork) ();
  (* ... eat ... *)
  spawn @@ run (drop l_fork);
  spawn @@ run (drop r_fork)
slide-89
SLIDE 89

Implementation

slide-92
SLIDE 92

Phase 1: Accumulate CASes
Phase 2: Attempt k-CAS

slide-97
SLIDE 97

Accumulate CASes
Attempt k-CAS

Permanent failure
Transient failure

HTM Ready

Promising early results with Intel TSX!

slide-103
SLIDE 103

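[Diagram: one branch ends in a permanent failure, the other in a transient failure; the combined outcome is a transient failure]

Combining failures (P = permanent, T = transient):

P & P = P
T & T = T
P & T = T
T & P = T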
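
The four equations above amount to a small algebra: the combination is permanent only when both sides are permanent. A minimal sketch of that rule follows. The type and function names are illustrative, and the comments on what each kind of failure means (block versus retry) are an assumption drawn from the earlier slides rather than something stated here.

```ocaml
(* Assumed reading: a permanent failure means the reagent must block,
   a transient failure means it should simply be retried. *)
type failure = Permanent | Transient

(* Combine the failures of two branches, following the table:
   only P & P stays permanent; anything involving T is transient. *)
let combine (a : failure) (b : failure) : failure =
  match a, b with
  | Permanent, Permanent -> Permanent
  | _ -> Transient
```

The asymmetry is the safe choice: retrying when blocking was possible only costs time, while blocking when a retry could have succeeded risks a missed wake-up.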

slide-104
SLIDE 104

Status

https://github.com/ocamllabs/ocaml-multicore
https://github.com/ocamllabs/reagents

Synchronization

  • Locks
  • Reentrant locks
  • Semaphores
  • R/W locks
  • Reentrant R/W locks
  • Condition variables
  • Countdown latches
  • Cyclic barriers
  • Phasers
  • Exchangers

Data structures

  • Queues
      • Nonblocking
      • Blocking (array & list)
      • Synchronous
      • Priority, nonblocking
      • Priority, blocking
  • Stacks
      • Treiber
      • Elimination backoff
  • Counters
  • Deques
  • Sets
  • Maps (hash & skiplist)

slide-105
SLIDE 105

Questions?

slide-106
SLIDE 106

STM vs Reagents

  • STM is more ambitious (atomic { … }). Reagents are conservative.
  • Reagents = STM + Communication
  • Reagents don’t allow multiple writes to the same memory location.
  • Reagents are lock-free. STMs are typically obstruction-free.