SLIDE 1

Lightweight Concurrency in GHC

KC Sivaramakrishnan, Tim Harris, Simon Marlow, Simon Peyton Jones

SLIDE 2

GHC: Concurrency and Parallelism

  • forkIO
  • Bound threads
  • Par monad
  • MVars
  • STM
  • Safe foreign calls
  • Asynchronous exceptions

SLIDE 3

Concurrency landscape in GHC

[Figure: architecture diagram spanning Capability 0 … Capability N. Labels: Haskell code; RTS (C code) with LWT scheduler, OS thread pool, MVar, STM, black holes, safe FFI, and more…]

  • Preemptive, round-robin scheduler + work-sharing

SLIDE 4

Idea

[Figure: proposed architecture spanning Capability 0 … Capability N, shown alongside the current one. Labels in the proposed design: Haskell code, LWT Scheduler+, MVar+, STM+, concurrency substrate, OS thread pool, RTS (C code), black holes, safe FFI. Labels in the current design: Haskell code, LWT scheduler, OS thread pool, MVar, STM, RTS (C code), black holes, safe FFI, and more…]

SLIDE 5

Contributions

[Figure: current and proposed architectures side by side, each spanning Capability 0 … Capability N. Current labels: Haskell code, LWT scheduler, OS thread pool, MVar, STM, RTS (C code), black holes, safe FFI, and more… Proposed labels: Haskell code, LWT Scheduler+, MVar+, STM+, OS thread pool, concurrency substrate, RTS (C code), black holes, safe FFI.]

Callouts on the proposed design:

  • What should this be?
  • How to unify these?
  • Where do these live in the new design?

SLIDE 6

Concurrency Substrate

  • One-shot continuations (SCont) and primitive transactional memory (PTM)
  • PTM is a bare-bones TM
    – Better composability than CAS

    --------------- PTM ----------------
    data PTM a
    data PVar a
    instance Monad PTM
    atomically :: PTM a -> IO a
    newPVar    :: a -> PTM (PVar a)
    readPVar   :: PVar a -> PTM a
    writePVar  :: PVar a -> a -> PTM ()

    --------------- SCont --------------
    data SCont -- Stack Continuations
    newSCont        :: IO () -> IO SCont
    switch          :: (SCont -> PTM SCont) -> IO ()
    getCurrentSCont :: PTM SCont
    switchTo        :: SCont -> PTM ()
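
To give a feel for why a transactional interface composes better than CAS, here is a minimal sketch. It assumes GHC's existing STM and TVar can stand in for PTM and PVar (the real PTM is a separate, more primitive mechanism); `transfer` and `runDemo` are hypothetical names used only for illustration.

```haskell
import GHC.Conc (STM, TVar, atomically, newTVar, readTVar, writeTVar)

-- Stand-ins (assumption): PTM and PVar are modelled by STM and TVar.
type PTM  = STM
type PVar = TVar

newPVar :: a -> PTM (PVar a)
newPVar = newTVar

readPVar :: PVar a -> PTM a
readPVar = readTVar

writePVar :: PVar a -> a -> PTM ()
writePVar = writeTVar

-- Hypothetical helper: update two PVars in a single transaction,
-- something a single-word CAS cannot express directly.
transfer :: PVar Int -> PVar Int -> Int -> PTM ()
transfer from to n = do
  a <- readPVar from
  b <- readPVar to
  writePVar from (a - n)
  writePVar to   (b + n)

-- Hypothetical demo: returns both balances after one transfer.
runDemo :: IO (Int, Int)
runDemo = do
  (x, y) <- atomically $ (,) <$> newPVar 10 <*> newPVar 0
  atomically $ transfer x y 3
  atomically $ (,) <$> readPVar x <*> readPVar y
```

Because `transfer` is itself a PTM action, it can be sequenced with other PTM actions inside one `atomically` block without exposing an intermediate state.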

SLIDE 7

Switch

    switch :: (SCont -> PTM SCont) -> IO ()

  • The SCont argument is the current SCont
  • The returned SCont is the SCont to switch to
  • The function runs in PTM!

SLIDE 8

Abstract Scheduler Interface

  • Primitive scheduler actions
    – SCont {scheduleSContAction :: SCont -> PTM (), yieldControlAction :: PTM ()}
    – Expected from every user-level thread

[Figure: proposed architecture. Labels: Haskell code, MVar+, STM+, LWT Scheduler+, concurrency substrate, black holes, safe FFI. Callout: "How to unify these?"]

SLIDE 9

Primitive Scheduler Actions (1)

    scheduleSContAction :: SCont -> PTM ()
    scheduleSContAction sc = do
      sched :: PVar [SCont] <- -- get sched
      contents :: [SCont] <- readPVar sched
      writePVar sched $ contents ++ [sc]

    yieldControlAction :: PTM ()
    yieldControlAction = do
      sched :: PVar [SCont] <- -- get sched
      contents :: [SCont] <- readPVar sched
      case contents of
        x:tail -> do
          writePVar sched tail
          switchTo x -- DOES NOT RETURN
        otherwise -> …

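
The two actions above amount to a FIFO run queue: scheduling appends to the back, yielding takes from the front. The sketch below models just that queue discipline. It assumes STM/TVar as stand-ins for PTM/PVar and thread names in place of SConts; `schedule`, `next` and `roundRobinDemo` are hypothetical names. The slide elides the empty-queue case, so returning `Nothing` there is this sketch's own choice, not the authors'.

```haskell
import GHC.Conc (STM, TVar, atomically, newTVarIO, readTVar, writeTVar)

-- Assumption: a thread is just a name; the real substrate stores SConts.
type ThreadName = String

-- Append to the back of the run queue (cf. scheduleSContAction).
schedule :: TVar [ThreadName] -> ThreadName -> STM ()
schedule sched t = do
  contents <- readTVar sched
  writeTVar sched (contents ++ [t])

-- Take the front of the run queue (cf. yieldControlAction). Instead of
-- switching, return the thread that would be switched to.
next :: TVar [ThreadName] -> STM (Maybe ThreadName)
next sched = do
  contents <- readTVar sched
  case contents of
    x : rest -> writeTVar sched rest >> pure (Just x)
    []       -> pure Nothing  -- assumption: the slide leaves this case open

-- Hypothetical demo: the order in which threads would be resumed.
roundRobinDemo :: IO [Maybe ThreadName]
roundRobinDemo = do
  sched <- newTVarIO []
  atomically $ mapM_ (schedule sched) ["t1", "t2"]
  first <- atomically (next sched)
  atomically (schedule sched "t3")
  rest <- atomically (sequence [next sched, next sched, next sched])
  pure (first : rest)
```

Threads come back out in the order they were scheduled, which is the round-robin behaviour the list-based scheduler on the slide provides.
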
SLIDE 10

Primitive Scheduler Actions (2)

Substrate primitives:

    getScheduleSContAction :: SCont -> PTM (SCont -> PTM ())
    setScheduleSContAction :: SCont -> (SCont -> PTM ()) -> PTM ()
    getYieldControlAction  :: SCont -> PTM (PTM ())
    setYieldControlAction  :: SCont -> PTM () -> PTM ()

SLIDE 11

Primitive Scheduler Actions (3)

Helper functions (short names for the substrate primitives):

    getSSA = getScheduleSContAction
    setSSA = setScheduleSContAction
    getYCA = getYieldControlAction
    setYCA = setYieldControlAction

SLIDE 12

Building Concurrency Primitives (1)

    yield :: IO ()
    yield = atomically $ do
      s :: SCont <- getCurrentSCont
      -- Add current SCont to scheduler
      ssa :: (SCont -> PTM ()) <- getSSA s
      let enque :: PTM () = ssa s
      enque
      -- Switch to next SCont from scheduler
      switchToNext :: PTM () <- getYCA s
      switchToNext

SLIDE 13

Building Concurrency Primitives (2)

    forkIO :: IO () -> IO SCont
    forkIO f = do
      ns <- newSCont f
      atomically $ do
        s :: SCont <- getCurrentSCont
        -- Initialize new SCont's scheduler actions
        ssa :: (SCont -> PTM ()) <- getSSA s
        setSSA ns ssa
        yca :: PTM () <- getYCA s
        setYCA ns yca
        -- Add new SCont to current scheduler
        let enqueAct :: PTM () = ssa ns
        enqueAct
      return ns

SLIDE 14

Building MVars

An MVar is either empty or full and has a single hole.

    newtype MVar a = MVar (PVar (ST a))
    data ST a = Full a [(a, PTM ())]
              | Empty [(PVar a, PTM ())]

    takeMVar :: MVar a -> IO a
    takeMVar (MVar ref) = do
      hole <- atomically $ newPVar undefined
      atomically $ do
        st <- readPVar ref
        case st of
          Empty ts -> do
            s <- getCurrentSCont
            ssa :: (SCont -> PTM ()) <- getSSA s
            let wakeup :: PTM () = ssa s
            writePVar ref $ Empty (ts ++ [(hole, wakeup)])
            switchToNext <- getYCA s
            switchToNext
          Full x ((x', wakeup :: PTM ()):ts) -> do
            writePVar hole x
            writePVar ref $ Full x' ts
            wakeup
          otherwise -> …
      atomically $ readPVar hole

SLIDE 15

Building MVars

  • hole: the result will be here

SLIDE 16

Building MVars

  • If the MVar is empty:
    – (1) Append hole and wakeup info to the MVar's list (getSSA!)
    – (2) Yield control to the scheduler (getYCA!)

SLIDE 17

Building MVars

  • Wake up a pending writer, if any. wakeup is a PTM ()!
  • MVar is scheduler agnostic!
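
The takeMVar cases can also be read as pure state transitions, which makes the queueing behaviour easier to check in isolation. This is only a model under stated assumptions: waiters are plain names rather than (PVar a, PTM ()) pairs, `MState`, `TakeResult` and `takeStep` are hypothetical names, and the "full with no pending writers" case, which the slide elides, is filled in here as this sketch's own guess.

```haskell
-- Assumption: a waiter is identified by a name instead of a hole/wakeup pair.
type Waiter = String

data MState a
  = Full a [(a, Waiter)]  -- current value, plus blocked writers with their values
  | Empty [Waiter]        -- blocked readers, oldest first
  deriving (Eq, Show)

data TakeResult a
  = Took a (Maybe Waiter) -- value taken, and the writer to wake up (if any)
  | Blocked               -- caller was queued and must yield to the scheduler
  deriving (Eq, Show)

-- One takeMVar step by the thread named in the first argument.
takeStep :: Waiter -> MState a -> (MState a, TakeResult a)
takeStep me (Empty ws)                = (Empty (ws ++ [me]), Blocked)
takeStep _  (Full x [])               = (Empty [], Took x Nothing)  -- assumed case
takeStep _  (Full x ((x', w) : rest)) = (Full x' rest, Took x (Just w))
```

The model never mentions a scheduler: in the slide code the only scheduler-specific parts are the stored wakeup action and the yield, which is what keeps the MVar scheduler agnostic.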

SLIDE 18

Interaction of C RTS and User-level scheduler

  • Many “events” that necessitate actions on the scheduler become apparent only in the C part of the RTS

[Figure: proposed architecture. Labels: Haskell code, MVar+, STM+, LWT Scheduler+, concurrency substrate, safe FFI, black hole, asynchronous exceptions, finalizers.]

SLIDE 19

Interaction of C RTS and User-level scheduler

  • Many “events” that necessitate actions on the scheduler become apparent only in the C part of the RTS

[Figure: proposed architecture. Labels: Haskell code, MVar+, STM+, LWT Scheduler+, concurrency substrate, safe FFI, black hole, asynchronous exceptions, finalizers; Capability X with an upcall thread (UT).]

  • Pending upcall queue :: [PTM ()]
  • Upcall thread (UT)
  • Re-use primitive scheduler actions!
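
One way to picture a pending upcall queue of type [PTM ()] is sketched below. The slide gives only the queue's type and the existence of an upcall thread, so everything else is an assumption: STM/TVar stand in for PTM/PVar, and `postUpcall`, `drainUpcalls` and `upcallDemo` are hypothetical names, not the RTS's actual interface.

```haskell
import GHC.Conc (STM, TVar, atomically, newTVarIO, readTVar, readTVarIO, writeTVar)

-- Assumption: an upcall is a transactional action, modelled here with STM.
type Upcall = STM ()

-- Hypothetical: enqueue an upcall at the back of the pending queue.
postUpcall :: TVar [Upcall] -> Upcall -> STM ()
postUpcall q act = do
  pending <- readTVar q
  writeTVar q (pending ++ [act])

-- Hypothetical upcall-thread step: take all pending upcalls, then run them
-- oldest first, one transaction each. Returns how many were run.
drainUpcalls :: TVar [Upcall] -> IO Int
drainUpcalls q = do
  pending <- atomically $ do
    acts <- readTVar q
    writeTVar q []
    pure acts
  mapM_ atomically pending
  pure (length pending)

-- Hypothetical demo: two upcalls that each bump a counter.
upcallDemo :: IO (Int, Int)
upcallDemo = do
  q       <- newTVarIO []
  counter <- newTVarIO (0 :: Int)
  let bump = readTVar counter >>= writeTVar counter . (+ 1)
  atomically $ postUpcall q bump >> postUpcall q bump
  ran   <- drainUpcalls q
  total <- readTVarIO counter
  pure (ran, total)
```

In the design on the slide, the queued actions would be the primitive scheduler actions themselves (for example, a thread's scheduleSContAction applied to it), which is what "re-use primitive scheduler actions" suggests.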

SLIDE 20

Blackholes

[Figure: threads T1, T2, T3 on Capability 0 and Capability 1. Legend: running, suspended, blocked. Labels: thunk, "evaluating..".]

SLIDE 21

Blackholes

[Figure: threads T1, T2, T3 on Capability 0 and Capability 1. Legend: running, suspended, blocked. Labels: BH, thunk “blackholed”.]

SLIDE 22

Blackholes

[Figure: threads T1, T2, T3 on Capability 0 and Capability 1. Legend: running, suspended, blocked. Labels: BH, "enters blackhole".]

SLIDE 23

Blackholes

[Figure: threads T1, T2, T3 on Capability 0 and Capability 1. Legend: running, suspended, blocked. Label: BH.]

SLIDE 24

Blackholes

[Figure: threads T1, T2, T3 on Capability 0 and Capability 1. Legend: running, suspended, blocked. Labels: BH, "Yield control action".]

SLIDE 25

Blackholes

[Figure: threads T1, T2, T3 on Capability 0 and Capability 1. Legend: running, suspended, blocked. Labels: V, "Schedule SCont action", "finishes evaluation".]

SLIDE 26

Blackholes: The Problem

[Figure: threads T1 and T2 on Capability 0, with a blackhole (BH). Legend: running, suspended, blocked.]

    switch $ \T1 -> do
      return T2

SLIDE 27

Blackholes: The Problem

[Figure: threads T1 and T2 on Capability 0, with a blackhole (BH). Legend: running, suspended, blocked. Callout: "enters blackhole".]

    switch $ \T1 -> do
      return T2

  • In order to make progress, we need to resume T2
  • But, in order to resume T2, we need to resume T2 (Deadlocked!)
    – Can be resolved through runtime system tricks (Work in Progress!)

SLIDE 28

Conclusions

  • Status
    – Mostly implemented (SConts, PTM, simple schedulers, MVars, safe FFI, bound threads, asynchronous exceptions, finalizers, etc.)
    – 2X to 3X slower on micro benchmarks (programs only doing synchronization work)
  • To-do
    – Re-implement Control.Concurrent with LWC
    – Formal operational semantics
    – Building real-world programs
  • Open questions
    – Hierarchical schedulers, thread priority, load balancing, fairness, etc.
    – STM on top of PTM
    – PTM on top of SpecTM
    – Integration with par/seq, evaluation strategies, etc.
    – and more…