SLIDE 1
Parallelism & Concurrency
Advanced functional programming - Lecture 9
Trevor L. McDonell (& Wouter Swierstra)
1
SLIDE 2 Parallelism & Concurrency
- Parallelism vs. Concurrency
- Related concepts, but not the same
- Both give up the strictly sequential execution model of the von Neumann
machine
- Concurrent programming:
- Structuring a program into different, interacting tasks
- Tasks may be executed simultaneously or interleaved
- In general non-deterministic
- Examples: OS kernel, GUI, web server
- Parallel programming:
- Improving the execution speed of an application
- Simultaneous use of multiple physical processing elements
- Examples: Video encoder, image processing, simulation codes
2
SLIDE 3 Overview
Haskell provides many different tools for concurrency and parallelism:
- Basic concurrency primitives (locks)
- Software transactional memory (STM)
- Erlang-style message passing (Cloud Haskell)
- Primitives to control evaluation strategies
- Data-parallel arrays
- GPU programming
- …
3
SLIDE 4 Overview
- I highly recommend Simon Marlow’s book Parallel and Concurrent
Programming in Haskell, which you can read for free online: https://simonmar.github.io/pages/pcph.html
4
SLIDE 5
Concurrency
5
SLIDE 6 Working with threads
Control.Concurrent

forkIO :: IO () -> IO ThreadId

-- managing the current thread
threadDelay :: Int -> IO ()
yield :: IO ()
myThreadId :: IO ThreadId

throwTo :: Exception e => ThreadId -> e -> IO ()
6
SLIDE 7 Forking threads
forkIO :: IO () -> IO ThreadId
- Using threads forces you to use IO
- Any thread can create new threads
- If the main program ends, all its threads are stopped too
- You can explicitly control other threads by sending them exceptions via
their ThreadId (e.g. to kill the thread)
7
SLIDE 8 Haskell threads
Haskell threads created with forkIO are not OS threads!
- These threads are very lightweight; they are created and scheduled by
the GHC runtime system
- If you use the threaded version of the runtime system (pass -threaded
to the compiler), multiple OS threads may be used behind the scenes
- GHC’s runtime is very clever: there are many options provided for
configuring it and obtaining debug information
8
SLIDE 9 Sharing data between threads
If we fork off a thread of type IO (), how do we observe its result?
- We can create explicit references to mutable memory in Haskell using
IORefs to share memory between threads
- Using IORefs to share data between threads is unsafe! In the sense that
it can lead to race conditions and other inconsistent states
- Generally, when working with threads, you have to be careful that they
don’t interfere with each other
9
SLIDE 10 Mutable variables in Haskell
Data.IORef

newIORef :: a -> IO (IORef a)
readIORef :: IORef a -> IO a
writeIORef :: IORef a -> a -> IO ()
modifyIORef :: IORef a -> (a -> a) -> IO ()
- A value of type IORef a is a mutable reference (pointer) to a value of
type a
- Because references are mutable, all operations on them are in IO
10
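The danger described above can be made concrete. The following is a hypothetical demo, not from the slides: n threads each perform k read-then-write increments on a shared IORef, so interleavings between the read and the write can lose updates.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forM_, replicateM, replicateM_)
import Data.IORef

-- n threads each increment a shared IORef k times, non-atomically.
raceyCount :: Int -> Int -> IO Int
raceyCount n k = do
  ref  <- newIORef 0
  done <- replicateM n newEmptyMVar   -- one "finished" flag per thread
  forM_ done $ \d -> forkIO $ do
    replicateM_ k $ do
      v <- readIORef ref              -- read ...
      writeIORef ref (v + 1)          -- ... then write: another thread
    putMVar d ()                      --     may run in between!
  mapM_ takeMVar done                 -- wait for all threads
  readIORef ref

main :: IO ()
main = raceyCount 100 1000 >>= print  -- at most 100000, often less with -threaded
```

With the threaded runtime (compile with -threaded, run with +RTS -N) the final count is usually well below the 100000 increments performed.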
SLIDE 11
Example
Question: What, if anything, will the following code produce?

test :: Int -> IO ()
test n = do
  x <- newIORef 0
  mapM_ (forkIO . loop x) [1..n]
  loop x 0

loop :: IORef Int -> Int -> IO ()
loop ref m = do
  writeIORef ref m
  n <- readIORef ref
  when (m /= n) $ putStrLn (show m)
  loop ref m
11
SLIDE 12 Shared state concurrency
Non-determinism makes it much harder to develop correct programs
- Threads communicate via a shared state
- Problem: inconsistent data structures, race conditions
12
SLIDE 13
Shared state concurrency
We require a lock (of some kind) to control access to the shared state
13
SLIDE 14 Synchronised mutable variables in Haskell
Control.Concurrent.MVar

newMVar :: a -> IO (MVar a)
newEmptyMVar :: IO (MVar a)
takeMVar :: MVar a -> IO a       -- wait if empty
putMVar :: MVar a -> a -> IO () -- wait if already full
- More flexible than IORef, and can be used to implement concurrency
primitives such as locks and semaphores
- An MVar may be either empty or full: a thread will block trying to read
from an empty MVar, or trying to write to a full one
- The runtime system manages blocked threads with some fairness
guarantee
14
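As a small sketch of MVar-as-lock (a hypothetical example, not from the slides): modifyMVar_ takes the value, applies an update, and puts the result back, so the lost updates of the plain-IORef approach cannot occur.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forM_, replicateM, replicateM_)

-- n threads each perform k increments, serialised through an MVar.
lockedCount :: Int -> Int -> IO Int
lockedCount n k = do
  counter <- newMVar 0
  done    <- replicateM n newEmptyMVar
  forM_ done $ \d -> forkIO $ do
    replicateM_ k $
      modifyMVar_ counter (return . (+ 1))  -- take, update, put back
    putMVar d ()
  mapM_ takeMVar done                       -- wait for all threads
  takeMVar counter

main :: IO ()
main = lockedCount 10 1000 >>= print        -- always 10000
```

While a thread is inside modifyMVar_ the MVar is empty, so every other updater blocks until the new value has been put back: exactly the lock behaviour described above.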
SLIDE 15
Example: Bank account
Model a bank account and operations like withdrawal, deposit, and transfer of funds between accounts. It should not be possible to observe a state where, during a transfer, money has been withdrawn from one account without yet being deposited into the target account.
15
SLIDE 16 Example: Bank account
withdraw :: Int -> MVar Int -> IO Bool
withdraw amount account =
  modifyMVar account $ \balance ->
    if balance >= amount
      then return (balance - amount, True)
      else return (balance, False)

transfer :: Int -> MVar Int -> MVar Int -> IO Bool
transfer amount from to = do
  withdraw amount from
  -- inconsistent state mustn't be observable!
  deposit amount to
16
SLIDE 17 Example: Bank account
We need to implement transfer differently:

transfer :: Int -> MVar Int -> MVar Int -> IO Bool
transfer amount from to =
  withMVar from $ \balance_from ->
    withMVar to $ \balance_to ->
      ...
- Question: What happens if someone simultaneously tries to transfer
funds in the opposite direction?
- Locks must be acquired in a fixed (global) order, otherwise there is a
potential for deadlock
17
SLIDE 19 Concurrency using locks
The good:
The bad:
- Taking too many or too few locks
- Taking the wrong locks, or in the wrong order
- Difficult error recovery
- Lost wake-ups and erroneous retries
The ugly:
- Locks don’t support modular programming
- We had to inline the definition of withdraw and deposit into
transfer
18
SLIDE 20 Software Transactional Memory (STM)
Control.Concurrent.STM

atomically :: STM a -> IO a

newTVar :: a -> STM (TVar a)  -- STM equivalent of IORef
newTVarIO :: a -> IO (TVar a)
readTVar :: TVar a -> STM a
writeTVar :: TVar a -> a -> STM ()
- A concurrency abstraction which takes ideas from database systems
- Threads execute transactions, whose effects can be undone if necessary
- atomicity: all effects of executing a transaction become visible at once
- isolation: a transaction cannot see the effects of other threads
19
SLIDE 21 Example: Bank account, revisited
transfer :: Int -> TVar Int -> TVar Int -> IO Bool
transfer amount from to = atomically $ do
  ok <- withdraw amount from
  when ok $ deposit amount to
  return ok
- Modular concurrency!
- Be optimistic: locks are pessimistic
20
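The transfer above relies on STM versions of withdraw and deposit that the slides do not show. A possible sketch (the bodies below are assumptions; only the names and use follow from the slide):

```haskell
import Control.Concurrent.STM
import Control.Monad (when)

-- Withdraw if funds suffice; report success as a Bool.
withdraw :: Int -> TVar Int -> STM Bool
withdraw amount account = do
  balance <- readTVar account
  if balance >= amount
    then writeTVar account (balance - amount) >> return True
    else return False

deposit :: Int -> TVar Int -> STM ()
deposit amount account = do
  balance <- readTVar account
  writeTVar account (balance + amount)

-- Both updates run inside one transaction.
transfer :: Int -> TVar Int -> TVar Int -> IO Bool
transfer amount from to = atomically $ do
  ok <- withdraw amount from
  when ok $ deposit amount to
  return ok
```

Because both accounts are updated inside a single atomically block, no other thread can observe the intermediate state, and no lock-ordering discipline is needed.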
SLIDE 22 Software Transactional Memory
retry :: STM a
orElse :: STM a -> STM a -> STM a
- The retry function rolls back the effects of the current transaction and
restarts the atomic operation
- orElse offers an alternative to immediate execution: if the first
alternative leads to a retry, attempt the second
These operations allow you to assemble more complex transactions
21
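As a hypothetical illustration (not from the slides) of composing transactions with retry and orElse: a withdrawal that blocks until funds are available, and a combinator that falls back to a second account when the first would block.

```haskell
import Control.Concurrent.STM

-- Block (via retry) until the account has enough funds, then withdraw.
withdrawBlocking :: Int -> TVar Int -> STM ()
withdrawBlocking amount account = do
  balance <- readTVar account
  if balance < amount
    then retry  -- roll back; re-run when a read TVar changes
    else writeTVar account (balance - amount)

-- Try the first account; if that retries, try the second instead.
withdrawEither :: Int -> TVar Int -> TVar Int -> STM ()
withdrawEither amount a b =
  withdrawBlocking amount a `orElse` withdrawBlocking amount b
```

If the first account has insufficient funds, its retry is caught by orElse and the second account is tried within the same atomic transaction.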
SLIDE 23 STM is lock free
An alternative to concurrent programming using locks/mutexes/synchronised methods
Compositional and modular concurrent programming
- Software transactional memory does not use locking: deadlocks cannot occur
- Robust in the presence of failure or cancellation
- However, large transactions can take a huge number of retries, so STM
works best if transactions are small, or unlikely to interfere
The concept of STM has been around since ’95
- Can it be implemented efficiently?
22
SLIDE 24 STM implementation
Naive implementation: a single global lock
Better implementation:
- Each transaction keeps a log of all of the memory accesses during a
transaction (record initial value and latest update), but does not actually perform any writes yet
- At the end of the transaction, validate the log: if the initial values are the
same as the current values, the memory is still consistent and the transaction is committed; otherwise, it is restarted
- Validation and committing must be truly atomic
23
SLIDE 25 STM in other languages
Software transactional memory is supported in many languages, including C/C++, C#, Java, Perl, Python, Scala, OCaml, Smalltalk, …
Transactions try to commit, but roll back and retry later if the log is no longer consistent
Question: Why might this cause problems (in other languages)?
- Haskell’s type system is particularly well suited to statically check the
restrictions required by STM, because side effects are controlled:

derp :: IO a
derp = atomically $ do
  brexit  -- (international) side effects
  retry   -- ??
24
SLIDE 27 Other libraries: async
Control.Concurrent.Async

async :: IO a -> IO (Async a)
wait :: Async a -> IO a
- Provides a higher-level interface over threads, in which an Async a is a
concurrent thread which will eventually deliver a result of type a
- Provides ways to create Async computations, wait for their results, and
cancel them
25
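To show what the interface amounts to, here is a toy re-implementation built from forkIO and MVar (a simplified sketch: the real async library also propagates exceptions and supports cancellation, which this version does not):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar

-- A toy Async: the MVar is filled once the forked action finishes.
newtype Async a = Async (MVar a)

async :: IO a -> IO (Async a)
async action = do
  var <- newEmptyMVar
  _   <- forkIO (action >>= putMVar var)
  return (Async var)

wait :: Async a -> IO a
wait (Async var) = readMVar var  -- readMVar, so wait may be called repeatedly

main :: IO ()
main = do
  a <- async (return (sum [1 .. 100 :: Int]))
  b <- async (return (length "concurrent"))
  s <- wait a
  n <- wait b
  print (s, n)  -- (5050,10)
```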
SLIDE 28 Other libraries: monad-par
Control.Monad.Par

runPar :: Par a -> a
fork :: Par () -> Par ()

new :: Par (IVar a)
put :: NFData a => IVar a -> a -> Par ()
get :: IVar a -> Par a
- Together, fork and IVars allow the construction of dataflow networks
- Similar to async, but implemented entirely as a library
26
SLIDE 29 Other libraries: lvish
Control.LVish
- Based on monotonically increasing data structures
- Two subcomputations may independently update a lattice variable; the
result is the least upper bound of the two states
- A threshold set of pairwise incompatible “trigger values” determines when
it is safe for a read to return
27
SLIDE 30
Parallelism
28
SLIDE 31 Recall: lazy evaluation
In Haskell…
- Expressions are only evaluated if actually required
- The leftmost outermost reducible sub-expression (redex) is chosen to
achieve this
- Sharing is introduced in order to prevent evaluating expressions multiple
times
If no redexes are left, an expression is in normal form. If the top-level of an expression is a constructor or lambda, then it is in weak head normal form.
29
SLIDE 32 Recall: forcing evaluation
Haskell has the following primitive function:

seq :: a -> b -> b

A call of the form seq x y
- First evaluates x up to WHNF
- Then proceeds normally to compute y
We can use this to define strict function application:

($!) :: (a -> b) -> a -> b
f $! x = x `seq` f x
30
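A small demo of the WHNF point (hypothetical, not from the slides): seq evaluates its first argument only to weak head normal form, so a value whose outermost part is already a constructor passes, even with undefined inside.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Returns True if forcing the argument to WHNF succeeds.
forceToWhnf :: a -> IO Bool
forceToWhnf x = do
  r <- try (evaluate (x `seq` ())) :: IO (Either SomeException ())
  return (either (const False) (const True) r)

main :: IO ()
main = do
  ok1 <- forceToWhnf (Just undefined)   -- outermost constructor: already WHNF
  ok2 <- forceToWhnf (undefined :: Int) -- forcing raises an exception
  print (ok1, ok2)  -- (True,False)
```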
SLIDE 33 Deterministic Parallelism in Haskell
Basic parallelism:
- Mark parts of the program that we consider suitable for parallel
execution
- Let the runtime system decide the details!
- We can even evaluate speculatively: we might not know for certain that
the result is required, but in a pure language this is harmless
31
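A classic sketch of such marking, using the par and pseq primitives that live in base’s GHC.Conc (Control.Parallel in the parallel package re-exports them): par sparks its first argument for possible parallel evaluation, and because the function is pure, the speculation is harmless.

```haskell
import GHC.Conc (par, pseq)

-- Count the calls of a naive Fibonacci: spark nfib (n-1) for parallel
-- evaluation while this thread evaluates nfib (n-2) first.
nfib :: Int -> Int
nfib n
  | n < 2     = 1
  | otherwise = x `par` (y `pseq` x + y + 1)
  where
    x = nfib (n - 1)
    y = nfib (n - 2)

main :: IO ()
main = print (nfib 20)
```

Run with -threaded and +RTS -N to let idle capabilities pick up the sparks; the result is the same either way.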
SLIDE 34 The Eval monad
Control.Parallel.Strategies

runEval :: Eval a -> a
rseq :: a -> Eval a
rpar :: a -> Eval a
- rseq marks its argument as a computation that should be evaluated (to
WHNF) before continuing (equivalent to seq)
- rpar marks its argument as a computation that might be beneficial to
be evaluated in parallel
32
SLIDE 35
Example
runEval $ do
  a <- rpar (f x)
  b <- rseq (f y)
  rseq a
  return (a, b)
33
SLIDE 36 Sparks
What is rpar doing?
- The runtime system manages multiple capabilities, usually one per CPU
core
- An expression marked for parallel execution in rpar creates a spark in
the spark pool, a data structure storing references to sparks on the heap
- If a capability is idle, it can steal a spark from the pool
- Spark evaluation is speculative
34
SLIDE 37
Strategies
Annotating expressions using the Eval monad feels very low-level, so let’s abstract a little:

type Strategy a = a -> Eval a

Observation!

rpar :: Strategy a
rseq :: Strategy a
return :: Strategy a
35
SLIDE 39
Applying strategies
using :: a -> Strategy a -> a
x `using` s = runEval (s x)

The using function takes a value of type a and a Strategy for evaluating things of type a, and applies the strategy to the value. This allows us to clearly distinguish what the program does from the code which adds parallelism. We can define a sequential algorithm, and then experiment with the best evaluation strategy.
36
SLIDE 40
More strategies
We can define parameterised strategies:

evalList :: Strategy a -> Strategy [a]
evalList s []     = return []
evalList s (x:xs) = do
  x'  <- s x
  xs' <- evalList s xs
  return (x' : xs')
37
SLIDE 41
Common patterns
…and use these to succinctly define a parallel map:

parMap :: (a -> b) -> [a] -> [b]
parMap f xs = map f xs `using` evalList rpar

We can easily define similar strategies for other data types!
38
SLIDE 43 Other libraries: repa
Data.Array.Repa
The repa library provides operations on dense, multi-dimensional arrays, which are automatically executed in parallel.

data Array r sh e
- e is the type of the array elements; Int, Float, etc.
- sh describes the shape of the array, as a snoc-list at both the type and
value level; e.g. the one-dimensional index value Z :. 3 has type Z :. Int
- Z represents a zero-dimensional array
- (:.) adds an inner-most dimension
- r is a representation tag which determines what structure holds the data
39
SLIDE 44 Other libraries: repa
Data.Array.Repa
Mapping a function over an array produces a delayed array: the result is not computed immediately, rather it will be fused into the operation which consumes those values.

Repa.map :: (Shape sh, Source r a)
         => (a -> b)
         -> Array r sh a
         -> Array D sh b  -- tag 'D' represents delayed arrays
40
SLIDE 45 Other libraries: accelerate
Data.Array.Accelerate
Similar to repa, provides operations on dense, multi-dimensional arrays, but implemented as a (deeply) embedded language.

Accelerate.map :: (Shape sh, Elt a, Elt b)
               => (Exp a -> Exp b)   -- Exp: embedded scalar term
               -> Acc (Array sh a)   -- Acc: embedded parallel term
               -> Acc (Array sh b)
41
SLIDE 46 Other libraries: accelerate
Data.Array.Accelerate
Internally, uses many of the ideas described in this course
- Laziness
- Monad transformers
- Type and data families, reified type representations
- GADTs, polymorphic recursion
- Example: (internal) variables represented by typed de Bruijn indices:

data Idx env t where
  ZeroIdx :: Idx (env, t) t
  SuccIdx :: Idx env t -> Idx (env, s) t
42
SLIDE 47 Summary
There are many different approaches to parallel and concurrent programming that Haskell supports. Each may be suitable for a different purpose.
- Parallel and concurrent programming is hard!
- Concurrency is difficult to get right; parallelism is difficult to get fast
- Purity is great, but laziness is a mixed blessing
43