SLIDE 1

Parallelism & Concurrency

Advanced functional programming - Lecture 9

Trevor L. McDonell (& Wouter Swierstra)

1

SLIDE 2

Parallelism & Concurrency

  • Parallelism vs. Concurrency
  • Related concepts, but not the same
  • Both give up the strictly sequential execution model of the Von Neumann machine

  • Concurrent programming:
  • Structuring a program into different, interacting tasks
  • Tasks may be executed simultaneously or interleaved
  • In general non-deterministic
  • Examples: OS kernel, GUI, web server
  • Parallel programming:
  • Improving the execution speed of an application
  • Simultaneous use of multiple physical processing elements
  • Examples: Video encoder, image processing, simulation codes

2

SLIDE 3

Overview

Haskell provides many different tools for concurrency and parallelism:

  • Basic concurrency primitives (locks)
  • Software transactional memory (STM)
  • Erlang-style message passing (Cloud Haskell)
  • Primitives to control evaluation strategies
  • Data-parallel arrays
  • GPU programming

3

SLIDE 4

Overview

  • I highly recommend Simon Marlow’s book Parallel and Concurrent Programming in Haskell, which you can read for free online: https://simonmar.github.io/pages/pcph.html

4

SLIDE 5

Concurrency

5

SLIDE 6

Working with threads

Control.Concurrent

    -- creating a thread
    forkIO :: IO () -> IO ThreadId

    -- managing the current thread
    threadDelay :: Int -> IO ()   -- delay in microseconds
    yield       :: IO ()
    myThreadId  :: IO ThreadId

    -- managing other threads
    throwTo :: Exception e => ThreadId -> e -> IO ()

6

SLIDE 7

Forking threads

forkIO :: IO () -> IO ThreadId

  • Using threads forces you to use IO
  • Any thread can create new threads
  • If the main program ends, all its threads are stopped too
  • You can explicitly control other threads by sending them exceptions via their ThreadId (e.g. to kill the thread), as sketched below
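A minimal sketch of these operations together; the loop body and delay values are illustrative only:

    import Control.Concurrent

    main :: IO ()
    main = do
      tid <- forkIO $ mapM_ (\i -> print i >> threadDelay 100000) ([1 ..] :: [Int])
      threadDelay 500000   -- let the child run for about half a second
      killThread tid       -- killThread tid = throwTo tid ThreadKilled
      putStrLn "child killed; when main returns, all other threads stop too"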

7

SLIDE 8

Haskell threads

Haskell threads created with forkIO are not OS threads!

  • These threads are very lightweight; they are created and scheduled by the GHC runtime system
  • If you use the threaded version of the runtime system (pass -threaded to the compiler), multiple OS threads may be used behind the scenes
  • GHC’s runtime is very clever: there are many options provided for configuring it and obtaining debug information

8

SLIDE 9

Sharing data between threads

If we fork off a thread of type IO (), how do we observe its result?

  • We can create explicit references to mutable memory in Haskell using IORefs to share memory between threads
  • Using IORefs to share data between threads is unsafe, in the sense that it can lead to race conditions and other inconsistent states
  • Generally, when working with threads, you have to be careful that they don’t interfere with each other

9

SLIDE 10

Mutable variables in Haskell

Data.IORef

    newIORef    :: a -> IO (IORef a)
    readIORef   :: IORef a -> IO a
    writeIORef  :: IORef a -> a -> IO ()
    modifyIORef :: IORef a -> (a -> a) -> IO ()

  • A value of type IORef a is a mutable reference (pointer) to a value of type a
  • Because references are mutable, all operations have results in IO

10

SLIDE 11

Example

Question: What, if anything, will the following code produce?

    import Control.Concurrent
    import Control.Monad (when)
    import Data.IORef

    test :: Int -> IO ()
    test n = do
      x <- newIORef 0
      mapM_ (forkIO . loop x) [1 .. n]
      loop x 0

    loop :: IORef Int -> Int -> IO ()
    loop ref m = do
      writeIORef ref m
      n <- readIORef ref
      when (m /= n) $ putStrLn (show m)
      loop ref m

11

SLIDE 12

Shared state concurrency

Non-determinism makes it much harder to develop correct programs

  • Threads communicate via a shared state
  • Problem: inconsistent data structures, race conditions

12

SLIDE 13

Shared state concurrency

We require a lock (of some kind) to control access to the shared state

13

SLIDE 14

Synchronised mutable variables in Haskell

Control.Concurrent.MVar

    newMVar      :: a -> IO (MVar a)
    newEmptyMVar :: IO (MVar a)
    takeMVar     :: MVar a -> IO a        -- wait if empty
    putMVar      :: MVar a -> a -> IO ()  -- wait if already full

  • More flexible than IORef, and can be used to implement concurrency primitives such as locks and semaphores
  • An MVar may be either empty or full: a thread will block trying to read from an empty MVar, or trying to write to a full one (illustrated below)
  • The runtime system manages blocked threads with some fairness guarantee
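A small sketch of the blocking behaviour, using an empty MVar to hand a result back from a forked thread (the workload is a placeholder):

    import Control.Concurrent

    main :: IO ()
    main = do
      box <- newEmptyMVar
      _   <- forkIO $ putMVar box (sum [1 .. 1000000 :: Int])
      r   <- takeMVar box   -- blocks until the child fills the MVar
      print r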

14

SLIDE 15

Example: Bank account

Model a bank account and operations like withdrawal, deposit, and transfer of funds between accounts. It should not be possible to observe a state where, during a transfer, money has been withdrawn from one account without yet being deposited into the target account.

15

SLIDE 16

Example: Bank account

    withdraw :: Int -> MVar Int -> IO Bool
    withdraw amount account =
      modifyMVar account $ \balance ->   -- acquires lock
        if balance >= amount
          then return (balance - amount, True)
          else return (balance, False)

    transfer :: Int -> MVar Int -> MVar Int -> IO Bool
    transfer amount from to = do
      withdraw amount from
      -- inconsistent state mustn't be observable!
      deposit amount to

16

SLIDE 17

Example: Bank account

We need to implement transfer differently:

    transfer :: Int -> MVar Int -> MVar Int -> IO Bool
    transfer amount from to =
      withMVar from $ \balance_from ->
      withMVar to   $ \balance_to   ->
        ...

  • Question: What happens if someone simultaneously tries to transfer funds in the opposite direction?
  • Locks must be acquired in a fixed (global) order, otherwise there is a potential for deadlock

17


SLIDE 19

Concurrency using locks

The good:

  • ..?

The bad:

  • Taking too many or too few locks
  • Taking the wrong locks, or in the wrong order
  • Difficult error recovery
  • Lost wake-ups and erroneous retries

The ugly:

  • Locks don’t support modular programming
  • We had to inline the definitions of withdraw and deposit into transfer

18

SLIDE 20

Software Transactional Memory (STM)

Control.Concurrent.STM

    atomically :: STM a -> IO a         -- run a transaction
    newTVar    :: a -> STM (TVar a)     -- STM equivalent of IORef
    newTVarIO  :: a -> IO (TVar a)
    readTVar   :: TVar a -> STM a
    writeTVar  :: TVar a -> a -> STM ()

  • A concurrency abstraction which takes ideas from database systems
  • Threads execute transactions, whose effects can be undone if necessary
  • atomicity: all effects of executing a transaction become visible at once
  • isolation: a transaction cannot see the effects of other threads

19

SLIDE 21

Example: Bank account, revisited

    transfer :: Int -> TVar Int -> TVar Int -> IO Bool
    transfer amount from to =
      atomically $ do               -- run a transaction
        ok <- withdraw amount from  -- no locks!
        when ok $ deposit amount to
        return ok

  • Modular concurrency! (withdraw and deposit are sketched below)
  • Be optimistic: locks are pessimistic
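The slide leaves withdraw and deposit undefined; here is one plausible sketch of them as composable STM actions (the signatures are inferred from the call sites above, not given in the lecture):

    withdraw :: Int -> TVar Int -> STM Bool
    withdraw amount account = do
      balance <- readTVar account
      if balance >= amount
        then writeTVar account (balance - amount) >> return True
        else return False

    deposit :: Int -> TVar Int -> STM ()
    deposit amount account = modifyTVar' account (+ amount)   -- strict update

Because both are ordinary STM actions, they compose: the single atomically in transfer makes their combined effect atomic.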

20

SLIDE 22

Software Transactional Memory

    retry  :: STM a
    orElse :: STM a -> STM a -> STM a

  • The retry function rolls back the effects of the current transaction and restarts the atomic operation
  • orElse offers an alternative to immediate execution: if the first alternative leads to a retry, attempt the second

These operations allow you to assemble more complex transactions.
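For instance, a blocking "take" on a TVar holding a Maybe can be written with retry; this is a sketch (takeSlot and takeEither are not library functions):

    import Control.Concurrent.STM

    takeSlot :: TVar (Maybe a) -> STM a
    takeSlot v = do
      mx <- readTVar v
      case mx of
        Nothing -> retry   -- roll back; re-run when v has been written
        Just x  -> do writeTVar v Nothing
                      return x

    -- With orElse we can try one slot and fall back to another:
    takeEither :: TVar (Maybe a) -> TVar (Maybe a) -> STM a
    takeEither p q = takeSlot p `orElse` takeSlot q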

21

SLIDE 23

STM is lock-free

An alternative to concurrent programming using locks/mutexes/synchronised methods. Compositional and modular concurrent programming.

  • Software transactional memory does not use locking: deadlocks cannot occur
  • Robust in the presence of failure or cancellation
  • However, large transactions can take a huge number of retries, so STM works best if transactions are small, or unlikely to interfere

The concept of STM has been around since ’95.

  • Can it be implemented efficiently?

22

SLIDE 24

STM implementation

Naive implementation: a single global lock.

Better implementation:

  • Each transaction keeps a log of all of the memory accesses during a transaction (record initial value and latest update), but does not actually perform any writes yet
  • At the end of the transaction, validate the log: if the initial values are the same as the current values, the memory is still consistent and the transaction is committed; otherwise, it is restarted
  • Validation and committing must be truly atomic

23

SLIDE 25

STM in other languages

Software transactional memory is supported in many languages, including C/C++, C#, Java, Perl, Python, Scala, OCaml, Smalltalk, …

Transactions try to commit, but roll back and retry later if the log is no longer consistent.

Question: Why might this cause problems (in other languages)?

  • Haskell’s type system is particularly well suited to statically check the restrictions required by STM, because side effects are controlled:

    derp :: IO a
    derp = atomically $ do
      brexit   -- :: IO (), (international) side effects
      retry    -- :: STM a
      -- ??

In Haskell this does not type check: an IO action like brexit cannot appear inside an STM transaction, so irreversible side effects cannot end up inside a retried transaction.

24


SLIDE 27

Other libraries: async

Control.Concurrent.Async

    async :: IO a -> IO (Async a)
    wait  :: Async a -> IO a

  • Provides a higher-level interface over threads, in which an Async a is a concurrent thread which will eventually deliver a result of type a
  • Provides ways to create Async computations, wait for their results, and cancel them (see the sketch below)
  • Built on top of STM
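A small usage sketch (the delays stand in for real work, e.g. network requests):

    import Control.Concurrent (threadDelay)
    import Control.Concurrent.Async

    main :: IO ()
    main = do
      a <- async (threadDelay 1000000 >> return "one")  -- both tasks run
      b <- async (threadDelay 1000000 >> return "two")  -- concurrently
      r1 <- wait a
      r2 <- wait b
      putStrLn (r1 ++ ", " ++ r2)   -- done after ~1 second, not ~2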

25

SLIDE 28

Other libraries: monad-par

Control.Monad.Par

    runPar :: Par a -> a
    fork   :: Par () -> Par ()
    new    :: Par (IVar a)
    put    :: NFData a => IVar a -> a -> Par ()
    get    :: IVar a -> Par a

  • Together, fork and IVars allow the construction of dataflow networks, as sketched below
  • Similar to async, but implemented entirely as a library
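A minimal dataflow sketch in the Par monad (the two branches are placeholder computations):

    import Control.Monad.Par

    example :: Int -> (Int, Int)
    example x = runPar $ do
      i <- new
      j <- new
      fork (put i (x * 2))    -- two independent dataflow nodes
      fork (put j (x + 10))
      a <- get i              -- get blocks until the IVar is written
      b <- get j
      return (a, b)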

26

SLIDE 29

Other libraries: lvish

Control.LVish

  • Based on monotonically increasing data structures
  • Two subcomputations may independently update a lattice variable; the result is the least upper bound of the two states
  • A threshold set of pairwise incompatible “trigger values” determines when it is safe for a read to return

27

SLIDE 30

Parallelism

28

SLIDE 31

Recall: lazy evaluation

In Haskell…

  • Expressions are only evaluated if actually required
  • The leftmost outermost reducible sub-expression (redex) is chosen to achieve this
  • Sharing is introduced in order to prevent evaluating expressions multiple times

If no redexes are left, an expression is in normal form. If the top level of an expression is a constructor or lambda, then it is in weak head normal form.

29

SLIDE 32

Recall: forcing evaluation

Haskell has the following primitive function:

    seq :: a -> b -> b

A call of the form seq x y

  • First evaluates x up to WHNF
  • Then proceeds normally to compute y

We can use this to define strict function application:

    ($!) :: (a -> b) -> a -> b
    f $! x = x `seq` f x
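A quick GHCi illustration of the distinction (a sketch; seq forces only the outermost constructor):

    ghci> let xs = [undefined, undefined] :: [Int]
    ghci> xs `seq` length xs     -- WHNF exposes the (:) constructor, not the elements
    2
    ghci> head xs `seq` ()       -- forcing an element hits the bottom value
    *** Exception: Prelude.undefined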

30

SLIDE 33

Deterministic Parallelism in Haskell

Basic parallelism:

  • Mark parts of the program that we consider suitable for parallel execution
  • Let the runtime system decide the details!
  • We can even evaluate speculatively: we might not know for certain that the result is required, but in a pure language this is harmless

31

SLIDE 34

The Eval monad

Control.Parallel.Strategies

    runEval :: Eval a -> a
    rseq    :: a -> Eval a
    rpar    :: a -> Eval a

  • rseq marks its argument as a computation that should be evaluated (to WHNF) before continuing (equivalent to seq)
  • rpar marks its argument as a computation that might be beneficial to evaluate in parallel

32

SLIDE 35

Example

    runEval $ do
      a <- rpar (f x)   -- spark (f x) for parallel evaluation
      b <- rseq (f y)   -- evaluate (f y) to WHNF and wait
      rseq a            -- wait for (f x) to finish too
      return (a, b)

33

SLIDE 36

Sparks

What is rpar doing?

  • The runtime system manages multiple capabilities, usually one per CPU core
  • An expression marked for parallel execution in rpar creates a spark in the spark pool, a data structure storing references to sparks on the heap
  • If a capability is idle, it can steal a spark from the pool
  • Spark evaluation is speculative

34

SLIDE 37

Strategies

Annotating expressions using the Eval monad feels very low-level, so let’s abstract a little:

    type Strategy a = a -> Eval a

Observation!

    rpar   :: Strategy a
    rseq   :: Strategy a
    return :: Strategy a

35


SLIDE 39

Applying strategies

    using :: a -> Strategy a -> a
    x `using` s = runEval (s x)

The using function takes a value of type a and a Strategy for evaluating things of type a, and applies the strategy to the value. This allows us to clearly distinguish what the program does from the code which adds parallelism. We can define a sequential algorithm, and then experiment with the best evaluation strategy.

36

SLIDE 40

More strategies

We can define parameterised strategies:

    evalList :: Strategy a -> Strategy [a]
    evalList s []     = return []
    evalList s (x:xs) = do
      x'  <- s x
      xs' <- evalList s xs
      return (x' : xs')

37

SLIDE 41

Common patterns

…and use these to succinctly define a parallel map:

    parMap :: (a -> b) -> [a] -> [b]
    parMap f xs = map f xs `using` evalList rpar

We can easily define similar strategies for other data types!
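A hedged usage sketch (naive nfib as the workload; the library exports its own parMap, hence the hiding):

    import Control.Parallel.Strategies hiding (parMap)

    parMap :: (a -> b) -> [a] -> [b]
    parMap f xs = map f xs `using` evalList rpar

    nfib :: Int -> Integer
    nfib n | n < 2     = 1
           | otherwise = nfib (n - 1) + nfib (n - 2)

    -- compile with -threaded and run with +RTS -N to use multiple cores
    main :: IO ()
    main = print (sum (parMap nfib [28 .. 34]))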

38


SLIDE 43

Other libraries: repa

Data.Array.Repa

The repa library provides operations on dense, multi-dimensional arrays, which are automatically executed in parallel.

    data Array r sh e

  • e is the type of the array elements: Int, Float, etc.
  • sh describes the shape of the array, as a snoc-list at both the type and value level; the one-dimensional index value Z :. 3 has type Z :. Int
  • Z represents a zero-dimensional array
  • (:.) adds an inner-most dimension
  • r is a representation tag which determines what structure holds the data

39

SLIDE 44

Other libraries: repa

Data.Array.Repa

Mapping a function over an array produces a delayed array: the result is not computed immediately; rather, it will be fused into the operation which consumes those values.

    Repa.map :: (Shape sh, Source r a)
             => (a -> b)
             -> Array r sh a
             -> Array D sh b   -- tag 'D' represents delayed arrays
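A small usage sketch (computeP forces a delayed array in parallel; the array contents are arbitrary):

    import Data.Array.Repa as Repa

    doubleAll :: Monad m => Array U DIM1 Int -> m (Array U DIM1 Int)
    doubleAll arr = computeP (Repa.map (* 2) arr)   -- map is delayed; computeP runs it

    main :: IO ()
    main = do
      let xs = fromListUnboxed (Z :. 10) [0 .. 9 :: Int]
      ys <- doubleAll xs
      print (toList ys)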

40

SLIDE 45

Other libraries: accelerate

Data.Array.Accelerate

Similar to repa, provides operations on dense, multi-dimensional arrays, but implemented as a (deeply) embedded language.

    Accelerate.map :: (Shape sh, Elt a, Elt b)
                   => (Exp a -> Exp b)    -- Exp: embedded scalar term
                   -> Acc (Array sh a)    -- Acc: embedded parallel term
                   -> Acc (Array sh b)
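A small usage sketch, assuming the reference interpreter backend (accelerate also has e.g. GPU backends):

    import qualified Data.Array.Accelerate as A
    import Data.Array.Accelerate (Z(..), (:.)(..), Acc, Vector)
    import Data.Array.Accelerate.Interpreter (run)

    doubleAll :: Acc (Vector Int) -> Acc (Vector Int)
    doubleAll = A.map (* 2)    -- builds an embedded program; runs nothing yet

    main :: IO ()
    main = do
      let xs = A.fromList (Z :. 10) [0 .. 9] :: Vector Int
      print (run (doubleAll (A.use xs)))   -- 'run' compiles and executes the program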

41

SLIDE 46

Other libraries: accelerate

Data.Array.Accelerate

Internally, uses many of the ideas described in this course:

  • Laziness
  • Monad transformers
  • Type and data families, reified type representations
  • GADTs, polymorphic recursion
  • Example: (internal) variables represented by typed de Bruijn indices:

    data Idx env t where
      ZeroIdx :: Idx (env, t) t
      SuccIdx :: Idx env t -> Idx (env, s) t

42

SLIDE 47

Summary

There are many different approaches to parallel and concurrent programming that Haskell supports. Each may be suitable for a different purpose.

  • Parallel and concurrent programming is hard!
  • Concurrency is difficult to get right; parallelism is difficult to get fast
  • Purity is great, but laziness is a mixed blessing

43