SLIDE 1

Concurrent and Multicore Haskell

Friday, May 9, 2008

These slides are licensed under the terms of the Creative Commons Attribution-Share Alike 3.0 United States License.

SLIDE 2

Concurrent Haskell

  • For responsive programs that multitask
  • Plain old threads, with a few twists
  • Popular programming model

SLIDE 3

A simple example

backgroundWrite path contents = do
  done <- newEmptyMVar
  forkIO $ do
    writeFile path contents
    putMVar done ()
  return done

In spite of the possibly unfamiliar notational style, this is quite normal imperative code. Here it is in pseudo-Python:

  def backgroundWrite(path, contents):
      done = newEmptyMVar()
      def mythread():
          writeFile(path, contents)
          putMVar(done, ())
      forkIO(mythread)
      return done

SLIDE 4

Imperative code!?

  • Threads, assignment, “return”... huh?
  • Haskell is a multi-paradigm language
  • Pure by default
  • Imperative when you need it

SLIDE 5

What’s an MVar?

  • An atomic variable
  • Either empty or full
  • takeMVar blocks if empty
  • putMVar blocks if full
  • Nice building block for mutual exclusion

See Control.Concurrent.MVar for the type.
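
A minimal sketch of the mutual-exclusion idea (the withLock helper is hypothetical, not a library function):

  import Control.Concurrent
  import Control.Concurrent.MVar

  -- An MVar holding () acts as a mutex: full means unlocked.
  withLock :: MVar () -> IO a -> IO a
  withLock lock act = do
    takeMVar lock        -- blocks while another thread holds the lock
    r <- act
    putMVar lock ()      -- release the lock
    return r

  main :: IO ()
  main = do
    lock <- newMVar ()
    done <- newEmptyMVar
    _ <- forkIO $ withLock lock (putStrLn "worker") >> putMVar done ()
    withLock lock (putStrLn "main")
    takeMVar done        -- wait for the worker before exiting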

SLIDE 6

Coding with MVars

  • Higher-order programming
  • modifyMVar: atomic modification
  • Safe critical sections
  • Combine MVars into a list
  • FIFO message channels

The modifyMVar function extracts a value from an MVar, passes it to a block of code that modifies it (or completely replaces it), then puts the modified value back in. If you like, you can use MVars to construct more traditional-looking synchronisation primitives like mutexes and semaphores. I don’t think anyone does this in practice.
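
A minimal sketch of an atomic counter built this way (incr is a hypothetical name):

  import Control.Concurrent.MVar

  -- Atomically bump the counter and return the new value.
  incr :: MVar Int -> IO Int
  incr counter = modifyMVar counter (\n -> return (n + 1, n + 1))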

SLIDE 7

FIFO channels (Chan)

  • Writer does not block
  • Reader blocks if channel is empty
  • Duplicate a channel
  • Broadcast to multiple threads

See Control.Concurrent.Chan for the type. A Chan is just a linked list of MVars.
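
A minimal sketch of one writer and two readers, using dupChan for broadcast:

  import Control.Concurrent
  import Control.Concurrent.Chan
  import Control.Monad (forM_, replicateM)

  main :: IO ()
  main = do
    chan <- newChan
    copy <- dupChan chan   -- later writes are visible from both channels
    _ <- forkIO $ forM_ [1 .. 5 :: Int] (writeChan chan)  -- never blocks
    xs <- replicateM 5 (readChan chan)  -- blocks while the channel is empty
    ys <- replicateM 5 (readChan copy)  -- the duplicate sees the same items
    print (xs, ys)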

SLIDE 8

Smokin’ performance

From the “Computer Language Benchmarks Game”

  • Create 503 threads
  • Circulate token in a ring
  • Iterate 10 million times

Language    Seconds
GHC            6.70
Erlang         7.49
Scala         53.35
C / NPTL      56.74
Ruby        1890.92

SLIDE 9

Runtime

  • GHC threads are incredibly cheap
  • Run millions at a time
  • File and network APIs are blocking
  • Simple mental model
  • Async I/O underneath

SLIDE 10

Time for a change

  • That didn’t rewire my brain at all!
  • Where’s the crazy stuff?

SLIDE 11

Purity and parallelism

SLIDE 12

Concurrent vs parallel

  • Concurrency
      • Do many unrelated things “at once”
      • Goals are responsiveness and multitasking
  • Parallelism
      • Get a faster answer with multiple CPUs

SLIDE 13

Pure laziness

  • Haskell is not just functional (aka pure)
  • It’s non-strict: work is deferred until needed
  • Implemented via lazy evaluation
  • Can laziness and parallelism mix?

If we’re deferring all of our work until the last possible moment, how can we specify that any of this evaluation should occur in parallel?

SLIDE 14

Laziness is the default

  • What if something must happen right now?
  • Use a special combinator
  • seq – adds strictness
  • Evaluates its 1st argument, returns its 2nd

SLIDE 15

A simple use of seq

daxpy k xs ys = zipWith f xs ys
  where f x y = k * x + y

daxpy' k xs ys = zipWith f xs ys
  where f x y = let a = k * x + y
                in a `seq` a

The daxpy routine is taken from the venerable Linpack suite of linear algebra routines. Jack Dongarra wrote the Fortran version of this function in 1978. Needless to say, it’s a bit longer. The routine scales one vector by a constant, and adds it to a second. In this case, we’re using lists to represent the vectors (purely for convenience). The first version of the function returns a list of thunks. A thunk is an unevaluated expression, and for simple numeric computations it’s fairly expensive and pointless: each element of the list contains an unevaluated “k * x + y” for some x and y. The second version returns a list of fully evaluated numbers.

SLIDE 16

par

  • “Sparks” its first argument
  • Sparked evaluation occurs in parallel
  • Returns its second

The par combinator does not promise to evaluate its first argument in parallel, but in practice this is what occurs. Why not bake this behaviour into its contract? Because that would remove freedom from the implementor. A compiler or runtime might notice that in fact a particular use of par would be better represented as seq.

SLIDE 17

Our favourite whipping boy

pfib n | n <= 1 = 1
pfib n = a `par` (b `pseq` (a + b + 1))
  where a = pfib (n-1)
        b = pfib (n-2)

The pseq combinator behaves almost identically to seq. The difference is that pseq promises to evaluate its first argument before its second, whereas the compiler is free to reorder plain seq. Here, that guarantee ensures b is forced before a is demanded, giving the sparked evaluation of a time to run in parallel.

SLIDE 18

Parallel strategies

  • par might be cute, but it’s fiddly
  • Manual annotations are a pain
  • Time for a Haskell hacker’s favourite hobby:
  • Abstraction!

SLIDE 19

Algorithm + evaluation

  • What’s a strategy?
  • How to evaluate an expression
  • Result is in a normal form

SLIDE 20

Head normal form

  • “What is my value?”
  • Completely evaluates an expression
  • Similar to traditional languages

SLIDE 21

Weak head normal form

  • “What is my constructor?”

data Maybe a = Nothing | Just a

  • Does not give us a complete value
  • Only what constructor it was built with

The elements that I’ve marked in green are the constructors (properly, the “value constructors”) for the Maybe type. When we evaluate a Maybe expression to WHNF, we can tell that it was constructed using Nothing or Just. If it was constructed with Just, the value inside is not necessarily in a normal form: WHNF only reduces (“evaluates”) until the outermost constructor is known.
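
A minimal sketch of the distinction, using seq (which evaluates only to WHNF):

  -- seq stops at the outermost constructor, so the undefined
  -- inside the Just is never touched and this program succeeds.
  main :: IO ()
  main = do
    let x = Just (undefined :: Int)
    x `seq` putStrLn "x is in WHNF: its constructor is Just"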

SLIDE 22

Combining strategies

  • A strategy is a normal Haskell function
  • Want to apply some strategy in parallel across an entire list?

parList strat [] = ()
parList strat (x:xs) = strat x `par` parList strat xs

We process the spine of the list in parallel, and use the strat parameter to determine how we’ll evaluate each element in the list.

SLIDE 23

Strategies at work

  • Map a function over a list in parallel
  • Pluggable evaluation strategy per element

using x strat = strat x `seq` x

parMap strat f xs = map f xs `using` parList strat

Notice the separation in the body of parMap: we have normal Haskell code on the left of the using combinator, and the evaluation strategy for it on the right. The code on the left knows nothing about parallelism, par, or seq. Meanwhile, the evaluation strategy is pluggable: we can provide whatever one suits our current needs, even at runtime.
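
A sketch of how this might be called, assuming the 2008-era Control.Parallel.Strategies API (where rnf is a strategy that fully evaluates its argument):

  import Control.Parallel.Strategies (parMap, rnf)

  -- Evaluate each (expensive) element in parallel, fully, then sum.
  main :: IO ()
  main = print (sum (parMap rnf expensive [1 .. 100]))
    where expensive n = sum [1 .. n * 10000 :: Int]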

SLIDE 24

True or false?

  • Inherent parallelism will save us!
  • Functional programs have oodles!
  • All we need to do is exploit it!

SLIDE 25

Limit studies

  • Gives a maximum theoretical benefit
  • Model a resource, predict effect of changing it
  • Years of use in CPU & compiler design
  • Early days for functional languages

SLIDE 26

So ... true or false?

  • Is there lots of “free” parallelism?
  • Very doubtful
  • Why? A familiar plague
  • Data dependencies
  • Code not written to be parallel isn’t

Two useful early-but-also-recent papers: “Feedback directed implicit parallelism”, by Harris and Singh, and “Limits to implicit parallelism in functional application”, by DeTreville.

SLIDE 27

Current research

  • Feedback-directed implicit parallelism
  • Automated par annotations
  • Tuned via profiled execution
  • Results to date are fair
  • Up to 2x speedups in some cases

This is the work described in the Harris and Singh paper.

SLIDE 28

Parallelism is hard

  • Embarrassingly parallel: not so bad
      • Hadoop, image convolution
  • Regular, but squirrelly: pretty tough
      • Marching cubes isosurface interpolation, FFT
  • Irregular or nested: really nasty
      • FEM crack propagation, coupled climate models

SLIDE 29

Current state of the art

  • Most parallelism added by hand
  • Manual coordination & data layout
  • MPI is akin to assembly language
  • Difficult to use, even harder to tune
  • Irregular data is especially problematic

SLIDE 30

Nested data parallelism

  • Parallel functions invoke other parallel code
  • One SIMD “thread of control”
  • Friendly programming model

This project is known as “Data Parallel Haskell”, but is sometimes acronymised as “NDP” (Nested Data Parallelism) or “NPH” (Nested Parallel Haskell). Confusing, eh?

SLIDE 31

NPH automation

  • Compiler transforms code and data
  • Irregular, nested data becomes flat, regular
  • Complexity hidden from the programmer

SLIDE 32

Current status

  • Work in progress
  • Exciting work, lots of potential
  • Attack both performance and usability
  • Haskell’s purity is a critical factor

SLIDE 33

Fixing threaded programming

SLIDE 34

Concurrency is hard

  • Race conditions
  • Data corruption
  • Deadlock

SLIDE 35

Transactional memory

  • Fairly new as a practical programming tool
  • Implemented for several languages
  • Typically comes with weird quirks
  • Haskell’s implementation is beautiful

SLIDE 36

Atomic execution

  • Either an entire block succeeds, or it all fails
  • Failed transactions retry automatically
  • Type system forbids non-atomic actions
  • No file or network access
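
A minimal sketch of an atomic block, using GHC’s Control.Concurrent.STM (the transfer function and account variables are hypothetical):

  import Control.Concurrent.STM
  import Control.Monad (when)

  -- Move money between two accounts. Either both writes commit, or
  -- neither does; if funds are short, the transaction blocks and retries.
  transfer :: TVar Int -> TVar Int -> Int -> IO ()
  transfer from to amount = atomically $ do
    balance <- readTVar from
    when (balance < amount) retry   -- re-run when 'from' changes
    writeTVar from (balance - amount)
    toBal <- readTVar to
    writeTVar to (toBal + amount)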

SLIDE 37

How does retry occur?

  • When to wake a thread and retry a transaction?
  • No programmer input needed
  • Runtime tracks variables read by a failed transaction, retries automatically

SLIDE 38

Composability

  • All transactions are flat
  • Calling transactional code from the current transaction is normal
  • This simply extends the current transaction

SLIDE 39

Early abort

  • The retry action manually aborts a transaction early
  • It will still automatically retry
  • Handy if we know the transaction must fail
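
A minimal sketch: popping from a shared stack, where retry blocks until an element is available (pop is a hypothetical helper):

  import Control.Concurrent.STM

  pop :: TVar [a] -> STM a
  pop tv = do
    xs <- readTVar tv
    case xs of
      []       -> retry              -- abort; re-run when tv changes
      (x:rest) -> do writeTVar tv rest
                     return x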

SLIDE 40

Choosing an alternative

  • The orElse action combines two transactions
  • If the first succeeds, both succeed
  • Otherwise, it tries the second
  • If the second succeeds, both succeed
  • If both fail, the first will be retried
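
A minimal sketch: take a value from whichever of two TMVars fills first (takeEither is a hypothetical helper):

  import Control.Concurrent.STM

  takeEither :: TMVar a -> TMVar a -> STM a
  takeEither a b = takeTMVar a `orElse` takeTMVar b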

SLIDE 41

STM and IPC

  • TVar – simple shared variable
  • TMVar – atomic variable (like an MVar)
  • TChan – FIFO channel
  • If the enclosing transaction retries, then so does any modification
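
A minimal sketch of composing these types in one transaction (sendIfReady is a hypothetical helper); the TChan write happens only if the whole transaction commits:

  import Control.Concurrent.STM

  sendIfReady :: TVar Bool -> TChan String -> STM ()
  sendIfReady ready chan = do
    ok <- readTVar ready
    if ok then writeTChan chan "go"
          else retry               -- nothing is sent if we retry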

SLIDE 42

A useful analogy

  • Concurrency
      • Mutexes, semaphores, condition variables
      • Software transactional memory
  • Memory management
      • malloc, free, manual refcounting
      • Garbage collection

The analogy between garbage collection and STM is, as far as I know, due to Dan Grossman. He was at least the first to publish it in academic circles.

SLIDE 43

Manual / auto tradeoffs

  • Memory management
      • Performance, footprint
      • Safety against memory leaks, corruption
  • Concurrency
      • Fine tuning for high contention
      • Safety against deadlocks, corruption

SLIDE 44

Brief recap

  • Concurrency
      • Fast, cheap threads
      • Blocking I/O and STM are friendly to your brain
  • Multicore parallelism
      • Explicit control or a strategic approach
      • NPH offers an exciting future
