SLIDE 1

DATA PARALLELISM IN HASKELL

Manuel M. T. Chakravarty

University of New South Wales

INCLUDES JOINT WORK WITH
Gabriele Keller, Sean Lee, Roman Leshchinskiy, Simon Peyton Jones

Thursday, 11 June 2009

SLIDE 2

My three main points

1. Parallel programming and functional programming are intimately connected
2. Data parallelism is cheaper than control parallelism
3. Two approaches to data parallelism in Haskell

SLIDE 3

Parallel Functional

What is hard about parallel programming?
Why is it easier in a functional language?

SLIDE 4

What is Hard About Parallelism?

SLIDE 5

What is Hard About Parallelism?

Indeterminate execution order!

Other difficulties are arguably a consequence (race conditions, mutual exclusion, and so on).
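
To make the difficulty concrete, here is a minimal Haskell sketch (my example, not from the slides): two threads perform a non-atomic read-modify-write on a shared IORef, so the final value depends on how the threads interleave.

import Control.Concurrent
import Data.IORef

main :: IO ()
main = do
  ref  <- newIORef (0 :: Int)
  done <- newEmptyMVar
  let bump = do n <- readIORef ref        -- read ...
                writeIORef ref (n + 1)    -- ... then write: not atomic
                putMVar done ()
  _ <- forkIO bump
  _ <- forkIO bump
  takeMVar done
  takeMVar done
  readIORef ref >>= print                 -- may print 1 or 2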

SLIDE 6

Why Use a Functional Language?

SLIDE 7

Why Use a Functional Language?

De-emphasises attention to execution order

  • Purity and persistence
  • Focus on data dependencies

Encourages the use of collective operations

  • Wholemeal programming is better for you!

SLIDE 8

Why Use a Functional Language?

De-emphasises attention to execution order

  • Purity and persistence
  • Focus on data dependencies

Encourages the use of collective operations

  • Wholemeal programming is better for parallelism!
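
As a small sketch of the wholemeal style (my example, not from the slides): a dot product written with collective operations states only the data dependencies, leaving the traversal order, and hence the parallelisation strategy, to the implementation.

-- Wholemeal style: collective operations instead of an indexed loop;
-- no execution order is prescribed between the multiplications.
dotp :: [Float] -> [Float] -> Float
dotp xs ys = sum (zipWith (*) xs ys)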

SLIDE 9

Haskell?

SLIDE 10

Haskell?

Laziness prevented bad habits: Haskell programmers are not spoiled by the luxury of a predictable execution order, a luxury that we can no longer afford in the presence of parallelism.

Haskell programming culture and implementations avoid relying on a specific execution order.

SLIDE 11

Haskell?

Laziness prevented bad habits: Haskell programmers are not spoiled by the luxury of a predictable execution order, a luxury that we can no longer afford in the presence of parallelism.

Haskell programming culture and implementations avoid relying on a specific execution order.

Haskell is ready for parallelism!

SLIDE 12

Why should we care about data parallelism?

SLIDE 13

Data parallelism is successful in the large

  • On server farms: CGI rendering, MapReduce, ...
  • Fortran and OpenMP for high-performance computing

SLIDE 14

Data parallelism is successful in the large

  • On server farms: CGI rendering, MapReduce, ...
  • Fortran and OpenMP for high-performance computing

Data parallelism becomes increasingly important in the small!

SLIDE 15

OUR DATA PARALLEL FUTURE

Two competing extremes in current processor design

[Image: quad-core Xeon CPU beside a Tesla T10 GPU; courtesy of NVIDIA]

SLIDE 16

OUR DATA PARALLEL FUTURE

Two competing extremes in current processor design

[Image: quad-core Xeon CPU beside a Tesla T10 GPU; courtesy of NVIDIA]

Why?

SLIDE 17

Reduce power consumption!

  • GPU achieves 20x better performance/Watt (judging by peak performance)
  • Speedups between 20x and 150x have been observed in real applications

SLIDE 18

We need data parallelism

  • GPU-like architectures require data parallelism
  • 4-core CPUs versus 240-core GPUs are the current extremes
  • Intel Larrabee (in 2010): 32 cores x 16 vector units
  • Increasing core counts in CPUs and GPUs

SLIDE 19

We need data parallelism

  • GPU-like architectures require data parallelism
  • 4-core CPUs versus 240-core GPUs are the current extremes
  • Intel Larrabee (in 2010): 32 cores x 16 vector units
  • Increasing core counts in CPUs and GPUs

Data parallelism is good news for functional programming!

SLIDE 20

Data parallelism and functional programming

CUDA Kernel Invocation

seq_kernel<<<N, M>>>(arg1, ..., argn);

SLIDE 21

Data parallelism and functional programming

CUDA Kernel Invocation

seq_kernel<<<N, M>>>(arg1, ..., argn);

FORTRAN 95

FORALL (i = 1:n)
  A(i,i) = pure_function(b, i)
END FORALL

SLIDE 22

Data parallelism and functional programming

Parallel map is essential; reductions are common
Parallel code must be pure

CUDA Kernel Invocation

seq_kernel<<<N, M>>>(arg1, ..., argn);

FORTRAN 95

FORALL (i = 1:n)
  A(i,i) = pure_function(b, i)
END FORALL
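
Both kernels are instances of the same functional idiom: map a pure function over an index space. As a sketch of the correspondence (my rendering, not from the slides), the FORALL loop reads in Haskell as:

-- The FORALL idiom as a pure map over the index space [1..n];
-- the purity of f is what makes evaluating the applications
-- in parallel legal.
forallDiag :: Int -> (Int -> Float) -> [Float]
forallDiag n f = map f [1 .. n]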

SLIDE 23

TWO APPROACHES TO DATA PARALLEL PROGRAMMING IN HASKELL

SLIDE 24

Two forms of data parallelism

flat, regular versus nested, irregular

SLIDE 25

Two forms of data parallelism

flat, regular versus nested, irregular

SLIDE 26

Two forms of data parallelism

flat, regular: limited expressiveness
nested, irregular: covers sparse structures and even divide & conquer

SLIDE 27

Two forms of data parallelism

flat, regular: limited expressiveness; close to the hardware model
nested, irregular: covers sparse structures and even divide & conquer; needs to be turned into flat parallelism for execution

SLIDE 28

Two forms of data parallelism

flat, regular: limited expressiveness; close to the hardware model; well-understood compilation techniques
nested, irregular: covers sparse structures and even divide & conquer; needs to be turned into flat parallelism for execution; highly experimental program transformations

SLIDE 29

Flat data parallelism in Haskell

  • Embedded language of array computations (two-level language)
  • Datatype of multi-dimensional arrays [Gabi's talk]
  • Array elements limited to tuples of scalars (Int, Float, Bool, etc.)
  • Collective array operations: map, fold, scan, zip, permute, etc.

SLIDE 30

Scalar Alpha X Plus Y (SAXPY)

type Vector = Array DIM1 Float

saxpy :: GPU.Exp Float -> Vector -> Vector -> Vector
saxpy alpha xs ys = GPU.run $ do
  xs' <- use xs
  ys' <- use ys
  GPU.zipWith (\x y -> alpha*x + y) xs' ys'

SLIDE 31

Scalar Alpha X Plus Y (SAXPY)

type Vector = Array DIM1 Float

saxpy :: GPU.Exp Float -> Vector -> Vector -> Vector
saxpy alpha xs ys = GPU.run $ do
  xs' <- use xs
  ys' <- use ys
  GPU.zipWith (\x y -> alpha*x + y) xs' ys'

  • GPU.Exp e: expression evaluated on the GPU
  • Monadic code to make sharing explicit
  • GPU.run: compile & execute embedded code
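
A call would look roughly like the sketch below. Note that fromList is a hypothetical constructor for Vector (the slides do not show how host arrays are built), and the literal 2.5 assumes a Num instance for GPU.Exp Float, which the use of alpha*x above suggests.

-- Hypothetical driver code: fromList is assumed, not shown in the talk.
result :: Vector
result = saxpy 2.5 (fromList [1 .. 1024]) (fromList [2, 4 .. 2048])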

SLIDE 32

Limitations of the embedded language

  • First-order, except for a fixed set of higher-order collective operations
  • No recursion
  • No nesting: code is not compositional
  • No arrays of structured data

SLIDE 33

Prototype implementation targeting GPUs

Runtime code generation (computation only)

[Plot: SAXPY running time in milliseconds (log scale) against number of elements in millions, for: Plain Haskell, CPU only (AMD Sempron); Plain Haskell, CPU only (Intel Xeon); Haskell with GPU.gen (GeForce 8800GTS); Haskell with GPU.gen (Tesla S1070 x1)]

SLIDE 34

Prototype implementation targeting GPUs

Runtime code generation (computation only)

[Plot: sparse matrix vector multiplication running time in milliseconds (log scale) against number of non-zero elements in millions, for: Plain Haskell, CPU only (AMD Sempron); Plain Haskell, CPU only (Intel Xeon); Haskell with GPU.gen (GeForce 8800GTS); Haskell with GPU.gen (Tesla S1070 x1)]

SLIDE 35

Prototype implementation targeting GPUs

Runtime code generation (computation only)

[Plot: Black-Scholes call option pricing time in milliseconds (log scale) against number of options in millions, for: Plain Haskell, CPU only (AMD Sempron); Plain Haskell, CPU only (Intel Xeon); Haskell with GPU.gen (GeForce 8800GTS); Haskell with GPU.gen (Tesla S1070 x1); C for CUDA (Tesla S1070 x1)]

SLIDE 36

Nested data parallelism in Haskell

  • Language extension (fully integrated)
  • Data type of nested parallel arrays [:e:], where e can be any type
  • Parallel evaluation semantics
  • Array comprehensions & collective operations (mapP, scanP, etc.)
  • Forthcoming: multi-dimensional arrays [Gabi's talk]
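
The standard illustration of why nesting matters, from the data-parallel Haskell literature (not on this slide), is sparse-matrix vector multiplication, where each row of the matrix has a different number of non-zero elements:

-- A sparse matrix as a parallel array of rows, each row a parallel
-- array of (column index, value) pairs of varying length.
type SparseMatrix = [:[: (Int, Float) :]:]

smvm :: SparseMatrix -> [:Float:] -> [:Float:]
smvm m v = [: sumP [: x * (v !: i) | (i, x) <- row :] | row <- m :]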

SLIDE 37

Parallel Quicksort

qsort :: Ord a => [:a:] -> [:a:]
qsort [::] = [::]
qsort xs   =
  let p       = xs !: 0
      smaller = [: x | x <- xs, x < p :]
      equal   = [: x | x <- xs, x == p :]
      bigger  = [: x | x <- xs, x > p :]
      qs      = [: qsort xs' | xs' <- [: smaller, bigger :] :]
  in qs !: 0 +:+ equal +:+ qs !: 1

SLIDE 38

Parallel Quicksort

qsort :: Ord a => [:a:] -> [:a:]
qsort [::] = [::]
qsort xs   =
  let p       = xs !: 0
      smaller = [: x | x <- xs, x < p :]
      equal   = [: x | x <- xs, x == p :]
      bigger  = [: x | x <- xs, x > p :]
      qs      = [: qsort xs' | xs' <- [: smaller, bigger :] :]
  in qs !: 0 +:+ equal +:+ qs !: 1

  • [: e | x <- xs :]: array comprehension
  • (!:), (+:+): array indexing and append
  • Collective array operations are parallel

SLIDE 39

[Diagram: qsort call tree, one call]

SLIDE 40

[Diagram: qsort call tree after one recursion step: three calls]

SLIDE 41

[Diagram: qsort call tree after two recursion steps: seven calls]

SLIDE 42

[Diagram: qsort call tree, further expanded; the recursive calls fan out unevenly]

SLIDE 43

[Diagram: qsort call tree, further expanded; subarrays reach different recursion depths]

SLIDE 44

[Diagram: fully expanded qsort call tree]

Exploiting both inter- and intra-function parallelism!

SLIDE 45

Properties of the language extension

  • First class
  • Arrays of structured data (e.g., arrays of trees):
      data RTree a = RTree a [:RTree a:]
  • Higher-order (e.g., parallel arrays of functions)
  • Arbitrarily nested parallelism: compositional
  • Much harder to implement!
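
As a sketch of computing over such structured data (sumTree is my illustration, not from the talk): summing the labels of a rose tree of the RTree type above, where the recursion over the subtrees is itself expressed as nested parallelism.

-- Illustrative only: the comprehension over the subtrees runs in
-- parallel, and each recursive call unfolds further parallelism.
sumTree :: RTree Int -> Int
sumTree (RTree x ts) = x + sumP [: sumTree t | t <- ts :]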

SLIDE 46

Implementation

Extension of the Glasgow Haskell Compiler (GHC)

SLIDE 47

Implementation

Extension of the Glasgow Haskell Compiler (GHC)

Stage 1: The Vectoriser
  • Transforms all nested parallelism into flat parallelism

f :: a -> b

SLIDE 48

Implementation

Extension of the Glasgow Haskell Compiler (GHC)

Stage 1: The Vectoriser
  • Transforms all nested parallelism into flat parallelism

f  :: a -> b
f^ :: [:a:] -> [:b:]

SLIDE 49

Implementation

Extension of the Glasgow Haskell Compiler (GHC)

Stage 1: The Vectoriser
  • Transforms all nested parallelism into flat parallelism

f  :: a -> b
f^ :: [:a:] -> [:b:]

Stage 2: Library package DPH
  • High-performance flat array library
  • Communication and array fusion
  • Radical re-ordering of computations
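
As a hand-written illustration of the f to f^ schema above (square is my example; the vectoriser derives the lifted version automatically):

-- Source function:
square :: Float -> Float
square x = x * x

-- Its lifted counterpart: semantically mapP square, but generated as
-- a flat traversal that the DPH library can fuse with its neighbours.
squareL :: [:Float:] -> [:Float:]
squareL xs = [: x * x | x <- xs :]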

SLIDE 50

Current Implementation targeting multicore CPUs

GHC performs vectorisation transformation on Core IL

SLIDE 51

Current Implementation targeting multicore CPUs

GHC performs vectorisation transformation on Core IL

  • 2x quad-core Xeon = 8 cores (8 thread contexts)
  • 1x UltraSPARC T2 = 8 cores (64 thread contexts)

SLIDE 52

Summary

Data parallelism is getting increasingly important

Two approaches to data parallelism in Haskell:
1. Embedded array language for flat parallelism
2. Language extension of parallel arrays supporting nested parallelism

Nested parallelism is much harder to implement, but also much more expressive

Multiple backends (multicore CPUs, GPUs, ...)
