Controlling Loops in Parallel Mercury Code

Paul Bone, Zoltan Somogyi and Peter Schachte
National ICT Australia, The University of Melbourne
Declarative Aspects of Multicore Programming, January 28, 2012

Introduction: About Mercury

Mercury is a pure logic/functional language designed to support the creation of large, reliable, efficient programs.

Its syntax is similar to Prolog's, but its operational semantics are very different.
It is strongly typed, using a Hindley-Milner type system.
It also has mode and determinism systems.

    :- pred map(pred(T, U), list(T), list(U)).
    :- mode map(pred(in, out) is det, in, out) is det.

    map(_, [], []).
    map(P, [X | Xs], [Y | Ys]) :-
        P(X, Y),
        map(P, Xs, Ys).

Introduction: Parallelism in Mercury

Parallelism can be introduced in Mercury simply by replacing a comma with &, the parallel conjunction operator:

    map(P, [X | Xs], [Y | Ys]) :-
        P(X, Y) &
        map(P, Xs, Ys).

Parallel computations are handled by:

Engines: These correspond to Pthreads. One engine is created for each core on a multicore system. Each engine has a set of abstract machine registers.

Contexts: These represent computations in progress and are executed by engines. Although lighter than Pthreads, contexts are still somewhat heavy: each one contains two stacks.

Loop control: Dependent right-recursive parallel code

Programmers are encouraged to write tail recursive code. In Mercury, this means that the last call in a clause is often a recursive call.

Mercury allows dependent AND-parallelism. Variables such as Acc1 are shared between the parallel conjuncts, and their synchronization is handled automatically.

    map_foldl(M, F, [X | Xs], Acc0, Acc) :-
        (
            M(X, Y),
            F(Y, Acc0, Acc1)
        ) &
        map_foldl(M, F, Xs, Acc1, Acc).

Loop control: The general parallel conjunction transformation

A parallel conjunction G1 & G2 & G3 is executed by spawning off G2 & G3 and then executing G1 immediately in the current context.

The mixed-level pseudo-code below shows the operations that implement this for map_foldl.

    case_label:
        SyncTerm st;
        init_sync_term(&st);
        spawn_off(spawn_off_label, &st);
        M(X, Y);
        F(Y, Acc0, Acc1);
        join_and_continue(resume_label, &st);
    spawn_off_label:
        map_foldl(M, F, Xs, Acc1, Acc);
        join_and_terminate(&st);
    resume_label:
        return;
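The spawn-and-join pattern can be modelled roughly in Python. This is only a sketch: the names SyncTerm, signal_done, wait_all and par_conj are hypothetical, operating-system threads stand in for contexts, and the conjuncts here are independent, whereas Mercury also synchronizes shared variables. The waiting conjunct blocks its thread in this model, which is a simplification of suspending a context.

```python
import threading


class SyncTerm:
    """Toy stand-in for a sync term: counts conjuncts that have not finished."""

    def __init__(self, num_conjuncts):
        self._remaining = num_conjuncts
        self._cond = threading.Condition()

    def signal_done(self):
        # Called by each conjunct when it reaches the end of its code.
        with self._cond:
            self._remaining -= 1
            self._cond.notify_all()

    def wait_all(self):
        # Returns once every conjunct has signalled completion.
        with self._cond:
            self._cond.wait_for(lambda: self._remaining == 0)


def par_conj(first, rest):
    """Run `rest` in a spawned thread and `first` in the current one, then join."""
    st = SyncTerm(2)
    results = {}

    def spawned():
        results["rest"] = rest()
        st.signal_done()  # plays the role of the join at the end of the spawned code

    threading.Thread(target=spawned).start()  # plays the role of spawn_off
    results["first"] = first()
    st.signal_done()
    st.wait_all()  # continue past the conjunction only when both halves are done
    return results["first"], results["rest"]


print(par_conj(lambda: 1 + 1, lambda: sum(range(10))))  # prints (2, 45)
```

Note that the caller cannot return until the spawned half has finished, which is the property that keeps the original context alive in right-recursive code.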

Loop control: Execution of dependent right-recursive parallel code

The original context has to stay around until the recursive call finishes, so that it can resume.

Parallelizing such a loop in this way causes it to use a number of contexts linear in the depth of the recursion.

If each context contains 4 megabytes of stack space, a loop only has to iterate 256 times to consume a gigabyte of memory!

[Figure: number of contexts plotted against time]
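The arithmetic behind the gigabyte figure can be checked with a few lines of Python. The function name stack_memory_mib is hypothetical, and the 4 MiB stack size is the illustrative figure from the slide rather than a measured value.

```python
def stack_memory_mib(num_contexts, stack_mib_per_context=4):
    """Total stack space, in MiB, held by `num_contexts` live contexts."""
    return num_contexts * stack_mib_per_context


# One live context per level of recursion, as in the unbounded scheme.
for depth in (1, 16, 256):
    print(depth, stack_memory_mib(depth))
# prints:
# 1 4
# 16 64
# 256 1024
```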

Loop control: Loop control structure

Our solution to this problem associates a loop control structure with each loop. This structure contains a fixed number of slots, each of which holds a pointer to a single context.

Once a context is allocated to a slot, the context is not released until the loop has finished. Instead, it is reused for later iterations.

We replace the original looping procedure with code that creates the loop control structure and then calls a renamed and transformed version of its old self.

    map_foldl(M, F, Xs, Acc0, Acc) :-
        create_loop_control(LC),
        map_foldl_lc(LC, M, F, Xs, Acc0, Acc).

Loop control: Loop control transformation

    map_foldl_lc(LC, M, F, [X | Xs], Acc0, Acc) :-
        LCS = lc_wait_for_free_slot(LC),
        lc_spawn_off(LC, LCS, spawn_off_label),
        map_foldl_lc(LC, M, F, Xs, Acc1, Acc).    % Tail call
    spawn_off_label:
        M(X, Y);
        F(Y, Acc0, Acc1);
        lc_free_slot(LC, LCS);

    map_foldl_lc(LC, _, _, [], Acc, Acc) :-
        lc_finish(LC).

At most as many iterations of the loop can be active at once as there are slots in the loop control structure.
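A rough Python model of this bounded scheme is sketched below. It is not the Mercury runtime: the LoopControl and Future classes and the num_slots parameter are hypothetical, a thread pool stands in for the reusable contexts, and a Python loop stands in for the tail call. The sketch also assumes that m and f do not raise exceptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Future:
    """Minimal single-assignment variable for passing the accumulator along."""

    def __init__(self):
        self._ready = threading.Event()
        self._value = None

    def signal(self, value):
        self._value = value
        self._ready.set()

    def wait(self):
        self._ready.wait()
        return self._value


class LoopControl:
    """Toy loop control: a fixed number of slots backed by reusable workers."""

    def __init__(self, num_slots):
        self._slots = threading.Semaphore(num_slots)
        self._pool = ThreadPoolExecutor(max_workers=num_slots)
        self._lock = threading.Lock()
        self._active = 0
        self.max_active = 0  # peak number of iterations running at once

    def wait_for_free_slot(self):
        self._slots.acquire()  # blocks the loop until some slot is free

    def spawn_off(self, work):
        def run():
            with self._lock:
                self._active += 1
                self.max_active = max(self.max_active, self._active)
            try:
                work()
            finally:
                with self._lock:
                    self._active -= 1
                self._slots.release()  # frees the slot for a later iteration

        self._pool.submit(run)

    def finish(self):
        self._pool.shutdown(wait=True)  # waits for all outstanding iterations


def map_foldl_lc(m, f, xs, acc0, num_slots=4):
    lc = LoopControl(num_slots)
    acc_in = Future()
    acc_in.signal(acc0)
    for x in xs:
        acc_out = Future()
        lc.wait_for_free_slot()

        def body(x=x, acc_in=acc_in, acc_out=acc_out):
            y = m(x)  # independent work, may overlap with other iterations
            acc_out.signal(f(y, acc_in.wait()))  # dependent work, in loop order

        lc.spawn_off(body)
        acc_in = acc_out
    lc.finish()
    return acc_in.wait(), lc.max_active


total, peak = map_foldl_lc(lambda x: x * x, lambda y, acc: acc + y, range(100), 0, num_slots=3)
print(total)  # prints 328350
```

In this model the peak number of concurrently running iterations never exceeds num_slots, however long the input list is.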

Loop control: Execution of loop controlled code

The first time each slot is used, we create a context for that slot.

After the initial ramp-up period, the loop always uses the configured number of contexts, never more.

After the loop terminates, we free the contexts.

[Figure: number of contexts plotted against time]

Conclusion: Memory usage results: contexts and megabytes

For each benchmark, the first column gives the number of contexts (ctxs) and the second the memory used in megabytes (MB).

Configuration               mandelbrot      raytracer       spectral
                         ctxs       MB  ctxs       MB  ctxs       MB
seq                         1     0.62     1     0.62     1     0.62
par, no &                   1     0.62     1     0.62     1     0.62
par, &, 1c, nolc, c128      1     0.62     1     0.62     1     1.12
par, &, 1c, nolc, c512      1     0.62     1     0.62     1     1.12
par, &, 1c, lc1             2     1.25     2     1.25     2     1.75
par, &, 1c, lc2             3     1.88     3     1.88     3     2.38
par, &, 1c, lc4             5     3.12     5     3.12     5     3.62
par, &, 2c, nolc, c128    257   160.62   257   160.62   257   161.12
par, &, 2c, nolc, c512    601   375.62  1025   640.62  1025   641.12
par, &, 2c, lc1             4     2.50     4     2.50     3     2.38
par, &, 2c, lc2             6     3.75     6     3.75     5     3.62
par, &, 2c, lc4            10     6.25    10     6.25     9     6.12
par, &, 4c, nolc, c128    513   320.62   513   320.62   513   321.12
par, &, 4c, nolc, c512    601   375.62  1201   750.62  2049  1281.12
par, &, 4c, lc1             6     3.75     6     3.75     5     3.62
par, &, 4c, lc2            10     6.25    10     6.25     9     6.12
par, &, 4c, lc4            18    11.25    18    11.25    17    11.12
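The mandelbrot figures appear to be consistent with a fixed cost of 0.625 MB per context. That constant is inferred from the numbers rather than stated in the slides, so the check below is a sketch under that assumption, with hypothetical names throughout.

```python
# (contexts, megabytes) pairs copied from the mandelbrot results.
MANDELBROT = [
    (1, 0.62), (2, 1.25), (3, 1.88), (5, 3.12),
    (257, 160.62), (601, 375.62), (4, 2.50), (6, 3.75),
    (10, 6.25), (513, 320.62), (18, 11.25),
]

# Assumption: inferred from the data above, not stated by the authors.
MB_PER_CONTEXT = 0.625


def predicted_mb(contexts):
    """Memory predicted by the fixed-cost-per-context assumption."""
    return contexts * MB_PER_CONTEXT


def max_deviation(rows):
    """Largest gap between the predicted and the reported megabytes."""
    return max(abs(predicted_mb(contexts) - mb) for contexts, mb in rows)


# A gap of at most about 0.005 MB is what two-decimal rounding would produce.
print(max_deviation(MANDELBROT) < 0.006)  # prints True
```

Under this reading, memory use tracks the context count directly, which is why bounding the number of contexts also bounds memory.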

Conclusion: Time results: seconds and speedups

Each entry gives the execution time in seconds, followed by the speedup in parentheses.

Configuration           mandelbrot       raytracer        spectral
seq                   19.37 (0.97)    19.50 (1.21)    16.07 (1.19)
par, no &             18.75 (1.00)    23.55 (1.00)    19.07 (1.00)
1c, nolc, c128        18.74 (1.00)    23.46 (1.00)    19.30 (0.99)
1c, nolc, c512        18.74 (1.00)    23.43 (1.00)    19.30 (0.99)
1c, lc2               18.74 (1.00)    23.54 (1.00)    19.30 (0.99)
1c, lc2, tr           18.74 (1.00)    23.79 (0.99)             n/a
2c, nolc, c128        17.82 (1.05)    25.68 (0.92)    19.25 (0.99)
2c, nolc, c512         9.60 (1.95)    20.34 (1.16)    18.54 (1.03)
2c, lc2                9.69 (1.94)    14.14 (1.67)     9.96 (1.91)
2c, lc2, tr            9.78 (1.92)    14.04 (1.68)             n/a
4c, nolc, c128         8.35 (2.25)    26.93 (0.87)    18.91 (1.01)
4c, nolc, c512         4.84 (3.88)    14.12 (1.67)    16.83 (1.13)
4c, lc2                4.74 (3.96)     9.35 (2.52)     4.98 (3.83)
4c, lc2, tr            4.76 (3.94)     9.41 (2.50)             n/a
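The speedups appear to be computed relative to the "par, no &" row, that is, the parallel build running code with no parallel conjunctions. The sketch below recomputes the "4c, lc2" speedups under that assumption; the names BASELINE and speedup are hypothetical, and small differences for other rows may arise because the published times are rounded.

```python
# Times in seconds from the "par, no &" row, assumed to be the baseline.
BASELINE = {"mandelbrot": 18.75, "raytracer": 23.55, "spectral": 19.07}

# Times in seconds from the "4c, lc2" row.
FOUR_CORES_LC2 = {"mandelbrot": 4.74, "raytracer": 9.35, "spectral": 4.98}


def speedup(benchmark, seconds):
    """Speedup of a run relative to the assumed baseline for that benchmark."""
    return BASELINE[benchmark] / seconds


for name, seconds in FOUR_CORES_LC2.items():
    print(f"{name}: {speedup(name, seconds):.2f}")
# prints:
# mandelbrot: 3.96
# raytracer: 2.52
# spectral: 3.83
```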

Conclusion

We have prevented excessive memory usage.
We can preserve tail recursion in parallel recursive code.
We have also reduced the overheads of parallelism, resulting in greater parallel speedups.

Further work

We plan to add support for profiling loop-controlled computations with ThreadScope.
We also intend to add knowledge of the loop-control cost model to our automatic parallelization system.
We would like to introduce new transformations that efficiently control parallelism for other common programming patterns, such as divide and conquer.

Spare slides: Communication through stack frames

The variables used to communicate to and from spawned off computations, excluding shared variables, are stored in the parent's stack frame.

The spawned off code accesses these variables through an abstract machine register called the parent stack pointer, rather than through the normal stack pointer register.

This mechanism existed before we introduced loop control. However, it prevents tail recursion, since a spawned off computation needs access to this stack frame even after the original context has executed the recursive call.

Spare slides: Getting tail recursion back

In tail recursive code, we can create a stack frame on the child context's stack and copy over any variables it needs.

    map_foldl_lc(LC, M, F, [X | Xs], Acc0, Acc) :-
        LCS = lc_wait_for_free_slot(LC),
        incr_child_stack_ptr(LC, LCS, NumSlots);
        child_stack_var(...) = M;
        child_stack_var(...) = F;
        ...
        lc_spawn_off(LC, LCS, spawn_off_label),
        map_foldl_lc(LC, M, F, Xs, Acc1, Acc).    % Tail call

In tail recursive code, we never need to manage communication from the spawned off code to the parent code. This is because there is no code after the recursive call, and therefore no variable can be consumed after the parallel conjunction.
