Dynamic generation of parallel computations
James Hanlon, Simon J. Hollis
Many-core project, June 13, 2011

◮ Introduction
  ◮ Background
  ◮ State of the art parallelism
  ◮ General-purpose parallel computers
◮ Language features supporting concurrency
  ◮ Parallelism and channel communication
  ◮ Process migration
  ◮ Parallel recursion
◮ Concurrent programming
  ◮ Process structures
  ◮ Rapid process spawning
◮ Hardware support
◮ A real implementation
◮ Conclusions

Background
◮ Concurrency is not a new area: it was originally developed as a key abstraction in the design of real-time systems.
◮ Conventional thinking in academia and industry has largely ignored the vast amount of work in this area.
  ◮ Caused largely by a preoccupation with frequency scaling (roughly 1970 to 2005).
◮ Parallelism will be the primary means of increasing computational performance.
◮ But we don't know how to architect or program parallel computers effectively.

State of the art parallelism
◮ Parallelism is now pervasive in systems design:
  ◮ HPC systems are becoming increasingly important in science and industry.
  ◮ Dual/quad core processors are standard in desktop and laptop computers.
  ◮ Embedded systems use network-on-chip designs.
◮ But: parallelism is still deployed in specific areas, addressing specific requirements.
  ◮ Evident in the wide variety of designs, e.g. CMPs, GPUs, HPC systems.
◮ Emerging gap between architectures and languages, and application users.
◮ Very difficult for users to harness all available parallelism.

General-purpose parallel computers
◮ Sequential case: the von Neumann architecture provides an efficient abstraction from the implementation of different computer systems.
  ◮ Hides irrelevant details from the programmer.
  ◮ Makes standardised languages and transportable software possible.
◮ Universality concept, introduced by Turing in 1937.
  ◮ A computer is both a special-purpose device for executing a program and a device capable of simulating all programs.
  ◮ Special-purpose machines have no significant advantage (Valiant 1990).
◮ A universal parallel computer would allow parallelism to be exploited effectively with high-level, transportable languages.

Language features supporting concurrency
◮ Programming languages must support high-level concurrent programming.
◮ The contribution of this work is to demonstrate the existence of simple language features supporting this.
◮ Process-to-processor allocation is the key issue.

Parallelism and channel communication

    proc init() is
      var c: chan ;
      { p1(c) | p2(c) }

    proc p1(c: chanend) is
      var x: integer ;
      { x:=0 ; c!x ; c?x }

    proc p2(c: chanend) is
      var y: integer ;
      { c?y ; c!y+1 }

[Figure: init creates p1 and p2, each holding one chanend of channel c. The value 0 travels from p1 to p2, then the value 1 travels back from p2 to p1.]
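The exchange between p1 and p2 can be approximated in Python as a rough sketch. The `ChanEnd` class and `make_chan` helper are hypothetical names introduced here, and a pair of one-slot queues only approximates the channel: whether the language's channels are synchronous or buffered is not stated on the slide.

```python
import queue
import threading


class ChanEnd:
    """One end of a two-way channel: sends on one queue, receives on the other."""

    def __init__(self, outbox, inbox):
        self._outbox = outbox
        self._inbox = inbox

    def send(self, value):
        # Models `c ! value`.
        self._outbox.put(value)

    def recv(self):
        # Models `c ? variable`.
        return self._inbox.get()


def make_chan():
    """Return the two connected ends of a channel (assumed one-slot buffers)."""
    a_to_b = queue.Queue(maxsize=1)
    b_to_a = queue.Queue(maxsize=1)
    return ChanEnd(a_to_b, b_to_a), ChanEnd(b_to_a, a_to_b)


def p1(c, result):
    x = 0
    c.send(x)         # c ! x
    x = c.recv()      # c ? x
    result["x"] = x


def p2(c):
    y = c.recv()      # c ? y
    c.send(y + 1)     # c ! y+1


def init():
    end1, end2 = make_chan()
    result = {}
    # { p1(c) | p2(c) }: start both processes, then wait for both to finish.
    procs = [
        threading.Thread(target=p1, args=(end1, result)),
        threading.Thread(target=p2, args=(end2,)),
    ]
    for t in procs:
        t.start()
    for t in procs:
        t.join()
    return result["x"]
```

Calling `init()` runs both processes to completion and returns the final value of `x` held by p1.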

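Process migration
◮ Offload a process:

    on p do process()

  [Figure: processor s sends "process" to processor p.]
◮ Offload a process with a channel:

    var c: chan
    { on p do process(c) ; c ! value }

  [Figure: processor s sends "process, c" to processor p; channel c connects s and p.]
◮ Offload processes sharing a channel:

    var c: chan
    { on p do process1(c) ; on q do process2(c) }

  [Figure: processor s sends "process1, c" to p and "process2, c" to q; channel c connects p and q.]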
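A loose Python analogy for the third case, offered only as a sketch: each "processor" is modelled as a named thread pool, and a hypothetical `on` helper submits a process to it. The slides do not say whether `on` blocks the caller, so returning immediately is an assumption here, as are the bodies of `process1` and `process2`.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical model: each "processor" is a small thread pool named after it.
processors = {
    name: ThreadPoolExecutor(max_workers=2, thread_name_prefix=name)
    for name in ("p", "q")
}


def on(proc_name, process, *args):
    """Model `on p do process(args)` by submitting the process to processor p.

    Returns a Future immediately (non-blocking behaviour is an assumption).
    """
    return processors[proc_name].submit(process, *args)


def where_am_i():
    # Pool threads are named "<prefix>_<n>", so the prefix names the processor.
    return threading.current_thread().name.split("_")[0]


def process1(c):
    c.put(("hello from", where_am_i()))   # c ! value
    return where_am_i()


def process2(c):
    received = c.get()                    # c ? value
    return received, where_am_i()


def demo():
    c = queue.Queue()                     # stands in for `var c: chan`
    f1 = on("p", process1, c)
    f2 = on("q", process2, c)
    return f1.result(), f2.result()
```

Here the channel object is simply passed by reference; on real hardware the channel ends would have to be re-routed when a process moves, which this sketch does not model.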

Parallel recursion
◮ Parallel recursion is a natural tool for expressing concurrent program structures.
  ◮ Recursion: solve a problem by solving smaller instances of the same problem.
  ◮ Parallelism: break a large computation down into smaller parts.

Creating a tree

    proc tree(depth: int ; top: chanend) is
      var left, right: chan
      if depth = 0
      then leaf(top)
      else { node(top, left, right)
           | tree(depth-1, left)
           | tree(depth-1, right) }

[Figure: tree(2, top) produces a root node connected to top, with left and right channels to two child nodes, each of which has two leaf children (four leaves in total).]
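The same recursive structure can be sketched in Python with threads and queues. The slides do not define what `leaf` and `node` do, so the behaviours below (each leaf reports 1, each node forwards the sum of its children) are assumptions chosen to make the shape of the tree observable; `par` and `count_leaves` are hypothetical helpers.

```python
import queue
import threading


def leaf(top):
    # Assumed behaviour: each leaf reports the value 1 to its parent.
    top.put(1)


def node(top, left, right):
    # Assumed behaviour: a node forwards the sum of its two children.
    top.put(left.get() + right.get())


def par(*procs):
    """Model { a | b | c }: run the processes in parallel and wait for all."""
    threads = [threading.Thread(target=fn, args=args) for fn, args in procs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def tree(depth, top):
    if depth == 0:
        leaf(top)
    else:
        left, right = queue.Queue(), queue.Queue()   # var left, right: chan
        par(
            (node, (top, left, right)),
            (tree, (depth - 1, left)),
            (tree, (depth - 1, right)),
        )


def count_leaves(depth):
    top = queue.Queue()
    tree(depth, top)
    return top.get()
```

With these assumed behaviours, `count_leaves(depth)` returns the number of leaves, which doubles with each extra level of recursion.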

Process structures
◮ A process structure is the communication topology of a set of concurrent processes.
◮ Simple structures such as the tree underpin many important parallel algorithms.
  ◮ e.g. sorting and FFT.
◮ Other common process structures include arrays, meshes and hypercubes.
◮ Parallel recursion and process migration allow the style of programming to shift from data structures to process structures.

Example: rapid process spawning
◮ Combine parallel recursion and process migration to optimise the distribution of processes over a system.

    proc d(t, n: int) is
      if n = 1 then node(t)
      else { d(t, n/2) | on t + n/2 do d(t + n/2, n/2) }

◮ Given a set of networked processors p0, p1, p2, p3, d(0, 4) executes in time and space:

    Step  p0       p1       p2       p3
    0     d(0,4)
    1     d(0,2)            d(2,2)
    2     d(0,1)   d(1,1)   d(2,1)   d(3,1)
    3     node(0)  node(1)  node(2)  node(3)
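The time-and-space table can be reproduced with a small step-by-step simulation. This is a sketch only: `spawn_trace` is a hypothetical helper, it assumes every split takes exactly one step, and it requires the processor count to be a power of two. It stops once every processor holds d(t, 1), i.e. just before the final node(t) row.

```python
def spawn_trace(num_procs):
    """Simulate d(0, num_procs) one step at a time.

    Each trace entry maps a processor index t to the n of the d(t, n)
    call it is running at that step. num_procs must be a power of two.
    """
    active = {0: num_procs}
    trace = [dict(active)]
    while any(n > 1 for n in active.values()):
        nxt = {}
        for t, n in active.items():
            if n == 1:
                nxt[t] = 1
            else:
                half = n // 2
                nxt[t] = half            # d(t, n/2) stays on processor t
                nxt[t + half] = half     # on t + n/2 do d(t + n/2, n/2)
        active = nxt
        trace.append(dict(sorted(active.items())))
    return trace
```

Because the number of occupied processors doubles at every step, n processors are reached after log2(n) splitting steps; for example, `len(spawn_trace(1024))` is 11 (the initial state plus 10 splits).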

Hardware support for concurrency
◮ For these mechanisms to be implemented efficiently, it is essential that the hardware supports them directly.
  ◮ This is difficult in systems like MPI, where communication is predominantly software-based.
◮ Process and communication primitives must be provided at the hardware level (in the instruction set).
◮ These primitives must complete in the same order of magnitude of time as equivalent sequential operations such as subroutine calls and memory accesses.

A real implementation
◮ XMOS XCore processor architecture: general-purpose, scalable, and provides low-level support for concurrency.
◮ Completed work:
  ◮ Written a bespoke compiler implementing a small language as a platform for new features.
  ◮ A simple implementation of the on statement.
◮ Initial exploration of the approach has been promising. Results will follow in due course.

Conclusions
◮ The combination of parallel recursion and process migration allows the elegant expression of powerful concurrent programs.
◮ Rapid process distribution is an important mechanism in large-scale systems and has a simple high-level expression in this framework.
◮ The existence of the sympathetic XCore architecture shows that implementing efficient mechanisms to support concurrent programming is feasible.
◮ We expect the results to be very competitive when compared with leading parallel architectures.

Any questions?
Email: hanlon@cs.bris.ac.uk
