Embarrassingly Parallel Computations

A computation that can be divided into a number of completely independent parts, each of which can be executed by a separate processor.

Figure 3.1 Disconnected computational graph (embarrassingly parallel problem). Diagram labels: input data, processes, results.

Parallel Programming: Techniques and Applications using Networked Workstations and Parallel Computers, Barry Wilkinson and Michael Allen, © Prentice Hall, 1999

Figure 3.2 Practical embarrassingly parallel computational graph with dynamic process creation and the master-slave approach. Diagram labels: the master calls spawn() to start the slaves and send() to send initial data; each slave calls recv() and then send(); the master calls recv() to collect results.

Embarrassingly Parallel Examples

Geometrical Transformations of Images

A two-dimensional image is stored as a pixmap, in which each pixel (picture element) is represented by a binary number in a two-dimensional array. Grayscale images typically require 8 bits per pixel to represent 256 different monochrome intensities. Color requires more specification.

Examples of low-level embarrassingly parallel image operations:

(a) Shifting
The coordinates of a two-dimensional object shifted by $\Delta x$ in the x-dimension and $\Delta y$ in the y-dimension are given by

    x' = x + \Delta x
    y' = y + \Delta y

where $x$ and $y$ are the original coordinates and $x'$ and $y'$ are the new coordinates.

(b) Scaling
The coordinates of an object scaled by a factor $S_x$ in the x-direction and $S_y$ in the y-direction are given by

    x' = x S_x
    y' = y S_y

The object is enlarged when $S_x$ and $S_y$ are greater than 1 and reduced when $S_x$ and $S_y$ are between 0 and 1. The magnification or reduction does not need to be the same in the x- and y-directions.

(c) Rotation
The coordinates of an object rotated through an angle $\theta$ about the origin of the coordinate system are given by

    x' = x \cos\theta + y \sin\theta
    y' = -x \sin\theta + y \cos\theta

The main parallel programming concern is the division of the bitmap/pixmap into groups of pixels for each processor, because there are usually many more pixels than processes/processors. There are two general methods of grouping: by square/rectangular regions and by columns/rows. With a 640 × 480 image and 48 processes:

(a) Square region for each process: each process maps an 80 × 80 pixel square.
(b) Row region for each process: each process maps a strip of 10 rows, each 640 pixels wide.

Figure 3.3 Partitioning into regions for individual processes.
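The two groupings amount to a mapping from a process number to a pixel region. The sketch below works that mapping out for the 640 × 480 image and 48 processes of the example. The `region` type and function names are hypothetical, and numbering the squares left to right and then top to bottom is an assumption; the slides do not specify an order.

```c
#include <assert.h>

enum { WIDTH = 640, HEIGHT = 480, NPROC = 48 };

/* Hypothetical description of the block of pixels owned by one process. */
typedef struct { int x0; int y0; int width; int height; } region;

/* (a) Square regions: 80 x 80 pixels, 8 squares across and 6 down. */
region square_region(int rank)
{
    assert(rank >= 0 && rank < NPROC);
    const int side = 80;
    const int per_row = WIDTH / side;            /* 8 squares per row */
    region r = { (rank % per_row) * side,        /* left edge */
                 (rank / per_row) * side,        /* top edge */
                 side, side };
    return r;
}

/* (b) Row regions: strips of 10 full-width rows. */
region row_region(int rank)
{
    assert(rank >= 0 && rank < NPROC);
    const int rows = HEIGHT / NPROC;             /* 10 rows per process */
    region r = { 0, rank * rows, WIDTH, rows };
    return r;
}
```

Both schemes give every process the same number of pixels (80 × 80 = 640 × 10 = 6400), so for an operation with equal cost per pixel, such as shifting, the load is balanced either way.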

Pseudocode to Perform Image Shift

Master

    for (i = 0, row = 0; i < 48; i++, row = row + 10)  /* for each process */
        send(row, P_i);                                /* send row no. */
    for (i = 0; i < 480; i++)                          /* initialize temp */
        for (j = 0; j < 640; j++)
            temp_map[i][j] = 0;
    for (i = 0; i < (640 * 480); i++) {                /* for each pixel */
        recv(oldrow, oldcol, newrow, newcol, P_ANY);   /* accept new coords */
        if (!((newrow < 0) || (newrow >= 480) || (newcol < 0) || (newcol >= 640)))
            temp_map[newrow][newcol] = map[oldrow][oldcol];
    }
    for (i = 0; i < 480; i++)                          /* update bitmap */
        for (j = 0; j < 640; j++)
            map[i][j] = temp_map[i][j];

Slave

    recv(row, P_master);                               /* receive row no. */
    for (oldrow = row; oldrow < (row + 10); oldrow++)
        for (oldcol = 0; oldcol < 640; oldcol++) {     /* transform coords */
            newrow = oldrow + delta_y;                 /* rows run along y: shift in y direction */
            newcol = oldcol + delta_x;                 /* columns run along x: shift in x direction */
            send(oldrow, oldcol, newrow, newcol, P_master);  /* coords to master */
        }

Analysis

Suppose each pixel requires one computational step and there are $n \times n$ pixels.

Sequential

    t_s = n^2

giving a sequential time complexity of $O(n^2)$.

Parallel

Communication
The time to send one message containing $m$ data items is

    t_{comm} = t_{startup} + m t_{data}

For the image shift, the total communication time is

    t_{comm} = p(t_{startup} + 2 t_{data}) + 4n^2(t_{startup} + t_{data}) = O(p + n^2)

Computation

    t_{comp} = 2\left(\frac{n^2}{p}\right) = O(n^2/p)

Overall Execution Time

    t_p = t_{comp} + t_{comm}

For constant $p$, this is $O(n^2)$. However, the constant hidden in the communication term far exceeds the constants in the computation term in most practical situations.
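To see how strongly the communication term dominates, the formulas above can be evaluated directly. The sketch below is a plain transcription of those formulas into C; the function names and the `comm_params` type are hypothetical, and times are measured in units of one computational step.

```c
#include <assert.h>

/* Hypothetical container for the two message-cost parameters. */
typedef struct { double t_startup; double t_data; } comm_params;

/* Sequential time: t_s = n^2 */
double shift_t_seq(double n)
{
    return n * n;
}

/* t_comm = p (t_startup + 2 t_data) + 4 n^2 (t_startup + t_data) */
double shift_t_comm(double n, double p, comm_params c)
{
    return p * (c.t_startup + 2.0 * c.t_data)
         + 4.0 * n * n * (c.t_startup + c.t_data);
}

/* t_comp = 2 (n^2 / p) */
double shift_t_comp(double n, double p)
{
    assert(p > 0.0);
    return 2.0 * n * n / p;
}

/* t_p = t_comp + t_comm */
double shift_t_par(double n, double p, comm_params c)
{
    return shift_t_comp(n, p) + shift_t_comm(n, p, c);
}
```

As an illustration with made-up parameter values, take $n = 100$, $p = 10$, $t_{startup} = 10$ and $t_{data} = 1$. Then $t_s = 10\,000$, $t_{comp} = 2\,000$ and $t_{comm} = 440\,120$, so $t_p = 442\,120$: the parallel version is far slower than the sequential one because every pixel is returned in its own message.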

Mandelbrot Set

The set of points in a complex plane that are quasi-stable (will increase and decrease, but not exceed some limit) when computed by iterating the function

    z_{k+1} = z_k^2 + c

where $z_{k+1}$ is the $(k + 1)$th iteration of the complex number $z = a + bi$ and $c$ is a complex number giving the position of the point in the complex plane.

The initial value for $z$ is zero. The iterations are continued until the magnitude of $z$ is greater than 2 or the number of iterations reaches some arbitrary limit. The magnitude of $z$ is the length of the vector given by

    z_{length} = \sqrt{a^2 + b^2}

Computing the complex function $z_{k+1} = z_k^2 + c$ is simplified by recognizing that

    z^2 = a^2 + 2abi + b^2 i^2 = a^2 - b^2 + 2abi

that is, a real part that is $a^2 - b^2$ and an imaginary part that is $2ab$. The next iteration values can be produced by computing:

    z_{real} = z_{real}^2 - z_{imag}^2 + c_{real}
    z_{imag} = 2 z_{real} z_{imag} + c_{imag}

Sequential Code

Structure for the real and imaginary parts of z:

    typedef struct {
        float real;
        float imag;
    } complex;

Routine for computing the value of one point and returning the number of iterations:

    int cal_pixel(complex c)
    {
        int count, max;
        complex z;
        float temp, lengthsq;

        max = 256;                 /* iteration limit */
        z.real = 0;
        z.imag = 0;
        count = 0;                 /* number of iterations */
        do {
            /* temp holds the new real part so the old one is still
               available when computing the new imaginary part */
            temp = z.real * z.real - z.imag * z.imag + c.real;
            z.imag = 2 * z.real * z.imag + c.imag;
            z.real = temp;
            lengthsq = z.real * z.real + z.imag * z.imag;
            count++;
        } while ((lengthsq < 4.0) && (count < max));   /* |z| < 2 */
        return count;
    }

Scaling Coordinate System

Suppose the display height is disp_height, the display width is disp_width, and the point in this display area is (x, y). For computational efficiency, let

    scale_real = (real_max - real_min) / disp_width;
    scale_imag = (imag_max - imag_min) / disp_height;

Including scaling, the code could be of the form

    for (x = 0; x < disp_width; x++)           /* screen coordinates x and y */
        for (y = 0; y < disp_height; y++) {
            c.real = real_min + ((float) x * scale_real);
            c.imag = imag_min + ((float) y * scale_imag);
            color = cal_pixel(c);
            display(x, y, color);
        }

where display() is a routine suitably written to display the pixel (x, y) at the computed color.
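Putting the iteration routine and the coordinate scaling together gives a complete sequential program core. The sketch below is one way to do that under stated assumptions: the slides' display() routine is replaced by character output, the type is named `complex_t` here to avoid clashing with the `complex` macro from `<complex.h>`, and `render_ascii` is a hypothetical name. The loops are ordered with y outermost only because text is printed line by line.

```c
#include <assert.h>
#include <stdio.h>

typedef struct { float real; float imag; } complex_t;

enum { MAX_ITER = 256 };

/* Number of iterations before |z| reaches 2, capped at MAX_ITER. */
int cal_pixel(complex_t c)
{
    complex_t z = { 0.0f, 0.0f };
    int count = 0;
    float lengthsq;
    do {
        float temp = z.real * z.real - z.imag * z.imag + c.real;
        z.imag = 2.0f * z.real * z.imag + c.imag;
        z.real = temp;
        lengthsq = z.real * z.real + z.imag * z.imag;
        count++;
    } while (lengthsq < 4.0f && count < MAX_ITER);
    return count;
}

/* Prints '#' for points that reach the iteration limit and '.' otherwise.
   Returns the number of '#' characters printed. */
int render_ascii(int disp_width, int disp_height,
                 float real_min, float real_max,
                 float imag_min, float imag_max)
{
    assert(disp_width > 0 && disp_height > 0);
    const float scale_real = (real_max - real_min) / disp_width;
    const float scale_imag = (imag_max - imag_min) / disp_height;
    int inside = 0;
    for (int y = 0; y < disp_height; y++) {      /* one text line per row */
        for (int x = 0; x < disp_width; x++) {
            complex_t c = { real_min + (float) x * scale_real,
                            imag_min + (float) y * scale_imag };
            int at_limit = (cal_pixel(c) == MAX_ITER);
            inside += at_limit;
            putchar(at_limit ? '#' : '.');
        }
        putchar('\n');
    }
    return inside;
}
```

A call such as `render_ascii(64, 32, -2.0f, 2.0f, -2.0f, 2.0f)` would print a coarse picture of the region from -2 to +2 on both axes. Each pixel is computed from its own (x, y) alone, which is the property the parallel versions rely on.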

Figure 3.4 Mandelbrot set. Axes: real from −2 to +2 (horizontal), imaginary from −2 to +2 (vertical).

Parallelizing Mandelbrot Set Computation

Static Task Assignment

Master

    for (i = 0, row = 0; i < 48; i++, row = row + 10)  /* for each process */
        send(&row, P_i);                               /* send row no. */
    for (i = 0; i < (480 * 640); i++) {                /* from processes, any order */
        recv(&x, &y, &color, P_ANY);                   /* receive coordinates/colors */
        display(x, y, color);                          /* display pixel on screen */
    }

Slave (process i)

    recv(&row, P_master);                              /* receive row no. */
    for (x = 0; x < disp_width; x++)                   /* screen coordinates x and y */
        for (y = row; y < (row + 10); y++) {
            c.real = real_min + ((float) x * scale_real);
            c.imag = imag_min + ((float) y * scale_imag);
            color = cal_pixel(c);
            send(&x, &y, &color, P_master);            /* send coords, color to master */
        }
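Static assignment gives every process the same number of pixels, but not the same amount of work, because the iteration count differs from pixel to pixel. The sketch below measures that by summing the iteration counts in each 10-row strip of a 640 × 480 display. It is a sequential measurement, not a parallel program; the function names are hypothetical, and the region from -2 to +2 on both axes is an assumption taken from the figure's axis range.

```c
#include <assert.h>

enum { DISP_WIDTH = 640, DISP_HEIGHT = 480, NPROC = 48,
       ROWS_PER_PROC = 10, MAX_ITER = 256 };

typedef struct { float real; float imag; } complex_t;

/* Number of iterations before |z| reaches 2, capped at MAX_ITER. */
int cal_pixel(complex_t c)
{
    complex_t z = { 0.0f, 0.0f };
    int count = 0;
    float lengthsq;
    do {
        float temp = z.real * z.real - z.imag * z.imag + c.real;
        z.imag = 2.0f * z.real * z.imag + c.imag;
        z.real = temp;
        lengthsq = z.real * z.real + z.imag * z.imag;
        count++;
    } while (lengthsq < 4.0f && count < MAX_ITER);
    return count;
}

/* Total iterations needed for the strip of rows [row, row + 10). */
long strip_work(int row)
{
    assert(row >= 0 && row + ROWS_PER_PROC <= DISP_HEIGHT);
    const float real_min = -2.0f, real_max = 2.0f;   /* assumed region */
    const float imag_min = -2.0f, imag_max = 2.0f;
    const float scale_real = (real_max - real_min) / DISP_WIDTH;
    const float scale_imag = (imag_max - imag_min) / DISP_HEIGHT;
    long total = 0;
    for (int x = 0; x < DISP_WIDTH; x++)
        for (int y = row; y < row + ROWS_PER_PROC; y++) {
            complex_t c = { real_min + (float) x * scale_real,
                            imag_min + (float) y * scale_imag };
            total += cal_pixel(c);
        }
    return total;
}

/* Ratio of the most expensive strip to the average strip.
   A value of 1.0 would mean a perfectly balanced static assignment. */
double static_imbalance(void)
{
    long max = 0, sum = 0;
    for (int i = 0, row = 0; i < NPROC; i++, row += ROWS_PER_PROC) {
        long w = strip_work(row);
        sum += w;
        if (w > max)
            max = w;
    }
    return (double) max * NPROC / (double) sum;
}
```

Strips near the top and bottom edges escape after very few iterations, while strips through the middle contain many points that run to the iteration limit. With static assignment the overall time is set by the slowest strip, which is the motivation for assigning tasks dynamically.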

Dynamic Task Assignment

Work Pool/Processor Farms

Figure 3.5 Work pool approach. Diagram labels: the work pool holds tasks identified by pixel coordinates, shown as $(x_a, y_a)$, $(x_b, y_b)$, $(x_c, y_c)$, $(x_d, y_d)$, $(x_e, y_e)$; each process takes a task, then returns its results and requests a new task.
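The benefit of a work pool over static assignment can be illustrated with a small simulation. The sketch below is not a parallel implementation: it takes a list of task costs and computes the finishing time of the last worker, first when tasks are split into equal contiguous blocks, and then when each task goes to whichever worker becomes idle first. The function names and `MAX_WORKERS` are hypothetical, and communication time is ignored.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WORKERS 64   /* hypothetical upper bound for this sketch */

/* Static assignment: worker w gets one contiguous block of tasks.
   Returns the time at which the slowest worker finishes. */
long static_makespan(const long *cost, size_t ntasks, size_t nworkers)
{
    assert(nworkers > 0 && nworkers <= MAX_WORKERS);
    assert(ntasks % nworkers == 0);
    const size_t block = ntasks / nworkers;
    long worst = 0;
    for (size_t w = 0; w < nworkers; w++) {
        long sum = 0;
        for (size_t k = 0; k < block; k++)
            sum += cost[w * block + k];
        if (sum > worst)
            worst = sum;
    }
    return worst;
}

/* Work pool: tasks are handed out in order, each to the worker that
   becomes idle first (ties go to the lowest-numbered worker). */
long work_pool_makespan(const long *cost, size_t ntasks, size_t nworkers)
{
    assert(nworkers > 0 && nworkers <= MAX_WORKERS);
    long busy_until[MAX_WORKERS] = { 0 };
    for (size_t t = 0; t < ntasks; t++) {
        size_t next = 0;                       /* earliest idle worker */
        for (size_t w = 1; w < nworkers; w++)
            if (busy_until[w] < busy_until[next])
                next = w;
        busy_until[next] += cost[t];
    }
    long finish = 0;
    for (size_t w = 0; w < nworkers; w++)
        if (busy_until[w] > finish)
            finish = busy_until[w];
    return finish;
}
```

For the made-up cost list {8, 8, 1, 1, 1, 1, 1, 1} and two workers, static blocks finish at time 18 (the first worker gets both expensive tasks), while the work pool finishes at time 11, which equals the total work of 22 divided by two. In a real system each request to the pool also costs a message, so the gain depends on tasks being large enough to outweigh that overhead.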
