Identifying opportunities for parallelization in the hotspots of your code

SLIDE 1

Identifying opportunities for parallelization In the hotspots of your code

SLIDE 2

PARALLWARE SW DEVELOPMENT CYCLE

Understanding the sequential code and profiling

a. Analyze your code
b. Focus on profiling and where to parallelize your code correctly

Identifying opportunities for parallelization

c. Figuring out where the code is suitable for parallelization
d. Often the hardest step!

Introduce parallelism

e. Decide how to implement the parallelism discovered in your code

Test the correctness of your parallel implementation

f. Compile & run the parallel versions of your code to check that the numerical result is correct

Test the performance of your parallel implementation

g. Run the parallel versions of your code to measure performance increase for real-world workloads

Performance tuning

h. Repeat steps 1-5 until you meet your performance requirements.
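
Step f can be as simple as running the sequential and the parallel version on the same input and comparing the results within a tolerance, because a parallel reduction may reorder floating-point additions. The following is a minimal sketch in C with OpenMP; the names `sum_seq`, `sum_par`, and `nearly_equal` and the choice of a relative tolerance are illustrative assumptions, not part of any Parallware tool.

```c
#include <assert.h>

/* Sequential reference version (hypothetical example kernel). */
double sum_seq(const double *v, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Parallel version: OpenMP scalar reduction.
   Without OpenMP enabled the pragma is ignored and the loop runs sequentially. */
double sum_par(const double *v, int n)
{
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Returns 1 if a and b agree within rel_tol, scaled by the larger
   magnitude (with a floor of 1.0 so values near zero compare sanely). */
int nearly_equal(double a, double b, double rel_tol)
{
    assert(rel_tol >= 0.0);
    double scale = abs_d(a) > abs_d(b) ? abs_d(a) : abs_d(b);
    if (scale < 1.0)
        scale = 1.0;
    return abs_d(a - b) <= rel_tol * scale;
}
```

A check such as `nearly_equal(sum_seq(v, n), sum_par(v, n), 1e-9)` is then run for each test input; the tolerance is problem dependent and has to be chosen by the developer.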

SLIDE 4

Why are dependences difficult to use in practice?

  • K. Asanovic et al. 2009. A view of the parallel computing landscape. Commun. ACM 52, 10 (October 2009), 56-67. DOI: https://doi.org/10.1145/1562764.1562783

https://www.exascaleproject.org/event/bssw/

SLIDE 5

01 void atmux(double* restrict y, … , int n)
08 {
09   for(int t = 0; t < n; t++)
10     y[t] = 0;
11
12   for(int i = 0; i < n; i++) {
13     for (int k = row_ptr[i]; k < row_ptr[i+1]; k++) {
14       y[col_ind[k]] += x[i] * val[k];
15     }
16   }
17 }

FLOW dependences, OUTPUT dependences, ANTI dependences

$ icc atmux.c -std=c99 -c -O3 -xAVX -Wall -vec-report3 -opt-report3 -restrict -parallel -openmp -guide
icc (ICC) 13.1.1 20130313
...
HPO THREADIZER REPORT (atmux) LOG OPENED ON Fri Sep 25 18:04:15 2015
HPO Threadizer Report (atmux)
atmux.c(9:2-9:2):PAR:atmux: loop was not parallelized: existence of parallel dependence
atmux.c(10:3-10:3):PAR:atmux: potential ANTI dependence on y. potential FLOW dependence on y.
atmux.c(9:2-9:2):PAR:atmux: LOOP WAS AUTO-PARALLELIZED
atmux.c(12:2-12:2):PAR:atmux: loop was not parallelized: existence of parallel dependence
atmux.c(13:3-13:3):PAR:atmux: loop was not parallelized: existence of parallel dependence
...
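
The report flags the loops on lines 12 and 13 because `y[col_ind[k]]` is written through an index array, so the compiler cannot prove that two iterations never update the same element of `y`. One way a programmer might resolve this by hand is to protect each update with an atomic operation. The following is only a sketch: the slide elides the full parameter list of `atmux`, so the signature below and the name `atmux_atomic` are assumptions.

```c
#include <assert.h>

/* y = A^T * x for a square matrix stored in CSR form (row_ptr, col_ind, val).
   Hypothetical signature: the slide does not show the real parameter list. */
void atmux_atomic(double *restrict y, const double *restrict x,
                  const double *restrict val, const int *restrict col_ind,
                  const int *restrict row_ptr, int n)
{
    assert(n >= 0);

    /* Each iteration writes a distinct y[t], so a plain parallel loop is safe. */
    #pragma omp parallel for
    for (int t = 0; t < n; t++)
        y[t] = 0.0;

    /* Different rows i may update the same y[col_ind[k]];
       the atomic makes each individual update safe. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
            #pragma omp atomic
            y[col_ind[k]] += x[i] * val[k];
        }
    }
}
```

Atomic updates serialize conflicting writes, so whether this version is faster than the sequential one depends on the sparsity pattern and has to be measured.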

SLIDE 6

MISSING TOWER “CODE” FOR “IMPLEMENTATION GAP”

SLIDE 7

Source code:

  • Outputs: xi, yi, zi
  • Temporaries: dxc, dyc, dzc, m, f
  • Read-only: xx1, yy1, zz1, mass1, fsrrmax2, ...
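
A classification like this maps fairly directly onto OpenMP data-sharing choices: outputs that accumulate across iterations become reductions, temporaries become private, and read-only data stays shared. The kernel itself is not reproduced on the slide, so the loop below is a made-up stand-in that only reuses the variable names; the function name `accumulate_force`, the parameters `count`, `x0`, `y0`, `z0`, the extra temporary `r2`, and the placeholder weight `f = m` are all assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical kernel reusing the slide's variable names.
   Outputs (xi, yi, zi)         -> reduction(+)
   Temporaries (dxc, ..., m, f) -> private, by declaring them inside the loop
   Read-only (xx1, ..., mass1)  -> shared, only ever read */
void accumulate_force(const float *xx1, const float *yy1, const float *zz1,
                      const float *mass1, int count, float fsrrmax2,
                      float x0, float y0, float z0,
                      float *xi_out, float *yi_out, float *zi_out)
{
    assert(count >= 0);
    float xi = 0.0f, yi = 0.0f, zi = 0.0f;

    #pragma omp parallel for reduction(+:xi, yi, zi)
    for (int j = 0; j < count; j++) {
        float dxc = xx1[j] - x0;
        float dyc = yy1[j] - y0;
        float dzc = zz1[j] - z0;
        float r2 = dxc * dxc + dyc * dyc + dzc * dzc;  /* assumed temporary */
        float m = (r2 < fsrrmax2) ? mass1[j] : 0.0f;   /* cutoff on distance */
        float f = m;  /* placeholder: the real formula is not on the slide */
        xi += f * dxc;
        yi += f * dyc;
        zi += f * dzc;
    }

    *xi_out = xi;
    *yi_out = yi;
    *zi_out = zi;
}
```

Declaring the temporaries inside the loop body makes them private without an explicit `private(...)` clause, which leaves less room for scoping mistakes.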

SLIDE 8

Making the most of your opportunities to parallelize

SLIDE 9
  • The Parallware Analysis will help you identify regions that are opportunities for parallelization:

SLIDE 10

#pragma omp parallel for … \
    shared(y)
for (i = 0; i<n; i++) {
  y[i] = 0;
  for (k=ia[i]; k<ia[i+1]; k++) {
    y[i] = y[i] + a[k]*x[ja[k]];
  }
}

  • Parallel scalar reduction, in the inner loop
  • Parallel forall, in the outer loop

#pragma omp parallel for … \
    private(t)
for (i = 0; i<n; i++) {
  t = 0;
  for (k=ia[i]; k<ia[i+1]; k++) {
    t = t + a[k]*x[ja[k]];
  }
  y[i] = t;
}

  • Parallel forall, in the outer loop
  • Parallel scalar reduction, in the inner loop
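
For reference, a self-contained version of the second variant might look like the sketch below. The function name `spmv_csr` and the parameter types are assumptions; the array names `ia`, `ja`, `a`, `x`, and `y` follow the slide, with `ia` assumed to hold `n + 1` zero-based row offsets.

```c
#include <assert.h>

/* y = A * x for a matrix in CSR form (ia = row offsets, ja = column indices,
   a = values). Hypothetical wrapper around the loop nest from the slide. */
void spmv_csr(int n, const int *ia, const int *ja,
              const double *a, const double *x, double *y)
{
    assert(n >= 0);

    /* Parallel forall: each outer iteration writes only its own y[i]. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double t = 0.0;  /* declared in the loop body, so private per thread */
        /* Scalar reduction into t, executed sequentially within one row. */
        for (int k = ia[i]; k < ia[i + 1]; k++)
            t += a[k] * x[ja[k]];
        y[i] = t;
    }
}
```

Accumulating into the private scalar `t` rather than into `y[i]` keeps the inner loop free of repeated writes to shared memory, which is the point of the second variant.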

SLIDE 11

for (h = 0; h<Adim; h++) {
  hist[h] = 0;
}
for (h = 0; h<fDim; h++) {
  hist[f[h]] = hist[f[h]] + 1;
}
for (h = 1; h<Adim; h++) {
  hist[h] = hist[h] + hist[h-1];
}

  • First loop: parallel forall
  • Second loop: parallel sparse reduction
  • Third loop: parallel recurrence. In general not parallelizable, but in many situations it can be parallelized with significant synchronization overhead.
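
The three patterns can be sketched with OpenMP as shown below. This is one possible treatment, not the only one: the sparse reduction is protected with `#pragma omp atomic`, and the recurrence is simply left sequential. The function name `cumulative_histogram` and the argument types are assumptions; `hist`, `f`, `Adim`, and `fDim` follow the slide.

```c
#include <assert.h>

/* Builds a histogram of the values in f[0..fDim) and turns it into a
   cumulative histogram. Every f[h] is assumed to lie in [0, Adim). */
void cumulative_histogram(int *hist, int Adim, const int *f, int fDim)
{
    assert(Adim >= 0 && fDim >= 0);

    /* Parallel forall: every iteration writes a distinct element. */
    #pragma omp parallel for
    for (int h = 0; h < Adim; h++)
        hist[h] = 0;

    /* Parallel sparse reduction: several iterations may hit the same bin,
       so each increment is made atomic. */
    #pragma omp parallel for
    for (int h = 0; h < fDim; h++) {
        #pragma omp atomic
        hist[f[h]] += 1;
    }

    /* Recurrence: iteration h needs the result of iteration h-1.
       Kept sequential in this sketch. */
    for (int h = 1; h < Adim; h++)
        hist[h] += hist[h - 1];
}
```

Per-thread private copies of `hist` that are merged at the end are a common alternative to atomics when the number of bins is small.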