Introduction to Parallel Programming



  1. Introduction to Parallel Programming for shared memory machines using OpenMP
Ali Kerrache. E-mail: ali.kerrache@umanitoba.ca
UofM Summer School, June 25-28, 2018

  2. Outline
- Introduction to parallel programming (OpenMP)
- Definition of the OpenMP API
  - Constitution of an OpenMP program
  - OpenMP programming model
  - OpenMP syntax [C/C++, Fortran]: compiler directives
  - Run or submit an OpenMP job [SLURM, PBS]
- Learn OpenMP by examples
  - Hello World program
  - Work sharing in OpenMP: sections, loops
  - Compute pi ≈ 3.14: serial and parallel versions, race condition, SPMD model, synchronization

  3. Download the support material
- Use an ssh client (PuTTY, MobaXterm, Terminal on Mac or Linux) to connect to cedar and/or graham:
  ssh -Y username@cedar.computecanada.ca
  ssh -Y username@graham.computecanada.ca
- Download the files using wget:
  wget https://ali-kerrache.000webhostapp.com/uofm/openmp.tar.gz
  wget https://ali-kerrache.000webhostapp.com/uofm/openmp-slides.pdf
  or from the website: https://westgrid.github.io/manitobaSummerSchool2018/
- Unpack the archive and change to the directory:
  tar -xvf openmp.tar.gz
  cd UofM-Summer-School-OpenMP

  4. Concurrency and parallelism
- Concurrency: condition of a system in which multiple tasks are logically active at the same time, but they may not necessarily run in parallel.
- Parallelism (a subset of concurrency): condition of a system in which multiple tasks are active at the same time and run in parallel.
What do we mean by parallel machines?

  5. Introduction to parallel programming
Serial programming:
- Develop a serial program.
- What about performance and optimization?
But in the real world we face:
- Running multiple programs.
- Large and complex problems.
- Time-consuming computations.
Solution:
- Use parallel machines.
- Use multi-core machines.
Example: a program that runs on 1 core can be parallelized to run on 4 cores, reducing the execution time by a factor of 4.
Why parallel?
- Reduce the execution time.
- Run multiple programs.
- Obtain the same amount of computation with multiple cores running at low frequency.
What is parallel programming?

  6. Parallel machines and parallel programming
Distributed memory machines:
- Each processor has its own memory (CPU-0/MEM-0, CPU-1/MEM-1, ...).
- The variables are independent.
- Communication by passing messages (network).
- Multi-processing: difficult to program, but scalable.
- MPI based programming.
Shared memory machines:
- All processors share the same memory.
- The variables can be shared or private.
- Communication via shared memory.
- Multi-threading: portable, easy to program and use, but not very scalable.
- OpenMP based programming.

  7. Definition of OpenMP: API
- Library used to divide computational work in a program and add parallelism to a serial program (create threads).
- Supported by compilers: Intel (ifort, icc), GNU (gcc, gfortran, ...).
- Programming languages: C/C++, Fortran.
- Compilers: http://www.openmp.org/resources/openmp-compilers/
OpenMP consists of three components:
- Compiler directives: directives added to a serial program, interpreted at compile time.
- Runtime library: routines executed at run time.
- Environment variables: set after compile time to control the execution of an OpenMP program.

  8. Construction of an OpenMP program
Application / serial program / end user
→ OpenMP: compiler directives, runtime library, environment variables
→ Compilation / runtime library / operating system
→ Thread creation and parallel execution: thread 0, thread 1, ..., thread N-1
What is the OpenMP programming model?

  9. OpenMP model: Fork-Join parallelism
- Serial region: master thread only. Parallel region: all threads.
- The master thread spawns a team of threads as needed (FORK); at the end of the parallel region the team merges back into the master thread (JOIN). Parallel regions may also be nested.
- Parallelism is added incrementally: the sequential program evolves into a parallel program.
- Define the regions to parallelize, then add OpenMP directives.

  10. Learn OpenMP by examples
- Example_00: thread creation.
  - How to go from a serial code to a parallel code?
  - How to create threads?
  - Introduce some constructs of OpenMP.
  - Compile and run an OpenMP program.
  - Submit an OpenMP job.
- Example_01: work sharing using loops and sections.
- Example_02: common problems in OpenMP programming: false sharing and race conditions.
- Example_03: Single Program Multiple Data (SPMD) model, as a solution to avoid race conditions.
- Example_04: more OpenMP constructs; synchronization.

  11. OpenMP: simple syntax
Most of the constructs in OpenMP are compiler directives or pragmas.
For C/C++, the pragmas take the form:
  #pragma omp construct [clause [clause]...]
For example:
  #include <omp.h>
  #pragma omp parallel
  {
      /* block of C/C++ code */
  }
For Fortran, the directives take one of the forms:
  !$OMP construct [clause [clause]...]
  C$OMP construct [clause [clause]...]
  *$OMP construct [clause [clause]...]
For example:
  use omp_lib
  !$omp parallel
      ! block of Fortran code
  !$omp end parallel
- For C/C++ include the header file: #include <omp.h>
- For Fortran 90 use the module: use omp_lib
- For Fortran 77 include the header file: include 'omp_lib.h'

  12. Parallel regions and structured blocks
Most OpenMP constructs apply to structured blocks.
- Structured block: a block with one point of entry at the top and one point of exit at the bottom.
- The only "branches" allowed are STOP statements in Fortran and exit() in C/C++.
Structured block (the goto stays inside the block):
  #pragma omp parallel
  {
      int id = omp_get_thread_num();
      more: res[id] = do_big_job(id);
      if (conv(res[id])) goto more;
  }
  printf("All done\n");
Non-structured block (the gotos jump into and out of the block):
  if (go_now()) goto more;
  #pragma omp parallel
  {
      int id = omp_get_thread_num();
      more: res[id] = do_big_job(id);
      if (conv(res[id])) goto done;
      goto more;
  }
  done: if (!Really_done()) goto more;

  13. Compile and run an OpenMP program
- Compile with OpenMP support enabled:
  - GNU: add -fopenmp to the C/C++ and Fortran compilers.
  - Intel compilers: add -openmp or -qopenmp (they also accept -fopenmp).
  - PGI Linux compilers: add -mp
  - Windows: add /Qopenmp
- Set the environment variable OMP_NUM_THREADS (by default OpenMP spawns one thread per hardware thread):
  - $ export OMP_NUM_THREADS=value (bash shell)
  - $ setenv OMP_NUM_THREADS value (tcsh shell)
  where value is the number of threads [for example, 4].
- Execute or run the program:
  - $ ./exec_program {options, parameters} or ./a.out

  14. Submission script: SLURM
  #!/bin/bash
  #SBATCH --nodes=1
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=4
  #SBATCH --mem-per-cpu=2500M
  #SBATCH --time=0-00:30
  # Load compiler module and/or your application module.
  cd $SLURM_SUBMIT_DIR
  export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
  echo "Starting run at: `date`"
  ./your_openmp_program_exec {options and/or parameters}
  echo "Program finished with exit code $? at: `date`"
Resources:
- nodes=1
- ntasks=1
- cpus-per-task=1 up to the number of cores per node:
  - Cedar: nodes with 32 or 48 cores
  - Graham: nodes with 32 cores
  - Niagara: nodes with 40 cores

  15. Submission script: PBS
  #!/bin/bash
  #PBS -S /bin/bash
  #PBS -l nodes=1:ppn=4
  #PBS -l pmem=2000mb
  #PBS -l walltime=24:00:00
  #PBS -M <your-valid-email>
  #PBS -m abe
  # Load compiler module and/or your application module.
  cd $PBS_O_WORKDIR
  echo "Current working directory is `pwd`"
  export OMP_NUM_THREADS=$PBS_NUM_PPN
  ./your_openmp_exec < input_file > output_file
  echo "Program finished at: `date`"
Resources:
- nodes=1
- ppn=1 up to the maximum number of hardware CPUs; nodes=1:ppn=4, for example.
Note: on systems where $PBS_NUM_PPN is not available, one could use:
  CORES=`/bin/awk 'END {print NR}' $PBS_NODEFILE`
  export OMP_NUM_THREADS=$CORES

  16. Data environment
- shared: only a single instance of the variable exists in shared memory; all threads have read and write access to it.
- private: each thread allocates its own private copy of the data; these local copies only exist in the parallel region and are undefined when entering or exiting it.
- firstprivate: variables are also declared private; additionally, they are initialized with the value of the original variable.
- lastprivate: declares variables as private; the variables get the value from the last iteration of the loop.
Default clauses:
- C/C++: default(shared | none)
- Fortran: default(private | firstprivate | shared | none)
It is highly recommended to use default(none).
