
UofM-Summer-School, June 25-28, 2018

Introduction to Parallel Programming for shared memory machines using OpenMP

Ali Kerrache

E-mail: ali.kerrache@umanitoba.ca


Outline

- Introduction to parallel programming (OpenMP)
- Definition of the OpenMP API
  - Constitution of an OpenMP program
  - OpenMP programming model
  - OpenMP syntax [C/C++, Fortran]: compiler directives
  - Run or submit an OpenMP job [SLURM, PBS]
- Learn OpenMP by examples
  - Hello World program
    - Work sharing in OpenMP: sections, loops
  - Compute pi = 3.14
    - Serial and parallel versions
    - Race condition
    - SPMD model
    - Synchronization


Download the support material

- Download the files using wget:

    wget https://ali-kerrache.000webhostapp.com/uofm/openmp.tar.gz
    wget https://ali-kerrache.000webhostapp.com/uofm/openmp-slides.pdf

  or from the website: https://westgrid.github.io/manitobaSummerSchool2018/

- Use an ssh client (PuTTY, MobaXterm, or a terminal on Mac or Linux) to connect to cedar and/or graham:

    ssh -Y username@cedar.computecanada.ca
    ssh -Y username@graham.computecanada.ca

- Unpack the archive and change into the directory:

    tar -xvf openmp.tar.gz
    cd UofM-Summer-School-OpenMP


Concurrency and parallelism

Concurrency:

- Condition of a system in which multiple tasks are logically active at the same time ... but they may not necessarily run in parallel.

Parallelism (a subset of concurrency):

- Condition of a system in which multiple tasks are active at the same time and actually run in parallel.

What do we mean by parallel machines?


Introduction to parallel programming

Serial programming:

- Develop a serial program.
- What about performance & optimization?

Why parallel?

- Reduce the execution time.
- Run multiple programs.

What is parallel programming?

- Perform the same amount of computation using multiple cores at lower frequency, and therefore finish faster.

Solution:

- Use parallel machines.
- Use multi-core machines.

[Figure: a task executed on 1 core versus parallelized over 4 cores; with 4 cores the execution time is ideally reduced by a factor of 4.]

But in the real world:

- We run multiple programs.
- Problems are large & complex.
- Computations are time consuming.


Parallel machines & parallel programming

[Figure: distributed memory — four CPUs (CPU-0..CPU-3), each with its own memory (MEM-0..MEM-3), connected by a network; shared memory — four CPUs (CPU-0..CPU-3) attached to one shared memory.]

Distributed Memory Machines                  | Shared Memory Machines
---------------------------------------------|---------------------------------------------
Each processor has its own memory.           | All processors share the same memory.
The variables are independent.               | The variables can be shared or private.
Communication by passing messages (network). | Communication via shared memory.
Difficult to program.                        | Portable, easy to program and use.
Scalable.                                    | Not very scalable.
Multi-processing.                            | Multi-threading.
MPI based programming.                       | OpenMP based programming.


Definition of OpenMP: API

- An API used to divide computational work in a program and add parallelism to a serial program (create threads).
- Supported by compilers: Intel (ifort, icc), GNU (gcc, gfortran, ...).
- Programming languages: C/C++, Fortran.
- Compilers: http://www.openmp.org/resources/openmp-compilers/

OpenMP consists of three components:

- Compiler directives: added to a serial program; interpreted at compile time.
- Runtime library: routines called and executed at run time.
- Environment variables: set before execution to control how the OpenMP program runs.
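To make the three components concrete, here is a minimal sketch (illustrative only, not one of the course examples) that touches all three:

    /* Minimal sketch combining the three OpenMP components. */
    #include <omp.h>     /* header for the runtime library */
    #include <stdio.h>

    int main() {
        /* Environment variable: set before running, e.g.
           export OMP_NUM_THREADS=4 */
        #pragma omp parallel                 /* compiler directive */
        {
            int id = omp_get_thread_num();   /* runtime library routine */
            printf("hello from thread %d\n", id);
        }
        return 0;
    }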


Construction of an OpenMP program

[Figure: an application/serial program written by the end user, annotated with OpenMP compiler directives, runtime library calls, and environment variables, is turned by the compilation step, the runtime library, and the operating system into thread creation & parallel execution across threads 0 .. N-1.]

What is the OpenMP programming model?


OpenMP model: Fork-Join parallelism

Start from a serial program, define the regions to parallelize, then add OpenMP directives.

[Figure: fork-join execution — serial regions run by the master thread alternate with parallel regions (FORK ... JOIN) run by all threads; parallel regions may themselves contain nested regions.]

- Serial region: master thread only.
- Parallel region: all threads.
- The master thread spawns a team of threads as needed.
- Parallelism is added incrementally: the sequential program evolves into a parallel program.
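A minimal sketch (an illustration, not one of the course examples) of the fork-join pattern:

    #include <omp.h>
    #include <stdio.h>

    int main() {
        printf("serial region: master thread only\n");
        #pragma omp parallel                       /* FORK */
        {
            printf("parallel region: thread %d\n", omp_get_thread_num());
        }                                          /* JOIN: implicit barrier */
        printf("serial region again: master thread only\n");
        return 0;
    }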


Learn OpenMP by examples

- Example_00: thread creation.
  - How to go from a serial code to a parallel code?
  - How to create threads?
  - Introduce some OpenMP constructs.
  - Compile and run an OpenMP program.
  - Submit an OpenMP job.
- Example_01: work sharing using:
  - Loops
  - Sections
- Example_02: common problems in OpenMP programming:
  - False sharing and race conditions.
- Example_03: Single Program Multiple Data model:
  - as a solution to avoid race conditions.
- Example_04:
  - More OpenMP constructs.
  - Synchronization.


OpenMP: simple syntax

Most OpenMP constructs are compiler directives or pragmas.

- For C/C++, the pragma takes the form:

    #pragma omp construct [clause [clause]...]

- For Fortran, the directives take one of the forms:

    !$OMP construct [clause [clause]...]
    C$OMP construct [clause [clause]...]
    *$OMP construct [clause [clause]...]

- For C/C++, include the header file:  #include <omp.h>
- For Fortran 90, use the module:      use omp_lib
- For F77, include the header file:    include 'omp_lib.h'

Fortran:

    use omp_lib
    !$omp parallel
      Block of Fortran code
    !$omp end parallel

C/C++:

    #include <omp.h>
    #pragma omp parallel
    {
      Block of C/C++ code;
    }


Parallel regions and structured blocks

Most OpenMP constructs apply to structured blocks.

- Structured block: a block with one point of entry at the top and one point of exit at the bottom.
- The only "branches" allowed are STOP statements in Fortran and exit() in C/C++.

    #pragma omp parallel
    {
      int id = omp_get_thread_num();
    more:
      res[id] = do_big_job(id);
      if (conv(res[id])) goto more;
    }
    printf("All done\n");

Structured block

    if (go_now()) goto more;
    #pragma omp parallel
    {
      int id = omp_get_thread_num();
    more:
      res[id] = do_big_job(id);
      if (conv(res[id])) goto done;
      goto more;
    }
    done:
    if (!Really_done()) goto more;

Non-structured block


Compile and run OpenMP program

- Compile with OpenMP enabled:
  - GNU: add -fopenmp to the C/C++ & Fortran compilers.
  - Intel compilers: add -openmp or -qopenmp (they also accept -fopenmp).
  - PGI Linux compilers: add -mp.
  - Windows: add /Qopenmp.

- Set the environment variable OMP_NUM_THREADS (by default OpenMP spawns one thread per hardware thread):

    $ export OMP_NUM_THREADS=value    (bash shell)
    $ setenv OMP_NUM_THREADS value    (tcsh shell)

  where value is the number of threads [for example, 4].

- Execute or run the program:

    $ ./exec_program {options, parameters}    or    $ ./a.out


Submission script: SLURM

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=4
    #SBATCH --mem-per-cpu=2500M
    #SBATCH --time=0-00:30

    # Load compiler module and/or your application module.

    cd $SLURM_SUBMIT_DIR
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

    echo "Starting run at: `date`"
    ./your_openmp_program_exec {options and/or parameters}
    echo "Program finished with exit code $? at: `date`"

Resources:

- nodes=1
- ntasks=1
- cpus-per-task=1 up to the number of cores per node:
  - Cedar: nodes with 32 or 48 cores
  - Graham: nodes with 32 cores
  - Niagara: nodes with 40 cores
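Assuming the script above is saved as openmp_job.sh (a hypothetical file name), it would be submitted with:

    $ sbatch openmp_job.sh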


Submission script: PBS

    #!/bin/bash
    #PBS -S /bin/bash
    #PBS -l nodes=1:ppn=4
    #PBS -l pmem=2000mb
    #PBS -l walltime=24:00:00
    #PBS -M <your-valid-email>
    #PBS -m abe

    # Load compiler module and/or your application module.

    cd $PBS_O_WORKDIR
    echo "Current working directory is `pwd`"
    export OMP_NUM_THREADS=$PBS_NUM_PPN
    ./your_openmp_exec < input_file > output_file
    echo "Program finished at: `date`"

    # On systems where $PBS_NUM_PPN is not available, one could use:
    CORES=`/bin/awk 'END {print NR}' $PBS_NODEFILE`
    export OMP_NUM_THREADS=$CORES

Resources:

- nodes=1
- ppn=1 up to the maximum number of CPUs (hardware) per node
- nodes=1:ppn=4 (for example)
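Assuming the script above is saved as openmp_job.pbs (a hypothetical file name), it would be submitted with:

    $ qsub openmp_job.pbs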


Data environment

Syntax of the default clause:

- C/C++: default ( shared | none )
- Fortran: default ( private | firstprivate | shared | none )

shared:
- Only a single instance of the variable exists in shared memory.
- All threads have read and write access to it.

private:
- Each thread allocates its own private copy of the data.
- These local copies only exist in the parallel region.
- Undefined when entering or exiting the parallel region.

firstprivate:
- Variables are also declared private.
- Additionally, they are initialized with the value of the original variable.

lastprivate:
- Declares variables as private.
- Variables get the value from the last iteration of the loop.

It is highly recommended to use: default ( none )
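A minimal sketch (illustrative only, not from the course files) of the effect of these clauses:

    #include <omp.h>
    #include <stdio.h>

    int main() {
        int a = 1, b = 2, c = 3;
        #pragma omp parallel default(none) shared(a) private(b) firstprivate(c)
        {
            /* a: single shared instance, readable/writable by all threads  */
            /* b: private copy, NOT initialized (undefined on entry)        */
            /* c: private copy, initialized to 3 from the original variable */
            b = omp_get_thread_num();   /* must assign b before using it    */
            c = c + b;                  /* safe: c started at 3             */
            printf("thread %d: a = %d, c = %d\n", b, a, c);
        }
        /* After the region the originals are unchanged: b == 2, c == 3. */
        printf("after: a=%d b=%d c=%d\n", a, b, c);
        return 0;
    }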


Hello World! program: serial version

    #include <stdio.h>
    int main() {
      printf("Hello World\n");
    }

C/C++ program

    program Hello
      implicit none
      write(*,*) "Hello World"
    end program Hello

Fortran 90 program

- Objective: a simple serial program in C/C++ and Fortran.
- Directory: Example_00 {hello_c_seq.c; hello_f90_seq.f90}
- To do: compile and run the serial program (C/C++ or Fortran).

  - C/C++:

      icc [CFLAGS] hello_c_seq.c -o exec_prog.x
      gcc [CFLAGS] hello_c_seq.c -o exec_prog.x

  - Fortran:

      ifort [FFLAGS] hello_f90_seq.f90 -o exec_prog.x
      gfortran [FFLAGS] hello_f90_seq.f90 -o exec_prog.x

- Run the program: ./a.out or ./exec_prog.x


Hello World! program: parallel version

    #include <omp.h>
    #pragma omp parallel
    {
      Structured block or blocks;
    }

For C/C++ program

    use omp_lib
    !$omp parallel
      Structured block
    !$omp end parallel

For Fortran 90 program

- Objective: create a parallel region and spawn threads.
- Directory: Example_00
- Templates: hello_c_omp-template.c; hello_f90_omp-template.f90
- To do:
  - Edit the program template and add the OpenMP compiler directives.
  - Compile and run the program of your choice (C/C++ or Fortran).
  - Set the number of threads to 4 and run the program.
  - Run the same program using 2 and 3 threads.


Hello World!

- C and C++ use exactly the same constructs.
- There are slight differences between C/C++ and Fortran.

    #include <omp.h>
    #include <stdio.h>
    int main() {
      #pragma omp parallel
      {
        printf("Hello World\n");
      }
    }

C/C++

    program Hello
      use omp_lib
      implicit none
      !$omp parallel
      write(*,*) "Hello World"
      !$omp end parallel
    end program Hello

Fortran 90

Header/module: <omp.h> in C/C++; omp_lib in Fortran. Compiler directives: #pragma omp parallel and !$omp parallel / !$omp end parallel.

Runtime library:

- Thread rank: omp_get_thread_num()
- Number of threads: omp_get_num_threads()
- Set number of threads: omp_set_num_threads()
- Compute time: omp_get_wtime()

Next example: helloworld_*_template.*


Overview of the program Hello World!

    #include <omp.h>
    #include <stdio.h>
    #define NUM_THREADS 4

    int main() {
      int ID, nthr, nthreads;
      double start_time, elapsed_time;

      /* Development: set the number of threads here.
         Production: use OMP_NUM_THREADS instead. */
      omp_set_num_threads(NUM_THREADS);

      /* In the serial region: returns 1 thread. */
      nthr = omp_get_num_threads();

      start_time = omp_get_wtime();

      #pragma omp parallel default(none) private(ID) shared(nthreads)
      {
        ID = omp_get_thread_num();
        nthreads = omp_get_num_threads();
        printf("Hello World!; My ID is equal to [ %d ] - The total of threads is: [ %d ]\n", ID, nthreads);
      }

      /* Compute the elapsed time. */
      elapsed_time = omp_get_wtime() - start_time;
      printf("\nThe time spent in the parallel region is: %f\n\n", elapsed_time);

      /* Back in the serial region: again 1 thread. */
      nthr = omp_get_num_threads();
      printf("Number of threads is: %d\n\n", nthr);
    }


Execution of the program Hello World!

    $ icc -openmp helloworld_c_omp.c
    $ gcc -fopenmp helloworld_c_omp.c

Compile (C/C++)

    $ ifort -openmp helloworld_f90_omp.f90
    $ gfortran -fopenmp helloworld_f90_omp.f90

Compile (Fortran)

    $ export OMP_NUM_THREADS=4
    $ ./a.out
    Hello World!; My ID is equal to [ 0 ] - The total of threads is: [ 4 ]
    Hello World!; My ID is equal to [ 3 ] - The total of threads is: [ 4 ]
    Hello World!; My ID is equal to [ 1 ] - The total of threads is: [ 4 ]
    Hello World!; My ID is equal to [ 2 ] - The total of threads is: [ 4 ]
    $ ./a.out
    Hello World!; My ID is equal to [ 3 ] - The total of threads is: [ 4 ]
    Hello World!; My ID is equal to [ 0 ] - The total of threads is: [ 4 ]
    Hello World!; My ID is equal to [ 2 ] - The total of threads is: [ 4 ]
    Hello World!; My ID is equal to [ 1 ] - The total of threads is: [ 4 ]

Execute the program

Run the program for OMP_NUM_THREADS from 1 to 4:

    $ export OMP_NUM_THREADS=1
    $ ./a.out
    $ export OMP_NUM_THREADS=2
    $ ./a.out
    $ export OMP_NUM_THREADS=3
    $ ./a.out
    $ export OMP_NUM_THREADS=4
    $ ./a.out


Work sharing: loops in OpenMP

OpenMP directives for loops:

- C/C++:

    #pragma omp parallel for { ... }
    #pragma omp for { ... }

- Fortran:

    !$OMP PARALLEL DO ... !$OMP END PARALLEL DO
    !$OMP DO ... !$OMP END DO

    #pragma omp parallel
    {
      #pragma omp for
      { calc(); }
    }

    #pragma omp parallel for
    { calc(); }

C/C++

    !$omp parallel
    !$omp do
      ...
    !$omp end do
    !$omp end parallel

    !$omp parallel do
      ...
    !$omp end parallel do

Fortran


Work sharing: loops in OpenMP

    #pragma omp parallel
    {
      #pragma omp for
      for (i = 0; i < nloops; i++)
        do_some_computation();
    }

C/C++

    !$omp parallel
    !$omp do
    do i = 1, nloops
      do_some_computation
    end do
    !$omp end do
    !$omp end parallel

Fortran

[Figure: fork before the for/do loop, join after it.]

    #pragma omp parallel for
    { .... }

    !$omp parallel do
    ...
    !$omp end parallel do


Loops in OpenMP: Hello World!

    #include <omp.h>
    #include <stdio.h>
    #define nloops 8

    int main() {
      int ID, nthreads;
      #pragma omp parallel default(none) private(ID) shared(nthreads)
      {
        ID = omp_get_thread_num();
        if ( ID == 0 ) {
          nthreads = omp_get_num_threads();
        }
        int i;
        #pragma omp for
        for (i = 0; i < nloops; i++) {
          printf("Hello World!; My ID is equal to [ %d of %d ] - I get the value [ %d ]\n", ID, nthreads, i);
        }
      }
    }

C/C++

Alternative to the if block, using the single construct:

    #pragma omp single
    nthreads = omp_get_num_threads();

File: Example_01/helloworld_loop_c_omp.cpp


Directives on multiple lines

    #pragma omp parallel list-of-some-directives \
        list-of-other-directives \
        list-of-some-other-directives
    {
      structured block of C/C++ code;
    }

C/C++

    !$omp parallel list-of-some-directives &
    !$omp list-of-other-directives &
    !$omp list-of-some-other-directives
      structured block of Fortran code
    !$omp end parallel

Fortran

The list of directives continues on the following lines.


Loops in OpenMP: Hello World!

    use omp_lib
    implicit none
    integer :: ID, nthreads, i
    integer, parameter :: nloops = 8
    !$omp parallel default(none) shared(nthreads) private(ID)
    ID = omp_get_thread_num()
    if ( ID == 0 ) nthreads = omp_get_num_threads()
    !$omp do
    do i = 0, nloops - 1
      write(*,fmt="(a,I2,a,I2,a,I2,a)") "Hello World!, My ID is equal to &
        & [ ", ID, " of ", nthreads, " ] - I get the value [ ", i, " ]"
    end do
    !$omp end do
    !$omp end parallel

Fortran

File: Example_01/helloworld_loop_f90_omp.f90

Alternative using the single construct:

    !$omp single
    nthreads = omp_get_num_threads()
    !$omp end single


Conditional compilation

C/C++ and Fortran (recent versions of OpenMP, e.g. 4.0) provide the preprocessor macro _OPENMP:

    #ifdef _OPENMP
    MyID = omp_get_thread_num();
    #endif

Special comment for the Fortran preprocessor:

    !$ MyID = OMP_GET_THREAD_NUM()

This is a helpful check between the serial and parallel versions of the code:

- Taken into account when compiled with OpenMP.
- Ignored if compiled in serial mode.
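A self-contained sketch of this pattern; the serial stub for omp_get_thread_num() is an assumption (a common idiom), not part of the slides:

    #include <stdio.h>
    #ifdef _OPENMP
    #include <omp.h>
    #else
    /* Serial fallback so the same source compiles without -fopenmp. */
    static int omp_get_thread_num(void) { return 0; }
    #endif

    int main() {
    #ifdef _OPENMP
        printf("compiled with OpenMP support\n");
    #endif
        #pragma omp parallel   /* ignored by compilers without OpenMP */
        printf("hello from thread %d\n", omp_get_thread_num());
        return 0;
    }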


Loops in OpenMP: Hello World!

    $ export OMP_NUM_THREADS=2
    $ ./a.out
    Hello World!; My ID is equal to [ 0 of 2 ] - I get the value [ 0 ]
    Hello World!; My ID is equal to [ 1 of 2 ] - I get the value [ 4 ]
    Hello World!; My ID is equal to [ 0 of 2 ] - I get the value [ 1 ]
    Hello World!; My ID is equal to [ 1 of 2 ] - I get the value [ 5 ]
    Hello World!; My ID is equal to [ 0 of 2 ] - I get the value [ 2 ]
    Hello World!; My ID is equal to [ 1 of 2 ] - I get the value [ 6 ]
    Hello World!; My ID is equal to [ 0 of 2 ] - I get the value [ 3 ]
    Hello World!; My ID is equal to [ 1 of 2 ] - I get the value [ 7 ]

Compile and run the program

    $ export OMP_NUM_THREADS=1
    $ ./a.out
    $ export OMP_NUM_THREADS=2
    $ ./a.out
    $ export OMP_NUM_THREADS=3
    $ ./a.out
    $ export OMP_NUM_THREADS=4
    $ ./a.out

Example of output using 8 loops and 2 threads:

- Thread 0 gets the values: 0, 1, 2, 3
- Thread 1 gets the values: 4, 5, 6, 7

Example of output using 8 loops and 3 threads:

- Thread 0 gets the values: 0, 1, 2
- Thread 1 gets the values: 3, 4, 5
- Thread 2 gets the values: 6, 7


What have we learned from "Hello World"?

- Create threads:
  - C/C++: #pragma omp parallel { ... }
  - Fortran: !$omp parallel ... !$omp end parallel
- Include the header <omp.h> in C/C++; use omp_lib in Fortran.
- Number of threads: omp_get_num_threads()
- Thread number or rank: omp_get_thread_num()
- Set the number of threads: omp_set_num_threads()
- Evaluate the time: omp_get_wtime()
- The single construct: #pragma omp single / !$omp single ... !$omp end single
- Variables: default(none), shared(), private()
- Work sharing: loops, sections [section]:
  - C/C++: #pragma omp for or #pragma omp parallel for
  - Fortran: !$omp do ... !$omp end do or !$omp parallel do ... !$omp end parallel do


Application of OpenMP: compute π (3.14)

Mathematically:

    \pi = \int_0^1 \frac{4}{1+x^2} \, dx

Numerical integration: the integral can be approximated by a sum of rectangles,

    \pi \approx \sum_{i=0}^{N-1} F(x_i) \, \Delta x

where each rectangle has width \Delta x and height F(x_i), the value of F(x) = 4/(1+x^2) at the midpoint of interval i.

[Figure: plot of F(x) = 4/(1+x^2) on [0.0, 1.0], decreasing from F = 4.0 at x = 0.0, approximated by rectangles.]
slide-31
SLIDE 31

Summer School, June 25-28, 2018

Serial version: compute π (3.14)

    double x, pi, sum;
    int i;

    sum = 0.0;
    for (i = 0; i < nb_steps; i++) {
      x = (i + 0.5) * step;
      sum += 1.0/(1.0 + x * x);
    }
    pi = 4.0 * sum * step;

C/C++

    real(8) :: pi, sum, x
    integer :: i

    sum = 0.0d0
    do i = 0, nb_steps - 1
      x = (i + 0.5) * step
      sum = sum + 1.0/(1.0 + x * x)
    end do
    pi = 4.0 * sum * step

Fortran

    $ gcc compute_pi_c_seq.c
    $ ./a.out
    pi = 3.14159

Compile & run the code

    $ gfortran compute_pi_f90_seq.f90
    $ ./a.out
    pi = 3.14159

- Directory: Example_02
- Files: compute_pi_c_seq.c; compute_pi_f90_seq.f90


OpenMP version: compute π (3.14)

Files: Example_02/compute_pi_c_omp-template.c; Example_02/compute_pi_f90_omp-template.f90

To do:

- Add the compiler directives to create the OpenMP version:

  - C/C++: #pragma omp parallel { ... }
  - Fortran: !$omp parallel ... !$omp end parallel

- Include the header <omp.h> in C/C++; use omp_lib in Fortran.
- Variables:
  - default(none), shared(), private()
  - Optionally: omp_get_wtime()

    $ gcc -fopenmp compute_pi_c_omp-template.c
    $ gfortran -fopenmp compute_pi_f90_omp-template.f90

Change the program and compile


Race condition and false sharing

    #pragma omp parallel default(none) private(i) shared(x,sum)
    {
      for (i = 0; i < nb_steps; i++) {
        x = (i + 0.5) * step;
        sum += 1.0/(1.0 + x * x);   /* all threads update shared x and sum: race */
      }
    }
    pi = 4.0 * sum * step;

C/C++

    !$omp parallel default(none) private(i) shared(x,sum)
    do i = 0, nb_steps - 1
      x = (i + 0.5) * step
      sum = sum + 1.0/(1.0 + x * x)
    end do
    !$omp end parallel
    pi = 4.0 * sum * step

Fortran

Files: Example_02/compute_pi_c_omp_race.c; Example_02/compute_pi_f90_omp_race.f90

    $ gcc -fopenmp compute_pi_c_omp_race.c
    $ gfortran -fopenmp compute_pi_f90_omp_race.f90

Compile and run the code


Race Condition in OpenMP

    $ ./a.out
    The value of pi is [ 9.09984 ]; Computed using [ 20000000 ] steps in [ 9.280 ] s.
    $ ./a.out
    The value of pi is [ 11.22387 ]; Computed using [ 20000000 ] steps in [ 11.020 ] s.
    $ ./a.out
    The value of pi is [ 5.90962 ]; Computed using [ 20000000 ] steps in [ 5.640 ] s.
    $ ./a.out
    The value of pi is [ 8.89411 ]; Computed using [ 20000000 ] steps in [ 8.940 ] s.
    $ ./a.out
    The value of pi is [ 10.94186 ]; Computed using [ 20000000 ] steps in [ 10.870 ] s.
    $ ./a.out
    The value of pi is [ 10.89870 ]; Computed using [ 20000000 ] steps in [ 11.030 ] s.

Run the program

Files: compute_pi_c_omp_race.c; compute_pi_f90_omp_race.f90

Compile & run the programs. How to solve this problem?

Wrong answer & slower than serial program


SPMD: Single Program Multiple Data

SPMD:

- a technique to achieve parallelism.
- each thread receives and executes a copy of the same program.
- each thread executes that copy as a function of its ID.

    #pragma omp parallel
    {
      for (i = 0; i < n; i++) {
        computation[i];
      }
    }

C/C++

    #pragma omp parallel
    {
      int numthreads = omp_get_num_threads();
      int ID = omp_get_thread_num();
      for (i = 0 + ID; i < n; i += numthreads) {
        computation[i][ID];
      }
    }

SPMD

Cyclic distribution (example with 3 threads):

- Thread 0: 0, 3, 6, 9, ...
- Thread 1: 1, 4, 7, 10, ...
- Thread 2: 2, 5, 8, 11, ...
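A runnable sketch (illustrative, with a hypothetical n = 12) that prints the cyclic distribution above:

    #include <omp.h>
    #include <stdio.h>

    int main() {
        int n = 12;
        #pragma omp parallel
        {
            int nthrds = omp_get_num_threads();
            int ID = omp_get_thread_num();
            /* round-robin: thread ID handles ID, ID+nthrds, ID+2*nthrds, ... */
            for (int i = ID; i < n; i += nthrds)
                printf("thread %d handles i = %d\n", ID, i);
        }
        return 0;
    }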


SPMD: Single Program Multiple Data

Files: Example_03/compute_pi_c_spmd-template.c; Example_03/compute_pi_f90_spmd-template.f90

- Add the compiler directives to create the OpenMP version:

  - C/C++: #pragma omp parallel { ... }
  - Fortran: !$omp parallel ... !$omp end parallel

- Include the header <omp.h> in C/C++; use omp_lib in Fortran.
- Promote the variable sum to an array: each thread computes a partial sum as a function of its ID; then compute a global sum.
- Compile and run the program.


SPMD: Single Program Multiple Data

    #pragma omp parallel
    {
      int nthreads = omp_get_num_threads();
      int ID = omp_get_thread_num();
      sum[ID] = 0.0;
      for (i = 0 + ID; i < nb_steps; i += nthreads) {
        x = (i + 0.5) * step;
        sum[ID] = sum[ID] + 1.0/(1.0 + x * x);
      }
    }
    compute_tot_sum();   /* tot_sum = sum of sum[i], i = 1 to nthreads */
    pi = 4.0 * tot_sum * step;

C/C++

    !$omp parallel private(ID, nthreads, i, x)
    nthreads = omp_get_num_threads()
    ID = omp_get_thread_num()
    sum(ID) = 0.0
    do i = 1 + ID, nb_steps, nthreads
      x = (i + 0.5) * step
      sum(ID) = sum(ID) + 1.0/(1.0 + x * x)
    end do
    !$omp end parallel
    call compute_tot_sum   ! tot_sum = sum of sum(i), i = 1 to nthreads
    pi = 4.0 * tot_sum * step

Fortran

Files: Example_03/compute_pi_c_spmd_simple.c; Example_03/compute_pi_f90_spmd_simple.f90

Compile and run the code: the answer is correct, but it is much slower than the serial version.


SPMD: Single Program Multiple Data

    $ ./a.out
    The value of pi is [ 3.14159 ]; Computed using [ 20000000 ] steps in [ 0.4230 ] seconds
    The value of pi is [ 3.14166 ]; Computed using [ 20000000 ] steps in [ 1.2590 ] seconds
    The value of pi is [ 3.14088 ]; Computed using [ 20000000 ] steps in [ 1.2110 ] seconds
    The value of pi is [ 3.14206 ]; Computed using [ 20000000 ] steps in [ 1.9470 ] seconds

Execute the program:

- The answer is correct.
- Slower than the serial program.

How to speed up the execution of the pi program?

- Synchronization.
- Control how the variables are shared to avoid race conditions.
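Part of the slowdown of the array-based SPMD version is false sharing: the per-thread elements of sum share a cache line, so every update invalidates the other threads' copies. One standard remedy is to pad the array so each accumulator gets its own cache line; a sketch follows (the 64-byte cache line, i.e. PAD = 8 doubles, and MAX_THREADS = 16 are assumptions):

    #include <omp.h>
    #include <stdio.h>

    #define PAD          8          /* assumed 64-byte cache line / 8 doubles */
    #define MAX_THREADS 16          /* assumed upper bound on the team size   */
    #define NB_STEPS    20000000L

    double sum[MAX_THREADS][PAD];   /* each thread only touches sum[ID][0] */

    int main() {
        double step = 1.0 / (double) NB_STEPS, pi = 0.0;
        int nthreads = 1;

        #pragma omp parallel
        {
            int ID = omp_get_thread_num();
            int nthrds = omp_get_num_threads();
            double x;
            long i;
            if (ID == 0) nthreads = nthrds;
            sum[ID][0] = 0.0;
            for (i = ID; i < NB_STEPS; i += nthrds) {  /* cyclic distribution */
                x = (i + 0.5) * step;
                sum[ID][0] += 4.0 / (1.0 + x * x);
            }
        }
        for (int i = 0; i < nthreads; i++)
            pi += sum[i][0] * step;
        printf("pi = %.5f\n", pi);
        return 0;
    }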


Synchronization in OpenMP

Synchronization: bringing one or more threads to a well-defined point in their execution.

- Barrier: each thread waits at the barrier until all threads arrive.
- Mutual exclusion: only one thread at a time can execute.

High level constructs:

- critical
- atomic
- barrier
- ordered

Low level constructs:

- flush
- locks: simple and nested

Synchronization:

- can reduce performance.
- causes overhead and can cost a lot.
- more barriers will serialize the program.
- use it only when needed.


Synchronization: barrier

    #pragma omp parallel
    {
      int ID = omp_get_thread_num();
      A[ID] = Big_A_Computation(ID);
      #pragma omp barrier
      A[ID] = Big_B_Computation(A, ID);
    }

C/C++

    !$omp parallel private(ID)
    ID = omp_get_thread_num()
    A(ID) = Big_A_Computation(ID)
    !$omp barrier
    A(ID) = Big_B_Computation(A, ID)
    !$omp end parallel

Fortran

- Barrier: each thread waits at the barrier until all threads arrive.


Synchronization: critical

    #pragma omp parallel
    {
      float B;
      int i, id, nthrds;
      id = omp_get_thread_num();
      nthrds = omp_get_num_threads();
      for (i = id; i < niters; i += nthrds) {
        B = big_calc_job(i);
        #pragma omp critical
        res += consume(B);
      }
    }

C/C++

    real(8) :: B
    integer :: i, id, nthrds
    !$omp parallel private(B, i, id, nthrds)
    id = omp_get_thread_num()
    nthrds = omp_get_num_threads()
    do i = id, niters, nthrds
      B = big_calc_job(i)
      !$omp critical
      res = res + consume(B)
      !$omp end critical
    end do
    !$omp end parallel

Fortran

Mutual exclusion:

- critical: only one thread at a time can enter the critical region (the call to consume()).


Synchronization: atomic construct

    #pragma omp parallel
    {
      double tmp, B;
      B = DOIT();
      tmp = big_calculation(B);
      #pragma omp atomic
      X += tmp;
    }

C/C++

    real(8) :: tmp, B
    !$omp parallel private(tmp, B)
    B = DOIT()
    tmp = big_calculation(B)
    !$omp atomic
    X = X + tmp
    !$omp end parallel

Fortran

Synchronization: atomic (basic form)

- atomic provides mutual exclusion, but it applies only to the update of a single memory location: here, the update of the variable X.


Reduction construct

- Aggregating values from different threads is such a common operation that OpenMP has a special reduction clause.
  - Similar to private and shared.
  - Reduction variables support several types of operations: +, -, *

- Syntax of the reduction clause: reduction (op : list)

- Inside a parallel or a work-sharing construct:
  - A local copy of each variable in the list is made and initialized depending on the "op" (e.g. 0 for "+", 0 for "-", 1 for "*").
  - Updates occur on the local copy.
  - Local copies are reduced into a single value and combined with the original global value.
  - The variables in "list" must be shared in the enclosing parallel region.


Example of reduction in OpenMP

    #define MAX 10000
    double ave = 0.0, A[MAX];
    int i;
    #pragma omp parallel for reduction(+:ave)
    for (i = 0; i < MAX; i++) {
      ave += A[i];
    }
    ave = ave / MAX;

C/C++

    integer, parameter :: MAX = 10000
    real(8) :: ave = 0.0
    real(8) :: A(MAX)
    integer :: i
    !$omp parallel do reduction(+:ave)
    do i = 1, MAX
      ave = ave + A(i)
    end do
    !$omp end parallel do
    ave = ave / MAX

Fortran

- The variable ave is initialized outside the parallel region.
- Inside the parallel region:
  - Each thread has its own copy, initializes it, and updates it.
  - At the end, all the local copies are reduced into the final result.


Critical and reduction

- Start from the sequential version of the pi program, then add the compiler directives to create the OpenMP version:
  - C/C++: #pragma omp parallel { ... }
  - Fortran: !$omp parallel ... !$omp end parallel
  - Include the header <omp.h> in C/C++; use omp_lib in Fortran.
- Use the SPMD pattern with a critical construct in one version and a reduction clause in the second (a sketch of both patterns follows below).
- Compile and run the programs.

Files: Example_04/

- C/C++: compute_pi_c_omp_critical-template.c; compute_pi_c_omp_reduction-template.c
- F90: compute_pi_f90_omp_critical-template.f90; compute_pi_f90_omp_reduction-template.f90
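A sketch of what the two finished versions might look like (variable names follow the earlier slides; the exact template contents are not reproduced here):

    #include <omp.h>
    #include <stdio.h>

    #define NB_STEPS 20000000L

    /* SPMD + critical: each thread accumulates a private partial sum,
       then adds it to the shared total one thread at a time. */
    double pi_critical(void) {
        double step = 1.0 / (double) NB_STEPS, sum = 0.0;
        #pragma omp parallel
        {
            int ID = omp_get_thread_num();
            int nthrds = omp_get_num_threads();
            double x, partial = 0.0;
            long i;
            for (i = ID; i < NB_STEPS; i += nthrds) {
                x = (i + 0.5) * step;
                partial += 1.0 / (1.0 + x * x);
            }
            #pragma omp critical
            sum += partial;
        }
        return 4.0 * sum * step;
    }

    /* reduction clause: OpenMP manages the private copies of sum and
       combines them with "+" at the end of the loop. */
    double pi_reduction(void) {
        double step = 1.0 / (double) NB_STEPS, sum = 0.0, x;
        long i;
        #pragma omp parallel for private(x) reduction(+:sum)
        for (i = 0; i < NB_STEPS; i++) {
            x = (i + 0.5) * step;
            sum += 1.0 / (1.0 + x * x);
        }
        return 4.0 * sum * step;
    }

    int main() {
        printf("critical : pi = %.5f\n", pi_critical());
        printf("reduction: pi = %.5f\n", pi_reduction());
        return 0;
    }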


Critical and reduction

    $ ./a.out
    The Number of Threads = 1
    The value of pi is [ 3.14159 ]; Computed using [ 20000000 ] steps in [ 0.40600 ] seconds
    The Number of Threads = 2
    The value of pi is [ 3.14159 ]; Computed using [ 20000000 ] steps in [ 0.20320 ] seconds
    The Number of Threads = 3
    The value of pi is [ 3.14159 ]; Computed using [ 20000000 ] steps in [ 0.13837 ] seconds
    The Number of Threads = 4
    The value of pi is [ 3.14159 ]; Computed using [ 20000000 ] steps in [ 0.10391 ] seconds

Example of output

Results:

- Correct results.
- The program runs faster (about 4 times faster using 4 cores).


Summary

OpenMP:

- Create threads:
  - C/C++: #pragma omp parallel { ... }
  - Fortran: !$omp parallel ... !$omp end parallel
- Work sharing: loops and sections.
- Variables: default(none), private(), shared()
- Environment variables and runtime library:
  - omp_set_num_threads()
  - omp_get_num_threads()
  - omp_get_thread_num()
  - omp_get_wtime()
- A few OpenMP constructs:
  - single construct
  - barrier construct
  - atomic construct
  - critical construct
  - reduction clause

For more advanced runtime library clauses and constructs, visit: http://www.openmp.org/specifications/


Concluding remarks

OpenMP API:

- Simple parallel programming for shared memory machines.
- Speeds up execution (but is not very scalable).
- Compiler directives, runtime library, environment variables.

Take a serial code, add the compiler directives, and test:

- Define concurrent regions that can run in parallel.
- Add compiler directives and runtime library calls.
- Control how the variables are shared.
- Avoid false sharing and race conditions by adding synchronization clauses (choose the right ones).
- Test the program and compare it to the serial version.
- Test the scalability of the program as a function of the number of threads.


More readings

- OpenMP: http://www.openmp.org/
- Compute Canada Wiki: https://docs.computecanada.ca/wiki/OpenMP
- Reference cards: http://www.openmp.org/specifications/
- OpenMP Wiki: https://en.wikipedia.org/wiki/OpenMP
- Examples: http://www.openmp.org/updates/openmp-examples-4-5-published/
- Contact: support@westgrid.ca
- WestGrid events: https://www.westgrid.ca/events


Thank you
