Shared Memory Parallel Programming Abhishek Somani, Debdeep Mukhopadhyay Mentor Graphics, IIT Kharagpur August 5, 2016 Abhishek, Debdeep (IIT Kgp) Parallel Programming August 5, 2016 1 / 49

Overview
1 Introduction
2 Programming with pthreads
3 Programming with OpenMP

Programming Model
- CREW (Concurrent Read Exclusive Write) PRAM (Parallel Random Access Machine)
- Shared Memory Address Space

Requirements for Shared Address Programming
- Concurrency: constructs to allow executing parallel streams of instructions
- Synchronization: constructs to ensure program correctness
  - Mutual exclusion for shared variables
  - Barriers
- Software portability: across architectural platforms and number of processors
- Scheduling and load balance: efficiency
- Ease of programming: OpenMP versus pthreads

Fork-Join Mechanism
Figure: courtesy of Victor Eijkhout
- Threads are dynamic
- Master thread is always active
- Other threads are created by thread spawning
- Threads share data

process and thread

process                                      | thread
separate address space                       | shared address space
heavyweight; context switching is expensive  | lightweight; hyperthreading support in modern hardware
can consist of multiple threads              | belongs to a process
independent of other processes               | all threads of a process are interdependent
not very different from serial programming   | requires careful programming for correctness and efficiency

POSIX threads or pthreads
    // Necessary header
    #include <pthread.h>

    // Function to be called by each thread
    void * thread_function(void * arg);

    // Start thread
    int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                       void *(*thread_function) (void *), void *arg);

    // Wait for thread to finish (join) and collect its return value
    int pthread_join(pthread_t thread, void **retval);

pthread example 1
    #include <stdlib.h>
    #include <stdio.h>
    #include <pthread.h>

    int sum=0; //Global variable touched by all threads

    //Function to be called by each thread
    void * adder(void *arg)
    {
        (void)arg; //unused
        sum = sum+1; //unsynchronized update of shared data
        return NULL;
    }

    int main()
    {
        const int numThreads=24;
        int i;
        pthread_t threads[numThreads];
        for (i=0; i<numThreads; i++) //Start threads
            if (pthread_create(threads+i, NULL, adder, NULL) != 0)
                return i+1;
        for (i=0; i<numThreads; i++) //Wait for threads to finish
            if (pthread_join(threads[i], NULL) != 0)
                return numThreads+i+1;
        printf("Sum computed: %d\n",sum);
        return 0;
    }

pthread example 1 while sleeping
    //Needs #include <unistd.h> for sleep()
    //Function to be called by each thread
    void * adder(void *arg)
    {
        (void)arg; //unused
        int t = sum; //every thread is likely to read the same old value
        sleep(1);
        sum = t + 1; //later writes overwrite earlier ones
        return NULL;
    }

pthread example 1 while sleeping ...
    //Needs #include <unistd.h> for sleep()
    //Function to be called by each thread
    void * adder(void *arg)
    {
        (void)arg; //unused
        sleep(1);
        sum = sum + 1; //still an unsynchronized read-modify-write
        return NULL;
    }

Critical Region

Lock and Key
Mutual Exclusion Locks ⇔ mutex locks
    //The Lock
    int pthread_mutex_lock (pthread_mutex_t *mutex_lock);

    //The Key
    int pthread_mutex_unlock (pthread_mutex_t *mutex_lock);

    //Initialization of Lock
    int pthread_mutex_init (pthread_mutex_t *mutex_lock,
                            const pthread_mutexattr_t *lock_attr);

pthread example 1 with locks
    #include <stdlib.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <pthread.h>

    int sum=0; //Global variable touched by all threads
    pthread_mutex_t lock; //Mutex lock

    //Function to be called by each thread
    void * adder(void *arg)
    {
        (void)arg; //unused
        pthread_mutex_lock(&lock);
        int t = sum;
        sleep(1); //lock is held while sleeping, so threads run one at a time
        sum = t + 1;
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main()
    {
        const int numThreads=24;
        int i;
        pthread_mutex_init(&lock, NULL);
        pthread_t threads[numThreads];
        for (i=0; i<numThreads; i++) //Start threads
            if (pthread_create(threads+i, NULL, adder, NULL) != 0)
                return i+1;
        for (i=0; i<numThreads; i++) //Wait for threads to finish
            if (pthread_join(threads[i], NULL) != 0)
                return numThreads+i+1;
        printf("Sum computed: %d\n",sum);
        return 0;
    }

Producer-Consumer work queues
    pthread_mutex_t task_queue_lock; // Initialized in main
    int task_available; //Initialized to 0 in main

producer
    while (!done()) {
        inserted = 0;
        create_task(&my_task);
        while (inserted == 0) {
            pthread_mutex_lock(&task_queue_lock);
            if (task_available == 0) {
                insert_into_queue(my_task);
                task_available = 1;
                inserted = 1;
            }
            pthread_mutex_unlock(&task_queue_lock);
        }
    }

consumer
    while (!done()) {
        extracted = 0;
        while (extracted == 0) {
            pthread_mutex_lock(&task_queue_lock);
            if (task_available == 1) {
                extract_from_queue(&my_task);
                task_available = 0;
                extracted = 1;
            }
            pthread_mutex_unlock(&task_queue_lock);
        }
        process_task(my_task);
    }

Mutex Efficiency
- pthread_mutex_trylock
  - Returns immediately (with EBUSY) instead of blocking when the mutex is already locked, so it is typically faster than pthread_mutex_lock
  - Allows the thread to do other work if the mutex is already locked
- Condition Variables
  - Allow a thread to block itself until a pre-specified condition is satisfied
  - A thread performing a condition wait does not use any CPU cycles
- Read-Write Locks
  - Useful when reads on a data structure are more frequent than writes
  - Multiple simultaneous reads can be allowed, but only one write

Types of mutexes
    //Initialization of Mutex Attribute
    int pthread_mutexattr_init (pthread_mutexattr_t *attr);

    //Set type of Mutex
    int pthread_mutexattr_settype (pthread_mutexattr_t *attr, int type);
- PTHREAD_MUTEX_NORMAL: the default on many systems; deadlocks if the owning thread tries a second lock
- PTHREAD_MUTEX_RECURSIVE: allows the owning thread to lock multiple times (it must unlock as many times)
- PTHREAD_MUTEX_ERRORCHECK: reports an error if the owning thread tries a second lock
- These are the POSIX spellings; older, non-portable code uses names with an _np / _NP suffix (for example PTHREAD_MUTEX_RECURSIVE_NP)

Barriers
- Can be implemented using a counter, a mutex and a condition variable
- Threads wait at the barrier till all threads have reached it
- The last thread to reach the barrier wakes up all the other threads

Famous words
- A good way to stay flexible is to write less code – Pragmatic Programmer
- Simplicity is prerequisite for reliability – Dijkstra
- Any fool can write code that a computer can understand. Good programmers write code that humans can understand – Martin Fowler
- Programming can be fun, so can be cryptography; however they should not be combined – Kreitzberg and Shneiderman
- KISS - Keep It Simple, Stupid – Anonymous

OpenMP Example 1
    #include <stdlib.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <omp.h>

    int sum=0; //Global variable touched by all threads

    //Function to be called by each thread
    void adder(void)
    {
        #pragma omp critical
        {
            int t = sum;
            sleep(1);
            sum = t + 1;
        }
        return;
    }

    int main()
    {
        const int numThreads=24;
        int i;
        omp_set_num_threads(numThreads);
        #pragma omp parallel for shared(sum)
        for(i = 0; i < numThreads; ++i)
            adder();
        printf("Sum computed: %d\n",sum);
        return 0;
    }

OpenMP Programming in C/C++
- Based on the #pragma compiler directive
  - Code is added by the compiler, NOT the preprocessor
- Directive name followed by clauses
      #pragma omp directive [clause list]
      #pragma omp parallel [clause list]
- Serial execution till a parallel directive is encountered

OpenMP clauses
- Conditional Parallelization
      bool doParallel = true;
      #pragma omp parallel if(doParallel)
- Degree of Concurrency
      #pragma omp parallel num_threads(8)
- Data Handling
      #pragma omp parallel default(none) private(x) shared(y)
      #pragma omp parallel for private(x) lastprivate(y)
      #pragma omp parallel default(shared) firstprivate(x)
  - lastprivate is accepted on work-sharing constructs such as for, not on a bare parallel directive
