Explicit vs. Implicit Parallel Programming: Language, Directive, Library

Explicit vs. Implicit Parallel Programming Language, Directive, Library  Expose, Express, Exploit parallelism, synchronization, locality  instruction-level parallelism (warm-up)  superscalar control unit  exposed in instruction reorder unit  expressed using register renaming  exploited in multiple instruction issue/execute/retire  VLIW control unit  exposed by compiler (unrolling, scheduling)  expressed in VLIW instructions  exploited by parallel operation issue  locality in register file  synchronization managed by reorder unit or by stalling for( i = 0; i < n; ++i ) a[i] = b[i+1] + c[i+2];

Explicit vs. Implicit Parallel Programming Language, Directive, Library  Expose, Express, Exploit parallelism; synchronization, locality  vector parallelism (warm-up 2)  vector language extensions  exposed by application programmer  expressed in language extensions; remember Q8 functions?  exploited by parallel/pipelined functional units a(1;n) = b(2;n) + c(3;n)  vectorizing compilers  exposed by application programmer (and compiler?)  expressed in vectorizable loops  exploited by parallel/pipelined functional units  locality in vector register file, if available  synchronization managed by hardware or compiler do i = 1,n ; a(i) = b(i+1) + c(i+2) ; enddo

Scalable Parallelism – Node Level  MPI exposed in SPMD model static parallelism   can decompose based on MPI rank expressed in single program  (redundant execution) send/receive exposes locality  exploited one MPI rank per core sync implicit with data transfer    CAF (PGAS) exposed in SPMD model static parallelism   can decompose, less general expressed using single program  (redundant execution) get/put exposes locality  one image per core sync, separate from data transfer    HPF exposed in SPDD model static parallelism (data parallel only)   expressed using single program load/store, locality hidden   (implicitly executed redundantly) synchronization mostly implicit  one HPF processor per core  managed by compiler

Shared Memory Parallelism – Socket/Core Level  Posix Threads exposed in application threads dynamic parallelism, SPMD or not   can compose expressed using pthread_create()  shared memory, coherent caches  exploited one thread per core  sync using spin wait, more calls   Cilk exposed in asynchronous procedures dynamic parallelism   can compose expressed using cilk_spawn  shared memory, coherent caches  pool of threads, work stealing  spin wait sync, or barriers   OpenMP expose in parallel loops, tasks static parallelism (mostly)   does support dynamic tasking express with directives  can compose, nested parallelism one OpenMP thread per core  shared memory, coherent caches  barriers, task wait, ordered regions 

Accelerator Parallelism – GPUs, etc.  no library equivalent  CUDA or OpenCL exposed in kernel procedures static parallelism, does not compose   expressed in CUDA kernels sync explicit within thread block   kernel domain, launch sync implicit between kernels  grid parallelism  exposed memory hierarchy  thread block parallelism host, device, sw cache, register accelerator asynchronous with host  PGI Accelerator Model exposed in nested parallel loops  static parallelism, data parallel only  expressed in nested parallel loops,  does not compose accelerator directives limited synchronization  exploited as above  locality managed by compiler 

Abstraction Levels  Library  Node Level  independent of compiler scalable, static parallelism   opaque to compiler emphasis on locality   Language  Socket/Core Level  allows optimization static+dynamic parallelism   requires compiler locality unaddressed  cache coherence   Directives  Accelerators  allows optimization  requires compiler regular parallelism   may preserve portability locality exposed   may allow specialization
