
Multi-Core Computing

Instructor: Hamid Sarbazi-Azad
Department of Computer Engineering, Sharif University of Technology
Fall 2014

Programming Models

Pthreads

• A POSIX standard for threads
• Targets general-purpose multi-core processors
• Implementations are available on many Unix-like, POSIX-conformant operating systems such as FreeBSD, NetBSD, OpenBSD, GNU/Linux, Mac OS X and Solaris
• DR-DOS and Microsoft Windows implementations also exist
• A set of C programming language types, functions and constants
• There are around 100 Pthreads procedures, all prefixed "pthread_"

Multicore Computing, SHARIF U. OF TECHNOLOGY, 2014.

OpenMP

What is OpenMP?

• Open specification for Multi-Processing
• A "standard" API for defining multi-threaded shared-memory programs
• openmp.org: talks, examples, forums, etc.
• A high-level API consisting of:
  • Preprocessor (compiler) directives (~80%)
  • Library calls (~19%)
  • Environment variables (~1%)

A Programmer’s View of OpenMP

OpenMP is a portable, threaded, shared-memory programming specification with "light" syntax.

OpenMP will:
• Allow a programmer to separate a program into serial regions and parallel regions
• Provide synchronization constructs

OpenMP will not:
• Parallelize code automatically
• Guarantee speedup
• Provide freedom from data races


Motivation

Thread libraries are hard to use:
• Pthreads/Solaris threads have many library calls for initialization, synchronization, thread creation, condition variables, etc.
• The programmer must code with multiple threads in mind
• Synchronization between threads introduces a new dimension of program correctness

Wouldn't it be nice to write serial programs and somehow parallelize them "automatically"?
• OpenMP can parallelize many serial programs with relatively few annotations that specify parallelism and independence
• It is not automatic: you can still make errors in your annotations


Motivation (Cont'd)

• Good performance and scalability (if you do it right)
• De-facto standard: an OpenMP program is portable and supported by a large number of compilers
• Requires little programming effort
• Allows the program to be parallelized incrementally
• Maps naturally onto a multicore architecture:
  • Lightweight
  • Each OpenMP thread in the program can be executed by a hardware thread


Fork/Join Parallelism

• Initially, only the master thread is active
• The master thread executes sequential code
• Fork: the master thread creates or awakens additional threads to execute parallel code
• Join: at the end of the parallel code, the created threads die or are suspended

7

Multicore Computing, SHARIF U. OF TECHNOLOGY, 2014.

The OpenMP Execution Model


Programming Models

OpenMP vs. Pthreads

• OpenMP is generally best suited to data-parallel applications with evident loop-level parallelism
• It may be easier to debug and performance-tune than direct application of Pthreads
• OpenMP does use Pthreads when running on Linux systems
• OpenMP's greatest attributes are its portability and the simplicity it brings to parallel programming
• When handling simple loops, OpenMP and Pthreads achieve similar speedups, but OpenMP is much easier to program


OpenMP vs. Pthreads

A critical question when programming with Pthreads: how many threads will be available at run time?
• There are ways of extracting this information from the system at run time and dynamically creating the appropriate number of threads
• This process can be messy and, with Hyper-Threading Technology, error-prone
• OpenMP figures out the correct number of threads and automatically distributes the work


OpenMP vs. Pthreads

• Code containing OpenMP pragmas compiles as single-threaded code if the compiler does not support OpenMP, and as multi-threaded code if it does
• OpenMP is not general enough to be used for all kinds of parallelism; it is best equipped with pragmas for the loop-level parallelism often found in compute-intensive workloads
• Pthreads is universal and can be used for any type of parallelism


Programming Models: OpenMP vs. Pthreads

• Not all loops can be threaded
• OpenMP does not analyze code correctness, so it cannot detect loop-carried dependencies
• OpenMP requires that developers have made their code thread-safe


IBM Cell

IBM Cell/BE SDK

• For the IBM Cell/BE heterogeneous multi-core system architecture: 1 PPE and 8 SPEs
• C/C++ language extensions
• GNU-based C/C++ compiler targeting the SPE and PPE
• Assembly language specification
• IBM XLC C/C++ auto-vectorization (auto-SIMD) for SPE and PPE Multimedia Extension code
• Full System Simulator


CUDA

• Compute Unified Device Architecture
• A parallel computing platform and programming model created by NVIDIA and implemented by the graphics processing units (GPUs) that they produce
• CUDA-accelerated libraries
• Compiler directives
• Extensions to industry-standard programming languages, including C, C++ and Fortran


QUESTIONS?
