Introduction to Parallel Programming - Kathy Traxler - PowerPoint PPT Presentation



High Performance Computing @ Louisiana State University - http://www.hpc.lsu.edu/ - Information Technology Services
LONI High Performance Computing Workshop - Louisiana Tech University, October 11 & 12, 2007

Introduction to Parallel Programming

Kathy Traxler ktraxler@lsu.edu


Goals for this Workshop

  • To familiarize you with the LONI HPC staff
  • To familiarize you with HPC terminology
  • To give you a basis for the learning you must do to write good parallel code


Goals for this presentation

  • Introduce you to some basic terminology
  • Introduce you to basic parallel concepts
  • Make you a little more comfortable with the technical presentations coming up


Outline

Introduce you to some basic terminology

  • Sequential Programming
  • Parallel Computing
  • Why Parallel Computing
  • Limits of Parallel Computing
  • Programming Parallel Computers
  • Why Parallel Computers

Outline (cont’d)

  • Limits of Parallel Computers
  • Taxonomy
  • Shared and Distributed Memory
  • Parallel Programming Paradigms

Sequential Programming

Traditionally, in computer science, software has been written for serial computation:
  • A single CPU is available
  • The problem is broken down into a series of discrete instructions
  • Each instruction is executed one after another
  • Only one instruction may execute at a time
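
As a minimal illustration (a hypothetical example, not from the slides), a serial sum over an array runs as a single thread of execution, one instruction at a time:

```c
#include <stdio.h>

#define N 1000

int main(void) {
    double a[N], sum = 0.0;

    /* A single thread of execution: each statement completes before
       the next one begins. */
    for (int i = 0; i < N; i++)
        a[i] = (double)i;
    for (int i = 0; i < N; i++)
        sum += a[i];          /* instructions execute one after another */

    printf("sum = %f\n", sum);
    return 0;
}
```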



Parallel Programming

Defined:

Parallel computing is the simultaneous execution of the same task (split up and specially adapted) on multiple processors in order to obtain results faster. The idea is based on the fact that the process of solving a problem usually can be divided into smaller tasks, which may be carried out simultaneously with some coordination. (From http://en.wikipedia.org/wiki/Parallel_computing)

It is a strategy for performing large, complex tasks faster. A large task can either be performed serially, one step following another, or can be decomposed into smaller tasks to be performed simultaneously, i.e., in parallel.
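
To make the decomposition idea concrete, here is a hypothetical sketch (not from the slides): the serial sum from the previous slide split into partial sums over independent chunks. The chunks do not depend on each other, so they could be carried out simultaneously, with one coordination step at the end:

```c
#include <stdio.h>

#define N 1000
#define NTASKS 4   /* assume NTASKS divides N evenly */

int main(void) {
    double a[N], partial[NTASKS], sum = 0.0;

    for (int i = 0; i < N; i++)
        a[i] = (double)i;

    /* Decomposition: each task owns one contiguous chunk. Written
       serially here, but the chunks are independent, so they could
       run simultaneously on different processors. */
    for (int t = 0; t < NTASKS; t++) {
        partial[t] = 0.0;
        for (int i = t * (N / NTASKS); i < (t + 1) * (N / NTASKS); i++)
            partial[t] += a[i];
    }

    /* Coordination: combine the smaller tasks' results. */
    for (int t = 0; t < NTASKS; t++)
        sum += partial[t];

    printf("sum = %f\n", sum);
    return 0;
}
```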


Planes, trains and automobiles?

Example: in a manufacturing plant, the components of the final product are built in parallel.

If the plane is the final result:
  • You define the tasks needed to build a plane
  • Farm them out to different vendors and have them built
  • When they all arrive at the plant, the multiple tasks' final products are assembled into the airplane
This is what parallelism is, regardless of the discipline.



Why Parallel Computing

Many classes of problems won't finish executing in a reasonable amount of time on a single-CPU system:
  • Simulation and modeling
  • Problems dependent on computations/manipulations of large amounts of data
  • Grand Challenge Problems

A grand challenge problem is a general category of unsolved problems.

Why Parallel Computing

Benefits:

  • Ability to achieve performance and work on problems impossible with traditional computers
  • Exploit “off the shelf” processors, memory, disks and tape systems
  • Ability to scale to the problem
  • Ability to quickly integrate new elements into systems
  • Commonly much cheaper



Limits of Parallel Computing

Theoretical upper limits:
  • Amdahl's Law
Practical limits:
  • Load balancing
  • Non-computational sections
Other considerations:
  • Time to re-write code


Theoretical Limits

All parallel programs contain:
  • parallel sections (we hope!)
  • serial sections (unfortunately)

Serial sections limit the parallel sections' effectiveness. Amdahl's Law states this formally.


Amdahl’s Law

Amdahl's law is a model for the expected speedup of a parallelized implementation of an algorithm relative to the serial algorithm. For example, if a parallelized implementation can run 12% of the algorithm's operations arbitrarily fast (while the remaining 88% of the operations are not parallelizable), Amdahl's law states that the maximum speedup of the parallelized version is 1 / (1 - 0.12) = 1.136 times the non-parallelized implementation.

More technically, the law concerns the speedup achievable from an improvement to a computation that affects a proportion P of that computation, where the improvement has a speedup of S. (For example, if an improvement can speed up 30% of the computation, P will be 0.3; if the improvement makes the portion affected twice as fast, S will be 2.) Amdahl's law states that the overall speedup of applying the improvement will be:

Speedup = 1 / ((1 - P) + P/S)
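
As a quick check of the formula (a small C sketch, not part of the original slides), the two examples above can be computed directly:

```c
#include <stdio.h>

/* Amdahl's law: overall speedup when a proportion p of the computation
   is improved by a factor s. */
double amdahl(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}

int main(void) {
    /* 12% of the operations made arbitrarily fast (s very large). */
    printf("P = 0.12, S -> inf: %.3f\n", amdahl(0.12, 1e12)); /* ~1.136 */

    /* 30% of the computation made twice as fast. */
    printf("P = 0.30, S = 2:    %.3f\n", amdahl(0.30, 2.0));  /* ~1.176 */
    return 0;
}
```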


Amdahl’s Law

[Figure: speedup S versus number of processors (up to 250), for parallel fractions fp = 1.000, 0.999, 0.990, and 0.900]

Only a small amount of serial content in a program can degrade the parallel performance.


Practical Limits: Amdahl vs Reality

Amdahl’s Law provides a theoretical upper limit on a parallel speedup assuming that there are no costs for communications. In reality, communications will result in a further degradation of performance.

The more processors you have, the more degradation your computations will see, from:
  • Load balancing (waiting)
  • Scheduling (shared processors)
  • I/O
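
One way to see this effect (an illustrative model, an assumption of this write-up rather than anything in the slides) is to add a per-processor communication cost c to Amdahl's formula; past some processor count the overhead term dominates and measured speedup falls:

```c
#include <stdio.h>

/* Illustrative model (assumption, not from the slides): Amdahl's law
   for parallel fraction p on n processors, plus a communication cost
   that grows linearly with n (c is a fraction of serial runtime). */
double speedup(double p, int n, double c) {
    return 1.0 / ((1.0 - p) + p / n + c * n);
}

int main(void) {
    double p = 0.99, c = 0.0005;
    for (int n = 1; n <= 256; n *= 2)
        printf("n = %3d  speedup = %6.2f\n", n, speedup(p, n, c));
    /* Speedup rises, peaks, then degrades as communication dominates. */
    return 0;
}
```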


Other Considerations

Writing effective parallel applications is difficult:
  • Load balancing
  • Communications
  • Serial time can dominate
Is it worth your TIME to rewrite your application?


Flynn's Taxonomy

  • SISD: Single Instruction, Single Data
  • SIMD: Single Instruction, Multiple Data
  • MISD: Multiple Instruction, Single Data
  • MIMD: Multiple Instruction, Multiple Data


Taxonomy

The most common way of classifying HPC machines heard here at LSU is as either:
  • Shared memory
  • Distributed memory


Shared and Distributed Memory

Shared memory: single address space. All processors have access to a pool of shared memory. (Examples: Cray SV1, IBM Power4 node.) Methods of memory access:
  • Bus
  • Crossbar

Distributed memory: each processor has its own local memory. Message passing must be used to exchange data between processors. (Examples: clusters, Cray T3E.) Methods of memory access:
  • Various topological interconnects

[Diagrams: a shared-memory system, processors P sharing one Memory over a Bus; and a distributed-memory system, processor/memory (P/M) pairs connected by a Network]


Programming Parallel Computers

Programming single-processor systems is (relatively) easy because there is a single thread of execution and a single address space.
Programming shared memory systems can benefit from the single address space.
Programming distributed memory systems is the most difficult, due to multiple address spaces and the need to access remote data.


Parallel Programming Paradigms

There are many methods of programming parallel computers:
  • Message Passing: the user makes calls to libraries to explicitly share data between processors
  • Data Parallel: data partitioning determines parallelism
  • Remote Memory Operation: a set of processes in which a process can access the memory of another process without its participation (a sketch follows below)
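
The slides name remote memory operation but show no code; as a hedged sketch, MPI-2's one-sided communication is one instance of the paradigm. Rank 0 writes directly into rank 1's exposed memory window, and rank 1 makes no matching call:

```c
/* Remote-memory-operation sketch using MPI-2 one-sided communication
   (an illustration; the slides name the paradigm, not this API).
   Run with at least two processes: mpirun -np 2 ./a.out */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, buf = 0;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Every process exposes 'buf' as a window others may access. */
    MPI_Win_create(&buf, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);                /* open an access epoch */
    if (rank == 0) {
        int value = 42;
        /* Write into rank 1's window without its participation. */
        MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);                /* close the epoch */

    if (rank == 1)
        printf("rank 1 now holds %d\n", buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```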


Parallel Programming Paradigms

More methods of programming parallel computers:
  • Threads: a single process having multiple (concurrent) execution paths (a sketch follows below)
  • Combined Models: composed of two or more of the above
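
A minimal sketch of the threads model (hypothetical; POSIX threads is used here, though the slides do not prescribe a library): one process starts several concurrent execution paths that all share its address space:

```c
/* Threads-paradigm sketch (assumption: POSIX threads).
   Compile with: cc -pthread threads.c */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N 1000

double a[N];               /* shared: every thread sees the same array */
double partial[NTHREADS];  /* each thread writes only its own slot     */

void *worker(void *arg) {
    long t = (long)arg;
    partial[t] = 0.0;
    for (int i = t * (N / NTHREADS); i < (t + 1) * (N / NTHREADS); i++)
        partial[t] += a[i];
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    double sum = 0.0;

    for (int i = 0; i < N; i++)
        a[i] = (double)i;

    /* One process, multiple concurrent execution paths. */
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, worker, (void *)t);
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        sum += partial[t];
    }

    printf("sum = %f\n", sum);
    return 0;
}
```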


Parallel Programming Paradigms

The two most common paradigms are:
  • Message passing
  • Data parallel


Message Passing Paradigm

Message Passing Model:
  • A set of processes using only local memory
  • Processes communicate by sending and receiving messages
  • Data transfer requires cooperative operations to be performed by each process (a send operation must have a matching receive)
  • The programmer must link and make calls to libraries which manage the data exchange between processors
  • MPI is the main instance of this paradigm used on our machines
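
A minimal sketch of a matched send/receive pair (a hypothetical example; the slides name MPI but include no code):

```c
/* Message-passing sketch in MPI: rank 0 sends, rank 1 posts the
   matching receive. Run with at least two processes:
   mpirun -np 2 ./a.out */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* Cooperative transfer: this send needs a matching receive. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```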


Data Parallel Paradigm

  • Most of the parallel work focuses on performing operations on a data set
  • The data set is usually a structure like an array or cube
  • A set of tasks work collectively on the same data structure; however, each task works on a different partition of that data structure
  • Tasks perform the same operation on their partition of the data (see the sketch below)
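
As one illustration (an assumption of this write-up: the slides do not name a tool, and OpenMP is just one common way to express data parallelism in C), every thread applies the same operation to its own partition of a single shared array:

```c
/* Data-parallel sketch (assumption: OpenMP). Compile with -fopenmp.
   The runtime partitions the iterations; each thread performs the
   same operation on its own slice of the array. */
#include <stdio.h>

#define N 1000000

static double a[N];

int main(void) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * i;

    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}
```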


Shared Memory vs. Distributed Memory

Here at LSU we use MPI for both types of memory systems. The most natural way to program any machine is to use tools and languages which express the algorithm explicitly for the architecture!


References

  • The entire user support and training group at TACC
  • http://www.llnl.gov/computing/tutorials/parallel_comp
  • http://www.mhpcc.edu/training/workshop/parallel_intro
  • Wilkinson, Barry, and Allen, Michael. Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers