Introduction to Parallel Computing George Karypis Basic - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Introduction to Parallel Computing

George Karypis

Basic Communication Operations

slide-2
SLIDE 2

Outline

Importance of Collective Communication Operations

One-to-All Broadcast

All-to-One Reduction

All-to-All Broadcast & Reduction

All-Reduce & Prefix-Sum

Scatter and Gather

All-to-All Personalized

slide-3
SLIDE 3

Collective Communication Operations

They represent regular communication patterns that are performed by parallel algorithms.

Collective: they involve groups of processors.

Used extensively in most data-parallel algorithms.

The parallel efficiency of these algorithms depends on efficient implementation of these operations.

They are equally applicable to distributed and shared address space architectures.

Most parallel libraries provide functions to perform them.

They are extremely useful for “getting started” in parallel processing!

slide-4
SLIDE 4

MPI Names

slide-5
SLIDE 5

One-to-All Broadcast & All-to-One Reduction

slide-6
SLIDE 6

Broadcast on a Ring Algorithm

slide-7
SLIDE 7

Reduction on a Ring Algorithm

slide-8
SLIDE 8

Broadcast on a Mesh

slide-9
SLIDE 9

Broadcast on a Hypercube

slide-10
SLIDE 10

Code for the Broadcast Source: Root
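
The code from this slide did not survive extraction. As a stand-in, here is a minimal sketch of the standard recursive-doubling broadcast from node 0 on a d-dimensional hypercube, simulated sequentially in plain Python rather than with a message-passing library. The function name and the returned schedule are assumptions made for illustration, not the slide's original code.

```python
def hypercube_broadcast(d, message):
    """Simulate a one-to-all broadcast from node 0 on a d-dimensional hypercube.

    Returns (data, schedule): data[i] is what node i holds at the end, and
    schedule is a list of steps, each a list of (sender, receiver) pairs.
    """
    p = 1 << d
    data = [None] * p
    data[0] = message              # only the root holds the message initially
    schedule = []
    mask = p - 1                   # all d bits set
    for i in range(d - 1, -1, -1):
        mask ^= 1 << i             # clear bit i; bits 0..i-1 remain set
        step = []
        for node in range(p):
            # Only nodes whose lower i bits are all zero take part in this step;
            # among them, those with bit i clear already hold the message.
            if node & mask == 0 and node & (1 << i) == 0:
                partner = node ^ (1 << i)
                data[partner] = data[node]
                step.append((node, partner))
        schedule.append(step)
    return data, schedule
```

The number of senders doubles at every step, so all p = 2^d nodes hold the message after d = log p steps.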

slide-11
SLIDE 11

Code for Broadcast Arbitrary Source
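
The code from this slide is also missing. A common way to generalize the root-0 procedure is to relabel every node with a virtual id (its id XOR the source id), so that the source behaves like node 0. The sketch below simulates that idea sequentially; the function name is hypothetical and the approach is an assumption about what the slide showed.

```python
def hypercube_broadcast_from(d, source, message):
    """Simulate a one-to-all broadcast from an arbitrary source on a hypercube.

    Returns (data, steps): data[i] is what node i holds at the end.
    """
    p = 1 << d
    data = [None] * p
    data[source] = message
    mask = p - 1
    steps = 0
    for i in range(d - 1, -1, -1):
        mask ^= 1 << i                     # clear bit i
        for node in range(p):
            virtual = node ^ source        # relabel so the source looks like node 0
            if virtual & mask == 0 and virtual & (1 << i) == 0:
                # Compute the partner in virtual ids, then map back to a real id.
                partner = (virtual ^ (1 << i)) ^ source
                data[partner] = data[node]
        steps += 1
    return data, steps
```

Because XOR with a fixed value preserves hypercube adjacency, every transfer still goes between direct neighbours and the step count stays at log p.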

slide-12
SLIDE 12

All-to-All Broadcast & Reduction

slide-13
SLIDE 13

All-to-All Broadcast for Ring
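
The figure for this slide is not available. A minimal sketch of the usual ring scheme, assuming every node forwards to its right neighbour the message it received in the previous step, is shown below. It is a sequential simulation with a hypothetical function name, not the slide's own code.

```python
def ring_all_to_all_broadcast(messages):
    """Simulate all-to-all broadcast on a ring of p = len(messages) nodes.

    Returns result, where result[i] lists every message node i has collected,
    in the order it received them (its own message first).
    """
    p = len(messages)
    result = [[m] for m in messages]
    in_flight = list(messages)       # message each node forwards next
    for _ in range(p - 1):
        # All nodes send simultaneously to the right neighbour (i + 1) mod p,
        # so node i receives from its left neighbour (i - 1) mod p.
        received = [in_flight[(i - 1) % p] for i in range(p)]
        for i in range(p):
            result[i].append(received[i])
        in_flight = received
    return result
```

After p - 1 steps every node holds all p messages, and every link carries exactly one message per step.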

slide-14
SLIDE 14

All-to-All Broadcast on a Mesh

slide-15
SLIDE 15

All-to-All Broadcast on a HCube

slide-16
SLIDE 16

All-Reduce & Prefix-Sum
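
The slide body is missing here. The sketch below simulates the standard hypercube pattern in which the same log p pairwise exchanges yield both an all-reduce and a prefix sum: each node keeps a running total of everything it has seen, and adds an incoming value to its prefix result only when it comes from a lower-numbered partner. The function name and buffer names are assumptions for illustration.

```python
def hypercube_prefix_sum(values):
    """Simulate prefix sum (and all-reduce) on a hypercube with p = len(values) nodes.

    Returns (result, msg): result[i] is the sum of values[0..i],
    and msg[i] is the sum of all values (the all-reduce result).
    """
    p = len(values)
    if p < 1 or p & (p - 1):
        raise ValueError("the number of nodes must be a power of two")
    d = p.bit_length() - 1
    result = list(values)            # prefix sum accumulated so far
    msg = list(values)               # running total of the node's current subcube
    for i in range(d):
        # Every node exchanges its running total with its neighbour along dimension i.
        incoming = [msg[node ^ (1 << i)] for node in range(p)]
        for node in range(p):
            partner = node ^ (1 << i)
            msg[node] += incoming[node]
            if partner < node:       # only lower-numbered data belongs in the prefix
                result[node] += incoming[node]
    return result, msg
```

For example, `hypercube_prefix_sum([1, 2, 3, 4])` gives the prefix sums `[1, 3, 6, 10]` and leaves the total 10 on every node.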

slide-17
SLIDE 17

Scatter & Gather

slide-18
SLIDE 18

Scatter Operation on HCube
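
The figure for this slide is not available. A minimal sketch of the usual scheme, assuming node 0 is the source, follows: at each step every node that holds data passes on the half destined for the other side of the current dimension, so the message volume halves as it moves away from the source. This is a sequential simulation with hypothetical names.

```python
def hypercube_scatter(d, blocks):
    """Simulate a scatter from node 0 on a d-dimensional hypercube.

    blocks[k] is the block destined for node k. Returns holding, where
    holding[n] maps destination id -> block for everything node n ends up with.
    """
    p = 1 << d
    if len(blocks) != p:
        raise ValueError("need exactly one block per node")
    holding = [dict() for _ in range(p)]
    holding[0] = dict(enumerate(blocks))   # the source starts with all p blocks
    for i in range(d - 1, -1, -1):
        bit = 1 << i
        for node in range(p):
            if holding[node] and node & bit == 0:
                partner = node | bit
                # Hand over every block whose destination lies in the partner's half.
                moved = {dst: b for dst, b in holding[node].items() if dst & bit}
                for dst in moved:
                    del holding[node][dst]
                holding[partner].update(moved)
    return holding
```

After log p steps each node holds exactly its own block. Gather is the same pattern run in reverse.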

slide-19
SLIDE 19

All-to-All Personalized (Transpose)

slide-20
SLIDE 20

All-to-all Personalized on a Ring

slide-21
SLIDE 21

All-to-all Personalized on a Mesh

slide-22
SLIDE 22

All-to-all Personalized on a HCube

slide-23
SLIDE 23

All-to-all Personalized on a HCube Improved Algorithm

Perform p - 1 point-to-point communication steps.

Processor i communicates with processor i XOR j during the jth communication step (j = 1, ..., p - 1).
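
The XOR pairing can be sketched as a sequential simulation, assuming the step index j runs over 1, ..., p - 1 so that every pair of processors meets exactly once. The function name and the `send[i][k]` layout (the block processor i wants to deliver to processor k) are assumptions for illustration.

```python
def xor_all_to_all_personalized(send):
    """Simulate all-to-all personalized communication with XOR pairing.

    send[i][k] is the block processor i holds for processor k.
    Returns recv, where recv[k][i] is the block processor k received from i.
    """
    p = len(send)
    if p < 1 or p & (p - 1):
        raise ValueError("the number of processors must be a power of two")
    recv = [[None] * p for _ in range(p)]
    for i in range(p):
        recv[i][i] = send[i][i]        # a processor's own block needs no communication
    for j in range(1, p):
        # In step j, processor i is paired with i XOR j. The pairing is symmetric
        # ((i ^ j) ^ j == i), so each step is a set of disjoint pairwise exchanges.
        for i in range(p):
            partner = i ^ j
            recv[partner][i] = send[i][partner]
    return recv
```

The result is the transpose of the input: `recv[k][i] == send[i][k]` for every i and k.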

slide-24
SLIDE 24

Complexities
