Basic Communication Operations

Basic Communication Operations
• Possible variants
  – number of nodes involved
    • point-to-point vs. collective operation
  – routing scheme
    • Store-and-Forward (S&F), Cut-Through (CT) and Packet Routing
• Point-to-point operations are usually implemented in hardware, collective operations in software
• Many collective operations have a dual operation
  – the dual can be performed by reversing the direction and sequence of the messages in the original operation

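Point-to-point
• Notation: $t_s$ startup time, $t_h$ per-hop time, $t_w$ per-word transfer time, $m$ message size, $l$ number of links traversed, $p$ number of processors
• Store-and-forward => $t_{comm} \approx t_s + l\,m\,t_w$
  – ring
    • $l = \lfloor p/2 \rfloor$
    • $t_{comm} = t_s + \lfloor p/2 \rfloor\, m\,t_w$
  – mesh
    • $l = 2 \lfloor \sqrt{p}/2 \rfloor$
    • $t_{comm} = t_s + 2 \lfloor \sqrt{p}/2 \rfloor\, m\,t_w$
  – hypercube
    • $l = \log p$
    • $t_{comm} = t_s + m\,t_w \log p$
• Cut-through (or packet) => $t_{comm} = t_s + l\,t_h + m\,t_w$
  – Small messages: CT ≈ S&F ≈ $t_s + l\,t_h$
  – Large messages: CT ≈ $t_s + m\,t_w$ (no dependence on $l$)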
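
The two cost models above can be compared numerically. The following is a minimal sketch, not part of the original slides: the function names and the parameter values in the usage note are illustrative assumptions, and p is assumed to be a power of two (and a perfect square for the mesh).

```python
import math

def hops(topology: str, p: int) -> int:
    """Hop count l used on the slide for each topology (p processors)."""
    if topology == "ring":
        return p // 2                       # floor(p / 2)
    if topology == "mesh":
        return 2 * (math.isqrt(p) // 2)     # 2 * floor(sqrt(p) / 2)
    if topology == "hypercube":
        return p.bit_length() - 1           # log2(p), assuming p is a power of two
    raise ValueError(f"unknown topology: {topology}")

def t_store_and_forward(ts: float, tw: float, m: int, l: int) -> float:
    """Store-and-forward model: ts + l * m * tw."""
    return ts + l * m * tw

def t_cut_through(ts: float, th: float, tw: float, m: int, l: int) -> float:
    """Cut-through model: ts + l * th + m * tw."""
    return ts + l * th + m * tw
```

With the made-up values ts = 10, th = 2, tw = 1, m = 100 and p = 16, the hypercube has l = 4, so store-and-forward costs 410 time units and cut-through costs 118. For large m the cut-through cost is nearly independent of l.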

One-to-all broadcast
• A.k.a. single-node broadcast
  – message of size $m$ on the source processor
  – at the end of the operation the message is replicated on all other processors
• Dual operation: single-node accumulation (a.k.a. reduce operation)
  – initially every processor has a message of size $m$
  – at the end, the combination of all messages is on a single destination processor
  – the combination is through an associative operation (sum, product, max, min)

Broadcast over mesh: example
• Multiplication of a 4 x 4 matrix with a 4 x 1 vector

Broadcast on ring (S&F)
[Figure: broadcast steps on a ring]
• Number of steps: $\lceil p/2 \rceil$
• Latency of each communication step: $t_s + m\,t_w$
• Total duration: $T_{one\_to\_all} = (t_s + m\,t_w) \lceil p/2 \rceil$
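
The step count can be checked with a small simulation. This is a sketch under an assumption that the slide does not state explicitly: each node sends at most one message per step, so the source feeds the clockwise direction in step 1 and the counter-clockwise direction from step 2 on. The function name is hypothetical.

```python
import math

def ring_broadcast_steps(p: int, source: int = 0) -> int:
    """Count store-and-forward steps until all p ring nodes hold the message.

    Assumption: a node sends one message per step, so the source starts the
    clockwise chain in step 1 and the counter-clockwise chain in step 2.
    """
    has_message = {source}
    cw = ccw = source          # current frontier node in each direction
    steps = 0
    while len(has_message) < p:
        steps += 1
        nxt = (cw + 1) % p     # clockwise frontier forwards in every step
        if nxt not in has_message:
            has_message.add(nxt)
            cw = nxt
        if steps >= 2 and len(has_message) < p:
            nxt = (ccw - 1) % p    # counter-clockwise chain lags by one step
            if nxt not in has_message:
                has_message.add(nxt)
                ccw = nxt
    return steps

# The simulated count matches ceil(p / 2) for these ring sizes.
for p in range(2, 33):
    assert ring_broadcast_steps(p) == math.ceil(p / 2)
```

Multiplying the step count by the per-step latency ts + m * tw gives the total duration stated on the slide.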

Broadcast on mesh (S&F)
[Figure: broadcast steps on a mesh]
• Row/column broadcast time:
  – $(t_s + m\,t_w) \lceil \sqrt{p}/2 \rceil$
• Total duration:
  – $T_{one\_to\_all} = 2 (t_s + m\,t_w) \lceil \sqrt{p}/2 \rceil$
• 3D mesh
  – $T_{one\_to\_all} = 3 (t_s + m\,t_w) \lceil p^{1/3}/2 \rceil$

Broadcast on hypercube (S&F)
[Figure: broadcast steps on a hypercube]
• Total duration: $T_{one\_to\_all} = (t_s + m\,t_w) \log p$

Broadcast on hypercube: algorithm

Procedure ONE_TO_ALL_BC(d, my_id, X)
begin
    mask := 2^d - 1                        /* Set all bits of mask to 1 */
    for i := d - 1 downto 0 do             /* Outer loop */
    begin
        mask := mask XOR 2^i               /* Set bit i of mask to 0 */
        if (my_id AND mask) = 0 then       /* the lower i bits of my_id are 0 */
            if (my_id AND 2^i) = 0 then
            begin
                msg_destination := my_id XOR 2^i
                send X to msg_destination
            end
            else
            begin
                msg_source := my_id XOR 2^i
                receive X from msg_source
            end
    endfor
end ONE_TO_ALL_BC

• Only nodes whose lower i bits are equal to 0 participate in the communication of the i-th iteration
• If bit i of my_id is 0, the node is a sender; otherwise it is a receiver
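
To see which nodes communicate in each iteration, the procedure can be simulated sequentially. This is a minimal sketch, not a message-passing implementation: all nodes live in one Python list, the source is assumed to be node 0 (as the mask test in the pseudocode implies), and the function name is hypothetical.

```python
def one_to_all_broadcast(d: int, x):
    """Simulate ONE_TO_ALL_BC on a d-dimensional hypercube with source node 0.

    Returns the data held by each node and the (sender, receiver) pairs
    of every iteration.
    """
    p = 1 << d
    data = [None] * p
    data[0] = x                          # only the source holds X initially
    schedule = []
    mask = p - 1                         # all d bits set to 1
    for i in range(d - 1, -1, -1):
        bit = 1 << i
        mask ^= bit                      # clear bit i of mask
        pairs = []
        for my_id in range(p):
            # Participants have their lower i bits equal to 0;
            # among them, nodes with bit i equal to 0 are the senders.
            if (my_id & mask) == 0 and (my_id & bit) == 0:
                dest = my_id ^ bit
                data[dest] = data[my_id]
                pairs.append((my_id, dest))
        schedule.append(pairs)
    return data, schedule

data, schedule = one_to_all_broadcast(3, "X")
assert all(value == "X" for value in data)
assert schedule == [[(0, 4)], [(0, 2), (4, 6)], [(0, 1), (2, 3), (4, 5), (6, 7)]]
```

The number of senders doubles in each iteration, so all 2^d nodes hold X after d = log p iterations.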

Dual of Broadcast: single-node Accumulation

Procedure ONE_TO_ALL_BC(d, my_id, X)
begin
    mask := 2^d - 1                        /* Set all bits of mask to 1 */
    for i := d - 1 downto 0 do             /* Outer loop */
    begin
        mask := mask XOR 2^i               /* Set bit i of mask to 0 */
        if (my_id AND mask) = 0 then       /* the lower i bits of my_id are 0 */
            if (my_id AND 2^i) = 0 then
            begin
                msg_destination := my_id XOR 2^i
                send X to msg_destination
            end
            else
            begin
                msg_source := my_id XOR 2^i
                receive X from msg_source
            end
    endfor
end ONE_TO_ALL_BC

Procedure SINGLE_NODE_ACC(d, my_id, m, X, sum)
begin
    for j := 0 to m - 1 do sum[j] := X[j]
    mask := 0
    for i := 0 to d - 1 do
    begin
        /* select nodes whose lower i bits are 0 */
        if (my_id AND mask) = 0 then
            if (my_id AND 2^i) ≠ 0 then
            begin
                msg_destination := my_id XOR 2^i
                send sum to msg_destination
            end
            else
            begin
                msg_source := my_id XOR 2^i
                receive X from msg_source
                for j := 0 to m - 1 do sum[j] := sum[j] + X[j]
            end
        mask := mask XOR 2^i               /* Set bit i of mask to 1 */
    endfor
end SINGLE_NODE_ACC
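
The accumulation procedure can be simulated in the same sequential style. This is a sketch assuming element-wise addition as the associative operation and node 0 as the destination. The function name and the list-of-vectors representation are assumptions made for illustration.

```python
def single_node_accumulation(d: int, values):
    """Simulate SINGLE_NODE_ACC on a d-dimensional hypercube.

    values[k] is the length-m vector initially held by node k.
    Returns the element-wise sum that ends up on node 0.
    """
    p = 1 << d
    sums = [list(v) for v in values]     # sum[j] := X[j] on every node
    mask = 0
    for i in range(d):
        bit = 1 << i
        for my_id in range(p):
            # Participants have their lower i bits equal to 0;
            # here the nodes with bit i equal to 1 are the senders.
            if (my_id & mask) == 0 and (my_id & bit) != 0:
                dest = my_id ^ bit
                sums[dest] = [a + b for a, b in zip(sums[dest], sums[my_id])]
        mask ^= bit                      # set bit i of mask to 1
    return sums[0]

# Eight nodes, node k holds the vector [k, 1].
assert single_node_accumulation(3, [[k, 1] for k in range(8)]) == [28, 8]
```

Compared with the broadcast, the loop runs over the dimensions in the opposite order and the sender and receiver roles are swapped, which is the reversal of direction and sequence that defines the dual.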

Broadcast on ring (CT)
[Figure: broadcast steps on a ring with cut-through routing]
• Latency of communication at step $i$: $t_s + m\,t_w + t_h\,p/2^i$
• Total duration:
  – $T_{one\_to\_all} = \sum_{i=1}^{\log p} (t_s + m\,t_w + t_h\,p/2^i) = t_s \log p + m\,t_w \log p + t_h (p - 1)$
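
The closed form follows because the hop terms form a geometric series: p/2 + p/4 + ... + 1 = p - 1. A quick numerical check, as a sketch with illustrative parameter values, hypothetical function names, and p assumed to be a power of two:

```python
import math

def ct_ring_broadcast_sum(p: int, m: int, ts: float, th: float, tw: float) -> float:
    """Sum the per-step latencies ts + m*tw + th*p/2^i for i = 1 .. log2(p)."""
    log_p = p.bit_length() - 1           # log2(p), assuming p is a power of two
    return sum(ts + m * tw + th * p / 2**i for i in range(1, log_p + 1))

def ct_ring_broadcast_closed(p: int, m: int, ts: float, th: float, tw: float) -> float:
    """Closed form: ts*log p + m*tw*log p + th*(p - 1)."""
    log_p = p.bit_length() - 1
    return ts * log_p + m * tw * log_p + th * (p - 1)

for p in (2, 4, 8, 16, 64, 1024):
    assert math.isclose(ct_ring_broadcast_sum(p, 100, 10, 2, 1),
                        ct_ring_broadcast_closed(p, 100, 10, 2, 1))
```

For p = 16, m = 100, ts = 10, th = 2 and tw = 1 both expressions evaluate to 470.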

Broadcast on mesh (CT)
[Figure: broadcast steps on a mesh with cut-through routing]
• Row/column broadcast time:
  – $(t_s + m\,t_w) \log \sqrt{p} + t_h (\sqrt{p} - 1)$
• Total duration:
  – $(t_s + m\,t_w) \log p + 2 t_h (\sqrt{p} - 1)$

Broadcast on binary tree (CT)
• Hypercube algorithm
  – a different number of switches is traversed along different paths
• Total duration:
  – $T_{one\_to\_all} = (t_s + m\,t_w + t_h (\log p + 1)) \log p$

All-to-All Broadcast
• A.k.a. multinode broadcast
  – message of size $m$ on each processor
  – at the end of the operation the messages are replicated on all processors
• Dual operation: multinode accumulation (a.k.a. personalized reduction operation)
  – each processor is the destination of a single-node accumulation
  – the combination is through an associative operation (sum, product, max, min)

A2A Broadcast on Ring (S&F)
[Figure: successive steps of the all-to-all broadcast on a ring, continuing until every node holds all messages]
• Number of steps: $p - 1$
• Latency of each communication step: $t_s + m\,t_w$
• Total duration: $T_{all\_to\_all} = (t_s + m\,t_w)(p - 1)$
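
The p - 1 steps can be reproduced with a short simulation. This is a sketch assuming every node forwards, in each step, the message it received in the previous step to its clockwise neighbour. The function name is hypothetical.

```python
def all_to_all_ring(p: int):
    """Simulate all-to-all broadcast on a p-node ring.

    Node k starts with message k. In every step each node passes the message
    it obtained most recently to its clockwise neighbour.
    Returns the set of messages collected by each node and the step count.
    """
    collected = [{k} for k in range(p)]
    outgoing = list(range(p))            # message each node sends next
    steps = 0
    for _ in range(p - 1):
        # Node k receives what node k - 1 sent in this step.
        incoming = [outgoing[(k - 1) % p] for k in range(p)]
        for k in range(p):
            collected[k].add(incoming[k])
        outgoing = incoming              # forward the new message next step
        steps += 1
    return collected, steps

collected, steps = all_to_all_ring(8)
assert steps == 7
assert all(msgs == set(range(8)) for msgs in collected)
```

Every step moves one message of size m over one link per node, so each step costs ts + m * tw and the total is (ts + m * tw)(p - 1).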

A2A Broadcast on mesh (S&F)
[Figure: Phase 1, Phase 2]
• Row broadcast time: $(t_s + m\,t_w)(\sqrt{p} - 1)$
• Column broadcast time: $(t_s + \sqrt{p}\,m\,t_w)(\sqrt{p} - 1)$
• Total duration: $T_{all\_to\_all} = 2 t_s (\sqrt{p} - 1) + m\,t_w (p - 1)$
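
The total is the sum of the two phases, since (sqrt(p) - 1)(1 + sqrt(p)) = p - 1. A small numerical check, as a sketch with illustrative parameter values, hypothetical function names, and p assumed to be a perfect square:

```python
import math

def a2a_mesh_phases(p: int, m: int, ts: float, tw: float) -> float:
    """Row phase plus column phase of all-to-all broadcast on a sqrt(p) x sqrt(p) mesh."""
    side = math.isqrt(p)                             # sqrt(p), p assumed a perfect square
    row_phase = (ts + m * tw) * (side - 1)           # messages of size m
    col_phase = (ts + side * m * tw) * (side - 1)    # messages of size sqrt(p) * m
    return row_phase + col_phase

def a2a_mesh_closed(p: int, m: int, ts: float, tw: float) -> float:
    """Closed form: 2*ts*(sqrt(p) - 1) + m*tw*(p - 1)."""
    side = math.isqrt(p)
    return 2 * ts * (side - 1) + m * tw * (p - 1)

for p in (4, 16, 64, 256):
    assert math.isclose(a2a_mesh_phases(p, 100, 10, 1), a2a_mesh_closed(p, 100, 10, 1))
```

For p = 16, m = 100, ts = 10 and tw = 1 the row phase costs 330, the column phase 1230, and the total is 1560.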

A2A Broadcast on hypercube (S&F)
• Duration of step $i$: $t_s + m\,t_w\,2^{i-1}$
• Total duration:
  – $T_{all\_to\_all} = \sum_{i=1}^{\log p} (t_s + m\,t_w\,2^{i-1}) = t_s \log p + m\,t_w (p - 1)$
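
The doubling of the message size per step can be observed in a sequential simulation. This is a sketch assuming that in each step every node exchanges everything it has collected so far with its neighbour along one dimension. The function name is hypothetical, and the loop index runs from 0, so the size in the loop is 2^i rather than the slide's 2^(i-1).

```python
def all_to_all_hypercube(d: int):
    """Simulate all-to-all broadcast on a d-dimensional hypercube.

    Node k starts with message k. In step i (counted from 0) every node
    exchanges its collected messages with the neighbour across dimension i.
    Returns the messages held by each node and the number of messages
    sent per node in each step.
    """
    p = 1 << d
    held = [[k] for k in range(p)]
    sizes = []
    for i in range(d):
        sizes.append(len(held[0]))       # every node sends the same amount
        held = [held[k] + held[k ^ (1 << i)] for k in range(p)]
    return held, sizes

held, sizes = all_to_all_hypercube(3)
assert sizes == [1, 2, 4]                # sums to p - 1 = 7
assert all(sorted(msgs) == list(range(8)) for msgs in held)
```

With the per-step sizes in hand, the total time is ts * d + m * tw * sum(sizes), which equals ts log p + m tw (p - 1).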
