All-to-one Reduction (Figure 4.1)

SLIDE 1

[Figure: nodes 0, 1, ..., p-1; One-to-all Broadcast takes message M from node 0 to every node; All-to-one Reduction runs in the reverse direction]

Figure 4.1 One-to-all broadcast and all-to-one reduction.

SLIDE 2

[Figure: eight-node ring (nodes 0 to 7) with numbered transfer arrows]

Figure 4.2 One-to-all broadcast on an eight-node ring. Node 0 is the source of the broadcast. Each message transfer step is shown by a numbered, dotted arrow from the source of the message to its destination. The number on an arrow indicates the time step during which the message is transferred.
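The arrow labels in Figure 4.2 follow a recursive-doubling pattern: the distance covered per step halves while the number of senders doubles. A minimal Python simulation of that schedule, assuming p is a power of two (the function name `ring_broadcast_schedule` is ours, not from the slides):

```python
def ring_broadcast_schedule(p, source=0):
    """Recursive-doubling one-to-all broadcast on a p-node ring.

    Returns (steps, reached): steps[k] lists the (sender, receiver) pairs
    active in time step k + 1, and reached is the set of nodes that hold
    the message at the end. Assumes p is a power of two.
    """
    if p <= 0 or p & (p - 1):
        raise ValueError("p must be a power of two")
    reached = {source}
    steps = []
    dist = p // 2
    while dist >= 1:
        # Every node that already has the message forwards it `dist` hops clockwise.
        # Senders are 2 * dist apart, so the transfers of one step use disjoint links.
        transfers = [(src, (src + dist) % p) for src in sorted(reached)]
        reached.update(dst for _, dst in transfers)
        steps.append(transfers)
        dist //= 2
    return steps, reached


steps, reached = ring_broadcast_schedule(8)
for t, transfers in enumerate(steps, start=1):
    print(t, transfers)
```

For p = 8 this lists 0 → 4 in step 1, then 0 → 2 and 4 → 6, then the four neighbor transfers, which is the order of the numbered arrows.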

SLIDE 3

[Figure: eight-node ring (nodes 0 to 7) with numbered reduction arrows]

Figure 4.3 Reduction on an eight-node ring with node 0 as the destination of the reduction.
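Reduction runs the broadcast pattern backwards, with the partner distance doubling instead of halving. A hedged sketch, assuming p is a power of two, node 0 as the destination, and any associative operator `op` (the name `ring_reduce` is hypothetical):

```python
import operator


def ring_reduce(values, op=operator.add):
    """All-to-one reduction onto node 0 of a p-node ring (p a power of two).

    Returns (result_at_node_0, steps), where steps[k] lists the
    (sender, receiver) pairs of time step k + 1.
    """
    p = len(values)
    if p == 0 or p & (p - 1):
        raise ValueError("number of nodes must be a power of two")
    acc = list(values)  # acc[i] is node i's running partial result
    steps = []
    dist = 1
    while dist < p:
        transfers = []
        for dst in range(0, p, 2 * dist):
            src = dst + dist
            acc[dst] = op(acc[dst], acc[src])  # receiver combines the sender's partial result
            transfers.append((src, dst))
        steps.append(transfers)
        dist *= 2
    return acc[0], steps


total, steps = ring_reduce(list(range(8)))
print(total)  # 28
```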

SLIDE 4

[Figure panels: Input Vector; Matrix (4 × 4 grid of processes P0 to P15); One-to-all broadcast; All-to-one reduction; Output Vector]

Figure 4.4 One-to-all broadcast and all-to-one reduction in the multiplication of a 4 × 4 matrix with a 4 × 1 vector.
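The data flow of Figure 4.4 can be simulated sequentially. This sketch assumes our reading of the figure: x[j] starts at the process in row 0, column j, and y[i] is collected at the process in row i, column 0. Both collectives are collapsed to their outcomes, and `matvec_broadcast_reduce` is a hypothetical name.

```python
def matvec_broadcast_reduce(A, x):
    """Simulate y = A x on an n x n grid of processes holding one matrix entry each."""
    n = len(A)
    # Step 1: one-to-all broadcast of x[j] down column j, so process (i, j) holds x[j].
    local_x = [[x[j] for j in range(n)] for _ in range(n)]
    # Step 2: every process multiplies its matrix entry by the element it received.
    partial = [[A[i][j] * local_x[i][j] for j in range(n)] for i in range(n)]
    # Step 3: all-to-one (sum) reduction along row i into process (i, 0).
    return [sum(partial[i]) for i in range(n)]


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(matvec_broadcast_reduce(A, [1, 0, 0, 1]))  # [5, 13, 21, 29]
```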

SLIDE 5

[Figure: 4 × 4 mesh (nodes 0 to 15) with arrows labeled by communication step 1 to 4]

Figure 4.5 One-to-all broadcast on a 16-node mesh.
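On a mesh the broadcast runs in two phases: first along the source's row, then along every column at once, each phase reusing the linear-array schedule. A minimal sketch, assuming the source is node (0, 0) and the side length is a power of two (the function names are ours):

```python
def doubling_steps(n):
    """Recursive-doubling broadcast schedule on a linear array of n nodes (n a power of two)."""
    have, steps, dist = {0}, [], n // 2
    while dist >= 1:
        pairs = [(s, s + dist) for s in sorted(have)]
        have.update(d for _, d in pairs)
        steps.append(pairs)
        dist //= 2
    return steps


def mesh_broadcast_steps(side):
    """One-to-all broadcast from node (0, 0) on a side x side mesh; nodes are (row, col)."""
    steps = []
    for pairs in doubling_steps(side):  # phase 1: along row 0
        steps.append([((0, a), (0, b)) for a, b in pairs])
    for pairs in doubling_steps(side):  # phase 2: in all columns simultaneously
        steps.append([((a, c), (b, c)) for c in range(side) for a, b in pairs])
    return steps


steps = mesh_broadcast_steps(4)
print(len(steps))  # 4, i.e. 2 * log2(4)
```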

SLIDE 6

[Figure: three-dimensional hypercube, nodes 0 (000) to 7 (111), with arrows labeled by communication step 1 to 3]

Figure 4.6 One-to-all broadcast on a three-dimensional hypercube. The binary representations of node labels are shown in parentheses.
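The hypercube version pairs nodes across one dimension per step, starting from the highest. The sketch below uses a bitmask formulation that is common for this algorithm (an assumption on our part; the slide shows only the result) and assumes node 0 is the source; `hypercube_one_to_all_bc` is a hypothetical name.

```python
def hypercube_one_to_all_bc(d, message):
    """One-to-all broadcast from node 0 on a d-dimensional hypercube.

    Returns (data, steps): data[i] is what node i holds at the end and
    steps[k] lists the (sender, receiver) pairs of step k + 1.
    """
    p = 1 << d
    data = [None] * p
    data[0] = message
    steps = []
    mask = p - 1                      # all d bits set
    for i in range(d - 1, -1, -1):
        mask ^= 1 << i                # clear bit i: mask now covers bits 0..i-1
        transfers = []
        for my_id in range(p):
            # Active nodes have zeros in bits 0..i-1; of each active pair,
            # the node with bit i clear already holds the message and sends it.
            if (my_id & mask) == 0 and (my_id & (1 << i)) == 0:
                transfers.append((my_id, my_id ^ (1 << i)))
        for src, dst in transfers:
            data[dst] = data[src]
        steps.append(transfers)
    return data, steps


data, steps = hypercube_one_to_all_bc(3, "M")
for t, transfers in enumerate(steps, start=1):
    print(t, [(format(s, "03b"), format(r, "03b")) for s, r in transfers])
```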

SLIDE 7

[Figure: eight-node tree (nodes 0 to 7) with arrows labeled by communication step 1 to 3]

Figure 4.7 One-to-all broadcast on an eight-node tree.

SLIDE 8

[Figure: nodes 0, 1, ..., p-1 with messages M_0, M_1, ..., M_{p-1}; panels: All-to-all broadcast and All-to-all reduction]

Figure 4.8 All-to-all broadcast and all-to-all reduction.

SLIDE 9

[Figure panels: 1st, 2nd, and 7th communication steps on an eight-node ring (nodes 0 to 7)]

Figure 4.9 All-to-all broadcast on an eight-node ring. The label of each arrow shows the time step and, within parentheses, the label of the node that owned the current message being transferred before the beginning of the broadcast. The number(s) in parentheses next to each node are the labels of nodes from which data has been received prior to the current communication step. Only the first, second, and last communication steps are shown.
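The ring algorithm can be simulated directly: in every step each node passes to its clockwise neighbor whatever it received in the previous step. A minimal sketch (the function name is ours); it records messages in arrival order so the result can be compared with the parenthesized lists in Figure 4.9:

```python
def ring_all_to_all_broadcast(msgs):
    """All-to-all broadcast on a p-node ring in p - 1 steps.

    Returns, for each node, its own message followed by the messages
    it received, in arrival order.
    """
    p = len(msgs)
    result = [[m] for m in msgs]
    outgoing = list(msgs)  # step 1: every node sends its own message
    for _ in range(p - 1):
        incoming = [outgoing[(i - 1) % p] for i in range(p)]  # from the left neighbor
        for i in range(p):
            result[i].append(incoming[i])
        outgoing = incoming  # next step: forward what just arrived
    return result


result = ring_all_to_all_broadcast(list(range(8)))
print(result[0])  # [0, 7, 6, 5, 4, 3, 2, 1]
```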

SLIDE 10

[Figure panels: (a) Initial data distribution; (b) Data distribution after rowwise broadcast; 3 × 3 mesh, nodes 0 to 8]

Figure 4.10 All-to-all broadcast on a 3 × 3 mesh. The groups of nodes communicating with each other in each phase are enclosed by dotted boundaries. By the end of the second phase, all nodes get (0,1,2,3,4,5,6,7,8) (that is, a message from each node).
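The two phases of the mesh algorithm can be checked with a short simulation. This sketch assumes nodes are numbered row by row, as the (0,1,2), (3,4,5), (6,7,8) groups suggest, and it collapses each phase to its outcome (every group member ends with the union of the group's data) instead of simulating the ring steps inside a row or column; the function name is hypothetical.

```python
def mesh_all_to_all_broadcast(side):
    """All-to-all broadcast on a side x side mesh; returns (after_row_phase, final)."""
    p = side * side
    buf = [[i] for i in range(p)]

    def exchange(groups):
        # Every member of a group ends up with the union of the group's buffers.
        for group in groups:
            combined = sorted(m for node in group for m in buf[node])
            for node in group:
                buf[node] = list(combined)

    # Phase 1: all-to-all broadcast within each row.
    exchange([[r * side + c for c in range(side)] for r in range(side)])
    after_rows = [list(b) for b in buf]
    # Phase 2: all-to-all broadcast of the row-sized messages within each column.
    exchange([[r * side + c for r in range(side)] for c in range(side)])
    return after_rows, buf


after_rows, final = mesh_all_to_all_broadcast(3)
print(after_rows[4], final[4])  # [3, 4, 5] [0, 1, 2, 3, 4, 5, 6, 7, 8]
```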

SLIDE 11

[Figure panels: (a) Initial distribution of messages; (b) Distribution before the second step; (c) Distribution before the third step; (d) Final distribution of messages]

Figure 4.11 All-to-all broadcast on an eight-node hypercube.
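On the hypercube every node swaps its entire accumulated buffer with its neighbor across one dimension per step, lowest dimension first as in panels (b) and (c), so buffers double each step. A minimal sketch (the function name is ours):

```python
def hypercube_all_to_all_broadcast(d):
    """All-to-all broadcast on a d-dimensional hypercube.

    Returns the distribution before each step and after the last one.
    """
    p = 1 << d
    result = [[i] for i in range(p)]
    history = [result]
    for i in range(d):
        # Node n merges its buffer with that of its partner n XOR 2**i.
        result = [sorted(result[n] + result[n ^ (1 << i)]) for n in range(p)]
        history.append(result)
    return history


history = hypercube_all_to_all_broadcast(3)
print(history[1][0], history[2][0])  # [0, 1] [0, 1, 2, 3]
```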

SLIDE 12

[Figure: eight-node ring (nodes 0 to 7) showing contention for a single channel by multiple messages]

Figure 4.12 Contention for a channel when the communication step of Figure 4.11(c) for the hypercube is mapped onto a ring.

SLIDE 13

[Figure panels: (a) Initial distribution of values; (b) Distribution of sums before second step; (c) Distribution of sums before third step; (d) Final distribution of prefix sums]

Figure 4.13 Computing prefix sums on an eight-node hypercube. At each node, square brackets show the local prefix sum accumulated in the result buffer and parentheses enclose the contents of the outgoing message buffer for the next step.
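The two buffers described in the caption, the result buffer in square brackets and the outgoing message buffer in parentheses, map directly onto a simulation. A sketch assuming p is a power of two and addition as the operator (the function name is hypothetical):

```python
def hypercube_prefix_sums(values):
    """Prefix sums on a hypercube with p = len(values) nodes (p a power of two).

    result[n] is node n's prefix sum so far (square brackets in Figure 4.13);
    msg[n] is the sum over its current subcube, sent to the next partner
    (parentheses in Figure 4.13).
    """
    p = len(values)
    d = p.bit_length() - 1
    if p == 0 or 1 << d != p:
        raise ValueError("number of nodes must be a power of two")
    result = list(values)
    msg = list(values)
    for i in range(d):
        # Every node exchanges its outgoing buffer with its partner across dimension i.
        incoming = [msg[n ^ (1 << i)] for n in range(p)]
        for n in range(p):
            msg[n] += incoming[n]
            if (n ^ (1 << i)) < n:  # only values from lower-numbered nodes enter the prefix
                result[n] += incoming[n]
    return result


print(hypercube_prefix_sums(list(range(8))))  # [0, 1, 3, 6, 10, 15, 21, 28]
```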

SLIDE 14

[Figure: nodes 0, 1, ..., p-1; Scatter sends message M_i from node 0 to node i; Gather runs in the reverse direction]

Figure 4.14 Scatter and gather operations.

SLIDE 15

[Figure panels: (a) Initial distribution of messages; (b) Distribution before the second step; (c) Distribution before the third step; (d) Final distribution of messages]

Figure 4.15 The scatter operation on an eight-node hypercube.
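Scatter follows the broadcast pattern, but each holder forwards only the half of its data meant for the far subcube. A minimal sketch, assuming node 0 is the source, message k is destined for node k, and the highest dimension is used first as in panel (b); the function name is ours.

```python
def hypercube_scatter(d):
    """Scatter from node 0 on a d-dimensional hypercube.

    Returns the distribution before each step and after the last one.
    """
    p = 1 << d
    held = [[] for _ in range(p)]
    held[0] = list(range(p))
    history = [[list(h) for h in held]]
    for i in range(d - 1, -1, -1):
        bit = 1 << i
        snapshot = [list(h) for h in held]
        for my_id in range(p):
            # Messages whose destination has bit i set go to the neighbor across dimension i.
            outgoing = [m for m in snapshot[my_id] if m & bit]
            if outgoing:
                held[my_id] = [m for m in snapshot[my_id] if not m & bit]
                held[my_id ^ bit] = held[my_id ^ bit] + outgoing
        history.append([list(h) for h in held])
    return history


history = hypercube_scatter(3)
print(history[1][0], history[1][4])  # [0, 1, 2, 3] [4, 5, 6, 7]
```

Reading `history` from the last entry back to the first gives the corresponding gather.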

SLIDE 16

[Figure: nodes 0, 1, ..., p-1; node i starts with messages M_{i,0}, M_{i,1}, ..., M_{i,p-1} and, after all-to-all personalized communication, holds M_{0,i}, M_{1,i}, ..., M_{p-1,i}]

Figure 4.16 All-to-all personalized communication.

SLIDE 17

[Figure: an n × n matrix (n = 4) distributed one row per process over P0 to P3]

Figure 4.17 All-to-all personalized communication in transposing a 4 × 4 matrix using four processes.

SLIDE 18

[Figure: six-node ring (nodes 0 to 5) with the concatenated message sent in each of the five steps]

Figure 4.18 All-to-all personalized communication on a six-node ring. The label of each message is of the form {x, y}, where x is the label of the node that originally owned the message, and y is the label of the node that is the final destination of the message. The label ({x1, y1}, {x2, y2}, . . . , {xn, yn}) indicates a message that is formed by concatenating n individual messages.

SLIDE 19

[Figure panels: (a) Data distribution at the beginning of first phase; (b) Data distribution at the beginning of second phase; 3 × 3 mesh, nodes 0 to 8]

Figure 4.19 The distribution of messages at the beginning of each phase of all-to-all personalized communication on a 3 × 3 mesh. At the end of the second phase, node i has messages ({0,i}, . . . ,{8,i}), where 0 ≤ i ≤ 8. The groups of nodes communicating together in each phase are enclosed in dotted boundaries.
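The two phases can be simulated by routing each {x, y} message first within the sender's row to the column of y, then within that column to y itself. A sketch assuming row-by-row node numbering, which matches panel (b), where node 0 holds the messages of nodes 0, 1, and 2 for destinations 0, 3, and 6; the function name is hypothetical.

```python
def mesh_all_to_all_personalized(side):
    """All-to-all personalized communication on a side x side mesh.

    Messages are (source, destination) pairs. Returns (after_phase_1, final),
    with each node's messages sorted.
    """
    p = side * side
    held = [[(i, j) for j in range(p)] for i in range(p)]

    # Phase 1 (rows): move each message to the node in its row and its destination's column.
    phase1 = [[] for _ in range(p)]
    for node in range(p):
        row = node // side
        for src, dst in held[node]:
            phase1[row * side + dst % side].append((src, dst))

    # Phase 2 (columns): dst now lies in the holder's column, so deliver it.
    final = [[] for _ in range(p)]
    for node in range(p):
        for src, dst in phase1[node]:
            final[dst].append((src, dst))

    return [sorted(h) for h in phase1], [sorted(h) for h in final]


phase1, final = mesh_all_to_all_personalized(3)
print(phase1[0])
```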

SLIDE 20

[Figure panels: (a) Initial distribution of messages; (b) Distribution before the second step; (c) Distribution before the third step; (d) Final distribution of messages]

Figure 4.20 An all-to-all personalized communication algorithm on a three-dimensional hypercube.
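In this algorithm each node keeps the messages whose destination agrees with its own label in the current dimension and ships the other half (p/2 messages) to its neighbor. A sketch assuming the lowest dimension is used first, which is how we read panels (b) and (c); the function name is ours.

```python
def hypercube_all_to_all_personalized(d):
    """All-to-all personalized communication on a d-dimensional hypercube.

    Messages are (source, destination) pairs. Returns the distribution
    before each step and after the last one.
    """
    p = 1 << d
    held = [[(s, t) for t in range(p)] for s in range(p)]
    history = [held]
    for i in range(d):
        bit = 1 << i
        new = [[] for _ in range(p)]
        for node in range(p):
            for src, dst in held[node]:
                keep = (dst & bit) == (node & bit)  # destination already on this side
                new[node if keep else node ^ bit].append((src, dst))
        held = [sorted(h) for h in new]
        history.append(held)
    return history


history = hypercube_all_to_all_personalized(3)
print(history[1][0])
```

Every node holds exactly p messages before and after each step, so the message size per exchange stays at p/2.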

SLIDE 21

[Figure panels (a) to (g): one communication step each on an eight-node hypercube (nodes 0 to 7)]

Figure 4.21 Seven steps in all-to-all personalized communication on an eight-node hypercube.
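A common way to organize these seven steps, used with E-cube routing, is to let node i exchange with node i XOR j in step j. We assume that schedule here; the slide itself shows only the panels, and `ecube_pairwise_schedule` is a hypothetical name.

```python
def ecube_pairwise_schedule(d):
    """Pairwise-exchange schedule on a d-dimensional hypercube.

    Step j (1 <= j < p) pairs node i with node i XOR j; each pair is listed once.
    """
    p = 1 << d
    return [[(i, i ^ j) for i in range(p) if i < i ^ j] for j in range(1, p)]


for j, pairs in enumerate(ecube_pairwise_schedule(3), start=1):
    print(j, pairs)
```

Each step is a perfect matching of the nodes, and over p - 1 steps every pair of nodes exchanges exactly once.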

SLIDE 22

[Figure panels: (a) Initial data distribution and the first communication step; (b) Step to compensate for backward row shifts; (c) Column shifts in the third communication step; (d) Final distribution of the data; 4 × 4 mesh, nodes 0 to 15]

Figure 4.22 The communication steps in a circular 5-shift on a 4 × 4 mesh.
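The three stages of the mesh shift can be checked by computing where each item lands. This sketch assumes nodes are numbered row by row with wraparound links, and it reads the compensating step as one extra column move for items that wrapped around the end of their row; the function name is ours.

```python
def mesh_circular_shift(data, side, q):
    """Circular q-shift on a side x side wraparound mesh; item at node i ends at node (i + q) mod p."""
    p = side * side
    a, b = q % side, (q // side) % side
    out = [None] * p
    for i, item in enumerate(data):
        r, c = divmod(i, side)
        c2 = c + a                            # stage 1: shift along the row by q mod side
        wrapped = c2 >= side
        c2 %= side
        r2 = (r + int(wrapped) + b) % side    # stage 2 (wrapped items only), then stage 3
        out[r2 * side + c2] = item
    return out


print(mesh_circular_shift(list(range(16)), 4, 5))
```

For the 5-shift on a 4 × 4 mesh, a = 1 and b = 1, so every item moves one column over, wrapped items take one compensating step down, and all items then move one row down.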

SLIDE 23

[Figure panels: (a) The first phase (a 4-shift), shown as its first and second communication steps; (b) The second phase (a 1-shift); (c) Final data distribution after the 5-shift]

Figure 4.23 The mapping of an eight-node linear array onto a three-dimensional hypercube to perform a circular 5-shift as a combination of a 4-shift and a 1-shift.
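Two facts behind Figure 4.23 can be checked numerically: a q-shift splits into shifts by powers of two (5 = 4 + 1), and under a binary reflected Gray code embedding of the linear array, a 1-shift moves every item to a hypercube neighbor while a larger power-of-two shift moves it at most two hops. The Gray code embedding is our assumption about the mapping, and all function names are hypothetical.

```python
def gray(i):
    """Binary reflected Gray code: hypercube node assigned to linear-array position i."""
    return i ^ (i >> 1)


def hamming(a, b):
    """Number of differing bits, i.e. hypercube distance between nodes a and b."""
    return bin(a ^ b).count("1")


def power_of_two_parts(q):
    """Split a shift amount into powers of two, largest first (5 -> [4, 1])."""
    return [1 << k for k in range(q.bit_length() - 1, -1, -1) if q >> k & 1]


def shift_distances(d, k):
    """Hypercube distance covered by each item in a 2**k-shift of the embedded array."""
    p = 1 << d
    return [hamming(gray(i), gray((i + (1 << k)) % p)) for i in range(p)]


print(power_of_two_parts(5))        # [4, 1]
print(set(shift_distances(3, 2)))   # {2}
print(set(shift_distances(3, 0)))   # {1}
```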

SLIDE 24

[Figure panels: (a) 1-shift, (b) 2-shift, (c) 3-shift, (d) 4-shift, (e) 5-shift, (f) 6-shift, (g) 7-shift, each on an eight-node hypercube (nodes 0 to 7)]

Figure 4.24 Circular q-shifts on an 8-node hypercube for 1 ≤ q < 8.