SLIDE 7 CMPI Collective Operations
Efficient hypercube implementation: a different amount of data is transferred in each dimension.
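The amount of data halves from one dimension to the next. The short derivation below assumes that the 4-processor pattern on this slide (n/2 components in the 1st-dimension transfer, n/4 in the 2nd) extends to a hypercube of p = 2^d processors. The symbol V_k is introduced here for illustration. It denotes the number of components each processor sends along dimension k.

```latex
V_k = \frac{n}{2^{k}} \quad (k = 1, \dots, d), \qquad
\sum_{k=1}^{d} V_k = n\left(1 - 2^{-d}\right) = n\,\frac{p-1}{p}.
```

For p = 4 this gives V_1 = n/2 and V_2 = n/4, so each processor sends 3n/4 components in total. If every step exchanged the full n-tuple, each processor would instead send d * n components.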
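Figure: data held by processors p0, p1, p2, p3 (a 2-dimensional hypercube). Each processor starts with one n-tuple of components and ends up holding one fully reduced component. Entries that are not updated in a step keep their previous values.

Initial state (1 n-tuple per processor):
  p0: a0     b0     c0     d0
  p1: a1     b1     c1     d1
  p2: a2     b2     c2     d2
  p3: a3     b3     c3     d3

Step 1, 1st-dimension transfer (p0 <-> p1, p2 <-> p3) on 1 n-tuple: n/2 components sent per processor.
  p0: a0+a1  b0     c0+c1  d0
  p1: a1     b0+b1  c1     d0+d1
  p2: a2+a3  b2     c2+c3  d2
  p3: a3     b2+b3  c3     d2+d3

Step 2, 2nd-dimension transfer (p0 <-> p2, p1 <-> p3) on 1/2 n-tuple: n/4 components sent per processor.
Final state:
  p0: a      b0     c0+c1  d0
  p1: a1     b      c1     d0+d1
  p2: a2+a3  b2     c      d2
  p3: a3     b2+b3  c3     d

where a = a0+a1+a2+a3, b = b0+b1+b2+b3, c = c0+c1+c2+c3, d = d0+d1+d2+d3.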
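The following is a minimal Python simulation of the data movement. It makes several assumptions: p = 2^d processors, n a multiple of p, and the interleaved assignment read off the diagram (after step 1, p0 keeps the a and c positions while p1 keeps b and d). The function name hypercube_reduce_scatter and its return values are hypothetical and for illustration only. It is a sketch of this recursive-halving pattern, not the CMPI implementation.

```python
from typing import List, Tuple


def hypercube_reduce_scatter(
    data: List[List[int]],
) -> Tuple[List[List[int]], List[List[int]], List[int]]:
    """Simulate a recursive-halving reduction on a hypercube of p = 2**d processors.

    Hypothetical helper for illustration. data[r] is the n-tuple held by
    processor r; n must be a multiple of p. Returns (buffers, owned, sent):
      buffers[r] -- processor r's buffer after the last step (stale entries kept),
      owned[r]   -- component indices that processor r holds fully reduced,
      sent[k]    -- components each processor sends in step k + 1.
    """
    p, n = len(data), len(data[0])
    d = p.bit_length() - 1
    if p != 1 << d or n % p != 0:
        raise ValueError("p must be a power of two and n a multiple of p")

    buf = [list(row) for row in data]
    active = [list(range(n)) for _ in range(p)]  # indices still being reduced
    sent: List[int] = []

    for k in range(d):                   # one step per hypercube dimension
        bit = 1 << k
        before = [list(b) for b in buf]  # values at the start of this step
        counts = set()
        for r in range(p):
            partner = r ^ bit            # neighbour across dimension k
            # Keep the half of the active indices whose bit k (of j mod p)
            # matches this processor's bit k; the other half goes to the partner.
            keep = [j for j in active[r] if ((j % p) >> k) & 1 == (r >> k) & 1]
            counts.add(len(active[r]) - len(keep))
            for j in keep:               # add the partner's partial sum
                buf[r][j] = before[r][j] + before[partner][j]
            active[r] = keep
        sent.append(counts.pop())        # every processor sends the same amount
    return buf, active, sent


# Usage: 4 processors, n = 4 (positions a, b, c, d).
data = [
    [1, 2, 3, 4],              # p0: a0 b0 c0 d0
    [10, 20, 30, 40],          # p1: a1 b1 c1 d1
    [100, 200, 300, 400],      # p2: a2 b2 c2 d2
    [1000, 2000, 3000, 4000],  # p3: a3 b3 c3 d3
]
final, owned, sent = hypercube_reduce_scatter(data)
print(sent)  # [2, 1]: n/2 components in step 1, n/4 in step 2
for r in range(len(data)):
    print(f"p{r}", {j: final[r][j] for j in owned[r]})
```

With this sample data, p0's buffer ends as [1111, 2, 33, 4]. That matches the final state a, b0, c0+c1, d0 shown for p0. Each processor sends only the half of its active components that its partner keeps, which is why the volume halves with every dimension.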
Specialized Network Topologies for Efficient Communication in Computer Clusters – p.7/14