HIPS 2007 Long Beach Edgar Gabriel
Runtime Optimization of Application Level Communication Patterns
Edgar Gabriel and Shuo Huang Department of Computer Science University of Houston gabriel@cs.uh.edu
Motivation
Finite Difference code on a PC cluster using IB and GE interconnects
Execution time for 200 iterations of the solver on 32 processes/processors
[Figure: bar chart of execution time [sec] for 128x128x64 IB, 128x128x128 IB, 128x128x64 TCP and 128x128x128 TCP; series: fcfs, fcfs-pack]
ADCL object     Functionality
Attribute       Abstraction for a characteristic of an implementation, represented by the set of its possible values
Attribute-set   Group of attributes
Function        Implementation of a particular operation
Function-set    Set of functions providing the same functionality
Vector          Abstraction for a multi-dimensional data object
Topology        Abstraction for a process topology
Request         Handle for tuple of <topology, vector, function-set>
ADCL_Vector vec;
ADCL_Topology topo;
ADCL_Request request;

/* Generate a 2-D process topology */
MPI_Cart_create ( comm, 2, cart_dims, periods, 0, &cart_comm );
ADCL_Topology_create ( cart_comm, &topo );

/* Register a 2D vector with ADCL */
ADCL_Vector_register ( ndims, vec_dims, HALO_WIDTH, MPI_DOUBLE, vector, &vec );

/* Match process topology, data item and function-set */
ADCL_Request_create ( vec, topo, ADCL_FNCTSET_NEIGHBORHOOD, &request );

for ( i=0; i<NIT; i++ ) {   /* Main application loop */
    ADCL_Request_start ( request );
    …
}
[Figure: timeline over implementations no. 1 to 7, followed by "Using the fastest implementation for the rest of the application"]
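The selection step named on this slide (try each implementation, then keep the fastest) can be sketched in plain C. This is only an illustration under stated assumptions, not the library's code: `select_fastest`, `measure_fn` and `table_measure` are hypothetical names, and the measurement is passed in as a callback so the logic stays independent of any timer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: runs one iteration of implementation `impl`
   and returns its measured execution time in seconds. */
typedef double (*measure_fn)(int impl, void *ctx);

/* Measure each of the nimpl implementations nmeas times and return the
   index of the one with the smallest accumulated time. */
int select_fastest(int nimpl, int nmeas, measure_fn measure, void *ctx)
{
    assert(nimpl > 0 && nmeas > 0 && measure != NULL);

    int best = -1;
    double best_total = 0.0;
    for (int impl = 0; impl < nimpl; impl++) {
        double total = 0.0;
        for (int k = 0; k < nmeas; k++)
            total += measure(impl, ctx);
        if (best < 0 || total < best_total) {
            best = impl;
            best_total = total;
        }
    }
    return best;
}

/* Illustration-only callback: looks up a fixed cost per implementation
   in a table passed through ctx instead of timing real communication. */
double table_measure(int impl, void *ctx)
{
    const double *cost = ctx;
    return cost[impl];
}
```

In a real MPI code the callback would wrap the communication call between two timer readings (for example with `MPI_Wtime`), and the returned index would be used for all remaining iterations.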
[Figure: Function a, Function b and Function c, each labeled with its values for attributes 1 to 4: (Y, z, X, 1), (Y, z, X, 2), (Y, z, X, 3)]
[Figure: Function c, Function d and Function e, each labeled with its values for attributes 1 to 4: (Y, z, X+1, 1), (Y, z, X+1, 2), (Y, z, X+1, 3)]
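The slides label every function with one value for each of four attributes, and the functions shown together differ in a single attribute only. One plausible use of this information, given here as an assumption and not as the method of the library, is to compare functions that differ in one attribute, take the winner's value for that attribute, and drop every candidate with a different value. All names in the sketch (`func_desc`, `prune_by_attribute`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NATTR 4   /* attributes per function (four in the slides) */

/* Hypothetical record: one implementation and its attribute values. */
struct func_desc {
    const char *name;
    int attr[NATTR];
    int active;            /* 1 while the function is still a candidate */
};

/* Deactivate every candidate whose value for attribute `a` differs from
   `best_val`; returns the number of functions that remain active. */
size_t prune_by_attribute(struct func_desc *f, size_t n, int a, int best_val)
{
    assert(a >= 0 && a < NATTR);

    size_t remaining = 0;
    for (size_t i = 0; i < n; i++) {
        if (f[i].active && f[i].attr[a] != best_val)
            f[i].active = 0;
        if (f[i].active)
            remaining++;
    }
    return remaining;
}
```

Pruning one attribute at a time only makes sense if the best value of one attribute does not depend on the values of the others.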
Name                  aao/pair  Handling of non-contiguous data  Data transfer primitive
IsendIrecv_aao        aao       ddt                              MPI_Isend/Irecv/Waitall
IsendIrecv_pair       pair      ddt                              MPI_Isend/Irecv/Waitall
SendIrecv_aao         aao       ddt                              MPI_Send/Irecv/Waitall
SendIrecv_pair        pair      ddt                              MPI_Send/Irecv/Wait
IsendIrecv_aao_pack   aao       Pack/unpack                      MPI_Isend/Irecv/Waitall
IsendIrecv_pair_pack  pair      Pack/unpack                      MPI_Isend/Irecv/Waitall
SendIrecv_aao_pack    aao       Pack/unpack                      MPI_Send/Irecv/Waitall
SendIrecv_pair_pack   pair      Pack/unpack                      MPI_Send/Irecv/Wait
SendRecv_pair         pair      ddt                              MPI_Send/Recv
Sendrecv_pair         pair      ddt                              MPI_Send/Recv
SendRecv_pair_pack    pair      Pack/unpack                      MPI_Send/Recv
Sendrecv_pair_pack    pair      Pack/unpack                      MPI_Send/Recv
WinfencePut_aao       aao       ddt                              MPI_Put/MPI_Win_fence
WinfenceGet_aao       aao       ddt                              MPI_Get/MPI_Win_fence
PostStartPut_aao      aao       ddt                              MPI_Put/MPI_Win_post/start
PostStartGet_aao      aao       ddt                              MPI_Get/MPI_Win_post/start
WinfencePut_pair      pair      ddt                              MPI_Put/MPI_Win_fence
WinfenceGet_pair      pair      ddt                              MPI_Get/MPI_Win_fence
PostStartPut_pair     pair      ddt                              MPI_Put/MPI_Win_post/start
PostStartGet_pair     pair      ddt                              MPI_Get/MPI_Win_post/start
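The table separates implementations by how they handle non-contiguous data. Assuming "ddt" stands for MPI derived datatypes and "Pack/unpack" for copying the data through a contiguous buffer, the second variant can be illustrated in plain C without any MPI calls. The function names and the row-major layout are assumptions made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>

/* Copy column `col` of a row-major nrows x ncols array into the
   contiguous buffer `buf`, which must hold at least nrows elements. */
void pack_column(const double *grid, size_t nrows, size_t ncols,
                 size_t col, double *buf)
{
    assert(col < ncols);
    for (size_t r = 0; r < nrows; r++)
        buf[r] = grid[r * ncols + col];
}

/* Inverse operation: scatter the contiguous buffer back into column `col`. */
void unpack_column(double *grid, size_t nrows, size_t ncols,
                   size_t col, const double *buf)
{
    assert(col < ncols);
    for (size_t r = 0; r < nrows; r++)
        grid[r * ncols + col] = buf[r];
}
```

The sender would pack a boundary column and send `buf` as a contiguous message, and the receiver would unpack it into its halo column. With derived datatypes the strided layout is described to MPI instead, and no explicit copy appears in the application code.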
InfiniBand, 32 processes, small problem size
[Figure: bar chart with one bar per implementation (IsendIrecv_aao, SendIrecv_aao, IsendIrecv_pair, SendRecv_pair, SendIrecv_pair, Sendrecv_pair, IsendIrecv_aao_pack, SendIrecv_aao_pack, IsendIrecv_pair_pack, SendRecv_pair_pack, SendIrecv_pair_pack, Sendrecv_pair_pack) plus brute and hypo; axis ticks from 10.4 to 12.4]
InfiniBand, 32 processes, large problem size
[Figure: bar chart with one bar per implementation (IsendIrecv_aao, SendIrecv_aao, IsendIrecv_pair, SendRecv_pair, SendIrecv_pair, Sendrecv_pair, IsendIrecv_aao_pack, SendIrecv_aao_pack, IsendIrecv_pair_pack, SendRecv_pair_pack, SendIrecv_pair_pack, Sendrecv_pair_pack) plus brute and hypo; axis ticks from 72.5 to 77.5]
TCP over Fast Ethernet, 32 processes, small problem size
[Figure: bar chart of execution time [sec] with one bar per implementation (IsendIrecv_aao, SendIrecv_aao, IsendIrecv_pair, SendRecv_pair, SendIrecv_pair, Sendrecv_pair, IsendIrecv_aao_pack, SendIrecv_aao_pack, IsendIrecv_pair_pack, SendRecv_pair_pack, SendIrecv_pair_pack, Sendrecv_pair_pack) plus brute and hypo; axis ticks from 50 to 400]
TCP over Fast Ethernet, 32 processes, large problem size
[Figure: bar chart with one bar per implementation (IsendIrecv_aao, SendIrecv_aao, IsendIrecv_pair, SendRecv_pair, SendIrecv_pair, Sendrecv_pair, IsendIrecv_aao_pack, SendIrecv_aao_pack, IsendIrecv_pair_pack, SendRecv_pair_pack, SendIrecv_pair_pack, Sendrecv_pair_pack) plus brute and hypo; axis ticks from 50 to 450]
a challenging topic
– Hyper-threading
– Processor frequency scaling
– Process placement by the batch scheduler
– Attributes should not be correlated
– How much longer will we have to deal with MPI?
– Historic learning, game theory, genetic algorithms