Validation of Dimemas communication model for MPI collective operations

Sergi Girona, Jesús Labarta, Rosa M. Badia
European Center for Parallelism of Barcelona, Departament d'Arquitectura de Computadors, Technical University of Catalonia
SLIDE 1

Validation of Dimemas communication model for MPI collective operations

Sergi Girona, Jesús Labarta, Rosa M. Badia

European Center for Parallelism of Barcelona Departament d´Arquitectura de Computadors Technical University of Catalonia Barcelona, Spain

Sergi Girona, EuroPVM/MPI’2000

Dimemas

Application performance analysis tool for message passing programs
  • In development since 1992
  • Runs on a workstation
  • Currently distributed by CEPBA

SLIDE 2


Tuning Methodology

[Diagram: message passing code, instrumented with tracing facilities and an MP library (MPI, PVM, etc.), runs on a sequential or parallel machine to produce a Dimemas trace file; DIMEMAS simulates it and emits a trace file for Paraver visualization and analysis, which feeds back into parameters modification and code modification]


Tracefile

Characterizes the application:
  • Sequence of resource demands for each task
  • Sequence of events: communication

Application model

SLIDE 3


Simulated Architecture

“Abstract” architecture
  • Simple/general
  • Network of SMPs
  • Fast simulation

Key factors influencing performance; abstract interconnect:
  • Local/remote latency and bandwidth
  • Injection mechanism (#links, half/full duplex)
  • Bisection bandwidth, contention

[Figure: SMP nodes (CPUs plus local memory) attached by L links to B shared buses]


System

Process-to-processor mapping

Multiprogramming:
  • Tasks sharing a node
  • Different applications

SLIDE 4


Point to Point Communication

Latency, bandwidth, resource contention

    T = Latency + Size / Bandwidth

[Figure: sender/receiver timing diagram, instants ta–tg marked on timelines T0 and T1]
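The point-to-point formula can be sketched directly; the default parameter values below are the latency and bandwidth characterized later in the talk:

```python
def p2p_time(size_bytes, latency_s=25e-6, bandwidth_bps=87.5e6):
    """Dimemas point-to-point model: T = Latency + Size / Bandwidth."""
    return latency_s + size_bytes / bandwidth_bps

# A zero-byte message costs exactly the latency;
# 875 kB adds 10 ms of transfer time on top of it.
```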


Collective Communication Model

  • Barrier
  • Fan-in/fan-out phases
  • Size of message: Null/Const/Lin/Log
  • Processor time
  • Block time
  • Comm. time
SLIDE 5


Collective Communication Model

Communication time:

    Time = (Latency + Size / Bandwidth) * MODEL_FACTOR

Model factor:
  • Null → 0
  • Constant → 1
  • Linear → P
  • Logarithmic → Nsteps = Σ_{i=1..⌈log₂ P⌉} steps_i, where steps_i depends on the concurrent communications (C) and the buses (B) available at step i
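A minimal sketch of the factor table above; the logarithmic entry is reduced to ⌈log₂ P⌉, i.e. the per-step bus contention term is deliberately omitted:

```python
import math

def collective_time(size_bytes, p, model,
                    latency_s=25e-6, bandwidth_bps=87.5e6):
    """Time = (Latency + Size/Bandwidth) * MODEL_FACTOR."""
    factor = {
        "NULL": 0,                       # no communication cost
        "CONST": 1,                      # one transfer regardless of P
        "LIN": p,                        # P sequential transfers
        "LOG": math.ceil(math.log2(p)),  # log2(P) steps, no contention
    }[model]
    return (latency_s + size_bytes / bandwidth_bps) * factor
```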


Parameters Acquisition

Execution of PBM on SGI Origin
  • Dedicated: execution time
  • Shared: traces for Dimemas

Compute latency, bandwidth, links, buses, phases, ...

ST-ORM: http://www.cepba.upc.es/ST-ORM
Objective: predicted time with less than 10% error

[Diagram: iterative loop — Dimemas predicts the execution time from the current parameters; while the predicted time differs from the dedicated run by more than 10%, the parameters are refined]
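The acquisition loop can be sketched as a simple parameter search; the function names and the `simulate` callback are hypothetical stand-ins (ST-ORM automates this kind of search over the real simulator):

```python
def relative_error(predicted_s, measured_s):
    """Relative deviation of the prediction from the dedicated run."""
    return abs(predicted_s - measured_s) / measured_s

def fit_parameters(measured_s, simulate, latencies, bandwidths, tol=0.10):
    """Return the first (latency, bandwidth) pair whose simulated
    prediction falls within tol (the 10% objective) of the measured time."""
    for lat in latencies:
        for bw in bandwidths:
            if relative_error(simulate(lat, bw), measured_s) <= tol:
                return lat, bw
    return None  # no candidate met the error objective
```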

SLIDE 6


System Characterization

< 10% error regions

[Charts: bandwidth (≈40–200 MB/s) and latency (≈25–100 µs) regions yielding <10% error for each PBM benchmark: Allgather, Allgatherv, Allreduce, Alltoall, Barrier, BCast, Exchange, PingPing, PingPong, Reduce, Reduce_scatter, SendRecv]


System Characterization

  • Latency = 25 µs
  • Bandwidth = 87.5 MB/s
  • 1 half-duplex link per node

[Table: for each collective operation (Barrier, Bcast, Gather, Gatherv, Scatter, Scatterv, Allgather, Allgatherv, Alltoall, Alltoallv, Reduce, Allreduce, Reduce_Scatter, Scan), the fan-in (IN) and fan-out (OUT) phases are assigned a message-size function (NULL, MIN, MEAN, MAX, or 2×MAX) and a model (LIN or LOG)]
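A sketch of how a size function and a model factor combine for one phase. The specific pairing used in the test (MAX size with a LOG model) is illustrative only; the actual per-operation assignments are those of the table:

```python
import math

# Size functions reduce the per-task message sizes to one value.
SIZE_FN = {
    "MIN": min,
    "MAX": max,
    "MEAN": lambda s: sum(s) / len(s),
    "2MAX": lambda s: 2 * max(s),
}

def phase_time(per_task_sizes, size_fn, model,
               latency_s=25e-6, bandwidth_bps=87.5e6):
    """Time of one fan-in/fan-out phase over P = len(per_task_sizes) tasks."""
    p = len(per_task_sizes)
    size = SIZE_FN[size_fn](per_task_sizes)
    factor = {"NULL": 0, "LIN": p, "LOG": math.ceil(math.log2(p))}[model]
    return (latency_s + size / bandwidth_bps) * factor
```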

SLIDE 7


Influence of Buses

[Charts: execution time versus number of buses (1–15) for SendRecv, Exchange, Allgather, and Reduce_scatter]


Validation

NAS benchmarks
  • Classes W, A
  • Sizes: 8/9, 16, 25/32

[Chart: prediction error (0%–40%) per NAS benchmark — BT, CG, FFT, IS, LU, MG, SP — with 10% and 20% reference lines]

SLIDE 8


Conclusions

  • Simple but accurate formulation for collective communication
  • Methodology for model validation
  • Dimemas is a valid tool for performance analysis of message passing programs, parallel machines, and message passing libraries
  • Future: RMA and I/O operations pending validation