Time Stamp Synchronization for Event Traces of Large-Scale Message Passing Applications
D. Becker and F. Wolf, Forschungszentrum Jülich, Central Institute for Applied Mathematics
R. Rabenseifner, High Performance Computing Center Stuttgart
Daniel Becker 2
Introduction
Event model and replay-based parallel analysis
Controlled logical clock
Extended controlled logical clock
Timestamp synchronization
Conclusion
Future work
Goal - diagnose wait states in large-scale MPI applications
Scalability through parallel analysis of event traces
[Figure: analysis workflow with the stages execution on parallel machine, local trace files, parallel trace analyzer, and trace analysis report]
[Figure: time-line diagrams (axes: time, process) of four wait-state patterns: (a) late sender, (b) late receiver, (c) late sender / wrong order, (d) wait at n-to-n. Event types shown: ENTER, EXIT, SEND, RECV, COLLEXIT]
Wait-state diagnosis measures temporal displacements
Problem - local processor clocks are often non-synchronized
Present approach - linear interpolation
Inaccuracies and changing drifts can still cause clock condition violations
Requirement - realistic message passing codes
Build on controlled logical clock by Rolf Rabenseifner
Approach
Event includes at least timestamp, location
Event type refers to
Event sequence recorded for typical MPI operations
Event types: E = Enter, X = Exit, CX = Collective Exit, S = Send, R = Receive
MPI_Send(): E S X
MPI_Recv(): E R X
MPI_Allreduce(): E CX
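A minimal Python sketch of this event model may help make it concrete. It assumes that each event carries only a timestamp, a location, and a type, and that the events inside a call are recorded in the order Enter, Send or Receive, Exit (Enter, Collective Exit for collectives). The names `Event`, `EventType`, `SEQUENCES`, and `record` are illustrative and not taken from the slides.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    ENTER = "E"
    EXIT = "X"
    COLLEXIT = "CX"
    SEND = "S"
    RECV = "R"


@dataclass
class Event:
    timestamp: float   # local clock reading
    location: int      # process that recorded the event
    type: EventType


# Assumed event order per MPI call (hypothetical table, see lead-in).
SEQUENCES = {
    "MPI_Send": [EventType.ENTER, EventType.SEND, EventType.EXIT],
    "MPI_Recv": [EventType.ENTER, EventType.RECV, EventType.EXIT],
    "MPI_Allreduce": [EventType.ENTER, EventType.COLLEXIT],
}


def record(call, location, timestamps):
    """Build the event sequence of one MPI call on one location."""
    types = SEQUENCES[call]
    if len(types) != len(timestamps):
        raise ValueError("one timestamp per event expected")
    return [Event(t, location, ty) for t, ty in zip(timestamps, types)]
```

For example, `record("MPI_Send", 0, [1.0, 1.1, 1.2])` yields three events of types E, S, and X on location 0.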
Waiting time due to inherent synchronization in N-to-N operations
Algorithm:
[Figure: time-line diagram (axes: time, location) illustrating the algorithm; the numeric step labels are not recoverable]
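The steps of the algorithm are not preserved in this transcript. As a rough illustration only, the sketch below assumes the common definition of this wait state: each process waits from its own Enter event until the last participant has entered the operation. The function name `wait_at_nxn` is hypothetical. In a parallel replay the maximum would presumably be obtained with a collective reduction rather than a local `max`.

```python
def wait_at_nxn(enter_times):
    """Waiting time per process for one N-to-N operation.

    enter_times[p] is the Enter timestamp of the operation on process p.
    Assumption: nobody can leave before the last process has entered.
    """
    latest = max(enter_times)
    return [latest - t for t in enter_times]
```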
Guarantees Lamport's clock condition
Scans event trace for clock condition violations and corrects them
Stretches process-local time axis in the immediate vicinity of the corrected event
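The scan for violations can be sketched as follows. The sketch assumes that matched messages are available as pairs of send and receive timestamps and that a receive must not be earlier than its send plus a minimum message latency. The function name `find_violations` and the parameter `mu_min` are illustrative, not taken from the slides.

```python
def find_violations(messages, mu_min=0.0):
    """Return the indices of messages that violate the clock condition.

    messages: list of (send_timestamp, recv_timestamp) pairs.
    mu_min:   assumed minimum message latency.
    """
    return [i for i, (send_ts, recv_ts) in enumerate(messages)
            if recv_ts < send_ts + mu_min]
```

With `mu_min=0.5`, the message `(5.0, 4.5)` is reported because it appears to be received before it was sent.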
[Figure: inconsistent event stream compared with the corrected and forward amortized event stream]
[Figure: forward amortized event stream compared with the forward and backward amortized event stream]
Consider single collective operation as composition of point-to-point communications
Distinguish between different types
Determine send and receive events for each type
Define happened-before relations based on
1xN: Root sends data to N processes
Nx1: N processes send data to root
NxN: N processes send data to N processes
Synchronization needs one send event timestamp
Operation may have multiple send and receive events
Multiple receives used to synchronize multiple clocks
Latest send event is the relevant send event
Example: N-to-1 root
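How the relevant send event might be selected for the three collective types can be sketched as below. The sketch assumes one send timestamp per participating process and returns, for every receiving process, the timestamp its receive event has to follow. Which trace events play the send and receive roles is not stated here, so `send_ts` is simply a list of assumed send timestamps. The function name and the type labels are illustrative.

```python
def relevant_sends(kind, send_ts, root=0):
    """Map each receiving process to its relevant send timestamp.

    kind:    "1xN", "Nx1", or "NxN".
    send_ts: send_ts[p] is the (assumed) send timestamp on process p.
    """
    n = len(send_ts)
    if kind == "1xN":
        # Only the root sends; every other process receives from it.
        return {p: send_ts[root] for p in range(n) if p != root}
    if kind == "Nx1":
        # The root receives from all others; the latest send is relevant.
        return {root: max(ts for p, ts in enumerate(send_ts) if p != root)}
    if kind == "NxN":
        # Everyone receives from everyone; the latest send is relevant.
        latest = max(send_ts)
        return {p: latest for p in range(n)}
    raise ValueError(f"unknown collective type: {kind}")
```

In the N-to-1 case with `send_ts = [1.0, 4.0, 2.5]` and root 0, the root's receive is synchronized against 4.0, the latest of the non-root sends.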
New timestamp LC’ is maximum of
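The list of terms is not preserved in this transcript. The sketch below assumes the form in which the controlled logical clock is usually stated: the corrected timestamp is the maximum of the original timestamp, the previous corrected timestamp plus a minimum event spacing, the previous corrected timestamp plus a scaled copy of the original interval, and, for receive events, the corrected send timestamp plus a minimum latency. The parameter names `mu`, `delta`, and `gamma` and their default values are illustrative assumptions.

```python
def corrected_timestamp(lc, lc_prev, lc_prev_corr,
                        send_corr=None, mu=0.0, delta=1e-6, gamma=1.0):
    """Return the corrected timestamp LC' of one event (assumed form).

    lc:           original timestamp of this event
    lc_prev:      original timestamp of the previous event on this process
    lc_prev_corr: corrected timestamp of that previous event
    send_corr:    corrected timestamp of the matching send (receives only)
    """
    candidates = [
        lc,                                      # never move an event backwards
        lc_prev_corr + delta,                    # keep a minimum event spacing
        lc_prev_corr + gamma * (lc - lc_prev),   # preserve the local interval
    ]
    if send_corr is not None:
        candidates.append(send_corr + mu)        # clock condition for receives
    return max(candidates)
```

For a receive recorded at 4.5 whose matching send was corrected to 5.0, a latency of 0.5 moves the receive to 5.5.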
Approximates original communication
Limits synchronization error
Bounds propagation during forward amortization
Requires global view of the trace data
[Figure: backward amortization. Plot of the correction LCi' - LCi^b for a sequence of events labeled R, S, and I, where LCi^b := LCi' without the jump. Curves: results of the extended controlled logical clock with jump discontinuities; linear interpolation with backward amortization; piecewise linear interpolation with backward amortization. Annotations: jump discontinuity due to a clock condition violation; bound min(LCk' (corresponding receive event) - µ - LCi^b); amortization interval]
Event tracing of applications running on thousands of
Proposed algorithm depends on accuracy of
Two-step synchronization scheme
Account for differences in offset and drift
Assume that drift is not time dependent
Offset measurement at program initialization and finalization
Linear interpolation between these two points
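The interpolation step can be sketched as follows. The sketch assumes two offset measurements against a master clock, one near the start and one near the end of the run, with each offset defined as master time minus local time, and a constant drift in between. The function name `make_corrector` and its parameters are illustrative.

```python
def make_corrector(t_init, offset_init, t_final, offset_final):
    """Build a function that maps local timestamps to master time.

    t_init, t_final:           local times of the two offset measurements
    offset_init, offset_final: measured offsets (master minus local)
    """
    # Constant drift: change of offset per unit of local time.
    drift = (offset_final - offset_init) / (t_final - t_init)

    def correct(t_local):
        return t_local + offset_init + drift * (t_local - t_init)

    return correct
```

With offsets of 1.0 at local time 0.0 and 2.0 at local time 100.0, a local timestamp of 50.0 maps to about 51.5. Because the drift is assumed constant, residual errors from changing drifts remain and are left to the logical clock correction.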
Extended controlled logical clock
Parallel traversal of the event stream
Exchange required timestamp at synchronization points
Perform clock correction
Apply control mechanism after replaying the
Timestamps exchanged depending on the type of
Extended controlled logical clock algorithm takes
Parallel implementation design presented using
Finish actual implementation
Evaluate algorithm using real message passing codes
Extend algorithm to shared memory
Extend algorithm to one-sided communication