Finding and Optimizing Phases in Parallel Programs
Ray Chen <rchen@cs.umd.edu> Jeffrey K. Hollingsworth <hollings@cs.umd.edu>
Scalable Tools Workshop 2016
Motivation
HPC programs often contain phases
Dynamic execution
8/2/16 Finding and Optimizing Phases in Parallel Programs: Scalable Tools Workshop
while (locDom->time() < locDom->stoptime()) {
    TimeIncrement(*locDom);
    LagrangeLeapFrog(*locDom);
}
#include <caliper/Annotation.h>

while (locDom->time() < locDom->stoptime()) {
    // Mark the communication phase for the tuner.
    cali::Annotation region1("tuner.communication");
    region1.begin();
    TimeIncrement(*locDom);
    region1.end();

    // Mark the computation phase for the tuner.
    cali::Annotation region2("tuner.computation");
    region2.begin();
    LagrangeLeapFrog(*locDom);
    region2.end();
}
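To make concrete what the begin/end markers give a tuner, here is a minimal sketch of a scoped region that records how often each named phase runs and how long it takes. It does not use Caliper: PhaseLog, ScopedPhase and run_steps are hypothetical names introduced only for illustration, and the loop body stands in for the TimeIncrement and LagrangeLeapFrog calls.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper (not Caliper): per-phase totals and entry counts.
class PhaseLog {
public:
    void add(const std::string& phase, double seconds) {
        total_[phase] += seconds;
        ++count_[phase];
    }
    double total(const std::string& phase) const {
        auto it = total_.find(phase);
        return it == total_.end() ? 0.0 : it->second;
    }
    int count(const std::string& phase) const {
        auto it = count_.find(phase);
        return it == count_.end() ? 0 : it->second;
    }
private:
    std::map<std::string, double> total_;
    std::map<std::string, int> count_;
};

// RAII region: timing starts at construction and is recorded at destruction,
// so a phase cannot be left open by a forgotten end() call.
class ScopedPhase {
public:
    ScopedPhase(PhaseLog& log, std::string name)
        : log_(log), name_(std::move(name)),
          start_(std::chrono::steady_clock::now()) {}
    ~ScopedPhase() {
        std::chrono::duration<double> dt =
            std::chrono::steady_clock::now() - start_;
        log_.add(name_, dt.count());
    }
    ScopedPhase(const ScopedPhase&) = delete;
    ScopedPhase& operator=(const ScopedPhase&) = delete;
private:
    PhaseLog& log_;
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

// Same shape as the annotated loop: each step enters both phases once.
inline void run_steps(PhaseLog& log, int steps) {
    for (int i = 0; i < steps; ++i) {
        { ScopedPhase p(log, "tuner.communication"); /* TimeIncrement(...) */ }
        { ScopedPhase p(log, "tuner.computation");   /* LagrangeLeapFrog(...) */ }
    }
}
```

After run_steps(log, 3), both phase names have an entry count of 3. The per-phase totals are the kind of measurement a tuner could use as feedback.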
[Figure: actual timeline alongside two contextual timelines]
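One possible reading of the actual and contextual timelines, stated here as an assumption rather than as the authors' definition: the actual timeline is the interleaved sequence of phase executions as they occur, and a contextual timeline is the subsequence that belongs to a single phase. Under that assumption, the split can be sketched as follows. PhaseEvent and contextual_timelines are hypothetical names.

```cpp
#include <map>
#include <string>
#include <vector>

// One execution of a phase as observed on the actual (interleaved) timeline.
struct PhaseEvent {
    std::string phase;
    double seconds;
};

// Split the actual timeline into one timeline per phase, preserving the
// order in which each phase's executions occurred.
inline std::map<std::string, std::vector<double>>
contextual_timelines(const std::vector<PhaseEvent>& actual) {
    std::map<std::string, std::vector<double>> out;
    for (const PhaseEvent& e : actual) {
        out[e.phase].push_back(e.seconds);
    }
    return out;
}
```

With this split, a tuner attached to one phase sees only that phase's measurements, in order, as if the other phases were not running in between.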
My application has three phases
I know what variables affect MPI performance
I know what variables affect BLAS performance
I know what variables affect FFTW performance
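If each phase has its own tuning variables, each phase can also get its own independent search. The sketch below assumes one integer variable per phase and a plain exhaustive search, chosen only for brevity and not taken from the talk. PhaseTuner and its methods are hypothetical names.

```cpp
#include <cstddef>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-phase tuner: tries every candidate value once, then
// keeps returning the value that gave the shortest phase time.
class PhaseTuner {
public:
    explicit PhaseTuner(std::vector<int> candidates)
        : candidates_(std::move(candidates)) {}

    // Value to use for the next execution of this phase.
    int next() const {
        return next_ < candidates_.size() ? candidates_[next_] : best_value_;
    }

    // Report the phase time measured with the value returned by next().
    void report(double seconds) {
        if (next_ >= candidates_.size()) return;  // search already finished
        if (seconds < best_time_) {
            best_time_ = seconds;
            best_value_ = candidates_[next_];
        }
        ++next_;
    }

    bool converged() const { return next_ >= candidates_.size(); }

private:
    std::vector<int> candidates_;
    std::size_t next_ = 0;
    int best_value_ = 0;
    double best_time_ = std::numeric_limits<double>::infinity();
};
```

Keeping one PhaseTuner per phase name, for example in a std::map<std::string, PhaseTuner>, lets the MPI, BLAS and FFTW searches advance independently even when the phases interleave at run time.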
[Figure: 3D FFT pipeline, two variants. Non-blocking variant: stages FFTz, FFTy1, FFTx, FFTy2 with non-blocking exchanges A2A1 and A2A2. Blocking variant: stages FFTz, FFTy, FFTx with blocking exchanges A2A1 and A2A2.]
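[Figure: detail of the non-blocking 3D FFT pipeline. Step labels: FFTz & Pack; Unpack & FFTy1. Buffer labels: Px1, Py1 (axes x, y; T1, Ny / p2) and Ux1, Uz1 (axes x, z; T1, Nz / p2). Tile labels: T1, T1, T2. Stages: FFTz, FFTy1, FFTx, A2A1 (non-blocking), A2A2 (non-blocking), FFTy2.]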