Integrating I/O Measurement into Performance Optimisation and Productivity (POP) Metrics
PDSW 2019: 4TH INTERNATIONAL PARALLEL DATA SYSTEMS WORKSHOP
Background
What is POP?
− Center of Excellence that provides a service to analyze parallel codes
Integrating I/O Measurement into Performance Optimisation and Productivity (POP) Metrics PDSW 2019: 4TH INTERNATIONAL PARALLEL DATA SYSTEMS WORKSHOP Radita Liem
[Figure: Load Balance. y-axis: 0% to 100%; x-axis: 4, 9, 16. Series: Lustre - Skylake, NFS, Lustre - Broadwell, BeeGFS]
[Figure: Transfer Efficiency. y-axis: 0% to 100%; x-axis: 4, 9, 16. Series: Lustre - Skylake, NFS, Lustre - Broadwell, BeeGFS]
[Figure: Serialization Efficiency. y-axis: 0% to 100%; x-axis: 4, 9, 16. Series: Lustre - Skylake, NFS, Lustre - Broadwell, BeeGFS]
[Figure: General Efficiency. y-axis: 10% to 150%; x-axis: 4, 9, 16. Series: Lustre - Skylake, NFS, Lustre - Broadwell, BeeGFS]
− I/O time is not evaluated for the ideal situation in which the I/O transfer rate is not a bottleneck
& Broadwell has higher transfer rate than the other filesystem.
compared to the other filesystems.
compute cluster where Intel Skylake faster runtime
[Figure: NPB-IO Runtime (s). y-axis: 20 to 100; x-axis: 1, 4, 9, 16. Series: NFS - Broadwell, NFS - Skylake, Lustre - Skylake, Lustre - Broadwell, BeeGFS - Broadwell]
[Figure: NPB-IO Transfer Rate (MiB/s). y-axis: 200 to 1000; x-axis: 1, 4, 9, 16. Series: NFS - Broadwell, NFS - Skylake, Lustre - Skylake, Lustre - Broadwell, BeeGFS - Broadwell]
reading file and not for writing
both read and write
[Figure: NPB-IO Cumulative time in shared write (s). y-axis: 1 to 4; x-axis: 1, 4, 9, 16. Series: NFS - Broadwell, NFS - Skylake, Lustre - Skylake, Lustre - Broadwell, BeeGFS - Broadwell]
[Figure: NPB-IO Cumulative time in shared read (s). y-axis: 1 to 4; x-axis: 1, 4, 9, 16. Series: NFS - Broadwell, NFS - Skylake, Lustre - Skylake, Lustre - Broadwell, BeeGFS - Broadwell]
[Figure: NPB-IO Shared Read Proportion. x-axis: 1, 4, 9, 16. Series: NFS - Broadwell, NFS - Skylake, Lustre - Skylake, Lustre - Broadwell, BeeGFS - Broadwell. Data labels in extraction order (assignment to series not recoverable): 87%, 91%, 88%, 90%, 83%, 88%, 84%, 76%, 22%, 24%, 26%, 24%, 22%, 10%, 13%, 24%, 49%, 50%, 50%, 50%]
[Figure: CalculiX Runtime in Seconds. y-axis: 200 to 1000; x-axis: 1, 2, 4, 8. Series: $WORK - login-t, $WORK, $HPCWORK]
[Figure: CalculiX POSIX transfer speed. y-axis: 5 to 35; x-axis: 1, 2, 4, 8. Series: $WORK - login-t, $WORK, $HPCWORK]
[Figure: CalculiX STDIO transfer speed. y-axis: 50 to 400; x-axis: 1, 2, 4, 8. Series: $WORK - login-t, $WORK, $HPCWORK]
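[Figure: Shared reads cumulative I/O. y-axis: 0.2 to 1.4; x-axis: 1, 2, 4, 8. Series: $WORK - login-t, $WORK, $HPCWORK]
[Figure: Shared writes cumulative I/O. y-axis: 20 to 140; x-axis: 1, 2, 4, 8. Series: $WORK - login-t, $WORK, $HPCWORK]
[Figure: Shared Write Proportion. x-axis: 1, 2, 3, 4. Series: $WORK - login-t, $WORK, $HPCWORK. Data labels in extraction order (assignment to series not recoverable): 93.16%, 96.00%, 96.60%, 93.87%, 95.86%, 97.92%, 97.74%, 97.27%, 99.55%, 99.32%, 99.28%, 99.34%]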
− Measuring I/O computation time within shared file systems needs to consider cluster workloads, filesystem type, and the chosen programming model
− General Efficiency (POP's Global Efficiency): compound metric, parallel efficiency × computation efficiency
− Parallel Efficiency: compound metric, load balance × communication efficiency
− Serialization Efficiency: maximum computation time on ideal network / total runtime on ideal network
− Transfer Efficiency: total runtime on ideal network / total runtime on real network
− Computation Efficiency: ratios of total time in useful computation, summed over all processes
Source: https://pop-coe.eu/node/69
NPB-IO
− Subtype full: MPI I/O with collective buffering
− Size A, B, C
− Compiled with Intel compiler 2018.4
CalculiX
− Open source finite element analysis application
− POSIX I/O
− Compiled with Intel compiler 2018.4
Intel Skylake
− Filesystems: NFS, Lustre
Intel Broadwell
− Filesystems: NFS, Lustre, BeeGFS
[Figure: Load Balance, three panels (Class C, Class B, Class A). y-axis: 0% to 100%; x-axis: 4, 9, 16. Series: Lustre - Skylake, NFS, Lustre - Broadwell, BeeGFS]
[Figure: Transfer Efficiency, three panels (Class C, Class B, Class A). y-axis: 0% to 100%; x-axis: 4, 9, 16. Series: Lustre - Skylake, NFS, Lustre - Broadwell, BeeGFS]
[Figure: Serialization Efficiency, three panels (Class C, Class B, Class A). y-axis: 0% to 100%; x-axis: 4, 9, 16. Series: Lustre - Skylake, NFS, Lustre - Broadwell, BeeGFS]
[Figure: General Efficiency, three panels (Class A, Class B, Class C). y-axis: 0% to 150%; x-axis: 4, 9, 16. Series: Lustre - Skylake, NFS, Lustre - Broadwell, BeeGFS]