Integrating I/O Measurement into Performance Optimisation and Productivity (POP) Metrics



SLIDE 1

Integrating I/O Measurement into Performance Optimisation and Productivity (POP) Metrics

PDSW 2019: 4TH INTERNATIONAL PARALLEL DATA SYSTEMS WORKSHOP

SLIDE 2

Integrating I/O Measurement into Performance Optimisation and Productivity (POP) Metrics PDSW 2019: 4TH INTERNATIONAL PARALLEL DATA SYSTEMS WORKSHOP Radita Liem


Background

  • What is POP?

POP is a Center of Excellence that provides a service to analyze parallel codes for academia and industry within the European Union, to promote best practice in parallel programming. The goal of the current POP metrics is to break down the components affecting performance in a way that is easy to read and understand.

  • Unfortunately…

I/O is not yet considered in this model.

SLIDE 3

Methodology

SLIDE 4

Current Impact on I/O Metrics with Collective IO Buffering (1)

[Figure: Load Balance. y-axis: 0% to 100%; x-axis: 4, 9, 16. Series: Lustre - Skylake, NFS, Lustre - Broadwell, BeeGFS.]

SLIDE 5

Current Impact on I/O Metrics with Collective IO Buffering (2)

[Figure: Transfer Efficiency. y-axis: 0% to 100%; x-axis: 4, 9, 16. Series: Lustre - Skylake, NFS, Lustre - Broadwell, BeeGFS.]

SLIDE 6

Current Impact on I/O Metrics with Collective IO Buffering (3)

[Figure: Serialization Efficiency. y-axis: 0% to 100%; x-axis: 4, 9, 16. Series: Lustre - Skylake, NFS, Lustre - Broadwell, BeeGFS.]

SLIDE 7

Current Impact on I/O Metrics with Collective IO Buffering (4)

[Figure: General Efficiency. y-axis ticks up to 150%; x-axis: 4, 9, 16. Series: Lustre - Skylake, NFS, Lustre - Broadwell, BeeGFS.]

SLIDE 8

Initial Conclusion & Next Steps

  • For the MPI-IO case with collective buffering, the difference between file systems appears in serialization efficiency. This is because:

− I/O time is not evaluated under the ideal situation, in which the I/O transfer rate is not a problem

  • Performing more tests on various applications with different I/O sizes and patterns.
  • Evaluating tools and methodologies to generate information that can represent the new I/O metric.

SLIDE 9

Addendum

Additional Results & Information

SLIDE 10

Darshan I/O Results for NAS Parallel Benchmark (1)

  • The Lustre filesystems on both Skylake and Broadwell have a higher transfer rate than the other filesystems.
  • This contributes to a shorter runtime compared to the other filesystems.
  • We can also see the impact of the compute cluster: Intel Skylake has the faster runtime.

[Figure: NPB-IO Runtime (s). y-axis ticks: 20 to 100; x-axis: 1, 4, 9, 16. Series: NFS - Broadwell, NFS - Skylake, Lustre - Skylake, Lustre - Broadwell, BeeGFS - Broadwell.]

[Figure: NPB-IO Transfer Rate (MiB/s). y-axis ticks: 200 to 1000; x-axis: 1, 4, 9, 16. Series: NFS - Broadwell, NFS - Skylake, Lustre - Skylake, Lustre - Broadwell, BeeGFS - Broadwell.]

SLIDE 11

Darshan I/O Results for NAS Parallel Benchmark (2)

  • Lustre performs well when reading files, but not when writing.
  • BeeGFS shows a balanced proportion of read and write.

[Figure: NPB-IO Cumulative time in shared write (s). y-axis ticks: 1 to 4; x-axis: 1, 4, 9, 16. Series: NFS - Broadwell, NFS - Skylake, Lustre - Skylake, Lustre - Broadwell, BeeGFS - Broadwell.]

[Figure: NPB-IO Cumulative time in shared read (s). y-axis ticks: 1 to 4; x-axis: 1, 4, 9, 16. Series: NFS - Broadwell, NFS - Skylake, Lustre - Skylake, Lustre - Broadwell, BeeGFS - Broadwell.]

NPB-IO Shared Read Proportion (data labels, in legend order)

Series                 1      4      9      16
NFS - Broadwell        87%    91%    88%    90%
NFS - Skylake          83%    88%    84%    76%
Lustre - Skylake       22%    24%    26%    24%
Lustre - Broadwell     22%    10%    13%    24%
BeeGFS - Broadwell     49%    50%    50%    50%
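The shared read proportion shown on this slide can be read as the share of cumulative shared I/O time spent in reads. The slide does not state the formula, so the following is only a minimal Python sketch under the assumption that the proportion is read time divided by the sum of read and write time. The function name and the sample timings are hypothetical and are not values from the Darshan logs.

```python
from typing import Tuple


def shared_io_proportions(read_time_s: float, write_time_s: float) -> Tuple[float, float]:
    """Return (read share, write share) of the cumulative shared I/O time.

    Assumption: proportion = time / (read time + write time). The slide
    does not give the formula, so this definition is a guess.
    """
    if read_time_s < 0 or write_time_s < 0:
        raise ValueError("cumulative times must be non-negative")
    total = read_time_s + write_time_s
    if total <= 0:
        raise ValueError("no shared I/O time recorded")
    return read_time_s / total, write_time_s / total


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    read_share, write_share = shared_io_proportions(1.0, 3.0)
    print(f"read: {read_share:.0%}, write: {write_share:.0%}")
```

Under this reading, a value close to 50% means read and write take about equal time, while a low read share means most of the shared I/O time is spent writing.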

SLIDE 12

CalculiX I/O Results (1)

[Figure: CalculiX Runtime in Seconds. y-axis ticks: 200 to 1000; x-axis: 1, 2, 4, 8. Series: $WORK - login-t, $WORK, $HPCWORK.]

[Figure: CalculiX POSIX transfer speed. y-axis ticks: 5 to 35; x-axis: 1, 2, 4, 8. Series: $WORK - login-t, $WORK, $HPCWORK.]

[Figure: CalculiX STDIO transfer speed. y-axis ticks: 50 to 400; x-axis: 1, 2, 4, 8. Series: $WORK - login-t, $WORK, $HPCWORK.]

  • Good efficiency based on POP metrics
  • The Lustre filesystem on $HPCWORK performs worse than the other filesystems.
  • Initial hypothesis: POSIX data transfer is mainly for writing, and Lustre shared write performance is slower.

SLIDE 13

CalculiX I/O Results (2)

[Figure: Shared reads cumulative I/O. y-axis ticks: 0.2 to 1.4; x-axis: 1, 2, 4, 8. Series: $WORK - login-t, $WORK, $HPCWORK.]

[Figure: Shared writes cumulative I/O. y-axis ticks: 20 to 140; x-axis: 1, 2, 4, 8. Series: $WORK - login-t, $WORK, $HPCWORK.]

Shared Write Proportion (data labels, in legend order; x-axis labels as given on the slide)

Series              1         2         3         4
$WORK - login-t     93.16%    96.00%    96.60%    93.87%
$WORK               95.86%    97.92%    97.74%    97.27%
$HPCWORK            99.55%    99.32%    99.28%    99.34%

  • Lustre performs badly when writing files, and the CalculiX program creates and writes into 5 files continuously.
  • This is a case where the filesystem type affects performance: in the runtime results on the previous slide, $HPCWORK is the slowest of the three.

SLIDE 14

Background

  • I/O optimization of HPC applications is increasingly important.
  • The topic is challenging because many moving variables make measurement difficult.

− Measuring I/O time within shared file systems needs to account for cluster workload, filesystem type, and the chosen programming model

  • POP is a Center of Excellence that provides a service to analyze parallel codes for academia and industry within the European Union, to promote best practice in parallel programming.
  • The goal of the current POP metrics is to break down the components affecting performance in a way that is easy to read and understand. The new I/O performance metrics should conform to this model.

SLIDE 15

POP Metrics Explanation

  • General Efficiency: compound metric, parallel efficiency * computation efficiency
  • Parallel Efficiency: compound metric, load balance * communication efficiency
  • Load Balance: average computation time / maximum computation time
  • Communication Efficiency: maximum computation time / total runtime
  • Serialization Efficiency: maximum computation time on ideal network / total runtime on ideal network
  • Transfer Efficiency: total runtime on ideal network / total runtime on real network
  • Computation Efficiency: ratios of total time in useful computation summed over all processes

Source: https://pop-coe.eu/node/69
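The ratios listed on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the POP tooling: the function names, the `PopMetrics` container and the sample timings are hypothetical, and the per-process computation times and the ideal-network runtime are assumed to come from a trace and a network simulation that are not shown here.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class PopMetrics:
    load_balance: float
    communication_efficiency: float
    parallel_efficiency: float


def pop_parallel_metrics(computation_times: Sequence[float],
                         total_runtime: float) -> PopMetrics:
    """Compute the ratios from the slide.

    computation_times: useful computation time of each process (seconds).
    total_runtime: wall-clock runtime of the whole run (seconds).
    """
    if not computation_times or total_runtime <= 0:
        raise ValueError("need at least one process and a positive runtime")
    max_time = max(computation_times)
    if max_time <= 0:
        raise ValueError("maximum computation time must be positive")
    avg_time = sum(computation_times) / len(computation_times)
    load_balance = avg_time / max_time          # average / maximum
    comm_eff = max_time / total_runtime         # maximum / total runtime
    return PopMetrics(load_balance, comm_eff, load_balance * comm_eff)


def split_communication_efficiency(max_computation_time_ideal: float,
                                   runtime_ideal: float,
                                   runtime_real: float) -> Tuple[float, float]:
    """Return (serialization efficiency, transfer efficiency).

    The *_ideal inputs are assumed to come from replaying the run on an
    ideal network; how they are obtained is outside this sketch.
    """
    if runtime_ideal <= 0 or runtime_real <= 0:
        raise ValueError("runtimes must be positive")
    serialization = max_computation_time_ideal / runtime_ideal
    transfer = runtime_ideal / runtime_real
    return serialization, transfer


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    m = pop_parallel_metrics([8.0, 9.0, 10.0, 9.0], total_runtime=12.5)
    print(f"load balance {m.load_balance:.2f}, "
          f"communication efficiency {m.communication_efficiency:.2f}, "
          f"parallel efficiency {m.parallel_efficiency:.2f}")
    ser, trf = split_communication_efficiency(10.0, 11.0, 12.5)
    print(f"serialization {ser:.2f}, transfer {trf:.2f}")
```

If the maximum computation time is the same on the ideal and the real network, serialization efficiency times transfer efficiency equals communication efficiency, which is why the two are usually read as its sub-metrics.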

SLIDE 16

Test Case Environment

Software Information:

  • NAS Parallel Benchmark

− Subtype full: MPI I/O with collective buffering
− Size A, B, C
− Compiled with Intel compiler 2018.4

  • CalculiX

− Open source finite element analysis application
− POSIX I/O
− Compiled with Intel compiler 2018.4

Hardware:

  • RWTH Aachen University CLAIX18 compute cluster

− Intel Skylake
− Filesystems: NFS, Lustre

  • RWTH Aachen University CLAIX16 compute cluster

− Intel Broadwell
− Filesystems: NFS, Lustre, BeeGFS

SLIDE 17

Current Impact on I/O Metrics (1)

[Figure: Load Balance, one panel each for Class C, Class B, and Class A. y-axis: 0% to 100%; x-axis: 4, 9, 16. Series: Lustre - Skylake, NFS, Lustre - Broadwell, BeeGFS.]

SLIDE 18

Current Impact on I/O Metrics (2)

[Figure: Transfer Efficiency, one panel each for Class C, Class B, and Class A. y-axis: 0% to 100%; x-axis: 4, 9, 16. Series: Lustre - Skylake, NFS, Lustre - Broadwell, BeeGFS.]

SLIDE 19

Current Impact on I/O Metrics (3)

[Figure: Serialization Efficiency, one panel each for Class C, Class B, and Class A. y-axis: 0% to 100%; x-axis: 4, 9, 16. Series: Lustre - Skylake, NFS, Lustre - Broadwell, BeeGFS.]

SLIDE 20

Current Impact on I/O Metrics (4)

[Figure: General Efficiency, one panel each for Class A, Class B, and Class C. y-axis: 0% to 150%; x-axis: 4, 9, 16. Series: Lustre - Skylake, NFS, Lustre - Broadwell, BeeGFS.]