Active Learning-based Automatic Tuning and Prediction of Parallel I/O Performance
Megha Agarwal, Divyansh Singhvi, Preeti Malakar, Suren Byna
Indian Institute of Technology Kanpur, India; Lawrence Berkeley Laboratory, USA
PDSW @ SC'19, November 18, 2019
A few applications achieve less than 1% of the I/O throughput capacity of file systems.
Source: Huong Luu, et al., “A Multiplatform Study of I/O Behavior on Petascale Supercomputers”, HPDC '15.
75% of applications achieve less than 1 GB/s I/O throughput.
Application
HDF5 (alignment, chunking, etc.)
MPI I/O (enabling collective buffering, sieving buffer size, collective buffer size, collective buffer nodes, etc.)
Parallel File System (number of I/O nodes, stripe size, enabling prefetching buffer, etc.)
Storage Hardware
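To give a feel for how quickly the tunable parameters across these layers multiply, here is a minimal Python sketch that enumerates a small configuration space. The parameter names and candidate values are illustrative assumptions, not the values used in the talk.

```python
from itertools import product
from math import prod

# Hypothetical candidate values per I/O stack layer (illustrative only).
STACK_PARAMETERS = {
    "HDF5": {
        "alignment_kb": [1, 64, 1024],
        "chunk_size_kb": [64, 1024],
    },
    "MPI I/O": {
        "collective_buffering": ["enable", "disable"],
        "cb_buffer_size_mb": [1, 16, 64],
        "cb_nodes": [1, 4, 16],
    },
    "Parallel File System": {
        "stripe_count": [1, 8, 64],
        "stripe_size_mb": [1, 16],
    },
}


def flatten(stack):
    """Map 'layer.parameter' to its list of candidate values."""
    return {
        f"{layer}.{name}": values
        for layer, params in stack.items()
        for name, values in params.items()
    }


def count_configurations(stack):
    """Number of distinct configurations in the full cross product."""
    return prod(len(values) for values in flatten(stack).values())


def iter_configurations(stack):
    """Yield every configuration as a dict of 'layer.parameter' -> value."""
    flat = flatten(stack)
    names = list(flat)
    for combo in product(*flat.values()):
        yield dict(zip(names, combo))
```

Even with two or three values per parameter, the cross product grows multiplicatively, which is why running every configuration is impractical and a guided search is attractive.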
Overview of Dynamic Model-driven I/O Tuning (overall architecture diagram)
Training phase: All Possible Values, Training Set, Develop an I/O Model, Model Generation, I/O Model, Pruning, Top k Configurations
Exploration phase: I/O Autotuning Framework, Top k Configurations, I/O Kernel, HPC System, Storage System, Performance Results, Refit the Model (controlled by user), Select the Best Performing Configuration
Other components: All Possible Configurations, Refitting, Executable, H5Tuner, I/O Benchmark, XML File, Optimize I/O
Build a “surrogate” model P(y|x)
(1) Find a set of parameters based on previous runs (random choice of parameters for the first iteration)
(2) Run the application in the objective function with the parameters chosen in (1) to measure I/O bandwidth
(3) Update the surrogate model incorporating the current performance
Repeat steps (1) to (3) until MAX_EVALS evaluations have been performed
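The loop above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the search space, the synthetic `measure_bandwidth` function, and the simplified surrogate (a good-versus-bad frequency ratio over past runs, loosely in the spirit of a Parzen-estimator approach) are all hypothetical stand-ins.

```python
import random

# Hypothetical search space; names and values are illustrative only.
SPACE = {
    "stripe_count": [1, 4, 16, 64],
    "stripe_size_mb": [1, 4, 16, 64],
    "cb_nodes": [1, 2, 4, 8],
    "romio_cb_write": ["enable", "disable"],
}
MAX_EVALS = 30


def measure_bandwidth(cfg):
    """Stand-in for running the application; returns a synthetic MB/s value."""
    bw = 50.0 * cfg["stripe_count"] ** 0.5
    bw += 10.0 * min(cfg["stripe_size_mb"], 16)
    bw += 20.0 * cfg["cb_nodes"]
    if cfg["romio_cb_write"] == "enable":
        bw *= 1.2
    return bw


def random_config(rng):
    return {name: rng.choice(values) for name, values in SPACE.items()}


def propose(history, rng, n_candidates=20, good_fraction=0.25):
    """Step (1): choose parameters based on previous runs."""
    if len(history) < 5:  # too few runs to learn from: choose randomly
        return random_config(rng)
    ranked = sorted(history, key=lambda run: run[1], reverse=True)
    n_good = max(1, int(len(ranked) * good_fraction))
    good, bad = ranked[:n_good], ranked[n_good:]

    def smoothed(runs, name, value):
        # Laplace-smoothed frequency of `value` for parameter `name` in `runs`.
        hits = sum(1 for cfg, _ in runs if cfg[name] == value)
        return (hits + 1) / (len(runs) + len(SPACE[name]))

    def score(cfg):
        # Prefer values that are common in good runs and rare in bad runs.
        ratio = 1.0
        for name in SPACE:
            ratio *= smoothed(good, name, cfg[name]) / smoothed(bad, name, cfg[name])
        return ratio

    candidates = [random_config(rng) for _ in range(n_candidates)]
    return max(candidates, key=score)


def tune(seed=0):
    rng = random.Random(seed)
    history = []  # all (configuration, bandwidth) observations so far
    for _ in range(MAX_EVALS):
        cfg = propose(history, rng)    # step (1)
        bw = measure_bandwidth(cfg)    # step (2)
        history.append((cfg, bw))      # step (3): surrogate is rebuilt from history
    best = max(history, key=lambda run: run[1])
    return best, history
```

Calling `tune()` returns the best observed configuration with its bandwidth, plus the full evaluation history. In a real setting `measure_bandwidth` would launch the I/O kernel with the chosen settings and time it.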
(2) Predict I/O bandwidth with the parameters chosen in (1)
ExAct: the objective function obtains its output by running the application on the input parameters.
PrAct: the objective function obtains its output by running Predict on the input parameters.
Predict is an offline model, trained on a dataset, that predicts I/O bandwidth for a given set of input parameters.
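The difference between the two modes comes down to which function is plugged in as the objective. The sketch below illustrates that idea only; the offline dataset, the synthetic `run_application`, and the nearest-neighbour `predict` are hypothetical stand-ins, since the slides do not specify what kind of model Predict is.

```python
import math

# Hypothetical offline dataset: (stripe_count, cb_nodes) -> bandwidth in MB/s.
OFFLINE_RUNS = [
    ((1, 1), 120.0),
    ((4, 2), 310.0),
    ((16, 4), 640.0),
    ((64, 8), 900.0),
]


def run_application(params):
    """Placeholder for launching the real I/O kernel (synthetic formula here)."""
    stripe_count, cb_nodes = params
    return 100.0 * math.sqrt(stripe_count) + 15.0 * cb_nodes


def predict(params):
    """Stand-in for the offline Predict model: 1-nearest neighbour in log2 space."""
    def dist(a, b):
        return sum((math.log2(x) - math.log2(y)) ** 2 for x, y in zip(a, b))
    return min(OFFLINE_RUNS, key=lambda run: dist(run[0], params))[1]


def make_objective(mode):
    """Return the objective used in step (2): ExAct executes, PrAct predicts."""
    if mode == "ExAct":
        return run_application
    if mode == "PrAct":
        return predict
    raise ValueError(f"unknown mode: {mode}")
```

With this split, the tuning loop stays the same in both modes: it calls `make_objective("ExAct")` when real runs are affordable and `make_objective("PrAct")` when evaluations should cost no application execution.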
Figure legend: red is the initial probability distribution; blue is the post-training probability distribution.
Figure panels: Cb-buffer size distribution, Loss distribution, Stripe size distribution, Stripe count distribution, Romio cb_read, Romio cb_write, Romio ds_read, Romio ds_write
87% read and 20% write improvements (on average)
Significant improvement with large data sizes
Figure panels: IOR, BT-IO, S3D-IO, Generic-IO
S3D-IO weak scaling on unseen configurations.
BT-IO with unseen configurations.
https://github.com/meghaagr13/Autotuning-PIO