Towards On-Demand I/O Forwarding in HPC Platforms


SLIDE 1

Jean Luca Bez, Francieli Zanon Boito, Ramon Nou, Alberto Miranda, Toni Cortes, and Philippe O. A. Navaux

jean.bez@inf.ufrgs.br

PDSW 2020 — International Parallel Data Systems Workshop

Towards On-Demand I/O Forwarding in HPC Platforms

SLIDE 2

INTRODUCTION

Agenda

  • The I/O Forwarding Layer
  • Motivation
  • FORGE: The I/O Forwarding Explorer
  • Forwarding in MareNostrum 4
  • Forwarding in SDumont
  • Conclusion


SLIDE 3

INTRODUCTION

The I/O Forwarding Layer

[Diagram: applications A through X run on compute nodes; each node's parallel file system client talks directly to the metadata servers (Meta Server 1 to M) and data servers (Data Server 1 to N) of the parallel file system.]


SLIDE 5

INTRODUCTION

The I/O Forwarding Layer

[Diagram: the same architecture with an I/O forwarding layer added: requests from the compute-node clients now pass through I/O nodes (ION 1 to ION K) before reaching the metadata servers and data servers of the parallel file system.]

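To make the role of the forwarding layer concrete, the sketch below (an illustrative model, not code from FORGE or any production forwarder) shows one of the layer's key benefits: an I/O node can coalesce the small, interleaved requests it receives from many clients into fewer, larger, contiguous requests before issuing them to the parallel file system.

```python
# Sketch: request coalescing at an I/O node. Requests are (offset, size)
# pairs within a single shared file, arriving from multiple clients.

def coalesce(requests):
    """Merge overlapping or contiguous (offset, size) requests."""
    merged = []
    for offset, size in sorted(requests):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # Contiguous or overlapping: extend the previous request
            # instead of issuing a new one to the file system.
            last_off, last_size = merged[-1]
            merged[-1] = (last_off, max(last_size, offset + size - last_off))
        else:
            merged.append((offset, size))
    return merged

# Six 1 MiB strided requests, arriving out of order from three clients,
# collapse into a single 6 MiB contiguous write.
MiB = 1 << 20
reqs = [(i * MiB, MiB) for i in (0, 2, 4, 1, 3, 5)]
print(coalesce(reqs))  # [(0, 6291456)]
```

Non-adjacent requests stay separate, so the file system still sees the true access pattern where coalescing is not possible.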

SLIDE 7

INTRODUCTION

Motivation

  • Investigate the impact of I/O forwarding on performance
  • Take the application's access pattern into account
  • Most machines cannot be easily reconfigured
  • End-users are not allowed to change this layer
  • We need a research/exploration alternative!
  • When is forwarding the best choice?
  • How many I/O nodes should an application use?
SLIDE 8

ARCHITECTURE

FORGE: The I/O FORwardinG Explorer

[Diagram: FORGE deployment. A subset of the allocated compute nodes runs FORGE ION (I/O node) processes; the remaining compute nodes run FORGE CN client processes (MPI ranks) that forward their requests to the IONs, which in turn access the parallel file system.]
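FORGE runs as a single user-space MPI job whose ranks take one of two roles: clients (FORGE CN) or I/O nodes (FORGE ION). The sketch below illustrates such a role split; the contiguous block mapping of clients to forwarders and the function name `assign_roles` are assumptions for illustration, not necessarily FORGE's actual scheme.

```python
# Sketch of a FORGE-style deployment: one MPI job whose ranks are split
# into I/O-node roles and client roles. The block mapping below (clients
# assigned to forwarders in contiguous groups) is an illustrative
# assumption, not FORGE's documented behavior.

def assign_roles(world_size, num_ions):
    """Return (ion_ranks, {client_rank: ion_rank}) for a block mapping."""
    ion_ranks = list(range(num_ions))            # first ranks act as I/O nodes
    clients = list(range(num_ions, world_size))  # the rest are clients
    per_ion = len(clients) // num_ions           # assume it divides evenly
    mapping = {c: ion_ranks[i // per_ion] for i, c in enumerate(clients)}
    return ion_ranks, mapping

# 20 MPI ranks: 4 become I/O nodes, 16 become clients (4 per forwarder).
ions, mapping = assign_roles(world_size=20, num_ions=4)
print(ions)                                   # [0, 1, 2, 3]
print(mapping[4], mapping[9], mapping[19])    # 0 1 3
```

In a real run each client would open an MPI or socket channel to its assigned ION rank and ship I/O requests over it instead of touching the file system directly.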

SLIDE 9

FORGE

EXPERIMENTS


  • MareNostrum 4 (Spain) and Santos Dumont (Brazil) supercomputers
  • 189 distinct scenarios (access patterns and deployments):

    ○ Compute nodes: 8, 16, and 32
    ○ Client processes per compute node: 12, 24, and 48 (96, 192, 384, 768, and 1536 processes in total)
    ○ File layout: file-per-process or shared file
    ○ Spatiality: contiguous or 1D-strided
    ○ Operation: write
    ○ Request sizes: 32 KB, 128 KB, 512 KB, 1 MB, 4 MB, 6 MB, and 8 MB
    ○ Stonewall: one second
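The factors above can be enumerated as a cross product. The full product yields 252 combinations; the 189 reported scenarios are obtained if 1D-strided accesses are considered only for the shared-file layout, which is a plausible but assumed reconstruction (striding between processes is not meaningful when each process writes its own file):

```python
# Sketch of the evaluated parameter space. Excluding the
# (file-per-process, 1D-strided) pairing is an assumption made here to
# match the 189 scenarios reported on the previous slide.
from itertools import product

nodes      = [8, 16, 32]
ppn        = [12, 24, 48]
layouts    = ["file-per-process", "shared"]
spatiality = ["contiguous", "1d-strided"]
sizes_kib  = [32, 128, 512, 1024, 4096, 6144, 8192]

scenarios = [
    (n, p, l, s, r)
    for n, p, l, s, r in product(nodes, ppn, layouts, spatiality, sizes_kib)
    if not (l == "file-per-process" and s == "1d-strided")
]
print(len(scenarios))  # 189
```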

SLIDE 10

I/O FORWARDING

MareNostrum 4

  • Bandwidth at client-side
  • 5 repetitions for each
  • Different days and periods


SLIDE 16

I/O FORWARDING

MareNostrum 4

  • How many choices do we have to consider?
  • Dunn's nonparametric test
  • Three choices impact performance in 46% of the patterns (88 out of 189)
  • What is the best number of I/O nodes?
  • No simple rule fits all
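The decision procedure behind these numbers can be sketched as follows: for each access pattern, compare the bandwidth samples measured with the different forwarding configurations using a nonparametric omnibus test, and only run pairwise comparisons when it flags a difference. The sketch below uses SciPy's Kruskal-Wallis test with hypothetical bandwidth numbers; Dunn's pairwise post-hoc test itself is not in SciPy (the scikit-posthocs package provides `posthoc_dunn`) and is omitted here.

```python
# Sketch of the per-pattern statistical comparison. The bandwidth values
# are made up for illustration; only the procedure mirrors the slides.
from scipy.stats import kruskal

# Hypothetical bandwidth repetitions (MB/s) for 1, 2, and 4 I/O nodes.
bw = {
    1: [310, 295, 305, 300, 298],
    2: [450, 470, 455, 460, 465],
    4: [452, 468, 459, 463, 472],
}

stat, p = kruskal(*bw.values())  # omnibus test across all configurations
if p < 0.05:
    # The choice matters for this pattern: follow up with Dunn's test
    # (scikit-posthocs' posthoc_dunn) to rank the configurations pairwise.
    print("I/O-node count impacts this pattern")
else:
    print("no significant difference; any choice performs similarly")
```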

SLIDE 17

I/O FORWARDING

Santos Dumont

  • Forwarding impact is different!
  • The more I/O nodes, the better
  • Not forwarding is an option


SLIDE 23

RESULTS

Discussion


  • Increasingly heterogeneous applications
  • Shift from a must-use to an on-demand I/O forwarding layer
  • Transparently reshape the flow of requests
  • Towards a dynamic allocation of I/O nodes
  • An idle or reserved set of compute nodes could act as I/O nodes
  • Interference on I/O could be reduced or eliminated
SLIDE 24

PRESENTATION

Conclusion


  • I/O forwarding is an established and widely adopted technique
  • It is not always possible to explore its advantages under different setups
  • Reconfiguration may impact or disrupt production systems
  • FORGE: a lightweight forwarding layer in user space
  • Understand the impact of forwarding on different access patterns
  • Evaluation on the MareNostrum 4 and Santos Dumont supercomputers
  • Shift from a must-use to an on-demand I/O forwarding layer
SLIDE 25


ACKNOWLEDGMENTS

This study was financed by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. It also received support from the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brazil, and is partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant PID2019-107255GB and the Generalitat de Catalunya under contract 2014-SGR-1051. The authors thankfully acknowledge the computer resources, technical expertise, and assistance provided by the Barcelona Supercomputing Center - Centro Nacional de Supercomputación. The authors also acknowledge the National Laboratory for Scientific Computing (LNCC/MCTI, Brazil) for providing the HPC resources of the SDumont supercomputer, which have contributed to the research results reported in this paper. URL: http://sdumont.lncc.br.

SLIDE 26

Jean Luca Bez, Francieli Zanon Boito, Ramon Nou, Alberto Miranda, Toni Cortes, and Philippe O. A. Navaux

jean.bez@inf.ufrgs.br

PDSW 2020 — International Parallel Data Systems Workshop

Towards On-Demand I/O Forwarding in HPC Platforms