Simulation for Experimenting HPC Systems
Martin Quinson (Nancy University, France) et al.
Nancy, June 3, 2010
Scientific Computation Applications
(Photos: Georges Smoot, Physics Nobel Prize 1996; the Large Hadron Collider)
Classical Approaches in science and engineering
- 1. Theoretical work: equations on a board
- 2. Experimental study on a scientific instrument
That’s not always desirable (or even possible)
◮ Some phenomena are intractable theoretically
◮ Experiments are too expensive, difficult, slow or dangerous
The third scientific way: Computational Science
- 3. Study in silico using computers
Modeling / Simulation of the phenomenon or data-mining
High Performance Computing Systems
Martin Quinson Simulation for Experimenting HPC Systems Introduction and Context 2/31
These systems deserve very advanced analysis
◮ Their debugging and tuning are technically difficult
◮ Their use induces high methodological challenges
◮ A science of the in silico science
Studying Large Distributed HPC Systems (Grids)
Why? Compare aspects of the possible designs/algorithms/applications
◮ Response time
◮ Throughput
◮ Scalability
◮ Robustness
◮ Fault-tolerance
◮ Fairness
How? Several methodological approaches
◮ Theoretical approach: mathematical study [of algorithms]
  Better understanding, impossibility theorems; everything is NP-hard
◮ Experimentation (≈ in vivo): real applications on real platforms
  Believable; hard and long. Experimental control? Reproducibility?
◮ Emulation (≈ in vitro): real applications on synthetic platforms
  Better experimental control; even more difficult
◮ Simulation (in silico): prototypes of applications on models of systems
  Simple; experimental bias
⇒ No approach is enough, all are mandatory
Outline
◮ Introduction and Context: High Performance Computing for Science; in vivo approach (direct experimentation); in vitro approach (emulation); in silico approach (simulation)
◮ The SimGrid Project: User Interface(s); SimGrid Models; SimGrid Evaluation
◮ Grid Simulation and Open Science: Recapping Objectives; SimGrid and Open Science; HPC experiments and Open Science
◮ Conclusions
In vivo approach to HPC experiments (direct experiment)
◮ Principle: real applications, controlled environment
◮ Challenges: hard and long. Experimental control? Reproducibility?

Grid’5000 project: a scientific instrument for HPC
◮ Instrument for research in computer science (deploy your own OS)
◮ 9 sites, 1500 nodes (3000 CPUs, 4000 cores); dedicated 10Gb links
(Map of Grid’5000 sites, from Luxembourg to Brazil)

Other existing platforms
◮ PlanetLab: no experimental control ⇒ no reproducibility
◮ Production platforms (EGEE): must use the provided middleware
◮ FutureGrid: upcoming American experimental platform inspired by Grid’5000
In vitro approach to HPC experiments (emulation)
◮ Principle: inject load on real systems to gain experimental control
  ≈ slow the platform down to put it in the desired experimental conditions
◮ Challenges: get realistic results; tool stack complex to deploy and use

Wrekavoc: applicative emulator
◮ Emulates CPU and network
◮ Homogeneous or heterogeneous platforms
(Figure: emulated network over virtualized nodes spread across four physical machines)
Other existing tools
◮ Network emulation: ModelNet, DummyNet, . . .
  Rather mature tools, but limited to the network
◮ Applicative emulation: MicroGrid, eWan, Emulab
  Rarely (never?) used outside the lab where they were created
In silico approach to HPC experiments (simulation)
◮ Principle: prototypes of applications, models of platforms
◮ Challenges: get realistic results (experimental bias)
SimGrid: generic simulation framework for distributed applications
◮ Scalable (time and memory), modular, portable. 70+ publications.
◮ Collaboration Loria / Inria Rhône-Alpes / CCIN2P3 / U. Hawaii
(Figures: the SimGrid software stack, from the XBT toolbox up to the SimDag, SMPI, MSG and GRAS user APIs; scalability plot of execution time vs. number of simulated hosts for the successive kernel optimizations: default CPU model, partial LMM invalidation, lazy action management, trace integration)
Other existing tools
◮ Large number of existing simulators for distributed platforms:
  GridSim, ChicSim, GES; P2PSim, PlanetSim, PeerSim; ns-2, GTNetS.
◮ Few are really usable: diffusion, software quality assurance, long-term availability
◮ No other tool studies the validity or the induced experimental bias
User-visible SimGrid Components
(Component diagram)
◮ GRAS: framework to develop distributed applications
◮ MSG: simple application-level simulator
◮ SimDag: framework for DAGs of parallel tasks
◮ SMPI: library to run MPI applications on top of a virtual environment
◮ AMOK: toolbox
◮ XBT: grounding features (logging, etc.), usual data structures (lists, sets, etc.) and portability layer
SimGrid user APIs
◮ SimDag: specify heuristics as a DAG of (parallel) tasks
◮ MSG: specify heuristics as Concurrent Sequential Processes (Java/Ruby/Lua bindings available)
◮ GRAS: develop real applications, studied and debugged in the simulator
◮ SMPI: simulate MPI codes
Which API should I choose?
◮ Your application is a DAG ⇒ SimDag
◮ You have an MPI code ⇒ SMPI
◮ You study concurrent processes, or distributed applications:
  ◮ You need graphs about several heuristics for a paper ⇒ MSG
  ◮ You develop a real application (or want experiments on real platforms) ⇒ GRAS
◮ Most popular API (for now): MSG
MSG: Heuristics for Concurrent Sequential Processes
(historical) Motivation
◮ Centralized scheduling does not scale
◮ SimDag (and its predecessor) not adapted to studying decentralized heuristics
◮ MSG is not strictly limited to scheduling, but particularly convenient for it
Main MSG abstractions
◮ Agent: some code, some private data, running on a given host
  set of functions + XML deployment file for arguments
◮ Task: amount of work to do and of data to exchange
  ◮ Creation: MSG_task_create(name, compute_duration, message_size, void *data)
  ◮ Communication: MSG_task_{put,get}, MSG_task_Iprobe
  ◮ Execution: MSG_task_execute
    MSG_process_sleep, MSG_process_{suspend,resume}
◮ Host: location on which agents execute
◮ Mailbox: similar to MPI tags
SIMGRID Usage Workflow: the MSG example (1/2)
- 1. Write the Code of your Agents
int master(int argc, char **argv) {
  for (i = 0; i < number_of_tasks; i++) {
    t = MSG_task_create(name, comp_size, comm_size, data);
    sprintf(mailbox, "worker-%d", i % workers_count);
    MSG_task_send(t, mailbox);
  }
}

int worker(int argc, char **argv) {
  sprintf(my_mailbox, "worker-%d", my_id);
  while (1) {
    MSG_task_receive(&task, my_mailbox);
    MSG_task_execute(task);
    MSG_task_destroy(task);
  }
}
- 2. Describe your Experiment
XML Platform File

<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "surfxml.dtd">
<platform version="2">
  <host name="host1" power="1E8"/>
  <host name="host2" power="1E8"/>
  ...
  <link name="link1" bandwidth="1E6" latency="1E-2"/>
  ...
  <route src="host1" dst="host2">
    <link:ctn id="link1"/>
  </route>
</platform>

XML Deployment File

<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "surfxml.dtd">
<platform version="2">
  <!-- The master process -->
  <process host="host1" function="master">
    <argument value="10"/> <!-- argv[1]: #tasks -->
    <argument value="1"/>  <!-- argv[2]: #workers -->
  </process>
  <!-- The workers -->
  <process host="host2" function="worker">
    <argument value="0"/>
  </process>
</platform>
SIMGRID Usage Workflow: the MSG example (2/2)
- 3. Glue things together
int main(int argc, char *argv[]) {
  /* Bind agents’ names to their functions */
  MSG_function_register("master", &master);
  MSG_function_register("worker", &worker);
  MSG_create_environment("my_platform.xml");    /* Load a platform instance */
  MSG_launch_application("my_deployment.xml");  /* Load a deployment file */
  MSG_main();                                   /* Launch the simulation */
  INFO1("Simulation took %g seconds", MSG_get_clock());
}
- 4. Compile your code (linked against -lsimgrid), run it and enjoy
Executive summary, but representative
◮ Similar in the other interfaces, but:
  ◮ the glue is generated by a script in GRAS, and automatic in Java with introspection
  ◮ in SimDag, there is no deployment file since there are no CSPs
◮ Platforms can contain trace information, higher-level tags and arbitrary data
◮ In MSG, the applicative workload can also be externalized to a trace file
The MSG master/workers example: colorized output
$ ./my_simulator | MSG_visualization/colorize.pl
[ 0.000][ Tremblay:master ] Got 3 workers and 6 tasks to process
[ 0.000][ Tremblay:master ] Sending ’Task_0’ to ’worker-0’
[ 0.148][ Tremblay:master ] Sending ’Task_1’ to ’worker-1’
[ 0.148][ Jupiter:worker  ] Processing ’Task_0’
[ 0.347][ Tremblay:master ] Sending ’Task_2’ to ’worker-2’
[ 0.347][ Fafard:worker   ] Processing ’Task_1’
[ 0.476][ Tremblay:master ] Sending ’Task_3’ to ’worker-0’
[ 0.476][ Ginette:worker  ] Processing ’Task_2’
[ 0.803][ Jupiter:worker  ] ’Task_0’ done
[ 0.951][ Tremblay:master ] Sending ’Task_4’ to ’worker-1’
[ 0.951][ Jupiter:worker  ] Processing ’Task_3’
[ 1.003][ Fafard:worker   ] ’Task_1’ done
[ 1.202][ Tremblay:master ] Sending ’Task_5’ to ’worker-2’
[ 1.202][ Fafard:worker   ] Processing ’Task_4’
[ 1.507][ Ginette:worker  ] ’Task_2’ done
[ 1.606][ Jupiter:worker  ] ’Task_3’ done
[ 1.635][ Tremblay:master ] All tasks dispatched. Let’s stop workers.
[ 1.635][ Ginette:worker  ] Processing ’Task_5’
[ 1.637][ Jupiter:worker  ] I’m done. See you!
[ 1.857][ Fafard:worker   ] ’Task_4’ done
[ 1.859][ Fafard:worker   ] I’m done. See you!
[ 2.666][ Ginette:worker  ] ’Task_5’ done
[ 2.668][ Tremblay:master ] Goodbye now!
[ 2.668][ Ginette:worker  ] I’m done. See you!
[ 2.668][                 ] Simulation time 2.66766
SimGrid in a Nutshell
(Diagram: simulation inputs — the platform topology, availability changes, application deployment, applicative workload and parameters — feed the simulation kernel plus the application; the simulator outcomes are logs, stats and visualization)
SimGrid is not a simulator, but a simulation framework
Under the Hood: Simulation Models
Modeling CPU
◮ A resource delivers pow flops per second; a task requires size flops ⇒ it lasts size/pow seconds
◮ Simple (simplistic?), but more accurate models quickly become intractable
Modeling Single-Hop Networks
◮ Simplistic: T = λ + size/β; Better: use β′ = min(β, Wmax/RTT) (TCP windowing)
Modeling Multi-Hop Networks
◮ Simplistic models: Store & Forward or Wormhole
  Easy to implement; not realistic (TCP congestion omitted)
◮ NS2 and other packet-level simulators study the path of each and every network packet
Realism commonly accepted; Sloooooow
Analytical Network Models
TCP bandwidth sharing studied by several authors
◮ Data streams modeled as fluids in pipes
◮ Same model for a single stream over multiple links or multiple streams over multiple links
(Figure: flows 0, 1, 2, ..., L sharing a chain of links 1, 2, ..., L)
Notations
◮ L: set of links
◮ Cl: capacity of link l (Cl > 0)
◮ nl: number of flows using link l
◮ F: set of flows; f ∈ P(L)
◮ λf: transfer rate of f
Feasibility constraint
◮ Links deliver at most their capacity:
  ∀l ∈ L, ∑_{f ∋ l} λf ≤ Cl
Max-Min Fairness
Objective function: maximize min_{f ∈ F} (λf)
◮ Equilibrium reached if increasing any λf decreases some λf′ (with λf > λf′)
◮ Very reasonable goal: gives a fair share to everyone
◮ Optionally, one can add priorities wf for each flow f, maximizing min_{f ∈ F} (wf λf)
Bottleneck links
◮ For each flow f, one of its links is the limiting one, l
  (with more capacity on that link l, the flow f would get more overall)
◮ The objective function gives that l is saturated, and f gets the biggest share on it:
  ∀f ∈ F, ∃l ∈ f, ∑_{f′ ∋ l} λf′ = Cl and λf = max{λf′, f′ ∋ l}
- L. Massoulié and J. Roberts, "Bandwidth sharing: objectives and algorithms," IEEE/ACM Trans. Netw., vol. 10, no. 3, pp. 320-328, 2002.
Max-Min Fairness Computation: Backbone Example
Algorithm: loop on these steps
◮ search for the bottleneck link (the one whose flows get the minimal share)
◮ fix all flows using it at that share
◮ remove the link
Cl: capacity of link l; nl: amount of flows using l; λf : transfer rate of f .
(Figure: flow 1 crosses links 1, 2 and 3; flow 2 crosses links 0, 2 and 4.
Initially, C0 = 1 and C1 = C2 = C3 = C4 = 1000.
Final state: C0 = 0 (n0 = 0), C1 = 1 (n1 = 0), C2 = 0 (n2 = 0), C3 = 1 (n3 = 0), C4 = 999 (n4 = 0); λ1 = 999, λ2 = 1)
◮ The limiting link is 0
◮ This fixes λ2 = 1. Update the links
◮ The limiting link is now 2
◮ This fixes λ1 = 999
How are these models used in practice?
Simulation kernel main loop
Data: set of resources with working rate
- 1. Some actions get created (by application) and assigned to resources
- 2. Compute share of everyone (resource sharing algorithms)
- 3. Compute the earliest finishing action, advance simulated time to that time
- 4. Remove finished actions
- 5. Loop back to 2
◮ Availability traces are just events: t0 → 100%, t1 → 50%, t2 → 80%, etc.
◮ Also qualitative state changes (on/off)
(Diagram: actions progressing along the simulated-time axis)
SIMGRID Internals in a Nutshell for Users
SimGrid Layers
◮ MSG: user interface
◮ SimIX: processes, synchronization
◮ SURF: resources
◮ (LMM: MaxMin systems)
Changing the Model
◮ “--cfg=network model”
◮ Several fluid models
◮ Several constant-time models
◮ GTNetS wrapper
◮ Build your own
(Diagram: user processes run on top of MSG/SimIX; SURF actions carry a "work remaining" variable, and the LMM solver relates these variables through capacity constraints of the form x1 + x2 + ... ≤ C)
Validation experiments on a single link
Experimental settings
(Setup: one TCP flow between a source and a sink across a single link)
◮ Compute the achieved bandwidth as a function of the message size S
◮ Fixed latency L = 10ms and bandwidth B = 100MB/s
Evaluation Results
(Plots: achieved throughput (KB/s) and relative error |ε| vs. data size (MB), for NS2, GTNetS, SSFNet (0.01 and 0.2), and the old and new SimGrid models)
◮ Packet-level tools don’t completely agree
◮ SSFNet’s TCP FAST INTERVAL default is bad
◮ GTNetS is equally distant from the others
◮ The old SimGrid model omitted slow-start effects
⇒ Statistical analysis of GTNetS slow-start; better instantiation of the MaxMin model: β′′ = 0.92 × β′; λ′ = 10.4 × λ
◮ Resulting validity range quite acceptable:

  S            |ε|     |εmax|
  S < 100KB    ≈ 12%   ≈ 162%
  S > 100KB    ≈ 1%    ≈ 6%
Validation experiments on random platforms
◮ 160 platforms (generator: BRITE)
◮ β ∈ [10, 128] MB/s; λ ∈ [0; 5] ms
◮ Flow size: S = 10MB
◮ #flows: 150; #nodes ∈ [50; 200]
◮ |ε| < 0.2 (i.e., ≈ 22%); |εmax| still challenging, up to 461%
(Plot: mean error |ε| and max error |εmax| over the 160 experiments)
Maybe the error is not SimGrid’s
◮ Big errors arise because GTNetS is multi-phased
◮ The same was seen in NS3, in emulation, ...
◮ Phase effect: periodic and deterministic traffic may resonate [Floyd & Jacobson 91]
◮ Impossible in the Internet (thanks to random noise)
(Plot: node 1 throughput (%) vs. round-trip-time ratio, exhibiting the phase effect)
We’re adding random jitter to continue SIMGRID validation
Simulation scalability assessment
Master/Workers on amd64 with 4GB: simulation time (s) by context mechanism, #tasks and #workers

#tasks      Context   | #Workers: 100    500    1,000   5,000   10,000  25,000
1,000       ucontext  |  0.16    0.19   0.21    0.42    0.74    1.66
            pthread   |  0.15    0.18   0.19    0.35    0.55    ⋆
            java      |  0.41    0.59   0.94    7.6     27.     ⋆
10,000      ucontext  |  0.48    0.52   0.54    0.83    1.1     1.97
            pthread   |  0.51    0.56   0.57    0.78    0.95    ⋆
            java      |  1.6     1.9    2.38    13.     40.     ⋆
100,000     ucontext  |  3.7     3.8    4.0     4.4     4.5     5.5
            pthread   |  4.7     4.4    4.6     5.0     5.23    ⋆
            java      |  14.     13.    15.     29.     77.     ⋆
1,000,000   ucontext  |  36.     37.    38.     41.     40.     41.
            pthread   |  42.     44.    46.     48.     47.     ⋆
            java      |  121.    130.   134.    163.    200.    ⋆

⋆: number of semaphores reached the system limit (2 semaphores per user process; system limit = 32k semaphores)
◮ These results are already old
◮ v3.3.3 is 30% faster
◮ v3.3.4 adds lazy evaluation
Extensibility with UNIX contexts: simulation time (s) by stack size, #tasks and #workers

#tasks      Stack size | #Workers: 25,000  50,000  100,000  200,000
1,000       128Kb      |  1.6     †       †        †
            12Kb       |  0.5     0.9     1.7      3.2
10,000      128Kb      |  2       †       †        †
            12Kb       |  0.8     1.2     2        3.5
100,000     128Kb      |  5.5     †       †        †
            12Kb       |  3.7     4.1     4.8      6.7
1,000,000   128Kb      |  41      †       †        †
            12Kb       |  33      33.6    33.7     35.5
5,000,000   128Kb      |  206     †       †        †
            12Kb       |  161     167     161      165
Scalability limit of GridSim
◮ 1 user process = 3 Java threads (code, input, output)
◮ System limit = 32k threads ⇒ at most 10,922 user processes
†: out of memory
Simulation scalability assessment
During Summer 2009, 2 interns @CERN evaluated grid simulators
◮ Attempted to simulate one day of grid operation (1.5 million file transfers)
◮ Their final requirements:
  ◮ Basic processing induces 30M operations daily
  ◮ User requests induce ≈2M operations daily
  ◮ Evaluations should consider one month of operation
Findings
Grid Simulation and Open Science
Requirement on Experimental Methodology (what do we want)
◮ Standard methodologies and tools: grad students learn them to become operational
◮ Incremental knowledge: read a paper, reproduce its results, improve
◮ Reproducible results: compare experimental scenarios easily
  Reviewers can reproduce results; peers can work incrementally (even after a long time)
Current practices in the field (what do we have)
◮ Very few common methodologies and tools; many home-brewed tools
◮ Experimental settings rarely detailed enough in the literature
These issues are tackled by the SimGrid community
◮ Released, open-source, stable simulation framework
◮ Extensive optimization and validation work
◮ Separation of the simulated application from the experimental conditions
◮ Are we there yet? Not quite
SimGrid and Open Science
Simulations are reproducible... provided that authors make it possible
◮ Need to publish source code, platform files, statistics-extraction scripts, ...
◮ Almost no one does it. I don’t (shame, shame). Why?
Technical issues to tackle
◮ Archiving facilities, versioning, branch support, dependency management
◮ Workflows automating the execution of test campaigns (myexperiment.org)
◮ We already have most of them (Makefiles, Maven, debs, forges, repositories, ...)
◮ But still, we don’t use them. Is the issue really technical?
Sociological issues to tackle
◮ A while ago, simulators were simple, only filling Gantt charts automatically
◮ We don’t have the culture of reproducibility:
  ◮ "My scientific contribution is the algorithm, not the crappy demo code"
  ◮ But your contribution cannot be assessed if it cannot be reproduced!
◮ I don’t have any definitive answer about how to solve this
HPC experiments and Open Science
Going further
◮ The issues we face in simulation are common to every experimental methodology
◮ The tools we need to help Open Science arise in simulation would help the others too
◮ Why not step back and try to unite efforts?
What would a perfect world look like?
A simulation using SimGrid
(Diagram: SimGrid’s inputs — platform topology, availability changes, application deployment, applicative workload, parameters — feeding the simulation kernel; outcomes are logs, stats and visualization)
An experiment on Grid’5000
Figure from Olivier Richard
Basic ideas are the same, even if a huge amount of work remains to factorize them
Conclusions
HPC and Grid applications tuning and assessment
◮ Challenging to do; several methodological ways: in vivo, in vitro, in silico
◮ No methodology is sufficient; all are needed together
The SimGrid simulation framework
◮ Mature framework: validated models, software quality assurance ◮ You should use it!
We only scratched the corner of the problem
◮ Open Science is a must! (please don’t tell the truth to physicists or biologists)
◮ Technical issues faced, but even more sociological ones
◮ Solve it not only for simulation, but for all methodologies at the same time
We still have a large amount of work in front of us