Towards Modeling and Simulation of Exascale Computing Platforms



SLIDE 1

Towards Modeling and Simulation of Exascale Computing Platforms

Luka Stanisic

Supervised by: A. Legrand, B. Videau and J.-F. Méhaut
Laboratoire d'Informatique de Grenoble, MESCAL and NANOSIM teams

June 21, 2012

Luka Stanisic, Modeling of caches, June 21, 2012, 1 / 22

SLIDE 2

Introduction

Future supercomputer platforms will face big challenges due to their enormous power consumption. This internship was part of two research projects:

1. Mont-Blanc (European): developing a scalable and power-efficient HPC platform based on low-power ARM processors
2. SONGS (ANR): designing a unified and open simulation framework for performance evaluation of next-generation systems

Adequate models are required. Goal: investigate whether it is possible to model CPU behavior at coarse grain, especially for ARM processors.

SLIDE 3

Introduction

Simulation vs. alternative approaches

Cycle-accurate simulation and emulation:
- Often too slow
- Questionable accuracy

SLIDE 4

Introduction

Simulation vs. alternative approaches

Cycle-accurate simulation and emulation:
- Often too slow
- Questionable accuracy

We need coarse-grain models:
- Many existing projects: LAPSE, MPI-SIM, BigSIM, MPI-NetSim, MicroGrid, PMAC, ...
- Memory is the bottleneck of most HPC applications
- The starting point of this work was two articles from Allan Snavely and his team (PMAC) that seemed very promising:

1. "A Framework for Application Performance Modeling and Prediction", A. Snavely, L. Carrington, N. Wolter, J. Labarta, R. Badia, A. Purkayastha, in SuperComputing 2002
2. "A Genetic Algorithms Approach to Modeling the Performance of Memory-bound Computations", M. Tikir, L. Carrington, E. Strohmaier, A. Snavely, in SuperComputing 2007

SLIDE 5

Introduction

Framework for Application Performance Modeling and Prediction

The authors propose a macroscopic approach: trying to characterize the code as a whole with parameters that can later be related to platform characteristics in order to evaluate performance.


SLIDE 12

Introduction

Kernel from MultiMAPS

MultiMAPS(size, stride, nloops):
    allocate buffer of given size;
    timer start;
    for (i = 1 : nloops)
        access elements in buffer by stride;
    timer stop;
    bandwidth = #accesses / time;
    deallocate buffer;

Our first experiments:

SLIDE 13

Introduction

Methodology

The problem with the related work is that it is not well documented, it is not suited for NUMA and multicore architectures, and the experiments are not reproducible. We wanted to do the measurements in a clean, coherent and systematic way.


SLIDE 15

Introduction

Outline

1. Kernel Parameters
2. Memory Allocation Parameters
3. Optimization Parameters
4. Operating System Parameters
5. Conclusion

SLIDE 16

Kernel Parameters

Outline

1. Kernel Parameters
2. Memory Allocation Parameters
3. Optimization Parameters
4. Operating System Parameters
5. Conclusion

SLIDE 17

Kernel Parameters

Influence of the Stride Parameter

Comparing with the results from MultiMAPS:

1. Clear plateaus
2. Sharp drop when exceeding the L1 cache size
3. Performance is lower for larger strides

Intel Core i7 Sandy Bridge processor: few max values


SLIDE 19

Kernel Parameters

Influence of the Stride Parameter

Comparing with the results from MultiMAPS:

1. Clear plateaus
2. Sharp drop when exceeding the L1 cache size
3. Performance is lower for larger strides
4. Different bandwidths for strides 8, 16 and 32 inside the L1 cache size
5. The performance drop for larger memory sizes stops after stride 8

This is the general behavior, but with many exceptions.

Intel Core i7 Sandy Bridge processor: randomization + boxplots


SLIDE 22

Kernel Parameters

Unexpected Behavior

Example for the Intel Core i7 3.40 GHz Sandy Bridge: irregular behavior inside the L1 cache size!
Example for the ARM Dual Cortex-A9 1 GHz Snowball: strides 10, 12 and 14 perform better than stride 8 ?!?

SLIDE 23

Memory Allocation Parameters

Outline

1. Kernel Parameters
2. Memory Allocation Parameters
3. Optimization Parameters
4. Operating System Parameters
5. Conclusion

SLIDE 24

Memory Allocation Parameters

Reproducibility Issue on ARM

Same input parameters, consecutive experiments; 42 repetitions per memory size, NO NOISE!
Results from the ARM Dual Cortex-A9 1 GHz (Snowball):


SLIDE 26

Memory Allocation Parameters

Influence of the Allocation Strategy on ARM

With a different memory allocation technique, performance depends on the actual physical address:

SLIDE 27

Optimization Parameters

Outline

1. Kernel Parameters
2. Memory Allocation Parameters
3. Optimization Parameters
4. Operating System Parameters
5. Conclusion

SLIDE 28

Optimization Parameters

Influence of Code Optimizations

Element type: using long long int (64-bit) instead of regular 32-bit int
Vectorized instructions:
- On Intel: 128-bit SSE and 256-bit AVX
- On ARM: 128-bit NEON
Loop unrolling:

Standard execution:
    for (j = 0; j < buffersize; j += STRIDE) { sum += buffer[j]; }

With loop unrolling:
    for (j = 0; j < buffersize; j += STRIDE*8) {
        sum += buffer[j];
        ...
        sum += buffer[j + 7*STRIDE];
    }
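Writing out the elided middle of the slide's unrolled loop, a self-contained sketch of both variants (the `STRIDE` value and function names are our assumptions; the guard in the unrolled loop keeps all accesses in bounds):

```c
#include <stddef.h>

#define STRIDE 8  /* assumed stride; the experiments varied this */

/* Standard strided sum, as on the slide. */
long sum_plain(const int *buffer, size_t buffersize) {
    long sum = 0;
    for (size_t j = 0; j < buffersize; j += STRIDE)
        sum += buffer[j];
    return sum;
}

/* 8-way unrolled variant; buffersize should be a multiple of
   STRIDE * 8 for a full sweep of the buffer. */
long sum_unrolled(const int *buffer, size_t buffersize) {
    long sum = 0;
    for (size_t j = 0; j + 7 * STRIDE < buffersize; j += STRIDE * 8) {
        sum += buffer[j];
        sum += buffer[j + 1 * STRIDE];
        sum += buffer[j + 2 * STRIDE];
        sum += buffer[j + 3 * STRIDE];
        sum += buffer[j + 4 * STRIDE];
        sum += buffer[j + 5 * STRIDE];
        sum += buffer[j + 6 * STRIDE];
        sum += buffer[j + 7 * STRIDE];
    }
    return sum;
}
```

Both functions touch the same elements and return the same sum; the unrolled one reduces loop-control overhead and exposes more independent adds to the pipeline, which is why its measured bandwidth differs.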

SLIDE 29

Optimization Parameters

Results from Intel Sandy Bridge:


SLIDE 30

Optimization Parameters

Results from ARM Snowball:


SLIDE 31

Operating System Parameters

Outline

1. Kernel Parameters
2. Memory Allocation Parameters
3. Optimization Parameters
4. Operating System Parameters
5. Conclusion


SLIDE 33

Operating System Parameters

Influence of the OS Scheduling Policy on ARM

1. Default priority: results shown on the previous slides
2. Nice priority: same behavior as the default priority
3. Real-time priority: distinctive output

Demonstration of the 2 modes of execution for real-time priority:
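The slides do not show how the real-time policy was requested. A plausible sketch on Linux uses `sched_setscheduler` with `SCHED_FIFO`; the priority value 50 is our assumption, and the call needs root privileges or `CAP_SYS_NICE`.

```c
#include <sched.h>
#include <stdio.h>

/* Request the real-time FIFO policy for the current process.
   Returns 0 on success, -1 on failure (typically EPERM when run
   without root privileges or CAP_SYS_NICE). */
int set_realtime_priority(void) {
    struct sched_param sp;
    sp.sched_priority = 50;  /* assumed value; must lie between
                                sched_get_priority_min/max(SCHED_FIFO) */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        perror("sched_setscheduler");
        return -1;
    }
    return 0;
}
```

Under `SCHED_FIFO` the benchmark is no longer preempted by ordinary time-sharing tasks, which is one way the distinctive real-time output could arise.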


SLIDE 35

Conclusion

Outline

1. Kernel Parameters
2. Memory Allocation Parameters
3. Optimization Parameters
4. Operating System Parameters
5. Conclusion

SLIDE 36

Conclusion

This research provides insights into the possible use of ARM processors in HPC. Predicting memory behavior is harder than the literature suggests:

1. Many unexpected parameters have a great influence
2. ARM processors should have been simpler, but they are not (e.g. the physical address issue on ARM)
3. Optimized versions are generally more regular and easier to model

Finding the right factor combination is non-trivial. Open Science: organize so that anyone can check and easily reproduce!


SLIDE 38

Conclusion

Future work

- Investigate more elaborate kernels that put more pressure on the CPU
- Multi-threaded version of the kernel (shared data): synchronization protocol overhead, hierarchy
- Different kernels on the same node, sharing the cache hierarchy
- Incorporate HPC network models, GPU models, ...
- Use these models with simulation tools (SimGrid) to predict the performance of future computing platforms
- Try to use open-science/reproducible-research techniques in the HPC field and advertise for them

SLIDE 39

Sweave + Beamer


SLIDE 40

Data file


SLIDE 41

Large Buffer Size

Example for the Intel Core i7 3.40 GHz Sandy Bridge.
Example for the ARM Dual Cortex-A9 1 GHz Snowball: the drop around L2 is smooth, no sharp plateaus.


SLIDE 42

Influence of Compiler Optimization Option

Using different compiler optimization options affects performance. Results from the ARM Dual Cortex-A9 1 GHz Snowball; Intel processors also show better performance with gcc -O3.
