Introduction to High Performance Computing and Optimization (Oliver Ernst)

slide-1
SLIDE 1

Institut für Numerische Mathematik und Optimierung

Introduction to High Performance Computing and Optimization

Oliver Ernst

Audience: 1./3. CMS, 5./7./9. Mm, doctoral students. Wintersemester 2012/13.

slide-2
SLIDE 2

Contents

  • 1. Introduction
  • 2. Processor Architecture
  • 3. Optimization of Serial Code

    3.1 Performance Measurement
    3.2 Optimization Guidelines
    3.3 Compiler-Aided Optimization
    3.4 Combine Example
    3.5 Further Optimization Issues

  • 4. Parallel Computing

    4.1 Introduction
    4.2 Scalability
    4.3 Parallel Architectures
    4.4 Networks

  • 5. OpenMP Programming

Oliver Ernst (INMO) HPC Wintersemester 2012/13 1

slide-3
SLIDE 3

Contents

  • 1. Introduction
  • 2. Processor Architecture
  • 3. Optimization of Serial Code
  • 4. Parallel Computing

    4.1 Introduction
    4.2 Scalability
    4.3 Parallel Architectures
    4.4 Networks

  • 5. OpenMP Programming

Oliver Ernst (INMO) HPC Wintersemester 2012/13 138

slide-4
SLIDE 4

Contents

  • 4. Parallel Computing

    4.1 Introduction
    4.2 Scalability
    4.3 Parallel Architectures
    4.4 Networks

Oliver Ernst (INMO) HPC Wintersemester 2012/13 139

slide-5
SLIDE 5

Parallel Computing

Introduction

Many processing units (computers, nodes, processors, cores, threads) collaborate to solve one problem concurrently. Currently, "many" means up to 1.5 million (current Top500 leader).

Objectives: faster execution time for one task (speedup); solution of a larger problem (scaleup); memory requirements exceeding the resources of a single computer.

Challenges for hardware designers:
  • Power
  • Communication network
  • Memory bandwidth
  • Low-level synchronization (e.g. cache coherency)
  • File system

Challenges for the programmer:
  • Load balancing
  • Synchronization/communication
  • Algorithm design and redesign
  • Software interface
  • Making maximal use of the computer's resources

Oliver Ernst (INMO) HPC Wintersemester 2012/13 140

slide-6
SLIDE 6

Parallel Computing

Types of parallelism: Data parallelism

The scale of parallelism refers to the size of the concurrently executed tasks.
  • Fine-grain parallelism: at the scale of the functional units of a processor (ILP), individual instructions or micro-instructions.
  • Medium-grain parallelism: at the scale of independent iterations of a loop (e.g. linear algebra operations on vectors, matrices, tensors).
  • Coarse-grain parallelism: larger computational tasks with looser synchronization (e.g. domain decomposition methods in PDE/linear system solvers).
Data-parallel applications are usually implemented using an SPMD (Single Program, Multiple Data) software design, in which the same program runs on all processing units, but not in the tightly synchronized lockstep fashion of SIMD.
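As an illustration of medium-grain data parallelism, the following minimal OpenMP sketch (not from the slides; it assumes a C compiler with OpenMP support, e.g. gcc -fopenmp) distributes the independent iterations of a vector update across threads:

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void) {
        static double x[N], y[N];

        for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

        /* medium-grain data parallelism: independent loop iterations
           are distributed across the threads of the team */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            y[i] += 2.5 * x[i];   /* daxpy-like update */

        printf("y[0] = %f (max threads: %d)\n", y[0], omp_get_max_threads());
        return 0;
    }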

Oliver Ernst (INMO) HPC Wintersemester 2012/13 141

slide-7
SLIDE 7

Parallel Computing

Types of parallelism: Functional parallelism

Concurrent execution of different tasks. Programming style known as MPMD (Multiple Program, Multiple Data). More difficult to load balance. Variants:
  • Master-slave scheme: one administrative unit distributes tasks and collects results; the remaining units receive tasks and report results to the master upon completion.
  • Large-scale functional decomposition: large, loosely coupled tasks executed on larger computational units with looser synchronization (e.g. climate models coupling ocean and atmospheric dynamics, fluid-structure interaction codes, "multiphysics" codes).

Oliver Ernst (INMO) HPC Wintersemester 2012/13 142

slide-8
SLIDE 8

Contents

  • 4. Parallel Computing

    4.1 Introduction
    4.2 Scalability
    4.3 Parallel Architectures
    4.4 Networks

Oliver Ernst (INMO) HPC Wintersemester 2012/13 143

slide-9
SLIDE 9

Scalability

Basic considerations

$T$: time for 1 worker to complete the task. With $N$ workers the ideal completion time is $T/N$, i.e. the ideal speedup is $S := \frac{T}{T/N} = N$ (perfect linear scaling). Not all computational (or other) tasks scale in this ideal way.

"Nine women can't make a baby in one month."

Fred Brooks, The Mythical Man-Month (1975)

Limiting factors:
  • Load imbalance: not all workers receive tasks of equal complexity (or they are not equally fast).
  • Serialization: some resources necessary for task completion are not available $N$ times; concurrent execution waits for access.
  • Overhead: extra work or waiting time due to parallel execution which is not required for serial task completion.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 144


slide-12
SLIDE 12

Scalability

Performance metrics: Strong scaling

$T^s_f = s + p$ : serial task completion time, fixed problem size
$s$ : serial (non-parallelizable) portion of the task
$p$ : (perfectly) parallelizable portion of the task

Solution time using $N$ workers: $T^p_f = s + \frac{p}{N}$

Known as strong scaling since the task size is fixed. Parallelization is used to reduce the solution time for a fixed problem.
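A quick worked example (not from the slides; normalize $s + p = 1$ and assume $s = 0.1$, $N = 10$):

    T^s_f = s + p = 1
    T^p_f = s + p/N = 0.1 + 0.9/10 = 0.19

so ten workers finish the fixed task about $1/0.19 \approx 5.3$ times faster than one, well short of the ideal factor of 10.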

Oliver Ernst (INMO) HPC Wintersemester 2012/13 145


slide-15
SLIDE 15

Scalability

Performance metrics: Weak scaling

Use parallelism to solve a larger problem: assume $s$ fixed and the parallelizable portion grows with $N$ like $N^\alpha$, $\alpha > 0$ (often $\alpha = 1$). Then:

$T^s_v = s + pN^\alpha$ : serial task completion time, variable problem size

Solution time using $N$ workers: $T^p_v = s + pN^{\alpha-1}$

Known as weak scaling since the task size is variable. Parallelization is used to solve a larger problem.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 146


slide-18
SLIDE 18

Scalability

Application speedup

Define performance := work/time and application speedup := parallel performance / serial performance.

Serial performance for fixed problem size $s + p$:
$P^s_f = \frac{s+p}{T^s_f} = \frac{s+p}{s+p} = 1$.

Parallel performance for fixed problem size (normalize $s + p = 1$):
$P^p_f = \frac{s+p}{T^p_f} = \frac{s+p}{s+p/N} = \frac{1}{s + \frac{1-s}{N}}$.

Application speedup (fixed problem size):
$S_f = \frac{P^p_f}{P^s_f} = \frac{1}{s + \frac{1-s}{N}}$ (cf. Amdahl's Law).
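The following small C sketch (illustrative only, not part of the lecture material) tabulates the Amdahl speedup $S_f(N) = 1/(s + (1-s)/N)$ for an assumed serial fraction:

    #include <stdio.h>

    /* Amdahl speedup for serial fraction s and N workers */
    static double amdahl(double s, int N) {
        return 1.0 / (s + (1.0 - s) / N);
    }

    int main(void) {
        const double s = 0.05;                        /* assumed serial fraction */
        const int    Ns[] = {1, 2, 4, 8, 16, 64, 1024};

        printf("serial fraction s = %.2f\n", s);
        for (int i = 0; i < 7; i++)
            printf("N = %5d   S_f = %6.2f\n", Ns[i], amdahl(s, Ns[i]));
        /* the speedup saturates at 1/s = 20 as N grows */
        return 0;
    }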

Oliver Ernst (INMO) HPC Wintersemester 2012/13 147

slide-19
SLIDE 19

Scalability

Application speedup: different notion of “work”

Count as work only the parallelizable portion.

Serial performance: $P^{sp}_f = \frac{p}{T^s_f} = p$.

Parallel performance: $P^{pp}_f = \frac{p}{T^p_f} = \frac{1-s}{s + \frac{1-s}{N}}$.

Application speedup: $S^p_f = \frac{P^{pp}_f}{P^{sp}_f} = \frac{1}{s + \frac{1-s}{N}}$.

$P^{pp}_f$ is no longer identical with $S^p_f$. Scalability doesn't change, but performance does (smaller by a factor of $p$).

Oliver Ernst (INMO) HPC Wintersemester 2012/13 148

slide-20
SLIDE 20

Scalability

Application speedup: weak scaling

(How much more work can my program do in a given amount of time when I put a larger problem on $N$ CPUs?)

Serial performance ($N = 1$): $P^s_v = \frac{s+p}{T^s_f} = 1$.

Parallel performance:
$P^p_v = \frac{T^s_v}{T^p_v} = \frac{s + pN^\alpha}{s + pN^{\alpha-1}} = \frac{s + (1-s)N^\alpha}{s + (1-s)N^{\alpha-1}}$ $(= S_v$ since $P^s_v = 1$, i.e. the same as the application speedup).

Recover Amdahl's law for $\alpha = 0$ (strong scaling).

Oliver Ernst (INMO) HPC Wintersemester 2012/13 149

slide-21
SLIDE 21

Scalability

Application speedup: weak scaling

For $0 < \alpha < 1$ and $N \gg 1$:
$S_v \approx \frac{s + (1-s)N^\alpha}{s} = 1 + \frac{p}{s}N^\alpha$ (linear in $N^\alpha$).
Weak scaling thus allows one to break the Amdahl barrier (unlimited performance), even for small $\alpha$.

Ideal case $\alpha = 1$:
$S_v = \frac{s + (1-s)N}{s + 1 - s} = s + (1-s)N$ (Gustafson's Law),
i.e., linear speedup, even for small $N$ [Gustafson (1988)].
Note: a large serial fraction $s$ leads to a small slope.
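A brief numeric contrast (illustrative numbers, not from the slides), taking $s = 0.1$ and $N = 100$:

    S_v = s + (1-s)N = 0.1 + 0.9 * 100 = 90.1          (weak scaling, alpha = 1)
    S_f = 1/(s + (1-s)/N) = 1/(0.1 + 0.009) ≈ 9.2      (strong scaling)

The same serial fraction that caps strong scaling near $1/s = 10$ still permits nearly linear weak-scaling speedup.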

Oliver Ernst (INMO) HPC Wintersemester 2012/13 150

slide-22
SLIDE 22

Scalability

Application speedup: weak scaling, different notion of work

Again, base work only on the parallel fraction $p$. Once more, serial performance is $P^{s,p}_v = p$ and parallel performance

$P^{p,p}_v = \frac{pN^\alpha}{s + pN^{\alpha-1}} = \frac{(1-s)N^\alpha}{s + (1-s)N^{\alpha-1}}$,

corresponding to a speedup of

$S^p_v = \frac{P^{p,p}_v}{P^{s,p}_v} = \frac{N^\alpha}{s + (1-s)N^{\alpha-1}}$.

Performance and speedup differ by a factor $p$, but for $\alpha = 1$ the speedup is now linear with slope one, in contrast to Gustafson's Law. Conclusion: look carefully at the performance metrics being applied.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 151

slide-23
SLIDE 23

Parallel Efficiency

Definition

How effectively does a parallel program utilize a given resource? (Assume the serial fraction is executed by a single worker while the others wait.) Parallel efficiency:
$\varepsilon := \frac{\text{performance on $N$ CPUs}}{N \times \text{performance on one CPU}} = \frac{\text{speedup}}{N}$.
Consider only weak scaling, as the Amdahl case is covered by $\alpha \to 0$.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 152


slide-26
SLIDE 26

Parallel Efficiency

Weak scaling

For work $= s + pN^\alpha$:
$\varepsilon = \frac{S_v}{N} = \frac{sN^{-\alpha} + 1 - s}{sN^{1-\alpha} + 1 - s}$.
For $\alpha = 0$ we recover the Amdahl case: $\varepsilon = \frac{1}{sN + 1 - s} \to 0$ as $N \to \infty$.
For $\alpha = 1$: $\varepsilon = \frac{s}{N} + 1 - s$, ranging from $\varepsilon = 1$ for $N = 1$ to the limit $\varepsilon \to 1 - s = p$ as $N \to \infty$, i.e., efficiency is limited by the parallel fraction of the code.
Conclusions: weak scaling allows utilization of at most a fraction $p$ of the computing power; wasted CPU time grows linearly with $N$.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 153

slide-27
SLIDE 27

Parallel Efficiency

Weak scaling, different notion of work

For work $= pN^\alpha$:
$\varepsilon^p = \frac{S^p_v}{N} = \frac{N^{\alpha-1}}{s + (1-s)N^{\alpha-1}}$.
In this case, for $\alpha = 1$ we obtain $\varepsilon^p = 1$, i.e., perfect efficiency. For a large serial fraction $s$, this obscures the fact that most of the computational capacity of the computer remains unused.
Example: $p = 0.1$, $s = 0.9$, $\alpha = 1$. With weak scaling, the efficiency $\varepsilon \to 0.1$ as $N \to \infty$, whereas $\varepsilon^p \equiv 1$ for all $N$. However, all processors except one are idle 90% of the time.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 154

slide-28
SLIDE 28

Measuring Strong Scalability

Approach to ascertain the scaling properties of a code on a given architecture: measure performance on a small number of processors and determine the model parameters by a least-squares fit. Example (Hager & Wellein, Sec. 5.3.5):

[Figure: relative performance vs. number of cores (1-12) for the same code on two architectures, with Amdahl fits s = 0.168 and s = 0.086.]

Least-squares fit of the serial fraction $s$ in the strong-scaling model $S_f = \frac{1}{s + \frac{1-s}{N}}$ for the same code on two different systems (measurements normalized to single-core values). The parallel part is compute-bound, the serial part memory-bound; System 2 has the larger memory bandwidth.
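Since the model has only the single parameter $s$, such a fit can be done with a simple scan. A minimal sketch in C (the measurement values below are hypothetical placeholders):

    #include <stdio.h>

    /* Amdahl model: S_f(N) = 1 / (s + (1-s)/N) */
    static double model(double s, double N) {
        return 1.0 / (s + (1.0 - s) / N);
    }

    int main(void) {
        /* hypothetical measured speedups, normalized to one core */
        const double N[]      = {1, 2, 4, 8, 12};
        const double S_meas[] = {1.00, 1.85, 3.15, 4.80, 5.70};
        const int    m = 5;

        double best_s = 0.0, best_err = 1e300;

        /* brute-force least-squares scan over the single parameter s */
        for (double s = 0.0; s <= 1.0; s += 1e-4) {
            double err = 0.0;
            for (int i = 0; i < m; i++) {
                double r = model(s, N[i]) - S_meas[i];
                err += r * r;
            }
            if (err < best_err) { best_err = err; best_s = s; }
        }
        printf("fitted serial fraction s = %.4f\n", best_s);
        return 0;
    }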

Oliver Ernst (INMO) HPC Wintersemester 2012/13 155

slide-29
SLIDE 29

Accelerate Serial or Parallel Part?

Assume the serial part can be accelerated by a factor $\xi > 1$. Amdahl's Law says the parallel performance becomes
$P^{s,\xi}_f = \frac{1}{\frac{s}{\xi} + \frac{1-s}{N}}$.
Optimizing instead the parallel part (by the same factor) yields
$P^{p,\xi}_f = \frac{1}{s + \frac{1-s}{\xi N}}$.
When does accelerating the serial part pay off more? Crossover point:
$\frac{P^{s,\xi}_f}{P^{p,\xi}_f} = \frac{\xi s + \frac{1-s}{N}}{s + \xi\frac{1-s}{N}} \ge 1 \;\Leftrightarrow\; N \ge \frac{1}{s} - 1$.
This point is independent of $\xi$ and corresponds to the value of $N$ for which half of the asymptotic efficiency is achieved in Amdahl's Law.
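A quick numeric check (illustrative): with a serial fraction of $s = 0.05$ the crossover lies at

    N ≥ 1/s - 1 = 1/0.05 - 1 = 19,

so beyond roughly 19 workers, accelerating the serial part by some factor $\xi$ pays off more than accelerating the parallel part by the same factor.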

Oliver Ernst (INMO) HPC Wintersemester 2012/13 156

slide-30
SLIDE 30

Accelerate Serial or Parallel Part?

Amdahl's Law used with strong scaling gives a parallel efficiency of
$\varepsilon = \frac{S_f}{N} = \frac{1}{sN + 1 - s}$.
For the crossover value $N = \frac{1}{s} - 1$ this is
$\varepsilon^* = \frac{1}{2(1-s)}$,
which, for $s \ll 1$, is already close to 0.5. Further increase of $N$ thus results in diminishing returns, hence one should try to accelerate the parallel part first.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 157

slide-31
SLIDE 31

Refined Performance Models

Models like Amdahl's and Gustafson's Laws can be refined to account for communication, load imbalance, parallel startup overhead, etc. Here we consider a simple model accounting for communication. Assume communication is not overlapped with computation (often true) and introduce a correction term:
$T^{p,c}_v = s + pN^{\alpha-1} + c_\alpha(N)$.
Communication overhead does not count as work. Thus the parallel speedup is
$S^c_v = \frac{s + pN^\alpha}{T^{p,c}_v} = \frac{s + (1-s)N^\alpha}{s + (1-s)N^{\alpha-1} + c_\alpha(N)}$.
To model $c_\alpha(N)$, note the basic message transfer time $\lambda + \kappa$, where $\lambda$ is the channel latency and $\kappa = \frac{n}{B}$ the streaming time for a message of size $n$ at bandwidth $B$. We now compare different network models.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 158

slide-32
SLIDE 32

Refined Performance Models

α = 0, blocking network

For bus-like communication only one message can be transmitted at a time. Each CPU's message costs $\kappa + \lambda$ independently of $N$, but the messages serialize on the shared medium, so $c_\alpha(N) = (\kappa + \lambda)N$.
$S^c_v = \frac{1}{s + \frac{1-s}{N} + (\kappa + \lambda)N} \approx \frac{1}{(\kappa + \lambda)N}$ for $N \gg 1$.
Performance is dominated by communication and goes to zero for large $N$. The same situation arises when there is contention for shared resources (memory channels, I/O channels, functional units on a CPU), i.e., serialization.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 159

slide-33
SLIDE 33

Refined Performance Models

α = 0, non-blocking network, constant communication cost

Assume the communication network can sustain $N/2$ concurrent messages without collisions and that the message size is independent of $N$. Then $c_\alpha(N) = \kappa + \lambda$ and
$S^c_v = \frac{1}{s + \frac{1-s}{N} + \kappa + \lambda} \approx \frac{1}{s + \kappa + \lambda}$ for $N \gg 1$.
The situation is similar to Amdahl's Law with $s$ replaced by $s + \kappa + \lambda$. For large $N$, performance saturates at a lower value due to the communication overhead.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 160

slide-34
SLIDE 34

Refined Performance Models

α = 0, non-blocking network, surface-to-volume communication

Such message sizes arise e.g. in domain decomposition for PDEs, where the message size is proportional to the surface area of a subdomain and the work is proportional to its volume. Here
$c_\alpha(N) = \kappa N^{-\beta} + \lambda$, $\beta > 0$,
i.e., the communication overhead decreases with $N$ (strong scaling).
$S^c_v = \frac{1}{s + \frac{1-s}{N} + \kappa N^{-\beta} + \lambda} \approx \frac{1}{s + \lambda}$ for $N \gg 1$.
For large $N$, performance is dominated by $s$ and the latency $\lambda$.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 161

slide-35
SLIDE 35

Refined Performance Models

α = 1, non-blocking network, surface-to-volume communication

In this case the communication overhead per CPU is independent of $N$: under weak scaling the subdomain, and hence the message size, stays constant, so (in contrast to the preceding model) $c_\alpha(N) = \kappa + \lambda$.
$S^c_v = \frac{s + pN}{s + p + \kappa + \lambda} = \frac{s + (1-s)N}{1 + \kappa + \lambda}$.
The following figure illustrates the comparison, plotting $S^c_v$ against $N$ for the parameter values $N = 1,\dots,1000$, $s = 0.05$, $\kappa = 0.005$, $\lambda = 0.001$, $\beta = 2/3$.
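To reproduce the comparison qualitatively, the following C sketch (illustrative only, using the parameter values just stated) evaluates $S^c_v$ for the four communication models alongside the plain Amdahl curve:

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        const double s = 0.05, kappa = 0.005, lambda = 0.001, beta = 2.0/3.0;

        printf("%6s %10s %10s %10s %10s %10s\n",
               "N", "Amdahl", "blocking", "nonblock", "s2v a=0", "s2v a=1");

        for (int N = 1; N <= 1000; N *= 10) {
            double p = 1.0 - s;
            /* the five speedup models from the preceding slides */
            double amdahl   = 1.0 / (s + p/N);
            double blocking = 1.0 / (s + p/N + (kappa + lambda)*N);
            double nonblock = 1.0 / (s + p/N + kappa + lambda);
            double s2v_a0   = 1.0 / (s + p/N + kappa*pow(N, -beta) + lambda);
            double s2v_a1   = (s + p*N) / (1.0 + kappa + lambda);

            printf("%6d %10.3f %10.3f %10.3f %10.3f %10.3f\n",
                   N, amdahl, blocking, nonblock, s2v_a0, s2v_a1);
        }
        return 0;
    }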

Oliver Ernst (INMO) HPC Wintersemester 2012/13 162

slide-36
SLIDE 36

Refined Performance Models

Illustration

[Figure: $S^c_v$ vs. $N$ for the Amdahl model and the four communication models: α = 0 blocking, α = 0 non-blocking, α = 0 3D surface-to-volume, α = 1 3D surface-to-volume.]

Oliver Ernst (INMO) HPC Wintersemester 2012/13 163

slide-37
SLIDE 37

Contents

  • 4. Parallel Computing

    4.1 Introduction
    4.2 Scalability
    4.3 Parallel Architectures
    4.4 Networks

Oliver Ernst (INMO) HPC Wintersemester 2012/13 164

slide-38
SLIDE 38

Parallel Architectures

Components

  • Processor
  • Memory
  • Communication network

[Figure: schematic of several processors (Prozessor) and memories (Speicher) connected by a communication network.]

Oliver Ernst (INMO) HPC Wintersemester 2012/13 165


slide-41
SLIDE 41

Parallel Architectures

Main Distinction

Shared Memory

[Figure: processors P1, P2, ..., Pn, each with its own cache, attached via a bus/switch to a common memory.]

Scales to ≈ 500 processors. Programming models: OpenMP, POSIX Threads (Pthreads).

Message Passing

[Figure: nodes P1, P2, ..., Pn, each with its own cache and local memory, connected by an interconnect.]

Scales arbitrarily. Programming model: MPI (message passing).

In real life: hybrids.
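As a minimal illustration of the message-passing model (a hedged sketch, not from the slides; it assumes an MPI installation and at least two processes), each process owns its memory privately and data moves only through explicit messages:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size, token;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* my process ID       */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes */

        /* run with at least two processes, e.g. mpirun -np 2 ./a.out */
        if (rank == 0) {
            token = 42;                          /* data private to rank 0 */
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 of %d received token %d\n", size, token);
        }

        MPI_Finalize();
        return 0;
    }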

Oliver Ernst (INMO) HPC Wintersemester 2012/13 166

slide-42
SLIDE 42

Parallel Architectures

More classifications:
  • Single/Multiple Instruction/Data [Flynn (1966)]
  • SPMD, MPMD
  • Shared memory vs. distributed memory
  • UMA vs. ccNUMA
  • Communication network type

Oliver Ernst (INMO) HPC Wintersemester 2012/13 167

slide-43
SLIDE 43

Parallel Architectures

Shared-memory computers

Common shared physical address space for several processors.
  • Uniform Memory Access (UMA) systems: latency and bandwidth are the same for all processors and all memory locations ("flat" memory model).
  • Cache-coherent Nonuniform Memory Access (ccNUMA) systems: memory is physically distributed but logically shared. The network logic gives the appearance of a single address space for the total system memory; however, access times differ for local and remote memory locations.
  • Locality problem: fundamental performance limitation if remote accesses can't be restricted.
  • Contention problem: simultaneous access to the same memory location by several processors.
Both shared-memory variants require cache-coherency mechanisms: a single cache line may reside in multiple caches at any given time. When any copy is modified, the change must be propagated to all copies in order to maintain a consistent view of memory for all processors.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 168

slide-44
SLIDE 44

Parallel Architectures

Cache-coherency

Example: suppose two memory locations A1 and A2 belong to the same cache line.
  • 1. Processor P1 loads the cache line into its cache.
  • 2. Processor P2 loads the cache line into its cache.
  • 3. Processor P1 modifies A1.
  • 4. Processor P2 modifies A2.
  • 5. The cache line is evicted from P1's cache. Which version should be written to memory?

Oliver Ernst (INMO) HPC Wintersemester 2012/13 169

slide-45
SLIDE 45

Parallel Architectures

Cache-coherency: the MESI protocol

Assign to each cache line one of four possible states:
  • M (modified): the cache line has been modified and resides in no other cache than this one; upon eviction, memory will reflect this cache line's most current state.
  • E (exclusive): the cache line has been read from memory but not yet modified; it resides in no other cache.
  • S (shared): the cache line has been read from memory and not yet modified, with additional copies possibly residing in other caches.
  • I (invalid): the cache line does not represent valid data. Typically occurs when another processor has requested exclusive access while this cache line was in the shared state.
When a cache requests exclusive ownership of a cache line, copies in the S or E state must be notified and copies in the M state must be written to memory. The notifying cache broadcasts the request over the bus; the remaining caches "snoop" the bus and act accordingly when holding the affected cache line.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 170


slide-47
SLIDE 47

Parallel Architectures

Cache-coherency: the MESI protocol

Back to the example: assume cache line L contains memory locations A1 and A2 and resides in the caches C1 and C2 of processors P1 and P2. Sequence of events:

  • 1. C1 requests exclusive ownership of L.
  • 2. In C2, the state of L is set to I.
  • 3. In C1, the state of L is set to E; A1 is modified; the state of L is set to M.
  • 4. C2 requests exclusive ownership of L.
  • 5. L is evicted from C1 and its state set to I.
  • 6. L is loaded into C2 and its state set to E.
  • 7. A2 is modified in L and the state set to M.

[Figure: caches C1 and C2 of processors P1 and P2 and main memory, each holding a copy of A1 and A2; numbered arrows 1-7 indicate the events above. Source: Hager & Wellein.]

Limitations: bus snooping consumes bandwidth and does not scale. Alternative: directory-based system monitors state of all cache lines, transmits state changes only to affected caches (employed in ccNUMA systems).
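The cost of this cache-line ping-pong shows up in practice as false sharing. The following OpenMP sketch (illustrative only, not from the slides; it assumes a compiler with OpenMP support) lets two threads update adjacent counters that share a cache line; enlarging the padding so that each counter occupies its own line typically removes the coherency traffic:

    #include <stdio.h>
    #include <omp.h>

    #define ITERS 100000000L

    /* two counters that end up on the same cache line -> false sharing;
       set PAD to 64 to place them on separate cache lines */
    #define PAD 1
    static struct { volatile long c; char pad[PAD]; } counter[2];

    int main(void) {
        double t0 = omp_get_wtime();

        #pragma omp parallel num_threads(2)
        {
            int id = omp_get_thread_num();
            for (long i = 0; i < ITERS; i++)
                counter[id].c++;   /* each write invalidates the other thread's copy */
        }

        printf("sums: %ld %ld   time: %.2f s\n",
               (long)counter[0].c, (long)counter[1].c, omp_get_wtime() - t0);
        return 0;
    }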

Oliver Ernst (INMO) HPC Wintersemester 2012/13 171


slide-50
SLIDE 50

Parallel Architectures

Examples of UMA systems

  • Two single-core CPUs in separate sockets sharing a common frontside bus (FSB).
  • Single shared path to memory.
  • Chipset ("northbridge") responsible for driving memory modules, I/O connections, arbitration.
  • Outdated design.

[Figure: two single-core processors, each with L1D and L2 caches, sharing one FSB to the chipset and memory. Source: Hager & Wellein.]

Oliver Ernst (INMO) HPC Wintersemester 2012/13 172

slide-51
SLIDE 51

Parallel Architectures

Examples of UMA systems

  • Two dual-core chips, each with its FSB connected separately to the chipset.
  • Chipset-to-memory bandwidth can be designed to match the combined bandwidth of both FSBs.
  • Anisotropy in the distance between cores depending on socket membership.

[Figure: two dual-core chips in separate sockets (cores with L1D caches, L2 caches), each socket connected by its own FSB to the chipset and memory. Source: Hager & Wellein.]

Limitation: bandwidth bottlenecks as number of sockets/FSBs grows beyond certain limits. Cross-bar switches avoid bandwidth bottleneck but are expensive.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 173

slide-52
SLIDE 52

Parallel Architectures

Examples of ccNUMA systems

Locality domain (LD): collection of processor cores with locally connected memory. A coherent interconnect links multiple LDs; the system runs a single OS instance.

[Figure: two locality domains, each a quad-core chip with per-core L1D/L2 caches, a shared L3 cache and a memory interface to its local memory, linked by a coherent interconnect. Source: Hager & Wellein.]

Two LDs, each consisting of one quad-core chip with separate caches and a common interface to its local memory; the LDs are linked by a high-speed connection. Implementations: QuickPath (Intel), HyperTransport (AMD et al.). The intersocket link can mediate direct, cache-coherent memory access. All required protocols are provided by hardware.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 174

slide-53
SLIDE 53

Parallel Architectures

Examples of ccNUMA systems

[Figure: ccNUMA system with four locality domains (eight processors with L1D/L2 caches and shared L3 caches, four local memories); each socket attaches to a communication interface (S), and routers (R) switch the NUMALink (NL) network connections. Source: Hager & Wellein.]

ccNUMA system with 4 LDs as used in Intel-based SGI Altix systems. Scales to thousands of processors. Each socket is connected to a communication interface (S), providing access to memory as well as to the proprietary NUMALink (NL) network. Network connections for remote access are switched by routers (R). Adding LDs to the network becomes increasingly expensive and adds layers of latency.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 175

slide-54
SLIDE 54

Parallel Architectures

Distributed memory model

[Figure: distributed-memory system: processors (P), each with a cache (C) and exclusive local memory (M), attached via network interfaces (NI) to a communication network. Source: Hager & Wellein.]

  • Programming model only; the design is obsolete as a practical system.
  • Each processor (P) connected to exclusive local memory.
  • At least one network interface (NI) per node.
  • A separate process on each node; processes communicate explicitly using the communication network.
  • No Remote Memory Access (NORMA) system.
  • The programming paradigm is more complex, but it scales. All current supercomputers are based on this model.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 176

slide-55
SLIDE 55

Parallel Architectures

Hybrid model

[Figure: hybrid system: shared-memory nodes (two processors sharing a memory), each with a network interface, connected by a communication network. Source: Hager & Wellein.]

Shared-memory building blocks connected by a fast network. The network adds another level of complexity beyond the cache hierarchy, ccNUMA nodes, etc. A mixture of programming models is possible on the same system. "Hybrid" also refers to systems containing a mixture of programming paradigms on different hardware layers: GPUs (graphics processing units), FPGAs (field-programmable gate arrays), ASICs (application-specific integrated circuits), co-processors, etc.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 177

slide-56
SLIDE 56

Contents

  • 4. Parallel Computing

    4.1 Introduction
    4.2 Scalability
    4.3 Parallel Architectures
    4.4 Networks

Oliver Ernst (INMO) HPC Wintersemester 2012/13 178

slide-57
SLIDE 57

Networks

Basic performance characteristics

Large variety of topologies, technologies, protocols. Simplest and least expensive: Gigabit Ethernet, but too slow for parallel applications requiring significant communication. Current standard for small clusters: Infiniband.

Basic model for a point-to-point connection: the transfer time $T$ for a message of $N$ bytes is the sum
$T = T_\ell + \frac{N}{B}$, where $T_\ell$ is the latency and $B$ the asymptotic bandwidth [MBytes/sec].
The latency generally depends on $N$ (vs. buffer or cache size). Effective bandwidth:
$B_{\mathrm{eff}} = \frac{N}{T_\ell + \frac{N}{B}}$.   (5.1)

Oliver Ernst (INMO) HPC Wintersemester 2012/13 179


slide-59
SLIDE 59

Networks

Ping-pong benchmark

To measure latency and effective bandwidth, send a message of size N bytes back and forth between two processes running on different processors. (Part of the Intel MPI Benchmarks suite.) Pseudocode:

    myID = get_process_ID()
    if (myID.eq.0) then
      targetID = 1
      S = get_walltime()
      call Send_message(buffer,N,targetID)
      call Receive_message(buffer,N,targetID)
      E = get_walltime()
      MBYTES = 2*N/(E-S)/1.d6   ! MBytes/sec rate
      TIME   = (E-S)/2*1.d6     ! transfer time in microsecs for single message
    else
      targetID = 0
      call Receive_message(buffer,N,targetID)
      call Send_message(buffer,N,targetID)
    endif
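A runnable version of the same measurement in C with MPI might look as follows (a hedged sketch, not the Intel MPI Benchmarks code; the message size is illustrative, and a real benchmark would repeat the exchange many times):

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        const int N = 1 << 20;                 /* message size in bytes */
        char *buffer = malloc(N);
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* run with two processes, e.g. mpirun -np 2 ./pingpong */
        if (rank == 0) {
            double S = MPI_Wtime();
            MPI_Send(buffer, N, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buffer, N, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            double E = MPI_Wtime();
            printf("B_eff = %.1f MBytes/sec\n", 2.0 * N / (E - S) / 1e6);
            printf("T     = %.1f microsec per message\n", (E - S) / 2 * 1e6);
        } else if (rank == 1) {
            MPI_Recv(buffer, N, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buffer, N, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }

        free(buffer);
        MPI_Finalize();
        return 0;
    }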

Oliver Ernst (INMO) HPC Wintersemester 2012/13 180

slide-60
SLIDE 60

Networks

Ping-pong benchmark measurements: Gigabit Ethernet

Ping-pong benchmark measurements of Beff on a Gigabit Ethernet network and least-squares fit of the parameters Tℓ and B from (5.1).

[Figure: measured $B_{\mathrm{eff}}$ vs. message size $N$ (bytes) on Gigabit Ethernet with the model fit $T_\ell$ = 76 µs, $B$ = 111 MBytes/sec; inset: measured latency (µs) for small messages; $N_{1/2}$ marked. Source: Hager & Wellein.]

Oliver Ernst (INMO) HPC Wintersemester 2012/13 181

slide-61
SLIDE 61

Networks

Ping-pong benchmark measurements: Gigabit Ethernet

Observations:
  • Model (5.1) explains the observations well.
  • Small message sizes: low bandwidth, transfer time dominated by latency.
  • Large message sizes: Beff saturates, latency insignificant.
  • Tℓ is not accurately reproduced by the fit; a direct measurement of T is shown in the inset.
Sources of latency:
  • Administrative overhead (message headers etc.).
  • Minimum message size for some protocols (TCP/IP): a small frame of N > 1 bytes is sent no matter how small the message.
  • Many layers of software involved in initiating message transmission.
  • Commodity PC hardware not optimized for low-latency I/O.
Low latency is achieved by lightweight protocols, optimized drivers, and communication devices directly attached to processor buses.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 182


slide-63
SLIDE 63

Networks

Ping-pong benchmark measurements: Infiniband

Ping-pong benchmark measurements of Beff on DDR Infiniband network and least-squares fit of parameters Tℓ and B from (5.1) (separately for small and large message regimes).

[Figure: measured $B_{\mathrm{eff}}$ vs. message size $N$ (bytes) on DDR Infiniband, with separate least-squares fits for small messages ($T_\ell$ = 4.14 µs, $B$ = 827 MBytes/sec) and large messages ($T_\ell$ = 20.8 µs, $B$ = 1320 MBytes/sec), plus the combined curve $T_\ell$ = 4.14 µs, $B$ = 1320 MBytes/sec. Source: Hager & Wellein.]

Oliver Ernst (INMO) HPC Wintersemester 2012/13 183

slide-64
SLIDE 64

Networks

Ping-pong benchmark measurements: Infiniband

Observations/remarks:
  • A single model does a bad job of fitting the entire message-size regime. Possible reason: the message-passing or protocol-layer software uses different buffering algorithms for messages of different sizes, e.g. splitting messages too large to fit into buffers.
  • Ideal: limiting network bandwidth comparable to internode bandwidth.
  • Many applications operate in latency-dominated regions of the bandwidth graph. This is quantified by $N_{1/2}$, the message size at which $B_{\mathrm{eff}} = B/2$. For model (5.1) we obtain $N_{1/2} = B\,T_\ell$.
  • Question: is an increase of the maximum network bandwidth by a factor $\beta$ beneficial for all messages? Answer: at message size $N$,
    $\frac{B_{\mathrm{eff}}(\beta B, T_\ell)}{B_{\mathrm{eff}}(B, T_\ell)} = \frac{1 + N/N_{1/2}}{1 + N/(\beta N_{1/2})}$,
    a gain of 33% for $\beta = 2$ at $N = N_{1/2}$. The same holds for a reduction of $T_\ell$ by a factor $\beta$. ⇒ Both need to be reduced for increased performance across the entire range.
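For instance (a rough check using the fitted Infiniband values from the previous slide, $T_\ell \approx 4.14$ µs and $B \approx 1320$ MBytes/sec):

    N_{1/2} = B * T_ℓ ≈ 1320e6 Bytes/sec * 4.14e-6 sec ≈ 5500 Bytes,

so messages shorter than a few kilobytes are latency-dominated on this network.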

Oliver Ernst (INMO) HPC Wintersemester 2012/13 184

slide-65
SLIDE 65

Networks

Bisection bandwidth

Issue: even if individual point-to-point connections have high effective bandwidth, this does not describe global saturation effects, i.e., the deviation of the global aggregate bandwidth from the ideal nonblocking behavior when all nodes are simultaneously transmitting or receiving data.

Bisection bandwidth $B_b$: the sum of the effective bandwidths of the minimal number of connections cut when partitioning the system into two parts of equal size.

More meaningful for hybrid parallel architectures: the available bandwidth per core, i.e., the bisection bandwidth divided by the overall number of compute cores.

[Figure: partitioning a network into two halves of equal size; the cut connections determine the bisection bandwidth. Source: Hager & Wellein.]

Oliver Ernst (INMO) HPC Wintersemester 2012/13 185

slide-66
SLIDE 66

Networks

Bus

Shared medium; can be used by only one communicating device at a time. A collision detection mechanism is necessary. Very common in computer systems. Easy to implement, low latency.

[Figure: devices attached to a shared bus. Source: Hager & Wellein.]

Most important shortcoming: the bus is blocking, i.e., all devices share a constant bandwidth. Susceptible to failure, as a local problem can easily influence global operation. In HPC systems buses are limited to communication at the processor or socket level or to diagnostic networks.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 186

slide-67
SLIDE 67

Networks

Switched networks

Switched network divides all communicating devices into 2 groups. Devices in one group connected to switch in starlike fashion. Switches connected to each othe in additional switch layers. Distance between two communicating devices varies according to number

  • f “hops” message has to travel from source to destination.

Diameter of network: maximal number of hops between two devices. (1 for bus) Single switch: can eithe support fully nonblocking operation, i.e., all pairs

  • f ports can use their full bandwidth concurrently, or can have bus-like

design with limited bandwidth.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 187

slide-68
SLIDE 68

Networks

Crossbar switch

[Figure: crossbar switch with four input and four output ports; circles mark the possible connections. Source: Hager & Wellein.]

Fully nonblocking switch. Circles represent possible connections between pairs of devices; each implemented as 2 × 2 switching element.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 188

slide-69
SLIDE 69

Networks

Fat trees

Crossbar switches can be combined to form a fat-tree network. Design choice: keep the nonblocking property across the entire network, or thin out bandwidth closer to the root. With the latter choice, the bisection bandwidth per compute element is less than half the leaf-switch bandwidth per port. The network must be capable of routing traffic via unused switches. Maximum latency depends only on the number of tree levels. If a group of workers uses a single leaf switch, they may obtain fully nonblocking communication. Bottlenecks are still possible with static routing.

Oliver Ernst (INMO) HPC Wintersemester 2012/13 189

slide-70
SLIDE 70

Networks

Fat trees

[Figure: two-layer fat tree with leaf switches SW 1-SW 4 and spine switches SW A, SW B (see caption below).]

Figure 4.15: A fully nonblocking full-bandwidth fat-tree network with two switch layers. The switches connected to the actual compute elements are called leaf switches, whereas the upper layers form the spines of the hierarchy.

source: Hager & Wellein Oliver Ernst (INMO) HPC Wintersemester 2012/13 190

slide-71
SLIDE 71

Networks

Fat trees

[Figure: fat tree with a single spine switch above four leaf switches (see caption below).]

Figure 4.16: A fat-tree network with a bottleneck due to "1:3 oversubscription" of communication links to the spine. By using a single spine switch, the bisection bandwidth is cut in half as compared to the layout in Figure 4.15 because only four nonblocking pairs of connections are possible. Bisection bandwidth per compute element is even lower.

source: Hager & Wellein Oliver Ernst (INMO) HPC Wintersemester 2012/13 191

slide-72
SLIDE 72

Networks

Fat trees

[Figure: eight compute elements (1-8) attached to leaf switches SW 1-SW 4, with spine switches SW A and SW B (see caption below).]

Figure 4.17: Even in a fully nonblocking fat-tree switch hierarchy (network cabling shown as solid lines), not all possible combinations of N/2 point-to-point connections allow collision-free operation under static routing. When, starting from the collision-free connection pattern shown with dashed lines, the connections 2↔6 and 3↔7 are changed to 2↔7 and 3↔6, respectively (dotted-dashed lines), collisions occur, e.g., on the highlighted links if connections 1↔5 and 4↔8 are not re-routed at the same time.

source: Hager & Wellein Oliver Ernst (INMO) HPC Wintersemester 2012/13 192

slide-73
SLIDE 73

Networks

Mesh

Multi-dimensional hypercube-like mesh; each node sits at a Cartesian grid intersection. Routing is performed by ASICs. $B_b(N) \sim N^{(d-1)/d}$, hence $B_b(N)/N \to 0$ for large $N$. Popular for large systems when fat trees are too expensive; IBM Blue Gene and Cray XT use mesh networks.

[Figure: 2D torus mesh network; bisection bandwidth $\sim \sqrt{N}$. Source: Hager & Wellein.]

Oliver Ernst (INMO) HPC Wintersemester 2012/13 193

slide-74
SLIDE 74

Networks

Mesh

[Figure: four sockets (two processors each), each with local memory, connected by HyperTransport (HT) links; two HT links attach the I/O subsystem. Source: Hager & Wellein.]

Four-socket ccNUMA system with a HyperTransport-based mesh network connecting the 4 LDs. Each socket has 3 HT links, so the network has to be heterogeneous w.r.t. intersocket latency to accommodate the I/O connections and still utilize all HT ports (2 HT connections are used for I/O).

Oliver Ernst (INMO) HPC Wintersemester 2012/13 194