Lightweight Requirements Engineering for Exascale Co-design
Felix Wolf, TU Darmstadt


SLIDE 1

11/17/19 | Department of Computer Science | Laboratory for Parallel Programming | Prof. Dr. Felix Wolf | 1

Felix Wolf, TU Darmstadt

Lightweight Requirements Engineering for Exascale Co-design

Application System

SLIDE 2

Acknowledgement

  • Alexandru Calotoiu, TU Darmstadt
  • Alexander Graf, TU Darmstadt
  • Torsten Hoefler, ETH Zurich
  • Daniel Lorenz, TU Darmstadt
  • Sergei Shudler, TU Darmstadt
  • Sebastian Rinke, TU Darmstadt
SLIDE 3

Co-design

Workload ↔ System

Better algorithms

SLIDE 4

Current performance might be deceptive…

(Figure: breakdown of communication vs. computation)
SLIDE 5

Hardware-specific performance models

(Figure: Systems 1…n, each carrying its own set of performance models 1.1, 1.2, 1.3 for Application 1)

SLIDE 6

Application-centric requirements models

(Figure: Systems 1…n, all sharing a single requirements model 1 for Application 1)

SLIDE 7

Data metabolism at the hardware / software interface

Application Hardware

SLIDE 8

Hardware-independent requirement metrics

  • CPU: #FLOPS
  • Memory: #Loads & stores, #Bytes used (+ stack distance)
  • Network: #Bytes sent & received

SLIDE 9

Requirements model of an application

Set of functions r_i(p, n), each representing one of the requirement metrics

  • p = #processes, n = input size per process
  • All metrics refer to a single process
  • We model neither time nor energy

SLIDE 10

Lightweight requirements engineering for (exascale) co-design

Collect portable requirement metrics → derive requirement models → extrapolate to a new system

SLIDE 11

Collection of requirement metrics

Requirement        Metric                        Profiling tool
Computation        # Floating-point operations
Network comm.      # Bytes sent & received
Memory footprint   # Bytes used                  getrusage()
Memory access      # Loads & stores
Memory locality    Stack distance                Threadspotter

Collection is single-threaded (#FLOPS roughly independent of #threads).

SLIDE 12

Modeling locality

Reuse distance vs. stack distance

(Figure: access trace with annotated distances; one reuse has reuse distance = 1 and stack distance = 1, another has reuse distance = 3 but stack distance = 2, since stack distance counts only distinct addresses between reuses)

Tool: Paratools Threadspotter

SLIDE 13

Automatic empirical performance modeling with Extra-P

Input: small-scale measurements of an instrumented code (main() { foo(); bar(); compute(); })
Output: human-readable, multi-parameter performance models of the form

    f(x_1, ..., x_m) = \sum_{k=1}^{n} c_k \prod_{l=1}^{m} x_l^{i_{kl}} \cdot \log_2^{j_{kl}}(x_l)

  • A. Calotoiu et al.: Fast Multi-Parameter Performance Modeling (CLUSTER '16)

www.scalasca.org/software/extra-p/download.html

SLIDE 14

Test applications

Kripke MILC LULESH icoFoam Relearn

SLIDE 15

Experimental setup

  • JUQUEEN @ Jülich Supercomputing Centre: IBM Blue Gene/Q
  • Lichtenberg @ TU Darmstadt: Intel Xeon with InfiniBand

SLIDE 16

Modeling application requirements

LULESH

Requirement        Metric                   Model
Computation        #FLOPs                   10^5 · n · log(n) · p^0.25 · log(p)
Communication      #Bytes sent & received   10^3 · n · p^0.25 · log(p)
Memory access      #Loads & stores          10^5 · n · log(n) · log(p)
Memory footprint   #Bytes used              10^5 · n · log(n)
Memory locality    Stack distance           Constant

Models represent per-process effects; p: number of processes, n: problem size per process.

SLIDE 17

Determining requirements on a new system

Available sockets → # processes
Available memory per process → problem size per process
# processes × problem size per process → overall problem size
Requirement models evaluated at (p, n) → requirements: #FLOPS, #Bytes sent, ...

SLIDE 18

Requirements engineering process

(Figure: requirements matched against computational performance, memory capacity, memory bandwidth, and network bandwidth)

SLIDE 19

Case study I: Three system upgrades

  • Racks × 2
  • Sockets × 2
  • Memory × 2

SLIDE 20

Three upgrades: summary (ratios relative to the baseline system)

System Upgrade A: Double the racks
                          Kripke  LULESH  MILC  Relearn  icoFoam  Expected
Problem size per process  1       1       1     1        0.5      1
Overall problem size      2       2       2     2        1        2
Computation               1       1.2     1     1        0.5      1
Communication             1       1.2     1     1        0.7      1
Memory accesses           2       1.2     2.8   2        0.7      1

System Upgrade B: Double the sockets
                          Kripke  LULESH  MILC  Relearn  icoFoam  Expected
Problem size per process  0.5     0.5     0.5   0.3      0.3      0.5
Overall problem size      1       1       1     0.5      0.6      1
Computation               0.5     0.6     0.5   0.3      0.2      0.5
Communication             0.5     0.6     0.5   0.3      0.3      0.5
Memory accesses           0.5     1       1.4   1        0.5      0.5

System Upgrade C: Double the memory
                          Kripke  LULESH  MILC  Relearn  icoFoam  Expected
Problem size per process  2       1.4     2     4        1.4      2
Overall problem size      2       1.4     2     4        1.4      2
Computation               2       1.4     2     4        1.7      2
Communication             2       1.4     2     4        1.4      2
Memory accesses           2       1.4     2     4        1.4      2

SLIDE 21

Case study II: Three exascale strawman systems

Metric                 Massively parallel   Vector      Hybrid
Nodes                  2 · 10^4             5 · 10^4    10^4
Processors             2 · 10^9             5 · 10^7    10^8
Processors per node    10^5                 10^3        10^4
Memory per processor   5 · 10^6             2 · 10^8    10^8
Flop/s per processor   5 · 10^8             2 · 10^10   10^10

Massively parallel: many but weak processors. Vector: few but powerful processors. Hybrid: a moderate number of moderate processors. Total memory in each case: 10 PB.

SLIDE 22

Case study II: Three exascale strawman systems (results)

                                                     Massively parallel  Vector      Hybrid
Kripke    Maximum overall problem size               10^10               10^10       10^10
          Minimum wall time for benchmark problem [s]  0.1               0.1         0.1
LULESH    Maximum overall problem size               3.9 · 10^10         1.7 · 10^10 1.9 · 10^10
          Minimum wall time for benchmark problem [s]  40                21.5        33
MILC      Maximum overall problem size               10^10               10^10       10^10
          Minimum wall time for benchmark problem [s]  10^2              10^2        10^2
Relearn   Maximum overall problem size               5 · 10^10           4 · 10^12   10^12
          Minimum wall time for benchmark problem [s]  4                 0.02        0.2

Bigger problem versus faster solution: the vector system is the clear winner.

SLIDE 23

Summary

Application-centric requirements models

  • No need to integrate hardware knowledge
  • Generation via standard profiling tools
  • Memory locality taken into account

Practical co-design process

  • Extrapolates requirements to envisaged system
  • Points out bottlenecks on both sides

Automated back-of-the-envelope (BOE) co-design for large workloads

SLIDE 24

Tasking

Idea: separate problem decomposition from concurrency

  • Decompose the problem into a set of tasks and insert them into a task pool
  • Threads fetch tasks from the pool until all tasks are completed and the pool is empty; note that a task may create new tasks
  • Advantage: good load balance if the problem is over-decomposed

(Figure: tasks are created into a task pool; a scheduler fetches them and assigns them to threads from a thread pool)

SLIDE 25

Tasking (2)

  • Task-based paradigms: Cilk, OmpSs, OpenMP, …
  • Scheduling managed by the runtime system
  • Example: the task graph of fib(5), which spawns fib(4) and fib(3), and so on, produced by

    #pragma omp task shared(x)
    x = fib( n - 1 );
    #pragma omp task shared(y)
    y = fib( n - 2 );
    #pragma omp taskwait
    return x + y;

SLIDE 26

Efficiency of task-based applications – performance issues

Influencing factors: task graph, core count, input size

  • Efficiency (held constant): E = S_p / p

SLIDE 27

Efficiency of task-based applications – performance issues (2)

Influencing factors: task graph, core count, input size

  • Efficiency (held constant): E = S_p / p

SLIDE 28

Efficiency of task-based applications – performance issues (3)

Influencing factors: input size, resource contention, core count

  • Efficiency (held constant): E = S_p / p

SLIDE 29

Task dependency graph (TDG)

  • Nodes: tasks; edges: dependencies
  • p, n: processing elements, input size
  • T_1(n): sum of all the task times (work)
  • T_∞(n): longest path (depth)
  • π(n) = T_1(n) / T_∞(n): average parallelism
  • T_p(n): execution time
  • S_p(n) = T_1(n) / T_p(n): speedup

(Figure: example TDG with T_1 = 45 and T_∞ = 25)

SLIDE 30

Efficiency & isoefficiency

  • Efficiency is defined as E(p,n) = S_p(n) / p ≤ min{1, π(n) / p} = E_ub(p,n)
  • Isoefficiency binds together the core count and the input size for a specific, constant efficiency: n = f_E(p)
  • An isoefficiency function is a contour line on the efficiency surface E_ub(p,n)
  • Example: Mergesort, with π(n) = log n (the surface depicts E_ub; its contour lines are the isoefficiency functions)

SLIDE 31

Modeled efficiency functions

  • E_ac(p,n): reflects actual performance
  • E_cf(p,n): contention-free replays
  • E_ub(p,n): upper bound based on avg. parallelism

Contention discrepancy Δ_con = E_cf(p,n) − E_ac(p,n): shows how severe the resource contention is.
Structural discrepancy Δ_str = E_ub(p,n) − E_cf(p,n): characterizes optimization potential on the task-graph level.

SLIDE 32

Co-design aspects

App        Model                                                         Input size for p = 60, E = 0.8
Fibonacci  E_ac = 0.98 − 5.11·10^-3 p^1.25 + 1.76·10^-3 p^1.25 log n     51
           E_cf = 0.97 − 1.46·10^-2 p^1.25 + 9.26·10^-3 p^1.25 log n     51
           E_ub = min{1, (25.48 + 0.49 n^2.75 log n) p^-1}               49
Strassen   E_ac = 1.55 − 1.02 p^0.25 + 4.59·10^-2 p^0.25 log n           83,600 × 83,600
           E_cf = 1.26 − 0.65 p^0.33 + 3.89·10^-2 p^0.33 log n           12,680 × 12,680
           E_ub = min{1, 0.25 n^0.75 p^-1}                               1,200 × 1,200

For example (Strassen): let E = 0.8 and p = 60. Solving

    0.8 = 1.55 − 1.02 · 60^0.25 + 4.59·10^-2 · 60^0.25 · log n

yields n = 83,600.

SLIDE 33

Extra-P 3.0

  • GUI improvements, better stability, additional features
  • Tutorials available through VI-HPS and upon request

http://www.scalasca.org/software/extra-p/download.html

SLIDE 34

Related publications

[1] Alexandru Calotoiu, Alexander Graf, Torsten Hoefler, Daniel Lorenz, Sebastian Rinke, Felix Wolf: Lightweight Requirements Engineering for Exascale Co-design. In Proc. of the 2018 IEEE International Conference on Cluster Computing (CLUSTER), Belfast, UK
[2] Sergei Shudler, Alexandru Calotoiu, Torsten Hoefler, Felix Wolf: Isoefficiency in Practice: Configuring and Understanding the Performance of Task-based Applications. In Proc. of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Austin, TX, USA, pages 1-13, ACM, February 2017
[3] Alexandru Calotoiu, David Beckingsale, Christopher W. Earl, Torsten Hoefler, Ian Karlin, Martin Schulz, Felix Wolf: Fast Multi-Parameter Performance Modeling. In Proc. of the 2016 IEEE International Conference on Cluster Computing (CLUSTER), Taipei, Taiwan
[4] Alexandru Calotoiu, Torsten Hoefler, Marius Poke, Felix Wolf: Using Automated Performance Modeling to Find Scalability Bugs in Complex Codes. In Proc. of the ACM/IEEE Conference on Supercomputing (SC13), Denver, CO, USA

SLIDE 35

Thank you!