How runtime systems can support resource awareness in HPC: the HPX case

SLIDE 1

Introduction Resource awareness HPX RA in HPX Example Summary

How runtime systems can support resource awareness in HPC: the HPX case

Tommaso Bianucci

Technische Universität München

22 June 2018

Tommaso Bianucci Technische Universität München

SLIDE 2

Exascale will be hard

◮ 1 ExaFLOPS = 10^18 FLOPS
◮ Billions of cores?
◮ Heterogeneous hardware
  ◮ Manycore CPUs
  ◮ GPUs
  ◮ FPGAs

→ These machines expose an extreme degree of parallelism.

SLIDE 4

Applications are hard

◮ Scaling-impaired applications
◮ Unbalanced execution tree

This causes:

◮ Poor parallel performance
◮ Suboptimal resource usage

→ Some applications do not scale well.

SLIDE 7

Programming model matters

Current predominant model in HPC:

◮ Fork-join for shared memory (OpenMP)
◮ Communicating Sequential Processes for distributed memory (MPI)

Problems:

◮ Global barriers
◮ Load imbalance

P. A. Grubel: "Dynamic adaptation in HPX, a task-based parallel runtime system", 2016.
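The interaction between global barriers and load imbalance can be made concrete with a little arithmetic. The following is only an illustrative sketch: the helper name `forkJoinEfficiency` and the per-thread times are assumptions, not taken from the talk. In a fork-join region every thread waits at the join for the slowest one, so the region lasts as long as the maximum per-thread time, and efficiency is the useful work divided by (number of threads × that maximum).

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical helper: parallel efficiency of a single fork-join region.
// work[t] is the time thread t spends computing before the barrier.
// All threads wait for the slowest one, so the region lasts max(work).
double forkJoinEfficiency(const std::vector<double>& work) {
    if (work.empty()) return 1.0;
    const double slowest = *std::max_element(work.begin(), work.end());
    if (slowest <= 0.0) return 1.0;  // no work, so nothing is wasted
    const double useful = std::accumulate(work.begin(), work.end(), 0.0);
    return useful / (static_cast<double>(work.size()) * slowest);
}
```

For example, four threads with times {1, 1, 1, 5} do 8 units of useful work in a region that occupies 4 × 5 = 20 thread-time units, an efficiency of 0.4; a balanced {2, 2, 2, 2} reaches 1.0.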

SLIDE 9

Resource awareness: clever software for complex hardware

Resource awareness

◮ Adaptive allocation and usage of resources
◮ The system is aware of its own resources
◮ At runtime vs. before execution
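As a minimal illustration of deciding "at runtime" rather than "before execution", the sketch below sizes a decomposition from what the machine reports when the program runs. The helper `chooseChunkCount` is hypothetical; `std::thread::hardware_concurrency()` is standard C++ and is only a hint (it may return 0), so a real resource-aware runtime would rely on much richer information.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>

// Hypothetical helper: decide how many chunks to split `items` into,
// given the number of hardware threads available.
std::size_t chooseChunkCount(std::size_t items, std::size_t hwThreads) {
    if (items == 0) return 0;
    if (hwThreads == 0) hwThreads = 1;  // fall back when the count is unknown
    return std::min(items, hwThreads);
}

// Runtime variant: query the system instead of hard-coding a thread count.
std::size_t chooseChunkCount(std::size_t items) {
    // hardware_concurrency() is only a hint and may return 0.
    return chooseChunkCount(items, std::thread::hardware_concurrency());
}
```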

SLIDE 11

What are resources?

1. Hardware entities
   ◮ Computational units
   ◮ Memory
   ◮ Bus/network bandwidth
   ◮ I/O devices
   ◮ Power
   ◮ Thermal

2. Software entities
   ◮ Buffers
   ◮ Queues

SLIDE 13

Different levels of awareness

1. Embedded computing
   E.g.: invasive computing on MPSoC
2. Application/runtime system level
   E.g.: load balancing, task scheduling
3. Supercomputing facility
   E.g.: invasive MPI + job scheduler integration

SLIDE 16

High Performance ParalleX (HPX)

C++ runtime system for

◮ Task-based parallelism
◮ Shared memory + distributed memory parallelization
◮ Fine-grained parallelism

SLIDE 18

HPX foundations

◮ Asynchronous scheduling and execution
◮ Lightweight synchronization
◮ Active Global Address Space (AGAS)
◮ Performance monitoring framework

T. Heller et al.: “HPX – An open source C++ standard library for parallelism and concurrency”, 2017.

SLIDE 19

Futures

H. Kaiser et al.: “ParalleX: an advanced parallel execution model for scaling-impaired applications”, 2009.
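The idea of a future can be sketched with the standard C++ facilities, whose interface `hpx::future` and `hpx::async` are modeled on. This is an illustrative example, not code from the talk: `sumOf` and `sumBothAsync` are hypothetical names, and `std::async` typically uses OS threads rather than HPX's lightweight tasks. The point is that launching returns immediately with a placeholder for the result, and the caller blocks only when it actually asks for the value.

```cpp
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical workload standing in for any long-running computation.
long sumOf(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0L);
}

// Launch two computations asynchronously and combine their results.
// Each std::async call returns a future right away; get() blocks only
// if the corresponding result is not ready yet.
long sumBothAsync(const std::vector<int>& a, const std::vector<int>& b) {
    std::future<long> fa = std::async(std::launch::async, sumOf, std::cref(a));
    std::future<long> fb = std::async(std::launch::async, sumOf, std::cref(b));
    return fa.get() + fb.get();
}
```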

SLIDE 20

Programming model

T. Heller: “C++ on its way to exascale and beyond – The HPX Parallel Runtime System”, 2016.

SLIDE 24

HPX and Resource Awareness

Capabilities

1. Task scheduling
   → Work stealing + NUMA-awareness
2. AGAS
   → Dynamic relocation of objects
3. Percolation
   → Directly addressing HW accelerators
4. Performance counters
   → Easier integration into applications
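The work-stealing idea behind the first capability can be sketched generically as follows. This is not HPX's scheduler: it is a single-threaded toy with no synchronization and no NUMA awareness, and the name `nextTask` and the "steal from the longest queue" victim policy are assumptions made for illustration. Each worker serves its own queue first; only when that queue is empty does it take a task from the opposite end of another worker's queue.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// One scheduling step for worker `self`: take from the front of the own
// queue if possible, otherwise steal from the back of the longest other
// queue. Returns true and writes the task id to `task` if work was found.
bool nextTask(std::vector<std::deque<int>>& queues, std::size_t self, int& task) {
    if (!queues[self].empty()) {
        task = queues[self].front();
        queues[self].pop_front();
        return true;
    }
    std::size_t victim = self;
    std::size_t longest = 0;
    for (std::size_t w = 0; w < queues.size(); ++w) {
        if (w != self && queues[w].size() > longest) {
            longest = queues[w].size();
            victim = w;
        }
    }
    if (longest == 0) return false;  // nothing to steal anywhere
    task = queues[victim].back();    // steal from the opposite end
    queues[victim].pop_back();
    return true;
}
```

With queues {1, 2, 3} and {}, the idle second worker steals task 3 while the first worker continues with task 1, so an initially unbalanced distribution evens out without a global barrier.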

SLIDE 29

HPX and Resource Awareness (2)

Limitations

1. (In)elasticity of HPX
   → Worker threads and localities cannot be changed at runtime
2. Energy unawareness
   → E.g. no DVFS support
3. Fault tolerance
   → No built-in facility

SLIDE 32

HPX coding example: the Mandelbrot set

$$ z_{c,0} = 0, \qquad z_{c,n+1} = z_{c,n}^2 + c $$

$$ M = \{\, c \in \mathbb{C} : \lim_{n \to \infty} |z_{c,n}| < +\infty \,\} $$
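In practice the limit in the definition cannot be evaluated, so membership is approximated by iterating a bounded number of times and stopping as soon as the sequence leaves the disk of radius 2, after which it is known to diverge. A minimal sketch of this escape-time test, assuming the hypothetical name `escapeCount` and an iteration budget `maxIter` chosen by the caller:

```cpp
#include <complex>

// Number of iterations of z -> z*z + c (starting from z = 0) performed
// while |z| < 2, capped at maxIter. A return value equal to maxIter means
// the orbit stayed bounded for the whole iteration budget.
int escapeCount(std::complex<double> c, int maxIter) {
    std::complex<double> z(0.0, 0.0);
    int k = 0;
    while (k < maxIter && std::abs(z) < 2.0) {
        z = z * z + c;
        ++k;
    }
    return k;
}
```

For c = 0 or c = -1 the orbit stays bounded and the function returns `maxIter`; for c = 1 the orbit is 1, 2, ... and the function returns 2.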

SLIDE 35

Mandelbrot: kernel

    #include <complex.h>  // complex double, I, cabs
    #include <string.h>   // memcpy

    void mandelbrot_kernel(int taskNo, staticData *sd) {
        // Computes one row of the image
        int i = taskNo;
        for (int j = 0; j < sd->xRes; ++j) {
            // Coordinates in the complex plane are real-valued, not integers
            double x = getX(j, sd);
            double y = getY(i, sd);
            complex double Z = 0 + 0*I;
            complex double C = x + y*I;

            int k = 0;
            do { // Check the convergence of the sequence
                Z = Z*Z + C;
                k++;
            } while (cabs(Z) < 2 && k < max_iter);

            if (k == max_iter) { // In case it did not diverge...
                memcpy(img[i][j], black, 3); // ...we set a black pixel
            }
            else { // If it diverged...
                // ...we set the color according to k (#iterations)
                memcpy(img[i][j], getColor(k), 3);
            }
        }
    }

SLIDE 36

Mandelbrot: sequential code

    void mandelbrot_seq(...) {

        staticData *sd = assembleStaticData(...);

        // Iterate on rows, sequentially
        for (int i = 0; i < yRes; ++i)
        {
            mandelbrot_kernel(i, sd);
        }
    }

SLIDE 37

Mandelbrot: futurized code

    void mandelbrot_hpx(...) {

        staticData *sd = assembleStaticData(...);

        std::vector<hpx::future<void> > futures;
        for (int i = 0; i < yRes; ++i)
        {
            // One asynchronous task per row
            hpx::future<void> f = hpx::async(&mandelbrot_kernel, i, sd);
            // Futures are move-only, so move f into the vector
            futures.push_back(std::move(f));
        }

        // Wait until every row has been computed
        hpx::wait_all(futures);
    }
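The same futurization pattern can be tried without HPX using only the C++ standard library. The following is a sketch under stated assumptions, not the code from the talk: `iterationsAt`, `computeRow` and `mandelbrotAsync` are hypothetical names, the mapping of pixels onto the region [-2, 1] × [-1, 1] is an assumption, and rows return their iteration counts instead of writing colors into a shared image. Note also that `std::async` with `std::launch::async` typically starts one OS thread per task, whereas HPX schedules lightweight tasks onto a fixed set of worker threads.

```cpp
#include <complex>
#include <future>
#include <vector>

// Escape-time count for one point c: iterations of z -> z*z + c while
// |z| < 2, capped at maxIter.
int iterationsAt(std::complex<double> c, int maxIter) {
    std::complex<double> z(0.0, 0.0);
    int k = 0;
    while (k < maxIter && std::abs(z) < 2.0) {
        z = z * z + c;
        ++k;
    }
    return k;
}

// Hypothetical stand-in for the kernel: returns the counts of row i.
// Pixel centers are mapped onto [-2, 1] x [-1, 1] (an assumption).
std::vector<int> computeRow(int i, int xRes, int yRes, int maxIter) {
    std::vector<int> row;
    const double y = -1.0 + 2.0 * (i + 0.5) / yRes;
    for (int j = 0; j < xRes; ++j) {
        const double x = -2.0 + 3.0 * (j + 0.5) / xRes;
        row.push_back(iterationsAt(std::complex<double>(x, y), maxIter));
    }
    return row;
}

// One asynchronous task per row, then collect the results in row order.
std::vector<std::vector<int>> mandelbrotAsync(int xRes, int yRes, int maxIter) {
    std::vector<std::vector<int>> image;
    if (xRes <= 0 || yRes <= 0) return image;

    std::vector<std::future<std::vector<int>>> futures;
    for (int i = 0; i < yRes; ++i) {
        // std::async returns a temporary future, which push_back moves
        // into the vector; futures cannot be copied.
        futures.push_back(
            std::async(std::launch::async, computeRow, i, xRes, yRes, maxIter));
    }
    for (auto& f : futures) {
        image.push_back(f.get());  // blocks until that row is ready
    }
    return image;
}
```

Because every row is independent, the asynchronous version produces exactly the rows that sequential calls to `computeRow` would, regardless of the order in which the tasks finish.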

SLIDE 38

Mandelbrot step by step

SLIDE 39

Summary

◮ Future exascale computing requires smart code.
◮ Resource awareness can be a way to achieve better performance.
◮ HPX has the potential to become a major runtime system for HPC, thanks to both its performance and programmability.

Thanks!
