

SLIDE 1

PIC codes in the HPC environment

PIC codes in the HPC environment - A. Beck SMILEI training workshop 1 / 35

SLIDE 2

Structure

1. HPC environment, trends and prospects
2. The PIC method and its parallelization
3. The load balancing issue

SLIDE 3

Structure

1. HPC environment, trends and prospects
2. The PIC method and its parallelization
3. The load balancing issue

SLIDE 4

What is a supercomputer?

(Diagram: several compute nodes connected by a network.)

Distributed computing

SLIDE 5

What is a supercomputer?

(Diagram: each compute node has its own memory and compute unit; nodes are connected by a network.)

Distributed memory system

SLIDE 6

What is a supercomputer?

(Diagram: within each node, four cores share the node's memory; nodes are connected by a network.)

Distributed {shared memory} system

SLIDE 7

Objective: exaflop/s

Tianhe (China, June 2013): 31 PFLOPS for 17 MW, i.e. about 1.85 GFLOPS/W. Extrapolation: 1000 PFLOPS ==> 540 MW!

The objective is P < 20 MW. The challenge for vendors is to increase both the total performance and the energy efficiency of compute nodes.
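The extrapolation is simple arithmetic; a quick sketch using the slide's figures (the slide rounds the efficiency up to 1.85 GFLOPS/W, hence its slightly lower 540 MW):

```python
# Energy efficiency of Tianhe, from the slide's figures.
perf_pflops = 31.0    # sustained performance [PFLOPS]
power_mw = 17.0       # power draw [MW]

# 31 PFLOPS = 31e6 GFLOPS, 17 MW = 17e6 W.
efficiency = perf_pflops * 1e6 / (power_mw * 1e6)   # GFLOPS per watt
print(round(efficiency, 2))   # -> 1.82

# Power needed for 1000 PFLOPS (1 EFLOPS) at that efficiency:
power_needed_mw = 1000.0 * 1e6 / efficiency / 1e6
print(round(power_needed_mw))  # -> 548, far above the 20 MW target
```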

SLIDE 8

Vendors' strategy 1: Many-core

Increased performance at a reasonable energy budget.

SLIDE 9

Vendors' strategy 2: GPGPU

NVIDIA & AMD: General-Purpose Graphics Processing Unit

Most energy-efficient architecture today, but difficult to address:

Libraries: CUDA, OpenCL. Directive-based programming: OpenMP 4 or OpenACC.

SLIDE 10

Vendors' strategy 3: Xeon Phi

Intel

Powers several top HPC systems: + Irene (France), - Aurora (U.S.). Supposedly accessible through "normal" programming, but relies critically on the SIMD instruction set.

SLIDE 11

Vendors' strategy 4: China

SunWay architecture

Most powerful system in the world: 93 PFLOPS at 15 MW. The SunWay architecture mimics the Xeon Phi.

SLIDE 12

Vendors' strategy 5: Vectorization

Excellent potential speed-up and a very good power budget, but heavy constraints on data structures and algorithms; difficult to exploit to its full extent in a PIC code.
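The data-structure constraint can be illustrated in NumPy (a hypothetical sketch, not SMILEI code): SIMD units want one identical operation applied to long contiguous arrays, which is exactly what the vectorized form below expresses.

```python
import numpy as np

n = 100_000
rng = np.random.default_rng(0)
dt = 0.1

# Structure-of-arrays layout: particle positions and velocities stored
# in contiguous arrays rather than one object per particle.
x = rng.random(n)
vx = rng.standard_normal(n)

# Scalar loop, one particle at a time: in C/Fortran this is the form a
# compiler struggles to vectorize when data are scattered in memory.
x_loop = x.copy()
for i in range(n):
    x_loop[i] += vx[i] * dt

# Vectorized form: a single operation over contiguous memory, the
# access pattern SIMD hardware needs.
x_vec = x + vx * dt

assert np.allclose(x_loop, x_vec)
```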

SLIDE 13

Official announcements for Exascale

U.S.: exascale for 2021; no specifications.
Japan: the "Post-K supercomputer"; EFLOPS for 2020; ARM architecture.
China: 3 exascale systems for 2020.
Europe: 2 exascale systems for 2022, at least 1 powered by European technology (probably ARM).

SLIDE 14

Why am I concerned? What should I do?

As a developer

1. Expose parallelism: massive parallelization is key.
2. Focus on the algorithms and data structures, not on the architectures.
3. Reduce data movement: computation is becoming cheaper, loads and stores not so much.
4. Be aware of the increasing gap between peak and effective performance: the race to exascale is becoming a race to exaflops.

As a scientist

1. Collaborate with experts: the complexity of HPC systems is increasing a lot!

SLIDE 15

Structure

1. HPC environment, trends and prospects
2. The PIC method and its parallelization
3. The load balancing issue

SLIDE 16

(Figure-only slide.)

SLIDE 17

Explicit PIC code principle

(Diagram: each time step couples "Solve Maxwell" with "Solve Vlasov", the latter split into Interpolator -> Pusher -> Projector.)
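The cycle can be sketched with a toy 1D periodic example (a hypothetical illustration, not SMILEI's implementation); a real code also advances the fields with a Maxwell solver between the particle operators:

```python
import numpy as np

def pic_step(x, v, E_grid, dx, dt, q_over_m=-1.0):
    """One explicit PIC cycle for the particles: interpolate -> push -> project."""
    nx = E_grid.size
    # Interpolator: gather the grid field onto each particle (linear weights).
    cell = np.floor(x / dx).astype(int) % nx
    frac = x / dx - np.floor(x / dx)
    E_part = (1.0 - frac) * E_grid[cell] + frac * E_grid[(cell + 1) % nx]
    # Pusher: advance velocities, then positions (periodic box).
    v = v + q_over_m * E_part * dt
    x = (x + v * dt) % (nx * dx)
    # Projector: scatter each particle's charge back onto the grid.
    cell = np.floor(x / dx).astype(int) % nx
    frac = x / dx - np.floor(x / dx)
    rho = np.zeros(nx)
    np.add.at(rho, cell, 1.0 - frac)
    np.add.at(rho, (cell + 1) % nx, frac)
    return x, v, rho

# Usage: 1000 particles on a 64-cell periodic grid with a sinusoidal field.
rng = np.random.default_rng(1)
nx, dx, dt = 64, 1.0, 0.1
x = rng.random(1000) * nx * dx
v = np.zeros(1000)
E = np.sin(2.0 * np.pi * np.arange(nx) / nx)
x, v, rho = pic_step(x, v, E, dx, dt)
# The linear weights sum to 1 per particle, so the deposited charge is conserved.
```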

SLIDE 18

Domain decomposition

SLIDE 19

Domain decomposition: MPI

SLIDE 20

Domain decomposition: MPI

(Diagram: four compute nodes linked by a network; each node has its own memory shared by four cores.)

SLIDE 21

Domain decomposition: MPI + OpenMP in SMILEI

(Diagram: the MPI decomposition, with each MPI subdomain further split into patches.)

SLIDE 22

Domain synchronization

If processors share memory ==> OpenMP. If processors have distributed memory ==> MPI. The same logic applies to particle exchanges.

SLIDE 23

Message Passing Interface (MPI)

Characteristics: library; coarse grain; inter-node; distributed memory; used by almost all HPC codes.

Issues: latency; OS jitter; scalability of global communications.

SLIDE 24

OpenMP (Open Multi-Processing)

Characteristics: compiler directives; medium grain; intra-node; shared memory; used by many HPC codes.

Issues: thread creation overhead; memory/core affinity; interface with MPI (MPI_THREAD_MULTIPLE).

SLIDE 25

Structure

1. HPC environment, trends and prospects
2. The PIC method and its parallelization
3. The load balancing issue

SLIDE 26

Domain decomposition: MPI + OpenMP in SMILEI

SLIDE 27

Domain decomposition: MPI + OpenMP in SMILEI

SLIDE 28

Domain decomposition: MPI + OpenMP in SMILEI

SLIDE 29

OpenMP dynamic scheduler benefits

(Plot: time for 100 iterations [s] vs. iteration number, for MPI × OpenMP configurations 768×1, 384×2, 256×3, 128×6 and 64×12.)

OpenMP dynamic scheduler is able to smooth the load but only at the node level.
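The scheduler's effect can be mimicked with a toy model (hypothetical numbers, not SMILEI measurements): when one patch is much more expensive than the others, handing patches to the next free thread beats a fixed contiguous split.

```python
def makespan_static(costs, nthreads):
    """Fixed contiguous split of the patches, as with a static schedule."""
    chunk = (len(costs) + nthreads - 1) // nthreads
    return max(sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk))

def makespan_dynamic(costs, nthreads):
    """Each patch goes to the currently least-loaded thread, mimicking a
    dynamic schedule that hands work to the next free thread."""
    loads = [0.0] * nthreads
    for c in costs:
        loads[loads.index(min(loads))] += c
    return max(loads)

# One expensive patch among cheap ones, shared by 2 threads.
costs = [8.0] + [1.0] * 7
print(makespan_static(costs, 2), makespan_dynamic(costs, 2))  # -> 11.0 8.0
```

As the slide notes, this smoothing only operates among the threads of one node; imbalance between MPI processes has to be treated separately.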

SLIDE 30

Patch-based data structure

SLIDE 31

Hilbert ordering

We need a policy to assign patches to MPI processes. To do so, patches are organized along a one-dimensional space-filling curve:

1. A continuous curve which goes across all patches.
2. Each patch is visited only once.
3. Two consecutive patches are neighbours.
4. In addition, we want compactness!
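A curve satisfying these four properties is the Hilbert curve; its standard index-to-coordinates mapping (a generic textbook version, not necessarily SMILEI's own implementation) is:

```python
def hilbert_d2xy(order, d):
    """Map index d along the Hilbert curve covering a 2^order x 2^order
    grid of patches to patch coordinates (x, y)."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                         # rotate the quadrant so that
            if rx == 1:                     # consecutive sub-curves connect
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# On an 8x8 grid: every patch is visited exactly once, and two
# consecutive patches are always direct neighbours.
pts = [hilbert_d2xy(3, d) for d in range(64)]
assert len(set(pts)) == 64
assert all(abs(ax - bx) + abs(ay - by) == 1
           for (ax, ay), (bx, by) in zip(pts, pts[1:]))
```

Assigning contiguous segments of the curve to MPI processes then yields compact, connected groups of patches.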

SLIDE 32

Hilbert ordering

We need a policy to assign patches to MPI processes. To do so, patches are organized along a one-dimensional space-filling curve:

1. A continuous curve which goes across all patches.
2. Each patch is visited only once.
3. Two consecutive patches are neighbours.
4. In addition, we want compactness!

SLIDE 33

With dynamic load balancing activated

(Plot: time for 100 iterations [s] vs. iteration number, for MPI × OpenMP configurations 128×6 and 64×12, with and without dynamic load balancing (DLB).)

Yellow and red are copied from previous figure.

SLIDE 34

Dynamic evolution of MPI domains

Color represents the local patch computational load imbalance: I_loc = log10(L_loc / L_av).
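The metric itself is one line; for instance, with hypothetical patch loads (not values from the figure):

```python
import math

def imbalance(loads):
    """I_loc = log10(L_loc / L_av): 0 for a balanced patch, positive for
    an overloaded one, negative for an underloaded one."""
    l_av = sum(loads) / len(loads)
    return [math.log10(l / l_av) for l in loads]

loads = [100.0, 100.0, 400.0, 25.0]   # hypothetical patch loads
print([round(i, 2) for i in imbalance(loads)])  # -> [-0.19, -0.19, 0.41, -0.8]
```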

SLIDE 35

Dynamic evolution of MPI domains

Color represents the local patch computational load imbalance: I_loc = log10(L_loc / L_av).
