Semi-Lagrangian Simulations for Solving 2d2v Vlasov-Poisson Systems

SLIDE 1

Semi-Lagrangian Simulations for Solving 2d2v Vlasov-Poisson Systems (one and two species)

Yann Barsamian¹,³, Joackim Bernier⁴, Sever Hirstoaga¹,², Michel Mehrenberger¹,², Éric Violard¹,³

  • 1.
  • 2. CNRS IRMA (MoCo), INRIA (TONUS)
  • 3. CNRS ICube (ICPS), INRIA (CAMUS)
  • 4. Université de Rennes

June 2017

  • Y. Barsamian

(Strasbourg, France) Parallelization of 2d2v semi-Lagrangian Simulations 27/06/2017 1 / 28

SLIDE 2

Outline

  • Motivation
  • Comparison of two standard parallel paradigms
  • Optimization of the domain decomposition paradigm

[Images: Fluorescent Light, Sun]

SLIDE 3

Kinetic Modeling (one species, dimensionless quantities)

\[
\begin{cases}
\dfrac{\partial f}{\partial t} + \vec{v} \cdot \nabla_{\vec{x}} f - \vec{E} \cdot \nabla_{\vec{v}} f = 0 & \text{(Vlasov)} \\[6pt]
\nabla_{\vec{x}} \cdot \vec{E} = \rho & \text{(Poisson)}
\end{cases}
\]

  • $f(\vec{x}, \vec{v}, t)$: distribution function of the electrons
  • $\vec{E}(\vec{x}, t)$: the self-induced electric field
  • $t$: time
  • $\vec{x}$: position (2d with periodic boundaries: flat torus)
  • $\vec{v}$: velocity (2d)
  • $\rho(\vec{x}, t) = 1 - \int f(\vec{x}, \vec{v}, t)\, d\vec{v}$: volume charge density
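The charge density defined above can be approximated on a grid by summing f over the velocity indices. The following is a minimal sketch, assuming f is stored as a NumPy array indexed `f[ix, iy, ivx, ivy]` on a uniform velocity grid and integrated with a rectangle rule; the function name `charge_density` and the grid sizes are illustrative, not taken from the slides.

```python
import numpy as np

def charge_density(f, dvx, dvy):
    """Return rho(x, y) = 1 - sum of f over (vx, vy) times the velocity cell area.

    Assumption: f is stored as f[ix, iy, ivx, ivy] on a uniform velocity grid.
    """
    return 1.0 - f.sum(axis=(2, 3)) * dvx * dvy

# Usage: a normalized Maxwellian in velocity is neutralized by the background
# term "1", so rho is numerically zero everywhere.
ncx, ncy, ncv = 4, 4, 64
v = np.linspace(-8.0, 8.0, ncv, endpoint=False)
dv = v[1] - v[0]
maxwellian = np.exp(-(v[:, None] ** 2 + v[None, :] ** 2) / 2) / (2 * np.pi)
f = np.broadcast_to(maxwellian, (ncx, ncy, ncv, ncv))
rho = charge_density(f, dv, dv)
```

The constant 1 plays the role of a uniform neutralizing background, which is why a normalized distribution yields a zero density.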

SLIDE 6

Semi-Lagrangian Methods: Splitting

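Splitting of the system into two simpler systems¹. The full system

\[
\begin{cases}
\dfrac{\partial f}{\partial t} + \vec{v} \cdot \nabla_{\vec{x}} f - \vec{E} \cdot \nabla_{\vec{v}} f = 0 & \text{(Vlasov)} \\[6pt]
\nabla_{\vec{x}} \cdot \vec{E} = \rho & \text{(Poisson)}
\end{cases}
\]

is split into

\[
\begin{cases}
\dfrac{\partial f}{\partial t} + \vec{v} \cdot \nabla_{\vec{x}} f = 0 \\[6pt]
\nabla_{\vec{x}} \cdot \vec{E} = \rho
\end{cases}
\qquad \text{and} \qquad
\begin{cases}
\dfrac{\partial f}{\partial t} - \vec{E} \cdot \nabla_{\vec{v}} f = 0 \\[6pt]
\dfrac{\partial \vec{E}}{\partial t} = \vec{0}
\end{cases}
\]

Advection: $\dfrac{\partial g}{\partial t} + a \cdot \nabla_x g = 0$ is a translation: $g(x, T) = g(x - aT, 0)$.

[Figure: the profile $g(x, 0)$ and the profile $g(x, T)$, shifted by $aT$ along $x$.]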
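A short check of the translation formula, assuming a constant advection velocity $a$ and a differentiable initial profile, written $g_0 = g(\cdot, 0)$ here (the name $g_0$ is introduced only for this check):

```latex
\[
g(x, t) = g_0(x - a t)
\;\Longrightarrow\;
\frac{\partial g}{\partial t}(x, t) = -a \cdot \nabla g_0(x - a t),
\qquad
\nabla_x g(x, t) = \nabla g_0(x - a t),
\]
\[
\text{hence} \quad \frac{\partial g}{\partial t} + a \cdot \nabla_x g = 0 .
\]
```

In the first split system the advection velocity $\vec{v}$ does not depend on $\vec{x}$, and in the second one $-\vec{E}$ does not depend on $\vec{v}$, so each sub-step is such a translation along its own variable.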

¹ Cheng & Knorr, 1976

SLIDE 9

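Semi-Lagrangian Methods: Advection on $\vec{x}$

[Figure: two (x, y) grids, "Values after k time steps." and "Values after k + 1 time steps."]

Advection: the value $f^*(x, y, \vec{v}, (k+1)\Delta t)$ at a grid point is taken from $f(x - \Delta x, y - \Delta y, \vec{v}, k\Delta t)$, the value one displacement $(\Delta x, \Delta y)$ upstream at the previous time step.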

SLIDE 10

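Semi-Lagrangian Methods: Advection on $\vec{x}$

[Figure: two (x, y) grids, "Values after k time steps." and "Values after k + 1 time steps."]

Interpolation: the point $(x - \Delta x, y - \Delta y)$ is generally not a grid point, so $f(x - \Delta x, y - \Delta y, \vec{v}, k\Delta t)$ is interpolated from the grid values after k time steps.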
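The advection and interpolation steps can be illustrated in one dimension. This is a minimal sketch, assuming a periodic uniform grid, a constant displacement expressed in grid cells, and a centered Lagrange stencil of odd degree; the names `lagrange_weights` and `advect_periodic` are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def lagrange_weights(alpha, degree=3):
    """Lagrange weights on the integer nodes lo, ..., lo + degree, evaluated at alpha."""
    lo = -(degree // 2)
    nodes = np.arange(lo, lo + degree + 1)
    weights = np.ones(degree + 1)
    for j, xj in enumerate(nodes):
        for xm in nodes:
            if xm != xj:
                weights[j] *= (alpha - xm) / (xj - xm)
    return nodes, weights

def advect_periodic(g, shift_cells, degree=3):
    """Return new[i] = g(x_i - shift), with the shift given in grid cells."""
    k = int(np.floor(-shift_cells))   # integer part of the backward displacement
    alpha = -shift_cells - k          # fractional part, in [0, 1)
    nodes, weights = lagrange_weights(alpha, degree)
    new = np.zeros_like(g)
    for m, w in zip(nodes, weights):
        # np.roll(g, -(k + m))[i] is g[(i + k + m) mod n] (periodic boundaries).
        new += w * np.roll(g, -(k + m))
    return new

# Usage: translate sin(x) by 0.3 on [0, 2*pi).
n = 64
x = 2 * np.pi * np.arange(n) / n
dx = 2 * np.pi / n
g = np.sin(x)
advected = advect_periodic(g, 0.3 / dx)
```

When the displacement is a whole number of cells, the weights reduce to a single 1 and the routine is a pure index shift; otherwise the interpolation error depends on the chosen degree.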

SLIDE 12

Semi-Lagrangian Methods: Pseudo-Code

Parameters: ∆t, the time step.

Algorithm:
  Initialize f
  Foreach time iteration, do
    Output diagnostics
    Advection of f on $\vec{x}$ for 0.5∆t
    Compute ρ from f
    Compute E from ρ
    Advection of f on $\vec{v}$ for ∆t
    Advection of f on $\vec{x}$ for 0.5∆t
    Compute ρ from f
    Compute E from ρ
  End Foreach

  • Advections (Lagrange interpolations): 90-99% of execution time
  • Poisson solver (FFT): 1-10% of execution time
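The "Compute E from ρ" step can be sketched with NumPy FFTs on a periodic 2d grid. The sketch assumes that the field derives from a potential, E = -grad(phi), so that each Fourier coefficient of E is -i k rho_hat / |k|^2, and that the mean of ρ is discarded; the function name `solve_poisson_fft` is hypothetical and the slides do not detail the authors' solver.

```python
import numpy as np

def solve_poisson_fft(rho, lx, ly):
    """Return (ex, ey) with div E = rho on the periodic box [0, lx) x [0, ly).

    Assumptions: rho is indexed rho[ix, iy], E = -grad(phi), and the mean of
    rho (the (0, 0) Fourier mode) is dropped.
    """
    ncx, ncy = rho.shape
    kx = 2 * np.pi * np.fft.fftfreq(ncx, d=lx / ncx)
    ky = 2 * np.pi * np.fft.fftfreq(ncy, d=ly / ncy)
    kx2d, ky2d = np.meshgrid(kx, ky, indexing="ij")
    k2 = kx2d**2 + ky2d**2
    k2[0, 0] = 1.0            # avoid a division by zero; this mode is zeroed below
    rho_hat = np.fft.fft2(rho)
    rho_hat[0, 0] = 0.0
    ex = np.real(np.fft.ifft2(-1j * kx2d * rho_hat / k2))
    ey = np.real(np.fft.ifft2(-1j * ky2d * rho_hat / k2))
    return ex, ey

# Usage: rho = 0.1 cos(y / 2) on [0, 4*pi)^2 gives E = (0, 0.2 sin(y / 2)).
n, length = 32, 4 * np.pi
grid = length * np.arange(n) / n
_, yy = np.meshgrid(grid, grid, indexing="ij")
ex, ey = solve_poisson_fft(0.1 * np.cos(yy / 2), length, length)
```

The cost is two-dimensional, whereas the advections act on the full four-dimensional array, which is consistent with the small share of execution time quoted above.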

SLIDE 13

Supercomputers

Marconi: 1,512 nodes (each composed of 2 sockets of 18 Intel Broadwell cores and 4 memory channels) = 54,432 cores².

² Compare to 10,649,600 (Sunway TaihuLight), 361,760 (Piz Daint), 560,640 (Titan), 556,104 (Oakforest-PACS): https://www.top500.org

SLIDE 15

Parallelization Scheme 1: Remap

[Figure: two panels in the (x, v) plane, "f split in v." and "f split in x."]

Example with a $64^2 \times 64^2$ grid and 4 processors for f split in v:

  • processor 0 will have f[00...63][00...63][00...31][00...31]
  • processor 1 will have f[00...63][00...63][00...31][32...63]
  • processor 2 will have f[00...63][00...63][32...63][00...31]
  • processor 3 will have f[00...63][00...63][32...63][32...63]

SLIDE 16

Parallelization Scheme 1: Remap

[Figure: two panels in the (x, v) plane, "f split in v." and "f split in x."]

Example with a $64^2 \times 64^2$ grid and 4 processors for f split in x:

  • processor 0 will have f[00...31][00...31][00...63][00...63]
  • processor 1 will have f[00...31][32...63][00...63][00...63]
  • processor 2 will have f[32...63][00...31][00...63][00...63]
  • processor 3 will have f[32...63][32...63][00...63][00...63]

SLIDE 18

Parallelization Scheme 2: Domain Decomposition

[Figure: domain decomposition in the (x, v) plane.]

Example with a $64^2 \times 64^2$ grid and 4 processors:

  • processor 0 will have f[00...31][00...63][00...31][00...63]
  • processor 1 will have f[00...31][00...63][32...63][00...63]
  • processor 2 will have f[32...63][00...63][00...31][00...63]
  • processor 3 will have f[32...63][00...63][32...63][00...63]
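The three data layouts listed for the 4-processor examples (split in v, split in x, domain decomposition) follow from one rule: cut each dimension into equal blocks and number the processors in row-major order over the processor grid. The sketch below reproduces the listings under that assumption; `block_ranges` and its arguments are illustrative names, and the index order (x, y, vx, vy) is assumed.

```python
from itertools import product

def block_ranges(shape, procs_per_dim):
    """Map each processor rank to its inclusive (first, last) index range per dimension.

    Ranks follow the row-major order of the processor grid.
    Assumption: every entry of procs_per_dim divides the matching entry of shape.
    """
    blocks = []
    for coords in product(*(range(p) for p in procs_per_dim)):
        ranges = []
        for n, p, c in zip(shape, procs_per_dim, coords):
            size = n // p
            ranges.append((c * size, (c + 1) * size - 1))
        blocks.append(tuple(ranges))
    return blocks

shape = (64, 64, 64, 64)                                   # (x, y, vx, vy)
split_in_v = block_ranges(shape, (1, 1, 2, 2))             # remap, f split in v
split_in_x = block_ranges(shape, (2, 2, 1, 1))             # remap, f split in x
domain_decomposition = block_ranges(shape, (2, 1, 2, 1))   # split in x and vx
```

With the remap scheme a processor always holds complete lines along the directions being advected, at the price of a global data exchange between the two layouts; with domain decomposition the layout is fixed and only neighboring values are exchanged.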

SLIDE 19

Advection with Remap: Pseudo-Code

Local variable: buffer[max(ncx, ncy)].

Algorithm:
  Remap
  Foreach vx, do
    Foreach vy, do
      Compute the displacement on the x-axis
      Foreach y, do
        buffer ← f[:][y][vx][vy]
        Foreach x, do
          Interpolate on the x-axis from buffer
      Compute the displacement on the y-axis
      Foreach x, do
        buffer ← f[x][:][vx][vy]
        Foreach y, do
          Interpolate on the y-axis from buffer

SLIDE 20

Advection with Domain Decomposition: Pseudo-Code 1

Local variable: buffer[max(ncx, ncy) + Lagrange degree].

Algorithm:
  Foreach vx, do
    Foreach vy, do
      Compute the displacement on the x-axis
      Foreach y, do
        Communicate the needed points
        Foreach x, do
          Interpolate on the x-axis from buffer
      Compute the displacement on the y-axis
      Foreach x, do
        Communicate the needed points
        Foreach y, do
          Interpolate on the y-axis from buffer
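The buffer size in this pseudo-code (local line length plus the Lagrange degree) can be illustrated without MPI. In the sketch below, reading a global periodic array outside the local range stands in for "Communicate the needed points"; the function names, the centered stencil and the degree 3 are assumptions, not details given in the slides.

```python
import numpy as np

def lagrange_weights(alpha, degree):
    """Lagrange weights on the integer nodes lo, ..., lo + degree at the point alpha."""
    nodes = np.arange(-(degree // 2), -(degree // 2) + degree + 1)
    weights = np.array([
        np.prod([(alpha - xm) / (xj - xm) for xm in nodes if xm != xj])
        for xj in nodes
    ])
    return nodes, weights

def advect_subdomain(g_global, start, n_loc, shift_cells, degree=3):
    """Advect the cells start..start+n_loc-1 of a periodic 1d array.

    Returns the new local values and the buffer size that was needed.
    No real MPI call is made: indexing g_global emulates the communication.
    """
    k = int(np.floor(-shift_cells))
    alpha = -shift_cells - k
    nodes, weights = lagrange_weights(alpha, degree)
    first = start + k + int(nodes[0])
    # Buffer of n_loc + degree values, possibly owned by other processors.
    buffer = g_global[np.arange(first, first + n_loc + degree) % g_global.size]
    new = np.zeros(n_loc)
    for j, w in enumerate(weights):
        new += w * buffer[j:j + n_loc]
    return new, buffer.size

# Usage: 4 subdomains of 16 cells reproduce the single-domain result.
n, n_loc, shift_cells = 64, 16, 2.4
x = 2 * np.pi * np.arange(n) / n
g = np.sin(x)
pieces = [advect_subdomain(g, p * n_loc, n_loc, shift_cells)[0] for p in range(4)]
decomposed = np.concatenate(pieces)
reference, _ = advect_subdomain(g, 0, n, shift_cells)
```

Because the displacement is the same for every point of a line, the needed values form one contiguous window of the global line, shifted by the integer part of the displacement.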

SLIDE 21

Advection with Domain Decomposition: Pseudo-Code 2

Local variable: buffer[max(ncx * (ncy + Lagrange degree), ncy * (ncx + Lagrange degree))].

Algorithm:
  Foreach vx, do
    Foreach vy, do
      Compute the displacement on the x-axis
      Communicate the needed points
      Foreach y, do
        Foreach x, do
          Interpolate on the x-axis from buffer
      Compute the displacement on the y-axis
      Communicate the needed points
      Foreach x, do
        Foreach y, do
          Interpolate on the y-axis from buffer

SLIDE 22

Advection with Domain Decomposition: Drawing

[Figure: drawing of the domain decomposition in the (x, vx) plane.]

SLIDE 25

Strong Scaling on 2,048 Cores: $128^2 \times 512^2$ Grid

[Figure: execution time (s), logarithmic axis from 1 to 1000, on 64, 128, 256, 512, 1024 and 2048 cores. Curves: Remap - no transpose [ix][iy][ivx][ivy]; Remap - no transpose [ivx][ivy][ix][iy]; Remap - 4d transpose; Domain Decomposition.]

  • Two-stream instability testcase, 13 iterations with ∆t = 0.1.
  • Architecture: Intel Broadwell EP (2016).
  • Domain decomposition: almost twice as fast.

SLIDE 26

Strong Scaling on 2,048 Cores: $32^2 \times 512^2$ Grid

[Figure: execution time (s), logarithmic axis from 1 to 1000, on 1, 2, 4, ..., 2048 cores. Curves: Remap - 4d transpose; Domain Decomposition.]

  • Two-stream instability testcase, 10 iterations with ∆t = 0.1.
  • Architecture: Intel Broadwell EP (2016).
  • Domain decomposition: can use more cores... but carefully!

SLIDE 27

Strong Scaling on 2,048 Cores: $32^2 \times 512^2$ Grid

[Figure: bar chart of execution time (s), axis from 2 to 18, comparing Remap and D.D. for the groups labeled 2048 mpi, 1024 mpi, 512 mpi, 256 mpi, 128 mpi and 64 mpi. Bar components: Advections (computation); Advections (communication); Poisson.]

  • Two-stream instability testcase, 10 iterations with ∆t = 0.1.
  • Architecture: Intel Broadwell EP (2016).
  • Domain decomposition: can use more cores... but carefully!

SLIDE 28

Strong Scaling on 2,048 Cores: $128^2 \times 512^2$ Grid

[Figure: bar chart of execution time (s), axis from 500 to 2000, comparing Remap, D.D. - 1d and D.D. - 2d for the groups labeled 2048 mpi, 1024 mpi, 512 mpi, 256 mpi, 128 mpi and 64 mpi. Bar components: Advections (computation); Advections (communication); Poisson.]

  • Two-stream instability testcase, 13 iterations with ∆t = 0.1.
  • Architecture: Intel Broadwell EP (2016).
  • MPI communication: message size of 256 kB is optimal.

SLIDE 29

Advection with Domain Decomposition: Pseudo-Code 3

Local variable: buffer[max(ncvx * ncx * (ncy + Lagrange degree), ncvy * ncy * (ncx + Lagrange degree))].

Algorithm:
  Foreach vx, do
    Compute the displacement on the x-axis
    Communicate the needed points
    Foreach vy, do
      Foreach y, do
        Foreach x, do
          Interpolate on the x-axis from buffer
  Foreach vy, do
    Compute the displacement on the y-axis
    Communicate the needed points
    Foreach vx, do
      Foreach x, do
        Foreach y, do
          Interpolate on the y-axis from buffer
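The three domain-decomposition pseudo-codes trade buffer size against the number of communication steps: moving "Communicate the needed points" outward in the loop nest gives fewer, larger messages. The sketch below evaluates the buffer formulas stated in the pseudo-codes and counts how often the communication line runs for the x-axis advection. It assumes that ncx, ncy, ncvx, ncvy are per-processor cell counts; the example values (8 cells per direction, degree 3) are hypothetical.

```python
def buffer_sizes(ncx, ncy, ncvx, ncvy, degree):
    """Number of values in the local buffer for pseudo-codes 1, 2 and 3."""
    return {
        "pseudo-code 1": max(ncx, ncy) + degree,
        "pseudo-code 2": max(ncx * (ncy + degree), ncy * (ncx + degree)),
        "pseudo-code 3": max(ncvx * ncx * (ncy + degree),
                             ncvy * ncy * (ncx + degree)),
    }

def communications_per_x_advection(ncy, ncvx, ncvy):
    """How many times "Communicate the needed points" runs for the x-axis advection."""
    return {
        "pseudo-code 1": ncvx * ncvy * ncy,  # inside the vx, vy and y loops
        "pseudo-code 2": ncvx * ncvy,        # inside the vx and vy loops
        "pseudo-code 3": ncvx,               # inside the vx loop only
    }

# Usage with hypothetical local sizes: 8 cells per direction, Lagrange degree 3.
sizes = buffer_sizes(8, 8, 8, 8, degree=3)
counts = communications_per_x_advection(8, 8, 8)
```

For these values the buffer grows from 11 to 88 to 704 entries while the number of communication steps drops from 512 to 64 to 8.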

SLIDE 30

Domain Decomposition: $64^4$ Grid on $8^4 = 4{,}096$ Cores

Location of the f values needed by the processors that hold the x values in the fourth column (sub-timestep of 0.5, velocities in [−5; 5]).

[Figure: (x, vx) plane.]

SLIDE 31

Domain Decomposition: $64^4$ Grid on 4,096 Cores

Location of the f values needed by the processors that hold the x values in the fourth column (sub-timestep of 0.5, velocities in [−5; 5]).

[Figure: (x, vx) plane, with bands labeled "22 velocities", "21 velocities" and "21 velocities".]

SLIDE 32

Two-stream instability

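$(x, y) \in [0; 4\pi)^2$, $(v_x, v_y) \in [-5; 5]^2$, $128^2 \times 1024^2$ grid, $\Delta t = 0.05$, initial condition³:

\[
f(\vec{x}, \vec{v}, 0) = \left(1 + 0.1 \left(\cos\frac{y}{2} + \cos\frac{x + y}{2}\right)\right) \frac{v_x^2}{2\pi}\, e^{-\frac{v_x^2 + v_y^2}{2}}
\]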
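The initial condition of the two-stream testcase can be evaluated on a grid as follows. This is a minimal sketch on a much coarser grid than the $128^2 \times 1024^2$ one quoted above; the function name `two_stream_f0`, the cell-based grid points and the rectangle-rule integration are assumptions made for illustration.

```python
import numpy as np

def two_stream_f0(x, y, vx, vy):
    """Initial distribution function of the two-stream testcase."""
    perturbation = 1 + 0.1 * (np.cos(y / 2) + np.cos((x + y) / 2))
    return perturbation * vx**2 / (2 * np.pi) * np.exp(-(vx**2 + vy**2) / 2)

# Usage on a coarse 8^2 x 64^2 grid, with broadcasting over (x, y, vx, vy).
ncx, ncv = 8, 64
x = 4 * np.pi * np.arange(ncx) / ncx      # [0, 4*pi), periodic
v = -5 + 10 * np.arange(ncv) / ncv        # [-5, 5)
dv = 10 / ncv
f0 = two_stream_f0(x[:, None, None, None], x[None, :, None, None],
                   v[None, None, :, None], v[None, None, None, :])
mass = f0.sum(axis=(2, 3)) * dv * dv      # integral of f0 over velocity
expected = 1 + 0.1 * (np.cos(x[None, :] / 2)
                      + np.cos((x[:, None] + x[None, :]) / 2))
```

The velocity factor integrates to 1 over the plane, so the velocity integral of the initial condition is the spatial perturbation alone, up to the small truncation at |v| = 5. The two cosines correspond to the Fourier modes (0, 1) and (1, 1) of the $[0; 4\pi)^2$ box.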

Iteration 0

³ Barsamian, Bernier, Hirstoaga & Mehrenberger, 2017

SLIDE 33

Two-stream instability


Iteration 100

SLIDE 34

Two-stream instability


Iteration 1,000

SLIDE 35

Two-stream instability


[Figure: L2 norm of the electric fields versus dimensionless time (0 to 40), logarithmic axis from 1e-10 to 0.1. Curves: Norm L2(Electric fields); Fourier mode (0, 1); Fourier mode (1, 1); Fourier mode (1, 0).]
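Diagnostics of this kind can be computed from the gridded field with a discrete L2 norm and a 2d FFT. The sketch below is one possible definition, not necessarily the one used for the plot: the normalization of the Fourier amplitude, the index order `field[ix, iy]` and the function names are assumptions.

```python
import numpy as np

def l2_norm(ex, ey, dx, dy):
    """Discrete L2 norm of the vector field (ex, ey) on a uniform periodic grid."""
    return np.sqrt(np.sum(ex**2 + ey**2) * dx * dy)

def fourier_mode_amplitude(field, mx, my):
    """Modulus of the (mx, my) discrete Fourier coefficient, divided by the grid size."""
    return np.abs(np.fft.fft2(field)[mx, my]) / field.size

# Usage: E = (0, 0.2 sin(y / 2)) on [0, 4*pi)^2 only contains the mode (0, 1).
n, length = 32, 4 * np.pi
grid = length * np.arange(n) / n
_, yy = np.meshgrid(grid, grid, indexing="ij")
ex = np.zeros((n, n))
ey = 0.2 * np.sin(yy / 2)
norm = l2_norm(ex, ey, length / n, length / n)
amp_01 = fourier_mode_amplitude(ey, 0, 1)
amp_10 = fourier_mode_amplitude(ey, 1, 0)
```

With this normalization a pure sine of amplitude 0.2 has a mode amplitude of 0.1, because the sine splits equally between the (0, 1) coefficient and its conjugate.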

SLIDE 36

Two-Species Testcase

$(x, y) \in [0; 10\pi)^2$, $(v_x, v_y) \in [-6; 6]^2$, $32 \times 256 \times 64 \times 2048$ grid, $\Delta t = 0.02$ (order six time splitting),

\[
\begin{cases}
f_e(\vec{x}, \vec{v}, 0) = \dfrac{1}{2\pi}\, e^{-\frac{v_x^2 + v_y^2}{2}}, \\[8pt]
f_i(\vec{x}, \vec{v}, 0) = \dfrac{1}{4\pi\sigma_1\sigma_2} \left(1 - A_1 \sin(k_1 x) - A_2 \sin(k_2 y)\right) \left(e^{-\frac{(v_x - v_d)^2}{2\sigma_1^2}} + e^{-\frac{(v_x + v_d)^2}{2\sigma_1^2}}\right) e^{-\frac{v_y^2}{2\sigma_2^2}}
\end{cases}
\]

with mass ratio 0.01, modes $k_1 = k_2 = 0.2$, perturbation amplitudes $A_1 = 0.005$, $A_2 = 0.25$, velocity drift $v_d = 2.4$ and thermal velocities $\sigma_1 = 0.5$, $\sigma_2 = 1$.

SLIDE 37

Two-Species Testcase: Electrons

[Figure: y-vy cuts of the electron distribution at times 2, 4, 10 and 50.]

SLIDE 38

Two-Species Testcase: Ions

[Figure: y-vy cuts of the ion distribution at times 10, 20, 50 and 100.]

SLIDE 39

Two-Species Testcase: Energy Conservation

Error on the energy for two different time splitting schemes

SLIDE 40

Two-Species Testcase: Energy Conservation

Electric energy for two different time splitting schemes

SLIDE 41

Conclusions - Outlook

  • comparison of two parallel paradigms for a 2d2v SL code
      • remap
      • domain decomposition
  • outlook
      • further optimizations of the domain decomposition paradigm
          • comp. / comm. overlapping (low number of processors)
          • 3d or even 4d communication (medium number of processors)
          • dynamic domain decomposition (high number of processors)
      • use OpenMP (use fewer processors, hence less communication)
      • port the code to 3d3v

SLIDE 42

ybarsamian@unistra.fr
