Advancing Fusion Science with CGYRO using GPU-Based Leadership Systems

Advancing Fusion Science with CGYRO using GPU-Based Leadership Systems, by J. Candy (1), I. Sfiligoi (2) and E. Belli (1). (1) General Atomics, San Diego, CA; (2) San Diego Supercomputer Center, San Diego, CA. Presented at GTC 2019, San Jose, CA, 18-21 March 2019.


SLIDE 1

Advancing Fusion Science with CGYRO using GPU-Based Leadership Systems

by J. Candy (1), I. Sfiligoi (2) and E. Belli (1)

(1) General Atomics, San Diego, CA
(2) San Diego Supercomputer Center, San Diego, CA

Presented at GTC 2019, San Jose, CA, 18-21 March 2019 (ID: S9202)

Candy/GTC/March 2019/S9202
SLIDE 2

Sincere thanks to

  • Chris Holland (UCSD)
  • Orso Meneghini, Sterling Smith, Ron Waltz, Gary Staebler (GA)
  • Nathan Howard, Alessandro Marinoni (MIT)
  • Walter Guttenfelder, Brian Grierson (PPPL)
  • George Fann (ORNL)
  • Klaus Hallatschek (IPP, Germany)

SLIDE 7

OUTLINE

1. Who is General Atomics?
2. The case for fusion energy
3. Mathematical formulation and GPU-based numerical solution
4. Simulation of turbulent energy loss in a tokamak plasma
5. GPU performance: development and results

SLIDE 8

Who is General Atomics?

SLIDE 11

Who is General Atomics?

1. General Atomics (GA) is a private contractor in San Diego
2. The GA Magnetic Fusion division does DOE-funded research
3. Hosts DIII-D National Fusion Facility

SLIDE 12

Founded on July 18, 1955 (photo 1957)

The General Atomic Division of General Dynamics

SLIDE 13

Laboratory formally dedicated on June 25th, 1959

John Jay Hopkins Laboratory for Pure and Applied Science

SLIDE 14

Present-day Campus (2019)

Retains feel of early architecture

SLIDE 15

Doublet III (1974)

SLIDE 16

DIII-D (Present day)

SLIDE 17

The case for fusion energy

SLIDE 18

Energy Use by Technology and Year

energy.mit.edu/news/limiting-global-warming-aggressive-measures-needed

SLIDE 19

Surface Temperature Anomaly

energy.mit.edu/news/limiting-global-warming-aggressive-measures-needed

SLIDE 20

Plasma theory in closed fieldline region well-understood

SLIDE 21

Helical field perfectly confines plasma (almost)

SLIDE 22

There is a small amount of radial energy/particle loss

  • Collisions (1970s): $\Gamma_{\rm collision}$
  • Turbulence (1980s): $\Gamma_{\rm turbulence}$
  • Both exhibit gyroBohm scaling

flux: $\Gamma \sim v\,(\rho/a)^2$

confinement time: $\tau = a/\Gamma \sim a^3/(v\rho^2)$

  • a = torus radius
  • ρ = particle orbit size
  • v = particle velocity
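
As a rough worked example of the scaling above, the estimate τ ∼ a³/(vρ²) means that doubling the torus radius at fixed v and ρ lengthens the confinement time by a factor of 8. The sketch below only illustrates that arithmetic; the function name and the sample values are assumptions made for this example, not numbers from the talk.

```python
def gyrobohm_confinement_time(a, rho, v):
    """Order-of-magnitude estimate tau ~ a / Gamma, with Gamma ~ v * (rho / a)**2."""
    flux = v * (rho / a) ** 2   # gyroBohm flux estimate
    return a / flux             # algebraically a**3 / (v * rho**2)


# Hypothetical values in arbitrary but consistent units.
tau_small = gyrobohm_confinement_time(a=1.0, rho=0.01, v=1.0)
tau_large = gyrobohm_confinement_time(a=2.0, rho=0.01, v=1.0)

# Doubling a at fixed rho and v multiplies tau by 2**3.
ratio = tau_large / tau_small
print(f"tau(2a) / tau(a) = {ratio:.3f}")
```

The cubic dependence on a is the reason the estimate favours large devices.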

SLIDE 23

Tokamak physics spans multiple space/timescales

Core-edge-SOL (CESOL) region coupling

[Figure: Ψ profile across the core, edge and SOL regions]

SLIDE 24

Tokamak confinement improves with LARGE PLASMA VOLUME

SLIDE 25

ITER Facility (35 nations) under construction in France

GOAL: Simulate turbulent plasma in core (magenta) region

SLIDE 26

Mathematical formulation and GPU-based numerical solution

SLIDE 27

Gyrokinetic Theory for Magnetized Plasma

The Cooper/Kripke Inversion

SLIDE 29

Gyrokinetic equation for plasma species a

Typically: a = (deuterium, carbon, electron)

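$$\frac{\partial h_a}{\partial\tau} - i\Omega_s X h_a - i\left(\Omega_\theta + \Omega_\xi + \Omega_d\right) H_a - i\Omega_* \Psi_a + \Omega_{\rm NL}(h_a, \Psi_a) = C_a$$

Symbol definitions

particles

  • $H_a = h_a + \dfrac{z_a T_e}{T_a}\,\Psi_a$

fields

  • $\Psi_a = J_0(\gamma_a)\left(\delta\phi - \dfrac{v_\parallel}{c}\,\delta A_\parallel\right) + \dfrac{v_\perp^2}{\Omega_{ca}\,c}\,\dfrac{J_1(\gamma_a)}{\gamma_a}\,\delta B_\parallel$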

SLIDE 30

Electromagnetic GK-Maxwell Equations

Coupling to fields is a MAJOR complication!

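$$\left(k_\perp^2\lambda_D^2 + \sum_a z_a^2\,\frac{T_e}{T_a}\int d^3v\,\frac{f_{0a}}{n_e}\right)\delta\phi = \sum_a z_a \int d^3v\,\frac{f_{0a}}{n_e}\,J_0(\gamma_a)\,H_a$$

$$\frac{2}{\beta_{e,\rm unit}}\,k_\perp^2\rho_s^2\,\delta A_\parallel = \sum_a z_a \int d^3v\,\frac{f_{0a}}{n_e}\,\frac{v_\parallel}{c_s}\,J_0(\gamma_a)\,H_a$$

$$-\frac{2}{\beta_{e,\rm unit}}\,\frac{B}{B_{\rm unit}}\,\delta B_\parallel = \sum_a \int d^3v\,\frac{f_{0a}}{n_e}\,\frac{m_a v_\perp^2}{T_e}\,\frac{J_1(\gamma_a)}{\gamma_a}\,H_a$$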

SLIDE 31

Gyrokinetic equation for plasma species a

Typically, deuterium, some carbon, and electrons

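$$\frac{\partial h_a}{\partial\tau} - i\Omega_s X h_a - i\left(\Omega_\theta + \Omega_\xi + \Omega_d\right) H_a - i\Omega_* \Psi_a + \Omega_{\rm NL}(h_a, \Psi_a) = C_a$$

E×B flow:

$$-i\Omega_s = -i\,\frac{k_\theta L}{2\pi}\,\frac{a}{c_s}\,\gamma_E$$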

SLIDE 32

Gyrokinetic equation for plasma species a

Typically, deuterium, some carbon, and electrons

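$$\frac{\partial h_a}{\partial\tau} - i\Omega_s X h_a - i\left(\Omega_\theta + \Omega_\xi + \Omega_d\right) H_a - i\Omega_* \Psi_a + \Omega_{\rm NL}(h_a, \Psi_a) = C_a$$

Streaming:

$$-i\Omega_\theta = \frac{v_\parallel}{w_s}\,\frac{\partial}{\partial\theta}$$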

SLIDE 33

Gyrokinetic equation for plasma species a

Typically, deuterium, some carbon, and electrons

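$$\frac{\partial h_a}{\partial\tau} - i\Omega_s X h_a - i\left(\Omega_\theta + \Omega_\xi + \Omega_d\right) H_a - i\Omega_* \Psi_a + \Omega_{\rm NL}(h_a, \Psi_a) = C_a$$

Trapping:

$$-i\Omega_\xi = -\frac{v_{ta}}{w_s}\,\frac{u_a}{\sqrt{2}}\left(1-\xi^2\right)\frac{\partial \ln B}{\partial\theta}\,\frac{\partial}{\partial\xi} - \frac{1}{2u_a}\,\frac{\partial\lambda_a}{\partial\theta}\left[\frac{v_\parallel}{w_s}\,\frac{\partial}{\partial u_a} + \frac{\sqrt{2}\,v_{ta}}{w_s}\left(1-\xi^2\right)\frac{\partial}{\partial\xi}\right]$$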

SLIDE 34

Gyrokinetic equation for plasma species a

Typically, deuterium, some carbon, and electrons

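$$\frac{\partial h_a}{\partial\tau} - i\Omega_s X h_a - i\left(\Omega_\theta + \Omega_\xi + \Omega_d\right) H_a - i\Omega_* \Psi_a + \Omega_{\rm NL}(h_a, \Psi_a) = C_a$$

Drift motion:

$$\begin{aligned}
-i\Omega_d ={}& \frac{a\,v_{ta}}{c_s}\,\mathbf{b}\times\left[u_a^2\left(1+\xi^2\right)\frac{\nabla B}{B} + u_a^2\xi^2\,\frac{8\pi}{B^2}\,(\nabla p)_{\rm eff}\right]\cdot i\mathbf{k}_\perp\rho_a \\
&+ M_a\,\frac{2a\,v_\parallel}{c_s R_0}\,\mathbf{b}\times\left[\frac{R}{J_\psi B}\,\frac{\partial R}{\partial\theta}\,\nabla\varphi - \frac{B_t}{B}\,\nabla R\right]\cdot i\mathbf{k}_\perp\rho_a \\
&+ \frac{a}{c_s}\,\mathbf{b}\times\left[-\frac{v_{ta}}{T_a}\,\mathbf{F}_c + \frac{c}{B}\,\nabla\Phi_*\right]\cdot i\mathbf{k}_\perp\rho_a
\end{aligned}$$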

SLIDE 35

Gyrokinetic equation for plasma species a

Typically, deuterium, some carbon, and electrons

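$$\frac{\partial h_a}{\partial\tau} - i\Omega_s X h_a - i\left(\Omega_\theta + \Omega_\xi + \Omega_d\right) H_a - i\Omega_* \Psi_a + \Omega_{\rm NL}(h_a, \Psi_a) = C_a$$

Gradient drive:

$$\begin{aligned}
-i\Omega_* ={}& \left[\frac{a}{L_{na}} + \frac{a}{L_{Ta}}\left(u_a^2 - \frac{3}{2}\right) + \gamma_p\,\frac{v_\parallel\,a}{v_{ta}^2}\,\frac{R B_t}{R_0 B}\right] i k_\theta\rho_s \\
&+ \left[\frac{a}{L_{Ta}}\left(\frac{z_a e}{T_a}\,\Phi_* - \frac{M_a^2}{2R^2}\left(R^2 - R(\theta_0)^2\right)\right) + M_a^2\,\frac{a\,R(\theta_0)}{R^2}\,\frac{dR(\theta_0)}{dr} + \frac{M_a\gamma_p\,a}{v_{ta}R^2}\left(R^2 - R(\theta_0)^2\right)\right] i k_\theta\rho_s
\end{aligned}$$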

SLIDE 36

Gyrokinetic equation for plasma species a

Typically, deuterium, some carbon, and electrons

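$$\frac{\partial h_a}{\partial\tau} - i\Omega_s X h_a - i\left(\Omega_\theta + \Omega_\xi + \Omega_d\right) H_a - i\Omega_* \Psi_a + \Omega_{\rm NL}(h_a, \Psi_a) = C_a$$

Nonlinearity:

$$\Omega_{\rm NL}(h_a, \Psi_a) = \frac{a\,c_s}{\Omega_{cD}} \sum_{\mathbf{k}'_\perp + \mathbf{k}''_\perp = \mathbf{k}_\perp} \left(\mathbf{b}\cdot\mathbf{k}'_\perp\times\mathbf{k}''_\perp\right)\Psi_a(\mathbf{k}'_\perp)\,h_a(\mathbf{k}''_\perp)$$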
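
A convolution sum of this form is what makes a Fourier-transform evaluation attractive: a sum over wavenumber pairs with k′ + k″ = k equals the transform of a pointwise product in real space. The sketch below checks that identity in one periodic dimension. It is a simplified illustration under stated assumptions, not CGYRO's algorithm: it omits the b · k′ × k″ weight and any dealiasing, uses a naive O(n²) transform instead of an FFT library, and all function names and sample values are hypothetical.

```python
import cmath


def to_real_space(coeffs):
    """Evaluate f(x_m) = sum_k c_k exp(+2*pi*i*m*k/n) on n grid points."""
    n = len(coeffs)
    return [
        sum(c * cmath.exp(2j * cmath.pi * m * k / n) for k, c in enumerate(coeffs))
        for m in range(n)
    ]


def to_spectral(values):
    """Inverse of to_real_space: c_k = (1/n) sum_m f(x_m) exp(-2*pi*i*m*k/n)."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * m * k / n) for m, v in enumerate(values)) / n
        for k in range(n)
    ]


def convolve_direct(psi, h):
    """Cyclic convolution: out[k] = sum over k1 + k2 = k (mod n) of psi[k1] * h[k2]."""
    n = len(psi)
    out = [0j] * n
    for k1 in range(n):
        for k2 in range(n):
            out[(k1 + k2) % n] += psi[k1] * h[k2]
    return out


def convolve_via_transform(psi, h):
    """Same result: transform both inputs, multiply pointwise, transform back."""
    product = [p * q for p, q in zip(to_real_space(psi), to_real_space(h))]
    return to_spectral(product)


# Hypothetical spectral coefficients for two fields.
psi = [1 + 2j, 0.5j, -1.0, 0.25]
h = [0.5, -1j, 2.0, 1 - 1j]

direct = convolve_direct(psi, h)
fast = convolve_via_transform(psi, h)
max_diff = max(abs(d - f) for d, f in zip(direct, fast))
print(f"largest difference between the two evaluations: {max_diff:.2e}")
```

With an O(n log n) FFT in place of the naive transforms, the transform route costs far less than the O(n²) direct sum for large n.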

SLIDE 37

Gyrokinetic equation for plasma species a

Typically, deuterium, some carbon, and electrons

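$$\frac{\partial h_a}{\partial\tau} - i\Omega_s X h_a - i\left(\Omega_\theta + \Omega_\xi + \Omega_d\right) H_a - i\Omega_* \Psi_a + \Omega_{\rm NL}(h_a, \Psi_a) = C_a$$

Cross-species collision operator:

$$C_a = \sum_b C^L_{ab}\left(H_a, H_b\right)$$

$$\begin{aligned}
C^L_{ab}(H_a, H_b) ={}& \frac{\nu^D_{ab}}{2}\,\frac{\partial}{\partial\xi}\left(1-\xi^2\right)\frac{\partial H_a}{\partial\xi} + \frac{1}{v^2}\,\frac{\partial}{\partial v}\left[\frac{\nu^\parallel_{ab}}{2}\left(v^4\,\frac{\partial H_a}{\partial v} + \frac{m_a}{T_b}\,v^5 H_a\right)\right] \\
&- H_a\,k_\perp^2\rho_a^2\,\frac{v^2}{4v_{ta}^2}\left[\nu^D_{ab}\left(1+\xi^2\right) + \nu^\parallel_{ab}\left(1-\xi^2\right)\right] + R_{\rm mom}(H_b) + R_{\rm ene}(H_b)
\end{aligned}$$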

SLIDE 38

Sonic Transport Fluxes

These are inputs to an independent TRANSPORT CODE

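particle flux: $\Gamma_a = \sum_{\mathbf{k}_\perp} \int d^3v\, H_a^*\, c_{1a}\, \Psi_a$

energy flux: $Q_a = \sum_{\mathbf{k}_\perp} \int d^3v\, H_a^*\, c_{2a}\, \Psi_a$

momentum flux: $\Pi_a = \sum_{\mathbf{k}_\perp} \int d^3v\, H_a^*\, c_{3a}\, \Psi_a$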

SLIDE 39

What do we solve for

5-dimensional distribution for every plasma species

Six-dimensional array (mapped into internal 2D array in CGYRO):

$$H_a(\underbrace{k_x, k_y, \theta, \xi, v}_{\text{5D mesh}},\, t)$$

The spatial coordinates are

  • $k_x$ → radial wavenumbers
  • $k_y$ → binormal wavenumbers
  • $\theta$ → field-line coordinate

The velocity-space coordinates are

  • $\xi = v_\parallel/v$ → cosine of the pitch angle, $\in [-1, 1]$
  • $v$ → speed, $\in [0, \infty)$
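
One way to picture the mapping of a multi-dimensional array into a 2D array is to flatten groups of indices into a "configuration" index and a "velocity" index. The sketch below shows plain row-major flattening. The grouping (kx, θ) → ic and (species, ξ, v) → iv, the ordering inside each group, and the separate handling of ky are assumptions made for illustration only; the names ic and iv merely echo the collision kernel shown later in the talk, and the sizes are borrowed from the ion-scale mesh slide.

```python
def flatten(indices, sizes):
    """Row-major flattening of a multi-index into one 0-based index."""
    flat = 0
    for i, n in zip(indices, sizes):
        if i not in range(n):
            raise IndexError(f"index {i} outside 0..{n - 1}")
        flat = flat * n + i
    return flat


def unflatten(flat, sizes):
    """Inverse of flatten: recover the multi-index from the flat index."""
    out = []
    for n in reversed(sizes):
        flat, i = divmod(flat, n)
        out.append(i)
    return tuple(reversed(out))


# Assumed sizes (ion-scale mesh and velocity mesh from the mesh slide).
config_sizes = (128, 32)      # (kx, theta), hypothetical grouping
velocity_sizes = (3, 24, 8)   # (species, xi, v), hypothetical grouping

nc = config_sizes[0] * config_sizes[1]
nv = velocity_sizes[0] * velocity_sizes[1] * velocity_sizes[2]

ic = flatten((5, 17), config_sizes)        # kx index 5, theta index 17
iv = flatten((2, 10, 3), velocity_sizes)   # species 2, xi index 10, v index 3
print(f"2D array shape ({nc}, {nv}); element (ic, iv) = ({ic}, {iv})")
```

Under this grouping, each velocity-space operation acts along one axis of the 2D array and each configuration-space operation along the other.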

SLIDE 40

Visual representation of computational mesh

[Figure: computational mesh]

  • multiscale mesh: $k_x^0$ 1024, $k_y$ 256, $\theta$ 32
  • ion-scale mesh: $k_x^0$ 128, $k_y$ 32
  • velocity-space mesh: $\xi$ 24, $v$ 8, for each species: deuterium (a = 1), carbon (a = 2), electron (a = 3)
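
To get a feel for the problem size, the mesh dimensions in the figure can simply be multiplied out. The sketch below assumes that the larger kx and ky counts belong to the multiscale mesh, that θ = 32 applies to both spatial meshes, and that each value is stored as a 16-byte double-precision complex number; those are readings of the figure labels, not statements from the talk.

```python
from math import prod

# Mesh sizes as read from the figure labels (assignment to meshes is assumed).
velocity_mesh = {"species": 3, "xi": 24, "v": 8}
ion_scale = {"kx": 128, "ky": 32, "theta": 32}
multiscale = {"kx": 1024, "ky": 256, "theta": 32}

BYTES_PER_VALUE = 16  # assumption: double-precision complex


def mesh_points(spatial, velocity):
    """Total number of mesh points, summed over all species."""
    return prod(spatial.values()) * prod(velocity.values())


n_ion = mesh_points(ion_scale, velocity_mesh)
n_multi = mesh_points(multiscale, velocity_mesh)
gib_multi = n_multi * BYTES_PER_VALUE / 2**30

print(f"ion-scale points:  {n_ion:,}")
print(f"multiscale points: {n_multi:,} ({n_multi // n_ion}x ion-scale)")
print(f"one stored copy of the multiscale distribution: {gib_multi:.1f} GiB")
```

Under these assumptions the multiscale mesh has 64 times as many points as the ion-scale mesh, which is why it has to be spread over many nodes.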

SLIDE 41

CGYRO optimized for challenging multiscale turbulence

COMPLETE REDESIGN of world-renowned GYRO code

SLIDE 42

Simulation of turbulent energy loss in a tokamak plasma

SLIDE 43

CGYRO computes the turbulent flux

DIII-D Tokamak at General Atomics in San Diego, CA

SLIDE 45

Multiscale DIII-D Simulation at r/a = 0.92

ITER baseline discharge 164988 (Haskey, Grierson)

[Figure: fractional $Q_e$ versus $k_y\rho_s$; the long-wavelength (global, full-F, etc.) regime is marked]

Resolution: $k_x\rho_s$ 124.0, $k_y\rho_s$ 31.8
Time: 9 hrs on 32K cores

          Qi/QGB   Qe/QGB
pwrbal     2.5      8.2
NEO        2.7      0.0
CGYRO      0.0      8.0

SLIDE 46

Simulation underway on Titan (NCCS)

4986 nodes = 4986 Tesla K20X GPUs

SLIDE 47

Important locations for CGYRO

  • Source code: github.com/gafusion/gacode
  • DOI: www.osti.gov/doecode/biblio/20298
  • User documentation: gafusion.github.io/doc
  • Documentary video (for GYRO): www.youtube.com/watch?v=RLI6QW2x4Lg

SLIDE 48

Fidelity Hierarchy (Pyramid)

Range of models all the way up to leadership codes

[Figure: fidelity pyramid]

  • Leadership-class computing: highest-fidelity simulations (one-off heroic simulation)
  • Reduced models for validation
  • Machine-learning models for optimization & real-time control

Arrow labels: Calibrate, Train, Inform (twice). Side labels: Physics Development, Physics Validation, Physics Application.

SLIDE 49

Create TGLF-NN neural net from TGLF reduced model

  • 23 inputs → 4 outputs
  • Each dataset has 500K cases from 2300 multi-machine discharges
  • Trained with TensorFlow
  • Must be retrained as TGLF model is updated
  • TGLF itself derived from HPC CGYRO simulation

SLIDE 50

GPU performance: development and results

SLIDE 53

CGYRO: Roadmap for efficient GPU implementation

1. Numerical algorithms selected to allow intensive threading/acceleration
   − Nonlinearity (nl) = FFT
   − Collisions (coll) = Matrix-vector multiply
2. Key kernels have threaded (default) and accelerated variations
   − Smart loop order and good memory management keep kernels similar
3. Implemented GPU-aware MPI (utilizes GPUDirect and GPU-InfiniBand RDMA)

SLIDE 54

Initial thought was that nonlinearity (nl) would dominate

SLIDE 55

Acceleration of nl exposed cost of other kernels

Titan K20 GPU too small to store collision matrix
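
A back-of-the-envelope estimate suggests why the collision matrix strains a small GPU. The sketch below assumes one real, double-precision nv × nv block per configuration point (as the cmat(iv,ivp,ic_loc) indexing in the kernel shown later suggests), takes nv from the velocity mesh on the mesh slide (3 species × 24 × 8), and uses 6 GB as the nominal memory of a Tesla K20X. How CGYRO actually distributes these blocks across ranks is not stated here, so the numbers are illustrative only.

```python
# Velocity-space size from the mesh slide: species x xi x v.
n_species, n_xi, n_v = 3, 24, 8
nv = n_species * n_xi * n_v

BYTES_PER_REAL = 8          # assumption: double-precision real entries
GPU_BYTES = 6 * 10**9       # nominal Tesla K20X memory

# One nv x nv block per configuration point (assumed storage layout).
block_bytes = nv * nv * BYTES_PER_REAL
max_blocks = GPU_BYTES // block_bytes

# Hypothetical load: every (kx, theta) point of the ion-scale mesh on one GPU.
ion_scale_points = 128 * 32
needed_bytes = ion_scale_points * block_bytes

print(f"nv = {nv}, one block = {block_bytes / 1e6:.2f} MB")
print(f"blocks that fit in 6 GB (nothing else stored): {max_blocks}")
print(f"4096 blocks would need {needed_bytes / 1e9:.2f} GB")
```

Even before any other array is stored, only a couple of thousand such blocks fit on the device under these assumptions.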

SLIDE 57

CGYRO: Roadmap for efficient GPU implementation

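! Collision step for one configuration point ic_loc:
! bvec = bvec + cmat(:,:,ic_loc) * cvec, where cval is used as a real number
!$acc loop seq
do ivp=1,nv
   ! split the complex input element once per column
   cvec_re = real(cvec(ivp))
   cvec_im = aimag(cvec(ivp))
!$acc loop vector
   do iv=1,nv
      ! iv is the first (fastest-varying) index of cmat
      cval = cmat(iv,ivp,ic_loc)
      bvec(iv) = bvec(iv) + cmplx(cval*cvec_re,cval*cvec_im)
   enddo
enddo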
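
For readers less familiar with Fortran and OpenACC, the same loop nest can be sketched in Python. The version below mirrors the loop order on the slide (outer loop over columns ivp, inner loop over rows iv) and checks the result against a plain row-by-row matrix-vector product. The function name and the 2 × 2 test data are made up for illustration; the real kernel operates on one block of cmat per configuration point.

```python
def collision_matvec(cmat, cvec, bvec):
    """Accumulate bvec += cmat @ cvec for a real matrix and complex vectors.

    cmat[iv][ivp] is real; cvec and bvec hold complex numbers.
    """
    nv = len(cvec)
    for ivp in range(nv):            # sequential: every ivp updates all of bvec
        cvec_re = cvec[ivp].real
        cvec_im = cvec[ivp].imag
        for iv in range(nv):         # independent: each iv writes its own bvec[iv]
            cval = cmat[iv][ivp]
            bvec[iv] += complex(cval * cvec_re, cval * cvec_im)
    return bvec


# Made-up data for a 2 x 2 check.
cmat = [[1.0, 2.0], [0.5, -1.0]]
cvec = [1 + 1j, 2 - 3j]

bvec = collision_matvec(cmat, cvec, [0j, 0j])
reference = [sum(cmat[i][j] * cvec[j] for j in range(2)) for i in range(2)]
print(bvec, reference)
```

The inner loop is the natural one to vectorize because its iterations write to different elements of bvec, and in Fortran's column-major layout consecutive iv values are adjacent in memory.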

SLIDE 58

CGYRO: Roadmap for efficient GPU implementation

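#ifdef DISABLE_GPUDIRECT_MPI
! no GPU-aware MPI: copy the send buffer to the host first
!$acc update host(fsendr)
#else
! GPU-aware MPI: hand the device addresses of fsendr and f to MPI
!$acc host_data use_device(fsendr,f)
#endif

call MPI_ALLTOALL(fsendr,nsend,MPI_DOUBLE_COMPLEX, &
                  f,nsend,MPI_DOUBLE_COMPLEX,lib_comm,ierr)

#ifdef DISABLE_GPUDIRECT_MPI
! copy the received data back to the device
!$acc update device(f)
#else
!$acc end host_data
#endif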
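
The data movement performed by MPI_ALLTOALL can be pictured without any MPI library: every rank splits its send buffer into one chunk per rank, and chunk j goes to rank j. The pure-Python simulation below shows only that exchange pattern; the function name and the string payloads are hypothetical, and nothing here touches a GPU or a network.

```python
def alltoall(send_buffers):
    """Simulate the all-to-all exchange pattern.

    send_buffers[r][j] is the chunk that rank r addresses to rank j.
    The returned recv[j][r] is what rank j receives from rank r, so the
    exchange is a transpose over the (sender, receiver) pair.
    """
    n_ranks = len(send_buffers)
    return [
        [send_buffers[src][dst] for src in range(n_ranks)]
        for dst in range(n_ranks)
    ]


# Three simulated ranks, each labelling its chunks with sender and receiver.
send = [[f"r{r}->r{j}" for j in range(3)] for r in range(3)]
recv = alltoall(send)

for rank, chunks in enumerate(recv):
    print(f"rank {rank} received {chunks}")
```

Because every rank exchanges data with every other rank, avoiding an extra device-to-host copy on each side is the point of the GPU-aware path in the snippet above.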

SLIDE 59

Power9 (CPU) versus Power9 + 4X V100 (GPU)

SLIDE 60

CPU systems versus 4X V100

SLIDE 61

GPU type comparison

Stampede2, GA, Piz Daint, Titan

SLIDE 62

Google Cloud Partition Comparison

Santa Fe (last week)

SLIDE 63

Cloud V100 compared to Summit and Cori

SLIDE 64

OUTLINE

1. History of General Atomics?
2. The case for fusion energy
3. Mathematical formulation and GPU-based numerical solution
4. Simulation of turbulent energy loss in a tokamak plasma
5. GPU performance: development and results

SLIDE 65

Disclaimer

This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof, or those of the European Commission.
