

1. A parallel Grad-Shafranov solver for real-time control of tokamak plasmas
M. Rampp, R. Preuss, R. Fischer, L. Giannone, K. Hallatschek
Computing Centre (RZG) of the Max Planck Society and Max Planck Institute for Plasma Physics (IPP)
Frascati, Mar 26, 2012

2. Overview
GPEC (Garching Parallel Equilibrium Code), a descendant of GEC (by Lackner et al.)
Outline:
1. Introduction
2. Basic equations and parallel algorithm for numerical solution
3. Benchmarks & notes on implementation
4. Summary & outlook
References:
⊲ R. Preuss: Real-time equilibrium calculation, AUG Seminar, IPP (July 2009)
⊲ R. Preuss, R. Fischer, M. Rampp, K. Hallatschek, U. von Toussaint, L. Giannone, P. McCarthy: Parallel equilibrium algorithm for real-time control of tokamak plasmas, IPP Report R/47 (2012)

3. Introduction
Scope & motivation
⊲ develop a "fast" numerical solver for the Grad-Shafranov equation
⊲ "fast": enable application in plasma equilibrium codes (CLISTE, IDE @ AUG):
  · real-time control of fusion experiments (moderate numerical resolution, runtime ≲ 1 ms)
  · efficient and scalable offline analysis (high numerical resolution)
Strategy
⊲ rely on open-source software, OSS (no vendor lock-in: technology, licensing, ITER policy)
⊲ employ commodity hardware & open, well-established software standards
  · x86 Linux servers or clusters
  · standard compilers, programming models & runtimes (Fortran 90, OpenMP, MPI)
⊲ try to exploit all available levels of parallelism: processors/tasks, cores/threads, instructions
Status
⊲ GPEC fully implemented, validated and documented
⊲ integration into IDE completed (offline analysis of AUG data)
⊲ prototypical MPI parallelization of IDE (towards real-time capability)

4. Context: Grad-Shafranov equation in equilibrium codes
Grad-Shafranov equation in axisymmetric geometry:
$$ \Delta^* \psi = -4\pi^2 \mu_0\, r\, j_\phi \quad\text{with}\quad \Delta^* := r\,\frac{\partial}{\partial r}\frac{1}{r}\frac{\partial}{\partial r} + \frac{\partial^2}{\partial z^2}, \qquad j_\phi := r\,\frac{\partial p(r,\psi)}{\partial \psi} + \frac{F(\psi)}{\mu_0 r}\,\frac{\mathrm{d}F(\psi)}{\mathrm{d}\psi} $$
Application in equilibrium codes:
⊲ expansion of the toroidal current density j_φ into a linear combination of N_p + N_F basis functions:
$$ p'(\psi) = \sum_{h=1}^{N_p} c_h\, \pi_h(\psi), \qquad FF'(\psi) = \sum_{k=1}^{N_F} d_k\, \varphi_k(\psi) $$
⊲ for each basis function, solve individually
$$ \Delta^* \psi_h = -4\pi^2 \mu_0\, r^2\, \pi_h(\psi_\mathrm{old}), \qquad \Delta^* \psi_{N_p+k} = -4\pi^2 \mu_0^2\, \varphi_k(\psi_\mathrm{old}) $$
⊲ therefore:
$$ \psi = \sum_{i=1}^{N_p} c_i\, \psi_i + \sum_{j=1}^{N_F} d_j\, \psi_{N_p+j} $$

5. Context: Grad-Shafranov equation in equilibrium codes (cont.)
⊲ obtain the weights c_h, d_k by linear regression of the diagnostic signals m against the computed "response matrix" b:
$$ \begin{pmatrix} m_1 \\ m_2 \\ \vdots \\ m_{N_m} \end{pmatrix} = \begin{pmatrix} b_1(\psi_1) & b_1(\psi_2) & \cdots & b_1(\psi_{N_p+N_F}) \\ b_2(\psi_1) & b_2(\psi_2) & \cdots & b_2(\psi_{N_p+N_F}) \\ \vdots & \vdots & & \vdots \\ b_{N_m}(\psi_1) & b_{N_m}(\psi_2) & \cdots & b_{N_m}(\psi_{N_p+N_F}) \end{pmatrix} \cdot \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_{N_p} \\ d_1 \\ d_2 \\ \vdots \\ d_{N_F} \end{pmatrix} $$
Summary of the complete procedure
⊲ Picard iteration cycle:
  1) solve the GS equation $\Delta^* \psi = -g(r, z)$ individually for the N_p + N_F basis functions; g(r, z) is computed using ψ from the previous iteration step
  2) obtain c_h, d_k from linear regression
  3) compute the new $\psi = \sum_{h=1}^{N_p} c_h\, \psi_h + \sum_{k=1}^{N_F} d_k\, \psi_{N_p+k}$
⊲ typically 2…10 iteration steps
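A minimal Python sketch of this Picard cycle (illustrative only, not GPEC's Fortran implementation): `gs_solve`, the basis functions, and the diagnostic-response evaluation `response` are hypothetical stand-ins, and `numpy.linalg.lstsq` is used here in place of whatever regression the equilibrium codes actually employ.

```python
import numpy as np

# --- hypothetical ingredients -------------------------------------------------
def gs_solve(source):
    """Stand-in for one Grad-Shafranov solve, Delta* psi = -source (see slide 11)."""
    return source                                   # dummy: returns a psi-shaped field

def response(psi_k):
    """Stand-in for the response of N_m diagnostics to a unit-weight psi_k."""
    return np.array([psi_k.sum(), psi_k.mean(), np.abs(psi_k).max()])   # N_m = 3

basis = [lambda psi, h=h: psi**h for h in range(1, 5)]   # placeholder pi_h, phi_k
m = np.array([1.0, 0.5, 2.0])                            # measured signals (dummy)

# --- Picard iteration cycle (typically 2..10 steps) ----------------------------
M, N = 32, 64
psi = np.ones((M, N))                                    # initial guess
for step in range(10):
    # 1) one GS solve per basis function, source built from psi of the last step
    psi_basis = [gs_solve(f(psi)) for f in basis]
    # 2) linear regression m ~ b @ coeff, with b[i, k] = b_i(psi_k)
    b = np.column_stack([response(p) for p in psi_basis])
    coeff, *_ = np.linalg.lstsq(b, m, rcond=None)
    # 3) superpose the new flux: psi = sum_k coeff_k * psi_k
    psi_new = sum(c * p for c, p in zip(coeff, psi_basis))
    if np.max(np.abs(psi_new - psi)) < 1e-8 * np.max(np.abs(psi_new)):
        psi = psi_new
        break
    psi = psi_new
```

The N_p + N_F solver calls inside step 1) are independent of each other, which is the coarse-grained parallelism exploited on slide 9.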

6. Requirements for computational performance
Real-time control
⊲ AUG machine-control cycle: ≃ 1 ms
⊲ assume 10 iteration steps are required for convergence
  ⇒ runtime per iteration ≲ 0.1 ms for
  · N_p + N_F ≃ 8 calls of the GS solver
  · linear regression, calculation of the response matrix, ...
Offline analysis
⊲ as fast as possible
⊲ goals: allow significantly larger numerical resolution, enable new data-analysis methodologies
Basic strategy
⊲ starting point: the GS solver dominates the runtime; GEC/CLISTE: O(ms) for a single basis function (N_p + N_F = 1)
⊲ achieve performance gains by parallelization
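As a back-of-envelope budget (the roughly even split over the pieces of work in the second step is an assumption, not a figure from the slides):
$$ t_\text{iteration} \lesssim \frac{1\ \text{ms}}{10\ \text{iterations}} = 0.1\ \text{ms}, \qquad t_\text{GS solve} \sim \frac{0.1\ \text{ms}}{\approx 8\ \text{solver calls} + \text{regression}} \approx 10\ \mu\text{s per call.} $$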

7. Excursus: why parallel?
Moore's law, reinterpreted
⊲ Moore's law still holds in its original terms, the number of transistors:
  · transistors on a chip still double every 18 months at constant cost
  · V ∼ l, I ∼ l ⇒ P ∼ l² (constant power density, Dennard scaling)
  · but: the era of decades of exponential clock-rate growth ended (around 2005)
⊲ Moore's law "reinterpreted":
  · the number of cores per chip doubles every 18 months instead of the clock rate
  ⇒ performance improvements now come from the increase in the number of cores per processor, plus other levels of parallelism (e.g. vector instructions)
  ⇒ the free (performance) lunch is over
⊲ Future: 100…1000 cores per chip?
References:
◦ H. Sutter: The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software, http://www.gotw.ca/publications/concurrency-ddj.htm
◦ H. Esmaeilzadeh et al.: Dark Silicon and the End of Multicore Scaling, http://doi.acm.org/10.1145/2000064.2000108

8. Excursus: parallel challenges
Typical architecture of a present-day x86_64 cluster
[Figure: schematic of a cluster: nodes (Node 0, Node 1, Node 2, ...) connected by a network (ETH, IB, ...); each node with 4 multicore CPUs; each CPU socket with 2 cores / 4 threads]
⊲ multiple nodes (1…1000): connected by Ethernet or InfiniBand networks
⊲ multiple sockets (CPUs) per node (2…4): connected by QPI, HT (NUMA!)
⊲ multiple cores per CPU (2…16)
⊲ multiple hyper-threads per core (2)
⊲ multiple instructions per cycle (2…4 DP ops; SSE: 128 bit, AVX: 256 bit)

9. Parallelization strategies
Basic considerations
⊲ exploit fine-grained parallelism (spatial grid) within the multi-core CPU: threads, SIMD vectorization (GPEC)
⊲ exploit coarse-grained parallelism (basis functions) across multiple CPUs (IDE)
  ⇒ fits well with the program structure
  ⇒ flexibility of the hybrid MPI (tasks) / OpenMP (threads) approach, for example:
  · distribute 8 basis functions across 8 quad-core CPUs; each GPEC call uses all 4 CPU cores
  · distribute 8 basis functions across 2 16-core CPUs; each GPEC call uses only 8 CPU cores
Communication estimates
⊲ required bandwidth for communicating a 32 × 64 array of double-precision numbers (corresponding to a single ψ_k): 32 × 64 · 8 Byte / 0.01 ms ≃ 1.6 GB/s
⊲ inter-node (InfiniBand: QDR ≃ 1 GB/s, FDR ≃ 1.7 GB/s)
⊲ intra-node (STREAM, 4-socket: ≃ 5 GB/s; 2-socket: ≃ 10 GB/s)
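A minimal sketch of the coarse-grained level in Python/mpi4py (illustrative only; GPEC/IDE use Fortran 90 with MPI and OpenMP, and `gpec_solve` below is a hypothetical stand-in for one threaded GPEC call):

```python
# run, e.g.: mpiexec -n 4 python distribute_basis.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

M, N = 32, 64
n_basis = 8                                    # N_p + N_F basis functions

def gpec_solve(k, psi_old):
    """Hypothetical stand-in for one (OpenMP-threaded) GPEC call for basis function k."""
    return np.full((M, N), float(k + 1))       # dummy psi_k field

psi_old = np.zeros((M, N))

# coarse-grained level: each MPI task handles a share of the basis functions
my_results = [(k, gpec_solve(k, psi_old)) for k in range(rank, n_basis, size)]

# every psi_k (32 x 64 doubles, ~16 kB) is then exchanged between the tasks;
# the 1.6 GB/s estimate above corresponds to moving one such array per 0.01 ms
gathered = [pair for chunk in comm.allgather(my_results) for pair in chunk]
psi_basis = [psi for _, psi in sorted(gathered, key=lambda kp: kp[0])]

if rank == 0:
    print(len(psi_basis), psi_basis[0].shape)
```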

10. GPEC: discretization
GS equation: $\Delta^* \psi = -g(r, z)$
⊲ standard finite-difference discretization on an (M × N) grid in (r, z)
⊲ five-point stencil for $\Delta^* := r\,\frac{\partial}{\partial r}\frac{1}{r}\frac{\partial}{\partial r} + \frac{\partial^2}{\partial z^2}$:
$$ s_i^+\,\psi_{i+1,j} + \delta_i\,\psi_{i,j} + s_i^-\,\psi_{i-1,j} + r_i^{-1}\,(\psi_{i,j+1} + \psi_{i,j-1}) = \rho_{i,j} $$
[Figure: (M × N) finite-difference grid in (r_i, z_j), with i = 0 … M and j = 0 … N]
⊲ solution methods for the resulting linear system:
  · block-cyclic reduction: GEC
  · Fourier analysis: GPEC
  · FACR, multigrid, ...

11. GPEC: discretization & FA solution method
Fourier analysis: applying a discrete sine transform (DST) decouples the linear system in one dimension:
$$ s_i^+\,\psi_{i+1,j} + \delta_i\,\psi_{i,j} + s_i^-\,\psi_{i-1,j} + r_i^{-1}\,(\psi_{i,j+1} + \psi_{i,j-1}) = \rho_{i,j} $$
apply
$$ \xi_{i,j} = \sqrt{\frac{2}{N}} \sum_{l=1}^{N-1} \hat{\xi}_{i,l}\,\sin(\pi j l / N), \qquad \xi := \rho, \psi $$
$$ \Rightarrow\quad s_i^+\,\hat{\psi}_{i+1,l} + \hat{\delta}_{i,l}\,\hat{\psi}_{i,l} + s_i^-\,\hat{\psi}_{i-1,l} = \hat{\rho}_{i,l} $$
⊲ basic solution scheme (known since the 1970s):
  1. compute M iDSTs of size N ⇒ $\hat{\rho}$
  2. solve N tridiagonal systems of size M × M ⇒ $\hat{\psi}$
  3. compute M DSTs of size N ⇒ $\psi$
⊲ optimized FFTs and tridiagonal solver
⊲ fully parallelizable over M, N
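For illustration, a minimal, self-contained Python sketch of this DST + tridiagonal scheme (not the GPEC Fortran/OpenMP implementation): the grid, the homogeneous Dirichlet boundary condition, and the textbook second-order coefficients for ∆* below are assumptions made for the example, and scipy's DST stands in for GPEC's optimized FFTs.

```python
import numpy as np
from scipy.fft import dst, idst
from scipy.linalg import solve_banded

M, N = 32, 64                        # interior grid points in r and z
r = np.linspace(1.0, 2.0, M + 2)     # r_0 .. r_{M+1}, boundary points included
z = np.linspace(-1.0, 1.0, N + 2)
hr, hz = r[1] - r[0], z[1] - z[0]

ri = r[1:-1]                                        # interior r values
s_plus  = ri / (hr**2 * 0.5 * (ri + r[2:]))         # couples psi_{i+1,j}
s_minus = ri / (hr**2 * 0.5 * (ri + r[:-2]))        # couples psi_{i-1,j}
delta   = -(s_plus + s_minus) - 2.0 / hz**2         # diagonal (r- and z-part)

def apply_stencil(psi):
    """Apply the discrete Delta* to interior psi[M, N], zero Dirichlet boundary."""
    p = np.zeros((M + 2, N + 2))
    p[1:-1, 1:-1] = psi
    return (s_plus[:, None] * p[2:, 1:-1] + s_minus[:, None] * p[:-2, 1:-1]
            + delta[:, None] * p[1:-1, 1:-1]
            + (p[1:-1, 2:] + p[1:-1, :-2]) / hz**2)

def solve_gs(rho):
    """Solve Delta* psi = rho (interior values) via DST + tridiagonal solves."""
    rho_hat = dst(rho, type=1, axis=1)                      # M DSTs of size N
    modes = np.arange(1, N + 1)
    shift = 2.0 * np.cos(np.pi * modes / (N + 1)) / hz**2   # z-part per Fourier mode
    psi_hat = np.empty_like(rho_hat)
    for l in range(N):                                      # N tridiagonal systems of size M
        ab = np.zeros((3, M))
        ab[0, 1:]  = s_plus[:-1]                            # upper diagonal
        ab[1, :]   = delta + shift[l]                       # main diagonal
        ab[2, :-1] = s_minus[1:]                            # lower diagonal
        psi_hat[:, l] = solve_banded((1, 1), ab, rho_hat[:, l])
    return idst(psi_hat, type=1, axis=1)                    # M inverse DSTs of size N

# self-check: push a smooth field through the stencil, then recover it
psi_ref = np.outer(np.sin(np.pi * (ri - 1.0)), np.sin(np.pi * z[1:-1]))
rho = apply_stencil(psi_ref)
print(np.max(np.abs(solve_gs(rho) - psi_ref)))              # machine-precision residual
```

Because the DST diagonalizes the z-part of the stencil exactly, the solver reproduces any field pushed through the same discrete operator to round-off level; the M transforms and the N tridiagonal solves are mutually independent, which is what makes the scheme parallelizable over both M and N.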

12. More details of the complete GPEC algorithm
Complete algorithm (von Hagenow & Lackner, 1975)
⊲ additional "inner" iteration for handling the Dirichlet boundary condition:
  1. solve $\Delta^* \psi_0 = -g(r, z)$ in R with $\psi_0 = 0$ on ∂R
  2. compute ψ on ∂R using the Green's function: $\psi_1(\mathbf{r}) = -\oint_{\partial R} \frac{1}{r^*}\,\frac{\partial \psi_0(\mathbf{r}^*)}{\partial n}\, G(\mathbf{r}, \mathbf{r}^*)\, \mathrm{d}s^*$
  3. solve $\Delta^* \psi = -g(r, z)$ using $\psi_1$ as boundary values on ∂R
  ⇒ apply the (iDST → tridiagonal solve → DST) cycle twice
  ⇒ in between, save some work by transforming only the "boundary" values (cf. Giannone et al. 2010)
Main algorithmic steps
dst1: M inverse discrete sine transforms (iDSTs) of size N
trid1: solution of N tridiagonal linear systems of size M
dst_spare1: 2 DSTs of size N plus 2 × (M − 2) scalar products of size N for computing 2(N + M − 2) Fourier coefficients
matvec: 1 matrix-vector multiplication with a 2(N + M) × 2(N + M) matrix
dst_spare2: 2 iDSTs of size N plus a corresponding rank-1 update of an N × M matrix
trid2: solution of N tridiagonal linear systems of size M
dst2: M DSTs of size N
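A structural sketch of this inner cycle in Python (pseudocode made runnable: `solve_dirichlet_zero`, `normal_derivative`, `greens_matrix`, and the way the nonzero boundary values are folded into the right-hand side are all hypothetical stand-ins, not GPEC's actual routines):

```python
import numpy as np

M, N = 32, 64

def solve_dirichlet_zero(rhs):
    """Stand-in for one (iDST -> tridiagonal -> DST) pass with psi = 0 on the boundary."""
    return rhs                                       # dummy

def normal_derivative(psi0):
    """Stand-in: outward normal derivative of psi0 at the 2(N+M) boundary points."""
    return np.zeros(2 * (N + M))                     # dummy

def greens_matrix():
    """Stand-in: precomputed Green's-function weights, 2(N+M) x 2(N+M) (the 'matvec' step)."""
    return np.eye(2 * (N + M))                       # dummy

def fold_boundary_into_rhs(rhs, psi_boundary):
    """Stand-in: move the nonzero Dirichlet values psi_1 onto the right-hand side."""
    return rhs                                       # dummy

g = np.random.rand(M, N)                             # source -g(r, z)

# 1. homogeneous pass: Delta* psi0 = -g with psi0 = 0 on dR
psi0 = solve_dirichlet_zero(g)
# 2. boundary values from the Green's-function integral (here a plain matvec)
psi_bnd = -greens_matrix() @ normal_derivative(psi0)
# 3. second pass with psi_bnd as Dirichlet data on dR
psi = solve_dirichlet_zero(fold_boundary_into_rhs(g, psi_bnd))
```

The two solver passes account for the dst1/trid1 and trid2/dst2 steps above; the boundary-only transforms (dst_spare1, dst_spare2) and the matvec implement step 2 without transforming the full field twice.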
