presentation cyclic reduction type poisson and helmholtz
play

Presentation: Cyclic Reduction Type Poisson and Helmholtz Solvers on - PDF document

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/341347588 Presentation: Cyclic Reduction Type Poisson and Helmholtz Solvers on a GPU Presentation February 2014 CITATIONS READS 0 3


  1. See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/341347588 Presentation: Cyclic Reduction Type Poisson and Helmholtz Solvers on a GPU Presentation · February 2014 CITATIONS READS 0 3 3 authors , including: Mirko Myllykoski Jari Toivanen Umeå University University of Jyväskylä 40 PUBLICATIONS 69 CITATIONS 115 PUBLICATIONS 1,892 CITATIONS SEE PROFILE SEE PROFILE Some of the authors of this publication are also working on these related projects: PhD Thesis View project NLAFET View project All content following this page was uploaded by Mirko Myllykoski on 13 May 2020. The user has requested enhancement of the downloaded file.

  2. Cyclic Reduction Type Poisson and Helmholtz Solvers on a GPU Mirko Myllykoski 1 , 2 Tuomo Rossi 3 Jari Toivanen 3 , 4 February 16, 2014 1Department of Mathematical Information Technology, University of Jyv¨ askyl¨ a, P.O. Box 35 (Agora), FI-40014 University of Jyv¨ askyl¨ a, Finland, E-mail: mirko.myllykoski@jyu.fi 2The research of the first author was supported by the Academy of Finland, grant #252549 3Department of Mathematical Information Technology, University of Jyv¨ askyl¨ a 4Department of Aeronautics and Astronautics, Stanford University

  3. Introduction The topic of the presentation are the block cyclic reduction type linear system solvers and how they can be applied to the Poisson and Helmholtz equations on a GPU. The presented implementations are based on a method called radix-q PSCR. A total of three implementations are presented: a simplified radix-2 method, a simplified radix-4 method and a generalized radix-4 method. The presentation will focus on the GPU implementations and the obtained numerical results.

  4. PSCR method PSCR stands for Partial Solution variant of the Cyclic Reduction method. A cyclic reduction type recursive method for block tridiagonal linear systems. A conventional cyclic reduction algorithm operates by halving the size of the linear system at each reduction step. In other words, the algorithm is a radix-2 method. The PSCR method can be formulated in a generalized radix-q framework. The first (radix-2) formulation was presented in 80’s 5 . Further developmental done later in the decade 6 . A more generalized radix-q algorithm was formulated in 1996 7 . 5Vassilevski P. Fast algorithm for solving a linear algebraic problem with separable variables. Comptes Rendus de Academie Bulgare des Sciences 1984; 37:305–308 6Kuznetsov YA. Numerical methods in subspaces. Vychislitel’-nye Processy i Sistemy II 1985; 37:265–350. 7Kuznetsov YA, Rossi T. Fast direct method for solving algebraic systems with separable symmetric band matrices. East-West Journal of Numerical Mathematics 1996; 4:53–68.

  5. PSCR method: radix-2 vs radix-4 Radix-number: 2 ==> ==> ==> = = = = Radix-number: 4 Eliminated row Updated row ==> = =

  6. PSCR method: Kronecker tensor product Let A ∈ K n 1 × m 1 and B ∈ K n 2 × m 2 . The Kronecker matrix tensor product is defined as follows   a 1 , 1 B a 1 , 2 B a 1 , m 1 B . . . a 2 , 1 B a 2 , 2 B a 2 , m 1 B . . .    ∈ K n 1 n 2 × m 1 m 2 . A ⊗ B =  . . .  ... . . .   . . .  a n 1 , 1 B a n 1 , 2 B a n 1 , m 1 B . . . The PSCR method can be applied to the following linear system [ A 1 ⊗ M 2 + M 1 ⊗ A 2 + c ( M 1 ⊗ M 2 )] u = f , where the factor matrices A 1 , M 1 ∈ K n 1 × n 1 and A 2 , M 2 ∈ K n 2 × n 2 are symmetric and tridiagonal.

  7. PSCR method: Partial solution technique The PSCR method uses so-called partial solution technique 8 , 9 . The technique requires an initialization stage comprising from generalized eigenvalue problems. The actual solution process involves the solution of multiple sub-problems of the form � c j ( A 2 + α j M 2 ) − 1 ˆ u = ˆ f . j In total the PSCR method generates approximately O ( n 1 log n 1 ) tridiagonal sub-problems. 8Banegas A. Fast Poisson solvers for problems with sparsity. Mathematics of Computation 1978; 32:441–446 9Kuznetsov YA, Matsokin AM. On partial solution of systems of linear algebraic equations. Soviet Journal of Numerical Analysis and Mathematical Modelling 1978; 4:453–468

  8. Implementation A total of three implementations were implemented: a simplified radix-2 method a simplified radix-4 method and a generalized radix-4 method. The GPU implementations utilize the OpenCL framework. The implementations consist mostly of scalar-vector multiplication and vector-vector additions/summations, which can be implemented trivially. The tridiagonal sub-problems and the large vector summations require some additional attention.

  9. Implementation: Simplified implementations (1/2) The primary purpose of the simplified implementations was to test different ideas related to the formulation and the implementation. The implementations can be applied when the coefficient matrix is of the form A ⊗ I n 2 + I n 1 ⊗ ( D − 2 I n 2 ) , where A = tridiag {− 1 , 2 , − 1 } ∈ R n 1 × n 1 and D = tridiag { t , d , t } ∈ R n 2 × n 2 . In addition, it is assumed that n 1 = 2 k 1 − 1 and n 2 = 2 k 2 − 1 for some positive integers k 1 and k 2 .

  10. Implementation: Simplified implementations (2/2) Utilizes a 3-stage tridiagonal system solver. All stages utilize scalar cyclic reduction and the right hand side is permuted after each recursion step. See earlier work 10 for more details. A slightly more generalized tridiagonal solver would allow that the matrix D can be chosen more freely. Does not require an initialization stage because the eigenvalues and eigenvectors are known in a closed form. 10Myllykoski M, Rossi T, Toivanen J. Fast Poisson solvers for graphics processing units. In Applied Parallel and Scientific Computing, Lecture Notes in Computer Science, Vol. 7782, Manninen P, ¨ Oster P (eds). Springer Berlin Heidelberg: Berlin, Germany, 2013; 265–279

  11. Implementation: Generalized implementation Uses a generalized 4-stage tridiagonal system solver. Very similar to the one used in the simplified implementations but includes an additional stage which utilizes parallel cyclic reduction 11 . Uses the CPU to solve the generalized eigenvalue problems. This is not a major limitation because these methods are usually used in situations in which we must solve a large set of problems with nearly identical coefficient matrices. The matrix M 1 must be positive-definite. Includes a support for complex valued problems. 11Hockney R. W., Jesshope C. R. Parallel Computers: Architecture, Programming and Algorithm. Hilger, 1981

  12. Numerical results The CPU test were carried out using a quad-core Intel Xeon X5550 2.66 GHz processor (only one core used in practice). The GPU test were carried out using a nVidia GeForce GTX580 videocard with 512 cuda cores. The CPU implementations were written using C and fortran. The GPU implementations utilize the OpenCL framework. All test were done using double precision floating point arithmetic.

  13. Numerical results: Model problem A Model problem A: A two (or three) dimensional Poisson problem with homogeneous Dirichlet boundary conditions posed on a rectangle. Discrete form: Solve u ∈ R NN from       D − I u 1 f 1 ...       − I D u 2 f 2       = ,    .   .  ... ... . .       − I . .       − I D u N f N where D = tridiag {− 1 , 4 , − 1 } ∈ R N × N , when f ∈ R NN is given.

  14. Numerical results: Model problem B Model problem B: A two dimensional Helmholtz problem with Perfectly Matched Layer (PML) or Absorbing Boundary Conditions (ABC) posed on a rectangle. Discrete form: Solve u ∈ C NN from [( A 1 − ω 2 M 1 ) ⊗ M 2 + M 1 ⊗ A 2 ] u = f , where ω is the wave number, when f ∈ C NN is given. Here the matrices A 1 , A 2 , M 1 , M 2 ∈ C N × N depend on the chosen boundary conditions 12 . 12See, e.g., Heikkola E., Rossi T., Toivanen J. Fast direct solution of the Helmholtz equation with a perfectly matched layer or an absorbing boundary conditions. International Journal of Numerical Methods is Engineering 2003; 57:2007–2025

  15. Numerical results: Speedup gained when using a GPU (simplified radix-4) 25 Poisson 2d Poisson 3d 20 cpu run time / gpu run time 15 10 5 0 10 100 1000 10000 N^(1/2) or N^(1/3)

  16. Numerical results: Speedup gained when using a GPU (generalized radix-4) 12 Poisson 11 Helmholtz 10 cpu run time / gpu run time 9 8 7 6 5 4 3 2 1 100 1000 10000 N^(1/2)

  17. Numerical results: Other findings Example: 2D Poisson problem, n 1 = n 2 = 1023: Simplified radix-4 CPU: 0.082556s Simplified radix-4 GPU: 0.005389s Generalized radix-4 CPU: 0.196758s Generalized radix-4 GPU: 0.030405s Increased radix-number offers some additional speed-up, especially when the problem size is small 13 . The GPU implementation did not introduce any noticeable additional numerical error. The most important elements effecting the performance were the tridiagonal solver and the large vector summations. 13Myllykoski M, Rossi T, Toivanen J. Fast Poisson solvers for graphics processing units. In Applied Parallel and Scientific Computing, Lecture Notes in Computer Science, Vol. 7782, Manninen P, ¨ Oster P (eds). Springer Berlin Heidelberg: Berlin, Germany, 2013; 265–279

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend