
Speeding up a Finite Element Computation on GPU Nelson Inoue - PowerPoint PPT Presentation



  1. Speeding up a Finite Element Computation on GPU Nelson Inoue

  2. Summary • Introduction • Finite element implementation on GPU • Results • Conclusions

  3. University and Researchers • Pontifical Catholic University of Rio de Janeiro – PUC-Rio • Group of Technology in Petroleum Engineering – GTEP • Research Team – PhD Sergio Fontoura (Leader Researcher) – PhD Nelson Inoue (Senior Researcher) – PhD Carlos Emmanuel (Researcher) – MSc Guilherme Righetto (Researcher) – MSc Rafael Albuquerque (Researcher)

  4. Introduction • Research & Development (R&D) project with Petrobras • The project began in 2010 • The subject of the project is Reservoir Geomechanics • There is great interest from the oil and gas industry in this subject • The subject has still received little research attention

  5. Introduction • What is Reservoir Geomechanics? – The branch of petroleum engineering that studies the coupling between fluid flow and rock deformation (stress analysis) • Hydromechanical Coupling – Oil production causes rock deformation – Rock deformation contributes to oil production

  6. Motivation • Geomechanical effects during reservoir production 1. Surface subsidence 2. Bedding-parallel slip 3. Fault reactivation 4. Caprock integrity 5. Reservoir compaction

  7. Challenge • Evaluate geomechanical effects in a real reservoir • Overcome two major challenges 1. Use a reliable coupling scheme between fluid flow and stress analysis 2. Speed up the stress analysis (Finite Element Method), because the finite element analysis accounts for most of the simulation time

  8. Hydromechanical coupling • Theoretical approach • [Figure: coupling program flowchart]

  9. Finite Element Method • Partial differential equations arise in the mathematical modelling of many engineering problems • An analytical (exact) solution is often very complicated to obtain • Alternative: numerical solution – Finite element method, finite difference method, finite volume method, boundary element method, discrete element method, etc.

  10. Finite Element Method • The finite element method (FEM) is widely applied in stress analysis • The domain is an assembly of finite elements (FEs) • [Figure: finite element domain; source: http://www.mscsoftware.com/product/dytran]

  11. CHRONOS: FE Program • Chronos has been implemented on GPU and runs on the CETUS computer, which has 4 GPUs – Motivation: to reduce the simulation time in the hydromechanical analysis – Why use a GPU? Much more processing power: a CPU has 4 to 8 cores, whereas a GeForce GTX Titan GPU has 2880 cores

  12. Motivation • GPU features (CUDA C Programming Guide): – Highly parallel, multithreaded, manycore processor – Tremendous computational horsepower and very high memory bandwidth • [Figures: floating-point operations per second; memory bandwidth]

  13. Our Implementation • GPUs have good performance • We have developed and implemented an optimized, parallel finite element program on GPU • The finite element code is written in CUDA • We have implemented on GPU: – Assembly of the stiffness matrix – Solution of the system of linear equations – Evaluation of the strain state – Evaluation of the stress state

  14. Global Memory Access on GPU • Getting maximum performance on GPU requires coalesced access – Sequential/aligned access: good – Strided access: not so good – Random access: bad – Memory accesses are fully coalesced as long as all threads in a warp access the same relative address
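The three access patterns on this slide can be compared with a small CPU-side model. The sketch below is not CUDA and not part of Chronos; it only counts how many memory segments one warp would touch, assuming 32 threads per warp, 4-byte words and 128-byte segments (the segment size and all names are illustrative assumptions).

```python
# Illustrative model of coalescing: count the memory segments touched by one warp.
# Assumptions (not from the slides): 4-byte words, 128-byte segments.
WARP_SIZE = 32
WORD_BYTES = 4
SEGMENT_BYTES = 128


def segments_touched(indices):
    """Return the number of distinct segments hit by the given word indices."""
    return len({(i * WORD_BYTES) // SEGMENT_BYTES for i in indices})


# Thread t reads word t: sequential and aligned.
sequential = [t for t in range(WARP_SIZE)]
# Thread t reads word 4 * t: strided.
strided = [4 * t for t in range(WARP_SIZE)]
# Thread t reads a scattered word: stands in for "random" access.
scattered = [(37 * t) % 1024 for t in range(WARP_SIZE)]

if __name__ == "__main__":
    for name, pattern in [("sequential", sequential),
                          ("strided", strided),
                          ("scattered", scattered)]:
        print(name, segments_touched(pattern))
```

In this model the sequential pattern fits in a single segment, the stride-4 pattern spreads over four, and the scattered pattern touches many more, which matches the good / not so good / bad ranking above.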

  15. Development on CPU • The assembly of the global stiffness matrix in the conventional FEM – Simple 1D problem: the continuous model is discretized by elements (three finite elements, 1 to 3, connecting nodes 1 to 4) – Element stiffness matrices:
      a) Element 1: $\mathbf{k}^{(1)} = \begin{bmatrix} k_{11}^{(1)} & k_{12}^{(1)} \\ k_{21}^{(1)} & k_{22}^{(1)} \end{bmatrix}$
      b) Element 2: $\mathbf{k}^{(2)} = \begin{bmatrix} k_{11}^{(2)} & k_{12}^{(2)} \\ k_{21}^{(2)} & k_{22}^{(2)} \end{bmatrix}$
      c) Element 3: $\mathbf{k}^{(3)} = \begin{bmatrix} k_{11}^{(3)} & k_{12}^{(3)} \\ k_{21}^{(3)} & k_{22}^{(3)} \end{bmatrix}$
      [Figure: real model and model discretization]

  16. Development on CPU • In terms of CPU implementation: for i = 1, ..., numel = 3, evaluate the element stiffness matrix $\mathbf{k}^{(i)}$ and assemble it into the global stiffness matrix $\mathbf{k}_{global}$
      i = 1: $$\mathbf{k}_{global} = \begin{bmatrix} k_{11}^{(1)} & k_{12}^{(1)} & 0 & 0 \\ k_{21}^{(1)} & k_{22}^{(1)} & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
      i = 2: $$\mathbf{k}_{global} = \begin{bmatrix} k_{11}^{(1)} & k_{12}^{(1)} & 0 & 0 \\ k_{21}^{(1)} & k_{22}^{(1)} + k_{11}^{(2)} & k_{12}^{(2)} & 0 \\ 0 & k_{21}^{(2)} & k_{22}^{(2)} & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
      i = 3: $$\mathbf{k}_{global} = \begin{bmatrix} k_{11}^{(1)} & k_{12}^{(1)} & 0 & 0 \\ k_{21}^{(1)} & k_{22}^{(1)} + k_{11}^{(2)} & k_{12}^{(2)} & 0 \\ 0 & k_{21}^{(2)} & k_{22}^{(2)} + k_{11}^{(3)} & k_{12}^{(3)} \\ 0 & 0 & k_{21}^{(3)} & k_{22}^{(3)} \end{bmatrix}$$
      – Storage in memory (the matrix is stored row by row); memory access is not coalesced:
      i = 1: $[\,k_{11}^{(1)},\ k_{12}^{(1)},\ 0,\ 0,\ k_{21}^{(1)},\ k_{22}^{(1)},\ 0,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0\,]$
      i = 2: $[\,k_{11}^{(1)},\ k_{12}^{(1)},\ 0,\ 0,\ k_{21}^{(1)},\ k_{22}^{(1)} + k_{11}^{(2)},\ k_{12}^{(2)},\ 0,\ 0,\ k_{21}^{(2)},\ k_{22}^{(2)},\ 0,\ 0,\ 0,\ 0,\ 0\,]$
      i = 3: $[\,k_{11}^{(1)},\ k_{12}^{(1)},\ 0,\ 0,\ k_{21}^{(1)},\ k_{22}^{(1)} + k_{11}^{(2)},\ k_{12}^{(2)},\ 0,\ 0,\ k_{21}^{(2)},\ k_{22}^{(2)} + k_{11}^{(3)},\ k_{12}^{(3)},\ 0,\ 0,\ k_{21}^{(3)},\ k_{22}^{(3)}\,]$
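The element-by-element loop on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the Chronos code: it assumes each element is a 1D spring with stiffness `s`, so that its element matrix is `s * [[1, -1], [-1, 1]]`, and the function names are hypothetical.

```python
# Sketch of conventional (element-by-element) assembly for the 1D example.
# Assumption (not from the slides): element e is a spring of stiffness s_e.


def element_stiffness(s):
    """2x2 stiffness matrix of a 1D spring element (hypothetical element type)."""
    return [[s, -s], [-s, s]]


def assemble_by_element(stiffnesses):
    """Loop over the elements and scatter-add each one into the dense global matrix."""
    n_nodes = len(stiffnesses) + 1
    k_global = [[0.0] * n_nodes for _ in range(n_nodes)]
    for e, s in enumerate(stiffnesses):
        ke = element_stiffness(s)
        # Element e connects nodes e and e + 1 (0-based numbering).
        for a in range(2):
            for b in range(2):
                k_global[e + a][e + b] += ke[a][b]
    return k_global


if __name__ == "__main__":
    for row in assemble_by_element([1.0, 2.0, 3.0]):
        print(row)
```

Each pass of the loop touches four entries that lie in two different rows of the row-major matrix, and neighbouring elements add into the same diagonal entry, which is why this ordering does not map well onto one-thread-per-element GPU execution.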

  17. Development on GPU • The assembly of the global stiffness matrix on GPU – Simple 1D problem: the continuous model is discretized by nodes (four finite element nodes) – Each node corresponds to one row of the global stiffness matrix:
      Node 1: $[\mathbf{k}]_{row\,1} = [\,0,\ k_{11}^{(1)},\ k_{12}^{(1)}\,]$
      Node 2: $[\mathbf{k}]_{row\,2} = [\,k_{21}^{(1)},\ k_{22}^{(1)} + k_{11}^{(2)},\ k_{12}^{(2)}\,]$
      Node 3: $[\mathbf{k}]_{row\,3} = [\,k_{21}^{(2)},\ k_{22}^{(2)} + k_{11}^{(3)},\ k_{12}^{(3)}\,]$
      Node 4: $[\mathbf{k}]_{row\,4} = [\,k_{21}^{(3)},\ k_{22}^{(3)},\ 0\,]$
      [Figure: real model with nodes 1 to 4 and elements 1 to 3]

  18. Development on GPU • In terms of GPU implementation: one thread per row, and all the threads do the same calculation – Column = 1:
      Thread 1: $[\mathbf{k}]_{row\,1} = [\,0,\ k_{11}^{(1)},\ k_{12}^{(1)}\,]$ writes $0$
      Thread 2: $[\mathbf{k}]_{row\,2} = [\,k_{21}^{(1)},\ k_{22}^{(1)} + k_{11}^{(2)},\ k_{12}^{(2)}\,]$ writes $k_{21}^{(1)}$
      Thread 3: $[\mathbf{k}]_{row\,3} = [\,k_{21}^{(2)},\ k_{22}^{(2)} + k_{11}^{(3)},\ k_{12}^{(3)}\,]$ writes $k_{21}^{(2)}$
      – Storage in memory after column 1; the memory access is sequential and aligned:
      $\mathbf{k}_{global} = [\,0,\ k_{21}^{(1)},\ k_{21}^{(2)},\ k_{21}^{(3)}\,]$

  19. Development on GPU • In terms of GPU implementation – Column = 2:
      Thread 1: $[\mathbf{k}]_{row\,1} = [\,0,\ k_{11}^{(1)},\ k_{12}^{(1)}\,]$ writes $k_{11}^{(1)}$
      Thread 2: $[\mathbf{k}]_{row\,2} = [\,k_{21}^{(1)},\ k_{22}^{(1)} + k_{11}^{(2)},\ k_{12}^{(2)}\,]$ writes $k_{22}^{(1)} + k_{11}^{(2)}$
      Thread 3: $[\mathbf{k}]_{row\,3} = [\,k_{21}^{(2)},\ k_{22}^{(2)} + k_{11}^{(3)},\ k_{12}^{(3)}\,]$ writes $k_{22}^{(2)} + k_{11}^{(3)}$
      $$\mathbf{k}_{global} = \begin{bmatrix} 0 & k_{11}^{(1)} \\ k_{21}^{(1)} & k_{22}^{(1)} + k_{11}^{(2)} \\ k_{21}^{(2)} & k_{22}^{(2)} + k_{11}^{(3)} \\ k_{21}^{(3)} & k_{22}^{(3)} \end{bmatrix}$$
      – Storage in memory after column 2 (stored column by column); memory access is coalesced:
      $\mathbf{k}_{global} = [\,0,\ k_{21}^{(1)},\ k_{21}^{(2)},\ k_{21}^{(3)},\ k_{11}^{(1)},\ k_{22}^{(1)} + k_{11}^{(2)},\ k_{22}^{(2)} + k_{11}^{(3)},\ k_{22}^{(3)}\,]$
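The node-by-node scheme of the last three slides can be emulated on the CPU to make the storage order concrete. The sketch below is an assumption-laden illustration, not the Chronos kernel: the inner loop over `t` stands in for GPU threads, each element is taken to be a 1D spring with matrix `s * [[1, -1], [-1, 1]]`, and the three-column band layout and function name are hypothetical.

```python
# Sketch of node-by-node assembly into a 3-column band stored column by column.
# Assumption (not from the slides): element e is a spring of stiffness s_e,
# so k11 = k22 = s_e and k12 = k21 = -s_e.


def assemble_by_node(stiffnesses):
    """Return the band as a flat list; entry (row, col) lives at col * n_nodes + row."""
    n_elems = len(stiffnesses)
    n_nodes = n_elems + 1
    band = [0.0] * (3 * n_nodes)
    for col in range(3):          # one column at a time, as on the slides
        for t in range(n_nodes):  # on a GPU, each t would be a separate thread
            left = stiffnesses[t - 1] if t > 0 else 0.0     # element left of node t
            right = stiffnesses[t] if t < n_elems else 0.0  # element right of node t
            row_values = (0.0 - left,    # k21 of the left element
                          left + right,  # k22 of the left + k11 of the right element
                          0.0 - right)   # k12 of the right element
            # Consecutive threads write consecutive addresses.
            band[col * n_nodes + t] = row_values[col]
    return band


if __name__ == "__main__":
    print(assemble_by_node([1.0, 2.0, 3.0]))
```

Because thread `t` writes to index `col * n_nodes + t`, neighbouring threads hit neighbouring addresses, and every row is written by exactly one thread, so no two threads add into the same entry.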

  20. Development on GPU • Solution of the system of linear equations Ax = b – A = stiffness matrix, x = nodal displacement vector (unknown values) and b = nodal force vector – A is symmetric and positive-definite – Options: direct solver or iterative solver • The Conjugate Gradient method was chosen – Iterative algorithm – Parallelizable algorithm on GPU – The operations of the conjugate gradient algorithm are well suited to implementation on GPU • [Figure: conjugate gradient algorithm]
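For reference, the textbook conjugate gradient iteration for a symmetric positive-definite system can be written as follows. This is a plain Python sketch with dense lists, not the GPU solver used in Chronos; the tolerance, iteration limit and helper names are illustrative assumptions.

```python
# Textbook conjugate gradient for A x = b, with A symmetric positive-definite.
# Dense, serial sketch; on a GPU the dot products, vector updates and the
# matrix-vector product are the operations that would run in parallel.


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(conjugate_gradient(A, b))  # exact solution is [1/11, 7/11]
```

Each iteration consists of one matrix-vector product, two dot products and three vector updates, all of which are data-parallel, which is consistent with the slide's point that the method suits the GPU.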
