experiences with opencl in pyfr 2014 present
play

Experiences with OpenCL in PyFR: 2014Present F.D. Witherden 1 and - PowerPoint PPT Presentation

Experiences with OpenCL in PyFR: 2014Present F.D. Witherden 1 and P.E. Vincent 2 1 Department of Ocean Engineering, Texas A&M University 2 Department of Aeronautics, Imperial College London Future Motivation High-Order PyFR OpenCL


  1. Experiences with OpenCL in PyFR: 2014—Present F.D. Witherden 1 and P.E. Vincent 2 1 Department of Ocean Engineering, Texas A&M University 2 Department of Aeronautics, Imperial College London

  2. Future Motivation High-Order PyFR OpenCL Backend Methods

  3. Motivation • Computational fluid dynamics (CFD) is the bedrock of several high-tech industries.

  4. Motivation • Interested in simulating unsteady, turbulent, flows.

  5. Motivation • Objective is to solve the Navier–Stokes equations in the vicinity of complex geometrical configurations . RANS URANS DES LES DNS Fidelity

  6. Future Motivation High-Order PyFR OpenCL Backend Methods

  7. High-Order Methods • Our choice of method is the high-order accurate Flux Reconstruction (FR) approach of Huynh. • Combines aspects of traditional finite volume (FVM) and finite element methods (FEM).

  8. High-Order Methods • Consider a smooth function

  9. High-Order Methods • In FVM we divide the domain into cells …

  10. High-Order Methods • …and in each cell store the average of the function.

  11. High-Order Methods • Cells are coupled via Riemann solves at the interfaces.

  12. High-Order Methods • In FR we divide the domain into elements …

  13. High-Order Methods • …and in each element store a discontinuous interpolating polynomial of degree p .

  14. High-Order Methods • As before elements are coupled via Riemann solves.

  15. High-Order Methods • Greater resolving power per degree of freedom (DOF)… • …and thus fewer overall DOFs for same accuracy. • Tight coupling between DOFs inside of an element… • …reduces indirection and saves memory bandwidth .

  16. High-Order Methods • Direct extension into 2D and 3D.

  17. High-Order Methods • Most operations can be cast as matrix-matrix products. • Element type determines if the operation is sparse or dense.

  18. Future Motivation High-Order PyFR OpenCL Backend Methods

  19. PyFR Python + Flux Reconstruction

  20. PyFR • High level structure. Python Outer Layer ( Hardware Independent ) • Setup • Distributed memory parallelism • Outer loop calls hardware specific kernels

  21. PyFR • Need to generate hardware specific kernels . Python Outer Layer ( Hardware Independent ) • Setup • Distributed memory parallelism • Outer loop calls hardware specific kernels

  22. PyFR • In FR two types of kernel are required. Python Outer Layer Matrix Multiply Point-Wise ( Hardware Independent ) Kernels Nonlinear Kernels • Setup • Data • Flux functions, • Distributed memory parallelism interpolation/ Riemann solvers • Outer loop calls hardware specific kernels extrapolation etc. etc.

  23. PyFR • Matrix multiplications are quite simple. Python Outer Layer Matrix Multiply Point-Wise ( Hardware Independent ) Kernels Nonlinear Kernels • Setup • Data • Flux functions, • Distributed memory parallelism interpolation/ Riemann solvers • Outer loop calls hardware specific kernels extrapolation etc. etc. Call GEMM

  24. PyFR • For the point-wise nonlinear kernels we use a DSL. Python Outer Layer Matrix Multiply Point-Wise ( Hardware Independent ) Kernels Nonlinear Kernels • Setup • Data • Flux functions, • Distributed memory parallelism interpolation/ Riemann solvers • Outer loop calls hardware specific kernels extrapolation etc. etc. Pass templates through Mako Call GEMM derived templating engine

  25. PyFR • Kernels are generated and compiled at start-up. Python Outer Layer Matrix Multiply Point-Wise ( Hardware Independent ) Kernels Nonlinear Kernels • Setup • Data • Flux functions, • Distributed memory parallelism interpolation/ Riemann solvers • Outer loop calls hardware specific kernels extrapolation etc. etc. C/OpenMP OpenCL CUDA Hardware Hardware Hardware Pass templates specific specific specific through Mako Call GEMM kernels kernels kernels derived templating engine

  26. PyFR • Which may then be called by the outer layer. Python Outer Layer Matrix Multiply Point-Wise ( Hardware Independent ) Kernels Nonlinear Kernels • Setup • Data • Flux functions, • Distributed memory parallelism interpolation/ Riemann solvers • Outer loop calls hardware specific kernels extrapolation etc. etc. C/OpenMP OpenCL CUDA Hardware Hardware Hardware Pass templates specific specific specific through Mako Call GEMM kernels kernels kernels derived templating engine

  27. PyFR • An example. PyFR-Mako <%namespace module='pyfr.backends.base.makoutil' name='pyfr'/> <%pyfr:macro name='inviscid_flux' params='s, f, p, v'> fpdtype_t invrho = 1.0/s[0]; fpdtype_t E = s[${nvars - 1}]; // Compute the velocities fpdtype_t rhov[${ndims}]; % for i in range(ndims): rhov[${i}] = s[${i + 1}]; CUDA v[${i}] = invrho*rhov[${i}]; % endfor C/OpenMP // Compute the pressure p = ${c['gamma'] - 1}*(E - 0.5*invrho*${pyfr.dot('rhov[{i}]', i=ndims)}); OpenCL // Density and energy fluxes % for i in range(ndims): f[${i}][0] = rhov[${i}]; f[${i}][${nvars - 1}] = (E + p)*v[${i}]; % endfor // Momentum fluxes % for i, j in pyfr.ndrange(ndims, ndims): f[${i}][${j + 1}] = rhov[${i}]*v[${j}]${' + p' if i == j else ''}; % endfor </%pyfr:macro>

  28. PyFR • An example. CUDA PyFR-Mako //AoSoA macros <%namespace module='pyfr.backends.base.makoutil' name='pyfr'/> #define SOA_SZ 32 #define SOA_IX(a, v, nv) ((((a) / SOA_SZ)*(nv) + (v))*SOA_SZ + (a) % <%pyfr:macro name='inviscid_flux' params='s, f, p, v'> SOA_SZ) fpdtype_t invrho = 1.0/s[0]; fpdtype_t E = s[${nvars - 1}]; // Typedefs // Compute the velocities typedef double fpdtype_t; fpdtype_t rhov[${ndims}]; % for i in range(ndims): rhov[${i}] = s[${i + 1}]; __global__ void tflux(int _ny, int _nx, fpdtype_t* __restrict__ f_v, int ldf, v[${i}] = invrho*rhov[${i}]; const fpdtype_t* __restrict__ smats_v, int ldsmats, const fpdtype_t* % endfor __restrict__ u_v, int ldu) // Compute the pressure { p = ${c['gamma'] - 1}*(E - 0.5*invrho*${pyfr.dot('rhov[{i}]', i=ndims)}); int _x = blockIdx.x*blockDim.x + threadIdx.x;int _y = // Density and energy fluxes blockIdx.y*blockDim.y + threadIdx.y; % for i in range(ndims): #define X_IDX (_x) f[${i}][0] = rhov[${i}]; #define X_IDX_AOSOA(v, nv) SOA_IX(X_IDX, v, nv) f[${i}][${nvars - 1}] = (E + p)*v[${i}]; if (_x < _nx && _y < _ny) % endfor { // Momentum fluxes % for i, j in pyfr.ndrange(ndims, ndims): // Compute the flux f[${i}][${j + 1}] = rhov[${i}]*v[${j}]${' + p' if i == j else ''}; fpdtype_t ftemp[2][4]; % endfor fpdtype_t p, v[2]; </%pyfr:macro> { fpdtype_t invrho_ = 1.0/u_v[ldu*_y + X_IDX_AOSOA(0, 4)]; fpdtype_t E_ = u_v[ldu*_y + X_IDX_AOSOA(3, 4)];

  29. PyFR //AoSoA macros #define SOA_SZ 32 #define SOA_IX(a, v, nv) ((((a) / SOA_SZ)*(nv) + (v))*SOA_SZ + (a) % SOA_SZ) // Typedefs typedef double fpdtype_t; • Abstracts data layout . __global__ void tflux(int _ny, int _nx, fpdtype_t* __restrict__ f_v, int ldf, const fpdtype_t* __restrict__ smats_v, int ldsmats, const fpdtype_t* CUDA PyFR-Mako __restrict__ u_v, int ldu) { int _x = blockIdx.x*blockDim.x + threadIdx.x;int _y = blockIdx.y*blockDim.y + threadIdx.y; <%namespace module='pyfr.backends.base.makoutil' name='pyfr'/> #define X_IDX (_x) #define X_IDX_AOSOA(v, nv) SOA_IX(X_IDX, v, nv) <%pyfr:macro name='inviscid_flux' params='s, f, p, v'> if (_x < _nx && _y < _ny) fpdtype_t invrho = 1.0/s[0]; { fpdtype_t E = s[${nvars - 1}]; // Compute the velocities // Compute the flux fpdtype_t rhov[${ndims}]; fpdtype_t ftemp[2][4]; % for i in range(ndims): fpdtype_t p, v[2]; rhov[${i}] = s[${i + 1}]; { v[${i}] = invrho*rhov[${i}]; % endfor fpdtype_t invrho_ = 1.0/u_v[ldu*_y + X_IDX_AOSOA(0, 4)]; // Compute the pressure fpdtype_t E_ = u_v[ldu*_y + X_IDX_AOSOA(3, 4)]; p = ${c['gamma'] - 1}*(E - 0.5*invrho*${pyfr.dot('rhov[{i}]', i=ndims)}); // Density and energy fluxes // Compute the velocities % for i in range(ndims): fpdtype_t rhov_[2]; f[${i}][0] = rhov[${i}]; rhov_[0] = u_v[ldu*_y + X_IDX_AOSOA(1, 4)]; f[${i}][${nvars - 1}] = (E + p)*v[${i}]; v[0] = invrho_*rhov_[0]; % endfor rhov_[1] = u_v[ldu*_y + X_IDX_AOSOA(2, 4)]; // Momentum fluxes v[1] = invrho_*rhov_[1]; % for i, j in pyfr.ndrange(ndims, ndims): f[${i}][${j + 1}] = rhov[${i}]*v[${j}]${' + p' if i == j else ''}; // Compute the pressure % endfor p = 0.4*(E_ - 0.5*invrho_*((rhov_[0])*(rhov_[0]) + (rhov_[1])*(rhov_[1]))); </%pyfr:macro> // Density and energy fluxes ftemp[0][0] = rhov_[0]; ftemp[0][3] = (E_ + p)*v[0];

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend