
SymEngine: Symbolic Execution of OpenCL Kernels (Alberto Magni)



  1. SymEngine: Symbolic Execution of OpenCL Kernels. Alberto Magni

  2. Optimize code for GPUs: Optimize Memory Accesses

  3. GPU Memory Transactions: Coalesced Access. Each of the 32 threads on a GPU core issues 1 load request of 4 bytes, 128 bytes in total. The request goes through the L1 cache to GPU memory and is served by 1 cache line.

  4. GPU Memory Transactions: Uncoalesced Access. Each of the 32 threads on a GPU core issues 1 load request of 4 bytes, but the accesses span 512 bytes, so the L1 cache fetches 4 cache lines from GPU memory.

  5. GPU Memory Transactions: Uncoalesced Access (continued). Fetching 4 cache lines (512 bytes) to serve 32 loads of 4 bytes each means wasted bandwidth.
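
The coalesced and uncoalesced cases above can be reproduced with a little arithmetic. The following C sketch is illustrative only and is not taken from SymEngine: it assumes 128-byte cache lines and 32-thread warps as on the slides, 4-byte aligned loads, and a hypothetical constant byte stride between consecutive threads (the slides do not say which access pattern produces the uncoalesced case). The function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define WARP_SIZE 32          /* threads per warp, as on the slides */
#define CACHE_LINE_BYTES 128  /* L1 cache line size, as on the slides */

/* Count the distinct cache lines touched when thread `tid` of a warp loads
 * one 4-byte element at base + tid * stride_bytes (assumed access pattern).
 * Addresses never decrease with tid, so a new line is touched exactly when
 * the line index differs from the one used by the previous thread. */
size_t warp_cache_lines(uint64_t base, uint64_t stride_bytes)
{
    size_t lines = 0;
    uint64_t prev_line = 0;
    for (int tid = 0; tid < WARP_SIZE; ++tid) {
        uint64_t line = (base + (uint64_t)tid * stride_bytes) / CACHE_LINE_BYTES;
        if (tid == 0 || line != prev_line)
            ++lines;
        prev_line = line;
    }
    return lines;
}

/* Bytes moved from memory for one warp-wide load: whole cache lines only. */
size_t warp_bytes_transferred(uint64_t base, uint64_t stride_bytes)
{
    return warp_cache_lines(base, stride_bytes) * CACHE_LINE_BYTES;
}
```

Under these assumptions a stride of 4 bytes from an aligned base touches 1 cache line, while a stride of 16 bytes touches 4 cache lines, so 512 bytes are transferred to deliver the 128 bytes the warp asked for.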

  6. SymEngine: Statically Detect Suboptimal Accesses to Memory

  7. SymEngine: Statically Detect Suboptimal Accesses to Memory. Example OpenCL kernel, shown with the slide callouts "Resolve Address" and "Compute Number of Transactions":

         int threadId = get_global_id(0);
         sX = x[threadId];
         sY = y[threadId];
         sZ = z[threadId];
         sQr = Qr[threadId];
         sQi = Qi[threadId];
         for (int kIndex = 0; kIndex < KERNEL_ELEMS_PER_GRID; kIndex++, kGlobalIndex++) {
           float expArg = PIx2 * (ck[kIndex].Kx * sX +
                                  ck[kIndex].Ky * sY +
                                  ck[kIndex].Kz * sZ);
           sQr += ck[kIndex].PhiMag * cos(expArg);
           sQi += ck[kIndex].PhiMag * sin(expArg);
         }
         Qr[threadId] = sQr;
         Qi[threadId] = sQi;

  8. Symbolic Execution. Diagram labels: OpenCL Code, Warp-Id, Hardware, SymEngine, Memory, Number of Transactions, Threads, Input Values.

  9. Symbolic Execution. For each thread in a warp (0, 1, 2, 3, 4, …, 29, 30, 31): Memory Instruction, then SCEV, then Address.

  10. Symbolic Execution. For each thread in a warp (0, 1, 2, 3, 4, …, 29, 30, 31): Memory Instruction, then SCEV, then Address. The addresses of all threads together give the Number of Cache Lines Touched, which gives the Transaction Number.
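
The per-thread pipeline above (memory instruction, SCEV, address, cache lines touched) can be imitated by hand for simple access patterns. The C sketch below is an illustration under stated assumptions, not SymEngine's implementation: `AffineAddress`, `lines_touched` and `total_transactions` are hypothetical names, the address is modelled as an affine function of the thread id and the loop index as a simplified stand-in for a SCEV expression, and one transaction is counted per distinct 128-byte cache line touched by the warp.

```c
#include <stddef.h>
#include <stdint.h>

#define WARP_SIZE 32          /* threads per warp, as on the slides */
#define CACHE_LINE_BYTES 128  /* cache line size, as on the slides */

/* Hypothetical resolved address expression, in bytes:
 * address = base + tid_step * threadId + iter_step * kIndex */
typedef struct {
    uint64_t base;
    uint64_t tid_step;
    uint64_t iter_step;
} AffineAddress;

static uint64_t address_of(const AffineAddress *a, unsigned tid, unsigned k)
{
    return a->base + a->tid_step * tid + a->iter_step * k;
}

/* Number of distinct cache lines touched by one warp at loop iteration k. */
size_t lines_touched(const AffineAddress *a, unsigned k)
{
    uint64_t seen[WARP_SIZE];
    size_t count = 0;
    for (unsigned tid = 0; tid < WARP_SIZE; ++tid) {
        uint64_t line = address_of(a, tid, k) / CACHE_LINE_BYTES;
        size_t i = 0;
        while (i < count && seen[i] != line)  /* linear search: at most 32 entries */
            ++i;
        if (i == count)
            seen[count++] = line;             /* first time this line is touched */
    }
    return count;
}

/* Assumed cost model: one transaction per touched line, summed over the loop. */
size_t total_transactions(const AffineAddress *a, unsigned iterations)
{
    size_t total = 0;
    for (unsigned k = 0; k < iterations; ++k)
        total += lines_touched(a, k);
    return total;
}
```

For example, assuming 4-byte elements, an access such as `x[threadId]` corresponds to `{base, 4, 0}` and costs one line per warp when `base` is 128-byte aligned. An access such as `ck[kIndex].Kx` does not depend on the thread id, so every thread reads the same address and each iteration touches a single line; modelling it as `{base, 0, 16}` assumes a 16-byte struct, which the slides do not state.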

  11. Validation – Nvidia GTX480. Against hardware performance counters. [Chart: total HW transactions for Black-Scholes, HW counter, by program version]

  12. Validation – Nvidia GTX480. Against hardware performance counters. [Chart: total HW transactions for Black-Scholes, HW counter and prediction, by program version]

  13. Validation – Nvidia GTX480

  14. Validation – Nvidia GTX480: 0.99 correlation with HW counters

  15. It's on GitHub! http://github.com/HariSeldon/SymEngine
