Atlas Tracking Optimization on GPU


  1. Master Thesis: Atlas Tracking Optimization on GPU
     Luis Domingues
     Professor: Frédéric Bapst
     Supervisors: Paolo Calafiura, Wim Lavrijsen
     Expert: Mathieu Monney
     02/25/2015

  2. Target
     Luis Domingues - January 2015

  3. Code we started from
     ● Demonstrator of the ATLAS trigger on GPUs
     ● Basic host side
       – Takes the data
       – Sends the data to the GPU and computes on it there
       – Sleeps while waiting for the response

  4. Code we started from

  5. Overlapping pixels and SCT
     ● The pixel and SCT processing are done in sequence
     ● Same event, but sequential processing...
     [Figure: timeline of the Pixel kernels and the SCT kernels, each shown between two time stamps]

  6. Overlapping pixels and SCT

  7. CUDA Streams
     ● A stream is a queue of operations that execute in the order they were issued
     ● Operations in different non-default streams can execute in parallel
     [Figure: Stream1, Stream2 and Stream3 along a time axis, each running H2D, Kernel, D2H]
     H2D = host-to-device transfer; D2H = device-to-host transfer

  8. Overlapping pixels and SCT
     ● Use CUDA streams
     ● Start the processing of SCT before the pixel processing ends
     [Figure: timeline of the Pixel stream and the SCT stream, each showing its kernels between two time stamps]

  9. Overlapping pixels and SCT

  10. Overlapping pixels and SCT
     ● For 2000 events, without overlapping
       – Avg Pixel: 2.03 ms
       – Avg SCT: 1.95 ms
       – Total avg: 3.98 ms
     ● For 2000 events, with overlapping
       – Avg Pixel: 2.3 ms
       – Avg SCT: 2.5 ms
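A short worked reading of the averages above may help. The sequential total is simply the sum of the two averages. For the overlapped case the slide gives no combined figure, so the bound below is only an assumption: it holds if the pixel and SCT work of an event run fully concurrently, which the slides do not confirm.

```latex
% Sequential: the two averages add up (values from the slide)
t_{\text{seq}} = t_{\text{pixel}} + t_{\text{SCT}} = 2.03 + 1.95 = 3.98~\text{ms}

% Overlapped: each part is individually slower (2.3 ms and 2.5 ms),
% but under the assumption of full concurrency the per-event time is
% bounded below by the slower of the two, not by their sum
t_{\text{overlap}} \ge \max(2.3,\ 2.5) = 2.5~\text{ms}
\qquad\text{versus}\qquad 2.3 + 2.5 = 4.8~\text{ms if they were summed}
```

So the individual kernels get slower when they share the GPU, yet the event as a whole can still finish sooner than the 3.98 ms sequential average.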

  11. Overlapping pixels and SCT
     ● Total execution
       – Without overlapping: 8.65 s
       – With overlapping: 6.53 s

  12. Multi-thread server side
     ● Huge amount of “small” data
       – Each piece is too small to fill the GPU
     ● Parallelize the “event”-level processing with streams

  13. Multi-thread server side
     [Figure: several clients connected to a shared FIFO]
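The FIFO-based server design can be sketched on the host side as a pool of worker threads draining a shared queue. This is a minimal sketch under stated assumptions, not the thesis code: `Event`, `EventQueue`, `process_event` and `run_server` are hypothetical names, and `process_event` is a CPU placeholder for the place where a worker would presumably submit the event to the GPU on its own stream.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical event type: the slides do not show the real data layout.
struct Event { int id; };

// Minimal blocking FIFO shared by all worker threads.
class EventQueue {
public:
    void push(Event e) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(e); }
        cv_.notify_one();
    }
    // Signals that no more events will arrive.
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Blocks until an event is available; returns false once the
    // queue has been closed and fully drained.
    bool pop(Event& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Event> q_;
    bool closed_ = false;
};

// Placeholder for the per-event work (assumption: in a real server this
// is where the event would be sent to the GPU and the result collected).
int process_event(const Event& e) { return e.id * 2; }

// Starts n_workers threads that drain the FIFO; returns one result per event.
std::vector<int> run_server(int n_events, int n_workers) {
    assert(n_workers > 0);
    EventQueue queue;
    std::vector<int> results(n_events, -1);
    std::vector<std::thread> workers;
    for (int w = 0; w < n_workers; ++w) {
        workers.emplace_back([&queue, &results] {
            Event e;
            // Each event id is handled by exactly one worker, so every
            // worker writes to a distinct element of results.
            while (queue.pop(e)) results[e.id] = process_event(e);
        });
    }
    for (int i = 0; i < n_events; ++i) queue.push(Event{i});
    queue.close();
    for (auto& t : workers) t.join();
    return results;
}
```

Calling `run_server(2000, 4)` would process 2000 placeholder events with four workers. The queue decouples the clients that produce events from the threads that consume them, which is what lets several small events be in flight at once.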

  14. Multi-thread server side
     ● Life of a thread

  15. Multi-thread server side

  16. Multi-thread server side
     ● Execution times
       – Without overlapping: 8.65 s
       – With overlapping: 6.53 s
       – Multi-threading server side: 4.7 s
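The three execution times translate into the following speedups. The per-event figures assume that all three totals were measured on the same 2000-event run, which the slides suggest but do not state explicitly for every number.

```latex
% Speedup of each step relative to the previous one and to the baseline
\frac{8.65}{6.53} \approx 1.32 \quad\text{(overlapping vs. no overlapping)}
\qquad
\frac{6.53}{4.7} \approx 1.39 \quad\text{(multi-threading vs. overlapping)}
\qquad
\frac{8.65}{4.7} \approx 1.84 \quad\text{(multi-threading vs. baseline)}

% Wall time per event, assuming 2000 events per run
\frac{8.65~\text{s}}{2000} \approx 4.33~\text{ms},
\qquad
\frac{6.53~\text{s}}{2000} \approx 3.27~\text{ms},
\qquad
\frac{4.7~\text{s}}{2000} = 2.35~\text{ms}
```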

  17. CUDA Occupancy
     ● A good choice of grid/block size for the card can have a significant effect on performance
     ● CUDA offers an API to maximize the occupancy of the kernels

  18. CUDA Occupancy
     [Figure: GPU hierarchy with labels CUDA core, Multiprocessor, GPU]

  19. CUDA Occupancy
     ● Bad block size setup
     [Figure: Kernel 1 and Kernel 2 mapped onto the CUDA cores and multiprocessors of the GPU, with intra-block synchronization]

  20. CUDA Occupancy
     ● Better block size setup
     [Figure: Kernel 1 and Kernel 2 mapped onto the CUDA cores and multiprocessors of the GPU, with intra-block synchronization]

  21. CUDA Occupancy
     ● Maximizing the occupancy kills overall performance
     ● Run results for 2000 events
       – Big block size: 10.88 s
       – Original configuration: 4.7 s
       – Small block size: 4.4 s
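Put as ratios against the original configuration, using only the run times quoted above:

```latex
% Big blocks versus the original configuration
\frac{10.88}{4.7} \approx 2.31 \quad\text{(about 2.3 times slower)}

% Small blocks versus the original configuration
\frac{4.7}{4.4} \approx 1.07,
\qquad
\frac{4.7 - 4.4}{4.7} \approx 6.4\% \quad\text{(less run time)}
```

The penalty for oversized blocks is therefore much larger than the gain from shrinking them.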

  22. CUDA Occupancy
     ● Maximizing the occupancy kills overall performance
     ● Run results for 2000 events
       – Big block size: 3 kernels in parallel (max 5)
       – Small block size: 4 kernels in parallel (max 7)

  23. Conclusion
     ● Important points when using a GPU
       – Porting the algorithm to the GPU
       – Communicating with the GPU
       – Host side design
     ● Keep the GPU busy
     ● High occupancy does not allow the GPU to schedule its tasks efficiently
