Determinism of GPU solutions for AO real-time computing




  1. Determinism of GPU solutions for AO real-time computing

  2. E-ELT
     ● AO RTC Architecture
       – Hard real-time system (~1 kHz)
       – Big computation (5 TFLOPs)
       – Low latency
       – Maximum jitter: ~10%
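As a rough sketch of what these figures imply per frame: the code below assumes the loop runs at exactly 1 kHz and that the ~10% jitter bound is expressed relative to the frame period (the slide does not say what the percentage refers to). The name `frame_budget` is illustrative only.

```python
def frame_budget(rate_hz: float, jitter_fraction: float) -> tuple[float, float]:
    """Return (frame period, allowed jitter), both in microseconds.

    Assumption: the jitter bound is a fraction of the frame period.
    """
    period_us = 1e6 / rate_hz
    return period_us, period_us * jitter_fraction


period_us, jitter_us = frame_budget(rate_hz=1000.0, jitter_fraction=0.10)
print(f"frame period: {period_us:.0f} us, jitter budget: {jitter_us:.0f} us")
```

Under these assumptions each frame has about 1000 µs available, of which roughly 100 µs may vary from frame to frame.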

  3. Jitter
     ● Where is the jitter?
       – Data transfer
       – Computation
     ● Jitter with standard transfer and computation

       | Case | Pipeline | Time (jitter) (ms) |
       |---|---|---|
       | 64x64 pixels, 8x8 subpupils | copy only | 33 (35) |
       | 64x64 pixels, 8x8 subpupils | copy + compute | 96 (63) |
       | 240x240 pixels, 40x40 subpupils | copy only | 204 (37) |
       | 240x240 pixels, 40x40 subpupils | copy + compute | 576 (57) |
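The slide does not state how the jitter figures were computed. A minimal sketch of two common conventions, peak-to-peak spread and peak minus mean, is given below; which one the authors used is an assumption left open here. The sample timings are made up for illustration and are not measurements from the slides.

```python
from statistics import mean


def jitter_stats(times: list[float]) -> dict[str, float]:
    """Summarise per-frame timings under two possible jitter definitions."""
    avg = mean(times)
    peak = max(times)
    return {
        "mean": avg,
        "peak": peak,
        "peak_to_peak": peak - min(times),  # worst case minus best case
        "peak_minus_mean": peak - avg,      # worst case minus average
    }


# Hypothetical per-frame timings (arbitrary units), NOT data from the slides.
samples = [10.0, 11.0, 10.0, 14.0, 10.0]
stats = jitter_stats(samples)
print(stats)
```

In a real measurement the samples would come from timestamps taken around each pipeline iteration, for example with `time.perf_counter_ns()` on the host.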

  4. Data transfer
     ● Normal way
       – Main memory is a buffer
       – 2 copies per communication
       – CPU manages the communication
     ● GPUDirect RDMA (Remote Direct Memory Access)
       – No unnecessary copy
       – CPU only used for launching kernels
     Figure credit: NVIDIA

  5. Transfer result
     ● GPUDirect
       – Reduces jitter during transfer to almost 0
       – Reduces the transfer time by a factor of 2
     ● But jitter still occurs during computation...

  6. Computation
     ● Normal way
       – High jitter
       – Depends on CPU
       – Needs a real-time OS
     Figure: Time (in µs) for 8k empty kernel calls (average: ~6.5 µs, peak: ~31 µs)
     ● Jitter with RDMA transfer and standard computation

       | Case | Pipeline | Time (jitter) (ms) |
       |---|---|---|
       | 64x64 pixels, 8x8 subpupils | copy only | 12 (12) |
       | 64x64 pixels, 8x8 subpupils | copy + compute | 69 (59) |
       | 240x240 pixels, 40x40 subpupils | copy only | 112 (10) |
       | 240x240 pixels, 40x40 subpupils | copy + compute | 475 (50) |
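To compare this table with the standard-transfer table on slide 3, the sketch below divides each standard figure by its RDMA counterpart. The values are copied from the two tables in the slides' own units; the dictionary layout and the name `improvement` are illustrative choices, not part of the presentation.

```python
# (time, jitter) values copied from the two tables, in the slides' own units.
STANDARD = {
    ("64x64", "copy only"): (33, 35),
    ("64x64", "copy + compute"): (96, 63),
    ("240x240", "copy only"): (204, 37),
    ("240x240", "copy + compute"): (576, 57),
}
RDMA = {
    ("64x64", "copy only"): (12, 12),
    ("64x64", "copy + compute"): (69, 59),
    ("240x240", "copy only"): (112, 10),
    ("240x240", "copy + compute"): (475, 50),
}


def improvement(standard: dict, rdma: dict) -> dict:
    """Return {case: (time ratio, jitter ratio)}, standard divided by RDMA."""
    return {
        case: (standard[case][0] / rdma[case][0], standard[case][1] / rdma[case][1])
        for case in standard
    }


ratios = improvement(STANDARD, RDMA)
for (size, pipeline), (t, j) in ratios.items():
    print(f"{size:>8}  {pipeline:<15} time x{t:.2f}  jitter x{j:.2f}")
```

With these numbers the copy-only time improves by about 1.8x to 2.75x, while the copy + compute rows improve by only about 1.2x to 1.4x in time and about 1.1x in jitter, which matches the observation that the remaining jitter comes from computation.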

  7. Perpetual kernel
     ● Pros
       – No scheduler
       – No additional cost
       – New features
         ● Reduce computation
         ● New synchronization features
     ● Cons
       – More complex implementation, test and debugging
       – Hardware dependent
       – Can't use any existing library
     Figures: Timeline for standard kernel call (Cpy / Comp); Timeline for perpetual kernel call (Cpy / Comp); Clock cycle count for 8k iterations
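The difference between launching work per frame and keeping one long-lived worker can be illustrated with a CPU-side analogy. This is only a sketch of the pattern using Python threads, not GPU code and not the authors' implementation: a real perpetual kernel would typically spin on a flag in device memory, whereas the blocking queues here stand in for that handshake. The names `per_call` and `perpetual` are hypothetical.

```python
import queue
import threading


def per_call(frames, compute):
    """Standard approach: start and join a new worker for every frame."""
    results = []
    for frame in frames:
        worker = threading.Thread(target=lambda f=frame: results.append(compute(f)))
        worker.start()   # launch cost is paid on every frame
        worker.join()
    return results


def perpetual(frames, compute):
    """Perpetual approach: one worker is started once and loops until stopped."""
    inbox, outbox = queue.Queue(), queue.Queue()

    def worker_loop():
        while True:
            frame = inbox.get()       # wait for the next frame
            if frame is None:         # None is the stop signal in this sketch
                break
            outbox.put(compute(frame))

    worker = threading.Thread(target=worker_loop)
    worker.start()                    # launch cost is paid only once
    results = []
    for frame in frames:
        inbox.put(frame)
        results.append(outbox.get())
    inbox.put(None)
    worker.join()
    return results


frames = [1, 2, 3]
print(per_call(frames, lambda x: x * x))
print(perpetual(frames, lambda x: x * x))
```

Both functions produce the same results; the point of the second form is that the per-frame launch step, and whatever scheduling variability comes with it, is removed from the loop.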

  8. What's next?
     ● Implementation of the RTC with a perpetual kernel
     ● Integration with the frame grabber
       – Test with pixel generator
       – Integration on the optical bench
       – Full loop profiling
     ● Study of floating-point precision to reduce the number of GPUs
