
Green Flash Persistent Kernel: Real-Time, Low-Latency and High-Performance Computation on Pascal



  1. GTC 2017
     Green Flash Persistent Kernel: Real-Time, Low-Latency and High-Performance Computation on Pascal
     Julien BERNARD
     Project #671662 funded by European Commission under program H2020-EU.1.2.2 coordinated in H2020-FETHPC-2014

  2. Green Flash
     ● Public and private actors
       – Paris Observatory
       – University of Durham
       – Microgate
       – PLDA
     ● Part of Horizon 2020, the EU research and innovation programme
     ● 3-year project
     ● €3.8 million
     ● About 30 people involved
     ● Research axes
       – Real-time HPC with accelerators and smart interconnects
       – Energy-efficient platform based on FPGAs
       – Real-time controller (RTC) prototype for the European Extremely Large Telescope adaptive optics (AO) system

  3. Contributors
     ● Maxime Lainé: software engineer
     ● Denis Perret: FPGA expert
     ● Arnaud Sevin: software lead
     ● Damien Gratadour: project lead
     ● Christophe Rouaud: PLDA project lead
     ● Gaetan Dufourcq: QuickPlay expert

  4. E-ELT: adaptive optics
     ● Compensate wavefront perturbations in real time
     ● Measure them with a wavefront sensor (WFS)
     ● Reshape the wavefront with a deformable mirror (DM)
     ● Mirror commands must be computed in real time (~ms rate)
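The slides do not spell out the reconstruction algorithm, but a common way for an AO RTC to turn WFS measurements into DM commands is a dense matrix-vector multiply (MVM) by a precomputed command matrix. The C++ sketch below assumes that formulation; `mvm_reconstruct`, `R`, `slopes` and `commands` are hypothetical names chosen for illustration. It also shows why loads in this field are often counted in multiply-accumulates (MACs): one frame costs n_commands × n_slopes MACs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical MVM reconstructor: commands = R * slopes, where R is a dense,
// row-major command matrix of size n_commands x n_slopes.
// Cost per frame: n_commands * n_slopes multiply-accumulates (MACs).
std::vector<float> mvm_reconstruct(const std::vector<float>& R,
                                   const std::vector<float>& slopes,
                                   std::size_t n_commands) {
    const std::size_t n_slopes = slopes.size();
    assert(R.size() == n_commands * n_slopes);  // matrix shape must match the inputs
    std::vector<float> commands(n_commands, 0.0f);
    for (std::size_t i = 0; i < n_commands; ++i) {
        const float* row = R.data() + i * n_slopes;  // i-th row of R
        float acc = 0.0f;
        for (std::size_t j = 0; j < n_slopes; ++j) {
            acc += row[j] * slopes[j];  // one MAC per matrix element
        }
        commands[i] = acc;
    }
    return commands;
}
```

For example, with the 2 × 3 matrix `{1, 2, 3, 4, 5, 6}` and slopes `{1, 1, 1}`, the function returns `{6, 15}`.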

  5. RTC concept for ELT AO

  6. RTC concept for ELT AO

  7. Real-time controller: legacy architecture
     (Diagram: sensor, switch, RTC)
     ● i.e. SPARTA architecture
     ● Active elements
       – DSP & CPU
       – VXS backplane

     Instrument   WFS (#)   Measurements   DM (#)   Commands   Freq. (Hz)   Performance (GMAC/s)
     Sphere             1           2.6k        1       1.3k         1.5k                    5.2
     AOF                4           2.4k        1       1.2k           1k                   11.8
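If the per-frame load is assumed to be a dense MVM (one MAC per measurement-command pair), the required throughput is measurements × commands × frame rate. For the Sphere row, 2.6k × 1.3k × 1.5 kHz ≈ 5.07 GMAC/s, close to the 5.2 GMAC/s in the table given that the table's figures are rounded. The helper below, `required_gmacs`, is a hypothetical illustration of that arithmetic under this assumption, not a formula taken from the slides.

```cpp
#include <cassert>

// Hypothetical helper: sustained throughput, in GMAC/s, needed to run a dense
// MVM of n_meas measurements x n_com commands once per frame at freq_hz.
double required_gmacs(double n_meas, double n_com, double freq_hz) {
    assert(n_meas >= 0 && n_com >= 0 && freq_hz >= 0);  // counts and rates are non-negative
    return n_meas * n_com * freq_hz / 1e9;              // MAC/s -> GMAC/s
}
```

For example, `required_gmacs(2600, 1300, 1500)` gives about 5.07.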

  8. Real-time controller architecture
     (Diagram: sensors 0–5, RTC switch, RTC cluster network, RTC nodes 0 ... N-1, active elements 0–2)

     Instrument   WFS (#)   Measurements   DM (#)   Commands   Freq. (Hz)   Performance (GMAC/s)
     Sphere             1           2.6k        1       1.3k         1.5k                    5.2
     AOF                4           2.4k        1       1.2k           1k                   11.8
     ELT                6            80k        3        15k          500                   1.2k

  9. Legacy GPU programming
     (Diagram: data path 10GbE NIC → CPU RAM → PCIe → GPU RAM, and back)
     main {
       setup();
       while(run){
         recv(…);
         cudaMemcpy(…, cudaMemcpyHostToDevice);
         computing_kernel<<<>>>(…);
         cudaMemcpy(…, cudaMemcpyDeviceToHost);
         send(…);
       }
     }

  10. Legacy GPU programming
      ● cudaMemcpy() overhead times (5.12 MB in, 64 KB out)
      ● Kernel launch overhead times
      ● In both cases: jitter of 20 to 30 µs (sometimes 40 µs)
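Overhead and jitter figures like these come from timing many loop iterations. The slides do not describe their measurement harness, so this C++ sketch assumes a simple approach: record each iteration's latency with `std::chrono::steady_clock`, then report the mean and the peak-to-peak jitter (max minus min). `LatencyStats`, `summarize` and `time_iterations` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical summary of a latency sample, in microseconds.
struct LatencyStats {
    double mean_us;
    double peak_to_peak_us;  // jitter, taken here as max - min
};

// Summarize per-iteration latencies (the sample must be non-empty).
LatencyStats summarize(const std::vector<double>& samples_us) {
    assert(!samples_us.empty());
    const auto mm = std::minmax_element(samples_us.begin(), samples_us.end());
    const double sum = std::accumulate(samples_us.begin(), samples_us.end(), 0.0);
    return {sum / static_cast<double>(samples_us.size()), *mm.second - *mm.first};
}

// Time n calls of body() with a monotonic clock. If body() launches
// asynchronous GPU work, it must synchronize before returning, otherwise
// only the launch cost is measured.
template <typename F>
std::vector<double> time_iterations(F&& body, int n) {
    std::vector<double> samples_us;
    samples_us.reserve(n > 0 ? static_cast<std::size_t>(n) : 0);
    for (int k = 0; k < n; ++k) {
        const auto t0 = std::chrono::steady_clock::now();
        body();
        const auto t1 = std::chrono::steady_clock::now();
        samples_us.push_back(std::chrono::duration<double, std::micro>(t1 - t0).count());
    }
    return samples_us;
}
```

For samples of 10, 12 and 30 µs, `summarize` reports a mean of about 17.3 µs and a peak-to-peak jitter of 20 µs.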

  11. Legacy GPU programming
      ● Leaves too little time for computation

  12. Improvement
      ● GPUDirect & I/O memory mapping
      ● Persistent kernel

  13. GPUDirect & I/O memory mapping

  14. GPUDirect & I/O memory mapping
      (Diagram, over PCIe 3.0: host CPU app RAM for camera control, FPGA control, measurements and latency measures; FPGA with camera protocol handler, UDP offload engine, DM protocol handler and NIC; GPU RAM with pixel buffer, compute kernels and DM command buffer; DMA transfers carry pixels into GPU RAM and DM commands back to the FPGA)
      ● The FPGA writes/reads directly to/from GPU memory
      ● The CPU is free for other kinds of computation

  15. FPGA development platform
      ● Development process eased by the QuickPlay tool from PLDA

  16. FPGA development platform
      ● Single generic design, multiple target boards
        – ExpressK-US board (hosting a Kintex UltraScale from Xilinx)
        – ExpressGX V board (hosting a Stratix V from Altera)
        – μXlink board from Microgate (hosting an Arria 10 from Altera)

  17. Persistent Kernel

  18. Classic implementation

  19. Persistent kernel implementation

  20. GPUDirect, I/O memory mapping & persistent kernel
      (Diagram: 10GbE NIC on the FPGA, linked over PCIe to the GPU and GPU RAM, with a "start" signal; CPU and CPU RAM also shown)
      main {
        setup();
        persistent_kernel<<<>>>(…);
        …
      }

      persistent_kernel(…){
        while(run){
          pollMemory(…);
          computation(...);
          startDMATransfer(…);
        }
      }
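The core idea here is that the kernel is launched once and then polls a memory location (`pollMemory`) that the FPGA updates by DMA, instead of being relaunched for every frame. A real GPU version is hardware-specific, so the following is only a host-side C++ analog under stated assumptions: a producer function stands in for the FPGA, writing a frame and then publishing a frame counter with release ordering; one long-lived worker thread stands in for the persistent kernel, polling that counter with acquire loads, computing, and signalling completion. `Mailbox`, `fpga_write_frame`, `persistent_worker` and `run_persistent_demo` are hypothetical names, and the "computation" is just a sum.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical shared "mailbox" standing in for GPU memory written by the FPGA.
struct Mailbox {
    explicit Mailbox(std::size_t n_pixels) : pixels(n_pixels) {}
    std::vector<std::uint16_t> pixels;       // 16-bit frame payload
    std::atomic<std::uint64_t> frame_id{0};  // published after the payload is written
    std::atomic<std::uint64_t> done_id{0};   // published after the result is written
    std::atomic<bool> run{true};             // cleared to stop the worker
    std::uint64_t result = 0;                // stands in for the DM command buffer
};

// Producer (FPGA analog): write the payload first, then publish the frame id
// with release ordering so the worker never sees a half-written frame.
void fpga_write_frame(Mailbox& mb, std::uint64_t id, std::uint16_t value) {
    std::fill(mb.pixels.begin(), mb.pixels.end(), value);
    mb.frame_id.store(id, std::memory_order_release);
}

// Persistent worker (persistent-kernel analog): started once, then loops
// pollMemory -> computation -> signal, instead of being relaunched per frame.
void persistent_worker(Mailbox& mb) {
    std::uint64_t seen = 0;
    while (mb.run.load(std::memory_order_acquire)) {
        const std::uint64_t id = mb.frame_id.load(std::memory_order_acquire);  // pollMemory
        if (id == seen) {
            std::this_thread::yield();  // nothing new yet; keep polling
            continue;
        }
        // computation: placeholder reduction over the frame
        mb.result = std::accumulate(mb.pixels.begin(), mb.pixels.end(), std::uint64_t{0});
        seen = id;
        mb.done_id.store(id, std::memory_order_release);  // startDMATransfer analog
    }
}

// Drive n_frames through one long-lived worker and collect the results.
std::vector<std::uint64_t> run_persistent_demo(std::uint64_t n_frames, std::size_t n_pixels) {
    Mailbox mb(n_pixels);
    std::thread worker(persistent_worker, std::ref(mb));  // "launched" once
    std::vector<std::uint64_t> results;
    for (std::uint64_t id = 1; id <= n_frames; ++id) {
        fpga_write_frame(mb, id, static_cast<std::uint16_t>(id));
        while (mb.done_id.load(std::memory_order_acquire) != id) {
            std::this_thread::yield();  // wait for the worker's answer
        }
        results.push_back(mb.result);
    }
    mb.run.store(false, std::memory_order_release);
    worker.join();
    return results;
}
```

With 240 × 240 frames whose pixels all equal the frame number, `run_persistent_demo(3, 240 * 240)` returns `{57600, 115200, 172800}`. On a real GPU the polled location lives in device memory written by the FPGA, and the kernel needs appropriate memory-ordering primitives (for example CUDA's `__threadfence_system()` before signalling); this analog only illustrates the control flow and the publish-after-write ordering.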

  21. Pipelining I/O and compute
      ● Test setup
        – FPGA: PLDA XPressG5
        – Camera: EVT HS-2000M
        – GPU: Tesla C2070
        – Network: 10GbE
        – OS: Debian wheezy
      ● SCAO pyramid case: 240 × 240 pixels, encoded on 16 bits
      (Plot: time in µs vs. iterations, no GPUDirect vs. GPUDirect + persistent kernel)
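For the SCAO pyramid case, one frame is 240 × 240 pixels × 2 bytes = 115,200 bytes. At the 10 Gbit/s rate of 10GbE, moving that payload takes at least about 92 µs, ignoring packet headers and protocol overhead. The small C++ helpers below (`frame_bytes` and `wire_time_us` are hypothetical names, and the link model is idealized) reproduce that arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers under an idealized model (no headers, no protocol overhead).
constexpr std::size_t frame_bytes(std::size_t width, std::size_t height,
                                  std::size_t bits_per_pixel) {
    return width * height * bits_per_pixel / 8;  // bits -> bytes
}

double wire_time_us(std::size_t bytes, double link_gbit_per_s) {
    assert(link_gbit_per_s > 0.0);
    // bytes -> bits, divide by link rate in bit/s, then seconds -> microseconds
    return static_cast<double>(bytes) * 8.0 / (link_gbit_per_s * 1e9) * 1e6;
}
```

`frame_bytes(240, 240, 16)` returns 115200, and `wire_time_us(115200, 10.0)` gives about 92.16 µs.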

  22. Pipelining I/O and compute

  23. DGX-1 benchmark
      ● The FPGA is replaced by a CPU
      ● Each node master receives frame data
      ● Work is shared among all devices
      ● The RTC master sends back the final result
      (Diagram: RTC master, node masters, slaves)
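The slides say that work is shared among all devices and that the RTC master returns the final result, but not how the matrix is partitioned. One common option, assumed here purely for illustration, is to split the slope vector (and the matching columns of the command matrix) across devices, let each device compute a partial command vector, and have the master sum the partials. The C++ sketch below uses hypothetical names (`partial_mvm`, `split_mvm`) and runs the "devices" sequentially; the point is that the split result equals the single-device MVM.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Partial MVM over slope columns [c0, c1) of a row-major n_com x n_slopes matrix R.
std::vector<float> partial_mvm(const std::vector<float>& R, const std::vector<float>& slopes,
                               std::size_t n_com, std::size_t c0, std::size_t c1) {
    const std::size_t n_slopes = slopes.size();
    std::vector<float> part(n_com, 0.0f);
    for (std::size_t i = 0; i < n_com; ++i)
        for (std::size_t j = c0; j < c1; ++j)
            part[i] += R[i * n_slopes + j] * slopes[j];
    return part;
}

// Split the columns evenly across n_dev "devices" and reduce the partials on the master.
std::vector<float> split_mvm(const std::vector<float>& R, const std::vector<float>& slopes,
                             std::size_t n_com, std::size_t n_dev) {
    assert(n_dev > 0 && R.size() == n_com * slopes.size());
    const std::size_t n_slopes = slopes.size();
    const std::size_t chunk = (n_slopes + n_dev - 1) / n_dev;  // ceiling division
    std::vector<float> total(n_com, 0.0f);
    for (std::size_t d = 0; d < n_dev; ++d) {
        const std::size_t c0 = std::min(d * chunk, n_slopes);
        const std::size_t c1 = std::min(c0 + chunk, n_slopes);
        const std::vector<float> part = partial_mvm(R, slopes, n_com, c0, c1);  // "device d"
        for (std::size_t i = 0; i < n_com; ++i) total[i] += part[i];            // master reduction
    }
    return total;
}
```

A row split (each device owning a subset of commands, with the master gathering them) would work equally well; which one the project used is not stated in the slides.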

  24. Result 1/2: time and jitter
      (Histogram, time in ms: 4-device case with 10,048 slopes × 15,000 commands)
      ● Average: 0.45 ms
      ● Peak-to-peak jitter: 17 µs
      ● Variation: 1.8 %
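Assuming each frame is one dense MVM of 10,048 slopes by 15,000 commands, a frame costs about 150.7 million MACs; completing it in the 0.45 ms average corresponds to an effective throughput of roughly 335 GMAC/s across the four devices. This is derived arithmetic under that assumption, not a figure reported in the slides.

```latex
\[
\frac{10\,048 \times 15\,000\ \text{MAC}}{0.45\ \text{ms}}
= \frac{1.5072 \times 10^{8}\ \text{MAC}}{4.5 \times 10^{-4}\ \text{s}}
\approx 3.35 \times 10^{11}\ \text{MAC/s}
\approx 335\ \text{GMAC/s}
\]
```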

  25. Result 2/2: synchronization & intercommunication time
      ● Intercommunication time: average 24 µs, jitter 12 µs
      ● Synchronization time: average 15 µs, jitter 8.8 µs

  26. Conclusion & future work
      ● Conclusion
        – Using GPUDirect and a persistent kernel allows efficient data delivery to the RTC
        – Lower jitter
        – Simpler execution stream
        – QuickPlay tool from PLDA
          ● Eased FPGA development cycle
          ● Mixes communication protocols and data processing in the same streams
          ● Expandable ecosystem, with QuickStore / QuickAlliance
      ● Future work
        – Test on an AO bench (with DM and WFS)
        – Use a multi-node architecture
        – Test with fp16

  27. Thank you. Questions?

  28. ● DGX-1 benchmark
      ● Result 1/2: time and jitter
      ● Result 2/2: sync & intercom time
      ● Conclusion & future work
      ● Thank you
      ● RTC AO prototype for E-ELT
      ● Test pipeline
      ● Time measurement strategies
      ● Conclusion: persistent kernel
      ● Future work
      ● New features
      ● Test architecture
