Green Flash Persistent Kernel: Real-Time, Low-Latency and High-Performance Computation on Pascal (PowerPoint PPT Presentation)



SLIDE 1

Project #671662 funded by European Commission under program H2020-EU.1.2.2 coordinated in H2020-FETHPC-2014

GTC 2017

Green Flash

Persistent Kernel: Real-Time, Low-Latency and High-Performance Computation on Pascal

Julien BERNARD

SLIDE 2

Green Flash

  • Public and private actors
    – Paris Observatory
    – University of Durham
    – Microgate
    – PLDA
  • Part of Horizon 2020: the EU Research and Innovation programme
  • 3-year project
  • €3.8 million budget
  • Involves about 30 people
  • Research axes
    – Real-time HPC with accelerators and smart interconnects
    – Energy-efficient platform based on FPGA
    – Real-Time Controller (RTC) prototype for the European Extremely Large Telescope Adaptive Optics (AO) system

SLIDE 3

Contributors

Maxime Lainé: software engineer
Denis Perret: FPGA expert
Arnaud Sevin: software lead
Damien Gratadour: project lead
Christophe Rouaud: PLDA project lead
Gaetan Dufourcq: QuickPlay expert

SLIDE 4

E-ELT: Adaptive Optics

  • Compensate the wavefront perturbations in real time
  • Using a wavefront sensor (WFS) to measure them
  • Using a deformable mirror (DM) to reshape the wavefront
  • Commands to the mirror must be computed in real time (~ms rate)

SLIDE 5

RTC concept for ELT AO

SLIDE 6

RTC concept for ELT AO

SLIDE 7

Real-Time Controller

Legacy architecture

  • e.g. the SPARTA architecture
    – DSP & CPU
    – VXS backplane

  Instrument | WFS | Meas. | DM | Com. | Freq (Hz) | Performance (GMAC/s)
  Sphere     |  1  | 2.6k  |  1 | 1.3k | 1.5k      | 5.2
  AOF        |  4  | 2.4k  |  1 | 1.2k | 1k        | 11.8

  (Diagram: sensor and active elements connected to the RTC through a switch)

SLIDE 8

Real-Time Controller

Cluster network architecture

  Instrument | WFS | Meas. | DM | Com. | Freq (Hz) | Performance (GMAC/s)
  Sphere     |  1  | 2.6k  |  1 | 1.3k | 1.5k      | 5.2
  AOF        |  4  | 2.4k  |  1 | 1.2k | 1k        | 11.8
  ELT        |  6  | 80k   |  3 | 15k  | 500       | 1.2k

  (Diagram: sensors 0-5 and active elements 0-2 connected to RTC nodes 0 to N-1 through a switch)
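As a sanity check on the performance column, the requirement roughly follows from one multiply-accumulate per (measurement, command) pair per frame. This accounting is an inference, not stated on the slide, but it lands close to the table's value for the smaller systems:

```python
def required_gmacs(n_meas, n_com, freq_hz):
    """Rough RTC throughput need: one MAC per (measurement, command)
    pair per frame, times the frame rate, expressed in GMAC/s."""
    return n_meas * n_com * freq_hz / 1e9

# Sphere: 2.6k measurements, 1.3k commands, at a 1.5 kHz frame rate
print(required_gmacs(2600, 1300, 1500))  # ~5.1 GMAC/s vs 5.2 in the table
```

The ELT row is two orders of magnitude beyond the legacy instruments, which is what motivates the cluster architecture on this slide.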

SLIDE 9

Legacy GPU programming

(Diagram: NIC receives over 10GbE into CPU RAM; data crosses PCIe between CPU RAM and GPU RAM)

  main {
      setup();
      while (run) {
          recv(...);
          cudaMemcpy(..., cudaMemcpyHostToDevice);
          computing_kernel<<<...>>>(...);
          cudaMemcpy(..., cudaMemcpyDeviceToHost);
          send(...);
      }
  }

SLIDE 10

Legacy GPU programming

cudaMemcpy() overhead times (5.12 MB in, 64 KB out)
Kernel-launch overhead times
In both cases: jitter of 20 to 30 µs (sometimes 40 µs)

SLIDE 11

Legacy GPU programming

This per-iteration overhead leaves too little time for the actual computation.
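A rough budget illustrates the problem. The 500 Hz frame rate comes from the ELT row earlier in the deck and the jitter from the previous slide; the PCIe bandwidth is an assumed, illustrative value:

```python
# Per-frame time budget at a 500 Hz frame rate (ELT case)
frame_budget_us = 1e6 / 500                   # 2000 µs per frame

# Legacy-loop overheads: jitter as measured on the previous slide,
# transfer time from an ASSUMED effective PCIe bandwidth of 12 GB/s
jitter_us = 30
pcie_bytes_per_s = 12e9
copy_in_us = 5.12e6 / pcie_bytes_per_s * 1e6  # 5.12 MB input transfer

remaining_us = frame_budget_us - copy_in_us - jitter_us
print(round(copy_in_us), round(remaining_us))
```

Under these assumptions the input copy alone consumes roughly a fifth of the frame budget before any computation starts, and the launch and copy jitter eats further into the deadline margin.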

SLIDE 12

Improvement

  • GPUDirect & I/O memory mapping
  • Persistent kernel

SLIDE 13

GPU direct & I/O Memory mapping

SLIDE 14

GPU direct & I/O Memory mapping

  • FPGA writes/reads directly to/from GPU memory
  • CPU is free for other kinds of computation

  (Diagram: the FPGA NIC carries a camera protocol handler, a DM protocol handler, a UDP offload engine and a latency-measurement block, each with its own DMA engine; pixel buffers and DM command buffers move over PCIe 3.0 directly between the FPGA and GPU RAM; the compute kernels read the pixel buffer and produce the DM commands, while the CPU app in host RAM only handles camera and FPGA control)

SLIDE 15

FPGA Development platform

Development process eased by using the QuickPlay tool from PLDA

SLIDE 16

FPGA Development platform

  • Single generic design / multiple target boards
    – ExpressK-US board (hosting a Xilinx Kintex UltraScale)
    – ExpressGX V board (hosting an Altera Stratix V)
    – µXlink board from Microgate (hosting an Altera Arria 10)

SLIDE 17

Persistent Kernel

SLIDE 18

Classic implementation

SLIDE 19

Persistent kernel implementation

SLIDE 20

GPUDirect, I/O memory mapping & persistent kernel

(Diagram: the FPGA NIC writes directly into GPU RAM over PCIe and signals the start of each DMA transfer, bypassing CPU RAM)

  main {
      setup();
      persistent_kernel<<<...>>>(...);
      ...
  }

  persistent_kernel(...) {
      while (run) {
          pollMemory(...);
          computation(...);
          startDMATransfer(...);
      }
  }
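The control flow above can be mimicked host-side. Below is a conceptual Python sketch in which a long-lived thread stands in for the persistent kernel and a shared dict stands in for the polled GPU memory; all names are illustrative and none come from the project's code:

```python
import threading

# CPU-side sketch of the persistent-kernel pattern: one long-lived worker
# polls a "mailbox" for new frames instead of paying a kernel-launch plus
# memcpy overhead on every frame.
mailbox = {"frame": None, "result": None, "run": True}
lock = threading.Lock()

def persistent_worker():
    while True:
        with lock:
            if not mailbox["run"]:
                return                      # shutdown requested
            frame, mailbox["frame"] = mailbox["frame"], None
        if frame is None:
            continue                        # nothing yet: keep polling,
                                            # as the GPU kernel spins on memory
        with lock:
            mailbox["result"] = sum(frame)  # stand-in for the compute kernels

worker = threading.Thread(target=persistent_worker)
worker.start()

with lock:
    mailbox["frame"] = [1, 2, 3]            # a new frame "arrives by DMA"

result = None
while result is None:                       # consumer waits for the answer
    with lock:
        result = mailbox["result"]
print(result)

with lock:
    mailbox["run"] = False                  # stop the persistent worker
worker.join()
```

The point of the pattern is that the worker is launched once at setup, so per-frame cost reduces to the poll and the computation itself.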

SLIDE 21

Pipelining I/O and compute

Test setup:
  FPGA    PLDA XPressG5
  GPU     Tesla C2070
  OS      Debian wheezy
  Camera  EVT HS-2000M, 10GbE network

(Plot: iteration time in µs, "No GPUDirect" vs "GPUDirect + persistent kernel")
SCAO Pyramid case: 240 × 240 pixels, encoded on 16 bits
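For scale, the payload per frame in this test case is modest (simple arithmetic on the slide's figures):

```python
# SCAO Pyramid case: 240 x 240 pixels, 16 bits (2 bytes) per pixel
frame_bytes = 240 * 240 * 2
print(frame_bytes, frame_bytes / 1024)  # 115200 bytes = 112.5 KiB per frame
```

At these sizes, fixed per-iteration overheads (launches, copies) dominate over raw bandwidth, which is why removing them pays off.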

SLIDE 22

Pipelining I/O and compute

SLIDE 23

DGX-1 benchmark

  • The FPGA is replaced by a CPU
  • Each node master receives frame data
  • Work is shared between all devices
  • The RTC master sends back the final result

  (Diagram: RTC master, node masters, and slaves)

SLIDE 24

Result 1/2: Time and jitter

(Histogram of frame computation time, in ms)

4-device case with 10,048 slopes × 15,000 commands
Average: 0.45 ms
Peak-to-peak jitter: 17 µs
Variation: 1.8 %
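Metrics like those above can be derived from raw per-frame timings. In the sketch below the definition of "variation" (half the peak-to-peak range over the mean) is an assumed reconstruction, and the sample data is synthetic, not the slide's measurements:

```python
def jitter_stats(samples_ms):
    """Average, peak-to-peak jitter, and variation, where variation is
    taken as half the peak-to-peak range over the mean (assumed definition)."""
    avg = sum(samples_ms) / len(samples_ms)
    p2p = max(samples_ms) - min(samples_ms)
    variation_pct = 100 * (p2p / 2) / avg
    return avg, p2p, variation_pct

# Synthetic per-frame timings in ms -- NOT the slide's measured data
avg, p2p, var_pct = jitter_stats([0.449, 0.450, 0.451, 0.452, 0.448])
print(round(avg, 3), round(p2p * 1000, 1), round(var_pct, 2))
```

For hard real-time AO control the peak-to-peak figure matters more than the average, since a single late frame degrades the correction.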

SLIDE 25

Result 2/2: Sync & Intercom time

Intercommunication time: average 15 µs, jitter 8.8 µs
Synchronization time: average 24 µs, jitter 12 µs

SLIDE 26

Conclusion & future work

  • Conclusion
    – Using GPUDirect and a persistent kernel allows efficient data delivery to the RTC
    – Lower jitter
    – Simpler execution stream
    – The QuickPlay tool from PLDA:
      • eased the FPGA development cycle
      • mixes communication protocols and data processing in the same streams
      • expandable ecosystem, with QuickStore / QuickAlliance
  • Future work
    – Test on an AO bench (with DM and WFS)
    – Use a multi-node architecture
    – Test with fp16

SLIDE 27


Thank you

Questions?
