discussion on ML in FPGAs



SLIDE 1

discussion on ML in FPGAs

SLIDE 2

TRIGGER SYSTEM

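Data flow: 40 MHz → “Level-1” → 100 kHz → “High Level Trigger” → 500 Hz → “Offline Computing”

  • “Level-1”: custom electronics; absorbs ~100s Tb/s; trigger decision in ~10 μs
  • “High Level Trigger”: ~13k CPU farm; 100 ms/event
  • “Offline Computing”: Grid, O(10) Pb

Diagram labels: full data, subset data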
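The rates and latencies above imply a few derived numbers that are useful in the discussion. The following is a minimal sketch of that arithmetic using only the values quoted on the slide; the function and variable names are illustrative, and the comparison with the ~13k CPU farm assumes one event per CPU at a time, which the slide does not state.

```python
# Rates and latencies quoted on the slide (all approximate).
BUNCH_CROSSING_RATE_HZ = 40_000_000   # 40 MHz into Level-1
L1_ACCEPT_RATE_HZ = 100_000           # 100 kHz into the High Level Trigger
HLT_ACCEPT_RATE_HZ = 500              # 500 Hz out to offline computing
L1_LATENCY_NS = 10_000                # ~10 us Level-1 decision time
HLT_TIME_PER_EVENT_NS = 100_000_000   # 100 ms/event in the HLT farm

NS_PER_S = 1_000_000_000


def rejection_factor(rate_in_hz: int, rate_out_hz: int) -> float:
    """Factor by which one trigger stage reduces the event rate."""
    return rate_in_hz / rate_out_hz


def events_in_flight(rate_hz: int, time_per_event_ns: int) -> float:
    """Average number of events held or processed at once (rate x time)."""
    return rate_hz * time_per_event_ns / NS_PER_S


l1_rejection = rejection_factor(BUNCH_CROSSING_RATE_HZ, L1_ACCEPT_RATE_HZ)
hlt_rejection = rejection_factor(L1_ACCEPT_RATE_HZ, HLT_ACCEPT_RATE_HZ)

# Bunch crossings that must be buffered while Level-1 decides.
l1_pipeline_depth = events_in_flight(BUNCH_CROSSING_RATE_HZ, L1_LATENCY_NS)

# Events being processed concurrently in the HLT farm. Assumption: one event
# per CPU at a time, which makes this comparable to the ~13k CPU figure.
hlt_concurrent_events = events_in_flight(L1_ACCEPT_RATE_HZ, HLT_TIME_PER_EVENT_NS)

print(f"Level-1 rejection factor: {l1_rejection:.0f}")
print(f"HLT rejection factor:     {hlt_rejection:.0f}")
print(f"Level-1 pipeline depth:   {l1_pipeline_depth:.0f} bunch crossings")
print(f"HLT events in flight:     {hlt_concurrent_events:.0f}")
```

Under these numbers, Level-1 keeps roughly 1 in 400 bunch crossings and the High Level Trigger roughly 1 in 200 of those.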

SLIDE 3

GOALS

Level-1 Trigger: 100s Tb/s, 40 MHz pipeline, 10 μs/event buffer
High-level Trigger: 100 kHz pipeline, 100 ms/event
Offline processing: minutes/event

Identify where we can get the biggest gains from ML inference with FPGAs for these very different classes of problems.

  • Physicist-“friendly” tools: engineering resources are scarce
  • HLS4ML: better for physicist prototyping, faster development cycles, less expert knowledge needed
  • Level-1 trigger is a custom target for LHC physics problems: not many available tools for O(500 ns) performance, need RTL; preference for HLS over tools like OpenCL
  • Open-source, accessible for academia
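To put the O(500 ns) figure in context, the sketch below converts it into bunch crossings and FPGA clock cycles. The 40 MHz rate and 10 μs buffer are taken from the slide; the 200 MHz FPGA clock is a purely hypothetical value chosen for illustration, and the helper name `periods_in` is likewise made up here.

```python
NS_PER_S = 1_000_000_000

BUNCH_CROSSING_RATE_HZ = 40_000_000   # 40 MHz pipeline (from the slide)
L1_BUFFER_NS = 10_000                 # 10 us/event buffer (from the slide)
ALGO_LATENCY_NS = 500                 # O(500 ns) target (from the slide)
ASSUMED_FPGA_CLOCK_HZ = 200_000_000   # hypothetical clock, NOT from the slide


def periods_in(latency_ns: int, frequency_hz: int) -> int:
    """Number of whole clock (or bunch-crossing) periods within a latency."""
    return latency_ns * frequency_hz // NS_PER_S


# New bunch crossings that arrive while one inference is still running.
crossings_during_inference = periods_in(ALGO_LATENCY_NS, BUNCH_CROSSING_RATE_HZ)

# Clock cycles available to the algorithm at the assumed clock frequency.
clock_cycles_available = periods_in(ALGO_LATENCY_NS, ASSUMED_FPGA_CLOCK_HZ)

# Share of the Level-1 buffer time consumed by a 500 ns algorithm.
buffer_fraction = ALGO_LATENCY_NS / L1_BUFFER_NS

print(f"Bunch crossings during inference: {crossings_during_inference}")
print(f"Clock cycles at assumed clock:    {clock_cycles_available}")
print(f"Fraction of Level-1 buffer:       {buffer_fraction:.0%}")
```

Because new data arrives every bunch crossing, an algorithm that takes several crossings to finish has to be pipelined so that it can accept new input before earlier inputs complete.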


“OFF THE SHELF” HARDWARE
CUSTOM HARDWARE/FIRMWARE

SLIDE 4

STATUS WITH AWS

A first test of SDSoC examples on Zynq works out-of-box

https://github.com/Xilinx/SDSoC_Examples/tree/master/cpp/getting_started uses the HLS “parlance”

Next: adapt our HLS4ML to work with the AWS F1 examples:

https://github.com/aws/aws-fpga/tree/master/SDAccel/examples


First tests of SDAccel examples on t2.2xlarge with the FPGA AMI look good too; push in this direction in the next month

SLIDE 5

FOR DISCUSSION

Level-1 Trigger: 100s Tb/s, 40 MHz pipeline, 10 μs/event buffer

  • How optimal is HLS?
      • we have some conceptual idea of the optimization, but…
      • easy to compare a simple example against a Verilog implementation?
      • are we missing some obvious improvements?
  • Useful for other fields?
      • is it something other fields would be interested in?
      • are there particular features we are not thinking of?
      • how tied are we to Xilinx design tools?
      • which HLS? we use Vivado HLS right now

SLIDE 6

FOR DISCUSSION

High-level Trigger: 100 kHz pipeline, 100 ms/event
Offline processing: minutes/event

  • exploring hardware
      • could we explore AWS F1 for such applications?
      • what are the tools like? language? is HLS OK?
      • comparison to GPUs, other FPGA architectures?
  • throughput
      • some studies by colleagues from another experiment note that the PCIe interface is a bottleneck
      • InfiniBand? parallel input streams?
      • how to partition/organize multi-FPGA networks?
  • non-ML applications
      • porting some popular physics algorithms like the Kalman filter and cellular automata
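For readers less familiar with the Kalman filter mentioned above, here is a minimal scalar sketch of its predict/update loop. It is only an illustration of the general algorithm under simplifying assumptions (a single constant state, scalar variances), not the track-fitting code used by any experiment, where the state is multi-dimensional and the updates are matrix operations. The function name `kalman_1d` and its parameters are hypothetical.

```python
from typing import List, Tuple


def kalman_1d(
    measurements: List[float],
    meas_var: float,
    process_var: float,
    x0: float = 0.0,
    p0: float = 1.0,
) -> List[Tuple[float, float]]:
    """Scalar Kalman filter for a constant-state model.

    Returns the (estimate, variance) pair after each measurement.
    """
    x, p = x0, p0
    history = []
    for z in measurements:
        # Predict: the state is modelled as constant, so only the
        # uncertainty grows, by the process noise variance.
        p = p + process_var

        # Update: the gain weights the new measurement against the prediction.
        gain = p / (p + meas_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p

        history.append((x, p))
    return history


# Usage: four noiseless readings of a true value of 5.0, starting from x0 = 0.
estimates = kalman_1d([5.0, 5.0, 5.0, 5.0], meas_var=1.0, process_var=0.0)
for step, (x, p) in enumerate(estimates, start=1):
    print(f"step {step}: estimate={x:.3f}, variance={p:.3f}")
```

Each iteration is a short, fixed sequence of multiply-add operations, which is part of why the algorithm is a natural candidate for porting to FPGAs.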
