November 13, 2020
Sixth International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC’20)
Motivation: computing projections for high energy physics (HEP) greatly outpace CPU growth; interest in ML is rapidly increasing
ML computing tasks in HEP?
Particle collection energy regression
Signal/background classification
Particle classification
Request Response Network PCI-e Client
w.r.t. traditional computing model
improvement
particularly well-suited for as-a-service
relative to large number
CPU
PCIe gRPC gRPC
models - refer to as FPGAs-as-a-Service Toolkit (FaaST)
non-blocking gRPC call
Tools:
Coprocessors (SONIC) framework
External Processor Workflow Module Coprocessor acquire() produce() Event data Callback
inference on GPUs
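The acquire() / produce() split with a callback, as labelled in the diagram above, can be pictured with a minimal Python sketch. This is an illustration only, not the SONIC implementation: the class `SonicLikeModule`, the helper `fake_inference`, and the use of a thread pool in place of a non-blocking gRPC call to a remote coprocessor are all assumptions made for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def fake_inference(values):
    """Hypothetical stand-in for a remote inference call: returns the mean."""
    return sum(values) / len(values)


class SonicLikeModule:
    """Hypothetical module that splits work into acquire() and produce()."""

    def __init__(self, executor, infer):
        self._executor = executor
        self._infer = infer
        self._result = None

    def acquire(self, event_data, callback):
        # Submit the request and return immediately (non-blocking).
        future = self._executor.submit(self._infer, event_data)

        def _on_done(done_future):
            # Store the response, then tell the framework it can continue.
            self._result = done_future.result()
            callback()

        future.add_done_callback(_on_done)

    def produce(self):
        # Runs only after the callback has fired, so the result is available.
        return {"prediction": self._result}


def run_event(event_data):
    with ThreadPoolExecutor(max_workers=1) as pool:
        module = SonicLikeModule(pool, fake_inference)
        ready = threading.Event()
        module.acquire(event_data, ready.set)
        # The calling thread is free to do other work here.
        ready.wait(timeout=5.0)
        return module.produce()


if __name__ == "__main__":
    print(run_event([1.0, 2.0, 3.0]))
```

The point of the split is that the thread calling acquire() is not held up while the coprocessor works; produce() is deferred until the callback signals that the response has arrived.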
Public top tagging data challenge
Averaged over 1000 jets
FACILE: calorimeter energy regression, 3-layer MLP, 2k parameters, batch 16000
ResNet: top quark image classification, large CNN, 10M parameters, batch 10 / batch 1
neural networks for FPGAs and ASICs
pipelined, ~104 ns/inference
from FPGA DDR to host
input to dedicated buffers in host memory
hls4ml inference kernel on separate SLRs
multiple inputs, cycle through buffers
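Cycling through a fixed set of input buffers can be pictured with a small round-robin sketch. This is an illustration in Python, not the FaaST server code: `BufferRing`, `stage`, and the buffer count and size are hypothetical, and a real server would also have to confirm that a buffer's previous transfer has finished before reusing it.

```python
from itertools import cycle


class BufferRing:
    """Hypothetical pool of fixed-size host buffers used in round-robin order."""

    def __init__(self, n_buffers, buffer_size):
        self._buffers = [bytearray(buffer_size) for _ in range(n_buffers)]
        self._order = cycle(range(n_buffers))

    def stage(self, payload):
        """Copy one input into the next buffer and return that buffer's index."""
        index = next(self._order)
        buffer = self._buffers[index]
        if len(payload) > len(buffer):
            raise ValueError("payload larger than buffer")
        buffer[: len(payload)] = payload
        return index

    def contents(self, index, length):
        """Return the first `length` bytes held in the buffer at `index`."""
        return bytes(self._buffers[index][:length])


if __name__ == "__main__":
    ring = BufferRing(n_buffers=3, buffer_size=8)
    print([ring.stage(b"ab") for _ in range(5)])
```

With several buffers, a new input can be staged while an earlier one is still being transferred or processed, which is what keeps a pipelined kernel fed.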
Alveo U250
computing cluster
at the same time
for each process
able to process over 5000 events/s
Fermilab
FACILE: 8 FPGA, batch 16000
ResNet: 1 FPGA, batch 10
ResNet: 1 FPGA, batch 1
FPGA server
during data-taking, traditionally performed using a large CPU farm
HLT with calorimeter reconstruction replaced by FaaST server running FACILE
HLT time
for this single algorithm
1500 HLT instances
estimate saturation at ~3300 clients
applications
FPGA compute into existing workflows
Alveo U250 AWS f1