
SLIDE 1

AIgean: An Open Framework for Machine Learning on a Heterogeneous Cluster

Naif Tarafdar¹, Giuseppe Di Guglielmo², Philip C Harris³, Jeffrey D Krupa³, Vladimir Loncar⁴, Dylan S Rankin³, Nhan Tran⁵, Zhenbin Wu⁶, Qianfeng Shen¹ and Paul Chow¹

University of Toronto¹, Columbia University², Massachusetts Institute of Technology³, CERN⁴, Fermilab⁵, University of Illinois⁶

SLIDE 2

Takeaways

  • Galapagos: Platform for multi-FPGA application deployment
    – A scalable giant FPGA comprised of individual FPGAs
  • AIgean: Mapping an ML application onto the giant FPGA
    – Could also be your own applications
  • Depending on your area of expertise and interest, you can use different parts of this project

November 13, 2020 H2RC 2020

SLIDE 3

Machine Learning

  • One of the most popular topics of research
    – In many areas and applications (e.g. medical, financial, safety, transportation)
    – Also within the computing community
  • Wide usage in the world pushes the limits of devices
    – Metrics include performance and energy
    – Leading many researchers to consider heterogeneity!

SLIDE 4

Heterogeneity All Around Us

SLIDE 5

Applying Machine Learning to a Heterogeneous Environment

  • Challenge: How do you design machine learning algorithms for a heterogeneous space?
    – Hard enough with a homogeneous computing environment
    – Is there a framework for such a thing?
  • Challenge: If such a framework exists, can we get both flexibility and performance?

SLIDE 6

Outline

  • Brief motivation
  • Overview of machine learning frameworks
    – Categorized as an abstraction layer stack
  • Overview of AIgean
    – HLS4ML
    – Galapagos
  • Results

SLIDE 7

MACHINE LEARNING FRAMEWORKS

SLIDE 8

Many Popular Examples!

  • Such as:
    – TensorFlow
    – PyTorch
    – Caffe
    – Intel DLA
    – Xilinx xfDNN
  • What do these different frameworks offer?
    – Depends on who you ask!

SLIDE 9

Machine Learning Stack

  • Applications & Algorithms
  • Cluster Deployment & Communication
  • Hardware

SLIDE 10

Machine Learning Stack

  • Applications & Algorithms (e.g. neural net layers, quantization, compression, pruning)
  • Cluster Deployment & Communication
  • Hardware

SLIDE 11

Machine Learning Stack

  • Applications & Algorithms
  • Cluster Deployment & Communication (e.g. physical connections such as PCIe and Ethernet, communication protocols)
  • Hardware

SLIDE 12

Machine Learning Stack

  • Applications & Algorithms
  • Cluster Deployment & Communication
  • Hardware (e.g. hardware circuits such as multipliers and shifters, memory architecture such as caching)

SLIDE 13

Machine Learning Stack

  • Allows researchers to pick and choose the layers they wish to configure
  • Collapsible/expandable for a specific application and infrastructure!

SLIDE 14

AIGEAN OVERVIEW

SLIDE 15

AIgean Introduction

  • Like the archipelago and sea
  • Combines two existing frameworks:
    – HLS4ML: HLS IP cores for ML
    – Galapagos: connects and deploys heterogeneous distributed applications across multiple nodes

SLIDE 16

HLS4ML

  • Open-source project
  • Input:
    – Description of FPGA resources
      • LUT, BRAM, DSP
    – Description of the neural net
      • PyTorch, Keras, ONNX support
  • Output:
    – HLS-synthesizable C++ implementing the neural net within the resource constraints
    – Tunable HLS code, made to fit the FPGA
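"Tunable HLS code, made to fit the FPGA" refers to knobs such as the reuse factor, which time-shares multipliers so a layer fits a DSP budget. A minimal sketch of that fitting idea follows; this is illustrative Python, not HLS4ML's actual algorithm, and the function names are made up:

```python
# Hedged sketch: pick the smallest per-layer "reuse factor" so the layer's
# multiplies fit the FPGA's DSP budget. Reuse factor 1 is fully parallel
# (one DSP per multiply); higher reuse serializes multiplies onto fewer DSPs.

def min_reuse_factor(multiplies: int, dsp_budget: int) -> int:
    """Smallest reuse factor that brings parallel multipliers under budget."""
    if dsp_budget <= 0:
        raise ValueError("DSP budget must be positive")
    return -(-multiplies // dsp_budget)  # ceiling division

def dsps_used(multiplies: int, reuse: int) -> int:
    """Parallel multipliers needed at a given reuse factor."""
    return -(-multiplies // reuse)
```

For example, a 64x64 dense layer has 4096 multiplies; against the ZU19EG's 1968 DSP slices, a reuse factor of 3 brings it under budget (1366 parallel multipliers).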

SLIDE 17

Galapagos

  • User can define an FPGA cluster using cluster description files and AXI-Stream kernels
  • Tool flow

[Diagram: VMs and FPGAs (FPGA 1, FPGA 2, FPGA 3) connected over the network; AXI-Stream kernels are mapped onto the nodes from a cluster description file]

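The information a Galapagos-style cluster description carries is essentially a kernel-to-node mapping. The sketch below illustrates that idea; the schema, field names, and node/kernel names are invented for illustration and are not Galapagos's actual file format:

```python
# Hedged sketch of a cluster description: which kernels exist and which
# node (CPU or FPGA) each is mapped to. Moving a kernel between CPU and
# FPGA changes only this mapping, not the kernel code itself.

cluster = {
    "nodes": [
        {"id": "fpga1", "type": "FPGA", "comm": "udp"},
        {"id": "fpga2", "type": "FPGA", "comm": "udp"},
        {"id": "vm1",   "type": "CPU",  "comm": "tcp"},
    ],
    "kernels": [
        {"num": 1, "name": "dense_1", "node": "fpga1"},
        {"num": 2, "name": "relu_1",  "node": "fpga1"},
        {"num": 3, "name": "dense_2", "node": "fpga2"},
        {"num": 4, "name": "sink",    "node": "vm1"},
    ],
}

def kernels_on(cluster: dict, node_id: str) -> list:
    """List the kernel names mapped to a given node."""
    return [k["name"] for k in cluster["kernels"] if k["node"] == node_id]
```

Because the tool flow, not the kernel, owns the placement, retargeting `dense_2` from `fpga2` to `vm1` would be a one-line change to the description.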

SLIDE 21

Galapagos

  • Heterogeneous stack
  • Allows users to create flexible heterogeneous clusters across CPUs/FPGAs
  • Seamlessly prototype by implementing on both CPU and FPGA
    – Galapagos ensures functional portability for network communication
    – Essentially "network-connected" HLS kernels, for both SW and HW
    – Iterative development: selectively move bottlenecks from SW to hardware without modifying code
  • Flexibly change the communication protocol without modifying the user application
    – TCP, UDP, L1, etc.
    – The user application is agnostic to this

  • Galapagos stack layers: Communication Layer, Middleware/Network Layer, Hypervisor Layer, Physical Hardware

SLIDE 22

Birth of AIgean

  • HLS4ML creates HLS IP cores to maximize FPGA utilization
  • Galapagos can provide a multi-FPGA fabric
  • The tools are combined to deploy a neural net on a multi-FPGA fabric

SLIDE 23

AIgean Tool Flow

[Diagram: Model Training (Keras, PyTorch) produces a Model (HDF5 & JSON); hls4ml* turns it into HLS ML Layers (C++ & TCL); the HLS-ML-to-Galapagos Bridge (C++ & TCL) plus the Partitioner produce an unconnected IP Cluster; ML2G then produces the CPU/FPGA Cluster. Tuning feeds back into the model, and the whole chain is the AIgean automated flow.]


SLIDE 26

HLS4ML Modifications

  • HLS4ML modified to create independent layers as separate HLS IP cores
    – Each IP core is a streaming core, with one stream per dimension of the particular layer

SLIDE 27

HLS4ML Galapagos Bridge

[Diagram: ML Layer ↔ Bridge ↔ Galapagos Kernel; each bridge converts between one 8-bit stream per dimension on the ML-layer side and one 512-bit stream on the Galapagos side]

  • Bridges are custom-made for the layers used in the network (different bridges are needed for different numbers of dimensions)
  • If the user has a different application layer, they would need a different bridge
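The width conversion the bridge performs can be sketched in plain Python: interleave one byte from each dimension's 8-bit stream, then cut the result into 512-bit (64-byte) flits. This is an illustrative model of the data movement only; the real bridge is an HLS core, and the interleaving order here is an assumption:

```python
# Hedged sketch of the outbound bridge: interleave per-dimension 8-bit
# streams element by element, then pack into fixed-size 512-bit flits
# (the last flit is zero-padded).

def pack_streams(streams, flit_bytes: int = 64):
    """streams: one list of byte values (0-255) per dimension."""
    interleaved = bytearray()
    for element in zip(*streams):          # one byte from each dimension
        interleaved.extend(element)
    pad = (-len(interleaved)) % flit_bytes # pad to a whole number of flits
    interleaved.extend(b"\x00" * pad)
    return [bytes(interleaved[i:i + flit_bytes])
            for i in range(0, len(interleaved), flit_bytes)]
```

A bridge for a layer with a different number of dimensions would change the `zip(*streams)` arity, which is why different layers need different bridges.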


SLIDE 29

Partitioner

  • The partitioner separates IP cores onto different FPGAs
  • Currently uses IP resource estimates from HLS place and route, with a simple greedy approach
  • Does not place the bridges, as those are AIgean-specific; this partitioner is general for all Galapagos IP kernels
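The slide only says "simple greedy approach", so the exact heuristic below is an assumption: walk the layer pipeline in order, filling each FPGA with consecutive IP cores until its resource estimate is exhausted, then start the next FPGA.

```python
# Hedged sketch of greedy partitioning: assign consecutive cores (by index)
# to FPGAs without exceeding a per-FPGA resource budget, e.g. the DSP or
# LUT estimates that HLS reports per core.

def greedy_partition(core_costs, fpga_budget: int):
    """Return a list of FPGAs, each a list of core indices placed on it."""
    fpgas = [[]]
    used = 0
    for idx, cost in enumerate(core_costs):
        if cost > fpga_budget:
            raise ValueError(f"core {idx} cannot fit on any FPGA")
        if used + cost > fpga_budget:
            fpgas.append([])               # current FPGA full: start a new one
            used = 0
        fpgas[-1].append(idx)
        used += cost
    return fpgas
```

Keeping consecutive layers together suits a feed-forward pipeline, since only the cut points between FPGAs need network bridges.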


SLIDE 31

Machine Learning to Galapagos (ML2G)

  • Adds the appropriate bridges on the interfaces of the FPGAs
  • Creates the local connections for kernels on the same FPGA

SLIDE 32

RESULTS

SLIDE 33

Experiment Setup

  • CPUs
    – Xeon E5-2650
      • 24 cores at 2.2 GHz
  • FPGAs
    – Fidus Sidewinder
      • ZU19EG FPGA
        – ~1 million logic cells, 35 MB BRAM, 1968 DSP slices
      • 100 Gbps network interface
        – 100 Gbps UDP core

SLIDE 34

Microbenchmarks

  Link                  | Latency    | Throughput
  Software to Hardware  | 0.029 ms   | 0.244 GB/s
  Hardware to Hardware  | 0.00017 ms | 100 Gbps
  Hardware to Software  | 0.0203 ms  | N/A

  • Latency: sending a single flit
  • Throughput: maximum throughput of the link (varying packet size for software)

SLIDE 35

Microbenchmarks

  • Software to hardware: the larger the packet, the higher the throughput
  • UDP packet size is limited
    – No segmentation
    – MTU size
    – Jumbo frames: 8K

slide-36
SLIDE 36

Microbenchmarks

Link Latency Throughput Software to Hardware 0.029 ms 0.244 GB/s Hardware to Hardware 0.00017 ms 100 GB/s Hardware to Software 0.0203 ms N/A

November 13, 2020

  • Line-rate,

same throughput at small and large packet size

H2RC 2020

SLIDE 37

Microbenchmarks

  • Hardware to software: HW sends at line rate
  • With UDP, SW can't keep up and we see packet drops

SLIDE 38

Small Neural Network: Results

  • Single CPU, single FPGA; used in a physics application to calculate the energy of a particle
  • 16K inferences
  • SDAccel (without AIgean): 3 ms
  • AIgean: 6.3 ms
    – Latency of a single inference is 0.08 ms; we can measure this since we stream, which is not possible via SDAccel
  • Bottleneck: sending data to the FPGA via the CPU network link
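The streaming point is easiest to see with the arithmetic behind these numbers (reading "16K" as 16,000 inferences, an assumption): the total time is far less than the per-inference latency times the batch size, because the pipeline overlaps inferences.

```python
# Quick arithmetic from the slide's numbers: 16K inferences finish in
# 6.3 ms total even though one inference has 0.08 ms latency, because
# streaming overlaps many inferences in flight at once.

inferences = 16_000          # "16K", assumed decimal
total_ms = 6.3               # AIgean end-to-end time
latency_ms = 0.08            # single-inference latency

throughput_per_s = inferences / (total_ms / 1000.0)  # inferences per second
serial_time_ms = inferences * latency_ms             # time if not pipelined
```

The implied throughput is over two million inferences per second, versus roughly 1.3 seconds if each inference had to wait for the previous one to finish.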

SLIDE 39

Small Neural Network: Takeaway

  • Comparison vs. SDAccel shows that a network link to a single FPGA can be competitive with PCIe
    – The network link wins on scalability: many more FPGAs are reachable over the network than over PCIe
  • Can stream data
    – Latency of a single inference is much lower
  • Should target a larger application
    – We can do this since we have a large multi-FPGA fabric!

SLIDE 40

Autoencoder: Results

  • Autoencoder implemented both in SDAccel on a single FPGA and in AIgean using 3 FPGAs
  • SDAccel: single FPGA, higher reuse factor needed to fit the logic
    – 0.26 ms
  • AIgean: three FPGAs
    – 0.08 ms, more than a 3x improvement

SLIDE 41

Autoencoder: Takeaway

  • Using a larger fabric allows us to implement larger circuits
  • The difficulty of communication between multiple FPGAs is abstracted away

SLIDE 42

ResNet-50

  • IP cores currently implemented at 6600 images/second (slightly better than Brainwave)
  • Prototype in software working
  • Bridges working at line rate
  • 12 FPGA bitstreams currently being synthesized and tested
  • In the pipeline: 30000 images/second

SLIDE 43

SUMMARY AND CONCLUSION

SLIDE 44

Summary

  • Multi-FPGA/CPU neural net framework built by leveraging and combining the HLS4ML and Galapagos frameworks
  • Tunable IP cores, flexible communication
  • ML HLS IP cores deployed onto a cluster of network-connected FPGAs and CPUs
  • Communication abstracted away from the user

SLIDE 45

Conclusions

  • Network-connected FPGAs/CPUs are more scalable than traditional PCIe
  • Creating larger fabrics with network-connected FPGAs opens the door to more complex algorithms
  • Many opportunities to explore in multi-FPGA ML
  • Galapagos provides a good foundation for multi-FPGA applications

SLIDE 46

Acknowledgments

SLIDE 47

Thank You

  • Email: pc@eecg.toronto.edu