End to End Deep Learning Solution
- on Arm Architecture
- Jan. 14 2019, Jammy Zhou
HPC and AI Convergence
TOP500 Trend
- More than 50 percent of the additional FLOPS in the latest TOP500 rankings came from Nvidia Tesla GPUs, according to the TOP500 report
- Half of the TOP10 systems and 122 of the TOP500 systems use Nvidia GPUs (64 systems use P100, 46 use V100, 12 use Kepler)
- More AI/ML/DL workloads are being added to HPC applications, along with wide adoption of Nvidia GPUs
Arm on the Road
- Astra at Sandia National Laboratories (US) is the first Arm-based supercomputer to enter the TOP500 list, ranked 203 in the latest ranking
- Good momentum for Arm-based supercomputers around the world: Post-K from Japan, Tianhe-3 from China, and Catalyst UK, GW4 Isambard and the CEA system from Europe
- Arm SVE is enabled on Post-K together with the Tofu D interconnect and HBM2 memory, and will be used for some AI workloads
- Besides Nvidia GPUs, other accelerator options are on the market, e.g. AMD Radeon Instinct MI60/MI50 GPUs, Xilinx and Intel FPGAs, and custom ASIC products
- Infrastructure layers: CPU, Accelerator, Network, Storage; services: AI & ML Services, HPC Services
- Network: 100 Gbps Ethernet, InfiniBand, Omni-Path, RDMA and RoCE
- Storage: fast and scalable, such as NVMe-based local SSDs
- Science Cloud with Arm-based HPC from HPC Systems (supporting HiSilicon Hi1616 and Marvell ThunderX2)
- Amazon EC2 A1 instances based on the AWS Graviton 64-bit Arm processor, for scale-out and Arm-based workloads
- Continuous improvement of Arm Neoverse
- Accelerators (GPUs, FPGAs, ASICs)
- HPC & AI software stack (languages, frameworks, libraries, drivers, compilers, etc.), multi-node distributed support and MPI
DL Software Stack
- DL Frameworks: TensorFlow, Caffe, Caffe2, MXNet, Theano, CNTK, PaddlePaddle, PyTorch, Keras, Chainer...
- Big Data Analytics integrations: TensorFlowOnSpark, CaffeOnSpark, SparkFlow...
- Libraries: BLAS, FFT, RNG, SPARSE, Eigen, cuDNN, MIOpen, CMSIS-NN, ACL...
- HAL and Drivers
- Hardware: CPU, GPU, FPGA, ASIC, DSP
- Model Formats: framework-specific, ONNX, NNEF
- Deep Learning Compilers: TVM, Glow, XLA, ONNC, etc.
- Framework support for multiple accelerators

Challenges:
1. Difficult for application and algorithm developers to switch between frameworks
2. Framework developers must maintain different backends for various accelerators
3. Chip and IP vendors must support multiple frameworks with duplicated effort, and maintain out-of-tree support by forking upstream
4. OEMs/ODMs and cloud vendors must support multiple configurations
ONNX: Framework Interoperability & Hardware Optimizations
- Components: ONNX Format, ONNX Models, ONNXIFI, ONNX Runtime, ONNX Tools
- Workflow: Create, Convert, Deploy, Optimize
ONNX Format
- Defines an extensible computation graph model, built-in operators and standard data types
- Supports only tensors as input/output data types
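The computation-graph idea can be illustrated with a minimal sketch in plain Python. This is not the ONNX API; the names here (`TensorInfo`, `Node`, `Graph`, the tiny operator registry) are hypothetical stand-ins showing how a model is a list of operator nodes wired together by named tensor values:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class TensorInfo:          # a named graph input/output with a declared shape
    name: str
    shape: List[int]

@dataclass
class Node:                # one operator invocation in the graph
    op_type: str
    inputs: List[str]
    outputs: List[str]

@dataclass
class Graph:
    nodes: List[Node]      # assumed to be in topological order
    inputs: List[TensorInfo]
    outputs: List[TensorInfo]

# "Built-in operator set": op_type -> implementation (toy 1-D tensors as lists)
OPS: Dict[str, Callable] = {
    "Relu": lambda x: [v if v > 0 else 0.0 for v in x],
    "Neg":  lambda x: [-v for v in x],
}

def run(graph: Graph, feeds: Dict[str, list]) -> Dict[str, list]:
    """Evaluate the graph by walking nodes in order, passing values by name."""
    env = dict(feeds)
    for node in graph.nodes:
        args = [env[name] for name in node.inputs]
        env[node.outputs[0]] = OPS[node.op_type](*args)
    return {t.name: env[t.name] for t in graph.outputs}

# X -> Relu -> T -> Neg -> Y
g = Graph(
    nodes=[Node("Relu", ["X"], ["T"]), Node("Neg", ["T"], ["Y"])],
    inputs=[TensorInfo("X", [3])],
    outputs=[TensorInfo("Y", [3])],
)
print(run(g, {"X": [-1.0, 0.5, 2.0]}))  # {'Y': [-0.0, -0.5, -2.0]}
```

The point of the standard format is that any backend implementing the agreed operator set can execute the same serialized graph.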
ONNX-ML (classical machine learning extension)
- Also supports the sequence and map data types
- Extends the ONNX operator set with ML algorithms not based on neural networks
- Control flow support
- Functions (composable operators, experimental)
- Enhanced shape inference
- Additional optimization passes
- ONNXIFI 1.0 (C backend interface for accelerators)
- Quantization
- Test/Compliance
- Data pipelines
- Edge/Mobile/IoT
ONNXIFI
- Standardized interface for NN inference on different accelerators
- Runtime discovery and selection of execution backends, as well as the ONNX operators they support
- Supports the ONNX format & online model conversion
- A backend is a combination of a software layer and a hardware device used to run an ONNX graph
- The same software layer can expose multiple backends
- A heterogeneous backend can distribute work across multiple device types internally
- Dispatch: applications, frameworks and ONNX models call into libonnxifi.so, which routes to per-vendor backend libraries: libonnxifi-glow.so (Glow), libonnxifi-a.so (Library A), libonnxifi-b.so (Library B), libonnxifi-c.dll (Library C), libonnxifi-d.dylib (Library D), ...
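The discover-then-select pattern behind this dispatch can be sketched in Python. This is only an illustration of the idea, not the real ONNXIFI C API: the library file names follow the diagram above, but `discover_backend_libs`, `select_backend` and the capability dictionary are invented for the sketch (a real backend reports its supported operators through the ONNXIFI interface itself):

```python
import fnmatch
import os

# Platform-specific naming for backend libraries, as in the diagram
LIB_PATTERNS = ("libonnxifi-*.so", "libonnxifi-*.dll", "libonnxifi-*.dylib")

def discover_backend_libs(search_dirs):
    """Scan directories for vendor backend libraries (hypothetical loader step)."""
    found = []
    for d in search_dirs:
        try:
            names = os.listdir(d)
        except OSError:
            continue  # skip missing or unreadable directories
        for name in sorted(names):
            if any(fnmatch.fnmatch(name, p) for p in LIB_PATTERNS):
                found.append(os.path.join(d, name))
    return found

def select_backend(capabilities, required_ops):
    """Pick the first backend whose reported operator set covers the graph.

    `capabilities` maps backend name -> set of supported ONNX operators.
    """
    for name, ops in capabilities.items():
        if required_ops <= ops:
            return name
    return None  # no accelerator fits; fall back to a reference/CPU path

caps = {"glow": {"Conv", "Relu", "Gemm"}, "library-a": {"Relu"}}
print(select_backend(caps, {"Conv", "Relu"}))  # glow
```

Runtime discovery is what lets one application binary use whichever accelerator backends happen to be installed on the machine.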
ONNX Runtime
- Diagram from https://github.com/Microsoft/onnxruntime/blob/master/docs/HighLevelDesign.md
- TensorRT and nGraph support are work in progress
- SVE-based optimization for DL frameworks & libraries
- PCIe/CCIX-based heterogeneous accelerator support (integration, etc.)
- Scale-out support for distributed training
Arm NN
- Initial focus on inference support for Cortex-A SoCs
- Common model description format and APIs to the runtime
- Common optimized runtime inference engine for Arm-based SoCs
- Plug-in framework to support multiple 3rd-party IPs (NPU, GPU, DSP, FPGA)
- Continuous integration testing and benchmarking
- CMSIS-NN optimized frameworks/libraries on RTOS
- Frameworks like uTensor and TensorFlow Lite (quantization, footprint reduction, etc.)
- IP-based accelerator support & optimization (* under discussion)
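The quantization mentioned here means running inference in low-precision integers instead of float32. As a minimal sketch of the TensorFlow Lite-style affine uint8 scheme (the helper names are made up for illustration; CMSIS-NN itself works on fixed-point q7/q15 data):

```python
def choose_qparams(xmin, xmax, qmin=0, qmax=255):
    """Pick scale/zero-point so [xmin, xmax] maps onto the uint8 range."""
    xmin, xmax = min(xmin, 0.0), max(xmax, 0.0)   # range must include 0.0
    scale = (xmax - xmin) / (qmax - qmin)
    zero_point = qmin + round(-xmin / scale)
    return scale, zero_point

def quantize(xs, scale, zero_point, qmin=0, qmax=255):
    # real value x  ->  integer q = clamp(round(x / scale) + zero_point)
    return [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in xs]

def dequantize(qs, scale, zero_point):
    # integer q  ->  approximate real value (q - zero_point) * scale
    return [(q - zero_point) * scale for q in qs]

xs = [-1.0, 0.0, 0.5, 2.0]
scale, zp = choose_qparams(min(xs), max(xs))
qs = quantize(xs, scale, zp)
print(qs, dequantize(qs, scale, zp))
```

Each value is recovered to within half a quantization step (scale/2), which is why 8-bit inference works well on MCUs while shrinking both model footprint and compute cost.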
https://developer.arm.com/products/processors/machine-learning/arm-nn https://community.arm.com/tools/b/blog/posts/arm-nn-the-easy-way-to-deploy-edge-ml
A good base for future collaborations:
- 100 man-years of effort, 340,000 lines of code
- Shipping in over 200 million Android devices
- Impressive performance uplift from software-only improvements over a period of 6 months