fpgaConvNet: A Framework for Mapping Convolutional Neural Networks on FPGAs




SLIDE 1

fpgaConvNet: A Framework for Mapping Convolutional Neural Networks on FPGAs

Stylianos I. Venieris, Christos-Savvas Bouganis stylianos.venieris10@imperial.ac.uk FCCM 2016, Washington DC 2 May 2016

SLIDE 2

Deep Learning and AI


SLIDE 3

Deep Learning Success Stories - ConvNets

  • Image Recognition (Microsoft, 2015)
  • Image Captioning (Microsoft, 2015)
  • “DeepFace” (Facebook, 2014)

SLIDE 4

Deep Learning on FPGAs


  • Hand-tuned implementations
  • Memory I/O Optimisation
  • Design Space Exploration
SLIDE 5

What is missing?


Deep Learning Developers:
  • Caffe
  • TensorFlow
  • Theano
  • Torch

FPGA:
  – ConvNet Functionality
  – Optimised for High Performance

The missing piece: fpgaConvNet

SLIDE 6

Our approach - fpgaConvNet

[Toolflow diagram: the Deep Learning expert supplies a ConvNet description and the FPGA target platform specifications; fpgaConvNet performs automated design space exploration and ConvNet hardware mapping, and outputs a bitstream.]

SLIDE 7

Convolutional Neural Networks (ConvNets)

[Typical ConvNet pipeline: convolutional + nonlinearity → pooling → convolutional + nonlinearity → pooling]

SLIDE 8
fpgaConvNet – ConvNet Modelling Framework

  • Synchronous Data Flow
    – ConvNet as a data-driven graph
    – Represented as a matrix
    – Each layer mapped to a tunable set of hardware building blocks

Streaming + Analytical power
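The SDF view above can be sketched concretely. A minimal illustration in Python (the token rates, actor names, and `balances` helper are assumptions for exposition, not the paper's implementation): each edge of the data-driven graph gets a row in a topology matrix, and checking the balance equations confirms that a static schedule with bounded buffers exists.

```python
# Sketch: a small ConvNet pipeline modelled as a Synchronous Data Flow
# (SDF) graph via its topology matrix. Rates are illustrative assumptions.

# Actors: conv -> nonlin -> pool (2x2 pooling consumes 4 tokens per firing).
# topology[e][a] = tokens produced (+) or consumed (-) on edge e by actor a
# per firing.
topology = [
    [+1, -1,  0],   # edge 0: conv -> nonlin
    [ 0, +1, -4],   # edge 1: nonlin -> pool
]

def balances(matrix, repetitions):
    """Check the SDF balance equations: matrix @ repetitions == 0,
    i.e. the graph admits a periodic static schedule."""
    return all(
        sum(rate * reps for rate, reps in zip(row, repetitions)) == 0
        for row in matrix
    )

# Repetition vector: fire conv and nonlin 4 times per pool firing.
q = [4, 4, 1]
print(balances(topology, q))   # True -> static schedule exists
```

Because the model is a matrix, the folding and partitioning actions on later slides become algebraic operations on this representation.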

SLIDE 9

fpgaConvNet – Modelling ConvNets with SDF

[Diagram: a ConvNet mapped to a hardware SDF graph; input data flows through the memory interface into a convolutional layer with 4 filters, then a nonlinearity layer and a pooling layer.]

SLIDE 10

fpgaConvNet – Design Space Perspective

Bottlenecks:
  – Limited compute resources
  – Limited off-chip memory bandwidth
  – Limited on-chip memory for model parameters

[Plot: throughput vs. area design space, marking the current design point and the resource bounds of FPGA 1 and FPGA 2.]

Define a set of actions to move around the design space.

SLIDE 11

Action 1: Coarse-grained Folding

4 Convolutions / cycle:
  1) Exceeding the available compute resources
  2) Not enough off-chip memory bandwidth

SLIDE 12

Action 1: Coarse-grained Folding

2 Convolutions / cycle: reduced compute resources and required bandwidth

Action 2: Fine-grained Folding
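The coarse-grained folding trade-off can be illustrated with a toy resource model (the per-unit constants below are assumptions, not the paper's cost model): halving the number of parallel convolution units halves both DSP usage and off-chip bandwidth demand, which is the move from 4 to 2 convolutions per cycle shown above, at the cost of throughput.

```python
# Sketch of the coarse-grained folding trade-off. The DSP and byte
# counts per convolution unit are illustrative assumptions.

def fold(parallel_convs, dsp_per_conv=9, bytes_per_conv=8, clock_hz=100e6):
    """Resource and bandwidth demand of a convolution stage performing
    `parallel_convs` convolutions per cycle."""
    return {
        "convs_per_cycle": parallel_convs,
        "dsps": parallel_convs * dsp_per_conv,
        "bandwidth_bytes_per_s": parallel_convs * bytes_per_conv * clock_hz,
    }

full   = fold(4)   # may exceed the device: too many DSPs, too much bandwidth
folded = fold(2)   # coarse-grained folding: half the demand, half the rate
print(full["dsps"], folded["dsps"])   # 36 18
```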

SLIDE 13

Action 3: Partitioning through Reconfiguration

When a single bitstream exceeds the available compute resources, or there is not enough on-chip memory, the pipeline (Input Data → Conv Layer 1 → Nonlin Layer 1 → Pool Layer 1 → Conv Layer 2 → Nonlin Layer 2 → Pool Layer 2) is partitioned into Subgraph 1, Subgraph 2 and Subgraph 3, each compiled to its own bitstream (Bitstream 1, Bitstream 2, Bitstream 3). Between FPGA reconfigurations, off-chip memory stages the data flow: Input Data → Intermediate Results → Intermediate Results → Final Results.
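The partitioning idea can be sketched as a greedy split of the layer pipeline into subgraphs that each fit an on-chip budget. The layer costs, the budget value, and the `partition` helper below are all hypothetical, chosen only to show the mechanism:

```python
# Sketch: greedy partitioning of a ConvNet into subgraphs that each fit
# an assumed on-chip resource budget; each subgraph becomes one bitstream,
# with intermediate results staged in off-chip memory between
# FPGA reconfigurations.

layers = [  # (name, assumed resource cost)
    ("Conv1", 50), ("Nonlin1", 5), ("Pool1", 10),
    ("Conv2", 60), ("Nonlin2", 5), ("Pool2", 10),
]

def partition(layers, budget):
    subgraphs, current, used = [], [], 0
    for name, cost in layers:
        if current and used + cost > budget:  # would overflow: cut here
            subgraphs.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    subgraphs.append(current)
    return subgraphs

print(partition(layers, budget=70))
# [['Conv1', 'Nonlin1', 'Pool1'], ['Conv2', 'Nonlin2'], ['Pool2']]
# -> three subgraphs, hence three bitstreams
```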

SLIDE 14

fpgaConvNet – SDF Analytical Power

  • Synchronous Data Flow
    – Actions as algebraic operations
    – Any local action propagates through the network
    – Static scheduling
    – Analytical Performance Model
    – Cast DSE to a formal resource-constrained optimisation

[Diagram: two alternative designs (Design 1, Design 2) for a stage with Window Size = K and Pool Size = P, showing hardware stages and their interconnections.]
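Casting DSE as a resource-constrained optimisation can be sketched with a toy analytical model. The per-unit DSP cost and ops-per-convolution figures are assumptions; only the 220-DSP budget comes from the evaluation platform on the next slide:

```python
# Sketch: design space exploration as "maximise predicted throughput
# subject to the DSP budget". The analytical model here is deliberately
# minimal; the paper's model is richer.

DSP_BUDGET = 220          # Zynq XC7Z020 (from the evaluation setup)
DSP_PER_UNIT = 9          # assumed DSPs per parallel convolution unit
CLOCK_HZ = 100e6

def throughput(parallel_units, ops_per_conv=18):
    """Predicted GOps/s under the toy model."""
    return parallel_units * ops_per_conv * CLOCK_HZ / 1e9

# Enumerate feasible folding factors and keep the fastest one.
best = max(
    (f for f in range(1, 65) if f * DSP_PER_UNIT <= DSP_BUDGET),
    key=throughput,
)
print(best, throughput(best))   # 24 43.2
```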

SLIDE 15

Evaluation - Experimental Setup

  • fpgaConvNet
    – Xilinx Zynq-7000 XC7Z020 SoC with 220 DSPs at 100 MHz
    – Q8.8 fixed-point precision to match existing work (also supports floating-point)
    – Current toolflow supports the Vivado HLS toolchain
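The Q8.8 format mentioned above is a 16-bit fixed-point representation with 8 integer and 8 fractional bits. A small sketch of quantising to it (the helper names are illustrative, not part of the fpgaConvNet toolflow):

```python
# Sketch of Q8.8 fixed-point: 16-bit signed values, 8 fractional bits.

FRAC_BITS = 8
SCALE = 1 << FRAC_BITS   # 256

def to_q8_8(x):
    """Quantise a float to Q8.8, saturating to the signed 16-bit range."""
    q = round(x * SCALE)
    return max(-(1 << 15), min((1 << 15) - 1, q))

def from_q8_8(q):
    """Convert a Q8.8 integer back to a float."""
    return q / SCALE

w = 0.15625                    # exactly representable: 40 / 256
print(from_q8_8(to_q8_8(w)))   # 0.15625, no rounding error for this value
```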

SLIDE 16

Performance Model Accuracy

[Chart: measured vs. predicted performance (GOps/s) for CFF, LeNet-5, MPCNN, CNP, Sign Recognition ConvNet, and Scene Labelling ConvNet.]

Error between 1.73% and 11.76%

SLIDE 17

fpgaConvNet vs. Existing FPGA Work

17

0.00005 0.0001 0.00015 0.0002 0.00025 0.0003 0.00035 0.0004 0.00045 0.0005

Hand-tuned [1] Memory-optimised [2]

Performance Density Comparison (GOps/s/Slice)

Existing Work (GOps/s/Slice) fpgaConvNet (GOps/s/Slice)

1.62×

[1] C. Farabet et al., “CNP: An FPGA-Based Processor for Convolutional Networks”, in FPL, IEEE, 2009.
[2] M. Peemen et al., “Memory-centric accelerator design for Convolutional Neural Networks”, in ICCD, IEEE, 2013.

SLIDE 18

fpgaConvNet vs. Existing Embedded GPU Work

[Chart: performance efficiency comparison (GOps/s/Watt) against a hand-tuned embedded GPU implementation [3].]

Hand-tuned Embedded GPU
  • Tegra K1 at 800 MHz
  • Memory Bandwidth: 12 GB/s

fpgaConvNet
  • Zynq-7000 XC7Z020 at 100 MHz
  • Memory Bandwidth: 4.26 GB/s

[3] L. Cavigelli et al., “Accelerating real-time embedded scene labeling with convolutional networks”, in DAC, ACM/EDAC/IEEE, 2015.

SLIDE 19

Conclusions

  • Caffe
  • TensorFlow
  • Theano
  • Torch

Deep Learning Developers

fpgaConvNet