fpgaConvNet: A Framework for Mapping Convolutional Neural Networks on FPGAs
Stylianos I. Venieris, Christos-Savvas Bouganis stylianos.venieris10@imperial.ac.uk FCCM 2016, Washington DC 2 May 2016
Deep Learning and AI
Deep Learning Success
Image Recognition (Microsoft, 2015), Image Captioning (Microsoft, 2015), “Deep Face” (Facebook, 2014)
Deep Learning Developers
fpgaConvNet
Figure: fpgaConvNet toolflow. Inputs supplied by Deep Learning Expert: ConvNet Description, FPGA Target Platform Specifications. Stages: ConvNet Hardware Mapping, Automated Design Space Exploration. Output: Bitstream.
Figure: ConvNet layer sequence: convolutional + nonlinearity, pooling, convolutional + nonlinearity, pooling.
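The layer sequence above (convolution, nonlinearity, pooling) can be illustrated with a minimal pure-Python sketch. It is not taken from the slides: the function names, the choice of ReLU as the nonlinearity, and the toy input are all assumptions made for illustration only.

```python
# Illustrative sketch only: names and the ReLU nonlinearity are assumptions,
# not the fpgaConvNet implementation.

def conv2d(image, kernel):
    """'Valid' 2D sliding-window product-sum over a K x K window."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

def relu(fmap):
    """Element-wise nonlinearity (ReLU chosen here as an assumption)."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, p):
    """Non-overlapping P x P max pooling."""
    rows = len(fmap) // p
    cols = len(fmap[0]) // p
    return [
        [
            max(fmap[r * p + i][c * p + j]
                for i in range(p) for j in range(p))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

# Toy 5x5 input with value 5*r + c, and a 2x2 kernel (K = 2), pool size P = 2.
image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
kernel = [[-1.0, 0.0], [0.0, 1.0]]
out = max_pool(relu(conv2d(image, kernel)), 2)
```

With this toy input every convolution output equals 6.0, so the pooled 2x2 result is uniform. The point is only the order of operations, not the numbers.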
Streaming Analytical power
ConvNet Hardware SDF Graph
Figure: SDF graph of the ConvNet hardware. Labels: Memory Interface, Input Data.
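A synchronous dataflow (SDF) graph fixes how many tokens each node produces and consumes per firing, which is what makes static scheduling possible. As a rough sketch of that idea, the snippet below solves the balance equations for a simple chain of nodes. The function name and the token rates are hypothetical and do not come from the slides.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def repetition_vector(edges):
    """Smallest integer firing counts for a chain of SDF actors.

    edges[i] = (tokens produced by actor i, tokens consumed by actor i+1)
    per firing. Balance equation per edge: q[i] * produced == q[i+1] * consumed.
    """
    q = [Fraction(1)]
    for produced, consumed in edges:
        q.append(q[-1] * produced / consumed)
    # Scale the rational solution to the smallest positive integers.
    denom_lcm = reduce(lambda a, b: a * b // gcd(a, b),
                       (f.denominator for f in q), 1)
    ints = [int(f * denom_lcm) for f in q]
    common = reduce(gcd, ints)
    return [v // common for v in ints]

# Hypothetical rates: actor 0 emits 2 tokens, actor 1 consumes 3 and emits 3,
# actor 2 consumes 1.
q = repetition_vector([(2, 3), (3, 1)])
```

For these made-up rates the result is `[3, 2, 6]`: firing the three actors that many times returns every edge buffer to its initial state, so the schedule can repeat indefinitely.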
Figure: Design Space. Axes: Throughput, Area. Labels: Current Design Point, FPGA 1, FPGA 2.
Fine-grained Folding
Figure: ConvNet partitioned into subgraphs. Layers: Input Data, Conv Layer 1, Nonlin Layer 1, Pool Layer 1, Conv Layer 2, Nonlin Layer 2, Pool Layer 2. Subgraph 1, Subgraph 2 and Subgraph 3 map to Bitstream 1, Bitstream 2 and Bitstream 3.
Figure: Bitstream execution with Off-chip Memory. Labels: Input Data, Intermediate Results, Intermediate Results, Final Results.
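The partitioned execution suggested by these labels, where each subgraph has its own bitstream and intermediate results pass through off-chip memory, can be sketched as a simple loop. This is a software analogy under stated assumptions: `run_partitioned`, the stage functions, and the dictionary standing in for off-chip memory are all hypothetical names, not part of fpgaConvNet.

```python
def run_partitioned(subgraphs, input_data):
    """Run subgraphs one after another, as if reconfiguring between them.

    subgraphs: list of lists of stage functions (one list per bitstream).
    Returns the final results and the number of (simulated) reconfigurations.
    """
    off_chip = {"data": input_data}   # stand-in for off-chip memory
    reconfigurations = 0
    for stages in subgraphs:
        reconfigurations += 1         # simulated: load this subgraph's bitstream
        data = off_chip["data"]       # read input or intermediate results
        for stage in stages:
            data = stage(data)        # stream through the subgraph's stages
        off_chip["data"] = data       # write intermediate or final results
    return off_chip["data"], reconfigurations

# Hypothetical stages standing in for hardware layers.
double = lambda xs: [2 * x for x in xs]
add_one = lambda xs: [x + 1 for x in xs]
negate = lambda xs: [-x for x in xs]

results, loads = run_partitioned([[double], [add_one], [negate]], [1, 2, 3])
```

Each pass reads from and writes back to the same storage, so only one subgraph needs to be resident on the device at a time, at the cost of one reconfiguration per subgraph.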
Figure: Design 1 and Design 2. Parameters: Window Size = K, Pool Size = P.
- Actions as algebraic operations
- Any local action propagates through the network
- Static scheduling
- Analytical Performance Model
- Cast DSE to formal resource-constrained optimisation
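To make the last point concrete, design space exploration (DSE) as a resource-constrained optimisation can be sketched as an exhaustive search over per-layer parallelism. The cost model below is a deliberately simple assumption (throughput limited by the slowest stage, area linear in the number of units). The function name, parameters and numbers are hypothetical and are not the framework's actual model or search method.

```python
from itertools import product

def explore(workloads, unit_areas, area_budget, max_units=8):
    """Pick units per layer to maximise throughput under an area budget.

    Assumed model: layer i with u units processes u / workloads[i] inputs per
    cycle; the pipeline runs at the rate of its slowest layer; layer area is
    u * unit_areas[i].
    Returns (throughput, units, area) or None if nothing fits.
    """
    best = None
    for units in product(range(1, max_units + 1), repeat=len(workloads)):
        area = sum(u * a for u, a in zip(units, unit_areas))
        if area > area_budget:
            continue                  # violates the resource constraint
        throughput = min(u / w for u, w in zip(units, workloads))
        if best is None or throughput > best[0]:
            best = (throughput, units, area)
    return best

# Hypothetical two-layer network: layer 0 is 4x heavier than layer 1.
best = explore(workloads=[8, 2], unit_areas=[10, 5], area_budget=50, max_units=4)
```

For these made-up numbers the search settles on 4 units for the heavy layer and 1 for the light one, which balances the two stage rates within the budget. Exhaustive search is used only because the example is tiny. It does not scale to realistic design spaces.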
Hardware Stages Interconnections
Performance Model Accuracy
Figure: Measured Performance vs. Predicted Performance (GOps/s) for CFF, LeNet-5, MPCNN, CNP, Sign Recognition ConvNet and Scene Labelling ConvNet.
Error between 1.73% and 11.76%
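One common way to compute such an error figure is the relative difference between predicted and measured performance. The slide does not state the exact definition used, so the formula below (normalising by the measured value) is an assumption, and the input numbers are placeholders rather than results from the presentation.

```python
def relative_error_percent(predicted, measured):
    """Absolute prediction error as a percentage of the measured value.

    Assumed definition; the slides do not say which normalisation was used.
    """
    if measured == 0:
        raise ValueError("measured performance must be non-zero")
    return abs(predicted - measured) / abs(measured) * 100.0

# Placeholder values in GOps/s, chosen only to exercise the function.
err = relative_error_percent(predicted=10.0, measured=9.5)
```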
Performance Density Comparison (GOps/s/Slice)
Figure: Existing Work vs. fpgaConvNet (GOps/s/Slice). Baselines: Hand-tuned [1], Memory-optimised [2]. Annotation: 1.62×
[1] C. Farabet et al., “CNP: An FPGA-Based Processor for Convolutional Networks”, in FPL, IEEE, 2009.
[2] M. Peemen et al., “Memory-centric accelerator design for Convolutional Neural Networks”, in ICCD, IEEE, 2013.
Performance Efficiency Comparison (GOps/s/Watt)
Figure: Existing Work vs. fpgaConvNet (GOps/s/Watt). Baseline: Hand-tuned Embedded GPU [3].
[3] L. Cavigelli et al., “Accelerating real-time embedded scene labeling with convolutional networks”, in DAC, ACM/EDAC/IEEE, 2015.