Ariane + NVDLA
Seamless Third-Party IP Integration with ESP
Davide Giri Kuan-Lin Chiu Guy Eichler Paolo Mantovani Nandhini Chandramoorthy (IBM Research) Luca P. Carloni
CARRV 2020
Motivation
SoCs are increasingly heterogeneous [1]
→ IP reuse enables the design of complex SoCs
→ Proliferation of open-source IPs
Seamless third-party IP integration is key!
[1] Shao, SLCA’15 [2] Khailani, DAC’18 [3] Gupta, IEEE Computer’17
Enhance ESP with support for third-party accelerators
[4] ESP: esp.cs.columbia.edu [5] Ariane: github.com/pulp-platform/ariane [6] NVDLA: nvdla.org
Demonstrate integration capabilities of ESP
Open-source release as part of ESP
[Figure: ESP design methodology]
Accelerator Flow: HLS Design Flows (Vivado HLS, Catapult HLS, Stratus HLS), RTL Design Flows
SoC Flow: IP Library, floorplanning GUI, SoC Integration, Rapid Prototyping
Tiles shown: Ariane, accelerator, third-party accelerator
** By lewing@isc.tamu.edu Larry Ewing and The GIMP * By Nvidia Corporation
[Figure: accelerator integration flows, with steps marked as automated or manual]
Third-party accelerator: Third-party RTL and SW files list; Accelerator definition (xml); RTL wrapper wiring; Makefile targets definition
ESP accelerator: Accelerator skeleton; Test behavior; Generate RTL; Test RTL; Instantiate into SoC; Accelerator-specific functions
ESP processor tile
placed in the I/O tile
NVIDIA Deep Learning Accelerator
NVDLA small
SoCs evaluated on FPGA (Xilinx XCVU440)
Evaluation networks
Performance of NVDLA small in ESP @ 50 MHz (1 NVDLA)
Network      frames / second
LeNet        3.8
Convnet      4.5
SimpleNet    1.3
ResNet50     0.4
Scaling NVDLA instances and DDR channels @ 50 MHz (LeNet)
Configuration          frames / second (normalized)
1 NVDLA, 1 mem ctrl    1
2 NVDLA, 2 mem ctrl    2.1
3 NVDLA, 3 mem ctrl    3.1
4 NVDLA, 4 mem ctrl    3.9
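To make the scaling result easier to read, the normalized LeNet values from the chart (1, 2.1, 3.1, 3.9 for one to four NVDLA instances, each paired with its own memory controller) can be turned into a per-instance scaling efficiency. This is only an illustrative sketch: the `scaling_efficiency` helper is hypothetical and not part of ESP, and the inputs are the rounded chart labels, so values slightly above 1.0 may reflect nothing more than rounding.

```python
# Normalized LeNet throughput read from the scaling chart
# (n NVDLA instances with n memory controllers, all at 50 MHz).
normalized_fps = {1: 1.0, 2: 2.1, 3: 3.1, 4: 3.9}


def scaling_efficiency(instances, speedup):
    """Speedup divided by instance count; 1.0 means perfectly linear scaling."""
    return speedup / instances


efficiency = {
    n: round(scaling_efficiency(n, speedup), 3)
    for n, speedup in normalized_fps.items()
}

for n, value in efficiency.items():
    print(f"{n} NVDLA: speedup {normalized_fps[n]:.1f}x, efficiency {value:.3f}")
```

An efficiency close to 1.0 in every configuration is what near-linear scaling looks like when DDR channels are added together with accelerator instances.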
18x lower than NVIDIA’s results @ 1GHz
performance preserved
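One way to read the "performance preserved" claim is to compare the throughput gap with the clock gap. The sketch below assumes that throughput scales linearly with clock frequency, which is an assumption made here for illustration and not something stated on the slides; the variable names are also illustrative.

```python
# Figures taken from the slides.
ESP_CLOCK_HZ = 50e6      # FPGA prototype clock
NVIDIA_CLOCK_HZ = 1e9    # clock of NVIDIA's reference results
THROUGHPUT_GAP = 18.0    # ESP frames/second is 18x lower

# How much slower the FPGA prototype is clocked.
clock_ratio = NVIDIA_CLOCK_HZ / ESP_CLOCK_HZ

# ASSUMPTION: throughput scales linearly with clock frequency.
# Ratio of ESP frames-per-cycle to the reference frames-per-cycle.
per_cycle_ratio = clock_ratio / THROUGHPUT_GAP

print(f"clock ratio: {clock_ratio:.0f}x")
print(f"per-cycle throughput relative to reference: {per_cycle_ratio:.2f}")
```

Under that assumption, a 20x slower clock accounts for the whole 18x throughput gap, so the integrated NVDLA delivers at least the same work per cycle as the reference.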
Thank you from the ESP team!
sld.cs.columbia.edu esp.cs.columbia.edu sld-columbia/esp ColumbiaSld ESP channel