VTA 03.12.2019 | TU Darmstadt | ESA | F. Stock | TaPaSCo

SLIDE 1

03.12.2019 | TU Darmstadt | ESA | F. Stock | 1

Samuel Groß, Florian Stock, Carsten Heinz, Jaco A. Hofmann, Lukas Sommer, Lukas Weber, Andreas Koch Embedded Systems and Applications Group, TU Darmstadt

TaPaSCo: Task-Parallel System Composer for FPGAs Deploy VTA on More Platforms

VTA

SLIDE 2

TaPaSCo Framework

  • Builds complete FPGA SoC designs from HLS kernels or custom HDL cores
  • Automates design-space exploration to determine the best system composition
  • Supports a wide variety of Xilinx platforms
  • Includes a software API for dispatching compute tasks to the FPGA
  • Available as free and open-source software

SLIDE 3

TaPaSCo Design Flow

tapasco compose [vta x 2, sobel x 3] @ 100 MHz -p vc709

  • Core name: vta (the VTA kernel), sobel
  • Core count: x 2, x 3
  • Design frequency: 100 MHz
  • Platform: vc709
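
The four parts of the compose command (core names, core counts, design frequency, platform) can be modelled as a small data structure. The following is a minimal sketch only: `CoreSpec`, `Composition`, and `to_compose_command` are hypothetical names introduced for illustration and are not part of TaPaSCo. It simply renders a composition in the command form shown on the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper types for illustration; NOT part of TaPaSCo.
struct CoreSpec {
    std::string name;  // core name, e.g. "vta"
    unsigned count;    // number of instances of this core
};

struct Composition {
    std::vector<CoreSpec> cores;  // cores to instantiate
    unsigned frequency_mhz;       // design frequency in MHz
    std::string platform;         // target platform, e.g. "vc709"
};

// Render the composition in the command form shown on the slide.
std::string to_compose_command(const Composition& c) {
    std::ostringstream out;
    out << "tapasco compose [";
    for (std::size_t i = 0; i < c.cores.size(); ++i) {
        if (i > 0) out << ", ";
        out << c.cores[i].name << " x " << c.cores[i].count;
    }
    out << "] @ " << c.frequency_mhz << " MHz -p " << c.platform;
    return out.str();
}
```

Filling a `Composition` with two `vta` cores and three `sobel` cores at 100 MHz for `vc709` reproduces the command above.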

SLIDE 4

TaPaSCo Architecture

SLIDE 5

TaPaSCo Architecture

SLIDE 6

TaPaSCo – VTA PE

SLIDE 7

TaPaSCo Software API

SLIDE 8

TaPaSCo Software API

TVM VTA

SLIDE 9

TaPaSCo Platforms

Datacenter

  • Xilinx Alveo U250
  • Xilinx Virtex UltraScale+ VCU1525
  • Xilinx Virtex UltraScale+ VCU118
  • Xilinx Virtex UltraScale VCU108
  • Digilent NetFPGA SUME
  • Xilinx Virtex VC709
  • Amazon AWS F1 instance

Edge Devices

  • Xilinx Zynq UltraScale+ MPSoC ZCU102
  • Xilinx Zynq SoC ZC706
  • AVNET ZedBoard
  • Digilent Pynq-Z1

SLIDE 10

TVM/VTA Stack

  • Advantages:
      • One generic driver
      • Many different platforms
      • Multiple VTA instances (WIP)
      • Larger VTA instances (WIP)

SLIDE 11

Shameless Advertising

Start building your own AWS F1 accelerator system using TaPaSCo!
Download TaPaSCo from GitHub: github.com/esa-tu-darmstadt/tapasco

SLIDE 12

ADDITIONAL BONUS SLIDES

SLIDE 13

TaPaSCo Software API – Example

    Tapasco tapasco;
    auto a_wrapped = makeWrappedPointer(a.data(), a.size());
    auto b_wrapped = makeWrappedPointer(b.data(), b.size());
    auto job = tapasco.launch(SIMPLE_HLS_ID, makeInOnly(a_wrapped), makeOutOnly(b_wrapped));
    job();

  • Wrap information about the data transfer (makeWrappedPointer)
  • Provide information about the data-transfer direction (makeInOnly, makeOutOnly)
  • Launch FPGA execution (tapasco.launch)
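
To illustrate what the direction wrappers communicate, here is a self-contained host-side simulation. It is a sketch under stated assumptions, not the TaPaSCo implementation: every type and function lives in a stand-in `sketch` namespace, the second argument of `makeWrappedPointer` is treated as an element count, and `run_simulated` with its element-doubling "kernel" is purely hypothetical. The point is only that an in-only buffer is copied to the device but never back, while an out-only buffer is copied back but never uploaded.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {  // stand-in types for illustration; NOT the TaPaSCo library

template <typename T>
struct WrappedPointer {  // pointer plus element count (assumption of this sketch)
    T* ptr;
    std::size_t count;
};

template <typename T> struct InOnly  { WrappedPointer<T> w; };  // host -> device only
template <typename T> struct OutOnly { WrappedPointer<T> w; };  // device -> host only

template <typename T>
WrappedPointer<T> makeWrappedPointer(T* p, std::size_t n) { return {p, n}; }

template <typename T>
InOnly<T> makeInOnly(WrappedPointer<T> w) { return {w}; }

template <typename T>
OutOnly<T> makeOutOnly(WrappedPointer<T> w) { return {w}; }

// Simulated device run: upload the in-only buffer, apply a hypothetical
// kernel (doubling each element), and download only the out-only buffer.
template <typename T>
void run_simulated(InOnly<T> in, OutOnly<T> out) {
    std::vector<T> dev_in(in.w.ptr, in.w.ptr + in.w.count);  // host -> device copy
    std::vector<T> dev_out(out.w.count);                      // no upload for out-only data
    for (std::size_t i = 0; i < dev_out.size() && i < dev_in.size(); ++i) {
        dev_out[i] = dev_in[i] * 2;                           // stand-in for the PE's work
    }
    for (std::size_t i = 0; i < dev_out.size(); ++i) {
        out.w.ptr[i] = dev_out[i];                            // device -> host copy
    }
    // The in-only buffer is deliberately not copied back to the host.
}

}  // namespace sketch
```

Tagging each argument with its direction lets a runtime skip the transfers that are not needed, which matters when host-to-device bandwidth is the bottleneck.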

SLIDE 14

TaPaSCo in the Cloud

  • Amazon deploys Xilinx VU9P FPGAs in AWS EC2 F1 instances
  • Most of the FPGA logic is freely programmable; all interfaces are routed through a fixed Shell provided by Amazon

[Figure: F1 FPGA layout with the labels Shell, Custom logic, DDR4 channel 3, and Optional DDR4 channels. Image source: Amazon]

SLIDE 15

TaPaSCo in the Cloud - Challenges

  • Shell provides only a few frequencies, but TaPaSCo supports arbitrary design frequencies
      • Include a custom clock controller in programmable logic
  • DMA engine in the Shell provides only limited throughput
      • Replace it with TaPaSCo's own DMA engine
  • Shell provides only 16 interrupts, not enough for the TaPaSCo architecture
      • Include a custom interrupt controller for translation
  • Memory controllers for 3 DDR channels have to be placed in custom logic
      • Careful timing necessary
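
The slide does not say how the interrupt translation works. As one possible scheme, assumed here purely for illustration, many PE interrupts can be multiplexed onto the 16 Shell lines, with a pending bitmap that lets the host find out which PEs actually fired. The class `IrqTranslator`, the modulo mapping, and the PE count of 128 are all hypothetical; only the limit of 16 Shell interrupts comes from the slide.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kShellIrqs = 16;  // Shell interrupt lines (from the slide)
constexpr std::size_t kPeIrqs = 128;    // hypothetical number of PE interrupt sources

// Software model of a hypothetical interrupt translator; not TaPaSCo's design.
class IrqTranslator {
public:
    // A PE raises its interrupt; returns the Shell line that gets asserted.
    std::size_t raise(std::size_t pe) {
        pending_.set(pe);        // remember the true source
        return pe % kShellIrqs;  // many PEs share one Shell line
    }

    // Host side: for a Shell line, collect and clear every pending PE mapped to it.
    std::vector<std::size_t> acknowledge(std::size_t line) {
        std::vector<std::size_t> sources;
        for (std::size_t pe = line; pe < kPeIrqs; pe += kShellIrqs) {
            if (pending_.test(pe)) {
                sources.push_back(pe);
                pending_.reset(pe);
            }
        }
        return sources;
    }

private:
    std::bitset<kPeIrqs> pending_;  // one pending bit per PE interrupt source
};
```

In this model, PEs 3 and 19 both assert Shell line 3, and a single `acknowledge(3)` call returns both sources in ascending order.
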

SLIDE 16

TaPaSCo in the Cloud - Conclusion

  • Completely automated toolflow to generate an SoC design from HLS code or a custom HDL core for Amazon AWS EC2 F1 FPGA instances
  • Generates a ready-to-use Amazon FPGA Image (AFI)
  • Supports up to four independent memory channels
  • Easy-to-use software API for interfacing with the FPGA accelerator
  • Available as open source!

SLIDE 17

Existing FPGA Acceleration Toolflow

SLIDE 18

Existing FPGA Accelerator Core