PCIeHLS: Malte Vesper, Dirk Koch and Khoa Pham (PowerPoint presentation)



slide-1
SLIDE 1

PCIeHLS

Malte Vesper, Dirk Koch and Khoa Pham

slide-2
SLIDE 2

High-level synthesis – half a solution

  • Easy generation of kernels from popular languages
  • Good results require tuning with knowledge about FPGA architecture
  • No infrastructure for the kernel

[Figure labels: C++, C]

slide-3
SLIDE 3

Vendor kernel integration

  • For selected boards only; the popular academic board VC709 is missing

  • Vendor partial flow

Intel SDK for OpenCL

slide-4
SLIDE 4

Partial flow

Ours

  • Potentially calls for minor manual adjustments on the static system
  • Relocation of modules
  • Combining partial regions
  • Synthesis largely independent of static system
  • Synthesis of partial and static with different tool versions

Xilinx

  • Commercial stability

slide-5
SLIDE 5

Things you don’t want to know

  • ICAP
  • PCIe
  • Memory controller
  • Decoupling
  • Clock domain crossing
  • Timing closure
slide-6
SLIDE 6

System diagram

[Figure: system diagram with Module 0, Module 1, …, Module n and partial reconfiguration (ICAP); labels 256 and 32]

slide-7
SLIDE 7

Static System

Floorplan

  • Up to 4 user modules
  • Each user module ≈13.5% Slices
  • Static system ≈46.0% Slices
  • Adjacent user module areas can be combined

[Figure: floorplan with four UserModule areas]
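The slice budget on this slide can be checked with a little arithmetic. The sketch below assumes the quoted percentages are rounded, that shares simply add when adjacent areas are combined, and that any run of adjacent areas up to all four may be merged. The constant and function names are illustrative and not part of PCIeHLS.

```python
# Approximate slice shares quoted on the floorplan slide (percent of device).
USER_MODULE_SHARE = 13.5
STATIC_SHARE = 46.0
MAX_USER_MODULES = 4


def combined_share(adjacent_areas: int) -> float:
    """Slice share of a region built from adjacent user module areas.

    Assumption: shares simply add up when areas are combined.
    """
    if not 1 <= adjacent_areas <= MAX_USER_MODULES:
        raise ValueError("between 1 and 4 adjacent areas can be combined")
    return adjacent_areas * USER_MODULE_SHARE


def total_share() -> float:
    """Static system plus all four user module areas."""
    return STATIC_SHARE + combined_share(MAX_USER_MODULES)


if __name__ == "__main__":
    for n in range(1, MAX_USER_MODULES + 1):
        print(f"{n} area(s): about {combined_share(n):.1f}% of slices")
    print(f"static + user areas: about {total_share():.1f}%")
```

With the rounded figures from the slide, the static system and the four user areas add up to 100% of the slices.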

slide-8
SLIDE 8

Flow

slide-9
SLIDE 9

Steps of our flow

  • Bus macro
  • Clock constraining
  • Block:
      • Fabric differences
      • Sites used by static system
      • PIPs used by static system
  • Timing constraints
  • Cut out bitstream with Bitman
slide-10
SLIDE 10

Bus Macro

  • LUT – wire – LUT


slide-11
SLIDE 11

Bus Macro

  • LUT – wire – LUT
  • Constrained:
      • LOC/BEL

slide-12
SLIDE 12

Bus Macro

  • LUT – wire – LUT
  • Constrained:
      • LOC/BEL
      • LOCK_PINS

slide-13
SLIDE 13

Bus Macro

  • LUT – wire – LUT
  • Constrained:
      • LOC/BEL
      • LOCK_PINS
      • FIXED_ROUTE

slide-14
SLIDE 14

Bus Macro

  • LUT – wire – LUT
  • Constrained:
      • LOC/BEL
      • LOCK_PINS
      • FIXED_ROUTE
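LOC, BEL, LOCK_PINS and FIXED_ROUTE are Vivado properties applied with `set_property`. As a rough illustration of how one LUT of a bus macro and its wire might be pinned down, the sketch below generates the corresponding XDC lines. The cell, net and site names and the route string are hypothetical placeholders; a real FIXED_ROUTE value would be read back from a routed design. This is not the authors' script.

```python
def bus_macro_constraints(cell, net, site, bel, pin_map, route):
    """Return XDC lines that fix one bus-macro LUT and the net it drives.

    cell, net : hierarchical names in the netlist (hypothetical here)
    site, bel : placement target, e.g. a slice site and a LUT BEL
    pin_map   : logical-to-physical LUT pin mapping for LOCK_PINS
    route     : routing string for FIXED_ROUTE (placeholder here)
    """
    pins = " ".join(f"{logical}:{physical}" for logical, physical in pin_map.items())
    return [
        # BEL is set before LOC, the order Vivado expects for placement.
        f"set_property BEL {bel} [get_cells {cell}]",
        f"set_property LOC {site} [get_cells {cell}]",
        f"set_property LOCK_PINS {{{pins}}} [get_cells {cell}]",
        f"set_property FIXED_ROUTE {{{route}}} [get_nets {net}]",
    ]


if __name__ == "__main__":
    lines = bus_macro_constraints(
        cell="bus_macro/lut_out_0",      # hypothetical cell name
        net="bus_macro/wire_0",          # hypothetical net name
        site="SLICE_X10Y20",             # hypothetical site
        bel="A6LUT",
        pin_map={"I0": "A1"},
        route="ROUTE_STRING_FROM_ROUTED_DESIGN",  # placeholder, not a real route
    )
    print("\n".join(lines))
```

Fixing the placement, the pin mapping and the route together is what keeps the LUT – wire – LUT interface identical for every module that connects to it.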
slide-15
SLIDE 15

Clock constraining

  • Ensure clock is driven
  • Block other h-wires
  • Issues: timing differences on relocation, positive and negative skew

slide-16
SLIDE 16

Fabric differences

  • Special cells disturb regularity of fabric (e.g. PCIe, ICAP, …)
  • Simply block differences

[Figure: LUT sites and a PCIe block across Pblock 0 and Pblock 1]
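One way to picture "block differences" is to compare two regions position by position and collect every relative location where the fabric is not identical. The sketch below is only an illustration under that assumption: each region is modelled as a hypothetical mapping from a relative (column, row) offset to a site type, which is not how any vendor tool represents the device.

```python
def fabric_differences(region_a, region_b):
    """Return relative positions where two regions differ in site type.

    Each region maps (column, row) offsets from the region origin to a
    site-type string. A position present in only one region also counts
    as a difference.
    """
    positions = set(region_a) | set(region_b)
    return sorted(p for p in positions if region_a.get(p) != region_b.get(p))


if __name__ == "__main__":
    # Hypothetical miniature regions: one contains a PCIe block where the
    # other has an ordinary slice.
    pblock0 = {(0, 0): "SLICE", (1, 0): "SLICE", (2, 0): "PCIE"}
    pblock1 = {(0, 0): "SLICE", (1, 0): "SLICE", (2, 0): "SLICE"}
    print(fabric_differences(pblock0, pblock1))
```

Every position returned would be blocked in all regions, so a module built once uses only resources that exist at the same offset everywhere.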

slide-17
SLIDE 17

Sites used by static system

  • Block
  • I/O does not actually matter, since it is not reconfigured

[Figure: LUT and I/O sites in Pblock 0 and Pblock 1]
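The blocking step can be sketched as follows, assuming a relocatable module must avoid, in every candidate region, any site the static system occupies at the same relative offset in any of them. The sketch uses Vivado's PROHIBIT site property as one possible blocking mechanism; the slides do not say which mechanism the authors use, and the offsets and region origins are hypothetical.

```python
def sites_to_block(static_offsets, pblock_origins):
    """Translate static-system site offsets into site names per region.

    static_offsets : set of (dx, dy) offsets, relative to a region origin,
                     occupied by the static system in at least one region
    pblock_origins : list of (x, y) slice coordinates of each region origin
    """
    names = []
    for ox, oy in pblock_origins:
        for dx, dy in sorted(static_offsets):
            names.append(f"SLICE_X{ox + dx}Y{oy + dy}")
    return names


def prohibit_xdc(site_names):
    """Emit one PROHIBIT constraint per site (one possible way to block)."""
    return [f"set_property PROHIBIT true [get_sites {n}]" for n in site_names]


if __name__ == "__main__":
    # Hypothetical example: one static-system site at offset (1, 2),
    # two regions stacked 50 rows apart.
    blocked = sites_to_block({(1, 2)}, [(0, 0), (0, 50)])
    print("\n".join(prohibit_xdc(blocked)))
```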

slide-18
SLIDE 18


Optimization prevention

  • Floating wires tied off to 0
  • Optimization might remove logic


slide-19
SLIDE 19

Optimization prevention

  • Floating wires tied off to 0
  • Optimization might remove logic
  • A flop marked as DONT_TOUCH prevents logic optimization
  • Works for signals into the partial region as well

[Figure: AND gate with unknown inputs and a flop marked DONT_TOUCH]

slide-20
SLIDE 20

Routing used by static system

slide-21
SLIDE 21

Routing used by static system

  • Route a blocker from outside the PR region through the wires (PIPs)

[Figure: LUTs and INT tiles in Pblock 0 and Pblock 1]

slide-22
SLIDE 22

Timing constraints

  • Extract timing to the bus macro in the static system
  • Take the slowest path as WORST
  • Constrain the path from the partial module to the bus macro to period − WORST
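The budget calculation in these three steps can be written out as a minimal sketch. It assumes the static-side delays to the bus macro have already been extracted from a timing report and are given in picoseconds (integers, to avoid rounding noise). The function names and pin patterns are hypothetical. `set_max_delay` is a standard XDC command, but the slides do not confirm that the authors use it for this.

```python
def partial_budget_ps(period_ps: int, static_delays_ps: list[int]) -> int:
    """Time left for the partial module side: period minus WORST."""
    if not static_delays_ps:
        raise ValueError("no static-side delays given")
    worst = max(static_delays_ps)  # slowest static path to the bus macro
    budget = period_ps - worst
    if budget <= 0:
        raise ValueError("static side already uses the whole clock period")
    return budget


def max_delay_xdc(budget_ps: int, from_pins: str, to_pins: str) -> str:
    """Format the budget as an XDC max-delay constraint (value in ns)."""
    return (
        f"set_max_delay {budget_ps / 1000:.3f} "
        f"-from [get_pins {from_pins}] -to [get_pins {to_pins}]"
    )


if __name__ == "__main__":
    # Hypothetical numbers: 4 ns clock period, three extracted static delays.
    budget = partial_budget_ps(4000, [800, 1100, 950])
    print(max_delay_xdc(budget, "module/*/C", "bus_macro/*/I0"))
```

Because the budget is derived from the slowest static path, a module that meets it should meet timing at every bus macro it may be connected to, under the assumptions above.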
slide-23
SLIDE 23

Static System

Bitman cutting

  • Extract partial bitstreams
  • Relocate bitstreams for modules
slide-24
SLIDE 24

Summary

  • Build modules:
      • Once, use in multiple locations
      • Independent of static system
  • Infrastructure provided:
      • ICAP partial reconfiguration
      • PCIe link to host
      • MMCM to adjust clock for partial modules
      • Memory
slide-25
SLIDE 25

Thank you for your attention

Questions