
SLIDE 1

The Future Directions of Dataflow-Based Reconfigurable Hardware Accelerators

Francesca Palumbo (1), Claudio Rubattu (1, 2), Carlo Sau (3), Tiziana Fanni (3), Luigi Raffo (3)

(1) University of Sassari, PolComIng – Information Engineering Group
(2) University of Rennes, INSA Group
(3) University of Cagliari, Diee – Microelectronics and Bioengineering Group

Rennes, 12-14 December 2017

SLIDE 2

Outline

MDC Tool Summary

– Motivation and Approach – Current Functionalities and Future Directions

Hardware-Software Partitioning

– Co-Processing Support and Automated Characterization

Enhancing the MDC High-Level Synthesis Support

– Integration with the CAPH HLS engine

Run-time Monitoring of CGR Accelerators

– Extension of PAPI for dataflow in CGR hardware

Providing Further Degrees of Reconfigurability

– Mixed-Grain Reconfiguration Possibilities

SLIDE 4

MDC tool Summary

Motivations

HIGH PERFORMANCE: real time, portability, long battery life

UP-TO-DATE SOLUTIONS: latest audio/video codecs, file formats...

MORE INTEGRATED FEATURES: MP3, Camera, Video, GPS...

MARKET DEMAND: convenient form factor, affordable price, fashion

SLIDE 6

MDC tool Summary

Approach

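[Figure: a dataflow network (actors A, B, C, D) mapped 1:1 onto a coarse grained substrate containing A, B, C, D]

[Figure: two dataflow networks (actors A, B, C, D and actors A, D, E) mapped 2:1 onto a coarse grained reconfigurable substrate containing A, B, C, D, E and two SB blocks]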
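The 2:1 mapping can be illustrated with a toy merge, given here as a minimal sketch and not as MDC's actual algorithm. The slide does not show the edges of the two networks, so the topology below (`alpha`, `beta`) is an assumption, as are the function name `merge_networks` and the `SB_<actor>` naming. Only the input side is handled: an SB is placed in front of an actor that is fed by different producers in different networks.

```python
def merge_networks(networks):
    """Merge dataflow networks given as {name: [(src, dst), ...]}.

    Actors with the same name are shared. Where a shared actor is fed by
    different producers in different networks, a switching element (named
    SB here, as on the slide) is placed in front of it.
    """
    actors = set()
    producers = {}  # dst -> {src: [names of the networks using this edge]}
    for net, edges in networks.items():
        for src, dst in edges:
            actors.update((src, dst))
            producers.setdefault(dst, {}).setdefault(src, []).append(net)

    merged_edges = []
    sboxes = {}  # SB name -> {network name: producer selected for it}
    for dst, srcs in sorted(producers.items()):
        if len(srcs) == 1:
            (src,) = srcs
            merged_edges.append((src, dst))
        else:
            sb = f"SB_{dst}"
            sboxes[sb] = {net: src for src, nets in srcs.items() for net in nets}
            merged_edges.extend((src, sb) for src in sorted(srcs))
            merged_edges.append((sb, dst))
    return sorted(actors), merged_edges, sboxes


# Hypothetical topologies: the slide names the actors but not the edges.
networks = {
    "alpha": [("A", "B"), ("B", "C"), ("C", "D")],
    "beta": [("A", "E"), ("E", "D")],
}
actors, edges, sboxes = merge_networks(networks)
```

In this example the merged substrate holds five actors instead of the seven that two separate implementations would need, and the single SB records which producer feeds D in each configuration.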

SLIDE 11

MDC tool Summary

Current Functionalities

MDC design suite (http://sites.unica.it/rpct/):

– Dynamic Power Manager
– Multi Dataflow Composer Tool
– Structural Profiler
– Co-Processor Generator

Goals addressed:

– Functional Complexity
– Time to Market: Design & Mapping Automation
– Constraint Driven Optimisation
– Power Efficiency
– Fast Integration and Prototyping

SLIDE 16

MDC tool Summary: Future Directions

MDC design suite:

– Dynamic Power Manager
– Baseline MDC Tool
– Structural Profiler
– Co-Processor Generator

Future directions:

– HW/SW Partitioning
– Enhancing HLS
– Runtime Monitoring
– Reconfiguration Degrees

SLIDE 17

Outline

MDC Tool Summary

– Motivations and Approach – Current Functionalities and Future Directions

Hardware-Software Partitioning

– Co-Processing Support and Automated Characterization

Enhancing the MDC High-Level Synthesis Support

– Integration with the CAPH HLS engine

Run-time Monitoring of CGR Accelerators

– Extension of PAPI for dataflow in CGR hardware

Providing Further Degrees of Reconfigurability

– Mixed-Grain Reconfiguration Possibilities

SLIDE 20

Hardware-Software Partitioning

Co-Processing Support

MDC design suite: Dynamic Power Manager, Baseline MDC Tool (MDG+PC), Structural Profiler, Co-Processor Generator

MDC is a dataflow-based design suite for the development of coarse-grained reconfigurable systems, with the capability of generating co-processing units.

MDC assembles ready-to-use, platform-dependent IPs.

The designer can choose between memory-mapped and stream-based coupling.

SLIDE 23

Hardware-Software Partitioning

Automated Characterization

PREESM is a rapid prototyping tool that generates code for heterogeneous multi/many-core embedded systems. It maps actors to multiple processing cores, optimizing execution latency and balancing loads.

Model the costs of the available communication schemes and co-processing units.

Connect PREESM and MDC to delegate specific computations (an actor, a network of actors or a set of networks) to the most suitable co-processing units.
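The role of the cost model can be sketched with a greedy mapper. This is only an illustration under stated assumptions, not PREESM's scheduling algorithm: actors are treated as independent, and the unit names (`cpu0`, `cpu1`, `mdc_copr`), the actor names and all cycle counts are hypothetical. Each actor goes to the unit that would finish it earliest once the communication cost of reaching that unit is included.

```python
def map_actors(actors, exec_cost, comm_cost):
    """Greedy mapping: each actor goes to the unit that finishes it earliest.

    actors    : actor names, in the order they are considered
    exec_cost : {actor: {unit: cycles}}; a missing unit cannot run the actor
    comm_cost : {unit: cycles} paid per actor to move its data to that unit
    """
    busy_until = {unit: 0 for unit in comm_cost}
    mapping = {}
    for actor in actors:
        best_unit, best_finish = None, None
        for unit, cycles in exec_cost[actor].items():
            finish = busy_until[unit] + comm_cost[unit] + cycles
            if best_finish is None or finish < best_finish:
                best_unit, best_finish = unit, finish
        mapping[actor] = best_unit
        busy_until[best_unit] = best_finish
    return mapping, max(busy_until.values())


# Hypothetical characterization data (cycles), for illustration only.
exec_cost = {
    "filter": {"cpu0": 100, "cpu1": 100, "mdc_copr": 10},
    "scale": {"cpu0": 40, "cpu1": 40},
    "threshold": {"cpu0": 30, "cpu1": 30, "mdc_copr": 5},
}
comm_cost = {"cpu0": 0, "cpu1": 0, "mdc_copr": 20}
mapping, makespan = map_actors(["filter", "scale", "threshold"], exec_cost, comm_cost)
```

With these numbers the heavy `filter` actor is delegated to the co-processor even after paying for the transfer, while `threshold` stays on a core because the transfer would cost more than it saves.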

SLIDE 24

Outline

MDC Tool Summary

– Approach – Baseline Functionality and Extensions

Hardware-Software Partitioning

– Co-Processing Support and Automated Characterization

Enhancing the MDC High-Level Synthesis Support

– Integration with the CAPH HLS engine

Run-time Monitoring of CGR Accelerators

– Extension of PAPI for dataflow in CGR hardware

Providing Further Degrees of Reconfigurability

– Mixed-Grain Reconfiguration Possibilities

SLIDE 28

Enhancing MDC High-Level Synthesis Support

Previous Fully Automated Flow

[Figure: flow diagram. Recoverable labels: RVC-CAL dataflows (.cal, .xdf); Orcc front-end; IR (.java); MDC front-end; composition; optimisation; multi-dataflow; MDC back-end; generation; TURNUS; causation trace analysis; action weights; worst case parsing script; optimal FIFOs size per IR; multi-dataflow optimal FIFOs size; XRONOS high level synthesis; HDL components library; RVC-CAL hardware protocol; CGR substrate with SB]

High-Level Synthesis supports only FPGAs from one specific FPGA vendor (Xilinx).

SLIDE 32

Enhancing MDC High-Level Synthesis Support

CAPH

CAPH is a domain-specific language for describing and implementing stream-processing applications.

– It relies upon the actor/dataflow model of computation
– It is capable of generating VHDL code
– It is platform agnostic

SLIDE 38

Enhancing MDC High-Level Synthesis Support

Fully Automated Flow

[Figure: flow diagram. Recoverable labels: CAPH dataflows (.cph); CAPH-to-RVC-CAL; RVC-CAL dataflows (.cal, .xdf); Orcc front-end; IR (.java); MDC front-end; composition; optimisation; multi-dataflow; MDC back-end; generation; CAPH SystemC synthesis and simulation; worst case parsing script; optimal FIFOs size per dataflow; multi-dataflow optimal FIFOs size; CAPH High-Level Synthesis; HDL components library; CAPH protocol; CGR substrate with SB]

SLIDE 44

Enhancing MDC High-Level Synthesis Support

Protocol Generalization

[Figure: actors A and B in the CGR substrate, drawn twice. One drawing uses clk and rst; the other uses clock and reset and adds FIFO_B and FANOUT_A. Port labels: dout, wr, full; din, wr, full; dout, rd, empty; din, rd, empty]
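The port names on the slide (din/wr/full and dout/rd/empty) suggest a conventional FIFO handshake. The slide does not spell out the semantics, so the following behavioural model is a sketch under that assumption; the class name `HandshakeFifo` and its methods are hypothetical and do not come from CAPH or MDC.

```python
from collections import deque


class HandshakeFifo:
    """Behavioural model of a FIFO with full/empty flow control."""

    def __init__(self, depth):
        self.depth = depth
        self._data = deque()

    @property
    def full(self):
        # Producer side: a write is refused while this flag is set.
        return len(self._data) >= self.depth

    @property
    def empty(self):
        # Consumer side: a read returns nothing while this flag is set.
        return not self._data

    def write(self, din):
        """Model asserting wr with din; the token is accepted only if not full."""
        if self.full:
            return False
        self._data.append(din)
        return True

    def read(self):
        """Model asserting rd; returns dout, or None when the FIFO is empty."""
        if self.empty:
            return None
        return self._data.popleft()


fifo = HandshakeFifo(depth=2)
accepted = [fifo.write(token) for token in (1, 2, 3)]
first = fifo.read()
```

A producer that sees `full` simply stalls, which is how back-pressure propagates through a dataflow network without losing tokens.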

SLIDE 47

Enhancing MDC High-Level Synthesis Support

Prewitt/Sobel Multi-Flow Network

INPUT NETWORKS (provided by CAPH): PREWITT NETWORK, SOBEL NETWORK

OUTPUT NETWORK (provided by MDC): MERGED NETWORK
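The slide does not list the actors of each network. As a rough illustration of why the two filters are natural candidates for merging, the sketch below applies the standard horizontal Prewitt and Sobel kernels with the same 3x3 loop: only the weights change, the structure is shared. The function name `gradient_x` and the test image are assumptions made for this example.

```python
# Standard horizontal-gradient kernels; they differ only in the middle row.
PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]


def gradient_x(image, kernel):
    """Apply a 3x3 kernel (correlation, no padding) to a 2-D list of pixels."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(1, rows - 1):
        out_row = []
        for c in range(1, cols - 1):
            acc = 0
            for i in range(3):
                for j in range(3):
                    acc += kernel[i][j] * image[r + i - 1][c + j - 1]
            out_row.append(acc)
        out.append(out_row)
    return out


# A vertical edge: dark columns on the left, bright columns on the right.
image = [[0, 0, 10, 10] for _ in range(4)]
prewitt = gradient_x(image, PREWITT_X)
sobel = gradient_x(image, SOBEL_X)
```

Both filters respond to the edge; Sobel weights the centre row more heavily, so its response is larger on the same input.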

SLIDE 51

Enhancing MDC High-Level Synthesis Support

Preliminary Results

FPGA - Altera (5SGSMD5) and Xilinx (XC7VX485T)

RESOURCES         MDC+CAPH            MDC+XRONOS          XRONOS vs CAPH
                  Altera    Xilinx    Altera    Xilinx    Altera    Xilinx
REG               1484      780                 632                 18.97%
LOGIC             1047      2347                1533                34.68%
RAM               15                            6.5                 +100%
DSP               36        36                                      100%
MAX FREQ [MHz]    105.80    93.69               142.86              +58.50%
EXEC TIME [cck]   15340     15340               15348               +0.05%

ASIC - TSMC 45 nm CMOS technology

Prewitt/Sobel Multi-Flow
AREA [kGE]        269.82    466.90 (+73%)
Max Freq [MHz]    417.36    399.04 (-4.4%)

COMING SOON: EXPLORATION ON THE BENEFITS OF DATAFLOW-BASED HLS IN CGR ARCHITECTURES ON THE ROAD

SLIDE 52

Outline

MDC Tool Summary

– Motivations and Approach – Current Functionalities and Future Directions

Hardware-Software Partitioning

– Co-Processing Support and Automated Characterization

Enhancing the MDC High-Level Synthesis Support

– Integration with the CAPH HLS engine

Run-time Monitoring of CGR Accelerators

– Extension of PAPI for dataflow in CGR hardware

Providing Further Degrees of Reconfigurability

– Mixed-Grain Reconfiguration Possibilities

SLIDE 55

Run-time Monitoring of CGR Accelerators

PAPI for dataflow in software

@design time

– dataflow application (RVC-CAL; actors A, B, C, D)
– C code generation (with PAPI)

@run time

– PROCESSOR: C code processing
– PAPI registers reading (PMC): total instructions, type of operations, memory usage
– Energy estimation, based on board characterization

[Figure: PAPI estimation chart; axes: Power vs. Workpoint (1 to 4); series: Est, Real]
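One simple way to turn counter readings into an energy figure is a linear model, sketched below. The slides do not give the estimation formula, so this is an assumption: each counted event is charged a fixed energy obtained from board characterization. `PAPI_TOT_INS` and `PAPI_L1_DCM` are PAPI preset event names, used here only as dictionary keys; the counter values and the per-event coefficients are hypothetical.

```python
def estimate_energy_nj(counters, nj_per_event):
    """Linear energy model: sum of (event count x energy per event), in nJ.

    counters     : {event name: count read at run time}
    nj_per_event : {event name: nJ per event, from board characterization}
    """
    missing = set(counters) - set(nj_per_event)
    if missing:
        raise KeyError(f"no characterization for: {sorted(missing)}")
    return sum(nj_per_event[name] * count for name, count in counters.items())


# Hypothetical readings and coefficients, for illustration only.
counters = {"PAPI_TOT_INS": 1_000_000, "PAPI_L1_DCM": 2_000}
nj_per_event = {"PAPI_TOT_INS": 0.5, "PAPI_L1_DCM": 10.0}
energy = estimate_energy_nj(counters, nj_per_event)
```

Comparing such an estimate with a measured value for each workpoint is what the Est/Real chart on the slide reports.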

SLIDE 61

Run-time Monitoring of CGR Accelerators

Extension of PAPI for dataflow in CGR hardware

@design time

– dataflow applications (RVC-CAL; actors A, B, C, D)
– C code generation with PAPI
– MDC CGR accelerator generation with PAPI

PROCESSOR

– C code processing
– PAPI registers reading (PMC): total instructions, type of operations, memory usage
– Energy estimation, based on board characterization
– Configuration Manager (configurations α and β shown across the builds)

CGR accelerator

– FIFOs operation

[Figure: CGR accelerator with actors A, B, C, D, E, F, SB blocks (SB 1, SB 2), FIFO_A, FIFO_B, FIFO_C, FIFO_E, FIFO_F and a configurator (ID; sel0, sel1, sel2)]

SLIDE 62

Outline

MDC Tool Summary

– Motivations and Approach – Current Functionalities and Future Directions

Hardware-Software Partitioning

– Co-Processing Support and Automated Characterization

Enhancing the MDC High-Level Synthesis Support

– Integration with the CAPH HLS engine

Run-time Monitoring of CGR Accelerators

– Extension of PAPI for dataflow in CGR hardware

Providing Further Degrees of Reconfigurability

– Mixed-Grain Reconfiguration Possibilities

SLIDE 66

Providing Further Degrees of Reconfigurability

Fine-Grain and Partial Reconfiguration

[Figure: an FPGA drawn as LB blocks (LB 1 to LB 8, LUT 4x2) next to an MDC accelerator drawn as PU blocks (PU 1 to PU 5) with datapath (mul, sh), control (fsm) and /16 links]

FINE-GRAIN RECONFIGURATION (FPGA)

– bit-level reconfiguration
– very flexible (any kind of HDL-defined system)
– slow to configure (many switches and LUTs)
– big memory footprint (long configuration bitstream)

COARSE-GRAIN RECONFIGURATION (MDC ACCELERATOR)

– word-level reconfiguration
– limited flexibility (fixed set of predefined configurations)
– fast to configure (few switches)
– negligible memory footprint (log₂(#config) bits)

SLIDE 70

Providing Further Degrees of Reconfigurability

Fine-Grain and Partial Reconfiguration

[Figure: an FPGA drawn as LB blocks (LB 1 to LB 8) next to an MDC accelerator drawn as PU blocks (PU 1 to PU 5) with datapath (mul, sh), control (fsm) and /16 links; .bit files hold the system configurations]

DYNAMIC PARTIAL RECONFIGURATION (DPR): runtime reconfiguration of only a well-defined region of the FPGA

FINE-GRAIN RECONFIGURATION (FPGA)

– bit-level reconfiguration
– flexible (HDL systems previously implemented)
– time to configure typically in the order of ms
– memory footprint to be considered
– power consumption peak during reconfiguration

COARSE-GRAIN RECONFIGURATION (MDC ACCELERATOR)

– word-level reconfiguration
– limited flexibility (fixed set of predefined configurations)
– fast to configure (few switches)
– negligible memory footprint (log₂(#config) bits)
– power consumption due to reconfiguration

COMPLEMENTARITY

– DPR: BIG change, BIG overhead
– CGR: SMALL change, SMALL overhead
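The "log₂(#config) bits" footprint of coarse-grained reconfiguration can be made concrete with a small helper. This is a sketch: the function name `config_id_bits` is hypothetical, and the result is rounded up to whole bits, which the slide leaves implicit.

```python
def config_id_bits(num_configs):
    """Bits needed to select one of num_configs predefined configurations.

    Equal to ceil(log2(num_configs)), computed with integer arithmetic.
    """
    if num_configs < 1:
        raise ValueError("at least one configuration is required")
    return (num_configs - 1).bit_length()


# Three configurations (for example alpha, beta, gamma) need a 2-bit ID.
bits_for_three = config_id_bits(3)
bits_table = {n: config_id_bits(n) for n in (1, 2, 4, 5, 8, 1000)}
```

Even a thousand predefined configurations need only a 10-bit identifier, whereas DPR has to store and load a bitstream for every alternative.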

SLIDE 75

Providing Further Degrees of Reconfigurability

FG into CG reconfiguration

[Figure: CG reconfigurable substrate (PU0 to PU5) on the FPGA FG reconfigurable substrate; a DPR subjected region with variants PU1a, PU1b, PU1c, PU1d]

To be stored into the FPGA internal memory: PU1a.bit, PU1b.bit, PU1c.bit, PU1d.bit

SLIDE 85

Providing Further Degrees of Reconfigurability

CG into FG reconfiguration

[Figure: FPGA (FG reconfigurable substrate) with host processor, DDR, BUS if, REGS, LOCAL MEM, DMA and Ethernet CTRL; the DPR subjected region hosts a CG reconfigurable substrate made of PU blocks (CG0, later CG1); configuration labels α, β, γ, ω]

RUNTIME:

t0: config FG = CG0
t1: config CG = α
… SMALL context change …
t2: config CG = β
… SMALL context change …
t3: config CG = γ
… BIG context change …
t4: config FG = CG1
execute CG1, ω
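The runtime sequence can be read as a two-level decision: a small context change only rewrites the coarse-grained configuration, while a big one first reloads the DPR region. The sketch below replays the timeline under that reading. The function name `plan_reconfiguration`, the (substrate, configuration) state encoding and the ASCII names for α, β, γ, ω are assumptions, and treating ω as a configuration of CG1 is an interpretation of the figure.

```python
def plan_reconfiguration(current, target):
    """Return the steps that bring the system from current to target.

    A state is a (substrate, configuration) pair such as ("CG0", "alpha").
    Changing only the configuration is a coarse-grained switch; changing
    the substrate needs dynamic partial reconfiguration first.
    """
    cur_substrate, cur_config = current
    new_substrate, new_config = target
    steps = []
    if new_substrate != cur_substrate:
        # BIG context change: load another substrate into the DPR region.
        steps.append(f"config FG = {new_substrate}")
    if new_substrate != cur_substrate or new_config != cur_config:
        # SMALL context change: select another predefined configuration.
        steps.append(f"config CG = {new_config}")
    return steps


targets = [("CG0", "alpha"), ("CG0", "beta"), ("CG0", "gamma"), ("CG1", "omega")]
state = (None, None)
timeline = []
for target in targets:
    timeline.append(plan_reconfiguration(state, target))
    state = target
```

Only the first and last transitions pay the DPR cost; the two in between are handled entirely inside the coarse-grained substrate.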

SLIDE 88

Providing Further Degrees of Reconfigurability

CG into Artico³

Artico³ is a DPR-supporting architecture in charge of smartly managing performance, consumption and dependability.

– hardware acceleration
– hierarchical memory
– bus-based, DMA-enabled communication

Enhance flexibility by enabling CGR within Artico³ slots.

Exploit dataflow to facilitate/automate programmability.

SLIDE 93

Providing Further Degrees of Reconfigurability

The big picture within CERBERO

– PREESM: dataflow-based HW/SW and FGR/CGR partitioning
– PAPI: dataflow-based runtime monitoring of the system to trigger reconfiguration

[Figure: ALPHA, BETA and GAMMA dataflows and a MULTI-FLOW; accelerators (fine/coarse grain) with a Performance Monitor and a Fault Monitor; monitoring counters; evaluation of the monitors' output; fine/coarse-grained accelerator reconfiguration]

SLIDE 94

Thanks To …

EU Commission for funding the CERBERO (Cross-layer modEl-based fRamework for multi-oBjective dEsign of Reconfigurable systems in unceRtain hybRid envirOnments) project as part of the H2020 Programme under grant agreement No 732105.

Coordinator: Michal Masin (IBM), michaelm@il.ibm.com
Scientific Coordinator: Francesca Palumbo (UniSS), fpalumbo@uniss.it
Innovation Manager: Katiuscia Zedda (Abinsula), katiuscia.zedda@abinsula.com
Dissemination-Communication Manager: Francesco Regazzoni (USI), francesco.regazzoni@usi.ch

www.cerbero-h2020.eu
info@cerbero-h2020.eu
@CERBERO_h2020
