

SLIDE 1

A distributed model of computation for reconfigurable devices based on a streaming architecture

Paolo Cretaro National Institute for Nuclear Physics FPL 2019 Barcelona, September 2019

SLIDE 2

The ExaNeSt project: hardware highlights

Unit: Xilinx Zynq UltraScale+ FPGA
 Four 64-bit ARM Cortex-A53 cores @ 1.5 GHz
 Programmable logic
 16 high-speed serial links @ 16 Gbps

Node: Quad-FPGA Daughter-Board (QFDB)
 All-to-all internal connectivity
 10 HSS links to remote QFDBs (through the network FPGA)
 64 GB DDR4 RAM (16 GB per FPGA)
 512 GB NVMe SSD on the storage FPGA

Blade/mezzanine
 4 QFDBs in Track 1
 2 HSS links per edge (local direct network)
 32 SFP+ connectors for the inter-mezzanine hybrid network

10/09/2019 Paolo Cretaro - FPL2019 2

I worked on the team that built the 3D torus network, based on a custom Virtual Cut-Through protocol

SLIDE 3

Mixing acceleration and network

 With High-Level Synthesis tools, FPGAs are becoming a viable way to accelerate tasks
 Accelerators must be able to access the network directly to achieve low-latency communication among themselves and with other remote hosts
 A dataflow programming paradigm could take advantage of this feature to optimize communication patterns and loads


[Diagram: two hosts (CPU + DDR) on a system memory-mapped bus; an accelerator and a network interface sit on the bus, with the network interface attached directly to the network]

SLIDE 4

Kahn process networks: advantages

A group of sequential processes communicating through FIFO channels
 Determinism: for the same input history, the network produces exactly the same output
 No shared memory: processes run concurrently and synchronize through blocking reads on input channel FIFOs
 Distributing tasks over multiple devices is easy
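The properties above can be sketched in software: a minimal Kahn process network with two sequential processes connected by a FIFO channel, where the consumer synchronizes only through blocking reads (all names here are illustrative, not part of the actual framework).

```python
import threading
import queue

def producer(out_ch, values):
    # Sequential process: writes its tokens to the output FIFO channel
    for v in values:
        out_ch.put(v)
    out_ch.put(None)  # end-of-stream marker

def doubler(in_ch, out_ch):
    # Blocking read on the input FIFO: get() waits until a token arrives,
    # which is the only synchronization mechanism (no shared memory)
    while True:
        v = in_ch.get()
        if v is None:
            out_ch.put(None)
            break
        out_ch.put(2 * v)

def run_network(values):
    a_to_b = queue.Queue()   # FIFO channel: producer -> doubler
    b_to_sink = queue.Queue()  # FIFO channel: doubler -> sink
    ta = threading.Thread(target=producer, args=(a_to_b, values))
    tb = threading.Thread(target=doubler, args=(a_to_b, b_to_sink))
    ta.start()
    tb.start()
    result = []
    while True:  # the sink collects tokens until end-of-stream
        v = b_to_sink.get()
        if v is None:
            break
        result.append(v)
    ta.join()
    tb.join()
    return result

print(run_network([1, 2, 3]))  # -> [2, 4, 6]
```

Because each process only blocks on its own input channel, the output depends solely on the input history, regardless of how the threads are scheduled.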


[Diagram: a Kahn process network with processes A, B, C connected by FIFO channels]

SLIDE 5

Accelerator hardware interface

 Virtual input/output channels for each source/destination
 Direct host memory access for buffering and configuration (a device driver is needed)
 Direct coupling with the network
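One way to picture the adapter's role: several virtual channels share one physical link by tagging each token with its channel id, and the receiving side demultiplexes tokens into per-channel FIFOs. This is a behavioral sketch only (class and method names are hypothetical, not the hardware interface).

```python
from collections import deque

class ChannelAdapter:
    """Multiplexes virtual I/O channels over a single physical link."""

    def __init__(self, n_channels):
        self.link = deque()  # models the shared physical link
        self.rx = [deque() for _ in range(n_channels)]  # per-channel FIFOs

    def send(self, channel, token):
        # Tag the token with its virtual-channel id before transmission
        self.link.append((channel, token))

    def deliver(self):
        # Demultiplex: route each tagged token to its destination FIFO
        while self.link:
            ch, token = self.link.popleft()
            self.rx[ch].append(token)

    def recv(self, channel):
        # Consume the next token from one virtual channel's FIFO
        return self.rx[channel].popleft()

adapter = ChannelAdapter(2)
adapter.send(0, "a")
adapter.send(1, "b")
adapter.deliver()
print(adapter.recv(1))  # -> b
```

Tokens on different virtual channels never block each other after demultiplexing, which is what lets one physical link serve many source/destination pairs.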


[Diagram: two acceleration cores, each connected through adapters to host memory and to the network]

SLIDE 6

Steps description

1. Write kernels in HLS
2. A config file delineates tasks and data dependencies
3. A directed graph is built and mapped onto the network topology
4. Accelerator blocks are flashed on targeted nodes
5. Data is fed into entry points and tasks are started
6. Each task consumes its data and sends the results to the next ones
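Steps 2 and 3 can be sketched as follows: parse the task/dependency description into a directed graph, then place tasks on compute units. The config encoding and the round-robin placement here are deliberately simplified assumptions, not the project's actual mapping algorithm.

```python
def build_graph(config):
    # config: {device: {task: [(dst_device, dst_task), ...]}}
    # Returns the directed edges of the task graph.
    edges = []
    for dev, tasks in config.items():
        for task, outputs in tasks.items():
            for dst in outputs:
                edges.append(((dev, task), dst))
    return edges

def map_to_nodes(edges, compute_units):
    # Naive placement: assign tasks to compute units round-robin,
    # in the order they first appear in the graph.
    tasks = []
    for src, dst in edges:
        for t in (src, dst):
            if t not in tasks:
                tasks.append(t)
    return {t: compute_units[i % len(compute_units)]
            for i, t in enumerate(tasks)}

# Mirrors the simplified configuration on the next slide
config = {
    "Device0": {"Task0": [("Device1", "Task0")],
                "Task1": [("Device1", "Task0")]},
    "Device1": {"Task0": [("Device1", "Task1")],
                "Task1": []},
}
edges = build_graph(config)
placement = map_to_nodes(edges, ["CU0", "CU1", "CU2"])
```

A real mapper would also weigh link bandwidth and topology distance; this sketch only shows the graph-then-placement structure of the flow.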


[Diagram: task graph A, B, C, D, E mapped onto a grid of compute units: CU 0/A, CU 1/B, CU 2, CU 3/C, CU 4/D,E, CU 5, CU 6, CU 7, CU 8]

SLIDE 7

Simplified task graph configuration example


Device0 {
  Type: FPGA
  Task0 {
    Impl: source_task.c
    Input_channels: 0
    Output_channels {
      Ch0: Device1.Task0.Ch1
    }
  }
  Task1 {
    Impl: source_task.c
    Input_channels: 0
    Output_channels {
      Ch0: Device1.Task0.Ch0
    }
  }
}
Device1 {
  Type: FPGA
  Task0 {
    Impl: example_task.c
    Input_channels: 2
    Output_channels {
      Ch0: Device1.Task1.Ch0
    }
  }
  Task1 {
    Impl: sink_task.c
    Input_channels: 1
  }
}

SLIDE 8

Thank you!
