Introduction to Metal FS and FPGA Programming Hands-On

slide-1
SLIDE 1

Introduction to Metal FS and FPGA Programming Hands-On

Robert Schmid, Max Plauth, Sven Köhler, Lukas Wenzel and Andreas Polze Operating Systems and Middleware Group 19.06.2019

slide-2
SLIDE 2

Interest in FPGAs is growing (again)

Field-Programmable Gate Array: programmable hardware circuit

Algorithms are represented as a hardware configuration

Reasons for using FPGAs

Energy efficiency

Parallel and pipelined data processing

‘Computing wires’

Technology Advancements

New generation of interconnects (OpenCAPI, CCIX, ...)

High-Level Synthesis (HLS) languages

‘Accelerators become first-class citizens in the system’

[Figure: FPGA structure with programmable interconnect, logic blocks, IO blocks, and RAM/ALU/... blocks]

19.06.2019 Robert Schmid ParProg 2019 Metal FS Chart 2

slide-3
SLIDE 3

First-class citizens?

How should end-users interact with FPGAs?

Just like with any other executable program!

Analogy: Built-in UNIX tools (cat, grep, sed, awk, …)

Do one thing, and do it well!

$ echo "Hello World" | fpga-encrypt -k key.bin > encrypted_file.bin

In this command, fpga-encrypt is the ‘Operator’, | is a UNIX pipe, and > redirects standard output to a file.

slide-4
SLIDE 4

Metal FS

Goal: Improve the accessibility of FPGA accelerators using a file system abstraction

Foundations: IBM POWER CAPI + SNAP, Xilinx Vivado

slide-5
SLIDE 5

Operators are specified in Vivado HLS

void my_metal_operator(mtl_stream &in, mtl_stream &out, snapu64_t offset) {
    mtl_stream_element element;
    do {
        element = in.read();
        element.data += offset;
        out.write(element);
    } while (!element.last);
}

[Figure: an Operator with Configuration, Input Stream, and Output Stream]
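
The operator above can be tried out in ordinary C++ before any hardware is involved. The following is a minimal sketch, assuming a queue-backed stand-in for mtl_stream and a plain 64-bit integer in place of snapu64_t. Both stand-ins are hypothetical and are not the Metal FS or Vivado HLS definitions; only the control flow of the operator is taken from the slide.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Hypothetical software stand-in for the HLS stream element type.
struct mtl_stream_element {
    std::uint64_t data;  // 64-bit payload word
    std::uint8_t keep;   // one bit per valid byte
    bool last;           // marks the final element of the stream
};

// Hypothetical stand-in for mtl_stream: a simple FIFO with read() and write().
class mtl_stream {
public:
    mtl_stream_element read() {
        assert(!fifo_.empty());  // real hardware would stall here instead
        mtl_stream_element e = fifo_.front();
        fifo_.pop();
        return e;
    }
    void write(const mtl_stream_element &e) { fifo_.push(e); }
    bool empty() const { return fifo_.empty(); }

private:
    std::queue<mtl_stream_element> fifo_;
};

// Same control flow as the operator on the slide: add the offset to every
// element and stop after the element that carries the last flag.
void my_metal_operator(mtl_stream &in, mtl_stream &out, std::uint64_t offset) {
    mtl_stream_element element;
    do {
        element = in.read();
        element.data += offset;
        out.write(element);
    } while (!element.last);
}
```

Feeding two elements with data 1 and 2 and an offset of 10 yields 11 and 12 on the output stream, with the last flag preserved on the second element.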

slide-6
SLIDE 6

Chaining Operators

$ cat encrypted_file.bin | fpga-decrypt | fpga-uppercase
HELLO WORLD

What happens here?

In between FPGA processing steps, data should not be copied to the CPU’s main memory (slow)

In conclusion:

Multiple operators must be deployed on the FPGA at once

The active subset and order of operators should be configurable at runtime

slide-7
SLIDE 7

Metal FS Operator Pipelines

Streaming data from different types of memory

Composition of pipelines by using an AXI Stream Switch

[Figure: Stream Switch connecting Blowfish Encrypt, Blowfish Decrypt, Change Case, Host Memory, Non-volatile Memory, and No-op]

C++ API

OperatorRegistry registry;
auto encrypt = registry.operators().at("encrypt");
encrypt->setOption("key", keyBuffer);
auto dataSource = create_data_source(inputBuffer);
auto dataSink = create_data_sink(outputBuffer);
PipelineDefinition pipeline({ dataSource, encrypt, dataSink });
pipeline.run();

slide-8
SLIDE 8

Composition of Hardware Components: Block Design

[Figure: Vivado block design. Components include the SNAP action (snap_action), AXI Crossbars (axi_metal_ctrl_crossbar, axi_host_mem_crossbar), an AXI Protocol Converter (axi_metal_cpc), two AXI4-Stream Switches (metal_switch, data_selector), AXI DataMovers (axi_datamover_mm2s, axi_datamover_s2mm), the colorfilter operator (op_colorfilter), hls_streamgen, axi_streamsink, an AXI Performance Monitor (axi_perf_mon_0), an AXI SmartConnect (dm_smartconnect), and interrupt concatenation logic (interrupt_concat)]

slide-9
SLIDE 9

Metal FS: Architecture Overview

[Figure: layered architecture; Operator Pipelines span the CPU (‘Host’) and the FPGA]

Foundations: IBM POWER CAPI + SNAP, Xilinx Vivado

Operator Pipelines: AXI Stream Switch, C++ API

slide-10
SLIDE 10

Metal FS Hybrid Filesystem

Leverage the NVMe storage on the Nallatech N250S FPGA card

One use case for Operator Pipelines

File system metadata is maintained in an LMDB key-value store on the host

inodes, directory entries, free extents

Block Mapper on the FPGA translates file offsets to physical addresses using extent lists

All file accesses are implemented as Operator Pipelines (Read -> Write)

Data transformations can be transparently added (e.g. encryption)
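
The offset translation done by the Block Mapper can be illustrated with a small software model. This is a sketch under stated assumptions, not the Metal FS hardware implementation: the Extent layout, the 4096-byte block size, and the function name map_offset are all hypothetical and chosen only to show how an extent list maps a file offset to a physical address.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical extent: a run of contiguous physical blocks belonging to a file.
struct Extent {
    std::uint64_t start_block;  // first physical block of the run
    std::uint64_t length;       // number of blocks in the run
};

// Assumed block size in bytes (not taken from the slides).
const std::uint64_t kBlockSize = 4096;

// Walks the extent list in file order. On success, stores the physical byte
// address for file_offset in physical and returns true. Returns false when
// the offset lies beyond the blocks covered by the extent list.
bool map_offset(const std::vector<Extent> &extents, std::uint64_t file_offset,
                std::uint64_t &physical) {
    std::uint64_t logical_block = file_offset / kBlockSize;
    for (const Extent &extent : extents) {
        if (logical_block < extent.length) {
            physical = (extent.start_block + logical_block) * kBlockSize
                       + file_offset % kBlockSize;
            return true;
        }
        logical_block -= extent.length;  // skip the blocks of this extent
    }
    return false;
}
```

With two extents covering physical blocks 100 to 101 and 500 to 502, file offset 2 * 4096 + 10 falls into the first block of the second extent, so it maps to physical address 500 * 4096 + 10.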

slide-11
SLIDE 11

Metal FS: Architecture Overview

[Figure: layered architecture; Operator Pipelines span the CPU (‘Host’) and the FPGA]

Foundations: IBM POWER CAPI + SNAP, Xilinx Vivado

Operator Pipelines: AXI Stream Switch, C++ API

Hybrid File System: Data Sources and Sinks, Block Mapper, Filesystem Metadata Store

slide-12
SLIDE 12

Metal FS FUSE Filesystem

Users can mount Metal FS as a Linux file system

Implemented in user space

Example:

$ cp ~/orders.tbl /metal_fs/files/

slide-13
SLIDE 13

Metal FS Symbolic Executables

$ echo "Hello World" \
    | /metal_fs/operators/change_case \
    | /metal_fs/operators/encrypt \
    | /metal_fs/operators/decrypt

[Figure: the change_case, encrypt, and decrypt processes connected to the File System Driver & Pipeline Orchestrator Process. Message flow via UNIX socket, data flow via memory-mapped files]

slide-14
SLIDE 14

Metal FS: Architecture Overview

[Figure: layered architecture; Operator Pipelines span the CPU (‘Host’) and the FPGA]

Foundations: IBM POWER CAPI + SNAP, Xilinx Vivado

Operator Pipelines: AXI Stream Switch, C++ API

Hybrid File System: Data Sources and Sinks, Block Mapper, Filesystem Metadata Store

User Interface & Instrumentation: Linux Filesystem Driver, Symbolic Executables, AXI Performance Monitor

slide-15
SLIDE 15

Demo

slide-16
SLIDE 16


Demo Screencast

slide-17
SLIDE 17

Hands-On: Prepare the Metal FS Simulation

Steps:

1. git clone https://github.com/rs22/metalfs-workshop

2. Start the development container by using the start script: start_linux, start_osx, start_win.bat

3. Build a simulation image: make model

slide-18
SLIDE 18

Anatomy of an Operator in Vivado HLS

struct mtl_stream_element {
    ap_uint<64> data;
    ap_uint<8> keep;
    ap_uint<1> last;
};

void my_operator(mtl_stream &in, mtl_stream &out) {
    mtl_stream_element element;
    do {
        element = in.read();
        out.write(element);
    } while (!element.last);
}

HLS translates the mtl_stream references into AXI Stream interfaces

We require the keep and last signals, which are optional signals (TKEEP and TLAST) in the AXI Stream protocol
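
To see what the keep and last fields carry, it helps to split a byte buffer into 8-byte stream elements by hand. The sketch below is plain C++ with a hypothetical stand-in for mtl_stream_element and a hypothetical helper to_stream. It assumes that byte i of a word occupies bits 8*i+7 down to 8*i and is flagged by keep bit i, which follows the usual AXI Stream TKEEP convention; the byte order actually used by Metal FS may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical software stand-in for the HLS stream element type.
struct mtl_stream_element {
    std::uint64_t data;  // up to 8 payload bytes
    std::uint8_t keep;   // bit i set means byte i of data is valid
    bool last;           // set on the final element only
};

// Splits a byte buffer into 8-byte stream elements.
// Assumption: byte i of each word is stored in bits [8*i+7 : 8*i].
std::vector<mtl_stream_element> to_stream(const std::vector<std::uint8_t> &bytes) {
    std::vector<mtl_stream_element> elements;
    for (std::size_t pos = 0; pos < bytes.size(); pos += 8) {
        std::size_t remaining = bytes.size() - pos;
        std::size_t n = remaining < 8 ? remaining : 8;
        mtl_stream_element e{0, 0, false};
        for (std::size_t i = 0; i < n; ++i) {
            e.data |= static_cast<std::uint64_t>(bytes[pos + i]) << (8 * i);
            e.keep |= static_cast<std::uint8_t>(1u << i);
        }
        e.last = (pos + n == bytes.size());
        elements.push_back(e);
    }
    return elements;
}
```

A 10-byte buffer produces two elements: the first has keep 0xFF and last unset, the second has keep 0x03 (two valid bytes) and last set. This is why an operator that ignores keep may process padding bytes in the final word.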

slide-19
SLIDE 19

Programming with HLS: Arbitrary-Precision integers

HLS offers the ap_uint types for integers with arbitrary bit precision

snapu{8, 16, 32, 64}_t are typedefs for ap_uint<>

Access bit ranges of an ap_uint like this (similar to VHDL):

snapu16_t my_integer;
snapu8_t high_byte = my_integer(15, 8);

Concatenate integers:

snapu8_t high_byte = 0xFF;
snapu8_t low_byte = 0x0A;
snapu16_t both_bytes = (high_byte, low_byte);
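
For readers without the HLS headers at hand, the two ap_uint operations correspond to ordinary shifts and masks. The following sketch shows plain C++ equivalents for 16-bit values; the helper names bit_range and concat_bytes are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Plain C++ equivalent of my_integer(hi, lo) for a 16-bit value:
// returns bits hi..lo (inclusive), shifted down to bit 0.
std::uint16_t bit_range(std::uint16_t value, unsigned hi, unsigned lo) {
    assert(hi < 16 && lo <= hi);
    unsigned width = hi - lo + 1;
    std::uint32_t mask = (1u << width) - 1u;
    return static_cast<std::uint16_t>((value >> lo) & mask);
}

// Plain C++ equivalent of (high_byte, low_byte): the first operand ends up
// in the most significant bits of the result.
std::uint16_t concat_bytes(std::uint8_t high_byte, std::uint8_t low_byte) {
    return static_cast<std::uint16_t>((high_byte << 8) | low_byte);
}
```

With these helpers, bit_range(0xFF0A, 15, 8) gives 0xFF and concat_bytes(0xFF, 0x0A) gives 0xFF0A, matching the values used on the slide.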

slide-20
SLIDE 20

Hands-On: Run the Metal FS Simulation

Steps:

1. Build a new model: make model

2. Start the simulation: make sim

3. In the simulation window:

snap_maint

metal_fs /mnt

4. Start a second shell in the container using the start script, then run:

cat src/hls_operator_colorfilter/apples_simulation.bmp \
    | /mnt/operators/colorfilter \
    > out.bmp

slide-21
SLIDE 21

Hands-On: Implement a Grayscale Filter operator

Goal: Implement an operator that processes a bitmap image and converts it to grayscale, except for pixels where red is the dominant color

Operator is prepared in src/hls_operator_colorfilter/hls_operator_colorfilter.cpp

Bitmap header is not aligned to a stream word boundary

The template temporarily inserts padding to make processing easier

Task 1: Exclude the bitmap header data from being transformed

Task 2: Leave those pixels unmodified where red is the dominant color

The operator code can be compiled into software, which is useful for testing

src/hls_operator_colorfilter $ make test
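
The per-pixel decision behind Task 2 can be sketched in plain C++ before moving it into the operator. This is one possible interpretation, not the workshop's reference solution. It assumes 24-bit pixels stored in blue, green, red order (as in standard 24-bit BMP files), treats red as dominant when it is strictly greater than both other channels, and uses an unweighted average as the gray value. The names Pixel, red_is_dominant, and filter_pixel are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed pixel layout: 24-bit BMP stores channels as blue, green, red.
struct Pixel {
    std::uint8_t blue;
    std::uint8_t green;
    std::uint8_t red;
};

// Assumed rule: red dominates when it exceeds both green and blue.
bool red_is_dominant(const Pixel &p) {
    return p.red > p.green && p.red > p.blue;
}

// Keeps red-dominant pixels unchanged and converts all others to gray
// using an unweighted average of the three channels (an assumption).
Pixel filter_pixel(const Pixel &p) {
    if (red_is_dominant(p)) {
        return p;
    }
    std::uint8_t gray = static_cast<std::uint8_t>((p.red + p.green + p.blue) / 3);
    return Pixel{gray, gray, gray};
}
```

A pixel with blue 10, green 20, red 200 passes through unchanged, while a pixel with blue 90, green 60, red 30 becomes gray with all channels set to 60. In the operator itself, the same rule has to be applied to the pixels packed into each 64-bit stream word.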

slide-22
SLIDE 22

Hands-On: Simulating the Operator Implementation

Try out your implementation in the simulation environment

Shell 1:
$ make model
$ make sim

Simulation Shell:
$ snap_maint
$ metal_fs /mnt

Shell 2:
# -p enables profiling
$ cat src/hls_operator_colorfilter/apples_simulation.bmp \
    | /mnt/operators/colorfilter -p \
    > out.bmp

Profiling results look like this:

STREAM  BYTES TRANSFERRED  ACTIVE CYCLES  DATA WAIT  CONSUMER WAIT  TOTAL CYCLES  MB/s
input   6538               818 21%        634 16%    2439 63%       3897          419.43
output  6538               818 21%        3076 79%   0 0%           3897          419.43

Our operator limits the pipeline throughput
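
The MB/s column can be reproduced from the byte and cycle counts. The sketch below assumes a 250 MHz clock; this frequency is not stated on the slide, but it is the value at which 6538 bytes in 3897 cycles comes out at the reported 419.43 MB/s (with 1 MB taken as 10^6 bytes). The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed clock frequency; chosen because it reproduces the reported MB/s.
const double kAssumedClockHz = 250e6;

// Throughput in MB/s (1 MB = 10^6 bytes) for a transfer of the given size
// that took total_cycles clock cycles.
double throughput_mb_per_s(std::uint64_t bytes, std::uint64_t total_cycles) {
    assert(total_cycles > 0);
    double seconds = static_cast<double>(total_cycles) / kAssumedClockHz;
    return static_cast<double>(bytes) / seconds / 1e6;
}

// Share of the total cycles spent in one state, in percent.
double share_percent(std::uint64_t cycles, std::uint64_t total_cycles) {
    assert(total_cycles > 0);
    return 100.0 * static_cast<double>(cycles) / static_cast<double>(total_cycles);
}
```

For the input stream, share_percent(2439, 3897) rounds to 63, the consumer wait share in the table. The input side spends most of its time waiting for its consumer, which is the operator, and that is consistent with the conclusion that the operator limits the pipeline throughput.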

slide-23
SLIDE 23

Hands-On: Inspecting the Simulation Waveform

Open the xsim GUI

xsim -gui $SNAP_ROOT/hardware/sim/xsim/latest/top.wdb &

A new stream element is processed only every four cycles

[Figure: waveform with read and write signals, annotated ‘4 cycles’]

slide-24
SLIDE 24

Hands-On: Vivado HLS Analysis View

Open the colorfilter operator project in Vivado HLS

vivado_hls &

Project: src/hls_operator_colorfilter/hls_operator_colorfilter_sln_[…]

Switch to Analysis View

Inner processing loop has four steps and is not pipelined

slide-25
SLIDE 25

Hands-On: Vivado HLS Pipeline pragma

Add an HLS PIPELINE pragma inside the do-while loop

New Performance Profile:

Profiling Results:

STREAM  BYTES TRANSFERRED  ACTIVE CYCLES  DATA WAIT  CONSUMER WAIT  TOTAL CYCLES  MB/s
input   6538               818 55%        668 45%    0 0%           1489          1097.72
output  6538               818 55%        668 45%    0 0%           1489          1097.72
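
The placement of the pragma can be sketched on the pass-through operator from the anatomy slide. The stream types below are hypothetical queue-backed stand-ins so that the snippet compiles with a regular C++ compiler, which ignores the unknown pragma (possibly with a warning). Only Vivado HLS acts on it, and the actual colorfilter loop contains more logic than shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>

// Hypothetical software stand-ins for the HLS stream types.
struct mtl_stream_element {
    std::uint64_t data;
    std::uint8_t keep;
    bool last;
};

class mtl_stream {
public:
    mtl_stream_element read() {
        assert(!fifo_.empty());  // real hardware would stall here instead
        mtl_stream_element e = fifo_.front();
        fifo_.pop();
        return e;
    }
    void write(const mtl_stream_element &e) { fifo_.push(e); }
    std::size_t size() const { return fifo_.size(); }

private:
    std::queue<mtl_stream_element> fifo_;
};

void my_operator(mtl_stream &in, mtl_stream &out) {
    mtl_stream_element element;
    do {
// Asks Vivado HLS to overlap successive loop iterations.
#pragma HLS PIPELINE
        element = in.read();
        out.write(element);
    } while (!element.last);
}
```

The functional behavior is unchanged by the pragma; it only affects how the synthesized hardware schedules the loop iterations.
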
slide-26
SLIDE 26

Thank you for your attention!