Design patterns for code reuse in HLS packet processing pipelines
Haggai Eran∗†, Lior Zeno∗, Zsolt István‡, and Mark Silberstein∗
∗ Technion – Israel Institute of Technology   † Mellanox Technologies   ‡ IMDEA Software Institute
FCCM 2019

Network packet processing & FPGAs
● High throughput
● Low latency
● Predictable performance
● Flexibility
E.g. AccelNet on Microsoft Azure [Firestone et al., NSDI ’18]

Network packet processing on FPGAs is hard!
● Requires hardware design expertise
● Lacks software-like reusable libraries, such as the Click Modular Router available on CPUs

"There are three great virtues of a programmer: Laziness, Impatience and Hubris." (Larry Wall, creator of the Perl programming language)

Why focus on high-level synthesis (HLS)?
High-level code (C++) → RTL (Verilog) → FPGA bitstream
● Abstracts underlying hardware details
  ○ Automatic scheduling & pipelining
  ○ Reuse of a design on different hardware
● High-level language features (objects & polymorphism)
We focus on Xilinx Vivado HLS (C++).

How is HLS used for packet processing?
● Data-flow design
  ○ A fixed graph of independent elements
  ○ Elements operate on data when their inputs are ready
  ○ Examples: [Blott ’13], [XAPP1209 ’14], [Sidler ’15], ClickNP [Li ’16]
Our methodology focuses on data-flow designs.
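A minimal software sketch of the data-flow style described above, under stated assumptions: sw_stream is a hypothetical stand-in for hls::stream that models only empty(), full(), read() and write(), and the elements doubler and limiter are illustrative names. Each element fires only when its input has data and its output has room. In an HLS design each element would be a separate process in a dataflow region (e.g. under #pragma HLS dataflow); here a loop simply calls them once per iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical software stand-in for hls::stream<T>: a bounded FIFO that
// models only the empty()/full()/read()/write() subset of the interface.
template <typename T>
class sw_stream {
public:
    explicit sw_stream(std::size_t depth = 4) : depth_(depth) {}
    bool empty() const { return q_.empty(); }
    bool full() const { return q_.size() >= depth_; }
    T read() { T v = q_.front(); q_.pop_front(); return v; }
    void write(const T& v) { q_.push_back(v); }
private:
    std::deque<T> q_;
    std::size_t depth_;
};

// Element 1: doubles each value. It fires only when input is ready and the
// output has room, so it never blocks the rest of the graph.
void doubler(sw_stream<int>& in, sw_stream<int>& out) {
    if (!in.empty() && !out.full())
        out.write(2 * in.read());
}

// Element 2: drops values above a limit.
void limiter(sw_stream<int>& in, sw_stream<int>& out, int limit) {
    if (!in.empty() && !out.full()) {
        int v = in.read();
        if (v <= limit) out.write(v);
    }
}

// Fixed graph: src -> doubler -> mid -> limiter -> dst.
std::vector<int> run_graph(const std::vector<int>& input, int limit, int cycles) {
    sw_stream<int> src(input.size()), mid, dst(input.size());
    for (int v : input) src.write(v);
    for (int c = 0; c < cycles; ++c) {
        // Calling the consumer before the producer means a value written in
        // one iteration is read in the next, like a pipeline register.
        limiter(mid, dst, limit);
        doubler(src, mid);
    }
    std::vector<int> result;
    while (!dst.empty()) result.push_back(dst.read());
    return result;
}
```

For example, run_graph({1, 2, 3, 4, 5}, 6, 10) doubles each value and keeps only those not above 6, returning {2, 4, 6}.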

Why is it hard to build an HLS networking library?
● Only a subset of C++ is synthesizable.
  ○ Virtual functions cannot be used.
● Performance requires strict interfaces and coding patterns.
Our ntl library overcomes these problems.
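Because virtual functions are not synthesizable, one general way to keep an element customizable is static polymorphism: the behaviour is a functor type passed as a template parameter, so the call is resolved at compile time and can be inlined. The sketch below assumes a hypothetical map_element class and uses std::queue as a software stand-in for a stream; it illustrates the technique and is not ntl's actual map.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Hypothetical element that applies a user-supplied functor to every item.
// F is a template parameter, so the call is bound statically (no vtable).
template <typename T, typename F>
class map_element {
public:
    explicit map_element(F f) : f_(f) {}
    // Processes at most one item per call.
    void step(std::queue<T>& in, std::queue<T>& out) {
        if (in.empty()) return;
        out.push(f_(in.front()));
        in.pop();
    }
private:
    F f_;
};

// Two customizations of the same element, selected at compile time.
struct decrement_ttl {
    std::uint8_t operator()(std::uint8_t ttl) const {
        return ttl == 0 ? std::uint8_t{0} : static_cast<std::uint8_t>(ttl - 1);
    }
};

struct mask_low_bits {
    std::uint8_t operator()(std::uint8_t v) const {
        return static_cast<std::uint8_t>(v & 0xF0);
    }
};
```

With decrement_ttl the element maps 64 to 63 and leaves 0 at 0; the same element instantiated with mask_low_bits maps 0x3C to 0x30.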

ntl: Networking Template Library
● A new methodology for developing reusable data-flow HLS elements.
● A template class library that applies our methodology to network packet processing applications.

How do we build reusable data-flow elements?
● Basic elements
  ○ A C++ class for each data-flow element
  ○ State kept as member variables
  ○ A step() method implements the functionality
  ○ Methods are inlined into the caller
  ○ All interfaces are hls::stream (members or parameters)
● Reuse with customization via functional programming
● Composition through aggregation: reusable sub-graphs
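A sketch of this element pattern under stated assumptions: sw_stream is again a hypothetical stand-in for hls::stream, and threshold_filter, counter and filter_and_count are illustrative names, not ntl classes. State lives in member variables, each element does its work in step(), and filter_and_count aggregates two elements plus the stream between them into a reusable sub-graph with a step() of its own. In HLS the step() methods would be inlined into the caller (e.g. with #pragma HLS inline).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <initializer_list>

// Hypothetical software stand-in for hls::stream<T>.
template <typename T>
class sw_stream {
public:
    explicit sw_stream(std::size_t depth = 4) : depth_(depth) {}
    bool empty() const { return q_.empty(); }
    bool full() const { return q_.size() >= depth_; }
    T read() { T v = q_.front(); q_.pop_front(); return v; }
    void write(const T& v) { q_.push_back(v); }
private:
    std::deque<T> q_;
    std::size_t depth_;
};

// Basic element: forwards values >= min and drops the rest.
class threshold_filter {
public:
    explicit threshold_filter(int min) : min_(min) {}
    void step(sw_stream<int>& in, sw_stream<int>& out) {
        if (in.empty() || out.full()) return;
        int v = in.read();
        if (v >= min_) out.write(v);
    }
private:
    int min_;  // configuration kept as member state
};

// Basic element: forwards values and counts them in a member variable.
class counter {
public:
    void step(sw_stream<int>& in, sw_stream<int>& out) {
        if (in.empty() || out.full()) return;
        out.write(in.read());
        ++count_;
    }
    unsigned count() const { return count_; }
private:
    unsigned count_ = 0;
};

// Aggregation: a reusable sub-graph owning its elements and the internal
// stream that connects them, exposed through a single step().
class filter_and_count {
public:
    explicit filter_and_count(int min) : filter_(min) {}
    void step(sw_stream<int>& in, sw_stream<int>& out) {
        counter_.step(internal_, out);  // downstream element first
        filter_.step(in, internal_);
    }
    unsigned count() const { return counter_.count(); }
private:
    threshold_filter filter_;
    counter counter_;
    sw_stream<int> internal_;
};
```

Feeding 5, -1, 7, 0, 3 through filter_and_count(1) for ten steps forwards 5, 7 and 3, and count() then returns 3.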

Networking Template Library (ntl)
Class library of packet processing building blocks.
Category                       Classes
Header processing elements     pop/push_header, push_suffix
Data structures                array, hash_table
Scheduler                      scheduler
Basic elements                 map, scan, fold, dup, zip, link
Specialized stream wrappers    pack_stream, pfifo, stream<Tag>
Control plane                  gateway

Example: scan and fold
Common operators in functional and reactive programming, modified to reset their state for every packet.
Input stream (two packets): [1 2 3] [3 3 3]
scan.step(input, plus()) → 1 3 6 | 3 6 9
fold.step(input, plus()) → 6 | 9
They can serve as a basis for more complex operators.
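The per-packet reset can be sketched as follows, assuming each stream item carries an end-of-packet flag (the item type is hypothetical, similar in spirit to the AXI4-Stream TLAST signal). scan returns a running result for every item; fold returns a result only on the last item of a packet. Both take the operator as a template argument, mirroring the scan.step(input, plus()) call above; this is not ntl's actual interface.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stream item: a value plus an end-of-packet flag.
template <typename T>
struct item {
    T value;
    bool last;  // true on the final item of a packet
};

// scan: emits the running result for every item, resetting after each packet.
template <typename T>
class scan {
public:
    explicit scan(T init = T()) : init_(init), acc_(init) {}
    template <typename Op>
    T step(const item<T>& in, Op op) {
        acc_ = op(acc_, in.value);
        T out = acc_;
        if (in.last) acc_ = init_;  // start fresh for the next packet
        return out;
    }
private:
    T init_;
    T acc_;
};

// fold: accumulates silently and emits one result per packet.
template <typename T>
class fold {
public:
    explicit fold(T init = T()) : init_(init), acc_(init) {}
    // Returns true (and sets out) only on the last item of a packet.
    template <typename Op>
    bool step(const item<T>& in, Op op, T& out) {
        acc_ = op(acc_, in.value);
        if (!in.last) return false;
        out = acc_;
        acc_ = init_;
        return true;
    }
private:
    T init_;
    T acc_;
};

// Convenience driver: push a whole stream through both operators with plus.
inline void run(const std::vector<item<int>>& input,
                std::vector<int>& scanned, std::vector<int>& folded) {
    scan<int> s;
    fold<int> f;
    for (const auto& it : input) {
        scanned.push_back(s.step(it, std::plus<int>()));
        int total = 0;
        if (f.step(it, std::plus<int>(), total)) folded.push_back(total);
    }
}
```

Feeding the two packets [1 2 3] [3 3 3] yields 1 3 6 3 6 9 from scan and 6 9 from fold, matching the example above.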

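Fold & scan usage: parser example
[Figure: parser pipeline built from dup, scan (flit counter), zip (<idx, flit>) and fold (extract header fields); the input is a packet split into 256-bit flits and the output is the parsed header.]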
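A condensed sketch of the parser's logic, with the data-flow elements collapsed into one loop for brevity. Assumptions: flits are 64 bits instead of 256, every flit carries an end-of-packet flag, and the hypothetical header has one field in each of the first two flits. The counter plays the role of scan, the index/flit pair the role of zip, and the per-packet accumulator the role of fold.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in flit (assumption: 64-bit payload plus end-of-packet flag).
struct flit {
    std::uint64_t data;
    bool last;
};

// Hypothetical header whose two fields sit in the first two flits.
struct header {
    std::uint64_t first_word = 0;
    std::uint64_t second_word = 0;
};

std::vector<header> parse(const std::vector<flit>& input) {
    std::vector<header> out;
    unsigned counter = 0;  // scan state: index of the next flit in its packet
    header acc;            // fold state: fields collected so far
    for (const flit& f : input) {
        unsigned idx = counter;                   // scan output for this flit
        counter = f.last ? 0 : counter + 1;       // reset at end of packet
        std::pair<unsigned, flit> z{idx, f};      // zip: <idx, flit>
        if (z.first == 0) acc.first_word = z.second.data;   // fold: pick
        if (z.first == 1) acc.second_word = z.second.data;  // fields by idx
        if (z.second.last) {                      // fold emits once per packet
            out.push_back(acc);
            acc = header();
        }
    }
    return out;
}
```

Two packets, one of three flits and one of a single flit, yield two headers; a field whose flit never arrives stays zero.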

Programmable threshold FIFO
● Problem: the dependency between the FIFO full check and the write decreases throughput.
● Solution: an hls::stream replacement whose fullness threshold is programmable.
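The following software model illustrates why a programmable threshold helps. threshold_fifo and overflows are hypothetical names, and the model assumes a producer whose write lands latency cycles after it checked full(), with no consumer draining the FIFO. With a plain full check (threshold equal to capacity), writes already in flight can overrun the FIFO; in this model a threshold of at most capacity - latency + 1 leaves enough slack that the check and the write no longer have to happen in the same cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical FIFO whose full() flag is programmable: it reports "full"
// once occupancy reaches `threshold`, which may be below the real capacity.
// The gap (capacity - threshold) is slack for writes already in flight.
template <typename T>
class threshold_fifo {
public:
    threshold_fifo(std::size_t capacity, std::size_t threshold)
        : capacity_(capacity), threshold_(threshold) {
        assert(threshold <= capacity);
    }
    bool empty() const { return q_.empty(); }
    bool full() const { return q_.size() >= threshold_; }
    std::size_t size() const { return q_.size(); }
    // Returns false if the write hits the real capacity (data would be lost).
    bool write(const T& v) {
        if (q_.size() >= capacity_) return false;
        q_.push_back(v);
        return true;
    }
    T read() { T v = q_.front(); q_.pop_front(); return v; }
private:
    std::deque<T> q_;
    std::size_t capacity_;
    std::size_t threshold_;
};

// Producer model: full() is checked at cycle t, but the write lands only at
// cycle t + latency (latency >= 1). Returns true if any write overflowed.
inline bool overflows(std::size_t capacity, std::size_t threshold,
                      unsigned latency, unsigned cycles) {
    threshold_fifo<int> fifo(capacity, threshold);
    std::deque<unsigned> pending;  // landing cycle of each issued write
    for (unsigned t = 0; t < cycles; ++t) {
        // Land the writes issued `latency` cycles ago.
        while (!pending.empty() && pending.front() == t) {
            pending.pop_front();
            if (!fifo.write(0)) return true;
        }
        // Issue a new write only if the (possibly early) full flag is clear.
        if (!fifo.full()) pending.push_back(t + latency);
    }
    return false;
}
```

With capacity 8 and a 2-cycle gap, overflows(8, 8, 2, 20) returns true, while overflows(8, 7, 2, 20) returns false.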

Evaluation
● How does ntl compare against legacy HLS and P4?
● Can we build a relatively complex application with ntl?
Target: Mellanox Innova Flex SmartNIC
● Xilinx Kintex UltraScale XCKU060 FPGA
● The shell dictates a 216.25 MHz clock rate
● Mellanox ConnectX-4 Lx ASIC NIC

Stateless UDP firewall example
Uses a hash table to classify packets.

Implementation       Throughput  Latency (cycles)  LUTs    FFs     BRAM  LoC
HLS/ntl              72 Mpps     25                5296    7179    12    218
HLS legacy           72 Mpps     16                4087    4287    12    593
P4 (SDNet 2018.2)    108 Mpps    211               34531   49042   193   92

● All three designs exceed the line rate (59.5 Mpps).
● ntl needs 2.7× fewer lines of code than legacy HLS.
● Compared to P4, ntl requires more LoC but improves latency and area.
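A sketch of hash-table classification for a stateless UDP firewall. Everything here is an assumption for illustration: the flow_key fields, the direct-mapped flow_table with one entry per bucket (sized at compile time, as an HLS design needs for on-chip memory), the hash mix, and the rule that a table miss falls back to a default action. It is not ntl's hash_table API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical UDP flow key (protocol is implicitly UDP).
struct flow_key {
    std::uint32_t src_ip, dst_ip;
    std::uint16_t src_port, dst_port;
    bool operator==(const flow_key& o) const {
        return src_ip == o.src_ip && dst_ip == o.dst_ip &&
               src_port == o.src_port && dst_port == o.dst_port;
    }
};

enum class action { pass, drop };

// Direct-mapped hash table: one entry per bucket, fixed size at compile time.
template <std::size_t Buckets>
class flow_table {
    static_assert(Buckets > 0, "flow_table needs at least one bucket");
public:
    // Control-plane write; returns false if the bucket holds another key.
    bool insert(const flow_key& k, action a) {
        entry& e = table_[index(k)];
        if (e.valid && !(e.key == k)) return false;
        e = entry{true, k, a};
        return true;
    }
    // Data-plane lookup: one bucket read and one key compare.
    bool lookup(const flow_key& k, action& a) const {
        const entry& e = table_[index(k)];
        if (!e.valid || !(e.key == k)) return false;
        a = e.act;
        return true;
    }
private:
    struct entry {
        bool valid = false;
        flow_key key{};
        action act = action::pass;
    };
    static std::size_t index(const flow_key& k) {
        // Simple multiplicative mix (an arbitrary choice for this sketch).
        std::uint64_t h = (std::uint64_t(k.src_ip) << 32) ^ k.dst_ip;
        h ^= (std::uint64_t(k.src_port) << 16) ^ k.dst_port;
        h *= 0x9E3779B97F4A7C15ull;
        return static_cast<std::size_t>(h >> 32) % Buckets;
    }
    std::array<entry, Buckets> table_{};
};

// Classify one packet: a hit returns the stored action, a miss the default.
template <std::size_t N>
action classify(const flow_table<N>& t, const flow_key& k, action dflt) {
    action a = dflt;  // kept on a miss
    t.lookup(k, a);   // overwritten on a hit
    return a;
}
```

After inserting a pass rule for one flow, classify returns pass for that flow and the default (drop, in this usage) for any other flow.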

Key-value store cache
● Caches memcached values on the SmartNIC.
● GET hits are served directly from the cache.
● Multi-tenant support.
Uses the NICA framework [ATC ’19]; both NICA and the KVS cache use ntl. As seen on demo night.

Key-value store cache: results
● Processes 16-byte GET hits at 40.3 Mtps.
● For a 75% hit rate: 9× compared to CPU-only.
● Uses: hash tables, header processing, scheduler, control plane, programmable FIFOs, ...

Related work: HLS methodology
● Xilinx application note [XAPP1209 ’14]
  ○ We adapt a similar data-flow design, but improve code reuse.
● Improving high-level synthesis with decoupled data structure optimization [Zhao ’16]
  ○ We similarly wrap data structures, but remain within C++.
● Module-per-Object: a human-driven methodology for C++-based high-level synthesis design [Silva ’19]
  ○ A complementary methodology; we share some aspects but focus on data-flow packet processing and provide ntl.

Related work
● Packet processing DSLs/libraries: P4 [Wang ’17], [Silva ’18], [SDNet], ClickNP [Li ’16], Emu [Sultana ’17], Maxeler.
  ○ We focus on general-purpose C++ for its flexibility.
● Data-flow HLS designs: image/video processing [Oezkan ’17], [OpenCV]; HPC designs [de Fine Licht ’18].
● Higher-order functions in HLS: [Thomas ’16], [Richmond ’18].
We apply similar techniques to packet processing.

Conclusion
We show a methodology for reusable packet processing in HLS, and create reusable building blocks for line-rate processing in the ntl library.
Try out ntl: https://github.com/acsl-technion/ntl
Thank you! Questions?
