OpenCL-Based Design Pattern for Line Rate Packet Processing

Mar 16, 2024


Jehandad Khan, Peter Athanas (Virginia Tech)
John Marshall, Skip Booth (Cisco Systems)
Programmable Packet Processor
P4 (P4.org): P4 programs specify how a switch processes packets.
FPGAs for Packet Processing
• The ideal co-processor
  – Highly parallel
  – Arbitrary data paths
  – No cache delays
  – Low power
FPGAs for Packet Processing
• The not-so-ideal co-processor
  – Long compile times
  – Complicated design process
  – Less abundant expertise
  – Cost
OpenCL for FPGA Design
• OpenCL simplifies the design problem
  – Programmable by a larger community
  – Simulation capability
  – Timing guarantees
  – Pipelining
  – Memory replication
  – Downside: limited expressiveness
Objective of Investigation
Is OpenCL a good intermediate format?
• What is the achievable throughput?
• What are the trade-offs?
• What design constructs do we need?
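OpenCL Problems
OpenCL assumes a host/device model:
  a. The host copies data to the device
  b. The host launches work on the device
  c. The device signals completion
  d. The host copies data back
NOT SUITABLE FOR PACKET PROCESSING! Each round trip adds copy and launch latency, whereas packets arrive as a continuous stream.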
Solution: “Persistent Kernels”
• Launch-once, never-terminate kernels
• An infinite loop in the kernel waits for input data and processes it
• Input and output use OpenCL channels or pipes, which are realized as FIFOs on the FPGA
(Diagram: input channel → OpenCL kernel → output channel)
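A small software model can make the persistent-kernel pattern concrete. This is a sketch in plain C, not FPGA code: the channel is modeled as a bounded ring buffer, and the names fifo_t, kernel_step, process and FIFO_DEPTH are hypothetical. Because a host-side test cannot run an infinite loop, the model exposes a single loop iteration; the real kernel would wrap that body in while (1).

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of a bounded FIFO channel. On the FPGA the OpenCL
 * compiler realizes channels/pipes as hardware FIFOs; here it is a
 * ring buffer. FIFO_DEPTH is a hypothetical depth. */
#define FIFO_DEPTH 16u

typedef struct {
    uint32_t data[FIFO_DEPTH];
    unsigned head;   /* index of the next item to read */
    unsigned count;  /* number of items currently stored */
} fifo_t;

static void fifo_init(fifo_t *f) { f->head = 0; f->count = 0; }

/* Non-blocking write: returns false when the FIFO is full. */
static bool fifo_try_write(fifo_t *f, uint32_t v) {
    if (f->count == FIFO_DEPTH) return false;
    f->data[(f->head + f->count) % FIFO_DEPTH] = v;
    f->count++;
    return true;
}

/* Non-blocking read: returns false when the FIFO is empty. */
static bool fifo_try_read(fifo_t *f, uint32_t *v) {
    if (f->count == 0) return false;
    *v = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}

/* Hypothetical per-item work standing in for packet processing. */
static uint32_t process(uint32_t word) { return word + 1; }

/* One iteration of the persistent kernel's loop body. The real kernel
 * repeats this forever and is launched only once.
 * Returns true if an item moved from `in` to `out`. */
static bool kernel_step(fifo_t *in, fifo_t *out) {
    uint32_t word;
    if (out->count == FIFO_DEPTH) return false;   /* back-pressure: output full */
    if (!fifo_try_read(in, &word)) return false;  /* no data yet: keep waiting */
    fifo_try_write(out, process(word));           /* cannot fail: space checked */
    return true;
}
```

In the Intel FPGA SDK for OpenCL, the corresponding kernel would instead call read_channel_intel and write_channel_intel, which block until data or space is available (read_channel_nb_intel is the non-blocking read); OpenCL 2.0 pipes use read_pipe and write_pipe, which return a status instead.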
Overall Architecture
(Diagram, based on simple_router.p4. Components: Packet Server, I/O channels, Ingress Parser, IPv4 LPM, Fwd (exact match), Send Frame, Deparser/Egress, and off-chip memory.)
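The flow of a Packet Header Vector (PHV) through the pipeline can be modeled in software. This is a hypothetical C sketch: the phv_t fields, table contents and function names are illustrative, and the stage order (ipv4_lpm, then forward, in ingress; send_frame in egress) follows simple_router.p4 as we understand it. On the FPGA each stage would be a separate persistent kernel connected by channels; here each stage is a plain function call.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PHV: parsed fields plus metadata passed stage to stage. */
typedef struct {
    uint32_t ipv4_dst;
    uint8_t  ttl;
    uint32_t nhop_ipv4;   /* metadata written by the LPM stage */
    uint16_t egress_port;
    uint64_t dmac, smac;  /* 48-bit MACs held in 64-bit fields */
    int      drop;
} phv_t;

typedef void (*stage_fn)(phv_t *);

/* Ingress stage 1: IPv4 LPM -> next hop, egress port, TTL - 1.
 * Tiny illustrative table sorted longest prefix first, so the first
 * hit is the longest match. */
static void ipv4_lpm(phv_t *p) {
    static const struct { uint32_t prefix; int len; uint32_t nhop; uint16_t port; } t[] = {
        {0x0A000100u, 24, 0x0A000101u, 1},  /* 10.0.1.0/24 */
        {0x0A000000u,  8, 0x0A0000FEu, 2},  /* 10.0.0.0/8  */
    };
    for (size_t i = 0; i < sizeof t / sizeof t[0]; i++) {
        uint32_t mask = t[i].len ? ~0u << (32 - t[i].len) : 0u;
        if ((p->ipv4_dst & mask) == t[i].prefix) {
            p->nhop_ipv4 = t[i].nhop;
            p->egress_port = t[i].port;
            p->ttl--;
            return;
        }
    }
    p->drop = 1;  /* no route */
}

/* Ingress stage 2: exact match on next hop -> destination MAC. */
static void forward(phv_t *p) {
    static const struct { uint32_t nhop; uint64_t dmac; } t[] = {
        {0x0A000101u, 0x000000000101ull},
        {0x0A0000FEu, 0x0000000000FEull},
    };
    for (size_t i = 0; i < sizeof t / sizeof t[0]; i++)
        if (t[i].nhop == p->nhop_ipv4) { p->dmac = t[i].dmac; return; }
    p->drop = 1;
}

/* Egress stage: exact match on egress port -> source MAC. */
static void send_frame(phv_t *p) {
    static const uint64_t port_smac[4] = {0, 0x00AA00000001ull, 0x00AA00000002ull, 0};
    if (p->egress_port < 4 && port_smac[p->egress_port] != 0)
        p->smac = port_smac[p->egress_port];
    else
        p->drop = 1;
}

static const stage_fn pipeline[] = { ipv4_lpm, forward, send_frame };

/* Pass the PHV through every stage in order, stopping once it is dropped. */
static void run_pipeline(phv_t *p) {
    for (size_t i = 0; i < sizeof pipeline / sizeof pipeline[0] && !p->drop; i++)
        pipeline[i](p);
}
```

Keeping each stage's interface to "PHV in, PHV out" is what lets the hardware version split the stages into independent kernels joined by channels.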
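Match + Action Stage
• The match+action kernel is a persistent kernel: an infinite loop that listens on two channels
  – Control plane: the host launches kernels that send update requests over a channel to update the table state
  – Data plane: the Packet Header Vector (PHV) arrives on the input channel (PHV In) and leaves on the output channel (PHV Out); the PHV is passed from stage to stage
• Table state is stored in kernel-local memory, e.g. type_t entries[SIZE], which persists across loop iterations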
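One iteration of such a stage can be sketched in C as follows. Everything here is an assumption for illustration: single-slot "channels" with a valid flag stand in for FIFOs, and update_req_t, phv_t, entry_t and match_action_step are hypothetical names. On Intel FPGAs, polling two channels in one loop would typically use the non-blocking read_channel_nb_intel so that an idle update channel never stalls the data plane.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIZE 8  /* table entries held in kernel-local (on-chip) storage */

/* Hypothetical messages: an update writes one table slot; the PHV is
 * reduced to a lookup key plus the action result. */
typedef struct { uint8_t index; uint32_t key; uint16_t port; } update_req_t;
typedef struct { uint32_t key; uint16_t port; bool hit; } phv_t;

/* Single-slot channel models; `valid` marks a waiting message. */
typedef struct { update_req_t msg; bool valid; } update_chan_t;
typedef struct { phv_t msg; bool valid; } phv_chan_t;

typedef struct { uint32_t key; uint16_t port; bool used; } entry_t;

/* State the persistent kernel keeps across loop iterations,
 * playing the role of the slide's type_t entries[SIZE]. */
typedef struct { entry_t entries[SIZE]; } stage_state_t;

/* One loop iteration:
 * 1. poll the update channel and apply any pending request;
 * 2. poll the PHV channel, match, apply the action, and forward. */
static void match_action_step(stage_state_t *s, update_chan_t *upd,
                              phv_chan_t *in, phv_chan_t *out) {
    if (upd->valid) {
        if (upd->msg.index < SIZE) {  /* out-of-range requests are dropped */
            entry_t *e = &s->entries[upd->msg.index];
            e->key = upd->msg.key;
            e->port = upd->msg.port;
            e->used = true;
        }
        upd->valid = false;  /* request consumed */
    }

    if (in->valid && !out->valid) {  /* wait while the downstream slot is full */
        phv_t p = in->msg;
        p.hit = false;
        /* Fixed trip count, no early exit: an FPGA OpenCL compiler can
         * fully unroll this into parallel comparators. First hit wins. */
        for (int i = 0; i < SIZE; i++) {
            if (!p.hit && s->entries[i].used && s->entries[i].key == p.key) {
                p.port = s->entries[i].port;
                p.hit = true;
            }
        }
        in->valid = false;
        out->msg = p;
        out->valid = true;
    }
}
```

Applying updates before the lookup in the same iteration means a packet sees any table change that arrived with it; the opposite order would also be valid, but the choice should be made deliberately.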
Match Engines in Prototype
1. One TCAM
   a. Longest prefix match
2. Two exact match engines
   a. Source MAC address
   b. Destination MAC address
All engines use on-chip RAM. The core was first written in OpenCL and later rewritten in Verilog (RTL).
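The behavior of the two engine types can be sketched in C. This models only the matching semantics, not the RAM-based hardware: the sizes, struct layouts and function names are hypothetical. A TCAM compares the key against every entry at once; the loop below stands in for that parallel comparison and then picks the longest matching prefix. In simple_router.p4, as we read it, the exact-match results are MAC addresses (next hop to destination MAC, egress port to source MAC), so the exact engine here maps a 64-bit key to a 64-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

#define TCAM_ENTRIES  4  /* hypothetical table sizes */
#define EXACT_ENTRIES 4

/* Ternary entry: a key bit takes part in the match only where the mask
 * bit is 1. An IPv4 /n prefix is a mask with the top n bits set; `value`
 * is assumed to be already masked. */
typedef struct { uint32_t value, mask; uint8_t prefix_len; uint16_t action; bool valid; } tcam_entry_t;

static uint32_t prefix_mask(uint8_t len) { return len == 0 ? 0u : ~0u << (32 - len); }

/* LPM via ternary matching: among all hitting entries, the longest
 * prefix wins. Returns the action id, or -1 on a miss. */
static int tcam_lpm(const tcam_entry_t t[TCAM_ENTRIES], uint32_t key) {
    int best = -1, best_len = -1;
    for (int i = 0; i < TCAM_ENTRIES; i++) {
        if (t[i].valid && (key & t[i].mask) == t[i].value && t[i].prefix_len > best_len) {
            best = t[i].action;
            best_len = t[i].prefix_len;
        }
    }
    return best;
}

/* Exact-match entry; keys are assumed unique. */
typedef struct { uint64_t key; uint64_t value; bool valid; } exact_entry_t;

/* Returns true and writes *value on a hit; first matching entry wins. */
static bool exact_match(const exact_entry_t t[EXACT_ENTRIES], uint64_t key, uint64_t *value) {
    bool hit = false;
    for (int i = 0; i < EXACT_ENTRIES; i++) {
        if (!hit && t[i].valid && t[i].key == key) {
            *value = t[i].value;
            hit = true;
        }
    }
    return hit;
}
```

A default route is simply an entry with a zero mask and prefix length 0: it matches every key but loses to any longer prefix.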
Test Platform
• Cisco UCS C240 server
• Arria 10 DevKit (Altera Arria 10 AX115S2 FPGA)
Results
• The prototype is capable of running at 70 Mpps (million packets per second)
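To relate the 70 Mpps figure to a bit rate, the arithmetic below rests on one assumption the slides do not state: minimum-size 64-byte Ethernet frames, each also occupying 8 bytes of preamble/SFD and a 12-byte inter-frame gap on the wire.

```latex
\begin{aligned}
\text{bits on the wire per frame} &= (64 + 8 + 12)\,\text{B} \times 8\,\tfrac{\text{bit}}{\text{B}} = 672\ \text{bit} \\
70 \times 10^{6}\,\tfrac{\text{packets}}{\text{s}} \times 672\ \text{bit} &= 4.704 \times 10^{10}\,\tfrac{\text{bit}}{\text{s}} \approx 47\ \text{Gb/s} \\
\text{10 GbE minimum-frame rate} &= \frac{10^{10}\ \text{bit/s}}{672\ \text{bit}} \approx 14.88\ \text{Mpps}, \qquad \frac{70}{14.88} \approx 4.7
\end{aligned}
```

Under that assumption, 70 Mpps corresponds to about 47 Gb/s, or the minimum-frame line rate of roughly 4.7 fully loaded 10 GbE ports; larger frames need proportionally fewer packets per second for the same bit rate.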
Follow-up Work
• P4 -> HMC-enabled FPGAs
  J. Khan, P. Athanas, “Creating Custom Network Packet Processing Pipelines on HMC-Enabled FPGAs”, ACM SIGCOMM 2017, The Third Workshop on Networking and Programming Languages (NetPL 2017)
Conclusion
• With a few design patterns, such as persistent kernels connected by channels, a high-performance packet pipeline can be built in OpenCL
• A high-throughput design is possible
  – The design patterns can serve as guidelines for any data-flow problem
  – Optimal use of on-chip resources is essential
• Performance portability …
