OpenCL-Based Design Pattern for Line Rate Packet Processing

SLIDE 1

OpenCL-Based Design Pattern for Line Rate Packet Processing

Jehandad Khan, Peter Athanas (Virginia Tech) John Marshall, Skip Booth (Cisco Systems)

SLIDE 2

Programmable Packet Processor

SLIDE 3

P4.org

P4 programs specify how a switch processes packets.

SLIDE 4

FPGAs for Packet Processing

  • The ideal co-processor

–Highly parallel
–Arbitrary data paths
–No cache delays
–Low power

SLIDE 5

We FPGAs

SLIDE 6

FPGAs for Packet Processing

  • The not-so-ideal co-processor

–Long compile times
–Complicated design process
–Less abundant expertise
–Cost

SLIDE 7

We FPGA Design

SLIDE 8

OpenCL for FPGA Design

  • OpenCL simplifies the design problem

–Programmable by a larger community
–Simulation capability
–Timing guarantees
–Pipelining
–Memory replication
–Downside: limited expressiveness

SLIDE 9

Objective of Investigation

Is OpenCL a good intermediate format?

  • What is the achievable throughput?
  • What are the tradeoffs?
  • What design constructs do we need?

SLIDE 10

OpenCL Problems

OpenCL assumes a host / device model:

a. Host copies data to device
b. Host launches work on device
c. Device signals completion
d. Host copies data back

NOT SUITABLE FOR PACKET PROCESSING!

SLIDE 11

Solution: “Persistent Kernels”

Launch-once-never-terminate kernels

[Figure: Input, OpenCL kernel, Output]

An infinite loop in the kernel waits for data and processes it.

  • Input arrives over a Channel or OpenCL Pipe
  • Output leaves over an Output Channel
  • Channels are realized as FIFOs on the FPGA
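
The launch-once pattern above can be modeled in software. The following is a minimal Python sketch, not the OpenCL kernel from the talk: bounded queues stand in for the FPGA channels, a thread stands in for the kernel, and the names `persistent_kernel`, `run_model`, and `STOP` are hypothetical. The `STOP` sentinel exists only so the model can shut down; the hardware kernel never terminates.

```python
import queue
import threading

STOP = object()  # simulation-only sentinel; the real kernel never terminates


def persistent_kernel(in_ch, out_ch, process):
    """Launch-once kernel body: wait on the input FIFO, process, emit."""
    while True:                      # the "infinite loop in the kernel"
        item = in_ch.get()           # blocking read, like a channel read
        if item is STOP:             # lets the software model shut down
            out_ch.put(STOP)
            return
        out_ch.put(process(item))    # write the result to the output FIFO


def run_model(packets, process, depth=4):
    """Start the kernel once, stream packets through it, collect results."""
    in_ch = queue.Queue(maxsize=depth)   # bounded FIFO stands in for a channel
    out_ch = queue.Queue()               # unbounded so the model cannot deadlock
    worker = threading.Thread(target=persistent_kernel,
                              args=(in_ch, out_ch, process))
    worker.start()                       # the single "launch"
    for pkt in packets:
        in_ch.put(pkt)                   # blocks when the FIFO is full
    in_ch.put(STOP)
    results = []
    while True:
        item = out_ch.get()
        if item is STOP:
            break
        results.append(item)
    worker.join()
    return results
```

The host touches the kernel only once, at `worker.start()`; after that, data moves through the FIFOs without any per-packet launch or copy-back step.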

SLIDE 12

Overall Architecture

[Figure: block diagram. Pipeline stages, in order: Parser, IPv4 LPM, Fwd (Exact), Send Frame, Deparser / Egress. Other labels: Ingress, Packet Server, Off Chip Mem (twice), I/O Channel, I/O.]

Based on simple_router.p4
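
To make the first and last stages of this pipeline concrete, here is a rough Python sketch of a parser and deparser for Ethernet and IPv4. It is an illustration under stated assumptions, not the design from the talk: the PHV is modeled as a dict, IPv4 options are not handled (a 20-byte header is assumed), and the names `parse`, `deparse`, and `ipv4_checksum` are hypothetical.

```python
import struct

ETH = struct.Struct("!6s6sH")            # dst MAC, src MAC, EtherType
IPV4 = struct.Struct("!BBHHHBBH4s4s")    # fixed 20-byte IPv4 header, no options
ETHERTYPE_IPV4 = 0x0800
TTL, CHECKSUM = 5, 7                     # field positions inside the IPv4 tuple


def ipv4_checksum(header):
    """Ones-complement sum over the ten 16-bit words of a 20-byte header."""
    total = sum(struct.unpack("!10H", header))
    while total >> 16:                   # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def parse(frame):
    """Parser stage: extract header fields into a PHV (a dict in this model)."""
    dmac, smac, ethertype = ETH.unpack_from(frame, 0)
    phv = {"dmac": dmac, "smac": smac, "ethertype": ethertype,
           "ipv4": None, "payload": frame[ETH.size:]}
    if ethertype == ETHERTYPE_IPV4:
        phv["ipv4"] = list(IPV4.unpack_from(frame, ETH.size))
        phv["payload"] = frame[ETH.size + IPV4.size:]
    return phv


def deparse(phv):
    """Deparser stage: re-serialize the PHV and refresh the IPv4 checksum."""
    out = ETH.pack(phv["dmac"], phv["smac"], phv["ethertype"])
    if phv["ipv4"] is not None:
        fields = list(phv["ipv4"])
        fields[CHECKSUM] = 0             # checksum is computed over a zeroed field
        fields[CHECKSUM] = ipv4_checksum(IPV4.pack(*fields))
        out += IPV4.pack(*fields)
    return out + phv["payload"]
```

A stage between the two, such as the LPM stage, would modify the PHV (for example `phv["ipv4"][TTL] -= 1`) and leave the checksum fix-up to the deparser.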

SLIDE 13

Match + Action Stage

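[Figure: block diagram of one match+action stage. Labels: Control Plane, Update Req, Update Kernel, PHV In, Match+Action Kernel, PHV Out, Output Channel.]

  • Match+Action Kernel: persistent kernel running an infinite loop
  • local type_t entries[SIZE]: state storage for the persistent kernel
  • Update Kernel: updates the data plane
  • The persistent kernel listens on both channels
  • The Packet Header Vector (PHV) is passed from stage to stage
  • The host launches kernels to update state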
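
The two-channel behavior of a match+action stage can be sketched as a small Python model. This is an assumption-laden illustration, not the kernel from the talk: deques stand in for channels, `step` is one iteration of the infinite loop, and the class name, the update message layout `(index, key, action)`, and the choice to serve updates before packets are all hypothetical. On a table miss, this model passes the PHV through unchanged.

```python
from collections import deque

SIZE = 4  # table capacity, mirroring entries[SIZE] on the slide


class MatchActionStage:
    """Software model of one persistent match+action kernel."""

    def __init__(self):
        self.entries = [None] * SIZE   # kernel-local state storage
        self.update_ch = deque()       # update kernel -> this kernel
        self.phv_in = deque()          # previous stage -> this kernel
        self.phv_out = deque()         # this kernel -> next stage

    def step(self):
        """One loop iteration; returns False when both channels are empty."""
        if self.update_ch:                         # non-blocking poll for updates
            index, key, action = self.update_ch.popleft()
            self.entries[index] = (key, action)    # control plane rewrites state
            return True
        if self.phv_in:                            # non-blocking poll for packets
            phv = self.phv_in.popleft()
            for entry in self.entries:             # match
                if entry is not None and entry[0] == phv["key"]:
                    phv = entry[1](phv)            # action
                    break
            self.phv_out.append(phv)               # hand the PHV to the next stage
            return True
        return False
```

Because the table lives inside the kernel and only the kernel writes it, an update never races with a lookup in this model: each loop iteration handles either one update or one packet.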

SLIDE 14

Match Engines in Prototype

  1. One TCAM
     a. Longest Prefix Match
  2. Two exact match engines
     a. Source MAC address
     b. Destination MAC address

All use on-chip RAM.

Core first written in OpenCL, then rewritten in Verilog (RTL).
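
As a behavioral reference for the two kinds of match engine, here is a short Python sketch. It only models what the engines compute, not how the prototype implements them: a TCAM compares the key against all entries in parallel, whereas this model loops over them, and the exact-match engine is reduced to a dict lookup. The function names and the entry layout `(value, mask, length, action)` are hypothetical.

```python
import ipaddress


def make_entry(cidr, action):
    """Encode a prefix as a (value, mask, length, action) ternary entry."""
    net = ipaddress.ip_network(cidr)
    return (int(net.network_address), int(net.netmask), net.prefixlen, action)


def tcam_lpm(entries, addr):
    """Compare addr against every entry; the longest matching prefix wins."""
    key = int(ipaddress.ip_address(addr))
    best = None
    for value, mask, length, action in entries:    # done in parallel in hardware
        if (key & mask) == value and (best is None or length > best[0]):
            best = (length, action)
    return None if best is None else best[1]


def exact_match(table, key):
    """Exact-match engine modeled as a dict lookup; None signals a miss."""
    return table.get(key)
```

For example, with entries for `10.0.0.0/8` and `10.1.0.0/16`, looking up `10.1.2.3` matches both, and the /16 entry is selected because its prefix is longer.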

SLIDE 15

Test Platform

  • Altera Arria 10 AX115S2 FPGA
  • Cisco UCS C240 server
  • Arria 10 DevKit

SLIDE 16

Results

Capable of running at 70 Mpps
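
To relate a packet rate to an Ethernet line rate, the short Python calculation below may help. The slide does not state a packet size, so the 64-byte minimum frame used as the default here is an assumption; the 8-byte preamble and 12-byte inter-frame gap are the standard per-frame Ethernet overhead. The function names are hypothetical.

```python
PREAMBLE_SFD = 8       # bytes of preamble plus start-of-frame delimiter
INTERFRAME_GAP = 12    # bytes of mandatory idle time between frames


def wire_bits(frame_bytes=64):
    """Bits one frame occupies on the wire, including per-frame overhead."""
    return (frame_bytes + PREAMBLE_SFD + INTERFRAME_GAP) * 8


def line_rate_gbps(mpps, frame_bytes=64):
    """Wire bandwidth consumed by `mpps` million packets per second."""
    return mpps * 1e6 * wire_bits(frame_bytes) / 1e9


def max_mpps(link_gbps, frame_bytes=64):
    """Highest packet rate a link of `link_gbps` can carry."""
    return link_gbps * 1e9 / wire_bits(frame_bytes) / 1e6
```

Under the 64-byte assumption, `line_rate_gbps(70)` evaluates to about 47 Gb/s, and `max_mpps(10)` gives the familiar figure of roughly 14.88 Mpps for 10 Gigabit Ethernet.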

SLIDE 17

Follow up Work

  • P4 -> HMC enabled FPGAs
  • J. Khan, P. Athanas, “Creating Custom Network Packet Processing Pipelines on HMC-Enabled FPGAs”, ACM SIGCOMM 2017, The Third Workshop on Networking and Programming Languages (NetPL 2017)

SLIDE 18

Conclusion

  • Using some clever tricks, we can create a high-performance packet pipeline in OpenCL
  • A high throughput design is possible

–The design patterns can serve as guidelines for any data flow problem
–Optimal use of on-chip resources is essential

  • Performance portability …