Key Properties of Programmable Data Plane Targets


SLIDE 1

Key Properties of Programmable Data Plane Targets

Dominik Scholz, Henning Stubbe, Sebastian Gallenmüller, Georg Carle

Chair of Network Architectures and Services
Department of Informatics
Technical University of Munich

SLIDE 2

Motivation

Move to the Data Plane

From SDN with OpenFlow to (fully) programmable data planes (e.g. P4, POF, eBPF)

Lots of new P4 applications that run in the data plane

  • in-band network telemetry
  • in-network computation
  • protocol acceleration (e.g. congestion control)
  • middleboxes (DDoS mitigation)
  • . . .

Scholz, Stubbe, Gallenmüller, Carle — Key Properties of Programmable Data Plane Targets 2

SLIDE 3

Motivation

Image from https://bit.ly/2LHVmDZ

P4 is of high interest to industry, e.g. avionics

  • rapid prototyping
  • program verification
  • . . .
  • long hardware lifetimes, e.g. the same hardware is used for 20+ years

SLIDE 4

Motivation

Lots of new target platforms

  • CPU
  • Network Processing Unit (NPU)
  • FPGA
  • ASIC

Lots of key performance indicators

  • throughput & packet rate
  • latency & jitter
  • resources
  • price
  • . . .

→ Need to understand properties of devices and P4 programs
→ Focus on certain aspects for modeling

SLIDE 5

Outline

  • P4 Programmable Network Devices
  • Methodology
  • CPU Performance Model
  • ASIC Resource Model
  • Conclusion

SLIDE 6

P4 Programmable Network Devices

What is P4?

Image from https://bit.ly/3mDpaE9

Programmable data planes

  • custom network device behavior
  • blocks: parser, match-action, deparser
  • (ideally) target independent

Centerpiece: Match-Action tables

  • maps a key to an action
  • key: packet data or metadata
  • match types: exact, ternary, LPM (longest prefix match)
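
The three match types can be illustrated with a minimal Python sketch. It is not how any P4 target implements its tables; the function names, table contents, and action names are hypothetical and chosen only for illustration.

```python
import ipaddress

def exact_match(table, key):
    """Exact match: the lookup key must equal a stored key bit for bit."""
    return table.get(key)

def ternary_match(entries, key):
    """Ternary match: entries are (value, mask, action) in priority order.
    Bits that are 0 in the mask are wildcards."""
    for value, mask, action in entries:
        if (key & mask) == (value & mask):
            return action
    return None

def lpm_match(prefixes, addr):
    """Longest prefix match: of all prefixes containing addr, the longest wins."""
    ip = ipaddress.ip_address(addr)
    best = None
    for network, action in prefixes:
        if ip in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, action)
    return best[1] if best else None

# Hypothetical example tables
exact_table = {0x0A000001: "forward_port_1"}
ternary_entries = [
    (0x0A000000, 0xFF000000, "drop"),   # matches 10.x.x.x
    (0x00000000, 0x00000000, "default"),  # all bits wildcarded
]
lpm_prefixes = [
    (ipaddress.ip_network("10.0.0.0/8"), "port_1"),
    (ipaddress.ip_network("10.1.0.0/16"), "port_2"),
]
```

With these tables, `lpm_match(lpm_prefixes, "10.1.2.3")` selects the /16 entry even though the /8 entry also matches, which is the behavior that distinguishes LPM from a first-hit ternary lookup.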

SLIDE 7

P4 Programmable Network Devices

Comparison of Available Targets

              CPU            NPU                 FPGA           ASIC
Throughput    +              ++                  +++            ++++
Latency       > 10 µs        5 µs to 10 µs       < 2 µs         < 2 µs
Jitter        −−−−           −−−                 −−             −
Resources     ++++           +++                 ++             +
Flexibility   ++++           +++                 ++             +
Example       t4p4s DPDK     NFP-4000 SmartNIC   NetFPGA SUME   Intel Tofino

Table 1: Categorizations are estimates for available products based on own measurements and related work

In this work we focus on the extremes: CPU and ASIC

SLIDE 8

Methodology

Performance Analysis of P4 Programs

Dang et al. [1] divide a P4 program into components

  • parser
  • processing
  • packet modification
  • actions
  • . . .

Idea: evaluate components (e.g. match-action tables) in isolation [1]

[1] Dang et al. "Whippersnapper: A P4 Language Benchmark Suite." Proceedings of the Symposium on SDN Research, 2017.

SLIDE 9

Methodology

Model P4 Programs

Model P4 components individually

→ Combine component models to model complete system

Match-Action table properties

  • match type (exact, ternary, LPM)
  • entry size (key, action, action data)
  • number of entries
  • number of (independent) tables

SLIDE 10

CPU Performance Model

t4p4s – a DPDK-Based Software P4 Target

[Figure: layered diagram with the labels "NIC", "DPDK runtime", "t4p4s core", "P4 pipeline", "No P4", and "Baseline"]

t4p4s

  • P4 compiler
  • generates hardware-independent C code
  • hardware-dependent library for e.g. DPDK

Device-under-Test hardware

  • Intel Xeon CPU E5-2640 v2 (2.0 GHz)
  • Intel X540-AT2 NIC (dual port, 10 Gbit/s)
  • Turbo Boost and Hyper-Threading disabled (to reduce jitter)

SLIDE 11

CPU Performance Model

Baseline – Maximum Packet Rate with 64 B Packets

[Figure: packet rate [Mpps] over core frequency [GHz] for "No P4" and "Baseline", each measured and extrapolated, together with the baseline model P. Annotations: "Bound by 10 GbE line-rate", "Multi-core scaling"]

  • 6 Mpps reduction for baseline P4 program
  • bottleneck: CPU

Derive model for packet rate P

  • using linear regression for baseline

Derive model for CPU cycle usage C

  • $C = \frac{f_{\mathrm{CPU}}}{P}$, with CPU frequency $f_{\mathrm{CPU}}$ and packet rate $P$
  • $C_{\mathrm{base}} = 146$
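
The two modeling steps, a linear regression for the packet rate P and the conversion C = CPU frequency / P, can be sketched as follows. This assumes the regression is taken over core frequency, as the plot axes suggest. The data points are synthetic placeholders, not measurements from the talk, and the helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def cycles_per_packet(freq_hz, rate_pps):
    """C = CPU frequency / P."""
    return freq_hz / rate_pps

# Synthetic placeholder data (NOT from the talk): 5 Mpps per GHz
freqs_ghz = [1.2, 1.4, 1.6, 1.8, 2.0]
rates_mpps = [6.0, 7.0, 8.0, 9.0, 10.0]

slope, intercept = fit_line(freqs_ghz, rates_mpps)

# Model prediction for the packet rate at 2.0 GHz, then the cycle cost
predicted_mpps = slope * 2.0 + intercept
cycles = cycles_per_packet(2.0e9, predicted_mpps * 1e6)
```

For the placeholder data the fit yields 5 Mpps per GHz, so a 2.0 GHz core would spend about 200 cycles on each packet.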

SLIDE 12

CPU Performance Model

Number of Table Entries – Exact Match

[Figure: packet rate [Mpps] over the number of table entries (log scale) for 1 core and 2 cores]

Observations

  • doubling the number of cores doubles the performance
  • 2 different “phases”
  • bottleneck: L3 cache

[Figure: packet rate [Mpps] and L3 misses over the number of table entries (log scale), with the estimated L3 cache limit marked]

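Model resources $R_{\mathrm{exact}}$ based on L3 cache size

$$
R_{\mathrm{exact}}(n, k, a)
  = \underbrace{2 \cdot 64\,\mathrm{B} + (k \cdot n)}_{\text{hash table}}
  + \underbrace{(8\,\mathrm{B} \cdot n)}_{\text{entries}}
  + \underbrace{(a \cdot n)}_{\text{actions}}
  = 128\,\mathrm{B} + n \cdot \underbrace{(k + a + 8\,\mathrm{B})}_{\text{table entry size}}
$$

  • n: number of entries
  • k: key size (4x4 B)
  • a: action size (64 B)
  • $R_{\mathrm{L3}}$: 20 MB L3 cache

Set $R_{\mathrm{exact}} = R_{\mathrm{L3}}$ and solve for n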
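
Solving the resource model 128 B + n · (k + a + 8 B) for n gives the estimated number of entries at which the table no longer fits into the L3 cache. A minimal sketch with the values from this slide; reading "20 MB" as 20 MiB is an assumption, and the function names are hypothetical.

```python
HASH_TABLE_FIXED_B = 2 * 64  # fixed term of the model (128 B)
PER_ENTRY_EXTRA_B = 8        # per-entry term of the model (8 B)

def r_exact(n, key_b, action_b):
    """Memory footprint of an exact-match table with n entries, in bytes."""
    return HASH_TABLE_FIXED_B + n * (key_b + action_b + PER_ENTRY_EXTRA_B)

def max_entries_in_cache(cache_b, key_b, action_b):
    """Largest n with r_exact(n) <= cache_b (model solved for n, rounded down)."""
    entry_b = key_b + action_b + PER_ENTRY_EXTRA_B
    return (cache_b - HASH_TABLE_FIXED_B) // entry_b

KEY_B = 4 * 4        # key size from the slide
ACTION_B = 64        # action size from the slide
L3_B = 20 * 2**20    # assumption: "20 MB" interpreted as 20 MiB

n_limit = max_entries_in_cache(L3_B, KEY_B, ACTION_B)
```

Under these assumptions the limit is roughly 2.4 · 10^5 entries. The model counts only the table itself, so anything else competing for the cache would lower the practical limit.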

SLIDE 13

CPU Performance Model

Number of Table Entries – Exact Match Model

[Figure: packet rate [Mpps] and cycles per packet over the number of table entries (log scale) for 1 core and 2 cores, with the fitted models Pexact(n, 1), Pexact(n, 2), Cexact(n, 1), and Cexact(n, 2)]

Derive model for packet rate Pexact

  • linear regression for 1 core
  • scale for multiple cores

Derive model for CPU cycles Cexact

$$
C_{\mathrm{exact}}(n, c) = \frac{1}{c} \cdot
\begin{cases}
p \cdot \ln(q \cdot n) + r, & R(n) < R_{\mathrm{L3}} \\
\frac{s}{t \cdot n + u} + v, & \text{otherwise}
\end{cases}
$$

with parameters {p, q, r, s, t, u, v}
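
The piecewise cycle model can be evaluated with a short function. The sketch below assumes the exact-match resource model R(n) = 128 B + n · (k + a + 8 B) for the case distinction. The parameter values are hypothetical placeholders, since the fitted values are not given on the slide.

```python
import math

def r_exact(n, key_b=16, action_b=64):
    """Exact-match resource model: 128 B + n * (k + a + 8 B)."""
    return 128 + n * (key_b + action_b + 8)

def c_exact(n, cores, params, cache_b):
    """Cycles per packet for n table entries on the given number of cores."""
    p, q, r, s, t, u, v = params
    if r_exact(n) < cache_b:
        # Table fits into the L3 cache: logarithmic growth
        single_core = p * math.log(q * n) + r
    else:
        # Table exceeds the L3 cache: second phase of the model
        single_core = s / (t * n + u) + v
    return single_core / cores

# Hypothetical parameters (NOT the fitted values from the talk)
PARAMS = (10.0, 1.0, 150.0, -1.0e6, 1.0, 0.0, 900.0)
L3_B = 20 * 2**20  # assumption: "20 MB" interpreted as 20 MiB

small_table = c_exact(1, 1, PARAMS, L3_B)
small_table_two_cores = c_exact(1, 2, PARAMS, L3_B)
large_table = c_exact(10**6, 1, PARAMS, L3_B)
```

The factor 1/c encodes the observation that doubling the cores doubles the performance, so the per-packet cycle cost halves.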

SLIDE 14

CPU Performance Model

Number of Table Entries – Ternary & LPM Match Model

Ternary Match

[Figure: packet rate [Mpps] and cycles per packet over the number of table entries (log scale)]

  • CPU cycles: exponential increase over the logarithmic x-axis
  • ternary match is difficult to implement in software
  • current implementation: loop over all entries
  • in hardware: ternary content-addressable memory (TCAM)

LPM Match

[Figure: packet rate [Mpps] and cycles per packet over the number of table entries (log scale)]

  • CPU cycles: logarithmic increase (log scale!)
  • DIR-24-8 data-structure for IPv4
  • theoretical search complexity: O(1)
  • bottleneck: shared L3 cache size
  • one part of the data structure alone requires 64 MB
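
A minimal sketch of the DIR-24-8 idea shows both the O(1) lookup and where the 64 MB come from. It assumes 32-bit table entries (as DPDK's rte_lpm uses) and an extension flag in the top bit. A dict stands in for the flat first-level array so the example stays small, and the table contents are hypothetical.

```python
TBL24_ENTRIES = 2 ** 24   # one entry per possible upper 24 address bits
ENTRY_B = 4               # assumption: 32-bit entries
EXT_FLAG = 1 << 31        # assumption: top bit marks "continue in tbl8"

# Size of the first-level table alone, compared with the L3 cache
tbl24_mib = TBL24_ENTRIES * ENTRY_B / 2**20
L3_MIB = 20               # assumption: "20 MB" L3 cache read as 20 MiB
fits_in_l3 = tbl24_mib <= L3_MIB

def lookup(tbl24, tbl8, addr):
    """At most two memory accesses, independent of the number of prefixes."""
    entry = tbl24.get(addr >> 8, 0)          # access 1: upper 24 bits
    if entry & EXT_FLAG:
        group = entry & (EXT_FLAG - 1)       # index of a 256-entry tbl8 group
        return tbl8[group * 256 + (addr & 0xFF)]  # access 2: lower 8 bits
    return entry

# Hypothetical tables: 10.0.0.0/24 -> next hop 1; 10.0.1.7/32 -> next hop 3,
# remaining addresses in 10.0.1.0/24 -> next hop 2
tbl24 = {0x0A0000: 1, 0x0A0001: EXT_FLAG | 0}
tbl8 = [2] * 256
tbl8[7] = 3
```

The first-level table comes to 64 MiB under these assumptions, which exceeds the L3 cache of the device under test and is consistent with the cache being the bottleneck despite the constant lookup complexity.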

SLIDE 15

ASIC Resource Model

Intel Tofino ASIC

P4 programmable switch ASIC

  • 64 × 100 Gbit/s ports

→ guarantees switching 6.4 Tbit/s for any program

  • latency well below 1 µs
  • stable latency: no jitter or long-tail

Focus on resource consumption

  • SRAM & TCAM resources limited
  • need to fit program on chip
  • model to indicate if program will fit

SLIDE 16

ASIC Resource Model

Table Resources

Resources R for individual table (e.g. exact match)

$$R(n, k, a) = n \cdot (R_{\mathrm{width}}(k) + a)$$

  • n: number of entries
  • k: key size
  • a: action data

Resources Rwidth for key width

$$R_{\mathrm{width}}(k) = p \cdot k + q$$

with parameters p, q

SRAM usage for different exact match widths

[Figure: SRAM usage [%] over the number of table entries (up to 1 · 10^5) for key widths of 9, 25, 41, 73, 105, 153, and 201 bit]

Determine p, q: interpolate gradients

[Figure: gradient (in units of 10^-3) over key width [bit] for exact and ternary/LPM match, each measured and interpolated]
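
One way to determine p and q is a least-squares line through the per-width gradients, which is then plugged into the table resource model. This is a sketch under that assumption; the talk only states that the gradients are interpolated. The gradient values are synthetic placeholders, not readings from the plot, and the result is in whatever unit the gradients are measured in.

```python
def least_squares_line(xs, ys):
    """Fit y = p * x + q by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx
    return p, mean_y - p * mean_x

def r_width(k_bits, p, q):
    """R_width(k) = p * k + q."""
    return p * k_bits + q

def table_resources(n, k_bits, action, p, q):
    """R(n, k, a) = n * (R_width(k) + a)."""
    return n * (r_width(k_bits, p, q) + action)

# Synthetic placeholder gradients (NOT from the talk), one per key width
key_widths_bit = [9, 25, 41, 73]
gradients = [1.09, 1.25, 1.41, 1.73]   # generated from p = 0.01, q = 1.0

p, q = least_squares_line(key_widths_bit, gradients)
estimate = table_resources(1000, 32, 0.0, p, q)
```

Because the fitted line covers key widths that were never measured, a single pair (p, q) is enough to estimate the resource usage of a table with an arbitrary key width.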

SLIDE 17

Conclusion

Increase of P4 programmable data planes

  • more applications
  • more platforms
  • more metrics

→ need for models
→ focus on certain aspects
→ model isolated P4 components
→ model for the P4 centerpiece: match-action tables

CPU – performance model

  • high-performance DPDK-based switch
  • linear scaling with CPU cores
  • typical DPDK latency histogram
  • platform-dependent influences

ASIC – resource model

  • line-rate guaranteed
  • no long-tail latency
  • number of table entries limits program complexity
  • simplified model

Future work: compare with other modeling approaches, e.g. network calculus
