SLIDE 1

Chair of Network Architectures and Services Department of Informatics Technical University of Munich

Performance Evaluation of Software Dataplanes

Final talk for the Master’s Thesis by

Maximilian Endraß

advised by Dominik Scholz, Henning Stubbe, Sebastian Gallenmüller

Wednesday 27th November, 2019

SLIDE 2

Table of contents

  • Background
  • T4P4S contributions
  • Measurements and models
  • Conclusion
  • Bibliography
  • M. Endraß — Performance Evaluation of Software Dataplanes (T4P4S)

SLIDE 3

T4P4S architecture

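[Diagram: T4P4S architecture. compiler.py generates the C core from code.p4; gcc builds the C core, the NetHAL and the switch runtime (both DPDK-based; the NetHAL is linked against DPDK) into the T4P4S switch. An arrow labeled "Core calls" connects the switch runtime and the core.]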

SLIDE 4

T4P4S pipeline

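[Diagram: T4P4S pipeline inside the T4P4S switch. Packets arrive from a NIC and pass through the DPDK runtime into the core (handle_packet()), which runs Parser → Verify checksum → Ingress/Egress → Compute checksum → Deparser; the packet then leaves through the DPDK runtime to a NIC.]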

SLIDE 5

T4P4S contributions

[Plot: exact table - T4P4S contributions. Packets received [Mpps] for 1 to 8 T4P4S threads, comparing "Shared pointer" and "No Shared pointer".]

SLIDE 6

exact tables - Mean throughput

[Plot: Mean throughput - exact table with 4 × 4 Byte fields. Packets received [Mpps] over the number of table entries (10^0 to 10^7, log scale) for 1 to 8 T4P4S threads.]

SLIDE 7

exact tables - Caching effects

[Plot: Cache miss count for exact matches (4 × 4 Bytes). L3 miss count (×10^8) and throughput with 1 lcore (packet rate [Mpps]) over the number of table entries (10^0 to 10^7, log scale). A marker shows the approximate table size that fills the L3 cache.]

SLIDE 8

exact tables - model

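$$
T_{\text{exact}}(n, c) =
\begin{cases}
c \cdot \left( \dfrac{1932.81057}{1.44044 \cdot n + 1608.13472} + 4.97731 \right), & \text{for } n \lesssim \text{L3 cache filling}, \\[1ex]
c \cdot \left( \dfrac{19575541.84198}{55.85253 \cdot n + 2.75052} + 2.67665 \right), & \text{otherwise.}
\end{cases}
\tag{1}
$$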

SLIDE 9

exact tables - Throughput

[Plot: Mean throughput - exact table with 4 × 4 Byte fields. Packets received [Mpps] over the number of table entries (10^0 to 10^7, log scale) for 1 to 8 T4P4S threads, overlaid with the model curves T_exact(n, 1), T_exact(n, 2) and T_exact(n, 3).]

SLIDE 10

exact tables - Latencies

[Plot: Latency boxplot - exact table with 4 × 4 Byte entries. Latency [µs] over table entries, sorted by the number of lcores (1 to 8), with the median latency and the baseline linear latency model L̃(c).]

SLIDE 11

exact tables - Key segmentation

[Plot: Mean throughput for exact table - 16 Byte key segmentation. Packets received [Mpps] over the number of table entries (10^0 to 10^7, log scale) for a 16 Byte key split into 4 × 4 Bytes, 2 × 8 Bytes and 1 × 16 Bytes.]

SLIDE 12

exact tables - Load Store Forwarding

[Plot: Key segmentation - exact matching - % Loads blocked by failed Store Forwarding. Loads blocked by store forwarding [%] over the number of table entries (1 to 15,000,000) for 4 × 4, 2 × 8 and 1 × 16 Byte key segmentations.]

SLIDE 13

Load-Store Forwarding

Successful load-store forwarding:

```asm
mov dword ptr [esi], eax        ; Write 4 bytes
mov edx, dword ptr [esi]        ; Read 4 bytes
```

Failed load-store forwarding:

```asm
mov dword ptr [esi], eax        ; Write lower 4 bytes
mov dword ptr [esi+4], edx      ; Write upper 4 bytes
movq xmm0, qword ptr [esi]      ; Read 8 bytes. Stall
```

Excerpt from [2].

SLIDE 14

ternary tables - Mean throughput

[Plot: Mean throughput for ternary table entries (4 × 4 Byte keys) - 64 Byte packets. Packets received [Mpps] over the number of table entries (10^0 to 10^6, log scale) for 1 to 8 T4P4S threads, with the throughput models T_t(n, 1), T_t(n, 2), T_t(n, 3) and the line rate.]

SLIDE 15

ternary tables - Latencies

[Plot: Latency HDR histogram for ternary matching on 4 × 4 Byte keys (1 lcore). Latency [µs] (log scale, 1 to 20,000) over percentile (0% to 99.9999%) for 1, 10, 50, 100, 150, 200, 400, 600, 800, 1000 and 2000 entries, with the baseline latency L̃(1).]

SLIDE 16

lpm tables - Mean throughput

[Plot: Mean throughput for lpm table entries (4 Byte key) - 24 bit prefix length. Packets received [Mpps] over the number of table entries (10^0 to 10^5, log scale) for 1 to 8 T4P4S threads, with the models T_lpm(n, 1) to T_lpm(n, 4) and the line rate.]

SLIDE 17

Conclusion

  • Reproducible P4 measurements for the DPDK switch runtime and the P4 pipeline:
      • Parser
      • Tables (Ingress/Egress Match-Action Pipelines)
      • Deparser
  • Models for throughput and latency
  • Performance scales across multiple cores
  • Many platform-specific (x86) behaviors that are not present in hardware switches
  • Unique possibility to modify the switch itself

SLIDE 18

Conclusion

Questions?

SLIDE 19

[1] DPDK Project. rte_hash_crc.h File Reference, 2019. https://doc.dpdk.org/api-19.02/rte__hash__crc_8h_source.html; last accessed on 2019/11/18.

[2] A. Fog. The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers. Copenhagen University College of Engineering, page 134, 2012.

[3] P4@ELTE. T4P4S - Retargetable compiler for the P4 language, 2019. https://github.com/P4ELTE/t4p4s; last accessed on 2019/11/24.

SLIDE 20

Deparser - Header addition

[Plots: Header addition - Number of headers. Left: mean throughput, packets received [Mpps], over the number of 4 Byte headers added (1 to 10), with the model T_add,h(n); bar values as printed: 11.6, 7.7, 7.2, 6.9, 6.8, 6.5, 6.1, 5.5, 5.5, 4.7, 4.4. Right: median latencies [µs] over the number of headers added, with the model L_add,h(n).]

SLIDE 21

Deparser - Header modification

[Plots: Header modification - Number of fields. Left: mean throughput, packets received [Mpps], over the number of 4 Byte header fields modified (1 to 10). Right: median latency [µs] over the number of fields modified, with the baseline latency L̃(1).]

SLIDE 22

Deparser - Header removal

[Plots: Header removal - Number of headers. Left: mean throughput, packets received [Mpps], over the number of 4 Byte headers removed (1 to 10). Right: median latency [µs] over the number of headers removed.]

SLIDE 23

Deparser - Header removal

```c
void store_headers_for_emit(packet_descriptor_t* pd, /*...*/)
{
    uint8_t* storage = pd->header_tmp_storage;
    pd->emit_headers_length = 0;
    for (int i = 0; i < pd->emit_hdrinst_count; ++i) {
        header_descriptor_t hdr = pd->headers[pd->header_reorder[i]];
        if (hdr.pointer == NULL)
            continue; // skipping invalid (i.e. removed) headers
        memcpy(storage, hdr.pointer, hdr.length);
        storage += hdr.length;
        pd->emit_headers_length += hdr.length;
    }
}
```

Excerpt from autogenerated dataplane.c [3].

SLIDE 24

Parser - Headers

[Plot: Parser - Mean throughput - Number of 4 Byte headers parsed. Mean throughput, packets received [Mpps], over the number of 4 Byte headers parsed (1 to 15), with the linear throughput model T_p,headers(n).]

Mean throughput per number of parsed headers (values printed on the bars, in Mpps): 1: 13.3, 2: 13.0, 3: 12.6, 4: 12.6, 5: 11.9, 6: 11.7, 7: 11.5, 8: 11.4, 9: 10.9, 10: 10.3, 11: 10.5, 12: 10.2, 13: 10.0, 14: 9.4, 15: 9.5.

SLIDE 25

Parser - Transition possibilities

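```p4
transition select(hdr.benchmarkdata.bytes0) {
    32w0: accept;
    32w1: accept;
    32w2: accept;
    32w3: reject;
    32w4: reject;
    default: reject;
}
```

[Diagram: Start → State #1. From State #1, the values 0, 1 and 2 lead to Accept; the values 3, 4 and any other value (*) lead to Reject.]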

SLIDE 26

Parser - Transition possibilities

[Plots: Number of transition possibilities. Left: mean throughput, packet rate [Mpps], over the number of 4 Byte transitions (1, 10, 25, 50, 100, 150, 200, 250, 500, 1000, 2000). Right: median latencies [µs] over the same range.]

SLIDE 27

Parser - Transition states

[Diagram: a chain of parser states Start → State #1 → State #2 → ... → Accept, where each transition is taken for the values 0 or 1.]

SLIDE 28

Parser - Transition states

[Plots: Number of transition states. Left: mean throughput [Mpps] over the number of states (1 to 12). Right: median latency [µs] over the number of states.]

SLIDE 29

Load-Store Forwarding

```c
void table_benchmarktable_0_key(packet_descriptor_t* pd, uint8_t* key) {
    EXTRACT_INT32_BITS_PACKET(pd, header_instance_benchmarkdata, field_benchmark_data_t_bytes1, *(uint32_t*)key)
    key += sizeof(uint32_t);
    // ...
    EXTRACT_INT32_BITS_PACKET(pd, header_instance_benchmarkdata, field_benchmark_data_t_bytes4, *(uint32_t*)key)
    key += sizeof(uint32_t);
} // dataplane@101
```

Excerpt from autogenerated T4P4S dataplane.c - exact table - 4 × 4 Byte headers [3]

SLIDE 30

Load-Store Forwarding

```c
static inline uint32_t
rte_hash_crc(const void *data, uint32_t data_len, uint32_t init_val)
{
    unsigned i;
    uintptr_t pd = (uintptr_t) data;

    for (i = 0; i < data_len / 8; i++) {
        init_val = rte_hash_crc_8byte(*(const uint64_t *)pd, init_val);
        pd += 8;
    }
    // ...
}
```

Excerpt from rte_hash_crc.h [1].

SLIDE 31

Load-Store Forwarding

```c
void table_benchmarktable_0_key(packet_descriptor_t* pd, uint8_t* key) {
    EXTRACT_BYTEBUF_PACKET(pd, header_instance_benchmarkdata, field_benchmark_data_t_bytes1, key)
    key += 8;
    EXTRACT_BYTEBUF_PACKET(pd, header_instance_benchmarkdata, field_benchmark_data_t_bytes2, key)
    key += 8;
} // dataplane@101
```

Excerpt from autogenerated T4P4S dataplane.c - exact table - 2 × 8 Byte headers [3]

SLIDE 32

ternary table - model

$$
T_t(n, c) = c \cdot \left( \frac{2.25839}{0.00271 \cdot n + 0.28445} - 0.06607 \right)
\tag{2}
$$

SLIDE 33

lpm table - model

$$
T_{\text{lpm}}(n, c) = \frac{2.72095 \cdot c + 0.75416}{3.53772} \cdot \left( \frac{1135.05341}{2.48695 \cdot n + 640.39117} + 3.58236 \right)
\tag{3}
$$

SLIDE 34

lpm table - L3 cache misses

[Plot: Percentage of executions of rte_lpm_lookup that are DRAM-bound. DRAM bound [%] and throughput with 1 lcore (packets received [Mpps]) over the number of table entries (10^0 to 10^5, log scale).]

SLIDE 35

Baseline - Median latency

[Plot: Median latency [µs] for 1 to 8 T4P4S threads in the configurations "No P4" and "Minimal P4", together with the linear model L̃(c).]

Median latencies as printed (two bars per thread count, in µs): 1 thread: 12.8, 12.8; 2 threads: 14.7, 14.8; 3 threads: 16.3, 16.5; 4 threads: 17.5, 17.7; 5 threads: 19.4, 19.5; 6 threads: 20.7, 20.9; 7 threads: 22.0, 21.8; 8 threads: 23.6, 23.5.

$$
\tilde{L}(c) = 1.4928 \cdot c + 11.7305 \quad [\mu s]
$$

SLIDE 36

Baseline - minimal P4 core

```p4
#include <core.p4>
#include <v1model.p4>

parser ParserImpl(packet_in packet, out headers hdr, inout metadata meta,
                  inout standard_metadata_t standard_metadata) {
    @name(".start") state start {
        standard_metadata.egress_port = 9w2;
        transition accept;
    }
}

control egress(...) { apply { } }
control ingress(...) { apply { } }
control DeparserImpl(...) { apply { } }
control verifyChecksum(...) { apply { } }
control computeChecksum(...) { apply { } }

V1Switch(ParserImpl(), verifyChecksum(), ingress(), egress(), computeChecksum(), DeparserImpl()) main;
```

SLIDE 37

Related work

  • Huynh Tu Dang et al. Whippersnapper: A P4 Language Benchmark Suite. In: Proceedings of the Symposium on SDN Research (SOSR ’17).
  • Dominik Scholz et al. Cryptographic Hashing in P4 Data Planes. In: 2nd P4 Workshop in Europe (EuroP4), Cambridge, UK, Sept. 2019.
  • Alexander Frank. Evaluation and Analysis of a Hardware Programmable High-Performance Switch. Master’s Thesis, Technical University of Munich, 2019.
  • Henning Stubbe. Benchmarking Programmable Network Hardware. Master’s Thesis, Technical University of Munich, 2018.
