Chair of Network Architectures and Services Department of Informatics Technical University of Munich
Performance Evaluation of Software Dataplanes
Final talk for the Master's Thesis by Maximilian Endraß, advised by Dominik Scholz, Henning
Table of contents
- Background
- T4P4S contributions
- Measurements and models
- Conclusion
- Bibliography
- M. Endraß — Performance Evaluation of Software Dataplanes (T4P4S)
T4P4S architecture
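Figure: T4P4S architecture. compiler.py translates code.p4 into the C core; gcc builds the C core together with the switch runtime into the T4P4S switch. The core calls into the switch runtime, and the NetHAL is linked against DPDK.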
T4P4S pipeline
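Figure: T4P4S pipeline. Packets enter the T4P4S switch from a NIC through the DPDK runtime; the core's handle_packet() runs Parser → Verify checksum → Ingress/Egress → Compute checksum → Deparser, and the packet leaves through a NIC.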
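The stage order of the pipeline (Parser, Verify checksum, Ingress/Egress, Compute checksum, Deparser) can be sketched in C. This is a minimal illustration under stated assumptions: the enum, the sketch_packet_t type and process_packet_sketch() are hypothetical, each stage only records that it ran, and the real per-packet entry point (handle_packet() in the T4P4S core) is generated from the P4 program and does far more work.

```c
#include <stddef.h>

/* Pipeline stages in the order shown on the slide. */
typedef enum {
    STAGE_PARSER,
    STAGE_VERIFY_CHECKSUM,
    STAGE_INGRESS,
    STAGE_EGRESS,
    STAGE_COMPUTE_CHECKSUM,
    STAGE_DEPARSER,
    STAGE_COUNT
} stage_t;

/* Hypothetical packet record: it only logs which stages ran, in order. */
typedef struct {
    stage_t trace[STAGE_COUNT];
    size_t n;
} sketch_packet_t;

static void run_stage(sketch_packet_t *p, stage_t s) {
    /* A real stage would parse, match tables or rewrite headers here. */
    if (p->n < STAGE_COUNT)
        p->trace[p->n++] = s;
}

/* Hypothetical stand-in for the per-packet entry point of the core. */
void process_packet_sketch(sketch_packet_t *p) {
    run_stage(p, STAGE_PARSER);
    run_stage(p, STAGE_VERIFY_CHECKSUM);
    run_stage(p, STAGE_INGRESS);
    run_stage(p, STAGE_EGRESS);
    run_stage(p, STAGE_COMPUTE_CHECKSUM);
    run_stage(p, STAGE_DEPARSER);
}
```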
T4P4S contributions
Figure: exact table, T4P4S contributions. Packets received [Mpps] for 1 to 8 T4P4S threads, with and without the shared pointer.
exact tables - Mean throughput
Figure: Mean throughput, exact table with 4 × 4 Byte fields. Packets received [Mpps] over 10^0 to 10^7 table entries for 1 to 8 T4P4S threads.
exact tables - Caching effects
Figure: Cache miss count for exact matches (4 × 4 Bytes). L3 miss count (×10^8) and single-lcore throughput [Mpps] over 10^0 to 10^7 table entries (log scale), with the approximate L3-filling table size marked.
exact tables - model
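$$
T_{\text{exact}}(n, c) =
\begin{cases}
c \cdot \left( \dfrac{1932.81057}{1.44044 \cdot n + 1608.13472} + 4.97731 \right), & \text{for } n \lesssim \text{L3 cache filling},\\[4pt]
c \cdot \left( \dfrac{19575541.84198}{55.85253 \cdot n + 2.75052} + 2.67665 \right), & \text{otherwise},
\end{cases}
\tag{1}
$$
where n is the number of table entries and c the number of T4P4S threads.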
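Equation (1) can be evaluated directly. The sketch below assumes the result is in Mpps, since the model is plotted against the Mpps throughput curves, and leaves the regime choice to the caller because the slides give the boundary only as the approximate L3-filling table size. The function name t_exact is hypothetical.

```c
#include <stdbool.h>

/* Fitted exact-table throughput model T_exact(n, c), Eq. (1).
 * n: number of table entries; c: number of T4P4S threads (lcores).
 * fits_in_l3 selects the branch; the exact boundary ("approx.
 * L3-filling table size") is not given numerically on the slides. */
double t_exact(double n, int c, bool fits_in_l3) {
    if (fits_in_l3)
        return c * (1932.81057 / (1.44044 * n + 1608.13472) + 4.97731);
    return c * (19575541.84198 / (55.85253 * n + 2.75052) + 2.67665);
}
```

For a single entry, one thread gives about 6.18 Mpps and two threads about 12.36 Mpps, since the model is linear in c; the second branch gives about 2.71 Mpps per thread at 10^7 entries. The slides plot the model only for one to three threads.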
exact tables - Throughput
Figure: Mean throughput, exact table with 4 × 4 Byte fields, for 1 to 8 T4P4S threads over 10^0 to 10^7 table entries, overlaid with the model T_exact(n, c) for c = 1, 2, 3.
exact tables - Latencies
Figure: Latency boxplot, exact table with 4 × 4 Byte entries. Latency [µs] over table entries, grouped by number of lcores (1 to 8), with the median latency and the baseline linear latency model $\tilde{L}(c)$.
exact tables - Key segmentation
Figure: Mean throughput for an exact table with a 16 Byte key, segmented as 4 × 4 Bytes, 2 × 8 Bytes or 1 × 16 Bytes. Packets received [Mpps] over 10^0 to 10^7 table entries (log scale).
exact tables - Load Store Forwarding
Figure: Key segmentation, exact matching. Percentage of loads blocked by failed store forwarding over 1 to 15,000,000 table entries, for 4 × 4, 2 × 8 and 1 × 16 Byte key segmentation.
Load-Store Forwarding
Successful load-store forwarding:
```asm
mov dword ptr [esi], eax     ; Write 4 bytes
mov edx, dword ptr [esi]     ; Read 4 bytes
```
Failed load-store forwarding:
```asm
mov dword ptr [esi], eax     ; Write lower 4 bytes
mov dword ptr [esi+4], edx   ; Write upper 4 bytes
movq xmm0, qword ptr [esi]   ; Read 8 bytes. Stall
```
Excerpt from [2].
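The two instruction sequences correspond roughly to the following C patterns. This is only an illustration: an optimizing compiler may merge or remove these memcpy calls, so whether the generated machine code actually contains the narrow-stores, wide-load sequence (and pays the forwarding penalty) depends on the compiler and optimization level. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Forwardable pattern: one 8-byte store, then an 8-byte load of
 * exactly the same bytes. */
uint64_t wide_store_wide_load(uint64_t value, uint8_t buf[8]) {
    memcpy(buf, &value, sizeof value);
    uint64_t out;
    memcpy(&out, buf, sizeof out);
    return out;
}

/* Failing pattern: two 4-byte stores, then one 8-byte load spanning
 * both. On many x86 cores this load cannot be served from a single
 * pending store and typically incurs a stall. */
uint64_t narrow_stores_wide_load(uint32_t lo, uint32_t hi, uint8_t buf[8]) {
    memcpy(buf, &lo, sizeof lo);
    memcpy(buf + 4, &hi, sizeof hi);
    uint64_t out;
    memcpy(&out, buf, sizeof out);
    return out;
}
```

Both functions return the same bytes that were written; only the store and load widths differ.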
ternary tables - Mean throughput
Figure: Mean throughput for ternary table entries (4 × 4 Byte keys), 64 Byte packets. Packets received [Mpps] over 10^0 to 10^6 table entries (log scale) for 1 to 8 T4P4S threads, with the throughput model T_t(n, c) for c = 1, 2, 3 and the line rate.
ternary tables - Latencies
Figure: Latency HDR histogram for ternary matching on 4 × 4 Byte keys (1 lcore). Latency [µs, log scale] by percentile (0 % to 99.9999 %) for 1, 10, 50, 100, 150, 200, 400, 600, 800, 1000 and 2000 table entries, with the baseline latency $\tilde{L}(1)$.
lpm tables - Mean throughput
Figure: Mean throughput for lpm table entries (4 Byte key), 24 bit prefix length. Packets received [Mpps] over 10^0 to 10^5 table entries (log scale) for 1 to 8 T4P4S threads, with the model T_lpm(n, c) for c = 1 to 4 and the line rate.
Conclusion
- Reproducible P4 measurements for the DPDK switch runtime and the P4 pipeline:
  - Parser
  - Tables (ingress/egress match-action pipelines)
  - Deparser
- Models for throughput and latency
- Performance scales across multiple cores
- Many platform-specific (x86) behaviors that are not present in hardware switches
- Unique possibility to modify the switch itself
Conclusion
Questions?
[1] DPDK Project. rte_hash_crc.h File Reference, 2019. https://doc.dpdk.org/api-19.02/rte__hash__crc_8h_source.html; last accessed on 2019/11/18.
[2] A. Fog. The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers. Copenhagen University College of Engineering, page 134, 2012.
[3] P4@ELTE. T4P4S - Retargetable compiler for the P4 language, 2019. https://github.com/P4ELTE/t4p4s; last accessed on 2019/11/24.
Deparser - Header addition
Figure: Header addition. Left: mean throughput [Mpps] by number of 4 Byte headers added (1 to 10), with the model T_add,h(n); labeled values 11.6, 7.7, 7.2, 6.9, 6.8, 6.5, 6.1, 5.5, 5.5, 4.7, 4.4. Right: median latency [µs] by number of headers added, with the model L_add,h(n).
Deparser - Header modification
Figure: Header modification. Mean throughput [Mpps] and median latency [µs] by number of 4 Byte header fields modified (1 to 10), with the baseline latency $\tilde{L}(1)$.
Deparser - Header removal
Figure: Header removal. Mean throughput [Mpps] and median latency [µs] by number of 4 Byte headers removed (1 to 10).
Deparser - Header removal
```c
void store_headers_for_emit(packet_descriptor_t* pd, /*...*/) {
    uint8_t* storage = pd->header_tmp_storage;
    pd->emit_headers_length = 0;
    for (int i = 0; i < pd->emit_hdrinst_count; ++i) {
        header_descriptor_t hdr = pd->headers[pd->header_reorder[i]];
        if (hdr.pointer == NULL) continue; // skipping invalid (i.e. removed) headers
        memcpy(storage, hdr.pointer, hdr.length);
        storage += hdr.length;
        pd->emit_headers_length += hdr.length;
    }
}
```
Excerpt from autogenerated dataplane.c [3].
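A self-contained version of the emit loop in store_headers_for_emit, with hypothetical minimal types in place of the generated packet_descriptor_t and header_descriptor_t, shows what it does: it visits every header instance in emit order, skips removed headers (NULL pointer) and copies each remaining header into temporary storage. The names header_sketch_t and emit_valid_headers are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a generated header descriptor. */
typedef struct {
    uint8_t *pointer; /* NULL marks an invalid (removed) header */
    size_t length;    /* header length in bytes */
} header_sketch_t;

/* Copies all valid headers, in emit order, into storage and returns the
 * number of bytes written. Removed headers are skipped, but the loop
 * still visits every header instance. */
size_t emit_valid_headers(const header_sketch_t *headers,
                          const int *emit_order, int count,
                          uint8_t *storage) {
    size_t total = 0;
    for (int i = 0; i < count; ++i) {
        const header_sketch_t *hdr = &headers[emit_order[i]];
        if (hdr->pointer == NULL)
            continue; /* removed header: nothing to copy */
        memcpy(storage + total, hdr->pointer, hdr->length);
        total += hdr->length;
    }
    return total;
}
```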
Parser - Headers
Figure: Parser, mean throughput [Mpps] by number of 4 Byte headers parsed, with the linear throughput model T_p,headers(n). Labeled means for 1 to 15 headers: 13.3, 13.0, 12.6, 12.6, 11.9, 11.7, 11.5, 11.4, 10.9, 10.3, 10.5, 10.2, 10.0, 9.4, 9.5.
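The slide shows a linear model T_p,headers(n) but does not list its coefficients. As a hypothetical sketch, an ordinary least-squares line can be fitted to the mean values labeled in the plot; this is not necessarily how the thesis obtained its model, and the function names are assumptions.

```c
#include <stddef.h>

/* Mean throughput [Mpps] for 1..15 parsed 4-byte headers, as labeled in the plot. */
static const double parser_headers_mpps[15] = {
    13.3, 13.0, 12.6, 12.6, 11.9, 11.7, 11.5, 11.4,
    10.9, 10.3, 10.5, 10.2, 10.0, 9.4, 9.5
};

/* Ordinary least-squares fit y ~ a + b * x.
 * Returns 0 on success, -1 if the fit is undefined. */
int fit_line(const double *x, const double *y, size_t n, double *a, double *b) {
    if (n < 2) return -1;
    double mx = 0.0, my = 0.0;
    for (size_t i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
    mx /= (double)n;
    my /= (double)n;
    double sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; i++) {
        sxx += (x[i] - mx) * (x[i] - mx);
        sxy += (x[i] - mx) * (y[i] - my);
    }
    if (sxx == 0.0) return -1; /* all x equal: slope undefined */
    *b = sxy / sxx;
    *a = my - *b * mx;
    return 0;
}

/* Fits throughput [Mpps] ~ a + b * (number of headers) to the plotted means. */
int fit_parser_headers(double *a, double *b) {
    double x[15];
    for (int i = 0; i < 15; i++) x[i] = i + 1;
    return fit_line(x, parser_headers_mpps, 15, a, b);
}
```

On these 15 points the fit gives a slope of -0.28 Mpps per additional header and an intercept of about 13.49 Mpps. The values come from the rounded plot labels, so they need not match the thesis's fitted model.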
Parser - Transition possibilities
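```p4
transition select(hdr.benchmarkdata.bytes0) {
    32w0: accept;
    32w1: accept;
    32w2: accept;
    32w3: reject;
    32w4: reject;
    default: reject;
}
```
Figure: parser state diagram with Start, State #1, Accept and Reject; the values 0, 1, 2 lead to Accept, the values 3, 4 and all others (*) lead to Reject.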
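The select statement above maps the 32-bit field hdr.benchmarkdata.bytes0 to a parser verdict. A hand-written C equivalent makes the semantics explicit; it is an illustration, not the code T4P4S generates, and the enum and function names are hypothetical.

```c
#include <stdint.h>

typedef enum { PARSER_ACCEPT, PARSER_REJECT } parser_verdict_t;

/* Mirrors the select() cases: 0, 1, 2 accept; 3, 4 and every other
 * value (the default branch) reject. */
parser_verdict_t select_bytes0(uint32_t bytes0) {
    switch (bytes0) {
    case 0:
    case 1:
    case 2:
        return PARSER_ACCEPT;
    case 3:
    case 4:
    default:
        return PARSER_REJECT;
    }
}
```

In the benchmark, the number of such cases ("transition possibilities") is the varied parameter.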
Parser - Transition possibilities
Figure: Mean throughput [Mpps] and median latency [µs] by number of 4 Byte transition possibilities (1, 10, 25, 50, 100, 150, 200, 250, 500, 1000, 2000).
Parser - Transition states
Figure: chain of parser states Start → State #1 → State #2 → … → Accept; each transition is taken on the select values 0 or 1.
Parser - Transition states
Figure: Mean throughput [Mpps] and median latency [µs] by number of transition states (1 to 12).
Load-Store Forwarding
```c
void table_benchmarktable_0_key(packet_descriptor_t* pd, uint8_t* key) {
    EXTRACT_INT32_BITS_PACKET(pd, header_instance_benchmarkdata, field_benchmark_data_t_bytes1, *(uint32_t*)key)
    key += sizeof(uint32_t);
    // ...
    EXTRACT_INT32_BITS_PACKET(pd, header_instance_benchmarkdata, field_benchmark_data_t_bytes4, *(uint32_t*)key)
    key += sizeof(uint32_t);
} // dataplane@101
```
Excerpt from autogenerated T4P4S dataplane.c, exact table, 4 × 4 Byte headers [3].
Load-Store Forwarding
```c
static inline uint32_t rte_hash_crc(const void *data, uint32_t data_len, uint32_t init_val) {
    unsigned i;
    uintptr_t pd = (uintptr_t) data;
    for (i = 0; i < data_len / 8; i++) {
        init_val = rte_hash_crc_8byte(*(const uint64_t *)pd, init_val);
        pd += 8;
    }
    // ...
}
```
Excerpt from rte_hash_crc.h [1].
Load-Store Forwarding
```c
void table_benchmarktable_0_key(packet_descriptor_t* pd, uint8_t* key) {
    EXTRACT_BYTEBUF_PACKET(pd, header_instance_benchmarkdata, field_benchmark_data_t_bytes1, key)
    key += 8;
    EXTRACT_BYTEBUF_PACKET(pd, header_instance_benchmarkdata, field_benchmark_data_t_bytes2, key)
    key += 8;
} // dataplane@101
```
Excerpt from autogenerated T4P4S dataplane.c, exact table, 2 × 8 Byte headers [3].
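A simplified accounting model helps connect the key-building excerpts with the 8-byte loads in rte_hash_crc. It assumes, as a simplification of real x86 behavior, that an 8-byte load can be forwarded only if one earlier store covers all of its bytes. Under that rule the 4 × 4 Byte key (four 4-byte stores) leaves both 8-byte loads unforwardable, while with the 2 × 8 Byte key every load matches exactly one store. The type and function names are hypothetical, and the sketch ignores the tail handling elided in the rte_hash_crc excerpt as well as timing (whether the stores are still pending when the loads execute).

```c
#include <stdbool.h>
#include <stddef.h>

/* A memory access covering bytes [offset, offset + size). */
typedef struct { size_t offset, size; } access_t;

/* Simplified rule (assumption): a load is forwardable only if a single
 * earlier store covers all of its bytes. Stores are assumed not to
 * overlap; real cores add alignment and size conditions. */
static bool forwardable(access_t load, const access_t *stores, size_t n_stores) {
    for (size_t i = 0; i < n_stores; i++) {
        if (stores[i].offset <= load.offset &&
            load.offset + load.size <= stores[i].offset + stores[i].size)
            return true;
    }
    return false;
}

/* Counts the 8-byte hash loads over a key_len-byte key that no single
 * key-building store can feed. */
size_t blocked_hash_loads(const access_t *stores, size_t n_stores, size_t key_len) {
    size_t blocked = 0;
    for (size_t off = 0; off + 8 <= key_len; off += 8) {
        access_t load = { off, 8 };
        if (!forwardable(load, stores, n_stores))
            blocked++;
    }
    return blocked;
}
```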
ternary table - model
$$
T_t(n, c) = c \cdot \left( \frac{2.25839}{0.00271 \cdot n + 0.28445} - 0.06607 \right)
\tag{2}
$$
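Equation (2) can be evaluated the same way. The sketch assumes Mpps as the unit, since the model is plotted on the Mpps axis; the function name t_ternary is hypothetical.

```c
/* Fitted ternary-table throughput model T_t(n, c), Eq. (2).
 * n: number of table entries; c: number of T4P4S threads. */
double t_ternary(double n, int c) {
    return c * (2.25839 / (0.00271 * n + 0.28445) - 0.06607);
}
```

For one thread, the model predicts about 7.80 Mpps for a single entry and about 0.69 Mpps for 1000 entries. Because of the constant -0.06607, the expression reaches zero near n ≈ 12,500 and is negative beyond that, so it should not be extrapolated to larger tables.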
lpm table - model
$$
T_{\text{lpm}}(n, c) = \frac{2.72095 \cdot c + 0.75416}{3.53772} \cdot \left( \frac{1135.05341}{2.48695 \cdot n + 640.39117} + 3.58236 \right)
\tag{3}
$$
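A direct evaluation of Equation (3), again assuming Mpps as the unit; the function name t_lpm is hypothetical.

```c
/* Fitted LPM-table throughput model T_lpm(n, c), Eq. (3).
 * n: number of table entries; c: number of T4P4S threads. */
double t_lpm(double n, int c) {
    double thread_factor = (2.72095 * c + 0.75416) / 3.53772;
    return thread_factor * (1135.05341 / (2.48695 * n + 640.39117) + 3.58236);
}
```

For one thread and a single entry the model gives about 5.25 Mpps. Here the thread count enters through the factor (2.72095·c + 0.75416)/3.53772 instead of a plain c, so each additional thread adds only about 0.78 times the single-thread throughput.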
lpm table - L3 cache misses
Figure: Percentage of executions of rte_lpm_lookup that are DRAM-bound, alongside single-lcore throughput [Mpps], over 10^0 to 10^5 table entries (log scale).
Baseline - Median latency
Figure: Baseline median latency [µs] for 1 to 8 T4P4S threads in the configurations "No P4" and "Minimal P4", with the linear model $\tilde{L}(c) = 1.4928 \cdot c + 11.7305$.
Labeled medians (two bars per thread count): 1 thread: 12.8, 12.8; 2: 14.7, 14.8; 3: 16.3, 16.5; 4: 17.5, 17.7; 5: 19.4, 19.5; 6: 20.7, 20.9; 7: 22.0, 21.8; 8: 23.6, 23.5.
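The baseline latency model can be compared against the labeled medians. The sketch assumes that the second label of each pair belongs to the minimal P4 program (the legend lists "No P4" first); the function and array names are hypothetical.

```c
/* Baseline linear latency model L~(c) = 1.4928 * c + 11.7305 [µs],
 * c = number of T4P4S threads. */
double baseline_latency_us(int c) {
    return 1.4928 * c + 11.7305;
}

/* Second labeled median of each pair for 1..8 threads [µs]
 * (assumed to be the "Minimal P4" bar, following the legend order). */
static const double minimal_p4_median_us[8] = {
    12.8, 14.8, 16.5, 17.7, 19.5, 20.9, 21.8, 23.5
};

/* Largest absolute gap between the model and the labeled medians [µs]. */
double max_abs_residual_us(void) {
    double worst = 0.0;
    for (int c = 1; c <= 8; c++) {
        double d = minimal_p4_median_us[c - 1] - baseline_latency_us(c);
        if (d < 0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

With these labels the model stays within about 0.43 µs of every median, the largest gap being at one thread.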
Baseline - minimal P4 core
```p4
#include <core.p4>
#include <v1model.p4>

parser ParserImpl(packet_in packet, out headers hdr, inout metadata meta,
                  inout standard_metadata_t standard_metadata) {
    @name(".start") state start {
        standard_metadata.egress_port = 9w2;
        transition accept;
    }
}
control egress(...) { apply { } }
control ingress(...) { apply { } }
control DeparserImpl(...) { apply { } }
control verifyChecksum(...) { apply { } }
control computeChecksum(...) { apply { } }

V1Switch(ParserImpl(), verifyChecksum(), ingress(), egress(), computeChecksum(), DeparserImpl()) main;
```
Related work
- Huynh Tu Dang et al. Whippersnapper: A P4 Language Benchmark Suite. In: Proceedings of the Symposium on SDN Research (SOSR ’17).
- Dominik Scholz et al. Cryptographic Hashing in P4 Data Planes. In: 2nd P4 Workshop in Europe (EUROP4), Cambridge, UK, Sept. 2019.
- Alexander Frank. Evaluation and Analysis of a Hardware Programmable High-Performance Switch. Master’s Thesis, Technical University of Munich, 2019.
- Henning Stubbe. Benchmarking Programmable Network Hardware. Master’s Thesis, Technical University of Munich, 2018.