Chair of Network Architectures and Services Department of Informatics Technical University of Munich
Comparison of Queuing Data Structures for Traffic Analysers
Maximilian Pudelko
August 3, 2016
Contents
- Motivation
- Existing Data Structures and Found Problems
- Analysis
- Implementation of QQ
- Evaluation of QQ
- Conclusion/Future work
Motivation
Level        Speed
NIC          40 Gbit/s
RAM (copy)   225 Gbit/s
tcpdump      up to 5.9 Gbit/s [1]
SSD          3.6 Gbit/s (400 MB/s)
Table 1: Common throughput values
- The speed gap between NICs and storage is widening
- Capturing becomes more complex (special drivers, filtering, ...)
- Exploiting multi-core architectures is a must
- Efficient data structure required
Tested Data Structures
- Cameron: ReaderWriterQueue and ConcurrentQueue
- Folly: ProducerConsumerQueue and MPMCQueue
- DPDK: rte_ring + rte_mbuf
Features:
- Lock-free (atomic, CAS) in critical path
- Optimized for low latency
- Generic (RWQ, CQ, PCQ, MPMC): a packet structure must be supplied
- Specialized for packet processing (DPDK)
Found Problems - Throughput
[Plot: packet rate [Mpps] and data rate [Gbit/s] per data structure (RWQ, PCQ, CQ, MPMC, rte_ring), individual allocation vs. message buffer]
Figure 1: Throughput comparison with 64 byte packets, 4 inserters
Per-packet allocation with malloc(packet_len):
- Synchronized across all threads, slow
- Memory reuse not possible
- Leads to heap fragmentation (TLB pollution)
- Specialized allocator implementations exist but are out of scope
Message buffers:
- Fixed size chunks allocated in bulk
- Chained together to accommodate larger packets
- Not implemented by general-purpose queues
- Space overhead from partially used buffers
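The message-buffer idea above can be sketched as follows. This is a minimal, single-threaded illustration under stated assumptions, not the DPDK implementation: the names MsgBuf, BufPool and kBufSize and the 128-byte chunk size are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical names; this is not the DPDK rte_mbuf API.
constexpr std::size_t kBufSize = 128;  // payload bytes per fixed-size chunk

struct MsgBuf {
    MsgBuf* next = nullptr;  // next chunk of this packet, or next free chunk
    std::uint16_t used = 0;  // payload bytes stored in this chunk
    std::uint8_t data[kBufSize];
};

class BufPool {
public:
    // All chunks come from one bulk allocation and are linked into a free list.
    explicit BufPool(std::size_t count) : bufs_(count), free_count_(count) {
        for (MsgBuf& b : bufs_) {
            b.next = free_;
            free_ = &b;
        }
    }

    // Copies a packet into a chain of chunks.
    // Returns nullptr if the pool has too few free chunks.
    MsgBuf* store(const std::uint8_t* pkt, std::size_t len) {
        assert(pkt != nullptr);
        std::size_t needed = std::max<std::size_t>(1, (len + kBufSize - 1) / kBufSize);
        if (needed > free_count_) return nullptr;
        MsgBuf* head = nullptr;
        MsgBuf** tail = &head;
        std::size_t off = 0;
        for (std::size_t i = 0; i < needed; ++i) {
            MsgBuf* b = free_;  // pop one chunk from the free list
            free_ = b->next;
            --free_count_;
            std::size_t n = std::min(kBufSize, len - off);
            std::memcpy(b->data, pkt + off, n);
            b->used = static_cast<std::uint16_t>(n);
            b->next = nullptr;
            *tail = b;          // append to the packet's chain
            tail = &b->next;
            off += n;
        }
        return head;
    }

    // Returns every chunk of a chain to the free list for reuse.
    void release(MsgBuf* chain) {
        while (chain != nullptr) {
            MsgBuf* next = chain->next;
            chain->next = free_;
            free_ = chain;
            ++free_count_;
            chain = next;
        }
    }

    std::size_t free_count() const { return free_count_; }

private:
    std::vector<MsgBuf> bufs_;
    MsgBuf* free_ = nullptr;
    std::size_t free_count_;
};
```

With these assumed sizes, a 300-byte packet occupies three chunks (128 + 128 + 44 bytes); the unused 84 bytes of the last chunk are the space overhead named above.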
Found Problems - DPDK Limitations
- Memory pool capacity limit: 2^27 ≈ 134 M slots
- Only 2.68 sec of traffic at 50 Mpps
- Space overhead of > 50 B per packet
- Not compatible with regular threads
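As a rough check of the 2.68 s figure, assuming each packet occupies exactly one pool slot (an assumption; chained buffers would consume slots faster):

```latex
t = \frac{2^{27}\ \text{slots}}{50 \times 10^{6}\ \text{packets/s}}
  = \frac{134\,217\,728}{50\,000\,000}\ \text{s}
  \approx 2.68\ \text{s}
```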
Found Problems - Missing Special Features
- Random access to packets
- Roles: Peek access for analysers
- Time-coherent storage, even distribution at low rates
- Option to dump to persistent storage
Analysis
A suitable data structure must ...
1. ... handle 10 Gbit/s single and 40 Gbit/s combined.
2. ... store a large number of packets.
3. ... provide the special features.
4. ... scale on multi-core architectures.
Approach/Implementation - Overview
[Diagram components: three NICs, each with an Inserter; QQ; Dumper; pcap Writer; HDD; Analyser. Operations shown: enqueue(), dequeue(), peek(), signal(filter, <range>)]
Figure 2: QQ overview
Approach/Implementation - Outer Queue
- 2-Layer model: Queue of Queues
- One continuous memory block for packet data
- Allocated at initialization via transparent hugepages
- Segmented into equal sized chunks for independent inner queues
- Threads request references to inner queues
- Operations guarded by mutex
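A minimal sketch of this outer layer, under stated assumptions: the names Chunk, OuterQueue, acquire_empty and release are hypothetical and not the QQ API, the memory comes from a plain std::vector rather than transparent hugepages, and the calls return false instead of blocking when nothing is available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical handle to one equal-sized segment of the big block.
struct Chunk {
    std::uint8_t* base;    // start of the segment inside the big block
    std::size_t capacity;  // bytes available in the segment
};

class OuterQueue {
public:
    // One contiguous allocation, segmented into equal-sized chunks.
    OuterQueue(std::size_t num_chunks, std::size_t chunk_bytes)
        : block_(num_chunks * chunk_bytes) {
        assert(num_chunks > 0 && chunk_bytes > 0);
        for (std::size_t i = 0; i < num_chunks; ++i) {
            empty_.push_back(Chunk{block_.data() + i * chunk_bytes, chunk_bytes});
        }
    }

    // Inserter side: take an empty chunk to fill.
    bool acquire_empty(Chunk& out) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (empty_.empty()) return false;
        out = empty_.front();
        empty_.pop_front();
        return true;
    }

    // Inserter side: hand over a filled chunk, preserving arrival order.
    void enqueue(Chunk c) {
        std::lock_guard<std::mutex> guard(mutex_);
        filled_.push_back(c);
    }

    // Dumper side: take the oldest filled chunk.
    bool dequeue(Chunk& out) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (filled_.empty()) return false;
        out = filled_.front();
        filled_.pop_front();
        return true;
    }

    // Dumper side: give a processed chunk back for reuse.
    void release(Chunk c) {
        std::lock_guard<std::mutex> guard(mutex_);
        empty_.push_back(c);
    }

private:
    std::vector<std::uint8_t> block_;  // the single packet-data block
    std::mutex mutex_;                 // guards both chunk lists
    std::deque<Chunk> empty_;
    std::deque<Chunk> filled_;
};
```

The mutex is taken once per chunk rather than once per packet, so its cost is spread over every packet a chunk holds.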
Approach/Implementation - Inner Queue
[Diagram: an inner queue with mutex, timer, base, head and an offset vector (0x0000, 0x0060, ...); the offsets point to Packet 0, Packet 1, ..., each stored with its packet header in the granted memory block]
Figure 3: Inner queue structure
- Separate mutex, "lock once, store often"
- In place storage of packets in granted block
- Hold timer to prevent monopolization
- API similar to std::vector
- Direct access possible, management via wrapper preferred
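The in-place storage with an offset vector can be sketched as follows. The class and method names are hypothetical, and the mutex and hold timer of the real inner queue are left out to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch; locking and the hold timer are omitted.
class InnerQueue {
public:
    InnerQueue(std::uint8_t* base, std::size_t capacity)
        : base_(base), capacity_(capacity) {
        assert(base != nullptr);
    }

    // Appends a packet directly into the granted block.
    // Returns false when the remaining space is too small.
    bool push_back(const std::uint8_t* pkt, std::size_t len) {
        assert(pkt != nullptr);
        if (len > capacity_ - head_) return false;
        std::memcpy(base_ + head_, pkt, len);
        offsets_.push_back(head_);  // remember where this packet starts
        head_ += len;
        return true;
    }

    std::size_t size() const { return offsets_.size(); }

    // Random access to packet i, similar to std::vector::operator[].
    const std::uint8_t* operator[](std::size_t i) const {
        return base_ + offsets_[i];
    }

    // Stored length of packet i, derived from neighbouring offsets.
    std::size_t length(std::size_t i) const {
        std::size_t end = (i + 1 < offsets_.size()) ? offsets_[i + 1] : head_;
        return end - offsets_[i];
    }

    // Resets the queue so the block can be refilled.
    void clear() {
        offsets_.clear();
        head_ = 0;
    }

private:
    std::uint8_t* base_;                // start of the granted memory block
    std::size_t capacity_;              // size of the granted block in bytes
    std::size_t head_ = 0;              // first free byte, relative to base_
    std::vector<std::size_t> offsets_;  // start offset of every stored packet
};
```

Storing a 96-byte record first places the next packet at offset 0x0060, matching the offsets shown in Figure 3.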
Approach/Implementation - Packet Header
- Prepended to the actual packet
- POD struct, compatible with FFIs
- Stores metadata: timestamp, vlan, packet length
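A header of this kind might look like the sketch below. The field names, widths, order and the timestamp unit are assumptions; the slides only state that timestamp, VLAN and packet length are stored in a POD struct placed in front of the packet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical layout; the actual QQ header may differ.
struct PacketHeader {
    std::uint64_t timestamp;  // capture time (unit is an assumption)
    std::uint16_t vlan;       // VLAN tag
    std::uint16_t length;     // bytes of packet data following the header
};

// Plain-old-data properties that make the struct usable through an FFI.
static_assert(std::is_standard_layout<PacketHeader>::value,
              "header must have a C-compatible layout");
static_assert(std::is_trivially_copyable<PacketHeader>::value,
              "header must be copyable with memcpy");

// Writes header and payload back to back at dst; returns the bytes written.
inline std::size_t store_packet(std::uint8_t* dst, std::uint64_t timestamp,
                                std::uint16_t vlan, const std::uint8_t* data,
                                std::uint16_t len) {
    assert(dst != nullptr && data != nullptr);
    PacketHeader header{timestamp, vlan, len};
    std::memcpy(dst, &header, sizeof header);
    std::memcpy(dst + sizeof header, data, len);
    return sizeof header + len;
}
```

Because the struct has no constructors, virtual functions or pointers, a LuaJIT FFI declaration with the same fields could read it directly from the stored bytes.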
Evaluation - Throughput
[Plot: rate [Gbit/s] over number of inputs (1 to 5) for packet sizes 64 B, 128 B, 256 B, 512 B, 1024 B and 1514 B, with STREAM as reference]
Figure 4: Theoretical QQ throughput compared to STREAM
Evaluation - Influence of Inner Queue Capacity
[Plot: throughput [Gbit/s] over inner queue capacity [MiB], 2 to 128 MiB]
Figure 5: Single inner queue throughput with 64 byte packets
- Less impact than expected
- Caching benefits offset by large amount of data
Evaluation - Packet Delay
$\mathrm{delay}_{\min} = \min\left(\frac{\mathrm{size}_{\mathrm{Storage}}}{r_{\max}},\ \mathrm{timeout}\right) \cdot \#\mathrm{Inputs}$
Storage size              2 MiB      4 MiB      32 MiB
10 Gbit/s & 2 inserters   3.335 ms   6.711 ms   53.69 ms
40 Gbit/s & 4 inserters   1.667 ms   3.335 ms   26.84 ms
Table 2: Minimum packet traversal time at various configurations
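Reading the minimum delay as min(storage size / r_max, timeout) multiplied by the number of inputs, the table entries can be recomputed with a small helper. The function name and parameter units are hypothetical, and the sketch assumes that r_max is the aggregate input rate and that MiB means 2^20 bytes.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: minimum packet traversal time in milliseconds.
// storage_bytes: capacity of one inner queue
// rate_bit_per_s: aggregate input rate r_max
// timeout_ms: hold timeout of the inner queue
// inputs: number of inserters
inline double min_delay_ms(double storage_bytes, double rate_bit_per_s,
                           double timeout_ms, int inputs) {
    assert(rate_bit_per_s > 0.0 && inputs > 0);
    // Time needed to fill one storage block at the given rate.
    double fill_ms = storage_bytes * 8.0 / rate_bit_per_s * 1e3;
    // The hold timer caps the time a block is kept before it is handed on.
    return std::min(fill_ms, timeout_ms) * inputs;
}
```

With a timeout large enough not to matter, min_delay_ms(4 * 1048576.0, 10e9, 1e9, 2) gives about 6.711 ms, in line with the 4 MiB entry for 10 Gbit/s and 2 inserters.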
Conclusion/Future work
- >130 Gbit/s and >170 Mpps
- 2.3 to 100 times higher throughput than existing data structures
- Delay still within low millisecond range
- Future work: Replacement of handwritten filter functions with JIT-compiled filter expressions (BPF, tcpdump -ni)
Bibliography I
[1] A. Brown. I3: Maximizing packet capture performance. Wireshark Developer and User Conference, 2014.
function dumpTask(qq, path)
    local pcap_writer = pcapLib.create_pcap_writer(path)
    while mg.running() do
        -- Signaling with analyzer omitted
        local storage = qq:dequeue() -- QQ API call
        for i = 0, tonumber(storage:size()) - 1 do -- loop over al
            local pkt = storage:getPacket(i)
            local udpPkt = pktLib.getUdpPacket(pkt) -- MoonGens p
            if udpPkt.ip4:getSrc() == ip_match then
                pcap_writer:store(pkt:getTimestamp(), pkt:getLength(), pkt.data)
            end
        end
        storage:release() -- explicit
    end
end