Comparison of Queuing Data Structures for Traffic Analysers

SLIDE 1

Chair of Network Architectures and Services Department of Informatics Technical University of Munich

Comparison of Queuing Data Structures for Traffic Analysers

Maximilian Pudelko

August 3, 2016

SLIDE 2

Contents

  • Motivation
  • Existing Data Structures and Found Problems
  • Analysis
  • Implementation of QQ
  • Evaluation of QQ
  • Conclusion/Future work

SLIDE 3

Motivation

Level        Speed
NIC          40 Gbit/s
RAM (copy)   225 Gbit/s
tcpdump      up to 5.9 Gbit/s [1]
SSD          3.6 Gbit/s (400 MB/s)

Table 1: Common throughput values

  • The speed gap between NICs and storage is widening
  • Capturing becomes more complex (special drivers, filtering, ...)
  • Exploiting multi-core architectures is a must
  • An efficient data structure is required

SLIDE 4

Tested Data Structures

  • Cameron: ReaderWriterQueue and ConcurrentQueue
  • Folly: ProducerConsumerQueue and MPMCQueue
  • DPDK: rte_ring + rte_mbuf


Features:

  • Lock-free (atomic, CAS) in critical path
  • Optimized for low latency
  • Generic queues (RWQ, CQ, PCQ, MPMC): a separate packet structure is required
  • Specialized for packet processing (DPDK)

SLIDE 6

Found Problems - Throughput

[Bar chart: packet rate [Mpps] and data rate [Gbit/s] per data structure (RWQ, PCQ, CQ, MPMC, rte_ring), individual allocation vs. message buffer]

Figure 1: Throughput comparison with 64 byte packets, 4 inserters

Per-packet allocation with malloc(packet_len):

  • Synchronized across all threads, therefore slow
  • Memory reuse is not possible
  • Leads to heap fragmentation (TLB pollution)
  • Specialized allocators exist, but are out of scope


Message buffers:

  • Fixed-size chunks allocated in bulk
  • Chained together to accommodate larger packets
  • Not implemented by general-purpose queues
  • Space overhead from partially used buffers
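
The chaining and the resulting space overhead can be illustrated with a little arithmetic. This is only a sketch: the buffer size of 128 bytes and the function names are assumptions made for the example, not values from the talk.

```python
import math

CHUNK_SIZE = 128  # bytes per message buffer; hypothetical value for illustration


def chunks_needed(packet_len: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of chained fixed-size buffers needed to hold one packet."""
    return max(1, math.ceil(packet_len / chunk_size))


def wasted_bytes(packet_len: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Unused space left in the last, partially filled buffer of the chain."""
    return chunks_needed(packet_len, chunk_size) * chunk_size - packet_len


for length in (64, 128, 1514):
    print(length, chunks_needed(length), wasted_bytes(length))
```

With these assumed sizes, a 64 byte packet occupies one buffer and leaves half of it unused, while a 1514 byte packet is spread over a chain of 12 buffers.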

SLIDE 8

Found Problems - DPDK Limitations

  • Memory pool capacity limit: 2^27 ≈ 134 M slots
  • Only 2.68 s of traffic at 50 Mpps
  • Space overhead of > 50 B per packet
  • Not compatible with regular threads
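
The capacity figures above follow from a simple division, assuming one packet per pool slot. A minimal check (the function name is made up for this example):

```python
POOL_SLOTS = 2 ** 27       # memory pool capacity limit stated above
PACKET_RATE = 50_000_000   # 50 Mpps


def buffered_seconds(slots: int, rate_pps: int) -> float:
    """Seconds of traffic that fit if every packet occupies one slot."""
    return slots / rate_pps


print(POOL_SLOTS)                                          # 134217728
print(round(buffered_seconds(POOL_SLOTS, PACKET_RATE), 2)) # 2.68
```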

SLIDE 9

Found Problems - Missing Special Features

  • Random access to packets
  • Roles: Peek access for analysers
  • Time-coherent storage, even distribution at low rates
  • Option to dump to persistent storage

SLIDE 10

Analysis

A suitable data structure must ...

  1. ... handle 10 Gbit/s on a single input and 40 Gbit/s combined.
  2. ... store a large number of packets.
  3. ... provide the special features.
  4. ... scale on multi-core architectures.

SLIDE 11

Approach/Implementation - Overview

[Diagram: three NICs, each with an Inserter → enqueue() → QQ → dequeue() → Dumper → pcap Writer → HDD; Analyser with peek() on QQ and signal(filter, <range>)]

Figure 2: QQ overview

SLIDE 12

Approach/Implementation - Outer Queue

  • 2-Layer model: Queue of Queues
  • One contiguous memory block for packet data
  • Allocated at initialization via transparent hugepages
  • Segmented into equal-sized chunks for independent inner queues
  • Threads request references to inner queues
  • Operations guarded by a mutex
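
The two-layer idea can be modelled in a few lines. This is a toy sketch, not the QQ code: the class and the method `acquire_empty` are hypothetical, hugepage allocation is not modelled, and a plain `bytearray` stands in for the packet memory.

```python
import threading
from collections import deque


class OuterQueue:
    """Toy model: one big block, cut into equal chunks handed out by reference."""

    def __init__(self, num_chunks: int, chunk_size: int):
        # One contiguous block, allocated once at initialization.
        self._block = bytearray(num_chunks * chunk_size)
        view = memoryview(self._block)
        # Each chunk is a zero-copy view that an inner queue could own.
        self._free = deque(
            view[i * chunk_size:(i + 1) * chunk_size] for i in range(num_chunks)
        )
        self._full = deque()
        self._lock = threading.Lock()  # guards both deques

    def acquire_empty(self):
        """Inserter side: get a chunk to fill, or None if none is free."""
        with self._lock:
            return self._free.popleft() if self._free else None

    def enqueue(self, chunk):
        """Inserter side: hand a filled chunk over to the consumers."""
        with self._lock:
            self._full.append(chunk)

    def dequeue(self):
        """Dumper side: take the oldest filled chunk, or None if empty."""
        with self._lock:
            return self._full.popleft() if self._full else None

    def release(self, chunk):
        """Dumper side: return a processed chunk for reuse."""
        with self._lock:
            self._free.append(chunk)
```

Because the mutex is taken once per chunk rather than once per packet, its cost is spread over every packet stored in that chunk.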

SLIDE 13

Approach/Implementation - Inner Queue

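[Diagram: inner queue with the fields mutex, base, head and timer, plus an offset vector (0x0000, 0x0060, ...) pointing into the granted memory block, where Packet 0, Packet 1, ... are stored, each preceded by its packet header]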

Figure 3: Inner queue structure

  • Separate mutex, "lock once, store often"
  • In-place storage of packets in the granted block
  • Hold timer to prevent monopolization
  • API similar to std::vector
  • Direct access possible, management via wrapper preferred
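
A minimal sketch of such an inner queue, assuming an offset vector that records where each packet starts in the granted block. The names `try_store`, `size` and `get_packet` are hypothetical, and the hold timer is left out.

```python
import threading


class InnerQueue:
    """Toy inner queue: packets stored back to back in a granted block."""

    def __init__(self, block):
        self.lock = threading.Lock()  # separate mutex, taken once per batch
        self._block = block           # granted memory block (bytearray or memoryview)
        self._offsets = []            # offset vector: start of each stored packet
        self._head = 0                # next free byte in the block

    def try_store(self, packet: bytes) -> bool:
        """Copy a packet into the block; return False if it does not fit."""
        end = self._head + len(packet)
        if end > len(self._block):
            return False
        self._block[self._head:end] = packet
        self._offsets.append(self._head)
        self._head = end
        return True

    def size(self) -> int:
        return len(self._offsets)

    def get_packet(self, i: int) -> bytes:
        """Random access by index, similar to indexing a vector."""
        start = self._offsets[i]
        end = self._offsets[i + 1] if i + 1 < len(self._offsets) else self._head
        return bytes(self._block[start:end])


queue = InnerQueue(bytearray(16))
with queue.lock:  # lock once ...
    for pkt in (b"abcd", b"efghij"):
        queue.try_store(pkt)  # ... store often
print(queue.size(), queue.get_packet(1))
```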

SLIDE 14

Approach/Implementation - Packet Header

  • Prepended to the actual packet
  • POD struct, compatible with FFIs
  • Stores metadata: timestamp, VLAN, packet length
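
As an illustration of a fixed-layout header in front of the packet bytes, the following sketch packs the three metadata fields with Python's `struct` module. The field widths, their order and the byte order are assumptions for this example; the slides do not specify the actual layout.

```python
import struct

# Hypothetical layout: 64-bit timestamp, 16-bit VLAN tag, 16-bit packet length,
# little-endian without padding. Widths and order are assumptions.
HEADER = struct.Struct("<QHH")


def prepend_header(timestamp: int, vlan: int, payload: bytes) -> bytes:
    """Return header + payload as one contiguous record."""
    return HEADER.pack(timestamp, vlan, len(payload)) + payload


def split_record(record: bytes):
    """Recover the metadata and the payload from a stored record."""
    timestamp, vlan, length = HEADER.unpack_from(record)
    payload = record[HEADER.size:HEADER.size + length]
    return timestamp, vlan, payload


record = prepend_header(123, 42, b"\x00" * 64)
print(HEADER.size, len(record))
```

A plain fixed-size layout like this can be read from other languages through an FFI without any conversion step, which is the point of keeping the header a POD struct.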

SLIDE 15

Evaluation - Throughput

[Line chart: rate [Gbit/s] vs. number of inputs (1 to 5) for packet sizes of 64 B, 128 B, 256 B, 512 B, 1024 B and 1514 B, with STREAM as reference]

Figure 4: Theoretical QQ throughput compared to STREAM

SLIDE 16

Evaluation - Influence of Inner Queue Capacity

[Chart: throughput [Gbit/s] vs. inner queue capacity [MiB], 2 to 128 MiB]

Figure 5: Single inner queue throughput with 64 byte packets

  • Less impact than expected
  • Caching benefits offset by large amount of data

SLIDE 17

Evaluation - Packet Delay

$\mathrm{delay}_{\min} = \min\left(\frac{\mathrm{size}_{\mathrm{Storage}}}{r_{\max}},\ \mathrm{timeout}\right) \cdot \#\mathrm{Inputs}$

Storage size              2 MiB      4 MiB      32 MiB
10 Gbit/s & 2 inserters   3.335 ms   6.711 ms   53.69 ms
40 Gbit/s & 4 inserters   1.667 ms   3.335 ms   26.84 ms

Table 2: Minimum packet traversal time at various configurations
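
The delay formula can be evaluated directly. The sketch below assumes that r_max is the combined input rate in bit/s, that 1 MiB is 2^20 bytes, and that the timeout is not the limiting term. Under these assumptions it reproduces the 4 MiB and 32 MiB columns of Table 2.

```python
MIB = 1024 * 1024


def min_delay_ms(storage_bytes, rate_bits_per_s, inputs, timeout_s=float("inf")):
    """min(size / r_max, timeout) * #Inputs, returned in milliseconds."""
    fill_time_s = storage_bytes * 8 / rate_bits_per_s
    return min(fill_time_s, timeout_s) * inputs * 1000


print(round(min_delay_ms(4 * MIB, 10e9, 2), 3))   # 6.711
print(round(min_delay_ms(32 * MIB, 10e9, 2), 2))  # 53.69
print(round(min_delay_ms(32 * MIB, 40e9, 4), 2))  # 26.84
```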

SLIDE 18

Conclusion/Future work

  • Throughput of > 130 Gbit/s and > 170 Mpps
  • 2.3 to 100 times more than existing data structures
  • Delay still within the low millisecond range
  • Future work: replace handwritten filter functions with JIT-compiled filter expressions (BPF, tcpdump -ni)

SLIDE 19

Bibliography I

[1] A. Brown. I3: Maximizing packet capture performance. Wireshark Developer and User Conference, 2014.

SLIDE 20

function dumpTask(qq, path)
    local pcap_writer = pcapLib.create_pcap_writer(path)
    while mg.running() do
        -- Signaling with analyzer omitted
        local storage = qq:dequeue() -- QQ API call
        for i = 0, tonumber(storage:size()) - 1 do -- loop over al
            local pkt = storage:getPacket(i)
            local udpPkt = pktLib.getUdpPacket(pkt) -- MoonGens p
            if udpPkt.ip4:getSrc() == ip_match then
                pcap_writer:store(pkt:getTimestamp(), pkt:getLength(), pkt.data)
            end
        end
        storage:release() -- explicit release at end of loop
    end
end

Listing 1: Dump task
