

SLIDE 1

Chair of Network Architectures and Services Department of Informatics Technical University of Munich

Revisiting Benchmarking Methodology for Interconnect Devices

Daniel Raumer, Sebastian Gallenmüller, Florian Wohlfart, Paul Emmerich, Patrick Werneck, and Georg Carle

July 16, 2016

SLIDE 2

Contents

  • Case study: benchmarking software routers
  • Flaws of benchmarks
  • Latency metrics
  • Latency under load
  • Traffic pattern
  • Omitted tests
  • Reproducibility
  • Conclusion


SLIDE 3

Why revisit the benchmarking state of the art?

  • Numerous standards, recommendations, and best practices
  • Well-known benchmarking definition: RFC 2544
  • Various extensions
  • Divergence of benchmarks
  • New class of devices:
      • High-speed network I/O frameworks
      • Virtual switching
      • Many-core CPU architectures


SLIDE 4

Common metrics

  • Throughput: highest rate that the device under test (DuT) can serve without loss (see the search sketch after this list).

  • Back-to-back frame burst size: longest burst of frames forwarded without loss.
  • Frame loss rate: percentage of dropped frames under a given load.
  • Latency: average duration a packet stays within the DuT.
  • ... extended metrics, e.g., FIB-dependent performance
  • ... additional SHOULDs, rarely measured
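The throughput metric is typically determined by searching over the offered rate, commonly as a binary search. A minimal sketch of that idea, where run_trial() is a hypothetical hook into the traffic generator; for a self-contained demo it simulates a DuT with an assumed capacity of 1.48 Mpps:

    # Sketch of an RFC 2544-style throughput search: find the highest
    # offered rate at which a fixed-duration trial passes without loss.
    def run_trial(rate_mpps: float, duration_s: int = 60) -> int:
        """Offer rate_mpps for duration_s seconds; return frames lost."""
        SIMULATED_CAPACITY = 1.48          # assumption for this demo only
        return 0 if rate_mpps <= SIMULATED_CAPACITY else 1

    def throughput(line_rate_mpps: float, precision: float = 0.01) -> float:
        lo, hi = 0.0, line_rate_mpps
        while hi - lo > precision:
            mid = (lo + hi) / 2
            if run_trial(mid) == 0:        # zero loss: rate is sustainable
                lo = mid                   # search upward
            else:
                hi = mid                   # loss observed: search downward
        return lo                          # highest zero-loss rate found

    print(throughput(14.88))  # 14.88 Mpps: 10 GbE line rate at 64-byte frames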


SLIDE 5

Case study: RFC 2544 benchmarks

[Figure: test setup, RFC 2544 test suite connected to the DuT via two bidirectional links]

Three different DuTs

  • Linux router
  • FreeBSD router
  • MikroTik router


SLIDE 6

Flaws of benchmarks: selected examples


SLIDE 7

Meaningful latency measurements: case study

[Figure: latency histogram, latency [µs] vs. probability [%]]

  • FreeBSD, 64-byte packets
  • Average does not reflect the long-tailed distribution


SLIDE 8

Meaningful latency measurements: 2nd example

[Figure: latency histogram, latency [µs] vs. probability [%]]

  • Pica8 switch tested in [IFIP NETWORKING 16]
  • Different processing paths through a device
  • Bimodal distribution
  • Average latency is misleading

→ Extensive reports: histograms for visualization
→ Short reports: percentiles (25th, 50th, 75th, 95th, 99th, and 99.9th)
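A minimal sketch of such a short report, assuming per-packet latency samples in microseconds and using NumPy; the synthetic bimodal data mimics the two processing paths above:

    # Summarize latency samples with the percentiles recommended above
    # instead of a bare average.
    import numpy as np

    def latency_summary(samples_us):
        q = [25, 50, 75, 95, 99, 99.9]
        return dict(zip(q, np.percentile(samples_us, q)))

    # Synthetic bimodal distribution: a fast and a slow processing path.
    fast = np.random.normal(1.5, 0.1, 90_000)   # ~1.5 µs mode
    slow = np.random.normal(3.5, 0.2, 10_000)   # ~3.5 µs mode
    samples = np.concatenate([fast, slow])
    print("mean:", samples.mean())              # one number hides both modes
    print(latency_summary(samples))             # percentiles expose the tail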


SLIDE 9

Latency under load

[Figure: latency [µs] vs. offered load [Mpps], CBR median with 25th/75th percentiles]

  • Open vSwitch (Linux NAPI & ixgbe) [IMC15]
  • Latency at maximum throughput is not the worst case

→ Measurements at different loads (10, 20, ..., 100% of max. throughput)
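A sketch of such a load sweep, assuming the maximum throughput was determined beforehand; measure_latency() is a hypothetical hook into the traffic generator:

    # Record the latency distribution at 10%, 20%, ..., 100% of the
    # previously determined maximum throughput.
    def measure_latency(rate_mpps: float, duration_s: int = 30) -> list[float]:
        """Offer rate_mpps for duration_s seconds; return latencies in µs."""
        raise NotImplementedError   # wire up to your packet generator

    def load_sweep(max_throughput_mpps: float) -> dict[int, list[float]]:
        return {pct: measure_latency(max_throughput_mpps * pct / 100)
                for pct in range(10, 101, 10)}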


SLIDE 10

Traffic pattern & latency

[Figure: latency [µs] vs. offered load [Mpps], CBR and Poisson, median with 25th/75th percentiles]

  • Open vSwitch (NAPI + ixgbe) [IMC15]
  • Different behavior for different traffic patterns

→ Tests with different traffic patterns
→ Poisson process to approximate real-world traffic
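For a Poisson process the gaps between packets are exponentially distributed with mean 1/rate, in contrast to the fixed gap of CBR; a minimal sketch (function names are illustrative):

    import random

    def cbr_gaps(rate_pps: float, n: int) -> list[float]:
        return [1.0 / rate_pps] * n               # CBR: constant gap

    def poisson_gaps(rate_pps: float, n: int) -> list[float]:
        # Exponential inter-arrival times with mean 1/rate yield a
        # Poisson packet arrival process with bursts and pauses.
        return [random.expovariate(rate_pps) for _ in range(n)]

    # At 1 Mpps both patterns average a 1 µs gap, but only the Poisson
    # stream exercises the DuT's buffering.
    print(sum(poisson_gaps(1e6, 1000)) / 1000)    # ≈ 1e-6 s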


SLIDE 11

Omitted tests

[Figure: forwarding rate [Mpps] and cache misses per packet vs. number of IP addresses (log scale), with L1/L2/L3 cache levels marked]

  • CPU caches affect the performance

→ Additional tests for certain device classes
→ Functionality-dependent tests
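For routers, one such functionality-dependent test varies the number of distinct destination addresses so that FIB lookups eventually spill out of the CPU caches, as in the plot above. A sketch of the address generation (base prefix and helper name are assumptions):

    import ipaddress
    import random

    def random_dst(n_addresses: int, base: str = "10.0.0.0") -> str:
        """Pick a destination uniformly from n_addresses consecutive IPs.

        Small n keeps the DuT's lookup structures cache-resident;
        large n forces L1/L2/L3 misses on every lookup.
        """
        base_int = int(ipaddress.IPv4Address(base))
        return str(ipaddress.IPv4Address(base_int + random.randrange(n_addresses)))

    print(random_dst(10**6))    # e.g. "10.11.22.33"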


SLIDE 12

Reproducibility of configurations

  • Manual device configuration is error-prone
  • Device configurations are hard to reproduce

→ Reproducible configuration of the DuT via scripts
→ Configuration scripts executed by the benchmarking tool
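A minimal sketch of the second point, assuming the benchmarking tool has SSH access to the DuT; host name and script path are placeholders:

    import subprocess

    def configure_dut(host: str, script: str = "configs/linux-router.sh") -> None:
        """Push a versioned setup script to the DuT and execute it,
        so every result can be reproduced from the script alone."""
        subprocess.run(["scp", script, f"{host}:/tmp/setup.sh"], check=True)
        subprocess.run(["ssh", host, "sh", "/tmp/setup.sh"], check=True)

    configure_dut("root@dut.example.org")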


SLIDE 13

Conclusion

  • Novel class of devices requires additional tests
  • There are arguments for reconsidering best practice:
  • Average latency may be misleading

→ Histograms / percentiles

  • Latency is load dependent

→ Measure 10, 20, ..., 100% of max. throughput

  • CBR traffic is an unrealistic test pattern

→ Poisson process

  • Device specific functionality

→ Perform device specific benchmarks

  • Manual configuration is error-prone

→ Automatic configuration by benchmark tool


SLIDE 14

Novelty: RFC 2544 test suite on commodity hardware

  • MoonGen [IMC15] is a fast software packet generator
  • Hardware-assisted latency measurements (misusing PTP support)
  • Precise software rate control and traffic patterns
  • http://net.in.tum.de/pub/router-benchmarking/
  • RFC 2544 benchmark reports for Linux, FreeBSD, and MikroTik
  • Early version of the MoonGen RFC 2544 module
