MCC: a Predictable and Scalable Massive Client Load Generator


SLIDE 1

MCC: a Predictable and Scalable Massive Client Load Generator

Wenqing Wu*, Xiao Feng*, Wenli Zhang, Mingyu Chen

¹State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences

²University of Chinese Academy of Sciences

³Peng Cheng Laboratory, Shenzhen, China

Nov 14-16, 2019, Denver, Colorado, USA
SLIDE 2

Outline

I. Background/Motivation
II. Design
III. Evaluation
IV. Conclusion
SLIDE 3

Background

Network load to simulate

[Figure: load generator and DUT (Device under Test), with data-center and IoT traffic sources; paths (1) stateless and (2) stateful]

(1) Stateless
  • No connections
  • Uni-directional
  • Supported: bandwidth test, packet loss test

(2) Stateful
  • TCP connection oriented
  • Bi-directional
  • Supported: bandwidth test, packet loss test, latency analysis, NAT/firewall test, application interaction

Load to simulate: ✓ Stateful ✓ Tremendous in scale ✓ Various distributions

SLIDE 4

State of the Art

Hardware-based load generators
  • Ixia (specialized device, > $100,000)
  • OSNT (NetFPGA, $2,000)
  • Spirent (specialized device)

✓ Precise and accurate
✓ High throughput
✘ Stateless
✘ Inflexible (fixed firmware, not open source)
✘ Expensive

SLIDE 5

State of the Art

Software-based load generators
  • trafgen: packet generator based on AF_PACKET
  • MoonGen: packet generator fueled by DPDK
  • D-ITG: distributed framework; a multiplatform tool
  • Iperf: bandwidth and jitter analysis; stateful ✓; closed-loop, no concurrency support
  • wrk: HTTP benchmarking tool; stateful ✓; limited throughput, poor scalability

✓ Cheap (run on cheap commodity hardware)
✓ Some are flexible (open source, easy to add new features)
✘ Stateful generators cannot achieve microsecond precision
✘ Stateful generators show poor scalability

SLIDE 6

Imprecision in stateful load generation

Why are software-based stateful load generators less precise?

✓ Scheduling policies in the operating system (OS)
  • E.g., sleep() does not guarantee one-microsecond precision.

✓ POSIX blocking I/O interface
  • E.g., select() introduces a deviation of at least 20 µs to timed tasks.

✓ Heavy kernel protocol stack
  • Variable stack processing time degrades the precision of timed operations running in the application layer.
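The first point is easy to observe on any machine: request a 1 µs sleep and measure how long the call really takes. This minimal sketch uses Python's time.sleep() and time.perf_counter_ns(); the helper name measure_sleep_overshoot is hypothetical, and the numbers depend on the kernel, timer configuration, and CPU load, so it is an illustration, not a benchmark.

```python
import statistics
import time


def measure_sleep_overshoot(request_us: float = 1.0, trials: int = 200) -> list[int]:
    """Return, per trial, how many nanoseconds sleep() exceeded the request."""
    request_ns = int(request_us * 1000)
    overshoots = []
    for _ in range(trials):
        start = time.perf_counter_ns()
        time.sleep(request_us / 1_000_000)  # ask the OS scheduler for ~1 µs
        elapsed = time.perf_counter_ns() - start
        overshoots.append(elapsed - request_ns)
    return overshoots


if __name__ == "__main__":
    samples = measure_sleep_overshoot()
    print(f"median overshoot: {statistics.median(samples) / 1000:.1f} µs")
    print(f"max overshoot:    {max(samples) / 1000:.1f} µs")
```

Exact values vary from system to system; the point is that the overshoot is set by the OS scheduler and timer subsystem, not by the caller.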

SLIDE 7

MCC (Massive Client Connections)

Design goal: a predictable and scalable massive client load generator

✓ Stateful
  • TCP connection oriented

✓ Predictable load generation
  • A two-stage timer mechanism achieves one-microsecond precision

✓ High throughput while performing flow-level simulation
  • Lightweight protocol stack to simplify packet processing
  • High-speed I/O with a kernel-bypass technique

✓ Scalability in multi-core systems
  • Shared-nothing multi-threaded model
SLIDE 8

MCC Overview

[Figure: MCC architecture with an application layer and an I/O layer. Components: Connection Manager, Load Modeling, Reactor Model, Data Modeling, App Timer, I/O Timer, TCP/IP Stack, Device Driver; control logic and the data path are drawn separately]

Load generation model of MCC
✓ Load modeling
  • Adjust packet size and add timestamps
  • Adjust packet inter-departure time
✓ Controllable I/O
  • Reactor NIO pattern
  • User-level stack
  • Customized I/O layer
✓ Two-stage timer mechanism
  • Control packet I/O precisely
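The "adjust packet size, add timestamp" step can be pictured as building fixed-size payloads whose first bytes carry the intended send time. The slides do not give MCC's payload layout. This sketch assumes one: an 8-byte big-endian nanosecond timestamp followed by zero padding. build_payload and read_timestamp are hypothetical names.

```python
import struct

TS_FORMAT = "!Q"  # assumed layout: 8-byte unsigned integer, network byte order
TS_SIZE = struct.calcsize(TS_FORMAT)


def build_payload(size: int, send_time_ns: int) -> bytes:
    """Build a payload of exactly `size` bytes that starts with a send timestamp."""
    if size < TS_SIZE:
        raise ValueError(f"payload must be at least {TS_SIZE} bytes")
    header = struct.pack(TS_FORMAT, send_time_ns)
    return header + bytes(size - TS_SIZE)  # zero padding up to the target size


def read_timestamp(payload: bytes) -> int:
    """Recover the timestamp written by build_payload()."""
    (send_time_ns,) = struct.unpack_from(TS_FORMAT, payload)
    return send_time_ns


pkt = build_payload(64, 1_000_000)
print(len(pkt), read_timestamp(pkt))  # 64 1000000
```

With the timestamp inside the payload, a later stage can decide when to transmit, and the receiver can compute one-way latency.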

SLIDE 9

User-level load generator

MCC runs fully in user space
✓ The full path of load generation can be optimized
  • An I/O thread is added to achieve precise control
  • Stack thread (user-level protocol stack)

[Figure: kernel-based solutions vs. MCC's approach. Kernel-based: a LoadGen application over the socket API and kernel stack (stateful) or packet I/O (stateless), both above the kernel device driver. MCC, fully in user space: LoadGen thread, socket-like API, stack thread, customized I/O layer with an I/O thread, and the DPDK I/O library over the device driver]

SLIDE 10

Two-stage timer

The two-stage timer helps to generate predictable load.

[Figure: data path across the LoadGen thread (connection manager, App Timer, send()), the stack thread (encapsulation), the register queue (RQ), and the I/O thread (I/O Timer, xmit() via the DPDK I/O library); data carries a timestamp; steps 1-5 are marked]

  • App Timer
    ✓ Flow-level
    ✓ Controls the send operation
  • I/O Timer
    ✓ Packet-level
    ✓ Controls the xmit operation according to the timestamp added in the application layer
  • Steps
    1. Initialize the load generation thread
    2. Initialize the I/O thread
    3. Send data according to the App Timer
    4. Encapsulate the packet and enqueue it to the RQ
    5. Xmit the packet according to the I/O Timer
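This sketch simulates steps 3-5 on a virtual clock to show why the second stage matters. The App Timer fires at flow-level deadlines and stamps each message with a transmit time. The stack adds a variable processing delay. The I/O Timer holds each packet until its stamp. The fixed margin between the App Timer deadline and the stamped transmit time (SLACK_US), the delay range, and the names simulate and intervals are assumptions. The slides do not say how MCC sizes this margin.

```python
import random

SLACK_US = 50      # assumed margin that absorbs stack processing time
INTERVAL_US = 100  # constant inter-departure time chosen by the App Timer


def simulate(n_packets: int, seed: int = 1) -> tuple[list[int], list[int]]:
    """Return transmit times without and with an I/O-timer stage."""
    rng = random.Random(seed)
    one_stage, two_stage = [], []
    for k in range(n_packets):
        app_deadline = k * INTERVAL_US           # step 3: App Timer triggers send()
        stamp = app_deadline + SLACK_US          # timestamp added in the application layer
        stack_delay = rng.randint(5, 40)         # step 4: variable encapsulation cost
        ready_at = app_deadline + stack_delay    # packet reaches the register queue
        one_stage.append(ready_at)               # no I/O timer: send as soon as ready
        two_stage.append(max(ready_at, stamp))   # step 5: I/O Timer waits for the stamp
    return one_stage, two_stage


def intervals(times: list[int]) -> list[int]:
    return [b - a for a, b in zip(times, times[1:])]


single, double = simulate(1000)
print(sorted(set(intervals(double))))  # [100]
print(min(intervals(single)), max(intervals(single)))
```

As long as the stack delay stays below the margin, the transmit intervals are exactly the intended ones. The single-stage intervals jitter by the difference between consecutive stack delays. The cost is a fixed extra latency equal to the margin.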

SLIDE 11

App timer

Polling-based application-layer timer: 3-microsecond precision
✓ Avoids the imprecision resulting from OS scheduling policies
  • Structures
    • Timed task: 2-tuple (timestamp, function)
    • Task set: RB-tree (fast insertion/deletion)
  • Steps
    I. Register (add a timed task)
    II. Trigger timeout (polling check)
    III. Execute (run the callback function)

[Figure: user APIs call sched(ts, func) to add a timed task; tasks are stored in an RB-tree task set; a non-blocking event loop performs the polling check and executes expired tasks]
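The three steps (register, polling check, execute) map onto a small event-loop timer. This sketch is illustrative only. The class name PollingTimer is hypothetical, and it swaps MCC's RB-tree for Python's heapq binary heap. The heap also gives O(log n) insertion and constant-time access to the earliest deadline, but unlike an RB-tree it cannot efficiently delete an arbitrary task. The clock is injectable so the logic can be tested deterministically.

```python
import heapq
import itertools
import time
from typing import Callable


class PollingTimer:
    """Register (ts, func) tasks, poll without blocking, run the expired ones."""

    def __init__(self, clock: Callable[[], int] = time.perf_counter_ns):
        self._clock = clock
        self._tasks: list[tuple[int, int, Callable[[], None]]] = []
        self._seq = itertools.count()  # tie-breaker: equal timestamps never compare funcs

    def sched(self, ts_ns: int, func: Callable[[], None]) -> None:
        """Step I: register a timed task as a (timestamp, function) pair."""
        heapq.heappush(self._tasks, (ts_ns, next(self._seq), func))

    def poll(self) -> int:
        """Steps II-III: check for expired tasks and execute them; never blocks."""
        now = self._clock()
        fired = 0
        while self._tasks and self._tasks[0][0] <= now:
            _, _, func = heapq.heappop(self._tasks)
            func()
            fired += 1
        return fired

    def pending(self) -> int:
        return len(self._tasks)


if __name__ == "__main__":
    timer = PollingTimer()
    start = time.perf_counter_ns()
    timer.sched(start + 3_000, lambda: print("fired about 3 µs after start"))
    while timer.pending():  # busy-poll instead of sleeping
        timer.poll()
```

Because poll() never blocks, it can share a loop with other non-blocking work. The price is a core that spins continuously, the usual trade-off of polling designs.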
SLIDE 12

I/O timer

Novel I/O timer added in the customized I/O layer: 1-microsecond precision
✓ Eliminates the timing error introduced by the protocol stack (tens of microseconds)
  • I/O layer (I/O thread)
    • Dedicated I/O thread
    • Lockless register queue (RQ)
  • Steps
    I. Insert the encapsulated packet into the RQ
    II. Poll the RQ
    III. Send the packet out at the specified time

[Figure: stack thread with its sndbuf ring queue, the RQ, and the I/O thread running the IO Timer, sending through DPDK to the NIC]
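A register queue between one stack thread and one I/O thread is naturally a single-producer/single-consumer ring. This sketch models the index discipline that lets such rings avoid locks: the producer writes only tail, the consumer writes only head. A C version would also need proper memory ordering, and Python code like this only models that discipline. SpscRing and io_timer_poll are hypothetical names. The head-of-queue check assumes timestamps in each queue are non-decreasing, which is an assumption, not something the slide states.

```python
from typing import Callable, Optional


class SpscRing:
    """Single-producer/single-consumer ring holding (timestamp_ns, packet) pairs."""

    def __init__(self, capacity: int):
        self._buf: list[Optional[tuple[int, bytes]]] = [None] * capacity
        self._cap = capacity
        self.head = 0  # next slot to read; written only by the consumer
        self.tail = 0  # next slot to write; written only by the producer

    def enqueue(self, ts_ns: int, pkt: bytes) -> bool:
        """Step I (producer side): returns False when the ring is full."""
        if self.tail - self.head == self._cap:
            return False
        self._buf[self.tail % self._cap] = (ts_ns, pkt)
        self.tail += 1
        return True

    def peek(self) -> Optional[tuple[int, bytes]]:
        return self._buf[self.head % self._cap] if self.head != self.tail else None

    def dequeue(self) -> tuple[int, bytes]:
        item = self._buf[self.head % self._cap]
        self._buf[self.head % self._cap] = None
        self.head += 1
        return item


def io_timer_poll(rq: SpscRing, now_ns: int, xmit: Callable[[bytes], None]) -> int:
    """Steps II-III: one polling pass that transmits every packet whose time has come."""
    sent = 0
    while (item := rq.peek()) is not None and item[0] <= now_ns:
        rq.dequeue()
        xmit(item[1])
        sent += 1
    return sent
```

Python integers never overflow, so head and tail grow without bound here. A C ring would typically use a power-of-two capacity and mask the indices instead.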

SLIDE 13

Scalable Multi-threaded Model

Shared-nothing multi-threaded model
  • Per-core threading
    ✓ Per-core listening queue
    ✓ Per-core file descriptors
    ✓ Run-to-completion model
  • Core affinity
  • RSS (Receive-Side Scaling)
  • Distributor thread
    ✓ Parse and distribute the configuration
    ✓ Aggregate statistics

[Figure: scalability. A distributor thread plus one worker per core (core0, core1, core2), each running a LoadGen thread and a stack thread over DPDK; the NIC uses RSS to spread traffic across queue0, queue1, queue2]
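RSS is what makes per-core ownership work on the receive side. The NIC hashes each packet's flow tuple, so all packets of one flow reach the same queue and therefore the same core. This sketch implements the Toeplitz hash that RSS-capable NICs use for TCP/IPv4 (source address, destination address, source port, destination port). The 40-byte key is a hypothetical placeholder. Real NICs feed the low-order hash bits into an indirection table, and the modulo here is a simplification. toeplitz_hash and rss_queue are hypothetical names.

```python
import ipaddress
import struct


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """For every set input bit, XOR in the 32-bit key window starting at that bit."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    if key_bits < len(data) * 8 + 32:
        raise ValueError("key too short for input")
    result = 0
    for i, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):
                shift = key_bits - 32 - (i * 8 + bit)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def rss_queue(key: bytes, src_ip: str, dst_ip: str,
              src_port: int, dst_port: int, n_queues: int) -> int:
    """Map a TCP/IPv4 flow to a receive queue (simplified: hash modulo queue count)."""
    data = (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!HH", src_port, dst_port))
    return toeplitz_hash(key, data) % n_queues


KEY = bytes(range(1, 41))  # hypothetical 40-byte key
for port in (10001, 10002, 10003, 10004):
    print(port, "-> queue", rss_queue(KEY, "10.0.0.1", "10.0.0.2", port, 80, 3))
```

A client-side generator must also ensure that server replies, whose tuple is reversed, land on the core that opened the connection. Some user-level stacks, mTCP for example, choose client source ports so that the reply tuple hashes to the local queue. The slide does not say whether MCC does the same.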

SLIDE 14

Scalable Multi-threaded Model

Message passing model between the Distributor and Workers
✓ Avoids synchronization primitives (locks, memory barriers, atomic operations, …)
✓ Easy to extend to multiple Workers

[Figure: the Distributor pushes into a task queue that the Worker pulls from; the Worker pushes into a result queue (statistics, state) that the Distributor pulls from]
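The push/pull pattern can be modeled with one task queue and one result queue per worker, so workers never touch each other's state. In this sketch Python's queue.Queue stands in for the lock-free rings a C implementation would use; queue.Queue itself uses locks internally. The Task and Result fields, and the function names, are hypothetical.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Task:
    worker_id: int
    n_connections: int  # hypothetical configuration field


@dataclass
class Result:
    worker_id: int
    connections_done: int  # hypothetical per-worker statistic


def worker(tasks: queue.Queue, results: queue.Queue) -> None:
    task = tasks.get()                          # pull: receive configuration
    done = 0                                    # private state, never shared
    for _ in range(task.n_connections):
        done += 1                               # stand-in for driving one connection
    results.put(Result(task.worker_id, done))   # push: report statistics


def distributor(n_workers: int, total_connections: int) -> int:
    task_qs = [queue.Queue() for _ in range(n_workers)]
    result_qs = [queue.Queue() for _ in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(tq, rq))
               for tq, rq in zip(task_qs, result_qs)]
    for t in threads:
        t.start()
    share, extra = divmod(total_connections, n_workers)
    for wid, tq in enumerate(task_qs):          # push: split the configuration
        tq.put(Task(wid, share + (1 if wid < extra else 0)))
    total = sum(rq.get().connections_done for rq in result_qs)  # pull: aggregate
    for t in threads:
        t.join()
    return total


print(distributor(n_workers=3, total_connections=10))  # 10
```

Adding a worker only means adding one more pair of queues. No existing worker's code or data changes.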

SLIDE 15

Evaluation

SLIDE 16

Experimental Setup

✓ Machines (client/server):
  • CPU: Intel(R) Xeon(R) CPU E5645, 12 cores @ 2.40 GHz
  • Memory: 96 GB
  • NIC: Intel 82599ES 10 Gb Ethernet adapter

✓ Experiments:
  • Microbenchmarks:
    • Precision with different timers
    • Predictable load generation
    • Throughput and scalability
SLIDE 17

Precision with different timers

Precision of the load generator when generating constant bit rate (CBR) traffic

✓ Timers in MCC bring one-microsecond precision.
  • Baseline: sleep() + Linux kernel stack

(“Linux” in the table indicates “Linux kernel stack”)

SLIDE 18

Predictable Load Generation

✓ MCC is able to generate traffic following the analytical Poisson distribution.

[Figure: Poisson traffic generation. Probability density vs. packet interval (µs); curves compare App timer vs. analytic and App timer + IO timer vs. analytic]
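A Poisson arrival process with rate λ has independent, exponentially distributed inter-arrival times with density f(x) = λe^(-λx) and mean 1/λ. A generator therefore only needs to draw each packet interval from that distribution and pass the resulting send times to its timers. This sketch uses random.expovariate. The rate, sample count, seed, and function names are arbitrary choices for illustration, not values from the evaluation.

```python
import math
import random


def poisson_send_times(rate_per_us: float, n: int, seed: int = 42) -> list[float]:
    """Absolute send times (µs) of a Poisson process with the given rate."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate_per_us)  # exponential inter-departure time
        times.append(t)
    return times


def analytic_density(x_us: float, rate_per_us: float) -> float:
    """Exponential pdf lambda * exp(-lambda * x) that the intervals should follow."""
    return rate_per_us * math.exp(-rate_per_us * x_us)


rate = 1 / 20  # hypothetical: one packet every 20 µs on average
times = poisson_send_times(rate, 100_000)
gaps = [b - a for a, b in zip([0.0] + times, times)]
print(f"mean interval {sum(gaps) / len(gaps):.2f} µs (expected {1 / rate:.0f} µs)")
```

Comparing a histogram of the measured gaps with analytic_density is one way to produce a plot like the one on this slide. Only the timers determine whether the intended gaps survive transmission.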

SLIDE 19

Throughput & Scalability

✓ 2.4x ~ 3.5x higher throughput than wrk (a kernel-based HTTP benchmarking tool)
✓ Almost linear scalability before reaching line rate

[Figure: HTTP load generation (file size: 64 B); requests per second (×10^5) vs. number of CPU cores, wrk vs. MCC]

Cores   1     2     4      6      8      10
wrk     1.5   2.4   4.75   6.3    7.65   8.6
MCC     3.56  6.96  13.87  19.27  28.57  30.83

SLIDE 20

Conclusion

MCC: a predictable and scalable massive client load generator

✓ Predictable load generation
  • Two-stage timer mechanism

✓ High throughput
  • Lightweight user-level stack
  • Kernel-bypass

✓ Scalability in multi-core systems
  • Shared-nothing multi-threaded model
SLIDE 21
Please feel free to email us at wuwenqing@ict.ac.cn if you have any questions.