GEN: A GPU-Accelerated Elastic Framework for NFV (PowerPoint PPT presentation by Zhilong Zheng, Jun Bi, et al.)



SLIDE 1

GEN: A GPU-Accelerated Elastic Framework for NFV

Zhilong Zheng Jun Bi Chen Sun Heng Yu Hongxin Hu Zili Meng Shuhe Wang Kai Gao Jianping Wu

SLIDE 2

Network Function Virtualization (NFV)

[Figure: dedicated appliances (VPN, Firewall, Monitor, Load Balancer) replaced by NFs running in VMs on commodity hardware]

  • NFV: run network functions on commodity hardware devices via virtualization techniques, instead of on dedicated devices
  • Benefits: low cost, service provisioning flexibility, elasticity control
  • NFs are composed into a Service Function Chain (SFC)

SLIDE 3

CPU-based NFV

  • NFV platforms run on general-purpose multi-core servers as the NFV infrastructure

– OpenNetVM (HotMiddlebox’16), NetBricks (OSDI’16), NFP (SIGCOMM’17), Metron (NSDI’18)

  • Problems

– Low performance with negative improvement expectation
– Coarse-grained scaling

SLIDE 4

Problems of CPU-based NFV

  • Low performance with negative improvement expectation

– Hard to achieve high performance (e.g., 40~100 Gbps) for a wide range of NFs
– e.g., on an E5-2650 v2 (8 cores, 2.6 GHz): IPSec (AES & SHA1) reaches only 2.6~7.7 Gbps, and NIDS (Aho-Corasick) only 4.2~10.4 Gbps [Go, Younghwan, et al. "APUNet: Revitalizing GPU as Packet Processing Accelerator." NSDI 2017]
– The slowdown and looming end of Moore's Law

  • Coarse-grained scaling

– [Figure: NF throughput scales in whole-core steps, e.g., ~1 Mpps on 1 CPU core vs. ~9~11 Mpps on 2 CPU cores]

SLIDE 5

GPU as An Accelerator for NFV

5

  • Existing work

– Router (PacketShader, SIGCOMM’10) – SSL proxy (SSLShader, NSDI’11) – NIDS (Kargus, CCS’12) – IPSec (NBA, EuroSys’15) – NFV framework (G-NET, NSDI’18)

  • Benefits of GPU

– Massive processing cores – Fine-grained computing units

High-performance SFCs Fine-grained fast Scaling High-performance NFs Fine-grained resource

Potential Problems Unsolved

SLIDE 6

GEN exploits GPU to support high-performance SFCs with fine-grained scaling

SLIDE 7

SFC Configurations

GEN Framework Overview

[Figure: GEN framework overview. An Infrastructure Orchestrator pushes SFC Configurations to servers; on each server, a CPU-side SFC Manager spawns one SFC Controller per chain, and the SFC Controllers drive the NFs running on the server's GPUs]

SLIDE 8

Infrastructure Design

[Figure: infrastructure design. Packets enter through 10/40/100 GbE NIC ports (Rx/Tx). In CPU user space, a Chain Classifier dispatches each packet to its chain's SFC Controller (#1 ... #n); each controller contains an SFC Starter, an Adaptive Batcher, a Packet Dropper, and a Packet Forwarder. On the GPU (2k~3k physical cores), one SFC Agent per chain runs that chain's NFs (Chain #i NF #1 ... NF #m_i) over global memory, with output queuing back toward Tx. Design goals: high performance and elastic scaling]
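The classifier-to-controller handoff above can be sketched in plain Python. The rule format, packet fields, and chain names are hypothetical stand-ins; the slide does not specify GEN's actual classification keys.

```python
# Sketch of the CPU-side Chain Classifier: map each received packet to
# the input queue of the SFC Controller responsible for its chain.
from collections import defaultdict

# Hypothetical classification rules: destination port -> chain id
RULES = {443: "chain-1", 80: "chain-1", 5004: "chain-2"}

def classify(pkt, queues):
    """Dispatch pkt to its chain's SFC Controller input queue."""
    chain = RULES.get(pkt["dport"], "chain-default")
    queues[chain].append(pkt)   # hand off to that chain's controller
    return chain

queues = defaultdict(list)
classify({"dport": 443, "payload": b"..."}, queues)
classify({"dport": 5004, "payload": b"..."}, queues)
# queues now holds one packet for chain-1 and one for chain-2
```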

SLIDE 9

Problem #1: SFC Model Selection

[Figure: two candidate SFC execution models. Pipelining: NF1 and NF2 run as separate instances (Instance #1, Instance #2), and packets flow between them. Run-to-completion (RTC): a single instance (Instance #1) runs NF1 and then NF2 on each packet]

SLIDE 10

SFC Model Selection: Pipelining

  • Two potential ways to support pipelining in GPU

  • Sequenced invocations: for each NF in the chain, the CPU-side worker invokes that NF's kernel, the kernel reads the packet batch from the packet buffer, and the worker synchronizes before moving on to the next NF

– High overhead from frequent kernel invocations (~5 µs per invocation)

  • Persistent kernels: each NF runs as a long-lived (persistent) kernel; the CPU-side SFC worker feeds packet batches, and each NF reads its input and hands packets to the next NF through the packet buffer

– Hard and costly scaling
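The ~5 µs-per-invocation cost compounds with chain length. A back-of-envelope comparison against a single launch per batch (batch size and chain length here are illustrative assumptions, not numbers from the talk):

```python
# Back-of-envelope: launch overhead of sequenced invocations (one kernel
# launch per NF per batch) vs. a single launch per batch for the chain.
INVOKE_US = 5.0      # ~5 us per kernel invocation (from the slide)
CHAIN_LEN = 3        # e.g., a 3-NF chain (illustrative)
BATCH_PKTS = 1024    # hypothetical packets per batch

seq_overhead_us = INVOKE_US * CHAIN_LEN   # sequenced: launch per NF
rtc_overhead_us = INVOKE_US               # one launch per chain

# Amortized per-packet launch overhead, in nanoseconds
seq_ns_per_pkt = seq_overhead_us * 1000 / BATCH_PKTS
rtc_ns_per_pkt = rtc_overhead_us * 1000 / BATCH_PKTS
print(f"sequenced: {seq_ns_per_pkt:.1f} ns/pkt, single launch: {rtc_ns_per_pkt:.1f} ns/pkt")
```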

SLIDE 11

SFC Model Selection: RTC

  • NFs are integrated into a chain-specific SFC Agent kernel via kernel fusion
  • The SFC Agent (in the GPU) is launched by the SFC Starter (in the CPU): per packet batch, one kernel invocation and one synchronization cover the whole chain
  • Compared with pipelining, the RTC-based model gives

– Easier scaling (kernels are not persistent)
– Fewer kernel invocations (once per SFC instead of once per NF)
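The fusion idea can be modeled in plain Python: compose the chain's NFs into one per-batch function that runs each packet to completion. The NF bodies below are hypothetical stand-ins; in GEN the fused chain runs as a single CUDA kernel.

```python
# Kernel fusion, conceptually: one "launch" applies the whole chain.
def nf_decrement_ttl(pkt):       # hypothetical NF #1
    pkt["ttl"] -= 1
    return pkt

def nf_mark_inspected(pkt):      # hypothetical NF #2
    pkt["inspected"] = True
    return pkt

def fuse(*nfs):
    """Return one SFC Agent: a single invocation covers the chain."""
    def sfc_agent(batch):
        out = []
        for pkt in batch:        # run-to-completion per packet
            for nf in nfs:
                pkt = nf(pkt)
            out.append(pkt)
        return out
    return sfc_agent

agent = fuse(nf_decrement_ttl, nf_mark_inspected)
result = agent([{"ttl": 64}, {"ttl": 32}])   # one invocation per batch
```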

SLIDE 12

Problem #2: Elastic Scaling

  • Avoid state management caused by scale out / in

– Intuition: Use scale up / down to avoid state management

  • Avoid monitoring NF load for scaling

– Avoid deciding when to scale
– Avoid deciding to what extent an NF should be scaled
– Avoid considering how to quickly carry out NF scaling

  • Solution: the Adaptive Batcher

SLIDE 13

Elastic Scaling – Adaptive Batcher

  • Design of the adaptive batcher

– Fetch all packets currently in the buffer into the next batch, keeping buffer occupancy at a low level
– Split the batch into mini-batches in the GPU: more buffered packets mean more mini-batches, scaling GPU resource provisioning up/down automatically

  • Benefits: load monitoring avoidance, state management avoidance
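The batcher's drain-everything policy can be sketched as follows. The mini-batch width and queue contents are illustrative assumptions, not GEN's actual parameters:

```python
from collections import deque

MINI_BATCH = 32   # hypothetical mini-batch width (not GEN's real value)

def adaptive_batch(buffer: deque):
    """Drain ALL buffered packets and split them into mini-batches.

    Heavier load -> more buffered packets -> more mini-batches -> more
    GPU computing units used by the next launch. Provisioning scales
    up/down implicitly: no load monitoring, no NF state migration.
    """
    packets = [buffer.popleft() for _ in range(len(buffer))]
    return [packets[i:i + MINI_BATCH]
            for i in range(0, len(packets), MINI_BATCH)]

buf = deque(range(100))          # 100 packets queued since the last launch
batches = adaptive_batch(buf)    # -> 4 mini-batches (32 + 32 + 32 + 4)
```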

SLIDE 14

Preliminary Evaluation

  • Hardware

– CPU: two Intel Xeon E5-2650 v4 (10 physical cores)
– GPU: NVIDIA TITAN Xp
– NIC: two Intel X520 (40 Gbps in total)

  • NFs & SFCs

– IPV4Router (1k entries) → NIDS (3k rules) → IPSec (SHA1 & AES-128-CBC)

  • Software

– DPDK 17.11 for network I/O
– CUDA 8.0 for GPU programming

SLIDE 15

Performance of RTC vs. Pipelining

[Figures: throughput (Gbps) vs. packet size (64~1600 B) and latency CDF (ms), Pipeline vs. RTC. RTC outperforms the pipelining model, with throughput improvements of 29.2% and 28.1% and a 33.7% improvement at the 95th percentile of latency]

SLIDE 16

Fast Elastic Scaling

[Figure: throughput (Gbps) over a ~35 s timeline during elastic scaling; GEN converges fast (< 100 ms)]

SLIDE 17

Conclusion and Future Work

  • GEN: a GPU-accelerated elastic framework for NFV

– High-performance SFCs
– Elastic scaling

  • Future work

– Further SFC performance enhancements in GPU
– Coordination between CPU and GPU
– Impact of dynamic traffic load

SLIDE 18

Thank You

http://netarchlab.tsinghua.edu.cn