GEN: A GPU-Accelerated Elastic Framework for NFV
Zhilong Zheng, Jun Bi, Chen Sun, Heng Yu, Hongxin Hu, Zili Meng, Shuhe Wang, Kai Gao, Jianping Wu
Network Function Virtualization (NFV)
- Dedicated hardware devices: VPN, Firewall, Monitor, Load Balancer
- NFV: the same functions run in VMs on commodity hardware devices, using virtualization techniques
- Benefits: low cost, service provisioning flexibility, elasticity control
- Service Function Chain (SFC)
CPU-based NFV
- NFV platforms: OpenNetVM (HotMiddlebox’16), NetBricks (OSDI’16), NFP (SIGCOMM’17), Metron (NSDI’18), …
- NFV infrastructure: general-purpose multi-core servers
- Problems
– Low performance with poor prospects for improvement
– Coarse-grained scaling
Problems of CPU-based NFV
- Low performance with poor prospects for improvement
– Hard to achieve high performance (e.g., 40~100 Gbps) for a wide range of NFs
– The slowdown/end of Moore’s Law
– IPSec (AES & SHA1): 2.6 ~ 7.7 Gbps
– NIDS (Aho-Corasick): 4.2 ~ 10.4 Gbps
– CPU: E5-2650 v2 (8 cores, 2.6 GHz). Source: Go, Younghwan, et al. "APUNet: Revitalizing GPU as Packet Processing Accelerator." NSDI. 2017.
- Coarse-grained scaling
[Figure: coarse-grained scaling example. Labels: 10 Mpps, 9 Mpps, 10 Mpps, 11 Mpps, 1 Mpps; 1 CPU core, 2 CPU cores]
GPU as An Accelerator for NFV
- Existing work
– Router (PacketShader, SIGCOMM’10)
– SSL proxy (SSLShader, NSDI’11)
– NIDS (Kargus, CCS’12)
– IPSec (NBA, EuroSys’15)
– NFV framework (G-NET, NSDI’18)
- Benefits of GPU
– Massive processing cores: high-performance NFs
– Fine-grained computing units: fine-grained resource
- Potential (problems unsolved)
– High-performance SFCs
– Fine-grained fast scaling
GEN exploits GPU to support high-performance SFCs with fine-grained scaling
GEN Framework Overview
[Figure: framework overview. Labels: SFC Configurations; Infrastructure Orchestrator; per server: Server CPU with an SFC Manager and SFC Controllers, plus multiple GPUs]
Infrastructure Design
[Figure: infrastructure design. Labels, grouped by component:]
- NIC: 10 / 40 / 100 GbE ports (Rx, Tx)
- CPU (user space): SFC Manager, Chain Classifier, SFC Controller #1 … SFC Controller #n, Output Queuing
- SFC Controller: SFC Starter, Adaptive Batcher, Packet Dropper, Packet Forwarder
- GPU (2k~3k physical cores): SFC Agent #1 … SFC Agent #n; Chain #1 (NF #1, NF #2, NF #3) … Chain #n (NF #1 … NF #m_n); Global Memory
- Goals: ① high performance, ② elastic scaling
Problem #1: SFC Model Selection
- Pipelining: packets → NF1 (Instance #1) → NF2 (Instance #2)
- Run-to-completion (RTC): packets → NF1 → NF2, both in Instance #1
SFC Model Selection: Pipelining
- Two potential ways to support pipelining in GPU
– Sequenced invocations: one CPU worker per NF (Worker-NF1, Worker-NF2); a packet batch from the packet buffer passes through the labeled steps 2. kernel invocation (NF1), 3. reading, 4. synchronization, 5. next NF, 6. kernel invocation (NF2), 7. reading, 8. synchronization, then out
– Persistent kernels: one CPU worker (Worker-SFC); NF1 and NF2 run as persistent kernels on the GPU; a packet batch from the packet buffer passes through the labeled steps 2. reading (NF1), 3. next NF, 4. reading (NF2), then out
- Drawbacks
– Sequenced invocations: high overhead from frequent kernel invocations (~5 us per invocation)
– Persistent kernels: hard and costly scaling
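The ~5 us per-invocation cost adds up because sequenced invocations need one kernel invocation per NF for every packet batch. A back-of-the-envelope sketch in Python; the chain length and batch rate below are illustrative assumptions, not figures from the slides:

```python
KERNEL_LAUNCH_US = 5.0  # per-invocation overhead quoted on the slide (~5 us)


def launch_overhead_us(invocations_per_batch: int, num_batches: int) -> float:
    """Time spent purely on kernel invocations, in microseconds."""
    return KERNEL_LAUNCH_US * invocations_per_batch * num_batches


chain_length = 3          # assumption: a 3-NF chain
batches_per_sec = 10_000  # assumption: hypothetical batch rate

# Sequenced invocations: one kernel invocation per NF per batch.
sequenced = launch_overhead_us(chain_length, batches_per_sec)
# For comparison: a design that needs only one invocation per batch.
single = launch_overhead_us(1, batches_per_sec)

print(f"sequenced: {sequenced / 1e6:.2f} s of launch overhead per second")
print(f"single:    {single / 1e6:.2f} s of launch overhead per second")
```

Under these assumed numbers the overhead grows linearly with chain length, so longer chains pay proportionally more.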
SFC Model Selection: RTC
- NFs are integrated into a specific SFC Agent via kernel fusion
- The SFC Agent (in GPU) is launched by the SFC Starter (in CPU)
[Figure: RTC model. A CPU worker (Worker-SFC) hands a packet batch from the packet buffer to the GPU, where NF1 and NF2 run in the same kernel; labeled steps: 2. kernel invocation, 4. synchronization; then out]
- RTC-based model
– Easier scaling (not persistent)
– Fewer kernel invocations (once per SFC)
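The idea of fusing a chain into one agent that is invoked once per batch can be illustrated with a CPU-side analogy in plain Python. This is not GPU code and not GEN's implementation; `fuse`, `firewall`, and `decrement_ttl` are hypothetical names introduced only for this sketch:

```python
from typing import Callable, Dict, List, Optional

Packet = Dict[str, int]
NF = Callable[[Packet], Optional[Packet]]  # an NF returns None to drop a packet


def fuse(nfs: List[NF]) -> Callable[[List[Packet]], List[Packet]]:
    """Fuse a chain of NFs into one 'agent' that is called once per batch."""
    def agent(batch: List[Packet]) -> List[Packet]:
        out = []
        for pkt in batch:        # on a GPU, packets would be spread over threads
            for nf in nfs:       # run to completion: every NF in chain order
                pkt = nf(pkt)
                if pkt is None:  # dropped by this NF, skip the rest of the chain
                    break
            else:
                out.append(pkt)
        return out
    return agent


def firewall(pkt: Packet) -> Optional[Packet]:
    """Hypothetical NF: drop traffic to port 23."""
    return None if pkt["dport"] == 23 else pkt


def decrement_ttl(pkt: Packet) -> Optional[Packet]:
    """Hypothetical NF: forward with TTL reduced by one."""
    return {**pkt, "ttl": pkt["ttl"] - 1}


agent = fuse([firewall, decrement_ttl])
result = agent([{"dport": 80, "ttl": 64}, {"dport": 23, "ttl": 64}])
print(result)  # [{'dport': 80, 'ttl': 63}]
```

The whole chain runs behind a single call per batch, which mirrors the "once per SFC" invocation count on the slide.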
Problem #2: Elastic Scaling
- Avoid state management caused by scale out / in
– Intuition: Use scale up / down to avoid state management
- Avoid monitoring NF load for scaling
– Avoid deciding when to scale
– Avoid deciding to what extent an NF should be scaled
– Avoid considering how to quickly carry out NF scaling
- Adaptive Batcher
Elastic Scaling – Adaptive Batcher
- Design of the adaptive batcher
– Keeping the buffer occupancy at a low level
– Scaling GPU resource provisioning up/down
- Workflow: the adaptive batcher fetches all packets in the buffer, batches them, and sends them to the GPU; a heavier load produces more mini-batches in the GPU (scaling up), a lighter load fewer (scaling down)
- Result: load monitoring avoidance, state management avoidance
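The batching step can be sketched as follows, assuming a fixed mini-batch size. The slides do not give the size or the function names, so `MINI_BATCH_SIZE` and `adaptive_batch` are hypothetical. The point of the sketch is that the number of mini-batches follows the buffer occupancy without any explicit load monitor:

```python
from collections import deque
from typing import Deque, List

MINI_BATCH_SIZE = 4  # assumption: hypothetical mini-batch size


def adaptive_batch(buffer: Deque[int],
                   mini_batch_size: int = MINI_BATCH_SIZE) -> List[List[int]]:
    """Drain every packet currently in the buffer and split into mini-batches."""
    # Fetch all packets, which keeps buffer occupancy low.
    packets = [buffer.popleft() for _ in range(len(buffer))]
    # More packets fetched means more mini-batches to run on the GPU.
    return [packets[i:i + mini_batch_size]
            for i in range(0, len(packets), mini_batch_size)]


light_load = deque(range(3))   # 3 packets waiting
heavy_load = deque(range(10))  # 10 packets waiting

print(len(adaptive_batch(light_load)))  # 1 mini-batch
print(len(adaptive_batch(heavy_load)))  # 3 mini-batches
```

Because the batcher only reacts to what is in the buffer, no per-NF load measurement or scaling decision is needed in this sketch.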
Preliminary Evaluation
- Hardware
– CPU: Two Intel Xeon E5-2650 v4 (10 physical cores)
– GPU: NVIDIA TITAN Xp
– NIC: Two Intel X520 (40 Gbps in total)
- NFs & SFCs
– IPV4Router (1k entries) → NIDS (3k rules) → IPSec (SHA1 & AES-128-CBC)
- Software
– DPDK 17.11 for network I/O
– CUDA 8.0 for GPU programming
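To relate the 40 Gbps of NIC capacity to packet rates, the following Python sketch converts link speed and frame size to millions of packets per second. It assumes the packet size counts the whole Ethernet frame and adds the standard 20 bytes of per-frame wire overhead; the helper name `line_rate_mpps` is hypothetical:

```python
# Per-frame overhead on the wire: preamble and start-of-frame delimiter (8 bytes)
# plus the inter-frame gap (12 bytes).
ETHERNET_OVERHEAD_BYTES = 20


def line_rate_mpps(link_gbps: float, frame_bytes: int) -> float:
    """Maximum packet rate in Mpps for a given link speed and frame size."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_on_wire / 1e6


for size in (64, 128, 256, 512, 1024):
    print(f"{size:>5} B: {line_rate_mpps(40, size):6.2f} Mpps")
```

At 40 Gbps, 64-byte frames correspond to about 59.52 Mpps, which is why small packets are the hardest case for any packet-processing framework.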
Performance of RTC vs. Pipelining
[Figure: throughput (Gbps) vs. packet size (64, 128, 256, 512, 1024, 1600 bytes), Pipeline vs. RTC]
[Figure: CDF of latency (ms), Pipeline vs. RTC]
Annotations on the figures: 29.2% and 28.1%; 33.7%; 95th
Fast Elastic Scaling
[Figure: throughput (Gbps) over a timeline (seconds)]
- Fast convergence (< 100 ms)
Conclusion and Future Work
- GEN: a GPU-accelerated elastic framework for NFV
– High-performance SFC
– Elastic scaling
- Future work
– More SFC performance enhancement in GPU
– Coordination between CPU and GPU
– Impact of dynamic traffic load