The Case for a Flexible Low-Level Backend for Software Data Planes


  1. The Case for a Flexible Low-Level Backend for Software Data Planes • Sean Choi¹, Xiang Long², Muhammad Shahbaz³, Skip Booth⁴, Andy Keep⁴, John Marshall⁴, Changhoon Kim⁵

  2. Why software data planes? • VM hypervisors • Cost savings with commodity general-purpose processing units, where desired throughput < ~100 Gbps • Prototyping protocol design • Prototyping hardware DP architecture [Figure: VMs attached through virtual ports to a software switch, which connects to a physical port]

  3. Software Switch PISCES [1] [1] PISCES. ACM SIGCOMM 2016.

  4. Software switch DSLs • High-level, close to protocol • Abstract forwarding model

  5. Nice for programmers… • Familiar and logical model in mind when programming, e.g. match/action pipelines • Can specify packet data without worrying about implementation • Portable code across platforms • …

  6. Not so nice for compilers • Abstract forwarding model not designed for e.g. CPU-based architectures • Limited in expressiveness • Insulated from underlying low-level APIs • Result: Difficult to realize full performance potential of underlying hardware

  7. Hypothesis: If software switches exposed more low-level characteristics to the data plane compiler, improvements in performance and features would be possible

  8. Our contribution • Identify a software switch that can be programmed at a low level w.r.t. the hardware architecture • Create a compiler targeting that switch so that it supports high-level data plane programs • Compare performance

  9. Target Switch: Vector Packet Processor (VPP) • Open sourced by Cisco • Can be programmed at low-level • Part of the FD.io project

  10. Vector Packet Processing (VPP) Platform • Modular packet processing node graph abstraction [Figure: VPP node graph with nodes dpdk-input, ip4-input, ip6-input, llc-input, ip6-lookup, ip6-rewrite-transmit, dpdk-output]

  11. Vector Packet Processing (VPP) Platform • Each node can execute almost arbitrary C code on vectors of packets [Figure: VPP node graph with nodes dpdk-input, ip4-input, ip6-input, llc-input, ip6-lookup, ip6-rewrite-transmit, dpdk-output]

  12. Vector Packet Processing (VPP) Platform • Code is divided into nodes to optimize for i-cache and d-cache locality [Figure: VPP node graph with nodes dpdk-input, ip4-input, ip6-input, llc-input, ip6-lookup, ip6-rewrite-transmit, dpdk-output]

  13. Vector Packet Processing (VPP) Platform • Extensible packet processing through first-class plugins [Figure: a packet vector enters at dpdk-input; standard VPP nodes (ip4-input, ip6-input, llc-input, ip6-lookup, ip6-rewrite-transmit, dpdk-output) sit alongside a custom plugin (Custom-input, Node 1, Node 2, …, Node i, Node j, Node k)]
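
The node-graph model on the slides above can be illustrated with a toy scheduler. This is a minimal sketch in Python, not VPP's C API: `run_graph`, the packet dictionaries, the `ethertype` field, and the classification rule are assumptions made for illustration. Only the node names come from the slides, and `ip4-input` and `llc-input` are treated as terminal because the slides do not show what follows them.

```python
from collections import defaultdict, deque


def dpdk_input(vector):
    # Classify every packet in the vector by a hypothetical 'ethertype' field.
    next_nodes = defaultdict(list)
    for pkt in vector:
        nxt = {"ipv4": "ip4-input", "ipv6": "ip6-input"}.get(pkt["ethertype"], "llc-input")
        next_nodes[nxt].append(pkt)
    return next_nodes


def forward_all(next_name):
    # Build a node that hands its whole input vector to a single successor.
    def node(vector):
        return {next_name: list(vector)}
    return node


def terminal(vector):
    # Nodes with no successor in this sketch.
    return {}


NODES = {
    "dpdk-input": dpdk_input,
    "ip4-input": terminal,   # successors not shown on the slides
    "llc-input": terminal,   # successors not shown on the slides
    "ip6-input": forward_all("ip6-lookup"),
    "ip6-lookup": forward_all("ip6-rewrite-transmit"),
    "ip6-rewrite-transmit": forward_all("dpdk-output"),
    "dpdk-output": terminal,
}


def run_graph(start, vector):
    """Run each node once over a whole vector before moving to the next node."""
    pending = deque([(start, vector)])
    trace = []
    while pending:
        name, vec = pending.popleft()
        trace.append((name, len(vec)))
        for nxt, sub_vector in NODES[name](vec).items():
            if sub_vector:
                pending.append((nxt, sub_vector))
    return trace
```

Calling `run_graph("dpdk-input", packets)` returns the list of `(node, vector size)` visits. Each node's code runs once per vector rather than once per packet, which is the property the slides credit for instruction-cache locality.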

  14. Vector Packet Processing (VPP) Platform • Proven performance [1] • Multiple MPPS from a single x86_64 core • 1 core: 9 MPPS IPv4 in+out forwarding • 2 cores: 13.4 MPPS IPv4 in+out forwarding • 4 cores: 20.0 MPPS IPv4 in+out forwarding • > 100 Gbps full-duplex on a single physical host • Outperforms Open vSwitch in various scenarios [1] https://wiki.fd.io/view/VPP/What_is_VPP%3F

  15. Vector Packet Processing (VPP) Platform • Disadvantage: large burden on the programmer • Requires knowledge from different fields: protocols, operating systems, processor architecture, C compiler optimization, … • Some magic required for good performance

  16. Some Magic Required • Manually fetch two packets • Consequence of being low-level

  17. Ease of programmability is sacrificed for performance at the low level • Can a high-level DSL compiler help? • Programmable Vector Packet Processor (PVPP)

  18. PVPP structure [Figure: compilation pipeline. P4 Program -> Reference P4 Compiler (P4C): Front-end Compiler, Mid-end Compiler, BMv2 Back-end Compiler -> BMv2 JSON -> JSON-VPP Compiler (with Cog Templates) -> C Files in a VPP Plugin Directory -> VPP Plugin] • Standard compiler optimizations are also applied, e.g. redundant table removal
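
The JSON-to-C step can be pictured as template expansion. The following is only a sketch under stated assumptions, not PVPP's implementation: it uses Python's standard `string.Template` rather than Cog, the JSON layout (`tables`, `apply_order`, `keys`, `actions`) is a simplified stand-in and not the real BMv2 JSON schema, the emitted C stub is a hypothetical shape, and "redundant" is taken here to mean a table that the control flow never applies.

```python
import json
from string import Template

# Hypothetical C stub emitted per table; real VPP node functions look different.
NODE_TEMPLATE = Template("""\
/* generated from table '$table' */
static int ${ident}_node_fn(void)
{
    /* match on: $keys */
    /* actions: $actions */
    return 0;
}
""")

# Simplified stand-in for compiler output; NOT the real BMv2 JSON schema.
EXAMPLE_JSON = """
{
  "apply_order": ["ipv4_match", "dst_mac"],
  "tables": [
    {"name": "ipv4_match", "keys": ["ip.dstAddr"], "actions": ["set_nhop", "drop"]},
    {"name": "dst_mac", "keys": ["ip.dstAddr"], "actions": ["set_dmac", "drop"]},
    {"name": "unused_table", "keys": ["ip.srcAddr"], "actions": ["drop"]}
  ]
}
"""


def prune_tables(program):
    """Keep only tables that the control flow applies (assumed meaning of 'redundant')."""
    applied = set(program["apply_order"])
    return [t for t in program["tables"] if t["name"] in applied]


def emit_c(program):
    """Expand the template once per surviving table and join the results."""
    chunks = []
    for table in prune_tables(program):
        chunks.append(NODE_TEMPLATE.substitute(
            table=table["name"],
            ident=table["name"].replace(".", "_"),
            keys=", ".join(table["keys"]),
            actions=", ".join(table["actions"]),
        ))
    return "\n".join(chunks)
```

`emit_c(json.loads(EXAMPLE_JSON))` yields two C stubs and omits `unused_table`, showing how an optimization pass can run on the intermediate form before any C is generated.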

  19. Experimental Setup [Figure: three machines M1, M2, M3; PVPP on DPDK is connected by 3 x 10G links to each of two MoonGen sender/receivers] • CPU: Intel Xeon E5-2640 v3, 2.6 GHz • Memory: 32 GB RDIMM, 2133 MT/s, dual rank • NICs: Intel X710 DP/QP DA SFP+ cards • HDD: 1 TB 7.2K RPM NLSAS 6 Gbps

  20. Benchmark Application • Parse Ethernet/IPv4 • IPv4_match: Match: ip.dstAddr; Action: Set_nhop or drop • Destination MAC: Match: ip.dstAddr; Action: Set_dmac or drop • Source MAC: Match: egress_port; Action: Set_dmac or drop
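
The three-table benchmark can be modeled as a chain of lookups in which any miss drops the packet. This is a minimal Python sketch, not the authors' P4 program: the exact-match lookups, the table entries, the idea that `set_nhop` selects the egress port, and the assumption that the Source MAC table rewrites `eth.srcAddr` are all illustrative guesses.

```python
def process(pkt, tables):
    """Run one packet through the three tables; return None if any table drops it."""
    out = dict(pkt)

    # Table 1 (IPv4_match): exact match on ip.dstAddr is an assumption of this sketch.
    action, arg = tables["ipv4_match"].get(out["ip.dstAddr"], ("drop", None))
    if action == "drop":
        return None
    out["egress_port"] = arg  # assumed: set_nhop also picks the egress port

    # Table 2 (Destination MAC): match on ip.dstAddr, as listed on the slide.
    action, arg = tables["dst_mac"].get(out["ip.dstAddr"], ("drop", None))
    if action == "drop":
        return None
    out["eth.dstAddr"] = arg

    # Table 3 (Source MAC): match on egress_port.
    action, arg = tables["src_mac"].get(out["egress_port"], ("drop", None))
    if action == "drop":
        return None
    out["eth.srcAddr"] = arg  # assumed: this table rewrites the source MAC
    return out


# Hypothetical table contents, for illustration only.
EXAMPLE_TABLES = {
    "ipv4_match": {"10.0.0.2": ("set_nhop", 1)},
    "dst_mac": {"10.0.0.2": ("set_dmac", "aa:bb:cc:dd:ee:02")},
    "src_mac": {1: ("set_smac", "aa:bb:cc:dd:ee:01")},
}
```

A packet whose `ip.dstAddr` is `10.0.0.2` leaves with both MAC addresses rewritten and `egress_port` set to 1; any other destination misses the first table and is dropped.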

  21. Baseline Performance: 64 byte packets, single 10G port [Figure: bar chart of throughput (Mpps) at 64-byte packet size, Single Node vs. Multiple Node; bar labels 7.86 and 7.05]
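
To put the baseline numbers in context, it helps to compare them with the theoretical packet rate of a 10G Ethernet link. The sketch below is a standard back-of-the-envelope calculation and not something taken from the slides: it assumes 20 bytes of per-frame overhead on the wire (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap), and the function names are made up for this example.

```python
def line_rate_mpps(link_gbps, frame_bytes, overhead_bytes=20):
    """Maximum frames per second (in millions) on an Ethernet link.

    overhead_bytes covers preamble (7), start-of-frame delimiter (1)
    and inter-frame gap (12), which occupy the wire but are not part of the frame.
    """
    bits_per_frame = (frame_bytes + overhead_bytes) * 8
    return link_gbps * 1e9 / bits_per_frame / 1e6


def fraction_of_line_rate(measured_mpps, link_gbps, frame_bytes):
    """Measured throughput as a fraction of the theoretical maximum."""
    return measured_mpps / line_rate_mpps(link_gbps, frame_bytes)
```

For 64-byte frames on one 10G port the ceiling is about 14.88 Mpps, so the two baseline bars (7.86 and 7.05 Mpps) correspond to roughly 53% and 47% of line rate.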

  22. Vector Packet Processing (VPP) Platform • Each node can execute almost arbitrary C code on vectors of packets [Figure: VPP node graph with nodes dpdk-input, ip4-input, ip6-input, llc-input, ip6-lookup, ip6-rewrite-transmit, dpdk-output]

  23. Optimized Performance: 64 byte packets, single 10G port [Figure: bar chart of throughput (Mpps), Single Node vs. Multiple Node, for each stage: Baseline, Removing Redundant Tables, Reducing Metadata Access, Loop Unrolling, Bypassing Redundant Nodes, Reducing Pointer Dereferences, Caching Logical HW Interface. Bar labels, in descending order: 10.21, 10.01, 9.58, 9.51, 9.51, 9.25, 9.20, 9.02, 8.89, 8.80, 8.50, 8.38, 7.86, 7.05]

  24. Scalability: 64 byte packets across 3 x 10G ports [Figure: bar chart of throughput (Mpps) vs. number of CPU cores (1 to 6), Single Node vs. Multiple Node. Bar labels, in descending order: 53.11, 49.34, 44.23, 40.69, 35.83, 33.41, 26.40, 24.14, 17.03, 16.57, 8.52, 8.14]

  25. Performance Comparison [Figure: bar chart of throughput (Mpps) vs. packet size (64, 128, 192, 256 bytes) for PVPP, PISCES (with Microflow), and PISCES (without Microflow). Bar labels as extracted: 63.49, 59.53, 49.31, 47.23, 34.71, 34.72, 30.22, 30.22, 30.20, 26.78, 26.78, 26.78]

  26. Future work • Microbenchmarking VPP to inform VPP-specific optimizations • P4 compiler annotations for low-level constructs • Explore when multi-node compilation is beneficial for PVPP • Demonstrate use cases where OVS microflow cache is defeated – to show PVPP is just as programmable without resorting to separated fast/slow path

  27. Summary • High-level DSLs are great for programmers of software switches, but lack expressivity for optimizations. • Low-level software switches such as VPP are performant but hard to program. • We propose that the best of both is possible with PVPP. • Performance comparable to the state of the art has been achieved, but PVPP is still a work in progress.
