
Software Data Planes: You Can’t Always Spin to Win



  1. Software Data Planes: You Can’t Always Spin to Win
     Hossein Golestani, Amirhossein Mirhosseini, Thomas F. Wenisch
     University of Michigan
     ACM Symposium on Cloud Computing (SoCC), November 22, 2019
     adacenter.org, @ADA_Center
     This work is supported by the Semiconductor Research Corporation (SRC) and DARPA.

  2. What’s Up in the Cloud?
     • μs-scale computing era
     [Figure: I/O virtualization (VM #1 … VM #n); network function virtualization (address translation, firewall, load balancing, routing); microservices (Server #1, Server #2); high-speed I/O. Image credits: Mellanox, Intel]
     • Service objectives
       • High throughput
       • Low average/tail latency

  3. Software Stacks: Under Revision
     • Then vs. now
     [Figure: user app, kernel, CPU, and I/O layering, then vs. now]
     • Kernel-bypass architectures (just a handful)
       • Andromeda [NSDI’18]
       • Arrakis [OSDI’14]
       • IX [OSDI’14]
       • mTCP [NSDI’14]
       • ReFlex [ASPLOS’17]
       • Shenango [NSDI’19]
       • Shinjuku [NSDI’19]
       • Snap [SOSP’19]
       • ZygOS [SOSP’17]

  4. Software Data Planes
     • Key mechanisms
       • User-level shared queues
       • Spin-polling cores
       • Fast notification by cache coherence write signals
     • Widely adopted in industry
     [Figure: I/O path diagram and industry logos, including SPDK (Storage Performance Development Kit)]
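The queue-plus-polling mechanism on slide 4 can be sketched as a single-producer/single-consumer ring. This is a minimal illustration, not DPDK or SPDK code: the names `SpscRing` and `spin_poll` are hypothetical, and plain Python cannot express the cache-coherence write that serves as the notification in a real data plane, so the update of `tail` merely stands in for it.

```python
class SpscRing:
    """Single-producer/single-consumer ring; capacity must be a power of two.

    Items must not be None, because None is used to signal an empty ring.
    """

    def __init__(self, capacity):
        assert capacity > 0 and capacity & (capacity - 1) == 0
        self.slots = [None] * capacity
        self.mask = capacity - 1
        self.head = 0  # next slot to read; advanced only by the consumer
        self.tail = 0  # next slot to write; advanced only by the producer

    def try_enqueue(self, item):
        if self.tail - self.head == len(self.slots):
            return False  # ring is full
        self.slots[self.tail & self.mask] = item
        self.tail += 1  # publishing the new tail stands in for the notification
        return True

    def try_dequeue(self):
        if self.head == self.tail:
            return None  # empty: a spin-polling consumer would simply retry
        item = self.slots[self.head & self.mask]
        self.head += 1
        return item


def spin_poll(ring, max_spins):
    """Poll until an item appears or max_spins is reached; return (item, spins)."""
    for spins in range(1, max_spins + 1):
        item = ring.try_dequeue()
        if item is not None:
            return item, spins
    return None, max_spins
```

Every iteration of `spin_poll` that finds the ring empty is work done without any packet to show for it, which is the cost the later slides quantify.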

  5. Spin-polling: Not a Panacea
     • An easy-to-use and fast model for communication and signaling
     • But far from ideal, especially when scaled
     • We show that spin-based data planes:
       • Perform more work when there is less
       • Are not scalable to many cores
       • Are not scalable to many queues
       • Are not well-suited for shared queues

  6. Outline
     • Introduction to Software Data Planes
     • Methodology
     • Characterization of Software Data Plane Challenges
     • Solution Directions
     • Conclusion

  7. Methodology
     • Setup
       • DPDK-based applications
       • Skylake cores
       • 100GbE Mellanox NIC
     • Experiments
       1. Inefficiencies of spin-polling
       2. Lack of queue scalability
       3. Impracticality of queue sharing

  8. Inefficiencies of Spin-polling
     • Polling “tax”
     • Body of poll loop
     • Useless polling on idle queues (possibly causing cache misses)
     • Affects throughput scalability with cores
     [Figure: NIC port 1 → core → NIC port 2]
     Poll loop:
       While forever:
         For each RX queue:
           Read packets from RX queue;
           If there are any packets:
             Route packets using LPM*;
             Send packets to TX queue(s);
     * LPM: Longest Prefix Match
     Polling tax can be 20-28% of total CPU cycles even at 100% load
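The poll loop on slide 8 can be sketched in a few lines to show where empty polls come from. This is a bounded simulation under stated assumptions, not the DPDK application measured in the talk: the routing table `ROUTES`, the helper names `lpm_lookup` and `poll_loop`, and the `burst` size are all hypothetical, and the "while forever" loop is replaced by a fixed number of iterations so the example terminates.

```python
from collections import deque
import ipaddress

# Hypothetical routing table for illustration: (prefix, output port).
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/8"), 1),
    (ipaddress.ip_network("10.1.0.0/16"), 2),
    (ipaddress.ip_network("0.0.0.0/0"), 0),  # default route always matches
]


def lpm_lookup(dst):
    """Return the output port of the longest prefix that contains dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(net.prefixlen, port) for net, port in ROUTES if addr in net]
    return max(matches)[1]


def poll_loop(rx_queues, tx_queues, iterations, burst=32):
    """Bounded version of the poll loop; returns (total polls, empty polls)."""
    polls = empty_polls = 0
    for _ in range(iterations):
        for rxq in rx_queues:
            polls += 1
            # Read up to `burst` packets from this RX queue.
            batch = [rxq.popleft() for _ in range(min(burst, len(rxq)))]
            if not batch:
                empty_polls += 1  # useless poll on an idle queue
                continue
            for dst in batch:
                tx_queues[lpm_lookup(dst)].append(dst)
    return polls, empty_polls
```

With four RX queues of which two hold packets, two passes over the queues issue eight polls and six of them find nothing, so the share of empty polls grows as traffic gets lighter or as queues are added.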

  9. IPC != Useful Work
     • IPC (Instructions Per Cycle) of routing core at varying loads
     [Figure: IPC of routing core (0.00 to 2.75) vs. routing throughput (0 to 30 Mpps), for 1, 4, and 8 queues]
     IPC decreases as load increases, resulting in energy inefficiency, fast aging, and severe co-runner interference

  10. Effect on SMT Co-runner
      • More (useless) instructions executed in lighter traffic
      • Co-running:
        • Matrix mult
        • Spin-based routing (0-100% load)
      • Executed on:
        • SMT cores of a physical CPU
        • Different physical CPUs
      [Figure: IPC of matrix mult (0.0 to 2.5) for three bars: Not collocated, Collocated with Routing-0%, Collocated with Routing-100%; bar labels 2.24, 1.56, 1.54]
      Useless spinning wastes execution resources of an SMT co-runner

  11. Lack of Queue Scalability
      • Traffic flows spread among multiple queues
      • Limited size of CPU caches: a performance antagonist
      • Experiment
        • Forwarding packets by a single core
        • Scaling up the number of queues
      [Figure: NIC port 1 → core → NIC port 2]

  12. Effect on Latency
      • Round-trip latency of packet forwarding
      • Light traffic (minimal queuing delay)
      [Figure: average latency (0 to 25 μs) vs. number of queues (0 to 512)]
      Latency is severely affected as queue heads fall out of L1/L2 caches

  13. Effect on Peak Throughput
      • Balanced traffic: Passing through all queues
      • Unbalanced traffic: Passing through only one queue
      [Figure: throughput (0 to 40 Mpps) vs. total number of queues (0 to 512), for balanced and unbalanced traffic]
      Cache misses not interleaved with transmits severely hurt peak throughput in unbalanced traffic

  14. Scale-up Queuing Is Impractical
      • (a) Scale-out vs. (b) Scale-up queuing (shared queue)
      [Figure: (a) one queue per core, Core 1 … Core n; (b) one shared queue feeding Core 1 … Core n]
      • Scale-up queuing
        • Strong theoretical merits
        • Synchronization disadvantage

  15. Scale-out vs. Scale-up
      • Processing hiccups cause head-of-line (HoL) blocking in scale-out
      • Round-trip latency with 10 parallel cores
      [Figure: average latency (0 to 400 μs) vs. throughput (0 to 60 Mpps) for scale-out and scale-up; panel (a) no hiccups; panel (b) 1 μs processing hiccup with 1% probability]
      Although effective in avoiding HoL blocking, spin-polling in scale-up queuing saturates at lower loads
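The head-of-line blocking argument on slides 14 and 15 can be illustrated with a toy queueing model. This is a sketch under simplifying assumptions, not the experiment from the talk: all jobs arrive at time zero, scale-out assigns jobs to per-core queues round-robin, and the synchronization cost of the shared queue is ignored, which is exactly the cost the slides say makes scale-up saturate at lower loads. The function names are hypothetical.

```python
import heapq


def scale_out(service_times, cores):
    """Per-core FIFO queues with round-robin assignment; returns mean completion time."""
    free_at = [0.0] * cores
    finish = []
    for i, service in enumerate(service_times):
        core = i % cores
        free_at[core] += service  # job waits behind everything already on this core
        finish.append(free_at[core])
    return sum(finish) / len(finish)


def scale_up(service_times, cores):
    """One shared FIFO queue; each job runs on the core that frees up first."""
    free_at = [0.0] * cores
    heapq.heapify(free_at)
    finish = []
    for service in service_times:
        done = heapq.heappop(free_at) + service
        heapq.heappush(free_at, done)
        finish.append(done)
    return sum(finish) / len(finish)
```

With service times `[10, 1, 1, 1]` on two cores, one slow job (a "hiccup") delays the short job queued behind it in scale-out, giving a mean completion time of 6.0, while the shared queue lets the other core drain the short jobs, giving 4.0.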

  16. Future Data Planes

  17. Solution Direction(s)
      • QWAIT, a multi-address monitoring scheme
      • Inspired by x86 MWAIT
      • Avoids polling tax, useless polling, and disruption to SMT co-runners
      • Needs hardware support
      • Programming model similar to select-case in Go
      QWAIT (queue_set):
        case queue_1: process_queue_1();
        …
        case queue_n: process_queue_n();
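The QWAIT programming model on slide 17 can be approximated in software to show its shape: block until any queue in a set has work, then dispatch on the queue that is ready. This is only an emulation with a condition variable. The proposed QWAIT needs hardware support, and the `QueueSet` class, its `qwait` method, and `serve_one` are hypothetical names, not an API from the talk.

```python
import threading
from collections import deque


class QueueSet:
    """Software stand-in for a monitored queue set (hypothetical API)."""

    def __init__(self, n):
        self.queues = [deque() for _ in range(n)]
        self.cond = threading.Condition()

    def enqueue(self, qid, item):
        with self.cond:
            self.queues[qid].append(item)
            self.cond.notify()  # wake a waiter instead of relying on polling

    def qwait(self, timeout=None):
        """Block until some queue is non-empty; return (qid, item), or None on timeout."""
        with self.cond:
            ready = self.cond.wait_for(lambda: any(self.queues), timeout)
            if not ready:
                return None
            # Scan in index order, so lower-numbered queues are served first.
            for qid, q in enumerate(self.queues):
                if q:
                    return qid, q.popleft()


def serve_one(queue_set, handlers, timeout=None):
    """Wait for one item and run the handler registered for its queue (the 'case' arm)."""
    ready = queue_set.qwait(timeout)
    if ready is None:
        return None
    qid, item = ready
    return handlers[qid](item)
```

While the set is empty the waiting thread sleeps rather than spinning, which mirrors the stated goal of avoiding the polling tax and the disruption to SMT co-runners. The fixed scan order is a simplification; the slides do not say how a ready queue is chosen.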

  18. Conclusion
      • Key mechanisms of software data planes
        • User-level shared queues
        • Spin-polling cores
      • Although easy-to-use and low-latency, software data planes have deficiencies, especially when scaled
      • Using DPDK, we quantified these deficiencies:
        • Incurring polling overhead and useless work
        • Not scalable to many cores/queues
        • Not well-suited for scale-up queuing

  19. Q & A
      Thank you!
