Software Data Planes: You Can’t Always Spin to Win
Hossein Golestani, Amirhossein Mirhosseini, Thomas F. Wenisch
University of Michigan
ACM Symposium on Cloud Computing (SoCC), November 22, 2019
adacenter.org, @ADA_Center
This work is supported by the Semiconductor Research Corporation (SRC) and DARPA.

What’s Up in the Cloud?
• Virtual μs-scale computing era
[Figure: VM #1 … VM #n (I/O virtualization); address translation, firewall, load balancing, routing (network function virtualization); Server #1, Server #2 (microservices); high-speed I/O. Image credits: Mellanox, Intel]
• Service objectives
  • High throughput
  • Low average/tail latency

Software Stacks: Under Revision
• Then vs. now
[Figure: software stack then vs. now, showing user app, kernel, CPU, and I/O]
• Kernel-bypass architectures (just a handful)
  • Andromeda [NSDI’18]
  • Arrakis [OSDI’14]
  • IX [OSDI’14]
  • mTCP [NSDI’14]
  • ReFlex [ASPLOS’17]
  • Shenango [NSDI’19]
  • Shinjuku [NSDI’19]
  • Snap [SOSP’19]
  • ZygOS [SOSP’17]

Software Data Planes
• Key mechanisms
  • User-level shared queues
  • Spin-polling cores
  • Fast notification by cache coherence write signals
• Widely adopted in industry
[Logo: SPDK, Storage Performance Development Kit]

Spin-polling: Not a Panacea
• An easy-to-use and fast model for communication and signaling
• But far from ideal, especially when scaled
• We show that spin-based data planes:
  • Perform more work when there is less
  • Are not scalable to many cores
  • Are not scalable to many queues
  • Are not well-suited for shared queues

Outline
• Introduction to Software Data Planes
• Methodology
• Characterization of Software Data Plane Challenges
• Solution Directions
• Conclusion

Methodology
• Setup
  • DPDK-based applications
  • Skylake cores
  • 100GbE Mellanox NIC
• Experiments
  1. Inefficiencies of spin-polling
  2. Lack of queue scalability
  3. Impracticality of queue sharing

Inefficiencies of Spin-polling
• Polling “tax”
• Body of poll loop
• Useless polling on idle queues (possibly causing cache misses)
• Affects throughput scalability with cores

Poll loop (pseudocode):
    While forever:
        For each RX queue:
            Read packets from RX queue;
            If there are any packets:
                Route packets using LPM*;
                Send packets to TX queue(s);

* LPM: Longest Prefix Match
[Figure: Core connected to queues of NIC Port 1 and NIC Port 2]
Polling tax can be 20-28% of total CPU cycles even at 100% load.
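The poll loop above can be sketched in Python as a toy model, not as DPDK code. The `ROUTES` table, the `lpm_lookup` and `poll_once` names, the burst size, and the use of destination-address strings as packets are all hypothetical, and counting empty polls is only a rough stand-in for the useless polling the slide describes.

```python
import ipaddress
from collections import deque

# Hypothetical routing table: prefix -> output port.
ROUTES = {
    ipaddress.ip_network("10.0.0.0/8"): 1,
    ipaddress.ip_network("10.1.0.0/16"): 2,
}

def lpm_lookup(dst):
    """Return the port of the longest matching prefix, or None if no route."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return None
    return ROUTES[max(matches, key=lambda net: net.prefixlen)]

def poll_once(rx_queues, tx_queues, burst=32):
    """One pass of the poll loop body; returns (busy_polls, empty_polls)."""
    busy = empty = 0
    for rx in rx_queues:
        # Read up to `burst` packets from this RX queue.
        packets = [rx.popleft() for _ in range(min(burst, len(rx)))]
        if not packets:
            empty += 1  # the queue was idle, so this poll did no useful work
            continue
        busy += 1
        for dst in packets:
            port = lpm_lookup(dst)
            if port is not None:
                tx_queues[port].append(dst)  # send to the matching TX queue
    return busy, empty
```

With one loaded queue and two idle ones, a single pass reports one busy poll and two empty polls, so the share of empty polls grows as more queues sit idle.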

IPC != Useful Work
• IPC (Instructions Per Cycle) of routing core at varying loads
[Figure: IPC of routing core (0.00 to 2.75) vs. routing throughput (0 to 30 Mpps), for 1, 4, and 8 queues]
IPC decreases as load increases: the core executes the most instructions when there is the least useful work, resulting in energy inefficiency, fast aging, and severe co-runner interference.

Effect on SMT Co-runner
• More (useless) instructions executed in lighter traffic
• Co-running:
  • Matrix mult
  • Spin-based routing (0-100% load)
• Executed on:
  • SMT cores of a physical CPU
  • Different physical CPUs
[Figure: IPC of matrix mult (0.0 to 2.5) for three cases: not collocated, collocated with Routing-0%, collocated with Routing-100%. Bar values: 2.24 (not collocated); 1.56 and 1.54 (collocated)]
Useless spinning wastes execution resources of an SMT co-runner.

Lack of Queue Scalability
• Traffic flows spread among multiple queues
• Limited size of CPU caches: a performance antagonist
• Experiment
  • Forwarding packets by a single core
  • Scaling up the number of queues
[Figure: Core connected to queues of NIC Port 1 and NIC Port 2]

Effect on Latency
• Round-trip latency of packet forwarding
• Light traffic (minimal queuing delay)
[Figure: average latency (0 to 25 μs) vs. number of queues (0 to 512)]
Latency is severely affected as queue heads fall out of L1/L2 caches.

Effect on Peak Throughput
• Balanced traffic: Passing through all queues
• Unbalanced traffic: Passing through only one queue
[Figure: throughput (0 to 40 Mpps) vs. total number of queues (0 to 512), for balanced and unbalanced traffic]
Cache misses not interleaved with transmits severely hurt peak throughput in unbalanced traffic.

Scale-up Queuing Is Impractical
• (a) Scale-out vs. (b) Scale-up queuing (shared queue)
[Figure: panels (a) and (b), each showing queues feeding Core 1 … Core n]
• Scale-up queuing
  • Strong theoretical merits
  • Synchronization disadvantage

Scale-out vs. Scale-up
• Processing hiccups cause head-of-line (HoL) blocking in scale-out
• Round-trip latency with 10 parallel cores
[Figure: average latency (0 to 400 μs) vs. throughput (0 to 60 Mpps) for scale-out and scale-up. Panel (a): no hiccups. Panel (b): 1 μs processing hiccup with 1% probability]
Although effective in avoiding HoL blocking, spin-polling in scale-up queuing saturates at lower loads.
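The head-of-line blocking argument can be illustrated with a small queuing simulation. This is a sketch under stated assumptions, not a reproduction of the measured results: Poisson arrivals, a 0.2 μs base service time, random assignment of requests to per-core queues, and the `mean_latency` name are all assumptions; only the 10 cores and the 1 μs hiccup with 1% probability come from the slide. The model also ignores the synchronization cost of a shared queue, which is the reason the slide gives for scale-up saturating earlier in practice.

```python
import random

def mean_latency(shared, load, cores=10, service_us=0.2, hiccup_us=1.0,
                 hiccup_prob=0.01, requests=20000, seed=1):
    """Mean time in system (us) for scale-up (shared=True) or scale-out queuing."""
    work_rng = random.Random(seed)          # arrivals and service times
    dispatch_rng = random.Random(seed + 1)  # scale-out queue selection only
    mean_service = service_us + hiccup_prob * hiccup_us
    arrival_rate = load * cores / mean_service  # requests per us over all cores
    free_at = [0.0] * cores  # time at which each core finishes its backlog
    now = total = 0.0
    for _ in range(requests):
        now += work_rng.expovariate(arrival_rate)
        hiccup = hiccup_us if work_rng.random() < hiccup_prob else 0.0
        cost = service_us + hiccup
        if shared:
            # Scale-up: the head of the shared queue goes to the first free core.
            core = min(range(cores), key=free_at.__getitem__)
        else:
            # Scale-out: the request is bound to one core's private queue.
            core = dispatch_rng.randrange(cores)
        start = max(now, free_at[core])
        free_at[core] = start + cost
        total += free_at[core] - now
    return total / requests
```

Comparing `mean_latency(True, 0.8)` with `mean_latency(False, 0.8)` shows the theoretical advantage of the shared queue: a request stuck behind a hiccup in a private queue has to wait, while in the shared queue any other free core can pick it up.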

Future Data Planes

Solution Direction(s)
• QWAIT, a multi-address monitoring scheme
  • Inspired by x86 MWAIT
  • Avoids polling tax, useless polling, and disruption to SMT co-runners
  • Needs hardware support
  • Programming model similar to select-case in Go

    QWAIT (queue_set):
        case queue_1: process_queue_1();
        …
        case queue_n: process_queue_n();
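The select-like programming model can be approximated in software to show its shape. This is only an analogy: QWAIT as described needs hardware support, whereas the sketch below blocks on a condition variable. The `QueueSet` class and its `push` and `qwait` methods are hypothetical names, not part of any proposed interface.

```python
import threading
from collections import deque

class QueueSet:
    """Software stand-in for a set of queues monitored together."""

    def __init__(self, count):
        self._queues = [deque() for _ in range(count)]
        self._cond = threading.Condition()

    def push(self, index, item):
        """Producer side: enqueue an item and wake one waiting consumer."""
        with self._cond:
            self._queues[index].append(item)
            self._cond.notify()

    def qwait(self, timeout=None):
        """Block until any queue is non-empty; return (index, item).

        Returns None if the timeout expires first. No spinning happens
        while all queues are empty.
        """
        with self._cond:
            if not self._cond.wait_for(lambda: any(self._queues), timeout):
                return None
            for index, queue in enumerate(self._queues):
                if queue:
                    return index, queue.popleft()
```

A consumer would call `qwait()` in a loop and dispatch on the returned index, which plays the role of the `case queue_i` arms in the pseudocode above.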

Conclusion
• Key mechanisms of software data planes
  • User-level shared queues
  • Spin-polling cores
• Although easy-to-use and low-latency, software data planes have deficiencies, especially when scaled
• Using DPDK, we quantified these deficiencies:
  • Incurring polling overhead and useless work
  • Not scalable to many cores/queues
  • Not well-suited for scale-up queuing

Q & A
Thank you!
