
IX: A Protected Dataplane Operating System for High Throughput and Low Latency



  1. IX: A Protected Dataplane Operating System for High Throughput and Low Latency
     Adam Belay, George Prekas, Samuel Grossman, Ana Klimovic, Christos Kozyrakis, Edouard Bugnion

  2. HW is fast, but SW is a Bottleneck
     64-byte TCP Echo: [Figure: bar charts with axes "Microseconds" and "Requests per Second (Millions)", comparing HW Limit, Linux, and IX]

  3. HW is fast, but SW is a Bottleneck
     64-byte TCP Echo: [Figure: bar charts with axes "Microseconds" and "Requests per Second (Millions)", comparing HW Limit, Linux, and IX; annotated "4.8x Gap" and "8.8x Gap"]

  4. IX Closes the SW Performance Gap
     64-byte TCP Echo: [Figure: bar charts with axes "Microseconds" and "Requests per Second (Millions)", comparing HW Limit, Linux, and IX]

  5. Two Contributions
     #1: Protection and direct HW access through virtualization
     #2: Execution model for low latency and high throughput
     [Figure: bar charts with axes "Microseconds" and "Requests per Second (Millions)", comparing HW Limit, Linux, and IX]

  6. Why is SW Slow?
     Complex Interface
     Code Paths Convoluted by Interrupts and Scheduling
     [Figure: kernel flow diagram. Created by: Arnout Vandecappelle, http://www.linuxfoundation.org/collaborate/workgroups/networking/kernel_flow]

  7. Problem: 1980s Software Architecture
     • Berkeley sockets, designed for CPU time sharing
     • Today's large-scale datacenter workloads:
       Hardware: Dense Multicore + 10 GbE (soon 40)
       – API scalability critical!
       – Gap between compute and RAM -> Cache behavior matters
       – Packet inter-arrival times of 50 ns
       Scale out access patterns
       – Fan-in -> Large connection counts, high request rates
       – Fan-out -> Tail latency matters!
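The packet inter-arrival figure on slide 7 can be put in context with a short calculation. The sketch below is not from the slides: it assumes standard Ethernet framing (64-byte minimum frame, 8 bytes of preamble and start-of-frame delimiter, 12 bytes of inter-frame gap) and computes the spacing of back-to-back minimum-size frames at line rate. Under these assumptions the result is 67.2 ns at 10 GbE and 16.8 ns at 40 GbE, the same order of magnitude as the 50 ns quoted; the slides do not say how that number was derived.

```python
# Assumed Ethernet framing constants (standard values, not taken from the slides).
MIN_FRAME_BYTES = 64        # minimum Ethernet frame, including the FCS
PREAMBLE_SFD_BYTES = 8      # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # mandatory idle time between frames


def wire_bits(frame_bytes: int = MIN_FRAME_BYTES) -> int:
    """Bits one frame occupies on the wire, including framing overhead."""
    return (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8


def min_interarrival_ns(link_bps: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Spacing between back-to-back frames at line rate, in nanoseconds."""
    return wire_bits(frame_bytes) / link_bps * 1e9


def max_packets_per_second(link_bps: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Upper bound on the frame rate a link can deliver."""
    return link_bps / wire_bits(frame_bytes)


if __name__ == "__main__":
    for gbps in (10, 40):
        bps = gbps * 1e9
        print(f"{gbps} GbE: {min_interarrival_ns(bps):.1f} ns between frames, "
              f"{max_packets_per_second(bps) / 1e6:.2f} Mpps")
```

At these rates a core has only a few hundred cycles per packet, which is why the slide stresses cache behavior and API scalability.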

  8. Conventional Wisdom
     • Bypass the kernel
       – Move TCP to user-space (Onload, mTCP, Sandstorm)
       – Move TCP to hardware (TOE)
     • Avoid the connection scalability bottleneck
       – Use datagrams instead of connections (DIY congestion management)
       – Use proxies at the expense of latency
     • Replace classic Ethernet
       – Use a lossless fabric (Infiniband)
       – Offload memory access (rDMA)
     • Common thread: Give up on systems software

  9. Our Approach
     • Bypass the kernel [slide overlay: Robust Protection Between App and Netstack]
       – Move TCP to user-space (Onload, mTCP, Sandstorm)
       – Move TCP to hardware (TOE)
     • Avoid the connection scalability bottleneck [slide overlay: Connection Scalability]
       – Use datagrams instead of connections (DIY congestion management)
       – Use proxies at the expense of latency
     • Replace classic Ethernet [slide overlay: Commodity 10Gb Ethernet]
       – Use a lossless fabric (Infiniband)
       – Offload memory access (rDMA)
     • Tackle the problem head on…

  10. Separation of Control and Data Plane
      [Diagram labels: CP, DP, DP (Userspace); Host Kernel (Kernelspace); five cores (C)]

  11. Separation of Control and Data Plane
      [Diagram labels: CP, DP, DP (Userspace); Host Kernel (Kernelspace); four RX/TX queue pairs; five cores (C)]

  12. Separation of Control and Data Plane
      [Diagram labels: IX CP (Ring 3); IX DP, IX DP (Guest Ring 0); Host Kernel (Host Ring 0); four RX/TX queue pairs; five cores (C)]

  13. Separation of Control and Data Plane
      [Diagram labels: IX CP (Ring 3); IX DP, IX DP (Guest Ring 0); Linux kernel with Dune (Host Ring 0); four RX/TX queue pairs; five cores (C)]

  14. Separation of Control and Data Plane
      [Diagram labels: HTTPd on libIX, Memcached on libIX, IX CP (Ring 3); IX, IX (Guest Ring 0); Linux kernel with Dune (Host Ring 0); four RX/TX queue pairs; five cores (C)]

  15. The IX Execution Pipeline
      [Diagram labels: event-driven app on libIX (Ring 3), connected by Event Conditions and Batched Syscalls to TCP/IP receive and transmit processing (Guest Ring 0); Timer; RX FIFO; RX and TX queues; stages numbered 1 to 6]

  16. Design (1): Run to Completion
      [Diagram: the IX execution pipeline from slide 15, stages numbered 1 to 6]
      Improves Data-Cache Locality
      Removes Scheduling Unpredictability
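The run-to-completion idea on slide 16 can be illustrated with a toy loop in which one batch of packets passes through every stage before the next batch is touched. This is a minimal sketch, not IX code: the stage functions are hypothetical stand-ins, the app is assumed to be a simple echo server, and the mapping of the diagram's numbers 1 to 6 onto stages is an assumption based on the label positions.

```python
from collections import deque


def tcp_ip_rx(packet: bytes) -> dict:
    """Stand-in for receive-side protocol processing: emit an event condition."""
    return {"type": "recv", "payload": packet}


def app_handle(event: dict) -> dict:
    """Stand-in for the event-driven app: echo the payload via a 'send' call."""
    return {"call": "send", "payload": event["payload"]}


def tcp_ip_tx(syscall: dict) -> bytes:
    """Stand-in for transmit-side protocol processing of one batched syscall."""
    return syscall["payload"]


def run_to_completion_step(rx_ring: deque, tx_ring: deque, batch_size: int) -> int:
    """Carry one batch through all stages; return how many packets were handled."""
    count = min(batch_size, len(rx_ring))
    batch = [rx_ring.popleft() for _ in range(count)]   # (1) poll the RX queue
    events = [tcp_ip_rx(p) for p in batch]              # (2) TCP/IP -> event conditions
    syscalls = [app_handle(e) for e in events]          # (3) app -> batched syscalls
    frames = [tcp_ip_tx(s) for s in syscalls]           # (4) TCP/IP handles the syscalls
    # (5) timers would run here; omitted in this sketch
    tx_ring.extend(frames)                              # (6) hand frames to the TX queue
    return count


if __name__ == "__main__":
    rx, tx = deque([b"ping1", b"ping2", b"ping3"]), deque()
    while run_to_completion_step(rx, tx, batch_size=2):
        pass
    print(list(tx))
```

Because no stage hands work to a scheduler or another thread, the data for a packet stays in the cache of the core that received it, which is the locality benefit the slide names.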

  17. Design (2): Adaptive Batching
      [Diagram: the IX execution pipeline from slide 15, with an added "Adaptive Batch Calculation" step, stages numbered 1 to 6]
      Improves Instruction-Cache Locality and Prefetching
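Slide 17 names an adaptive batch calculation without spelling it out. One plausible reading, sketched below, is that each iteration takes whatever is already queued, up to a fixed cap, and never waits for more packets to arrive. Both the rule and the `MAX_BATCH` value are assumptions made for illustration; neither is stated on the slide.

```python
MAX_BATCH = 64  # hypothetical upper bound; the slides do not give a value


def adaptive_batch_size(queued: int, max_batch: int = MAX_BATCH) -> int:
    """Take what is already queued, capped at max_batch; never wait for more."""
    return max(0, min(queued, max_batch))


def drain(queue_depths: list, max_batch: int = MAX_BATCH) -> list:
    """Batch sizes chosen for a sequence of observed RX queue depths."""
    return [adaptive_batch_size(depth, max_batch) for depth in queue_depths]


if __name__ == "__main__":
    # Light load yields small batches (low latency); heavy load hits the cap
    # (amortized per-batch cost).
    print(drain([0, 1, 3, 200]))
```

Under this rule batching only kicks in when packets have already piled up, so it adds no delay at low load.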

  18. See the Paper for more Details
      • Design (3): Flow consistent hashing
        – Synchronization & coherence free operation
      • Design (4): Native zero-copy API
        – Flow control exposed to application
      • Libix: Libevent-like event-based programming
      • IX prototype implementation
        – Dune, DPDK, LWIP, ~40K SLOC of kernel code
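The flow consistent hashing point on slide 18 can be sketched as follows: if every packet of a connection hashes to the same queue, and each queue belongs to one core, then each core can keep its connection state private and needs no locks. The example is an illustration under assumptions: `zlib.crc32` stands in for whatever hash the NIC applies, and the `Core` class and `dispatch` helper are hypothetical names, not part of IX.

```python
import zlib


def flow_to_queue(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                  num_queues: int) -> int:
    """Map a connection 4-tuple to a queue index (CRC32 is a stand-in hash)."""
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_queues


class Core:
    """Per-core state: touched only by its owning core, so no locking is needed."""

    def __init__(self) -> None:
        self.connections: dict = {}

    def on_packet(self, flow: tuple, payload: bytes) -> None:
        self.connections.setdefault(flow, []).append(payload)


def dispatch(cores: list, flow: tuple, payload: bytes) -> int:
    """Deliver a packet to the core that owns its flow; return that core's index."""
    queue = flow_to_queue(*flow, num_queues=len(cores))
    cores[queue].on_packet(flow, payload)
    return queue


if __name__ == "__main__":
    cores = [Core() for _ in range(4)]
    flow = ("10.0.0.1", 40000, "10.0.0.2", 11211)
    first = dispatch(cores, flow, b"get key1")
    second = dispatch(cores, flow, b"get key2")
    print(first == second)
```

The property that matters is determinism per flow, not the particular hash: any function that always sends a given 4-tuple to the same queue gives the synchronization-free behavior the slide describes.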

  19. Evaluation
      • Comparison of IX to Linux and mTCP [NSDI '14]
      • TCP microbenchmarks and Memcached
      [Diagram labels: ~25 Linux Hosts; 10GbE Switch; 1x10GbE and 4x10GbE w/ L3+L4 bond links; IX, IX]

  20. TCP Netpipe
      [Plot: Goodput (Gbps) vs. Message Size (KB); series IX-IX, Linux-Linux, mTCP-mTCP; annotations "½ Bandwidth @ 20 KB", "½ Bandwidth @ 135 KB", "5.7 us ½ RTT"]

  21. TCP Echo: Multicore Scalability for Short Connections
      [Plot: Messages/sec (x 10^6) vs. Number of CPU cores; series IX 10GbE, IX 4x10GbE, Linux 10GbE, Linux 4x10GbE, mTCP 10GbE; annotation "Saturates 1x10GbE"]

  22. Connection Scalability
      [Plot: Messages/sec (x 10^6) vs. Connection Count (log scale); series IX-40Gbps, IX-10Gbps, Linux-40Gbps, Linux-10Gbps; annotation "~10,000 Connections Limited by L3"]

  23. Memcached over TCP
      [Plot: Latency (µs) vs. USR: Throughput (RPS x 10^3); series IX (p99), IX (avg), Linux (p99), Linux (avg); SLA line; annotations "3.6x More RPS", "2x Less Tail Latency", "6x Less Tail Latency With IX clients"]

  24. IX Conclusion
      • A protected dataplane OS for datacenter applications with an event-driven model and demanding connection scalability requirements
      • Efficient access to HW, without sacrificing security, through virtualization
      • High throughput and low latency enabled by a dataplane execution model
