SLIDE 1

Low-Latency TCP/IP Stack for Data Center Applications

David Sidler, Zsolt István, Gustavo Alonso

Systems Group, Dept. of Computer Science, ETH Zürich
FPL'16, Lausanne, August 30, 2016

SLIDE 2

Original Architecture [1]

• 10 Gbps bandwidth TCP/IP stack
• Supports thousands of concurrent connections
• Generic implementation, as close to the specification as possible
• Enables seamless integration of FPGA-based applications into existing networks

[1] Sidler et al., Scalable 10 Gbps TCP/IP Stack Architecture for Reconfigurable Hardware, FCCM'15. Source: http://github.com/dsidler/fpga-network-stack

SLIDE 3

Application Integration

[Block diagram: TCP/IP stack on the FPGA, connected to the 10G network interface, DDR memory, and two application modules]

SLIDE 4

Application Integration

[Same block diagram as Slide 3]

• The stack requires DDR to buffer packet payloads
• Applications require DDR memory as well
• Memory bandwidth is shared among multiple modules → potential bottleneck

SLIDE 5

Application Integration

[Same block diagram as Slide 3]

• DDR buffering and shared memory bandwidth concerns, as on Slide 4
• Distributed systems rely on very low latency → needed to guarantee latency bounds to clients (integration sketch below)
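To make the integration path concrete, here is a minimal sketch of how an application module could sit next to the TCP/IP stack and exchange payloads over HLS streams. The struct layout, names, and the single-stream handshake are assumptions for illustration; the released stack exposes a richer, session-based interface.

```cpp
// Hypothetical sketch of an application module next to the TCP/IP stack.
// Names and widths are illustrative assumptions, not the stack's real interface.
#include <ap_int.h>
#include <hls_stream.h>

// 64-bit data path: 64 bit x 156.25 MHz = 10 Gbps line rate.
struct netWord {
    ap_uint<64> data;
    ap_uint<8>  keep;
    ap_uint<1>  last;   // marks the last word of a payload
};

// Echo-style application: consume each request word at line rate and hand the
// response payload straight back to the stack's TX path, without touching DDR.
void appModule(hls::stream<netWord>& rxFromTcp,
               hls::stream<netWord>& txToTcp) {
#pragma HLS PIPELINE II=1
    if (!rxFromTcp.empty()) {
        netWord w = rxFromTcp.read();
        // ... per-request application logic on w.data would go here ...
        txToTcp.write(w);
    }
}
```

Because the module drains its RX stream every cycle, it matches the line-rate-consumption assumption listed on the next slide; an application that stalls would need its own buffering instead, which is exactly where the shared DDR bandwidth becomes a concern.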

SLIDE 6

Assumptions

Application

• Client requests fit into an MTU (maximum transmission unit); with a standard 1500 B Ethernet MTU and 20 B IP + 20 B TCP headers, that means payloads of at most 1460 B
• Clients are synchronous
• Application logic consumes data at line rate

SLIDE 7

Assumptions

Application

(assumptions as on Slide 6)

Data center network

• High reliability and structured topology
• Data loss is less common → fewer retransmissions
• Packets are rarely reordered

SLIDE 8

Optimizations for Data Center Applications

[Block diagram of the stack between the network and the application: state tables, timers, event engine, RX engine, TX engine, RX buffer, TX buffer, and application interface]

SLIDE 9

Optimizations for Data Center Applications

[Same block diagram as Slide 8]

• Replace the RX buffer with BRAM (sketch below)
• Only read for retransmission
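A minimal sketch of that idea, under assumed sizes (per-session windows of one MTU, a few hundred sessions with buffered data); the real buffer organization is not taken from the stack. The point is only that a BRAM array can absorb the in-flight payloads once requests are MTU-sized and drained at line rate, so the DDR controller disappears from the receive path.

```cpp
// Hypothetical sketch: small per-session RX windows in on-chip BRAM instead of DDR.
// Sizes and names are illustrative assumptions only.
#include <ap_int.h>
#include <hls_stream.h>

const int SESSIONS     = 256;           // assumption: sessions with buffered RX data
const int WINDOW_WORDS = 1460 / 8 + 1;  // one MTU-sized payload per session, 64-bit words

struct rxWord {
    ap_uint<64> data;
    ap_uint<1>  last;
};

// Synthesizes to BRAM; the DDR memory controller is no longer on the RX path.
static ap_uint<64> rxWindow[SESSIONS][WINDOW_WORDS];

void rxBufferStore(hls::stream<rxWord>& payloadIn, ap_uint<8> sessionID) {
    // Store one segment's payload into the session's on-chip window.
    // Assumes the payload fits into a single MTU (see the Assumptions slide).
    int offset = 0;
    rxWord w;
    do {
#pragma HLS PIPELINE II=1
        w = payloadIn.read();
        rxWindow[sessionID][offset++] = w.data;
    } while (!w.last && offset < WINDOW_WORDS);
}
```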

SLIDE 10

Optimizations for Data Center Applications

[Same block diagram as Slide 8]

• Replace the RX buffer with BRAM; only read for retransmission (as on Slide 9)

• Tuning timers
• Reducing the ACK delay
• Disabling Nagle's algorithm (software equivalents sketched below)
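Two of these knobs have well-known software counterparts, which may help place them: the sketch below shows the equivalent settings on a Linux TCP socket. The hardware stack fixes these choices at design time instead, so this code is purely for orientation.

```cpp
// Illustrative only: software-socket equivalents of two of the optimizations.
// The FPGA stack hard-wires these decisions; it does not use setsockopt().
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

int tuneSocket(int fd) {
    int one = 1;

    // Disable Nagle's algorithm: send small segments immediately instead of
    // coalescing them while earlier data is still unacknowledged.
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) < 0)
        return -1;

    // Reduce ACK delay: acknowledge incoming segments right away rather than
    // waiting for the delayed-ACK timer (Linux-specific, may need re-arming).
    if (setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one)) < 0)
        return -1;

    return 0;
}
```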

SLIDE 11

Results

SLIDE 12

Results

[Plot: latency in clock cycles @ 156.25 MHz vs. payload size [B], comparing RX org., RX opt., TX org., and TX opt.]

2-3x lower latency
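For scale, the y-axis converts to wall-clock time directly: at 156.25 MHz one cycle is 6.4 ns, so the plotted range of roughly 200 to 800 cycles corresponds to about 1.3 to 5.1 microseconds.

```cpp
// Convert the plotted cycle counts at 156.25 MHz into microseconds.
#include <cstdio>

int main() {
    const double nsPerCycle = 1000.0 / 156.25;   // 6.4 ns per cycle
    const int samples[] = {200, 400, 600, 800};
    for (int cycles : samples)
        std::printf("%d cycles = %.2f us\n", cycles, cycles * nsPerCycle / 1000.0);
    return 0;   // 200 cycles = 1.28 us ... 800 cycles = 5.12 us
}
```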

SLIDE 13

Results

[Latency plot as on Slide 12: 2-3x lower latency]

[Plot: goodput [Gb/s] vs. payload size [B], comparing TCP org. (SM), TCP org. (DM), and TCP opt. (SM), with the maximum achievable goodput marked]

High throughput
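The "max. goodput" marker reflects protocol overhead on a 10 Gbps link: besides the payload, every segment carries 20 B TCP and 20 B IP headers plus 38 B of Ethernet framing (header, FCS, preamble, inter-frame gap). The sketch below works this out for the plotted payload sizes; the framing constants are standard Ethernet values rather than numbers from the talk, and assume no TCP options or VLAN tags.

```cpp
// Theoretical maximum goodput over 10 Gigabit Ethernet for a given TCP payload size.
// Assumed per-segment overhead: 20 B TCP + 20 B IP + 14 B Ethernet header + 4 B FCS
// + 8 B preamble/SFD + 12 B inter-frame gap = 78 B.
#include <cstdio>

int main() {
    const double lineRateGbps = 10.0;
    const int overheadBytes = 20 + 20 + 14 + 4 + 8 + 12;
    const int payloads[] = {64, 256, 512, 1024, 1460};

    for (int p : payloads) {
        double goodput = lineRateGbps * p / (p + overheadBytes);
        std::printf("payload %4d B -> max goodput %.2f Gb/s\n", p, goodput);
    }
    return 0;   // e.g. 1460 B -> ~9.49 Gb/s, 64 B -> ~4.51 Gb/s
}
```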

SLIDE 14

Results

[Latency and goodput plots as on Slides 12-13: 2-3x lower latency, high throughput]

            Mem. allocated   Mem. bandwidth
TCP org.    1,300 MB         40 Gbps
TCP opt.    650 MB           10 Gbps
Diff.       -50%             -75%

SLIDE 15

Results

[Latency and goodput plots as on Slides 12-13]

These results enabled a consistent distributed key-value store [2]

[Memory table as on Slide 14]

[2] István et al., Consensus in a Box: Inexpensive Coordination in Hardware, NSDI'16

SLIDE 16

Results

[Latency and goodput plots as on Slides 12-13]

Visit our poster for more results and details! Find the source at: http://github.com/dsidler/fpga-network-stack

[Memory table as on Slide 14]

