CompSci 514: Computer Networks
Lecture 15: Practical Datacenter Networks
Xiaowei Yang
Overview
- Wrap up DCTCP analysis
- Today
– Google’s datacenter networks
- Topology, routing, and management
– Inside Facebook’s datacenter networks
- Services and traffic patterns
The DCTCP Algorithm
Review: The TCP/ECN Control Loop
(Figure: Sender 1 and Sender 2 send to the Receiver through a switch that sets a 1-bit ECN mark on packets.)
ECN = Explicit Congestion Notification
Two Key Ideas
- 1. React in proportion to the extent of congestion, not its presence.
– Reduces variance in sending rates, lowering queuing requirements.
- 2. Mark based on instantaneous queue length.
– Fast feedback to better deal with bursts.
ECN marks: 1 0 1 1 1 1 0 1 1 1 → TCP: cut window by 50%; DCTCP: cut window by 40%
ECN marks: 0 0 0 0 0 0 0 0 0 1 → TCP: cut window by 50%; DCTCP: cut window by 5%
Small Queues & TCP Throughput:
The Buffer Sizing Story
- Bandwidth-delay product rule of thumb:
– A single flow needs a buffer of B = C × RTT (the bandwidth-delay product) for 100% throughput.
(Figure: cwnd sawtooth over time; throughput stays at 100% when the buffer size is at least B.)
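As a quick sanity check on the rule of thumb, the arithmetic below plugs in assumed numbers (a 10 Gbps link and 100 µs RTT, typical datacenter values, not figures from the slides):

```python
# Bandwidth-delay product sizing (assumed numbers: 10 Gbps link, 100 us RTT).
link_rate_bps = 10e9         # C, link capacity in bits/s (assumption)
rtt_s = 100e-6               # round-trip time in seconds (assumption)
mtu_bytes = 1500             # packet size used to express the buffer in packets

bdp_bytes = link_rate_bps * rtt_s / 8        # B = C x RTT, the rule-of-thumb buffer
bdp_packets = bdp_bytes / mtu_bytes

print(f"BDP = {bdp_bytes / 1e3:.0f} KB, about {bdp_packets:.0f} packets")
```

Even a modest datacenter link therefore needs only on the order of a hundred packets of buffer for one flow; DCTCP's contribution is keeping queues far below even this.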
Data Center TCP Algorithm
Switch side:
– Mark packets when Queue Length > K.
Sender side:
– Maintain a running average of the fraction of packets marked (α). Each RTT:
  α ← (1 − g)·α + g·F, where F is the fraction of packets marked in the last RTT
– Adaptive window decrease: W ← W·(1 − α/2)
  – Note: the decrease factor ranges between 1 (α = 0) and 2 (α = 1).
(Figure: switch buffer of size B with marking threshold K; packets arriving above K are marked, below K are not.)
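The sender-side rules above can be sketched in a few lines. This is a minimal illustration of the algorithm as described in the DCTCP paper, not the kernel implementation; the gain g = 1/16 is the paper's suggested value.

```python
# Minimal sketch of the DCTCP sender logic (illustrative, not the real stack).
G = 1 / 16  # EWMA gain g (value suggested in the DCTCP paper)

def update_alpha(alpha, frac_marked):
    """Once per RTT: alpha <- (1 - g)*alpha + g*F,
    where F is the fraction of packets ECN-marked in the last RTT."""
    return (1 - G) * alpha + G * frac_marked

def decrease_window(cwnd, alpha):
    """On receiving marks: W <- W * (1 - alpha/2).
    alpha = 0 gives no decrease; alpha = 1 recovers TCP's 50% cut."""
    return cwnd * (1 - alpha / 2)

# Persistent congestion (every packet marked) drives alpha toward 1,
# so the cut converges to the classic TCP halving.
alpha = 0.0
for _ in range(200):
    alpha = update_alpha(alpha, 1.0)
print(round(alpha, 3), decrease_window(100, alpha))
```

With sparse marks (say F = 0.1), the cut is only about 5%, which is exactly the proportional reaction the two key ideas call for.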
Analysis
- How low can DCTCP maintain queues without loss of throughput?
- How do we set the DCTCP parameters?
– Need to quantify queue-size oscillations (stability).
(Figure: window-size sawtooth over time, oscillating between (W* + 1)(1 − α/2) and W* + 1; the packets sent in the one RTT where the window exceeds W* are the ones that get marked.)
Analysis
- Q(t) = NW(t) − C × RTT
- The key observation is that with synchronized senders, the queue size exceeds the marking threshold K for exactly one RTT in each period of the sawtooth, before the sources receive ECN marks and reduce their window sizes accordingly.
- S(W1, W2) = (W2² − W1²)/2: packets sent while the window increases from W1 to W2
- Critical window size when ECN marking occurs: W* = (C × RTT + K)/N
- α = S(W*, W* + 1) / S((W* + 1)(1 − α/2), W* + 1)
- α²(1 − α/4) = (2W* + 1)/(W* + 1)² ≈ 2/W*
  – assuming W* ≫ 1
- α ≈ sqrt(2/W*)
- Single-flow oscillation amplitude:
  – D = (W* + 1) − (W* + 1)(1 − α/2) = (W* + 1)α/2
- With N synchronized flows, the queue oscillation amplitude and period are:
  – A = N·D = N(W* + 1)α/2 ≈ (N/2)·sqrt(2W*) = (1/2)·sqrt(2N(C × RTT + K))  (8)
  – T_C = D = (1/2)·sqrt(2(C × RTT + K)/N)  (in RTTs)  (9)
- Finally, using Q(t) = NW(t) − C × RTT:
  – Q_max = N(W* + 1) − C × RTT = K + N  (10)
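Plugging assumed numbers into the derivation makes the scale concrete (a 10 Gbps link with 100 µs RTT gives C × RTT ≈ 83 packets of 1500 B; K = 20 and N = 2 are illustrative choices, not values from the slides):

```python
import math

# Worked example of the steady-state analysis with assumed parameters.
C_RTT = 83          # C x RTT in packets (10 Gbps * 100 us / 1500 B, rounded)
K, N = 20, 2        # marking threshold and number of synchronized flows (assumed)

W_star = (C_RTT + K) / N                     # critical window W*
alpha = math.sqrt(2 / W_star)                # alpha ~ sqrt(2/W*)
A = 0.5 * math.sqrt(2 * N * (C_RTT + K))     # oscillation amplitude, eq. (8)
Q_max = K + N                                # eq. (10)

print(f"W* = {W_star:.1f}, alpha = {alpha:.2f}, A = {A:.1f} pkts, Qmax = {Q_max} pkts")
```

The queue tops out at just K + N = 22 packets, far below the bandwidth-delay product a TCP sender would need.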
85% Less Buffer than TCP
Q_min = Q_max − A  (11)
      = K + N − (1/2)·sqrt(2N(C × RTT + K))  (12)
Minimizing Q_min
- To guarantee Q_min > 0 (no loss of throughput) for the worst-case N, the paper derives the marking-threshold guideline K > (C × RTT)/7.
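The guideline can be checked numerically: sweeping N with K set exactly at the boundary (C × RTT)/7, Q_min from eq. (12) never goes negative, and its analytic minimum sits at N = (C × RTT + K)/8. The value C × RTT = 100 packets below is an assumption for illustration.

```python
import math

# Numerical check of the marking-threshold guideline K > (C x RTT)/7.
C_RTT = 100          # C x RTT in packets (assumed)
K = C_RTT / 7        # threshold set exactly at the boundary

def q_min(N):
    # Qmin = K + N - (1/2) sqrt(2 N (C x RTT + K))   -- eq. (12)
    return K + N - 0.5 * math.sqrt(2 * N * (C_RTT + K))

worst = min(q_min(N) for N in range(1, 200))
print(f"K = {K:.1f} pkts, worst-case Qmin over N = {worst:.4f} pkts")
# Setting d(Qmin)/dN = 0 gives N = (C x RTT + K)/8, where Qmin = K - (C x RTT + K)/8;
# that minimum is exactly zero when K = (C x RTT)/7.
```

Any K strictly larger than (C × RTT)/7 keeps the queue from draining empty, which is why DCTCP gets full throughput with roughly 85% less buffer than TCP.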
Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network
Arjun Singh, Joon Ong, Amit Agarwal, Glen Anderson, Ashby Armistead, Roy Bannon, Seb Boving, Gaurav Desai, Bob Felderman, Paulie Germano, Anand Kanagala, Jeff Provost, Jason Simmons, Eiichi Tanda, Jim Wanderer, Urs Hölzle, Stephen Stuart, and Amin Vahdat
What’s this paper about
- Experience track
- How Google's datacenter networks evolved over a decade
Key takeaways
- Customized switches built using merchant silicon
- Recursive Clos topologies to scale to a large number of servers
- Centralized control/management
- Bandwidth demands in the datacenter are doubling every 12-15 months, even faster than in the wide-area Internet
Traditional four-post cluster
- Top-of-Rack (ToR) switches, each serving 40 1G-connected servers, connected via 1G links to four Cluster Routers (CRs), each with 512 1G ports; the CRs were interconnected with 10G sidelinks.
  – Scale limit: 512 ToR ports × 40 servers ≈ 20K hosts
- When a lot of traffic leaves a rack, congestion occurs
Solutions
- Use merchant silicon to build non-blocking, high-port-density switches
- Watchtower: 16*10G silicon
Exercise
- 24*10G silicon
- 12 line cards
- 288 port non-blocking switch
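One way to check the exercise arithmetic is a folded two-stage Clos: leaf chips split their 24 ports evenly between front-panel ports and fabric uplinks, with spine chips connecting every leaf. The arrangement below (two leaf chips per line card, spine chips on fabric cards) is a hypothetical reconstruction, not a confirmed Watchtower layout.

```python
# Sanity check: a 288-port non-blocking switch from 24 x 10G merchant silicon.
k = 24                    # ports per chip

leaf_chips = 24           # each leaf: k/2 external ports + k/2 fabric uplinks
spine_chips = 12          # each spine: k ports, one link to every leaf

external_ports = leaf_chips * (k // 2)     # front-panel 10G ports
uplinks = leaf_chips * (k // 2)            # leaf-to-spine fabric links
spine_capacity = spine_chips * k           # links the spines can terminate

# Equal uplink and downlink capacity at every stage => rearrangeably non-blocking.
assert uplinks == spine_capacity
print(f"{external_ports} ports from {leaf_chips + spine_chips} chips "
      f"({leaf_chips // 2} leaf chips per line card x 12 line cards)")
```

So 12 line cards carrying 2 leaf chips each, plus 12 spine chips, yield the 288-port non-blocking switch the exercise asks for.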
Jupiter
- Dual redundant 10G links for fast failover
- Centauri as ToR
- Four Centauris made up a Middle Block (MB)
- Each ToR connects to eight MBs.
- Six Centauris in a spine plane block
- Four MBs per rack
- Two spine blocks per rack
(Figure: cabling without bundling vs. with bundling.)
Summary
- Customized switches built using merchant silicon
- Recursive Clos topologies to scale to a large number of servers
Inside the Social Network’s (Datacenter) Network
Arjun Roy, Hongyi Zeng†, Jasmeet Bagga†, George Porter, and Alex C. Snoeren
Motivation
- Measurement can help make design decisions
– Traffic pattern determines the optimal network topology
– Flow size distribution helps with traffic engineering
– Packet size helps with SDN control
Service level architecture of FB
- Servers are organized into clusters
- Clusters may not fit into one rack
Measurement methodology
Summary
- Traffic is neither rack-local nor all-to-all; locality depends upon the service but is stable across time periods from seconds to days
- Many flows are long-lived but not very heavy.
- Packets are small