SLIDE 1

CompSci 514: Computer Networks Lecture 15 Practical Datacenter Networks

Xiaowei Yang

SLIDE 2

Overview

  • Wrap up DCTCP analysis
  • Today

– Google’s datacenter networks

  • Topology, routing, and management

– Inside Facebook’s datacenter networks

  • Services and traffic patterns
SLIDE 3

The DCTCP Algorithm


SLIDE 4

Review: The TCP/ECN Control Loop

[Figure: TCP/ECN control loop with Sender 1, Sender 2, and the Receiver; marked packets carry a 1-bit ECN mark (ECN Mark, 1 bit).]

ECN = Explicit Congestion Notification
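For orientation, a minimal sketch (not from the slides) of how a standard ECN-capable TCP sender behaves in this loop: the switch sets the CE codepoint when congested, the receiver echoes it back as ECE, and the sender cuts its window at most once per RTT. Class and method names are illustrative.

```python
# Minimal sketch of the classic TCP/ECN control loop (RFC 3168 behavior),
# for contrast with DCTCP on the following slides. Names are illustrative.

class EcnTcpSender:
    def __init__(self, cwnd=10.0):
        self.cwnd = cwnd
        self.cwnd_reduced_this_rtt = False

    def on_ack(self, ece_flag: bool):
        """The receiver echoes the switch's CE mark back as the ECE flag."""
        if ece_flag and not self.cwnd_reduced_this_rtt:
            self.cwnd *= 0.5          # react to the *presence* of congestion: cut by 50%
            self.cwnd_reduced_this_rtt = True
        elif not ece_flag:
            self.cwnd += 1.0 / self.cwnd   # standard additive increase

    def on_rtt_end(self):
        self.cwnd_reduced_this_rtt = False
```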

SLIDE 5

Two Key Ideas

  • 1. React in proportion to the extent of congestion, not its presence.

✓ Reduces variance in sending rates, lowering queuing requirements.

  • 2. Mark based on instantaneous queue length.

✓ Fast feedback to better deal with bursts.


ECN Marks               TCP                  DCTCP
1 0 1 1 1 1 0 1 1 1     Cut window by 50%    Cut window by 40%
0 0 0 0 0 0 0 0 0 1     Cut window by 50%    Cut window by 5%
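The DCTCP column follows from idea 1: the window is cut by roughly half the fraction of marked packets. A tiny check, assuming α has already converged to the marked fraction in each row:

```python
# Quick check of the DCTCP column above: DCTCP cuts the window by about
# alpha/2, where alpha tracks the fraction of marked packets. Here we assume
# alpha equals the marked fraction in each row (an illustration, not a trace).

for marks in ([1, 0, 1, 1, 1, 1, 0, 1, 1, 1], [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]):
    frac_marked = sum(marks) / len(marks)
    print(f"marked = {frac_marked:.0%} -> DCTCP cut ≈ {frac_marked / 2:.0%}, TCP cut = 50%")
# marked = 80% -> DCTCP cut ≈ 40%, TCP cut = 50%
# marked = 10% -> DCTCP cut ≈ 5%,  TCP cut = 50%
```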

SLIDE 6

Small Queues & TCP Throughput:

The Buffer Sizing Story


  • Bandwidth-delay product rule of thumb:

– A single flow needs a buffer of C × RTT (one bandwidth-delay product) for 100% throughput.

[Figure: cwnd sawtooth vs. buffer size B; throughput stays at 100%.]
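As a quick worked example of the rule of thumb (B = C × RTT for a single flow), with assumed datacenter-like numbers rather than values from the slides:

```python
# Minimal worked example of the bandwidth-delay product rule of thumb.
# The link speed and RTT below are assumed, illustrative values.

C_bps = 10e9        # 10 Gbps link
rtt_s = 100e-6      # 100 microsecond RTT

buffer_bytes = C_bps * rtt_s / 8
print(f"B = C x RTT = {buffer_bytes / 1e3:.0f} KB")   # 125 KB
```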

SLIDE 7

Data Center TCP Algorithm

Switch side:

– Mark packets when Queue Length > K.

Sender side:

– Maintain a running average of the fraction of packets marked (α), updated once per RTT.
– Adaptive window decrease, scaled by α.
– Note: decrease factor is between 1 and 2.

[Figure: switch queue of size B with marking threshold K; packets above K are marked, packets below are not.]
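A minimal sketch of the two sender-side rules from the DCTCP paper, α ← (1 − g)·α + g·F and cwnd ← cwnd·(1 − α/2); the gain g = 1/16 and the bookkeeping structure are illustrative choices, not prescribed here:

```python
# Minimal sketch of the DCTCP sender-side updates described above.
#   alpha <- (1 - g) * alpha + g * F     (F = fraction of packets marked last RTT)
#   cwnd  <- cwnd * (1 - alpha / 2)      (when marks were seen)

class DctcpSender:
    def __init__(self, cwnd=10.0, g=1 / 16):
        self.cwnd = cwnd
        self.g = g            # EWMA gain (illustrative value)
        self.alpha = 0.0
        self.acked = 0
        self.marked = 0

    def on_ack(self, ecn_marked: bool):
        self.acked += 1
        if ecn_marked:
            self.marked += 1

    def on_rtt_end(self):
        F = self.marked / self.acked if self.acked else 0.0
        self.alpha = (1 - self.g) * self.alpha + self.g * F
        if self.marked:
            self.cwnd *= (1 - self.alpha / 2)   # decrease in proportion to congestion
        else:
            self.cwnd += 1                      # standard additive increase
        self.acked = self.marked = 0
```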
SLIDE 8

Analysis

  • How low can DCTCP maintain queues without loss of throughput?
  • How do we set the DCTCP parameters?


➢ Need to quantify queue size oscillations (Stability).

[Figure: window size vs. time; a sawtooth oscillating between (W* + 1)(1 − α/2) and W* + 1, crossing W*.]

SLIDE 9

Packets sent in this RTT are marked.

SLIDE 10

Analysis

  • Queue size: Q(t) = N·W(t) − C × RTT
  • The key observation: with synchronized senders, the queue size exceeds the marking threshold K for exactly one RTT in each period of the saw-tooth, before the sources receive ECN marks and reduce their window sizes accordingly.
  • S(W1, W2) = (W2² − W1²)/2, the number of packets one sender transmits while its window grows from W1 to W2.
  • Critical window size at which ECN marking starts (the queue reaches K, so N·W* − C × RTT = K): W* = (C × RTT + K)/N

SLIDE 11
  • α = S(W*, W* + 1) / S((W* + 1)(1 − α/2), W* + 1)
  • α²(1 − α/4) = (2W* + 1)/(W* + 1)² ≈ 2/W*

– Assuming W* >> 1

  • α ≈ sqrt(2/W*)
  • Single-flow oscillation amplitude

– D = (W* + 1) − (W* + 1)(1 − α/2) = (W* + 1)·α/2

  • Amplitude and period of the queue oscillation:

A = N·D = N(W* + 1)·α/2 ≈ (N/2)·√(2W*) = (1/2)·√(2N(C × RTT + K))   (8)

T_C = D = (1/2)·√(2(C × RTT + K)/N)   (in RTTs)   (9)

Finally, using Q(t) = N·W(t) − C × RTT: Q_max = N(W* + 1) − C × RTT = K + N   (10)

SLIDE 12

Analysis

  • How low can DCTCP maintain queues without loss of throughput?
  • How do we set the DCTCP parameters?


➢ Need to quantify queue size oscillations (Stability).

85% Less Buffer than TCP

Q_min = Q_max − A   (11)
      = K + N − (1/2)·√(2N(C × RTT + K))   (12)

Minimizing Qmin
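A quick numeric check of equations (8)-(12), plugging in assumed example parameters (the link speed, RTT, K, and N below are illustrative, not the paper's testbed settings):

```python
# Numeric check of the DCTCP queue analysis above, with assumed example values.
from math import sqrt

C = 10e9 / (1500 * 8)   # link capacity in packets/s (10 Gbps, 1500 B packets)
RTT = 100e-6            # 100 microseconds
K = 65                  # marking threshold, in packets
N = 10                  # synchronized flows

W_star = (C * RTT + K) / N                  # critical window when marking starts
alpha = sqrt(2 / W_star)                    # steady-state marking fraction estimate
A = 0.5 * sqrt(2 * N * (C * RTT + K))       # amplitude of queue oscillation, Eq. (8)
Q_max = K + N                               # Eq. (10)
Q_min = K + N - A                           # Eqs. (11)-(12)

print(f"W* = {W_star:.1f} pkts, alpha = {alpha:.2f}")
print(f"queue oscillates between {Q_min:.1f} and {Q_max} packets")
```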

SLIDE 13

Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network

Arjun Singh, Joon Ong, Amit Agarwal, Glen Anderson, Ashby Armistead, Roy Bannon, Seb Boving, Gaurav Desai, Bob Felderman, Paulie Germano, Anand Kanagala, Jeff Provost, Jason Simmons, Eiichi Tanda, Jim Wanderer, Urs Hölzle, Stephen Stuart, and Amin Vahdat

SLIDE 14

What’s this paper about?

  • Experience track
  • How Google’s datacenter networks evolved over a decade
SLIDE 15

Key takeaways

  • Customized switches built using merchant silicon
  • Recursive Clos to scale to a large number of servers
  • Centralized control/management
SLIDE 16
  • Bandwidth demands in the datacenter are doubling every 12-15 months, even faster than the wide-area Internet.

SLIDE 17

Traditional four-post cluster

  • Top of Rack (ToR) switches, each serving 40 1G-connected servers, were connected via 1G links to four Cluster Routers (CRs) with 512 1G ports each; the CRs were interconnected with 10G sidelinks. 512 × 40 ≈ 20K hosts.
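The slide's arithmetic, spelled out (a sketch; the per-CR 1G port count is what bounds the number of ToRs, and hence the cluster size):

```python
# Back-of-the-envelope for the four-post cluster: 512 ToR uplink ports per
# Cluster Router x 40 servers per ToR.

cr_ports_1g = 512       # 1G ports per Cluster Router
servers_per_tor = 40    # 1G-connected servers behind each ToR

max_tors = cr_ports_1g                      # each ToR uses one 1G uplink per CR
max_hosts = max_tors * servers_per_tor
print(max_hosts)                            # 20480 -> ~20K hosts per cluster
```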

SLIDE 18
  • When a lot of traffic leaves a rack, congestion occurs
SLIDE 19

SLIDE 20

Solutions

  • Use merchant silicon to build non-blocking, high port density switches
  • Watchtower: 16×10G silicon
SLIDE 21

Exercise

  • 24×10G silicon
  • 12 line cards
  • 288-port non-blocking switch (see the sketch below)
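One way to work this exercise (a sketch using the standard two-stage folded-Clos construction, not necessarily the intended layout): with k-port chips, splitting each line-card chip's ports half external and half fabric-facing gives k²/2 non-blocking ports, which is 288 for k = 24.

```python
# Sketch of a 288-port non-blocking switch built from 24x10G chips as a
# two-stage folded Clos. Chip counts follow from the port arithmetic.

k = 24                                   # ports per merchant-silicon chip
external_ports = k * k // 2              # 288 non-blocking external ports
line_cards = 12
ext_ports_per_card = external_ports // line_cards          # 24 external ports per card
chips_per_line_card = ext_ports_per_card // (k // 2)       # each chip: 12 down, 12 up -> 2 chips/card
fabric_chips = external_ports // k                          # 288 fabric links / 24 ports per spine chip = 12

print(external_ports)                    # 288
print(line_cards * chips_per_line_card)  # 24 line-card chips
print(fabric_chips)                      # 12 spine chips
```

This also gives the chip budget: 24 line-card chips plus 12 spine chips, 36 chips total.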
SLIDE 22

SLIDE 23

Jupiter

  • Dual redundant 10G links for fast failover
  • Centauri as ToR
  • Four Centauris made up a Middle Block (MB)
  • Each ToR connects to eight MBs.
  • Six Centauris in a spine plane block
SLIDE 24
  • Four MBs per rack
  • Two spine blocks per rack
SLIDE 25

[Figure: cabling without bundling vs. with bundling]

SLIDE 26
SLIDE 27

Summary

  • Customized switches built using merchant silicon
  • Recursive Clos to scale to a large number of servers

SLIDE 28

Inside the Social Network’s (Datacenter) Network

Arjun Roy, Hongyi Zeng†, Jasmeet Bagga†, George Porter, and Alex C. Snoeren

SLIDE 29

Motivation

  • Measurement can help make design decisions

– Traffic pattern determines the optimal network topology
– Flow size distribution helps with traffic engineering
– Packet size helps with SDN control

SLIDE 30

SLIDE 31

Service level architecture of FB

  • Servers are organized into clusters
  • Clusters may not fit into one rack
SLIDE 32

Measurement methodology

SLIDE 33

SLIDE 34

SLIDE 35

SLIDE 36

Summary

  • Traffic is neither rack-local nor all-to-all; locality depends upon the service but is stable across time periods from seconds to days

  • Many flows are long-lived but not very heavy.
  • Packets are small
SLIDE 37

Today

  • Wrap up DCTCP analysis
  • Today

– Google’s datacenter networks

  • Topology, routing, and management

– Inside Facebook’s datacenter networks

  • Services and traffic patterns