

SLIDE 1

Computer Networks Group Universität Paderborn

Ch 6b: Data-center networking

Holger Karl Future Internet

SLIDE 2

Outline

  • Evolution of data centres
  • Topologies
  • Networking issues
  • Case study: Jupiter rising

SS 19, v 0.9 FI - Ch 6b: Data-center networking 2

SLIDE 3

Evolution of data centres

  • Scale
  • Workloads: north-south traffic to east-west traffic
  • Data-parallel applications, map-reduce frameworks
  • Requires different optimization: bisection bandwidth
  • Latency!
  • Virtualization
  • Many virtual machines – scale
  • Moving virtual machines – reassign MAC addresses?


SLIDE 4

Evolution: Scale


SLIDE 5

Example: CERN LHC

  • 24 gigabytes/s produced
  • 30 petabytes produced per year
  • > 300 petabytes online disk storage
  • > 1 petabyte per day processed
  • > 550,000 cores
  • > 2 million jobs/day

https://home.cern/about/computing
www.computerworld.com/article/2960642/cloud-storage/cerns-data-stores-soar-to-530m-gigabytes.html

SLIDE 6

Evolution: Workloads

  • Conventional: mostly north-south traffic
    • From individual machine to gateway
    • Typical: web-server farm
  • Modern: east-west traffic
    • From server to server
    • Typical: data-parallel applications like map/reduce


SLIDE 7

Programming model – Rough idea

[Figure: Map/Shuffle/Reduce pipeline across servers. Each server processes its local data (Map), exchanges intermediate results with related servers (Shuffle), combines them (Reduce), and the final results are collected.]
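The map/shuffle/reduce flow in the figure can be sketched as a single-process Python toy (illustrative only; in a real framework each phase runs on different servers, and the shuffle step is exactly the east-west traffic discussed on the previous slide):

```python
from collections import defaultdict

def map_phase(documents):
    # Map: each "server" emits (word, 1) pairs for its local document
    return [[(w, 1) for w in doc.split()] for doc in documents]

def shuffle_phase(mapped):
    # Shuffle: group intermediate results by key -- the all-to-all
    # exchange between servers that generates east-west traffic
    groups = defaultdict(list)
    for partition in mapped:
        for key, value in partition:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine the values per key into the final result
    return {key: sum(values) for key, values in groups.items()}

docs = ["a b a", "b c"]
result = reduce_phase(shuffle_phase(map_phase(docs)))
# result == {'a': 2, 'b': 2, 'c': 1}
```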


SLIDE 8

Evolution: Virtualization

  • Virtualize machines!
  • Many more MAC addresses to handle
    • Easily hundreds of thousands of VMs
    • Scaling problem for switches
  • Give hierarchical MAC addresses? Eases routing
    • But: ARP!
  • More problematic: moving a VM from one physical machine to another
    • Must not change IP address – one L2 domain!
    • ARP? Caching?
    • Keep MAC address? Makes hierarchical MACs infeasible
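To see how hierarchical MACs ease routing, here is a PortLand-style pseudo-MAC sketch: the 48-bit address encodes a VM's location (pod, position, port, VM id), so switches can forward on address prefixes instead of flat tables. The field layout follows the PortLand paper; the helper functions are made up for illustration:

```python
def encode_pmac(pod, position, port, vmid):
    # PortLand-style pseudo-MAC: 48 bits laid out as
    # pod (16) . position (8) . port (8) . vmid (16)
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    return ':'.join(f'{(value >> s) & 0xff:02x}' for s in range(40, -8, -8))

def decode_pmac(pmac):
    # Recover the location fields from the address
    value = int(pmac.replace(':', ''), 16)
    return ((value >> 32) & 0xffff, (value >> 24) & 0xff,
            (value >> 16) & 0xff, value & 0xffff)

mac = encode_pmac(pod=3, position=1, port=2, vmid=7)
# mac == '00:03:01:02:00:07'
```

The catch mentioned above: if the VM keeps this MAC when it migrates to another pod, the encoded location is wrong, which is exactly why migration makes hierarchical MACs hard.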


SLIDE 9

Topologies in data centres

  • Basic physical setups
    • 19’’ racks
    • Often 42 units high (one unit: 1.75’’)
    • Server: 1U – 4U
    • Two processors per 1U @ 32 cores each: up to 2688 cores per rack (as of 2019); one core easily deals with 10 VMs
    • Blade enclosure: 10 U
  • Networking inside a rack: top-of-rack (ToR) switch
    • 48 ports, 1G or 10G typical
    • 2-4 uplinks, often 10G; evolution to 40G, perhaps 100G in the future
  • Some (small) number of gateways to the outside world
  • Core question: how to connect ToRs?
    • To support N/S and E/W traffic
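The per-rack numbers above follow from simple arithmetic (assuming every unit of a 42U rack is filled with 1U servers):

```python
# Back-of-the-envelope rack capacity, using the numbers on this slide
units_per_rack = 42        # standard 19'' rack
processors_per_1u = 2
cores_per_processor = 32   # as of 2019
vms_per_core = 10          # "one core easily deals with 10 VMs"

cores_per_rack = units_per_rack * processors_per_1u * cores_per_processor
vms_per_rack = cores_per_rack * vms_per_core
# cores_per_rack == 2688, vms_per_rack == 26880
```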


SLIDE 10

Topologies: requirements

  • High throughput / bisection bandwidth
  • Fault-tolerant setup: typically 2-connected
    • Means: multiple paths between any two end hosts in operation!
    • Not just a spanning tree!
    • But: loop freedom
  • VM migration support
    • One L2 domain!


SLIDE 11

Topology: Example

  • Example: Cisco standard recommendation


SLIDE 12

Clos Network

  • Idea: build an nxn crossbar switch out of smaller kxk crossbar switches
  • Nonblocking
  • Variants: 3-stage Clos, 5-stage Clos

https://upload.wikimedia.org/wikipedia/en/9/9a/Closnetwork.png
IEEE ANTS 2012 Tutorial
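Whether a 3-stage Clos network is nonblocking depends only on the number of middle-stage switches relative to the ingress port count; the classic conditions (due to Clos, 1953) can be checked directly:

```python
def clos_properties(m, n, r):
    # 3-stage Clos(m, n, r): r ingress switches (n x m each),
    # m middle switches (r x r each), r egress switches (m x n each).
    return {
        'ports': n * r,                          # emulated N x N crossbar
        'strictly_nonblocking': m >= 2 * n - 1,  # no rerouting ever needed
        'rearrangeably_nonblocking': m >= n,     # may reroute existing paths
    }

# With n = 4 inputs per ingress switch, m = 7 middle switches suffice
# for strict nonblocking; m = 4 is only rearrangeably nonblocking.
props = clos_properties(m=7, n=4, r=4)
```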
SLIDE 13

Fat-Tree Topology: Special case of Clos


IEEE ANTS 2012 Tutorial
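For the k-ary fat-tree special case (parameters per Al-Fares et al., SIGCOMM 2008), everything follows from the switch port count k; a small sketch:

```python
def fat_tree(k):
    # k-ary fat-tree built entirely from identical k-port switches,
    # a folded-Clos special case
    assert k % 2 == 0, "k must be even"
    return {
        'pods': k,
        'edge_switches': k * (k // 2),         # k/2 per pod
        'aggregation_switches': k * (k // 2),  # k/2 per pod
        'core_switches': (k // 2) ** 2,
        'hosts': k ** 3 // 4,                  # at full bisection bandwidth
    }

# e.g. commodity 48-port switches support 27,648 hosts
ft = fat_tree(48)
```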

SLIDE 14


SLIDE 15

Questions to answer

  • Which path to use?
    • To exploit the entire bisection bandwidth, without overload
  • Options
    • Central point
    • Valiant load balancing
    • Equal-cost multi-pathing (ECMP): choose path by hashing

SLIDE 16

Papers to know

  • If we had time, we would now talk about:
    • PortLand
    • VL2
    • Helios
    • Hedera


SLIDE 17

Networking issues

  • How to make sure forwarding works in a huge L2 domain?
    • With multi-pathing, so no spanning-tree solution is plausible
  • One approach: IETF TRILL (Transparent Interconnection of Lots of Links)
    • Idea: start from a plain Ethernet with bridges that do spanning tree
    • But replace a (subset of) bridges with Routing Bridges (RBridges)
    • Operating on L2
    • Still looks like a giant Ethernet domain to IP
  • Other buzzwords: bridging protocols (802.1Q), Provider Bridging (802.1ad), Provider Backbone Bridging (802.1ah), Shortest Path Bridging (IEEE 802.1aq), data-center bridging (802.1Qaz, .1Qbb, .1Qau)

SLIDE 18

TRILL operations

  • RBridges find each other using a link-state protocol
  • Do routing on these link states
  • Along the computed paths, tunnel over the RBridges
    • Needs an extra header; the first RBridge encapsulates the packet
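These steps can be sketched in Python (a toy model; real RBridges run IS-IS, and the actual TRILL header carries nicknames and more fields than shown here):

```python
import heapq

def shortest_paths(links, src):
    # Every RBridge floods its link states, so each one can run
    # Dijkstra over the resulting graph (the link-state routing step).
    dist, pq = {src: 0}, [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue
        for v, w in links.get(u, []):
            if d + w < dist.get(v, float('inf')):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def encapsulate(frame, ingress, egress, hop_count=64):
    # The first (ingress) RBridge wraps the original frame in an
    # extra header naming the egress RBridge; fields are simplified.
    return {'ingress': ingress, 'egress': egress,
            'hop_count': hop_count, 'inner_frame': frame}

links = {'RB1': [('RB2', 1)],
         'RB2': [('RB1', 1), ('RB3', 1)],
         'RB3': [('RB2', 1)]}
dist = shortest_paths(links, 'RB1')      # dist['RB3'] == 2
pkt = encapsulate({'dst': 'aa:bb:cc:dd:ee:ff', 'payload': 'hi'},
                  ingress='RB1', egress='RB3')
```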


SLIDE 19

Case study: Jupiter rising

  • Google SIGCOMM paper, 2015
    • https://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p183.pdf
    • Figures and tables taken from that paper
  • Describes the evolution of Google’s internal data-center networks
  • Starting point: ToRs connected to a ring of routers

Figure 9: A 128x10G port Watchtower chassis (top left). The internal non-blocking topology over eight linecards (bottom left). Four chassis housed in two racks cabled with fiber (right).

SLIDE 20

Jupiter rising


SLIDE 21

Jupiter rising: Challenges


Table 1: High-level summary of challenges faced and the approaches to address them (section discussed in parentheses).

  • Introducing the network to production: initially deploy as bag-on-the-side with a fail-safe big-red button (3.2)
  • High availability from cheaper components: redundancy in fabric, diversity in deployment, robust software, necessary protocols only, reliable out-of-band control plane (3.2, 3.3, 5.1)
  • High fiber count for deployment: cable bundling to optimize and expedite deployment (3.3)
  • Individual racks can leverage full uplink capacity to external clusters: introduce Cluster Border Routers to aggregate external bandwidth shared by all server racks (4.1)
  • Incremental deployment: depopulate switches and optics (3.3)
  • Routing scalability: scalable in-house IGP, centralized topology view and route control (5.2)
  • Interoperate with external vendor gear: use standard BGP between Cluster Border Routers and vendor gear (5.2.5)
  • Small on-chip buffers: congestion window bounding on servers, ECN, dynamic buffer sharing of chip buffers, QoS (6.1)
  • Routing with massive multipath: granular control over ECMP tables with proprietary IGP (5.1)
  • Operating at scale: leverage existing server installation and monitoring software; tools build and operate fabric as a whole; move beyond individual chassis-centric network view; single cluster-wide configuration (5.3)
  • Inter-cluster networking: portable software, modular hardware in other applications in the network hierarchy (4.2)

SLIDE 22

Jupiter rising: Generations


Table 2: Multiple generations of datacenter networks. (B) indicates blocking, (NB) indicates nonblocking.

| Generation    | First deployed | Merchant silicon   | ToR config           | Aggregation block | Spine block  | Fabric speed | Host speed  | Bisection BW |
|---------------|----------------|--------------------|----------------------|-------------------|--------------|--------------|-------------|--------------|
| Four-Post CRs | 2004           | vendor             | 48x1G                | -                 | -            | 10G          | 1G          | 2T           |
| Firehose 1.0  | 2005           | 8x10G, 4x10G (ToR) | 2x10G up, 24x1G down | 2x32x10G (B)      | 32x10G (NB)  | 10G          | 1G          | 10T          |
| Firehose 1.1  | 2006           | 8x10G              | 4x10G up, 48x1G down | 64x10G (B)        | 32x10G (NB)  | 10G          | 1G          | 10T          |
| Watchtower    | 2008           | 16x10G             | 4x10G up, 48x1G down | 4x128x10G (NB)    | 128x10G (NB) | 10G          | nx1G        | 82T          |
| Saturn        | 2009           | 24x10G             | 24x10G               | 4x288x10G (NB)    | 288x10G (NB) | 10G          | nx10G       | 207T         |
| Jupiter       | 2012           | 16x40G             | 16x40G               | 8x128x40G (B)     | 128x40G (NB) | 10/40G       | nx10G/nx40G | 1.3P         |

SLIDE 23

Jupiter rising: Firehose 1.0


Figure 5: Firehose 1.0 topology. Top right shows a sample 8x10G port fabric board in Firehose 1.0, which formed Stages 2, 3, or 4 of the topology.

SLIDE 24

Jupiter rising: Saturn


Figure 12: Components of a Saturn fabric. A 24x10G Pluto ToR switch and a 12-linecard 288x10G Saturn chassis (including logical topology) built from the same switch chip. Four Saturn chassis housed in two racks cabled with fiber (right).

SLIDE 25

Jupiter rising: Jupiter


Figure 13: Building blocks used in the Jupiter topology.

Figure 14: Jupiter Middle blocks housed in racks.

Figure 15: Four options to connect to the external network layer.

Figure 16: Two-stage fabrics used for inter-cluster and intra-campus connectivity.