The Computer Network behind the Social Network

SLIDE 1

The Computer Network behind the Social Network

James Hongyi Zeng

Engineering Manager, Network Infra

APNet 2019, Beijing, China

SLIDE 2

Facebook Family

2.7B people every month, 2.1B people every day (Q2 2019)

SLIDE 3
About Me

  • Joined Facebook networking in 2014
  • Supporting the Routing and UI teams
  • https://research.fb.com/category/systems-and-networking/

SLIDE 4

Facebook datacenter locations: Prineville, OR; Los Lunas, NM; Papillion, NE; Fort Worth, TX; Forest City, NC; Altoona, IA; Clonee, Ireland; Luleå, Sweden; Odense, Denmark

SLIDE 5

SLIDE 6

How Users Reach Facebook

[Diagram: users reach Facebook over the Internet through the Edge Network, then the Backbone Network, then the Datacenter Network]

SLIDE 7

Agenda

  • Edge Network
  • Backbone Network
  • Datacenter Network

SLIDE 8

Agenda

  • Edge Network
  • Backbone Network
  • Datacenter Network

SLIDE 9

Edge Network

  • Goal: deliver traffic to ISPs and ultimately to users
  • Majority of users are on mobile
  • Majority of users are on IPv6
  • IPv6 penetration is at 56% in the United States
  • https://www.facebook.com/ipv6/

SLIDE 10

Facebook’s Traffic

  • Static requests (cacheable): photos, videos, JavaScript
  • Dynamic requests (not cacheable): News Feed, likes, status updates, messaging

SLIDE 11

DNS Based Load Balancing

[Diagram: a client asking "www?" is answered by the DNS load balancer with a region-specific name such as us-east.facebook.com; each region (US-EAST, US-WEST) terminates Internet traffic on a tier of L4 load balancers, which spread connections over L7 load balancers, which in turn front the web servers]
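
A minimal sketch of the idea in Python. The region table, resolver-to-region mapping, and record names below are illustrative assumptions, not Facebook's actual data: the DNS load balancer answers a query for www.facebook.com with the record of the closest healthy region.

    REGIONS = {
        "us-east": {"healthy": True, "record": "us-east.facebook.com"},
        "us-west": {"healthy": True, "record": "us-west.facebook.com"},
    }

    # Hypothetical mapping from resolver prefix to regions, nearest first.
    NEAREST = {
        "192.0.2.0/24": ["us-east", "us-west"],
        "198.51.100.0/24": ["us-west", "us-east"],
    }

    def resolve_www(resolver_prefix: str) -> str:
        """Return the region-specific name of the closest healthy region."""
        for region in NEAREST.get(resolver_prefix, ["us-east"]):
            if REGIONS[region]["healthy"]:
                return REGIONS[region]["record"]
        raise RuntimeError("no healthy region")

    print(resolve_www("192.0.2.0/24"))  # -> us-east.facebook.com

Because proximity and health are re-evaluated on every query, a region can be drained simply by marking it unhealthy and letting DNS TTLs expire.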

SLIDE 12

POP + DC

[Diagram: a point of presence (POP) near the user terminates Internet connections with its own L4 and L7 load balancers, then carries requests over the backbone to the datacenter, where another L4/L7 tier fronts the web servers]

SLIDE 13

How about static content?

[Diagram: static objects (JavaScript, photos, video) are cached at the POP's L7 load balancers, so repeated requests are served at the edge instead of crossing the backbone to the datacenter]
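
A sketch of that decision in Python (the path prefixes and helper functions are hypothetical, purely to illustrate the split): cacheable objects are served from the POP cache and filled from the datacenter on a miss, while dynamic requests always cross the backbone.

    CACHEABLE_PREFIXES = ("/static/", "/photos/", "/videos/")

    def handle_request(path: str, pop_cache: dict) -> str:
        if path.startswith(CACHEABLE_PREFIXES):
            if path in pop_cache:
                return pop_cache[path]          # cache hit: served at the edge
            body = fetch_from_datacenter(path)  # cache miss: fill over backbone
            pop_cache[path] = body
            return body
        return proxy_to_datacenter(path)        # dynamic: always goes to a DC

    def fetch_from_datacenter(path: str) -> str:  # placeholder origin fetch
        return f"<origin:{path}>"

    def proxy_to_datacenter(path: str) -> str:    # placeholder proxied request
        return f"<dc:{path}>"

    cache = {}
    print(handle_request("/photos/p1.jpg", cache))  # miss, then cached at POP
    print(handle_request("/feed", cache))           # dynamic, proxied to DC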

SLIDE 14

Edge Network Summary

  • Software hierarchy to scale (see the sketch below)
  • DNS load balancer (to datacenter/POP)
  • Router + anycast BGP, layer 3 load balancing (to layer 4 load balancer)
  • Layer 4 load balancer (to layer 7 load balancer)
  • Layer 7 load balancer (to web server)
  • POP + DC to scale
  • Reduce RTT for initial connection setup
  • Cache content closer to users
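
One step of that hierarchy sketched in Python: the layer 4 tier hashes each flow's 5-tuple to pick a layer 7 load balancer, so packets of one connection land on the same L7LB no matter which L4LB the routers' ECMP sprays them to. This is a simplified modular hash, an assumption for illustration; Facebook's production L4 balancer uses a Maglev-style consistent-hash table that also minimizes reshuffling when the pool changes.

    import hashlib

    L7_POOL = ["l7lb-1", "l7lb-2", "l7lb-3", "l7lb-4"]

    def pick_l7(src_ip, src_port, dst_ip, dst_port, proto="tcp"):
        key = f"{src_ip}:{src_port}:{dst_ip}:{dst_port}:{proto}".encode()
        digest = hashlib.sha256(key).digest()
        return L7_POOL[int.from_bytes(digest[:8], "big") % len(L7_POOL)]

    # Every L4LB computes the same answer for the same flow:
    print(pick_l7("203.0.113.7", 51812, "157.240.0.35", 443))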

SLIDE 15

Agenda

  • Edge Network
  • Backbone Network
  • Datacenter Network

SLIDE 16

Backbones at Facebook

  • Classic Backbone (CBB)
  • Connects POPs and DCs
  • RSVP-TE, vendor software solution
  • Express Backbone (EBB)
  • Connects DCs to DCs
  • Centralized control

SLIDE 17

Three Datacenters

[Diagram: three datacenters A, B, and C connected in a triangle]

SLIDE 18

Add Planes

[Diagram: the same A-B-C triangle replicated into several parallel planes]

SLIDE 19

N-way Active-active Redundancy

[Diagram: traffic is spread across all planes, so losing any one plane removes only 1/N of the capacity]

SLIDE 20

Incremental changes and canary

[Diagram: one plane is drained and upgraded at a time, canarying changes while the remaining planes carry traffic]

SLIDE 21

A/B Testing

[Diagram: some planes run Algorithm 1 while others run Algorithm 2, so the two can be compared on live traffic]
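
A sketch of plane-scoped experimentation in Python (plane names and the assignment scheme are illustrative assumptions): because every plane is an independent copy of the topology, a candidate routing algorithm can be enabled on a subset of planes and compared against the baseline running on the rest.

    PLANES = ["plane1", "plane2", "plane3", "plane4"]

    def assign_algorithms(experiment_planes=("plane4",)):
        """Map each plane to the baseline or the experimental algorithm."""
        return {p: "algorithm-2" if p in experiment_planes else "algorithm-1"
                for p in PLANES}

    print(assign_algorithms())
    # {'plane1': 'algorithm-1', 'plane2': 'algorithm-1',
    #  'plane3': 'algorithm-1', 'plane4': 'algorithm-2'}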

SLIDE 22

Open/R

  • Routing protocol supporting EBB
  • Establishes basic reachability among routers (like OSPF, IS-IS)
  • Extensible (e.g., key-value store)
  • In-house software
  • Runs as an agent on EBB routers
  • EBB is the first production network where Open/R is the sole IGP

SLIDE 23

Typical IGP metric configuration

  Link Type            Metric
  Trans-Atlantic       100
  Trans-Pacific        150
  US-West to US-East   50

SLIDE 24

Open/R: Calculate link metric with RTT

[Diagram: two Open/R routers measure RTT = 200 ms across their link and set the link metric to 200, replacing the hand-maintained table above]
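
A small sketch of the rule (the TCP-handshake probe below is a stand-in of my own; Open/R derives RTT from its own adjacency keepalives): the metric is simply the measured round-trip time in milliseconds, so path costs track real distance with no per-link configuration.

    import socket
    import time

    def probe_rtt_ms(host: str, port: int, timeout: float = 1.0) -> float:
        """Approximate one RTT as the duration of a TCP handshake."""
        start = time.monotonic()
        with socket.create_connection((host, port), timeout=timeout):
            pass
        return (time.monotonic() - start) * 1000.0

    def link_metric(rtt_ms: float, floor: int = 1) -> int:
        """RTT = 200 ms -> metric = 200, as in the slide."""
        return max(floor, round(rtt_ms))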

SLIDE 25

Backbone Network Summary

  • Two backbones
  • CBB: connects POPs and DCs
  • EBB: inter-DC backbone
  • Plane architecture
  • Reliability, maintenance, experiments
  • Software
  • Centralized control
  • Innovative distributed routing protocols to minimize configuration

SLIDE 26

SLIDE 27

Agenda

  • Edge Network
  • Backbone Network
  • Datacenter Network

SLIDE 28

Classic Facebook Fabric

[Diagram: a Clos fabric built from 48-port switches; racks are grouped into pods (Pod 1 ... Pod Y), and each pod's fabric switches uplink into four parallel spine planes]
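
To make the wiring pattern concrete, here is a sketch in Python (counts follow the slide's illustration of four spine planes of 48 switches each; the pod count is a placeholder, not a production bill of materials). Fabric switch i of every pod connects to every spine switch in spine plane i.

    PLANES = 4             # one fabric switch per pod per spine plane
    SPINES_PER_PLANE = 48
    PODS = 6

    def fabric_links():
        """List every fabric-switch-to-spine-switch link in the fabric."""
        links = []
        for pod in range(1, PODS + 1):
            for plane in range(1, PLANES + 1):
                for spine in range(1, SPINES_PER_PLANE + 1):
                    links.append((f"pod{pod}-fsw{plane}",
                                  f"plane{plane}-ssw{spine}"))
        return links

    print(len(fabric_links()))  # 6 pods x 4 planes x 48 spines = 1152 links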

SLIDE 29

Growing Pressure

  • Expanding mega regions (5-6 buildings) = accelerated fabric-to-fabric east-west demand
  • Compute-storage and AI disaggregation requires terabit capacity per rack
  • Both require larger fabric spine capacity (by 2-4x)

[Diagram: mega regions and disaggregated services both multiplying traffic through the fabric spine]

SLIDE 30

F16 – Facebook’s new topology

  • 16-plane architecture
  • 6-16x spine capacity on day 1
  • 1.6T raw capacity per rack (16 planes x 100G per rack uplink)
  • Fewer chips* = better power & space

SLIDE 31

Mega Region

SLIDE 32

Mega Region

[Diagram: F16 fabrics within a mega region interconnected by the Fabric Aggregator]

SLIDE 33

Minipack – 128 x 100G Switch

  • Single 12.8T ASIC
  • Modular design
  • Mature optics
  • Lower power/smaller size

SLIDE 34

Fabric Aggregator

  • Disaggregated design for scale
  • Built upon smaller commodity switches

SLIDE 35

White Box Switch

Customizable switch hardware and software

  • Customized hardware
  • Pick the minimal software needed for the specific network
  • Powerful CPU to run more complex software

Components: power supply, fans, temperature sensors, x86 CPU, SSD, BMC, switch ASIC, CPLD, QSFP ports

SLIDE 36

FBOSS Overview

[Diagram: FBOSS is the switch software running on the switch hardware's ASIC, interfacing with external software services: routing protocols (BGP, ECMP), the network configurator, and the monitoring service]
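
A hedged skeleton of the agent's role, in Python for brevity (FBOSS itself is C++ and open source at https://github.com/facebook/fboss; the class and method names here are illustrative, not the real API): external services push desired state, and the agent diffs it against the switch's current state and programs only the delta into the ASIC.

    class StubAsic:
        """Stand-in for the vendor SDK that programs the switch ASIC."""
        def program_route(self, prefix, nexthop):
            print("program", prefix, "->", nexthop)
        def delete_route(self, prefix):
            print("delete", prefix)

    class FbossLikeAgent:
        def __init__(self, asic):
            self.asic = asic
            self.routes = {}                     # simplified switch state

        def apply_state(self, desired: dict):
            for prefix, nh in desired.items():
                if self.routes.get(prefix) != nh:
                    self.asic.program_route(prefix, nh)   # add/update delta
            for prefix in set(self.routes) - set(desired):
                self.asic.delete_route(prefix)            # removal delta
            self.routes = dict(desired)

    agent = FbossLikeAgent(StubAsic())
    agent.apply_state({"10.0.0.0/8": "fsw1", "10.1.0.0/16": "fsw2"})
    agent.apply_state({"10.0.0.0/8": "fsw3"})   # updates one route, deletes one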

SLIDE 37

FBOSS Design Principles

  • Switch-as-a-Server
  • Continuous integration and staged deployment
  • Integrate closely with existing software services
  • Open-source software
  • Deploy-Early-and-Iterate
  • Focus on developing and deploying a minimal set of features
  • Quickly iterate with smaller “diffs”

SLIDE 38

FBOSS Testing and Deployment

3-stage deployment via fbossdeploy (see the sketch below):

  • Continuous Canary
  • Deploys every commit continuously to 1~2 switches of each type
  • Daily Canary
  • Deploys all of a single day’s commits to 10~20 switches of each type
  • Staged Deployment
  • Final stage that pushes all commits to all switches in the DC
  • Performed once every two weeks for reliability
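
A sketch of the three stages in Python (the helper names are hypothetical and the sample sizes follow the bullets above; the real tooling is Facebook's internal fbossdeploy): each stage widens the blast radius only after the narrower stage has soaked.

    import random

    def deploy(commits, switches_by_type, stage):
        per_type = {"continuous": 2, "daily": 20, "staged": None}[stage]
        for switches in switches_by_type.values():
            if per_type is None:                  # staged: every switch
                targets = switches
            else:                                 # canaries: a small sample
                targets = random.sample(switches,
                                        min(per_type, len(switches)))
            for switch in targets:
                push(switch, commits)             # hypothetical push primitive

    def push(switch, commits):
        print(f"pushing {len(commits)} commits to {switch}")

    deploy(["abc123"], {"minipack": [f"msw{i}" for i in range(100)]},
           stage="continuous")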

SLIDE 39

Datacenter Network Summary

  • Datacenters are huge
  • Internally: Clos topology
  • Intra-region connectivity is challenging too
  • In-house hardware and software
  • Minipack, Fabric Aggregator
  • FBOSS

SLIDE 40

SLIDE 41

Summary

[Diagram: users reach Facebook over the Internet through the Edge Network, the Backbone Network, and the Datacenter Network]

SLIDE 42

Extended Reading

  • Inside the Social Network’s (Datacenter) Network, SIGCOMM 2015
  • Robotron: Top-down Network Management at Facebook Scale, SIGCOMM 2016
  • Engineering Egress with Edge Fabric: Steering Oceans of Content to the World, SIGCOMM 2017
  • FBOSS: Building Switch Software at Scale, SIGCOMM 2018

SLIDE 43