CrystalNet: Faithfully Emulating Large Production Networks



SLIDE 1

CrystalNet: Faithfully Emulating Large Production Networks

Hongqiang Harry Liu, Yibo Zhu, Nuno Lopes, Andrey Rybalchenko (Microsoft Research)
Jitu Padhye, Jiaxin Cao, Sri Tallapragada, Guohan Lu, Lihua Yuan (Microsoft Azure)

SLIDE 2

Reliability is vital to cloud customers

[Figure: a customer asks "I can trust clouds, right?" about cloud computing services]

SLIDE 3

However, cloud reliability remains elusive

SLIDE 4

Cloud downtime cost:

  • 80% reported $50k/hour or above
  • 25% reported $500k/hour or above

Cloud availability requirement:

  • 82% require 99.9% (3 nines) or above
  • 42% require 99.99% (4 nines) or above
  • 12% require 99.999% (5 nines) or above

– USA Today survey of 200 data center managers
– High availability survey over 100 companies by Information Technology and Intelligence Corp

SLIDE 5

What caused these outages?

SLIDE 6

Network is a major root cause of outages

Network outages at three major clouds (Cloud A, Cloud B, Cloud C), June 2017 – Sep 2017:

  • Aug 22nd, 2017 – Impact: Service Fabric, SQL DB, IoT Hub, HDInsight, etc.; Root cause: an incorrect network configuration change; Downtime: 2 hours
  • Sep 20th, 2017 – Impact: DynamoDB service disruption in US-East; Root cause: a network disruption; Downtime: 3.5 hours
  • Jun 8th, 2017 – Impact: asia-northeast1 region experienced a loss of network connectivity; Root cause: configuration error during network upgrades; Downtime: 1.1 hours

Availability vs. maximum downtime per year: 99.99% (four nines) allows 52.56 minutes; 99.999% (five nines) allows 5.26 minutes.

We must prevent such outages proactively!

SLIDE 7

We do test network hardware and software

[Figure: management software, switch configurations, and switches go through Unit Tests, Feature Tests, Testbeds, and Vendor Tests]

These tests say little about how they work in production:

  • Software works differently at different scales
  • Software/hardware bugs appear in corner cases
SLIDE 8

Root causes of Azure’s network incidents

  • Software bugs (36%): bugs in routers, middleboxes, and management tools
  • Configuration bugs (27%): wrong ACL policies, route leaking, route blackholes
  • Hardware failures (29%): ASIC driver failures, silent packet drops, fiber cuts, power failures
  • Human errors (6%)
  • Unidentified (2%)

Data interval: 01/2015 – 01/2017

SLIDE 9

Ideal tests: network validation on the actual production configuration + software + hardware + topology

SLIDE 10

Copying the production network is infeasible

[Figure: the production network (configuration + software + hardware per device) next to a full copy of it]

Expensive! Most of the cost is from switch hardware.

SLIDE 11

High-fidelity production environments

[Figure: the production network (configuration + software + hardware per device) next to an emulated production network (configuration + software + vHardware), and an emulated production network mixing in some real hardware]

High-fidelity production environments can reproduce software bugs, configuration bugs, and human errors: together, >69% of customer-impacting network incidents in Azure (2015-2017).

SLIDE 12

CrystalNet

A high-fidelity, cloud-scale network emulator

SLIDE 13

Overview of CrystalNet

[Architecture figure: emulated devices (B1, L1, L2, T1-T4) and servers (S1, S2) run across hosts A-D, connected by virtual links; operators run their tools in a management VM (Linux, Windows, etc.) attached through a management overlay; an orchestrator prepares, controls, and monitors the emulation from production data (topology, configs, software versions, routes), terminating external connectivity at the boundary; probing and testing traffic exercises the emulated network]

SLIDE 14

Challenges to realize CrystalNet

  • scalability to emulate large networks
  • flexibility to accommodate heterogeneous switches
  • correctness and cost efficiency of the emulation boundary

SLIDE 15

Emulation must scale out to multiple servers

1 CPU core per emulated switch × 5000 switches in the network = 5000 CPU cores

You need a cloud to emulate a cloud network!

SLIDE 16

Emulation can cross the cloud boundary

[Figure: an emulation spanning a public cloud and, across the Internet, a private cloud holding a private network, a load balancer, and special hardware]

SLIDE 17

Challenges to realize CrystalNet

  • scalability to emulate large networks
    ▪ scaling out emulations transparently on multiple hosts and clouds
  • flexibility to accommodate heterogeneous switches
  • correctness and cost efficiency of the emulation boundary

SLIDE 18

Heterogeneous switch software sandboxes

Potential switch sandboxes:

Docker container:
  • Efficient
  • Supported by all cloud providers

Virtual machine:
  • Several vendors only offer this option

Bare metal:
  • Non-virtualizable devices (e.g. middleboxes)
  • Needed for hardware integration tests
SLIDE 19

Management challenges from heterogeneity

[Figure: devices B1, L1, L2, T1-T4 and servers S1, S2 spread across hosts A-C as containers, VMs, and bare metal, each type needing its own management agent]

SLIDE 20

Management challenges from heterogeneity

[Figure: the same devices across containers, VMs, and bare metal, now without per-sandbox management agents]

SLIDE 21

Building a homogeneous network layer

[Figure: every heterogeneous switch or server X shares its network namespace with a companion PhyNet container X' (L1', T1', ..., S1', S2') on hosts A-C, and a management agent runs in the PhyNet layer]

Key idea: maintain the network with a homogeneous layer of containers (see the sketch below):

  • start a PhyNet container for each switch
  • build overlay networks among the PhyNet containers
  • manage the overlay networks within the PhyNet containers
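A minimal sketch of that idea on one host, using Linux network namespaces as stand-ins for PhyNet containers and veth pairs as virtual links; device and interface names are illustrative, and cross-host links would use overlay tunnels (e.g. VXLAN) rather than veth.

```python
# Illustrative sketch of a homogeneous PhyNet layer built from Linux
# network namespaces and veth pairs; names here are assumptions, not
# CrystalNet's actual implementation.
import subprocess

def sh(cmd):
    subprocess.run(cmd, shell=True, check=True)

def create_phynet(device):
    # One PhyNet namespace per emulated device; the heterogeneous
    # sandbox (container, VM, bare metal) later shares or bridges into
    # this namespace, so management sees a uniform layer.
    sh(f"ip netns add phynet-{device}")

def create_virtual_link(dev_a, if_a, dev_b, if_b):
    # A veth pair emulates the physical cable between two devices.
    sh(f"ip link add {if_a} type veth peer name {if_b}")
    sh(f"ip link set {if_a} netns phynet-{dev_a}")
    sh(f"ip link set {if_b} netns phynet-{dev_b}")
    sh(f"ip netns exec phynet-{dev_a} ip link set {if_a} up")
    sh(f"ip netns exec phynet-{dev_b} ip link set {if_b} up")

# Example: wire ToR T1 to leaf L1 (requires root privileges):
# create_phynet("T1"); create_phynet("L1")
# create_virtual_link("T1", "t1-eth0", "L1", "l1-eth0")
```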

SLIDE 22

Challenges to realize CrystalNet

  • scalability to emulate large networks
    ▪ scaling out emulations transparently on multiple hosts and clouds
  • flexibility to accommodate heterogeneous devices
    ▪ introducing a homogeneous PhyNet layer to open and unify the network namespaces of devices
  • correctness and cost efficiency of the emulation boundary

SLIDE 23

A transparent boundary is needed

[Figure: the emulated data center network (ToRs T1-T6, leaves L1-L6, spines S1-S2) connects through C1 and C2 to the non-emulated core network & Internet]

A transparent boundary:

  • devices inside have no sense of the existence of a boundary
  • the boundary behaves identically to the real network

We cannot extend the emulation to the whole Internet:

  • cost
  • hard to get software or policy beyond our administrative domain

SLIDE 24

CrystalNet constructs static boundaries

[Figure: the non-emulated core network & Internet is replaced by static speakers at the edge of the emulated data center network, injecting captured routing information such as a 0.0.0.0/0 default route]

Static speaker devices (sketched below):

  • terminate the topology
  • maintain connections with emulated devices
  • inject customizable initial routes into the emulation
  • no reaction to dynamics inside the emulation
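As a simplified model (not CrystalNet's actual implementation), a static speaker is just a fixed announcer that never reacts to anything it hears back:

```python
# Simplified model of a CrystalNet static speaker: it announces a
# fixed initial route set into the emulation and ignores all updates
# coming back, so the boundary never reacts to emulated dynamics.
class StaticSpeaker:
    def __init__(self, initial_routes):
        # e.g. {"0.0.0.0/0": {"as_path": [100]}}, captured from the
        # real external peer at setup time (values illustrative).
        self.initial_routes = dict(initial_routes)

    def announce(self):
        # Replayed verbatim to each emulated neighbor at session start.
        return self.initial_routes

    def on_update(self, update):
        # Deliberately a no-op: a static speaker terminates the topology
        # and never propagates routes learned from inside the emulation.
        pass
```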

Correctness?

SLIDE 25

An example of an unsafe boundary

[Figure: an announcement "Add 10.1.0.0/16" from inside the emulation would, in the real network, travel through the non-emulated devices and re-enter the emulation elsewhere; static speakers never replay it, so the emulation diverges from reality: an unsafe boundary]

SLIDE 26

A proven safe boundary

[Figure: the same "Add 10.1.0.0/16" announcement, with the boundary cut so that every non-emulated neighbor belongs to a single AS (AS100, with AS200 and AS300 further out)]

If the boundary is a single AS, announcements never return (illustrated below).

See the paper for proofs and for safe boundaries under OSPF, IS-IS, etc.
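For BGP, the safety argument rests on standard AS-path loop prevention: once everything just outside the boundary is collapsed into a single AS, any announcement leaving the emulation carries that AS on its path and is rejected wherever it might re-enter. A toy illustration (AS 100 is from the figure; the emulated AS numbers are made up):

```python
# Toy illustration of why a single-AS boundary is safe under BGP.
# AS numbers are illustrative (AS 100 from the figure; 64512/64513
# are made-up private ASes for the emulated datacenter edge).

EMULATED_AS = 64512
BOUNDARY_AS = 100

def accepts(receiving_as, as_path):
    # Standard BGP loop prevention (RFC 4271): a router rejects any
    # route whose AS_PATH already contains its own AS.
    return receiving_as not in as_path

# A prefix announced from inside the emulation leaves with AS_PATH
# [EMULATED_AS]; the boundary AS prepends itself before the route
# could ever be offered back at another boundary link, so it can
# never re-enter the emulation:
print(accepts(EMULATED_AS, [BOUNDARY_AS, EMULATED_AS]))  # False

# With a multi-AS boundary, the route can return to a *different*
# emulated AS whose number is not yet on the path, which is exactly
# the unsafe case of the previous slide:
OTHER_EMULATED_AS = 64513
print(accepts(OTHER_EMULATED_AS, [200, 100, EMULATED_AS]))  # True
```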

SLIDE 27

Small boundaries significantly reduce cost

[Figure: two safe cuts of the same topology, one emulating an individual PodSet and one emulating particular layers, each with a single-AS boundary (AS100, AS200, AS300)]

Cost savings versus emulating the entire DC: 96%~98% (see the paper for the algorithm). For the 5,000-switch network above, that is on the order of 100-200 sandboxes instead of 5,000.

SLIDE 28

Case study

SLIDE 29

Shifting to regional backbones

[Figure: datacenters DC-1 to DC-4 in US-EAST and US-WEST, each region gaining a regional backbone under the core backbone]

Good news:
  • significantly better performance for intra-region traffic once the migration is finished

Bad news:
  • it is difficult to achieve this migration without user impact

SLIDE 30

The migration is a complex operation

Common policies in Azure’s datacenters:

  1. Reachability among the servers is always on;
  2. Private IP blocks are never announced out of a datacenter;
  3. Special private IP blocks must be announced out of a datacenter;
  4. Intra-region traffic needs to go through the regional backbone;
  5. Inter-region traffic needs to go through the core backbone;
  6. All IP addresses need to be aggregated before being announced by Spines;
  7. No public IP addresses should be announced to Leaves or below.

Operators have developed software and tools, but they have no way to try them out in a realistic setting.

Examples of past Severity-1 incidents:

  • [Severity 1: Id XXX] Date: 10/19/2016; Impact: the entire region X unreachable; Root cause: human typo
  • [Severity 1: Id YYY] Date: 10/26/2016; Impact: VM crashes and service failures in region Y; Root cause: wrong operation order
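Policies like #2 above are exactly the kind of invariant an emulation run can check mechanically before the real migration. Here is a hedged sketch of such a check against prefixes observed at the emulation boundary; the data and helper name are hypothetical, not Azure's actual tooling.

```python
# Hypothetical invariant check for policy 2 ("private IP blocks are
# never announced out of a datacenter"), run against routes collected
# at the emulation boundary; data and names are illustrative.
import ipaddress

PRIVATE_BLOCKS = [ipaddress.ip_network(p) for p in
                  ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def leaked_private_prefixes(boundary_announcements):
    """Return announced prefixes that fall inside a private block."""
    leaks = []
    for prefix in boundary_announcements:
        net = ipaddress.ip_network(prefix)
        if any(net.subnet_of(block) for block in PRIVATE_BLOCKS):
            leaks.append(prefix)
    return leaks

# Example with made-up announcements observed at a border speaker:
print(leaked_private_prefixes(["8.8.8.0/24", "10.1.0.0/16"]))
# ['10.1.0.0/16']  -> policy 2 violated
```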

SLIDE 31

Zero customer impact with CrystalNet

  • Networks: the two largest datacenters and the core and regional backbone between them
  • Cost of emulation: $30/hour ($1000/hour without the safe & small boundary design)
  • Bugs found: 50+, including configuration, management script, switch software, and operation errors
  • Potential saving: 5+ outages
  • Incidents in production: 0

SLIDE 32

Performance

SLIDE 33

CrystalNet starts large-scale emulations in minutes

One of the largest DC networks in Azure: #Borders O(10), #Spines O(100), #Leaves O(1000), #ToRs O(1000), #Routes O(10M)

[Chart: emulation startup latency in minutes, from launch to "network ready" and "route ready", on 500 VMs / 2000 cores and on 1000 VMs / 4000 cores]

SLIDE 34

Future work

  • How to provide root-cause traceback
  • How to provide automatic testing
  • How to check bugs or performance issues in hardware
  • What is the theory for searching for the minimum boundary

SLIDE 35

Conclusion

  • The network is a major contributor to clouds’ outages
  • We built CrystalNet to prevent network incidents proactively:
    ▪ cloud-based scalability
    ▪ flexibility to handle heterogeneous switch sandboxes
    ▪ correctness & cost efficiency of a transparent emulation boundary
  • CrystalNet is easy to use and has become a mandatory network validation process in Azure

SLIDE 36

Azure is considering providing CrystalNet as a service. Interested? Contact us!

crystalnet-dev@microsoft.com