A Network-State Management Service
Peng Sun, Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin (Princeton & Microsoft)
Complex Infrastructure
Variety of vendors/models/time
1
Microsoft Azure
Number of          2010           2014
Data Center        A few          10s
Network Device     1,000s         10s of 1,000s
Network Capacity   10s of Tbps    Pbps
2
Traffic Engineering Load Balancing Link Corruption Mitigation Device Firmware Upgrade
3
4
Traffic Engineering Link Corruption Mitigation Firmware Upgrade Network Devices
[Figure: topology with ToRs, Agg A, Agg B, Core 1, Core 2]
6
Link-corruption-mitigation shuts down faulty Agg A
Firmware-upgrade schedules Agg B to upgrade
7
Traffic Engineering Firmware Upgrade Link Corruption Mitigation
already individually complicated
8
applications
9
Traffic Engineering Firmware Upgrade Link Corruption Mitigation
10
Application Routing Device Config Traffic Engineering Firmware upgrade
11
Monolithic, Independent, Explicitly coordinate
Simple Complex
12
safe multi-application operation
13
14
Network Devices Network State
State Variable           Value
Device Power Status      Up, down
Device Firmware          Version number
Device SDN Agent Boot    Up, down
Device Routing State     Routing rules
Link Admin Status        Up, down
Link Control Plane       BGP, OpenFlow, …
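The table above suggests a simple data model: network state as a map from (entity, variable) pairs to values. The following is a minimal sketch of that idea only, not the service's actual schema. The names `StateKey`, `NetworkState`, `read`, and the entity strings such as `device:AggA` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class StateKey:
    """Identifies one state variable of one entity (hypothetical naming)."""
    entity: str    # e.g. "device:AggA" or "link:AggA-Core1"
    variable: str  # e.g. "PowerStatus", "FirmwareVersion", "AdminStatus"


# A network state maps each (entity, variable) pair to its current value.
NetworkState = Dict[StateKey, str]

# Example values follow the table: power and admin status are up/down,
# firmware is a version number.
observed: NetworkState = {
    StateKey("device:AggA", "PowerStatus"): "up",
    StateKey("device:AggA", "FirmwareVersion"): "1.0",
    StateKey("link:AggA-Core1", "AdminStatus"): "up",
}


def read(state: NetworkState, entity: str, variable: str) -> Optional[str]:
    """Return the value of a state variable, or None if it is not recorded."""
    return state.get(StateKey(entity, variable))
```

With this shape, an application reads `read(observed, "device:AggA", "PowerStatus")` instead of issuing a device-specific query.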
15
Past Now
16
[Figure labels: Application; Network State (Read, Write); Network Devices (SNMP, OF, vendor API, …); Device Statistics; Device-specific cmds]
17
Observed State: actual state of the whole network
Target State: desired state to be updated on the whole network
[Figure: Applications, Observed State, Target State, Network Devices]
18
One More View
Proposed State: a group of entity-variable-values desired by an application
[Figure: Applications, Proposed State, Observed State, Target State]
into a safe target state
19
least 50% capacity
20
Tight: hinder application too frequently
Loose: cannot protect network operation
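To make the idea of merging proposed states into a safe target state concrete, here is a minimal sketch of an invariant check. It assumes a toy capacity measure (the fraction of devices that are up) and a first-come acceptance order; both are illustrative assumptions, as are the names `capacity_fraction` and `merge`. Only the 50% threshold comes from the slides.

```python
from typing import Dict, List, Tuple

MIN_CAPACITY_FRACTION = 0.5  # "at least 50% capacity" threshold from the slides


def capacity_fraction(state: Dict[str, str]) -> float:
    """Toy capacity measure (assumption): fraction of devices that are up."""
    if not state:
        return 1.0
    up = sum(1 for value in state.values() if value == "up")
    return up / len(state)


def merge(
    observed: Dict[str, str],
    proposals: List[Tuple[str, Dict[str, str]]],
) -> Tuple[Dict[str, str], List[str], List[str]]:
    """Apply proposals one at a time, keeping only those that preserve the invariant."""
    target = dict(observed)
    accepted: List[str] = []
    rejected: List[str] = []
    for app, changes in proposals:
        candidate = {**target, **changes}
        if capacity_fraction(candidate) >= MIN_CAPACITY_FRACTION:
            target = candidate
            accepted.append(app)
        else:
            rejected.append(app)
    return target, accepted, rejected


# The Agg A / Agg B conflict: both applications want to take a device down.
observed = {"AggA": "up", "AggB": "up"}
proposals = [
    ("link-corruption-mitigation", {"AggA": "down"}),
    ("firmware-upgrade", {"AggB": "down"}),
]
target, accepted, rejected = merge(observed, proposals)
```

In this run the first proposal leaves half the devices up and is accepted, while the second would leave none up and is rejected, so neither application needs to know about the other.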
21
Observed State Target State Proposed State
What we see from the network What we want the network to be What can be actually done
Statesman
Application Application Application
application cares
22
A B C D
23
TE writes new value
for tunneling traffic
Firmware-upgrade writes new value of firmware state of B
24
[Figure: state variables per entity. Device: PowerState, FirmwareVersion, ConfigurationState, RoutingState. Link: AdminState, ConfigurationState. Also shown: PathState.]
25
26
[Figure: Statesman components: Monitor, Checker, Updater, and a Storage Service holding Observed State, Proposed State, Target State]
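One plausible reading of the Updater's role is that it compares the observed state with the target state and acts only on the variables that differ. The sketch below illustrates that comparison under this assumption; the function name `diff` and the tuple keys are hypothetical, and translating the result into device commands is left out.

```python
from typing import Dict, Tuple

StateKey = Tuple[str, str]  # (entity, variable), hypothetical representation


def diff(observed: Dict[StateKey, str], target: Dict[StateKey, str]) -> Dict[StateKey, str]:
    """Return the target values whose observed value is missing or different."""
    return {key: value for key, value in target.items() if observed.get(key) != value}


observed = {
    ("BR1", "FirmwareVersion"): "1.0",
    ("BR1", "AdminStatus"): "up",
}
target = {
    ("BR1", "FirmwareVersion"): "2.0",
    ("BR1", "AdminStatus"): "up",
}

# Only the firmware version needs to change; the admin status already matches.
pending = diff(observed, target)
```

Running the comparison repeatedly makes the update idempotent in this sketch: once the observed state matches the target, `diff` returns an empty map and nothing is issued.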
10 months
27
28
Inter-DC TE & Firmware-upgrade
29
[Figure: inter-DC topology: DC 1 (BR 1, BR 2), DC 2 (BR 3, BR 4), DC 3 (BR 5, BR 6), DC 4 (BR 7, BR 8)]
DC = Data Center, BR = Border Router
30
Firmware-upgrade acquires lock of BR1
TE fails to acquire lock, and moves traffic away
BR1 firmware upgrade starts
BR1 firmware upgrade
TE re-acquires lock, and moves traffic back
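The BR1 sequence can be sketched as a per-entity lock table in which at most one application holds an entity at a time. This is an illustrative sketch, not the service's locking API: the class `LockTable`, its methods, and the application names are hypothetical, and releasing the lock once the upgrade finishes is an assumption.

```python
from typing import Dict


class LockTable:
    """Tracks which application currently holds the lock on each entity."""

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}

    def acquire(self, entity: str, app: str) -> bool:
        """Grant the lock if the entity is free or already held by this app."""
        holder = self._owner.setdefault(entity, app)
        return holder == app

    def release(self, entity: str, app: str) -> bool:
        """Release the lock only if this app is the current holder."""
        if self._owner.get(entity) == app:
            del self._owner[entity]
            return True
        return False


locks = LockTable()

# Firmware-upgrade takes BR1 first, so TE cannot acquire it and moves traffic away.
got_fw = locks.acquire("BR1", "firmware-upgrade")
got_te = locks.acquire("BR1", "te")

# Assumption: the lock is released when the upgrade is done, after which TE
# can acquire it and move traffic back.
locks.release("BR1", "firmware-upgrade")
got_te_again = locks.acquire("BR1", "te")
```

A failed `acquire` is the signal TE uses in this sketch to route around BR1, which matches the traffic shift in the slide sequence.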
31
Firmware-upgrade & Link-corruption-mitigation
32
[Figure: topology with ToR, Agg, and Core layers; pods labeled Pod 1, Pod 4, Pod 10; legend: link corrupting packets]
33
Upgrade proceeds at normal speed in Pods 3 and 5
Upgrade in Pod 4 is slowed down by checker due to lost capacity
progresses
requirements
34
multiple management applications
35
36
Check the paper for related work