Challenges in Distributed SDN
Duarte Nunes duarte@midokura.com @duarte_nunes
[Diagram: “MidoNet transforms this...” — bare metal servers hosting VMs over an IP fabric]
[Diagram sequence: bare metal servers hosting VMs over an IP fabric; firewalls (FW) and load balancers (LB) at the edge connect it to the Internet/WAN]
[Diagram: a MidoNet deployment — midonet nsdb 1–3 and midonet gateway 1–3 over the IP fabric, facing the Internet/WAN]
○ virtual devices become distributed
○ a packet can traverse a particular virtual device at any host in the cloud
○ distributed virtual bridges, routers, NATs, FWs, LBs, etc.
[Diagram: Gateway 1 — MidoNet Agent (Java daemon) over the OVS kernel module, with uplinks to the Internet/WAN (eth0/eth1, IP3), Quagga/bgpd, a VXLAN tunnel port, and datapath ports port1, port2, port3/veth0, veth1]
[Diagram: Compute 1 — MidoNet Agent (Java daemon) over the OVS kernel module, hosting VMs attached via tap ports (port5/tap12345), with a VXLAN tunnel port and eth0 (IP1) on the IP fabric]
○ by simulating a packet’s path through the virtual topology
○ without fetching any information off-box (~99% of the time)
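A minimal sketch of the idea (illustrative only, not MidoNet’s actual classes): the ingress agent walks the packet through the virtual devices on its path entirely in-process, and the outcome of the walk becomes a datapath flow.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Illustrative sketch: a "virtual device" is reduced to a header
// transformation; real devices also consult routing tables, filters,
// NAT rules, etc.
record Packet(int srcIp, int dstIp) {}

final class Simulator {
    // Walk the packet through each device on its virtual path, locally,
    // with no off-box lookups.
    static Packet simulate(Packet p, List<UnaryOperator<Packet>> path) {
        for (UnaryOperator<Packet> device : path) p = device.apply(p);
        return p;
    }
}
```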
○ the tunnel key encodes the egress port
○ no computation is needed at the egress
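A sketch of why the egress is cheap (the exact encoding here is assumed, not MidoNet’s actual scheme): the 24-bit VXLAN VNI can carry an index identifying the egress port, so the egress host only does a table lookup on the decapsulated key.

```java
// Illustrative encoding: pack an egress-port index into the 24-bit
// VXLAN VNI so the egress host forwards with a single index lookup
// instead of re-simulating the packet.
final class TunnelKeys {
    private static final int VNI_MASK = 0x00FF_FFFF; // VNI is 24 bits

    static int encodeEgressPort(int portIndex) {
        if ((portIndex & ~VNI_MASK) != 0)
            throw new IllegalArgumentException("port index exceeds 24 bits");
        return portIndex & VNI_MASK;
    }

    static int decodeEgressPort(int vni) {
        return vni & VNI_MASK;
    }
}
```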
○ reliable subscription to topology changes
○ a virtual bridge learns a MAC-port mapping at one host and needs to read it at other hosts
○ a virtual router emits an ARP request out of one host and receives the reply on another host
○ interested agents subscribe to tables to get updates
○ the owner of an entry manages its lifecycle
○ use ZK ephemeral nodes so entries go away if a host fails
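The ownership and subscription rules above can be modeled in-memory (the real tables live in ZooKeeper; class and method names here are illustrative): entries carry their owner, subscribers are notified of writes, and an owner failure removes its entries the way ephemeral-node cleanup does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// In-memory model of a replicated table. Real MidoNet tables are backed
// by ZooKeeper ephemeral nodes; ownerFailed() mimics session expiry.
final class ReplicatedTable<K, V> {
    private final Map<K, Map.Entry<String, V>> entries = new HashMap<>();
    private final List<BiConsumer<K, V>> subscribers = new ArrayList<>();

    void subscribe(BiConsumer<K, V> onUpdate) { subscribers.add(onUpdate); }

    // The owner writes an entry; every subscriber is notified.
    void put(String owner, K key, V value) {
        entries.put(key, Map.entry(owner, value));
        for (BiConsumer<K, V> s : subscribers) s.accept(key, value);
    }

    V get(K key) {
        Map.Entry<String, V> e = entries.get(key);
        return e == null ? null : e.getValue();
    }

    // Ephemeral semantics: a failed host's entries disappear.
    void ownerFailed(String owner) {
        entries.values().removeIf(e -> e.getKey().equals(owner));
    }
}
```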
[Diagram sequence: a virtual router’s ARP table shared via ZK — an encapsulated ARP request crosses the IP fabric; the ARP reply is handled locally at the receiving host and written to ZK; a ZK notification propagates the entry back to the requesting host; the pending packet is then encapsulated and delivered]
○ thus, agents need to share state
[Diagram sequence: a load balancer fronting VMs — a forward flow from the Internet/WAN to 180.0.1.100:80 is balanced to member 10.0.0.2 (binding 10.0.0.2:6456); the return flow may leave through a different NIC/gateway, yet must be translated back using the same state]
○ Key: 5-tuple + ingress device UUID
○ Value: N/A (the key’s presence is the state)
○ Forward state not needed
○ One flow state entry per flow
○ Key: 5-tuple + device UUID under which NAT was performed
○ Value: (IP, port) binding
○ Possibly multiple flow state entries per flow
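The two kinds of keys can be sketched as plain value types (field names are illustrative, not MidoNet’s actual classes); what matters is that equal keys computed independently on different hosts resolve to the same shared entry.

```java
import java.util.UUID;

// Illustrative flow-state key types; Java records provide the
// value-based equals/hashCode that shared-state lookups rely on.
record FiveTuple(int srcIp, int dstIp, int srcPort, int dstPort, byte proto) {}

// Connection tracking: the key's presence is the state, so no value.
record ConnTrackKey(FiveTuple tuple, UUID ingressDevice) {}

// NAT: keyed by the device that performed the translation; the value
// is the chosen (IP, port) binding. A flow NATed by several devices
// has one entry per device.
record NatKey(FiveTuple tuple, UUID natDevice) {}
record NatBinding(int ip, int port) {}
```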
[Diagram sequence: Node 1 pushes its local flow state to the interested set — Node 2, plus Nodes 3 and 4 as possible asymmetric return paths — before emitting the packet; the return flow can then arrive at a different node that already holds the state]
is there a risk of the return flow being computed without the flow state?
[Diagram sequence: source NAT port allocation — two VMs (10.0.0.2:6456 and 10.0.0.1:7182) send to dst 216.58.210.164:80 through the same NAT target (start_ip..end_ip, start_port..end_port), e.g. 180.0.1.100..180.0.1.100, ports 5000..65535; one gets 180.0.1.100:9043, and the binding for the other (180.0.1.100:?) must be chosen without colliding]
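The allocation step can be sketched as follows (class and method names assumed for illustration): pick a free port from the NAT target’s range. The hard part in the distributed setting is that the set of used bindings must be consistent across hosts, or two concurrent allocations can hand out the same binding.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative single-IP NAT target, e.g. 180.0.1.100, ports 5000..65535.
final class NatAllocator {
    private final int portStart, portEnd;
    private final Set<Integer> used = new HashSet<>();

    NatAllocator(int portStart, int portEnd) {
        this.portStart = portStart;
        this.portEnd = portEnd;
    }

    // Returns a free port from the target range, or -1 if exhausted.
    // Distributed across hosts, "used" must reflect every host's
    // allocations — exactly the shared-state problem above.
    int allocate() {
        for (int p = portStart; p <= portEnd; p++)
            if (used.add(p)) return p;
        return -1;
    }

    void release(int port) { used.remove(port); }
}
```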
[Diagram: agent architecture — one shard per CPU, each with its own flow table, flow state, and ARP broker; kernel datapath upcalls feed simulations in user space, whose output is installed back into the datapath; shards communicate through backchannels and share only the virtual topology]
○ Share-nothing model
○ Each simulation thread is responsible for a subset of the installed flows
○ Each simulation thread is responsible for a subset of the flow state
○ Each thread ARPs individually
○ Communication by message passing through “backchannels”
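A sketch of the message-passing side (structure assumed for illustration): each flow hashes to one owning thread, and other threads hand it work through that thread’s backchannel queue instead of touching its flow table or flow state directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative share-nothing dispatch: a flow is owned by exactly one
// simulation thread, chosen by hashing; cross-thread work travels
// through per-thread backchannel queues.
final class ShardedAgent {
    private final List<ConcurrentLinkedQueue<Runnable>> backchannels = new ArrayList<>();

    ShardedAgent(int threads) {
        for (int i = 0; i < threads; i++)
            backchannels.add(new ConcurrentLinkedQueue<>());
    }

    int ownerOf(int flowHash) {
        return Math.floorMod(flowHash, backchannels.size());
    }

    // Another thread posts a message for the owner instead of mutating
    // the owner's flow table or flow state directly.
    void post(int flowHash, Runnable message) {
        backchannels.get(ownerOf(flowHash)).add(message);
    }

    // Each simulation thread drains its own backchannel in its loop.
    void drain(int thread) {
        Runnable msg;
        while ((msg = backchannels.get(thread).poll()) != null) msg.run();
    }
}
```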
○ When a needed piece of the virtual topology is not yet available locally, simulations are parked until it arrives
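One way to sketch parking (using CompletableFuture as a stand-in for whatever continuation mechanism the agent actually uses): a topology lookup returns a future that is already complete on a cache hit, and otherwise completes when the subscription delivers the device, resuming the parked simulation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Illustrative topology cache with parking semantics.
final class TopologyCache {
    private final Map<String, String> devices = new HashMap<>();
    private final Map<String, CompletableFuture<String>> pending = new HashMap<>();

    // Complete immediately on a cache hit; otherwise park the caller.
    CompletableFuture<String> get(String deviceId) {
        String dev = devices.get(deviceId);
        if (dev != null) return CompletableFuture.completedFuture(dev);
        return pending.computeIfAbsent(deviceId, k -> new CompletableFuture<>());
    }

    // Called when the topology subscription delivers the device; any
    // simulation parked on it resumes here.
    void deliver(String deviceId, String device) {
        devices.put(deviceId, device);
        CompletableFuture<String> parked = pending.remove(deviceId);
        if (parked != null) parked.complete(device);
    }
}
```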