Networking approaches in a Container World
Who we are

- Flavio Castelli, Engineering Manager
- Neil Jerram, Senior Sw. Engineer
- Antoni Segura Puimedon, Principal Sw. Engineer
Disclaimer
- There are many container engines; we’ll focus on Docker
- Multiple networking solutions are available:
○ Introduce the core concepts ○ Many projects → cover only some of them
- Container orchestration engines:
○ Often coupled with networking ○ Focus on Docker Swarm and Kubernetes
- Remember: the container ecosystem moves at a fast pace; things can suddenly change
The problem
- Containers are lightweight
- Containers are great for microservices
- Microservices: multiple distributed processes communicating
- Lots of containers that need to be connected together
Single host
host networking
Containers have full access to the host’s network interfaces!

[Diagram: container-a sharing the host’s eth0, lo, ... interfaces]
host networking
Containers are able to:
- See all host interfaces
- Use all host interfaces
Containers can’t (without extra capabilities):
- Modify their IP addresses
- Modify their IP routes
- Create virtual devices
- Interact with iptables/ebtables
$ docker run --net=host -it --rm alpine /bin/sh
/ # ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: wlp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether e4:b3:18:d2:f6:ea brd ff:ff:ff:ff:ff:ff
3: enp0s31f6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether c8:5b:76:36:b6:0b brd ff:ff:ff:ff:ff:ff
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:7e:62:3d:37 brd ff:ff:ff:ff:ff:ff
/ #
Bridged networking
- Linux bridge
- Containers connected to the bridge with veth pairs
- Each container gets its own IP and kernel networking namespace
- Containers can talk to each other and to the host via IP

[Diagram: host with eth0 and docker0 (172.17.0.0/16); container-a (veth0/veth1) and container-b (veth2/veth3) attached to the bridge, with IP forwarding on the host]
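What Docker automates here can be sketched by hand with iproute2; the names below (br-demo, veth-a/veth-b, demo-ns, the subnet) are illustrative, and the commands need root:

```shell
# create a bridge and a network namespace (stand-ins for docker0 and a container)
ip link add br-demo type bridge
ip link set br-demo up
ip netns add demo-ns
# create a veth pair; one end stays on the bridge, the other moves into the namespace
ip link add veth-a type veth peer name veth-b
ip link set veth-a master br-demo
ip link set veth-a up
ip link set veth-b netns demo-ns
# give the namespaced end an IP, like a container's eth0
ip netns exec demo-ns ip addr add 172.30.0.2/24 dev veth-b
ip netns exec demo-ns ip link set veth-b up
```

From inside demo-ns, 172.30.0.2 is reachable from the host via the bridge, exactly like a bridged container.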
Bridged networking
- Outwards connectivity via IP forwarding and masquerading
- The bridge and containers use a private subnet
$ ip address show dev docker0
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:7e:62:3d:37 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever

$ sudo iptables -t nat -L POSTROUTING
Chain POSTROUTING (policy ACCEPT)
target     prot opt source         destination
MASQUERADE all  --  172.17.0.0/16  anywhere

$ docker run --net=bridge -it --rm alpine /bin/sh -c '/sbin/ip -4 address show dev eth0; ip -4 route show'
50: eth0@if51: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    inet 172.17.0.2/16 scope global eth0
       valid_lft forever preferred_lft forever
default via 172.17.0.1 dev eth0
172.17.0.0/16 dev eth0 src 172.17.0.2
Bridged networking
- Services are exposed with iptables DNAT rules
- iptables performance deteriorates as the number of rules increases
- Limited by how many host ports are free to be bound
$ docker run --net=bridge -d --name nginx -p 8000:80 nginx
$ sudo iptables -t nat -n -L
Chain PREROUTING (policy ACCEPT)
target  prot opt source     destination
DOCKER  all  --  0.0.0.0/0  0.0.0.0/0     ADDRTYPE match dst-type LOCAL

Chain OUTPUT (policy ACCEPT)
target  prot opt source     destination
DOCKER  all  --  0.0.0.0/0  !127.0.0.0/8  ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT)
target      prot opt source         destination
MASQUERADE  all  --  172.17.0.0/16  0.0.0.0/0
MASQUERADE  tcp  --  172.17.0.2     172.17.0.2  tcp dpt:80

Chain DOCKER (2 references)
target  prot opt source     destination
RETURN  all  --  0.0.0.0/0  0.0.0.0/0
DNAT    tcp  --  0.0.0.0/0  0.0.0.0/0   tcp dpt:8000 to:172.17.0.2:80
Multi host
[Diagram: host-A, host-B and host-C, each with eth0, running container-01 through container-06 attached to frontend, application and database networks]
Multi host networking scenarios
[Diagram: a single big host-A running VM-1, VM-2 and VM-3, with container-01 through container-06 attached to frontend, application and database networks]
Multi host networking scenarios
Multi host routing solutions
Routing approach
- Manages a common IP space at the container level
- Assigns a /24 subnet to each host
- Inserts routes for each host’s /24 into the routing table of every host
- Main implementations:
  ○ Calico
  ○ Flannel
  ○ Romana
  ○ Kuryr
    ■ Calico
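With the example subnets used in the diagram below, host-a’s routing table would look roughly like this (addresses and device names are illustrative):

```shell
# on host-a (local containers in 10.0.8.0/24, host-b at 172.16.0.5)
ip route show
# 10.0.8.0/24 dev docker0 proto kernel scope link src 10.0.8.1
# 10.0.9.0/24 via 172.16.0.5 dev eth0
```

Traffic for host-b’s containers is plain routed IP over eth0: no encapsulation.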
[Diagram: host-a (172.16.0.4) and host-b (172.16.0.5) on the 172.16.0.0/16 underlay; host-a’s docker0 is 10.0.8.1/24 with container-a 10.0.8.2/24 and container-b 10.0.8.3/24; host-b’s docker0 is 10.0.9.1/24 with container-c 10.0.9.2/24 and container-d 10.0.9.3/24]
Calico’s approach
- Felix: an agent per node that sets up a vRouter:
  ○ Uses the kernel’s L3 forwarding
  ○ Handles ACLs with iptables
  ○ Uses BIRD’s BGP to keep /32 or /128 routes to each container updated
  ○ Uses etcd as data store
  ○ Replies to container ARP requests with the host’s hwaddr
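On a Calico node this shows up as one host route per container, with remote routes learned over BGP; a rough sketch (addresses and interface names illustrative, "cali…" is Calico’s veth naming convention):

```shell
ip route show
# 10.0.8.2 dev cali1234567890a scope link        # local container, via its veth
# 10.0.8.3 dev cali0987654321b scope link
# 10.0.9.2 via 172.16.0.5 dev eth0 proto bird    # remote container, learned via BGP
# 10.0.9.3 via 172.16.0.5 dev eth0 proto bird
```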
[Diagram: same two-host layout as before, with a BGP vRouter on each host exchanging routes over the 172.16.0.0/16 underlay]
Flannel approach
- Flanneld agent:
  ○ Uses etcd as data store
  ○ Keeps /24 routes to hosts up to date
  ○ No ACLs/isolation
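Flanneld reads its network configuration from etcd; a minimal setup looks roughly like this (the key prefix is flannel’s default, the subnet is illustrative):

```shell
# publish the overall network config for every flanneld to pick up
etcdctl set /coreos.com/network/config '{ "Network": "10.0.0.0/16", "SubnetLen": 24 }'
# each host's flanneld then leases a /24 and records it under:
etcdctl ls /coreos.com/network/subnets
```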
Canal
- Developed by Tigera
- Announced on May 9th 2016
- Combines Flannel’s networking with Calico’s policy enforcement
Multi host overlay solutions
- Encapsulate multiple networks over the physical network:
  ○ UDP
  ○ vxlan
  ○ geneve
  ○ GRE
- Connect containers to virtual networks
- Main projects:
  ○ Docker’s native overlay
  ○ Flannel
  ○ Weave
  ○ Kuryr
    ■ OVS (OVN, Dragonflow)
    ■ MidoNet
    ■ PLUMgrid
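With Docker’s native driver the whole setup is two commands; the network name, subnet and image below are illustrative, and a Swarm (or external KV store) backed daemon is assumed:

```shell
# create a VXLAN-backed overlay network spanning the cluster
docker network create -d overlay --subnet 10.0.9.0/24 net-y
# attach a container on any host; it gets an IP from the overlay subnet
docker run -d --net=net-y --name container-c redis
```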
Overlay approach
[Diagram: host-a and host-b on the 172.16.0.0/16 underlay (172.16.0.4, 172.16.0.5); overlay net-x (10.0.8.1/24) on host-a with container-a 10.0.8.2/24 and container-b 10.0.8.3/24; overlay net-y (10.0.7.1/24) spanning both hosts with containers 10.0.7.2–10.0.7.4; container traffic between hosts travels encapsulated]
OpenStack & containers with Kuryr
- Allows you to have VMs, containers and containers-in-VMs in the same overlay
- Allows reusing VM nets for containers and vice versa
- Allows you to have separate overlay nets routed to each other
- Isolation from the host networking
- Can have Swarm and Kubernetes on the same overlay
Overlay underlay
Routing vs Overlay

Routing
- Good:
  ○ Native performance
  ○ Easy debugging
- Bad:
  ○ Requires control over the infrastructure
  ○ Hybrid cloud more complicated (requires VPN)
  ○ Can run out of addresses (mitigation: IPv6)

Overlay
- Good:
  ○ Easier inter-cloud
  ○ Easier hybrid workloads
  ○ Doesn’t require control over the infrastructure
  ○ More implementation choice
- Bad:
  ○ Lower performance (mitigations: hw acceleration and jumbo frames)
  ○ Debugging more complicated
Competing COE-Networking interaction
Container Network Model (CNM)
- Implemented by Docker’s Libnetwork
- Separated IPAM and Remote Drivers
- Docker ≥ 1.12 Swarm mode only works
with native overlay driver
- Some of the Libnetwork remote drivers:
○ OpenStack Kuryr
○ Calico
○ Weave
Container Network Interface (CNI)
- Implemented by Kubernetes, rkt, Mesos,
Cloud Foundry and Kurma
- Plugins:
○ Calico
○ Flannel
○ Weave
○ OpenStack Kuryr (unreleased)
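A CNI plugin is just an executable fed a JSON network config; a minimal config for the stock bridge plugin looks like this (file name, network name and subnet are illustrative):

```shell
# drop a network config where the runtime looks for CNI networks
cat > /etc/cni/net.d/10-mynet.conf <<'EOF'
{
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "ipam": {
    "type": "host-local",
    "subnet": "10.22.0.0/16"
  }
}
EOF
```

The runtime (Kubernetes, rkt, ...) invokes the `bridge` binary with this config to wire each container.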
More challenges
Service discovery
- Producer: A container that runs a service
- Consumer: A container that consumes a service
- Need a way for consumers to find producer endpoints
Service discovery challenges
#1 Finding the producer

[Diagram: web-01 on host-A asks “Where is redis?” about redis-01 on host-B; the setup lacks SD]

#2 Moving services

[Diagram: web-01 on host-A; redis moves from redis-01 on host-B to redis-02 on host-C]
Service discovery challenges
#3 Multiple choice

[Diagram: web-01 on host-A asks “Which redis?” among redis-01 on host-B, redis-02 on host-C and redis-03 on host-D]
Addressing service discovery
Use DNS
- Problematic for highly dynamic deployments:
○ Containers can die/be moved more often than DNS caches expire
○ Reducing the DNS TTL to compensate → more load on the server
○ Some clients ignore TTL → old entries stay cached
Note well:
- Docker < 1.11: updates /etc/hosts dynamically
- Docker ≥1.11: integrates a DNS server
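On a user-defined network, Docker’s embedded DNS resolves container names directly; a quick session (network and container names illustrative):

```shell
docker network create app-net
docker run -d --net=app-net --name redis redis
# any container on app-net can now resolve "redis" by name
docker run --net=app-net --rm alpine nslookup redis
```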
Key-value store
- Rely on a k/v store:
  ○ etcd
  ○ consul
  ○ zookeeper
- Producers register their IP and port
- The orchestration engine hands this data to the consumers
- At run time, either:
  ○ Change your application to read the data straight from the k/v store
  ○ Rely on a helper that exposes the values via an environment or configuration file
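The register/lookup cycle can be sketched with etcdctl (the key layout and values are illustrative, not a standard):

```shell
# producer registers its endpoint under a well-known key
etcdctl set /services/redis/redis-01 '172.17.0.2:6379'
# consumer (or a helper sidecar) looks endpoints up at run time
etcdctl ls /services/redis
etcdctl get /services/redis/redis-01
```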
Changes, multiple choices & ingress traffic
Orchestration engine services
- Services get a unique and stable Virtual IP address
- The VIP always points to one of the service’s containers
- Consumers are pointed at the VIP
- Offered by Kubernetes and Docker 1.12+
- Can run in parallel with DNS for legacy apps
[Diagram: web-01 on host-A talking to the redis service VIP, which points at redis-01 on host-B or redis-02 on host-C]
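In Kubernetes the VIP is a Service; the sketch below assumes a deployment named redis already exists:

```shell
# expose the redis pods behind a stable ClusterIP (the VIP)
kubectl expose deployment redis --port=6379
# consumers use the service name / VIP instead of pod IPs
kubectl get service redis
```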
Ingress traffic: routing requests to an ever-changing container topology
Kubernetes has three service modes:
- ClusterIP: VIP for internal cluster communication only (can use externalIPs)
- NodePort: like Docker 1.12+
- LoadBalancer: uses NodePort at the cluster level and one of its pluggable load balancer drivers to instantiate and update external load balancers (GCE, AWS, OpenStack)

Docker 1.12+ service approach
- Define services using the --publish/-p flag
- Services get exposed on all cluster nodes on specific port mappings: node_IP:service_port
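A quick sketch of the Swarm-mode flow (service name, image and ports illustrative):

```shell
# create a replicated service published on port 8000 of every node
docker service create --name blog --publish 8000:80 --replicas 3 nginx
# the routing mesh answers on <any_node_IP>:8000 and forwards to a task,
# even on nodes not running a replica
docker service ps blog
```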
[Diagram: an external load balancer in front of host-A, host-B and host-C; every host listens on ports 8081 (guestbook-01) and 8080 (blog-01), even when it runs only one of the containers]
Load balanced ingress traffic flow
- The load balancer picks a host
- Traffic is handled by the cluster service
- Works even when the node chosen by the LB is not running the container
Recap
Not just a matter of connecting containers:
- Service discovery
- Handling changes & multiple choices
- Handling ingress traffic
          Approach           Spec
Calico    routing            CNI, CNM
Docker    overlay            CNM
Flannel   routing, overlay   CNI, CNM
Kuryr     routing, overlay   CNI, CNM
Weave     overlay            CNI, CNM