SLIDE 1

Networking approaches in a Container World

SLIDE 2

Flavio Castelli, Engineering Manager
Neil Jerram, Senior Software Engineer
Antoni Segura Puimedon, Principal Software Engineer

Who we are

SLIDE 3

Disclaimer

  • There are many container engines; we’ll focus on Docker
  • Multiple networking solutions are available:

    ○ Introduce the core concepts
    ○ Many projects → cover only some of them

  • Container orchestration engines:

    ○ Often coupled with networking
    ○ Focus on Docker Swarm and Kubernetes

  • Remember: the container ecosystem moves at a fast pace; things can change suddenly

SLIDE 4

The problem

  • Containers are lightweight
  • Containers are great for microservices
  • Microservices: multiple distributed processes communicating
  • Lots of containers that need to be connected together
SLIDE 5

Single host

SLIDE 6

Host networking

Containers have full access to the host’s network interfaces!

[Diagram: container-a sharing the host’s interfaces (eth0, lo, ...)]

SLIDE 7

Host networking

Containers are able to:

  • See all host interfaces
  • Use all host interfaces

Containers can’t (without extra Linux capabilities):

  • Modify their IP addresses
  • Modify their IP routes
  • Create virtual devices
  • Interact with iptables/ebtables

$ docker run --net=host -it --rm alpine /bin/sh
/ # ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: wlp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether e4:b3:18:d2:f6:ea brd ff:ff:ff:ff:ff:ff
3: enp0s31f6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether c8:5b:76:36:b6:0b brd ff:ff:ff:ff:ff:ff
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:7e:62:3d:37 brd ff:ff:ff:ff:ff:ff
/ #

SLIDE 8

Bridged networking

  • Linux bridge
  • Containers connected to the bridge with veth pairs
  • Each container gets its own IP and kernel networking namespace
  • Containers can talk to each other and to the host via IP

[Diagram: container-a and container-b attached to the docker0 bridge (172.17.0.0/16) via veth pairs; traffic is forwarded out through the host’s eth0]
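As a rough sketch of the plumbing behind this (assuming the docker0 bridge already exists; ns-a is an illustrative namespace standing in for a container’s network namespace, which Docker would normally create itself):

$ sudo ip netns add ns-a                              # stand-in for a container's netns
$ sudo ip link add veth0 type veth peer name veth1    # create a veth pair
$ sudo ip link set veth0 master docker0               # attach one end to the bridge
$ sudo ip link set veth0 up
$ sudo ip link set veth1 netns ns-a                   # move the other end into the namespace
$ sudo ip netns exec ns-a ip addr add 172.17.0.2/16 dev veth1
$ sudo ip netns exec ns-a ip link set veth1 up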

SLIDE 9

Bridged networking

  • Outward connectivity via IP forwarding and masquerading
  • The bridge and containers use a private subnet

$ ip address show dev docker0
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:7e:62:3d:37 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever

$ sudo iptables -t nat -L POSTROUTING
Chain POSTROUTING (policy ACCEPT)
target     prot opt source         destination
MASQUERADE all  --  172.17.0.0/16  anywhere

$ docker run --net=bridge -it --rm alpine /bin/sh -c '/sbin/ip -4 address show dev eth0; ip -4 route show'
50: eth0@if51: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    inet 172.17.0.2/16 scope global eth0
       valid_lft forever preferred_lft forever
default via 172.17.0.1 dev eth0
172.17.0.0/16 dev eth0 src 172.17.0.2

SLIDE 10

Bridged networking

  • Services are exposed with iptables DNAT rules
  • iptables performance deteriorates as the number of rules increases
  • Limited by how many host ports are free to be bound

$ docker run --net=bridge -d --name nginx -p 8000:80 nginx
$ sudo iptables -t nat -n -L
Chain PREROUTING (policy ACCEPT)
target     prot opt source       destination
DOCKER     all  --  0.0.0.0/0    0.0.0.0/0     ADDRTYPE match dst-type LOCAL

Chain OUTPUT (policy ACCEPT)
target     prot opt source       destination
DOCKER     all  --  0.0.0.0/0    !127.0.0.0/8  ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT)
target     prot opt source         destination
MASQUERADE all  --  172.17.0.0/16  0.0.0.0/0
MASQUERADE tcp  --  172.17.0.2     172.17.0.2  tcp dpt:80

Chain DOCKER (2 references)
target     prot opt source       destination
RETURN     all  --  0.0.0.0/0    0.0.0.0/0
DNAT       tcp  --  0.0.0.0/0    0.0.0.0/0     tcp dpt:8000 to:172.17.0.2:80

SLIDE 11

Multi-host

SLIDE 12

Multi-host networking scenarios

[Diagram: container-01 through container-06 spread across host-A, host-B and host-C, attached to frontend, application and database networks through each host’s eth0]

SLIDE 13

Multi-host networking scenarios

[Diagram: the same six containers and three networks, this time running inside VM-1, VM-2 and VM-3 on a single big host-A]

SLIDE 14

Multi-host routing solutions

SLIDE 15

Routing approach

  • Manages a common IP space at the container level
  • Assigns a /24 subnet to each host
  • Inserts routes for each host’s /24 into the routing table of every host
  • Main implementations:

    ○ Calico
    ○ Flannel
    ○ Romana
    ○ Kuryr
      ■ Calico
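As an illustration (addresses taken from the diagram below, not the output of any particular tool), the route such a solution installs on host-a to reach host-b’s containers boils down to:

$ sudo ip route add 10.0.9.0/24 via 172.16.0.5 dev eth0   # host-b's /24, via host-b's underlay address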

[Diagram: host-a (172.16.0.4/16) with docker0 at 10.0.8.1/24 and containers a/b at 10.0.8.2/24 and 10.0.8.3/24; host-b (172.16.0.5/16) with docker0 at 10.0.9.1/24 and containers c/d at 10.0.9.2/24 and 10.0.9.3/24; both hosts on the 172.16.0.0/16 underlay]

SLIDE 16

Calico’s approach

  • Felix: an agent per node that sets up a vRouter:

    ○ Uses the kernel’s L3 forwarding
    ○ Handles ACLs with iptables
    ○ Uses BIRD’s BGP to keep /32 or /128 routes to each container updated
    ○ Uses etcd as data store
    ○ Replies to container ARP requests with the host’s hardware address
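Schematically (an illustrative sketch, not captured output; Calico’s veth interface names are auto-generated), the routing table Felix and BIRD maintain on host-a looks like:

$ ip route show
10.0.8.2 dev cali1a2b3c4 scope link              # /32 route to local container-a
10.0.8.3 dev cali5d6e7f8 scope link              # /32 route to local container-b
10.0.9.0/24 via 172.16.0.5 dev eth0 proto bird   # learned from host-b over BGP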

[Diagram: the same two-host layout as before (10.0.8.0/24 on host-a, 10.0.9.0/24 on host-b, 172.16.0.0/16 underlay), with a BGP vRouter on each host]

SLIDE 17

Flannel’s approach

  • Flanneld agent:

    ○ Uses etcd as data store
    ○ Keeps /24 routes to hosts up to date
    ○ No ACLs/isolation
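For instance, flanneld reads its network configuration from etcd before it starts; a minimal sketch (subnet and backend values are illustrative) using the route-based host-gw backend:

$ etcdctl set /coreos.com/network/config '{ "Network": "10.0.0.0/16", "Backend": { "Type": "host-gw" } }'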

SLIDE 18

Canal

  • Developed by Tigera
  • Announced on May 9th 2016
SLIDE 19

Multi-host overlay solutions

SLIDE 20
Overlay approach

  • Encapsulates multiple networks over the physical network:

    ○ UDP
    ○ VXLAN
    ○ Geneve
    ○ GRE

  • Connects containers to virtual networks
  • Main projects:

    ○ Docker’s native overlay
    ○ Flannel
    ○ Weave
    ○ Kuryr
      ■ OVS (OVN, Dragonflow)
      ■ MidoNet
      ■ PLUMgrid
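As a minimal sketch of what these encapsulations look like at the kernel level (the VNI 42 and the multicast group are illustrative values), a VXLAN segment can be created by hand with iproute2:

$ sudo ip link add vxlan42 type vxlan id 42 group 239.1.1.1 dev eth0 dstport 4789
$ sudo ip link set vxlan42 up
$ sudo ip addr add 10.0.8.1/24 dev vxlan42   # the overlay subnet rides on top of eth0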

[Diagram: host-a (172.16.0.4/16) and host-b (172.16.0.5/16); containers attached to virtual networks net-x (10.0.8.0/24) and net-y (10.0.7.0/24), with net-y spanning both hosts; container traffic travels between hosts encapsulated over eth0]

SLIDE 21

OpenStack & containers with Kuryr

  • Allows you to have VMs, containers, and containers-in-VMs in the same overlay
  • Allows reusing VM networks for containers and vice versa
  • Allows you to have separate overlay networks routed to each other
  • Isolation from the host networking
  • Can have Swarm and Kubernetes on the same overlay

[Diagram: container and VM overlay networks running on top of the same underlay]

SLIDE 22

Routing vs Overlay

Routing

  Good:
  • Native performance
  • Easy debugging

  Bad:
  • Requires control over the infrastructure
  • Hybrid cloud more complicated (requires VPN)
  • Can run out of addresses (mitigation: IPv6)

Overlay

  Good:
  • Easier inter-cloud
  • Easier hybrid workloads
  • Doesn’t require control over the infrastructure
  • More implementation choice

  Bad:
  • Inferior performance (mitigations: hw acceleration and jumbo frames)
  • Debugging more complicated
SLIDE 23

Competing COE-networking interaction models

Container Network Model (CNM)

  • Implemented by Docker’s libnetwork
  • Separate IPAM and remote driver interfaces
  • Docker ≥ 1.12 Swarm mode only works with the native overlay driver
  • Some of the libnetwork remote drivers:

    ○ OpenStack Kuryr
    ○ Calico
    ○ Weave
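For instance (a sketch; the network name is illustrative), the driver is selected with -d when creating a network, and containers are then attached to it:

$ docker network create -d overlay --subnet 10.0.9.0/24 my-net   # -d picks the (remote) driver
$ docker run --net=my-net -it --rm alpine /bin/sh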

Container Network Interface (CNI)

  • Implemented by Kubernetes, rkt, Mesos, Cloud Foundry and Kurma
  • Plugins:

    ○ Calico
    ○ Flannel
    ○ Weave
    ○ OpenStack Kuryr (unreleased)
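CNI plugins, by contrast, are driven by JSON network configuration files; a minimal sketch using the reference bridge plugin (file path and values are illustrative):

$ cat /etc/cni/net.d/10-mynet.conf
{
    "name": "mynet",
    "type": "bridge",
    "bridge": "cni0",
    "ipam": {
        "type": "host-local",
        "subnet": "10.22.0.0/16"
    }
}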

SLIDE 24

More challenges

SLIDE 25

Service discovery

  • Producer: A container that runs a service
  • Consumer: A container that consumes a service
  • Need a way for consumers to find producer endpoints
SLIDE 26

Service discovery challenges

[Diagram #1, Finding the producer: web-01 on host-A asks “Where is redis?” while redis-01 runs on host-B; without service discovery it has no way to know. Diagram #2, Moving services: redis moves from redis-01 on host-B to redis-02 on host-C]

SLIDE 27

Service discovery challenges

[Diagram #3, Multiple choice: web-01 on host-A asks “Which redis?” with redis-01, redis-02 and redis-03 running on hosts B, C and D]

SLIDE 28

Addressing service discovery

SLIDE 29

Use DNS

  • Problematic for highly dynamic deployments:

    ○ Containers can die or be moved more often than DNS caches expire
    ○ Reducing the DNS TTL to compensate → more load on the server
    ○ Some clients ignore the TTL → stale entries stay cached

Note well:

  • Docker < 1.11: updates /etc/hosts dynamically
  • Docker ≥ 1.11: integrates a DNS server
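A quick sketch of the embedded DNS server on a user-defined network (names are illustrative):

$ docker network create app-net
$ docker run -d --net=app-net --name redis redis
$ docker run --net=app-net --rm alpine nslookup redis   # answered by Docker's embedded DNS (127.0.0.11)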
SLIDE 30

Key-value store

  • Rely on a k/v store:

    ○ etcd
    ○ consul
    ○ zookeeper

  • Producers register their IP and port
  • The orchestration engine hands this data to the consumer
  • At run time, either:

    ○ Change your application to read the data straight from the k/v store
    ○ Rely on some helper that exposes the values via an environment or configuration file
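For example (a sketch with etcd; the key layout and TTL are illustrative, not a fixed convention):

$ etcdctl set --ttl 30 /services/redis/redis-01 '{"host": "172.16.0.4", "port": 6379}'
$ etcdctl watch --recursive /services/redis    # consumers (or a helper) watch for changes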

SLIDE 31

Changes, multiple choices & ingress traffic

SLIDE 32

Orchestration engine services

  • Services get a unique and stable virtual IP address (VIP)
  • The VIP always points to one of the service’s containers
  • Consumers are pointed at the VIP
  • Offered by Kubernetes and Docker 1.12+
  • Can run in parallel with DNS for legacy apps

[Diagram: web-01 on host-A reaches redis-01 (host-B) or redis-02 (host-C) through the redis service VIP]
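Sketches of how a stable VIP is obtained in each system (service and network names are illustrative):

$ kubectl expose deployment redis --port=6379               # Kubernetes: allocates a ClusterIP (the VIP)
$ kubectl get svc redis                                     # shows the stable virtual IP

$ docker service create --name redis --network my-overlay redis   # Docker 1.12+: the service gets a VIP automatically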

SLIDE 33


Ingress traffic: routing requests to an ever-changing container topology

Kubernetes has three service modes:

  • ClusterIP: VIP for internal cluster communication only (can use externalIP)
  • NodePort: like Docker 1.12+
  • LoadBalancer: uses NodePort at the cluster level and one of its pluggable load balancer drivers to instantiate and update external load balancers (GCE, AWS, OpenStack)

Docker 1.12+ service approach:

  • Define services using the --publish/-p flag
  • Services get exposed on all cluster nodes on specific port mappings: node_IP:service_port
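Sketches of the two approaches (names and ports are illustrative):

$ docker service create --name web --publish 8000:80 nginx   # Docker 1.12+: port 8000 answered on every node

$ kubectl expose deployment web --port=80 --type=NodePort    # Kubernetes: a node port allocated on every node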

SLIDE 34

Load-balanced ingress traffic flow

[Diagram: an external load balancer in front of hosts A, B and C; guestbook-01 and blog-01 containers listen on ports 8081 and 8080, which are published on every host]

  • The load balancer picks a host
  • Traffic is handled by the cluster service
  • Works even when the node chosen by the LB is not running the container

SLIDE 35

Recap

Not just a matter of connecting containers:

  • Service discovery
  • Handling changes & multiple

choices

  • Handling ingress traffic

Approach Spec Calico routing CNI, CNM Docker

  • verlay

CNM Flannel routing, overlay CNI, CNM Kuryr routing, overlay CNI, CNM Weave

  • verlay

CNI, CNM

SLIDE 36

Q&A