Evolution of OpenStack Networking at CERN: Nova Network, Neutron and SDN (presentation transcript)



Slide 1
Slide 2

Evolution of OpenStack Networking at CERN

Nova Network, Neutron and SDN

Belmiro Moreira @belmiromoreira belmiro.moreira@cern.ch Ricardo Rocha @ahcorporto ricardo.rocha@cern.ch

Slide 3

Fundamental Science

Founded in 1954

  • What is 96% of the universe made of?
  • Why isn’t there anti-matter in the universe?
  • What was the state of matter just after the Big Bang?

Slide 4
Slide 5
Slide 6

Slide 7

Scalability & Flexibility

Diagram: a TOP level above CELL 1, CELL 2 … CELL N (Compute, GPU Compute); cells run either Nova Network or Neutron.

Capabilities
  • CPU Pinning
  • Huge Pages
  • SMP
  • GPU
  • ...

Configuration
  • Neutron vs Nova Network
  • Allowed Projects
  • ...

See also: "Moving from CellsV1 to CellsV2 at CERN", Mon 21, 11:35

Slide 8

Diagram: a CELL containing hypervisors NODE 1 and NODE 2, each hosting virtual machines V1, V2, V3 … VN.

  • Order of ~10s of cells (currently 70), with ~200 hypervisors per cell
  • Number of virtual machines per hypervisor varies per use case

From 4 to 30 VMs per hypervisor

Slide 9

Diagram: a CELL with hypervisors NODE 1 and NODE 2 hosting virtual machines V1 … VN, on networks S513-V-IP123 137.1XX.43.0/24 (Primary Service) and S513-V-VM908 188.1XX.191.0/24 (Secondary Service).

Scalability
  • Flat but segmented network, with multiple broadcast domains
  • Segmentation done on Primary Services
  • Primary Services can have multiple Secondaries
  • No route if the Secondary is in a different Primary
  • VM IP allocation must belong to the hypervisor’s Primary
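That allocation rule is easy to state in code: an address is acceptable only if it falls inside one of the subnets of the hypervisor's Primary Service. A minimal sketch, with placeholder subnet values (the slides anonymize the real ranges) and an illustrative function name:

```python
import ipaddress

# Placeholder mapping of Primary Services to their subnets; the real data
# lives in LanDB and the CIDRs on the slides are anonymized.
PRIMARY_SUBNETS = {
    "S513-V-IP123": ["137.100.43.0/24"],
    "S513-V-VM908": ["188.100.191.0/24"],
}

def ip_valid_for_primary(ip, primary):
    """True if `ip` belongs to one of the Primary Service's subnets."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr)
               for cidr in PRIMARY_SUBNETS[primary])

print(ip_valid_for_primary("137.100.43.17", "S513-V-IP123"))   # True
print(ip_valid_for_primary("188.100.191.5", "S513-V-IP123"))   # False
```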

Slide 10

LanDB

Source of Truth
  • All devices must be present
  • Used for different purposes
    ○ Security checks
    ○ DNS/DHCP configuration
    ○ Switch/router configuration
    ○ Active Directory, ...

Diagram: Primary Services, Secondary Services, Hypervisors and Virtual Machines registered with attributes such as IPv4/IPv6, DNS, Aliases, IPv6 Readiness, Ownership, ...

Slide 11

Phase 1. Nova Network
Phase 2. Neutron
Phase 3. SDN

Slide 12

Phase 1. Nova Network

  • Custom NetworkManager
  • Late IP allocation - after scheduling to compute nodes
  • Patching done directly in the Nova code
  • Nova Network is being deprecated...

Quantum is the new thing… Neutron is the new thing...

Diagram: Nova Compute writing both to the Nova DB and to LanDB.

Slide 13

Phase 2. Neutron

  • Linuxbridge, Flat / Provider networks
  • Better integration using ML2, mechanism driver and extensions

It quickly became possible to keep it out of tree

Our extensions have a similar role to Neutron Segments

  • Gradual enrollment, cell by cell
  • Vanilla upstream packages for Neutron, much smaller patch on Nova
  • More separate components, more potential points of failure

Periodic consistency checks

Diagram: call flow between Nova Compute, Neutron and LanDB (steps 1, 2, 3, 4a, 4b).

https://gitlab.cern.ch/cloud-infrastructure/openstack-neutron-cern
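The periodic consistency checks boil down to comparing Neutron's view of ports with LanDB's device registrations. The sketch below is illustrative only (function name and data shape are made up; the real checker talks to the Neutron API and the LanDB interface):

```python
# Hypothetical consistency check between Neutron ports and LanDB devices.
# Both sides are modeled as {mac_address: ip_address} dicts.

def find_inconsistencies(neutron_ports, landb_devices):
    """Return MACs known to only one side, and MACs whose IP differs."""
    neutron_macs = set(neutron_ports)
    landb_macs = set(landb_devices)
    missing_in_landb = neutron_macs - landb_macs
    missing_in_neutron = landb_macs - neutron_macs
    ip_mismatch = {
        mac for mac in neutron_macs & landb_macs
        if neutron_ports[mac] != landb_devices[mac]
    }
    return missing_in_landb, missing_in_neutron, ip_mismatch

neutron_ports = {"fa:16:3e:00:00:01": "137.138.1.10",
                 "fa:16:3e:00:00:02": "137.138.1.11"}
landb_devices = {"fa:16:3e:00:00:01": "137.138.1.10",
                 "fa:16:3e:00:00:03": "137.138.1.12"}

print(find_inconsistencies(neutron_ports, landb_devices))
# → ({'fa:16:3e:00:00:02'}, {'fa:16:3e:00:00:03'}, set())
```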

Slide 14

Phase 2. Neutron

Subnet Cluster

Which subnets belong to this cluster?

neutron cluster-list
+-----+----------------------+------------------------+
| id  | name                 | subnets                |
+-----+----------------------+------------------------+
| ... | VMPOOL SXXXX-C-IPZZZ | ... 188.xxx.yy.zz/22   |
| ... | VMPOOL SBBBB-C-IPWWW | ... 137.aaa.bb.ccc/25  |
|     |                      | ... 137.bbb.cc.0/25    |
|     |                      | ... 137.bbb.dd.0/25    |
+-----+----------------------+------------------------+

Slide 15

Phase 2. Neutron

Host Restrictions

Which subnets can I use for this hypervisor?

neutron host p06253927y321a1
+----------------------------+--------------------------------------+
| Field                      | Value                                |
+----------------------------+--------------------------------------+
| all_subnets                | 4ca09148-32b5-4da4-95f9-35e83e2e1984 |
| available_random_subnet    | 4ca09148-32b5-4da4-95f9-35e83e2e1984 |
| available_subnets          | 4ca09148-32b5-4da4-95f9-35e83e2e1984 |
| least_available_subnet     | 4ca09148-32b5-4da4-95f9-35e83e2e1984 |
| most_available_subnet      | 4ca09148-32b5-4da4-95f9-35e83e2e1984 |
+----------------------------+--------------------------------------+

Slide 16

Phase 2. Neutron

  • Single control plane, no partitioning (as with Nova cells)

Scaling RabbitMQ was (and still is) a challenge

3 Virtual Machines → 5x 64GB Virtual Machines
~default rabbit configuration
~default neutron configuration
~looking ok(ish)

< 1000 Nodes

Slide 17

Phase 2. Neutron

  • Single control plane, no partitioning (as with Nova cells)

Scaling RabbitMQ was (and still is) a challenge

Cluster crashes once, then crashes constantly:

Cannot allocate 1318267840 bytes of memory (of type "heap").

Statistics db issues → collect_statistics_interval = 60000
Agents (too) aggressively trying to reconnect → rabbit_retry_backoff = 60
Agents not re-connecting properly → restart neutron servers
Scale up Rabbit nodes, larger VMs

1200 Nodes

Slide 18

Phase 2. Neutron

  • Single control plane, no partitioning (as with Nova cells)

Scaling RabbitMQ was (and still is) a challenge

Cluster crashes periodically

Lots of queued messages, until it goes

( neutron server )
→ rpc_thread_pool_size = 2048
→ rpc_conn_pool_size = 60
→ rpc_response_timeout = 120
→ rpc_workers = 4

( rabbit )
→ tcp_backlog: 4096
→ tcp_listen_options: {reuseaddr, true}, {keepalive, true}
→ tcp_keepalive = true
→ rabbitmq_server_erl_args = '+K +A128 +P 1048576'
→ vm_memory_high_watermark = 0.8
→ ulimits (65536 for nofile/nproc, soft and hard)
→ cluster_partition_handling = autoheal

2000 Nodes
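The rabbit-side settings above map onto a classic Erlang-terms rabbitmq.config roughly as follows. This is a sketch, not CERN's actual file; the Erlang VM args belong in RABBITMQ_SERVER_ERL_ARGS in rabbitmq-env.conf, and the ulimits come from the service/OS limits, not from this file:

```erlang
%% /etc/rabbitmq/rabbitmq.config (sketch; only the tunables discussed above)
[
  {rabbit, [
    %% emit management statistics less often (ms)
    {collect_statistics_interval, 60000},
    %% larger accept backlog plus socket options for many agent connections
    {tcp_listen_options, [{backlog,   4096},
                          {reuseaddr, true},
                          {keepalive, true}]},
    %% allow Rabbit to use more of the VM's memory before throttling
    {vm_memory_high_watermark, 0.8},
    %% auto-recover from network partitions
    {cluster_partition_handling, autoheal}
  ]}
].
```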

Slide 19

Phase 2. Neutron

  • Single control plane, no partitioning (as with Nova cells)

Scaling RabbitMQ was (and still is) a challenge

Cluster crashes less, but still happens

Lots of queued messages, until it goes

( rabbit virtual machines )
→ ip link set %k txqueuelen 10000

( neutron agent )
→ report_interval = 43200

( neutron server )
→ agent_down_time = 86500

Other considerations ( not done, not helpful )
→ increase rpc_state_report_workers
→ heartbeat timeouts on the rabbit cluster

2400 Nodes
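The agent-liveness tunables above would land in neutron.conf roughly as below (a sketch; section placement per a Mitaka-era Neutron). The point of the pairing: agent_down_time = 86500 sits just above twice report_interval (2 × 43200 = 86400), so an agent that reports only twice a day is still considered alive:

```ini
# neutron.conf on the servers (sketch)
[DEFAULT]
agent_down_time = 86500

# neutron.conf on the agents (sketch)
[agent]
report_interval = 43200
```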

Slide 20

Phase 2. Neutron

  • Single control plane, no partitioning (as with Nova cells)

Scaling RabbitMQ was (and still is) a challenge

Stable cluster → 5x 64GB Virtual Machines

Occasional network partitions
→ recovering most times, but not always
→ procedure for a quick cluster rebuild (~10 min downtime)

~5000 Nodes

Slide 21

Slide 22

Slide 23

Phase 2. Neutron

Migrating existing cells from Nova Network

  • Puppet for reconfiguration
  • Custom command for the live VM changes

$ openstack network cluster migrate --dry-run --host p06146676a327ab
$ openstack network cluster migrate --host p06146676a327ab
$ openstack network cluster migrate --cluster 'VMPOOL SXXXX-C-IPZZZ'

https://gitlab.cern.ch/cloud-infrastructure/python-neutronclient-cern

commands.extend([
    "brctl delif %s %s" % (NOVA_BRIDGE, raw_device),
    "ip link set %s down" % NOVA_BRIDGE,
    "ip link set %s name %s" % (NOVA_BRIDGE, CERN_NETWORK_BRIDGE),
    "brctl addif %s %s" % (CERN_NETWORK_BRIDGE, raw_device),
    "ip link set %s up" % CERN_NETWORK_BRIDGE,
    "ip route add default via %s dev %s" % (gw, CERN_NETWORK_BRIDGE),
])

for instance in instances:
    ip = instance.addresses['CERN_NETWORK'][0]
    mac = ip['OS-EXT-IPS-MAC:mac_addr']
    nova_tap = nova_interfaces[mac]
    neutron_tap = neutron_interfaces[mac]
    commands.extend([
        "brctl delif %s %s" % (NOVA_BRIDGE, nova_tap),
        "ip link set %s name %s" % (nova_tap, neutron_tap),
    ])

Slide 24

Phase 3. SDN

  • Current network deployment has significant limitations
  • Limited IP Mobility
    ○ Segmented broadcast domains
    ○ Live migration limited to a single cluster
    ○ Ad-hoc tunnels for hardware retirement campaigns
  • Hardware Repurposing
    ○ Multiple network domains (General, Services, …)
    ○ Services dedicated to a single domain
  • No Floating IPs
  • No Tenant/Private Networks
Slide 25

Phase 3. SDN

  • Small prototype setups to evaluate functionality

                            Neutron/OpenVSwitch    OpenDaylight           OVN
  DHCP                      Neutron                Neutron/Built-in       Built-in
  Floating IPs              Yes                    Yes                    Yes
  Distributed Routing       Only with DVR          Yes                    Yes
  Tunneling Protocols       vxlan / GRE / geneve   vxlan / GRE / geneve   vxlan / geneve
  Security Groups           IPTables               OpenFlow Native        OpenFlow Native + Logging
  Load Balancing            Octavia                Octavia                Octavia / OVN Native
  Acceleration              Limited DPDK           DPDK                   DPDK
  Tracing                   tcpdump                tcpdump                ovn-trace
  Physical Switch Integr.   L2 / L3                L2 / L3                L2 / L3

Slide 26

Phase 3. SDN

  • In the end we picked OpenContrail / Tungsten
Slide 27

Phase 3. SDN

  • In the end we picked OpenContrail / Tungsten

Architecture diagram: the Controller speaks XMPP to the vRouter on each hypervisor, BGP to the WAN gateway, and NETCONF/EVPN plus OVSDB to physical devices; OpenStack drives the Controller; the data plane uses MPLSoUDP/GRE between vRouters and VXLAN towards physical equipment.

Slide 28

Phase 3. SDN

  • In the end we picked OpenContrail / Tungsten

(Same architecture diagram; the Controller side also runs Cassandra, Config, Analytics, ...)

https://github.com/Juniper/contrail-helm-deployer

Slide 29

Phase 3. SDN

  • In the end we picked OpenContrail / Tungsten

(Same architecture diagram)

  • Neutron ML2 vs Monolithic plugin
  • Separate Region

Slide 30

Summary

  • Scaling Neutron was not trivial, mostly due to the agents / RabbitMQ
    ○ Deployed in production and stable
  • Currently finalizing the migration from Nova Network to Neutron
  • Evaluated different SDN solutions
  • Ongoing work deploying Tungsten in a new Region
  • Looking forward to offering Floating IPs, Private Networks and much more

Slide 31

Questions?

Belmiro Moreira

belmiro.moreira@cern.ch @belmiromoreira

Ricardo Rocha

ricardo.rocha@cern.ch @ahcorporto