Evolution of OpenStack Networking at CERN
Nova Network, Neutron and SDN
Belmiro Moreira @belmiromoreira belmiro.moreira@cern.ch
Ricardo Rocha @ahcorporto ricardo.rocha@cern.ch
Fundamental Science
Founded in 1954

What is 96% of the universe made of?
Why isn’t there anti-matter in the universe?
What was the state of matter just after the Big Bang?
[Diagram: a top-level cell routing to child cells CELL 1 … CELL N (Compute, GPU Compute), running Nova Network or Neutron]

- Capabilities: CPU Pinning, Huge Pages, SMP, GPU, ...
- Configuration: Neutron vs Nova Network, Allowed Projects, ...
- Scalability & Flexibility

(See also: Moving from CellsV1 to CellsV2 at CERN, Mon 21 11:35)
[Diagram: a cell containing hypervisors NODE 1, NODE 2, each running virtual machines V1, V2, V3 … VN]

- Order of ~10s of cells (currently 70), with ~200 hypervisors per cell
- Number of virtual machines per hypervisor varies per use case
  ○ From 4 to 30 VMs per hypervisor
[Diagram: hypervisors NODE 1, NODE 2 and their virtual machines V1 … VN, attached to network services such as S513-V-IP123 137.1XX.43.0/24 (Primary Service) and S513-V-VM908 188.1XX.191.0/24 (Secondary Service)]

- Flat but segmented network, with multiple broadcast domains
  ○ Scalability
  ○ Segmentation done on Primary Services
- Primary Services can have multiple Secondaries
- No route if a Secondary is in a different Primary
  ○ VM IP allocation must belong to the hypervisor’s Primary (see the sketch below)
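A minimal, self-contained sketch of that placement rule, using only the Python standard library; the service names and address ranges are illustrative (the slides mask the real ones):

import ipaddress

# Primary Service -> its subnets; Secondary Services hang off a Primary.
PRIMARIES = {
    "S513-V-IP123": [ipaddress.ip_network("137.100.43.0/24")],
}
SECONDARIES = {
    "S513-V-VM908": ("S513-V-IP123", [ipaddress.ip_network("188.100.191.0/24")]),
}

def valid_vm_ip(hypervisor_primary, vm_ip):
    """True if vm_ip falls inside the hypervisor's Primary or one of its
    Secondaries; any other address would be unroutable on that node."""
    ip = ipaddress.ip_address(vm_ip)
    subnets = list(PRIMARIES[hypervisor_primary])
    for parent, nets in SECONDARIES.values():
        if parent == hypervisor_primary:
            subnets.extend(nets)
    return any(ip in net for net in subnets)

print(valid_vm_ip("S513-V-IP123", "188.100.191.17"))  # True: Secondary of this Primary
print(valid_vm_ip("S513-V-IP123", "192.168.0.5"))     # False: different Primary, no route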
LanDB
Source of Truth

- All devices must be present
- Used for different purposes
  ○ Security checks
  ○ DNS/DHCP configuration
  ○ Switch/router configuration
  ○ Active Directory, …

[Diagram: LanDB entities (Primary Services, Secondary Services, Hypervisors, Virtual Machines) with attributes such as IPv4, IPv6, DNS, Aliases, IPv6 Readiness, Ownership, ...; modelled in the sketch below]
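To make those relationships concrete, a tiny illustrative model (field names are ours, not LanDB's actual schema):

from dataclasses import dataclass, field
from typing import List

@dataclass
class Service:
    name: str
    subnets: List[str]  # IPv4/IPv6 ranges
    secondaries: List["Service"] = field(default_factory=list)

@dataclass
class Device:
    # A hypervisor or a VM: "all devices must be present" in LanDB.
    name: str
    ip: str
    owner: str  # Ownership, used e.g. for security checks
    aliases: List[str] = field(default_factory=list)

primary = Service("S513-V-IP123", ["137.100.43.0/24"])
primary.secondaries.append(Service("S513-V-VM908", ["188.100.191.0/24"]))
hypervisor = Device("p06253927y321a1", "137.100.43.10", owner="cloud-team")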
Phase 1. Nova Network
Phase 2. Neutron
Phase 3. SDN
Phase 1. Nova Network
- Custom NetworkManager (see the sketch below)
- Late IP allocation: after scheduling to compute nodes
- Patching done directly in the Nova code
- Nova Network is being deprecated...
  ○ Quantum is the new thing… Neutron is the new thing...

[Diagram: nova-compute interacting with the Nova DB and with LanDB]
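A self-contained sketch of the late-allocation idea, with LanDB stubbed out (all names here are hypothetical, not the actual patch): the address can only be chosen after the scheduler has picked a host, because it must come from that hypervisor's Primary.

import ipaddress

class FakeLanDB:
    # Stand-in for LanDB: knows each hypervisor's Primary subnet and
    # hands out free addresses from it.
    def __init__(self, primaries):
        self.primaries = primaries
        self.used = set()

    def allocate_ip(self, hypervisor, device):
        subnet = self.primaries[hypervisor]
        for ip in subnet.hosts():
            if ip not in self.used:
                self.used.add(ip)
                return ip  # the real LanDB also drives DNS/DHCP for the device
        raise RuntimeError("subnet exhausted")

def spawn(landb, instance_id, host):
    # Late IP allocation: only after scheduling do we know the host,
    # and therefore which subnet the address must come from.
    ip = landb.allocate_ip(host, instance_id)
    print("%s on %s -> %s" % (instance_id, host, ip))

landb = FakeLanDB({"hv-01": ipaddress.ip_network("137.100.43.0/24")})
spawn(landb, "vm-001", "hv-01")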
Phase 2. Neutron
- Linuxbridge, Flat / Provider networks
- Better integration using ML2, mechanism driver and extensions (a sketch follows below)
  ○ Quickly became possible to have it out of tree
  ○ Our extensions have a similar role to Neutron Segments
- Gradual enrollment, cell by cell
- Vanilla upstream packages for Neutron, much smaller patch on Nova
- More split pieces, potential points of failure
  ○ Periodic consistency checks

[Diagram: nova-compute, Neutron and LanDB interactions, with numbered steps 1, 2, 3, 4a, 4b]
https://gitlab.cern.ch/cloud-infrastructure/openstack-neutron-cern
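As a rough illustration of what an out-of-tree ML2 mechanism driver looks like, here is a minimal sketch against the neutron_lib API; the LanDB client is a hypothetical stand-in, and this is not the actual CERN driver (see the repository above for that):

from neutron_lib.plugins.ml2 import api

class LanDBMechanismDriver(api.MechanismDriver):
    # Mirrors Neutron port operations into an external source of truth.

    def initialize(self):
        self.landb = get_landb_client()  # hypothetical LanDB API wrapper

    def create_port_postcommit(self, context):
        # context.current is the port dict: register its MAC/IPs in LanDB
        # so DNS/DHCP and security checks see the new VM.
        port = context.current
        self.landb.register_device(
            name=port['name'],
            mac=port['mac_address'],
            ips=[f['ip_address'] for f in port['fixed_ips']])

    def delete_port_postcommit(self, context):
        self.landb.remove_device(context.current['name'])

Such a driver is enabled through the mechanism_drivers option in ml2_conf.ini, which is what makes out-of-tree packaging practical.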
Phase 2. Neutron
Subnet Cluster: which subnets belong to this cluster?

neutron cluster-list
+-----+----------------------+-----------------------+
| id  | name                 | subnets               |
+-----+----------------------+-----------------------+
| ... | VMPOOL SXXXX-C-IPZZZ | ... 188.xxx.yy.zz/22  |
| ... | VMPOOL SBBBB-C-IPWWW | ... 137.aaa.bb.ccc/25 |
|     |                      | ... 137.bbb.cc.0/25   |
|     |                      | ... 137.bbb.dd.0/25   |
+-----+----------------------+-----------------------+
Phase 2. Neutron
Host Restrictions: which subnets can I use for this hypervisor? (selection strategies sketched below)

neutron host p06253927y321a1
+-------------------------+--------------------------------------+
| Field                   | Value                                |
+-------------------------+--------------------------------------+
| all_subnets             | 4ca09148-32b5-4da4-95f9-35e83e2e1984 |
| available_random_subnet | 4ca09148-32b5-4da4-95f9-35e83e2e1984 |
| available_subnets       | 4ca09148-32b5-4da4-95f9-35e83e2e1984 |
| least_available_subnet  | 4ca09148-32b5-4da4-95f9-35e83e2e1984 |
| most_available_subnet   | 4ca09148-32b5-4da4-95f9-35e83e2e1984 |
+-------------------------+--------------------------------------+
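A self-contained sketch of what those selection strategies compute, given free-address counts per candidate subnet (illustrative data, not the extension's actual code):

import random

free_ips = {  # subnet id -> free addresses among this hypervisor's candidates
    "4ca09148-32b5-4da4-95f9-35e83e2e1984": 112,
    "00000000-0000-0000-0000-000000000002": 7,
}

available = {s: n for s, n in free_ips.items() if n > 0}  # available_subnets
random_subnet = random.choice(sorted(available))          # available_random_subnet
least = min(available, key=available.get)                 # least_available_subnet
most = max(available, key=available.get)                  # most_available_subnet
print(random_subnet, least, most)

In the output above every field shows the same UUID simply because this hypervisor has a single candidate subnet.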
Phase 2. Neutron
- Single control plane, no partitioning (as with Nova cells)
Scaling RabbitMQ was (and is) a challenge

3 Virtual Machines → 5x 64GB Virtual Machines

~default rabbit configuration
~default neutron configuration
~looking ok(ish)

< 1000 Nodes
Phase 2. Neutron
Cluster crashes once, crashes constantly
Cannot allocate 1318267840 bytes of memory (of type "heap").
Statistics db issues → collect_statistics_interval = 60000
Agents (too) aggressively trying to reconnect → rabbit_retry_backoff = 60 (pattern sketched below)
Agents not re-connecting properly → restart neutron servers
Scale up Rabbit nodes, larger VMs
1200 Nodes
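For intuition on the rabbit_retry_backoff knob: spacing out reconnection attempts keeps thousands of agents from stampeding a recovering broker. A generic sketch of the pattern (not oslo.messaging's actual implementation):

import random
import time

def connect_with_backoff(connect, interval=1.0, backoff=60.0):
    # Keep retrying `connect`, growing the sleep between attempts up to
    # `backoff` seconds and adding jitter, so agents do not reconnect in
    # lock-step after a broker restart.
    wait = interval
    while True:
        try:
            return connect()
        except ConnectionError:
            time.sleep(wait + random.random())
            wait = min(wait * 2, backoff)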
Phase 2. Neutron
Cluster crashes periodically
Lots of queued messages, until it goes

(neutron server)
→ rpc_thread_pool_size = 2048
→ rpc_conn_pool_size = 60
→ rpc_response_timeout = 120
→ rpc_workers = 4

(rabbit)
→ tcp_backlog: 4096
→ tcp_listen_options {reuseaddr, true}, {keepalive, true}
→ tcp_keepalive = true
→ rabbitmq_server_erl_args = '+K +A128 +P 1048576'
→ vm_memory_high_watermark = 0.8
→ ulimits (65536 for nofile/nproc soft and hard)
→ cluster_partition_handling = autoheal

2000 Nodes
Phase 2. Neutron
Cluster crashes less, but still happens
Lots of queued messages, until it goes

(rabbit virtual machines)
→ ip link set %k txqueuelen 10000

(neutron agent)
→ report_interval = 43200

(neutron server)
→ agent_down_time = 86500
Other Considerations (not done, or not helpful)
→ increase rpc_state_report_workers
→ heartbeat timeouts on the rabbit cluster

2400 Nodes
Phase 2. Neutron
Stable cluster
→ 5x 64GB Virtual Machines

Occasional network partitions
→ recovering most times, but not always
→ procedure for a quick cluster rebuild (~10min downtime)

~5000 Nodes
Phase 2. Neutron
Migrating existing cells from Nova Network
- Puppet for reconfiguration
- Custom command for the live VM changes
$ openstack network cluster migrate --dry-run --host p06146676a327ab
$ openstack network cluster migrate --host p06146676a327ab
$ openstack network cluster migrate --cluster 'VMPOOL SXXXX-C-IPZZZ'
https://gitlab.cern.ch/cloud-infrastructure/python-neutronclient-cern
# Rename the Nova bridge to the one Neutron expects, keeping the
# default route so running VMs stay reachable.
commands.extend([
    "brctl delif %s %s" % (NOVA_BRIDGE, raw_device),
    "ip link set %s down" % NOVA_BRIDGE,
    "ip link set %s name %s" % (NOVA_BRIDGE, CERN_NETWORK_BRIDGE),
    "brctl addif %s %s" % (CERN_NETWORK_BRIDGE, raw_device),
    "ip link set %s up" % CERN_NETWORK_BRIDGE,
    "ip route add default via %s dev %s" % (gw, CERN_NETWORK_BRIDGE),
])

# Rename each VM tap device from its nova-network name to the name
# the Neutron agent expects.
for instance in instances:
    ip = instance.addresses['CERN_NETWORK'][0]
    mac = ip['OS-EXT-IPS-MAC:mac_addr']
    nova_tap = nova_interfaces[mac]
    neutron_tap = neutron_interfaces[mac]
    commands.extend([
        "brctl delif %s %s" % (NOVA_BRIDGE, nova_tap),
        "ip link set %s name %s" % (nova_tap, neutron_tap),
    ])
Phase 3. SDN
- Current network deployment has significant limitations
- Limited IP Mobility
  ○ Segmented broadcast domains
  ○ Live migration limited to a single cluster
  ○ Ad-hoc tunnels for hardware retirement campaigns
- Hardware Repurposing
  ○ Multiple network domains (General, Services, …)
  ○ Services dedicated to a single domain
- No Floating IPs
- No Tenant/Private Networks
Phase 3. SDN
- Small prototype setups to evaluate functionality
                          Neutron/OpenVSwitch    OpenDaylight           OVN
DHCP                      Neutron                Neutron / Built-in     Built-in
Floating IPs              Yes                    Yes                    Yes
Distributed Routing       Only with DVR          Yes                    Yes
Tunneling Protocols       vxlan / GRE / geneve   vxlan / GRE / geneve   vxlan / geneve
Security Groups           IPTables               OpenFlow Native        OpenFlow Native + Logging
Load Balancing            Octavia                Octavia                Octavia / OVN Native
Acceleration              Limited DPDK           DPDK                   DPDK
Tracing                   tcpdump                tcpdump                ovn-trace
Physical Switch Integr.   L2 / L3                L2 / L3                L2 / L3
Phase 3. SDN
- In the end we picked OpenContrail / Tungsten
[Diagram: Tungsten architecture. The Controller integrates with OpenStack, speaks XMPP to the vRouters on the hypervisors, BGP to the WAN gateway, and NETCONF/EVPN and OVSDB towards physical devices; overlay traffic uses MPLSoUDP/GRE and VXLAN]
Control-plane services (Cassandra, Config, Analytics, ...) deployed with https://github.com/Juniper/contrail-helm-deployer
Integration considerations: Neutron ML2 vs monolithic plugin; deployed as a separate region
Summary

- Scaling Neutron was not trivial, mostly due to the agents / rabbitmq
  ○ Deployed in production and stable
- Currently finalizing the migration from Nova Network to Neutron
- Evaluated different SDN solutions
- Ongoing work deploying Tungsten in a new Region
- Looking forward to offering Floating IPs, Private Networks and much more