SLIDE 1

RabbitMQ or Qpid Dispatch Router: Pushing OpenStack to the Edge

Ali Sanhaji, Javier Rojas Balderrama, Matthieu Simonin

OpenStack Summit | Berlin 2018

SLIDE 2

Who’s here

  • Ali Sanhaji: Research engineer at Orange, France
  • Javier Rojas Balderrama: Research engineer at Inria, France
  • Matthieu Simonin*: Research engineer at Inria, France

SLIDE 3

Agenda

  • Bringing OpenStack to the edge
  • RabbitMQ and Qpid Dispatch Router for OpenStack over a WAN
  • Performance evaluation
  • Conclusions and next steps

SLIDE 4

Challenges at the edge

[Diagram: core network (DC1, DC2), regional sites, local sites, edge sites]

  • Scalability
  • Locality
  • Placement
  • Resiliency
  • ...

SLIDE 5

OpenStack to the edge

  • For a telco like Orange, pushing OpenStack to the edge is key
  • How to deploy OpenStack in small edge sites (control plane + compute nodes)?
    ○ A full control plane per site is costly, and many control planes are hard to manage and synchronize
    ○ ⇒ Have a centralized control plane (APIs) and remote compute nodes
  • OpenStack scalability (stateless processes)
  • OpenStack over a WAN

SLIDE 6

Deployment under consideration

[Diagram: a core site running Keystone, Glance, Nova (control), Neutron (control), and Horizon, connected over a WAN to edge sites each running Nova (agent) and Neutron (agent)]

Edge deployment:
  • Centralized control services
  • Remote edge compute nodes

Communication between edge and core:
  • RPC traffic (periodic tasks, control traffic)
  • REST API calls (e.g., between Nova and Glance)

SLIDE 7

The message bus in OpenStack

  • One critical component of OpenStack is the message bus, used for inter-process communication
  • Processes use it to send various RPCs:
    ○ call: request from client to server, the client waits for the response
    ○ cast: request from client to server, no response (direct notification)
    ○ fanout: request from client to multiple servers, no response (grouped notification)

[Diagrams: Nova API, compute, scheduler, and conductor on the message bus; Neutron server and agent on the message bus; message flows for call, cast, and fanout]
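
To make the three patterns concrete, here is a minimal client-side sketch using oslo.messaging; the topic, method names, and arguments are illustrative, not taken from the talk:

    # Minimal client-side sketch of the three RPC patterns.
    # Topic and method names are made up for illustration.
    import oslo_messaging
    from oslo_config import cfg

    transport = oslo_messaging.get_rpc_transport(cfg.CONF)
    target = oslo_messaging.Target(topic='demo_topic', version='1.0')
    client = oslo_messaging.RPCClient(transport, target)
    ctxt = {}  # request context

    # call: blocks until the server replies
    result = client.call(ctxt, 'do_work', arg=42)

    # cast: fire and forget, returns without waiting for a reply
    client.cast(ctxt, 'update_state', state='ACTIVE')

    # fanout: cast delivered to every server listening on the topic
    client.prepare(fanout=True).cast(ctxt, 'refresh_cache')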

SLIDE 8

The message bus in OpenStack

  • Processes use the oslo.messaging library to send RPCs
  • oslo.messaging supports multiple underlying messaging implementations:
    ○ RabbitMQ (AMQP 0.9.1), deployed as a broker cluster
    ○ Qpid Dispatch Router (AMQP 1.0), deployed as a router topology

[Diagrams: Nova and Neutron processes going through oslo.messaging to either a RabbitMQ broker cluster or a QDR router topology]
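
The driver is selected by the scheme of the transport URL; a minimal sketch (hostnames and credentials are made up):

    # The oslo.messaging driver is chosen by the transport_url scheme.
    import oslo_messaging
    from oslo_config import cfg

    # RabbitMQ driver (AMQP 0.9.1)
    rmq = oslo_messaging.get_rpc_transport(
        cfg.CONF, url='rabbit://openstack:secret@core-broker:5672/')

    # AMQP 1.0 driver, e.g. for Qpid Dispatch Router
    qdr = oslo_messaging.get_rpc_transport(
        cfg.CONF, url='amqp://core-router:5672/')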

SLIDE 9

The message bus over a WAN

[Diagram: a broker cluster in the central site; RPC clients 1-3 and servers 1-4 spread across regional sites 1-2 and edge sites 1-6]

SLIDE 10

The message bus over a WAN

[Diagram: the same sites and RPC clients/servers, interconnected by a router topology instead of a central broker cluster]

SLIDE 11

Goal

Evaluate the performance of RabbitMQ and Qpid Dispatch Router over a WAN:

  ○ How do they withstand WAN constraints (packet loss, latency, dropouts)?
  ○ Does a router fit better in a decentralized environment?
  ○ Are OpenStack operations still robust without a broker retaining messages?
  ○ Is a broker safer than a router?
  ○ How do RPC communications (RabbitMQ and QDR) behave over a WAN?

SLIDE 12

What could go wrong in a WAN?

Examples of two possible situations:

  1. Latency/loss between client and bus (e.g., nova-conductor sends a boot request to nova-compute)
  2. Latency/loss between server and bus (e.g., nova-compute sends a VM state update to nova-conductor)

[Diagram: RPC client and RPC server connected through the bus, with the impairment on either leg]

SLIDE 13

What could go wrong in a WAN?

[Diagram: situation 1, latency/loss between client and bus; situation 2, latency/loss between server and bus]

In case of latency:

RPC calls:
  • The sender blocks for 2× the latency (request and response both cross the WAN)

RPC casts (fire-and-forget semantics):
  • Correct semantics with the QDR driver
  • Incorrect semantics with the RabbitMQ driver: in situation 1, the sender still waits for 2× the latency (broker acknowledgements)
  • In return, the RabbitMQ driver offers a higher guarantee of message delivery

SLIDE 14

Experiments

SLIDE 15

Context

  ○ Test plan: massively distributed RPCs
    https://docs.openstack.org/performance-docs/latest/test_plans/massively_distribute_rpc/plan.html
  ○ Two categories of experiments:
    1. Synthetic (RabbitMQ/QDR, decentralized configuration)
    2. Operational (with OpenStack and a centralized bus)
       • Network dropout
       • Latency and loss

SLIDE 16

Tools

  ○ EnOS for OpenStack deployment (virtualization, bare metal)
    https://github.com/BeyondTheClouds/enos
  ○ Grid’5000, a dedicated testbed for experiment-driven research

SLIDE 17

Synthetic experiment recap

From the OpenStack Summit Vancouver 2018 presentation:

  • Evaluation of the RPC message patterns implemented in OpenStack
  • Broker and router scalability are similar, but the router is lightweight and achieves low-latency message delivery, especially under high load
  • Routers offer locality of messages in decentralized deployments
  • Decentralization needs to be applied to the APIs and the database as well

“OpenStack internal messaging at the edge: in-depth evaluation”
www.openstack.org/summit/vancouver-2018/summit-schedule/events/21007/openstack-internal-messaging-at-the-edge-in-depth-evaluation

SLIDE 18

https://hal.inria.fr/hal-01891567

SLIDE 19

Operational experiments

[Diagram: core nodes (× 3 + 1) running Keystone, Glance, Nova (control), Neutron (control), and the bus (RMQ/QDR), connected over a WAN to edge nodes (× 100/400) running Nova (agent) and Neutron (agent)]

Software
  • OpenStack stable/queens
  • Optimised Kolla-based deployment
  • RabbitMQ v3.7.8
  • Qpid Dispatch Router v1.3.0

Infrastructure
  • Hardware: Dell PowerEdge C6420 × 20 (32 cores, 193 GB RAM each)
  • Virtualized deployment
    ○ Core: 32 cores, 64 GB RAM
    ○ Edge: 2 cores, 4 GB RAM

SLIDE 20

Network dropout

Configuration
  • iptables on the core nodes (controller and network nodes)
  • cron to schedule the dropouts
    ○ Frequency: [5 min, 10 min]
    ○ Duration: [30 s, 60 s, 120 s]
  • Rally constant_for_duration runner
    ■ Concurrency: 5
    ■ Duration: 30 min
  • OpenStack with 100 computes

Full deployment for each combination (set of parameters, bus).
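
A minimal sketch of how such a dropout can be injected, assuming the bus listens on the default AMQP port 5672 (the exact rules used in the experiments are not shown in the talk):

    # Hypothetical dropout injection on a core node: block bus traffic
    # with iptables for a given duration, then restore it.
    import subprocess, time

    def dropout(duration_s=30, amqp_port=5672):
        rule = ["INPUT", "-p", "tcp", "--dport", str(amqp_port), "-j", "DROP"]
        subprocess.run(["iptables", "-A"] + rule, check=True)  # start dropout
        try:
            time.sleep(duration_s)
        finally:
            subprocess.run(["iptables", "-D"] + rule, check=True)  # restore traffic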

SLIDE 21

Network dropout: boot_and_delete_servers

SLIDE 22

Network dropout: boot_and_delete_servers

SLIDE 23

Latency and Loss

Configuration
  • Parameters
    ○ Latency: [0, 5, 20, 40, 80, 120, 200] ms
    ○ Loss: [0, 0.1, 0.2, 0.4, 0.8, 1.0, 2.0] %
  • Rally constant runner
    ■ Concurrency: 5
    ■ Iterations: 100
  • OpenStack
    ○ Computes: [100, 400]

Full deployment for each combination (set of parameters, bus).
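
A common way to impose such latency and loss is tc netem on the WAN-facing interface; a minimal sketch (the interface name and the exact mechanism used in the experiments are assumptions):

    # Hypothetical WAN emulation: add latency and loss with tc netem
    # on the interface carrying core <-> edge traffic.
    import subprocess

    def set_wan_conditions(iface="eth0", delay_ms=80, loss_pct=0.4):
        subprocess.run(
            ["tc", "qdisc", "replace", "dev", iface, "root", "netem",
             "delay", f"{delay_ms}ms", "loss", f"{loss_pct}%"],
            check=True)

    # One point of the parameter grid: 80 ms latency, 0.4 % loss
    set_wan_conditions(delay_ms=80, loss_pct=0.4)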

SLIDE 24

100 computes

SLIDE 25

400 computes

SLIDE 26

100 computes

SLIDE 27

Timeline behind the scenes of the Rally benchmarks (multicast / 400 computes)

  • boot_server_and_attach_interface
  • create_and_delete_network
  • create_and_delete_port
  • create_and_delete_router
  • create_and_delete_security_groups
  • create_and_delete_subnet
  • set_and_clear_gateway

SLIDE 28

anycast queues
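
For context: with the AMQP 1.0 driver, whether an address behaves as anycast (one consumer receives each message) or multicast/fanout (all consumers receive it) is configured per address prefix in qdrouterd.conf. A hedged excerpt of the commonly documented mapping, not taken from the talk:

    # qdrouterd.conf excerpt (illustrative)
    address {
        prefix: openstack.org/om/rpc/multicast
        distribution: multicast
    }
    address {
        prefix: openstack.org/om/rpc/anycast
        distribution: balanced
    }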

SLIDE 29

fanout queues

SLIDE 30

Conclusions

SLIDE 31

Summary

  • In the face of WAN latency and loss, the router (no message retention) is as effective at delivering messages as the broker (message retention)
  • The router is less resilient in the case of network dropouts
  • QDR consumes far fewer resources than RMQ

SLIDE 32

What’s next

  • Bring QDR closer to the edge sites and compute nodes in order to leverage its routing capabilities
  • Scale to more compute nodes
  • Make the OpenStack control plane even more decentralized if possible (e.g., the database)

SLIDE 33

https://beyondtheclouds.github.io

ali.sanhaji@orange.com
javier.rojas-balderrama@inria.fr
matthieu.simonin@inria.fr