SLIDE 1

RabbitMQ at Scale, Lessons Learned

OpenStack Summit Barcelona

Matthew Popow, Weiguo Sun, Wei Tie, Scott Pham, Kerry Miles
@mattpopow
October 26, 2016

SLIDE 2

Environment Details

  • Overcloud / Undercloud
  • 800+ Nova compute nodes, 700 routers, 1,000+ networks, 10,000+ ports
  • Each OpenStack service runs on 3 controller VMs
  • Neutron with OVS and L3 agents
  • RabbitMQ cluster:
  • 3 x 4 vCPU, 8 GB RAM
  • 3-node active/active cluster
  • RHEL 7, RabbitMQ 3.3.5-22, Erlang R16B-03.7
  • Icehouse & Juno (OSP)
  • No heartbeat or QoS

SLIDE 3

When things go astray

SLIDE 4
SLIDE 5

Symptoms of RabbitMQ Issues

  • nova-compute services down/flapping
  • Instances failing to boot, waiting for port binding
  • Neutron agent timeouts
  • RabbitMQ queues growing

Timeout waiting on RPC response - topic: "q-plugin", RPC method: "report_state" info: "<unknown>"

SLIDE 6

Restarting services can compound the issue

SLIDE 7

OpenStack Configuration Parameters

Neutron & Nova Client Configuration

Enlarge the RPC pool

  • rpc_thread_pool_size = 2048
  • rpc_conn_pool_size = 60

Extend timeouts

  • rpc_response_timeout = 960 (especially for large neutron stacks)
  • get_active_network_info(), sync_state() (dhcp-agent, l3-agent)
  • performance optimization in Kilo

Add more workers/consumers

  • rpc_workers = 4 (we run 3 neutron controllers)
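Collected in one place, the tuning above corresponds to entries like the following in neutron.conf (the [DEFAULT] section placement is an assumption based on the Icehouse/Juno-era oslo options; values are the ones from this slide):

```ini
[DEFAULT]
# Enlarge the RPC pools
rpc_thread_pool_size = 2048
rpc_conn_pool_size = 60
# Extend timeouts, especially for large Neutron stacks
rpc_response_timeout = 960
# More consumers on the Neutron server side
rpc_workers = 4
```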
SLIDE 8

3 neutron rpc_workers, 10k backlog

SLIDE 9

18 neutron rpc_workers, 10k backlog

SLIDE 10

Client disconnect & (404) errors

  • Even with RPC tuning, frequent disconnects / reconnects
  • On reconnect, clients saw 404 errors
  • An agent restart was needed
  • OVS flows reloaded on agent restart (pre-Liberty)

(404) NOT_FOUND - no queue 'q-agent-notifier-network-delete_fanout_a4db343065984f74971fe0080013744e' in vhost '/'

SLIDE 11

Kombu Driver (Icehouse) / Oslo

  • Race condition with auto-delete queues
  • Before Juno, auto-delete was not configurable for Neutron
  • When a reconnect occurs, the queue re-declaration races with the auto-delete
  • Backport Oslo driver
  • Kombu driver improvements for Neutron
SLIDE 12

neutron/openstack/common/rpc/impl_kombu.py
https://bugs.launchpad.net/neutron/+bug/1393391
Ubuntu Cloud Archive Patch

SLIDE 13

Connection issues?

SLIDE 14

RabbitMQ Erlang Configuration

RABBITMQ_SERVER_ERL_ARGS="+K true +A 128 +P 1048576"

  • kernel inet_default_connect_options [{nodelay,true},{raw,6,18,<<5000:64/native>>}]
  • kernel inet_default_listen_options [{raw,6,18,<<5000:64/native>>}]
  • +K true # enables kernel poll
  • +A 128 # sets the Erlang VM I/O thread pool size
  • {raw,6,18,<<5000:64/native>>} # sets TCP_USER_TIMEOUT to 5 seconds, to quickly detect when an established connection fails
  • Common config recommendation
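The raw option tuple can be decoded with a short Python check (the option number 18 for TCP_USER_TIMEOUT is Linux-specific and is spelled out here because older Python versions do not expose it in the socket module):

```python
import socket
import struct

# Decode {raw,6,18,<<5000:64/native>>} from the Erlang args above:
#   level 6            -> IPPROTO_TCP
#   option 18          -> TCP_USER_TIMEOUT (Linux-specific option number)
#   <<5000:64/native>> -> 5000 ms as a native-endian 64-bit integer
TCP_USER_TIMEOUT = 18

assert socket.IPPROTO_TCP == 6
payload = struct.pack("@q", 5000)   # native 64-bit, like <<5000:64/native>>
assert struct.unpack("@q", payload)[0] == 5000
print("level", socket.IPPROTO_TCP, "opt", TCP_USER_TIMEOUT, "bytes", len(payload))
```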
SLIDE 15

TCP_USER_TIMEOUT notes

  • Note: setting TCP_USER_TIMEOUT overrides the TCP keepalive timers if it is shorter.
  • Dropping a single TCP keepalive packet could trigger a socket teardown.
  • This can happen between RabbitMQ and a client, or between cluster nodes.

=ERROR REPORT==== 22-May-2016::08:36:42 ===
closing AMQP connection <0.24752.1> (10.203.106.41:35234 -> 10.203.108.11:5672):
{inet_error,etimedout}

RPC QoS added in Liberty: https://bugs.launchpad.net/oslo.messaging/+bug/1531222

SLIDE 16

Virtualizing Control Plane

  • Default KVM txqueuelen is tiny, 500 packets

# ifconfig tap79920654-fa
tap79920654-fa: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether fe:16:3e:71:7c:d3  txqueuelen 10000  (Ethernet)
        RX packets 8360296  bytes 2076339428 (1.9 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5544232  bytes 793456462 (756.6 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

# Add a udev rule to set the tap interface TX queue length:
KERNEL=="tap*", RUN+="/sbin/ip link set %k txqueuelen 10000"

SLIDE 17

Virtualizing Control Plane Cont.

  • Never suspend the RabbitMQ VM
  • Monitor the hypervisor for issues:
  • CPU soft lockup (can trigger a partition)
  • Disk / memory contention
  • RAID / IO controller resets

SLIDE 18

RabbitMQ Configuration

[
  {rabbit, [
    {cluster_nodes, {['rabbit@rabbitmq-001',
                      'rabbit@rabbitmq-002',
                      'rabbit@rabbitmq-003'], disc}},
    {cluster_partition_handling, pause_minority},
    {vm_memory_high_watermark, 0.4},
    {tcp_listen_options, [binary,
                          {packet, raw},
                          {reuseaddr, true},     % reuse sockets in TIME_WAIT; not safe behind NAT
                          {backlog, 128},
                          {nodelay, true},       % disable Nagle's algorithm for increased throughput
                          {exit_on_close, false},
                          {keepalive, true}]}    % enable TCP keepalives
  ]}
].

SLIDE 19

RabbitMQ Process Level Tuning

Limit                   Soft Limit   Hard Limit   Units
Max CPU time            unlimited    unlimited    seconds
Max file size           unlimited    unlimited    bytes
Max data size           unlimited    unlimited    bytes
Max stack size          unlimited    unlimited    bytes
Max core file size      unlimited    unlimited    bytes
Max resident set        unlimited    unlimited    bytes
Max processes           unlimited    unlimited    processes
Max open files          65536        65536        files
Max locked memory       unlimited    unlimited    bytes
Max address space       unlimited    unlimited    bytes
Max file locks          unlimited    unlimited    locks
Max pending signals     unlimited    unlimited    signals
Max msgqueue size       unlimited    unlimited    bytes
Max nice priority       unlimited    unlimited
Max realtime priority   unlimited    unlimited
Max realtime timeout    unlimited    unlimited    us

SLIDE 20

Verify Process Limits

  • e.g. RHEL/CentOS:
  • cat /proc/$(cat /var/run/rabbitmq/pid)/limits

SLIDE 21

Partition Handling

  • pause_minority vs. autoheal
  • CAP theorem: consistency vs. availability
  • pause_minority requires a quorum; a node pauses itself if it ends up in the minority (e.g. the only node alive on its side)
  • Neither autoheal nor pause_minority is perfect
  • Partition monitoring and alerting
  • Automation to recover from a partition:
  • Wipe /var/lib/rabbitmq/mnesia on the problem node, restart RabbitMQ

SLIDE 22

Queue Mirroring

  • Set by RabbitMQ policy
  • {"ha-mode":"all"}
  • rabbit_ha_queues = True # deprecated
  • Not applicable for RabbitMQ 3.x and later (policies replace it)
  • Mirroring is not needed for RPC, and it is expensive
  • Only mirror billing/notification queues
  • Deployment examples in Liberty run without queue mirroring
  • A policy change likely requires a cluster restart
SLIDE 23

Operating System Tuning

  • Default TCP settings are not ideal:
  • 2 hours before the first TCP keepalive probe is sent
  • Then a probe is resent every 75 seconds
  • After 9 unacknowledged probes, the connection is marked dead
  • Adjusting these helps with client failover / reconnection

Parameter                      Value
net.ipv4.tcp_keepalive_time    5
net.ipv4.tcp_keepalive_probes  5
net.ipv4.tcp_keepalive_intvl   1
net.ipv4.tcp_retries2          3
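These sysctls shrink dead-peer detection on idle connections; a quick back-of-the-envelope comparison of the kernel defaults against the tuned values, using the standard keepalive arithmetic:

```python
def keepalive_detect_seconds(idle_s, probes, intvl_s):
    # First probe after idle_s seconds of silence, then `probes` unacked
    # probes sent intvl_s apart before the peer is declared dead.
    return idle_s + probes * intvl_s

default = keepalive_detect_seconds(7200, 9, 75)  # kernel defaults: 2 h, 9 probes, 75 s apart
tuned = keepalive_detect_seconds(5, 5, 1)        # values from the table above
print(default, tuned)  # 7875 10
```

So a hung peer is noticed in about 10 seconds instead of well over two hours.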

SLIDE 24

Monitoring RabbitMQ

  • Health Check / Query each node (rabbitmqadmin)
  • cluster health / partition status
  • Erlang memory utilization vs. the high watermark
  • file descriptors used
  • sockets used
  • process utilization
  • system memory
  • disk utilization
  • queues and number of unacked messages
  • total unacked messages
  • rabbitmq.log for alarm sets:

“New connections will not be accepted until this alarm clears”
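A per-node health check can consume the management API's /api/nodes output directly; a minimal sketch covering the memory, file-descriptor, socket, and partition checks above (field names are from the management API; the 90% thresholds are an assumption, pick your own):

```python
def node_alarms(node, threshold=0.9):
    """Flag one node's stats from the management API's /api/nodes response."""
    alarms = []
    if node["mem_used"] >= node["mem_limit"]:
        alarms.append("memory high watermark reached")
    if node["fd_used"] / node["fd_total"] > threshold:
        alarms.append("file descriptors near limit")
    if node["sockets_used"] / node["sockets_total"] > threshold:
        alarms.append("sockets near limit")
    if node["partitions"]:
        alarms.append("cluster partition: %s" % node["partitions"])
    return alarms

# Example against a canned response:
sample = {"mem_used": 3_000_000_000, "mem_limit": 3_435_973_836,
          "fd_used": 64_000, "fd_total": 65_536,
          "sockets_used": 100, "sockets_total": 58_890,
          "partitions": []}
print(node_alarms(sample))  # ['file descriptors near limit']
```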

SLIDE 25

Monitoring RabbitMQ Cont.

  • Synthetic Tests
  • Boot a VM; Create Router / Network / Ping VM
  • Create Volume
  • Upload Image
  • Failures of synthetic transactions can indicate a RabbitMQ issue
SLIDE 26

Tips

  • rabbitmqadmin vs. rabbitmqctl
  • Before 3.6.0, rabbitmqctl list commands did not stream
  • They could hang on stuck queues
  • Monitor memory management of the stats database:
  • rabbitmqctl status
  • rabbitmqctl eval 'exit(erlang:whereis(rabbit_mgmt_db), please_terminate).'
  • Disabling the RabbitMQ UI
  • Adjusting collect_statistics_interval, default 5000 ms:
  • rabbitmqctl eval 'application:set_env(rabbit, collect_statistics_interval, 60000).'

SLIDE 27

Tips Cont.

  • Set a policy for queue TTL
  • {"expires": "#ms"}
  • Must be > rpc_response_timeout
  • Applies once a queue has 0 consumers
  • Don't use auto-delete queues
  • If there are lots of reconnects between client and server, investigate RPC tuning and the network stack
  • rabbit_hosts = <randomize order>
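For the queue-TTL tip, the expires value has to exceed the RPC timeout, or replies to in-flight calls can be lost when their queue expires; a small check that derives the policy body (the 60-second margin is an assumption, not from the slides):

```python
import json

rpc_response_timeout_s = 960   # from the RPC tuning earlier in the deck
margin_s = 60                  # assumption: safety margin above the RPC timeout
expires_ms = (rpc_response_timeout_s + margin_s) * 1000

# The policy body must outlast any pending RPC call.
assert expires_ms > rpc_response_timeout_s * 1000
policy = json.dumps({"expires": expires_ms})
print(policy)  # {"expires": 1020000}
```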
SLIDE 28

Architectural Decisions

  • Single Cluster vs. Many
  • Nova, Neutron, Glance, Cinder
  • Ceilometer, Heat

SLIDE 29

Resources

  • Troubleshooting oslo.messaging / RabbitMQ issues (Austin 2016)
  • Troubleshooting RabbitMQ and Its Stability Improvement (Tokyo 2015)
  • rabbitmq-users mailing list

SLIDE 30

Q&A

SLIDE 31

mpopow@cisco.com wesun@cisco.com