OpenStack Summit Barcelona
RabbitMQ at Scale, Lessons Learned
Matthew Popow, Weiguo Sun, Wei Tie, Scott Pham, Kerry Miles @mattpopow October 26, 2016
Environment Details
Overcloud / Undercloud
800+ Nova Compute Nodes, 700 Routers,
Timeout waiting on RPC response - topic: "q-plugin", RPC method: "report_state" info: "<unknown>"
Symptoms of RabbitMQ Issues
Neutron & Nova Client Configuration
Enlarge rpc pool
Extend timeouts
Add more workers/consumers
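A minimal sketch of how these three changes might be written into a service configuration file. The mapping of the slide's bullets to the option names `rpc_conn_pool_size`, `rpc_response_timeout` (oslo.messaging) and `rpc_workers` (Neutron) is an assumption, and the pool size and timeout values are purely illustrative; only the worker count of 18 appears elsewhere in this deck. The helper `render_rpc_tuning` is hypothetical.

```python
import configparser
import io


def render_rpc_tuning(pool_size, response_timeout, rpc_workers):
    """Render an INI fragment with the three RPC tuning knobs.

    Option names are assumed to be the ones the slide refers to;
    check them against the release you actually run.
    """
    cfg = configparser.ConfigParser()
    cfg["DEFAULT"] = {
        "rpc_conn_pool_size": str(pool_size),            # enlarge rpc pool
        "rpc_response_timeout": str(response_timeout),   # extend timeouts (seconds)
        "rpc_workers": str(rpc_workers),                 # add more workers/consumers
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# Illustrative values only (60 and 120 are assumptions, not from the talk).
print(render_rpc_tuning(pool_size=60, response_timeout=120, rpc_workers=18))
```

The fragment would be merged into the relevant service file (for example `neutron.conf`) and the service restarted for the values to take effect.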
3 neutron rpc_workers, 10k backlog
18 neutron rpc_workers, 10k backlog
(404) NOT_FOUND - no queue 'q-agent-notifier-network-delete_fanout_a4db343065984f74971fe0080013744e' in vhost '/'
neutron/openstack/common/rpc/impl_kombu.py
https://bugs.launchpad.net/neutron/+bug/1393391
Ubuntu Cloud Archive Patch
RABBITMQ_SERVER_ERL_ARGS="+K true +A128 +P 1048576
when an established connection fails
=ERROR REPORT==== 22-May-2016::08:36:42 ===
closing AMQP connection <0.24752.1> (10.203.106.41:35234 -> 10.203.108.11:5672):
{inet_error,etimedout}
RPC QoS added in Liberty: https://bugs.launchpad.net/oslo.messaging/+bug/1531222
# ifconfig tap79920654-fa
tap79920654-fa: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether fe:16:3e:71:7c:d3  txqueuelen 10000  (Ethernet)
        RX packets 8360296  bytes 2076339428 (1.9 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5544232  bytes 793456462 (756.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
# Add udev rule to set tap interface TX queue length
KERNEL=="tap*", RUN+="/sbin/ip link set %k txqueuelen 10000"
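To confirm the udev rule has taken effect on a hypervisor, one option is to read each tap device's queue length from sysfs. The sketch below assumes the standard Linux path `/sys/class/net/<device>/tx_queue_len`; the function name `short_tx_queues` and its parameters are hypothetical.

```python
from pathlib import Path


def short_tx_queues(minimum=10000, sysfs_root="/sys/class/net"):
    """Return {device: txqueuelen} for tap devices below `minimum`."""
    results = {}
    for dev in sorted(Path(sysfs_root).glob("tap*")):
        qfile = dev / "tx_queue_len"
        if not qfile.is_file():
            continue  # not a network device entry we can inspect
        qlen = int(qfile.read_text().strip())
        if qlen < minimum:
            results[dev.name] = qlen
    return results


if __name__ == "__main__":
    # Prints nothing when every tap device already meets the minimum.
    for name, qlen in short_tx_queues().items():
        print(f"{name}: txqueuelen {qlen}")
```

Tap devices created before the rule was installed would show up here, since udev `RUN` actions fire on device events rather than retroactively.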
[
  {rabbit, [
    {cluster_nodes, {['rabbit@rabbitmq-001',
                      'rabbit@rabbitmq-002',
                      'rabbit@rabbitmq-003'], disc}},
    {cluster_partition_handling, pause_minority},
    {vm_memory_high_watermark, 0.4},
    {tcp_listen_options, [binary,
                          {packet, raw},
                          {reuseaddr, true},      %% reuse sockets in TIME_WAIT; not safe for NAT
                          {backlog, 128},
                          {nodelay, true},        %% disable Nagle’s algorithm for increased throughput
                          {exit_on_close, false},
                          {keepalive, true}]}     %% enable TCP keepalives
  ]}
].
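The `pause_minority` setting in the configuration above decides what a node does during a network partition. As a rough illustration, and assuming RabbitMQ's documented rule that a node pauses when it can see no more than half of the cluster (itself included), the decision can be sketched as follows. The function `should_pause` is hypothetical and models only the counting rule, not RabbitMQ's actual implementation.

```python
def should_pause(reachable_nodes, cluster_size):
    """True if a node that sees `reachable_nodes` members (itself included)
    out of `cluster_size` is in a minority and would pause."""
    if not 1 <= reachable_nodes <= cluster_size:
        raise ValueError("reachable_nodes must be between 1 and cluster_size")
    # "Minority" here means fewer than or equal to half of the cluster.
    return 2 * reachable_nodes <= cluster_size


# Three-node cluster, as in the configuration above.
for seen in (1, 2, 3):
    print(seen, should_pause(seen, 3))
```

With three nodes, an isolated node pauses while the two that still see each other keep running. With an even cluster size, an exact half-and-half split pauses both sides under this rule, which is one reason odd-sized clusters are the usual choice.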
Limit                     Soft Limit    Hard Limit    Units
Max CPU Time              unlimited     unlimited     seconds
Max file size             unlimited     unlimited     bytes
Max data size             unlimited     unlimited     bytes
Max stack size            unlimited     unlimited     bytes
Max core file size        unlimited     unlimited     bytes
Max resident set          unlimited     unlimited     bytes
Max processes             unlimited     unlimited     processes
Max open files            65536         65536         files
Max locked memory         unlimited     unlimited     bytes
Max address space         unlimited     unlimited     bytes
Max file locks            unlimited     unlimited     locks
Max pending signals       unlimited     unlimited     signals
Max msgqueue size         unlimited     unlimited     bytes
Max nice priority         unlimited     unlimited
Max realtime priority     unlimited     unlimited
Max realtime timeout      unlimited     unlimited     us
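A listing like the one above is what Linux exposes in `/proc/<pid>/limits` for a running process. The sketch below parses that format so the open-file limit can be checked programmatically; it assumes columns are separated by two or more spaces, and `parse_limits` and `SAMPLE` are hypothetical names introduced for illustration.

```python
import re

# Shortened sample in the same layout as the listing above.
SAMPLE = (
    "Limit                     Soft Limit    Hard Limit    Units\n"
    "Max processes             unlimited     unlimited     processes\n"
    "Max open files            65536         65536         files\n"
    "Max nice priority         unlimited     unlimited\n"
)


def parse_limits(text):
    """Map each limit name to its (soft, hard) values as strings."""
    limits = {}
    for line in text.splitlines():
        if not line.startswith("Max"):
            continue  # skip the header and blank lines
        # Limit names contain single spaces; columns use wider gaps.
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) >= 3:
            limits[parts[0]] = (parts[1], parts[2])
    return limits


print(parse_limits(SAMPLE)["Max open files"])
```

Against a live broker, the same function could be fed the contents of `/proc/<pid>/limits` for the Erlang VM process; locating that PID is left out of this sketch.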
Parameter                        Value
net.ipv4.tcp_keepalive_time      5
net.ipv4.tcp_keepalive_probes    5
net.ipv4.tcp_keepalive_intvl     1
net.ipv4.tcp_retries2            3
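To see what the keepalive values above mean in practice, the following sketch estimates how long an idle connection to a dead peer survives before the kernel drops it. It uses the usual approximation of idle time plus one interval per unanswered probe; the function name is hypothetical, and `tcp_retries2` is not modelled because it governs retransmission of unacknowledged data rather than keepalive probing.

```python
def keepalive_detection_seconds(keepalive_time, keepalive_probes, keepalive_intvl):
    """Approximate seconds until an idle connection to a dead peer is dropped.

    keepalive_time:   idle seconds before the first probe is sent
    keepalive_probes: unanswered probes tolerated before giving up
    keepalive_intvl:  seconds between probes
    """
    return keepalive_time + keepalive_probes * keepalive_intvl


# Values from the table above: 5 + 5 * 1
print(keepalive_detection_seconds(5, 5, 1))  # 10
```

For comparison, the common Linux defaults of 7200, 9 and 75 give 7200 + 9 * 75 = 7875 seconds, a little over two hours.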
“New connections will not be accepted until this alarm clears”
investigate rpc tuning & network stack
Nova, Neutron, Glance, Cinder
Ceilometer, Heat
mpopow@cisco.com wesun@cisco.com