Going brokerless: The transition from Qpid to 0mq


  1. Going brokerless: The transition from Qpid to 0mq Paul Mathews, Systems Architect EIG/Bluehost OpenStack Summit, November 2013

  2. RPC Messaging ● One major limiting factor we encountered when scaling OpenStack ● Many services depend upon a reliable messaging system – Compute – Neutron – Conductor – Ceilometer – VNC – Cells

  3. Why Qpid? ● Used by Red Hat ● Clustering – Offered the “possibility” of horizontal scaling – Removed in 0.18

  4. Qpid experience ● Single instance was not reliable – Unable to scale – Single point of failure ● Compute connections to broker are lost – Restart of compute service required ● Problematic due to missed messages

  5. RabbitMQ? ● Similar design as Qpid ● Broker model has the same drawbacks ● Problematic experiences from other users

  6. Possible solutions for scaling broker ● Cells ● Clustering/HA

  7. Cells ● Cells do not address AMQP performance issues ● Cells lower the load on individual brokers ● Magnify problems when they occur by chaining services [Diagram: an API cell with its own AMQP broker, plus child cells and grandchild cells each with their own AMQP broker]

  8. Clustering/HA ● Qpid – Clustering is slow and unreliable – Node sync causes problems – New HA module is active/passive ● RabbitMQ – Does have an active/active (HA) mode – Complicated setup, many moving pieces ● Scaling a broker is not practical – At best, minimal gains with addition of nodes – Loss of nodes causes further issues

  9. No more brokers!!! ● Brokers are a single point of failure ● Not horizontally scalable ● Reduced reliability with addition of nodes ● HA provides minimal benefit, and adds complexity

  10. Requirements for messaging ● No single point of failure ● Horizontally scalable ● Reliable at scale

  11. ● No more centralized broker – Receiver on each node – Routing handled by matchmaker:
      { "scheduler": [ "sched1", "sched2" ], "consoleauth": [ "cauth1", "cauth2" ] }
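To make the routing idea concrete, here is a minimal sketch of resolving an RPC topic to hosts from a ring file with the same JSON layout as above; lookup_hosts is a hypothetical helper for illustration, not the actual MatchMakerRing API:

    import json
    import random

    def lookup_hosts(topic, ringfile="/etc/nova/matchmaker.json"):
        # Hypothetical helper: map an RPC topic to the hosts listed in the
        # matchmaker ring file (same layout as the JSON shown above).
        with open(ringfile) as f:
            ring = json.load(f)
        return ring.get(topic, [])

    # A cast to the "scheduler" topic goes straight to one of its hosts,
    # with no central broker in the path.
    hosts = lookup_hosts("scheduler")
    if hosts:
        print("deliver directly to", random.choice(hosts))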

  12. Messaging topologies ● Brokers are limited to a star topology ● ZeroMQ is a partially-connected mesh [Diagram: a broker-centered star connecting conductor, scheduler, API, ceilometer, console auth, and compute nodes, alongside a ZeroMQ mesh connecting the same services directly]

  13. Flexibility ● Brokers have a rigidly defined structure – Queues, exchanges, subscriptions, fanouts ● ZeroMQ has four simple methods – Connect, bind, send, recv ● ZeroMQ lets us define our own messaging models
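For a sense of how small that API surface is, here is a minimal pyzmq example (not from the deck) that exercises all four calls with a PUSH/PULL pair:

    import zmq

    ctx = zmq.Context()

    # Receiver side: bind a PULL socket and wait for messages.
    pull = ctx.socket(zmq.PULL)
    pull.bind("tcp://127.0.0.1:5555")

    # Sender side: connect a PUSH socket and send one message.
    push = ctx.socket(zmq.PUSH)
    push.connect("tcp://127.0.0.1:5555")
    push.send(b"hello compute")

    print(pull.recv())  # b'hello compute'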

  14. Lightweight messaging ● ZeroMQ uses simple socket connections – Low resource utilization ● FAST

  15. RPC Cast Performance (on a single-core VM) [Chart: casts per second for ZeroMQ, Qpid, and RabbitMQ on a 0–2,000 scale; more is better]

  16. ZeroMQ Configuration
      ● Edit nova.conf to use zmq:
          rpc_backend = nova.openstack.common.rpc.impl_zmq
          rpc_zmq_matchmaker = nova.openstack.common.rpc.matchmaker.MatchMakerRing
          matchmaker_ringfile = /etc/nova/matchmaker.json
          rpc_zmq_ipc_dir = /var/run/zeromq
      ● Configure the matchmaker file:
          { "scheduler": [ "sched1", "sched2" ], "consoleauth": [ "cauth1", "cauth2" ] }
      ● Start zmq-receiver

  17. RPC Migration ● You can't get there from here! – No easy way to move between messaging systems – No logical divisions ● Only one backend allowed – All or nothing switch

  18. We need a new solution ● Moving between messaging systems is painful – Prior strategy will not work ● Tens of thousands of nodes to migrate ● Need to migrate with little or no downtime ● Rollout must allow deployment to individual servers

  19. Dual messaging backends ● Nodes use both Qpid and ZeroMQ messaging backends concurrently ● Code can be rolled out without affecting behavior, and enabled later – Change config, and start the ZeroMQ receiver ● Once dual backends are enabled, ZeroMQ is attempted first, then fails over to Qpid.
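A rough sketch of that attempt-then-fail-over flow, with zmq_cast and qpid_cast as stand-ins for the real impl_zmq and impl_qpid drivers (the names and failure behavior here are assumptions for illustration):

    def zmq_cast(topic, msg):
        # Stand-in for direct ZeroMQ delivery; assume it raises when no
        # zmq-receiver is listening on the target node.
        raise ConnectionError("no zmq-receiver listening")

    def qpid_cast(topic, msg):
        # Stand-in for delivery through the existing Qpid broker.
        print("delivered %r to %s via Qpid" % (msg, topic))

    def dual_cast(topic, msg):
        try:
            zmq_cast(topic, msg)    # try direct ZeroMQ delivery first
        except Exception:
            qpid_cast(topic, msg)   # fall back to the broker

    dual_cast("compute.compute1", {"method": "reboot_instance"})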

  20. Dual message backends 1. Deploy config to controller nodes [Diagram: a controller node running the ZMQ receiver has an outgoing message for Compute1; "ZMQ receiver listening?" NO, so the request is delivered via Qpid]

  21. Dual message backends 1. Deploy config to controller nodes 2. Deploy config to compute nodes [Diagram: an outgoing message for Compute2; "ZMQ receiver listening?" YES, so the request is delivered directly to Compute2's ZMQ receiver and Qpid is bypassed]

  22. Dual message backends 1. Deploy config to controller nodes 2. Deploy config to compute nodes [Diagram: a call from Compute1 to Compute2 delivered via the Qpid broker on the controller node, even though both compute nodes run ZMQ receivers]
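One way to picture the "ZMQ receiver listening?" decision in the diagrams above is a simple TCP probe against the target node's receiver port; the helper name, port, and timeout below are illustrative assumptions, not the deployment's actual check:

    import socket

    def zmq_receiver_listening(host, port=9501, timeout=1.0):
        # Hypothetical probe: can we open a TCP connection to the node's
        # ZeroMQ receiver port within the timeout?
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    # Route based on the probe: direct ZeroMQ if listening, Qpid otherwise.
    backend = "ZeroMQ" if zmq_receiver_listening("compute2") else "Qpid"
    print("deliver via", backend)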

  23. Configuring dual backends
      ● Change rpc_backend to impl_zmq, but retain the qpid_hostname setting:
          qpid_hostname = qpid1
          rpc_backend = nova.openstack.common.rpc.impl_zmq
          rpc_zmq_matchmaker = nova.openstack.common.rpc.matchmaker.MatchMakerRing
          matchmaker_ringfile = /etc/nova/matchmaker.json
          rpc_zmq_ipc_dir = /var/run/zeromq
      ● Nodes will use the qpid_hostname value to connect to Qpid, but will attempt delivery via ZeroMQ first
      ● Once switched, nodes will accept incoming messages from either Qpid or ZeroMQ

  24. Migrating to ZeroMQ ● Dual backend code meant minimal downtime ● Migration was smooth, without unexpected losses in messaging ● Connection checks to the ZeroMQ receiver do not seem to cause undue stress to nodes

  25. ZeroMQ in production ● More reliable than a broker ● Faster than a broker ● Solves scalability issues

  26. Lingering issues ● Occasionally nova-compute stops processing queued messages

  27. Dual backend code ● https://github.com/paulmathews/nova

  28. Questions?
