Scaling RabbitMQ
Jared W. Robinson
Vivint, Inc.
Who I am
- Father of five awesome children
- Enjoy programming and solving problems.
- Love Linux, and have been running it since 1995, currently using Ubuntu
- Enjoy working at Vivint, helping to scale our server-side infrastructure, which serves our customers and our security panels.
What I believe
- Use good tools
- Use what's available, practical
- Try it, test it, benchmark it. If one idea doesn't work, get feedback, and try something else. Fail fast.
What's covered: 1 of 2
- Basics
- Work queues, then and now
- Latency, then and now
- Speed & Throughput
- Fail-over
What's covered: 2 of 2
- limits.conf
- Transient vs durable trade-offs
- Backpressure and RabbitMQ death
- Grouping messages
- Resources
What's not covered
- Publish/Subscribe, Topics, RPC
- Alternate technologies:
Notes
Excellent summary of the differences and trade-offs between RabbitMQ and Apache Kafka by Stuart Charlton, of Pivotal Software: http://www.quora.com/Which-one-is-better-for-durable-messaging-with-good-query-features-RabbitMQ-or-Kafka
a) Use Kafka if you have a fire hose of events (100k+/sec) you need delivered in partitioned order 'at least once' with a mix of online and batch consumers, you want to be able to re-read messages, you can deal with current limitations around node-level HA (or can use trunk code), and/or you don't mind supporting incubator-level software yourself via forums/IRC.
b) Use Rabbit if you have messages (20k+/sec) that need to be routed in complex ways to consumers, you want per-message delivery guarantees, you don't care about ordered delivery, you need HA at the cluster-node level now, and/or you need 24x7 paid support in addition to forums/IRC.
Basics
- What is Rabbit, and how does it help?
- Clients: Java, .NET, C/C++, PHP, Python, Ruby, Go, etc.
- Publishers, Brokers, Consumers and mortgages
- Post-offices and messages: broadcast, direct, handlers
- Message, exchange, queue, and binding
- Routing key and "earnest money"
- Connection, Channel
- Work queue & direct exchange
Notes
- Rabbit's AMQP concepts: https://www.rabbitmq.com/tutorials/amqp-concepts.html
- Glossary of terms: http://pythonhosted.org/nucleon.amqp/glossary.html
In the beginning
- Work queues
- Round-robin distribution :( -- more on this later
- Durability & Persistence
Notes
- **** ASK: Who uses RabbitMQ?
- **** ASK Who uses work queues? What are they for?
- Why is round-robin distribution bad?
- Which part of the system should ideally persist messages? As much as possible, the original producers, or the leaves in the graph.
Did you know?
- Good book: RabbitMQ in Action
- Practical advice, born from experience
- Discusses High-Availability Pitfalls & Solutions
- especially with regard to durable queues
Latency
- Delivery by carrier pigeon isn't good enough. Nor do we want delivery by cargo-ship
- "Latency is the root of all that is evil on the Internet... or so the saying goes." -- Theo Schlossnagle
- In the beginning: Round-robin distribution & slow workers
- Database: Contention, Slow queries, Expensive queries, Non-indexed queries
- CPU contention and VMs on the same hypervisor
- Slow third-party services
Latency & Prefetch
- Prefetch protects us against slow consumers
- Declared by the client, at channel creation time
conn = amqp.Connection("my.broker.com:5672", ...)
channel = conn.channel()
channel.basic_qos(
prefetch_size=0,
prefetch_count=10,
a_global=False)
channel.basic_consume("weather", ...)
Notes
import amqplib.client_0_8 as amqp
conn = amqp.Connection("my.broker.com:5672", "myuser", "mypassword")
channel = conn.channel()
channel.basic_qos(prefetch_size=0, prefetch_count=10, a_global=False)
channel.basic_consume("weather", no_ack=False, callback=my_callback, consumer_tag=CONSUMER_TAG_CONST)
Did you know?
- When publishing to an exchange that isn't bound to a queue, there are no errors from the client perspective.
Throughput & Speed
- "Fast hands make light work"?
- Our max speed is ~10K messages/sec (w/o HiPE)
- HiPE
- 20-50% better performance
- not available on Windows
- Experimental. Disable if it segfaults.
- not available with RHEL/CentOS erlang
- Speed of hardware
- Is it running in VM? CPU/RAM contention.
- Do too many consumers slow it down?
I benchmarked the maximum throughput of our hardware around 10,000
messages/second for non-durable queues, non-persistent messages. Each
message was less than 4K. YMMV.
Throughput needs
- As of March, 2015:
- 50 user-initiated commands/second
- 1,500 inbound signals/second
- 7,000 messages/second on our busiest queue
- Doubling this year
- Need headroom for spikes and a buffer for growth beyond this year
- For coming year:
- 21,000 messages/second on our busiest queue
- Even with HiPE, a single work queue isn't sufficient
Throughput via sharding
"Many hands make light work"
Throughput: Are you my solution?
- rabbitmq-consistent-hash-exchange
- rabbitmq-sharding
- client-side, shared nothing
- Federated Queues
- Try it, test it, benchmark it. Fail fast & move on.
Consistent Hash Exchange
- Plugin ships with Rabbit!
- Gracefully handles the disappearance of a queue on a crashed rabbit node
- We determine link between the exchange and downstream queues (where queues live, how many there are, etc.)
- A single RabbitMQ management console shows the entire cluster!
Notes
- rabbitmq-plugins enable rabbitmq_consistent_hash_exchange
- Use in combination with RabbitMQ clustering.
- https://github.com/rabbitmq/rabbitmq-consistent-hash-exchange
- http://rabbitmq.1065348.n5.nabble.com/Unexpected-Behavior-When-Using-the-quot-X-Consistent-Hash-quot-Exchange-Type-td30561.html
Consistent Hash Exchange configuration
$ umask 0022 -- important if you're doing this as root
$ rabbitmq-plugins enable rabbitmq_consistent_hash_exchange
The following plugins have been enabled:
rabbitmq_consistent_hash_exchange
Plugin configuration has changed. Restart RabbitMQ for changes to take effect.
$ service rabbitmq-server restart
$ /usr/sbin/rabbitmq-plugins list -E
Consistent Hash Exchange configuration
- Declare the exchange as type "x-consistent-hash"
- Bind the exchange to downstream queues (you decide where queues live, how many there are, etc.)
- When binding the queue to the exchange, the routing key must be a number-as-a-string
- Insufficient: "2", Better: "101"
- Producer must vary the per-message routing key -- random, or an id
- Not round-robin -- not an even distribution
- Don't forget cluster port and firewall configuration
Notes
- Gotchas of using the exchange: http://rabbitmq.1065348.n5.nabble.com/Unexpected-Behavior-When-Using-the-quot-X-Consistent-Hash-quot-Exchange-Type-td30561.html
- Port and firewall config: http://www.gettingcirrius.com/2013/01/configuring-iptables-for-rabbitmq.html
Documentation on the routing key:
The more points in the hash space each binding has, the closer the actual distribution will be to the desired distribution (as indicated by the ratio of points by binding). However, large numbers of points (many thousands) will substantially decrease performance of the exchange type.
Equally, it is important to ensure that the messages being published to the exchange have a range of different routing_keys: if a very small set of routing keys are being used then there's a possibility of messages not being evenly distributed between the various queues. If the routing key is a pseudo-random session ID or such, then good results should follow.
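The quoted behaviour can be illustrated with a small stand-alone simulation (a sketch of the idea only, not the plugin's actual Erlang implementation; the queue names and point-placement scheme are invented): each binding contributes as many ring points as its binding key specifies, and a message is delivered to the queue that owns the next point at or after the message's hash.

```python
import hashlib
import random
from bisect import bisect_left
from collections import Counter

def build_ring(queues, points_per_binding):
    """Place points_per_binding pseudo-random points on the ring for each
    queue, mimicking how a binding key of "101" adds 101 points."""
    ring = []
    for q in queues:
        for i in range(points_per_binding):
            h = int(hashlib.md5(f"{q}:{i}".encode()).hexdigest(), 16)
            ring.append((h, q))
    ring.sort()
    return ring

def route(ring, routing_key):
    """Deliver to the queue owning the first ring point at or after the
    message's hash, wrapping around at the top of the ring."""
    h = int(hashlib.md5(routing_key.encode()).hexdigest(), 16)
    idx = bisect_left(ring, (h,)) % len(ring)
    return ring[idx][1]

def distribution(points_per_binding, n_messages=10000):
    """Publish n_messages with pseudo-random routing keys and count
    how many land on each of four queues."""
    ring = build_ring(["q1", "q2", "q3", "q4"], points_per_binding)
    rng = random.Random(42)
    return Counter(route(ring, str(rng.random())) for _ in range(n_messages))

print(distribution(2))    # few points per binding: arcs differ wildly in size
print(distribution(101))  # many points per binding: far more even
```

With only a couple of points per binding, the four queues carve the ring into a handful of arcs of very different sizes, which is exactly the kind of uneven distribution the slides show; hundreds of points per binding smooth it out, at some cost in exchange performance.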
Consistent Hash Exchange Uneven Distribution
Notes
- The binding key for the queues to the exchange was "2"
Consistent Hash Exchange Uneven Distribution
Notes
- The binding key for the queues to the exchange was "10"
Consistent Hash Exchange Caveats
- Unlike a simple work-queue, requires differing routing keys -- random, or use an id
- Definitely consistent, but not an even distribution
- Has a chance of losing messages when unbinding or deleting a queue
- There's a window of time between determining which queue a message should be sent to and actually publishing it
- Clients must be configured to evenly connect to the queue shards
- Try it, test it, benchmark it. Fail fast & move on.
rabbitmq-sharding
- Built by the Pivotal/RabbitMQ folks
- Based on the consistent hash exchange
- Auto-creates and binds the queues to the exchange!
- Transparently auto-connects client to the shard with the least number of consumers!
- Doesn't ship with Rabbit
- https://github.com/rabbitmq/rabbitmq-sharding
https://github.com/rabbitmq/rabbitmq-sharding/blob/master/README.extra.md
rabbitmq-sharding configuration
Download from https://www.rabbitmq.com/community-plugins/v3.3.x/
cp rabbitmq_sharding-3.3.x.ez \
/usr/lib/rabbitmq/lib/rabbitmq_server-3.3.5/plugins/
rabbitmq-plugins enable rabbitmq_sharding
rabbitmq-plugins enable rabbitmq_consistent_hash_exchange
service rabbitmq-server restart
rabbitmqctl set_policy history-shard "^history" \
'{"shards-per-node": 2, "routing-key": "1234"}' \
--apply-to exchanges
rabbitmq-sharding client usage
- Clients declare the exchange type as "x-modulus-hash"
- Even message distribution when using a uniformly random routing key
- Use same routing key, and all messages will go to a single queue shard
- When nodes appear and disappear, the exchange automatically creates queue shards on the new node.
- It also re-creates shards that have been deleted from a node.
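The "x-modulus-hash" idea behind these bullets can be sketched stand-alone (this mimics the concept, not the plugin's actual hash function; the shard count and keys are made up): hash the routing key and take it modulo the number of queue shards.

```python
import hashlib
import uuid
from collections import Counter

def shard_for(routing_key, n_shards):
    """Pick a shard index: hash of the routing key, modulo the shard
    count -- the idea behind the "x-modulus-hash" exchange type."""
    digest = hashlib.sha256(routing_key.encode()).hexdigest()
    return int(digest, 16) % n_shards

# Uniformly random routing keys spread messages across all shards...
random_counts = Counter(shard_for(uuid.uuid4().hex, 4) for _ in range(4000))

# ...while a constant routing key pins every message to one shard.
fixed_counts = Counter(shard_for("same-key", 4) for _ in range(4000))
```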
rabbitmq-sharding caveats
- Auto-creates and binds the queues to the exchange!
- Must decide how many queue shards per node. Want 3 queue shards on a four-node Rabbit cluster? Sorry.
- Transparently auto-connects client to the shard with the least number of consumers!
- Only on a per node basis, not per cluster
- When one node fills up, producers block, even though another node may be ready to accept messages
- Has a chance of losing messages when unbinding or deleting a queue
- Try it, test it, benchmark it. Fail fast & move on.
Did you know?
- "Each connection uses a file descriptor on the server. Channels don't"
- Publishing a large message on one channel will block a connection while it goes out.
- "it's a good idea to separate publishing and consuming connections"
Source: http://stackoverflow.com/questions/18531072/rabbitmq-by-example-multiple-threads-channels-and-queues
Client-side sharding
- Doesn't ship with Rabbit -- more work to implement
- Shared-nothing RabbitMQ nodes
- Homegrown monitoring to combine metrics from multiple servers
- Implement once for each language we use: Python, Java
- Gives us fail-over!
- Must balance consumers among the shards
Client-side sharding
Producers, Consumers, YAML
- Producers: distribute messages round-robin
- Producers: auto-detect node failure and reconnect after a period of time
- Consumers: Choosing a shard to read from didn't give us an even distribution
- Consumers: "sticky" -- no fail-over to a separate shard
- need extra consumers to handle the failure of a single node
- Configuration is done via YAML
- Every producer knows about every shard
- Update and deploy using salt-cp
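A minimal sketch of the producer behaviour in the bullets above (all names here are invented, and real publishing over amqplib is stubbed behind publish_fn): round-robin across the shards listed in configuration, mark a failed shard dead, and retry it only after a cool-down period.

```python
import time

class ShardedProducer:
    """Round-robin publisher over shared-nothing RabbitMQ shards.
    Failed shards are skipped until retry_after seconds have passed."""

    def __init__(self, shards, publish_fn, retry_after=30.0, clock=time.monotonic):
        self.shards = list(shards)        # e.g. host:port strings from YAML
        self.publish_fn = publish_fn      # publish_fn(shard, message); raises on failure
        self.retry_after = retry_after
        self.clock = clock
        self.dead_until = {}              # shard -> timestamp when it may be retried
        self.next_idx = 0

    def publish(self, message):
        now = self.clock()
        for _ in range(len(self.shards)):
            shard = self.shards[self.next_idx]
            self.next_idx = (self.next_idx + 1) % len(self.shards)
            if self.dead_until.get(shard, 0) > now:
                continue                  # shard is still in its cool-down window
            try:
                self.publish_fn(shard, message)
                return shard
            except ConnectionError:
                self.dead_until[shard] = now + self.retry_after
        raise RuntimeError("all shards unavailable")
```

A consumer-side analogue stays pinned to one shard ("sticky"), which is why extra consumer capacity is needed to absorb the failure of a single node.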
Client-side sharding
Benefits
- If a node dies, we don't have to remove it from the cluster
- Durable queues work well in the face of node failure (shared nothing)
- Mix-and-match rabbit versions (atypical)
Client-side sharding monitoring
Homegrown monitoring
Client-side sharding Caveats
- More work to implement and monitor
- Producers still block when one of the nodes exerts backpressure (flow control)
Federated Queues
- Distribute "the same 'logical' queue... over many brokers."
- Configured via policy
- Mix-and-match RabbitMQ versions
- Performs "best when there is some degree of locality...."
Notes
https://www.rabbitmq.com/federated-queues.html
Pitfall: "cannot currently cause messages to traverse multiple hops between brokers based solely on need for messages in one place. For example, if you federate queues on nodes A, B and C, with A and B connected and B and C connected, but not A and C, then if messages are available at A and consumers waiting at C then messages will not be transferred from A to C via B unless there is also a consumer at B."
Federated Queues
Better together?
- Combine with rabbit sharding plugins or with client-side sharding
- Keep queues drained
Did you know?
- Client can publish to an exchange that doesn't exist, without an error.
Default limits.conf
Default (insufficient) limits.conf for RHEL/CentOS/Fedora
* soft nproc 1024 # threads/processes
* soft nofile 1024 # Number of open files
* hard nofile 4096 # Number of open files
Notes
See /etc/security/limits.d/90-nproc.conf and /etc/security/limits.conf
I don't see "nofile" in any configuration on RHEL/CentOS
limits.conf for RabbitMQ
Here's what I'm using for /etc/security/limits.conf
rabbitmq soft nproc 16384
rabbitmq hard nofile 16000
rabbitmq soft nofile 16000
Did you know?
- A client can't read from a queue that doesn't exist. It causes an error.
- Queues balloon in size when messages keep arriving and no consumers are listening
- Use message TTL
- Consider combining TTL with a dead letter exchange
- Dead-letter exchange can be configured by clients, or via RabbitMQ policy
- rabbitmqctl set_policy DLX ".*" '{"dead-letter-exchange":"my-dlx"}' --apply-to queues
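Clients can set both TTL and the dead-letter exchange at queue-declare time. A sketch of the argument dict (x-message-ttl and x-dead-letter-exchange are RabbitMQ's standard optional queue arguments; the queue and exchange names are examples):

```python
# Expire messages after 60 seconds and route expired or rejected messages
# to the dead-letter exchange "my-dlx" (an example name).
queue_args = {
    "x-message-ttl": 60000,             # TTL in milliseconds
    "x-dead-letter-exchange": "my-dlx",
}

# With amqplib this dict would be passed as the `arguments` parameter:
#   channel.queue_declare(queue="events", durable=True, arguments=queue_args)
# The policy-based alternative from the slide needs no client changes:
#   rabbitmqctl set_policy DLX ".*" '{"dead-letter-exchange":"my-dlx"}' --apply-to queues
```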
Notes
- http://www.rabbitmq.com/ttl.html
- https://www.rabbitmq.com/dlx.html
Fail-over
- What happens to your system when a rabbit node dies?
- Load balancer
- Client-side fail-over
- Consumer capacity
- Note: Durable queues can't be re-created on a separate node in the cluster
Transient vs Durable
- Opinion: The best place to persist messages is at the origin
- Origin can resend when not acknowledged (publisher confirms)
- Transient is faster than durable, although durable survives RabbitMQ restarts
- RabbitMQ is fastest when few or no messages are in the queue
- RabbitMQ stores transient messages to disk when the queue balloons in size
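The "persist at the origin" pattern can be sketched like this (a stand-alone illustration with invented names; the broker, channel, and confirm callbacks are abstracted behind send_fn and handle_confirm): keep every message in a local pending store until the broker confirms it, and resend whatever is still unconfirmed.

```python
class ConfirmingPublisher:
    """Keep unacknowledged messages in a local pending store and resend
    them, so durability lives at the origin rather than in the broker."""

    def __init__(self, send_fn):
        self.send_fn = send_fn       # send_fn(seq, message); broker acks later
        self.pending = {}            # seq -> message, kept until confirmed
        self.seq = 0

    def publish(self, message):
        self.seq += 1
        self.pending[self.seq] = message   # remember before sending
        self.send_fn(self.seq, message)
        return self.seq

    def handle_confirm(self, seq):
        self.pending.pop(seq, None)  # broker confirmed; safe to forget

    def resend_unconfirmed(self):
        for seq, message in sorted(self.pending.items()):
            self.send_fn(seq, message)
```

Combined with publisher confirms, this lets producers use fast transient queues while still being able to replay anything the broker lost.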
Did you know?
- A queue can be bound to multiple exchanges, or the same exchange, multiple times, with different binding keys.
Backpressure and RabbitMQ death
- When RabbitMQ gets busy (persisting messages to disk), it exerts backpressure on producers. Producers block on publish.
- What if we don't want a producer to block?
- RabbitMQ monitors disk space for persistent queues.
- Bug: Out-of-disk space crashes RabbitMQ when transient messages are being paged to disk
- Restarting doesn't clear/fix the disk space consumed
- Remove /var/rabbitmq/mnesia/rabbit@YourHost/msg_store_transient/
- Consider using configuration setting "{vm_memory_high_watermark_paging_ratio, 1.1}"
Notes
- http://stackoverflow.com/questions/10030227
- http://www.rabbitmq.com/memory.html
- https://www.rabbitmq.com/disk-alarms.html
RabbitMQ will block producers when free disk space drops below a certain limit. This is a good idea since even transient messages can be paged to disk at any time, and running out of disk space can cause the server to crash. By default RabbitMQ will block producers, and prevent memory-based messages from being paged to disk, when free disk space drops below 50MB. This will reduce but not eliminate the likelihood of a crash due to disk space being exhausted. In particular, if messages are being paged out rapidly it is possible to run out of disk space and crash in the time between two runs of the disk space monitor. A more conservative approach would therefore be to set the limit to the same as the amount of memory installed on the system (see the configuration below).
....By default 50MB is required to be free on the database partition.
Possible method to prevent RabbitMQ from crashing: Add the following setting to /etc/rabbitmq/rabbitmq.config
"{vm_memory_high_watermark_paging_ratio, 1.1}"
Reference: http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2013-September/030458.html
Grouping messages
- It may be possible to achieve higher throughput by combining multiple application messages into a single Rabbit message
- Trade-off: increased latency
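Grouping can be sketched with simple JSON framing (invented for illustration; any serialization works): the producer packs several application messages into one Rabbit message body, and the consumer unpacks and handles them individually.

```python
import json

def pack(messages):
    """Combine several small application messages into one Rabbit message body."""
    return json.dumps(messages).encode()

def unpack(body):
    """Recover the individual application messages on the consumer side."""
    return json.loads(body.decode())

batch = pack([{"id": 1}, {"id": 2}, {"id": 3}])   # one publish instead of three
```

Fewer, larger publishes amortize per-message broker overhead, but each message now waits until its batch fills or a flush timer fires, hence the added latency.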
Summary 1 of 2
- Configure prefetch
- Try HiPE
- When scaling horizontally, there's more than one way to do it
- Plan for fail-over
- Configure limits.conf
Summary 2 of 2
- Use good tools -- RabbitMQ is a good tool
- It's available, cost-effective, practical
- Try solutions, test solutions. If one idea doesn't work, get feedback, and try something else. Fail fast.