Concurrent Servers: Part 5 - Redis case study

This is part 5 in a series of posts on writing concurrent network servers. After discussing techniques for constructing concurrent servers in parts 1-4, this time we're going to do a case study of an existing production-quality server - Redis.

Redis is a fascinating project and I've been following it with interest for a while now. One of the things I admire most about Redis is the clarity of its C source code. It also happens to be a great example of a high-performance concurrent in-memory database server, so the opportunity to use it as a case study for this series was too good to ignore.

Let's see how the ideas discussed in parts 1-4 apply to a real-world application.

Event-handling library

One of Redis's main claims to fame around the time of its original release in 2009 was its speed - the sheer number of concurrent client connections the server could handle. It was especially notable that Redis did this all in a single thread, without any complex locking and synchronization schemes on the data stored in memory.

This feat was achieved by Redis's own event-driven library, which wraps the fastest event-notification mechanism available on the system (epoll on Linux, kqueue on BSD and so on). This library is called ae. ae makes it possible to write a fast server as long as none of the internals are blocking, which Redis goes to great lengths to guarantee [1].

What mainly interests us here is ae's support of file events - registering callbacks to be invoked when file descriptors (like network sockets) have something interesting pending. Like libuv, ae supports multiple event loops and - having read parts 3 and 4 in this series - the signature of aeCreateFileEvent shouldn't be surprising:

int aeCreateFileEvent(aeEventLoop *eventLoop, int fd, int mask,
                      aeFileProc *proc, void *clientData);

It registers a callback (proc) for new file events on fd, with the given event loop. When using epoll, it will call epoll_ctl to add an event on the file descriptor (either EPOLLIN, EPOLLOUT or both, depending on the mask parameter). ae's aeProcessEvents is the "run the event loop and dispatch callbacks" function, and it calls epoll_wait under the hood.
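To make the mapping from mask to epoll flags concrete, here's a minimal, Linux-only sketch of a loop in the same spirit. This is not ae's code: every mini_* name, the fixed-size table indexed by fd and the sample callback are assumptions made up for illustration, and error handling is kept to the bare minimum.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical mask bits and limits; ae has its own definitions. */
#define MINI_READABLE 1
#define MINI_WRITABLE 2
#define MINI_MAX_FDS 64

typedef void mini_file_proc(int fd, void *client_data, int mask);

typedef struct {
    int mask;                /* 0 means no event is registered for this fd */
    mini_file_proc *proc;
    void *client_data;
} mini_file_event;

typedef struct {
    int epfd;
    mini_file_event events[MINI_MAX_FDS];   /* indexed by file descriptor */
} mini_loop;

int mini_loop_init(mini_loop *loop) {
    for (int i = 0; i < MINI_MAX_FDS; i++) loop->events[i].mask = 0;
    loop->epfd = epoll_create1(0);
    return loop->epfd == -1 ? -1 : 0;
}

/* Register proc for the events in mask, translating the mask to epoll flags. */
int mini_create_file_event(mini_loop *loop, int fd, int mask,
                           mini_file_proc *proc, void *client_data) {
    assert(fd >= 0 && fd < MINI_MAX_FDS);
    mini_file_event *fe = &loop->events[fd];
    /* First registration for this fd adds it; later ones modify it. */
    int op = fe->mask == 0 ? EPOLL_CTL_ADD : EPOLL_CTL_MOD;
    int merged = fe->mask | mask;

    struct epoll_event ee = {0};
    if (merged & MINI_READABLE) ee.events |= EPOLLIN;
    if (merged & MINI_WRITABLE) ee.events |= EPOLLOUT;
    ee.data.fd = fd;
    if (epoll_ctl(loop->epfd, op, fd, &ee) == -1) return -1;

    fe->mask = merged;
    fe->proc = proc;
    fe->client_data = client_data;
    return 0;
}

/* Wait up to timeout_ms, dispatch callbacks, return the number of fired fds. */
int mini_process_events(mini_loop *loop, int timeout_ms) {
    struct epoll_event fired[MINI_MAX_FDS];
    int n = epoll_wait(loop->epfd, fired, MINI_MAX_FDS, timeout_ms);
    if (n == -1) return -1;
    for (int i = 0; i < n; i++) {
        int fd = fired[i].data.fd;
        int mask = 0;
        if (fired[i].events & EPOLLIN) mask |= MINI_READABLE;
        if (fired[i].events & EPOLLOUT) mask |= MINI_WRITABLE;
        mini_file_event *fe = &loop->events[fd];
        if (fe->mask & mask) fe->proc(fd, fe->client_data, mask);
    }
    return n;
}

/* Sample callback: counts invocations through client_data. */
void mini_count_event(int fd, void *client_data, int mask) {
    (void)fd;
    (void)mask;
    (*(int *)client_data)++;
}
```

A caller would register mini_count_event on the read end of a pipe or a socket and then call mini_process_events in a loop. Since epoll is level-triggered by default, the callback keeps firing until the pending data is actually read.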

Handling client requests

Let's trace through the Redis server code to see how ae is used to register callbacks for client events. initServer kicks things off by registering a callback for read events on the listening socket(s): it calls aeCreateFileEvent with the callback acceptTcpHandler. This callback is invoked when new client connections are available. It calls accept [2] and then acceptCommonHandler, which in turn calls createClient to initialize the data structures required to track a new client connection.

createClient's job is to start listening for data coming in from the client. It sets the socket to non-blocking mode (a key ingredient in an asynchronous event loop) and registers another file event callback with aeCreateFileEvent - for read events - readQueryFromClient. This function will be invoked by the event loop every time the client sends some data.

readQueryFromClient does just what we'd expect - parses the client's command and acts on it by querying and/or manipulating data and sending a reply back. Since the client socket is non-blocking, this function has to be able to handle EAGAIN, as well as partial data; data read from the client is accumulated in a client-specific buffer, and the full query may be split across multiple invocations of the callback.
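Here's a minimal sketch of that accumulate-and-parse pattern. It is not Redis's implementation: the query_buf type, the function names, the fixed 4096-byte buffer and the newline-delimited toy protocol (instead of the real Redis protocol) are all assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define QUERY_BUF_CAP 4096   /* hypothetical fixed capacity */

typedef struct {
    char data[QUERY_BUF_CAP];
    size_t len;              /* bytes accumulated so far */
} query_buf;

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Drain whatever a non-blocking fd has available right now into qb.
 * Returns 1 if the peer may send more later (EAGAIN), 0 on EOF,
 * -1 on a real error or when the buffer is full. */
int query_buf_fill(int fd, query_buf *qb) {
    assert(qb->len <= QUERY_BUF_CAP);
    for (;;) {
        if (qb->len == QUERY_BUF_CAP) return -1;   /* sketch: treat as error */
        ssize_t n = read(fd, qb->data + qb->len, QUERY_BUF_CAP - qb->len);
        if (n > 0) {
            qb->len += (size_t)n;
            continue;
        }
        if (n == 0) return 0;                      /* peer closed */
        if (errno == EINTR) continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK) return 1;
        return -1;
    }
}

/* If qb holds a complete newline-terminated command, copy it (without the
 * newline) into out, remove it from qb and return 1. Returns 0 when only a
 * partial command is buffered, -1 when out is too small. */
int query_buf_next_command(query_buf *qb, char *out, size_t out_cap) {
    char *nl = memchr(qb->data, '\n', qb->len);
    if (nl == NULL) return 0;
    size_t cmd_len = (size_t)(nl - qb->data);
    if (cmd_len + 1 > out_cap) return -1;
    memcpy(out, qb->data, cmd_len);
    out[cmd_len] = '\0';
    size_t consumed = cmd_len + 1;
    /* Keep any bytes that belong to the next command. */
    memmove(qb->data, qb->data + consumed, qb->len - consumed);
    qb->len -= consumed;
    return 1;
}
```

A read callback would call query_buf_fill once per invocation and then call query_buf_next_command in a loop until it returns 0; whatever is left over stays in the buffer until the event loop reports the socket readable again.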

Sending data back to clients

In the previous paragraph I said that readQueryFromClient ends up sending replies back to clients. This is logically true, because readQueryFromClient prepares the reply to be sent, but it doesn't actually do the physical sending - since there's no guarantee the client socket is ready for writing/sending data. We have to use the event loop machinery for that.

The way Redis does this is by registering a beforeSleep function to be called every time the event loop is about to go to sleep waiting for sockets to become available for reading/writing. One of the things beforeSleep does is call handleClientsWithPendingWrites. This function tries to send all available replies immediately by calling writeToClient; if some of the sockets are unavailable, it registers an event-loop callback to invoke sendReplyToClient when the socket is ready. This can be seen as a kind of optimization - if the socket is immediately ready for sending (which often is the case for TCP sockets), there's no need to register the event - just send the data. Since sockets are non-blocking, this never actually blocks the loop.
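The "try to write first, register for writability only if needed" idea can be sketched as follows. This is an illustration under assumptions, not Redis's writeToClient: the reply_buf type, the REPLY_* status codes and the function names are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

typedef struct {
    const char *data;   /* reply bytes to send */
    size_t len;         /* total length of the reply */
    size_t sent;        /* how much has been written so far */
} reply_buf;

enum { REPLY_ERROR = -1, REPLY_DONE = 0, REPLY_PENDING = 1 };

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
int reply_fd_set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Write as much of the reply as the non-blocking fd accepts right now.
 * REPLY_DONE:    everything was sent, no event registration is needed.
 * REPLY_PENDING: the socket buffer is full; the caller should register a
 *                "writable" callback and call this function again from it.
 * REPLY_ERROR:   a real write error occurred. */
int reply_try_send(int fd, reply_buf *rb) {
    assert(rb->sent <= rb->len);
    while (rb->sent < rb->len) {
        ssize_t n = write(fd, rb->data + rb->sent, rb->len - rb->sent);
        if (n > 0) {
            rb->sent += (size_t)n;   /* partial writes just advance the offset */
            continue;
        }
        if (n == -1 && errno == EINTR) continue;
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return REPLY_PENDING;
        return REPLY_ERROR;
    }
    return REPLY_DONE;
}
```

In the common case of a short reply and an idle socket, reply_try_send returns REPLY_DONE on the first call and the event loop is never involved. Only a REPLY_PENDING result costs an extra registration and a later callback.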

Why does Redis roll its own event library?

In part 4 we discussed building asynchronous concurrent servers using libuv. It's interesting to ponder the fact that Redis doesn't use libuv, or any similar event library, and instead implements its own - ae, including wrappers for epoll, kqueue and select. In fact, antirez (Redis's creator) answered precisely this question in a blog post in 2011. The gist of his answer: ae is ~770 lines of code he intimately understands; libuv is huge, without providing additional functionality Redis needs.

Today, ae has grown to ~1300 lines, which is still trivial compared to libuv's 26K lines (not counting Windows support, tests, samples and docs). libuv is a far more general library, which makes it more complex and more difficult to adapt to the particular needs of another project; ae, on the other hand, was designed for Redis, co-evolved with Redis and contains only what Redis needs.

This is another great example of the formula for dependencies in software projects that I mentioned in a post earlier this year:

The benefit of dependencies is inversely proportional to the amount of effort spent on a software project.

antirez referred to this, to some extent, in his post. He mentioned that dependencies that provide a lot of added value ("foundational" dependencies in my post) make more sense (jemalloc and Lua are his examples) than dependencies like libuv, whose functionality is fairly easy to implement for the particular needs of Redis.

Multi-threading in Redis

For the vast majority of its history, Redis has been a purely single-threaded affair. Some people find this surprising, but it makes total sense with a bit of thought. Redis is inherently network-bound - as long as the database size is reasonable, for any given client request, much more time is spent waiting on the network than inside Redis's data structures.

These days, however, things are not quite that simple. There are several new capabilities in Redis that use threads:

"Lazy" freeing of memory.
Writing a persistence journal with fsync calls in a background thread.
Running user-defined modules that need to perform a long-running operation.

For the first two features, Redis uses its own simple bio library (the acronym stands for "Background I/O"). The library is hard-coded for Redis's needs and can't be used outside it - it runs a pre-set number of threads, one per background job type Redis needs.
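To give a feel for the shape of such a library, here's a minimal sketch of a single background worker with its own job queue, using POSIX threads. It is not the bio code: all the bg_* names, the linked-list queue and the sample job are assumptions for illustration. A bio-like design would start one such worker per job type.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef void bg_job_fn(void *arg);

typedef struct bg_job {
    bg_job_fn *fn;
    void *arg;
    struct bg_job *next;
} bg_job;

typedef struct {
    pthread_t thread;
    pthread_mutex_t mutex;
    pthread_cond_t wakeup;
    bg_job *head, *tail;     /* FIFO queue protected by mutex */
    int stopping;
} bg_worker;

static void *bg_worker_main(void *p) {
    bg_worker *w = p;
    pthread_mutex_lock(&w->mutex);
    for (;;) {
        while (w->head == NULL && !w->stopping)
            pthread_cond_wait(&w->wakeup, &w->mutex);
        if (w->head == NULL) break;          /* stopping and fully drained */
        bg_job *job = w->head;
        w->head = job->next;
        if (w->head == NULL) w->tail = NULL;
        pthread_mutex_unlock(&w->mutex);
        job->fn(job->arg);                   /* slow work runs outside the lock */
        free(job);
        pthread_mutex_lock(&w->mutex);
    }
    pthread_mutex_unlock(&w->mutex);
    return NULL;
}

/* Start the worker thread; returns 0 on success. */
int bg_worker_start(bg_worker *w) {
    w->head = w->tail = NULL;
    w->stopping = 0;
    pthread_mutex_init(&w->mutex, NULL);
    pthread_cond_init(&w->wakeup, NULL);
    return pthread_create(&w->thread, NULL, bg_worker_main, w);
}

/* Called from the main thread: enqueue a job and return immediately. */
int bg_worker_submit(bg_worker *w, bg_job_fn *fn, void *arg) {
    assert(fn != NULL);
    bg_job *job = malloc(sizeof *job);
    if (job == NULL) return -1;
    job->fn = fn;
    job->arg = arg;
    job->next = NULL;
    pthread_mutex_lock(&w->mutex);
    if (w->tail) w->tail->next = job; else w->head = job;
    w->tail = job;
    pthread_cond_signal(&w->wakeup);
    pthread_mutex_unlock(&w->mutex);
    return 0;
}

/* Let the worker finish all queued jobs, then join it. */
void bg_worker_stop(bg_worker *w) {
    pthread_mutex_lock(&w->mutex);
    w->stopping = 1;
    pthread_cond_signal(&w->wakeup);
    pthread_mutex_unlock(&w->mutex);
    pthread_join(w->thread, NULL);
    pthread_mutex_destroy(&w->mutex);
    pthread_cond_destroy(&w->wakeup);
}

/* Sample job standing in for slow work such as freeing a large object. */
void bg_count_job(void *arg) {
    (*(int *)arg)++;
}
```

The important property for the event loop is that bg_worker_submit only holds the mutex long enough to link a job into the queue, so the main thread hands off the slow operation without waiting for it.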

For the third feature, Redis modules can define new Redis commands, and thus are held to the same standards as regular Redis commands, including not blocking the main thread. If a custom Redis command defined in a module wants to perform a long-running operation, it has to spin up a thread to run it in the background. src/modules/helloblock.c in the Redis tree provides an example.

With these features, Redis combines an event loop with threading to get both speed in the common case and flexibility in the general case, similarly to the work queue discussion in part 4 of this series.

[1]	A core aspect of Redis is its being an in-memory database; therefore, queries should never take too long to execute. There are all kinds of complications, however. In case of partitioning, a server may end up routing the request to another instance; in this case async I/O is used to avoid blocking other clients.

[2]	Through `anetAccept`; `anet` is Redis's wrapper for TCP socket code.