Handling out-of-memory conditions in C

October 30th, 2009 at 7:57 am

We’ve all been taught that when malloc returns 0, it means the machine ran out of memory. This case should be detected and "handled" by our application in some graceful manner. But what does "handled" mean here? How does an application recover from an out of memory (OOM) condition? And what about the increased code complexity of checking all those malloc return values and passing them around?

In this article I want to discuss the common policies for handling OOM conditions in C code. There is no single right approach, so I will review the code of several popular applications and libraries to find out how they do it, hoping to gain useful insights for my own programming.

Note that I focus on desktop & server applications here, not embedded applications, which deserve an article of their own.

The policies

Minor variations aside, it’s safe to say there are three major policies for handling OOM:

recovery

The recovery policy is the least commonly used because it’s the most difficult to implement, and is highly domain-specific. This policy dictates that an application has to gracefully recover from an OOM condition. By "gracefully recover", we usually mean one or more of:

  • Release some resources and try again
  • Save the user’s work and exit
  • Clean up temporary resources and exit

Recovery is hard. To be certain that your application recovers correctly, you must be sure that the steps it takes don’t require any more dynamic memory allocation. This sometimes isn’t feasible, and it is always difficult to implement correctly. Since C has no exceptions, memory allocation errors must be carefully propagated to the point where they can be recovered from, and this sometimes means through multiple levels of function calls.
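To make the propagation concrete, here is a minimal sketch (the function names are hypothetical, invented for illustration) of how an allocation failure can be passed up through two levels of calls to a point where recovery is possible:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical two-level call chain: each function reports allocation
 * failure to its caller instead of trying to handle it locally. */

/* Returns a heap-allocated copy of s, or NULL on OOM. */
static char *copy_string(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p == NULL)
        return NULL;            /* propagate the failure upward */
    strcpy(p, s);
    return p;
}

/* Returns 0 on success, -1 on OOM; the top-level caller decides
 * how to recover (free caches, save work, exit cleanly, ...). */
static int load_config(const char *path, char **out)
{
    char *copy = copy_string(path);
    if (copy == NULL)
        return -1;              /* propagate again, one level up */
    *out = copy;
    return 0;
}
```

Every intermediate function must remember to check and forward the error, which is exactly the complexity the abort policy avoids.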

abort

The abort policy is simple and familiar: when no memory is available, print a polite error message and exit (abort) the application. This is the most commonly used policy – most command-line tools and desktop applications use it.

As a matter of fact, this policy is so common that many Unix programs use the gnulib library function xmalloc instead of malloc:

void *
xmalloc (size_t n)
{
  void *p = malloc (n);
  if (!p && n != 0)
    xalloc_die ();
  return p;
}

Callers of this function don’t need to check its return value, which reduces the code’s complexity. Here’s a representative usage from the find utility:

cur_path = xmalloc (cur_path_size);
strcpy (cur_path, pathname);
cur_path[pathname_len - 2] = '/';

segfault

The segfault policy is the most simplistic of all: don’t check the return value of malloc at all. In case of OOM, a NULL pointer will get dereferenced, so the program will die with a segmentation fault.

If there are proponents of this policy, they’d probably say – "Why abort with an error message, when a segmentation fault would do? With a segfault, we can at least inspect the core dump and find out where the fault was".

Examples – libraries

In this section, I present the OOM policies of a couple of well-known libraries.

Glib

Glib is a cross-platform utility library in C, used most notably by GTK+. At first sight, Glib’s approach to memory allocation is flexible. It provides two functions (with several variations):

  • g_malloc: attempts to allocate memory and exits with an error if the allocation fails, using g_error [1]. This is the abort policy.
  • g_try_malloc: attempts to allocate memory and just returns NULL if that fails, without aborting.

This way, Glib leaves the choice of policy to the programmer. However, the story doesn’t end here. What does Glib use for its own utilities? Let’s check g_array, for instance. Allocation of a new array is done by calling g_array_maybe_expand, which uses g_realloc, which is implemented with the same abort policy as g_malloc – it aborts when the memory can’t be allocated.

Curiously, Glib isn’t consistent with this policy. Many modules use g_malloc, but a couple (such as the gfileutils module) use g_try_malloc and notify the caller on memory allocation errors.

So what do we have here? It seems that one of the most popular C libraries out there uses the abort policy of memory allocations. Take that into account when writing applications that make use of Glib – if you’re planning some kind of graceful OOM recovery, you’re out of luck.

SQLite

SQLite is an extremely popular and successful embedded database [2]. It is a good example to discuss, since high reliability is one of its declared goals.

SQLite’s memory management scheme is very intricate. The user has several options for handling memory allocation:

  • A normal malloc-like scheme can be used
  • Allocation can be done from a static buffer that’s pre-allocated at initialization
  • A debugging memory allocator can be used to debug memory problems (leaks, out-of-bounds conditions, and so on)
  • Finally, the user can provide his own allocation scheme

I’ll examine the default allocation configuration, which uses the normal system malloc. The SQLite wrapper for it, sqlite3MemMalloc, defined in mem1.c, is:

static void *sqlite3MemMalloc(int nByte){
  sqlite3_int64 *p;
  assert( nByte>0 );
  nByte = ROUND8(nByte);
  p = malloc( nByte+8 );
  if( p ){
    p[0] = nByte;
    p++;
  }
  return (void *)p;
}

malloc is used to obtain the memory. Moreover, the size of the allocation is saved right in front of the block. This is a common idiom for allocators that can report the size of an allocated block given only its pointer [3].
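The idiom is easy to reproduce in a few lines of plain C. This is a simplified sketch of the same idea, not SQLite’s actual code (in particular, it ignores SQLite’s ROUND8 size rounding):

```c
#include <stdint.h>
#include <stdlib.h>

/* Size-prefix idiom: store the requested size in an 8-byte header
 * placed just before the block handed to the user. */
static void *size_malloc(size_t n)
{
    int64_t *p = malloc(n + sizeof(int64_t));
    if (p == NULL)
        return NULL;            /* let the caller decide what to do */
    p[0] = (int64_t)n;          /* remember the size in the header */
    return (void *)(p + 1);     /* user sees the memory after the header */
}

/* Report the size of a block previously returned by size_malloc. */
static size_t size_of_block(void *user)
{
    if (user == NULL)
        return 0;
    return (size_t)((int64_t *)user)[-1];   /* read back the header */
}

static void size_free(void *user)
{
    if (user != NULL)
        free((int64_t *)user - 1);  /* free from the real start of the block */
}
```

The 8-byte header also keeps the user-visible pointer suitably aligned on common platforms, which is why SQLite allocates nByte+8.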

As you can see, the pointer obtained from malloc is returned. Hence, SQLite leaves it to the user to handle an OOM condition. This is obviously the recovery policy.

Examples – applications

In this section, I look at OOM handling in a few relatively popular applications.

Git

Distributed version control is all the rage nowadays, and Linus Torvalds’ Git is one of the most popular tools used in that domain.

Git defines its own xmalloc wrapper:

void *xmalloc(size_t size)
{
      void *ret = malloc(size);
      if (!ret && !size)
              ret = malloc(1);
      if (!ret) {
              release_pack_memory(size, -1);
              ret = malloc(size);
              if (!ret && !size)
                      ret = malloc(1);
              if (!ret)
                      die("Out of memory, malloc failed");
      }
#ifdef XMALLOC_POISON
      memset(ret, 0xA5, size);
#endif
      return ret;
}

When it runs out of memory, Git attempts to free resources and retries the allocation. This is an example of the recovery policy. If the allocation doesn’t succeed even after releasing the resources, Git aborts.

lighttpd

Lighttpd is a popular web server, notable for its speed and low memory footprint.

There are no OOM checks in Lighttpd – it uses the segfault policy. Here are a few samples.

From network_server_init:

srv_socket = calloc(1, sizeof(*srv_socket));
srv_socket->fd = -1;

From rewrite_rule_buffer_append:

kvb->ptr = malloc(kvb->size * sizeof(*kvb->ptr));

for(i = 0; i < kvb->size; i++) {
        kvb->ptr[i] = calloc(1, sizeof(**kvb->ptr));

And there are countless other examples. It’s interesting to note that Lighttpd uses the lemon parser generator, which itself adheres to the abort policy. Here’s a representative example:

PRIVATE acttab *acttab_alloc(void){
  acttab *p = malloc( sizeof(*p) );
  if( p==0 ){
    fprintf(stderr,"Unable to allocate memory for a new acttab.");
    exit(1);
  }
  memset(p, 0, sizeof(*p));
  return p;
}

Redis

Redis is a key-value database that can store lists and sets as well as strings. It runs as a daemon and communicates with clients using TCP/IP.

Redis implements its own size-aware memory allocation function, zmalloc, which returns the value of malloc without aborting automatically when it’s NULL. All the internal utility modules in Redis faithfully propagate a NULL from zmalloc up to the application layer. When the application layer detects a returned NULL, it calls the oom function, which does the following:

/* Redis generally does not try to recover from out
 * of memory conditions when allocating objects or
 * strings, it is not clear if it will be possible
 * to report this condition to the client since the
 * networking layer itself is based on heap
 * allocation for send buffers, so we simply abort.
 * At least the code will be simpler to read... */
static void oom(const char *msg) {
    fprintf(stderr, "%s: Out of memory\n",msg);
    fflush(stderr);
    sleep(1);
    abort();
}

Note the comment above this function [4]. It very clearly and honestly summarizes why the abort policy is usually the most logical one for applications.

Conclusion

In this article, I explained the various OOM policies and showed many examples from real-world libraries and applications. Clearly, not all tools, even commonly used ones, are perfect in terms of OOM handling. But how should you write your own code?

If you’re writing a library, you most certainly should use the recovery policy. It’s impolite at best, and renders your library unusable at worst, to abort or dump core on an OOM condition. Even if the application that uses your library isn’t some high-reliability life-support controller, it may have ideas of its own for handling OOM (such as logging it somewhere central). A good library does not impose its style and idiosyncrasies on the calling application.

This makes the code a bit more difficult to write, though not by much. Library code is usually not very deeply nested, so there isn’t a lot of error propagation up the calling stack to do.

For extra points, you can allow the application to specify the allocators and error handlers your library will use. This is a good approach for ultra-flexible, customize-me-to-the-death libraries like SQLite.
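As a sketch of what such customization might look like (the names and struct layout here are hypothetical, not SQLite’s actual API), a library can route all its allocations through a small table of function pointers that the application may override:

```c
#include <stdlib.h>

/* Hypothetical library-side hooks: the application may install its own
 * allocator and OOM handler; the library defaults to the system ones. */
typedef struct {
    void *(*xmalloc)(size_t);
    void  (*xfree)(void *);
    void  (*on_oom)(size_t requested);
} mem_methods;

static mem_methods mem = { malloc, free, NULL };

/* Called by the application, once, before using the library. */
void lib_set_mem_methods(const mem_methods *m)
{
    mem = *m;
}

/* All internal allocation goes through this single choke point. */
void *lib_alloc(size_t n)
{
    void *p = mem.xmalloc(n);
    if (p == NULL && mem.on_oom != NULL)
        mem.on_oom(n);          /* notify the application; don't abort */
    return p;
}
```

The single choke point is the key design choice: once every allocation funnels through one function, both the allocator and the failure policy become swappable without touching the rest of the library.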

If you’re writing an application, you have more choices. I’ll be bold and say that if your application needs to be so reliable that it must recover from OOM in a graceful manner, you are probably a programmer too advanced to benefit from this article. Anyway, recovery techniques are out of scope here.

Otherwise, IMHO the abort policy is the best approach. Wrap your allocation functions with some wrapper that aborts on OOM – this will save you a lot of error checking code in your main logic. The wrapper does more: it provides a viable path to scale up in the future, if required. Perhaps when your application grows more complex you’ll want some kind of gentle recovery like Git does – if all the allocations in your application go through a wrapper, the change will be very easy to implement.
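Such a wrapper can be tiny and still leave a seam for later growth. Here is one possible sketch – try_release is a hypothetical hook, NULL by default, that a more mature application could later point at a cache-releasing function, much like Git does with release_pack_memory:

```c
#include <stdio.h>
#include <stdlib.h>

/* Optional hook for future recovery: if set, it gets one chance to
 * free some memory before we give up. NULL means plain abort policy. */
static void (*try_release)(size_t needed) = NULL;

/* Abort-on-OOM wrapper: callers never need to check the result. */
void *xmalloc(size_t size)
{
    void *p = malloc(size ? size : 1);   /* malloc(0) may legally return NULL */
    if (p == NULL && try_release != NULL) {
        try_release(size);               /* let the app release resources */
        p = malloc(size ? size : 1);     /* ... and retry once */
    }
    if (p == NULL) {
        fprintf(stderr, "fatal: out of memory (%zu bytes)\n", size);
        abort();
    }
    return p;
}
```

Note the malloc(size ? size : 1) trick: as with Git’s xmalloc, it guarantees the wrapper always returns a pointer suitable to be passed to free, even for zero-byte requests.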

[1]

The documentation of g_error states:

A convenience function/macro to log an error message. Error messages are always fatal, resulting in a call to abort() to terminate the application. This function will result in a core dump; don’t use it for errors you expect. Using this function indicates a bug in your program, i.e. an assertion failure.

[2] Embedded in the sense that it can be embedded into other applications. Just link to the 500K DLL and use the convenient and powerful API – and you have a fast and robust database engine in your application.
[3] Here’s the size-checking function from the same file:
static int sqlite3MemSize(void *pPrior){
  sqlite3_int64 *p;
  if( pPrior==0 ) return 0;
  p = (sqlite3_int64*)pPrior;
  p--;
  return (int)p[0];
}
[4] I’ve reformatted it to fit on the blog page without horizontal scrolling.


19 Responses to “Handling out-of-memory conditions in C”

  1. Roger Says:

    BTW you are *completely* wrong about how SQLite handles out of memory. As you point out there are several different allocators, some more that come with the source you didn’t mention, plus you can provide your own. The allocator should return NULL when they have no memory to allocate.

    However the error handling is always the same – SQLite always backs off and ultimately fails the original library request. There is no crash or unrecoverability. It just keeps working. If you examine a lot more of the source you will see this. Additionally memory allocation failures are comprehensively tested by the test suite (along with 100% decision and branch coverage). SQLite also has transactions so again the right thing always happens there – it always remains ACID even in the face of memory allocation failures.

    The reason behind all this is quite simple – it is run on low end devices such as MP3 players where they may have 32kb or 64kb of memory (yes, kilobytes). They will run out of memory and SQLite cannot crash or become inoperable.

  2. André Says:

    In git’s xmalloc – wouldn’t the memset write too much if there’s been an OOM? Because then it has alloced one byte but it tries to write size bytes? I’m just curious.

    Great article though. You might mention that in certain cases one allocates memory from designated pools so when e.g the connection pool runs out one cannot accept new connections but still use the existing ones.

  3. eliben Says:

    Roger,

    How am I completely wrong? There’s no conflict between what I wrote and your comment. SQLite returns NULL to the user when it has no memory to allocate, so it’s the recovery policy. In other words, SQLite does the right thing – a library should notify the calling code about OOM instead of crashing or aborting. Did you read the article at all??

    Andre,

    I guess it would. But so would the user, having been “granted” his request for 10Kbytes with only a single byte allocated. Something is fishy about this :-)

  4. Aivars Says:

    Eli and André – you’re both wrong about Git. Git tries to allocate 1 byte if and only if previous allocation failed and 0 bytes were requested:

    if (!ret && !size) /* that’s ret == NULL && size == 0 */

    And malloc man page says:
    If size was equal to 0, either NULL or a pointer suitable to be passed to free() is returned.

    Git version of xmalloc just makes sure that it always returns “a pointer suitable to be passed to free() “

  5. eliben Says:

    Aivars,

    You’re right, of course, I didn’t notice that. I’ll fix the article. Thanks.

  6. banpei Says:

    GLib lets you change the malloc-like function called by g_malloc & friends to allocate memory. Take a look at http://library.gnome.org/devel/glib/unstable/glib-Memory-Allocation.html#GMemVTable

  7. Asim Says:

    Quoting your article:

    “Recovery is hard. To be certain that your application recovers correctly, you must be sure that the steps it takes don’t require any more dynamic memory allocation. This sometimes isn’t feasible and always difficult to implement correctly.”

    I agree with your statement, but for some reason I was reminded for this hilarious article I read a while back:

    http://www.gamasutra.com/view/feature/4111/dirty_coding_tricks.php?print=1

    What do people think about the idea of heap allocating a block of memory that’s never used, and then in case of a malloc fail to free this empty piece of memory and attempt to recover from there-on out? Would this have solved Redis’s problem?

    There are important questions unanswered, like how do you know how much memory to reserve, and is such an extreme waste of resources worth it in the face of the improbability of a malloc fail? I’m not sure, this is just an idea.

  8. Nonane Says:

    In the popular mail delivery server qmail, the app keeps retrying the allocation until it succeeds – it sleeps for a few seconds before each retry. I think the motivation is that the server may temporarily run out of resources, but something as important as a mail server should do its best to recover.

  9. Earl Lapus Says:

    I use the if-malloc-returned-null-then-exit approach for as long as I can remember. I guess it’s because that approach was the usual idiom that I grew accustomed to. But, after reading this I thought about how often such OOM errors occur AND looking back, I haven’t really encountered/experienced any of my C programs running out of memory. Funny… it’s like having a bomb shelter in a time and place where not a single bomb exists.

  10. Nolan Says:

    It’s nice seeing how tried-and-tested real world code handles memory allocation. Another interesting comparison would be how the various memory-safe language implementations handle their mallocs.

    If you wrote more posts about the inner workings of common projects, I for one wouldn’t complain!

  11. kl Says:

    OOM recovery is pointless on linux, because malloc() *NEVER RETURNS 0*. Linux has overcommit and OOM-Killer.

    http://linux-mm.org/OOM_Killer

    Your error recovery code will never run. In case of real OOM error, your process may be killed (or another process will be killed to allocate memory for you).

    Oh, and this may happen outside of malloc(). You get a pointer to memory that isn’t allocated yet (the page will be allocated when you access it).

  12. Alejandro Weinstein Says:

    The last version of the Embedded Muse, by Jack Ganssle, talk about this (http://www.ganssle.com/tem/tem183.pdf, go the “Too much optimism” section). An excerpt follows:

    “”By the time malloc fails, the system is just plain fubared. I hate deeply convoluted error handling code that checks the return code of malloc….

    “* Then goes through horrible contortions to attempt to recover sanity.
    “* And has _never_ been tested.
    “* And God help it if it either directly, or anywhere else on it’s call graph, invokes malloc again! Which it will if it tries to printf anything!

    “So what is the correct solution?

    “Wrap malloc in a function that checks the return code, and if it fails uses statically pre-allocated space to log a stack trace, then system error and reboot!

    “But that sounds just like what you are ranting against! Hear me out to find the correct solution.

    “What has gone wrong is you have exceeded the design load for that system. Suppose you were told to design a ferry to carry cars. But you couldn’t get management to decide how many cars it should carry. So you designed it so you could load any number of cars.

    “When the ferry starts sinking and taking on water, on every gunwale there is a “water coming in” detector. Attached to each of those detectors is an amazingly complex one-off uniquely-designed gadget which you can’t test without sinking a loaded ferry (for many highly customised loads), that flings the last few cars into the water!

    “One solution is to tell management to get their butts into gear and actually _decide_ on the designed for carrying capacity of the ferry. And then design mechanisms to only allow that many on.

    “Your customers are OK with knowing that their system can only handle a finite load.”

  13. Matthew Dempsky Says:

    Like nonane said, qmail is an example of a server that gracefully handles out-of-memory conditions. Most of the code when it hits a memory limit will clean up and return a proper error response to the caller, while the main daemon (qmail-send) will print a warning message about memory and sleep for 10 seconds before trying again for some very important allocations (e.g., loading and reloading config files, rewriting some mail addresses, maintaining the undelivered mail priority queues).

    djbdns is similar: dnscache allocates all of its cache memory up front, but needs some extra temporary memory for resolving a query and processing DNS response packets. If any of these requests fail, it will give up on just that individual query, clean up the temporary memory allocated, and print a warning message about it.

  14. Roger Says:

    Yes I did read the article (twice).

    You are wrong because SQLite does not return NULL to the user. It returns NULL to the other routines in the SQLite library which then gracefully handle it, sometimes returning the SQLITE_NOMEM error code. In general a developer using SQLite would never have any reason to call sqlite3_malloc directly. Additionally SQLite has several uses of memory such as as a cache, keeping an ongoing transaction in memory as much as possible before taking a lock on the database, scratch memory etc. It will never leak memory. You can read all the details at http://www.sqlite.org/malloc.html

  15. jldugger Says:

    Have you actually tried to trigger OOM conditions? One day after I got dinged by a professor for not checking malloc returns, I decided to test it out and see what the failure mode was and how to fix it. What I discovered is that the kernels in my testing would kill the process on malloc failure. You never get a chance at freeing up internal space or exiting cleanly.

    Am I missing something here?

  16. Ryan Says:

    jldugger: Kl’s comment is spot on.

    To write portable code it is wise to check for a NULL upon a malloc return; however, on Linux, with the optimistic allocator enabled (which it is by default), malloc will never return a NULL. Since the memory page is allocated on a write to the pointer, the out-of-memory condition will cause the kernel to kill the process.

    The proper action to take in an OOM condition is pretty much up to the application. Some applications can simply be restarted, but a restart is not the right course of action for health care or mission critical applications. I like getting a backtrace of an offending application, so I just let most of my applications just die to get a core file.

  17. sumerc Says:

    To write portable code, one shall not assume malloc() never returns NULL. Even the maintainer of glibc, Ulrich Drepper uses the same approach as git. I am assuming git has taken that code from him, because even the function names are same:) xmalloc. See : people.redhat.com/drepper/optimtut2.ps.gz

  18. Philluminati Says:

    The comment by kl is wrong. Linux (not that we’re even targeting that in this post) can have overcommit disabled so that malloc will return 0.

    Overcommit works by doubly allocating the same memory pointer to two applications and hoping one of them doesn’t actually write to that memory. It’s surprisingly common, allegedly.

    Anyway that’s why Torvalds’ own git project uses the memset (…, size) function. Not to ensure it’s a certain default value… but to force the kernel into an OOM kill straight away… or not, rather than resulting in a hard-to-trace call stack.

  19. Robin Perun Says:

    It’s very useful for me, thanks
