Construction of function static variables in C++ is not thread safe

Update (26-Aug-2017): C++11 changes the guarantees for thread safety of function static variables: the standard now requires their initialization to be thread safe. See this page for an example discussion. This article should still be of interest for historical reasons, and for understanding the underlying issue and the behavior of older compilers.

Here's a short quiz. What will the following code print?

#include <iostream>

using namespace std;

class Foo {
public:
    Foo(const char* s = "") {
        cerr << "Constructing Foo with " << s << endl;
    }
};

void somefunc()
{
    static Foo funcstatic("funcstatic");
    Foo funcauto("funcauto");
}

static Foo glob("global");

int main()
{
    cerr << "Entering main\n";
    somefunc();
    somefunc();
    somefunc();
    return 0;
}

Try to think about it for a moment before reading on. Foo is a dummy class with the sole purpose of demonstrating when its constructor is being called. There are a few Foo instances here: one global, one function static (by which I mean static in a function scope) and one function local (automatic).

Recently I ran into (a variation of) this code and was surprised that its output is:

Constructing Foo with global
Entering main
Constructing Foo with funcstatic
Constructing Foo with funcauto
Constructing Foo with funcauto
Constructing Foo with funcauto

What's surprising here is that funcstatic is constructed after entering main. More precisely, it's constructed when somefunc is first called. Why was I surprised? Because I always kind of assumed that function static variables are handled similarly to global static variables, except that their visibility is limited to the function. While this is true in C, it's only partially true in C++, and here's why.

In C++, variables not only have to be initialized - sometimes, they also have to be constructed. For POD (Plain Old Data) types the behavior is C-like: the compiler just writes the initialization value into the .data segment, and no special code is required. For types with custom constructors this can't work; some code has to be generated to call these constructors at run time.

It turns out that in the case of function static variables, this code can be placed in the function itself, and is thus executed when the function is first called. This behavior is actually allowed by the C++ standard. Here's an excerpt from section 6.7 of a working draft (N1095) of the current C++ standard (C++98):

The zero-initialization (8.5) of all local objects with static storage duration (3.7.1) is performed before any other initialization takes place. A local object of POD type (3.9) with static storage duration initialized with constant-expressions is initialized before its block is first entered. An implementation is permitted to perform early initialization of other local objects with static storage duration under the same conditions that an implementation is permitted to statically initialize an object with static storage duration in namespace scope (3.6.2). Otherwise such an object is initialized the first time control passes through its declaration; such an object is considered initialized upon the completion of its initialization.

What this means, less formally, is that the compiler may construct function static objects early (before main, as it does for globals) under certain conditions, but it is equally free to construct them inside the function, the first time control passes through their declaration.

And apparently, most modern compilers indeed choose to construct function static objects when the function is first called. This makes sense as an optimization - calling too many constructors before main runs can have a negative impact on program start-up. Not to mention that dependencies between statically constructed objects are one of the biggest headaches C++ has to offer.

But herein lies a problem: this construction of function static variables is not thread safe! If somefunc is called from multiple threads, the constructor of funcstatic may end up being called more than once, or one thread may use funcstatic before another thread has finished constructing it. After all, being static, funcstatic is shared between all threads. The C++ standard doesn't protect us from this happening - it doesn't even acknowledge the existence of threads (this is C++98 we're talking about).

So keep this in mind: such code is not thread safe. You cannot assume that in the presence of multiple threads the function static variable will be constructed only once. It is the job of the programmer to guarantee this won't happen.

This is the main point I wanted to make in this post. The rest is going to examine in more detail the code generated by popular compilers for this scenario and discuss the implications.

Let's start with MS Visual C++ 2008. Here's the disassembly of somefunc, skipping the function prologue:

    static Foo funcstatic("funcstatic");
00E314FD  mov         eax,dword ptr [$S1 (0E3A148h)]
00E31502  and         eax,1
00E31505  jne         somefunc+71h (0E31531h)
00E31507  mov         eax,dword ptr [$S1 (0E3A148h)]
00E3150C  or          eax,1
00E3150F  mov         dword ptr [$S1 (0E3A148h)],eax
00E31514  mov         dword ptr [ebp-4],0
00E3151B  push        offset string "funcstatic" (0E3890Ch)
00E31520  mov         ecx,offset funcstatic (0E3A14Ch)
00E31525  call        Foo::Foo (0E31177h)
00E3152A  mov         dword ptr [ebp-4],0FFFFFFFFh
    Foo funcauto("funcauto");
00E31531  push        offset string "funcauto" (0E38900h)
00E31536  lea         ecx,[ebp-11h]
00E31539  call        Foo::Foo (0E31177h)

Here's what this does: a special flag is kept in memory (at address 0x0E3A148 for this particular run). Its goal is to make sure the constructor of funcstatic is only called once. The code fetches the flag into eax and looks at its lowest bit. If that bit is already set, it skips the constructor call and jumps to the next source line. Otherwise, it sets the lowest bit to 1 and only then calls the constructor.

The idea here is obvious: the flag ensures the constructor is called only once. Note, however, how the code blissfully ignores the existence of threads. Suppose two threads, A and B, enter somefunc simultaneously. Both can check the flag at the same time, see that it's still 0, and then call the constructor. Nothing here prevents that from happening. And this is perfectly fine according to the C++98 standard.

With GCC, however, things get more interesting. Here's the same function compiled with g++ -O0 -g:

0000000000400a9d <_Z8somefuncv>:
  400a9d:  55                      push   rbp
  400a9e:  48 89 e5                mov    rbp,rsp
  400aa1:  48 83 ec 40             sub    rsp,0x40
  400aa5:  b8 a8 21 60 00          mov    eax,0x6021a8
  400aaa:  0f b6 00                movzx  eax,BYTE PTR [rax]
  400aad:  84 c0                   test   al,al
  400aaf:  75 76                   jne    400b27 <_Z8somefuncv+0x8a>
  400ab1:  bf a8 21 60 00          mov    edi,0x6021a8
  400ab6:  e8 cd fd ff ff          call   400888 <__cxa_guard_acquire@plt>
  400abb:  85 c0                   test   eax,eax
  400abd:  0f 95 c0                setne  al
  400ac0:  84 c0                   test   al,al
  400ac2:  74 63                   je     400b27 <_Z8somefuncv+0x8a>
  400ac4:  c6 45 df 00             mov    BYTE PTR [rbp-0x21],0x0
  400ac8:  be aa 0c 40 00          mov    esi,0x400caa
  400acd:  bf b0 21 60 00          mov    edi,0x6021b0
  400ad2:  e8 89 00 00 00          call   400b60 <_ZN3FooC1EPKc>
  400ad7:  c6 45 df 01             mov    BYTE PTR [rbp-0x21],0x1
  400adb:  bf a8 21 60 00          mov    edi,0x6021a8
  400ae0:  e8 03 fe ff ff          call   4008e8 <__cxa_guard_release@plt>
  400ae5:  eb 40                   jmp    400b27 <_Z8somefuncv+0x8a>
  400ae7:  48 89 45 c8             mov    QWORD PTR [rbp-0x38],rax
  400aeb:  48 89 55 d0             mov    QWORD PTR [rbp-0x30],rdx
  400aef:  8b 45 d0                mov    eax,DWORD PTR [rbp-0x30]
  400af2:  89 45 ec                mov    DWORD PTR [rbp-0x14],eax
  400af5:  48 8b 45 c8             mov    rax,QWORD PTR [rbp-0x38]
  400af9:  48 89 45 e0             mov    QWORD PTR [rbp-0x20],rax
  400afd:  0f b6 45 df             movzx  eax,BYTE PTR [rbp-0x21]
  400b01:  83 f0 01                xor    eax,0x1
  400b04:  84 c0                   test   al,al
  400b06:  74 0a                   je     400b12 <_Z8somefuncv+0x75>
  400b08:  bf a8 21 60 00          mov    edi,0x6021a8
  400b0d:  e8 06 fe ff ff          call   400918 <__cxa_guard_abort@plt>
  400b12:  48 8b 45 e0             mov    rax,QWORD PTR [rbp-0x20]
  400b16:  48 89 45 c8             mov    QWORD PTR [rbp-0x38],rax
  400b1a:  48 63 45 ec             movsxd rax,DWORD PTR [rbp-0x14]
  400b1e:  48 8b 7d c8             mov    rdi,QWORD PTR [rbp-0x38]
  400b22:  e8 11 fe ff ff          call   400938 <_Unwind_Resume@plt>
  400b27:  48 8d 7d ff             lea    rdi,[rbp-0x1]
  400b2b:  be b5 0c 40 00          mov    esi,0x400cb5
  400b30:  e8 2b 00 00 00          call   400b60 <_ZN3FooC1EPKc>
  400b35:  c9                      leave
  400b36:  c3                      ret

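What's going on here? It turns out that since version 4, GCC generates "guard" calls that ensure multi-threaded safety for this kind of initialization. The code first tests the first byte of the guard variable and skips initialization if it's nonzero. Otherwise it calls __cxa_guard_acquire, which makes competing threads wait and returns nonzero only if this thread should run the constructor; __cxa_guard_release then marks the object as initialized. If the constructor throws, the landing pad calls __cxa_guard_abort so that a later call can retry. These functions are described in the relevant section of the Itanium C++ ABI, which GCC follows. GCC also allows disabling these guards by passing the -fno-threadsafe-statics flag during compilation. With this flag, the code GCC generates for our sample is quite similar to the code generated by MSVC.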

On one hand, this is nice of GCC to do. On the other hand, it's one of those things that introduce insidious portability problems. Develop the code for GCC and everything is peachy for function static constructors - no multithreading problems because of the guard code. Then port the code to Windows and start witnessing intermittent failures due to races between threads. Not fun.

The only solution is, of course, to write code that adheres to the C++ standard and doesn't make assumptions the standard doesn't guarantee.