Update (26-Aug-2017): C++11 changed the thread-safety guarantees for function static variables: since C++11, their initialization is guaranteed to be thread safe. See this page for an example discussion. This article should still be interesting for historical reasons, and for a better understanding of the underlying issue and of the behavior of older compilers.
Here's a short quiz. What will the following code print?
```cpp
#include <iostream>
using namespace std;

class Foo {
public:
    Foo(const char* s = "") {
        cerr << "Constructing Foo with " << s << endl;
    }
};

void somefunc()
{
    static Foo funcstatic("funcstatic");
    Foo funcauto("funcauto");
}

static Foo glob("global");

int main()
{
    cerr << "Entering main\n";
    somefunc();
    somefunc();
    somefunc();
    return 0;
}
```
Try to think about it for a moment before reading on. Foo is a dummy class with the sole purpose of demonstrating when its constructor is being called. There are a few Foo instances here: one global, one function static (by which I mean static in a function scope) and one function local (automatic).
Recently I ran into (a variation of) this code and was surprised that its output is:
```
Constructing Foo with global
Entering main
Constructing Foo with funcstatic
Constructing Foo with funcauto
Constructing Foo with funcauto
Constructing Foo with funcauto
```
What's surprising here is that funcstatic is constructed after main is entered. More precisely, it is constructed when somefunc is first called. Why was I surprised? Because I had always kind of assumed that function static variables are handled like global static variables, except that their visibility is limited to the function. While this is true in C, it's only partially true in C++, and here's why.
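One way to confirm this ordering without reading stderr is to have Foo append to a log instead of printing. The sketch below is a variant of the quiz program under that assumption: g_log is a hypothetical name that is not in the original code, and main is left out so that whatever calls somefunc can inspect the log.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical replacement for the cerr output: construction events, in order.
std::vector<std::string> g_log;

class Foo {
public:
    Foo(const char* s = "") { g_log.push_back(s); }
};

void somefunc()
{
    static Foo funcstatic("funcstatic");  // constructed on the first call only
    Foo funcauto("funcauto");             // constructed on every call
}

// Defined after g_log in the same translation unit, so g_log is ready when
// this constructor runs. In practice this happens before main is entered.
static Foo glob("global");
```

Calling somefunc three times from main leaves the log as global, funcstatic, funcauto, funcauto, funcauto, matching the output above.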
In C++, variables not only have to be initialized - sometimes, they also have to be constructed. While for POD (Plain Old Data) types the behavior is C-like (the compiler just writes the initialization value into the .data segment, no special code required), for types with custom constructors this can't work. Some code has to be generated to call these constructors.
It turns out that, in the case of function static variables, this code can be placed inside the function, so it is executed when the function is first called. This behavior is actually allowed by the C++ standard. Here's an excerpt from section 6.7 of a working draft (N1095) of the current C++ standard (C++98):
The zero-initialization (8.5) of all local objects with static storage duration (3.7.1) is performed before any other initialization takes place. A local object of POD type (3.9) with static storage duration initialized with constant-expressions is initialized before its block is first entered. An implementation is permitted to perform early initialization of other local objects with static storage duration under the same conditions that an implementation is permitted to statically initialize an object with static storage duration in namespace scope (3.6.2). Otherwise such an object is initialized the first time control passes through its declaration; such an object is considered initialized upon the completion of its initialization.
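What this means, less formally, is that the compiler may construct function static objects early (before the function is called) only under the same conditions that allow it to statically initialize an object at namespace scope. Otherwise, the object is constructed the first time control passes through its declaration, that is, inside the function.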
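The last sentence of the excerpt has a consequence that is easy to overlook, and the same paragraph of the standard goes on to spell it out: if the constructor exits by throwing an exception, the object is not considered initialized, and construction is attempted again the next time control enters the declaration. Here's a small sketch of that rule; Flaky, g_attempts and touch_flaky are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical counter of constructor attempts, for illustration only.
int g_attempts = 0;

struct Flaky {
    Flaky() {
        ++g_attempts;
        if (g_attempts == 1)
            throw std::runtime_error("first construction attempt fails");
    }
};

// Returns true if `instance` is initialized by the time this call finishes.
bool touch_flaky() {
    try {
        // Not considered initialized until a constructor call completes.
        static Flaky instance;
        (void)instance;
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

The first call to touch_flaky returns false, the second constructs the object and returns true, and later calls return true without running the constructor again.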
And apparently, most modern compilers indeed choose to construct function static objects when the function is first called. This makes sense as an optimization - calling too many constructors before main runs can have a negative impact on program start-up. Not to mention that dependencies between statically constructed objects are one of the biggest headaches C++ has to offer.
But herein lies a problem: this construction of function static variables is not thread safe! If somefunc is called from multiple threads, the constructor of funcstatic may be called more than once. After all, being static, funcstatic is shared between all threads. The C++ standard doesn't protect us from this: it doesn't even acknowledge the existence of threads (this is C++98 we're talking about).
So keep this in mind: such code is not thread safe. You cannot assume that, in the presence of multiple threads, the function static variable will be constructed only once. It is the programmer's job to guarantee this.
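One way to provide that guarantee is to do the check and the construction under a lock. The sketch below assumes a C++11 standard library for std::mutex; code that really targets C++98 would use a platform primitive such as a POSIX mutex or a Windows critical section instead, and with a C++11 compiler (with thread-safe statics enabled) the plain function static is already safe. get_funcstatic, g_foo and g_foo_mutex are hypothetical names.

```cpp
#include <cassert>
#include <mutex>

class Foo {
public:
    explicit Foo(const char* s) : name(s) { ++constructed; }
    const char* name;
    static int constructed;  // counts constructor runs, for illustration
};
int Foo::constructed = 0;

// Namespace-scope state. std::mutex has a constexpr default constructor and
// g_foo is a plain pointer, so both are set up before any code runs; the
// lock itself is not a function static and has no construction race.
std::mutex g_foo_mutex;
Foo* g_foo = nullptr;

// Hypothetical accessor replacing `static Foo funcstatic("funcstatic");`.
// The check and the construction happen under the same lock, so at most
// one thread ever constructs the object.
Foo& get_funcstatic() {
    std::lock_guard<std::mutex> lock(g_foo_mutex);
    if (g_foo == nullptr)
        g_foo = new Foo("funcstatic");  // intentionally never deleted
    return *g_foo;
}
```

Taking the lock on every call is the price of simplicity here. The tempting "double-checked" variant that reads g_foo before locking is not safe without atomic operations, which C++98 does not provide.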
This is the main point I wanted to make in this post. The rest of it examines in more detail the code that popular compilers generate for this scenario, and discusses the implications.
Let's start with MS Visual C++ 2008. Here's the disassembly of somefunc, skipping the function prologue:
```
static Foo funcstatic("funcstatic");
00E314FD mov eax,dword ptr [$S1 (0E3A148h)]
00E31502 and eax,1
00E31505 jne somefunc+71h (0E31531h)
00E31507 mov eax,dword ptr [$S1 (0E3A148h)]
00E3150C or eax,1
00E3150F mov dword ptr [$S1 (0E3A148h)],eax
00E31514 mov dword ptr [ebp-4],0
00E3151B push offset string "funcstatic" (0E3890Ch)
00E31520 mov ecx,offset funcstatic (0E3A14Ch)
00E31525 call Foo::Foo (0E31177h)
00E3152A mov dword ptr [ebp-4],0FFFFFFFFh
Foo funcauto("funcauto");
00E31531 push offset string "funcauto" (0E38900h)
00E31536 lea ecx,[ebp-11h]
00E31539 call Foo::Foo (0E31177h)
```
Here's what this does: a special flag is kept in memory (at address 0x0E3A148 for this particular run). Its purpose is to make sure the constructor of funcstatic is only called once. The code loads the flag into eax and tests its lowest bit. If that bit is already set, it skips the call and jumps straight to the next source line. Otherwise, it sets the lowest bit and then calls the constructor.
Note how this code blissfully ignores the existence of threads. Suppose two threads, A and B, enter somefunc simultaneously. Both can check the flag at the same time, see that it's still 0, and then both call the constructor. Worse, because the bit is set before the constructor runs, B may also find it already set while A is still inside Foo::Foo, skip the call, and go on to use a funcstatic that isn't fully constructed yet. Nothing here prevents either from happening, and this is all perfectly fine according to the C++ standard.
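To make the race windows concrete, here's a rough C++ rendering of what the MSVC listing does. It is a hand-written model, not compiler output: s_guard stands in for $S1, s_storage for the memory reserved for funcstatic, and funcstatic_lowered and s_funcstatic are hypothetical names. Placement new is used only to mimic constructing an object in pre-reserved static storage.

```cpp
#include <cassert>
#include <new>

class Foo {
public:
    explicit Foo(const char* s) : name(s) { ++constructed; }
    const char* name;
    static int constructed;  // counts constructor runs, for illustration
};
int Foo::constructed = 0;

// Hypothetical stand-ins for what the compiler generates behind
// `static Foo funcstatic("funcstatic");`.
unsigned int s_guard = 0;                           // the $S1 flag word
alignas(Foo) unsigned char s_storage[sizeof(Foo)];  // raw storage for the object
Foo* s_funcstatic = nullptr;

Foo& funcstatic_lowered() {
    if ((s_guard & 1) == 0) {  // mov eax,[$S1] / and eax,1 / jne
        // Window 1: another thread can pass the test above before the next
        // line runs, and will then construct the object a second time.
        s_guard |= 1;          // or eax,1 / mov [$S1],eax
        // Window 2: a thread arriving now sees the bit set, skips the
        // constructor and may use the object before it is constructed
        // (in this model, s_funcstatic may even still be null).
        s_funcstatic = new (s_storage) Foo("funcstatic");  // call Foo::Foo
    }
    return *s_funcstatic;
}
```

In a single thread this behaves like a proper function static: repeated calls return the same object and the constructor runs once. Nothing in it coordinates two threads, which is exactly the point.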
With GCC, however, things get more interesting. Here's the same function compiled with g++ -O0 -g:
```
0000000000400a9d <_Z8somefuncv>:
400a9d: 55 push rbp
400a9e: 48 89 e5 mov rbp,rsp
400aa1: 48 83 ec 40 sub rsp,0x40
400aa5: b8 a8 21 60 00 mov eax,0x6021a8
400aaa: 0f b6 00 movzx eax,BYTE PTR [rax]
400aad: 84 c0 test al,al
400aaf: 75 76 jne 400b27 <_Z8somefuncv+0x8a>
400ab1: bf a8 21 60 00 mov edi,0x6021a8
400ab6: e8 cd fd ff ff call 400888 <__cxa_guard_acquire@plt>
400abb: 85 c0 test eax,eax
400abd: 0f 95 c0 setne al
400ac0: 84 c0 test al,al
400ac2: 74 63 je 400b27 <_Z8somefuncv+0x8a>
400ac4: c6 45 df 00 mov BYTE PTR [rbp-0x21],0x0
400ac8: be aa 0c 40 00 mov esi,0x400caa
400acd: bf b0 21 60 00 mov edi,0x6021b0
400ad2: e8 89 00 00 00 call 400b60 <_ZN3FooC1EPKc>
400ad7: c6 45 df 01 mov BYTE PTR [rbp-0x21],0x1
400adb: bf a8 21 60 00 mov edi,0x6021a8
400ae0: e8 03 fe ff ff call 4008e8 <__cxa_guard_release@plt>
400ae5: eb 40 jmp 400b27 <_Z8somefuncv+0x8a>
400ae7: 48 89 45 c8 mov QWORD PTR [rbp-0x38],rax
400aeb: 48 89 55 d0 mov QWORD PTR [rbp-0x30],rdx
400aef: 8b 45 d0 mov eax,DWORD PTR [rbp-0x30]
400af2: 89 45 ec mov DWORD PTR [rbp-0x14],eax
400af5: 48 8b 45 c8 mov rax,QWORD PTR [rbp-0x38]
400af9: 48 89 45 e0 mov QWORD PTR [rbp-0x20],rax
400afd: 0f b6 45 df movzx eax,BYTE PTR [rbp-0x21]
400b01: 83 f0 01 xor eax,0x1
400b04: 84 c0 test al,al
400b06: 74 0a je 400b12 <_Z8somefuncv+0x75>
400b08: bf a8 21 60 00 mov edi,0x6021a8
400b0d: e8 06 fe ff ff call 400918 <__cxa_guard_abort@plt>
400b12: 48 8b 45 e0 mov rax,QWORD PTR [rbp-0x20]
400b16: 48 89 45 c8 mov QWORD PTR [rbp-0x38],rax
400b1a: 48 63 45 ec movsxd rax,DWORD PTR [rbp-0x14]
400b1e: 48 8b 7d c8 mov rdi,QWORD PTR [rbp-0x38]
400b22: e8 11 fe ff ff call 400938 <_Unwind_Resume@plt>
400b27: 48 8d 7d ff lea rdi,[rbp-0x1]
400b2b: be b5 0c 40 00 mov esi,0x400cb5
400b30: e8 2b 00 00 00 call 400b60 <_ZN3FooC1EPKc>
400b35: c9 leave
400b36: c3 ret
```
What's going on here? It turns out that since version 4, GCC generates "guard" calls that make this kind of initialization safe in the presence of multiple threads. The relevant section of the Itanium C++ ABI (which GCC follows) describes these guard functions (__cxa_guard_acquire, __cxa_guard_release and __cxa_guard_abort) and helps in understanding the code above. GCC also allows disabling the guards with the -fno-threadsafe-statics flag. With this flag, the code GCC generates for our sample is quite similar to the code generated by MSVC.
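Per the Itanium C++ ABI, __cxa_guard_acquire returns nonzero when the caller should perform the initialization (hence the test eax,eax / je pair after the call), __cxa_guard_release marks the guard as complete, and __cxa_guard_abort, reached from the exception landing pad that ends in _Unwind_Resume, releases the guard without marking it complete. The sketch below is a simplified model of that protocol written with C++11 std::atomic and std::mutex; it is not the libstdc++ implementation, and Guard, g_guard, g_storage, g_funcstatic and funcstatic_guarded are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <new>

class Foo {
public:
    explicit Foo(const char* s) : name(s) { ++constructed; }
    const char* name;
    static int constructed;  // counts constructor runs, for illustration
};
int Foo::constructed = 0;

// Hypothetical model of a guard object: a "done" flag that is tested first
// (like the guard byte at 0x6021a8) plus a lock for the slow path.
struct Guard {
    std::atomic<bool> done{false};
    std::mutex lock;
};

Guard g_guard;
alignas(Foo) unsigned char g_storage[sizeof(Foo)];
Foo* g_funcstatic = nullptr;

Foo& funcstatic_guarded() {
    // Fast path: movzx eax,BYTE PTR [rax] / test al,al / jne.
    if (!g_guard.done.load(std::memory_order_acquire)) {
        // Roughly __cxa_guard_acquire: one thread at a time gets past here,
        // and it re-checks whether another thread finished first.
        std::lock_guard<std::mutex> hold(g_guard.lock);
        if (!g_guard.done.load(std::memory_order_relaxed)) {
            // If the constructor throws, `done` stays false and ~lock_guard
            // releases the lock: roughly __cxa_guard_abort, so a later call
            // retries the construction.
            g_funcstatic = new (g_storage) Foo("funcstatic");
            // Roughly __cxa_guard_release: mark the object as constructed.
            g_guard.done.store(true, std::memory_order_release);
        }
    }
    return *g_funcstatic;
}
```

As in GCC's code, once the object exists the fast path is a single load and test; the lock is only touched by calls that find the guard still unset.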
On one hand, this is nice of GCC to do. On the other hand, it's one of those things that introduce insidious portability problems. Develop the code for GCC and everything is peachy for function static constructors - no multithreading problems because of the guard code. Then port the code to Windows and start witnessing intermittent failures due to races between threads. Not fun.
The only solution is, of course, to write code that adheres to the C++ standard and doesn't rely on guarantees the standard doesn't make.