Stack frame layout on x86-64
September 6th, 2011 at 8:13 pm
A few months ago I wrote an article named Where the top of the stack is on x86, which aimed to clear up some misunderstandings regarding stack usage on the x86 architecture. The article concluded with a useful diagram presenting the stack frame layout of a typical function call.
In this article I will examine the stack frame layout of the newer 64-bit version of the x86 architecture, x64 [1]. The focus will be on Linux and other OSes following the official System V AMD64 ABI (available from here). Windows uses a somewhat different ABI, and I will mention it briefly at the end.
I have no intention of detailing the complete x64 calling convention here. For that, you will literally have to read the whole AMD64 ABI.
Registers galore
x86 has just 8 general-purpose registers available (eax, ebx, ecx, edx, ebp, esp, esi, edi). x64 extended them to 64 bits (prefix "r" instead of "e") and added another 8 (r8, r9, r10, r11, r12, r13, r14, r15). Since some of x86’s registers have special implicit meanings and aren’t really used as general-purpose (most notably ebp and esp), the effective increase is even larger than it seems.
There’s a reason I’m mentioning this in an article focused on stack frames. The relatively large number of available registers influenced some important design decisions for the ABI, such as passing many arguments in registers, thus rendering the stack less useful than before [2].
Argument passing
I’m going to simplify the discussion here on purpose and focus on integer/pointer arguments [3]. According to the ABI, the first 6 integer or pointer arguments to a function are passed in registers. The first is placed in rdi, the second in rsi, the third in rdx, and then rcx, r8 and r9. Only the 7th argument and onwards are passed on the stack.
The stack frame
With the above in mind, let’s see how the stack frame for this C function looks:
    long myfunc(long a, long b, long c, long d,
                long e, long f, long g, long h)
    {
        long xx = a * b * c * d * e * f * g * h;
        long yy = a + b + c + d + e + f + g + h;
        long zz = utilfunc(xx, yy, xx % yy);
        return zz + 20;
    }
This is the stack frame:

So the first 6 arguments are passed via registers. But other than that, this doesn’t look very different from what happens on x86 [4], except this strange "red zone". What is that all about?
The red zone
First I’ll quote the formal definition from the AMD64 ABI:
The 128-byte area beyond the location pointed to by %rsp is considered to be reserved and shall not be modified by signal or interrupt handlers. Therefore, functions may use this area for temporary data that is not needed across function calls. In particular, leaf functions may use this area for their entire stack frame, rather than adjusting the stack pointer in the prologue and epilogue. This area is known as the red zone.
Put simply, the red zone is an optimization. Code can assume that the 128 bytes below rsp will not be asynchronously clobbered by signals or interrupt handlers, and thus can use it for scratch data, without explicitly moving the stack pointer. That last part is where the optimization lies – decrementing rsp and restoring it are two instructions that can be saved when using the red zone for data.
However, keep in mind that the red zone will be clobbered by function calls, so it’s usually most useful in leaf functions (functions that call no other functions).
Recall how myfunc in the code sample above calls another function named utilfunc. This was done on purpose, to make myfunc non-leaf and thus prevent the compiler from applying the red zone optimization. Looking at the code of utilfunc:
    long utilfunc(long a, long b, long c)
    {
        long xx = a + 2;
        long yy = b + 3;
        long zz = c + 4;
        long sum = xx + yy + zz;
        return xx * yy * zz + sum;
    }
This is indeed a leaf function. Let’s see how its stack frame looks when compiled with gcc:

Since utilfunc has only 3 arguments, calling it requires no stack usage: all the arguments fit into registers. In addition, since it’s a leaf function, gcc chooses to use the red zone for all its local variables. Thus, rsp need not be decremented (and later restored) to allocate space for this data.
Preserving the base pointer
The base pointer rbp (and its predecessor ebp on x86), being a stable "anchor" to the beginning of the stack frame throughout the execution of a function, is very convenient for manual assembly coding and for debugging [5]. However, some time ago it was noticed that compiler-generated code doesn’t really need it (the compiler can easily keep track of offsets from rsp), and the DWARF debugging format provides means (CFI) to access stack frames without the base pointer.
This is why some compilers started omitting the base pointer for aggressive optimizations, thus shortening the function prologue and epilogue, and providing an additional register for general-purpose use (which, recall, is quite useful on x86 with its limited set of GPRs).
gcc keeps the base pointer by default on x86, but allows the optimization with the -fomit-frame-pointer compilation flag. How recommended it is to use this flag is a debated issue – you may do some googling if this interests you.
Anyhow, one other "novelty" the AMD64 ABI introduced is making the base pointer explicitly optional, stating:
The conventional use of %rbp as a frame pointer for the stack frame may be avoided by using %rsp (the stack pointer) to index into the stack frame. This technique saves two instructions in the prologue and epilogue and makes one additional general-purpose register (%rbp) available.
gcc adheres to this recommendation and by default omits the frame pointer on x64, when compiling with optimizations. It gives an option to preserve it by providing the -fno-omit-frame-pointer flag. For clarity’s sake, the stack frames shown above were produced without omitting the frame pointer.
The Windows x64 ABI
Windows on x64 implements an ABI of its own, which is somewhat different from the AMD64 ABI. I will only discuss the Windows x64 ABI briefly, mentioning how its stack frame layout differs from AMD64. These are the main differences:
- Only 4 integer/pointer arguments are passed in registers (rcx, rdx, r8, r9).
- There is no concept of "red zone" whatsoever. In fact, the ABI explicitly states that the area beyond rsp is considered volatile and unsafe to use. The OS, debuggers or interrupt handlers may overwrite this area.
- Instead, a "register parameter area" [6] is provided by the caller in each stack frame. When a function is called, the last thing allocated on the stack before the return address is space for at least 4 registers (8 bytes each). This area is available for the callee’s use without explicitly allocating it. It’s useful for variable argument functions as well as for debugging (providing known locations for parameters, while registers may be reused for other purposes). Although the area was originally conceived for spilling the 4 arguments passed in registers, these days the compiler uses it for other optimization purposes as well (for example, if the function needs less than 32 bytes of stack space for its local variables, this area may be used without touching rsp).
Another important change that was made in the Windows x64 ABI is the cleanup of calling conventions. No more cdecl/stdcall/fastcall/thiscall/register/safecall madness – just a single "x64 calling convention". Cheers to that!
For more information on this and other aspects of the Windows x64 ABI, here are some good links:
- Official MSDN page on x64 software conventions – well organized information, IMHO easier to follow and understand than the AMD64 ABI document.
- Everything You Need To Know To Start Programming 64-Bit Windows Systems – MSDN article providing a nice overview.
- The history of calling conventions, part 5: amd64 – an article by the prolific Windows programming evangelist Raymond Chen.
- Why does Windows64 use a different calling convention from all other OSes on x86-64? – an interesting discussion of the question that just begs to be asked.
- Challenges of Debugging Optimized x64 code – focuses on the "debuggability" (and lack thereof) of compiler-generated x64 code.

| [1] | This architecture goes by many names. Originated by AMD and dubbed AMD64, it was later implemented by Intel, which called it IA-32e, then EM64T and finally Intel 64. It’s also called x86-64. But I like the name x64 – it’s nice and short. |
| [2] | There are calling conventions for x86 that also dictate passing some of the arguments in registers. The best known is probably fastcall. Unfortunately, it’s not consistent across platforms. |
| [3] | The ABI also defines passing floating-point arguments via the xmm registers. The idea is pretty much the same as for integers, however, and IMHO including floating-point arguments in the article will needlessly complicate it. |
| [4] | I’m cheating a bit here. Any compiler worth its salt (and certainly gcc) will use registers for local variables as well, especially on x64 where registers are plentiful. But if there are a lot of local variables (or they’re large, like arrays or structs), they will go on the stack anyway. |
| [5] | Since inside a function rbp always points at the previous stack frame, it forms a kind of linked list of stack frames which the debugger can use to access the execution stack trace at any given time (in core dumps as well). |
| [6] | Also called "home space" sometimes. |

September 7th, 2011 at 16:00
Informative! I am studying computer organization and digital design lately (just started 2 days ago). Great post!
September 7th, 2011 at 18:13
You have a very interesting blog, thanks. Adding entries to RSS…
September 7th, 2011 at 19:55
“However, keep in mind that the red zone will be clobbered by function calls, so it’s usually most useful in leaf functions (functions that call no other functions).”
Interesting, but I’m struggling to see the advantage in the addition of the ‘red zone’ to the ABI: is it useful for anything else? What has me stumped is: what is a leaf function but something that ought to be inlined instead?
September 7th, 2011 at 20:42
Hi Eli,
Thanks for an enlightening article. However, it appears that compilers and optimization have been using the stack for more than just data storage. Return-oriented programming (ROP) has popped up as an unstoppable method of attacking a system (once the attacker has control over the stack). It appears that compilers have been placing short executable sequences of instructions onto the stack frame just before the return instruction. This allows an attacker to safely replace the instructions with their own and there is no way to determine that the processing has been compromised since the stack area is in memory that is marked executable and none of the program areas have been touched. I might have this wrong, as this does not match with my general understanding of the use of the stack (for storage of state and variables, not for instructions). Can you comment on this?
September 8th, 2011 at 03:55
Excellent overview. One thing, if you can be bothered fixing it: the stack diagrams use “EBP” and “ESP” — shouldn’t that be “RBP” and “RSP” on x86-64?
September 8th, 2011 at 05:45
Pragma,
I would guess that inlining is something that happens before ASM code generation, so some functions still have to be leaf, right? Besides, some function calls can’t be inlined (such as calls through function pointers or virtual method calls).
Gary,
I’m not familiar with this issue, really. If you have some good references, I’d be glad to read them.
Matt,
Whoops, will fix! Thanks for noticing.
September 8th, 2011 at 20:32
I’m not completely sold on the “red zone” idea, but one reason not to inline leaf functions is to keep the code footprint from getting too big. Depending on the size of the leaf function and how the code that’s calling it is structured, inlining can actually cause worse performance because of additional cache misses.
September 8th, 2011 at 23:41
Very interesting. Thanks for posting.
September 9th, 2011 at 11:01
@Gary
You are mixing up a few notions here.
Indeed, to detect a buffer overflow (an overwrite of the return address), a common technique is to use something called a canary. That is a random sequence of bytes inserted at compile time between the return address and the saved rbp, aligned on a stack boundary, that is checked at function return (unwinding of the stack). If the canary value has changed, then the function has been overflowed, and the program exits, since the return address can be assumed to be controlled by the attacker.
ROP is a different notion. It is the general case of ret2libc attacks, and is used when the stack is not executable. Since you can jump anywhere in an x86 instruction (they are not fixed length), you can use this property to use meaningful bytes in memory, and access them by address. This of course supposes that the memory layout is not randomized (no ASLR). For example, if the mapping of the .text section in memory is constant, then you can search for instructions in that section, which you can access by memory address.
Luckily for everyone, most modern operating systems implement W^X and ASLR in userland, meaning that the stack, the heap and other sections are tagged as non-executable but writable (or the other way round), and the address space is randomized (memory mappings will change between executions). Unfortunately, what is true in user land is not in kernel land… (Except for the Linux kernel as of 2.6.39)
September 10th, 2011 at 04:59
> “This is indeed a leaf function. Let’s see how its stack frame looks when compiled with gcc:”
How do you know/see how the stack frame looks ? Do you inspect it with gdb, how ?
And
> “…all its local variables. Thus, esp needs not be decremented (and later restored) to allocate space for this data.”
Do you mean rsp instead of esp ?
September 11th, 2011 at 16:11
zorg,
Yep, thanks for noticing. Hopefully the intention is clear though
October 18th, 2011 at 21:51
@dahtah
You’re missing something too.
Return Oriented Programming can also be applied with W^X and ASLR, and even on platforms where instructions have a fixed length (like ARM or SPARC).
Check out something like http://ivanlef0u.fr/repo/expl0it/Surgically%20returning%20to%20randomized%20lib(c).pdf
for research on a technique that uses ROP to circumvent ASLR and W^X.
You can also find a practical example of ROP (without ASLR) on the ARM platform here: http://blog.zynamics.com/2010/04/16/rop-and-iphone/
March 8th, 2012 at 01:29
Why is the red zone per stack frame? If it’s clobbered by function calls anyway, why not just set 128 bytes aside at the start of the stack and let all frames write to that one location? I’d guess a threading problem but each thread would have its own stack with its own 128 bytes anyway, so that can’t be it. If the stack is relocatable then I guess you’d have to dedicate a register to store the start of it, maybe that’s the issue?
March 9th, 2012 at 13:43
Joseph,
I don’t understand what you mean there. How do all frames know where “the start of the stack” is?
March 10th, 2012 at 21:49
@eliben: Basically my question is what’s the difference between having the red zone in every frame and just having a single 128 byte buffer in a fixed location in memory? I think the answer is the latter couldn’t work for multiple threads, though it seems you could still make that work by dedicating a register to point to each thread’s 128 byte buffer.
March 11th, 2012 at 05:17
Joseph,
I see. As you yourself say, that would be a problem with multiple threads. In a way, the current red zone is an elegant way to give each thread its sandbox, and a register pointing to it: rbp.
March 30th, 2012 at 15:43
Translation of the publication into Armenian http://www.fatcow.com/edu/stack-frame-hy/
November 3rd, 2012 at 19:20
Regarding the red zone:
The red zone is not allocated/reserved per stack frame. Below the stack pointer there must always be some unused memory (to allow the stack to grow). The memory is already there, so why not put it to good use? The red zone is just a convention to allow functions to use 128 bytes of that space as a scratch area by mandating signal and interrupt handlers not to clobber it.
So it’s just one red zone per stack (although its address varies). It can always be located by the %rsp register, and it works with multithreading since each thread already has its own private stack.
It’s a quite elegant solution indeed.
November 29th, 2012 at 06:51
Excellent explanation, thanks.
I have written this C code:
    void main (void) { return 0;}
and compiled it on an x86-64 platform (GNU/Linux, Core i5) with the gcc front-end. There is no problem on execution. But if I write this gas assembly source:
    .text
    .globl _start
    _start:
        pushq %rbp
        movq %rsp, %rbp
        movl $0, %eax
        popq %rbp
        ret
    .end
the execution result is a SEGMENTATION FAULT.
The RET instruction doesn’t recover the RETURN ADDRESS TO SYSTEM. Debugging with gdb, I can see that the return address IS NOT on the stack. Before the first instruction
    pushq %rbp
the %RSP stack pointer references 0x00000001, which is not the return address, and this causes the SEGMENTATION FAULT. I did not have this problem on an old 32-bit platform.
Any ideas?
Thanks in advance
January 4th, 2013 at 06:19
_start is different from main, _start shouldn’t return normally. Try calling sys_exit instead.
January 11th, 2013 at 01:08
or simply use:
    movl $1,%eax
    int $0x80