Are pointers and arrays equivalent in C?

October 21st, 2009 at 8:22 pm

Short answer: no

Longer answer: it depends on what you mean by "equivalent". Pointer arithmetic and array indexing are equivalent. In other aspects, pointers and arrays are different.

Here’s an example displaying the equivalence:

#include <stdio.h>

int main()
{
    char arr[] = "don't panic\n";
    char* ptr = arr;

    printf("%c %c\n", arr[4], ptr[4]);
    printf("%c %c\n", *(arr+2), *(ptr+2));

    return 0;
}

The output is, of course:

t t
n n

Note that indexing works on both arrays and pointers. Similarly, pointer arithmetic works on both arrays and pointers.

So how are they different?

In a very important and fundamental way. Consider this code snippet:

char array_place[100] = "don't panic";
char* ptr_place = "don't panic";

int main()
{
    char a = array_place[7];
    char b = ptr_place[7];

    return 0;
}

What exactly happens in the assignment to a, and how is it different from the assignment to b? It’s informative to take a look at the disassembly (taken from Visual C++ 2005 on an x86 machine running Windows XP):

    char a = array_place[7];

0041137E  mov  al,byte ptr [_array_place+7 (417007h)]
00411383  mov  byte ptr [a],al

    char b = ptr_place[7];

00411386  mov  eax,dword ptr [_ptr_place (417064h)]
0041138B  mov  cl,byte ptr [eax+7]
0041138E  mov  byte ptr [b],cl

The semantics of arrays in C dictate that the array name is the address of the first element of the array. Hence in the assignment to a, the 8th character of the array is taken by offsetting the value of array_place by 7, and moving the contents pointed to by the resulting address into the al register, and later into a.

On the other hand, the semantics of pointers are quite different. A pointer is just a regular variable that happens to hold the address of another variable inside. Therefore, to actually compute the offset of the 8th character of the string, the CPU will first copy the value of the pointer into a register and only then increment it. This takes another instruction [1].

A graphical explanation

This is a graphical explanation:

http://eli.thegreenplace.net/wp-content/uploads/2009/10/array_place.png

The rightmost column is the memory addresses, and the boxes are the contents of memory cells. The first few letters of the string in array_place are displayed.

Note that array_place is simply a label (or an alias) for the memory address 0x417000. Accessing array_place[7] is therefore simply accessing memory address 0x417007. As we can see in the disassembly, the compiler just replaces array_place[7] with 0x417007 – no address computation has to be done by the assembly it generates.

With a pointer, this works differently:

http://eli.thegreenplace.net/wp-content/uploads/2009/10/ptr_place.png

ptr_place is just a variable that contains an address [2]. This is the address of the first byte of the string, which sits in another memory location. Compare this to the disassembly listing of the access to ptr_place[7] – it becomes clear why the compiler generates that code.

Variable names in C are just labels

This point is frequently ignored by programmers who don’t actually hack on compilers. A variable in C is just a convenient, alphanumeric pseudonym of a memory location. Were we writing assembly code, we would just create a label in some memory location and then access this label instead of always hard-coding the memory value – and this is what the compiler does.

Well, actually the address is not hard-coded in an absolute way because of loading and relocation issues, but for the sake of this discussion we don’t have to get into these details.

A label is something the compiler assigns at compile time. This is where the great difference between arrays and pointers in C stems from. And this is also why…

Arrays passed to functions are converted to pointers

Here’s a snippet:

void foo(char arr_arg[], char* ptr_arg)
{
    char a = arr_arg[7];
    char b = ptr_arg[7];
}

Quiz: how are the accesses to a and b different here?

Answer: they’re not!

    char a = arr_arg[7];

00412DCE  mov  eax,dword ptr [arr_arg]
00412DD1  mov  cl,byte ptr [eax+7]
00412DD4  mov  byte ptr [a],cl

    char b = ptr_arg[7];

00412DD7  mov  eax,dword ptr [ptr_arg]
00412DDA  mov  cl,byte ptr [eax+7]
00412DDD  mov  byte ptr [b],cl

This happens because arrays passed into functions are always converted into pointers. The parameter declaration char arr_arg[] is just syntactic sugar for char* arr_arg [3].

Here’s a quote from K&R2:

When an array name is passed to a function, what is passed is the location of the initial element. Within the called function, this argument is a local variable, and so an array name parameter is a pointer, that is, a variable containing an address.

If this seems strange, think again. Recall the diagrams of the previous section. The C compiler has no choice here, since an array name is a label it replaces at compile time with the address it represents. But a function isn’t called at compile time, it’s called at run time, where something should be placed on the stack to be considered as an argument. The compiler can’t just treat array references inside a function as labels and replace them with addresses, because it has no idea what actual array will be passed in at run time.

This last point may be a bit convoluted, but it’s not critical to the understanding of the article. You can just take it as a fact: arrays passed to functions are converted to pointers, end of story!

Does the difference affect me?

Yes.

One way is that arrays just can’t be manipulated the way pointers can. Here’s a quote from Expert C Programming:

There is one difference between an array name and a pointer that must be kept in mind. A pointer is a variable, so pa=a and pa++ are legal. But an array name is not a variable; constructions like a=pa and a++ are illegal.

Here’s an example:

#include <stdio.h>


int main()
{
    size_t i;
    char array[] = "don't panic";
    char* ptr = array;

    /* array traversal; stop before the terminating '\0' */
    for (i = 0; i < sizeof(array) - 1; ++i)
        printf("%c ", array[i]);

    printf("\n");

    /* pointer traversal */
    for (; *ptr; ++ptr)
        printf("%c ", *ptr);

    return 0;
}

Note how an array has to be indexed with another variable. A pointer, on the contrary, is just a variable that can be manipulated freely.

Another, more important, difference is actually a common C gotcha:

Suppose one file contains a global array:

char my_arr[256];

And soothed by the seeming equivalence between arrays and pointers, the programmer who wants to use it in another file mistakenly declares it as:

extern char* my_arr;

When he tries to access some element of the array using this pointer, he will most likely get a segmentation fault or a fatal exception (the nomenclature depends on the OS). Understanding why this happens is left as an exercise to the reader [4].

References

The following sources were helpful in the preparation of this article:

  • K&R2 – chapter 5
  • Expert C Programming, by Van der Linden – chapters 4, 9 and 10
  • The C FAQ, questions 6.1, 6.2, 6.3, 6.4, 6.10
[1] That’s just because we’re on x86, by the way. On a CPU with a richer set of addressing modes (like PDP-11), it could have been done in a single instruction.
[2] Note that I drew a multi-byte memory cell for ptr_place. On my x86 32-bit machine, it actually takes 4 bytes with the least significant byte of the value in the lower address.
[3] By the way, so is char arr_arg[100]. The size makes no difference to the C compiler – it’s still converted to a pointer.
[4] Hint: look at the first assembly listing in this article. How will the element be accessed via the pointer? What’s going to happen if it’s not actually a pointer but an array?

33 Responses to “Are pointers and arrays equivalent in C?”

  1. Marius Gedminas Says:

    Nice article!

    I would like to question this statement, though: “all pointers in C take sizeof(int) bytes.” While true for your x86 example, it’s not true universally. On typical 64-bit architectures sizeof(int) == 4 but sizeof(void *) == 8.

    I *think* that the C standard guarantees that a pointer fits in sizeof(long) bytes.

  2. Jared Grubb Says:

    Another way that helps me remember that pointer arithmetic and arrays are the same is by showing that these four expressions are all equivalent in C and C++:

    x[10] == *(x+10) == *(10+x) == 10[x]

    (the last one looks funny but is legitimate C syntax)

  3. Nitin Says:

    Really nice article! I am not an expert C programmer but I want to make sure I understand the exercise. So, the reason for the segfault will be that the C compiler will try and apply pointer semantics to what’s actually an array, right? So, instead of just incrementing the address of my_arr by some value, it will take the value stored at my_arr and try to increment it by some value. This will, of course, be a problem.

    Am I right?

  4. Saulius Menkevicius Says:

    Marius: on Win64 we have sizeof(long) = 4, and sizeof(void*) = 8, so it is not guaranteed in any way for a ptr to get into long. (win64 uses LLP64 model)

  5. Alex Says:

    No, it’s because “char my_arr[256]” declares a block of memory, for which you can retrieve the memory location by referencing “my_arr”.

    The declaration “extern char* my_arr” declares a pointer to a location in memory. This pointer contains NULL or some garbage, because it never gets initialized, and it never references the real block memory (char my_arr[256]) that was declared earlier.

    So when you access memory via the second declaration, the first 4 bytes of the memory block get interpreted as a pointer, thus a garbage or NULL pointer gets dereferenced, and the process segfaults. It could only be successfully dereferenced if the first 4 bytes (for x86 architecture) accidentally contained the address of a valid memory location.

  6. Kris Says:

    If I recall correctly, however, using:

    extern char my_arr[];

    will cause the compiler to do the right thing.

    Another difference between arrays and pointer:

    char a[256];
    char *b = a;

    printf("a: %zu b: %zu\n", sizeof(a), sizeof(b));

  7. Falaina Says:

    Actually the reason arrays decay to pointers for formal parameters is simply because the standard says so; the compiler certainly has a choice: It could pass the entire array by value via the callstack and replace the array labels with the correct stack offset of the beginning of the array. This doesn’t happen because even when you declare void foo(int arr[]) the standard requires arr must be converted to a pointer, rather than allowing arr to be passed by value.

  8. eliben Says:

    Marius,
    You’re right, thanks. I’ve removed the incorrect statement.

    Falaina,
    Passing a whole array by value is hardly a viable option. This would severely limit the ability to process large arrays in functions, as it’s very inefficient to copy whole arrays onto the stack.

  9. Keith Thompson Says:

    Marius: No, the C standard doesn’t guarantee that pointers fit in sizeof(long) bytes. In fact, it makes no guarantees at all about the relative sizes of pointers and integers, or even of different pointer types (except that char* and void* have the same size and representation, as do all pointer-to-struct types and all pointer-to-union types).

    So you could have sizeof(char*)==8 and sizeof(int*)==4 (say, on a machine where a native address points to a word, and the compiler builds a char* pointer by combining a word address and a byte offset).

    C99 provides intptr_t and uintptr_t, integer types guaranteed to be able to hold a converted void* value without loss of information, but it doesn’t require them to exist. An implementation where a pointer is bigger than *any* integer is perfectly legal.

  10. Keith Thompson Says:

    Let me try to explain the relationship between arrays and pointers in a way that I personally find a bit clearer. Your mileage may vary.

    The most important thing to remember is this: Arrays are not pointers, and pointers are not arrays. Certain features of the language seem to conspire to make you think they’re equivalent. They are not.

    An expression of array type, in most contexts, is implicitly converted to a pointer to the first element of the array object. There are three cases where this conversion doesn’t occur:
    1. When the array is the operand of a unary “&” operator (so &arr yields the address of the array, not the address of its first element; same address, different type).
    2. When the array is the operand of a unary “sizeof” operator (so sizeof arr yields the size of the array, not the size of a pointer).
    3. When the array is a string literal in an initializer used to initialize an array object (so char arr[6] = "hello" works).

    Another rule: When you declare a function parameter with array type, it’s really of pointer type. So this:
    void foo(int arr[]);
    really means this:
    void foo(int *arr);
    This isn’t a conversion, it’s a compile-time translation.

    Finally, the indexing operator [] *doesn’t take an array operand*. It takes two operands, a pointer and an integer. p[i] is, by definition, equivalent to *(p+i). (And it’s commutative, so arr[42] is equivalent to 42[arr]. Now that you know that, please don’t use it.)

    So when you write arr[i], where arr is declared as an array object, the indexing operator gets a pointer to the first element of arr.

    And when you pass an array to a function:
    int arr[10];
    func(arr);
    you’re really passing an int*, not an array. This isn’t because it’s a function call, it’s because the expression is converted *before* the call. The same conversion would happen in any context other than the three that I mentioned.

    So when you say that “Arrays passed to functions are converted to pointers”, that’s really just one case of a more general rule.

  11. CodeJustin Says:

    @Keith
    Really nice explanation

    @Eli
    Nice post, I’m going to link this on DZone for you =]

  12. corby Says:

    This article misses a BIG fundamental difference between arrays and pointers when dealing with memory allocation.

    I’ve been an embedded programmer for 15 years now and when I teach our younger coders about pointers vs. arrays I use a completely different emphasis.

    Arrays are allocated on the stack.
    Pointer memory is allocated on the heap.

    Of course this is too general: char* ptr = "test" is not actually on the stack, but in the DATA segment of the file, since the array value is defined at compile time.

    But what about dynamic allocation:
    char ary[8]; //is allocated on the local stack.
    char* ptr = new char[8]; //is allocated on the heap.

    This is default behavior of ‘new’ since it uses malloc, but this can be modified by using calloc which allocates memory on the stack.

    This is a VERY important fundamental difference between arrays and pointers in embedded systems where you may only have 1KB of thread stack available, but 2GB of cheap flash memory on the heap.
    One array declaration out of place and you blow the local stack and crash the OS :)

  13. dave Says:

    There are conventions for pointers and longs and int’s relations. LP64 means longs and pointers are 64bits. Ints are not in this case big enough for a pointer.

    ILP64 means ints longs and pointers are all 64bit.

    You can even have LLP64 (older versions of windows were this way…) longs and ints don’t have enough space for a pointer, but a Long Long does.

    Need to be very careful when you tell people they can stick a pointer in an int.

    I suggest using C99 types from stdint.h like intptr_t which will be appropriately typedef’d for your platform.

    The ONLY relationship the C standards guarantee are the following:

    sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)

    BE CAREFUL! :-) I’ve fixed a lot of broken code in my day that didn’t pay attention to this stuff.

  14. dave Says:

    Corby I’m pretty sure you’re wrong about pointers versus arrays being on the heap vs the stack.

    In fact, when you globally declare

    char * ptr = "Hello\n";

    That “Hello” is likely encoded in a DATA area as well as the pointer.

    The only reason anything ends up in the heap, in most C runtimes, is if you allocate it via some library routine.

  15. Keith Thompson Says:

    Corby: Um, no.

    Pointer objects are allocated wherever you happen to allocate them. The memory that a pointer *points to* is typically allocated on the heap, but you can easily have a pointer point to an object that’s allocated anywhere you like.

    The C++ “new” operator very likely is implemented using malloc(), but that’s not specified by the language. This is an important distinction; allocating with “new” and deallocating with free(), or allocating with malloc() and deallocating with “delete”, might happen to work, or it might go badly wrong.

    calloc() does not allocate on the stack; it’s like malloc() except that the allocated memory is initialized to all-bits-zero. You’re probably thinking of alloca() (which is non-standard and generally not recommended).

    Any object can be allocated anywhere (or, as the C standard puts it, with any storage duration: static, automatic, or allocated). Pointers and arrays are just two particular kinds of objects; others are integers, floating-point objects, structs, and unions. The way a particular object is allocated depends entirely on the code used to create it.

    Incidentally, the C language standard doesn’t use the terms “stack” and “heap”.

  16. Keith Thompson Says:

    dave:
    You wrote:
    ==========
    The ONLY relationship the C standards guarantee are the following:

    sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)
    ==========

    That’s not *quite* true. The standard makes certain guarantees about the *ranges* of the predefined types. It guarantees that sizeof(char)==1, but you could conceivably have sizeof(short) > sizeof(int) (this would require short to have padding bits, i.e., bits that don’t contribute to the value). In practice, the above size relationships will almost certainly hold.

    Some more guarantees:
    sizeof(char)==1
    The range of each type in the following list is a subset of the range of the following type:
    signed char, short, int, long, long long
    Likewise for the corresponding unsigned types. (Plain char has the same range and representation as either signed char or unsigned char, but is a distinct type.)
    char is at least 8 bits
    short and int are at least 16 bits
    long is at least 32 bits
    long long is at least 64 bits
    (Those guarantees are actually stated in terms of ranges.)

  17. dave Says:

    Keith,

    You’re referring to defined limits of precision in the standard.

    You cite a rather interesting case where short is padded but int is not.

    There are ranks for precision of conversion that must be met in the standard as well requiring that the highest rank is long long, followed by long, followed by int, followed by short, and lastly char.

    That said, you have to be able to represent the numerical value of a short in an int… your example of padding digits is one I’ve never seen happen, but I suppose it could, and thusly could increase the size of a short, but the size would then not be an indication of precision :-)

    My main point is, knowing the LP32, LP64, ILP64 or whatever model of the system in question, is pretty much a prerequisite to doing useful things with integer types, and stdint.h is a nice portable way to get some of the behavior you want, when portability matters.

  18. turly Says:

    We used to have an interview question as follows:
    In str.c:

    char str [] = "Hello, world";

    and in main.c:

    extern char *str;
    
    int main (int argc, const char *argv [])
    {
         printf ("str is '%s'\n", str);
         return 0;
    }

    Q: what will happen when the resulting program is run, and why?

  19. Keith Thompson Says:

    turly: Since the program’s behavior is undefined, one of the infinitely many (but vanishingly unlikely) possibilities is that it will print
    str is 'Hello, world'

    (Oh, and the “const” shouldn’t be there. The standard specifies two forms for the definition of main; neither of them uses “const”. It’s unlikely to cause any problems.)

    I once had an interview question involving the output of a program that involved something along the lines of “i = i++;”. I said the behavior was undefined. The interviewer insisted that it’s well defined. Another answer, of course, is that the line will never survive a code review; whatever it’s supposed to mean, there’s a better way to say it. (I got the job.)

  20. gwenhwyfaer Says:

    Funnily enough, in BCPL arrays and pointers were exactly equivalent. If you declared an array, you got a variable with that name, initialised with a pointer to the number of items in that array. Likewise, functions were variables initialised to the address of the function. Dennis Ritchie changed this in the transition from B to C when he added types, because not having to chase the hidden pointer variables around made life easier (as I recall). Indeed, arguably the difference you’re going all around the houses to explain above is simply the difference between pointers (which *is* how C treats arrays) and pointer variables (exactly the same as the difference between a char and a char variable). And looked at that way, it all gets quite straightforward.

  21. Peter Says:

    I can’t be bothered to read the whole article. I just want to know if a is the same as b in the 2nd example. Please state the answer before you dive into the details. Otherwise, thanks

  22. Isaac Says:

    Excellent article! That really clarifies my understanding of pointers and arrays in C. I’ve been learning it through P&H’s MIPS Architecture book, so it’s also interesting to see it presented in more “modern” x86 assembly.

    Cheers, coming from HN.

  23. Experiment Garden Says:

    This is a very nice tutorial that is bound to help many programming newbies. I remember when I was taking beginning C++ in college. I had already been working with C++ for years but there were many in the class who had no prior programming experience. They didn’t understand the difference between pointers and arrays and also didn’t understand why anyone would use a pointer at all!

    I tried to explain the power of the pointer to them but I fear that most of them never got it. Your succinct explanation here, though, probably would have helped them understand.

  24. Sachin Says:

    A very neat and clean and nice explanation… clearly explains the difference. Thanks

  25. Andrew Montalenti Says:

    Extremely nice and concise explanation, both OP and Keith Thompson in the comments thread.

    I used to TA a systems programming course that used C as the language. First half of the course was actually learning x86 assembler and then C. This array vs. pointer problem was one of the most common mistakes my students made. Heck, even I made it occasionally when I was actively programming in C.

  26. Confused Says:

    I didn’t get it :(

    You say that all variables (including array variables) in C are just pseudonyms/labels to memory addresses, and the compiler during compilation replaces variable names with addresses. Yet these following two sentences sound exactly the same, but at the same time they “are quite different” (?):

    “Hence in the assignment to a, the 8th character of the array is taken by offsetting the *value* of array_place by 7, and moving the contents pointed to by the resulting address into the al register, and later into a.”

    “Therefore, to actually compute the offset of the 8th character of the string, the CPU will first copy the *value* of the pointer into a register and only then increment it.”

    So I don’t understand why CPU first has to copy the value of pointer in to register and only then increment it, when at the same time it can take the value of an array and offset it without copying it into register. If I’m guessing right, the confusion arises from the use of “value”: in the array example “value” is the label for memory location for the array name, and in pointer example “value” is what is contained at the memory location?

  27. eliben Says:

    Confused,

    The diagrams right after the quoted paragraphs are there to clear this confusion. Pointers have different semantics from arrays in C, and the compiler does the service of creating code with another level of indirection when pointers are indexed. The CPU knows nothing about pointers or arrays – the compiler sets up the code to work as expected.

  28. H2CO3 Says:

    “The semantics of arrays in C dictate that the array name is the address of the first element of the array.” <- wrong. An array is an array. Now in most contexts, it decays into (is implicitly converted to) a pointer, but that doesn't mean at all that "an array name is the address of its first element". It isn't. If that were true, then the sizeof operator would be broken and useless when applied to arrays.

  29. eliben Says:

    @H2CO3,

    You’re being overly pedantic. And not entirely humble. This is a blog post, not the ISO C standard. Since the blog post discusses semantics w.r.t. code generation, this statement is true enough. In modern compilers, the front-end would compute sizeof(arr) and place a constant in its place long before the backend starts to figure out how to emit machine code from array and pointer accesses. The “array name is the address” abstraction is very useful to understand a large family of issues with confusion between arrays and pointers in C.

  30. jucestain Says:

    I think H2CO3 has a good point. I’ve been trying to get this straight in my head for a while and I think ultimately the most confusing part is that arrays “decay” into pointers when they’re passed to functions, or used as arguments in operators, and so most people assume array names are pointers. For example, if you do something like int b[] = {1,2,3,4}; int c[] = {1,2,3,4}; b = c;, the compiler spits out the following error: error: incompatible types when assigning to type 'int[4]' from type 'int *'. So, in the case of the assignment operator, the array name decays into a pointer as H2CO3 suggests. But the datatype of both b and c is int[4], which is an array. The same is true for basic functions, in that passing array names results in them decaying into pointers. The one thing I’m curious about is the sizeof operator, in that for this special function it must not decay into a pointer, so it must be a macro or some compiler related thing. Either way it’s incredibly confusing…

  31. eliben Says:

    @jucestain,

    But this is precisely the point the article has tried to clear. The array name is a special symbol the compiler knows about (and thus can compute its sizeof). But you can’t pass it around – only the address of the array can be passed around – hence the decaying to pointer.

  32. jucestain Says:

    @eliben,

    Hmm ok. Just to help clear some things up, the C Programming Language states “Since the name of an array is a synonym for the location of the initial element.” So you technically might be right, in regards to what H2CO3 said.

    Still confused. Maybe one day I’ll comprehend arrays…

  33. yaraeovento Says:

    int b[] = {1,2,3,4}; int c[] = {1,2,3,4}; b = c;
    This feels like a memory leak in the making.

    @eliben: Thank you so much for the great article!
