The perils of unsigned iteration in C/C++
June 11th, 2010 at 6:17 amC and C++ frequently coax you into using an unsigned type for iteration. Standard functions like strlen and the size method of containers (in C++) return size_t, which is an unsigned type, so to avoid conversion warnings you comply and iterate with a variable of the appropriate type. For example:
size_t len = strlen(some_c_str);
size_t i;
for (i = 0; i < len; ++i) {
/* Do stuff with each char of some_c_str
*/
}
I’ve long been aware of one painful gotcha of using size_t for iteration – using it for iterating backwards. The following code will fail:
/* Warning: buggy code!
*/
size_t len = strlen(some_c_str);
size_t i;
for (i = len - 1; i >= 0; --i) {
/* Do stuff with each char of some_c_str, backwards
*/
}
When i reaches 0 it’s still within bounds, so it will be decremented and become a huge positive number (probably 2^((sizeof(size_t)*8) - 1). Congratulations, we have an infinite loop.
Today I ran into another manifestation of this problem. This one is more insidious, because it happens only for some kinds of input. I wrote the following code because the operation had to consider each character in the string and the character after it:
/* Warning: buggy code!
*/
size_t len = strlen(some_c_str);
size_t i;
for (i = 0; i < len - 1; ++i) {
/* Do stuff with some_c_str[i] and some_c_str[i+1].
*/
}
Can you spot the bug?
When some_c_str is empty, len is 0. Therefore, i is compared with the unsigned version of -1, which is that huge positive number again. What chance does poor i have against such a giant? It will just keep chugging along, well beyond the length of my string.
As I see it, to avoid the problem we can either:
- Use an int variable and cast the return value of strlen to int. This feels a bit dirty, especially in C++ where you’d have to use static_cast<int>.
- Just keep using unsigned types for iteration, but be extra careful and use various hacks to avoid the problematic corner cases.
None of these options is ideal, so if you have a better idea, let me know.
Edit 12.06.2010: Thanks everyone for the excellent comments! It’s obvious creative ways exist to overcome this problem for unsigned types. Still, it remains a gotcha even seasoned programmers stumble upon from time to time. It’s not surprising that many C/C++ style guides recommend keeping unsigned types for bitfields only, using plain ints for everything else.
Related posts:

June 11th, 2010 at 08:51
i use
June 11th, 2010 at 09:56
It really only makes sense for equal / not equal comparisons with zero for unsigned variables, which gives your second example a bad smell – perhaps compilers could include this as a warning…?
In general of course this is what C++’s STL iterators and algorithms are for… but beside that (and for C code), I’d probably tend to use pointer equality comparison rather than indexing into an array, and avoid the index altogether (this is effectively what the relevant STL algorithms do anyway);
char* ptr;
for (ptr = some_c_str; ptr != some_c_str+len; ptr++) {
/* do stuff with *ptr */
}
maybe this is a bit of a hack, but I think often introducing an index variable at all causes problems (personally I hate C arrays, but probably more people hate pointers…)
June 11th, 2010 at 10:44
The usual form for iterating backwards would be:
In the second case, when you want to process
some_c_str[i]andsome_c_str[i+1], you need to check that makes sense before entering the loop, so check the length of the string first.June 11th, 2010 at 10:52
size_t len = strlen(some_c_str);
// warning: correct code follows
for (size_t i = len; i–;) {
/* Do stuff with each char of some_c_str, backwards
*/
}
// i not longer accessible outside the for loop
June 11th, 2010 at 11:58
The signed version of size_t is ssize_t, not int. It can make a big difference (literally) on 64-bit machines
June 11th, 2010 at 12:45
Since you mention C++, in many cases you can use iterators which shield you from these gotchas. I agree, static_cast would be a code smell
Obviously iterators have their own costs/quirks it may or may not be applicable to your specific case / codebase.
June 11th, 2010 at 13:35
@and and @Ben Bass,
This won’t help cleanly access a char and the next char. Simple iteration is not a problem, especially over a string where I could simply use:
@Thomas & @Mr. T,
Thanks, an interesting approach. I must admit I haven’t seen a lot of real code that uses this idiom, and on first sight it raises an eyebrow as your subconscience battles with the fact that it indeed never makes you point outside the string.
@Francesco,
Yes, using a reverse iterator with
rbegin()andrend()would help. But not for the second problemJune 11th, 2010 at 14:02
If you need to abstract that into iterators because it’s too sharp a tool, maybe it’s time for you to reconsider your choice of language.
The only peril of unsigned loop counters is, possibly, making impossible for the compiler to prove their boundedness.
June 11th, 2010 at 14:14
For the first, why not take into account that len?
For the second I think you’ll just have to make sure len isn’t zero.
June 11th, 2010 at 14:16
That should read:
For the first, why not take into account that less-than zero is the same as greater-than len?
June 11th, 2010 at 14:29
Your loop invariant is that both i and i+1 must be valid indexes into some_c_str. It doesn’t make sense to talk about negative sizes so just work in positives.
June 11th, 2010 at 14:31
One cool way to make these bugs less subtle IMHO is to use an integer class that won’t silently overflow.
I’ve used the SafeInt library from microsoft before, it works great: http://blogs.msdn.com/b/david_leblanc/archive/2008/09/30/safeint-3-on-codeplex.aspx
SafeInt<size_t>is just like asize_t, except that it(a) throws on overflow
(b) disallows implicit conversions to other integer types
It also works fine on GCC.
June 11th, 2010 at 15:53
In this case the simplest fix is:
size_t len = strlen(some_c_str);
size_t i;
for (i = 0; i+1 < len; ++i) {
/* Do stuff with some_c_str[i] and some_c_str[i+1].
*/
}
June 11th, 2010 at 16:46
JavaScript doesn’t have an unsigned type, so the initial problem affect that dialect of ECMAScript, though I have been bitten by this in ActionScript, which does support unsigned types.
Although most ECMAScript implementations offer
forEach-equivalents, the while-loop, length-down-to-zero pattern is becoming more popular in the JavaScript community as that construct performs slightly faster in most implementations.The main caveat here, which is not just limited to this approach, is that JavaScript only has function-local, and not block-local scope.
June 11th, 2010 at 17:15
Couldn’t you replace
i < len-1byi+1 < len?June 11th, 2010 at 17:38
equivalently
June 11th, 2010 at 17:39
Oops, forgot ++b1 from the second one
June 11th, 2010 at 18:00
You write an unsigned down-count loop like:
for (i = len – 1; i < len; i–)
and your second example should use:
for (i = 0; i+1 < len; i++)
In unsigned, "a < b" is not equivalent to "a-c < b-c".
June 11th, 2010 at 18:05
You write an unsigned down-count loop like:
and your second example should use:
The actual condition you need is “i<len && i+1 < len", but that simplifies to just
"i+1 < len". In unsigned arithmetic, "a < b" is not equivalent to "a-c < b-c", so
this is not the same condition as "i < len-1".
June 11th, 2010 at 18:51
Here’s a clean way to write that code without changing to signed or adding lots of extra checks:
June 11th, 2010 at 18:57
size_t len = strlen(some_c_str);
size_t i;
for (i = 0; i < len; i++) {
/* Do stuff with some_c_str[i] and some_c_str[i+1].
*/
}
June 11th, 2010 at 19:33
One way to “solve” the problem might be to do use “i + 1″ instead of “len – 1″ in the comparison:
This may cause extra move/increment instructions, unless they can be optimized away.
June 11th, 2010 at 20:11
Don’t use int’s as loop iterator variables! This introduces a rather bad 32-bit limitation in your code, were it ever to run on 64-bit machines. On 64-bit machines, it is quite likely already you will encounter arrays with more than 2 billion elements- and on most 64-bit C/C++ compilers, ints are still 32 bits. Java is hitting this problem hard (see all the blog posts on how hard it is to write a correct binary search- almost all the problems they talk about are due to very large array indicies that overflow 32-bit arithmetic). It’s only when counting down are unsigned values a problem.
June 11th, 2010 at 20:28
If you used a signed loop variable instead, would the bugs be any better? Looping 2 billion times or not looping at all still leaves a bug. At least if it’s looping you have plenty of time to break into the debugger and see where it’s stuck.
June 11th, 2010 at 20:58
I also prefer using the beginning and end of the string as my iterator bounds.
June 11th, 2010 at 22:06
@eliben: When using iterators instead of indexes (as ‘and’ and Ben Bass comment), you just use p[1] instead of s[i+1] to access the next element.
June 11th, 2010 at 23:04
I write your last example
for (i = 0; i+1 < len; ++i) ...when i is unsigned. I don’t like to use the postfix operators.
June 12th, 2010 at 06:17
@Roger Pate,
But then you can’t iterate until
enditself, but must iterate untilend - 1.@Matt, Laurie and others,
Using
i + 1 < lenis an interesting approach – I didn’t think about it at all. You must admit that it’s not an idiomatic way to loop, however.June 13th, 2010 at 10:16
Rather than using i + 1 < len, why not sanitize len before the loop, which saves you an increment each loop as well? For readability, you can store sanitized value in another variable if you like.
Alternatively, you can loop by referring to the 2nd value of the pair:
June 13th, 2010 at 11:08
Sorry my first example won’t work, but the 2nd should
June 14th, 2010 at 19:07
Just error out of the function early before the loop:
if (len == 0) return;
March 19th, 2012 at 17:13
I almost always loop using unsigned variables if that is appropriate. For decreasing loops over the interval [a,b) I use the following idiom, which curiously I have never seen used by others:
for (unsigned i=b; i-->a; ){ // stuff using i here
}
This works well for any possible a,b, including a=0. If a<=b, then it is equivalent to a version with
i-- != athat I saw in the comments above, buti --> a(I often use spacing like that) just look so cute.