I'll start with a quiz. What does this function do?

def foo(lst):
    a = 0
    for i in lst:
        a += i
    b = 1
    for t in lst:
        b *= i
    return a, b

If you think "computes the sum and product of the items in lst", don't feel too bad about yourself. The bug here is often tricky to spot. If you did see it, well done - but buried in mountains of real code, and when you don't know it's a quiz, discovering the bug is significantly more difficult.

The bug here is due to using i instead of t in the body of the second for loop. But wait, how does this even work? Shouldn't i be invisible outside of the first loop? [1] Well, no. In fact, Python formally acknowledges that the names defined as for loop targets (a more formally rigorous name for "index variables") leak into the enclosing function scope. So this:

for i in [1, 2, 3]:
    pass
print(i)

Is valid and prints 3, by design. In this writeup I want to explore why this is so, why it's unlikely to change, and also use it as a tracer bullet to dig into some interesting parts of the CPython compiler.

And by the way, if you're not convinced this behavior can cause real problems, consider this snippet:

def foo():
    lst = []
    for i in range(4):
        lst.append(lambda: i)
    print([f() for f in lst])

If you'd expect this to print [0, 1, 2, 3], no such luck. This code will, instead, emit [3, 3, 3, 3], because there's just a single i in the scope of foo, and this is what all the lambdas capture.
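To make the late-binding effect concrete, here is a small sketch that puts the surprising version next to one common workaround: binding the current value of i as a default argument of the lambda. The function names are mine, chosen just for this illustration.

```python
def make_callbacks_late():
    """All lambdas share the single `i` that lives in this function's scope."""
    callbacks = []
    for i in range(4):
        callbacks.append(lambda: i)
    return [f() for f in callbacks]


def make_callbacks_bound():
    """Each lambda gets its own copy of `i`, captured as a default argument."""
    callbacks = []
    for i in range(4):
        callbacks.append(lambda i=i: i)
    return [f() for f in callbacks]


print(make_callbacks_late())   # [3, 3, 3, 3]
print(make_callbacks_bound())  # [0, 1, 2, 3]
```

The default argument is evaluated when each lambda is created, so every callback remembers the value i had at that moment rather than looking it up in foo's scope later.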

The official word

The Python reference documentation explicitly documents this behavior in the section on for loops:

The for-loop makes assignments to the variables(s) in the target list. [...] Names in the target list are not deleted when the loop is finished, but if the sequence is empty, they will not have been assigned to at all by the loop.

Note the last sentence - let's try it:

for i in []:
    pass
print(i)

Indeed, a NameError is raised. Later on, we'll see that this is a natural outcome of the way the Python VM executes its bytecode.
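The same thing can be observed inside a function. The sketch below uses a hypothetical helper, last_item, to show both outcomes; note that within a function the exception is UnboundLocalError, which is a subclass of NameError.

```python
def last_item(items):
    """Return the final loop target value; fails if `items` is empty."""
    for item in items:
        pass
    # `item` is only bound if the loop assigned to it at least once.
    return item


print(last_item([1, 2, 3]))  # 3

try:
    last_item([])
except NameError as exc:
    # Inside a function the concrete type is UnboundLocalError.
    print(type(exc).__name__)  # UnboundLocalError
```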

Why this is so

I actually asked Guido van Rossum about this behavior and he was gracious enough to reply with some historical background (thanks Guido!). The motivation is keeping Python's simple approach to names and scopes without resorting to hacks (such as deleting all the values defined in the loop after it's done - think about the complications with exceptions, etc.) or more complex scoping rules.

In Python, the scoping rules are fairly simple and elegant: a block is either a module, a function body or a class body. Within a function body, names are visible from the point of their definition to the end of the block (including nested blocks such as nested functions). That's for local names, of course; global names (and other nonlocal names) have slightly different rules, but that's not pertinent to our discussion.

The important point here is: the innermost possible scope is a function body. Not a for loop body. Not a with block body. Python does not have nested lexical scopes below the level of a function, unlike some other languages (C and its progeny, for example).
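A quick sketch of what "no scopes below the function" means in practice. The function and variable names are made up for the demonstration; the point is only that names bound inside if, with and for blocks are all still visible at the end of the function.

```python
import io


def scopes_demo(flag):
    if flag:
        from_if = "bound inside if"
    with io.StringIO("text") as stream:
        from_with = stream.read()
    for n in range(3):
        from_for = n * 2
    # None of the blocks above introduced a scope of their own, so every
    # name bound inside them is an ordinary local of scopes_demo.
    return from_if, from_with, n, from_for, stream.closed


print(scopes_demo(True))  # ('bound inside if', 'text', 2, 4, True)
```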

So if you just go about implementing Python, this behavior is what you'll likely end up with. Here's another enlightening snippet:

for i in range(4):
    d = i * 2

Would it surprise you to find out that d is visible and accessible after the for loop is finished? No, this is just the way Python works. So why would the index variable be treated any differently?

By the way, the index variables of list comprehensions are also leaked to the enclosing scope. Or, to be precise, were leaked, before Python 3 came along.

Python 3 fixed the leakage from list comprehensions, along with other breaking changes. Make no mistake, changing such behavior is a major breakage in backwards compatibility. This is why I think the current behavior stuck and won't be changed.
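Here is a small check of the difference on Python 3, as a sketch. The two helper functions are hypothetical; each one tries to read the target name after the construct has finished and reports whether that worked.

```python
def comprehension_target_leaks():
    squares = [n_in_comp * n_in_comp for n_in_comp in range(4)]
    try:
        n_in_comp  # not a local of this function on Python 3
    except NameError:
        return False
    return True


def for_target_leaks():
    for n_in_loop in range(4):
        pass
    try:
        n_in_loop  # a plain local, still bound after the loop
    except NameError:
        return False
    return True


print(comprehension_target_leaks())  # False
print(for_target_leaks())            # True
```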

Moreover, many folks still find this a useful feature of Python. Consider:

for i, item in enumerate(somegenerator()):
    dostuffwith(i, item)
print('The loop executed {0} times!'.format(i+1))

If you have no idea how many items somegenerator actually returned, this is a pretty succinct way to know. Otherwise you'd have to keep a separate counter.
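One caveat follows directly from the rule quoted from the reference documentation: if the generator yields nothing, i is never assigned and the print fails. A hedged variant that handles this case initializes the counter first and lets enumerate start at 1; count_items is a name I made up for the sketch.

```python
def count_items(iterable):
    """Count the items of any iterable, including an empty one."""
    count = 0  # stays 0 if the loop never assigns to `count`
    for count, _item in enumerate(iterable, start=1):
        pass
    return count


print(count_items(x for x in "abc"))  # 3
print(count_items(iter([])))          # 0
```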

Here's another example:

for i in somegenerator():
    if isinteresting(i):
        break
dowithstuff(i)

Which is a useful pattern for finding things in a loop and using them afterwards [2].
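The pattern needs some care when nothing interesting is found, because the target then either holds the last item or was never bound at all. One way to handle that, sketched here with a hypothetical find_first helper, is the else clause of the for statement, which runs only when the loop finishes without hitting break.

```python
def find_first(items, is_interesting):
    """Return the first interesting item, or None if there is none."""
    for candidate in items:
        if is_interesting(candidate):
            break
    else:
        # The loop ran to completion (or never ran): nothing was found.
        return None
    # We got here via `break`, so `candidate` is the item we were looking for.
    return candidate


print(find_first([1, 3, 4, 5], lambda n: n % 2 == 0))  # 4
print(find_first([1, 3, 5], lambda n: n % 2 == 0))     # None
```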

There are other uses people came up with over the years that justify keeping this behavior in place. It's hard enough to introduce breaking changes for features the core developers deem detrimental and harmful. When the feature is argued by many to be useful, and moreover is used in a huge bunch of code in the real world, the chances of removing it are zero.

Under the hood

Now the fun part. Let's see how the Python compiler and VM conspire to make this behavior possible. In this particular case, I think the most lucid way to present things is going backwards from the bytecode. I hope this may also serve as an interesting example of how to go about digging in Python's internals [3] in order to find stuff out (it's so much fun, seriously!)

Let's take a part of the function presented at the start of this article and disassemble it:

def foo(lst):
    a = 0
    for i in lst:
        a += i
    return a

The resulting bytecode is:

 0 LOAD_CONST               1 (0)
 3 STORE_FAST               1 (a)

 6 SETUP_LOOP              24 (to 33)
 9 LOAD_FAST                0 (lst)
12 GET_ITER
13 FOR_ITER                16 (to 32)
16 STORE_FAST               2 (i)

19 LOAD_FAST                1 (a)
22 LOAD_FAST                2 (i)
25 INPLACE_ADD
26 STORE_FAST               1 (a)
29 JUMP_ABSOLUTE           13
32 POP_BLOCK

33 LOAD_FAST                1 (a)
36 RETURN_VALUE

As a reminder, LOAD_FAST and STORE_FAST are the opcodes Python uses to access names that are only used within a function. Since the Python compiler knows statically (at compile-time) how many such names exist in each function, they can be accessed with static array offsets as opposed to a hash table, which makes access significantly faster (hence the _FAST suffix). But I digress. What's really important here is that a and i are treated identically. They are both fetched with LOAD_FAST and modified with STORE_FAST. There is absolutely no reason to assume that their visibility is in any way different [4].
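If you want to reproduce this yourself, the dis module does the job. The sketch below avoids relying on exact opcode names and offsets, since those differ between CPython versions (newer releases have changed the loop opcodes and may fuse some of the _FAST instructions). What stays stable is that a and i both sit in the function's table of fast locals.

```python
import dis


def foo(lst):
    a = 0
    for i in lst:
        a += i
    return a


# The slots used by the *_FAST opcodes: the argument first, then locals.
print(foo.__code__.co_varnames)  # ('lst', 'a', 'i')

# Collect the opcodes that touch fast locals; the exact names vary by version.
fast_ops = sorted(
    {ins.opname for ins in dis.get_instructions(foo) if "FAST" in ins.opname}
)
print(fast_ops)

# For the full listing, comparable to the one above, call dis.dis(foo).
```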

So how did this come to be? Somehow, the compiler figured that i is just another local name within foo. This logic lives in the symbol table code, when the compiler walks over the AST to create a control-flow graph from which bytecode is later emitted; there are more details about this process in my article about symbol tables - so I'll just stick to the essentials here.

The symtable code doesn't treat for statements very specially. In symtable_visit_stmt we have:

case For_kind:
    VISIT(st, expr, s->v.For.target);
    VISIT(st, expr, s->v.For.iter);
    VISIT_SEQ(st, stmt, s->v.For.body);
    if (s->v.For.orelse)
        VISIT_SEQ(st, stmt, s->v.For.orelse);
    break;

The loop target is visited as any other expression. Since this code visits the AST, it's worthwhile to dump it to see how the node for the for statement looks:

For(target=Name(id='i', ctx=Store()),
    iter=Name(id='lst', ctx=Load()),
    body=[AugAssign(target=Name(id='a', ctx=Store()),
                    op=Add(),
                    value=Name(id='i', ctx=Load()))],
    orelse=[])

So i lives in a Name node. These are handled in the symbol table code by the following clause in symtable_visit_expr:

case Name_kind:
    if (!symtable_add_def(st, e->v.Name.id,
                          e->v.Name.ctx == Load ? USE : DEF_LOCAL))
        VISIT_QUIT(st, 0);
    /* ... */

Since the name i is clearly tagged with DEF_LOCAL (because of the *_FAST opcodes emitted to access it, but this is also easy to observe if the symbol table is dumped using the symtable module), the code above evidently calls symtable_add_def with DEF_LOCAL as the third argument. This is the right time to glance at the AST above and notice the ctx=Store part of the Name node of i. So it's the AST that already comes in carrying the information that i is stored to in the target part of the For node. Let's see how that comes to be.
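Dumping the symbol table from Python takes only a few lines with the symtable module. This is a minimal sketch; the "<example>" filename is just a placeholder label for the source string.

```python
import symtable

SOURCE = '''
def foo(lst):
    a = 0
    for i in lst:
        a += i
    return a
'''

module_table = symtable.symtable(SOURCE, "<example>", "exec")
foo_table = module_table.get_children()[0]  # the table for the body of foo

for name in ("a", "i"):
    sym = foo_table.lookup(name)
    # Both names are reported as local and assigned: no difference between them.
    print(name, sym.is_local(), sym.is_assigned())
```

Both lines of output show True True, which matches what the bytecode told us: as far as the symbol table is concerned, the loop target is an ordinary local.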

The AST-building part of the compiler goes over the parse tree (which is a fairly low-level hierarchical representation of the source code - some background is available here) and, among other things, sets the expr_context attributes on some nodes, most notably Name nodes. Think about it this way, in the following statement:

foo = bar + 1

Both foo and bar are going to end up in Name nodes. But while bar is only being loaded from, foo is actually being stored into in this code. The expr_context attribute is used to distinguish between uses for later consumption by the symbol table code [5].
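The ast module makes these contexts easy to inspect from Python. The following sketch parses the assignment above and a small for loop, then reports the context class attached to each Name node.

```python
import ast

tree = ast.parse("foo = bar + 1")
contexts = {
    node.id: type(node.ctx).__name__
    for node in ast.walk(tree)
    if isinstance(node, ast.Name)
}
print(contexts)  # {'foo': 'Store', 'bar': 'Load'}

# The target of a for loop is tagged the same way as an assignment target.
loop = ast.parse("for i in lst:\n    a += i").body[0]
print(type(loop.target.ctx).__name__)  # Store
```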

Back to our for loop targets, though. These are handled in the function that creates an AST for for statements - ast_for_for_stmt. Here are the relevant parts of this function:

static stmt_ty
ast_for_for_stmt(struct compiling *c, const node *n)
{
    asdl_seq *_target, *seq = NULL, *suite_seq;
    expr_ty expression;
    expr_ty target, first;

    /* ... */

    node_target = CHILD(n, 1);
    _target = ast_for_exprlist(c, node_target, Store);
    if (!_target)
        return NULL;
    /* Check the # of children rather than the length of _target, since
       for x, in ... has 1 element in _target, but still requires a Tuple. */
    first = (expr_ty)asdl_seq_GET(_target, 0);
    if (NCH(node_target) == 1)
        target = first;
    else
        target = Tuple(_target, Store, first->lineno, first->col_offset, c->c_arena);

    /* ... */

    return For(target, expression, suite_seq, seq, LINENO(n), n->n_col_offset,
               c->c_arena);
}

The Store context is created in the call to ast_for_exprlist, which creates the node for the target (recall that the for loop target may be a sequence of names for tuple unpacking, not just a single name).

This function is probably the most important part in the process of explaining why for loop targets are treated similarly to other names within the loop. After this tagging happens in the AST, the code for handling such names in the symbol table and VM is no different from other names.
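The Name-versus-Tuple distinction made in the C code can be observed from Python as well. This sketch uses a hypothetical helper, describe_target, to report the node type and context of a loop target; it looks only at the resulting AST, so it doesn't depend on which C function produced the tree.

```python
import ast


def describe_target(source):
    """Return (node type, context type) for the target of a for statement."""
    loop = ast.parse(source).body[0]
    target = loop.target
    return type(target).__name__, type(target.ctx).__name__


print(describe_target("for x in seq: pass"))     # ('Name', 'Store')
print(describe_target("for x, in seq: pass"))    # ('Tuple', 'Store')
print(describe_target("for k, v in seq: pass"))  # ('Tuple', 'Store')
```

The second case is the one the comment in the C code is about: a single name followed by a comma still produces a Tuple target.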

Wrapping up

This article discusses a particular behavior of Python that may be considered a "gotcha" by some. I hope the article does a decent job of explaining how this behavior flows naturally from the naming and scoping semantics of Python, why it can be useful and hence is unlikely to ever change, and how the internals of the Python compiler make it work under the hood. Thanks for reading!

[1] Here I'm tempted to make a Microsoft Visual C++ 6 joke, but the fact that most readers of this blog in 2015 won't get it is somewhat disturbing (because it reflects my age, not the abilities of my readers).
[2] You could argue that dowithstuff(i) could go into the if right before the break here. But this isn't always convenient. Besides, according to Guido there's a nice separation of concerns here - the loop is used for searching, and only that. What happens with the value after the search is done is not the loop's concern. I think this is a very good point.
[3] As usual for my articles on Python's internals, this is about Python 3. Specifically, I'm looking at the default branch of the Python repository, where work on the next release (3.5) is being done. But for this particular topic, the source code of any release in the 3.x series should do.
[4] Another thing clear from the disassembly is why i remains invisible if the loop doesn't execute. The GET_ITER and FOR_ITER pair of opcodes treat the thing we loop over as an iterator and then call its __next__ method. If that call ends up raising StopIteration, the VM catches it and exits the loop. Only if an actual value is returned does the VM proceed to execute STORE_FAST to i, thus bringing it into existence for subsequent code to refer to.
[5] It's a curious design, which I suspect stems from the desire for relatively clean recursive visitation code in AST consumers such as the symbol table code and CFG generation.