Python internals: adding a new statement to Python
June 30th, 2010 at 7:18 pmThis article is an attempt to better understand how the front-end of Python works. Just reading documentation and source code may be a bit boring, so I’m taking a hands-on approach here: I’m going to add an until statement to Python.
All the coding for this article was done against the cutting-edge Py3k branch in the Python Mercurial repository mirror.
The until statement
Some languages, like Ruby, have an until statement, which is the complement to while (until num == 0 is equivalent to while num != 0). In Ruby, I can write:
num = 3
until num == 0 do
puts num
num -= 1
end
And it will print:
3
2
1
So, I want to add a similar capability to Python. That is, being able to write:
num = 3
until num == 0:
print(num)
num -= 1
A language-advocacy digression
This article doesn’t attempt to suggest the addition of an until statement to Python. Although I think such a statement would make some code clearer, and this article displays how easy it is to add, I completely respect Python’s philosophy of minimalism. All I’m trying to do here, really, is gain some insight into the inner workings of Python.
Modifying the grammar
Python uses a custom parser generator named pgen. This is a LL(1) parser that converts Python source code into a parse tree. The input to the parser generator is the file Grammar/Grammar [1]. This is a simple text file that specifies the grammar of Python.
Two modifications have to be made to the grammar file. The first is to add a definition for the until statement. I found where the while statement was defined (while_stmt), and added until_stmt below [2]:
compound_stmt: if_stmt | while_stmt | until_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
while_stmt: 'while' test ':' suite ['else' ':' suite]
until_stmt: 'until' test ':' suite
Note that I’ve decided to exclude the else clause from my definition of until, just to make it a little bit different (and because frankly I dislike the else clause of loops and don’t think it fits well with the Zen of Python).
The second change is to modify the rule for compound_stmt to include until_stmt, as you can see in the snippet above. It’s right after while_stmt, again.
When you run make after modifying Grammar/Grammar, notice that the pgen program is run to re-generate Include/graminit.h and Python/graminit.c, and then several files get re-compiled.
Modifying the AST generation code
After the Python parser has created a parse tree, this tree is converted into an AST, since ASTs are much simpler to work with in subsequent stages of the compilation process.
So, we’re going to visit Parser/Python.asdl which defines the structure of Python’s ASTs and add an AST node for our new until statement, again right below the while:
| While(expr test, stmt* body, stmt* orelse)
| Until(expr test, stmt* body)
If you now run make, notice that before compiling a bunch of files, Parser/asdl_c.py is run to generate C code from the AST definition file. This (like Grammar/Grammar) is another example of the Python source-code using a mini-language (in other words, a DSL) to simplify programming. Also note that since Parser/asdl_c.py is a Python script, this is a kind of bootstrapping – to build Python from scratch, Python already has to be available.
While Parser/asdl_c.py generated the code to manage our newly defined AST node (into the files Include/Python-ast.h and Python/Python-ast.c), we still have to write the code that converts a relevant parse-tree node into it by hand. This is done in the file Python/ast.c. There, a function named ast_for_stmt converts parse tree nodes for statements into AST nodes. Again, guided by our old friend while, we jump right into the big switch for handling compound statements and add a clause for until_stmt:
case while_stmt:
return ast_for_while_stmt(c, ch);
case until_stmt:
return ast_for_until_stmt(c, ch);
Now we should implement ast_for_until_stmt. Here it is:
static stmt_ty
ast_for_until_stmt(struct compiling *c, const node *n)
{
/* until_stmt: 'until' test ':' suite */
REQ(n, until_stmt);
if (NCH(n) == 4) {
expr_ty expression;
asdl_seq *suite_seq;
expression = ast_for_expr(c, CHILD(n, 1));
if (!expression)
return NULL;
suite_seq = ast_for_suite(c, CHILD(n, 3));
if (!suite_seq)
return NULL;
return Until(expression, suite_seq, LINENO(n), n->n_col_offset, c->c_arena);
}
PyErr_Format(PyExc_SystemError,
"wrong number of tokens for 'until' statement: %d",
NCH(n));
return NULL;
}
Again, this was coded while closely looking at the equivalent ast_for_while_stmt, with the difference that for until I’ve decided not to support the else clause. As expected, the AST is created recursively, using other AST creating functions like ast_for_expr for the condition expression and ast_for_suite for the body of the until statement. Finally, a new node named Until is returned.
Note that we access the parse-tree node n using some macros like NCH and CHILD. These are worth understanding – their code is in Include/node.h.
Digression: AST composition
I chose to create a new type of AST for the until statement, but actually this isn’t necessary. I could’ve saved some work and implemented the new functionality using composition of existing AST nodes, since:
until condition:
# do stuff
Is functionally equivalent to:
while not condition:
# do stuff
Instead of creating the Until node in ast_for_until_stmt, I could have created a Not node with an While node as a child. Since the AST compiler already knows how to handle these nodes, the next steps of the process could be skipped.
Compiling ASTs into bytecode
The next step is compiling the AST into Python bytecode. The compilation has an intermediate result which is a CFG (Control Flow Graph), but since the same code handles it I will ignore this detail for now and leave it for another article.
The code we will look at next is Python/compile.c. Following the lead of while, we find the function compiler_visit_stmt, which is responsible for compiling statements into bytecode. We add a clause for Until:
case While_kind:
return compiler_while(c, s);
case Until_kind:
return compiler_until(c, s);
If you wonder what Until_kind is, it’s a constant (actually a value of the _stmt_kind enumeration) automatically generated from the AST definition file into Include/Python-ast.h. Anyway, we call compiler_until which, of course, still doesn’t exist. I’ll get to it an a moment.
If you’re curious like me, you’ll notice that compiler_visit_stmt is peculiar. No amount of grep-ping the source tree reveals where it is called. When this is the case, only one option remains – C macro-fu. Indeed, a short investigation leads us to the VISIT macro defined in Python/compile.c:
#define VISIT(C, TYPE, V) {\
if (!compiler_visit_ ## TYPE((C), (V))) \
return 0; \
It’s used to invoke compiler_visit_stmt in compiler_body. Back to our business, however…
As promised, here’s compiler_until:
static int
compiler_until(struct compiler *c, stmt_ty s)
{
basicblock *loop, *end, *anchor = NULL;
int constant = expr_constant(s->v.Until.test);
if (constant == 1) {
return 1;
}
loop = compiler_new_block(c);
end = compiler_new_block(c);
if (constant == -1) {
anchor = compiler_new_block(c);
if (anchor == NULL)
return 0;
}
if (loop == NULL || end == NULL)
return 0;
ADDOP_JREL(c, SETUP_LOOP, end);
compiler_use_next_block(c, loop);
if (!compiler_push_fblock(c, LOOP, loop))
return 0;
if (constant == -1) {
VISIT(c, expr, s->v.Until.test);
ADDOP_JABS(c, POP_JUMP_IF_TRUE, anchor);
}
VISIT_SEQ(c, stmt, s->v.Until.body);
ADDOP_JABS(c, JUMP_ABSOLUTE, loop);
if (constant == -1) {
compiler_use_next_block(c, anchor);
ADDOP(c, POP_BLOCK);
}
compiler_pop_fblock(c, LOOP, loop);
compiler_use_next_block(c, end);
return 1;
}
I have a confession to make: this code wasn’t written based on a deep understanding of Python bytecode. Like the rest of the article, it was done in imitation of the kin compiler_while function. By reading it carefully, however, keeping in mind that the Python VM is stack-based, and glancing into the documentation of the dis module, which has a list of Python bytecodes with descriptions, it’s possible to understand what’s going on.
That’s it, we’re done… Aren’t we?
After making all the changes and running make, we can run the newly compiled Python and try our new until statement:
>>> until num == 0:
... print(num)
... num -= 1
...
3
2
1
Voila, it works! Let’s see the bytecode created for the new statement by using the dis module as follows:
import dis
def myfoo(num):
until num == 0:
print(num)
num -= 1
dis.dis(myfoo)
Here’s the result:
4 0 SETUP_LOOP 36 (to 39)
>> 3 LOAD_FAST 0 (num)
6 LOAD_CONST 1 (0)
9 COMPARE_OP 2 (==)
12 POP_JUMP_IF_TRUE 38
5 15 LOAD_NAME 0 (print)
18 LOAD_FAST 0 (num)
21 CALL_FUNCTION 1
24 POP_TOP
6 25 LOAD_FAST 0 (num)
28 LOAD_CONST 2 (1)
31 INPLACE_SUBTRACT
32 STORE_FAST 0 (num)
35 JUMP_ABSOLUTE 3
>> 38 POP_BLOCK
>> 39 LOAD_CONST 0 (None)
42 RETURN_VALUE
The most interesting operation is number 12: if the condition is true, we jump to after the loop. This is correct semantics for until. If the jump isn’t executed, the loop body keeps running until it jumps back to the condition at operation 35.
Feeling good about my change, I then tried running the function (executing myfoo(3)) instead of showing its bytecode. The result was less than encouraging:
Traceback (most recent call last):
File "zy.py", line 9, in <module>
myfoo(3)
File "zy.py", line 5, in myfoo
print(num)
SystemError: no locals when loading 'print'
Whoa… this can’t be good. So what went wrong?
The case of the missing symbol table
One of the steps the Python compiler performs when compiling the AST is create a symbol table for the code it compiles. The call to PySymtable_Build in PyAST_Compile calls into the symbol table module (Python/symtable.c), which walks the AST in a manner similar to the code generation functions. Having a symbol table for each scope helps the compiler figure out some key information, such as which variables are global and which are local to a scope.
To fix the problem, we have to modify the symtable_visit_stmt function in Python/symtable.c, adding code for handling until statements, after the similar code for while statements [3]:
case While_kind:
VISIT(st, expr, s->v.While.test);
VISIT_SEQ(st, stmt, s->v.While.body);
if (s->v.While.orelse)
VISIT_SEQ(st, stmt, s->v.While.orelse);
break;
case Until_kind:
VISIT(st, expr, s->v.Until.test);
VISIT_SEQ(st, stmt, s->v.Until.body);
break;
And now we really are done. Compiling the source after this change makes the execution of myfoo(3) work as expected.
Conclusion
In this article I’ve demonstrated how to add a new statement to Python. Albeit requiring quite a bit of tinkering in the code of the Python compiler, the change wasn’t difficult to implement, because I used a similar and existing statement as a guideline.
The Python compiler is a sophisticated chunk of software, and I don’t claim being an expert in it. However, I am really interested in the internals of Python, and particularly its front-end. Therefore, I found this exercise a very useful companion to theoretical study of the compiler’s principles and source code. It will serve as a base for future articles that will get deeper into the compiler.
References
I used a few excellent references for the construction of this article. Here they are, in no particular order:
- PEP 339: Design of the CPython compiler – probably the most important and comprehensive piece of official documentation for the Python compiler. Being very short, it painfully displays the scarcity of good documentation of the internals of Python.
- "Python Compiler Internals" – an article by Thomas Lee
- "Python: Design and Implementation" – a presentation by Guido van Rossum
- Python (2.5) Virtual Machine, A guided tour – a presentation by Peter Tröger

| [1] | From here on, references to files in the Python source are given relatively to the root of the source tree, which is the directory where you run configure and make to build Python. |
| [2] | This demonstrates a common technique I use when modifying source code I’m not familiar with: work by similarity. This principle won’t solve all your problems, but it can definitely ease the process. Since everything that has to be done for while also has to be done for until, it serves as a pretty good guideline. |
| [3] | By the way, without this code there’s a compiler warning for Python/symtable.c. The compiler notices that the Until_kind enumeration value isn’t handled in the switch statement of symtable_visit_stmt and complains. It’s always important to check for compiler warnings! |
Related posts:

June 30th, 2010 at 20:34
Fantastic. Well done. I hope more people with experience in these kinds of things will use this post as a template for writing more posts that expose more of the Python internals. Maybe by the end of the year I’ll actually feel confident about becoming a contributor myself!
Thanks for this.
June 30th, 2010 at 21:04
This is an excellent piece of work, Eli, and I hope you will consent to its becoming a part of the official Python documentation (along with other similarly worthy things you may write). As Brian has already indicated it’s this kind of low-level practical detail that helps people realise you don’t need to be a “software superman” to help make Python better (and I appreciate your acknowledgment that an “until” statement would be anti-pythonic).
Now if you could find a suitable syntax for the loop-and-a-half http://www.cs.duke.edu/~ola/patterns/plopd/loops.html#loop-and-a-half that *would* be worth implementing!
June 30th, 2010 at 21:13
@Brian,
Glad you like it, thanks for the feedback.
@Steve,
Of course I consent – any idea where in the documentation such material could be incorporated?
By the way, that’s a great link on loop patterns – I’m adding it to my to-read list
June 30th, 2010 at 21:58
PEP 306: How to Change Python’s Grammar
http://python.org/dev/peps/pep-0306/
It is usually out of date, but still handy.
June 30th, 2010 at 22:02
This is one of the most interesting and informative posts I have read in a long time. Well done.
July 1st, 2010 at 02:48
This is against the Zen of Python
“There should be one– and preferably ONLY ONE –obvious way to do it.”
while not … is ok, and “until” stament is redundant
ruby sucks, so dont try to copy that xD
despite that, good work
July 1st, 2010 at 07:27
Fantastic !!
July 1st, 2010 at 08:21
Just a short question, is this specific to Python 3, or does it work with any version ?
(did they make changes to the scanner/parser/etc – combo ?)
July 1st, 2010 at 08:41
You are a very sick individual — good job!
I have been slightly touched by the grammar of Ruby and have always been attracted to Python. This article is an articulation that has been missing in my existence with these language sets for some time. I had actually thought about how excellent a language would be that encapsulated the essence of both and here you go…
Thanks for your time and effort — you really helped bury a question that I, as well as many others, have had lingering for some time due to our ignorance to these topics.
Compiler development is a serious endeavor and one you atomized quite efficiently, especially where byte code is concerned.
Do-while; if-else, OOP/OOD style where ‘until’ is interpreter style giving Python the best of both worlds. Use the interpreted for straight through and climb into objects as needed in sidebar fashion. This type of development is very inspiring…
July 1st, 2010 at 08:44
Interesting, not something I would take time to try doing. Gives some insight on how the grammar is defined and how VM works.
Now, what about you trying do the same (bend the grammar), but this time from Python itself, in runtime. Is that possible? If so, how far can you stretch it?
July 1st, 2010 at 09:39
Excellent and interesting article. Thank you.
Similar article about Common Lisp would be short:
Or more low-level:
And now there is:
July 1st, 2010 at 11:12
Back in the late 1970s, I wrote an n+1/2 loop construct (c.f. Steve Holden’s link to a loop-and-a-half construct) for a rationalised front-end preprocessor for a BASIC interpretter. In those days, BASICs didn’t have any loops except for FOR. The syntax that I used was:
LOOP{stmt block}
[ EXITIF {condition}
{stmt block}
] ...
ENDLOOP
Note: you can have multiple EXITIF
This could emulate all common types of loop
e.g. (using BASIC syntax)
FOR cv = start TO end STEP step{stmt block}
NEXT
would become
cv = startLOOP
EXITIF cv > end
{stmt block}
ENDLOOP
a ‘while’ loop (not in the target language) would be
LOOPEXITIF NOT condition
{stmt block}
ENDLOOP
(I later added an EXITIFNOT variant)
an ‘until’ loop (not in the target language) would be
LOOPEXITIF condition
{stmt block}
ENDLOOP
a ‘do-while’ loop (execute at least once) would be
LOOP{stmt block}
EXITIFNOT {condition}
ENDLOOP
The reason that I started on n+1/2 loops was because I was working on the PASCAL P-code compiler which used the following constructs in many places (I have forgotten PASCAL syntax but the following is close enough):
do{stmt block 1};
test = {condition};
if test
then
{stmt block 2}
until test
which uses an intermediate value for the termination condition and checks it twice. Using ‘my’ syntax, this would be
LOOP{stmt block 1}
EXITIFNOT {condition}
{stmt block 2}
ENDLOOP
When I started paid work in 1980, I was writing in Fortran IV (another unstructured language), so I wrote a rationaliser for that as well using the same syntax.
July 1st, 2010 at 11:15
OOPS error in my post
should become
cv = startLOOP
EXITIF cv > end
{stmt block}
cv = cv + step
ENDLOOP
July 1st, 2010 at 12:34
Fantastic post. Will be trying this at home today
July 1st, 2010 at 16:58
@Yerko & RDRush,
Just to point out once again – I’m not really proposing adding
untilto Python. This is just an exercise, anduntilis the first thing that sprang to mind.July 1st, 2010 at 18:09
Excellent article!
I especially like how it can be used as a future reference “howto” document for adding statements.
Keep up the good work!
July 1st, 2010 at 22:12
Minimalism is good, but the amount of times I need an until loop, and end up with the
while True:hack is not insignificant. I really wish Python would have an “loop and a half” statement, it’s IMO the major misfeature of Python. And I propose that “until” is an excellent name and syntax for that statement.July 2nd, 2010 at 04:24
@joeuser Most patches to 3.x grammar/bytecode loop will apply cleanly to 2.x. The big changes to Python’s AST (Abstract Syntax Tree) and going fully 64-bit happened for the 2.5 release. Python 3.x is a fork of that. I had to rewrite my class decorator patch for 2.5 and I was finally allowed to check it in for 2.6 (yay!).
Of course if you are dealing with str/unicode the differences between 2.x and 3.x can be large. This doesn’t impact the grammar/bytecode much but it does effect many modules.
July 2nd, 2010 at 14:54
until is ultimately confusing
if you want an inversed do…while inverse the abort condition. not the statement.
July 2nd, 2010 at 19:00
I have just suggested on python-dev that it be put under python.org/dev/ somewhere. I think that’s where it really belongs – I have a larger concept of “docs” than some, I guess – didn’t mean docs.python.org. Sorry. Let’s see what the devs say.
July 2nd, 2010 at 19:10
I was going to mention PEP 306 as well, but Jack beat me to it.
@Lennart: if you can find a solution to PEP 315 that is genuinely superior to the fully general loops already provided by “while True:” you will likely find plenty of supporters. Perceptions differ as to how much of a hack people consider the current solution to be though, so be prepared for an argument if you volunteer to take another crack at it
July 2nd, 2010 at 20:31
Very nice. Don’t forget to update the parser module too, though! (Though it’s not clear if anyone cares about the parser module any more; AST is much nicer to work with.)
July 3rd, 2010 at 12:38
@Nick Coghlan: Well, in my opinion, I *have* found a solution that is genuinely superior. Un “until” statement, which always runs the block once before the test.
If that is not superior than “while True:” then why even have an expression in the while statement? You can replace
while foo == bar:do_something()
with
while True:if foo != bar:
break
do_something()
as well. Is that superior?
March 28th, 2011 at 08:09
Yasher Koach! Very fascinating read.
March 28th, 2011 at 08:13
@Henrique See http://code.activestate.com/recipes/576944-the-goto-decorator/
There a real
gotostatement is emulated in pure Python.March 11th, 2012 at 21:41
Hi Eli,
Thanks for the excellent article. I tried to follow the same procedure to build Python from source on Windows. Things worked out of the box without much changes.
However, I would like to point out a small change.
Inside the body of the following function,
static int
compiler_until(struct compiler *c, stmt_ty s)
the second line is currently
int constant = expr_constant(s->v.Until.test);
it must instead be
int constant = expr_constant(c, s->v.Until.test); # this function needs the reference to the compiler as well
I’ve been following your blog all along and I am always delighted by your efforts to explain even the most intricate technical details.
Thanks a lot for sharing your experiences and Keep up the good work !!
March 12th, 2012 at 07:47
Karthikeyan,
Thanks for the correction.
March 12th, 2012 at 22:07
Cheers for the attribution, Eli! Here’s a link to my original paper on the topic from an earlier Australian OSDC in case folks are interested:
http://tomlee.co/wp-content/uploads/2008/12/python-language-internals.pdf
May 19th, 2012 at 20:49
Hello this is a nice article, im from mexico and i need to translate some of the native statements of python to spanish for a project at school. Im trying this method but fails “SystemError: invalid node 339 for PyAST_FromNode”, i don’t know if is there an easy way to do this, can you please help me??
May 20th, 2012 at 03:14
Heli,
There’s no “easy” way to do this without modifying the Python lexer & parser. However, keep in mind that you probably don’t need to go as deep as this article went. Just change the lexer & parser to recognize “mientras” instead of “while”, but keep all the rest the same (i.e. the node internally can still be “while” for CPython).
July 20th, 2012 at 05:01
Fantastic!!!!! CPython is so cool~
July 31st, 2012 at 09:38
It helps me look at insight of python. It’s very interesting. I hope you can write some articles about how to improve the speed of python, some tips.
January 24th, 2013 at 06:32
Hi Eli,
Excellent article. I’m facing an issue with this implementation though. The ‘until’ statement works fine in the global scope, but it gives an invalid syntax error when put inside a function. Do you know why that happens?
March 15th, 2013 at 21:26
Hey Adithya,
The syntax error when you put this stuff inside a function relates to the (easy-to-overlook!) fact that you also need to make a change to the symtable pass.
See the symtable_visit_* functions in Python/symtable.c — specifically, symtable_visit_stmt & the
While_kindcase clause. Once you track that down, you should be able to figure out what you need to do in symtable.c forUntil_kind.(Or feel free to ping me & I can elaborate, if Eli doesn’t beat me to it.)
Cheers,
Tom