Python internals: adding a new statement to Python

June 30th, 2010 at 7:18 pm

This article is an attempt to better understand how the front-end of Python works. Just reading documentation and source code may be a bit boring, so I’m taking a hands-on approach here: I’m going to add an until statement to Python.

All the coding for this article was done against the cutting-edge Py3k branch in the Python Mercurial repository mirror.

The until statement

Some languages, like Ruby, have an until statement, which is the complement to while (until num == 0 is equivalent to while num != 0). In Ruby, I can write:

num = 3
until num == 0 do
  puts num
  num -= 1
end

And it will print:

3
2
1

So, I want to add a similar capability to Python. That is, being able to write:

num = 3
until num == 0:
  print(num)
  num -= 1

A language-advocacy digression

This article doesn’t attempt to suggest the addition of an until statement to Python. Although I think such a statement would make some code clearer, and this article displays how easy it is to add, I completely respect Python’s philosophy of minimalism. All I’m trying to do here, really, is gain some insight into the inner workings of Python.

Modifying the grammar

Python uses a custom parser generator named pgen. This is a LL(1) parser that converts Python source code into a parse tree. The input to the parser generator is the file Grammar/Grammar [1]. This is a simple text file that specifies the grammar of Python.

Two modifications have to be made to the grammar file. The first is to add a definition for the until statement. I found where the while statement was defined (while_stmt), and added until_stmt below [2]:

compound_stmt: if_stmt | while_stmt | until_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
while_stmt: 'while' test ':' suite ['else' ':' suite]
until_stmt: 'until' test ':' suite

Note that I’ve decided to exclude the else clause from my definition of until, just to make it a little bit different (and because frankly I dislike the else clause of loops and don’t think it fits well with the Zen of Python).

The second change is to modify the rule for compound_stmt to include until_stmt, as you can see in the snippet above. It’s right after while_stmt, again.

When you run make after modifying Grammar/Grammar, notice that the pgen program is run to re-generate Include/graminit.h and Python/graminit.c, and then several files get re-compiled.

Modifying the AST generation code

After the Python parser has created a parse tree, this tree is converted into an AST, since ASTs are much simpler to work with in subsequent stages of the compilation process.

So, we’re going to visit Parser/Python.asdl which defines the structure of Python’s ASTs and add an AST node for our new until statement, again right below the while:

| While(expr test, stmt* body, stmt* orelse)
| Until(expr test, stmt* body)

If you now run make, notice that before compiling a bunch of files, Parser/asdl_c.py is run to generate C code from the AST definition file. This (like Grammar/Grammar) is another example of the Python source-code using a mini-language (in other words, a DSL) to simplify programming. Also note that since Parser/asdl_c.py is a Python script, this is a kind of bootstrapping – to build Python from scratch, Python already has to be available.

While Parser/asdl_c.py generated the code to manage our newly defined AST node (into the files Include/Python-ast.h and Python/Python-ast.c), we still have to write the code that converts a relevant parse-tree node into it by hand. This is done in the file Python/ast.c. There, a function named ast_for_stmt converts parse tree nodes for statements into AST nodes. Again, guided by our old friend while, we jump right into the big switch for handling compound statements and add a clause for until_stmt:

case while_stmt:
    return ast_for_while_stmt(c, ch);
case until_stmt:
    return ast_for_until_stmt(c, ch);

Now we should implement ast_for_until_stmt. Here it is:

static stmt_ty
ast_for_until_stmt(struct compiling *c, const node *n)
{
    /* until_stmt: 'until' test ':' suite */
    REQ(n, until_stmt);

    if (NCH(n) == 4) {
        expr_ty expression;
        asdl_seq *suite_seq;

        expression = ast_for_expr(c, CHILD(n, 1));
        if (!expression)
            return NULL;
        suite_seq = ast_for_suite(c, CHILD(n, 3));
        if (!suite_seq)
            return NULL;
        return Until(expression, suite_seq, LINENO(n), n->n_col_offset, c->c_arena);
    }

    PyErr_Format(PyExc_SystemError,
                 "wrong number of tokens for 'until' statement: %d",
                 NCH(n));
    return NULL;
}

Again, this was coded while closely looking at the equivalent ast_for_while_stmt, with the difference that for until I’ve decided not to support the else clause. As expected, the AST is created recursively, using other AST creating functions like ast_for_expr for the condition expression and ast_for_suite for the body of the until statement. Finally, a new node named Until is returned.

Note that we access the parse-tree node n using some macros like NCH and CHILD. These are worth understanding – their code is in Include/node.h.

Digression: AST composition

I chose to create a new type of AST for the until statement, but actually this isn’t necessary. I could’ve saved some work and implemented the new functionality using composition of existing AST nodes, since:

until condition:
   # do stuff

Is functionally equivalent to:

while not condition:
  # do stuff

Instead of creating the Until node in ast_for_until_stmt, I could have created a Not node with an While node as a child. Since the AST compiler already knows how to handle these nodes, the next steps of the process could be skipped.

Compiling ASTs into bytecode

The next step is compiling the AST into Python bytecode. The compilation has an intermediate result which is a CFG (Control Flow Graph), but since the same code handles it I will ignore this detail for now and leave it for another article.

The code we will look at next is Python/compile.c. Following the lead of while, we find the function compiler_visit_stmt, which is responsible for compiling statements into bytecode. We add a clause for Until:

case While_kind:
    return compiler_while(c, s);
case Until_kind:
    return compiler_until(c, s);

If you wonder what Until_kind is, it’s a constant (actually a value of the _stmt_kind enumeration) automatically generated from the AST definition file into Include/Python-ast.h. Anyway, we call compiler_until which, of course, still doesn’t exist. I’ll get to it an a moment.

If you’re curious like me, you’ll notice that compiler_visit_stmt is peculiar. No amount of grep-ping the source tree reveals where it is called. When this is the case, only one option remains – C macro-fu. Indeed, a short investigation leads us to the VISIT macro defined in Python/compile.c:

#define VISIT(C, TYPE, V) {\
    if (!compiler_visit_ ## TYPE((C), (V))) \
        return 0; \

It’s used to invoke compiler_visit_stmt in compiler_body. Back to our business, however…

As promised, here’s compiler_until:

static int
compiler_until(struct compiler *c, stmt_ty s)
{
    basicblock *loop, *end, *anchor = NULL;
    int constant = expr_constant(s->v.Until.test);

    if (constant == 1) {
        return 1;
    }
    loop = compiler_new_block(c);
    end = compiler_new_block(c);
    if (constant == -1) {
        anchor = compiler_new_block(c);
        if (anchor == NULL)
            return 0;
    }
    if (loop == NULL || end == NULL)
        return 0;

    ADDOP_JREL(c, SETUP_LOOP, end);
    compiler_use_next_block(c, loop);
    if (!compiler_push_fblock(c, LOOP, loop))
        return 0;
    if (constant == -1) {
        VISIT(c, expr, s->v.Until.test);
        ADDOP_JABS(c, POP_JUMP_IF_TRUE, anchor);
    }
    VISIT_SEQ(c, stmt, s->v.Until.body);
    ADDOP_JABS(c, JUMP_ABSOLUTE, loop);

    if (constant == -1) {
        compiler_use_next_block(c, anchor);
        ADDOP(c, POP_BLOCK);
    }
    compiler_pop_fblock(c, LOOP, loop);
    compiler_use_next_block(c, end);

    return 1;
}

I have a confession to make: this code wasn’t written based on a deep understanding of Python bytecode. Like the rest of the article, it was done in imitation of the kin compiler_while function. By reading it carefully, however, keeping in mind that the Python VM is stack-based, and glancing into the documentation of the dis module, which has a list of Python bytecodes with descriptions, it’s possible to understand what’s going on.

That’s it, we’re done… Aren’t we?

After making all the changes and running make, we can run the newly compiled Python and try our new until statement:

>>> until num == 0:
...   print(num)
...   num -= 1
...
3
2
1

Voila, it works! Let’s see the bytecode created for the new statement by using the dis module as follows:

import dis

def myfoo(num):
    until num == 0:
        print(num)
        num -= 1

dis.dis(myfoo)

Here’s the result:

4           0 SETUP_LOOP              36 (to 39)
      >>    3 LOAD_FAST                0 (num)
            6 LOAD_CONST               1 (0)
            9 COMPARE_OP               2 (==)
           12 POP_JUMP_IF_TRUE        38

5          15 LOAD_NAME                0 (print)
           18 LOAD_FAST                0 (num)
           21 CALL_FUNCTION            1
           24 POP_TOP

6          25 LOAD_FAST                0 (num)
           28 LOAD_CONST               2 (1)
           31 INPLACE_SUBTRACT
           32 STORE_FAST               0 (num)
           35 JUMP_ABSOLUTE            3
      >>   38 POP_BLOCK
      >>   39 LOAD_CONST               0 (None)
           42 RETURN_VALUE

The most interesting operation is number 12: if the condition is true, we jump to after the loop. This is correct semantics for until. If the jump isn’t executed, the loop body keeps running until it jumps back to the condition at operation 35.

Feeling good about my change, I then tried running the function (executing myfoo(3)) instead of showing its bytecode. The result was less than encouraging:

Traceback (most recent call last):
  File "zy.py", line 9, in <module>
    myfoo(3)
  File "zy.py", line 5, in myfoo
    print(num)
SystemError: no locals when loading 'print'

Whoa… this can’t be good. So what went wrong?

The case of the missing symbol table

One of the steps the Python compiler performs when compiling the AST is create a symbol table for the code it compiles. The call to PySymtable_Build in PyAST_Compile calls into the symbol table module (Python/symtable.c), which walks the AST in a manner similar to the code generation functions. Having a symbol table for each scope helps the compiler figure out some key information, such as which variables are global and which are local to a scope.

To fix the problem, we have to modify the symtable_visit_stmt function in Python/symtable.c, adding code for handling until statements, after the similar code for while statements [3]:

case While_kind:
    VISIT(st, expr, s->v.While.test);
    VISIT_SEQ(st, stmt, s->v.While.body);
    if (s->v.While.orelse)
        VISIT_SEQ(st, stmt, s->v.While.orelse);
    break;
case Until_kind:
    VISIT(st, expr, s->v.Until.test);
    VISIT_SEQ(st, stmt, s->v.Until.body);
    break;

And now we really are done. Compiling the source after this change makes the execution of myfoo(3) work as expected.

Conclusion

In this article I’ve demonstrated how to add a new statement to Python. Albeit requiring quite a bit of tinkering in the code of the Python compiler, the change wasn’t difficult to implement, because I used a similar and existing statement as a guideline.

The Python compiler is a sophisticated chunk of software, and I don’t claim being an expert in it. However, I am really interested in the internals of Python, and particularly its front-end. Therefore, I found this exercise a very useful companion to theoretical study of the compiler’s principles and source code. It will serve as a base for future articles that will get deeper into the compiler.

References

I used a few excellent references for the construction of this article. Here they are, in no particular order:

  • PEP 339: Design of the CPython compiler – probably the most important and comprehensive piece of official documentation for the Python compiler. Being very short, it painfully displays the scarcity of good documentation of the internals of Python.
  • "Python Compiler Internals" – an article by Thomas Lee
  • "Python: Design and Implementation" – a presentation by Guido van Rossum
  • Python (2.5) Virtual Machine, A guided tour – a presentation by Peter Tröger
http://eli.thegreenplace.net/wp-content/uploads/hline.jpg
[1] From here on, references to files in the Python source are given relatively to the root of the source tree, which is the directory where you run configure and make to build Python.
[2] This demonstrates a common technique I use when modifying source code I’m not familiar with: work by similarity. This principle won’t solve all your problems, but it can definitely ease the process. Since everything that has to be done for while also has to be done for until, it serves as a pretty good guideline.
[3] By the way, without this code there’s a compiler warning for Python/symtable.c. The compiler notices that the Until_kind enumeration value isn’t handled in the switch statement of symtable_visit_stmt and complains. It’s always important to check for compiler warnings!

Related posts:

  1. Python internals: Working with Python ASTs
  2. Adventures in parsing C: ASTs for switch statements
  3. Python internals: Symbol tables, part 1
  4. Python internals: how callables work
  5. Python internals: Symbol tables, part 2

37 Responses to “Python internals: adding a new statement to Python”

  1. Brian K. JonesNo Gravatar Says:

    Fantastic. Well done. I hope more people with experience in these kinds of things will use this post as a template for writing more posts that expose more of the Python internals. Maybe by the end of the year I’ll actually feel confident about becoming a contributor myself!

    Thanks for this.

  2. Steve HoldenNo Gravatar Says:

    This is an excellent piece of work, Eli, and I hope you will consent to its becoming a part of the official Python documentation (along with other similarly worthy things you may write). As Brian has already indicated it’s this kind of low-level practical detail that helps people realise you don’t need to be a “software superman” to help make Python better (and I appreciate your acknowledgment that an “until” statement would be anti-pythonic).

    Now if you could find a suitable syntax for the loop-and-a-half http://www.cs.duke.edu/~ola/patterns/plopd/loops.html#loop-and-a-half that *would* be worth implementing!

  3. elibenNo Gravatar Says:

    @Brian,

    Glad you like it, thanks for the feedback.

    @Steve,

    Of course I consent – any idea where in the documentation such material could be incorporated?

    By the way, that’s a great link on loop patterns – I’m adding it to my to-read list :-)

  4. Jack DiederichNo Gravatar Says:

    PEP 306: How to Change Python’s Grammar
    http://python.org/dev/peps/pep-0306/

    It is usually out of date, but still handy.

  5. André RobergeNo Gravatar Says:

    This is one of the most interesting and informative posts I have read in a long time. Well done.

  6. YerkoNo Gravatar Says:

    This is against the Zen of Python
    “There should be one– and preferably ONLY ONE –obvious way to do it.”
    while not … is ok, and “until” stament is redundant

    ruby sucks, so dont try to copy that xD

    despite that, good work :)

  7. VinayNo Gravatar Says:

    Fantastic !!

  8. joeuserNo Gravatar Says:

    Just a short question, is this specific to Python 3, or does it work with any version ?
    (did they make changes to the scanner/parser/etc – combo ?)

  9. RDRushNo Gravatar Says:

    You are a very sick individual — good job!

    I have been slightly touched by the grammar of Ruby and have always been attracted to Python. This article is an articulation that has been missing in my existence with these language sets for some time. I had actually thought about how excellent a language would be that encapsulated the essence of both and here you go…

    Thanks for your time and effort — you really helped bury a question that I, as well as many others, have had lingering for some time due to our ignorance to these topics.

    Compiler development is a serious endeavor and one you atomized quite efficiently, especially where byte code is concerned.

    Do-while; if-else, OOP/OOD style where ‘until’ is interpreter style giving Python the best of both worlds. Use the interpreted for straight through and climb into objects as needed in sidebar fashion. This type of development is very inspiring…

  10. HenriqueNo Gravatar Says:

    Interesting, not something I would take time to try doing. Gives some insight on how the grammar is defined and how VM works.

    Now, what about you trying do the same (bend the grammar), but this time from Python itself, in runtime. Is that possible? If so, how far can you stretch it? ;)

  11. OzNo Gravatar Says:

    Excellent and interesting article. Thank you.

    Similar article about Common Lisp would be short:

    (defmacro until (condition &body body)
      `(loop until ,condition do (progn ,@body)))

    Or more low-level:

    (defmacro until (condition &body body)
      (let ((start (gensym))
            (end (gensym)))
        `(tagbody
            ,start
            (if ,condition
                (go ,end)
                (progn ,@body (go ,start)))
            ,end)))

    And now there is:

    (until condition
      ;; do stuff
      )
  12. JSC42No Gravatar Says:

    Back in the late 1970s, I wrote an n+1/2 loop construct (c.f. Steve Holden’s link to a loop-and-a-half construct) for a rationalised front-end preprocessor for a BASIC interpretter. In those days, BASICs didn’t have any loops except for FOR. The syntax that I used was:
    LOOP
    {stmt block}
    [ EXITIF {condition}
    {stmt block}
    ] ...
    ENDLOOP

    Note: you can have multiple EXITIF
    This could emulate all common types of loop
    e.g. (using BASIC syntax)
    FOR cv = start TO end STEP step
    {stmt block}
    NEXT

    would become
    cv = start
    LOOP
    EXITIF cv > end
    {stmt block}
    ENDLOOP

    a ‘while’ loop (not in the target language) would be
    LOOP
    EXITIF NOT condition
    {stmt block}
    ENDLOOP

    (I later added an EXITIFNOT variant)

    an ‘until’ loop (not in the target language) would be
    LOOP
    EXITIF condition
    {stmt block}
    ENDLOOP

    a ‘do-while’ loop (execute at least once) would be
    LOOP
    {stmt block}
    EXITIFNOT {condition}
    ENDLOOP

    The reason that I started on n+1/2 loops was because I was working on the PASCAL P-code compiler which used the following constructs in many places (I have forgotten PASCAL syntax but the following is close enough):
    do
    {stmt block 1};
    test = {condition};
    if test
    then
    {stmt block 2}
    until test

    which uses an intermediate value for the termination condition and checks it twice. Using ‘my’ syntax, this would be
    LOOP
    {stmt block 1}
    EXITIFNOT {condition}
    {stmt block 2}
    ENDLOOP

    When I started paid work in 1980, I was writing in Fortran IV (another unstructured language), so I wrote a rationaliser for that as well using the same syntax.

  13. JSC42No Gravatar Says:

    OOPS error in my post

    FOR cv = start TO end STEP step
    {stmt block}
    NEXT

    should become
    cv = start
    LOOP
    EXITIF cv > end
    {stmt block}
    cv = cv + step
    ENDLOOP

  14. Tiklu GangulyNo Gravatar Says:

    Fantastic post. Will be trying this at home today :)

  15. elibenNo Gravatar Says:

    @Yerko & RDRush,

    Just to point out once again – I’m not really proposing adding until to Python. This is just an exercise, and until is the first thing that sprang to mind.

  16. lorgNo Gravatar Says:

    Excellent article!
    I especially like how it can be used as a future reference “howto” document for adding statements.

    Keep up the good work!

  17. Lennart RegebroNo Gravatar Says:

    Minimalism is good, but the amount of times I need an until loop, and end up with the while True: hack is not insignificant. I really wish Python would have an “loop and a half” statement, it’s IMO the major misfeature of Python. And I propose that “until” is an excellent name and syntax for that statement. :-)

  18. Jack DiederichNo Gravatar Says:

    @joeuser Most patches to 3.x grammar/bytecode loop will apply cleanly to 2.x. The big changes to Python’s AST (Abstract Syntax Tree) and going fully 64-bit happened for the 2.5 release. Python 3.x is a fork of that. I had to rewrite my class decorator patch for 2.5 and I was finally allowed to check it in for 2.6 (yay!).

    Of course if you are dealing with str/unicode the differences between 2.x and 3.x can be large. This doesn’t impact the grammar/bytecode much but it does effect many modules.

  19. namerequiredNo Gravatar Says:

    until is ultimately confusing

    if you want an inversed do…while inverse the abort condition. not the statement.

  20. Steve HoldenNo Gravatar Says:

    I have just suggested on python-dev that it be put under python.org/dev/ somewhere. I think that’s where it really belongs – I have a larger concept of “docs” than some, I guess – didn’t mean docs.python.org. Sorry. Let’s see what the devs say.

  21. Nick CoghlanNo Gravatar Says:

    I was going to mention PEP 306 as well, but Jack beat me to it.

    @Lennart: if you can find a solution to PEP 315 that is genuinely superior to the fully general loops already provided by “while True:” you will likely find plenty of supporters. Perceptions differ as to how much of a hack people consider the current solution to be though, so be prepared for an argument if you volunteer to take another crack at it :)

  22. Mark DickinsonNo Gravatar Says:

    Very nice. Don’t forget to update the parser module too, though! (Though it’s not clear if anyone cares about the parser module any more; AST is much nicer to work with.)

  23. Lennart RegebroNo Gravatar Says:

    @Nick Coghlan: Well, in my opinion, I *have* found a solution that is genuinely superior. Un “until” statement, which always runs the block once before the test. :) If that is not superior than “while True:” then why even have an expression in the while statement? You can replace
    while foo == bar:
    do_something()

    with
    while True:
    if foo != bar:
    break
    do_something()

    as well. Is that superior?

  24. Abafei SoftwareNo Gravatar Says:

    Yasher Koach! Very fascinating read.

  25. Abafei SoftwareNo Gravatar Says:

    @Henrique See http://code.activestate.com/recipes/576944-the-goto-decorator/
    There a real goto statement is emulated in pure Python.

  26. KarthikeyanNo Gravatar Says:

    Hi Eli,

    Thanks for the excellent article. I tried to follow the same procedure to build Python from source on Windows. Things worked out of the box without much changes.

    However, I would like to point out a small change.

    Inside the body of the following function,

    static int
    compiler_until(struct compiler *c, stmt_ty s)

    the second line is currently
    int constant = expr_constant(s->v.Until.test);

    it must instead be
    int constant = expr_constant(c, s->v.Until.test); # this function needs the reference to the compiler as well

    I’ve been following your blog all along and I am always delighted by your efforts to explain even the most intricate technical details.

    Thanks a lot for sharing your experiences and Keep up the good work !!

  27. elibenNo Gravatar Says:

    Karthikeyan,

    Thanks for the correction.

  28. Tom LeeNo Gravatar Says:

    Cheers for the attribution, Eli! Here’s a link to my original paper on the topic from an earlier Australian OSDC in case folks are interested:

    http://tomlee.co/wp-content/uploads/2008/12/python-language-internals.pdf

  29. HeliNo Gravatar Says:

    Hello this is a nice article, im from mexico and i need to translate some of the native statements of python to spanish for a project at school. Im trying this method but fails “SystemError: invalid node 339 for PyAST_FromNode”, i don’t know if is there an easy way to do this, can you please help me??

  30. elibenNo Gravatar Says:

    Heli,

    There’s no “easy” way to do this without modifying the Python lexer & parser. However, keep in mind that you probably don’t need to go as deep as this article went. Just change the lexer & parser to recognize “mientras” instead of “while”, but keep all the rest the same (i.e. the node internally can still be “while” for CPython).

  31. ETNo Gravatar Says:

    Fantastic!!!!! CPython is so cool~

  32. baiyangNo Gravatar Says:

    It helps me look at insight of python. It’s very interesting. I hope you can write some articles about how to improve the speed of python, some tips.

  33. AdithyaNo Gravatar Says:

    Hi Eli,

    Excellent article. I’m facing an issue with this implementation though. The ‘until’ statement works fine in the global scope, but it gives an invalid syntax error when put inside a function. Do you know why that happens?

  34. Tom LeeNo Gravatar Says:

    Hey Adithya,

    The syntax error when you put this stuff inside a function relates to the (easy-to-overlook!) fact that you also need to make a change to the symtable pass.

    See the symtable_visit_* functions in Python/symtable.c — specifically, symtable_visit_stmt & the While_kind case clause. Once you track that down, you should be able to figure out what you need to do in symtable.c for Until_kind. :)

    (Or feel free to ping me & I can elaborate, if Eli doesn’t beat me to it.)

    Cheers,
    Tom

  35. skateboyNo Gravatar Says:

    Very interesting article.
    However,I have tried several times “make” with

    ①”default” branch(which is 3k and committed at 16/09/2013) changeset: 85732:6b747ad4a99a
    ②”2.7″ branchchangeset: 85733:3d46ef0c62c5

    but “NameError: name ‘until’ is not defined” appears every time.
    Am I missing anything?

    I intentionally change Grammer/Grammer to exclude “else” from while(like while_stmt: ‘while’ test ‘:’ suite) but I still could use while ~~ else ~~ after “make”.However,if I convert “while” and “until”,then “make” won’t make it.

  36. elibenNo Gravatar Says:

    @skateboy,

    Note what section “A language-advocacy digression” says – I didn’t really add this to upstream Python; these changes were just made locally for the sake of the article.

  37. skateboyNo Gravatar Says:

    @eliben,
    No,I totally understand your concept of this unique article.

    I meant that I tried to follow your article and added sentences in Grammer/Grammer,Python/ast.c..etc accordingly,and “make”,and executed generated “./python”,but the interactive console said that “‘until’ is not defined”.

    Maybe I have to do some more works in newer python’s?
    But referring to the official doc,it seems that above procedure is enough for adding statement as it does not change the token structures. http://docs.python.org/devguide/grammar.html

Leave a Reply

To post code with preserved formatting, enclose it in `backticks` (even multiple lines)