Building and using llvmlite - a basic example



A while ago I wrote a short post about employing llvmpy to invoke LLVM from within Python. Today I want to demonstrate an alternative technique, using a new library called llvmlite. llvmlite was created last year by the developers of Numba (a JIT compiler for scientific Python), and just recently replaced llvmpy as their bridge to LLVM. Since the Numba devs were also the most active maintainers of llvmpy in the past couple of years, I think llvmlite is definitily worth paying attention to.

One of the reasons the Numba devs decided to ditch llvmpy in favor of a new approach is the biggest issue heavy users of LLVM as a library have - its incredible rate of API-breaking change. The LLVM C++ API is notoriously unstable and will remain this way for the foreseeable future. This leaves library users and all kinds of language bindings (like llvmpy) in a constant chase after the latest LLVM release, if they want to benefit from improved optimizations, new targets and so on. The Numba developers felt this while maintaining llvmpy and decided on an alternative approach that will be easier to keep stable going forward. In addition, llvmpy's architecture made it slow for Numba's users - llvmlite fixes this as well.

The main idea is - use the LLVM C API as much as possible. Unlike the core C++ API, the C API is meant for facing external users, and is kept relatively stable. This is what llvmlite does, but with one twist. Since building the IR using repeated FFI calls to LLVM proved to be slow and error-prone in llvmpy, llvmlite re-implemented the IR builder in pure Python. Once the IR is built, its textual representation is passed into the LLVM IR parser. This also reduces the "API surface" llvmlite uses, since the textual representation of LLVM IR is one of the more stable things in LLVM.

I found llvmlite pretty easy to build and use on Linux (though it's portable to OS X and Windows as well). Since there's not much documentation yet, I thought this post may be useful for others who wish to get started.

After cloning the llvmlite repo, I downloaded the binary release of LLVM 3.5 - pre-built binaries for Ubuntu 14.04 mean there's no need to compile LLVM itself. Note that I didn't install LLVM, just downloaded and untarred it.

Next, I had to install the libedit-dev package with apt-get, since it's required while building llvmlite. Depending on what you have lying around on your machine, you may need to install some additional -dev packages.

Now, time to build llvmlite. I chose to use Python 3.4, but any modern version should work (for versions below 3.4 llvmlite currently requires the enum34 package). LLVM has a great tool named llvm-config in its binary image, and the Makefile in llvmlite uses it, which means building llvmlite with any version of LLVM I want is just a simple matter of running:

$ LLVM_CONFIG=<path/to/llvm-config> python3.4 setup.py build

This compiles the C/C++ parts of llvmlite and links them statically to LLVM. Now, you're ready to use llvmlite. Again, I prefer not to install things unless I really have to, so the following script can be run with:

$ PYTHONPATH=$PYTHONPATH:<path/to/llvmlite> python3.4 basic_sum.py

Replace the path with your own, or just install llvmlite into some virtualenv.

And the sample code does the same as the previous post - creates a function that adds two numbers, and JITs it:

from ctypes import CFUNCTYPE, c_int
import sys

import llvmlite.ir as ll
import llvmlite.binding as llvm

llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()

# Create a new module with a function implementing this:
#
# int sum(int a, int b) {
#   return a + b;
# }
module = ll.Module()
func_ty = ll.FunctionType(ll.IntType(32), [ll.IntType(32), ll.IntType(32)])
func = ll.Function(module, func_ty, name='sum')

func.args[0].name = 'a'
func.args[1].name = 'b'

bb_entry = func.append_basic_block('entry')
irbuilder = ll.IRBuilder(bb_entry)
tmp = irbuilder.add(func.args[0], func.args[1])
ret = irbuilder.ret(tmp)

print('=== LLVM IR')
print(module)

# Convert textual LLVM IR into in-memory representation.
llvm_module = llvm.parse_assembly(str(module))

tm = llvm.Target.from_default_triple().create_target_machine()

# Compile the module to machine code using MCJIT
with llvm.create_mcjit_compiler(llvm_module, tm) as ee:
    ee.finalize_object()
    print('=== Assembly')
    print(tm.emit_assembly(llvm_module))

    # Obtain a pointer to the compiled 'sum' - it's the address of its JITed
    # code in memory.
    cfptr = ee.get_pointer_to_function(llvm_module.get_function('sum'))

    # To convert an address to an actual callable thing we have to use
    # CFUNCTYPE, and specify the arguments & return type.
    cfunc = CFUNCTYPE(c_int, c_int, c_int)(cfptr)

    # Now 'cfunc' is an actual callable we can invoke
    res = cfunc(17, 42)
    print('The result is', res)

This should print the LLVM IR for the function we built, its assembly as produced by LLVM's JIT compiler, and the result 59.

Compared to llvmpy, llvmlite now seems like the future, mostly due to the maintenance situation. llvmpy is only known to work with LLVM up to 3.3, which is already a year and half old by now. Having just been kicked out of Numba, there's a good chance it will fall further behind. llvmlite, on the other hand, is very actively developed and keeps track with the latest stable LLVM release. Also, it's architectured in a way that should make it significantly easier to keep up with LLVM in the future. Unfortunately, as far as uses outside of Numba go, llvmlite is still rough around the edges, especially w.r.t. documentation and examples. But the llvmlite developers appear keen on making it useful in a more general setting and not just for Numba, so that's a good sign.


The scope of index variables in Python's for loops



I'll start with a quiz. What does this function do?

def foo(lst):
    a = 0
    for i in lst:
        a += i
    b = 1
    for t in lst:
        b *= i
    return a, b

If you think "computes the sum and product of the items in lst", don't feel too bad about yourself. The bug here is often tricky to spot. If you did see it, well done - but buried in mountains of real code, and when you don't know it's a quiz, discovering the bug is significantly more difficult.

The bug here is due to using i instead of t in the body of the second for loop. But wait, how does this even work? Shouldn't i be invisible outside of the first loop? [1] Well, no. In fact, Python formally acknowledges that the names defined as for loop targets (a more formally rigorous name for "index variables") leak into the enclosing function scope. So this:

for i in [1, 2, 3]:
    pass
print(i)

Is valid and prints 3, by design. In this writeup I want to explore why this is so, why it's unlikely to change, and also use it as a tracer bullet to dig into some interesting parts of the CPython compiler.

And by the way, if you're not convinced this behavior can cause real problems, consider this snippet:

def foo():
    lst = []
    for i in range(4):
        lst.append(lambda: i)
    print([f() for f in lst])

If you'd expect this to print [0, 1, 2, 3], no such luck. This code will, instead, emit [3, 3, 3, 3], because there's just a single i in the scope of foo, and this is what all the lambdas capture.

The official word

The Python reference documentation explicitly documents this behavior in the section on for loops:

The for-loop makes assignments to the variables(s) in the target list. [...] Names in the target list are not deleted when the loop is finished, but if the sequence is empty, they will not have been assigned to at all by the loop.

Note the last sentence - let's try it:

for i in []:
    pass
print(i)

Indeed, a NameError is raised. Later on, we'll see that this is a natural outcome of the way the Python VM executes its bytecode.

Why this is so

I actually asked Guido van Rossum about this behavior and he was gracious enough to reply with some historical background (thanks Guido!). The motivation is keeping Python's simple approach to names and scopes without resorting to hacks (such as deleting all the values defined in the loop after it's done - think about the complications with exceptions, etc.) or more complex scoping rules.

In Python, the scoping rules are fairly simple and elegant: a block is either a module, a function body or a class body. Within a function body, names are visible from the point of their definition to the end of the block (including nested blocks such as nested functions). That's for local names, of course; global names (and other nonlocal names) have slightly different rules, but that's not pertinent to our discussion.

The important point here is: the innermost possible scope is a function body. Not a for loop body. Not a with block body. Python does not have nested lexical scopes below the level of a function, unlike some other languages (C and its progeny, for example).

So if you just go about implementing Python, this behavior is what you'll likely to end with. Here's another enlightening snippet:

for i in range(4):
    d = i * 2
print(d)

Would it surprise you to find out that d is visible and accessible after the for loop is finished? No, this is just the way Python works. So why would the index variable be treated any differently?

By the way, the index variables of list comprehensions are also leaked to the enclosing scope. Or, to be precise, were leaked, before Python 3 came along.

Python 3 fixed the leakage from list comprehensions, along with other breaking changes. Make no mistake, changing such behavior is a major breakage in backwards compatibility. This is why I think the current behavior stuck and won't be changed.

Moreover, many folks still find this a useful feature of Python. Consider:

for i, item in enumerate(somegenerator()):
    dostuffwith(i, item)
print('The loop executed {0} times!'.format(i+1))

If you have no idea how many items somegenerator actually returned, this is a pretty succinct way to know. Otherwise you'd have to keep a separate counter.

Here's another example:

for i in somegenerator():
    if isinteresing(i):
        break
dostuffwith(i)

Which is a useful pattern for finding things in a loop and using them afterwards [2].

There are other uses people came up with over the years that justify keeping this behavior in place. It's hard enough to instill breaking changes for features the core developers deem detrimental and harmful. When the feature is argued by many to be useful, and moreover is used in a huge bunch of code in the real world, the chances of removing it are zero.

Under the hood

Now the fun part. Let's see how the Python compiler and VM conspire to make this behavior possible. In this particular case, I think the most lucid way to present things is going backwards from the bytecode. I hope this may also serve as an interesting example on how to go about digging in Python's internals [3] in order to find stuff out (it's so much fun, seriously!)

Let's take a part of the function presented at the start of this article and disassemble it:

def foo(lst):
    a = 0
    for i in lst:
        a += i
    return a

The resulting bytecode is:

 0 LOAD_CONST               1 (0)
 3 STORE_FAST               1 (a)

 6 SETUP_LOOP              24 (to 33)
 9 LOAD_FAST                0 (lst)
12 GET_ITER
13 FOR_ITER                16 (to 32)
16 STORE_FAST               2 (i)

19 LOAD_FAST                1 (a)
22 LOAD_FAST                2 (i)
25 INPLACE_ADD
26 STORE_FAST               1 (a)
29 JUMP_ABSOLUTE           13
32 POP_BLOCK

33 LOAD_FAST                1 (a)
36 RETURN_VALUE

As a reminder, LOAD_FAST and STORE_FAST are the opcodes Python uses to access names that are only used within a function. Since the Python compiler knows statically (at compile-time) how many such names exist in each function, they can be accessed with static array offsets as opposed to a hash table, which makes access significanly faster (hence the _FAST suffix). But I digress. What's really important here is that a and i are treated identically. They are both fetched with LOAD_FAST and modified with STORE_FAST. There is absolutely no reason to assume that their visibility is in any way different [4].

So how did this come to be? Somehow, the compiler figured that i is just another local name within foo. This logic lives in the symbol table code, when the compiler walks over the AST to create a control-flow graph from which bytecode is later emitted; there are more details about this process in my article about symbol tables - so I'll just stick to the essentials here.

The symtable code doesn't treat for statements very specially. In symtable_visit_stmt we have:

case For_kind:
    VISIT(st, expr, s->v.For.target);
    VISIT(st, expr, s->v.For.iter);
    VISIT_SEQ(st, stmt, s->v.For.body);
    if (s->v.For.orelse)
        VISIT_SEQ(st, stmt, s->v.For.orelse);
    break;

The loop target is visited as any other expression. Since this code visits the AST, it's worthwhile to dump it to see how the node for the for statement looks:

For(target=Name(id='i', ctx=Store()),
    iter=Name(id='lst', ctx=Load()),
    body=[AugAssign(target=Name(id='a', ctx=Store()),
                    op=Add(),
                    value=Name(id='i', ctx=Load()))],
    orelse=[])

So i lives in a Name node. These are handled in the symbol table code by the following clause in symtable_visit_expr:

case Name_kind:
    if (!symtable_add_def(st, e->v.Name.id,
                          e->v.Name.ctx == Load ? USE : DEF_LOCAL))
        VISIT_QUIT(st, 0);
    /* ... */

Since the name i is clearly tagged with DEF_LOCAL (because of the *_FAST opcodes emitted to access it, but this is also easy to observe if the symbol table is dumped using the symtable module), the code above evidently calls symtable_add_def with DEF_LOCAL as the third argument. This is the right time to glance at the AST above and notice the ctx=Store part of the Name node of i. So it's the AST that already comes in carrying the information that i is stored to in the target part of the For node. Let's see how that comes to be.

The AST-building part of the compiler goes over the parse tree (which is a fairly low-level hierarchical representation of the source code - some background is available here) and, among other things, sets the expr_context attributes on some nodes, most notably Name nodes. Think about it this way, in the following statement:

foo = bar + 1

Both foo and bar are going to end up in Name nodes. But while bar is only being loaded from, foo is actually being stored into in this code. The expr_context attribute is used to distinguish between uses for later consumption by the symbol table code [5].

Back to our for loop targets, though. These are handled in the function that creates an AST for for statements - ast_for_for_stmt. Here are the relevant parts of this function:

static stmt_ty
ast_for_for_stmt(struct compiling *c, const node *n)
{
    asdl_seq *_target, *seq = NULL, *suite_seq;
    expr_ty expression;
    expr_ty target, first;

    /* ... */

    node_target = CHILD(n, 1);
    _target = ast_for_exprlist(c, node_target, Store);
    if (!_target)
        return NULL;
    /* Check the # of children rather than the length of _target, since
       for x, in ... has 1 element in _target, but still requires a Tuple. */
    first = (expr_ty)asdl_seq_GET(_target, 0);
    if (NCH(node_target) == 1)
        target = first;
    else
        target = Tuple(_target, Store, first->lineno, first->col_offset, c->c_arena);

    /* ... */

    return For(target, expression, suite_seq, seq, LINENO(n), n->n_col_offset,
               c->c_arena);
}

The Store context is created in the call to ast_for_exprlist, which creates the node for the target (recall the the for loop target may be a sequence of names for tuple unpacking, not just a single name).

This function is probably the most important part in the process of explaining why for loop targets are treated similarly to other names within the loop. After this tagging happens in the AST, the code for handling such names in the symbol table and VM is no different from other names.

Wrapping up

This article discusses a particular behavior of Python that may be considered a "gotcha" by some. I hope the article does a decent job of explaining how this behavior flows naturally from the naming and scoping semantics of Python, why it can be useful and hence is unlikely to ever change, and how the internals of the Python compiler make it work under the hood. Thanks for reading!


[1]Here I'm tempted to make a Microsoft Visual C++ 6 joke, but the fact that most readers of this blog in 2015 won't get it is somewhat disturbing (because it reflects my age, not the abilities of my readers).
[2]You could argue that dowithstuff(i) could go into the if right before the break here. But this isn't always convenient. Besides, according to Guido there's a nice separation of concerns here - the loop is used for searching, and only that. What happens with the value after the search is done is not the loop's concern. I think this is a very good point.
[3]As usual for my articles on Python's internals, this is about Python 3. Specifically, I'm looking at the default branch of the Python repository, where work on the next release (3.5) is being done. But for this particular topic, the source code of any release in the 3.x series should do.
[4]Another thing clear from the disassembly is why i remains invisible if the loop doesn't execute. The GET_ITER and FOR_ITER pair of opcodes treat the thing we loop over as an iterator and then call its __next__ method. If that call ends up raising StopIteration, the VM catches it and exits the loop. Only if an actual value is returned does the VM proceed to execute STORE_FAST to i, thus bringing it into existence for subsequent code to refer to.
[5]It's a curious design, which I suspect stems from the desire for relatively clean recursive visitation code in AST consumers such as the symbol table code and CFG generation.

Another look at my programming language arsenal



Around eight years ago, I wrote a post in this blog called My current programming language arsenal, in which I summarized my main toolbox of programming languages at the time, and what I tend to use those languages for. Recently I felt like giving it another go, since so much time has passed, and quite a lot has changed. So in this post I will discuss the current languages I tend to use, as well as review the fate of the languages discussed in that old post.

But a bit of background first. Back in 2006 I wasn't doing a whole lot of programming in my day job. Those were my hardware/embedded engineering days, and writing software was perhaps 20-30% of my role. I was hacking quite a bit on pet projects, though. Since 2010 or so I switched back to a much more software oriented position and these days my day job involves more coding; on the other hand, I now have less time for personal projects.

Eight years is a long time, and my views in general have shifted somewhat; I now feel much less strongly about programming languages (and other "religious" issues in general). I value pragmatism more than shiny new features, and I believe that factors outside the language itself, such as availability of high-quality third-party libraries and a vibrant community matter a lot. I also prefer languages and idioms that make code easier to read, rather than easier to write.

Anyway, let's get going. First I'll start with the list of languages I'm actually using these days. The list is quite short - the following languages cover 99% of the code I've written in the past few years.

C and C++

I still combine these two under a single heading - it just feels more natural to me. Since I no longer write firmware for embedded controllers, the amount of pure-C projects I've touched recently is close to zero. Most of the code I write at work is C++, and if I need low level and high performance for some part of a pet project, I pick C++ as well; some parts occasionally remain in C, such as APIs for easy embedding, bits and pieces of headers and ABIs, and all those things that come as a result of C being the official systems programming glue language. But the bulk of the code is C++.

There's actually just one pure-C project I've been involved with in the last few years, and that's the core of CPython. It's not your uncle's C, though; when you hack on CPython (or implement pure C extensions to Python) you actually write somewhat object-oriented code with manual reference counting.

My feelings about C++ are still mixed, but less so than in the past. As I said above, age and experience have brought a certain pragmatism with them :-) This really deserves a separate post, though.

Another thing I wanted to add is that, in general, I really like the additions that C++11 brought; moreover, C++ seems to be more actively developed than ever before, and future standards promise even larger positive changes. With the ascent of Clang, tooling has also greatly improved. Sure, some complexity is inevitable (perfect forwarding and universal references are not for the faint of heart). But all in all, C++11 is less burdensome and more pleasant to work with than older versions. Things like smart pointers finally being part of the language core a big deal, IMHO, because it forces programmers to standardize on a convention, rather than invent a plethora of ad-hoc solutions other programmers have to decipher.

Python

I listed C and C++ first because that's what pays the bills; but without a trace of doubt, Python is my favorite language, and the tool I'm most likely to reach for when doing something for fun. Python serves a lot of roles in my arsenal. It's a prototyping tool, a scripting language (my rule of thumb is to convert any Bash script that grows longer than 5-6 lines, or grows control flow, into Python), a server-side programming language, a language for numeric and scientific explorations, and much more.

I'm a Python core developer (though I admit I used to be more active in the past - I'm still planning that comeback), and consider myself a diehard Python afficionado in general. Whenever I want to implement something, Python is my first choice and I really try to stick to it unless there's a good reason not to.

That said, I don't think Python is perfect and is the right tool for every job. Obviously there's the performance issue. But also, recently I became intrigued by claims made by programmers I respect about the difficulty to write and maintain very large projects in Python, because of its flexibility and dynamism. I'm not expressing a strong opinion here for now, but it's definitely a topic I keep an eye for.

Javascript

A pragmatic, professional engineer will know when to keep his feelings out of rational decisions. This is very relevant for my "relationship" with Javascript. Do I like the language? No. Do I think it's a well-designed language? Lulz, no.

Do I think it's amazing that there is some language that's so universally standardized and supported for in-browser programming? Do I think it's amazing that any kid these days can pick up any computer with a browser and a text editor and write real programs, even complete games, without installing anything else? Do I think it's amazing that a web application can replace a GUI these days, leading to standardized interfaces based on open platforms, rather than proprietary GUI mega-frameworks? Yes, yes, yes!

I don't get to write much Javascript; close to none at work, a bit for personal projects here and there. But I do like the web. I do think web apps are superior to GUIs (even when just running locally). And I do realize that Javascript is the standardized language to do this, so that's what I use. I don't subscibe to the "compile-my-awesome-language-to-JS" bandwagon; these are fads that come and go. Attempting to dilute this space with a multitude of language kills one of its charms, IMHO. Javascript itself is a good-enough language to get the job done, and there's a stupendous amount of documentation, libraries and community activity around it online. If you have a problem in JS, someone, somewhere has already solved it and documented it.

The language has its warts, for sure; but there's some hope at the end of the tunnel. ECMA standardization seems intent on improving the language in the future, and with backing from companies like Google and Mozilla, we may see improvements sooner rather than later. It seems like JS is not bound by the chains of old Microsoft browsers anymore, which adds to its appeal. Who knows, Javascript may become a likable language one day.

What happened to Perl, Ruby and Lisp

It's curious that three of the four languages featured prominently in the initial post are no longer to be seen. Python is the main culprit here; let me explain.

When I wrote that post, it was at a point where I started getting disheartened with Perl, mainly because of its obfuscability. I started "flirting" with Ruby, and Lisp was constantly in the background as the "perfect, maybe some day" language. But Ruby turned out to be disappointing. The languange never seemed to really catch on outside the Rails community, and it showed. Once I saw Python and played with it a bit, Perl's and Ruby's fate was sealed very quickly. From Python I got the best of both worlds - a clean, powerful language on one hand, and a great community with support for all kinds of programming on the other. This turned out to be one of the best decisions I made, since Python exploded in popularity in the last few years and started capturing even more niches, such as scientific computing.

What about Lisp? This is somewhat more interesting. Lisp has always served the role of the "ideal language in the sky" for me - something I'm perpetually waiting for to become feasible for real-world programming. I was really enjoying myself with Lisp, but also not happy about the dearth of libraries and community. Well, Python made that go away too, because I also really enjoy writing Python.

There's really just one significant thing Lisp has and Python doesn't - macros. While I think macros are immensely cool, with time I also started questioning how features like macros affect programs in the long run. With experience and age came the realization that code readability, both for yourself and for other programmers, may be way more important than the most succinct way to write a program. I won't say I have a final opinion on this, just that I don't miss macros all that much when coding in Python.

None of this is to say I lost my affection for Lisp, of course, just that I see even less chance of using it in a real project in the future.

Looking into the future

So I've discussed the present and the past, but what about the future? What languages and technologies am I currently keeping my eye on and/or plan learning?

There's a couple. First, Go. I was long disappointed by not seeing any language designed for concurrent and parallel programming in the mainstream. Most popular languages have libraries for these things, but I'm talking about core in-language support here. I was thinking of starting to learn Erlang, but then luckily Go came along. I spent a few days tinkering with the language and I enjoy it so far. It has its warts, but overall I like its design. Especially the built-in concurrency primitives. Now I just have to find an interesting pet project to put it to good use...

Another is Rust. I haven't really looked into it in any amount of depth, but I'm definitely keeping an eye here. Rust is super ambitious because it challenges the king of the hill - C++. At this time it's still way too early to judge, and the language isn't even finished yet, but I'll be following with interest. Can a new language unsettle C++ from its throne of the system programming kingdom? Time will tell.