Update (2016-03-20): Starting with LLVM 3.7, the instructions shown here for installing llvmlite from source may not work. See the main page of my PyKaleidoscope repository for up-to-date details.

A while ago I wrote a short post about employing llvmpy to invoke LLVM from within Python. Today I want to demonstrate an alternative technique, using a new library called llvmlite. llvmlite was created last year by the developers of Numba (a JIT compiler for scientific Python), and just recently replaced llvmpy as their bridge to LLVM. Since the Numba devs were also the most active maintainers of llvmpy in the past couple of years, I think llvmlite is definitely worth paying attention to.

One of the reasons the Numba devs decided to ditch llvmpy in favor of a new approach is the biggest problem facing heavy users of LLVM as a library: its incredible rate of API-breaking change. The LLVM C++ API is notoriously unstable and will remain this way for the foreseeable future. This leaves library users and all kinds of language bindings (like llvmpy) in a constant chase after the latest LLVM release, if they want to benefit from improved optimizations, new targets and so on. The Numba developers felt this while maintaining llvmpy and decided on an alternative approach that will be easier to keep stable going forward. In addition, llvmpy's architecture made it slow for Numba's users; llvmlite fixes this as well.

The main idea is - use the LLVM C API as much as possible. Unlike the core C++ API, the C API is meant for facing external users, and is kept relatively stable. This is what llvmlite does, but with one twist. Since building the IR using repeated FFI calls to LLVM proved to be slow and error-prone in llvmpy, llvmlite re-implemented the IR builder in pure Python. Once the IR is built, its textual representation is passed into the LLVM IR parser. This also reduces the "API surface" llvmlite uses, since the textual representation of LLVM IR is one of the more stable things in LLVM.
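To make the "build in Python, hand text to LLVM" idea concrete, here is a toy sketch in plain Python. It is not llvmlite's implementation: ToyFunction and its methods are hypothetical names invented for this illustration. It only shows the general shape of the approach, where instructions are recorded as Python objects or strings and then serialized into textual LLVM IR in one go.

```python
class ToyFunction:
    """Hypothetical stand-in for an IR function object (not llvmlite's class)."""

    def __init__(self, name, ret_type, params):
        self.name = name
        self.ret_type = ret_type
        self.params = params        # list of (type, name) pairs
        self.instructions = []      # textual instructions, in emission order

    def add(self, type_, lhs, rhs, result):
        # Record an integer 'add' and return the name of the result value.
        self.instructions.append(f"%{result} = add {type_} %{lhs}, %{rhs}")
        return result

    def ret(self, type_, value):
        # Record a 'ret' of a previously named value.
        self.instructions.append(f"ret {type_} %{value}")

    def __str__(self):
        # Serialize everything recorded so far into textual LLVM IR.
        params = ", ".join(f"{t} %{n}" for t, n in self.params)
        body = "\n".join(f"  {ins}" for ins in self.instructions)
        return (f"define {self.ret_type} @{self.name}({params}) {{\n"
                f"entry:\n{body}\n}}\n")


func = ToyFunction("sum", "i32", [("i32", "a"), ("i32", "b")])
tmp = func.add("i32", "a", "b", result="tmp")
func.ret("i32", tmp)

ir_text = str(func)
print(ir_text)
```

No FFI call happens while the function is being assembled; the only thing that would cross into LLVM is the final string, which is the property the paragraph above describes.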

I found llvmlite pretty easy to build and use on Linux (though it's portable to OS X and Windows as well). Since there's not much documentation yet, I thought this post might be useful for others who wish to get started.

After cloning the llvmlite repo, I downloaded the binary release of LLVM 3.5 - pre-built binaries for Ubuntu 14.04 mean there's no need to compile LLVM itself. Note that I didn't install LLVM, just downloaded and untarred it.

Next, I had to install the libedit-dev package with apt-get, since it's required while building llvmlite. Depending on what you have lying around on your machine, you may need to install some additional -dev packages.

Now, time to build llvmlite. I chose to use Python 3.4, but any modern version should work (for versions below 3.4 llvmlite currently requires the enum34 package). LLVM has a great tool named llvm-config in its binary image, and the Makefile in llvmlite uses it, which means building llvmlite with any version of LLVM I want is just a simple matter of running:

$ LLVM_CONFIG=<path/to/llvm-config> python3.4 setup.py build

This compiles the C/C++ parts of llvmlite and links them statically to LLVM. Now, you're ready to use llvmlite. Again, I prefer not to install things unless I really have to, so the following script can be run with:

$ PYTHONPATH=$PYTHONPATH:<path/to/llvmlite> python3.4 basic_sum.py

Replace the path with your own, or just install llvmlite into some virtualenv.

And the sample code does the same as the previous post - creates a function that adds two numbers, and JITs it:

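from ctypes import CFUNCTYPE, c_int

import llvmlite.ir as ll
import llvmlite.binding as llvm

# Register the native target and its assembly printer before asking LLVM for
# a target machine. These calls are needed by the llvmlite version used in
# this post; newer llvmlite releases may handle this step automatically.
llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()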


# Create a new module with a function implementing this:
# int sum(int a, int b) {
#   return a + b;
# }
module = ll.Module()
func_ty = ll.FunctionType(ll.IntType(32), [ll.IntType(32), ll.IntType(32)])
func = ll.Function(module, func_ty, name='sum')

func.args[0].name = 'a'
func.args[1].name = 'b'

bb_entry = func.append_basic_block('entry')
irbuilder = ll.IRBuilder(bb_entry)
tmp = irbuilder.add(func.args[0], func.args[1])
ret = irbuilder.ret(tmp)

print('=== LLVM IR')
print(module)

# Convert textual LLVM IR into in-memory representation.
llvm_module = llvm.parse_assembly(str(module))

tm = llvm.Target.from_default_triple().create_target_machine()

# Compile the module to machine code using MCJIT
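with llvm.create_mcjit_compiler(llvm_module, tm) as ee:
    # Make sure the module's code is actually generated before we use it.
    ee.finalize_object()
    print('=== Assembly')
    print(tm.emit_assembly(llvm_module))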

    # Obtain a pointer to the compiled 'sum' - it's the address of its JITed
    # code in memory.
    cfptr = ee.get_pointer_to_function(llvm_module.get_function('sum'))

    # To convert an address to an actual callable thing we have to use
    # CFUNCTYPE, and specify the arguments & return type.
    cfunc = CFUNCTYPE(c_int, c_int, c_int)(cfptr)

    # Now 'cfunc' is an actual callable we can invoke
    res = cfunc(17, 42)
    print('The result is', res)

This should print the LLVM IR for the function we built, its assembly as produced by LLVM's JIT compiler, and the result 59.
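The last step, turning a raw address into something Python can call, can be tried without LLVM at all. The sketch below is only an illustration and not part of the script above: it takes the address of a ctypes callback (standing in for JITed code) and rebuilds a callable from that bare address with CFUNCTYPE. The names SumFn, py_sum and native_cb are made up for this example.

```python
from ctypes import CFUNCTYPE, c_int, c_void_p, cast

# Prototype: int (*)(int, int), the same signature as the JITed 'sum'.
SumFn = CFUNCTYPE(c_int, c_int, c_int)


def py_sum(a, b):
    return a + b


# Wrap the Python function as a C-callable callback. Keep this object alive
# for as long as the address is in use.
native_cb = SumFn(py_sum)

# Extract the raw address, much like the pointer the execution engine
# hands back for a compiled function.
addr = cast(native_cb, c_void_p).value

# Rebuild a callable from nothing but the address and the prototype.
cfunc = SumFn(addr)
result = cfunc(17, 42)
print('The result is', result)
```

The prototype passed to CFUNCTYPE has to match the real signature of the code at that address; ctypes has no way to check it, so a mismatch shows up as wrong results or a crash rather than a Python exception.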

Compared to llvmpy, llvmlite now seems like the future, mostly due to the maintenance situation. llvmpy is only known to work with LLVM up to 3.3, which is already a year and a half old by now. Now that it has been dropped from Numba, there's a good chance it will fall further behind. llvmlite, on the other hand, is very actively developed and keeps up with the latest stable LLVM release. Also, it's architected in a way that should make it significantly easier to keep up with LLVM in the future. Unfortunately, as far as uses outside of Numba go, llvmlite is still rough around the edges, especially w.r.t. documentation and examples. But the llvmlite developers appear keen on making it useful in a more general setting and not just for Numba, so that's a good sign.

