Calling back into Python from llvmlite-JITed code



This post is about a somewhat more interesting and complex use of llvmlite than the basic example presented in my previous article on the subject.

I see compilation as a meta-tool. It lets us build new levels of abstraction and expressiveness within our code. We can use it to build additional languages on top of our host language (common for C, C++ and Java-based systems, less common for Python), to accelerate some parts of our host language (more common in Python), or anything in between.

To fully harness the power of runtime compilation (JITing), however, it's very useful to know how to bridge the gap between the host language and the JITed language; preferably in both directions. As the previous article shows, calling from the host into the JITed language is trivial. In fact, this is what JITing is mostly about. But what about the other direction? This is somewhat more challenging, but leads to interesting uses and additional capabilities.

While the post uses llvmlite for the JITing, I believe it presents general concepts that are relevant for any programming environment.

Callback from JITed code to Python

Let's start with a simple example: we want to be able to invoke some Python function from within JITed code.

from ctypes import c_int64, c_void_p, CFUNCTYPE
import sys

import llvmlite.ir as ir
import llvmlite.binding as llvm

def create_caller(m):
    # define i64 @caller(i64 (i64, i64)* nocapture %f, i64 %i) #0 {
    # entry:
    #   %mul = shl nsw i64 %i, 1
    #   %call = tail call i64 %f(i64 %i, i64 %mul) #1
    #   ret i64 %call
    # }
    i64_ty = ir.IntType(64)

    # The callback that 'caller' accepts is a pointer to a function with the
    # appropriate signature.
    cb_func_ptr_ty = ir.FunctionType(i64_ty, [i64_ty, i64_ty]).as_pointer()
    caller_func_ty = ir.FunctionType(i64_ty, [cb_func_ptr_ty, i64_ty])

    caller_func = ir.Function(m, caller_func_ty, name='caller')
    caller_func.args[0].name = 'f'
    caller_func.args[1].name = 'i'
    irbuilder = ir.IRBuilder(caller_func.append_basic_block('entry'))
    mul = irbuilder.mul(caller_func.args[1],
                        ir.Constant(i64_ty, 2),
                        name='mul')
    call = irbuilder.call(caller_func.args[0], [caller_func.args[1], mul])
    irbuilder.ret(call)

create_caller creates a new LLVM IR function called caller and injects it into the given module m.

If you're not an expert at reading LLVM IR, caller is equivalent to this C function:

int64_t caller(int64_t (*f)(int64_t, int64_t), int64_t i) {
  return f(i, i * 2);
}

It takes f - a pointer to a function accepting two integers and returning an integer (all integers in this post are 64-bit), and i - an integer. It calls f with i and i*2 as the arguments. That's it - pretty simple, but sufficient for our demonstration's purposes.

Now let's define a Python function:

def myfunc(a, b):
    print('I was called with {0} and {1}'.format(a, b))
    return a + b

Finally, let's see how we can pass myfunc as the callback caller will invoke. This is fairly straightforward, thanks to the support for callback functions in ctypes. In fact, it's exactly the same as the way you'd pass Python callbacks to C code via ctypes, without any JITing involved:

def main():
    module = ir.Module()
    create_caller(module)

    llvm.initialize()
    llvm.initialize_native_target()
    llvm.initialize_native_asmprinter()

    llvm_module = llvm.parse_assembly(str(module))
    tm = llvm.Target.from_default_triple().create_target_machine()

    # Compile the module to machine code using MCJIT.
    with llvm.create_mcjit_compiler(llvm_module, tm) as ee:
        ee.finalize_object()

        # Obtain a pointer to the compiled 'caller' - it's the address of its
        # JITed code in memory.
        CBFUNCTY = CFUNCTYPE(c_int64, c_int64, c_int64)
        cfptr = ee.get_pointer_to_function(llvm_module.get_function('caller'))
        callerfunc = CFUNCTYPE(c_int64, CBFUNCTY, c_int64)(cfptr)

        # Wrap myfunc in CBFUNCTY and pass it as a callback to caller.
        cb_myfunc = CBFUNCTY(myfunc)
        print('Calling "caller"')
        res = callerfunc(cb_myfunc, 42)
        print('  The result is', res)

If we run this code, we get the expected result:

Calling "caller"
I was called with 42 and 84
  The result is 126

Registering host functions in JITed code

When developing a JIT, one need that comes up very often is to delegate some of the functionality in the JITed code to the host language. For example, if you're developing a JIT to implement a fast DSL, you may not want to reimplement a whole I/O stack in your language. So you'd prefer to delegate all I/O to the host language. Taking C as a sample host language, you just want to call printf from your DSL and somehow have it routed to the host call.

How do we accomplish this feat? The solution here, naturally, depends on both the host language and the DSL you're JITing. Let's take the LLVM tutorial as an example. The Kaleidoscope language does computations on floating point numbers, but it has no I/O facilities of its own. Therefore, the Kaleidoscope compiler exposes a putchard function from the host (C++) to be callable in Kaleidoscope. For Kaleidoscope this is fairly simple, because the host is C++ and is compiled into machine code in the same process as the JITed code. All the JITed code needs to know is the symbol name of the host function to call, and the call can happen (as long as the calling conventions match, of course).
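To make this concrete, here is a minimal llvmlite sketch of the "just declare the symbol name" approach (this snippet is mine, not from the Kaleidoscope tutorial; putchar from the C library stands in for a host-provided builtin):

import llvmlite.ir as ir

def declare_putchar(m):
    # Declaration only - no body. At finalize_object() time MCJIT will try to
    # resolve the symbol, typically against symbols already loaded into the
    # process; llvmlite.binding.add_symbol(name, address) can be used to
    # register an address explicitly if automatic resolution doesn't find it.
    i32_ty = ir.IntType(32)
    putchar_ty = ir.FunctionType(i32_ty, [i32_ty])
    return ir.Function(m, putchar_ty, name='putchar')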

Alas, for Python as a host language, things are not so straightforward. This is why, in my reimplementation of Kaleidoscope with llvmlite, I resorted to implementing the builtins in LLVM IR, emitting them into the module along with compiled Kaleidoscope code. These builtins just call the underlying C functions (which still reside in the same process, since Python itself is written in C) and don't call into Python.

But say we wanted to actually call Python. How would we go about that?

Well, we've seen a way to call Python from JITed code in this post. Can this approach be used? Yes, though it's quite cumbersome. The problem is that the only place where we have an actual interface between Python and the JITed code is when we invoke a JITed function. Somehow we have to use this interface to convey to the JIT side what Python functions are available to it and how to call them. Essentially, we'll have to implement something akin to the following schematic symbol table interface in the JITed code:

#include <cstdint>
#include <string>
#include <unordered_map>

typedef int64_t (*CallbackType)(int64_t, int64_t);
std::unordered_map<std::string, CallbackType> symtab;

void register_callback(std::string name, CallbackType callback) {
  symtab[name] = callback;
}

CallbackType get_callback(std::string name) {
  auto iter = symtab.find(name);
  if (iter != symtab.end()) {
    return iter->second;
  } else {
    return nullptr;
  }
}

To register Python callbacks with the JIT, we'll call register_callback from Python, passing it a name and the callback (a CFUNCTYPE wrapper, as shown in the code sample at the top). The JIT side will remember this mapping in a symbol table. When it needs to invoke a Python function, it will use get_callback to look up the pointer by name.
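For illustration, here's roughly how the Python side of this scheme could look, assuming the JITed module exports a C-compatible register_callback that takes a const char* name; the helper function and its parameters below are hypothetical:

from ctypes import CFUNCTYPE, c_char_p, c_int64

CBFUNCTY = CFUNCTYPE(c_int64, c_int64, c_int64)
# void register_callback(const char* name, CallbackType callback)
REGISTER_TY = CFUNCTYPE(None, c_char_p, CBFUNCTY)

def register_python_builtins(ee, llvm_module, builtins):
    # Look up the JITed 'register_callback', just like 'caller' above.
    addr = ee.get_pointer_to_function(
        llvm_module.get_function('register_callback'))
    register_callback = REGISTER_TY(addr)
    # Keep the CFUNCTYPE wrappers alive for as long as the JITed code may
    # call them; otherwise they get garbage-collected from under it.
    wrappers = []
    for name, pyfunc in builtins:
        cb = CBFUNCTY(pyfunc)
        wrappers.append(cb)
        register_callback(name.encode('utf-8'), cb)
    return wrappers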

In addition to being cumbersome to implement [1], this is also inefficient. It seems wasteful to go through a symbol table lookup for every call to a Python builtin. It's not like these mappings ever change in a typical use case! We are emitting code at runtime here and have so much flexibility at our command - so this lookup feels like a crutch.

Moreover, this is a simplified example - every callback takes two integer arguments. In real scenarios, the signatures of callback functions can be arbitrary, so we'd have to implement full-blown FFI dispatching for the calls.

Breaching the compile/run-time barrier

We can do better. For every Python function we intend to call from the JITed code, we can emit a JITed wrapper. This wrapper will hard-code a call to the Python function, thus removing this dispatching (the symbol table shown above) from run-time; this totally makes sense because we know at compile time which Python functions are needed and where to find them.

Let's write the code to do this with llvmlite:

import ctypes
from ctypes import c_int64, c_void_p
import sys

import llvmlite.ir as ir
import llvmlite.binding as llvm

cb_func_ty = ir.FunctionType(ir.IntType(64),
                             [ir.IntType(64), ir.IntType(64)])
cb_func_ptr_ty = cb_func_ty.as_pointer()
i64_ty = ir.IntType(64)

def create_addrcaller(m, addr):
    # define i64 @addrcaller(i64 %a, i64 %b) #0 {
    # entry:
    #   %f = inttoptr i64 ADDR to i64 (i64, i64)*
    #   %call = tail call i64 %f(i64 %a, i64 %b)
    #   ret i64 %call
    # }
    addrcaller_func_ty = ir.FunctionType(i64_ty, [i64_ty, i64_ty])
    addrcaller_func = ir.Function(m, addrcaller_func_ty, name='addrcaller')
    a = addrcaller_func.args[0]; a.name = 'a'
    b = addrcaller_func.args[1]; b.name = 'b'
    irbuilder = ir.IRBuilder(addrcaller_func.append_basic_block('entry'))
    f = irbuilder.inttoptr(ir.Constant(i64_ty, addr),
                           cb_func_ptr_ty, name='f')
    call = irbuilder.call(f, [a, b])
    irbuilder.ret(call)

The IR function created by create_addrcaller is somewhat similar to the one we've seen above with create_caller, but there's a subtle difference. addrcaller does not take a function pointer at runtime. Instead, the function pointer is encoded into it when it's generated. The addr argument passed into create_addrcaller is the runtime address of the function to call. addrcaller converts it to a function pointer (using the inttoptr instruction, which is somewhat similar to a reinterpret_cast in C++) and calls it [2].

Here's how to use it:

def main():
    CBFUNCTY = ctypes.CFUNCTYPE(c_int64, c_int64, c_int64)
    def myfunc(a, b):
        print('I was called with {0} and {1}'.format(a, b))
        return a + b
    cb_myfunc = CBFUNCTY(myfunc)
    cb_addr = ctypes.cast(cb_myfunc, c_void_p).value
    print('Callback address is 0x{0:x}'.format(cb_addr))

    module = ir.Module()
    create_addrcaller(module, cb_addr)
    print(module)

    llvm.initialize()
    llvm.initialize_native_target()
    llvm.initialize_native_asmprinter()

    llvm_module = llvm.parse_assembly(str(module))

    tm = llvm.Target.from_default_triple().create_target_machine()

    # Compile the module to machine code using MCJIT
    with llvm.create_mcjit_compiler(llvm_module, tm) as ee:
        ee.finalize_object()
        # Now call addrcaller
        print('Calling "addrcaller"')
        addrcaller = ctypes.CFUNCTYPE(c_int64, c_int64, c_int64)(
            ee.get_pointer_to_function(llvm_module.get_function('addrcaller')))
        res = addrcaller(105, 23)
        print('  The result is', res)

The key trick here is the call to ctypes.cast. It takes a Python function wrapped in a ctypes.CFUNCTYPE and casts it to a void*; in other words, it obtains its address [3]. This is the address we pass into create_addrcaller. The code ends up having exactly the same effect as the previous sample, but with an important difference: whereas previously the dispatch to myfunc happened at run-time, here it happens at compile-time.

This is a synthetic example, but it should be clear how to extend it to the full thing mentioned earlier: for each built-in needed by the JITed code from the host code, we emit a JITed wrapper to call it. No symbol table dispatching at runtime. Even better, since these builtins can have arbitrary signatures, the JITed wrapper can handle all of that efficiently. PyPy uses this technique to make calls into C (via the cffi library) much more efficient than they are with ctypes. ctypes uses libffi, which has to pack all the arguments to a function at runtime, according to a type signature it was provided. However, since this type signature almost never changes during the runtime of one program, this packing can be done much more efficiently with JITing.
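To sketch what that generalization might look like, here's a variant of create_addrcaller that emits a wrapper for an arbitrary signature; the helper and its parameters are mine, not from the post, and it assumes a non-void return type:

import llvmlite.ir as ir

def create_wrapper(m, name, addr, ret_ty, arg_tys):
    # Emit a function <name> into module m that forwards its arguments to the
    # host function living at address addr; ret_ty and arg_tys must match the
    # host function's actual signature.
    callee_ty = ir.FunctionType(ret_ty, arg_tys)
    wrapper = ir.Function(m, callee_ty, name=name)
    builder = ir.IRBuilder(wrapper.append_basic_block('entry'))
    # Hard-code the address, exactly as create_addrcaller does.
    fptr = builder.inttoptr(ir.Constant(ir.IntType(64), addr),
                            callee_ty.as_pointer(), name='f')
    result = builder.call(fptr, list(wrapper.args))
    builder.ret(result)
    return wrapper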

Conclusion

Hopefully it's clear that while this article focuses on a very specific technology (using llvmlite to JIT native code from Python), its principles are universal. The overarching idea here is that the separation between what happens when the program is compiled and what happens when it runs is artificial. We can breach it in many ways, and use it to build increasingly complex abstractions. Some languages, like the Lisp family, list this mixture of compile-time and run-time as one of their unique strengths, and have been preaching it for decades. I fondly recall my own first real-world use of this technique many years ago - reading a configuration file and generating code at runtime that unpacks data based on that configuration. That task (emitting Perl code from Perl according to an XML config) may appear worlds away from the topic of this post (emitting LLVM IR from Python according to a function signature), but if you really think about it, it's exactly the same thing.

I suspect this is one of the most obtuse articles I've written lately; if you read this far, I sure hope you found it interesting and helpful. Let me know in the comments if anything isn't clear or if you have relevant ideas - I love discussing this topic!


[1] We'll have to compile the equivalent of a hash table implementation into our JITed code. While not impossible, this may be overkill if you really just want a quick-and-simple DSL.
[2] This delightful mixture of compile-time and run-time is by far the most important part of this article; if you remember just one thing from here, this should be it. Let me know in the comments if it's not clear.
[3] The concept of "address" for a Python function may raise an eyebrow. Keep in mind that this isn't a pure Python function we're talking about here. It's wrapped in a ctypes.CFUNCTYPE, which is a dispatcher created by ctypes ("thunk" in the nomenclature of libffi, the underlying mechanism behind ctypes) to perform argument conversion and make the actual call.

YAPF - Yet Another Python Formatter



In the past couple of years, automated reformatting tools have come into prominence, with go fmt for the Go programming language and clang-format for C, C++ and Java. It's very rare to encounter unformatted Go code, and the same is becoming true of C++ in a number of projects (a few major open-source projects have started enforcing formatting in pre-commit hooks and the like).

Python didn't have such a tool; well, it kinda did. There's a bunch of auto-fixers out there, like autopep8, but all of them serve slightly different roles. autopep8's focus is larger than just whitespace and formatting, and it won't touch code that isn't violating PEP 8 but just looks ugly. This is somewhat similar to the many existing auto-linters and fixers for C++, and yet clang-format shot into prominence, for good reason. There's a good case to make for a tool that just cares about formatting (without modifying the code's AST in any way) and reformats all the code to consistently follow a single standard.

YAPF was conceived as a new tool to do this for Python. It's out there now, open source; the first alpha release was pushed to PyPI yesterday, so you can go ahead and run pip install yapf, or just run it from a downloaded or cloned source directory. Here's an example:

$ cat /tmp/code.py
x = {  'a':37,'b':42,

'c':927}

y = 'hello ''world'
z = 'hello '+'world'
a = 'hello {}'.format('world')
class foo  (     object  ):
  def f    (self   ):
    return       37*-2
  def g(self, x,y=42):
      return y
def f  (   a ) :
  return      37-a[42-x :  y**3]


$ python yapf /tmp/code.py
x = {'a': 37, 'b': 42, 'c': 927}

y = 'hello ' 'world'
z = 'hello ' + 'world'
a = 'hello {}'.format('world')


class foo(object):
    def f(self):
        return 37 * -2

    def g(self, x, y=42):
        return y


def f(a):
    return 37 - a[42 - x:y ** 3]

YAPF also accepts the -i flag to overwrite a file, and a bunch of other configuration parameters. Check it out with yapf --help.
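For instance, something like this (the --style flag shown here is one of those configuration parameters; see the help output for the full list):

$ yapf -i /tmp/code.py               # rewrite the file in place
$ yapf --style=pep8 /tmp/code.py     # pick a predefined base style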

There are two big advantages to using YAPF for all your code:

  1. It makes you think (and obsess) about formatting much less when writing / tweaking code. This saves time when coding.
  2. It makes code from different developers consistent in a single project. This aids reading code, so IMHO it's the more important advantage.

I care about this tool a lot - not only because I find it really useful, but also because I had the privilege to participate in its development. Since its initial release it has gotten a huge amount of attention, with more than 2000 stars on GitHub as of this morning, in just a couple of weeks - there's obviously a need for such a tool out there! It's also being used in a growing number of Python projects internally at Google.

Python is a language that carries the code readability flag tall and proud. And yet, until YAPF, my feeling was that even C++ had a better auto-formatting story with clang-format. Hopefully YAPF will change this.

So please, try YAPF out, use it for your code, integrate it into your development process, report bugs, contribute pull requests.


Summary of reading: January - March 2015



  • "American Rust" by Philipp Meyer - set out in a formerly prosperous area of Pennsylvania that went through an economic collapse due to the demise of the local steel industry. The story is very engaging, and somewhat morally confusing - it's hard to decide on whose side to be. Overall, a pretty good read, though the ending could be better.
  • "The Paradox of Choice: Why more is less" by Barry Schwartz - one of the most useless books I read recently. Though it was obvious from very early on that this book is going to be disappointing, I bravely plowed through, finally giving up at around 65%. Seriously, I don't need a whole book to tell me how we have many choices in modern life, and how it can be detrimental. Not sure what I was looking for when I set out to read it, really; mental note - be more careful in the future. I suppose for some people this book can be an eye opener (divert them from their foul ways), but not being one to succumb to a plenitude of options myself, I just don't see the point.
  • "The Emperor of all Maladies: A Biography of Cancer" - a very well written book (Pulitzer prizes don't get handed out for nothing), but a very troubling one at the same time. The long chapters about leukemia in children will make any parent squirm, and the general feeling of hopelessness drags throughout this long book. Yes, the final couple of chapters are hopeful, but the vast majority of the book leaves a stronger impression. I wouldn't call the book perfect - some parts (like the legal battles surrounding smoking) could be made much shorter, and I wish some other parts (like the biology of the cancer cell and recent research in general) could be longer. Overall, however, it's a great read.
  • "The Autobiography of Bertrand Russell, Vol I" by Bertrand Russell - Russell was one of the most brilliant minds of the early 20th century, with large contributions to mathematics, philosophy and politics (for some reason, there are fewer such polymaths lately - perhaps because the areas of knowledge became too specialized). This book is the first volume of 3 in his autobiography, from childhood to age ~40 (he lived to 97, so there's plenty more to tell). It's a surprisingly readable and enjoyable book in most parts. I found the letters to be a bit tiresome, especially given that many are addressed to Russell and weren't written by him.
  • "John von Neumann and the Origins of Modern Computing" by William Aspray - a biography of John von Neumann's various contributions to early computing. The man this book describes was extremely impressive - a truly inspiring breadth and depth of knowledge. The book itself is somewhat dry though, somewhat academic with many hundreds of detailed notes and references, and very matter-of-factual presentation. Reads a bit like a long encyclopedia entry.
  • "The Goldfinch" by Donna Tartt - An exceptionally well-written and captivating book. It's obvious that the author made a huge investment in research - there are tons of small but interesting details sprinkled all over. I'm not sure I like the ending, but overall the book is excellent.
  • "Mastery" by George Leonard - in the preface, the author mentions that he wrote this book after being asked by readers for more information following a magazine article he wrote earlier on the same topic. This shows :) Though the book is fairly short (less than 200 pages in a small format), its essence could really be summarized in just a few pages. This is the first part of the book. The rest feels like filler. That said, the essence is well worth the read. The main thesis is that on the path to mastery (e.g. any learning or development journey) what matters is the path itself, not the end goal. This lesson pops up in various guises in self-help materials, but Leonard really has a nice way to describe it and teach it, so I'd say the book is recommended. If you also happen to be an auto-didactic introvert who enjoyed Csikszentmihalyi's "Flow", then this book is a must-have complement.
  • "Quiet: The Power of Introverts in a World That Can't Stop Talking" by Susan Cain (audiobook) - the author sets on a quest to define the differences between introverts and extroverts, and most importantly to help introverts self-validate about their place in life and society. I think this book could be quite important to introvert persons who aren't feeling good about their social skills and are in a defining period of their lives (say high-school to early years of career). For older individuals, the most interesting insights will be about how to properly raise children who are introverts. It's not a bad book in general, but it has some suboptimal properties. For example, trying to cleanly divide the world into two personality types is a stretch, and the gross generalizations that result from this are unavoidable. It also feels rather unscientific in places. And too much bashing of extroverts, IMHO.
  • "The Son" by Philipp Meyer - a very well-written saga about a Texan family, spread over half of the 19th century and most of the 20th. Tells the story from the POV of three different family members - the original patriarch, his son, and his grand-granddaughter. The writing is excellent, though I dislike explicit tricks that make it hard to put a book down (intersperse several plot lines and always end each segment with something that leaves you in suspense). What I found most interesting is the description of life among the Comanches; it's either all imagined or the author did some serious research (I certainly hope for the latter). The ending of the book is kind-of unexpected and anti-climatic, but then again, it also makes sense, in a way.
  • "Make It Stick: The Science of Successful Learning" by P. Brown, H. Roediger * and M. McDaniel (audiobook) - the authors present several techniques for more effective learning. The first half (or so) of the book is very interesting and enjoyable - the efficacy of testing as opposed to repeated reading, the advantages of forcing recall and spaced repetition - it all makes a lot of sense and represents the same central theme from different angles. The second part of the book is less useful, IMHO, and feels like filler.
  • "A Student's Guide to Vectors and Tensors" by Daniel Fleisch - a short book that attempts to provide an introduction to the mathematics and uses of vectors and tensors. I mainly picked it up for the latter. While the book is well written in general and does a fairly good job explaining the material, IMHO the division of effort between vectors and tensors is wrong. Vectors are a much more basic and familiar concept, and much easier to relate to the physical world. Hence, spending significantly more time on vectors and providing more examples isn't the best choice here. If this is someone's first exposure to vectors, he's unlikely to get tensors on a first reading of this book. And for someone already familiar with vectors and looking for more information on tensors, the first part of the book is almost useless. I was also disappointed by the lack of rigor - important concepts are presented without any proof or motivation. The exercises do a good job of complementing the material - the online solution manual is awesome, though I felt the exercises are a bit on the easy side.

Re-reads:

  • "A Certain Ambiguity: A mathematical novel" by G. Suri and H. Bal
  • "The varieties of scientific experience" by Carl Sagan