<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Eli Bendersky's website - LLVM &amp; Clang</title><link href="https://eli.thegreenplace.net/" rel="alternate"></link><link href="https://eli.thegreenplace.net/feeds/llvm-clang.atom.xml" rel="self"></link><id>https://eli.thegreenplace.net/</id><updated>2025-01-25T13:56:56-08:00</updated><entry><title>Adventures in JIT compilation: Part 3 - LLVM</title><link href="https://eli.thegreenplace.net/2017/adventures-in-jit-compilation-part-3-llvm/" rel="alternate"></link><published>2017-05-01T05:51:00-07:00</published><updated>2024-05-04T19:46:23-07:00</updated><author><name>Eli Bendersky</name></author><id>tag:eli.thegreenplace.net,2017-05-01:/2017/adventures-in-jit-compilation-part-3-llvm/</id><summary type="html">&lt;p&gt;This is part 3 of my &amp;quot;Adventures in JIT compilation&amp;quot; series.
&lt;a class="reference external" href="https://eli.thegreenplace.net/2017/adventures-in-jit-compilation-part-1-an-interpreter/"&gt;Part 1&lt;/a&gt;
introduced the BF input language and presented a few interpreters in increasing
degree of optimization. In &lt;a class="reference external" href="https://eli.thegreenplace.net/2017/adventures-in-jit-compilation-part-2-an-x64-jit/"&gt;part 2&lt;/a&gt;
we've seen how to roll a JIT compiler for BF from scratch, by emitting x64
machine code into executable …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This is part 3 of my &amp;quot;Adventures in JIT compilation&amp;quot; series.
&lt;a class="reference external" href="https://eli.thegreenplace.net/2017/adventures-in-jit-compilation-part-1-an-interpreter/"&gt;Part 1&lt;/a&gt;
introduced the BF input language and presented a few interpreters in increasing
degree of optimization. In &lt;a class="reference external" href="https://eli.thegreenplace.net/2017/adventures-in-jit-compilation-part-2-an-x64-jit/"&gt;part 2&lt;/a&gt;
we've seen how to roll a JIT compiler for BF from scratch, by emitting x64
machine code into executable memory and jumping to it for execution.&lt;/p&gt;
&lt;p&gt;In this part, I'm going to present another JIT compiler for BF. This time,
however, rather than rolling everything from scratch, I'll be using the &lt;a class="reference external" href="http://llvm.org/"&gt;LLVM&lt;/a&gt; framework.&lt;/p&gt;
&lt;div class="section" id="llvm-as-a-programming-language-backend"&gt;
&lt;h2&gt;LLVM as a programming language backend&lt;/h2&gt;
&lt;p&gt;I'll start by saying this post is not meant to be a full tutorial for LLVM. LLVM
has &lt;a class="reference external" href="http://llvm.org/docs/tutorial/"&gt;a pretty good tutorial&lt;/a&gt; already (see also
&lt;a class="reference external" href="https://eli.thegreenplace.net/2015/python-version-of-the-llvm-tutorial/"&gt;my Python port of it&lt;/a&gt;).
I've also written &lt;a class="reference external" href="https://eli.thegreenplace.net/tag/llvm-clang"&gt;more in-depth pieces about LLVM&lt;/a&gt; in the past; see for example
&lt;a class="reference external" href="https://eli.thegreenplace.net/2012/11/24/life-of-an-instruction-in-llvm"&gt;Life of an instruction in LLVM&lt;/a&gt;. That
said, I do hope that seeing how to apply LLVM to develop a complete JIT compiler
for BF can be useful, and the &lt;a class="reference external" href="https://github.com/eliben/code-for-blog/blob/main/2017/bfjit/llvmjit.cpp"&gt;complete code sample&lt;/a&gt;
can serve as a starting point for programming language enthusiasts to develop
their own backends with a fairly modern version of LLVM. If you have a bit more
time, look for exercise and experiment suggestions in the footnotes.&lt;/p&gt;
&lt;p&gt;Speaking of LLVM versions... LLVM's C++ API is notoriously volatile, and code
working today has little chance of working in just a few months without any
changes. However, LLVM &lt;em&gt;does&lt;/em&gt; have &lt;a class="reference external" href="http://releases.llvm.org/"&gt;numbered releases&lt;/a&gt; that can be used to maintain some sort of sanity
since they are extensively tested. The code for this post was developed with the
LLVM 4.0 release. Even if you're reading this in the year 2022, hopefully you
should be able to download LLVM 4.0 and compile &amp;amp; link the sample code.&lt;/p&gt;
&lt;p&gt;As opposed to the use of asmjit in &lt;a class="reference external" href="https://eli.thegreenplace.net/2017/adventures-in-jit-compilation-part-2-an-x64-jit/"&gt;part 2&lt;/a&gt;,
LLVM offers several distinct advantages:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;Since LLVM comes with many state-of-the-art optimizations on the IR level, we
can generate fairly straightforward LLVM IR from our source language, without
worrying too much about its efficiency. I'll demonstrate this shortly.&lt;/li&gt;
&lt;li&gt;LLVM is multi-target. In this post I'm showing a JIT compiler that emits
code for the machine it runs on (x64 in my case), but one can easily reuse
the same code to compile BF to ARM, PowerPC, MIPS or a bunch of other
backends supported by LLVM.&lt;/li&gt;
&lt;li&gt;LLVM comes with a large set of tools useful to visualize and manipulate IR
and other stages of compilation. I mention a couple uses of the &lt;tt class="docutils literal"&gt;opt&lt;/tt&gt; tool
throughout the post, and &lt;a class="reference external" href="http://llvm.org/docs/CommandGuide/"&gt;there are others&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;div class="section" id="generating-llvm-ir-from-bf"&gt;
&lt;h2&gt;Generating LLVM IR from BF&lt;/h2&gt;
&lt;p&gt;The core of the code generation code in this sample is fairly short - only ~130
lines of C++ in &lt;a class="reference external" href="https://github.com/eliben/code-for-blog/blob/main/2017/bfjit/llvmjit.cpp"&gt;emit_jit_function&lt;/a&gt;.
I'll walk through it, leaving some details out. Feel free to check the LLVM
documentation or API headers for more information, if needed.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;FunctionType&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;jit_func_type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;FunctionType&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;void_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Function&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;jit_func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Function&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;jit_func_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Function&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ExternalLinkage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;JIT_FUNC_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;BasicBlock&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;entry_bb&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;BasicBlock&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;entry&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;jit_func&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;IRBuilder&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entry_bb&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We begin by creating a LLVM function to hold the emitted code. The function is
named &lt;tt class="docutils literal"&gt;__llvmjit&lt;/tt&gt; (which is what the constant &lt;tt class="docutils literal"&gt;JIT_FUNC_NAME&lt;/tt&gt; contains) and
its linkage is external so that we could call it from outside the LLVM module
where it resides. We also create the entry basic block in the function where the
initial code will go, as well as an &lt;em&gt;IR builder&lt;/em&gt; that makes the job of emitting
LLVM IR a bit easier than using raw APIs.&lt;/p&gt;
&lt;p&gt;More setup follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;AllocaInst&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateAlloca&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;int8_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getInt32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MEMORY_SIZE&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;memory&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateMemSet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getInt8&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MEMORY_SIZE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;AllocaInst&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dataptr_addr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateAlloca&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;int32_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;nullptr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;dataptr_addr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getInt32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dataptr_addr&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this version of the BF JIT, I decided to keep the data memory on the stack of
the JITed function. This makes it easier for LLVM to optimize accesses to the
memory (as we'll see), because the memory is private - it can't be aliased from
outside the function and can't have side effects, which frees the optimizer to
be more aggressive &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;. The data pointer itself is kept on the stack too
(created with an &lt;tt class="docutils literal"&gt;alloca&lt;/tt&gt; instruction) - more on this very shortly. The
data pointer is initialized to 0, per BF semantics &lt;a class="footnote-reference" href="#footnote-2" id="footnote-reference-2"&gt;[2]&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Finally, as usual for one-pass code emission from BF, we have to keep a stack of
open brackets (in the &lt;tt class="docutils literal"&gt;asmjit&lt;/tt&gt; version the type was called &lt;tt class="docutils literal"&gt;BracketLabels&lt;/tt&gt;):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;BracketBlocks&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;open_bracket_stack&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now we're ready for the compilation loop that takes the next BF instruction and
emits the LLVM IR for it. First, pointer movement:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pc&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pc&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;pc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pc&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;switch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sc"&gt;&amp;#39;&amp;gt;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dataptr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateLoad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataptr_addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;dataptr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;inc_dataptr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateAdd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataptr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getInt32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;inc_dataptr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inc_dataptr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dataptr_addr&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sc"&gt;&amp;#39;&amp;lt;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dataptr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateLoad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataptr_addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;dataptr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dec_dataptr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateSub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataptr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getInt32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;dec_dataptr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dec_dataptr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dataptr_addr&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To move the data pointer, we load its value from its storage on the stack,
update it (by either incrementing or decrementing) and store it back. If this
seems inefficient, read on! Later on, the post describes why this style of LLVM
IR emission is not only acceptable, but desirable.&lt;/p&gt;
&lt;p&gt;For data memory updates, the code is not much more complicated:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sc"&gt;&amp;#39;+&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dataptr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateLoad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataptr_addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;dataptr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;element_addr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateInBoundsGEP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dataptr&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;element_addr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateLoad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;element_addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;element&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;inc_element&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateAdd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getInt8&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;inc_element&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inc_element&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;element_addr&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sc"&gt;&amp;#39;-&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dataptr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateLoad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataptr_addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;dataptr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;element_addr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateInBoundsGEP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dataptr&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;element_addr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateLoad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;element_addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;element&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dec_element&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateSub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getInt8&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;sub_element&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dec_element&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;element_addr&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We similarly load the data pointer, this time using it to compute an offset into
the memory (using LLVM's &lt;tt class="docutils literal"&gt;getelementptr&lt;/tt&gt; instruction). We then load the
element from memory, update it and store it back. Note how we use the
&lt;tt class="docutils literal"&gt;inbounds&lt;/tt&gt; variant of &lt;tt class="docutils literal"&gt;getelementptr&lt;/tt&gt;; BF doesn't define the behavior of
stepping outside the bounds of memory - we leverage this fact to let LLVM
produce more optimized code.&lt;/p&gt;
&lt;p&gt;For I/O, we use a technique similar to the one employed with &lt;tt class="docutils literal"&gt;asmjit&lt;/tt&gt; in part
2: call the &lt;tt class="docutils literal"&gt;getchar/putchar&lt;/tt&gt; functions from the host. If you look at the
signature of &lt;tt class="docutils literal"&gt;emit_jit_function&lt;/tt&gt;, it takes a &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;llvm::Function*&lt;/span&gt;&lt;/tt&gt; for each of
&lt;tt class="docutils literal"&gt;getchar&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;putchar&lt;/tt&gt;; these are declared in the caller by adding their
declaration to the module. Later, in the section dealing with the JIT we'll see
how these get resolved at run-time.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sc"&gt;&amp;#39;.&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dataptr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateLoad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataptr_addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;dataptr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;element_addr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateInBoundsGEP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dataptr&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;element_addr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateLoad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;element_addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;element&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;element_i32&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateIntCast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;int32_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;element_i32_&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;putchar_func&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;element_i32&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sc"&gt;&amp;#39;,&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;getchar_func&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;user_input&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;user_input_i8&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateIntCast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;int8_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;user_input_i8_&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dataptr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateLoad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataptr_addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;dataptr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;element_addr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateInBoundsGEP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dataptr&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;element_addr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input_i8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;element_addr&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As usual, the trickiest part of generating code for BF is handling the loops,
which could be nested. LLVM makes this fairly easy by having &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Basic_block"&gt;basic blocks&lt;/a&gt; as first-class citizens. Every
loop body gets its basic block, and the original block gets split to two - the
first part jumping into the loop, the last part happening after the loop (since
we can skip directly to it). Here's how the control-flow graph within a function
with just a single loop looks &lt;a class="footnote-reference" href="#footnote-3" id="footnote-reference-3"&gt;[3]&lt;/a&gt;:&lt;/p&gt;
&lt;img alt="CFG for function with one loop" class="align-center" src="https://eli.thegreenplace.net/images/2017/llvmjit-oneloop-cfg.png" /&gt;
&lt;p&gt;Note how the jumps between basic blocks happen on a condition and have &lt;tt class="docutils literal"&gt;T&lt;/tt&gt;
(true) or &lt;tt class="docutils literal"&gt;F&lt;/tt&gt; (false) clauses. These just encode the usual BF semantics:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;For a &lt;tt class="docutils literal"&gt;[&lt;/tt&gt;, we compare the current memory cell with 0; if it's 0, we skip the
loop (the &lt;tt class="docutils literal"&gt;T&lt;/tt&gt; clause); if it's not 0, we enter the loop (the &lt;tt class="docutils literal"&gt;F&lt;/tt&gt; clause).&lt;/li&gt;
&lt;li&gt;For a &lt;tt class="docutils literal"&gt;]&lt;/tt&gt;, we compare the current memory cell with 0; if it's 0, we jump
back to the loop start; otherwise we end the loop.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here's the code generating LLVM IR from &lt;tt class="docutils literal"&gt;[&lt;/tt&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sc"&gt;&amp;#39;[&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dataptr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateLoad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataptr_addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;dataptr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;element_addr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateInBoundsGEP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dataptr&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;element_addr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateLoad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;element_addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;element&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cmp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateICmpEQ&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getInt8&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;compare_zero&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;BasicBlock&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;loop_body_block&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;BasicBlock&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;loop_body&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;jit_func&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;BasicBlock&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;post_loop_block&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;BasicBlock&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;post_loop&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;jit_func&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateCondBr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cmp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;post_loop_block&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;loop_body_block&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;open_bracket_stack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BracketBlocks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loop_body_block&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;post_loop_block&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SetInsertPoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loop_body_block&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With the overview and control-flow graph above, I hope it's clear. The most
interesting part is using the basic blocks to represent parts of the loop. When
&lt;tt class="docutils literal"&gt;[&lt;/tt&gt; is encountered, we create both the loop and &amp;quot;post loop&amp;quot; blocks, since the
branch instruction has to refer to them. We also save these blocks on the open
bracket stack to refer to them when the matching &lt;tt class="docutils literal"&gt;]&lt;/tt&gt; is encountered. Finally,
we set our IR builder to insert all subsequent instructions into the loop block.&lt;/p&gt;
&lt;p&gt;This is how &lt;tt class="docutils literal"&gt;]&lt;/tt&gt; is handled:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sc"&gt;&amp;#39;]&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;open_bracket_stack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;empty&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;DIE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;unmatched closing &amp;#39;]&amp;#39; at pc=&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pc&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;BracketBlocks&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;blocks&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;open_bracket_stack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;top&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;open_bracket_stack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;dataptr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateLoad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataptr_addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;dataptr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;element_addr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateInBoundsGEP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dataptr&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;element_addr&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateLoad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;element_addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;element&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cmp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateICmpNE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getInt8&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;compare_zero&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateCondBr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cmp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;blocks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loop_body_block&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;blocks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;post_loop_block&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SetInsertPoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blocks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;post_loop_block&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Pretty much as we'd expect: the relevant blocks are popped from the stack and
used as targets for another conditional branch &lt;a class="footnote-reference" href="#footnote-4" id="footnote-reference-4"&gt;[4]&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="ir-sample-for-a-simple-bf-program"&gt;
&lt;h2&gt;IR sample for a simple BF program&lt;/h2&gt;
&lt;p&gt;Let's see what LLVM IR our code emits for the simple &lt;tt class="docutils literal"&gt;count1to5.bf&lt;/tt&gt; program:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;++++++++ ++++++++ ++++++++ ++++++++ ++++++++ ++++++++

&amp;gt;+++++
[&amp;lt;+.&amp;gt;-]
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The full LLVM-based JIT is available in &lt;a class="reference external" href="https://github.com/eliben/code-for-blog/blob/main/2017/bfjit/llvmjit.cpp"&gt;llvmjit.cpp&lt;/a&gt;.
When invoked in verbose mode, it will dump LLVM IR for the translated program
before and after LLVM optimization. Let's start by looking at the
pre-optimization IR. I've added comments on lines starting with &lt;tt class="docutils literal"&gt;###&lt;/tt&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;define void @__llvmjit() {
entry:

  ### The entry BB starts by declaring the memory and data pointer, and
  ### initializing the data pointer to 0.

  %memory = alloca i8, i32 30000
  call void @llvm.memset.p0i8.i64(i8* %memory, i8 0, i64 30000, i32 1, i1 false)
  %dataptr_addr = alloca i32
  store i32 0, i32* %dataptr_addr

  ### The following 5 instructions increment memory[dataptr] (with dataptr
  ### remaining at 0, as initialized), and they are repeated 48 times...

  %dataptr = load i32, i32* %dataptr_addr
  %element_addr = getelementptr inbounds i8, i8* %memory, i32 %dataptr
  %element = load i8, i8* %element_addr
  %inc_element = add i8 %element, 1
  store i8 %inc_element, i8* %element_addr

  ### ... 47 more times skipped ...

  store i8 %inc_element184, i8* %element_addr182
  %dataptr185 = load i32, i32* %dataptr_addr
  %element_addr186 = getelementptr inbounds i8, i8* %memory, i32 %dataptr185
  %element187 = load i8, i8* %element_addr186
  %inc_element188 = add i8 %element187, 1
  store i8 %inc_element188, i8* %element_addr186

  ### Now incrementing dataptr and 5 more data increments.

  %dataptr189 = load i32, i32* %dataptr_addr
  %inc_dataptr = add i32 %dataptr189, 1
  store i32 %inc_dataptr, i32* %dataptr_addr

  %dataptr190 = load i32, i32* %dataptr_addr
  %element_addr191 = getelementptr inbounds i8, i8* %memory, i32 %dataptr190
  %element192 = load i8, i8* %element_addr191
  %inc_element193 = add i8 %element192, 1
  store i8 %inc_element193, i8* %element_addr191

  ### ... 4 more increments skipped

  ### Load memory[dataptr] and compare it to 0; on true, jump to %post_loop;
  ### on false jump to %loop_body.

  %dataptr210 = load i32, i32* %dataptr_addr
  %element_addr211 = getelementptr inbounds i8, i8* %memory, i32 %dataptr210
  %element212 = load i8, i8* %element_addr211
  %compare_zero = icmp eq i8 %element212, 0
  br i1 %compare_zero, label %post_loop, label %loop_body

loop_body:

  ### Decrement dataptr

  %dataptr213 = load i32, i32* %dataptr_addr
  %dec_dataptr = sub i32 %dataptr213, 1
  store i32 %dec_dataptr, i32* %dataptr_addr

  ### Increment memory[dataptr]

  %dataptr214 = load i32, i32* %dataptr_addr
  %element_addr215 = getelementptr inbounds i8, i8* %memory, i32 %dataptr214
  %element216 = load i8, i8* %element_addr215
  %inc_element217 = add i8 %element216, 1
  store i8 %inc_element217, i8* %element_addr215

  ### Invoke putchar on memory[dataptr]

  %dataptr218 = load i32, i32* %dataptr_addr
  %element_addr219 = getelementptr inbounds i8, i8* %memory, i32 %dataptr218
  %element220 = load i8, i8* %element_addr219
  %element_i32_ = zext i8 %element220 to i32
  %0 = call i32 @putchar(i32 %element_i32_)

  ### Increment dataptr

  %dataptr221 = load i32, i32* %dataptr_addr
  %inc_dataptr222 = add i32 %dataptr221, 1
  store i32 %inc_dataptr222, i32* %dataptr_addr

  ### Decrement memory[dataptr]

  %dataptr223 = load i32, i32* %dataptr_addr
  %element_addr224 = getelementptr inbounds i8, i8* %memory, i32 %dataptr223
  %element225 = load i8, i8* %element_addr224
  %sub_element = sub i8 %element225, 1
  store i8 %sub_element, i8* %element_addr224

  ### Load memory[dataptr] and compare it to 0; on true, jump back to
  ### %loop_body; on false jump to %post_loop.

  %dataptr226 = load i32, i32* %dataptr_addr
  %element_addr227 = getelementptr inbounds i8, i8* %memory, i32 %dataptr226
  %element228 = load i8, i8* %element_addr227
  %compare_zero229 = icmp ne i8 %element228, 0
  br i1 %compare_zero229, label %loop_body, label %post_loop

post_loop
  call void @dump_memory(i8* %memory)
  ret void
}
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As discussed before, this code is doing a huge amount of repetetive and mostly
unnecessary work. It keeps storing and reloading values it should already have.
But this is precisely what the LLVM optimizer is designed to fix. Let's see the
post-optimization code LLVM produces:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;define void @__llvmjit() local_unnamed_addr {
loop_body.preheader:
  %memory290 = alloca [30000 x i8], align 1
  %memory290.sub = getelementptr inbounds [30000 x i8], [30000 x i8]* %memory290, i64 0, i64 0
  %0 = getelementptr inbounds [30000 x i8], [30000 x i8]* %memory290, i64 0, i64 2
  call void @llvm.memset.p0i8.i64(i8* nonnull %0, i8 0, i64 29998, i32 1, i1 false)
  store i8 48, i8* %memory290.sub, align 1
  %element_addr191 = getelementptr inbounds [30000 x i8], [30000 x i8]* %memory290, i64 0, i64 1
  store i8 5, i8* %element_addr191, align 1
  br label %loop_body

loop_body:
  %1 = tail call i32 @putchar(i32 49)
  %2 = tail call i32 @putchar(i32 50)
  %3 = tail call i32 @putchar(i32 51)
  %4 = tail call i32 @putchar(i32 52)
  %5 = tail call i32 @putchar(i32 53)
  store i8 53, i8* %memory290.sub, align 1
  store i8 0, i8* %element_addr191, align 1
  call void @dump_memory(i8* nonnull %memory290.sub)
  ret void
}
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The LLVM optimizer is extremely aggressive! Not only all the repetition is gone,
but there isn't even a loop any more because LLVM statically computed it will
just run 5 times and unrolled it completely. &lt;tt class="docutils literal"&gt;putchar&lt;/tt&gt; is invoked 5 times
with the values the loop would have produced, and that's all. The reason LLVM
kept &lt;tt class="docutils literal"&gt;memory&lt;/tt&gt; around was only so the call to &lt;tt class="docutils literal"&gt;dump_memory&lt;/tt&gt; would have data.
If we remove this debugging call, the function turns into:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;define void @__llvmjit() local_unnamed_addr #0 {
entry:
  %0 = tail call i32 @putchar(i32 49)
  %1 = tail call i32 @putchar(i32 50)
  %2 = tail call i32 @putchar(i32 51)
  %3 = tail call i32 @putchar(i32 52)
  %4 = tail call i32 @putchar(i32 53)
  ret void
}
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="observing-how-llvm-handles-our-loops"&gt;
&lt;h2&gt;Observing how LLVM handles our loops&lt;/h2&gt;
&lt;p&gt;So the &lt;tt class="docutils literal"&gt;count1to5&lt;/tt&gt; sample was a bit too simple for the LLVM optimizer. To see
our loops actually being emitted, we'll have to resort to more tricks - by
placing &amp;quot;input&amp;quot; instructions (&lt;tt class="docutils literal"&gt;,&lt;/tt&gt; in BF) in stratetic locations so that LLVM
can't just infer their value statically. Take for example the
(slightly-nonsensical) BF program:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&amp;gt;,

[&amp;lt;+.&amp;gt;,]
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It places user input into cell 1, and then repeatedly:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Increments cell 0, printing its value out&lt;/li&gt;
&lt;li&gt;Places new user input in cell 1&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It will terminate when the user input is 0. Not very useful, but it does the
job. Here's the &lt;em&gt;optimized&lt;/em&gt; IR LLVM emits for it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;define void @__llvmjit() local_unnamed_addr {
entry:
  %memory29 = alloca [30000 x i8], align 1
  %memory29.sub = getelementptr inbounds [30000 x i8], [30000 x i8]* %memory29, i64 0, i64 0
  call void @llvm.memset.p0i8.i64(i8* nonnull %memory29.sub, i8 0, i64 30000, i32 1, i1 false)
  %user_input = tail call i32 @getchar()
  %user_input_i8_ = trunc i32 %user_input to i8
  %element_addr = getelementptr inbounds [30000 x i8], [30000 x i8]* %memory29, i64 0, i64 1
  store i8 %user_input_i8_, i8* %element_addr, align 1
  %compare_zero = icmp eq i8 %user_input_i8_, 0
  br i1 %compare_zero, label %post_loop, label %loop_body.preheader

loop_body.preheader:
  br label %loop_body

loop_body:
  %element730 = phi i8 [ %inc_element, %loop_body ], [ 0, %loop_body.preheader ]
  %inc_element = add i8 %element730, 1
  %element_i32_ = zext i8 %inc_element to i32
  %0 = tail call i32 @putchar(i32 %element_i32_)
  %user_input13 = tail call i32 @getchar()
  %user_input_i8_14 = trunc i32 %user_input13 to i8
  %compare_zero20 = icmp eq i8 %user_input_i8_14, 0
  br i1 %compare_zero20, label %post_loop.loopexit, label %loop_body

post_loop.loopexit:
  store i8 %inc_element, i8* %memory29.sub, align 1
  store i8 0, i8* %element_addr, align 1
  br label %post_loop

post_loop:
  call void @dump_memory(i8* nonnull %memory29.sub)
  ret void
}
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The two most important points to note are:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;There's no extra stack movement; like, &lt;em&gt;at all&lt;/em&gt;. LLVM statically computed what
happens at address 0 and address 1 and just juggles virtual registers with
these values, &lt;em&gt;actually storing to memory only outside the loop&lt;/em&gt;!&lt;/li&gt;
&lt;li&gt;To be able to do that, it generated a &lt;tt class="docutils literal"&gt;phi&lt;/tt&gt; instruction at the loop body
start.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The latter is especially important: this is LLVM converting the IR to &lt;em&gt;SSA
form&lt;/em&gt;, where every value is assigned only once and special &lt;tt class="docutils literal"&gt;phi&lt;/tt&gt; nodes are
required to merge multiple possible values. SSA is well outside the scope of
this humble post, but I suggest reading about it. LLVM's &lt;tt class="docutils literal"&gt;mem2reg&lt;/tt&gt; pass
converts our naive stack-using code to SSA with virtual registers, and SSA form
makes it much easier for the compiler to analyze the IR and optimize it
aggressively.&lt;/p&gt;
&lt;p&gt;On the other hand, &lt;em&gt;emitters&lt;/em&gt; of LLVM IR don't have to worry about efficient
usage of virtual registers and can just assign stack slots for all &amp;quot;variables&amp;quot;,
leaving the optimization to LLVM. For our simple use case of BF this may not
matter too much, but think about a classical programming language where a
function may have dozens of variables to track &lt;a class="footnote-reference" href="#footnote-5" id="footnote-reference-5"&gt;[5]&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="jiting-llvm-ir-to-executable-machine-code"&gt;
&lt;h2&gt;JITing LLVM IR to executable machine code&lt;/h2&gt;
&lt;p&gt;Being able to optimize LLVM IR is just one of the strengths of LLVM &lt;tt class="docutils literal"&gt;llvmjit&lt;/tt&gt;
is using. The other is the ability to efficiently execute this IR at run-time
by JITing it into an excutable in-memory chunk of machine code.&lt;/p&gt;
&lt;p&gt;In &lt;tt class="docutils literal"&gt;llvmjit&lt;/tt&gt;, the part doing this is:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;SimpleOrcJIT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;jit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="cm"&gt;/*verbose=*/&lt;/span&gt;&lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;setDataLayout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_target_machine&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;createDataLayout&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="n"&gt;jit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_module&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;move&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;JITSymbol&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;jit_func_sym&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;jit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;find_symbol&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;JIT_FUNC_NAME&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;jit_func_sym&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;DIE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;Unable to find symbol &amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;JIT_FUNC_NAME&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot; in module&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="k"&gt;using&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;JitFuncType&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="n"&gt;JitFuncType&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;jit_func_ptr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;reinterpret_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;JitFuncType&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jit_func_sym&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getAddress&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="n"&gt;jit_func_ptr&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Which, of course, leaves the question - what is &lt;tt class="docutils literal"&gt;SimpleOrcJit&lt;/tt&gt;? It's a
simplified instantiation of LLVM's &lt;a class="reference external" href="http://llvm.org/docs/tutorial/BuildingAJIT2.html"&gt;newest JIT engine - ORC&lt;/a&gt;. For the full code see
&lt;a class="reference external" href="https://github.com/eliben/code-for-blog/blob/main/2017/bfjit/llvm_jit_utils.h"&gt;this header&lt;/a&gt;
and its accompanying source file. My implementation of the JIT is very similar
to the one in the LLVM tutorial, with the addition of dumping the JITed machine
code to a binary file prior to returning an executable pointer to it.&lt;/p&gt;
&lt;p&gt;LLVM's ORC JIT looks formidable, but there's no magic to it. In its essence,
it's just doing the thing &lt;a class="reference external" href="https://eli.thegreenplace.net/2013/11/05/how-to-jit-an-introduction"&gt;my introduction to JITing&lt;/a&gt;
describes, with with many (many!) more layers on top. One interesting thing I
&lt;em&gt;would&lt;/em&gt; like to mention, though, is how the JIT finds host functions to call,
since it's designed differently from the previous JITs I showed.&lt;/p&gt;
&lt;p&gt;For &lt;tt class="docutils literal"&gt;llvmjit&lt;/tt&gt;, all we do is &lt;em&gt;declare&lt;/em&gt; (without defining) the host functions in
the module, so that we can insert a &lt;tt class="docutils literal"&gt;call&lt;/tt&gt; instruction to these functions. For
example:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;declare i32 @putchar(i32) local_unnamed_addr #0
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Unlike the approach with &lt;tt class="docutils literal"&gt;asmjit&lt;/tt&gt;-based JITs, we don't encode the address of
the host function in the JITed code. Rather we let the JIT resolve it
automatically. This is done by adding the following &lt;em&gt;resolver&lt;/em&gt; to the JIT:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;resolver&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;orc&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;createLambdaResolver&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sym&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;find_mangled_symbol&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sym&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;JITSymbol&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;nullptr&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sym_addr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="n"&gt;RTDyldMemoryManager&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;getSymbolAddressInProcess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;JITSymbol&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sym_addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;JITSymbolFlags&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Exported&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;JITSymbol&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;nullptr&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The second lambda function passed to the resolver is invoked when the LLVM IR
being compiled contains a &lt;tt class="docutils literal"&gt;call&lt;/tt&gt; to a symbol that wasn't defined in the
module. In this case, what we do amounts to a call to &lt;tt class="docutils literal"&gt;dlsym&lt;/tt&gt; with the symbol
name. The constructor of &lt;tt class="docutils literal"&gt;SimpleOrcJIT&lt;/tt&gt; calls
&lt;tt class="docutils literal"&gt;LoadLibraryPermanently(nullptr)&lt;/tt&gt; which translates to &lt;tt class="docutils literal"&gt;dlopen(nullptr,
&lt;span class="pre"&gt;...)&lt;/span&gt;&lt;/tt&gt;, meaning that all symbols contained in the host executable are
visible to the JIT. To accomplish this, &lt;tt class="docutils literal"&gt;llvmjit&lt;/tt&gt; is compiled with the
&lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;-rdynamic&lt;/span&gt;&lt;/tt&gt; linker flag. This way the JITed code can call any function found
in the host code, including standard C library functions like &lt;tt class="docutils literal"&gt;putchar&lt;/tt&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="how-fast-does-it-run"&gt;
&lt;h2&gt;How fast does it run?&lt;/h2&gt;
&lt;p&gt;Let's see how &lt;tt class="docutils literal"&gt;llvmjit&lt;/tt&gt; compares to the JITs developed in &lt;a class="reference external" href="https://eli.thegreenplace.net/2017/adventures-in-jit-compilation-part-2-an-x64-jit/"&gt;part 2&lt;/a&gt;.
To make this measurement fair, I'm instrumenting the program to report &lt;em&gt;execution&lt;/em&gt;
time separately from &lt;em&gt;compilation&lt;/em&gt; time, since LLVM may take a while optimizing
a large IR program.&lt;/p&gt;
&lt;p&gt;With this instrumentation, &lt;tt class="docutils literal"&gt;llvmjit&lt;/tt&gt; takes 0.92 seconds to run &lt;tt class="docutils literal"&gt;mandelbrot&lt;/tt&gt;,
which is virtually the same time &lt;tt class="docutils literal"&gt;optasmjit&lt;/tt&gt; took; it takes 0.14 seconds to
run &lt;tt class="docutils literal"&gt;factor&lt;/tt&gt;, half of the time of &lt;tt class="docutils literal"&gt;optasmjit&lt;/tt&gt;. That said, for these programs
LLVM optimizations took roughly the same amount of time as execution, so if you
include everything in - &lt;tt class="docutils literal"&gt;llvmjit&lt;/tt&gt; is equal on &lt;tt class="docutils literal"&gt;factor&lt;/tt&gt; and loses by ~2x on
&lt;tt class="docutils literal"&gt;mandelbrot&lt;/tt&gt; &lt;a class="footnote-reference" href="#footnote-6" id="footnote-reference-6"&gt;[6]&lt;/a&gt;. So, as often is the case with JITs, it all depends on the
use case - if we care about run-time of long programs, it makes sense to spend
more time optimizing; it we mostly care about short programs, it doesn't. This
is why many modern JITs (think JavaScript) are multi-stage - starting with a
fast interpreter or &lt;em&gt;baseline&lt;/em&gt; (very light on optimization) JIT, and switching
to a more optimizing JIT if the program turns out to be longer-running than
initially expected.&lt;/p&gt;
&lt;img alt="BF opt3 vs. part 2 jits vs. llvmjit" class="align-center" src="https://eli.thegreenplace.net/images/2017/bf-runtime-vs-llvmjit.png" /&gt;
&lt;p&gt;Optimization-time aside, it's quite amazing that LLVM is managing to reach the
speed of &lt;tt class="docutils literal"&gt;optasmjit&lt;/tt&gt;, which emits tight machine code and uses domain-specific
optimizations to convert whole loops from the BF code to simple O(1) sequences.
This is due to the power of LLVM's optimizer; it is able, on its own, to infer
that some of these loops are really accomplishing trivial computations. I
suggest taking this as an exercise: write simple BF programs and run them
through &lt;tt class="docutils literal"&gt;llvmjit &lt;span class="pre"&gt;--verbose&lt;/span&gt;&lt;/tt&gt; to observe the post-optimization IR. You'll see
that LLVM is able to remove loops like &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;[-]&lt;/span&gt;&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;[&amp;lt;-&amp;gt;+]&lt;/span&gt;&lt;/tt&gt;, replacing them by
simple data initialization &amp;amp; movement. Bonus points for uncovering other
optimizations that LLVM did but our &lt;tt class="docutils literal"&gt;optasmjit&lt;/tt&gt; didn't - after all, LLVM does
run the &lt;tt class="docutils literal"&gt;factor&lt;/tt&gt; benchmark much faster, so there must be something interesting
there.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;After playing with manual code-generation in part 2, this post demonstrates
using an industrial-strength compiler backend to create a JIT for BF. LLVM is a
large and complex library with a steep learning curve, but once you're past
the initial ramp-up stage it's an extremely powerful tool. LLVM lets you emit
very straightforward code without worrying about things like registers and
recomputing the same values over and over again. Its state-of-the art optimizer
removes all these inefficiencies, producing extremely tight IR. Moreover, LLVM
lets us target multiple architectures very easily from the same code - just
choose which targets you want when configuring &amp;amp; compiling LLVM itself, and you
have a multi-target compiler.&lt;/p&gt;
&lt;p&gt;However, LLVM has downsides as well - it's large, and its compile time is
considerable. This sometimes makes it less desirable in situations where a small
footprint and fast compilation are required. For an interesting account of how
the WebKit developers replaced LLVM with a custom backend for their &amp;quot;last-tier&amp;quot;
optimized JIT, read &lt;a class="reference external" href="https://webkit.org/blog/5852/introducing-the-b3-jit-compiler/"&gt;Introducing the B3 JIT compiler&lt;/a&gt;.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;Links to all posts in this series:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="https://eli.thegreenplace.net/2017/adventures-in-jit-compilation-part-1-an-interpreter/"&gt;Part 1 - an interpreter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://eli.thegreenplace.net/2017/adventures-in-jit-compilation-part-2-an-x64-jit/"&gt;Part 2 - an x64 JIT&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://eli.thegreenplace.net/2017/adventures-in-jit-compilation-part-3-llvm/"&gt;Part 3 - LLVM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://eli.thegreenplace.net/2017/adventures-in-jit-compilation-part-4-in-python/"&gt;Part 4 - Python&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr class="docutils" /&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;As an exercise, move this memory outside the function - by allocating it
on the host and passing a pointer to the JITed code, similarly to how it
was done in &lt;a class="reference external" href="https://eli.thegreenplace.net/2017/adventures-in-jit-compilation-part-2-an-x64-jit/"&gt;part 2&lt;/a&gt;.
Observe how this affects LLVM's optimization, if at all. Read on LLVM's
concepts of &lt;em&gt;volatile&lt;/em&gt; and &lt;em&gt;aliasing&lt;/em&gt; and think about how they may affect
optimizations.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-2" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;A fun experiment is to remove this initialization and observe what LLVM
does with the code. I had this bug initially, and it's a brilliant
demonstration of LLVM's capability of optimizing code that has undefined
behavior (in this case using uninitialized values).&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-3" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-3"&gt;[3]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;This CFG is produced automatically by LLVM and is a good example of the
tools provided by default with this framework. To generate it, I took
the LLVM IR file dumped by &lt;tt class="docutils literal"&gt;llvmjit&lt;/tt&gt; in verbose mode, and ran LLVM's
&lt;tt class="docutils literal"&gt;opt&lt;/tt&gt; tool on it with the &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;-view-cfg-only&lt;/span&gt;&lt;/tt&gt; flag. This makes &lt;tt class="docutils literal"&gt;opt&lt;/tt&gt;
dump a Graphviz &lt;tt class="docutils literal"&gt;.dot&lt;/tt&gt; file which can then be converted to an image.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-4" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-4"&gt;[4]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;Exercise: why do we need to save &lt;tt class="docutils literal"&gt;loop_body_block&lt;/tt&gt; at all? When a &lt;tt class="docutils literal"&gt;]&lt;/tt&gt;
is encountered, shouldn't the matching &lt;tt class="docutils literal"&gt;loop_body_block&lt;/tt&gt; be the same
one as the currently populated block?&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-5" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-5"&gt;[5]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;A cool experiment to try is compile some C function to LLVM IR with
the Clang front-end and observe stack usage in unoptimized code, followed
by running &lt;tt class="docutils literal"&gt;opt &lt;span class="pre"&gt;-O3&lt;/span&gt;&lt;/tt&gt; to see what the optimizer turned it into.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-6" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-6"&gt;[6]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;For &lt;tt class="docutils literal"&gt;mandelbrot&lt;/tt&gt;, the textual representation of LLVM IR generated for
the BF program runs to ~42,000 lines, which takes LLVM 0.8 seconds to
fully optimize.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
</content><category term="misc"></category><category term="LLVM &amp; Clang"></category><category term="Compilation"></category><category term="Code generation"></category></entry><entry><title>Updates for building my LLVM &amp; Clang samples for release 3.7</title><link href="https://eli.thegreenplace.net/2015/updates-for-building-my-llvm-clang-samples-for-release-37/" rel="alternate"></link><published>2015-09-06T06:22:00-07:00</published><updated>2022-10-04T14:08:24-07:00</updated><author><name>Eli Bendersky</name></author><id>tag:eli.thegreenplace.net,2015-09-06:/2015/updates-for-building-my-llvm-clang-samples-for-release-37/</id><summary type="html">&lt;p&gt;The &lt;tt class="docutils literal"&gt;Makefile&lt;/tt&gt; in my &lt;a class="reference external" href="https://github.com/eliben/llvm-clang-samples"&gt;LLVM &amp;amp; Clang samples project&lt;/a&gt; is using &lt;tt class="docutils literal"&gt;g++&lt;/tt&gt; by default as
the C++ compiler. Even though the officially recommended compiler to build Clang
and LLVM is Clang, &lt;tt class="docutils literal"&gt;g++&lt;/tt&gt; has worked well because not everyone has Clang
installed on their system.&lt;/p&gt;
&lt;p&gt;This is going to change very soon …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The &lt;tt class="docutils literal"&gt;Makefile&lt;/tt&gt; in my &lt;a class="reference external" href="https://github.com/eliben/llvm-clang-samples"&gt;LLVM &amp;amp; Clang samples project&lt;/a&gt; is using &lt;tt class="docutils literal"&gt;g++&lt;/tt&gt; by default as
the C++ compiler. Even though the officially recommended compiler to build Clang
and LLVM is Clang, &lt;tt class="docutils literal"&gt;g++&lt;/tt&gt; has worked well because not everyone has Clang
installed on their system.&lt;/p&gt;
&lt;p&gt;This is going to change very soon, when LLVM and Clang 3.7 are released (any day
now, as the release is in RC2 stage). When you build the samples vs. the
binary of release 3.7, you'll likely run into compile errors about
&lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;-Wcovered-switch-default&lt;/span&gt;&lt;/tt&gt;, and maybe other problems. The reason for this has
to do with the following:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;When LLVM &amp;amp; Clang binary releases are created, they are built with Clang.&lt;/li&gt;
&lt;li&gt;Starting with 3.7, LLVM &amp;amp; Clang are built with the
&lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;-Wcovered-switch-default&lt;/span&gt;&lt;/tt&gt; flag by default.&lt;/li&gt;
&lt;li&gt;This also means that the &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;llvm-config&lt;/span&gt;&lt;/tt&gt; utility will emit this flag when
queried for compile flags. And my samples' &lt;tt class="docutils literal"&gt;Makefile&lt;/tt&gt; uses &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;llvm-config&lt;/span&gt;&lt;/tt&gt;.&lt;/li&gt;
&lt;li&gt;GCC &lt;a class="reference external" href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61864"&gt;doesn't yet support -Wcovered-switch-default&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;There are two possible solutions for this issue. The best one is to just use
Clang to build these samples (and any other code linking with LLVM or Clang
themselves). You don't even have to have Clang installed for this; simply
download &lt;a class="reference external" href="http://llvm.org/releases/"&gt;the latest binary release&lt;/a&gt;, and point
&lt;tt class="docutils literal"&gt;CXX&lt;/tt&gt; to &lt;tt class="docutils literal"&gt;bin/clang++&lt;/tt&gt; in the untarred directory. Everything should work,
since Clang will find GCC's include paths, libstd++ and everything else.&lt;/p&gt;
&lt;p&gt;An alternative approach, if you just &lt;em&gt;must&lt;/em&gt; stick to GCC for some reason, is to
sanitize the flags emitted by &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;llvm-config&lt;/span&gt;&lt;/tt&gt;, removing those that aren't
supported before passing them to the compiler. This means you won't be able to
use my &lt;tt class="docutils literal"&gt;Makefile&lt;/tt&gt; as is, though.&lt;/p&gt;
&lt;p&gt;In the long run, I suggest to just use Clang to build LLVM &amp;amp; Clang. This is what
the LLVM core developers do to test things and build releases, so this is the
way least susceptible to unexpected problems.&lt;/p&gt;
</content><category term="misc"></category><category term="LLVM &amp; Clang"></category></entry><entry><title>Calling back into Python from llvmlite-JITed code</title><link href="https://eli.thegreenplace.net/2015/calling-back-into-python-from-llvmlite-jited-code/" rel="alternate"></link><published>2015-04-18T05:50:00-07:00</published><updated>2025-01-25T13:56:56-08:00</updated><author><name>Eli Bendersky</name></author><id>tag:eli.thegreenplace.net,2015-04-18:/2015/calling-back-into-python-from-llvmlite-jited-code/</id><summary type="html">&lt;p&gt;&lt;strong&gt;Update (2025-01-25):&lt;/strong&gt; these days &lt;tt class="docutils literal"&gt;llvmlite&lt;/tt&gt; has binary wheels on PyPI.
For self-contained examples (including this post's), check out
&lt;a class="reference external" href="https://github.com/eliben/code-for-blog/tree/main/2015/llvmlite-samples"&gt;this GitHub repository&lt;/a&gt;.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;This post is about a somewhat more interesting and complex use of llvmlite than
the basic example presented in &lt;a class="reference external" href="https://eli.thegreenplace.net/2015/building-and-using-llvmlite-a-basic-example/"&gt;my previous article on the subject&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I see compilation as …&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;strong&gt;Update (2025-01-25):&lt;/strong&gt; these days &lt;tt class="docutils literal"&gt;llvmlite&lt;/tt&gt; has binary wheels on PyPI.
For self-contained examples (including this post's), check out
&lt;a class="reference external" href="https://github.com/eliben/code-for-blog/tree/main/2015/llvmlite-samples"&gt;this GitHub repository&lt;/a&gt;.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;This post is about a somewhat more interesting and complex use of llvmlite than
the basic example presented in &lt;a class="reference external" href="https://eli.thegreenplace.net/2015/building-and-using-llvmlite-a-basic-example/"&gt;my previous article on the subject&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I see compilation as a meta-tool. It lets us build new levels of
abstraction and expressiveness within our code. We can use it to build
additional languages on top of our host language (common for C, C++ and
Java-based systems, less common for Python), to accelerate some parts of our
host language (more common in Python), or anything in between.&lt;/p&gt;
&lt;p&gt;To fully harness the power of runtime compilation (JITing), however, it's very
useful to know how to bridge the gap between the host language and the JITed
language; preferably in both directions. As the &lt;a class="reference external" href="https://eli.thegreenplace.net/2015/building-and-using-llvmlite-a-basic-example/"&gt;previous article&lt;/a&gt;
shows, calling from the host into the JITed language is trivial. In fact, this
is what JITing is mostly about. But what about the other direction? This is
somewhat more challenging, but leads to interesting uses and additional
capabilities.&lt;/p&gt;
&lt;p&gt;While the post uses llvmlite for the JITing, I believe it presents general
concepts that are relevant for any programming environment.&lt;/p&gt;
&lt;div class="section" id="callback-from-jited-code-to-python"&gt;
&lt;h2&gt;Callback from JITed code to Python&lt;/h2&gt;
&lt;p&gt;Let's start with a simple example: we want to be able to invoke some Python
function from within JITed code.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;ctypes&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;c_int64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c_void_p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CFUNCTYPE&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;sys&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;llvmlite.ir&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;ir&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;llvmlite.binding&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;llvm&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_caller&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# define i64 @caller(i64 (i64, i64)* nocapture %f, i64 %i) #0 {&lt;/span&gt;
    &lt;span class="c1"&gt;# entry:&lt;/span&gt;
    &lt;span class="c1"&gt;#   %mul = shl nsw i64 %i, 1&lt;/span&gt;
    &lt;span class="c1"&gt;#   %call = tail call i64 %f(i64 %i, i64 %mul) #1&lt;/span&gt;
    &lt;span class="c1"&gt;#   ret i64 %call&lt;/span&gt;
    &lt;span class="c1"&gt;# }&lt;/span&gt;
    &lt;span class="n"&gt;i64_ty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IntType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# The callback function &amp;#39;caller&amp;#39; accepts is a pointer to FunctionType with&lt;/span&gt;
    &lt;span class="c1"&gt;# the appropriate signature.&lt;/span&gt;
    &lt;span class="n"&gt;cb_func_ptr_ty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FunctionType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i64_ty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i64_ty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i64_ty&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;as_pointer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;caller_func_ty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FunctionType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i64_ty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;cb_func_ptr_ty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i64_ty&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;caller_func&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;caller_func_ty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;caller&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;caller_func&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;f&amp;#39;&lt;/span&gt;
    &lt;span class="n"&gt;caller_func&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;i&amp;#39;&lt;/span&gt;
    &lt;span class="n"&gt;irbuilder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IRBuilder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;caller_func&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append_basic_block&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;entry&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;mul&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;irbuilder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mul&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;caller_func&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                        &lt;span class="n"&gt;irbuilder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;constant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i64_ty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;mul&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;call&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;irbuilder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;caller_func&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;caller_func&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;mul&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;irbuilder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;tt class="docutils literal"&gt;create_caller&lt;/tt&gt; creates a new LLVM IR function called &lt;tt class="docutils literal"&gt;caller&lt;/tt&gt; and injects
it into the given module &lt;tt class="docutils literal"&gt;m&lt;/tt&gt;.&lt;/p&gt;
&lt;p&gt;If you're not an expert at reading LLVM IR, &lt;tt class="docutils literal"&gt;caller&lt;/tt&gt; is equivalent to this C
function:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kt"&gt;int64_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;caller&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int64_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="kt"&gt;int64_t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;int64_t&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;int64_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It takes &lt;tt class="docutils literal"&gt;f&lt;/tt&gt; - a pointer to a function accepting two integers and returning an
integer (all integers in this post are 64-bit), and &lt;tt class="docutils literal"&gt;i&lt;/tt&gt; - an integer. It calls
&lt;tt class="docutils literal"&gt;f&lt;/tt&gt; with &lt;tt class="docutils literal"&gt;i*2&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;i&lt;/tt&gt; as the arguments. That's it - pretty simple, but
sufficient for our demonstration's purposes.&lt;/p&gt;
&lt;p&gt;Now let's define a Python function:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;I was called with &lt;/span&gt;&lt;span class="si"&gt;{0}&lt;/span&gt;&lt;span class="s1"&gt; and &lt;/span&gt;&lt;span class="si"&gt;{1}&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Finally, let's see how we can pass &lt;tt class="docutils literal"&gt;myfunc&lt;/tt&gt; as the callback &lt;tt class="docutils literal"&gt;caller&lt;/tt&gt; will
invoke. This is fairly straightforward, thanks to the support for callback
functions in &lt;tt class="docutils literal"&gt;ctypes&lt;/tt&gt;. In fact, it's exactly similar to the way you'd pass
Python callbacks to C code via &lt;tt class="docutils literal"&gt;ctypes&lt;/tt&gt; without any JITing involved:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;module&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;create_caller&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;initialize_native_target&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;initialize_native_asmprinter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;llvm_module&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parse_assembly&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;tm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Target&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_default_triple&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_target_machine&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Compile the module to machine code using MCJIT.&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_mcjit_compiler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llvm_module&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ee&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;ee&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;finalize_object&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Obtain a pointer to the compiled &amp;#39;caller&amp;#39; - it&amp;#39;s the address of its&lt;/span&gt;
        &lt;span class="c1"&gt;# JITed code in memory.&lt;/span&gt;
        &lt;span class="n"&gt;CBFUNCTY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CFUNCTYPE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c_int64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c_int64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c_int64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;cfptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ee&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_pointer_to_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llvm_module&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;caller&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;callerfunc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CFUNCTYPE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c_int64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CBFUNCTY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c_int64&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;cfptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Wrap myfunc in CBFUNCTY and pass it as a callback to caller.&lt;/span&gt;
        &lt;span class="n"&gt;cb_myfunc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CBFUNCTY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Calling &amp;quot;caller&amp;quot;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;callerfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cb_myfunc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;  The result is&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If we run this code, we get the expected result:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;Calling &amp;quot;caller&amp;quot;
I was called with 42 and 84
  The result is 126
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="registering-host-functions-in-jited-code"&gt;
&lt;h2&gt;Registering host functions in JITed code&lt;/h2&gt;
&lt;p&gt;When developing a JIT, one need that comes up very often is to delegate some of
the functionality in the JITed code to the host language. For example, if you're
developing a JIT to implement a fast DSL, you may not want to reimplement a
whole I/O stack in your language. So you'd prefer to delegate all I/O to the
host language. Taking C as a sample host language, you just want to call
&lt;tt class="docutils literal"&gt;printf&lt;/tt&gt; from your DSL and somehow have it routed to the host call.&lt;/p&gt;
&lt;p&gt;How do we accomplish this feat?  The solution here, naturally, depends on both
the host language and the DSL you're JITing. Let's take the LLVM tutorial as an
example. The Kaleidoscope language does computations on floating point numbers,
but it has no I/O facilities of its own. Therefore, the Kaleidoscope compiler
exposes a &lt;tt class="docutils literal"&gt;putchard&lt;/tt&gt; function from the host (C++) to be callable in
Kaleidoscope. For &lt;a class="reference external" href="http://llvm.org/docs/tutorial/LangImpl04.html"&gt;Kaleidoscope this is fairly simple&lt;/a&gt;, because the host is C++ and is
compiled into machine code in the same process with the JITed code. All the
JITed code needs to know is the symbol name of the host function to call and the
call can happen (as long as the calling conventions match, of course).&lt;/p&gt;
&lt;p&gt;Alas, for Python as a host language, things are not so straightforward. This is
why, in my &lt;a class="reference external" href="https://github.com/eliben/pykaleidoscope/blob/master/chapter5.py"&gt;reimplementation of Kaleidoscope with llvmlite&lt;/a&gt;, I resorted
to implementing the builtins in LLVM IR, emitting them into the module
along with compiled Kaleidoscope code. These builtins just call the underlying C
functions (which still reside in the same process, since Python itself is
written in C) and don't call into Python.&lt;/p&gt;
&lt;p&gt;But say we wanted to actually call Python. How would we go about that?&lt;/p&gt;
&lt;p&gt;Well, we've seen a way to call Python from JITed code in this post. Can this
approach be used? Yes, though it's quite cumbersome. The problem is that the
only place where we have an actual interface between Python and the JITed code
is when we invoke a JITed function. Somehow we should use this interface to
convey to the JIT side what Python functions are available to it and how to call
them. Essentially, we'll have to implement something akin to the following
schematic symbol table interface in the JITed code:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;typedef&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;int64_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;CallbackType&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="kt"&gt;int64_t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;int64_t&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;unordered_map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CallbackType&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;symtab&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;register_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CallbackType&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;symtab&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="n"&gt;CallbackType&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;get_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;iter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;symtab&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;symtab&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;iter&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;second&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;nullptr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To register Python callbacks with the JIT, we'll call &lt;tt class="docutils literal"&gt;register_callback&lt;/tt&gt; from
Python, passing it a name and the callback (&lt;tt class="docutils literal"&gt;CFUNCTYPE&lt;/tt&gt; as shown in the code
sample at the top). The JIT side will remember this mapping in a symbol table.
When it needs to invoke a Python function it will use &lt;tt class="docutils literal"&gt;get_callback&lt;/tt&gt; to
get the pointer by name.&lt;/p&gt;
&lt;p&gt;In addition to being cumbersome to implement &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;, this is also inefficient. It
seems wasteful to go through a symbol table lookup for every call to a Python
builtin. It's not like these mappings ever change in a typical use case! We are
emitting code at runtime here and have so much flexibility at our command - so
this lookup feels like a crutch.&lt;/p&gt;
&lt;p&gt;Moreover, this is a simplified example - every callback takes two integer
arguments. In real scenarios, the signatures of callback functions can be
arbitrary, so we'd have to implement a full blown FFI-dispatching on the calls.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="breaching-the-compile-run-time-barrier"&gt;
&lt;h2&gt;Breaching the compile/run-time barrier&lt;/h2&gt;
&lt;p&gt;We can do better. For every Python function we intend to call from the JITed
code, we can emit a JITed wrapper. This wrapper will hard-code a call to the
Python function, thus removing this dispatching (the symbol table shown above)
from run-time; this totally makes sense because we know at compile time which
Python functions are needed and where to find them.&lt;/p&gt;
&lt;p&gt;Let's write the code to do this with llvmlite:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;ctypes&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;ctypes&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;c_int64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c_void_p&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;sys&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;llvmlite.ir&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;ir&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;llvmlite.binding&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;llvm&lt;/span&gt;

&lt;span class="n"&gt;cb_func_ty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FunctionType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IntType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                             &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IntType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IntType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
&lt;span class="n"&gt;cb_func_ptr_ty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cb_func_ty&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;as_pointer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;i64_ty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IntType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_addrcaller&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# define i64 @addrcaller(i64 %a, i64 %b) #0 {&lt;/span&gt;
    &lt;span class="c1"&gt;# entry:&lt;/span&gt;
    &lt;span class="c1"&gt;#   %f = inttoptr i64% ADDR to i64 (i64, i64)*&lt;/span&gt;
    &lt;span class="c1"&gt;#   %call = tail call i64 %f(i64 %a, i64 %b)&lt;/span&gt;
    &lt;span class="c1"&gt;#   ret i64 %call&lt;/span&gt;
    &lt;span class="c1"&gt;# }&lt;/span&gt;
    &lt;span class="n"&gt;addrcaller_func_ty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FunctionType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i64_ty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i64_ty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i64_ty&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;addrcaller_func&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addrcaller_func_ty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;addrcaller&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;addrcaller_func&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;addrcaller_func&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;
    &lt;span class="n"&gt;irbuilder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IRBuilder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addrcaller_func&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append_basic_block&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;entry&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;irbuilder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inttoptr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;irbuilder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;constant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i64_ty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                           &lt;span class="n"&gt;cb_func_ptr_ty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;f&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;call&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;irbuilder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;irbuilder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The IR function created by &lt;tt class="docutils literal"&gt;create_addrcaller&lt;/tt&gt; is somewhat similar to the one
we've seen above with &lt;tt class="docutils literal"&gt;create_caller&lt;/tt&gt;, but there's a subtle difference.
&lt;tt class="docutils literal"&gt;addcaller&lt;/tt&gt; does not take a function pointer at runtime. It has knowledge of
this function pointer encoded into it when it's generated. The &lt;tt class="docutils literal"&gt;addr&lt;/tt&gt; argument
passed into &lt;tt class="docutils literal"&gt;create_addrcaller&lt;/tt&gt; is the runtime address of the function to
call. &lt;tt class="docutils literal"&gt;addrcaller&lt;/tt&gt; converts it to a function pointer (using the &lt;tt class="docutils literal"&gt;inttoptr&lt;/tt&gt;
instruction, which is somewhat similar to a &lt;tt class="docutils literal"&gt;reinterpret_cast&lt;/tt&gt; in C++) and
calls it &lt;a class="footnote-reference" href="#footnote-2" id="footnote-reference-2"&gt;[2]&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here's how to use it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;CBFUNCTY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctypes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CFUNCTYPE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c_int64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c_int64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c_int64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;I was called with &lt;/span&gt;&lt;span class="si"&gt;{0}&lt;/span&gt;&lt;span class="s1"&gt; and &lt;/span&gt;&lt;span class="si"&gt;{1}&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
    &lt;span class="n"&gt;cb_myfunc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CBFUNCTY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cb_addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctypes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cb_myfunc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c_void_p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Callback address is 0x&lt;/span&gt;&lt;span class="si"&gt;{0:x}&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cb_addr&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;module&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;create_addrcaller&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cb_addr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;initialize_native_target&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;initialize_native_asmprinter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;llvm_module&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parse_assembly&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;tm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Target&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_default_triple&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_target_machine&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Compile the module to machine code using MCJIT&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_mcjit_compiler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llvm_module&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ee&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;ee&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;finalize_object&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="c1"&gt;# Now call addrcaller&lt;/span&gt;
        &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Calling &amp;quot;addrcaller&amp;quot;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;addrcaller&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctypes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CFUNCTYPE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c_int64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c_int64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c_int64&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;
            &lt;span class="n"&gt;ee&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_pointer_to_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llvm_module&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;addrcaller&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
        &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;addrcaller&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;105&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;  The result is&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The key trick here is the call to &lt;tt class="docutils literal"&gt;ctypes.cast&lt;/tt&gt;. It takes a Python function
wrapped in a &lt;tt class="docutils literal"&gt;ctypes.CFUNCTYPE&lt;/tt&gt; and casts it to a &lt;tt class="docutils literal"&gt;void*&lt;/tt&gt;; in other words,
it obtains its address &lt;a class="footnote-reference" href="#footnote-3" id="footnote-reference-3"&gt;[3]&lt;/a&gt;. This is the address we pass into
&lt;tt class="docutils literal"&gt;create_addrcaller&lt;/tt&gt;. The code ends up having exactly the same effect as the
previous sample, but with an important difference: whereas previously the
dispatch to &lt;tt class="docutils literal"&gt;myfunc&lt;/tt&gt; happened at run-time, here it happens at compile-time.&lt;/p&gt;
&lt;p&gt;This is a synthetic example, but it should be clear how to extend it to the
full thing mentioned earlier: for each built-in needed by the JITed code from
the host code, we emit a JITed wrapper to call it. No symbol table dispatching
at runtime. Even better, since these builtins can have arbitrary signatures,
the JITed wrapper can handle all of that efficiently. PyPy uses this technique
to make calls into C (via the &lt;tt class="docutils literal"&gt;cffi&lt;/tt&gt; library) much more efficient than they
are with &lt;tt class="docutils literal"&gt;ctypes&lt;/tt&gt;. &lt;tt class="docutils literal"&gt;ctypes&lt;/tt&gt; uses libffi, which has to pack all the arguments
to a function at runtime, according to a type signature it was provided.
However, since this type signature almost never changes during the runtime of
one program, this packing can be done much more efficiently with JITing.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Hopefully it's clear that while this article focuses on a very specific
technology (using llvmlite to JIT native code from Python), its principles are
universal. The overarching idea here is that the difference between what happens
when the program is compiled and what happens when it runs is artificial. We can
breach and overlay it in many ways, and use it to build increasingly complex
abstractions. Some languages, like the Lisp family, list this mixture of
compile-time and run-time as one of their unique strengths, and have been
preaching it for decades. I fondly recall my own first real-world use of this
technique &lt;a class="reference external" href="https://eli.thegreenplace.net/2005/09/04/cool-hack-creating-custom-subroutines-on-the-fly-in-perl"&gt;many years ago&lt;/a&gt;
- reading a configuration file and generating code at runtime that unpacks data
based on that configuration. That task, emitting Perl code from Perl according
to a XML config may appear worlds away from the topic of this post - emitting
LLVM IR from Python according to a function signature, but if you really think
about it, it's exactly the same thing.&lt;/p&gt;
&lt;p&gt;I suspect this is one of the most obtuse articles I've written lately; if you
read this far, I sure hope you found it interesting and helpful. Let me know in
the comments if anything isn't clear or if you have relevant ideas - I love
discussing this topic!&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;We'll have to compile the equivalent of a hash table implementation into
our JITed code. While not impossible, this may be an overkill if you
really just want a quick-and-simple DSL.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-2" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;This delightful mixture of compile-time and run-time is by far the most
important part of this article; if you remember just one thing from here,
this should be it. Let me know in the comments if it's not clear.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-3" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-3"&gt;[3]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;The concept of &amp;quot;address&amp;quot; for a Python function may raise an eyebrow. Keep
in mind that this isn't a pure Python function we're talking about here.
It's wrapped in a &lt;tt class="docutils literal"&gt;ctypes.CFUNCTYPE&lt;/tt&gt;, which is a dispatcher created by
&lt;tt class="docutils literal"&gt;ctypes&lt;/tt&gt; (&amp;quot;thunk&amp;quot; in the nomenclature of libffi, the underlying
mechanism behind &lt;tt class="docutils literal"&gt;ctypes&lt;/tt&gt;) to perform argument conversion and make the
actual call.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
</content><category term="misc"></category><category term="LLVM &amp; Clang"></category><category term="Python"></category><category term="Compilation"></category></entry><entry><title>Python version of the LLVM tutorial</title><link href="https://eli.thegreenplace.net/2015/python-version-of-the-llvm-tutorial/" rel="alternate"></link><published>2015-02-08T15:28:00-08:00</published><updated>2023-02-04T13:41:52-08:00</updated><author><name>Eli Bendersky</name></author><id>tag:eli.thegreenplace.net,2015-02-08:/2015/python-version-of-the-llvm-tutorial/</id><summary type="html">&lt;p&gt;The &lt;a class="reference external" href="http://llvm.org/docs/tutorial/"&gt;LLVM tutorial&lt;/a&gt; is a venerable and
important part of the project's documentation. It's been there for as long as
I've been using LLVM (and according to the logs, a few years before that),
almost always the first resource newcomers to the project are pointed to. It
strikes just the …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The &lt;a class="reference external" href="http://llvm.org/docs/tutorial/"&gt;LLVM tutorial&lt;/a&gt; is a venerable and
important part of the project's documentation. It's been there for as long as
I've been using LLVM (and according to the logs, a few years before that),
almost always the first resource newcomers to the project are pointed to. It
strikes just the right balance between simplicity and interesting content to
provide an enticing introduction to LLVM. For a motivated reader, it shouldn't
take more than a work day or two to go through it from start to finish, building
a full compiler for a simple but &amp;quot;real&amp;quot; programming language in the process; how
cool is that?&lt;/p&gt;
&lt;p&gt;Anyway, it occurred to me that since the &amp;quot;official&amp;quot; version of the tutorial is
in C++ and the only alternative checked into the tree is in OCaml, it may be
interesting to re-implement the tutorial in Python. While I wouldn't write an
industrial strength compiler in Python, it's a great prototyping platform, and
when thinking about compilers and languages in general, prototyping is very
imporatnt. You want to try all kinds of possibilities and combinations of
features quickly, to get a feel for writing code in the language before it's
fully done - and Python (with LLVM) is great for that.&lt;/p&gt;
&lt;p&gt;Enter &lt;a class="reference external" href="https://github.com/eliben/pykaleidoscope/"&gt;Pykaleidoscope&lt;/a&gt;, a project I
put on GitHub that follows the steps of the official LLVM tutorial, but
implementing the Kaleidoscope compiler in Python, using &lt;a class="reference external" href="https://github.com/numba/llvmlite"&gt;llvmlite&lt;/a&gt; as the binding to LLVM.&lt;/p&gt;
&lt;p&gt;Installing llvmlite is fairly easy - see &lt;a class="reference external" href="https://eli.thegreenplace.net/2015/building-and-using-llvmlite-a-basic-example/"&gt;this post&lt;/a&gt;
if you have any issues.&lt;/p&gt;
&lt;p&gt;While working on Pykaleidoscope, I was impressed with llvmlite's maturity and
compatibility with the C++ LLVM IR APIs. I didn't run into any significant
problems, except maybe lack of documentation. But documentation isn't a strong
side of LLVM either, which is one of the problems the tutorial helps with. So I
hope this Python version will help folks understand how to use llvmlite to build
non-trivial LLVM IR in Python.&lt;/p&gt;
</content><category term="misc"></category><category term="LLVM &amp; Clang"></category><category term="Python"></category><category term="Compilation"></category></entry><entry><title>Building and using llvmlite - a basic example</title><link href="https://eli.thegreenplace.net/2015/building-and-using-llvmlite-a-basic-example/" rel="alternate"></link><published>2015-01-26T05:03:00-08:00</published><updated>2025-01-25T13:56:56-08:00</updated><author><name>Eli Bendersky</name></author><id>tag:eli.thegreenplace.net,2015-01-26:/2015/building-and-using-llvmlite-a-basic-example/</id><summary type="html">&lt;p&gt;&lt;strong&gt;Update (2025-01-25):&lt;/strong&gt; these days &lt;tt class="docutils literal"&gt;llvmlite&lt;/tt&gt; has binary wheels on PyPI.
For self-contained examples (including this post's), check out
&lt;a class="reference external" href="https://github.com/eliben/code-for-blog/tree/main/2015/llvmlite-samples"&gt;this GitHub repository&lt;/a&gt;.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;A while ago I wrote a &lt;a class="reference external" href="https://eli.thegreenplace.net/2012/08/10/building-and-using-llvmpy-a-basic-example"&gt;short post about employing llvmpy&lt;/a&gt;
to invoke LLVM from within Python. Today I want to demonstrate an alternative
technique, using a new library …&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;strong&gt;Update (2025-01-25):&lt;/strong&gt; these days &lt;tt class="docutils literal"&gt;llvmlite&lt;/tt&gt; has binary wheels on PyPI.
For self-contained examples (including this post's), check out
&lt;a class="reference external" href="https://github.com/eliben/code-for-blog/tree/main/2015/llvmlite-samples"&gt;this GitHub repository&lt;/a&gt;.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;A while ago I wrote a &lt;a class="reference external" href="https://eli.thegreenplace.net/2012/08/10/building-and-using-llvmpy-a-basic-example"&gt;short post about employing llvmpy&lt;/a&gt;
to invoke LLVM from within Python. Today I want to demonstrate an alternative
technique, using a new library called &lt;a class="reference external" href="https://github.com/numba/llvmlite"&gt;llvmlite&lt;/a&gt;. llvmlite was created last year by the
developers of Numba (a JIT compiler for scientific Python), and just recently
replaced llvmpy as their bridge to LLVM. Since the Numba devs were also the most
active maintainers of llvmpy in the past couple of years, I think llvmlite is
definitily worth paying attention to.&lt;/p&gt;
&lt;p&gt;One of the reasons the Numba devs decided to ditch llvmpy in favor of a new
approach is the biggest issue heavy users of LLVM as a library have - its
incredible rate of API-breaking change. The LLVM C++ API is notoriously unstable
and will remain this way for the foreseeable future. This leaves library users
and all kinds of language bindings (like llvmpy) in a constant chase after the
latest LLVM release, if they want to benefit from improved optimizations, new
targets and so on. The Numba developers felt this while maintaining llvmpy and
decided on an alternative approach that will be easier to keep stable going
forward. In addition, llvmpy's architecture made it slow for Numba's users -
llvmlite fixes this as well.&lt;/p&gt;
&lt;p&gt;The main idea is - use the LLVM C API as much as possible. Unlike the core C++
API, the C API is meant for facing external users, and is kept relatively
stable. This is what llvmlite does, but with one twist. Since building the IR
using repeated FFI calls to LLVM proved to be slow and error-prone in llvmpy,
llvmlite re-implemented the IR builder in pure Python. Once the IR is built, its
textual representation is passed into the LLVM IR parser. This also reduces the
&amp;quot;API surface&amp;quot; llvmlite uses, since the textual representation of LLVM IR is one
of the more stable things in LLVM.&lt;/p&gt;
&lt;p&gt;I found llvmlite pretty easy to build and use on Linux (though it's portable to
OS X and Windows as well). Since there's not much documentation yet, I thought
this post may be useful for others who wish to get started.&lt;/p&gt;
&lt;p&gt;After cloning the &lt;a class="reference external" href="https://github.com/numba/llvmlite"&gt;llvmlite repo&lt;/a&gt;, I
downloaded the &lt;a class="reference external" href="http://llvm.org/releases/"&gt;binary release of LLVM 3.5&lt;/a&gt; -
pre-built binaries for Ubuntu 14.04 mean there's no need to compile LLVM
itself. Note that I didn't &lt;em&gt;install&lt;/em&gt; LLVM, just downloaded and untarred it.&lt;/p&gt;
&lt;p&gt;Next, I had to install the &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;libedit-dev&lt;/span&gt;&lt;/tt&gt; package with &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;apt-get&lt;/span&gt;&lt;/tt&gt;, since it's
required while building llvmlite. Depending on what you have lying around on
your machine, you may need to install some additional &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;-dev&lt;/span&gt;&lt;/tt&gt; packages.&lt;/p&gt;
&lt;p&gt;Now, time to build llvmlite. I chose to use Python 3.4, but any modern version
should work (for versions below 3.4 llvmlite currently requires the &lt;tt class="docutils literal"&gt;enum34&lt;/tt&gt;
package). LLVM has a great tool named &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;llvm-config&lt;/span&gt;&lt;/tt&gt; in its binary image, and
the &lt;tt class="docutils literal"&gt;Makefile&lt;/tt&gt; in llvmlite uses it, which means building llvmlite with any
version of LLVM I want is just a simple matter of running:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ LLVM_CONFIG=&amp;lt;path/to/llvm-config&amp;gt; python3.4 setup.py build
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This compiles the C/C++ parts of llvmlite and links them statically to LLVM.
Now, you're ready to use &lt;tt class="docutils literal"&gt;llvmlite&lt;/tt&gt;. Again, I prefer not to install things
unless I really have to, so the following script can be run with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ PYTHONPATH=$PYTHONPATH:&amp;lt;path/to/llvmlite&amp;gt; python3.4 basic_sum.py
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Replace the path with your own, or just install llvmlite into some
&lt;tt class="docutils literal"&gt;virtualenv&lt;/tt&gt;.&lt;/p&gt;
&lt;p&gt;And the sample code does the same as the &lt;a class="reference external" href="https://eli.thegreenplace.net/2012/08/10/building-and-using-llvmpy-a-basic-example"&gt;previous post&lt;/a&gt; -
creates a function that adds two numbers, and JITs it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;ctypes&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CFUNCTYPE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c_int&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;sys&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;llvmlite.ir&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;ll&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;llvmlite.binding&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;llvm&lt;/span&gt;

&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;initialize_native_target&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;initialize_native_asmprinter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Create a new module with a function implementing this:&lt;/span&gt;
&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="c1"&gt;# int sum(int a, int b) {&lt;/span&gt;
&lt;span class="c1"&gt;#   return a + b;&lt;/span&gt;
&lt;span class="c1"&gt;# }&lt;/span&gt;
&lt;span class="n"&gt;module&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ll&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;func_ty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ll&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FunctionType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ll&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IntType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ll&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IntType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;ll&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IntType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
&lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ll&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;func_ty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;sum&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;
&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;

&lt;span class="n"&gt;bb_entry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append_basic_block&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;entry&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;irbuilder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ll&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IRBuilder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bb_entry&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tmp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;irbuilder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;ret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;irbuilder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;=== LLVM IR&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Convert textual LLVM IR into in-memory representation.&lt;/span&gt;
&lt;span class="n"&gt;llvm_module&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parse_assembly&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;tm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Target&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_default_triple&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_target_machine&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Compile the module to machine code using MCJIT&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;llvm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_mcjit_compiler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llvm_module&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ee&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;ee&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;finalize_object&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;=== Assembly&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;emit_assembly&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llvm_module&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Obtain a pointer to the compiled &amp;#39;sum&amp;#39; - it&amp;#39;s the address of its JITed&lt;/span&gt;
    &lt;span class="c1"&gt;# code in memory.&lt;/span&gt;
    &lt;span class="n"&gt;cfptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ee&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_pointer_to_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llvm_module&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;sum&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# To convert an address to an actual callable thing we have to use&lt;/span&gt;
    &lt;span class="c1"&gt;# CFUNCTYPE, and specify the arguments &amp;amp; return type.&lt;/span&gt;
    &lt;span class="n"&gt;cfunc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CFUNCTYPE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c_int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c_int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c_int&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;cfptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Now &amp;#39;cfunc&amp;#39; is an actual callable we can invoke&lt;/span&gt;
    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;17&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;The result is&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This should print the LLVM IR for the function we built, its assembly as
produced by LLVM's JIT compiler, and the result 59.&lt;/p&gt;
&lt;p&gt;Compared to llvmpy, llvmlite now seems like the future, mostly due to the
maintenance situation. llvmpy is only known to work with LLVM up to 3.3, which
is already a year and half old by now. Having just been kicked out of Numba,
there's a good chance it will fall further behind. llvmlite, on the other hand,
is very actively developed and keeps track with the latest stable LLVM release.
Also, it's architectured in a way that should make it significantly easier to
keep up with LLVM in the future. Unfortunately, as far as uses outside of Numba
go, llvmlite is still rough around the edges, especially w.r.t. documentation
and examples. But the llvmlite developers appear keen on making it useful in a
more general setting and not just for Numba, so that's a good sign.&lt;/p&gt;
</content><category term="misc"></category><category term="LLVM &amp; Clang"></category><category term="Python"></category><category term="Compilation"></category></entry></feed>