Samples for using LLVM and Clang as a library

My llvm-clang-samples repository has been public for over a year, and has become quite popular recently. I figured it's about time I write a quick blog post explaining how it came to be and what the principles behind it are.

One of the biggest selling points of LLVM and Clang is that they're packaged as libraries with a rich C++ API (and also C APIs), and thus can be easily embedded in larger applications. However, if you look online for samples of making this embedding happen, you'll start noticing two fairly big problems with most of the code you find:

All official LLVM tutorials (and many of the samples online) talk about building your project inside the LLVM tree, using LLVM's own build system. Actually, LLVM has two official build systems (one based on autotools and another on CMake), so the samples will be further fragmented between these. While building within the LLVM tree is fine for experimenting, it won't work if you want to integrate LLVM as a library into a parent project.
LLVM's and Clang's C++ API is changing constantly; C++ API stability is not a design goal of the LLVM community (one could argue that instability is a design goal). Therefore, if you find some code a few months after it was posted online, there's a very good chance that it won't compile or run. Code from a couple of years ago? Forget about it.

A few years ago, when I was getting started with LLVM, I was also frustrated by these problems. So I rolled up my sleeves and banged out a simple Makefile that made it possible to build a few samples out of the LLVM tree, and then industriously kept it up to date with LLVM and Clang changes. I had it in my private code coffers for a while, but last year figured it could be useful to others, so I published it in a public GitHub repository.

The idea of llvm-clang-samples is very simple - it's just a bunch of self-contained programs using LLVM or Clang as libraries, centered around the Makefile, which dictates how to build these programs against a built version of LLVM & Clang itself. I chose a Makefile since it's the lowest common denominator of build systems - my Makefile is purposefully very simple and linear - think of it as a shell script with some automatic dependency management thrown in.

With simple configuration, this Makefile can build programs against either a built source checkout of LLVM or released binaries (so compiling LLVM itself is not really required). There's also a suite of tests I run to make sure that the samples not only build correctly, but also run correctly and keep producing the expected results.
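
To make the out-of-tree idea concrete, here is a minimal sketch of what such a Makefile can look like. It is not the repository's actual Makefile. The sample file name ir_sample.cpp and the chosen component list are hypothetical, and it assumes an llvm-config binary is available, either from a built checkout or from an unpacked binary release. The llvm-config tool ships with LLVM and reports the compiler and linker flags needed to build against it.

```make
# Illustrative sketch only; names below are assumptions, not the repository's.
# Point LLVM_CONFIG at the llvm-config of a built source checkout or of an
# unpacked binary release, e.g. make LLVM_CONFIG=/path/to/bin/llvm-config
LLVM_CONFIG ?= llvm-config

# Ask llvm-config for the flags instead of hard-coding include/lib paths.
CXXFLAGS := $(shell $(LLVM_CONFIG) --cxxflags)
LDFLAGS  := $(shell $(LLVM_CONFIG) --ldflags)
# Component names (core, irreader, support) are an assumed selection for a
# program that reads and inspects LLVM IR.
LIBS     := $(shell $(LLVM_CONFIG) --libs core irreader support) \
            $(shell $(LLVM_CONFIG) --system-libs)

# ir_sample.cpp is a hypothetical standalone program using LLVM as a library.
ir_sample: ir_sample.cpp
	$(CXX) $(CXXFLAGS) $< $(LDFLAGS) $(LIBS) -o $@

.PHONY: clean
clean:
	rm -f ir_sample
```

Switching between a source build and a release is then a matter of changing one variable on the command line, which is roughly what "simple configuration" means here. Libraries are placed after the source file on the link line because static linking resolves symbols left to right.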

The samples themselves cover a wide range of LLVM & Clang uses. There are standalone programs using LLVM as a library to process LLVM IR. There's a sample of building a dynamically-linked pass that can be loaded as a plugin with opt. There are samples of Clang tooling, a Clang plugin, and so on.

How do I keep the repository up-to-date, though? There are two paths. First, every time there is a new official LLVM release (this happens about twice a year), I make sure the samples build and work fine with it, and create a new branch. Forever after, checking this branch out will give you the repository in a state that works with the relevant released version. This is very useful because for most users, the bleeding edge is not required and they can do just fine with the latest released version. Moreover, if there's a need to work with an even older release, the repository already has some history, going back to LLVM 3.3 (released in June 2013).

Second, I keep the master branch of the repository in sync with LLVM manually, and the "last known good LLVM revision" it works against is listed in the main README file. I usually try to refresh it every week or two, and it's very rare for it to fall more than a few weeks behind. And of course, if you find it did fall behind, don't hesitate to open an issue (or better yet, create a pull request) - I usually get to these fairly quickly.