Clang tooling sees lots of interest and development focus in the past few years. At last, we have a convenient, accurate, open-source and well supported framework for programmatically analyzing and refactoring C++ code; I find this very exciting.

A great outcome of this rapid pace of development is that new APIs and tools spring up all the time. For example, some time ago the Clang tooling developers figured out folks doing AST traversals have to write a lot of repetitive code to find interesting AST nodes, so they came up with a great new API called AST matchers, which I want to discuss here.

Visitors vs. matchers

Here's a motivating example. Suppose we're looking for pointer-typed variables being used in if comparisons. To make this more specific, let's say we're looking for cases where the pointer-typed variable is on the left-hand-side of an equality comparison (==). To find such nodes in a recursive visitor, we'd have to write something like this:

bool VisitIfStmt(IfStmt *s) {
  if (const BinaryOperator *BinOP =
          llvm::dyn_cast<BinaryOperator>(s->getCond())) {
    if (BinOP->getOpcode() == BO_EQ) {
      const Expr *LHS = BinOP->getLHS();
      if (const ImplicitCastExpr *Cast =
              llvm::dyn_cast<ImplicitCastExpr>(LHS)) {
        LHS = Cast->getSubExpr();
      }

      if (const DeclRefExpr *DeclRef = llvm::dyn_cast<DeclRefExpr>(LHS)) {
        if (const VarDecl *Var =
                llvm::dyn_cast<VarDecl>(DeclRef->getDecl())) {
          if (Var->getType()->isPointerType()) {
            Var->dump();  // YAY found it!!
          }
        }
      }
    }
  }
  return true;
}

This is quite a bit of code, but nothing out of the ordinary if you've been working with Clang ASTs for a while. Perhaps it can be golfed down into a somewhat shorter form, but the main problem is that to write this one has to go through quite a bit of documentation and header files to figure out which methods to call and what kinds of objects they return.

Here's the equivalent AST matcher:

Finder.addMatcher(
    ifStmt(hasCondition(binaryOperator(
        hasOperatorName("=="),
        hasLHS(ignoringParenImpCasts(declRefExpr(
            to(varDecl(hasType(pointsTo(AnyType))).bind("lhs")))))))),
    &HandlerForIf);

Some difference, right? The declarative nature of matcher definitions makes this very natural to read and to map to the actual problem. HandlerForIf is a MatchCallback object that has direct access to the bound nodes of the matcher:

struct IfStmtHandler : public MatchFinder::MatchCallback {
  virtual void run(const MatchFinder::MatchResult &Result) {
    const VarDecl *lhs = Result.Nodes.getNodeAs<VarDecl>("lhs");
    lhs->dump();   // YAY found it!!
  }
};

There's actually quite a bit of documentation available about AST matchers on the official Clang website. For a complete example that can be built outside of the LLVM tree, I redid the tooling sample from the previous article, now with AST matchers (all available in the llvm-clang-samples repository).

Using clang-query to test matchers and explore the AST

An interesting new tool in Clang-land is clang-query. It's an interactive evaluator for AST matchers that can be used both to test your matchers and do some programmatic exploration of the AST. Let's say we want to develop an AST matcher for pointer comparisons, similar to the one shown above. Here's a sample input file we're going to work on:

int foo(int* p, int v) {
  if (p == 0) {
    return v + 1;
  } else {
    return v - 1;
  }
}

Let's fire up clang-query and see what it can do:

$ clang-query /tmp/iflhsptr.c --
clang-query> set output diag
clang-query> match functionDecl()

Match #1:

/tmp/iflhsptr.c:1:1: note: "root" binds here
int foo(int* p, int v) {
^~~~~~~~~~~~~~~~~~~~~~~~
1 match.

This is a basic smoke test to see how it matches the function declaration. The output mode set in the first command could also ask the tool to dump or print the AST, but for our purpose the diagnostic output is convenient.

Here's how we can match deeper nodes and bind them:

clang-query> match ifStmt(hasCondition(binaryOperator(hasOperatorName("==")).bind("op")))

Match #1:

/tmp/iflhsptr.c:2:7: note: "op" binds here
  if (p == 0) {
      ^~~~~~
/tmp/iflhsptr.c:2:3: note: "root" binds here
  if (p == 0) {
  ^~~~~~~~~~~~~
1 match.

If we intend to provide our own bindings, the root binding can be turned off:

clang-query> set bind-root false

Let's see multiple matches:

clang-query> match varDecl().bind("var")

Match #1:

/tmp/iflhsptr.c:1:9: note: "var" binds here
int foo(int* p, int v) {
        ^~~~~~

Match #2:

/tmp/iflhsptr.c:1:17: note: "var" binds here
int foo(int* p, int v) {
                ^~~~~
2 matches.

At this point I'll stop because long matchers don't format conveniently in a blog post, but I'm sure you got the idea. It's very obvious how helpful this tool can be with developing matchers. It's still new and has some rough edges, but is already quite useful.

Refactoring tools and replacements

With the growing usage of libTooling, it's hardly surprising that its developers keep coming up with higher levels of abstraction that help writing new tools with less and less effort. The AST matchers framework presented above is one example. Another is RefactoringTool, a subclass of ClangTool that makes it possible to craft new tools with very little code. I'll show an example soon, but first a word about replacements.

The tools I was demonstrating so far used a Rewriter to change the underlying source code in response to finding interesting things in the AST. This is a good approach, but it has a problem scaling for large projects. Imagine running a tool over a large project with many source files and many header files. Some of rewritings may need to happen in the header files, but how to manage that, given that the same headers get included into multiple translation units? Some edits may end up being duplicated or even conflicting, and that's a problem.

Replacements are the solution. The source transformation task is divided into two distinct steps:

  1. Custom tools go through the source base, finding the refactoring patterns to apply, and generating serialized replacements into files. Think of replacements as something like patch files (precise directions of how to modify a source file), but in a somewhat friendlier format.
  2. clang-apply-replacements can then run with access to all replacements, perform the required de-duplication and conflict resolution and actually apply the changes to the source.

This approach also allows nice parallelization of the refactoring over huge code-bases, though there aren't many projects and companies in the world with source code large enough to make this a real problem.

Back to an example then. I took the simple sample tool from the previous article (just finding interesting if nodes and adding some comments into them) and rewrote it once again, using RefactoringTool and replacements. The full code is available in the samples project, but it's so short that I can show most of it here.

Here's the complete main function. For ease of hacking it just dumps the replacements to stdout instead of serializing or applying them:

int main(int argc, const char **argv) {
  CommonOptionsParser op(argc, argv, ToolingSampleCategory);
  RefactoringTool Tool(op.getCompilations(), op.getSourcePathList());

  // Set up AST matcher callbacks.
  IfStmtHandler HandlerForIf(&Tool.getReplacements());

  MatchFinder Finder;
  Finder.addMatcher(ifStmt().bind("ifStmt"), &HandlerForIf);

  // Run the tool and collect a list of replacements. We could call
  // runAndSave, which would destructively overwrite the files with
  // their new contents. However, for demonstration purposes it's
  // interesting to show the replacements.
  if (int Result = Tool.run(newFrontendActionFactory(&Finder).get())) {
    return Result;
  }

  llvm::outs() << "Replacements collected by the tool:\n";
  for (auto &r : Tool.getReplacements()) {
    llvm::outs() << r.toString() << "\n";
  }

  return 0;
}

IfStmtHandler is just a MatchCallback that is triggered on if statements:

class IfStmtHandler : public MatchFinder::MatchCallback {
public:
  IfStmtHandler(Replacements *Replace) : Replace(Replace) {}

  virtual void run(const MatchFinder::MatchResult &Result) {
    // The matched 'if' statement was bound to 'ifStmt'.
    if (const IfStmt *IfS =
          Result.Nodes.getNodeAs<clang::IfStmt>("ifStmt")) {
      const Stmt *Then = IfS->getThen();
      Replacement Rep(*(Result.SourceManager), Then->getLocStart(), 0,
                      "// the 'if' part\n");
      Replace->insert(Rep);

      if (const Stmt *Else = IfS->getElse()) {
        Replacement Rep(*(Result.SourceManager), Else->getLocStart(), 0,
                        "// the 'else' part\n");
        Replace->insert(Rep);
      }
    }
  }

private:
  Replacements *Replace;
};

Note how little boilerplate this code contains. The tool is set up in just a handful of lines of code, and most of my code deals with the actual refactoring at hand. This definitely makes writing tools quicker and easier than ever before.