In my posts about embedding in Go last month, I provided multiple examples of different kinds of embeddings from the Go standard library. How did I find these examples?

I wish I could say it all comes from a deep familiarity with the breadth and depth of the standard library; instead, I combined the programming virtues of laziness and impatience and wrote a tool that found these examples for me.

In this post, I'm going to describe this tool and how you may go about writing such tools of your own to analyze real-world Go codebases to glean any insights you may be interested in.

The task

Let's start by describing the requirement: we're interested in finding all instances of embedding in Go code; moreover, we'd like to know what kind of embedding each instance is and call it out explicitly, i.e. distinguish interface-in-interface embeddings from struct-in-struct embeddings, and so on.
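To make these categories concrete, here's a minimal sketch with three of the kinds in one file (the type names Reader, Base, SS, IS and II are invented for illustration; they're not part of the tool):

```go
package main

import "fmt"

// Reader and Base are hypothetical types used only for illustration.
type Reader interface{ Read() (int, error) }
type Base struct{ N int }

// Each embedding below is one of the kinds the tool should report.
type SS struct{ Base }      // struct embedded in struct
type IS struct{ Reader }    // interface embedded in struct
type II interface{ Reader } // interface embedded in interface

func main() {
  s := SS{Base{N: 42}}
  fmt.Println(s.N) // the promoted field from the embedded struct; prints 42
}
```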

I wrote earlier about the various compilation steps Go source code goes through. Many of these are available for Go tool writers as well, and it's worth spending a bit of time thinking about the level of information we need for our tool. For a deeper exploration of what it takes to analyze Go source code, I highly recommend reading this document.

Just parsing the Go source code of a project won't do, because we'll need type information. Take this example struct from part 3 of the embedding post:

type StatsConn struct {
  net.Conn

  BytesRead uint64
}

We can figure out that net.Conn is an embedding from parsing this code and looking at the AST. But what kind of embedding is it? Is net.Conn an interface or a struct? For this, we'll have to run the AST through Go type checking; moreover, in the general case this ought to be cross-package, or even cross-module type checking because the embedded type net.Conn could be defined in a different package or module. Therefore, our tool should be able to perform cross-module type checking. If this sounds tricky, that's because it is! But worry not, Go has just the package to help us.

x/tools/go/packages

Enter x/tools/go/packages, which I'll refer to as XTGP from this point on. This package is a one-stop-shop for loading Go packages for analysis. It does all the heavy lifting for tool writers, leaving us with just the "business logic" of the tool to write - the analysis itself. For a given package, XTGP will:

  • Parse and type check the package, providing access to the AST and full type information.
  • Optionally load all of the package's dependencies, type checking them as well.

XTGP is the newest (2018) in a sequence of similar packages, and has by now replaced the other approaches as the "one true recommended way" for multi-package analysis. It's also used as the basis for x/tools/go/analysis, upon which tools like go vet are now built. In this post I'll show how to write my tool using both "vanilla" XTGP and the go/analysis framework.

Finding embeddings

It's time to show the code of the "find embeddings" tool. The full source code is available on GitHub. We'll start with the setup for configuring XTGP:

import "golang.org/x/tools/go/packages"

const mode packages.LoadMode = packages.NeedName |
  packages.NeedTypes |
  packages.NeedSyntax |
  packages.NeedTypesInfo

func main() {
  flag.Usage = func() {
    out := flag.CommandLine.Output()
    fmt.Fprintf(out, "usage: find-embeddings [options] <module dir>\n\n")
    fmt.Fprintln(out, "Options:")
    flag.PrintDefaults()
  }

  pattern := flag.String("pattern", "./...", "Go package pattern")
  flag.Parse()
  if flag.NArg() != 1 {
    log.Fatal("Expecting a single argument: directory of module")
  }

  var fset = token.NewFileSet()
  cfg := &packages.Config{Fset: fset, Mode: mode, Dir: flag.Args()[0]}
  pkgs, err := packages.Load(cfg, *pattern)
  if err != nil {
    log.Fatal(err)
  }

  for _, pkg := range pkgs {
    findInPackage(pkg, fset)
  }
}

The main entry point to XTGP is packages.Load, which takes a packages.Config object for configuration. The most important field to pay attention to is Mode, which specifies what XTGP should load. It's tempting to just ask for "everything", but this isn't necessarily the best approach in the general case, as it may take quite a while for large projects. For example, in our case we don't need NeedImports | NeedDeps, which would bring in the type-checked ASTs of all the transitive dependencies of our code. This is an expensive operation, as you can imagine! All we need for our tool is to look at dependencies sufficiently to glean the type information of their exported types; luckily, in Go this information is available cheaply (to support Go's famously fast parallel compilation process).
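For contrast, the heavier configuration would look something like the sketch below; heavyMode is a name I made up, and requesting these bits is only worthwhile when the analysis genuinely needs the dependencies' syntax trees:

```go
// Sketch: also requesting type-checked dependency ASTs. This is
// considerably slower on large projects, and unnecessary for our tool.
const heavyMode packages.LoadMode = packages.NeedName |
  packages.NeedTypes |
  packages.NeedSyntax |
  packages.NeedTypesInfo |
  packages.NeedImports |
  packages.NeedDeps
```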

Once we have the packages loaded, we get a slice of packages.Package values, through which we can perform our analysis. We invoke findInPackage for each such package.

func findInPackage(pkg *packages.Package, fset *token.FileSet) {
  for _, fileAst := range pkg.Syntax {
    ast.Inspect(fileAst, func(n ast.Node) bool {
      if structTy, ok := n.(*ast.StructType); ok {
        findInFields(structTy.Fields, n, pkg.TypesInfo, fset)
      } else if interfaceTy, ok := n.(*ast.InterfaceType); ok {
        findInFields(interfaceTy.Methods, n, pkg.TypesInfo, fset)
      }

      return true
    })
  }
}

This function has two important tasks:

  1. Invoke ast.Inspect to run a visitor function on every AST node in the package. Our visitor focuses on either an *ast.StructType or *ast.InterfaceType to look deeper into struct/interface declarations.
  2. Deal with a difference in how struct vs. interface fields are accessed (Fields field for *ast.StructType, Methods field for *ast.InterfaceType).

Let's move on to findInFields:

func findInFields(fl *ast.FieldList, n ast.Node, tinfo *types.Info, fset *token.FileSet) {
  type FieldReport struct {
    Name string
    Kind string
    Type types.Type
  }
  var reps []FieldReport

  for _, field := range fl.List {
    if field.Names == nil {
      tv, ok := tinfo.Types[field.Type]
      if !ok {
        log.Fatalf("type info not found for %v", field.Type)
      }

      embName := types.ExprString(field.Type)

      _, hostIsStruct := n.(*ast.StructType)
      var kind string

      switch typ := tv.Type.Underlying().(type) {
      case *types.Struct:
        if hostIsStruct {
          kind = "struct (s@s)"
        } else {
          kind = "struct (s@i)"
        }
        reps = append(reps, FieldReport{embName, kind, typ})
      case *types.Interface:
        if hostIsStruct {
          kind = "interface (i@s)"
        } else {
          kind = "interface (i@i)"
        }
        reps = append(reps, FieldReport{embName, kind, typ})
      default:
      }
    }
  }

  if len(reps) > 0 {
    fmt.Printf("Found at %v\n%v\n", fset.Position(n.Pos()), nodeString(n, fset))

    for _, report := range reps {
      fmt.Printf("--> field '%s' is embedded %s: %s\n", report.Name, report.Kind, report.Type)
    }
    fmt.Println("")
  }
}

This function is conceptually simple; it iterates over a slice of fields, focusing only on fields that are unnamed (i.e. embedded). For each field, it looks at its underlying type [1] and its kind - is it a struct type, or an interface type? This is where inter-package type analysis is critical, because in the general case we have no way of knowing the type of fields without understanding the types imported from other packages.

This is it! There's a bit of extra logic in findInFields to collect all embedded fields of a given struct/interface into a single place, but otherwise it does what we need - including distinguishing between the kinds of embedding. This simple tool can now be run on the Go standard library or real-world large projects (like k8s or hugo) and report all the embeddings found therein.

Finding embeddings using go/analysis

The example shown above uses the "raw" XTGP API to load packages. An alternative approach is to use the go/analysis framework, which saves us from some of the boilerplate:

import "golang.org/x/tools/go/analysis"
import "golang.org/x/tools/go/analysis/singlechecker"

var EmbedAnalysis = &analysis.Analyzer{
  Name: "embedanalysis",
  Doc:  "reports embeddings",
  Run:  run,
}

func main() {
  singlechecker.Main(EmbedAnalysis)
}

func run(pass *analysis.Pass) (interface{}, error) {
  for _, file := range pass.Files {
    ast.Inspect(file, func(n ast.Node) bool {
      if structTy, ok := n.(*ast.StructType); ok {
        findInFields(structTy.Fields, n, pass.TypesInfo, pass.Fset)
      } else if interfaceTy, ok := n.(*ast.InterfaceType); ok {
        findInFields(interfaceTy.Methods, n, pass.TypesInfo, pass.Fset)
      }

      return true
    })
  }

  return nil, nil
}

Note how short the main function becomes; by delegating to the go/analysis framework, we no longer need to explicitly initialize go/packages or handle command-line flags. The singlechecker helper from go/analysis does this for us.

The rest of the code is very similar to the previous sample. run is the moral equivalent of findInPackage and does pretty much the same work, except that it has to operate on pass.Files instead of pkg.Syntax. It invokes findInFields for every struct or interface, and this function is exactly the same as shown above.

Conclusion

Given the two slightly different approaches to achieve the same goal, which one should you choose?

Pros of using XTGP directly:

  • More flexibility in how go/packages is configured and how the command-line interface (flags, etc.) for the tool is defined.
  • Less magic and fewer black boxes in the process.

Pros of using go/analysis:

  • Slightly less code to write; if you're writing many different analyses, this may add up.
  • Interoperability with other analyses; there's a rich set of analysis passes available for use, and go/analysis makes it easy to chain passes and pass information between them.
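To illustrate the chaining point, here's a sketch (not part of the tool shown above) in which our analyzer declares a dependency on the standard inspect pass; the framework runs that pass first and hands us its pre-built AST inspector, so run no longer calls ast.Inspect itself. findInFields is assumed to be the same function as before:

```go
import (
  "go/ast"

  "golang.org/x/tools/go/analysis"
  "golang.org/x/tools/go/analysis/passes/inspect"
  "golang.org/x/tools/go/ast/inspector"
)

var EmbedAnalysis = &analysis.Analyzer{
  Name:     "embedanalysis",
  Doc:      "reports embeddings",
  Requires: []*analysis.Analyzer{inspect.Analyzer},
  Run:      run,
}

func run(pass *analysis.Pass) (interface{}, error) {
  // The inspect pass already traversed the ASTs; we reuse its result.
  ins := pass.ResultOf[inspect.Analyzer].(*inspector.Inspector)
  nodeFilter := []ast.Node{(*ast.StructType)(nil), (*ast.InterfaceType)(nil)}
  ins.Preorder(nodeFilter, func(n ast.Node) {
    switch ty := n.(type) {
    case *ast.StructType:
      findInFields(ty.Fields, n, pass.TypesInfo, pass.Fset)
    case *ast.InterfaceType:
      findInFields(ty.Methods, n, pass.TypesInfo, pass.Fset)
    }
  })
  return nil, nil
}
```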

Whichever way you choose, it's comforting to know that Go has powerful tooling support that makes it relatively easy to write tools to analyze whole codebases. This tooling framework handles the most tedious part of tool-writing: figuring out how the project is assembled from multiple modules and packages [2]. It does the heavy lifting, leaving the tool writer with only the "business logic" of the analysis to implement.

For writing the business logic, we have a fully type-checked AST at our disposal. ASTs are the starting point for most real-world compilers, and if you need some specialized IR - this can typically be constructed from a type-checked AST. For example, if your analysis needs the program in SSA form you can use x/tools/go/ssa to create SSA straight from the type-checked packages XTGP returns. But... I'm getting carried away here, as this is a topic for another time.

Happy tool-writing!


[1] The concept of underlying types helps us see through named types or aliases. For example, if we have var k Foo and elsewhere type Foo int, then we know that the underlying type of k is int.
[2] In the C++ world this is similar to compilation databases.