Tags Go , Haskell

Algebraic data types (also known as variant types, sum types or discriminated unions) is a neat feature of some programming languages that lets us specify that a value might take one of several related types, and includes convenient syntax for pattern matching on these types at run-time. Here's a canonical binary tree example from Wikipedia, written in Haskell:

data Tree = Empty
          | Leaf Int
          | Node Tree Tree

A Tree can be either empty, or a leaf with one integer field, or a node with two Tree fields - its left and right children. Here's a function that finds the depth of such trees:

depth :: Tree -> Int
depth Empty = 0
depth (Leaf n) = 1
depth (Node l r) = 1 + max (depth l) (depth r)

It takes a Tree and returns an integer, and its workings are laid out by cases, depending on the run-time type of the parameter [1]. If we pass anything that's not a Tree into depth, we'll get an error. Very concise and elegant!

"Why doesn't Go have algebraic data types" is a commonly asked question, and there's a FAQ entry about it. Even more details can be found in a Reddit AMA session the Go developers did in 2016 and in this proposal issue. The short version of the answer is that it isn't clear how this feature would interact with interfaces; how would one variant type discriminate between multiple interfaces that may have the same methods, etc. Indeed, it's a tricky issue and no one has found a satisfactory solution yet.

Another common answer is that Go already supports very similar functionality via a combination of interfaces and run-time type switches. In this post I want to show how it can be done, with some examples from synthetic to "real life". The full code for the samples in this post is available here.

Here's the same binary tree type, written in Go [2]:

type Tree interface {
}

type Empty struct {
}

type Leaf struct {
    v int
}

type Node struct {
    left, right Tree
}

Tree is an interface; Empty, Leaf and Node implement the interface. Note that the Tree interface is empty, so any type implements it, even the built-in string. We can easily restrict this by providing a method that would only be implemented by the interesting types:

type Tree interface {
    isTree()
}

And we'll have to add empty imlementations for our types:

func (_ Leaf) isTree()  {}
func (_ Node) isTree()  {}
func (_ Empty) isTree() {}

Now only types with a isTree method will be allowed where the Tree interface is expected.

The depth function employs a type switch to do run-time type dispatching:

func depth(t Tree) int {
    switch nt := t.(type) {
    case Empty:
        return 0
    case Leaf:
        return 1
    case Node:
        return 1 + max(depth(nt.left), depth(nt.right))
    default:
        log.Fatalf("unexpected type %T", nt)
    }
    return 0
}

It's much more verbose than the Haskell pattern matching, but not less readable, and just as efficient. One obvious deficiency is having to manually reject other types in the default clause; Haskell's pattern matching does this automatically and also makes sure we've covered all cases.

A more realistic example

Let's turn to a more realistic example that you could legitimatley encounter in a Go program. Since I'm a compiler nerd this will be about trees again - abstract syntax trees. These data structures are a key layer in most compilers, produced by parsers and consumed by whatever comes after (type checking/inference, lowering, optimization, evaluation, etc).

For this sample I wrote a complete evaluator for a simple calculator language with arithmetic operations, variables you can set and access, and if conditionals. The full code is in parser-evaluator.go; here I'll just focus on the AST nodes and how to "pattern match" them. This is the Node interface:

type Node interface {
    isNode()
    String() string
}

It has the isNode guard method that all concrete node types will implement, along with a String method to make sure all nodes implement the fmt.Stringer interface for debugging. Here are a few sample concrete node types:

type AssignStmt struct {
    Pos  scanner.Position
    Name string
    Expr Node
}

type IfStmt struct {
    Pos  scanner.Position
    Cond Node
    Then Node
    Else Node
}

type IntConstant struct {
    Pos   scanner.Position
    Value int
}

If you look at the full code, all of these have a dummy isNode method, along with a String method.

This is how pattern matching happens in the Eval function that evaluates an expression:

func (eval *Evaluator) Eval(n Node) (int, error) {
    switch nt := n.(type) {
    case IntConstant:
        return nt.Value, nil
    // ...
    // ... more cases
    // ...
    case AssignStmt:
        val, err := eval.Eval(nt.Expr)
        if err != nil {
            return 0, err
        }
        eval.symbolTable[nt.Name] = val
        return 0, nil
    case IfStmt:
        condVal, err := eval.Eval(nt.Cond)
        if err != nil {
            return 0, err
        }
        // Lazily evaluate Then or Else based on the result of Cond.
        if condVal == 0 {
            elseVal, err := eval.Eval(nt.Else)
            if err != nil {
                return 0, err
            }
            return elseVal, nil
        } else {
            thenVal, err := eval.Eval(nt.Then)
            if err != nil {
                return 0, err
            }
            return thenVal, nil
        }
    }
    return 0, fmt.Errorf("unmatched node %s", n)
}

Again, a type switch to cleanly discriminate between all the run-time types n could take.

An even more realistic example

The evaluator shown in the previous section is something you could run into in real programs, and in fact you do. Go's own go/ast package uses the same idiom for its AST nodes [3].

Looking in src/go/ast/ast.go in Go's source code, we see:

// All statement nodes implement the Stmt interface.
type Stmt interface {
    Node
    stmtNode()
}

Node is an embedded interface for some position-related methods, and stmtNode is a dummy method only Stmt types implement. Then, looking in src/go/types/stmt.go we find many examples of the by-now familiar type switch:

// stmt typechecks statement s.
func (check *Checker) stmt(ctxt stmtContext, s ast.Stmt) {
  // ...
    switch s := s.(type) {
    case *ast.BadStmt, *ast.EmptyStmt:
        // ignore
    case *ast.IfStmt:
        check.openScope(s, "if")
        defer check.closeScope()

        check.simpleStmt(s.Init)
        var x operand
        check.expr(&x, s.Cond)
        // ...
    // ...
}

Conclusion

It seems that Go can do just fine without adding variant types, due to the challenges mentioned in the beginning of the post and the ease with which the major use-cases can be implemented using existing language features. While Go's interfaces and type switches are certainly more verbose than Haskell's ADT declarations with pattern matching, and provide a bit less static safety, they are nevertheless fully usable for the task and just as efficient.

Language design is a careful balance, and a lot can be said for keeping the language simple with a minimal number of features, even if this leads to small losses in expressivity. It's not worth adding a significant language feature just to cover for a small weakness in the existing ones that can be easily worked around.


[1]It's important to note that this is run-time type dispatch; the value has to have a run-time tag saying what its type is. This will be important in comparing with the Go implementation later on.
[2]This is not the idiomatic way of writing binary trees in Go, it's just here to provide a syntactically close comparison to the Haskell code.
[3]go/ast is a library package useful for constructing tools to process Go code. The Go compiler is now itself written in Go and has similar code in it (though AFAICT it's incompatible with the public-facing go/ast).