Type erasure and reification



In this post I'd like to discuss the concepts of type erasure and reification in programming languages. I don't intend to dive very deeply into the specific rules of any particular language; rather, the post is going to present several simple examples in multiple languages, hoping to provide enough intuition and background for a more serious study, if necessary. As you'll see, the actual concepts are very simple and familiar. Deeper details of specific languages pertain more to the idiosyncrasies of those languages' semantics and implementations.

Important note: in C++ there is a programming pattern called type erasure, which is quite distinct from what I'm trying to describe here [1]. I'll be using C++ examples here, but that's to demonstrate how the original concepts apply in C++. The programming pattern will be covered in a separate post.

Types at compile time, no types at run-time

The title of this section is a "one short sentence" explanation of what type erasure means. With few exceptions, it only applies to languages with some degree of compile time (a.k.a. static) type checking. The basic principle should be immediately familiar to folks who have some idea of what machine code generated from low-level languages like C looks like. While C has static typing, this only matters in the compiler - the generated code is completely oblivious to types.

For example, consider the following C snippet:

typedef struct Frob_t {
  int x;
  int y;
  int arr[10];
} Frob;

int extract(Frob* frob) {
  return frob->y * frob->arr[7];
}

When compiling the function extract, the compiler will perform type checking. It won't let us access fields that were not declared in the struct, for example. Neither will it let us pass a pointer to a different struct (or to a float) into extract. But once it's done helping us, the compiler generates code which is completely type-free:

0:   8b 47 04                mov    0x4(%rdi),%eax
3:   0f af 47 24             imul   0x24(%rdi),%eax
7:   c3                      retq

The compiler is familiar with the layout of the struct and other specifics of the ABI, and generates code that assumes a correct type of structure was passed in. If the actual type is not what this function expects, there will be trouble (either accessing unmapped memory, or accessing wrong data).

A slightly adjusted example will clarify this:

int extract_cast(void* p) {
  Frob* frob = p;
  return frob->y * frob->arr[7];
}

The compiler will generate exactly identical code from this function, which is in itself a good indication of when the types matter and when they don't. What's more interesting is that extract_cast makes it extremely easy for programmers to shoot themselves in the foot:

SomeOtherStruct ss;
extract_cast(&ss);    // oops

In general, type erasure is a concept that describes these semantics of a language. Types matter to the compiler, which uses them to generate code and help the programmer avoid errors. Once everything is type-checked, however, the types are simply erased and the code the compiler generates is oblivious to them. The next section will put this in context by comparing to the opposite approach.

Reification - retaining types at run-time

While erasure means the compiler discards all type information for the actual generated code, reification is the other way to go - types are retained at run-time and used to perform various checks. A classical example from Java will help demonstrate this:

class Main {
  public static void main(String[] args) {
    String strings[] = {"a", "b"};
    Object objects[] = strings;
    objects[0] = 5;
  }
}

This code creates an array of String, and converts it to a generic array of Object. This is valid because arrays in Java are covariant, so the compiler doesn't complain. However, in the next line we try to assign an integer to the array. This fails with an exception at run-time:

Exception in thread "main" java.lang.ArrayStoreException: java.lang.Integer
    at Main.main(Main.java:5)

A type check was inserted into the generated code, and it fired when an incorrect assignment was attempted. In other words, the type of objects is reified. Reification is defined roughly as "taking something abstract and making it real/concrete", which when applied to types means "compile-time types are converted to actual run-time entities".

C++ has some type reification support as well, e.g. with dynamic_cast:

struct Base {
  virtual void basefunc() {
    printf("basefunc\n");
  }
};

struct Derived : public Base {
  void derivedfunc() {
    printf("derived\n");
  }
};

void call_derived(Base* b) {
  Derived* d = dynamic_cast<Derived*>(b);
  if (d != nullptr) {
    d->derivedfunc();
  } else {
    printf("cast failed\n");
  }
}

We can call call_derived thus:

int main() {
  Derived d;
  call_derived(&d);

  Base b;
  call_derived(&b);
}

The first call will successfully invoke derivedfunc; the second will not, because the dynamic_cast will return nullptr at run-time. This is because we're using C++'s run-time type information (RTTI) capabilities here, where an actual representation of the type is stored in the generated code (most likely attached to the vtable which every polymorphic object points to). C++ also has the typeid feature, but I'm showing dynamic_cast since it's the one most commonly used.

Note particularly the differences between this sample and the C sample in the beginning of the post. Conceptually, it's similar - we use a pointer to a general type (in C that's void*, in the C++ example we use a base type) to interact with concrete types. Whereas in C there is no built-in run-time type feature, in C++ we can use RTTI in some cases. With RTTI enabled, dynamic_cast can be used to interact with the run-time (reified) representation of types in a limited but useful way.

Type erasure and Java generics

One place where folks not necessarily familiar with programming language type theory encounter erasure is Java generics, which were bolted onto the language after a large amount of code had already been written. The designers of Java faced a binary compatibility challenge: they wanted code compiled with newer Java compilers to run on older VMs.

The solution was to use type erasure to implement generics entirely in the compiler. Here's a quote from the official Java generics tutorial:

Generics were introduced to the Java language to provide tighter type checks at compile time and to support generic programming. To implement generics, the Java compiler applies type erasure to:

  • Replace all type parameters in generic types with their bounds or Object if the type parameters are unbounded. The produced bytecode, therefore, contains only ordinary classes, interfaces, and methods.
  • Insert type casts if necessary to preserve type safety.
  • Generate bridge methods to preserve polymorphism in extended generic types.

Here's a very simple example to demonstrate what's going on, taken from a Stack Overflow answer. This code:

import java.util.List;
import java.util.ArrayList;

class Main {
  public static void main(String[] args) {
    List<String> list = new ArrayList<String>();
    list.add("Hi");
    String x = list.get(0);
    System.out.println(x);
  }
}

Uses a generic List. However, what the compiler creates prior to emitting bytecode is equivalent to:

import java.util.List;
import java.util.ArrayList;

class Main {
  public static void main(String[] args) {
    List list = new ArrayList();
    list.add("Hi");
    String x = (String) list.get(0);
    System.out.println(x);
  }
}

Here List is a container of Object, so we can assign any element to it (similarly to the reification example shown in the previous section). The compiler then inserts a cast when accessing that element as a string. In this case the compiler will adamantly preserve type safety and won't let us do list.add(5) in the original snippet, because it sees that list is a List<String>. Therefore, the cast to (String) should be safe.

Using type erasure to implement generics with backwards compatibility is a neat idea, but it has its issues. Some folks complain that not having the types available at runtime is a limitation (e.g. not being able to use instanceof and other reflection capabilities). Other languages, like C# and Dart 2, have reified generics which do preserve the type information at run-time.

Reification in dynamically typed languages

I hope it's obvious that the theory and techniques described above only apply to statically-typed languages. In dynamically-typed languages, like Python, there is almost no notion of types at compile-time; types are a fully reified, run-time concept. Even trivial errors like:

class Foo:
  def bar(self): pass

f = Foo()
f.joe()         # <--- calling non-existent method

Fire at run-time, because there's no static type checking [2]. Types obviously exist at run-time, with functions like type() and isinstance() providing complete reflection capabilities. The type() function can even create new types entirely at run-time.
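
For instance, continuing the snippet above (the Joe class name here is made up just for illustration):

print(type(f))              # <class '__main__.Foo'>
print(isinstance(f, Foo))   # True

# type() with three arguments creates a brand new class at run-time,
# equivalent to a regular class definition with a single method.
Joe = type('Joe', (object,), {'joe': lambda self: 'hi'})
print(Joe().joe())          # hi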


[1] But it's most likely what you'll find if you google for "c++ type erasure".
[2] To be clear - this is not a bug; it's a feature of Python. A new method can be added to classes dynamically at runtime (here, some code could have defined a joe method for Foo before the f.joe() invocation), and the compiler has absolutely no way of knowing this could or couldn't happen. So it has to assume such invocations are valid and rely on run-time checking to avoid serious errors like memory corruption.

Type inference



Type inference is a major feature of several programming languages, most notably languages from the ML family like Haskell. In this post I want to provide a brief overview of type inference, along with a simple Python implementation for a toy ML-like language.

Uni-directional type inference

While static typing is very useful, one of its potential downsides is verbosity. The programmer has to annotate values with types throughout the code, which results in more effort and clutter. What's really annoying, though, is that in many cases these annotations feel superfluous. Consider this classical C++ example from pre-C++11 times:

std::vector<Blob*> blobs;
std::vector<Blob*>::iterator iter = blobs.begin();

Clearly when the compiler sees blobs.begin(), it knows the type of blobs, so it also knows the type of the begin() method invoked on it because it is familiar with the declaration of begin. Why should the programmer be burdened with spelling out the type of the iterator? Indeed, one of the most welcome changes in C++11 was lifting this burden by repurposing auto for basic type inference:

std::vector<Blob*> blobs;
auto iter = blobs.begin();

Go has a similar capability with the := syntax. Given some function:

func parseThing(...) (Node, error) {
}

We can simply write:

node, err := parseThing(...)

Without having to explicitly declare that node has type Node and err has type error.

These features are certainly useful, and they involve some degree of type inference from the compiler. Some functional programming proponents say this is not real type inference, but I think the difference is just a matter of degree. There's certainly some inference going on here, with the compiler calculating and assigning the right types for expressions without the programmer's help. Since this calculation flows in one direction (from the declaration of the vector::begin method to the auto assignment), I'll call it uni-directional type inference [1].

Bi-directional type inference (Hindley-Milner)

If we define a new map function in Haskell to map a function over a list, we can do it as follows:

mymap f [] = []
mymap f (first:rest) = f first : mymap f rest

Note that we did not specify the types for either the arguments of mymap, or its return value. The Haskell compiler can infer them on its own, using the definition provided:

> :t Main.mymap
Main.mymap :: (t1 -> t) -> [t1] -> [t]

The compiler has determined that the first argument of mymap is a generic function, assigning its argument the type t1 and its return value the type t. The second argument of mymap has the type [t1], which means "list of t1"; then the return value of mymap has the type "list of t". How was this accomplished?

Let's start with the second argument. From the mymap f [] = [] clause, and also from the (first:rest) deconstruction, the compiler infers that it has a list type. But there's nothing else in the code constraining the element type, so the compiler chooses a generic type specifier - t1. f first applies f to an element of this list, so f has to take t1; nothing constrains its return value type, so it gets the generic t. The result is that f has type (t1 -> t), which in Haskell parlance means "a function from t1 to t".

Here is another example, written in a toy language I put together for the sake of this post. The language is called microml, and its implementation is described at the end of the post:

foo f g x = if f(x == 1) then g(x) else 20

Here foo is declared as a function with three arguments. What is its type? Let's try to run type inference manually. First, note that the body of the function consists of an if expression. As is common in programming languages, this one has some strict typing rules in microml; namely, the type of the condition is boolean (Bool), and the types of the then and else clauses must match.

So we know that f(x == 1) has to return a Bool. Moreover, since x is compared to an integer, x is an Int. What is the type of g? Well, it has an Int argument, and its return value must match the type of the else clause, which is an Int as well.

To summarize:

  • The type of x is Int
  • The type of f is Bool -> Bool
  • The type of g is Int -> Int

So the overall type of foo is:

((Bool -> Bool), (Int -> Int), Int) -> Int

It takes three arguments, the types of which we have determined, and returns an Int.

Note how this type inference process is not just going in one direction, but seems to be "jumping around" the body of the function figuring out known types due to typing rules. This is why I call it bi-directional type inference, but it's much better known as Hindley-Milner type inference, since it was independently discovered by Roger Hindley in 1969 and Robin Milner in 1978.

How Hindley-Milner type inference works

We've seen a couple of examples of manually running type inference on some code above. Now let's see how to translate it to an implementable algorithm. I'm going to present the process in several separate stages, for simplicity. Some other presentations of the algorithm combine several of these stages, but seeing them separately is more educational, IMHO.

The stages are:

  1. Assign symbolic type names (like t1, t2, ...) to all subexpressions.
  2. Using the language's typing rules, write a list of type equations (or constraints) in terms of these type names.
  3. Solve the list of type equations using unification.

Let's use this example again:

foo f g x = if f(x == 1) then g(x) else 20

Starting with stage 1, we'll list all subexpressions in this declaration (starting with the declaration itself) and assign unique type names to them:

foo                                       t0
f                                         t1
g                                         t2
x                                         t3
if f(x == 1) then g(x) else 20            t4
f(x == 1)                                 t5
x == 1                                    t6
x                                         t3
g(x)                                      t7
20                                        Int

Note that every subexpression gets a type, and we de-duplicate them (e.g. x is encountered twice and gets the same type name assigned). Constant nodes get known types.
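
To make this stage concrete, here's a rough sketch of how such an assignment could be implemented, using a deliberately tiny made-up AST rather than the real microml one (the actual code lives in typing.py, shown at the end of the post):

import itertools

_typenames = ('t{}'.format(i) for i in itertools.count())

# A minimal made-up AST for this sketch -- not the real microml AST.
class IntConstant:
    def __init__(self, value):
        self.value = value
        self.children = []

class Identifier:
    def __init__(self, name):
        self.name = name
        self.children = []

class Node:
    """Any other expression kind (If, App, ==, ...)."""
    def __init__(self, *children):
        self.children = list(children)

def assign_typenames(node, symtab):
    """Attaches a .type to every node in the tree.

    symtab maps identifier names to their assigned type names; this is
    how x receives the same t3 on both of its occurrences.
    """
    if isinstance(node, IntConstant):
        node.type = 'Int'                 # constants get known types
    elif isinstance(node, Identifier):
        if node.name not in symtab:
            symtab[node.name] = next(_typenames)
        node.type = symtab[node.name]     # reuse for repeated names
    else:
        node.type = next(_typenames)      # fresh name for everything else
        for child in node.children:
            assign_typenames(child, symtab)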

In stage 2, we'll use the language's typing rules to write down equations involving these type names. Usually books and papers use slightly scary formal notation for typing rules; for example, for if:

\[\frac{\Gamma \vdash e_0 : Bool, \Gamma \vdash e_1 : T, \Gamma \vdash e_2 : T}{\Gamma \vdash if\: e_0\: then\: e_1\: else\: e_2 : T}\]

All this means is the intuitive typing of if we've described above: the condition is expected to be boolean, and the types of the then and else clauses are expected to match, and their type becomes the type of the whole expression.

To unravel the notation, prepend "given that" to the expression above the line and "we can derive" to the expression below the line; \Gamma \vdash e_0 : Bool means that e_0 is typed to Bool in the set of typing assumptions called \Gamma.

Similarly, a typing rule for single-argument function application would be:

\[\frac{\Gamma \vdash e_0 : T, \Gamma \vdash f : T \rightarrow U}{\Gamma \vdash f(e_0) : U}\]
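
And since == in microml compares integers, its rule - spelled out here for completeness; it's used implicitly in the inference steps below - constrains both operands to Int and the result to Bool:

\[\frac{\Gamma \vdash e_0 : Int, \Gamma \vdash e_1 : Int}{\Gamma \vdash e_0 == e_1 : Bool}\]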

The real trick of type inference is running these typing rules in reverse. The rule tells us how to assign types to the whole expression given its constituent types, but we can also use it as an equation that works both ways and lets us infer constituent types from the whole expression's type.

Let's see what equations we can come up with, looking at the code:

From f(x == 1) we infer t1 = (t6 -> t5), because t1 is the type of f, t6 is the type of x == 1, and t5 is the type of f(x == 1). Note that we're using the typing rules for function application here. Moreover, we can infer that t3 is Int and t6 is Bool because of the typing rule of the == operator.

Similarly, from g(x) we infer t2 = (t3 -> t7).

From the if expression, we infer that t5 is Bool (since it's the type of the condition of the if), and that t4 = t7 and t4 = Int, because the types of the then and else clauses must match each other and become the type of the whole expression.
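
Generating these equations in code is a straightforward walk over the AST annotated in stage 1. As a sketch (simplified relative to the real typing.py, but producing equations with the same left/right/orig_node shape used below), the if node's contribution could look like this:

class TypeEquation:
    def __init__(self, left, right, orig_node):
        self.left = left            # a type name like 't5'
        self.right = right          # a type name or concrete type like 'Bool'
        self.orig_node = orig_node  # the AST node, kept for error reporting

def equations_for_if(ifnode, equations):
    """Sketch: constraints contributed by 'if cond then t else e'.

    ifnode is assumed to have cond/then/els children, each already
    annotated with a .type by the previous stage.
    """
    equations.append(TypeEquation(ifnode.cond.type, 'Bool', ifnode))
    equations.append(TypeEquation(ifnode.type, ifnode.then.type, ifnode))
    equations.append(TypeEquation(ifnode.type, ifnode.els.type, ifnode))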

Now we have a list of equations, and our task is to find the most general solution, treating the equations as constraints. This is done by using the unification algorithm which I described in detail in the previous post. The solution we're seeking here is precisely the most general unifier.

For our expression, the algorithm will find the type of foo to be:

((Bool -> Bool), (Int -> Int), Int) -> Int

As expected.

If we make a slight modification to the expression to remove the comparison of x with 1:

foo f g x = if f(x) then g(x) else 20

Then we can no longer constrain the type of x, since all we know about it is that it's passed into functions f and g, and nothing else constrains the arguments of these functions. The type inference process will thus calculate this type for foo:

((a -> Bool), (a -> Int), a) -> Int

It assigns x the generic type name a, and uses it for the arguments of f and g as well.

The implementation

An implementation of microml is available here, as a self-contained Python program that parses a microml declaration and infers its type. The best starting point is main.py, which spells out the stages of type inference:

code = 'foo f g x = if f(x == 1) then g(x) else 20'
print('Code', '----', code, '', sep='\n')

# Parse the microml code snippet into an AST.
p = parser.Parser()
e = p.parse_decl(code)
print('Parsed AST', '----', e, '', sep='\n')

# Stage 1: Assign symbolic typenames
typing.assign_typenames(e.expr)
print('Typename assignment', '----',
      typing.show_type_assignment(e.expr), '', sep='\n')

# Stage 2: Generate a list of type equations
equations = []
typing.generate_equations(e.expr, equations)
print('Equations', '----', sep='\n')
for eq in equations:
    print('{:15} {:20} | {}'.format(str(eq.left), str(eq.right), eq.orig_node))

# Stage 3: Solve equations using unification
unifier = typing.unify_all_equations(equations)
print('', 'Inferred type', '----',
      typing.get_expression_type(e.expr, unifier, rename_types=True),
      sep='\n')

This will print out:

Code
----
foo f g x = if f(x == 1) then g(x) else 20

Parsed AST
----
Decl(foo, Lambda([f, g, x], If(App(f, [(x == 1)]), App(g, [x]), 20)))

Typename assignment
----
Lambda([f, g, x], If(App(f, [(x == 1)]), App(g, [x]), 20))   t0
If(App(f, [(x == 1)]), App(g, [x]), 20)                      t4
App(f, [(x == 1)])                                           t5
f                                                            t1
(x == 1)                                                     t6
x                                                            t3
1                                                            Int
App(g, [x])                                                  t7
g                                                            t2
x                                                            t3
20                                                           Int

Equations
----
Int             Int                  | 1
t3              Int                  | (x == 1)
Int             Int                  | (x == 1)
t6              Bool                 | (x == 1)
t1              (t6 -> t5)           | App(f, [(x == 1)])
t2              (t3 -> t7)           | App(g, [x])
Int             Int                  | 20
t5              Bool                 | If(App(f, [(x == 1)]), App(g, [x]), 20)
t4              t7                   | If(App(f, [(x == 1)]), App(g, [x]), 20)
t4              Int                  | If(App(f, [(x == 1)]), App(g, [x]), 20)
t0              ((t1, t2, t3) -> t4) | Lambda([f, g, x], If(App(f, [(x == 1)]), App(g, [x]), 20))

Inferred type
----
(((Bool -> Bool), (Int -> Int), Int) -> Int)

There are many more examples of type-inferred microml code snippets in the test file test_typing.py. Here's another example which is interesting:

> foo f x = if x then lambda t -> f(t) else lambda j -> f(x)
((Bool -> a), Bool) -> (Bool -> a)

The actual inference is implemented in typing.py, which is fairly well commented and should be easy to understand after reading this post. The trickiest part is probably the unification algorithm, but that one is just a slight adaptation of the algorithm presented in the previous post.


[1] After this post was published, it was pointed out that another type checking / inference technique is already called bi-directional (see this paper for example); while it's related to Hindley-Milner (HM), it's a distinct method. Therefore, my terminology here could create confusion.

I'll emphasize that my only use of the term "bi-directional" is to distinguish what HM does from the simpler "uni-directional" inference described at the beginning.


Unification



In logic and computer science, unification is a process of automatically solving equations between symbolic terms. Unification has several interesting applications, notably in logic programming and type inference. In this post I want to present the basic unification algorithm with a complete implementation.

Let's start with some terminology. We'll be using terms built from constants, variables and function applications:

  • A lowercase letter represents a constant (could be any kind of constant, like an integer or a string)
  • An uppercase letter represents a variable
  • f(...) is an application of function f to some parameters, which are terms themselves

This representation is borrowed from first-order logic and is also used in the Prolog programming language. Some examples:

  • V: a single variable term
  • foo(V, k): function foo applied to variable V and constant k
  • foo(bar(k), baz(V)): a nested function application

Pattern matching

Unification can be seen as a generalization of pattern matching, so let's start with that first.

We're given a constant term and a pattern term. The pattern term has variables. Pattern matching is the problem of finding a variable assignment that will make the two terms match. For example:

  • Constant term: f(a, b, bar(t))
  • Pattern term: f(a, V, X)

Trivially, the assignment V=b and X=bar(t) works here. Another name to call such an assignment is a substitution, which maps variables to their assigned values. In a less trivial case, variables can appear multiple times in a pattern:

  • Constant term: f(top(a), a, g(top(a)), t)
  • Pattern term: f(V, a, g(V), t)

Here the right substitution is V=top(a).

Sometimes, no valid substitutions exist. If we change the constant term in the last example to f(top(b), a, g(top(a)), t), then there is no valid substitution, because V would have to match top(b) and top(a) simultaneously, which is not possible.
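
To make this concrete, here's a sketch of a one-sided matcher in Python. It uses the Term classes (App, Var, Const) defined in the unification implementation further down in this post; the unify function shown there generalizes exactly this:

def match(constant, pattern, subst):
    """Matches a variable-free constant term against a pattern term.

    Returns a substitution (dict of name->term) extending subst, or
    None on failure. Only the pattern side may contain variables.
    """
    if subst is None:
        return None
    elif isinstance(pattern, Var):
        if pattern.name in subst:
            # Already bound: a repeated variable must match consistently.
            return subst if subst[pattern.name] == constant else None
        return {**subst, pattern.name: constant}
    elif isinstance(constant, App) and isinstance(pattern, App):
        if constant.fname != pattern.fname or len(constant.args) != len(pattern.args):
            return None
        for c, p in zip(constant.args, pattern.args):
            subst = match(c, p, subst)
        return subst
    else:
        # Two constants (or mismatched kinds): they must be equal.
        return subst if constant == pattern else None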

Unification

Unification is just like pattern matching, except that both terms can contain variables. So we can no longer say one is the pattern term and the other the constant term. For example:

  • First term: f(a, V, bar(D))
  • Second term: f(D, k, bar(a))

Given two such terms, finding a variable substitution that will make them equivalent is called unification. In this case the substitution is {D=a, V=k}.

Note that some solvable unification problems have an infinite number of possible unifiers. For example, given:

  • First term: f(X, Y)
  • Second term: f(Z, g(X))

We have the substitution {X=Z, Y=g(X)} but also something like {X=K, Z=K, Y=g(K)} and {X=j(K), Z=j(K), Y=g(j(K))} and so on. The first substitution is the simplest one, and also the most general. It's called the most general unifier or mgu. Intuitively, the mgu can be turned into any other unifier by performing another substitution. For example {X=Z, Y=g(X)} can be turned into {X=j(K), Z=j(K), Y=g(j(K))} by applying the substitution {Z=j(K)} to it. Note that the reverse doesn't work, as we can't turn the second into the first by using a substitution. So we say that {X=Z, Y=g(X)} is the most general unifier for the two given terms, and it's the mgu we want to find.

An algorithm for unification

Solving unification problems may seem simple, but there are a number of subtle corner cases to be aware of. In his 1991 paper Correcting a Widespread Error in Unification Algorithms, Peter Norvig noted a common error that exists in many books presenting the algorithm, including SICP.

The correct algorithm is based on J.A. Robinson's 1965 paper "A machine-oriented logic based on the resolution principle". More efficient algorithms have been developed over time since it was first published, but our focus here will be on correctness and simplicity rather than performance.

The following implementation is based on Norvig's, and the full code (with tests) is available on Github. This implementation uses Python 3, while Norvig's original is in Common Lisp. There's a slight difference in representations too, as Norvig uses the Lisp-y (f X Y) syntax to denote an application of function f. The two representations are isomorphic, and I'm picking the more classical one which is used in most papers on the subject. In any case, if you're interested in the more Lisp-y version, I have some Clojure code online that ports Norvig's implementation more directly.

We'll start by defining the data structure for terms:

class Term:
    pass

class App(Term):
    def __init__(self, fname, args=()):
        self.fname = fname
        self.args = args

    # Not shown here: __str__ and __eq__, see full code for the details...


class Var(Term):
    def __init__(self, name):
        self.name = name


class Const(Term):
    def __init__(self, value):
        self.value = value

An App represents the application of function fname to a sequence of arguments.
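
For example, the nested term foo(bar(k), baz(V)) from the beginning of the post is constructed as:

term = App('foo', (App('bar', (Const('k'),)),
                   App('baz', (Var('V'),))))

(The full code also has a parse_term function that builds such terms from their string representation.) With the data structures in place, here is the main unification function: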

def unify(x, y, subst):
    """Unifies term x and y with initial subst.

    Returns a subst (map of name->term) that unifies x and y, or None if
    they can't be unified. Pass subst={} if no subst are initially
    known. Note that {} means valid (but empty) subst.
    """
    if subst is None:
        return None
    elif x == y:
        return subst
    elif isinstance(x, Var):
        return unify_variable(x, y, subst)
    elif isinstance(y, Var):
        return unify_variable(y, x, subst)
    elif isinstance(x, App) and isinstance(y, App):
        if x.fname != y.fname or len(x.args) != len(y.args):
            return None
        else:
            for i in range(len(x.args)):
                subst = unify(x.args[i], y.args[i], subst)
            return subst
    else:
        return None

unify is the main function driving the algorithm. It looks for a substitution, which is a Python dict mapping variable names to terms. When either side is a variable, it calls unify_variable which is shown next. Otherwise, if both sides are function applications, it ensures they apply the same function (otherwise there's no match) and then unifies their arguments one by one, carefully carrying the updated substitution throughout the process.

def unify_variable(v, x, subst):
    """Unifies variable v with term x, using subst.

    Returns updated subst or None on failure.
    """
    assert isinstance(v, Var)
    if v.name in subst:
        return unify(subst[v.name], x, subst)
    elif isinstance(x, Var) and x.name in subst:
        return unify(v, subst[x.name], subst)
    elif occurs_check(v, x, subst):
        return None
    else:
        # v is not yet in subst and can't simplify x. Extend subst.
        return {**subst, v.name: x}

The key idea here is recursive unification. If v is bound in the substitution, we try to unify its definition with x to guarantee consistency throughout the unification process (and vice versa when x is a variable). There's another function being used here - occurs_check; I'm retaining its classical name from early presentations of unification. Its goal is to guarantee that we don't have self-referential variable bindings like X=f(X) that would lead to potentially infinite unifiers.

def occurs_check(v, term, subst):
    """Does the variable v occur anywhere inside term?

    Variables in term are looked up in subst and the check is applied
    recursively.
    """
    assert isinstance(v, Var)
    if v == term:
        return True
    elif isinstance(term, Var) and term.name in subst:
        return occurs_check(v, subst[term.name], subst)
    elif isinstance(term, App):
        return any(occurs_check(v, arg, subst) for arg in term.args)
    else:
        return False

Let's see how this code handles some of the unification examples discussed earlier in the post. Starting with the pattern matching example, where variables appear on just one side:

>>> unify(parse_term('f(a, b, bar(t))'), parse_term('f(a, V, X)'), {})
{'V': b, 'X': bar(t)}

Now the examples from the Unification section:

>>> unify(parse_term('f(a, V, bar(D))'), parse_term('f(D, k, bar(a))'), {})
{'D': a, 'V': k}
>>> unify(parse_term('f(X, Y)'), parse_term('f(Z, g(X))'), {})
{'X': Z, 'Y': g(X)}

Finally, let's try one where unification will fail due to two conflicting definitions of variable X.

>>> unify(parse_term('f(X, Y, X)'), parse_term('f(r, g(X), p)'), {})
None

It's also instructive to trace through the execution of the algorithm for a non-trivial unification to see how it works. Let's unify the terms f(X,h(X),Y,g(Y)) and f(g(Z),W,Z,X):

  • unify is called, sees the root is an App of function f and loops over the arguments.
    • unify(X, g(Z)) invokes unify_variable because X is a variable, and the result is augmenting subst with X=g(Z)
    • unify(h(X), W) invokes unify_variable because W is a variable, so the subst grows to {X=g(Z), W=h(X)}
    • unify(Y, Z) invokes unify_variable; since neither Y nor Z are in subst yet, the subst grows to {X=g(Z), W=h(X), Y=Z} (note that the binding between two variables is arbitrary; Z=Y would be equivalent)
    • unify(g(Y), X) invokes unify_variable; here things get more interesting, because X is already in the subst, so now we call unify on g(Y) and g(Z) (what X is bound to)
      • The functions match for both terms (g), so there's another loop over arguments, this time only for unifying Y and Z
      • unify_variable for Y and Z leads to lookup of Y in the subst and then unify(Z, Z), which returns the unmodified subst; the result is that nothing new is added to the subst, but the unification of g(Y) and g(Z) succeeds, because it agrees with the existing bindings in subst
  • The final result is {X=g(Z), W=h(X), Y=Z}
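
Note that in the final subst, the binding for W still mentions X. To read off a fully-resolved binding, we can chase variables through the subst with a small helper along these lines (a sketch, not part of the code shown above):

def apply_subst(term, subst):
    """Recursively replaces variables bound in subst inside term.

    Chases chains of bindings, so with {X=g(Z), W=h(X)} the term W
    fully resolves to h(g(Z)). The occurs check performed during
    unification guarantees this recursion terminates.
    """
    if isinstance(term, Var):
        if term.name in subst:
            return apply_subst(subst[term.name], subst)
        return term
    elif isinstance(term, App):
        return App(term.fname,
                   tuple(apply_subst(arg, subst) for arg in term.args))
    else:
        return term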

Efficiency

The algorithm presented here is not particularly efficient, and when dealing with large unification problems it's wise to consider more advanced options. It copies subst around too much, and it repeats work because it doesn't cache terms that have already been unified.

For a good overview of the efficiency of unification algorithms, I recommend checking out two papers:

  • "An Efficient Unification Algorithm" by Martelli and Montanari
  • "Unification: A Multidisciplinary Survey" by Kevin Knight