EOPL define-datatype and cases in Clojure

I'm going through the Essentials of Programming Languages (3rd ed.) book and it's been pretty good so far. In chapter 2, the authors use a pair of macros - define-datatype and cases - to make it easy to define data-driven programs, where objects belong to types, each of which has several "variants" with custom fields (this is essentially a macro-driven implementation of algebraic data types).

The canonical example used in chapter 2 is the "Lambda calculus expression":

(define-datatype lc-exp lc-exp?
  (var-exp
   (var symbol?))
  (lambda-exp
   (bound-var symbol?)
   (body lc-exp?))
  (app-exp
   (rator lc-exp?)
   (rand lc-exp?)))

This means we create a type named lc-exp, with three variants:

var-exp which has a field named var, a symbol.
lambda-exp which has two fields: bound-var is a symbol, and body is an lc-exp.
app-exp which has two fields: rator and rand, each of which is an lc-exp.

The define-datatype invocation creates multiple helper functions; for example, the predicate lc-exp? tests whether the object it's given is an lc-exp. It can also optionally create accessors such as app-exp->rand, which extract a field from a given variant.
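To make the generated helpers concrete, here is a hand-written sketch of roughly what they could look like for lc-exp. It is an illustration, not the output of define-datatype: it assumes objects are stored as tagged lists of the form (type variant field ...), and the assertion messages and exact checks are my own choices.

```clojure
;; Assumed representation: (type-symbol variant-symbol field1 field2 ...)

(defn lc-exp?
  "True if obj is a sequence tagged with the type symbol lc-exp."
  [obj]
  (and (sequential? obj) (= (first obj) 'lc-exp)))

(defn var-exp
  "Constructor for the var-exp variant; checks the field predicate."
  [variable]
  (assert (symbol? variable) "var-exp: var must be a symbol")
  (list 'lc-exp 'var-exp variable))

(defn lambda-exp
  "Constructor for the lambda-exp variant."
  [bound-var body]
  (assert (symbol? bound-var) "lambda-exp: bound-var must be a symbol")
  (assert (lc-exp? body) "lambda-exp: body must be an lc-exp")
  (list 'lc-exp 'lambda-exp bound-var body))

(defn app-exp
  "Constructor for the app-exp variant."
  [rator rand]
  (assert (lc-exp? rator) "app-exp: rator must be an lc-exp")
  (assert (lc-exp? rand) "app-exp: rand must be an lc-exp")
  (list 'lc-exp 'app-exp rator rand))

(defn app-exp->rand
  "Accessor: extracts the rand field from an app-exp."
  [obj]
  (assert (= (take 2 obj) '(lc-exp app-exp)) "Expected an app-exp")
  ;; Fields start at index 2, after the type and variant tags.
  (nth obj 3))

(def id-app (app-exp (lambda-exp 'x (var-exp 'x)) (var-exp 'y)))

(lc-exp? id-app)        ;; => true
(app-exp->rand id-app)  ;; => (lc-exp var-exp y)
```

Checking the field predicates in the constructors means a malformed expression is rejected when it is built rather than later, when some function tries to take it apart.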

The companion cases macro lets us organize code that operates on types created with define-datatype succinctly. For example, a function that checks whether some symbol occurs as a free variable in a given lc-exp:

(defn occurs-free?
  [search-var exp]
  (cases lc-exp exp
         (var-exp (variable) (= variable search-var))
         (lambda-exp (bound-var body)
                     (and (not (= search-var bound-var))
                          (occurs-free? search-var body)))
         (app-exp (rator rand)
                  (or
                   (occurs-free? search-var rator)
                   (occurs-free? search-var rand)))))

[Note: this is actual Clojure code from my implementation; the book uses Scheme, so it has slightly different syntax.]

Alas, while the book explains how this pair of macros works and uses them all over the place, it provides no definition. The definitions found online are either hard to hunt down or very verbose (which may be due to Scheme's use of hygienic macros).

Therefore I rolled my own, in Clojure, and the full code is available here. The code comes with a large number of unit tests, many of which are taken from the exercises in chapter 2 of the book.

It's been quite a while since I last did any serious Lispy macro hacking, so my implementation is fairly cautious in its use of macros. One cool thing about the way Clojure's (Common Lisp-like) macros work is that writing them is very close to just manipulating lists of symbols (representing code) in regular functions. Here's my define-datatype:

(defn define-datatype-aux
  "Creates a datatype from the specification. This is a function, so all its
  arguments are symbols or quoted lists. In particular, variant-descriptors is a
  quoted list of all the descriptors."
  [typename predicate-name variant-descriptors]
  ...)

(defmacro define-datatype
  "Simple macro wrapper around define-datatype-aux, so that the type name,
  predicate name and variant descriptors don't have to be quoted but rather can
  be regular Clojure symbols."
  [typename predicate-name & variant-descriptors]
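  ;; Expand into a call that hands the unevaluated forms over as quoted data,
  ;; so the helper runs when the expansion is evaluated, not during expansion.
  `(define-datatype-aux '~typename '~predicate-name '~variant-descriptors))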

All the macro does here is the one thing only macros can do: change the evaluation rules, so that the arguments passed to define-datatype are not evaluated but handed to a function as lists of symbols (code). The define-datatype-aux function can then manipulate these lists of symbols. The only problem with this approach is that while a macro can simply expand into defns, a function has to work a bit harder to add definitions to the namespace; what I use instead is:

(defn internfunc
  "Helper for interning a function with the given name (as a string) in the
  current namespace."
  [strname func]
  (intern *ns* (symbol strname) func))
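As an illustration of how a plain function can use this helper to create named functions from a variant descriptor, here is a minimal sketch. The intern-accessors function is hypothetical and is not part of my implementation as shown here; it also assumes objects are tagged lists of the form (type variant field ...).

```clojure
(defn internfunc
  "Interns func under the name strname in the current namespace."
  [strname func]
  (intern *ns* (symbol strname) func))

(defn intern-accessors
  "Hypothetical helper: for a variant descriptor such as
  (app-exp (rator lc-exp?) (rand lc-exp?)), interns one accessor per field,
  named <variant>-><field>."
  [typename [variant-name & field-specs]]
  (doseq [[idx [field-name _pred]] (map-indexed vector field-specs)]
    (internfunc (str variant-name "->" field-name)
                (fn [obj]
                  (assert (= (take 2 obj) (list typename variant-name))
                          "Unexpected type or variant")
                  ;; Fields start after the type and variant tags.
                  (nth obj (+ 2 idx))))))

;; The descriptor is ordinary quoted data, so no macro is needed here.
(intern-accessors 'lc-exp '(app-exp (rator lc-exp?) (rand lc-exp?)))

;; A later top-level form can call the newly interned functions by name.
(app-exp->rator '(lc-exp app-exp (lc-exp var-exp f) (lc-exp var-exp x)))
;; => (lc-exp var-exp f)
```

Because the descriptor is just a quoted list, a function like this can be unit-tested directly, with no macroexpansion involved.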

I'm sure the code could be made much shorter by doing more work in the macro, but writing it this way made it possible to break the implementation into a number of small and simple functions, each of which is easy to test and understand without peering into the output of macroexpand.

In the implementation of cases I was a bit braver and left more work in the macro itself:

(defn make-cond-case
  "Helper function for cases that generates a single case for the variant cond.

  variant-case is one variant case as given to the cases macro.
  obj-variant is the actual object variant (a symbol) as taken from the object.
  obj-fields is the list of the actual object's fields.

  Produces the code for '(cond-case cond-action)."
  [variant-case obj-variant obj-fields]
  `((= (quote ~(first variant-case)) ~obj-variant)
    (apply (fn [~@(second variant-case)] ~(last variant-case)) ~obj-fields)))

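(defmacro cases
  "Dispatches on the variant of obj, which must be of type typename. Each
  variant case has the form (variant-name (field ...) body); the body of the
  matching case is evaluated with the object's fields bound to the given names."
  [typename obj & variant-cases]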
  (let [obj-type-sym (gensym 'type)
        obj-variant-sym (gensym 'variant)
        obj-fields-sym (gensym 'fields)]
    `(let [[~obj-type-sym ~obj-variant-sym & ~obj-fields-sym] ~obj]
       (assert (= ~obj-type-sym (quote ~typename)) "Unexpected type")
       (cond
         ~@(mapcat (fn [vc] (make-cond-case vc obj-variant-sym obj-fields-sym))
                   variant-cases)
         :else (assert false "Unsupported variant")))))

As you can see, I still deferred some of the work to a function - make-cond-case - to avoid complex nested quoting within the macro.
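To see what the macro produces, here is occurs-free? written out by hand in roughly the shape of the expansion. This is a sketch rather than actual macroexpand output: the real expansion uses gensym'd names where I use obj-type, obj-variant and obj-fields, and the quoted sample expressions assume that objects are tagged lists of the form (type variant field ...).

```clojure
(defn occurs-free?
  [search-var exp]
  ;; Split the tagged list into type, variant and remaining fields.
  (let [[obj-type obj-variant & obj-fields] exp]
    (assert (= obj-type 'lc-exp) "Unexpected type")
    (cond
      ;; One cond clause per variant case; the fields are bound by applying
      ;; an anonymous function whose parameters are the case's field names.
      (= 'var-exp obj-variant)
      (apply (fn [variable] (= variable search-var)) obj-fields)

      (= 'lambda-exp obj-variant)
      (apply (fn [bound-var body]
               (and (not (= search-var bound-var))
                    (occurs-free? search-var body)))
             obj-fields)

      (= 'app-exp obj-variant)
      (apply (fn [rator rand]
               (or (occurs-free? search-var rator)
                   (occurs-free? search-var rand)))
             obj-fields)

      :else (assert false "Unsupported variant"))))

;; y is free in (lambda (x) (x y)):
(occurs-free? 'y '(lc-exp lambda-exp x
                          (lc-exp app-exp (lc-exp var-exp x) (lc-exp var-exp y))))
;; => true

;; x is bound in (lambda (x) x):
(occurs-free? 'x '(lc-exp lambda-exp x (lc-exp var-exp x)))
;; => false
```

Using apply with an anonymous function binds each field to its name positionally, which is why the field names in a variant case do not have to match the names used in define-datatype.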

The full code is on GitHub.