Notes on debugging Clojure code - Eli Bendersky's website

Clojure is a great programming language, but a recurring complaint one keeps hearing from developers hacking on Clojure code is that debugging can be unpleasant. First of all, I agree! Debugging Clojure code can be more daunting on average than, say, debugging Python code. This is mainly due to two reasons:

Clojure's Java legacy. Clojure is compiled to Java bytecode, which has some terminology and idiosyncracies Clojure programmers aren't always familiar with. These terms tend to pop up in stack traces and cause confusion (e.g. IFN).
Clojure - being a Lisp - has a certain code structure which is different from, say, a more common imperative coding style. Rather than being a sequence of statements, Clojure programs tend to involve long call chains of nested expressions. Where only part of an expression fails, it's often non-trivial to figure out why.

In this post I want to share some notes from my own experience debugging Clojure programs.

Dealing with Clojure's cryptic exceptions

The first problem with Clojure's runtime exceptions is that we usually don't get to see the full stack trace by default. Let's say we have this silly, nonsensical, function in a file called sample.clj:

(defn foo
  [n]
  (cond (> n 40) (+ n 20)
        (> n 20) (- (first n) 20)
        :else 0))

Then to try how it works, we load the file into the REPL and type the following [1]:

debugging.core=> (foo 24)
IllegalArgumentException Don't know how to create ISeq from: java.lang.Long
  clojure.lang.RT.seqFrom (RT.java:542)

Uh oh. There are two problems here. First, what does this error message mean? What's ISeq and what's java.lang.Long? Second, it's not clear where it is actually failing (thanks for that pointer to RT.java though, Clojure!) Let's address the second problem first. The magic incantation to show the stack trace of the last exception is calling the pst function:

debugging.core=> (pst)
IllegalArgumentException Don't know how to create ISeq from: java.lang.Long
  clojure.lang.RT.seqFrom (RT.java:542)
  clojure.lang.RT.seq (RT.java:523)
  clojure.lang.RT.first (RT.java:668)
  clojure.core/first--4339 (core.clj:55)
  clojure.core/first--4339 (core.clj:55)
  debugging.sample/foo (sample.clj:10)
  debugging.sample/foo (sample.clj:7)
  debugging.core/eval13715 (form-init6539101589609174055.clj:1)
  debugging.core/eval13715 (form-init6539101589609174055.clj:1)
  clojure.lang.Compiler.eval (Compiler.java:6927)
  clojure.lang.Compiler.eval (Compiler.java:6890)
  clojure.core/eval (core.clj:3105)

This is much better because at least some files in this trace are familiar. core.clj is not our core.clj, it's Clojure's core library. But sample.clj is our file, and we can infer that on line 10 we call clojure,core/first and something goes wrong. Line 10 happens to be:

(> n 20) (- (first n) 20)

So now things become more clear. The call (first n) must be bad, and bad in a way that tries to coerce clojure into creating an ISeq from a Long. In other words, we're passing a number into a function that expects a sequence, and this is, indeed, bad. Learning to map from Clojure values and types to the JVM's expectations will take time and grit - especially if you (like me) don't have much Java experience. I suggest doing a bit of reading on Clojure/Java interoperability, and about other Java-isms Clojure inherits; it ain't pretty, and you may not always want to use it, but being familiar with the terms can go a long way in deciphering cryptic stack traces.

For a more detailed treatment of this debugging issue I highly recommend Aphyr's article on debugging Clojure.

Finding which form an exception comes from

Let's invoke the foo function in a different way that demonstrates another issue with debugging Clojure:

debugging.core=> (foo nil)

NullPointerException   clojure.lang.Numbers.ops (Numbers.java:1013)

OK, we know what to do next:

debugging.core=> (pst)
NullPointerException
  clojure.lang.Numbers.ops (Numbers.java:1013)
  clojure.lang.Numbers.gt (Numbers.java:229)
  clojure.lang.Numbers.gt (Numbers.java:3864)
  debugging.sample/foo (sample.clj:9)
  debugging.sample/foo (sample.clj:7)
  debugging.core/eval14693 (form-init6539101589609174055.clj:1)
  debugging.core/eval14693 (form-init6539101589609174055.clj:1)
  clojure.lang.Compiler.eval (Compiler.java:6927)
  clojure.lang.Compiler.eval (Compiler.java:6890)
  clojure.core/eval (core.clj:3105)
  clojure.core/eval (core.clj:3101)
  clojure.main/repl/read-eval-print--7408/fn--7411 (main.clj:240)

So the exception comes from line 9, which is:

(cond (> n 40) (+ n 20)

This exception also tells us it comes from clojure.lang.Numbers.gt from which we can infer it's the > operator that is complaining. But imagine for a second that we had two forms with the same operator on that line:

(cond (> c 40) (* (+ n 20) (+ m 30)))

If we got a NullPointerException about an addition, we wouldn't know which one fails. Luckily, Clojure comes with a very useful module that helps debugging - tools.trace. In this particular case, we'd use the trace-forms macro which tells us which nested form (expression) is failing. We can modify our function to be:

(defn foo
  [n]
  (trace-forms (cond (> n 40) (+ n 20)
                     (> n 20) (- (first n) 20)
                     :else 0)))

And now when called with nil, we get:

debugging.core=> (foo nil)
NullPointerException : No message attached to throwable java.lang.NullPointerException
  Form failed: (> n 40)
  Form failed: (if
 (> n 40)
 (+ n 20)
 (clojure.core/cond (> n 20) (- (first n) 20) :else 0))
  Form failed: (cond (> n 40) (+ n 20) (> n 20) (- (first n) 20) :else 0)
  clojure.lang.Numbers.ops (Numbers.java:1013)

Neat, huh? trace-forms breaks the form it traces to all the nested forms and reports precisely which one failed - propagating this information upwards towards the top form [2]. trace-forms is very useful when errors manifest as exceptions.

Unfortunately, this isn't sufficient for all cases. Our foo wasn't designed to handle nils, and the bug here is in the place where the nil came from. This may be quite a bit removed - and not on the same stack trace - from where foo is invoked. We'll get an exception when foo is called, but the real challenge is to find where the nil came from. More generally, bugs that manifest as thrown exceptions are the easier kind of bugs. The more insidious bugs hide in programs that run just fine end-to-end but compute slightly incorrect results.

Tracing and logging

This gets us into the more general domain of debugging, where the tricks and tools programmers use are as varied as the bugs hiding in our programs. When it comes to debugging, I'm firmly in the printf camp; I rarely prefer debuggers over printf-based debugging [3], and Clojure is no exception. In fact, due to the way Clojure programs look (nested forms), I find that debuggers are even less useful in Clojure than in other languages. On the other hand, Clojure's macros make it possible to trace / print stuff in a very nice way.

For example, I find that it's useful to be able to turn debugging printouts on and off frequently. So I have this trusty code in my utilities:

(def ^:dynamic *verbose* false)

(defmacro printfv
  [fmt & args]
  `(when *verbose*
     (printf ~fmt ~@args)))

Calls to printfv can be freely scattered around the code; by default, they will not print anything. When I do want to see what these printfvs have to say, another macro comes useful:

(defmacro with-verbose
  [& body]
  `(binding [*verbose* true] ~@body))

Here's how it works; Suppose we've written this factorial function, with a debugging printout:

(defn factorial
  [n]
  (printfv "factorial: %d%n" n)
  (if (< n 1)
    1
    (* n (factorial (- n 1)))))

Now, if we just call it as usual from the REPL, we get:

debugging.core=> (factorial 6)
720

But if we want to actually see the debugging output, we call:

debugging.core=> (with-verbose (factorial 6))
factorial: 6
factorial: 5
factorial: 4
factorial: 3
factorial: 2
factorial: 1
factorial: 0
720

This optional verbosity is perfect when you're in the middle of a furious bug hunt, adding printfvs in many places in your code. with-verbose can turn verbose logging on selectively and control the amount of debugging spew [4].

This example brings us back to the tools.trace library, which provides another awesome tool that helps trace function calls (the bread and butter of Clojure programs). Enter trace-vars. After importing it, all we need to do is invoke it on any functions we want traced; for example:

debugging.core=> (trace-vars factorial)
#'debugging.core/factorial

And now invoking our factorial produces:

debugging.core=> (factorial 6)
TRACE t16315: (debugging.core/factorial 6)
TRACE t16316: | (debugging.core/factorial 5)
TRACE t16317: | | (debugging.core/factorial 4)
TRACE t16318: | | | (debugging.core/factorial 3)
TRACE t16319: | | | | (debugging.core/factorial 2)
TRACE t16320: | | | | | (debugging.core/factorial 1)
TRACE t16321: | | | | | | (debugging.core/factorial 0)
TRACE t16321: | | | | | | => 1
TRACE t16320: | | | | | => 1
TRACE t16319: | | | | => 2
TRACE t16318: | | | => 6
TRACE t16317: | | => 24
TRACE t16316: | => 120
TRACE t16315: => 720
720

We get to see the full call tree, including values of parameters and what each call returns. It even works for mutually-recursive functions:

(defn iseven?
  [n]
  (if (= n 0)
    true
    (isodd? (- n 1))))

(defn isodd?
  [n]
  (if (= n 0)
    false
    (iseven? (- n 1))))

Let's try it:

debugging.core=> (trace-vars iseven? isodd?)
#'debugging.core/isodd?
debugging.core=> (iseven? 7)
TRACE t16332: (debugging.core/iseven? 7)
TRACE t16333: | (debugging.core/isodd? 6)
TRACE t16334: | | (debugging.core/iseven? 5)
TRACE t16335: | | | (debugging.core/isodd? 4)
TRACE t16336: | | | | (debugging.core/iseven? 3)
TRACE t16337: | | | | | (debugging.core/isodd? 2)
TRACE t16338: | | | | | | (debugging.core/iseven? 1)
TRACE t16339: | | | | | | | (debugging.core/isodd? 0)
TRACE t16339: | | | | | | | => false
TRACE t16338: | | | | | | => false
TRACE t16337: | | | | | => false
TRACE t16336: | | | | => false
TRACE t16335: | | | => false
TRACE t16334: | | => false
TRACE t16333: | => false
TRACE t16332: => false
false

Note how easy it to see what calls what. Quite often, bugs are uncovered simply by carefully studying the chain of function calls some input tickles in our code, and trace-vars is a very low-effort method to enable this kind of debugging.

Deeper tracing inside `cond` forms

Tracing function calls is great, but sometimes insufficient. It's not uncommon to have cond forms in functions, and sometimes it's pretty hard to know which condition was actually "taken" (this isn't always easy to infer from the return value of the function). We've seen how to explore where exceptions come from with trace-forms, but exceptions are just one kind of problem. The more difficul problem arises when the code throws no exceptions but still produces a wrong value.

I've mentioned how Clojure's macro superpowers let us write very powerful debugging tools. What follows is another example.

Consider this toy code:

(cond (> 10 20) (+ 10 20)
      (> 20 10) (- 20 10)
      :else 200)

It happens to return 10 since the second condition fires. But suppose it stands for a much more complicated cond where it's not obvious which condition was taken and where the return value came from. How do we go about debugging this?

Well, we can always add a printfv into every result expression (possibly wrapping in a do form) and see what fires. This would work, but it's quite tiresome, especially for large conds. To do this automatically, we can write the following macro:

(defmacro condv
  [& clauses]
  (when clauses
    (list
     'if
     (first clauses)
     (if (next clauses)
       `(do (println (str "condv " '~(first clauses)))
            ~(second clauses))
       (throw (IllegalArgumentException.
               "cond requires an even number of forms")))
     (cons 'condv (next (next clauses))))))

It behaves just like cond, while also printing out the condition that fired. If we replace the cond in the original example with condv and evaluate it, we'll get:

debugging.core=> (condv (> 10 20) (+ 10 20)
            #_=>        (> 20 10) (- 20 10)
            #_=>        :else 200)
condv (> 20 10)
10

Note the printout before the return value of 10: condv (> 20 10) - it shows us exactly which condition was taken.

Conclusion

While beginning Clojure programmers may find the debugging experience challenging, I believe that with some effort and perseverance it's possible to get used to the unusual environment and even reach new levels of productivity by developing a set of debugging tools and techniques.

In this endeavor, Clojure's macro capabilities are an extremely powerful ally. Coupled with a fast edit-rerun cycle in the REPL, such tools can turn Clojure debugging into a much less painful activity.

[1]	Alternatively, we can evaluate the same expression somewhere in our editor using a Clojure plugin (such as `vim-fireplace` for Vim).

[2]	The astute reader will notice a slight discrepancy between our code and the output of `trace-form`. We don't have an `if` form, or do we? Quiz: what does `cond` expand to? Complex interactions between macros and functions is yet another reason debugging Clojure code is sometimes hard...

[3]	In my professional life I spent far more time writing debuggers than actually using them.

[4]	This method is only recommended when the debugging prinouts are destined to be eventually eliminated from the code. For more permanent logging with more verbosity controls, consider using a proper logging library like tools.logging.

Dealing with Clojure's cryptic exceptions

Finding which form an exception comes from

Tracing and logging

Deeper tracing inside cond forms

Conclusion

Deeper tracing inside `cond` forms