Right and left folds, primitive recursion patterns in Python and Haskell

A "fold" is a fundamental primitive in defining operations on data structures; it's particularly important in functional languages where recursion is the default tool to express repetition. In this article I'll present how left and right folds work and how they map to some fundamental recursive patterns.

The article starts with Python, which should be (or at least look) familiar to most programmers. It then switches to Haskell for a discussion of more advanced topics like the connection between folding and laziness, as well as monoids.

Extracting a fundamental recursive pattern

Let's begin by defining a couple of straightforward functions in a recursive manner, in Python. First, computing the product of all the numbers in a given list:

def product(seq):
    if not seq:
        return 1
        return seq[0] * product(seq[1:])

Needless to say, we wouldn't really write this function recursively in Python; but if we were, this is probably how we'd write it.

Now another, slightly different, function. How do we double (multiply by 2) every element in a list, recursively?

def double(seq):
    if not seq:
        return []
        return [seq[0] * 2] + double(seq[1:])

Again, ignoring the fact that Python has much better ways to do this (list comprehensions, for example), this is a straightforward recursive pattern that experienced programmers can produce in their sleep.

In fact, there's a lot in common between these two implementation. Let's try to find the commonalities.

Recursion pattern in the product function

As this diagram shows, the functions product and double are really only different in three places:

  1. The initial value produced when the input sequence is empty.
  2. The mapping applied to every sequence value processed.
  3. The combination of the mapped sequence value with the rest of the sequence.

For product:

  1. The initial value is 1.
  2. The mapping is identity (each sequence element just keeps its value, without change).
  3. The combination is the multiplication operator.

Can you figure out the same classification for double? Take a few moments to try for yourself. Here it is:

  1. The initial value is the empty list [].
  2. The mapping takes a value, multiplies it by 2 and puts it into a list. We could express this in Python as lambda x: [x * 2].
  3. The combination is the list concatenation operator +.

With the diagram above and these examples, it's straightforward to write a generalized "recursive transform" function that can be used to implement both product and double:

def transform(init, mapping, combination, seq):
    if not seq:
        return init
        return combination(mapping(seq[0]),
                           transform(init, mapping, combination, seq[1:]))

The transform function is parameterized with init - the initial value, mapping- a mapping function applied to every sequence value, and combination - the combination of the mapped sequence value with the rest of the sequence. With these given, it implements the actual recursive traversal of the list.

Here's how we'd write product in terms of transform:

def product_with_transform(seq):
    return transform(1, lambda x: x, lambda a, b: a * b, seq)

And double:

def double_with_transform(seq):
    return transform([], lambda x: [x * 2], lambda a, b: a + b, seq)

foldr - fold right

Generalizations like transform make functional programming fun and powerful, since they let us express complex ideas with the help of relatively few building blocks. Let's take this idea further, by generalizing transform even more. The main insight guiding us is that the mapping and the combination don't even have to be separate functions. A single function can play both roles.

In the definition of transform, combination is applied to:

  1. The result of calling mapping on the current sequence value.
  2. The recursive application of the transformation to the rest of the sequence.

We can encapsulate both in a function we call the "reduction function". This reduction function takes two arguments: the current sequence value (item), and the result of applying the full transfromation to the rest of the sequence. The driving transformation that uses this reduction function is called "a right fold" (or foldr):

def foldr(func, init, seq):
    if not seq:
        return init
        return func(seq[0], foldr(func, init, seq[1:]))

We'll get to why this is called "fold" shortly; first, let's convince ourselves it really works. Here's product implemented using foldr:

def product_with_foldr(seq):
    return foldr(lambda seqval, acc: seqval * acc, 1, seq)

The key here is the func argument. In the case of product, it "reduces" the current sequence value with the "accumulator" (the result of the overall transformation invoked on the rest of the sequence) by multiplying them together. The overall result is a product of all the elements in the list.

Let's trace the calls to see the recursion pattern. I'll be using the tracing technique described in this post. For this purpose I hoisted the reducing function into a standalone function called product_reducer:

def product_reducer(seqval, acc):
    return seqval * acc

def product_with_foldr(seq):
    return foldr(product_reducer, 1, seq)

The full code for this experiment is available here. Here's the tracing of invoking product_with_foldr([2, 4, 6, 8]):

product_with_foldr([2, 4, 6, 8])
  foldr(<function product_reducer at 0x7f3415145ae8>, 1, [2, 4, 6, 8])
    foldr(<function product_reducer at 0x7f3415145ae8>, 1, [4, 6, 8])
      foldr(<function product_reducer at 0x7f3415145ae8>, 1, [6, 8])
        foldr(<function product_reducer at 0x7f3415145ae8>, 1, [8])
          foldr(<function product_reducer at 0x7f3415145ae8>, 1, [])
          --> 1
          product_reducer(8, 1)
          --> 8
        --> 8
        product_reducer(6, 8)
        --> 48
      --> 48
      product_reducer(4, 48)
      --> 192
    --> 192
    product_reducer(2, 192)
    --> 384
  --> 384

The recursion first builds a full stack of calls for every element in the sequence, until the base case (empty list) is reached. Then the calls to product_reducer start executing. The first reduces 8 (the last element in the list) with 1 (the result of the base case). The second reduces this result with 6 (the second-to-last element in the list), and so on until we reach the final result.

Since foldr is just a generic traversal pattern, we can say that the real work here happens in the reducers. If we build a tree of invocations of product_reducer, we get:

foldr mul tree

And this is why it's called the right fold. It takes the rightmost element and combines it with init. Then it takes the result and combines it with the second rightmost element, and so on until the first element is reached.

More general operations with foldr

We've seen how foldr can implement all kinds of functions on lists by encapsulating a fundamental recursive pattern. Let's see a couple more examples. The function double shown above is just a special case of the functional map primitive:

def map(mapf, seq):
    if not seq:
        return []
        return [mapf(seq[0])] + map(mapf, seq[1:])

Instead of applying a hardcoded "multiply by 2" function to each element in the sequence, map applies a user-provided unary function. Here's map implemented in terms of foldr:

def map_with_foldr(mapf, seq):
    return foldr(lambda seqval, acc: [mapf(seqval)] + acc, [], seq)

Another functional primitive that we can implement with foldr is filter. This one is just a bit trickier because we sometimes want to "skip" a value based on what the filtering predicate returns:

def filter(predicate, seq):
    if not seq:
        return []
        maybeitem = [seq[0]] if predicate(seq[0]) else []
        return maybeitem + filter(predicate, seq[1:])

Feel free to try to rewrite it with foldr as an exercise before looking at the code below. We just follow the same pattern:

def filter_with_foldr(predicate, seq):
    def reducer(seqval, acc):
        if predicate(seqval):
            return [seqval] + acc
            return acc
    return foldr(reducer, [], seq)

We can also represent less "linear" operations with foldr. For example, here's a function to reverse a sequence:

def reverse_with_foldr(seq):
    return foldr(lambda seqval, acc: acc + [seqval], [], seq)

Note how similar it is to map_with_foldr; only the order of concatenation is flipped.

Left-associative operations and foldl

Let's probe at some of the apparent limitations of foldr. We've seen how it can be used to easily compute the product of numbers in a sequence. What about a ratio? For the list [3, 2, 2] the ratio is "3 divided by 2, divided by 2", or 0.75 [1].

If we take product_with_foldr from above and replace * by /, we get:

>>> foldr(lambda seqval, acc: seqval / acc, 1, [3, 2, 2])

What gives? The problem here is the associativity of the operator /. Take another look at the call tree diagram shown above. It's obvious this diagram represents a right-associative evaluation. In other words, what our attempt at a ratio did is compute 3 / (2 / 2), which is indeed 3.0; instead, what we'd like is (3 / 2) / 2. But foldr is fundamentally folding the expression from the right. This works well for associative operations like + or * (operations that don't care about the order in which they are applied to a sequence), and also for right-associative operations like exponentiation, but it doesn't work that well for left-associative operations like / or -.

This is where the left fold comes in. It does precisely what you'd expect - folds a sequence from the left, rather than from the right. I'm going to leave the division operation for later [2] and use another example of a left-associative operation: converting a sequence of digits into a number. For example [2, 3] represents 23, [3, 4, 5, 6] represents 3456, etc. (a related problem which is more common in introductory programming is converting a string that contains a number into an integer).

The basic reducing operation we'll use here is: acc * 10 + sequence value. To get 3456 from [3, 4, 5, 6] we'll compute:

(((((3 * 10) + 4) * 10) + 5) * 10) + 6

Note how this operation is left-associative. Reorganizing the parens to a rightmost-first evaluation would give us a completely different result.

Without further ado, here's the left fold:

def foldl(func, init, seq):
    if not seq:
        return init
        return foldl(func, func(init, seq[0]), seq[1:])

Note that the order of calls between the recursive call to itself and the call to func is reversed vs. foldr. This is also why it's customary to put acc first and seqval second in the reducing functions passed to foldl.

If we perform multiplication with foldl:

def product_with_foldl(seq):
    return foldl(lambda acc, seqval: acc * seqval, 1, seq)

We'll get this trace:

product_with_foldl([2, 4, 6, 8])
  foldl(<function product_reducer at 0x7f2924cbdc80>, 1, [2, 4, 6, 8])
    product_reducer(1, 2)
    --> 2
    foldl(<function product_reducer at 0x7f2924cbdc80>, 2, [4, 6, 8])
      product_reducer(2, 4)
      --> 8
      foldl(<function product_reducer at 0x7f2924cbdc80>, 8, [6, 8])
        product_reducer(8, 6)
        --> 48
        foldl(<function product_reducer at 0x7f2924cbdc80>, 48, [8])
          product_reducer(48, 8)
          --> 384
          foldl(<function product_reducer at 0x7f2924cbdc80>, 384, [])
          --> 384
        --> 384
      --> 384
    --> 384
  --> 384

Contrary to the right fold, the reduction function here is called immediately for each recursive step, rather than waiting for the recursion to reach the end of the sequence first. Let's draw the call graph to make the folding-from-the-left obvious:

foldl mul tree

Now, to implement the digits-to-a-number function task described earlier, we'll write:

def digits2num_with_foldl(seq):
    return foldl(lambda acc, seqval: acc * 10 + seqval, 0, seq)

Stepping it up a notch - function composition with foldr

Since we're looking at functional programming primitives, it's only natural to consider how to put higher order functions to more use in combination with folds. Let's see how to express function composition; the input is a sequence of unary functions: [f, g, h] and the output is a single function that implements f(g(h(...))). Note this operation is right-associative, so it's a natural candidate for foldr:

identity = lambda x: x

def fcompose_with_foldr(fseq):
    return foldr(lambda seqval, acc: lambda x: seqval(acc(x)), identity, fseq)

In this case seqval and acc are both functions. Each step in the fold consumes a new function from the sequence and composes it on top of the accumulator (which is the function composed so far). The initial value for this fold has to be the identity for the composition operation, which just happens to be the identity function.

>>> f = fcompose_with_foldr([lambda x: x+1, lambda x: x*7, lambda x: -x])
>>> f(8)

Let's take this trick one step farther. Recall how I said foldr is limited to right-associative operations? Well, I lied a little. While it's true that the fundamental recursive pattern expressed by foldr is right-associative, we can use the function composition trick to evaluate some operation on a sequence in a left-associative way. Here's the digits-to-a-number function with foldr:

def digits2num_with_foldr(seq):
    composed = foldr(
                lambda seqval, acc: lambda n: acc(n * 10 + seqval),
    return composed(0)

To understand what's going on, manually trace the invocation of this function on some simple sequence like [1, 2, 3]. The key to this approach is to recall that foldr gets to the end of the list before it actually starts applying the function it folds. The following is a careful trace of what happens, with the folded function replaced by g for clarify.

digits2num_with_foldl([1, 2, 3])
-> foldr(g, identity, [1, 2, 3])
-> g(1, foldr(g, identity, [2, 3]))
-> g(1, g(2, foldr(g, identity, [3])))
-> g(1, g(2, g(3, foldr(g, identity, []))))
-> g(1, g(2, g(3, identity)))
-> g(1, g(2, lambda n: identity(n * 10 + 3)))

Now things become a bit trickier to track because of the different anonymous functions and their bound variables. It helps to give these function names.

<f1 = lambda n: identity(n * 10 + 3)>
-> g(1, g(2, f1))
-> g(1, lambda n: f1(n * 10 + 2))
<f2 = lambda n: f1(n * 10 + 2)>
-> g(1, f2)
-> lambda n: f2(n * 10 + 1)

Finally, we invoke this returned function on 0:

f2(0 * 10 + 1)
-> f1(1 * 10 + 2)
-> identity(12 * 10 + 3)
-> 123

In other words, the actual computation passed to that final identity is:

((1 * 10) + 2) * 10 + 3

Which is the left-associative application of the folded function.

Expressing foldl with foldr

After the last example, it's not very surprising that we can take this approach to its logical conclusion and express the general foldl by using foldr. It's just a generalization of digits2num_with_foldr:

def foldl_with_foldr(func, init, seq):
    composed = foldr(
                lambda seqval, acc: lambda n: acc(func(n, seqval)),
    return composed(init)

In fact, the pattern expressed by foldr is very close to what is called primitive recursion by Stephen Kleene in his 1952 book Introduction to Metamathematics. In other words, foldr can be used to express a wide range of recursive patterns. I won't get into the theory here, but Graham Hutton's article A tutorial on the universality and expressiveness of fold is a good read.

foldr and foldl in Haskell

Now I'll switch gears a bit and talk about Haskell. Writing transformations with folds is not really Pythonic, but it's very much the default Haskell style. In Haskell recursion is the way to iterate.

Haskell is a lazily evaluated language, which makes the discussion of folds a bit more interesting. While this behavior isn't hard to emulate in Python, the Haskell code dealing with folds on lazy sequences is pleasantly concise and clear.

Let's starts by implementing product and double - the functions this article started with. Here's the function computing a product of a sequence of numbers:

myproduct [] = 1
myproduct (x:xs) = x * myproduct xs

And a sample invocation:

*Main> myproduct [2,4,6,8]

The function doubling every element in a sequence:

mydouble [] = []
mydouble (x:xs) = [2 * x] ++ mydouble xs

Sample invocation:

*Main> mydouble [2,4,6,8]

IMHO, the Haskell variants of these functions make it very obvious that a right-fold recursive pattern is in play. The pattern matching idiom of (x:xs) on sequences splits the "head" from the "tail" of the sequence, and the combining function is applied between the head and the result of the transformation on the tail. Here's foldr in Haskell, with a type declaration that should help clarify what goes where:

myfoldr :: (b -> a -> a) -> a -> [b] -> a
myfoldr _ z [] = z
myfoldr f z (x:xs) = f x (myfoldr f z xs)

If you're not familiar with Haskell this code may look foreign, but it's really a one-to-one mapping of the Python code for foldr, using some Haskell idioms like pattern matching.

These are the product and doubling functions implemented with myfoldr, using currying to avoid specifying the last parameter:

myproductWithFoldr = myfoldr (*) 1

mydoubleWithFoldr = myfoldr (\x acc -> [2 * x] ++ acc) []

Haskell also has a built-in foldl which performs the left fold. Here's how we could write our own:

myfoldl :: (a -> b -> a) -> a -> [b] -> a
myfoldl _ z [] = z
myfoldl f z (x:xs) = myfoldl f (f z x) xs

And this is how we'd write the left-associative function to convert a sequence of digits into a number using this left fold:

digitsToNumWithFoldl = myfoldl (\acc x -> acc * 10 + x) 0

Folds, laziness and infinite lists

Haskell evaluates all expressions lazily by default, which can be either a blessing or a curse for folds, depending on what we need to do exactly. Let's start by looking at the cool applications of laziness with foldr.

Given infinite lists (yes, Haskell easily supports infinite lists because of laziness), it's fairly easy to run short-circuiting algorithms on them with foldr. By short-circuiting I mean an algorithm that terminates the recursion at some point throughout the list, based on a condition.

As a silly but educational example, consider doubling every element in a sequence but only until a 5 is encountered, at which point we stop:

> foldr (\x acc -> if x == 5 then [] else [2 * x] ++ acc) [] [1,2,3,4,5,6,7]

Now let's try the same on an infinite list:

> foldr (\x acc -> if x == 5 then [] else [2 * x] ++ acc) [] [1..]

It terminates and returns the right answer! Even though our earlier stack trace of folding makes it appear like we iterate all the way to the end of the input list, this is not the case for our folding function. Since the folding function doesn't use acc when x == 5, Haskell won't evaluate the recursive fold further [3].

The same trick will not work with foldl, since foldl is not lazy in its second argument. Because of this, Haskell programmers are usually pointed to foldl', the eager version of foldl, as the better option. foldl' evaluates its arguments eagerly, meaning that:

  1. It won't support infinite sequences (but neither does foldl!)
  2. It's significantly more efficient than foldl because it can be easily turned into a loop (note that the recursion in foldl is a tail call, and the eager foldl' doesn't have to build a thunk of increasing size due to laziness in the first argument).

There is also an eager version of the right fold - foldr', which can be more efficient than foldr in some cases; it's not in Prelude but can be imported from Data.Foldable [4].

Folding vs. reducing

Our earlier discussion of folds may have reminded you of the reduce built-in function, which seems to be doing something similar. In fact, Python's reduce implements the left fold where the first element in the sequence is used as the zero value. One nice property of reduce is that it doesn't require an explicit zero value (though it does support it via an optional parameter - this can be useful when the sequence is empty, for example).

Haskell has its own variations of folds that implement reduce - they have the digit 1 as suffix: foldl1 is the more direct equivalent of Python's reduce - it doesn't need an initializer and folds the sequence from the left. foldr1 is similar, but folds from the right. Both have eager variants: foldl1' and foldr1'.

I promised to revisit calculating the ratio of a sequence; here's a way, in Haskell:

myratioWithFoldl = foldl1 (/)

The problem with using a regular foldl is that there's no natural identity value to use on the leftmost side of a ratio (on the rightmost side 1 works, but the associativity is wrong). This is not an issue for foldl1, which starts the recursion with the first item in the sequence, rather than an explicit initial value.

*Main> myratioWithFoldl [3,2,2]

Note that foldl1 will throw an exception if the given sequence is empty, since it needs at least one item in there.

Folding arbitrary data structures

The built-in folds in Haskell are defined on lists. However, lists are not the only data structure we should be able to fold. Why can't we fold maps (say, summing up all the keys), or even custom data structures? What is the minimum amount of abstraction we can extract to enable folding?

Let's start by defining a simple binary tree data structure:

data Tree a = Empty | Leaf a | Node a (Tree a) (Tree a)
  deriving Show

-- A sample tree with a few nodes
t1 = Node 10 (Node 20 (Leaf 4) (Leaf 6)) (Leaf 7)

Suppose we want to fold the tree with (+), summing up all the values contained within it. How do we go about it? foldr or foldl won't cut it here - they expect [a], not Tree a. We could try to write our own foldr:

foldTree :: (b -> a -> a) -> a -> Tree b -> a
foldTree _ z Empty = z
foldTree f z (Leaf x) = ??
foldTree f (Node x left right) = ??

There's a problem, however. Since we want to support an arbitrary folding result, we're not quite sure what to substitute for the ??s in the code above. In foldr, the folding function takes the accumulator and the next value in the sequence, but for trees it's not so simple. We may encounter a single leaf, and we may encounter several values to summarize; for the latter we have to invoke f on x as well as on the result of folding left and right. So it's not clear what the type of f should be - (b -> a -> a) doesn't appear to work [5].

A useful Haskell abstraction that can help us solve this problem is Monoid. A monoid is any data type that has an identity element (called mempty) and an associative binary operation called mappend. Monoids are, therefore, amenable to "summarization".

foldTree :: Monoid a => (b -> a) -> Tree b -> a
foldTree _ Empty = mempty
foldTree f (Leaf x) = f x
foldTree f (Node x left right) = (foldTree f left) <> f x <> (foldTree f right)

We no longer need to pass in an explicit zero element: since a is a Monoid, we have its mempty. Also, we can now apply a single (b -> a) function onto every element in the tree, and combine the results together into a summary using a's mappend (<> is the infix synonym of mappend).

The challenge of using foldTree is that we now actually need to use a unary function that returns a Monoid. Luckily, Haskell has some useful built-in monoids. For example, Data.Monoid.Sum wraps numbers into monoids under addition. We can find the sum of all elements in our tree t1 using foldTree and Sum:

> foldrTree Sum t1
Sum {getSum = 47}

Similarly, Data.Monoid.Product wraps numbers into monoids under multiplication:

> foldrTree Product t1
Product {getProduct = 33600}

Haskell provides a built-in typeclass named Data.Foldable that only requires us to implement a similar mapping function, and then takes care of defining many folding methods. Here's the instance for our tree:

instance Foldable Tree where
  foldMap f Empty = mempty
  foldMap f (Leaf x) = f x
  foldMap f (Node x left right) = foldMap f left <> f x <> foldMap f right

And we'll automatically have foldr, foldl and other folding methods available on Tree objects:

> Data.Foldable.foldr (+) 0 t1

Note that we can pass a regular binary (+) here; Data.Foldable employs a bit of magic to turn this into a properly monadic operation. We get many more useful methods on trees just from implementing foldMap:

> Data.Foldable.toList t1
> Data.Foldable.elem 6 t1

It's possible that for some special data structure these methods can be implemented more efficiently than by inference from foldMap, but nothing is stopping us from redefining specific methods in our Foldable instance. It's pretty cool, however, to see just how much functionality can be derived from having a single mapping method (and the Monoid guarantees) defined. See the documentation of Data.Foldable for more details.

[1]Note that I'm using Python 3 for all the code in this article; hence, Python 3's division semantics apply.
[2]Division has a problem with not having a natural "zero" element; therefore, it's more suitable for foldl1 and reduce, which are described later on.
[3]I'm prefixing most functions here with my since they have Haskell standard library builtin equivalents; while it's possible to avoid the name clashes with some import tricks, custom names are the least-effort approach, also for copy-pasting these code snippets into a REPL.
[4]I realize this is a very rudimentary explanation of Haskell laziness, but going deeper is really out of scope of this article. There are plenty of resources online to read about lazy vs. eager evaluation, if you're interested.
[5]We could try to apply f between the leaf value and z, but it's not clear in what order this should be done (what if f is sensitive to order?). Similarly for a Node, since there are no guarantees on the associativity of f, it's hard to predict what is the right way of applying it multiple times.

Clojure - the perfect language to expand your brain?

I sometimes get asked for recommendations of what programming language to "learn next". It's great that folks appreciate the fact that being exposed to several sufficiently different programming languages is an important step on the path to programming enlightenment. But it's also true that, given limited free time, the choice of which languages to learn is important. As the famous quote by Alan Perlis says:

A language that doesn't affect the way you think about programming, is not worth knowing.

In this post I want to explain why I think Clojure is a terrific language for this task. Whatever your background, it's almost guaranteed to affect the way you think about programming. It's a fairly new language (from 2007) that did a good job of collecting insights and paradigms from many existing languages and organizing them into a coherent whole. It's chock-full of great ideas. In other words, the perfect language to expand your brain.

Brain with Clojure inside

It's a Lisp!

First and foremost, Clojure belongs to the Lisp family of languages, like Common Lisp, Scheme and Racket. Lisp is one of the oldest and still most important families of programming languages to be familiar with. Being a Lisp in this case means several things:

  • Clojure is a dynamic language. No matter where you stand on the static vs. dynamic typing debate, knowing languages in both camps is important. Clojure has a kind of optional typing, but in essence it's dynamic.
  • It promotes combinations of built-in data structures (lists, maps, vectors) over objects and has very good reasons for doing so.
  • It promotes programming with higher-order functions. Some built-in features like reducers and transducers rely heavily on composing higher order functions for transforming other functions.
  • It has uniform syntax with full Lisp-style macro support! This capability has proved to be very elusive for non-Lisp languages, and seeing some real macros in action is enlightening. Clojure has a whole Go-like CSP equivalent implemented using macros in core.async.
  • It encourages REPL-based development, where functionality is built and testsed gradually from the bottom up in an interactive terminal.

Importantly, Clojure is a very modern Lisp, with tons of libraries for tasks common in our day and age. It also builds upon important recent research in data structures, such as efficient persistent vectors, that only appeared in the 1990s or later. I've been dabbling with various Lisps for almost 15 years, and Clojure is the first Lisp I'd consider using in production.

I want the stress the importance of full Lisp-style macros once again. Lisp's (and Clojure's) uniform syntax and macros force us to think about the Meta-circular abstraction - there's just no way around it! We have to think of our code as data, and this has profound implications. Back in 2005, when I was still a programming neophyte, I faced a performance challenge for a data-decoding program I was hacking on in Perl. Having learned some Lisp, I had the insight of generating and evaluating new Perl code on the fly, which turned out to be key to the success of that program. I'm confident I could not have come up with that solution back then without being aware of the "Lisp way".

Pragmatism and hard-won experience

The designers of Clojure are extremely pragmatic, building upon decades of industry experience. I strongly recommend watching some YouTube videos of talks by leading Clojure developers. In particular, watch a few of the more popular talks by Rich Hickey - Clojure's original designer. The Alan Perlis quote mentioned above may be his most popular, but there's another I find at least as insightful:

Simplicity does not precede complexity, but follows it.

There's a certain quality in programmers I believe we can all recognize. It can only come from real, hard-won experience of building systems for many years and pondering about how to build such systems more effectively next time. Rich Hickey certainly belongs to this quality category - his talks are very insightful, and this philosophy reigns in the design and implementation of Clojure, as well as its (friendly and vibrant) community.

Sequences and laziness for powerful in-language data processing

Applications that process and extract useful bits of information from large datasets all look alike in many programming languages, at least to some extent. What we really want in many cases is SQL-like primitives built into our languages, but this is often challenging (.NET's LINQ is one example of a successful approach).

Clojure combines pervasive sequence protocols with persistent data structures and laziness to make this kind of task natural, using only built-in tools. Here's a realistic example of a function taken from the Clojure Applied book:

(defn revenue-by-department [carts]
  (->> (filter :settled? carts)
       (mapcat :line-items)
       (map line-summary)
       (group-by :dept)
       (reduce-kv dept-total)))

All the functions used here are built-ins, including the ->> macro. This code reads like SQL and it's efficient too. In fact, it can be made somewhat more efficient and even seamlessly parallelized using Reducers and Transducers.

The clojure.spec library provides a degree of type-safety for nested data structures. If you've ever written comments like "this maps strings to lists, where each list element is a map of ...", clojure.spec makes it more formal and verifiable. This is very useful when working with data, and is another example of where Clojure's pragmatism shines; the language is dynamic in its core, but when you need static checking - it's there, optionally (core.typed is another option for a more statically typed flavor).

The right approach to object-orientation

Historically, Lisp programmers weren't the biggest proponents of OOP. This doesn't mean that OOP has absolutely no value in Lisp-y languages, however. Common Lisp has had CLOS for decades, and Clojure also comes with an array of OO-like features.

That said, Clojure's flavor of OOP is particularly powerful and tends to discourage bad practices. Clojure uses "protocols", which are a kind of interfaces, and encourages thinking in terms of protocols rather than in terms of classes with inheritance hierarchies, sort of like Go. Add to that an ability to do true multiple dispatch and you have a powerful modeling tool at your fingertips.

Built-in support for concurrency and parallelism

It wouldn't be a real language of the 2000s without powerful support for concurrency and parallelism right at the core. Clojure supports this in many ways, which are interleaved.

First, it's a functional language with persistent data structures. Persistent data structures are effectively immutable, which makes them very attractive in a multi-threaded context. Clojure has a great implementation of persistent data structures even for things like vectors (which are quite challenging to implement in an efficient manner). This is a bigger deal than may originally appear. Pure functions are often great in theory but fail in practice due to too much copying of non-trivial data structures; Clojure's elegant usage of persistent data structures solves this problem, making pure functions efficient and thus much more applicable to a wide array of problems.

Second, Clojure doesn't lock itself in the closet of purism and does support lower-level primitives for concurrency and mutation, where needed. Atoms are built-in mutable state units with atomic updates. Refs and transactions go further, implementing STM.

Third, Clojure comes with a bunch of concurrency tools built in. Promises, futures and threads leverage the underlying JVM's threading utilities.

Finally, Clojure has a pretty good implementation of Go's CSP in the core.async module. It provides two useful things:

  1. Channels and Go select-like functionality (alts!!) which makes it easy to work with safe concurrent queues and "share data by communicating".
  2. go-blocks with non-blocking primitives that make it possible to write cooperative "green threads" with ease.

Conclusion - try it

Finally, it's important to mention that Clojure is far from being an academic exercise. It's used in production by several companies, and has strong roots in the massive Java ecosystem. Clojure's interoperability with Java is seamless - Java code can be called from Clojure with no special provisions, and all core Clojure entities are Java object deep under the hood.

On the client side, Clojurescript has been gaining traction recently as another language-compiled-to-JS option. It brings the elegance of Clojure to client-side programming, among with some unique features like core.async for callback-free concurrency.

This doesn't mean that you should only consider Clojure as a fit for your next project. On the contrary, I recommend learning if even if there's little chance of using it in production any time soon. In fact, this was exactly my situation, Clojure is a great language to expand your programming horizons; who knows, maybe in the future you'll find it useful for some real work. If you do, great! If you don't, learning it will positively affect how you use other programming languages.

Book review: "Working Effectively with Legacy Code" by Michael C. Feathers

This book comes highly recommended in programmer circles, but for some reason it took me a long while to finally get to read it. This colors my impression of it somewhat, because it's been many years since I had to deal with the kind of projects and code this book describes.

Have you ever had to deal with real legacy code? What do I mean by real? OK, let's recap. Real legacy code is frequently called "spaghetti code"; another common nickname is the big ball of mud pattern. It's code that lacks sensible structure and - worst of all - lacks tests. Classes may have had some single responsibility a long time ago, but they no longer do. Years of maintenance and under-pressure bug fixes left them leaking abstractions left and right, sending ugly dependency tentacles into each other. Touch one character and the whole house of cards may fall down on you. Unit tests are unheard of.

To be honest, having spent the last few years at Google I've been spoiled. I almost forgot how such projects look. "Luckily" I still remember, however, which makes this book an interesting and poignant read. If the above strikes no chord for you - I suggest to skip it. The book will just seem horrible to you - proposing all kinds of ungodly hacks to break up dependencies in code and make it refactorable and testable. The hacks are a good match to the foe - they're about as awful as the code itself, so young and innocent developers may find themselves (rightfully) horrified. It's only the unhealed scars of old battles that will make this book palatable.

The basic premise of the book is simple, and can be summarized as follows:

  1. To improve some piece of code, we must be able to refactor it.
  2. To be able to refactor code, we must have tests that prove our refactoring didn't break anything.
  3. To have reasonable tests, the code has to be testable; that is, it should be in a form amenable to test harnessing. This most often means breaking implicit dependencies.

... and the author spends about 400 pages on how to achieve that. This book is dense, and it took me a long time to plow through it. I started reading linerarly, but very soon discovered this approach doesn't work. So I began hopping forward and backward between the main text and the "dependency-breaking techniques" chapter which holds isolated recipes for dealing with specific kinds of dependencies. There's quite a bit of repetition in the book, which makes it even more tedious to read.

The techniques described by the author are as terrible as the code they're up against. Horrible abuses of the preprocessor in C/C++, abuses of inheritance in C++ and Java, and so on. Particularly the latter is quite sobering. If you love OOP beware - this book may leave you disenchanted, if not full of hate.

To reiterate the conclusion I already presented earlier - get this book if you have to work with old balls of mud; it will be effort well spent. Otherwise, if you're working on one of those new-age continuously integrated codebases with a 2/1 test to code ratio, feel free to skip it.