Go and Algebraic Data Types



Algebraic data types (also known as variant types, sum types or discriminated unions) are a neat feature of some programming languages that lets us specify that a value might take one of several related types, and includes convenient syntax for pattern matching on these types at run-time. Here's a canonical binary tree example from Wikipedia, written in Haskell:

data Tree = Empty
          | Leaf Int
          | Node Tree Tree

A Tree can be either empty, or a leaf with one integer field, or a node with two Tree fields - its left and right children. Here's a function that finds the depth of such trees:

depth :: Tree -> Int
depth Empty = 0
depth (Leaf n) = 1
depth (Node l r) = 1 + max (depth l) (depth r)

It takes a Tree and returns an integer, and its workings are laid out by cases, depending on the run-time type of the parameter [1]. If we pass anything that's not a Tree into depth, we'll get an error. Very concise and elegant!

"Why doesn't Go have algebraic data types" is a commonly asked question, and there's a FAQ entry about it. Even more details can be found in a Reddit AMA session the Go developers did in 2016 and in this proposal issue. The short version of the answer is that it isn't clear how this feature would interact with interfaces; how would one variant type discriminate between multiple interfaces that may have the same methods, etc. Indeed, it's a tricky issue and no one has found a satisfactory solution yet.

Another common answer is that Go already supports very similar functionality via a combination of interfaces and run-time type switches. In this post I want to show how it can be done, with examples ranging from synthetic to "real life". The full code for the samples in this post is available here.

Here's the same binary tree type, written in Go [2]:

type Tree interface {
}

type Empty struct {
}

type Leaf struct {
    v int
}

type Node struct {
    left, right Tree
}

Tree is an interface; Empty, Leaf and Node implement the interface. Note that the Tree interface is empty, so any type implements it, even the built-in string. We can easily restrict this by providing a method that would only be implemented by the interesting types:

type Tree interface {
    isTree()
}

And we'll have to add empty implementations for our types:

func (_ Leaf) isTree()  {}
func (_ Node) isTree()  {}
func (_ Empty) isTree() {}

Now only types with an isTree method will be allowed where the Tree interface is expected.

The depth function employs a type switch to do run-time type dispatching:

func depth(t Tree) int {
    switch nt := t.(type) {
    case Empty:
        return 0
    case Leaf:
        return 1
    case Node:
        return 1 + max(depth(nt.left), depth(nt.right))
    default:
        log.Fatalf("unexpected type %T", nt)
    }
    return 0
}

It's much more verbose than the Haskell pattern matching, but no less readable, and just as efficient. One obvious deficiency is having to manually reject other types in the default clause; Haskell's pattern matching does this automatically and also makes sure we've covered all cases.
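To tie the pieces together, here's a minimal usage sketch. It's my own illustration rather than part of the original sample code, and it assumes the Tree declarations and the depth function above live in the same package; the max helper is spelled out because depth relies on it (Go 1.21 and later have a built-in max, but a local definition also works on older versions):

package main

import "fmt"

// max is the helper used by depth above; Go 1.21+ has a built-in max,
// but defining our own keeps the sketch working on older versions too.
func max(a, b int) int {
    if a > b {
        return a
    }
    return b
}

func main() {
    // A small tree: a node whose left child is a leaf and whose right
    // child is another node.
    t := Node{
        left:  Leaf{v: 1},
        right: Node{left: Empty{}, right: Leaf{v: 2}},
    }
    fmt.Println(depth(t)) // prints 3: 1 + max(1, 2)
}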

A more realistic example

Let's turn to a more realistic example that you could legitimately encounter in a Go program. Since I'm a compiler nerd this will be about trees again - abstract syntax trees. These data structures are a key layer in most compilers, produced by parsers and consumed by whatever comes after (type checking/inference, lowering, optimization, evaluation, etc).

For this sample I wrote a complete evaluator for a simple calculator language with arithmetic operations, variables you can set and access, and if conditionals. The full code is in parser-evaluator.go; here I'll just focus on the AST nodes and how to "pattern match" them. This is the Node interface:

type Node interface {
    isNode()
    String() string
}

It has the isNode guard method that all concrete node types will implement, along with a String method to make sure all nodes implement the fmt.Stringer interface for debugging. Here are a few sample concrete node types:

type AssignStmt struct {
    Pos  scanner.Position
    Name string
    Expr Node
}

type IfStmt struct {
    Pos  scanner.Position
    Cond Node
    Then Node
    Else Node
}

type IntConstant struct {
    Pos   scanner.Position
    Value int
}

If you look at the full code, all of these have a dummy isNode method, along with a String method.
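For illustration, here's roughly what that boilerplate might look like for AssignStmt - a hedged sketch rather than a verbatim copy of parser-evaluator.go, and it assumes fmt is imported:

func (a AssignStmt) isNode() {}

func (a AssignStmt) String() string {
    return fmt.Sprintf("%v: %s = %s", a.Pos, a.Name, a.Expr)
}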

This is how pattern matching happens in the Eval function that evaluates an expression:

func (eval *Evaluator) Eval(n Node) (int, error) {
    switch nt := n.(type) {
    case IntConstant:
        return nt.Value, nil
    // ...
    // ... more cases
    // ...
    case AssignStmt:
        val, err := eval.Eval(nt.Expr)
        if err != nil {
            return 0, err
        }
        eval.symbolTable[nt.Name] = val
        return 0, nil
    case IfStmt:
        condVal, err := eval.Eval(nt.Cond)
        if err != nil {
            return 0, err
        }
        // Lazily evaluate Then or Else based on the result of Cond.
        if condVal == 0 {
            elseVal, err := eval.Eval(nt.Else)
            if err != nil {
                return 0, err
            }
            return elseVal, nil
        } else {
            thenVal, err := eval.Eval(nt.Then)
            if err != nil {
                return 0, err
            }
            return thenVal, nil
        }
    }
    return 0, fmt.Errorf("unmatched node %s", n)
}

Again, a type switch to cleanly discriminate between all the run-time types n could take.

An even more realistic example

The evaluator shown in the previous section is something you could run into in real programs, and in fact you do. Go's own go/ast package uses the same idiom for its AST nodes [3].

Looking in src/go/ast/ast.go in Go's source code, we see:

// All statement nodes implement the Stmt interface.
type Stmt interface {
    Node
    stmtNode()
}

Node is an embedded interface for some position-related methods, and stmtNode is a dummy method only Stmt types implement. Then, looking in src/go/types/stmt.go we find many examples of the by-now familiar type switch:

// stmt typechecks statement s.
func (check *Checker) stmt(ctxt stmtContext, s ast.Stmt) {
  // ...
    switch s := s.(type) {
    case *ast.BadStmt, *ast.EmptyStmt:
        // ignore
    case *ast.IfStmt:
        check.openScope(s, "if")
        defer check.closeScope()

        check.simpleStmt(s.Init)
        var x operand
        check.expr(&x, s.Cond)
        // ...
    // ...
}

Conclusion

It seems that Go can do just fine without adding variant types, due to the challenges mentioned in the beginning of the post and the ease with which the major use-cases can be implemented using existing language features. While Go's interfaces and type switches are certainly more verbose than Haskell's ADT declarations with pattern matching, and provide a bit less static safety, they are nevertheless fully usable for the task and just as efficient.

Language design is a careful balance, and a lot can be said for keeping the language simple with a minimal number of features, even if this leads to small losses in expressivity. It's not worth adding a significant language feature just to cover for a small weakness in the existing ones that can be easily worked around.


[1]It's important to note that this is run-time type dispatch; the value has to have a run-time tag saying what its type is. This will be important in comparing with the Go implementation later on.
[2]This is not the idiomatic way of writing binary trees in Go, it's just here to provide a syntactically close comparison to the Haskell code.
[3]go/ast is a library package useful for constructing tools to process Go code. The Go compiler is now itself written in Go and has similar code in it (though AFAICT it's incompatible with the public-facing go/ast).

Measuring context switching and memory overheads for Linux threads



In this post I want to explore the costs of threads on modern Linux machines, both in terms of time and space. The background context is designing high-load concurrent servers, where using threads is one of the common schemes.

Important disclaimer: it's not my goal here to provide an opinion in the threads vs. event-driven models debate. Ultimately, both are tools that work well in some scenarios and less well in others. That said, one of the major criticisms of a thread-based model is its cost - comments like "but context switches are expensive!" or "but a thousand threads will eat up all your RAM!" - and I do intend to study the data underlying such claims in more detail here. I'll do this by presenting multiple code samples and programs that make it easy to explore and experiment with these measurements.

Linux threads and NPTL

In the dark, old ages before version 2.6, the Linux kernel didn't have much specific support for threads, and they were more-or-less hacked on top of process support. Before futexes there was no dedicated low-latency synchronization solution (it was done using signals); neither was there much good use of the capabilities of multi-core systems [1].

The Native POSIX Thread Library (NPTL) was proposed by Ulrich Drepper and Ingo Molnar from Red Hat, and integrated into the kernel in version 2.6, circa 2005. I warmly recommend reading its design paper. With NPTL, thread creation time became about 7x faster, and synchronization became much faster as well due to the use of futexes. Threads and processes became more lightweight, with strong emphasis on making good use of multi-core processors. This roughly coincided with a much more efficient scheduler, which made juggling many threads in the Linux kernel even more efficient.

Even though all of this happened 13 years ago, the spirit of NPTL is still easily observable in some system programming code. For example, many thread and synchronization-related paths in glibc have nptl in their name.

Threads, processes and the clone system call

This was originally meant to be a part of this larger article, but it was getting too long so I split off a separate post on launching Linux processes and threads with clone, where you can learn about the clone system call and some measurements of how expensive it is to launch new processes and threads.

The rest of this post will assume this is familiar information and will focus on context switching and memory usage.

What happens in a context switch?

In the Linux kernel, this question has two important parts:

  1. When does a context switch happen?
  2. How does it happen?

The following deals mostly with (2), assuming the kernel has already decided to switch to a different user thread (for example because the currently running thread went to sleep waiting for I/O).

The first thing that happens during a context switch is a switch to kernel mode, either through an explicit system call (such as write to some file or pipe) or a timer interrupt (when the kernel preempts a user thread whose time slice has expired). This requires saving the user space thread's registers and jumping into kernel code.

Next, the scheduler kicks in to figure out which thread should run next. When we know which thread runs next, there's the important bookkeeping of virtual memory to take care of; the page tables of the new thread have to be loaded into memory, etc.

Finally, the kernel restores the new thread's registers and cedes control back to user space.

All of this takes time, but how much time, exactly? I encourage you to read some additional online resources that deal with this question, and try to run benchmarks like lmbench; what follows is my attempt to quantify thread switching time.

How expensive are context switches?

To measure how long it takes to switch between two threads, we need a benchmark that deliberately triggers a context switch and avoids doing too much work in addition to that. This would be measuring just the direct cost of the switch, when in reality there is another cost - the indirect one, which could even be larger. Every thread has some working set of memory, all or some of which is in the cache; when we switch to another thread, all this cache data becomes unneeded and is slowly flushed out, replaced by the new thread's data. Frequent switches back and forth between the two threads will cause a lot of such thrashing.

In my benchmarks I am not measuring this indirect cost, because it's pretty difficult to avoid in any form of multi-tasking. Even if we "switch" between different asynchronous event handlers within the same thread, they will likely have different memory working sets and will interfere with each other's cache usage if those sets are large enough. I strongly recommend watching this talk on fibers where a Google engineer explains their measurement methodology and also how to avoid too much indirect switch cost by making sure closely related tasks run with temporal locality.

These code samples measure context switching overheads using two different techniques:

  1. A pipe which is used by two threads to ping-pong a tiny amount of data. Every read on the pipe blocks the reading thread, and the kernel switches to the writing thread, and so on. (A rough Go sketch of this approach appears below.)
  2. A condition variable used by two threads to signal an event to each other.
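The linked samples are the authoritative versions; as a rough illustration of the first (pipe-based) technique, here's a hedged Go sketch of the ping-pong. It goes through the Go runtime rather than raw pthreads, so treat its output as a ballpark figure, not a faithful reproduction of the original benchmark:

package main

import (
    "fmt"
    "os"
    "runtime"
    "time"
)

func main() {
    r1, w1, _ := os.Pipe() // main -> echoer; errors ignored for brevity
    r2, w2, _ := os.Pipe() // echoer -> main

    go func() {
        runtime.LockOSThread() // keep the echoer on its own OS thread
        buf := make([]byte, 1)
        for {
            if _, err := r1.Read(buf); err != nil {
                return
            }
            w2.Write(buf)
        }
    }()

    runtime.LockOSThread()
    const iters = 100000
    buf := []byte{'x'}
    start := time.Now()
    for i := 0; i < iters; i++ {
        w1.Write(buf)
        r2.Read(buf)
    }
    // Each iteration forces two context switches: main -> echoer and back.
    fmt.Printf("%v per switch (approx.)\n", time.Since(start)/(2*iters))
}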

There are additional factors that context switching time depends on; for example, on a multi-core CPU, the kernel can occasionally migrate a thread between cores because the core a thread has been previously using is occupied. While this helps utilize more cores, such switches cost more than staying on the same core (again, due to cache effects). Benchmarks can try to restrict this by running with taskset to pin affinity to one core, but it's important to keep in mind this only models a lower bound.

Using the two techniques I'm getting fairly similar results: somewhere between 1.2 and 1.5 microseconds per context switch, accounting only for the direct cost, and pinning to a single core to avoid migration costs. Without pinning, the switch time goes up to ~2.2 microseconds [2]. These numbers are largely consistent with the reports in the fibers talk mentioned above, and also with other benchmarks found online (like lat_ctx from lmbench).

What does this mean in practice?

So we have the numbers now, but what do they mean? Is 1-2 us a long time? As I have mentioned in the post on launch overheads, a good comparison is memcpy, which takes 3 us for 64 KiB on the same machine. In other words, a context switch is a bit quicker than copying 64 KiB of memory from one location to another.
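If you want to sanity-check that comparison on your own machine, here's a small hedged Go sketch that times copying 64 KiB; the 3 us figure above was measured with C memcpy, so expect numbers in the same ballpark rather than an exact match:

package main

import (
    "fmt"
    "time"
)

func main() {
    src := make([]byte, 64*1024)
    dst := make([]byte, 64*1024)

    const iters = 100000
    start := time.Now()
    for i := 0; i < iters; i++ {
        copy(dst, src)
    }
    fmt.Printf("%v per 64 KiB copy\n", time.Since(start)/iters)
}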

[Plot: thread/process launch and context switch times]

1-2 us is not a long time by any measure, except when you're really trying to optimize for extremely low latencies or high loads.

As an example of an artificially high load, here is another benchmark which writes a short message into a pipe and expects to read it from another pipe. On the other end of the two pipes is a thread that echoes one into the other.

Running the benchmark on the same machine I used to measure the context switch times, I get ~400,000 iterations per second (this is with taskset to pin to a single core). This makes perfect sense given the earlier measurements, because each iteration of this test performs two context switches, and at 1.2 us per switch this is 2.4 us per iteration.

You could claim that the two threads compete for the same CPU, but if I don't pin the benchmark to a single core, the number of iterations per second halves. This is because the vast majority of time in this benchmark is spent in the kernel switching from one thread to the other, and the core migrations that occur when it's not pinned greatly outweigh the loss of (the very minimal) parallelism.

Just for fun, I rewrote the same benchmark in Go; two goroutines ping-ponging a short message between themselves over a channel. The throughput this achieves is dramatically higher - around 2.8 million iterations per second, which leads to an estimate of ~170 ns per switch between goroutines [3]. Since switching between goroutines doesn't require an actual kernel context switch (or even a system call), this isn't too surprising. For comparison, Google's fibers use a new Linux system call that can switch between two tasks in about the same time, including the kernel time.
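The goroutine version is short enough to sketch in full. This is my own approximation rather than the exact benchmark code, but it has the same shape - two goroutines bouncing a value over unbuffered channels:

package main

import (
    "fmt"
    "time"
)

func main() {
    ping := make(chan int)
    pong := make(chan int)

    // The echoer goroutine sends back whatever it receives.
    go func() {
        for v := range ping {
            pong <- v
        }
    }()

    const iters = 1000000
    start := time.Now()
    for i := 0; i < iters; i++ {
        ping <- i
        <-pong
    }
    close(ping)
    // Each iteration is two goroutine switches: there and back.
    fmt.Printf("%v per switch (approx.)\n", time.Since(start)/(2*iters))
}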

A word of caution: benchmarks tend to be taken too seriously. Please take this one only for what it demonstrates - a largely synthetic workload used to poke into the cost of some fundamental concurrency primitives.

Remember - it's quite unlikely that the actual workload of your task will be negligible compared to the 1-2 us context switch; as we've seen, even a modest memcpy takes longer. Any sort of server logic such as parsing headers, updating state, etc. is likely to take orders of magnitude longer. If there's one takeaway to remember from these sections, it's that context switching on modern Linux systems is super fast.

Memory usage of threads

Now it's time to discuss the other overhead of a large number of threads - memory. Even though all threads in a process share their address space, there are still areas of memory that aren't shared. In the post about clone we've mentioned page tables in the kernel, but these are comparatively small. A much larger memory area that is private to each thread is the stack.

The default per-thread stack size on Linux is usually 8 MiB, and we can check what it is by invoking ulimit:

$ ulimit -s
8192

To see this in action, let's start a large number of threads and observe the process's memory usage. This sample launches 10,000 threads and sleeps for a bit to let us observe its memory usage with external tools. Using tools like top (or preferably htop) we see that the process uses ~80 GiB of virtual memory, with about 80 MiB of resident memory. What is the difference, and how can it use 80 GiB of memory on a machine that only has 16 GiB available?

Virtual vs. Resident memory

A short interlude on what virtual memory means. When a Linux program allocates memory (with malloc or otherwise), this memory initially doesn't really exist - it's just an entry in a table the OS keeps. Only when the program actually accesses the memory is the backing RAM for it found; this is what virtual memory is all about.

Therefore, the "memory usage" of a process can mean two things - how much virtual memory it uses overall, and how much actual memory it uses. While the former can grow almost without bounds - the latter is obviously limited to the system's RAM capacity (with swapping to disk being the other mechanism of virtual memory to assist here if usage grows above the side of physical memory). The actual physical memory on Linux is called "resident" memory, because it's actually resident in RAM.

There's a good StackOverflow discussion of this topic; here I'll just limit myself to a simple example:

#include <stdio.h>
#include <stdlib.h>

// escape() and report_memory() are defined in the full code sample.

int main(int argc, char** argv) {
  report_memory("started");

  int N = 100 * 1024 * 1024;
  int* m = malloc(N * sizeof(int));
  escape(m);
  report_memory("after malloc");

  for (int i = 0; i < N; ++i) {
    m[i] = i;
  }
  report_memory("after touch");

  printf("press ENTER\n");
  (void)fgetc(stdin);
  return 0;
}

This program starts by allocating 400 MiB of memory (assuming an int size of 4) with malloc, and later "touches" this memory by writing a number into every element of the allocated array. It reports its own memory usage at every step - see the full code sample for the reporting code [4]. Here's the output from a sample run:

$ ./malloc-memusage
started: max RSS = 4780 kB; vm size = 6524 kB
after malloc: max RSS = 4780 kB; vm size = 416128 kB
after touch: max RSS = 410916 kB; vm size = 416128 kB

The most interesting thing to note is how vm size remains the same between the second and third steps, while max RSS grows from the initial value to 400 MiB. This is precisely because until we touch the memory, it's fully "virtual" and isn't actually counted for the process's RAM usage.

Therefore, distinguishing between virtual memory and RSS in realistic usage is very important - this is why the thread launching sample from the previous section could "allocate" 80 GiB of virtual memory while having only 80 MiB of resident memory.

Back to memory overhead for threads

As we've seen, a new thread on Linux is created with 8 MiB of stack space, but this is virtual memory until the thread actually uses it. If the thread actually uses its stack, resident memory usage goes up dramatically for a large number of threads. I've added a configuration option to the sample program that launches a large number of threads; with it enabled, the thread function actually uses stack memory and from the RSS report it is easy to observe the effects. Curiously, if I make each of 10,000 threads use 400 KiB of memory, the total RSS is not 4 GiB but around 2.6 GiB. I suspect this is because the OS doesn't actually keep this many threads paged in at any time - some threads' memory gets paged out until the threads are active.

How do we control the stack size of threads? One option is using the ulimit command, but a better option is the pthread_attr_setstacksize API. The latter is invoked programmatically and populates a pthread_attr_t structure that's passed to thread creation. The more interesting question is - what should the stack size be set to?

As we have seen above, just creating a large stack for a thread doesn't automatically eat up all the machine's memory - not until the stack is actually used. If our threads actually use large amounts of stack memory, this is a problem, because it severely limits the number of threads we can run concurrently. Note that this is not really a problem with threads - but with concurrency; if our program uses some event-driven approach to concurrency and each handler uses a large amount of memory, we'd still have the same problem.

If the task doesn't actually use a lot of memory, what should we set the stack size to? Small stacks keep the OS safe - a deviant program may get into an infinite recursion and a small stack will make sure it's killed early. Moreover, virtual memory is large but not unlimited; especially on 32-bit OSes, we might not have 80 GiB of virtual address space for the process, so an 8 MiB stack for 10,000 threads makes no sense. There's a tradeoff here, and the default chosen by 32-bit Linux is 2 MiB; the maximal virtual address space available is 3 GiB, so this imposes a limit of ~1500 threads with the default settings. On 64-bit Linux the virtual address space is vastly larger, so this limitation is less serious (though other limits kick in - on my machine the maximal number of threads the OS lets one process start is about 32K).

Therefore I think it's more important to focus on how much actual memory each concurrent task is using than on the OS stack size limit, as the latter is simply a safety measure.

Conclusion

The numbers reported here paint an interesting picture of the state of Linux multi-threaded performance in 2018. I would say that the limits still exist - running a million threads is probably not going to make sense; however, the limits have definitely shifted over time, and a lot of folklore from the early 2000s doesn't apply today. On a beefy multi-core machine with lots of RAM we can easily run 10,000 threads in a single process today, in production. As I've mentioned above, it's highly recommended to watch Google's talk on fibers; through careful tuning of the kernel (and setting smaller default stacks) Google is able to run an order of magnitude more threads in parallel.

Whether this is sufficient concurrency for your application is very obviously project-specific, but I'd say that for higher concurrencies you'd probably want to mix in some asynchronous processing. If 10,000 threads can provide sufficient concurrency - you're in luck, as this is a much simpler model - all the code within the threads is serial, there are no issues with blocking, etc.


[1]For example, in order to implement POSIX semantics properly, a single thread was designated as a "manager" and managed operations like "create a new thread". This created an unfortunate serialization point and a bottleneck.
[2]These numbers also vary greatly between CPUs. The numbers reported herein are on my Haswell i7-4771. On a different contemporary machine (a low-end Xeon) I measured switch times that were about 50-75% longer.
[3]Curiously, pinning the Go program to a single core (by means of setting GOMAXPROCS=1 and running with taskset) increases the throughput by only 10% or so. The Go scheduler is not optimized for this strange use case of endless hammering between two goroutines, but it performs very well regardless.
[4]Note that while for resident memory there's a convenient getrusage API, to report virtual memory size we have to parse /proc/PID/status.

On the uses and misuses of panics in Go



Go has a unique approach to error handling, with a combination of explicit error values and an exception-like panic mechanism. In this post I'm looking at the philosophical aspects of panic, trying to understand some of the conflicting guidelines coming from the Go team.

Most of all, it's a story of pragmatism in language design, with some musings on the merits and dangers of a pragmatic approach.

Introduction - errors and exceptions

Many programming languages support exceptions as a standard way of handling errors - for example Java or Python. While certainly convenient, exceptions have many issues as well, which is why they're frowned upon by other languages or their style guides. The main criticism of exceptions is that they introduce a "side channel" for control flow; when reading the code, you also have to keep in mind the path in which exceptions can flow and this makes reasoning about some code very difficult [1].

Let's talk about error handling in Go to make this concrete. I assume you know how "standard" error handling in Go works - it's quite hard to miss! Here's how we open a file:

f, err := os.Open("file.txt")
if err != nil {
  // handle error here
}
// do stuff with f here

If the file is missing, say, os.Open will return a non-nil error. In some other languages, errors are handled differently. For example, Python's built-in open function will raise an exception if something is wrong:

try:
  with open("file.txt") as f:
    # do stuff with f here
except OSError as err:
  # handle exception err here

While error handling via exceptions in Python is consistent, it's also a target of criticism because of its pervasiveness. Even iterators use exceptions to signal the end of a sequence (StopIteration). The main question is "what does exceptional really mean?". Here's a relevant comment by Rob Pike from a mailing list discussion where the modern incarnation of Go's panic/recover mechanism was proposed:

This is exactly the kind of thing the proposal tries to avoid. Panic and recover are not an exception mechanism as usually defined because the usual approach, which ties exceptions to a control structure, encourages fine-grained exception handling that makes code unreadable in practice. There really is a difference between an error and what we call a panic, and we want that difference to matter. Consider Java, in which opening a file can throw an exception. In my experience few things are less exceptional than failing to open a file, and requiring me to write inside-out code to handle such a quotidian operation feels like a Procrustean imposition.

To be objective, exceptions have proponents who scoff at Go's explicit error handling for many reasons. For one, note the order of the code in the two minimal samples above. In Python the primary flow of the program immediately follows the open call, and the error handling is delegated to a later stage (not to mention that in many cases the exception will be caught higher up the call stack and not in this function at all). In Go, on the other hand, error handling comes first and may obscure the main flow of the program. Moreover, Go's error handling is very verbose - this being one of the major criticisms of the language. I'll mention one potential way to address this later in the post.

In addition to Rob's quote above, Go's philosophy towards exceptions is summarized well in the FAQ:

We believe that coupling exceptions to a control structure, as in the try-catch-finally idiom, results in convoluted code. It also tends to encourage programmers to label too many ordinary errors, such as failing to open a file, as exceptional.

However, in some cases having an exception-like mechanism is actually useful; in a high-level language like Go it's even essential, I'd say. This is why panic and recover exist.

The need to occasionally panic

Go is a safe language with run-time checks for some serious programming errors. For example, when you try to access a slice out of bounds, the result is not undefined behavior. Rather, the Go runtime will panic. Consider this minimal program:

package main

import (
  "fmt"
)

func main() {
  s := make([]string, 3)
  fmt.Println(s[5])
}

Will die with a runtime error followed by a stack trace:

panic: runtime error: index out of range

goroutine 1 [running]:
main.main()
  /tmp/sandbox209906601/main.go:9 +0x40

Other common things that panic include accessing struct fields through nil pointers, closing an already-closed channel, etc. What are the alternatives to panicking here? We could make slice access return a result, err pair, and we could also make slice element assignment a function that potentially returns an error, but this would result in cumbersome code. Imagine rewriting this snippet, where foo, bar and baz are all slices of strings:

foo[i] = bar[i] + baz[i]

Into something like:

br, err := bar[i]
if err != nil {
  return err
}
bz, err := baz[i]
if err != nil {
  return err
}
err = assign_slice_element(foo, i, br + bz)
if err != nil {
  return err
}

Looks like fun? Nope. Different languages handle this in different ways: Python and Java throw an exception if i points out of bounds in either of the slices/lists/arrays. C will emit code without bounds checking, with a good chance of accessing/trampling the wrong memory area, crashing the process or exposing it to security vulnerabilities. C++ takes the middle way, with some "performance oriented" modes that are unsafe and other modes (std::vector::at) that could throw an exception.

Since the verbosity of the rewritten snippet above is unacceptable, Go chose to have panics - an exception-like mechanism reserved for truly exceptional conditions, such as bugs in the code.

This isn't restricted to built-ins; user code can also panic if needed. Sometimes code encounters errors that can only mean something went horribly wrong - a bug, or some key invariant being violated. For example, a switch case that just can't happen in the current context. The panic function exists for just such cases; it's morally equivalent to Python's raise and C++'s throw. That said, subtle but powerful restrictions on how panics are recovered make Go's mechanism unique.
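To make the "can't happen" case concrete before getting to those restrictions, here's a hedged sketch; the function and its set of operators are invented for illustration, and fmt is assumed to be imported:

func opName(op byte) string {
    switch op {
    case '+':
        return "add"
    case '-':
        return "sub"
    default:
        // Callers are expected to pass only valid operators; anything else
        // indicates a bug in the calling code, so we panic rather than
        // return an error.
        panic(fmt.Sprintf("unexpected operator %q", op))
    }
}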

Restrictions on panic recovery in Go

When a panic is not caught/recovered anywhere in the calling stack, it will end up crashing the program with a stack dump, as shown above. This behavior is pretty useful for debugging, but not ideal in realistic scenarios. If we're writing a server that serves many clients, we don't want it to immediately crash because of an internal bug in some data parsing library it's using. It would be much better to catch such an error, log it, and keep serving other clients. Go's recover function can help with that. Here's a code sample from Effective Go demonstrating this:

func server(workChan <-chan *Work) {
    for work := range workChan {
        go safelyDo(work)
    }
}

func safelyDo(work *Work) {
    defer func() {
        if err := recover(); err != nil {
            log.Println("work failed:", err)
        }
    }()
    do(work)
}

Even though Go's panic and recover resemble exceptions at first glance, they come with some important and deliberate limitations that make them less prone to common problems with exception-heavy code. Here's another quote from the FAQ:

Go also has a couple of built-in functions to signal and recover from truly exceptional conditions. The recovery mechanism is executed only as part of a function's state being torn down after an error, which is sufficient to handle catastrophe but requires no extra control structures and, when used well, can result in clean error-handling code.

Rob Pike's quote from the thread I linked to earlier is also relevant:

Our proposal instead ties the handling to a function - a dying function - and thereby, deliberately, makes it harder to use. We want you think of panics as, well, panics! They are rare events that very few functions should ever need to think about. If you want to protect your code, one or two recover calls should do it for the whole program. If you're already worrying about discriminating different kinds of panics, you've lost sight of the ball.

The specific limitation is that recover can only be called in a defer code block, which cannot return control to an arbitrary point, but can only do clean-ups and tweak the function's return values. The Python file-opening exception handling shown above can't work in Go - we can't just catch OSError and try to open another file (or create the file we failed to open) without significant restructuring of the code.

This limitation is coupled with an important coding guideline - keep panics within package boundaries. It's a good practice for packages not to panic in their public interfaces. Rather, the public-facing functions and methods should recover from panics internally and translate them to error messages. This makes panics friendlier to use, though high-availability servers will still likely want to have outer recovers installed just in case.
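Here's a hedged sketch of that guideline in action - a public function converting internal panics into an ordinary error. Parse and parseInternal are invented names for illustration, and fmt is assumed to be imported:

// Parse is the package's public entry point; it recovers from panics raised
// by internal helpers and reports them as errors.
func Parse(input string) (result string, err error) {
    defer func() {
        if r := recover(); r != nil {
            err = fmt.Errorf("parse error: %v", r)
        }
    }()
    return parseInternal(input), nil
}

// parseInternal panics on malformed input instead of threading an error
// value through every internal call.
func parseInternal(input string) string {
    if input == "" {
        panic("empty input")
    }
    return input // real parsing would happen here
}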

The siren call of panic

Every language feature is destined to be misused; this is just a fact of programming life, and it's no different for Go's panic. This is not to say that all misuses are categorically wrong, though - just that the feature ends up being used for goals it was not originally designed to fulfill.

Consider this real example from the scanInt method in fmt/scan.go of the Go (1.10) standard library:

func (s *ss) scanInt(verb rune, bitSize int) int64 {
  if verb == 'c' {
    return s.scanRune(bitSize)
  }
  s.SkipSpace()
  s.notEOF()
  base, digits := s.getBase(verb)
  // ... other code
}

Each one of the methods SkipSpace, notEOF and getBase can fail, but where is the error handling? In fact, this package - like several others in the standard library - is using panics for some of its error handling internally. A panic from each of these will be recovered in the public API (like the Token method) and converted to an error. If we had to rewrite this code with explicit error handling, it would be more cumbersome, for sure [2]:

if err := s.SkipSpace(); err != nil {
  return err
}
if err := s.notEOF(); err != nil {
  return err
}
base, digits, err := s.getBase(verb)
if err != nil {
  return err
}
// ... other code

Of course, panic is not the only way to solve this. As Rob Pike says, Errors are Values and thus they are programmable, and we could devise some clever way to make the code flow better without using an exception-like escape mechanism. Other languages have useful features that would make it much simpler; for example Rust has the ? operator [3] that propagates an error returned from a given expression automatically, so in a hypothetical syntax we could write:

s.SkipSpace()?
s.notEOF()?
base, digits := s.getBase(verb)?

But we don't have this in Go (yet?), so the core Go team made the choice to use panics instead. They even condone this pattern in Effective Go:

With our recovery pattern in place, the do function (and anything it calls) can get out of any bad situation cleanly by calling panic. We can use that idea to simplify error handling in complex software.

And it's being used in several more places; a few I found with a quick search:

  • fmt/scan.go
  • json/encode.go
  • text/template/parse/parser.go

But isn't this... wrong?

I empathize with folks lured by the siren; its call is strong here! But I also can't shake off the feeling that this goes against the principles designed into the language. In the quote shown above Rob Pike says:

In my experience few things are less exceptional than failing to open a file

But what is less exceptional than running into an unexpected character while parsing? Isn't it the most common kind of error a parser encounters? Pike goes on to say:

We want you think of panics as, well, panics! They are rare events that very few functions should ever need to think about.

But is a parsing error rare? And very many functions in fmt/scan.go have to "think about" panics because that's what they use for signaling errors!

If you're already worrying about discriminating different kinds of panics, you've lost sight of the ball.

But here is errorHandler from fmt/scan.go:

func errorHandler(errp *error) {
  if e := recover(); e != nil {
    if se, ok := e.(scanError); ok { // catch local error
      *errp = se.err
    } else if eof, ok := e.(error); ok && eof == io.EOF { // out of input
      *errp = eof
    } else {
      panic(e)
    }
  }
}

Is this not "worrying about discriminating different kinds of panics"?

Conclusion - pragmatism vs. purity

It's not my intention to attack the Go standard library developers here. As I've mentioned, I fully see why panics are attractive in some cases where call stacks are deep and sequences of error-signaling operations are common. I really hope Go will introduce some syntax that will make propagating an error easier, which would render this discussion moot.

Sometimes, it's better to be pragmatic than a zealot. If a certain language feature is really helpful in solving a problem, even outside of its classical domain of use, it may be better to use it than to ardently stick to principles and end up with convoluted code. Kind-of like my old defense of using goto for error handling in C. The Go guidelines are clear and the restrictions on recover are craftily placed - even when used for control flow in parsers, it's much harder to misuse than classical exceptions.

Interestingly, when this problem first drew my attention I was looking into the source of json/encode.go. It turns out that it was recently fixed to use classical error handling! Yes, some code turned more verbose, from:

if destring {
  switch qv := d.valueQuoted().(type) {
    case nil:
      d.literalStore(nullLiteral, subv, false)
    case string:
      d.literalStore([]byte(qv), subv, true)
    // ... other code

To:

if destring {
  q, err := d.valueQuoted()
  if err != nil {
    return err
  }
  switch qv := q.(type) {
  case nil:
    if err := d.literalStore(nullLiteral, subv, false); err != nil {
      return err
    }
  case string:
    if err := d.literalStore([]byte(qv), subv, true); err != nil {
      return err
    }

But overall it's not that bad and certainly wouldn't look unfamiliar to a Go coder. And it gives me hope :-)


[1]C++'s set of exception safety guarantees is a good example of some of the complexities involved.
[2]If you spend some time reading the mailing list discussion where the recover mechanism was proposed, you'll find Russ Cox mentioning a similar issue with parsing a binary stream and how to propagate errors through the process.
[3]Even C++ has a similar pattern that you will find in some codebases where a standard return type is used. Macros commonly named ASSIGN_OR_RETURN are popular in C++ code released by Google and show up in other places like LLVM.