Revisiting "Let's Build a Compiler"



There's an old compiler-building tutorial that has become part of the field's lore: the Let's Build a Compiler series by Jack Crenshaw (published between 1988 and 1995).

I ran into it in 2003 and was very impressed, but it's now 2025 and this tutorial is still being mentioned quite often in Hacker News threads. Why is that? Why does a tutorial from 35 years ago, built in Pascal and emitting Motorola 68000 assembly - technologies that are virtually unknown for the new generation of programmers - hold sway over compiler enthusiasts? I've decided to find out.

The tutorial is easily available and readable online, but just re-reading it seemed insufficient. So I've decided on meticulously translating the compilers built in it to Python and emit a more modern target - WebAssembly. It was an enjoyable process and I want to share the outcome and some insights gained along the way.

The result is this code repository. Of particular interest is the TUTORIAL.md file, which describes how each part in the original tutorial is mapped to my code. So if you want to read the original tutorial but play with code you can actually easily try on your own, feel free to follow my path.

A sample

To get a taste of the input language being compiled and the output my compiler generates, here's a sample program in the KISS language designed by Jack Crenshaw:

var X=0

 { sum from 0 to n-1 inclusive, and add to result }
 procedure addseq(n, ref result)
     var i, sum  { 0 initialized }
     while i < n
         sum = sum + i
         i = i + 1
     end
     result = result + sum
 end

 program testprog
 begin
     addseq(11, X)
 end
 .

It's from part 13 of the tutorial, so it showcases procedures along with control constructs like the while loop, and passing parameters both by value and by reference. Here's the WASM text generated by my compiler for part 13:

(module
  (memory 8)
  ;; Linear stack pointer. Used to pass parameters by ref.
  ;; Grows downwards (towards lower addresses).
  (global $__sp (mut i32) (i32.const 65536))

  (global $X (mut i32) (i32.const 0))

  (func $ADDSEQ (param $N i32) (param $RESULT i32)
    (local $I i32)
    (local $SUM i32)
    loop $loop1
      block $breakloop1
        local.get $I
        local.get $N
        i32.lt_s
        i32.eqz
        br_if $breakloop1
        local.get $SUM
        local.get $I
        i32.add
        local.set $SUM
        local.get $I
        i32.const 1
        i32.add
        local.set $I
        br $loop1
      end
    end
    local.get $RESULT
    local.get $RESULT
    i32.load
    local.get $SUM
    i32.add
    i32.store
  )

  (func $main (export "main") (result i32)
    i32.const 11
    global.get $__sp      ;; make space on stack
    i32.const 4
    i32.sub
    global.set $__sp
    global.get $__sp
    global.get $X
    i32.store
    global.get $__sp    ;; push address as parameter
    call $ADDSEQ
    ;; restore parameter X by ref
    global.get $__sp
    i32.load offset=0
    global.set $X
    ;; clean up stack for ref parameters
    global.get $__sp
    i32.const 4
    i32.add
    global.set $__sp
    global.get $X
  )
)

You'll notice that there is some trickiness in the emitted code w.r.t. handling the by-reference parameter (my previous post deals with this issue in more detail). In general, though, the emitted code is inefficient - there is close to 0 optimization applied.

Also, if you're very diligent you'll notice something odd about the global variable X - it seems to be implicitly returned by the generated main function. This is just a testing facility that makes my compiler easy to test. All the compilers are extensively tested - usually by running the generated WASM code [1] and verifying expected results.

Insights - what makes this tutorial so special?

While reading the original tutorial again, I had on opportunity to reminisce on what makes it so effective. Other than the very fluent and conversational writing style of Jack Crenshaw, I think it's a combination of two key factors:

  1. The tutorial builds a recursive-descent parser step by step, rather than giving a long preface on automata and table-based parser generators. When I first encountered it (in 2003), it was taken for granted that if you want to write a parser then lex + yacc are the way to go [2]. Following the development of a simple and clean hand-written parser was a revelation that wholly changed my approach to the subject; subsequently, hand-written recursive-descent parsers have been my go-to approach for almost 20 years now.
  2. Rather than getting stuck in front-end minutiae, the tutorial goes straight to generating working assembly code, from very early on. This was also a breath of fresh air for engineers who grew up with more traditional courses where you spend 90% of the time on parsing, type checking and other semantic analysis and often run entirely out of steam by the time code generation is taught.

To be honest, I don't think either of these are a big problem with modern resources, but back in the day the tutorial clearly hit the right nerve with many people.

What else does it teach us?

Jack Crenshaw's tutorial takes the syntax-directed translation approach, where code is emitted while parsing, without having to divide the compiler into explicit phases with IRs. As I said above, this is a fantastic approach for getting started, but in the latter parts of the tutorial it starts showing its limitations. Especially once we get to types, it becomes painfully obvious that it would be very nice if we knew the types of expressions before we generate code for them.

I don't know if this is implicated in Jack Crenshaw's abandoning the tutorial at some point after part 14, but it may very well be. He keeps writing how the emitted code is clearly sub-optimal [3] and can be improved, but IMHO it's just not that easy to improve using the syntax-directed translation strategy. With perfect hindsight vision, I would probably use Part 14 (types) as a turning point - emitting some kind of AST from the parser and then doing simple type checking and analysis on that AST prior to generating code from it.

Conclusion

All in all, the original tutorial remains a wonderfully readable introduction to building compilers. This post and the GitHub repository it describes are a modest contribution that aims to improve the experience of folks reading the original tutorial today and not willing to use obsolete technologies. As always, let me know if you run into any issues or have questions!


[1]This is done using the Python bindings to wasmtime.
[2]By the way, gcc switched from YACC to hand-written recursive-descent parsing in the 2004-2006 timeframe, and Clang has been implemented with a recursive-descent parser from the start (2007).
[3]

Concretely: when we compile subexpr1 + subexpr2 and the two sides have different types, it would be mighty nice to know that before we actually generate the code for both sub-expressions. But the syntax-directed translation approach just doesn't work that way.

To be clear: it's easy to generate working code; it's just not easy to generate optimal code without some sort of type analysis that's done before code is actually generated.


Recent posts

2025.11.24: Notes on the WASM Basic C ABI
2025.10.25: LaTeX, LLMs and Boring Technology
2025.10.11: Notes on using LaTeX to generate formulae
2025.09.30: Summary of reading: July - September 2025
2025.09.27: Consistent hashing
2025.09.06: Hilbert space: treating functions as vectors
2025.08.26: Implementing Forth in Go and C
2025.07.04: Notes on even and odd functions
2025.06.30: Summary of reading: April - June 2025
2025.06.14: Notes on Cramer's rule

See Archives for a full list.