Benefits of dependencies in software projects as a function of effort

One of the most pervasive debates in software engineering is whether dependencies are good or bad for projects. Does it make sense to implement all, or nearly all, of a project's functionality on your own, or is it better to use the plethora of libraries available for the many sub-tasks the project needs to perform?

The modern domain of web development places both sides of this debate in the spotlight. On one hand, it's easy to develop fairly complex web sites and applications with small teams, due to the large number of supporting libraries and frameworks available. On the other hand, this very multitude of libraries and frameworks is a problem in itself. It manifests in many ways - from the leftpad fiasco to the steep learning curve of modern web development today, where it seems that new frameworks are developed and old ones fall out of favor faster than the time it takes to learn them.

If you follow programming news, this debate is everywhere. Documentation pages like Writing Web Applications (using only the Go standard library) spark heated debates; on one side, defenders of the "no-dependencies" approach extol the virtues of clear, focused dependency-less code and its ease of maintenance and deployment. On the other side, many programmers claim that it's foolish to "reinvent the wheel", giving up on the many insights and hard-won truths absorbed and put to good use by library writers. The formerly derogatory, but now more accepted term NIH (Not Invented Here syndrome) is also used quite a bit.

I want to propose a simple formula that should help defuse many such debates, because in my opinion both sides are right - depending on the situation:

The benefit of dependencies is inversely proportional to the amount of effort spent on a software project.
[Figure: benefits vs. effort plot]

That's it. The more effort is spent on a project in terms of engineer-years, the less benefit dependencies have. For short, low-effort projects dependencies are hugely beneficial. For long, multi-person multi-year projects, not so much. In fact, for such longer-term projects the costs of dependencies begin to outweigh their benefits.

This observation is based on a long career developing software, managing software developers, and carefully observing the world of software development.

Take web development as an example. If you're a contractor who churns out web apps for customers every 2-3 weeks, it's almost certain your projects use libraries and frameworks. They save so much time and effort, so why not?

However, if your company has a large and complex web app that 4 engineers have been hacking on for a couple of years (and will keep hacking on in the foreseeable future), it's likely that you only use the most foundational libraries (say, jQuery), and the rest is developed in-house.

The distinction between foundational and other libraries is a continuum; it's again a matter of scale. Few companies will reinvent a database for their project, yet if you're operating on Google's scale it can actually make sense.

[Figure: reinventing the wheel]

What's interesting about this is that a single project can go through different points on this dependency vs. effort curve throughout its lifetime. Many projects start small and simple, and rely on external dependencies for the heavy lifting. However, as time goes by and more effort is sunk into the project, almost inevitably the dependencies get replaced by in-house alternatives. This happens most often when the off-the-shelf dependency no longer covers all the use cases the project needs. Other reasons include faster development pace; to update a dependency, a team needs to contribute upstream and wait for the changes to be accepted and integrated. Not all teams like to wait.

A good example is 3D game developers. Almost invariably, small studios and developers start by using one of the existing game engines and focus on their game's unique content. With time, however, more and more big studios move to develop their own engines to cater for their unique needs. The effort spent on the project is now large, so dependencies are less beneficial.

One of the best articles on this subject I'm aware of is Joel Spolsky's In Defense of Not-Invented-Here Syndrome (from 2001). In that article Joel tells how the Microsoft Excel team strove to eliminate all dependencies in their project, including having their own C compiler at some point. They didn't do it because they were stupid or conceited - they did it because it made sense for their gigantic project.

Joel's point is slightly different from mine - he says that core functionalities are best developed in-house. This is true, but my formula tries to capture the picture from a different angle. When your project is starting, the web framework you use is not a core functionality - it's just a tool. With time, however, it makes more sense to treat it as core functionality, since so much effort was already spent on the project; the cost of extra effort to eliminate the dependency is diminished.

A brief tutorial on parsing reStructuredText (reST)

Docutils, the canonical library for processing and munging reStructuredText, is mostly used in an end-to-end mode where HTML or other user-consumable formats are produced from input reST files. However, sometimes it's useful to develop tooling that works on reST input directly and does something non-standard. In this case, one has to dig only a little deeper in Docutils to find useful modules to help with the task.

In this short tutorial I'm going to show how to write a tool that consumes reST files and does something other than generating HTML from them. As a simple but useful example, I'll demonstrate a link checker - a tool that checks that all web links within a reST document are valid. As a bonus, I'll show another tool that uses internal table-parsing libraries within Docutils that let us write pretty-looking ASCII tables and parse them.

Parsing reST text into a Document

This tutorial is a code walk-through for the complete code sample available online. I'll only show a couple of the most important code snippets from the full sample.

Docutils represents a reST file internally as your typical document tree (similarly to many XML and HTML parsers), where every node is of a type derived from docutils.nodes.Node. The top-level document is parsed into an object of type document [1].

We start by creating a new document with some default settings and populating it with the output of a Parser:

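import docutils.frontend
import docutils.parsers.rst
import docutils.utils

# ... here 'fileobj' is a file-like object holding the contents of the input
# reST file.

# Parse the file into a document with the rst parser.
default_settings = docutils.frontend.OptionParser(
    components=(docutils.parsers.rst.Parser,)).get_default_values()
# new_document() takes the source path (used in messages) and the settings.
document = docutils.utils.new_document(fileobj.name, default_settings)
parser = docutils.parsers.rst.Parser()
parser.parse(fileobj.read(), document)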

Processing a reST document with a visitor

Once we have the document, we can go through it and find the data we want. Docutils helps by defining a hierarchy of Visitor types, and a walk method on every Node that will recursively visit the subtree starting with this node. This is a very typical pattern for Python code; the standard library has a number of similar objects - for example ast.NodeVisitor.
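To make the comparison with the standard library concrete, here is a minimal sketch of the same visitor pattern using ast.NodeVisitor. The class name CallNameCollector and the sample source string are made up for illustration; only ast.parse, ast.NodeVisitor, visit and generic_visit come from the standard library.

```python
import ast


class CallNameCollector(ast.NodeVisitor):
    """Collect the names of plain functions called in a piece of Python source."""

    def __init__(self):
        self.names = []

    def visit_Call(self, node):
        # Only record simple calls like f(...); skip attribute calls like a.b(...).
        if isinstance(node.func, ast.Name):
            self.names.append(node.func.id)
        # Keep walking so that nested calls are found too.
        self.generic_visit(node)


tree = ast.parse("print(len(items))")
collector = CallNameCollector()
collector.visit(tree)
print(collector.names)
```

As with the Docutils visitors, the method name (visit_Call here) selects the node type to handle, and every other node falls through to a default traversal.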

Here's our visitor class that handles reference nodes specially:

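class LinkCheckerVisitor(docutils.nodes.GenericNodeVisitor):
    def visit_reference(self, node):
        # Catch reference nodes for link-checking. Only references that
        # carry a 'refuri' attribute point at a URI.
        if 'refuri' in node:
            check_link(node['refuri'])

    def default_visit(self, node):
        # Pass all other nodes through.
        pass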

How did I know it's reference nodes I need and not something else? Just experimentation :) Once we parse a reST document we can print the tree and it shows which nodes contain what. Coupled with reading the source code of Docutils (particularly the docutils/ module) it's fairly easy to figure out which nodes one needs to catch.

With this visitor class in hand, we simply call walk on the parsed document:

# Visit the parsed document with our link-checking visitor.
visitor = LinkCheckerVisitor(document)
document.walk(visitor)

That's it! To see what check_link does, check out the code sample.
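For readers who don't have the code sample at hand, here is one way a check_link function could look. This is a hypothetical stand-in built on urllib from the standard library, not the function from the code sample; the timeout and opener parameters are assumptions added so the sketch can be exercised without network access. It is meant for http and https URLs only.

```python
import urllib.error
import urllib.request


def check_link(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if `url` answers a HEAD request with a status below 400.

    Hypothetical stand-in; not the check_link from the code sample.
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return response.status < 400
    except (ValueError, OSError):
        # ValueError: malformed URL. OSError covers URLError, HTTPError
        # (raised for 4xx/5xx answers) and socket timeouts.
        return False
```

Some servers reject HEAD requests, so a more careful checker might fall back to GET before declaring a link broken.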

Bonus: parsing ASCII grid tables with Docutils

Docutils supports defining tables in ASCII in a couple of ways; one I like in particular is "grid tables", done like this:

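+------------------------+------------+----------+----------+
| Header row, column 1   | Header 2   | Header 3 | Header 4 |
+========================+============+==========+==========+
| body row 1, column 1   | column 2   | column 3 | column 4 |
+------------------------+------------+----------+----------+
| body row 2             | Cells may span columns.          |
+------------------------+------------+---------------------+
| body row 3             | Cells may  | - Table cells       |
+------------------------+ span rows. | - contain           |
| body row 4             |            | - body elements.    |
+------------------------+------------+---------------------+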

Even if we don't really care about reST but just want to be able to parse tables like the one above, Docutils can help. We can use its tableparser module. Here's a short snippet from another code sample:

import docutils.parsers.rst.tableparser
import docutils.statemachine

def parse_grid_table(text):
    # Clean up the input: get rid of empty lines and strip all leading and
    # trailing whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    parser = docutils.parsers.rst.tableparser.GridTableParser()
    return parser.parse(docutils.statemachine.StringList(lines))

The parser returns an internal representation of the table that can be easily used to analyze it or to munge & emit something else (by default Docutils can emit HTML tables from it).

One small caveat in this code to pay attention to: we need to represent the table as a list of lines (strings) and then wrap it in a docutils.statemachine.StringList object, which is a Docutils helper that provides useful analysis methods on lists of strings.

[1] David Goodger points out that Docutils uses all-lowercase class names for types that coincide with element/tag names.

Some notes on Luz - an assembler, linker and CPU simulator

A few years ago I wrote about Luz - a self-educational project to implement a CPU simulator and a toolchain for it, consisting of an assembler and a linker. Since then, I received some questions by email that made me realize I could do a better job explaining what the project is and what one can learn from it.

So I went back to the Luz repository and fixed it up to be more modern, in line with current documentation standards on GitHub. The landing README page should now provide a good overview, but I also wanted to write up some less formal documentation I could point to - a place to show off some of the more interesting features in Luz; a blog post seemed like the perfect medium for this.

As before, it makes sense to start with the Luz toplevel diagram:

[Figure: Luz toplevel diagram]

Luz is a collection of related libraries and programs written in Python, implementing all the stages shown in the diagram above.

The CPU simulator

The Luz CPU is inspired by MIPS (for the instruction set), by Altera Nios II (for the way "peripherals" are attached to the CPU), and by MPC 555 (for the memory controller), and is aimed at embedded uses, like Nios II. The Luz user manual lists the complete instruction set, explaining what each instruction means.

The simulator itself is functional only - it performs the instructions one after the other, without trying to simulate how long their execution takes. It's not very remarkable and is designed to be simple and readable. The most interesting feature it has, IMHO, is how it maps "peripherals" and even CPU control registers into memory. Rather than providing special instructions or traps for OS system calls, Luz facilitates "bare-metal" programming (by which I mean, without an OS) by mapping "peripherals" into memory, allowing the programmer to access them by reading and writing special memory locations.

My inspiration here was soft-core embeddable CPUs like Nios II, which let you configure what peripherals to connect and how to map them. The CPU can be configured before it's loaded onto real HW, for example to attach as many SPI interfaces as needed. For Luz, to create a new peripheral and attach it to the simulator one implements the Peripheral interface:

class Peripheral(object):
    """ An abstract memory-mapped peripheral interface.
        Memory-mapped peripherals are accessed through memory
        reads and writes.

        The address given to reads and writes is relative to the
        peripheral's memory map.
        Width is 1, 2, 4 for byte, halfword and word accesses.
    """
    def read_mem(self, addr, width):
        raise NotImplementedError()

    def write_mem(self, addr, width, data):
        raise NotImplementedError()
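To make the interface more concrete, here is a minimal sketch of a peripheral and of the address dispatch that could sit in front of it. ScratchWords, MemoryMap and the base address 0xF1000 are all hypothetical names and values invented for this example; they are not taken from the Luz sources. Only the read_mem/write_mem interface is the one shown above.

```python
class Peripheral(object):
    """Same interface as shown above, repeated to keep the sketch self-contained."""

    def read_mem(self, addr, width):
        raise NotImplementedError()

    def write_mem(self, addr, width, data):
        raise NotImplementedError()


class ScratchWords(Peripheral):
    """Hypothetical peripheral: a few 32-bit words that remember what was written."""

    def __init__(self, num_words):
        self.words = [0] * num_words

    def read_mem(self, addr, width):
        if width != 4:
            raise ValueError("only word accesses are supported in this sketch")
        return self.words[addr // 4]

    def write_mem(self, addr, width, data):
        if width != 4:
            raise ValueError("only word accesses are supported in this sketch")
        self.words[addr // 4] = data & 0xFFFFFFFF


class MemoryMap(object):
    """Hypothetical dispatcher that routes absolute addresses to peripherals."""

    def __init__(self):
        self.regions = []  # list of (base, size in bytes, peripheral)

    def attach(self, base, size, peripheral):
        self.regions.append((base, size, peripheral))

    def _find(self, addr):
        for base, size, peripheral in self.regions:
            if base <= addr < base + size:
                # Peripherals see addresses relative to their own base.
                return peripheral, addr - base
        raise ValueError("no peripheral mapped at 0x%X" % addr)

    def read(self, addr, width):
        peripheral, offset = self._find(addr)
        return peripheral.read_mem(offset, width)

    def write(self, addr, width, data):
        peripheral, offset = self._find(addr)
        peripheral.write_mem(offset, width, data)


memory = MemoryMap()
memory.attach(0xF1000, 16, ScratchWords(4))  # base address chosen arbitrarily
memory.write(0xF1004, 4, 0xCAFE)
print(hex(memory.read(0xF1004, 4)))
```

The point of the relative addressing is that a peripheral doesn't need to know where it was mapped; the same class can be attached at several base addresses.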

Luz implements some built-in features as peripherals as well; for example, the core registers (interrupt control, exception control, etc). The idea here is that embedded CPUs can have multiple custom "registers" to control various features, and creating dedicated names for them bloats instruction encoding (you need 5 bits to encode one of 32 registers, etc.); it's better to just map them to memory.

Another example is the debug queue - a peripheral useful for testing and debugging. It's a single word mapped to address 0xF0000 in the simulator. When the peripheral gets a write, it stores it in a special queue and optionally emits the value to stdout. The queue can later be examined. Here is a simple Luz assembly program that makes use of it:

# Counts from 0 to 9 [inclusive], pushing these numbers into the debug queue

    .segment code
    .global asm_main

    .define ADDR_DEBUG_QUEUE, 0xF0000

asm_main:
    li $k0, ADDR_DEBUG_QUEUE

    li $r9, 10                          # r9 is the loop limit
    li $r5, 0                           # r5 is the loop counter

loop:
    sw $r5, 0($k0)                      # store loop counter to debug queue
    addi $r5, $r5, 1                    # increment loop counter
    bltu $r5, $r9, loop                 # loop back if not reached limit

    halt

Using the interactive runner to run this program we get:

$ python loop_simple_debugqueue
DebugQueue: 0x0
DebugQueue: 0x1
DebugQueue: 0x2
DebugQueue: 0x3
DebugQueue: 0x4
DebugQueue: 0x5
DebugQueue: 0x6
DebugQueue: 0x7
DebugQueue: 0x8
DebugQueue: 0x9
Finished successfully...
Debug queue contents:
['0x0', '0x1', '0x2', '0x3', '0x4', '0x5', '0x6', '0x7', '0x8', '0x9']


There's a small snippet of Luz assembly shown above. It's your run-of-the-mill RISC assembly, with the familiar set of instructions, fairly simple addressing modes and almost every instruction requiring registers (note how we can't store into the debug queue directly, for example, without dereferencing a register that holds its address).

The Luz user manual contains a complete reference for the instructions, including their encodings. Every instruction is a 32-bit word, with the 6 high bits for the opcode (meaning up to 64 distinct instructions are supported).
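As a small illustration of that layout, the following sketch extracts and places the 6-bit opcode field of a 32-bit word. The function names are made up, and no actual Luz opcode values are assumed; the only fact taken from the text is that the opcode occupies the 6 high bits.

```python
OPCODE_BITS = 6
WORD_BITS = 32


def opcode_of(word):
    """Return the opcode stored in the 6 high bits of a 32-bit instruction word."""
    return (word >> (WORD_BITS - OPCODE_BITS)) & ((1 << OPCODE_BITS) - 1)


def with_opcode(opcode, low_bits=0):
    """Build an instruction word from an opcode and the remaining 26 low bits."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode does not fit in 6 bits")
    return (opcode << (WORD_BITS - OPCODE_BITS)) | (low_bits & 0x03FFFFFF)


print(opcode_of(0xFC000000))   # all six opcode bits set
print(hex(with_opcode(1)))     # opcode 1 with all other bits clear
```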

The code snippet also shows off some special features of the full Luz toolchain, like the special label asm_main. I'll discuss these later on in the section about linking.

Assembly languages are usually fairly simple to parse, and Luz is no exception. When I started working on Luz, I decided to use the PLY library for the lexer and parser mainly because I wanted to play with it. These days I'd probably just hand-roll a parser.

Luz takes another cool idea from MIPS - register aliases. While the assembler doesn't enforce any specific ABI on the coder, some conventions are very important when writing large assembly programs, and especially when interfacing with routines written by other programmers. To facilitate this, Luz designates register aliases for callee-saved registers and temporary registers.

For example, the general-purpose register number 19 can be referred to in Luz assembly as $r19 but also as $s1 - the callee-saved register 1. When writing standalone Luz programs, one is free to ignore these conventions. To get a taste of how ABI-conformant Luz assembly would look, take a look at this example.
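Alias resolution itself is a simple lookup. The sketch below is an illustration rather than the Luz assembler's code: register_number is a made-up name, and the table only holds aliases that can be read off this post ($s1 from the paragraph above, $sp and $k0 from the debugger session shown later). The full alias table is in the Luz user manual.

```python
import re

# Only aliases that can be read off this post; the full table is in the Luz manual.
REGISTER_ALIASES = {"$s1": 19, "$sp": 29, "$k0": 26}


def register_number(name):
    """Translate '$rN' or a known alias into a register number in 0..31."""
    match = re.fullmatch(r"\$r(\d+)", name)
    if match:
        number = int(match.group(1))
    elif name in REGISTER_ALIASES:
        number = REGISTER_ALIASES[name]
    else:
        raise ValueError("unknown register: %s" % name)
    if not 0 <= number < 32:
        raise ValueError("register number out of range: %s" % name)
    return number


print(register_number("$s1"), register_number("$r19"))
```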

To be honest, ABI was on my mind because I was initially envisioning a full programming environment for Luz, including a C compiler. When you have a compiler, you must have some set of conventions for generated code like procedure parameter passing, saved registers and so on; in other words, the platform ABI.


Linker

In my view, one of the features that distinguishes Luz from other assembler projects out there is the linker. Luz features a full linker that supports creating single "binaries" from multiple assembly files, handling all the dirty work necessary to make that happen. Each assembly file is first "assembled" into a position-independent object file; these are glued together by the linker which applies the necessary relocations to resolve symbols across object files. The prime sieve example shows this in action - the program is divided into three .lasm files: two for subroutines and one for "main".

As we've seen above, the main subroutine in Luz is called asm_main. This is a special name for the linker (not unlike the _start symbol for modern Linux assemblers). The linker collects a set of object files produced by assembly, and makes sure to invoke asm_main from the special location 0x100000. This is where the simulator starts execution.

Luz also has the concept of object files. They are not unlike ELF images in nature: there's a segment table, an export table and a relocation table for each object, serving the expected roles. It is the job of the linker to make sense of this list of objects and correctly connect all call sites to final subroutine addresses.
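The core idea can be shown with a toy two-pass linker. Everything in this sketch is hypothetical: the dict-based object format, the link function and the base address are invented for illustration and do not mirror Luz's real object files. In particular, a real linker patches address fields inside instructions, while this toy simply overwrites a whole word with the resolved address.

```python
def link(objects, base_address):
    """Lay out objects back to back and patch relocations with absolute addresses.

    Each object is a dict with:
      'words'   - list of 32-bit words (the assembled code/data)
      'exports' - {symbol name: word index inside this object}
      'relocs'  - list of (word index to patch, symbol name)
    """
    # Pass 1: decide where each object lands and collect exported symbols.
    symbols = {}
    address = base_address
    for obj in objects:
        for name, index in obj["exports"].items():
            if name in symbols:
                raise ValueError("duplicate symbol: %s" % name)
            symbols[name] = address + 4 * index
        address += 4 * len(obj["words"])

    # Pass 2: copy the words and fill in every relocation site.
    image = []
    for obj in objects:
        words = list(obj["words"])
        for index, name in obj["relocs"]:
            if name not in symbols:
                raise ValueError("undefined symbol: %s" % name)
            words[index] = symbols[name]
        image.extend(words)
    return image, symbols


main_obj = {"words": [0, 0xAAAA], "exports": {"asm_main": 0}, "relocs": [(0, "helper")]}
helper_obj = {"words": [0xBBBB], "exports": {"helper": 0}, "relocs": []}
image, symbols = link([main_obj, helper_obj], base_address=0x2000)
print([hex(w) for w in image])
```

Two passes are needed because a relocation in an early object may refer to a symbol exported by a later one, whose address is only known once all objects have been laid out.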

Luz's standalone assembler can write an assembled image into a file in Intel HEX format, a popular format used in embedded systems to encode binary images or data in ASCII.
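For readers unfamiliar with the format, here is a minimal sketch of how a single Intel HEX record is encoded. It follows the general record layout of the format (byte count, 16-bit address, record type, data, checksum) and is not the writer used by Luz; ihex_record is a made-up name.

```python
def ihex_record(record_type, address, data=b""):
    """Encode one Intel HEX record: ':' + count + address + type + data + checksum."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("record address must fit in 16 bits")
    if len(data) > 255:
        raise ValueError("a record holds at most 255 data bytes")
    body = bytes([len(data), (address >> 8) & 0xFF, address & 0xFF, record_type])
    body += bytes(data)
    # The checksum makes the sum of all record bytes equal to 0 modulo 256.
    checksum = (-sum(body)) & 0xFF
    return ":" + (body + bytes([checksum])).hex().upper()


print(ihex_record(0x00, 0x0030, bytes([0x02, 0x33, 0x7A])))  # a data record
print(ihex_record(0x01, 0x0000))                             # the end-of-file record
```

Because every byte is written as two ASCII hex digits, the resulting file is plain text that can be inspected or diffed, at the cost of being more than twice the size of the raw binary.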

The linker was quite a bit of effort to develop. Since all real Luz programs are small, I didn't really need to break them up into multiple assembly files; but I really wanted to learn how to write a real linker :) Moreover, as already mentioned, my original plans for Luz included a C compiler, and that would make a linker very helpful, since I'd need to link some "system" code into the user's program. Even today, Luz has some "startup code" it links into every image:

# The special segments added by the linker.
# __startup: 3 words
# __heap: 1 word
LINKER_STARTUP_CODE = string.Template(r'''
        .segment __startup

    LI      $$sp, ${SP_POINTER}
    CALL    asm_main

        .segment __heap
        .global __heap
        .word 0
''')

This code sets up the stack pointer to the initial address allocated for the stack, and calls the user's asm_main.
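The doubled dollar sign in the template is string.Template's escape for a literal "$", which is needed because Luz register names start with "$". The sketch below uses a shortened copy of the template to show the substitution; the variable names are made up, and the stack pointer value is the one visible in the debugger session later in this post. How the linker itself computes and passes the value is not shown here.

```python
import string

# Shortened copy of the template shown above; only the __startup part is kept.
STARTUP_TEMPLATE = string.Template(r'''
        .segment __startup

    LI      $$sp, ${SP_POINTER}
    CALL    asm_main
''')

# "$$" becomes a literal "$", and ${SP_POINTER} is replaced by the given text.
startup_asm = STARTUP_TEMPLATE.substitute(SP_POINTER="0x13FFFC")
print(startup_asm)
```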

Debugger and disassembler

Luz comes with a simple program runner that will execute a Luz program (consisting of multiple assembly files); it also has an interactive mode - a debugger. Here's a sample session with the simple loop example shown above:

$ python -i loop_simple_debugqueue

LUZ simulator started at 0x00100000

[0x00100000] [lui $sp, 0x13] >> set alias 0
[0x00100000] [lui $r29, 0x13] >> s
[0x00100004] [ori $r29, $r29, 0xFFFC] >> s
[0x00100008] [call 0x40003 [0x10000C]] >> s
[0x0010000C] [lui $r26, 0xF] >> s
[0x00100010] [ori $r26, $r26, 0x0] >> s
[0x00100014] [lui $r9, 0x0] >> s
[0x00100018] [ori $r9, $r9, 0xA] >> s
[0x0010001C] [lui $r5, 0x0] >> s
[0x00100020] [ori $r5, $r5, 0x0] >> s
[0x00100024] [sw $r5, 0($r26)] >> s
[0x00100028] [addi $r5, $r5, 0x1] >> s
[0x0010002C] [bltu $r5, $r9, -2] >> s
[0x00100024] [sw $r5, 0($r26)] >> s
[0x00100028] [addi $r5, $r5, 0x1] >> s
[0x0010002C] [bltu $r5, $r9, -2] >> s
[0x00100024] [sw $r5, 0($r26)] >> s
[0x00100028] [addi $r5, $r5, 0x1] >> r
$r0   = 0x00000000   $r1   = 0x00000000   $r2   = 0x00000000   $r3   = 0x00000000
$r4   = 0x00000000   $r5   = 0x00000002   $r6   = 0x00000000   $r7   = 0x00000000
$r8   = 0x00000000   $r9   = 0x0000000A   $r10  = 0x00000000   $r11  = 0x00000000
$r12  = 0x00000000   $r13  = 0x00000000   $r14  = 0x00000000   $r15  = 0x00000000
$r16  = 0x00000000   $r17  = 0x00000000   $r18  = 0x00000000   $r19  = 0x00000000
$r20  = 0x00000000   $r21  = 0x00000000   $r22  = 0x00000000   $r23  = 0x00000000
$r24  = 0x00000000   $r25  = 0x00000000   $r26  = 0x000F0000   $r27  = 0x00000000
$r28  = 0x00000000   $r29  = 0x0013FFFC   $r30  = 0x00000000   $r31  = 0x0010000C

[0x00100028] [addi $r5, $r5, 0x1] >> s 100
[0x00100030] [halt] >> q

There are many interesting things here demonstrating how Luz works:

  • Note the start up at 0x100000 - this is where Luz places the start-up segment - three instructions that set up the stack pointer and then call the user's code (asm_main). The user's asm_main starts running at the fourth instruction executed by the simulator.
  • li is a pseudo-instruction, broken into two real instructions: lui for the upper half of the register, followed by ori for the lower half of the register. The reason for this is li having a 32-bit immediate, which can't fit in a Luz instruction. Therefore, it's broken into two parts which only need 16-bit immediates. This trick is common in RISC ISAs.
  • Jump labels are resolved to be relative by the assembler: the jump to loop is replaced by -2.
  • Disassembly! The debugger shows the instruction decoded from every word where execution stops. Note how this exposes pseudo-instructions.
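The li split and the relative branch offset from the list above can be reproduced with a bit of arithmetic. This is a sketch with made-up function names, not code from the Luz assembler; the offset being counted in instructions from the branch's own address is inferred from the session above, where the bltu at 0x10002C jumps back to 0x100024.

```python
def split_li(value):
    """Split a 32-bit immediate into the 16-bit halves used by lui and ori."""
    value &= 0xFFFFFFFF
    return (value >> 16) & 0xFFFF, value & 0xFFFF


def branch_offset(branch_address, target_address):
    """Offset in instructions (4-byte words) from a branch to its target."""
    delta = target_address - branch_address
    if delta % 4:
        raise ValueError("addresses must be word-aligned")
    return delta // 4


print([hex(half) for half in split_li(0xF0000)])    # li $k0, ADDR_DEBUG_QUEUE
print([hex(half) for half in split_li(0x13FFFC)])   # li $sp, <initial stack pointer>
print(branch_offset(0x10002C, 0x100024))            # bltu jumping back to loop
```

The halves match the lui/ori pairs in the session (0xF and 0x0 for $r26, 0x13 and 0xFFFC for $r29), and the branch offset comes out as -2.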

The in-progress RTL implementation

Luz was a hobby project, but an ambitious one :-) Even before I wrote the first line of the assembler or simulator, I started working on an actual CPU implementation in synthesizable VHDL, meaning to get a complete RTL image to run on FPGAs. Unfortunately, I didn't finish this part of the project and what you find in Luz's experimental/luz_uc directory is only 75% complete. The ALU is there, the registers, the hookups to peripherals, even parts of the control path - dealing with instruction fetching, decoding, etc. My original plan was to implement a pipelined CPU (a RISC ISA makes this relatively simple), which perhaps was a bit too much. I should have started simpler.


Luz was an extremely educational project for me. When I started working on it, I mostly had embedded programming experience and was just starting to get interested in systems programming. Luz flung me into the world of assemblers, linkers, binary images, calling conventions, and so on. Besides, Python was a new language for me at the time - Luz started just months after I first got into Python.

Its ~8000 lines of Python code are thus likely not my best Python code, but they should be readable and well commented. I did modernize it a bit over the years, for example to make it run on both Python 2 and 3.

I still hope to get back to the RTL implementation project one day. It's really very close to being able to run realistic assembly programs on real hardware (FPGAs). My dream back then was to fully close the loop by adding a Luz code generation backend to pycparser. Maybe I'll still fulfill it one day :-)