When a test fails, the first instinct is usually to dive right in - go through the code, sprinkle printouts and logging statements all around, fire up the debugger, and so on. But often, the correct approach is to write additional tests. Does this sound crazy? Well, it isn't. It's one of those tried and true techniques used by many experienced programmers.

Allow me to elaborate. Tests for software fall broadly into the following categories. This taxonomy is a blunt generalization, and the boundaries vary considerably with a project's size and other characteristics. But the taxonomy isn't important in itself - I just use it to illustrate my point. So starting with the most specific and ending with the most general, these are:

  • Unit tests
  • Module-level tests
  • Whole program tests
  • Integration tests (with other software)

From the first item on the list (highly specific, targeted tests) to the last (very general tests), coverage grows, and so does the complexity of debugging a failure. Debugging a unit test failure is usually quite easy. Integration tests can be devilishly difficult to debug, or even to isolate and analyze. An unrelated component of the system may start failing intermittently when some other component (which passed all of its own tests) is changed. A compiler may start generating wrong code, but only when bootstrapped by compiling itself. It's those cases that send programmers crying to their therapist's couch.

So back to my original point. I posit that a great way to debug general tests is to write additional specific tests. Your module-level tests are failing? Write more unit tests. Your integration tests are failing? Write additional whole program tests, module tests or unit tests. The more specific the new tests are, the better. Remember that the most specific tests are the easiest to debug.
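
To make this concrete, here is a minimal, self-contained sketch in Python (the functions and the deliberate bug are made up for illustration). The module-level test exercises format_report as a whole and fails, but the failure alone doesn't say which piece is broken; the new, narrower unit tests pin it on the wrap_line helper:

    import unittest


    def wrap_line(line, width):
        # Helper with a deliberate off-by-one bug: splits one character
        # past the intended width.
        return [line[i:i + width + 1] for i in range(0, len(line), width + 1)]


    def format_report(lines, width=20):
        # Module-level function that relies on wrap_line().
        wrapped = []
        for line in lines:
            wrapped.extend(wrap_line(line, width))
        return "\n".join(wrapped)


    class TestReportModule(unittest.TestCase):
        def test_format_report(self):
            # The failing module-level test: it exercises several pieces at
            # once, so the failure doesn't point at any one of them.
            report = format_report(["a long line that should wrap", "short"])
            self.assertTrue(all(len(ln) <= 20 for ln in report.splitlines()))


    class TestWrapLine(unittest.TestCase):
        # The new, more specific tests: each checks a single assumption
        # about the helper that format_report() depends on.
        def test_short_line_unchanged(self):
            self.assertEqual(wrap_line("short", 20), ["short"])

        def test_pieces_respect_width(self):
            pieces = wrap_line("a long line that should wrap", 20)
            self.assertTrue(all(len(p) <= 20 for p in pieces))


    if __name__ == "__main__":
        unittest.main()

When the narrow test fails too, the bug is cornered in one small function; when it passes, that assumption is ruled out and you move on to the next one.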

Debugging is all about the scientific method - you make assumptions and test them. Based on the results you make additional assumptions. Rinse. Repeat. While pondering why something doesn't work, fortify your assumptions by writing tests for them. Facing a failing high-level test, try to write a more specific test that also fails. The benefits of this are twofold. First, it will help you find the bug. Bugs are usually the result of bad assumptions, and this is especially true at the most general levels, where pieces of software get integrated with others. More specific tests help verify those assumptions and are easier to debug. Second, you'll have more tests in the system when this bug-hunt ends, and that's always a good thing.
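
To end with something concrete, here is a small sketch of writing an assumption down as a test, again with made-up names. Suppose a failing higher-level test makes you suspect that a config parser mishandles empty input; instead of checking that by hand in a debugger, encode the assumption as a test:

    import unittest


    def parse_config(text):
        # Toy parser: one "key=value" pair per line. The assumption under
        # test is that it tolerates empty input and blank lines.
        result = {}
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            key, value = line.split("=", 1)
            result[key.strip()] = value.strip()
        return result


    class TestParserAssumptions(unittest.TestCase):
        def test_empty_input_gives_empty_config(self):
            # If this fails, the bad assumption has been found and the bug
            # is cornered in a small, easy-to-debug test. If it passes,
            # cross the assumption off and test the next one. Either way,
            # the test stays in the suite.
            self.assertEqual(parse_config(""), {})

        def test_blank_lines_are_ignored(self):
            self.assertEqual(parse_config("a = 1\n\nb = 2\n"),
                             {"a": "1", "b": "2"})


    if __name__ == "__main__":
        unittest.main()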