I have a few years of experience debugging software, and still do it on a regular basis. But for the past half year I've also been working in the hardware world, and I'm getting more and more convinced that debugging hardware is much, much harder.

While as a software developer I have a debugger, print-outs, memory dumps, various memory-monitoring utilities, and whatnot, a hardware designer has almost nothing (well, comparatively). I have 16 puny bits I can show on the logic analyzer, and that's it! So there's much more of the think-try-disappoint cycle than in software. The hardest bugs in software come from its huge complexity; in programs that are only a few thousand lines of code, bugs won't stay around for long - they're easy to find. In hardware, however, very severe problems can lurk even in the simplest designs.

For instance, yesterday we had the following bug:

Using the CPU, we read from some FPGA register. When the FPGA is not loaded, we should read all 0s, since there are pull-downs on the bus. To our surprise, we read some weird number - 0x831B. We felt helpless, stripping the register-reading code down to assembly and still seeing this weird number come back. But then we noticed something interesting - the opcode of the instruction 2 lines below the read was 0x831B! What the hell?
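
For reference, the read we were staring at boils down to something like this (a minimal C sketch - the address and register name are made-up placeholders, not the real board's values):

    #include <stdint.h>

    /* Hypothetical memory-mapped FPGA register; 0x40000000 is just a
     * placeholder address for illustration. */
    #define FPGA_STATUS_REG (*(volatile uint16_t *)0x40000000)

    uint16_t read_fpga_status(void)
    {
        /* FPGA not loaded, pull-downs on the bus, so this should return
         * 0x0000 - yet we kept getting 0x831B. */
        return FPGA_STATUS_REG;
    }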

It turned out the problem was the following: the CPU reads its own code from the RAM using the same bus. It has a pipeline, so it may fetch a few instructions ahead. Apparently, the CPU had fetched the 0x831B instruction from the RAM and only then actually went to read from the FPGA. But the pull-downs weren't fast enough, and the traces of 0x831B stayed on the bus, which we then read as if it came from the FPGA.

Solution: increase the wait cycles when reading from the FPGA - this gives the pull-downs time to pull the bus down to 0s.
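
In code, the fix amounts to something like the sketch below. The wait-state register here is hypothetical - the real mechanism depends on the CPU's external bus controller and how the FPGA's chip-select region is configured:

    #include <stdint.h>

    /* Hypothetical bus-controller register that sets the number of wait
     * cycles inserted on accesses to the FPGA's address region.
     * Name and address are placeholders. */
    #define EBC_FPGA_WAIT_STATES (*(volatile uint16_t *)0x40001000)

    void slow_down_fpga_reads(void)
    {
        /* A few extra wait cycles give the pull-downs enough time to
         * drain the bus to 0s before the CPU samples it. */
        EBC_FPGA_WAIT_STATES = 8;
    }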

Hairy, but pretty exciting. After all, it's nice to see all these things actually happen - they were harder to believe in when I had only studied them at uni.