In the past few weeks there appears to have been a resurgence of interest in assembly programming; most likely, this is due to the release of the source code for the Apollo 11 guidance computer on GitHub - a truly awesome repository, if you haven't seen it yet.
This inspired me to dig up a project I did ~3 years ago - a reimplementation of the wc command-line tool in pure x64 assembly. It's been open on GitHub from the start, but I never really mentioned it anywhere. Summoning my best powers of imagination, I named the project... wait for it... wcx64. It's ~400 lines of assembly code (most of which are comments) in the default gas syntax.
As a compiler hacker, and an embedded programmer back in the day, I've done my share of writing and digging in assembly code; I wrote my own assembler, and worked on LLVM's assembler. There's an Assembly tag in this blog with a bunch of assembly-related posts. I've written "production" assembly for many architectures - from small 8-bit controllers, to x64, to obscure DSPs. However, almost all of this code was self-contained and written for very specific tasks.
The idea of wcx64 was to understand how realistic programs could be written from start to end, including dealing with the OS, the file system, input-output and so on. It's a nice "code kata" exercise I find useful when exploring new programming languages, because you get to do a lot of things "real" programs do, just confined to a very simple task. Here are some of the interesting things you'll find in the code:
- Processing command-line arguments (argc and argv).
- Reading from files and from the standard input and writing to standard output, using system calls.
- Writing functions that adhere to the x64 ABI calling convention, including passing arguments, returning values and preserving callee-saved registers.
- Fundamentals of string processing: very simple parsing of text using a two-state state machine, converting numbers to strings, etc.
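To give a feel for the last item, here is a minimal sketch in C (not the actual wcx64 assembly) of a two-state word counter and a number-to-string conversion. All names here (count_buffer, u64_to_decimal, the counts struct) are made up for illustration, and the set of bytes treated as whitespace is an assumption rather than something taken from the wcx64 source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Running totals, in the order wc prints them. Hypothetical layout. */
struct counts {
    uint64_t lines;
    uint64_t words;
    uint64_t bytes;
};

/* The two states of the scanner. */
enum state { IN_WHITESPACE, IN_WORD };

/* Assumption: these bytes separate words; wcx64 may use a different set. */
int is_space(unsigned char c) {
    return c == ' ' || c == '\n' || c == '\t' ||
           c == '\r' || c == '\v' || c == '\f';
}

/*
 * Feed one buffer into the state machine. The state is passed in and
 * returned, so a word that straddles two buffers is counted only once.
 */
enum state count_buffer(const unsigned char *buf, size_t len,
                        enum state st, struct counts *out) {
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        out->bytes++;
        if (c == '\n') {
            out->lines++;
        }
        if (is_space(c)) {
            st = IN_WHITESPACE;
        } else if (st == IN_WHITESPACE) {
            /* Whitespace -> word transition: a new word starts here. */
            out->words++;
            st = IN_WORD;
        }
    }
    return st;
}

/*
 * Convert n to decimal text in dst, which must hold at least 21 bytes
 * (20 digits for the largest 64-bit value plus the terminating NUL).
 * Returns the number of digits written.
 */
size_t u64_to_decimal(uint64_t n, char *dst) {
    char tmp[20];
    size_t len = 0;
    /* Repeated division yields digits least-significant first. */
    do {
        tmp[len++] = (char)('0' + n % 10);
        n /= 10;
    } while (n != 0);
    /* Reverse them into the destination buffer. */
    for (size_t i = 0; i < len; i++) {
        dst[i] = tmp[len - 1 - i];
    }
    dst[len] = '\0';
    return len;
}
```

A caller would start in IN_WHITESPACE, call count_buffer once per chunk read from the file while threading the returned state into the next call, and finally format each total with u64_to_decimal before writing it out.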
And some facts about the outcome:
- wcx64 doesn't need any C runtime/support code to work. It invokes Linux system calls directly, and is completely self-contained.
- When assembled and linked, the binary size is 6.5 KiB.
- It's fast! On a couple of samples I tried, it's between 6 and 13 times faster than the command-line wc tool for processing 1 GiB files.
The performance is surprising to me. I didn't expect the difference to be this great. True, the inner loop of wcx64 is tight assembly, but I really didn't spend any time optimizing it, opting for clarity instead. My guess is that the real wc is slower because it supports more features, like multi-byte characters.