"Beating" C with 400 lines of unoptimized assembly



Earlier this week I ran into a fun quick blog post named Beating C with 70 lines of Go, which reimplements the basic functionality of wc in Go using various approaches and compares their performance. Apparently it's inspired by an earlier Haskell-based post and several other offshoots.

This reminded me of my earlier post about reimplementing wc in pure x64 assembly, where I also measured the performance of my program against wc.

The optimized approach taken in the Go implementation is very similar to what I did in assembly, so it seemed like an interesting comparison. I started by generating a ~580 MiB file using xmlgen and ran the various versions against each other:

  • LC_CTYPE=POSIX wc: 2.13 sec
  • wc-naive.go: 3.53 sec
  • wc-chunks.go: 1.37 sec
  • wcx64: 1.2 sec

Note the LC_CTYPE setting for the system's wc. This is important for a fair comparison, because without it wc will attempt UTF-8 decoding on all bytes in the file, which results in a significant slowdown. Since the Go versions count bytes and so does my wcx64, I force the system's wc to do the same. In fact, this isn't a bad result for Go - the straightforward chunked solution (wc-chunks.go) is almost as fast as the same approach direct-coded in assembly!

The Go blog post follows with parallelized versions, which are much faster than the serial one, but I'm excluding them here because all the other competitors are single-threaded. This is not a serious benchmark anyway. If you prefer to be serious, this response using SIMD-optimized C blows everything out of the water:

  • fastlwc: 0.11 sec

The conclusion? Well, there's no real conclusion here, beyond that coding exercises like this are fun in any language :-)

