What tool(s) do you use when you need to quickly search through a set of directories recursively, focusing only on C++ source code files (.cpp, .h, .hh and so on), looking for some string (or regular expression)? Oh, and if this search could also ignore some directories we really don't want to look into, like .svn, all the better.

I think it would be interesting to see how programmers answer this question. My guess is:

  1. Newbies will have no idea ("err, just manually grep in each directory?"), or tell you to use an IDE's "find in files" command.
  2. Disciples of the Unix way will probably quickly produce a concoction of find and grep, connected with pipes and xargs (Quiz: what is the shortest such command to answer all the requirements from above?)
  3. Experienced users will likely pull a ready-made bash (or batch?) script that does this out of their toolbox, or say they use ack.

What is ack? Here's a short description taken directly from its home page:

ack is a tool like grep, designed for programmers with large trees of heterogeneous source code. ack is written purely in Perl, and takes advantage of the power of Perl's regular expressions.

Personally, I use ack. Or more precisely, I had been using it until very recently, when I decided to write such a tool myself, in Python. This tool is called pss and is now publicly available (also on PyPI).

Here are some cool facts about pss:

  • It searches directories recursively by default.
  • It recognizes known file extensions for source code (for example, .c and .h files for C code) and lets you easily select which files you want to search (whether it's all Python files, all C files, all C and Python files, etc.)
  • You can search for patterns specified with regular expressions, and also use regular expressions to specify the file patterns to look at, in case the defaults aren't enough.
  • It ignores some well known temporary files and directories, as well as source-control directories such as .svn and .hg.
  • It produces a terminal-friendly, colored output, on Windows too! Color is used to conveniently set apart file names from the matches within them, as well as the matching portion of each line (in case you hate to scan each line looking for the actual matching string).
  • It contains a lot of options particularly helpful for searching source code.
  • It plays well as part of the Unix command line, with options that make it suitable for taking part in pipe-connected chains, if required.
  • All it requires to run (on Linux and Windows, although almost certainly on other platforms as well) is a Python installation (version 2.6 and up, including 3.x).
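
To give a feel for the core idea behind the features listed above (recursive traversal, filtering by file extension, skipping source-control directories), here is a minimal Python 3 sketch. It is not pss's actual code: the function name search_tree, the extension set, and the ignored-directory set are all assumptions made for illustration, and it leaves out coloring and every command-line option.

```python
import os
import re

# Illustrative defaults only; these are assumptions, not pss's real tables.
CPP_EXTENSIONS = {".cpp", ".cc", ".h", ".hh", ".hpp"}
IGNORED_DIRS = {".svn", ".hg"}


def search_tree(root, pattern, extensions=CPP_EXTENSIONS, ignored_dirs=IGNORED_DIRS):
    """Yield (path, line_number, line) for every matching line under root.

    Hypothetical helper: only files whose extension is in `extensions`
    are read, and directories named in `ignored_dirs` are never entered.
    """
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into ignored directories.
        dirnames[:] = sorted(d for d in dirnames if d not in ignored_dirs)
        for name in sorted(filenames):
            if os.path.splitext(name)[1] not in extensions:
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going over badly encoded files.
            with open(path, errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if regex.search(line):
                        yield path, lineno, line.rstrip("\n")
```

Iterating over search_tree(".", r"foo_\w+") and printing each tuple gives grep-like path, line number, and line output for the C++ files below the current directory. Assigning to dirnames[:] rather than rebinding the name is what makes os.walk actually skip the pruned directories.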

pss clones ack's functionality (implementing most of the features). The reason I decided to write and release it is mainly that Python is my language of choice, and installing Perl to run ack became a chore (chiefly on Windows machines, since on Linux Perl is usually installed by default). Really, the only reason I've been installing Perl on Windows boxes I had to work on in the past couple of years was to enable them to run ack.

Moreover, pss comes with a terminal-color library built in, so unlike ack it doesn't require installing any additional modules to nicely color its output on Windows (ack requires Win32::Console::ANSI).

I have some ideas for extending pss with extra features, and wanted to be able to do that in Python, without having to dust off my Perl skills. Other Pythonistas may find pss attractive for the same reason. pss is implemented in a very modular manner - the main script is just a thin wrapper over a library which can be used programmatically for a variety of purposes. In other words, pss is quite hackable.

Finally, pss just seemed like a cool project to do. Its existence is not meant to detract from ack in any way. I've been using and enjoying ack for many years - thanks to ack's author Andy Lester for that!