Announcing pss: a tool for searching inside source code

October 14th, 2011 at 11:29 am

What tool(s) do you use when you need to quickly search through a set of directories recursively, focusing only on C++ source code files (.cpp, .h, .hh and so on), looking for some string (or regular expression)? Oh, and if this search could also ignore some directories we really don’t want to look into, like .svn, all the better.

I think it would be interesting to see how programmers answer this question. My guess is:

  1. Newbies – will have no idea ("err, just manually grep in each directory?"), or will tell you to use an IDE's "find in files" command.
  2. Disciples of the Unix way will probably quickly produce a concoction of find and grep, connected with pipes and xargs. (Quiz: what is the shortest such command that meets all the requirements above?)
  3. Experienced users will likely pull a ready-made bash (or batch?) script that does this out of their toolbox, or say they use ack.
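Whichever camp you're in, the job boils down to three steps: walk the tree, skip unwanted directories, and run a regex over files with the right extensions. As a point of reference, here is a minimal Python 3 sketch of that loop using only the standard library. It is an illustration, not how pss is implemented; the function name `search_tree` and the default extension and ignored-directory sets are assumptions made for the example.

```python
import os
import re

# Illustrative defaults assumed for this sketch (not pss's actual lists).
CPP_EXTENSIONS = frozenset({".cpp", ".h", ".hh"})
IGNORED_DIRS = frozenset({".svn", ".hg", ".git"})


def search_tree(root, pattern, extensions=CPP_EXTENSIONS, ignored_dirs=IGNORED_DIRS):
    """Yield (path, line_number, line) for each line under `root` matching `pattern`.

    Ignored directories are never entered, and only files whose extension
    is in `extensions` are read.
    """
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in ignored_dirs]
        for name in sorted(filenames):
            if os.path.splitext(name)[1] not in extensions:
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps undecodable bytes from aborting the search.
            with open(path, errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if regex.search(line):
                        yield path, lineno, line.rstrip("\n")
```

Calling `search_tree(".", r"#include")` and printing each tuple gives roughly what a find/grep pipeline produces. Pruning `dirnames` in place is the important design choice: filtering paths after the walk would still pay the cost of descending into every `.svn` directory.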

What is ack? Here’s a short description taken directly from its home page:

ack is a tool like grep, designed for programmers with large trees of heterogeneous source code. ack is written purely in Perl, and takes advantage of the power of Perl’s regular expressions.

Personally, I use ack. Or, more precisely, I had been using it until very recently, when I decided to write a similar tool myself, in Python. This tool is called pss and is now publicly available (also on PyPI).

Here are some cool facts about pss:

  • It searches directories recursively by default.
  • It recognizes known file extensions for source code (for example, .c and .h files for C code) and lets you easily select which files you want to search (whether it’s all Python files, all C files, all C and Python files, etc.)
  • You can search for patterns specified with regular expressions, and also use regular expressions to specify the file patterns to look at, in case the defaults aren’t enough.
  • It ignores some well known temporary files and directories, as well as source-control directories such as .svn and .hg.
  • It produces a terminal-friendly, colored output, on Windows too! Color is used to conveniently set apart file names from the matches within them, as well as the matching portion of each line (in case you hate to scan each line looking for the actual matching string).
  • It contains a lot of options particularly helpful for searching source code.
  • It plays well as part of the Unix command line, with options that make it suitable for taking part in pipe-connected chains, if required.
  • All it requires to run (on Linux and Windows, and almost certainly on other platforms as well) is a Python installation (version 2.6 and up, including 3.x).
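Selecting files by "type" can be modeled as a table from type names to extensions, combined with an optional file-name regex for anything the table doesn't cover. The sketch below assumes that design; the type names, their extension sets and the `make_file_filter` helper are hypothetical stand-ins, not pss's real table or API.

```python
import os
import re

# Hypothetical type table for illustration; a real tool ships a larger one.
TYPE_EXTENSIONS = {
    "cc": {".c", ".h"},
    "cpp": {".cpp", ".cc", ".cxx", ".h", ".hh", ".hpp"},
    "python": {".py"},
}


def make_file_filter(types, name_regex=None):
    """Return a predicate accepting a file name if its extension belongs to
    one of the selected `types`, or if it matches `name_regex`."""
    extensions = set()
    for t in types:
        extensions |= TYPE_EXTENSIONS[t]
    compiled = re.compile(name_regex) if name_regex else None

    def accepts(filename):
        if os.path.splitext(filename)[1] in extensions:
            return True
        return compiled is not None and compiled.search(filename) is not None

    return accepts
```

"All C and Python files" then becomes `make_file_filter(["cc", "python"])`, while something like `make_file_filter([], name_regex=r"^Makefile$")` covers files that have no useful extension.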

pss clones ack’s functionality (implementing most of the features). The reason I decided to write and release it is mainly that Python is my language of choice, and installing Perl to run ack became a chore (chiefly on Windows machines, since on Linux Perl is usually installed by default). Really, the only reason I’ve been installing Perl on Windows boxes I had to work on in the past couple of years was to enable them to run ack.

Moreover, pss comes with a terminal-color library built in, so, unlike ack, it doesn’t require installing any additional modules to color its output nicely on Windows (ack requires Win32::Console::ANSI).

I have some ideas for extending pss with extra features, and wanted to be able to do that in Python, without having to dust off my Perl skills. Other Pythonistas may find pss attractive for the same reason. pss is implemented in a very modular manner – the main script is just a thin wrapper over a library which can be used programmatically for a variety of purposes. In other words, pss is quite hackable.

Finally, pss just seemed like a cool project to do. Its existence is not meant to detract from ack in any way. I’ve been using and enjoying ack for many years – thanks to ack’s author Andy Lester for that!


38 Responses to “Announcing pss: a tool for searching inside source code”

  1. ori Says:
    find . -regex ".*\.\(cpp\|h\|hh\)" ! -path '*/.svn/*' -exec grep include {} +
  2. Tyberius Prime Says:

    I’ve been using grin (http://pypi.python.org/pypi/grin) for just this purpose – seems there could be some useful cross pollination of the two projects possible.

  3. eliben Says:

    ori,

    Ouch, yeah, something like that. My wrists ache just from looking at that line ;-)

    Tyberius,

    I was actually looking at grin prior to starting this project. It has some different design goals. I do want ack-like behavior of just knowing which file extensions to look for, given a “type”. grin is closer to vanilla grep in this respect.

  4. ripper234 Says:

    Well, I guess I fall under your newbie category, because I always use my IDE’s “Find in files” / “Find in project”. For Visual Studio and IntelliJ IDEA those find tools have always worked wonderfully for me.

  5. eliben Says:

    Ron,

    Yay, the flame-bait worked! :-)

    Seriously, an IDE only solves a part of the problem. I can think of two main reasons:

    1. No single IDE will help when you have to work with many languages & file types in a single day (e.g. C++, Python, JavaScript, XML and various textual config files)
    2. When working from the command-line, it’s a shame to leave it in order to go to an IDE. A tool integrated into the command-line mixes in harmoniously with the working flow. Not that I want this to become an IDE vs. command-line war…

    When working mostly on a single large project written in the same environment (e.g. Visual Studio for C++ code), I also use the Find in Files dialog because it’s integrated into the environment. pss integrates into a command-line environment.

  6. Julien Oster Says:

    Sounds neat. I mostly use GNU global (and its emacs integration), though it has a different focus. It’s great for navigating through code, but not really meant for full searches. For that I use one of those find/grep/xargs-contraptions you mentioned, sitting in my history and getting bigger over time. Next time I’ll try yours instead.

  7. Pekka Klärck Says:

    A colleague just introduced me to ack and having a Python powered version that I can easily install to Windows when needed sounds great. Unfortunately sudo pip install pss failed for me on Ubuntu with the following error:

    Running setup.py install for pss
    error: file ‘/path/build/pss/scripts/pss’ does not exist

  8. eliben Says:

    Pekka

    Yes, I’m aware of this problem, which AFAIK happens only on Python 2.6 because of a known pip issue. I’m working on a fix, but installation from source should work fine in the meantime.

    Update 15:37: it has now been fixed (in version 0.32) – pip installation should now work fine for Python 2.6

  9. Pekka Klärck Says:

    Fix confirmed also on my system. Thanks!

    Seems to work great. Adding a few examples to --help might be a good idea, though.

  10. eliben Says:

    Pekka,

    Thanks for checking!

    Examples are available here: https://bitbucket.org/eliben/pss/wiki/Usage (this page is linked from the README). I prefer to keep the --help as concise as possible to serve as a reference for users after they gain the initial experience with the tool.

  11. Etienne Says:

    Looks nice!

    Unfortunately my first test was not too successful :-(

    > pss --python render(
    -bash: syntax error near unexpected token `(' # oh, yes, that's normal, my mistake!
    
    > pss --python render\(
    Traceback (most recent call last):
      File "/usr/local/bin/pss", line 2, in <module>
        from psslib.pss import main; main()
      File "/Library/Python/2.6/site-packages/psslib/pss.py", line 95, in main
        ncontext_after=ncontext_after)
      File "/Library/Python/2.6/site-packages/psslib/driver.py", line 168, in pss_run
        max_match_count=max_match_count)
      File "/Library/Python/2.6/site-packages/psslib/contentmatcher.py", line 54, in __init__
        literal_pattern=literal_pattern)
      File "/Library/Python/2.6/site-packages/psslib/contentmatcher.py", line 99, in _create_regex
        regex = re.compile(pattern, re.I if ignore_case else 0)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/re.py", line 188, in compile
        return _compile(pattern, flags)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/re.py", line 243, in _compile
        raise error, v # invalid expression
    sre_constants.error: unbalanced parenthesis
    
    > pss --python "render("
    Traceback (most recent call last):
      File "/usr/local/bin/pss", line 2, in <module>
        from psslib.pss import main; main()
      File "/Library/Python/2.6/site-packages/psslib/pss.py", line 95, in main
        ncontext_after=ncontext_after)
      File "/Library/Python/2.6/site-packages/psslib/driver.py", line 168, in pss_run
        max_match_count=max_match_count)
      File "/Library/Python/2.6/site-packages/psslib/contentmatcher.py", line 54, in __init__
        literal_pattern=literal_pattern)
      File "/Library/Python/2.6/site-packages/psslib/contentmatcher.py", line 99, in _create_regex
        regex = re.compile(pattern, re.I if ignore_case else 0)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/re.py", line 188, in compile
        return _compile(pattern, flags)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/re.py", line 243, in _compile
        raise error, v # invalid expression
    sre_constants.error: unbalanced parenthesis
    
    > pss --python "render\("
    It works!
    
    > pss --python -Q "render("
    It works!
    
    > pss --python render\\\(
    It also works!

    So it means that the search term is always treated as a regex. That can complicate things sometimes: most of the time I’m searching only for normal “words”, not regexes.

    (Actually, most of the time I’m using TextWrangler/BBEdit to do this kind of search, simple and powerful and I can open the file directly.)
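The tracebacks come straight from Python's `re.compile`: an unescaped `(` opens a group that is never closed, so the pattern is rejected before any file is read. A literal-search switch such as `-Q` can sidestep this by escaping the pattern first. The sketch below assumes that approach via `re.escape`; the traceback shows pss's `_create_regex` receiving a `literal_pattern` argument, but its body isn't shown here, and `compile_pattern` is a hypothetical name.

```python
import re


def compile_pattern(pattern, literal=False, ignore_case=False):
    """Compile a search pattern.

    With literal=True, regex metacharacters such as '(' are escaped so the
    text is matched verbatim, which is what a -Q style flag asks for.
    """
    if literal:
        pattern = re.escape(pattern)
    return re.compile(pattern, re.I if ignore_case else 0)
```

`compile_pattern("render(")` raises `re.error`, whereas `compile_pattern("render(", literal=True)` and `compile_pattern(r"render\(")` both match `self.render(x)`.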

  12. Andy Lester Says:

    That’s great, Eli. I’m glad you like ack, and I’m glad that you made your own tool to scratch your own itch.

    My plan for http://betterthangrep.com/ is to have all sorts of different tools that are better than grep, not just ack. When that time comes, I’d be glad to have pss on there.

  13. Andy Lester Says:

    ori: Sure, you CAN write out a find/grep pipeline. No one is doubting your Unix chops. But why would you want to?

  14. roy_hu Says:

    Besides being Windows friendly, what benefits do I get over ack on Linux/Mac?

  15. Nick Says:

    Cool to see a python replacement, and cool to see a windows installation-friendly focus. But I think ack’s ability to run as a single file with no “installation” on unix is a killer feature that pss should emulate.

    I suppose you can’t get around requiring an entire hg checkout for the script, but you should make sure it is possible to run the utility directly from the checkout, without requiring any installation. Custom system-wide pylib installations suck on systems that have package managers, and user-directory pylib installations are a pain if you use virtualenvs. I prefer to just have a src checkout of a utility and then symlink the main script into my ~/bin folder.

    But pss fails when run without installation, because the script can’t find psslib when it’s run out of the checkout. I think all you have to do is move the script out of the ‘scripts’ subfolder to the root, and then the script will be able to import psslib without requiring psslib be installed to a site-directory.

    Then people who don’t want to do a permanent install of pss can just do a hg checkout and add a softlink from ~/bin/pss to ~/utils/pss/pss (or wherever they stick the checkout). It’s also super easy to keep up to date then, just a single hg pull -u command.

  16. eliben Says:

    Etienne,

    Yes, the search pattern is a regex by default, and as you discovered, the -Q flag is the cure when all you want is a simple string. I think the case you present is rare enough not to significantly affect the choice of regex as the default – after all, if a literal string were the default, a flag would have to be passed each time a regex is required, and that would quickly become tiresome.

    roy_hu,

    There’s nothing more to it than I explained in the post itself, really. If you’re a Python programmer, it may appeal to you to have the tool written in Python thus lending itself to deeper understanding and modification. If you don’t care about that (which is totally legitimate), ack is a fine tool and does the work for you.

  17. eliben Says:

    Nick,

    That’s easily solvable! All you have to do is run:

    > PYTHONPATH=/path/to/pss/root /path/to/pss/script <args>

    I actually have this attached to an alias pss, so whenever I run pss on my box it always uses the latest source version.

  18. Leigh Says:

    Can you put in an --emacs or -n flag that makes it output
    filename:lineno:…matching line…

    That’s the same format used by ‘grep -n’, ack and ‘ant -emacs’. Then it will work with Emacs.
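The requested layout is just the file name, the 1-based line number and the line text joined by colons, which is what Emacs' grep and compilation modes parse to jump to a match. A sketch, with hypothetical helper names that are not part of pss:

```python
import re


def format_match(filename, lineno, line):
    """Render one match as `filename:lineno:line`, the `grep -n` layout."""
    return "%s:%d:%s" % (filename, lineno, line.rstrip("\n"))


def grep_n_lines(filename, lines, pattern):
    """Yield a `grep -n` style entry for every line in `lines` matching `pattern`."""
    regex = re.compile(pattern)
    for lineno, line in enumerate(lines, start=1):
        if regex.search(line):
            yield format_match(filename, lineno, line)
```

For example, `grep_n_lines("a.py", ["x = 1\n", "def render(self):\n"], "render")` yields the single entry `a.py:2:def render(self):`.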

  19. Pekka Klärck Says:

    Noticed two limitations:

    - There’s no -l switch like in ack and grep.
    - Colors should be turned off if sys.stdout.isatty() is False (like ack and grep do)

    At least the latter was easy to implement:
    https://bitbucket.org/eliben/pss/pull-request/2/highlight-colors-by-default-only-when
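The check Pekka describes is essentially a one-liner around `isatty()`. The sketch below additionally assumes that an explicit override should win over auto-detection, as a `--color`/`--nocolor` style option would (whether pss offers such flags is not covered here). `should_use_color` is a hypothetical name; the actual change is the one in the linked pull request.

```python
import sys


def should_use_color(stream=None, force=None):
    """Decide whether to emit ANSI color codes for `stream` (default: stdout).

    An explicit `force` of True or False wins; otherwise color is enabled
    only when the stream is attached to a terminal, so output piped into
    another command or redirected to a file stays free of escape codes.
    """
    if force is not None:
        return force
    if stream is None:
        stream = sys.stdout
    return hasattr(stream, "isatty") and stream.isatty()
```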

  20. Jack Diederich Says:

    alias ag='ack-grep --python --ignore-dir=migrations'

  21. eliben Says:

    Leigh and Pekka,

    It would be great if you could open Issues on the pss Bitbucket page (https://bitbucket.org/eliben/pss/issues) about these feature requests. Thanks in advance.

    Jack,

    Could you please elaborate? The context of your comment is unclear.

  22. Nick Coghlan Says:

    Eli, you may want to look into including a “__main__.py” file in the top level directory, and potentially even publishing a drop in executable “pss” zip file.

    Very cool idea, though.
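The standard library's `zipapp` module (added in Python 3.5, so it postdates this discussion) automates both halves of this suggestion: given an entry point, it writes the `__main__.py` and bundles the package into a single file that `python tool.pyz` can run. The sketch builds a throwaway package named `demo` as a stand-in; for pss the entry point would presumably be `psslib.pss:main`, matching the `from psslib.pss import main; main()` line in the traceback above, but that is an assumption rather than something confirmed here.

```python
import os
import zipapp


def build_demo_zipapp(workdir):
    """Build a runnable demo.pyz from a placeholder package; return its path."""
    src = os.path.join(workdir, "src")
    pkg = os.path.join(src, "demo")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("")
    with open(os.path.join(pkg, "cli.py"), "w") as f:
        f.write("def main():\n    print('hello from demo')\n")
    target = os.path.join(workdir, "demo.pyz")
    # main= makes zipapp generate a __main__.py that calls demo.cli.main();
    # interpreter= adds a shebang line so the archive can be run directly on Unix.
    zipapp.create_archive(
        src, target, interpreter="/usr/bin/env python3", main="demo.cli:main"
    )
    return target
```

Running `python demo.pyz` (or `./demo.pyz` on Unix) then prints `hello from demo`. The result is a single file to drop into `~/bin`, which would also suit Nick's earlier wish to avoid a site-wide install.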

  23. u Says:

    It does not apply to all projects, but git grep is another solution.

  24. Adam Says:

    Ummm… eclipse solves this pretty well :)

  25. NNM Says:

    I’ve been thinking of making such a program for a LONG time. Glad to see there is some interest in it.
    Some requirements: Lightweight, NO indexing, NO background services, Open Source, NO copyright…
    I always assumed Microsoft would add that feature to VS, but it hasn’t, as far as I know (and my knowledge only extends to VS2008 SP1).
    Will probably make it some day, code will be available on “the code project”.

  26. Jon Says:

    cscope (mostly) does this too. It dates all the way back to the 80s at Bell Labs.

    You put a cool spin on the idea though. Good work!

  27. James Laing Says:

    For anyone who works under Windows, you might like to try Pipelines: http://www.tenfiftytwo.co.uk/pipelines.

  28. Slinky Says:

    Exuberant Ctags, anyone?

    You can integrate it with Vim, too…

  29. eliben Says:

    Jon and Slinky,

    ctags and cscope are great tools, and I actually use them both (integrated into Vim). But pss doesn’t really compete with them directly; rather, it co-exists with them.

  30. Nemesis Fixx Says:

    c(tags|scope) these are the tools from God himself!

  31. Timo Says:

    git grep ftw

  32. Hiankun Says:

    I’ve just tried pss with my cpp source code. That’s easy and helpful.
    Thank you for the great work.

  33. eliben Says:

    u and Timo,

    git grep appears to be useful for Git-based repositories, but obviously it’s not a general purpose solution. Besides, pss has more features :)

  34. dhunter Says:

    That’s a really nice tool, kudos!!

  35. bryane Says:

    Okay, I’m late to the game. The shortest script is “grep -r ” – recursive grep. No need for find or xargs.

  36. eliben Says:

    bryane,

    And how do you ignore certain directories you’re not interested in?

  37. Olivier Says:

    For emacs users, there are a lot of integrated alternatives. I took (once) the time to install grep-o-matic http://www.emacswiki.org/emacs/GrepMode#toc16 and it works just great. Searching a keyword in a repository is now only a shortcut away.

  38. eliben Says:

    Olivier,

    I agree editor-integrated search tools are sometimes more convenient, but not always – occasionally you’re just working in the terminal. Besides, creating an Emacs plugin for pss shouldn’t be hard – I think someone is already working on one for Vim.
