A problem that sometimes comes up with source-controlled code is finding the revision in which some line was deleted, or otherwise modified in a way that blame can't decipher. In other words, we want to grep over all revisions of a file to learn which of them contain a certain pattern. Note that the goal is not to search the commit log (which is trivial), but the code itself.

Well, if you're using Mercurial or Git, you're lucky because both provide built-in methods for doing this.

With Mercurial, use hg grep.

With Git, you can either use git grep in conjunction with git rev-list, or git log -S (more details in this SO thread).

What about Subversion, though? To the best of my knowledge, SVN does not have this functionality built in. Moreover, SVN's design makes the task inherently slow. Unless the repository is local, your working copy holds only the checked-out revision of each file, so the contents of every other revision have to be requested from the server, one revision at a time. That's a lot of network traffic.

That said, if you're willing to tolerate the slowness (and sometimes there's no choice!), the following script, svnrevgrep, makes it as simple as with Git or Mercurial:

import re
import subprocess
import sys


def run_command(cmd):
    """ Run a command given as a list of arguments; return its stdout output.
        Passing a list (rather than splitting a string) keeps paths that
        contain spaces intact.
    """
    return subprocess.check_output(cmd, universal_newlines=True)


def svnrevgrep(filename, s):
    """ Go over all revisions of filename, checking if the regex s can be
        found in them.
    """
    log = run_command(['svn', 'log', filename])
    # Match only svn log's header lines ("r123 | author | date | ..."), so a
    # revision mentioned inside a commit message isn't treated as one.
    for rev in re.findall(r'^r(\d+) \|', log, flags=re.MULTILINE):
        contents = run_command(['svn', 'cat', '-r', rev, filename])
        print('r%s: %s' % (rev, 'found' if re.search(s, contents)
                                       else 'not found'))


if __name__ == '__main__':
    if len(sys.argv) != 3:
        sys.exit('Usage: %s <path> <regex>' % sys.argv[0])
    svnrevgrep(sys.argv[1], sys.argv[2])

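The script takes the revisions in which the file changed from svn log, which lists them newest first. It then fetches the file's contents at each of those revisions with svn cat and reports whether the pattern is found.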
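It is worth seeing why the script matches only the header lines of svn log. The sketch below runs the same regex over a hand-written sample of svn log output; the sample entries and the helper names are made up for illustration. Only the "rN | author | date | N lines" header layout reflects the real svn log format. A naive r\d+ search also picks up the r1200 mentioned in a commit message. In that case the script would fetch r1200 twice, and if a message mentioned a revision in which the file didn't exist yet, svn cat would fail.

```python
import re

# Hand-written sample of `svn log` output (authors, dates and messages are
# made up). Each entry starts with a header line "rN | author | date | N lines".
SAMPLE_LOG = """\
------------------------------------------------------------------------
r1205 | alice | 2012-03-04 10:00:00 -0800 (Sun, 04 Mar 2012) | 1 line

Revert the change from r1200, it broke the build
------------------------------------------------------------------------
r1200 | bob | 2012-03-01 09:30:00 -0800 (Thu, 01 Mar 2012) | 1 line

Refactor the parser
------------------------------------------------------------------------
"""


def parse_log_revisions(log):
    """Return revision numbers (as strings) taken from header lines only."""
    return re.findall(r'^r(\d+) \|', log, flags=re.MULTILINE)


def naive_revisions(log):
    """The fragile approach: any 'r' followed by digits, anywhere in the log."""
    return [m.lstrip('r') for m in re.findall(r'r\d+', log)]


if __name__ == '__main__':
    print(parse_log_revisions(SAMPLE_LOG))  # ['1205', '1200']
    print(naive_revisions(SAMPLE_LOG))      # ['1205', '1200', '1200']
```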

Note that while one could imagine using binary search to find the first revision in which the regex appears (or disappears), this won't work in the general case. Binary search assumes that once the pattern appears it stays, but code is sometimes added, then deleted, then re-added and deleted again (this happens when refactoring or when reverting problematic commits).
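A toy example makes the non-monotonic history concrete. The sketch below takes per-revision results ordered oldest first, i.e. svnrevgrep's found / not found lines in reverse. It lists every revision where the pattern's presence flips. The function name find_transitions and the revision data are hypothetical. Binary search only works when there is exactly one such flip. Here there are four, and a bisection that happens to probe r14 first sees "not found" and never looks at r10.

```python
def find_transitions(results):
    """Given (revision, found) pairs ordered oldest first, return
    (revision, 'added' or 'removed') for every revision where the pattern's
    presence differs from the previous revision."""
    transitions = []
    previous = False  # assume the pattern is absent before the first revision
    for rev, found in results:
        if found != previous:
            transitions.append((rev, 'added' if found else 'removed'))
        previous = found
    return transitions


# Hypothetical history: added in r10, deleted in r14, re-added in r20,
# deleted again in r25.
history = [(3, False), (10, True), (12, True), (14, False),
           (20, True), (25, False), (30, False)]
print(find_transitions(history))
# [(10, 'added'), (14, 'removed'), (20, 'added'), (25, 'removed')]
```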

Finally, if you find yourself doing the above frequently for a given repository, you may be better off with:

git svn clone <path>
git grep <...>
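Once you have a Git clone (for example one made with git svn clone), the whole loop runs locally. The sketch below is a hypothetical Git counterpart of svnrevgrep; git_revgrep and its repo parameter are made up for illustration. It uses git rev-list to list the commits that touched the file, newest first, and runs git grep -q on each one. Note that git grep -E uses POSIX extended regular expressions, not Python's re syntax.

```python
import subprocess


def git_revgrep(path, pattern, repo='.'):
    """For every commit that touched path (newest first), report whether the
    POSIX extended regex pattern matches the file's contents in that commit.
    Returns a list of (commit hash, found) pairs."""
    revs = subprocess.check_output(
        ['git', '-C', repo, 'rev-list', 'HEAD', '--', path],
        universal_newlines=True).split()
    results = []
    for rev in revs:
        # git grep -q prints nothing and exits with 0 when there is a match,
        # 1 when there isn't (e.g. also in a commit that deleted the file).
        status = subprocess.run(
            ['git', '-C', repo, 'grep', '-q', '-E', '-e', pattern,
             rev, '--', path]).returncode
        found = status == 0
        results.append((rev, found))
        print('%s: %s' % (rev[:10], 'found' if found else 'not found'))
    return results
```

For example, calling git_revgrep('src/parser.c', 'parse_header') inside the clone (both arguments are hypothetical) prints one found / not found line per commit that touched the file.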
