Some notes on logging and SSH access from cron jobs

In the process of making the semi-official CPython mirror on GitHub auto-update, I ventured into cron-land; it's a land I've hardly been to before, so here's a quick blog post describing some of the interesting things I learned. This was written for Ubuntu 12.04, but should apply with minimal changes to any Linux.

The basic stuff: crontab -e to edit your crontab, crontab -l to dump it to stdout.

If you're wondering which tasks cron ran recently, look in /var/log/syslog.

A common problem that comes up when writing crontabs is that the environment the cron jobs are executed in is different from your normal environment. They will run under your username, but without most of the environment variables that shape your usual terminal session. A good way to see what kind of environment cron has when it runs your jobs is to add this rule:

*/1 * * * * env > /tmp/my-cronenv

This tells cron to dump its environment to /tmp/my-cronenv every minute. Once you have a my-cronenv file, you can reproduce running your jobs in cron's environment by running them as:

$ env - `cat /tmp/my-cronenv` <the script>
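If you'd rather drive this from Python, here is a minimal sketch of the same idea: read the KEY=VALUE lines that env wrote and run a command with only those variables set. The function names load_env_file and run_in_cron_env are made up for this example, and it assumes no variable has a value spanning several lines.

```python
import subprocess


def load_env_file(path):
    """Parse the KEY=VALUE lines written by `env` into a dict."""
    env = {}
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            # Skip blank or malformed lines; a value may itself contain '='.
            if "=" not in line:
                continue
            key, value = line.split("=", 1)
            env[key] = value
    return env


def run_in_cron_env(command, env_path="/tmp/my-cronenv"):
    """Run `command` (a list of arguments) with only cron's variables set."""
    return subprocess.run(command, env=load_env_file(env_path))


# Example (hypothetical script path):
#   run_in_cron_env(["/home/foobar/bin/update-mirror.sh"])
```

Passing env= replaces the inherited environment entirely rather than adding to it, which is what makes this comparable to env - on the command line.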

Another common question that comes up is "how to do logging from my cron jobs?". The mechanics of logging itself depend on the language the script is written in, of course. For Python there's the logging package. But where to store those logs? If you want your logs to be where all the cool kids' logs are, that would be /var/log. But you usually don't have non-sudo permissions in that directory. So do this, replacing foobar with your username:

$ sudo mkdir /var/log/foobar_logs
$ sudo chown foobar /var/log/foobar_logs/

From now on, you're free to create new files and edit existing ones in /var/log/foobar_logs.
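For a Python cron job, writing to that directory with the logging package could look roughly like this. It's only a sketch: make_cron_logger and the log file naming are my own choices for illustration, and the default directory assumes you created /var/log/foobar_logs as shown above.

```python
import logging
import os


def make_cron_logger(name, log_dir="/var/log/foobar_logs"):
    """Return a logger that appends timestamped lines to <log_dir>/<name>.log."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Avoid attaching a second handler if this is called more than once.
    if not logger.handlers:
        handler = logging.FileHandler(os.path.join(log_dir, name + ".log"))
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


# Example (hypothetical job name):
#   log = make_cron_logger("mirror_update")
#   log.info("sync started")
```

FileHandler opens the file in append mode by default, so each cron run adds to the same log instead of overwriting it.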

A hairier problem exists with SSH. Suppose that you want your cron job to log into some remote server (whether for Git access, scp, rsync, or remote command execution) for which you've diligently set up a public/private key pair. And you even went as far as to run ssh-agent on your local machine to avoid entering that pesky private key passphrase every time (you do use a passphrase for your secret key, right?) How do you make sure that your cron jobs have proper access to ssh-agent and don't need the passphrase?

There are a number of ways to go about this, but I found this walkthrough using keychain effective.

First, install the keychain program. Second, add this to your ~/.bash_profile (we don't need this to run for every terminal, just on login):

# Use keychain to keep ssh-agent information available in a file
/usr/bin/keychain $HOME/.ssh/id_rsa
source $HOME/.keychain/${HOSTNAME}-sh

Tweak as needed for the location of your private SSH keys. Also, make sure your .bash_profile is actually invoked at start-up. When logging into Ubuntu graphically, this may not be the case unless it's sourced in .profile.

Third, add this to the cron job script (if your cron job is a Python program, just wrap it in a shell script):

source $HOME/.keychain/${HOSTNAME}-sh
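If you'd rather skip the wrapper, a Python job can in principle read the same file itself. This is a different route from the shell wrapper, and it rests on an assumption: that the keychain file consists of lines like SSH_AUTH_SOCK=...; export SSH_AUTH_SOCK; (check your own file first). The name load_keychain_env is made up for this sketch, and the default path uses socket.gethostname() on the guess that it matches ${HOSTNAME}.

```python
import os
import re
import socket

# Assumed format: "SSH_AUTH_SOCK=/tmp/ssh-XXXX/agent.123; export SSH_AUTH_SOCK;"
_ASSIGNMENT = re.compile(r"\b(SSH_AUTH_SOCK|SSH_AGENT_PID)=([^;\n]*)")


def load_keychain_env(path=None):
    """Copy the ssh-agent variables from keychain's file into os.environ."""
    if path is None:
        # Assumption: the Python hostname matches bash's ${HOSTNAME}.
        path = os.path.join(os.path.expanduser("~"), ".keychain",
                            socket.gethostname() + "-sh")
    with open(path) as f:
        found = dict(_ASSIGNMENT.findall(f.read()))
    # Child processes (ssh, git, rsync) inherit these variables.
    os.environ.update(found)
    return found
```

Call load_keychain_env() once at the top of the job, before spawning anything that uses SSH.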

That's all.