Bootstrapping virtualenv

April 20th, 2013 at 5:18 am

The packaging situation in Python is "imperfect" for a good reason – packaging is simply a very difficult problem to solve (see the amount of effort poured into Linux distribution package management for reference). One of the core issues is that project X may require version V of library L, and when you come to install project Y it may refuse to work with that version and require a newer one, with which project X can’t work. So you’re in an impasse.

The solution many Python programmers and projects have adopted is to use virtualenv. If you haven’t heard about virtualenv, you’re missing out – go read about it now.

I’m not going to write a tutorial about virtualenv or extol its virtues here – enough bits have been spilled about this on the net already. What I plan to do is share an interesting problem I ran into and the solution I settled on.

I had to install some packages (Sphinx and related tools) on a new machine into a virtualenv. But the machine only had a basic Python installation, without setuptools or distribute, and without virtualenv. These aren’t hard to install, but I wondered if there’s an easy way to avoid installing anything. Turns out there is.

The idea is to create a "bootstrap" virtual environment that would have all the required tools to create additional virtual environments. It turns out to be quite easy with the following script (inspired by the answer in this SO discussion):

import sys
import subprocess

VENV_VERSION = '1.9.1'
PYPI_VENV_BASE = 'http://pypi.python.org/packages/source/v/virtualenv'
PYTHON = 'python2'
INITIAL_ENV = 'py-env0'

def shellcmd(cmd, echo=True):
    """ Run 'cmd' in the shell and return its standard out.
    """
    if echo: print '[cmd] {0}'.format(cmd)
    out = subprocess.check_output(cmd, stderr=sys.stderr, shell=True)
    if echo: print out
    return out

dirname = 'virtualenv-' + VENV_VERSION
tgz_file = dirname + '.tar.gz'

# Fetch virtualenv from PyPI
venv_url = PYPI_VENV_BASE + '/' + tgz_file
shellcmd('curl -O {0}'.format(venv_url))

# Untar
shellcmd('tar xzf {0}'.format(tgz_file))

# Create the initial env
shellcmd('{0} {1}/virtualenv.py {2}'.format(PYTHON, dirname, INITIAL_ENV))

# Install the virtualenv package itself into the initial env
shellcmd('{0}/bin/pip install {1}'.format(INITIAL_ENV, tgz_file))

# Cleanup
shellcmd('rm -rf {0} {1}'.format(dirname, tgz_file))

The script downloads and unpacks a recent virtualenv (substitute your desired version in VENV_VERSION) from PyPI and uses it directly (without installing) to create a new virtual env. By default, virtualenv will install setuptools and pip into this environment. Then, the script also installs virtualenv into the same environment. This is the bootstrap part.

Voila! py-env0 (or whatever you substituted in INITIAL_ENV) is now a self-contained virtual environment with all the tools you need to create new environments and install stuff into them.
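
As an illustration of what "create new environments and install stuff into them" could look like in practice, here is a minimal sketch. The helper names `new_env_commands` and `make_env` and the `docs-env` name are made up for this example; it assumes a POSIX layout where the bootstrap environment has a `bin/virtualenv` script and every new environment gets its own `bin/pip`.

```python
import subprocess

INITIAL_ENV = 'py-env0'  # the bootstrap environment created by the script above


def new_env_commands(env_name, packages, bootstrap_env=INITIAL_ENV):
    """Build the two commands: create env_name, then install packages into it.

    Hypothetical helper; assumes POSIX-style bin/ directories.
    """
    create = ['{0}/bin/virtualenv'.format(bootstrap_env), env_name]
    install = ['{0}/bin/pip'.format(env_name), 'install'] + list(packages)
    return [create, install]


def make_env(env_name, packages, bootstrap_env=INITIAL_ENV):
    """Run the commands one by one, echoing each like shellcmd does."""
    for cmd in new_env_commands(env_name, packages, bootstrap_env):
        print('[cmd] ' + ' '.join(cmd))
        subprocess.check_call(cmd)
```

Calling `make_env('docs-env', ['Sphinx'])` would then run `py-env0/bin/virtualenv docs-env` followed by `docs-env/bin/pip install Sphinx`.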

This script is for Python 2 but can be trivially adapted for Python 3. In Python 3, the situation is actually more interesting. Python 3.3 (which is really the one you ought to be using if you’ve switched to 3 already) comes with a virtualenv equivalent in the standard library (the venv module), so downloading and installing virtualenv is not required.

That said, venv in Python 3.3 will not install setuptools and pip into the environments it creates. So YMMV here: if you need setuptools and pip there, go with a variation of the script above. If not, you don’t need anything special; just use python3.3 -m venv.
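
The same thing can also be done from Python code through the standard library's `venv` module. The following is a minimal sketch for Python 3.3 or later; the function name `create_plain_env` is made up for this example. Called with default arguments, `venv.create` produces a bare environment without pip, which matches the behaviour described above (later Python 3 releases added a `with_pip` option).

```python
import os
import venv


def create_plain_env(env_dir):
    """Create a bare virtual environment in env_dir using the stdlib only.

    Hypothetical helper. With default arguments venv.create() does not
    install pip into the new environment.
    """
    venv.create(env_dir)
    # Every environment made by venv has a pyvenv.cfg file at its root.
    return os.path.join(env_dir, 'pyvenv.cfg')
```

Checking for `pyvenv.cfg` is just a cheap way to confirm that the environment was actually created.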

P.S. The packaging situation is getting better though. There was a lot of focus during the recent PyCon on this. One of the interesting announcements was that distribute is merging back into setuptools.

15 Responses to “Bootstrapping virtualenv”

  1. xiscu Says:

    Hi Eli,
    by clicking on the link in "…adopted is to use virtualenv…" one gets redirected to:
    http://eli.thegreenplace.net/2013/04/20/bootstrapping-virtualenv/www.virtualenv.org/
    (a 404)

  2. eliben Says:

    @xiscu: fixed, thank you.

  3. John Says:

    Why write something like this in python instead of bash?

  4. eliben Says:

    @John,

    I dislike bash scripting for many reasons. Mostly because the language is awful, and non-trivial scripts get unreadable. And there’s an unwritten rule that every bash script starts small and grows to be much larger with time. Instead of rewriting it in Python when it gets too hard to understand, I just start with Python in the first place. Note that given some support functions like shellcmd above, it’s not really longer than a corresponding bash script.

  5. Piotr Dobrogost Says:

    What do you think about J.F. Sebastian’s solution at http://stackoverflow.com/a/12946537/95735 ?

  6. niluje Says:

    There are issues with virtualenv that are difficult to solve.

    - updates. This is not really related to virtualenv but a problem between pip and the distribution packaging system. I use Debian a lot in production, where Python packages are almost always outdated (when they exist). I only see two ways to get more up-to-date packages: deploy everything in a virtualenv (and upgrades must be done by someone who knows the Python ecosystem) or package everything in Debian format (a lot of work, as every project I create uses a lot of different modules).
    - install on production servers. At my company, our servers don’t have access to the internet. There’s nothing unnecessary on them, which means there’s no compiler. I could create a virtualenv on a build machine and transfer the whole virtualenv to the production server, but it wouldn’t work because there are hardcoded paths in a virtualenv. Of course, the workaround is to create the virtualenv at exactly the same location on both machines.

    Have you ever tried buildout (http://www.buildout.org/)? I talked on IRC with a guy a few months ago, and he told me that the fact that I don’t have a compiler on the production server could be solved (not easily though) using it.
    Or maybe you know of another option?

    Regards,

  7. eliben Says:

    @Piotr: Looks very similar to my code. Anything special I should be seeing there?

    @niluje: I actually think that distributing projects within virtualenvs with all the dependencies setup is a great solution. The --relocatable option helps with the renaming / moving to another directory problem.

  8. Christian Heimes Says:

    I have some improvement proposals for your script:

    * use https
    * include and verify MD5 sum of virtualenv package
    * use sys.executable instead of a hard coded Python interpreter
    * unpack files into a tempfile.mkdtemp() directory and remove it with shutil.rmtree() afterwards
    * don’t use curl, use Python’s urllib
    * don’t use the tar command, use Python’s tarfile module
    * don’t run subprocesses with shell=True, use a list of arguments instead
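
For reference, here is a minimal Python 3 sketch of what most of these suggestions could look like when combined. It is not code from the post or from the commenter. All function names (`fetch`, `sha256_of`, `fetch_and_unpack`, `create_env`, `bootstrap`) are made up for illustration, SHA-256 is used here in place of the MD5 sum mentioned above, and the caller is assumed to supply the expected digest from a trusted source. Using an https URL is up to the caller.

```python
import hashlib
import os
import shutil
import subprocess
import sys
import tarfile
import tempfile
import urllib.request


def fetch(url, dest):
    """Download url to the file dest with urllib instead of curl."""
    with urllib.request.urlopen(url) as resp, open(dest, 'wb') as out:
        shutil.copyfileobj(resp, out)


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_and_unpack(url, expected_sha256, workdir):
    """Download a .tar.gz, verify its checksum, and unpack it into workdir."""
    tgz_path = os.path.join(workdir, 'download.tar.gz')
    fetch(url, tgz_path)
    if sha256_of(tgz_path) != expected_sha256:
        raise ValueError('checksum mismatch for {0}'.format(url))
    # The tarfile module replaces the external tar command.
    with tarfile.open(tgz_path, 'r:gz') as tar:
        tar.extractall(workdir)
    return tgz_path


def create_env(virtualenv_script, env_dir):
    """Run virtualenv.py with the current interpreter, without a shell."""
    subprocess.check_call([sys.executable, virtualenv_script, env_dir])


def bootstrap(url, expected_sha256, unpacked_dirname, env_dir):
    """Hypothetical driver: work in a temp directory and always clean it up."""
    workdir = tempfile.mkdtemp()
    try:
        tgz_path = fetch_and_unpack(url, expected_sha256, workdir)
        script = os.path.join(workdir, unpacked_dirname, 'virtualenv.py')
        create_env(script, env_dir)
        pip = os.path.join(env_dir, 'bin', 'pip')  # assumes a POSIX layout
        subprocess.check_call([pip, 'install', tgz_path])
    finally:
        shutil.rmtree(workdir)
```

The trade-off is visible in the length: this version no longer depends on curl, tar or a shell, but it no longer reads as a plain list of command-line invocations either.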

  9. Tom Lynn Says:

    I wrote a similar script for this, sandboxer.py , and set up a tinyurl.com forwarder. This means you can set up a virtualenv with e.g. Sphinx and sphinx-pyreverse (and their dependencies) with the one liner:

    python -m urllib http://tinyurl.com/sandboxer | python - Sphinx sphinx-pyreverse

    This creates a ./sandbox virtualenv directory with the packages installed (add “-s DIRNAME” to use a different location).

    Downsides of this approach: it doesn’t track the latest virtualenv, the URL forwarder points at the latest dev version of the script so is no good for repeatability, and “python -m urllib” is a poor replacement for curl (no HTTPS, poor error handling). Most of these can be fixed by downloading sandboxer.py or using a particular version of it (either by full URL or via another URL shortener). Significant missing features: support for using local pypi mirrors and --relocatable — I accept pull requests :-) .

  10. eliben Says:

    @Christian: I kind-of like this intermediate step in bash-compatibility. I think there’s a certain simple charm in just a list of command-line invocations. Sure I could rewrite it all with Python modules but it would look very different then. One significant advantage of writing bash-like code for scripts like this is that you can easily test each command in isolation from the command-line. But I guess YMMV. For a “real” program, I would definitely do most of the things you suggest.

    As for HTTPS & MD5, that’s just pure paranoia of a security-focused mind ;-)

  11. Frank Says:

    Personally, I prefer virtualenv-burrito: https://github.com/brainsik/virtualenv-burrito

    It allows for simple bootstrapping of virtualenv & virtualenvwrapper (I never use the former without the latter)

    just do:

    curl -s https://raw.github.com/brainsik/virtualenv-burrito/master/virtualenv-burrito.sh | $SHELL

    in your shell, and you’re good to go.

  12. gadabout Says:

    I’ve never understood why Pythonistas never evaluate the 0install solution to packaging issues.

  13. Martin Says:

    @gadabout Hey, I didn’t know about 0install at all. At first sight I thought it looked cool, but after reading some of the getting-started material, and being used to the slickness of a PKGBUILD (Arch Linux), I really dislike the XML metadata file. Otherwise it looks good.

  14. Leibovich Says:

    @Eli, why didn’t you build a virtualenv on a single machine, make sure it’s relocatable by editing a few scripts, and simply rsync it to all the other computers?

    This is similar to the Go-deploy-with-rsync procedure, a characteristic that makes Go really appealing IMHO.

  15. Piotr Dobrogost Says:

    It would be interesting to use virtualenv right from the zip, without unpacking it first.
