Programmatically populating a Django database

This is a quick post to demonstrate a very useful way of programmatically populating the models (i.e. database) of a Django application.

The canonical way to accomplish this is with fixtures, using the loaddata and dumpdata commands, but these seem to be more useful when you already have some data in the DB. Alternatively, you could programmatically generate the JSON that loaddata accepts, but this would require following its format exactly (which means observing how real dumps are structured). For the very first entries, one could just laboriously hammer them in through the admin interface. As programmers, however, we have a natural resentment for such methods.
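To give a feel for the "generate the JSON yourself" option, here is a minimal sketch. It assumes a hypothetical Tag model with a single name field, living in an app called blogapp; the helper name build_tag_fixture is made up for illustration. The record layout (model, pk, fields) follows what dumpdata emits by default.

```python
import json


def build_tag_fixture(names, model_label="blogapp.tag"):
    """Return fixture records in the layout dumpdata produces.

    model_label is "<app_label>.<model_name>" in lower case; the value
    used here is an assumption for this sketch.
    """
    return [
        {"model": model_label, "pk": pk, "fields": {"name": name}}
        for pk, name in enumerate(names, start=1)
    ]


# Build two records and print them as indented JSON.
fixture = build_tag_fixture(["Lisp", "Java"])
print(json.dumps(fixture, indent=2))
```

Saved to a file such as blogapp/fixtures/tags.json (a hypothetical name), the output should be loadable with manage.py loaddata tags. Note that the primary keys have to be managed by hand here, which is part of what makes this route tedious.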

Since Django apps are just Python modules, there's a much easier way. The very first chapter of the Django tutorial hints at the approach by test-driving the shell management command, which opens a Python shell in which the application is accessible, so the model classes can be imported and through them data can be both examined and created.

The same tutorial also mentions that you can bypass manage.py by pointing DJANGO_SETTINGS_MODULE to your project's settings and then calling django.setup(). This provides a clue on how the same steps can be done from a script, but in fact there's an even easier way.

There's no need to bypass manage.py, since it's a wonderful convenience wrapper around the Django project administration tools. It can be extended with custom management commands, i.e. your own commands parallel to shell, dumpdata, and so on. Not only does creating such commands give you a very succinct, boilerplate-free way of writing custom management scripts, it also gives you a natural location to house them, per application.

Here's some simple code that adds a couple of tags into a blog-like model. Let's say the application is named blogapp:

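from django.core.management.base import BaseCommand
from blogapp.models import Tag


class Command(BaseCommand):
    # Both strings show up in "manage.py help populate_db". Newer Django
    # releases declare arguments in add_arguments() instead of args.
    args = '<foo bar ...>'
    help = 'our help string comes here'

    def _create_tags(self):
        # Create and save one Tag row per language.
        tlisp = Tag(name='Lisp')
        tlisp.save()

        tjava = Tag(name='Java')
        tjava.save()

    def handle(self, *args, **options):
        # Entry point that manage.py calls when the command is run.
        self._create_tags()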

This code has to be placed in a file within the blogapp/management/commands directory in your project. If that directory doesn't exist, create it. The name of the script is the name of the custom command, so let's call it populate_db.py. Another thing that has to be done is creating __init__.py files in both the management and commands directories, because these have to be Python packages. The directory tree will look like this:

blogapp
├── admin.py
├── __init__.py
├── management
│   ├── commands
│   │   ├── __init__.py
│   │   └── populate_db.py
│   └── __init__.py
├── models.py
... other files
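If you find yourself creating this layout often, the steps above can be sketched as a small helper. This is only an illustration under the assumption that the app directory already exists (or may be created); the function name scaffold_command is hypothetical and not part of Django.

```python
from pathlib import Path


def scaffold_command(app_dir, command_name):
    """Create management/commands under app_dir and return the command file path."""
    commands_dir = Path(app_dir) / "management" / "commands"
    commands_dir.mkdir(parents=True, exist_ok=True)

    # Both management/ and management/commands/ must be Python packages.
    for package in (commands_dir.parent, commands_dir):
        (package / "__init__.py").touch()

    # The file name (minus .py) becomes the name of the management command.
    # touch() leaves an already existing file untouched.
    command_file = commands_dir / (command_name + ".py")
    command_file.touch()
    return command_file
```

Calling scaffold_command("blogapp", "populate_db") from the project root should leave an empty populate_db.py in place, ready to receive the Command class.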

That's it. Now you should be able to invoke this command with:

$ python manage.py populate_db

All the facilities of manage.py are available, such as help:

$ python manage.py help populate_db
Usage: manage.py populate_db [options] <foo bar ...>

our help string comes here

Options:
...

Note how help and args are taken from the Command class we defined. manage.py will also pass custom positional arguments and keyword options to our command, if needed. More details on writing custom management commands are available in this Django howto.
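As a rough illustration of how arguments reach a command: in Django 1.8 and later, a command declares them in an add_arguments(self, parser) method, where parser behaves like a standard argparse parser, and the parsed values arrive in handle() as keyword options. The sketch below mimics that flow with plain argparse so that it runs outside a project. The names argument and the --dry-run flag are hypothetical, and in a real command the first two functions would be methods on Command.

```python
import argparse


def add_arguments(parser):
    # Positional argument (hypothetical): one or more tag names.
    parser.add_argument("names", nargs="+", help="Tag names to create")
    # Optional flag (hypothetical): report what would happen without saving.
    parser.add_argument("--dry-run", action="store_true",
                        help="List the tags instead of saving them")


def handle(*args, **options):
    # Parsed arguments arrive as keyword options, keyed by argument name.
    messages = []
    for name in options["names"]:
        if options["dry_run"]:
            messages.append("would create tag %s" % name)
        else:
            # A real command would save here, e.g. Tag(name=name).save()
            messages.append("creating tag %s" % name)
    return messages


def run(argv):
    # Stand-in for manage.py: build a parser, parse argv, call handle().
    parser = argparse.ArgumentParser(prog="manage.py populate_db")
    add_arguments(parser)
    options = vars(parser.parse_args(argv))
    return handle(**options)
```

With this in place, run(["Lisp", "Java", "--dry-run"]) corresponds to invoking python manage.py populate_db Lisp Java --dry-run on the real command.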

Once you start playing with such a custom data entry script, some of the existing Django management commands may come in very useful. You can see the full list by running manage.py help, but here's a list of those I found handy in the context of this post.

For dumping, dumpdata is great. Once your data grows a bit, you may find it useful only to dump specific models, or even specific rows by specifying primary keys with --pks. I also find the --indent=2 option to be essential when doing the default JSON dumps.

The flush command will clear the DB for you. It's a handy "undo" for those very first forays into entering data. Be careful with this command once you have real data in the DB.

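Finally, the sqlall command is very useful when you're trying to figure out the structure of your models and the connections between them, since it prints the SQL that Django generates for an app's models. (Django 1.9 removed sqlall; in projects that use migrations, sqlmigrate serves a similar purpose.) IMHO model problems are important to detect early in the development of an application.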

To conclude, I just want to mention that while custom management commands live within applications, nothing ties them to a specific app. It is customary for Django management commands to accept app and model names as arguments. While a data entry command is naturally tied to some application and model, this doesn't have to be the case in general. You can even envision an "app" named my_custom_commands that you add to several projects in order to reuse its commands between them.