Clearing the database with Django commands

In a previous post, I presented a method of loading initial data into a Django database by using a custom management command. An accompanying task is cleaning the database up. Here I want to discuss a few options of doing that.

First, some general design notes on Django management commands. If you run manage.py help you'll see a whole bunch of commands starting with sql. These all share a common idiom - print SQL statements to the standard output. Almost all DB engines have means to pipe commands from the standard input, so this plays great with the Unix philosophy of building pipes of single-task programs.

Django even provides a convenient shortcut for us to access the actual DB that's being used with a given project - the dbshell command.

As an example, we have the sqlflush command, which returns a list of the SQL statements required to return all tables in the database to the state they were in just after they were installed. In a simple blog-like application with "post" and "tag" models, it may return something like:

$ python manage.py sqlflush
BEGIN;
DELETE FROM "auth_permission";
DELETE FROM "auth_group";
DELETE FROM "django_content_type";
DELETE FROM "django_session";
DELETE FROM "blogapp_tag";
DELETE FROM "auth_user_groups";
DELETE FROM "auth_group_permissions";
DELETE FROM "auth_user_user_permissions";
DELETE FROM "blogapp_post";
DELETE FROM "blogapp_post_tags";
DELETE FROM "auth_user";
DELETE FROM "django_admin_log";

COMMIT;

Note there's a lot of tables here, because the project also installed the admin and auth applications from django.contrib.

We can actually execute these SQL statements, and thus wipe out all the DB tables in our database, by running:

$ python manage.py sqlflush | python manage.py dbshell

For this particular sequence, since it's so useful, Django has a special built-in command named flush.

But there's a problem with running flush that may or may not bother you, depending on what your goals are. It wipes out all tables, and this means authentication data as well. So if you've created a default admin user when jump-starting the application, you'll have to re-create it now.

Perhaps there's a more gentle way to delete just your app's data, without messing with the other apps? Yes. In fact, I'm going to show a number of ways.

First, let's see what other existing management commands have to offer. sqlclear will emit the commands needed to drop all tables in a given app. For example:

$ python manage.py sqlclear blogapp
BEGIN;
DROP TABLE "blogapp_tag";
DROP TABLE "blogapp_post";
DROP TABLE "blogapp_post_tags";

COMMIT;

So we can use it to target a specific app, rather than using the kill-all approach of flush. There's a catch, though. While flush runs delete to wipe all data from the tables, sqlclear removes the actual tables. So in order to be able to work with the database, these tables have to be re-created. Worry not, there's a command for that:

$ python manage.py sql blogapp
BEGIN;
CREATE TABLE "blogapp_post_tags" (
    "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
    "post_id" integer NOT NULL REFERENCES "blogapp_post" ("id"),
    "tag_id" varchar(50) NOT NULL REFERENCES "blogapp_tag" ("name"),
    UNIQUE ("post_id", "tag_id")
)
;
CREATE TABLE "blogapp_post" (
    "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
 <.......>
)
;
CREATE TABLE "blogapp_tag" (
 <.......>
)
;

COMMIT;

So here's a first way to do a DB cleanup: pipe sqlclear appname into dbshell. Then pipe sql appname to dbshell.

An alternative way, which I like less, is to take the subset of DELETE statements generated by sqlflush, save them in a text file, and pipe it through to dbshell when needed. For example, for the blog app discussed above, these statements should do it:

BEGIN;
DELETE FROM "blogapp_tag";
DELETE FROM "blogapp_post";
DELETE FROM "blogapp_post_tags";
DELETE
COMMIT;

The reason I don't like it is that it forces you to have explicit table names stored somewhere, which is a duplication of the existing models. If you happen to change some of your foreign keys, for example, tables will need changing so this file will have to be regenerated.

The approach I like best is more programmatic. Django's model API is flexible and convenient, and we can just use it in a custom management command:

from django.core.management.base import BaseCommand
from blogapp.models import Post, Tag

class Command(BaseCommand):
    def handle(self, *args, **options):
        Tag.objects.all().delete()
        Post.objects.all().delete()

Save this code as blogapp/management/commands/clear_models.py, and now it can be invoked with:

$ python manage.py clear_models