Django sessions - part III: User authentication

In the previous two articles of this series we learned how Django implements sessions, thus allowing the abstraction of persistent state in a web application. The session framework can be employed by developers to implement all kinds of interesting features for their application, but Django also uses it for its own needs. Specifically, Django's user authentication system relies on the session framework to do its job.

The user authentication system allows users to log in and out of the application, and lets the application act based on a set of permissions. Borrowing from the Django Book:

This system is often referred to as an auth/auth (authentication and authorization) system. That name recognizes that dealing with users is often a two-step process. We need to:

1. Verify (authenticate) that a user is who he or she claims to be (usually by checking a username and password against a database of users)
2. Verify that the user is authorized to perform some given operation (usually by checking against a table of permissions)

In this, the final part of the series, I want to explain how Django's user authentication is implemented. I will focus on item 1 in the list above - authentication, which makes actual use of sessions [1].

Snooping on HTTP traffic

Before diving into the source code of Django, let's see how authentication looks when viewed from the level of HTTP traffic. I'll be using this view to test things:

from django.http import HttpResponse

def test_user(request):
    user_str = str(request.user)
    # In Django 1.3, is_authenticated is a method, not a property
    if request.user.is_authenticated():
        return HttpResponse('%s is logged in' % user_str)
    else:
        return HttpResponse('%s is not logged in' % user_str)

Before logging in, I get the message "AnonymousUser is not logged in". The server doesn't return any cookie.

When I log in with Django's default login template, more interesting things can be observed. The login form sends a POST request to the server, with my login information in the form data:

username:eliben
password:password

Assuming this is a valid username/password pair, the server sends back a session ID in a cookie:

Set-Cookie:sessionid=4980ec04546e434c1ea13c675fafbc98;

What is this session? As we saw in the previous article, it's a key into the django_session DB table. Decoding the session data from the table, I get:

{'_auth_user_id': 1, '_auth_user_backend': 'django.contrib.auth.backends.ModelBackend'}

By looking into the auth_user table, I can indeed see that eliben is the user with ID = 1. Also, in subsequent requests to the server, my browser sends the aforementioned session ID in a cookie, as expected.
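How does a session row turn into that dictionary? The previous article covered the details. As a reminder, here is a minimal sketch assuming the layout I understand Django 1.3's session backends to use: a keyed hash of the pickled dictionary, a colon, the pickle itself, all base64-encoded. `SECRET`, `encode_session` and `decode_session` are hypothetical names. The real code derives its key from `settings.SECRET_KEY`, and its exact hashing details may differ.

```python
import base64
import hashlib
import hmac
import pickle

# Hypothetical secret; the real code derives its key from settings.SECRET_KEY.
SECRET = b"not-a-real-secret-key"

def _hash(data: bytes) -> bytes:
    # Keyed hash of the pickled payload, so tampering can be detected.
    return hmac.new(SECRET, data, hashlib.sha1).hexdigest().encode("ascii")

def encode_session(session_dict: dict) -> str:
    pickled = pickle.dumps(session_dict, pickle.HIGHEST_PROTOCOL)
    return base64.b64encode(_hash(pickled) + b":" + pickled).decode("ascii")

def decode_session(session_data: str) -> dict:
    raw = base64.b64decode(session_data)
    try:
        # The hex digest never contains ':', so the first ':' is the separator.
        digest, pickled = raw.split(b":", 1)
    except ValueError:
        return {}
    if not hmac.compare_digest(digest, _hash(pickled)):
        return {}  # corrupted or tampered data: treat it as an empty session
    # Only unpickle after the hash check succeeded.
    return pickle.loads(pickled)

data = encode_session({'_auth_user_id': 1,
                       '_auth_user_backend': 'django.contrib.auth.backends.ModelBackend'})
print(decode_session(data))
```

The hash check comes before `pickle.loads` because unpickling attacker-controlled data is unsafe.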

When I log out, the server sends a new session ID, whose session data is an empty dictionary: no user is logged in.

Auth middleware

As with sessions, it's instructive to first see how the authentication middleware is implemented. In other words, how did the "user" attribute get into my HTTP request?

The answer is in contrib/auth/middleware.py [2]:

class LazyUser(object):
    def __get__(self, request, obj_type=None):
        if not hasattr(request, '_cached_user'):
            from django.contrib.auth import get_user
            request._cached_user = get_user(request)
        return request._cached_user


class AuthenticationMiddleware(object):
    def process_request(self, request):
        request.__class__.user = LazyUser()
        return None

Behold, we've encountered a rare sighting of one of Python's more obscure, and yet powerful features: descriptors. Explaining descriptors fully would take an article of its own, so I kindly direct you to google it [3]. Here I'll just briefly explain how this specific code works.

AuthenticationMiddleware is a middleware class implementing the process_request hook. It attaches a LazyUser descriptor to the user attribute of the request's class. LazyUser implements __get__, which gets called whenever we access request.user in our views. Since the descriptor lives on the class, it's shared by all requests. For that reason, __get__ caches the user object on the request instance itself, in its _cached_user attribute. This way the possibly costly get_user call runs at most once per request, no matter how many times the user attribute is accessed.

Finding the active user

Recall that request.user gets us the currently logged-in user, if there is one. Let's see how this gets done. In the code sample above, the user is accessed with get_user(request). Here's get_user:

def get_user(request):
    from django.contrib.auth.models import AnonymousUser
    try:
        user_id = request.session[SESSION_KEY]
        backend_path = request.session[BACKEND_SESSION_KEY]
        backend = load_backend(backend_path)
        user = backend.get_user(user_id) or AnonymousUser()
    except KeyError:
        user = AnonymousUser()
    return user

Looking at the beginning of the file this function is defined in (auth/__init__.py), we see:

SESSION_KEY = '_auth_user_id'
BACKEND_SESSION_KEY = '_auth_user_backend'

So what get_user does is just try to extract the user from the current session. Recall from the HTTP snooping section that when a user is actually logged in, the _auth_user_id and _auth_user_backend entries are set in the session dictionary. get_user reads them, and turns to the auth backend to fetch the user object, with the backend.get_user method.

The default auth backend is auth.backends.ModelBackend, which works with the DB-backed user model (stored in all those auth_* tables that get added to your DB when the auth framework is enabled). Its get_user method simply does this:

def get_user(self, user_id):
    try:
        return User.objects.get(pk=user_id)
    except User.DoesNotExist:
        return None

This is standard Django code for fetching an object from the DB, where User is a model defined in contrib/auth/models.py.
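Putting the pieces together, here's a self-contained sketch of the lookup path from a session dictionary to a user. `DictBackend` and the `BACKENDS` registry are hypothetical. They replace ModelBackend and `load_backend`'s import-by-dotted-path, and users are plain strings rather than model instances.

```python
SESSION_KEY = "_auth_user_id"
BACKEND_SESSION_KEY = "_auth_user_backend"

class AnonymousUser:
    def __str__(self):
        return "AnonymousUser"

class DictBackend:
    """Hypothetical backend standing in for ModelBackend: users live in a dict."""
    def __init__(self, users):
        self.users = users

    def get_user(self, user_id):
        # Like ModelBackend.get_user: return None for an unknown ID.
        return self.users.get(user_id)

# Hypothetical registry replacing load_backend(backend_path).
BACKENDS = {"django.contrib.auth.backends.ModelBackend": DictBackend({1: "eliben"})}

def get_user(session):
    try:
        user_id = session[SESSION_KEY]
        backend_path = session[BACKEND_SESSION_KEY]
    except KeyError:
        # Nobody is logged in: the auth keys are absent from the session.
        return AnonymousUser()
    backend = BACKENDS[backend_path]
    # A user deleted since logging in also falls back to AnonymousUser.
    return backend.get_user(user_id) or AnonymousUser()

print(get_user({SESSION_KEY: 1,
                BACKEND_SESSION_KEY: "django.contrib.auth.backends.ModelBackend"}))
print(get_user({}))
```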

The User model

User is a fairly standard Django model with a bunch of fields and helper methods for working with them. The most interesting part, IMHO, and the one I'll focus on here, is setting and verifying the user's password. Here's the set_password method:

def set_password(self, raw_password):
    if raw_password is None:
        self.set_unusable_password()
    else:
        import random
        algo = 'sha1'
        salt = get_hexdigest(algo, str(random.random()), str(random.random()))[:5]
        hsh = get_hexdigest(algo, salt, raw_password)
        self.password = '%s$%s$%s' % (algo, salt, hsh)

We see that this uses the accepted modern approach: instead of storing the password itself (in plaintext), a cryptographic hash is computed and stored. Further, the password is salted, to defeat rainbow table attacks. The password's hash, together with the salt value and the algorithm used for the hashing (SHA1 here), is then stored in the database, separated by dollar signs. For example, here's my user's password field:

sha1$f0670$2a0781bd8f2361042ebdf0cd1b3ce1e8be3f8dcc
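Here's a standalone sketch of how such a value can be produced with `hashlib`. It assumes that `get_hexdigest` hashes the salt followed by the raw password. As far as I can tell, that is what the sha1 branch in Django 1.3 does. Only SHA1 is sketched here, and `make_password` is a hypothetical name for the body of `set_password`.

```python
import hashlib
import random

def get_hexdigest(algorithm, salt, raw_password):
    # Sketch of the sha1 case only: hex digest of salt + raw password.
    if algorithm != "sha1":
        raise ValueError("only sha1 is sketched here")
    return hashlib.sha1((salt + raw_password).encode("utf-8")).hexdigest()

def make_password(raw_password):
    # Hypothetical helper mirroring set_password above.
    algo = "sha1"
    # A 5-character hex salt derived from two random numbers.
    salt = get_hexdigest(algo, str(random.random()), str(random.random()))[:5]
    hsh = get_hexdigest(algo, salt, raw_password)
    return "%s$%s$%s" % (algo, salt, hsh)

print(make_password("password"))  # sha1$, a 5-char salt, $, then 40 hex chars
```

A single fast hash such as salted SHA1 is cheap to brute-force. Django 1.4 later switched its default password hashing to PBKDF2.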

To verify the password, the check_password function is invoked [4]:

def check_password(raw_password, enc_password):
    """
    Returns a boolean of whether the raw_password was correct. Handles
    encryption formats behind the scenes.
    """
    algo, salt, hsh = enc_password.split('$')
    return constant_time_compare(hsh, get_hexdigest(algo, salt, raw_password))

It computes the hash of the provided password and compares it to the one stored in the DB. What is this constant_time_compare call about, though? It's a function that compares two strings in time that depends on their length, but not on how many of their leading characters match. Such comparison is important cryptographically, to thwart timing attacks: an ordinary string comparison returns at the first mismatch, so by measuring response times an attacker could learn how much of a guess is correct.
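Below is a sketch of such a comparison, written to the spirit of Django's `constant_time_compare`: no early exit, with mismatches accumulated via XOR and OR. It also includes a `check_password` that assumes the `sha1$salt$hash` format shown above. In modern Python, `hmac.compare_digest` offers a constant-time comparison in the standard library.

```python
import hashlib

def constant_time_compare(val1, val2):
    """True if the strings are equal; running time depends only on their length."""
    if len(val1) != len(val2):
        return False
    result = 0
    for x, y in zip(val1, val2):
        # XOR is 0 only for equal characters; OR remembers any mismatch
        # without exiting the loop early.
        result |= ord(x) ^ ord(y)
    return result == 0

def check_password(raw_password, enc_password):
    algo, salt, hsh = enc_password.split("$")
    if algo != "sha1":
        raise ValueError("only sha1 is sketched here")
    candidate = hashlib.sha1((salt + raw_password).encode("utf-8")).hexdigest()
    return constant_time_compare(hsh, candidate)

stored = "sha1$abcde$" + hashlib.sha1(b"abcdepassword").hexdigest()
print(check_password("password", stored), check_password("guess", stored))  # True False
```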

Logging in

Django provides a powerful and versatile login view (in django.contrib.auth.views) that lets an application implement logging-in functionality. What this view does is explained quite well in the documentation of Django's auth framework. Here, I will focus on how it works.

login is your typical form-handling Django view. If it gets a GET request, it displays the login form. On the other hand, for a POST request, it tries to log the user in. This is the interesting part.

At first sight, it's hard to see where exactly the login authentication is handled. The POST request handling part of the login view is:

if request.method == "POST":
    form = authentication_form(data=request.POST)
    if form.is_valid():
        netloc = urlparse.urlparse(redirect_to)[1]

        # Use default setting if redirect_to is empty
        if not redirect_to:
            redirect_to = settings.LOGIN_REDIRECT_URL

        # Security check -- don't allow redirection to a different
        # host.
        elif netloc and netloc != request.get_host():
            redirect_to = settings.LOGIN_REDIRECT_URL

        # Okay, security checks complete. Log the user in.
        auth_login(request, form.get_user())

        if request.session.test_cookie_worked():
            request.session.delete_test_cookie()

        return HttpResponseRedirect(redirect_to)
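The redirect "security check" in this code is easy to try on its own. The sketch below extracts it into a hypothetical function, `pick_redirect`, using Python 3's `urllib.parse.urlparse` in place of Python 2's `urlparse.urlparse`. `LOGIN_REDIRECT_URL` is set to Django's default value of that setting.

```python
from urllib.parse import urlparse

LOGIN_REDIRECT_URL = "/accounts/profile/"  # Django's default for this setting

def pick_redirect(redirect_to, request_host):
    """Hypothetical extraction of the login view's redirect check."""
    netloc = urlparse(redirect_to)[1]  # index 1 is the network location (host)
    if not redirect_to:
        # Nothing requested: use the default setting.
        return LOGIN_REDIRECT_URL
    elif netloc and netloc != request_host:
        # Don't allow redirection to a different host.
        return LOGIN_REDIRECT_URL
    return redirect_to

print(pick_redirect("/user/", "example.com"))               # /user/
print(pick_redirect("http://evil.com/x", "example.com"))    # /accounts/profile/
```

Relative paths have an empty netloc and pass through unchanged. Absolute URLs are only honored when they point back at the host that served the request.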

After some head scratching and stunning feats of reverse engineering [5], it became clear that the authentication happens in the call to form.is_valid(). This call invokes (after some steps [6]) AuthenticationForm.clean, which itself calls authenticate. A bit down the road, the flow of control reaches ModelBackend.authenticate:

def authenticate(self, username=None, password=None):
    try:
        user = User.objects.get(username=username)
        if user.check_password(password):
            return user
    except User.DoesNotExist:
        return None
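The "bit down the road" is `django.contrib.auth.authenticate`. As I read Django 1.3, it tries each configured backend in turn and skips backends that don't accept the given credentials (signalled by a `TypeError`). It then tags the returned user with the dotted path of the backend that succeeded. Here's a self-contained sketch of that loop, with a hypothetical in-memory backend. To keep it short, the toy `User` compares plaintext passwords, which real code must never do.

```python
class User:
    # Hypothetical toy user; stores plaintext only to keep the sketch short.
    def __init__(self, username, password):
        self.username = username
        self.password = password

    def check_password(self, raw_password):
        return self.password == raw_password

class InMemoryBackend:
    """Hypothetical backend with the same authenticate() contract as ModelBackend."""
    def __init__(self, users):
        self.users = {u.username: u for u in users}

    def authenticate(self, username=None, password=None):
        user = self.users.get(username)
        if user is not None and user.check_password(password):
            return user
        return None

def authenticate(backends, **credentials):
    for backend in backends:
        try:
            user = backend.authenticate(**credentials)
        except TypeError:
            continue  # this backend doesn't accept these credentials
        if user is None:
            continue
        # Record which backend succeeded; login() later saves it in the session.
        user.backend = "%s.%s" % (backend.__module__, backend.__class__.__name__)
        return user
    return None

backend = InMemoryBackend([User("eliben", "password")])
user = authenticate([backend], username="eliben", password="password")
print(user.username, user.backend)
```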

And we've already seen the definition of check_password in the section "The User model". Alright, so now we know how the login view authenticates the user. What does it do next? After the redirect security checks, it calls auth_login, an alias for django.contrib.auth.login. This function stores the user's ID and the backend's path in the session (the _auth_user_id and _auth_user_backend entries). The session framework then sends the session cookie we saw in the first section of this article.
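In terms of the session dictionary, login and logout boil down to roughly the following. This is a sketch based on my reading of Django 1.3. On login, an existing session that belongs to a different user is flushed. Otherwise the session key is cycled, and then the two auth entries are written. Logout flushes the session, which matches the new session ID with empty data we observed earlier. `Session` is a hypothetical toy class; real session objects also persist their data to the DB.

```python
import secrets

SESSION_KEY = "_auth_user_id"
BACKEND_SESSION_KEY = "_auth_user_backend"

class Session(dict):
    """Hypothetical toy session: a dict plus the ID that would go in the cookie."""
    def __init__(self):
        super().__init__()
        self.session_key = secrets.token_hex(16)

    def cycle_key(self):
        # Keep the data, but issue a new session ID (defends against fixation).
        self.session_key = secrets.token_hex(16)

    def flush(self):
        # Drop all data and issue a new session ID.
        self.clear()
        self.cycle_key()

def login(session, user_id, backend_path):
    if SESSION_KEY in session:
        if session[SESSION_KEY] != user_id:
            # Don't reuse another user's session.
            session.flush()
    else:
        session.cycle_key()
    session[SESSION_KEY] = user_id
    session[BACKEND_SESSION_KEY] = backend_path

def logout(session):
    session.flush()

s = Session()
login(s, 1, "django.contrib.auth.backends.ModelBackend")
print(dict(s))
logout(s)
print(dict(s))  # {}
```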

Summary

This concludes the article on how user authentication works, and also the 3-article series on Django sessions. Sessions in general and user logins in particular are one aspect of web applications we take for granted. It just works, automagically. And yet, as I hope these articles have demonstrated, the magic that happens under the hood isn't really difficult to understand. Sure, there are a lot of concepts to grasp and code to read, but with some determination, enlightenment is just around the corner.

[1]	Authorization is a simple matter of keeping tables of users, groups and permissions.

[2]	Just a reminder: at the time of writing this series of articles, the latest released version of Django is 1.3, which is what I'm looking at here. In this particular code sample I've removed a longish assertion line from `process_request`.

[3]	But start with this excellent article by Raymond Hettinger.

[4]	It's invoked by the `User.check_password` method, which also handles a backwards-compatibility issue that I don't cover here.

[5]	I'm kidding :-) All it took was 30 seconds to strategically insert `traceback.print_stack()` into `User.check_password` and see where it's being called from when I log in.

[6]	According to the Django docs, the `clean` method of a subclass of `Form` is responsible for whole-form validation. Not the most intuitive nomenclature, I'd say.