HTTP is a stateless protocol - the server is not required to retain any information about a user from one request to the next.

For smart web applications, however, this isn't good enough. You want to log in to an application and have it remember you across requests. A good example is maintaining a "shopping cart" at some merchandise website, which you gradually fill as you browse through the products that interest you.

To solve this problem, HTTP cookies were invented by Netscape back in the 1990s. Cookies are formally defined in RFC 2965 (since superseded by RFC 6265), but to spare you all that jabber, cookies can be described very simply.

A cookie is just an arbitrary string sent by the server to the client as part of the HTTP response. The client will then return this cookie back to the server in subsequent requests. The information stored in the cookie is opaque to the client - it's only for the server's own use. This scheme allows the client to identify itself back to the server with some state the server has assigned it. Here's a more detailed flow of events:

  1. The client connects to the server for the first time, and sends a normal HTTP request (say, a simple GET for the main page).
  2. The server wants to track the client's state and in its HTTP response (which contains the page contents) attaches a Set-Cookie header. This header's information is a set of key, value pairs, where both keys and values are strings that make sense for the server, but for the client are a black box.
  3. In subsequent requests the client makes to the server, it adds a Cookie header in the HTTP requests it sends, with the cookie information the server specified in previous responses.
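The three steps above can be played out without any networking. This is only a rough sketch, assuming Python 3 (where the standard library's cookie parser lives in http.cookies); the server_respond and client_request functions, the jar dict and the response bodies are made up for the illustration.

```python
from http.cookies import SimpleCookie

def server_respond(request_headers):
    """Return (response_headers, body) for the given request headers."""
    received = SimpleCookie(request_headers.get('Cookie', ''))
    if 'id' in received:
        # Step 3 has happened: the client sent back what we assigned it.
        return {}, 'Welcome back, %s' % received['id'].value
    # Step 2: no cookie yet, so attach a Set-Cookie header to the response.
    fresh = SimpleCookie()
    fresh['id'] = 'some_value_42'
    return {'Set-Cookie': fresh['id'].OutputString()}, 'Hello, stranger'

def client_request(jar):
    """Build request headers, returning any stored cookies to the server."""
    if not jar:
        return {}
    pairs = '; '.join('%s=%s' % item for item in sorted(jar.items()))
    return {'Cookie': pairs}

jar = {}  # the client's cookie storage for this one server

# Steps 1 and 2: a first request without cookies, answered with Set-Cookie.
first_headers, first_body = server_respond(client_request(jar))

# The client stores the cookie without interpreting it.
for name, morsel in SimpleCookie(first_headers['Set-Cookie']).items():
    jar[name] = morsel.value

# Step 3: the next request carries a Cookie header.
second_headers, second_body = server_respond(client_request(jar))
```

After the second call, the server recognizes the client purely from the Cookie header; the client never had to understand the value it was storing.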

Implementation-wise, the client stores the latest cookie received from various servers (which are easily identifiable by their URLs). Even if the next time the client accesses the server is a few days after the previous request, it will still send this information (assuming the cookie hasn't expired), and the server will be able to identify it. This is why I can point my browser to Amazon today, not having visited it for some weeks, and the website will greet me with "Hello, Eli".

The above is a necessarily simplified explanation of cookies - I have no intention of repeating the contents of the RFC here. There are a lot of details I've left out, like expiration time, filtering of cookies by paths, the various size and count limits that user agents (web browsers, etc.) must abide by, and so on. However, it's enough detail for the needs of this article, so let's see some code.

Setting cookies in Python, without Django

The following demonstrates how to set cookies from a Python server-side application without using Django. For simplicity, I'll just use the web server built into the Python standard library:

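from BaseHTTPServer import HTTPServer
from SimpleHTTPServer import SimpleHTTPRequestHandler
import Cookie

class MyRequestHandler(SimpleHTTPRequestHandler):
    def do_GET(self):
        content = "<html><body>Path is: %s</body></html>" % self.path
        # The status line has to go out before any headers.
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.send_header('Content-length', str(len(content)))

        cookie = Cookie.SimpleCookie()
        cookie['id'] = 'some_value_42'
        # Emit the cookie as a Set-Cookie response header.
        self.send_header('Set-Cookie', cookie['id'].OutputString())

        self.end_headers()
        self.wfile.write(content)

server = HTTPServer(('', 59900), MyRequestHandler)
server.serve_forever()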

This is a very simple application that just shows the path that the client requested. The more interesting thing happens under the covers - the application also sets a cookie. If we examine the HTTP response sent by this application to a client that connected to it, we'll see this among the headers:

Set-Cookie: id=some_value_42

In a similar manner, the Cookie module allows the server to parse cookies returned by the client in Cookie headers, using the load method.
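Here's a small sketch of that parsing step. It assumes Python 3, where the Cookie module has been renamed to http.cookies; the header value (including the theme cookie) is invented for the example.

```python
from http.cookies import SimpleCookie

# The raw value of a Cookie request header, as a client might send it.
raw_header = 'id=some_value_42; theme=dark'

cookie = SimpleCookie()
cookie.load(raw_header)

# Every entry is a Morsel object; its .value attribute holds the string.
parsed = {name: morsel.value for name, morsel in cookie.items()}
```

Flattening the parsed cookie into a plain dict like this is a convenience for the caller, not something the load method does by itself.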

Setting and reading cookies with Django

Django makes setting and reading cookies almost trivial. Here's a simple view that checks whether the client set the id cookie in its request, and if it hadn't, sends the cookie to the client (so that the client will have it for the next request):

from django.http import HttpResponse

def test_cookie(request):
    if 'id' in request.COOKIES:
        cookie_id = request.COOKIES['id']
        return HttpResponse('Got cookie with id=%s' % cookie_id)
    else:
        # No id cookie in the request: send one for the client to return.
        resp = HttpResponse('No id cookie! Sending cookie to client')
        resp.set_cookie('id', 'some_value_99')
        return resp

As you can see, cookies are taken from the COOKIES dict-like attribute of Django's HttpRequest, and set by calling the set_cookie method of HttpResponse. Couldn't be any simpler. What we're really here for is to understand how these things work under the hood of Django, so let's dive in.

How cookies are implemented in Django

The recommended way to deploy Django applications is with WSGI, so I'll focus on the WSGI backend implemented in Django. This is a good place to mention that at the time of this writing, I'm looking into the source code of Django 1.3, which is installed in site-packages/django in the usual installation structure of Python.

Looking at Django's WSGIRequest class (which inherits from http.HttpRequest) we can see that COOKIES is a property that hides a dict attribute named self._cookies behind a getter/setter pair. The dict is initialized in _get_cookies:

def _get_cookies(self):
    if not hasattr(self, '_cookies'):
        self._cookies = http.parse_cookie(self.environ.get('HTTP_COOKIE', ''))
    return self._cookies

This appears to be a lazy initialization that should aid performance - if the view doesn't want to look into the cookies of a request, there's no need to parse them. Cookies are taken from the HTTP_COOKIE entry of the request's environment object, per the WSGI specification. What about http.parse_cookie? This is a utility function in Django's http module:

def parse_cookie(cookie):
    if cookie == '':
        return {}
    if not isinstance(cookie, Cookie.BaseCookie):
        try:
            c = SimpleCookie()
            c.load(cookie, ignore_parse_errors=True)
        except Cookie.CookieError:
            # Invalid cookie
            return {}
    else:
        c = cookie
    cookiedict = {}
    for key in c.keys():
        cookiedict[key] = c.get(key).value
    return cookiedict

As you can see, it uses the Cookie module from the standard library to parse the cookie with the load method, similarly to what I mentioned above for the non-Django code.
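The lazy lookup and the parsing described above can be imitated in a few lines outside of Django. This is a sketch rather than Django's code: it assumes Python 3 (http.cookies instead of Cookie), the LazyCookieRequest class is hypothetical, and the parse_count counter exists only to make the laziness visible.

```python
from http.cookies import SimpleCookie, CookieError

class LazyCookieRequest:
    """Hypothetical stand-in for a WSGI request object."""

    def __init__(self, environ):
        self.environ = environ
        self.parse_count = 0  # instrumentation for this example only

    @property
    def COOKIES(self):
        # Parse on first access only, then reuse the cached dict.
        if not hasattr(self, '_cookies'):
            self.parse_count += 1
            self._cookies = self._parse(self.environ.get('HTTP_COOKIE', ''))
        return self._cookies

    @staticmethod
    def _parse(raw):
        if raw == '':
            return {}
        try:
            jar = SimpleCookie()
            jar.load(raw)
        except CookieError:
            # Invalid cookie header: treat it as no cookies at all.
            return {}
        return {key: jar[key].value for key in jar}

request = LazyCookieRequest({'HTTP_COOKIE': 'id=some_value_99'})
parses_before_access = request.parse_count
first = request.COOKIES['id']
second = request.COOKIES['id']
```

A view that never touches COOKIES leaves parse_count at zero, and repeated accesses trigger only a single parse.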

Setting cookies on a response is done with the set_cookie method of HttpResponse. This method simply writes down the new cookie in its self.cookies attribute. WSGIHandler then adds the cookies to its response headers when sending the response.
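To make that division of labor concrete, here's a rough sketch of the same idea: the response only records cookies, and a separate step turns them into headers. It assumes Python 3's http.cookies; SketchResponse and build_headers are hypothetical names, not Django's classes, and the set_cookie signature is trimmed to the bare minimum.

```python
from http.cookies import SimpleCookie

class SketchResponse:
    """Hypothetical response object; it only does the cookie bookkeeping."""

    def __init__(self, body):
        self.body = body
        self.cookies = SimpleCookie()

    def set_cookie(self, key, value='', path='/'):
        # Just write the cookie down; nothing is sent at this point.
        self.cookies[key] = value
        self.cookies[key]['path'] = path

def build_headers(response):
    """What a handler might do right before sending the response."""
    headers = [('Content-Type', 'text/html')]
    for morsel in response.cookies.values():
        headers.append(('Set-Cookie', morsel.OutputString()))
    return headers

resp = SketchResponse('No id cookie! Sending cookie to client')
resp.set_cookie('id', 'some_value_99')
headers = build_headers(resp)
```

Keeping the cookies in a separate attribute until the last moment means a view can call set_cookie several times, or overwrite an earlier value, before any header is actually produced.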

Wrapping up

As you can see, cookies are relatively easy to handle in Python, and in particular with Django. That said, when writing a Django application you rarely need to use cookies directly, because cookies are a fairly low-level building block. Django's higher-level session framework is much easier to use and is the recommended way to implement persistent state in applications. The next part of the article will examine how to use Django sessions and how they work under the hood.

