Safely using destructors in Python

June 12th, 2009 at 8:40 am

This post applies to Python 2.5 and 2.6 – if you see any difference for Python 3, please let me know.

Destructors are a very important concept in C++, where they’re an essential ingredient of RAII – virtually the only real safe way to write code that involves allocation and deallocation of resources in an exception-throwing program.

In Python, destructors are needed much less, because Python has a garbage collector that handles memory management. However, while memory is the most common resource allocated, it is not the only one. There are also sockets and database connections to be closed, files, buffers and caches flushed and a few more resources that need to be released when an object is done with them.

So Python has the destructor concept – the __del__ method. For some reason, many in the Python community believe that __del__ is evil and shouldn’t be used. However, a simple grep of the standard library shows dozens of uses of __del__ in classes we all use and love, so where’s the catch? In this article I’ll try to make it clear (first and foremost for myself), when __del__ should be used, and how.

Simple code samples

First a basic example:

class FooType(object):
    def __init__(self, id):
        self.id = id
        print self.id, 'born'

    def __del__(self):
        print self.id, 'died'


ft = FooType(1)

This prints:

1 born
1 died

Now, recall that due to the usage of a reference-counting garbage collector, Python won’t clean up an object when it goes out of scope. It will clean it up when the last reference to it has gone out of scope. Here’s a demonstration:

class FooType(object):
    def __init__(self, id):
        self.id = id
        print self.id, 'born'

    def __del__(self):
        print self.id, 'died'

def make_foo():
    print 'Making...'
    ft = FooType(1)
    print 'Returning...'
    return ft

print 'Calling...'
ft = make_foo()
print 'End...'

This prints:

Calling...
Making...
1 born
Returning...
End...
1 died

The destructor was called after the program ended, not when ft went out of scope inside make_foo.

Alternatives to the destructor

Before I proceed, a proper disclosure: Python provides a better method for managing resources than destructors – contexts. I won’t turn this into a tutorial of contexts, but you should really get yourself familiar with the with statement and objects that can be used inside. For example, the best way to handle writing to a file is:

with open('out.txt', 'w') as of:
    of.write('222')

This makes sure the file is properly closed when the block inside with exits, even if exceptions are thrown. Note that this demonstrates a standard context manager. Another is threading.lock, which returns a context manager very suitable to be used in a with statement. You should read PEP 343 for more details.

While recommended, with isn’t always applicable. For example, assume you have an object that encapsulates some sort of a database that has to be committed and closed when the object ends its existence. Now suppose the object should be a member variable of some large and complex class (say, a GUI dialog, or a MVC model class). The parent interacts with the DB object from time to time in different methods, so using with isn’t practical. What’s needed is a functioning destructor.

Where destructors go astray

To solve the use case I presented in the last paragraph, you can employ the __del__ destructor. However, it’s important to know that this doesn’t always work well. The nemesis of a reference-counting garbage collector is circular references. Here’s an example:

class FooType(object):
    def __init__(self, id, parent):
        self.id = id
        self.parent = parent
        print 'Foo', self.id, 'born'

    def __del__(self):
        print 'Foo', self.id, 'died'


class BarType(object):
    def __init__(self, id):
        self.id = id
        self.foo = FooType(id, self)
        print 'Bar', self.id, 'born'

    def __del__(self):
        print 'Bar', self.id, 'died'


b = BarType(12)

Output:

Foo 12 born
Bar 12 born

Ouch… what has happened? Where are the destructors? Here’s what the Python documentation has to say on the matter:

Circular references which are garbage are detected when the option cycle detector is enabled (it’s on by default), but can only be cleaned up if there are no Python-level __del__() methods involved.

Python doesn’t know the order in which it’s safe to destroy objects that hold circular references to each other, so as a design decision, it just doesn’t call the destructors for such methods!

So, now what?

Shouldn’t we use destructors because of this deficiency? I’m very surprised to see that many Pythonistas think so, and recommend to use explicit close methods. But I disagree – explicit close methods are less safe, since they are easy to forget to call. Moreover, when exceptions can happen (and in Python they happen all the time), managing explicit closing becomes very difficult and burdensome.

I actually think that destructors can and should be used safely in Python. With a couple of precautions, it’s definitely possible.

First and foremost, note that justified cyclic references are a rare occurrence. I say justified on purpose – a lot of uses in which cyclic references arise are an example of bad design and leaky abstractions.

As a general rule of thumb, resources should be held by the lowest-level objects possible. Don’t hold a DB resource directly in your GUI dialog. Use an object to encapsulate the DB connection and close it safely in the destructor. The DB object has no reason whatsoever to hold references to other objects in your code. If it does – it violates several good-design practices.

Sometimes Dependency Injection can help prevent cyclic references in complex code, but even in those rare few cases when you find yourself needing a true cyclic reference, there’s a solution. Python provides the weakref module for this purpose. The documentation quickly reveals that this is exactly what we need here:

A weak reference to an object is not enough to keep the object alive: when the only remaining references to a referent are weak references, garbage collection is free to destroy the referent and reuse its memory for something else. A primary use for weak references is to implement caches or mappings holding large objects, where it’s desired that a large object not be kept alive solely because it appears in a cache or mapping.

Here’s the previous example rewritten with weakref:

import weakref

class FooType(object):
    def __init__(self, id, parent):
        self.id = id
        self.parent = weakref.ref(parent)
        print 'Foo', self.id, 'born'

    def __del__(self):
        print 'Foo', self.id, 'died'


class BarType(object):
    def __init__(self, id):
        self.id = id
        self.foo = FooType(id, self)
        print 'Bar', self.id, 'born'

    def __del__(self):
        print 'Bar', self.id, 'died'

b = BarType(12)

Now we get the result we want:

Foo 12 born
Bar 12 born
Bar 12 died
Foo 12 died

The tiny change in this example is that I use weakref.ref to assign the parent reference in the constructor FooType. This is a weak reference, so it doesn’t really create a cycle. Since the GC sees no cycle, it destroys both objects.

Conclusion

Python has perfectly usable object destruction via the __del__ method. It works fine for the vast majority of use-cases, but chokes on cyclic references. Cyclic references, however, are often a sign of bad design, and few of them are justified. For the teeny tiny amount of uses cases where justified cyclic references have to be used, the cycles can be easily broken with weak references, which Python provides in the weakref module.

References

Some links that were useful in the preparation of this article:

Related posts:

  1. Pure virtual destructors in C++
  2. Python internals: adding a new statement to Python
  3. Parsing C++ in Python with Clang
  4. Python metaclasses by example
  5. Local execution of Python CGI scripts – with Python 3

17 Responses to “Safely using destructors in Python”

  1. soulcheckNo Gravatar Says:

    Second example sparkled one question.

    How is ‘going out of scope’ defined for classes allocated on the heap?? afaik python allocates all classes on the heap so ft didn’t really go out of scope after leaving make_foo (exactly as it woudn’t go out of scope in a similar c++)

  2. Jean-Paul CalderoneNo Gravatar Says:

    What you’re calling “destructors” here are actually “finalizers”. Python also has destructors, but it’s rare that custom destructors are required (and they can only be defined in C).

    Also, the main problem with your argument is that it is very difficult to ensure that an object doesn’t become a part of a cycle after some new development or maintenance is done to the library of which it is part.

    How do you write a unit test which verifies an object isn’t part of any cycle in your program?

    If you can’t write such a unit test, how do you gain confidence that your finalizers will continue to be called?

  3. elibenNo Gravatar Says:

    @soulcheck,
    The interesting scope here is lexical. Had I not returned ft from make_foo, its destructor would’ve been invoked. What goes out of scope is the reference, not the allocated memory on the heap.

    In C++, if rt’d be a pointer allocated with new, the pointer would go out of scope, but the allocated memory would not. It would actually be lost (leaked). In correct RAII-abiding C++, such ‘bare’ allocations are a no-no anyway, and you should use smart pointers instead.

    @Jean-Paul,
    A new development is an active process. Assume you have an object which you know isn’t part of a cycle because it doesn’t contain any references to high-level objects. Adding such a reference is an active process you’re consciously performing, so you have the time to make a decision – such as making the reference a weakref. It doesn’t just “happen on its own”.

    If you have a reference to a high-level object inside to start with, just make it a weakref from the start, because 99% that’s what you need.

    You make a good point about unit tests, though. It’s interesting to ponder how finalizations can be properly tested.

  4. soulcheckNo Gravatar Says:

    @eliben It would leak unless deallocated outside of the function, but otherwise I agree. My point was that objects (not pointers unless also created on the heap ;) ) created on the heap don’t have defined scope.

    Cheers

  5. Marius GedminasNo Gravatar Says:

    I believe there are more downsides people usually mention about Python’s finalizers:

    – they behave differently in different Python implementations (Jython, IronPython, PyPy), so their execution may be delayed

    – they may not necessarily get called before the program exits

    – weird things happen if you do something in your __del__ that causes the object to get resurrected by creating another reference to it

    Personally, I think __del__ methods are okay-ish for a last-ditch defense, but should not be relied on in code. Always free your resources explicitly by calling the appropriate close() method.

  6. mike bayerNo Gravatar Says:

    __del__ is inadequate for more reasons than one. Your solution of never building any cycles between objects is difficult to ensure – and mistakenly adding non-weakref’ed cycles adds hard to track memory leaks. Additionally, traversing the objects along weakrefs as opposed to direct references introduces palpable function call overhead in a performance critical application. You can’t even count on the state of “self” within __del__ – the state of the object is in undefined, some attributes may be there or may not be there depending on random circumstances.

    The better way IMO is to just use the weakref() alone with a callback as the finalizer – the weakref can even be from the object to itself (just make sure the callback doesn’t reference self). when the callback is called, the original referent is gone, so there’s no chance that you might “accidentally” call some state off of self that won’t always be there. you then don’t need to worry about cycles causing surprise memory leaks.

  7. elibenNo Gravatar Says:

    @mike & marius,

    I appreciate the input, but this raises hard questions:

    If __del__ is so badly broken, why does it exist at all?
    Why do many prominent Python libraries use it, including quite a few in the standard library?

    Can you clarify this:

    You can’t even count on the state of “self” within __del__ – the state of the object is in undefined, some attributes may be there or may not be there depending on random circumstances.

    And give an example when an attribute may not be there when __del__ runs?

  8. Mark RoddyNo Gravatar Says:

    @eliben
    There are a lot of issues that come up with use of the __del__ method (these also probably deserves there own blog post).

    However, I would not attribute them to the method being broken, I’d attribute these issues to people abusing this method beyond it’s intended use. The documentation on this method makes pretty clear all the problems that can occur. People should be taking the time to read this info before implementing the __del__ special method as part of their object design.

  9. Richard JonesNo Gravatar Says:

    __del__ can be called during program exit – thus the state of your program is completely undefined (for example, modules may have been collected & cleaned out). It’s not broken, you just have to be very, very careful when using __del__.

  10. elibenNo Gravatar Says:

    @Richard,

    Do you mean to say that an object can have its instance vars cleaned up before its __del__ method is called?
    Wouldn’t this mean that __del__ is useless by definition?

  11. Lennart RegebroNo Gravatar Says:

    Summarum: __del__ is not guaranteed to get called. At all. It’s not only for cyclical references. Therefore it can’t be used to do thinks as deallocate things that Python does not automatically deallocate anyway.

    The calling order of __del__ is not guaranteed. Therefore you can not use it for locking resources.

    It’s also tricky to use, but that’s another thing, Trickyness is not the problem.

    Summa Summarum: If it can’t be used for resource allocation, and it can’t be used for locking, then what can it be used for? Yeah, I don’t know. So why does it exist? Beats me.

  12. ChadNo Gravatar Says:

    We avoid all use of __del__ in the IMVU client for two reasons:

    1) It’s very difficult to ensure the no-cycles requirement, especially with common use of closures and continuous refactoring.

    2) I’ve witnessed situations in Python 2.5 where a cycle of objects without __del__ methods references a leaf object with a __del__ method and the leaf object prevents the cycle from being collected. Unfortunately, I can’t reproduce this in a test, so I suspect it’s an interpreter bug.

    weakref callbacks on leaf resources and explicit destruction have worked better for us.

  13. Imre TuskeNo Gravatar Says:

    This is an issue I faced when I started to learn Python (and came from a C++ background). At the end I also found weakref as something useful, but during my investigations in python finalizers (which are not destuctors, indeed), I frowned upon the realization that your finalizers are not even guaranteed to be called.

    Beats me, too. Unfortunately, the conclusion is that for the time being you can’t properly implement C++ RAII ctor/dtor symmetry in Python. Too bad. :(

  14. John McGeheeNo Gravatar Says:

    Here is a little erratum for you: “when it does out of scope” should be “when it does go out of scope”.

    Feel free to delete this comment after you make the change.

  15. elibenNo Gravatar Says:

    John McGehee, thanks – fixed.

  16. Josh AyersNo Gravatar Says:

    I regularly use __del__ methods to automatically free resources. As long as reference cycles are avoided, it works perfectly well.

    The gc.garbage list can be inspected within a unit test to verify all objects are collectable. Any objects that have reference cycles and __del__ methods will be added to that list. Whenever I have some object that must be cleaned up properly, I always write a test that creates a few of the objects, then deletes them and verifies that gc.garbage is still empty.

  17. John DawsonNo Gravatar Says:

    If your code explicitly calls .close() (or uses a with statement that does the same), then exceptions in the .close() method itself can be caught and handled. Exceptions that happen in destructors can’t be caught by your program.

    Most of the time, this doesn’t matter. However, today, I encountered a bug where the code was writing to the /proc filesystem, it wasn’t until .close() was called that we saw the ESRCH error returned by the operating system. Since this happened in the file destructor, the program printed “IOError: [Errno 3] No such process” in its output, but there was no backtrace, and the exception catchers in the surrounding code didn’t know anything went wrong.

Leave a Reply

To post code with preserved formatting, enclose it in `backticks` (even multiple lines)