Python development – improving ElementTree for 3.3

March 2nd, 2012 at 2:41 pm

This blog has been unusually quiet lately. Real-life factors such as traveling for work and my daughter's sleeping patterns aside, the main reason is that I have been spending more time working on Python over the past month.

In particular, I’d like to focus on changes in the xml.etree.ElementTree package for Python 3.3.

xml.etree.ElementTree is arguably the most popular standard library package for processing XML. It has a friendly, Pythonic API and a C accelerator with very good performance.

One annoying aspect of using the package is, however, the need to explicitly ask for the C accelerator, and fall back to the (much slower) pure Python implementation if that’s not available. In other words, this incantation is very common for code that uses ElementTree for XML processing:

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

What’s interesting is that starting with Python 3, the official Python policy is to transparently hide the C accelerators inside the module:

A common pattern in Python 2.x is to have one version of a module implemented in pure Python, with an optional accelerated version implemented as a C extension; for example, pickle and cPickle. This places the burden of importing the accelerated version and falling back on the pure Python version on each user of these modules. In Python 3.0, the accelerated versions are considered implementation details of the pure Python versions. Users should always import the standard version, which attempts to import the accelerated version and falls back to the pure Python version.

This was quite a large task, however, so in practice it was spread over several releases in the 3.x line. In particular, in Python 3.2 cElementTree still has to be imported explicitly to access the C accelerator.

Well, no more. Starting with Python 3.3, all you’ll have to do is:

import xml.etree.ElementTree as ET

This will import the accelerated C module if it exists, and the pure Python module otherwise. The cElementTree module is not going to be needed any longer, although it will stay in the standard library as a thin alias, for backwards compatibility.
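As a minimal sketch of what this looks like in practice, the snippet below uses only the plain import on Python 3.3 or later. The sample document and the `book_titles` helper are made up for illustration; the calling code does not need to know or care which implementation ends up doing the parsing.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
SAMPLE = "<library><book id='1'>Dive In</book><book id='2'>Fluent</book></library>"

def book_titles(xml_text):
    """Return the text of every <book> element in document order."""
    root = ET.fromstring(xml_text)
    return [book.text for book in root.iter("book")]

titles = book_titles(SAMPLE)
print(titles)
```

Code that must still run on Python 3.2 or earlier can keep the try/except idiom shown above; the single import is enough everywhere else.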

This wouldn’t be very interesting if ElementTree were an ordinary package. In fact, it was one of the very few externally maintained packages in the standard library. Historically, the package was donated to CPython by its maintainer Fredrik Lundh, who kept the copyright. This made the package somewhat challenging for the Python core developers to maintain, since any change had to be coordinated with Fredrik and his upstream standalone distribution.

Although the standard library’s ElementTree had, de facto, already diverged a bit from Fredrik’s implementation (especially due to the great efforts of Florent Xicluna), the change discussed here affects the package’s interface rather than its implementation, so it raised a lively discussion on the Python core development mailing list. Luckily, Fredrik readily agreed to cede further maintenance of ElementTree to the Python developers, so the copyright/maintenance obstacle disappeared.

Some work remains to further improve ElementTree, and there are a few relevant issues open in the Python bug tracker:

  • Issue #14006: The ElementTree documentation could use some love.
  • Issues #14007 and #14128: some mismatches between documentation and implementation.
  • A few other open issues can be found by searching the tracker for ElementTree.

I’m currently focusing on issue #14128. Specifically, while the Element class can be subclassed in the Python implementation, it can’t be in the C implementation (because there Element is just a factory function for creating new objects). I already have a patch for this attached to the issue; once it is in, I plan to work out the other discrepancies.
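To make the discrepancy concrete, here is a small sketch of the kind of subclassing in question. The `CountingElement` class and its `child_count` method are hypothetical names used only for illustration. The class statement works with the pure Python implementation; with a C accelerator that exposes Element as a factory function rather than a class, it would be expected to fail. The sketch assumes an interpreter where Element is a real class in whichever implementation gets imported.

```python
import xml.etree.ElementTree as ET

class CountingElement(ET.Element):
    """Hypothetical Element subclass that adds one convenience method."""

    def child_count(self):
        # An Element behaves like a sequence of its direct children.
        return len(self)

root = CountingElement("root")
root.append(ET.Element("a"))
root.append(ET.Element("b"))

n = root.child_count()
print(n)
```

The subclass instance is still an ordinary Element as far as the rest of the API is concerned, so it can be passed to functions such as `ET.tostring` unchanged.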

Python development is a cooperative effort, and I’m grateful to many other devs for their help with issues related to ElementTree. More help is needed, though! So if you’re thinking of starting to contribute to Python, the ElementTree package is a good place to begin: there is a lot of remaining work, and it is currently an active focus for a few core devs, so getting meaningful contributions committed should be relatively easy.

4 Responses to “Python development – improving ElementTree for 3.3”

  1. Philip Jenvey Says:

    Hurray! Thanks Eli, and Florent

  2. Nick Coghlan Says:

    One minor nit – the PSF never requires copyright assignments, even in the normal case for contributions, where maintenance moves to python-dev and the python.org infrastructure. Instead, the Contributor Licensing Agreements just grant a non-exclusive license that includes relicensing permissions. It’s all the PSF needs to distribute Python (and adjust the licensing if that ever proves necessary in the future), while still allowing contributors to retain their copyrights.

    So there was never a copyright (or any other legal) obstacle, only one of courtesy and honouring the deal that had been made with Fredrik at the time the module was incorporated.

    One other nice recent addition to the docs is the redirect from xml.dom to the etree docs.

  3. eliben Says:

    Thanks for the clarification, Nick.

    As for the docs, it’s from xml.dom.minidom and not exactly a redirect, more of a note recommending etree when the user doesn’t specifically require DOM. Hopefully this will route more users to etree by default.

  4. entrpreneur Says:

    I definitely need to forward this to my friend we were just talking about this the other day!
