As I've mentioned previously, starting with Python 3.3 the C accelerator of the xml.etree.ElementTree module is going to be imported by default. This should make quite a bit of code faster for those who were not aware of the existence of the accelerator, and reduce the amount of boilerplate importing for everyone.
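For context, this is the kind of import boilerplate that pre-3.3 code commonly used to get the accelerator. It explicitly asks for the C module and falls back to the pure-Python one. One assumption worth noting: xml.etree.cElementTree was deprecated in 3.3 and removed in Python 3.9, so on modern interpreters the fallback branch is the one that runs.

```python
# Pre-3.3 idiom: request the C accelerator explicitly and fall back to the
# pure-Python implementation if it is unavailable.
# (xml.etree.cElementTree was deprecated in 3.3 and removed in 3.9.)
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

root = ET.fromstring("<root><item/></root>")
print(root.tag)  # root
```

From Python 3.3 on, a plain "import xml.etree.ElementTree as ET" is enough, because the module picks up the accelerator by itself when it is available.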
As Python 3.3 is nearing its first beta, more work was done in the past few weeks, mostly fixing all kinds of problems that arose from the aforementioned transition. But in this post I want to focus on one feature that was added this weekend: much faster iteration over the parsed XML tree.
ElementTree offers a few tools for iterating over the tree and finding interesting elements in it, but the basis for them all is the iter method, Element.iter(tag=None), whose documentation reads:
Creates a tree iterator with the current element as the root. The iterator iterates over this element and all elements below it, in document (depth first) order. If tag is not None or '*', only elements whose tag equals tag are returned from the iterator.
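A small illustration of the documented behavior, using a made-up document: without a tag (or with '*') the iterator yields the element itself followed by all its descendants in depth-first order, and with a tag it yields only the matching elements at any depth.

```python
import xml.etree.ElementTree as ET

doc = """
<library>
  <book id="1"><title>A</title></book>
  <shelf>
    <book id="2"><title>B</title></book>
  </shelf>
</library>
"""
root = ET.fromstring(doc)

# No tag: the root itself first, then every descendant in document order.
all_tags = [el.tag for el in root.iter()]
print(all_tags)  # ['library', 'book', 'title', 'shelf', 'book', 'title']

# '*' behaves the same as passing no tag at all.
star_tags = [el.tag for el in root.iter("*")]

# With a tag: only matching elements, no matter how deeply nested.
book_ids = [el.get("id") for el in root.iter("book")]
print(book_ids)  # ['1', '2']
```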
Until very recently, this iter was implemented in Python, even when the C accelerator was loaded. This was achieved by calling PyRun_String on a "bootstrap" string defining the method (along with a bunch of other Python code) when the C extension module was being initialized. Over the past few months I've been slowly but surely decimating this bootstrap code, moving as much functionality as possible into the C code and replacing the rest with actual C API calls. The last bastion was iter (and its cousin itertext), because implementing it in C is not trivial.
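To give a feel for what a Python-level implementation of these methods looks like, here is a minimal sketch of both as recursive generators. It follows the documented semantics, but it is an illustration under that assumption, not the actual bootstrap code; the names py_iter and py_itertext are hypothetical.

```python
import xml.etree.ElementTree as ET


def py_iter(elem, tag=None):
    """Pre-order depth-first walk with the documented Element.iter semantics."""
    if tag == "*":
        tag = None
    if tag is None or elem.tag == tag:
        yield elem
    for child in elem:
        # Descend into each child in document order ('yield from' is 3.3+).
        yield from py_iter(child, tag)


def py_itertext(elem):
    """Yield an element's text, then each child's text and tail, in order."""
    # Comments and processing instructions have non-string tags; skip them.
    if not isinstance(elem.tag, str) and elem.tag is not None:
        return
    if elem.text:
        yield elem.text
    for child in elem:
        yield from py_itertext(child)
        if child.tail:
            yield child.tail


root = ET.fromstring("<a>x<b>y<c/>z</b>w<c>v</c></a>")
print([e.tag for e in py_iter(root)])  # ['a', 'b', 'c', 'c']
print(list(py_itertext(root)))         # ['x', 'y', 'z', 'w', 'v']
```

Every element visited this way costs at least one generator frame resumption in the interpreter, which hints at why moving the traversal into C pays off.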
Well, that last bastion has now fallen and the C accelerator of ElementTree no longer has any Python bootstrap code - iter is actually implemented in C. And the great "side effect" of this is that the iter method (and all the other methods that rely on it, like find, iterfind and others) is now much faster. On a relatively large XML document I timed a 10x speed boost for simple iteration looking for a specific tag. I hope that this will make a lot of XML processing code in Python much faster out-of-the-box.
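If you want to get a rough feel for the difference on your own machine, a quick timeit harness like the one below compares Element.iter with a pure-Python generator baseline on a synthetic tree. This is an assumption-laden sketch: it does not reproduce the measurement described above (the document, baseline and sizes here are made up), and the numbers will vary with hardware and Python version.

```python
import timeit
import xml.etree.ElementTree as ET


def py_iter(elem, tag):
    # Hypothetical pure-Python baseline: recursive depth-first generator.
    if elem.tag == tag:
        yield elem
    for child in elem:
        yield from py_iter(child, tag)


# Synthetic tree: 2000 <group> elements, each holding 10 <item> children.
root = ET.Element("root")
for _ in range(2000):
    group = ET.SubElement(root, "group")
    for i in range(10):
        ET.SubElement(group, "item", id=str(i))


def count_builtin():
    return sum(1 for _ in root.iter("item"))


def count_python():
    return sum(1 for _ in py_iter(root, "item"))


# Both approaches must find the same elements before timing them.
assert count_builtin() == count_python() == 20000

t_builtin = timeit.timeit(count_builtin, number=10)
t_python = timeit.timeit(count_python, number=10)
print(f"Element.iter: {t_builtin:.3f}s, pure-Python generator: {t_python:.3f}s")
```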
This change is already in the Python trunk and will be part of the 3.3 release. I must admit that I didn't spend much time optimizing the C code implementing iter, so there may still be room for improvement. I have a hunch that it can be made a few tens of percent faster with a bit of effort. If you're interested in helping, drop me a line and I will be happy to discuss it.