It’s time for Python 2.6

August 7th, 2009 at 9:40 am

I’ve been waiting to upgrade Python to 2.6 for quite some time now. Python 2.6 is a good starting point for a future transition to Py3K, plus it has some nice new features over 2.5 that I was eager to use.

However, when you’re writing serious applications in Python (especially for work) that use 3rd party libraries, upgrading isn’t simple, because you have to wait until every library you depend on supports the new version.

Well, this week the last library I’ve been waiting for has finally announced Python 2.6 support – PyQwt [1]. So I’ve cleaned up my Python 2.5 installation [2], installed 2.6 [3] with all the modules I routinely use, and now I’m running Python 2.6!

Python 2.6 has quite a few new features. Here are some that I find most interesting:

The documentation

The documentation was revamped, using the Sphinx tool to generate nice HTML from reStructuredText. The difference is noticeable when browsing: the formatting is definitely friendlier, and the documentation of some modules seems to have been improved with more examples.

multiprocessing

A lot has been written about Python threads and their inability to really use multi-core CPUs because of the GIL. Well, the multiprocessing module works around this by providing an API for spawning child processes that closely mirrors the threading API. It also supports Queue and Pipe for convenient, synchronized communication between processes.

It looks like multiprocessing is the answer people needed to make Python programs faster by utilizing multiple CPUs and cores. Because it mirrors the threading API, it’s easy to use in safe and powerful ways, which is really great.

‘with’ statement in the present, not the future

I really like the context managers and with statement features introduced in Python 2.5. The only problem is that you had to use a from __future__ import in every file using them. Well, in 2.6 you no longer have to.
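As a minimal sketch of what this looks like, here is a context manager used with a plain with statement and no __future__ import. The tagged helper and the events list are made-up names for this illustration, not part of the standard library:

```python
from contextlib import contextmanager

@contextmanager
def tagged(log, name):
    """Record entry and exit of a block in the given list (hypothetical helper)."""
    log.append("enter " + name)
    try:
        yield log
    finally:
        # Runs even if the body of the with block raises.
        log.append("exit " + name)

events = []
with tagged(events, "block"):
    events.append("body")

# events is now ['enter block', 'body', 'exit block']
```

The same file would need "from __future__ import with_statement" at the top to run on 2.5.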

ABCs

I know, I know, Python has duck typing and enforcing interfaces is needlessly restrictive. However, I still find Abstract Base Classes useful from time to time, if only to document an interface user implementations should adhere to. I’ll quote Doug Hellmann:

This capability is especially useful in situations where a third-party is going to provide implementations, such as with plugins to an application, but can also aid you when working on a large team or with a large code-base where keeping all classes in your head at the same time is difficult or not possible.

This is from PyMOTW: ABC which seems like a nice tutorial.
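Here is a rough sketch of the plugin scenario from the quote, using the abc module. Plugin and UpperCaser are hypothetical names chosen for the example. In 2.6 you would normally write __metaclass__ = ABCMeta in the class body; the metaclass is called directly below only so that the same snippet also runs unchanged on Python 3:

```python
from abc import ABCMeta, abstractmethod

# Equivalent to a class with __metaclass__ = ABCMeta on 2.6,
# written this way so it also works on Python 3.
AbstractBase = ABCMeta("AbstractBase", (object,), {})

class Plugin(AbstractBase):
    """Interface that third-party plugins are expected to implement."""

    @abstractmethod
    def process(self, text):
        """Return a transformed copy of text."""

class UpperCaser(Plugin):
    def process(self, text):
        return text.upper()

result = UpperCaser().process("abc")   # a complete implementation works

try:
    Plugin()                           # the abstract class cannot be instantiated
except TypeError as exc:
    error = str(exc)
```

The point is mostly documentation: a plugin that forgets to implement process fails at instantiation time rather than somewhere deep inside the application.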

bin

This is a small feature, but a nice one nevertheless. The new built-in bin function provides a simple and fast way to represent numbers as binary strings:

>>> bin(42)
'0b101010'

Due to the kind of work I mostly do with Python (embedded system communication, binary parsing, etc.) I always had to implement the functionality of bin on my own. Now I don’t have to, and the built-in is faster, which is great.
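For binary parsing a fixed width is often more convenient than the raw '0b...' form. A small sketch, assuming non-negative values only (to_bits is a made-up helper, not a built-in):

```python
def to_bits(value, width=8):
    """Return a non-negative integer as a zero-padded binary string."""
    if value < 0:
        raise ValueError("to_bits expects a non-negative integer")
    # bin() returns e.g. '0b101010'; drop the '0b' prefix and pad on the left.
    return bin(value)[2:].zfill(width)

status = 0x2A
bits = to_bits(status)              # '00101010'
round_trip = int(bin(status), 2)    # int() accepts the '0b' prefix when base is 2
```

Values wider than width are not truncated; zfill only pads.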

fractions

>>> from fractions import Fraction
>>> Fraction(16, -10)
Fraction(-8, 5)
>>> Fraction(123)
Fraction(123, 1)
>>> Fraction()
Fraction(0, 1)
>>> Fraction('3/7')
Fraction(3, 7)
>>> Fraction(' -3/7 ')
Fraction(-3, 7)
>>> Fraction('1.414213 \t\n')
Fraction(1414213, 1000000)
>>> Fraction('-.125')
Fraction(-1, 8)

So far I’ve only needed fractions (rational numbers) for solving Project Euler problems. I downloaded and used SymPy especially for its Rational class. Now there’s a built-in.
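Beyond construction, Fraction objects support exact arithmetic, which is the part that matters for Project Euler style problems. A short sketch (the variable names are just for illustration):

```python
from fractions import Fraction

# Arithmetic stays exact; results are reduced to lowest terms automatically.
total = Fraction(1, 3) + Fraction(1, 6)    # Fraction(1, 2)
scaled = Fraction(3, 4) * 2                # Fraction(3, 2)

# limit_denominator finds the closest fraction with a bounded denominator.
approx_pi = Fraction('3.1415926535897932').limit_denominator(1000)  # Fraction(355, 113)
```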

namedtuple

namedtuple in the collections module is a useful idiom I borrowed into my 2.5 code a long time ago. It’s great to have it built-in at last.

>>> from collections import namedtuple
>>> MessageType = namedtuple('MessageType', 'id src dest data')
>>> new_msg = MessageType(id=12, src=0x1123, dest=0x1255, data='sdasdfsdf')
>>> new_msg
MessageType(id=12, src=4387, dest=4693, data='sdasdfsdf')
>>> new_msg.id
12
>>> for f in new_msg:
...   print f
...
12
4387
4693
sdasdfsdf
>>>

namedtuple is immediately useful, but it’s still a bit unpolished, IMHO. Hopefully now that it’s in the standard library, it will be worked on and improved even more.
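One place where namedtuple fits naturally is binary parsing. The sketch below unpacks a made-up 6-byte header with struct and wraps the result; the Header layout and field names are assumptions for illustration only:

```python
import struct
from collections import namedtuple

# Hypothetical header: 1-byte id, two 16-bit addresses, 1-byte payload length.
Header = namedtuple("Header", "id src dest length")
HEADER_FORMAT = "<BHHB"

raw = struct.pack(HEADER_FORMAT, 12, 0x1123, 0x1255, 9)

# _make builds a named tuple from any iterable, such as the tuple from unpack.
header = Header._make(struct.unpack(HEADER_FORMAT, raw))

# Named tuples are immutable; _replace returns a modified copy.
rerouted = header._replace(dest=0x2000)
```

After this, header.src reads much better than fields[1], while header still behaves like the plain tuple that struct.unpack returned.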

new itertools

Several new functions have been added to the itertools module, among them product, permutations and combinations. Each returns an iterator, and they are very handy when dealing with combinatorial problems.

For example:

>>> import itertools
>>> list(itertools.combinations('XYZ', 2))
[('X', 'Y'), ('X', 'Z'), ('Y', 'Z')]
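The other two functions work along the same lines. A quick sketch with small inputs (the variable names are just for illustration):

```python
import itertools

# Cartesian product: every 2-bit pattern, in lexicographic order.
bit_patterns = list(itertools.product([0, 1], repeat=2))
# [(0, 0), (0, 1), (1, 0), (1, 1)]

# Ordered arrangements of 2 items out of 3 (order matters, no repeats).
arrangements = list(itertools.permutations('ABC', 2))
# [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]
```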

os.path.relpath

A new, useful function that was added to the os.path module:

>>> import os
>>> os.path.relpath(r'c:\data\utils\temp', r'c:\data')
'utils\\temp'
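The result above is what you get on Windows. To try the function on any platform, the sketch below calls the platform-specific modules ntpath and posixpath directly (os.path is one of the two, depending on the OS); the paths are made up:

```python
import ntpath      # Windows flavour of os.path
import posixpath   # Unix flavour of os.path

# Going down from the start directory.
win_down = ntpath.relpath(r'c:\data\utils\temp', r'c:\data')      # 'utils\\temp'
posix_down = posixpath.relpath('/data/utils/temp', '/data')       # 'utils/temp'

# Going up: the result is built from '..' components.
win_up = ntpath.relpath(r'c:\data', r'c:\data\utils\temp')        # '..\\..'
```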

Queue.PriorityQueue and Queue.LifoQueue

I just love the Queue module, and use it almost every time I have to write threaded (and soon, multiprocessing) code. It’s simply the best way to communicate between threads in Python.

2.6 has added two new types of synchronized queues: a priority queue and a LIFO queue (stack). I still don’t know what I’m going to use these for, but it’s great to have them in the toolbox.
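A minimal single-threaded sketch of how the two new classes order their items. The job tuples and the drain helper are made up for this example, and the import fallback is there only because the module was renamed to queue in Python 3:

```python
try:
    import queue             # module name on Python 3
except ImportError:
    import Queue as queue    # module name on Python 2.6

def drain(q):
    """Remove and return all items currently in the queue (hypothetical helper)."""
    items = []
    while not q.empty():
        items.append(q.get())
    return items

# PriorityQueue hands out the smallest item first; tuples compare by first field.
jobs = queue.PriorityQueue()
for job in [(2, 'write log'), (1, 'send ack'), (3, 'cleanup')]:
    jobs.put(job)
by_priority = drain(jobs)

# LifoQueue hands out the most recently added item first.
stack = queue.LifoQueue()
for frame in ['first', 'second', 'third']:
    stack.put(frame)
newest_first = drain(stack)
```

Checking empty() like this is only reliable when a single thread is using the queue, as in this sketch.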

json

This is more of an anti-feature, as I see it. I would really, really prefer YAML to be included as a built-in. Not only is YAML more useful than JSON [4], but implementing JSON once you have a YAML parser is trivial.

Yes, there’s PyYAML, it’s great and all. But I wish I had one less module to install.
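For reference, basic usage of the new json module is short. A minimal round-trip sketch (the message dict is made up for illustration):

```python
import json

message = {'id': 12, 'src': 0x1123, 'data': [1, 2, 3]}

# sort_keys makes the output deterministic, which helps when diffing or testing.
text = json.dumps(message, sort_keys=True)
# '{"data": [1, 2, 3], "id": 12, "src": 4387}'

restored = json.loads(text)
```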


[1] Well, actually PyQwt has supported 2.6 for some time now, but I was waiting for the Windows binary installer.
[2] Which isn’t very convenient when you have dozens of modules installed with Windows binary installers. I couldn’t find a better way than uninstalling them one by one.
[3] ActivePython 2.6.2
[4] YAML can do everything JSON can, and much more. For example, it’s quite convenient as a configuration file format and as a replacement for XML in general.

9 Responses to “It’s time for Python 2.6”

  1. Eric Olivier LEBIGOT (EOL) Says:

    Nice overview of why Python 2.6 is a great upgrade!

    Why did you have to uninstall Python 2.5 modules? Under Mac OS X, I have co-existing Python 2.5 and Python 2.6 installs, complete with 3rd-party modules. This is really convenient when you _have_ to use Python 2.5.

  2. Marcin Cieslik Says:

    The performance of YAML is really too painful to be a substitute for JSON. Try serializing/de-serializing some 100 MB (I did not succeed). Embarrassingly, I have a priority queue in one of my own projects:
        from heapq import heappush, heappop
        from Queue import Queue

        class PriorityQueue(Queue):
            """
            A priority queue using a heap on a list. This queue is thread-safe
            but not process-safe.
            """
            def _init(self, maxsize):
                self.maxsize = maxsize
                self.queue = []

            def _put(self, item):
                return heappush(self.queue, item)

            def _get(self):
                return heappop(self.queue)

    I guess I should read the docs more carefully :)

  3. René A. Says:

    Talking about convenience… also, besides multiple Python installations, really give ‘virtualenv’ a try.

    You can customize your desired Python environment on a per-project basis, ranging from the Python version to a minimal set of desired packages.

  4. eliben Says:

    @Marcin,

    Out of curiosity – why would you want to serialize a 100MB file as JSON/YAML? These are text formats, and as such aren’t suitable for such large scale serialization. Wouldn’t cPickle or shelve be better for this?

  5. Marcin Cieslik Says:

    Why JSON/YAML? Well, I don’t, and cPickle and marshal are certainly better (but not always faster!). But users of my software might want a text format. The software (PaPy) uses serialization to exchange data between processes in a parallel pipeline. I wanted to make it protocol agnostic, but YAML failed miserably (it scales worse than linearly!), while JSON does not. To avoid user problems I dropped YAML.

  6. Michael Foord Says:

    JSON is much more widely used than YAML. :-)

  7. eliben Says:

    @Michael,

    My point, I guess, is just that except for a few “language lawyer nitpicks”, YAML is a superset of JSON. So just including YAML would give JSON-lovers their JSON and YAML-lovers their extended features.

  8. José Hérnandez Says:

    Thanks for the excellent review of 2.6! One question did arise after reading about namedtuples: what’s the difference between that and dictionaries?

  9. eliben Says:

    @José,

    Think of a named tuple like a once-created-then-accessed Struct. It’s convenient to pass complex arguments or return values around. I guess dicts can also be used for that, but less naturally. And of course dicts are useful for other things as well.
