Python insights

This page collects some Python (2.5) insights for my own use, but perhaps other people will find them useful too.

xrange vs. range

Always use xrange for iteration, i.e.:

for i in xrange(10):
  ...

xrange is more efficient because it returns a lazy iterable object that produces its values on demand, whereas range builds the whole list in memory.

>>> k = range(10)
>>> k
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> m = xrange(10)
>>> m
xrange(10)

In Py3K, xrange will be renamed to range, and the old list-building behavior of range will be available as list(range(n)).
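For comparison, here is a small sketch of how this looks on a Python 3 interpreter. It is Python 3 only (the variable names are mine), and it just illustrates that range is lazy there:

```python
# Python 3 only: range is lazy here, like xrange in Python 2.
r = range(10)
print(r)              # range(0, 10)

as_list = list(r)     # the old list behavior, on request
print(as_list)        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# A huge range is cheap to create, because no list is ever built.
big = range(10 ** 9)
print(len(big))       # 1000000000
print(big[-1])        # 999999999
```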

Initializing a 2D list

While this can be done safely to initialize a list:

lst = [0] * 3

The same trick won’t work for a 2D list (list of lists):

>>> lst_2d = [[0] * 3] * 3
>>> lst_2d
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> lst_2d[0][0] = 5
>>> lst_2d
[[5, 0, 0], [5, 0, 0], [5, 0, 0]]

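The * operator doesn’t copy its operand: it repeats references to it. So [[0] * 3] * 3 builds an outer list whose three elements all refer to the same inner list, and a change made through one row shows up in all of them. (With [0] * 3 this is harmless, because integers are immutable.) The correct way to do this is: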

>>> lst_2d = [[0] * 3 for i in xrange(3)]
>>> lst_2d
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> lst_2d[0][0] = 5
>>> lst_2d
[[5, 0, 0], [0, 0, 0], [0, 0, 0]]
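The difference between the two constructions can be checked directly by comparing object identities. This is only a sketch: the names shared and separate are mine, and range is used instead of xrange so that it also runs on Python 3.

```python
shared = [[0] * 3] * 3
separate = [[0] * 3 for _ in range(3)]

# 'is' compares object identity, not contents.
print(shared[0] is shared[1])       # True
print(separate[0] is separate[1])   # False

# Count the distinct row objects in each outer list.
shared_rows = len(set(id(row) for row in shared))
separate_rows = len(set(id(row) for row in separate))
print(shared_rows)     # 1
print(separate_rows)   # 3
```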

Read-only attributes

In Python, object attributes can be read and written by outside code. Sometimes, you may want some of the attributes to be read-only. Although this can be achieved with __setattr__, it will also intercept assignments from inside the object (by its own methods attempting to modify self.attr).
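Here is a minimal sketch of that __setattr__ problem. The Guarded class is hypothetical, made up just for this illustration: it blocks every assignment to voltage, including the one in its own __init__.

```python
class Guarded(object):
    """Hypothetical class that tries to make 'voltage' read-only."""
    def __init__(self):
        # This assignment is routed through __setattr__ as well.
        self.voltage = 100000

    def __setattr__(self, name, value):
        if name == 'voltage':
            raise AttributeError("voltage is read-only")
        object.__setattr__(self, name, value)


try:
    Guarded()
    constructed = True
except AttributeError:
    constructed = False

print(constructed)   # False
```

The object can’t even be constructed. Its own methods would have to bypass the hook (for example by calling object.__setattr__ directly), which gets clumsy quickly.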

A better way is to use properties. These can be added to new-style classes (classes that derive from object) by calling the built-in property function. The most convenient way to use this function is via a decorator:

class Parrot(object):
    def __init__(self):
        self._voltage = 100000
 
    @property
    def voltage(self):
        """Get the current voltage."""
        return self._voltage

This class now has a read-only attribute named voltage:

>>> blacky = Parrot()
>>> blacky.voltage
100000
>>> blacky.voltage = 5000
Traceback (most recent call last):
  File "<input>", line 1, in <module>
AttributeError: can't set attribute
>>>
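Unlike the __setattr__ approach, a property leaves the object’s own methods free to change the underlying value. The sketch below extends Parrot with a discharge method, which is a hypothetical addition of mine and not part of the class above:

```python
class Parrot(object):
    def __init__(self):
        self._voltage = 100000

    @property
    def voltage(self):
        """Get the current voltage."""
        return self._voltage

    def discharge(self):
        # Hypothetical method: internal code writes to the
        # underlying attribute directly, so nothing is intercepted.
        self._voltage = 0


blacky = Parrot()
blacky.discharge()
print(blacky.voltage)   # 0

try:
    blacky.voltage = 5000
    writable = True
except AttributeError:
    writable = False

print(writable)   # False
```

Keep in mind that _voltage itself is protected only by convention: outside code can still assign to it if it really wants to.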

String reverse

Python doesn’t have a built-in reverse method for strings. Luckily, this can be easily done with slices:

def reverse(s):
    return s[::-1]
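An alternative, if you find the slice too cryptic, is the built-in reversed function. The sketch below uses a helper name of my own choosing (reverse_joined):

```python
def reverse_joined(s):
    # reversed() returns an iterator over the characters,
    # so they have to be joined back into a string.
    return ''.join(reversed(s))


print(reverse_joined('parrot'))                      # torrap
print(reverse_joined('parrot') == 'parrot'[::-1])    # True
```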

Dynamic code evaluation

Python has two constructs for dynamic code evaluation: eval, which evaluates a single expression and returns its value, and exec, which is more general and executes arbitrary statements. The following example demonstrates the use of exec:

def create_function(code, name='foo'):
    """ Create and return the function defined in 'code'.
        'name' specifies the name of the function, as 
        given in the 'def' in the code.
    """
    d = {}
    exec code.strip() in globals(), d
    return d[name]
 
 
def make_packet_extract(a, b):
    code = """
        def foo(packet):
            return ord(packet[%d]) + 256 * ord(packet[%d])
        """ % (a, b)
 
    return create_function(code, 'foo')
 
 
foo = make_packet_extract(3, 4)
print foo('abcdefg')

A couple of things to note here:

  1. exec is given a dict of global and local variables. 99% of the time it’s a good idea to provide it with globals() for the global variables, and a local dict for the locals (unless the function you’re defining modifies the global environment, but this isn’t recommended).
  2. The code string passed to exec is stripped of leading and trailing whitespace. This is because the function definition is indented inside the triple-quoted string, and Python rejects indentation at the start of a block of code that isn’t nested in any scope.
  3. create_function may not work as expected if you place it in a separate file from make_packet_extract and the generated code refers to module-level names: globals() returns the dictionary of the module where create_function is defined, not the module where it is called.
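One possible way around the last two points is sketched below. It is a variation of mine, not a drop-in replacement: it uses the function-call form of exec (the Python 3 spelling), an extra namespace parameter that I added so the caller can pass in its own globals(), and textwrap.dedent instead of strip so that multi-line definitions are de-indented as a whole.

```python
import textwrap


def create_function(code, name='foo', namespace=None):
    """Create and return the function called 'name' defined in 'code'.
    'namespace' (an assumption of this sketch) is the dict that the
    generated function will see as its globals.
    """
    if namespace is None:
        namespace = {}
    local_vars = {}
    # dedent removes the indentation shared by all lines.
    exec(textwrap.dedent(code), namespace, local_vars)
    return local_vars[name]


def make_packet_extract(a, b):
    code = """
        def foo(packet):
            return ord(packet[%d]) + 256 * ord(packet[%d])
        """ % (a, b)
    # Pass the caller's globals explicitly.
    return create_function(code, 'foo', globals())


foo = make_packet_extract(3, 4)
result = foo('abcdefg')
print(result)   # 25956
```

Since the caller hands over its own globals(), create_function can live in any module.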

Turning a callable into an iterator

Suppose you have the following extremely useful class somewhere. It has already been defined and used, and you can’t change it:

from random import randint


class RandomChunker(object):
    """ Returns random chunks from the string 
        provided at creation time.
    """
    def __init__(self, str, a=1, b=4):
        self.str = str
        self.a = a
        self.b = b
        self.pos = 0
 
    def chunk(self):
        """ Return the next random chunk from the
            input string. When the string's end
            has been reached, None is returned.
        """
        if self.pos >= len(self.str): return None
 
        chunk_size = randint(self.a, self.b)
        if chunk_size > len(self.str) - self.pos:
            chunk_size = len(self.str) - self.pos
 
        ret = self.str[self.pos:self.pos+chunk_size]
        self.pos += chunk_size
        return ret

It implements a quite common idiom: return useful values while they exist, and None (or EOF, or any other end value) when there’s nothing more to return.

How do you conveniently iterate over all the values of such a class or function? Here’s one way:

rc = RandomChunker("abracadabra12345")
 
while 1:
    chunk = rc.chunk()
    if chunk is None: break
    print chunk

This isn’t very convenient. There’s a better way: the two-argument form of the built-in iter function, which keeps calling rc.chunk until it returns the sentinel value None:

rc = RandomChunker("abracadabra12345")
 
for chunk in iter(rc.chunk, None):
    print chunk

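Much prettier, isn’t it? With this method, you can also collect all the chunks at once. A chunker that has already been iterated over is exhausted, so start with a fresh one:

rc = RandomChunker("abracadabra12345")
all_chunks = list(iter(rc.chunk, None))
print all_chunks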
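The same trick works for any callable that signals the end with a special value. Here is a sketch of mine (in Python 3 syntax) that reads a stream in fixed-size blocks. functools.partial supplies the argument to read, and the empty string that read returns at the end of the stream serves as the sentinel:

```python
from functools import partial
from io import StringIO

stream = StringIO("abracadabra12345")

# Call stream.read(4) repeatedly until it returns ''.
blocks = list(iter(partial(stream.read, 4), ""))
print(blocks)   # ['abra', 'cada', 'bra1', '2345']
```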

Default values for a dictionary

Suppose you have a list of words, and you want to create a dict with a word count: each word mapped to the number of times it appears in the list. Here’s a solution with defaultdict:

from collections import defaultdict
 
 
def elemcount(elems):
    count = defaultdict(int)
    for e in elems:
        count[e] += 1
    return count
 
 
count = elemcount(['ax', 'ex', 'bx', 'ex', 'ex', 'bx'])
 
for ec in count:
    print ec, count[ec]

The first time a missing key is accessed, defaultdict calls the factory function it was given to create the value (here 0) and stores it under that key. This solves the problem gracefully. Without it, we’d have to check for the existence of e in the dict and explicitly initialize it.
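For comparison, here is a sketch of the version with a plain dict (elemcount_plain is a name I made up), followed by a small check that a defaultdict inserts a key even when it is only read:

```python
from collections import defaultdict


def elemcount_plain(elems):
    count = {}
    for e in elems:
        # Explicit initialization on first sight of e.
        if e not in count:
            count[e] = 0
        count[e] += 1
    return count


count = elemcount_plain(['ax', 'ex', 'bx', 'ex', 'ex', 'bx'])
print(sorted(count.items()))   # [('ax', 1), ('bx', 2), ('ex', 3)]

# With defaultdict, merely reading a missing key creates it.
probe = defaultdict(int)
value = probe['missing']
print(value)                   # 0
print('missing' in probe)      # True
```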

The immutability of Python strings

Has this ever happened to you?

>>> name = 'big foot'
>>> name[2]
'g'
>>> name[2] = 'G'
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: 'str' object does not support item assignment
>>> Yikes !!

Strings in Python are immutable, just like numbers and tuples. This means that you can create them and pass them around, but not change them. Why is this so? For a few reasons (you can find a better discussion online):

  • By design, strings in Python are considered elemental and unchangeable. This spurs better, safer programming styles.
  • The immutability of strings has efficiency benefits, chiefly in the area of lower storage requirements.
  • It also makes strings safe to use as dictionary keys, since a key cannot change after it has been inserted.

If you look around the Python web a little, you’ll notice that the most frequent answer to “how do I change my string” is “design your code so that you won’t have to change it”. Fair enough, but what other options are there? Here are a few:

  • name = name[:2] + 'G' + name[3:] - this is an inefficient way to do the job. Python’s slice semantics ensure that this works correctly in all cases (as long as your index is in range), but since it involves several string copies and concatenations, it’s hardly your best shot at efficient code. If you don’t care about that (and chances are you don’t), it’s a solid solution.
  • Use the MutableString class from module UserString. While no more efficient than the previous method (it performs the same trick under the hood), it is more consistent syntactically with normal string usage. Note that MutableString is deprecated as of Python 2.6 and has been removed in Python 3.
  • Use a list instead of a string to store mutable data. Convert back and forth using list and join. Depending on what you really need, ord and chr may also be useful.
  • Use an array object. This is perhaps your best option if you use the string to hold constrained data, such as ‘binary’ bytes.
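Three of these options can be sketched side by side as follows. This is only an illustration, written in Python 3 syntax: the variable names are mine, and the array variant assumes the text is plain ASCII, so that every character fits in one unsigned byte (typecode 'B').

```python
from array import array

name = 'big foot'

# Slicing: build a new string around the replacement.
sliced = name[:2] + 'G' + name[3:]

# List: edit a list of characters, then join it back.
chars = list(name)
chars[2] = 'G'
joined = ''.join(chars)

# Array: a mutable sequence of unsigned bytes (ASCII assumed).
buf = array('B', name.encode('ascii'))
buf[2] = ord('G')
from_array = buf.tobytes().decode('ascii')

print(sliced)       # biG foot
print(joined)       # biG foot
print(from_array)   # biG foot
print(name)         # big foot (the original is untouched)
```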
