Python insights
This page collects some Python (2.5) insights for my own use, but can perhaps be useful for other people too.
xrange vs. range
Always use xrange for iteration, i.e.:
for i in xrange(10): ...
xrange is more efficient because it returns a small object that produces its values on demand, instead of building the whole list in memory like range does.
    >>> k = range(10)
    >>> k
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> m = xrange(10)
    >>> m
    xrange(10)
In Py3K, xrange will be renamed to range, and the functionality of the old range will be achieved by list(range(n)).
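To make the difference concrete, here is a minimal sketch that compares the size of the lazy object with the size of the full list. The try/except shim is my own addition so that the snippet also runs on Python 3, where xrange no longer exists; exact sizes depend on the interpreter, so none are shown.

```python
import sys

# Compatibility shim (an assumption of this sketch): Python 3 has no
# xrange, because its range is already lazy.
try:
    xrange
except NameError:
    xrange = range

n = 100000
lazy = xrange(n)     # small object, values are produced on demand
eager = list(lazy)   # the full list, like Python 2's range(n)

print(sys.getsizeof(lazy))    # small, does not grow with n
print(sys.getsizeof(eager))   # grows with n
print(sum(lazy) == sum(eager))
```

Both objects yield the same values when iterated; only the storage differs.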
Initializing a 2D list
While this can be done safely to initialize a list:
lst = [0] * 3
The same trick won’t work for a 2D list (list of lists):
    >>> lst_2d = [[0] * 3] * 3
    >>> lst_2d
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
    >>> lst_2d[0][0] = 5
    >>> lst_2d
    [[5, 0, 0], [5, 0, 0], [5, 0, 0]]
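The * operator does not copy its operand: it repeats references to it. The outer list therefore holds three references to the same inner list, so a change made through one row shows up in all of them. (With [0] * 3 this is harmless, because integers are immutable.) The correct way to do this is: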
    >>> lst_2d = [[0] * 3 for i in xrange(3)]
    >>> lst_2d
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
    >>> lst_2d[0][0] = 5
    >>> lst_2d
    [[5, 0, 0], [0, 0, 0], [0, 0, 0]]
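A quick way to see the aliasing is to compare the identities of the rows. This is only an illustrative sketch; the names shared and independent are mine, and range is used instead of xrange so that it runs on both Python 2 and 3.

```python
shared = [[0] * 3] * 3                      # three references to ONE inner list
independent = [[0] * 3 for _ in range(3)]   # three separate inner lists

print(shared[0] is shared[1])               # True
print(independent[0] is independent[1])     # False

# Count the distinct row objects in each structure.
print(len(set(id(row) for row in shared)))        # 1
print(len(set(id(row) for row in independent)))   # 3
```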
Read-only attributes
In Python, object attributes are R/W accessible to outside code. Sometimes, you may want some of the attributes to be read-only. Although this can be achieved with __setattr__, that hook will also intercept assignments from inside the object (by its own methods attempting to modify self.attr).
A better way is to use properties. These can be added to new-style classes (classes that derive from object) by calling the built-in property function. The most convenient way to use this function is via a decorator:
    class Parrot(object):
        def __init__(self):
            self._voltage = 100000

        @property
        def voltage(self):
            """Get the current voltage."""
            return self._voltage
This class now has a read-only attribute named voltage:
    >>> blacky = Parrot()
    >>> blacky.voltage
    100000
    >>> blacky.voltage = 5000
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
    AttributeError: can't set attribute
    >>>
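The advantage over __setattr__ is that the object's own methods can still update the underlying value. The sketch below extends Parrot with a discharge method, which is a hypothetical addition of mine purely for illustration.

```python
class Parrot(object):
    def __init__(self):
        self._voltage = 100000

    @property
    def voltage(self):
        """Get the current voltage."""
        return self._voltage

    def discharge(self, amount):
        # Hypothetical method: internal code writes to the private
        # attribute directly, so the property does not get in its way.
        self._voltage -= amount


blacky = Parrot()
blacky.discharge(1000)
print(blacky.voltage)   # 99000

try:
    blacky.voltage = 5000
except AttributeError:
    print("voltage is read-only from the outside")
```

Note that _voltage itself is still writable from outside; the leading underscore is only a convention that marks it as private.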
String reverse
Python doesn’t have a built-in reverse method for strings. Luckily, this can be easily done with slices:
    def reverse(s):
        return s[::-1]
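The slice s[::-1] omits the start and end and uses a step of -1, so it walks the whole string backwards. As a small sketch, here it is next to an equivalent spelling that uses the built-in reversed with join; the sample word is my own.

```python
def reverse(s):
    # Omitted start/end with step -1: the whole string, back to front.
    return s[::-1]


word = "parrot"
print(reverse(word))              # torrap
print("".join(reversed(word)))    # torrap, same result via reversed()
print(reverse(reverse(word)) == word)
```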
Dynamic code evaluation
Python has two constructs for dynamic code evaluation: eval, which works for single expressions, and exec which is more general. The following example demonstrates the use of exec:
    def create_function(code, name='foo'):
        """ Create and return the function defined in 'code'.
            'name' specifies the name of the function, as
            given in the 'def' in the code.
        """
        d = {}
        exec code.strip() in globals(), d
        return d[name]

    def make_packet_extract(a, b):
        code = """
            def foo(packet):
                return ord(packet[%d]) + 256 * ord(packet[%d])
        """ % (a, b)
        return create_function(code, 'foo')

    foo = make_packet_extract(3, 4)
    print foo('abcdefg')
A couple of things to note here:
- exec is given dicts of global and local variables. 99% of the time it’s a good idea to provide it with globals() for the global variables, and a local dict for the locals (unless the function you’re defining modifies the global environment, but this isn’t recommended).
- The code string passed to exec is stripped of leading and trailing whitespace. This is because the function definition is indented inside the string, and Python rejects a statement that is indented without an enclosing block.
- Using create_function will not work as intended if you place it in a separate file from make_packet_extract, because globals() returns the dictionary of the module where it is defined, not the module where it is called, so the generated function won’t see the caller’s global names.
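One possible way around the last point is to let the caller pass in the globals dictionary explicitly. The sketch below is a variation of mine, not the original code: the env parameter is hypothetical, textwrap.dedent stands in for strip, and exec is written in its function form so that the snippet runs on Python 3.

```python
import textwrap


def create_function(code, name, env=None):
    """Compile 'code' and return the function called 'name' from it.

    'env' (a hypothetical parameter, not in the original) is the dict
    the new function will use as its globals.
    """
    if env is None:
        env = {}
    local_vars = {}
    # dedent() removes the common leading indentation, so the 'def'
    # starts at column 0 no matter how the string was written.
    exec(textwrap.dedent(code), env, local_vars)
    return local_vars[name]


def make_packet_extract(a, b):
    code = """
        def foo(packet):
            return ord(packet[%d]) + 256 * ord(packet[%d])
    """ % (a, b)
    # Pass this module's globals explicitly instead of relying on
    # where create_function happens to be defined.
    return create_function(code, 'foo', globals())


foo = make_packet_extract(3, 4)
print(foo('abcdefg'))   # ord('d') + 256 * ord('e') = 25956
```

With this arrangement create_function can live in any module, since it no longer calls globals() itself.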
Turning a callable into an iterator
Suppose you have the following extremely useful class somewhere. It has already been defined and used, and you can’t change it:
    from random import randint

    class RandomChunker(object):
        """ Returns random chunks from the string provided at
            creation time.
        """
        def __init__(self, str, a=1, b=4):
            self.str = str
            self.a = a
            self.b = b
            self.pos = 0

        def chunk(self):
            """ Return the next random chunk from the input
                string. When the string's end has been reached,
                None is returned.
            """
            if self.pos >= len(self.str):
                return None

            chunk_size = randint(self.a, self.b)
            if chunk_size > len(self.str) - self.pos:
                chunk_size = len(self.str) - self.pos

            ret = self.str[self.pos:self.pos+chunk_size]
            self.pos += chunk_size
            return ret
It implements a quite common idiom: return useful values while they exist, and None (or EOF, or any other end value) when there’s nothing more to return.
How do you comfortably iterate over all the values of such a class/function ? Here’s one way:
    rc = RandomChunker("abracadabra12345")

    while True:
        chunk = rc.chunk()
        if chunk is None:
            break
        print chunk
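This isn’t very comfortable… There’s a better way - using the iter function. When called with two arguments, iter(callable, sentinel) calls the callable repeatedly and stops as soon as it returns the sentinel value: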
    rc = RandomChunker("abracadabra12345")

    for chunk in iter(rc.chunk, None):
        print chunk
Much prettier, isn’t it ? With this method, you can also return all chunks at once:
    rc = RandomChunker("abracadabra12345")  # a fresh chunker; the loop above exhausted the old one
    all_chunks = list(iter(rc.chunk, None))
    print all_chunks
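The same trick works for any callable that signals the end with a fixed value. As a sketch, here it is applied to reading fixed-size blocks from a file-like object, where read returns an empty string at the end of input. The use of io.StringIO and functools.partial is my own choice for a self-contained example, and it assumes a Python version that ships the io module.

```python
import io
from functools import partial

stream = io.StringIO(u"abracadabra12345")

# stream.read(4) returns an empty string at end of input, so the empty
# string is the sentinel here.
blocks = list(iter(partial(stream.read, 4), u""))
print(blocks)   # ['abra', 'cada', 'bra1', '2345']
```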
Default values for a dictionary
Suppose you have a list of words, and you want to create a dict with a wordcount - each word mapped to the amount of times it appears in the list. Here’s a solution with defaultdict:
    from collections import defaultdict

    def elemcount(elems):
        count = defaultdict(lambda: 0)
        for e in elems:
            count[e] += 1
        return count

    count = elemcount(['ax', 'ex', 'bx', 'ex', 'ex', 'bx'])
    for ec in count:
        print ec, count[ec]
defaultdict lets us implicitly initialize any missing key to a known value the first time it is accessed (here 0, produced by the factory function), which solves this problem gracefully. Without it, we’d have to check for the existence of e in the dict and explicitly initialize it.
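For comparison, here is a sketch of both versions side by side. elemcount_plain is a name I made up for the variant with the explicit check, and defaultdict(int) is used as a shorter spelling of the same default, since int() returns 0.

```python
from collections import defaultdict


def elemcount(elems):
    count = defaultdict(int)   # int() returns 0, same as lambda: 0
    for e in elems:
        count[e] += 1
    return count


def elemcount_plain(elems):
    # The version without defaultdict: check and initialize by hand.
    count = {}
    for e in elems:
        if e not in count:
            count[e] = 0
        count[e] += 1
    return count


words = ['ax', 'ex', 'bx', 'ex', 'ex', 'bx']
print(elemcount(words) == elemcount_plain(words))   # True
print(elemcount_plain(words)['ex'])                 # 3
```

One thing to keep in mind: merely reading a missing key from a defaultdict inserts it with the default value.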
The immutability of Python strings
Did this ever happen to you ?
    >>> name = 'big foot'
    >>> name[2]
    'g'
    >>> name[2] = 'G'
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
    TypeError: 'str' object does not support item assignment
    >>>

Yikes !!
The strings in Python are immutable, just like numbers and tuples. This means that you can create them, move them around, but not change them. Why is this so ? For a few reasons (you can find a better discussion online):
- By design, strings in Python are considered elemental and unchangeable. This spurs better, safer programming styles.
- The immutability of strings has efficiency benefits, chiefly in the area of lower storage requirements.
- It also makes strings safe to use as dictionary keys, since a key’s value (and therefore its hash) can never change while it is in the dict.
If you look around the Python web a little, you’ll notice that the most frequent advice to “how to change my string” is “design your code so that you won’t have to change it”. Fair enough, but what other options are there ? Here are a few:
- name = name[:2] + 'G' + name[3:] - this is an inefficient way to do the job. Python’s slice semantics ensure that this works correctly in all cases (as long as your index is in range), but since it involves several string copies and concatenations, it’s hardly your best shot at efficient code. If you don’t care about that (and chances are you don’t), it’s a solid solution.
- Use the MutableString class from module UserString. While no more efficient than the previous method (it performs the same trick under the hood), it is more consistent syntactically with normal string usage.
- Use a list instead of a string to store mutable data. Convert back and forth using list and join. Depending on what you really need, ord and chr may also be useful.
- Use an array object. This is perhaps your best option if you use the string to hold constrained data, such as ‘binary’ bytes.
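The list and array options can be sketched as follows. This is written for Python 3, which is an assumption on my part: the 'B' typecode, the bytes literal and tobytes are the Python 3 spellings, and the details differ on 2.5.

```python
from array import array

# Option: a list of characters, converted back with join.
name = 'big foot'
chars = list(name)
chars[2] = 'G'
name = ''.join(chars)
print(name)            # biG foot

# Option: an array of unsigned bytes for 'binary' data.
buf = array('B', b'big foot')
buf[2] = ord('G')
print(buf.tobytes())   # b'biG foot'
```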
