Welcome


... to my place on the web - a personal website where I keep articles, freeware programs and code snippets I've written, and a weblog where I share my ideas and insights on programming, technology, books, and life in general.

To browse the website, use the site map on the right-hand side of the screen. Below, you will find the ten most recent posts.

Commenting out a block of code with Vim

August 25th, 2010 at 5:17 am

After some looking-around I’ve found two interesting techniques to comment (or comment out) a block of code:

Substitution on a range of lines

Given a range of lines, a simple substitution command may add or remove comments. For example, for Python code s/^/#/ and s/^#// will add or remove a comment from the beginning of the line, respectively.

To select the range of lines for this operation you can use any Vim technique, for example specifying the range explicitly:

:M,N s/^/#/

Will comment out lines in the inclusive range [M:N].

A simpler way is to use the visual selection mode, by pressing V (capital v for line selection), selecting the needed lines and then executing the substitution command.
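If Vim's substitution syntax is unfamiliar, here is a rough Python sketch of what the two commands do to a range of lines. The function names and the 1-based inclusive range are my own choices for illustration; Vim itself exposes nothing of the sort.

```python
import re

def comment_out(lines, first, last, marker="#"):
    """Mimic :first,last s/^/#/ on a list of lines (1-based, inclusive)."""
    return [
        re.sub(r"^", marker, line) if first <= i <= last else line
        for i, line in enumerate(lines, start=1)
    ]

def uncomment(lines, first, last, marker="#"):
    """Mimic :first,last s/^#// on the same kind of range."""
    pattern = "^" + re.escape(marker)
    return [
        re.sub(pattern, "", line) if first <= i <= last else line
        for i, line in enumerate(lines, start=1)
    ]

code = ["x = 1", "y = 2", "z = 3"]
commented = comment_out(code, 2, 3)
print(commented)
print(uncomment(commented, 2, 3))
```

Lines outside the range are left untouched, which is also how the Vim range behaves.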

Using block-mode visual selection

  1. Move to the first column in the first line you want to comment-out.
  2. Press Ctrl-V to start block-mode selection.
  3. Move down to select the first column of a block of lines.
  4. Press I and then type the desired comment starter (e.g. #).
  5. Press ESC and the insertion will be applied to the whole block.

To uncomment with this technique, follow the same directions, but instead of I use x to delete the first character.

Others…

If you have other techniques to suggest, please let me know. Also if you’re familiar with a good plugin that makes this really easy and also knows which types of source code require which comment chars, I’d like to hear about it.

Some problems installing Ubuntu 10.04 on VirtualBox

August 18th, 2010 at 7:48 am

The HD space I allocated to my previous Ubuntu VM ran out, so instead of employing tricks to increase the HD, I decided to create a new VM and upgrade to 10.04 while I’m at it. All my data is quite easy to back up and recover, and I still have the old VM around, just in case.

I downloaded a fresh 10.04 .iso from a torrent but it wouldn’t boot in VirtualBox. After selecting “Install Ubuntu” in the welcome screen it would just hang with an empty black screen.

What eventually solved the problem was installing a new version of VirtualBox (3.2.8, while previously I ran 3.0.12). Re-installing VirtualBox was a painful process since it decided it must disconnect me from the network, and it took a reboot to get my network working again.

Once installed, however, VirtualBox 3.2.8 installed Ubuntu 10.04 smoothly, with network, shared folders and guest add-ons all working very well.

A new gaming addiction

August 14th, 2010 at 3:01 pm

And Blizzard is, once again, to blame for my complete lack of productivity lately.

Wow, what a game! Complete addiction (the kind when you have bunker deployment tactics in your dreams). And the worst part is that they’ve made it almost infinitely re-playable with a system of achievements. For example, today I’ve finished the main campaign:

But I can’t help thinking about going through the challenges, trying some missed-choice missions in the campaign, and even engaging in multi-player. By the way, my battle.net Real ID is my email: eliben@gmail.com.

P.S. Ron, I know you want to.

Contributing to Python

July 23rd, 2010 at 5:07 pm

I’ve been involved in open-source projects almost since the first days of my "serious" programming (back in 1998), but these were always projects I started myself. I’ve long been thinking about joining one of the big and established open-source projects, both to make a contribution and to improve my own skills by working with some great people on interesting things.

Once I started tinkering with Python around two years ago, it became the major candidate for my contribution – both because working on Python can really make a difference for a huge number of users, and because Python’s inner development circles include some of the brightest programmers I ever ran into. Joining this clique, even as a humble minor contributor, is very appealing.

So, a few weeks ago, inspired by a couple of articles, I finally took the plunge.

http://eli.thegreenplace.net/wp-content/uploads/2010/07/smilingpython.gif

For now, my contributions are very minor: I’ve been involved in a few issues, and made several patches. A few were even committed into Python – one documentation patch and two patches fixing bugs in the trace.py module in Python 3.x.

I’m also "in progress" on several other issues, dealing with the trace.py module (improving its documentation, adding unit tests and debugging some issues with 3.x), documentation fixes for some standard library modules and a bug fix for difflib. Once you make the first step, finding more things to work on is quite easy. Python’s code and documentation are of relatively high quality, but like in any major software project, there’s place for improvement almost everywhere you look, even if the improvements are very minor (making the documentation more consistently formatted or clearer).

A few words on how I work on Python.

Although Python is well-supported on Windows and can be built on it without much trouble, Linux is the most convenient platform to use for development IMO. I’m using an Ubuntu VM running on VirtualBox on top of my Windows XP machine.

Python’s code is kept in a Subversion repository, to which you get read-only access unless you’re a core committer. It means you can’t commit to the repository, and if you want to save your temporary work, you’re on your own.

Luckily, Python is in the process of moving to Mercurial, and already has a functional mirror set up. Mercurial is a much better SCM tool for this purpose, because it allows you to work locally with your repository, only pulling changes from the official one when necessary.

Here’s my workflow with the Mercurial mirror of Python:

http://eli.thegreenplace.net/wp-content/uploads/2010/07/pythonrepos.png

My local Mercurial repo is where I do all my hacking, occasionally backing-up to my personal clone at code.google.com. This lets me explore various ideas, create temporary fixes, all of this with full version control. From time to time, I’m pulling a fresh snapshot from Python’s official Mercurial mirror to get back on track, but I will always be able to get back to my own changes, because everything is safely stored in the history of my repo.

However, I still keep the SVN checkouts around, because:

  1. I want to make sure my changes work on a clean check-out from Python’s official repository, which is still SVN.
  2. I create patches against the SVN repo (with svn diff), because Mercurial creates slightly different diffs. Since committers actually commit into the SVN repo, this makes their lives easier.

It’s easy to keep several versions of Python around. For example, I have the repositories for the 3.x development branch (both Mercurial for hacking and SVN for patches), plus the 2.7 and 2.6 maintenance branches. To get a new version/branch all one needs is:

  1. Check it out from SVN or clone from Mercurial
  2. configure and then make
  3. Create a link somewhere on PATH to the relevant executable (for example I have in ~/bin a link named py27 for the 2.7 version, py3d for the debug build of the latest 3.x, and so on). The Python interpreter, once executed, knows where to find its own libraries, making it very simple to work with several versions of Python simultaneously.
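A quick way to check which build a given link (py27, py3d and so on) actually runs is to ask the interpreter itself. The sketch below uses only the standard sys and sysconfig modules as found in recent Python 3 versions; the function name describe_interpreter is my own, made up for illustration.

```python
import sys
import sysconfig

def describe_interpreter():
    """Report which interpreter is running and where it looks for its stdlib."""
    return {
        "executable": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        "stdlib": sysconfig.get_paths()["stdlib"],
    }

for field, value in describe_interpreter().items():
    print(field, "=", value)
```

Running the same script through each link should report a different executable and standard library path, since each interpreter locates its own libraries.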

To conclude, now you know what’s been keeping me busy in the past month or so. Contributing to Python is something I’ve long wanted to do, and I’m happy that I finally started. It turned out to be much less difficult than I originally expected, and I now firmly believe that any competent developer with the desire to help and some free time on his hands can become a contributor.

P.S. I had the privilege of receiving useful guidance from Terry Reedy, and I’d like to thank him for that. We still cooperate on several issues, and I hope we’ll continue working together. "Pair-contribution" seems like an interesting model the Python community may want to look into. I also want to thank Alexander Belopolsky for getting my fixes for trace.py quickly committed.

Enhorabuena a España

July 12th, 2010 at 3:42 pm

It was a tough game against Holland’s karate^H^H^H^H^H… err, football team, but Spain deserved to win.

Earlier this year my brother and I went to Barcelona where we saw this in the window of some bank:

It says that the bank will pay a 3% yearly return on some kind of long-term deposit, and will increase the return to 4% if Spain wins the World Cup. This is what sports fanaticism looks like!

Summary of reading: May – June 2010

July 8th, 2010 at 7:33 pm

Recently, writing a full review for each and every book I read has become tiresome, so I’m changing the approach. Once every few months I’m going to post the list of books I’ve read and re-read in that period, with very short reviews (up to a few sentences). Certain books may encourage me to write fuller reviews, which I’ll just do in the usual manner.

So, for May – June 2010 the list is:

  • "On Writing Well" by William Zinsser – A good book on improving one’s writing skills.
  • "Burning Bright" by John Steinbeck – An unusual short story structured as a play. Quick and fun to read.
  • "The Solitude of Prime Numbers" by Paolo Giordano (read in Spanish) – a sad story, beautifully written. Though rough-edged in some places, this book was very enjoyable.
  • "Sweet Thursday" by John Steinbeck – Sequel to "Cannery Row". Though somewhat less original, it’s still fun to read.

Re-reads (books I’ve already read in the past, and have recently re-read):

  • “Programming Pearls” by Jon Bentley
  • “The Moral Animal” by Robert Wright

Python internals: adding a new statement to Python

June 30th, 2010 at 7:18 pm

This article is an attempt to better understand how the front-end of Python works. Just reading documentation and source code may be a bit boring, so I’m taking a hands-on approach here: I’m going to add an until statement to Python.

All the coding for this article was done against the cutting-edge Py3k branch in the Python Mercurial repository mirror.

The until statement

Some languages, like Ruby, have an until statement, which is the complement to while (until num == 0 is equivalent to while num != 0). In Ruby, I can write:

num = 3
until num == 0 do
  puts num
  num -= 1
end

And it will print:

3
2
1

So, I want to add a similar capability to Python. That is, being able to write:

num = 3
until num == 0:
  print(num)
  num -= 1

A language-advocacy digression

This article doesn’t attempt to suggest the addition of an until statement to Python. Although I think such a statement would make some code clearer, and this article displays how easy it is to add, I completely respect Python’s philosophy of minimalism. All I’m trying to do here, really, is gain some insight into the inner workings of Python.

Modifying the grammar

Python uses a custom parser generator named pgen. It produces an LL(1) parser that converts Python source code into a parse tree. The input to the parser generator is the file Grammar/Grammar [1]. This is a simple text file that specifies the grammar of Python.

Two modifications have to be made to the grammar file. The first is to add a definition for the until statement. I found where the while statement was defined (while_stmt), and added until_stmt below [2]:

compound_stmt: if_stmt | while_stmt | until_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
while_stmt: 'while' test ':' suite ['else' ':' suite]
until_stmt: 'until' test ':' suite

Note that I’ve decided to exclude the else clause from my definition of until, just to make it a little bit different (and because frankly I dislike the else clause of loops and don’t think it fits well with the Zen of Python).

The second change is to modify the rule for compound_stmt to include until_stmt, as you can see in the snippet above. It’s right after while_stmt, again.

When you run make after modifying Grammar/Grammar, notice that the pgen program is run to re-generate Include/graminit.h and Python/graminit.c, and then several files get re-compiled.

Modifying the AST generation code

After the Python parser has created a parse tree, this tree is converted into an AST, since ASTs are much simpler to work with in subsequent stages of the compilation process.

So, we’re going to visit Parser/Python.asdl which defines the structure of Python’s ASTs and add an AST node for our new until statement, again right below the while:

| While(expr test, stmt* body, stmt* orelse)
| Until(expr test, stmt* body)

If you now run make, notice that before compiling a bunch of files, Parser/asdl_c.py is run to generate C code from the AST definition file. This (like Grammar/Grammar) is another example of the Python source-code using a mini-language (in other words, a DSL) to simplify programming. Also note that since Parser/asdl_c.py is a Python script, this is a kind of bootstrapping – to build Python from scratch, Python already has to be available.

While Parser/asdl_c.py generated the code to manage our newly defined AST node (into the files Include/Python-ast.h and Python/Python-ast.c), we still have to write the code that converts a relevant parse-tree node into it by hand. This is done in the file Python/ast.c. There, a function named ast_for_stmt converts parse tree nodes for statements into AST nodes. Again, guided by our old friend while, we jump right into the big switch for handling compound statements and add a clause for until_stmt:

case while_stmt:
    return ast_for_while_stmt(c, ch);
case until_stmt:
    return ast_for_until_stmt(c, ch);

Now we should implement ast_for_until_stmt. Here it is:

static stmt_ty
ast_for_until_stmt(struct compiling *c, const node *n)
{
    /* until_stmt: 'until' test ':' suite */
    REQ(n, until_stmt);

    if (NCH(n) == 4) {
        expr_ty expression;
        asdl_seq *suite_seq;

        expression = ast_for_expr(c, CHILD(n, 1));
        if (!expression)
            return NULL;
        suite_seq = ast_for_suite(c, CHILD(n, 3));
        if (!suite_seq)
            return NULL;
        return Until(expression, suite_seq, LINENO(n), n->n_col_offset, c->c_arena);
    }

    PyErr_Format(PyExc_SystemError,
                 "wrong number of tokens for 'until' statement: %d",
                 NCH(n));
    return NULL;
}

Again, this was coded while closely looking at the equivalent ast_for_while_stmt, with the difference that for until I’ve decided not to support the else clause. As expected, the AST is created recursively, using other AST creating functions like ast_for_expr for the condition expression and ast_for_suite for the body of the until statement. Finally, a new node named Until is returned.

Note that we access the parse-tree node n using some macros like NCH and CHILD. These are worth understanding – their code is in Include/node.h.

Digression: AST composition

I chose to create a new type of AST for the until statement, but actually this isn’t necessary. I could’ve saved some work and implemented the new functionality using composition of existing AST nodes, since:

until condition:
   # do stuff

Is functionally equivalent to:

while not condition:
  # do stuff

Instead of creating the Until node in ast_for_until_stmt, I could have created a While node whose test is the condition wrapped in a Not node. Since the AST compiler already knows how to handle these nodes, the next steps of the process could be skipped.
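The composition idea can be tried without touching the C code, using the standard ast module of a recent stock Python 3 (I'm assuming 3.8 or later). This is only a sketch of the principle, not what ast_for_until_stmt would look like: the helper name until_to_while is made up, and it takes the condition and body as source strings because a stock parser cannot read until.

```python
import ast

def until_to_while(condition_src, body_src):
    """Build the AST of `while not <condition>: <body>` from existing node types."""
    condition = ast.parse(condition_src, mode="eval").body
    body = ast.parse(body_src).body
    loop = ast.While(
        test=ast.UnaryOp(op=ast.Not(), operand=condition),
        body=body,
        orelse=[],
    )
    module = ast.Module(body=[loop], type_ignores=[])
    # The hand-made While and UnaryOp nodes have no line numbers yet.
    return ast.fix_missing_locations(module)

tree = until_to_while("num == 0", "out.append(num)\nnum -= 1")
namespace = {"num": 3, "out": []}
exec(compile(tree, "<until>", "exec"), namespace)
print(namespace["out"])  # [3, 2, 1]
```

Since only While, UnaryOp and Not nodes are involved, the stock compiler handles the tree without any changes.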

Compiling ASTs into bytecode

The next step is compiling the AST into Python bytecode. The compilation has an intermediate result which is a CFG (Control Flow Graph), but since the same code handles it I will ignore this detail for now and leave it for another article.

The code we will look at next is Python/compile.c. Following the lead of while, we find the function compiler_visit_stmt, which is responsible for compiling statements into bytecode. We add a clause for Until:

case While_kind:
    return compiler_while(c, s);
case Until_kind:
    return compiler_until(c, s);

If you wonder what Until_kind is, it’s a constant (actually a value of the _stmt_kind enumeration) automatically generated from the AST definition file into Include/Python-ast.h. Anyway, we call compiler_until which, of course, still doesn’t exist. I’ll get to it in a moment.

If you’re curious like me, you’ll notice that compiler_visit_stmt is peculiar. No amount of grep-ping the source tree reveals where it is called. When this is the case, only one option remains – C macro-fu. Indeed, a short investigation leads us to the VISIT macro defined in Python/compile.c:

#define VISIT(C, TYPE, V) {\
    if (!compiler_visit_ ## TYPE((C), (V))) \
        return 0; \
}

It’s used to invoke compiler_visit_stmt in compiler_body. Back to our business, however…

As promised, here’s compiler_until:

static int
compiler_until(struct compiler *c, stmt_ty s)
{
    basicblock *loop, *end, *anchor = NULL;
    int constant = expr_constant(s->v.Until.test);

    if (constant == 1) {
        return 1;
    }
    loop = compiler_new_block(c);
    end = compiler_new_block(c);
    if (constant == -1) {
        anchor = compiler_new_block(c);
        if (anchor == NULL)
            return 0;
    }
    if (loop == NULL || end == NULL)
        return 0;

    ADDOP_JREL(c, SETUP_LOOP, end);
    compiler_use_next_block(c, loop);
    if (!compiler_push_fblock(c, LOOP, loop))
        return 0;
    if (constant == -1) {
        VISIT(c, expr, s->v.Until.test);
        ADDOP_JABS(c, POP_JUMP_IF_TRUE, anchor);
    }
    VISIT_SEQ(c, stmt, s->v.Until.body);
    ADDOP_JABS(c, JUMP_ABSOLUTE, loop);

    if (constant == -1) {
        compiler_use_next_block(c, anchor);
        ADDOP(c, POP_BLOCK);
    }
    compiler_pop_fblock(c, LOOP, loop);
    compiler_use_next_block(c, end);

    return 1;
}

I have a confession to make: this code wasn’t written based on a deep understanding of Python bytecode. Like the rest of the article, it was done in imitation of the kin compiler_while function. By reading it carefully, however, keeping in mind that the Python VM is stack-based, and glancing into the documentation of the dis module, which has a list of Python bytecodes with descriptions, it’s possible to understand what’s going on.

That’s it, we’re done… Aren’t we?

After making all the changes and running make, we can run the newly compiled Python and try our new until statement:

>>> num = 3
>>> until num == 0:
...   print(num)
...   num -= 1
...
3
2
1

Voila, it works! Let’s see the bytecode created for the new statement by using the dis module as follows:

import dis

def myfoo(num):
    until num == 0:
        print(num)
        num -= 1

dis.dis(myfoo)

Here’s the result:

4           0 SETUP_LOOP              36 (to 39)
      >>    3 LOAD_FAST                0 (num)
            6 LOAD_CONST               1 (0)
            9 COMPARE_OP               2 (==)
           12 POP_JUMP_IF_TRUE        38

5          15 LOAD_NAME                0 (print)
           18 LOAD_FAST                0 (num)
           21 CALL_FUNCTION            1
           24 POP_TOP

6          25 LOAD_FAST                0 (num)
           28 LOAD_CONST               2 (1)
           31 INPLACE_SUBTRACT
           32 STORE_FAST               0 (num)
           35 JUMP_ABSOLUTE            3
      >>   38 POP_BLOCK
      >>   39 LOAD_CONST               0 (None)
           42 RETURN_VALUE

The most interesting operation is number 12: if the condition is true, we jump to after the loop. This is correct semantics for until. If the jump isn’t executed, the loop body keeps running until it jumps back to the condition at operation 35.
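On a stock interpreter, which has no until, a similar inspection can be done on the equivalent while not loop. The sketch below only collects opcode names, since the exact opcodes and offsets differ between Python versions and will not match the listing above. The function name countdown is just an example.

```python
import dis

def countdown(num):
    # Stock Python has no `until`, so it is spelled as `while not`.
    while not num == 0:
        print(num)
        num -= 1

# get_instructions yields one Instruction per bytecode operation.
opnames = [ins.opname for ins in dis.get_instructions(countdown)]
print(opnames)
```

Look for the conditional jump after the comparison and the jump that returns to the top of the loop. They play the same roles as operations 12 and 35 in the listing.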

Feeling good about my change, I then tried running the function (executing myfoo(3)) instead of showing its bytecode. The result was less than encouraging:

Traceback (most recent call last):
  File "zy.py", line 9, in <module>
    myfoo(3)
  File "zy.py", line 5, in myfoo
    print(num)
SystemError: no locals when loading 'print'

Whoa… this can’t be good. So what went wrong?

The case of the missing symbol table

One of the steps the Python compiler performs when compiling the AST is to create a symbol table for the code it compiles. The call to PySymtable_Build in PyAST_Compile calls into the symbol table module (Python/symtable.c), which walks the AST in a manner similar to the code generation functions. Having a symbol table for each scope helps the compiler figure out some key information, such as which variables are global and which are local to a scope.
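The symbol table is also exposed to Python code through the standard symtable module, which makes it easy to see the kind of information involved. The sketch below runs on a stock interpreter, so the loop is spelled while not. The helper scope_of is my own name for illustration.

```python
import symtable

SOURCE = '''
def myfoo(num):
    while not num == 0:
        print(num)
        num -= 1
'''

module_table = symtable.symtable(SOURCE, "<example>", "exec")
function_table = module_table.get_children()[0]  # the table for myfoo

def scope_of(name):
    """Classify a name used inside myfoo as local, global or other."""
    symbol = function_table.lookup(name)
    if symbol.is_local():
        return "local"
    if symbol.is_global():
        return "global"
    return "other"

print("num   ->", scope_of("num"))
print("print ->", scope_of("print"))
```

Here num is reported as local and print as global. The compiler needs exactly this classification to pick the right load operation for each name.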

To fix the problem, we have to modify the symtable_visit_stmt function in Python/symtable.c, adding code for handling until statements, after the similar code for while statements [3]:

case While_kind:
    VISIT(st, expr, s->v.While.test);
    VISIT_SEQ(st, stmt, s->v.While.body);
    if (s->v.While.orelse)
        VISIT_SEQ(st, stmt, s->v.While.orelse);
    break;
case Until_kind:
    VISIT(st, expr, s->v.Until.test);
    VISIT_SEQ(st, stmt, s->v.Until.body);
    break;

And now we really are done. Compiling the source after this change makes the execution of myfoo(3) work as expected.

Conclusion

In this article I’ve demonstrated how to add a new statement to Python. Albeit requiring quite a bit of tinkering in the code of the Python compiler, the change wasn’t difficult to implement, because I used a similar and existing statement as a guideline.

The Python compiler is a sophisticated chunk of software, and I don’t claim to be an expert in it. However, I am really interested in the internals of Python, and particularly its front-end. Therefore, I found this exercise a very useful companion to theoretical study of the compiler’s principles and source code. It will serve as a base for future articles that will get deeper into the compiler.

References

I used a few excellent references for the construction of this article. Here they are, in no particular order:

  • PEP 339: Design of the CPython compiler – probably the most important and comprehensive piece of official documentation for the Python compiler. Being very short, it painfully displays the scarcity of good documentation of the internals of Python.
  • "Python Compiler Internals" – an article by Thomas Lee
  • "Python: Design and Implementation" – a presentation by Guido van Rossum
  • Python (2.5) Virtual Machine, A guided tour – a presentation by Peter Tröger
[1] From here on, references to files in the Python source are given relative to the root of the source tree, which is the directory where you run configure and make to build Python.
[2] This demonstrates a common technique I use when modifying source code I’m not familiar with: work by similarity. This principle won’t solve all your problems, but it can definitely ease the process. Since everything that has to be done for while also has to be done for until, it serves as a pretty good guideline.
[3] By the way, without this code there’s a compiler warning for Python/symtable.c. The compiler notices that the Until_kind enumeration value isn’t handled in the switch statement of symtable_visit_stmt and complains. It’s always important to check for compiler warnings!

AES encryption of files in Python with PyCrypto

June 25th, 2010 at 6:26 pm

The PyCrypto module seems to provide all one needs for employing strong cryptography in a program. It wraps a highly optimized C implementation of many popular encryption algorithms with a Python interface. PyCrypto can be built from source on Linux, and Windows binaries for various versions of Python 2.x were kindly made available by Michael Foord on this page.

My only gripe with PyCrypto is its documentation. The auto-generated API doc is next to useless, and this overview is somewhat dated and didn’t address the questions I had about the module. It isn’t surprising that a few modules were created just to provide simpler and better documented wrappers around PyCrypto.

In this article I want to present how to use PyCrypto for simple symmetric encryption and decryption of files using the AES algorithm.

Simple AES encryption

Here’s how one can encrypt a string with AES:

from Crypto.Cipher import AES

key = '0123456789abcdef'
mode = AES.MODE_CBC
encryptor = AES.new(key, mode)

text = 'j' * 64 + 'i' * 128
ciphertext = encryptor.encrypt(text)

Since the PyCrypto block-level encryption API is very low-level, it expects your key to be either 16, 24 or 32 bytes long (for AES-128, AES-192 and AES-256, respectively). The longer the key, the stronger the encryption.

Having keys of exact length isn’t very convenient, as you sometimes want to use some mnemonic password for the key. In this case I recommend picking a password and then using the SHA-256 digest algorithm from hashlib to generate a 32-byte key from it. Just replace the assignment to key in the code above with:

import hashlib

password = 'kitty'
key = hashlib.sha256(password).digest()

Keep in mind that this 32-byte key only has as much entropy as your original password. So be wary of brute-force password guessing, and pick a relatively strong password (kitty probably won’t do). What’s useful about this technique is that you don’t have to worry about manually padding your password – SHA-256 will scramble a 32-byte block out of any password for you.
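For reference, here is how the same derivation might look on Python 3, where the password has to be encoded to bytes before hashing. The second function is not part of the technique described above: it is an alternative I'm adding as an assumption, using the standard hashlib.pbkdf2_hmac with a salt and many iterations to make brute-force guessing slower. Both function names are made up for this sketch.

```python
import hashlib

def key_from_password(password):
    """Derive a 32-byte AES key as the SHA-256 digest of the password."""
    return hashlib.sha256(password.encode("utf-8")).digest()

def key_from_password_pbkdf2(password, salt, iterations=200_000):
    """Alternative (an assumption, not the technique above): PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )

key = key_from_password("kitty")
print(len(key))  # 32
```

With the PBKDF2 variant the salt has to be stored alongside the ciphertext, since the same salt is needed to derive the key again for decryption.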

The next thing the code does is set the block mode of AES. I won’t get into all the details, but unless you have some special requirements, CBC should be good enough for you.

We create a new AES encryptor object with Crypto.Cipher.AES.new, and give it the encryption key and the mode. Next comes the encryption itself. Again, since the API is low-level, the encrypt method expects your input to consist of an integral number of 16-byte blocks (16 is the size of the basic AES block).
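The block-size requirement means that arbitrary input usually has to be padded first. Below is a minimal sketch of space padding, written for Python 3 bytes. The helper name pad_to_block is hypothetical, and space padding is just one simple choice: it doesn't record the original length, so that has to be stored separately if the padding must be removed later.

```python
BLOCK_SIZE = 16  # AES block size in bytes

def pad_to_block(data, fill=b" "):
    """Pad bytes with `fill` up to the next multiple of BLOCK_SIZE."""
    remainder = len(data) % BLOCK_SIZE
    if remainder == 0:
        return data
    return data + fill * (BLOCK_SIZE - remainder)

print(len(pad_to_block(b"j" * 64 + b"i" * 128)))  # 192, already 12 full blocks
print(len(pad_to_block(b"hello")))                # 16
```

The 192-byte text from the example above needs no padding, which is why it could be passed to encrypt directly.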

The encryptor object has an internal state when used in the CBC mode, so if you try to encrypt the same text with the same encryptor once again – you will get different results. So be careful to create a fresh AES encryptor object for any encryption/decryption job.

Decryption

To decrypt the ciphertext, simply add:

decryptor = AES.new(key, mode)
plain = decryptor.decrypt(ciphertext)

And you get your plaintext back again.

A word about the initialization vector

The initialization vector (IV) is an important part of block encryption algorithms that work in chained modes like CBC. For the simple example above I’ve ignored the IV, but for a more serious application this is a grave mistake. I don’t want to get too deep into cryptographic theory here, but it suffices to say that the IV is as important as the salt in hashed passwords, and the lack of correct IV usage led to the cracking of the WEP encryption for wireless LAN.

PyCrypto allows one to pass an IV into the AES.new creator function. For maximal security, the IV should be randomly generated for every new encryption and can be stored together with the ciphertext. Knowledge of the IV won’t help the attacker crack your encryption. What can help him, however, is your reusing the same IV with the same encryption key for multiple encryptions.
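Generating a fresh IV and keeping it next to the ciphertext takes only a few lines. This sketch uses os.urandom as the source of randomness, which is my assumption about a reasonable choice. The helper names are made up, and the zero bytes merely stand in for real ciphertext.

```python
import os

IV_SIZE = 16  # one AES block

def new_iv():
    """Return a fresh random IV from the operating system's random source."""
    return os.urandom(IV_SIZE)

def attach_iv(iv, ciphertext):
    """Store the IV in front of the ciphertext; it does not need to be secret."""
    return iv + ciphertext

def split_iv(blob):
    """Recover (iv, ciphertext) from a blob produced by attach_iv."""
    return blob[:IV_SIZE], blob[IV_SIZE:]

iv = new_iv()
blob = attach_iv(iv, b"\x00" * 32)  # placeholder bytes, not real ciphertext
recovered_iv, recovered_ct = split_iv(blob)
print(recovered_iv == iv, len(recovered_ct))
```

Calling new_iv once per encryption is what avoids reusing an IV with the same key.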

Encrypting and decrypting files

The following function encrypts a file of any size. It makes sure to pad the file to a multiple of the AES block length, and also handles the random generation of the IV.

import os, random, struct
from Crypto.Cipher import AES

def encrypt_file(key, in_filename, out_filename=None, chunksize=64*1024):
    """ Encrypts a file using AES (CBC mode) with the
        given key.

        key:
            The encryption key - a string that must be
            either 16, 24 or 32 bytes long. Longer keys
            are more secure.

        in_filename:
            Name of the input file

        out_filename:
            If None, '<in_filename>.enc' will be used.

        chunksize:
            Sets the size of the chunk which the function
            uses to read and encrypt the file. Larger chunk
            sizes can be faster for some files and machines.
            chunksize must be divisible by 16.
    """
    if not out_filename:
        out_filename = in_filename + '.enc'

    # os.urandom draws from the OS random source, which suits
    # cryptographic use better than the random module.
    iv = os.urandom(16)
    encryptor = AES.new(key, AES.MODE_CBC, iv)
    filesize = os.path.getsize(in_filename)

    with open(in_filename, 'rb') as infile:
        with open(out_filename, 'wb') as outfile:
            outfile.write(struct.pack('<Q', filesize))
            outfile.write(iv)

            while True:
                chunk = infile.read(chunksize)
                if len(chunk) == 0:
                    break
                elif len(chunk) % 16 != 0:
                    chunk += ' ' * (16 - len(chunk) % 16)

                outfile.write(encryptor.encrypt(chunk))

Since it might have to pad the file to fit into a multiple of 16, the function saves the original file size in the first 8 bytes of the output file (more precisely, the first sizeof(long long) bytes). It randomly generates a 16-byte IV and stores it in the file as well. Then, it reads the input file chunk by chunk (with chunk size configurable), encrypts the chunk and writes it to the output. The last chunk is padded with spaces, if required.
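The size header and the padding arithmetic can be checked in isolation with the struct module alone. In this sketch the helper names are mine, while the '<Q' format is the one encrypt_file uses.

```python
import struct

HEADER_FORMAT = "<Q"  # little-endian unsigned 64-bit, as in encrypt_file
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

def padded_size(filesize, block=16):
    """Size of the payload after padding up to a multiple of `block`."""
    return (filesize + block - 1) // block * block

def pack_size(filesize):
    """Encode the original file size for the start of the output file."""
    return struct.pack(HEADER_FORMAT, filesize)

def unpack_size(header):
    """Decode the original file size from the first HEADER_SIZE bytes."""
    return struct.unpack(HEADER_FORMAT, header)[0]

header = pack_size(1000)
print(HEADER_SIZE, unpack_size(header), padded_size(1000))  # 8 1000 1008
```

So a 1000-byte input would produce an output of 8 + 16 + 1008 bytes: the size header, the IV, and the padded ciphertext.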

Working in chunks makes sure that large files can be efficiently processed without reading them wholly into memory. For example, with the default chunk size it takes about 1.2 seconds on my computer to encrypt a 50MB file. PyCrypto is fast!

Decrypting the file can be done with:

def decrypt_file(key, in_filename, out_filename=None, chunksize=24*1024):
    """ Decrypts a file using AES (CBC mode) with the
        given key. Parameters are similar to encrypt_file,
        with one difference: out_filename, if not supplied
        will be in_filename without its last extension
        (i.e. if in_filename is 'aaa.zip.enc' then
        out_filename will be 'aaa.zip')
    """
    if not out_filename:
        out_filename = os.path.splitext(in_filename)[0]

    with open(in_filename, 'rb') as infile:
        origsize = struct.unpack('<Q', infile.read(struct.calcsize('Q')))[0]
        iv = infile.read(16)
        decryptor = AES.new(key, AES.MODE_CBC, iv)

        with open(out_filename, 'wb') as outfile:
            while True:
                chunk = infile.read(chunksize)
                if len(chunk) == 0:
                    break
                outfile.write(decryptor.decrypt(chunk))

            outfile.truncate(origsize)

First the original size of the file is read from the first 8 bytes of the encrypted file. The IV is read next to correctly initialize the AES object. Then the file is decrypted in chunks, and finally it’s truncated to the original size, so the padding is thrown out.

Sad, but just

June 24th, 2010 at 6:19 pm

Squadra Azzurra, I love you, but you deserved it. There’s no one else to blame. In the last 15 minutes of this game we finally saw a real football team wearing blue, but it was too little, too late.

Boars in our backyard

June 18th, 2010 at 7:50 am

A large family of boars is frequently visiting our neighborhood to feed on acorns and drink water from the underground sprinkle tubes. Recently they’ve become very brave and arrive in full daylight. Here are some photos of the “baby boars” (much larger than they were a few weeks ago!) taken from my window at 8:20 AM this morning:

[Two photos: Boars]

I also managed to run down the steps and get a closer shot standing just a couple of meters from them:

[Photo: Boars]

And a few more, 20 minutes later (when they fearlessly came back):

[Three photos: Boars]