AES encryption of files in Python with PyCrypto

June 25th, 2010 at 6:26 pm

The PyCrypto module seems to provide all one needs for employing strong cryptography in a program. It wraps a highly optimized C implementation of many popular encryption algorithms with a Python interface. PyCrypto can be built from source on Linux, and Windows binaries for various versions of Python 2.x were kindly made available by Michael Foord on this page.

My only gripe with PyCrypto is its documentation. The auto-generated API doc is next to useless, and this overview is somewhat dated and didn’t address the questions I had about the module. It isn’t surprising that a few modules were created just to provide simpler and better documented wrappers around PyCrypto.

In this article I want to present how to use PyCrypto for simple symmetric encryption and decryption of files using the AES algorithm.

Simple AES encryption

Here’s how one can encrypt a string with AES:

from Crypto.Cipher import AES

key = '0123456789abcdef'
mode = AES.MODE_CBC
encryptor = AES.new(key, mode)

text = 'j' * 64 + 'i' * 128
ciphertext = encryptor.encrypt(text)

Since the PyCrypto block-level encryption API is very low-level, it expects your key to be either 16, 24 or 32 bytes long (for AES-128, AES-192 and AES-256, respectively). The longer the key, the stronger the encryption.

Having keys of exact length isn’t very convenient, as you sometimes want to use some mnemonic password for the key. In this case I recommend picking a password and then using the SHA-256 digest algorithm from hashlib to generate a 32-byte key from it. Just replace the assignment to key in the code above with:

import hashlib

password = 'kitty'
key = hashlib.sha256(password).digest()

Keep in mind that this 32-byte key only has as much entropy as your original password. So be wary of brute-force password guessing, and pick a relatively strong password (kitty probably won’t do). What’s useful about this technique is that you don’t have to worry about manually padding your password – SHA-256 will scramble a 32-byte block out of any password for you.
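
A single SHA-256 pass is fast to compute, which also makes password guessing fast. One common hardening, not used in the code above, is a salted and iterated derivation such as PBKDF2. Here is a minimal sketch, assuming a Python whose hashlib provides pbkdf2_hmac (Python 3.4+, or 2.7.8+). The helper name derive_key, the 16-byte salt and the iteration count are illustrative assumptions, not something PyCrypto prescribes:

```python
import hashlib
import os

def derive_key(password, salt, iterations=100000):
    """Derive a 32-byte key from a password and a salt (both bytes).

    derive_key is a hypothetical helper; the iteration count is an
    illustrative value, not a tuned recommendation.
    """
    return hashlib.pbkdf2_hmac('sha256', password, salt, iterations, dklen=32)

salt = os.urandom(16)              # random salt, generated once per password
key = derive_key(b'kitty', salt)   # 32 bytes, usable as an AES-256 key
```

The salt is not secret, but it has to be stored somewhere (for instance next to the ciphertext) so that the same key can be derived again for decryption.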

The next thing the code does is set the block mode of AES. I won’t get into all the details, but unless you have some special requirements, CBC should be good enough for you.

We create a new AES encryptor object with Crypto.Cipher.AES.new, and give it the encryption key and the mode. Next comes the encryption itself. Again, since the API is low-level, the encrypt method expects your input to consist of an integral number of 16-byte blocks (16 is the size of the basic AES block).
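
If the input is not already a multiple of 16 bytes, it has to be padded before it is passed to encrypt. A minimal sketch of such a helper, using bytes literals so it also runs on Python 3; the name pad_to_block and the choice of a space as the fill byte are assumptions for illustration:

```python
BLOCK_SIZE = 16  # size of one AES block, in bytes

def pad_to_block(data, fill=b' '):
    """Return data padded with fill bytes up to a multiple of BLOCK_SIZE.

    Data that is already block-aligned (including empty data) is
    returned unchanged.
    """
    remainder = len(data) % BLOCK_SIZE
    if remainder:
        data += fill * (BLOCK_SIZE - remainder)
    return data

padded = pad_to_block(b'attack at dawn')   # 14 bytes of input
```

Padding with a fixed byte is ambiguous on its own (the plaintext may legitimately end in spaces), so the original length has to be recorded separately if it must be recovered exactly.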

The encryptor object keeps internal state in CBC mode, because each block is chained to the previous ciphertext block. If you encrypt the same text a second time with the same encryptor, you will get different results. So be careful to create a fresh AES encryptor object for every encryption/decryption job.

Decryption

To decrypt the ciphertext, simply add:

decryptor = AES.new(key, mode)
plain = decryptor.decrypt(ciphertext)

And you get your plaintext back again.

A word about the initialization vector

The initialization vector (IV) is an important part of block encryption algorithms that work in chained modes like CBC. For the simple example above I’ve ignored the IV, but for a more serious application this is a grave mistake. I don’t want to get too deep into cryptographic theory here, but it suffices to say that the IV is as important as the salt in hashed passwords, and the lack of correct IV usage led to the cracking of the WEP encryption for wireless LAN.

PyCrypto allows one to pass an IV into the AES.new creator function. For maximal security, the IV should be randomly generated for every new encryption and can be stored together with the ciphertext. Knowledge of the IV won’t help the attacker crack your encryption. What can help him, however, is your reusing the same IV with the same encryption key for multiple encryptions.
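
A minimal sketch of that bookkeeping: generate a fresh IV from the operating system's random source and keep it in front of the ciphertext. The helper names (new_iv, pack_message, unpack_message) are hypothetical, and the ciphertext below is a placeholder rather than real AES output:

```python
import os

IV_SIZE = 16  # CBC needs an IV of exactly one AES block

def new_iv():
    """Return a fresh random IV; call this once per encryption."""
    return os.urandom(IV_SIZE)

def pack_message(iv, ciphertext):
    """Store the IV in the clear, directly in front of the ciphertext."""
    return iv + ciphertext

def unpack_message(blob):
    """Split a packed message back into (iv, ciphertext)."""
    return blob[:IV_SIZE], blob[IV_SIZE:]

iv = new_iv()
blob = pack_message(iv, b'c' * 32)   # b'c' * 32 stands in for real ciphertext
```

The IV returned by unpack_message is what would be passed as the third argument to AES.new on the decrypting side.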

Encrypting and decrypting files

The following function encrypts a file of any size. It makes sure to pad the file to a multiple of the AES block length, and also handles the random generation of the IV.

import os, random, struct
from Crypto.Cipher import AES

def encrypt_file(key, in_filename, out_filename=None, chunksize=64*1024):
    """ Encrypts a file using AES (CBC mode) with the
        given key.

        key:
            The encryption key - a string that must be
            either 16, 24 or 32 bytes long. Longer keys
            are more secure.

        in_filename:
            Name of the input file

        out_filename:
            If None, '<in_filename>.enc' will be used.

        chunksize:
            Sets the size of the chunk which the function
            uses to read and encrypt the file. Larger chunk
            sizes can be faster for some files and machines.
            chunksize must be divisible by 16.
    """
    if not out_filename:
        out_filename = in_filename + '.enc'

    # os.urandom draws from the operating system's cryptographic random
    # source; the random module is not intended for security purposes.
    iv = os.urandom(16)
    encryptor = AES.new(key, AES.MODE_CBC, iv)
    filesize = os.path.getsize(in_filename)

    with open(in_filename, 'rb') as infile:
        with open(out_filename, 'wb') as outfile:
            outfile.write(struct.pack('<Q', filesize))
            outfile.write(iv)

            while True:
                chunk = infile.read(chunksize)
                if len(chunk) == 0:
                    break
                elif len(chunk) % 16 != 0:
                    chunk += ' ' * (16 - len(chunk) % 16)

                outfile.write(encryptor.encrypt(chunk))

Since it might have to pad the file to fit into a multiple of 16, the function saves the original file size in the first 8 bytes of the output file (more precisely, the first sizeof(long long) bytes). It randomly generates a 16-byte IV and stores it in the file as well. Then, it reads the input file chunk by chunk (with chunk size configurable), encrypts the chunk and writes it to the output. The last chunk is padded with spaces, if required.
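
The layout of the output file's header can be looked at in isolation. The following sketch only uses struct; the names build_header and parse_header are hypothetical helpers, not part of the functions in this article:

```python
import struct

SIZE_FORMAT = '<Q'                          # little-endian unsigned 64-bit integer
SIZE_LENGTH = struct.calcsize(SIZE_FORMAT)  # 8 bytes
IV_LENGTH = 16

def build_header(filesize, iv):
    """Return the bytes written before the ciphertext: size, then IV."""
    return struct.pack(SIZE_FORMAT, filesize) + iv

def parse_header(header):
    """Split a header back into (filesize, iv)."""
    (filesize,) = struct.unpack(SIZE_FORMAT, header[:SIZE_LENGTH])
    iv = header[SIZE_LENGTH:SIZE_LENGTH + IV_LENGTH]
    return filesize, iv

header = build_header(1000, b'\x00' * 16)   # all-zero IV used only as a placeholder
```

With this layout the ciphertext starts at byte 24 of the encrypted file.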

Working in chunks makes sure that large files can be efficiently processed without reading them wholly into memory. For example, with the default chunk size it takes about 1.2 seconds on my computer to encrypt a 50MB file. PyCrypto is fast!
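
The chunked reading loop can be sketched on its own as a generator, so that only one chunk is held in memory at a time. The name iter_padded_chunks is an assumption for illustration, and io.BytesIO stands in for a real file opened in binary mode:

```python
import io

def iter_padded_chunks(stream, chunksize=64 * 1024, block=16):
    """Yield chunks read from stream; a chunk that is not a multiple of
    block bytes (normally only the last one) is padded with spaces.

    chunksize should itself be a multiple of block.
    """
    while True:
        chunk = stream.read(chunksize)
        if not chunk:
            break
        if len(chunk) % block:
            chunk += b' ' * (block - len(chunk) % block)
        yield chunk

# 100 bytes read in chunks of 64: one full chunk, then 36 bytes padded to 48.
chunks = list(iter_padded_chunks(io.BytesIO(b'a' * 100), chunksize=64))
```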

Decrypting the file can be done with:

def decrypt_file(key, in_filename, out_filename=None, chunksize=24*1024):
    """ Decrypts a file using AES (CBC mode) with the
        given key. Parameters are similar to encrypt_file,
        with one difference: out_filename, if not supplied
        will be in_filename without its last extension
        (i.e. if in_filename is 'aaa.zip.enc' then
        out_filename will be 'aaa.zip')
    """
    if not out_filename:
        out_filename = os.path.splitext(in_filename)[0]

    with open(in_filename, 'rb') as infile:
        origsize = struct.unpack('<Q', infile.read(struct.calcsize('Q')))[0]
        iv = infile.read(16)
        decryptor = AES.new(key, AES.MODE_CBC, iv)

        with open(out_filename, 'wb') as outfile:
            while True:
                chunk = infile.read(chunksize)
                if len(chunk) == 0:
                    break
                outfile.write(decryptor.decrypt(chunk))

            outfile.truncate(origsize)

First the original size of the file is read from the first 8 bytes of the encrypted file. The IV is read next to correctly initialize the AES object. Then the file is decrypted in chunks, and finally it’s truncated to the original size, so the padding is thrown out.
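
The final truncation step can be tried without any encryption at all. In this sketch a temporary file is filled with some data plus space padding and then cut back to the recorded size; strip_padding is a hypothetical helper name:

```python
import os
import tempfile

def strip_padding(path, original_size):
    """Cut the file at path down to original_size bytes."""
    with open(path, 'r+b') as f:
        f.truncate(original_size)

# Create a scratch file holding 11 real bytes followed by 5 bytes of padding.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, 'wb') as f:
    f.write(b'hello world' + b' ' * 5)

strip_padding(path, 11)

with open(path, 'rb') as f:
    recovered = f.read()
os.remove(path)
```

Because the size is stored explicitly, trailing spaces that belong to the original file survive the round trip; only the bytes beyond the recorded size are dropped.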

10 Responses to “AES encryption of files in Python with PyCrypto”

  1. Mike Driscoll Says:

    I agree that the docs were confusing at best. I spent several days working with the people on the PyCrypto mailing list to figure out how to decrypt something that should have taken me only a couple of minutes to figure out. Still, the guys on the list were quite helpful.

    Thanks for doing a little more documentation on this cool module here!

    - Mike

  2. Terrasque Says:

    Interesting post :)

    Was looking a bit on it earlier, and typed down the basics. Your encrypt / decrypt function is way more advanced, though :)

    Agree with the lack of documentation, needed some guessing and experimenting to get it to work.

    Going to post my notes here, hope you’ll find them useful, maybe get some new ideas :)

    My notes (showing syntax of random pool, sha256 and rsa):
    —————Note: might not be completely correct, but seems to work —-
    Random pool:
    Crypto.Util.randpool

    rp = randpool.RandomPool(numbytes) # numbytes = number of bytes of randomness in pool. Default 160
    rp.entropy # entropy left in class
    rp.add_event() # adds time since last call in the entropy – call often
    rp.stir() # Mix up the randomness pool
    random = rp.get_bytes(int) # get int number of random bytes

    SHA256:
    Crypto.Hash.SHA256
    sha = SHA256.new()
    sha.update(plain)
    digest = sha.digest()

    RSA
    Crypto.PublicKey.RSA
    priv = RSA.Generate(bits=2048, rp.get_bytes) #Generate keypair
    pub = keys.publickey()

    import pickle
    strpriv = pickle.dumps(priv) #Save keypair to string
    priv = pickle.loads(strpriv)

    strpub = pickle.dumps(pub) #Save pubkey to string
    pub = pickle.loads(strpub)

    #Encrypt data with pubkey, need privkey to decrypt
    enc = pub.encrypt(smalldata, "") # len(smalldata) <= keysize
    smalldata = priv.decrypt(enc)

    sign = priv.sign(hash, "")
    verified = pub.verify(hash, sign)

  3. Terrasque Says:

    Two small typos in RSA part:
    priv = RSA.generate(2048, rp.get_bytes) # small g
    pub = priv.publickey() #priv, not keys

  4. PN Says:

    Strangeness. I only have the simple AES encryption running on my system (OS X 10.6.3, with Python 2.6.5 with the latest easy_install pycrypto applied).

    I ran the script with -i option to inspect the state and play around and get:

    >>> help(ciphertext)
    problem in <???c^?9p?Y?,?)E????rzݴe?1Į@?A
    i??J5_~-?o6b[?fnP?֎?;D?67?҆3sB
    ??8q??U??F?G???Y?{?Ni?t3"?”?H?{?DŽ??#*?????/??גK?(?c('??Ĺ?????b???&#C?Z}`?3<??y?ln
    aB&????sي????
    ??q ? – : __import__() argument 1 must be string without null bytes, not str

    Then my terminal sometimes starts printing garbage UNIX characters as if viewing binary data and needs to be reset.

    That can’t be good :-)

    I changed it to print values and I encounter the same, the shell is switched font as if printing binary characters.

    Will need to play some more.

  5. eliben Says:

    @Terrasque,

    Thanks for the notes. I notice you use RandomPool – note that on the front page of pycrypto.org they warn in bold about not using this for random numbers.

    @PN,

    Don’t try to print binary data on a terminal. You can always use the encode('hex') method of a string to print it in hexadecimal.

  6. Terrasque Says:

    Ooh, that wasn’t there when I last checked the page! Thanks :) *updates notes*

    Seems like os.urandom is a good drop-in replacement.

    “RSA.generate(int bits(2048), os.urandom)” should work.

    From urandom documentation: “The returned data should be unpredictable enough for cryptographic applications, though its exact quality depends on the OS implementation”

  7. Waqas Aman Says:

    Thanks for the “AES encryption of files in Python with PyCrypto”, seriously it helped a lot for a newbie like me. I have been trying to XOR (using PyCrypto) two random numbers Rand-1 and Rand-2 and check whether I can get back Rand-2 or not, but the results don’t resemble each other… I think I am coding it wrongly… kindly correct me

    from Crypto.Util import randpool
    from Crypto.Cipher import XOR

    one = randpool.RandomPool()
    print "Rand-1 :: ", one.get_bytes(16)

    two = randpool.RandomPool()
    print "Rand-2 :: ", two.get_bytes(16)

    objXOR = XOR.new(one.get_bytes(16)) # setting the key as Rand-1
    xored_value = objXOR.encrypt(two.get_bytes(16))

    print "Rand-1 xor Rand-2 :: ", xored_value

    print "Decrypted text : ", objXOR.decrypt(xored_value)

    After decryption, the decrypted xored_value and Rand-2 (two) don’t match up, which they SHOULD… here are the results:

    Rand-1 :: †X@óÂ’ÕÅ¡ävòU‰Ú
    Rand-2 :: ニラQ%マÒ0â4ᅠà>BÎz
    Rand-1 xor Rand-2 :: Áø‡‹¢j4A
    h‹Í7ÞrÁ
    Decrypted text : ˿÷
    ЍᆴÛòU33ラeᄇヤ

    a little help will be appreciated !! where I went wrong in the coding….??

  8. Muhannad Says:

    thanks a lot, very helpful , i was looking for this..i implemented AES encryption in python depending on a tutorial and some implementations in c# and php before i know about this module, but this one works very fast….thanks again.

  9. Melania Pinchock Says:

    Good article man Thanks

  10. Danilo Says:

    FANTASTIC. You are a life-saver!
