AES encryption of files in Python with PyCrypto

June 25th, 2010 at 6:26 pm

[Updated 15.11.2013: passing IV is required in the new PyCrypto]

The PyCrypto module seems to provide all one needs for employing strong cryptography in a program. It wraps a highly optimized C implementation of many popular encryption algorithms with a Python interface. PyCrypto can be built from source on Linux, and Windows binaries for various versions of Python 2.x were kindly made available by Michael Foord on this page.

My only gripe with PyCrypto is its documentation. The auto-generated API doc is next to useless, and this overview is somewhat dated and didn’t address the questions I had about the module. It isn’t surprising that a few modules were created just to provide simpler and better documented wrappers around PyCrypto.

In this article I want to present how to use PyCrypto for simple symmetric encryption and decryption of files using the AES algorithm.

Simple AES encryption

Here’s how one can encrypt a string with AES:

from Crypto.Cipher import AES

key = '0123456789abcdef'
IV = 16 * '\x00'           # Initialization vector: discussed later
mode = AES.MODE_CBC
encryptor = AES.new(key, mode, IV=IV)

text = 'j' * 64 + 'i' * 128
ciphertext = encryptor.encrypt(text)

Since the PyCrypto block-level encryption API is very low-level, it expects your key to be either 16, 24 or 32 bytes long (for AES-128, AES-192 and AES-256, respectively). The longer the key, the stronger the encryption.

Having keys of exact length isn’t very convenient, as you sometimes want to use some mnemonic password for the key. In this case I recommend picking a password and then using the SHA-256 digest algorithm from hashlib to generate a 32-byte key from it. Just replace the assignment to key in the code above with:

import hashlib

password = 'kitty'
key = hashlib.sha256(password).digest()

Keep in mind that this 32-byte key only has as much entropy as your original password. So be wary of brute-force password guessing, and pick a relatively strong password (kitty probably won’t do). What’s useful about this technique is that you don’t have to worry about manually padding your password – SHA-256 will scramble a 32-byte block out of any password for you.
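If brute-force guessing is a concern, one common mitigation is key stretching. This is a different technique from the single SHA-256 call shown above. The sketch below is written for Python 3 and uses only hashlib.pbkdf2_hmac from the standard library. The helper name derive_key, the 16-byte salt and the iteration count are assumptions chosen for illustration, not recommendations from this article.

```python
import hashlib
import os

def derive_key(password, salt=None, iterations=100000):
    """Derive a 32-byte key from a password with PBKDF2-HMAC-SHA256.

    Hypothetical helper: the salt size and iteration count are
    illustrative assumptions.
    """
    if salt is None:
        # A fresh random salt; it is not secret and must be stored so the
        # same key can be derived again at decryption time.
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'),
                              salt, iterations)
    return key, salt

key, salt = derive_key('kitty')
same_key, _ = derive_key('kitty', salt)   # same password + salt -> same key
```

The salt has to be kept next to the ciphertext, much like the IV discussed later. A weak password stays weak; stretching only makes each guess more expensive.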

The next thing the code does is set the block mode of AES. I won’t get into all the details, but unless you have some special requirements, CBC should be good enough for you.

We create a new AES encryptor object with Crypto.Cipher.AES.new, and give it the encryption key and the mode. Next comes the encryption itself. Again, since the API is low-level, the encrypt method expects your input to consist of an integral number of 16-byte blocks (16 is the size of the basic AES block).
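To make the block-size requirement concrete, here is a small Python 3 sketch of one common padding convention (PKCS#7 style), where each padding byte holds the number of bytes added. This is not the scheme used by the file functions later in this post, which pad with spaces and store the original size. The names pad and unpad are hypothetical helpers, not PyCrypto functions.

```python
BLOCK_SIZE = 16  # AES block size in bytes

def pad(data, block_size=BLOCK_SIZE):
    """Append n bytes, each of value n, so len(result) is a multiple of block_size."""
    n = block_size - len(data) % block_size   # always between 1 and block_size
    return data + bytes([n]) * n

def unpad(padded, block_size=BLOCK_SIZE):
    """Remove padding added by pad(); raise ValueError if it looks wrong."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError('padded data must be a non-empty multiple of the block size')
    n = padded[-1]
    if n < 1 or n > block_size or padded[-n:] != bytes([n]) * n:
        raise ValueError('invalid padding')
    return padded[:-n]

padded = pad(b'hello')
restored = unpad(padded)
```

Note that input which is already a multiple of 16 bytes still gets a full extra block, so that unpad can always tell padding from data.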

The encryptor object has an internal state when used in the CBC mode, so if you try to encrypt the same text with the same encryptor once again – you will get different results. So be careful to create a fresh AES encryptor object for any encryption/decryption job.
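The effect of that internal state can be illustrated without PyCrypto. The following Python 3 sketch is a toy model of CBC chaining only: toy_block_function is a stand-in built from SHA-256, it is NOT AES and is not even reversible, and ToyCBCEncryptor is a hypothetical class invented for this example. It shows why a reused object produces different output while a fresh object reproduces the first result.

```python
import hashlib

BLOCK = 16

def toy_block_function(key, block):
    # Stand-in for the AES block cipher. NOT AES and not invertible;
    # it exists only to make the chaining visible.
    return hashlib.sha256(key + block).digest()[:BLOCK]

class ToyCBCEncryptor(object):
    def __init__(self, key, iv):
        self.key = key
        self.prev = iv            # chaining state, initially the IV

    def encrypt(self, data):
        if len(data) % BLOCK != 0:
            raise ValueError('input must be a multiple of 16 bytes')
        out = []
        for i in range(0, len(data), BLOCK):
            block = data[i:i + BLOCK]
            # CBC: XOR the plaintext block with the previous output block
            mixed = bytes(a ^ b for a, b in zip(block, self.prev))
            self.prev = toy_block_function(self.key, mixed)
            out.append(self.prev)
        return b''.join(out)

key = b'0123456789abcdef'
iv = bytes(16)                    # all zeros, as in the simple example
text = b'j' * 32

enc = ToyCBCEncryptor(key, iv)
first = enc.encrypt(text)
second = enc.encrypt(text)                      # same object: state carried over
fresh = ToyCBCEncryptor(key, iv).encrypt(text)  # new object: state reset to the IV
```

Here first and fresh are equal, while second differs from both, because the second call starts chaining from the last output block of the first call instead of from the IV.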

Decryption

To decrypt the ciphertext, simply add:

decryptor = AES.new(key, mode, IV=IV)
plain = decryptor.decrypt(ciphertext)

And you get your plaintext back again.

A word about the initialization vector

The initialization vector (IV) is an important part of block encryption algorithms that work in chained modes like CBC. For the simple example above I’ve ignored the IV (just using a buffer of zeros), but for a more serious application this is a grave mistake. I don’t want to get too deep into cryptographic theory here, but it suffices to say that the IV is as important as the salt in hashed passwords, and the lack of correct IV usage led to the cracking of the WEP encryption for wireless LAN.

PyCrypto allows one to pass an IV into the AES.new creator function. For maximal security, the IV should be randomly generated for every new encryption and can be stored together with the ciphertext. Knowledge of the IV won’t help the attacker crack your encryption. What can help him, however, is your reusing the same IV with the same encryption key for multiple encryptions.
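The advice above can be sketched in a few lines of Python 3. The helper names new_iv, attach_iv and split_iv are made up for this illustration and are not part of PyCrypto. Using os.urandom as the random source is my assumption here; it is the standard library's interface to the operating system's random generator.

```python
import os

IV_SIZE = 16  # AES block size, which is also the CBC IV size

def new_iv():
    """Return a fresh random IV; call this once per encryption."""
    return os.urandom(IV_SIZE)

def attach_iv(iv, ciphertext):
    """Store the IV in front of the ciphertext. The IV is not secret."""
    return iv + ciphertext

def split_iv(message):
    """Recover (iv, ciphertext) from a message built by attach_iv."""
    if len(message) < IV_SIZE:
        raise ValueError('message too short to contain an IV')
    return message[:IV_SIZE], message[IV_SIZE:]

iv = new_iv()
message = attach_iv(iv, b'<ciphertext bytes would go here>')
recovered_iv, recovered_ciphertext = split_iv(message)
```

The receiver needs only the key and the message: the IV travels with the ciphertext and is read back before decryption starts.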

Encrypting and decrypting files

The following function encrypts a file of any size. It makes sure to pad the file to a multiple of the AES block length, and also handles the random generation of the IV.

import os, random, struct
from Crypto.Cipher import AES

def encrypt_file(key, in_filename, out_filename=None, chunksize=64*1024):
    """ Encrypts a file using AES (CBC mode) with the
        given key.

        key:
            The encryption key - a string that must be
            either 16, 24 or 32 bytes long. Longer keys
            are more secure.

        in_filename:
            Name of the input file

        out_filename:
            If None, '<in_filename>.enc' will be used.

        chunksize:
            Sets the size of the chunk which the function
            uses to read and encrypt the file. Larger chunk
            sizes can be faster for some files and machines.
            chunksize must be divisible by 16.
    """
    if not out_filename:
        out_filename = in_filename + '.enc'

    iv = ''.join(chr(random.randint(0, 0xFF)) for i in range(16))
    encryptor = AES.new(key, AES.MODE_CBC, iv)
    filesize = os.path.getsize(in_filename)

    with open(in_filename, 'rb') as infile:
        with open(out_filename, 'wb') as outfile:
            outfile.write(struct.pack('<Q', filesize))
            outfile.write(iv)

            while True:
                chunk = infile.read(chunksize)
                if len(chunk) == 0:
                    break
                elif len(chunk) % 16 != 0:
                    chunk += ' ' * (16 - len(chunk) % 16)

                outfile.write(encryptor.encrypt(chunk))

Since it might have to pad the file to fit into a multiple of 16, the function saves the original file size in the first 8 bytes of the output file (more precisely, the first sizeof(long long) bytes). It randomly generates a 16-byte IV and stores it in the file as well. Then, it reads the input file chunk by chunk (with chunk size configurable), encrypts the chunk and writes it to the output. The last chunk is padded with spaces, if required.
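The header layout described above (8 bytes of size, then 16 bytes of IV) can be tried out in isolation. This is a Python 3 sketch using io.BytesIO in place of a real file. write_header and read_header are hypothetical helpers, and the explicit length checks are my addition; they turn a truncated or empty input into a readable error.

```python
import io
import struct

SIZE_FORMAT = '<Q'   # little-endian unsigned 64-bit integer, as in encrypt_file
IV_SIZE = 16

def write_header(outfile, filesize, iv):
    """Write the original file size followed by the IV."""
    outfile.write(struct.pack(SIZE_FORMAT, filesize))
    outfile.write(iv)

def read_header(infile):
    """Read back (filesize, iv), checking that enough bytes are present."""
    size_len = struct.calcsize(SIZE_FORMAT)   # 8 bytes
    size_field = infile.read(size_len)
    if len(size_field) != size_len:
        raise ValueError('file too short: missing size field')
    (filesize,) = struct.unpack(SIZE_FORMAT, size_field)
    iv = infile.read(IV_SIZE)
    if len(iv) != IV_SIZE:
        raise ValueError('file too short: missing IV')
    return filesize, iv

buf = io.BytesIO()
write_header(buf, 1000, b'\x01' * IV_SIZE)
header_length = len(buf.getvalue())
buf.seek(0)
size, iv = read_header(buf)
```

The encrypted data starts right after these 24 bytes, so an encrypted file is always 24 bytes plus the padded length of the input.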

Working in chunks makes sure that large files can be efficiently processed without reading them wholly into memory. For example, with the default chunk size it takes about 1.2 seconds on my computer to encrypt a 50MB file. PyCrypto is fast!
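The chunking logic can be separated from the encryption itself. Below is a Python 3 sketch of the same read-and-pad loop as a generator; padded_chunks is a hypothetical name, and io.BytesIO stands in for an open file. It uses the space padding from encrypt_file, written as bytes.

```python
import io

def padded_chunks(infile, chunksize=64 * 1024, block=16):
    """Yield chunks from infile; pad a short final chunk with spaces
    so that every yielded chunk is a multiple of the block size."""
    if chunksize % block != 0:
        raise ValueError('chunksize must be divisible by the block size')
    while True:
        chunk = infile.read(chunksize)
        if len(chunk) == 0:
            break
        if len(chunk) % block != 0:
            chunk += b' ' * (block - len(chunk) % block)
        yield chunk

data = io.BytesIO(b'x' * 100)
lengths = [len(chunk) for chunk in padded_chunks(data, chunksize=64)]
```

With 100 bytes of input and a 64-byte chunk size, the first chunk is 64 bytes and the remaining 36 bytes are padded up to 48. Only one chunk is held in memory at a time.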

Decrypting the file can be done with:

def decrypt_file(key, in_filename, out_filename=None, chunksize=24*1024):
    """ Decrypts a file using AES (CBC mode) with the
        given key. Parameters are similar to encrypt_file,
        with one difference: out_filename, if not supplied
        will be in_filename without its last extension
        (i.e. if in_filename is 'aaa.zip.enc' then
        out_filename will be 'aaa.zip')
    """
    if not out_filename:
        out_filename = os.path.splitext(in_filename)[0]

    with open(in_filename, 'rb') as infile:
        origsize = struct.unpack('<Q', infile.read(struct.calcsize('Q')))[0]
        iv = infile.read(16)
        decryptor = AES.new(key, AES.MODE_CBC, iv)

        with open(out_filename, 'wb') as outfile:
            while True:
                chunk = infile.read(chunksize)
                if len(chunk) == 0:
                    break
                outfile.write(decryptor.decrypt(chunk))

            outfile.truncate(origsize)

First the original size of the file is read from the first 8 bytes of the encrypted file. The IV is read next to correctly initialize the AES object. Then the file is decrypted in chunks, and finally it’s truncated to the original size, so the padding is thrown out.
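The truncation step is also why the original size is stored at all. The Python 3 sketch below, with io.BytesIO standing in for the output file, compares truncating to the stored size with simply stripping trailing spaces. The sample data is made up for the illustration.

```python
import io

original = b'trailing space '                    # 15 bytes, ends in a real space
padded = original + b' ' * (16 - len(original))  # padded to 16 bytes

# What decrypt_file does: write everything, then cut back to the stored size.
out = io.BytesIO()
out.write(padded)
out.truncate(len(original))
by_truncate = out.getvalue()

# A tempting shortcut that loses data when the file really ends in spaces.
by_strip = padded.rstrip(b' ')
```

Truncating to the stored size returns the input byte for byte, while stripping spaces would also remove the genuine trailing space.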


56 Responses to “AES encryption of files in Python with PyCrypto”

  1. Mike Driscoll Says:

    I agree that the docs were confusing at best. I spent several days working with the people on the PyCrypto mailing list to figure out how to decrypt something that should have taken me only a couple of minutes to figure out. Still, the guys on the list were quite helpful.

    Thanks for doing a little more documentation on this cool module here!

    - Mike

  2. Terrasque Says:

    Interesting post :)

    Was looking a bit on it earlier, and typed down the basics. Your encrypt / decrypt function is way more advanced, though :)

    Agree with the lack of documentation, needed some guessing and experimenting to get it to work.

    Going to post my notes here, hope you’ll find them useful, maybe get some new ideas :)

    My notes (showing syntax of random pool, sha256 and rsa):
    —————Note: might not be completely correct, but seems to work —-
    Random pool:
    Crypto.Util.randpool

    rp = randpool.RandomPool(numbytes) # numbytes = number of bytes of randomness in pool. Default 160
    rp.entropy # entropy left in class
    rp.add_event() # adds time since last call in the entropy – call often
    rp.stir() # Mix up the randomness pool
    random = rp.get_bytes(int) # get int number of random bytes

    SHA256:
    Crypto.Hash.SHA256
    sha = SHA256.new()
    sha.update(plain)
    digest = sha.digest()

    RSA
    Crypto.PublicKey.RSA
    priv = RSA.Generate(bits=2048, rp.get_bytes) #Generate keypair
    pub = keys.publickey()

    import pickle
    strpriv = pickle.dumps(priv) #Save keypair to string
    priv = pickle.loads(strpriv)

    strpub = pickle.dumps(pub) #Save pubkey to string
    pub = pickle.loads(strpub)

    #Encrypt data with pubkey, need privkey to decrypt
    enc = pub.encrypt(smalldata, "") # len(smalldata) <= keysize
    smalldata = priv.decrypt(enc)

    sign = priv.sign(hash, "")
    verified = pub.verify(hash, sign)

  3. Terrasque Says:

    Two small typos in RSA part:
    priv = RSA.generate(2048, rp.get_bytes) # small g
    pub = priv.publickey() #priv, not keys

  4. PN Says:

    Strangeness. I only have the simple AES encryption running on my system (OS X 10.6.3, with Python 2.6.5 with the latest easy_install pycrypto applied).

    I ran the script with -i option to inspect the state and play around and get:

    >>> help(ciphertext)
    problem in <???c^?9p?Y?,?)E????rzݴe?1Į@?A
    i??J5_~-?o6b[?fnP?֎?;D?67?҆3sB
    ??8q??U??F?G???Y?{?Ni?t3"?”?H?{?DŽ??#*?????/??גK?(?c('??Ĺ?????b???&#C?Z}`?3<??y?ln
    aB&????sي????
    ??q ? – : __import__() argument 1 must be string without null bytes, not str

    Then my terminal sometimes starts printing garbage UNIX characters as if viewing binary data and needs to be reset.

    That can’t be good :-)

    I changed it to print values and I encounter the same, the shell is switched font as if printing binary characters.

    Will need to play some more.

  5. eliben Says:

    @Terrasque,

    Thanks for the notes. I notice you use RandomPool – note that on the front page of pycrypto.org they warn in bold about not using this for random numbers.

    @PN,

    Don’t try to print binary data on a terminal. You can always use the encode('hex') method of a string to print it in hexadecimal.

  6. Terrasque Says:

    Ooh, that wasn’t there when I last checked the page! Thanks :) *updates notes*

    Seems like os.urandom is a good drop-in replacement.

    “RSA.generate(int bits(2048), os.urandom)” should work.

    From urandom documentation: “The returned data should be unpredictable enough for cryptographic applications, though its exact quality depends on the OS implementation”

  7. Waqas Aman Says:

    Thanks for the “AES encryption of files in Python with PyCrypto”, seriously it helped a lot for a newbie like me. I have been trying to XOR (using PyCrypto) two random numbers Rand-1 and Rand-2 and check whether I can get back Rand-2 or not, but the results don’t resemble each other… I think I am coding it wrongly… kindly correct me

    from Crypto.Util import randpool
    from Crypto.Cipher import XOR

    one = randpool.RandomPool()
    print "Rand-1 :: ", one.get_bytes(16)

    two = randpool.RandomPool()
    print "Rand-2 :: ", two.get_bytes(16)

    objXOR = XOR.new(one.get_bytes(16)) # setting the key as Rand-1
    xored_value = objXOR.encrypt(two.get_bytes(16))

    print "Rand-1 xor Rand-2 :: ", xored_value

    print "Decrypted text : ", objXOR.decrypt(xored_value)

    After decryption, the decrypted xored_value and Rand-2 (two) don’t match up, which they SHOULD… here are the results:

    Rand-1 :: †X@óÂ’ÕÅ¡ävòU‰Ú
    Rand-2 :: ニラQ%マÒ0â4ᅠà>BÎz
    Rand-1 xor Rand-2 :: Áø‡‹¢j4A
    h‹Í7ÞrÁ
    Decrypted text : ˿÷
    ЍᆴÛòU33ラeᄇヤ

    a little help will be appreciated !! where I went wrong in the coding….??

  8. Muhannad Says:

    thanks a lot, very helpful , i was looking for this..i implemented AES encryption in python depending on a tutorial and some implementations in c# and php before i know about this module, but this one works very fast….thanks again.

  9. Melania Pinchock Says:

    Good article man Thanks

  10. Danilo Says:

    FANTASTIC. You are a life-saver!

  11. foo Says:

    Thank you so much for this!!!!!!

  12. Irish Says:

    thank you man! that was very helpful!
    thanks!

  13. Steve Says:

    Brilliant! Thank you very much for posting this. It saved me tons of time.

  14. J. N. Says:

    I was looking to implement reasonable encryption for a string that I’m passing over a wire. It’s probably overkill, so I was looking for something quick to implement. I found your code, and figured I could modify it to use StringIO pretty easily, so that’s what I did. I like the idea of being able to pass a file or a StringIO to the same encryption / decryption methods.

    Thanks for the post!

    –J.

    #!/usr/bin/env python
    #
    # Code adapted from: http://eli.thegreenplace.net/2010/06/25/aes-encryption-of-files-in-python-with-pycrypto/
    #
    #
    
    import os, random, struct
    from Crypto.Cipher import AES
    from StringIO import StringIO
    import hashlib
    import base64
    
    ## def encrypt_file(key, in_filename, out_filename=None, chunksize=64*1024):
    ## ( This is an adaptation from using filenames in order that StringIO can be used to encrypt a string. )
    ## Note: If in_file / out_file is provided, open with +b!
    
    def encrypt_file(key, in_file, out_file=None, chunksize=64*1024):
        """ Encrypts a file using AES (CBC mode) with the
            given key.
    
            key:
                The encryption key - a string that must be
                either 16, 24 or 32 bytes long. Longer keys
                are more secure.
    
            in_file:
                Input file
    
            out_file:
                If None, a StringIO will be returned.
    
            chunksize:
                Sets the size of the chunk which the function
                uses to read and encrypt the file. Larger chunk
                sizes can be faster for some files and machines.
                chunksize must be divisible by 16.
        """
        if not out_file:
            out_file = StringIO()
    
        iv = ''.join(chr(random.randint(0, 0xFF)) for i in range(16))
        encryptor = AES.new(key, AES.MODE_CBC, iv)
    
        in_file.seek(0,2)
        filesize=in_file.tell()
        in_file.seek(0)
    
        # filesize = os.path.getsize(in_file)
    
        infile=in_file
    
        outfile=out_file
        outfile.seek(0)
    
        outfile.write(struct.pack('<Q', filesize))
        outfile.write(iv)
    
        while True:
    
            chunk = infile.read(chunksize)
            if len(chunk) == 0:
                break
            elif len(chunk) % 16 != 0:
                chunk += ' ' * (16 - len(chunk) % 16)
    
            outfile.write(encryptor.encrypt(chunk))
    
        outfile.seek(0)
        return outfile
    
    ## def decrypt_file(key, in_filename, out_filename=None, chunksize=24*1024):
    ## ( This is an adaptation from using filenames in order that StringIO can be used to encrypt a string. )
    ## Note: If in_file / out_file is provided, open with +b!
    
    def decrypt_file(key, in_file, out_file=None, chunksize=24*1024):
        """ Decrypts a file using AES (CBC mode) with the
            given key. Parameters are similar to encrypt_file.
        """
        if not out_file:
            out_file = StringIO()
    
        infile=in_file
        infile.seek(0)
    
        outfile=out_file
        outfile.seek(0)
    
        origsize = struct.unpack('<Q', infile.read(struct.calcsize('Q')))[0]
        iv = infile.read(16)
        decryptor = AES.new(key, AES.MODE_CBC, iv)
    
        while True:
            chunk = infile.read(chunksize)
            if len(chunk) == 0:
                break
            outfile.write(decryptor.decrypt(chunk))
    
        outfile.truncate(origsize)
    
        outfile.seek(0)
        return outfile
    
    # Method suggested by Eli by turn mnemonic password into 32 byte key.
    def getHashKey(aKey):
        return hashlib.sha256(aKey).digest()
    
    # My ( J. Norment's ) Additions
    
    def getInFile(aFileName=None):
    
        if not aFileName:
            return StringIO()
        else:
            return open(aFileName,'rb')
    
    def getOutFile(aFileName=None):
    
        if not aFileName:
            return StringIO()
        else:
            return open(aFileName,'wb')
    
    def getB64encoded(aString):
        return base64.b64encode(aString)
    
    def getB64decoded(aString):
        return base64.b64decode(aString)
  15. Michael Says:

    Thanks for your code…

  16. Dave Pawson Says:

    Thanks. Clear and very usable.
    Much appreciated.

    Dave

  17. onlyone Says:

    Hey, this code does not work. has any one tested with md5sum the files before and after ?

    Thanks

  18. eliben Says:

    onlyone,

    Can you show/link to a case where you think it “does not work”? The decrypted file is identical to the input, as far as I have tested.

  19. onlyone Says:

    Hi Eliben, thanks for your response. in my case it is a network capture (pcap file with around 200MB)

    I’m doing this :

    random_bytes = os.urandom(16)
    encrypt_file(key=random_bytes, in_filename=file_to_encrypt, chunksize=268435456)
    decrypt_file(key=random_bytes, in_filename=file_to_encrypt+".enc", out_filename="/install/decrypt/"+file_to_encrypt, chunksize=268435456)
    # I know that os.random is not safe. also i have tried with a string generator that returns for example 'K5TGZ57E0QHAQJOE'

    The decrypted file does not match the md5sum of the original file. Plus, after the decryption I cannot open the file in Wireshark.

    any ideas?

  20. eliben Says:

    onlyone,

    Sorry, I can’t reproduce the problem at all. I tried with your key, chunksize and some large files (100-300 MB) and the result is always byte-identical to the input. Do you have the same problem with small files (i.e. some text file)? I also think it would be better to move the discussion to email at this point.

  21. onlyone Says:

    Hey all, it works after all ! Sorry !

  22. Dhanush Says:

    Does this program encrypt the contents of a file or some random text?

  23. Dhanush Says:

    and in encryptor = AES.new(key, AES.MODE_CBC, iv)
    what is 'iv' for?

  24. eliben Says:

    Dhanush,

    I don’t understand your first question. As for the second, there’s a section in the post above called A word about the initialization vector. For more details, Google “encryption initialization vector”.

  25. CubeX Says:

    Hey, Im trying to run your code, but i cant get it done. My error:
    File "C:\Python32\lib\site-packages\Crypto\Cipher\blockalgo.py", line 141, in __init__
    self._cipher = factory.new(key, *args, **kwargs)
    ValueError: IV must be 16 bytes long

    Methods I tried:
    password = 'kitty'
    #key = hashlib.sha256(password.encode(encoding='utf-8')).digest()
    key = '0123456789abcdef'
    Neither key works, any thoughts?

  26. eliben Says:

    CubeX,

    This code is for Python 2. It has to be adapted for Python 3 (for example, using bytes instead of strings, as in IV – b''.join ...)

  27. nikos Says:

    Hi,

    I try to encrypt a 4096-bytes file in Python with pycrypto, write it to SD card
    and then read it from embedded board with C.

    From the C side I have used http://www.literatecode.com/aes256
    which works on 16-byte long array
    The source code is there and is not very large.

    both sides ECB mode

    key = "0123456789ABCDEFGHIJKLMNOPQRSTUV" # Use the same from the 'other side'
    mode = AES.MODE_ECB

    encryptor = AES.new(key, mode) #Using pycrypt module
    encrypted = encryptor.encrypt(string16) # string16 has length 16
    ….

    But there is no compatibility.
    On the board side, the board can decode the board-encoded file.
    The same goes for the PC-Python side: PC-Python can decode the PC-encoded file,
    but the board cannot read the PC file!

    Any Idea what could be wrong?

    Regards
    Nikos

  28. eliben Says:

    nikos,

    This is really a PyCrypto vs. that guy’s implementation question. You should make sure all the parameters are the same, padding issues (for a text shorter than the key) etc.

  29. Brian Says:

    When I try NIST test vectors (http://csrc.nist.gov/archive/aes/rijndael/wsdindex.html), I can’t seem to get PyCrypto to pass them. I’m guessing there is an endianness rathole somewhere, but it seems to go beyond endian-flipping the input and the output to determine the incompatibility. Just curious if this is any concern, I’m guessing the security is uncompromised given decent chaining.

  30. azuki Says:

    I know little about the inner workings of python and equally little about cryptography so I can’t say anything for sure but…

    1. Isn’t python random highly predictable? If we can anticipate the iv, we can derive some information about the plaintext, which isn’t secure.

    2. About the sha256… What I know is that normal key functions use sha256 thousands, if not hundred of thousands or even millions of times in order to make the password as hard to find via a dictionary attack as possible. With just one sha256, the attacker will find the password in no time even if it is max length

    3. There are PyCrypto functions that deal with both of these issues. Use these instead.

    4. This does nothing to prevent tampering.

  31. Plutonian Says:

    I haven’t been able to figure out what the '[0]' at the end of the following two lines means.

    if not out_filename:
        out_filename = os.path.splitext(in_filename)[0]

    with open(in_filename, 'rb') as infile:
        origsize = struct.unpack('<Q', infile.read(struct.calcsize('Q')))[0]

  32. eliben Says:

    @Plutonian,

    Python list/tuple indexing.

  33. Kumar Varun Says:

    Hi,

    I’m having trouble switching from Linux to Windows.

    i used your encryption/decryption function in an application i developed on linux. it was working fine on ubuntu.

    but later i ported the same app on windows 7 and also took the Encrypted file (which i encrypted on Linux). Now when i’m running the app and trying to decrypt that Encrypted File. i’m getting a strange error.

    ValueError: Input strings must be a multiple of 16 in length

    Will it be a trouble to encrypt a file on linux and then trying to decrypt on Windows?

  34. Kumar Varun Says:

    hey please discard my previous query i did a mistake. i did a small change in the code and instead of filename, i was passing file object.

    For my infile, i opened it in ‘r’ mode instead of ‘rb’, that caused this trouble.

    Apologize for my mistake.

  35. Colin Mortimer Says:

    I’m looking for a little support with the code from this page. Thanks in advance for the help and for the original code. I am implementing this with Python 3.x and while I have managed to get it working on some files it doesn’t work on all. It seems the issues I am getting occur with both large and small files. Some files I can encrypt and decrypt and the hash digest still matches, other it does not. When looking in to the files with a hex editor I can see that towards the end of the decrypted file the data does not match of the original file. In the ASCII view of the editor I see \x this and \x that instead of ASCII representations of characters. Does anybody have any idea why that may be? See below the encryption and decryption scripts I have been using, only slight modifications from the code listed here:

    import os
    import random
    import struct
    import sys
    import hashlib
    from Crypto.Cipher import AES
    
    def encrypt_file(password, in_filename, out_filename=None, chunksize=64*1024):
        """ Encrypts a file using AES (CBC mode) with the
            given key.
    
            key:
                The encryption key - a string that must be
                either 16, 24 or 32 bytes long. Longer keys
                are more secure.
    
            in_filename:
                Name of the input file
    
            out_filename:
                If None, '<in_filename>.enc' will be used.
    
            chunksize:
                Sets the size of the chunk which the function
                uses to read and encrypt the file. Larger chunk
                sizes can be faster for some files and machines.
                chunksize must be divisible by 16.
        """
    
        key = hashlib.sha256((password).encode('utf-8')).digest()
    
        if not out_filename:
            out_filename = in_filename + '.encrypted'
    
        iv = bytes([ random.randint(0, 0xFF) for i in range(16)])
        encryptor = AES.new(key, AES.MODE_CBC, iv)
        filesize = os.path.getsize(in_filename) 
    
        with open(in_filename, 'rb') as infile:
            with open(out_filename, 'wb') as outfile:
                outfile.write(struct.pack('<Q', filesize))
                outfile.write(iv)
    
                while True:
                    chunk = infile.read(chunksize)
                    if len(chunk) == 0:
                        break
                    elif len(chunk) % 16 != 0:
                        chunk = str(chunk)
                        chunk += ' ' * (16 - len(chunk) % 16)
    
                    outfile.write(encryptor.encrypt(chunk))
    
    def main():
        if len(sys.argv) < 3:
            print("Usage = script.py + password + <input file>")
            exit()
    
        encrypt_file(sys.argv[1], sys.argv[2])
    
    if __name__ == "__main__":
        main()

    and decrypt:

    import os
    import random
    import struct
    import sys
    import hashlib
    from Crypto.Cipher import AES
    
    def decrypt_file(password, in_filename, out_filename=None, chunksize=64*1024):
        """ Decrypts a file using AES (CBC mode) with the
            given key. Parameters are similar to encrypt_file,
            with one difference: out_filename, if not supplied
            will be in_filename without its last extension
            (i.e. if in_filename is 'aaa.zip.enc' then
            out_filename will be 'aaa.zip')
        """
    
        key = hashlib.sha256((password).encode('utf-8')).digest()
    
        if not out_filename:
            out_filename = os.path.splitext(in_filename)[0]
    
        with open(in_filename, 'rb') as infile:
            origsize = struct.unpack('<Q', infile.read(struct.calcsize('Q')))[0]
            iv = infile.read(16)
            decryptor = AES.new(key, AES.MODE_CBC, iv)
    
            with open(out_filename, 'wb') as outfile:
                while True:
                    chunk = infile.read(chunksize)
                    if len(chunk) == 0:
                        break
                    outfile.write(decryptor.decrypt(chunk))
    
                outfile.truncate(origsize)
    
    def main():
        if len(sys.argv) < 3:
            print("Usage = script.py + password + <input file>")
            exit()
    
        decrypt_file(sys.argv[1], sys.argv[2])
    
    if __name__ == "__main__":
        main()

    What have I done wrong?

    Thanks again.

  36. mrab Says:

    @Colin Mortimer: Your ‘encrypt_file’ function has these lines:

    chunk = str(chunk)
    chunk += ' ' * (16 - len(chunk) % 16)

    Try sticking with bytes instead:

    chunk += b' ' * (16 - len(chunk) % 16)
  37. Colin Mortimer Says:

    @mrab: You sir are a legend. Thanks for your response! I have tested with that change across both large and small files and when checking the hash of the files am ending up with an exact copy of the original once decrypted. Thanks again. Colin

  38. Matěj Cepl Says:

    Allow me small unashamed self-promotion. The similar things you can do with pycrypto (although, I think in better way … I prefer using highly-tested and engineered-by-true-paranoids library to pycrypto) you can do with m2crypto. To move it to the current age, I have tried to create 100% faithful mirror of SVN in git: https://luther.ceplovi.cz/git/m2crypto.git/ (mirror on https://github.com/mcepl/m2crypto).

    Also in the branch python 3 I have my current state of effort to create port to Python 3. import works, but now I beat up the code to submission, so that the testing suite succeeds. Patches and pull requests are more than welcome!

    Matěj

  39. nali Says:

    Hi eliben,

    I used your code to encrypt a file which has 16 byte aligned 0xFF stream at the end. So since 16 byte of 0xFF is repeating, shouldn’t I get a repetitive stream at the end of encrypted file? But I don’t. What might be the issue?

  40. Jaime Gago Says:

    Great tutorial for the newbies to PyCrypto like me, and the 2 functions are working A-OK, thanks a lot!

  41. Colin Says:

    Nali: I think the reason that you are seeing that is due to the fact that the script uses the CBC (Cipher Block Chaining) method. This means that each block to be encrypted is manipulated first with details of the last encrypted block. This makes decryption reliant upon all blocks being present and also prevents repetitive strings in plain text blocks creating repetitive strings in the cipher text blocks.

  42. Giri Says:

    When I tried to implement the same, I am getting the following error:

    origsize = struct.unpack('<Q', infile.read(struct.calcsize('<Q')))[0]
    struct.error: unpack requires a string argument of length 8

    Where I could have gone wrong?

  43. Colin Says:

    Giri,

    Have a look at the second ('<Q') in your code and then look at the example code again. There doesn’t need to be a < on the second Q.

    Colin

  44. Maaike Says:

    I’ve tried to implement this in my Python 2.7 but I keep getting an error. It says SyntaxError: non-keyword arg after keyword arg. The key I have chosen is the following: "nNuErEBt6xoPF5P2". What am I doing wrong?

    Maaike

  45. Henrik Says:

    Hey im getting the following error:

    Traceback (most recent call last):
    File "./backup.py", line 114, in
    decrypt_file(key, temp_crypted_filename)
    File "./backup.py", line 63, in decrypt_file
    origsize = struct.unpack('<Q', infile.read(struct.calcsize('Q')))[0]
    struct.error: unpack requires a string argument of length 8

    Anybody an idea?

  46. harish barvekar Says:

    This error is coming when decrypting . Plz help me….

    outfile.write(decryptor.decrypt(chunk))
    ValueError: Input strings must be a multiple of 16 in length

  47. katie Says:

    Hello, this is a great tutorial, however I have what might seem to you as a dumb question, does my data always need to be 16 bytes only? What happens if it is >16 bytes; do I need to feed AES by separating it into chunks of 16 bytes or will the AES take care of it. Please shed some light on it. Reply very much appreciated.

  48. eliben Says:

    @katie,

    The examples in this post show arbitrary-sized data, so I’m not sure what you’re asking.

  49. Brando Miranda Says:

    I was wondering if this code worked for arbitrary lengths of files.

    Since you are using the first 8 bytes of the file that you encrypt to store the length, does that mean the decrypt function works still with arbitrary lengths of files?

    Thanks in advance!

  50. eliben Says:

    @Brando,

    Yes, it works for arbitrary sizes.

  51. Brando Miranda Says:

    I have another quick question, why are you reading a file in chunks versus just calling .read() and then encrypting the whole thing?

    Thanks in advance again! :)

  52. eliben Says:

    @Brando,

    The input file may be huge, so it’s a common idiom not to read it wholly into memory. There may not be enough memory. If you’re sure the file is fairly small, you can do it in a single read, although in Python at least this will only save you a couple of lines of code.

  53. Bhanumathi Says:

    i need a python program that should encrypt an audio file….. the encrypted audio file has to be played(mean the whole audio file must be split into small chunks of audio.. then combine the spilted audio chunks after shuffle.) mean original file =original song.. encrypted file =mixed chunks of original and also we have to decrypt that to original song….. I tried lot but it encrypting audio header also hence the encrypted audio is not in playable format plzzzz help me its part of my project in our college

  54. Deepal Jayasekara Says:

    An amazing blog!! You saved my life :) :)

  55. henk Says:

    Your post provided a good introduction to using encryption in Python. I used your code and it worked like a charm. During my Google search I found another very interesting post that I would like to share with you. The post provides information about how to make the keywords more secure. I will copy the link and I suggest you read the part about “key stretching”:

    http://bityard.blogspot.de/2012/08/symmetric-crypto-with-pycrypto-part-3.html

    I think I will integrate this with the code you supplied here.

    Thanks for the time you invested in writing this post. It was very, very helpful.

  56. Matt Says:

    Any way you know how to make an IV like C# does with Rfc2898DeriveBytes(#key#, salt)?
    I need it to work with
    aes = Crypto.Cipher.AES.new('jxkLb39$Vxp3948#lxMx', Crypto.Cipher.AES.MODE_CBC, salt)

    My JSON file that I need to access is being encrypted by .NET using that weird IV…

    any clues?
