AES encryption of files in Python with PyCrypto
June 25th, 2010 at 6:26 pm
The PyCrypto module seems to provide all one needs for employing strong cryptography in a program. It wraps a highly optimized C implementation of many popular encryption algorithms with a Python interface. PyCrypto can be built from source on Linux, and Windows binaries for various versions of Python 2.x were kindly made available by Michael Foord on this page.
My only gripe with PyCrypto is its documentation. The auto-generated API doc is next to useless, and this overview is somewhat dated and didn’t address the questions I had about the module. It isn’t surprising that a few modules were created just to provide simpler and better documented wrappers around PyCrypto.
In this article I want to present how to use PyCrypto for simple symmetric encryption and decryption of files using the AES algorithm.
Simple AES encryption
Here’s how one can encrypt a string with AES:
from Crypto.Cipher import AES
key = '0123456789abcdef'
mode = AES.MODE_CBC
encryptor = AES.new(key, mode)
text = 'j' * 64 + 'i' * 128
ciphertext = encryptor.encrypt(text)
Since the PyCrypto block-level encryption API is very low-level, it expects your key to be either 16, 24 or 32 bytes long (for AES-128, AES-192 and AES-256, respectively). The longer the key, the stronger the encryption.
Having keys of exact length isn’t very convenient, as you sometimes want to use some mnemonic password for the key. In this case I recommend picking a password and then using the SHA-256 digest algorithm from hashlib to generate a 32-byte key from it. Just replace the assignment to key in the code above with:
import hashlib
password = 'kitty'
key = hashlib.sha256(password).digest()
Keep in mind that this 32-byte key only has as much entropy as your original password. So be wary of brute-force password guessing, and pick a relatively strong password (kitty probably won’t do). What’s useful about this technique is that you don’t have to worry about manually padding your password – SHA-256 will scramble a 32-byte block out of any password for you.
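As a hedged alternative to a single SHA-256 pass, the sketch below stretches the password with PBKDF2 from the standard hashlib module (hashlib.pbkdf2_hmac, available in Python 2.7.8+ and 3.4+). This is a swapped-in technique, not what the code above does. The helper name derive_key, the 16-byte salt and the iteration count are illustrative assumptions, and the salt would have to be stored next to the ciphertext so the key can be derived again.

```python
import hashlib
import os

def derive_key(password, salt, iterations=200000):
    """Derive a 32-byte key (AES-256 size) from a text password with PBKDF2-HMAC-SHA256."""
    # The iteration count is an illustrative value; more iterations make
    # each password guess slower for an attacker (and for you).
    return hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'),
                               salt, iterations, dklen=32)

salt = os.urandom(16)            # random, not secret, stored with the ciphertext
key = derive_key('kitty', salt)
print(len(key))                  # 32
```

A weak password stays weak after stretching; the extra iterations only raise the cost of each guess.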
The next thing the code does is set the block mode of AES. I won’t get into all the details, but unless you have some special requirements, CBC should be good enough for you.
We create a new AES encryptor object with Crypto.Cipher.AES.new, and give it the encryption key and the mode. Next comes the encryption itself. Again, since the API is low-level, the encrypt method expects your input to consist of an integral number of 16-byte blocks (16 is the size of the basic AES block).
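The block-size requirement can be met with a small padding helper. This is a minimal sketch using only built-in bytes operations; pad_to_block is a hypothetical name, and padding with spaces is just one simple choice that only works if the original length is recorded somewhere else.

```python
BLOCK_SIZE = 16  # size of the basic AES block in bytes

def pad_to_block(data, fill=b' '):
    """Pad bytes with a fill byte up to the next multiple of BLOCK_SIZE."""
    remainder = len(data) % BLOCK_SIZE
    if remainder == 0:
        return data                      # already block-aligned
    return data + fill * (BLOCK_SIZE - remainder)

padded = pad_to_block(b'attack at dawn')  # 14 bytes of input
print(len(padded))                        # 16
```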
The encryptor object has an internal state when used in the CBC mode, so if you try to encrypt the same text with the same encryptor once again – you will get different results. So be careful to create a fresh AES encryptor object for any encryption/decryption job.
Decryption
To decrypt the ciphertext, simply add:
decryptor = AES.new(key, mode)
plain = decryptor.decrypt(ciphertext)
And you get your plaintext back again.
A word about the initialization vector
The initialization vector (IV) is an important part of block encryption algorithms that work in chained modes like CBC. For the simple example above I’ve ignored the IV, but for a more serious application this is a grave mistake. I don’t want to get too deep into cryptographic theory here, but it suffices to say that the IV is as important as the salt in hashed passwords, and the lack of correct IV usage led to the cracking of the WEP encryption for wireless LAN.
PyCrypto allows one to pass an IV into the AES.new creator function. For maximal security, the IV should be randomly generated for every new encryption and can be stored together with the ciphertext. Knowledge of the IV won’t help the attacker crack your encryption. What can help him, however, is your reusing the same IV with the same encryption key for multiple encryptions.
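A fresh IV per encryption can be drawn from the operating system's random source. The sketch below uses only os.urandom from the standard library; new_iv is a hypothetical helper name, and the resulting bytes would be passed as the third argument to AES.new.

```python
import os

IV_SIZE = 16  # one AES block

def new_iv():
    """Return a fresh random IV from the operating system's random source."""
    return os.urandom(IV_SIZE)

iv_first = new_iv()
iv_second = new_iv()
# Two independently generated IVs should practically never be equal.
print(len(iv_first), iv_first != iv_second)
```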
Encrypting and decrypting files
The following function encrypts a file of any size. It makes sure to pad the file to a multiple of the AES block length, and also handles the random generation of the IV.
import os, random, struct
from Crypto.Cipher import AES
def encrypt_file(key, in_filename, out_filename=None, chunksize=64*1024):
    """ Encrypts a file using AES (CBC mode) with the
        given key.
        key:
            The encryption key - a string that must be
            either 16, 24 or 32 bytes long. Longer keys
            are more secure.
        in_filename:
            Name of the input file
        out_filename:
            If None, '<in_filename>.enc' will be used.
        chunksize:
            Sets the size of the chunk which the function
            uses to read and encrypt the file. Larger chunk
            sizes can be faster for some files and machines.
            chunksize must be divisible by 16.
    """
    if not out_filename:
        out_filename = in_filename + '.enc'
    iv = ''.join(chr(random.randint(0, 0xFF)) for i in range(16))
    encryptor = AES.new(key, AES.MODE_CBC, iv)
    filesize = os.path.getsize(in_filename)
    with open(in_filename, 'rb') as infile:
        with open(out_filename, 'wb') as outfile:
            outfile.write(struct.pack('<Q', filesize))
            outfile.write(iv)
            while True:
                chunk = infile.read(chunksize)
                if len(chunk) == 0:
                    break
                elif len(chunk) % 16 != 0:
                    chunk += ' ' * (16 - len(chunk) % 16)
                outfile.write(encryptor.encrypt(chunk))
Since it might have to pad the file to fit into a multiple of 16, the function saves the original file size in the first 8 bytes of the output file (more precisely, the first sizeof(long long) bytes). It randomly generates a 16-byte IV and stores it in the file as well. Then, it reads the input file chunk by chunk (with chunk size configurable), encrypts the chunk and writes it to the output. The last chunk is padded with spaces, if required.
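The layout of the output file's first 24 bytes (8-byte size, then 16-byte IV) can be sketched in isolation with the standard struct module. The names build_header and parse_header are hypothetical and exist only for this illustration.

```python
import struct

HEADER_FORMAT = '<Q'                        # little-endian unsigned 64-bit size
SIZE_LEN = struct.calcsize(HEADER_FORMAT)   # 8 bytes
IV_LEN = 16

def build_header(filesize, iv):
    """Return the bytes written before the ciphertext: size, then IV."""
    if len(iv) != IV_LEN:
        raise ValueError('IV must be %d bytes long' % IV_LEN)
    return struct.pack(HEADER_FORMAT, filesize) + iv

def parse_header(header):
    """Split a 24-byte header back into (filesize, iv)."""
    filesize = struct.unpack(HEADER_FORMAT, header[:SIZE_LEN])[0]
    return filesize, header[SIZE_LEN:SIZE_LEN + IV_LEN]

header = build_header(1000, b'\x00' * 16)
print(len(header), parse_header(header)[0])  # 24 1000
```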
Working in chunks makes sure that large files can be efficiently processed without reading them wholly into memory. For example, with the default chunk size it takes about 1.2 seconds on my computer to encrypt a 50MB file. PyCrypto is fast!
Decrypting the file can be done with:
def decrypt_file(key, in_filename, out_filename=None, chunksize=24*1024):
    """ Decrypts a file using AES (CBC mode) with the
        given key. Parameters are similar to encrypt_file,
        with one difference: out_filename, if not supplied
        will be in_filename without its last extension
        (i.e. if in_filename is 'aaa.zip.enc' then
        out_filename will be 'aaa.zip')
    """
    if not out_filename:
        out_filename = os.path.splitext(in_filename)[0]
    with open(in_filename, 'rb') as infile:
        origsize = struct.unpack('<Q', infile.read(struct.calcsize('Q')))[0]
        iv = infile.read(16)
        decryptor = AES.new(key, AES.MODE_CBC, iv)
        with open(out_filename, 'wb') as outfile:
            while True:
                chunk = infile.read(chunksize)
                if len(chunk) == 0:
                    break
                outfile.write(decryptor.decrypt(chunk))
            outfile.truncate(origsize)
First the original size of the file is read from the first 8 bytes of the encrypted file. The IV is read next to correctly initialize the AES object. Then the file is decrypted in chunks, and finally it’s truncated to the original size, so the padding is thrown out.
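One way to check that a decrypted file is byte-identical to the original is to compare digests of the two files. The sketch below uses only hashlib and reads in chunks, in the same spirit as the functions above; file_digest and same_contents are hypothetical helper names.

```python
import hashlib

def file_digest(filename, chunksize=64 * 1024):
    """Return the SHA-256 hex digest of a file, read chunk by chunk."""
    digest = hashlib.sha256()
    with open(filename, 'rb') as infile:
        while True:
            chunk = infile.read(chunksize)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()

def same_contents(first, second):
    """True when both files have identical SHA-256 digests."""
    return file_digest(first) == file_digest(second)
```

After a round trip, something like same_contents('aaa.zip', 'aaa_decrypted.zip') (file names made up for the example) should return True.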
June 25th, 2010 at 21:00
I agree that the docs were confusing at best. I spent several days working with the people on the PyCrypto mailing list to figure out how to decrypt something that should have taken me only a couple of minutes to figure out. Still, the guys on the list were quite helpful.
Thanks for doing a little more documentation on this cool module here!
- Mike
June 26th, 2010 at 02:27
Interesting post
Was looking a bit on it earlier, and typed down the basics. Your encrypt / decrypt function is way more advanced, though
Agree with the lack of documentation, needed some guessing and experimenting to get it to work.
Going to post my notes here, hope you’ll find them useful, maybe get some new ideas
My notes (showing syntax of random pool, sha256 and rsa):
—————Note: might not be completely correct, but seems to work —-
Random pool:
Crypto.Util.randpool
rp = randpool.RandomPool(numbytes) # numbytes = number of bytes of randomness in pool. Default 160
rp.entropy # entropy left in class
rp.add_event() # adds time since last call in the entropy – call often
rp.stir() # Mix up the randomness pool
random = rp.get_bytes(int) # get int number of random bytes
SHA256:
Crypto.Hash.SHA256
sha = SHA256.new()
sha.update(plain)
digest = sha.digest()
RSA
Crypto.PublicKey.RSA
priv = RSA.Generate(bits=2048, rp.get_bytes) #Generate keypair
pub = keys.publickey()
import pickle
strpriv = pickle.dumps(priv) #Save keypair to string
priv = pickle.loads(strpriv)
strpub = pickle.dumps(pub) #Save pubkey to string
pub = pickle.loads(strpub)
#Encrypt data with pubkey, need privkey to decrypt
enc = pub.encrypt(smalldata, "") # len(smalldata) <= keysize
smalldata = priv.decrypt(enc)
sign = priv.sign(hash, "")
verified = pub.verify(hash, sign)
June 26th, 2010 at 02:42
Two small typos in RSA part:
priv = RSA.generate(2048, rp.get_bytes) # small g
pub = priv.publickey() #priv, not keys
June 26th, 2010 at 03:26
Strangeness. I only have the simple AES encryption running on my system (OS X 10.6.3, with Python 2.6.5 with the latest easy_install pycrypto applied).
I ran the script with -i option to inspect the state and play around and get:
>>> help(ciphertext)
problem in <???c^?9p?Y?,?)E????rzݴe?1Į@?A
i??J5_~-?o6b[?fnP?֎?;D?67?҆3sB
??8q??U??F?G???Y?{?Ni?t3"??H?{?DŽ??#*?????/??גK?(?c('??Ĺ?????b???&#C?Z}`?3<??y?ln
aB&????sي????
??q ? – : __import__() argument 1 must be string without null bytes, not str
Then my terminal sometimes starts printing garbage UNIX characters as if viewing binary data and needs to be reset.
That can’t be good
I changed it to print values and I encounter the same, the shell is switched font as if printing binary characters.
Will need to play some more.
June 26th, 2010 at 06:43
@Terrasque,
Thanks for the notes. I notice you use RandomPool – note that on the front page of pycrypto.org they warn in bold about not using this for random numbers.
@PN,
Don’t try to print binary data on a terminal. You can always use the encode('hex') method of a string to print it in hexadecimal.
June 28th, 2010 at 10:04
Ooh, that wasn’t there when I last checked the page! Thanks
*updates notes*
Seems like os.urandom is a good drop-in replacement.
“RSA.generate(int bits(2048), os.urandom)” should work.
From urandom documentation: “The returned data should be unpredictable enough for cryptographic applications, though its exact quality depends on the OS implementation”
June 29th, 2010 at 17:00
Thanks for the “AES encryption of files in Python with PyCrypto”, seriously it helped a lot for a newbie like me. I have been trying to XOR (using PyCrypto) two random numbers Rand-1 and Rand-2 and check whether I can get back Rand-2 or not, but the results don’t resemble each other… I think I am coding it wrongly… kindly correct me
from Crypto.Util import randpool
from Crypto.Cipher import XOR
one = randpool.RandomPool()
print "Rand-1 :: ",one.get_bytes(16)
two = randpool.RandomPool()
print "Rand-2 :: ",two.get_bytes(16)
objXOR = XOR.new(one.get_bytes(16)) # setting the key as Rand-1
xored_value = objXOR.encrypt(two.get_bytes(16))
print "Rand-1 xor Rand-2 :: ", xored_value
print "Decrypted text : ", objXOR.decrypt(xored_value)
After decryption, the decrypted xored_value and Rand-2 (two) don’t match up, which they SHOULD… here are the results:
Rand-1 :: †X@óÂ’ÕÅ¡ävòU‰Ú
Rand-2 :: ニラQ%マÒ0â4ᅠà>BÎz
Rand-1 xor Rand-2 :: Áø‡‹¢j4A
h‹Í7ÞrÁ
Decrypted text : ˿÷
ЍᆴÛòU33ラeᄇヤ
a little help will be appreciated !! where I went wrong in the coding….??
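One likely cause, judging from the snippet alone: every call to get_bytes(16) returns new random bytes, so the values printed as Rand-1 and Rand-2 are not the ones later used as the key and the plaintext. The sketch below uses only the standard library (xor_bytes is a hypothetical helper, not a PyCrypto function), stores each random value once, and shows that XOR then round-trips.

```python
import os

def xor_bytes(data, key):
    """XOR data with a repeating key; applying it twice restores data."""
    return bytes(d ^ key[i % len(key)] for i, d in enumerate(data))

rand_1 = os.urandom(16)   # generate once and keep in a variable
rand_2 = os.urandom(16)

xored = xor_bytes(rand_2, rand_1)       # "encrypt" Rand-2 with key Rand-1
recovered = xor_bytes(xored, rand_1)    # XOR again with the same key
print(recovered == rand_2)              # True
```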
July 17th, 2010 at 20:15
thanks a lot, very helpful , i was looking for this..i implemented AES encryption in python depending on a tutorial and some implementations in c# and php before i know about this module, but this one works very fast….thanks again.
July 28th, 2010 at 14:22
Good article man Thanks
August 2nd, 2010 at 16:56
FANTASTIC. You are a life-saver!
October 22nd, 2010 at 22:18
Thank you so much for this!!!!!!
December 14th, 2010 at 14:22
thank you man! that was very helpful!
thanks!
January 7th, 2011 at 23:31
Brilliant! Thank you very much for posting this. It saved me tons of time.
May 26th, 2011 at 18:26
I was looking to implement reasonable encryption for a string that I’m passing over a wire. It’s probably overkill, so I was looking for something quick to implement. I found your code, and figured I could modify it to use StringIO pretty easily, so that’s what I did. I like the idea of being able to pass a file or a StringIO to the same encryption / decryption methods.
Thanks for the post!
–J.
March 4th, 2012 at 12:20
Thanks for your code…
June 27th, 2012 at 09:55
Thanks. Clear and very usable.
Much appreciated.
Dave
June 30th, 2012 at 10:23
Hey, this code does not work. has any one tested with md5sum the files before and after ?
Thanks
June 30th, 2012 at 11:59
onlyone,
Can you show/link to a case where you think it “does not work”? The decrypted file is identical to the input, as far as I have tested.
June 30th, 2012 at 12:32
Hi Eliben, thanks for your response. in my case it is a network capture (pcap file with around 200MB)
I’m doing this :
random_bytes = os.urandom(16)
encrypt_file(key=random_bytes, in_filename=file_to_encrypt, chunksize=268435456)
decrypt_file(key=random_bytes, in_filename=file_to_encrypt+".enc", out_filename="/install/decrypt/"+file_to_encrypt, chunksize=268435456)
# I know that os.random is not safe. also i have tried with a string generator that returns for example ‘K5TGZ57E0QHAQJOE’
the decrypted does not match the md5sum from the original file. Plus after the decryption i cannot open the file on wireshark
any ideas?
June 30th, 2012 at 14:34
onlyone,
Sorry, I can’t reproduce the problem at all. I tried with your key, chunksize and some large files (100-300 MB) and the result is always byte-identical to the input. Do you have the same problem with small files (i.e. some text file)? I also think it would be better to move the discussion to email at this point.
July 1st, 2012 at 11:38
Hey all, it works after all ! Sorry !
July 11th, 2012 at 13:35
Does this program encrypt the contents of a file or some random text?
July 11th, 2012 at 14:36
and in encryptor = AES.new(key, AES.MODE_CBC, iv)
what is ‘iv’ for?
July 12th, 2012 at 09:12
Dhanush,
I don’t understand your first question. As for the second, there’s a section in the post above called A word about the initialization vector. For more details, Google “encryption initialization vector”.
July 27th, 2012 at 23:18
Hey, Im trying to run your code, but i cant get it done. My error:
File "C:\Python32\lib\site-packages\Crypto\Cipher\blockalgo.py", line 141, in __init__
self._cipher = factory.new(key, *args, **kwargs)
ValueError: IV must be 16 bytes long
Methods I tried:
password = 'kitty'
#key = hashlib.sha256(password.encode(encoding='utf-8')).digest()
key = '0123456789abcdef'
Neither key works, any thoughts?
July 28th, 2012 at 06:43
CubeX,
This code is for Python 2. It has to be adapted for Python 3 (for example, using bytes instead of strings, as in the IV: b''.join ...)
July 28th, 2012 at 20:53
Hi,
I try to encrypt a 4096-bytes file in Python with pycrypto, write it to SD card
and then read it from embedded board with C.
From the C side I have used http://www.literatecode.com/aes256
which works on 16-byte long array
The source code is there and is not very large.
both sides ECB mode
key = "0123456789ABCDEFGHIJKLMNOPQRSTUV" #Use the same from the ‘other side’
mode = AES.MODE_ECB
…
encryptor = AES.new(key, mode) #Using pycrypt module
encrypted = encryptor.encrypt(string16) # string16 has length 16
….
But there is no compatibility.
On the board side, the board can decode the board-encoded file.
The same holds on the PC-Python side: PC-Python can decode the PC-encoded file,
but the board cannot read the PC file!
Any Idea what could be wrong?
Regards
Nikos
July 29th, 2012 at 04:58
nikos,
This is really a PyCrypto vs. that guy’s implementation question. You should make sure all the parameters are the same, padding issues (for a text shorter than the key) etc.
September 11th, 2012 at 22:30
When I try NIST test vectors (http://csrc.nist.gov/archive/aes/rijndael/wsdindex.html), I can’t seem to get PyCrypto to pass them. I’m guessing there is an endianness rathole somewhere, but it seems to go beyond endian-flipping the input and the output to determine the incompatibility. Just curious if this is any concern, I’m guessing the security is uncompromised given decent chaining.
December 6th, 2012 at 11:03
I know little about the inner workings of python and equally little about cryptography so I can’t say anything for sure but…
1. Isn’t python random highly predictable? If we can anticipate the iv, we can derive some information about the plaintext, which isn’t secure.
2. About the sha256… What I know is that normal key functions use sha256 thousands, if not hundreds of thousands or even millions, of times in order to make the password as hard to find via a dictionary attack as possible. With just one sha256, the attacker will find the password in no time, even if it is max length.
3. There are PyCrypto functions that deal with both of these issues. Use these instead.
4. This does nothing to prevent tampering.
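On the tampering point, a common approach is to compute a keyed MAC over the encrypted file and check it before decrypting. The sketch below uses only hmac and hashlib from the standard library; it is not part of the article's functions, file_mac and verify_file are hypothetical names, and it assumes mac_key is a separate secret from the encryption key.

```python
import hashlib
import hmac

def file_mac(mac_key, filename, chunksize=64 * 1024):
    """Return an HMAC-SHA256 tag over the whole (encrypted) file."""
    mac = hmac.new(mac_key, digestmod=hashlib.sha256)
    with open(filename, 'rb') as infile:
        while True:
            chunk = infile.read(chunksize)
            if not chunk:
                break
            mac.update(chunk)
    return mac.digest()

def verify_file(mac_key, filename, expected_tag):
    """Compare the stored tag with a freshly computed one in constant time."""
    return hmac.compare_digest(file_mac(mac_key, filename), expected_tag)
```

The tag would be stored alongside the .enc file, and a failed verify_file means the file should not be decrypted at all.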
January 3rd, 2013 at 12:25
I haven’t been able to figure out what the ‘[0]‘ at the end of the following two lines mean.
if not out_filename:
out_filename = os.path.splitext(in_filename)[0]
with open(in_filename, 'rb') as infile:
origsize = struct.unpack('<Q', infile.read(struct.calcsize('Q')))[0]
January 4th, 2013 at 10:32
@Plutonian,
Python list/tuple indexing.
January 23rd, 2013 at 02:10
Hi,
I’m having trouble switching from Linux to Windows.
i used your encryption/decryption function in an application i developed on linux. it was working fine on ubuntu.
but later i ported the same app on windows 7 and also took the Encrypted file (which i encrypted on Linux). Now when i’m running the app and trying to decrypt that Encrypted File. i’m getting a strange error.
Will it be a trouble to encrypt a file on linux and then trying to decrypt on Windows?
January 25th, 2013 at 02:24
hey please discard my previous query i did a mistake. i did a small change in the code and instead of filename, i was passing file object.
For my infile, i opened it in ‘r’ mode instead of ‘rb’, that caused this trouble.
Apologize for my mistake.
January 28th, 2013 at 02:44
I’m looking for a little support with the code from this page. Thanks in advance for the help and for the original code. I am implementing this with Python 3.x and while I have managed to get it working on some files it doesn’t work on all. It seems the issues I am getting occur with both large and small files. Some files I can encrypt and decrypt and the hash digest still matches, other it does not. When looking in to the files with a hex editor I can see that towards the end of the decrypted file the data does not match of the original file. In the ASCII view of the editor I see \x this and \x that instead of ASCII representations of characters. Does anybody have any idea why that may be? See below the encryption and decryption scripts I have been using, only slight modifications from the code listed here:
and decrypt:
What have I done wrong?
Thanks again.
February 9th, 2013 at 18:47
@Colin Mortimer: Your ‘encrypt_file’ function has these lines:
Try sticking with bytes instead:
February 13th, 2013 at 06:20
@mrab: You sir are a legend. Thanks for your response! I have tested with that change across both large and small files and when checking the hash of the files am ending up with an exact copy of the original once decrypted. Thanks again. Colin
February 17th, 2013 at 11:19
Allow me small unashamed self-promotion. The similar things you can do with pycrypto (although, I think in better way … I prefer using highly-tested and engineered-by-true-paranoids library to pycrypto) you can do with m2crypto. To move it to the current age, I have tried to create 100% faithful mirror of SVN in git: https://luther.ceplovi.cz/git/m2crypto.git/ (mirror on https://github.com/mcepl/m2crypto).
Also in the branch python 3 I have my current state of effort to create port to Python 3. import works, but now I beat up the code to submission, so that the testing suite succeeds. Patches and pull requests are more than welcome!
Matěj
February 23rd, 2013 at 08:48
Hi eliben,
I used your code to encrypt a file which has 16 byte aligned 0xFF stream at the end. So since 16 byte of 0xFF is repeating, shouldn’t I get a repetitive stream at the end of encrypted file? But I don’t. What might be the issue?
April 14th, 2013 at 23:32
Great tutorial for the newbies to PyCrypto like me, and the 2 functions are working A-OK, thanks a lot!
April 15th, 2013 at 01:49
Nali: I think the reason that you are seeing that is due to the fact that the script uses the CBC (Cipher Block Chaining) method. This means that each block to be encrypted is manipulated first with details of the last encrypted block. This makes decryption reliant upon all blocks being present and also prevents repetitive strings in plain text blocks creating repetitive strings in the cipher text blocks.
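The chaining effect described above can be seen in a toy model. The sketch below is NOT AES: toy_block_encrypt is a made-up, non-invertible stand-in built from SHA-256, used only so the example runs with the standard library. It contrasts transforming each block independently with CBC-style chaining on three identical 0xFF blocks.

```python
import hashlib

BLOCK = 16

def toy_block_encrypt(key, block):
    """Stand-in for a block cipher: NOT AES, not invertible, illustration only."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def independent_blocks(key, plaintext):
    """Transform each block on its own, so equal blocks give equal output."""
    return [toy_block_encrypt(key, plaintext[i:i + BLOCK])
            for i in range(0, len(plaintext), BLOCK)]

def chained_blocks(key, iv, plaintext):
    """CBC-style: XOR each block with the previous output before transforming."""
    previous, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        previous = toy_block_encrypt(key, xor(plaintext[i:i + BLOCK], previous))
        out.append(previous)
    return out

data = b'\xff' * 48                    # three identical 16-byte blocks
key, iv = b'k' * 16, b'\x01' * 16      # fixed values so the demo is repeatable
print(len(set(independent_blocks(key, data))))   # 1: every block looks the same
print(len(set(chained_blocks(key, iv, data))))   # distinct blocks despite equal input
```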
April 29th, 2013 at 20:58
When I tried to implement the same, I am getting the following error:
origsize = struct.unpack('<Q', infile.read(struct.calcsize('<Q')))[0]
struct.error: unpack requires a string argument of length 8
Where I could have gone wrong?
May 31st, 2013 at 01:12
Giri,
Have a look at the second ('<Q') in your code and then look at the example code again. There doesn’t need to be the < on the second Q.
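For reference, a small hedged check with the standard struct module: '<Q' always has the standard size of 8 bytes, so the format string by itself may not be the culprit. The same struct.error also appears whenever the read returns fewer than 8 bytes, for example on an empty or truncated input file. The helper read_size below is hypothetical.

```python
import io
import struct

def read_size(infile):
    """Read the 8-byte little-endian size header, failing clearly on short input."""
    expected = struct.calcsize('<Q')
    raw = infile.read(expected)
    if len(raw) != expected:
        raise ValueError('expected %d header bytes, got %d' % (expected, len(raw)))
    return struct.unpack('<Q', raw)[0]

print(struct.calcsize('<Q'))                            # 8
print(read_size(io.BytesIO(struct.pack('<Q', 1234))))   # 1234
```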
Colin