I now have a pretty interesting task at work, and I intend to do it in Perl.

At the core of the task is a binary file. The file is a stream of bits, and the program should find interesting information in it. That wouldn't be difficult if not for two facts:

1) The bitstream is not byte-aligned, meaning that an interesting chunk may start at the 19th bit of the file, or the 2nd bit, or anywhere else. So handling the file as an array of integers is impossible, and handling it as an array of chars is extremely difficult.

2) The file may get very large - up to hundreds of megabytes.

Now, the first issue has a pretty elegant solution, which would work if the second issue didn't exist. I'll explain:

I can read the bitstream and unpack() it into a nice string of 1s and 0s, and then work on it. Perl is quite good at searching for patterns in strings, so all is great. However, think of the second issue now: suppose I have a 100 MB file. I unpack it, so each bit now takes 8 bits (one char in the string). 800 MB - pooh! I'm out of memory, not to mention performance.
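To make the unpack() idea concrete, here is a minimal sketch. The helper name find_bit_pattern and the sample bytes are made up for illustration, and it assumes bits are numbered from the most significant bit of each byte (the 'B*' template); a real format may well number them the other way, in which case 'b*' would be the template to try.

```perl
use strict;
use warnings;

# Hypothetical helper: return the bit offsets at which $pattern
# (a string of '0'/'1' characters) occurs in the raw bytes $data.
sub find_bit_pattern {
    my ($data, $pattern) = @_;
    die "empty pattern\n" unless length $pattern;

    # 'B*' expands every byte into eight '0'/'1' characters,
    # most significant bit first. This is the 8x memory blow-up.
    my $bits = unpack('B*', $data);

    my @offsets;
    my $pos = 0;
    while (($pos = index($bits, $pattern, $pos)) != -1) {
        push @offsets, $pos;
        $pos++;    # advance one bit so overlapping matches are found too
    }
    return @offsets;
}

# Sample bytes 0x12 0x34 expand to 00010010 00110100.
# The pattern below starts at bit 6 and straddles the byte boundary.
my @hits = find_bit_pattern("\x12\x34", '1000110');
print "match at bit $_\n" for @hits;    # prints "match at bit 6"
```

Because the search runs on a plain string, byte alignment stops mattering, and index() could just as well be swapped for a regex when the pattern has variable parts.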

I've asked for advice on Perlmonks, but so far nothing better has been suggested. I'll probably have to go with the unpack() solution, but instead of reading the whole file into a string, I'll read it in chunks into a buffer. It promises to be somewhat messy, but I don't see a better way at the moment.
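The chunked variant might look roughly like the sketch below. It is only an outline under assumptions: scan_in_chunks is a hypothetical name, the pattern is a fixed string of 1s and 0s, and bits are taken most significant bit first ('B*'). The messy part is a match that straddles two chunks, which is handled here by carrying the last length(pattern) - 1 bits of each window over to the next read.

```perl
use strict;
use warnings;

# Hypothetical helper: scan filehandle $fh for $pattern (a string of
# '0'/'1' characters), reading $chunk_bytes bytes at a time.
# Returns the absolute bit offsets of all matches.
sub scan_in_chunks {
    my ($fh, $pattern, $chunk_bytes) = @_;
    die "empty pattern\n" unless length $pattern;

    my $keep = length($pattern) - 1;   # bits carried over between chunks
    my $bits = '';                     # current window of '0'/'1' chars
    my $base = 0;                      # absolute bit offset of the window start
    my @offsets;
    my $buf;

    binmode $fh;
    while (read($fh, $buf, $chunk_bytes)) {
        $bits .= unpack('B*', $buf);

        my $pos = 0;
        while (($pos = index($bits, $pattern, $pos)) != -1) {
            push @offsets, $base + $pos;
            $pos++;
        }

        # Drop everything except the tail that could still begin a match
        # completed by the next chunk. The tail is shorter than the
        # pattern, so no match is ever reported twice.
        my $drop = length($bits) - $keep;
        if ($drop > 0) {
            substr($bits, 0, $drop, '');
            $base += $drop;
        }
    }
    return @offsets;
}

# Demo on an in-memory "file" with a deliberately tiny chunk size,
# so the match has to cross a chunk boundary.
my $data = "\x12\x34";    # 00010010 00110100
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
print "match at bit $_\n" for scan_in_chunks($fh, '1000110', 1);
close $fh;
```

With this layout, memory use is bounded by about eight times the chunk size plus the carried-over tail, regardless of how large the file is, so the chunk size can be tuned freely.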

