pycparser, my Python-based parser and AST generator for ANSI C, has been downloaded more than 500 times since January, when version 1.03 was released.
From time to time I even get fan mail, with either feedback or complaints. Though there are far fewer complaints and bug reports than I’d expect, the most common issue that comes up is standard C include headers that pycparser has trouble with.
I’ve written before about the context sensitivity of C, which means that all the headers a C file includes must be parsed before it, in order to find out which identifiers are types. Since most C code uses at least some standard headers (stdio, string and stdlib are probably the most popular), pycparser needs to be able to parse those.
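To make the ambiguity concrete, here is a toy sketch in Python. It is not how pycparser works internally; the function name `classify_statement` and the regular expression are made up for illustration. It only shows that the same text, `FILE * fp;`, reads differently depending on whether `FILE` is already known to be a type.

```python
import re

def classify_statement(stmt, known_types):
    """Classify a statement of the form 'A * b;'.

    Toy illustration only: a real C parser makes this decision in its
    lexer and grammar, not with a regular expression.
    """
    match = re.fullmatch(r"\s*(\w+)\s*\*\s*(\w+)\s*;\s*", stmt)
    if match is None:
        raise ValueError("expected a statement of the form 'A * b;'")
    first, second = match.groups()
    if first in known_types:
        # 'A' was introduced by a typedef, so this declares a pointer.
        return f"declaration: {second} is a pointer to {first}"
    # Otherwise 'A' is an ordinary identifier and '*' is multiplication.
    return f"expression: {first} multiplied by {second}"

# No headers seen yet, so FILE is just an ordinary identifier.
print(classify_statement("FILE * fp;", known_types=set()))
# After a header has supplied a typedef for FILE.
print(classify_statement("FILE * fp;", known_types={"FILE"}))
```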
But this is often a problem, since each compiler tool-suite creates its own standard headers, with its own idiosyncrasies, compiler-extensions and weird definitions. pycparser successfully parses the headers supplied with the MinGW GCC port, but it’s a problem for me to make sure it can handle all the varieties of standard headers out there.
So, the other day I had an idea: why not create "fake" standard C header files, just for pycparser? After all, it doesn’t need much out of them, only to know which identifiers are types. It doesn’t care, for example, about function declarations, because in C a function call is syntactically unambiguous and can be tagged as such without seeing the function’s declaration (verifying the number and types of arguments is another matter, but pycparser doesn’t do that anyway).
So, using pycparser itself, I parsed the standard header files from MinGW and collected all the typedef declarations. I then generated a fake typedef for each of them into a single header file, and created otherwise empty .h files, named exactly like the standard headers, that each include this file.
The same was done with all the #define constants, since cpp needs those to operate correctly.
Note that I didn’t have to keep the full typedef for each definition, just a fake:
typedef int FILE;
This is because pycparser really doesn’t care what type FILE was defined to be; it only needs to know that FILE is a type.
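The generation step could be sketched roughly as follows. This is not the script I actually used. The function name `write_fake_headers`, the shared file name `_fake_defs.h`, the include guard, and the sample type names and macro values are all assumptions made for the example.

```python
import os
import tempfile

def write_fake_headers(out_dir, header_names, type_names, macros):
    """Write one shared definitions file plus a stub per standard header."""
    os.makedirs(out_dir, exist_ok=True)
    shared = "_fake_defs.h"  # hypothetical name for the shared file
    with open(os.path.join(out_dir, shared), "w") as f:
        # Include guard, so including several stubs defines things once.
        f.write("#ifndef FAKE_DEFS_H\n#define FAKE_DEFS_H\n")
        for name, value in sorted(macros.items()):
            f.write(f"#define {name} {value}\n")
        for name in sorted(type_names):
            # The real underlying type is irrelevant; int is a placeholder.
            f.write(f"typedef int {name};\n")
        f.write("#endif\n")
    for header in header_names:
        # Each stub is empty apart from pulling in the shared definitions.
        with open(os.path.join(out_dir, header), "w") as f:
            f.write(f'#include "{shared}"\n')

# Usage with a few illustrative names (not the full MinGW-derived list).
with tempfile.TemporaryDirectory() as tmp:
    write_fake_headers(
        tmp,
        header_names=["stdio.h", "stdlib.h", "string.h"],
        type_names={"FILE", "size_t"},
        macros={"EOF": "(-1)", "NULL": "0"},
    )
    with open(os.path.join(tmp, "_fake_defs.h")) as f:
        print(f.read())
```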
The directory with the fake include files was released in utils/fake_libc_include with pycparser version 1.04, and can also be accessed directly from the pycparser SVN. With it in place, pycparser no longer depends on real standard C header files, and also runs faster because the fake includes are much smaller and simpler.
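Pointing pycparser at the fake headers might look like the sketch below. It assumes a recent pycparser whose `parse_file` accepts a list for `cpp_args`, and a GCC-style `cpp` on the PATH. The helper names and `example.c` are hypothetical, and `-nostdinc` is my own addition to keep the preprocessor away from the real system headers.

```python
def fake_libc_cpp_args(fake_include_dir):
    """Build preprocessor arguments that point cpp at the fake headers."""
    # -nostdinc: do not search the real system include directories.
    # -I<dir>:   add the fake header directory to the include search path.
    return ["-nostdinc", "-I" + fake_include_dir]

def parse_with_fake_libc(c_file, fake_include_dir="utils/fake_libc_include"):
    """Preprocess c_file against the fake headers and return its AST."""
    # Imported lazily so the helper above works without pycparser installed.
    from pycparser import parse_file
    return parse_file(
        c_file,
        use_cpp=True,
        cpp_path="cpp",
        cpp_args=fake_libc_cpp_args(fake_include_dir),
    )

# Example call (needs pycparser, cpp, and a C file named example.c):
#   ast = parse_with_fake_libc("example.c")
#   ast.show()
print(fake_libc_cpp_args("utils/fake_libc_include"))
```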