Faking standard C header files for pycparser

May 22nd, 2009 at 9:47 am

pycparser, my Python-based parser and AST generator for ANSI C, has been downloaded more than 500 times since January, when version 1.03 was released.

From time to time I even get fan mail with feedback or complaints. Though there are far fewer complaints and bug reports than I’d expect, the most common issue that comes up is standard C headers that pycparser has trouble parsing.

I’ve written before about the context sensitivity of C, which means that all the headers a C file includes must be parsed before it, in order to find out which identifiers are types. Since most C code uses at least some standard headers (stdio, string and stdlib are probably the most popular), pycparser needs to be able to parse those.

But this is often a problem, since each compiler tool suite ships its own standard headers, with its own idiosyncrasies, compiler extensions and weird definitions. pycparser successfully parses the headers supplied with the MinGW GCC port, but it’s hard for me to make sure it can handle all the varieties of standard headers out there.

So, the other day I had an idea: why not create "fake" standard C header files, just for pycparser? After all, it doesn’t need much out of them, only to know which identifiers are types. It doesn’t care, for example, about declarations of functions, because in C function calls are unambiguous and can be tagged as such without seeing the function declaration (verifying the number and types of arguments is another matter, but pycparser doesn’t do that anyway).
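
To make the ambiguity concrete, here is a toy sketch in Python. It is not pycparser's actual lexer or grammar; the function name and the "known types" set are made up for illustration, and C keywords such as int are deliberately ignored. It only shows that the same statement reads differently depending on whether its first identifier is known to be a type.

```python
import re

def classify_statement(statement, known_types):
    """Toy classifier: decide how a C statement would be read.

    If the first identifier is a known type name, the statement is read
    as a declaration; otherwise it is read as an expression.
    This is an illustration only, not how pycparser is implemented.
    """
    # Split into identifiers and single punctuation characters.
    tokens = re.findall(r"[A-Za-z_]\w*|\S", statement)
    if tokens and tokens[0] in known_types:
        return "declaration"
    return "expression"
```

With known_types={"FILE"}, the statement "FILE * fp;" is classified as a declaration of a pointer named fp. With an empty set, the same text is classified as an expression (FILE multiplied by fp). That is all the information the fake headers have to supply.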

So, using pycparser itself, I parsed the standard header files from MinGW and collected every typedef declaration. I then generated fake typedefs for them in a single header file, and included that file from otherwise empty .h files named exactly like the standard headers.

The same was done with all the #define constants, since cpp needs those to operate correctly.

Note that I didn’t have to keep the full typedef for each definition, just a fake:

typedef int FILE;

This is because pycparser really doesn’t care what type FILE was defined to be; it only needs to know that FILE is a type.
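
The generation step can be sketched roughly as follows. This is not the script I actually used: the function name, the shared file name _fake_defs.h and the include guard are assumptions made for the example, and the typedef and #define names are passed in rather than extracted from real headers.

```python
import os

# Hypothetical name for the shared header; chosen for this example only.
SHARED_HEADER = "_fake_defs.h"

def write_fake_headers(out_dir, typedef_names, defines, header_names):
    """Write one shared header of fake definitions plus a stub per header.

    typedef_names: iterable of identifiers to declare as types
    defines:       dict mapping macro name -> replacement text
    header_names:  iterable of flat header file names, e.g. "stdio.h"
    """
    os.makedirs(out_dir, exist_ok=True)

    with open(os.path.join(out_dir, SHARED_HEADER), "w") as shared:
        # Include guard, so several stubs can include the file safely.
        shared.write("#ifndef FAKE_DEFS_H\n#define FAKE_DEFS_H\n\n")
        for name, value in sorted(defines.items()):
            shared.write("#define %s %s\n" % (name, value))
        shared.write("\n")
        # Every type becomes a plain int; only the name matters.
        for name in sorted(set(typedef_names)):
            shared.write("typedef int %s;\n" % name)
        shared.write("\n#endif\n")

    # Each standard header becomes a stub that only includes the shared file.
    for header in header_names:
        with open(os.path.join(out_dir, header), "w") as stub:
            stub.write('#include "%s"\n' % SHARED_HEADER)
```

Calling write_fake_headers("fake_include", ["FILE", "size_t"], {"NULL": "0"}, ["stdio.h", "stdlib.h"]) would produce a directory with three small files, which is all a parser that only needs type names has to see.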

The directory with the fake include files was released in utils/fake_libc_include with pycparser version 1.04, and can also be accessed directly from the pycparser SVN. With it in place, pycparser no longer depends on real standard C header files, and also runs faster because the fake includes are much smaller and simpler.
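
To use the fake headers, the preprocessor has to be pointed at that directory with -I, which cpp searches before the system include directories. A minimal sketch of building and running such a command, assuming cpp is on the PATH (the helper names here are invented for the example and are not part of pycparser):

```python
import subprocess

def build_cpp_command(filename, fake_include_dir, cpp_path="cpp", extra_args=()):
    """Return the argv list for preprocessing `filename` against the fake headers.

    Each argument is a separate list element, which is what the
    preprocessor expects.
    """
    return [cpp_path, "-I" + fake_include_dir, *extra_args, filename]

def preprocess(filename, fake_include_dir, cpp_path="cpp"):
    """Run the preprocessor and return its output as text.

    Requires a working cpp executable; raises CalledProcessError on failure.
    """
    command = build_cpp_command(filename, fake_include_dir, cpp_path)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

The preprocessed text returned by preprocess() is what would then be handed to the parser.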

7 Responses to “Faking standard C header files for pycparser”

  1. Pierre Says:

    Hi Eli,

    I downloaded version 1.04 and the fake headers are not in the util directory.
    You might want to re-release to fix that.
    Otherwise I got a simple test running which is nice.
    Another issue I have is that cpp_args expects a string and I can’t provide a list of -I directives. If I do, the strings are passed to cpp as a single argument, so it does not work.

    Thanks a lot for writing this cool parser, I like the simple visitors and the fact that it provides real C parsing with Pure Python.

  2. eliben Says:

    @Pierre,

    Thanks for letting me know about the missing headers. I’ve fixed it now.
    Regarding cpp_args, can’t you just join your -I strings?

  3. Sam Falkner Says:

    @eliben, you can’t just join the -I strings. Look at parse_file, and how it takes cpp_args as a single string, and adds it as one element to the path_list list. If you give cpp_args multiple args, you end up with:

    ['cpp', '-Ifoo -Ibar', 'filename']

    rather than

    ['cpp', '-Ifoo', '-Ibar', 'filename']

    You might want to either call split() on cpp_args, or have cpp_args be passed in as a list.

    Does that make sense?

    Oh, and thanks for the awesome pycparser! :-)

  4. Sam Falkner Says:

    @eliben, here’s a patch. I hacked in the solution, and then tried from memory to get the file back to the original, so it may be a little off, but this is the basic idea:

    --- __init__.py-old	2009-10-14 16:33:58.000000000 -0600
    +++ __init__.py	2009-10-14 16:30:16.000000000 -0600
    @@ -46,7 +46,7 @@
         """
         if use_cpp:
             path_list = [cpp_path]
    -        if cpp_args != '': path_list += [cpp_args]
    +        if cpp_args != '': path_list += cpp_args.split()
             path_list += [filename]
    
             # Note the use of universal_newlines to treat all newlines

    With this, I can pass cpp_args='-Ifoo -Ibar' and it works fine now.

    Of course, you might rather do it another way; whatever works!

    And thanks again. I really enjoy pycparser.

  5. eliben Says:

    Sam,

    I’ve uploaded version 1.05 where this is fixed. Now you can pass a list to that function and it will work as expected. Thanks for raising the issue.

  6. Sam Falkner Says:

    @eliben — thanks!

    - Sam

  7. mars Says:

    I have used pycparser to parse my C code, which is written to verify HW functions on FPGA, to generate test reports. This program is really awesome!! Thanks!
