pycparser

pycparser is a complete parser for the C language, written in pure Python. It is a module designed to be easily integrated into applications that need to parse C source code. pycparser is licensed under the LGPL, and its source code (as well as official releases) is available from its Google Code page.

Here’s an example of using pycparser:

from pycparser.c_parser import CParser

parser = CParser()

buf = '''
  static void foo(int k)
  {
      j = p && r || q;
      return j;
  }
'''

t = parser.parse(buf, 'x.c')
t.show()

Prints:

FileAST:
  FuncDef:
    Decl: foo, [], ['static']
      FuncDecl:
        ParamList:
          Decl: k, [], []
            TypeDecl: k, []
              IdentifierType: ['int']
        TypeDecl: foo, []
          IdentifierType: ['void']
    Compound:
      Assignment: =
        ID: j
        BinaryOp: ||
          BinaryOp: &&
            ID: p
            ID: r
          ID: q
      Return:
        ID: j

What is it good for?

Anything that needs C code to be parsed. Personally, I’m using pycparser to write a compiler from C into a proprietary assembly language. I’ve also used it for writing C code "comprehension helpers" – tracking assignments to global variables and other variables that depend upon them throughout a code base.
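To give a flavor of such a "comprehension helper", here is a minimal sketch that reports which functions assign to file-scope variables. It relies on the NodeVisitor class from pycparser's c_ast module; the class name GlobalAssignmentFinder, the sample C source and the file name globals.c are made up for illustration, and the check is deliberately naive (it only looks at plain identifiers on the left-hand side and ignores shadowing by locals).

```python
from pycparser import c_ast, c_parser


class GlobalAssignmentFinder(c_ast.NodeVisitor):
    """Hypothetical helper: records (variable, function) pairs for every
    assignment whose left-hand side is one of the given names."""

    def __init__(self, names):
        self.names = set(names)
        self.current_function = None
        self.hits = []

    def visit_FuncDef(self, node):
        # Remember which function we are inside while visiting its body.
        self.current_function = node.decl.name
        self.generic_visit(node)
        self.current_function = None

    def visit_Assignment(self, node):
        # Only plain identifiers are handled; pointers, arrays and struct
        # members on the left-hand side are ignored in this sketch.
        if isinstance(node.lvalue, c_ast.ID) and node.lvalue.name in self.names:
            self.hits.append((node.lvalue.name, self.current_function))
        self.generic_visit(node)


# Illustrative input, not taken from a real code base.
SOURCE = '''
int counter;

void bump(int k)
{
    int local;
    local = k;
    counter = counter + local;
}

void reset(void)
{
    counter = 0;
}
'''

ast = c_parser.CParser().parse(SOURCE, filename='globals.c')

# File-scope variables: top-level declarations that are not function prototypes.
global_names = [
    ext.name
    for ext in ast.ext
    if isinstance(ext, c_ast.Decl) and not isinstance(ext.type, c_ast.FuncDecl)
]

finder = GlobalAssignmentFinder(global_names)
finder.visit(ast)

for variable, function in finder.hits:
    print('%s is assigned in %s()' % (variable, function))
```

A real tool would also need to follow variables that depend on the globals, which this sketch does not attempt.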

But I can imagine other interesting uses for it – writing semantic analysis tools, static checkers, experimenting with modifying C’s syntax. pycparser is unusual in that it’s written in pure Python – a very high-level language that’s easy to experiment with and tweak. To people familiar with Lex and Yacc, pycparser’s code will be simple to understand.

Which version of C does pycparser support?

At the moment, pycparser supports ANSI/ISO C89, the language described by Kernighan and Ritchie in "The C Programming Language, 2nd edition" (K&R2), with only selected extensions from C99. The currently supported C99 features are:

  • Allowing a comma after the last value in an enumeration list
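As a quick check of the trailing-comma feature, the following sketch parses an enumeration whose last value is followed by a comma and collects the enumerator names with a small visitor. The class name EnumeratorCollector, the sample enum and the file name enum.c are illustrative assumptions, not part of pycparser.

```python
from pycparser import c_ast, c_parser


class EnumeratorCollector(c_ast.NodeVisitor):
    """Hypothetical helper: gathers the names of all enumerators in a tree."""

    def __init__(self):
        self.names = []

    def visit_Enumerator(self, node):
        self.names.append(node.name)


# Note the comma after BLUE, which C89 forbids and C99 allows.
SOURCE = 'enum color { RED, GREEN, BLUE, };'

ast = c_parser.CParser().parse(SOURCE, filename='enum.c')

collector = EnumeratorCollector()
collector.visit(ast)
enumerator_names = collector.names
print(enumerator_names)
```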

pycparser doesn’t support any GCC extensions.

What grammar does pycparser follow?

pycparser very closely follows the ANSI C grammar provided at the end of K&R2. Listings of this grammar (often in Yacc syntax) are easy to find on the web; search for "ansi c grammar" to get started.

Download and support

To download pycparser as an installable Python module, submit bug reports, or browse the source code, visit http://code.google.com/p/pycparser/