A few minutes ago I tagged pycparser v2.10 and pushed the release to PyPI.

The biggest feature in this release is the ability to handle even darker context-sensitive corners of C. For example, consider this snippet of C code, taken from my article on context sensitivity of C's grammar:

typedef int AA;

void foo()
{
  AA AA;
  int BB = AA * 2;
}

The line AA AA; defines a new variable named AA of type AA. From that point on, within foo, the variable shadows the type, so the AA in AA * 2 refers to the variable. pycparser now parses this correctly:

  Typedef <ext[0]>: name=AA, quals=[], storage=['typedef']
    TypeDecl <type>: declname=AA, quals=[]
      IdentifierType <type>: names=['int']
  FuncDef <ext[1]>:
    Decl <decl>: name=foo, quals=[], storage=[], funcspec=[]
      FuncDecl <type>:
        TypeDecl <type>: declname=foo, quals=[]
          IdentifierType <type>: names=['void']
    Compound <body>:
      Decl <block_items[0]>: name=AA, quals=[], storage=[], funcspec=[]
        TypeDecl <type>: declname=AA, quals=[]
          IdentifierType <type>: names=['AA']
      Decl <block_items[1]>: name=BB, quals=[], storage=[], funcspec=[]
        TypeDecl <type>: declname=BB, quals=[]
          IdentifierType <type>: names=['int']
        BinaryOp <init>: op=*
          ID <left>: name=AA
          Constant <right>: type=int, value=2
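If you want to reproduce a dump like the one above, here is a minimal sketch. It assumes pycparser 2.10 or later is installed; the variable names (source, ast, body and so on) are mine, and the exact attributes printed by show() may differ slightly between pycparser versions.

```python
from pycparser import c_parser

# The snippet from the post. It has no #include lines,
# so no preprocessing step is needed before parsing.
source = """
typedef int AA;

void foo()
{
  AA AA;
  int BB = AA * 2;
}
"""

parser = c_parser.CParser()
ast = parser.parse(source, filename="<snippet>")

# Print the tree with attribute names and child names.
ast.show(attrnames=True, nodenames=True)

# Pull out the two declarations inside foo's body.
body = ast.ext[1].body
var_decl = body.block_items[0]   # AA AA;
init_expr = body.block_items[1].init  # AA * 2

# The declared variable is named AA and its type is the typedef AA.
print(var_decl.name, var_decl.type.type.names)

# In the initializer, AA is an ordinary identifier (ID), not a type.
print(type(init_expr.left).__name__, init_expr.left.name)
```

The last two prints are just a quick way to check the interesting part of the tree without reading the whole dump.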

Most of the work for this change was contributed by Sye van der Veen, who heroically hacked pycparser's grammar rules into submission by using even more context information than before. As I predicted in that same article, all of this made the code somewhat less palatable, which is in line with the general observation that LALR-based parsers are sorely inadequate for parsing real-life programming languages.
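To make "context information" a bit more concrete, here is a toy sketch of the kind of scope-aware bookkeeping a C parser needs in order to decide whether an identifier currently names a type. This is not pycparser's actual code: ScopeStack and its methods are hypothetical names made up for illustration, and the real grammar has to deal with far more cases.

```python
class ScopeStack:
    """Toy symbol table: each scope maps a name to True if it is a
    typedef name, or False if it is an ordinary identifier."""

    def __init__(self):
        self._scopes = [{}]  # start with the file scope

    def push(self):
        # Entering a new block, e.g. a function body.
        self._scopes.append({})

    def pop(self):
        # Leaving the block; its declarations disappear.
        self._scopes.pop()

    def declare(self, name, is_type):
        # Record the declaration in the innermost scope.
        self._scopes[-1][name] = is_type

    def is_type(self, name):
        # The innermost declaration wins, which is what shadowing means.
        for scope in reversed(self._scopes):
            if name in scope:
                return scope[name]
        return False


scopes = ScopeStack()
scopes.declare("AA", is_type=True)    # typedef int AA;
scopes.push()                         # entering foo's body
before = scopes.is_type("AA")         # AA still names a type here
scopes.declare("AA", is_type=False)   # AA AA; declares a variable
inside = scopes.is_type("AA")         # the variable now shadows the type
scopes.pop()                          # leaving foo
after = scopes.is_type("AA")          # the typedef is visible again

print(before, inside, after)  # True False True
```

The hard part in a real parser is timing: the variable AA has to be registered in the middle of parsing its own declaration, so that the very next statement already sees it as a variable.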

On a positive note, this is a user-focused release. Some of the internal implementation's clarity was sacrificed to provide a better end product, which is a parser that can handle more cases in the language. I hope users find it useful.

