Last week I received an email from a user of pycparser that mentioned the strange AST that results when pycparser parses a switch statement.

Let's take the following snippet of C code for example. Don't look for semantic sense in it - it's just used to test the parser:

switch (myvar) {
    case 10:
        k = 10;
        p = k + 1;
        return 10;
    case 20:
    case 30:
        return 20;
    default:
        break;
}

And the AST pycparser was generating for this code:

Switch:
  ID: myvar
  Compound:
    Case:
      Constant: int, 10
      Assignment: =
        ID: k
        Constant: int, 10
    Assignment: =
      ID: p
      BinaryOp: +
        ID: k
        Constant: int, 1
    Return:
      Constant: int, 10
    Case:
      Constant: int, 20
      Case:
        Constant: int, 30
        Return:
          Constant: int, 20
    Default:
      Break:

There are two problems here:

  1. Only the first statement inside each case is made a child of that case - the other statements are siblings.
  2. Two consecutive case statements without any other statements in between (fall-through) cause the second case to become the child of the first one. If additional consecutive case statements follow, they nest even further.

Since the parser follows the C grammar pretty closely, I immediately went to look into the C99 standard, and indeed, this is exactly the parse tree that it mandates. Here's the relevant portion of the language grammar (from section A.2.3):

(6.8) statement:
              labeled-statement
              compound-statement
              expression-statement
              selection-statement
              iteration-statement
              jump-statement
(6.8.1) labeled-statement:
              identifier : statement
              case constant-expression : statement
              default : statement

Note that a case (and default, which is equivalent to case in this whole discussion) must be followed by one, and only one other statement. This explains why pycparser parses the code above the way it does.

However, the goal of pycparser is not to generate a parse tree. It is to generate an abstract syntax tree (AST), which follows the language semantics rather than its grammar. Hey, I already wrote about this stuff!

So today I fixed this part of pycparser, by adding a dedicated AST transformation after parsing a switch statement. The transformation isn't really complicated, and the AST pycparser generates now is much friendlier. Here it is, for the same code:

Switch:
  ID: myvar
  Compound:
    Case:
      Constant: int, 10
      Assignment: =
        ID: k
        Constant: int, 10
      Assignment: =
        ID: p
        BinaryOp: +
          ID: k
          Constant: int, 1
      Return:
        Constant: int, 10
    Case:
      Constant: int, 20
    Case:
      Constant: int, 30
      Return:
        Constant: int, 20
    Default:
      Break:

As you can see, the problems mentioned above were fixed. This fix is available in the pycparser repo, and will be part of the next release.