ASTs for analyzing C

July 11th, 2008 at 9:27 am

As I wrote here, I’ve often needed to analyze C source code programmatically. In that post I also mentioned c2c, a nice open-source tool that analyzes C source code and can generate ASTs as an intermediate step. However, c2c is written in C, which makes it inconvenient to extend and hack on.

So I’ve decided to give my Python skills more practice and write an analyzer for C in Python, using PLY for the lexer and parser. The project is already online, with a working lexer and a set of tests for it (the focus for now is ANSI C90, assuming the source has already been preprocessed with a standard cpp).

When I sat down to implement the parser, the issue of the AST quickly came up. I want my parser to build an AST that can be processed later. But what kind of AST should it build? How detailed should it be? These are nontrivial questions.

I turned to Python itself for the answers. The standard compiler module has a built-in AST walker that lets you walk ASTs generated from Python code. The AST format itself is defined in a text file, and the corresponding Python module is cleverly generated from it automatically (ast.txt and astgen.py in Tools/compiler of Python’s source distribution). I like this approach because it allows for a very detailed AST (which is good for convenient recursive walking) while code generation avoids writing tons of boilerplate by hand.
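To make the idea concrete, here is a minimal sketch of the same pattern in Python: a short textual specification of node types, classes derived from it, and a generic walker on top. The spec format, the node names and the helper names (SPEC, generate_nodes, IDCollector) are all made up for illustration and are not the format used by ast.txt. Note also that astgen.py writes out Python source code, whereas this sketch takes a shortcut and creates the classes at runtime with type().

```python
# Hypothetical spec format (an assumption, not ast.txt's): "NodeName: field, field, ..."
SPEC = """
Constant: value
ID: name
BinaryOp: op, left, right
"""


class Node:
    """Base class shared by all generated AST node classes."""
    fields = ()

    def __init__(self, *args):
        if len(args) != len(self.fields):
            raise TypeError(
                f"{type(self).__name__} expects {len(self.fields)} arguments")
        for name, value in zip(self.fields, args):
            setattr(self, name, value)

    def children(self):
        """Yield child nodes, looking inside list-valued fields too."""
        for name in self.fields:
            value = getattr(self, name)
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, Node):
                    yield item


def generate_nodes(spec):
    """Create one Node subclass per line of the spec."""
    classes = {}
    for line in spec.strip().splitlines():
        name, _, field_part = line.partition(":")
        name = name.strip()
        fields = tuple(f.strip() for f in field_part.split(",") if f.strip())
        classes[name] = type(name, (Node,), {"fields": fields})
    return classes


class NodeVisitor:
    """Dispatch to visit_<ClassName> if it exists, else recurse into children."""

    def visit(self, node):
        method = getattr(self, "visit_" + type(node).__name__, self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        for child in node.children():
            self.visit(child)


class IDCollector(NodeVisitor):
    """Example walker: collect the names of all identifiers in a tree."""

    def __init__(self):
        self.names = []

    def visit_ID(self, node):
        self.names.append(node.name)


nodes = generate_nodes(SPEC)

# Hand-built AST for the C expression: a + b * 2
tree = nodes["BinaryOp"](
    "+",
    nodes["ID"]("a"),
    nodes["BinaryOp"]("*", nodes["ID"]("b"), nodes["Constant"]("2")),
)

collector = IDCollector()
collector.visit(tree)
print(collector.names)  # ['a', 'b']
```

Adding a new node type here means adding one line to the spec, and a new analysis only needs visit_ methods for the node types it cares about. That is the convenience I am after, although a real C AST will need far more node types than these three.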

Curiously, the Python compiler itself (CPython) uses a different, though similar, technique. It describes Python’s abstract syntax using ASDL (Abstract Syntax Description Language), and generates the C code for the compiler’s AST from that description.

Anyway, now I’m in the process of deciding on the best AST approach for my C analyzer. I like the method of generating the AST code automatically from a readable specification quite a lot, so there’s a good chance I’ll borrow astgen.py for my needs.

I’ll report on the progress of this project in the future.
