bootstrapping parsers

May 28th, 2004 at 12:54 pm

As I mentioned in the past, I have a column on Game Dev.net about algorithms.

The current series of articles deals with the basics of parsing – building a regular expression engine (constructing NFAs, converting to DFAs, generating code, etc.)

One interesting problem I ran into, is how to parse the regular expressions that I get as input. For instance:

(a|b)*abb

I have to make some kind of a parse tree from this, to know that a and b are alternated, that * is applied to them, etc. I could, of course, define a regex to parse regexes, but how would I parse *that* regex. A typical chicken and egg, catch 22 problem.

I went to see the source code of Flex, to see how they do it. Well, not surprisingly, they just bootstrap – they have a .l and .y files for their own syntax, which are probably processed with an older version of Flex :-)

I guess this issue of bootstrapping exists in any “first compiler”.

Related posts:

  1. regex gotcha in P::RD
  2. Some problems of recursive descent parsers
  3. Recursive descent, LL and predictive parsers
  4. pss v0.33 released
  5. Analyzing C source code

Comments are closed.