As I mentioned in the past, I have a column on Game Dev.net about algorithms.

The current series of articles deals with the basics of parsing - building a regular expression engine (constructing NFAs, converting to DFAs, generating code, etc.)

One interesting problem I ran into, is how to parse the regular expressions that I get as input. For instance:

(a|b)*abb

I have to make some kind of a parse tree from this, to know that a and b are alternated, that * is applied to them, etc. I could, of course, define a regex to parse regexes, but how would I parse *that* regex. A typical chicken and egg, catch 22 problem.

I went to see the source code of Flex, to see how they do it. Well, not surprisingly, they just bootstrap - they have a .l and .y files for their own syntax, which are probably processed with an older version of Flex :-)

I guess this issue of bootstrapping exists in any "first compiler".


Comments

comments powered by Disqus