As I mentioned in the past, I have a column on Game Dev.net about algorithms.
The current series of articles deals with the basics of parsing – building a regular expression engine (constructing NFAs, converting to DFAs, generating code, etc.)
One interesting problem I ran into, is how to parse the regular expressions that I get as input. For instance:
I have to make some kind of a parse tree from this, to know that a and b are alternated, that * is applied to them, etc. I could, of course, define a regex to parse regexes, but how would I parse *that* regex. A typical chicken and egg, catch 22 problem.
I went to see the source code of Flex, to see how they do it. Well, not surprisingly, they just bootstrap – they have a .l and .y files for their own syntax, which are probably processed with an older version of Flex
I guess this issue of bootstrapping exists in any “first compiler”.