When Knuth designed the MIX assembly language (MIXAL), he didn't think of the poor chaps who would later want to apply nice parsing techniques to it. In his time, it was OK to assume that each field starts at a fixed column on a punch card, which makes parsing trivial.

Each line of MIXAL may contain 1-3 fields: a label, a command and an address (argument). Only the command is required; the label and the address are optional. For instance, the following are all legal lines (lines starting with * are comments):

* single HLT command
HLT             
* label, command, argument
3H  SLA 5       
* command + argument
CON DATA        

I'd analyze this as follows:

A line may contain 1, 2 or 3 tokens:

  • If there's one, it's a command (illegal command name => error)
  • If there are two, it's either command + argument or label + command. So, check whether the first token is a legal command: if it is, assume command + argument; if it isn't, assume label + command (and the second token must then be a legal command).
  • If there are three, it's label, command, argument

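Here's a rough sketch of those token-count rules, just to pin the logic down before turning it into a grammar. It's in Python rather than Perl and doesn't use Parse::RecDescent. The names `KNOWN_COMMANDS` and `classify_line` are made up for illustration, and the command set is a small, deliberately incomplete sample. In the two-token case, a first token that is a known command is read as command + argument; anything else is read as label + command.

```python
# Hypothetical, deliberately incomplete sample of MIXAL operation names.
KNOWN_COMMANDS = {"HLT", "SLA", "CON", "LDA", "STA", "ADD", "JMP"}


def classify_line(line):
    """Return (label, command, argument) for one line, or None for a
    blank line or a comment line (first token starts with '*')."""
    tokens = line.split()
    if not tokens or tokens[0].startswith("*"):
        return None

    if len(tokens) == 1:
        # A lone token can only be a command.
        label, command, argument = None, tokens[0], None
    elif len(tokens) == 2:
        if tokens[0] in KNOWN_COMMANDS:
            # First token is a known command: command + argument.
            label, command, argument = None, tokens[0], tokens[1]
        else:
            # Otherwise: label + command.
            label, command, argument = tokens[0], tokens[1], None
    elif len(tokens) == 3:
        label, command, argument = tokens
    else:
        raise ValueError("too many tokens: %r" % line)

    if command not in KNOWN_COMMANDS:
        raise ValueError("illegal command name: %r" % command)
    return label, command, argument


if __name__ == "__main__":
    for text in ["HLT", "3H  SLA 5", "CON DATA", "* a comment"]:
        print(repr(text), "->", classify_line(text))
```

Note that this splits on whitespace only, so it assumes the argument itself contains no spaces, and it can't tell a label apart from a command if a label happens to be spelled like a command name.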
I just wonder how to represent such a thing in a parser. I'd like to use Parse::RecDescent... I once spoke with the creator of MDK and he said that he regrets that he doesn't have a *real* parser - it causes some bugs in the assembler.