I'm now going through "Perl Best Practices", and although I mostly like it, I sometimes come across passages I disagree with. For instance, the following, from section 12.2, Line Boundaries:

In addition to always using the /x flag, always use the /m flag. In every regular expression you ever write.

The normal behaviour of the ^ and $ metacharacters is unintuitive to most programmers, especially if they're coming from a Unix background. Almost all of the Unix utilities that feature regular expressions (e.g., sed, grep, awk) are intrinsically line-oriented. So in those utilities, ^ and $ naturally mean "match at the start of any line" and "match at the end of any line", respectively.

But they don't mean that in Perl.

In Perl, ^ and $ mean "match at the start of the entire string" and "match at the end of the entire string".

And then:

The /m mode makes ^ and $ work "naturally" (That is, it makes them work in the unnatural way in which most programmers think they work). Under /m, ^ no longer means "match at the start of the string"; it means "match at the start of any line". Likewise, $ no longer means "at end of string"; it means "at end of any line".
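To make the quoted difference concrete, here is a minimal sketch. The sample string and variable names are my own illustration, not taken from the book.

```perl
use strict;
use warnings;

# Two lines in a single string (hypothetical sample data).
my $text = "first line\nsecond line\n";

# Default mode: ^ anchors only at the very start of the string.
my @default_starts = $text =~ /^(\w+)/g;     # ("first")

# With /m: ^ anchors at the start of every line.
my @multi_starts   = $text =~ /^(\w+)/mg;    # ("first", "second")

# Default mode: $ anchors only at the end of the string
# (or just before a final newline).
my @default_ends   = $text =~ /(\w+)$/g;     # ("line")

# With /m: $ anchors at the end of every line.
my @multi_ends     = $text =~ /(\w+)$/mg;    # ("line", "line")

print "default ^: @default_starts\n";
print "with /m ^: @multi_starts\n";
print "default \$: ", scalar(@default_ends), " match(es)\n";
print "with /m \$: ", scalar(@multi_ends), " match(es)\n";
```

Without /m only the first word of the string is captured; with /m the first word of each line is.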

I couldn't disagree more! Not all programmers come from a Unix background, especially now, 18 years into Perl's existence. I, for one, have almost never used sed or awk. When I began programming on Unix, Perl was already there, and I quickly adopted it as my Swiss Army knife in place of csh/awk/sed. Hence, for me, ^ and $ matching at the beginning and end of the string is the most natural thing possible. Why shouldn't I use a popular Perl regex syntax just because some Unix old-timers confuse it with something it is not?

In the next section, the author goes on to suggest:

Even if you don't adopt the previous practice of always using /m, using ^ and $ with their default meanings is a bad idea. Sure, you know what ^ and $ actually mean in a Perl regex. But will those who read or maintain your code know? Or is it more likely that they will misinterpret those metacharacters in the ways described earlier?

As is commonly discussed on Perlmonks, other people not knowing the language well enough is no reason to avoid Perl's advanced features!

Perl provides markers that always, and unambiguously, mean "start of string" and "end of string": \A and \z (capital A, but lowercase z). They mean "start/end of string" regardless of whether /m is active. They mean "start/end of string" regardless of what the reader thinks ^ and $ mean.
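For anyone who, like me, only vaguely remembers these anchors, here is a small sketch of how they compare with ^ and $. The inputs and variable names are made up for illustration. One detail the quoted passage glosses over: by default, $ also matches just before a final newline, whereas \z does not.

```perl
use strict;
use warnings;

# Hypothetical input: a line that was read but not chomp'ed.
my $input = "42\n";

# $ matches at the end of the string OR just before a final newline.
my $dollar  = $input =~ /^\d+$/    ? 1 : 0;   # 1
# \z matches only at the very end of the string.
my $strict  = $input =~ /\A\d+\z/  ? 1 : 0;   # 0
# \Z (capital Z) tolerates a final newline, like the default $.
my $loose   = $input =~ /\A\d+\Z/  ? 1 : 0;   # 1

# Under /m, ^ and $ change meaning; \A and \z do not.
my $log     = "ok\n42\nok\n";
my $m_caret = $log =~ /^\d+$/m     ? 1 : 0;   # 1: the middle line matches
my $m_az    = $log =~ /\A\d+\z/m   ? 1 : 0;   # 0: the whole string is not all digits

print "dollar=$dollar strict=$strict loose=$loose\n";
print "m_caret=$m_caret m_az=$m_az\n";
```

So \A and \z behave the same with or without /m, while ^ and $ depend on the flag.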

I may be a n00b, but I've never used \A and \z and only vaguely remember what they do. OTOH, I know perfectly well what ^ and $ are. I think that most programmers are like this...

The vast majority of the advice given in this book is very good, though.

