But it doesn't really have a large vocabulary. Well, not sure how you're defining vocabulary, but there are really only four or five types of things - quantifiers, character classes, positions, groups, and alternation - and none of those has more than a handful of variants.
(Then, to cut down on typing {...} and [...] and (...) so much, there are shorthand quantifiers, shorthand classes, and shorthand positions.)
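To make the shorthand point concrete, here's a rough sketch in Python's re module (my choice of flavour for illustration; the sample strings and variable names are made up). It just checks that each shorthand gives the same result as its longhand form. The re.ASCII flag is there because Python's \d otherwise also matches non-ASCII digits, so it isn't strictly [0-9] by default.

```python
import re

text = "order 42, item 7, qty 1234"  # made-up sample text

# Shorthand quantifier: + is the same as {1,}
# Shorthand class: \d is the same as [0-9] (with re.ASCII in Python 3)
longhand_digits = re.findall(r"[0-9]{1,}", text)
shorthand_digits = re.findall(r"\d+", text, flags=re.ASCII)

# ? is {0,1} and * is {0,}
optional_long = re.fullmatch(r"colou{0,1}r", "color") is not None
optional_short = re.fullmatch(r"colou?r", "color") is not None
star_long = re.fullmatch(r"ab{0,}", "abbb") is not None
star_short = re.fullmatch(r"ab*", "abbb") is not None

# Shorthand position: \b (word boundary) spelled out with lookarounds
BOUNDARY = r"(?:(?<=\w)(?!\w)|(?<!\w)(?=\w))"
sample = "cat scatter cat."
short_starts = [m.start() for m in re.finditer(r"\bcat\b", sample)]
long_starts = [m.start() for m in re.finditer(BOUNDARY + "cat" + BOUNDARY, sample)]
```

In each pair the shorthand is doing nothing the longhand can't, it's just less typing.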
It is a bit of a pain that the syntax uses the same symbols for different meanings, but - once you understand when \ and ? mean which thing, and a few other bits - the rest isn't so bad, and is far less arbitrary than it seems on the surface.
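As a quick illustration of the \ and ? overloading, here's a small sketch, again using Python's re module as an assumed flavour with made-up sample strings. The rule of thumb is that the meaning depends on what comes immediately before or after the symbol.

```python
import re

# ? after an ordinary item: quantifier, "zero or one of these"
optional = re.fullmatch(r"https?", "http") is not None

# ? after another quantifier: makes that quantifier lazy
greedy = re.search(r"<.+>", "<a><b>").group()
lazy = re.search(r"<.+?>", "<a><b>").group()

# ? straight after ( : introduces group syntax, e.g. (?:...) is non-capturing
groups = re.search(r"(?:ab)+(c)", "ababc").groups()

# \ before a metacharacter: turns it into a literal
dots = re.findall(r"\.", "a.b.c")

# \ before a letter: turns it into a shorthand (here \d, a digit)
digits = re.findall(r"\d", "a1b2")
```

So ? is a quantifier, a laziness modifier, or a group-syntax marker depending on what precedes it, and \ either removes or adds special meaning depending on what follows it.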
And it's also annoying that we've got at least five major programming variants (Perl/PCRE/Python/.NET/Java), which all have slight differences/benefits, plus cut-down versions in JavaScript/grep/awk/etc.
But it still doesn't deserve the bad reputation a lot of people assign it.
withregardstoremembering/understanding,themostimportantthing,is
notwritingregexesthatlooklikethis-becauseitdoesn'thelpanyonewith
figuringoutwhat'sgoingonwhenyouremoveallformattinginformation.
It's just a pity that extended/comment mode (where unescaped whitespace is ignored, and # starts a comment) isn't the default in almost all implementations, so people think they must squish it all onto a single line.
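For anyone who hasn't seen extended/comment mode, here's a minimal sketch using Python's re.VERBOSE flag (other flavours usually spell it as an x flag or (?x)). The date pattern and the variable names are just invented for the example.

```python
import re

# The squished version: a date like 2011-10-31 with three captured parts
compact = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# The same regex in extended/comment mode: unescaped whitespace is
# ignored and # starts a comment, so it can be laid out and annotated
verbose = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
""", re.VERBOSE)

compact_groups = compact.fullmatch("2011-10-31").groups()
verbose_groups = verbose.fullmatch("2011-10-31").groups()

# A literal space has to be escaped or put in a class in this mode
spaced = re.compile(r"hello [ ] world", re.VERBOSE)
matches_space = spaced.fullmatch("hello world") is not None
```

Both versions match exactly the same strings; the only cost of the readable one is having to write a literal space or # as [ ] or \# when you actually want one.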
EDITED: 31 Oct 2011 14:22 by BOUGHTONP