Regular expression understanding

From: Drew (X3N0PH0N)31 Oct 2011 14:38
To: Peter (BOUGHTONP) 13 of 57
quote: me
I strongly agree :Y


You just expect disagreement (hug)
From: Drew (X3N0PH0N)31 Oct 2011 14:40
To: Peter (BOUGHTONP) 14 of 57
I think the problem with regex is not that it's hard as such (as you say, it's not); it's just that it has a large vocabulary and it's quite arbitrary.

I can do pretty complex things with regex after a bit of reminding myself what's what. But then 20 minutes later I've forgotten it all again. That's the problem with regex :(
From: Peter (BOUGHTONP)31 Oct 2011 15:15
To: Drew (X3N0PH0N) 15 of 57
But it doesn't really have a large vocabulary. Well, not sure how you're defining vocabulary, but there's really only four or five types of things - quantifiers, character classes, positions, groups, and alternation, and none of those have more than a handful of variants.

(Then, to reduce having to type {...} and [...] and (...) as much, there's shorthand quantifiers, shorthand classes, shorthand positions.)
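For illustration, here is a minimal sketch using Python's `re` module (Python is an arbitrary choice; the post isn't about any one flavour). It gives one example of each of the five kinds of thing, then checks a few shorthand forms against their longhand equivalents. The pattern names are made up for the example.

```python
import re

# One example of each of the five kinds of thing:
QUANTIFIER = r"a{2,3}"      # two or three 'a's
CHAR_CLASS = r"[aeiou]"     # any one vowel
POSITION = r"^start"        # 'start', but only at the beginning of the string
GROUP = r"(ab)+"            # 'ab' treated as a unit and repeated
ALTERNATION = r"cat|dog"    # either 'cat' or 'dog'

# Shorthand forms paired with their longhand equivalents.
# re.ASCII is used so that \d and \w mean exactly [0-9] and [A-Za-z0-9_].
SHORTHANDS = [
    (r"\d", r"[0-9]"),                   # shorthand class
    (r"\w", r"[A-Za-z0-9_]"),            # shorthand class
    (r"x+", r"x{1,}"),                   # shorthand quantifier
    (r"x?", r"x{0,1}"),                  # shorthand quantifier
    (r"\bcat\b", r"(?<!\w)cat(?!\w)"),   # shorthand position (word boundary)
]


def same_matches(short: str, longhand: str, text: str) -> bool:
    """Return True if both patterns find exactly the same matches in text."""
    return re.findall(short, text, re.ASCII) == re.findall(longhand, text, re.ASCII)


sample = "cat 42 xx dog_7 concat"
print(all(same_matches(s, lg, sample) for s, lg in SHORTHANDS))  # True
```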

It is a bit of a pain that the syntax uses the same symbols for different meanings, but - once you understand when \ and ? mean the different things, and a few other bits - then the rest isn't so bad, and far less arbitrary than it seems on the surface.
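A small sketch of those overloaded symbols, again in Python's `re` (the pattern names are hypothetical, just for illustration): `?` and `\` each do a different job depending on what sits next to them.

```python
import re

# '?' does different jobs depending on where it sits:
OPTIONAL = r"colou?r"           # after a token: zero or one of that token
LAZY = r"<.+?>"                 # after a quantifier: make the quantifier lazy
NON_CAPTURING = r"(?:ab)+"      # straight after '(': a special (here non-capturing) group

# '\' either escapes a metacharacter or starts a special sequence:
LITERAL_DOT = r"3\.14"            # \. is a literal dot, not "any character"
DIGITS = r"\d+"                   # \d is a shorthand class
REPEATED_WORD = r"\b(\w+) \1\b"   # \1 is a back reference to group 1

print(re.search(LAZY, "<a><b>").group())                        # <a>
print(re.search(REPEATED_WORD, "this is is a test").group(1))   # is
```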

And it's also annoying that we've got at least five major programming variants (Perl/PCRE/Python/.NET/Java) which all have slight differences/benefits and then cut-down versions in JavaScript/grep/awk/etc.

But it still doesn't deserve the bad reputation a lot of people assign it.


withregardstoremembering/understanding,themostimportantthing,is
notwritingregexesthatlooklikethis-becauseitdoesn'thelpanyonewith
figuringoutwhat'sgoingonwhenyouremoveallformattinginformation.

It's just a pity that extended/comment mode (where unescaped whitespace is ignored, and # starts a comment) is not the default one in almost all implementations, so people think they must squish it all on a single line.
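In Python, extended mode is the `re.VERBOSE` flag (also spelled `re.X`, or `(?x)` inline). A minimal sketch, using a made-up ISO-style date pattern, of the same regex squished onto one line and then written out with comments:

```python
import re

# The pattern squished onto a single line...
COMPACT = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")

# ...and the same pattern in extended/verbose mode, where unescaped
# whitespace is ignored and '#' starts a comment.
VERBOSE = re.compile(
    r"""
    ^
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
    $
    """,
    re.VERBOSE,
)

print(VERBOSE.match("2011-10-31").groups())  # ('2011', '10', '31')
```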
EDITED: 31 Oct 2011 15:22 by BOUGHTONP
From: 99% of gargoyles look like (MR_BASTARD)31 Oct 2011 15:18
To: Peter (BOUGHTONP) 16 of 57

Of course, you're probably right.

My problem is simply that I usually turn to regex when I need to get something done (and learning it just gets in the way of doing something more interesting), rather than sitting down and taking the time to learn it properly.

From: Drew (X3N0PH0N)31 Oct 2011 15:23
To: Peter (BOUGHTONP) 17 of 57
quote:
But it doesn't really have a large vocabulary. Well, not sure how you're defining vocabulary, but there's really only four or five types of things - quantifiers, character classes, positions, groups, and alternation, and none of those have more than a handful of variants.


Yeah, that's grammar/syntax which I agree is pretty neat.

The vocabulary isn't huge, but it's big enough, and combined with its arbitrariness, that's what makes it difficult. (I.e. everything is one character, so you can't tell things apart or remember them that way, and the characters often don't obviously relate to what they stand for, and so on. That makes memorising hard.)

And it genuinely is complex when you get into back/forward references and have to worry about greediness and that kinda stuff. That's a genuine headfuck.
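Assuming "back/forward references" here means lookbehind and lookahead (collectively, lookaround), a minimal Python sketch of what they do. The key point is that the looked-at text is checked but not consumed, so it isn't part of the match. The sample text and pattern names are made up.

```python
import re

TEXT = "foobar foobaz costs $30, not 45"

LOOKAHEAD = r"foo(?=bar)"       # 'foo' only if 'bar' comes next
NEG_LOOKAHEAD = r"foo(?!bar)"   # 'foo' only if 'bar' does NOT come next
LOOKBEHIND = r"(?<=\$)\d+"      # digits only if a '$' comes just before

print(re.search(LOOKAHEAD, TEXT).span())       # (0, 3): 'bar' was checked, not consumed
print(re.search(NEG_LOOKAHEAD, TEXT).start())  # 7: the second 'foo'
print(re.findall(LOOKBEHIND, TEXT))            # ['30']: the '$' is not in the match
```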
EDITED: 31 Oct 2011 15:24 by X3N0PH0N
From: Peter (BOUGHTONP)31 Oct 2011 15:37
To: Drew (X3N0PH0N) 18 of 57

I'm only half sure what you're on about with that middle paragraph. :S

Most times when people worry about greediness, they should actually be using lazy quantifiers, or a negative character class.
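A minimal sketch of that advice on a made-up HTML snippet (Python `re`; the regex is only finding tags here, not parsing HTML properly):

```python
import re

HTML = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.+>", HTML)       # .+ runs on to the LAST '>' it can find
lazy = re.findall(r"<.+?>", HTML)        # .+? stops at the first '>'
negated = re.findall(r"<[^>]+>", HTML)   # [^>]+ can never step past a '>'

print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
print(negated)  # ['<b>', '</b>', '<i>', '</i>']
```

The lazy and negated-class versions agree here, but they aren't identical in general: `[^>]` also matches newlines, whereas `.` doesn't unless `re.DOTALL` is set.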

(If I was designing regex from scratch, I'd either make lazy the default, or have no default, so that people had to learn there are three different modes, and when each is appropriate.)
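Assuming the three modes meant here are greedy, lazy and possessive (the post doesn't name them), a sketch of how they differ on the same made-up input. Possessive quantifiers only exist in Python's `re` from 3.11 onwards, so the sketch falls back on older versions.

```python
import re

TEXT = 'say "hi" and "bye"'


def first_match(pattern: str, text: str):
    """Return the text of the first match of pattern, or None if there isn't one."""
    m = re.search(pattern, text)
    return m.group() if m else None


greedy = first_match(r'".*"', TEXT)   # .* grabs everything, then backs off to the LAST quote
lazy = first_match(r'".*?"', TEXT)    # .*? takes as little as possible: stops at the NEXT quote

try:
    # .*+ grabs everything and never gives any back, so the closing quote can't match.
    possessive = first_match(r'".*+"', TEXT)
except re.error:
    # Python versions before 3.11 reject possessive quantifiers.
    possessive = "unsupported"

print(greedy)      # "hi" and "bye"
print(lazy)        # "hi"
print(possessive)  # None on Python 3.11+
```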

If you're using back references a lot, you're probably getting into territory where a simple parser is the better choice (likely built from a number of smaller, more basic regexes).
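A hypothetical illustration of that trade-off, parsing a made-up `name = "value"` assignment (single or double quotes). The first version is one regex relying on the back reference `\2` to find the matching closing quote; the second is a tiny hand-written scanner built from smaller, single-purpose regexes plus ordinary string code. All names are invented for the example.

```python
from __future__ import annotations

import re

# One regex: the back reference \2 insists on the same closing quote.
ONE_REGEX = re.compile(r"""(\w+)\s*=\s*(["'])(.*?)\2""")

# Smaller, single-purpose patterns for a hand-written scanner.
NAME = re.compile(r"\w+")
EQUALS = re.compile(r"\s*=\s*")
QUOTE = re.compile(r"[\"']")


def parse_assignment(line: str) -> tuple[str, str]:
    """Parse name = "value" (or single-quoted) one step at a time."""
    pos = 0
    m = NAME.match(line, pos)
    if not m:
        raise ValueError(f"expected a name at position {pos}")
    name, pos = m.group(), m.end()

    m = EQUALS.match(line, pos)
    if not m:
        raise ValueError(f"expected '=' at position {pos}")
    pos = m.end()

    m = QUOTE.match(line, pos)
    if not m:
        raise ValueError(f"expected a quote at position {pos}")
    quote, pos = m.group(), m.end()

    end = line.find(quote, pos)  # plain string search for the matching quote
    if end == -1:
        raise ValueError("unterminated string")
    return name, line[pos:end]


LINE = 'title = "it\'s here"'
print(ONE_REGEX.match(LINE).group(1, 3))  # ('title', "it's here")
print(parse_assignment(LINE))             # ('title', "it's here")
```

The scanner is longer, but each step can say where and why the input went wrong, whereas the single regex can only report that it didn't match.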

EDITED: 31 Oct 2011 15:37 by BOUGHTONP
From: 99% of gargoyles look like (MR_BASTARD)31 Oct 2011 16:11
To: Peter (BOUGHTONP) 19 of 57
I am both a lazy quantifier AND a negative character class.
From: 99% of gargoyles look like (MR_BASTARD)31 Oct 2011 16:11
To: 99% of gargoyles look like (MR_BASTARD) 20 of 57
I am also putting off performance reviews. God the tedium! :(
From: Mizzy31 Oct 2011 16:34
To: Peter (BOUGHTONP) 21 of 57

Sigh, yes, I know I wasn't paying attention where I chopped up the code, and I was generalising broadly. :-(( Sorry I wasn't up to PB standard, but I didn't have time to write a dissertation. :-P

PS: have you considered a career as a QSA? :-)

I still stand by RegexBuddy; it does a nice job of validating regexes against different regex flavours.

From: Drew (X3N0PH0N)31 Oct 2011 16:45
To: Mizzy 22 of 57
Hey I assumed you were new but you know about PB and his standards. WHO ARE YOU?!?!?!!?

Also, if you are new, tea?
From: 99% of gargoyles look like (MR_BASTARD)31 Oct 2011 16:49
To: Drew (X3N0PH0N) 23 of 57
(Registered: 20 Mar 2007)
From: Mizzy31 Oct 2011 16:52
To: Drew (X3N0PH0N) 24 of 57

I'm not new, I are extreme olds, I even remember teh pcf forum on delphi.

I also know you're not called LUCY either, Xen.

I'll only partake of tea if it's at least Assam and nice and strong; otherwise you can keep your dishwater. ;-)

Guessed who I am yet ????

From: Peter (BOUGHTONP)31 Oct 2011 16:57
To: Mizzy 25 of 57
A QSA job means I spend more of my time getting frustrated by badly written software than I already do.
At least with development, I get to try and fix the problem some of the time. :P


I've never bothered using any regex software, because I know what I'm doing so it'd just get in the way (and look hideous; dunno why people doing regex syntax highlighting pick such horrible colour schemes).

I'm sure RegexBuddy does do a good job with the different flavours, since it's written by the guy that runs regular-expressions.info (which has a comprehensive reference of what they each support), but being required to pay €30 for proprietary software isn't that great. :/

(I know there are free equivalents, but no idea how they actually compare.)
EDITED: 31 Oct 2011 16:58 by BOUGHTONP
From: Drew (X3N0PH0N)31 Oct 2011 17:02
To: Mizzy 26 of 57
HOW THE FUCK DO YOU KNOW ALL THESE THINGS ABOUT ME!?!??!!?!?

ARE YOU WATCHING ME NOW?!?!?!!?!! MY HAIR IS A MESS, HANG ON. OK, HOW'S THAT?!??!!!?!!?!?

Assam is an interesting choice. Too earthy and malty for me, I prefer the zesty zing of Ceylon and the like. A very respectable choice though.

I do not know who you are. Give us a clue!
From: Mizzy31 Oct 2011 17:07
To: Peter (BOUGHTONP) 27 of 57

But as a QSA you'd get to be picky all day and get paid lots and lots and lots. I thought you'd like that.

At 3am, when you're seeing double, a bit of syntax highlighting, regardless of the colours, makes life a lot easier, especially when you're stuck fixing a bitch of a SIEM collector plugin.

It is a bit steep, but the others are an order of magnitude less helpful.

edit ------ here's that clue for Xen ------

EDITED: 31 Oct 2011 17:10 by MIZZY
From: Mizzy31 Oct 2011 17:19
To: Drew (X3N0PH0N) 28 of 57

I work in security.
I don't live south of London or north of Birmingham.

<natural paranoia>
can't say any more, teh fears they hurt
</natural paranoia>

From: Drew (X3N0PH0N)31 Oct 2011 17:25
To: Mizzy 29 of 57
Hmmm. Are you James Bond?
From: Mizzy31 Oct 2011 17:30
To: Drew (X3N0PH0N) 30 of 57

no Drew, I'm more likely to be moneypenny :-)

It is I, Marie.

From: Drew (X3N0PH0N)31 Oct 2011 17:32
To: Mizzy 31 of 57
Aha!

I remember you.

<pours a nice strong cup of assam>
From: Mizzy31 Oct 2011 17:37
To: Drew (X3N0PH0N) 32 of 57

<takes nice cup of assam>
ahhhhh, thank you... you make good tea.