Regular Expression Gurus Out There?


I am learning Perl regular expressions (on an OS X 10.1 workstation with the default Perl install).

I am trying to duplicate the pseudo-HTML markup that many message boards allow, for example using [*b*]text here[/*b*] to bold a phrase (minus the asterisks). To that end, I have written the following code to test the regular expression (bear with me):

my $tag = "[bold]This is some text![/bold]";

if ($tag =~ /(\[)([a-z]+)(\])(.+?)(\[\/\1\])/) {

     print "matched!\n";
     print "atom 1 = $1\n";
     print "atom 2 = $2\n";
     print "atom 3 = $3\n";
     print "atom 4 = $4\n";
     print "atom 5 = $5\n";
}
else {
     print "no match!\n";
}

When run as-is, this code prints "no match!", but as far as I can tell it should match. When I remove the parentheses from the initial atom:

if ($tag =~ /\[([a-z]+)(\])(.+?)(\[\/\1\])/) {

it matches properly, which seems strange, because as far as I know the parentheses should have no effect on what is matched inside them; they just store the value in a variable for later use. Unfortunately, I want to grab this first atom using $1, so I need the parentheses there.

I'd appreciate it if someone with regexp know-how would enlighten me. I'm going batty!


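Well, the immediate problem is one of those "Doh!" problems. Wrapping the opening \[ in parentheses made it group 1, which pushes the tag name to group 2, so the backreference in the closing tag now has to be \2, not \1 (\1 is just the '['):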

if ($tag =~ /(\[)([a-z]+)(\])(.+?)(\[\/\2\])/) {
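To see the numbering shift for yourself, here is a small self-contained sketch (the $buggy and @atoms names are just for illustration) that tries both versions of the pattern against the same string. For the \2 version it relies on the fact that a match in list context returns the captured groups:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $tag = "[bold]This is some text![/bold]";

# Groups are numbered by their opening parenthesis, left to right:
#   1 = '['   2 = tag name   3 = ']'   4 = contents   5 = closing tag
# With \1 the closing tag would have to be the literal text "[/[]".
my $buggy = $tag =~ /(\[)([a-z]+)(\])(.+?)(\[\/\1\])/ ? "match" : "no match";
print "with \\1: $buggy\n";

# In list context a successful match returns the captured groups;
# a failed match returns an empty list.
my @atoms = $tag =~ /(\[)([a-z]+)(\])(.+?)(\[\/\2\])/;
if (@atoms) {
    print "with \\2: match\n";
    for my $i (0 .. $#atoms) {
        print "atom ", $i + 1, " = $atoms[$i]\n";
    }
}
else {
    print "with \\2: no match\n";
}
```

The \1 version reports no match, while the \2 version matches with atom 2 = bold and atom 5 = [/bold].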

Of course, since you know the first group will always be the '[' character, you probably don't need to capture it. Personally, however, I would write this as:

if( $tag =~ /\[([a-z]+)\]([^[]*)\[\/\1\]/) {

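since the only information you really need is the tag name and the data inside the tag. I also changed the data match to [^[]*, which matches any run of characters other than '[' (so the contents can't themselves contain a '[', and an empty tag pair now matches too). That replaces the less efficient .+?, which makes the engine stop after every character it consumes and try the rest of the pattern (the closing tag) before taking another one. For much more on this, a great regex book is Friedl's Mastering Regular Expressions.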
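Since the end goal is turning board tags into real HTML, here is a rough sketch of how the simplified pattern might be used with s///g. The %html_for table, the tag names in it, and the to_html() helper are all hypothetical, and the sketch does not escape any HTML that users type into the text itself, which a real board would need to do. Because [^[]* refuses to cross a '[', nested tags are handled by repeating the substitution until nothing changes, so they are converted from the inside out:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical mapping from board tag names to HTML element names.
my %html_for = (b => 'b', bold => 'strong', i => 'em');

# Alternation of the known tag names, e.g. "b|bold|i" (quotemeta is a safety net).
my $names = join '|', map { quotemeta($_) } sort keys %html_for;

# Hypothetical helper: convert matching [tag]...[/tag] pairs to HTML.
sub to_html {
    my ($text) = @_;
    # Same shape as /\[([a-z]+)\]([^[]*)\[\/\1\]/: a known tag name, contents
    # with no '[', then a closing tag repeating the name via \1. Because the
    # contents can't contain '[', only innermost pairs match on each pass, so
    # repeat the /g substitution until it stops changing anything.
    1 while $text =~ s{\[($names)\]([^[]*)\[/\1\]}{<$html_for{$1}>$2</$html_for{$1}>}g;
    return $text;
}

print to_html("[bold]Hello[/bold], [i]world[/i]! [u]left alone[/u]"), "\n";
print to_html("[b][i]nested[/i][/b]"), "\n";
```

Restricting the first group to known tag names (instead of [a-z]+) means an unknown tag such as [u] is left alone rather than turned into a made-up HTML element, and a mismatched pair like [b]x[/i] is never converted because \1 must repeat the opening name.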


That really *is* a doh! I started out with something very similar to what you propose, but forgot to update the variable somewhere along the line. Thanks for the additional insight...I am going to check out that book.