
Non capturing groups in Perl regular expressions

Author: Frank

Non-capturing groups are very handy. Recently I was working on grabbing a portion of XML out of a document, and after trying PHP 5's XMLReader class I opted for quick and dirty Perl-compatible regular expressions for this script.

The problem is the dot (period) metacharacter ., which matches any character except a newline (\n).
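A minimal check of that behaviour, using made-up sample strings: the same pattern matches when the content sits on one line and fails as soon as a newline appears between the tags.

```php
<?php
// By default the PCRE dot matches any character except a newline.
$sameLine  = preg_match('/<node>.+<\/node>/', "<node>hello</node>");
$multiLine = preg_match('/<node>.+<\/node>/', "<node>hello\nworld</node>");

echo $sameLine, "\n";  // 1: the content has no newline
echo $multiLine, "\n"; // 0: the dot cannot cross the \n
```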

So the regex needed to grab everything between an opening and a closing tag, newlines included. I used something like this:

$pattern = "/<node>((?:.|\n)+?)<\/node>/i";

The ?: at the start of the inner group makes it non-capturing: whatever that group matches is not stored separately in the $matches array you can pass to preg_match() or preg_match_all(). Only the outer group is captured, so the content between the tags ends up in $matches[1].
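Here is a minimal sketch of what ends up in $matches, using a made-up two-node sample. The pattern is written in single quotes so PCRE receives the \n escape itself; the double-quoted version works too, because PHP turns "\n" into a real newline that PCRE then matches literally. For comparison, the same pattern with a capturing inner group is run as well.

```php
<?php
// Made-up sample document; the tag name "node" mirrors the pattern in the post.
$xml = "<root>\n<node>first\nitem</node>\n<node>second</node>\n</root>";

$pattern = '/<node>((?:.|\n)+?)<\/node>/i';
$count = preg_match_all($pattern, $xml, $matches);

// $matches[0] holds the full matches, $matches[1] the outer capturing group.
// The inner (?:...) group gets no slot of its own, so there is no $matches[2].
echo $count, "\n";          // 2
echo count($matches), "\n"; // 2

// With a capturing inner group, an extra $matches[2] appears that only holds
// the last character each repetition consumed ("m" and "d" here).
preg_match_all('/<node>((.|\n)+?)<\/node>/i', $xml, $withCapture);
echo count($withCapture), "\n"; // 3
```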

Inside that group, the period . matches any character except a newline, and the alternation | adds \n as the other option, so together they match any character at all, newlines included.
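PCRE also offers the s (dotall) pattern modifier, which lets . match newlines as well. This sketch, on a made-up sample string, checks that it captures the same text as the (?:.|\n) alternation; it is an alternative I'm suggesting here, not what my original script used.

```php
<?php
$xml = "<node>line one\nline two</node>";

// Alternation: "any character except newline" OR "newline".
preg_match('/<node>((?:.|\n)+?)<\/node>/i', $xml, $alt);

// The s modifier lets . match \n itself, so no extra group is needed.
preg_match('/<node>(.+?)<\/node>/is', $xml, $dotall);

var_dump($alt[1] === $dotall[1]); // bool(true)
```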

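The ? after the + makes the repetition lazy (non-greedy), so each match stops at the first closing </node> instead of running on to the last one in the document.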
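To see why the ? matters, this sketch runs the greedy and lazy versions of the pattern against a made-up string containing two nodes:

```php
<?php
$xml = "<node>a</node><node>b</node>";

// Greedy: + grabs as much as possible, so the match runs to the LAST </node>.
preg_match('/<node>((?:.|\n)+)<\/node>/i', $xml, $greedy);

// Lazy: +? grabs as little as possible, stopping at the FIRST </node>.
preg_match('/<node>((?:.|\n)+?)<\/node>/i', $xml, $lazy);

echo $greedy[1], "\n"; // a</node><node>b
echo $lazy[1], "\n";   // a
```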

That being said, I highly recommend removing newlines from your source content before applying a regular expression, because they are a pain in the arse.
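A minimal sketch of that approach, assuming a made-up pretty-printed sample where the newlines are only formatting: strip carriage returns and line feeds with str_replace(), after which a plain lazy .+? is enough.

```php
<?php
// Hypothetical input with Windows line endings between the tags.
$xml = "<root>\r\n  <node>first</node>\r\n  <node>second</node>\r\n</root>";

// Remove every carriage return and line feed (covers \r\n, \r and \n endings).
$flat = str_replace(["\r", "\n"], '', $xml);

// With no newlines left, the dot alone is enough; no (?:.|\n) group needed.
preg_match_all('/<node>(.+?)<\/node>/i', $flat, $matches);

print_r($matches[1]);
```

Keep in mind this also joins any multi-line text inside a node into one run of characters; if that matters, replacing the newlines with a space instead of an empty string may be the safer choice.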