Non-capturing groups in Perl regular expressions
- Authors
- Name
- Frank
Non-capturing groups are very handy. Recently I was working on grabbing a portion of XML out of a document. After trying the PHP 5 XMLReader class, I opted for quick and dirty Perl-compatible regular expressions (PHP's preg_* functions) for this script.
The problem is the dot/period operator ., which by default matches any character except the newline \n.
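To see the problem in isolation, here is a minimal sketch. The sample string in $xml is made up for illustration; only preg_match() and the dot behaviour come from PHP itself.

```php
<?php
// Hypothetical sample input: a <node> whose content spans two lines.
$xml = "<node>first line\nsecond line</node>";

// The dot stops at the newline, so this pattern can never reach </node>.
$found = preg_match('/<node>(.+?)<\/node>/i', $xml, $matches);

var_dump($found); // int(0): no match

// On a single line the same pattern works as expected.
$foundSingle = preg_match('/<node>(.+?)<\/node>/i', '<node>one line</node>', $single);
// $single[1] now holds "one line"
```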
So the regex needed to grab anything between two nodes, newlines included. I used something like this:
$pattern = "/<node>((?:.|\n)+?)<\/node>/i";
The ?: at the start of a group means that group will not be captured, so it gets no entry in the $matches array you pass to preg_match() or preg_match_all(). Here that applies to the inner parentheses only; the outer parentheses still capture the node content as $matches[1].
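A small sketch of what ends up in $matches with and without the ?: may help. The input string and the variable names $matches and $withInner are my own, chosen for illustration.

```php
<?php
// Hypothetical sample input spanning two lines.
$xml = "<node>first line\nsecond line</node>";

// Inner group is non-capturing: only the outer parentheses fill a slot.
preg_match("/<node>((?:.|\n)+?)<\/node>/i", $xml, $matches);
// $matches[0] is the full match including the tags,
// $matches[1] is the node content, and there is no $matches[2].

// Same pattern with a capturing inner group, for comparison.
preg_match("/<node>((.|\n)+?)<\/node>/i", $xml, $withInner);
// $withInner[2] also exists now and holds only the last single
// character the inner group matched, which is rarely useful.

print_r($matches);
print_r($withInner);
```

Besides keeping the array tidy, the non-capturing form means the index of the group you care about does not shift if you later add more grouping inside it.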
Inside that group the period . matches any character except a newline, and the alternation | adds the \n newline character, so together they match any character at all.
The ? after the + makes the match non-greedy, so it stops at the first closing </node> instead of running on to the last one in the document.
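The difference only shows up when the document has more than one node, so here is a sketch with two of them. The input and the variable names are illustrative assumptions.

```php
<?php
// Hypothetical input with two nodes on separate lines.
$xml = "<node>a</node>\n<node>b</node>";

// Greedy: + grabs as much as possible, then backs up to the LAST </node>.
preg_match("/<node>((?:.|\n)+)<\/node>/i", $xml, $greedy);
// $greedy[1] is "a</node>\n<node>b"

// Non-greedy: +? grabs as little as possible, stopping at the FIRST </node>.
preg_match("/<node>((?:.|\n)+?)<\/node>/i", $xml, $lazy);
// $lazy[1] is "a"

// With preg_match_all() the non-greedy pattern picks up each node in turn.
preg_match_all("/<node>((?:.|\n)+?)<\/node>/i", $xml, $all);
// $all[1] is ["a", "b"]
```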
That being said, I highly recommend removing newlines from your source content before applying a regular expression, because they are a pain in the arse.
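One way to do that, as a rough sketch: flatten the line breaks first, then use the plain dot. Replacing each break with a space (rather than nothing) is my own choice here, so that words on adjacent lines do not run together; pass '' instead if you really want them gone.

```php
<?php
// Hypothetical sample input spanning two lines.
$xml = "<node>first line\nsecond line</node>";

// Replace every line break (\r\n, \r or \n) with a single space.
$flat = preg_replace('/\r\n|\r|\n/', ' ', $xml);

// With no newlines left, the plain dot is enough.
preg_match('/<node>(.+?)<\/node>/i', $flat, $matches);
// $matches[1] is "first line second line"

// Alternative that leaves the source untouched: the s modifier
// (PCRE_DOTALL) makes the dot match newlines as well.
preg_match('/<node>(.+?)<\/node>/is', $xml, $dotall);
// $dotall[1] is "first line\nsecond line"
```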