Non-capturing groups in Perl-compatible regular expressions

by frank on May 20, 2009

in Tips of the Week

Non-capturing groups are very handy. Recently I was working on grabbing a portion of XML out of a document. After trying the PHP 5 XMLReader class, I opted for quick and dirty Perl-compatible regular expressions (PHP's preg functions) for this script.

The problem is the dot (period) operator ., which by default matches any character except the newline \n.
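A quick way to see the problem, as a minimal sketch: the sample string and variable names here are made up for illustration and are not from the original script.

```php
<?php
// Hypothetical sample: a node whose content spans two lines.
$xml = "<node>first line\nsecond line</node>";

// By default the dot stops at the newline, so this pattern finds nothing.
$dotOnly = preg_match('/<node>(.+?)<\/node>/i', $xml, $m1);

// Allowing either "any character" or a newline lets the match succeed.
$dotOrNewline = preg_match("/<node>((?:.|\n)+?)<\/node>/i", $xml, $m2);

var_dump($dotOnly);      // int(0)
var_dump($dotOrNewline); // int(1)
```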

So the regex needed to grab anything between two nodes, newlines included. I used something like this:

$pattern = "/<node>((?:.|\n)+?)<\/node>/i";

The ?: at the start of the inner group makes it non-capturing: whatever those parentheses match is not stored in the $matches array you pass to preg_match() or preg_match_all(). The outer parentheses have no ?:, so they still capture, and the content of the node ends up in $matches[1].
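To make the difference visible, here is a small sketch comparing the pattern with and without ?: on the inner group. The sample input and variable names are assumptions for illustration only.

```php
<?php
// Hypothetical sample input.
$xml = "<node>one\ntwo</node>";

// Inner group is non-capturing: only the outer group lands in $matches.
preg_match("/<node>((?:.|\n)+?)<\/node>/i", $xml, $nonCapturing);

// Same pattern with a capturing inner group: an extra entry appears,
// holding the last single character the inner group matched.
preg_match("/<node>((.|\n)+?)<\/node>/i", $xml, $capturing);

print_r($nonCapturing); // two entries: the full match and the node content
print_r($capturing);    // three entries: the extra one is "o"
```

Without ?: the array picks up an entry you almost never want, which is the main reason to mark the inner group as non-capturing.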

Inside that group, the period . matches any character except a newline, and the alternative after the | matches the newline \n, so between them any character at all is matched.

The ? after the + makes the match non-greedy, so it stops at the first closing </node> rather than the last one.
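The effect of the non-greedy ? shows up as soon as the document holds more than one node. A minimal sketch, assuming a made-up input with two nodes:

```php
<?php
// Hypothetical sample with two nodes.
$xml = "<node>a\nb</node><node>c</node>";

// Lazy +? stops at the first closing tag, so each node matches separately.
preg_match_all("/<node>((?:.|\n)+?)<\/node>/i", $xml, $lazy);

// Greedy + runs on to the last closing tag and swallows both nodes.
preg_match_all("/<node>((?:.|\n)+)<\/node>/i", $xml, $greedy);

print_r($lazy[1]);   // two captures: "a\nb" and "c"
print_r($greedy[1]); // one capture: "a\nb</node><node>c"
```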

That being said, I highly recommend removing newlines from your source content before applying a regular expression, because they are a pain in the arse.
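One way that clean-up might look, as a rough sketch only: the sample input is invented, and replacing each line break with a space (rather than dropping it) is an assumption that may not suit every document.

```php
<?php
// Hypothetical sample input with Windows and Unix line endings mixed.
$source = "<node>first\r\nsecond\nthird</node>";

// Replace every line break with a space before matching.
$flat = str_replace(["\r\n", "\r", "\n"], ' ', $source);

// With no newlines left, a plain lazy dot is enough.
preg_match('/<node>(.+?)<\/node>/i', $flat, $matches);

echo $matches[1]; // first second third
```

Bear in mind this changes the captured content, so it only fits when the line breaks inside the node carry no meaning.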
