Tip of the Week: Non-capturing groups in Perl-compatible regular expressions

Non-capturing groups are very handy. Recently I was working on grabbing a portion of XML out of a document. After trying the PHP 5 XMLReader class, I opted for quick and dirty Perl-compatible regular expressions for this script.

The problem is the dot/period operator ., which by default matches any character except a newline (\n).

So the regex needed to grab everything between two nodes, newlines included. I used something like this:

$pattern = "/<node>((?:.|\n)+?)<\/node>/i";

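The ?: at the start of the inner group makes it non-capturing: the parentheses still group .|\n so that the +? applies to the whole alternation, but the group gets no entry of its own in the $matches array you pass to preg_match() or preg_match_all(). The outer parentheses are an ordinary capturing group, so the text between the tags ends up in $matches[1].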
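To see what the ?: changes in practice, here is a minimal sketch that runs the same pattern with and without it. The sample input and the variable names ($xml, $withInner, $withoutInner) are made up for illustration and are not from the original script.

```php
<?php
// Hypothetical sample input with a newline inside the node.
$xml = "<doc>\n<node>first\nline</node>\n</doc>";

// Inner group is capturing: (.|\n) becomes group 2.
$capturing = "/<node>((.|\n)+?)<\/node>/i";
// Inner group is non-capturing: only the outer parentheses are numbered.
$nonCapturing = "/<node>((?:.|\n)+?)<\/node>/i";

preg_match($capturing, $xml, $withInner);
preg_match($nonCapturing, $xml, $withoutInner);

// $withInner:    [0] full match, [1] node content, [2] last character the inner group matched
// $withoutInner: [0] full match, [1] node content
echo count($withInner), " vs ", count($withoutInner), "\n";
echo $withoutInner[1], "\n";
```

Both patterns match the same text. The non-capturing version simply keeps the extra single-character entry out of $matches.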

Inside that group, the period . matches any character except a newline, and the alternative after the | is the \n newline character, so together they match any character at all.

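The ? after the + makes the match non-greedy, so it stops at the first closing </node> tag instead of running on to the last one in the document.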
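The difference only shows up when the document contains more than one node. The following is a small sketch under that assumption, using a made-up two-node input and illustrative variable names:

```php
<?php
// Hypothetical input containing two nodes on separate lines.
$xml = "<node>a</node>\n<node>b</node>";

$greedy = "/<node>((?:.|\n)+)<\/node>/i";   // + alone: as much as possible
$lazy   = "/<node>((?:.|\n)+?)<\/node>/i";  // +?: as little as possible

preg_match($greedy, $xml, $g);
preg_match($lazy, $xml, $l);
preg_match_all($lazy, $xml, $all);

// Greedy swallows everything up to the LAST closing tag.
echo $g[1], "\n";
// Non-greedy stops at the FIRST closing tag.
echo $l[1], "\n";
// With preg_match_all() the non-greedy pattern yields each node separately.
echo implode(",", $all[1]), "\n";
```

With the greedy version, $g[1] spans both nodes, including the closing and opening tags between them, which is rarely what you want when extracting node contents.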

That being said, I highly recommend removing newlines from your source content before applying a regular expression, because they are a pain in the arse.
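One way to do that, sketched here with a made-up input string, is to replace every newline variant with a space using str_replace() and then use a plain dot. For comparison, the sketch also shows the s pattern modifier (PCRE_DOTALL), which makes the dot match newlines and is an alternative to the (?:.|\n) group, not what the original script used.

```php
<?php
// Hypothetical source with mixed Windows and Unix line endings.
$source = "<node>first\r\nsecond\nthird</node>";

// Option 1: flatten newlines to spaces first, then a plain dot is enough.
// "\r\n" is listed first so it becomes one space rather than two.
$flat = str_replace(array("\r\n", "\r", "\n"), ' ', $source);
preg_match("/<node>(.+?)<\/node>/i", $flat, $m);
echo $m[1], "\n";

// Option 2 (alternative): keep the source as-is and add the s modifier,
// which lets . match newline characters as well.
preg_match("/<node>(.+?)<\/node>/is", $source, $s);
echo strlen($s[1]), "\n";
```

Flattening changes the captured text (line breaks become spaces), while the s modifier preserves it exactly, so which one fits depends on whether the whitespace matters to you.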

Profile: Frank has been programming for the web using PHP, JavaScript and numerous libraries and frameworks for the past 5 years.

Posted in Tips of the Week.
