
Matching Multiple Occurrences of a Pattern

The previous example raises an interesting question: how can we retrieve the tagged expressions when, as in that example, a regular expression occurs more than once on a single line? This situation only arises when we use the g modifier (otherwise only the first match is found), so we simply need to evaluate the m regular expression operator in a list (array) context: the resulting array will contain all of the tagged expressions found in the string being tested. We can apply this idea to the previous example:
     $str = '<img src = "/one.jpg"> <br> <img src = "/two.jpg">';
     @images = ($str =~ /< *img +src *= *["']([^"']+)["']/ig);
     printf("%d images found:\n %s\n",scalar(@images),join("\n",@images));
     # 2 images found:
     # /one.jpg
     # /two.jpg
If there is more than one tagged subexpression in the regular expression, all of the tagged pieces are returned in the array, in the order they were encountered in the text being matched.
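
As a small sketch of this behaviour, suppose the pattern tags both the file name and the alt text of each image. The alt attribute, the sample string, and the variable names below are assumptions made for this illustration and are not part of the example above. The returned array is flat, so the pieces come back as name, alt, name, alt, and one way to handle them is to walk the array two elements at a time:

```perl
use strict;
use warnings;

# Hypothetical input: each tag carries a src and an alt attribute.
my $str = '<img src="/one.jpg" alt="first"> <br> <img src="/two.jpg" alt="second">';

# Two tagged subexpressions: the file name and the alt text.
my @pieces = ($str =~ /<\s*img\s+src\s*=\s*["']([^"']+)["']\s+alt\s*=\s*["']([^"']+)["']/ig);

# @pieces is a flat list (name, alt, name, alt), so step through it in pairs.
my @report;
for (my $i = 0; $i < @pieces; $i += 2) {
    push @report, "$pieces[$i] => $pieces[$i + 1]";
}
print "$_\n" for @report;
```

With the sample string above, @pieces holds four elements, two for each image tag.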

Alternatively, the m operator with the g modifier can be used as the condition of a while loop, allowing you to process the matches one at a time. Each time the condition is evaluated, the search resumes where the previous match ended, so the body of the loop is executed once for every occurrence of the regular expression in the target string. Suppose we wish to read the lines of a file, recording how many times various email addresses are encountered. For the purposes of this example, we'll assume all email addresses are of the form name@domain.

     while(<>){
         chomp;
         while(/\b([\w.]+@[\w.]+)\b/g){
             $count{$1}++;
         }
     }
     foreach $k (keys(%count)){
         print "$count{$k} occurrences of $k\n";
     }
Since email addresses can contain periods, a character class consisting of \w and period (.) was used in the regular expression.
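
The same loop can be tried without an input file. The following is a minimal sketch under stated assumptions: the @lines array and the addresses in it are made-up sample data standing in for the lines read with <>, and the call to the built-in pos function is added only to show where the next search will resume.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the file read with <> above.
my @lines = (
    'From: alice@example.com To: bob@example.org',
    'Cc: alice@example.com',
);

my %count;
for my $line (@lines) {
    # In scalar context each /g match resumes where the previous one ended;
    # pos() reports the offset at which the next search will start.
    while ($line =~ /\b([\w.]+@[\w.]+)\b/g) {
        $count{$1}++;
        printf "found %s, next search starts at offset %d\n", $1, pos($line);
    }
}

# Sort the keys so the report comes out in a predictable order.
for my $k (sort keys %count) {
    print "$count{$k} occurrences of $k\n";
}
```

With this sample data, alice@example.com is counted twice and bob@example.org once. Sorting the keys is a choice made for this sketch; the original loop prints them in whatever order the hash returns them.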


Phil Spector 2002-10-18