s operator.
Using this operator, we can search for a regular expression and replace all or
part of the matched text with text of our choosing. In the simplest case, we may want to
replace every literal occurrence of one string with another string. Suppose
we wish to replace all occurrences of the word "dog" in a piece of text with
the word "cat". A common error when using the substitution operator is
to forget the g option in this case. Remember that, without this option,
only the first occurrence of the regular expression will be changed.
If the text in which we want to make the changes is stored in the variable $line,
we could use the following command:
    $line =~ s/dog/cat/g;

Note that this will change every occurrence of dog to cat, even occurrences embedded
inside other words (like endogenous), which may not be what you want. Use the word
boundary symbol, \b, when you specifically want to change a word rather than any
occurrence of a character string:

    $line =~ s/\bdog\b/cat/g;

This command will not change the string dog to cat unless dog appears as a word on
its own. To make the substitution case-insensitive, simply add an i either before or
after the trailing g of the substitute command (s/\bdog\b/cat/gi or s/\bdog\b/cat/ig).
The s operator returns the number of substitutions that took place. This is handy in its
own right, but it also allows you to use the operator as the condition of an if or while
statement, whose body will be executed only if a substitution took place.
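The following sketch puts these points side by side: the effect of leaving off g, the difference \b makes, the i modifier, and the count returned by s. The sample sentence and variable names are hypothetical, chosen only so that "dog" appears both as a word and inside other words.

```perl
use strict;
use warnings;

# Hypothetical sample text used only for this illustration.
my $text = "The dog saw a Dog near the endogenous hotdog stand.";

my $first = $text;
$first =~ s/dog/cat/;            # no g: only the first "dog" changes

my $plain = $text;
$plain =~ s/dog/cat/g;           # g but no \b: "dog" inside words changes too

my $whole = $text;
my $count = ($whole =~ s/\bdog\b/cat/gi);   # whole words only, any case

print "$first\n$plain\n$whole\n$count\n";

# s returns the number of substitutions (a false value when there are none),
# so it can serve directly as the condition of an if statement.
my $other = "no dogs here";
if ($other =~ s/\bdog\b/cat/g) {
    print "changed: $other\n";
} else {
    print "unchanged: $other\n";
}
```

Here $plain becomes "The cat saw a Dog near the encatenous hotcat stand.", while $whole becomes "The cat saw a cat near the endogenous hotdog stand." and $count is 2. The last test leaves "no dogs here" alone, because the s in "dogs" keeps \b from matching after "dog".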
Sometimes you wish to make substitutions in a variable while retaining a copy of the
original, unsubstituted string. For example, you may have a piece of text containing
angle brackets. To print this text to the screen, you'd want to leave the angle
brackets in place, but to insert the string in an html file, you'd need to substitute
the string &lt; for the left angle bracket (<), and &gt; for
the right angle bracket (>). A common perl construct for this sort of task
is to put a parenthesized assignment on the left hand side of the binding operator (=~);
this copies the string to a new variable and then modifies the copy,
while leaving the original intact. Suppose that our text containing angle brackets
is contained in a string called $orig, and we wish to create a string called
$new which holds the modified version. We could use code like this:
    ($new = $orig) =~ s/</&lt;/g;
    $new =~ s/>/&gt;/g;

The assignment of $orig to $new could, of course, have
been done in a separate statement, but many perl programmers use the above construct
to copy and modify a character string in a single statement.
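A minimal, self-contained version of the copy-and-modify idiom is sketched below. The sample string is hypothetical, and because the sketch runs under use strict, $new is declared with my inside the parentheses. It handles only the angle brackets discussed here; a complete HTML escaper would also have to turn & into &amp; (and do that first), which is left out.

```perl
use strict;
use warnings;

# Hypothetical sample text containing angle brackets.
my $orig = "<b>bold</b> means x > y";

# Copy $orig into $new, then modify only the copy.
(my $new = $orig) =~ s/</&lt;/g;
$new =~ s/>/&gt;/g;

print "$orig\n";   # the original still contains < and >
print "$new\n";
```

After these two statements $orig is still "<b>bold</b> means x > y", and $new is "&lt;b&gt;bold&lt;/b&gt; means x &gt; y".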
The tagging mechanism described in
Section can be used to rearrange pieces of a string in the
substitution. Suppose we have lines of text, each with a word followed by a number,
and we wish to print out the lines with the number preceding the word. We can
tag each piece of the line and then refer to the tagged pieces in the usual way
($1, $2, etc.) on the right hand side of the substitution:

    s/(\w+)\s+(\d+)/$2 $1/;

The use of \s+ as a separator between the two patterns allows for any amount of
whitespace (one or more spaces or tabs) between the word and the number.
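To see the tagged pieces being swapped, the sketch below applies the same substitution to a few lines. The input lines are hypothetical and are kept in an array rather than read from a file, so the example runs on its own; each line is copied first so the originals are preserved.

```perl
use strict;
use warnings;

# Hypothetical input: a word, some whitespace, then a number.
my @lines = ("apples 12", "pears\t7", "plums    103");

my @swapped;
for my $line (@lines) {
    # $1 holds the word and $2 the number; the replacement reverses them.
    (my $copy = $line) =~ s/(\w+)\s+(\d+)/$2 $1/;
    push @swapped, $copy;
    print "$copy\n";
}
```

The three lines print as "12 apples", "7 pears" and "103 plums"; the single space, the tab and the run of spaces are all matched by \s+ and replaced by the one space in the replacement text.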
One of the modifiers in Table requires some additional
explanation. When you use the e modifier in a substitution operation, the
replacement text is not used literally; it is treated as a piece of perl code and
evaluated, and the result of that evaluation becomes the replacement. Suppose a file
contains line numbers at the beginning of each line, and we wish to print the lines
with each line number incremented by five.
Regular expressions by themselves offer no direct way to do arithmetic on the text
they match. But if we replace each number with the result of evaluating an expression
like $x + 5, the problem becomes very simple. Here's an implementation of that idea
using the e modifier:
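    while(<>){
        s/^(\d+)/$1 + 5/e;   # replace the leading number with its value plus 5
        print;
    }

The number at the beginning of each line will be replaced by the result of evaluating
a perl expression which adds 5 to that number. Due to the greediness of regular
expressions, \d+ matches all of the leading digits, so the whole number is
incremented, not just its first digit.

Another use of the e modifier is to reformat numbers using the sprintf
function:

    # rewrite every number with a field width of 7 and two decimal places
    s/(\b)([\d.]+)(\b)/sprintf("%s%7.2f%s",$1,$2,$3)/eg;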
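The sketch below runs both e examples on in-memory data instead of reading a file through while(<>), and also shows what the same replacement produces without e. The input strings are hypothetical. Note that the (\b) groups in the sprintf pattern are zero-width, so $1 and $3 are always empty strings there; they are kept only to match the command above.

```perl
use strict;
use warnings;

# Hypothetical numbered lines standing in for the file read by while(<>).
my @lines = ("8 first line", "99 second line", "no number here");

my @bumped;
for my $line (@lines) {
    # With e, "$1 + 5" is run as perl code, so the number is incremented.
    (my $copy = $line) =~ s/^(\d+)/$1 + 5/e;
    push @bumped, $copy;
}

# Without e, the same replacement is inserted as literal text.
(my $literal = $lines[0]) =~ s/^(\d+)/$1 + 5/;

# e with g and sprintf: every number is reformatted as %7.2f.
(my $table = "x 3.14159 y 42") =~ s/(\b)([\d.]+)(\b)/sprintf("%s%7.2f%s",$1,$2,$3)/eg;

print "$_\n" for @bumped, $literal, $table;
```

The incremented lines are "13 first line", "104 second line" and the unchanged "no number here"; without e the first line becomes "8 + 5 first line". The sprintf substitution turns "3.14159" into "   3.14" and "42" into "  42.00", each padded to seven characters.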