. ^ $ + ? * ( ) [ ] { } | \ you must precede them with a backslash.
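As a small illustration of why the backslash matters, the sketch below matches a price and a parenthesised remark literally. The sample strings and variable names are made up for this example; quotemeta-style quoting with \Q...\E is a standard Perl feature that applies the backslashes for you.

```perl
use strict;
use warnings;

# Hypothetical sample text, chosen only for illustration.
my $text = 'cost (approx.): $9.99';

# "$" and "." are special in a pattern, so both are backslashed here.
my $has_price = $text =~ /\$9\.99/ ? 1 : 0;

# An unescaped dot is matched by any character, which is rarely intended.
my $loose  = '9x99' =~ /9.99/  ? 1 : 0;
my $strict = '9x99' =~ /9\.99/ ? 1 : 0;

# \Q...\E backslashes every metacharacter in the interpolated string.
my $literal     = '(approx.)';
my $has_literal = $text =~ /\Q$literal\E/ ? 1 : 0;

print "price: $has_price, loose: $loose, strict: $strict, literal: $has_literal\n";
```

With these inputs the unescaped pattern accepts '9x99' while the escaped one rejects it.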
A character class is represented in a regular expression by a series of characters
surrounded by square brackets ([ ]), and will be matched by any one of the
characters within the brackets. When you specify one of the "special" punctuation
characters mentioned above inside a character class, you don't need to use the
backslash (except for square brackets, dashes (-; see below) and the backslash
itself), but there is no harm in doing so. If the first character of a character
class is the caret (^), then the character class will be matched by any character
except those which are between the square brackets. Such a construction is known
as a negated character class.
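A minimal sketch of an ordinary and a negated character class follows. The word list is hypothetical and exists only to show which strings each class accepts.

```perl
use strict;
use warnings;

# Hypothetical sample words, for illustration only.
my @words = ('gray', 'grey', 'groy');

# [ae] is matched by either "a" or "e" at that position.
my @matched = grep { /gr[ae]y/ } @words;

# [^ae] is matched by any single character other than "a" or "e".
my @negated = grep { /gr[^ae]y/ } @words;

# Inside a class, most metacharacters are literal: [.?!] needs no backslashes.
my $ends_sentence = 'Done!' =~ /[.?!]$/ ? 1 : 0;

print "matched: @matched\n";
print "negated: @negated\n";
print "ends sentence: $ends_sentence\n";
```

Note that a negated class must still be matched by some character; it is not the same as the class being absent.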
Several shortcuts are available when you're writing a character class. Ranges of
letters or digits can be specified by placing a dash between the beginning and
ending characters; for example, [a-z] is matched by any lowercase letter. Because
the dash has this special meaning, to literally include a dash in a character
class you must precede it with a backslash.
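The following sketch shows ranges and a backslashed dash side by side. The test strings are invented for the example.

```perl
use strict;
use warnings;

# [0-9a-fA-F] combines three ranges and is matched by one hexadecimal digit.
my $is_hex  = 'c0ffee' =~ /^[0-9a-fA-F]+$/ ? 1 : 0;
my $not_hex = 'coffee' =~ /^[0-9a-fA-F]+$/ ? 1 : 0;   # "o" lies outside the ranges

# A backslashed dash is a literal dash rather than a range operator.
my $date_like = '2024-01-31' =~ /^[0-9\-]+$/ ? 1 : 0;

# Without the dash in the class, the same string is rejected.
my $digits_only = '2024-01-31' =~ /^[0-9]+$/ ? 1 : 0;

print "hex: $is_hex, not hex: $not_hex, date-like: $date_like, digits only: $digits_only\n";
```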
Furthermore, Perl provides special escape sequences, listed in the accompanying
table. Each of the sequences in the table can be used by itself to represent a
character class, or can be included inside a character class to extend the range
of characters which the class will match.
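As a rough sketch, assuming the table includes the usual Perl sequences \d (digit), \s (whitespace) and \w (word character), the example below uses each one both on its own and inside a bracketed class. The file names and other strings are hypothetical.

```perl
use strict;
use warnings;

# \d by itself is matched by a single digit.
my $has_digit = 'room 101' =~ /\d/ ? 1 : 0;

# \s by itself is matched by a whitespace character; split on runs of it.
my @fields = split /\s+/, "alpha  beta\tgamma";

# Inside a class, \w is extended here with "." and a backslashed dash.
my $ok_name  = 'report-v2.txt' =~ /^[\w.\-]+$/ ? 1 : 0;
my $bad_name = 'report v2.txt' =~ /^[\w.\-]+$/ ? 1 : 0;   # the space is not in the class

print "digit: $has_digit, fields: ", scalar(@fields), "\n";
print "ok name: $ok_name, bad name: $bad_name\n";
```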
These character class shortcuts are very useful for verifying input which your programs may receive, whether from a command-line interface or through CGI scripts running on a web server. As a very simple example, suppose we are expecting a username which contains only letters, digits or the underscore symbol. Since \W is matched by any character outside that set, we can easily print a message advising of an illegal entry with a program fragment like:
print "Illegal username entered\n" if $username =~ /\W/;
Finally, a very useful escape sequence for constructing regular expressions is \b. Inside a character class, \b has its usual meaning as a backspace, but outside of a character class, \b is matched only at a word boundary. This eliminates the need to check individually for all the different places that a word might be (surrounded by spaces, at the beginning or end of a sentence, followed by punctuation or a newline, etc.). As with the other regular expression escape sequences, the uppercase form negates the meaning: \B is matched at any position that is not a word boundary.
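The effect of \b and \B can be sketched by counting matches in a short sentence. The sentence is invented for the example, and the "= () =" counting idiom is just one common way to count matches in Perl.

```perl
use strict;
use warnings;

# Hypothetical sample sentence, for illustration only.
my $sentence = 'A cat, a concatenation, and "cat".';

# \bcat\b is matched only where "cat" stands alone as a word.
my $whole_words = () = $sentence =~ /\bcat\b/g;

# Without \b, the "cat" inside "concatenation" is counted as well.
my $any_occurrence = () = $sentence =~ /cat/g;

# \Bcat\B is matched only where "cat" is embedded in a longer word.
my $embedded = () = $sentence =~ /\Bcat\B/g;

print "whole words: $whole_words, any occurrence: $any_occurrence, embedded: $embedded\n";
```

The comma and the quotation marks around "cat" count as word boundaries, so no separate cases for punctuation are needed.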