\). These characters are
. ^ $ + ? * ( ) [ ] { } | \
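As a small illustration of why these characters need a backslash, the sketch below compares an unescaped dot with an escaped one. It assumes Python's standard re module; the sample strings and variable names are made up for the example.

```python
import re

# An unescaped dot matches any single character, so this pattern is too loose.
loose = re.search(r"3.14", "3x14")

# Preceding the dot with a backslash makes it match only a literal dot.
strict = re.search(r"3\.14", "3x14")
literal = re.search(r"3\.14", "pi is about 3.14")

print(loose is not None)    # True: '.' matched the 'x'
print(strict is not None)   # False: no literal dot in '3x14'
print(literal.group())      # 3.14

# re.escape() inserts the backslashes for you when the text to match
# comes from somewhere else (for example, user input).
escaped = re.escape("1+1=2?")
found = re.search(escaped, "is 1+1=2? yes")
print(found.group())        # 1+1=2?
```

Writing patterns as raw strings (the r prefix) keeps Python from interpreting the backslashes before the re module sees them.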
A character class is represented by one or more characters surrounded by square
brackets ([]). When Python encounters a character class in a regular
expression, it will be matched by an occurrence of any one of the characters within the
character class. Ranges of characters (like a-z or 5-9) are allowed
in character classes. (If you need to specify a dash inside a character class, make
sure that it is the first character in the class, so that Python doesn't confuse it
with a range of characters.) If the first character in a character class is a caret
(^), then the character class is matched by any character except those listed
within the square brackets. As a useful shortcut, Python provides some escape
sequences which represent common character classes inside of regular expressions.
These sequences are summarized in Table 8.1.
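The character class rules above can be tried out with a short sketch like the following. It assumes the standard re module and uses \d (any digit) as one example of a shortcut escape sequence; the sample text and variable names are invented for illustration.

```python
import re

text = "Order 66-B shipped at 5pm"

# Any one of the listed characters matches.
vowels = re.findall(r"[aeiou]", "regular")

# A range inside a class: any digit from 5 through 9.
high_digits = re.findall(r"[5-9]", text)

# A dash placed first in the class is a literal dash, not a range.
digits_or_dash = re.findall(r"[-0-9]", "66-B")

# A leading caret negates the class: anything except a-z or a space.
not_lower = re.findall(r"[^a-z ]", "at 5pm")

# \d is a shortcut for the class [0-9].
shortcut = re.findall(r"\d", text)

print(vowels)          # ['e', 'u', 'a']
print(high_digits)     # ['6', '6', '5']
print(digits_or_dash)  # ['6', '6', '-']
print(not_lower)       # ['5']
print(shortcut)        # ['6', '6', '5']
```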
As mentioned previously, certain punctuation symbols have special meanings inside
of regular expressions. The caret (^) indicates the beginning of a string,
while the dollar sign ($) indicates the end of a string. Furthermore,
within a regular expression, parentheses can be used to group together several
characters or character classes. Finally, a number of characters known as modifiers,
listed in Table 8.2, can be used within
regular expressions. Modifiers can follow a character, a character class, or a
parenthesized group of characters and/or character classes, and expand the range
of what will be matched by the entity which precedes them. For example, the regular
expression 'cat' would only be matched by a string containing those specific
letters in the order given, while the regular expression 'ca*t' would also be
matched by strings containing sequences such as ct, caat,
caaat, and so on.
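The behaviour of the anchors, of grouping, and of the * modifier can be checked with a sketch along these lines. It assumes the standard re module; the helper function matches() and the test strings are hypothetical names chosen for this example.

```python
import re

def matches(pattern, s):
    """Return True if pattern is found anywhere in s (hypothetical helper)."""
    return re.search(pattern, s) is not None

# 'cat' needs exactly those letters in that order.
plain = [matches(r"cat", s) for s in ("concatenate", "ct", "caat")]

# 'ca*t' allows zero or more a's between the c and the t.
starred = [matches(r"ca*t", s) for s in ("ct", "cat", "caat", "caaat", "cot")]

# ^ ties the match to the start of the string, $ to the end.
at_start = [matches(r"^cat", s) for s in ("catalog", "concat")]
at_end = [matches(r"cat$", s) for s in ("catalog", "concat")]

# With parentheses, the modifier applies to the whole group 'ab';
# without them, it applies only to the 'b'.
grouped = [matches(r"^(ab)*$", s) for s in ("ababab", "abbb")]
ungrouped = [matches(r"^ab*$", s) for s in ("ababab", "abbb")]

print(plain)      # [True, False, False]
print(starred)    # [True, True, True, True, False]
print(at_start)   # [True, False]
print(at_end)     # [False, True]
print(grouped)    # [True, False]
print(ungrouped)  # [False, True]
```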