Compiling Regular Expressions
Before Python can test a string to see if it contains a match for a regular expression,
it must internally process the regular expression through a technique known as
compiling. Once a regular expression is compiled, Python can perform the comparison
very rapidly. For convenience, most of the functions in the re module will
accept an uncompiled regular expression, but keep in mind that if you are repeatedly
performing a regular expression search on a series of strings, it will be more
efficient to compile the regular expression once, creating a regular expression
object, and to invoke the regular expression function as a method on this object.

Suppose we wish to search for email addresses in a set of strings.
As a simple approximation, we'll assume that email addresses are of the form
user@domain.
To create a compiled regular expression, the following statements could be used:
>>> import re
>>> emailpat = re.compile(r'[\w.]+@[\w.]+')
Note the use of the r modifier to create a raw string, which keeps Python from
interpreting the backslashes before the re module sees them; this technique
should usually be used when you are constructing a regular expression. In words, we
can describe this regular expression as "one or more alphanumeric characters, underscores,
or periods, followed by an at sign (@), followed by one or more alphanumeric characters,
underscores, or periods."
In later
sections, we'll see how to use this compiled object to search for the regular expression
in strings.
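As a preview, here is a minimal sketch of compiling the pattern once and reusing the resulting object on several strings. The sample strings are invented for illustration, and the search method used here is covered properly in the next section.

```python
import re

# Compile once; the r prefix keeps backslashes such as \w intact.
emailpat = re.compile(r'[\w.]+@[\w.]+')

# Hypothetical sample data, made up for this example.
lines = [
    'Contact: alice@example.com',
    'no address on this line',
    'bob.smith@mail.example.org wrote back',
]

# Reuse the compiled object by calling search as a method on it.
found = []
for line in lines:
    m = emailpat.search(line)
    if m:                      # search returns None when nothing matches
        found.append(m.group(0))

print(found)
```

The same loop could call re.search with the pattern string each time; compiling first simply moves that processing step outside the loop.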
Another advantage of compiling a regular expression is that, when you compile a
regular expression, you can specify a number of options modifying the way that Python
will treat the regular expression. Each option is defined as a constant in the
re module in two forms: a long form and a single-letter abbreviation. To
specify an option when you compile a regular expression, pass these constants as
a second argument to the compile function. To specify more than one option,
join the options with the bitwise or operator (|). The available options
are summarized in Table 8.3.
Table 8.3: Options for the compile function
Short Name | Long Name | Purpose
I | IGNORECASE | Non-case-sensitive match
M | MULTILINE | Make ^ and $ match beginning and end of lines within the string, not just the beginning and end of the string
S | DOTALL | Allow . to match newline, as well as any other character
X | VERBOSE | Ignore comments and unescaped whitespace
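The following sketch shows one way these options might be passed to compile. The sample text and the variable names are assumptions made for illustration; findall and search are methods of the compiled object, used here only to make the effect of each option visible.

```python
import re

# Hypothetical three-line sample text.
text = 'First line\nsecond LINE\nthird line'

# Combine IGNORECASE and MULTILINE with the bitwise or operator.
pat = re.compile(r'^\w+ line$', re.IGNORECASE | re.MULTILINE)
per_line = pat.findall(text)

# The single-letter abbreviations name the same options.
short = re.compile(r'^\w+ line$', re.I | re.M)

# Without DOTALL, . stops at a newline; with it, . can cross lines.
plain = re.compile(r'First.*third').search(text)
dotall = re.compile(r'First.*third', re.DOTALL).search(text)

# VERBOSE lets the pattern be laid out with whitespace and comments.
emailpat = re.compile(r'''
    [\w.]+    # user part
    @         # literal at sign
    [\w.]+    # domain part
''', re.VERBOSE)
email = emailpat.search('write to alice@example.com today')

print(per_line)
print(plain, dotall.group(0) if dotall else None)
print(email.group(0) if email else None)
```

Here MULTILINE is what allows ^ and $ to anchor at each line of the sample text, and IGNORECASE is what lets the second line, written in capitals, match as well.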
Phil Spector
2003-11-12