next up previous contents
Next: Finding Regular Expression Matches Up: The re module: Regular Previous: Constructing Regular Expressions   Contents


Compiling Regular Expressions

Before Python can actually test a string to see if it contains a regular expression, it must internally process the regular expression through a technique known as compiling. Once a regular expression is compiled, Python can perform the comparison very rapidly. For convenience, most of the functions in the re module will accept an uncompiled regular expression, but keep in mind that if you are repeatedly performing a regular expression search on a series of strings, it will be more efficient to compile the regular expression once, creating a regular expression object, and to invoke the regular expression function as a method on this object. Suppose we wish to search for email addresses in a set of strings. As a simple approximation, we'll assume that email addresses are of the form user@domain. To create a compiled regular expression the following statement could be used:
>>> emailpat = re.compile(r'[\w.]+@[\w.]+')
Note the use of the r modifier to create a raw string - this technique should usually be used when you are constructing a regular expression. In words, we can describe this regular expression as ``one or more alphanumeric characters or periods, followed by an at sign (@), followed by one or more alphanumeric characters or periods. In later sections, we'll see how to use this compiled object to search for the regular expression in strings.

Another advantage of compiling a regular expression is that, when you compile a regular expression, you can specify a number of options modifying the way that Python will treat the regular expression. Each option is defined as a constant in the re module in two forms; a long form and a single letter abbreviation. To specify an option when you compile a regular expression, pass these constants as a second argument to the compile function. To specify more than one option, join the options with the bitwise or operator (|). The available options are summarized in Table 8.3.


Table 8.3: Options for the compile function
Short Name Long Name Purpose
I IGNORECASE Non-case-sensitive match
M MULTILINE Make ^ and $ match beginning and end of lines within
    the string, not just the beginning and end of the string
S DOTALL allow . to match newline, as well as any other character
X VERBOSE ignore comments and unescaped whitespace



next up previous contents
Next: Finding Regular Expression Matches Up: The re module: Regular Previous: Constructing Regular Expressions   Contents
Phil Spector 2003-11-12