Finding Regular Expression Matches


The re module provides three functions to test for the presence of a regular expression in a string. Each of these functions can be called in two slightly different ways, depending on whether the regular expression has already been compiled or not.

The function match looks for a regular expression at the beginning of a string, while the function search looks for a regular expression anywhere in a string. When invoked as methods on a compiled regular expression, both accept two optional arguments, pos and endpos, which specify the starting and ending positions within the string, in case you need to match a regular expression in a substring. Both return a match object (described below) if the regular expression is found, and the special value None if it is not.
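The difference between match and search, and the effect of pos and endpos, can be seen in a short sketch. The pattern, the sample string and the variable names here are illustrative assumptions, not part of the original example set; the group method used to show the matched text belongs to the match object.

```python
import re

pat = re.compile(r'\d+')
text = 'abc 123 def 456'

# match only succeeds if the pattern is found at the starting position.
at_start = pat.match(text)          # None, because the string begins with 'abc'

# search scans forward until it finds the pattern.
anywhere = pat.search(text)         # match object for '123'

# pos moves the starting position; endpos limits how far the scan may go.
from_pos = pat.match(text, 4)       # '123' begins exactly at index 4
windowed = pat.search(text, 7, 13)  # only indices 7 through 12 are examined

print(at_start)
print(anywhere.group(), from_pos.group(), windowed.group())
```

Because endpos cuts the string off at index 13, the last call sees only the first digit of 456.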

These functions act as methods for compiled regular expressions, and as functions when their first argument is a regular expression. For example, suppose we wish to search for email addresses as defined in Section 8.5.3, in a series of strings. After importing the re module, we could compile the regular expression and search a string with the following statements:

>>> emailpat = re.compile(r'[\w.]+@[\w.]+')
>>> str = 'Contact me at myname@mydomain.com'
>>> emailpat.search(str)
<re.MatchObject instance at e95d8>

If we were only going to use the regular expression once, we could call the search function directly:

>>> re.search(r'[\w.]+@[\w.]+',str)
<re.MatchObject instance at e7ac8>
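Since search returns None when nothing is found, its result is commonly tested before being used. The following is a minimal sketch of that idiom; the helper name has_email and the list of sample lines are hypothetical, introduced only for illustration.

```python
import re

emailpat = re.compile(r'[\w.]+@[\w.]+')

def has_email(line):
    """Return True if line contains text matching the email pattern."""
    # search gives back a match object on success and None on failure.
    return emailpat.search(line) is not None

lines = ['Contact me at myname@mydomain.com', 'No address here']

# Keep only the lines in which the pattern was found.
flagged = [line for line in lines if has_email(line)]
print(flagged)
```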

The third function for finding regular expressions in a string is findall. Rather than returning a match object, it returns a list containing the substrings which actually matched the regular expression. Like the other two functions, it can be called as a method or a function, depending on whether the regular expression has already been compiled.

>>> re.findall(r'[\w.]+@[\w.]+',str)
['myname@mydomain.com']
>>> emailpat.findall(str)
['myname@mydomain.com']

One very useful feature of findall is that, as its name implies, it will return every non-overlapping match of the regular expression in the string, not just the first one:

>>> newstr = 'My email addresses are myname@mydomain.org and \
... othername@otherdomain.net'
>>> emailpat.findall(newstr)
['myname@mydomain.org', 'othername@otherdomain.net']
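Two edge cases of findall may be worth seeing: when nothing matches, the result is an empty list rather than None, and matches are collected from left to right without overlapping. The sample strings below are assumptions chosen only to show this behaviour.

```python
import re

emailpat = re.compile(r'[\w.]+@[\w.]+')

# No match at all gives an empty list, so the result can be looped over safely.
none_found = emailpat.findall('no addresses in this string')

# Each match starts where the previous one ended, so the final '5' is left over.
pairs = re.findall(r'\d\d', '12345')

print(none_found)
print(pairs)
```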

Although it is not actually used for matching regular expressions, the re module also provides a split function. It can be used like the split function of the string module (see Section 8.4.2), but it splits a string wherever a regular expression matches rather than at a fixed separator. Like the other functions in the re module, it can be invoked as a method on a compiled regular expression, or called as a normal function:

>>> plmin = re.compile('[+-]')
>>> str = 'who+what-where+when'
>>> plmin.split(str)
['who', 'what', 'where', 'when']
>>> re.split('[+-]',str)
['who', 'what', 'where', 'when']
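As a rough comparison, the sketch below contrasts splitting on a fixed separator with splitting on a regular expression, and shows the optional maxsplit argument that limits the number of splits. It uses the split method of string objects as the fixed-separator case, and the variable names are illustrative assumptions.

```python
import re

text = 'who+what-where+when'

# Splitting on a fixed separator only breaks the string at '+'.
fixed = text.split('+')

# Splitting on a regular expression breaks it at either '+' or '-'.
plmin = re.compile('[+-]')
by_pattern = plmin.split(text)

# maxsplit stops after the given number of splits; the rest is returned whole.
first_only = re.split('[+-]', text, maxsplit=1)

print(fixed)
print(by_pattern)
print(first_only)
```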


Phil Spector 2003-11-12