(?P<name>...), where name is the name you wish to associate with the tagged expression, and ... represents the tagged expression itself. For example, suppose we have employee records containing a name, an office number and a phone extension, which look like these:

    Smith 209 x3121
    Jones 143 x1134
    Williams 225 555-1234

Normally, to tag each element on the line, we'd use regular parentheses:

    recpat = re.compile(r'(\w+) (\d+) (x?[0-9-]+)')

To refer to the three tagged patterns as name, room and phone, we could use the following expression:

    recpat1 = re.compile(r'(?P<name>\w+) (?P<room>\d+) (?P<phone>x?[0-9-]+)')

First, note that using named groups does not override the default behaviour of tagging: the findall function and method still work in the same way, and you can always refer to the tagged groups by number. However, when you use the group method on a match object returned by search or match, you can use the name of the group instead of the number (although the number still works):

    >>> record = 'Jones 143 x1134'
    >>> m = recpat1.search(record)
    >>> m.group('name')
    'Jones'
    >>> m.group('room')
    '143'
    >>> m.group('phone')
    'x1134'

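Both claims in the paragraph above can be checked in a few lines. This is a minimal sketch, assuming the three sample records are stored one per line in a single string (the text only shows them as sample lines): findall returns the same tuples whether or not the groups are named, and a named group can still be fetched by its number.

```python
import re

# The unnamed and named versions of the record pattern from the text.
recpat = re.compile(r'(\w+) (\d+) (x?[0-9-]+)')
recpat1 = re.compile(r'(?P<name>\w+) (?P<room>\d+) (?P<phone>x?[0-9-]+)')

# Assumed layout: one record per line in a single string.
records = "Smith 209 x3121\nJones 143 x1134\nWilliams 225 555-1234"

# findall ignores group names, so both patterns give the same tuples.
plain = recpat.findall(records)
named = recpat1.findall(records)
print(named)
# [('Smith', '209', 'x3121'), ('Jones', '143', 'x1134'), ('Williams', '225', '555-1234')]

# A named group keeps its number, so both lookups return the same text.
m = recpat1.search('Jones 143 x1134')
print(m.group(1), m.group('name'))   # Jones Jones
```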
Now suppose we wish to refer to the tagged groups as part of a substitution pattern. Specifically, we wish to change each record to one with just the room number followed by the name. Using the pattern without named groups, we could do the following:

    >>> recpat.sub(r'\2 \1', record)
    '143 Jones'

With named groups, we can use the syntax \g<name> to refer to the tagged group in substitution text:
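
    >>> recpat1.sub(r'\g<room> \g<name>', record)
    '143 Jones'

The replacement text is written as a raw string, like the pattern, so that Python passes the backslashes through to the re module unchanged instead of treating \g as an (invalid) string escape.
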
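Applied to all of the records at once, the numbered and named replacement forms produce identical output. The sketch below again assumes the records sit one per line in one string; it also uses \g<2>, the numbered variant of the \g<...> syntax, which is useful when a literal digit must follow a group reference directly.

```python
import re

recpat = re.compile(r'(\w+) (\d+) (x?[0-9-]+)')
recpat1 = re.compile(r'(?P<name>\w+) (?P<room>\d+) (?P<phone>x?[0-9-]+)')

# Assumed layout: one record per line in a single string.
records = "Smith 209 x3121\nJones 143 x1134\nWilliams 225 555-1234"

# Three equivalent ways to emit "room name" for every record.
by_number = recpat.sub(r'\2 \1', records)
by_name = recpat1.sub(r'\g<room> \g<name>', records)
by_g_number = recpat1.sub(r'\g<2> \g<1>', records)

print(by_name)
# 209 Smith
# 143 Jones
# 225 Williams
```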
To refer to a tagged group within a regular expression, the notation (?P=name) can be used. Suppose we're trying to detect duplicate words appearing next to each other on the same line. Without named groups, we could do the following:

    >>> line = 'we need to to find the repeated words'
    >>> re.findall(r'(\w+) \1', line)
    ['to']

Using named groups, we can make the regular expression a little more readable:
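
    >>> re.findall(r'(?P<word>\w+) (?P=word)', line)
    ['to']

Notice that the parentheses in (?P=name) do not create a new group: the construct is a backreference to the group already named, so the pattern above still contains only one group, and findall returns one string per match.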
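The remark about (?P=name) can be confirmed on the compiled pattern itself: its groups attribute counts the capturing groups, and groupindex maps each group name to its number. This sketch also uses finditer to report where each repeated word starts in the sample line.

```python
import re

line = 'we need to to find the repeated words'
dup = re.compile(r'(?P<word>\w+) (?P=word)')

# (?P=word) is a backreference, not a new group: only one group exists.
print(dup.groups)              # 1
print(dict(dup.groupindex))    # {'word': 1}

# Report each duplicated word together with its starting offset.
for m in dup.finditer(line):
    print(m.group('word'), m.start())   # to 8
```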