>>> txt = "I love dogs. My dog's name is Fido"
>>> re.sub('dog','cat',txt)
"I love cats. My cat's name is Fido"

Like the other functions in the module, sub can also be called as a method of a compiled regular expression. In that case the pattern argument is omitted:
>>> ssnpat = re.compile(r'\d\d\d-\d\d-\d\d\d\d')
>>> txt = 'Jones, Smith Room 419 SSN: 031-24-9918'
>>> ssnpat.sub('xxx-xx-xxxx',txt)
'Jones, Smith Room 419 SSN: xxx-xx-xxxx'

To apply any of the flags in Table 8.5.3 during a substitution, you can either compile the regular expression with those flags and call the sub method of the compiled object, or (in Python 2.7 and later) pass them to the re.sub function through its flags keyword argument.
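As a small illustration, the sketch below performs the same case-insensitive substitution both ways. The sample string is made up for the example. Pass the flag by keyword: a flag given as the fourth positional argument of re.sub is interpreted as count, not as flags.

```python
import re

txt = 'Dogs and dogs and DOGS'

# Option 1: give the flag to the module-level function by keyword
ci_result = re.sub('dogs', 'cats', txt, flags=re.IGNORECASE)
# -> 'cats and cats and cats'

# Option 2: compile the pattern with the flag, then call its sub method
dogpat = re.compile('dogs', re.IGNORECASE)
compiled_result = dogpat.sub('cats', txt)
# -> 'cats and cats and cats'
```

The replacement text is inserted literally, so every match becomes lowercase 'cats' regardless of how the original word was capitalized.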
By default, sub replaces every non-overlapping match of the regular expression. The optional count argument limits the number of substitutions performed; its default value of 0 means that all matches are replaced. If you need to know how many substitutions took place, use subn, which is available both as a function in the re module and as a method of compiled regular expressions. It accepts the same arguments as sub, but returns a tuple containing the new string and the number of substitutions that were made.
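The following sketch, using an invented sample string, shows count limiting the replacements and subn reporting how many were made, in both the function and the method form:

```python
import re

txt = 'one fish two fish red fish blue fish'

# count=2 stops after the first two (leftmost) matches
limited = re.sub('fish', 'bird', txt, count=2)
# -> 'one bird two bird red fish blue fish'

# subn returns a (new_string, number_of_substitutions) tuple
new_txt, nsubs = re.subn('fish', 'bird', txt)
# -> new_txt == 'one bird two bird red bird blue bird', nsubs == 4

# the method form on a compiled pattern takes the same count argument
fishpat = re.compile('fish')
method_result = fishpat.subn('bird', txt, count=1)
# -> ('one bird two fish red fish blue fish', 1)
```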
When the regular expression contains tagged patterns (parts of the pattern enclosed in parentheses), the text that each of them matched can be used in the replacement string passed to sub. You refer to a tagged pattern by preceding its number with a backslash: the first tagged pattern is referred to as \1, the second as \2, and so on. Since the backslash is also Python's string escape character, the replacement is best written as a raw string. Thus, to reverse the order of pairs of words and numbers in a string, we could call sub like this:
>>> txt = 'dog 13 cat 9 chicken 12 horse 8'
>>> re.sub(r'(\w+) (\d+)',r'\2 \1',txt)
'13 dog 9 cat 12 chicken 8 horse'
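As a further sketch of the same technique, the block below uses three tagged patterns to rearrange dates written as year-month-day into day/month/year. The sample text is hypothetical. It also shows the \g<n> spelling of a group reference, which the re module accepts as an equivalent of \n. It is needed when a literal digit follows the reference, because r'\10' would be read as a reference to group 10 rather than group 1 followed by a zero.

```python
import re

txt = 'Start 2021-03-15, end 2021-11-02'

# \1 = year, \2 = month, \3 = day
datepat = re.compile(r'(\d{4})-(\d{2})-(\d{2})')
swapped = datepat.sub(r'\3/\2/\1', txt)
# -> 'Start 15/03/2021, end 02/11/2021'

# \g<1> followed by a literal 0: appends a zero to every number
tens = re.sub(r'(\d+)', r'\g<1>0', 'x 5 y 12')
# -> 'x 50 y 120'
```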
For more complex substitutions, a function can be passed to sub or subn in place of a replacement string. Each time a match is found, Python calls this function with the corresponding match object and uses the function's return value as the replacement text. Consider the task of converting the decimal numbers in a string of text to their hexadecimal equivalents. With Python's string formatting features, the conversion itself is easy to do using the x format code. For example:
>>> x = 12
>>> '%02x' % x
'0c'

To make this conversion part of a regular expression substitution, we can write a function that extracts the matched text from a match object and returns its hexadecimal equivalent, and then pass this function to sub in place of a replacement string:
>>> txt = 'Group A: 19 23 107 95 Group B: 32 41 213 29'
>>> def tohex(m):
...     return '%02x' % int(m.group())
...
>>> re.sub(r'\d+',tohex,txt)
'Group A: 13 17 6b 5f Group B: 20 29 d5 1d'
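Because the replacement function receives the whole match object, it can also inspect individual tagged patterns and decide what to return. The sketch below fills in placeholders written as {name} from a dictionary and leaves unknown placeholders unchanged. The names prices and fill_price are hypothetical, chosen only for this example.

```python
import re

# Hypothetical lookup table for the example
prices = {'apple': '0.50', 'pear': '0.75'}

def fill_price(m):
    # m.group(1) is the text captured inside the braces
    name = m.group(1)
    # unknown names fall back to the full matched text, m.group(0)
    return prices.get(name, m.group(0))

txt = 'apple costs {apple}, pear costs {pear}, kiwi costs {kiwi}'
filled = re.sub(r'\{(\w+)\}', fill_price, txt)
# -> 'apple costs 0.50, pear costs 0.75, kiwi costs {kiwi}'
```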
For a simple function like this one, it may be more convenient to define an anonymous function with a lambda expression (Section 7.6):
>>> re.sub(r'\d+',lambda m: '%02x' % int(m.group()),txt)
'Group A: 13 17 6b 5f Group B: 20 29 d5 1d'
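A function can be passed to subn in exactly the same way. The sketch below repeats the conversion with subn, so the number of replacements is reported as well. As an alternative to the % operator, it uses the built-in format function with the same '02x' specification.

```python
import re

txt = 'Group A: 19 23 107 95 Group B: 32 41 213 29'

# subn accepts a function in place of the replacement string, just like sub
hextxt, n = re.subn(r'\d+', lambda m: format(int(m.group()), '02x'), txt)
# -> hextxt == 'Group A: 13 17 6b 5f Group B: 20 29 d5 1d', n == 8
```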