

Functions in the string module

As mentioned in Section 2.4.4, as of version 2.0, much of the functionality once provided by the string module is available through string methods. If you are using an older version of Python, or if you inherit code written for an older version, you may have to use or understand the functions described in this section. These functions will likely remain supported for a while, but they will not be supported forever, so it is best to switch to string methods at your earliest convenience.

Two of the most useful functions in the string module are split and join. The split function takes a string and returns a list of the words in the string. By default, a word is a group of non-whitespace characters, and words are separated by one or more whitespace characters. An optional second argument (named sep) provides a character or string of characters to use as the separator instead. Note that, when you specify a second argument, every occurrence of the separator marks a boundary between words; in particular, consecutive occurrences of the separator generate empty strings in the output. This can be illustrated by the following simple example:
>>> import string
>>> text = 'one two   three four    five'
>>> string.split(text)
['one', 'two', 'three', 'four', 'five']
>>> string.split(text, ' ')
['one', 'two', '', '', 'three', 'four', '', '', '', 'five']
In the first (default) case, any number of blanks serves as a separator, whereas when a blank is provided as the separator character, fields separated by multiple blanks produce empty strings in the output list.
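
Since string methods are the recommended replacement for these functions, the same two calls can be sketched with the split method. This is a minimal illustration that assumes Python 2.0 or later; the variable names are chosen here for the example only.

```python
# Method-based equivalents of string.split (assumes Python 2.0 or later).
text = 'one two   three four    five'

# No separator: any run of whitespace counts as a single separator.
words = text.split()

# Explicit separator: every single blank marks a field boundary,
# so consecutive blanks produce empty strings.
fields = text.split(' ')

print(words)
print(fields)
```

The two calls differ only in whether a separator is passed, which is what decides whether empty strings can appear in the result.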

Finally, an optional third argument (maxsplit) limits the number of times split will break apart its input string. If the input string contains more separators than maxsplit allows, the unsplit remainder of the string is returned as the final element of the list. For example, to split a string into a list with the first word as the first element and the remainder of the string as the second element, it suffices to call split with maxsplit set to 1:

>>> who = 'we are the knights who say ni'
>>> string.split(who)
['we', 'are', 'the', 'knights', 'who', 'say', 'ni']
>>> string.split(who,' ',1)
['we', 'are the knights who say ni']
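
The same idea can be sketched with the split method, which takes the separator and maxsplit in the same order. The example below is an illustration under the assumption of Python 2.0 or later; it also shows passing None as the separator, which keeps the default whitespace behavior while still limiting the number of splits.

```python
who = 'we are the knights who say ni'

# Split once on a blank: first word, then everything else.
first, rest = who.split(' ', 1)

# None as the separator means "any run of whitespace",
# so extra blanks after the first word are skipped.
spaced = 'we    are the knights'
parts = spaced.split(None, 1)

print(first)
print(rest)
print(parts)
```

Unpacking into two names works here because a maxsplit of 1 yields exactly two elements whenever the separator occurs at least once.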

The function join provides the opposite functionality of split. It accepts a sequence of strings, and joins them together, returning a single string. By default, a blank is inserted between each of the original strings; the optional named argument sep allows you to provide an alternative string to be used as a separator. As a simple example of the join function, consider producing comma-separated data suitable for input to a spreadsheet program.

>>> import string
>>> values = [120.45, 200.30, 150.60, 199.95, 260.50]
>>> print string.join(map(str,values),',')
120.45,200.3,150.6,199.95,260.5
Since the first argument to join must be a sequence of strings, the map function was used to convert each element of the values list to a string.
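
With string methods, the roles are reversed: join is a method of the separator string and takes the sequence as its argument. The following is a small sketch of the comma-separated example in that style, assuming Python 2.0 or later.

```python
values = [120.45, 200.30, 150.60, 199.95, 260.50]

# join is called on the separator; the elements must already be strings,
# so each number is converted with str first.
line = ','.join(map(str, values))

# Splitting on the same separator recovers the individual strings.
fields = line.split(',')

print(line)
print(fields)
```

Splitting the result on the same separator gives back the list of strings, which illustrates why join and split are described as opposites.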

Three functions are provided in the string module for removing whitespace from strings: lstrip, rstrip and strip, which remove leading, trailing, and both leading and trailing whitespace from a string, respectively. Each of the functions accepts a string and returns the stripped string.
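
As a brief illustration, the three stripping operations can be compared on one padded string. The sketch uses the equivalent string methods (assuming Python 2.0 or later), and the sample string is made up for the example.

```python
padded = '   hello world \n'

left = padded.lstrip()    # leading whitespace removed
right = padded.rstrip()   # trailing whitespace (including the newline) removed
both = padded.strip()     # whitespace removed from both ends

# repr makes the remaining whitespace visible.
print(repr(left))
print(repr(right))
print(repr(both))
```

Whitespace inside the string is never touched; only the ends are affected.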

A variety of functions dealing with capitalization are contained in the string module. The capitalize function returns its input string with the first letter capitalized and the remaining letters converted to lowercase. The capwords function capitalizes each word in a string in the same way, replaces multiple blanks between words with a single blank, and strips leading and trailing whitespace. The swapcase function accepts a string and returns a string with the case of each character in the original string reversed (uppercase becomes lowercase and vice versa). The upper function returns a string with all the characters of its input string converted to uppercase; the lower function converts all characters to lowercase.
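
The capitalization functions can be compared side by side on a single sample string. The sketch below uses the string methods where they exist and string.capwords for the one operation that remains a module function; the sample text is invented for the example.

```python
import string

name = 'monty pYTHON'

cap = name.capitalize()                         # first letter upper, rest lower
words = string.capwords('  monty   pYTHON  ')   # per-word capitalize, blanks collapsed
swapped = 'Monty Python'.swapcase()             # case of every letter reversed
shout = name.upper()
quiet = name.lower()

print(cap)
print(words)
print(swapped)
print(shout)
print(quiet)
```

Note that capwords has no method counterpart, so it is still called through the module.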


Phil Spector 2003-11-12