

Functions and Methods for Character Strings

The core language provides only one function that is useful for working with strings: the len function, which returns the number of characters in a string. In versions of Python earlier than 2.0, tools for working with strings were provided by the string module (Section 8.4). Starting with version 2.0, strings in Python became ``true'' objects, and a variety of methods were introduced to operate on them. If you find that the string methods described in this section are not available with your version of Python, refer to Section 8.4 for equivalent capabilities through the string module. (Note that on some systems, a newer version of Python may be available through the name python2.)

Since strings are the first true objects we've encountered, a brief description of methods is in order. As mentioned earlier (Section 1.4.3), functions that belong to objects are known as methods. Besides the terminology, methods are invoked slightly differently from functions. When you call a function like len, you pass the arguments in a comma-separated list surrounded by parentheses after the function name. When you invoke a method, you provide the name of the object the method is to act upon, followed by a period, followed by the method name and the parenthesized list of additional arguments. Remember to provide empty parentheses if the method does not take any arguments, so that Python can distinguish a method call with no arguments from a reference to a variable stored within the object.
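As a small illustration of the two calling styles, the following sketch calls the len function and then invokes two methods on the same string. The variable names are made up for this example, and the code is written so that it should run unchanged under both older and newer versions of Python.

```python
greeting = 'hello, world'

# Function call: the string is passed as an argument.
length = len(greeting)             # 12

# Method calls: object, period, method name, parenthesized arguments.
shouted = greeting.upper()         # 'HELLO, WORLD'
commas = greeting.count(',')       # 1

# Without the parentheses nothing is called; the expression simply
# refers to the method stored within the string object.
method_reference = greeting.upper
print(callable(method_reference))  # True
```

Calling method_reference() later has the same effect as calling greeting.upper() directly.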

Strings in Python are immutable objects; this means that you can't change the value of a string in place. If you do want to change the value of a string, you need to invoke a method on the variable containing the string, and assign the value returned by that method back to the variable in question, as some of the examples below will show.
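To make the idea of immutability concrete, here is a minimal sketch (the variable names are chosen only for illustration). It shows that assigning to a position within a string fails, and that a method returns a new string rather than modifying the original.

```python
word = 'spam'

# Strings do not support item assignment, so this raises TypeError.
try:
    word[0] = 'S'
    changed_in_place = True
except TypeError:
    changed_in_place = False

# A method returns a new string and leaves the original untouched.
result = word.capitalize()
print(word)      # spam
print(result)    # Spam

# Assigning the result back makes the variable refer to the new string.
word = word.capitalize()
print(word)      # Spam
```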


Table 2.2: String Methods

Split and Join
Name        Purpose                                                                Arguments
join        Insert a string between each element of a sequence                     sequence
split       Create a list from ``words'' in a string                               sep (optional), maxsplit (optional)
splitlines  Create a list from lines in a string                                   keepends (optional)

Methods for Searching
Note: Each of these methods accepts optional arguments start and end, which limit the range of the search.
Name        Purpose                                                                Arguments
count       Count the number of occurrences of substring                           substring
find        Return the lowest index where substring is found, or -1 if not found   substring
index       Like find, but raises ValueError if not found                          substring
rfind       Return the highest index where substring is found, or -1 if not found  substring
rindex      Like rfind, but raises ValueError if not found                         substring

Methods for Justification
Name        Purpose                                                                Arguments
center      Centers a string in a given width                                      width
ljust       Left justifies a string                                                width
lstrip      Removes leading whitespace
rjust       Right justifies a string                                               width
rstrip      Removes trailing whitespace
strip       Removes leading and trailing whitespace

Methods for Case (upper/lower)
Name        Purpose                                                                Arguments
capitalize  Capitalize the first letter of the string
lower       Make all characters lower case
swapcase    Change upper to lower and lower to upper
title       Capitalize the first letter of each word in the string
upper       Make all characters upper case
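
Several of the methods in the table are not demonstrated elsewhere in this section. The sketch below tries a few of the justification and case methods, along with splitlines, on made-up strings; the width of 19 is an arbitrary choice for the example.

```python
title = 'life of brian'    # 13 characters long

# Justification: pad the string with spaces to a total width of 19.
centered = title.center(19)    # '   life of brian   '
left = title.ljust(19)         # 'life of brian      '
right = title.rjust(19)        # '      life of brian'

# strip removes the padding again; lstrip and rstrip work on one side only.
print(centered.strip() == title)    # True

# Case methods.
print(title.capitalize())           # Life of brian
print(title.title())                # Life Of Brian
print(title.title().swapcase())     # lIFE oF bRIAN

# splitlines breaks a string at its line boundaries.
verse = 'first line\nsecond line\n'
lines = verse.splitlines()
print(lines)                        # ['first line', 'second line']
```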


Many of the string methods provided by Python are listed in Table 2.2. Among the most useful are the methods split and join. The split method operates on a string and returns a list, each of whose elements is a word in the original string, where a word is defined by default as a group of non-whitespace characters, separated from other words by one or more whitespace characters. If you provide an optional argument to the split method, it is used as the separator instead of whitespace. Note the subtle difference between invoking split with no arguments and invoking it with an argument consisting of a single blank space:

>>> phrase = 'This parrot  is dead'
>>> phrase.split()
['This', 'parrot', 'is', 'dead']
>>> phrase.split(' ')
['This', 'parrot', '', 'is', 'dead']
When more than one space is encountered in the string, the default behavior treats the run of spaces as a single separator, but when we explicitly set the separator to a single space, each space acts as a separator on its own, so consecutive spaces result in empty strings in the returned list. You can also obtain the default behavior for split by specifying None for the sep argument.
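
The same rule applies to any explicit separator, not just a blank space. As a brief sketch, the colon-separated record below is invented for illustration; the example also confirms that passing None behaves like calling split with no arguments.

```python
# A made-up record with fields separated by colons.
record = 'root:x:0:0:root:/root:/bin/sh'
fields = record.split(':')
print(fields[0])     # root
print(fields[-1])    # /bin/sh

# With an explicit separator, adjacent separators yield empty strings.
print('a,,b'.split(','))    # ['a', '', 'b']

# Passing None as the separator gives the default whitespace behavior.
spaced = 'This parrot  is dead'
print(spaced.split(None) == spaced.split())    # True
```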

The maxsplit argument to the split method limits the number of splits that are performed, so the returned list will have at most maxsplit + 1 elements. This can be very useful when you only need to split part of a string, since the remaining pieces will be put into a single element of the list which is returned. For example, suppose you had a file containing definitions of words, with the word being the first string and the definition consisting of the remainder of the line. By setting maxsplit to 1, the word would become the first element of the returned list, and the definition would become the second element of the list, as the following example shows:

>>> line = 'Ni a sound that a knight makes'
>>> line.split(maxsplit=1)
['Ni', 'a sound that a knight makes']
In some versions of Python, the split method will not accept a named argument for maxsplit, and raises a TypeError instead. In that case, you need to pass the arguments by position, explicitly specifying the separator and using None to obtain the default behavior:
>>> line.split(None,1)
['Ni', 'a sound that a knight makes']
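
Building on the definitions example, here is a minimal sketch of how a set of such lines might be turned into a dictionary. The list of lines stands in for the contents of a file, and both the lines and the variable names are invented for illustration.

```python
# Stand-in for lines read from a file of definitions (made-up data).
lines = [
    'Ni a sound that a knight makes',
    'shrubbery a small ornamental bush',
    'herring  a fish unsuited to felling trees',
]

definitions = {}
for line in lines:
    # Positional arguments are used because some versions of Python
    # do not accept maxsplit as a named argument.  A line containing
    # only one word would make this unpacking fail.
    word, meaning = line.split(None, 1)
    definitions[word] = meaning

print(definitions['Ni'])         # a sound that a knight makes
print(definitions['herring'])    # a fish unsuited to felling trees
```

Because the default separator is used, the extra space after the word in the last line does not end up in the definition.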

When using the join method for strings, remember that the method operates on the string which will be used between each element of the joined list, not on the list itself. This may result in some unusual-looking statements:

>>> words = ['spam','spam','bacon','spam']
>>> ' '.join(words)
'spam spam bacon spam'
Of course, you could assign the value of ' ' to a variable to improve the appearance of such a statement.
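
A short sketch of that suggestion, along with two related points: joining on the empty string simply concatenates the elements, and every element passed to join must itself be a string. The variable names here are chosen only for illustration.

```python
words = ['spam', 'spam', 'bacon', 'spam']

# Naming the separator makes the call easier to read.
separator = ', '
menu = separator.join(words)
print(menu)       # spam, spam, bacon, spam

# Joining on the empty string concatenates the elements.
letters = ''.join(['N', 'i', '!'])
print(letters)    # Ni!

# Non-string elements must be converted first, or join raises TypeError.
counts = [3, 1]
count_text = ' and '.join([str(n) for n in counts])
print(count_text)    # 3 and 1

# For a fixed separator, split reverses the effect of join.
print(menu.split(separator) == words)    # True
```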

The index and find methods can be useful when trying to extract substrings, although techniques using the re module (Section 8.5) will generally be more powerful. As an example of the use of these methods, suppose we have a string with a parenthesized substring, and we wish to extract just that substring. Using the slicing techniques explained in Section 2.4.3, and locating the substring using, for example, index and rindex, here's one way to solve the problem:

>>> model = 'Turbo Accelerated Widget (MMX-42b) Press'
>>> try:
...     model[model.index('(') + 1 : model.rindex(')')] 
... except ValueError:
...     print('No parentheses found')
... 
'MMX-42b'
When you use these methods, make sure to handle the case where the substring is not found: either catch the ValueError raised by index and rindex, or test for the value -1 returned by find and rfind.
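
For comparison, the same extraction can be sketched with find and rfind. The helper name extract_parenthesized is hypothetical, not part of Python. The explicit test matters because -1 is itself a valid index, so slicing with an unchecked result would silently return the wrong substring instead of reporting a problem.

```python
def extract_parenthesized(text):
    """Return the text between the first '(' and the last ')', or None.

    Hypothetical helper written for this example.
    """
    start = text.find('(')
    end = text.rfind(')')
    # find and rfind signal failure with -1 rather than an exception.
    if start == -1 or end == -1 or end < start:
        return None
    return text[start + 1:end]

model = 'Turbo Accelerated Widget (MMX-42b) Press'
print(extract_parenthesized(model))             # MMX-42b
print(extract_parenthesized('No parentheses'))  # None

# Without the check, the failed search goes unnoticed:
plain = 'No parentheses'
print(plain[plain.find('(') + 1:plain.rfind(')')])   # No parenthese
```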

Remember that the string methods will not change the value of the string they are acting on, but you can achieve the same effect by overwriting the string with the returned value of the method. For example, to replace a string with an equivalent version consisting of all upper-case characters, statements like the following could be used:

>>> language = 'python'
>>> language = language.upper()
>>> language
'PYTHON'

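Finally, Python offers a variety of so-called predicate methods, which take no arguments and return a true value if the string is not empty and all of its characters are of a particular type, and a false value otherwise (True and False in recent versions of Python, 1 and 0 in older ones). These methods, whose use should be obvious from their names, include isalnum, isalpha, isdigit, islower, isspace, istitle, and isupper. Note that islower and isupper examine only the letters in a string, ignoring characters that have no case.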
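
A few of the predicate methods applied to made-up strings are sketched below. The results in the comments are what recent versions of Python print; older versions would show 1 and 0 in place of True and False.

```python
all_digits = '42'.isdigit()          # True
with_point = '4.2'.isdigit()         # False: '.' is not a digit
letters_only = 'Ni'.isalpha()        # True
punctuated = 'Ni!'.isalnum()         # False: '!' is neither letter nor digit
blank = '   \t'.isspace()            # True
shouting = 'SPAM'.isupper()          # True
mixed = 'SPAM 42'.isupper()          # True: characters without case are ignored
titled = 'Life Of Brian'.istitle()   # True
empty = ''.isdigit()                 # False: an empty string never qualifies

print(all_digits)
print(with_point)
print(empty)
```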

Related modules: string, re, StringIO.

Related exceptions: TypeError, IndexError, ValueError.


Phil Spector 2003-11-12