Next: Expansion of Filename wildcards Up: Using Modules Previous: Substitutions   Contents


Operating System Services: os and shutil modules

The os module provides a dictionary named environ whose keys are the names of all the currently defined environment variables, and whose values are the values of those variables. Besides reading environment variables, you can assign to elements of the environ dictionary to change them; note that these changes only affect operating system commands subsequently run from your current Python session, and are discarded once you exit that session.
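As a minimal sketch of reading and changing environ, the following assumes a modern Python 3 interpreter. The variable names DEMO_GREETING and DEMO_NOT_DEFINED are made up for illustration, and the subprocess module (not covered in this section) is used only so that the child process's view of the variable can be captured.

```python
import os
import subprocess
import sys

# DEMO_GREETING is a hypothetical variable name used only for this sketch.
os.environ["DEMO_GREETING"] = "hello"

# Reading works like any dictionary lookup; get() supplies a default
# instead of raising KeyError when a variable is not defined.
greeting = os.environ["DEMO_GREETING"]
missing = os.environ.get("DEMO_NOT_DEFINED", "(not set)")

# A command started from this session inherits the modified environment.
child_output = subprocess.check_output(
    [sys.executable, "-c", "import os; print(os.environ['DEMO_GREETING'])"]
).decode().strip()

print(greeting, missing, child_output)
```

The change is visible to the child process, but the shell that launched Python never sees it.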

Since one use of Python is as a replacement for shell scripts, it's natural that there should be facilities to perform the sorts of tasks that would usually be done with a file manager or by typing into a command shell. One basic function, provided by the os module, is system, which accepts a single string argument and executes the string as a command through the operating system's shell. Although it is often tempting to use system for a wide variety of common tasks, note that each call to system spawns a new command shell on your computer, so in many cases it will be a very inefficient way to perform a task. In addition, errors in the execution of a command passed to system will not automatically raise an exception, so if you need to verify that such a command executed properly, you should check the command's return code, which is passed back into the Python environment as the value returned by the system function.
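A small sketch of checking the value returned by system follows, assuming Python 3 and a shell that understands echo and exit. The commands themselves are chosen only for illustration.

```python
import os

# A command that succeeds: system returns 0.
ok_status = os.system("echo hello from the shell")

# A command that fails: "exit 3" makes the spawned shell end with code 3.
# No exception is raised; the failure shows up only in the return value.
# The exact nonzero value is platform-dependent (on Unix it is an
# encoded wait status rather than the raw exit code).
bad_status = os.system("exit 3")

if ok_status == 0:
    print("first command succeeded")
if bad_status != 0:
    print("second command failed")
```

Comparing against zero is the portable test; interpreting the specific nonzero value requires platform-specific handling.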

The shutil module provides functions to perform many common file manipulations without the need for spawning a new process. As always, the first step in using the functions in this module is to import the module using the import statement as discussed in Section 8.2. Some of the functions contained in the shutil module are summarized in Table 8.4.


Table 8.4: Selected functions in the shutil module

Function   Purpose                       Arguments
--------   ---------------------------   ------------------------------------
copyfile   Makes a copy of a file        src - source file
                                         dest - destination file
copy       Copies files                  src - source file
                                         dest - destination file or directory
copytree   Copies an entire directory    src - source directory
                                         dest - destination directory
rmtree     Removes an entire directory   path - path of directory
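The four functions in Table 8.4 can be tried out safely inside a scratch directory. The sketch below assumes Python 3; the tempfile module and all of the file and directory names are introduced only for this illustration.

```python
import os
import shutil
import tempfile

# Work inside a throwaway directory so nothing outside it is touched.
work = tempfile.mkdtemp()

# Create a small source file to copy (the names here are made up).
src = os.path.join(work, "original.txt")
with open(src, "w") as f:
    f.write("some data\n")

# copyfile: the destination must be a file name.
shutil.copyfile(src, os.path.join(work, "duplicate.txt"))

# copy: the destination may be a directory; the file keeps its base name.
subdir = os.path.join(work, "sub")
os.mkdir(subdir)
shutil.copy(src, subdir)

# copytree: by default the destination directory must not already exist.
tree_copy = os.path.join(work, "sub_copy")
shutil.copytree(subdir, tree_copy)

copied = sorted(os.listdir(work))
nested = os.listdir(tree_copy)

# rmtree: removes the directory and everything below it.
shutil.rmtree(work)
```

Because rmtree deletes an entire tree without asking, it is worth double-checking the path passed to it.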


When you pass a filename that is not an absolute path (for example, one that does not begin with a / on Unix) to any of the functions in the os or shutil module, the name is resolved relative to the current working directory. To retrieve the name of the current working directory, call the getcwd function of the os module with no arguments; to change the current directory, call the chdir function of the os module with a single string argument naming the directory to use as the current working directory. In particular, note that calling the operating system's cd (Unix) or chdir (Windows) commands through the system function mentioned above will not work, since the change will only take place in the shell which is spawned to execute the command, not in the current process.
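The behavior described above can be checked with a short sketch, assuming Python 3. The scratch directory comes from tempfile, the file name note.txt is made up, and os.path.realpath is used only so that two spellings of the same directory compare equal.

```python
import os
import tempfile

start = os.getcwd()            # remember where we began
target = tempfile.mkdtemp()    # a scratch directory to move into

os.chdir(target)
# realpath resolves symbolic links so the two names compare reliably.
moved = os.path.realpath(os.getcwd()) == os.path.realpath(target)

# A relative name is now resolved against the new working directory.
with open("note.txt", "w") as f:
    f.write("hi\n")
found = os.path.exists(os.path.join(target, "note.txt"))

# Running cd through system only affects the short-lived shell,
# so the working directory of this process is unchanged.
os.system("cd ..")
unchanged = os.path.realpath(os.getcwd()) == os.path.realpath(target)

# Clean up and restore the original directory.
os.remove("note.txt")
os.chdir(start)
os.rmdir(target)
```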

The listdir function of the os module accepts a single argument consisting of a directory path, and returns a list containing the names of all files and directories within that directory (except for the special entries "." and ".."). The names are returned in arbitrary order.
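Since listdir makes no promise about ordering, a common pattern is to sort its result before use. A minimal sketch, assuming Python 3 and a scratch directory whose entries are made up for the example:

```python
import os
import shutil
import tempfile

# Build a scratch directory holding two files and one subdirectory.
scratch = tempfile.mkdtemp()
for name in ("b.txt", "a.txt"):
    open(os.path.join(scratch, name), "w").close()
os.mkdir(os.path.join(scratch, "subdir"))

names = os.listdir(scratch)    # bare names, in no guaranteed order
ordered = sorted(names)        # sort when a stable order matters

shutil.rmtree(scratch)         # remove the scratch directory
print(ordered)
```

Note that files and subdirectories appear together in the list, and that neither "." nor ".." is included.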

Contained within the os module is the path module, providing a number of functions for working with filenames and directories. While you can import os.path directly, the module is automatically imported when you import the os module; simply precede the names of the functions in the module with the identifier os.path. Some of the more useful functions in this module are summarized in Table 8.5; each accepts a single argument.


Table 8.5: Functions in the os.path module

Function     Purpose                                Returns
----------   ------------------------------------   -----------------------------------------------
abspath      Resolves a filename relative to the    absolute pathname
             current working directory
basename     Returns the basename of a path         basename
dirname      Returns the directory name of a path   directory name
exists       Tests for existence of a path          True if the path exists, False otherwise
expanduser   Expands "tilde" (~) paths              expanded path (or original path if no tilde)
expandvars   Expands shell variables                expanded version of input
getsize      Returns the size of a file             size of file in bytes
isfile       Tests for regular files                True if path is a regular file, False otherwise
isdir        Tests for directories                  True if path is a directory, False otherwise
islink       Tests for links                        True if path is a link, False otherwise
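Several of the functions in Table 8.5 are exercised in the sketch below, assuming Python 3. The path data/reports/summary.txt and the variable DEMO_DIR are hypothetical; the path does not need to exist for the purely textual functions to work.

```python
import os

# A made-up relative path; it need not exist on disk.
path = os.path.join("data", "reports", "summary.txt")

base = os.path.basename(path)        # 'summary.txt'
parent = os.path.dirname(path)       # the directory portion
absolute = os.path.abspath(path)     # resolved against os.getcwd()

# The current directory certainly exists and is a directory, not a file.
here_exists = os.path.exists(os.curdir)
here_is_dir = os.path.isdir(os.curdir)
here_is_file = os.path.isfile(os.curdir)

# expanduser replaces a leading ~ with the user's home directory.
home = os.path.expanduser("~")

# expandvars substitutes environment variables; DEMO_DIR is hypothetical.
os.environ["DEMO_DIR"] = "projects"
expanded = os.path.expandvars(os.path.join("$DEMO_DIR", "notes"))

print(base, parent, absolute, home, expanded)
```

In current versions of Python the test functions return the values True and False, which behave like 1 and 0 in a conditional.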


It should be mentioned that the names returned by listdir are not fully qualified; that is, only the last component of each pathname is returned. The other functions in the os module resolve such a bare name relative to the current working directory, so they will not find the file unless it happens to be located there. There are two ways to ensure that these filenames are resolved correctly. The first involves calling chdir to make the directory of interest the current working directory; then the filenames returned by listdir will be correctly resolved since the files will be found in the current directory. The second approach, illustrated in the following function, involves prepending the directory name to the filenames returned by listdir.

Consider a function to add up the sizes of all the files in a given directory. The isdir and islink functions can be used to make sure that directories and links are not included in the total. One possible implementation of this function is as follows:

import os

def sumfiles(dir):
    # listdir returns bare names, not full pathnames
    files = os.listdir(dir)

    total = 0
    for f in files:
        # prepend the directory so the name resolves correctly
        fullname = os.path.join(dir,f)
        # skip directories and links; count everything else
        if not os.path.isdir(fullname) and not os.path.islink(fullname):
            total = total + os.path.getsize(fullname)

    return total

Notice that the join function of the os.path module was used to create the full pathname; this ensures that the appropriate separator character for the operating system is used when combining the directory and filename.

Operations like the ones carried out by sumfiles are good candidates for the functional programming techniques described in Section 7.6, although the result may not be as easy to read as the previous function. Here's another implementation of this function using those techniques:

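import os
from functools import reduce   # needed in Python 3; reduce is a builtin in Python 2

def sumfiles1(dir):
    files = os.listdir(dir)
    # pair the directory with each name to build full pathnames
    files = map(os.path.join,[dir] * len(files),files)
    # keep only entries that are neither directories nor links
    files = filter(lambda x:not os.path.isdir(x) and \
            not os.path.islink(x),files)
    sizes = map(os.path.getsize,files)
    return reduce(lambda x,y:x + y,sizes,0)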

In the previous example, only files in the specified directory were considered, and subdirectories were ignored. If the goal is to recursively search a directory and all its subdirectories, one approach would be to write a function to search a single directory, and to call it recursively each time another directory is found. However, the walk function of the os.path module automates this process for you. (Note that os.path.walk exists only in Python 2; it was removed in Python 3, where the os.walk function fills the same role.) The walk function accepts three arguments: the starting path, a user-written function which will be called each time a directory is encountered, and a third argument allowing additional information to be passed to the user-written function. The user-written function is itself passed three arguments each time it is called: the first is the third argument which was passed to walk, the second is the name of the directory which was encountered, and the third is a list of the names in that directory (as returned by listdir). To extend the previous example to total up the file sizes for all the files in a directory and recursively through all subdirectories, we could create a function like the following:

def sumit(arg,dir,files):
    # build full pathnames for the entries in this directory
    files = map(os.path.join,[dir] * len(files),files)
    # keep only entries that are neither directories nor links
    files = filter(lambda x:not os.path.isdir(x) and \
            not os.path.islink(x),files)
    # accumulate into the mutable list passed through walk
    arg[0] = arg[0] + reduce(lambda x,y:x + y,map(os.path.getsize,files),0)

Since the return value of the user-written function is ignored, the total size of the files encountered must be passed through the arg parameter of the function. Recall from Section 7.3 that only mutable objects can be modified when passed to a function; thus a list is passed to the function, and the first element of the list is used to accumulate the file sizes. (An alternative would be to use global variables, but the use of such variables should always be avoided when a reasonable alternative exists.) To call the function to sum up the sizes of all the files rooted in the current directory, we would use walk in the following way:

# os.path.walk and the print statement are available only in Python 2
total = [0]
dir = '.'
os.path.walk(dir,sumit,total)
print 'Total size of all files rooted at %s: %d bytes' % (dir,total[0])
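The os.path.walk function used above is available only in Python 2. As a hedged sketch of the same recursive total for Python 3, the version below uses os.walk, which yields a (directory, subdirectory names, file names) tuple for each directory it visits, so no callback or mutable accumulator is needed. The function name sumtree and the scratch files are made up for this illustration.

```python
import os
import shutil
import tempfile

def sumtree(top):
    """Total the sizes of all non-link files under top, recursively."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            # os.walk yields bare names, so prepend the directory
            fullname = os.path.join(dirpath, name)
            # directories are already excluded; skip links as before
            if not os.path.islink(fullname):
                total += os.path.getsize(fullname)
    return total

# Build a small scratch tree: 10 bytes at the top, 25 bytes one level down.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "inner"))
with open(os.path.join(root, "a.bin"), "wb") as f:
    f.write(b"x" * 10)
with open(os.path.join(root, "inner", "b.bin"), "wb") as f:
    f.write(b"y" * 25)

size = sumtree(root)
shutil.rmtree(root)
print("Total size of all files rooted at %s: %d bytes" % (root, size))
```

Because the generator hands back one directory at a time, the running total can live in an ordinary local variable rather than in a one-element list.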


Phil Spector 2003-11-12