
Reading Directories

Perl provides a directory handle to let you process all the files in a given directory. You call the opendir function much as you call the regular open function, passing it a directory handle and the name of the directory you wish to read. Each call to readdir in a scalar context, using that directory handle, then returns the name of the next entry in the directory, or undef when all the names have been returned. In a list context, a call to readdir returns a list of all the remaining entries in the directory corresponding to the directory handle. Note that the entries include subdirectories as well as the special names . and .., not just ordinary files.
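
As a minimal sketch of the list-context form, the following reads every entry at once and discards the . and .. entries. The subroutine name list_entries is made up for this illustration, and a lexical directory handle ($dh) is used in place of a bareword one; neither comes from the original text.

```perl
use strict;
use warnings;

# list_entries is a hypothetical helper, not part of Perl itself.
# It returns the sorted names of all entries in a directory,
# leaving out "." and "..".
sub list_entries {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Couldn't open $dir: $!";
    # In list context readdir returns all remaining entries at once.
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    return sort @names;
}

# Use the directory named on the command line, or the current one.
print "$_\n" for list_entries(@ARGV ? $ARGV[0] : '.');
```

The scalar-context loop is usually preferable for very large directories, since it never holds the whole list of names in memory.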

Suppose we wish to list the names and sizes of all the files in a directory (specified on the command line) which are larger than 50,000 bytes. We could use opendir and readdir as follows:

   $thedir = $ARGV[0];
   opendir(DIR,$thedir) || die "Couldn't open $thedir: $!";
   while(defined($filename = readdir(DIR))){
        # readdir returns bare names, so prepend the directory
        if(-s "$thedir/$filename" > 50000){
            printf("%s: %d\n",$filename,-s _);   # _ reuses the last stat
        }
   }
   closedir(DIR);
Note that readdir returns only the last component of each filename (its basename), not the complete path. Thus, when using the -s operator, it was necessary to precede the filename with the directory name that was passed to opendir. The special filehandle _ makes the second -s reuse the information gathered by the first test instead of examining the file again. The printf function was used because the expression -s _ would not be interpolated within double quotes.

While it would be possible to write a recursive program to descend into all the subdirectories encountered in the while loop of the previous example, the File::Find module simplifies the task of recursing through an entire directory tree. This module provides a function called find, which is passed a function reference (explained below) and a list of directories. The supplied function is then called for every file and directory found beneath those directories, not only for regular files. When a subdirectory is encountered, the find function changes its working directory to that directory. When your function is called, find sets $_ to the basename of the current file, and stores the fully qualified file name in the variable $File::Find::name. This is the notation used in perl whenever you need to refer to a variable or function within a module, when the name of the variable or function has not been exported by that module. The name of the current directory is similarly stored in the variable $File::Find::dir.
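
To make these variables concrete, here is a small sketch that prints all three of them for every entry that find visits. The subroutine name show_entry is invented for the example, and the starting directory is assumed to default to the current one when no argument is given.

```perl
use strict;
use warnings;
use File::Find;

# show_entry is a hypothetical callback; find calls it once per entry.
sub show_entry {
    # $_ holds the basename; find has already changed into
    # $File::Find::dir, so file tests on $_ work directly.
    my $kind = (-d $_) ? 'dir ' : 'file';
    print "$kind  base=$_  dir=$File::Find::dir  full=$File::Find::name\n";
}

find(\&show_entry, @ARGV ? $ARGV[0] : '.');
```

Running this on a small directory tree shows that the callback is invoked for directories (including the starting directory itself) as well as for the files inside them.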

For example, suppose we wish to find the largest file in a given directory or any of its subdirectories. First, we need to write a function which will determine if a given file is bigger than the biggest one encountered. The function passed to find does not accept a filename as an argument; instead it relies on the variables described in the previous paragraph, namely $_ and $File::Find::name. A function for the current problem might look like this:

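     sub bigfile{
        # $_ is the basename of the current file, set by find
        if(-s $_ > $biggest){
           $biggest = -s _;                # _ reuses the last stat
           $bigfile = $File::Find::name;   # remember the full path
        }
     }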
The global variables $biggest and $bigfile are used to hold the results of the search. To create a subroutine reference to pass to the find function, precede the name of the function with a backslash and an ampersand (\&); this is required so that perl will not confuse the reference to the function with a reference to a scalar or array. As in previous examples, we'll assume that the starting directory is passed on the command line. Since find changes the working directory as it recurses through the directory tree, all file tests can be performed on the variable $_, which contains just the basename of the file. The fully qualified pathname is stored in $File::Find::name, and is used here to record the full name of the largest file encountered.
     use File::Find;
     find(\&bigfile,$ARGV[0]);                
     print "file=$bigfile, size=$biggest\n";
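The same search can also be written without global variables by passing find an anonymous subroutine that updates lexical variables. The following is only a sketch under a few assumptions: the helper name largest_file is hypothetical, only regular files are considered (directories are skipped with -f), and the current directory is used when no argument is given.

```perl
use strict;
use warnings;
use File::Find;

# largest_file is a hypothetical helper: it returns the full name and
# size of the largest regular file under $top, or an empty list if
# no regular file is found.
sub largest_file {
    my ($top) = @_;
    my ($bigfile, $biggest) = (undef, -1);
    find(sub {
        return unless -f $_;          # skip directories and the like
        my $size = (-s _) || 0;       # reuse the stat done by -f
        if ($size > $biggest) {
            ($bigfile, $biggest) = ($File::Find::name, $size);
        }
    }, $top);
    return defined $bigfile ? ($bigfile, $biggest) : ();
}

my ($name, $size) = largest_file(@ARGV ? $ARGV[0] : '.');
print "file=$name, size=$size\n" if defined $name;
```

Because the anonymous subroutine is a closure, it can see $bigfile and $biggest directly, so nothing leaks into the rest of the program.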


Phil Spector 2002-10-18