Data Tainting

Before proceeding to the actual functions available for CGI programming and HTML generation through the CGI module, it's worth taking a minute to talk about problems that can occur when information you receive from an outside source (like a fill-out form) is used inside your CGI program. If you call any outside programs (through the system or open functions, for example), an unscrupulous user could insert additional commands through the variables you extract in your CGI script. Furthermore, if you're not careful about the value of the environment variable PATH, which determines where your CGI program will search for the programs you tell it to run, a malicious user might trick your program into thinking it was calling a harmless program while it was actually calling a very dangerous one. For this reason, perl provides the -T flag (normally placed on the #! line of the script), which turns on two forms of taint checks. The first ensures that you have explicitly set your PATH variable to some fixed value before running any outside program; on UNIX systems, a safe choice is usually
     $ENV{PATH} = "/bin:/usr/bin";
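
As a sketch of how this setup might look at the top of a script (assuming a UNIX system where echo lives in /bin or /usr/bin), the following also deletes the other environment variables that the perlsec documentation recommends removing, and calls an outside program by passing system a list rather than a single string. When the list has more than one element, perl runs the program directly instead of handing a command line to a shell, so shell metacharacters in the arguments are not interpreted. The -T flag is left off so the sketch runs under plain perl.

```perl
use strict;
use warnings;

# Sketch only: a real CGI script would put -T on its #! line; it is left
# off here so the example also runs without taint mode.

# Replace the PATH inherited from the web server with a fixed value.
# Under -T, perl refuses to run outside programs while PATH is tainted.
$ENV{PATH} = "/bin:/usr/bin";

# perlsec also recommends deleting these, which shells may consult at
# startup and which taint mode likewise rejects when they are tainted.
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

# List form of system: echo (assumed to be in /bin or /usr/bin) is run
# directly, so the ";" is printed as ordinary text rather than starting
# a second command. Under -T a tainted argument would still be refused,
# so form data must be untainted before it gets here.
my $status = system("echo", "hello; date");
print "echo failed with status $?\n" if $status != 0;
```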

The second form of taint checking can be summarized as follows: any data that comes from outside your program (form input, environment variables, file input, and so on) is marked as tainted, and perl will refuse to use it in any command that invokes a subshell or that modifies files, directories, or processes until you reset it to the value of a tagged (parenthesized) subexpression of a regular expression match, such as $1. Notice that perl doesn't (and probably can't) check that the tagged expression is doing something worthwhile; it simply forces you to think about the problem, and apply a (hopefully) useful solution. This is usually very simple. For example, suppose we are accessing some information about a product name, which we are going to use to open a file constructed from that name. An unscrupulous user might manage to enter a name like "rm *|"; since a trailing | tells open to run the string as a command, blindly passing that string to open could remove many files. So before using any information garnered from outside sources in a perl program, you should carefully determine which characters you are willing to accept in that information, and either eliminate the others, or print a (stern) warning message that such strings are unacceptable in your program. If you only wanted alphanumeric data (letters, digits, and the underscore, which is what \w matches), you could use code like this:

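if($product_name =~ /^(\w+)$/){    # ^ and $: the whole name must match
   $product_name = $1;             # $1 holds an untainted copy
}
else{
   print "Illegal characters encountered in product name\n";
   exit;                           # $product_name is still tainted
}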
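
The same idea can be carried through to the file-opening scenario described above. This is only one possible arrangement: the directory /var/www/products, the .txt suffix, and the helper names product_file and read_product are all hypothetical. It anchors the pattern with \A and \z (unlike $, \z does not allow a trailing newline), and it uses the three-argument form of open, where the mode "<" is passed separately from the filename, so that even a name ending in | could never be run as a command.

```perl
use strict;
use warnings;

# Hypothetical location of the product data files; adjust to your layout.
my $data_dir = "/var/www/products";

# Hypothetical helper: return the full path for a product, or undef if
# the name contains anything other than letters, digits, and underscores.
sub product_file {
    my ($product_name) = @_;
    return undef unless defined $product_name
        && $product_name =~ /\A(\w+)\z/;
    my $clean = $1;    # captured subexpression: an untainted copy
    return "$data_dir/$clean.txt";
}

# Hypothetical helper: return the contents of a product's file, or undef
# on any failure (bad name, missing file, unreadable file).
sub read_product {
    my ($product_name) = @_;
    my $path = product_file($product_name) or return undef;
    # Three-argument open: "<" means read-only, and $path is used purely
    # as a filename, never as a command.
    open(my $fh, "<", $path) or return undef;
    local $/;          # read the whole file at once
    my $contents = <$fh>;
    close($fh);
    return $contents;
}
```

For instance, product_file("widget42") returns "/var/www/products/widget42.txt", while product_file("rm *|") and product_file("../etc/passwd") both return undef, so neither value can reach open.
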

To accept the characters most commonly found in email addresses (letters, digits, underscores, hyphens, periods, and the @ sign), you might untaint input like this:

if($email_address =~ /^([-\@\w.]+)$/){    # the whole address must match
   $email_address = $1;
}
else{
   print "Illegal characters encountered in email address\n";
   exit;                                  # $email_address is still tainted
}
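
Since both examples follow the same pattern, one might factor it into a small subroutine. The helper below, untaint_with, is hypothetical and not part of perl or the CGI module; it takes a value and a precompiled pattern (built with qr//) and returns either an untainted copy of the whole value or undef. It also illustrates why the anchors matter: the unanchored test /^(\w+)/ succeeds on a string like "widget;date" and silently keeps only "widget", so a bad string is neither rejected nor reported.

```perl
use strict;
use warnings;

# Hypothetical helper: return an untainted copy of $value if the WHOLE
# string matches $pattern, or undef if it does not.
sub untaint_with {
    my ($value, $pattern) = @_;
    return undef unless defined $value;
    # \A and \z anchor the match at both ends, so a single unacceptable
    # character anywhere in the string makes the match fail.
    return $value =~ /\A($pattern)\z/ ? $1 : undef;
}

my $product = untaint_with("widget42", qr/\w+/);
my $email   = untaint_with('user@example.com', qr/[-\@\w.]+/);
my $bad     = untaint_with('user@example.com; date', qr/[-\@\w.]+/);

print defined $product ? "product: $product\n" : "product rejected\n";
print defined $email   ? "email: $email\n"     : "email rejected\n";
print defined $bad     ? "bad: $bad\n"         : "bad input rejected\n";

# For comparison, an unanchored pattern accepts a prefix of bad input.
if ("widget;date" =~ /^(\w+)/) {
    print "unanchored match kept: $1\n";
}
```

Run as is, the script should report the product name and email address as accepted, reject the third string, and show that the unanchored match kept only "widget".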


Phil Spector 2002-10-18