CGI Programming

1  Data

The R.cgi script calls your R program in such a way that it doesn't automatically load any data into the R environment. So if you want to have data available to your CGI program, you'll need to explicitly get the data into R's environment. For reasons of efficiency, your program should always use the load function to load a previously saved binary version of the data you need. The most convenient place to store these objects is right in the cgi-bin directory from which your program will execute.
Suppose we wish to create a CGI program that will accept the name of one of the variables from the wine data frame, and then display a summary of the data. Before running the script, save the wine data from an ordinary R session that was started after changing your working directory to your cgi-bin directory. The command to do this is
save(wine,file='wine.rda')
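The save/load round trip can be tried outside the web server first. Here's a minimal sketch, assuming a small stand-in data frame (the real wine data isn't reproduced here, and the column names are made up) and a temporary directory in place of cgi-bin:

```r
# Stand-in for the wine data frame; the column names are hypothetical.
wine <- data.frame(Alcohol = c(13.2, 12.4, 14.1),
                   Cultivar = factor(c("A", "B", "A")))

# Save a binary copy (a temporary file stands in for cgi-bin/wine.rda).
rdafile <- file.path(tempdir(), "wine.rda")
save(wine, file = rdafile)

# Simulate a fresh R process: remove the object, then restore it with load().
rm(wine)
restored <- load(rdafile)   # load() returns the names of the objects it created
print(restored)
print(names(wine))
```

Since load returns the names of the objects it restored, printing that value is a quick way to check what a saved file actually contains.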

Next, we can create a form, which would be saved in the public_html directory. Here's a simple example, which we'll save in the file wine.html:
<html><body>
<h1>Summary for Wine Variables</h1>
<form action='cgi-bin/R.cgi/dowine.cgi'>
Enter the name of the variable:  
<input type=text name=winevar><br>
<center>
<input type=submit value='Run'>
</center>
</form>
</body></html>

The dowine.cgi program would look like this:
load('wine.rda')

HTMLheader()

winevar = formData$winevar
tag(h1)
cat('Summary for wine$',winevar,sep='')
untag(h1)
tag(h2)
tag(pre)
print(summary(wine[[winevar]]))
untag(pre)
untag(h2)
cat('</body></html>')

Here's the form: [screenshot not reproduced]
Here's the result of submitting the form: [screenshot not reproduced]
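Because the variable name is typed in by the user, it may not match any column of the data frame. The following sketch shows one way the program could check the name before using it. It uses only base R; the stand-in data frame, its column names, and the helper summaryLines are assumptions made for illustration and aren't part of CGIwithR:

```r
# Stand-in for the wine data frame; the column names are hypothetical.
wine <- data.frame(Alcohol = c(13.2, 12.4, 14.1), Hue = c(1.04, 1.05, 0.86))

# Return the printed summary as lines of text, or a fixed message
# if the requested name is missing or is not a column of the data.
summaryLines <- function(winevar, data) {
    ok <- is.character(winevar) && length(winevar) == 1 && winevar %in% names(data)
    if (!ok) return("Unknown variable name")
    capture.output(print(summary(data[[winevar]])))
}

formData <- list(winevar = "Hue")          # simulated form input
cat(summaryLines(formData$winevar, wine), sep = "\n")
cat(summaryLines("NoSuchColumn", wine), sep = "\n")
```

The message for a bad name is a fixed string, so whatever the user typed is never echoed back into the page.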

2  Combo Forms

Of course, having the user remember the name of the variable they're interested in isn't a very user-friendly strategy, but the thought of manually preparing a form that lists all the variables isn't very appealing either. The problem can be solved by having the CGI program generate the form the first time it's called, and then processing the form when it's submitted back to the web server. If we call the CGI program directly (not through a form submission), the formData list will be empty, and we can use that condition to tell whether we need to generate the form or respond to it. Since R will be generating the form, it's very easy to have it provide a choice for each variable. For this example, let's use a drop down menu that will display the names of each variable. Here's a program that will both generate the form and respond to it:
if(length(formData) == 0){
    HTMLheader()
    tag(h1)
    cat('Summary for Wine Variables')
    untag(h1)
    cat("<form action='dowine1.cgi'>")
    cat("Choose the variable:")
    cat("<select name='winevar'>")
    load("wine.rda")
    sapply(names(wine),function(x)cat("<option value='",x,"'>",x,"</option>\n",sep=''))
    cat("</select>")
    cat('<input type="submit" value="Run">')
    cat("</form></body></html>")
} else {
   load('wine.rda')
   HTMLheader()
   winevar = formData$winevar
   tag(h1)
   cat('Summary for wine$',winevar,sep='')
   untag(h1)
   tag(h2)
   tag(pre)
   print(summary(wine[[winevar]]))
   untag(pre)
   untag(h2)
   cat('</body></html>')
}

One very important thing to notice if you use this approach: the action= argument should specify only the name of the program, without the usual R.cgi/. Since R is already calling your program, the browser resolves the relative name against the current URL, which is already inside the R.cgi "directory".
Here's the result of calling the program directly: [screenshot not reproduced]
and here's the result after making a choice: [screenshot not reproduced]
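The generate-or-respond test and the construction of the menu can be tried in an ordinary R session. This is only a sketch: optionTags is a hypothetical helper, the variable names are made up, and formData is created by hand to simulate a direct call:

```r
# Build one <option> element per choice (hypothetical helper, base R only).
optionTags <- function(choices) {
    paste0("<option value='", choices, "'>", choices, "</option>")
}

formData <- list()   # an empty list simulates calling the program directly

if (length(formData) == 0) {
    # No form data yet, so generate the drop down menu.
    html <- c("<select name='winevar'>",
              optionTags(c("Alcohol", "Hue")),
              "</select>")
    cat(html, sep = "\n")
}
```

Building the HTML as a character vector before calling cat makes it easy to inspect the generated form in an interactive session.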

3  Graphs

Certainly one of the most useful functions of CGI scripting with R is to display graphics generated by R, based on a user's choices as specified on a web form. This would provide a simple way to allow people unfamiliar with R to produce attractive graphs; if a means is provided for data input, and enough options are provided through checkboxes, drop-down menus, radio buttons, etc., a complete web-based graphing solution could be developed.
To properly create and display graphics with a CGI program, it's necessary to understand the difference between the internal paths (which your R program will see) and the external paths (which are the addresses typed into the browser's address field.) For example, the way the class webserver is configured, the directory into which you would put HTML pages is (once again assuming your SCF id is s133xx):
/class/u/s133/s133xx/public_html/

This directory provides a convenient place to put graphics generated by your scripts. To the outside world, this directory would be indicated as:
http://springer/~s133xx/

or
http://localhost:8080/~s133xx/

So as far as the webserver is concerned (i.e. the way the outside world would find your files through a URL), the directory is known as
/~s133xx/

To create graphics from your CGI script, you first create a variable called graphDir and set it equal to the full internal name of the directory into which you'll write your graphs. In our example it would be /class/u/s133/s133xx/public_html/. Then use the webPNG function, specifying the name (without any leading directories) that you want to use for your graph. In order to generate the appropriate HTML so that your image will be displayed, you can use the img function of the CGIwithR library. This function takes two arguments. The first is the name of the graphic you produced via webPNG, and the second is called graphURLroot, and should be set to the "outside" view of your public_html directory, namely /~s133xx/. (Note the trailing slashes in both the graphDir and graphURLroot; they are required.)
To illustrate, let's create a simple CGI program that will generate some random data and create a conditioning plot containing histograms.
library(lattice)
HTMLheader()
x = data.frame(z = rnorm(1000), g = factor(sample(1:5,size=1000,replace=TRUE)))
graphDir='/class/u/s133/s133xx/public_html/'
cat("Now I'm going to plot some histograms:<br>")
webPNG(file='hist.png')
histogram(~z|g,data=x)
invisible(dev.off())
img(src='hist.png',graphURLroot='/~s133xx/')
cat("</body></html>")

The size of the plot can be controlled by passing width= and height= arguments to webPNG; the units for these arguments are pixels.
If you are using lattice graphics and your plot does not appear, try passing the call to the lattice function to the print function.
Notice the call to dev.off; without it, your graph may not be properly terminated, and only some (or possibly none) of the graph will be displayed.
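The correspondence between the internal and external names of a graph can be sketched in base R. This isn't how webPNG or img are implemented; graphPaths is a hypothetical helper that only shows how the two paths relate and why the trailing slashes matter:

```r
graphDir     <- "/class/u/s133/s133xx/public_html/"   # internal view (seen by R)
graphURLroot <- "/~s133xx/"                           # external view (seen by the browser)

# Hypothetical helper: map a bare file name to both views.
graphPaths <- function(name, graphDir, graphURLroot) {
    # Both prefixes must end in a slash, since the name is simply appended.
    stopifnot(grepl("/$", graphDir), grepl("/$", graphURLroot))
    list(file = paste0(graphDir, name), url = paste0(graphURLroot, name))
}

p <- graphPaths("hist.png", graphDir, graphURLroot)
cat("R writes the file to:  ", p$file, "\n")
cat("The browser requests:  ", p$url, "\n")
cat("<img src='", p$url, "'>\n", sep = "")
```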

4  Hidden Variables

Suppose we have a web page that displays a choice of data frames for possible further analysis. Once a data frame is chosen, another page could display the variables that are available from that data frame, and a choice of plots could be provided. Remember that each time our CGI program is called, a new R process begins. So how can we "remember" the data set name in between invocations of our program? The answer is to use hidden variables. Any time you create an HTML form, you can create as many hidden variables as you need to store information that needs to be available in the next step of processing. These hidden variables are exactly like any other CGI variable, but there is no visible sign of the variable on the form that is displayed in the user's browser. To create a hidden variable, use code like the following:
<input type="hidden" name="varname" value="the value">

Here's an implementation of a program that looks in the current directory for any files with an extension of .rda, provides a drop down menu of data set names, then allows a choice of variables, and finally produces the requested plot:
HTMLheader()
if(length(formData) == 0){
    datasources = list.files('.',pattern='\\.rda$')
    datasources = sub('\\.rda$','',datasources)
    cat('<form action="doplot.cgi">')
    cat('<select name=dataset>\n')
    sapply(datasources,function(x)cat('<option value="',x,'">',x,'</option>\n'))
    cat('</select>\n')
    cat('<center><button type="submit">Run</button></center></form>')
} else if('dataset' %in% names(formData)){
    dataset = formData$dataset
    dataset = gsub(' ','',dataset)
    load(paste(dataset,'.rda',sep=''))
    cat('<form action="doplot.cgi">\n')
    cat('<p>X-variable:<br>\n')
    sapply(names(get(dataset)),function(i)cat('<input type="radio" name="xvar" value="',i,'">',i,'<br>\n'))
    cat('<p>Y-variable:<br>\n')
    sapply(names(get(dataset)),function(i)cat('<input type="radio" name="yvar" value="',i,'">',i ,'<br>\n'))
    cat('<input type="hidden" name="set" value="',dataset,'">\n')
    cat('<center><button type="submit">Run</button></center>')
    cat('</form>')
} else{
    dataset = gsub(' ','',formData$set)
    load(paste(dataset,'.rda',sep=''))
    xvar=gsub(' ','',formData$xvar)
    yvar=gsub(' ','',formData$yvar)
    graphDir = '/home/spector/public_html/'
    webPNG(file='theplot.png',graphDir=graphDir)
    thedata = get(dataset)
    plot(thedata[[xvar]],thedata[[yvar]],xlab=xvar,ylab=yvar)
    invisible(dev.off())
    img(src='theplot.png',graphURLroot='/~spector/')
    }
cat('</body></html>')

This program has three sections: the first displays the initial form showing the data frame names (invoked when formData is empty); the second displays the variable choices (invoked when the dataset variable is present in formData); and the third produces the plot (invoked when formData is available but the dataset variable is not defined, because the data set name now arrives in the hidden variable set).
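The three-way decision can be tested on its own by passing simulated formData lists to a small function. In this sketch, stage and hidden are hypothetical helpers written for illustration; only the tests on formData mirror the program above:

```r
# Decide which section of the program should run for a given formData.
stage <- function(formData) {
    if (length(formData) == 0) {
        "choose dataset"
    } else if ("dataset" %in% names(formData)) {
        "choose variables"
    } else {
        "draw plot"
    }
}

# Build a hidden input that carries a value on to the next invocation.
hidden <- function(name, value) {
    paste0('<input type="hidden" name="', name, '" value="', value, '">')
}

print(stage(list()))
print(stage(list(dataset = "wine")))
print(stage(list(set = "wine", xvar = "Alcohol", yvar = "Hue")))
cat(hidden("set", "wine"), "\n")
```

Note that the hidden variable is deliberately given a different name (set) from the menu variable (dataset); that difference is what lets the program tell the second submission from the first.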

5  Outgoing HTTP Headers

We've already seen that when a web browser makes a request to a web server, it sends a series of headers before the actual content (if any). The web server also sends headers to the browser, but up until now we've let the R.cgi wrapper script take care of that detail for us. In most cases, the only header that needs to be sent to a browser is one that informs the browser that we're sending it HTML (as opposed to, say, an image or other binary file). That header looks like this:
Content-type: text/html

and it would be followed by a completely blank line to signal the end of the headers. The R.cgi script examines what you're about to send to the web browser, and, if it doesn't find the "Content-type" line, it inserts it before it sends your output to the browser. Thus, if you do insert that line, you are taking responsibility for the outgoing headers, and, if desired, you can add additional ones.
Just about the only header line that you might consider adding is one that specifies the value of a cookie to the browser. Cookies are small pieces of text, associated with a particular website, that are stored by a browser and sent back to a web server when the domain, and possibly the path, of the requested URL matches the one that initially set them. There are two types of cookies: session cookies, which expire when the browser session ends and the browser is shut down, and persistent cookies, which are stored in a file on the computer on which the browser is running and expire at a date specified in the header that defined the cookie. For this example, we'll create a session cookie, and then access it through a different script. If every web transaction had a form associated with it, we could use hidden CGI variables to do much of the work that cookies do, but since cookies are stored on the user's computer, they are more reliable and don't require any special programming. Here's an example of a script that sets a cookie:
if(length(formData) == 0){
   HTMLheader()
   cat("What is your name?")
   cat('<form action="setcookie.cgi">\n')
   cat('<input type="text" name="name"><br>\n')
   cat('<button type="submit">Run</button>\n')
   cat('</form>\n')

} else if('name' %in% names(formData)){
    name = formData$name
    cat("Content-type: text/html\nSet-Cookie: thename=",name,"; path=/~s133xx/\n\n",sep='')
    cat("Hello there, ",name)
}

cat('</body></html>')

Since everyone in class is sharing the same webserver, I've added a path= specification to the cookie header. For this class, it probably is a good idea to prevent people from getting cookies set by other programs. Note the two newlines at the end of the header line; these are essential to make sure that the browser understands that the headers are done and the content is following. If you want to create persistent cookies, you need to add an expires= specification to the Set-Cookie header. The format of the expiration time must be followed precisely; in particular, the parts of the date must be separated by dashes, and the only allowable time zone is GMT. Here's an example of a header containing a variable, path and expiration date:
Set-Cookie: thename=Somebody; path=/~s133xx/; expires=Sunday, 09-May-10 00:00:00 GMT
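Since the expiration format has to be exact, it may be safer to let R assemble the header. The following is a sketch in base R; cookieHeader is a hypothetical helper, not part of CGIwithR, and it spells out the English day and month names itself so that the result doesn't depend on the locale:

```r
# Hypothetical helper: build a Set-Cookie header line.
# expires, if given, is a POSIXct time; it is rendered in GMT.
cookieHeader <- function(name, value, path = NULL, expires = NULL) {
    parts <- paste0(name, "=", value)
    if (!is.null(path)) parts <- c(parts, paste0("path=", path))
    if (!is.null(expires)) {
        days   <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                    "Thursday", "Friday", "Saturday")
        months <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
        lt <- as.POSIXlt(expires, tz = "GMT")
        stamp <- sprintf("%s, %02d-%s-%02d %02d:%02d:%02d GMT",
                         days[lt$wday + 1], lt$mday, months[lt$mon + 1],
                         (lt$year + 1900) %% 100,
                         lt$hour, lt$min, as.integer(lt$sec))
        parts <- c(parts, paste0("expires=", stamp))
    }
    paste0("Set-Cookie: ", paste(parts, collapse = "; "))
}

# A complete header block: content type, cookie, then a blank line.
cat("Content-type: text/html\n",
    cookieHeader("thename", "Somebody", path = "/~s133xx/",
                 expires = as.POSIXct("2010-05-09 00:00:00", tz = "GMT")),
    "\n\n", sep = "")
```

Leaving out the expires argument gives a session cookie, as in the setcookie.cgi example.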

Now let's look at a program that will retrieve an already set cookie. When a browser recognizes a domain/path/time combination for which it has an active cookie, it sends it back to the webserver in a Cookie: header, not as a CGI variable. The format of the cookie is name=value, similar to the format in which CGI variables are transmitted. This means that we'll need to use Sys.getenv to access the environment variable called HTTP_COOKIE.
HTMLheader()
cookie = Sys.getenv('HTTP_COOKIE')
name = gsub('^ *thename=(.*)$','\\1',cookie)
cat('Welcome back, ',name)
cat('</body></html>')

Notice that you can only access the cookies in a CGI program, not in ordinary HTML, but you don't need any form elements to get the cookie.
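The gsub call above assumes that thename is the only cookie the browser sends. When several cookies apply to a page, they arrive in one string separated by semicolons. Here's a sketch of a more general parser in base R; parseCookies is a hypothetical helper, and the cookie string shown is made up for the example:

```r
# Hypothetical helper: split an HTTP_COOKIE string into a named character vector.
parseCookies <- function(header) {
    if (is.na(header) || !nzchar(header)) return(character(0))
    pairs <- trimws(strsplit(header, ";", fixed = TRUE)[[1]])
    pairs <- pairs[nzchar(pairs)]
    eq <- regexpr("=", pairs, fixed = TRUE)    # position of the first "=" in each pair
    keep <- eq > 0                             # ignore pieces without an "="
    values <- substring(pairs[keep], eq[keep] + 1)
    names(values) <- substring(pairs[keep], 1, eq[keep] - 1)
    values
}

# In a CGI program the string would come from Sys.getenv('HTTP_COOKIE').
cookies <- parseCookies("sid=abc123; thename=Somebody")
cat("Welcome back,", cookies[["thename"]], "\n")
```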

6  Creating Pretty Output

Since the output created by CGI programs is interpreted by the browser as HTML, we can use any HTML commands by simply having our program generate the necessary HTML statements. One simple way of organizing output in HTML is the HTML table. A table begins with the string <table>, and ends with </table>. Each row of the table begins with <tr>, and ends with </tr>; each element within a row begins with <td> and ends with </td>. To specify headings, the th tag can be used in place of td. This suggests the following function to produce one row of an HTML table:
makerow = function(x,tag='td'){
   st = paste('<',tag,'>',sep='')
   end= paste('</',tag,'>',sep='')
   cat(paste(paste('<tr>',st,sep=''),
             paste(x,collapse=paste(end,st,sep='')),
             paste(end,'</tr>',sep='')),"\n")
}

To print an entire data frame, we can first print the names as a header line, then use apply to print the body of the data frame:
dftable = function(df){
   cat('<table border=1>')
   makerow(names(df),tag='th') 
   apply(df,1,makerow)
   cat('</table>')
}

An example of using these functions will be presented in the next section.
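If the data might contain characters that are special in HTML, such as < or &, they should be escaped before being placed in a table cell. The following sketch is a variation on the functions above, not a replacement for them: htmlEscape, rowString and tableString are hypothetical names, and they return strings instead of printing, which makes them easy to check interactively:

```r
# Replace the characters that have special meaning in HTML.
htmlEscape <- function(x) {
    x <- gsub("&", "&amp;", x, fixed = TRUE)
    x <- gsub("<", "&lt;", x, fixed = TRUE)
    gsub(">", "&gt;", x, fixed = TRUE)
}

# One table row as a single string.
rowString <- function(x, tag = "td") {
    cells <- paste0("<", tag, ">", htmlEscape(as.character(x)), "</", tag, ">")
    paste0("<tr>", paste(cells, collapse = ""), "</tr>")
}

# A whole data frame: header row first, then one row per observation.
tableString <- function(df) {
    body <- vapply(seq_len(nrow(df)), function(i) {
        rowString(vapply(df[i, , drop = FALSE], as.character, character(1)))
    }, character(1))
    paste(c("<table border=1>", rowString(names(df), tag = "th"),
            body, "</table>"), collapse = "\n")
}

df <- data.frame(name = c("a<b", "c"), n = c(1, 2.5), stringsAsFactors = FALSE)
cat(tableString(df), "\n")
```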

7  File Upload

We've already seen that an input element with type=file will create an entry field and browse button to allow a user to specify a file for upload. In the following program, we'll create such a field to allow a user to specify a local comma-separated file which will then be read into R, and displayed as a table. File upload using the CGIwithR library simply places the content of the uploaded file into a character string in the formData list corresponding to the name of the CGI variable specified in the HTML form. To treat this character string as a file, we can use the textConnection function.
The following program will upload a comma-separated file, which will then be read by read.csv, and displayed using the dftable function from the previous section:
HTMLheader()
if(length(formData) == 0){
   cat('<form action="readcsv.cgi" method=post enctype="multipart/form-data">\n')
   cat('<input type=file name=thefile><br>')
   cat('<input type=submit value="Upload">')
   cat('</form>')
} else{
   makerow = function(x,tag='td'){
       st = paste('<',tag,'>',sep='')
       end= paste('</',tag,'>',sep='')
       cat(paste(paste('<tr>',st,sep=''),
                 paste(x,collapse=paste(end,st,sep='')),
                 paste(end,'</tr>',sep='')),"\n")
   }
   dftable = function(df){
       cat('<table border=1>')
       makerow(names(df),tag='th')
       apply(df,1,makerow)
       cat('</table>')
   }
  txtcon = textConnection(formData$thefile)
  df = read.csv(txtcon)
  dftable(df)
}
cat('</body></html>')
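The textConnection step can be tried without a web server by putting a string into formData by hand. In this sketch the file contents are made up, and formData is built manually to stand in for what CGIwithR would supply:

```r
# Simulated upload: the whole file arrives as one character string.
formData <- list(thefile = "name,score\nalice,90\nbob,85\n")

txtcon <- textConnection(formData$thefile)   # treat the string as a file
df <- read.csv(txtcon)
close(txtcon)                                # release the connection when done

print(df)
```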

8  Debugging CGI Programs

The first step in debugging an R program is to make sure that there are no obvious syntax errors. This can be done easily by changing directories to the location of your CGI programs, running R and using the source command to execute your program. If you see any syntax errors, they should be corrected before attempting to run your program through the web server. You can simulate input from a form by creating a list called formData and loading appropriate elements into named values in that list.
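Here's a sketch of that offline test. A two-line script written to a temporary file stands in for a real CGI program, and the variable name Hue is made up; with a real program you would source the program file itself:

```r
# A tiny stand-in for a CGI program, written to a temporary file.
script <- file.path(tempdir(), "dowine_test.R")
writeLines(c(
    "winevar = formData$winevar",
    "heading = paste('Summary for wine$', winevar, sep='')"
), script)

# Simulate the form input, then run the program with source().
formData <- list(winevar = "Hue")
source(script, local = TRUE)
print(heading)
```

A program that calls HTMLheader, tag or other CGIwithR functions will also need that library loaded in the test session.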
When a program is run as a CGI program through a webserver, the program's standard error, which contains error messages, is normally routed to the webserver's error logs. Unless you are the owner of the webserver, these logs are usually not readable. To redirect error messages to your browser (which represents the standard output stream of a CGI program), you can use the following command inside your CGI program:
sink(file=stdout(),type='message')

Remember that the error messages will not be displayed using HTML, so they may be difficult to read.
If simulating your web data through an artificially constructed formData list is not sufficient to resolve problems in getting your program to run properly through the webserver, you can have a form generate the formData list, and save it to a file; then when you are testing the program offline, you can load that copy of formData, giving you an interactive session with the same data as would be generated by the web page.
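A sketch of that technique follows. The file name formData.rda and the contents of the list are assumptions; in the real CGI program only the save call would be needed, and the web server would have to be able to write to the chosen location:

```r
# In the CGI program, formData is supplied by CGIwithR; here it is a stand-in.
formData <- list(dataset = "wine", xvar = "Alcohol")

# Step 1 (inside the CGI program): keep a copy of what the form sent.
snapshot <- file.path(tempdir(), "formData.rda")
save(formData, file = snapshot)

# Step 2 (later, in an interactive session): restore the same list.
rm(formData)
load(snapshot)
str(formData)
```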


File translated from TEX by TTH, version 3.67.
On 15 Apr 2011, 16:35.