Lattice Plots

1  Customizing the Panel Function

One of the basic concepts of lattice plots is the idea of a panel. Each separate graph that is displayed in a multi-plot lattice graph is known as a panel, and for each of the basic types of lattice plots, there's a function called panel.plottype, where plottype is the type of plot in question. For example, the function that actually produces the individual plots for xyplot is called panel.xyplot. To do something special inside the panels, you can pass your own panel function to the lattice plotting routines using the panel= argument. Generally, the first thing such a function would do is to call the default panel plotting routine; then additional operations can be performed with functions like panel.points, panel.lines, panel.text. (See the help page for panel.functions to see some other possibilities.) For example, in the income versus literacy plot, we might want to show the best regression line that goes through the points for each continent, using the panel.lmline function. Here's how we could construct and call a custom panel function:
> mypanel = function(x,y,...){
+   panel.xyplot(x,y,...)
+   panel.lmline(x,y)
+ }
xyplot(income ~ literacy | cont,data=world,panel=mypanel)

The plot is shown below.
By default, the lattice functions display their panels from bottom to top and left to right, similar to the way points are drawn on a scatterplot. If you'd like the plots to be displaying going from top to bottom, use the as.table=TRUE argument to any of the lattice plotting functions.
Now that we've seen some of the basics of how the lattice library routines work, we'll take a look at some of the functions that are available. Remember that there are usually similar alternatives available among the traditional graphics functions, so you can think of these as additional choices that are available, and not necessarily the only possibility for producing a particular type of plot.

2  Univariate Displays

Univariate displays are plots that are concerned with the distribution of a single variable, possibly comparing the distribution among several subsamples of the data. They are especially useful when you are first getting acquainted with a data set, since you may be able to identify outliers or other problems that could get masked by more complex displays or analyses.

2.1  dotplot

A simple but surprisingly useful display for small to moderate amounts of univariate data is the dotplot. Each observation's value for a variable is plotted as a dot along a line that spans the range of the variable's value. In the usual case, there will be several such lines, one for each level of a grouping variable, making it very easy to spot differences in the variable's distribution for different groups.
To illustrate, we'll use a a data set from a wine recognition experiment where a number of chemical and other measurements were taken on wines from three cultivars. The data is available at http://www.stat.berkeley.edu/~spector/s133/data/wine.data; information about the variables is at http://www.stat.berkeley.edu/~spector/s133/data/wine.names Suppose we are interested in comparing the alcohol level of wines from the three different cultivars:
> wine = read.csv('http://www.stat.berkeley.edu/~spector/s133/data/wine.data',header=FALSE)
> names(wine) = c("Cultivar", "Alcohol", "Malic.acid", "Ash", "Alkalinity.ash",
+                 "Magnesium", "Phenols", "Flavanoids", "NF.phenols", "Proanthocyanins",                
+                 "Color.intensity","Hue","OD.Ratio","Proline")
> wine$Cultivar = factor(wine$Cultivar)
> dotplot(Cultivar~Alcohol,data=wine,ylab='Cultivar')

The plot is shown below.

2.2  bwplot

The bwplot produces box/whisker plots. Unfortunately, notched boxplots are not currently available using bwplot. To create a box/whisker plot of Alcohol for the three cultivars, we can use the same formula we passed to dotplot:
> bwplot(Alcohol~Cultivar,data=wine,xlab='Cultivar',ylab='Alcohol')

The plot is shown below.
For both dotplot and bwplot, if you switch the roles of the variables in the formula, the orientation of the plot will change. In other words, the lines in the dotplot will be displayed vertically instead of horizontally, and the boxplots will be displayed horizontally instead of vertically.

2.3  densityplot

As its name implies, this function produces smoothed plots of densities, similar to passing a call to the density function to the plot function. To compare multiple groups, it's best to create a conditioning plot:
> densityplot(~Alcohol|Cultivar,data=wine)

Notice that, for plots like this, the formula doesn't have a left hand side. The plot is shown below:
As another example of a custom panel function, suppose we wanted to eliminate the points that are plotted near the x-axis and replace them with what is known as a rug - a set of tickmarks pointing up from the x-axis that show where the observations were. In practice, many people simply define panel functions like this on the fly. After consulting the help pages for panel.densityplot and panel.rug, we could replace the points with a rug as follows:
> densityplot(~Alcohol|Cultivar,data=wine,panel=function(x,...){
+                              panel.densityplot(x,plot.points=FALSE)
+                              panel.rug(x=x)
+                              })

Of course, if you find it easier or more convenient to define a custom panel function separate from the call to the plotting function, you can use that method. Here's the result:

3  barchart

A bar chart is like a histogram for categorical data. The barchart function expects its input data frame to already have the numbers of observations for each grouping tabulated. For the simplest case of a single variable with no conditioning variable, you can use a call to table on the right hand side of the tilda to produce a vertical bar chart:
> barchart(~table(cont),data=world)

The plot is shown below.
For more complex barcharts, a data frame containing the counts to be plotted needs to be constructed. This can be done easily using the table in conjunction with as.data.frame. To illustrate, we'll return to the movies data set which has the release dates and box office receipts for some of the all-time most popular movies. Suppose we want to see if the distribution of the day of the week the movies opened on has changed over time. First, we'll read the data and create a grouping variable for different time periods:
> movies = read.delim('http://www.stat.berkeley.edu/~spector/s133/data/movies.txt',as.is=TRUE,sep='|')
> movies$box = as.numeric(sub('\\$','',movies$box))
> movies$date = as.Date(movies$date,'%B %d, %Y')
> movies$year = as.numeric(format(movies$date,'%Y'))
> movies$weekday = factor(weekdays(movies$date),
+        levels=c('Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'))
> movies$yrgrp = cut(movies$year,c(1936,1980,1985,1990,1995,2000,2006),
+        labels=c('1937-1979','1980-1984','1985-1990','1991-1995','1996-2000','2001-2005')) 
> counts = as.data.frame(table(yrgrp=movies$yrgrp,weekday=movies$weekday))
> barchart(Freq~weekday|yrgrp,data=counts,scales=list(rot=c(90,0)),as.table=TRUE)

The plot is shown below.
If the roles of Freq and weekday were reversed in the previous call to barchart, the bars would be drawn horizontally instead of vertically.

4  3-D Plots: cloud

The three-dimensional analog of the two-dimensional xyplot in the lattice library is cloud. By using a conditioning variable, cloud can allow us to consider the relationship of four variables at once. To illustrate, here's conditioning plot showing the relationship among four variables from the wine data frame:
cloud(Alcohol ~ Color.intensity + Malic.acid|cut(Hue,4),data=wine)

The plot is shown below:



File translated from TEX by TTH, version 3.67.
On 3 Mar 2009, 16:53.