HOW DO I...
Aggregate and Subset
We have had good results using S-Plus and R
in our courses to analyze Stat Labs data. Currently we are using R because
it is free, and versions are available for the Unix, Mac, and PC
environments. The syntax of the two languages is nearly identical.
R can be downloaded from
The Comprehensive R Archive Network,
where you can also find a User's Guide.
We also provide here a short list of commonly used commands
and their syntax.
S-Plus is available from
and a free student version is available from this
Instructions for the student version are available
We also recommend
Introductory Statistics with R by P. Dalgaard and
An Introduction to S and S-Plus by
Here you will find
answers to our students' frequently asked questions on how to use R and
S-Plus. The responses do not constitute a comprehensive user's guide
to either language. Refer to the user's guides listed on this
page for a more thorough introduction to these languages.
The following examples all use the birth weight data, called babies. It contains the
variables bwt (birth weight in ounces), smoke (0=nonsmoker, 1=yes
now, 2=until pregnant, 3=once did, not now, 9=unknown), wt (mother's
prepregnancy weight), and educ (categories 0-7, 9=unknown).
- Getting data from the web into R
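One way to do this is with read.table(), which accepts a URL in place of a file name. The sketch below is only an illustration: it assumes a whitespace-delimited text file with a header line, and the column names and values are made up. So that it runs offline, it writes a toy file and reads it back through a file:// URL (the path handling assumes a Unix-like system); the address of the real data file would be passed in the same way.

```r
# Toy stand-in for a remote file: write a small table to a temporary file.
# The columns and values are hypothetical.
tmp <- tempfile(fileext = ".data")
writeLines(c("bwt gestation smoke",
             "120 284 0",
             "113 282 1",
             "999 279 9"), tmp)

# read.table() accepts a URL in place of a file name. A file:// URL is
# used here so the sketch runs offline; an http(s):// address would be
# passed the same way.
data_url <- paste0("file://", normalizePath(tmp, winslash = "/"))
babies <- read.table(data_url, header = TRUE)

dim(babies)    # number of rows and columns
names(babies)  # column names taken from the header line
```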
- Replacing values
The 999 values in
bwt denote missing values. To replace 999 with NA.
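A minimal sketch of one way to do this, using logical indexing on a small made-up vector (the values are hypothetical, not taken from the babies data):

```r
# Hypothetical toy values; 999 marks a missing birth weight.
bwt <- c(120, 113, 999, 128, 999, 108)

# Assign NA wherever the missing-value code appears.
bwt[bwt == 999] <- NA

# Most summary functions then need na.rm = TRUE to skip the NAs.
mean(bwt, na.rm = TRUE)
```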
The 9 values in smoke
denote missing values. To replace 9 with NA and recode 2 and 3 as 0:
ismoke <- replace(smoke, smoke == 9, NA)
ismoke <- replace(ismoke, ismoke %in% c(2, 3), 0)
To convert the values of gestation into weeks, and to also collapse these
values into 5 categories.
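One possible approach divides by 7 and then bins the result with cut(). This is a sketch under assumptions: gestation is taken to be recorded in days, the toy values are made up, and the cut points are illustrative choices rather than the ones used in the course.

```r
# Hypothetical toy values, assumed to be gestation length in days.
gestation <- c(284, 282, 245, 301, 266, 290, 252)

# Days to weeks.
gest.wks <- gestation / 7

# Collapse into 5 categories. The cut points (36, 38, 40, 42 weeks) are
# illustrative assumptions. Intervals are closed on the right.
gest.cat <- cut(gest.wks,
                breaks = c(-Inf, 36, 38, 40, 42, Inf),
                labels = c("<=36", "36-38", "38-40", "40-42", ">42"))

table(gest.cat)
```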
- Creating matrices and vectors
To create a matrix of 0s and 1s, where there is one column for
each education level, and the 1s in that column indicate the mother has that
level of education.
To create a 3 by 4 matrix of all 0s.
To create a vector (0, 0.1, ..., 0.9, 1).
To create a vector (1,1,1,12,34,34).
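The four constructions above might be written as follows. This is a sketch with made-up education codes; the object names (ind, zeros, grid, reps) are chosen here for illustration only.

```r
# Hypothetical toy education codes (0-7), one per mother.
educ <- c(0, 2, 5, 2, 7, 1)

# Indicator matrix: one row per mother, one column per education level.
# outer() compares every educ value with every level; adding 0 turns
# the TRUE/FALSE results into 1/0.
levels.ed <- 0:7
ind <- outer(educ, levels.ed, "==") + 0
colnames(ind) <- paste0("ed", levels.ed)

# 3 by 4 matrix of all 0s.
zeros <- matrix(0, nrow = 3, ncol = 4)

# The vector (0, 0.1, ..., 0.9, 1).
grid <- seq(0, 1, by = 0.1)

# The vector (1, 1, 1, 12, 34, 34): repeat each value a given number of times.
reps <- rep(c(1, 12, 34), times = c(3, 1, 2))
```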
- Data frames, matrices, and lists
- Computing statistics on subgroups
To compute the average birthweight for smokers and nonsmokers.
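A sketch using tapply() on made-up values, where ismoke is assumed to be a 0/1 indicator with 1 for smokers:

```r
# Hypothetical toy data; ismoke is 1 for smokers, 0 for nonsmokers.
bwt    <- c(120, 113, 128, 108, 136, 115)
ismoke <- c(0, 1, 0, 1, 0, 1)

# tapply() splits bwt by the values of ismoke and applies mean() to each
# group. na.rm = TRUE is passed on to mean() in case bwt contains NAs.
grp.means <- tapply(bwt, ismoke, mean, na.rm = TRUE)
grp.means
```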
To compute the average
birthweight for mothers who are above average in weight.
tapply(bwt,(wt > mean(wt)),mean)
- Selecting a subset
To select babies
whose mother smoked.
smokerbabes <- babies[!is.na(ismoke) & ismoke == 1, ]
To count babies according to whether they are premature or low birthweight:
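One way to get these counts is a two-way table() of two logical vectors. In the sketch below the data are made up, gestation is assumed to be in days, and the definitions of "premature" and "low birthweight" are illustrative assumptions.

```r
# Hypothetical toy data: birth weight in ounces, gestation assumed in days.
bwt       <- c(120, 80, 128, 85, 136, 115)
gestation <- c(284, 250, 282, 270, 255, 290)

# Illustrative definitions (assumptions, not from the course materials):
# premature = under 37 weeks, low birth weight = under 88 ounces
# (roughly 2500 grams).
premature <- gestation < 37 * 7
lowbwt    <- bwt < 88

# Two-way table of counts.
counts <- table(premature, lowbwt)
counts
```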
To find the mean of height and weight for mothers according to
education and smoking status.
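A possible approach uses aggregate() with a formula. The data frame below is made up, and the column names ht and wt for the mother's height and weight are assumptions.

```r
# Hypothetical toy data frame; the column names are assumptions.
moms <- data.frame(
  ht     = c(62, 64, 66, 63, 65, 67, 61, 68),
  wt     = c(110, 135, 150, 120, 128, 160, 105, 170),
  educ   = c(1, 1, 2, 2, 1, 1, 2, 2),
  ismoke = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# aggregate() with cbind() on the left summarizes several variables at
# once, with one output row per educ-by-ismoke combination.
grp <- aggregate(cbind(ht, wt) ~ educ + ismoke, data = moms, FUN = mean)
grp
```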
To regress bwt on mother's weight separately for smokers and nonsmokers.
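One way to fit the two regressions is to split the data frame by smoking status and call lm() on each piece. The sketch uses made-up values, and ismoke is assumed to be a 0/1 column.

```r
# Hypothetical toy data.
babies <- data.frame(
  bwt    = c(120, 113, 128, 108, 136, 115, 125, 110),
  wt     = c(110, 135, 150, 120, 160, 128, 140, 115),
  ismoke = c(0, 1, 0, 1, 0, 1, 0, 1)
)

# Split the data frame by smoking status and fit the same simple linear
# regression within each piece.
fits <- lapply(split(babies, babies$ismoke),
               function(d) lm(bwt ~ wt, data = d))

# One row of (intercept, slope) per group.
t(sapply(fits, coef))
```

The same fit for a single group can be obtained with the subset argument, for example lm(bwt ~ wt, data = babies, subset = ismoke == 1).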
- Multiple plots with the same axes
To make two histograms, one for smokers and one for nonsmokers, where they
both have the same axes.
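A sketch of one approach: compute common bin edges and a common vertical limit first, then draw both panels with the same xlim and ylim. The data are simulated placeholders and the bin width of 10 is an arbitrary choice.

```r
# Hypothetical simulated data.
set.seed(1)
bwt    <- c(rnorm(50, 123, 17), rnorm(50, 114, 18))
ismoke <- rep(c(0, 1), each = 50)

# Shared bin edges, computed from all the data.
brks <- seq(floor(min(bwt) / 10) * 10, ceiling(max(bwt) / 10) * 10, by = 10)

# Compute both histograms without drawing, to find a common y limit.
h0 <- hist(bwt[ismoke == 0], breaks = brks, plot = FALSE)
h1 <- hist(bwt[ismoke == 1], breaks = brks, plot = FALSE)
ymax <- max(h0$counts, h1$counts)

# Stack the two panels and draw each with the same xlim and ylim.
op <- par(mfrow = c(2, 1))
plot(h0, xlim = range(brks), ylim = c(0, ymax), main = "Nonsmokers", xlab = "bwt")
plot(h1, xlim = range(brks), ylim = c(0, ymax), main = "Smokers", xlab = "bwt")
par(op)
```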
- Quantile plots
To make a
quantile-quantile plot of bwt for smokers versus nonsmokers.
To make a gamma(5,1) quantile plot of bwt.
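Both plots can be sketched with base graphics. The data below are simulated placeholders, and gamma(5,1) is read here as shape 5 and rate 1, which is an assumption about the intended parameterization.

```r
# Hypothetical simulated data.
set.seed(2)
bwt    <- c(rnorm(60, 123, 17), rnorm(40, 114, 18))
ismoke <- rep(c(0, 1), times = c(60, 40))

# Two-sample quantile-quantile plot: qqplot() handles unequal group sizes
# by interpolating the quantiles of the larger sample.
qq <- qqplot(bwt[ismoke == 0], bwt[ismoke == 1],
             xlab = "Nonsmokers", ylab = "Smokers")
abline(0, 1)  # reference line y = x

# Gamma quantile plot: sorted data against theoretical quantiles
# (shape 5, rate 1) at the plotting positions given by ppoints().
n <- length(bwt)
theo <- qgamma(ppoints(n), shape = 5, rate = 1)
plot(theo, sort(bwt), xlab = "Gamma(5,1) quantiles", ylab = "bwt")
```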
- Multiple boxplots on one plot
To make box and whisker plots of bwt, one for each education level, on the same plot.
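With the formula interface, boxplot() draws one box per group on a shared axis. A sketch with simulated placeholder data:

```r
# Hypothetical simulated data.
set.seed(3)
educ <- sample(0:7, 120, replace = TRUE)
bwt  <- rnorm(120, 120, 18)

# The formula interface draws one box per value of educ, side by side
# on a common vertical axis.
bp <- boxplot(bwt ~ educ, xlab = "Education level", ylab = "bwt (ounces)")
```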
- Putting curves on plots
To make a
plot of bwt by wt, and place on it a curve that is computed from a
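The kind of curve is not specified above, so the sketch below shows two common possibilities as assumptions: predictions from a fitted quadratic and a lowess smooth. Either is added to an existing scatter plot with lines(). The data are simulated placeholders.

```r
# Hypothetical simulated data.
set.seed(4)
wt  <- round(runif(80, 95, 200))
bwt <- 100 + 0.15 * wt + rnorm(80, 0, 15)

plot(wt, bwt, xlab = "Mother's weight", ylab = "bwt")

# Curve 1: predictions from a fitted quadratic, evaluated on a grid
# of x values and joined with lines().
fit  <- lm(bwt ~ wt + I(wt^2))
grid <- seq(min(wt), max(wt), length.out = 100)
pred <- predict(fit, newdata = data.frame(wt = grid))
lines(grid, pred)

# Curve 2: a lowess smooth, drawn dashed.
lines(lowess(wt, bwt), lty = 2)
```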
- Sliding bin plots
- Contour plots
- Nonparametric bootstrap
Here we will
use the data from the video game survey. The sample size is 91, and the
population size is 314. We first create a bootstrap population of time values.
Then we take a simple random sample of 91 from the bootstrap population.
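The two steps can be sketched as follows. The values of time are simulated placeholders, not the survey data, and recycling the sample with rep(..., length.out = 314) is only one simple way, assumed here, to build a population whose size is not a multiple of 91.

```r
# Hypothetical stand-in for the 91 observed values of time.
set.seed(5)
time <- rexp(91, rate = 1)   # placeholder values only

N <- 314   # population size
n <- 91    # sample size

# Bootstrap population: recycle the sample until there are N units.
# Since 314 = 3 * 91 + 41, the first 41 values appear four times and
# the rest three times.
boot.pop <- rep(time, length.out = N)

# One bootstrap sample: a simple random sample of n units drawn
# without replacement from the bootstrap population.
boot.samp <- sample(boot.pop, size = n, replace = FALSE)

# Repeating the draw gives a bootstrap distribution of the sample mean.
boot.means <- replicate(1000, mean(sample(boot.pop, size = n)))
sd(boot.means)
```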
- Parametric bootstrap
Here we take 100
stratified random samples from a normal distribution, where there are two
strata. One stratum has a mean of 10 and sd of 1, and the second has a
mean of 0 and sd of 1. Each sample will consist of 10 observations from the
first stratum and 15 from the second.
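A sketch of the sampling step with rnorm() and replicate(). The helper name one.sample and the layout of the result (one sample per column) are choices made here for illustration.

```r
set.seed(6)

# One stratified sample: 10 draws from N(10, 1) and 15 draws from N(0, 1).
# one.sample is a hypothetical helper name.
one.sample <- function() {
  c(rnorm(10, mean = 10, sd = 1),
    rnorm(15, mean = 0,  sd = 1))
}

# 100 samples, stored as the columns of a 25 x 100 matrix.
# Rows 1-10 are the first stratum, rows 11-25 the second.
samples <- replicate(100, one.sample())

# Stratum means for each of the 100 samples (a 2 x 100 matrix).
stratum.means <- rbind(colMeans(samples[1:10, ]),
                       colMeans(samples[11:25, ]))
```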