S Hints

For this lab, you may find it useful to read the following sections in the S manual:

3.3.4	Generation of Random Numbers
3.2.3	Subscripts
3.2.4	Functions
6.1	Overview of Functions
6.2.2	Iteration

Other S tips that students have found useful are listed below.

General Hints

S lets you do tricky things with short, simple commands. Here is an example: suppose you have a vector x of a certain length (e.g. x <- 1:10 gives x[1]=1, x[2]=2, ..., x[10]=10). Convince yourself that the following command will give you only the components of x with an odd index:
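One command with this effect, given here as a sketch and not necessarily the one the handout had in mind, builds the odd indices with seq and uses them as a subscript. The name odd is introduced only for illustration.

```r
x <- 1:10                          # x[1]=1, x[2]=2, ..., x[10]=10
odd <- x[seq(1, length(x), 2)]     # subscripts 1, 3, 5, 7, 9
odd                                # 1 3 5 7 9
```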

Workspace Management

All the variables and functions you declare are stored permanently in a UNIX directory called .Data. This directory is not listed when you type ls in UNIX to see the names of all your directories or files, but it can be accessed in the usual way, i.e. by typing cd .Data. Typing ls there displays the names of the files in .Data, namely all the variables and functions you have ever declared, starting from your first S session. So you will never lose any variables or functions when you quit S (unless, of course, you overwrite them).

On the other hand, as we go along with the labs, the variables will consume more and more storage space on the computer; you will create quite big vectors in this lab already. So it is a good idea to go to the .Data directory from time to time and delete variables that you will not need in the future. Do this with the usual UNIX remove command rm (e.g. rm vec removes a variable called vec). Remember that you must be in the .Data directory to do this. Another way is to remove data during the S session with the S function rm(), e.g. type rm(vec) to remove a variable called vec. In any case, make sure you know what you are removing!
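Removal from within an S session can be sketched as follows. The object name vec follows the example in the text; its contents are made up for illustration.

```r
vec <- rnorm(10000)   # a large vector that will not be needed again
ls()                  # lists the stored objects; vec is among them
rm(vec)               # removes vec from within the S session
ls()                  # vec is no longer listed
```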

Q-Q Plots

A Q-Q plot is simply a plot of the quantiles of one dataset against those of another dataset. If the points fall close to a straight line, specifically y=x, this suggests that the two datasets come from the same distribution. So, if you want to know whether your data follow a certain distribution, a plot of the order statistics of your data against the quantiles of the distribution of interest can be informative.

The simplest method for getting the quantiles of your data is by sorting. If you then number the ordered data, as in $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$, you have the order statistics for X.

For example, let

The order statistics of X are
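As a hypothetical illustration (the values below are made up and are not the handout's own example), sorting a small vector in S returns its order statistics. The names X and X.ord are used only for this sketch.

```r
X <- c(3.1, 0.4, 2.7, 1.5, 0.9)   # made-up data
X.ord <- sort(X)                   # order statistics X(1) <= ... <= X(5)
X.ord                              # 0.4 0.9 1.5 2.7 3.1
```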

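These may also be used to represent the quantiles of X for a set of probabilities p. The values of p should be equally spaced on the interval from 0 to 1. A common formula for determining the quantiles of a dataset with n observations takes the i-th order statistic as the quantile for:

$$p_i = \frac{i - 0.5}{n}, \qquad i = 1, \ldots, n \qquad (1)$$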

For example, to see if the data in the vector x comes from an exponential distribution, use the following commands:

p <- seq(.5,length(x)-.5,1)/length(x)
q <- qexp(p)
plot(sort(x),q)

In the commands above, sort orders the vector x, and the sorted values are the order statistics. The command that defines p creates a vector containing the values shown in (1). The qexp command calculates the quantiles of the standard exponential distribution associated with the same values of p. Finally, the plot command plots the order statistics of the data against the quantiles of the standard exponential distribution.

To find out visually whether two samples come from the same distribution, or whether a sample comes from a normal distribution, S has the convenient commands qqplot and qqnorm. Look them up in the manual!
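A minimal sketch of both approaches on simulated data follows. It assumes only the functions named above plus rexp and set.seed; the seed, sample sizes, and rate are arbitrary choices made for illustration.

```r
set.seed(1)                        # arbitrary seed, for reproducibility
x <- rexp(200, rate = 2)           # simulated exponential sample
n <- length(x)
p <- seq(.5, n - .5, 1) / n        # (i - 0.5)/n for i = 1, ..., n
q <- qexp(p)                       # standard exponential quantiles
plot(sort(x), q)                   # manual Q-Q plot against the exponential

qqnorm(x)                          # same sample against normal quantiles
y <- rexp(150, rate = 2)           # second simulated sample
qqplot(x, y)                       # two-sample Q-Q plot
```

Because x is simulated with rate 2 rather than 1, the first plot should lie near a straight line through the origin with a slope of about 2 instead of along y=x. A straight line points to the right family of distributions, and the slope reflects the scale.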