STAT
314
Introduction
To R/Splus
Enter Splus:
elaine16:~> R
(elaine16:~>Splus
or Splus -g & for the graphical user interface)
Quit:
>q()
Get Help
>help(plot)
Work under emacs
If
you know emacs, you can run R/Splus in emacs and save your session in emacs.
elaine16:~> xterm #get into the X window system
or:
elaine16:~> xterm –display 128.12.168.72:0
elaine16:~> emacs #under
X window system
In the activate emacs
window, type “ Esc-x
shift-r”(Esc-x, shift-s), then you
would get into the R-mode(S-mode), which can be recognized by the ‘>’
prompt.
Assign value
>
x_1
>
y<-2:3
>
z_c(x, y, 4)
> x
[1]
1
>
y
[1]
2 3
>
z
[1]
1 2 3 4
> assign("x", c(1,2))
NULL
>
x
[1]
1 2
>
z[c(1,4)]_0
>
z
[1]
0 2 3 0
>
z[ ]_0
>
z
[1]
0 0 0 0
Function seq(), or how to
generate sequences:
>x_seq(from=0,to=2,by=0.7)
>
x
[1]
0.0 0.7 1.4
>
seq(1,3)
[1]
1 2 3
>
seq(from=10, length=5)
[1]
10 11 12 13 14
Numerical functions for
vector
+, -, *, /, %/%(entire
division), %% (modulo), ^, sin(), log(), mean(), min(), max(), sqrt(),
length(), range(), order(), sort(), sum()
>z
[1]
1 2 3 4
>
w_z*2+3
>
w
[1] 5
7 9 11
>
u_w^0.5
>
v_sqrt(w)
>
u
[1]
2.236068 2.645751 3.000000 3.316625
>
v
[1]
2.236068 2.645751 3.000000 3.316625
>
range(u)
[1]
2.236068 3.316625
>
length(u)
[1]
4
Subscripting
of vector:how to access the elements or a subset of the elements of a vector.
>
z_1:4
>
z
[1]
1 2 3 4
>
z[2]
[1]
2
>
x_c(2,4)
> z[x]
[1]
2 4
>
z[5]_7
>
z
[1]
1 2 3 4 7
>
z[8]_8
>
z
[1] 1
2 3 4 7 NA NA 8
>
z[x[1]]
[1]
2
>
z[3:6]
[1] 3
4 7 NA
>
z[c(1,3,2,1)]
[1]
1 3 2 1
>
z[-(2:4)] #as you are about to see, - discards the
elements with the corresponding indices
[1] 1 7
NA NA 8
>
z[z>3]
[1] 4 7
NA NA 8
>
z>3
[1] F
F F T T NA NA T
>
z[z<=3]
[1] 1
2 3 NA NA
Matrix:
>
A_matrix(1:12, nrow=4)
>
A
[,1] [,2] [,3]
[1,] 1
5 9
[2,] 2
6 10
[3,] 3
7 11
[4,] 4
8 12
>A_matrix(1:12, nrow=4, ncol=3, byrow=T) #please note the
difference with the default situation where we don't mention byrow=T
>
A
[,1] [,2] [,3]
[1,] 1
2 3
[2,] 4
5 6
[3,] 7
8 9
[4,] 10
11 12
>
B_matrix(c(1,0), nrow=3, ncol=2, byrow=T)
>
B
[,1] [,2]
[1,] 1
0
[2,] 1
0
[3,] 1
0
Function
rbind()(adjoining
a row), cbind()(adjoining
a column)
>
x1_rbind(c(1,2), c(3,4))
>
x1
[,1] [,2]
[1,] 1
2
[2,] 3
4
>
x2_10+x1
>
x2
[,1] [,2]
[1,] 11 12
[2,] 13 14
>
x3_cbind(x1,x2)
>
x3
[,1] [,2] [,3] [,4]
[1,] 1
2 11 12
[2,] 3
4 13 14
>
x4_rbind(x1,x2)
>
x4
[,1] [,2]
[1,] 1
2
[2,] 3
4
[3,] 11
12
[4,] 13
14
Numerical functions for
Matrix
%*%, (compared to *?), nrow(), ncol(),
t() : Transpose
svd(): svd decomposition
> x_matrix(1:4, nrow=2)
> x
[,1] [,2]
[1,] 1 3
[2,] 2 4
> x*2
[,1] [,2]
[1,] 2 6
[2,] 4 8
> x*x
[,1] [,2]
[1,] 1 9
[2,] 4 16
> x%*%x
[,1] [,2]
[1,] 7 15
[2,] 10 22
List
>
Y_c(10,20,30,41)
>
X_c(1,2,3,4)
> reg.y.on.x_lsfit(X,Y)
>
reg.y.on.x$coef
Intercept
X
-0.5 10.3
>
reg.y.on.x$resid
[1] 0.2 -0.1 -0.4 0.3
>
square_function(x){
+ y_x*x
+ y # or return(y)
+ }
>
square(4)
[1]
16
>
square(c(2,3))
[1]
4 9
Control
Functions:
if( ) { } else { }
for( i in 1:n ) { }
while( ) { }
> Sum_0
> X_1:10
To
calculate the sum of the numbers in vector X, we can use:
> for(i in 1:10)
+ Sum<-Sum+X[i]
> j_0
> while(j<10)
+ { j_j+1
+ Sum_Sum+X[j]
+ }
Actually if we want to get the sum of a
vector, we can simply use sum( ). There are many other simple statistics functions for vectors,
such as mean( ), max( ), min( ), var( ), sd( )…
Now suppose we have a matrix X,
>X_matrix(1:6,
nrow=2)
We
want to know the summation of each row. If we use sum( )
> sum(X)
[1] 21
we
see here, sum( ) just deem X as a vector. So to achieve our goal, we have to
sum each row separately:
> sumrow_numeric(2)
> sumrow
[1] 0 0
> for(i in 1:2)
+
sumrow[i]_sum(X[i,])
> sumrow
[1] 9 12
Another
function apply() may be handy when we need to do such iteration calculation.
apply(X, MARGIN, FUN, ...)
> apply(X, 1, sum)
[1] 9 12
An
example for a simple bootstrap.
1)
Read
in the data.
The data this course may use is available
at:
http://www.stanford.edu/class/stats314/Data/
For example, if we want to use “CD4data”, the data looks like:
t0 t12
[1,]
2.12 2.47
[2,]
4.35 4.61
...
[20,]
3.00 2.40
attr(,
".d"):
[1]
"20 HIV pats got Saq; shows ch4 counts at base and 12wks"
we can read in the data as
follows:
>cd4data_read.table(“http://www.stanford.edu/class/stats314/Data/cd4data”,
nrows=20, skip=1)
> dim(cd4data)
[1] 20 2
> cd4data[1,]
t0 t12
[1,] 2.12 2.47
>
cor(cd4data[,"t0"], cd4data[,"t12"])
[1] 0.7231654
2)
Bootstrap.
To calculate the SE for the
above correlations coefficient, we use the bootstrap method introduced in the
class.
##The
boot function provided by Prof. Efron:
boot<-function(x, B, func, ...)
{
#bootstrap func(x) B times
#x is nxp matrix of data, can be a vector
x <- as.matrix(x)
n <- nrow(x)
ff <- numeric()
i <- sample((1:n), n * B, replace = T)
i <- matrix(i, n)
for(b in 1:B) {
xx <- x[i[, b], ]
ff <- cbind(ff, c(func(xx, ...)))
if(b/10
== floor(b/10))
cat("
", b)
}
if(nrow(ff) == 1)
ff <- as.vector(ff)
else (dimnames(ff) <- list(NULL, NULL))
cat("\007")
ff
}
##We
apply “boot” to this specified problem.
theta_function(X)
{ cor(X[,1], X[,2]) }
thetahat_boot(cd4data,
500, theta)
##
Now we’ve got the 500 bootstrap “thetahat”, and then we can calculate the SE
for the first 25, 50, 100, 250, 500.
calculateSE_function(x,
index)
{
y_x[1:index]
n_index
ave_mean(y)
(sum((y-ave)^2)/(n-1))^0.5
}
track_as.matrix(c(25,
50, 100, 250,500))
result_apply(track,
1, calculateSE, x=thetahat)
> result
[1] 0.07990818
0.10888131 0.09809784 0.09778762 0.09375030
We are done!
>library(bootstrap)
>theta_function(index,
xdata)
+{
+cor(xdata[index,1],
xdata[index,2])
+}
>n_nrow(cd4data)
>thetahat_bootstrap(1:n,
500,theta, xdata=cd4data)
>result_apply(track,
1, calculateSE, x=thetahat$thetastar)
)
This
section is a compilation of useful remarks made in Krause and Olson, the Basics
of S-Plus, Springer. This is a very good book about Splus that focuses on the
language and does not primarily focus on statistical applications.
Functions
from the command line
In
Splus, you have the ability to write functions from the command line. For many
reasons, it is not necessarily the best choice for "long" scripts,
but for simple ones, it is quite handy. In that case, it may be a good idea to
type
options(prompt=";",continue="\t")
This
will allow you to copy and paste more easily
Writing
a function in a file
You
can write your scripts in any file, for instance myscripts.q (.q is the Splus
"natural" extension for such files).
Note
that the name of the file does not matter.
To
load the file muscripts.q in Splus, type
>source("myscripts.q")
You
can now use the functions you wrote in that file.
Some
debugging tools
What
we mention here works for Splus; I am not sure everything is valid in R.
browse() Adding
this command in your function opens an interactive session; you can then check
in what state the objects you are dealing with are (with the call objects()).
To quit, just type c.
Since
this would involve editing the function, it is not very elegant.
trace()
To
do essentially the same thing, you can use from the prompt:
>trace(f,browser)
that
would call browse() on entering f; or
>trace(f,exit=browser)
(enter
browse() on exiting f) or
>trace(f,browser,exit=browser)
that
would combine the two aforementioned functionalities.
debugger()
In case of a crash, if you want to see the complete state of the system,
debugger() is the most informative function.
Start
by typing
>options(error=expression(dump.frames()))
and
run the function again. Then type
>debugger()
to
inspect the state of the system. Once you're done, type
>options(error=expression(dump.calls()))
Splus
and loops
Computations
are done efficiently in Splus when they are vectorized; loops take much more
time. The functions apply,sapply,
lapply, and rapply have been implemented to help you vectorize your
computations. We refer to the online help for more information (the essential
difference is in the type of data these functions take as argument).
In
extreme cases, you can actually use C code in your Splus programs, but we won't
go into the details of that.
Default
arguments
In
a functions you can specify an argument by default by typing
>f<-function(x,y=3){return(x*y)}
If
you type
>f(5)
you'll
get 15; but f(6,4) will return 24.
You
can also make a function have an unspecified number of arguments (useful for
nesting) by declaring
function(x,y,…)
Comments
Splus
allows to simply put comments within your function: # discards the line that
follows it.
Also
Square<-
#This function
returns the square of its argument
function(x){return(x^2)}
is
an example of giving a description of the function that you can retrieve from
>?Square
or
>help("Square")
This
does not work with R.
Importing
data
The
latest version of Splus has a function importData()
that allows you to do it very efficiently and from many file formats. If
the extension is recognized
>excelData <-
importData("myExcelData.xls")
will
do the job.
Splus
also has an exportData function.
In
"bad situations", you'll have to do the thing manually…
read.table() is
helpful too, and should suffice to your needs in this class.
Printing,
outputing
Last
we mention the functions record(), that
allows you to save your commands and
their outputs to a file. The syntax
is:
>record("MySplusSession.txt")
To
generate Postscript graphs, use something like
>postscript(file="myepsfile.eps")
# your session here
>dev.off()
After
you type dev.off(), all your graphs are printed to myepsfile.eps
Under
Windows, it would be
>graphsheet(file="myepsfile.eps",format="EPS")
#your session here
>dev.off()