STAT 314

Introduction To R/Splus

 

Getting Started

 

Enter Splus:

elaine16:~> R

(elaine16:~>Splus or Splus -g & for the graphical user interface)

 

Quit:

>q()

 

Get Help

     >help(plot)

 

Work under emacs

     If you know emacs, you can run R/Splus in emacs and save your session in emacs.

elaine16:~> xterm  #get into the X window system

or:

elaine16:~> xterm –display 128.12.168.72:0

elaine16:~> emacs  #under X window system

 

In the activate emacs window, type Esc-x shift-r”(Esc-x, shift-s), then you would get into the R-mode(S-mode), which can be recognized by the ‘>’ prompt.

 

 

Vector & Matrix & List

 

Assign value

 

> x_1

> y<-2:3

> z_c(x, y, 4)

> x

[1] 1

> y

[1] 2 3

> z

[1] 1 2 3 4

> assign("x", c(1,2))

NULL

> x

[1] 1 2

> z[c(1,4)]_0

> z

[1] 0 2 3 0

> z[ ]_0

> z

[1] 0 0 0 0

 

Function seq(), or how to generate sequences:

 

>x_seq(from=0,to=2,by=0.7)

> x

[1] 0.0 0.7 1.4

> seq(1,3)

[1] 1 2 3

> seq(from=10, length=5)

[1] 10 11 12 13 14

 

Numerical functions for vector

+, -, *, /, %/%(entire division), %% (modulo), ^, sin(), log(), mean(), min(), max(), sqrt(), length(), range(), order(), sort(), sum()

 

>z

[1] 1 2 3 4

> w_z*2+3

> w

[1]  5  7  9 11

> u_w^0.5

> v_sqrt(w)

> u

[1] 2.236068 2.645751 3.000000 3.316625

> v

[1] 2.236068 2.645751 3.000000 3.316625

> range(u)

[1] 2.236068 3.316625

> length(u)

[1] 4

 

Subscripting of vector:how to access the elements or a subset of the elements of a vector.

 

> z_1:4

> z

[1] 1 2 3 4

> z[2]

[1] 2

> x_c(2,4)

> z[x]

[1] 2 4

> z[5]_7

> z

[1] 1 2 3 4 7

> z[8]_8

> z

[1]  1  2  3  4  7 NA NA  8

> z[x[1]]

[1] 2

> z[3:6]

[1]  3  4  7 NA

> z[c(1,3,2,1)] 

[1] 1 3 2 1

> z[-(2:4)] #as you are about to see, - discards the elements with the corresponding indices

[1]  1  7 NA NA  8

> z[z>3]

[1]  4  7 NA NA  8

> z>3

[1]  F  F  F  T  T NA NA  T

> z[z<=3]

[1]  1  2  3 NA NA

 

Matrix:

 

> A_matrix(1:12, nrow=4)

> A

     [,1] [,2] [,3]

[1,]    1    5    9

[2,]    2    6   10

[3,]    3    7   11

[4,]    4    8   12

>A_matrix(1:12, nrow=4, ncol=3, byrow=T) #please note the difference with the default situation where we don't mention byrow=T

> A

     [,1] [,2] [,3]

[1,]    1    2    3

[2,]    4    5    6

[3,]    7    8    9

[4,]   10   11   12

> B_matrix(c(1,0), nrow=3, ncol=2, byrow=T)

> B

     [,1] [,2]

[1,]    1    0

[2,]    1    0

[3,]    1    0

 

Function rbind()(adjoining a row), cbind()(adjoining a column)

 

> x1_rbind(c(1,2), c(3,4))

> x1

     [,1] [,2]

[1,]    1    2

[2,]    3    4

> x2_10+x1

> x2

     [,1] [,2]

[1,]   11   12

[2,]   13   14

 

> x3_cbind(x1,x2)

> x3

     [,1] [,2] [,3] [,4]

[1,]    1    2   11   12

[2,]    3    4   13   14

> x4_rbind(x1,x2)

> x4

     [,1] [,2]

[1,]    1    2

[2,]    3    4

[3,]   11   12

[4,]   13   14

 

 

Numerical functions for Matrix

%*%, (compared to *?),  nrow(), ncol(),

t() : Transpose

svd(): svd decomposition

 

> x_matrix(1:4, nrow=2)

> x

     [,1] [,2]

[1,]    1    3

[2,]    2    4

> x*2

     [,1] [,2]

[1,]    2    6

[2,]    4    8

> x*x

     [,1] [,2]

[1,]    1    9

[2,]    4   16

> x%*%x

     [,1] [,2]

[1,]    7   15

[2,]   10   22

 

List

 

> Y_c(10,20,30,41)

> X_c(1,2,3,4)

> reg.y.on.x_lsfit(X,Y)

> reg.y.on.x$coef

 Intercept    X

      -0.5 10.3

> reg.y.on.x$resid

[1]  0.2 -0.1 -0.4  0.3

 

Make your own functions

 

> square_function(x){

+ y_x*x

+ y # or return(y)

+ }

> square(4)

[1] 16

> square(c(2,3))

[1] 4 9

 

Control Functions:

if( ) { } else { }

for( i  in 1:n ) { }

while( ) { }

 

> Sum_0

> X_1:10

 

To calculate the sum of the numbers in vector X, we can use:

 

> for(i in 1:10)

+ Sum<-Sum+X[i]

 

Which would be equal to

 

> j_0

> while(j<10)

+ { j_j+1

+   Sum_Sum+X[j]

+ }

 

    Actually if we want to get the sum of a vector, we can simply use sum( ).  There are many other simple statistics functions for vectors, such as mean( ), max( ), min( ), var( ), sd( ) 

    Now suppose we have a matrix X,

 

>X_matrix(1:6, nrow=2)

 

We want to know the summation of each row. If we use sum( )

 

    > sum(X)

    [1] 21

 

we see here, sum( ) just deem X as a vector. So to achieve our goal, we have to sum each row separately:

 

        > sumrow_numeric(2)

> sumrow

[1] 0 0

> for(i in 1:2)

+ sumrow[i]_sum(X[i,])

> sumrow

[1]  9 12

 

Another function apply() may be handy when we need to do such iteration calculation.

      apply(X, MARGIN, FUN, ...)

 

     > apply(X, 1, sum)

     [1]  9 12

 

An example for a simple bootstrap.

1)    Read in the data.

  The data this course may use is available at:

http://www.stanford.edu/class/stats314/Data/

   For example, if we want to use “CD4data”, the data looks like:

 

t0  t12

[1,] 2.12 2.47

[2,] 4.35 4.61

...

[20,] 3.00 2.40

attr(, ".d"):

[1] "20 HIV pats got Saq; shows ch4 counts at base and 12wks"

 

we can read in the data as follows:

 

>cd4data_read.table(“http://www.stanford.edu/class/stats314/Data/cd4data”, nrows=20, skip=1)

> dim(cd4data)

[1] 20  2

                > cd4data[1,]

               t0  t12

           [1,] 2.12 2.47

> cor(cd4data[,"t0"], cd4data[,"t12"])

[1] 0.7231654

 

2)    Bootstrap. 

To calculate the SE for the above correlations coefficient, we use the bootstrap method introduced in the class.

 

##The boot function provided by Prof. Efron:

boot<-function(x, B, func, ...)

{

#bootstrap func(x) B times

#x is nxp matrix of data, can be a vector

x <- as.matrix(x)

n <- nrow(x)

ff <- numeric()

i <- sample((1:n), n * B, replace = T)

i <- matrix(i, n)

for(b in 1:B) {

   xx <- x[i[, b],  ]

   ff <- cbind(ff, c(func(xx, ...)))

if(b/10 == floor(b/10))

   cat(" ", b)

}

if(nrow(ff) == 1)

   ff <- as.vector(ff)

else (dimnames(ff) <- list(NULL, NULL))

cat("\007")

ff

}

 

##We apply “boot” to this specified problem.

theta_function(X)

{ cor(X[,1], X[,2]) }

thetahat_boot(cd4data, 500, theta)

 

## Now we’ve got the 500 bootstrap “thetahat”, and then we can calculate the SE for the first 25, 50, 100, 250, 500.

calculateSE_function(x, index)

{

y_x[1:index]

n_index

ave_mean(y)

(sum((y-ave)^2)/(n-1))^0.5

}

track_as.matrix(c(25, 50, 100, 250,500))

result_apply(track, 1, calculateSE, x=thetahat)

> result

[1] 0.07990818 0.10888131 0.09809784 0.09778762 0.09375030

 

We are done!

 

(We can also use the function offered in package “bootstrap”:

>library(bootstrap)

>theta_function(index, xdata)

+{

+cor(xdata[index,1], xdata[index,2])

+}

>n_nrow(cd4data)

>thetahat_bootstrap(1:n, 500,theta, xdata=cd4data)

>result_apply(track, 1, calculateSE, x=thetahat$thetastar)

)

 

 

Tips and tricks

 

This section is a compilation of useful remarks made in Krause and Olson, the Basics of S-Plus, Springer. This is a very good book about Splus that focuses on the language and does not primarily focus on statistical applications.

 

Functions from the command line

In Splus, you have the ability to write functions from the command line. For many reasons, it is not necessarily the best choice for "long" scripts, but for simple ones, it is quite handy. In that case, it may be a good idea to type

 

options(prompt=";",continue="\t")

 

This will allow you to copy and paste more easily

 

Writing a function in a file

You can write your scripts in any file, for instance myscripts.q (.q is the Splus "natural" extension for such files).

Note that the name of the file does not matter.

To load the file muscripts.q in Splus, type

 

>source("myscripts.q")

 

You can now use the functions you wrote in that file.

 

Some debugging tools

What we mention here works for Splus; I am not sure everything is valid in R.

browse() Adding this command in your function opens an interactive session; you can then check in what state the objects you are dealing with are (with the call objects()). To quit, just type c.

Since this would involve editing the function, it is not very elegant.

 

trace()

To do essentially the same thing, you can use from the prompt:

 

>trace(f,browser)

 

that would call browse() on entering f; or

 

>trace(f,exit=browser)

 

(enter browse() on exiting f) or

 

>trace(f,browser,exit=browser)

 

that would combine the two aforementioned functionalities.

 

debugger() In case of a crash, if you want to see the complete state of the system, debugger() is the most informative function.

Start by typing

>options(error=expression(dump.frames()))

and run the function again. Then type

>debugger()

to inspect the state of the system. Once you're done, type

 

>options(error=expression(dump.calls()))

 

Splus and loops

Computations are done efficiently in Splus when they are vectorized; loops take much more time. The functions apply,sapply, lapply, and rapply have been implemented to help you vectorize your computations. We refer to the online help for more information (the essential difference is in the type of data these functions take as argument).

In extreme cases, you can actually use C code in your Splus programs, but we won't go into the details of that.

 

Default arguments

In a functions you can specify an argument by default by typing

 

>f<-function(x,y=3){return(x*y)}

 

If you type

 

>f(5)

 

you'll get 15; but f(6,4) will return 24.

You can also make a function have an unspecified number of arguments (useful for nesting) by declaring

 

function(x,y,…)

 

Comments

Splus allows to simply put comments within your function: # discards the line that follows it.

Also

 

Square<-

#This function returns the square of its argument

function(x){return(x^2)}

 

is an example of giving a description of the function that you can retrieve from

 

>?Square

or

>help("Square")

 

This does not work with R.

 

Importing data

The latest version of Splus has a function importData() that allows you to do it very efficiently and from many file formats. If the extension is recognized

 

>excelData <- importData("myExcelData.xls")

 

will do the job.

Splus also has an exportData function.

In "bad situations", you'll have to do the thing manually…

read.table() is helpful too, and should suffice to your needs in this class.

 

Printing, outputing

Last we mention the functions record(), that allows you to save your commands and their outputs to a file. The syntax is:

 

>record("MySplusSession.txt")

 

To generate Postscript graphs, use something like

 

>postscript(file="myepsfile.eps")

# your session here

>dev.off()

 

After you type dev.off(), all your graphs are printed to myepsfile.eps

Under Windows, it would be

 

>graphsheet(file="myepsfile.eps",format="EPS")

#your session here

>dev.off()