
R Lab 1

This lab is designed so that you can work through it on your own. If you come to the scheduled lab, we will get everyone started and then answer questions as they arise. If you are unfamiliar with R, you should try the commands yourself and not just read the text. The exercises generally point out features of R that might be useful, or common mistakes; almost every exercise introduces a little something new. I would recommend you at least read through them. The beginning exercises will be needed for later examples. If you are already familiar with R, there are still some handy tips sprinkled throughout.

Also, I have made a summary of useful commands by Category here. These do not describe the commands, but just give lists. Use the help function as described below to find out what they do.

Nuts & Bolts of Running R

To Download R to Personal Computer
- Go to http://cran.r-project.org
- Under "Precompiled Binary Distributions" pick the appropriate system for your machine, and find the correct .exe file under the folders (for Windows, it's under the folder "base" and is called "SetupR.exe")
On Leland system (Basic)
- Open a Leland Unix prompt (like Samson for Windows or Xterm if you are at a Unix station)
  NOTE: if you use "tree" you will not be able to run all of the help, because the help.start( ) requires html
- [Optional] Create a directory for this class called stat with the "mkdir" command (cd is for change directory)
  
  cardinal: > cd private
  cardinal: /private> mkdir stat
  cardinal: /private> cd stat
- Set the environment display to your computer (so the plots will show up on your screen) by typing at the prompt:
  
  > setenv DISPLAY MachineNameorIPaddress:0.0
- Type "R" at the prompt. A message should come up letting you know you are in R.
Xservers - Plotting with Leland
If you are not actually at a Unix workstation (i.e. if you are using something like Samson), then to show the plots you need a program to simulate the Unix workstation (this is what's on the course website about X servers). Most Stanford University machines should have this. To check that it is running, look for an X icon from the program in the lower right corner, where the date is.
You can get X programs for your personal computer, and they have various trial periods. These are two I know about:
MI/X
Xwin32

Getting Started in R

R is a command-line language, which means you type in the commands at the prompt ">" and the output comes after you hit return. In other words, there are no drop-down menus or mouse commands. Even if you use the "Windows" version, it still works basically this way. This gives you a lot of control, but can make it a little intimidating at first. If you are doing a lot of complicated things, it's often a good idea to have another window open (like Notebook) for text editing so that you can save your commands.

We are going to use preloaded data in this lab. To access it type the following command:

> library(MASS)

We are going to bring a particular set of data from the library into your R session: UScereal, a dataset about American cereals. So type at the prompt

> data(UScereal)

We also are going to want to save files into a folder for future reference. When I refer to mydir you should type in an appropriate directory. An example might be

"C:/Documents and Settings/lelandID/My Documents"

Note that Windows uses "\" to separate subdirectories, unlike Unix, which uses "/". To give a full path name in R using Windows, you can use the Unix "/" or use "\\". You cannot just use the Windows "\".
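As a small sketch of the path rules above, the following writes the same path in both accepted forms. The directory shown is a made-up placeholder (an assumption for illustration only), and file.path() is a base R helper that the text above does not mention.

```r
# "C:/Users/me/stat" is a hypothetical placeholder directory, not a real location.
p_forward <- "C:/Users/me/stat/Numb.Rdata"        # Unix-style separators also work on Windows
p_escaped <- "C:\\Users\\me\\stat\\Numb.Rdata"    # each "\\" in the code is one backslash

# file.path() joins the pieces with "/" so you never type a separator yourself.
p_built <- file.path("C:/Users/me/stat", "Numb.Rdata")

identical(p_forward, p_built)   # TRUE
nchar("\\")                     # 1: the doubled backslash is a single character
```

Building paths with file.path() is one way to avoid the single-backslash mistake altogether.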

In the computer lab, we will give you instructions for what to insert in place of mydir. For more information about finding your working directory, etc., see http://www.stanford.edu/~epurdom/Saving/Saving.html

Creating R Objects
To save output to some variable name, use "<-" (in very old code you will sometimes see "_" instead, which current versions of R no longer accept)
- Simple Examples
  
  > x <- 3
  > y <- 5
  > x + y
  [1] 8
  
  You can create descriptive names, as you like, though remember you may have to type them many times, so try to think of short, descriptive names.
- Vectors
  You will often be working with vectors, particularly if you are fine-tuning options. The function c( ) creates a vector
  
  > Numb<-c(2,4,-1.4)
  
  If you want to look and see the value of what you have created, you simply type its name at the prompt.
  
  > Numb
  [1]  2.0  4.0 -1.4
  
  Exercise 1 Make a new vector, z = (4,-3,2). What happens with the following?
  
  > z + Numb
  > z - Numb
  > z * Numb
  > Numb - x
  
  You can make some kinds of vectors quickly using seq and rep
  
  > seq(0,3,length=10)
   [1] 0.0000000 0.3333333 0.6666667 1.0000000 1.3333333 1.6666667
   [7] 2.0000000 2.3333333 2.6666667 3.0000000
  > rep(1,3)
  [1] 1 1 1
  
  You can also give names to your vector to keep track of what they correspond to
  
  > names(Numb)<-c("Pat.1","Pat.2","Pat.3")
  > Numb
  Pat.1 Pat.2 Pat.3
    2.0   4.0  -1.4
  
  Exercise 2 Try the following commands to see the further abilities of seq and rep
  
  > 1:10
  > seq(0,10,by=2)
  > rep(c(1,2,3), each=3)
  > rep(c(1,2,3), times=3)
- Matrices
  You can make matrices from scratch using matrix() or from previously made vectors using rbind(),cbind()
  
  > mat<-matrix(c(1,2,4,2,1,3),nrow=3,ncol=2,byrow=T)
  > mat2<-cbind(z,Numb)
  > colnames(mat2)<-c("X1","L3")
  > rownames(mat2)<-c("A","B","C")
  > mat2
    X1   L3
  A  4  2.0
  B -3  4.0
  C  2 -1.4
  
  Exercise 3
  1. What does byrow=F do?
  2. What matrix operations do the following correspond to?
  
  > mat2*mat
  > t(mat)
  > mat%*%t(mat2)
  > mat^2
- Factors
  This type of object allows R to deal with categorical variables. The different possible categories are called levels.
  
  > fac<-factor(c(2,1,2,2,4,-1),labels=c("A","B","C","D"))
  > fac
  [1] C B C C D A
  Levels: A B C D
  > levels(fac)
  [1] "A" "B" "C" "D"
- Dataframes
  Dataframes generally act like matrices, but allow different columns to be vectors of different types, like character, factor, numeric, etc.
  
  > mydf<-data.frame(x2=fac,y=c(z,Numb))
  > mydf
    x2    y
  1  C  4.0
  2  B -3.0
  3  C  2.0
  4  C  2.0
  5  D  4.0
  6  A -1.4
  
  Exercise 4 What if I used cbind instead of data.frame in the above example?
- Lists
  Lists allow you to put together any kind of data to keep track of it.
  
  > mylist<-list(mymat=mat,vec=z,afac=fac)
  > mylist
  $mymat
       [,1] [,2]
  [1,]    1    2
  [2,]    4    2
  [3,]    1    3

  $vec
  [1]  4 -3  2

  $afac
  [1] C B C C D A
  Levels: A B C D
- What do I have??
  You can use the functions of the format is.xx to figure out what type of object you have.
  
  > is.vector(z)
  [1] TRUE
  > is.vector(mat)
  [1] FALSE
  
  You can also try to convert objects to different types using functions as.xx.
  
  > as.vector(mat)
  [1] 1 4 1 2 2 3
  > as.character(fac)
  [1] "C" "B" "C" "C" "D" "A"
  
  Exercise 5 What is the difference between the following?
  
  > as.matrix(mydf)
  > data.matrix(mydf)
Functions Used:

c seq rep names

matrix rbind cbind rownames

colnames factor levels data.frame

as.xx is.xx
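The object types from this section can be inspected side by side. The following is only a sketch: it re-creates the small objects used above so that it runs on its own, and it uses as.integer(), sapply(), class() and the [[ ]] operator, which are base R but were not introduced in the text.

```r
# Toy objects, re-created here so the block runs on its own.
z    <- c(4, -3, 2)
Numb <- c(2, 4, -1.4)
fac  <- factor(c(2, 1, 2, 2, 4, -1), labels = c("A", "B", "C", "D"))

# A factor stores integer codes plus a table of levels.
as.integer(fac)   # codes: 3 2 3 3 4 1
levels(fac)       # "A" "B" "C" "D"

# A data frame keeps each column's own type ...
mydf <- data.frame(x2 = fac, y = c(z, Numb))
sapply(mydf, class)          # x2 is "factor", y is "numeric"

# ... and a list can hold objects of different types and sizes.
mylist <- list(vec = z, afac = fac)
mylist$vec                   # extract an element by name
mylist[["afac"]]             # same idea with double brackets
```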
Manipulating Objects you have Made
- To see a list of the variables you have created and have available, type
  
  > ls()
  
  NOTE: If you later reassign something else to that same name, you lose the old information, and there is no warning before you do it, so use ls() to see if you have already used the name. You also should not give a name to your variable that is already a name of an R function because there will be unexpected consequences.
- Quitting
  
  To quit from R, type
  
  > q()
  Save workspace image? [y/n/c]:
  
  Type y, and then you should be out of R. If you saved your session, all of these saved variables will still be there to work with when you come back, as long as you start R in the same directory (in Windows/Mac, this will be the default directory that you get by opening R).
  
  Exercise 6 Try it! Quit from R and go back in and see that your variables are still there. What happens if you hit the up arrow at the prompt?
- Saving Objects
  This part is useful but not necessary, so feel free to skip this as needed.
  If you want to save one particular object that you made, say to transfer to another computer or to back it up, you can use the command save. If you give your file the extension ".rdata", Windows recognizes it as an R data file and will launch R automatically when you open it.
  
  > save(Numb,file="mydir/Numb.Rdata")
  
  If you exit R and look at the files in your directory, you will see Numb.Rdata. You can move this file around to different directories - this is a way of saving your information. dump() is another way, but not as nice. If you move it to a different directory you can get it back within R by using load()
  
  > load(file="mydir/Numb.Rdata")
  
  Of course, since you saved when you exited, you don't need to load the information back in. You can actually access all of your data that you saved when you exited by loading the .RData file.
  
  Exercise 7 Make two vectors, temp and temp2. What happens if you type
  
  > save(list=c("temp","temp2"),file="Test.Rdata")
- Deleting Objects
  You can also remove objects using the rm() command, but it's permanent (there is no question asking if you really meant it!):
  
  > rm(x)
  
  Exercise 8 Try these commands and see what happens:
  
  > rm(list=c("x","z"))
  > rm(list=ls(pattern="temp"))
Functions Used:

ls q save load rm
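The save / rm / load cycle described in this section can be tried safely with a temporary file. This is a minimal sketch: tempdir(), file.path() and exists() are base R functions not covered above, and the temporary location stands in for mydir.

```r
# Save to a temporary directory so the example does not touch your own files.
f <- file.path(tempdir(), "Numb.Rdata")

Numb <- c(2, 4, -1.4)
save(Numb, file = f)                 # write the object to disk

rm(Numb)                             # remove it from the workspace (no confirmation asked)
exists("Numb", inherits = FALSE)     # FALSE: the object is gone

restored <- load(f)                  # load() returns the names of the objects it restored
restored                             # "Numb"
Numb                                 # 2.0 4.0 -1.4 is back
```

Note that load() silently overwrites any existing object with the same name, which is the same hazard as reassigning a name by hand.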
Data Indexing (using the dataset "UScereal")
- Basic Indexing
  You can look at all of your data as said before by just typing the name at the prompt
  
  > UScereal
                            mfr  calories    protein       fat    sodium
  100% Bran                   N 212.12121 12.1212121 3.0303030 393.93939
  All-Bran                    K 212.12121 12.1212121 3.0303030 787.87879
  All-Bran with Extra Fiber   K 100.00000  8.0000000 0.0000000 280.00000
  Apple Cinnamon Cheerios     G 146.66667  2.6666667 2.6666667 240.00000
  Apple Jacks                 K 110.00000  2.0000000 0.0000000 125.00000
  Basic 4                     G 173.33333  4.0000000 2.6666667 280.00000
  Bran Chex                   R 134.32836  2.9850746 1.4925373 298.50746
  Bran Flakes                 P 134.32836  4.4776119 0.0000000 313.43284
  Cap'n'Crunch                Q 160.00000  1.3333333 2.6666667 293.33333
  ... etc.
  
  But with a large data set, it will be too big to be displayed like this. Instead you want to look at a portion through indexing.
  Datasets are thought of like matrices, so you can pick off pieces of the dataset by specifying the row or column entry of part of the data by typing data[row,columns]. So UScereal[3,7] would list the entry in the 3rd row and 7th column. UScereal[ ,4] gives the entire 4th column, and so on. The following are examples of pulling out parts of the data.
  
  Exercise 9 Try the examples below and understand what they do
  
  1. > UScereal[2,2:5]
  2. > UScereal[c(1,5,6) , ]
  3. > UScereal$mfr
  4. > UScereal$calories[1:5]
  5. > UScereal[c("All-Bran","Bran Chex"),]
  
  You can save a portion of your data, say to experiment on or to reduce the number of variables, by assigning it to a variable name
  
  > uscer<-UScereal[1:10,1:3 ]
  > uscer
                            mfr calories   protein
  100% Bran                   N 212.1212 12.121212
  All-Bran                    K 212.1212 12.121212
  All-Bran with Extra Fiber   K 100.0000  8.000000
  Apple Cinnamon Cheerios     G 146.6667  2.666667
  Apple Jacks                 K 110.0000  2.000000
  Basic 4                     G 173.3333  4.000000
  Bran Chex                   R 134.3284  2.985075
  Bran Flakes                 P 134.3284  4.477612
  Cap'n'Crunch                Q 160.0000  1.333333
  Cheerios                    G  88.0000  4.800000
  
  As you may have noticed in the exercise, if the columns of your data set have names (like "mfr"), instead of typing UScereal[ ,1], you can type UScereal$mfr or UScereal[,"mfr"].
  
  Exercise 10
  1. What does names(UScereal) tell you? What about colnames(UScereal)?
  2. Make another copy of UScereal to experiment on:
  
  > uscer.temp<-UScereal
  
  What happens to uscer.temp when you do
  
  > names(uscer.temp)<-c("Manf.","Cal.","Prot.","Fat","Na","Fiber", "Carbs", "Sugars","ShelfPos.","K","Vit.")
  
  OR
  
  > names(uscer.temp)[1]<-"Manufacturer"
  
  3. How do you find the ratio of fat to protein for each cereal? (i.e. fat/protein for each entry)
  4. What do the functions dim and length tell you about the different objects?
  5. What is the difference in the following? (hint: try is.vector and as.vector)
  
  > mat[1,]
  > mat[1,,drop=F]
  
  Here is a link to a summary of indexing in R. Note that list elements are indexed by xx$name if they have names. Lists are the common form of output from functions such as lm (the regression function), which use them to return many different kinds of output to the user.
- Attaching a dataset
  If you are going to be frequently using a dataset with many variables/columns, instead of constantly typing UScereal$protein, and so forth, you can "attach" the data set. This means the names at the top of the columns become variable names that you can use directly (but they won't show up when you type ls())
  
  > mfr
  > attach(UScereal)
  > mfr
  
  However, if you exit and come back to R later, it will no longer remember that your data is attached. So if, for example, you come back later to R and use an old command that used mfr, you will get an error. You would have to reattach the data for your old commands from this lab to work. From this point on, we will assume UScereal is "attached".
- Logical indexing
  You can evaluate the truth of statements element-wise in R using traditional logical commands
  
  > Numb==2
  Pat.1 Pat.2 Pat.3
   TRUE FALSE FALSE
  > fac=="C"
  [1]  TRUE FALSE  TRUE  TRUE FALSE FALSE
  
  You can use these T/F values to index a vector or matrix
  
  > Numb[Numb>1]
  Pat.1 Pat.2
      2     4
  > UScereal[mfr=="G" | mfr=="K", 1:3]
                            mfr  calories   protein
  All-Bran                    K 212.12121 12.121212
  All-Bran with Extra Fiber   K 100.00000  8.000000
  Apple Cinnamon Cheerios     G 146.66667  2.666667
  Apple Jacks                 K 110.00000  2.000000
  Basic 4                     G 173.33333  4.000000
  ....
  
  You can also find the indices directly using which
  
  > which(shelf==2 & sugars>5)
   [1]  5  9 11 13 16 17 22 24 27 29 33 38 39 42 49 56 62
  
  Exercise 11
  1. How many cereals with manufacturer "G" have more than 50 calories?
  2. What does the operator %in% do in the following commands?
  
  > c("G","F")%in%mfr
  > mfr%in%c("N","K")
- Factor Indexing
  Another related indexing is by factor variables, so that you can have the following
  
  > vit.colors<-c("red","green","purple")
  > vit.colors[vitamins]
   [1] "green"  "green"  "green"  "green"  "green"  "green"  "green"
   [8] "green"  "green"  "green"  "green"  "green"  "green"  "green"
  [15] "green"  "green"  "green"  "green"  "green"  "green"  "green"
  [22] "green"  "green"  "green"  "green"  "green"  "green"  "green"
  [29] "green"  "green"  "green"  "green"  "green"  "green"  "green"
  [36] "red"    "green"  "green"  "green"  "green"  "green"  "green"
  [43] "green"  "green"  "green"  "red"    "purple" "green"  "green"
  [50] "green"  "green"  "green"  "green"  "purple" "purple" "green"
  [57] "green"  "red"    "red"    "red"    "green"  "green"  "green"
  [64] "green"  "green"  ...
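The indexing styles in this section can be compared on a tiny example. This is a sketch only: the data frame toy and its values are made up for illustration (they are not taken from UScereal), and with() is a base R function, not covered above, that can stand in for attach().

```r
# A small made-up data frame (values are illustrative, not from UScereal).
toy <- data.frame(
  mfr    = factor(c("G", "K", "G", "N")),
  sugars = c(12, 3, 8, 0),
  row.names = c("c1", "c2", "c3", "c4")
)

# Logical indexing: an element-wise TRUE/FALSE vector picks the rows.
keep <- toy$mfr == "G" & toy$sugars > 5
toy[keep, ]                               # rows c1 and c3
which(keep)                               # positions 1 and 3

# Factor indexing uses the integer codes (levels G, K, N -> 1, 2, 3).
mfr.colors <- c("red", "green", "purple")
mfr.colors[toy$mfr]                       # "red" "green" "red" "purple"

# with() evaluates an expression inside the data frame without attach().
with(toy, mean(sugars[mfr == "G"]))       # 10
```

Because with() does not leave anything attached, commands written this way keep working after you quit and restart R.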
Functions
- How functions work
  Many things that you will be doing in R involve calling an already created function. Simple examples are mean( ), scan( ), plot( ), sd( ), median( ), sort( ). The parentheses ( ) hold the information (usually data) that you need to feed to the function.
  NOTE: sd( ) is the standard deviation, dividing by n-1
  Example: finding the mean/average of the column "protein" in the dataset UScereal
  
  > mean(UScereal$protein)
  [1] 3.683705
  
  Exercise 12
  1. Now find the mean, standard deviation, and median of the data column "sugars".
  2. What happens if you type the following
  
  > mean(UScereal)
  > summary(UScereal)
  
  3. What are the five smallest values of potassium? (You should not have to search for them manually)
- Learning about functions
  Functions may have many different options you can set when you call the function. You can find out about a function by typing help(FunctionName).
  
  > help(boxplot)
  > help.search("linear model")
  > help.start()
  
  If you just want to remember what the possible options are you can use args:
  
  > args(sort)
  function (x, partial = NULL, na.last = NA, decreasing = FALSE,
      method = c("shell", "quick"), index.return = FALSE)
  NULL
  
  You should save the outputs from your functions as a new variable so you can access them again
  
  > pmean<-mean(UScereal$potassium)
  > pmean+8
  [1] 167.1197
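To illustrate how options change what a function does, here is a short sketch with a made-up vector v. The options decreasing and na.rm are real arguments of sort() and mean(); the data and the name v.mean are assumptions for illustration.

```r
v <- c(3, 1, NA, 2)          # made-up data with one missing value

# Options are usually given by name; anything you leave out keeps its default.
sort(v)                      # 1 2 3  (the NA is dropped by default)
sort(v, decreasing = TRUE)   # 3 2 1
mean(v)                      # NA, because one value is missing
mean(v, na.rm = TRUE)        # 2

# Store a result to reuse it later instead of recomputing it.
v.mean <- mean(v, na.rm = TRUE)
v - v.mean                   # 1 -1 NA 0
```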
Useful summary functions you should know (try them out on UScereal if you're not sure):

mean sd median summary cor

range max min which.max which.min

length dim sort unique rowMeans

colMeans rowSums colSums cumsum

prod round zapsmall
Bringing in data
- Reading tab/comma/character delimited text
  The function read.table reads in text where each row of the data is on a separate line and the columns of the data are separated by a fixed character. The default separator is ANY white space. Generally files will be tab delimited ("\t") or comma delimited (","), and you can specify this explicitly with the sep option. You must give a file name or a URL
  
  > read.table("http://www.stanford.edu/~epurdom/state.txt", sep="\t",header=T,row.names=1)
  
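  The resulting object is a data.frame. In the version of R used for this lab, non-numeric values are made into factor variables; since R 4.0.0 they are kept as character strings unless you set stringsAsFactors=TRUE. If header=T, the first row is taken to contain the names of the columns, not data.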
- Exporting data
  You can write data to files using write.table
  
  > write.table(UScereal,"mydir/cereal.txt",sep="\t")
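Writing a file and reading it back is a quick way to check that the separator and header settings match. The sketch below uses a made-up two-row data frame and a temporary file in place of mydir; quote=FALSE is a write.table option not used in the text above.

```r
# Made-up data, written to a temporary file so nothing in your directories changes.
toy <- data.frame(calories = c(110, 212.5), mfr = c("K", "N"),
                  row.names = c("Apple Jacks", "All-Bran"))
f <- file.path(tempdir(), "toy.txt")

# Tab-separated output; the row names are written as the first column.
write.table(toy, f, sep = "\t", quote = FALSE)

# header=TRUE reads the first line as column names;
# row.names=1 uses the first column as the row names.
back <- read.table(f, sep = "\t", header = TRUE, row.names = 1)
back
```

Using a tab as the separator matters here because the row names contain spaces, which the default white-space splitting would break apart.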

Plots

If you are working on the Leland prompt, make sure you have already set the environment display; otherwise nothing will happen when you try to plot.

Basic Plots
- plot(xdata,ydata) - the standard x-axis vs. y-axis plot
  
  > plot(potassium, protein)
- hist(data)
- boxplot(data)
  
  Exercise 13
  
  1. Try each of the plots with the UScereal data.
  2. What is happening with the following commands and why?
  
  > plot(mfr,potassium)
  > plot(as.numeric(mfr),potassium)
Adding to Plots
The following functions add to an existing plot:

lines points curve rect

segments abline

> hist(sodium)
> abline(v=mean(sodium),lty=3)

You must have already set up a plotting command with a function such as plot or hist to use these commands. You can set up the coordinates/axes/range without actually plotting anything using the option type="n"

> plot(sodium,potassium,type="n")
> points(sodium[mfr=="G"],potassium[mfr=="G"],col="red")
> abline(h=c(max(potassium)-1,min(potassium)+1),lty=c(2,3))

Exercise 14 What would have happened above if instead you had not done the plot command with type="n", and instead done

> plot(sodium[mfr=="G"],potassium[mfr=="G"],col="red")
> abline(h=c(max(potassium)-1,min(potassium)+1),lty=c(2,3))
Saving Plots
- Using the GUI
  When you are looking at a graph, you can save your existing plot by going to "File-Save As" (you can also use the command savePlot to save the existing plot). In this same way you can also copy the plot to a metafile/bitmap and paste the graph into another program, like Word. For larger projects, though, it's generally better to save plots using the written commands below to control the final format and have a record of the name of plots.
  Also check out the "Recording" option under the "History" menu (or recordPlot(), replayPlot()). If you turn this option on, R will remember the plots made on that screen and you can use the "Page Up" and "Page Down" keys to scroll between your plots.
- Adobe Acrobat (.pdf) format
  While R saves the variables you name, in order to save your plot to print later, you need to save it separately. The easiest is to save the plot into .PDF format (i.e. Adobe Acrobat format). The following saves the x-y plot into a file "protein.pdf" in the directory mydir.
  
  > pdf("mydir/protein.pdf")
  > plot(potassium, protein)
  > dev.off()
  
  NOTE: if you don't do dev.off() then the file is not finished, and any further plots you make will go into the file you are trying to save instead of appearing on the screen.
- Postscript (.ps)
  Similarly to save in postscript format (in portrait this time, so I say horizontal=F)
  
  > postscript("mydir/sugars.ps", horizontal=F)
  > hist(sugars)
  > dev.off()
Exercise 15 Create a jpeg file of a plot using jpeg(). This is particularly useful if you have large numbers of points: pdf stores every point, which takes up a lot of time and resources for opening/printing a graph of 10,000 points, while jpeg is just a picture of it.
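The open-plot-close pattern for file devices can be sketched as follows. The example assumes a temporary file in place of mydir and uses the built-in cars dataset instead of UScereal so that it runs without attaching anything.

```r
# Write to a temporary file; replace with your own path as needed.
out <- file.path(tempdir(), "example-plot.pdf")

pdf(out)                          # open the file device
plot(cars$speed, cars$dist)       # first page ("cars" is a built-in dataset)
hist(cars$dist)                   # a second plot becomes a second page of the same file
dev.off()                         # close the device so the file is complete

file.exists(out)                  # TRUE
```

The same three steps apply to postscript() and jpeg(); only the function that opens the device changes.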
Manipulating plots
- Multiple plot windows
  When you type in a graphing command, a plotting window comes up automatically. Sometimes you would like to have multiple plotting windows for different graphs.
  The command win.graph() (Windows) or x11() (Unix/Windows/Mac?) brings up another graphing window. To pick one, use the numbers at the top of the window as the argument for dev.set().
  
  > x11()
  > boxplot(sugars)
  > dev.set(2)
  > plot(potassium, protein)
- Prettying your graph
  (see help(par) for more details) - you will find sometimes these are quite easy to implement, but other times some of the settings don't want to work with the plotting function you are using. It takes a good bit of experimenting.
  Some commands you call independently, through the function par(), and affect all graphs
  - par(mfrow=c(x,y))
    creates a grid of plots with x rows and y columns.
  Most are options that you put in the plot command just for a particular plot
  - xlab, ylab
    x-axis or y-axis labels
  - lty
    Line type (0=blank, 1=solid, 2=dashed, 3=dotted, 4=dotdash, 5=longdash, 6=twodash) or as one of the character strings "blank", "solid", "dashed", "dotted", "dotdash", "longdash", or "twodash", where "blank" uses invisible lines (i.e., doesn't draw them).
  - pch
    Style of points in graph
  - bg, col.lab, col.main, col.sub
    The color for the background, labels, main title, and subtitle respectively. Usually use values like "red" or "tan" to pick color. Type colors() to see all options.
  - las
    Style of axis labels. (0=parallel, 1=all horizontal, 2=all perpendicular to axis, 3=all vertical)
  - font
    1=plain, 2=bold, 3=italic, 4=bold italic
  You can also set some of these things after you have already made your main plot.
  - title(), axis()
Example:

> par(mfrow=c(1,2))
> plot(carbo,fibre,pch=3, las=1,main="Fiber versus Carbohydrates",
      sub="A cool subtitle is useful")
> hist(calories, xlab="Calories", main="",sub="",col.lab="blue")
> title(main="A Histogram of my Making",
      sub="A Fantastically Different Histogram from the Default")

Exercise 16
1. What happens if you add a third plot while you have par(mfrow=c(1,2))? Return the plotting screen to 1x1.
2. Plot a histogram of the proteins and give it a title, subtitle, label for the x-axis and y-axis. Make the colors different for the axis and the titles.
3. Take the command below and change it to make the points different shapes and colors. Harder: make the points or color different for the different manufacturers

> plot(potassium, protein)

4. Try the following command using matplot() and figure out what it does:

> matplot(sodium,UScereal[,c("fat","potassium","calories")],
      pch=1:3,col=1:3)
> legend("topleft",legend=c("fat","potassium","calories"),
      pch=1:3,col=1:3)

How would you make the x-axis values equally spaced, rather than dependent on the values of sodium?

Writing Functions

Basic Control Functions
You define a function in R using the command function. The following function returns the mean, standard deviation, and upper and lower 95% confidence interval limits in the form of a list.

> mysum<-function(x, conf.inv=T){
    m<-mean(x)
    if(conf.inv==T){
        n<-length(x)
        uppconf<-mean(x)+2*sd(x)/sqrt(n)
        lowconf<-mean(x)-2*sd(x)/sqrt(n)
        return(list(mean=m,sd=sd(x),uppconf=uppconf,lowconf=lowconf))
    }
    else return(list(mean=m,sd=sd(x)))
}
> mysum(potassium)
$mean
[1] 159.1197

$sd
[1] 180.2886

$uppconf
[1] 203.8438

$lowconf
[1] 114.3957
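The function above uses the multiplier 2 as a round approximation for a 95% interval. As a hedged sketch of one possible variant, the hypothetical function mysum.t below (the name, the level argument and the test data are assumptions, not part of the lab) takes the multiplier from the t distribution with qt(), which assumes roughly normal data.

```r
# Hypothetical variant of mysum(): same idea, but the multiplier comes from
# the t distribution instead of the rounded value 2.
mysum.t <- function(x, level = 0.95) {
  n    <- length(x)
  m    <- mean(x)
  se   <- sd(x) / sqrt(n)                      # standard error of the mean
  mult <- qt(1 - (1 - level) / 2, df = n - 1)  # close to 2 for moderate n
  list(mean = m, sd = sd(x), lowconf = m - mult * se, uppconf = m + mult * se)
}

x <- c(2, 4, 4, 5, 7, 8)   # made-up numbers
res <- mysum.t(x)
res$mean                   # 5
c(res$lowconf, res$uppconf)
```

The interval agrees with the one reported by the built-in t.test(x)$conf.int, which is a convenient way to check a hand-written function.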

Basic programming functions are,

if else while

break next for

stop and warning are functions that allow the user to check that certain conditions are satisfied. You can comment your code using the # symbol.
Note that for loops are generally slow in R, and using apply or sapply is preferable when each iteration does not depend on the results of earlier ones. For example, the following code finds the upper confidence interval limit for each of the selected columns

> my.ind<-c(2,4,8)
> x<-vector(length=length(my.ind))
> n<-nrow(UScereal)
> for(i in 1:length(my.ind) ){
    x[i]<-mean(UScereal[,my.ind[i]])+2*sd(UScereal[,my.ind[i]])/sqrt(n)
  }
> x
[1] 164.890738   1.831168  11.498387

could be written as

> x<-apply(UScereal[,my.ind],2,function(y){mean(y)+2*sd(y)/sqrt(length(y))})
> x
  calories        fat     sugars
164.890738   1.831168  11.498387

If the function is already defined, then apply is even easier:

> apply(UScereal[,my.ind],1,mean)

finds the row means.
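The role of the second argument of apply is easiest to see on a small case. The following sketch uses a made-up data frame d; sapply() over a data frame and the shortcut colMeans() are shown as alternatives.

```r
# Made-up numeric data frame standing in for a few columns of a dataset.
d <- data.frame(a = c(1, 2, 3), b = c(10, 20, 60))

# The second argument picks the margin: 2 = each column, 1 = each row.
apply(d, 2, mean)    # a = 2, b = 30
apply(d, 1, mean)    # 5.5 11.0 31.5

# sapply() loops over the columns of a data frame (a list of vectors) ...
sapply(d, function(y) mean(y) + 2 * sd(y) / sqrt(length(y)))

# ... and colMeans()/rowMeans() are dedicated shortcuts for plain means.
colMeans(d)          # same values as apply(d, 2, mean)
```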

Exercise 17
1. In if statements, you should use any and all for robust programming. Explain why here:

> if(Numb>2) print(fac) else print(Numb);
> if(any(Numb>2)) print(fac) else print(Numb);
> if(all(Numb>2)) print(fac) else print(Numb);

2. What's the problem with the following code? (This is a very annoying feature of R to watch for in programming... )

> apply(Numb,1,sum)

How would you fix this? (there are 2 obvious ways, depending on the circumstances - one of which uses sapply or lapply)

Some useful logical functions for programming and other things can be found here.
Finding errors in Your Program
R does not have great debugging mechanisms and the error messages are ... cryptic. Here are a couple of things that can be helpful
- Source your function
  Save your function by itself in a text/.R file. Then when you want to load it into R, use the source command, which reads and executes the file. For a file containing just a function, this loads your function (or your changes to it) and, most importantly, tells you the line number of any syntax error.
- Traceback
  If you are calling functions within functions, as we did in calling mean and sd, traceback() tells you what function had the error
  
  > myerror<-function(){sum(fac)}
  > myerror()
  Error in Summary.factor(..., na.rm = na.rm) :
          sum not meaningful for factors
  > traceback()
  4: stop(.Generic, " not meaningful for factors")
  3: Summary.factor(..., na.rm = na.rm)
  2: sum(fac)
  1: myerror()
- Debugging
  There are several functions that are supposed to help debug your function. I find the most useful is debug. This allows you to step through the function and figure out what the problem is. My function is supposed to subtract the mean of each row and the mean of each column (there are functions that center matrices, by the way: sweep or scale)
  
  > myCentered<-function(x){
      rsum<-apply(x,1,mean)
      rcentered<-x-rsum
      csum<-apply(x,2,mean)
      ccentered<-x-csum
      return(list(row=rcentered,col=ccentered))
    }
  > myCentered(mat)
  $row
       [,1] [,2]
  [1,] -0.5  0.5
  [2,]  1.0 -1.0
  [3,] -1.0  1.0

  $col
            [,1]       [,2]
  [1,] -1.000000 -0.3333333
  [2,]  1.666667  0.0000000
  [3,] -1.000000  0.6666667
  
  There's no error, just not what I was wanting: the row centering worked, but the column centering didn't. I can use debug to go into the function and try it as it is working. Namely, R pauses before each command and waits to execute it until you ask it to. To get R to execute the next line, hit "return" or type "n". Otherwise, you can type in whatever you want to evaluate, using the objects that exist inside the function at that point. This is very helpful with large functions. Try to follow along with the code below to get the idea.
  
  > debug(myCentered)
  > myCentered(mat)
  debugging in: myCentered(mat)
  debug: {
      rsum <- apply(x, 1, mean)
      rcentered <- x - rsum
      csum <- apply(x, 2, mean)
      ccentered <- x - csum
      return(list(row = rcentered, col = ccentered))
  }
  Browse[1]> n
  debug: rsum <- apply(x, 1, mean)
  Browse[1]> n
  debug: rcentered <- x - rsum
  Browse[1]> n
  debug: csum <- apply(x, 2, mean)
  Browse[1]> csum
  Error: Object "csum" not found
  Browse[1]> n
  debug: ccentered <- x - csum
  Browse[1]> csum
  [1] 2.000000 2.333333
  Browse[1]> x
       [,1] [,2]
  [1,]    1    2
  [2,]    4    2
  [3,]    1    3
  Browse[1]> x-csum # So I see this subtracts off across rows...
            [,1]       [,2]
  [1,] -1.000000 -0.3333333
  [2,]  1.666667  0.0000000
  [3,] -1.000000  0.6666667
  Browse[1]> t(x)-csum # the right answer, but transposed...
             [,1]       [,2]       [,3]
  [1,] -1.0000000  2.0000000 -1.0000000
  [2,] -0.3333333 -0.3333333  0.6666667
  Browse[1]> t(t(x)-csum) #there we go
       [,1]       [,2]
  [1,]   -1 -0.3333333
  [2,]    2 -0.3333333
  [3,]   -1  0.6666667
  Browse[1]> Q
  > undebug(myCentered)#Turns off the debugging
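The text mentions sweep and scale as ready-made ways to center a matrix. A minimal sketch of both, using the same mat as above (re-created here so the block runs on its own), might look like this:

```r
mat <- matrix(c(1, 2, 4, 2, 1, 3), nrow = 3, ncol = 2, byrow = TRUE)

# sweep() subtracts a vector of statistics along a chosen margin
# (1 = rows, 2 = columns), so no transposing is needed.
col.centered <- sweep(mat, 2, colMeans(mat))
row.centered <- sweep(mat, 1, rowMeans(mat))

colMeans(col.centered)   # both (numerically) zero
rowMeans(row.centered)   # all (numerically) zero

# scale() centers columns too; scale = FALSE skips dividing by the sd.
all.equal(col.centered, scale(mat, center = TRUE, scale = FALSE),
          check.attributes = FALSE)   # TRUE
```

Here col.centered matches the t(t(x)-csum) result found in the debugging session.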
Exercise 18 Write a function that for a given matrix, will plot the standard 95% confidence intervals for each column. Give options for line thickness and color. For example,

> myplot.CI(UScereal[,c(3,4,6)],col=c("red","green","blue"),lwd=1:3)

Should give something like this
A couple of hints for a good plot function:
1. Use "..." for any arguments that you don't want to specify yourself but want passed on to another function, like plot
2. Use invisible so that the function's return value is not printed automatically but is still available if the call is assigned to a new variable; good for returning the coordinates, etc. only if asked
3. Put in default values for non-essential options
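The three hints can be seen together in a small function. This is only a sketch and deliberately not a solution to Exercise 18: the function myplot.line, its argument line.col and the use of the built-in cars dataset are all assumptions made for illustration. pdf(NULL) opens a throwaway device so the example runs even without a screen.

```r
# Hypothetical helper (not the solution to Exercise 18): draws a scatterplot
# and hands back the fitted line's coefficients without printing them.
myplot.line <- function(x, y, line.col = "red", ...) {
  plot(x, y, ...)                  # "..." forwards main=, pch=, xlab=, etc. to plot
  fit <- lm(y ~ x)
  abline(fit, col = line.col)      # line.col has a default, so it is optional
  invisible(coef(fit))             # returned, but not printed automatically
}

pdf(NULL)                          # throwaway device so the example runs anywhere
myplot.line(cars$speed, cars$dist, main = "Stopping distance", pch = 3)
co <- myplot.line(cars$speed, cars$dist)
dev.off()
co                                 # intercept and slope of the fitted line
```

The first call prints nothing at the prompt, while the second keeps the coefficients in co for later use.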


Elizabeth Anne Purdom 2006-01-17
