Packages (R's killer app)

Let's check out the packages on CRAN. In particular check out the CRAN Task Views.

Essentially every well-established statistical method, and many not-so-established ones, is available in a package, along with a great deal of other functionality.

If you want to sound like an R expert, make sure to call them packages and not libraries. A library is the location in the directory structure where the packages are installed/stored.

Using packages

Two steps:

  1. Install the package on your machine
  2. Load the package
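
The two steps can be sketched in R roughly as follows. This is only a minimal sketch: the helper name load_or_install is made up for illustration, and it is demonstrated on splines, which ships with R, so that no download is triggered.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
load_or_install <- function(pkg) {
    # Step 1: install (skipped when the package is already in a library)
    if (!requireNamespace(pkg, quietly = TRUE)) {
        install.packages(pkg)
    }
    # Step 2: load and attach; character.only = TRUE because pkg is a string
    library(pkg, character.only = TRUE)
    invisible(pkg)
}

load_or_install("splines")          # ships with R, so step 1 is skipped
"package:splines" %in% search()     # check that it is now attached
```

For a CRAN package such as fields, step 1 needs a network connection and a CRAN mirror.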

To install a package, in RStudio, just do Packages->Install Packages.

From the command line, you generally will just do

install.packages("fields")

If you're on a network and are not the administrator of the machine, you may need to explicitly tell R to install it in a directory you are able to write in:

install.packages("fields", lib = "~/R")
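
Here is a sketch of how a personal library might be set up so that R installs into it and can find packages there later. A temporary directory stands in for ~/R (an assumption made only so the example has no lasting side effects), and the install and load calls are commented out because they need network access.

```r
# Stand-in for a writable directory such as "~/R"; the directory must exist
my_lib <- file.path(tempdir(), "R-library")
dir.create(my_lib, showWarnings = FALSE, recursive = TRUE)

# Put the personal library at the front of the list of libraries R uses
.libPaths(c(my_lib, .libPaths()))
.libPaths()

# With the library in place, install into it and load from it:
# install.packages("fields", lib = my_lib)
# library(fields, lib.loc = my_lib)
```

Adding the directory to .libPaths() means later calls to library() will look there without needing lib.loc each time.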

Packages often depend on other packages, so these dependencies may be installed and loaded automatically. E.g., fields depends on maps and on spam.

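You can also install directly from a package zip/tarball rather than from CRAN by giving a filename instead of a package name, e.g., install.packages("fields_6.8.tar.gz", repos = NULL, type = "source"), where repos = NULL tells R to use the local file rather than a repository.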

General information about a package

You can use syntax as follows to get a list of the objects in a package and a brief description: library(help = packageName).
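
A short sketch of that syntax, plus one related call, using stats because it ships with R. The results are assigned to variables here so the example also behaves sensibly in a non-interactive session; at the console you would usually just run the calls directly.

```r
# Index of a package's objects with one-line descriptions
info <- library(help = "stats")
class(info)

# The package's DESCRIPTION file as an R object: version, dependencies, etc.
desc <- packageDescription("stats")
desc$Package
desc$Version

# Open the package's help index in the help viewer (interactive use):
# help(package = "stats")
```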

If you click on a specific package on CRAN, there are often vignettes that give an overview of the package and describe its usage. The reference manual is just a single document with the help files for all of the objects/functions in a package, so it may be helpful, but it's often hard to get the big picture from it.
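
Vignettes for packages you have already installed can also be listed from within R. A minimal sketch, using grid as the example only because it ships with R:

```r
# Vignettes available for one installed package
v <- vignette(package = "grid")
v$results[, c("Item", "Title"), drop = FALSE]

# In an interactive session, this lists them in a web browser:
# browseVignettes("grid")
```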

The search path

To see the packages that are attached and the order in which R searches them for functions/objects: search().

To see what libraries (i.e., directory locations) R is retrieving packages from: .libPaths().

And to see where R is getting specific packages, searchpaths().
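
The three functions can be tried together as below. The output depends on your machine and on which packages you have attached, so none is shown here.

```r
# Environments R searches, in order, when looking up a name
sp <- search()
sp

# Libraries (directories) from which packages can be loaded
lp <- .libPaths()
lp

# File-system location behind each entry of search()
where <- searchpaths()
data.frame(entry = sp, location = where)
```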

Package namespaces

Namespaces are a way to keep all the names for objects in a package together in a coherent way and to allow R to look for objects in a principled way.

A few useful things to know:

ls("package:stats")[1:20]
##  [1] "acf"                  "acf2AR"               "add1"                
##  [4] "addmargins"           "add.scope"            "aggregate"           
##  [7] "aggregate.data.frame" "aggregate.default"    "aggregate.ts"        
## [10] "AIC"                  "alias"                "anova"               
## [13] "anova.glm"            "anova.glmlist"        "anova.lm"            
## [16] "anova.lmlist"         "anova.mlm"            "ansari.test"         
## [19] "aov"                  "approx"
lm <- function(i) {
    print(i)
}
lm(7)
## [1] 7
x <- rnorm(10)
y <- rnorm(10)
lm(y ~ x)
## y ~ x
## <environment: 0x2946c90>
stats::lm(y ~ x)
## 
## Call:
## stats::lm(formula = y ~ x)
## 
## Coefficients:
## (Intercept)            x  
##      -0.685        1.105

Can you explain what is going on? Consider the results of search().
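
One way to investigate is to ask R where it finds objects called lm. The sketch below recreates the situation, placing the user-defined function in the global environment explicitly so that it behaves the same however it is run, and uses the built-in cars data purely as an example.

```r
# A user-defined lm in the global environment, as in the session above
assign("lm", function(i) print(i), envir = globalenv())

# Every place on the search path that has an object named lm, in search order
where_before <- find("lm")
where_before

# pkg::name skips the search path and goes straight to the package
fit <- stats::lm(dist ~ speed, data = datasets::cars)

# Removing the user-defined version leaves only the one in stats
rm("lm", envir = globalenv())
where_after <- find("lm")
where_after
```

Because the global environment comes first in search(), a function defined there is found before the one in stats.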

Looking inside a package

Packages are available as “Package source”, namely the raw code and help files, and as “binaries”, where the code has already been built and packaged up for R to use efficiently on a particular operating system.

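To look at the raw R code (and possibly C/C++/Fortran code included in some packages), download and unzip the package source tarball. The version number in the file name changes as the package is updated, so substitute the one currently shown on the package's CRAN page. From the command line of a Linux/Mac terminal: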

$ curl http://www.cran.r-project.org/src/contrib/fields_6.8.tar.gz -o fields_6.8.tar.gz

$ tar -xvzf fields_6.8.tar.gz

$ cd fields

$ ls R

$ ls src

$ ls man

$ ls data
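
For a quick look at a single function, you can often stay inside R rather than downloading the source. A minimal sketch, using sd from stats as an arbitrary example:

```r
# Typing a function's name without parentheses prints its R code
stats::sd

# The body on its own, and the namespace the function belongs to
body(stats::sd)
environment(stats::sd)

# Where the installed (built) package lives, and what it contains
pkg_dir <- system.file(package = "stats")
list.files(pkg_dir)
```

The installed directory holds the built package rather than the individual R source files, so for comments, C/C++/Fortran code, and the original file layout the source tarball is still the place to look.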

Creating your own R package

R is do-it-yourself: you can write your own package. At its most basic, this is just some R scripts that are packaged together in a convenient format. And if you are giving it to someone else, it's best to include some documentation in the form of function help files.

Why make a package?

See the devtools package and package.skeleton() for some useful tools to help you create a package. And there are lots of tips/tutorials online, including material from a workshop I held in May 2013.
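
As a rough illustration of package.skeleton(), the sketch below builds a skeleton around one function. The function center and the package name demoPkg are made up for the example, and everything is written to a temporary directory.

```r
# Hypothetical function to be packaged
center <- function(x) x - mean(x)

# Build the skeleton in a temporary directory so nothing is left behind
pkg_parent <- file.path(tempdir(), "pkg-demo")
dir.create(pkg_parent, showWarnings = FALSE)

package.skeleton(name = "demoPkg", list = "center",
                 environment = environment(), path = pkg_parent, force = TRUE)

# DESCRIPTION, NAMESPACE, the R code, and template help files to fill in
list.files(file.path(pkg_parent, "demoPkg"), recursive = TRUE)
```

The generated help files are only templates, so the package will need editing before it can be built and installed.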

Getting help online

There are several mailing lists that have lots of useful postings. In general if you have an error, others have already posted about it.

If you are searching you often want to search for a specific error message. Remember to use double quotes around your error message so it is not broken into individual words by the search engine.

Posting your own questions

The main rule of thumb is to do your homework first to make sure the answer is not already available on the mailing list or in other documentation. Some of the folks who respond to mailing list questions are not the friendliest so it helps to have a thick skin, even if you have done your homework. On the plus side, they are very knowledgeable and include the world's foremost R experts/developers.

Here are some guidelines for posting to one of the R mailing lists: http://www.r-project.org/posting-guide.html

sessionInfo() is a function that will give information about your R version, OS, etc., that you can include in your posting.

You also want to include a short, focused, reproducible example of your problem that others can run.
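
A reproducible example might be put together along these lines. The data and the model fit are placeholders; in a real posting they would be the smallest inputs and the fewest lines that still trigger your problem.

```r
# Small, self-contained data that anyone can recreate
d <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1))

# dput() prints R code that rebuilds the object; paste that into the posting
dput(d)

# The few lines that show the problem (here, just a model fit)
fit <- stats::lm(y ~ x, data = d)
coef(fit)

# Version and platform details to include alongside the example
info <- sessionInfo()
info$R.version$version.string
```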

Breakout

Nothing formal here, but if you're having trouble installing packages, grab one of us.