Chapter 4 Introducing R

4.1 Why we are incorporating R programming into our introductory course

Computing has become a vital part of statistics. Data are now available everywhere, and data analysis matters in every field, whether it is done by people with statistical training or by non-statisticians. In our opinion, a sophisticated statistical programming language like R can empower students as they learn the basics of statistics. We will use it as a tool for understanding statistical concepts.

We will be using R in this course to do the following:

  • Exploratory Data Analysis and Visualization
  • Graphics
  • Probability calculations and simulations
  • Computing probabilities for known distributions
  • Studying properties of statistics obtained from data
  • Hypothesis testing
  • Linear regression

4.2 Getting Started with RStudio

To use R, you first need to download and install R itself. R runs in the background and executes the code you write.

Then you need to download and install RStudio. RStudio is a GUI (graphical user interface) that allows you to save objects, create different plots and keep track of different bits of code (more to come on this later in the course).

First, download R and RStudio.

4.2.1 RStudio Layout

Figure 4.1: RStudio Layout

Top bar (1)
This bar has useful tools for document management. The icon all the way on the left is for making new files. For now, just use the top option, R Script.

The next icons are for opening recent documents and saving current documents.

Top left pane (2)
This pane contains any open scripts you have. Here you can write and save code.

Using scripts ensures you don’t lose your code if you need to shut down R.

You can run code directly from a script by placing the cursor on a line (or selecting several lines) and pressing command + return (on a Mac) or ctrl + enter (on a PC).

Bottom left pane (3)
This is the console, where all of your code is run.

You can either write code directly in the console or run code from a script.

The console will also show the output of a line of code. Try it out by typing 5 + 7 in the console now.

Top right pane (4)
This pane shows the objects currently defined in the global environment. You add to it by creating new objects.

You can remove objects in two ways:

  • rm(list = ls()) removes all objects
  • rm(<object_name>) removes a specific object
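As a minimal sketch of the two removal commands above, the following creates two objects and then removes them. The names a and b are hypothetical and used only for illustration. Be aware that rm(list = ls()) clears everything in your environment, so run it only when you really want a clean slate.

```r
# Create two objects in the global environment
a <- 1
b <- 2

# Remove one specific object by name
rm(a)
exists("a")
## [1] FALSE
exists("b")
## [1] TRUE

# Remove every remaining object in the environment
rm(list = ls())
ls()
## character(0)
```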

Bottom right pane (5)
Here you can see information about file import/export, view plots, manage package installations, and read help pages.

We will begin by using R as a calculator. Then we will see how to call various built-in functions, how to load data (.csv and .rda files), and how to load code (.R scripts).

4.3 Using R as a (super) calculator

We can perform arithmetic computations. For example, if you want to add two numbers, say 6 and 3, type 6+3 at the prompt, which looks like >. Then press the Return/Enter key and you should see the result. After that you will see the prompt again, waiting to execute your next command.

Try it out. Think of what the code should be, and then check your answer.

4.3.1 Calculator operations

4.3.1.1 Arithmetic

Add 6 and 3 and then square the result.

4.3.1.2 Code

(6+3)^2
## [1] 81
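The same pattern works for the other basic calculator operations. The lines below are a short sketch of a few of them; the particular numbers are arbitrary, and sqrt() is one of R's built-in functions.

```r
7 - 2        # subtraction
## [1] 5
4 * 3        # multiplication
## [1] 12
10 / 4       # division
## [1] 2.5
2^3          # exponent
## [1] 8
sqrt(81)     # square root, using a built-in function
## [1] 9
```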

4.3.2 Order of operations

4.3.2.1 Arithmetic

What would the result of 10^5-6/3 be?

1000, or 99998?

4.3.2.2 Code

10^5-6/3
## [1] 99998

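You have to be careful about the order in which operations are performed. R evaluates exponents first, then multiplication and division, then addition and subtraction. Use parentheses to change that order.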
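To see how parentheses change the result, here is a small sketch that evaluates the same numbers with three different groupings. The second line shows where the answer 1000 would come from.

```r
10^5 - 6/3       # exponent first, then division, then subtraction
## [1] 99998
10^(5 - 6/3)     # parentheses move the subtraction into the exponent
## [1] 1000
(10^5 - 6)/3     # parentheses force the subtraction before the division
## [1] 33331.33
```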

4.4 Programming in R

4.4.1 Objects

R is an object-oriented language, which here just means that you save numbers, series of numbers or functions as objects and refer to them later. Assign objects using either the = or <- operator. For simple assignments like the ones in this course, both work the same way.

Example:

## here we make an object 'x' that has a value of 7:
x = 7 

## then we can reference our object 'x' later: 
x + 5
## [1] 12

All objects appear in the Environment tab of the top right pane.
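The short sketch below shows both assignment operators side by side, and what happens when you assign to a name that already exists. The object name z is hypothetical and chosen only for this illustration. The output of ls() is not shown because it depends on what you have created so far in your session.

```r
# Both assignment operators create the same kind of object
x = 7
z <- 7
x == z
## [1] TRUE

# Assigning to an existing name overwrites the old value
x <- x + 5
x
## [1] 12

# ls() lists the names of the objects shown in the Environment tab
ls()
```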

4.4.2 Functions & Arguments

While you can make your own functions, R already has many functions built in. You can look for functions on your own, but in this class, we’ll typically tell you which functions to use.

Functions have a set purpose and take any number of arguments.

The general pattern for functions is: function(argument1, argument2, etc...)

Ex: the mean function takes the average of a set of numbers. The one required argument is a vector of numbers, which you create with the c() function.

# we'll make an object y that is a vector of numbers:
y = c(1,2,3,4,5,6)

# then we'll take the mean of that vector
mean(y)
## [1] 3.5
# we could also just put the vector directly into the mean function:
mean(c(1,2,3,4,5,6))
## [1] 3.5
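Many functions take more than one argument, following the function(argument1, argument2, ...) pattern. As a sketch, the built-in round() function takes a number and an optional digits argument, and mean() accepts an optional na.rm argument that drops missing values. The object name scores is hypothetical.

```r
# Arguments can be given by name...
round(3.14159, digits = 2)
## [1] 3.14

# ...or by position, in the order the function expects them
round(3.14159, 2)
## [1] 3.14

# NA marks a missing value; by default mean() returns NA if any value is missing
scores <- c(1, 2, NA, 4)
mean(scores)
## [1] NA

# Setting na.rm = TRUE removes the missing value before averaging
mean(scores, na.rm = TRUE)
## [1] 2.333333
```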

4.4.3 Making notes in your code

A pound sign # tells R to ignore the rest of that line, so nothing after it is run or evaluated. This is really helpful for making notes to yourself or others in your code. A good comment explains the purpose or goal of the code, not just what the code literally does.
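Here is a small sketch of comments in practice. The temperature conversion and the names temp_f and temp_c are made up for illustration. Note that the first comment states why the code exists rather than repeating what each symbol does.

```r
# Convert a temperature reading from Fahrenheit to Celsius
temp_f <- 68
temp_c <- (temp_f - 32) * 5 / 9   # text after # on the same line is ignored too
temp_c
## [1] 20

# temp_c <- 0    this whole line is a comment, so R never runs it
```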

4.4.4 The prompt in the console

In the console, the prompt > looks like a greater-than symbol. If your prompt changes to a + symbol by mistake, click in your console and press the Esc key on your keyboard to return to >.

R shows + when code is broken up across multiple lines and R is still expecting more code. R does not treat a command as finished until the expression is complete, which usually means it has found the matching closing parenthesis ), closing bracket ], etc.
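As an illustration, the call below is deliberately spread over three lines. If you type it into the console line by line, you should see the + prompt on the second and third lines, because the parenthesis opened on the first line has not been closed yet. The object name total is hypothetical.

```r
# The command is not complete until the closing parenthesis on the last line
total <- sum(1,
             2,
             3)
total
## [1] 6
```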

4.4.5 Saving your script

The name of your script file is in the tab at the top of your script window (the top left pane). The name defaults to Untitled1. If the name is red and followed by an asterisk *, your script has unsaved changes. Save your script by clicking “File” then “Save”, or by pressing command + s (Mac) or ctrl + s (PC).

4.4.6 Adding packages to R

R comes with a large number of functions already pre-loaded, but sometimes you’ll need to add functions that aren’t already in R.

First, use the install.packages command to add the package you need. Note that the package name must be wrapped in quotes. For example, install.packages("dplyr") installs the dplyr package.

Then, once you’ve installed it, you need to add it to the list of packages you’re actively using. Do this using the library command.

Ex:

library(dplyr)
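If you are not sure whether a package is already installed, one possible approach (a sketch, not the only way) is to check before installing. The helper name is_installed is hypothetical. The check is demonstrated on stats, a package that ships with R, and the install line is left as a comment so that nothing is downloaded when you run the example.

```r
# Returns TRUE if the package is available on this computer, FALSE otherwise
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" comes with every R installation
has_stats <- is_installed("stats")
has_stats
## [1] TRUE

# Install dplyr only if it is missing:
# if (!is_installed("dplyr")) install.packages("dplyr")

# library() does not need quotes around the package name
library(stats)
```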

4.5 Getting help in R

There are tons of resources for getting help in R!

Get help directly in R

  • You can get help directly in R by typing ?<function> in the console. The help screen then shows you information about the function and its arguments.
  • Try it now! Type ?mean and look at the help text in the bottom right section of RStudio.
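A few ways of asking for help from the console are sketched below. The ?mean form is the one described above. The help() and args() calls are additional built-in options, not something this course requires.

```r
# Open the help page for the mean function
?mean

# help() does the same thing, with the function name in quotes
help("mean")

# args() prints a function's arguments without opening the help page
args(mean)
```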

Stack Overflow: if you get an error message you don’t understand and can’t figure out, just Google it. Usually this brings up a number of discussions on Stack Overflow where the root of the issue is explained.

4.5.1 Extra Resources:

These are good resources for getting comfortable with R, but both cover things that might be outside the scope of the course, so keep that in mind when using them.

  • DataCamp: this has online classes that give you additional practice using R.

  • Swirl Stats: you can install this and learn the basics of R directly in RStudio.