3 Data Types and Vectors

In R, vectors are the primitive objects. A vector is simply an ordered collection of values grouped together into a single container. Some primitive types of vectors are numeric, logical, and character. A very important characteristic of these vectors is that they can only store values of the same type.

3.1 Data Types

A vector contains values that are homogeneous primitive elements. That is, a numeric vector contains only real numbers, a logical vector stores values that are either TRUE or FALSE, and character vectors store strings.

3.1.1 Example: Vectors of Measurements on a Family

We have created some artificial data on a 14-member family to help us explore many of the concepts in this chapter. These data are available in an RDA file for you to load into your R session and follow along by typing in the commands shown. We begin by loading the RDA file with

load("family.rda")

The names of the family members are in the vector called fnames, and we can see its contents with

fnames
##  [1] "Tom"    "Maya"   "Joe"    "Robert" "Sue"    "Liz"    "Jon"   
##  [8] "Sally"  "Tim"    "Tom"    "Ann"    "Dan"    "Art"    "Zoe"

The numbers [1] and [8] help us keep track of the order of the elements in the vector: "Tom" is 1st, "Maya" 2nd, …, "Sally" 8th, …, and "Zoe" 14th. We can confirm that fnames is indeed a character vector by passing it to the class() function:

class(fnames)
## [1] "character"

There are several other vectors with information on the family, including: fweight, weight in pounds; fbmi, body mass index (BMI); foverWt, whether or not BMI is above 25; and fsex, which is f for female and m for male. These vectors provide examples of several data types. The variable fbmi is a numeric vector,

fbmi
##             a             b             c             d             e 
## 25.1623879871 21.3290554422 24.4588421417 24.4841413962 18.5556551026 
##             f             g             h             i             j 
## 28.9498062481 28.1879692416 20.6778251104 26.6642952286 30.0491124147 
##             k             l             m             n 
## 26.0536376396 22.6438385926 24.2612556350 22.9106030060

and foverWt is a logical vector,

foverWt
##  [1]  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
## [12] FALSE FALSE FALSE

Another type of number in R is the integer. An integer vector is similar to a numeric vector, except that its values must all be whole numbers. The variable fage, which contains each family member’s age in years, is an example,

class(fage)
## [1] "integer"
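The family data may not always be at hand. Here is a minimal sketch that builds one small vector of each type discussed so far and checks its class. The names num, lgl, chr, and int are hypothetical, chosen only for illustration.

```r
num = c(1.5, 2, 3.25)        # numeric: real numbers
lgl = c(TRUE, FALSE, TRUE)   # logical: TRUE and FALSE only
chr = c("Tom", "Maya")       # character: strings
int = c(3L, 14L, 40L)        # integer: the L suffix marks integer literals

class(num)  # "numeric"
class(lgl)  # "logical"
class(chr)  # "character"
class(int)  # "integer"
```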

3.1.2 Factor

The factor is a somewhat special data type for use with qualitative measurements. The values are internally stored as integers, but each integer corresponds to a level, which is held as a character string. The values of fsex, which is a factor vector, are:

fsex
##  [1] m f m m f f m f m m f m m f
## Levels: f m

Notice that the values are not printed with quotation marks, as character values are. We confirm the data type of fsex with a call to class(),

class(fsex)
## [1] "factor"

Also, the levels() function provides the levels or labels associated with the vector.
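A small sketch of levels() on a made-up factor follows. The vector sex is hypothetical and only mimics the shape of fsex. It also shows the integer codes that the factor stores internally.

```r
# A small hypothetical factor built from character values
sex = factor(c("m", "f", "m", "m", "f"))
levels(sex)      # "f" "m": by default the levels are the sorted unique values
nlevels(sex)     # 2
as.integer(sex)  # 2 1 2 2 1: the integer code of each element's level
```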

3.1.3 Special Values

R provides a few special values, including NULL, NA, NaN, and Inf. These stand for null (empty), not available, not a number, and infinity, respectively. NULL denotes the null object, which has no elements. The value NA can be an element of a vector of any type. It is different from the character string "NA". We can check which elements of a vector are NA with the function is.na() and whether an object is NULL with is.null().

The special values not a number and infinity come about from computations. In particular, they can occur when we divide by 0. Here are three examples that return infinity, negative infinity, and not a number, respectively:

12 / 0
## [1] Inf
-100 / 0
## [1] -Inf
0 / 0
## [1] NaN
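The sketch below puts these special values into one vector and queries them. The vector x is a made-up example. It also illustrates two details:
  • is.na() also flags NaN.
  • A zero-length vector such as numeric(0) is not the same thing as NULL.

```r
x = c(12, NA, 0 / 0, 1 / 0, -100 / 0)
x                    # 12 NA NaN Inf -Inf
is.na(x)             # FALSE TRUE TRUE FALSE FALSE: NaN counts as NA too
is.nan(x)            # FALSE FALSE TRUE FALSE FALSE: only the NaN
is.infinite(x)       # FALSE FALSE FALSE TRUE TRUE

is.na("NA")          # FALSE: the string "NA" is an ordinary value
is.null(NULL)        # TRUE
is.null(numeric(0))  # FALSE: a length-0 vector is not NULL
length(numeric(0))   # 0
```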

3.2 Finding Information on Vectors

R has many utility functions that provide information about vectors (and other objects that we will soon learn about). We mentioned already the two functions is.na() and is.null(), which tell us, respectively, which elements of a vector are NA and whether an object is NULL. We may also be interested in the vector’s type. We can determine this with the class() function, or we can ask specifically whether a vector is of a certain type with functions such as is.factor(), is.logical(), etc. We can find the number of elements in a vector with length(); the first or last few values with head() and tail(), respectively; and the names of elements with names().

3.2.0.1 Example: Finding Details on the Family

Let’s use some of these functions to find out more about the family. The length of fnames, which contains the names of the family members, is

length(fnames)
## [1] 14

We can confirm that all of the vectors with family information have length 14 with further calls to length(), e.g., length(fbmi).

We may also want summary information about, say, the weights of the family members. We may want to know the smallest and largest values, the average, or the median weight. We can find these with, respectively, min(), max(), mean(), and median(). For example, the smallest weight in the family is

min(fweight)
## [1] 98
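The sketch below applies all four summary functions to a stand-in vector w. The vector holds the 14 weights implied by the fweight / 2.2 output shown later in this chapter, and it is consistent with head(fweight) and tail(fweight, n = 2). Treat it as an illustration rather than the loaded data.

```r
# Stand-in for fweight (pounds)
w = c(175, 124, 185, 156, 98, 190, 185, 124, 175, 215, 166, 140, 150, 125)
length(w)   # 14
min(w)      # 98
max(w)      # 215
median(w)   # 161, the average of the 7th and 8th sorted values (156 and 166)
mean(w)     # about 157.7
```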

Some of these functions return a single value, such as the minimum weight just calculated. Others return a value for each element of the input vector. One example is the names() function, e.g.,

names(fheight)
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n"

The names() function returns a character vector the same length as the input vector, and each element of the return value is the name of the corresponding element of the input vector. For example, "c" is the name of the third element in fheight. Not all vectors have named elements, and when this is the case, the return value is NULL, e.g.,

names(fweight)
## NULL

Often it can be helpful to examine the first few or last few values in a vector to confirm that the data are what we expect. The head() and tail() functions, respectively, return these initial or final values. By default we are given the first (last) 6 elements, i.e., the return value is a vector of length 6, e.g.,

head(fweight)
## [1] 175 124 185 156  98 190

However, we can request more or fewer elements by specifying a value for the n argument, e.g., to view the last two elements in fweight we use tail(fweight, n = 2) and the return value is 150 125.

In addition to finding the type of a vector with the class() function, e.g.,

class(fsex)
## [1] "factor"

we can ask whether or not a vector is a particular type. For example,

is.factor(fsex)
## [1] TRUE

Similarly, is.integer(fbmi) returns FALSE, but is.numeric(fbmi) returns TRUE. The fage variable is an integer vector; what do you think the call is.numeric(fage) returns? Try it and see. Since an integer is a special case of a number, R returns TRUE in this case.
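The same checks can be reproduced on small made-up vectors. The names ages, bmis, and sexes are hypothetical. The last line shows that is.numeric() returns FALSE for a factor, even though a factor stores integer codes internally.

```r
ages = c(14L, 43L, 70L)      # hypothetical integer ages
bmis = c(25.2, 21.3)         # hypothetical numeric values
sexes = factor(c("m", "f"))  # hypothetical factor

is.integer(ages)   # TRUE
is.numeric(ages)   # TRUE: integers are numbers too
is.integer(bmis)   # FALSE
is.numeric(bmis)   # TRUE
is.factor(sexes)   # TRUE
is.numeric(sexes)  # FALSE: a factor is not treated as numeric
```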

3.2.1 Summary of Data Types

Primitive Types

In R, vectors are the basic or primitive objects. A vector is an ordered container of homogeneous values. In other words, a vector contains values that are the same type and these values have an ordering. Statisticians analyze measurements of some quantity on a group, and the vector is a convenient structure for this purpose.

The primitive types include:

  • numeric – real numbers,
  • logical – TRUE and FALSE only,
  • character – strings.

Factor Type

Qualitative measurements are an important kind of data, e.g., sex, marital status, and education level, and R has a factor data type for this purpose, i.e.,

  • factor – values are stored as integers, and each integer corresponds to a level, which is a string. The levels in a factor can be ordered, e.g., education level.

Typically we want to perform different types of calculations with factor data, e.g., for a summary, we want tallies of the number of observations at each level, not a mean or median. Many R functions, e.g., summary(), perform different operations on a vector depending on whether it is numeric or factor.
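To see summary() behave differently by type, here is a sketch with two hypothetical vectors, sex and age, which are not the family data. For a factor, summary() gives one tally per level. For a numeric vector, it gives the minimum, quartiles, mean, and maximum.

```r
sex = factor(c("m", "f", "m", "m", "f"))  # hypothetical factor
age = c(14, 43, 70, 35, 9)                # hypothetical numeric vector

summary(sex)  # counts per level: f 2, m 3
summary(age)  # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
```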

3.3 Vectorized Operations

The philosophy in R is that operations work on an entire vector. This makes sense given that vectors are the basic data types. A simple example is with subtraction. We can subtract one vector from another element-wise using the - operator. For example, if we want to know the difference between the actual weight and desired weight for our family members, then we can simply subtract fdesiredWt from fweight. That is,

fweight - fdesiredWt
##  [1]   0  10  10   6 -12  40  25   6  10  20  16   4  10   0

Here, the 1st element of fdesiredWt is subtracted from the 1st element of fweight to get 0, the 2nd element of fdesiredWt is subtracted from the 2nd element of fweight to get 10, and so on.

The notion of vectorized operations is very powerful and convenient. It allows us to express computations at a high level, indicating what we mean rather than hiding it in a loop.
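To make the contrast concrete, here is a sketch that computes the same element-wise differences twice: once with an explicit loop and once with a single vectorized subtraction. The vectors wt and desired are hypothetical stand-ins for fweight and fdesiredWt.

```r
wt = c(175, 124, 185, 156)       # hypothetical actual weights
desired = c(175, 114, 175, 150)  # hypothetical desired weights

# Explicit loop: compute each difference one element at a time
diff_loop = numeric(length(wt))
for (i in seq_along(wt)) {
  diff_loop[i] = wt[i] - desired[i]
}

# Vectorized: one expression states the intent
diff_vec = wt - desired          # 0 10 10 6

identical(diff_loop, diff_vec)   # TRUE
```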

3.3.0.1 Example: Computing BMI

Although we are provided with the BMI of each family member in the fbmi variable, we can compute this quantity ourselves from the height and weight of each individual. The formula for BMI is \[\text{BMI} = \frac{\text{weight in kg}}{(\text{height in m})^2}.\] To use this formula, we need to convert our measurements from pounds to kilograms and from inches to meters. Since there are about 2.2 pounds in a kilogram, we can change the units of the values in fweight with

fweight / 2.2
##  [1] 79.5454545455 56.3636363636 84.0909090909 70.9090909091 44.5454545455
##  [6] 86.3636363636 84.0909090909 56.3636363636 79.5454545455 97.7272727273
## [11] 75.4545454545 63.6363636364 68.1818181818 56.8181818182

Similarly, we can convert inches to meters with

fheight * 0.0254
##      a      b      c      d      e      f      g      h      i      j 
## 1.7780 1.6256 1.8542 1.7018 1.5494 1.7272 1.7272 1.6510 1.7272 1.8034 
##      k      l      m      n 
## 1.7018 1.6764 1.6764 1.5748

We see that this return vector looks somewhat different from the return value of dividing fweight by 2.2. This is because the elements in fheight are named, so the return value has these names too.

We can combine these calculations into a single calculation of BMI with

(fweight / 2.2) / (fheight * 0.0254)^2
##             a             b             c             d             e 
## 25.1623879871 21.3290554422 24.4588421417 24.4841413962 18.5556551026 
##             f             g             h             i             j 
## 28.9498062481 28.1879692416 20.6778251104 26.6642952286 30.0491124147 
##             k             l             m             n 
## 26.0536376396 22.6438385926 24.2612556350 22.9106030060

Again, the names of the elements in fheight have been carried over to the return value. Also, we can examine fweight and fheight to check that, e.g., the 3rd weight in fweight and the 3rd height in fheight are properly combined to create the 3rd BMI in the return value.

For greater clarity, we can create intermediate variables for the converted height and weight with

WtKg = fweight / 2.2
HtM = fheight * 0.0254
bmi = WtKg / HtM^2

We have assigned our computation of BMI to the variable bmi.
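The sketch below checks the formula on a small scale. The fheight * 0.0254 output above implies heights of 70 and 64 inches for the first two family members, and the fweight / 2.2 output implies weights of 175 and 124 pounds. The vectors wt_lb and ht_in are stand-ins built from those values. Because only ht_in carries names, the result takes its names from it.

```r
wt_lb = c(175, 124)        # stand-in weights in pounds
ht_in = c(a = 70, b = 64)  # stand-in heights in inches, with names
WtKg = wt_lb / 2.2
HtM = ht_in * 0.0254
bmi = WtKg / HtM^2
round(bmi, 2)              # a 25.16, b 21.33: names carried over from ht_in
```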

3.3.1 Aggregator Functions

Some vectorized functions work on all of the elements of a vector but return only a single value for the result. We have seen examples of these already; others include mean(), min(), max(), sum(), prod(), and median(). For example, the average BMI for the family is mean(bmi), which evaluates to about 24.6.
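A quick sketch of a few aggregators on a made-up vector follows. It also shows a handy consequence of treating TRUE as 1 and FALSE as 0: sum() counts the TRUE values and mean() gives their proportion. The vectors x and over are illustrative only.

```r
x = c(2, 3, 4)
sum(x)    # 9
prod(x)   # 24
mean(x)   # 3

# Logical values are treated as 1s and 0s by these functions
over = c(TRUE, FALSE, FALSE, TRUE)
sum(over)   # 2: the number of TRUEs
mean(over)  # 0.5: the proportion of TRUEs
```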

3.3.2 Relational Operations

In addition to the arithmetic operators, such as + and *, R also has relational operators for comparing values. These operators are greater than, less than, greater than or equal to, less than or equal to, not equal to, and equal to, i.e., >, <, >=, <=, !=, ==, respectively.
The relational operators are vectorized, meaning that if you give them a vector of length \(n\), they operate on all \(n\) elements.

3.3.2.1 Example: Compute Who Is Overweight

We can use relational operators to determine which of the family members are overweight. Our definition of overweight is a BMI that exceeds 25. Again, although the variable foverWt already contains this information, we can compute it ourselves by comparing bmi (our own computation of fbmi) to 25. We do this with

overWt = bmi > 25
overWt
##     a     b     c     d     e     f     g     h     i     j     k     l 
##  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE 
##     m     n 
## FALSE FALSE

We can compare our vector overWt to the one supplied for the family (i.e., foverWt) to see if they match. To do this, we can use another relational operator, namely the “equal to” operator (==).

overWt == foverWt
##    a    b    c    d    e    f    g    h    i    j    k    l    m    n 
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

Note that this computation checks that the 1st element of overWt equals the 1st element of foverWt, the 2nd element of overWt equals the 2nd element of foverWt, and so on. All of these values are TRUE, so we have consistent results, i.e., the elements of these two vectors have the same values. Rather than examine each of the 14 values to see if they are all TRUE, we can pass the return vector from our comparison to the all() function.

all(overWt == foverWt)
## [1] TRUE

The all() function returns TRUE if all of the elements in the input vector are TRUE.

Another helpful function for determining if two objects are the same is the identical() function. We call it with

identical(overWt, foverWt)
## [1] FALSE

This is somewhat surprising, given that we just compared the values of each of the elements and found them all to be the same. Can you figure out what the problem is? Notice that our new variable overWt is a vector with named elements, but foverWt is not. The identical() function checks more than the values of the elements. These two vectors are not identical because one has named elements and the other does not.
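A minimal reproduction of this situation uses two made-up logical vectors, one with names and one without. Dropping the names with unname() makes identical() agree with the element-wise comparison.

```r
named = c(a = TRUE, b = FALSE)
plain = c(TRUE, FALSE)

all(named == plain)              # TRUE: the element values agree
identical(named, plain)          # FALSE: the names attribute differs
identical(unname(named), plain)  # TRUE once the names are removed
```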

In addition to the relational operators, many functions in R are vectorized. The nchar() function is one example. If we give nchar() the character vector of family member names, then we get

nchar(fnames)
##  [1] 3 4 3 6 3 3 3 5 3 3 3 3 3 3

Here we have a vector of length 14 that contains mostly 3s because most family members’ names have only 3 letters in them.

3.3.3 Boolean Algebra

Boolean operators take logical vectors as inputs and perform Boolean algebra. The three common operations are “not”, “or”, and “and”.
The “not” operator, which is ! in R, is a unary operator because it has only one input. It turns TRUE into FALSE and vice versa. For example,

!foverWt
##  [1] FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE
## [12]  TRUE  TRUE  TRUE

This vector has TRUE for individuals who are not overweight and FALSE for those who are.

The “and” operator, &, compares the elements of two vectors and returns a logical vector where TRUE indicates the corresponding elements in the input vectors are both TRUE, and FALSE indicates otherwise.
The “or” operator, |, compares the elements of two vectors and returns a logical vector where TRUE denotes that either one or the other or both of the corresponding elements in the input vectors are TRUE.
Of course, these operations can be combined into compound statements. We provide an example.
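Before the family example, here is a small truth-table sketch. The vectors p and q are made up so that together they cover all four combinations of TRUE and FALSE.

```r
p = c(TRUE, TRUE, FALSE, FALSE)
q = c(TRUE, FALSE, TRUE, FALSE)

p & q  # TRUE FALSE FALSE FALSE: TRUE only where both are TRUE
p | q  # TRUE TRUE TRUE FALSE: TRUE where at least one is TRUE
!p     # FALSE FALSE TRUE TRUE
```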

3.3.3.1 Example: Identifying Certain Family Members

Let’s suppose we are interested in identifying female family members who are either overweight or under 45 years old. Then, we can create a logical vector that indicates these characteristics with the following expression:

fsex == "f" & (fage < 45 | foverWt)
##  [1] FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE
## [12] FALSE FALSE FALSE

We can check the values of fsex, fage and foverWt to confirm that our algebra is correct.

In addition to all(), another function that operates on logical vectors and can be quite useful is any(). As demonstrated in Section 3.3.2.1, the all() function returns TRUE if all of the elements of the logical vector are TRUE. The any() function returns TRUE if any of the elements are TRUE. For example, since some of the elements in foverWt are TRUE and some are FALSE, we find any(foverWt) returns TRUE, and all(foverWt) is FALSE.
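A short sketch of any() and all() follows, first on a logical vector directly and then combined with a relational test. The vectors x and wt are illustrative values only.

```r
x = c(TRUE, FALSE, TRUE)
any(x)  # TRUE: at least one element is TRUE
all(x)  # FALSE: not every element is TRUE

wt = c(175, 124, 98)  # illustrative weights
any(wt < 100)         # TRUE: someone weighs under 100
all(wt > 90)          # TRUE: everyone weighs over 90
```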

3.3.4 Coercion

At times, we may want to change the type of a vector, e.g., from logical to numeric so that the vector contains 1s and 0s rather than TRUEs and FALSEs. We can do this with the family of as. functions, e.g., as.numeric() attempts to convert the input vector into a numeric vector. We try it with the logical vector foverWt:

as.numeric(foverWt)
##  [1] 1 0 0 0 0 1 1 0 1 1 1 0 0 0

As expected, a value of TRUE is converted to 1 and FALSE to 0. When we try to convert fnames, which is a character vector of the family member names, we get:

as.numeric(fnames)
## Warning: NAs introduced by coercion
##  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA

In this case, the conversion results in a vector of NA values. Notice also that a warning message is issued to let us know that we have missing values in our result. R can convert some character values into numbers, e.g.,

as.numeric(" -17.01")
## [1] -17.01

However, when a string is not made up of digits (and possibly a decimal point, a minus sign, and surrounding spaces), the conversion results in NA.
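Here is a sketch that converts several strings at once. The inputs are made up: two convert cleanly, and two do not. The call is wrapped in suppressWarnings() only to silence the "NAs introduced by coercion" warning seen above.

```r
vals = suppressWarnings(as.numeric(c("42", " -17.01", "4,5", "abc")))
vals  # 42 -17.01 NA NA: a comma or letters cannot be parsed as a number
```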

The reverse coercion, i.e., converting a number into a character string, works as expected, with a few subtleties as shown in the next example.

Example: How Many Digits are in fbmi?

We can convert the numeric vector, fbmi, into a character vector with

as.character(fbmi)
##  [1] "25.1623879871136" "21.3290554422018" "24.4588421417243"
##  [4] "24.484141396238"  "18.5556551025638" "28.9498062481498"
##  [7] "28.1879692416195" "20.677825110357"  "26.664295228559" 
## [10] "30.0491124146844" "26.0536376395866" "22.6438385926275"
## [13] "24.2612556349581" "22.9106030059567"

These values look a bit different from what we have seen before. When we print fbmi at the console, we see

head(fbmi)
##             a             b             c             d             e 
## 25.1623879871 21.3290554422 24.4588421417 24.4841413962 18.5556551026 
##             f 
## 28.9498062481

It appears that the first two values of fbmi are 25.1623879871 and 21.3290554422, not 25.1623879871136 and 21.3290554422018.

This is simply a matter of controlling the number of digits printed to the console. If we want more digits printed for fbmi, then we can explicitly specify this with the print() function, e.g.,

print(fbmi, digits = 22)
##                       a                       b                       c 
## 25.16238798711363244820 21.32905544220179194781 24.45884214172427206790 
##                       d                       e                       f 
## 24.48414139623799457013 18.55565510256377947940 28.94980624814977687720 
##                       g                       h                       i 
## 28.18796924161951622523 20.67782511035704828828 26.66429522855900202671 
##                       j                       k                       l 
## 30.04911241468440863400 26.05363763958658296360 22.64383859262751741426 
##                       m                       n 
## 24.26125563495805081971 22.91060300595674448232

Now we have more digits than in the character strings! We discuss how numbers are represented in R in another chapter. For now, we simply recognize that what prints to the console may be different from the actual values in the vector.

We also mention that if we want to change the default number of digits printed for all subsequent computations in our R session, then we can do this through the options() function, e.g.,

options(digits = 12)
fbmi
##             a             b             c             d             e 
## 25.1623879871 21.3290554422 24.4588421417 24.4841413962 18.5556551026 
##             f             g             h             i             j 
## 28.9498062481 28.1879692416 20.6778251104 26.6642952286 30.0491124147 
##             k             l             m             n 
## 26.0536376396 22.6438385926 24.2612556350 22.9106030060

This example shows us that we should be careful when reading and comparing numeric values because what is printed at the console is not necessarily the actual value in a vector.
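The sketch below separates display from value using a made-up number. The print() call shows 0.667, but the stored value is still 2/3. It also saves and restores the digits option, because options() returns the previous settings.

```r
x = 2 / 3
print(x, digits = 3)       # [1] 0.667
x == 0.667                 # FALSE: only the display was shortened

old = options(digits = 3)  # options() returns the previous setting
x                          # prints [1] 0.667
options(old)               # restore the earlier setting
```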

Converting Factors

Factors are a special data type and coercion of a factor has special behaviors. Recall that a factor consists of integer values where each integer represents a level, and a level is associated with a character string. When we coerce a factor into a number, we get the integer value of the level. However, when we coerce a factor into a character string, then the return value is the label for the level. We demonstrate with fsex:

as.numeric(fsex)
##  [1] 2 1 2 2 1 1 2 1 2 2 1 2 2 1

When we performed the relational operation fsex == "f" in the example Identifying Certain Family Members (Section 3.3.3.1), R implicitly converted fsex to a character vector before performing the comparison.
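The following sketch shows both directions of factor coercion on a hypothetical factor sex. It then shows a common pitfall. When a factor's labels look like numbers, as.numeric() still returns the level codes, so converting through character first recovers the numbers. The factor sizes is made up for illustration.

```r
sex = factor(c("m", "f", "m"))
as.numeric(sex)    # 2 1 2: the integer codes of the levels
as.character(sex)  # "m" "f" "m": the level labels

# Pitfall: a factor whose labels look like numbers
sizes = factor(c("10", "5", "10"))
levels(sizes)                    # "10" "5": levels are sorted as strings
as.numeric(sizes)                # 1 2 1: the codes, not the numbers
as.numeric(as.character(sizes))  # 10 5 10
```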

Implicit Coercion

Implicit coercion occurs when we operate on a vector in a way that is not intended for its type. For example, if we add 1 to a logical vector, then the logical values are converted to 0s and 1s implicitly, and 1 is added to each element.

1 + foverWt
##  [1] 2 1 1 1 1 2 2 1 2 2 2 1 1 1

As another example, when we use the relational operator > to compare the values in fweight to the string "150", R converts the numbers in fweight to character strings and compares the strings; that is,

fweight > '150'
##  [1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
## [12] FALSE FALSE FALSE

Notice that the 5th element is TRUE even though the 5th weight, 98, is less than 150: as strings, "98" comes after "150" because the character 9 sorts after 1. In general, with the exception of adding 0 or 1 to a logical vector to convert it to numeric, it’s best to avoid implicit coercion. It can produce nonsensical and unexpected results if we don’t understand how coercion works. For example, can you figure out why the following comparison yields all FALSEs?

fweight > "abc"
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [12] FALSE FALSE FALSE

To help you, examine the built-in vector letters and see if you can make sense of the comparison,

letters > "c"
##  [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [12]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [23]  TRUE  TRUE  TRUE  TRUE
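A minimal sketch contrasts a numeric comparison with the implicit string comparison, and then shows the explicit fix. The values are the ones discussed above.

```r
98 > 150                # FALSE: numeric comparison
98 > "150"              # TRUE: 98 becomes "98", which sorts after "150"
98 > as.numeric("150")  # FALSE: convert explicitly to compare numbers
```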

3.4 Creating Vectors

We have already used a few of R’s functions for creating vectors when we demonstrated how to subset by position and exclusion. In those examples, we saw that we can concatenate values together into a vector with the c() function and we can create simple sequences of integers with the : operator. In this section, we describe these functions in greater detail and introduce other, richer functions for creating vectors.

3.4.1 Concatenating Values into a Vector

The notion of concatenation was first introduced in the example on subsetting BMI by position. There we saw how to use the c() function to take one or more values and put them into a new vector. These values do not need to be integers, or even numbers. For example,

y = c(1.2, 4.5, 3.2)
y
## [1] 1.2 4.5 3.2
c(TRUE, FALSE, FALSE, TRUE)
## [1]  TRUE FALSE FALSE  TRUE
c("Tim", "Jessica", "Jo", "Isabella")
## [1] "Tim"      "Jessica"  "Jo"       "Isabella"

Additionally, if we want to name the elements in the vector, we can do this with, e.g.,

c(bob = TRUE, x = FALSE, mm = TRUE)
##   bob     x    mm 
##  TRUE FALSE  TRUE

When we concatenate integers into a vector do we obtain an integer vector in return? Well, in R, all numbers that we type at the console are made into real numbers. This means that when we type c(1, 0, 200), we get a numeric vector because the individual values are treated as numeric. However, if we place an L after each integer, then an integer vector is created. For example,

x = c(1L, 0L, 200L)
class(x)
## [1] "integer"

The c() function can be used to concatenate vectors, e.g., z = c(y, x) yields the numeric vector 1.2 4.5 3.2 1.0 0.0 200.0. In this example, the integer vector x is converted to numeric in the concatenation. See Section 3.3.4 for more on conversion and coercion.
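The sketch below repeats these steps and checks each class along the way, using the same values as y and x above.

```r
class(c(1, 0, 200))     # "numeric": plain numbers are stored as reals
x = c(1L, 0L, 200L)
class(x)                # "integer"

y = c(1.2, 4.5, 3.2)
z = c(y, x)
class(z)                # "numeric": the integers were converted
z                       # 1.2 4.5 3.2 1.0 0.0 200.0
```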

3.4.2 Creating Sequences of Values

The : operator is a built-in syntax for creating an integer sequence, e.g.,

3:5
## [1] 3 4 5
class(3:5)
## [1] "integer"

The sequence can include negative integers, and it can decrease rather than increase. For example,

-6:2
## [1] -6 -5 -4 -3 -2 -1  0  1  2
2:-2
## [1]  2  1  0 -1 -2

The : operator can also be used to create a sequence of non-integer numbers spaced 1 apart, e.g., the expression 1.7:6.5 returns the numeric vector 1.7 2.7 3.7 4.7 5.7. Notice that 5.7 is the largest value of the form \(1.7 + n\), with \(n\) a nonnegative integer, that is less than or equal to 6.5.

The : operator is a very specific and simple version of the more general seq() function. With seq(), we can create sequences with strides other than 1.
The arguments to the seq() function include from, to, by and length.out. They are not all required; all we need to provide is enough information to uniquely specify the sequence. This can be: from, to, and length.out; from, length.out, and by; from, to, and by; or to, length.out, and by. Below are examples of all of these possibilities:

seq(from = 1, to = 6, by = 2)
## [1] 1 3 5
seq(from = 1, to = 6, length.out = 3)
## [1] 1.0 3.5 6.0
seq(from = 1, by = 2, length.out = 3)
## [1] 1 3 5
seq(to = 6, by = 2, length.out = 3)
## [1] 2 4 6

We do not get the same sequence from every call, even though we have used the same values for the arguments. That is, we have called seq() with all 4 possible subsets of 3 arguments drawn from from = 1, to = 6, length.out = 3, and by = 2. The reason for this is that the to argument is taken to be the end value of the sequence when accompanied by the length.out argument. That is, with from = 1, to = 6, and length.out = 3, in order for the three elements to be equally spaced and the beginning and end of the sequence to be 1 and 6, the middle element must be 3.5. Likewise, with to = 6, length.out = 3, and by = 2, the last element is 6, and working back by 2s, we get 2, 4, and 6 as our sequence. When we supply all 4 arguments, we get an error. Try it and see what the error message is. Does it make sense?
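The by argument is not limited to positive whole numbers. These illustrative calls, which are not tied to the family data, use a negative stride to count down and a fractional stride. The checks compare values with ==, since R's documentation says not to rely on whether seq() returns integer or double.

```r
seq(from = 10, to = 1, by = -3)   # 10 7 4 1: a negative stride counts down
seq(from = 0, to = 1, by = 0.25)  # 0.00 0.25 0.50 0.75 1.00
```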

3.4.3 Functions for Creating Vectors

One function that can be very useful in creating vectors is rep(). Short for repeat, this function repeats the elements of a vector to create a new vector. The two arguments times and each perform this repetition differently: times repeats the entire vector, and each repeats the individual elements. An example makes this clear. Recall that we earlier created a vector x with values 1 0 200, so

x
## [1]   1   0 200
rep(x, times = 2)
## [1]   1   0 200   1   0 200
rep(x, each = 2)
## [1]   1   1   0   0 200 200

We can also supply a vector to the times argument that is the same length as x, where the elements in this vector indicate the number of times to repeat each element in x. For example, rep(x, times = c(2, 0, 3)) returns a vector of length 5 with values 1 1 200 200 200.
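A sketch of the vector-valued times argument, plus times and each used together, follows. When both are given, each is applied first and the result is then repeated times times. The character vector is only an illustration.

```r
x = c(1, 0, 200)
rep(x, times = c(2, 0, 3))             # 1 1 200 200 200

rep(c("a", "b"), times = 2, each = 2)  # "a" "a" "b" "b" "a" "a" "b" "b"
```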

Other functions that offer general facilities for operating on vectors include sort(), rev(), and order(). These functions perform the computations that their names suggest. The sort() function sorts the elements of a vector into ascending or descending order, and rev() reverses the order of the elements. For example, recall that z contains 1.2 4.5 3.2 1.0 0.0 200.0, in that order; then

sort(z)
## [1]   0.0   1.0   1.2   3.2   4.5 200.0
rev(z)
## [1] 200.0   0.0   1.0   3.2   4.5   1.2

The order() function returns a vector of positions. These positions can be used to re-order elements to obtain, e.g., the 2 smallest values in z.

order(z)
## [1] 5 4 1 3 2 6
z[ order(z)[1:2] ]
## [1] 0 1

Let’s examine this last expression carefully. We know that the return value from order(z) is the vector 5 4 1 3 2 6. This indicates that the smallest value in z is the 5th element, the next smallest is the 4th element, and so on. When we subset this return value, i.e., this vector of positions, with 1:2, we obtain the first two elements, 5 4. We use these to subset z by position to get the two smallest values in z. Of course, we could also call sort(z)[1:2] to obtain the same results.
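A common use of order() is to sort one vector by the values of another. In this sketch, who and wt are stand-ins for the first four family members' names and weights. The decreasing argument reverses the direction.

```r
who = c("Tom", "Maya", "Joe", "Robert")  # stand-in names
wt = c(175, 124, 185, 156)               # stand-in weights

who[order(wt)]                    # "Maya" "Robert" "Tom" "Joe": lightest first
who[order(wt, decreasing = TRUE)] # "Joe" "Tom" "Robert" "Maya": heaviest first
sort(wt, decreasing = TRUE)       # 185 175 156 124
```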

3.4.4 Functions for Manipulating Character Vectors

For character vectors, the paste() function is convenient for combining strings together, e.g., we can create one long string of all the names in fnames with

paste(fnames, collapse = "*")
## [1] "Tom*Maya*Joe*Robert*Sue*Liz*Jon*Sally*Tim*Tom*Ann*Dan*Art*Zoe"

Note that the collapse parameter is used to specify what, if any, character(s) to place between the strings being pasted together.
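The collapse argument is easy to confuse with sep. The sketch below, on made-up strings, contrasts them:
  • sep joins the corresponding elements of the arguments.
  • collapse then joins the resulting elements into one string.
  • paste0() is paste() with sep = "".

```r
paste("Tom", "Maya")                         # "Tom Maya": sep defaults to " "
paste("id", 1:3, sep = "-")                  # "id-1" "id-2" "id-3"
paste("id", 1:3, sep = "-", collapse = "+")  # "id-1+id-2+id-3"
paste0("id", 1:3)                            # "id1" "id2" "id3"
```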

The function strsplit() splits strings by user-specified delimiters. For example, we can split the names in fnames on the letter “o”, which for the first two names of “Tom” and “Maya” returns

strsplit(fnames[1:2], "o")
## [[1]]
## [1] "T" "m"
## 
## [[2]]
## [1] "Maya"

We see that Tom is split into two pieces, the T before the o and the m after it, whereas Maya is not split at all because the string does not contain an o. If we split the name Yamomoto on o, we get three strings, "Yam" "m" "t".
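Because strsplit() returns a list with one element per input string, it helps to know how to inspect and flatten that list. This sketch splits three names, checks the Yamomoto case, and combines all the pieces with unlist().

```r
pieces = strsplit(c("Tom", "Maya", "Yamomoto"), "o")
pieces[[3]]      # "Yam" "m" "t"
lengths(pieces)  # 2 1 3: number of pieces per input string
unlist(pieces)   # "T" "m" "Maya" "Yam" "m" "t"
```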

The substr() function can extract a piece of a string, e.g.,

substr("Yamomoto", start = 3, stop = 7)
## [1] "momot"

In addition, we can match and substitute text in strings using regular expressions. These capabilities are available in the functions grep(), gsub(), and others.
See the chapter on regular expressions for more details on string manipulation and regular expressions.
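As a small preview, here is a sketch that uses a plain letter as the pattern, so no regular-expression syntax is needed. The vector nms holds a few names from fnames.
  • grep() returns the positions of matching elements.
  • grepl() returns a logical vector marking the matches.
  • gsub() replaces every match.

```r
nms = c("Tom", "Maya", "Joe", "Robert")
grep("o", nms)       # 1 3 4: positions of names containing "o"
grepl("o", nms)      # TRUE FALSE TRUE TRUE
gsub("o", "0", nms)  # "T0m" "Maya" "J0e" "R0bert"
```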

3.4.5 Creating Vectors from Different Types

We have emphasized that a vector must contain elements of a single type, so we cannot use a vector to store values of different types, such as numbers and strings. If we try to combine elements of different types, R coerces them to a common type. For example, when we concatenate numbers and strings into a vector, the numbers are converted to strings, so we get a character vector:

c("Hi", -2, "Bye", 10.3, 0)
## [1] "Hi"   "-2"   "Bye"  "10.3" "0"

Below are a few other examples of how R coerces values of different types into a single type:

c(1, 2, 3, TRUE)
## [1] 1 2 3 1
c(1L, 2L, 3L, 10.7)
## [1]  1.0  2.0  3.0 10.7

We see that when combining numbers and logicals, the logicals are coerced to 0s and 1s, and when concatenating integers and numerics, the integers are converted to numeric. You can try combining different elements and see what you get. For example, what do you think happens when you combine strings and logicals? And, how are the values coerced when we concatenate strings, logicals, and numeric values? Try it and see.