3 Data Types and Vectors
In R, vectors are the primitive objects. A vector is simply an ordered collection of values grouped together into a single container. Some primitive types of vectors are numeric, logical, and character. A very important characteristic of these vectors is that they can only store values of the same type.
3.1 Data Types
A vector contains values that are homogeneous primitive elements. That is, a numeric vector contains only real numbers, a logical vector stores values that are either TRUE
or FALSE
, and character vectors store strings.
3.1.1 Example: Vectors of Measurements on a Family
We have created some artificial data on a 14-member family to help us explore many of the concepts in this chapter. These data are available in an RDA file for you to load into your R session and follow along by typing in the commands shown. We begin by loading the RDA file with
load("family.rda")
The names of the family members are in the vector called fnames
, and we can see its contents with
fnames
## [1] "Tom" "Maya" "Joe" "Robert" "Sue" "Liz" "Jon"
## [8] "Sally" "Tim" "Tom" "Ann" "Dan" "Art" "Zoe"
The numbers [1]
and [8]
are there to help us keep track of the order of the elements in the vector. We see that "Tom"
is 1st, "Maya"
2nd, …, "Sally"
8th, …, and "Zoe"
14th. We can confirm that fnames
is indeed a character vector by calling the class()
function and passing the vector fnames
as input, i.e.,
class(fnames)
## [1] "character"
There are several other vectors with information on the family, including: fweight
, weight in pounds; fbmi
, body mass index (BMI); foverWt
, whether or not BMI is above 25; and fsex
, which is f
for female and m
for male. These vectors provide examples of several data types. The variable fbmi
is a numeric vector,
fbmi
## a b c d e
## 25.1623879871 21.3290554422 24.4588421417 24.4841413962 18.5556551026
## f g h i j
## 28.9498062481 28.1879692416 20.6778251104 26.6642952286 30.0491124147
## k l m n
## 26.0536376396 22.6438385926 24.2612556350 22.9106030060
and foverWt
is a logical vector,
foverWt
## [1] TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE TRUE TRUE TRUE
## [12] FALSE FALSE FALSE
Another type of number in R is the integer. The integer vector is similar to a numeric vector, except that the values must all be integers. The variable fage
, which contains the person’s age in years, is an example,
class(fage)
## [1] "integer"
3.1.2 Factor
The factor is a somewhat special data type for use with qualitative measurements. The values are internally stored as integers, but each integer corresponds to a level, which is held as a character string. The values of fsex
, which is a factor vector, are:
fsex
## [1] m f m m f f m f m m f m m f
## Levels: f m
Notice that the values are not printed with quotation marks, as character values are. We confirm the data type of fsex
with a call to class()
,
class(fsex)
## [1] "factor"
Also, the levels()
function provides the levels or labels associated with the vector.
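A minimal sketch of how a factor pairs integer codes with levels. The vector sex below is a small made-up stand-in for fsex, not part of the family data, and nlevels() is a base R helper that the text does not mention:

```r
# Hypothetical stand-in for fsex: build a factor from character values.
sex = factor(c("m", "f", "m", "m", "f"))

levels(sex)      # the levels, sorted alphabetically: "f" "m"
as.integer(sex)  # the integer codes behind the values: 2 1 2 2 1
nlevels(sex)     # the number of levels: 2
```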
3.1.3 Special Values
R provides a few special values, including NULL
, NA
, NaN
, and Inf
. These stand for null or empty, not available, not a number, and infinity, respectively. NULL
denotes an empty vector. The value NA
can be an element of a vector of any type. It is different from the character string "NA"
. We can check for the presence of NA
values in a vector with the function is.na()
and for an empty vector with is.null()
.
The special values not a number
and infinity come about from computations. In particular, they can occur when we divide by 0. Here are three examples that return infinity, negative infinity, and not a number, respectively:
12 / 0
-100 / 0
0 / 0
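To see how these special values behave together, we can collect them in one vector and test it. This is only a sketch; the vector x is made up, and is.infinite() and is.nan() are base R functions that the text above does not introduce:

```r
# Made-up vector holding the three computed values plus a missing value.
x = c(12 / 0, -100 / 0, 0 / 0, NA)

x               # Inf -Inf NaN NA
is.infinite(x)  # TRUE TRUE FALSE FALSE
is.nan(x)       # FALSE FALSE TRUE FALSE
is.na(x)        # FALSE FALSE TRUE TRUE (NaN also counts as missing)
is.na("NA")     # FALSE: the character string "NA" is an ordinary value
is.null(NULL)   # TRUE
length(NULL)    # 0
```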
3.2 Finding Information on Vectors
R has many utility functions that provide information about vectors (and other objects that we will soon learn about). We mentioned already the two functions is.na()
and is.null()
, which provide us with information about the presence of NA
s in a vector or whether or not a vector is empty, respectively. We may also be interested in the vector’s type. We can determine this with the class()
function, or we can ask specifically if a vector is of a certain type with functions, such as is.factor()
, is.logical()
, etc. We can find the number of elements in a vector with length()
; the first or last few values with head()
and tail()
, respectively; and the names of elements with names()
.
3.2.0.1 Example: Finding Details on the Family
Let’s use some of these functions to find out more about the family. The length of fnames
, which contains the names of the family members, is
length(fnames)
## [1] 14
We can confirm that all of the vectors with family information have length 14 with further calls to length()
, e.g., length(fbmi)
.
We may also want summary information about, say, the weights of the family members. We may want to know the smallest and largest values, the average, or the median weight. We can find these with, respectively, min()
, max()
, mean()
, and median()
. For example, the smallest weight in the family is
min(fweight)
## [1] 98
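The other summaries can be tried without the RDA file. The sketch below assumes a hand-typed vector w that copies the 14 weights as they can be read off the output printed in this chapter; the name w is ours:

```r
# Assumed copy of the 14 family weights (in pounds), typed in by hand.
w = c(175, 124, 185, 156, 98, 190, 185, 124, 175, 215, 166, 140, 150, 125)

min(w)     # 98
max(w)     # 215
mean(w)    # 2208 / 14, about 157.71
median(w)  # 161, the average of the two middle values 156 and 166
```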
Some of these functions return a single value, such as the minimum weight just calculated. Others return a value for each element of the input vector. One example is the names()
function, e.g.,
names(fheight)
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n"
The names()
function returns a character vector the same length as the input vector and each element of the return value is the name of the corresponding element of the input vector. For example, "c"
is the name of the third element in fheight
. Not all vectors have named elements, and when this is the case, the return value is an empty vector, e.g.,
names(fweight)
## NULL
Often it can be helpful to examine the first few or last few values in a vector to confirm that the data are what we expect. The head()
and tail()
functions, respectively, return these initial or final values. By default we are given the first (last) 6 elements, i.e., the return value is a vector of length 6, e.g.,
head(fweight)
## [1] 175 124 185 156 98 190
However, we can request more or fewer elements by specifying a value for the n
argument, e.g., to view the last two elements in fweight
we use tail(fweight, n = 2)
and the return value is 150 125
.
In addition to finding the type of a vector with the class()
function, e.g.,
class(fsex)
## [1] "factor"
we can ask whether or not a vector is a particular type. For example,
is.factor(fsex)
## [1] TRUE
Similarly, is.integer(fbmi)
returns FALSE
, but is.numeric(fbmi)
returns TRUE
. The fage
variable is an integer vector. What do you think the call is.numeric(fage)
returns? Try it and see. Since an integer is a special case of a number, R returns TRUE
in this case.
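The relationship between integer and numeric can be checked on small made-up vectors. In this sketch the names age and bmi are hypothetical stand-ins, and the L suffix marks a value as an integer:

```r
age = c(77L, 33L, 79L)      # made-up integer ages
bmi = c(25.2, 21.3, 24.5)   # made-up real-valued measurements

is.integer(age)  # TRUE
is.numeric(age)  # TRUE: an integer vector also counts as numeric
is.integer(bmi)  # FALSE
is.numeric(bmi)  # TRUE
class(bmi)       # "numeric"
```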
3.2.1 Summary of Data Types
Primitive Types
In R, vectors are the basic or primitive objects. A vector is an ordered container of homogeneous values. In other words, a vector contains values that are the same type and these values have an ordering. Statisticians analyze measurements of some quantity on a group, and the vector is a convenient structure for this purpose.
The primitive types include:
numeric – real numbers,
logical – TRUE and FALSE only,
character – strings.
Factor Type
Qualitative measurements are an important kind of data, e.g., sex, marital status, and education level, and R has a factor
data type for this purpose, i.e.,
factor
– values are stored as integers, and each integer corresponds to a level, which is a string. The levels in a factor can be ordered, e.g., education level.
Typically we want to perform different types of calculations on qualitative and quantitative data. Functions such as summary() perform different operations on a vector depending on whether it is, e.g., numeric or a factor.
3.3 Vectorized Operations
The philosophy in R is that operations work on an entire vector. This makes sense given that vectors are the basic data types. A simple example is subtraction. We can subtract one vector from another element-wise, e.g., fdesiredWt from fweight. That is,
fweight - fdesiredWt
## [1] 0 10 10 6 -12 40 25 6 10 20 16 4 10 0
Here, the 1st element of fdesiredWt
is subtracted from the 1st element of fweight
to get 0, the 2nd element of fdesiredWt
is subtracted from the 2nd element of fweight
to get 10, and so on.
The notion of vectorized operations is very powerful and convenient. It allows us to express computations at a high level, indicating what we mean rather than hiding it in a loop.
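To make the contrast with a loop concrete, here is a sketch that computes the same differences both ways. The vectors weight and desired are short made-up examples rather than the family data:

```r
weight  = c(175, 124, 185)   # made-up current weights
desired = c(175, 114, 175)   # made-up desired weights

# Vectorized: one expression handles every element.
diffVec = weight - desired

# Loop: the same result, but the intent is buried in bookkeeping.
diffLoop = numeric(length(weight))
for (i in seq_along(weight)) {
  diffLoop[i] = weight[i] - desired[i]
}

diffVec                       # 0 10 10
identical(diffVec, diffLoop)  # TRUE
```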
3.3.0.1 Example: Computing BMI
Although we are provided with the BMI of each family member in the fbmi
variable, we can compute this quantity ourselves from the height and weight of each individual. The formula for BMI is \[\mathrm{BMI} = \frac{\text{weight in kg}}{(\text{height in m})^2}.\] If we use this formula, we need to convert our measurements from pounds into kilograms and from inches into meters. Since there are about 2.2 pounds to a kilogram, we can change the units for the values in fweight
with
fweight / 2.2
## [1] 79.5454545455 56.3636363636 84.0909090909 70.9090909091 44.5454545455
## [6] 86.3636363636 84.0909090909 56.3636363636 79.5454545455 97.7272727273
## [11] 75.4545454545 63.6363636364 68.1818181818 56.8181818182
Similarly, we can convert inches to meters with
fheight * 0.0254
## a b c d e f g h i j
## 1.7780 1.6256 1.8542 1.7018 1.5494 1.7272 1.7272 1.6510 1.7272 1.8034
## k l m n
## 1.7018 1.6764 1.6764 1.5748
We see that this return vector looks somewhat different from the return from dividing fweight
by 2.2. This is because the elements in fheight
are named so the return value has these names too.
We can combine these calculations into a single calculation of BMI with
(fweight / 2.2) / (fheight * 0.0254)^2
## a b c d e
## 25.1623879871 21.3290554422 24.4588421417 24.4841413962 18.5556551026
## f g h i j
## 28.9498062481 28.1879692416 20.6778251104 26.6642952286 30.0491124147
## k l m n
## 26.0536376396 22.6438385926 24.2612556350 22.9106030060
Again, the names of the elements in fheight
have been carried over to the return value. Also, we can examine fweight
and fheight
to check that, e.g., the 3rd weight in fweight
and the 3rd height in fheight
are properly combined to create the 3rd BMI in the return value.
For greater clarity, we can create intermediate variables for the converted height and weight with
WtKg = fweight / 2.2
HtM = fheight * 0.0254
bmi = WtKg / HtM^2
We have assigned our computation of BMI to the variable bmi
.
3.3.1 Aggregator Functions
Some vectorized functions work on all of the elements of a vector, but return only a single value for the result. We have already seen mean(), min(), max(), and median(); others include sum() and prod(). For example, the average BMI for the family is mean(bmi), which evaluates to about 24.6.
3.3.2 Relational Operations
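In addition to the arithmetic operators, R provides relational operators for comparing values: greater than, less than, greater than or equal to, less than or equal to, not equal to, and equal to. These are written >, <, >=, <=, !=, and ==, respectively.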
The relational operators are vectorized, meaning that if you give them a vector of length \(n\), they operate on all \(n\) elements.
3.3.2.1 Example: Compute Who is Over Weight
We can use relational operators to determine which of the family members are over weight. Our definition of over weight is a BMI that exceeds 25. Again, although the variable foverWt
already contains this information, we can compute it ourselves by comparing bmi
(fbmi
) to 25. We do this with
overWt = bmi > 25
overWt
## a b c d e f g h i j k l
## TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE TRUE TRUE TRUE FALSE
## m n
## FALSE FALSE
We can compare our vector overWt
to the one supplied for the family (i.e., foverWt
) to see if they match. To do this, we can use another relational operator, namely the “equal to” operator (==
).
overWt == foverWt
## a b c d e f g h i j k l m n
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Note that this computation checks that the 1st element of overWt
equals the 1st element of foverWt
, the 2nd element of overWt
equals the 2nd element of foverWt
, and so on. All of these values are TRUE
so we have consistent results, i.e., the elements of these two vectors have the same values. Rather than examine each of the 14 values to see if they are all TRUE
, we can pass the return vector from our comparison to the all()
function.
all(overWt == foverWt)
## [1] TRUE
The all()
function returns TRUE
if all of the elements in the input vector are TRUE
.
Another helpful function for determining if two objects are the same is the identical()
function. We call it with
identical(overWt, foverWt)
## [1] FALSE
This is somewhat surprising, given that we just compared the values of each of the elements and found them all to be the same. Can you figure out what the problem is? Notice that our new variable overWt
is a vector with named elements, but foverWt
is not. The identical()
function checks more than the values of the elements. These two vectors are not identical because one has named elements and the other does not.
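A small sketch of the same effect on made-up vectors. It also shows one possible remedy, the base R function unname(), which the text above does not use:

```r
a = c(x = TRUE, y = FALSE)  # named logical vector (names are made up)
b = c(TRUE, FALSE)          # same values, no names

all(a == b)              # TRUE: the values agree
identical(a, b)          # FALSE: only one vector has names
identical(unname(a), b)  # TRUE once the names are dropped
```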
In addition to the relational operators, many functions in R are vectorized. The nchar()
function is one example. If we give nchar()
the character vector of family member names, then we get
nchar(fnames)
## [1] 3 4 3 6 3 3 3 5 3 3 3 3 3 3
Here we have a vector of length 14 that contains mostly 3s because most family members’ names have only 3 letters in them.
3.3.3 Boolean Algebra
Boolean operators take logical vectors as inputs and perform Boolean algebra. The three common operations are “not”, “or”, and “and”.
The “not” operator, which is !
in R, is a unary operator because it has only one input. It turns TRUE
into FALSE
and vice versa. For example,
!foverWt
## [1] FALSE TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE FALSE
## [12] TRUE TRUE TRUE
This vector has TRUE
for individuals who are not over weight and FALSE
for those who are.
The “and” operator, &
, compares the elements of two vectors and returns a logical vector where TRUE
indicates the corresponding elements in the input vectors are both TRUE
, and FALSE
indicates otherwise.
The “or”
operator, |
, compares the elements of two vectors and returns a logical vector where TRUE
denotes that either one or the other or both of the corresponding elements in the input vectors are TRUE
.
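The three operators can be checked against a small truth table. In this sketch, p and q are made-up logical vectors that cover all four combinations of TRUE and FALSE:

```r
p = c(TRUE, TRUE, FALSE, FALSE)
q = c(TRUE, FALSE, TRUE, FALSE)

p & q  # TRUE FALSE FALSE FALSE
p | q  # TRUE TRUE TRUE FALSE
!p     # FALSE FALSE TRUE TRUE
```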
Of course, these operations can be combined into compound statements. We provide an example.
3.3.3.1 Example: Identifying Certain Family Members
Let’s suppose we are interested in identifying female family members who are either over weight or under 45 years old. Then, we can create a logical vector that indicates these characteristics with the following expression:
fsex == "f" & (fage < 45 | foverWt)
## [1] FALSE TRUE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE
## [12] FALSE FALSE FALSE
We can check the values of fsex
, fage
and foverWt
to confirm that our algebra is correct.
In addition to all()
, another useful function that operates on logical vectors is any(). As demonstrated earlier, the all() function returns TRUE
if all of the elements of the logical vector are TRUE
. The any()
function returns TRUE
if any of the elements are TRUE
. For example, since some of the elements in foverWt
are TRUE
and some are FALSE
, we find any(foverWt)
returns TRUE
, and all(foverWt)
is FALSE
.
3.3.4 Coercion
At times, we may want to change the type of a vector, e.g., from logical to numeric so that the vector contains 1s and 0s rather than TRUE
s and FALSE
s. We can do this with the collection of as.
functions, e.g., as.numeric()
attempts to convert the input vector into a numeric vector. We try it with the logical vector foverWt:
as.numeric(foverWt)
## [1] 1 0 0 0 0 1 1 0 1 1 1 0 0 0
As expected, a value of TRUE
is converted to 1 and FALSE
to 0. When we try to convert fnames
, which is a character vector of the family member names, we get:
as.numeric(fnames)
## Warning: NAs introduced by coercion
## [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA
In this case, the conversion results in a vector of NA
values. Notice also that a warning message is issued to let us know that we have missing values in our result. R can convert some character values into numbers, e.g.,
as.numeric(" -17.01")
## [1] -17.01
However, when a string is not made up of digits (and a possible period and negative sign), then the conversion results in NA
.
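A few more strings illustrate where the conversion succeeds and where it does not. This is a sketch on a made-up vector s; suppressWarnings() is a base R function used here only to keep the coercion warning out of the way:

```r
s = c("10", "-2.5", " 7 ", "1,000", "abc")

# suppressWarnings() hides the "NAs introduced by coercion" warning.
num = suppressWarnings(as.numeric(s))

num         # 10.0 -2.5 7.0 NA NA
is.na(num)  # FALSE FALSE FALSE TRUE TRUE
```

Surrounding blanks are tolerated, but a comma or letters inside the string lead to NA.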
The reverse coercion, i.e., converting a number into a character string, works as expected, with a few subtleties as shown in the next example.
Example: How Many Digits are in fbmi
?
We can convert the numeric vector, fbmi
, into a character vector with
as.character(fbmi)
## [1] "25.1623879871136" "21.3290554422018" "24.4588421417243"
## [4] "24.484141396238" "18.5556551025638" "28.9498062481498"
## [7] "28.1879692416195" "20.677825110357" "26.664295228559"
## [10] "30.0491124146844" "26.0536376395866" "22.6438385926275"
## [13] "24.2612556349581" "22.9106030059567"
These first two elements look a bit different from what we have seen before. When we print fbmi
at the console, we see
head(fbmi)
## a b c d e
## 25.1623879871 21.3290554422 24.4588421417 24.4841413962 18.5556551026
## f
## 28.9498062481
It appears that the first two values of fbmi
are 25.1623879871 and 21.3290554422, not 25.1623879871136 and 21.3290554422018.
This is simply a matter of controlling the number of digits printed to the console. If we want more digits printed for fbmi
, then we can explicitly specify this with the print()
function, e.g.,
print(fbmi, digits = 22)
## a b c
## 25.16238798711363244820 21.32905544220179194781 24.45884214172427206790
## d e f
## 24.48414139623799457013 18.55565510256377947940 28.94980624814977687720
## g h i
## 28.18796924161951622523 20.67782511035704828828 26.66429522855900202671
## j k l
## 30.04911241468440863400 26.05363763958658296360 22.64383859262751741426
## m n
## 24.26125563495805081971 22.91060300595674448232
Now we have more digits than in the character strings! We discuss the topic of how numbers are represented in R in xref linkend=“chap:shell”. For now, we simply recognize that what prints to the console may be different from the actual values in the vector.
We also mention that if we want to change the default number of digits printed for all subsequent computations in our R session, then we can do this through the options()
function, e.g.,
options(digits = 12)
fbmi
## a b c d e
## 25.1623879871 21.3290554422 24.4588421417 24.4841413962 18.5556551026
## f g h i j
## 28.9498062481 28.1879692416 20.6778251104 26.6642952286 30.0491124147
## k l m n
## 26.0536376396 22.6438385926 24.2612556350 22.9106030060
This example shows us that we should be careful when reading and comparing numeric values because what is printed at the console is not necessarily the actual value in a vector.
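The same caution applies when we compare numbers. The sketch below uses a made-up value x rather than the family data, together with the base R functions format() and all.equal(), which are not introduced in the text above:

```r
x = 2 / 3

format(x, digits = 3)   # "0.667"
format(x, digits = 15)  # "0.666666666666667"

x == 0.667  # FALSE: the stored value has many more digits than 0.667
isTRUE(all.equal(x, 0.667, tolerance = 1e-3))  # TRUE: equal up to a tolerance
```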
Converting Factors
Factors are a special data type and coercion of a factor has special behaviors. Recall that a factor consists of integer values where each integer represents a level, and a level is associated with a character string. When we coerce a factor into a number, we get the integer value of the level. However, when we coerce a factor into a character string, then the return value is the label for the level. We demonstrate with fsex
:
as.numeric(fsex)
## [1] 2 1 2 2 1 1 2 1 2 2 1 2 2 1
When we performed the relational operation, fsex == "f"
in xref linkend=“ex:compoundLogical”, R implicitly converted fsex
to a character vector before performing the comparison.
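Both conversions can be seen side by side on small made-up factors. The second factor, f, is a hypothetical example showing why the distinction matters when the labels themselves look like numbers:

```r
sex = factor(c("m", "f", "m"))  # made-up stand-in for fsex
as.numeric(sex)                 # 2 1 2: the integer codes
as.character(sex)               # "m" "f" "m": the labels

f = factor(c("10", "20", "10", "30"))  # labels that look like numbers
as.numeric(f)                          # 1 2 1 3: codes, not the labels
as.numeric(as.character(f))            # 10 20 10 30: the labelled values
```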
Implicit Coercion
Implicit coercion occurs when we operate on a vector in a way that is not intended for its type. For example, if we add 1
to a logical vector, then the logical values are converted to 0s and 1s implicitly, and 1
is added to each element.
1 + foverWt
## [1] 2 1 1 1 1 2 2 1 2 2 2 1 1 1
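As another example, when we use the relational operator > to compare the values in fweight to the string "150", R does not convert "150" to a number. Instead, it converts the numbers in fweight to character strings and compares them as strings, which is why the 5th weight, 98, is reported below as greater than "150"; that is,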
fweight > '150'
## [1] TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
## [12] FALSE FALSE FALSE
In general, with the exception of adding 0 or 1 to a logical vector to convert it to numeric, it’s best to avoid implicit coercion. It can produce nonsensical and unexpected results, if we don’t understand well enough how coercion works. For example, can you figure out why the following comparison yields all FALSE
s?
fweight > "abc"
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [12] FALSE FALSE FALSE
To help you, examine the built-in vector letters
and see if you can make sense of the comparison,
letters > "c"
## [1] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [12] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [23] TRUE TRUE TRUE TRUE
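One way to check the reasoning behind these comparisons is to make the conversion explicit. In this sketch, w is a short made-up vector of weights; when a number is compared to a string, R compares character strings, so the result depends on how the characters sort rather than on numeric size:

```r
w = c(175, 98, 140)  # made-up weights

w > "150"                # TRUE TRUE FALSE: compared as "175", "98", "140"
as.character(w) > "150"  # the same comparison, written explicitly
w > as.numeric("150")    # TRUE FALSE FALSE: a numeric comparison
"98" > "abc"             # FALSE: digits sort before letters
```

Converting the string ourselves with as.numeric() keeps the comparison numeric and avoids the surprise.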
3.4 Creating Vectors
We have already used a few of R’s functions for creating vectors when we demonstrated how to subset by position and exclusion. In those examples, we saw that we can concatenate values together into a vector with the c()
function and we can create simple sequences of integers with the : operator.
3.4.1 Concatenating Values into a Vector
The notion of concatenation was first introduced in linkend=“ex:subsetBMIposition”. There we saw how to use the c()
function to take one or more values and put them into a new vector. These values do not need to be integers, or even numbers. For example,
y = c(1.2, 4.5, 3.2)
y
## [1] 1.2 4.5 3.2
c(TRUE, FALSE, FALSE, TRUE)
## [1] TRUE FALSE FALSE TRUE
c("Tim", "Jessica", "Jo", "Isabella")
## [1] "Tim" "Jessica" "Jo" "Isabella"
Additionally, if we want to name the elements in the vector, we can do this with, e.g.,
c(bob = TRUE, x = FALSE, mm = TRUE)
## bob x mm
## TRUE FALSE TRUE
When we concatenate integers into a vector do we obtain an integer vector in return? Well, in R, all numbers that we type at the console are made into real numbers. This means that when we type c(1, 0, 200)
, we get a numeric vector because the individual values are treated as numeric. However, if we place an L after each number, then we get an integer vector:
x = c(1L, 0L, 200L)
class(x)
## [1] "integer"
The c()
function can be used to concatenate vectors, e.g., z = c(y, x)
yields the numeric vector 1.2 4.5 3.2 1.0 0.0 200.0
. In this example, the integer vector x
is converted to numeric in the concatenation. More on the topic of conversion/coercion in xref linkend=“sec:Coerce”.
3.4.2 Creating Sequences of Values
The :
operator is a built-in syntax for creating an integer sequence, e.g.,
3:5
## [1] 3 4 5
class(3:5)
## [1] "integer"
The sequence can include negative integers and it can decrease rather than increase. For example,
-6:2
## [1] -6 -5 -4 -3 -2 -1 0 1 2
The :
operator can also be used to create a sequence of 1-apart numerics, e.g., the expression 1.7:6.5
returns the numeric vector 1.7 2.7 3.7 4.7 5.7
. Notice that 5.7 is the largest value in the form \(1.7 + n\) that is less than or equal to 6.5.
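A few more calls illustrate decreasing sequences and the type of the result. This is a minimal sketch using only the : operator and class():

```r
5:1             # 5 4 3 2 1: a decreasing sequence
2:-2            # 2 1 0 -1 -2
class(5:1)      # "integer"
1.7:6.5         # 1.7 2.7 3.7 4.7 5.7
class(1.7:6.5)  # "numeric"
```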
The :
operator is a very specific and simple version of the more general seq()
function. With seq()
, we can create sequences with strides other than 1.
The arguments to the seq()
function include from
, to
, by
and length.out
. They are not all required; all we need to provide is enough information to uniquely specify the sequence. This can be: from
, to
, and length.out
; from
, length.out
, and by
; from
, to
, and by
; or to
, length.out
, and by
. Below are examples of all of these possibilities:
seq(from = 1, to = 6, by = 2)
## [1] 1 3 5
seq(from = 1, by = 2, length.out = 3)
## [1] 1 3 5
seq(from = 1, to = 6, length.out = 3)
## [1] 1.0 3.5 6.0
seq(to = 6, by = 2, length.out = 3)
## [1] 2 4 6
We do not get the same sequence for each function call, even though we have used the same values for the arguments. That is, we have called seq()
with all 4 possible subsets of 3 arguments from from = 1
, to = 6
, length.out = 3
, and by = 2
. The reason for this is that the to
argument is taken to be the end value of the sequence when accompanied with the length.out
argument. That is, with from = 1
, to = 6
, and length.out = 3
, in order for the three elements to be equi-spaced and the beginning and end of the sequence to be 1 and 6, the middle element must be 3.5. Likewise, with to = 6
, length.out = 3
, and by = 2
, the last element is 6 and working back by 2s, we get 2, 4, and 6 as our sequence. When we supply all 4 arguments, we get an error. Try it and see what the error message is. Does it make sense?
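If we want to try the four-argument call inside a script without stopping it, one cautious option is to wrap the call in tryCatch(), a base R function not otherwise used in this chapter. This sketch only captures the message; the variable name res is ours:

```r
# Capture the error message instead of halting.
res = tryCatch(
  seq(from = 1, to = 6, by = 2, length.out = 3),
  error = function(e) conditionMessage(e)
)

res  # the error message, as a character string
```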
3.4.3 Functions for Creating Vectors
One function that can be very useful in creating vectors is rep()
. Short for repeat, this function repeats the elements of a vector to create a new vector. The two arguments times
and each
perform this repetition differently. For example, recall that x is the integer vector with values 1 0 200, so
x
## [1] 1 0 200
rep(x, times = 2)
## [1] 1 0 200 1 0 200
rep(x, each = 2)
## [1] 1 1 0 0 200 200
We can also supply a vector to the times
argument that is the same length as x
, where the elements in this vector indicate the number of times to repeat each element in x
. For example, rep(x, times = c(2, 0, 3))
returns a vector of length 5 with values 1 1 200 200 200
.
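These variations can be run directly. The sketch below re-creates x so that it is self-contained; it also uses the length.out argument of rep(), which the text above does not discuss:

```r
x = c(1L, 0L, 200L)

rep(x, times = c(2, 0, 3))   # 1 1 200 200 200
rep(x, each = 2, times = 2)  # 1 1 0 0 200 200 1 1 0 0 200 200
rep(x, length.out = 5)       # 1 0 200 1 0
```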
Other functions that offer general facilities for operating on vectors include sort()
, rev()
, and order()
. These functions perform the computations that their names suggest. The sort()
function sorts the elements of a vector into ascending or descending order, and rev()
reverses the order of the elements. For example, recall z
contains 1.2 4.5 3.2 1.0 0.0 200.0
, in that order, then
sort(z)
## [1] 0.0 1.0 1.2 3.2 4.5 200.0
rev(z)
## [1] 200.0 0.0 1.0 3.2 4.5 1.2
The order()
function returns a vector of positions. These positions can be used to re-order elements to obtain, e.g., the 2 smallest values in z
.
order(z)
## [1] 5 4 1 3 2 6
z[ order(z)[1:2] ]
## [1] 0 1
Let’s examine this last expression carefully. We know that the return value from order(z)
is the vector 5 4 1 3 2 6
. This indicates that the smallest value in z
is the 5th element, the next smallest is the 4th element, and so on. When we subset this return value, i.e., this vector of positions, with 1:2
, we obtain the first two elements, 5 4
. We use these to subset z
by position to get the two smallest values in z
. Of course, we could also call sort(z)[1:2]
to obtain the same results.
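Where order() has an advantage over sort() is in re-ordering one vector according to the values in another. A sketch, assuming two short parallel vectors nm and wt whose names and values are chosen for illustration:

```r
nm = c("Tom", "Maya", "Joe", "Robert")  # illustrative names
wt = c(175, 124, 185, 156)              # illustrative matching weights

ord = order(wt)  # 2 4 1 3: positions from lightest to heaviest
nm[ord]          # "Maya" "Robert" "Tom" "Joe"
wt[ord]          # 124 156 175 185

order(wt, decreasing = TRUE)  # 3 1 4 2
```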
3.4.4 Functions for Manipulating Character Vectors
For character vectors, the paste()
function is convenient for combining strings together, e.g., we can create one long string of all the names in fnames
with
paste(fnames, collapse = "*")
## [1] "Tom*Maya*Joe*Robert*Sue*Liz*Jon*Sally*Tim*Tom*Ann*Dan*Art*Zoe"
Note that the collapse
parameter is used to specify what, if any, character(s) to place between the strings being pasted together.
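The paste() function also has a sep argument, not shown above, which controls what goes between the pieces when several vectors are pasted element by element. A minimal sketch with made-up inputs:

```r
paste("Tom", "Maya")                  # "Tom Maya": the default separator is a space
paste("Tom", "Maya", sep = "-")       # "Tom-Maya"
paste(c("a", "b"), 1:2, sep = "")     # "a1" "b2": pasted element by element
paste(c("a", "b"), 1:2, sep = "", collapse = "+")  # "a1+b2"
```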
The function strsplit()
splits strings by user-specified delimiters. For example, we can split the names in fnames
on the letter o with
strsplit(fnames[1:2], "o")
## [[1]]
## [1] "T" "m"
##
## [[2]]
## [1] "Maya"
We see that Tom
is split into two pieces, the T
before the o and the m
after it. Whereas, Maya
is not split at all because the string does not contain an o. If we split the name Yamomoto
on o
then we would get three strings, "Yam" "m" "t"
.
The substr()
function can extract a piece of a string, e.g.,
substr("Yamomoto", start = 3, stop = 7)
## [1] "momot"
In addition, we can match and substitute text in strings using regular expressions. These capabilities are available in the functions grep()
, gsub()
and others.
See xref linkend=“chap:regexpr” for more details on string manipulation and regular expressions.
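As a brief taste of these functions, the sketch below searches for and replaces the letter o in a few of the family names. The vector nm is typed in by hand, and grepl() is a base R relative of grep() that the text does not mention:

```r
nm = c("Tom", "Maya", "Joe", "Robert")

grep("o", nm)                # 1 3 4: positions of the names containing "o"
grep("o", nm, value = TRUE)  # "Tom" "Joe" "Robert"
grepl("o", nm)               # TRUE FALSE TRUE TRUE
gsub("o", "0", nm)           # "T0m" "Maya" "J0e" "R0bert"
```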
3.4.5 Creating Vectors from Different Types
We have emphasized that vectors must contain the same type of elements. If we try to combine different types of elements, R coerces them to an appropriate common type. That is, we cannot use a vector to store values of different types, such as number and strings. However, we can concatenate numbers and strings together into a vector, and when we do, the numbers are converted to strings so that we have a character vector. For example,
c("Hi", -2, "Bye", 10.3, 0)
## [1] "Hi" "-2" "Bye" "10.3" "0"
Below are a few other examples of how R coerces values of different types into a single type:
c(1, 2, 3, TRUE)
## [1] 1 2 3 1
c(1L, 2L, 3L, 10.7)
## [1] 1.0 2.0 3.0 10.7
We see that when combining numbers and logicals, the logicals are coerced to 0s and 1s, and when concatenating integers and numerics, the integers are converted to numeric. You can try combining different elements and see what you get. For example, what do you think happens when you combine strings and logicals? And, how are the values coerced when we concatenate strings, logicals, and numeric values? Try it and see.
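For readers who want to check their answers, here is a short sketch of such combinations, with class() used to confirm the resulting type. The values are made up:

```r
c("a", TRUE)        # "a" "TRUE": logicals become strings
c("a", TRUE, 1.5)   # "a" "TRUE" "1.5": everything becomes a string

class(c(TRUE, 2L))  # "integer"
class(c(2L, 1.5))   # "numeric"
class(c(1.5, "a"))  # "character"
```

In each case the result takes the most general of the types involved, in the order logical, integer, numeric, character.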