August 2013, UC Berkeley
Jacob Lynn
if-else statements
val <- rnorm(1)
val
## [1] -0.8455
if (val < 0) {
    "val is negative!"
} else {
    "val is positive"
}
## [1] "val is negative!"
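In R, if is an expression that returns a value, which is why the chosen string is printed above. A small sketch (label and noBranch are illustrative names): the result of if can be assigned directly, and an if with no else whose condition is FALSE returns NULL.

```r
val <- -0.8455

# if/else returns the value of whichever branch runs
label <- if (val < 0) "negative" else "non-negative"
label
## [1] "negative"

# With no else branch and a FALSE condition, if returns NULL (invisibly)
noBranch <- if (val > 0) "positive"
is.null(noBranch)
## [1] TRUE
```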
Chaining if statements:
val <- rnorm(1)
val
## [1] 0.1711
if (val < -1) {
    "val is more than one standard deviation below the mean."
} else if (abs(val) <= 1) {
    "val is within one standard deviation of the mean."
} else {
    "val is more than one standard deviation above the mean."
}
## [1] "val is within one standard deviation of the mean."
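if() tests a single TRUE/FALSE value; since R 4.2.0, a condition of length greater than one is an error (older versions warned and used only the first element). To classify every element of a vector at once, the vectorized ifelse() can be nested to mirror the chain above. A sketch with made-up values:

```r
vals <- c(-2.3, 0.4, 1.7)

# ifelse(test, yes, no) is evaluated element by element
ifelse(vals < 0, "negative", "non-negative")
## [1] "negative"     "non-negative" "non-negative"

# Nesting ifelse() reproduces the three-way if / else if / else chain
ifelse(vals < -1, "below",
       ifelse(abs(vals) <= 1, "within", "above"))
## [1] "below"  "within" "above"
```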
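Zero evaluates to FALSE and all other non-missing numbers evaluate to TRUE. Strings are stricter: "true", "TRUE", "True" and "T" evaluate to TRUE, "false", "FALSE", "False" and "F" evaluate to FALSE, and any other string causes an error.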
val <- 3.1
if (val) {
    "3.1 is true?"
}
## [1] "3.1 is true?"
if ("true") {
    "true is true?"
}
## [1] "true is true?"
if ("bear") {
    "bear is true?"
}
## Error: argument is not interpretable as logical
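To make these rules visible, the sketch below uses as.logical(), which applies essentially the same conversion that if() does: numbers are TRUE unless they are zero, only a handful of spellings of true/false are recognized among strings, and anything else becomes NA. Where as.logical() returns NA, if() instead stops with an error; tryCatch() is used here only so the example keeps running past those errors.

```r
as.logical(c(0, 1, -2.5, 3.1))
## [1] FALSE  TRUE  TRUE  TRUE

as.logical(c("true", "TRUE", "T", "false", "F", "bear"))
## [1]  TRUE  TRUE  TRUE FALSE FALSE    NA

# Capture the error messages instead of stopping
naMsg <- tryCatch(if (NA) "never", error = function(e) conditionMessage(e))
naMsg
## [1] "missing value where TRUE/FALSE needed"

bearMsg <- tryCatch(if ("bear") "never", error = function(e) conditionMessage(e))
bearMsg
## [1] "argument is not interpretable as logical"
```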
One of the major reasons why programming languages exist is to automate away annoying or tedious tasks. Loops are a basic way to do that.
But be careful: frequently there is a more idiomatic R approach (such as a built-in or vectorized function) that does the same thing with less code and usually runs faster.
for loop
Abstract structure of a for loop:
for (variable in sequence) {
    statement
}
More concretely:
myseq <- seq(5, 20, by = 5)
for (i in myseq) {
    print(i)
}
## [1] 5
## [1] 10
## [1] 15
## [1] 20
And more directly:
for (i in seq(2, 8, by = 2)) {
    print(i)
}
## [1] 2
## [1] 4
## [1] 6
## [1] 8
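The sequence in a for loop can be any vector, not only numbers. When looping over positions, seq_along(x) is safer than 1:length(x): for an empty vector, 1:length(x) is c(1, 0), so the loop body would run twice, whereas seq_along(x) is empty and the body never runs. A small sketch (fruits, labels and empty are made-up names):

```r
fruits <- c("apple", "banana", "cherry")

# Loop directly over the elements
for (fruit in fruits) {
    print(fruit)
}

# Loop over positions when the index itself is needed
labels <- character(length(fruits))
for (i in seq_along(fruits)) {
    labels[i] <- paste(i, fruits[i])
}
labels
## [1] "1 apple"  "2 banana" "3 cherry"

# With an empty vector, seq_along() yields no iterations at all
empty <- character(0)
count <- 0
for (i in seq_along(empty)) {
    count <- count + 1
}
count
## [1] 0
1:length(empty)
## [1] 1 0
```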
while loop
Abstractly:
while (condition) {
    statements
}
Concretely:
i <- 0
while (i < 10) {
    i <- i + 1
    print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
It's easy to create infinite loops!
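An infinite loop usually means the condition never becomes FALSE, for example because the loop variable is never updated. An interrupt (Esc in RStudio, Ctrl-C in a terminal R session) stops it. One defensive pattern, sketched below with a hypothetical iteration cap maxIter, is to count iterations and stop() with a message once the cap is exceeded:

```r
# Buggy version (do not run): i is never incremented, so i < 10 stays TRUE
# i <- 0
# while (i < 10) {
#     print(i)
# }

# Guarded version: maxIter is an arbitrary, hypothetical safety cap
i <- 0
iter <- 0
maxIter <- 1000
while (i < 10) {
    iter <- iter + 1
    if (iter > maxIter) {
        stop("loop ran more than maxIter times; check the loop condition")
    }
    i <- i + 1
}
i
## [1] 10
iter
## [1] 10
```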
First 12 Fibonacci numbers:
myseq <- numeric(12)  # start from a fresh vector of length 12
myseq[1] <- 0
myseq[2] <- 1
for (i in seq(3, 12)) {
    myseq[i] <- myseq[i - 2] + myseq[i - 1]
}
myseq
## [1] 0 1 1 2 3 5 8 13 21 34 55 89
Fibonacci numbers less than 500:
myseq <- c(0, 1)  # start fresh so no values from the previous example remain
i <- 2
currentVal <- 1
while (currentVal < 500) {
    myseq[i + 1] <- currentVal
    currentVal <- myseq[i] + myseq[i + 1]
    i <- i + 1
}
myseq
## [1] 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377
next and break statements
next skips the rest of the current iteration and continues with the next one:
for (i in seq(1, 10)) {
    if (i == 5) {
        next
    }
    print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
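A common use of next is skipping values that should not be processed, such as missing values. The sketch below (with made-up data) sums the non-missing entries with a loop; the built-in sum() with na.rm = TRUE does the same in one call.

```r
x <- c(4, NA, 7, NA, 1)

total <- 0
for (v in x) {
    if (is.na(v)) {
        next  # skip missing values and continue with the next element
    }
    total <- total + v
}
total
## [1] 12

# Built-in equivalent
sum(x, na.rm = TRUE)
## [1] 12
```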
break immediately exits the loop:
val <- 0
i <- 0
while (TRUE) {
    i <- i + 1
    val <- val + rnorm(1)  # random walk: add a standard normal step
    if (abs(val) > 3) {
        break
    }
}
val
## [1] -3.137
i
## [1] 22
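break (and likewise next) only affects the innermost loop that contains it. In the sketch below (a made-up example), break ends the inner loop over j as soon as j exceeds i, but the outer loop over i carries on:

```r
pairs <- character(0)
for (i in 1:3) {
    for (j in 1:3) {
        if (j > i) {
            break  # leaves the j loop only
        }
        pairs <- c(pairs, paste0(i, "-", j))
    }
}
pairs
## [1] "1-1" "2-1" "2-2" "3-1" "3-2" "3-3"
```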
Loops are frequently the wrong solution.
vals <- rnorm(100)
maxVal <- vals[1]
for (val in vals) {
    if (val > maxVal) {
        maxVal <- val
    }
}
maxVal
## [1] 1.838
Try to use built-in functions instead:
max(vals)
## [1] 1.838
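Many other loop patterns have built-in counterparts. The sketch below uses made-up values; its last part rewrites the random-walk break example above without a loop, using cumsum() for the running total and which() to find the first step where |val| > 3 (firstExit plays the role of i). set.seed() is added only to make the draw reproducible, so its numbers will differ from the output shown earlier.

```r
vals <- c(3, -1, 4, -1, 5)

cumsum(vals)     # running total
## [1]  3  2  6  5 10
cummax(vals)     # running maximum
## [1] 3 3 4 4 5
which.max(vals)  # position of the (first) maximum
## [1] 5
range(vals)      # minimum and maximum
## [1] -1  5

# Loop-free random walk: generate many steps at once, then find the
# first position where the walk is more than 3 away from 0
set.seed(1)
walk <- cumsum(rnorm(1000))
firstExit <- which(abs(walk) > 3)[1]  # NA if the walk never gets that far
firstExit
walk[firstExit]
```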
Operate directly on vectors rather than looping over them:
myseq <- seq(2, 20, by = 2)
for (i in seq(1, 10)) {
    myseq[i] <- myseq[i] + 2
}
myseq
## [1] 4 6 8 10 12 14 16 18 20 22
seq(2, 20, by = 2) + 2
## [1] 4 6 8 10 12 14 16 18 20 22
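Vectorized code is shorter (although sometimes more opaque) and, in R specifically, usually much faster, because the looping happens inside compiled code rather than in the R interpreter.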
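A rough way to see the speed difference is system.time(). The sketch below (sizes and names are arbitrary) adds 2 to a million numbers with a preallocated loop and with a vectorized expression, and checks that both give the same answer. Exact timings depend on the machine and R version, so none are shown; the vectorized version is typically much faster.

```r
n <- 1e6
x <- as.numeric(seq_len(n))

# Loop version: preallocate the result, then fill it element by element
loopResult <- numeric(n)
loopTime <- system.time({
    for (i in seq_len(n)) {
        loopResult[i] <- x[i] + 2
    }
})

# Vectorized version: one expression over the whole vector
vecTime <- system.time(vecResult <- x + 2)

all.equal(loopResult, vecResult)
## [1] TRUE
loopTime["elapsed"]
vecTime["elapsed"]
```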
Functions: take arguments as input, (usually) return values as output.
Why define your own functions? Primary reasons:
Remember this?
val <- 3.1
if (val) {
    "3.1 is true?"
}
## [1] "3.1 is true?"
if ("true") {
    "true is true?"
}
## [1] "true is true?"
if ("bear") {
    "bear is true?"
}
## Error: argument is not interpretable as logical
isItTrue <- function(val) {
    if (val) {
        return(paste(val, "is true"))  # explicit return
    } else {
        paste(val, "ain't true")  # the last evaluated expression is returned implicitly
    }
}
isItTrue(3.1)
## [1] "3.1 is true"
isItTrue("false")
## [1] "false ain't true"
isItTrue("bear")
## Error: argument is not interpretable as logical
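Note: invisible() returns a value without auto-printing it, so calling histNormal() on its own only draws the histogram, while assigning the result keeps the maximum:
histNormal <- function(N) {
    vals <- rnorm(N)
    hist(vals)
    invisible(max(vals))
}
histNormal(1000)
maxVal <- histNormal(1000)  # a name other than max avoids confusion with the built-in max()
maxVal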
## [1] 3.213
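invisible() does not discard the value; it only turns off automatic printing. A plot-free sketch with a hypothetical function quietSquare(): wrapping a call in parentheses forces printing, and withVisible() reports whether a result would be auto-printed.

```r
quietSquare <- function(x) {
    invisible(x^2)  # returned, but not printed automatically
}

quietSquare(4)       # prints nothing at the console
y <- quietSquare(4)  # the value is still returned and can be stored
y
## [1] 16
(quietSquare(4))     # parentheses force the value to be printed
## [1] 16
withVisible(quietSquare(4))$visible
## [1] FALSE
```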
newFunction <- function(num, threshold = 0, modifier = 2) {
    if (num < threshold) {
        return(num / modifier)
    } else {
        return(num * modifier)
    }
}
newFunction(2.6)
## [1] 5.2
Unnamed arguments are matched by position, from left to right:
newFunction(2.6, 3)
## [1] 1.3
newFunction(2.6, 3, 1.3)
## [1] 2
But we can explicitly specify which argument is which:
newFunction(2.6, modifier = 1.3, threshold = 3)
## [1] 2
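Named and positional matching can be mixed: R first matches arguments given by exact name, then by unique partial name, and finally assigns the remaining unnamed arguments by position. A sketch reusing newFunction() (redefined here so the block runs on its own):

```r
newFunction <- function(num, threshold = 0, modifier = 2) {
    if (num < threshold) {
        return(num / modifier)
    } else {
        return(num * modifier)
    }
}

# threshold is named, so 2.6 is matched by position to num
newFunction(threshold = 3, 2.6)
## [1] 1.3

# "mod" is a unique prefix of modifier, so it partially matches;
# 2.6 and 3 then fill num and threshold by position
newFunction(2.6, mod = 1.3, 3)
## [1] 2
```

Partial matching saves typing, but it can silently change meaning if a function later gains another argument with the same prefix, so full names are the safer habit in scripts.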
Functions such as sapply() also pass any extra arguments on to the function they apply, so the other arguments of newFunction() can be set there too:
hist(sapply(rnorm(10000), newFunction), breaks = 60, freq = FALSE)
hist(sapply(rnorm(10000), newFunction, modifier = 1), breaks = 60, freq = FALSE)
The ... argument
args(hist)
## function (x, ...)
## NULL
histNormalWrapper <- function(N, ...) {
    vals <- rnorm(N)
    hist(vals, ...)  # any extra arguments are passed on to hist()
}
histNormalWrapper(1000)
histNormalWrapper(1000, breaks = 50)
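A non-graphical sketch of the same idea, with hypothetical helper names: shout() forwards everything in ... to paste(), and describeDots() shows that list(...) collects the extra arguments, with their names, inside the function.

```r
# Forward all extra arguments to paste(), then upper-case the result
shout <- function(...) {
    toupper(paste(...))
}
shout("hello", "world", sep = "-")
## [1] "HELLO-WORLD"

# Inspect what was passed through ...
describeDots <- function(...) {
    args <- list(...)  # named list of everything passed in ...
    paste(names(args), "=", unlist(args), collapse = ", ")
}
describeDots(breaks = 50, col = "grey")
## [1] "breaks = 50, col = grey"
```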
Define throwaway functions on the fly:
hist(sapply(rnorm(10000), function(x) {
    x * 3
}), breaks = 60, freq = FALSE)
(Note that a function is just an object like any other R object -- that means that it can be passed as an argument to other functions like sapply().)
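A short sketch of what "a function is just an object" allows (square, ops and applyTwice are made-up names): functions can be stored in a list, mapped over with sapply(), and passed as arguments to other functions.

```r
square <- function(x) x^2

# Functions can be stored in a list like any other value...
ops <- list(square = square, double = function(x) 2 * x, root = sqrt)
results <- sapply(ops, function(f) f(9))
results  # named vector: square = 81, double = 18, root = 3

# ...and passed as arguments to other functions
applyTwice <- function(f, x) f(f(x))
applyTwice(square, 3)
## [1] 81
```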
Scope refers to which variables a given piece of code can access.
R uses lexical scoping: when a function uses a variable it did not define itself, R looks for it in the environment where the function was defined, not in the environment it was called from.
Scoping is a hard topic... maybe best understood by example:
a <- 1
b <- 2
f <- function(x) {
    if (x == "a") {
        a
    } else if (x == "b") {
        b
    }
}
g <- function(x) {
    a <- 2
    b <- 1
    f(x)
}
g("a")
## [1] 1
a
## [1] 1
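(global vs. local scope: f() was defined at the top level, so it sees the global a, not the a <- 2 created inside g(); that local a also leaves the global a unchanged.)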
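For contrast, here is a sketch (g2 and f2 are made-up names) in which the inner function is defined inside the calling function. Because f2 is created inside g2, lexical scoping makes it find g2's local a, while the global a is still untouched:

```r
a <- 1

g2 <- function() {
    a <- 2          # local to g2
    f2 <- function() {
        a           # not defined in f2, so looked up where f2 was defined: inside g2
    }
    f2()
}

g2()
## [1] 2
a
## [1] 1
```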
Exercises:
1. Write a function that returns the mean of N normally distributed random numbers.
2. Extend exercise #1: allow the user to pass the mean and standard deviation of the normal distribution (but provide default values in case they don't), and return both the mean and the median of the generated random numbers.
3. Write a function that repeatedly generates random normal variables until it generates a random number more than N standard deviations from the mean. Return the number of samples generated up to that point.