# Standard Error

The expected value of a random variable is like the mean of a list: it is a measure of location, a typical value. This chapter introduces a measure of spread for random variables, the standard error (SE). The SE of a random variable is like the SD of a list; both are measures of spread. The SE measures the width of the probability histogram of a random variable, just as the SD measures the width of the histogram of a list. Any list can be written as the mean of the list plus a list of deviations from the mean; the SD of the list is the square-root of the mean of the squares of those deviations (the rms of the deviations). Similarly, any random variable can be written as its expected value plus chance variability, a random departure from its expected value. The SE of the random variable is the square-root of the expected value of the square of the chance variability.

Loosely speaking, if a random variable is likely to be very close to its expected value, its SE is small, while if a random variable is likely to differ substantially from its expected value, its SE must be large. (Chebychev's inequality for random variables makes this precise.) Quantifying the expected size of the chance variability will be important in later chapters for measuring the accuracy of estimators of numerical properties of a population or a probability distribution, which are called parameters. This chapter presents the tools we need to find the SE of discrete random variables, and calculates the SEs of some common random variables.

## The Expected Value of Transformations

The SE is calculated from the expected value of the square of the chance variability (the SE is its square-root). The square of the chance variability is a random variable; this section shows how to calculate the expected value of such transformations of random variables.

A transformation or function of a random variable is another random variable.

We saw previously that the expected value of an affine transformation of a random variable is just the same affine transformation applied to the expectation of the random variable. For transformations that are not affine, the situation is a bit more complicated. Suppose that the discrete random variable Y is defined in terms of the discrete random variable X, so that Y = g(X) for some known function g. There are two ways to calculate E(Y), the expected value of Y:

• Work directly from the definition of the expected value: If the possible values of Y are $y_1, y_2, y_3, \ldots$, then

$$E(Y) = y_1 \times P(Y = y_1) + y_2 \times P(Y = y_2) + y_3 \times P(Y = y_3) + \cdots$$

• Use the following shortcut: If the possible values of X are $x_1, x_2, x_3, \ldots$, then

$$E(Y) = g(x_1) \times P(X = x_1) + g(x_2) \times P(X = x_2) + g(x_3) \times P(X = x_3) + \cdots$$

The following example illustrates calculating the expected value of the square of a random variable using the shortcut.

What is the expected value of the square of a binomial random variable X with parameters n=3 and p=50% (for example, the number of heads in 3 independent tosses of a fair coin)?

Solution. The question asks us to compute E(g(X)) where $g(x) = x^2$. The table below essentially contains the calculation, using the "shortcut" method presented above: $E(X^2)$ is the sum of $x^2$ times the chance that X takes the value x, over all possible values x that X can take. The first column of the table lists the possible values x of X, which are {0, 1, 2, 3}. The second column lists the probabilities of each of those values; the first two columns comprise the probability distribution of X. The third column gives the values of the function $g(x) = x^2$ for each possible value x of X. The fourth column lists the products $g(x) \times P(X = x)$, for x = 0, 1, 2, 3. The sum of the entries in the fourth column, 24/8 = 3, is the expected value of $g(X) = X^2$. Note that the expected value of the square of X, $E(X^2) = 3$, is not equal to the square of the expected value of X, $(E(X))^2 = (3/2)^2 = 2\tfrac{1}{4}$. Only if g is an affine transformation is E(g(X)) necessarily equal to g(E(X)), no matter what the probability distribution of the random variable X might be.

Calculation of the expected value of $X^2$

| x | P(X = x) | $x^2$ | $x^2 \times P(X = x)$ |
|---|---|---|---|
| 0 | 1/8 | 0 | 0 |
| 1 | 3/8 | 1 | 3/8 |
| 2 | 3/8 | 4 | 12/8 |
| 3 | 1/8 | 9 | 9/8 |
| sum | 100% | | 24/8 = 3 |
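The two methods can be checked against each other numerically. The sketch below is only an illustration: the helper names (`binomial_pmf`, `expected_value`, `distribution_of`) are not from the text. It computes $E(X^2)$ for the binomial example both by the shortcut and directly from the distribution of $Y = X^2$, using exact fractions.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """Return {x: P(X = x)} for a binomial random variable with parameters n and p."""
    return {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

def expected_value(pmf, g=lambda x: x):
    """Shortcut method: sum g(x) * P(X = x) over the possible values x of X."""
    return sum(g(x) * prob for x, prob in pmf.items())

def distribution_of(pmf, g):
    """Probability distribution of Y = g(X), needed for the direct method."""
    out = {}
    for x, prob in pmf.items():
        out[g(x)] = out.get(g(x), 0) + prob
    return out

def square(x):
    return x * x

pmf_x = binomial_pmf(3, Fraction(1, 2))
e_shortcut = expected_value(pmf_x, square)                 # sum of x^2 * P(X = x)
e_direct = expected_value(distribution_of(pmf_x, square))  # sum of y * P(Y = y)
e_x = expected_value(pmf_x)

print(e_shortcut, e_direct, e_x**2)
```

Both methods give the same value for $E(X^2)$, and that value differs from $(E(X))^2$, as the example notes.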


## Standard Error (SE) of a Random Variable

Just as the SD of a list is the rms of the differences between the members of the list and the mean of the list, the standard error (SE) of a random variable is the (weighted) rms of the differences between the possible values of the random variable and the expected value of the random variable. (The weights are the probabilities, just as in the definition of the expected value.) The SE of a random variable is a measure of how "spread out" its probability is over its set of possible values, just as the SD of a list is a measure of how "spread out" the list is. The SE of a random variable is a measure of the width of its probability histogram; the SD of a list is a measure of the width of its histogram.

Recall that if the SD of a list is zero, all the elements of the list must be equal (to the mean of the list). If the SE of a random variable X, SE(X), is zero, the random variable is (essentially) equal to its expected value. That is, if SE(X) = 0 then P(X = E(X)) = 100%. If X has a nonzero chance of taking each of two or more distinct values, SE(X) must be larger than zero.

One can think of a random variable as being a constant (its expected value) plus a contribution that is zero on average (i.e., its expected value is zero), but that differs randomly from zero. That is, one can think of X as being equal to E(X) + Y, where Y= X − E(X). Then

E(Y) = E(X − E(X)) = E(X) − E(E(X)) = E(X) − E(X) = 0.

The SE of a random variable X is the square-root of the expected value of $(X - E(X))^2$:

$$\mathrm{SE}(X) = \sqrt{E\big((X - E(X))^2\big)}.$$

SE(X) is a measure of the expected distance between X and the expected value of X. The units of SE(X) are the same as the units of X. The SE of a random variable is completely determined by the probability distribution of the random variable: If two random variables have the same probability distribution, they have the same SE. Thus there is no risk of confusion in referring to the SE of a probability distribution versus the SE of a random variable that has that probability distribution.

A random variable is its expected value plus chance variability

Random variable = expected value + chance variability

The expected value of the chance variability is zero.

The "typical size" of the chance variability is the SE of the random variable.

A random variable is typically about equal to its expected value, give or take an SE or so.

The SE of a random variable is the square-root of the expected value of the squared difference between the random variable and the expected value of the random variable.

In symbols,

$$\mathrm{SE}(X) = \sqrt{E\big((X - E(X))^2\big)}.$$

The frequentist interpretation of the standard error is as follows: The SE is the long-run RMS difference between a random variable and its long-run average. Make repeated observations of a random variable. Average the observations. Average the squared difference between the observations and their average, and take the square root. As the number of observations increases, that value converges to the standard error.
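The frequentist recipe can be tried out in a small simulation. The sketch below is an illustration under assumptions that are not in the text: the random variable is one roll of a fair six-sided die, and the seed and numbers of observations are arbitrary. It follows the steps just described: observe repeatedly, average, then take the rms difference from the average.

```python
import random
from math import sqrt

rng = random.Random(0)  # fixed seed, chosen arbitrarily, so the run is reproducible
faces = [1, 2, 3, 4, 5, 6]

# Exact SE of one roll, from the definition.
mean = sum(faces) / 6
se_exact = sqrt(sum((x - mean) ** 2 for x in faces) / 6)

def empirical_rms(n_obs):
    """RMS difference between n_obs observed rolls and their own average."""
    obs = [rng.choice(faces) for _ in range(n_obs)]
    avg = sum(obs) / n_obs
    return sqrt(sum((x - avg) ** 2 for x in obs) / n_obs)

for n_obs in (100, 10_000, 200_000):
    print(n_obs, round(empirical_rms(n_obs), 4), "exact SE:", round(se_exact, 4))
```

With more observations, the empirical value should settle near the exact SE, although any single run can be off by a little.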

The interactive figure allows us to study the sampling distribution of the sample sum, which will help us understand the SE of the sample sum.

For any fixed sample size, the SD of the observed values of the sample sum tends to approach the SE of the sample sum as the number of samples grows.

Experiment with the figure by drawing a large number of samples from different boxes; pay attention to "SD(samples)," which gives the standard deviation of the observed values of the sample sum, each of which is the sum of n draws. For each box, this standard deviation will tend to stabilize after a few thousand samples. It is an empirical estimate of the SE of the sample sum. You should find that it approaches $\sqrt{n} \times \mathrm{SD(box)}$, which is listed as "SE(sum)" at the left side of the figure. Vary the contents of the box and the sample size n to confirm that the SD of the observed values of the sample sum still tends to approach $\sqrt{\text{sample size}} \times \mathrm{SD(box)}$ after several thousand samples. Below we shall verify mathematically that the SE of the sample sum of n draws with replacement is indeed $\sqrt{n} \times \mathrm{SD(box)}$.

Calculating the SE of a discrete random variable from its probability distribution takes two steps.

To find the expected value of X, we need to sum the possible values of X, weighted by their probabilities. In a table that lists each possible value x, its probability P(X = x), and the product x × P(X = x), the sum of the entries in the rightmost column is the expected value of X.

To find the standard error of X, we first sum the values of $(X - E(X))^2$ corresponding to the possible values of X, weighted by the probabilities that X takes each of those values. In a table that lists each possible value x, its probability, and the product $(x - E(X))^2 \times P(X = x)$, the sum of the entries in the rightmost column is the expected value of $(X - E(X))^2$. The square root of the expected value of $(X - E(X))^2$ is the standard error.
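The two-step calculation can be sketched in a few lines. The probability distribution below is made up for illustration (it is not the distribution used in the original example), and the variable names are hypothetical.

```python
from fractions import Fraction
from math import sqrt

# Hypothetical distribution: X is 0, 1, or 4 with chances 1/4, 1/2, and 1/4.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 4: Fraction(1, 4)}

# Step 1: expected value, the sum of x * P(X = x).
ev = sum(x * p for x, p in pmf.items())

# Step 2: expected squared deviation, the sum of (x - E(X))^2 * P(X = x).
expected_sq_dev = sum((x - ev) ** 2 * p for x, p in pmf.items())

# The SE is the square root of the expected squared deviation.
se = sqrt(expected_sq_dev)

print("E(X) =", ev, " E((X - E(X))^2) =", expected_sq_dev, " SE(X) =", se)
```

Keeping the probabilities as exact fractions until the final square root avoids rounding in the intermediate sums.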


## The SE of Transformations of Random Variables

Suppose we know the SE of the random variable X. Can we use that to find the SE of f(X)? That depends on the nature of the function f.

To find the SE of a transformation of a random variable or a collection of random variables, generally one must work from the definition of the SE. However, if f is an affine transformation, SE(f(X)) bears a simple relationship to SE(X). Similarly, the SE of a sum of independent random variables (defined presently) bears a simple relationship to the standard errors of the summed variables. Later in this chapter, we shall use these two results to derive the SE of some random variables from the SE of simpler random variables.

### The Standard Error of an Affine Transformation

A transformation of a random variable is another random variable. Recall that an affine transformation consists of multiplying by a constant, then adding a constant: f(x) = ax + b. The SE of an affine transformation of a random variable is related to the SE of the original variable in a simple way: It does not depend on the additive constant b, just the multiplicative constant a. The SE of the new variable is the absolute value of the multiplicative constant a, times the SE of the original variable.

Standard Error of an Affine Transformation of a Random Variable

If Y = aX + b, where a and b are constants (i.e., if Y is an affine transformation of X), then

SE(Y) = |a|×SE(X).
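The rule can be checked numerically for a particular case. In the sketch below, the distribution of X is the binomial distribution with n = 3 and p = 50% used earlier in the chapter; the constants a = −2 and b = 7 and the helper names are arbitrary choices for illustration.

```python
from fractions import Fraction
from math import sqrt, isclose

def se(pmf):
    """SE computed from the definition, for a distribution given as {value: probability}."""
    ev = sum(x * p for x, p in pmf.items())
    return sqrt(sum((x - ev) ** 2 * p for x, p in pmf.items()))

def affine(pmf, a, b):
    """Distribution of Y = a*X + b (assumes a != 0, so distinct values stay distinct)."""
    return {a * x + b: p for x, p in pmf.items()}

pmf_x = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
a, b = -2, 7

se_x = se(pmf_x)
se_y = se(affine(pmf_x, a, b))

print(se_x, se_y, abs(a) * se_x)
```

The additive constant b shifts every possible value and the expected value by the same amount, so it cancels in each deviation; only |a| rescales the deviations.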

## Independent Random Variables

In general, calculating the SE of a function of two or more random variables is hard. However, if the random variables bear a special relationship to each other, the calculations simplify.

Recall that two events A and B are independent if and only if the chance of the intersection of A and B is the product of the chance of A and the chance of B:

P(AB) = P(A)×P(B).

A random variable X "determines" many events: all the events that can be defined in terms of the value that X takes in the experiment. For example, the event

A = {a < X ≤ b}

is determined by X: A occurs if and only if X happens to be larger than a and no larger than b.

Two random variables, X and Y, are independent if and only if every event determined by X is independent of every event determined by Y. (Actually, it suffices for every event of the form {a<X≤b} to be independent of every event of the form {c<Y≤d}.) If two random variables are not independent, they are dependent. Heuristically, two random variables are independent if knowing the value of one does not help predict the value of the other.

Consider tossing a fair coin 10 times:

• Let X be the number of heads in the first 6 tosses and let Y be the number of heads in the last 4 tosses. Then X and Y are independent: the event that X is in any range of values is independent of the event that Y is in any range of values. Knowing the value of X does not help one predict the value of Y.

• Let X be the number of heads in the first 6 tosses and let Y be the number of heads in the last 5 tosses. Then X and Y are dependent because, for example, the event {5< X ≤ 6} and the event {−1 < Y ≤0} are dependent (in fact, those events are mutually exclusive).

• Let X be the number of heads in the first 6 tosses and let Y be the number of tails in the first 2 tosses. Then X and Y are dependent because, for example, the event {5< X ≤6} and the event {1< Y ≤2} are dependent (in fact, those events are mutually exclusive).

What sorts of experiments lead to independent random variables? Sums and averages of non-overlapping sequences of draws with replacement from a box, non-overlapping sequences of coin tosses, and non-overlapping sequences of die rolls are some examples. The second and third items in the coin-tossing example above show why the sequences must not overlap.

An important fact about independent random variables is that the expected value of a product of independent random variables is the product of their expected values; we shall use this result often.

Expected Value of the Product of Independent Random Variables

If the random variables X and Y are independent, then

E(X×Y) = E(X) × E(Y).

The converse is not true in general: E(X×Y) = E(X) × E(Y) does not imply that X and Y are independent.
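Both statements can be illustrated with small exact calculations. The examples below are assumptions chosen for illustration, not taken from the text: X and Y are two separate rolls of a fair die (independent), while U takes the values −1, 0, and 1 with equal chances and V = U² (dependent, yet the product rule still holds).

```python
from fractions import Fraction
from itertools import product

# Independent case: X and Y are two separate rolls of a fair die.
faces = range(1, 7)
joint = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}
e_x = sum(x * p for (x, y), p in joint.items())
e_y = sum(y * p for (x, y), p in joint.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())

# Counterexample to the converse: U is -1, 0, or 1 with equal chances, and V = U**2.
u_pmf = {u: Fraction(1, 3) for u in (-1, 0, 1)}
e_u = sum(u * p for u, p in u_pmf.items())
e_v = sum(u**2 * p for u, p in u_pmf.items())
e_uv = sum(u * u**2 * p for u, p in u_pmf.items())

# U and V are dependent: V = 0 exactly when U = 0.
p_both_zero = u_pmf[0]             # P(U = 0 and V = 0) = P(U = 0)
p_if_indep = u_pmf[0] * u_pmf[0]   # P(U = 0) * P(V = 0), since P(V = 0) = P(U = 0)

print(e_xy, e_x * e_y)
print(e_uv, e_u * e_v, p_both_zero, p_if_indep)
```

In the second case E(U×V) = E(U)×E(V) even though knowing U determines V completely.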

The fact that the expected value of a product of independent random variables is the product of the expected values of the random variables implies that the SE of a sum of independent random variables is the square root of the sum of the squares of the individual standard errors of the random variables.

The Standard Error of a Sum of Independent Random Variables

If $X_1, X_2, X_3, \ldots, X_n$ are independent random variables, then

$$\mathrm{SE}(X_1 + X_2 + X_3 + \cdots + X_n) = \sqrt{\mathrm{SE}^2(X_1) + \mathrm{SE}^2(X_2) + \mathrm{SE}^2(X_3) + \cdots + \mathrm{SE}^2(X_n)}.$$
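For two variables, the step from the product rule to this formula can be sketched as follows. The symbols $D_X$ and $D_Y$ are introduced here only for the sketch; they denote the chance variability of X and of Y. The general case follows by adding one variable at a time.

$$
\begin{aligned}
D_X &= X - E(X), \qquad D_Y = Y - E(Y), \qquad E(D_X) = E(D_Y) = 0, \\
\mathrm{SE}^2(X + Y) &= E\big((D_X + D_Y)^2\big) \\
&= E(D_X^2) + 2\,E(D_X \times D_Y) + E(D_Y^2) \\
&= \mathrm{SE}^2(X) + 2\,E(D_X) \times E(D_Y) + \mathrm{SE}^2(Y) \\
&= \mathrm{SE}^2(X) + \mathrm{SE}^2(Y).
\end{aligned}
$$

The third line uses independence: $D_X$ is determined by X and $D_Y$ by Y, so they are independent and the expected value of their product is the product of their expected values, which is zero.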

## Standard Errors of Some Common Random Variables

This section presents the standard errors of several random variables we have already seen: a draw from a box of numbered tickets, the sample sum and sample mean of n random draws with and without replacement from a box of tickets, binomial and hypergeometric random variables, geometric random variables, and negative binomial random variables. Some of these results are derived directly; others are derived from each other using the rules about the SE of affine transformations and of sums of independent random variables.

### The SE of a single draw from a box of numbered tickets

We saw previously that the expected value of a random draw from a box of tickets labeled with numbers is the average of the labels, Ave(box). To find the SE, we first need to find the expected value of the square of the difference between the number drawn and the expected value of the number drawn, then take the square-root. That is, we need to find the sum of the squares of the differences between each label it is possible to draw and the expected value, each times the chance of drawing that number. The chance of drawing each possible label is the number of tickets with that label, divided by the total number of tickets. Let $\{x_1, x_2, \ldots, x_N\}$ be the set of distinct numbers on the ticket labels. We want to find the square-root of the following sum:

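$$
\begin{aligned}
&(x_1 - \mathrm{Ave(box)})^2 \times \frac{\#\text{ tickets with } x_1 \text{ on them}}{\text{total } \#\text{ tickets}} \\
&\quad + (x_2 - \mathrm{Ave(box)})^2 \times \frac{\#\text{ tickets with } x_2 \text{ on them}}{\text{total } \#\text{ tickets}} \\
&\quad + (x_3 - \mathrm{Ave(box)})^2 \times \frac{\#\text{ tickets with } x_3 \text{ on them}}{\text{total } \#\text{ tickets}} + \cdots \\
&\quad + (x_N - \mathrm{Ave(box)})^2 \times \frac{\#\text{ tickets with } x_N \text{ on them}}{\text{total } \#\text{ tickets}},
\end{aligned}
$$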

where Ave(box) is the average of the list of the numbers on all the tickets. If the numbers on the tickets are all different, then the number of tickets with the number x on them would be one for every possible value x, and

(# tickets with x on them)/(total # tickets)

would be equal to

1/(total # tickets).

The sum would then be the mean of the squares of the deviations between each ticket and the average of the box (which is the average of the list of numbers on all the tickets in the box); taking the square-root would give the rms of the deviations, which is how we define the SD of the list of the numbers on the tickets. If the number x appears on more than one ticket, then in computing the SD of the list of numbers on the tickets, the term

$$(x - \mathrm{Ave(box)})^2 \times \frac{1}{\text{total } \#\text{ tickets}}$$

would occur (# tickets with x on them) times. Adding those terms together gives

$$(x - \mathrm{Ave(box)})^2 \times \frac{\#\text{ tickets with } x \text{ on them}}{\text{total } \#\text{ tickets}}.$$

As a result, the square-root of the sum displayed above is still exactly the SD of the list of numbers on the tickets: The SE of a draw from a box is the SD of the list of numbers on all the tickets in the box, which we shall denote SD(box). The following example, for a box that contains four tickets labeled 1, 3, 3, and 5, illustrates numerically that this is true in a particular case.

The expected value of the draw is

1×(1/4) + 3×(2/4) + 5×(1/4) = 3,

which is also the average of the list of the numbers on the tickets: (1 + 3 + 3 + 5)/4 = 3.

The expected value of the square of the difference between the draw and the expected value of the draw is

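$$
\begin{aligned}
(1-3)^2 \times \tfrac{1}{4} + (3-3)^2 \times \tfrac{2}{4} + (5-3)^2 \times \tfrac{1}{4}
&= 4 \times \tfrac{1}{4} + 0 \times \tfrac{2}{4} + 4 \times \tfrac{1}{4} \\
&= 2.
\end{aligned}
$$

The SE of the draw is thus $\sqrt{2}$. The SD of the list of the numbers on the tickets is

$$
\begin{aligned}
\sqrt{\frac{(1-3)^2 + (3-3)^2 + (3-3)^2 + (5-3)^2}{4}}
&= \sqrt{\frac{4 + 0 + 0 + 4}{4}} \\
&= \sqrt{2},
\end{aligned}
$$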

which is the same as the SE of the draw (as it must be).
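The same check can be run for any box. This sketch uses the box from the example (tickets labeled 1, 3, 3, and 5) and compares the SE computed from the probability distribution of one draw with the SD of the list of labels; `pstdev` from Python's standard library computes the SD of a list by dividing by the number of entries, which matches the rms definition used here.

```python
from collections import Counter
from math import sqrt, isclose
from statistics import pstdev

box = [1, 3, 3, 5]
n_tickets = len(box)
ave_box = sum(box) / n_tickets

# SE of one draw: weight each distinct label by (# tickets with that label) / (total # tickets).
counts = Counter(box)
se_draw = sqrt(sum((x - ave_box) ** 2 * c / n_tickets for x, c in counts.items()))

# SD of the list of all labels, repeats included.
sd_box = pstdev(box)

print(se_draw, sd_box)
```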

### SE of the Sample Sum of n Random draws with Replacement from a Box of Tickets

We just calculated the SE of a single draw; now we consider the SE of the sum of the labels on n tickets drawn at random with replacement from a box of tickets labeled with numbers. Because the tickets are drawn with replacement, stirring between draws, the numbers on the n tickets drawn are independent random variables. The sample sum is the sum of these independent random variables, so, as we established earlier in the chapter, the SE of the sample sum is the square root of the sum of the squares of the SEs of the individual draws. They all have the same SE, namely, SD(box), so

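$$
\begin{aligned}
\mathrm{SE}(\text{sample sum of } n \text{ draws with replacement})
&= \sqrt{\mathrm{SD}^2(\text{box}) + \mathrm{SD}^2(\text{box}) + \cdots + \mathrm{SD}^2(\text{box})} \\
&= \sqrt{n \times \mathrm{SD}^2(\text{box})} \\
&= \sqrt{n} \times \mathrm{SD(box)}.
\end{aligned}
$$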

The expected value of the sum of n random draws with replacement from a box is n×Ave(box), and the SE of the sum of n draws with replacement from a box is $\sqrt{n} \times \mathrm{SD(box)}$. As the sample size n grows, the expected value grows faster than the SE (assuming that neither Ave(box) nor SD(box) is zero): the expected value grows like n, while the SE grows only like $\sqrt{n}$. The variability of the sample sum increases with the sample size, but the variability divided by the sample size decreases with the sample size.
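For a small box, the formula can be verified by brute force: with replacement, every ordered sequence of n tickets is equally likely, so the SE of the sample sum is the SD of the list of sums over all such sequences. The sketch below assumes the box from the earlier example (1, 3, 3, 5); the function name is hypothetical.

```python
from itertools import product
from math import sqrt, isclose
from statistics import pstdev

def se_sum_with_replacement(box, n):
    """Exact SE of the sum of n draws with replacement, by enumerating all ordered samples."""
    return pstdev([sum(sample) for sample in product(box, repeat=n)])

box = [1, 3, 3, 5]
sd_box = pstdev(box)

for n in (1, 2, 3, 4):
    exact = se_sum_with_replacement(box, n)
    print(n, round(exact, 6), round(sqrt(n) * sd_box, 6))
```

Enumeration is practical only for small boxes and small n, because the number of ordered samples is (number of tickets) to the power n.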

### The SE of the Sample Mean of n random Draws from a Box of numbered Tickets

The sample mean of n independent random draws (with replacement) from a box is the sample sum divided by n. This is an affine transformation of the sample sum, so

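$$
\begin{aligned}
\mathrm{SE}(\text{sample mean}) &= \frac{1}{n} \times \mathrm{SE}(\text{sample sum}) \\
&= \frac{1}{n} \times \sqrt{n} \times \mathrm{SD(box)} \\
&= \frac{\mathrm{SD(box)}}{\sqrt{n}}.
\end{aligned}
$$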

The interactive figure shows the sampling distribution of the sample mean. Using the figure, verify that the SD of the observed values of the sample mean tends to approach $\mathrm{SD(box)}/\sqrt{n}$, the SE of the sample mean of n random draws with replacement from the population box at the right of the figure, as the number of samples of size n grows. Change the sample size and the contents of the population box (which initially contains 0, 1, 2, 3, and 4) and confirm that this result remains true.

As the sample size n grows, the SE of the sample sum of n independent draws from a box of numbered tickets increases like $n^{1/2}$, and the SE of the sample mean of n independent draws decreases like $n^{-1/2}$. These facts are summarized in the square-root law.

The Square-Root Law

In drawing n times at random with replacement from a box of tickets labeled with numbers, the SE of the sum of the draws is

$$\sqrt{n} \times \mathrm{SD(box)},$$

and the SE of the average of the draws is

$$\frac{\mathrm{SD(box)}}{\sqrt{n}},$$

where SD(box) means the SD of the list of all the numbers on all the tickets in the box.

The SE of the sample sum grows in proportion to the square-root of the sample size; the SE of the sample mean shrinks in proportion to the reciprocal of the square-root of the sample size.

Because the SE of the sample mean of n draws with replacement shrinks as n grows, the sample mean is increasingly likely to be extremely close to its expected value, the mean of the labels on all the tickets in the box. This is called the Law of Averages. The Law of Large Numbers is a special case of the Law of Averages for 0-1 boxes. The Law of Averages can be proved using the Square-Root Law and Chebychev's inequality for random variables.

The Law of Averages

For every positive number e>0, as the sample size n grows, the chance that the sample mean of n random draws with replacement from a box of numbered tickets is within e of the average of the numbers on the tickets in the box converges to 100%.

In symbols, as n increases,

P( | sample mean of n draws − average of box | < e) approaches 100%.
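For a 0-1 box, the chance in the Law of Averages can be computed exactly, because the sample sum is binomial. The sketch below is an illustration under assumed values that are not from the text: a box in which 30% of the tickets are labeled "1" and a tolerance e = 0.05. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def chance_within(n, p, e):
    """Exact P(|sample mean - p| < e) for n draws with replacement from a 0-1 box."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs(Fraction(k, n) - p) < e
    )

p = Fraction(3, 10)   # assumed fraction of tickets labeled "1"
e = Fraction(1, 20)   # assumed tolerance

chances = {n: chance_within(n, p, e) for n in (10, 100, 1000)}
for n, chance in chances.items():
    print(n, round(float(chance), 4))
```

For these sample sizes the chance rises toward 100% as n grows. For a fixed tolerance, the increase need not be perfectly steady from one n to the next, because the sample mean can take only the values k/n.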

It is extremely important to distinguish between SE(sample mean) and SD(box); review the preceding sections if the distinction is not clear to you. The SE of the sample mean depends on the sample size: it is a measure of the chance variability of the sample mean. The SD of the box does not depend on the sample size: it is a property of the numbers on all the tickets in the box. The SE of the sample mean can be related to the sample size and the SD of the list of numbers on the tickets in the box:

The difference between SE and SD

The SE is a property of a random variable; the SD is a property of a list of numbers.

The SE of the sample mean and the SE of the sample sum of independent random draws from a box of numbered tickets have simple relationships to SD(box), the SD of the numbers on all the tickets in the box. Those relationships are given by the square-root law.

• The sample mean and sample sum are random variables: their values depend on the sample. SE measures the typical size of the chance variation of a random variable. The SE of the sample mean gets smaller as the sample size increases, and the SE of the sample sum gets larger as the sample size increases.
• SD(box) is constant, regardless of the sample size. It is a measure of the scatter of the numbers on all the tickets in the box around their (population) average.

### The SE of a Binomial Random Variable

We can think of a binomial random variable with parameters n and p as the sample sum of n independent draws from a 0-1 box with a fraction p of tickets labeled "1." Equivalently, a binomial random variable is the sum of n independent random variables {X1, X2, X3, …, Xn } each of which takes the value 1 with probability p, and the value 0 with probability (1−p). Random variables whose possible values are only 0 and 1 are called indicator random variables: They indicate whether or not some event occurs. In this case, the indicator random variable Xj indicates whether the jth trial results in "success" or "failure." If the jth trial results in "success," Xj = 1; if the jth trial results in "failure," Xj = 0. The sum

X1+X2+X3+ …+ Xn

is the total number of successes in the n trials.

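What is the SE of each $X_j$? We saw previously that the expected value of each $X_j$ is

$$E(X_j) = 0 \times (1-p) + 1 \times p = p.$$

The SE of $X_1$ is the square-root of

$$
\begin{aligned}
E\big((X_1 - E(X_1))^2\big) &= E\big((X_1 - p)^2\big) \\
&= (0 - p)^2 \times (1-p) + (1-p)^2 \times p \\
&= p^2 \times (1-p) + (1-p)^2 \times p \\
&= p \times (1-p) \times (p + (1-p)) \\
&= p \times (1-p),
\end{aligned}
$$

so $\mathrm{SE}(X_1) = \sqrt{p \times (1-p)}$. The random variables $X_1, X_2, X_3, \ldots, X_n$ all have the same probability distribution, so they all have the same SE. These variables are independent because the trials are independent, so the SE of their sum, the number of successes in n independent trials each with probability p of success, is the square-root of the sum of the squares of their SEs:

$$
\begin{aligned}
\mathrm{SE}(X_1 + X_2 + X_3 + \cdots + X_n) &= \sqrt{\mathrm{SE}^2(X_1) + \mathrm{SE}^2(X_2) + \mathrm{SE}^2(X_3) + \cdots + \mathrm{SE}^2(X_n)} \\
&= \sqrt{p \times (1-p) + p \times (1-p) + p \times (1-p) + \cdots + p \times (1-p)} \\
&= \sqrt{n \times p \times (1-p)} \\
&= \sqrt{n} \times \sqrt{p \times (1-p)}.
\end{aligned}
$$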

This is a special case of drawing at random with replacement from a box of numbered tickets, namely a 0-1 box. If a proportion p of the tickets are labeled "1" and a proportion (1−p) are labeled "0," then the sum of n random draws with replacement from the box has a binomial distribution with parameters n and p. Because the SE of the sum of n draws from such a box is $\sqrt{n} \times \mathrm{SD(box)}$, what we have just shown is that the SD of such a box is $\sqrt{p \times (1-p)}$.
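As a numerical cross-check, the SE of a binomial random variable can be computed from its probability distribution using the definition, then compared with the formula. The parameter values below are arbitrary, and the function name is hypothetical.

```python
from math import comb, sqrt, isclose

def binomial_se_from_pmf(n, p):
    """SE of a binomial(n, p) random variable, computed from the definition."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    ev = sum(k * q for k, q in enumerate(pmf))
    return sqrt(sum((k - ev) ** 2 * q for k, q in enumerate(pmf)))

for n, p in [(3, 0.5), (10, 0.3), (50, 0.9)]:
    print(n, p, binomial_se_from_pmf(n, p), sqrt(n * p * (1 - p)))
```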

Something slightly more general is true: If a box contains tickets labeled with only two distinct numbers, a and b, the SD of the box is $|a-b| \times \sqrt{p(1-p)}$, where p is the fraction of tickets labeled with the number a.

SD of a box with only two kinds of tickets

If each ticket in a box has one of two numbers on it, a or b, and the fraction of tickets with a on them is p (so the fraction of tickets with b on them is 1−p), then

$$\mathrm{SD(box)} = \sqrt{p \times (1-p)} \times |a-b|,$$

where SD(box) is the SD of the list of all the numbers on all the tickets.
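A quick check of this formula against a direct computation, for an arbitrarily chosen box (three tickets labeled 10 and seven labeled −2; neither the box nor the function name comes from the text):

```python
from math import sqrt, isclose
from statistics import pstdev

def sd_two_valued_box(a, b, p):
    """SD of a box in which a fraction p of tickets show a and the rest show b."""
    return sqrt(p * (1 - p)) * abs(a - b)

box = [10] * 3 + [-2] * 7        # a = 10 on 30% of the tickets, b = -2 on the rest
sd_direct = pstdev(box)          # SD of the list of all labels
sd_formula = sd_two_valued_box(10, -2, 0.3)

print(sd_direct, sd_formula)
```

The formula follows from the affine rule: such a box is a 0-1 box whose labels have been multiplied by (a − b) and shifted by b.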

### The SE of Geometric and Negative Binomial Random Variables

The SE of a random variable with the geometric distribution with parameter p is $\sqrt{1-p}/p$. Verifying that this is true by calculating it directly is beyond the level of this text. However, starting with the SE of the geometric distribution, we can calculate the SE of the negative binomial distribution, because, as we saw previously, a random variable with the negative binomial distribution with parameters r and p has the same distribution as the sum of r independent geometric random variables with parameter p. The SE of the sum of independent random variables is the square-root of the sum of the squares of the SEs of those variables, so the SE of a random variable with the negative binomial distribution with parameters r and p is $\sqrt{r} \times \sqrt{1-p}/p$.
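Although the derivation is beyond the level of the text, both formulas can be checked numerically by truncating the infinite sums. The sketch below assumes the convention that a geometric random variable counts the trials up to and including the first success, and that a negative binomial random variable counts the trials up to and including the rth success; the values of p and r and the truncation point K are arbitrary.

```python
from math import comb, sqrt, isclose

def se_from_pmf(pmf):
    """SE from the definition, for a distribution given as {value: probability}."""
    ev = sum(k * q for k, q in pmf.items())
    return sqrt(sum((k - ev) ** 2 * q for k, q in pmf.items()))

p, r, K = 0.25, 3, 2000   # K truncates the sums; the omitted tail is negligible here

# P(X = k) = (1-p)^(k-1) * p for k = 1, 2, 3, ...
geometric = {k: (1 - p) ** (k - 1) * p for k in range(1, K)}

# P(X = k) = C(k-1, r-1) * p^r * (1-p)^(k-r) for k = r, r+1, ...
neg_binomial = {k: comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r) for k in range(r, K)}

print(se_from_pmf(geometric), sqrt(1 - p) / p)
print(se_from_pmf(neg_binomial), sqrt(r) * sqrt(1 - p) / p)
```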

## SE of the Sample Sum and Mean of a Simple Random Sample

When tickets are drawn at random from a box without replacement (by simple random sampling), the numbers on the tickets drawn are dependent, not independent as they are for sampling with replacement. The dependence makes it harder to calculate the SE of the sample sum or sample mean of n draws without replacement—but we possess the tools to calculate them nonetheless. Formulae for the SE of the sample mean for random sampling without replacement, for the SE of the sample percentage for random sampling without replacement, and for the SE of the hypergeometric distribution all follow from the formula for the SE of the sample sum for random sampling without replacement:

• The formula for the SE of the sample mean of a simple random sample follows from the formula for the SE of the sample sum using the rule for the SE of an affine transformation, because the sample mean is the sample sum divided by the sample size.
• The formula for the SE of a random variable with the hypergeometric distribution is the special case of the SE of the sample sum when the box is a 0-1 box.
• The formula for the SE of the sample percentage for a simple random sample is the special case of the SE of the sample mean when the box is a 0-1 box.

The following subsections present these formulae, which are derived in footnotes.

### The SE of the Sample Sum of a Simple Random Sample

The SE of the sample sum of the labels on a simple random sample of n tickets from a box of N tickets labeled with numbers has almost the same formula as the SE of the sample sum of a random sample of size n drawn with replacement. The difference is the finite population correction $f = \sqrt{(N-n)/(N-1)}$:

$$\mathrm{SE}(\text{sample sum without replacement}) = f \times \mathrm{SE}(\text{sample sum with replacement}) = \sqrt{\frac{N-n}{N-1}} \times \sqrt{n} \times \mathrm{SD(box)},$$

where SD(box) is the SD of the list of numbers on all the tickets in the box (including repeated values as often as they occur). This result is proved in a footnote.

The finite population correction f captures the difference between sampling with and without replacement. When the sample size is n = 1, there is no difference between sampling with and without replacement, so it should be the case that then f = 1, which is true:

$$f = \sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{N-1}{N-1}} = 1.$$

At the other extreme, if the sample size n equals the population size N, every member of the population is in the sample exactly once. The sample is always equal to the population, and the sample sum is always equal to the sum of the labels on all the tickets: the sample sum is constant, so the SE of the sample sum is zero. Thus it should be the case that then f = 0, which also is true:

$$f = \sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{N-N}{N-1}} = 0.$$

For intermediate values of n, the SE of the sample mean for simple random sampling is less than the SE of the sample mean for random sampling with replacement, by a factor equal to the finite population correction f. Heuristically, for sampling without replacement, each additional element in the sample gives information about a different ticket in the box, while for sampling with replacement, there is some chance that the sample will contain the same ticket twice, which would be less informative. If the population is much larger than the sample, the chance that a sample with replacement contains the same ticket twice is very small, so the SE for sampling with replacement should be nearly equal to the SE for sampling without replacement.
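The finite population correction can be verified by brute force for a small box: in simple random sampling every subset of n tickets is equally likely, so the SE of the sample sum is the SD of the list of sums over all subsets. The box below is an arbitrary choice, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt, isclose
from statistics import pstdev

def fpc(N, n):
    """Finite population correction f = sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))

def se_sum_srs(box, n):
    """Exact SE of the sample sum of a simple random sample, by enumerating all subsets."""
    # combinations() works on positions, so tickets with equal labels stay distinct tickets.
    return pstdev([sum(sample) for sample in combinations(box, n)])

box = [1, 3, 3, 5, 8]
N = len(box)
sd_box = pstdev(box)

for n in range(1, N + 1):
    print(n, round(se_sum_srs(box, n), 6), round(fpc(N, n) * sqrt(n) * sd_box, 6))
```

The rows for n = 1 and n = N reproduce the two extreme cases discussed above: f = 1 and f = 0.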

### The SE of the Hypergeometric Distribution

The sample sum of n draws without replacement from a 0-1 box that contains N tickets of which G are labeled "1" has a hypergeometric distribution with parameters N, G, and n. We saw previously in this chapter that the SD of a 0-1 box is

$$\sqrt{p \times (1-p)},$$

where p is the fraction of tickets labeled "1," which is G/N. Therefore, the SE of a random variable with the hypergeometric distribution with parameters N, G, and n is

$$f \times \sqrt{n} \times \mathrm{SD(box)} = \sqrt{\frac{N-n}{N-1}} \times \sqrt{n} \times \sqrt{\frac{G}{N} \times \left(1 - \frac{G}{N}\right)}.$$

But for the finite population correction, the formula is the same as the formula for the SE of a binomial random variable with parameters n and p= G/N: the sample sum of n independent random draws with replacement from a 0-1 box with a fraction p = G/N of tickets labeled "1."
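The hypergeometric formula can be checked against the definition of the SE, using the hypergeometric probabilities directly. The parameter values (N = 20, G = 8, n = 5) and the function names are arbitrary choices for illustration.

```python
from math import comb, sqrt, isclose

def hypergeometric_pmf(N, G, n):
    """{k: P(X = k)}: chance of k tickets labeled "1" in n draws without replacement."""
    lo, hi = max(0, n - (N - G)), min(n, G)
    return {k: comb(G, k) * comb(N - G, n - k) / comb(N, n) for k in range(lo, hi + 1)}

def se_from_pmf(pmf):
    """SE from the definition, for a distribution given as {value: probability}."""
    ev = sum(k * q for k, q in pmf.items())
    return sqrt(sum((k - ev) ** 2 * q for k, q in pmf.items()))

N, G, n = 20, 8, 5
p = G / N
se_definition = se_from_pmf(hypergeometric_pmf(N, G, n))
se_formula = sqrt((N - n) / (N - 1)) * sqrt(n) * sqrt(p * (1 - p))
se_binomial = sqrt(n) * sqrt(p * (1 - p))   # same box, drawing with replacement

print(se_definition, se_formula, se_binomial)
```

The binomial SE for the same box is larger, by exactly the factor 1/f.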

### SE of the Sample Mean and Sample Percentage of a Simple Random Sample

The sample mean of a simple random sample is the sample sum of a simple random sample, divided by the sample size n. This is an affine transformation of the sample sum. It follows that the SE of the sample mean of a simple random sample is the SE of the sample sum of a simple random sample, divided by n. The SE of the sample mean of a simple random sample thus is the finite population correction times the SE of the sample mean for a random sample with replacement:

$$
\begin{aligned}
\mathrm{SE}(\text{sample mean of simple random sample}) &= f \times \mathrm{SE}(\text{sample mean of sample with replacement}) \\
&= f \times \frac{\mathrm{SD(box)}}{\sqrt{n}},
\end{aligned}
$$

where SD(box) is the SD of the list of numbers on all the tickets, counting duplicated values as often as they occur. In the special case that the box is a 0-1 box with a fraction p of tickets labeled "1," this implies that the SE of the sample percentage φ for random sampling without replacement is

$$\mathrm{SE}(\varphi) = f \times \frac{\sqrt{p \times (1-p)}}{\sqrt{n}}.$$

The interactive figure allows us to study the distribution of the sample sum and sample mean, with and without replacement: the check box at the top of the figure controls whether the samples are drawn with replacement. Try taking a few thousand samples with and without replacement. Notice that the SD of the observed values of the sample sum approaches the number given as "SE(sum)," and that it is smaller for sampling without replacement than for sampling with replacement. (The tool calculates SE(sum) appropriately, using the finite population correction for sampling without replacement.) Vary the contents of the population box at the right of the figure to confirm that this does not depend on the numbers on the tickets.

### Standard Errors of Some Common Random Variables

The SE of the sample sum of n independent draws from a box of tickets labeled with numbers is

n½ ×SD(box).

The SE of the sample mean of n independent draws from a box of tickets labeled with numbers is

n−½ × SD(box).

The SE of the sample sum of a simple random sample of size n from a box of tickets labeled with numbers is

( (N−n)/(N−1) )½ × n½ ×SD(box).

The SE of the sample mean of a simple random sample of size n from a box of tickets labeled with numbers is

(N−n)½/(N−1)½ × n−½ × SD(box).

The SE of the sample percentage of a simple random sample of size n from a box of N tickets, each labeled either "0" or "1," with a fraction p of the tickets labeled "1," is

( (N−n)/(N−1) )½ × ( p × (1−p) )½/ n½.

The SE of a random variable with the binomial distribution with parameters n and p is

n½ × ( p×(1−p) )½.

The SE of a random variable with the hypergeometric distribution with parameters N, G, and n is

(N−n)½/(N−1)½ × n½ × (G/N × (1− G/N) )½.

The SE of a random variable with the geometric distribution with parameter p is

(1−p)½/p.

The SE of a random variable with the negative binomial distribution with parameters r and p is

r½×(1−p)½/p.

The probability calculator displays the standard errors of some common discrete distributions, in addition to their expected values.


## Summary

Any random variable can be written as its expected value plus chance variability that has expected value equal to zero. The typical size of the chance variability is the standard error (SE) of the random variable. The SE is a measure of the spread of the probability distribution of the random variable, and is directly analogous to the SD of a list. The SE of a random variable is the square-root of the expected value of the square of the chance variability:

SE(X) = ( E( (X−E(X))² ) )½.

The SE of a random variable is completely determined by the probability distribution of the random variable, and we speak of the SE of a random variable and of its probability distribution interchangeably. To calculate the SE of a random variable requires calculating the expected value of a transformation of the random variable. The expected value of a transformation Y=g(X) of a discrete random variable X can be calculated directly from the definition of the expected value of Y, or by a shortcut method: If Y=g(X) and the possible values of X are x1, x2, x3, … , then

E(Y) = g(x1)×P(X=x1) + g(x2)×P(X=x2) + g(x3)×P(X=x3) + …
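As an illustration of the shortcut, the following Python sketch computes E(g(X)) two ways for g(x) = x²: by the shortcut, and by first building the distribution of Y = g(X). The distribution of X and the function names are hypothetical, chosen only for this example. The last two lines use the shortcut to compute SE(X) from its definition.

```python
from math import sqrt

def expected_value(dist, g=lambda x: x):
    """Shortcut: E(g(X)) = sum over x of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in dist.items())

def distribution_of(dist, g):
    """Distribution of Y = g(X): add up P(X = x) over all x with the same g(x)."""
    out = {}
    for x, p in dist.items():
        out[g(x)] = out.get(g(x), 0.0) + p
    return out

# hypothetical distribution: possible values of X mapped to P(X = x)
dist_x = {-1: 0.2, 0: 0.5, 1: 0.3}
square = lambda x: x ** 2

ev_shortcut = expected_value(dist_x, square)                  # shortcut method
ev_direct = expected_value(distribution_of(dist_x, square))   # from the definition of E(Y)

ev_x = expected_value(dist_x)
se_x = sqrt(expected_value(dist_x, lambda x: (x - ev_x) ** 2))
print(ev_shortcut, ev_direct, se_x)
```

Here X = −1 and X = 1 both give Y = 1, so the direct method must first combine their probabilities; the shortcut avoids that step.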

In many cases, to calculate the standard error of a random variable defined in terms of other random variables requires starting from scratch, but some special cases are particularly simple. For example, if Y = a×X+b, where a and b are constants, then

SE(Y) = |a|×SE(X).
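A quick numerical check of this rule, as a sketch under assumed values: the distribution of X and the constants a = −3 and b = 10 are hypothetical. Note that the sign of a and the value of b do not affect the result.

```python
from math import sqrt

def se(dist):
    """SE of a discrete random variable given as {value: probability}."""
    ev = sum(x * p for x, p in dist.items())
    return sqrt(sum((x - ev) ** 2 * p for x, p in dist.items()))

dist_x = {0: 0.25, 1: 0.5, 4: 0.25}   # hypothetical distribution of X
a, b = -3, 10                          # hypothetical constants

# Y = a*X + b; since a is not zero, distinct values of X give distinct values of Y
dist_y = {a * x + b: p for x, p in dist_x.items()}

se_x = se(dist_x)
se_y = se(dist_y)
print(se_y, abs(a) * se_x)
```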

A collection of random variables is independent if every event determined by a sub-collection of the random variables is independent of every event determined by the other random variables in the collection. Examples of independent random variables include the numbers on tickets in different random draws with replacement from a box of numbered tickets. If a collection of random variables is not independent, it is dependent. The expected value of a product of independent random variables is the product of their expected values, and the SE of a sum of independent random variables is the square-root of the sum of the squares of their standard errors. (The expected value of a sum of random variables is the sum of their expected values, whether the random variables are independent or dependent.) The SE of the sample sum of n independent random draws with replacement from a box of tickets labeled with numbers is

n½×SD(box),

where SD(box) is the standard deviation of the list of labels on the tickets in the box, counting duplicates as often as they occur. For a 0-1 box with a fraction p of tickets labeled "1,"

SD(box) = (p×(1−p))½.
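These two facts can be confirmed for a small box by listing every equally likely sequence of draws. The following Python sketch is illustrative only; the box contents, the number of draws n = 3, and the function names are assumptions.

```python
from itertools import product
from math import sqrt

def sd(values):
    """SD of a list: rms of the deviations from the mean (divides by len(values))."""
    m = sum(values) / len(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

def se_sum_with_replacement(box, n):
    """SE of the sum of n draws with replacement.

    All len(box)**n ordered sequences of draws are equally likely, so the SE
    is the SD of the list of their sums.
    """
    sums = [sum(draws) for draws in product(box, repeat=n)]
    return sd(sums)

box = [0, 0, 1, 1, 1]       # hypothetical 0-1 box
n = 3
p = sum(box) / len(box)     # fraction of tickets labeled "1"

sd_box = sd(box)
se_enumerated = se_sum_with_replacement(box, n)
print(sd_box, sqrt(p * (1 - p)))
print(se_enumerated, sqrt(n) * sd_box)
```

Each printed pair should agree up to rounding.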

Because the sample sum of n independent random draws with replacement from a 0-1 box with a fraction p of tickets labeled "1" has a binomial distribution with parameters n and p, the SE of the binomial distribution with parameters n and p is

n½×(p×(1−p))½.

The SE of the sample mean of n independent random draws with replacement from a box of tickets labeled with numbers is

n−½×SD(box).

The SE of the sample percentage φ of a random sample of size n with replacement from a 0-1 box is

n−½×(p×(1−p))½,

where p is the fraction of tickets in the box labeled "1." For random samples without replacement (simple random samples), each of these SEs is multiplied by the finite population correction

f = (N−n)½/(N−1)½,

where n is the sample size and N is the number of tickets in the box. For example, the SE of the sample sum of the labels on a simple random sample of n tickets drawn from a box of N tickets labeled with numbers is

f×n½×SD(box),

and the SE of the sample mean of the labels is

f×n−½×SD(box).

It follows that the SE of a random variable with a hypergeometric distribution with parameters N, G, and n is

f×n½×(G/N × (1−G/N))½

and that the SE of the sample percentage φ of a simple random sample from a 0-1 box is

f×n−½× (p×(1−p))½,

where p = G/N is the fraction of tickets labeled "1" in the box. The SE of the geometric distribution with parameter p is

(1−p)½/p.

A random variable with a negative binomial distribution with parameters r and p can be written as a sum of r independent random variables with geometric distributions with the same parameter p, so the SE of the negative binomial distribution with parameters r and p is

r½×(1−p)½/p.
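The geometric and negative binomial formulas can be checked numerically as well. The sketch below assumes that the geometric random variable counts the number of trials up to and including the first success, so that its possible values are 1, 2, 3, …; the values p = 0.25 and r = 4, the function name, and the cutoff of 500 terms are illustrative assumptions. The infinite series is truncated, which is accurate here because the omitted probabilities are negligible for this p.

```python
from math import sqrt

def geometric_se_truncated(p, terms=500):
    """SE of the number of trials to the first success, using only the first
    `terms` terms of the series (adequate when (1 - p)**terms is negligible)."""
    pmf = [(k, (1 - p) ** (k - 1) * p) for k in range(1, terms + 1)]
    ev = sum(k * pr for k, pr in pmf)
    return sqrt(sum((k - ev) ** 2 * pr for k, pr in pmf))

p, r = 0.25, 4   # illustrative values, not from the text

se_geom_series = geometric_se_truncated(p)
se_geom_formula = sqrt(1 - p) / p

# sum of r independent geometric variables: the squares of the SEs add
se_negbin_from_sum = sqrt(r * se_geom_formula ** 2)
se_negbin_formula = sqrt(r) * sqrt(1 - p) / p

print(se_geom_series, se_geom_formula)
print(se_negbin_from_sum, se_negbin_formula)
```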

## Key Terms

• 0-1 box
• affine transformation
• average
• binomial
• Chebychev’s Inequality
• converge
• dependent
• deviation
• discrete
• event
• expectation
• expected value
• experiment
• finite population correction
• geometric distribution
• hypergeometric distribution
• independent events
• independent random variables
• indicator random variable
• intersection
• Law of Large Numbers
• location
• mean
• mutually exclusive
• negative binomial distribution
• population
• probability
• probability distribution
• random variable
• rms
• sample mean
• sample percentage
• sample size
• sample sum
• sampling distribution
• simple random sample