Random experiments and random variables have long-term regularities. The Law of Large Numbers says that in repeated independent trials, the relative frequency of each outcome of a random experiment tends to approach the probability of that outcome. That implies that the long-term average value of a discrete random variable in repeated experiments tends to approach a limit, called the expected value of the random variable.
The expected value of a random variable depends only on the probability distribution of the random variable. The expected value has properties that can be exploited to find the expected value of some complicated random variables in terms of simpler ones. These properties allow us to find the expected value of the sample sum and sample mean of random draws with and without replacement from a box of numbered tickets. Special cases include the expected values of random variables with binomial and hypergeometric distributions. We also find the expected values of random variables with geometric and negative binomial distributions.
The Law of Large Numbers says that in repeated independent trials with probability p of success in each trial, the chance that the fraction of successes is close to p grows as the number of trials grows. More precisely, for any tolerance e>0,
P(| (fraction of successes in n trials) - p | < e)
approaches 100% as the number n of trials grows. This expresses a long-term regularity of repeated independent trials with a shared probability of success.
The Law of Large Numbers
The chance that the fraction of successes in n independent trials with probability p of success is close to p approaches 100% as n grows.
More precisely, as n increases, for every number e > 0,
P( | (fraction of successes in n trials) − p | < e ) approaches 100%.
The Law of Large Numbers does not say that the number of successes in repeated independent trials tends to be increasingly close to the chance of success times the number of trials, only that the fraction of successes in repeated independent trials tends to be increasingly close to the probability of success in each trial. The figure below illustrates the Law of Large Numbers. Let X denote the number of successes in n independent trials with the same chance p of success in each trial. The figure plots either of two variables: the difference X − n×p between the number of successes and the expected number of successes, or the percentage difference 100%×(X − n×p)/n.
To toggle between these two plots, click the button near the bottom of the figure. The figure has two scrollbars. One controls the number of trials, which you can increase to 20,000; the other controls the chance of success in each trial. Increase the number of trials to 20,000 and verify that the simulation agrees with the Law of Large Numbers: The difference X − np tends to grow as the number of trials n grows, while the percentage difference 100%×(X − np)/n tends to shrink as the number of trials n grows. Change the chance p of success in each trial using the scrollbar and verify that the result remains true.
The Law of Large Numbers says that the probability distribution of the fraction of successes in repeated independent trials with the same probability p of success gets more and more concentrated near p as the number of trials increases. As the number of trials increases, the chance that a given range of values around p contains the fraction of successes increases. Conversely, as the number of trials increases, narrower and narrower intervals around p have the same chance of containing the observed fraction of successes.
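The behavior described above can be sketched in a few lines of code. This is an illustrative simulation, not part of the interactive figure; the function name and seed are our own choices.

```python
import random

# A minimal simulation of the Law of Large Numbers: count the successes
# in n independent trials with success probability p.
def count_successes(n, p, seed=0):
    rng = random.Random(seed)
    return sum(1 for _ in range(n) if rng.random() < p)

p = 0.5
for n in (100, 10_000, 1_000_000):
    x = count_successes(n, p)
    # |X - n*p| tends to grow, while |X/n - p| tends to shrink
    print(n, abs(x - n * p), abs(x / n - p))
```

Running this for increasing n shows the absolute discrepancy X − np wandering while the fraction X/n hugs p more and more tightly.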
In games of chance, you should not expect the number of times you win to approach (the number of times you play)×(the chance of winning in each game). However, you should expect the fraction of times you win to approach the chance of winning each game. There is no such thing as "odds pressure" tending to make winning more likely after a losing streak. Rather, a string of losses tends to be "diluted" over a larger and larger number of trials. Once you get behind, you are not likely to catch up in absolute terms (i.e., to reduce your losses). You are only likely to reduce your average loss per play of the game. Your losses can continue to mount even as the average loss per play shrinks. In fact, once you are behind by much, that is what tends to happen.
As a thought experiment, suppose we agree to play a coin-tossing game in which if the coin lands heads, you pay me $1, and if the coin lands tails, I pay you $1. If I asked for an advantage of 25 tosses (i.e., starting $25 ahead), I doubt you would agree to play.
On the other hand, suppose that we played this game, starting even (without the advantage I asked for), and that in the course of play, by chance I developed a lead of $25. Would you feel that the coin now "owes" you 25 tails? Can the coin tell the difference between an advantage of $25 that you give me, and an advantage of $25 that occurs by chance during the game? In either case, the $25 advantage is likely to be diluted as we play the game longer and longer, so that as a percentage of the total, the $25 becomes tiny, and the relative frequency with which each of us wins approaches 50%. However, the difference between, say, 50.5% of $1,000,000 and 49.5% of $1,000,000 is $10,000, which is still a lot of money, even if it is only a small fraction of the total we have wagered.
The Law of Large Numbers implies that in repeated random sampling with replacement from a box of tickets, as the number of samples increases, the fraction of times each ticket is drawn is increasingly likely to be close to 1/(#tickets in the box). If the tickets are labeled with numbers, the fraction of times each label appears is increasingly likely to be close to the probability of drawing a ticket with that label (the fraction of tickets in the box that have that label). The following example shows how the Law of Large Numbers can be used to reason about real-world situations.
The fraction of children born who are girls is a very stable demographic variable across societies and geography: about 49% of children born are girls and about 51% are boys. (Because their mortality rate is lower, women outnumber men later in life.) Suppose that the gender of children is determined at random at the time of conception, with probability 49% of being female and probability 51% of being male. Suppose the genders of different children are independent. Consider a city in which about 10,000 children are born each year, and a town in which about 100 children are born each year. In a given year, is it more likely that of the children born in the city, the percentage who are female will exceed 55%, or that of the children born in the town, the percentage who are female will exceed 55%?
Solution. The fraction of girl children born in the city is like the fraction of successes in about 10,000 independent trials with the same 49% chance of success in each trial. The fraction of girl children born in the town is like the fraction of successes in about 100 independent trials with the same 49% chance of success in each trial. The larger the number of trials, the less likely it is that the fraction of successes differs from the chance of success by any given amount (such as 55%−49%=6%). It is more likely that the fraction of girl children born in the town will exceed 55%.
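A rough simulation supports this reasoning. The function and its parameters are illustrative, assuming independence and a 49% chance that each birth is a girl.

```python
import random

# Estimate the chance that the fraction of girls among n_births births
# exceeds a threshold, by repeating the "year" many times.
def prob_fraction_exceeds(n_births, threshold=0.55, p=0.49, reps=500, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        girls = sum(rng.random() < p for _ in range(n_births))
        hits += girls / n_births > threshold
    return hits / reps

print(prob_fraction_exceeds(100))     # the town: a noticeable chance
print(prob_fraction_exceeds(10_000))  # the city: essentially zero
```

The estimate for the town is on the order of 10%, while for the city it is negligible: the larger number of trials concentrates the fraction of girls near 49%.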
The Law of Large Numbers says that in repeated independent trials, the relative frequency with which an event occurs tends to approach the probability that the event occurs (the precise statement is slightly different: The chance that the two differ by more than any fixed tolerance goes to zero as the number of trials increases). For example, in repeated draws with replacement from a box of tickets, the fraction of times each ticket is drawn will tend to approach 1/(#tickets in the box). If the tickets are labeled with numbers, the average number on the tickets drawn will tend to approach the average of the list of labels on all the tickets, including repeated labels as many times as they occur.
Suppose a box contains one ticket labeled "0," two tickets labeled "1," and one ticket labeled "4":
[Figure: a box to illustrate expected value, containing four tickets labeled 0, 1, 1, and 4.]
Repeated random draws with replacement from the box will yield the ticket labeled "0" about one fourth of the time, a ticket labeled "1" about half the time, and the ticket labeled "4" about one fourth of the time. In the long run, the average of the draws will tend to approach (0+1+1+4)/4 = 6/4 = 1.5, the average of the numbers on the tickets. This long-run average outcome of a draw from a box is called the expected value of the draw. The expected value of a random variable X is the long-run limiting average of the values X takes in repeated trials.
The expected value of a random variable is analogous to the mean of a list: It is the balance point of the probability histogram, just as the mean is the balance point of the histogram of the list. The figure below lets us investigate the distribution of the sample sum empirically.
Draw a ticket a few times by clicking the Take Sample button in the figure. Sometimes the ticket drawn will be labeled "0," sometimes it will be labeled "1," and sometimes it will be labeled "4." You will see a histogram of the labels on the tickets drawn evolve as you take more samples. In repeated draws, about one-fourth of the draws will give the ticket labeled "0," about half of the draws will give a ticket labeled "1," and about one-fourth of the draws will give the ticket labeled "4." Change the Take ________ samples to 1000, and click the Take Sample button a few times.
The empirical distribution of the draws will tend to stabilize with about 1/4 of the draws giving the ticket labeled "0," about 1/2 of the draws giving the ticket labeled "1," and about 1/4 of the draws giving the ticket labeled "4." That the empirical fraction of the draws that give "0" tends towards the probability of drawing the ticket labeled "0," and the empirical fraction of the draws that give "1" tends towards the probability of drawing a ticket labeled "1," etc. is a consequence of the Law of Large Numbers. Because the tickets in the box tend to be represented in equal proportion in the samples, the average of the draws will tend to be close to the average of the list of labels on the tickets in the box:
0×(1/4) + 1×(1/2) + 4×(1/4) = 1.5.
This is the expected value of the number on a ticket drawn at random from the box.
The quantity "Mean(values)," shown on the left, is the mean of the observed values of the sample sum, and "SD(values)" is the SD of the observed values of the sample sum. Once you have drawn a few thousand samples, Mean(values) should be very close to 1.5.
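A short simulation mimics the applet's behavior. This is an illustrative sketch; the seed and number of draws are arbitrary.

```python
import random

# Many draws with replacement from the box [0, 1, 1, 4];
# the mean of the draws settles near the box average, 1.5.
box = [0, 1, 1, 4]
rng = random.Random(42)
draws = [rng.choice(box) for _ in range(100_000)]
mean_of_draws = sum(draws) / len(draws)
print(round(mean_of_draws, 2))  # close to 1.5
```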
Change the Sample Size to 2 in the figure and click the Take Sample button a few times to take a few thousand samples of size 2 with replacement from the box. The possible values of the sample sum of two draws with replacement from the box are 0, 1, 2, 4, 5, and 8. The sample sum is 0 if both tickets drawn are labeled "0." That happens with probability 1/4×1/4=1/16. The sample sum is 1 if one of the tickets drawn is labeled "0" and one is labeled "1." That can happen by drawing a ticket labeled "0" on the first draw and a ticket labeled "1" on the second draw, which occurs with probability 1/4×1/2=1/8. It also can happen by drawing a ticket labeled "1" on the first draw and a ticket labeled "0" on the second draw, which occurs with probability 1/2×1/4=1/8, so the total chance that the sample sum of two draws with replacement equals 1 is 1/8+1/8=1/4. The sample sum is 2 if both draws give tickets labeled "1," which occurs with probability 1/2×1/2=1/4. Similar reasoning gives the chances of the remaining values: the sample sum is 4 with probability 1/8, 5 with probability 1/4, and 8 with probability 1/16. The figure shows the probability distribution of the sum of two draws with replacement from the box:
If you draw samples of size two with replacement from the box repeatedly, the fraction of the time that the sample sum is 0 will tend to approach the probability that the sample sum is 0, 1/16. The fraction of the time that the sample sum is 1 will tend to approach the probability that the sample sum is 1, 1/4. The fraction of the time that the sample sum is 2 will tend to approach the probability that the sample sum is 2, 1/4 … , and the fraction of the time that the sample sum is 8 will tend to approach the probability that the sample sum is 8, 1/16. The average of the values of the sample sum that occur will tend to approach
0×1/16 + 1×1/4 + 2×1/4 + 4×1/8 + 5×1/4 + 8×1/16 = 3 = 2×1.5,
twice the average of the labels on all the tickets in the box. This is the expected value of the sum of two random draws with replacement from the box. The expected value of the sum of three random draws from the box with replacement is
3×1.5 = 4.5,
three times the average of the labels on the tickets in the box. The expected value of the sum of n random draws from the box with replacement is n×1.5, n times the average of the labels on the tickets in the box. If you change the sample size in the figure and draw a few thousand samples, Mean(values) should be quite close to the expected value of the sample sum.
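The arithmetic for two draws can be checked exactly by brute-force enumeration, since every ordered pair of tickets is equally likely. This sketch uses exact fractions to avoid rounding.

```python
from itertools import product
from fractions import Fraction

# Enumerate all equally likely ordered pairs of draws with replacement
# from the box and average the pair sums.
box = [0, 1, 1, 4]
pair_sums = [a + b for a, b in product(box, repeat=2)]
e_sum_two = Fraction(sum(pair_sums), len(pair_sums))
print(e_sum_two)  # 3, i.e. 2 × 1.5
```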
In general, the expected value of a discrete random variable X can be calculated from the probability distribution of X: The expected value of X is a weighted average of the values X can take. The weight of each possible value of X is the probability that X equals that value. This is exactly how we would calculate the mean of a list that has repeated entries: We would sum the distinct numbers in the list, each times its relative frequency in the list. If there were no repeated numbers, the relative frequencies would all be 1/(#items in the list), which gives the ordinary arithmetic mean, the sum of the values divided by the number of values.
In symbols, one writes E(X) for the expected value of the random variable X. The expected value of X depends only on the probability distribution of X: If two random variables have the same probability distribution, they have the same expected value. For example, the expected number of times a fair coin lands heads in five tosses is the same as the expected number of times a fair die lands showing an even number of spots in five rolls, because both have a binomial distribution with parameters n=5 and p=50%. Because the expected value of a random variable depends only on its probability distribution, we can speak of the expected value of a random variable or the expected value of a probability distribution interchangeably.
The Expected Value of a Discrete Random Variable
The expected value of a discrete random variable X, denoted E(X), is a weighted average of the values that X can take, where the weight of each value is the probability that X takes that value.
If the values X can take are denoted x_{1}, x_{2}, x_{3}, … , then
(expected value of X) = x_{1}×P(X = x_{1}) + x_{2}×P(X = x_{2}) + x_{3}×P(X = x_{3}) + … .
The expected value of X is sometimes called the "expectation of X."
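The definition translates directly into code. The helper name below is our own; it simply implements the weighted average in the box above.

```python
# E(X) is the sum, over the possible values x, of x times P(X = x).
def expected_value(dist):
    """dist maps each possible value x to P(X = x)."""
    return sum(x * p for x, p in dist.items())

# The box example: P(X=0)=1/4, P(X=1)=1/2, P(X=4)=1/4.
print(expected_value({0: 0.25, 1: 0.5, 4: 0.25}))  # 1.5
```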
Let X denote the label on a ticket drawn at random from the box containing four tickets, numbered 0, 1, 1, and 4. The figure shows the probability distribution of X.
The expected value of X is the weighted average of its possible values, taking the probability of each value as its weight:
(Expected value of X) = 0×P(X=0) + 1×P(X=1) + 4×P(X=4)
= 0×(1/4) + 1×(1/2) + 4×(1/4)
= 1.5,
the average of the numbers on the tickets (counting repeated values), just as we found before.
Expected value is a technical term. It has little to do with the ordinary English usage of the word "expect." The expected value of a random variable does not have to be a possible value of the random variable. In particular, a random variable that only takes integer values can have an expected value that has a fractional part. For instance, suppose you invite 3 friends over for dinner. They each toss a fair coin, independently, to decide whether to come: Heads they come, tails they don't. (Your friends are such geeks. Time to make friends with some humanities majors …)
You invite them all many many many many times. Sometimes, all three friends' coins land heads; then you have 3 guests. That happens about 1/8 of the time, in the long run. Sometimes all three coins land tails; then nobody shows up. That also happens about 1/8 of the time, in the long run. Sometimes one or two friends come; that happens about 3/8 of the time each.
On average, in the long run, the number of friends who come over is
0×1/8 + 1×3/8 + 2×3/8 + 3×1/8 = 12/8 = 1.5.
That's the expected number of friends who come over. It is impossible for a fraction of a dinner guest to show up, but the expected value has a fractional part.
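The dinner-guest arithmetic can be done exactly: the number of guests has a binomial distribution with n = 3 coins and p = 1/2, and exact fractions keep the 3/2 visible.

```python
from fractions import Fraction

# Probability distribution of the number of guests among 3 coin-tossing friends.
guest_dist = {0: Fraction(1, 8), 1: Fraction(3, 8),
              2: Fraction(3, 8), 3: Fraction(1, 8)}
e_guests = sum(k * p for k, p in guest_dist.items())
print(e_guests)  # 3/2: a fractional expected value for an integer-valued variable
```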
The following exercise asks you to calculate the expected value of a random variable from its probability distribution.
What is the expected value of the sample sum of two draws with replacement from the box that contains one ticket labeled "0," two labeled "1," and one labeled "4"?
By the Law of Large Numbers, in repeated draws, the fraction of times a particular ticket (say the jth ticket) appears is increasingly likely to be close to 1/(#tickets in the box). Drawing a pair of tickets with replacement M times gives 2M tickets in all. As M grows, the number of times the jth ticket appears among those 2M tickets is increasingly likely to be close to 2M/(#tickets in the box), so the jth ticket is increasingly likely to contribute about
2M×(number on the jth ticket)/(#tickets in the box)
to the total of the M sample sums. The total of the contributions from all the tickets to the sum of M draws of a pair of tickets is thus increasingly likely to be close to
2M×(average of the numbers on all the tickets).
The average of the sum of two draws is thus increasingly likely to be close to 2×(average of the numbers on all the tickets). Indeed, this is the expected value of the sample sum of two random draws from the box. Similar reasoning shows that in general the expected value of the sample sum of n draws with replacement from a box of numbered tickets is
n×(average of the numbers on all the tickets).
Perhaps surprisingly, that is also the expected value of the sample sum of n draws without replacement (of course, then the number of draws, n, can not be larger than the number of tickets in the box, N).
The figure shows the sampling tool again, but now with a check box that lets you control whether the sampling is with or without replacement. If the box is checked, the sample is drawn with replacement. In sampling without replacement, the size of each sample cannot exceed the number of tickets in the box. Experiment by looking at the average of the values of the sample sum for a large number of samples, with and without replacement. Try putting different lists of numbers in the box. Whether the sampling is with or without replacement, the average of the sample sums (Mean(values)) will be close to (sample size)×(average of box) after a few thousand draws. If the sample size is greater than one, the SD of the values of the sample sum will tend to be smaller for sampling without replacement than for sampling with replacement.
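A simulation sketch of this experiment, with an arbitrary seed and sample size of our choosing:

```python
import random

# The mean of many sample sums is close to n × (box average),
# whether sampling is with or without replacement.
box = [0, 1, 1, 4]
n = 3
rng = random.Random(7)
sums_with = [sum(rng.choices(box, k=n)) for _ in range(50_000)]
sums_without = [sum(rng.sample(box, n)) for _ in range(50_000)]
print(sum(sums_with) / 50_000, sum(sums_without) / 50_000)
# both near 3 × 1.5 = 4.5
```

The spread of the two collections of sums differs (smaller without replacement), but their averages agree.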
The Expected Value of the Sample Sum of n random Draws from a Box
If a box contains tickets labeled with numbers, the expected value of the sample sum of the labels on n tickets drawn at random with or without replacement from the box is
n×(average of the labels on the tickets in the box),
where (average of the labels on the tickets in the box) means the average of the list of numbers on all the labels, including repeated values as many times as they occur on different tickets.
(If the draws are without replacement, the number of draws cannot exceed the number of tickets in the box.)
The following exercise checks your ability to compute the expected value of the sample sum of random draws from a box of numbered tickets.
When the tickets in the box are labeled only "0" or "1," the sum of n random draws with replacement has a binomial distribution, and the sum of n random draws without replacement has an hypergeometric distribution. These special cases of the expected value of the sum of random draws from a box are particularly simple.
Boxes with Tickets labeled "0" and "1" (0-1 boxes)
If a box has a fraction p of tickets labeled "1" and a fraction (1-p) of tickets labeled "0," the average of the labels on the tickets in the box is
0×(1-p) + 1×p = p.
If a box has a fraction p of tickets labeled "1" and a fraction (1-p) of tickets labeled "0," the expected value of the sum of n random draws with or without replacement from the box is
n×(average of the labels on all the tickets in the box) = n×p.
Because the sum of n random draws with replacement from a box that contains a fraction p of tickets labeled "1" and a fraction (1-p) of tickets labeled "0" has a binomial distribution with parameters n and p, this means that the expected value of (a random variable with) a binomial distribution with parameters n and p is n×p. You can also confirm this from the formula for binomial probabilities (but the algebra is complicated):
0×_{n}C_{0} p^{0}(1-p)^{n-0} + 1×_{n}C_{1} p^{1}(1-p)^{n-1} + 2×_{n}C_{2} p^{2}(1-p)^{n-2} + … + n×_{n}C_{n} p^{n}(1-p)^{0} = n×p.
On average, each draw adds p to the sample sum: each draw yields a ticket labeled "1" with probability p.
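The algebraic identity above is easy to check numerically for particular n and p:

```python
from math import comb

# The weighted average of k against binomial probabilities equals n*p.
def binomial_expected_value(n, p):
    return sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

print(binomial_expected_value(5, 0.5))   # 2.5
print(binomial_expected_value(10, 0.3))  # 3.0 (up to floating point)
```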
Similarly, because the sum of n random draws without replacement from a box of N tickets of which G are labeled "1" and (N-G) are labeled "0" has an hypergeometric distribution with parameters N, G, and n, the expected value of (a random variable with) an hypergeometric distribution with parameters N, G, and n is n×G/N.
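The hypergeometric case can also be checked exactly on a small 0-1 box by enumerating every equally likely sample. The particular box (N = 6, G = 2, n = 3) is an arbitrary illustration.

```python
from itertools import combinations
from fractions import Fraction

# N = 6 tickets, G = 2 labeled "1", n = 3 drawn without replacement;
# the expected sample sum should be n×G/N = 3×2/6 = 1.
box = [1, 1, 0, 0, 0, 0]
samples = list(combinations(box, 3))  # all equally likely subsets by position
e_hyper = Fraction(sum(sum(s) for s in samples), len(samples))
print(e_hyper)  # 1
```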
The figure shows the binomial probability histogram. Vary n and p in the figure and verify by eye that the balance point of the probability histogram is n×p.
The following exercise checks your ability to calculate the expected value of a random variable with an hypergeometric distribution. The wording of the problem might change when you reload the page.
The expected value satisfies several quite useful rules that make it easy to find the expected value of sums of random variables and constants times random variables. These properties follow from the fact that the expected value of a discrete random variable is a probability-weighted average of its possible values.
Properties of the Expected Value
The expected value of a constant (a constant random variable) is that constant: for any number a,
E(a) = a.
The expected value of the sum of two or more random variables is the sum of their expected values: If X, Y, Z, … are random variables, then
E(X+Y+Z+ …) = E(X) + E(Y) + E(Z) + … .
The expected value of a constant times a random variable is that constant times the expected value of the random variable: for any constant a and any random variable X,
E(a × X) = a × E(X).
These rules can be summarized as follows: if a and b are any two numbers, and X and Y are any two random variables, then
E(a × X + b × Y) = a × E(X) + b × E(Y).
In general, E(X × Y) is not equal to E(X) × E(Y); equality does hold, however, if X and Y are independent.
Recall that a transformation of x of the form y = a×x+b, where a and b are fixed numbers, is called an affine transformation. A graph of an affine transformation is a straight line. The properties above show that the expected value of an affine transformation of a random variable is just the same affine transformation of the expected value of the random variable:
E(a×X+b) = a×E(X) + b.
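The affine rule can be verified directly on the box distribution from earlier; the constants a and b below are arbitrary illustrative choices.

```python
# Check E(a×X + b) = a×E(X) + b on the box distribution.
dist = {0: 0.25, 1: 0.5, 4: 0.25}
a, b = 3, -2
e_x = sum(x * p for x, p in dist.items())
e_affine = sum((a * x + b) * p for x, p in dist.items())
print(e_x, e_affine)  # 1.5 and 3 × 1.5 − 2 = 2.5
```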
The sample mean is the sample sum divided by the sample size, n. Dividing by the sample size is the same as multiplying by its reciprocal, 1/n, so we can use the properties of expectation to find the expected value of the sample mean from the expected value of the sample sum: The expected value of the sample mean is the expected value of the sample sum, times 1/n. Because the expected value of the sample sum is
n×(average of the numbers on all the tickets in box),
whether the sampling is with or without replacement, the expected value of the sample mean is the average of the numbers on all the tickets in the box, for random sampling with or without replacement.
Consider a 0-1 box of tickets. The population percentage is the fraction of tickets labeled "1." Equivalently, it is the mean of the list of the numbers on all the tickets. The sample percentage φ is the fraction of tickets labeled "1" in a random sample of size n from such a box; equivalently, it is the sample mean of a random sample of size n from such a box:
φ = (#tickets in the sample labeled "1")/n.
It follows that the expected value of the sample percentage φ is the population percentage p, for random sampling with or without replacement.
Expected value of the Sample Mean and the Sample Percentage
The expected value of the sample mean of n random draws with or without replacement from a box of tickets labeled with numbers is
(average of the numbers on all the tickets in the box).
The expected value of the sample percentage φ of tickets labeled "1" in n random draws with or without replacement from a box of tickets each labeled "0" or "1" is
(percentage of tickets in the box labeled "1").
In games of chance, the expected value of the payoff (taking into account the cost of the bet) determines whether a bet is favorable. If the expected value of the payoff is negative, in the long run, you will tend to lose money. If the expected payoff is positive, you will tend to win money in the long run. If the expected payoff is zero, you tend to break even in the long run. In casino games, the expected payoff is always negative. The expected amount you lose to the house (the casino) for each $1 bet is called the house edge.
In roulette, a winning $1 bet on red pays $2 (your $1 stake plus $1 in winnings). There are 18 red positions on the wheel, out of a total of 38 positions. By assumption, the ball is equally likely to land in any of the 38 positions; you win the bet on red if the ball lands in a red position. The expected payoff for a $1 bet is thus $2×(18/38) − $1 = −$0.0526, so the house edge is 5.26 cents, the expected loss to you for each $1 you bet.
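The house-edge arithmetic, done with exact fractions:

```python
from fractions import Fraction

# A winning $1 bet on red returns $2; the bet wins with probability 18/38.
p_red = Fraction(18, 38)
expected_payoff = 2 * p_red - 1  # in dollars, net of the $1 stake
print(float(expected_payoff))    # about -0.0526: a 5.26-cent house edge
```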
Fair bets
A fair bet is one for which the expected value of the payoff is zero, after accounting for the cost of the bet.
If a bet is fair, you should expect to break even in the long run in repeated play.
Consider playing the following dice game: we roll a pair of dice five times. We lose $100 each time the pair of dice does not land showing double ones, and win $1000 each time the pair lands showing double ones. What is the expected net profit or loss? Is this a fair bet?
Solution. Let X be the number of double ones in five rolls of a pair of fair dice. Then X has a binomial distribution with parameters n = 5 and p = 1/36, so
E(X) = 5/36 ≈ 0.139 pairs of ones.
Your winnings are
W = X × $1000 - (5 - X) × $100.
W is a constant times X, plus another constant; that is, W is an affine transformation of X:
W = $1100 × X - $500.
Thus the expected value of W is
E(W) = $1100 × E(X) - $500 = -$347.22.
This is not a fair bet—you should expect to lose money in the long run in repeated play.
Here is another way to calculate your expected net winnings. The net winnings in this game are the sum of the winnings in five shorter games, each of which consists of rolling the pair of dice once. Your expected winnings in each of the shorter games are
$1000 × 1/36 - $100 × 35/36 = -$69.44.
Your expected total winnings from the 5 games is the sum of your expected winnings from each game:
-$69.44 - $69.44 - … - $69.44 = 5 × -$69.44 = -$347.22.
This section summarizes some earlier results of the chapter, and presents without proof the expected values of some other common discrete random variables we have seen earlier in the book.
Expected Values of some discrete Random Variables
The expected value of a binomial random variable with parameters n and p is n×p.
The expected value of a geometric random variable with parameter p is 1/p.
The expected value of a negative binomial random variable with parameters p and r is r/p.
The expected value of an hypergeometric random variable with parameters N, G, and n is n×(G/N).
The expected value of the sample sum of n random draws with or without replacement from a box of labeled tickets is
n×(average of the labels on all the tickets in the box).
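The geometric entry in the list above can be checked by simulation. This is a sketch with an arbitrary seed and p; it counts trials up to and including the first success.

```python
import random

# Average waiting time until the first success should be near 1/p.
def trials_until_success(p, rng):
    k = 1
    while rng.random() >= p:
        k += 1
    return k

rng = random.Random(3)
p = 0.25
mean_wait = sum(trials_until_success(p, rng) for _ in range(100_000)) / 100_000
print(round(mean_wait, 2))  # close to 1/p = 4
```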
The following exercises check your ability to determine the distribution of a random variable from a verbal description, to find the expected value of the random variable, and to determine whether a bet is fair.
In repeated independent trials with the same probability of success, as the number of trials increases, the fraction of successes is increasingly likely to be close to the probability of success in each trial. This is the Law of Large Numbers. As a consequence of the Law of Large Numbers, if a discrete random variable is observed repeatedly in independent experiments, the fraction of experiments in which the random variable equals any of its possible values is increasingly likely to be close to the probability that the random variable equals that value. The mean of the observed values of the random variable in repeated independent experiments is thus increasingly likely to be close to a weighted average of the possible values, where the weights are the probabilities of the values. That weighted average is called the expected value of the random variable. The expected value of a random variable depends only on the probability distribution of the random variable, so we can speak interchangeably of the expected value of a random variable or of its probability distribution.
The expected value of the sample sum of n random draws with or without replacement from a box of numbered tickets is n times the mean of the numbers on all the tickets in the box, including duplicates as often as they occur. For the special case of a 0-1 box, this implies that the expected value of the binomial distribution with parameters n and p is n×p, and that the expected value of the hypergeometric distribution with parameters N, G, and n is n×G/N. The expected value of a constant times a random variable is that constant times the expected value of the random variable. The expected value of the sum of two or more random variables is the sum of their expected values. It follows that the expected value of the sample mean of n draws with or without replacement from a box of numbered tickets is the mean of the numbers on all the tickets in the box; as a special case, the expected value of the sample percentage of n random draws with or without replacement from a 0-1 box is the population percentage of tickets in the box labeled "1." The expected value of the geometric distribution with parameter p is 1/p. The expected value of the negative binomial distribution with parameters r and p is r/p. The house edge in a bet is the expected value of the net loss to the bettor per $1 bet. A bet is fair if the expected net payoff (accounting for the ante) is zero.