Probability is the study of randomness. It has a mathematical aspect and a philosophical aspect. The mathematical aspect is treated separately. The philosophical aspect, described in this chapter, connects the mathematical theory with the world: it says what "probability" means when we make statements like "the probability that a fair coin lands heads is 50%."
What does random mean? In ordinary speech, we use random to denote things that are unpredictable or not deliberate. For instance, we might speak of "a random person off the street," or quip that "Philip said the most random thing in lecture today!" In this course, we distinguish between random and haphazard. Events that are random are not perfectly predictable, but they have long-term regularities that we can describe and quantify using probability. In contrast, haphazard events do not necessarily have long-term regularities. Moreover, we limit the term probability to events for which we can specify all possible outcomes.
How a fair coin lands when it is tossed vigorously is a canonical example of a random event. One cannot predict perfectly whether the coin will land heads or tails; however, in repeated tosses, the fraction of times the coin lands heads will tend to settle down to a limit of 50%. The outcome of an individual toss is not perfectly predictable, but the long-run average behavior is predictable. Thus it is reasonable to consider the outcome of tossing a fair coin to be random.
What does it mean to say that a fair coin toss has a 50% chance of landing heads? There are several standard answers, called Theories of Probability.
Theories of Probability assign meaning to statements like "the probability that A occurs is p%." Theories of probability connect the mathematics of probability to the real world. As you might expect, bridging mathematics to reality is not so easy—the philosophical problems are deep, and it is hard to be consistent without being circular. We shall examine three theories of probability. The one I find most satisfactory is the Frequency Theory, although the others have their places too. In all three theories, probability is on a scale of 0% to 100%. "Probability" and "chance" are synonymous.
The Theory of Equally Likely Outcomes is the oldest theory of probability. It originated in the study of games of chance, such as dice games and card games. In this theory, probability assignments depend on the assertion that no particular outcome is preferred over any other by Nature; generally, arguments that Nature should show no preference among the outcomes appeal to the symmetry of the system studied (such as a "fair" coin or die).
If a given experiment or trial has n possible outcomes among which—it is assumed—Nature should show no preference, this theory defines them to be equally likely. The probability of each outcome is then 100%/n.
Applying this theory of probability involves arguing that no particular outcome should occur in preference to another, typically by appeal to physical symmetries or other considerations—e.g., that there is no reason one marble would be predisposed to turn up rather than another in drawing a marble from a well-stirred bowl of marbles. For example, if a coin is balanced well, there is no reason for it to land heads in preference to tails when it is tossed vigorously, so according to the Theory of Equally Likely Outcomes, the probability that the coin lands heads is equal to the probability that the coin lands tails, and both are 100%/2 = 50%. (This ignores the nearly impossible outcomes that the coin does not land at all or lands balanced on its edge.) Similarly, if a die is fair (properly balanced) the chance that when it is rolled vigorously it lands with the side with one spot on top (the chance that the die shows one spot) is the same as the chance that it shows two spots or three spots or four spots or five spots or six spots: 100%/6, about 16.7%.
If an event consists of more than one possible outcome, the chance of the event is the number of ways it can occur, divided by the total number of things that could occur. For example, the chance that a die lands showing an even number of spots is the number of ways it could land showing an even number of spots (3, namely, landing showing 2, 4, or 6 spots), divided by the total number of things that could occur (6, namely, landing showing 1, 2, 3, 4, 5, or 6 spots). Since the total number of possible outcomes is n, the maximum possible probability of any event is 100%×n/n = 100%. Thus, in the Theory of Equally Likely Outcomes, probabilities are between 0% and 100%, as claimed.
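The counting rule in the preceding paragraph can be sketched in a few lines of Python (this example is not part of the original text; it just restates the die calculation):

```python
from fractions import Fraction

# For a fair die, each of the 6 faces is assumed equally likely,
# so each face has probability 1/6.
outcomes = [1, 2, 3, 4, 5, 6]
even = [o for o in outcomes if o % 2 == 0]  # the event "even number of spots"

# Chance of the event = (# ways it can occur) / (total # of outcomes).
chance_even = Fraction(len(even), len(outcomes))
print(chance_even)         # 1/2
print(float(chance_even))  # 0.5, i.e., 50%
```

The same pattern applies to any event in this theory: count the outcomes in the event and divide by the total number of equally likely outcomes.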
In the Frequency Theory of Probability, probability is the limit of the relative frequency with which an event occurs in repeated trials. Relative frequencies are always between 0% (the event essentially never happens) and 100% (the event essentially always happens), so in this theory as well, probabilities are between 0% and 100%. According to the Frequency Theory of Probability, what it means to say that "the probability that A occurs is p%" is that if you repeat the experiment over and over again, independently and under essentially identical conditions, the percentage of the time that A occurs will converge to p. For example, under the Frequency Theory, to say that the chance that a coin lands heads is 50% means that if you toss the coin over and over again, independently, the ratio of the number of times the coin lands heads to the total number of tosses approaches a limiting value of 50% as the number of tosses grows. Because the ratio of heads to tosses is always between 0% and 100%, when the probability exists it must be between 0% and 100%.
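A simulation can illustrate (though of course not prove) the convergence the Frequency Theory asserts. This sketch, not part of the original text, uses Python's pseudo-random number generator as a stand-in for repeated independent tosses of a fair coin:

```python
import random

random.seed(0)  # make the pseudo-random tosses reproducible

# Toss a simulated fair coin n times and track the relative frequency
# of heads; under the Frequency Theory, this fraction should settle
# down near 50% as n grows.
n = 100_000
heads = 0
for _ in range(n):
    heads += random.random() < 0.5  # True (counts as 1) is a head
rel_freq = heads / n
print(rel_freq)  # close to 0.5; the exact value depends on the seed
```

Of course, a finite simulation can only suggest the limiting behavior; the claim that the limit exists is an assumption about the world, as discussed below.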
In the Subjective Theory of Probability, probability measures the speaker's "degree of belief" that the event will occur, on a scale of 0% (complete disbelief that the event will happen) to 100% (certainty that the event will happen). According to the Subjective Theory, what it means for me to say that "the probability that A occurs is 2/3" is that I believe that A will happen twice as strongly as I believe that A will not happen. The Subjective Theory is particularly useful in assigning meaning to the probability of events that in principle can occur only once. For example, how might one assign meaning to a statement like "there is a 25% chance of an earthquake on the San Andreas fault with magnitude 8 or larger before 2050?" (See Freedman and Stark, 2003, for more discussion of theories of probability and their application to earthquakes.) It is very hard to use either the Theory of Equally Likely Outcomes or the Frequency Theory to make sense of the assertion. Can you think of other examples?
These three theories of probability assign different meanings to the statement "the chance that A occurs is p%." Each theory has situations in which it is most natural, and each theory has shortcomings. This book uses the Frequency Theory primarily.
While each of the theories has attractive elements, all have shortcomings as well. The shortcomings involve hidden assumptions, limited domains of applicability, and changes of subject.
Even when there are only a few possible outcomes, it is not always clear whether they should be deemed equally likely. For example, consider tossing two coins at the same time. The possible outcomes could be {two heads, not two heads} or {two heads, one head and one tail, two tails} or {two heads, head on coin 1 and tail on coin 2, tail on coin 1 and head on coin 2, two tails}.
The last of these assigns the same probabilities the Frequency Theory does. For instance, if the equally likely outcomes are taken to be {two heads, head on coin 1 and tail on coin 2, tail on coin 1 and head on coin 2, two tails}, both the Theory of Equally Likely Outcomes and the Frequency Theory would say that the chance of two heads is 25%. If one is using probability to bet on games of chance, long-term relative frequencies—which the Frequency Theory contemplates—are perhaps the most important consideration, because they determine how much one wins or loses in the long run. It seems rather artificial to introduce a distinction between two otherwise identical coins in order to make the probabilities calculated using the Theory of Equally Likely Outcomes agree with the probabilities calculated using the Frequency Theory. Perhaps a more serious limitation of the Theory of Equally Likely Outcomes is that many situations do not have natural symmetries to exploit to decide which outcomes are equally likely. For example, what is the chance that a thumbtack lands with its point up when it is tossed vigorously? What is the chance that a die that has been "loaded" (modified to be unbalanced) lands showing one spot? Neither of these problems has a natural symmetry from which to argue that the outcomes are equally likely. Moreover, in many situations there are an infinite number of possible outcomes; dividing 100% by infinity yields zero.
The Frequency Theory requires an assumption about how the world works: The relative frequency with which an event occurs in repeated trials is assumed to converge to a limit. What is a limit? In the case of coin tossing, the theory says that for any positive number a, no matter how small, there is some number M, which can depend on a, such that
\[ \left| \frac{\#\text{heads in } n \text{ tosses}}{n} - 50\% \right| < a \]
whenever the number of tosses n > M. Not all sequences of heads and tails satisfy this assumption. For example, suppose the first toss gives a head. The relative frequency of heads is then 100%. Suppose the next 3 tosses give tails. The relative frequency of heads is then 25%. Suppose the next 100 tosses give heads. The relative frequency of heads is then over 97%. Suppose the next 5000 tosses give tails. The relative frequency of heads is then about 2%. If we continue in this way, with ever longer runs of heads and of tails, the relative frequency of heads never approaches a limit.
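The relative frequencies in this constructed sequence are easy to check directly. This short Python sketch (not part of the original text) builds the sequence of tosses just described and computes the running fraction of heads:

```python
# The sequence described above: 1 head, then 3 tails,
# then 100 heads, then 5000 tails (1 = head, 0 = tail).
tosses = [1] + [0] * 3 + [1] * 100 + [0] * 5000

def rel_freq_after(k):
    """Relative frequency of heads among the first k tosses."""
    return sum(tosses[:k]) / k

print(rel_freq_after(1))     # 1.0, i.e., 100%
print(rel_freq_after(4))     # 0.25, i.e., 25%
print(rel_freq_after(104))   # 101/104, over 97%
print(rel_freq_after(5104))  # 101/5104, about 2%
```

Extending the sequence with ever longer runs keeps the running fraction oscillating between values near 100% and values near 0%, so it never converges.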
The Empirical Law of Averages says this never happens: The world works in such a way that the relative frequency with which a random event occurs in repeated trials always settles down to a limit. This "law" is an assumption about how the world works. It is not a mathematical fact, and it is not an observation because no one can continue tossing coins forever to see whether the relative frequency of heads starts to vary again after, say, 100,000,000,000,000 tosses. The Empirical Law of Averages is essential to the Frequency Theory.
The second limitation of the Frequency Theory is that many events to which we might like to assign probabilities are not the outcomes of repeatable experiments. For example, what is the probability that the universe will end in a "big crunch?" What is the probability that my 2015 tax return will be audited? What is the probability that in 2020 more online textbooks than paper textbooks will be sold? What is the probability that the Dow Jones Industrial Average reaches 20,000 before the year 2020? Can you think of other examples?
The principal shortcoming of the Subjective Theory is that colloquially we think of probability as being a property of an event in the external (objective) world, not merely a reflection of our state of mind. When I say "this thumbtack has probability 66% of landing point up when I toss it," you probably think I am talking about the tack, not about my state of mind with respect to the tack: this theory changes the subject. Similarly, under the Subjective Theory, you and I can disagree about the probability of an event and both be correct, which seems unsatisfactory in many scientific settings.
There are a variety of technical difficulties in the Subjective Theory regarding how to measure the probability of an event. One possible resolution is to study the bets you are willing to take. Would you be indifferent between a bet that a coin lands heads and a bet with the same stakes that it lands tails? If so, some theorists would conclude that your subjective probability that the coin lands heads is 50%. Some factors can complicate this approach. For example, even though you know that buying a lottery ticket is almost certainly throwing your money away, you might buy a ticket anyway, reasoning that you would not particularly miss the $1 cost of the ticket, while you would definitely notice winning $20,000,000. In this scenario, the probability of winning is less of an issue than the possibility of winning.
Here is another example: I will bet you $1,000,000 against $500 that there will not be a nuclear bomb dropped on Berkeley, California, by the year 2020. Even if I were confident that a nuclear bomb will be dropped, the bet would be safe for me: if a bomb is dropped, I won't be around to pay off the lost bet (I live in Berkeley), and if it is not dropped, I could use the $500 you would owe me.
Another problem with the Subjective Theory has to do with scientific method. Some philosophers of science maintain that unless an hypothesis can, in principle, be shown to be false, it is not scientific. An hypothesis that in principle can be disproved is called falsifiable. In the Frequency Theory, one can collect evidence against the statement that "the probability that A occurs is p%" by repeating an experiment over and over and looking at the fraction of times the event A occurs. In the Subjective Theory, evidence against the hypothesis that "the probability that A occurs is p%" is found by psychological testing to see whether the individual making the statement is telling the truth and is internally consistent in his assignments of probability. Running the real-world experiment over would not be relevant.
The following exercises check whether you understand the differences among the Theories of Probability. The examples illustrate using the Fundamental Rule of Counting and the Theories of Probability in more complex settings.
In Lotto, you pick six numbers between 1 and 53; 6 numbers are drawn "at random" in such a way that every set of 6 distinct numbers between 1 and 53 is equally likely. To win the jackpot, all 6 of the numbers drawn must match the numbers you picked. The total number of combinations of 53 numbers taken 6 at a time is \({}_{53}C_6 = 22,957,480\). Only one of those combinations matches yours exactly, so the chance of winning the jackpot is \(1/(22,957,480) = 0.0000044\%\).
There is a payoff of $5 if you match 3 of the 6 numbers. To match 3, the draw has to result in 3 of your 6 numbers, and in 3 of the 47 numbers you did not pick. You can think of this as a sequence of two experiments: drawing 3 numbers from among your 6 (which can result in \({}_6C_3 = 20\) possible outcomes), and drawing 3 numbers from the 47 remaining numbers that are not among your six (which can result in \({}_{47}C_3 = 16,215\) possible outcomes). Because the number of possibilities in the second experiment does not depend on what happened in the first, we can apply the fundamental rule of counting to conclude that the total number of ways of drawing 3 of the 6 numbers you picked, and 3 of the 47 you did not pick, is _{6}C_{3}×_{47}C_{3}. The chance that the numbers drawn will match exactly 3 of your 6 numbers is therefore
_{6}C_{3}×_{47}C_{3}/_{53}C_{6} = 20×16,215/22,957,480 = 1.41%.
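These Lotto calculations can be verified with `math.comb` (a standard-library function in Python 3.8+); this check is not part of the original text:

```python
from math import comb

# Lotto: pick 6 of the numbers 1 through 53.
total = comb(53, 6)
print(total)  # 22957480

# Chance of matching all 6 numbers drawn (the jackpot):
p_jackpot = 1 / total
print(p_jackpot)  # about 4.4e-08, i.e., 0.0000044%

# Chance of matching exactly 3 of your 6 numbers:
# 3 of your 6, and 3 of the 47 numbers you did not pick.
ways = comb(6, 3) * comb(47, 3)  # 20 * 16215 = 324300
p_match3 = ways / total
print(round(100 * p_match3, 2))  # 1.41 (percent)
```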
What is the chance of being dealt exactly one pair (two of a kind) in a 5 card hand from a well shuffled deck of cards?
Solution. How many distinct 5-card hands can one form from a deck of 52 cards? In a hand of cards, the order in which you receive the cards does not matter. The number of 5-card hands is the number of ways of picking a set of 5 things from 52 things, which is _{52}C_{5} = 52!/(5! 47!). Assuming that the deck is well-shuffled, every 5-card hand is equally likely, so the chance of being dealt exactly one pair is the number of hands that contain exactly one pair, divided by the total number of hands (which we just found).
To specify a hand containing one pair, we need to specify what kind the pair is of, the two suits of the cards in the pair, and the three remaining cards. There are 13 ways of choosing the kind of the pair. For each choice of kind, there are _{4}C_{2} = 6 ways of choosing the two suits of the pair. The other three cards must be of three different kinds, none matching the kind of the pair (otherwise the hand would contain three of a kind or a second pair), so there are _{12}C_{3} = 220 ways of choosing their kinds, and 4×4×4 = 64 ways of choosing their suits. Because we can think of this as a sequence of trials, we can invoke the Fundamental Rule of Counting to find the total number of hands that contain exactly one pair by multiplying the number of possible outcomes of each of these trials:
By the Fundamental Rule of Counting, the total number of distinct 5-card hands that contain exactly one pair is
13×6×220×64 = 1,098,240.
The probability of being dealt one of those hands from a well-shuffled deck is thus
1,098,240/_{52}C_{5} = 42.3%.
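As a quick arithmetic check (not part of the original solution), the one-pair count can be reproduced with `math.comb`:

```python
from math import comb

# Hands containing exactly one pair:
#   13 kinds for the pair, C(4,2) = 6 suit pairs,
#   C(12,3) = 220 sets of kinds for the other three cards,
#   4**3 = 64 suit choices for those three cards.
one_pair = 13 * comb(4, 2) * comb(12, 3) * 4**3
print(one_pair)  # 1098240

hands = comb(52, 5)  # all 5-card hands
print(round(100 * one_pair / hands, 1))  # 42.3 (percent)
```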
What is the chance of being dealt two pair in a 5-card hand from a well-shuffled deck?
Solution. We need to specify which two kinds the pairs are of, the suits of those kinds, and the remaining (fifth) card. There are _{13}C_{2} ways of choosing the two kinds. For each of those choices, there are _{4}C_{2} ways of specifying the suits of each of those kinds individually (so there are _{4}C_{2} × _{4}C_{2} ways of specifying the suits of the two cards of the two kinds). There are 52 - 4 - 4 = 44 cards in the deck that we could use for the fifth card, without making the hand be a full house, so there are _{44}C_{1} = 44 ways of choosing the fifth card. By the fundamental rule of counting, there are
_{13}C_{2} × _{4}C_{2} × _{4}C_{2} × _{44}C_{1}
distinct five-card hands that comprise two pair. If the deck is well shuffled, each of the _{52}C_{5} five-card hands is equally likely, so the chance of being dealt two pair is
_{13}C_{2} × _{4}C_{2} × _{4}C_{2} × _{44}C_{1} / (_{52}C_{5}).
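The two-pair count can be checked the same way; this numerical evaluation (about 4.75%) is not stated in the original solution:

```python
from math import comb

# Two pair: choose the 2 kinds, the suits within each pair, and a
# fifth card from the 44 cards of the remaining 11 kinds.
two_pair = comb(13, 2) * comb(4, 2) * comb(4, 2) * 44
print(two_pair)  # 123552

hands = comb(52, 5)
print(round(100 * two_pair / hands, 2))  # 4.75 (percent)
```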
What is the chance of being dealt a royal flush from a well-shuffled deck?
Solution. A royal flush is {10, J, Q, K, A} of the same suit. The only thing unspecified is the suit. There are 4 ways of picking a suit from the four (_{4}C_{1} = 4), so there are 4 ways of getting a royal flush. Again, assuming the deck is well-shuffled, the chance of a royal flush is 4/(_{52}C_{5}).
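Evaluating the royal-flush chance numerically (a check not in the original text) shows just how rare the hand is:

```python
from math import comb

# A royal flush is {10, J, Q, K, A} of a single suit, so only the
# suit is free: 4 such hands out of all C(52, 5) hands.
hands = comb(52, 5)
p_royal = 4 / hands
print(p_royal)  # about 1.5e-06, roughly 1 in 649,740
```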
The following exercise checks that you can calculate the chance of a given poker hand.
Theories of Probability assign meaning to probability statements about the world. The Theory of Equally Likely Outcomes says that if an experiment must result in one of n outcomes, and there is no reason Nature should prefer one of the outcomes to another, then the probability of each outcome is 100%/n. The Frequency Theory says that the probability of an event is the limit of the relative frequency with which the event occurs in repeated trials under essentially identical conditions. The Subjective Theory says that probability is a measure of strength of belief on a scale of 0 to 100%.